FIFO or DMA?


Description



   The goal of this small project is to compare the performance of two approaches to fast UART communication: FIFO and DMA.
   The STMicroelectronics(c) STM32F10x MCU (Cortex-M3 core) uses DMA to perform fast UART processing.
   For the same purpose, the NXP(c) LPC2x (ARM7TDMI) and Luminary Micro(c) LM3Sx (Cortex-M3) MCUs use hardware UART Rx/Tx FIFOs.

1. Performance calculation

   When TNKernel is running the idle task, all jobs in all tasks have been completed and the CPU is free.
   In the idle task, TNKernel just increments tn_idle_count (a 32-bit counter).
   Every second, the system copies the contents of tn_idle_count into the tn_curr_performance variable and then resets tn_idle_count to 0.
   The reference value of tn_curr_performance is measured when there are no user tasks and only the two system tasks (idle and timer) are running. The value of tn_curr_performance can be read in a debugger.
   Then the full test project is run and the tn_curr_performance value is sent over the UART to the host (PC terminal).
   The CPU usage (the inverse of performance) can be calculated by the formula:

      CPU usage, % = 100 * (reference value - tn_curr_performance) / reference value

   This method can check the data transfer efficiency of different MCUs without a direct comparison (each MCU has its own clock rate and many other differences).
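
   The measurement described in this section can be sketched in C as follows. This is a minimal host-side model, not the TNKernel source: idle_task_step(), one_second_tick() and cpu_usage_permille() are hypothetical names, and the usage is assumed to be the relative drop of the counter from its reference value, which agrees with the figures in the results table. The result is returned in tenths of a percent to avoid floating point.

```c
#include <stdint.h>

/* Counter names follow the text; everything else here is hypothetical. */
static volatile uint32_t tn_idle_count;
static volatile uint32_t tn_curr_performance;

/* One pass of the idle loop: the only work is a counter increment. */
static void idle_task_step(void)
{
    tn_idle_count++;
}

/* Called once per second: take a snapshot and restart the counter. */
static void one_second_tick(void)
{
    tn_curr_performance = tn_idle_count;
    tn_idle_count = 0;
}

/* CPU usage in tenths of a percent, rounded to nearest.
   reference - the counter value measured with only the system tasks running
   current   - the counter value measured under load */
static uint32_t cpu_usage_permille(uint32_t reference, uint32_t current)
{
    if (reference == 0u || current >= reference) {
        return 0u;   /* no valid reference, or no measurable load */
    }
    uint64_t drop = (uint64_t)(reference - current) * 1000u;
    return (uint32_t)((drop + reference / 2u) / reference);
}
```

   For example, cpu_usage_permille(2458000, 1960651) returns 202, that is, 20.2 %.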

2. Tests description

   The test project uses 2 UARTs: UART1 at 921600 b/s and UART2 at 115200 b/s.
   UART1 works in reception mode only and receives a packet of data from an external source every 1 ms at 921600 b/s. After every 6th packet is received, the contents of the packet are transferred to the host (PC terminal) by UART2.
   UART2 sends a message to the PC terminal every 50 ms at 115200 b/s, regardless of the UART1 reception state.
   UART2 can also receive data from the host (PC) and send a response back to the PC asynchronously to the other transmit requests.
   The UART transmit driver works in non-blocking mode.
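
   To put the two line speeds in perspective, the sketch below estimates how many bytes fit into a given time window. The helper uart_max_bytes() is hypothetical, and the frame length of 10 bits (8N1: start bit, 8 data bits, stop bit) is an assumption; the text does not state the frame format or the packet size.

```c
#include <stdint.h>

/* Upper bound on the whole bytes a UART can move in window_us microseconds.
   bits_per_frame = 10 corresponds to 8N1 framing (assumed, not from the text). */
static uint32_t uart_max_bytes(uint32_t baud, uint32_t bits_per_frame,
                               uint32_t window_us)
{
    if (bits_per_frame == 0u) {
        return 0u;
    }
    /* baud [bit/s] * window [us] / 1e6 gives bits in the window */
    uint64_t scaled_bits = (uint64_t)baud * window_us;
    return (uint32_t)(scaled_bits / ((uint64_t)bits_per_frame * 1000000u));
}
```

   Under this assumption, UART1 can carry at most 92 bytes in each 1 ms slot, and UART2 at most 576 bytes in each 50 ms period.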

3. UART drivers for STM32F10x

   The UART Tx driver uses the semaphore semTxUART as a critical section to protect the driver from simultaneous access.
   The semTxPacketUART semaphore is signaled in the DMA end_of_transfer interrupt handler to indicate the end of the DMA transfer.
   The UART Rx driver uses an Rx memory buffer to store received data; the buffer is filled by the Rx DMA.
   Every 1 ms (in the SysTick interrupt), the Rx DMA is checked for newly arrived bytes. If such bytes exist, their offset in the buffer and their number are sent to the Rx task for processing through a TNKernel queue. In effect, this is a DMA polling operation.
   In the Rx DMA interrupt, a similar operation is performed after the DMA is reloaded.
   The Rx DMA interrupt is used because it is necessary to know precisely when the reception of a byte ends.
   At high reception speeds, only this information can guarantee a time interval long enough to reload the Rx DMA without losing bytes during reception; polling cannot be used for this purpose.
   To prevent corruption of the data at the beginning of the Rx memory buffer, that data should be processed before the DMA is reloaded. For this reason, the Rx memory buffer should be big enough (in most cases, 64..256 bytes is enough).
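
   The bookkeeping behind the Rx DMA polling can be sketched as follows. This is a hardware-free model, not the driver from the test project: rx_dma_tracker, rx_chunk, rx_dma_poll() and rx_dma_on_complete() are hypothetical names, RX_BUF_SIZE is an assumed value, and 'remaining' stands for the DMA channel's count of transfers still to be done (on the STM32F10x, presumably the channel's CNDTR register), which counts down from the buffer size.

```c
#include <stdint.h>

#define RX_BUF_SIZE 128u   /* assumed; the text suggests 64..256 bytes */

typedef struct {
    uint32_t last_pos;     /* buffer offset already reported to the Rx task */
} rx_dma_tracker;

typedef struct {
    uint32_t offset;       /* where the new bytes start in the Rx buffer */
    uint32_t count;        /* number of new bytes; 0 means nothing new */
} rx_chunk;

/* Periodic poll (every 1 ms in the text): report the bytes that arrived
   since the previous call as an (offset, count) pair. */
static rx_chunk rx_dma_poll(rx_dma_tracker *t, uint32_t remaining)
{
    rx_chunk c = { t->last_pos, 0u };

    if (remaining > RX_BUF_SIZE) {
        return c;                       /* invalid counter value, ignore */
    }
    uint32_t write_pos = RX_BUF_SIZE - remaining;
    if (write_pos > t->last_pos) {
        c.count = write_pos - t->last_pos;
        t->last_pos = write_pos;
    }
    return c;
}

/* Transfer-complete path: report the tail of the buffer, then restart
   the bookkeeping at offset 0 for the reloaded DMA. */
static rx_chunk rx_dma_on_complete(rx_dma_tracker *t)
{
    rx_chunk c = rx_dma_poll(t, 0u);
    t->last_pos = 0u;
    return c;
}
```

   In a real driver, each non-empty rx_chunk would be what is sent to the Rx task through the queue; the bytes themselves stay in the Rx memory buffer.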

4. UART drivers for LPC21x

   The UART Tx driver uses the semaphore semTxUART as a critical section to protect the driver from simultaneous access.
   The semFifoEmptyTxUART semaphore is signaled in the UART Tx_FIFO_Empty interrupt handler to indicate the end of the FIFO data transmission.
   The UART Rx driver uses a TNKernel data queue and a fixed-sized memory pool.
   In the UART Rx FIFO Threshold interrupt (14 bytes for the test project), the system obtains a memory block from the fixed-sized memory pool (the block size equals the UART FIFO size, 16 bytes) and copies the entire Rx FIFO contents to the memory block.
   Then the memory block address and the number of received bytes are packed and sent to the Rx task for processing (through the data queue).
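
   One possible way to pack a block address and a byte count into a single queue item is sketched below. The text does not say how the test project does the packing, so this is an illustration only: rx_pool, rx_item_pack(), rx_item_block() and rx_item_count() are hypothetical names. The sketch assumes that every block is aligned to its 16-byte size, so the low 4 bits of the address are free to hold count - 1 for a count of 1..16. It uses the C11 _Alignas keyword.

```c
#include <stdint.h>

#define BLOCK_SIZE 16u   /* UART FIFO size, per the text */
#define NUM_BLOCKS 4u    /* assumed pool depth */

/* Hypothetical fixed-sized pool; each block is aligned to BLOCK_SIZE. */
static _Alignas(16) uint8_t rx_pool[NUM_BLOCKS][BLOCK_SIZE];

/* Pack an aligned block address and a byte count of 1..16 into one word. */
static uintptr_t rx_item_pack(const void *block, uint32_t count)
{
    return (uintptr_t)block | (uintptr_t)(count - 1u);
}

/* Recover the block address by clearing the low 4 bits. */
static void *rx_item_block(uintptr_t item)
{
    return (void *)(item & ~(uintptr_t)(BLOCK_SIZE - 1u));
}

/* Recover the byte count from the low 4 bits. */
static uint32_t rx_item_count(uintptr_t item)
{
    return (uint32_t)(item & (uintptr_t)(BLOCK_SIZE - 1u)) + 1u;
}
```

   The Rx task would unpack the item, process rx_item_count() bytes starting at rx_item_block(), and then return the block to the pool.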

5. UART drivers for LM3S8x

   The LM3S8x UART drivers are very similar to the LPC2x UART drivers.
   The LM3S8x UART (Arm(c) PrimeCell(tm) PL011 UART) has a minimum UART Tx FIFO threshold of <= 1/8 of the Tx FIFO size (2 bytes). To prevent Tx FIFO interrupts when the Tx FIFO contains fewer than 2 bytes, an additional operation to disable/enable the Tx FIFO interrupts is required.

6. Test results

    CPU:   - LPC2103; f = 58.9824 MHz; ARM mode; compiler - CrossWorks 1.5 (GCC 3.4.4)
           - LM3S811; f = 50 MHz; compiler - CrossWorks 1.7 (GCC 4.1.1 2006g3-26)
           - STM32F103RBT6; f = 72 MHz; compiler - CrossWorks 1.7 (GCC 4.1.1 2006g3-26)

    Test   tn_curr_performance (average)          CPU usage, %
           LPC2103   LM3S811   STM32F103RBT6      LPC2103   LM3S811   STM32F103RBT6
    N1     1655012   2750624   2458000            0         0         0
    N2     1639582   2727264   1960651            0.9       0.85      20.2
    N3     1347145   2295860   1823679            18.6      16.5      25.8

    Test description:
    N1 - Obtaining a reference value.
    N2 - UART2 sends a message to the PC terminal every 50 ms at 115200 b/s.
    N3 - Test N2 conditions + UART1 receives a packet of data from an external source every 1 ms at 921600 b/s. After every 6th packet is received (every 6 ms), the contents of the packet are transferred to the host (PC terminal) by UART2.

Downloads



test-performance-1-0.zip
   Performance test source code for the NXP(c) LPC2103 and STMicroelectronics(c) STM32F103RBT6 MCUs.
   The file also contains projects for Rowley CrossWorks Studio 1.7, Keil RVC v.3.24, IAR ARM v.5.20 and GCC 4.3.2 (Codesourcery 2008q3-66).
test-performance-lm3s8x-1-0.zip
   Performance test source code for the Luminary Micro(c) LM3S811 MCU.
   The file also contains projects for Rowley CrossWorks Studio 1.7, Keil RVC v.3.24, IAR ARM v.5.20 and GCC 4.3.2 (Codesourcery 2008q3-66).