ASFv4 vs ASFv3 Benchmark

One of the best ways to show how the changes in ASFv4 have improved the drivers is to show some benchmark numbers. These benchmarks compare applications written with ASFv4 against the same applications written with ASFv3. The behavior of the two versions of each example application is exactly the same. All examples use the default linker script, which sets aside 8192 bytes of SRAM for the stack. The stack usage has been excluded from the SRAM numbers.

Table 1. ASFv4 vs. ASFv3 SPI Master Driver Memory Size Benchmark

         FLASH (bytes)              SRAM (bytes)
         Full code   Driver code    Full code      Driver code
ASFv3    4328        2916           184 + STACK    120
ASFv4    2908        1916           208 + STACK    64

Table 2. ASFv4 vs. ASFv3 SPI Master Driver Throughput

         Bytes/s
ASFv3    54078
ASFv4    82987

Compiled in Atmel Studio 7 (7.0.582) with GCC 4.9.3, with default settings except for optimization level -O3 and using the C lib specification nano-lib.

The hardware used is a SAMD21J18A. The CPU, the buses, and the SERCOM module all run at 8 MHz.
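
To put the throughput figures in Table 2 in perspective, the short sketch below computes the ASFv4/ASFv3 speed-up and the fraction of the raw SPI bit rate each driver reaches. It assumes that the throughput was measured at the 1,000,000 baud rate configured in this example and that a byte occupies exactly 8 bit times on the bus; neither assumption is confirmed by the benchmark description. The function names are illustrative only and are not part of ASF.

```c
#include <assert.h>

/* Ratio of two throughput figures, e.g. ASFv4 bytes/s over ASFv3 bytes/s. */
double throughput_ratio(double new_bytes_per_s, double old_bytes_per_s)
{
    assert(old_bytes_per_s > 0.0); /* a zero baseline is a caller error */
    return new_bytes_per_s / old_bytes_per_s;
}

/* Fraction of the raw bit rate used for payload, assuming 8 bit times per
 * byte and no framing overhead (an assumption, see the text above). */
double bus_utilisation(double bytes_per_s, double baud)
{
    assert(baud > 0.0);
    return bytes_per_s * 8.0 / baud;
}
```

With the numbers from Table 2, throughput_ratio(82987, 54078) is roughly 1.53, and bus_utilisation() gives roughly 0.43 for ASFv3 and 0.66 for ASFv4 under the stated assumptions.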

The example used for this benchmark initializes the device (clock setup, pin multiplexing, interrupts), sets up a SERCOM instance for SPI Master mode with a baud rate of 1,000,000, enables it, and writes "Hello World!" on the SPI bus. "Full code" is the memory usage of the whole application loaded on the device, while "driver code" is the memory usage of the SPI driver-specific code components. Because many different drivers support the same hardware in ASFv4, a lookup table is used to determine which interrupt handler to execute for a given peripheral instance. On Cortex-M0+ based devices this table is 112 bytes and does not grow with the size of the project. This table is the reason for ASFv4's slightly higher total (full code) SRAM usage in this small example.
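
The idea of a fixed-size interrupt handler lookup table can be sketched as follows. This is a minimal illustration, not the ASFv4 implementation: all type and function names are hypothetical, and the slot count of 28 is an assumption chosen only because 28 four-byte pointers add up to the 112 bytes quoted above on a 32-bit device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slot count: 28 pointers x 4 bytes = 112 bytes on a 32-bit MCU. */
#define IRQ_SLOT_COUNT 28u
static_assert(IRQ_SLOT_COUNT * sizeof(uint32_t) == 112u,
              "28 four-byte entries make up 112 bytes");

/* Hypothetical per-instance driver descriptor. */
struct irq_device {
    void (*handler)(struct irq_device *dev); /* driver-specific handler */
    uint32_t irq_count;                      /* demo state only */
};

/* One slot per interrupt line; the size is fixed at build time, so it does
 * not grow with the number of drivers in the project. */
static struct irq_device *irq_table[IRQ_SLOT_COUNT];

/* Bind a driver instance to an interrupt line. Returns 0 on success,
 * -1 if the line number is out of range. */
int irq_register(uint32_t irq, struct irq_device *dev)
{
    if (irq >= IRQ_SLOT_COUNT) {
        return -1;
    }
    irq_table[irq] = dev;
    return 0;
}

/* Called from the common vector: look up the owning driver and run it.
 * Unregistered lines are silently ignored in this sketch. */
void irq_dispatch(uint32_t irq)
{
    if (irq < IRQ_SLOT_COUNT && irq_table[irq] != NULL &&
        irq_table[irq]->handler != NULL) {
        irq_table[irq]->handler(irq_table[irq]);
    }
}

/* Example handler standing in for an SPI driver's interrupt routine. */
void demo_spi_irq_handler(struct irq_device *dev)
{
    dev->irq_count++;
}
```

A driver would call irq_register() once during initialization, after which every interrupt on that line is routed to its handler through the table. The table itself lives in SRAM, which is why it shows up in the "Full code" SRAM column even for a very small application.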

Table 3. ASFv4 vs. ASFv3 USART Driver Memory Size Benchmark

         FLASH (bytes)              SRAM (bytes)
         Full code   Driver code    Full code      Driver code
ASFv3    4940        3376           168 + STACK    80
ASFv4    2684        1756           232 + STACK    88

Compiled in Atmel Studio 7 (7.0.582) with GCC 4.9.3, with default settings except for optimization level -Os and using the C lib specification nano-lib.

The code was compiled for SAM D21.

The example used for this benchmark initializes the device (clock setup, pin multiplexing, interrupts), sets up a SERCOM instance for USART mode with a baud rate of 9600, enables it, and writes "Hello World!" on the USART bus. "Full code" is the memory usage of the whole application loaded on the device, while "driver code" is the memory usage of the USART driver-specific code components. Note that the same 112-byte interrupt handler lookup table is used in this example, and that the USART driver includes a 16-byte ring buffer for data reception, which makes data loss much less likely.
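
A receive ring buffer of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASFv4 ring buffer: the names are hypothetical, and the choice to reject new bytes when the buffer is full (rather than overwrite the oldest byte) is made only for this example. A real driver would also have to consider that the producer runs in interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_RING_SIZE 16u /* 16 bytes of storage, as in the benchmark text */
static_assert((RX_RING_SIZE & (RX_RING_SIZE - 1u)) == 0u,
              "size must be a power of two for index masking");

struct rx_ring {
    uint8_t  buf[RX_RING_SIZE];
    uint32_t head; /* free-running write index (receive interrupt side) */
    uint32_t tail; /* free-running read index (application side) */
};

/* Number of bytes currently stored; unsigned wrap-around keeps this
 * correct even after the indices overflow. */
static inline uint32_t rx_ring_count(const struct rx_ring *r)
{
    return r->head - r->tail;
}

/* Store one received byte. Returns false if the buffer is full, in which
 * case the byte is dropped (policy assumed for this sketch). */
bool rx_ring_put(struct rx_ring *r, uint8_t byte)
{
    if (rx_ring_count(r) == RX_RING_SIZE) {
        return false;
    }
    r->buf[r->head & (RX_RING_SIZE - 1u)] = byte;
    r->head++;
    return true;
}

/* Fetch the oldest byte. Returns false if the buffer is empty. */
bool rx_ring_get(struct rx_ring *r, uint8_t *out)
{
    if (rx_ring_count(r) == 0u) {
        return false;
    }
    *out = r->buf[r->tail & (RX_RING_SIZE - 1u)];
    r->tail++;
    return true;
}
```

With such a buffer, the receive interrupt can store up to 16 bytes before the application reads any of them, so a short delay in the main loop does not immediately cause lost data.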