2 WCET Measurement
A bare metal application was developed to show how various latency factors increase the upper bound of the software execution time. Using this application, the normal execution time and the WCET are measured with the latency factors disabled and enabled, respectively. In the application, the device eNVM stores a custom bootloader. At device power-up, the E51 monitor core executes the bootloader, which loads a separate binary file into DDR memory for each U54 application core. The bootloader executes its code from eNVM and keeps its data in scratchpad memory.
The following figure shows the memory hierarchy and the interaction of application cores with different memory regions in the example application.
The following table lists the task executed by each U54 application core and the memory region used for fetching and executing the code. The bootloader loads all the binary files to different DDR regions.
Application Core | Task(s) | Memory Region | Description |
---|---|---|---|
U54_1 | Executes the 32 x 32 matrix multiplication task and prints the results on the serial terminal. | Executes from ITIM and LIM. | The bootloader loads the 32 x 32 matrix multiplication task to DDR, and U54_1 copies the task to ITIM and LIM before executing it. This scenario demonstrates predictable execution time because the unpredictability caused by caching is eliminated. |
U54_2 | Continuously flushes L2 cache lines using the L2 flush register. | Executes from cached DDR. | This task acts as an adversary to the tasks running on the U54_1 and U54_4 cores. Running it shows how continuously evicting or flushing L2 cache lines affects tasks executing from ITIM/LIM and from DDR. This scenario helps in analyzing the effects of the L2 cache flush and refresh function on the execution time. |
U54_3 | Continuously reads data from DDR, which results in cache evictions. | Executes from cached DDR. | |
U54_4 | Executes the 32 x 32 matrix multiplication task and prints the results on the serial terminal. | Executes from the cached DDR address range. | The cached DDR scenario demonstrates the unpredictability of execution times due to the presence of caches. This scenario is for comparison purposes only. |
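
The workload run by U54_1 and U54_4 in the table above can be pictured with a minimal sketch of a 32 x 32 matrix multiplication. This is not the code shipped with the example application: the names `MAT_DIM`, `matrix_fill`, and `matrix_multiply`, the `uint32_t` element type, and the flat row-major layout are all assumptions made for illustration.

```c
#include <stdint.h>

#define MAT_DIM 32u /* matrix dimension, taken from the task description */

/* Fill a row-major MAT_DIM x MAT_DIM matrix with a deterministic pattern
 * (hypothetical helper, so every run operates on the same input data). */
void matrix_fill(uint32_t *m, uint32_t seed)
{
    for (uint32_t i = 0; i < MAT_DIM; i++) {
        for (uint32_t j = 0; j < MAT_DIM; j++) {
            m[i * MAT_DIM + j] = seed + i + j;
        }
    }
}

/* Compute c = a * b for row-major MAT_DIM x MAT_DIM matrices.
 * Unsigned arithmetic wraps on overflow, which is well defined in C. */
void matrix_multiply(const uint32_t *a, const uint32_t *b, uint32_t *c)
{
    for (uint32_t i = 0; i < MAT_DIM; i++) {
        for (uint32_t j = 0; j < MAT_DIM; j++) {
            uint32_t sum = 0u;
            for (uint32_t k = 0; k < MAT_DIM; k++) {
                sum += a[i * MAT_DIM + k] * b[k * MAT_DIM + j];
            }
            c[i * MAT_DIM + j] = sum;
        }
    }
}
```

A fixed-size triple loop like this has a fixed iteration count and no data-dependent branches, so run-to-run timing differences come mainly from the memory system and interference rather than from the algorithm itself.
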
The normal execution time is measured by excluding the U54_2 and U54_3 application cores and by disabling branch prediction trashing in the U54_1 and U54_4 cores. WCET is measured by including the U54_2 and U54_3 cores, and by enabling branch prediction trashing and a 1000 mcycle interrupt routine in the U54_1 and U54_4 cores. At device power-up, the bootloader provides an option to enable or disable the U54_2 and U54_3 cores via the serial terminal. To measure WCET, the U54_1 and U54_4 applications must be rebuilt because the latency factors are enabled in the code. The normal execution time and WCET are measured based on the scenarios highlighted in Figure 1.
- To measure the normal execution time on U54_1 (ITIM and LIM) and U54_4 (cached DDR), the following sequence is followed:
  - U54_2 and U54_3 are excluded, then U54_1 and U54_4 execute the task without branch prediction trashing and without the interrupt routine. U54_4 executes the task from the cached DDR region.
- To measure WCET on U54_1 (ITIM and LIM) and U54_4 (cached DDR), the following sequence is followed:
  - U54_2 and U54_3 are included, then U54_1 and U54_4 execute the task with branch prediction trashing and the interrupt routine. U54_4 executes the task from the cached DDR region.
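
The two measurement sequences above can be sketched as a small timing harness that runs a task repeatedly and records the shortest and longest observed cycle counts. This is a minimal illustration, not the application's actual measurement code: `exec_stats_t`, `measure_task`, and the function-pointer types are hypothetical names. On the target, the cycle reader would presumably wrap a read of the RISC-V `mcycle` counter. Here a simulated counter (`sim_read_cycles`, `sim_task`) stands in so that the sketch can be exercised off-target.

```c
#include <stdint.h>

typedef uint64_t (*cycle_reader_t)(void); /* hypothetical: returns the current cycle count */
typedef void (*task_fn_t)(void);          /* hypothetical: the task under measurement */

typedef struct {
    uint64_t min_cycles;   /* shortest observed run */
    uint64_t max_cycles;   /* longest observed run (observed worst case) */
    uint64_t total_cycles; /* sum over all runs, for computing an average */
    uint32_t runs;         /* number of completed runs */
} exec_stats_t;

/* Run the task `runs` times and collect min/max/total cycle counts. */
exec_stats_t measure_task(task_fn_t task, cycle_reader_t read_cycles, uint32_t runs)
{
    exec_stats_t s = { UINT64_MAX, 0u, 0u, 0u };
    for (uint32_t i = 0; i < runs; i++) {
        uint64_t start = read_cycles();
        task();
        uint64_t elapsed = read_cycles() - start;
        if (elapsed < s.min_cycles) { s.min_cycles = elapsed; }
        if (elapsed > s.max_cycles) { s.max_cycles = elapsed; }
        s.total_cycles += elapsed;
        s.runs++;
    }
    return s;
}

/* Host-side stand-ins (assumptions for illustration only). */
static uint64_t sim_now;   /* simulated cycle counter */
static uint32_t sim_calls; /* number of simulated task invocations */

uint64_t sim_read_cycles(void)
{
    return sim_now;
}

/* Simulated task: costs 100 cycles, plus 50 extra cycles on every fourth
 * call to mimic interference from a latency factor. */
void sim_task(void)
{
    sim_now += 100u;
    if ((sim_calls++ % 4u) == 3u) {
        sim_now += 50u;
    }
}
```

Calling `measure_task(sim_task, sim_read_cycles, 8u)` returns the spread between the undisturbed and the disturbed runs. Note that `max_cycles` is only the highest value seen during the measured runs. It approximates WCET only to the extent that the enabled latency factors actually provoke the worst case.
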
Bare metal user applications are built from the PolarFire SoC Bare Metal library, which includes a hardware abstraction layer and drivers for accessing various MSS blocks such as the L1 cache, L2 cache, timer, interrupt registers, and peripherals. For more information about the Bare Metal library, see https://github.com/polarfire-soc/polarfire-soc-bare-metal-examples.