Visualizing the profiled data

The third and final step in the profiling process is to parse the generated .prof files and visualize them. From the IDE, click RISC-V SoC Features > Reference SoC with HLS Accelerator(s) > Run SoC Profiler Viewer. From the command line, type:

shls soc_profiler_view

This command prints a per-module summary of the runtime to the terminal: the number of samples captured (i.e. the number of times the hardware module was called) and the minimum, maximum, average and aggregate runtimes. Units are microseconds by default. In this example, the table shows the profiling summary for a project with four hardware functions: hw_const_add(), hw_const_mult(), hw_sum(), and hw_delay(). A copy of this summary is stored in the file hls_output/reports/hls_soc_profiler.summary.rpt.

Project: soc_profiler

---------------------------------
Statistics (units in [us]):


--------[ main ]--------
main start      : 0.19 us
main finish     : 11747.38 us
main Delta Time : 11747.19 us

--------[ hw_const_add ]--------
Number of samples: 10
+-------------+-------+--------+-------+-----------+
|             | min   | max    | avg   | aggregate |
+-------------+-------+--------+-------+-----------+
| Write       | 50.76 | 208.28 | 69.08 | 690.76    |
| Accelerator | 5.42  | 6.62   | 5.59  | 55.94     |
| Total       |       |        |       | 746.7     |
+-------------+-------+--------+-------+-----------+

--------[ hw_const_mult ]--------
Number of samples: 10
+-------------+------+-------+-------+-----------+
|             | min  | max   | avg   | aggregate |
+-------------+------+-------+-------+-----------+
| Write       | 3.82 | 7.74  | 4.3   | 42.98     |
| Accelerator | 4.74 | 4.83  | 4.77  | 47.7      |
| Read        | 40.3 | 52.18 | 47.31 | 473.1     |
| Total       |      |       |       | 563.78    |
+-------------+------+-------+-------+-----------+

--------[ hw_delay ]--------
Number of samples: 100
+-------------+------+---------+------+-----------+
|             | min  | max     | avg  | aggregate |
+-------------+------+---------+------+-----------+
| Write       | 0.44 | 3.25    | 0.48 | 48.36     |
| Accelerator | 2.7  | 2000.96 | 22.7 | 2270.25   |
| Read        | 0.46 | 0.88    | 0.47 | 47.03     |
| Total       |      |         |      | 2365.64   |
+-------------+------+---------+------+-----------+

--------[ hw_sum ]--------
Number of samples: 10
+-------------+-------+-------+-------+-----------+
|             | min   | max   | avg   | aggregate |
+-------------+-------+-------+-------+-----------+
| Write       | 3.62  | 6.79  | 4.09  | 40.88     |
| Accelerator | 23.12 | 23.71 | 23.21 | 232.11    |
| Read        | 0.46  | 0.68  | 0.48  | 4.81      |
| Total       |       |       |       | 277.8     |
+-------------+-------+-------+-------+-----------+
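Because the summary is also saved as a plain-text report, it can be checked by a script. The following Python sketch assumes the report file uses the same layout as the terminal output shown above (section lines such as --------[ name ]-------- followed by a pipe-delimited table); the function names parse_summary and check_totals are hypothetical helpers, not part of the tool. It parses the tables and confirms that each Total equals the sum of the Write, Accelerator and Read aggregates.

```python
import re

# Excerpt copied from the summary above; in practice the text would be read
# from hls_output/reports/hls_soc_profiler.summary.rpt (layout assumed identical).
SAMPLE = """
--------[ hw_const_add ]--------
Number of samples: 10
+-------------+-------+--------+-------+-----------+
|             | min   | max    | avg   | aggregate |
+-------------+-------+--------+-------+-----------+
| Write       | 50.76 | 208.28 | 69.08 | 690.76    |
| Accelerator | 5.42  | 6.62   | 5.59  | 55.94     |
| Total       |       |        |       | 746.7     |
+-------------+-------+--------+-------+-----------+
"""

def parse_summary(text):
    """Return {module: {row label: {column name: value in us}}}."""
    modules, current, header = {}, None, None
    for line in text.splitlines():
        line = line.strip()
        section = re.match(r"-+\[\s*(\w+)\s*\]-+$", line)
        if section:
            current = modules.setdefault(section.group(1), {})
            header = None
            continue
        if current is None or not line.startswith("|"):
            continue  # skip borders, sample counts and free text
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells[1:]  # min, max, avg, aggregate
            continue
        # Empty cells (as in the Total row) are left out of the result.
        current[cells[0]] = {
            name: float(value) for name, value in zip(header, cells[1:]) if value
        }
    return modules

def check_totals(modules, tolerance=0.01):
    """Return the modules whose Total differs from the sum of phase aggregates."""
    mismatches = []
    for name, rows in modules.items():
        if "Total" not in rows:
            continue
        phase_sum = sum(r["aggregate"] for label, r in rows.items() if label != "Total")
        if abs(phase_sum - rows["Total"]["aggregate"]) > tolerance:
            mismatches.append(name)
    return mismatches

summary = parse_summary(SAMPLE)
print(summary["hw_const_add"]["Write"])
print("modules with inconsistent totals:", check_totals(summary))
```

The tolerance absorbs the rounding to two decimal places in the printed values.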

In this context, Write refers to the time that the CPU spent writing data to the hardware module, Accelerator (or Acc in the plot below) refers to the time the hardware module was active, and Read refers to the time that the CPU spent reading data from the hardware module's on-chip buffers.

The command shls soc_profiler_view also opens a GUI that shows a plot of the runtime. The GUI can zoom in on a specific region of the plot, and hovering the mouse pointer over a bar displays the corresponding annotated data.
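The Write, Accelerator and Read aggregates in the summary above indicate where each module spends its time. As a small worked example, the Python sketch below computes each phase's share of the aggregate runtime for hw_const_mult, using the values from the table; phase_shares is a hypothetical helper written for this illustration.

```python
def phase_shares(aggregates):
    """Return each phase's percentage of the summed aggregate runtime."""
    total = sum(aggregates.values())
    if total == 0:
        raise ValueError("aggregate runtimes sum to zero")
    return {phase: 100.0 * t / total for phase, t in aggregates.items()}

# Aggregate runtimes in microseconds for hw_const_mult, copied from the summary.
hw_const_mult = {"Write": 42.98, "Accelerator": 47.7, "Read": 473.1}

for phase, pct in phase_shares(hw_const_mult).items():
    print(f"{phase:<12}{pct:5.1f}%")
```

For hw_const_mult, Read accounts for roughly 84% of the aggregate time, so this module's runtime is dominated by the CPU reading results back and not by the accelerator itself.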

By default, the plot displays the timelines for all the hardware modules in the project by including all the *.prof files. To display the timeline for a single hardware module, set the SOC_PROFILER_FILENAME makefile variable to the .prof file to visualize. For example, to display only the timeline for the hardware module hw_const_sum(), type:

shls soc_profiler_view SOC_PROFILER_FILENAME=./hls_output/files/hls_hw_const_sum.prof

Note the hls_ prefix in the filename. The plot will look like this:

Important: Currently this feature to selectively show a single timeline is only available via the command-line.

In some cases, such as in automated regression tests, an interactive GUI is not required or not even desired. The profiler can be run in non-interactive mode like this:

shls soc_profiler_view SOC_PROFILER_INTERACTIVE=0

In this case, the summary file is still generated and printed to the terminal, but the GUI plot is not displayed. Instead, a PNG image of the plot is written to hls_output/reports/hls_soc_profiler.png. Note that this non-interactive option is only available from the command line.
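For an automated regression test, the non-interactive command can be wrapped in a short script. The Python sketch below is one possible approach, not part of the tool: it assumes shls is on the PATH, that it is run from the project directory, and that the command exits with a non-zero status on failure (an assumption not confirmed here). The names run_profiler and missing_outputs are hypothetical. It runs the profiler viewer and then reports which of the two documented output files are missing.

```python
import subprocess
from pathlib import Path

# Command and output locations as given in the text above.
PROFILER_COMMAND = ["shls", "soc_profiler_view", "SOC_PROFILER_INTERACTIVE=0"]
REPORT_DIR = Path("hls_output/reports")
EXPECTED_OUTPUTS = ("hls_soc_profiler.summary.rpt", "hls_soc_profiler.png")

def missing_outputs(project_dir="."):
    """Return the expected report files that do not exist under project_dir."""
    base = Path(project_dir) / REPORT_DIR
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]

def run_profiler(project_dir="."):
    """Run the viewer without the GUI and return any missing output files.

    check=True raises CalledProcessError on a non-zero exit status
    (assumed here to signal a failed run).
    """
    subprocess.run(PROFILER_COMMAND, cwd=project_dir, check=True)
    return missing_outputs(project_dir)
```

A test harness could call run_profiler() from the project directory and fail the test if the returned list is not empty.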

Important: At the moment, the SoC Profiler only measures the transfer time for arguments declared as AXI Target, with or without DMA. Arguments defined as AXI Initiator show only a small bar in the Write timeline, representing the CPU passing the pointer address from which the hardware module will initiate read or write requests. The AXI Initiator transfer time is implicitly counted as part of the hardware module runtime (i.e. the Acc timeline). In the example above, the hw_const_add() module has no Read time. That is because the CPU does not read the on-chip buffer: the hardware module itself transfers the data back to the CPU memory directly, so the time consumed is counted as part of the module execution runtime.