3.5.1.15.1 Further Throughput Enhancement with Loop Pipelining
In this example, the throughput of the streaming circuit is limited by how frequently the functions can start processing new data, that is, how frequently new loop iterations can start. For instance, if the slowest of the three functions can only start a new loop iteration every 4 cycles, the entire streaming circuit is limited to processing one piece of data every 4 cycles. We can therefore further improve the circuit throughput by pipelining the loops in the three functions. If you run SmartHLS™ synthesis for the example (Compile Software to Hardware), you should see in the Pipeline Result section of the report file, summary.hls.<top_level>.rpt, that all loops can be pipelined with an initiation interval of 1. This means every function can start a new iteration every clock cycle, and hence the entire streaming circuit can process one piece of data every clock cycle. Now run the simulation (Simulate Hardware) to confirm the expected throughput. The reported cycle latency should be only slightly more than the number of data samples to be processed (INPUTSIZE is set to 128; the extra cycles are spent on activating the parallel accelerators, flushing out the pipelines, and verifying the results).
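The relationship between initiation interval and cycle count described above can be sketched with a small back-of-the-envelope model in plain C++. This is only an idealized estimate, not SmartHLS code and not how the tool computes its report: the helper names effective_interval and estimated_cycles are hypothetical, and the pipeline depth passed in is an assumed value that stands in for the fill and flush overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: a chain of streaming functions can accept new data
// only as often as its slowest stage, so the effective interval is the
// largest initiation interval (II) among the stages.
inline int effective_interval(const std::vector<int>& stage_iis) {
    assert(!stage_iis.empty());
    return *std::max_element(stage_iis.begin(), stage_iis.end());
}

// Idealized estimate for a pipelined loop: the first result appears after
// `depth` cycles (assumed pipeline depth), and each further sample adds
// `ii` cycles. Real reports also include start-up and verification cycles.
inline long estimated_cycles(long num_samples, int ii, int depth) {
    assert(num_samples >= 1 && ii >= 1 && depth >= 1);
    return depth + static_cast<long>(ii) * (num_samples - 1);
}
```

Under this model, three stages with IIs of 1, 2, and 4 give an effective interval of 4, while an II of 1 across all stages with 128 samples and an assumed depth of 5 gives 132 cycles, which is in line with the "slightly more than the number of data samples" expectation above.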