3.5.1.10.1 Dataflow Example: Canny with FIFOs

To see this complete code example, please refer to the C++ Canny Edge Detection (SW/HW Co-Simulation) example included in the SmartHLS IDE.

An example of overlapping sequential sub-functions is an image processing pipeline, such as Canny edge detection. Canny edge detection runs 4 image processing algorithms sequentially.

void canny(unsigned char *input_frame,
           unsigned char *output_frame) {
#pragma HLS function dataflow

#pragma HLS dataflow_channel variable(output_gf) type(double_buffer)
    unsigned char output_gf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_sf) type(double_buffer)
    unsigned short output_sf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_nm) type(double_buffer)
    unsigned char output_nm [HEIGHT * WIDTH];

    gaussian_filter(input_frame, output_gf);
    sobel_filter(output_gf, output_sf);
    nonmaximum_suppression(output_sf, output_nm);
    hysteresis_filter(output_nm, output_frame);
}

These four sub-functions can overlap using dataflow parallelism to operate as a single pipeline. Each time the canny function is called, one new piece of data enters the pipeline. In the example above, the granularity of data is an entire image of HEIGHT * WIDTH pixels. After the fourth call, the pipeline is in steady state, and all 4 sub-functions execute in parallel on 4 separate images. The intermediate data arrays are double-buffered, as specified by the 3.6.1.12 Dataflow Channel pragma.
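The fill behaviour described above can be modelled in ordinary C++. The following is a minimal sketch, not SmartHLS code: the helpers frame_in_stage, in_steady_state, and print_pipeline_fill, as well as the 1-based call and 0-based stage numbering, are assumptions made for illustration. It reports which input frame each of the 4 sub-functions works on during a given call to canny.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative software model only; not part of the SmartHLS example.
constexpr int NUM_STAGES = 4; // gaussian, sobel, nonmaximum suppression, hysteresis

// Returns the 1-based index of the frame that stage `stage` (0-based)
// works on during the `call`-th invocation (1-based), or 0 if that stage
// is still idle because the pipeline has not filled up to it yet.
constexpr int frame_in_stage(int call, int stage) {
    return (call - stage >= 1) ? call - stage : 0;
}

// True once every stage has a frame to work on.
constexpr bool in_steady_state(int call) {
    return frame_in_stage(call, NUM_STAGES - 1) != 0;
}

// Prints the pipeline occupancy for the first `calls` invocations.
void print_pipeline_fill(int calls) {
    assert(calls >= 1);
    for (int call = 1; call <= calls; ++call) {
        std::printf("call %d:", call);
        for (int s = 0; s < NUM_STAGES; ++s)
            std::printf(" stage%d=frame%d", s, frame_in_stage(call, s));
        std::printf("%s\n", in_steady_state(call) ? "  (steady state)" : "");
    }
}
```

Calling print_pipeline_fill(5) lists one row per call. In this model the last stage first receives a frame on the fourth call, which matches the steady-state point described in the text.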

Because the Canny pipeline processes the image data in sequential order, however, the granularity of data entering the pipeline on each function call does not need to be an entire image. Instead, it can be a single pixel. The intermediate channels can then be FIFOs.

void canny_fifo(hls::FIFO<unsigned char> &input_fifo,
                hls::FIFO<unsigned char> &output_fifo) {
#pragma HLS function top
#pragma HLS function dataflow

    hls::FIFO<unsigned char> output_fifo_gf(/* depth = */ 2);
    hls::FIFO<unsigned short> output_fifo_sf(/* depth = */ 2);
    hls::FIFO<unsigned char> output_fifo_nm(/* depth = */ 2);

    gaussian_filter(input_fifo, output_fifo_gf);
    sobel_filter(output_fifo_gf, output_fifo_sf);
    nonmaximum_suppression(output_fifo_sf, output_fifo_nm);
    hysteresis_filter(output_fifo_nm, output_fifo);
}

In this case, each FIFO only needs a depth of 2 elements to accommodate the dataflow. Using FIFOs rather than double or shared buffers significantly reduces resource usage, because no channel has to hold a full HEIGHT * WIDTH frame, and improves performance because the pipeline reaches steady state sooner.
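The storage difference between the two channel types can be put into numbers. The sketch below compares raw storage capacity only. The 512 x 512 frame size is an assumption for illustration (the real HEIGHT and WIDTH come from the example's headers), the helper names are hypothetical, and actual FPGA resource usage depends on how SmartHLS maps each channel to memory, which is not modelled here.

```cpp
#include <cassert>
#include <cstddef>

// Assumed frame size for illustration only.
constexpr std::size_t HEIGHT = 512;
constexpr std::size_t WIDTH = 512;

// Combined element size of the three intermediate channels:
// output_gf (unsigned char), output_sf (unsigned short), output_nm (unsigned char).
constexpr std::size_t ELEM_BYTES =
    sizeof(unsigned char) + sizeof(unsigned short) + sizeof(unsigned char);

// A double buffer keeps two full copies of each intermediate frame.
constexpr std::size_t double_buffer_bytes() {
    return 2 * HEIGHT * WIDTH * ELEM_BYTES;
}

// A FIFO of the given depth keeps only `depth` elements per channel.
constexpr std::size_t fifo_bytes(std::size_t depth) {
    return depth * ELEM_BYTES;
}

// How many times more storage the double-buffered channels hold.
std::size_t storage_ratio(std::size_t depth) {
    assert(depth > 0);
    return double_buffer_bytes() / fifo_bytes(depth);
}
```

With depth 2, the ratio equals HEIGHT * WIDTH, so under these assumptions the double-buffered version holds 262144 times more intermediate data than the FIFO version.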

The canny_fifo function above declares its intermediate channels with the hls::FIFO library. This final version is included as a complete code example. Please refer to the C++ Canny Edge Detection (SW/HW Co-Simulation) example in the SmartHLS IDE.

The top-level function is marked with #pragma HLS function top. It calls the four sub-functions, gaussian_filter, sobel_filter, nonmaximum_suppression, and hysteresis_filter, each of which is specified to be function pipelined with #pragma HLS function pipeline. The top-level arguments are input_fifo and output_fifo. The input_fifo is passed to the first sub-function, gaussian_filter, and supplies the inputs to the overall circuit. The output_fifo is passed to the last sub-function, hysteresis_filter, and receives the outputs of the overall circuit. The intermediate FIFOs, output_fifo_gf, output_fifo_sf, and output_fifo_nm, are passed as arguments to adjacent sub-functions and thus connect them (e.g., the outputs of gaussian_filter are the inputs to sobel_filter).
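The wiring described above can be tried out in software without SmartHLS. The sketch below is a model under stated assumptions: SimFIFO is a hypothetical stand-in for hls::FIFO that mimics only read(), write(), and empty(), and the *_stub functions are placeholders that pass one element through per call instead of performing the real filtering.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical stand-in for hls::FIFO; the depth is recorded but not enforced.
template <typename T>
class SimFIFO {
  public:
    explicit SimFIFO(std::size_t depth = 2) : depth_(depth) {}
    void write(const T &value) { queue_.push(value); }
    T read() {
        assert(!queue_.empty()); // a hardware FIFO would stall here instead
        T value = queue_.front();
        queue_.pop();
        return value;
    }
    bool empty() const { return queue_.empty(); }
    std::size_t depth() const { return depth_; }

  private:
    std::size_t depth_;
    std::queue<T> queue_;
};

// Placeholder stages: each call consumes at most one element and produces one.
// In SmartHLS each stage would carry "#pragma HLS function pipeline".
void gaussian_stub(SimFIFO<unsigned char> &in, SimFIFO<unsigned char> &out) {
    if (!in.empty()) out.write(in.read());
}
void sobel_stub(SimFIFO<unsigned char> &in, SimFIFO<unsigned short> &out) {
    if (!in.empty()) out.write(static_cast<unsigned short>(in.read()));
}
void nonmax_stub(SimFIFO<unsigned short> &in, SimFIFO<unsigned char> &out) {
    if (in.empty()) return;
    unsigned short v = in.read();
    out.write(static_cast<unsigned char>(v > 255 ? 255 : v)); // clamp to 8 bits
}
void hysteresis_stub(SimFIFO<unsigned char> &in, SimFIFO<unsigned char> &out) {
    if (!in.empty()) out.write(in.read());
}

// Same call structure as canny_fifo. In this software model each call moves
// one element through all four stages, so the local FIFOs end up empty.
void canny_fifo_model(SimFIFO<unsigned char> &input_fifo,
                      SimFIFO<unsigned char> &output_fifo) {
    SimFIFO<unsigned char> output_fifo_gf(/* depth = */ 2);
    SimFIFO<unsigned short> output_fifo_sf(/* depth = */ 2);
    SimFIFO<unsigned char> output_fifo_nm(/* depth = */ 2);

    gaussian_stub(input_fifo, output_fifo_gf);
    sobel_stub(output_fifo_gf, output_fifo_sf);
    nonmax_stub(output_fifo_sf, output_fifo_nm);
    hysteresis_stub(output_fifo_nm, output_fifo);
}
```

Writing a few pixels into input_fifo and calling canny_fifo_model once per pixel yields the same values, in the same order, on output_fifo. This confirms the channel connections and element types, not the image processing itself.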

When synthesizing a function with multiple pipelined sub-functions, specifying #pragma HLS function dataflow causes SmartHLS to parallelize the execution of all sub-functions, forming a streaming circuit with dataflow parallelism. In this case, gaussian_filter executes as soon as there is data in input_fifo, and sobel_filter starts running as soon as there is data in output_fifo_gf. In other words, a sub-function does not wait for the previous sub-function to finish completely before it starts to execute; it starts running as early as possible. Each sub-function also starts working on the next data while the previous data is still being processed (in a pipelined fashion). If the initiation interval (II) is 1, a sub-function starts processing new data every clock cycle. Once the sub-functions reach steady state, all of them execute concurrently. This example showcases the synthesis of a streaming circuit that consists of a succession of concurrently executing dataflow sub-functions.
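The effect of the initiation interval on throughput can be estimated with the usual pipeline timing relation: the first result appears after the pipeline latency, and each further result follows II cycles later. The sketch below assumes a fixed end-to-end latency and no FIFO stalls. The function name and any numbers passed to it are illustrative, not measurements of the Canny design.

```cpp
#include <cassert>
#include <cstdint>

// Estimated cycles for a pipeline to produce n outputs, assuming the first
// output appears after `latency` cycles and every later output follows
// `ii` cycles after the previous one (no stalls modelled).
std::uint64_t pipeline_cycles(std::uint64_t latency, std::uint64_t ii,
                              std::uint64_t n) {
    assert(ii >= 1);
    if (n == 0) return 0;
    return latency + (n - 1) * ii;
}
```

For example, with an assumed latency of 10 cycles, 100 inputs take 109 cycles at II = 1 and 208 cycles at II = 2, so for long streams the total time is dominated by II rather than by latency.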