3.5.1.13 Data Flow Parallelism with SmartHLS Threads

Most of the time a dataflow design can be implemented with the 3.6.1.11 Dataflow pragma that requires minimal code change. However, for more complex parallelism, for example, with feedback/cycles between sub-functions, multi-threading APIs may be needed to explicitly describe the parallelism between the functions.

The concurrent execution of computational tasks can also be accurately described in software using hls::thread APIs. In addition, the continuous streams of data flowing through the tasks can be inferred using SmartHLS's built-in FIFO data structure (For more information, see 3.5.1.17.1 Streaming Library).

Important: Also see 3.5.1.17.7.1 Using DoubleBuffer and SharedBuffer in a multi-threaded dataflow design for another example that uses data buffer instead of FIFO to pass data between threads.

Let's take a look at the code snippet below, which is from the example project, "Fir Filter (Loop Pipelining with hls::thread)", included in the SmartHLS IDE. In the example, the main function contains the following code snippet:

// Create input and output FIFOs
hls::FIFO<int> input_fifo(/*depth*/ 2);
hls::FIFO<int> output_fifo(/*depth*/ 2);

// Launch thread kernels.
hls::thread<void> thread_var_fir(FIRFilterStreaming, &input_fifo, &output_fifo);
hls::thread<void> thread_var_injector(test_input_injector, &input_fifo);
hls::thread<void> thread_var_checker(test_output_checker, &output_fifo);

// Join threads.
thread_var_injector.join()
thread_var_checker.join();

The corresponding hardware is illustrated in the figure below.

The two hls::FIFO<int>s in the C++ code corresponds to the creation of the two FIFOs, where the bit-width is set according to the type shown in the constructor argument <int>. The three hls::thread<void> calls initiate and parallelize the executions of three computational tasks, where each task is passed in a FIFO (or a pointer to a struct containing more than one FIFO pointers) as its argument.

The FIFO connections and data flow directions are implied by the uses of FIFO read() and write() APIs. For example, the test_input_injector function has a write() call writing data into the input_fifo, and the FIRFilterStreaming function uses a read() call to read data out from the input_fifo. This means that the data flows through the input_fifo from test_input_injector to FIRFilterStreaming.

The join() API is called to wait for the completion of test_input_injector and test_output_checker. We do not "join" the FIRFilterStreaming thread since it contains an infinite loop (see code below) that is always active and processes incoming data from input_fifo whenever the FIFO is not empty. This closely matches the always running behaviour of streaming hardware, where hardware is constantly running and processing data..

Now let's take a look at the implementation of the main computational task (for example, FIRFilterStreaming threading function).

void FIRFilterStreaming(hls::FIFO<int> *input_fifo,
                         hls::FIFO<int> *output_fifo) {
     // This loop is pipelined and will be "always running", just like how a
     // streaming module always runs when new input is available.
     #pragma HLS loop pipeline
     while (1) {
         // Read from input FIFO.
         int in = input_fifo->read();

         printf("FIRFilterStreaming input: %d - %d\n", i, in);
         static int previous[TAPS] = {0}; // Need to store the last TAPS -1 samples.
         const int coefficients[TAPS] = {0, 1, 2,  3,  4,  5,  6,  7,
                                         8, 9, 10, 11, 12, 13, 14, 15};

         int j = 0, temp = 0;

         for (j = (TAPS - 1); j >= 1; j -= 1)
             previous[j] = previous[j - 1];
         previous[0] = in;

         for (j = 0; j < TAPS; j++)
             temp += previous[TAPS - j - 1] * coefficients[j];

         int output = (previous[TAPS - 1] == 0) ? 0 : temp;

         // Write to output FIFO.
         output_fifo->write(output);
    }
}
      

In the code shown in the example project, you will notice that all three threading functions contain a loop, which repeatedly reads and/or writes data from/to FIFOs to perform processing. In SmartHLS, this is how one can specify that functions are continuously processing data streams that are flowing through FIFOs.