3.5.1.15 Data Flow Parallelism with SmartHLS Threads
Most of the time, a dataflow design can be implemented with the Dataflow pragma, which requires minimal code changes. However, more complex parallelism, for example feedback or cycles between sub-functions, may require the multi-threading APIs to describe the parallelism between the functions explicitly.
The concurrent execution of computational tasks can also be accurately described in software using the hls::thread APIs. In addition, the continuous streams of data flowing through the tasks can be inferred using SmartHLS's built-in FIFO data structure (for more information, see Streaming Library).
See DoubleBuffer and SharedBuffer in a multi-threaded dataflow design for another example that uses data buffers instead of FIFOs to pass data between threads.
Let's take a look at the code snippet below, taken from the example project "Fir Filter (Loop Pipelining with hls::thread)" included in the SmartHLS IDE. In the example, the main function contains the following code:
// Create input and output FIFOs.
hls::FIFO<int> input_fifo(/*depth*/ 2);
hls::FIFO<int> output_fifo(/*depth*/ 2);

// Launch thread kernels.
hls::thread<void> thread_var_fir(FIRFilterStreaming, &input_fifo, &output_fifo);
hls::thread<void> thread_var_injector(test_input_injector, &input_fifo);
hls::thread<void> thread_var_checker(test_output_checker, &output_fifo);

// Join threads.
thread_var_injector.join();
thread_var_checker.join();
The corresponding hardware is illustrated in the figure below.
The two hls::FIFO<int> declarations in the C++ code correspond to the creation of the two FIFOs, where the bit-width is set according to the type given in the template argument <int>. The three hls::thread<void> declarations start three computational tasks and run them in parallel. Each task receives a pointer to a FIFO (or a pointer to a struct containing multiple FIFO pointers) as its argument.
The FIFO connections and data flow directions are implied by the uses of the FIFO read() and write() APIs. For example, the test_input_injector function has a write() call writing data into the input_fifo, and the FIRFilterStreaming function uses a read() call to read data out from the input_fifo. This means that the data flows through the input_fifo from test_input_injector to FIRFilterStreaming.
The join() API is called to wait for the completion of test_input_injector and test_output_checker. We do not "join" the FIRFilterStreaming thread since it contains an infinite loop (see code below) that is always active and processes incoming data from input_fifo whenever the FIFO is not empty. This closely matches the always-running behaviour of streaming hardware, which constantly runs and processes data as it arrives.
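The thread structure described above can be mimicked in plain C++ for experimentation on a desktop compiler. The following is only a sketch under stated assumptions: BlockingFifo is a hypothetical stand-in for hls::FIFO (not the SmartHLS implementation), std::thread replaces hls::thread, and doubling_stage, injector, checker, and run_pipeline are illustrative names that do not come from the example project. Because standard C++ requires a std::thread to be joined or detached before it is destroyed, the always-running stage is detached here, which only approximates the "never joined" thread in the SmartHLS code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for hls::FIFO<T>: a bounded queue whose read()
// blocks while empty and whose write() blocks while full.
template <typename T>
class BlockingFifo {
public:
    explicit BlockingFifo(std::size_t depth) : depth_(depth) {}

    void write(const T &value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < depth_; });
        queue_.push(value);
        not_empty_.notify_one();
    }

    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.front();
        queue_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t depth_;
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};

// Shared ownership keeps the FIFOs alive for the detached stage.
using IntFifo = std::shared_ptr<BlockingFifo<int>>;

// Always-running stage (illustrative): doubles each sample, never returns.
void doubling_stage(IntFifo in, IntFifo out) {
    while (true) out->write(2 * in->read());
}

// Finite producer: writes 0 .. count-1 into the input FIFO.
void injector(IntFifo in, int count) {
    for (int i = 0; i < count; i++) in->write(i);
}

// Finite consumer: collects count results from the output FIFO.
void checker(IntFifo out, int count, std::vector<int> *results) {
    for (int i = 0; i < count; i++) results->push_back(out->read());
}

std::vector<int> run_pipeline(int count) {
    auto input_fifo = std::make_shared<BlockingFifo<int>>(2);
    auto output_fifo = std::make_shared<BlockingFifo<int>>(2);
    std::vector<int> results;

    std::thread stage(doubling_stage, input_fifo, output_fifo);
    stage.detach();  // never joined: it blocks on read() once input stops
    std::thread inj(injector, input_fifo, count);
    std::thread chk(checker, output_fifo, count, &results);

    // Only the two finite threads are joined.
    inj.join();
    chk.join();
    return results;
}
```

Calling run_pipeline(5) returns the five doubled samples in order, since each FIFO has a single writer and a single reader. The data flow direction is again determined only by which function calls write() and which calls read() on a given FIFO.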
Now let's take a look at the implementation of the main computational task, the FIRFilterStreaming thread function.
void FIRFilterStreaming(hls::FIFO<int> *input_fifo,
                        hls::FIFO<int> *output_fifo) {
    // This loop is pipelined and will be "always running", just like how a
    // streaming module always runs when new input is available.
    #pragma HLS loop pipeline
    while (1) {
        // Read from input FIFO.
        int in = input_fifo->read();
        printf("FIRFilterStreaming input: %d\n", in);

        // Need to store the last TAPS - 1 samples.
        static int previous[TAPS] = {0};
        const int coefficients[TAPS] = {0, 1, 2,  3,  4,  5,  6,  7,
                                        8, 9, 10, 11, 12, 13, 14, 15};

        int j = 0, temp = 0;

        // Shift the stored samples by one and insert the new sample.
        for (j = (TAPS - 1); j >= 1; j -= 1)
            previous[j] = previous[j - 1];
        previous[0] = in;

        // Multiply-accumulate over all taps.
        for (j = 0; j < TAPS; j++)
            temp += previous[TAPS - j - 1] * coefficients[j];

        // Output 0 while the oldest stored sample is still 0.
        int output = (previous[TAPS - 1] == 0) ? 0 : temp;

        // Write to output FIFO.
        output_fifo->write(output);
    }
}
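To check the arithmetic of the loop body without SmartHLS, one iteration can be modelled as an ordinary function. This is a minimal sketch, assuming TAPS is 16 (matching the 16 coefficients listed above); FirState, fir_step, and fir_run are hypothetical names introduced for illustration, and the static array of the original is replaced by an explicit state struct so that the model can be reset between runs.

```cpp
#include <cassert>
#include <vector>

// Assumption: TAPS is 16, matching the 16 coefficients in the listing.
constexpr int TAPS = 16;

// Explicit replacement for the static previous[] array of the listing.
struct FirState {
    int previous[TAPS] = {0};
};

// Software model of one iteration of the FIRFilterStreaming loop body.
int fir_step(FirState &state, int in) {
    static const int coefficients[TAPS] = {0, 1, 2,  3,  4,  5,  6,  7,
                                           8, 9, 10, 11, 12, 13, 14, 15};

    // Shift the stored samples by one and insert the new sample.
    for (int j = TAPS - 1; j >= 1; j--)
        state.previous[j] = state.previous[j - 1];
    state.previous[0] = in;

    // Multiply-accumulate: the newest sample pairs with coefficients[TAPS - 1].
    int temp = 0;
    for (int j = 0; j < TAPS; j++)
        temp += state.previous[TAPS - j - 1] * coefficients[j];

    // Output 0 while the oldest stored sample is still 0.
    return (state.previous[TAPS - 1] == 0) ? 0 : temp;
}

// Feed a whole sequence through the model and collect the outputs.
std::vector<int> fir_run(const std::vector<int> &samples) {
    FirState state;
    std::vector<int> outputs;
    for (int sample : samples)
        outputs.push_back(fir_step(state, sample));
    return outputs;
}
```

With an input of all ones, the first TAPS - 1 outputs are 0 because previous[TAPS - 1] has not been filled yet. From the sixteenth sample onward the output is the sum of the coefficients, 0 + 1 + ... + 15 = 120.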
In the code of the example project, you will notice that all three thread functions contain a loop that repeatedly reads data from and/or writes data to FIFOs. In SmartHLS, this is how one specifies that functions continuously process the data streams flowing through the FIFOs.