3.5.1.14 Supported HLS Thread APIs

You can use the HLS thread library by including its header file:

#include "hls/thread.hpp" 

The thread library is provided as a C++ template class. The template argument of an hls::thread<T> object specifies the return type T of the threaded function. For example, hls::thread<int> is a thread that can invoke a function returning int, and hls::thread<void> is a thread that can invoke a function returning void.

To start executing a function in parallel, pass the function and its call arguments to the constructor of a new thread instance:

// f1 is a function that we would like to execute concurrently.
void f1(int a);

// Create a new thread 't1' with the function 'f1' and argument 'm'.
// - <void> corresponds to the return type of 'f1'.
// - Argument 'm' corresponds to the parameter 'a' of 'f1'.
// - In software, this line creates a parallel thread to run the f1 function.
// - In hardware, this line means a dedicated hardware module for f1 should
// be created for this specific thread call, and the dedicated hardware
// module will start the execution right here.
hls::thread<void> t1(f1, m);


// Another way to create a parallel thread:
int f2();                   // f2 takes no arguments and returns int.
hls::thread<int> t2;        // Create the thread instance 't2' first.
t2 = hls::thread<int>(f2);  // Later, assign 't2' a thread that runs 'f2'.

The code below shows how to join a thread (that is, wait for the thread to complete) and optionally retrieve a non-void return value. Note that joining a thread blocks execution until the threaded function finishes.

hls::thread<void> t1(f1, m);
t1.join();  // The program will block here until thread 't1' finishes running 'f1'.

hls::thread<int> t2 = hls::thread<int>(f2);
int ret = t2.join();  // The program waits for 't2' to finish and retrieves the return value.

If you have used std::thread, you may know that passing an argument by reference requires a std::ref wrapper around the argument. Similarly, hls::ref wraps an argument that is passed by reference when an hls::thread is created:

int f(int &a);

int x;
hls::thread<int> t = hls::thread<int>(f, hls::ref(x));
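
For comparison, the std::thread behavior mentioned above can be sketched with the C++ standard library alone. This is only an illustration and not SmartHLS code: increment and run_by_reference are hypothetical names, and the sketch assumes that hls::ref plays the same role for hls::thread that std::ref plays here.

```cpp
#include <functional>  // std::ref
#include <thread>

// Hypothetical worker used only for this illustration.
void increment(int &counter) { counter += 1; }

// Runs 'increment' on a std::thread and returns the updated value.
int run_by_reference(int start) {
    int x = start;
    // std::thread copies its arguments by default, so the reference
    // must be wrapped in std::ref to reach the caller's variable.
    std::thread t(increment, std::ref(x));
    t.join();  // Wait for the thread before reading 'x'.
    return x;
}
```

Calling run_by_reference(41) returns 42, because the thread modifies the caller's variable through the wrapped reference.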

Important: SmartHLS threads differ from std::thread in a few respects:
  • SmartHLS threads support retrieving the return value of the threaded function directly from join (the standard threading library supports this only through std::future).
  • SmartHLS threads use a template argument to specify the return type of the threaded function.
  • SmartHLS threads are auto-detaching: if the function that created a thread returns without calling join, the thread is detached when the thread object is destructed, and the threaded function can continue executing.
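
To make the first difference concrete, the sketch below shows the standard-library route to a thread's return value, using std::async and std::future. It is plain C++ rather than SmartHLS code, and compute and run_with_future are hypothetical names chosen for this illustration.

```cpp
#include <future>

// Hypothetical worker used only for this illustration.
int compute() { return 6 * 7; }

// With the standard library, the return value is retrieved through a
// std::future (obtained here from std::async), because std::thread::join()
// returns void.
int run_with_future() {
    std::future<int> result = std::async(std::launch::async, compute);
    return result.get();  // Blocks until compute() finishes, then yields its value.
}
```

With hls::thread<int>, the equivalent value comes straight from join(), as in the t2.join() example earlier in this section.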

The SmartHLS thread library also supports mutexes and barriers as synchronization primitives.

A mutex protects shared data from being accessed simultaneously by multiple threads. hls::mutex provides lock() and unlock() methods.

A barrier is a thread-coordination mechanism that blocks each arriving thread until the expected number of threads have arrived at the barrier. hls::barrier provides init() and wait() methods.

The following example illustrates the use of hls::mutex and hls::barrier:

#define ARRAY_SIZE 20

#include <hls/thread.hpp>
#include <stdio.h>

volatile int input[ARRAY_SIZE] = {1,  2,  3,  4,  5,  6,  7,  8,  9,  10,
                                  11, 12, 13, 14, 15, 16, 17, 18, 19, 20};

hls::mutex mutex;
hls::barrier barr;

int add(int &final_result, int thread_no) {
    int result = 0;
    for (int i = 0; i < ARRAY_SIZE; i++)
        result += input[i];

    // Use mutex so that only 1 thread can write at any time
    mutex.lock();
    final_result += result;
    mutex.unlock();

    // Wait for all threads to reach this point
    barr.wait();
    // Print the result after all threads update final_result
    printf("thread %d: final_result = %d\n", thread_no, final_result);

    return result;
}

int main() {
#pragma HLS function top
    // Initialize the barrier.
    barr.init(2);

    // Start the threads.
    int final_result = 0;
    hls::thread<int> thread1(add, hls::ref<int>(final_result), /*thread_no*/ 1);
    hls::thread<int> thread2(add, hls::ref<int>(final_result), /*thread_no*/ 2);

    // Join the threads.
    int result[2] = {0, 0};
    result[0] = thread1.join();
    result[1] = thread2.join();

    // Check result.
    int result_matches = 0;
    for (int i = 0; i < 2; i++) {
        printf("result[%d] = %d\n", i, result[i]);
        result_matches += (result[i] == 210);
    }
    // Check final_result is correct
    result_matches += (result[0] + result[1]) == final_result;

    printf("MATCHES: %d\n", result_matches);
    if (result_matches == 3) {
        printf("PASS\n");
        return 0;
    }

    printf("FAIL\n");
    return 1;
}