3.5.1.14 Supported HLS Thread APIs
You can use the HLS thread library by including the header file:
#include "hls/thread.hpp"
The thread library is provided as a C++ template class. The template argument of an hls::thread<T> object specifies the return type T of the threaded function. For example, hls::thread<int> is a thread that can invoke a function with an int return type, and hls::thread<void> is a thread that can invoke a function that returns void.
To start the parallel execution of a function, pass the function and its call arguments to the constructor of a new thread instance:
// f1 is a function that we would like to execute concurrently.
void f1(int a);

// Create a new thread 't1' with the function 'f1' and argument 'm'.
// - <void> corresponds to the return type of 'f1'.
// - Argument 'm' corresponds to the parameter 'a' of 'f1'.
// - In software, this line creates a parallel thread to run the f1 function.
// - In hardware, this line means a dedicated hardware module for f1 should
//   be created for this specific thread call, and the dedicated hardware
//   module will start the execution right here.
hls::thread<void> t1(f1, m);

// Another way to create a parallel thread:
int f2();                  // f2 has no argument and the return type is <int>.
hls::thread<int> t2;       // Create a thread 't2' instance first.
t2 = hls::thread<int>(f2); // Assign 't2' later with the function and arguments.
The code below shows how to join a thread (that is, wait for the thread to complete) and optionally retrieve a non-void return value. Note that joining a thread blocks execution until the threaded function finishes.
hls::thread<void> t1(f1, m);
t1.join(); // The program will block here until thread 't1' finishes running 'f1'.

hls::thread<int> t2 = hls::thread<int>(f2);
int ret = t2.join(); // The program will wait for t2 to finish and retrieve the return value.
If you have used std::thread, you may know that passing an argument by reference requires a std::ref wrapper around the argument. Similarly, hls::ref is used to wrap an argument passed by reference when the hls::thread is created:
int f(int &a);
int x;
hls::thread<int> t = hls::thread<int>(f, hls::ref(x));
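For comparison, the std::thread behavior mentioned above can be sketched in standard C++ as follows. This is not SmartHLS code: it uses only the standard library, and the names add_ten and run_with_std_ref are hypothetical, chosen for illustration.

```cpp
#include <functional>  // std::ref
#include <thread>      // std::thread

// Hypothetical worker: modifies the caller's variable through a reference.
void add_ten(int &a) { a += 10; }

// Runs add_ten on a std::thread and returns the updated value.
int run_with_std_ref(int start) {
    int x = start;
    // Without std::ref this line does not compile: std::thread copies its
    // arguments, and a copy cannot bind to the non-const reference 'int &a'.
    std::thread t(add_ten, std::ref(x));
    t.join();  // After join(), the write to x is visible to this thread.
    return x;
}
```

Calling run_with_std_ref(5) returns 15, because the thread updates x itself rather than a copy. hls::ref plays the same role in the hls::thread constructor call shown above.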
hls::thread differs from std::thread in a few aspects:
- SmartHLS threads support retrieving the return value from the threaded function (in the standard threading library, this functionality is only supported using std::future).
- SmartHLS threads use templates to specify the return type of the threaded function.
- SmartHLS threads are auto-detaching: if the function where the thread is created exits without calling join, the thread is detached when it is destructed, and the threaded function can continue executing.
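To make the comparison with the standard library concrete, the sketch below shows one common way to retrieve a return value from a std::thread using std::packaged_task and std::future. It is standard C++ only, not SmartHLS code, and compute and result_via_future are hypothetical names.

```cpp
#include <future>   // std::packaged_task, std::future
#include <thread>   // std::thread
#include <utility>  // std::move

// Hypothetical threaded function with a non-void return type.
int compute() { return 42; }

// Runs compute() on a std::thread and returns its result.
int result_via_future() {
    // std::thread discards the callable's return value, so the result
    // has to travel through a future instead.
    std::packaged_task<int()> task(compute);
    std::future<int> fut = task.get_future();
    std::thread t(std::move(task));
    int value = fut.get();  // Blocks until compute() has produced its result.
    // Unlike an auto-detaching thread, a std::thread must be joined or
    // detached explicitly; destroying a joinable one calls std::terminate.
    t.join();
    return value;
}
```

With hls::thread<int>, the same result is obtained directly from join(), and no explicit detach is needed if join() is never called.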
The SmartHLS thread library also supports mutex and barrier as synchronization primitives.
A mutex can be used to protect shared data from being simultaneously accessed by multiple threads. hls::mutex has lock() and unlock() methods.
A barrier provides a thread-coordination mechanism that allows up to an expected number of threads to block until that number of threads has arrived at the barrier. hls::barrier has init() and wait() methods.
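As a rough software model of the barrier behavior described above, the following standard C++ sketch blocks each caller of wait() until the expected number of threads has arrived. It is an assumption-laden illustration, not the hls::barrier implementation: SimpleBarrier and count_threads_seeing_total are hypothetical names, the barrier is single-use, and only the init()/wait() method names are taken from the text.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier model (not the SmartHLS implementation).
class SimpleBarrier {
public:
    // Sets the number of threads expected to arrive.
    void init(int count) {
        std::lock_guard<std::mutex> guard(m_);
        expected_ = count;
        arrived_ = 0;
    }
    // Blocks until 'expected_' threads have called wait().
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        if (++arrived_ >= expected_) {
            cv_.notify_all();  // Last arrival releases the waiting threads.
        } else {
            cv_.wait(lock, [this] { return arrived_ >= expected_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int expected_ = 0;
    int arrived_ = 0;
};

// Each thread adds 1 to a shared total before the barrier and reads the
// total after it. Returns how many threads saw the complete total.
int count_threads_seeing_total(int n) {
    SimpleBarrier barrier;
    barrier.init(n);
    std::mutex m;  // Protects total and seen_full.
    int total = 0;
    int seen_full = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            {
                std::lock_guard<std::mutex> guard(m);
                ++total;
            }
            barrier.wait();  // No thread passes until all n have updated total.
            std::lock_guard<std::mutex> guard(m);
            if (total == n) ++seen_full;
        });
    }
    for (auto &t : threads) t.join();
    return seen_full;
}
```

Because every thread updates the total before arriving at the barrier, each of the n threads observes the complete total afterwards, so count_threads_seeing_total(4) returns 4. Without the barrier, a thread could read the total before the others had updated it.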
The following example illustrates the use of hls::mutex and hls::barrier:
#define ARRAY_SIZE 20

#include <hls/thread.hpp>
#include <stdio.h>

volatile int input[ARRAY_SIZE] = {1,  2,  3,  4,  5,  6,  7,  8,  9,  10,
                                  11, 12, 13, 14, 15, 16, 17, 18, 19, 20};

hls::mutex mutex;
hls::barrier barr;

int add(int &final_result, int thread_no) {
    int result = 0;
    for (int i = 0; i < ARRAY_SIZE; i++)
        result += input[i];

    // Use mutex so that only 1 thread can write at any time
    mutex.lock();
    final_result += result;
    mutex.unlock();

    // Wait for all threads to reach this point
    barr.wait();

    // Print the result after all threads update final_result
    printf("thread %d: final_result = %d\n", thread_no, final_result);
    return result;
}

int main() {
#pragma HLS function top
    // Initialize the barrier.
    barr.init(2);

    // Start the threads.
    int final_result = 0;
    hls::thread<int> thread1(add, hls::ref<int>(final_result), /*thread_no*/ 1);
    hls::thread<int> thread2(add, hls::ref<int>(final_result), /*thread_no*/ 2);

    // Join the threads.
    int result[2] = {0, 0};
    result[0] = thread1.join();
    result[1] = thread2.join();

    // Check result.
    int result_matches = 0;
    for (int i = 0; i < 2; i++) {
        printf("result[%d] = %d\n", i, result[i]);
        result_matches += (result[i] == 210);
    }

    // Check final_result is correct
    result_matches += (result[0] + result[1]) == final_result;

    printf("MATCHES: %d\n", result_matches);
    if (result_matches == 3) {
        printf("PASS\n");
        return 0;
    }
    printf("FAIL\n");
    return 1;
}