9.6 Thread-local Storage

The MPLAB XC32 compiler implements a thread-local storage (TLS) memory management feature that uses memory local to each thread to allow unique storage of objects that otherwise appear to have global scope. Using such storage can reduce the risk of race conditions when accessing shared data, reduce complexity in thread synchronization, and eliminate the need for locks around the data held in TLS, thereby improving runtime performance in multithreaded applications.

Objects qualified with __thread (see Thread Qualifier) are known as thread-safe objects and will be allocated to dedicated sections so that they can be linked separately from other objects. Initialized objects are allocated to the .tdata section; uninitialized objects to the .tbss section. The compiler will generate calls to a special routine to access the memory associated with thread-safe objects, based on which program thread is currently active.
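As a minimal sketch of the paragraph above, the following C fragment declares one initialized and one uninitialized object with the __thread qualifier. The object names and the two helper functions (tls_next_id and tls_scratch_put) are hypothetical and exist only for illustration; the section placement noted in the comments is what the text above describes, not something the code itself enforces.

```c
#include <assert.h>
#include <stddef.h>

/* Initialized thread-local object: expected to be placed in .tdata. */
static __thread unsigned int next_id = 100;

/* Uninitialized thread-local object: expected to be placed in .tbss,
   so each thread starts with its own zeroed buffer. */
static __thread unsigned char scratch[16];

/* Hypothetical helper: returns a new ID from the calling thread's own
   counter. No lock is taken because no other thread shares this copy. */
unsigned int tls_next_id(void)
{
    return ++next_id;
}

/* Hypothetical helper: stores a byte in the calling thread's scratch
   buffer and returns the value that was there before. */
unsigned char tls_scratch_put(size_t index, unsigned char value)
{
    assert(index < sizeof scratch);
    unsigned char previous = scratch[index];
    scratch[index] = value;
    return previous;
}
```

Each thread that calls these helpers works on its own copy of next_id and scratch, so the calls need no synchronization between threads.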

The Arm ABI Addendum documentation specifies how thread-safe variables are managed and accessed. These variables are stored in a special section of memory that is unique to each thread, and they are accessed via a thread pointer (TP) variable, which points to the base of the TLS area for the current thread.

The TLS sections can be allocated via one of two methods:
  • The Best Fit Allocator (BFA)
  • A customized linker script
However, regardless of how the TLS is allocated, any application using threads should use a library with TLS support, such as the picolibc library.

When TLS sections are handled by the BFA, the allocator collates .tdata and .tbss sections not allocated by a linker script, then concatenates them in the output, where they will be allocated space in program flash memory. Unless prevented by use of the -Wl,--no-tls-first-copy linker option, memory is allocated for the TLS block in RAM and the runtime startup code copies the .tdata sections to this memory and clears the .tbss section for the initial thread executed after Reset. The --tls-first-copy option makes this action explicit. The TLS block initialization code uses a dedicated xc32_init_tls() routine to perform the initialization. This routine is provided as a stub if the --no-tls-first-copy option has been used.

When the picolibc library has been specified, the linker defines symbols to allow the initialization routine to determine the address ranges that need to be initialized. The linker symbols are listed in the following table.
Table 9-1. Linker-defined symbols associated with thread-local storage
Symbol                    Represents
__tdata_source            The start of the .tdata section in flash
__tdata_start             The start of the .tdata section in RAM if a copy is created in RAM, otherwise the same value as __tdata_source
__tdata_size              The size of .tdata
__tdata_end               The end of .tdata
__tbss_start              The start of .tbss
__tbss_size               The size of .tbss
__tbss_end                The end of .tbss
__tbss_offset             Equivalent to __tbss_start - __tdata_start
__tls_align               Equivalent to max(alignof(.tdata), alignof(.tbss))
__arm32_tls_tcb_offset    Equivalent to max(8, __tls_align)
__tls_base                Equivalent to __tdata_start
__tls_end                 Equivalent to __tbss_end
__tls_size                Equivalent to __tbss_offset + __tbss_size
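
The relationships between the derived symbols in the table can be checked with a small host-side calculation. The sketch below is illustrative only: the tls_inputs and tls_derived types and the tls_derive function are hypothetical names, and the real values are produced by the linker, not by C code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs mirroring the primary symbols in Table 9-1. */
typedef struct {
    uintptr_t tdata_start;  /* __tdata_start */
    size_t    tdata_align;  /* alignof(.tdata) */
    uintptr_t tbss_start;   /* __tbss_start */
    size_t    tbss_size;    /* __tbss_size */
    size_t    tbss_align;   /* alignof(.tbss) */
} tls_inputs;

/* Hypothetical outputs mirroring the derived symbols in Table 9-1. */
typedef struct {
    size_t    tbss_offset;  /* __tbss_offset */
    size_t    tls_align;    /* __tls_align */
    size_t    tcb_offset;   /* __arm32_tls_tcb_offset */
    uintptr_t tls_base;     /* __tls_base */
    uintptr_t tls_end;      /* __tls_end */
    size_t    tls_size;     /* __tls_size */
} tls_derived;

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Applies the "Equivalent to" formulas from the table. */
tls_derived tls_derive(tls_inputs in)
{
    assert(in.tbss_start >= in.tdata_start);

    tls_derived d;
    d.tbss_offset = (size_t)(in.tbss_start - in.tdata_start);
    d.tls_align   = max_size(in.tdata_align, in.tbss_align);
    d.tcb_offset  = max_size(8, d.tls_align);
    d.tls_base    = in.tdata_start;
    d.tls_end     = in.tbss_start + in.tbss_size;  /* same as __tbss_end */
    d.tls_size    = d.tbss_offset + in.tbss_size;
    return d;
}
```

For example, with made-up inputs of a .tdata copy at 0x20000000, a .tbss section of 24 bytes starting 16 bytes later, and alignments of 4 and 8, the formulas give a __tbss_offset of 16, a __tls_align of 8, and a __tls_size of 40.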

This allocation method uses the runtime startup code and linker scripts provided in the DFPs. The XC32 Picolibc library provides functions to initialize the TLS block for additional threads created.

Alternatively, a customized linker script can be written to allocate memory for the TLS. This linker script must gather all TLS sections and provide the symbols tabled above, which represent the allocated space for the TLS block. This script should be paired with customized runtime startup code that performs initialization of the initial TLS block.

The picolibc library provides functions to manage the thread pointer. The _set_tls() function has the prototype:
void _set_tls(void *tls);
and sets the TLS thread pointer for the core. This function is architecture-specific and is used to point the thread pointer at the TLS area for the current thread. The _init_tls() function has the prototype:
void _init_tls(void *tls);
and initializes the TLS area for a new thread. Initialization typically involves copying the initial values from the .tdata section and zeroing out the .tbss section.
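
The copy-and-clear step described above can be sketched on a host machine as follows. This is not the picolibc implementation: the tls_template type and the demo_init_tls function are hypothetical stand-ins, and the template fields simply mirror the __tdata_source, __tdata_size, __tbss_offset, and __tbss_size symbols that the linker would supply on a real target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the TLS template. On a target these values
   would come from the linker-defined symbols, not from a C struct. */
typedef struct {
    const void *tdata_source;  /* initial values for .tdata */
    size_t      tdata_size;    /* bytes of initialized data */
    size_t      tbss_offset;   /* offset of .tbss from the block base */
    size_t      tbss_size;     /* bytes to clear */
} tls_template;

/* Hypothetical stand-in for the initialization step: copies the .tdata
   image to the start of the block, then zeroes the .tbss region. */
void demo_init_tls(void *tls, const tls_template *t)
{
    assert(tls != NULL && t != NULL);
    assert(t->tbss_offset >= t->tdata_size);

    unsigned char *block = tls;
    if (t->tdata_size != 0) {
        memcpy(block, t->tdata_source, t->tdata_size);
    }
    memset(block + t->tbss_offset, 0, t->tbss_size);
}
```

On a target, a thread-creation routine would presumably reserve a suitably aligned block of at least __tls_size bytes for each new thread, pass it to _init_tls(), and have the scheduler call _set_tls() with that block whenever the thread is switched in. That sequence is an assumption about typical use, not a documented requirement.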