3.5.1.14.1 Access-Based Memory Partitioning

(Ask a Question)

Access-based partitioning is automatically applied to all memories except for those at the top-level interfaces (see 3.5.3.2.4 I/O Memory). This flavor of memory partitioning will analyze the ranges of all accesses to a memory and create partitions based on these accesses. After analyzing all memory accesses, independent partitions will be implemented in independent memories. If two partitions overlap in what they access, they will be merged into one partition. If there are any sections of the memory that is not accessed, it will be discarded to reduce memory usage. For example, if there are two loops, where one loop accesses the first half of an array and the second loop accesses the second half of the array, the accesses to the array from the two loops are completely independent. In this case the array will be partitioned into two and be implemented in two memories, one that holds the first half of the array and another that holds the second half of the array. However, if both loops access the entire array, their accesses overlap, hence the two partitions will be merged into one and the array will just be implemented in a single memory (without being partitioned). Access-based partitioning is done automatically without needing any memory partition pragmas, in order to automatically improve memory bandwidth and reduce memory usage whenever possible.

Example

Access-based partitioning is automatically applied to all memories by SmartHLS™ except for interface memories (top-level function arguments and global variables accessed by both software testbench and hardware functions) to the top-level function. Interface memories need to be partitioned with the memory partition pragma. See the code snippet below that illustrate an example of accessed-based partitioning.

int array[1000];
int result = 0;

    //...

#pragma HLS loop unroll
    for (i = 0; i < 1000; i++) {
        result += array[i];
    }

In the example above, each iteration of the loop access an element of array and adds it to result. The unroll pragma is applied to completely unroll the loop. Without partitioning, SmartHLS will implement this array in a RAM (with 1000 elements), where an FPGA RAM can have up to two read/write ports. In this case, the loop will take 500 cycles, as 1000 reads are needed from the RAM and up to two reads can be performed per cycle with a two ported memory.

With access-based partitioning, the accesses to the above array will be analyzed. With unrolling, there will be 1000 load instructions, each of which will access a single array element, with no overlaps in accesses between the load instructions (For example, the accesses of each load instruction are independent). This creates 1000 partitions, with one array element in each partition. After partitioning, all 1000 reads can occur in the same clock cycle, as each memory will only need one memory access. Hence the entire loop can finish in a single cycle. With this example, we can see that memory partitioning can help to improve memory bandwidth and improve performance.

With access-based partitioning, SmartHLS outputs messages to the console specifying which memory has been partitioned into how many partitions, as shown below:

Info: Partitioning memory: array into 1000 partitions.

Please refer to the Optimization Guide for more examples and details.

v2024.1

3.5.1.14.1 Access-Based Memory Partitioning