3.5.1.19.5 C++ Arbitrary Precision Fixed Point Library
The C++ Arbitrary Precision Fixed Point library provides fast, bit-accurate software simulation and efficient equivalent hardware generation. The C++ ap_[u]fixpt types allow specifying signed and unsigned fixed point numbers of arbitrary width and arbitrary fixed position relative to the decimal point. They can be used for arithmetic, concatenation, and bit-level operations. You can use the ap_[u]fixpt type by including the following header file.
#include "hls/ap_fixpt.hpp"
The ap_[u]fixpt template allows specifying the width of the type, how far the most significant bit is above the decimal point, as well as several quantization and overflow modes.
Quantization and overflow handling is triggered during assignment and construction. The policies used for quantization and overflow are based on the quantization and overflow modes of the left-hand side of an assignment, or of the value being constructed.
The template ap_[u]fixpt<W, I_W, Q_M, O_M> is described in the following table. The last two template parameters are optional.
Parameter | Mode | Description |
---|---|---|
W | | The width of the word in bits. |
I_W | | How far the most significant bit is above the decimal point. I_W can be negative. I_W > 0 implies the MSB is above the decimal point. I_W <= 0 implies the MSB is below the decimal point. If W >= I_W >= 0, then I_W is the number of bits used for the integer portion. |
Q_M | | The quantization (rounding) mode used when a result has precision below the least significant bit. Defaults to AP_TRN. |
| AP_TRN | Truncate bits below the LSB, bringing the result closer to -∞. |
| AP_TRN_ZERO | Truncate bits below the LSB, bringing the result closer to zero. |
| AP_RND | Round to the nearest representable value with the mid-point going towards +∞. |
| AP_RND_INF | Round to the nearest representable value with the mid-point going towards -∞ for negative numbers, and +∞ for positive numbers. |
| AP_RND_MIN_INF | Round to the nearest representable value with the mid-point going towards -∞. |
| AP_RND_ZERO | Round to the nearest representable value with the mid-point going towards 0. |
| AP_RND_CONV | Round to the nearest representable value with the mid-point going towards the nearest even multiple of the quantum. (This helps to remove bias in rounding.) |
O_M | | The overflow mode used when a result exceeds the maximum or minimum representable value. Defaults to AP_WRAP. |
| AP_WRAP | Wrap around between the minimum and maximum representable values in the range. |
| AP_SAT | On positive and negative overflow, saturate the result to the maximum or minimum value in the range respectively. |
| AP_SAT_ZERO | On any overflow, set the result to zero. |
| AP_SAT_SYM | On positive and negative overflow, saturate the result to the maximum or minimum value in the range symmetrically about zero. For ap_ufixpt this is the same as AP_SAT. |
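The quantization modes in the table can be illustrated with a small standalone model that uses only the C++ standard library, not the HLS headers. The names `QMode` and `quantize` are hypothetical and are not part of the ap_[u]fixpt API. This is a sketch of the behavior the table describes, assuming the input is a real value and the result is the integer word (number of quanta) for a type with `frac_bits` fractional bits; it is not how the library itself is implemented.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for the quantization modes listed in the table.
enum class QMode { TRN, TRN_ZERO, RND, RND_INF, RND_MIN_INF, RND_ZERO, RND_CONV };

// Returns the integer word (count of quanta, quantum = 2^-frac_bits)
// that represents x after quantization under the given mode.
inline std::int64_t quantize(double x, int frac_bits, QMode mode) {
    const double scaled = std::ldexp(x, frac_bits);  // x expressed in quanta
    const double lo = std::floor(scaled);            // nearest word towards -inf
    const bool tie = (scaled - lo == 0.5);           // exactly at a mid-point
    double r = lo;
    switch (mode) {
    case QMode::TRN:         r = lo; break;                       // towards -inf
    case QMode::TRN_ZERO:    r = std::trunc(scaled); break;       // towards zero
    case QMode::RND:         r = std::floor(scaled + 0.5); break; // mid-point to +inf
    case QMode::RND_INF:     r = std::round(scaled); break;       // mid-point away from zero
    case QMode::RND_MIN_INF: r = std::ceil(scaled - 0.5); break;  // mid-point to -inf
    case QMode::RND_ZERO:    // mid-point towards zero
        r = tie ? std::trunc(scaled) : std::round(scaled);
        break;
    case QMode::RND_CONV:    // mid-point to the even word
        if (tie) {
            r = (std::fmod(lo, 2.0) == 0.0) ? lo : lo + 1.0;
        } else {
            r = std::round(scaled);
        }
        break;
    }
    return static_cast<std::int64_t>(r);
}
```

With two fractional bits the quantum is 0.25, so 0.625 and -0.625 sit exactly on a mid-point (2.5 and -2.5 quanta). Those two inputs are enough to tell all seven modes apart, which makes them convenient values to try when comparing against a simulation of the real type.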
An ap_[u]fixpt is a W-bit wide integer, in 2's complement for the signed case, which has some fixed position relative to the decimal point. This means that arithmetic is efficiently implemented as integer operations with some shifting to line up decimal points. Generally, a fixed point number can be thought of as a signed or unsigned integer word multiplied by 2^(I_W - W). The range of values that an ap_[u]fixpt can take on, as well as the quantum that separates those values, is determined by the W and I_W template parameters. The AP_SAT_SYM overflow mode forces the range to be symmetrical about zero for signed fixed point types. This information is described in the following table. Q here represents the quantum.
Type | Quantum | Range | AP_SAT_SYM Range |
---|---|---|---|
ap_ufixpt | 2^(I_W - W) | 0 to 2^(I_W) - Q | 0 to 2^(I_W) - Q |
ap_fixpt | 2^(I_W - W) | -2^(I_W - 1) to 2^(I_W - 1) - Q | -2^(I_W - 1) + Q to 2^(I_W - 1) - Q |
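The overflow modes can be modeled on the underlying integer word in the same spirit. The sketch below is a standalone approximation for the signed case only, and it assumes that AP_SAT_SYM also maps the most negative word to the symmetric minimum, as the range table suggests. `OMode` and `fit_signed` are hypothetical names and are not part of the library.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the overflow modes.
enum class OMode { WRAP, SAT, SAT_ZERO, SAT_SYM };

// Fits the integer word v into a signed w-bit 2's complement word
// (1 < w < 63) according to the given overflow mode.
inline std::int64_t fit_signed(std::int64_t v, int w, OMode mode) {
    const std::int64_t max = (std::int64_t{1} << (w - 1)) - 1;
    const std::int64_t min = -max - 1;

    if (mode == OMode::SAT_SYM) {
        // Assumption: the symmetric range excludes the most negative word.
        if (v > max) return max;
        if (v < -max) return -max;
        return v;
    }
    if (v >= min && v <= max) return v;  // no overflow, nothing to do

    switch (mode) {
    case OMode::SAT:      return v > max ? max : min;
    case OMode::SAT_ZERO: return 0;
    default: {            // OMode::WRAP: keep only the low w bits
        const std::int64_t span = std::int64_t{1} << w;
        std::int64_t m = v % span;  // in (-span, span)
        if (m < 0) m += span;       // now in [0, span)
        return m > max ? m - span : m;
    }
    }
}
```

For an 8-bit word such as the one behind ap_fixpt<8, 4>, the word 130 (8.125 in real terms) wraps to -126 (-7.875), saturates to 127 (7.9375), or becomes 0 under the zeroing mode.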
Some ap_[u]fixpt ranges are demonstrated in the following table.
Type | Quantum | Range |
---|---|---|
ap_fixpt<8, 4> | 0.0625 | -8 to 7.9375 |
ap_ufixpt<4, 12> | 256 | 0 to 3840 |
ap_ufixpt<4, -2> | 0.015625 | 0 to 0.234375 |
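The rows above follow directly from the quantum and range formulas. As a quick check, the helper below computes them with the standard library alone; `FixptRange` and `fixpt_range` are hypothetical names introduced for illustration, and the AP_SAT_SYM case is left out for brevity.

```cpp
#include <cmath>

// Quantum and representable range of a fixed point type.
struct FixptRange {
    double quantum;
    double min;
    double max;
};

// Applies the formulas from the range table:
//   quantum  = 2^(I_W - W)
//   unsigned: 0 to 2^(I_W) - Q
//   signed:   -2^(I_W - 1) to 2^(I_W - 1) - Q
inline FixptRange fixpt_range(int w, int i_w, bool is_signed) {
    const double q = std::ldexp(1.0, i_w - w);
    if (is_signed) {
        const double half = std::ldexp(1.0, i_w - 1);
        return {q, -half, half - q};
    }
    return {q, 0.0, std::ldexp(1.0, i_w) - q};
}
```

Calling `fixpt_range(8, 4, true)`, `fixpt_range(4, 12, false)`, and `fixpt_range(4, -2, false)` reproduces the three rows of the table.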
An example using ap_fixpt is shown below.
    #include "hls/ap_fixpt.hpp"
    #include "hls/streaming.hpp"

    #define TAPS 8

    // A signed fixed point type with 10 integer bits and 6 fractional bits
    // It employs convergent rounding for quantization, and saturation for overflow.
    typedef hls::ap_fixpt<16, 10, hls::AP_RND_CONV, hls::AP_SAT> fixpt_t;

    // A signed fixed point type with 3 integer bits and 1 fractional bit
    // It uses the default truncation, and wrapping modes.
    typedef hls::ap_fixpt<4, 3> fixpt_s_t;

    void fir(hls::FIFO<fixpt_t> &input_fifo, hls::FIFO<fixpt_t> &output_fifo) {
    #pragma HLS function top
    #pragma HLS function pipeline
        fixpt_t in = input_fifo.read();

        static fixpt_t previous[TAPS] = {0};
        const fixpt_s_t coefficients[TAPS] = {-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2};

        for (unsigned i = (TAPS - 1); i > 0; --i) {
            previous[i] = previous[i - 1];
        }

        previous[0] = in;

        fixpt_t accumulate[TAPS];
        for (unsigned i = 0; i < TAPS; ++i) {
            accumulate[i] = previous[i] * coefficients[i];
        }

        // Accumulate results, doing adds and saturation in
        // a binary tree to reduce the number of serial saturation
        // checks. This significantly improves pipelining results
        // over serially adding results together when saturation
        // is required.
        for (unsigned i = TAPS >> 1; i > 0; i >>= 1) {
            for (unsigned j = 0; j < i; ++j) {
                accumulate[j] += accumulate[j + i];
            }
        }

        output_fifo.write(accumulate[0]);
    }
This example implements a streaming FIR filter with 8 taps. Using the minimum-width ap_fixpt to represent the constant coefficients allows the multiply to happen at a smaller width than if they were the same (wider) type as the inputs. This example ensures that any overflow saturates rather than wraps by always assigning to an ap_fixpt that uses the AP_SAT overflow mode. This does incur a performance penalty, but the penalty is minimized here by accumulating the results in a binary tree, so that only log2(TAPS) = 3 saturating operations depend on each other. If the results were accumulated in a single variable in one loop, there would be TAPS = 8 saturating operations depending on each other. A longer chain of dependent saturating operations is slower because overflow must be checked at each step before the next operation can begin.
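The difference in dependency depth can be seen in a small software model. The sketch below replaces ap_fixpt with a plain `int` clamped to a 16-bit range, which is an assumption made so the code runs without the HLS headers; `sat_add`, `tree_sum`, `serial_sum`, and `Reduction` are hypothetical names. Each function returns the sum together with the number of saturating additions on the longest dependency chain.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t TAPS = 8;
constexpr int SAT_MAX = 32767;   // stand-in for the largest 16-bit word
constexpr int SAT_MIN = -32768;  // stand-in for the smallest 16-bit word

// Adds two words and saturates the result, loosely modeling AP_SAT.
inline int sat_add(int a, int b) {
    return std::max(SAT_MIN, std::min(SAT_MAX, a + b));
}

struct Reduction {
    int sum;         // accumulated result
    unsigned depth;  // saturating adds on the longest dependency chain
};

// Binary tree reduction, mirroring the loop nest in the FIR example.
inline Reduction tree_sum(std::array<int, TAPS> v) {
    unsigned depth = 0;
    for (std::size_t i = TAPS >> 1; i > 0; i >>= 1) {
        for (std::size_t j = 0; j < i; ++j) {
            v[j] = sat_add(v[j], v[j + i]);  // adds within a level are independent
        }
        ++depth;  // one more level of dependent adds
    }
    return {v[0], depth};
}

// Serial accumulation into a single variable.
inline Reduction serial_sum(const std::array<int, TAPS> &v) {
    int acc = 0;
    unsigned depth = 0;
    for (int x : v) {
        acc = sat_add(acc, x);  // every add waits for the previous one
        ++depth;
    }
    return {acc, depth};
}
```

For eight inputs, `tree_sum` reports a depth of 3 and `serial_sum` a depth of 8. Saturating addition is not associative, so the two orderings can also return different sums when an intermediate result saturates; this is worth keeping in mind when comparing a tree-structured design against a serial reference model.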