4.3.4 Floating-Point Data Types

The MPLAB XC8 compiler supports 32-bit floating-point types. Floating point is implemented using an IEEE 754 32-bit format. The floating-point data types and their sizes are tabulated below.

Table 4-2. Floating-Point Data Types
Type	Size (bits)	Arithmetic Type
`float`	32	Real
`double`	32	Real
`long double`	32	Real

Floating-point types are always signed, and the `unsigned` keyword is illegal when specifying a floating-point type. All floating-point values are represented in little-endian format, with the least significant byte (LSB) at the lower address.
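As a minimal sketch of the byte ordering described above, the following function rebuilds the 32-bit pattern of a floating-point object from its four bytes, read in increasing address order. The function name `float_bits_from_le_bytes` is hypothetical and is not part of the compiler or its libraries.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 32-bit pattern from four bytes read in
 * increasing address order, assuming the little-endian layout described
 * above (least significant byte at the lowest address). */
uint32_t float_bits_from_le_bytes(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]            /* lowest address: bits 7..0    */
         | ((uint32_t)bytes[1] << 8)     /* bits 15..8                   */
         | ((uint32_t)bytes[2] << 16)    /* bits 23..16                  */
         | ((uint32_t)bytes[3] << 24);   /* highest address: bits 31..24 */
}
```

With this layout, an object holding the pattern 7DA6B69Bh would be stored as the bytes 9Bh, B6h, A6h, 7Dh from the lowest address upward.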

Infinities are legal arguments for all operations and behave as the largest representable number with that sign. For example, the expression +inf + -inf yields the value 0.

The format for all floating-point types is described in the table below, where:

Sign is the sign bit, which indicates whether the number is positive or negative.
The Biased Exponent is 8 bits wide and is stored as excess 127 (i.e., an exponent of 0 is stored as 127).
The Mantissa is located to the right of the radix point. There is an implied bit to the left of the radix point, which is always 1 except for a zero value, where the implied bit is zero. A zero value is indicated by a zero exponent.

The value of this number is $(-1)^{\text{sign}} \times 2^{(\text{exponent} - 127)} \times 1.\text{mantissa}$.

Table 4-3. Floating-Point Formats
Format	Sign	Biased Exponent	Mantissa
IEEE 754 32-bit	x	xxxx xxxx	xxx xxxx xxxx xxxx xxxx xxxx
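The field layout in the table can be sketched in C as a set of shifts and masks. The type `float32_fields` and the function `float32_split` are hypothetical names introduced only for illustration. The sketch assumes the 1-bit sign, 8-bit biased exponent, and 23-bit mantissa widths given above.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a 32-bit pattern. */
typedef struct {
    uint8_t  sign;             /* 1 bit: 0 = positive, 1 = negative        */
    uint8_t  biased_exponent;  /* 8 bits, stored as excess 127             */
    uint32_t mantissa;         /* 23 bits; the implied bit is not included */
} float32_fields;

/* Split a 32-bit pattern into its sign, biased exponent and mantissa. */
float32_fields float32_split(uint32_t bits)
{
    float32_fields f;
    f.sign            = (uint8_t)(bits >> 31);            /* bit 31      */
    f.biased_exponent = (uint8_t)((bits >> 23) & 0xFFu);  /* bits 30..23 */
    f.mantissa        = bits & 0x7FFFFFul;                /* bits 22..0  */
    return f;
}
```

Subtracting 127 from `biased_exponent` gives the true exponent.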

An example of the IEEE 754 32-bit format is shown in the following table. Note that the Most Significant Bit (MSb) of the mantissa column (i.e., the bit to the left of the radix point) is the implied bit, which is assumed to be 1 unless the exponent is zero.

Table 4-4. Floating-Point Format Example IEEE 754
Format	Value	Biased Exponent	1.mantissa	Decimal
32-bit	7DA6B69Bh	11111011b	1.01001101011011010011011b	2.77000e+37
32-bit	7DA6B69Bh	(251)	(1.302447676659)	—

The sign bit is zero; the biased exponent is 251, so the exponent is 251 - 127 = 124. Take the binary number to the right of the radix point in the mantissa. Convert this to decimal and divide it by $2^{23}$, where 23 is the size of the mantissa in bits, to give 0.302447676659. Add 1 to this fraction. The floating-point number is then given by:

$(-1)^{0} \times 2^{124} \times 1.302447676659$

which is approximately equal to:

2.77000e+37
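The calculation above can be reproduced with a short routine. This is only a sketch: the function name `float32_value` is hypothetical, only normalized patterns (biased exponent 1 to 254) are handled, and the routine is intended to be run on a host where `double` is wider than 32 bits so that the result is not rounded again.

```c
#include <stdint.h>

/* Hypothetical helper: evaluate (-1)^sign x 2^(exponent-127) x 1.mantissa
 * for a normalized 32-bit pattern. Zero and other special patterns are
 * not handled in this sketch. */
double float32_value(uint32_t bits)
{
    int      sign     = (int)(bits >> 31);
    int      exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* remove the bias */
    uint32_t mantissa = bits & 0x7FFFFFul;

    /* 1.mantissa: the 23-bit field divided by 2^23, plus the implied 1. */
    double value = 1.0 + (double)mantissa / 8388608.0;

    /* Scale by 2^exponent using repeated multiplication or division. */
    for (; exponent > 0; exponent--) {
        value *= 2.0;
    }
    for (; exponent < 0; exponent++) {
        value /= 2.0;
    }

    return sign ? -value : value;
}
```

Calling `float32_value(0x7DA6B69BUL)` evaluates the example pattern from the table and should return a value close to 2.77e+37.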

Binary floating-point values are sometimes misunderstood. It is important to remember that not every real value can be represented exactly by a finite-sized floating-point object. The size of the exponent dictates the range of values that the number can hold, and the size of the mantissa determines the spacing between the values that can be represented exactly.

For example, if you are using a 32-bit wide floating-point type, it can store the value 95000.0 exactly. However, the next highest value which can be represented is (approximately) 95000.00781.
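The spacing between adjacent values can be observed by incrementing the bit pattern of a floating-point object by one. The sketch below assumes that `float` uses the IEEE 754 32-bit format and that the argument is positive and finite. The function name `next_float_up` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the next representable value above a
 * positive, finite float by adding 1 to its 32-bit pattern. */
float next_float_up(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);  /* copy the object representation    */
    bits += 1;                       /* one step in the mantissa; a carry */
                                     /* moves into the exponent field     */
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

For an argument of 95000.0, the result differs from the argument by 0.0078125, which is $2^{-7}$. This is the weight of the least significant mantissa bit when the exponent is 16.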

The characteristics of the floating-point formats are summarized in the declarations for <float.h>, contained in the Microchip Unified Standard Library Reference Guide. The symbols presented there are preprocessor macros that are available after including <float.h> in your source code. As the size and format of floating-point data types are not fully specified by the C Standard, these macros allow for more portable code which can check the limits of the range of values held by the type on this implementation.
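As an illustration of how these macros might be used, the following sketch checks the characteristics of `float` and tests whether a value lies within its finite range. The function names are hypothetical. `FLT_RADIX`, `FLT_MANT_DIG`, `FLT_MIN_EXP`, `FLT_MAX_EXP` and `FLT_MAX` are standard `<float.h>` macros. The expected values in the first function are those of an IEEE 754 32-bit format, where `FLT_MANT_DIG` counts the implied bit as well as the 23 stored mantissa bits.

```c
#include <float.h>

/* Hypothetical helper: nonzero if float has the characteristics of the
 * IEEE 754 32-bit format (radix 2, 24 significant bits including the
 * implied bit, and the matching exponent limits). */
int float_is_ieee754_32bit(void)
{
    return FLT_RADIX == 2
        && FLT_MANT_DIG == 24
        && FLT_MIN_EXP == -125
        && FLT_MAX_EXP == 128;
}

/* Hypothetical helper: nonzero if value lies within the finite range of
 * float on this implementation. */
int fits_in_float(double value)
{
    return value >= -FLT_MAX && value <= FLT_MAX;
}
```

Because the checks are written in terms of the macros rather than fixed constants, the same source can be built with another compiler whose floating-point types differ.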