5.3.4.1 Why Are My Floating-Point Results Not Quite What I Am Expecting?

First, ensure that you are using the floating-point data types that you intend. We recommend explicitly using the type long double in your program when IEEE double precision (64-bit) format floating-point values are desired, and the type float when IEEE single precision (32-bit) format values are desired. By default, the compiler uses the IEEE single precision format for the type double. Use the -fno-short-double switch to specify the IEEE double precision (64-bit) format for the type double.

Next, be aware of the limitations of the floating-point formats and the effects of rounding. Not all real numbers can be represented exactly in the floating-point formats. For example, the fraction 1/10 cannot be represented exactly in either the single or double precision formats.

If the result of a load or a computation is 1/10, the value stored in the floating-point format will be the closest approximation representable in that format. In such cases, it is said that the “true” value has been “rounded” to the nearest approximation, according to the rules of IEEE arithmetic. A small discrepancy introduced early in a computation can accumulate over many operations and produce a noticeable error. In general, computations may start from numbers that are exactly representable (like 1 and 10) and yield results that are not (like 1/10). This is not due to the compiler's choice of generated code, nor to any specifics of the microprocessor architecture, but is rather an essential characteristic of the IEEE floating-point formats and rules of arithmetic. Any compiler/microprocessor platform faces the same issues. For more information, see the following section in this user’s guide:

Floating-Point Data Types