Number Representations & States

"how numbers are stored in computers"

Digits and relative error

Subtracting two floating-point numbers using only $p$ digits (i.e. with no guard digit) can produce a very large relative error, up to $\beta - 1$, where $\beta$ is the number base (typically 2 or 10). The worst case occurs when you subtract a number like $y = 0.\rho\rho\ldots\rho$ from a very close number like $x = 1.00\ldots0$, where $\rho = \beta - 1$. Because $p$ digits are not enough to capture the fine difference, the last digit of $y$ is dropped when the operands are aligned, and the computed result is far less accurate than the exact one.

With base 10, this error can be 9 times the exact value, and in base 2 every digit of the result can be incorrect, a total failure of accuracy. To address this, floating-point systems use a guard digit: one extra digit of precision kept during the subtraction before the result is rounded back to $p$ digits. This small change greatly improves accuracy, and it allows the subtraction in the previous example to produce the exact result.

However, even with a guard digit, the relative error may occasionally be slightly larger than the standard rounding error $\epsilon = (\beta/2)\beta^{-p}$, as shown in an example where the relative error is about $0.006$ (compared to $\epsilon = 0.005$). Still, with a guard digit the relative error stays below $2\epsilon$, which is why guard digits are common in floating-point systems.

Theorem

In a floating-point system with base $\beta$ and precision $p$, the relative error in computing a difference using $p$ digits can be as large as $\beta - 1$.

Proof

Consider two numbers:

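$x = 1.00\ldots0$ (1 followed by $p - 1$ zeros)
$y = 0.\rho\rho\ldots\rho$ (with $p$ digits, each $\rho = \beta - 1$)

The exact difference is $x - y = \beta^{-p}$.

However, when computed with only $p$ digits, the least significant digit of $y$ is lost, and the difference becomes $\beta^{-p+1}$.

Therefore, the absolute error is $\beta^{-p+1} - \beta^{-p} = \beta^{-p}(\beta - 1)$, and the relative error is

$$\frac{\beta^{-p}(\beta - 1)}{\beta^{-p}} = \beta - 1.$$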

Discussion

With base $\beta = 2$, the relative error in subtracting two close floating-point numbers can be as large as the result itself. For instance, a relative error of 100% ($\beta - 1 = 1$) means all significant digits in the computed result may be incorrect. For decimal systems ($\beta = 10$), the relative error can be as high as $\beta - 1 = 9$, nine times the exact value!
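The worst case can be checked with exact integer arithmetic. The sketch below assumes the operands $x = 1.00\ldots0$ and $y = 0.\rho\rho\ldots\rho$ with $\rho = \beta - 1$, and models "computing with only $p$ digits" as dropping the last digit of $y$ when it is aligned to $x$. The function name `worst_case_relative_error` is made up for illustration.

```python
from fractions import Fraction


def worst_case_relative_error(beta: int, p: int) -> Fraction:
    """Relative error of x - y computed with p digits and no guard digit.

    x = 1.00...0 (p digits), y = 0.rho...rho (p digits, rho = beta - 1).
    All values are held as integers in units of beta**(-p).
    """
    x = beta ** p          # 1.00...0
    y = beta ** p - 1      # 0.rho rho ... rho
    exact = x - y          # 1 unit, i.e. beta**(-p)

    # Aligning y to x's exponent with only p digits drops y's last digit.
    y_truncated = (y // beta) * beta
    computed = x - y_truncated   # beta units, i.e. beta**(-p + 1)

    return Fraction(computed - exact, exact)


for beta, p in [(2, 8), (10, 3), (16, 6)]:
    print(beta, p, worst_case_relative_error(beta, p))
# prints:
# 2 8 1
# 10 3 9
# 16 6 15
```

The result does not depend on $p$: in every base the relative error is $\beta - 1$.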

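To interpret this in terms of digit loss, consider the expression for the number of erroneous digits. If the relative error is $n\epsilon$, where $\epsilon = (\beta/2)\beta^{-p}$ is the machine epsilon, then

$$\text{contaminated digits} \approx \log_\beta n.$$

Using our earlier example, when $\beta = 2$ and the relative error is $1$, we have $\epsilon = 2^{-p}$ and $n = 1/\epsilon$, so we get:

$$\log_2 \frac{1}{\epsilon} = \log_2 2^{p} = p,$$

which implies that none of the $p$ digits in the result are reliable. Or put differently, all digits can be corrupted when subtracting two nearly equal numbers.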
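As a rough numerical check of this estimate, the sketch below assumes the rule of thumb from Goldberg's paper: a relative error of $n\epsilon$ contaminates about $\log_\beta n$ digits, with $\epsilon = (\beta/2)\beta^{-p}$. The function name `contaminated_digits` and the choice $p = 24$ are illustrative assumptions.

```python
import math


def contaminated_digits(relative_error: float, beta: int, p: int) -> float:
    """Approximate count of contaminated digits, log_beta(n),
    where relative_error = n * eps and eps = (beta / 2) * beta**(-p)."""
    eps = (beta / 2) * beta ** (-p)
    n = relative_error / eps
    return math.log(n, beta)


# Base 2, worst-case relative error beta - 1 = 1: about p digits are lost.
print(contaminated_digits(1.0, beta=2, p=24))

# Base 10, worst-case relative error beta - 1 = 9, with p = 3 digits.
print(contaminated_digits(9.0, beta=10, p=3))
```

In the base-10 case the estimate exceeds $p$, which simply means that every one of the $p$ digits is unreliable.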

Guard Digits

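To reduce this problem, floating-point implementations often use a guard digit: an extra digit retained during the intermediate steps of subtraction. The smaller operand is truncated to $p + 1$ digits instead of $p$, and the result of the subtraction is then rounded to $p$ digits. This extra digit helps preserve accuracy when the result is small due to cancellation.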

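Consider the subtraction $x - y$ with a guard digit, where $x = 1.010 \times 10^{1}$ and $y = 0.993 \times 10^{1}$ (that is, $10.1 - 9.93$ with $p = 3$). The exact difference is $x - y = 0.017 \times 10^{1} = 0.17$, and because one extra digit was preserved before rounding, the result is computed exactly.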

However, a guard digit doesn't guarantee perfect accuracy in all cases. In the case of $x = 110$ and $y = 8.59$, the exact difference is $101.41$ and the rounded result is $102$, yielding a relative error of about $0.006$, slightly larger than the machine epsilon $\epsilon = 0.005$ for 3-digit precision in base 10.
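The effect of a guard digit can be simulated in base 10 with Python's `decimal` module. This is a minimal sketch, not a model of any particular hardware: it assumes $x > y > 0$, truncates $y$ to $p$ digits (or $p + 1$ with a guard digit) relative to the leading digit of $x$, subtracts, and rounds the result to $p$ digits. The helper name `subtract` is hypothetical, and the operands $10.1 - 9.93$ and $110 - 8.59$ are the examples from Goldberg's paper [1].

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN


def subtract(x: Decimal, y: Decimal, p: int, guard_digits: int) -> Decimal:
    """Simulate x - y (with x > y > 0) in base 10 at precision p."""
    # Position of the last digit of y that survives alignment with x.
    last_kept = x.adjusted() - (p + guard_digits) + 1
    y_aligned = y.quantize(Decimal(1).scaleb(last_kept), rounding=ROUND_DOWN)
    # Subtract the aligned operands, then round the result to p digits.
    difference = x - y_aligned
    return Context(prec=p, rounding=ROUND_HALF_EVEN).plus(difference)


x, y = Decimal("10.1"), Decimal("9.93")
print(subtract(x, y, p=3, guard_digits=0))  # 0.2  (exact answer is 0.17)
print(subtract(x, y, p=3, guard_digits=1))  # 0.17 (exact)

x, y = Decimal("110"), Decimal("8.59")
computed = subtract(x, y, p=3, guard_digits=1)
exact = x - y
print(computed, exact)                      # 102 101.41
print(abs(computed - exact) / exact)        # roughly 0.0058, above eps = 0.005
```

Without the guard digit the first subtraction is wrong in every digit. With it, the first result is exact and the second is off by slightly more than $\epsilon$ but still less than $2\epsilon$.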

References

1. David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys 23(1), 1991.