How numbers are stored in computers
If you work on building computer systems, chances are you'll eventually need to understand floating-point arithmetic at a deeper level. Surprisingly, there aren't many clear and detailed resources on the topic.
The theorems below are a practical introduction to the numerical analysis behind floating-point arithmetic. They are written for a technical audience, but should be comprehensible to anyone, even the less mathematically inclined. The purpose of these proofs is to show how floating-point calculations can be reasoned about in general, especially when it comes to rounding and exactness.
Establishes an upper bound on relative error as a function of total digits.
Theorem 1
Establishes the upper bound on relative rounding error.
Theorem 2
An illustrative example of how rounding error bounds might be established for a formula.
Theorem 3
Explores the implications of rounding errors and collision when computing ln(1 + x).
Theorem 4
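The hazard behind this theorem is that for small x, forming 1 + x discards the low-order bits of x before the logarithm is ever taken. A minimal sketch of a commonly cited remedy, assuming IEEE double precision (the correction factor ln(w)/(w − 1) with w computed as 1 + x cancels most of that rounding error):

```python
import math

def ln1p(x):
    # Naive math.log(1.0 + x) loses accuracy for tiny x, because
    # rounding 1 + x throws away low-order bits of x.
    w = 1.0 + x
    if w == 1.0:
        # x is below half an ulp of 1, so ln(1 + x) equals x
        # to within working precision.
        return x
    # The rounding error in w largely cancels out: the computed
    # w - 1.0 is exactly the perturbed argument that the
    # logarithm actually saw.
    return x * math.log(w) / (w - 1.0)
```

For x = 1e-10, the naive `math.log(1.0 + x)` is accurate to only about half of the available digits, while this version agrees with the library routine `math.log1p(x)` to near full precision.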
Considers the implications of rounding at halfway points like 0.5.
Theorem 5
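One concrete place halfway cases show up: IEEE 754's default rounding mode breaks ties toward the nearest even digit, so a run of exact halfway values does not drift upward. Python's built-in round follows the same tie-to-even rule:

```python
# Round-half-to-even in action: ties go to the nearest even integer,
# rather than always rounding 0.5 upward.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
```

Note that 0.5, 1.5, and 2.5 are all exactly representable in binary, so these really are halfway cases and not artifacts of decimal-to-binary conversion.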
How exact rounding enables precise arithmetic through high-low decomposition.
Theorem 6
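The decomposition in question relies on exactly rounded operations to split a number into a high part and a low part that sum to it exactly. A sketch of the standard Veltkamp-style splitting, assuming IEEE double precision (53-bit significand, so the splitting constant uses s = 27):

```python
def split(x, s=27):
    # Veltkamp splitting: with exactly rounded arithmetic,
    # multiplying by 2**s + 1 and cancelling isolates the top
    # bits of x in x_hi and the remaining bits in x_lo, with
    # x_hi + x_lo == x holding exactly.
    c = (2.0**s + 1.0) * x
    x_hi = c - (c - x)
    x_lo = x - x_hi
    return x_hi, x_lo
```

Such half-width pieces are what make techniques like exact products and double-double arithmetic possible, since each piece is short enough to multiply without rounding.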
Behavior of floating-point arithmetic when the base is 2 and operations are exactly rounded.
Theorem 7
Reducing numerical error from adding a sequence of numbers by tracking total accumulated error.
Theorem 8
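The error-tracking idea can be sketched as compensated (Kahan) summation: each addition's exact rounding error is recovered and fed back into the next term, so the error of the whole sum stays small regardless of how many terms there are.

```python
def kahan_sum(xs):
    total = 0.0
    c = 0.0  # running compensation: low-order bits lost so far
    for x in xs:
        y = x - c            # apply the correction to the next term
        t = total + y        # this addition may round
        c = (t - total) - y  # recover exactly what was rounded away
        total = t
    return total
```

Summing 0.1 a thousand times with a plain loop accumulates a visible error, while the compensated version stays accurate to within a few units in the last place.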
Establishes bounds on relative error for certain subtraction operations.
Theorem 9
Establishes bounds on relative error for certain subtraction operations.
Theorem 10
Establishes conditions by which an exact subtraction may be performed in floating point arithmetic.
Theorem 11
Examines the numerical stability of a formula for calculating the area of a triangle.
Theorem 12
Bounding values and derivatives of the logarithmic mean function.
Theorem 13
Rounding to fewer significant digits using exact operations.
Theorem 14
Conversion between decimal precision numbers.
Theorem 15
In floating-point arithmetic, exact results are rare because most operations round. Knowing when a subtraction incurs no error is therefore valuable when designing robust numerical algorithms, such as square-root calculations, computing small differences, or detecting small perturbations.
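As a concrete illustration, the classical sufficient condition (often identified with Sterbenz's lemma, the subject of Theorem 11 above) is that x - y is exact whenever y/2 <= x <= 2y. A small sketch that verifies exactness against exact rational arithmetic, using the fact that constructing a `Fraction` from a float is lossless:

```python
from fractions import Fraction

def subtract_is_exact(x, y):
    # Fraction(float) is exact, so this compares the rounded float
    # subtraction against the true mathematical difference.
    return Fraction(x) - Fraction(y) == Fraction(x - y)

# Operands within a factor of two of each other: no rounding error.
print(subtract_is_exact(1.5, 0.875))   # True
# Wildly different magnitudes: the tiny subtrahend is rounded away.
print(subtract_is_exact(1.0, 1e-20))   # False
```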