Number Representations & States

"how numbers are stored and used in computers"

Quadruple precision (FP128)

Quadruple precision, also known as FP128 or binary128, is a floating-point format that uses 128 bits (16 bytes) of storage.

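This format represents magnitudes from approximately 6.4751751194380251109244389582276e-4966 (the smallest positive subnormal) up to 1.189731495357231765085759326628e+4932 (the largest finite value), with either sign. Combined with roughly 34 significant decimal digits of precision, this makes it suitable for scientific computing and other applications requiring very high precision arithmetic.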
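The two limits quoted above follow directly from the field widths. As a rough sketch, assuming the standard binary128 parameters (exponent bias 16383, 112 stored fraction bits), they can be reproduced with Python's decimal module. The names EXP_BIAS, FRAC_BITS, min_subnormal, min_normal and max_finite are illustrative only.

```python
from decimal import Decimal, getcontext

# Work with more digits than binary128 carries (about 34) so the
# leading digits of each limit are reliable.
getcontext().prec = 40

EXP_BIAS = 16383   # assumed: standard binary128 exponent bias
FRAC_BITS = 112    # stored significand (fraction) bits

# Smallest positive subnormal: 2^(1 - bias - 112) = 2^-16494
min_subnormal = Decimal(2) ** (1 - EXP_BIAS - FRAC_BITS)

# Smallest positive normal: 2^(1 - bias) = 2^-16382
min_normal = Decimal(2) ** (1 - EXP_BIAS)

# Largest finite value: (2 - 2^-112) * 2^16383 = (2^113 - 1) * 2^(16383 - 112)
max_finite = Decimal(2**113 - 1) * Decimal(2) ** (EXP_BIAS - FRAC_BITS)

print("smallest subnormal:", min_subnormal)
print("smallest normal:   ", min_normal)
print("largest finite:    ", max_finite)
```

The smallest normal value is printed as well because the much smaller subnormal limit is reached only by giving up significand bits.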

FP128 was added to IEEE 754 in the 2008 revision, though a 16-byte format was already anticipated when the original 1985 standard was framed, as William Kahan noted:

William Kahan, primary architect of the IEEE 754 standard:

"For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

Anatomy of an FP128

An FP128 value achieves its extremely high precision by dividing its 128 bits into three fields:

  • 1 bit for the sign
  • 15 bits for the exponent
  • 112 bits for the significand/mantissa (113 bits of precision for normal numbers, counting the implicit leading 1)
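
To make the layout concrete, here is a minimal sketch that splits a raw 128-bit pattern into the three fields and converts it to an exact rational value. It assumes the standard binary128 conventions (sign in the top bit, exponent bias 16383, an implicit leading 1 for normal numbers). The helper names split_fp128 and fp128_value are hypothetical, not part of any library.

```python
from fractions import Fraction

EXP_BITS = 15      # exponent field width
FRAC_BITS = 112    # stored significand (fraction) width
EXP_BIAS = 16383   # assumed: standard binary128 exponent bias


def split_fp128(bits: int) -> tuple[int, int, int]:
    """Split a 128-bit pattern into (sign, biased exponent, fraction)."""
    sign = (bits >> (EXP_BITS + FRAC_BITS)) & 1
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, exponent, fraction


def fp128_value(bits: int) -> Fraction:
    """Return the exact value of a finite binary128 bit pattern."""
    sign, exponent, fraction = split_fp128(bits)
    if exponent == (1 << EXP_BITS) - 1:
        # An all-ones exponent encodes infinity or NaN.
        raise ValueError("bit pattern encodes infinity or NaN")
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent 1 - bias.
        significand = Fraction(fraction, 1 << FRAC_BITS)
        scale = Fraction(2) ** (1 - EXP_BIAS)
    else:
        # Normal number: implicit leading 1 gives 113 bits of precision.
        significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
        scale = Fraction(2) ** (exponent - EXP_BIAS)
    magnitude = significand * scale
    return -magnitude if sign else magnitude


one = 0x3FFF << FRAC_BITS                 # sign 0, exponent = bias, fraction 0
one_and_half = one | (1 << (FRAC_BITS - 1))  # top fraction bit adds 1/2
minus_two = 0xC000 << FRAC_BITS           # sign 1, exponent = bias + 1

print(split_fp128(one))
print(fp128_value(one), fp128_value(one_and_half), fp128_value(minus_two))
```

Fraction is used here so the decoded value stays exact; a Python float (binary64) could not hold all 113 significand bits.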