"how numbers are stored and used in computers"
The 8-bit floating point format is a compact numerical representation of real numbers, where each number is represented by a single byte. It is commonly used in machine learning applications where memory usage and bandwidth capabilities are especially important.
Try editing the 8 bit floating point value below.
Unlike FP32, FP8 is not standardized - while every FP8 format uses 1 bit for the sign, the choice on how the remaining 7 bits should be divided between mantissa and exponent is dependent on the specific FP8 format being used.
Here is an example specification for an FP8 format named E4M3
.
e4m3
Emin = -6, Emax = 7
sign bit s
e
mantissa m
e
= 0m
= 0m
> 0e
> 0m
< 7Since an FP8 can represent 256 possible values, it is easy to visualize their numeric range and distribution with a table.
E4M3Each row is the first four bits, and each column is the second four bits.
There is also a separate specification for an FP8 named E4M3FNUZ
, where FN
means finite and UZ
means unsigned zeros.
E4M3FNUZEach row is the first four bits, and each column is the second four bits.
E5M2Each row is the first four bits, and each column is the second four bits.
E5M2FNUZEach row is the first four bits, and each column is the second four bits.