In standard binary, integers are easy to handle (unsigned or two's complement). But what about 3.1415 or 0.005? In base 10 (decimal), digits to the right of the decimal point represent powers of 1/10 ($10^{-1}, 10^{-2}, \dots$).
In binary (base 2), we do the same: digits to the right of the binary point represent powers of 1/2 ($2^{-1}, 2^{-2}, \dots$). For example, $0.101_2 = 2^{-1} + 2^{-3} = 0.625$.
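The place-value rule can be sketched in a few lines of Python. The function name `binary_fraction_to_decimal` is a hypothetical helper chosen for illustration, not something from the text; it simply adds $2^{-k}$ for every 1 found in the $k$-th position after the binary point.

```python
def binary_fraction_to_decimal(bits: str) -> float:
    """Convert a binary string such as '11.01' to its decimal value."""
    whole, _, frac = bits.partition(".")
    # Digits left of the point are an ordinary binary integer.
    value = float(int(whole, 2)) if whole else 0.0
    # Digits right of the point weigh 1/2, 1/4, 1/8, ...
    for position, bit in enumerate(frac, start=1):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        if bit == "1":
            value += 2.0 ** -position
    return value


print(binary_fraction_to_decimal("0.101"))  # 0.5 + 0.125 = 0.625
print(binary_fraction_to_decimal("11.01"))  # 3 + 0.25 = 3.25
```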
There are two main ways to handle this in computer architecture:
Fixed point: we lock the binary point at a specific column (e.g., the last 4 bits are always the fraction).
Floating point: the binary point "floats" using scientific notation ($1.xxx \times 2^{exp}$).
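To make the fixed-point option concrete, here is a minimal sketch that assumes an unsigned 8-bit value whose last 4 bits are the fraction. The names `FRACTION_BITS`, `to_fixed`, and `from_fixed` are illustrative assumptions. Storing a number means scaling it by $2^4 = 16$ and rounding to the nearest integer.

```python
FRACTION_BITS = 4            # assumed layout: 4 integer bits, 4 fraction bits
SCALE = 1 << FRACTION_BITS   # 16


def to_fixed(x: float) -> int:
    """Encode x as an unsigned 8-bit fixed-point integer."""
    raw = round(x * SCALE)
    if not 0 <= raw <= 0xFF:
        raise OverflowError(f"{x} does not fit in 8 bits")
    return raw


def from_fixed(raw: int) -> float:
    """Decode an 8-bit fixed-point integer back to a real value."""
    return raw / SCALE


for x in (3.1415, 0.005):
    raw = to_fixed(x)
    print(x, format(raw, "08b"), from_fixed(raw))
```

Under these assumptions 3.1415 is stored as `00110010` and reads back as 3.125, while 0.005 rounds all the way down to 0.0. The step size is fixed at 1/16 everywhere, which is the limitation that floating point relaxes.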
We will use the Hypothetical 8-Bit Model from your text to explain the standard 32-bit format. It divides bits into three distinct groups:
Sign (1 bit): 0 for positive, 1 for negative.
Exponent (4 bits): stored with a bias of 7. Actual Exp = BinaryValue - 7.
Mantissa (3 bits): the fraction after an implied leading 1. Value = 1.0 + bits.
The 32-bit format uses the same three groups with 1, 8, and 23 bits, and a bias of 127.
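The two rules above (Actual Exp = BinaryValue - 7 and Value = 1.0 + bits) can be sketched as a small decoder. This assumes the layout is 1 sign bit, 4 exponent bits, and 3 mantissa bits, and it deliberately ignores any special cases (zero, infinity, NaN) that a real format reserves. `decode_fp8` is a hypothetical name used only for this illustration.

```python
BIAS = 7  # exponent bias of the 8-bit model


def decode_fp8(bits: str) -> float:
    """Decode a string like '1 1000 100' (sign, exponent, mantissa)."""
    bits = bits.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:5], 2) - BIAS      # Actual Exp = BinaryValue - 7
    mantissa = 1.0 + int(bits[5:], 2) / 8    # Value = 1.0 + bits (3 bits -> eighths)
    return sign * mantissa * 2.0 ** exponent


print(decode_fp8("0 0111 000"))  # 1.0
print(decode_fp8("0 0111 111"))  # 1.875
print(decode_fp8("1 1000 100"))  # -3.0
```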
Calculation:
0 0111 000 (Exp is 7-7=0, Mantissa is 0): value is $+1.0 \times 2^{0} = 1.0$
0 0111 111 (Exp is 0, Mantissa is 0.875): value is $+1.875 \times 2^{0} = 1.875$
1 1000 100 (Sign -, Exp 8-7=1, Mantissa 0.5): value is $-1.5 \times 2^{1} = -3.0$
Because we have finite bits (8 in our model, 32 in standard), we cannot represent every number on the infinite number line. We can only store specific "dots".
Key Concept: The gaps between representable numbers get larger as the numbers get bigger.
Near zero, precision is high (dense dots). At high magnitudes, precision drops (sparse dots), because each increase of the exponent by 1 doubles the spacing between neighboring values.
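The widening gaps can be computed directly for the 8-bit model. The sketch below assumes 3 mantissa bits and a bias of 7, so every power-of-two interval is cut into 8 equal steps; `spacing` is a hypothetical helper name, and reserved exponent patterns are ignored.

```python
BIAS = 7
MANTISSA_STEPS = 8  # 3 mantissa bits -> 8 steps per power of two


def spacing(exponent_field: int) -> float:
    """Gap between neighboring values that share this stored exponent."""
    return 2.0 ** (exponent_field - BIAS) / MANTISSA_STEPS


for field in (1, 7, 14):
    low = 2.0 ** (field - BIAS)
    print(f"values starting at {low}: gap = {spacing(field)}")
```

With these assumptions the gap is 0.125 for values starting at 1.0, but 16.0 for values starting at 128.0: the same three mantissa bits are stretched over a range that is 128 times wider.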
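With IEEE 754 (32-bit), the maximum relative rounding error is approximately 0.000006% ($2^{-24} \approx 6 \times 10^{-8}$). That is sufficient for most engineering work, but dangerous if you ignore it in comparison logic, such as testing two computed values for exact equality.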
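A short sketch of both points, under stated assumptions: `to_float32` is a hypothetical helper that rounds a Python float through the 32-bit format using the standard `struct` module, and the equality check runs on Python's native 64-bit floats, where the same principle applies at a smaller scale.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storing 0.1 in 32 bits changes it slightly, but stays within the bound.
stored = to_float32(0.1)
relative_error = abs(stored - 0.1) / 0.1
print(relative_error < 2.0 ** -24)                  # True

# Exact equality fails on rounded results; a tolerance does not.
print(0.1 + 0.2 == 0.3)                             # False
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))   # True
```

The tolerance `rel_tol=1e-9` is only an illustrative choice; a suitable value depends on the format in use and on how much error the preceding arithmetic can accumulate.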