
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Floating Point & Rounding</title>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Press+Start+2P&family=VT323&display=swap" rel="stylesheet">
    <style>
        :root {
            --bg-color: #121212;
            --panel-bg: #222222;
            --text-color: #dcdcdc;
            --accent-green: #4caf50;
            --accent-cyan: #00bcd4;
            --accent-pink: #e91e63;
            --accent-yellow: #ffeb3b;
            --border-light: #555;
            --shadow-hard: 4px 4px 0px #000;
        }

        * {
            box-sizing: border-box;
            scrollbar-width: thin;
            scrollbar-color: var(--accent-cyan) var(--bg-color);
        }

        body {
            margin: 0;
            padding: 0;
            background-color: var(--bg-color);
            color: var(--text-color);
            font-family: 'VT323', monospace;
            font-size: 1.3rem;
            line-height: 1.6;
            overflow-x: hidden;
        }

        /* CRT Scanline Effect Overlay */
        body::before {
            content: " ";
            display: block;
            position: fixed;
            top: 0;
            left: 0;
            bottom: 0;
            right: 0;
            background: linear-gradient(rgba(18, 16, 16, 0) 50%, rgba(0, 0, 0, 0.25) 50%), linear-gradient(90deg, rgba(255, 0, 0, 0.06), rgba(0, 255, 0, 0.02), rgba(0, 0, 255, 0.06));
            z-index: 1000;
            background-size: 100% 2px, 3px 100%;
            pointer-events: none;
        }

        h1, h2, h3 {
            font-family: 'Press Start 2P', cursive;
            text-transform: uppercase;
            line-height: 1.4;
            margin-top: 2rem;
        }

        h1 { color: var(--accent-yellow); font-size: 1.8rem; text-shadow: 4px 4px 0px #000; text-align: center; }
        h2 { color: var(--accent-cyan); font-size: 1.2rem; border-bottom: 4px solid var(--accent-cyan); padding-bottom: 10px; display: inline-block; }
        h3 { color: var(--accent-pink); font-size: 1rem; margin-top: 1.5rem; }

        a { color: var(--accent-yellow); text-decoration: none; border-bottom: 2px dashed var(--accent-yellow); transition: all 0.2s; }
        a:hover { background: var(--accent-yellow); color: #000; }

        /* Container Layout */
        .container {
            max-width: 900px;
            margin: 0 auto;
            padding: 20px;
            position: relative;
            z-index: 1;
        }

        /* 8-bit Box Styling */
        .retro-box {
            background: var(--panel-bg);
            border: 4px solid var(--text-color);
            box-shadow: var(--shadow-hard);
            padding: 2rem;
            margin-bottom: 3rem;
            position: relative;
        }

        .retro-box::after {
            content: "";
            position: absolute;
            top: -4px; left: -4px; right: -4px; bottom: -4px;
            box-shadow: inset 0 0 20px rgba(0,0,0,0.8);
            pointer-events: none;
        }

        /* Header / Hero */
        header {
            text-align: center;
            padding: 4rem 0;
            border-bottom: 4px solid var(--accent-green);
            margin-bottom: 2rem;
            background: #1a1a1a;
        }

        .pixel-subtitle {
            font-family: 'Press Start 2P', cursive;
            font-size: 0.7rem;
            color: var(--accent-green);
            margin-top: 1rem;
            letter-spacing: 2px;
        }

        /* Comparison Tables */
        .comparison-grid {
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 20px;
            margin-top: 20px;
        }

        @media (max-width: 768px) {
            .comparison-grid { grid-template-columns: 1fr; }
        }

        .cmp-card {
            border: 2px solid var(--border-light);
            padding: 15px;
            background: #1e1e1e;
        }
        
        .cmp-card strong { color: var(--accent-yellow); }

        /* Code/Math Blocks */
        code, pre {
            font-family: 'VT323', monospace;
            background: #000;
            color: var(--accent-green);
            padding: 2px 6px;
            border: 1px solid var(--accent-green);
        }
        
        pre {
            padding: 15px;
            display: block;
            overflow-x: auto;
            box-shadow: var(--shadow-hard);
        }

        .math-block {
            background: #000;
            border: 2px dashed var(--accent-pink);
            color: #fff;
            padding: 1.5rem;
            text-align: center;
            font-size: 1.5rem;
            margin: 20px 0;
        }

        /* Interactive Bit Manipulator */
        .bit-lab {
            background: #000;
            border: 4px solid var(--accent-cyan);
            padding: 20px;
            margin: 30px 0;
            text-align: center;
        }

        .bit-container {
            display: flex;
            justify-content: center;
            flex-wrap: wrap;
            gap: 10px;
            margin-bottom: 20px;
        }

        .bit {
            width: 50px;
            height: 70px;
            font-family: 'Press Start 2P', cursive;
            font-size: 1.5rem;
            display: flex;
            align-items: center;
            justify-content: center;
            cursor: pointer;
            border: 2px solid #fff;
            background: #222;
            transition: transform 0.1s;
            position: relative;
            user-select: none;
        }

        .bit:active { transform: translateY(4px); box-shadow: none !important; }
        .bit.one { background: var(--accent-green); color: #000; box-shadow: 0 0 10px var(--accent-green); }
        .bit.zero { background: #333; color: #555; }
        
        /* Bit Labels */
        .bit::before {
            content: attr(data-label);
            position: absolute;
            top: -25px;
            font-family: 'VT323';
            font-size: 1rem;
            color: #aaa;
            width: 100%;
            text-align: center;
        }

        /* Color coding groups */
        .bit.sign { border-color: var(--accent-pink); }
        .bit.exp { border-color: var(--accent-cyan); }
        .bit.mant { border-color: var(--accent-yellow); }

        .lab-readout {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
            gap: 15px;
            text-align: left;
            margin-top: 20px;
            font-family: 'VT323';
        }

        .readout-box {
            background: #111;
            border: 1px solid #444;
            padding: 10px;
        }
        
        .readout-label { color: #888; font-size: 0.9rem; display: block; }
        .readout-value { font-size: 1.4rem; color: #fff; }

        .final-result {
            grid-column: 1 / -1;
            background: var(--accent-green);
            color: #000;
            text-align: center;
            padding: 10px;
            font-weight: bold;
            font-size: 1.8rem;
            border: 4px solid #fff;
            box-shadow: 4px 4px 0px #fff;
            margin-top: 10px;
        }

        /* Rounding Graphs (CSS only) */
        .chart-container {
            height: 200px;
            border-left: 2px solid #fff;
            border-bottom: 2px solid #fff;
            position: relative;
            margin: 40px 20px;
            background: linear-gradient(to top, rgba(255,255,255,0.05) 1px, transparent 1px);
            background-size: 100% 20px;
        }

        .chart-point {
            width: 8px;
            height: 8px;
            background: var(--accent-pink);
            position: absolute;
            border-radius: 0; /* Square pixels */
        }
        
        .chart-label {
            position: absolute;
            bottom: -30px;
            left: 50%;
            transform: translateX(-50%);
            font-size: 0.9rem;
            color: #888;
        }

        footer {
            text-align: center;
            padding: 3rem;
            background: #000;
            color: #555;
            font-size: 0.9rem;
            border-top: 4px solid var(--accent-green);
        }

        /* Utility */
        .highlight { color: var(--accent-yellow); font-weight: bold; }
        .text-center { text-align: center; }
        
        /* Tutorial step */
        .step-badge {
            background: var(--accent-pink);
            color: white;
            padding: 2px 8px;
            font-family: 'Press Start 2P';
            font-size: 0.6rem;
            margin-right: 10px;
            vertical-align: middle;
        }
    </style>
</head>
<body>

<header>
    <div class="container">
        <h1>CSE 230: Module 4.6</h1>
        <div class="pixel-subtitle">Floating Point & Rounding Error</div>
        <br>
        <span style="color:var(--accent-cyan)">[ INSERT COIN TO LEARN ]</span>
    </div>
</header>

<div class="container">

    <section class="retro-box">
        <h2>01. Non-Integer Numbers</h2>
        <p>In standard binary, we handle integers easily (unsigned or two's complement). But what about <strong>3.1415</strong> or <strong>0.005</strong>? In base 10 (decimal), digits to the right of the decimal point represent negative powers of 10 (10<sup>&minus;1</sup>, 10<sup>&minus;2</sup>, ...).</p>
        <p>Binary (base 2) works the same way. Digits to the right of the binary point represent negative powers of 2 (2<sup>&minus;1</sup> = 1/2, 2<sup>&minus;2</sup> = 1/4, ...).</p>
        
        <div class="math-block">
            ... 2<sup>2</sup> | 2<sup>1</sup> | 2<sup>0</sup> . 2<sup>&minus;1</sup> | 2<sup>&minus;2</sup> | 2<sup>&minus;3</sup> ...
        </div>
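        <p>As a quick check of the place values above, here is a minimal JavaScript sketch that converts a binary string with a binary point into its decimal value. The function name <code>binaryToDecimal</code> is hypothetical (it is not part of this page's script), and the sketch assumes the input holds only 0, 1, and at most one dot.</p>
<pre>
```javascript
// Hypothetical helper: convert a binary string such as "101.011" to a number.
// Assumes the string holds only '0', '1', and at most one '.'.
function binaryToDecimal(text) {
    const [intPart, fracPart = ""] = text.split(".");
    let value = 0;
    // Integer digits: the rightmost digit weighs 2^0, the next 2^1, and so on.
    [...intPart].reverse().forEach((digit, i) => {
        value += Number(digit) * 2 ** i;
    });
    // Fraction digits: the first digit after the point weighs 2^-1, then 2^-2, ...
    [...fracPart].forEach((digit, i) => {
        value += Number(digit) * 2 ** -(i + 1);
    });
    return value;
}

console.log(binaryToDecimal("101.011")); // 4 + 1 + 0.25 + 0.125 = 5.375
console.log(binaryToDecimal("0.1"));     // 0.5
```
</pre>
        <p>Notice that 0.1 in binary is one half, not one tenth: each column to the right of the point halves in weight.</p>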
        
        <p>There are two main ways to handle this in computer architecture:</p>
        
        <div class="comparison-grid">
            <div class="cmp-card">
                <h3>Fixed Point</h3>
                <p>We lock the binary point at a fixed column (e.g., the last 4 bits are always the fractional part).</p>
                <ul>
                    <li><strong style="color:var(--accent-green)">PRO:</strong> Simple math and logic; arithmetic works like integer arithmetic.</li>
                    <li><strong style="color:var(--accent-pink)">CON:</strong> Inflexible. Every bit given to the fraction is taken from the integer part, so you trade range for resolution. If you need tiny numbers, you can't have huge numbers.</li>
                </ul>
            </div>
            <div class="cmp-card">
                <h3>Floating Point (IEEE 754)</h3>
                <p>The binary point "floats" using scientific notation (1.xxx &times; 2<sup>exp</sup>).</p>
                <ul>
                    <li><strong style="color:var(--accent-green)">PRO:</strong> Huge dynamic range (tiny to massive).</li>
                    <li><strong style="color:var(--accent-pink)">CON:</strong> More complex hardware; the spacing between representable values varies with magnitude.</li>
                </ul>
            </div>
        </div>
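        <p>To make the fixed-point trade-off concrete, here is a small sketch. It assumes an unsigned 8-bit format with 4 integer bits and 4 fraction bits (chosen only to match the "last 4 bits" example above); the helper names <code>toFixed44</code> and <code>fromFixed44</code> are hypothetical.</p>
<pre>
```javascript
// Assumed format: unsigned 8-bit fixed point, 4 integer bits + 4 fraction bits.
const FRACTION_BITS = 4;
const SCALE = 2 ** FRACTION_BITS; // 16, so one step is 1/16 = 0.0625

// Hypothetical helper: store a real number as an 8-bit integer code.
function toFixed44(x) {
    const code = Math.round(x * SCALE);
    // Clamp to what 8 bits can hold (0..255).
    return Math.min(255, Math.max(0, code));
}

// Hypothetical helper: recover the value that a code represents.
function fromFixed44(code) {
    return code / SCALE;
}

console.log(fromFixed44(toFixed44(3.1415))); // 3.125 (nearest multiple of 1/16)
console.log(fromFixed44(toFixed44(0.005)));  // 0 (smaller than half a step)
console.log(fromFixed44(toFixed44(100)));    // 15.9375 (largest storable value)
```
</pre>
        <p>Under these assumptions, both the tiny value and the large value are lost: the format cannot go below a step of 1/16 or above 15.9375. Floating point addresses exactly this limitation.</p>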
    </section>

    <section class="retro-box">
        <h2>02. Anatomy of IEEE 754</h2>
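        <p>We will use the <strong>Hypothetical 8-Bit Model</strong> from your text to explain the standard 32-bit format. The 32-bit format has the same three fields, only wider: 1 sign bit, 8 exponent bits (bias 127), and 23 fraction bits. The 8-bit model divides its bits into three groups:</p>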
        
        <ol>
            <li><span class="step-badge">1</span> <strong>Sign Bit (1 bit):</strong> 0 = Positive, 1 = Negative.</li>
            <li><span class="step-badge">2</span> <strong>Exponent (4 bits):</strong> Sets the scale by moving the binary point. It is stored with a <strong>Bias</strong> of 7, so the stored value is 7 more than the actual exponent. <br><code>Actual Exp = Stored Exp - 7</code></li>
            <li><span class="step-badge">3</span> <strong>Significand (3 bits):</strong> Sets the precision. The three stored bits are the fraction (weights 1/2, 1/4, 1/8); a leading "1" before the binary point is assumed and not stored. <br><code>Significand = 1 + fraction</code></li>
        </ol>

        <div class="math-block">
            Value = (&minus;1)<sup>S</sup> &times; (1 + Signif) &times; 2<sup>(Exp &minus; 7)</sup>
        </div>
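        <p>The formula above can be sketched as a short function. The name <code>decodeMini8</code> is hypothetical, and the sketch assumes every exponent pattern is treated as an ordinary number, which is a simplification of real IEEE 754.</p>
<pre>
```javascript
const BIAS = 7;

// Hypothetical helper: decode an 8-character bit string laid out as
// 1 sign bit, 4 exponent bits, 3 fraction bits.
// Assumption: every exponent pattern is treated as an ordinary number.
function decodeMini8(bitString) {
    const bitsOnly = bitString.replace(/\s/g, ""); // allow spaced input like "0 0111 000"
    const sign = bitsOnly[0] === "1" ? -1 : 1;
    const storedExp = parseInt(bitsOnly.slice(1, 5), 2);
    const fraction = parseInt(bitsOnly.slice(5, 8), 2) / 8; // 3 bits, so steps of 1/8
    return sign * (1 + fraction) * 2 ** (storedExp - BIAS);
}

console.log(decodeMini8("0 0111 000")); // 1
console.log(decodeMini8("0 1001 010")); // 1.25 * 2^2 = 5
console.log(decodeMini8("1 0110 100")); // -(1.5 * 2^-1) = -0.75
```
</pre>
        <p>Dividing the 3-bit fraction field by 8 is the same as adding up the weights 1/2, 1/4, and 1/8 for each bit that is set.</p>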
    </section>

    <section class="retro-box" style="border-color: var(--accent-cyan);">
        <h2 style="color:var(--accent-yellow)">03. Interactive Bit Lab</h2>
        <p>Click the bits below to flip them (0/1) and see how the floating-point value is calculated in real time. This models the 8-bit example.</p>
        
        <div class="bit-lab">
            <div class="bit-container" id="bitSwitches">
                </div>

            <div class="lab-readout">
                <div class="readout-box" style="border-color:var(--accent-pink)">
                    <span class="readout-label">SIGN BIT</span>
                    <div id="outSign" class="readout-value">Positive (+)</div>
                </div>
                <div class="readout-box" style="border-color:var(--accent-cyan)">
                    <span class="readout-label">EXPONENT (Bias 7)</span>
                    <div id="outExp" class="readout-value">2^0</div>
                </div>
                <div class="readout-box" style="border-color:var(--accent-yellow)">
                    <span class="readout-label">SIGNIFICAND (1.xxx)</span>
                    <div id="outMant" class="readout-value">1.0</div>
                </div>
                <div class="readout-box final-result">
                    <div id="outFinal">1.0</div>
                </div>
            </div>
            
            <p style="font-size: 0.8rem; color:#666; margin-top:10px;">Calculation: <span id="outFormula"></span></p>
        </div>

        <h3>Try These Text Examples:</h3>
        <ul>
            <li><strong>Example 1 (1.0):</strong> <code>0 0111 000</code> (Sign +, Exp is 7-7=0, fraction is 0, so 1.0 &times; 2<sup>0</sup> = 1.0)</li>
            <li><strong>Example 2 (1.875):</strong> <code>0 0111 111</code> (Sign +, Exp is 7-7=0, fraction is 0.875, so 1.875 &times; 2<sup>0</sup> = 1.875)</li>
            <li><strong>Example 3 (-3.0):</strong> <code>1 1000 100</code> (Sign -, Exp is 8-7=1, fraction is 0.5, so -1.5 &times; 2<sup>1</sup> = -3.0)</li>
        </ul>
    </section>

    <section class="retro-box">
        <h2>04. Rounding Error</h2>
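        <p>Because we have a finite number of bits (8 in our model, 32 in the standard single-precision format), we cannot represent every number on the infinite number line. We can only store specific "dots"; any other value is rounded to a nearby dot, and the difference is the rounding error.</p>
        
        <div style="background:#000; padding:15px; margin-bottom:15px; border-left:4px solid var(--accent-pink);">
            <p><strong>Key Concept:</strong> The gaps between representable numbers get <em>larger</em> as the numbers get bigger. Each time the exponent goes up by 1, the gap doubles.</p>
        </div>

        <p>Near zero, the dots are dense, so the absolute error is small. At high magnitudes, the dots are sparse, so the absolute error is larger, even though the error relative to the value stays about the same.</p>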

        <div class="chart-container">
            <div class="chart-point" style="left: 5%; bottom: 10px;"></div>
            <div class="chart-point" style="left: 8%; bottom: 10px;"></div>
            <div class="chart-point" style="left: 11%; bottom: 10px;"></div>
            <div class="chart-point" style="left: 14%; bottom: 10px;"></div>
            
            <div class="chart-point" style="left: 25%; bottom: 10px;"></div>
            <div class="chart-point" style="left: 30%; bottom: 10px;"></div>
            
            <div class="chart-point" style="left: 50%; bottom: 10px;"></div>
            <div class="chart-point" style="left: 60%; bottom: 10px;"></div>
            
            <div class="chart-point" style="left: 85%; bottom: 10px;"></div>
            
            <div class="chart-label">0 &larr; Value Magnitude &rarr; MAX</div>
        </div>
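        <p>The growing gaps can be computed directly for the 8-bit model. In this sketch the helper name <code>gapForStoredExponent</code> is hypothetical; it relies only on the model's 3 fraction bits and bias of 7.</p>
<pre>
```javascript
const BIAS = 7;
const FRACTION_BITS = 3;

// Hypothetical helper: distance between neighboring representable values
// that share the same stored exponent in the 8-bit model.
function gapForStoredExponent(storedExp) {
    const actualExp = storedExp - BIAS;
    // The smallest fraction step is 2^-3, scaled by 2^actualExp.
    return 2 ** (actualExp - FRACTION_BITS);
}

// Print the gap for a few stored exponents, from small values to large ones.
[1, 7, 10, 14].forEach((storedExp) => {
    const low = 2 ** (storedExp - BIAS);
    console.log(`values from ${low} up to ${2 * low}: gap ${gapForStoredExponent(storedExp)}`);
});
// values from 0.015625 up to 0.03125: gap 0.001953125
// values from 1 up to 2: gap 0.125
// values from 8 up to 16: gap 1
// values from 128 up to 256: gap 16
```
</pre>
        <p>In every row the gap is one eighth of the lower bound, which is why the relative error stays steady while the absolute error grows.</p>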
        
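        <p>With IEEE 754 (32-bit), the maximum relative rounding error when a value is rounded to the nearest representable number is approximately <strong>0.000006%</strong> (2<sup>&minus;24</sup>, about 6 &times; 10<sup>&minus;8</sup>). That is sufficient for most engineering work, but dangerous if you ignore it, for example by testing two computed values for exact equality.</p>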
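        <p>You can observe this in a browser console. JavaScript numbers are 64-bit, but the built-in <code>Math.fround</code> rounds a value to the nearest 32-bit float. In the sketch below, the 64-bit value of 0.1 stands in for the "exact" number, and the helpers <code>relativeError</code> and <code>nearlyEqual</code> are hypothetical, as is the default tolerance.</p>
<pre>
```javascript
// Math.fround rounds a 64-bit JavaScript number to the nearest 32-bit float.
const stored = Math.fround(0.1);

// Hypothetical helper: relative error of an approximation.
function relativeError(exact, approx) {
    return Math.abs(approx - exact) / Math.abs(exact);
}

// Hypothetical helper: compare with a relative tolerance instead of ===.
// The default tolerance (a few 32-bit rounding steps) is an assumption;
// pick one that fits your own calculation.
function nearlyEqual(a, b, relTol = 4 * 2 ** -24) {
    const scale = Math.max(Math.abs(a), Math.abs(b));
    return relTol * scale >= Math.abs(a - b);
}

console.log(stored === 0.1);                         // false: 0.1 was rounded
console.log(2 ** -24 >= relativeError(0.1, stored)); // true: within the bound
console.log(nearlyEqual(stored, 0.1));               // true
```
</pre>
        <p>The exact-equality test fails even though the two values agree to about seven decimal digits, which is why comparison logic usually checks a tolerance instead.</p>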
    </section>

</div>

<footer>
    CSE 230 // SPRING 2026 // ASU <br>
    <span style="color:var(--accent-green)">SYSTEM READY.</span>
</footer>

<script>
    // 8-Bit Floating Point Logic
    // Structure: 1 Sign, 4 Exponent, 3 Mantissa
    const bias = 7;
    
    // Initial State: 0 0111 000 (Represents 1.0)
    // Indexes: 0=Sign, 1-4=Exp, 5-7=Mantissa
    let bits = [0, 0, 1, 1, 1, 0, 0, 0];

    const container = document.getElementById('bitSwitches');

    function renderBits() {
        container.innerHTML = '';
        bits.forEach((val, index) => {
            const el = document.createElement('div');
            // Each bit is drawn as either a lit "one" or a dim "zero"
            el.classList.add('bit', val === 1 ? 'one' : 'zero');
            el.textContent = String(val);

            // Labels and Colors
            if (index === 0) {
                el.dataset.label = "SIGN";
                el.classList.add('sign');
            } else if (index >= 1 && index <= 4) {
                el.dataset.label = index === 2 ? "EXP" : ""; // Only label middle
                el.classList.add('exp');
            } else {
                el.dataset.label = index === 6 ? "MANT" : "";
                el.classList.add('mant');
            }

            // Click Handler
            el.onclick = () => {
                bits[index] = bits[index] === 0 ? 1 : 0;
                calculate();
                renderBits(); // Re-render to update UI
            };
            
            container.appendChild(el);
        });
    }

    function calculate() {
        // 1. Sign
        const s = bits[0];
        const signMultiplier = s === 0 ? 1 : -1;
        document.getElementById('outSign').innerText = s === 0 ? "Positive (+)" : "Negative (-)";
        document.getElementById('outSign').style.color = s === 0 ? "var(--accent-green)" : "var(--accent-pink)";

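        // 2. Exponent (4 bits)
        // Join the four exponent bits into a binary string, then parse it as an integer.
        // Note: this lab treats all 16 exponent patterns as ordinary numbers. Real IEEE 754
        // reserves the all-zeros and all-ones patterns (zero/subnormals and infinity/NaN).
        const expBits = bits.slice(1, 5).join('');
        const expInt = parseInt(expBits, 2);
        const actualExp = expInt - bias;
        document.getElementById('outExp').innerText = `2^${actualExp} (Raw: ${expInt})`;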

        // 3. Significand (3 bits)
        // 1st bit = 1/2, 2nd = 1/4, 3rd = 1/8
        const m1 = bits[5] * 0.5;
        const m2 = bits[6] * 0.25;
        const m3 = bits[7] * 0.125;
        const fraction = m1 + m2 + m3;
        const actualMant = 1 + fraction;
        document.getElementById('outMant').innerText = actualMant.toFixed(3);

        // 4. Final Math
        const result = signMultiplier * actualMant * Math.pow(2, actualExp);
        
        document.getElementById('outFinal').innerText = result;
        
        // Update formula text
        const signStr = s === 0 ? "" : "-";
        document.getElementById('outFormula').innerHTML = 
            `${signStr}1.0 * ${actualMant} * 2<sup>${actualExp}</sup>`;
    }

    // Initialize
    renderBits();
    calculate();

</script>

</body>
</html>