<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Computer Architecture & Organization | Complete Reference</title>
<link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@300;400;500;600;700&family=Space+Grotesk:wght@300;400;500;600;700&family=Outfit:wght@300;400;500;600;700&display=swap" rel="stylesheet">
<style>
:root {
--bg-primary: #0a0e14;
--bg-secondary: #0d1117;
--bg-tertiary: #161b22;
--bg-card: #1a1f26;
--accent-cyan: #00d9ff;
--accent-green: #39ff14;
--accent-purple: #bf5af2;
--accent-orange: #ff9f43;
--accent-red: #ff5f56;
--text-primary: #e6edf3;
--text-secondary: #8b949e;
--text-muted: #484f58;
--border-color: #30363d;
--glow-cyan: 0 0 20px rgba(0, 217, 255, 0.3);
--glow-green: 0 0 20px rgba(57, 255, 20, 0.3);
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
html {
scroll-behavior: smooth;
}
body {
font-family: 'Outfit', sans-serif;
background: var(--bg-primary);
color: var(--text-primary);
line-height: 1.7;
min-height: 100vh;
overflow-x: hidden;
}
/* Animated background grid */
body::before {
content: '';
position: fixed;
top: 0;
left: 0;
right: 0;
bottom: 0;
background-image:
linear-gradient(rgba(0, 217, 255, 0.03) 1px, transparent 1px),
linear-gradient(90deg, rgba(0, 217, 255, 0.03) 1px, transparent 1px);
background-size: 50px 50px;
pointer-events: none;
z-index: -1;
}
/* Gradient orbs */
.orb {
position: fixed;
border-radius: 50%;
filter: blur(100px);
opacity: 0.15;
pointer-events: none;
z-index: -1;
}
.orb-1 {
width: 600px;
height: 600px;
background: var(--accent-cyan);
top: -200px;
right: -200px;
animation: float 20s ease-in-out infinite;
}
.orb-2 {
width: 400px;
height: 400px;
background: var(--accent-purple);
bottom: -100px;
left: -100px;
animation: float 25s ease-in-out infinite reverse;
}
@keyframes float {
0%, 100% { transform: translate(0, 0); }
50% { transform: translate(30px, 30px); }
}
/* Header */
header {
position: relative;
padding: 80px 40px;
text-align: center;
border-bottom: 1px solid var(--border-color);
background: linear-gradient(180deg, var(--bg-secondary) 0%, var(--bg-primary) 100%);
}
.header-badge {
display: inline-block;
padding: 8px 16px;
background: rgba(0, 217, 255, 0.1);
border: 1px solid var(--accent-cyan);
border-radius: 20px;
font-family: 'JetBrains Mono', monospace;
font-size: 0.75rem;
color: var(--accent-cyan);
text-transform: uppercase;
letter-spacing: 2px;
margin-bottom: 24px;
animation: pulse 3s ease-in-out infinite;
}
@keyframes pulse {
0%, 100% { box-shadow: 0 0 0 0 rgba(0, 217, 255, 0.4); }
50% { box-shadow: 0 0 20px 5px rgba(0, 217, 255, 0.2); }
}
h1 {
font-family: 'Space Grotesk', sans-serif;
font-size: clamp(2.5rem, 5vw, 4rem);
font-weight: 700;
background: linear-gradient(135deg, var(--text-primary) 0%, var(--accent-cyan) 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
margin-bottom: 16px;
letter-spacing: -1px;
}
.subtitle {
font-size: 1.1rem;
color: var(--text-secondary);
max-width: 600px;
margin: 0 auto;
}
/* Navigation */
nav {
position: sticky;
top: 0;
z-index: 100;
background: rgba(10, 14, 20, 0.9);
backdrop-filter: blur(20px);
border-bottom: 1px solid var(--border-color);
padding: 16px 40px;
overflow-x: auto;
}
nav ul {
display: flex;
gap: 8px;
list-style: none;
max-width: 1400px;
margin: 0 auto;
justify-content: center;
flex-wrap: wrap;
}
nav a {
display: block;
padding: 10px 18px;
color: var(--text-secondary);
text-decoration: none;
font-family: 'JetBrains Mono', monospace;
font-size: 0.8rem;
border-radius: 8px;
transition: all 0.3s ease;
white-space: nowrap;
border: 1px solid transparent;
}
nav a:hover {
color: var(--accent-cyan);
background: rgba(0, 217, 255, 0.1);
border-color: var(--accent-cyan);
}
nav a .num {
color: var(--accent-green);
margin-right: 6px;
}
/* Main Content */
main {
max-width: 1200px;
margin: 0 auto;
padding: 60px 40px;
}
/* Section Styling */
section {
margin-bottom: 80px;
animation: fadeIn 0.6s ease-out;
}
@keyframes fadeIn {
from { opacity: 0; transform: translateY(20px); }
to { opacity: 1; transform: translateY(0); }
}
.section-header {
display: flex;
align-items: center;
gap: 20px;
margin-bottom: 40px;
padding-bottom: 20px;
border-bottom: 1px solid var(--border-color);
}
.section-number {
font-family: 'JetBrains Mono', monospace;
font-size: 3rem;
font-weight: 700;
color: var(--accent-cyan);
opacity: 0.3;
line-height: 1;
}
h2 {
font-family: 'Space Grotesk', sans-serif;
font-size: 1.8rem;
font-weight: 600;
color: var(--text-primary);
}
h3 {
font-family: 'Space Grotesk', sans-serif;
font-size: 1.3rem;
font-weight: 600;
color: var(--accent-cyan);
margin: 40px 0 20px;
display: flex;
align-items: center;
gap: 12px;
}
h3::before {
content: '▸';
color: var(--accent-green);
}
h4 {
font-size: 1.1rem;
color: var(--text-primary);
margin: 24px 0 12px;
font-weight: 500;
}
p {
color: var(--text-secondary);
margin-bottom: 16px;
}
/* Content Cards */
.card {
background: var(--bg-card);
border: 1px solid var(--border-color);
border-radius: 12px;
padding: 28px;
margin-bottom: 24px;
transition: all 0.3s ease;
position: relative;
overflow: hidden;
}
.card::before {
content: '';
position: absolute;
top: 0;
left: 0;
width: 3px;
height: 100%;
background: linear-gradient(180deg, var(--accent-cyan), var(--accent-purple));
opacity: 0;
transition: opacity 0.3s ease;
}
.card:hover {
border-color: var(--accent-cyan);
transform: translateX(4px);
}
.card:hover::before {
opacity: 1;
}
/* Code Blocks */
pre {
background: var(--bg-secondary);
border: 1px solid var(--border-color);
border-radius: 8px;
padding: 20px;
overflow-x: auto;
margin: 16px 0;
position: relative;
}
pre::before {
content: attr(data-lang);
position: absolute;
top: 8px;
right: 12px;
font-family: 'JetBrains Mono', monospace;
font-size: 0.7rem;
color: var(--text-muted);
text-transform: uppercase;
letter-spacing: 1px;
}
code {
font-family: 'JetBrains Mono', monospace;
font-size: 0.9rem;
line-height: 1.6;
color: var(--text-primary);
}
.code-comment { color: var(--text-muted); }
.code-keyword { color: var(--accent-purple); }
.code-register { color: var(--accent-cyan); }
.code-number { color: var(--accent-orange); }
.code-label { color: var(--accent-green); }
.code-string { color: var(--accent-orange); }
/* Inline code */
p code, li code {
background: rgba(0, 217, 255, 0.1);
padding: 2px 8px;
border-radius: 4px;
font-size: 0.85em;
color: var(--accent-cyan);
}
/* Tables */
.table-wrapper {
overflow-x: auto;
margin: 20px 0;
border-radius: 8px;
border: 1px solid var(--border-color);
}
table {
width: 100%;
border-collapse: collapse;
font-size: 0.9rem;
}
th {
background: var(--bg-tertiary);
color: var(--accent-cyan);
font-family: 'JetBrains Mono', monospace;
font-weight: 500;
text-align: left;
padding: 14px 18px;
border-bottom: 2px solid var(--accent-cyan);
text-transform: uppercase;
font-size: 0.75rem;
letter-spacing: 1px;
}
td {
padding: 14px 18px;
border-bottom: 1px solid var(--border-color);
color: var(--text-secondary);
}
tr:hover td {
background: rgba(0, 217, 255, 0.05);
}
/* Lists */
ul, ol {
margin: 16px 0 16px 24px;
color: var(--text-secondary);
}
li {
margin-bottom: 10px;
padding-left: 8px;
}
li::marker {
color: var(--accent-green);
}
/* Highlight boxes */
.highlight {
background: rgba(0, 217, 255, 0.08);
border-left: 3px solid var(--accent-cyan);
padding: 16px 20px;
border-radius: 0 8px 8px 0;
margin: 20px 0;
}
.highlight.warning {
background: rgba(255, 159, 67, 0.08);
border-left-color: var(--accent-orange);
}
.highlight.important {
background: rgba(191, 90, 242, 0.08);
border-left-color: var(--accent-purple);
}
.highlight-title {
font-family: 'JetBrains Mono', monospace;
font-size: 0.75rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 1px;
color: var(--accent-cyan);
margin-bottom: 8px;
}
.highlight.warning .highlight-title { color: var(--accent-orange); }
.highlight.important .highlight-title { color: var(--accent-purple); }
/* Formula display */
.formula {
background: var(--bg-secondary);
border: 1px solid var(--border-color);
border-radius: 8px;
padding: 20px;
text-align: center;
font-family: 'JetBrains Mono', monospace;
font-size: 1.1rem;
color: var(--accent-cyan);
margin: 20px 0;
letter-spacing: 1px;
}
/* Diagram containers */
.diagram {
background: var(--bg-secondary);
border: 1px solid var(--border-color);
border-radius: 8px;
padding: 30px;
margin: 24px 0;
text-align: center;
font-family: 'JetBrains Mono', monospace;
overflow-x: auto;
}
.diagram-title {
font-size: 0.8rem;
color: var(--text-muted);
margin-bottom: 20px;
text-transform: uppercase;
letter-spacing: 2px;
}
.diagram pre {
background: transparent;
border: none;
padding: 0;
display: inline-block;
text-align: left;
}
/* Grid layouts */
.grid-2 {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
gap: 24px;
margin: 24px 0;
}
/* Subsection styling */
.subsection {
margin-left: 20px;
padding-left: 20px;
border-left: 1px solid var(--border-color);
margin-top: 30px;
}
/* Tags */
.tag {
display: inline-block;
padding: 4px 10px;
background: rgba(57, 255, 20, 0.1);
border: 1px solid var(--accent-green);
border-radius: 4px;
font-family: 'JetBrains Mono', monospace;
font-size: 0.7rem;
color: var(--accent-green);
text-transform: uppercase;
letter-spacing: 1px;
margin-right: 8px;
}
.tag.purple {
background: rgba(191, 90, 242, 0.1);
border-color: var(--accent-purple);
color: var(--accent-purple);
}
.tag.orange {
background: rgba(255, 159, 67, 0.1);
border-color: var(--accent-orange);
color: var(--accent-orange);
}
/* Footer */
footer {
text-align: center;
padding: 60px 40px;
border-top: 1px solid var(--border-color);
background: var(--bg-secondary);
}
footer p {
font-family: 'JetBrains Mono', monospace;
font-size: 0.8rem;
color: var(--text-muted);
}
/* Scrollbar */
::-webkit-scrollbar {
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track {
background: var(--bg-primary);
}
::-webkit-scrollbar-thumb {
background: var(--border-color);
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: var(--text-muted);
}
/* Selection */
::selection {
background: var(--accent-cyan);
color: var(--bg-primary);
}
/* Responsive */
@media (max-width: 768px) {
header, main, footer, nav {
padding-left: 20px;
padding-right: 20px;
}
.section-header {
flex-direction: column;
align-items: flex-start;
gap: 10px;
}
.section-number {
font-size: 2rem;
}
nav ul {
justify-content: flex-start;
}
}
</style>
</head>
<body>
<div class="orb orb-1"></div>
<div class="orb orb-2"></div>
<header>
<div class="header-badge">Complete Reference Guide</div>
<h1>Computer Architecture & Organization</h1>
<p class="subtitle">From high-level programs to hardware execution — everything you need to understand how computers actually work.</p>
</header>
<nav>
<ul>
<li><a href="#section-1"><span class="num">01</span>Program Execution</a></li>
<li><a href="#section-2"><span class="num">02</span>Performance</a></li>
<li><a href="#section-3"><span class="num">03</span>Data Representation</a></li>
<li><a href="#section-4"><span class="num">04</span>Assembly Language</a></li>
<li><a href="#section-5"><span class="num">05</span>Hardware Techniques</a></li>
<li><a href="#section-6"><span class="num">06</span>Datapath Design</a></li>
<li><a href="#section-7"><span class="num">07</span>Memory Hierarchy</a></li>
<li><a href="#section-8"><span class="num">08</span>Multiprocessors</a></li>
</ul>
</nav>
<main>
<!-- SECTION 1 -->
<section id="section-1">
<div class="section-header">
<span class="section-number">01</span>
<h2>How Programs Are Executed by a Computer System</h2>
</div>
<h3>Basic Components of a CPU Design</h3>
<div class="card">
<p>The Central Processing Unit (CPU) is the brain of the computer. It consists of several fundamental components that work together to execute instructions:</p>
<div class="grid-2">
<div>
<h4>Control Unit (CU)</h4>
<p>The "manager" of the CPU. It fetches instructions from memory, decodes them to understand what operation to perform, and coordinates all other components to execute the instruction. Think of it as the conductor of an orchestra.</p>
</div>
<div>
<h4>Arithmetic Logic Unit (ALU)</h4>
<p>The "calculator" of the CPU. Performs all arithmetic operations (add, subtract, multiply, divide) and logical operations (AND, OR, NOT, XOR, comparisons). Every computation flows through here.</p>
</div>
<div>
<h4>Registers</h4>
<p>Ultra-fast, small storage locations inside the CPU. Used to hold data currently being processed, addresses, and status information. Much faster than RAM because they're on the CPU chip itself.</p>
</div>
<div>
<h4>Program Counter (PC)</h4>
<p>A special register that holds the memory address of the next instruction to execute. After each instruction fetch, it typically increments to point to the next sequential instruction (unless a jump/branch occurs).</p>
</div>
</div>
<div class="diagram">
<div class="diagram-title">Simplified CPU Architecture</div>
<pre>
┌─────────────────────────────────────────────────────────────┐
│                             CPU                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                     Control Unit                     │   │
│  │    ┌─────────┐  ┌─────────────┐  ┌──────────────┐    │   │
│  │    │   PC    │  │ Instruction │  │   Control    │    │   │
│  │    │ Counter │──│  Register   │──│   Signals    │────┼───┼──▶
│  │    └─────────┘  └─────────────┘  └──────────────┘    │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                │
│  ┌─────────────────────────┼────────────────────────────┐   │
│  │      Register File      │                            │   │
│  │  ┌────┐  ┌────┐  ┌────┐ │  ┌────┐  ┌────┐  ┌────┐    │   │
│  │  │ R0 │  │ R1 │  │ R2 │ │  │ R3 │  │... │  │ Rn │    │   │
│  │  └────┘  └────┘  └────┘ │  └────┘  └────┘  └────┘    │   │
│  └─────────────────────────┼────────────────────────────┘   │
│                            │                                │
│  ┌─────────────────────────▼────────────────────────────┐   │
│  │                         ALU                          │   │
│  │        ADD  SUB  MUL  DIV  AND  OR  XOR  NOT         │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                      ◀────────┴────────▶
                           Memory Bus
</pre>
</div>
</div>
<h3>The Program Compilation Process</h3>
<div class="card">
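<p>Code written in a high-level language like C can't run directly on hardware. For a compiled language, the compilation process transforms human-readable source code into machine instructions the CPU understands. Interpreted languages such as Python instead run through an interpreter, which is itself a compiled program.</p>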
<div class="diagram">
<div class="diagram-title">Compilation Pipeline</div>
<pre>
 Source Code (.c)
         │
         ▼
┌─────────────────┐
│  PREPROCESSOR   │ → Handles #include, #define, macros
└────────┬────────┘   Produces expanded source code
         │
         ▼
┌─────────────────┐
│    COMPILER     │ → Converts C to Assembly language
└────────┬────────┘   Syntax checking, optimization
         │
         ▼
┌─────────────────┐
│    ASSEMBLER    │ → Converts Assembly to Object code
└────────┬────────┘   Machine instructions (binary)
         │
         ▼
┌─────────────────┐
│     LINKER      │ → Combines object files + libraries
└────────┬────────┘   Resolves external references
         │
         ▼
  Executable File
</pre>
</div>
<h4>Stage Details</h4>
<ul>
<li><strong>Preprocessor:</strong> Text substitution phase. Expands macros, includes header files, processes conditional compilation directives. Output is still C code.</li>
<li><strong>Compiler:</strong> The heavy lifting. Performs lexical analysis (tokenization), parsing (syntax tree), semantic analysis (type checking), optimization, and code generation. Output is assembly language.</li>
<li><strong>Assembler:</strong> Translates assembly mnemonics into binary machine code. Creates object files containing machine instructions plus symbol tables for linking.</li>
<li><strong>Linker:</strong> Combines multiple object files and libraries. Resolves symbol references (like function calls to library functions). Produces final executable with proper memory addresses.</li>
</ul>
</div>
<h3>The Instruction Cycle (Fetch-Decode-Execute)</h3>
<div class="card">
<p>Every instruction the CPU executes follows the same fundamental cycle. This continuous loop is the heartbeat of computation.</p>
<div class="diagram">
<div class="diagram-title">Instruction Cycle</div>
<pre>
     ┌───────────────────────────────────────────────────┐
     │                                                   │
     ▼                                                   │
┌─────────┐     ┌──────────┐     ┌───────────┐           │
│  FETCH  │ ──▶ │  DECODE  │ ──▶ │  EXECUTE  │ ──────────┘
└─────────┘     └──────────┘     └───────────┘
     │               │                 │
     │               │                 │
Read instruction  Determine       Perform the
from memory at    what operation  operation:
address in PC,    and operands,   ALU computation,
increment PC      set up control  memory access,
                  signals         or branch
</pre>
</div>
<div class="highlight">
<div class="highlight-title">The Three Stages Explained</div>
<p><strong>1. Fetch:</strong> The CPU reads the instruction from memory at the address stored in the Program Counter. The instruction is loaded into the Instruction Register. The PC is incremented to point to the next instruction.</p>
<p><strong>2. Decode:</strong> The Control Unit examines the instruction to determine: What operation? Which registers? What addressing mode? It generates control signals to coordinate the execution.</p>
<p><strong>3. Execute:</strong> The actual work happens. This might involve the ALU performing arithmetic, data being read/written to memory, or the PC being modified for branches/jumps.</p>
</div>
<p>Some architectures add additional stages:</p>
<ul>
<li><strong>Memory Access:</strong> For load/store instructions, access RAM to read or write data</li>
<li><strong>Write Back:</strong> Store the result of the operation back into a register</li>
</ul>
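<p>The cycle described above can be sketched as a small simulator. This is a minimal illustration under stated assumptions: the toy instructions (<code>LOADI</code>, <code>ADD</code>, <code>JNZ</code>, <code>HALT</code>), their tuple encoding, and the <code>run</code> function are invented here and do not correspond to any real ISA.</p>

```python
# Toy machine: opcodes and tuple encoding are hypothetical, for illustration only.
def run(program, registers=None):
    """Run a list of (opcode, operands...) tuples and return the registers."""
    regs = dict(registers or {})
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]        # FETCH: read the instruction at PC
        pc += 1                          # ...and advance PC past it
        opcode, *operands = instruction  # DECODE: split opcode from operands
        if opcode == "LOADI":            # EXECUTE: register = immediate value
            dest, value = operands
            regs[dest] = value
        elif opcode == "ADD":            # EXECUTE: ALU operation on registers
            dest, a, b = operands
            regs[dest] = regs[a] + regs[b]
        elif opcode == "JNZ":            # EXECUTE: branch overwrites the PC
            reg, target = operands
            if regs[reg] != 0:
                pc = target
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


# Sum 3 + 2 + 1 with a countdown loop.
program = [
    ("LOADI", "r0", 0),          # 0: accumulator
    ("LOADI", "r1", 3),          # 1: counter
    ("LOADI", "r2", -1),         # 2: constant -1
    ("ADD", "r0", "r0", "r1"),   # 3: accumulator += counter
    ("ADD", "r1", "r1", "r2"),   # 4: counter -= 1
    ("JNZ", "r1", 3),            # 5: loop while counter != 0
    ("HALT",),                   # 6: stop
]
print(run(program)["r0"])  # 6
```

<p>Note how the branch only changes which instruction is fetched next; the loop itself never needs to know that a jump occurred.</p>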
</div>
</section>
<!-- SECTION 2 -->
<section id="section-2">
<div class="section-header">
<span class="section-number">02</span>
<h2>Hardware Factors Impacting Performance</h2>
</div>
<h3>Measuring and Comparing Computer Performance</h3>
<div class="card">
<p>Performance can be measured in several ways, each telling a different part of the story:</p>
<div class="grid-2">
<div class="highlight">
<div class="highlight-title">Execution Time</div>
<p>Total time to complete a task. Lower is better. This is what end users care about most.</p>
</div>
<div class="highlight important">
<div class="highlight-title">Throughput</div>
<p>Tasks completed per unit time. Higher is better. Important for servers handling many requests.</p>
</div>
</div>
<div class="formula">
Performance = 1 / Execution Time
</div>
<p>Comparing two systems:</p>
<div class="formula">
Speedup = Performance_new / Performance_old = Time_old / Time_new
</div>
<h4>Key Metrics</h4>
<ul>
<li><strong>MIPS (Million Instructions Per Second):</strong> Raw instruction throughput. Can be misleading because different instructions do different amounts of work.</li>
<li><strong>FLOPS (Floating Point Operations Per Second):</strong> Better for scientific computing comparisons. Measures actual computational work.</li>
<li><strong>Benchmarks (SPEC, Geekbench):</strong> Standardized programs that measure real-world performance across various workloads.</li>
</ul>
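<p>As a quick worked instance of the two formulas above, the following sketch computes performance and speedup. The timings are hypothetical values chosen for illustration, and the function names are not from any library.</p>

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_old_s, time_new_s):
    """Speedup = Time_old / Time_new (equivalently Perf_new / Perf_old)."""
    return time_old_s / time_new_s


# Hypothetical measurements: the same task on an old and a new system.
time_old, time_new = 10.0, 4.0
print(speedup(time_old, time_new))  # 2.5
```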
</div>
<h3>Propagation Delay</h3>
<div class="card">
<p>Propagation delay is the time it takes for a signal to travel through a logic gate or circuit. It's the fundamental speed limit of digital circuits.</p>
<div class="formula">
t_pd = Time for output to change after input changes
</div>
<p>Propagation delay is affected by:</p>
<ul>
<li><strong>Gate complexity:</strong> More transistors = more delay</li>
<li><strong>Capacitive load:</strong> More outputs to drive = slower switching</li>
<li><strong>Wire length:</strong> Signals take time to travel, even at near light speed</li>
<li><strong>Temperature:</strong> Higher temps generally increase delay</li>
<li><strong>Voltage:</strong> Lower voltage = slower transitions</li>
</ul>
<div class="highlight warning">
<div class="highlight-title">Critical Path</div>
<p>The critical path is the longest path through combinational logic between registers. It determines the minimum clock period and thus the maximum clock frequency. Timing optimization therefore focuses on shortening the critical path, since speeding up any other path does not raise the achievable clock frequency.</p>
</div>
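<p>The relationship between the critical path and clock frequency can be sketched numerically. This is a simplified model: the path names and delays are hypothetical, and it ignores register setup time and clock skew, which a real timing analysis would add to the period.</p>

```python
def max_clock_frequency_hz(path_delays_ns):
    """Upper bound on clock frequency set by the slowest (critical) path."""
    critical_path_ns = max(path_delays_ns)
    return 1.0 / (critical_path_ns * 1e-9)


# Hypothetical register-to-register path delays in nanoseconds.
paths = {"alu": 0.8, "register read": 0.3, "address calc": 0.5}
f_max = max_clock_frequency_hz(paths.values())
print(f"{f_max / 1e9:.2f} GHz")  # 1.25 GHz
```

<p>Shortening the 0.3 ns or 0.5 ns paths would not change the result; only the 0.8 ns path limits the clock.</p>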
</div>
<h3>Clock Speed and Performance</h3>
<div class="card">
<p>Clock speed (frequency) determines how many cycles per second the CPU runs. Higher clock = more operations per second, but there are tradeoffs.</p>
<div class="formula">
Clock Period (T) = 1 / Frequency (f)
</div>
<div class="formula">
CPU Time = Instruction Count × CPI × Clock Period
</div>
<div class="formula">
CPU Time = (Instruction Count × CPI) / Clock Frequency
</div>
<p>Where <strong>CPI</strong> = Cycles Per Instruction (average)</p>
<h4>Why Not Just Crank Up Clock Speed?</h4>
<ul>
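<li><strong>Power consumption:</strong> Dynamic power is roughly P ∝ C·V²·f. Higher frequencies usually also require a higher supply voltage, so in practice power grows roughly with frequency cubed (P ∝ f³). Double the clock, roughly 8× the heat.</li>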
<li><strong>Heat dissipation:</strong> Can't cool it fast enough. This is why we hit the "frequency wall" around 4-5 GHz.</li>
<li><strong>Signal integrity:</strong> At high frequencies, wires start acting like antennas and transmission lines.</li>
<li><strong>Timing margins:</strong> Less room for error in meeting setup/hold times.</li>
</ul>
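<p>The CPU time formula above shows why clock speed alone does not decide performance. The sketch below compares two hypothetical designs; the instruction count, CPI values, and frequencies are made up for illustration.</p>

```python
def cpu_time_s(instruction_count, cpi, clock_hz):
    """CPU Time = (Instruction Count x CPI) / Clock Frequency."""
    return instruction_count * cpi / clock_hz


instructions = 1_000_000_000  # same program on both designs (assumed)

# Design A: lower clock, lower average CPI.
time_a = cpu_time_s(instructions, cpi=1.5, clock_hz=3e9)
# Design B: higher clock, but a deeper pipeline raises average CPI.
time_b = cpu_time_s(instructions, cpi=2.2, clock_hz=4e9)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")  # A: 0.50 s, B: 0.55 s
```

<p>Under these assumed numbers the 3 GHz design finishes first, because its lower CPI outweighs the clock advantage of the 4 GHz design.</p>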
</div>
<h3>Computer Architecture's Influence on Performance</h3>
<div class="card">
<p>Architecture decisions fundamentally impact what's possible:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Architectural Feature</th>
<th>Performance Impact</th>
</tr>
</thead>
<tbody>
<tr>
<td>Word Size (32-bit vs 64-bit)</td>
<td>More data processed per instruction, larger address space</td>
</tr>
<tr>
<td>Register Count</td>
<td>More registers = fewer memory accesses = faster</td>
</tr>
<tr>
<td>Cache Size & Levels</td>
<td>Larger/more cache = fewer slow main memory accesses</td>
</tr>
<tr>
<td>Pipeline Depth</td>
<td>More stages = higher clock possible, but hazard penalties</td>
</tr>
<tr>
<td>Superscalar Width</td>
<td>Execute multiple instructions per cycle</td>
</tr>
<tr>
<td>Out-of-Order Execution</td>
<td>Better utilization by reordering independent instructions</td>
</tr>
<tr>
<td>Branch Prediction</td>
<td>Reduce pipeline stalls from branches</td>
</tr>
</tbody>
</table>
</div>
</div>
<h3>Architecture vs. Instruction Set Relationship</h3>
<div class="card">
<p>The <strong>Instruction Set Architecture (ISA)</strong> is the interface between software and hardware — the "contract" that defines what instructions exist, their formats, registers, and addressing modes.</p>
<div class="grid-2">
<div>
<h4>CISC (Complex Instruction Set)</h4>
<p><span class="tag orange">x86</span></p>
<ul>
<li>Many complex instructions</li>
<li>Variable instruction lengths</li>
<li>Instructions can access memory directly</li>
<li>Fewer instructions needed per program</li>
<li>Complex decode logic</li>
</ul>
</div>
<div>
<h4>RISC (Reduced Instruction Set)</h4>
<p><span class="tag">ARM</span> <span class="tag">MIPS</span> <span class="tag">RISC-V</span></p>
<ul>
<li>Simple, uniform instructions</li>
<li>Fixed instruction length</li>
<li>Load/Store architecture</li>
<li>More instructions per program</li>
<li>Simpler decode, easier pipelining</li>
</ul>
</div>
</div>
<div class="highlight">
<div class="highlight-title">Key Insight</div>
<p>Modern "CISC" processors (like x86) actually translate complex instructions into RISC-like micro-operations internally. The ISA is preserved for compatibility, but execution is RISC-style for performance.</p>
</div>
</div>
<h3>Predicting Assembly Program Execution Time</h3>
<div class="card">
<p>To calculate execution time, you need to account for instruction mix, cycles per instruction type, and clock speed.</p>
<div class="formula">
Execution Time = Σ (Instruction_count_i × CPI_i) × Clock_period
</div>
<h4>Example Calculation</h4>
<p>Given a program with:</p>
<ul>
<li>100 ALU instructions (1 cycle each)</li>
<li>30 load instructions (3 cycles each — memory access)</li>
<li>20 store instructions (3 cycles each)</li>
<li>10 branch instructions (2 cycles each)</li>
<li>Clock frequency: 2 GHz</li>
</ul>
<div class="formula">
Total Cycles = (100 × 1) + (30 × 3) + (20 × 3) + (10 × 2) = 100 + 90 + 60 + 20 = 270 cycles
</div>
<div class="formula">
Clock Period = 1 / (2 × 10⁹) = 0.5 ns
</div>
<div class="formula">
Execution Time = 270 × 0.5 ns = 135 ns
</div>
<div class="highlight warning">
<div class="highlight-title">Real-World Complications</div>
<p>Actual execution time is affected by cache hits/misses (huge impact!), branch prediction accuracy, pipeline hazards, memory bandwidth, and out-of-order execution effects. These make precise prediction difficult.</p>
</div>
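<p>The example calculation above can be reproduced in a few lines. This sketch uses the instruction mix and clock frequency from the example; the dictionary layout and function names are illustrative choices, and it shares the example's idealized model with no cache misses or stalls.</p>

```python
# Instruction mix from the example: class -> (instruction count, cycles each).
mix = {
    "alu": (100, 1),
    "load": (30, 3),
    "store": (20, 3),
    "branch": (10, 2),
}


def total_cycles(mix):
    """Sum of count x CPI over all instruction classes."""
    return sum(count * cpi for count, cpi in mix.values())


def execution_time_ns(mix, clock_hz):
    """Total cycles multiplied by the clock period, in nanoseconds."""
    return total_cycles(mix) / clock_hz * 1e9


cycles = total_cycles(mix)
instructions = sum(count for count, _ in mix.values())
print(cycles)                              # 270
print(cycles / instructions)               # 1.6875 (average CPI)
print(round(execution_time_ns(mix, 2e9)))  # 135
```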
</div>
</section>
<!-- SECTION 3 -->
<section id="section-3">
<div class="section-header">
<span class="section-number">03</span>
<h2>Data Representation, Instruction Sets, and Addressing Modes</h2>
</div>
<h3>Binary and Hexadecimal Number Systems</h3>
<div class="card">
<h4>Binary (Base 2)</h4>
<p>Computers use binary because transistors have two states: on (1) and off (0). Each digit is a "bit".</p>
<div class="diagram">
<div class="diagram-title">Binary Place Values</div>
<pre>
Position:      7     6     5     4     3     2     1     0
Value:       128    64    32    16     8     4     2     1
              2⁷    2⁶    2⁵    2⁴    2³    2²    2¹    2⁰

Example: 10110101₂ = 128 + 32 + 16 + 4 + 1 = 181₁₀
</pre>
</div>
<h4>Hexadecimal (Base 16)</h4>
<p>Hex is a compact way to represent binary. Each hex digit represents exactly 4 bits (a "nibble").</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Decimal</th><th>Binary</th><th>Hex</th>
<th>Decimal</th><th>Binary</th><th>Hex</th>
</tr>
</thead>
<tbody>
<tr><td>0</td><td>0000</td><td>0</td><td>8</td><td>1000</td><td>8</td></tr>
<tr><td>1</td><td>0001</td><td>1</td><td>9</td><td>1001</td><td>9</td></tr>
<tr><td>2</td><td>0010</td><td>2</td><td>10</td><td>1010</td><td>A</td></tr>
<tr><td>3</td><td>0011</td><td>3</td><td>11</td><td>1011</td><td>B</td></tr>
<tr><td>4</td><td>0100</td><td>4</td><td>12</td><td>1100</td><td>C</td></tr>
<tr><td>5</td><td>0101</td><td>5</td><td>13</td><td>1101</td><td>D</td></tr>
<tr><td>6</td><td>0110</td><td>6</td><td>14</td><td>1110</td><td>E</td></tr>
<tr><td>7</td><td>0111</td><td>7</td><td>15</td><td>1111</td><td>F</td></tr>
</tbody>
</table>
</div>
<div class="formula">
Example: 0xB5 = 1011 0101₂ = 181₁₀
</div>
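<p>The place-value and nibble ideas above can be checked with a short sketch. The helper names <code>place_value_sum</code> and <code>to_hex_by_nibbles</code> are invented for illustration; Python's built-in <code>int(text, base)</code> is used only as a cross-check.</p>

```python
def place_value_sum(bits):
    """Add 2**position for every 1 bit; position 0 is the rightmost bit."""
    return sum(2 ** pos for pos, bit in enumerate(reversed(bits)) if bit == "1")


def to_hex_by_nibbles(bits):
    """Pad to a multiple of 4 bits, then map each 4-bit group to a hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)


print(place_value_sum("10110101"))    # 181
print(to_hex_by_nibbles("10110101"))  # B5
print(int("B5", 16))                  # 181 (built-in cross-check)
```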
</div>
<h3>Binary Addition and Subtraction</h3>
<div class="card">
<h4>Binary Addition</h4>
<p>Works just like decimal addition, but carry occurs at 2 instead of 10.</p>
<div class="diagram">
<pre>
Rules:              Example: 0101 + 0011 = 1000  (5 + 3 = 8)
  0 + 0 = 0
  0 + 1 = 1           ¹ ¹ ¹        (carries)
  1 + 0 = 1           0 1 0 1      (5)
  1 + 1 = 10        + 0 0 1 1      (3)
  (0 carry 1)       ---------
                      1 0 0 0      (8)
</pre>
</div>
<h4>Binary Subtraction (Using Two's Complement)</h4>
<p>Instead of subtracting, we add the negative. To negate in two's complement: invert all bits, then add 1.</p>
<div class="diagram">
<pre>
To compute 5 - 3 using 4 bits:

Step 1: Convert 3 to -3 (two's complement)
    3 =      0011
    Invert:  1100
    Add 1:   1101 = -3

Step 2: Add 5 + (-3)
      0101   (5)
    + 1101   (-3)
    ------
    ¹ 0010   (2)  ← discard carry, answer is 2 ✓
</pre>
</div>
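<p>Fixed-width addition and subtraction by two's complement can be sketched with bit masks. Python integers are unbounded, so the 4-bit width is simulated here with a mask; the function names are hypothetical.</p>

```python
def add_fixed(a, b, bits=4):
    """Add two bit patterns; return (result within `bits`, carry out)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> bits


def negate(value, bits=4):
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


def subtract(a, b, bits=4):
    """Compute a - b as a + (-b), discarding the carry out."""
    result, _carry = add_fixed(a, negate(b, bits), bits)
    return result


print(format(add_fixed(0b0101, 0b0011)[0], "04b"))  # 1000  (5 + 3 = 8)
print(format(negate(0b0011), "04b"))                # 1101  (-3)
print(format(subtract(0b0101, 0b0011), "04b"))      # 0010  (5 - 3 = 2)
```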
</div>
<h3>Decimal to Binary Conversion</h3>
<div class="card">
<h4>Method: Repeated Division by 2</h4>
<p>Divide by 2, record remainder. Repeat until quotient is 0. Read remainders bottom-to-top.</p>
<div class="diagram">
<pre>
Convert 181 to binary:

181 ÷ 2 = 90  remainder 1   ↑
 90 ÷ 2 = 45  remainder 0   │
 45 ÷ 2 = 22  remainder 1   │  Read upward
 22 ÷ 2 = 11  remainder 0   │
 11 ÷ 2 =  5  remainder 1   │
  5 ÷ 2 =  2  remainder 1   │
  2 ÷ 2 =  1  remainder 0   │
  1 ÷ 2 =  0  remainder 1   │

Result: 181₁₀ = 10110101₂
</pre>
</div>
<h4>Binary to Decimal</h4>
<p>Multiply each bit by its place value and sum:</p>
<div class="formula">
10110101₂ = 1×128 + 0×64 + 1×32 + 1×16 + 0×8 + 1×4 + 0×2 + 1×1 = 181₁₀
</div>
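<p>The repeated-division method translates directly into a loop. This is a minimal sketch for non-negative integers; <code>to_binary</code> is an illustrative name, and the built-in <code>bin</code> is used only to confirm the result.</p>

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient carries on, remainder is a bit
        remainders.append(str(remainder))
    # The first remainder is the least significant bit, so read them in reverse.
    return "".join(reversed(remainders))


print(to_binary(181))               # 10110101
print(to_binary(181) == bin(181)[2:])  # True
```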
</div>
<h3>Decimal to Hexadecimal Conversion</h3>
<div class="card">
<h4>Method 1: Repeated Division by 16</h4>
<div class="diagram">
<pre>
Convert 450 to hex:

450 ÷ 16 = 28  remainder  2       ↑
 28 ÷ 16 =  1  remainder 12 (C)   │  Read upward
  1 ÷ 16 =  0  remainder  1       │

Result: 450₁₀ = 0x1C2
</pre>
</div>
<h4>Method 2: Convert to Binary First, Then Group by 4</h4>
<div class="formula">
450₁₀ = 111000010₂ = 0001 1100 0010₂ = 0x1C2
</div>
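<p>To make the division table for Method 1 reproducible, the sketch below records each step and then reads the remainders upward. The helper names are hypothetical, and <code>to_hex</code> assumes a positive integer.</p>

```python
def division_steps(n, base):
    """Return (dividend, quotient, remainder) for each division step."""
    steps = []
    while n > 0:
        quotient, remainder = divmod(n, base)
        steps.append((n, quotient, remainder))
        n = quotient
    return steps


def to_hex(n):
    """Hex digits of a positive integer, read from the last remainder upward."""
    digits = "0123456789ABCDEF"
    remainders = [remainder for _, _, remainder in division_steps(n, 16)]
    return "".join(digits[r] for r in reversed(remainders))


for dividend, quotient, remainder in division_steps(450, 16):
    print(f"{dividend} / 16 = {quotient} remainder {remainder}")
print(to_hex(450))  # 1C2
```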
</div>
<h3>Signed Binary Representation (Two's Complement)</h3>
<div class="card">
<p>Two's complement is the standard for representing signed integers. The MSB (leftmost bit) indicates sign: 0 = zero or positive, 1 = negative.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>4-bit Binary</th>
<th>Unsigned Value</th>
<th>Signed Value (Two's Complement)</th>
</tr>
</thead>
<tbody>
<tr><td>0000</td><td>0</td><td>0</td></tr>
<tr><td>0001</td><td>1</td><td>+1</td></tr>
<tr><td>0111</td><td>7</td><td>+7 (max positive)</td></tr>
<tr><td>1000</td><td>8</td><td>-8 (min negative)</td></tr>
<tr><td>1111</td><td>15</td><td>-1</td></tr>
<tr><td>1110</td><td>14</td><td>-2</td></tr>
</tbody>
</table>
</div>
<h4>Range for n-bit Two's Complement</h4>
<div class="formula">
Range: -2^(n-1) to +2^(n-1) - 1
</div>
<p>For 8 bits: -128 to +127<br>
For 32 bits: -2,147,483,648 to +2,147,483,647</p>
<h4>Converting Negative Numbers</h4>
<div class="diagram">
<pre>
To represent -5 in 8-bit two's complement:

1. Start with +5:    00000101
2. Invert all bits:  11111010
3. Add 1:            11111011  ← This is -5

To verify: add 5 + (-5) should equal 0

     00000101
   + 11111011
   ----------
    100000000  ← Discard 9th bit carry = 00000000 ✓
</pre>
</div>
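<p>Encoding and decoding two's complement values can be sketched as follows. The functions <code>encode</code> and <code>decode</code> are illustrative names; they convert between a Python integer and the unsigned bit pattern that would be stored in an n-bit register.</p>

```python
def encode(value, bits=8):
    """Return the n-bit two's complement pattern for a signed value."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def decode(pattern, bits=8):
    """Interpret an n-bit pattern as a signed two's complement value."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


print(format(encode(-5), "08b"))  # 11111011
print(decode(0b11111011))         # -5
print(decode(0b1000, bits=4))     # -8
```

<p>Masking with <code>(1 &lt;&lt; bits) - 1</code> gives the same pattern as the invert-and-add-1 procedure, because both compute 2<sup>n</sup> minus the magnitude.</p>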
</div>
<h3>Big-Endian vs Little-Endian</h3>
<div class="card">
<p>Endianness describes byte ordering in multi-byte data. Consider the 32-bit value 0x12345678:</p>
<div class="diagram">
<div class="diagram-title">Memory Layout Comparison</div>
<pre>
Address:        0x100   0x101   0x102   0x103

Big-Endian:       12      34      56      78
                  ↑ Most significant byte first
                  (Network byte order, used by some RISC)

Little-Endian:    78      56      34      12
                  ↑ Least significant byte first
                  (x86, ARM default)
</pre>
</div>
<div class="grid-2">
<div class="highlight">
<div class="highlight-title">Big-Endian</div>
<p>Like reading left-to-right. MSB at lowest address. Used in: network protocols, some RISC processors, Java bytecode.</p>
</div>
<div class="highlight important">
<div class="highlight-title">Little-Endian</div>
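<p>The least significant byte is stored at the lowest address. Used in: x86, x64, ARM (usually), most modern PCs. The low-order byte of a value sits at the same address regardless of operand width, which makes it easier for hardware to extend or truncate values.</p>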
</div>
</div>
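<p>A minimal C sketch of the two layouts, using shifts so the result does not depend on the machine running it. The function names are assumptions for this example. <code>host_is_little_endian</code> is a common probing idiom, shown here only for illustration.</p>

```c
#include <stdint.h>

/* Big-endian: most significant byte goes to the lowest address (out[0]). */
void store_big_endian32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Little-endian: least significant byte goes to the lowest address (out[0]). */
void store_little_endian32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Returns 1 if the host stores the low-order byte first, 0 otherwise. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

<p>With the value 0x12345678, the first function fills the buffer with 12 34 56 78 and the second with 78 56 34 12, as in the layout diagram above.</p>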
</div>
<h3>Overflow and Underflow</h3>
<div class="card">
<h4>Overflow</h4>
<p>Occurs when the result of an operation is too large to represent in the available bits.</p>
<div class="diagram">
<pre>
8-bit signed addition overflow example:
01111111 (+127, max positive)
+ 00000001 (+1)
----------
10000000 (interpreted as -128, not +128!)
Overflow detection (signed):
If two positive numbers produce negative result → overflow
If two negative numbers produce positive result → overflow
</pre>
</div>
<h4>Underflow</h4>
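<p>Occurs when the result is too negative (below the minimum representable value). For integers this is often called negative overflow; strictly speaking, "underflow" describes a floating-point result too close to zero to represent.</p>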
<div class="diagram">
<pre>
8-bit signed subtraction underflow:
10000000 (-128, min negative)
- 00000001 (+1)
----------
01111111 (+127, wrapped around!)
</pre>
</div>
<div class="highlight warning">
<div class="highlight-title">Consequences</div>
<p>Overflow/underflow can cause security vulnerabilities (integer overflow attacks), incorrect calculations, and program crashes. Always validate ranges and use appropriate data types!</p>
</div>
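<p>The signed detection rule (same-sign operands, different-sign result) can be sketched in C along these lines. The name <code>add_overflows8</code> is an assumption for this example, and the sketch covers 8-bit addition only.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a + b does not fit in 8-bit two's complement.
   Overflow happened if the wrapped sum's sign bit differs from
   the sign bit of BOTH operands (which implies the operands share a sign). */
bool add_overflows8(int8_t a, int8_t b) {
    uint8_t ua = (uint8_t)a;
    uint8_t ub = (uint8_t)b;
    uint8_t sum = (uint8_t)(ua + ub);   /* wraps modulo 256, like the hardware adder */
    return ((ua ^ sum) & (ub ^ sum) & 0x80) != 0;
}
```

<p><code>add_overflows8(127, 1)</code> and <code>add_overflows8(-128, -1)</code> both report overflow, while <code>add_overflows8(100, 27)</code> does not. Working on unsigned copies avoids relying on signed wraparound, which C leaves undefined for <code>int</code> arithmetic.</p>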
</div>
<h3>Assembly to Machine Language Translation</h3>
<div class="card">
<p>Assembly mnemonics are human-readable representations of machine code. Each instruction has a binary encoding defined by the ISA.</p>
<h4>MIPS R-Type Instruction Format Example</h4>
<div class="diagram">
<pre>
Instruction: ADD $t0, $s1, $s2 (add registers s1 and s2, store in t0)
R-Type Format (32 bits total):
┌────────┬───────┬───────┬───────┬───────┬────────┐
│ opcode │ rs │ rt │ rd │ shamt │ funct │
│ 6 bits │5 bits │5 bits │5 bits │5 bits │ 6 bits │
└────────┴───────┴───────┴───────┴───────┴────────┘
For ADD $t0, $s1, $s2:
opcode = 000000 (R-type)
rs = 10001 ($s1 = register 17)
rt = 10010 ($s2 = register 18)
rd = 01000 ($t0 = register 8)
shamt = 00000 (not a shift)
funct = 100000 (ADD function)
Machine code: 000000 10001 10010 01000 00000 100000
= 0x02324020
</pre>
</div>
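<p>The field packing above can be sketched in C with shifts and masks. The function name <code>encode_rtype</code> is hypothetical; the field widths and positions follow the format table above.</p>

```c
#include <stdint.h>

/* Pack the five variable R-type fields into a 32-bit word.
   The opcode field (bits 31..26) is 0 for R-type instructions. */
uint32_t encode_rtype(uint32_t rs, uint32_t rt, uint32_t rd,
                      uint32_t shamt, uint32_t funct) {
    return ((rs    & 0x1Fu) << 21)   /* bits 25..21 */
         | ((rt    & 0x1Fu) << 16)   /* bits 20..16 */
         | ((rd    & 0x1Fu) << 11)   /* bits 15..11 */
         | ((shamt & 0x1Fu) << 6)    /* bits 10..6  */
         |  (funct & 0x3Fu);         /* bits 5..0   */
}
```

<p>For <code>ADD $t0, $s1, $s2</code>, the call <code>encode_rtype(17, 18, 8, 0, 0x20)</code> produces <code>0x02324020</code>, the same word derived by hand above.</p>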
<h4>I-Type and J-Type Formats</h4>
<div class="diagram">
<pre>
I-Type (Immediate): LW, SW, BEQ, ADDI, etc.
┌────────┬───────┬───────┬──────────────────┐
│ opcode │ rs │ rt │ immediate │
│ 6 bits │5 bits │5 bits │ 16 bits │
└────────┴───────┴───────┴──────────────────┘
J-Type (Jump): J, JAL
┌────────┬─────────────────────────────────┐
│ opcode │ address │
│ 6 bits │ 26 bits │
└────────┴─────────────────────────────────┘
</pre>
</div>
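<p>A comparable sketch for the I-type and J-type layouts. The function names are assumptions for this example. The opcodes used in the usage note (8 for <code>addi</code>, 2 for <code>j</code>) are the standard MIPS32 values. The J-type helper stores the word index of the target, that is, the byte address shifted right by 2.</p>

```c
#include <stdint.h>

/* I-type: opcode (6) | rs (5) | rt (5) | 16-bit immediate.
   A negative immediate is stored as its 16-bit two's complement pattern. */
uint32_t encode_itype(uint32_t opcode, uint32_t rs, uint32_t rt, int16_t imm) {
    return ((opcode & 0x3Fu) << 26)
         | ((rs & 0x1Fu) << 21)
         | ((rt & 0x1Fu) << 16)
         | (uint32_t)(uint16_t)imm;
}

/* J-type: opcode (6) | 26-bit word address (byte address / 4). */
uint32_t encode_jtype(uint32_t opcode, uint32_t target_byte_address) {
    return ((opcode & 0x3Fu) << 26)
         | ((target_byte_address >> 2) & 0x03FFFFFFu);
}
```

<p>For instance, <code>addi $t1, $t1, 5</code> ($t1 is register 9) encodes as <code>encode_itype(8, 9, 9, 5)</code>, giving <code>0x21290005</code>, and a jump to byte address 0x00400000 encodes as <code>encode_jtype(2, 0x00400000)</code>, giving <code>0x08100000</code>.</p>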
</div>
</section>
<!-- SECTION 4 -->
<section id="section-4">
<div class="section-header">
<span class="section-number">04</span>
<h2>Assembly Language Programming</h2>
</div>
<h3>Mathematical Expressions in Assembly</h3>
<div class="card">
<p>High-level math expressions must be broken down into individual operations using registers.</p>
<h4>Example: Compute result = (a + b) * (c - d)</h4>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># Assume: a in $s0, b in $s1, c in $s2, d in $s3</span>
<span class="code-comment"># Result will be in $s4</span>
<span class="code-keyword">add</span> <span class="code-register">$t0</span>, <span class="code-register">$s0</span>, <span class="code-register">$s1</span> <span class="code-comment"># $t0 = a + b</span>
<span class="code-keyword">sub</span> <span class="code-register">$t1</span>, <span class="code-register">$s2</span>, <span class="code-register">$s3</span> <span class="code-comment"># $t1 = c - d</span>
<span class="code-keyword">mul</span> <span class="code-register">$s4</span>, <span class="code-register">$t0</span>, <span class="code-register">$t1</span> <span class="code-comment"># $s4 = (a+b) * (c-d)</span></code></pre>
<h4>Example: Array Element Access (array[i] = array[i] + 5)</h4>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># Assume: base address of array in $s0, i in $s1</span>
<span class="code-keyword">sll</span> <span class="code-register">$t0</span>, <span class="code-register">$s1</span>, <span class="code-number">2</span> <span class="code-comment"># $t0 = i * 4 (word offset)</span>
<span class="code-keyword">add</span> <span class="code-register">$t0</span>, <span class="code-register">$t0</span>, <span class="code-register">$s0</span> <span class="code-comment"># $t0 = address of array[i]</span>
<span class="code-keyword">lw</span> <span class="code-register">$t1</span>, <span class="code-number">0</span>(<span class="code-register">$t0</span>) <span class="code-comment"># $t1 = array[i]</span>
<span class="code-keyword">addi</span> <span class="code-register">$t1</span>, <span class="code-register">$t1</span>, <span class="code-number">5</span> <span class="code-comment"># $t1 = array[i] + 5</span>
<span class="code-keyword">sw</span> <span class="code-register">$t1</span>, <span class="code-number">0</span>(<span class="code-register">$t0</span>) <span class="code-comment"># array[i] = $t1</span></code></pre>
</div>
<h3>Decision Structures (If-Else)</h3>
<div class="card">
<h4>Simple If Statement</h4>
<pre data-lang="C"><code><span class="code-comment">// C code:</span>
<span class="code-keyword">if</span> (a == b) {
c = d + e;
}</code></pre>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># a=$s0, b=$s1, c=$s2, d=$s3, e=$s4</span>
<span class="code-keyword">bne</span> <span class="code-register">$s0</span>, <span class="code-register">$s1</span>, <span class="code-label">skip</span> <span class="code-comment"># if a != b, skip the body</span>
<span class="code-keyword">add</span> <span class="code-register">$s2</span>, <span class="code-register">$s3</span>, <span class="code-register">$s4</span> <span class="code-comment"># c = d + e</span>
<span class="code-label">skip:</span>
<span class="code-comment"># continue...</span></code></pre>
<h4>If-Else Statement</h4>
<pre data-lang="C"><code><span class="code-comment">// C code:</span>
<span class="code-keyword">if</span> (a < b) {
c = <span class="code-number">1</span>;
} <span class="code-keyword">else</span> {
c = <span class="code-number">0</span>;
}</code></pre>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># a=$s0, b=$s1, c=$s2</span>
<span class="code-keyword">slt</span> <span class="code-register">$t0</span>, <span class="code-register">$s0</span>, <span class="code-register">$s1</span> <span class="code-comment"># $t0 = 1 if a < b, else 0</span>
<span class="code-keyword">beq</span> <span class="code-register">$t0</span>, <span class="code-register">$zero</span>, <span class="code-label">else</span> <span class="code-comment"># if $t0==0 (a >= b), go to else</span>
<span class="code-keyword">addi</span> <span class="code-register">$s2</span>, <span class="code-register">$zero</span>, <span class="code-number">1</span> <span class="code-comment"># c = 1</span>
<span class="code-keyword">j</span> <span class="code-label">done</span> <span class="code-comment"># skip else block</span>
<span class="code-label">else:</span>
<span class="code-keyword">add</span> <span class="code-register">$s2</span>, <span class="code-register">$zero</span>, <span class="code-register">$zero</span> <span class="code-comment"># c = 0</span>
<span class="code-label">done:</span></code></pre>
</div>
<h3>Loops</h3>
<div class="card">
<h4>While Loop</h4>
<pre data-lang="C"><code><span class="code-comment">// C code: sum numbers from 1 to n</span>
<span class="code-keyword">int</span> sum = <span class="code-number">0</span>;
<span class="code-keyword">int</span> i = <span class="code-number">1</span>;
<span class="code-keyword">while</span> (i <= n) {
sum = sum + i;
i = i + <span class="code-number">1</span>;
}</code></pre>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># n=$s0, sum=$s1, i=$s2</span>
<span class="code-keyword">add</span> <span class="code-register">$s1</span>, <span class="code-register">$zero</span>, <span class="code-register">$zero</span> <span class="code-comment"># sum = 0</span>
<span class="code-keyword">addi</span> <span class="code-register">$s2</span>, <span class="code-register">$zero</span>, <span class="code-number">1</span> <span class="code-comment"># i = 1</span>
<span class="code-label">loop:</span>
<span class="code-keyword">slt</span> <span class="code-register">$t0</span>, <span class="code-register">$s0</span>, <span class="code-register">$s2</span> <span class="code-comment"># $t0 = 1 if n < i</span>
<span class="code-keyword">bne</span> <span class="code-register">$t0</span>, <span class="code-register">$zero</span>, <span class="code-label">done</span> <span class="code-comment"># if n < i, exit loop</span>
<span class="code-keyword">add</span> <span class="code-register">$s1</span>, <span class="code-register">$s1</span>, <span class="code-register">$s2</span> <span class="code-comment"># sum = sum + i</span>
<span class="code-keyword">addi</span> <span class="code-register">$s2</span>, <span class="code-register">$s2</span>, <span class="code-number">1</span> <span class="code-comment"># i = i + 1</span>
<span class="code-keyword">j</span> <span class="code-label">loop</span> <span class="code-comment"># repeat</span>
<span class="code-label">done:</span></code></pre>
<h4>For Loop (Array Sum)</h4>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># Sum array elements: for(i=0; i<n; i++) sum += arr[i]</span>
<span class="code-comment"># arr base=$s0, n=$s1, sum in $s2</span>
<span class="code-keyword">add</span> <span class="code-register">$s2</span>, <span class="code-register">$zero</span>, <span class="code-register">$zero</span> <span class="code-comment"># sum = 0</span>
<span class="code-keyword">add</span> <span class="code-register">$t0</span>, <span class="code-register">$zero</span>, <span class="code-register">$zero</span> <span class="code-comment"># i = 0</span>
<span class="code-keyword">add</span> <span class="code-register">$t1</span>, <span class="code-register">$s0</span>, <span class="code-register">$zero</span> <span class="code-comment"># $t1 = current address</span>
<span class="code-label">loop:</span>
<span class="code-keyword">slt</span> <span class="code-register">$t2</span>, <span class="code-register">$t0</span>, <span class="code-register">$s1</span> <span class="code-comment"># $t2 = 1 if i < n</span>
<span class="code-keyword">beq</span> <span class="code-register">$t2</span>, <span class="code-register">$zero</span>, <span class="code-label">done</span> <span class="code-comment"># exit if i >= n</span>
<span class="code-keyword">lw</span> <span class="code-register">$t3</span>, <span class="code-number">0</span>(<span class="code-register">$t1</span>) <span class="code-comment"># $t3 = arr[i]</span>
<span class="code-keyword">add</span> <span class="code-register">$s2</span>, <span class="code-register">$s2</span>, <span class="code-register">$t3</span> <span class="code-comment"># sum += arr[i]</span>
<span class="code-keyword">addi</span> <span class="code-register">$t1</span>, <span class="code-register">$t1</span>, <span class="code-number">4</span> <span class="code-comment"># next address</span>
<span class="code-keyword">addi</span> <span class="code-register">$t0</span>, <span class="code-register">$t0</span>, <span class="code-number">1</span> <span class="code-comment"># i++</span>
<span class="code-keyword">j</span> <span class="code-label">loop</span>
<span class="code-label">done:</span></code></pre>
</div>
<h3>Procedures (Functions)</h3>
<div class="card">
<p>Procedures in assembly require careful management of the stack, return addresses, and registers.</p>
<h4>MIPS Calling Convention</h4>
<ul>
<li><code>$a0-$a3</code>: Arguments to procedure</li>
<li><code>$v0-$v1</code>: Return values</li>
<li><code>$ra</code>: Return address (set by <code>jal</code>)</li>
<li><code>$sp</code>: Stack pointer</li>
<li><code>$s0-$s7</code>: Saved registers (callee must preserve)</li>
<li><code>$t0-$t9</code>: Temporary registers (caller-saved)</li>
</ul>
<h4>Simple Leaf Procedure (No Nested Calls)</h4>
<pre data-lang="C"><code><span class="code-comment">// C code:</span>
<span class="code-keyword">int</span> square(<span class="code-keyword">int</span> x) {
<span class="code-keyword">return</span> x * x;
}</code></pre>
<pre data-lang="MIPS Assembly"><code><span class="code-label">square:</span>
<span class="code-keyword">mul</span> <span class="code-register">$v0</span>, <span class="code-register">$a0</span>, <span class="code-register">$a0</span> <span class="code-comment"># $v0 = x * x</span>
<span class="code-keyword">jr</span> <span class="code-register">$ra</span> <span class="code-comment"># return</span>
<span class="code-comment"># Calling the function:</span>
<span class="code-keyword">addi</span> <span class="code-register">$a0</span>, <span class="code-register">$zero</span>, <span class="code-number">5</span> <span class="code-comment"># arg = 5</span>
<span class="code-keyword">jal</span> <span class="code-label">square</span> <span class="code-comment"># call square(5)</span>
<span class="code-comment"># result now in $v0</span></code></pre>
<h4>Non-Leaf Procedure (Makes Nested Calls)</h4>
<pre data-lang="MIPS Assembly"><code><span class="code-comment"># int factorial(int n) { if(n<=1) return 1; return n * factorial(n-1); }</span>
<span class="code-label">factorial:</span>
<span class="code-keyword">addi</span> <span class="code-register">$sp</span>, <span class="code-register">$sp</span>, <span class="code-number">-8</span> <span class="code-comment"># allocate stack space</span>
<span class="code-keyword">sw</span> <span class="code-register">$ra</span>, <span class="code-number">4</span>(<span class="code-register">$sp</span>) <span class="code-comment"># save return address</span>
<span class="code-keyword">sw</span> <span class="code-register">$a0</span>, <span class="code-number">0</span>(<span class="code-register">$sp</span>) <span class="code-comment"># save argument n</span>
<span class="code-keyword">slti</span> <span class="code-register">$t0</span>, <span class="code-register">$a0</span>, <span class="code-number">2</span> <span class="code-comment"># $t0 = 1 if n < 2</span>
<span class="code-keyword">beq</span> <span class="code-register">$t0</span>, <span class="code-register">$zero</span>, <span class="code-label">recurse</span>
<span class="code-keyword">addi</span> <span class="code-register">$v0</span>, <span class="code-register">$zero</span>, <span class="code-number">1</span> <span class="code-comment"># base case: return 1</span>
<span class="code-keyword">addi</span> <span class="code-register">$sp</span>, <span class="code-register">$sp</span>, <span class="code-number">8</span> <span class="code-comment"># restore stack</span>
<span class="code-keyword">jr</span> <span class="code-register">$ra</span>
<span class="code-label">recurse:</span>
<span class="code-keyword">addi</span> <span class="code-register">$a0</span>, <span class="code-register">$a0</span>, <span class="code-number">-1</span> <span class="code-comment"># n - 1</span>
<span class="code-keyword">jal</span> <span class="code-label">factorial</span> <span class="code-comment"># recursive call</span>
<span class="code-keyword">lw</span> <span class="code-register">$a0</span>, <span class="code-number">0</span>(<span class="code-register">$sp</span>) <span class="code-comment"># restore n</span>
<span class="code-keyword">lw</span> <span class="code-register">$ra</span>, <span class="code-number">4</span>(<span class="code-register">$sp</span>) <span class="code-comment"># restore return address</span>
<span class="code-keyword">addi</span> <span class="code-register">$sp</span>, <span class="code-register">$sp</span>, <span class="code-number">8</span> <span class="code-comment"># deallocate stack</span>
<span class="code-keyword">mul</span> <span class="code-register">$v0</span>, <span class="code-register">$a0</span>, <span class="code-register">$v0</span> <span class="code-comment"># return n * factorial(n-1)</span>
<span class="code-keyword">jr</span> <span class="code-register">$ra</span></code></pre>
<div class="diagram">
<div class="diagram-title">Stack Frame Layout</div>
<pre>
High Memory
┌─────────────────────┐
│ Previous frames │
├─────────────────────┤
│ Saved $ra │ ← $sp + 4 (after allocation)
├─────────────────────┤
│ Saved $a0 (n) │ ← $sp (after allocation)
├─────────────────────┤
│ (next frame...) │
└─────────────────────┘
Low Memory
</pre>
</div>
</div>
</section>
<!-- SECTION 5 -->
<section id="section-5">
<div class="section-header">
<span class="section-number">05</span>
<h2>Hardware Performance Techniques</h2>
</div>
<h3>Increasing Clock Rate</h3>
<div class="card">
<div class="grid-2">
<div class="highlight">
<div class="highlight-title">Advantages</div>
<ul>
<li>More cycles per second = more instructions executed</li>
<li>Direct performance improvement for single-threaded code</li>
<li>Simple conceptually - just make everything faster</li>
</ul>
</div>
<div class="highlight warning">
<div class="highlight-title">Disadvantages</div>
<ul>
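<li>Power consumption increases dramatically: dynamic power P ≈ C × V² × f, and because higher frequencies usually require higher voltage, power grows roughly as f³</li>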
<li>Heat generation becomes unmanageable</li>
<li>Requires faster transistors and better cooling</li>
<li>Hit practical limits around 4-5 GHz ("frequency wall")</li>
<li>Timing margins shrink, reliability concerns</li>
</ul>
</div>
</div>
</div>
<h3>Decreasing Silicon Feature Size</h3>
<div class="card">
<p>Feature size refers to the smallest elements that can be fabricated on a chip (measured in nanometers). Smaller = more transistors, lower power, higher speeds.</p>
<div class="grid-2">
<div class="highlight">
<div class="highlight-title">Advantages</div>
<ul>
<li>More transistors per chip (Moore's Law)</li>
<li>Lower capacitance = faster switching</li>
<li>Lower voltage operation possible</li>
<li>Reduced power consumption per transistor</li>
<li>Smaller die size = lower cost per chip</li>
</ul>
</div>
<div class="highlight warning">
<div class="highlight-title">Disadvantages</div>
<ul>
<li>Leakage current increases (quantum tunneling)</li>
<li>Manufacturing complexity and cost skyrocket</li>
<li>Heat density increases (same power, smaller area)</li>
<li>Reliability issues (electromigration, variability)</li>
<li>Approaching physical limits (atomic scale)</li>
</ul>
</div>
</div>
</div>
<h3>Pipelined Datapath</h3>
<div class="card">
<p>Pipelining divides instruction execution into stages, allowing multiple instructions to be "in flight" simultaneously. Like an assembly line.</p>
<div class="diagram">
<div class="diagram-title">5-Stage Pipeline</div>
<pre>
Time → 1 2 3 4 5 6 7 8
┌────┬────┬────┬────┬────┬────┬────┬────┐
Instr1│ IF │ ID │ EX │ MEM│ WB │ │ │ │
├────┼────┼────┼────┼────┼────┼────┼────┤
Instr2│ │ IF │ ID │ EX │ MEM│ WB │ │ │
├────┼────┼────┼────┼────┼────┼────┼────┤
Instr3│ │ │ IF │ ID │ EX │ MEM│ WB │ │
├────┼────┼────┼────┼────┼────┼────┼────┤
Instr4│ │ │ │ IF │ ID │ EX │ MEM│ WB │
└────┴────┴────┴────┴────┴────┴────┴────┘
IF = Instruction Fetch MEM = Memory Access
ID = Instruction Decode WB = Write Back
EX = Execute
</pre>
</div>
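<p>The timing diagram suggests a simple cycle count for an ideal pipeline: the first instruction needs one cycle per stage, and every later instruction finishes one cycle after its predecessor. A minimal sketch, assuming no stalls or flushes (the function names are made up for this example):</p>

```c
/* Cycles to complete n instructions on an ideal k-stage pipeline:
   k cycles to fill, then one more completion per cycle. */
unsigned long pipelined_cycles(unsigned long instructions, unsigned long stages) {
    if (instructions == 0) {
        return 0;
    }
    return stages + (instructions - 1);
}

/* Same work when each instruction passes through all stages
   before the next one starts (no overlap). */
unsigned long unpipelined_cycles(unsigned long instructions, unsigned long stages) {
    return instructions * stages;
}
```

<p>For the four instructions in the diagram, <code>pipelined_cycles(4, 5)</code> gives 8 cycles versus 20 without overlap. As the instruction count grows, the ratio approaches the number of stages.</p>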
<div class="grid-2">
<div class="highlight">
<div class="highlight-title">Advantages</div>
<ul>
<li>Higher throughput (1 instruction completing per cycle at steady state)</li>
<li>Better hardware utilization</li>
<li>Enables higher clock frequencies (shorter critical path per stage)</li>
</ul>
</div>
<div class="highlight warning">
<div class="highlight-title">Disadvantages</div>
<ul>
<li>Pipeline hazards (data, control, structural)</li>
<li>Increased complexity</li>
<li>Latency of a single instruction is not reduced (it often grows slightly because of pipeline register overhead)</li>
<li>Branch mispredictions cause pipeline flushes</li>
<li>Power overhead from pipeline registers</li>
</ul>
</div>
</div>
</div>
<h3>Branch Prediction</h3>
<div class="card">
<p>Branch prediction guesses the outcome of conditional branches before they're resolved, keeping the pipeline full.</p>
<h4>Static Prediction Methods</h4>
<ul>
<li><strong>Always Predict Not Taken:</strong> Simple, but mispredicts every taken branch, including every loop-closing backward branch</li>
<li><strong>Always Predict Taken:</strong> Better for loops (backward branches often taken)</li>
<li><strong>Backward Taken, Forward Not Taken (BTFNT):</strong> Exploits loop behavior</li>
</ul>
<h4>Dynamic Prediction Methods</h4>
<ul>
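<li><strong>1-Bit Predictor:</strong> Remembers the last outcome. Problem: it mispredicts twice per loop execution, once on loop exit and once again on re-entry.</li>
<li><strong>2-Bit Saturating Counter:</strong> Starting from a strong state, the prediction flips only after two consecutive mispredictions, so a loop exit costs one misprediction instead of two.</li>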
<li><strong>Correlating Predictors:</strong> Use history of recent branches to predict (captures patterns).</li>
<li><strong>Tournament Predictors:</strong> Multiple predictors compete; choose best per-branch.</li>
</ul>
<div class="diagram">
<div class="diagram-title">2-Bit Saturating Counter</div>
<pre>
      Predict taken                   Predict not taken
┌──────────┐  NT   ┌──────────┐  NT   ┌──────────┐  NT   ┌──────────┐
│ Strongly │ ────▶ │  Weakly  │ ────▶ │  Weakly  │ ────▶ │ Strongly │
│  Taken   │       │  Taken   │       │Not Taken │       │Not Taken │
│   (11)   │ ◀──── │   (10)   │ ◀──── │   (01)   │ ◀──── │   (00)   │
└──────────┘   T   └──────────┘   T   └──────────┘   T   └──────────┘
  T: stay                                                  NT: stay

T  = branch was taken       (move one state left, saturating at Strongly Taken)
NT = branch was not taken   (move one state right, saturating at Strongly Not Taken)
</pre>
</div>
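<p>To make the difference between the 1-bit and 2-bit schemes concrete, here is a small simulation sketch. The state encoding (0 = strongly not taken up to 3 = strongly taken) and all function names are assumptions for this example. Real predictors index a table of such counters by branch address, which is omitted here.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* 2-bit saturating counter: 0 = strongly not taken, 1 = weakly not taken,
   2 = weakly taken, 3 = strongly taken. */
unsigned update_2bit(unsigned state, bool taken) {
    if (taken) {
        return state < 3 ? state + 1 : 3;   /* saturate at strongly taken */
    }
    return state > 0 ? state - 1 : 0;       /* saturate at strongly not taken */
}

/* Count mispredictions of a single 2-bit counter over a branch outcome trace. */
size_t mispredictions_2bit(const bool *outcomes, size_t n, unsigned state) {
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++) {
        bool predicted_taken = state >= 2;
        if (predicted_taken != outcomes[i]) {
            wrong++;
        }
        state = update_2bit(state, outcomes[i]);
    }
    return wrong;
}

/* Count mispredictions of a 1-bit predictor that repeats the last outcome. */
size_t mispredictions_1bit(const bool *outcomes, size_t n, bool last) {
    size_t wrong = 0;
    for (size_t i = 0; i < n; i++) {
        if (last != outcomes[i]) {
            wrong++;
        }
        last = outcomes[i];
    }
    return wrong;
}
```

<p>On the trace T, T, T, N, T, T, T, N (a short loop executed twice), the 2-bit counter started in state 3 mispredicts 2 times, while the 1-bit predictor started at "taken" mispredicts 3 times because it also misses the first iteration after each loop exit.</p>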
<h4>Comparing Branch Prediction Performance</h4>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Predictor Type</th>
<th>Typical Accuracy (approximate, workload-dependent)</th>
<th>Hardware Cost</th>
<th>Use Case</th>
</tr>
</thead>
<tbody>
<tr>
<td>Static (BTFNT)</td>
<td>~65%</td>
<td>None</td>
<td>Simple embedded</td>
</tr>
<tr>
<td>1-Bit Local</td>
<td>~80%</td>
<td>Low</td>
<td>Basic processors</td>
</tr>
<tr>
<td>2-Bit Local</td>
<td>~85%</td>
<td>Low</td>
<td>Most common</td>
</tr>
<tr>
<td>Correlating/Global</td>
<td>~92%</td>
<td>Medium</td>
<td>Desktop CPUs</td>
</tr>
<tr>
<td>Tournament/Hybrid</td>
<td>~95%+</td>
<td>High</td>
<td>High-performance</td>
</tr>
<tr>
<td>TAGE-SC-L / Perceptron (neural)</td>
<td>~97%+</td>
<td>Very High</td>
<td>Modern CPUs</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
<!-- SECTION 6 -->
<section id="section-6">
<div class="section-header">
<span class="section-number">06</span>
<h2>Datapath Implementation</h2>
</div>
<h3>Single-Cycle Datapath</h3>
<div class="card">
<p>In a single-cycle design, every instruction completes in exactly one clock cycle. The clock period must be long enough for the slowest instruction.</p>
<div class="diagram">
<div class="diagram-title">Simplified Single-Cycle Datapath</div>
<pre>
┌─────┐ ┌─────────┐
PC ────▶│ I │─── Instr ──────▶│ Control │──── Control Signals
│ Mem │ └─────────┘
└─────┘ │
│ ▼
│ ┌──────────────────────────────────────┐
│ │ Register File │
│ │ ┌────┐ ┌────┐ │
│ │ │Read│───▶ Data1 ───▶┐ │Write│ │
│ │ │Reg1│ │ │Data │ │
│ │ └────┘ │ └────┘ │
│ │ ┌────┐ │ ▲ │
│ │ │Read│───▶ Data2 ──┐ │ │ │
│ │ │Reg2│ │ │ │ │
│ │ └────┘ │ │ │ │
│ └─────────────────────┼─┼─────┼───────┘
│ │ │ │
│ ▼ ▼ │
│ ┌─────┐ │
│ │ ALU │───┼───────────▶ Result
│ └─────┘ │
│ │ │
│ ▼ │
│ ┌───────┐ │
│ │ Data │──┘
│ │Memory │
│ └───────┘
│
▼
┌─────────┐
│ PC + 4 │───▶ Next PC
└─────────┘
</pre>
</div>
<div class="highlight warning">
<div class="highlight-title">Single-Cycle Limitations</div>
<p>Clock period = time for the slowest instruction (usually a load: fetch + decode + ALU + memory + write-back). Faster instructions (like ADD) still occupy the full period, so part of every cycle is wasted.</p>
</div>
</div>
<h3>Pipelined Datapath</h3>
<div class="card">
<p>Pipelining breaks execution into stages with pipeline registers between them. Each stage works on a different instruction simultaneously.</p>
<div class="diagram">
<div class="diagram-title">5-Stage Pipeline Datapath</div>
<pre>
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ IF │──▶│ ID │──▶│ EX │──▶│ MEM │──▶│ WB │
│ │ │ │ │ │ │ │ │ │
│ Fetch │ │ Decode │ │Execute │ │ Memory │ │ Write │
│ Instr │ │ & Read │ │ ALU │ │ Access │ │ Back │
│ │ │ Regs │ │ │ │ │ │ │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘
│ │ │ │
▼ ▼ ▼ ▼
IF/ID ID/EX EX/MEM MEM/WB
Register Register Register Register
</pre>
</div>
<h4>Pipeline Hazards</h4>
<ul>
<li><strong>Structural Hazards:</strong> Hardware resource conflict (e.g., single memory for instruction and data). Solution: separate I-cache and D-cache.</li>
<li><strong>Data Hazards:</strong> Instruction needs data not yet available. Solutions: forwarding/bypassing, stalling.</li>
<li><strong>Control Hazards:</strong> Branch/jump changes flow. Solutions: branch prediction, delayed branches.</li>
</ul>
<h4>Comparison</h4>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Aspect</th>
<th>Single-Cycle</th>
<th>Pipelined</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clock Period</td>
<td>Long (slowest instruction)</td>
<td>Short (slowest stage)</td>
</tr>
<tr>
<td>Throughput</td>
<td>1 instr per (long) cycle</td>
<td>~1 instr per (short) cycle</td>
</tr>
<tr>
<td>Latency</td>
<td>1 (long) cycle</td>
<td>N (short) cycles (N stages)</td>
</tr>
<tr>
<td>Complexity</td>
<td>Simple</td>
<td>Complex (hazard handling)</td>
</tr>
<tr>
<td>Hardware</td>
<td>Less</td>
<td>More (pipeline registers)</td>
</tr>
</tbody>
</table>
</div>
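<p>The clock-period row of the comparison can be sketched numerically. The sketch assumes the slowest instruction uses every stage (as a load does) and ignores pipeline register overhead. The function names, and the stage latencies in the usage note, are hypothetical values chosen for illustration.</p>

```c
#include <stddef.h>

/* Single-cycle design: the period must cover every stage back to back. */
unsigned single_cycle_period(const unsigned *stage_ps, size_t stages) {
    unsigned total = 0;
    for (size_t i = 0; i < stages; i++) {
        total += stage_ps[i];
    }
    return total;
}

/* Pipelined design: the period only has to cover the slowest stage. */
unsigned pipelined_period(const unsigned *stage_ps, size_t stages) {
    unsigned slowest = 0;
    for (size_t i = 0; i < stages; i++) {
        if (stage_ps[i] > slowest) {
            slowest = stage_ps[i];
        }
    }
    return slowest;
}
```

<p>With assumed stage latencies of 200, 100, 200, 200 and 100 ps for IF, ID, EX, MEM and WB, the single-cycle period comes out at 800 ps and the pipelined period at 200 ps. The ideal speedup is therefore 4×, not 5×, because the stages are unbalanced.</p>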
</div>
</section>
<!-- SECTION 7 -->
<section id="section-7">
<div class="section-header">
<span class="section-number">07</span>
<h2>Memory Hierarchy</h2>
</div>
<h3>Types of Memory</h3>
<div class="card">
<div class="diagram">
<div class="diagram-title">Memory Hierarchy Pyramid</div>
<pre>
┌───────┐
│ CPU │
│ Regs │ ← Fastest, smallest, most expensive
└───┬───┘ ~1 cycle access
│
┌────┴────┐
│ L1 │ ← ~4 cycles, 32-64 KB
│ Cache │
└────┬────┘
│
┌─────┴─────┐
│ L2 │ ← ~10-20 cycles, 256 KB - 1 MB
│ Cache │
└─────┬─────┘
│
┌──────┴──────┐
│ L3 │ ← ~40-75 cycles, 4-64 MB
│ Cache │
└──────┬──────┘
│
┌─────────┴─────────┐
│ Main Memory │ ← ~100-300 cycles, 8-128 GB
│ (RAM) │
└─────────┬─────────┘
│
┌────────────┴────────────┐
│ Secondary Storage │ ← millions of cycles
│ (SSD / HDD) │ TB capacity
└─────────────────────────┘
</pre>
</div>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Memory Type</th>
<th>Technology</th>
<th>Speed</th>
<th>Volatility</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>Registers</td>
<td>Flip-flops</td>
<td>~0.25 ns</td>
<td>Volatile</td>
<td>Current operands</td>
</tr>
<tr>
<td>SRAM (Cache)</td>
<td>6 transistors/bit</td>
<td>~1-10 ns</td>
<td>Volatile</td>
<td>L1/L2/L3 cache</td>
</tr>
<tr>
<td>DRAM (RAM)</td>
<td>1 transistor + capacitor</td>
<td>~50-100 ns</td>
<td>Volatile</td>
<td>Main memory</td>
</tr>
<tr>
<td>Flash/SSD</td>
<td>Floating-gate transistors</td>
<td>~25-100 μs</td>
<td>Non-volatile</td>
<td>Storage</td>
</tr>
<tr>
<td>HDD</td>
<td>Magnetic platters</td>
<td>~5-10 ms</td>
<td>Non-volatile</td>
<td>Bulk storage</td>
</tr>
</tbody>
</table>
</div>
</div>
<h3>How Different Memory Types Are Used</h3>
<div class="card">
<ul>
<li><strong>Registers:</strong> Hold data actively being computed. Compiler allocates variables here when possible.</li>
<li><strong>L1 Cache:</strong> Split into I-cache (instructions) and D-cache (data). Holds most recently accessed data. Checked first on every memory access.</li>
<li><strong>L2/L3 Cache:</strong> Larger, slower backup. L3 often shared among cores. Catches misses from L1.</li>
<li><strong>Main Memory (RAM):</strong> Holds active programs and data. OS manages allocation. Accessed on cache miss.</li>
<li><strong>Virtual Memory:</strong> Uses disk as "backup" for RAM. Pages swapped in/out as needed.</li>
<li><strong>Storage:</strong> Persistent file storage. Programs and data loaded from here into RAM.</li>
</ul>
<div class="highlight">
<div class="highlight-title">Principle of Locality</div>
<p><strong>Temporal Locality:</strong> Recently accessed data likely to be accessed again soon.<br>
<strong>Spatial Locality:</strong> Data near recently accessed data likely to be accessed soon.<br>
Caches exploit both by keeping recent data and fetching in blocks (cache lines).</p>
</div>
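<p>Spatial locality is easiest to see with a two-dimensional array. C stores arrays row by row, so the first loop below touches consecutive addresses, while the second jumps a whole row between accesses. This is only a sketch: the 64×64 size and the function names are arbitrary choices, and any speed difference depends on the cache and on compiler optimizations.</p>

```c
#define ROWS 64
#define COLS 64

/* Fill the matrix with 0, 1, 2, ... in row-major order. */
void fill_sequential(int m[ROWS][COLS]) {
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            m[r][c] = r * COLS + c;
        }
    }
}

/* Cache-friendly: the inner loop walks adjacent elements of one row. */
long sum_row_major(int m[ROWS][COLS]) {
    long sum = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            sum += m[r][c];
        }
    }
    return sum;
}

/* Less cache-friendly: consecutive accesses are COLS * sizeof(int) bytes apart. */
long sum_column_major(int m[ROWS][COLS]) {
    long sum = 0;
    for (int c = 0; c < COLS; c++) {
        for (int r = 0; r < ROWS; r++) {
            sum += m[r][c];
        }
    }
    return sum;
}
```

<p>Both functions return the same sum. Only the order of memory accesses differs, which is exactly what cache lines reward or penalize.</p>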
</div>
<h3>Calculating Cache Miss Delays</h3>
<div class="card">
<p>Cache performance significantly impacts overall system performance.</p>
<div class="formula">
Average Memory Access Time (AMAT) = Hit Time + (Miss Rate × Miss Penalty)
</div>
<h4>Example Calculation</h4>
<p>Given:</p>
<ul>
<li>L1 cache hit time: 1 cycle</li>
<li>L1 miss rate: 5%</li>
<li>L2 cache hit time: 10 cycles</li>
<li>L2 miss rate (of L1 misses): 20%</li>
<li>Main memory access time: 200 cycles</li>
</ul>
<div class="formula">
L2 AMAT = 10 + (0.20 × 200) = 10 + 40 = 50 cycles
</div>
<div class="formula">
Overall AMAT = 1 + (0.05 × 50) = 1 + 2.5 = 3.5 cycles average
</div>
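<p>The two-level calculation above nests one AMAT formula inside another: the L1 miss penalty is the AMAT of L2. A minimal C sketch, with function and parameter names chosen for this example:</p>

```c
/* AMAT = hit time + miss rate * miss penalty (all times in cycles). */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

/* Two-level hierarchy: the penalty for missing in L1 is the AMAT of L2,
   whose own miss penalty is the main memory access time. */
double two_level_amat(double l1_hit_time, double l1_miss_rate,
                      double l2_hit_time, double l2_miss_rate,
                      double memory_time) {
    double l1_miss_penalty = amat(l2_hit_time, l2_miss_rate, memory_time);
    return amat(l1_hit_time, l1_miss_rate, l1_miss_penalty);
}
```

<p>With the example parameters, <code>two_level_amat(1, 0.05, 10, 0.20, 200)</code> evaluates to 3.5 cycles (up to floating-point rounding).</p>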
<h4>For a Program</h4>
<p>Given a program with 1000 memory references:</p>
<ul>
<li>All 1000 references access L1: 1000 × 1 = 1000 cycles</li>
<li>50 L1 misses (5%) go on to access L2: 50 × 10 = 500 cycles</li>
<li>10 L2 misses (20% of the 50 L1 misses) go on to main memory: 10 × 200 = 2000 cycles</li>
</ul>
<div class="formula">
Total = 1000 + 500 + 2000 = 3500 cycles for 1000 references = 3.5 cycles/access, matching the AMAT above
</div>
</div>
</section>
<!-- SECTION 8 -->
<section id="section-8">
<div class="section-header">
<span class="section-number">08</span>
<h2>Multiprocessor Architectures</h2>
</div>
<h3>Why Multiprocessors?</h3>
<div class="card">
<p>The transition to multiprocessor (multi-core) architectures happened because single-processor performance improvements hit fundamental limits:</p>
<h4>The Power Wall</h4>
<p>Power consumption and heat generation grew faster than performance. Increasing clock speed beyond ~4 GHz became impractical due to cooling limitations. Power ≈ Capacitance × Voltage² × Frequency.</p>
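<p>The dynamic power relation can be turned into a quick estimate. The sketch below uses a hypothetical helper name and models only switching power, ignoring leakage.</p>

```c
/* Dynamic (switching) power: P = C * V^2 * f.
   Units: farads, volts, hertz -> watts. Leakage power is not modeled. */
double dynamic_power(double capacitance, double voltage, double frequency) {
    return capacitance * voltage * voltage * frequency;
}
```

<p>As an illustration with made-up values, raising both voltage and frequency by 20% multiplies power by 1.2³ ≈ 1.73: <code>dynamic_power(1e-9, 1.2, 3.6e9) / dynamic_power(1e-9, 1.0, 3.0e9)</code> is about 1.728. This cubic growth is why clock rates stopped climbing.</p>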
<h4>The Memory Wall</h4>
<p>Processor speed improved much faster than memory speed. CPUs spend increasing time waiting for data. Even with caches, memory latency limits single-thread performance.</p>
<h4>The ILP Wall</h4>
<p>Instruction-Level Parallelism has diminishing returns. Finding independent instructions to execute simultaneously becomes harder, and the hardware cost of wider out-of-order execution grows much faster than the performance it delivers.</p>
<div class="highlight important">
<div class="highlight-title">The Solution: Parallelism</div>
<p>Instead of making one core faster, use multiple cores. Trade single-thread performance for multi-thread throughput. Let software exploit parallelism through threads/processes.</p>
</div>
<h4>Types of Parallelism</h4>
<div class="grid-2">
<div>
<h4>Thread-Level Parallelism (TLP)</h4>
<p>Multiple threads execute simultaneously on different cores. Requires parallel software design. Good for server workloads, scientific computing.</p>
</div>
<div>
<h4>Data-Level Parallelism (DLP)</h4>
<p>Same operation on multiple data elements (SIMD). GPUs excel at this. Good for graphics, machine learning, signal processing.</p>
</div>
</div>
<h4>Multiprocessor Types</h4>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Type</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>SMP (Symmetric)</td>
<td>All processors equal, shared memory</td>
<td>Typical desktop/laptop CPUs</td>
</tr>
<tr>
<td>NUMA</td>
<td>Non-uniform memory access times</td>
<td>Multi-socket servers</td>
</tr>
<tr>
<td>Heterogeneous</td>
<td>Different processor types (CPU + GPU)</td>
<td>Modern SoCs, game consoles</td>
</tr>
<tr>
<td>Cluster</td>
<td>Networked independent computers</td>
<td>Data centers, HPC</td>
</tr>
</tbody>
</table>
</div>
<h4>Challenges</h4>
<ul>
<li><strong>Cache Coherence:</strong> Keeping caches consistent when multiple cores access shared data</li>
<li><strong>Synchronization:</strong> Coordinating access to shared resources (locks, barriers)</li>
<li><strong>Amdahl's Law:</strong> Speedup limited by sequential portion of code. If 10% is sequential, max speedup is 10× regardless of core count.</li>
<li><strong>Programming Difficulty:</strong> Parallel programming is harder than sequential. Race conditions, deadlocks, load balancing.</li>
</ul>
<div class="formula">
Amdahl's Law: Speedup = 1 / ((1 - P) + P/N)
</div>
<p>Where P = parallel fraction, N = number of processors</p>
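<p>A direct transcription of the formula into C, useful for checking how quickly the benefit of extra cores flattens out. The function name is an assumption for this example.</p>

```c
/* Amdahl's Law: speedup = 1 / ((1 - P) + P / N)
   parallel_fraction is P (0.0 to 1.0), processors is N (>= 1). */
double amdahl_speedup(double parallel_fraction, double processors) {
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / processors);
}
```

<p>With P = 0.9, ten processors give <code>amdahl_speedup(0.9, 10)</code> ≈ 5.26, and even a billion processors stay just under 10×, the limit set by the 10% sequential portion.</p>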
</div>
</section>
</main>
<footer>
<p>Computer Architecture & Organization Reference Guide</p>
<p style="margin-top: 8px; opacity: 0.6;">Covering CPU design, performance, data representation, assembly programming, and modern architectures</p>
</footer>
<script>
// Smooth scroll for nav links
document.querySelectorAll('nav a').forEach(anchor => {
anchor.addEventListener('click', function(e) {
e.preventDefault();
const targetId = this.getAttribute('href');
const target = document.querySelector(targetId);
if (target) {
target.scrollIntoView({ behavior: 'smooth', block: 'start' });
}
});
});
// Add scroll animation to sections
const observer = new IntersectionObserver((entries) => {
entries.forEach(entry => {
if (entry.isIntersecting) {
entry.target.style.opacity = '1';
entry.target.style.transform = 'translateY(0)';
}
});
}, { threshold: 0.1 });
document.querySelectorAll('section').forEach(section => {
section.style.opacity = '0';
section.style.transform = 'translateY(20px)';
section.style.transition = 'opacity 0.6s ease, transform 0.6s ease';
observer.observe(section);
});
// Highlight active nav item on scroll
const sections = document.querySelectorAll('section');
const navLinks = document.querySelectorAll('nav a');
window.addEventListener('scroll', () => {
let current = '';
sections.forEach(section => {
const sectionTop = section.offsetTop;
const sectionHeight = section.clientHeight;
if (scrollY >= sectionTop - 200) {
current = section.getAttribute('id');
}
});
navLinks.forEach(link => {
link.style.background = '';
link.style.borderColor = 'transparent';
if (link.getAttribute('href') === '#' + current) {
link.style.background = 'rgba(0, 217, 255, 0.1)';
link.style.borderColor = 'var(--accent-cyan)';
}
});
});
</script>
</body>
</html>