🎯 What is a Compiler?
A compiler is a translator that converts human-readable code into machine instructions that your computer's hardware can execute directly.
Why Do We Need Compilers?
High-level languages let programmers express complex ideas quickly and clearly. But computers only understand machine code - sequences of numbers representing basic operations. Machine code is nearly impossible for humans to read or write directly, so compilers bridge this gap.
🏗️ The Three Stages
Every compiler follows a similar structure, broken into three main stages:
Frontend
Syntax checking, tokenization, parsing
Middle
Analysis & optimization
Backend
Code generation
Click any stage to learn more
💡 Key Insight
Think of a compiler like a translator converting a novel from English to Japanese:
- Frontend = Understanding the English (grammar, vocabulary, sentence structure)
- Middle = Deciding the best way to express each idea in Japanese
- Backend = Actually writing out the Japanese text
🔄 Complete Compilation Pipeline
Follow your code's journey from source to executable:
int x = 5 + 3;
Tokenize
Build AST
Improve Code
Create Instructions
Machine Code
📊 Data Structures Along the Way
🏷️ Tokens
Smallest meaningful units of code
🌳 Abstract Syntax Tree
Hierarchical program structure
📋 Symbol Table
Registry of all identifiers
🔢 Intermediate Representation
Platform-independent code form
📥 Compiler Frontend
The frontend's job is to understand your code and check it for errors.
🔤 Lexical Analyzer (Lexer)
The lexer reads your raw source code character by character and groups them into tokens.
What are Tokens?
Tokens are the smallest meaningful units in a programming language - like words in a sentence.
Example: How the lexer sees int count = 42;
Token Types
- Keywords - Reserved words like
if,while,return - Identifiers - Names you create: variables, functions, classes
- Literals - Values like numbers (
42) or strings ("hello") - Operators - Symbols like
+,-,=,== - Punctuation - Structural symbols like
;,{,}
📐 Parser
The parser takes tokens and checks if they follow the grammar rules of the language.
Grammar Rules Examples
What the Parser Catches
- Missing semicolons:
int x = 5← where's the;? - Unbalanced brackets:
if (x > 5 { }← missing) - Invalid assignments:
5 = x;← can't assign to a number!
The Output: Abstract Syntax Tree (AST)
If the code is valid, the parser builds a tree structure that represents the program's logic.
AST for: result = (a + b) * 2;
├── result
└── Multiply (*)
├── Add (+)
│ ├── a
│ └── b
└── 2
📋 Symbol Table
While parsing, the compiler builds a symbol table - a database of all identifiers in your program.
For this code:
int globalCount = 0;
void calculateSum(int a, int b) {
int result = a + b;
globalCount++;
}
| Name | Type | Scope | Details |
|---|---|---|---|
| globalCount | int | global | initialized to 0 |
| calculateSum | function | global | returns void, 2 params |
| a | int | calculateSum | parameter |
| b | int | calculateSum | parameter |
| result | int | calculateSum | local variable |
Why Symbol Tables Matter
- Type checking - Is
x + "hello"valid? - Scope resolution - Which
xare we referring to? - Memory allocation - Where should each variable live?
⚡ Middle Stage (Optimizer)
The middle stage makes your code faster and smaller without changing what it does.
This stage is optional but almost always used in production compilers.
🎯 Optimization Goals
🏃 Faster Execution
Reduce CPU cycles needed
📦 Smaller Code
Reduce executable size
💾 Less Memory
Reduce RAM usage
🔧 Common Optimizations
Dead Code Elimination
Remove code that never runs or variables that are never used.
Before
int x = 5; int y = 10; // never used! int z = x * 2; return z;
After
int x = 5; int z = x * 2; return z;
Constant Folding
Calculate constant expressions at compile time instead of runtime.
Before
int seconds = 60 * 60 * 24; int area = 3.14159 * 10 * 10;
After
int seconds = 86400; int area = 314.159;
Loop Optimization
Move calculations outside loops when possible.
Before
for (i = 0; i < n; i++) {
x = y * z; // same every iteration!
arr[i] = x + i;
}
After
x = y * z; // moved outside
for (i = 0; i < n; i++) {
arr[i] = x + i;
}
Strength Reduction
Replace expensive operations with cheaper equivalents.
Before
x = y * 2; z = w / 4; p = q * 8;
After (using bit shifts)
x = y << 1; // shift left = *2 z = w >> 2; // shift right = /4 p = q << 3; // shift left 3 = *8
📤 Compiler Backend
The backend transforms the optimized intermediate code into actual machine instructions for a specific CPU.
🎯 Backend Responsibilities
📍 Memory Layout
Organize data in RAM
⚙️ Instruction Selection
Choose CPU operations
📝 Register Allocation
Assign CPU registers
🔧 Code Generation Example
See how high-level code becomes machine instructions:
C Code
int sum = a + b;
x86 Assembly (simplified)
MOV EAX, [a] ; Load value of 'a' into register EAX ADD EAX, [b] ; Add value of 'b' to EAX MOV [sum], EAX ; Store result in memory location 'sum'
Instruction Templates
The backend uses templates for common patterns:
🛠️ Additional Compiler Tools
Compilers work with other tools to create final executables.
🔗 Linkers
Linkers combine separately compiled pieces into a single executable.
What Gets Linked?
- Your compiled code - The .o or .obj files from your source
- Standard libraries - printf, malloc, math functions
- Third-party libraries - Any external code you use
When you write printf("Hello"), you don't include the actual printf code:
Static vs Dynamic Linking
Static Linking
Library code is copied INTO your executable.
✅ Self-contained, ❌ Larger file
Dynamic Linking
Library code is loaded at runtime (.dll, .so).
✅ Smaller file, ❌ Needs library installed
📝 Assemblers
Assemblers translate assembly language into machine code.
Assembly vs Machine Code
What Assemblers Do
- Translate mnemonics - MOV, ADD, JMP → numeric opcodes
- Resolve labels - Convert names like
loop_startto addresses - Manage symbol table - Track function/variable locations
- Handle directives - .data, .text, memory alignment
💡 Assembly is much simpler to translate than high-level languages because there's almost a 1:1 mapping between assembly instructions and machine code.
🎮 Interactive Lexer Demo
Enter some code and watch it get tokenized!
Tokens
Symbol Table
🌳 AST Builder
Enter a simple expression to see its syntax tree:
Abstract Syntax Tree
📝 Test Your Knowledge
See how well you understand compiler internals!