Advanced Mathematics for Computer Science

Machine learning (gradient descent, backpropagation), computer graphics (physics simulations), numerical methods, algorithm complexity analysis, signal processing

Applications of Calculus

Optimization

Critical points: $f'(x) = 0$ or $f'(x)$ undefined Second derivative test: - If $f''(c) > 0$: local minimum at $x = c$ - If $f''(c) < 0$: local maximum at $x = c$ - If $f''(c) = 0$: test inconclusive

Related Rates

If variables $x$ and $y$ are related by an equation and vary with time $t$: $$\frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}$$ Example: Area of circle $A = \pi r^2$ $$\frac{dA}{dt} = 2\pi r \frac{dr}{dt}$$

Arc Length & Surface Area

Arc length: $L = \int_a^b \sqrt{1 + (f'(x))^2}\,dx$ Surface area (revolution about x-axis): $$S = 2\pi \int_a^b f(x)\sqrt{1 + (f'(x))^2}\,dx$$

Multivariable Calculus

Partial Derivatives & Gradients

Partial Derivatives

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h,y) - f(x,y)}{h}$$ Mixed partials theorem (Clairaut): $$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$$ (if both are continuous)

Gradient & Directional Derivatives

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$$ Directional derivative: $$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$ Maximum rate of change: $|\nabla f|

Chain Rule (Multivariable)

If $z = f(x,y)$ where $x = g(t)$ and $y = h(t)$: $$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$ General form: $$\frac{\partial z}{\partial t} = \sum_{i} \frac{\partial z}{\partial x_i}\frac{\partial x_i}{\partial t}$$

CS Applications:

Machine learning optimization (gradient descent), computer graphics (surface normals), image processing (edge detection), neural network training

Multiple Integrals

Double Integrals

\iint_R f(x,y) \, dA = \int_a^b \int_{c(x)}^{d(x)} f(x,y) \, dy \, dx$$ Fubini's Theorem: For continuous $f$, $$\int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy

Change of Variables

$\iint_R f(x,y) \, dx \, dy = \iint_S f(x(u,v), y(u,v)) |J| \, du \, dv$$ Jacobian: $J = \frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}

Cylindrical & Spherical Coordinates

Cylindrical: $\begin{cases} x = r\cos\theta \\ y = r\sin\theta \\ z = z \end{cases}$ $dV = r \, dr \, d\theta \, dz$ Spherical: $\begin{cases} x = \rho\sin\phi\cos\theta \\ y = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \end{cases}$ $dV = \rho^2\sin\phi \, d\rho \, d\phi \, d\theta$

Vector Calculus

Vector Fields & Line Integrals

$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt$$ Conservative field: $\nabla \times \mathbf{F} = \mathbf{0}$ If $\mathbf{F} = \nabla f$: $\int_C \mathbf{F} \cdot d\mathbf{r} = f(B) - f(A)

Green's Theorem

$\oint_C (P \, dx + Q \, dy) = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA$$ Area formula: $A = \frac{1}{2} \oint_C (x \, dy - y \, dx)

Divergence & Curl

\text{div } \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}$$ $$\text{curl } \mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P & Q & R \end{vmatrix}

Stokes' Theorem

\oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n} \, dS

Relates line integral around boundary to surface integral of curl

Divergence Theorem

\iint_S \mathbf{F} \cdot \mathbf{n} \, dS = \iiint_E \nabla \cdot \mathbf{F} \, dV

Relates surface integral to volume integral of divergence

Differential Equations

First-Order ODEs

Separable Equations

\frac{dy}{dx} = f(x)g(y)$$ Solution method: $$\int \frac{dy}{g(y)} = \int f(x) \, dx

Linear First-Order

$\frac{dy}{dx} + P(x)y = Q(x)$$ Integrating factor: $\mu(x) = e^{\int P(x) \, dx}$ Solution: $y = \frac{1}{\mu(x)}\left[\int \mu(x)Q(x) \, dx + C\right]

Exact Equations

$M(x,y) \, dx + N(x,y) \, dy = 0$$ Exact if: $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$ Solution: Find $F(x,y)$ where $\frac{\partial F}{\partial x} = M$ and $\frac{\partial F}{\partial y} = N

Second-Order Linear ODEs

Homogeneous with Constant Coefficients

$ay'' + by' + cy = 0$$ Characteristic equation: $ar^2 + br + c = 0$ Solutions: - If $r_1 \neq r_2$: $y = c_1e^{r_1x} + c_2e^{r_2x}$ - If $r_1 = r_2 = r$: $y = (c_1 + c_2x)e^{rx}$ - If $r = \alpha \pm \beta i$: $y = e^{\alpha x}(c_1\cos(\beta x) + c_2\sin(\beta x))

Method of Undetermined Coefficients

For $ay'' + by' + cy = f(x)$: - If $f(x) = e^{\alpha x}$: try $y_p = Ae^{\alpha x}$ - If $f(x) = \text{polynomial}$: try polynomial of same degree - If $f(x) = \sin(\alpha x)$ or $\cos(\alpha x)$: try $y_p = A\cos(\alpha x) + B\sin(\alpha x)$

Variation of Parameters

For $y'' + P(x)y' + Q(x)y = f(x)$ If $y_1, y_2$ are homogeneous solutions: $$y_p = -y_1\int\frac{y_2f}{W}dx + y_2\int\frac{y_1f}{W}dx$$ Wronskian: $W = y_1y_2' - y_2y_1'$

CS Applications:

System dynamics, control theory, signal processing, population models in algorithms, oscillator circuits, physics simulations

Laplace Transforms

Definition & Properties

$\mathcal{L}\{f(t)\} = F(s) = \int_0^{\infty} e^{-st}f(t) \, dt$$ Properties: - $\mathcal{L}\{af + bg\} = a\mathcal{L}\{f\} + b\mathcal{L}\{g\}$ - $\mathcal{L}\{f'\} = sF(s) - f(0)$ - $\mathcal{L}\{f''\} = s^2F(s) - sf(0) - f'(0)

Common Transforms

\mathcal{L}\{1\} = \frac{1}{s}$$ $$\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}$$ $$\mathcal{L}\{e^{at}\} = \frac{1}{s-a}$$ $$\mathcal{L}\{\sin(at)\} = \frac{a}{s^2+a^2}$$ $$\mathcal{L}\{\cos(at)\} = \frac{s}{s^2+a^2}

Systems of ODEs

Matrix Form

$\mathbf{x}' = A\mathbf{x} + \mathbf{b}$$ Solution: $\mathbf{x}(t) = e^{At}\mathbf{x}_0 + \int_0^t e^{A(t-\tau)}\mathbf{b}(\tau) \, d\tau$ Matrix exponential: $e^{At} = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}

Eigenvalue Method

If $A\mathbf{v} = \lambda\mathbf{v}$, then $\mathbf{x} = \mathbf{v}e^{\lambda t}$ is a solution General solution: $$\mathbf{x} = c_1\mathbf{v}_1e^{\lambda_1 t} + c_2\mathbf{v}_2e^{\lambda_2 t} + \cdots$$

Linear Algebra

Matrix Operations & Properties

Basic Operations

(AB)C = A(BC)$ - Associative $A(B + C) = AB + AC$ - Distributive $(AB)^T = B^T A^T$ $(AB)^{-1} = B^{-1}A^{-1}

Determinants

\det(AB) = \det(A)\det(B)$ $\det(A^T) = \det(A)$ $\det(A^{-1}) = \frac{1}{\det(A)}$ For $2 \times 2$: $\det\begin{pmatrix}a & b\\c & d\end{pmatrix} = ad - bc

Matrix Inverse

A^{-1} = \frac{1}{\det(A)} \cdot \text{adj}(A)$ For $2 \times 2$: $A^{-1} = \frac{1}{ad-bc}\begin{pmatrix}d & -b\\-c & a\end{pmatrix}$ Gauss-Jordan: $[A|I] \rightarrow [I|A^{-1}]

Vector Spaces & Linear Independence

Vector Space Axioms

A vector space $V$ over field $F$ satisfies:

1. $\mathbf{u} + \mathbf{v} \in V$ (closure under addition) 2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (commutativity) 3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ (associativity) 4. $\exists \mathbf{0} \in V: \mathbf{v} + \mathbf{0} = \mathbf{v}$ (additive identity) 5. $\forall \mathbf{v} \exists (-\mathbf{v}): \mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ (additive inverse) 6-10. Scalar multiplication axioms

Linear Independence

Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent if: $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0} \implies c_1 = c_2 = \cdots = c_n = 0$$ Matrix form: $A\mathbf{x} = \mathbf{0}$ has only trivial solution

Basis & Dimension

Basis: linearly independent set that spans the space $\dim(V) = $ number of vectors in any basis If $\dim(V) = n$, then any $n$ linearly independent vectors form a basis

Eigenvalues & Eigenvectors

Characteristic Equation

$A\mathbf{v} = \lambda\mathbf{v} \quad (\mathbf{v} \neq \mathbf{0})$$ Characteristic polynomial: $\det(A - \lambda I) = 0$ Eigenspace: $E_{\lambda} = \{\mathbf{v} : A\mathbf{v} = \lambda\mathbf{v}\} = \text{null}(A - \lambda I)

Diagonalization

$A = PDP^{-1}$$ where $D$ is diagonal $P = [\mathbf{v}_1 \, \mathbf{v}_2 \, \cdots \, \mathbf{v}_n]$ (eigenvectors) $D = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ (eigenvalues) $A^k = PD^k P^{-1}

Properties

$$\text{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$ $$\det(A) = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$ If $A$ symmetric: all eigenvalues real, eigenvectors orthogonal

CS Applications:

Google PageRank algorithm, Principal Component Analysis (PCA), computer graphics transformations, quantum computing gates, network analysis

Orthogonality & Least Squares

Inner Products & Norms

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$ $$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$ Cauchy-Schwarz: $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \|\mathbf{v}\|

Gram-Schmidt Process

$\mathbf{u}_1 = \mathbf{v}_1$$ $$\mathbf{u}_2 = \mathbf{v}_2 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_2$$ $$\mathbf{u}_3 = \mathbf{v}_3 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_3 - \text{proj}_{\mathbf{u}_2}\mathbf{v}_3$$ where $\text{proj}_{\mathbf{u}}\mathbf{v} = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\mathbf{u}

Least Squares

Normal equation: $A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$ Solution: $\hat{\mathbf{x}} = (A^T A)^{-1}A^T\mathbf{b}$ If $A$ has orthonormal columns: $\hat{\mathbf{x}} = A^T\mathbf{b}$

Singular Value Decomposition

SVD Theorem

Every $m \times n$ matrix $A$ can be written as:

$$A = U\Sigma V^T$$ $U$: $m \times m$ orthogonal (left singular vectors) $\Sigma$: $m \times n$ diagonal (singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$) $V$: $n \times n$ orthogonal (right singular vectors)

CS Applications:

Image compression, dimensionality reduction, recommender systems, data analysis, pseudoinverse computation

Discrete Mathematics

Logic & Proofs

Logical Operators

\neg$ (NOT), $\land$ (AND), $\lor$ (OR), $\rightarrow$ (IMPLIES), $\leftrightarrow$ (IFF) De Morgan's Laws: $$\neg(P \land Q) \equiv \neg P \lor \neg Q$$ $$\neg(P \lor Q) \equiv \neg P \land \neg Q$$ $P \rightarrow Q \equiv \neg P \lor Q

Quantifiers

\forall x \, P(x)$: "for all $x$, $P(x)$" $\exists x \, P(x)$: "there exists $x$ such that $P(x)$" $$\neg\forall x \, P(x) \equiv \exists x \, \neg P(x)$$ $$\neg\exists x \, P(x) \equiv \forall x \, \neg P(x)$

Proof Techniques

Direct proof: $P \rightarrow Q$ Contrapositive: $\neg Q \rightarrow \neg P$ Contradiction: Assume $\neg Q$ and derive contradiction Induction: Base case + inductive step

Set Theory

Set Operations

A \cup B = \{x : x \in A \text{ or } x \in B\}$ $A \cap B = \{x : x \in A \text{ and } x \in B\}$ $A - B = \{x : x \in A \text{ and } x \notin B\}$ $A^c = \{x : x \notin A\}$ $|A \cup B| = |A| + |B| - |A \cap B|

Relations

Reflexive: $\forall x \, (x,x) \in R$ Symmetric: $\forall x,y \, ((x,y) \in R \rightarrow (y,x) \in R)$ Transitive: $\forall x,y,z \, ((x,y) \in R \land (y,z) \in R \rightarrow (x,z) \in R)$ Equivalence relation: reflexive + symmetric + transitive

Functions

Injective (one-to-one): $\forall x,y \, (f(x) = f(y) \rightarrow x = y)$ Surjective (onto): $\forall y \, \exists x \, (f(x) = y)$ Bijective: injective + surjective $|A| = |B|$ iff $\exists$ bijection $f: A \rightarrow B$

Combinatorics

Counting Principles

Addition Principle: $|A \cup B| = |A| + |B|$ if $A \cap B = \emptyset$ Multiplication Principle: $|A \times B| = |A| \times |B|$ Pigeonhole Principle: If $n$ objects in $m$ boxes and $n > m$, then some box contains $> 1$ object

Permutations & Combinations

$P(n,r) = \frac{n!}{(n-r)!} \text{ (permutations)}$$ $$C(n,r) = \binom{n}{r} = \frac{n!}{r!(n-r)!} \text{ (combinations)}$$ Binomial theorem: $(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k}$ Stars and bars: $\binom{n+k-1}{k-1}

Inclusion-Exclusion

|A_1 \cup A_2 \cup \cdots \cup A_n| =$$ $$\sum |A_i| - \sum |A_i \cap A_j| + \sum |A_i \cap A_j \cap A_k| - \cdots$$ $$+ (-1)^{n+1}|A_1 \cap \cdots \cap A_n|

CS Applications:

Algorithm analysis, database query optimization, complexity theory, coding theory, cryptographic protocols

Generating Functions

Ordinary Generating Functions

$G(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1x + a_2x^2 + \cdots$$ Fibonacci: $F(x) = \frac{x}{1 - x - x^2}$ Geometric: $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n

Exponential Generating Functions

$$E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$$ $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ Useful for problems with labels/arrangements

Recurrence Relations

Linear Homogeneous

$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k}$$ Characteristic equation: $r^k - c_1r^{k-1} - \cdots - c_k = 0$ If roots distinct: $a_n = A_1r_1^n + A_2r_2^n + \cdots

Master Theorem

For $T(n) = aT(n/b) + f(n)$: Case 1: $f(n) = O(n^{\log_b a - \varepsilon}) \Rightarrow T(n) = \Theta(n^{\log_b a})$ Case 2: $f(n) = \Theta(n^{\log_b a}) \Rightarrow T(n) = \Theta(n^{\log_b a} \log n)$ Case 3: $f(n) = \Omega(n^{\log_b a + \varepsilon}) \Rightarrow T(n) = \Theta(f(n))$

Probability & Statistics

Probability Theory

Basic Probability

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$ $$P(A^c) = 1 - P(A)$$ $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$ Independence: $P(A \cap B) = P(A)P(B)

Bayes' Theorem

P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$ Law of Total Probability: $$P(B) = \sum_i P(B|A_i)P(A_i)

Random Variables

$E[X] = \sum_x x \cdot P(X = x) \text{ (discrete)}$$ $$E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx \text{ (continuous)}$$ $\text{Var}(X) = E[X^2] - (E[X])^2$ $\text{Var}(aX + b) = a^2\text{Var}(X)

Common Distributions

Discrete Distributions

Bernoulli: $P(X = 1) = p$, $P(X = 0) = 1-p$ Binomial: $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$ $E[X] = np$, $\text{Var}(X) = np(1-p)$ Poisson: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ $E[X] = \text{Var}(X) = \lambda$

Continuous Distributions

Uniform: $f(x) = \frac{1}{b-a}$ for $x \in [a,b]$ $E[X] = \frac{a+b}{2}$, $\text{Var}(X) = \frac{(b-a)^2}{12}$ Normal: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ $E[X] = \mu$, $\text{Var}(X) = \sigma^2$ Exponential: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ $E[X] = \frac{1}{\lambda}$, $\text{Var}(X) = \frac{1}{\lambda^2}$

Central Limit Theorem

If $X_1, X_2, \ldots, X_n$ are i.i.d. with $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$: $$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \text{ as } n \to \infty$$ where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$

Statistical Inference

Confidence Intervals

For mean $\mu$ ($\sigma$ known): $$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$ For mean $\mu$ ($\sigma$ unknown): $$\bar{x} \pm t_{\alpha/2,n-1} \times \frac{s}{\sqrt{n}}$$ For proportion $p$: $$\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Hypothesis Testing

Test statistic for mean: $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \text{ or } t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$ p-value = P(observing test statistic or more extreme | $H_0$) Reject $H_0$ if p-value $< \alpha$

Regression

Simple linear: $\hat{y} = a + bx$ $$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$ $$a = \bar{y} - b\bar{x}$$ $R^2 = 1 - \frac{SSE}{SST}$ (coefficient of determination)

CS Applications:

Machine learning algorithms, A/B testing, data mining, quality assurance, performance analysis, randomized algorithms

Numerical Analysis

Root Finding

Bisection Method

If $f(a)f(b) < 0$, then $\exists c \in (a,b): f(c) = 0$ Algorithm: 1. $c = \frac{a + b}{2}$ 2. If $f(a)f(c) < 0$: $b = c$, else $a = c$ 3. Repeat until $|b - a| < $ tolerance Convergence: $O(\log_2((b-a)/\varepsilon))$

Newton's Method

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$ Quadratic convergence when $f'(\text{root}) \neq 0$ May fail if $f'(x_n) \approx 0$ or poor initial guess

Secant Method

$x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$$ Approximates derivative using secant line Superlinear convergence: $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618

Interpolation

Lagrange Interpolation

$$P(x) = \sum_{i=0}^n y_i L_i(x)$$ where $L_i(x) = \prod_{j=0, j \neq i}^n \frac{x - x_j}{x_i - x_j}$ Unique polynomial of degree $\leq n$ through $n+1$ points

Newton's Divided Differences

$$P(x) = f[x_0] + f[x_0,x_1](x-x_0) + f[x_0,x_1,x_2](x-x_0)(x-x_1) + \cdots$$ $f[x_i,x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}$ More efficient for adding new points

Spline Interpolation

Cubic spline: $S(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$ Conditions: - $S(x_i) = y_i$ - $S'(x_i^-) = S'(x_i^+)$ - $S''(x_i^-) = S''(x_i^+)$

Numerical Integration

Composite Rules

Composite Trapezoidal: $$\int_a^b f(x)\,dx \approx \frac{h}{2}[f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)]$$ Error: $O(h^2)$ Composite Simpson's: $$\int_a^b f(x)\,dx \approx \frac{h}{3}[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + f(x_n)]$$ Error: $O(h^4)$

Gaussian Quadrature

$\int_{-1}^1 f(x)\,dx \approx \sum_{i=1}^n w_i f(x_i)$$ Optimal choice of points $x_i$ and weights $w_i$ $n$-point Gauss rule exact for polynomials of degree $\leq 2n-1

Ordinary Differential Equations

Euler's Method

$y_{n+1} = y_n + h \cdot f(x_n, y_n)$$ First-order accuracy: $O(h)$ Simple but may be unstable for large $h

Runge-Kutta Methods

RK4 (Fourth-order): $$k_1 = hf(x_n, y_n)$$ $$k_2 = hf(x_n + h/2, y_n + k_1/2)$$ $$k_3 = hf(x_n + h/2, y_n + k_2/2)$$ $$k_4 = hf(x_n + h, y_n + k_3)$$ $$y_{n+1} = y_n + \frac{k_1 + 2k_2 + 2k_3 + k_4}{6}$$

CS Applications:

Computer graphics (ray tracing), scientific computing, game physics, signal processing, finite element analysis

Complex Analysis

Complex Numbers

Basic Operations

$$z = x + iy = re^{i\theta} = r(\cos\theta + i\sin\theta)$$ $|z| = \sqrt{x^2 + y^2} = r$ $\arg(z) = \theta = \arctan(y/x)$ $z^* = x - iy = re^{-i\theta}$ (complex conjugate)

Polar Form

$z_1z_2 = r_1r_2e^{i(\theta_1+\theta_2)}$$ $$\frac{z_1}{z_2} = \frac{r_1}{r_2}e^{i(\theta_1-\theta_2)}$$ De Moivre's theorem: $z^n = r^n e^{in\theta}$ $n$th roots: $z^{1/n} = r^{1/n} e^{i(\theta+2\pi k)/n}$, $k = 0,1,\ldots,n-1

Analytic Functions

Cauchy-Riemann Equations

If $f(z) = u(x,y) + iv(x,y)$ is analytic:

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$ $$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Elementary Functions

$e^z = e^x(\cos y + i\sin y)$$ $$\sin z = \frac{e^{iz} - e^{-iz}}{2i}$$ $$\cos z = \frac{e^{iz} + e^{-iz}}{2}$$ $\log z = \ln|z| + i(\arg z + 2\pi n)

Power Series

$f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$$ Radius of convergence: $R = \frac{1}{\limsup_{n \to \infty} |a_n|^{1/n}}$ Convergent for $|z - z_0| < R

Integration & Residues

Cauchy's Theorem

If $f$ is analytic in simply connected domain $D$ and $C$ is a closed curve in $D$:

\oint_C f(z) \, dz = 0

Residue Theorem

\oint_C f(z) \, dz = 2\pi i \sum_k \text{Res}(f, z_k)

where $z_k$ are poles inside $C$

Computing Residues

Simple pole at $z_0$: $$\text{Res}(f, z_0) = \lim_{z \to z_0} (z - z_0)f(z)$$ Pole of order $n$: $$\text{Res}(f, z_0) = \frac{1}{(n-1)!} \lim_{z \to z_0} \frac{d^{n-1}}{dz^{n-1}}[(z-z_0)^n f(z)]$$

CS Applications:

Signal processing (Fourier transforms), control systems, quantum computing, computer graphics (fractals), digital filters

Graph Theory

Basic Definitions

Graph Properties

Graph $G = (V, E)$ where $V =$ vertices, $E =$ edges Order: $|V|$ (number of vertices) Size: $|E|$ (number of edges) Degree: $\deg(v) = $ number of edges incident to $v$ Handshaking lemma: $\sum_{v \in V} \deg(v) = 2|E|$

Special Graphs

Complete graph $K_n$: $|E| = \frac{n(n-1)}{2}$ Complete bipartite $K_{m,n}$: $|E| = mn$ Cycle $C_n$: $n$ vertices, $n$ edges Path $P_n$: $n$ vertices, $n-1$ edges Tree: connected, acyclic, $|E| = |V| - 1$

Matrix Representations

Adjacency matrix $A$: $A[i,j] = 1$ if edge $(i,j)$ exists Incidence matrix: rows = vertices, columns = edges Number of walks of length $k$ from $i$ to $j$: $(A^k)[i,j]$

Connectivity

Paths & Cycles

Walk: sequence of vertices $v_0, v_1, \ldots, v_k$ Trail: walk with distinct edges Path: walk with distinct vertices Connected: path exists between any two vertices Distance $d(u,v)$: length of shortest path from $u$ to $v$

Euler & Hamilton

Eulerian path: uses every edge exactly once Exists iff graph connected and $\leq 2$ odd-degree vertices Eulerian cycle: closed Eulerian path Exists iff graph connected and all vertices even degree Hamiltonian path: visits every vertex exactly once

Trees & Spanning Trees

Tree Properties

For a graph $G$ with $n$ vertices, these are equivalent:

1. $G$ is a tree 2. $G$ is connected and has $n-1$ edges 3. $G$ is acyclic and has $n-1$ edges 4. $G$ is connected and removing any edge disconnects it 5. $G$ is acyclic and adding any edge creates exactly one cycle

Kruskal's Algorithm (Minimum Spanning Tree)

                        1. Sort edges by weight
                        2. Initialize disjoint sets for each vertex
                        3. For each edge (u,v):
                           - If u and v in different components:
                             - Add edge to MST
                             - Union components of u and v
                        4. Stop when n-1 edges added
                        
                        Time complexity: O(E log E)
                    

Prim's Algorithm

                        1. Start with arbitrary vertex
                        2. Maintain priority queue of edges from current tree
                        3. Repeatedly add minimum weight edge that connects
                           tree to new vertex
                        4. Update priority queue
                        
                        Time complexity: O(E log V) with binary heap
                    

Shortest Paths

Dijkstra's Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. Priority queue Q with all vertices
                        3. While Q not empty:
                           - u = extract minimum from Q
                           - For each neighbor v of u:
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                               - parent[v] = u
                        
                        Time: O((V + E) log V) with binary heap
                        Requirements: non-negative weights
                    

Bellman-Ford Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. For i = 1 to |V| - 1:
                           - For each edge (u,v):
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                        3. Check for negative cycles
                        
                        Time: O(VE)
                        Handles negative weights
                    

Floyd-Warshall Algorithm

                        1. Initialize distance matrix D
                        2. For k = 1 to n:
                           - For i = 1 to n:
                             - For j = 1 to n:
                               - D[i][j] = min(D[i][j], D[i][k] + D[k][j])
                        
                        Time: O(V³)
                        Finds all-pairs shortest paths
                    

Network Flow

Max-Flow Min-Cut Theorem

In any flow network, the maximum value of flow equals the minimum capacity of all cuts.

Ford-Fulkerson Method

                        1. Initialize flow f = 0
                        2. While there exists augmenting path P from source to sink:
                           - Find bottleneck capacity c along P
                           - Augment flow along P by c
                           - Update residual graph
                        3. Return maximum flow
                        
                        Time: O(E × max_flow) for integer capacities
                        With Edmonds-Karp (BFS): O(VE²)
                    

CS Applications:

Social networks, internet routing, database optimization, compiler optimization, bioinformatics, recommendation systems

Graph Coloring

Chromatic Number

\chi(G) = $ minimum number of colors needed $\chi(K_n) = n$ $\chi(C_n) = 2$ if $n$ even, $3$ if $n$ odd $\chi(\text{bipartite graph}) = 2$ Four Color Theorem: $\chi(\text{planar graph}) \leq 4

Greedy Coloring

Brooks' Theorem: If $G$ connected, not complete, not odd cycle, then $\chi(G) \leq \Delta(G)$ where $\Delta(G) = $ maximum degree

Number Theory

Divisibility & GCD

Division Algorithm

For integers $a, b$ with $b > 0$: $$a = qb + r \text{ where } 0 \leq r < b$$ $q = $ quotient, $r = $ remainder $a \equiv r \pmod{b}$

Euclidean Algorithm

$\gcd(a, b) = \gcd(b, a \bmod b)$$ Extended Euclidean Algorithm: $$\gcd(a, b) = ax + by$$ for some integers $x, y$ Time complexity: $O(\log \min(a, b))

Properties

$\gcd(a, b) \times \text{lcm}(a, b) = ab$$ If $\gcd(a, n) = 1$ and $\gcd(b, n) = 1$, then $\gcd(ab, n) = 1$ Bézout's identity: $\gcd(a, b) = ax + by

Modular Arithmetic

Congruences

a \equiv b \pmod{n}$ iff $n \mid (a - b)$ Properties: $(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$ $(a \times b) \bmod n = ((a \bmod n) \times (b \bmod n)) \bmod n$ Reflexive: $a \equiv a \pmod{n}$ Symmetric: $a \equiv b \implies b \equiv a \pmod{n}$ Transitive: $a \equiv b, b \equiv c \implies a \equiv c \pmod{n}

Modular Inverse

a^{-1} \pmod{n}$ exists iff $\gcd(a, n) = 1$ Finding inverse: use Extended Euclidean Algorithm $ax + ny = 1 \implies ax \equiv 1 \pmod{n} \implies a^{-1} \equiv x \pmod{n}$ Fermat's Little Theorem: If $p$ prime, $\gcd(a, p) = 1$: $$a^{-1} \equiv a^{p-2} \pmod{p}$

Prime Numbers

Fundamental Theorem of Arithmetic

Every integer $n > 1$ has a unique prime factorization:

n = p_1^{a_1} \times p_2^{a_2} \times \cdots \times p_k^{a_k}

where $p_1 < p_2 < \cdots < p_k$ are primes and $a_i > 0$

Prime Testing

Trial division: test divisibility up to $\sqrt{n}$ Time: $O(\sqrt{n})$ Sieve of Eratosthenes: find all primes $\leq n$ Time: $O(n \log \log n)$ Miller-Rabin: probabilistic primality test Time: $O(k \log^3 n)$ for $k$ rounds

Prime Distribution

Prime Number Theorem: $$\pi(n) \approx \frac{n}{\ln(n)}$$ where $\pi(n) = $ number of primes $\leq n$ There are infinitely many primes

Euler's Function

Definition & Properties

$\varphi(n) = |\{k : 1 \leq k \leq n, \gcd(k, n) = 1\}|$$ $\varphi(p) = p - 1$ for prime $p$ $\varphi(p^k) = p^k - p^{k-1} = p^{k-1}(p - 1)$ If $\gcd(m, n) = 1$: $\varphi(mn) = \varphi(m)\varphi(n)

Euler's Theorem

If $\gcd(a, n) = 1$, then: $$a^{\varphi(n)} \equiv 1 \pmod{n}$$ Fermat's Little Theorem (special case): If $p$ prime and $\gcd(a, p) = 1$: $$a^{p-1} \equiv 1 \pmod{p}$$

Chinese Remainder Theorem

CRT Statement

If $n_1, n_2, \ldots, n_k$ are pairwise coprime, then the system:

\begin{cases} x \equiv a_1 \pmod{n_1} \\ x \equiv a_2 \pmod{n_2} \\ \vdots \\ x \equiv a_k \pmod{n_k} \end{cases}

has a unique solution modulo $N = n_1n_2 \cdots n_k$

CRT Algorithm

                        1. Compute N = n₁n₂...nₖ
                        2. For each i, compute Nᵢ = N/nᵢ
                        3. Find Mᵢ such that NᵢMᵢ ≡ 1 (mod nᵢ)
                        4. Solution: x ≡ Σ aᵢNᵢMᵢ (mod N)
                    

CS Applications:

Cryptography (RSA, ElGamal), hash functions, pseudorandom number generation, error-correcting codes, computer algebra

Algorithm Analysis

Asymptotic Notation

Big O Notation

$f(n) = O(g(n))$ if $\exists c > 0, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$ Upper bound: $f$ grows no faster than $g$ $f(n) = \Omega(g(n))$ if $g(n) = O(f(n))$ Lower bound: $f$ grows at least as fast as $g$ $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ Tight bound: $f$ and $g$ grow at same rate

Common Complexities

$$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$$ Logarithmic: binary search, balanced tree operations Linear: linear search, single loop Linearithmic: merge sort, heap sort Quadratic: nested loops, simple sorting Exponential: brute force subsets Factorial: brute force permutations

Sorting Algorithms

Algorithm	Best Case	Average Case	Worst Case	Space	Stable
Bubble Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Selection Sort	O(n²)	O(n²)	O(n²)	O(1)	No
Insertion Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Merge Sort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes
Quick Sort	O(n log n)	O(n log n)	O(n²)	O(log n)	No
Heap Sort	O(n log n)	O(n log n)	O(n log n)	O(1)	No
Counting Sort	O(n + k)	O(n + k)	O(n + k)	O(k)	Yes
Radix Sort	O(d(n + k))	O(d(n + k))	O(d(n + k))	O(n + k)	Yes

Search Algorithms

Binary Search

Prerequisite: sorted array Algorithm: ``` while low ≤ high: mid = (low + high) / 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 ``` Time: $O(\log n)$, Space: $O(1)$

Hash Table Search

Average case: $O(1)$ for search, insert, delete Worst case: $O(n)$ if many collisions Load factor $\alpha = \frac{n}{m}$ (items/buckets) Keep $\alpha < 0.75$ for good performance Collision resolution: chaining, open addressing

Dynamic Programming

Optimal Substructure

A problem has optimal substructure if optimal solution contains optimal solutions to subproblems Examples: shortest paths, matrix chain multiplication, optimal binary search trees, edit distance

Overlapping Subproblems

Recursive algorithm solves same subproblems repeatedly Memoization: top-down approach, store results Tabulation: bottom-up approach, fill table Fibonacci: naive $O(2^n)$, DP $O(n)$

Classic DP Problems

                        Knapsack Problem: max value with weight constraint
                        DP[i][w] = max value using first i items, weight ≤ w
                        
                        Longest Common Subsequence:
                        LCS[i][j] = length of LCS of first i chars of X, first j chars of Y
                        
                        Edit Distance (Levenshtein):
                        ED[i][j] = min operations to transform X[1..i] to Y[1..j]
                    

Greedy Algorithms

Greedy Choice Property

Global optimum can be reached by making locally optimal choices at each step

Activity Selection

Select maximum number of non-overlapping activities Greedy strategy: sort by finish time, always pick earliest finishing activity Time: $O(n \log n)$

Huffman Coding

Optimal prefix-free binary codes Greedy strategy: repeatedly merge two nodes with minimum frequency Time: $O(n \log n)$

Complexity Classes

P vs NP

P: problems solvable in polynomial time NP: problems verifiable in polynomial time NP-Complete: hardest problems in NP NP-Hard: at least as hard as NP-Complete $P \subseteq NP$, but $P = NP$? is open question

Reduction

Problem $A$ reduces to $B$ ($A \leq_p B$) if $A$ can be solved using polynomial number of calls to $B$ If $A \leq_p B$ and $B \in P$, then $A \in P$ Used to prove NP-completeness

Cryptography

Classical Cryptography

Caesar Cipher

Encryption: $E(x) = (x + k) \bmod 26$ Decryption: $D(y) = (y - k) \bmod 26$ Key space: 26 possible keys Vulnerable to frequency analysis

Affine Cipher

Encryption: $E(x) = (ax + b) \bmod 26$ Decryption: $D(y) = a^{-1}(y - b) \bmod 26$ Requirement: $\gcd(a, 26) = 1$ Key space: $\varphi(26) \times 26 = 12 \times 26 = 312$

Vigenère Cipher

Encryption: $C_i = (M_i + K_{i \bmod \text{keylen}}) \bmod 26$ Polyalphabetic substitution Broken by Kasiski examination and index of coincidence

Public Key Cryptography

RSA Algorithm

Key Generation: 1. Choose large primes $p, q$ 2. $n = pq$, $\varphi(n) = (p-1)(q-1)$ 3. Choose $e$: $\gcd(e, \varphi(n)) = 1$ 4. Find $d$: $ed \equiv 1 \pmod{\varphi(n)}$ Public key: $(n, e)$ Private key: $(n, d)$ Encryption: $C = M^e \bmod n$ Decryption: $M = C^d \bmod n$

Why RSA works: By Euler's theorem, if $\gcd(M, n) = 1$, then $M^{\varphi(n)} \equiv 1 \pmod{n}$. Since $ed \equiv 1 \pmod{\varphi(n)}$, we have $ed = 1 + k\varphi(n)$ for some $k$. Therefore: $(M^e)^d = M^{ed} = M^{1+k\varphi(n)} = M \cdot (M^{\varphi(n)})^k \equiv M \cdot 1^k = M \pmod{n}$

Diffie-Hellman Key Exchange

Public: prime $p$, generator $g$ Alice: chooses $a$, sends $g^a \bmod p$ Bob: chooses $b$, sends $g^b \bmod p$ Shared secret: $K = g^{ab} \bmod p$ Security based on discrete log problem

ElGamal Encryption

Key Generation: Choose prime $p$, generator $g$, private key $x$ Public key: $y = g^x \bmod p$ Encryption of $M$: Choose random $k$, send $(g^k \bmod p, M \cdot y^k \bmod p)$ Decryption: $M = c_2 \cdot (c_1^x)^{-1} \bmod p$

Hash Functions

Properties

Pre-image resistance: given $h$, hard to find $x$ with $H(x) = h$ Second pre-image resistance: given $x$, hard to find $x' \neq x$ with $H(x) = H(x')$ Collision resistance: hard to find $x, x'$ with $H(x) = H(x')$ Birthday paradox: expect collision after $\sqrt{2^n}$ inputs for $n$-bit hash

Applications

Digital signatures: sign hash instead of message Password storage: store $H(\text{password} + \text{salt})$ Proof of work: find nonce with $H(\text{block} + \text{nonce}) < \text{target}$ Data integrity: compare hash values

Digital Signatures

RSA Signatures

Signing: $S = H(M)^d \bmod n$ Verification: check if $S^e \equiv H(M) \pmod{n}$ Must use padding (PSS) to prevent attacks

DSA (Digital Signature Algorithm)

Parameters: prime $p$, $q \mid (p-1)$, generator $g$ Private key: $x$, Public key: $y = g^x \bmod p$ Signing $M$: $k = $ random, $r = (g^k \bmod p) \bmod q$ $s = k^{-1}(H(M) + xr) \bmod q$ Signature: $(r, s)$ Verification: check if $r = ((g^{H(M)s^{-1}} \cdot y^{rs^{-1}}) \bmod p) \bmod q$

CS Applications:

Secure communications, blockchain technology, password authentication, digital certificates, software signing, cryptocurrency

Optimization

Linear Programming

Standard Form

Minimize: $\mathbf{c}^T \mathbf{x}$ Subject to: $A\mathbf{x} = \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Can convert any LP to standard form: - $\max f \rightarrow \min(-f)$ - inequality $\rightarrow$ equality (add slack variables) - unrestricted variable $\rightarrow$ difference of non-negative variables

Simplex Method

1. Start at basic feasible solution (vertex of polytope) 2. Check optimality conditions 3. If not optimal, move to adjacent vertex that improves objective 4. Repeat until optimal or unbounded Worst case: exponential Average case: polynomial

Fundamental Theorem of Linear Programming

If LP has optimal solution, then it has optimal basic feasible solution (vertex of feasible region)

Duality Theorem

Primal: $\min \mathbf{c}^T \mathbf{x}$ subject to $A\mathbf{x} \geq \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Dual: $\max \mathbf{b}^T \mathbf{y}$ subject to $A^T \mathbf{y} \leq \mathbf{c}$, $\mathbf{y} \geq \mathbf{0}$ Strong duality: if both have optimal solutions, optimal values are equal

Nonlinear Optimization

Unconstrained Optimization

First-order condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$ Second-order condition: $\nabla^2 f(\mathbf{x}^*)$ positive definite Gradient descent: $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)$ Newton's method: $\mathbf{x}_{k+1} = \mathbf{x}_k - [\nabla^2 f(\mathbf{x}_k)]^{-1} \nabla f(\mathbf{x}_k)$

Constrained Optimization

KKT conditions for $\min f(\mathbf{x})$ subject to $g(\mathbf{x}) \leq \mathbf{0}$, $h(\mathbf{x}) = \mathbf{0}$: $\nabla f(\mathbf{x}^*) + \sum_i \lambda_i \nabla g_i(\mathbf{x}^*) + \sum_j \mu_j \nabla h_j(\mathbf{x}^*) = \mathbf{0}$ $g_i(\mathbf{x}^*) \leq 0$, $h_j(\mathbf{x}^*) = 0$ $\lambda_i \geq 0$, $\lambda_i g_i(\mathbf{x}^*) = 0$ (complementary slackness)

Convex Optimization

Convex Sets & Functions

Convex set: $\forall \mathbf{x}, \mathbf{y} \in S, \forall \lambda \in [0,1]: \lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in S$ Convex function: $f(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y})$ Properties: - Local minimum = global minimum - First-order condition sufficient for optimality

Interior Point Methods

Barrier method: replace constraints $g(\mathbf{x}) \leq \mathbf{0}$ with penalty terms $-t \sum_i \log(-g_i(\mathbf{x}))$ As $t \to \infty$, solution approaches constrained optimum Polynomial time complexity for convex problems

CS Applications:

Machine learning (SVM, neural network training), resource allocation, network flow optimization, compiler optimization, game theory

Integer Programming

Branch and Bound

1. Solve LP relaxation 2. If solution integer, update best bound 3. If not integer, branch on fractional variable 4. Recursively solve subproblems 5. Prune if bound worse than current best NP-hard in general, but practical for many instances

Cutting Planes

Add linear constraints (cuts) that remove fractional solutions but preserve all integer solutions Gomory cuts: derived from optimal LP tableau Problem-specific cuts often more effective

Mathematical Software & Tools

Symbolic Computation

Mathematica, Maple, SymPy (Python) Exact arithmetic, symbolic integration/differentiation, equation solving, simplification Computer algebra systems

Numerical Computing

MATLAB, NumPy/SciPy (Python), R Linear algebra (BLAS, LAPACK) Optimization (CPLEX, Gurobi) Statistics and data analysis

Proof Assistants

Coq, Lean, Agda, Isabelle/HOL Formal verification of mathematical proofs Computer-checked mathematics Growing importance in CS theory

Computational Complexity

Complexity Zoo: database of complexity classes P, NP, PSPACE, EXPTIME, etc. Reductions and completeness results Interactive proofs, quantum complexity

Study Strategy for Advanced Math in CS:

Focus on applications: understand how each mathematical concept applies to computer science problems. Practice implementation: code algorithms and verify theoretical results computationally. Connect concepts: see relationships between different mathematical areas and their CS applications. Stay current: mathematics in CS is rapidly evolving, especially in machine learning and quantum computing.

Calculus

Multivariable Calculus

Differential Equations

Linear Algebra

Discrete Mathematics

Probability & Statistics

Calculus

Limits

Definition of a Limit

\lim_{x \to a} f(x) = L$$ means: For every $\varepsilon > 0$, there exists $\delta > 0$ such that $$0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon

Important Limits

\lim_{x \to 0} \frac{\sin x}{x} = 1$$ $$\lim_{x \to 0} \frac{1 - \cos x}{x} = 0$$ $$\lim_{x \to 0} \frac{e^x - 1}{x} = 1$$ $$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = e

L'Hôpital's Rule

If $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$ or $\pm\infty$, then: $$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$ (provided the right side exists)

Derivatives

Definition

$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$ Alternative notation: $\frac{df}{dx}$, $\frac{d}{dx}f(x)$, $D_x f

Derivative Rules

Power Rule: $\frac{d}{dx}(x^n) = nx^{n-1}$ Product Rule: $(fg)' = f'g + fg'$ Quotient Rule: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ Chain Rule: $\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$

Common Derivatives

\frac{d}{dx}(\sin x) = \cos x \qquad \frac{d}{dx}(\cos x) = -\sin x$$ $$\frac{d}{dx}(e^x) = e^x \qquad \frac{d}{dx}(\ln x) = \frac{1}{x}$$ $$\frac{d}{dx}(\tan x) = \sec^2 x \qquad \frac{d}{dx}(\arctan x) = \frac{1}{1+x^2}

Mean Value Theorem

If $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$, then there exists $c \in (a,b)$ such that: $$f'(c) = \frac{f(b) - f(a)}{b - a}$$

Integrals

Definite Integral

$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*) \Delta x$$ where $\Delta x = \frac{b-a}{n}$ and $x_i^* \in [x_{i-1}, x_i]

Fundamental Theorem of Calculus

Part 1: If $F(x) = \int_a^x f(t)\,dt$, then $F'(x) = f(x)$ Part 2: If $F'(x) = f(x)$, then $$\int_a^b f(x)\,dx = F(b) - F(a)$$

Integration Techniques

Substitution: $\int f(g(x))g'(x)\,dx = \int f(u)\,du$ Integration by Parts: $\int u\,dv = uv - \int v\,du$ Partial Fractions: Decompose rational functions

Common Integrals

\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$ $$\int \frac{1}{x}\,dx = \ln|x| + C$$ $$\int e^x\,dx = e^x + C$$ $$\int \sin x\,dx = -\cos x + C$$ $$\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C

Series and Sequences

Sequences

A sequence $\{a_n\}$ converges to $L$ if: $$\lim_{n \to \infty} a_n = L$$ Common limits: $$\lim_{n \to \infty} \frac{1}{n} = 0 \qquad \lim_{n \to \infty} \sqrt[n]{n} = 1$$ $$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$$

Series Convergence Tests

Geometric Series: $\sum_{n=0}^{\infty} ar^n = \frac{a}{1-r}$ if $|r| < 1$ p-Series: $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges if $p > 1$ Ratio Test: If $\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = L$ then series converges if $L < 1$

Taylor Series

f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$ Common Taylor series: $$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$ $$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$$ $$\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n}

CS Applications:

Machine learning (gradient descent, backpropagation), computer graphics (physics simulations), numerical methods, algorithm complexity analysis, signal processing

Applications of Calculus

Optimization

Critical points: $f'(x) = 0$ or $f'(x)$ undefined Second derivative test: - If $f''(c) > 0$: local minimum at $x = c$ - If $f''(c) < 0$: local maximum at $x = c$ - If $f''(c) = 0$: test inconclusive

Related Rates

If variables $x$ and $y$ are related by an equation and vary with time $t$: $$\frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}$$ Example: Area of circle $A = \pi r^2$ $$\frac{dA}{dt} = 2\pi r \frac{dr}{dt}$$

Arc Length & Surface Area

Arc length: $L = \int_a^b \sqrt{1 + (f'(x))^2}\,dx$ Surface area (revolution about x-axis): $$S = 2\pi \int_a^b f(x)\sqrt{1 + (f'(x))^2}\,dx$$

Multivariable Calculus

Partial Derivatives & Gradients

Partial Derivatives

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h,y) - f(x,y)}{h}$$ Mixed partials theorem (Clairaut): $$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$$ (if both are continuous)

Gradient & Directional Derivatives

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$$ Directional derivative: $$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$ Maximum rate of change: $|\nabla f|

Chain Rule (Multivariable)

If $z = f(x,y)$ where $x = g(t)$ and $y = h(t)$: $$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$ General form: $$\frac{\partial z}{\partial t} = \sum_{i} \frac{\partial z}{\partial x_i}\frac{\partial x_i}{\partial t}$$

CS Applications:

Machine learning optimization (gradient descent), computer graphics (surface normals), image processing (edge detection), neural network training

Multiple Integrals

Double Integrals

\iint_R f(x,y) \, dA = \int_a^b \int_{c(x)}^{d(x)} f(x,y) \, dy \, dx$$ Fubini's Theorem: For continuous $f$, $$\int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy

Change of Variables

$\iint_R f(x,y) \, dx \, dy = \iint_S f(x(u,v), y(u,v)) |J| \, du \, dv$$ Jacobian: $J = \frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}

Cylindrical & Spherical Coordinates

Cylindrical: $\begin{cases} x = r\cos\theta \\ y = r\sin\theta \\ z = z \end{cases}$ $dV = r \, dr \, d\theta \, dz$ Spherical: $\begin{cases} x = \rho\sin\phi\cos\theta \\ y = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \end{cases}$ $dV = \rho^2\sin\phi \, d\rho \, d\phi \, d\theta$

Vector Calculus

Vector Fields & Line Integrals

$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt$$ Conservative field: $\nabla \times \mathbf{F} = \mathbf{0}$ If $\mathbf{F} = \nabla f$: $\int_C \mathbf{F} \cdot d\mathbf{r} = f(B) - f(A)

Green's Theorem

$\oint_C (P \, dx + Q \, dy) = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA$$ Area formula: $A = \frac{1}{2} \oint_C (x \, dy - y \, dx)

Divergence & Curl

\text{div } \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}$$ $$\text{curl } \mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P & Q & R \end{vmatrix}

Stokes' Theorem

\oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n} \, dS

Relates line integral around boundary to surface integral of curl

Divergence Theorem

\iint_S \mathbf{F} \cdot \mathbf{n} \, dS = \iiint_E \nabla \cdot \mathbf{F} \, dV

Relates surface integral to volume integral of divergence

Differential Equations

First-Order ODEs

Separable Equations

\frac{dy}{dx} = f(x)g(y)$$ Solution method: $$\int \frac{dy}{g(y)} = \int f(x) \, dx

Linear First-Order

$\frac{dy}{dx} + P(x)y = Q(x)$$ Integrating factor: $\mu(x) = e^{\int P(x) \, dx}$ Solution: $y = \frac{1}{\mu(x)}\left[\int \mu(x)Q(x) \, dx + C\right]

Exact Equations

$M(x,y) \, dx + N(x,y) \, dy = 0$$ Exact if: $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$ Solution: Find $F(x,y)$ where $\frac{\partial F}{\partial x} = M$ and $\frac{\partial F}{\partial y} = N

Second-Order Linear ODEs

Homogeneous with Constant Coefficients

$ay'' + by' + cy = 0$$ Characteristic equation: $ar^2 + br + c = 0$ Solutions: - If $r_1 \neq r_2$: $y = c_1e^{r_1x} + c_2e^{r_2x}$ - If $r_1 = r_2 = r$: $y = (c_1 + c_2x)e^{rx}$ - If $r = \alpha \pm \beta i$: $y = e^{\alpha x}(c_1\cos(\beta x) + c_2\sin(\beta x))

Method of Undetermined Coefficients

For $ay'' + by' + cy = f(x)$: - If $f(x) = e^{\alpha x}$: try $y_p = Ae^{\alpha x}$ - If $f(x) = \text{polynomial}$: try polynomial of same degree - If $f(x) = \sin(\alpha x)$ or $\cos(\alpha x)$: try $y_p = A\cos(\alpha x) + B\sin(\alpha x)$

Variation of Parameters

For $y'' + P(x)y' + Q(x)y = f(x)$ If $y_1, y_2$ are homogeneous solutions: $$y_p = -y_1\int\frac{y_2f}{W}dx + y_2\int\frac{y_1f}{W}dx$$ Wronskian: $W = y_1y_2' - y_2y_1'$

CS Applications:

System dynamics, control theory, signal processing, population models in algorithms, oscillator circuits, physics simulations

Laplace Transforms

Definition & Properties

$\mathcal{L}\{f(t)\} = F(s) = \int_0^{\infty} e^{-st}f(t) \, dt$$ Properties: - $\mathcal{L}\{af + bg\} = a\mathcal{L}\{f\} + b\mathcal{L}\{g\}$ - $\mathcal{L}\{f'\} = sF(s) - f(0)$ - $\mathcal{L}\{f''\} = s^2F(s) - sf(0) - f'(0)

Common Transforms

\mathcal{L}\{1\} = \frac{1}{s}$$ $$\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}$$ $$\mathcal{L}\{e^{at}\} = \frac{1}{s-a}$$ $$\mathcal{L}\{\sin(at)\} = \frac{a}{s^2+a^2}$$ $$\mathcal{L}\{\cos(at)\} = \frac{s}{s^2+a^2}

Systems of ODEs

Matrix Form

$\mathbf{x}' = A\mathbf{x} + \mathbf{b}$$ Solution: $\mathbf{x}(t) = e^{At}\mathbf{x}_0 + \int_0^t e^{A(t-\tau)}\mathbf{b}(\tau) \, d\tau$ Matrix exponential: $e^{At} = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}

Eigenvalue Method

If $A\mathbf{v} = \lambda\mathbf{v}$, then $\mathbf{x} = \mathbf{v}e^{\lambda t}$ is a solution General solution: $$\mathbf{x} = c_1\mathbf{v}_1e^{\lambda_1 t} + c_2\mathbf{v}_2e^{\lambda_2 t} + \cdots$$

Linear Algebra

Matrix Operations & Properties

Basic Operations

(AB)C = A(BC)$ - Associative $A(B + C) = AB + AC$ - Distributive $(AB)^T = B^T A^T$ $(AB)^{-1} = B^{-1}A^{-1}

Determinants

\det(AB) = \det(A)\det(B)$ $\det(A^T) = \det(A)$ $\det(A^{-1}) = \frac{1}{\det(A)}$ For $2 \times 2$: $\det\begin{pmatrix}a & b\\c & d\end{pmatrix} = ad - bc

Matrix Inverse

A^{-1} = \frac{1}{\det(A)} \cdot \text{adj}(A)$ For $2 \times 2$: $A^{-1} = \frac{1}{ad-bc}\begin{pmatrix}d & -b\\-c & a\end{pmatrix}$ Gauss-Jordan: $[A|I] \rightarrow [I|A^{-1}]

Vector Spaces & Linear Independence

Vector Space Axioms

A vector space $V$ over field $F$ satisfies:

1. $\mathbf{u} + \mathbf{v} \in V$ (closure under addition) 2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (commutativity) 3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ (associativity) 4. $\exists \mathbf{0} \in V: \mathbf{v} + \mathbf{0} = \mathbf{v}$ (additive identity) 5. $\forall \mathbf{v} \exists (-\mathbf{v}): \mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ (additive inverse) 6-10. Scalar multiplication axioms

Linear Independence

Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent if: $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0} \implies c_1 = c_2 = \cdots = c_n = 0$$ Matrix form: $A\mathbf{x} = \mathbf{0}$ has only trivial solution

Basis & Dimension

Basis: linearly independent set that spans the space $\dim(V) = $ number of vectors in any basis If $\dim(V) = n$, then any $n$ linearly independent vectors form a basis

Eigenvalues & Eigenvectors

Characteristic Equation

$A\mathbf{v} = \lambda\mathbf{v} \quad (\mathbf{v} \neq \mathbf{0})$$ Characteristic polynomial: $\det(A - \lambda I) = 0$ Eigenspace: $E_{\lambda} = \{\mathbf{v} : A\mathbf{v} = \lambda\mathbf{v}\} = \text{null}(A - \lambda I)

Diagonalization

$A = PDP^{-1}$$ where $D$ is diagonal $P = [\mathbf{v}_1 \, \mathbf{v}_2 \, \cdots \, \mathbf{v}_n]$ (eigenvectors) $D = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ (eigenvalues) $A^k = PD^k P^{-1}

Properties

$$\text{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$ $$\det(A) = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$ If $A$ symmetric: all eigenvalues real, eigenvectors orthogonal

CS Applications:

Google PageRank algorithm, Principal Component Analysis (PCA), computer graphics transformations, quantum computing gates, network analysis

Orthogonality & Least Squares

Inner Products & Norms

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$ $$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$ Cauchy-Schwarz: $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \|\mathbf{v}\|

Gram-Schmidt Process

$\mathbf{u}_1 = \mathbf{v}_1$$ $$\mathbf{u}_2 = \mathbf{v}_2 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_2$$ $$\mathbf{u}_3 = \mathbf{v}_3 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_3 - \text{proj}_{\mathbf{u}_2}\mathbf{v}_3$$ where $\text{proj}_{\mathbf{u}}\mathbf{v} = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\mathbf{u}

Least Squares

Normal equation: $A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$ Solution: $\hat{\mathbf{x}} = (A^T A)^{-1}A^T\mathbf{b}$ If $A$ has orthonormal columns: $\hat{\mathbf{x}} = A^T\mathbf{b}$

Singular Value Decomposition

SVD Theorem

Every $m \times n$ matrix $A$ can be written as:

$$A = U\Sigma V^T$$ $U$: $m \times m$ orthogonal (left singular vectors) $\Sigma$: $m \times n$ diagonal (singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$) $V$: $n \times n$ orthogonal (right singular vectors)

CS Applications:

Image compression, dimensionality reduction, recommender systems, data analysis, pseudoinverse computation

Discrete Mathematics

Logic & Proofs

Logical Operators

\neg$ (NOT), $\land$ (AND), $\lor$ (OR), $\rightarrow$ (IMPLIES), $\leftrightarrow$ (IFF) De Morgan's Laws: $$\neg(P \land Q) \equiv \neg P \lor \neg Q$$ $$\neg(P \lor Q) \equiv \neg P \land \neg Q$$ $P \rightarrow Q \equiv \neg P \lor Q

Quantifiers

\forall x \, P(x)$: "for all $x$, $P(x)$" $\exists x \, P(x)$: "there exists $x$ such that $P(x)$" $$\neg\forall x \, P(x) \equiv \exists x \, \neg P(x)$$ $$\neg\exists x \, P(x) \equiv \forall x \, \neg P(x)$

Proof Techniques

Direct proof: $P \rightarrow Q$ Contrapositive: $\neg Q \rightarrow \neg P$ Contradiction: Assume $\neg Q$ and derive contradiction Induction: Base case + inductive step

Set Theory

Set Operations

A \cup B = \{x : x \in A \text{ or } x \in B\}$ $A \cap B = \{x : x \in A \text{ and } x \in B\}$ $A - B = \{x : x \in A \text{ and } x \notin B\}$ $A^c = \{x : x \notin A\}$ $|A \cup B| = |A| + |B| - |A \cap B|

Relations

Reflexive: $\forall x \, (x,x) \in R$ Symmetric: $\forall x,y \, ((x,y) \in R \rightarrow (y,x) \in R)$ Transitive: $\forall x,y,z \, ((x,y) \in R \land (y,z) \in R \rightarrow (x,z) \in R)$ Equivalence relation: reflexive + symmetric + transitive

Functions

Injective (one-to-one): $\forall x,y \, (f(x) = f(y) \rightarrow x = y)$ Surjective (onto): $\forall y \, \exists x \, (f(x) = y)$ Bijective: injective + surjective $|A| = |B|$ iff $\exists$ bijection $f: A \rightarrow B$

Combinatorics

Counting Principles

Addition Principle: $|A \cup B| = |A| + |B|$ if $A \cap B = \emptyset$ Multiplication Principle: $|A \times B| = |A| \times |B|$ Pigeonhole Principle: If $n$ objects in $m$ boxes and $n > m$, then some box contains $> 1$ object

Permutations & Combinations

$P(n,r) = \frac{n!}{(n-r)!} \text{ (permutations)}$$ $$C(n,r) = \binom{n}{r} = \frac{n!}{r!(n-r)!} \text{ (combinations)}$$ Binomial theorem: $(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k}$ Stars and bars: $\binom{n+k-1}{k-1}

Inclusion-Exclusion

|A_1 \cup A_2 \cup \cdots \cup A_n| =$$ $$\sum |A_i| - \sum |A_i \cap A_j| + \sum |A_i \cap A_j \cap A_k| - \cdots$$ $$+ (-1)^{n+1}|A_1 \cap \cdots \cap A_n|

CS Applications:

Algorithm analysis, database query optimization, complexity theory, coding theory, cryptographic protocols

Generating Functions

Ordinary Generating Functions

$G(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1x + a_2x^2 + \cdots$$ Fibonacci: $F(x) = \frac{x}{1 - x - x^2}$ Geometric: $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n

Exponential Generating Functions

$$E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$$ $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ Useful for problems with labels/arrangements

Recurrence Relations

Linear Homogeneous

$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k}$$ Characteristic equation: $r^k - c_1r^{k-1} - \cdots - c_k = 0$ If roots distinct: $a_n = A_1r_1^n + A_2r_2^n + \cdots

Master Theorem

For $T(n) = aT(n/b) + f(n)$: Case 1: $f(n) = O(n^{\log_b a - \varepsilon}) \Rightarrow T(n) = \Theta(n^{\log_b a})$ Case 2: $f(n) = \Theta(n^{\log_b a}) \Rightarrow T(n) = \Theta(n^{\log_b a} \log n)$ Case 3: $f(n) = \Omega(n^{\log_b a + \varepsilon}) \Rightarrow T(n) = \Theta(f(n))$

Probability & Statistics

Probability Theory

Basic Probability

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$ $$P(A^c) = 1 - P(A)$$ $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$ Independence: $P(A \cap B) = P(A)P(B)

Bayes' Theorem

P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$ Law of Total Probability: $$P(B) = \sum_i P(B|A_i)P(A_i)

Random Variables

$E[X] = \sum_x x \cdot P(X = x) \text{ (discrete)}$$ $$E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx \text{ (continuous)}$$ $\text{Var}(X) = E[X^2] - (E[X])^2$ $\text{Var}(aX + b) = a^2\text{Var}(X)

Common Distributions

Discrete Distributions

Bernoulli: $P(X = 1) = p$, $P(X = 0) = 1-p$ Binomial: $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$ $E[X] = np$, $\text{Var}(X) = np(1-p)$ Poisson: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ $E[X] = \text{Var}(X) = \lambda$

Continuous Distributions

Uniform: $f(x) = \frac{1}{b-a}$ for $x \in [a,b]$ $E[X] = \frac{a+b}{2}$, $\text{Var}(X) = \frac{(b-a)^2}{12}$ Normal: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ $E[X] = \mu$, $\text{Var}(X) = \sigma^2$ Exponential: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ $E[X] = \frac{1}{\lambda}$, $\text{Var}(X) = \frac{1}{\lambda^2}$

Central Limit Theorem

If $X_1, X_2, \ldots, X_n$ are i.i.d. with $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$: $$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \text{ as } n \to \infty$$ where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$

Statistical Inference

Confidence Intervals

For mean $\mu$ ($\sigma$ known): $$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$ For mean $\mu$ ($\sigma$ unknown): $$\bar{x} \pm t_{\alpha/2,n-1} \times \frac{s}{\sqrt{n}}$$ For proportion $p$: $$\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Hypothesis Testing

Test statistic for mean: $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \text{ or } t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$ p-value = P(observing test statistic or more extreme | $H_0$) Reject $H_0$ if p-value $< \alpha$

Regression

Simple linear: $\hat{y} = a + bx$ $$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$ $$a = \bar{y} - b\bar{x}$$ $R^2 = 1 - \frac{SSE}{SST}$ (coefficient of determination)

CS Applications:

Machine learning algorithms, A/B testing, data mining, quality assurance, performance analysis, randomized algorithms

Numerical Analysis

Root Finding

Bisection Method

If $f(a)f(b) < 0$, then $\exists c \in (a,b): f(c) = 0$ Algorithm: 1. $c = \frac{a + b}{2}$ 2. If $f(a)f(c) < 0$: $b = c$, else $a = c$ 3. Repeat until $|b - a| < $ tolerance Convergence: $O(\log_2((b-a)/\varepsilon))$

Newton's Method

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$ Quadratic convergence when $f'(\text{root}) \neq 0$ May fail if $f'(x_n) \approx 0$ or poor initial guess

Secant Method

$x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$$ Approximates derivative using secant line Superlinear convergence: $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618

Interpolation

Lagrange Interpolation

$$P(x) = \sum_{i=0}^n y_i L_i(x)$$ where $L_i(x) = \prod_{j=0, j \neq i}^n \frac{x - x_j}{x_i - x_j}$ Unique polynomial of degree $\leq n$ through $n+1$ points

Newton's Divided Differences

$$P(x) = f[x_0] + f[x_0,x_1](x-x_0) + f[x_0,x_1,x_2](x-x_0)(x-x_1) + \cdots$$ $f[x_i,x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}$ More efficient for adding new points

Spline Interpolation

Cubic spline: $S(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$ Conditions: - $S(x_i) = y_i$ - $S'(x_i^-) = S'(x_i^+)$ - $S''(x_i^-) = S''(x_i^+)$

Numerical Integration

Composite Rules

Composite Trapezoidal: $$\int_a^b f(x)\,dx \approx \frac{h}{2}[f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)]$$ Error: $O(h^2)$ Composite Simpson's: $$\int_a^b f(x)\,dx \approx \frac{h}{3}[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + f(x_n)]$$ Error: $O(h^4)$

Gaussian Quadrature

$\int_{-1}^1 f(x)\,dx \approx \sum_{i=1}^n w_i f(x_i)$$ Optimal choice of points $x_i$ and weights $w_i$ $n$-point Gauss rule exact for polynomials of degree $\leq 2n-1

Ordinary Differential Equations

Euler's Method

$y_{n+1} = y_n + h \cdot f(x_n, y_n)$$ First-order accuracy: $O(h)$ Simple but may be unstable for large $h

Runge-Kutta Methods

RK4 (Fourth-order): $$k_1 = hf(x_n, y_n)$$ $$k_2 = hf(x_n + h/2, y_n + k_1/2)$$ $$k_3 = hf(x_n + h/2, y_n + k_2/2)$$ $$k_4 = hf(x_n + h, y_n + k_3)$$ $$y_{n+1} = y_n + \frac{k_1 + 2k_2 + 2k_3 + k_4}{6}$$

CS Applications:

Computer graphics (ray tracing), scientific computing, game physics, signal processing, finite element analysis

Complex Analysis

Complex Numbers

Basic Operations

$$z = x + iy = re^{i\theta} = r(\cos\theta + i\sin\theta)$$ $|z| = \sqrt{x^2 + y^2} = r$ $\arg(z) = \theta = \arctan(y/x)$ $z^* = x - iy = re^{-i\theta}$ (complex conjugate)

Polar Form

$z_1z_2 = r_1r_2e^{i(\theta_1+\theta_2)}$$ $$\frac{z_1}{z_2} = \frac{r_1}{r_2}e^{i(\theta_1-\theta_2)}$$ De Moivre's theorem: $z^n = r^n e^{in\theta}$ $n$th roots: $z^{1/n} = r^{1/n} e^{i(\theta+2\pi k)/n}$, $k = 0,1,\ldots,n-1

Analytic Functions

Cauchy-Riemann Equations

If $f(z) = u(x,y) + iv(x,y)$ is analytic:

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$ $$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Elementary Functions

$e^z = e^x(\cos y + i\sin y)$$ $$\sin z = \frac{e^{iz} - e^{-iz}}{2i}$$ $$\cos z = \frac{e^{iz} + e^{-iz}}{2}$$ $\log z = \ln|z| + i(\arg z + 2\pi n)

Power Series

$f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$$ Radius of convergence: $R = \frac{1}{\limsup_{n \to \infty} |a_n|^{1/n}}$ Convergent for $|z - z_0| < R

Integration & Residues

Cauchy's Theorem

If $f$ is analytic in simply connected domain $D$ and $C$ is a closed curve in $D$:

\oint_C f(z) \, dz = 0

Residue Theorem

\oint_C f(z) \, dz = 2\pi i \sum_k \text{Res}(f, z_k)

where $z_k$ are poles inside $C$

Computing Residues

Simple pole at $z_0$: $$\text{Res}(f, z_0) = \lim_{z \to z_0} (z - z_0)f(z)$$ Pole of order $n$: $$\text{Res}(f, z_0) = \frac{1}{(n-1)!} \lim_{z \to z_0} \frac{d^{n-1}}{dz^{n-1}}[(z-z_0)^n f(z)]$$

CS Applications:

Signal processing (Fourier transforms), control systems, quantum computing, computer graphics (fractals), digital filters

Graph Theory

Basic Definitions

Graph Properties

Graph $G = (V, E)$ where $V =$ vertices, $E =$ edges Order: $|V|$ (number of vertices) Size: $|E|$ (number of edges) Degree: $\deg(v) = $ number of edges incident to $v$ Handshaking lemma: $\sum_{v \in V} \deg(v) = 2|E|$

Special Graphs

Complete graph $K_n$: $|E| = \frac{n(n-1)}{2}$ Complete bipartite $K_{m,n}$: $|E| = mn$ Cycle $C_n$: $n$ vertices, $n$ edges Path $P_n$: $n$ vertices, $n-1$ edges Tree: connected, acyclic, $|E| = |V| - 1$

Matrix Representations

Adjacency matrix $A$: $A[i,j] = 1$ if edge $(i,j)$ exists Incidence matrix: rows = vertices, columns = edges Number of walks of length $k$ from $i$ to $j$: $(A^k)[i,j]$

Connectivity

Paths & Cycles

Walk: sequence of vertices $v_0, v_1, \ldots, v_k$ Trail: walk with distinct edges Path: walk with distinct vertices Connected: path exists between any two vertices Distance $d(u,v)$: length of shortest path from $u$ to $v$

Euler & Hamilton

Eulerian path: uses every edge exactly once Exists iff graph connected and $\leq 2$ odd-degree vertices Eulerian cycle: closed Eulerian path Exists iff graph connected and all vertices even degree Hamiltonian path: visits every vertex exactly once

Trees & Spanning Trees

Tree Properties

For a graph $G$ with $n$ vertices, these are equivalent:

1. $G$ is a tree 2. $G$ is connected and has $n-1$ edges 3. $G$ is acyclic and has $n-1$ edges 4. $G$ is connected and removing any edge disconnects it 5. $G$ is acyclic and adding any edge creates exactly one cycle

Kruskal's Algorithm (Minimum Spanning Tree)

                        1. Sort edges by weight
                        2. Initialize disjoint sets for each vertex
                        3. For each edge (u,v):
                           - If u and v in different components:
                             - Add edge to MST
                             - Union components of u and v
                        4. Stop when n-1 edges added
                        
                        Time complexity: O(E log E)
                    

Prim's Algorithm

                        1. Start with arbitrary vertex
                        2. Maintain priority queue of edges from current tree
                        3. Repeatedly add minimum weight edge that connects
                           tree to new vertex
                        4. Update priority queue
                        
                        Time complexity: O(E log V) with binary heap
                    

Shortest Paths

Dijkstra's Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. Priority queue Q with all vertices
                        3. While Q not empty:
                           - u = extract minimum from Q
                           - For each neighbor v of u:
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                               - parent[v] = u
                        
                        Time: O((V + E) log V) with binary heap
                        Requirements: non-negative weights
                    

Bellman-Ford Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. For i = 1 to |V| - 1:
                           - For each edge (u,v):
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                        3. Check for negative cycles
                        
                        Time: O(VE)
                        Handles negative weights
                    

Floyd-Warshall Algorithm

                        1. Initialize distance matrix D
                        2. For k = 1 to n:
                           - For i = 1 to n:
                             - For j = 1 to n:
                               - D[i][j] = min(D[i][j], D[i][k] + D[k][j])
                        
                        Time: O(V³)
                        Finds all-pairs shortest paths
                    

Network Flow

Max-Flow Min-Cut Theorem

In any flow network, the maximum value of flow equals the minimum capacity of all cuts.

Ford-Fulkerson Method

                        1. Initialize flow f = 0
                        2. While there exists augmenting path P from source to sink:
                           - Find bottleneck capacity c along P
                           - Augment flow along P by c
                           - Update residual graph
                        3. Return maximum flow
                        
                        Time: O(E × max_flow) for integer capacities
                        With Edmonds-Karp (BFS): O(VE²)
                    

CS Applications:

Social networks, internet routing, database optimization, compiler optimization, bioinformatics, recommendation systems

Graph Coloring

Chromatic Number

\chi(G) = $ minimum number of colors needed $\chi(K_n) = n$ $\chi(C_n) = 2$ if $n$ even, $3$ if $n$ odd $\chi(\text{bipartite graph}) = 2$ Four Color Theorem: $\chi(\text{planar graph}) \leq 4

Greedy Coloring

Brooks' Theorem: If $G$ connected, not complete, not odd cycle, then $\chi(G) \leq \Delta(G)$ where $\Delta(G) = $ maximum degree

Number Theory

Divisibility & GCD

Division Algorithm

For integers $a, b$ with $b > 0$: $$a = qb + r \text{ where } 0 \leq r < b$$ $q = $ quotient, $r = $ remainder $a \equiv r \pmod{b}$

Euclidean Algorithm

$\gcd(a, b) = \gcd(b, a \bmod b)$$ Extended Euclidean Algorithm: $$\gcd(a, b) = ax + by$$ for some integers $x, y$ Time complexity: $O(\log \min(a, b))

Properties

$\gcd(a, b) \times \text{lcm}(a, b) = ab$$ If $\gcd(a, n) = 1$ and $\gcd(b, n) = 1$, then $\gcd(ab, n) = 1$ Bézout's identity: $\gcd(a, b) = ax + by

Modular Arithmetic

Congruences

a \equiv b \pmod{n}$ iff $n \mid (a - b)$ Properties: $(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$ $(a \times b) \bmod n = ((a \bmod n) \times (b \bmod n)) \bmod n$ Reflexive: $a \equiv a \pmod{n}$ Symmetric: $a \equiv b \implies b \equiv a \pmod{n}$ Transitive: $a \equiv b, b \equiv c \implies a \equiv c \pmod{n}

Modular Inverse

a^{-1} \pmod{n}$ exists iff $\gcd(a, n) = 1$ Finding inverse: use Extended Euclidean Algorithm $ax + ny = 1 \implies ax \equiv 1 \pmod{n} \implies a^{-1} \equiv x \pmod{n}$ Fermat's Little Theorem: If $p$ prime, $\gcd(a, p) = 1$: $$a^{-1} \equiv a^{p-2} \pmod{p}$

Prime Numbers

Fundamental Theorem of Arithmetic

Every integer $n > 1$ has a unique prime factorization:

n = p_1^{a_1} \times p_2^{a_2} \times \cdots \times p_k^{a_k}

where $p_1 < p_2 < \cdots < p_k$ are primes and $a_i > 0$

Prime Testing

Trial division: test divisibility up to $\sqrt{n}$ Time: $O(\sqrt{n})$ Sieve of Eratosthenes: find all primes $\leq n$ Time: $O(n \log \log n)$ Miller-Rabin: probabilistic primality test Time: $O(k \log^3 n)$ for $k$ rounds

Prime Distribution

Prime Number Theorem: $$\pi(n) \approx \frac{n}{\ln(n)}$$ where $\pi(n) = $ number of primes $\leq n$ There are infinitely many primes

Euler's Function

Definition & Properties

$\varphi(n) = |\{k : 1 \leq k \leq n, \gcd(k, n) = 1\}|$$ $\varphi(p) = p - 1$ for prime $p$ $\varphi(p^k) = p^k - p^{k-1} = p^{k-1}(p - 1)$ If $\gcd(m, n) = 1$: $\varphi(mn) = \varphi(m)\varphi(n)

Euler's Theorem

If $\gcd(a, n) = 1$, then: $$a^{\varphi(n)} \equiv 1 \pmod{n}$$ Fermat's Little Theorem (special case): If $p$ prime and $\gcd(a, p) = 1$: $$a^{p-1} \equiv 1 \pmod{p}$$

Chinese Remainder Theorem

CRT Statement

If $n_1, n_2, \ldots, n_k$ are pairwise coprime, then the system:

\begin{cases} x \equiv a_1 \pmod{n_1} \\ x \equiv a_2 \pmod{n_2} \\ \vdots \\ x \equiv a_k \pmod{n_k} \end{cases}

has a unique solution modulo $N = n_1n_2 \cdots n_k$

CRT Algorithm

                        1. Compute N = n₁n₂...nₖ
                        2. For each i, compute Nᵢ = N/nᵢ
                        3. Find Mᵢ such that NᵢMᵢ ≡ 1 (mod nᵢ)
                        4. Solution: x ≡ Σ aᵢNᵢMᵢ (mod N)
                    

CS Applications:

Cryptography (RSA, ElGamal), hash functions, pseudorandom number generation, error-correcting codes, computer algebra

Algorithm Analysis

Asymptotic Notation

Big O Notation

$f(n) = O(g(n))$ if $\exists c > 0, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$ Upper bound: $f$ grows no faster than $g$ $f(n) = \Omega(g(n))$ if $g(n) = O(f(n))$ Lower bound: $f$ grows at least as fast as $g$ $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ Tight bound: $f$ and $g$ grow at same rate

Common Complexities

$$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$$ Logarithmic: binary search, balanced tree operations Linear: linear search, single loop Linearithmic: merge sort, heap sort Quadratic: nested loops, simple sorting Exponential: brute force subsets Factorial: brute force permutations

Sorting Algorithms

Algorithm	Best Case	Average Case	Worst Case	Space	Stable
Bubble Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Selection Sort	O(n²)	O(n²)	O(n²)	O(1)	No
Insertion Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Merge Sort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes
Quick Sort	O(n log n)	O(n log n)	O(n²)	O(log n)	No
Heap Sort	O(n log n)	O(n log n)	O(n log n)	O(1)	No
Counting Sort	O(n + k)	O(n + k)	O(n + k)	O(k)	Yes
Radix Sort	O(d(n + k))	O(d(n + k))	O(d(n + k))	O(n + k)	Yes

Search Algorithms

Binary Search

Prerequisite: sorted array Algorithm: ``` while low ≤ high: mid = (low + high) / 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 ``` Time: $O(\log n)$, Space: $O(1)$

Hash Table Search

Average case: $O(1)$ for search, insert, delete Worst case: $O(n)$ if many collisions Load factor $\alpha = \frac{n}{m}$ (items/buckets) Keep $\alpha < 0.75$ for good performance Collision resolution: chaining, open addressing

Dynamic Programming

Optimal Substructure

A problem has optimal substructure if optimal solution contains optimal solutions to subproblems Examples: shortest paths, matrix chain multiplication, optimal binary search trees, edit distance

Overlapping Subproblems

Recursive algorithm solves same subproblems repeatedly Memoization: top-down approach, store results Tabulation: bottom-up approach, fill table Fibonacci: naive $O(2^n)$, DP $O(n)$

Classic DP Problems

                        Knapsack Problem: max value with weight constraint
                        DP[i][w] = max value using first i items, weight ≤ w
                        
                        Longest Common Subsequence:
                        LCS[i][j] = length of LCS of first i chars of X, first j chars of Y
                        
                        Edit Distance (Levenshtein):
                        ED[i][j] = min operations to transform X[1..i] to Y[1..j]
                    

Greedy Algorithms

Greedy Choice Property

Global optimum can be reached by making locally optimal choices at each step

Activity Selection

Select maximum number of non-overlapping activities Greedy strategy: sort by finish time, always pick earliest finishing activity Time: $O(n \log n)$

Huffman Coding

Optimal prefix-free binary codes Greedy strategy: repeatedly merge two nodes with minimum frequency Time: $O(n \log n)$

Complexity Classes

P vs NP

P: problems solvable in polynomial time NP: problems verifiable in polynomial time NP-Complete: hardest problems in NP NP-Hard: at least as hard as NP-Complete $P \subseteq NP$, but $P = NP$? is open question

Reduction

Problem $A$ reduces to $B$ ($A \leq_p B$) if $A$ can be solved using polynomial number of calls to $B$ If $A \leq_p B$ and $B \in P$, then $A \in P$ Used to prove NP-completeness

Cryptography

Classical Cryptography

Caesar Cipher

Encryption: $E(x) = (x + k) \bmod 26$ Decryption: $D(y) = (y - k) \bmod 26$ Key space: 26 possible keys Vulnerable to frequency analysis

Affine Cipher

Encryption: $E(x) = (ax + b) \bmod 26$ Decryption: $D(y) = a^{-1}(y - b) \bmod 26$ Requirement: $\gcd(a, 26) = 1$ Key space: $\varphi(26) \times 26 = 12 \times 26 = 312$

Vigenère Cipher

Encryption: $C_i = (M_i + K_{i \bmod \text{keylen}}) \bmod 26$ Polyalphabetic substitution Broken by Kasiski examination and index of coincidence

Public Key Cryptography

RSA Algorithm

Key Generation: 1. Choose large primes $p, q$ 2. $n = pq$, $\varphi(n) = (p-1)(q-1)$ 3. Choose $e$: $\gcd(e, \varphi(n)) = 1$ 4. Find $d$: $ed \equiv 1 \pmod{\varphi(n)}$ Public key: $(n, e)$ Private key: $(n, d)$ Encryption: $C = M^e \bmod n$ Decryption: $M = C^d \bmod n$

Diffie-Hellman Key Exchange

Public: prime $p$, generator $g$ Alice: chooses $a$, sends $g^a \bmod p$ Bob: chooses $b$, sends $g^b \bmod p$ Shared secret: $K = g^{ab} \bmod p$ Security based on discrete log problem

ElGamal Encryption

Key Generation: Choose prime $p$, generator $g$, private key $x$ Public key: $y = g^x \bmod p$ Encryption of $M$: Choose random $k$, send $(g^k \bmod p, M \cdot y^k \bmod p)$ Decryption: $M = c_2 \cdot (c_1^x)^{-1} \bmod p$

Hash Functions

Properties

Pre-image resistance: given $h$, hard to find $x$ with $H(x) = h$ Second pre-image resistance: given $x$, hard to find $x' \neq x$ with $H(x) = H(x')$ Collision resistance: hard to find $x, x'$ with $H(x) = H(x')$ Birthday paradox: expect collision after $\sqrt{2^n}$ inputs for $n$-bit hash

Applications

Digital signatures: sign hash instead of message Password storage: store $H(\text{password} + \text{salt})$ Proof of work: find nonce with $H(\text{block} + \text{nonce}) < \text{target}$ Data integrity: compare hash values

Digital Signatures

RSA Signatures

Signing: $S = H(M)^d \bmod n$ Verification: check if $S^e \equiv H(M) \pmod{n}$ Must use padding (PSS) to prevent attacks

DSA (Digital Signature Algorithm)

Parameters: prime $p$, $q \mid (p-1)$, generator $g$ Private key: $x$, Public key: $y = g^x \bmod p$ Signing $M$: $k = $ random, $r = (g^k \bmod p) \bmod q$ $s = k^{-1}(H(M) + xr) \bmod q$ Signature: $(r, s)$ Verification: check if $r = ((g^{H(M)s^{-1}} \cdot y^{rs^{-1}}) \bmod p) \bmod q$

CS Applications:

Secure communications, blockchain technology, password authentication, digital certificates, software signing, cryptocurrency

Optimization

Linear Programming

Standard Form

Minimize: $\mathbf{c}^T \mathbf{x}$ Subject to: $A\mathbf{x} = \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Can convert any LP to standard form: - $\max f \rightarrow \min(-f)$ - inequality $\rightarrow$ equality (add slack variables) - unrestricted variable $\rightarrow$ difference of non-negative variables

Simplex Method

1. Start at basic feasible solution (vertex of polytope) 2. Check optimality conditions 3. If not optimal, move to adjacent vertex that improves objective 4. Repeat until optimal or unbounded Worst case: exponential Average case: polynomial

Fundamental Theorem of Linear Programming

If LP has optimal solution, then it has optimal basic feasible solution (vertex of feasible region)

Duality Theorem

Primal: $\min \mathbf{c}^T \mathbf{x}$ subject to $A\mathbf{x} \geq \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Dual: $\max \mathbf{b}^T \mathbf{y}$ subject to $A^T \mathbf{y} \leq \mathbf{c}$, $\mathbf{y} \geq \mathbf{0}$ Strong duality: if both have optimal solutions, optimal values are equal

Nonlinear Optimization

Unconstrained Optimization

First-order condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$ Second-order condition: $\nabla^2 f(\mathbf{x}^*)$ positive definite Gradient descent: $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)$ Newton's method: $\mathbf{x}_{k+1} = \mathbf{x}_k - [\nabla^2 f(\mathbf{x}_k)]^{-1} \nabla f(\mathbf{x}_k)$

Constrained Optimization

KKT conditions for $\min f(\mathbf{x})$ subject to $g(\mathbf{x}) \leq \mathbf{0}$, $h(\mathbf{x}) = \mathbf{0}$: $\nabla f(\mathbf{x}^*) + \sum_i \lambda_i \nabla g_i(\mathbf{x}^*) + \sum_j \mu_j \nabla h_j(\mathbf{x}^*) = \mathbf{0}$ $g_i(\mathbf{x}^*) \leq 0$, $h_j(\mathbf{x}^*) = 0$ $\lambda_i \geq 0$, $\lambda_i g_i(\mathbf{x}^*) = 0$ (complementary slackness)

Convex Optimization

Convex Sets & Functions

Convex set: $\forall \mathbf{x}, \mathbf{y} \in S, \forall \lambda \in [0,1]: \lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in S$ Convex function: $f(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y})$ Properties: - Local minimum = global minimum - First-order condition sufficient for optimality

Interior Point Methods

Barrier method: replace constraints $g(\mathbf{x}) \leq \mathbf{0}$ with penalty terms $-t \sum_i \log(-g_i(\mathbf{x}))$ As $t \to \infty$, solution approaches constrained optimum Polynomial time complexity for convex problems

CS Applications:

Machine learning (SVM, neural network training), resource allocation, network flow optimization, compiler optimization, game theory

Integer Programming

Branch and Bound

1. Solve LP relaxation 2. If solution integer, update best bound 3. If not integer, branch on fractional variable 4. Recursively solve subproblems 5. Prune if bound worse than current best NP-hard in general, but practical for many instances

Cutting Planes

Add linear constraints (cuts) that remove fractional solutions but preserve all integer solutions Gomory cuts: derived from optimal LP tableau Problem-specific cuts often more effective

Mathematical Software & Tools

Symbolic Computation

Mathematica, Maple, SymPy (Python) Exact arithmetic, symbolic integration/differentiation, equation solving, simplification Computer algebra systems

Numerical Computing

MATLAB, NumPy/SciPy (Python), R Linear algebra (BLAS, LAPACK) Optimization (CPLEX, Gurobi) Statistics and data analysis

Proof Assistants

Coq, Lean, Agda, Isabelle/HOL Formal verification of mathematical proofs Computer-checked mathematics Growing importance in CS theory

Computational Complexity

Complexity Zoo: database of complexity classes P, NP, PSPACE, EXPTIME, etc. Reductions and completeness results Interactive proofs, quantum complexity

Study Strategy for Advanced Math in CS:

Calculus

Multivariable Calculus

Differential Equations

Linear Algebra

Discrete Mathematics

Probability & Statistics

Calculus

Limits

Definition of a Limit

\lim_{x \to a} f(x) = L$$ means: For every $\varepsilon > 0$, there exists $\delta > 0$ such that $$0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon

Important Limits

\lim_{x \to 0} \frac{\sin x}{x} = 1$$ $$\lim_{x \to 0} \frac{1 - \cos x}{x} = 0$$ $$\lim_{x \to 0} \frac{e^x - 1}{x} = 1$$ $$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = e

L'Hôpital's Rule

If $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$ or $\pm\infty$, then: $$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$ (provided the right side exists)

Derivatives

Definition

$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$ Alternative notation: $\frac{df}{dx}$, $\frac{d}{dx}f(x)$, $D_x f

Derivative Rules

Power Rule: $\frac{d}{dx}(x^n) = nx^{n-1}$ Product Rule: $(fg)' = f'g + fg'$ Quotient Rule: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ Chain Rule: $\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$

Common Derivatives

\frac{d}{dx}(\sin x) = \cos x \qquad \frac{d}{dx}(\cos x) = -\sin x$$ $$\frac{d}{dx}(e^x) = e^x \qquad \frac{d}{dx}(\ln x) = \frac{1}{x}$$ $$\frac{d}{dx}(\tan x) = \sec^2 x \qquad \frac{d}{dx}(\arctan x) = \frac{1}{1+x^2}

Mean Value Theorem

If $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$, then there exists $c \in (a,b)$ such that: $$f'(c) = \frac{f(b) - f(a)}{b - a}$$

Integrals

Definite Integral

$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*) \Delta x$$ where $\Delta x = \frac{b-a}{n}$ and $x_i^* \in [x_{i-1}, x_i]

Fundamental Theorem of Calculus

Part 1: If $F(x) = \int_a^x f(t)\,dt$, then $F'(x) = f(x)$ Part 2: If $F'(x) = f(x)$, then $$\int_a^b f(x)\,dx = F(b) - F(a)$$

Integration Techniques

Substitution: $\int f(g(x))g'(x)\,dx = \int f(u)\,du$ Integration by Parts: $\int u\,dv = uv - \int v\,du$ Partial Fractions: Decompose rational functions

Common Integrals

\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$ $$\int \frac{1}{x}\,dx = \ln|x| + C$$ $$\int e^x\,dx = e^x + C$$ $$\int \sin x\,dx = -\cos x + C$$ $$\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C

Series and Sequences

Sequences

A sequence $\{a_n\}$ converges to $L$ if: $$\lim_{n \to \infty} a_n = L$$ Common limits: $$\lim_{n \to \infty} \frac{1}{n} = 0 \qquad \lim_{n \to \infty} \sqrt[n]{n} = 1$$ $$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$$

Series Convergence Tests

Geometric Series: $\sum_{n=0}^{\infty} ar^n = \frac{a}{1-r}$ if $|r| < 1$ p-Series: $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges if $p > 1$ Ratio Test: If $\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = L$ then series converges if $L < 1$

Taylor Series

f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$ Common Taylor series: $$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$ $$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$$ $$\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n}

CS Applications:

Machine learning (gradient descent, backpropagation), computer graphics (physics simulations), numerical methods, algorithm complexity analysis, signal processing

Applications of Calculus

Optimization

Critical points: $f'(x) = 0$ or $f'(x)$ undefined Second derivative test: - If $f''(c) > 0$: local minimum at $x = c$ - If $f''(c) < 0$: local maximum at $x = c$ - If $f''(c) = 0$: test inconclusive

Related Rates

If variables $x$ and $y$ are related by an equation and vary with time $t$: $$\frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}$$ Example: Area of circle $A = \pi r^2$ $$\frac{dA}{dt} = 2\pi r \frac{dr}{dt}$$

Arc Length & Surface Area

Arc length: $L = \int_a^b \sqrt{1 + (f'(x))^2}\,dx$ Surface area (revolution about x-axis): $$S = 2\pi \int_a^b f(x)\sqrt{1 + (f'(x))^2}\,dx$$

Multivariable Calculus

Partial Derivatives & Gradients

Partial Derivatives

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h,y) - f(x,y)}{h}$$ Mixed partials theorem (Clairaut): $$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$$ (if both are continuous)

Gradient & Directional Derivatives

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$$ Directional derivative: $$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$ Maximum rate of change: $|\nabla f|

Chain Rule (Multivariable)

If $z = f(x,y)$ where $x = g(t)$ and $y = h(t)$: $$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$ General form: $$\frac{\partial z}{\partial t} = \sum_{i} \frac{\partial z}{\partial x_i}\frac{\partial x_i}{\partial t}$$

CS Applications:

Machine learning optimization (gradient descent), computer graphics (surface normals), image processing (edge detection), neural network training

Multiple Integrals

Double Integrals

\iint_R f(x,y) \, dA = \int_a^b \int_{c(x)}^{d(x)} f(x,y) \, dy \, dx$$ Fubini's Theorem: For continuous $f$, $$\int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy

Change of Variables

$\iint_R f(x,y) \, dx \, dy = \iint_S f(x(u,v), y(u,v)) |J| \, du \, dv$$ Jacobian: $J = \frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}

Cylindrical & Spherical Coordinates

Cylindrical: $\begin{cases} x = r\cos\theta \\ y = r\sin\theta \\ z = z \end{cases}$ $dV = r \, dr \, d\theta \, dz$ Spherical: $\begin{cases} x = \rho\sin\phi\cos\theta \\ y = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \end{cases}$ $dV = \rho^2\sin\phi \, d\rho \, d\phi \, d\theta$

Vector Calculus

Vector Fields & Line Integrals

$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt$$ Conservative field: $\nabla \times \mathbf{F} = \mathbf{0}$ If $\mathbf{F} = \nabla f$: $\int_C \mathbf{F} \cdot d\mathbf{r} = f(B) - f(A)

Green's Theorem

$\oint_C (P \, dx + Q \, dy) = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA$$ Area formula: $A = \frac{1}{2} \oint_C (x \, dy - y \, dx)

Divergence & Curl

\text{div } \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}$$ $$\text{curl } \mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ P & Q & R \end{vmatrix}

Stokes' Theorem

\oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n} \, dS

Relates line integral around boundary to surface integral of curl

Divergence Theorem

\iint_S \mathbf{F} \cdot \mathbf{n} \, dS = \iiint_E \nabla \cdot \mathbf{F} \, dV

Relates surface integral to volume integral of divergence

Differential Equations

First-Order ODEs

Separable Equations

\frac{dy}{dx} = f(x)g(y)$$ Solution method: $$\int \frac{dy}{g(y)} = \int f(x) \, dx

Linear First-Order

$\frac{dy}{dx} + P(x)y = Q(x)$$ Integrating factor: $\mu(x) = e^{\int P(x) \, dx}$ Solution: $y = \frac{1}{\mu(x)}\left[\int \mu(x)Q(x) \, dx + C\right]

Exact Equations

$M(x,y) \, dx + N(x,y) \, dy = 0$$ Exact if: $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$ Solution: Find $F(x,y)$ where $\frac{\partial F}{\partial x} = M$ and $\frac{\partial F}{\partial y} = N

Second-Order Linear ODEs

Homogeneous with Constant Coefficients

$ay'' + by' + cy = 0$$ Characteristic equation: $ar^2 + br + c = 0$ Solutions: - If $r_1 \neq r_2$: $y = c_1e^{r_1x} + c_2e^{r_2x}$ - If $r_1 = r_2 = r$: $y = (c_1 + c_2x)e^{rx}$ - If $r = \alpha \pm \beta i$: $y = e^{\alpha x}(c_1\cos(\beta x) + c_2\sin(\beta x))

Method of Undetermined Coefficients

For $ay'' + by' + cy = f(x)$: - If $f(x) = e^{\alpha x}$: try $y_p = Ae^{\alpha x}$ - If $f(x) = \text{polynomial}$: try polynomial of same degree - If $f(x) = \sin(\alpha x)$ or $\cos(\alpha x)$: try $y_p = A\cos(\alpha x) + B\sin(\alpha x)$

Variation of Parameters

For $y'' + P(x)y' + Q(x)y = f(x)$ If $y_1, y_2$ are homogeneous solutions: $$y_p = -y_1\int\frac{y_2f}{W}dx + y_2\int\frac{y_1f}{W}dx$$ Wronskian: $W = y_1y_2' - y_2y_1'$

CS Applications:

System dynamics, control theory, signal processing, population models in algorithms, oscillator circuits, physics simulations

Laplace Transforms

Definition & Properties

$\mathcal{L}\{f(t)\} = F(s) = \int_0^{\infty} e^{-st}f(t) \, dt$$ Properties: - $\mathcal{L}\{af + bg\} = a\mathcal{L}\{f\} + b\mathcal{L}\{g\}$ - $\mathcal{L}\{f'\} = sF(s) - f(0)$ - $\mathcal{L}\{f''\} = s^2F(s) - sf(0) - f'(0)

Common Transforms

\mathcal{L}\{1\} = \frac{1}{s}$$ $$\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}$$ $$\mathcal{L}\{e^{at}\} = \frac{1}{s-a}$$ $$\mathcal{L}\{\sin(at)\} = \frac{a}{s^2+a^2}$$ $$\mathcal{L}\{\cos(at)\} = \frac{s}{s^2+a^2}

Systems of ODEs

Matrix Form

$\mathbf{x}' = A\mathbf{x} + \mathbf{b}$$ Solution: $\mathbf{x}(t) = e^{At}\mathbf{x}_0 + \int_0^t e^{A(t-\tau)}\mathbf{b}(\tau) \, d\tau$ Matrix exponential: $e^{At} = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}

Eigenvalue Method

If $A\mathbf{v} = \lambda\mathbf{v}$, then $\mathbf{x} = \mathbf{v}e^{\lambda t}$ is a solution General solution: $$\mathbf{x} = c_1\mathbf{v}_1e^{\lambda_1 t} + c_2\mathbf{v}_2e^{\lambda_2 t} + \cdots$$

Linear Algebra

Matrix Operations & Properties

Basic Operations

(AB)C = A(BC)$ - Associative $A(B + C) = AB + AC$ - Distributive $(AB)^T = B^T A^T$ $(AB)^{-1} = B^{-1}A^{-1}

Determinants

\det(AB) = \det(A)\det(B)$ $\det(A^T) = \det(A)$ $\det(A^{-1}) = \frac{1}{\det(A)}$ For $2 \times 2$: $\det\begin{pmatrix}a & b\\c & d\end{pmatrix} = ad - bc

Matrix Inverse

A^{-1} = \frac{1}{\det(A)} \cdot \text{adj}(A)$ For $2 \times 2$: $A^{-1} = \frac{1}{ad-bc}\begin{pmatrix}d & -b\\-c & a\end{pmatrix}$ Gauss-Jordan: $[A|I] \rightarrow [I|A^{-1}]

Vector Spaces & Linear Independence

Vector Space Axioms

A vector space $V$ over field $F$ satisfies:

1. $\mathbf{u} + \mathbf{v} \in V$ (closure under addition) 2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (commutativity) 3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ (associativity) 4. $\exists \mathbf{0} \in V: \mathbf{v} + \mathbf{0} = \mathbf{v}$ (additive identity) 5. $\forall \mathbf{v} \exists (-\mathbf{v}): \mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ (additive inverse) 6-10. Scalar multiplication axioms

Linear Independence

Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent if: $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0} \implies c_1 = c_2 = \cdots = c_n = 0$$ Matrix form: $A\mathbf{x} = \mathbf{0}$ has only trivial solution

Basis & Dimension

Basis: linearly independent set that spans the space $\dim(V) = $ number of vectors in any basis If $\dim(V) = n$, then any $n$ linearly independent vectors form a basis

Eigenvalues & Eigenvectors

Characteristic Equation

$A\mathbf{v} = \lambda\mathbf{v} \quad (\mathbf{v} \neq \mathbf{0})$$ Characteristic polynomial: $\det(A - \lambda I) = 0$ Eigenspace: $E_{\lambda} = \{\mathbf{v} : A\mathbf{v} = \lambda\mathbf{v}\} = \text{null}(A - \lambda I)

Diagonalization

$A = PDP^{-1}$$ where $D$ is diagonal $P = [\mathbf{v}_1 \, \mathbf{v}_2 \, \cdots \, \mathbf{v}_n]$ (eigenvectors) $D = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ (eigenvalues) $A^k = PD^k P^{-1}

Properties

$$\text{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$ $$\det(A) = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$ If $A$ symmetric: all eigenvalues real, eigenvectors orthogonal

CS Applications:

Google PageRank algorithm, Principal Component Analysis (PCA), computer graphics transformations, quantum computing gates, network analysis

Orthogonality & Least Squares

Inner Products & Norms

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$ $$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$ Cauchy-Schwarz: $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \|\mathbf{v}\|

Gram-Schmidt Process

$\mathbf{u}_1 = \mathbf{v}_1$$ $$\mathbf{u}_2 = \mathbf{v}_2 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_2$$ $$\mathbf{u}_3 = \mathbf{v}_3 - \text{proj}_{\mathbf{u}_1}\mathbf{v}_3 - \text{proj}_{\mathbf{u}_2}\mathbf{v}_3$$ where $\text{proj}_{\mathbf{u}}\mathbf{v} = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\mathbf{u}

Least Squares

Normal equation: $A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$ Solution: $\hat{\mathbf{x}} = (A^T A)^{-1}A^T\mathbf{b}$ If $A$ has orthonormal columns: $\hat{\mathbf{x}} = A^T\mathbf{b}$

Singular Value Decomposition

SVD Theorem

Every $m \times n$ matrix $A$ can be written as:

$$A = U\Sigma V^T$$ $U$: $m \times m$ orthogonal (left singular vectors) $\Sigma$: $m \times n$ diagonal (singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$) $V$: $n \times n$ orthogonal (right singular vectors)

CS Applications:

Image compression, dimensionality reduction, recommender systems, data analysis, pseudoinverse computation

Discrete Mathematics

Logic & Proofs

Logical Operators

\neg$ (NOT), $\land$ (AND), $\lor$ (OR), $\rightarrow$ (IMPLIES), $\leftrightarrow$ (IFF) De Morgan's Laws: $$\neg(P \land Q) \equiv \neg P \lor \neg Q$$ $$\neg(P \lor Q) \equiv \neg P \land \neg Q$$ $P \rightarrow Q \equiv \neg P \lor Q

Quantifiers

\forall x \, P(x)$: "for all $x$, $P(x)$" $\exists x \, P(x)$: "there exists $x$ such that $P(x)$" $$\neg\forall x \, P(x) \equiv \exists x \, \neg P(x)$$ $$\neg\exists x \, P(x) \equiv \forall x \, \neg P(x)$

Proof Techniques

Direct proof: $P \rightarrow Q$ Contrapositive: $\neg Q \rightarrow \neg P$ Contradiction: Assume $\neg Q$ and derive contradiction Induction: Base case + inductive step

Set Theory

Set Operations

A \cup B = \{x : x \in A \text{ or } x \in B\}$ $A \cap B = \{x : x \in A \text{ and } x \in B\}$ $A - B = \{x : x \in A \text{ and } x \notin B\}$ $A^c = \{x : x \notin A\}$ $|A \cup B| = |A| + |B| - |A \cap B|

Relations

Reflexive: $\forall x \, (x,x) \in R$ Symmetric: $\forall x,y \, ((x,y) \in R \rightarrow (y,x) \in R)$ Transitive: $\forall x,y,z \, ((x,y) \in R \land (y,z) \in R \rightarrow (x,z) \in R)$ Equivalence relation: reflexive + symmetric + transitive

Functions

Injective (one-to-one): $\forall x,y \, (f(x) = f(y) \rightarrow x = y)$ Surjective (onto): $\forall y \, \exists x \, (f(x) = y)$ Bijective: injective + surjective $|A| = |B|$ iff $\exists$ bijection $f: A \rightarrow B$

Combinatorics

Counting Principles

Addition Principle: $|A \cup B| = |A| + |B|$ if $A \cap B = \emptyset$ Multiplication Principle: $|A \times B| = |A| \times |B|$ Pigeonhole Principle: If $n$ objects in $m$ boxes and $n > m$, then some box contains $> 1$ object

Permutations & Combinations

$P(n,r) = \frac{n!}{(n-r)!} \text{ (permutations)}$$ $$C(n,r) = \binom{n}{r} = \frac{n!}{r!(n-r)!} \text{ (combinations)}$$ Binomial theorem: $(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k}$ Stars and bars: $\binom{n+k-1}{k-1}

Inclusion-Exclusion

|A_1 \cup A_2 \cup \cdots \cup A_n| =$$ $$\sum |A_i| - \sum |A_i \cap A_j| + \sum |A_i \cap A_j \cap A_k| - \cdots$$ $$+ (-1)^{n+1}|A_1 \cap \cdots \cap A_n|

CS Applications:

Algorithm analysis, database query optimization, complexity theory, coding theory, cryptographic protocols

Generating Functions

Ordinary Generating Functions

$G(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1x + a_2x^2 + \cdots$$ Fibonacci: $F(x) = \frac{x}{1 - x - x^2}$ Geometric: $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n

Exponential Generating Functions

$$E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$$ $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ Useful for problems with labels/arrangements

Recurrence Relations

Linear Homogeneous

$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k}$$ Characteristic equation: $r^k - c_1r^{k-1} - \cdots - c_k = 0$ If roots distinct: $a_n = A_1r_1^n + A_2r_2^n + \cdots

Master Theorem

For $T(n) = aT(n/b) + f(n)$: Case 1: $f(n) = O(n^{\log_b a - \varepsilon}) \Rightarrow T(n) = \Theta(n^{\log_b a})$ Case 2: $f(n) = \Theta(n^{\log_b a}) \Rightarrow T(n) = \Theta(n^{\log_b a} \log n)$ Case 3: $f(n) = \Omega(n^{\log_b a + \varepsilon}) \Rightarrow T(n) = \Theta(f(n))$

Probability & Statistics

Probability Theory

Basic Probability

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$ $$P(A^c) = 1 - P(A)$$ $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$ Independence: $P(A \cap B) = P(A)P(B)

Bayes' Theorem

P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$ Law of Total Probability: $$P(B) = \sum_i P(B|A_i)P(A_i)

Random Variables

$E[X] = \sum_x x \cdot P(X = x) \text{ (discrete)}$$ $$E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx \text{ (continuous)}$$ $\text{Var}(X) = E[X^2] - (E[X])^2$ $\text{Var}(aX + b) = a^2\text{Var}(X)

Common Distributions

Discrete Distributions

Bernoulli: $P(X = 1) = p$, $P(X = 0) = 1-p$ Binomial: $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$ $E[X] = np$, $\text{Var}(X) = np(1-p)$ Poisson: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ $E[X] = \text{Var}(X) = \lambda$

Continuous Distributions

Uniform: $f(x) = \frac{1}{b-a}$ for $x \in [a,b]$ $E[X] = \frac{a+b}{2}$, $\text{Var}(X) = \frac{(b-a)^2}{12}$ Normal: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ $E[X] = \mu$, $\text{Var}(X) = \sigma^2$ Exponential: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ $E[X] = \frac{1}{\lambda}$, $\text{Var}(X) = \frac{1}{\lambda^2}$

Central Limit Theorem

If $X_1, X_2, \ldots, X_n$ are i.i.d. with $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$: $$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \text{ as } n \to \infty$$ where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$

Statistical Inference

Confidence Intervals

For mean $\mu$ ($\sigma$ known): $$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$ For mean $\mu$ ($\sigma$ unknown): $$\bar{x} \pm t_{\alpha/2,n-1} \times \frac{s}{\sqrt{n}}$$ For proportion $p$: $$\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Hypothesis Testing

Test statistic for mean: $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \text{ or } t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$ p-value = P(observing test statistic or more extreme | $H_0$) Reject $H_0$ if p-value $< \alpha$

Regression

Simple linear: $\hat{y} = a + bx$ $$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$ $$a = \bar{y} - b\bar{x}$$ $R^2 = 1 - \frac{SSE}{SST}$ (coefficient of determination)

CS Applications:

Machine learning algorithms, A/B testing, data mining, quality assurance, performance analysis, randomized algorithms

Numerical Analysis

Root Finding

Bisection Method

If $f(a)f(b) < 0$, then $\exists c \in (a,b): f(c) = 0$ Algorithm: 1. $c = \frac{a + b}{2}$ 2. If $f(a)f(c) < 0$: $b = c$, else $a = c$ 3. Repeat until $|b - a| < $ tolerance Convergence: $O(\log_2((b-a)/\varepsilon))$

Newton's Method

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$ Quadratic convergence when $f'(\text{root}) \neq 0$ May fail if $f'(x_n) \approx 0$ or poor initial guess

Secant Method

$x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$$ Approximates derivative using secant line Superlinear convergence: $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618

Interpolation

Lagrange Interpolation

$$P(x) = \sum_{i=0}^n y_i L_i(x)$$ where $L_i(x) = \prod_{j=0, j \neq i}^n \frac{x - x_j}{x_i - x_j}$ Unique polynomial of degree $\leq n$ through $n+1$ points

Newton's Divided Differences

$$P(x) = f[x_0] + f[x_0,x_1](x-x_0) + f[x_0,x_1,x_2](x-x_0)(x-x_1) + \cdots$$ $f[x_i,x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}$ More efficient for adding new points

Spline Interpolation

Cubic spline: $S(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$ Conditions: - $S(x_i) = y_i$ - $S'(x_i^-) = S'(x_i^+)$ - $S''(x_i^-) = S''(x_i^+)$

Numerical Integration

Composite Rules

Composite Trapezoidal: $$\int_a^b f(x)\,dx \approx \frac{h}{2}[f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)]$$ Error: $O(h^2)$ Composite Simpson's: $$\int_a^b f(x)\,dx \approx \frac{h}{3}[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + f(x_n)]$$ Error: $O(h^4)$

Gaussian Quadrature

$\int_{-1}^1 f(x)\,dx \approx \sum_{i=1}^n w_i f(x_i)$$ Optimal choice of points $x_i$ and weights $w_i$ $n$-point Gauss rule exact for polynomials of degree $\leq 2n-1

Ordinary Differential Equations

Euler's Method

$y_{n+1} = y_n + h \cdot f(x_n, y_n)$$ First-order accuracy: $O(h)$ Simple but may be unstable for large $h

Runge-Kutta Methods

RK4 (Fourth-order): $$k_1 = hf(x_n, y_n)$$ $$k_2 = hf(x_n + h/2, y_n + k_1/2)$$ $$k_3 = hf(x_n + h/2, y_n + k_2/2)$$ $$k_4 = hf(x_n + h, y_n + k_3)$$ $$y_{n+1} = y_n + \frac{k_1 + 2k_2 + 2k_3 + k_4}{6}$$

CS Applications:

Computer graphics (ray tracing), scientific computing, game physics, signal processing, finite element analysis

Complex Analysis

Complex Numbers

Basic Operations

$$z = x + iy = re^{i\theta} = r(\cos\theta + i\sin\theta)$$ $|z| = \sqrt{x^2 + y^2} = r$ $\arg(z) = \theta = \arctan(y/x)$ $z^* = x - iy = re^{-i\theta}$ (complex conjugate)

Polar Form

$z_1z_2 = r_1r_2e^{i(\theta_1+\theta_2)}$$ $$\frac{z_1}{z_2} = \frac{r_1}{r_2}e^{i(\theta_1-\theta_2)}$$ De Moivre's theorem: $z^n = r^n e^{in\theta}$ $n$th roots: $z^{1/n} = r^{1/n} e^{i(\theta+2\pi k)/n}$, $k = 0,1,\ldots,n-1

Analytic Functions

Cauchy-Riemann Equations

If $f(z) = u(x,y) + iv(x,y)$ is analytic:

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$ $$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Elementary Functions

$e^z = e^x(\cos y + i\sin y)$$ $$\sin z = \frac{e^{iz} - e^{-iz}}{2i}$$ $$\cos z = \frac{e^{iz} + e^{-iz}}{2}$$ $\log z = \ln|z| + i(\arg z + 2\pi n)

Power Series

$f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n$$ Radius of convergence: $R = \frac{1}{\limsup_{n \to \infty} |a_n|^{1/n}}$ Convergent for $|z - z_0| < R

Integration & Residues

Cauchy's Theorem

If $f$ is analytic in simply connected domain $D$ and $C$ is a closed curve in $D$:

\oint_C f(z) \, dz = 0

Residue Theorem

\oint_C f(z) \, dz = 2\pi i \sum_k \text{Res}(f, z_k)

where $z_k$ are poles inside $C$

Computing Residues

Simple pole at $z_0$: $$\text{Res}(f, z_0) = \lim_{z \to z_0} (z - z_0)f(z)$$ Pole of order $n$: $$\text{Res}(f, z_0) = \frac{1}{(n-1)!} \lim_{z \to z_0} \frac{d^{n-1}}{dz^{n-1}}[(z-z_0)^n f(z)]$$

CS Applications:

Signal processing (Fourier transforms), control systems, quantum computing, computer graphics (fractals), digital filters

Graph Theory

Basic Definitions

Graph Properties

Graph $G = (V, E)$ where $V =$ vertices, $E =$ edges Order: $|V|$ (number of vertices) Size: $|E|$ (number of edges) Degree: $\deg(v) = $ number of edges incident to $v$ Handshaking lemma: $\sum_{v \in V} \deg(v) = 2|E|$

Special Graphs

Complete graph $K_n$: $|E| = \frac{n(n-1)}{2}$ Complete bipartite $K_{m,n}$: $|E| = mn$ Cycle $C_n$: $n$ vertices, $n$ edges Path $P_n$: $n$ vertices, $n-1$ edges Tree: connected, acyclic, $|E| = |V| - 1$

Matrix Representations

Adjacency matrix $A$: $A[i,j] = 1$ if edge $(i,j)$ exists Incidence matrix: rows = vertices, columns = edges Number of walks of length $k$ from $i$ to $j$: $(A^k)[i,j]$

Connectivity

Paths & Cycles

Walk: sequence of vertices $v_0, v_1, \ldots, v_k$ Trail: walk with distinct edges Path: walk with distinct vertices Connected: path exists between any two vertices Distance $d(u,v)$: length of shortest path from $u$ to $v$

Euler & Hamilton

Eulerian path: uses every edge exactly once Exists iff graph connected and $\leq 2$ odd-degree vertices Eulerian cycle: closed Eulerian path Exists iff graph connected and all vertices even degree Hamiltonian path: visits every vertex exactly once

Trees & Spanning Trees

Tree Properties

For a graph $G$ with $n$ vertices, these are equivalent:

1. $G$ is a tree 2. $G$ is connected and has $n-1$ edges 3. $G$ is acyclic and has $n-1$ edges 4. $G$ is connected and removing any edge disconnects it 5. $G$ is acyclic and adding any edge creates exactly one cycle

Kruskal's Algorithm (Minimum Spanning Tree)

                        1. Sort edges by weight
                        2. Initialize disjoint sets for each vertex
                        3. For each edge (u,v):
                           - If u and v in different components:
                             - Add edge to MST
                             - Union components of u and v
                        4. Stop when n-1 edges added
                        
                        Time complexity: O(E log E)
                    

Prim's Algorithm

                        1. Start with arbitrary vertex
                        2. Maintain priority queue of edges from current tree
                        3. Repeatedly add minimum weight edge that connects
                           tree to new vertex
                        4. Update priority queue
                        
                        Time complexity: O(E log V) with binary heap
                    

Shortest Paths

Dijkstra's Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. Priority queue Q with all vertices
                        3. While Q not empty:
                           - u = extract minimum from Q
                           - For each neighbor v of u:
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                               - parent[v] = u
                        
                        Time: O((V + E) log V) with binary heap
                        Requirements: non-negative weights
                    

Bellman-Ford Algorithm

                        1. Initialize distances: d[source] = 0, d[v] = ∞ for v ≠ source
                        2. For i = 1 to |V| - 1:
                           - For each edge (u,v):
                             - If d[u] + weight(u,v) < d[v]:
                               - d[v] = d[u] + weight(u,v)
                        3. Check for negative cycles
                        
                        Time: O(VE)
                        Handles negative weights
                    

Floyd-Warshall Algorithm

                        1. Initialize distance matrix D
                        2. For k = 1 to n:
                           - For i = 1 to n:
                             - For j = 1 to n:
                               - D[i][j] = min(D[i][j], D[i][k] + D[k][j])
                        
                        Time: O(V³)
                        Finds all-pairs shortest paths
                    

Network Flow

Max-Flow Min-Cut Theorem

In any flow network, the maximum value of flow equals the minimum capacity of all cuts.

Ford-Fulkerson Method

                        1. Initialize flow f = 0
                        2. While there exists augmenting path P from source to sink:
                           - Find bottleneck capacity c along P
                           - Augment flow along P by c
                           - Update residual graph
                        3. Return maximum flow
                        
                        Time: O(E × max_flow) for integer capacities
                        With Edmonds-Karp (BFS): O(VE²)
                    

CS Applications:

Social networks, internet routing, database optimization, compiler optimization, bioinformatics, recommendation systems

Graph Coloring

Chromatic Number

\chi(G) = $ minimum number of colors needed $\chi(K_n) = n$ $\chi(C_n) = 2$ if $n$ even, $3$ if $n$ odd $\chi(\text{bipartite graph}) = 2$ Four Color Theorem: $\chi(\text{planar graph}) \leq 4

Greedy Coloring

Brooks' Theorem: If $G$ connected, not complete, not odd cycle, then $\chi(G) \leq \Delta(G)$ where $\Delta(G) = $ maximum degree

Number Theory

Divisibility & GCD

Division Algorithm

For integers $a, b$ with $b > 0$: $$a = qb + r \text{ where } 0 \leq r < b$$ $q = $ quotient, $r = $ remainder $a \equiv r \pmod{b}$

Euclidean Algorithm

$\gcd(a, b) = \gcd(b, a \bmod b)$$ Extended Euclidean Algorithm: $$\gcd(a, b) = ax + by$$ for some integers $x, y$ Time complexity: $O(\log \min(a, b))

Properties

$\gcd(a, b) \times \text{lcm}(a, b) = ab$$ If $\gcd(a, n) = 1$ and $\gcd(b, n) = 1$, then $\gcd(ab, n) = 1$ Bézout's identity: $\gcd(a, b) = ax + by

Modular Arithmetic

Congruences

a \equiv b \pmod{n}$ iff $n \mid (a - b)$ Properties: $(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$ $(a \times b) \bmod n = ((a \bmod n) \times (b \bmod n)) \bmod n$ Reflexive: $a \equiv a \pmod{n}$ Symmetric: $a \equiv b \implies b \equiv a \pmod{n}$ Transitive: $a \equiv b, b \equiv c \implies a \equiv c \pmod{n}

Modular Inverse

a^{-1} \pmod{n}$ exists iff $\gcd(a, n) = 1$ Finding inverse: use Extended Euclidean Algorithm $ax + ny = 1 \implies ax \equiv 1 \pmod{n} \implies a^{-1} \equiv x \pmod{n}$ Fermat's Little Theorem: If $p$ prime, $\gcd(a, p) = 1$: $$a^{-1} \equiv a^{p-2} \pmod{p}$

Prime Numbers

Fundamental Theorem of Arithmetic

Every integer $n > 1$ has a unique prime factorization:

n = p_1^{a_1} \times p_2^{a_2} \times \cdots \times p_k^{a_k}

where $p_1 < p_2 < \cdots < p_k$ are primes and $a_i > 0$

Prime Testing

Trial division: test divisibility up to $\sqrt{n}$ Time: $O(\sqrt{n})$ Sieve of Eratosthenes: find all primes $\leq n$ Time: $O(n \log \log n)$ Miller-Rabin: probabilistic primality test Time: $O(k \log^3 n)$ for $k$ rounds

Prime Distribution

Prime Number Theorem: $$\pi(n) \approx \frac{n}{\ln(n)}$$ where $\pi(n) = $ number of primes $\leq n$ There are infinitely many primes

Euler's Function

Definition & Properties

$\varphi(n) = |\{k : 1 \leq k \leq n, \gcd(k, n) = 1\}|$$ $\varphi(p) = p - 1$ for prime $p$ $\varphi(p^k) = p^k - p^{k-1} = p^{k-1}(p - 1)$ If $\gcd(m, n) = 1$: $\varphi(mn) = \varphi(m)\varphi(n)

Euler's Theorem

If $\gcd(a, n) = 1$, then: $$a^{\varphi(n)} \equiv 1 \pmod{n}$$ Fermat's Little Theorem (special case): If $p$ prime and $\gcd(a, p) = 1$: $$a^{p-1} \equiv 1 \pmod{p}$$

Chinese Remainder Theorem

CRT Statement

If $n_1, n_2, \ldots, n_k$ are pairwise coprime, then the system:

\begin{cases} x \equiv a_1 \pmod{n_1} \\ x \equiv a_2 \pmod{n_2} \\ \vdots \\ x \equiv a_k \pmod{n_k} \end{cases}

has a unique solution modulo $N = n_1n_2 \cdots n_k$

CRT Algorithm

                        1. Compute N = n₁n₂...nₖ
                        2. For each i, compute Nᵢ = N/nᵢ
                        3. Find Mᵢ such that NᵢMᵢ ≡ 1 (mod nᵢ)
                        4. Solution: x ≡ Σ aᵢNᵢMᵢ (mod N)
                    

CS Applications:

Cryptography (RSA, ElGamal), hash functions, pseudorandom number generation, error-correcting codes, computer algebra

Algorithm Analysis

Asymptotic Notation

Big O Notation

$f(n) = O(g(n))$ if $\exists c > 0, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$ Upper bound: $f$ grows no faster than $g$ $f(n) = \Omega(g(n))$ if $g(n) = O(f(n))$ Lower bound: $f$ grows at least as fast as $g$ $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ Tight bound: $f$ and $g$ grow at same rate

Common Complexities

$$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$$ Logarithmic: binary search, balanced tree operations Linear: linear search, single loop Linearithmic: merge sort, heap sort Quadratic: nested loops, simple sorting Exponential: brute force subsets Factorial: brute force permutations

Sorting Algorithms

Algorithm	Best Case	Average Case	Worst Case	Space	Stable
Bubble Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Selection Sort	O(n²)	O(n²)	O(n²)	O(1)	No
Insertion Sort	O(n)	O(n²)	O(n²)	O(1)	Yes
Merge Sort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes
Quick Sort	O(n log n)	O(n log n)	O(n²)	O(log n)	No
Heap Sort	O(n log n)	O(n log n)	O(n log n)	O(1)	No
Counting Sort	O(n + k)	O(n + k)	O(n + k)	O(k)	Yes
Radix Sort	O(d(n + k))	O(d(n + k))	O(d(n + k))	O(n + k)	Yes

Search Algorithms

Binary Search

Prerequisite: sorted array Algorithm: ``` while low ≤ high: mid = (low + high) / 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 ``` Time: $O(\log n)$, Space: $O(1)$

Hash Table Search

Average case: $O(1)$ for search, insert, delete Worst case: $O(n)$ if many collisions Load factor $\alpha = \frac{n}{m}$ (items/buckets) Keep $\alpha < 0.75$ for good performance Collision resolution: chaining, open addressing

Dynamic Programming

Optimal Substructure

A problem has optimal substructure if optimal solution contains optimal solutions to subproblems Examples: shortest paths, matrix chain multiplication, optimal binary search trees, edit distance

Overlapping Subproblems

Recursive algorithm solves same subproblems repeatedly Memoization: top-down approach, store results Tabulation: bottom-up approach, fill table Fibonacci: naive $O(2^n)$, DP $O(n)$

Classic DP Problems

                        Knapsack Problem: max value with weight constraint
                        DP[i][w] = max value using first i items, weight ≤ w
                        
                        Longest Common Subsequence:
                        LCS[i][j] = length of LCS of first i chars of X, first j chars of Y
                        
                        Edit Distance (Levenshtein):
                        ED[i][j] = min operations to transform X[1..i] to Y[1..j]
                    

Greedy Algorithms

Greedy Choice Property

Global optimum can be reached by making locally optimal choices at each step

Activity Selection

Select maximum number of non-overlapping activities Greedy strategy: sort by finish time, always pick earliest finishing activity Time: $O(n \log n)$

Huffman Coding

Optimal prefix-free binary codes Greedy strategy: repeatedly merge two nodes with minimum frequency Time: $O(n \log n)$

Complexity Classes

P vs NP

P: problems solvable in polynomial time NP: problems verifiable in polynomial time NP-Complete: hardest problems in NP NP-Hard: at least as hard as NP-Complete $P \subseteq NP$, but $P = NP$? is open question

Reduction

Problem $A$ reduces to $B$ ($A \leq_p B$) if $A$ can be solved using polynomial number of calls to $B$ If $A \leq_p B$ and $B \in P$, then $A \in P$ Used to prove NP-completeness

Cryptography

Classical Cryptography

Caesar Cipher

Encryption: $E(x) = (x + k) \bmod 26$ Decryption: $D(y) = (y - k) \bmod 26$ Key space: 26 possible keys Vulnerable to frequency analysis

Affine Cipher

Encryption: $E(x) = (ax + b) \bmod 26$ Decryption: $D(y) = a^{-1}(y - b) \bmod 26$ Requirement: $\gcd(a, 26) = 1$ Key space: $\varphi(26) \times 26 = 12 \times 26 = 312$

Vigenère Cipher

Encryption: $C_i = (M_i + K_{i \bmod \text{keylen}}) \bmod 26$ Polyalphabetic substitution Broken by Kasiski examination and index of coincidence

Public Key Cryptography

RSA Algorithm

Key Generation: 1. Choose large primes $p, q$ 2. $n = pq$, $\varphi(n) = (p-1)(q-1)$ 3. Choose $e$: $\gcd(e, \varphi(n)) = 1$ 4. Find $d$: $ed \equiv 1 \pmod{\varphi(n)}$ Public key: $(n, e)$ Private key: $(n, d)$ Encryption: $C = M^e \bmod n$ Decryption: $M = C^d \bmod n$

Diffie-Hellman Key Exchange

Public: prime $p$, generator $g$ Alice: chooses $a$, sends $g^a \bmod p$ Bob: chooses $b$, sends $g^b \bmod p$ Shared secret: $K = g^{ab} \bmod p$ Security based on discrete log problem

ElGamal Encryption

Key Generation: Choose prime $p$, generator $g$, private key $x$ Public key: $y = g^x \bmod p$ Encryption of $M$: Choose random $k$, send $(g^k \bmod p, M \cdot y^k \bmod p)$ Decryption: $M = c_2 \cdot (c_1^x)^{-1} \bmod p$

Hash Functions

Properties

Pre-image resistance: given $h$, hard to find $x$ with $H(x) = h$ Second pre-image resistance: given $x$, hard to find $x' \neq x$ with $H(x) = H(x')$ Collision resistance: hard to find $x, x'$ with $H(x) = H(x')$ Birthday paradox: expect collision after $\sqrt{2^n}$ inputs for $n$-bit hash

Applications

Digital signatures: sign hash instead of message Password storage: store $H(\text{password} + \text{salt})$ Proof of work: find nonce with $H(\text{block} + \text{nonce}) < \text{target}$ Data integrity: compare hash values

Digital Signatures

RSA Signatures

Signing: $S = H(M)^d \bmod n$ Verification: check if $S^e \equiv H(M) \pmod{n}$ Must use padding (PSS) to prevent attacks

DSA (Digital Signature Algorithm)

Parameters: prime $p$, $q \mid (p-1)$, generator $g$ Private key: $x$, Public key: $y = g^x \bmod p$ Signing $M$: $k = $ random, $r = (g^k \bmod p) \bmod q$ $s = k^{-1}(H(M) + xr) \bmod q$ Signature: $(r, s)$ Verification: check if $r = ((g^{H(M)s^{-1}} \cdot y^{rs^{-1}}) \bmod p) \bmod q$

CS Applications:

Secure communications, blockchain technology, password authentication, digital certificates, software signing, cryptocurrency

Optimization

Linear Programming

Standard Form

Minimize: $\mathbf{c}^T \mathbf{x}$ Subject to: $A\mathbf{x} = \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Can convert any LP to standard form: - $\max f \rightarrow \min(-f)$ - inequality $\rightarrow$ equality (add slack variables) - unrestricted variable $\rightarrow$ difference of non-negative variables

Simplex Method

1. Start at basic feasible solution (vertex of polytope) 2. Check optimality conditions 3. If not optimal, move to adjacent vertex that improves objective 4. Repeat until optimal or unbounded Worst case: exponential Average case: polynomial

Fundamental Theorem of Linear Programming

If LP has optimal solution, then it has optimal basic feasible solution (vertex of feasible region)

Duality Theorem

Primal: $\min \mathbf{c}^T \mathbf{x}$ subject to $A\mathbf{x} \geq \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ Dual: $\max \mathbf{b}^T \mathbf{y}$ subject to $A^T \mathbf{y} \leq \mathbf{c}$, $\mathbf{y} \geq \mathbf{0}$ Strong duality: if both have optimal solutions, optimal values are equal

Nonlinear Optimization

Unconstrained Optimization

First-order condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$ Second-order condition: $\nabla^2 f(\mathbf{x}^*)$ positive definite Gradient descent: $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)$ Newton's method: $\mathbf{x}_{k+1} = \mathbf{x}_k - [\nabla^2 f(\mathbf{x}_k)]^{-1} \nabla f(\mathbf{x}_k)$

Constrained Optimization

KKT conditions for $\min f(\mathbf{x})$ subject to $g(\mathbf{x}) \leq \mathbf{0}$, $h(\mathbf{x}) = \mathbf{0}$: $\nabla f(\mathbf{x}^*) + \sum_i \lambda_i \nabla g_i(\mathbf{x}^*) + \sum_j \mu_j \nabla h_j(\mathbf{x}^*) = \mathbf{0}$ $g_i(\mathbf{x}^*) \leq 0$, $h_j(\mathbf{x}^*) = 0$ $\lambda_i \geq 0$, $\lambda_i g_i(\mathbf{x}^*) = 0$ (complementary slackness)

Convex Optimization

Convex Sets & Functions

Convex set: $\forall \mathbf{x}, \mathbf{y} \in S, \forall \lambda \in [0,1]: \lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in S$ Convex function: $f(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y})$ Properties: - Local minimum = global minimum - First-order condition sufficient for optimality

Interior Point Methods

Barrier method: replace constraints $g(\mathbf{x}) \leq \mathbf{0}$ with penalty terms $-t \sum_i \log(-g_i(\mathbf{x}))$ As $t \to \infty$, solution approaches constrained optimum Polynomial time complexity for convex problems

CS Applications:

Machine learning (SVM, neural network training), resource allocation, network flow optimization, compiler optimization, game theory

Integer Programming

Branch and Bound

1. Solve LP relaxation 2. If solution integer, update best bound 3. If not integer, branch on fractional variable 4. Recursively solve subproblems 5. Prune if bound worse than current best NP-hard in general, but practical for many instances

Cutting Planes

Add linear constraints (cuts) that remove fractional solutions but preserve all integer solutions Gomory cuts: derived from optimal LP tableau Problem-specific cuts often more effective

Mathematical Software & Tools

Symbolic Computation

Mathematica, Maple, SymPy (Python) Exact arithmetic, symbolic integration/differentiation, equation solving, simplification Computer algebra systems

Numerical Computing

MATLAB, NumPy/SciPy (Python), R Linear algebra (BLAS, LAPACK) Optimization (CPLEX, Gurobi) Statistics and data analysis

Proof Assistants

Coq, Lean, Agda, Isabelle/HOL Formal verification of mathematical proofs Computer-checked mathematics Growing importance in CS theory

Computational Complexity

Complexity Zoo: database of complexity classes P, NP, PSPACE, EXPTIME, etc. Reductions and completeness results Interactive proofs, quantum complexity