Orthogonality is the key to solving inconsistent systems optimally, compressing signals, and decomposing functions. This chapter builds from the familiar dot product to inner product spaces — a framework encompassing polynomials, continuous functions, and Fourier analysis. Section 5.7 goes beyond the course syllabus into one of the most elegant and useful structures in applied mathematics.
Section 5.1
The Scalar Product in $\mathbb{R}^n$
Dot Product, Length, and Angle
For $\mathbf{x},\mathbf{y}\in\mathbb{R}^n$:
$$\mathbf{x}^T\mathbf{y} = x_1y_1 + \cdots + x_ny_n,\quad \|\mathbf{x}\| = \sqrt{\mathbf{x}^T\mathbf{x}},\quad \cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$$
$\mathbf{x}\perp\mathbf{y}$ iff $\mathbf{x}^T\mathbf{y}=0$. A vector with $\|\mathbf{u}\|=1$ is a unit vector.
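The formulas above translate directly into code; a minimal NumPy sketch with made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = x @ y                          # x^T y = 3 + 0 + 8 = 11
norm_x = np.sqrt(x @ x)              # ||x|| = sqrt(9) = 3
norm_y = np.linalg.norm(y)           # ||y|| = 5
cos_theta = dot / (norm_x * norm_y)  # 11/15

print(dot, norm_x, norm_y, round(cos_theta, 4))
```

Note that `x @ y` computes the scalar product directly; `np.linalg.norm` is equivalent to `sqrt(x @ x)`.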
Cauchy-Schwarz and Triangle Inequalities
Two Fundamental Inequalities
Cauchy-Schwarz: $\;|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\|\|\mathbf{y}\|$, with equality iff one is a scalar multiple of the other.
Triangle: $\;\|\mathbf{x}+\mathbf{y}\| \leq \|\mathbf{x}\|+\|\mathbf{y}\|$. The straight line is always the shortest path.
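Both inequalities are easy to sanity-check numerically; an illustrative sketch on random vectors (not from the text):

```python
import numpy as np

# Check Cauchy-Schwarz and the triangle inequality on many random vectors.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
print("Cauchy-Schwarz and triangle inequality hold on all samples")
```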
Vector Projection
The projection of $\mathbf{x}$ onto direction $\mathbf{y}$ is the component of $\mathbf{x}$ lying along $\mathbf{y}$:
$$\mathbf{p} = \frac{\mathbf{x}^T\mathbf{y}}{\mathbf{y}^T\mathbf{y}}\,\mathbf{y}$$
The residual $\mathbf{x}-\mathbf{p}$ is orthogonal to $\mathbf{y}$. The projection matrix onto span$\{\mathbf{y}\}$ is $P = \frac{\mathbf{y}\mathbf{y}^T}{\mathbf{y}^T\mathbf{y}}$ — it is symmetric and idempotent ($P^2=P$).
Figure: $\mathbf{x}$ (blue), its projection $\mathbf{p}$ onto $\mathbf{y}$ (green), and the direction $\mathbf{y}$ (purple). The dashed segment $\mathbf{x}-\mathbf{p}$ is always perpendicular to $\mathbf{y}$.
📘 Example 5.1 — Closest point on a line
Find the point on $y = \frac{1}{3}x$ closest to $Q=(1,4)$.
Direction vector $\mathbf{w}=(3,1)^T$, let $\mathbf{v}=(1,4)^T$.
$$\mathbf{p} = \frac{(1)(3)+(4)(1)}{9+1}(3,1)^T = \frac{7}{10}(3,1)^T = (2.1,\;0.7)^T$$
Verify perpendicularity: $(1,4)-(2.1,0.7)=(-1.1,3.3)$ and $(-1.1)(3)+(3.3)(1)=0$ ✓
Closest point: $(2.1,\ 0.7)$
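Example 5.1 can be reproduced numerically, including the projection-matrix properties from above; a minimal sketch (variable names are ours):

```python
import numpy as np

w = np.array([3.0, 1.0])      # direction of the line y = x/3
v = np.array([1.0, 4.0])      # the point Q

p = (v @ w) / (w @ w) * w     # projection: (7/10)(3,1) = (2.1, 0.7)
P = np.outer(w, w) / (w @ w)  # projection matrix onto span{w}

assert np.allclose(p, [2.1, 0.7])
assert np.allclose(P @ v, p)      # P reproduces the projection
assert np.allclose(P @ P, P)      # idempotent
assert np.allclose(P, P.T)        # symmetric
assert abs((v - p) @ w) < 1e-12   # residual is orthogonal to w
print(p)
```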
Section 5.2
Orthogonal Subspaces
Fundamental Theorem of Linear Algebra: For any $m\times n$ matrix $A$:
$N(A) = (\text{row space of }A)^\perp$ — null space ⊥ row space, both in $\mathbb{R}^n$
$N(A^T) = R(A)^\perp$ — left null space ⊥ column space, both in $\mathbb{R}^m$
Every $\mathbf{v}\in\mathbb{R}^n$ splits uniquely: $\mathbf{v}=\mathbf{p}+\mathbf{z}$, $\mathbf{p}\in$ row space, $\mathbf{z}\in N(A)$.
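This orthogonal split can be computed with a projector onto the row space; a sketch using the pseudoinverse, with a made-up rank-1 matrix $A$:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so N(A) is 2-dimensional
v = np.array([1.0, 1.0, 1.0])

P_row = np.linalg.pinv(A) @ A     # projector onto the row space of A
p = P_row @ v                     # row-space component
z = v - p                         # null-space component

assert np.allclose(A @ z, 0)      # z lies in N(A)
assert np.allclose(p + z, v)      # the split reconstructs v
assert abs(p @ z) < 1e-10         # the two parts are orthogonal
print(p, z)
```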
Section 5.3
Least Squares Problems
When $A\mathbf{x}=\mathbf{b}$ is inconsistent (overdetermined), we want $\hat{\mathbf{x}}$ minimizing $\|A\hat{\mathbf{x}}-\mathbf{b}\|^2$. Geometrically: project $\mathbf{b}$ onto the column space of $A$. The error $\mathbf{b}-A\hat{\mathbf{x}}$ must be perpendicular to every column of $A$.
Normal Equations (Theorem 5.3.1)
$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$$
If $A$ has linearly independent columns: $\;\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$. The matrix $A^+ = (A^TA)^{-1}A^T$ is the pseudoinverse of $A$ (when $A$ has full column rank).
📘 Example 5.2 — Least Squares Line Fitting
Fit $y=c_1+c_2 x$ to data $(0,0),(1,1),(2,3),(3,4)$:
$$A=\begin{pmatrix}1&0\\1&1\\1&2\\1&3\end{pmatrix},\quad\mathbf{b}=\begin{pmatrix}0\\1\\3\\4\end{pmatrix}$$
$$A^TA=\begin{pmatrix}4&6\\6&14\end{pmatrix},\quad A^T\mathbf{b}=\begin{pmatrix}8\\19\end{pmatrix}$$
Solving the $2\times2$ system: $c_1=-0.1$, $c_2=1.4$.
Best fit line: $y=-0.1+1.4x$. Sum of squared residuals $= (0.1)^2+(0.3)^2+(0.3)^2+(0.1)^2=0.20$ — the minimum over all lines.
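The normal equations of Example 5.2 can be solved in a few lines; a sketch of the computation:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 3.0, 4.0])

c = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations: A^T A c = A^T b
ssr = np.sum((A @ c - b) ** 2)         # sum of squared residuals

assert np.allclose(c, [-0.1, 1.4])
print(c, ssr)
```

In practice `np.linalg.lstsq(A, b)` is preferred over forming $A^TA$ explicitly, since squaring the matrix squares its condition number.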
Section 5.4
Inner Product Spaces
Abstract Inner Product
An inner product on $V$: a function $\langle\cdot,\cdot\rangle:V\times V\to\mathbb{R}$ satisfying symmetry, linearity in the first argument, and positive-definiteness. Examples:
$C[a,b]$: $\langle f,g\rangle = \int_a^b f(x)g(x)\,dx$ — used in Fourier analysis
$C[a,b]$ with weight $w$: $\langle f,g\rangle = \int_a^b f(x)g(x)w(x)\,dx$ — key for orthogonal polynomials
All theorems from $\mathbb{R}^n$ (Cauchy-Schwarz, Gram-Schmidt, etc.) hold in any inner product space.
Section 5.5
Orthonormal Sets
Orthonormal Set & Orthogonal Matrix
$\{\mathbf{u}_1,\ldots,\mathbf{u}_k\}$ is orthonormal if $\langle\mathbf{u}_i,\mathbf{u}_j\rangle=\delta_{ij}$. A square $Q$ with orthonormal columns is an orthogonal matrix: $Q^T=Q^{-1}$, i.e., $Q^TQ=I$.
Why Orthonormal Bases Are So Convenient
If $\{\mathbf{u}_1,\ldots,\mathbf{u}_n\}$ is orthonormal in $V$, then for any $\mathbf{v}\in V$:
$$\mathbf{v} = \langle\mathbf{v},\mathbf{u}_1\rangle\mathbf{u}_1 + \cdots + \langle\mathbf{v},\mathbf{u}_n\rangle\mathbf{u}_n$$
Each coordinate is just an inner product — no system to solve! This is the foundation of Fourier series.
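A tiny numerical illustration of this formula, with a hand-picked orthonormal basis of $\mathbb{R}^2$:

```python
import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
v = np.array([3.0, 1.0])

c1, c2 = v @ u1, v @ u2                  # coordinates: just inner products
assert np.allclose(c1 * u1 + c2 * u2, v)  # expansion reconstructs v
print(c1, c2)
```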
Section 5.6
Gram-Schmidt & QR Factorization
Gram-Schmidt Process (Theorem 5.6.1)
Given basis $\{\mathbf{x}_1,\ldots,\mathbf{x}_n\}$, produce orthonormal $\{\mathbf{u}_1,\ldots,\mathbf{u}_n\}$ spanning the same space:
$$\tilde{\mathbf{u}}_k = \mathbf{x}_k - \sum_{i=1}^{k-1}\langle\mathbf{x}_k,\mathbf{u}_i\rangle\mathbf{u}_i, \qquad \mathbf{u}_k = \tilde{\mathbf{u}}_k/\|\tilde{\mathbf{u}}_k\|$$
The result: $A = QR$ where $Q$ has orthonormal columns and $R$ is upper triangular with positive diagonal.
Figure: Gram-Schmidt step by step — subtract the projection of $\mathbf{x}_2$ onto $\mathbf{u}_1$ to get the perpendicular component, then normalize.
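The formulas above can be transcribed directly into code; a sketch of classical Gram-Schmidt with a made-up matrix (modified Gram-Schmidt is preferred in floating-point practice):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumes full column rank)."""
    Q = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        # subtract projections onto the already-built orthonormal vectors
        u = X[:, k] - Q[:, :k] @ (Q[:, :k].T @ X[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(A)
R = Q.T @ A                             # upper triangular since A = QR

assert np.allclose(Q.T @ Q, np.eye(2))  # orthonormal columns
assert np.allclose(Q @ R, A)            # the factorization holds
assert abs(R[1, 0]) < 1e-12             # R really is upper triangular
print(R)
```

`np.linalg.qr(A)` produces the same factorization (up to column signs) with better numerical behavior.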
Section 5.7
Orthogonal Polynomials
What if we apply the Gram-Schmidt idea to polynomials? The result is one of the most powerful and beautiful structures in applied mathematics — orthogonal polynomial families used in numerical analysis, quantum mechanics, approximation theory, and machine learning. The inner product is
$$\langle f,g\rangle_w = \int_a^b f(x)g(x)\,w(x)\,dx$$
where $w(x)>0$ is a weight function that focuses attention on different parts of the interval.
Orthogonal Polynomial Sequence
A sequence $\{p_0,p_1,p_2,\ldots\}$ with $\deg p_i = i$ is orthogonal w.r.t. inner product $\langle\cdot,\cdot\rangle_w$ if $\langle p_i,p_j\rangle_w=0$ whenever $i\neq j$. Orthonormal if additionally $\langle p_i,p_i\rangle_w=1$.
Theorem 5.7.1: The first $n$ orthogonal polynomials $p_0,\ldots,p_{n-1}$ form a basis for $P_n$ (the polynomials of degree less than $n$).
The Three-Term Recurrence Relation
Theorem 5.7.2 — Three-Term Recurrence
Every sequence of orthogonal polynomials satisfies a three-term recurrence. With $a_n$ the leading coefficient of $p_n$ and $\alpha_{n+1}=a_n/a_{n+1}$:
$$\alpha_{n+1}p_{n+1}(x) = (x-\beta_{n+1})p_n(x) - \alpha_n\gamma_n p_{n-1}(x), \qquad n\geq 0$$
where $\beta_{n+1}=\langle p_n,xp_n\rangle/\langle p_n,p_n\rangle$, $\gamma_n=\langle p_n,p_n\rangle/\langle p_{n-1},p_{n-1}\rangle$, and $p_{-1}\equiv 0$ by convention. This recurrence generates the entire family from $p_0$ and $p_1$.
Legendre Polynomials
The most important family: $w(x)=1$ on $[-1,1]$. Inner product $\langle f,g\rangle = \int_{-1}^1 f(x)g(x)\,dx$.
Recurrence (Bonnet's form): $(n+1)P_{n+1}(x) = (2n+1)x P_n(x) - n P_{n-1}(x)$, with $P_0=1$ and $P_1=x$. Each $P_n$ solves Legendre's differential equation $\bigl((1-x^2)P_n'\bigr)' + n(n+1)P_n = 0$ and appears in solutions of Laplace's equation in spherical coordinates.
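The recurrence generates the whole family, and orthogonality can be verified by quadrature; a sketch using NumPy coefficient arrays (low degree first):

```python
import numpy as np

def legendre(n):
    """Coefficient arrays for P_0..P_n via the three-term recurrence."""
    P = [np.array([1.0]), np.array([0.0, 1.0])]     # P_0 = 1, P_1 = x
    for k in range(1, n):
        # (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
        xPk = np.concatenate(([0.0], P[k]))          # multiply P_k by x
        nxt = np.zeros(k + 2)
        nxt[:k] -= k * P[k - 1]
        nxt += (2 * k + 1) * xPk
        P.append(nxt / (k + 1))
    return P

P = legendre(4)
assert np.allclose(P[2], [-0.5, 0.0, 1.5])           # P_2 = (3x^2 - 1)/2

# Check pairwise orthogonality on [-1, 1] with Gauss-Legendre quadrature.
x, w = np.polynomial.legendre.leggauss(10)           # exact up to degree 19
vals = [np.polynomial.polynomial.polyval(x, p) for p in P]
for i in range(5):
    for j in range(i):
        assert abs(np.sum(w * vals[i] * vals[j])) < 1e-12
print("P_0..P_4 generated and pairwise orthogonal")
```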
Figure: Legendre polynomials $P_0$ through $P_4$. Note how each $P_n$ has exactly $n$ zeros, all in $(-1,1)$ — guaranteed by Theorem 5.7.3.
Chebyshev Polynomials
Weight function $w(x) = (1-x^2)^{-1/2}$ on $(-1,1)$. The remarkable property:
$$T_n(\cos\theta) = \cos(n\theta)$$
This means $T_n$ oscillates between $-1$ and $1$, attaining the extreme values $\pm1$ at the $n+1$ points $\theta = k\pi/n$ — the equioscillation property. Interpolating at the Chebyshev nodes (the zeros of $T_n$) minimizes the maximum of $|\prod_i(x-x_i)|$ over $[-1,1]$, making them the near-optimal choice of interpolation points.
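The identity $T_n(\cos\theta)=\cos(n\theta)$ can be checked directly against the three-term recurrence $T_{n+1}=2xT_n-T_{n-1}$; an illustrative sketch:

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 200)
x = np.cos(theta)

T_prev, T = np.ones_like(x), x             # T_0 = 1, T_1 = x
for n in range(1, 6):
    assert np.allclose(T, np.cos(n * theta))  # T_n(cos t) = cos(n t)
    T_prev, T = T, 2 * x * T - T_prev         # three-term recurrence
print("T_1..T_5 match cos(n*theta) on [0, pi]")
```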
| Family | Interval | Weight $w(x)$ | Recurrence | Key Application |
|---|---|---|---|---|
| Legendre $P_n$ | $[-1,1]$ | $1$ | $(n+1)P_{n+1}=(2n+1)xP_n-nP_{n-1}$ | Gauss-Legendre quadrature, PDEs |
| Chebyshev $T_n$ | $[-1,1]$ | $(1-x^2)^{-1/2}$ | $T_{n+1}=2xT_n-T_{n-1}$ | Min-max approximation, interpolation nodes |
| Hermite $H_n$ | $(-\infty,\infty)$ | $e^{-x^2}$ | $H_{n+1}=2xH_n-2nH_{n-1}$ | Quantum harmonic oscillator; probabilists' variant in statistics |
| Laguerre $L_n$ | $[0,\infty)$ | $e^{-x}$ | $(n+1)L_{n+1}=(2n+1-x)L_n-nL_{n-1}$ | Hydrogen atom radial wavefunctions |
Theorem 5.7.3 — Zeros are Real and Simple
If $\{p_n\}$ is orthogonal w.r.t. $\langle f,g\rangle_w = \int_a^b fg\,w\,dx$, then the zeros of $p_n(x)$ are all real, distinct, and lie strictly inside $(a,b)$. This makes them ideal as nodes for numerical integration.
Application: Gaussian Quadrature
To approximate $\int_a^b f(x)w(x)\,dx$, use the $n$ zeros $x_1,\ldots,x_n$ of $p_n$ as quadrature nodes and compute specific weights $A_1,\ldots,A_n$:
$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=1}^n A_i f(x_i)$$
This $n$-point rule is exact for all polynomials of degree $\leq 2n-1$ — more than double what equally-spaced nodes achieve. Gauss-Legendre quadrature with 5 nodes gives exact integrals of polynomials up to degree 9!
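The degree-9 claim is easy to verify with NumPy's built-in Gauss-Legendre nodes and weights; a sketch with a made-up degree-9 integrand:

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(5)   # 5-point rule

f = lambda t: t**9 + 3 * t**8 - t**3 + 2              # degree 9
approx = np.sum(weights * f(nodes))

# Exact integral over [-1, 1]: odd powers vanish, int 3x^8 = 2/3, int 2 = 4.
exact = 2.0 / 3.0 + 4.0

assert abs(approx - exact) < 1e-12
print(approx, exact)
```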
📘 Example 5.4 — Legendre Best Approximation
Find the best least squares polynomial approximation of degree $\leq2$ to $f(x)=e^x$ on $[-1,1]$ using $\langle f,g\rangle=\int_{-1}^1 fg\,dx$.
Since $\{P_0,P_1,P_2\}$ is an orthogonal basis for $P_3$:
$$\hat{f}(x) = c_0 P_0 + c_1 P_1 + c_2 P_2, \qquad c_k = \frac{\langle e^x, P_k\rangle}{\langle P_k,P_k\rangle}$$
$$c_0 = \frac{\int_{-1}^1 e^x\,dx}{2} = \sinh(1)\approx1.1752$$
$$c_1 = \frac{\int_{-1}^1 xe^x\,dx}{2/3} = \frac{2/e}{2/3} = \frac{3}{e}\approx1.1036$$
$$c_2 = \frac{\int_{-1}^1 e^x\cdot\tfrac{1}{2}(3x^2-1)\,dx}{2/5}\approx 0.3578$$
$\hat{f}(x)\approx1.1752 + 1.1036\,x + 0.3578\cdot\tfrac{1}{2}(3x^2-1)$. The error $\|e^x-\hat{f}\|^2$ is globally minimized — no other polynomial of degree $\leq 2$ fits better in the $L^2$ sense.
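The coefficients of Example 5.4 can be recomputed by Gauss-Legendre quadrature (20 nodes make the integrals of $e^x$ exact to machine precision); a sketch:

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(20)
P = [np.ones_like(x), x, 0.5 * (3 * x**2 - 1)]   # P_0, P_1, P_2 at the nodes
norms = [2.0, 2.0 / 3.0, 2.0 / 5.0]              # <P_k, P_k> on [-1, 1]

# c_k = <e^x, P_k> / <P_k, P_k>
c = [np.sum(w * np.exp(x) * P[k]) / norms[k] for k in range(3)]

assert np.isclose(c[0], np.sinh(1.0))            # 1.1752...
assert np.isclose(c[1], 3.0 / np.e)              # 1.1036...
assert np.isclose(c[2], 2.5 * (np.e - 7.0 / np.e))  # 0.3578...
print([round(v, 4) for v in c])
```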
🔭 Going Further
Fourier Series: $\{\cos(nx), \sin(nx)\}$ are orthogonal on $[0,2\pi]$ — the orthogonal polynomial idea generalized to trigonometric functions
Spectral Methods: PDEs solved by expanding solutions in Chebyshev/Legendre series — exponential convergence for smooth solutions
Random Matrix Theory: The eigenvalue distributions of large random matrices connect to Hermite polynomials
Connection to eigenvalues: The zeros of $p_n$ are eigenvalues of a tridiagonal matrix built from the recurrence coefficients — connecting Gram-Schmidt, eigenvalue theory (Chapter 6), and numerical analysis (Chapter 7)