Orthogonality is the key to solving inconsistent systems optimally, compressing signals, and decomposing functions. This chapter builds from the familiar dot product to inner product spaces — a framework encompassing polynomials, continuous functions, and Fourier analysis. Section 5.7 goes beyond the course syllabus into one of the most elegant and useful structures in applied mathematics.
Section 5.1
The Scalar Product in $\mathbb{R}^n$
Dot Product, Length, and Angle
For $\mathbf{x},\mathbf{y}\in\mathbb{R}^n$:
$$\mathbf{x}^T\mathbf{y} = x_1y_1 + \cdots + x_ny_n,\quad \|\mathbf{x}\| = \sqrt{\mathbf{x}^T\mathbf{x}},\quad \cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$$
$\mathbf{x}\perp\mathbf{y}$ iff $\mathbf{x}^T\mathbf{y}=0$. A vector with $\|\mathbf{u}\|=1$ is a unit vector.
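The formulas above translate directly into code; a minimal NumPy sketch with made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = x @ y                          # x^T y = 3 + 0 + 8 = 11
norm_x = np.sqrt(x @ x)              # ||x|| = sqrt(9) = 3
norm_y = np.linalg.norm(y)           # ||y|| = 5
cos_theta = dot / (norm_x * norm_y)  # 11/15

print(dot, norm_x, norm_y, round(cos_theta, 4))
```

Note that `x @ y` computes the scalar product directly; `np.linalg.norm` is equivalent to `sqrt(x @ x)`.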
Cauchy-Schwarz and Triangle Inequalities
Two Fundamental Inequalities
Cauchy-Schwarz: $\;|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\|\|\mathbf{y}\|$, with equality iff one is a scalar multiple of the other.
Triangle: $\;\|\mathbf{x}+\mathbf{y}\| \leq \|\mathbf{x}\|+\|\mathbf{y}\|$. The straight line is always the shortest path.
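Both inequalities are easy to sanity-check numerically; an illustrative sketch on random vectors (not from the text):

```python
import numpy as np

# Check Cauchy-Schwarz and the triangle inequality on many random vectors.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
print("Cauchy-Schwarz and triangle inequality hold on all samples")
```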
Vector Projection
The projection of $\mathbf{x}$ onto direction $\mathbf{y}$ is the component of $\mathbf{x}$ lying along $\mathbf{y}$:
$$\mathbf{p} = \frac{\mathbf{x}^T\mathbf{y}}{\mathbf{y}^T\mathbf{y}}\,\mathbf{y}$$
The residual $\mathbf{x}-\mathbf{p}$ is orthogonal to $\mathbf{y}$. The projection matrix onto span$\{\mathbf{y}\}$ is $P = \frac{\mathbf{y}\mathbf{y}^T}{\mathbf{y}^T\mathbf{y}}$ — it is symmetric and idempotent ($P^2=P$).
Figure: $\mathbf{x}$ (blue), its projection $\mathbf{p}$ onto $\mathbf{y}$ (green), and the direction $\mathbf{y}$ (purple). The dashed segment $\mathbf{x}-\mathbf{p}$ is always perpendicular to $\mathbf{y}$.
📘 Example 5.1 — Closest point on a line
Find the point on $y = \frac{1}{3}x$ closest to $Q=(1,4)$.
Direction vector $\mathbf{w}=(3,1)^T$, let $\mathbf{v}=(1,4)^T$.
$$\mathbf{p} = \frac{(1)(3)+(4)(1)}{9+1}(3,1)^T = \frac{7}{10}(3,1)^T = (2.1,\;0.7)^T$$
Verify perpendicularity: $(1,4)-(2.1,0.7)=(-1.1,3.3)$ and $(-1.1)(3)+(3.3)(1)=0$ ✓
Closest point: $(2.1,\ 0.7)$
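Example 5.1 can be reproduced numerically, including the projection-matrix properties from above; a minimal sketch (variable names are ours):

```python
import numpy as np

w = np.array([3.0, 1.0])      # direction of the line y = x/3
v = np.array([1.0, 4.0])      # the point Q

p = (v @ w) / (w @ w) * w     # projection: (7/10)(3,1) = (2.1, 0.7)
P = np.outer(w, w) / (w @ w)  # projection matrix onto span{w}

assert np.allclose(p, [2.1, 0.7])
assert np.allclose(P @ v, p)      # P reproduces the projection
assert np.allclose(P @ P, P)      # idempotent
assert np.allclose(P, P.T)        # symmetric
assert abs((v - p) @ w) < 1e-12   # residual is orthogonal to w
print(p)
```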
Section 5.2
Orthogonal Subspaces
Fundamental Theorem of Linear Algebra: For any $m\times n$ matrix $A$:
$N(A) = (\text{row space of }A)^\perp$ — null space ⊥ row space, both in $\mathbb{R}^n$
$N(A^T) = R(A)^\perp$ — left null space ⊥ column space, both in $\mathbb{R}^m$
Every $\mathbf{v}\in\mathbb{R}^n$ splits uniquely: $\mathbf{v}=\mathbf{p}+\mathbf{z}$, $\mathbf{p}\in$ row space, $\mathbf{z}\in N(A)$.
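This orthogonal split can be computed with a projector onto the row space; a sketch using the pseudoinverse, with a made-up rank-1 matrix $A$:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so N(A) is 2-dimensional
v = np.array([1.0, 1.0, 1.0])

P_row = np.linalg.pinv(A) @ A     # projector onto the row space of A
p = P_row @ v                     # row-space component
z = v - p                         # null-space component

assert np.allclose(A @ z, 0)      # z lies in N(A)
assert np.allclose(p + z, v)      # the split reconstructs v
assert abs(p @ z) < 1e-10         # the two parts are orthogonal
print(p, z)
```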
Section 5.3
Least Squares Problems
When $A\mathbf{x}=\mathbf{b}$ is inconsistent (overdetermined), we want $\hat{\mathbf{x}}$ minimizing $\|A\hat{\mathbf{x}}-\mathbf{b}\|^2$. Geometrically: project $\mathbf{b}$ onto the column space of $A$. The error $\mathbf{b}-A\hat{\mathbf{x}}$ must be perpendicular to every column of $A$.
Normal Equations (Theorem 5.3.1)
$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$$
If $A$ has linearly independent columns: $\;\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$. The matrix $A^+ = (A^TA)^{-1}A^T$ is the pseudoinverse of $A$ (when $A$ has full column rank).
📘 Example 5.2 — Least Squares Line Fitting
Fit $y=c_1+c_2 x$ to data $(0,0),(1,1),(2,3),(3,4)$:
$$A=\begin{pmatrix}1&0\\1&1\\1&2\\1&3\end{pmatrix},\quad\mathbf{b}=\begin{pmatrix}0\\1\\3\\4\end{pmatrix}$$
$$A^TA=\begin{pmatrix}4&6\\6&14\end{pmatrix},\quad A^T\mathbf{b}=\begin{pmatrix}8\\19\end{pmatrix}$$
Solving the $2\times2$ system: $c_1=-0.1$, $c_2=1.4$.
Best fit line: $y=-0.1+1.4x$. Sum of squared residuals $= (0.1)^2+(0.3)^2+(0.3)^2+(0.1)^2=0.20$ — the minimum over all lines.
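The normal equations of Example 5.2 can be solved in a few lines; a sketch of the computation:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 3.0, 4.0])

c = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations: A^T A c = A^T b
ssr = np.sum((A @ c - b) ** 2)         # sum of squared residuals

assert np.allclose(c, [-0.1, 1.4])
print(c, ssr)
```

In practice `np.linalg.lstsq(A, b)` is preferred over forming $A^TA$ explicitly, since squaring the matrix squares its condition number.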
Section 5.4
Inner Product Spaces
Abstract Inner Product
An inner product on $V$: a function $\langle\cdot,\cdot\rangle:V\times V\to\mathbb{R}$ satisfying symmetry, linearity in the first argument, and positive-definiteness. Examples:
$C[a,b]$: $\langle f,g\rangle = \int_a^b f(x)g(x)\,dx$ — used in Fourier analysis
$C[a,b]$ with weight $w$: $\langle f,g\rangle = \int_a^b f(x)g(x)w(x)\,dx$ — key for orthogonal polynomials
All theorems from $\mathbb{R}^n$ (Cauchy-Schwarz, Gram-Schmidt, etc.) hold in any inner product space.
Section 5.5
Orthonormal Sets
Orthonormal Set & Orthogonal Matrix
$\{\mathbf{u}_1,\ldots,\mathbf{u}_k\}$ is orthonormal if $\langle\mathbf{u}_i,\mathbf{u}_j\rangle=\delta_{ij}$. A square $Q$ with orthonormal columns is an orthogonal matrix: $Q^T=Q^{-1}$, i.e., $Q^TQ=I$.
Why Orthonormal Bases Are So Convenient
If $\{\mathbf{u}_1,\ldots,\mathbf{u}_n\}$ is orthonormal in $V$, then for any $\mathbf{v}\in V$:
$$\mathbf{v} = \langle\mathbf{v},\mathbf{u}_1\rangle\mathbf{u}_1 + \cdots + \langle\mathbf{v},\mathbf{u}_n\rangle\mathbf{u}_n$$
Each coordinate is just an inner product — no system to solve! This is the foundation of Fourier series.
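A tiny numerical illustration of this formula, with a hand-picked orthonormal basis of $\mathbb{R}^2$:

```python
import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
v = np.array([3.0, 1.0])

c1, c2 = v @ u1, v @ u2                  # coordinates: just inner products
assert np.allclose(c1 * u1 + c2 * u2, v)  # expansion reconstructs v
print(c1, c2)
```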
Section 5.6
Gram-Schmidt & QR Factorization
Gram-Schmidt Process (Theorem 5.6.1)
Given basis $\{\mathbf{x}_1,\ldots,\mathbf{x}_n\}$, produce orthonormal $\{\mathbf{u}_1,\ldots,\mathbf{u}_n\}$ spanning the same space:
$$\tilde{\mathbf{u}}_k = \mathbf{x}_k - \sum_{i=1}^{k-1}\langle\mathbf{x}_k,\mathbf{u}_i\rangle\mathbf{u}_i, \qquad \mathbf{u}_k = \tilde{\mathbf{u}}_k/\|\tilde{\mathbf{u}}_k\|$$
The result: $A = QR$ where $Q$ has orthonormal columns and $R$ is upper triangular with positive diagonal.
Figure: Gram-Schmidt step by step — subtract the projection of $\mathbf{x}_2$ onto $\mathbf{u}_1$ to get the perpendicular component, then normalize.
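The formulas above can be transcribed directly into code; a sketch of classical Gram-Schmidt with a made-up matrix (modified Gram-Schmidt is preferred in floating-point practice):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumes full column rank)."""
    Q = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        # subtract projections onto the already-built orthonormal vectors
        u = X[:, k] - Q[:, :k] @ (Q[:, :k].T @ X[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(A)
R = Q.T @ A                             # upper triangular since A = QR

assert np.allclose(Q.T @ Q, np.eye(2))  # orthonormal columns
assert np.allclose(Q @ R, A)            # the factorization holds
assert abs(R[1, 0]) < 1e-12             # R really is upper triangular
print(R)
```

`np.linalg.qr(A)` produces the same factorization (up to column signs) with better numerical behavior.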
Section 5.7
Orthogonal Polynomials
What if we apply the Gram-Schmidt idea to polynomials? The result is one of the most powerful and beautiful structures in applied mathematics — orthogonal polynomial families used in numerical analysis, quantum mechanics, approximation theory, and machine learning. The inner product is
$$\langle f,g\rangle_w = \int_a^b f(x)g(x)\,w(x)\,dx$$
where $w(x)>0$ is a weight function that focuses attention on different parts of the interval.
Orthogonal Polynomial Sequence
A sequence $\{p_0,p_1,p_2,\ldots\}$ with $\deg p_i = i$ is orthogonal w.r.t. inner product $\langle\cdot,\cdot\rangle_w$ if $\langle p_i,p_j\rangle_w=0$ whenever $i\neq j$. Orthonormal if additionally $\langle p_i,p_i\rangle_w=1$.
Theorem 5.7.1: The first $n$ orthogonal polynomials $p_0,\ldots,p_{n-1}$ form a basis for $P_n$ (the polynomials of degree less than $n$).
The Three-Term Recurrence Relation
Theorem 5.7.2 — Three-Term Recurrence
Every sequence of orthogonal polynomials satisfies a three-term recurrence. With $a_n$ the leading coefficient of $p_n$ and $\alpha_{n+1}=a_n/a_{n+1}$:
$$\alpha_{n+1}p_{n+1}(x) = (x-\beta_{n+1})p_n(x) - \alpha_n\gamma_n p_{n-1}(x), \qquad n\geq 0$$
where $\beta_{n+1}=\langle p_n,xp_n\rangle/\langle p_n,p_n\rangle$, $\gamma_n=\langle p_n,p_n\rangle/\langle p_{n-1},p_{n-1}\rangle$, and $p_{-1}\equiv 0$ by convention. This recurrence generates the entire family from $p_0$ and $p_1$.
Legendre Polynomials
The most important family: $w(x)=1$ on $[-1,1]$. Inner product $\langle f,g\rangle = \int_{-1}^1 f(x)g(x)\,dx$.
Recurrence (Bonnet's form): $(n+1)P_{n+1}(x) = (2n+1)x P_n(x) - n P_{n-1}(x)$, with $P_0=1$ and $P_1=x$. Each $P_n$ solves Legendre's differential equation $\bigl((1-x^2)P_n'\bigr)' + n(n+1)P_n = 0$ and appears in solutions of Laplace's equation in spherical coordinates.
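The recurrence generates the whole family, and orthogonality can be verified by quadrature; a sketch using NumPy coefficient arrays (low degree first):

```python
import numpy as np

def legendre(n):
    """Coefficient arrays for P_0..P_n via the three-term recurrence."""
    P = [np.array([1.0]), np.array([0.0, 1.0])]     # P_0 = 1, P_1 = x
    for k in range(1, n):
        # (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
        xPk = np.concatenate(([0.0], P[k]))          # multiply P_k by x
        nxt = np.zeros(k + 2)
        nxt[:k] -= k * P[k - 1]
        nxt += (2 * k + 1) * xPk
        P.append(nxt / (k + 1))
    return P

P = legendre(4)
assert np.allclose(P[2], [-0.5, 0.0, 1.5])           # P_2 = (3x^2 - 1)/2

# Check pairwise orthogonality on [-1, 1] with Gauss-Legendre quadrature.
x, w = np.polynomial.legendre.leggauss(10)           # exact up to degree 19
vals = [np.polynomial.polynomial.polyval(x, p) for p in P]
for i in range(5):
    for j in range(i):
        assert abs(np.sum(w * vals[i] * vals[j])) < 1e-12
print("P_0..P_4 generated and pairwise orthogonal")
```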
Figure: Legendre polynomials $P_0$ through $P_4$. Note how each $P_n$ has exactly $n$ zeros, all in $(-1,1)$ — guaranteed by Theorem 5.7.3.
Chebyshev Polynomials
Weight function $w(x) = (1-x^2)^{-1/2}$ on $(-1,1)$. The remarkable property:
$$T_n(\cos\theta) = \cos(n\theta)$$
This means $T_n$ oscillates between $-1$ and $1$, attaining the extreme values $\pm1$ at the $n+1$ points $\theta = k\pi/n$ — the equioscillation property. Interpolating at the Chebyshev nodes (the zeros of $T_n$) minimizes the maximum of $|\prod_i(x-x_i)|$ over $[-1,1]$, making them the near-optimal choice of interpolation points.
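The identity $T_n(\cos\theta)=\cos(n\theta)$ can be checked directly against the three-term recurrence $T_{n+1}=2xT_n-T_{n-1}$; an illustrative sketch:

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 200)
x = np.cos(theta)

T_prev, T = np.ones_like(x), x             # T_0 = 1, T_1 = x
for n in range(1, 6):
    assert np.allclose(T, np.cos(n * theta))  # T_n(cos t) = cos(n t)
    T_prev, T = T, 2 * x * T - T_prev         # three-term recurrence
print("T_1..T_5 match cos(n*theta) on [0, pi]")
```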
| Family | Interval | Weight $w(x)$ | Recurrence | Key Application |
|---|---|---|---|---|
| Legendre $P_n$ | $[-1,1]$ | $1$ | $(n+1)P_{n+1}=(2n+1)xP_n-nP_{n-1}$ | Gauss-Legendre quadrature, PDEs |
| Chebyshev $T_n$ | $[-1,1]$ | $(1-x^2)^{-1/2}$ | $T_{n+1}=2xT_n-T_{n-1}$ | Min-max approximation, interpolation nodes |
| Hermite $H_n$ | $(-\infty,\infty)$ | $e^{-x^2}$ | $H_{n+1}=2xH_n-2nH_{n-1}$ | Quantum harmonic oscillator; probabilists' variant in statistics |
| Laguerre $L_n$ | $[0,\infty)$ | $e^{-x}$ | $(n+1)L_{n+1}=(2n+1-x)L_n-nL_{n-1}$ | Hydrogen atom radial wavefunctions |
Theorem 5.7.3 — Zeros are Real and Simple
If $\{p_n\}$ is orthogonal w.r.t. $\langle f,g\rangle_w = \int_a^b fg\,w\,dx$, then the zeros of $p_n(x)$ are all real, distinct, and lie strictly inside $(a,b)$. This makes them ideal as nodes for numerical integration.
Application: Gaussian Quadrature
To approximate $\int_a^b f(x)w(x)\,dx$, use the $n$ zeros $x_1,\ldots,x_n$ of $p_n$ as quadrature nodes and compute specific weights $A_1,\ldots,A_n$:
$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=1}^n A_i f(x_i)$$
This $n$-point rule is exact for all polynomials of degree $\leq 2n-1$ — more than double what equally-spaced nodes achieve. Gauss-Legendre quadrature with 5 nodes gives exact integrals of polynomials up to degree 9!
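The degree-9 claim is easy to verify with NumPy's built-in Gauss-Legendre nodes and weights; a sketch with a made-up degree-9 integrand:

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(5)   # 5-point rule

f = lambda t: t**9 + 3 * t**8 - t**3 + 2              # degree 9
approx = np.sum(weights * f(nodes))

# Exact integral over [-1, 1]: odd powers vanish, int 3x^8 = 2/3, int 2 = 4.
exact = 2.0 / 3.0 + 4.0

assert abs(approx - exact) < 1e-12
print(approx, exact)
```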
📘 Example 5.4 — Legendre Best Approximation
Find the best least squares polynomial approximation of degree $\leq2$ to $f(x)=e^x$ on $[-1,1]$ using $\langle f,g\rangle=\int_{-1}^1 fg\,dx$.
Since $\{P_0,P_1,P_2\}$ is an orthogonal basis for $P_3$:
$$\hat{f}(x) = c_0 P_0 + c_1 P_1 + c_2 P_2, \qquad c_k = \frac{\langle e^x, P_k\rangle}{\langle P_k,P_k\rangle}$$
$$c_0 = \frac{\int_{-1}^1 e^x\,dx}{2} = \sinh(1)\approx1.1752$$
$$c_1 = \frac{\int_{-1}^1 xe^x\,dx}{2/3} = \frac{2/e}{2/3} = \frac{3}{e}\approx1.1036$$
$$c_2 = \frac{\int_{-1}^1 e^x\cdot\tfrac{1}{2}(3x^2-1)\,dx}{2/5}\approx 0.3578$$
$\hat{f}(x)\approx1.1752 + 1.1036\,x + 0.3578\cdot\tfrac{1}{2}(3x^2-1)$. The error $\|e^x-\hat{f}\|^2$ is globally minimized — no other polynomial of degree $\leq 2$ fits better in the $L^2$ sense.
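The coefficients of Example 5.4 can be recomputed by Gauss-Legendre quadrature (20 nodes make the integrals of $e^x$ exact to machine precision); a sketch:

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(20)
P = [np.ones_like(x), x, 0.5 * (3 * x**2 - 1)]   # P_0, P_1, P_2 at the nodes
norms = [2.0, 2.0 / 3.0, 2.0 / 5.0]              # <P_k, P_k> on [-1, 1]

# c_k = <e^x, P_k> / <P_k, P_k>
c = [np.sum(w * np.exp(x) * P[k]) / norms[k] for k in range(3)]

assert np.isclose(c[0], np.sinh(1.0))            # 1.1752...
assert np.isclose(c[1], 3.0 / np.e)              # 1.1036...
assert np.isclose(c[2], 2.5 * (np.e - 7.0 / np.e))  # 0.3578...
print([round(v, 4) for v in c])
```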
🔭 Going Further
Fourier Series: $\{\cos(nx), \sin(nx)\}$ are orthogonal on $[0,2\pi]$ — the orthogonal polynomial idea generalized to trigonometric functions
Spectral Methods: PDEs solved by expanding solutions in Chebyshev/Legendre series — exponential convergence for smooth solutions
Random Matrix Theory: The eigenvalue distributions of large random matrices connect to Hermite polynomials
Connection to eigenvalues: The zeros of $p_n$ are eigenvalues of a tridiagonal matrix built from the recurrence coefficients — connecting Gram-Schmidt, eigenvalue theory (Chapter 6), and numerical analysis (Chapter 7)