Note 9: Error Correcting Codes | EECS 70

EECS 70 — Fall 2021

Note 9: Error Correcting Codes

Erasure errors · General errors · Reed-Solomon codes · Berlekamp-Welch · Hamming distance

Every time your phone sends a text, your Wi-Fi streams a video, or a space probe transmits data across billions of kilometers, error correcting codes are working silently to ensure the message arrives intact. This note explains the mathematics behind these codes — built entirely on polynomials over finite fields.

Section 1

The Problem: Unreliable Channels

Alice wants to send Bob a message $m_1, m_2, \ldots, m_n$ over an unreliable channel. Two types of errors can occur:

Two Types of Errors

Erasure errors: Some packets are lost. Bob knows which ones are missing (e.g., labeled by packet headers). He receives fewer than $n$ packets.

General (corruption) errors: Some packets are corrupted — Bob receives the wrong value, but doesn't know which packets are wrong. He still receives $n$ packets total.

ERASURE ERROR: Packets lost (Bob knows which)

Alice: 315061

Bob: 3?506?

GENERAL ERROR: Packets corrupted (Bob doesn't know which)

Alice: 315061

Bob: 215061

Erasure: Bob knows positions 2 and 6 are missing. General: Bob sees 2 instead of 3 in position 1, but doesn't know position 1 is wrong.

The Central Idea

Represent the message as a polynomial $P(x)$ of degree $n-1$. Send redundant evaluations. Since a polynomial is uniquely determined by enough of its values, Bob can reconstruct $P(x)$ and recover the message even with errors.

Section 2

Erasure Errors

Reed-Solomon Erasure Code

To protect a message $m_1, \ldots, m_n$ against up to $k$ erasures:

Find the unique polynomial $P(x)$ of degree $n-1$ with $P(i) = m_i$ for $1 \leq i \leq n$. (Use Lagrange interpolation.)
Transmit the codeword $c_1 = P(1), c_2 = P(2), \ldots, c_{n+k} = P(n+k)$ — that is, $n+k$ evaluations.
Bob receives any $n$ of these values. Since $P$ has degree $n-1$, he can reconstruct $P$ from any $n$ points, then evaluate $P(1), \ldots, P(n)$.

Why does this work? A degree-$(n-1)$ polynomial is uniquely determined by its values at $n$ distinct points (Property 2 from Note 8). So if Bob receives any $n$ packets out of the $n+k$ sent, that is exactly enough to recover $P$.

⚠ Optimality

This scheme is optimal: if Alice sent only $n+k-1$ packets, then with $k$ erasures Bob would have only $n-1$ values — not enough to uniquely determine a degree-$(n-1)$ polynomial (there would be $q$ valid polynomials through those $n-1$ points). One extra packet beyond the message is the exact minimum needed per erasure.

Worked Example: $n=4$ packets, guarding against $k=2$ erasures over GF(7)

📘 Full Worked Example — Alice sends 6 packets, 2 are lost

Setup: Message $m_1=3, m_2=1, m_3=5, m_4=0$ over GF(7). Find $P(x)$ of degree 3 with $P(i)=m_i$.

Using Lagrange interpolation: $P(x) = x^3 + 4x^2 + 5$ (mod 7). Check: $P(1)=10\equiv3$, $P(2)=25\equiv4$... (verify in exercise).

Codeword: Evaluate at $1,2,3,4,5,6$: $c_1=3,c_2=1,c_3=5,c_4=0,c_5=6,c_6=1$.

Suppose packets 2 and 6 are lost. Bob receives: $c_1=3, c_3=5, c_4=0, c_5=6$.

Bob has 4 values of a degree-3 polynomial at $x=1,3,4,5$. He runs Lagrange interpolation:

$$\Delta_1(x)=2(x-3)(x-4)(x-5),\quad \Delta_3(x)=2(x-1)(x-4)(x-5), \ldots$$

Reconstructs $P(x) = x^3 + 4x^2 + 5$. Evaluates $P(2)=1$ and $P(6)=1$. Message recovered: $3,1,5,0$.

Any 4 received packets $\Rightarrow$ unique degree-3 polynomial $\Rightarrow$ message recovered ✓

Section 3

General (Corruption) Errors

Now for the harder problem: $k$ packets are corrupted, and Bob doesn't know which ones. Alice must send $2k$ extra packets (twice as many as for erasures). The key insight uses a clever algebraic trick.

Setup for General Errors

Alice sends $n + 2k$ packets. Bob receives $r_1, \ldots, r_{n+2k}$. He knows at least $n+k$ values are correct (at most $k$ corrupted). His goal: reconstruct $P(x)$ from this data.

The Berlekamp-Welch Algorithm

The clever trick: introduce the error-locator polynomial $E(x) = (x-e_1)(x-e_2)\cdots(x-e_k)$, where $e_1, \ldots, e_k$ are the (unknown) positions of the errors. The key observation:

Key Observation

$$P(i) \cdot E(i) = r_i \cdot E(i) \quad \text{for all } 1 \leq i \leq n+2k$$ Why: At non-error positions, $P(i) = r_i$ so both sides are equal. At error positions $i = e_j$, $E(e_j) = 0$, so both sides are $0$.

Now define $Q(x) = P(x) \cdot E(x)$. This has degree $n + k - 1$. Then $Q(i) = r_i \cdot E(i)$ gives $n+2k$ equations. Both $Q$ and $E$ are unknown polynomials with known degree constraints ($Q$ has $n+k$ coefficients, $E$ has $k$ coefficients with leading coefficient 1). That's $n+2k$ unknowns, $n+2k$ equations — a solvable linear system!

Berlekamp-Welch Algorithm — Steps

Let $Q(x) = a_{n+k-1}x^{n+k-1}+\cdots+a_0$ (unknown coefficients $a_i$) and $E(x) = x^k + b_{k-1}x^{k-1}+\cdots+b_0$ (unknown $b_i$, leading coefficient is 1).
Write $Q(i) = r_i E(i)$ for each $i = 1, \ldots, n+2k$. This is a linear equation in the unknowns $a_i, b_j$.
Solve the $n+2k \times n+2k$ linear system to find $Q(x)$ and $E(x)$.
Compute $P(x) = Q(x) / E(x)$ (polynomial division, should be exact). Recover message: $P(1), \ldots, P(n)$.

📘 Example — $n=3$ message over GF(7), $k=1$ error

Alice sends message $3,0,6$ (polynomial $P(x)=x^2+x+1$ over GF(7)). Codeword: $P(1)=3,P(2)=0,P(3)=6,P(4)=0,P(5)=3$. Position 1 corrupted to 2, so Bob receives $r_1=2,r_2=0,r_3=6,r_4=0,r_5=3$.

Bob sets up: $E(x)=x+b_0$ and $Q(x)=a_3x^3+a_2x^2+a_1x+a_0$. The 5 equations $Q(i)=r_iE(i)$:

$$a_3+a_2+a_1+a_0 = 2(1+b_0), \quad a_3\cdot8+a_2\cdot4+2a_1+a_0=0, \ldots$$

Solving (all mod 7): $a_3=1, a_2=0, a_1=0, a_0=6, b_0=6$.

So $E(x)=x+6=x-1$, meaning the error was at position $e_1=1$. ✓

$Q(x)=x^3+6$. $P(x)=Q(x)/E(x)=(x^3+6)/(x-1)=x^2+x+1$ (mod 7).

Original message: $P(1)=3,P(2)=0,P(3)=6$. Recovered correctly! ✓

⚠ Why $2k$ extra packets for general errors?

For erasures: 1 extra packet per lost packet (you know which are missing). For general errors: 2 extra per corrupted packet (you don't know which are wrong, so you need twice the redundancy to uniquely identify both the error positions AND their values). This is fundamental, not a weakness of the algorithm.

Section 4

Hamming Distance and Optimality

Hamming Distance

The Hamming distance between two strings $\mathbf{s} = (s_1,\ldots,s_L)$ and $\mathbf{r} = (r_1,\ldots,r_L)$ is: $$d(\mathbf{s},\mathbf{r}) = \sum_{i=1}^{L} \mathbf{1}[r_i \neq s_i]$$ This counts the number of positions where they differ.

Minimum Distance of a Code

The minimum distance $d_{\min}$ of a code is the minimum Hamming distance between any two distinct codewords: $$d_{\min} = \min_{m \neq m'} d(\mathbf{c}(m), \mathbf{c}(m'))$$ The minimum distance determines how many errors a code can handle:

Erasure recovery: Can recover from up to $d_{\min} - 1$ erasures.
Error correction: Can correct up to $\lfloor (d_{\min}-1)/2 \rfloor$ general errors.

Hamming distance as a geometric concept. Two codewords at distance $d_{\min}$. Error correction corresponds to finding the closest codeword. Click to animate.

Section 5

Reed-Solomon: Minimum Distance

Theorem — Reed-Solomon Minimum Distance

The Reed-Solomon code with $n$ message characters and codeword length $n+2k$ has minimum distance exactly $2k+1$.

The proof has two parts:

Minimum distance $\leq 2k+1$ (existence of close codewords)

Take two messages that differ only in the last position: $m_1,\ldots,m_{n-1},m_n$ and $m_1,\ldots,m_{n-1},m_n'$ with $m_n \neq m_n'$. Their polynomials $P_a$ and $P_b$ agree on $x=1,\ldots,n-1$ (that's $n-1$ positions). So they can differ in at most $(n+2k)-(n-1) = 2k+1$ positions. Hence $d_{\min} \leq 2k+1$.

Minimum distance $\geq 2k+1$ (no two codewords are too close)

Suppose two distinct codewords $\mathbf{c}_a$ and $\mathbf{c}_b$ had $d(\mathbf{c}_a,\mathbf{c}_b) \leq 2k$. Then they agree on at least $n+2k-2k = n$ positions. Two polynomials $P_a, P_b$ of degree $\leq n-1$ agree on $n$ values $\Rightarrow$ they are identical (by Property 2). But then $\mathbf{c}_a = \mathbf{c}_b$, contradicting our assumption. Contradiction.

Summary Table

Error Type	Extra Packets Needed	Why?
$k$ erasures	$k$ (codeword length $n+k$)	Know which positions are missing
$k$ general errors	$2k$ (codeword length $n+2k$)	Must locate AND correct errors
Both $e$ erasures and $k$ errors	$e + 2k$ (if $e + 2k \leq d_{\min}-1$)	Combined budget

Section 6

Hard Practice Problems

🔥 Hard Problem 1

Problem: Alice sends a message of $n=3$ packets over GF(11), protecting against $k=2$ general errors. The codeword has length $n+2k=7$. Bob receives $r_1=1, r_2=0, r_3=3, r_4=5, r_5=4, r_6=9, r_7=2$. Set up (but do not fully solve) the Berlekamp-Welch linear system. How many unknowns and equations are there? Identify what $Q(x)$ and $E(x)$ look like structurally.

📋 Solution to Hard Problem 1

Solution:

$n=3, k=2$. $Q(x)$ has degree $n+k-1 = 4$, so $Q(x)=a_4x^4+a_3x^3+a_2x^2+a_1x+a_0$ with 5 unknown coefficients.
$E(x)$ has degree $k=2$ with leading coefficient 1: $E(x)=x^2+b_1x+b_0$ with 2 unknown coefficients.
Total unknowns: $5 + 2 = 7 = n+2k$. ✓
The equations $Q(i)=r_iE(i)$ for $i=1,\ldots,7$ give 7 linear equations in 7 unknowns, e.g., for $i=1$: $a_4+a_3+a_2+a_1+a_0 = 1\cdot(1+b_1+b_0)$, i.e., $a_4+a_3+a_2+a_1+a_0 - b_1 - b_0 = 1$.
This $7\times7$ linear system over GF(11) can be solved by Gaussian elimination to find $Q$ and $E$, then $P=Q/E$.

7 unknowns, 7 equations. System is solvable and yields $P(x)$ uniquely.

🔥 Hard Problem 2

Problem: Prove that a Reed-Solomon code with minimum distance $d_{\min} = 2k+1$ can simultaneously correct $k_e$ erasures and $k_g$ general errors, as long as $k_e + 2k_g \leq 2k$. (Hint: Think of erasures as errors whose positions you know.)

📋 Solution to Hard Problem 2

Solution: Think of erasures as errors that happen at known locations.

After the $k_e$ erasures are removed from the codeword, Bob has $n+2k-k_e$ received values, of which $k_g$ are corrupted.
Bob already knows the $k_e$ erasure positions. He can handle them with "forced" values in the Berlekamp-Welch setup: define $E(x) = E_{\text{erase}}(x) \cdot E_{\text{error}}(x)$ where $E_{\text{erase}}(x) = \prod_{j=1}^{k_e}(x - \text{pos}_j)$ is fully known, and $E_{\text{error}}(x)$ has degree $k_g$ with unknown coefficients.
The unknowns now are: $n+k_g$ coefficients for $Q'(x) = P(x)E_{\text{error}}(x)$ (degree $n+k_g-1$), plus $k_g$ coefficients for $E_{\text{error}}(x)$. Total: $n+2k_g$ unknowns.
Equations from uncorrupted received positions: at least $n+2k - k_e - k_g \geq n+2k_g$ (since $k_e+2k_g \leq 2k$ implies $k_e + k_g \leq 2k - k_g$). So we have enough equations.
Solvability follows by the same argument as the pure error case.

The key inequality is $k_e + 2k_g \leq 2k = d_{\min} - 1$, which quantifies the "error budget" of the code. Each erasure costs 1 unit of budget; each unknown error costs 2 units.

The code can handle $k_e + 2k_g \leq 2k$ combined errors/erasures. ∎