Note 14: Conditional Probability | EECS 70

EECS 70 — Fall 2021

Note 14: Conditional Probability, Independence, and Combinations

Conditional probability · Bayes' Rule · Total Probability Rule · Independence · Product Rule · Union Bound

Coin flips are independent — ten previous tails do not affect the next flip. But if you have dealt ten red cards from a deck, the probability of the next card being red drops noticeably. This difference is the essence of conditional probability: learning that an event occurred changes our probability estimates for other events. This note formalizes that idea and derives the most important computational tools in probability theory.

Section 1

Conditional Probability

Suppose event $B$ is known to have occurred. This shrinks our sample space from $\Omega$ to the subset $B$. Every sample point $\omega \in B$ must have its probability rescaled so that the new probabilities sum to 1. The scaling factor is $1/\mathbb{P}[B]$.

Definition — Conditional Probability

For events $A, B \subseteq \Omega$ with $\mathbb{P}[B] > 0$, the conditional probability of $A$ given $B$ is: $$\mathbb{P}[A \mid B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]}$$ Intuition: restrict the sample space to $B$, then ask what fraction of $B$ falls in $A$.

Conditional probability visualized. The full oval is $\Omega$. Given $B$ has occurred, the new sample space is the $B$ oval. $\mathbb{P}[A|B]$ is the fraction of $B$ that overlaps with $A$. Click to explore different configurations.

📘 Example — Balls and Bins: conditioning on bin 2 being empty

Throw $m=4$ balls into $n=3$ bins. Total outcomes: $3^4 = 81$, uniform space.

Let $A$ = event bin 1 is empty, $B$ = event bin 2 is empty.

$\mathbb{P}[B] = (2/3)^4 = 16/81$ (4 balls each avoid bin 2 with prob 2/3).

$A \cap B$ = both bins 1 and 2 empty, so all 4 balls fall in bin 3. $\mathbb{P}[A \cap B] = (1/3)^4 = 1/81$.

$$\mathbb{P}[A \mid B] = \frac{1/81}{16/81} = \frac{1}{16}$$

Compare: $\mathbb{P}[A] = 16/81 \approx 0.198$, but $\mathbb{P}[A \mid B] = 1/16 = 0.0625$. Knowing bin 2 is empty drastically reduces the probability that bin 1 is also empty.

$\mathbb{P}[A|B] = 1/16$

📘 Example — Card Dealing: second ace given first is an ace

Deal 2 cards in order. $|\Omega| = 52 \times 51$. Let $B$ = first card is an ace, $A$ = second card is an ace.

$\mathbb{P}[B] = 4/52 = 1/13$. $\mathbb{P}[A \cap B] = (4 \times 3)/(52 \times 51) = 12/2652$.

$$\mathbb{P}[A \mid B] = \frac{12/2652}{4/52} = \frac{12}{2652} \times \frac{52}{4} = \frac{3}{51} = \frac{1}{17}$$

Without conditioning, $\mathbb{P}[A] = 4/52 = 1/13 \approx 0.077$. With conditioning, $\mathbb{P}[A|B] = 1/17 \approx 0.059$. If we know the first card is an ace, there are now only 3 aces left out of 51 remaining cards.

$\mathbb{P}[A|B] = 3/51 = 1/17$

Section 2

Bayesian Inference

Conditional probability is at the heart of Bayesian inference — the process of updating beliefs after observations. The key idea: you start with a prior probability $\mathbb{P}[A]$ (your belief before any observations), make an observation $B$, and compute the posterior probability $\mathbb{P}[A|B]$ (your updated belief).

Medical Test — The Classic Example

A test for a medical disorder has these properties:

$\mathbb{P}[A] = 0.05$ — 5% of the population is affected
$\mathbb{P}[B \mid A] = 0.9$ — the test is positive for 90% of affected people (10% false negatives)
$\mathbb{P}[B \mid \bar{A}] = 0.2$ — the test is positive for 20% of healthy people (20% false positives)

A random person tests positive. What is the probability they are actually affected?

Using Bayes' Rule: $\mathbb{P}[A \mid B] = \dfrac{\mathbb{P}[B \mid A]\mathbb{P}[A]}{\mathbb{P}[B]} = \dfrac{0.9 \times 0.05}{\mathbb{P}[B]} = \dfrac{0.045}{\mathbb{P}[B]}$

$\mathbb{P}[B] = 0.9 \times 0.05 + 0.2 \times 0.95 = 0.045 + 0.19 = 0.235$

$$\mathbb{P}[A \mid B] = \frac{0.045}{0.235} = \frac{9}{47} \approx 0.191$$ Despite testing positive, there is only about a 19% chance of actually being affected!

This result is counterintuitive but crucial. The low prior probability ($5\%$) combined with the high false positive rate ($20\%$) means most positive tests are false. This is why medical screening programs are more effective for high-risk populations.

Visual breakdown of the medical test. The large rectangle represents the US population. Subregions show affected/healthy and positive/negative test results. Drag the sliders to change prior and test quality.

Section 3

Bayes' Rule and Total Probability Rule

Total Probability Rule (Two Events)

For any event $B$ and its complement $\bar{A}$, where $A$ and $\bar{A}$ partition $\Omega$: $$\mathbb{P}[B] = \mathbb{P}[B \mid A]\mathbb{P}[A] + \mathbb{P}[B \mid \bar{A}]\mathbb{P}[\bar{A}]$$ Think of it as: "divide $\Omega$ into cases (whether $A$ happens or not), compute $\mathbb{P}[B]$ in each case, then average."

Bayes' Rule (Two Events)

$$\mathbb{P}[A \mid B] = \frac{\mathbb{P}[B \mid A]\mathbb{P}[A]}{\mathbb{P}[B \mid A]\mathbb{P}[A] + \mathbb{P}[B \mid \bar{A}]\mathbb{P}[\bar{A}]}$$ This "flips" conditional probability: converts $\mathbb{P}[B|A]$ (what we know) into $\mathbb{P}[A|B]$ (what we want).

Total Probability Rule: why it works

$A \cap B$ and $\bar{A} \cap B$ are disjoint and their union is $B$. So $\mathbb{P}[B] = \mathbb{P}[A \cap B] + \mathbb{P}[\bar{A} \cap B]$. Apply definition of conditional probability to each term.

Bayes' Rule: why it works

$\mathbb{P}[A|B] = \mathbb{P}[A \cap B]/\mathbb{P}[B]$. Rewrite the numerator using $\mathbb{P}[B|A]\mathbb{P}[A] = \mathbb{P}[A \cap B]$. Then apply Total Probability Rule to the denominator.

📘 Example — Tennis Match

You will play either opponent X (probability 0.6) or Y (probability 0.4). You win against X with probability 0.7 and against Y with probability 0.3.

By the Total Probability Rule:

$$\mathbb{P}[\text{win}] = \mathbb{P}[\text{win}|X]\mathbb{P}[X] + \mathbb{P}[\text{win}|Y]\mathbb{P}[Y] = 0.7 \times 0.6 + 0.3 \times 0.4 = 0.42 + 0.12 = 0.54$$ You win with probability 0.54.

General Form: Partitions

Partition of $\Omega$

Events $A_1, \ldots, A_n$ form a partition of $\Omega$ if they are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$) and their union is $\Omega$ ($\bigcup_i A_i = \Omega$).

General Total Probability Rule and Bayes' Rule

Let $A_1, \ldots, A_n$ partition $\Omega$. For any event $B$: $$\mathbb{P}[B] = \sum_{i=1}^{n} \mathbb{P}[B \mid A_i]\mathbb{P}[A_i]$$ $$\mathbb{P}[A_i \mid B] = \frac{\mathbb{P}[B \mid A_i]\mathbb{P}[A_i]}{\sum_{j=1}^{n} \mathbb{P}[B \mid A_j]\mathbb{P}[A_j]}$$

Section 4

Independence

Definition — Independence

Two events $A$ and $B$ are independent if: $$\mathbb{P}[A \cap B] = \mathbb{P}[A] \cdot \mathbb{P}[B]$$ Equivalently (when $\mathbb{P}[B] > 0$): $\mathbb{P}[A \mid B] = \mathbb{P}[A]$ — knowing $B$ occurred does not change the probability of $A$.

Intuition: the two events have nothing to do with each other. Coin flips are independent: the outcome of the first flip carries no information about the second. Drawing cards from a deck without replacement is NOT independent: each card drawn changes the composition of the remaining deck.

Mutual Independence vs. Pairwise Independence

Mutual Independence

Events $A_1, \ldots, A_n$ are mutually independent if for every subset $I \subseteq \{1,\ldots,n\}$ with $|I| \geq 2$: $$\mathbb{P}\left[\bigcap_{i \in I} A_i\right] = \prod_{i \in I} \mathbb{P}[A_i]$$ This must hold for ALL subsets, not just pairs.

⚠ Pairwise Independence Does NOT Imply Mutual Independence

Counterexample: Flip a fair coin twice. Let:

$A$ = first flip is H (prob 1/2)
$B$ = second flip is H (prob 1/2)
$C$ = both flips are the same (prob 1/2)

Any two of $\{A, B, C\}$ are independent (e.g., $\mathbb{P}[A \cap C] = 1/4 = \mathbb{P}[A]\mathbb{P}[C]$ since $A \cap C = \{HH\}$). But $\mathbb{P}[A \cap B \cap C] = \mathbb{P}[\{HH\}] = 1/4 \neq 1/8 = \mathbb{P}[A]\mathbb{P}[B]\mathbb{P}[C]$. So $A$, $B$, $C$ are pairwise independent but NOT mutually independent.

Section 5

Intersections: The Product Rule

Theorem — Product Rule (Chain Rule)

For any events $A_1, \ldots, A_n$: $$\mathbb{P}\left[\bigcap_{i=1}^{n} A_i\right] = \mathbb{P}[A_1] \cdot \mathbb{P}[A_2 \mid A_1] \cdot \mathbb{P}[A_3 \mid A_1 \cap A_2] \cdots \mathbb{P}\left[A_n \mid \bigcap_{i=1}^{n-1} A_i\right]$$ For mutually independent events, each conditional probability equals the unconditional probability, so: $$\mathbb{P}\left[\bigcap_{i=1}^{n} A_i\right] = \prod_{i=1}^{n} \mathbb{P}[A_i]$$

📘 Example — Poker Flush via Product Rule

Let $A_i$ = event that the $i$-th card drawn is a Heart. We want $\mathbb{P}[A_1 \cap A_2 \cap A_3 \cap A_4 \cap A_5]$.

$$\mathbb{P}[A_1] = \frac{13}{52} = \frac{1}{4}$$ $$\mathbb{P}[A_2 \mid A_1] = \frac{12}{51}, \quad \mathbb{P}[A_3 \mid A_1 \cap A_2] = \frac{11}{50}, \quad \mathbb{P}[A_4 \mid A_1\cap A_2 \cap A_3] = \frac{10}{49}, \quad \mathbb{P}[A_5 \mid \cdots] = \frac{9}{48}$$ $$\mathbb{P}[\text{Hearts flush}] = \frac{13}{52} \times \frac{12}{51} \times \frac{11}{50} \times \frac{10}{49} \times \frac{9}{48}$$ $$\mathbb{P}[\text{any flush}] = 4 \times \frac{13 \cdot 12 \cdot 11 \cdot 10 \cdot 9}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48} \approx 0.00198$$ Same answer as Note 13, via a different method. Two approaches confirm each other.

📘 Example — Monty Hall Revisited via Product Rule

Let $C_1$ = contestant picks door 1, $P_2$ = prize behind door 2, $H_3$ = host opens door 3.

Note: $C_1$ and $P_2$ are independent (contestant's choice and prize placement are unrelated).

$$\mathbb{P}[C_1 \cap P_2 \cap H_3] = \mathbb{P}[C_1] \times \mathbb{P}[P_2 \mid C_1] \times \mathbb{P}[H_3 \mid C_1 \cap P_2]$$ $$= \frac{1}{3} \times \frac{1}{3} \times 1 = \frac{1}{9}$$

The last factor is 1 because given the contestant chose door 1 and prize is behind door 2, the host MUST open door 3 (the only remaining goat door).

Then: $\mathbb{P}[P_2 \mid C_1 \cap H_3] = \dfrac{\mathbb{P}[C_1 \cap P_2 \cap H_3]}{\mathbb{P}[C_1 \cap H_3]} = \dfrac{1/9}{1/6} = \dfrac{2}{3}$

Switching wins with probability 2/3. Formally confirmed via the Product Rule.

Section 6

Unions: Inclusion-Exclusion and Union Bound

Theorem — Inclusion-Exclusion for Probabilities

For events $A_1, \ldots, A_n$: $$\mathbb{P}\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i} \mathbb{P}[A_i] - \sum_{i < j} \mathbb{P}[A_i \cap A_j] + \sum_{i < j < k} \mathbb{P}[A_i \cap A_j \cap A_k] - \cdots$$

📘 Example — Las Vegas dice game (the casino's wrong claim)

You pick a number from 1 to 6. Three dice are rolled. You win if your number appears on at least one die.

Let $A_i$ = your number appears on die $i$. $\mathbb{P}[A_i] = 1/6$. The dice are independent, so:

$$\mathbb{P}[A_i \cap A_j] = \mathbb{P}[A_i]\mathbb{P}[A_j] = \frac{1}{36}, \quad \mathbb{P}[A_1 \cap A_2 \cap A_3] = \frac{1}{216}$$

The casino's (wrong) claim: $\mathbb{P}[A_1 \cup A_2 \cup A_3] = 3 \times \frac{1}{6} = \frac{1}{2}$. Wrong because events overlap!

Correct computation via inclusion-exclusion:

$$\mathbb{P}[A_1 \cup A_2 \cup A_3] = 3 \cdot \frac{1}{6} - 3 \cdot \frac{1}{36} + \frac{1}{216} = \frac{108 - 18 + 1}{216} = \frac{91}{216} \approx 0.421$$ Your actual odds are about 42.1%, not 50%. The casino's argument overcounts outcomes where your number appears on multiple dice.

Union Bound (Boole's Inequality)

For any events $A_1, \ldots, A_n$ (no independence required): $$\mathbb{P}\left[\bigcup_{i=1}^{n} A_i\right] \leq \sum_{i=1}^{n} \mathbb{P}[A_i]$$ This is just the first term of inclusion-exclusion. It overestimates the probability of any union. Despite its crudeness, the union bound is extremely useful in algorithm analysis when events are rare.

Union bound visualized for 3 events. The circles may overlap. Adding up $\mathbb{P}[A_i]$ counts overlaps multiple times, giving an overestimate. Inclusion-exclusion corrects for this. Click to animate.

Section 7

Hard Practice Problems

🔥 Hard Problem 1

Problem: A biased coin has $\mathbb{P}[H] = 1/3$. You flip it $n$ times. Let $A$ be the event that you get at least one head, and let $B$ be the event that you get at least one tail. Compute $\mathbb{P}[A \mid B]$ in terms of $n$. What does it approach as $n \to \infty$?

📋 Solution to Hard Problem 1

Solution:

Compute $\mathbb{P}[A \cap B]$: this is the probability of having at least one head AND at least one tail — equivalently, the sequence is not all-heads and not all-tails. $$\mathbb{P}[A \cap B] = 1 - \mathbb{P}[\bar{A}] - \mathbb{P}[\bar{B}] = 1 - \left(\frac{2}{3}\right)^n - \left(\frac{1}{3}\right)^n$$ ($\bar{A}$ = all tails, prob $(2/3)^n$; $\bar{B}$ = all heads, prob $(1/3)^n$. These are disjoint.)
Compute $\mathbb{P}[B]$: at least one tail means not all heads. $$\mathbb{P}[B] = 1 - \mathbb{P}[\bar{B}] = 1 - \left(\frac{1}{3}\right)^n$$
Apply the definition: $$\mathbb{P}[A \mid B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]} = \frac{1 - (2/3)^n - (1/3)^n}{1 - (1/3)^n}$$
As $n \to \infty$: $(2/3)^n \to 0$ and $(1/3)^n \to 0$, so $\mathbb{P}[A|B] \to 1/1 = 1$. Intuitively: if you flip many coins and we know there's at least one tail, it becomes near-certain there's also at least one head.

$\mathbb{P}[A|B] = \dfrac{1-(2/3)^n-(1/3)^n}{1-(1/3)^n} \to 1$ as $n\to\infty$.

🔥 Hard Problem 2

Problem: An instructor gives a multiple-choice test with 4 options per question. A student either knows the answer (prob $p$) or guesses uniformly at random. Given that a student answered a question correctly, what is the probability they actually knew the answer (vs. guessed correctly)? Compute this for $p = 0.6$ and explain the result intuitively.

📋 Solution to Hard Problem 2

This is a classic application of Bayes' Rule.

Define events: $K$ = student knows the answer (prob $p$), $C$ = student answers correctly.
Given values: $\mathbb{P}[K] = p$, $\mathbb{P}[\bar{K}] = 1-p$, $\mathbb{P}[C \mid K] = 1$ (knows $\Rightarrow$ correct), $\mathbb{P}[C \mid \bar{K}] = 1/4$ (random guess, 4 options).
Total Probability Rule: $\mathbb{P}[C] = 1 \cdot p + \frac{1}{4}(1-p) = p + \frac{1-p}{4} = \frac{3p+1}{4}$.
Bayes' Rule: $\mathbb{P}[K \mid C] = \dfrac{\mathbb{P}[C|K]\mathbb{P}[K]}{\mathbb{P}[C]} = \dfrac{1 \cdot p}{(3p+1)/4} = \dfrac{4p}{3p+1}$.
For $p = 0.6$: $\mathbb{P}[K|C] = \dfrac{4 \times 0.6}{3 \times 0.6 + 1} = \dfrac{2.4}{2.8} = \dfrac{6}{7} \approx 0.857$.

Intuition: A correct answer most likely means the student knew it. Even though some guessers get lucky, knowing the answer is by far the most common reason for a correct response. The 85.7% figure quantifies this. If $p$ were very small (very weak class), the fraction would be closer to 0.

$\mathbb{P}[K|C] = 4p/(3p+1)$. For $p=0.6$, this is $6/7 \approx 85.7\%$.

Summary

Key Formulas Reference

Formula	When to Use
$\mathbb{P}[A\|B] = \mathbb{P}[A\cap B]/\mathbb{P}[B]$	Definition of conditional prob; computing $\mathbb{P}[A\|B]$ from intersection
$\mathbb{P}[B] = \mathbb{P}[B\|A]\mathbb{P}[A]+\mathbb{P}[B\|\bar A]\mathbb{P}[\bar A]$	Total Probability Rule; denominator for Bayes
$\mathbb{P}[A\|B] = \mathbb{P}[B\|A]\mathbb{P}[A]/\mathbb{P}[B]$	Bayes' Rule; "flipping" conditional probability
$\mathbb{P}[A\cap B]=\mathbb{P}[A]\mathbb{P}[B]$	Test for independence
$\mathbb{P}[A_1\cap\cdots\cap A_n]=\mathbb{P}[A_1]\mathbb{P}[A_2\|A_1]\cdots$	Product Rule (chain rule)
$\mathbb{P}[\bigcup A_i] = \sum \mathbb{P}[A_i]-\sum_{i	Inclusion-Exclusion for unions
$\mathbb{P}[\bigcup A_i] \leq \sum \mathbb{P}[A_i]$	Union Bound (simple overestimate)