### 13.7 — Channel Capacity
Phase: Information Theory Prerequisites: 13-03-mutual-information, 13-01-entropy
Learning Objectives
By the end of this subject, you will be able to:
- Define a communication channel model and channel capacity
- Compute capacity for simple channels (BSC, BEC)
- State Shannon's noisy-channel coding theorem and its implications
- Explain the relationship between capacity, bandwidth, and SNR
- Interpret capacity as the maximum reliable communication rate
Core Content
⚠️ CRITICAL: The Communication Channel
A channel is defined by its conditional distribution $P(Y \mid X)$: - $X$: channel input (what we transmit) - $Y$: channel output (what is received)
Channel capacity is the maximum rate at which information can be transmitted reliably:
$$C = \max_{P(X)} I(X; Y)$$
The maximisation is over all possible input distributions $P(X)$.
Binary Symmetric Channel (BSC)
The simplest noisy channel: bits are flipped with probability $p$.
$$P(Y=0 \mid X=0) = 1-p, \quad P(Y=1 \mid X=0) = p$$ $$P(Y=0 \mid X=1) = p, \quad P(Y=1 \mid X=1) = 1-p$$
Capacity: $C_{\text{BSC}} = 1 - H_b(p)$ bits per channel use
- $p = 0$: $C = 1$ (perfect channel — no noise)
- $p = 0.5$: $C = 0$ (completely random — output independent of input)
- $p = 0.1$: $C = 1 - 0.469 = 0.531$ bits per use
The capacity is achieved by a uniform input distribution: $P(X=0) = P(X=1) = 0.5$.
Binary Erasure Channel (BEC)
Bits are either received correctly or "erased" (lost, denoted ?) with probability $\epsilon$.
$$P(Y=0 \mid X=0) = 1-\epsilon, \quad P(Y=\text{?} \mid X=0) = \epsilon, \quad P(Y=1 \mid X=0) = 0$$ $$P(Y=1 \mid X=1) = 1-\epsilon, \quad P(Y=\text{?} \mid X=1) = \epsilon, \quad P(Y=0 \mid X=1) = 0$$
Capacity: $C_{\text{BEC}} = 1 - \epsilon$ bits per channel use
Erasures are "easier" than errors — you at least KNOW when you're uncertain! The BEC at $\epsilon = 0.5$ still has capacity 0.5, while BSC at $p = 0.5$ has capacity 0.
⚠️ CRITICAL: Shannon's Noisy-Channel Coding Theorem
Theorem: For any discrete memoryless channel with capacity $C$:
- Achievability: For any rate $R < C$, there exists a code that achieves arbitrarily low error probability.
- Converse: For any rate $R > C$, reliable communication is impossible — the error probability is bounded away from zero.
Key insight: Reliable communication IS possible over noisy channels, up to a fundamental limit $C$. Error-correcting codes exist that can achieve rates arbitrarily close to $C$ (though the codes may be extremely long and complex).
This theorem launched the field of coding theory. Modern codes (Turbo codes, LDPC, polar codes) approach the Shannon limit in practice.
Continuous-Time Channel: AWGN
For the Additive White Gaussian Noise channel (continuous-time):
$$C = B \log_2\left(1 + \frac{S}{N}\right) \quad \text{bits per second}$$
Where: - $B$ = bandwidth (Hz) - $S/N$ = signal-to-noise ratio (linear, not dB)
This is the Shannon-Hartley theorem — the fundamental limit of any communication system.
Example: Telephone line: $B = 3400$ Hz, SNR = 30 dB = 1000.
$C = 3400 \cdot \log_2(1 + 1000) \approx 3400 \cdot 9.97 \approx 33,900$ bps.
This is the theoretical maximum for a dial-up modem — real modems hit ~56 kbps by exploiting the digital backbone, which circumvents the analog local-loop limit.
Key Terms
- Channel capacity
- Shannon-Hartley theorem
Worked Examples
Example 1: BSC capacity calculation
BSC with flip probability $p = 0.2$.
$H_b(0.2) = -0.2\log_2 0.2 - 0.8\log_2 0.8 = 0.464 + 0.258 = 0.722$
$C = 1 - 0.722 = 0.278$ bits per channel use.
Per 1000 channel uses, you can reliably transmit at most ~278 bits of information.
Example 2: Comparing BSC and BEC
Both have their "damage" parameter at 0.3:
BSC ($p=0.3$): $C = 1 - H_b(0.3) = 1 - 0.881 = 0.119$ bits
BEC ($\epsilon=0.3$): $C = 1 - 0.3 = 0.700$ bits
BEC capacity is MUCH higher because erasures tell you where the uncertainty is — you can use erasure codes efficiently. Errors are worse because you don't know which bits are wrong.
Example 3: AWGN capacity
WiFi channel: $B = 20$ MHz, SNR = 15 dB = 31.62.
$C = 20 \times 10^6 \cdot \log_2(1 + 31.62) = 20 \times 10^6 \cdot \log_2(32.62) \approx 20 \times 10^6 \cdot 5.028 \approx 100.6$ Mbps.
Real WiFi achieves much less due to practical coding gaps, overhead, and interference — but the theoretical limit is instructive.
Quiz
Q1: What does the concept of Channel capacity primarily refer to in this subject?
A) A visual representation of Channel capacity B) A historical anecdote about Channel capacity C) A computational error related to Channel capacity D) The definition and application of Channel capacity
Correct: D)
- If you chose A: This is incorrect. Channel capacity is defined as: the definition and application of channel capacity. The other options describe different aspects that are not the primary focus.
- If you chose B: This is incorrect. Channel capacity is defined as: the definition and application of channel capacity. The other options describe different aspects that are not the primary focus.
- If you chose C: This is incorrect. Channel capacity is defined as: the definition and application of channel capacity. The other options describe different aspects that are not the primary focus.
- If you chose D: Channel capacity is defined as: the definition and application of channel capacity. The other options describe different aspects that are not the primary focus. Correct!
Q2: Which of the following is the key formula discussed in this subject?
A) An unrelated formula from a different topic B) P(Y \mid X) C) The inverse operation of the formula in question D) A simplified version of P(Y \mid X)...
Correct: B)
- If you chose A: This is incorrect. The formula P(Y \mid X) is central to this subject. The other options are either simplified versions or unrelated.
- If you chose B: The formula P(Y \mid X) is central to this subject. The other options are either simplified versions or unrelated. Correct!
- If you chose C: This is incorrect. The formula P(Y \mid X) is central to this subject. The other options are either simplified versions or unrelated.
- If you chose D: This is incorrect. The formula P(Y \mid X) is central to this subject. The other options are either simplified versions or unrelated.
Q3: What is the primary purpose of Shannon-Hartley theorem?
A) It is used only in advanced research contexts B) It replaces all other methods in this domain C) It is used to shannon-hartley theorem in mathematical analysis D) It is primarily a historical notation system
Correct: C)
- If you chose A: This is incorrect. Shannon-Hartley theorem serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose B: This is incorrect. Shannon-Hartley theorem serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: Shannon-Hartley theorem serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose D: This is incorrect. Shannon-Hartley theorem serves the purpose described in the correct answer. The other options misrepresent its role.
Q4: Which statement about ⚠️ Critical: The Communication Channel is TRUE?
A) ⚠️ Critical: The Communication Channel is mentioned only as a historical footnote B) ⚠️ Critical: The Communication Channel is an advanced topic beyond this subject's scope C) ⚠️ Critical: The Communication Channel is not related to this subject D) ⚠️ Critical: The Communication Channel is a fundamental concept covered in this subject
Correct: D)
- If you chose A: This is incorrect. ⚠️ Critical: The Communication Channel is a fundamental concept covered in this subject. This subject covers ⚠️ Critical: The Communication Channel as part of its core content.
- If you chose B: This is incorrect. ⚠️ Critical: The Communication Channel is a fundamental concept covered in this subject. This subject covers ⚠️ Critical: The Communication Channel as part of its core content.
- If you chose C: This is incorrect. ⚠️ Critical: The Communication Channel is a fundamental concept covered in this subject. This subject covers ⚠️ Critical: The Communication Channel as part of its core content.
- If you chose D: ⚠️ Critical: The Communication Channel is a fundamental concept covered in this subject. This subject covers ⚠️ Critical: The Communication Channel as part of its core content. Correct!
Q5: Based on the worked examples in this subject, what is the correct result?
A) \max_{P(X)} I(X; Y)$ — maximum reliable communicatio B) A different result from a common mistake C) An unrelated numerical value D) The inverse of the correct answer
Correct: A)
- If you chose A: The worked examples show that the result is \max_{P(X)} I(X; Y)$ — maximum reliable communicatio. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is \max_{P(X)} I(X; Y)$ — maximum reliable communicatio. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is \max_{P(X)} I(X; Y)$ — maximum reliable communicatio. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is \max_{P(X)} I(X; Y)$ — maximum reliable communicatio. The other options represent common errors.
Q6: How are ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) related?
A) ⚠️ Critical: The Communication Channel is the inverse of Binary Symmetric Channel (Bsc) B) ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are completely unrelated topics C) ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are closely related concepts D) ⚠️ Critical: The Communication Channel is a special case of Binary Symmetric Channel (Bsc)
Correct: C)
- If you chose A: This is incorrect. Both ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are covered in this subject as interconnected topics.
- If you chose B: This is incorrect. Both ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are covered in this subject as interconnected topics.
- If you chose C: Both ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are covered in this subject as interconnected topics. Correct!
- If you chose D: This is incorrect. Both ⚠️ Critical: The Communication Channel and Binary Symmetric Channel (Bsc) are covered in this subject as interconnected topics.
Q7: What is a common pitfall when working with Binary Erasure Channel (Bec)?
A) A common mistake is confusing Binary Erasure Channel (Bec) with a similar concept B) The main error with Binary Erasure Channel (Bec) is using it when it is not needed C) Binary Erasure Channel (Bec) has no common misconceptions D) Binary Erasure Channel (Bec) is always computed the same way in all contexts
Correct: A)
- If you chose A: Students often confuse Binary Erasure Channel (Bec) with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
- If you chose B: This is incorrect. Students often confuse Binary Erasure Channel (Bec) with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose C: This is incorrect. Students often confuse Binary Erasure Channel (Bec) with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose D: This is incorrect. Students often confuse Binary Erasure Channel (Bec) with similar-sounding or related concepts. Pay attention to the precise definitions.
Q8: When should you apply ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem?
A) Avoid ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem unless explicitly instructed B) Apply ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem to solve problems in this subject's domain C) Use ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem only in pure mathematics contexts D) ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem is not practically useful
Correct: B)
- If you chose A: This is incorrect. ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem is a practical tool used throughout this subject to solve relevant problems.
- If you chose B: ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem is a practical tool used throughout this subject to solve relevant problems. Correct!
- If you chose C: This is incorrect. ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem is a practical tool used throughout this subject to solve relevant problems.
- If you chose D: This is incorrect. ⚠️ Critical: Shannon'S Noisy-Channel Coding Theorem is a practical tool used throughout this subject to solve relevant problems.
Practice Problems
-
What is the capacity of a BSC with $p = 0.01$?
Click for answer
$H_b(0.01) \approx 0.081$ bits $C = 1 - 0.081 = 0.919$ bits per channel use Even 1% error rate reduces capacity by ~8%. -
Why is $C = 0$ for a BSC with $p = 0.5$?
Click for answer
$H_b(0.5) = 1$, so $C = 0$. With $p=0.5$, the output is independent of the input — $I(X; Y) = 0$ for any input distribution. No matter what you transmit, the receiver sees pure noise. No reliable communication is possible. -
Channel capacity is defined as $C = \max_{P(X)} I(X; Y)$. Why maximise over $P(X)$?
Click for answer
The channel is defined by $P(Y \mid X)$, which is fixed (it's the physical medium). But we CAN choose how we use the channel — the input distribution $P(X)$. For example, a BSC achieves maximum $I(X; Y)$ when inputs are equally likely. If we always sent 0, $I(X; Y) = 0$. Capacity is the best we can do by designing the input optimally. -
The AWGN capacity formula $C = B \log_2(1 + S/N)$ suggests increasing bandwidth helps. Is there a limit?
Click for answer
As $B \to \infty$, $C \to \frac{S}{N_0} \log_2 e \approx 1.44 \cdot \frac{S}{N_0}$ (where $N_0$ is noise power spectral density). Infinite bandwidth does NOT give infinite capacity — it approaches a finite limit determined by signal power and noise density. -
If a channel has capacity $C = 0.5$ bits per use, can you transmit 100 bits reliably in 150 channel uses?
Click for answer
Rate $R = 100/150 = 0.667$ bits per use. Since $R > C = 0.5$, reliable transmission is IMPOSSIBLE according to the noisy-channel coding theorem (converse). You need at least $100/0.5 = 200$ channel uses. The error probability will be bounded away from zero.
Summary
Key takeaways:
- Channel capacity $C = \max_{P(X)} I(X; Y)$ — maximum reliable communication rate
- BSC capacity: $C = 1 - H_b(p)$; noise reduces capacity
- BEC capacity: $C = 1 - \epsilon$; erasures are "better" than errors
- Noisy-channel coding theorem: Reliable communication possible at rates $R < C$, impossible at $R > C$
- AWGN capacity: $C = B \log_2(1 + S/N)$ — the Shannon-Hartley limit
- Capacity is achieved by optimal input distribution (uniform for symmetric channels)
Pitfalls
-
Confusing BSC capacity with BEC capacity formulas: BSC capacity is $C = 1 - H_b(p)$, while BEC capacity is $C = 1 - \epsilon$. At the same "damage" level (e.g., $p = \epsilon = 0.3$), the BEC has much higher capacity ($0.700$ vs $0.119$) because erasures tell you where the uncertainty is. Applying the BSC formula to an erasure channel (or vice versa) gives wildly wrong results.
-
Using dB instead of linear SNR in the Shannon-Hartley formula: $C = B \log_2(1 + S/N)$ requires $S/N$ in linear units, not decibels. SNR of 30 dB is a ratio of 1000, not 30. Plugging in 30 instead of 1000 underestimates capacity by a factor of ~5 — a catastrophic error in link budgeting.
-
Thinking infinite bandwidth gives infinite capacity: As $B \to \infty$, $C \to \frac{S}{N_0} \log_2 e \approx 1.44 \cdot S/N_0$, where $N_0$ is the noise power spectral density. The capacity approaches a finite limit determined by signal power — you cannot overcome noise with bandwidth alone.
-
Assuming capacity is a fixed property of the channel: Capacity is defined as $C = \max_{P(X)} I(X; Y)$. The channel provides $P(Y \mid X)$ (the physical medium), but the achievable rate depends on how we choose $P(X)$. A BSC at $p=0.1$ has capacity 0.531 bits, but if you always transmit 0 you achieve 0 bits. Capacity is the best you can do by optimising the input.
-
Expecting practical codes to achieve capacity: Shannon's theorem proves existence of capacity-achieving codes (as block length $\to \infty$), but practical codes have a gap to capacity. Modern codes (LDPC, turbo, polar) get within 0.1 dB of the Shannon limit, but this requires long block lengths and sophisticated decoding. In real systems, there is always a trade-off between performance and complexity.
Next Steps
Next up: 13-08-differential-entropy.md