# Introduction
Polar codes, introduced by Erdal Arıkan in 2009, provide the first deterministic construction of capacity-achieving codes for binary memoryless symmetric (BMS) channels.
Notice that physically there is only one channel used to transmit data; to facilitate understanding, we logically treat two uses of that channel as two separate channels.
In particular, consider a setup where $U_1, U_2$ are two equiprobable bits that are encoded into $X_1 = U_1 \oplus U_2$ and $X_2 = U_2$. Then, $X_1, X_2$ are mapped to $Y_1, Y_2$ by two independent BMS channels with transition probabilities $W(y_1 \mid x_1)$ and $W(y_2 \mid x_2)$. Since the mapping from $(U_1, U_2)$ to $(X_1, X_2)$ is invertible, one finds that

$$ I(U_1, U_2; Y_1, Y_2) = I(X_1, X_2; Y_1, Y_2) = I(X_1; Y_1) + I(X_2; Y_2) = 2\, I(W), $$

where $I(W)$ denotes the capacity of the BMS channel $W$: because each $X_i$ is equiprobable and the channel is symmetric, the uniform input distribution achieves capacity and $I(X_i; Y_i) = I(W)$.
Thus, the transformation preserves the sum capacity of the system.
Also, the chain rule decomposition

$$ I(U_1, U_2; Y_1, Y_2) = I(U_1; Y_1, Y_2) + I(U_2; Y_1, Y_2 \mid U_1) $$

implies that one can also achieve the rate $2\, I(W)$ in two steps. First, information is transmitted through the first virtual channel $W^{-}\colon U_1 \to (Y_1, Y_2)$. By coding over multiple channel uses, one can achieve any rate up to $I(W^{-}) = I(U_1; Y_1, Y_2)$. If all the $U_1$'s are known from the first stage, then one can decode the information transmitted through the second virtual channel $W^{+}\colon U_2 \to (Y_1, Y_2, U_1)$. Again, coding allows one to achieve any rate up to $I(W^{+}) = I(U_2; Y_1, Y_2 \mid U_1)$.
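As a concrete check, consider the binary erasure channel (BEC), chosen here purely as an illustrative special case not singled out in the text above. For a BEC with erasure probability $\varepsilon$, the two virtual channels are again erasure channels with $I(W^{-}) = (1-\varepsilon)^2$ and $I(W^{+}) = 1 - \varepsilon^2$, so their capacities sum to $2(1-\varepsilon) = 2\, I(W)$. The short sketch below (function name and values are illustrative) verifies this numerically.

```python
# Minimal check of capacity preservation for a BEC(eps), an illustrative special case:
# U1 is recovered from (Y1, Y2) only if neither output is erased, and
# U2 is recovered from (Y1, Y2, U1) unless both outputs are erased.
def bec_split(eps: float) -> tuple[float, float]:
    i_minus = (1 - eps) ** 2   # I(W-) = I(U1; Y1, Y2)
    i_plus = 1 - eps ** 2      # I(W+) = I(U2; Y1, Y2, U1)
    return i_minus, i_plus

for eps in (0.1, 0.3, 0.5):
    i_minus, i_plus = bec_split(eps)
    assert abs((i_minus + i_plus) - 2 * (1 - eps)) < 1e-12   # sum capacity preserved
    print(f"eps={eps}: I(W-)={i_minus:.3f}, I(W+)={i_plus:.3f}, sum={i_minus + i_plus:.3f}")
```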
If one chooses convolutional codes with sequential decoding, however, the expected decoding complexity per bit becomes unbounded for rates above the computational cutoff rate

$$ R_0(W) = 1 - \log_2\bigl(1 + Z(W)\bigr), $$

where

$$ Z(W) = \sum_{y} \sqrt{W(y \mid 0)\, W(y \mid 1)} $$

is the Bhattacharyya parameter of the channel $W$. Arıkan's key observation was that, while the sum capacity of the two virtual channels is preserved, the sum cutoff rate satisfies

$$ R_0(W^{-}) + R_0(W^{+}) \ge 2\, R_0(W), $$

with equality if and only if the channel is already extremal, i.e., $Z(W) \in \{0, 1\}$. Thus, repeated transformations of this type cause the induced virtual channels to polarize into extremal channels whose capacities approach 0 or 1. For these extremal channels, coding is trivial: one either sends an information bit or a dummy bit. From a theoretical point of view, polar codes are beautifully simple. Practically, they approach capacity rather slowly as their blocklength increases. Still, based on recent advances, they have been chosen as the short-block codes for the 5G standard.
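To make the polarization effect concrete, here is a minimal sketch (not part of the original text) that tracks the Bhattacharyya parameters of the virtual channels for the BEC used above, where the one-step recursion $Z(W^{-}) = 2Z - Z^2$, $Z(W^{+}) = Z^2$ holds exactly; the function name and thresholds are illustrative choices.

```python
def polarize_bec(z0: float, n_levels: int) -> list[float]:
    """Track the Bhattacharyya parameters of a BEC under repeated polarization.

    For a binary erasure channel, Z(W) equals the erasure probability and the
    one-step transform is exact:
        Z(W-) = 2*Z - Z**2   (the degraded "minus" virtual channel)
        Z(W+) = Z**2         (the upgraded "plus" virtual channel)
    After n_levels steps there are 2**n_levels virtual channels.
    """
    zs = [z0]
    for _ in range(n_levels):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # worse virtual channel
            nxt.append(z * z)          # better virtual channel
        zs = nxt
    return zs


if __name__ == "__main__":
    zs = polarize_bec(z0=0.5, n_levels=10)   # 1024 virtual channels
    good = sum(z < 1e-3 for z in zs)         # nearly perfect (Z close to 0)
    bad = sum(z > 1 - 1e-3 for z in zs)      # nearly useless (Z close to 1)
    print(f"{good} near-perfect and {bad} near-useless out of {len(zs)} channels")
```

Running this with $Z(W) = 0.5$ shows that a large fraction of the 1024 virtual channels end up with Bhattacharyya parameters close to 0 or 1 after ten levels, while the rest remain only partially polarized, reflecting both the polarization phenomenon and its relatively slow speed.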
# Information Theory Primer
# Discrete Random Variables
A discrete random variable $X$ is completely defined by the set of values it can take, $\mathcal{X}$, which we assume to be a finite set, and its probability distribution $\{p(x)\}_{x \in \mathcal{X}}$.
Definition. The value $p(x)$ is the probability that the random variable $X$ takes the value $x$. The probability distribution $p \colon \mathcal{X} \to [0, 1]$ must satisfy the normalization condition:

$$ \sum_{x \in \mathcal{X}} p(x) = 1. $$
We shall denote by $p(A)$ the probability of an event $A \subseteq \mathcal{X}$, so that

$$ p(A) = \sum_{x \in A} p(x). $$
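As a small illustration (the distribution below is made up for this example), a probability distribution can be stored as a map from values to probabilities, the normalization condition checked directly, and $p(A)$ obtained by summing over the outcomes in $A$:

```python
# A made-up pmf on the alphabet {a, b, c, d}, used only for illustration.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Normalization condition: the probabilities must sum to 1.
assert abs(sum(p.values()) - 1.0) < 1e-12

# Probability of the event A = {a, c}: p(A) = sum of p(x) over x in A.
A = {"a", "c"}
p_A = sum(p[x] for x in A)
print(p_A)  # 0.625
```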
# Self-information
Claude Shannon's definition of self-information was chosen to meet several axioms:
- An event with probability 100% is perfectly unsurprising and yields no information.
- The less probable an event is, the more surprising it is and the more information it yields.
- If two independent events are measured separately, the total amount of information is the sum of the self-informations of the individual events.
With the logarithm taken to base 2 in the following formula, the unit of information is the 'bit':

$$ I(x) = -\log_2 p(x). $$
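For instance, in a minimal sketch with illustrative values only: a fair coin flip carries 1 bit of self-information, an outcome of probability 1/8 carries 3 bits, and a certain event carries none, matching the axioms above.

```python
from math import log2

def self_information(p: float) -> float:
    """Self-information, in bits, of an outcome that occurs with probability p."""
    return -log2(p)

print(self_information(0.5))          # 1.0 bit: a fair coin flip
print(self_information(0.125))        # 3.0 bits: a rarer, more surprising outcome
print(self_information(1.0))          # 0.0 bits: a certain event is unsurprising
print(self_information(0.5 * 0.125))  # 4.0 bits: independent events add up
```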
Self-information is attached to individual values observed from an information source, so the same source is described by many different self-information values. To summarize a source with a single number, we introduce the concept of entropy.
# Entropy
Definition. The entropy (in bits per symbol) of a discrete random variable $X$ with probability distribution $p(x)$ is denoted $H(X)$ and given by

$$ H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x). $$
The unit of entropy is determined by the base of the logarithm with base-2 resulting in “bits” and the natural log (i.e., base-e) resulting in “nats”.
Interpretations of entropy:
- It is the mathematical expectation of the self-information of the messages output by the information source, i.e., the average self-information of the source.
- It describes the average uncertainty of the information source.
- It describes the average amount of information carried by each source symbol.
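As a minimal sketch of the entropy formula (the distributions are made-up examples): a uniform distribution on four values gives 2 bits, a biased one gives less, and a deterministic source gives zero.

```python
from math import log2

def entropy(p):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return -sum(q * log2(q) for q in p if q > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits: uniform, maximal uncertainty
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits: biased, less uncertainty
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits: deterministic source
```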
# Joint entropy
Definition. The joint entropy (in bits) of a pair of discrete random variables $(X, Y)$ is denoted $H(X, Y)$ and given by

$$ H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 p(x, y). $$
Notice that this is identical to $H(Z)$ for the single random variable $Z = (X, Y)$ taking values in $\mathcal{X} \times \mathcal{Y}$.
# Conditional entropy
Definition. For a pair of discrete random variables $(X, Y)$, the conditional entropy (in bits) of $Y$ given $X$ is denoted $H(Y \mid X)$ and given by

$$ H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 p(y \mid x). $$

The conditional entropy quantifies the uncertainty that remains about $Y$ once $X$ is known.
# Mutual information
Definition. For a pair of discrete random variables $(X, Y)$, the mutual information (in bits) between $X$ and $Y$ is denoted $I(X; Y)$ and given by

$$ I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)}. $$
# Lemma
Lemma. Basic properties of joint entropy and mutual information:
- (chain rule of entropy) $H(X, Y) = H(X) + H(Y \mid X)$. If $X$ and $Y$ are independent, $H(X, Y) = H(X) + H(Y)$.
- The mutual information satisfies $I(X; Y) = I(Y; X) \ge 0$ and $I(X; Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y \mid X)$; a numerical check is sketched below.
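To tie the definitions and the lemma together, here is a small numerical sketch (the joint distribution is made up for illustration) that computes $H(X)$, $H(Y)$, $H(X, Y)$, $H(Y \mid X)$, and $I(X; Y)$, then checks the chain rule and the non-negativity of mutual information:

```python
from math import log2

# A made-up joint pmf p(x, y), used only to illustrate the definitions above.
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Marginal distributions p(x) and p(y).
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_x, H_y, H_xy = H(p_x.values()), H(p_y.values()), H(p_xy.values())

# Conditional entropy H(Y|X) = -sum_{x,y} p(x,y) * log2 p(y|x).
H_y_given_x = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)

# Mutual information I(X;Y) = H(Y) - H(Y|X).
I_xy = H_y - H_y_given_x

print(f"H(X)={H_x:.3f}  H(Y)={H_y:.3f}  H(X,Y)={H_xy:.3f}  "
      f"H(Y|X)={H_y_given_x:.3f}  I(X;Y)={I_xy:.3f}")

# Checks from the lemma: chain rule and non-negative mutual information.
assert abs(H_xy - (H_x + H_y_given_x)) < 1e-9
assert I_xy >= 0
```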
# References
- https://zh.wikipedia.org/wiki/極化碼
- http://pfister.ee.duke.edu/courses/ecen655/polar.pdf