Polar codes are a type of forward error correction (FEC) code used for signal transmission.

The core of the construction is channel polarization: on the encoding side, channel uses are combined and split so that the resulting sub-channels exhibit different reliabilities. As the code length grows, some sub-channels tend toward perfect channels with capacity close to 1 (essentially error-free), while the others tend toward pure-noise channels with capacity close to 0. Transmitting information directly over the sub-channels whose capacity is close to 1 allows the code to approach the channel capacity; polar coding was the first method proven to achieve the Shannon limit.

On the decoding side, the polarized channels can be decoded with a simple successive cancellation (SC) decoder, which attains performance close to maximum-likelihood decoding at low complexity.

If you do not have a background in information theory, read the Information Theory Primer section first.

# Introduction

Polar codes were introduced by Erdal Arıkan in 2009 and they provide the first deterministic construction of capacity-achieving codes for binary memoryless symmetric (BMS) channels.

Note that there is physically only one channel used to transmit the data; to make the analysis easier to follow, we logically treat two uses of that channel as two independent copies of it.

Figure 1

In particular, consider a setup where $(U_1, U_2)$ are two equiprobable bits that are encoded into $(X_1, X_2) = (U_1 \oplus U_2, U_2)$. Then, $(X_1, X_2)$ are mapped to $(Y_1, Y_2)$ by two independent BMS channels with transition probabilities $P(Y_1 = y \mid X_1 = x) = P(Y_2 = y \mid X_2 = x) = W(y|x)$. Since the mapping from $(U_1, U_2)$ to $(X_1, X_2)$ is invertible, one finds that

$$I(U_1, U_2; Y_1, Y_2) = I(X_1, X_2; Y_1, Y_2) = I(X_1; Y_1) + I(X_2; Y_2) = 2I(W)$$

where $C = I(X_1; Y_1) = I(W)$ is the capacity of the symmetric BMS channel because $X_1$ is equiprobable and

$$I(W) \triangleq \sum_y W(y|0)\log_{2}\frac{W(y|0)}{\frac{1}{2}W(y|0) + \frac{1}{2}W(y|1)} = I(X_1; Y_1)$$

Thus, the transformation $(X_1, X_2) = (U_1 \oplus U_2, U_2)$ preserves the sum capacity of the system.
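
To make the symmetric capacity formula above concrete, here is a minimal Python sketch (the helpers `symmetric_capacity` and `bsc` are our own, not from the references) that evaluates $I(W)$ for a binary symmetric channel with crossover probability $p$ and checks it against the well-known closed form $1 - h_2(p)$.

```python
import math

def symmetric_capacity(W):
    """I(W) for a BMS channel given as {y: (W(y|0), W(y|1))}."""
    total = 0.0
    for w0, w1 in W.values():
        if w0 > 0:
            total += w0 * math.log2(w0 / (0.5 * w0 + 0.5 * w1))
    return total

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return {0: (1 - p, p), 1: (p, 1 - p)}

p = 0.11
h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy h2(p)
print(symmetric_capacity(bsc(p)))   # ~0.5 bits/use
print(1 - h2)                       # matches the closed form 1 - h2(p)
```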

Also, the chain rule decomposition

$$I(U_1, U_2; Y_1, Y_2) = I(U_1; Y_1, Y_2) + I(U_2; Y_1, Y_2 | U_1) = 2I(W)$$

implies that one can also achieve the rate $2I(W)$ using two steps. First, information is transmitted through the first virtual channel $W^- : U_1 \to (Y_1, Y_2)$. By coding over multiple channel uses, one can achieve any rate up to $I(U_1; Y_1, Y_2)$. If all the $U_1$'s are known from the first stage, then one can decode the information transmitted through the second virtual channel $W^+ : U_2 \to (Y_1, Y_2, U_1)$. Again, coding allows one to achieve any rate up to $I(U_2; Y_1, Y_2 | U_1)$.
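
For the binary erasure channel BEC($\epsilon$) the two virtual channels take a particularly simple form: $W^-$ is a BEC with erasure probability $2\epsilon - \epsilon^2$ and $W^+$ is a BEC with erasure probability $\epsilon^2$, so $I(W^-) = I(W)^2$ and $I(W^+) = 2I(W) - I(W)^2$. The short sketch below (a BEC-only illustration, with our own function name) shows that the split is unequal while the sum $2I(W)$ is preserved.

```python
def polarize_bec_capacity(i):
    """One polarization step for a BEC with symmetric capacity i."""
    i_minus = i * i           # U1 is recoverable only if both channel outputs survive
    i_plus = 2 * i - i * i    # given U1, U2 is recoverable if at least one output survives
    return i_minus, i_plus

i = 0.5                           # BEC(0.5): I(W) = 0.5
i_minus, i_plus = polarize_bec_capacity(i)
print(i_minus, i_plus)            # 0.25 and 0.75: one weaker, one stronger channel
print(i_minus + i_plus == 2 * i)  # True: the sum capacity 2*I(W) is preserved
```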

If one chooses convolutional codes with sequential decoding, however, the expected decoding-complexity per bit becomes unbounded for rates above the computational cutoff rate

$$R_0(W) = 1 - \log_{2}(1 + Z(W))$$

where

$$Z(W) \triangleq \sum_y \sqrt{W(y|0)\,W(y|1)}$$

is the Bhattacharyya parameter of the channel. Arıkan’s key observation was that, while the sum capacity of the two virtual channels is preserved, the sum cutoff rate satisfies

$$R_0(W^+) + R_0(W^-) \geq 2R_0(W)$$

with equality if and only if $R_0(W) \in \{0, 1\}$. Thus, repeated transformations of this type cause the implied virtual channels to polarize into extremal channels whose capacities approach 0 or 1. But, for these extremal channels, coding is trivial: one either sends an information bit or a dummy bit. From a theoretical point of view, polar codes are beautifully simple. Practically, they approach capacity rather slowly as their blocklength increases. Still, based on recent advances, they have been chosen as the short-block codes for the 5G standard.
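
The polarization effect is easiest to see for the BEC, where the Bhattacharyya parameters of the virtual channels obey the exact recursions $Z(W^-) = 2Z - Z^2$ and $Z(W^+) = Z^2$ (for general BMS channels these are only bounds). The sketch below, under that BEC assumption, checks the cutoff-rate inequality for one step and then applies the transform repeatedly to show how most virtual channels drift toward the extremes.

```python
import math

def cutoff_rate(z):
    """Computational cutoff rate R0 as a function of the Bhattacharyya parameter z."""
    return 1 - math.log2(1 + z)

def polarize(z):
    """One polarization step on the Bhattacharyya parameter of a BEC: (Z(W-), Z(W+))."""
    return 2 * z - z * z, z * z

z = 0.5                              # BEC(0.5)
z_minus, z_plus = polarize(z)
print(cutoff_rate(z_minus) + cutoff_rate(z_plus) >= 2 * cutoff_rate(z))  # True

# Repeated transforms: after n levels there are 2**n virtual channels.
zs = [0.5]
for _ in range(10):
    zs = [w for z in zs for w in polarize(z)]
extremal = sum(z < 0.01 or z > 0.99 for z in zs) / len(zs)
print(f"fraction of (nearly) extremal channels after 10 levels: {extremal:.2f}")
```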

# Information Theory Primer

# Discrete Random Variables

A discrete random variable $X$ is completely defined by the set of values it can take, $\mathcal{X}$, which we assume to be a finite set, and its probability distribution $\{p(x)\}_{x \in \mathcal{X}}$.

Definition. The value $p(x)$ is the probability that the random variable $X$ takes the value $x$. The probability distribution $p : \mathcal{X} \to [0, 1]$ must satisfy the normalization condition:

$$\sum_{x \in \mathcal{X}} p(x) = 1$$

We shall denote by $\mathbb{P}(A)$ the probability of an event $A \subseteq \mathcal{X}$, so that $p(x) = \mathbb{P}(X = x)$.

# Self-information

Claude Shannon's definition of self-information was chosen to meet several axioms:

  1. An event with probability 100% is perfectly unsurprising and yields no information.
  2. The less probable an event is, the more surprising it is and the more information it yields.
  3. If two independent events are measured separately, the total amount of information is the sum of the self-informations of the individual events.

With the base-2 logarithm in the following formula, the unit of information is the bit:

$$I(x) = \log_{2}\frac{1}{p(x)} = -\log_{2}p(x)$$
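
A quick numerical check of these axioms (the probabilities are made up for illustration):

```python
import math

def self_info(p):
    """Self-information in bits of an outcome with probability p."""
    return -math.log2(p)

print(self_info(1.0))          # 0.0  -> a certain event carries no information
print(self_info(0.5))          # 1.0 bit
print(self_info(1 / 8))        # 3.0 bits -> rarer events are more informative
# Independent events: probabilities multiply, self-informations add.
print(self_info(0.5 * 1 / 8))  # 4.0 bits = 1.0 + 3.0
```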

Self-information is attached to individual outcomes observed from an information source, so different outcomes of the same source are described by different values. To summarize the source with a single number, we introduce the concept of entropy.

# Entropy


Definition. The entropy (in bits per symbol) of a discrete random variable $X$ with probability distribution $p(x)$ is defined as

$$H(X) \triangleq \sum_{x \in \mathcal{X}} p(x) \log_{2}\frac{1}{p(x)} = \mathbb{E}\left[\log_{2}\frac{1}{p(X)}\right]$$

The unit of entropy is determined by the base of the logarithm with base-2 resulting in “bits” and the natural log (i.e., base-e) resulting in “nats”.

How to interpret entropy:

  1. It is the mathematical expectation of the self-information of the messages emitted by the source, i.e., the average self-information of the source.
  2. It describes the average uncertainty of the information source $X$.
  3. It describes the average amount of information carried by each source symbol.
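
As a small illustration (with made-up distributions), the entropy of a biased coin is lower than that of a fair coin, matching the "average uncertainty" reading above:

```python
import math

def entropy(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return sum(px * math.log2(1 / px) for px in p if px > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))        # ~0.47 bits: a biased coin is more predictable
print(entropy([1.0, 0.0]))        # 0.0 bits: no uncertainty at all
print(entropy([0.25] * 4))        # 2.0 bits: uniform over four symbols
```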

# Joint Entropy

Definition. The joint entropy (in bits) of a pair $(X, Y) \sim p_{X,Y}(x, y)$ is defined as

$$H(X, Y) \triangleq \sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} p_{X,Y}(x, y) \log_{2}\frac{1}{p_{X,Y}(x, y)} = \mathbb{E}\left[\log_{2}\frac{1}{p_{X,Y}(X, Y)}\right]$$

Notice that this is identical to $H(Z)$ with $Z = (X, Y)$.

# Conditional Entropy

Definition. For a pair $(X, Y) \sim p_{X,Y}(x, y)$, the conditional entropy (in bits) of $Y$ given $X$ is defined as

$$H(Y|X) \triangleq \sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} p_{X,Y}(x, y)\log_{2}\frac{1}{p_{Y|X}(y|x)} = \mathbb{E}\left[\log_{2}\frac{1}{p_{Y|X}(Y|X)}\right]$$

The conditional entropy $H(Y|X)$ measures the remaining uncertainty about $Y$ when $X$ is known.

# Mutual Information

Definition. For a pair $(X, Y) \sim p_{X,Y}(x, y)$, the mutual information (in bits) between $X$ and $Y$ is defined as

$$I(X;Y) \triangleq \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} p_{X,Y}(x,y)\log_{2}\frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)} = \mathbb{E}\left[\log_{2}\frac{p_{X,Y}(X,Y)}{p_X(X)\,p_Y(Y)}\right]$$
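
As an example that ties back to the Introduction, the sketch below (the variable names are ours) evaluates this definition for the joint distribution of a uniform input $X$ and the output $Y$ of a BSC with crossover probability $p$; the result agrees with the symmetric capacity $1 - h_2(p)$ computed earlier.

```python
import math

p = 0.11                      # crossover probability of the BSC
# Joint pmf p_{X,Y}(x, y) for a uniform input X and the BSC output Y.
joint = {(x, y): 0.5 * ((1 - p) if x == y else p) for x in (0, 1) for y in (0, 1)}

p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in (0, 1)}

mi = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in joint.items())
h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(mi)        # ~0.5 bits
print(1 - h2)    # matches 1 - h2(p), the symmetric capacity from the Introduction
```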

# Lemma

Lemma. Basic properties of joint entropy and mutual information:

  1. (Chain rule of entropy) $H(X, Y) = H(X) + H(Y|X)$. If $X$ and $Y$ are independent, then $H(X, Y) = H(X) + H(Y)$.
  2. The mutual information satisfies $I(X; Y) = I(Y; X)$ and

$$I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
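
A short sketch (with a made-up joint distribution) that verifies the chain rule and the three expressions for $I(X;Y)$ numerically:

```python
import math

# A made-up joint pmf p_{X,Y}(x, y) over {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return sum(v * math.log2(1 / v) for v in pmf.values() if v > 0)

p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in (0, 1)}

H_x, H_y, H_xy = H(p_x), H(p_y), H(joint)
# Conditional entropies computed directly from their definitions.
H_y_given_x = sum(v * math.log2(p_x[x] / v) for (x, y), v in joint.items())
H_x_given_y = sum(v * math.log2(p_y[y] / v) for (x, y), v in joint.items())

# Chain rule: H(X,Y) = H(X) + H(Y|X).
print(abs(H_xy - (H_x + H_y_given_x)) < 1e-12)   # True

# Mutual information from the definition and from the three identities.
I_def = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in joint.items())
print(abs(I_def - (H_x + H_y - H_xy)) < 1e-12)   # True
print(abs(I_def - (H_x - H_x_given_y)) < 1e-12)  # True
print(abs(I_def - (H_y - H_y_given_x)) < 1e-12)  # True
```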

# References

  • https://zh.wikipedia.org/wiki/極化碼
  • http://pfister.ee.duke.edu/courses/ecen655/polar.pdf