14.1 Fundamentals of Coding and Information Theory
14.1.2 Fundamental Concepts of Information Theory
Shannon’s seminal work is the foundation of information theory, which explores the theoretical performance limits of optimum communication systems. Knowledge of information-theoretic limits is useful for system designers because it indicates how far a real system could be improved. Key concepts we will encounter are (i) a mathematically solid definition of information, and (ii) the channel capacity, which describes at what rate information can be transmitted, at best, over a given channel. We will find that suitable coding is required for communication to approach the channel capacity.
As a preliminary step, let us define the Discrete Memoryless Channel (DMC). We have an alphabet X of transmit symbols (the size of the alphabet is |X|), where the probability for the transmission of each of the elements of the alphabet is known. We furthermore have a receive symbol alphabet Y, with size |Y|. Last, but not least, the DMC defines transition probabilities from each of the transmit symbols to each of the receive symbols. The most common form is the binary symmetric channel, which is characterized by |X| = |Y| = 2 and a transition probability p: e.g., if we transmit X = +1, then with probability 1 − p the receive symbol is Y = +1, and with probability p it is Y = −1; similarly, if we transmit X = −1, the probability of receiving −1 is 1 − p, and of receiving +1 is p. A memoryless channel furthermore has the property that if we transmit a sequence of transmit symbols x = {x1, x2, . . . , xN} and observe a sequence of output symbols y = {y1, y2, . . . , yN}, then
\Pr(\mathbf{y} \,|\, \mathbf{x}) = \prod_{n=1}^{N} \Pr(y_n \,|\, x_n)    (14.1)
in other words, there is no InterSymbol Interference (ISI).
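As a concrete illustration of the binary symmetric channel and the memoryless property, the following minimal Python sketch simulates a block of independent channel uses; the crossover probability p = 0.1, the block length, and the random seed are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only

p = 0.1          # assumed transition (crossover) probability
N = 100_000      # number of channel uses

# i.i.d. equiprobable transmit symbols from the alphabet X = {+1, -1}
x = rng.choice([+1, -1], size=N)

# memoryless channel: each symbol is flipped independently with probability p,
# so Pr(y|x) factors into per-symbol transition probabilities as in Eq. (14.1)
flips = rng.random(N) < p
y = np.where(flips, -x, x)

# the empirical transition probability Pr(Y != X) should approach p
print("empirical transition probability:", np.mean(y != x))
```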
DMCs can describe, e.g., the concatenation of a Binary Phase Shift Keying (BPSK) modulator, an Additive White Gaussian Noise (AWGN) channel, and a demodulator with hard-decision output.
Then, the variable X corresponds to the transmit symbols +1/−1. The variable Y corresponds to the output of the demodulator/decision device, which is also +1/−1. The amount of noise determines the symbol error probability (as computed in Chapter 12), which is in this case identical to the transition probability p. If we want to avoid the restrictions of hard-decision demodulators, a useful channel model is the discrete-time AWGN channel (as used in Chapters 11–13), with
y_n = x_n + n_n    (14.2)
where the n_n are Gaussian-distributed random variables with variance σ², and the input variables are subject to an average power constraint, E{X²} ≤ P.³
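The link between the discrete-time AWGN model of Eq. (14.2) and the binary symmetric channel can be checked numerically: transmitting BPSK symbols ±√P and applying a hard decision at the receiver yields a flip rate that matches the Gaussian tail expression Q(√(P/σ²)) from Chapter 12. The sketch below is a minimal illustration; the values of P, σ², and the block length are assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, illustrative only

P      = 1.0       # average power constraint E{X^2} <= P (assumed value)
sigma2 = 0.5       # noise variance sigma^2 (assumed value)
N      = 200_000   # number of channel uses

x = math.sqrt(P) * rng.choice([+1.0, -1.0], size=N)    # BPSK at the power limit
n = rng.normal(0.0, math.sqrt(sigma2), size=N)         # Gaussian noise with variance sigma^2
y = x + n                                              # discrete-time AWGN channel, Eq. (14.2)

x_hat = np.where(y >= 0, math.sqrt(P), -math.sqrt(P))  # hard decision at the receiver

p_sim    = np.mean(x_hat != x)                                    # simulated transition probability
p_theory = 0.5 * math.erfc(math.sqrt(P / sigma2) / math.sqrt(2))  # Q(sqrt(P/sigma^2))

print(f"simulated p = {p_sim:.4f}, Q(sqrt(P/sigma^2)) = {p_theory:.4f}")
```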
We now define the mutual information between two discrete random variables X and Y. If we observe a certain realization of Y, i.e., Y = y, then the mutual information is a measure of how much this observation tells us about the occurrence of the event X = x. The mutual information between the realizations x and y is defined as
I(x; y) = \log \frac{\Pr(x \,|\, y)}{\Pr(x)}    (14.3)
where Pr(x|y) is the probability of x, conditioned on y. The logarithm in the above equation can either have base 2, in which case the mutual information is in units of bits, or it can have base e, in which case it is measured in nats.
³ Note that P is power, while p is the transition probability.
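Equation (14.3) can be evaluated directly for a binary symmetric channel. The short sketch below (assuming equiprobable inputs and a transition probability p = 0.1, both illustrative choices) computes the mutual information for each realization pair (x, y) and then its average over the joint distribution; for this channel the average reproduces the rate 1 − Hb(p) that appears later in Eq. (14.14).

```python
import numpy as np

p  = 0.1                        # assumed transition probability
Px = {+1: 0.5, -1: 0.5}         # equiprobable transmit symbols

# transition probabilities Pr(y|x) of the binary symmetric channel; keys are (y, x)
Pyx = {(+1, +1): 1 - p, (-1, +1): p,
       (-1, -1): 1 - p, (+1, -1): p}

# Pr(y) by total probability, and Pr(x|y) by Bayes' rule
Py  = {y: sum(Pyx[(y, x)] * Px[x] for x in Px) for y in (+1, -1)}
Pxy = {(x, y): Pyx[(y, x)] * Px[x] / Py[y] for x in Px for y in (+1, -1)}

# per-realization mutual information I(x;y) = log2(Pr(x|y)/Pr(x)), Eq. (14.3), in bits
I = {(x, y): np.log2(Pxy[(x, y)] / Px[x]) for x in Px for y in (+1, -1)}
print(I)

# averaging over the joint distribution Pr(x,y) = Pr(y|x)Pr(x)
I_avg = sum(Pyx[(y, x)] * Px[x] * I[(x, y)] for x in Px for y in (+1, -1))
Hb    = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(f"average mutual information = {I_avg:.4f} bits, 1 - Hb(p) = {1 - Hb:.4f} bits")
```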
the probability of receiving ≈ Np errors tends to unity. There are

\binom{N}{Np} = \frac{N!}{(Np)! \, (N(1-p))!}    (14.10)

\approx \frac{\sqrt{2\pi N} \, N^N e^{-N}}{\sqrt{2\pi Np} \, (Np)^{Np} e^{-Np} \cdot \sqrt{2\pi N(1-p)} \, (N(1-p))^{N(1-p)} e^{-N(1-p)}}    (14.11)

\approx \frac{1}{2^{N[p \log(p)]} \, 2^{N[(1-p) \log(1-p)]}}    (14.12)

= 2^{N H_b(p)}    (14.13)
different codewords with Np errors; the first approximation follows from Stirling's formula, and the second is a rearrangement of terms using p = 2^{log(p)}. The last equality is simply the definition of the binary entropy function, Hb(p) = −p log(p) − (1 − p) log(1 − p). The overall number of available sequences of length N is 2^N. Thus, the number of clearly distinguishable sequences is M = 2^{N(1−Hb(p))}. Therefore, using N symbols, we can transmit M different messages, and thus log(M) information bits. The possible data rate is thus
R = \frac{\log(M)}{N} = 1 - H_b(p)    (14.14)
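The counting argument behind Eqs. (14.10)–(14.14) is easy to verify numerically: already for moderate N, log2 of the binomial coefficient is close to N·Hb(p), and the resulting rate 1 − Hb(p) can be tabulated for a few transition probabilities. The block length and the values of p in the sketch below are illustrative assumptions.

```python
import math

def Hb(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Stirling-based approximation: log2 of C(N, Np) is approximately N * Hb(p)
N, p = 1000, 0.1
exact  = math.log2(math.comb(N, int(N * p)))
approx = N * Hb(p)
print(f"log2 C({N},{int(N * p)}) = {exact:.1f},   N*Hb(p) = {approx:.1f}")

# achievable rate over the binary symmetric channel, Eq. (14.14)
for p in (0.01, 0.05, 0.1, 0.2):
    print(f"p = {p:4.2f}  ->  R = 1 - Hb(p) = {1 - Hb(p):.3f} bit per channel use")
```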
For an AWGN channel, a similar argument can be made. For a long codeword, we know that with high probability
\frac{1}{N} \sum_{n} |n_n|^2 \to \sigma^2    (14.15)
so that the received signal vector y lies near the surface of a sphere (called the noise sphere) of radius √(Nσ²) around the transmit signal vector x. Reliable communication is possible as long as the spheres associated with the different codewords do not overlap, i.e., each received signal point can be uniquely associated with a particular transmit signal point. On the other hand, due to the average power constraint, we know that all received signal points must lie within a sphere of radius √(N(P + σ²)). We can thus conclude that the number of different received sequences that can be decoded reliably is equal to the number of noise spheres that fit into a sphere of radius √(N(P + σ²)). Since the volume of an N-dimensional sphere with radius ρ is proportional to ρ^N, the number of different messages that can be communicated with a codeword of length N is
M = \frac{[N(P + \sigma^2)]^{N/2}}{[N\sigma^2]^{N/2}}    (14.16)
so that the possible rate of communication is

R = \frac{\log(M)}{N} = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right)    (14.17)

This equals the capacity of the AWGN channel. Without derivation, we state here also that this capacity is achieved when the transmit alphabet is Gaussian (and thus continuous).
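Both ingredients of the sphere-packing argument are easy to reproduce numerically: the concentration of the noise norm in Eq. (14.15), and the capacity expression of Eq. (14.17) that follows from it. The sketch below uses assumed values for P, σ², and the codeword length.

```python
import math
import numpy as np

rng = np.random.default_rng(2)        # fixed seed, illustrative only

P, sigma2, N = 1.0, 0.25, 10_000      # assumed power, noise variance, codeword length

# concentration of the noise norm, Eq. (14.15): (1/N) * sum |n_n|^2 -> sigma^2
n = rng.normal(0.0, math.sqrt(sigma2), size=N)
print("(1/N) * ||n||^2 =", np.mean(n ** 2))     # close to sigma2 for large N

# capacity of the real discrete-time AWGN channel, Eq. (14.17)
C = 0.5 * math.log2(1.0 + P / sigma2)
print(f"C = {C:.3f} bit per channel use at P/sigma^2 = {P / sigma2:.1f}")
```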
It is also noteworthy that Eq. (14.17) is the capacity per channel use (i.e., per transmitted underlying symbol) for a real modulation alphabet and channel. When using complex modulation, the capacity per unit bandwidth becomes
C_{\mathrm{AWGN}} = 2 \cdot \frac{1}{2} \log\left(1 + \frac{P}{N_0 B}\right) \quad \text{bits/s/Hz}    (14.18)
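For the complex-modulation case, Eq. (14.18) gives the spectral efficiency directly; the short sketch below evaluates it for a few illustrative values of the signal-to-noise ratio P/(N0·B).

```python
import math

def c_awgn_complex(snr):
    """Capacity per unit bandwidth of the complex AWGN channel, Eq. (14.18), in bit/s/Hz."""
    return math.log2(1.0 + snr)        # 2 * (1/2) * log2(1 + P/(N0*B))

for snr_db in (0, 10, 20, 30):         # illustrative values of P/(N0*B) in dB
    snr = 10 ** (snr_db / 10)
    print(f"P/(N0*B) = {snr_db:2d} dB  ->  C = {c_awgn_complex(snr):.2f} bit/s/Hz")
```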
become unmanageable. In the sections below, we discuss practical coding methods that have worse performance than random codes, but have the advantage of being actually decodable with finite effort. This is particularly true for the block codes and convolutional codes, which have been used for many years but which – due to their relatively short codeword length – do not come close to the theoretical performance limits. The 1990s finally saw codes that achieve practical decodability while having a large effective codeword length, and thus close-to-optimum performance: turbo codes and LDPC codes approach the Shannon limit within less than 1 dB; they are discussed in Sections 14.6 and 14.7.