
1.11 Decision Process and its Rules

1.11.3 Maximum Likelihood Decision Rule

If the channel input symbols are equiprobable, the MAP rule expressed by formula (1.74) can be further simplified and it takes the form

P(y_k | x^*) \ge P(y_k | x_j),   j = 1, 2, ..., J        (1.75)

Sometimes the decision rule given by (1.75) is applied even though the channel input symbol probabilities are unknown to the receiver. In that case (1.75) is a suboptimum procedure and is called the Maximum Likelihood (ML) decision rule. This rule is applied in data detection in many receivers. It is also often used as the basis of decoding algorithms for channel codes.

Let us consider a simple example of applying the ML decision rule.

Example 1.11.1 Let us consider a discrete memoryless channel model with input and output symbol alphabets of equal size. Let J = K = 3. Let the channel transition matrix for the considered channel have the form

P = \begin{bmatrix} 0.7 & 0.2 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.1 & 0.4 & 0.5 \end{bmatrix}

As we remember, the columns of the channel transition matrix P = [P(y_k|x_j)] (j = 1, ..., J; k = 1, ..., K) are associated with the same output symbol, whereas the rows are associated with the same input symbol. Therefore, applying decision rule (1.75), in which a given output symbol has to be assigned the input symbol for which the transition probability is maximum, we obtain

d(y_1) = x_1,   d(y_2) = x_2,   d(y_3) = x_3
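To make the procedure concrete, the following minimal Python sketch (not part of the book text; the array P simply restates the matrix above) applies rule (1.75) to Example 1.11.1:

import numpy as np

# Channel transition matrix of Example 1.11.1:
# row j corresponds to input x_j, column k to output y_k, so P[j, k] = P(y_k|x_j).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.6, 0.1],
              [0.1, 0.4, 0.5]])

# ML rule (1.75): for each received y_k choose the input x_j maximizing P(y_k|x_j),
# i.e. the row index of the largest entry in column k.
for k in range(P.shape[1]):
    j_star = int(np.argmax(P[:, k]))
    print(f"d(y{k + 1}) = x{j_star + 1}")
# prints: d(y1) = x1, d(y2) = x2, d(y3) = x3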

At first glance it seems that both the MAP and ML rules are quite abstract from the implementation point of view; however, as we will learn, application of the ML decision rule leads to highly practical solutions.

So far we have considered a discrete memoryless channel model. Let us focus for a moment on one particular case, i.e. on the channel model with binary input and continuous output. This channel model, additionally supplemented with the quantizer, is shown in Figure 1.19a. Let y_n be the unquantized channel output at the n-th timing instant and let the input symbol x_n = ±A. Since the channel output can take continuous values, the ML decision rule changes its form to

p(y_n | x_n = A) ≶ p(y_n | x_n = -A)        (1.76)

As we see, the conditional probabilities have been replaced by the appropriate conditional probability density functions. The receiver selects the value, +A or −A, for which the conditional probability density function is higher. The decision rule will obviously remain the same if both sides of (1.76) are replaced by their natural logarithms, i.e.

\ln p(y_n | x_n = A) ≶ \ln p(y_n | x_n = -A)        (1.77)

or, equivalently, if we calculate the expression

\Lambda(y_n) = \ln \frac{p(y_n | x_n = A)}{p(y_n | x_n = -A)}        (1.78)

and check whether it is higher or lower than zero. The function Λ(y_n) given by (1.78) is called the Log-Likelihood Ratio (LLR) function and is an alternative tool for performing the ML decision rule. For common probability density functions the general expression (1.78) can be significantly simplified. For example, consider the channel model with an additive Gaussian noise source, which is shown in Figure 1.19a. The conditional probability density functions of Gaussian shape are given in Figure 1.19b and are described by the formula

p(y_n | x_n) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_n - x_n)^2}{2\sigma^2}\right]        (1.79)

For this example the LLR function reduces to

\Lambda(y_n) = \ln \frac{\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_n - A)^2}{2\sigma^2}\right]}{\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_n + A)^2}{2\sigma^2}\right]} = \frac{2A}{\sigma^2} y_n        (1.80)

The reader is asked to perform a simple derivation leading to the right-hand side of (1.80).

The additive noise variance is denoted by σ². As we see, in the particular case of bipolar transmission⁶ the ML criterion using the LLR function reduces to checking whether y_n is positive or negative.

⁶ We call a transmission bipolar if its data symbols take the form ±A.
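For reference, a sketch of the derivation left to the reader above: the common factor 1/(√(2π)σ) cancels in the ratio in (1.80), so only the exponents remain and

\Lambda(y_n) = -\frac{(y_n - A)^2}{2\sigma^2} + \frac{(y_n + A)^2}{2\sigma^2} = \frac{(y_n + A)^2 - (y_n - A)^2}{2\sigma^2} = \frac{4 A y_n}{2\sigma^2} = \frac{2A}{\sigma^2}\, y_n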

Now let us consider a binary symmetric memoryless channel with the error probability p < 1/2. Let the channel transmit a block of n subsequent binary symbols. Let us treat this block as a whole. Thus, we can state that we deal with a discrete memoryless channel for which the input and output symbols are n-element binary blocks. Let the input symbol alphabet X consist of J = 2^k (k < n) blocks selected from the 2^n possible binary combinations, and let the output symbol alphabet Y consist of K = 2^n blocks (all possible n-element binary blocks). During transmission of the subsequent bits of the block x_j, a bit is received in error with the probability p and is received correctly with the probability 1 − p. As a result of feeding the symbol x_j to the channel input, we receive a single symbol y_k at its output. On the basis of the received symbol y_k in the form of an n-element block, the ML decision rule should ensure the selection of such an input symbol x^* for which condition (1.75) is fulfilled.

In order to find the ML rule for the considered case, let us introduce the idea of the Hamming distance between two binary blocks of the same length.

Definition 1.11.2 The Hamming distance between binary blocks x_j and y_k of the same length, denoted by d(x_j, y_k), is the number of positions at which both blocks differ from each other.

Let the Hamming distance between the input block x_j and the output block y_k be D = d(x_j, y_k). Knowing that transmission of the binary symbols constituting an n-element block is a sequence of statistically independent events, the probability of reception of y_k conditioned on transmission of x_j is given by the expression

P(y_k | x_j) = p^D (1 - p)^{n-D}        (1.81)

In a typical situation the error probability p is lower than 1/2. Thus, the following sequence of inequalities holds true

(1 - p)^n > p(1 - p)^{n-1} > p^2 (1 - p)^{n-2} > \ldots        (1.82)

We conclude from (1.82) that in this case the ML rule reduces to selecting, from all possible blocks x_j, the block x^* = d(y_k) whose Hamming distance to the received block y_k is the lowest. This situation is shown symbolically in Figure 1.23. The received block y_k is denoted by a cross. Block x_5 is the closest one to y_k in the Hamming distance sense among the input symbols x_1, x_2, ..., x_9.
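As a minimal sketch (assuming nothing beyond Definition 1.11.2 and rule (1.75); the codebook below is made up purely for illustration), minimum-Hamming-distance ML decoding can be written in Python as follows:

def hamming_distance(a, b):
    # Number of positions at which two equal-length binary blocks differ.
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def ml_decode(received, codebook):
    # ML decision for a BSC with p < 1/2: pick the allowed input block
    # closest to the received block in the Hamming distance sense.
    return min(codebook, key=lambda block: hamming_distance(block, received))

# Illustrative codebook of J = 2^k allowed input blocks (here k = 2, n = 5);
# the specific blocks are chosen arbitrarily for this example.
codebook = ["00000", "01101", "10110", "11011"]
received = "01001"                      # received n-element block y_k
print(ml_decode(received, codebook))    # -> "01101" (Hamming distance 1)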

Consider now a particular case of the above example. Let the input symbol alphabet X consist of two symbols x_1 = (000...0) and x_2 = (111...1) of length n. Let n be an odd number. Symbol x_1 can be assigned the message "0" and x_2 the message "1", respectively.

Theoretically, these messages could be represented by a single 0 or 1; however, instead they are represented by whole sequences of these symbols of length n. During transmission of the subsequent binary symbols of block x_1 or x_2 over the binary symmetric memoryless channel with the error probability p, the received channel output block y_k can take one of 2^n possible forms. The ML rule allows selection of the input sequence that is closest to the received block in the Hamming sense. The decision will be erroneous if the number of binary errors committed during transmission of an n-element block of "0"s or "1"s exceeds n/2.

Figure 1.23 Process of finding the sequence x^* featuring the minimum Hamming distance from the received sequence y_k (the received block y_k, marked by a cross, is surrounded by the candidate blocks x_1, ..., x_9 at Hamming distances D_1, ..., D_9; the closest one is x_5 = d(y_k))

Assuming the independence of binary error events, we can easily deduce that the probability of i errors occurring in the n-element block is p^i (1 - p)^{n-i}. The number of possible combinations of i errors in a block of length n is \binom{n}{i}. Therefore the probability of an erroneous decision on the transmitted message is given by the formula

P(E) = \sum_{i=(n+1)/2}^{n} \binom{n}{i} p^i (1 - p)^{n-i}        (1.83)

Assuming, for example, the value p = 0.01 and calculating the probability P(E) for subsequent odd block lengths n, we obtain P(E) = 10^{-2}, 3×10^{-4}, 10^{-5}, 4×10^{-7}, ... for n equal to 1, 3, 5, 7, ..., respectively. From the above we conclude that if we want to achieve a very low decision error probability related to a single message, we should increase the size of the "0" and "1" blocks appropriately. Since each binary message is represented by an n-bit block, the efficiency of this representation, and therefore of the coding, is R = 1/n. As we see, the price paid for increasing the transmission quality is the n-fold lowering of its rate. Does this price really need to be paid?
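As a quick check (a minimal Python sketch, assuming only formula (1.83) and p = 0.01), these values can be reproduced numerically:

from math import comb

def repetition_error_prob(n, p):
    # Probability (1.83) that more than n/2 of the n repeated bits
    # of a "0" or "1" block are received in error (n odd).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range((n + 1) // 2, n + 1))

p = 0.01
for n in (1, 3, 5, 7):
    print(n, repetition_error_prob(n, p))
# prints approximately 0.01, 2.98e-04, 9.85e-06, 3.42e-07 for n = 1, 3, 5, 7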

The answer to this question was given by Claude Shannon, who formulated the famous theorem on the reliable transmission of messages over unreliable channels. The form of this theorem for the case of a binary symmetric memoryless channel is as follows.

Theorem 1.11.1 Consider a binary symmetric memoryless channel with the error probability p and capacity C = 1 − H(p). Let ε be an arbitrarily small positive constant and let M = 2^{n(C−ε)}. For a sufficiently large number n, from the 2^n possible binary blocks of length n one can select a subset of M blocks in such a way that the probability of erroneous decoding of the received block will be arbitrarily small.

The proof of this theorem can be found in the original paper by Shannon (1948) and in more advanced books on information theory. The above-quoted theorem, called the second Shannon theorem for a binary symmetric memoryless channel, states that in order to ensure transmission with an arbitrarily low probability of error, the coding rate R cannot be higher than the channel capacity C. Since the number of allowed transmitted blocks is M = 2^{n(C−ε)}, each allowed block of n bits in fact represents n(C − ε) binary messages. Therefore the coding rate is equal to

R = \frac{n(C - \varepsilon)}{n} = C - \varepsilon        (1.84)

As we see, R < C. The coding rate R gets closer to the channel capacity C as ε decreases.

Thus, we conclude that the application of repetition coding is not a necessary solution, because the coding rate is in reality limited only by the channel capacity. However, the condition for achieving an arbitrarily small error probability is the application of an appropriately long symbol block.
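To attach numbers to this statement, the following minimal Python sketch uses the standard binary entropy function H(p) = −p log₂ p − (1 − p) log₂(1 − p), which Theorem 1.11.1 presupposes, to compare the capacity C = 1 − H(p) for p = 0.01 with the repetition-code rate R = 1/n:

from math import log2

def binary_entropy(p):
    # Binary entropy function H(p) in bits.
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.01
C = 1 - binary_entropy(p)                       # BSC capacity, C = 1 - H(p)
print(f"C = {C:.4f} bits per channel use")      # about 0.9192

# Repetition coding with rate R = 1/n stays far below this limit:
for n in (3, 5, 7):
    print(f"n = {n}: R = {1/n:.3f}  (capacity allows R up to {C:.3f})")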

The theorem states that there exists a set of M blocks that ensures an arbitrarily low probability of erroneous decoding. However, it does not indicate how to select them. In this sense the above theorem is not constructive. Nevertheless, in the 1990s some good codes with performance very close to the limit stated by Shannon's theorem were constructed. They will be presented in the next chapter.

In the more general case of a discrete memoryless channel, the Shannon theorem has the following formulation.

Theorem 1.11.2 Consider a memoryless source characterized by the alphabet X and entropy H(X). Let the source emit a message every T_s seconds. Let there be given a discrete memoryless channel with capacity C, through which the symbols representing the messages of source X are sent every T_c seconds. Then, if the following inequality holds

\frac{H(X)}{T_s} \le \frac{C}{T_c}        (1.85)

there exists a code for which the encoded messages of source X can be decoded at the channel output with an arbitrarily low error probability. However, if

\frac{H(X)}{T_s} > \frac{C}{T_c}

there is no code that ensures reception of the transmitted message sequence with an arbitrarily low error probability.
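As a simple illustration (a sketch with made-up numbers, not values from the text), condition (1.85) can be checked directly:

def reliable_transmission_possible(H_X, T_s, C, T_c):
    # Condition (1.85): the source information rate H(X)/T_s (bits/s)
    # must not exceed the channel's capacity rate C/T_c (bits/s).
    return H_X / T_s <= C / T_c

# Hypothetical numbers: a source emitting H(X) = 2 bits every 1 ms,
# a channel with C = 0.5 bit per use and one channel use every 0.2 ms.
print(reliable_transmission_possible(H_X=2.0, T_s=1e-3, C=0.5, T_c=0.2e-3))  # True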

Both theorems establish a basic limit on the rate of reliable message transmission over unreliable channels.
