Soft-Output Viterbi Algorithm (SOVA)

As we remember, the regular Viterbi algorithm decides upon the transmitted codeword on the basis of the received sequence using the maximum likelihood criterion. Our considerations on the SOVA will be presented in a wider perspective, using the Maximum a Posteriori (MAP) criterion applied in the decision process for the whole received sequence. As in the regular Viterbi algorithm, the algorithm finds the optimal codeword but, unlike the latter, possibly unequal probabilities of the codewords are taken into account as well.

Equivalently, unequal probabilities of particular message sequences influence the decoder operation.

Assume the channel model shown in Figure 1.19a. Thus, the codeword $\mathbf{c}_1^i$ transmitted from the initial moment up to the $i$th time unit is represented by a bipolar sequence $\mathbf{d}_1^i$ that, in the case of a convolutional code of code rate $R = 1/n$, has the form

$$\mathbf{d}_1^i = (\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_i) \qquad (2.126)$$

The $j$th vector element in (2.126) is a vector of bipolar symbols characterizing the codeword generated in the $j$th time unit

$$\mathbf{d}_j = (d_{j,1}, d_{j,2}, \ldots, d_{j,n}), \quad d_{j,k} = \pm\sqrt{E_c}, \quad j = 1, \ldots, i, \; k = 1, \ldots, n \qquad (2.127)$$

where $E_c$ is the signal energy per single code symbol. Let $d_{j,k} = -\sqrt{E_c}$ if $c_{j,k} = 0$ and let $d_{j,k} = \sqrt{E_c}$ if $c_{j,k} = 1$. Let us also note that, assuming a particular initial state of the encoder, both the codeword $\mathbf{c}_1^i$ and its bipolar version $\mathbf{d}_1^i$ are uniquely associated with the message sequence

$$\mathbf{m}_1^i = (m_1, m_2, \ldots, m_i) \qquad (2.128)$$

The transmitted vector $\mathbf{d}_1^i$ is subject to disturbance by additive white Gaussian noise, so at the decoder input it has the form

$$\mathbf{r}_1^i = (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_i) \qquad (2.129)$$

where

$$\mathbf{r}_j = (r_{j,1}, r_{j,2}, \ldots, r_{j,n}), \quad r_{j,k} = d_{j,k} + \nu_{j,k}, \quad j = 1, \ldots, i, \; k = 1, \ldots, n \qquad (2.130)$$

and $\nu_{j,k}$ is a white Gaussian noise sample added to the $k$th element of the bipolar codeword symbol in the $j$th time unit. As the Gaussian noise source is white, any two different noise samples are statistically independent. Let us now formulate the MAP criterion for finding the codeword $\mathbf{c}_1^i$ on the basis of the received sequence $\mathbf{r}_1^i$ or, equivalently, for finding the sequence $\mathbf{d}_1^i$, both uniquely associated with the transmitted message sequence $\mathbf{m}_1^i$. The codeword $\mathbf{c}_{1,\mathrm{opt}}^i$, or its bipolar version $\mathbf{d}_{1,\mathrm{opt}}^i$, is searched for according to the MAP criterion, which results from the maximized a posteriori probability

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} P(\mathbf{d}_1^i \mid \mathbf{r}_1^i) \qquad (2.131)$$

Recalling Bayes' formula, we have

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} P(\mathbf{d}_1^i \mid \mathbf{r}_1^i) = \arg\max_{\mathbf{d}_1^i} \frac{p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{d}_1^i)}{p(\mathbf{r}_1^i)} = \arg\max_{\mathbf{d}_1^i} p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{d}_1^i) = \arg\max_{\mathbf{d}_1^i} p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i) \qquad (2.132)$$

We have used the observation that the denominator in Bayes' formula is common to all possible bipolar codewords $\mathbf{d}_1^i$, so it does not influence the choice of the best codeword. We have also applied the fact that the probability of the codeword $\mathbf{d}_1^i$ is equal to the probability of the message sequence $\mathbf{m}_1^i$. Instead of comparing the probabilities we can compare their logarithms, so the MAP criterion evolves to the form

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} \ln\!\left[ p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i) \right] \qquad (2.133)$$
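To see the criterion at work, consider a minimal numerical sketch in Python (all values below are assumed purely for illustration): for a single bipolar symbol received in Gaussian noise, a sufficiently asymmetric prior can overturn the maximum likelihood choice.

```python
import math

# Minimal sketch of the MAP rule (2.132) for a single bipolar symbol.
# Assumed toy values: unit symbol energy (d = +/-1), noise variance sigma2.
sigma2 = 1.0
r = -0.2                      # received sample, slightly closer to d = -1

def likelihood(r, d, sigma2):
    """Gaussian channel density p(r|d); see (2.135)."""
    return math.exp(-(r - d) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

priors = {+1: 0.9, -1: 0.1}   # assumed unequal a priori probabilities P(m)

ml_decision = max((+1, -1), key=lambda d: likelihood(r, d, sigma2))
map_decision = max((+1, -1), key=lambda d: likelihood(r, d, sigma2) * priors[d])

print("ML decision: ", ml_decision)    # -1 (the closer symbol wins)
print("MAP decision:", map_decision)   # +1 (the strong prior prevails)
```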

Let us consider the term that is the subject of maximization in detail. Because the noise samples are statistically independent, we can write this term in the form

$$\ln\!\left[ p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i) \right] = \ln \prod_{l=1}^{i} p(\mathbf{r}_l \mid \mathbf{d}_l)\, P(m_l) = \ln \prod_{l=1}^{i} \left[ \prod_{k=1}^{n} p(r_{l,k} \mid d_{l,k}) \right] P(m_l) \qquad (2.134)$$

The inner product reflects the conditional probabilities of the particular $n$ samples received within the $l$th timing instant. We have assumed in (2.134) that subsequent message symbols $m_l$ are statistically independent, although their probabilities can have different values. For our convenience we recall the formula describing the conditional probability $p(r_{l,k} \mid d_{l,k})$ for the white Gaussian noise channel, which is described by the expression

$$p(r_{l,k} \mid d_{l,k}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{1}{2\sigma^2} (r_{l,k} - d_{l,k})^2 \right], \quad l = 1, \ldots, i, \; k = 1, \ldots, n \qquad (2.135)$$

where $\sigma^2$ is the noise variance. We will prove in Chapter 3 that for the white Gaussian noise channel and the optimum receiver $\sigma^2 = N_0/2$ (where $N_0/2$ is the power spectral density of the additive white Gaussian noise at the receiver input). After substituting (2.135) in (2.134) we obtain

$$\ln\!\left[ p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i) \right] = \ln\!\left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{\!ni} + \sum_{l=1}^{i} \left\{ \left[ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} (r_{l,k} - d_{l,k})^2 \right] + \ln P(m_l) \right\} \qquad (2.136)$$

The first term on the right-hand side of (2.136) does not depend on the codeword being searched for. It grows linearly with the length of the codeword, so it does not influence the maximization of the logarithm of the probability $p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i)$ and can be omitted. Thus, we can write (2.133) in a new form

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} \ln\!\left[ p(\mathbf{r}_1^i \mid \mathbf{d}_1^i)\, P(\mathbf{m}_1^i) \right] = \arg\max_{\mathbf{d}_1^i} \sum_{l=1}^{i} \left\{ \left[ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} (r_{l,k} - d_{l,k})^2 \right] + \ln P(m_l) \right\} \qquad (2.137)$$

Maximization of the term in curly brackets is equivalent to minimization of the sum, over the time units, of the squared errors between the received samples $r_{l,k}$ and the bipolar symbols $d_{l,k}$ of a hypothetical codeword, scaled by $1/(2\sigma^2)$ and diminished by the logarithms of the probabilities of the message symbols $m_l$ ($l = 1, \ldots, i$). Let us note that the noise variance is used in the minimization process and influences its result. If the probabilities of all message symbols are equal, their logarithms do not influence the choice of the decoded codeword and can be omitted. Consequently, the criterion reduces to the well-known result considered in the previous section

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\min_{\mathbf{d}_1^i} \sum_{l=1}^{i} \sum_{k=1}^{n} (r_{l,k} - d_{l,k})^2 \qquad (2.138)$$
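The reduced criterion (2.138) can be verified with a short brute-force sketch; the tiny codebook and channel parameters below are assumed purely for illustration, and a real decoder searches the trellis rather than enumerating codewords.

```python
import math
import random

# Sketch of criterion (2.138): with equal message priors, decoding picks the
# bipolar codeword with minimum squared Euclidean distance to the received
# sequence. The four-word codebook is assumed for illustration only.
random.seed(7)
Ec = 1.0                      # assumed energy per code symbol
sigma = 0.6                   # assumed noise standard deviation

codebook = [
    [+1, +1, +1, +1], [+1, -1, +1, -1],
    [-1, +1, -1, +1], [-1, -1, -1, -1],
]
transmitted = codebook[2]
received = [d * math.sqrt(Ec) + random.gauss(0.0, sigma) for d in transmitted]

def sq_distance(r, d):
    """Sum over k of (r_k - d_k)^2, the quantity minimized in (2.138)."""
    return sum((rk - dk) ** 2 for rk, dk in zip(r, d))

decoded = min(codebook, key=lambda d: sq_distance(received, d))
print("received:", [round(x, 2) for x in received])
print("decoded :", decoded)
```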

However, let us come back to the more general case shown in (2.137). Expanding the squared errors in square brackets, we have

$$\sum_{l=1}^{i} \left\{ \left[ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} (r_{l,k} - d_{l,k})^2 \right] + \ln P(m_l) \right\} = \sum_{l=1}^{i} \left\{ \left[ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} \left( r_{l,k}^2 - 2 r_{l,k} d_{l,k} + d_{l,k}^2 \right) \right] + \ln P(m_l) \right\}$$
$$= \sum_{l=1}^{i} \left\{ \left[ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} \left( r_{l,k}^2 + d_{l,k}^2 \right) + \frac{1}{\sigma^2} \sum_{k=1}^{n} r_{l,k} d_{l,k} \right] + \ln P(m_l) \right\} = \sum_{l=1}^{i} \left[ C_l + L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + \ln P(m_l) \right] \qquad (2.139)$$

where

$$C_l = -\frac{1}{2\sigma^2} \sum_{k=1}^{n} \left( r_{l,k}^2 + d_{l,k}^2 \right) \qquad (2.140)$$

is a term common to all possible codewords and does not influence the choice of the decoded codeword, and where we denote $L_\nu = 1/\sigma^2$. In consequence, the criterion achieves the simplified form

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} \left\{ \sum_{l=1}^{i} \left[ L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + \ln P(m_l) \right] \right\} \qquad (2.141)$$

Let us now assume that the message symbols are bipolar as well. Without changing the decoder decision we can add a certain value, dependent on the current time index $l$, to each term summed in subsequent time units, i.e. instead of $L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + \ln P(m_l)$ we write

$$2 L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + 2 \ln P(m_l) - \ln \Pr\{m_l = 1\} - \ln \Pr\{m_l = -1\} = L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + m_l \Lambda(m_l) \qquad (2.142)$$

where now $L_\nu = 2/\sigma^2$ and $\Lambda(m_l)$ is the log-likelihood ratio (LLR) of the symbol $m_l$, i.e.

$$\Lambda(m_l) = \ln \frac{\Pr\{m_l = 1\}}{\Pr\{m_l = -1\}} \qquad (2.143)$$
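In code, the mapping (2.143) between a prior probability and its LLR is one line in each direction; the following sketch (function names assumed) also shows the inverse mapping.

```python
import math

# Sketch of the LLR definition (2.143) and its inverse mapping.
def llr(p_plus1):
    """Lambda(m) = ln( Pr{m = +1} / Pr{m = -1} )."""
    return math.log(p_plus1 / (1.0 - p_plus1))

def prob_plus1(llr_value):
    """Inverse of (2.143): Pr{m = +1} = 1 / (1 + exp(-Lambda))."""
    return 1.0 / (1.0 + math.exp(-llr_value))

print(llr(0.5))              # 0.0 -> equiprobable symbols carry no prior knowledge
print(llr(0.9))              # ~2.197, strong bias towards m = +1
print(prob_plus1(llr(0.9)))  # 0.9, the prior is recovered
```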

In deriving (2.142) we used the observation that

$$2 \ln P(m_l) - \ln \Pr\{m_l = 1\} - \ln \Pr\{m_l = -1\} = \begin{cases} \ln \Pr\{m_l = 1\} - \ln \Pr\{m_l = -1\} & \text{if } m_l = 1 \\ \ln \Pr\{m_l = -1\} - \ln \Pr\{m_l = 1\} & \text{if } m_l = -1 \end{cases} = m_l \Lambda(m_l) \qquad (2.144)$$

Finally, the criterion achieves the useful form

$$\mathbf{d}_{1,\mathrm{opt}}^i = \arg\max_{\mathbf{d}_1^i} M(\mathbf{r}_1^i \mid \mathbf{d}_1^i) \qquad (2.145)$$

where

$$M(\mathbf{r}_1^i \mid \mathbf{d}_1^i) = \sum_{l=1}^{i} \left[ L_\nu \sum_{k=1}^{n} r_{l,k} d_{l,k} + m_l \Lambda(m_l) \right] \qquad (2.146)$$

is the maximized metric. Searching for the best codeword thus reduces to finding the codeword (or message sequence) for which the accumulated sum of the cross-correlation between the received samples and the hypothetical codeword in bipolar form, weighted by $L_\nu$, and the LLRs of the hypothetical message symbols, weighted by their bipolar values, is maximized. Let us note that metric (2.146) can be calculated recursively using the formula

$$M(\mathbf{r}_1^i \mid \mathbf{d}_1^i) = M(\mathbf{r}_1^{i-1} \mid \mathbf{d}_1^{i-1}) + L_\nu \sum_{k=1}^{n} r_{i,k} d_{i,k} + m_i \Lambda(m_i) \qquad (2.147)$$
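The recursion (2.147) translates directly into an accumulator; the short sketch below (variable names assumed) updates the path metric by one time unit.

```python
# Sketch of the recursive metric update (2.147); all names are illustrative.
def branch_increment(r_i, d_i, m_i, llr_m_i, L_nu):
    """One-time-unit increment: L_nu * sum_k r_{i,k} d_{i,k} + m_i * Lambda(m_i)."""
    return L_nu * sum(rk * dk for rk, dk in zip(r_i, d_i)) + m_i * llr_m_i

def update_metric(M_prev, r_i, d_i, m_i, llr_m_i, L_nu):
    """M(r_1^i | d_1^i) = M(r_1^{i-1} | d_1^{i-1}) + increment, cf. (2.147)."""
    return M_prev + branch_increment(r_i, d_i, m_i, llr_m_i, L_nu)

# One step with assumed values (sigma^2 = 1, hence L_nu = 2; no prior knowledge).
M = update_metric(0.0, r_i=[0.8, -1.1], d_i=[+1, -1], m_i=+1, llr_m_i=0.0, L_nu=2.0)
print(M)  # 2 * (0.8*1 + (-1.1)*(-1)) + 0 = 3.8
```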

The Viterbi algorithm calculates the metric $M(\mathbf{r}_1^i \mid \mathbf{d}_1^i)$ for each trellis state in each time unit, trying to determine the survival path to each trellis state $s_j$ ($j = 1, \ldots, 2^{L-1}$). Consider such a calculation for the $j$th state at the $i$th moment. Let this state be accessible from states $s_{l_1}$ and $s_{l_2}$ at the previous moment. Denote the survival path metrics for states $s_{l_1}$ and $s_{l_2}$ as $M_{l_1}(\mathbf{r}_1^{i-1} \mid \mathbf{d}_1^{i-1})$ and $M_{l_2}(\mathbf{r}_1^{i-1} \mid \mathbf{d}_1^{i-1})$, respectively, and the metrics associated with the paths between the pairs of states $(s_{l_1}, s_j)$ and $(s_{l_2}, s_j)$ as

$$d(\mathbf{r}_i, s_{l_1}, s_j) = L_\nu \sum_{k=1}^{n} r_{i,k}\, d_{i,k}^{(l_1,j)} + m_i^{(l_1,j)}\, \Lambda\!\left( m_i^{(l_1,j)} \right) \qquad (2.148)$$

and

$$d(\mathbf{r}_i, s_{l_2}, s_j) = L_\nu \sum_{k=1}^{n} r_{i,k}\, d_{i,k}^{(l_2,j)} + m_i^{(l_2,j)}\, \Lambda\!\left( m_i^{(l_2,j)} \right) \qquad (2.149)$$

where $d_{i,k}^{(l_1,j)}$ and $d_{i,k}^{(l_2,j)}$ are the bipolar codeword symbols associated with the paths between the pairs of states $(s_{l_1}, s_j)$ and $(s_{l_2}, s_j)$, respectively, whereas $m_i^{(l_1,j)}$ and $m_i^{(l_2,j)}$ are the message symbols associated with these paths. Thus, for each trellis state $s_j$ at the $i$th moment the decoder selects the path achieving

$$\max_{(l_1, l_2)} \left\{ M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right),\; M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) \right\} \qquad (2.150)$$

where

$$M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) = M_{l_1}(\mathbf{r}_1^{i-1} \mid \mathbf{d}_1^{i-1}) + d(\mathbf{r}_i, s_{l_1}, s_j)$$
$$M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) = M_{l_2}(\mathbf{r}_1^{i-1} \mid \mathbf{d}_1^{i-1}) + d(\mathbf{r}_i, s_{l_2}, s_j)$$

The vectors $\left( \mathbf{d}_1^i \right)^{(l_1,j)}$ and $\left( \mathbf{d}_1^i \right)^{(l_2,j)}$ denote the codewords associated with the paths reaching state $s_j$ through states $s_{l_1}$ and $s_{l_2}$, respectively, and the new survival path metric for state $s_j$ is

$$M_j(\mathbf{r}_1^i \mid \mathbf{d}_1^i) = \max_{(l_1, l_2)} \left\{ M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right),\; M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) \right\} \qquad (2.151)$$

The above procedure is illustrated in Figure 2.24. We still need to assign a certain measure of reliability to the decision upon the path selection. This is necessary for generation of a soft decoder output for each message element. It is intuitively clear that if the candidate metrics $M_{l_1,j}(\cdot)$ and $M_{l_2,j}(\cdot)$ do not differ much, then the selection of the correct path is unreliable, whereas when there is a large difference between them, the probability of selecting a wrong path is low. In this context, let us choose the measure of reliability of reaching the state $s_j$ as

$$\Delta_{i-1}(s_j) = \frac{1}{2} \left\{ M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) - M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) \right\} \qquad (2.152)$$

[Figure 2.24: Selection of the survival path for state $s_1$ at the ninth moment, accompanied by calculation of the metric difference $\Delta_8(s_1)$. The trellis states are $s_1 = 00$, $s_2 = 01$, $s_3 = 10$, $s_4 = 11$; the received sequence comprises the samples $r_{1,1} r_{1,2} r_{1,3}, \ldots, r_{9,1} r_{9,2} r_{9,3}$.]
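The add-compare-select step of (2.150)-(2.152) can be sketched as follows (a simplified illustration with assumed names; the survivor is the candidate with the larger metric and $\Delta$ is half the magnitude of their difference).

```python
# Sketch of the add-compare-select step for one state, cf. (2.150)-(2.152).
def select_survivor(M_l1, d_l1, M_l2, d_l2):
    """Return the survivor metric, its branch index, and the reliability Delta.

    M_l1, M_l2 : survival path metrics of the predecessor states, cf. (2.150)
    d_l1, d_l2 : branch metric increments d(r_i, s_l, s_j), cf. (2.148)-(2.149)
    """
    cand1 = M_l1 + d_l1
    cand2 = M_l2 + d_l2
    delta = 0.5 * abs(cand1 - cand2)   # reliability of the decision, (2.152)
    if cand1 >= cand2:
        return cand1, 1, delta
    return cand2, 2, delta

# A large metric difference means a reliable path decision:
print(select_survivor(10.0, 3.0, 4.0, 1.0))    # (13.0, 1, 4.0)
# Near-equal candidates yield Delta close to 0 -> an unreliable decision:
print(select_survivor(10.0, 3.0, 12.8, 0.1))   # (13.0, 1, 0.05)
```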

Let us arbitrarily assume that the correct path is the one that reaches state $s_j$ from state $s_{l_1}$ together with its survival path. Then the probability of correct path selection can be expressed in terms of the MAP probabilities associated with the candidate paths reaching state $s_j$, i.e. in the form

$$P_c(s_j) = \frac{P\!\left( \left( \mathbf{d}_1^i \right)^{(l_1,j)} \,\middle|\, \mathbf{r}_1^i \right)}{P\!\left( \left( \mathbf{d}_1^i \right)^{(l_1,j)} \,\middle|\, \mathbf{r}_1^i \right) + P\!\left( \left( \mathbf{d}_1^i \right)^{(l_2,j)} \,\middle|\, \mathbf{r}_1^i \right)} \qquad (2.153)$$

Recalling Bayes' theorem, we have

$$P_c(s_j) = \frac{p\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) P\!\left( \left( \mathbf{m}_1^i \right)^{(l_1,j)} \right)}{p\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) P\!\left( \left( \mathbf{m}_1^i \right)^{(l_1,j)} \right) + p\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) P\!\left( \left( \mathbf{m}_1^i \right)^{(l_2,j)} \right)} \qquad (2.154)$$

However, our previous analysis allows us to express the probabilities in such a form that the probability of selecting the correct path takes the following shape

$$P_c(s_j) = \frac{C \exp\!\left[ \frac{1}{2} M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) \right]}{C \exp\!\left[ \frac{1}{2} M_{l_1,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_1,j)} \right) \right] + C \exp\!\left[ \frac{1}{2} M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) \right]} \qquad (2.155)$$

where the constant $C$ accumulates all the components in the logarithm domain that do not influence the choice of the path [see the first component of (2.136), and $C_l$ in (2.139) and (2.140)]. On the other hand, the scaling factor $\frac{1}{2}$ reverses the influence of the multiplication of the original metric by 2 performed in (2.142) and (2.144). After multiplying the numerator and denominator of (2.155) by $\exp\!\left[ -\frac{1}{2} M_{l_2,j}\!\left( \mathbf{r}_1^i \,\middle|\, \left( \mathbf{d}_1^i \right)^{(l_2,j)} \right) \right]$ we obtain

$$P_c(s_j) = \frac{\exp[\Delta_{i-1}(s_j)]}{\exp[\Delta_{i-1}(s_j)] + 1} \qquad (2.156)$$

Finally, the log-likelihood ratio, or reliability, of the path decision concerning reaching state $s_j$ at the $i$th moment is

$$\ln \frac{P_c}{1 - P_c} = \Delta_{i-1}(s_j) \qquad (2.157)$$
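A quick numerical check of (2.156) and (2.157), with illustrative values: the probability of a correct path decision is a logistic function of $\Delta$, and the log-odds of that probability return $\Delta$ exactly.

```python
import math

# Numeric check of (2.156)-(2.157): Delta is the LLR of the path decision.
def prob_correct(delta):
    """P_c = exp(Delta) / (exp(Delta) + 1); see (2.156)."""
    return math.exp(delta) / (math.exp(delta) + 1.0)

for delta in (0.0, 1.0, 4.0):
    pc = prob_correct(delta)
    print(delta, round(pc, 4), round(math.log(pc / (1.0 - pc)), 4))
# Delta = 0 gives P_c = 0.5 (a coin flip); a large Delta gives P_c close to 1,
# and ln(P_c / (1 - P_c)) recovers Delta, confirming (2.157).
```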

It still remains to describe how the reliability of a path decision given by (2.157) is associated with the hard-decision output of the Viterbi decoder. As we know, the decoder produces hard decisions $m_i$ and the reliabilities associated with them. Recall that we have assumed that $\left( \mathbf{d}_1^i \right)^{(l_1,j)}$ is associated with the correct survival path for state $s_j$. The codeword $\left( \mathbf{d}_1^i \right)^{(l_1,j)}$ is in turn uniquely associated with the message sequence $\left( \mathbf{m}_1^i \right)^{(l_1,j)}$, whereas the second, competing path is associated with the message sequence $\left( \mathbf{m}_1^i \right)^{(l_2,j)}$. The choice of the survival path and the reliability associated with it affect only those positions in the message sequence in which the candidate sequences differ. The calculated reliability becomes important after the initialization phase of the algorithm. Consider the first moment of the regular phase of the algorithm, which occurs at time unit $i = L$ ($L$ is the code constraint length). At this moment the path selection is performed for the first time for each trellis state and the reliabilities $\Delta_{L-1}(s_j)$ are calculated for $j = 1, \ldots, 2^{L-1}$. As we can see, each state $s_j$ is characterized not only by the path metric $M_j(\mathbf{r}_1^i \mid \mathbf{d}_1^i)$ and the state sequence $D_j^i$, as in the regular Viterbi algorithm, but also by a reliability vector described by the expression

$$\mathbf{R}_L(s_j) = \left[ R_1(s_j), R_2(s_j), \ldots, R_{L-1}(s_j) \right] \qquad (2.158)$$

where

$$R_l(s_j) = \begin{cases} \Delta_{L-1}(s_j) & \text{if } m_l^{(l_1,j)} \neq m_l^{(l_2,j)} \\ \infty & \text{if } m_l^{(l_1,j)} = m_l^{(l_2,j)} \end{cases}, \quad \text{for } l = 1, \ldots, L-1 \qquad (2.159)$$

In this way, the reliability vectors are initialized for each trellis state. In subsequent time instants of the SOVA recurrent phase, the metrics of the survival paths are updated using formula (2.151) and the state sequence vectors are updated accordingly. Additionally, the reliability vectors are modified using the following rule

$$\mathbf{R}_{i+1}(s_j) = \left[ R_1(s_j), R_2(s_j), \ldots, R_i(s_j) \right] \qquad (2.160)$$

where

$$R_l(s_j) = \begin{cases} \min\!\left[ \Delta_i(s_j), R_l(s_j) \right] & \text{if } m_l^{(l_1,j)} \neq m_l^{(l_2,j)} \\ R_l(s_j) & \text{if } m_l^{(l_1,j)} = m_l^{(l_2,j)} \end{cases}, \quad \text{for } l = 1, \ldots, i \qquad (2.161)$$

As we can see from (2.161), the elements of the reliability vector at the $(i+1)$st moment preserve their previous values if the two candidate paths have the same message symbol at the corresponding position. If the message symbols differ, then the minimum of the currently calculated reliability and the previous vector entry is selected. The operation of the algorithm is completed at the end of the received vector. The algorithm produces the decided message sequence with the attached reliabilities, which are the final values of the reliability vector elements for the state featuring the maximum path metric. In the case of very long codewords, for which waiting for the processing of the whole sequence of symbols is infeasible, an appropriately long decoding depth is applied and all processed vectors are truncated to the selected length.

At the end of our considerations let us note the potential role of the a priori term $m_l \Lambda(m_l)$ in the metric calculated according to (2.146). If some extra knowledge of the a priori probabilities of the message symbols is available, it can be applied to improve the decoding quality as compared with the case in which it is more or less arbitrarily assumed that $\Pr\{m_l = 1\} = \Pr\{m_l = -1\}$ regardless of the real values of these probabilities. This potential for improvement is utilized in iterative decoding, in which in each decoding iteration the a priori LLR term $\Lambda(m_l)$ becomes more and more precise.

Iterative decoding is a subject of our considerations in one of the next sections of this chapter.
