Chapter 10 DSP System Implementation
This chapter is concerned with the implementation of digital signal processing systems. Its purpose is to make the algorithm designer aware of the strong interaction between algorithm design and architecture design.

Digital signal processing systems are an assembly of heterogeneous hardware components. The functionality is implemented in both hardware and software subsystems. A brief overview of DSP hardware technology is given in Section 10.1. Design time and cost become increasingly more important than chip cost.

Hardware-software co-design is examined in Section 10.2. Section 10.3 is devoted to quantization issues. In Sections 10.4 to 10.8 an ASIC (application-specific integrated circuit) design of a fully digital receiver is discussed. We describe the design flow of the project, the receiver structure, and the decision making for its building blocks. The last two sections are bibliographical notes on Viterbi and Reed-Solomon decoders.
10.1 Digital Signal Processing Hardware

Digital signal processing systems are an assembly of heterogeneous subsystems. The functionality is implemented in both hardware and software subsystems. Commonly found hardware blocks are shown in Figure 10-1.
There are two basic types of processors available to the designer: a programmable general-purpose digital signal processor (DSP) or a microprocessor.
Figure 10-1 Hardware Components of a Digital Signal Processing System

A general-purpose DSP is a software-programmable integrated circuit used for speech coding, modulation, channel coding, detection, equalization, and associated modem tasks such as frequency, symbol timing, and phase synchronization, as well as amplitude control. Moreover, a DSP is the preferred choice when flexibility and the ability to add new features with minimum redesign and re-engineering are required. Microprocessors are usually used to implement protocol stacks, system software, and interface software. Microprocessors are better suited to perform the non-repetitive, control-oriented input/output operations as well as all housekeeping chores.
ASICs are used for various purposes. They are utilized for high-throughput tasks in the areas of digital filtering, synchronization, equalization, and channel decoding. An ASIC is also often used to provide the glue logic to interface components. In some systems, the complete digital receiver is implemented as an ASIC coupled with a microcontroller. ASICs have historically been used because of their lower power dissipation per function. In certain applications, like spread-spectrum communications, digital receiver designs require at least partial ASIC solutions in order to execute wideband processing functions such as despreading and code synchronization. This is primarily because the chip-rate processing steps cannot be supported by current general-purpose DSPs.

Over the last few years, as manufacturers have brought to market first- and second-generation digital cellular and cordless solutions, programmable general-purpose digital signal processors have slowly been transformed into "accelerator-assisted DSP-microcontroller" hybrids. This transformation is a result of the severe pressure to reduce power consumption. As firmware solutions become finalized, cycle-hungry portions of algorithms (e.g., equalizers) are "poured into silicon" using various VLSI architectural ideas. This has given rise to, for example, new DSPs with hardware accelerators for Viterbi decoding, vectorized processing, and specialized domain functions. The combination of programmable processor cores with custom data-path accelerators within a single chip offers numerous advantages: performance improvements due to time-critical computations implemented in accelerators, reduced power consumption, faster internal communication between hardware and software, field programmability due to the programmable cores, and lower total system cost due to a single DSP chip solution. Such core-based ASIC solutions are especially attractive for portable applications typically found in digital cellular and cordless telephony, and they are likely to remain the prevailing solution for the foreseeable future.

If a processor is designed by jointly optimizing the architecture, the instruction set, and the programs for the application, one speaks of an application-specific integrated processor (ASIP). The applications may range from a small number of different algorithms to an entire signal processing application. A major drawback of ASIPs is that they require an elaborate support infrastructure which is economically justifiable only in large-volume applications.
The decision to implement an algorithm in software or as a custom data path (accelerator) depends on many issues.

Figure 10-2 Complexity versus Signal Bandwidth Plot

Seen from a purely computational-power point of view, algorithms can be categorized according to two parameters: signal bandwidth and number of operations per sample. The first parameter is a measure of the real-time processing requirement of the algorithm. The second is a measure of the complexity of the algorithm. In Figure 10-2 we have plotted complexity versus bandwidth on a double-logarithmic scale. A straight line in the graph corresponds to a processing device that performs a given number of instructions per second. Applications in the upper-right corner require massive parallel processing and pipelining and are the exclusive domain of ASICs. In contrast, in the lower-left corner the signal bandwidth is much smaller than the clock rate of a VLSI chip. Hence, hardware resources can be shared, and the programmable processor is almost always the preferred choice. For the region between these two extremes, resource sharing is possible using either a processor or an ASIC. There are no purely computational-power arguments in favor of either of the two solutions. The choice depends on other issues such as time-to-market and the capability profile of the design team, to mention two examples.

The rapid advance of microelectronics is illustrated by Figure 10-3. The complexity of VLSI circuits (measured in number of gates) increases tenfold every 6 years. This pattern has been observed for memory components and general-purpose processors over the last 20 years and appears to hold also for the DSP. The performance measured in MOPS (millions of operations per second) is related to the chip clock frequency, which follows a similar pattern. The complexity of software implemented in consumer products increases tenfold every 4 years [1].
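As a rough illustration of the computational-power argument above, the required throughput can be estimated as the product of signal bandwidth (sample rate) and operations per sample, and compared against a processor's capability. All numbers in this sketch are hypothetical.

```python
def required_mops(signal_bandwidth_hz, ops_per_sample):
    """Rough throughput estimate behind Figure 10-2: operations per second
    needed to keep up with the sample stream, expressed in millions (MOPS)."""
    return signal_bandwidth_hz * ops_per_sample / 1e6

# Hypothetical processor capability and workloads, for illustration only
dsp_capability_mops = 400.0
for bw, ops in [(8e3, 500), (1e6, 200), (40e6, 300)]:
    need = required_mops(bw, ops)
    verdict = "programmable DSP feasible" if need < dsp_capability_mops else "ASIC / massive parallelism"
    print(f"{bw/1e6:7.3f} MHz x {ops:4d} ops/sample -> {need:9.1f} MOPS: {verdict}")
```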
Figure 10-3 Complexity of VLSI Circuits
10.2 Hardware-Software Co-Design

The functionality in a DSP system is implemented in both hardware and software subsystems. But even within the software portions there is diversity: control-oriented processes (protocols) have different characteristics than data-flow-oriented processes (e.g., filtering). A DSP system design therefore not only mixes hardware design with software design but also mixes design styles within each of these categories.

One can distinguish between two opposing philosophies for system-level design [2]. One is the unified approach, which seeks a consistent semantics for the specification of the complete system. The other is a heterogeneous approach, which seeks to combine semantically disjoint subsystems. For the foreseeable future the latter appears to be the feasible approach. In the heterogeneous approach, for example, the design automation tool for modeling and analysis of algorithms is tightly coupled with tools for hardware and software implementation. This makes it possible to explore algorithm/architecture trade-offs in a joint optimization process, as will be discussed in the design case study of Section 10.4.
The partitioning of the functionality into hardware and software subsystems is guided by a multitude of (often conflicting) goals. For example, a software implementation is more flexible than a hardware implementation, because changes in the specification are possible in any design phase. On the negative side is the higher power consumption compared to an ASIC solution, which is a key issue in battery-operated terminals. Also, for higher volumes an ASIC is more cost effective.
Design cost and time become increasingly more important than chip processing costs. In many markets product life cycles will be very short. To compete successfully, companies will need to be able to turn system concepts into silicon quickly. This puts a high priority on computerized design methodology and tools in order to increase the productivity of engineering design teams.
10.3 Quantization and Number Representation

In this section we discuss the effect of finite word lengths in digital signal processing. There are two main issues to be addressed. First, whenever sampling was considered so far, we assumed that the samples were known with infinite precision, which of course is impossible. Each sample must therefore be approximated by a finite binary word. The process by which a real number is converted to a finite binary word is called quantization.

Second, when in digital processing the result of an operation contains more bits than can be handled by the process downstream, the word length must be reduced. This can be done either by rounding, truncation, or clipping.

In what follows we assume that the reader is familiar with the basics of binary arithmetic. As a refresher we suggest Sections 9.0 to 9.2 of the book by Oppenheim and Schafer [3].
A quantizer is a zero-memory nonlinear device whose output x_out is related to the input x_in according to

x_out = q_i    if  x_i ≤ x_in < x_{i+1}    (10-1)

where q_i is an output number that identifies the input interval [x_i, x_{i+1}).

Uniform quantization is the most widely used law in digital signal processing and the only one discussed here.
All uniform quantizer characteristics have the staircase form shown in Figure 10-4. They differ in the number of levels, the limits of operation, and the location of the origin.

Figure 10-4 Uniform Quantizer Characteristic with b = 3 Bits. Rounding to the nearest level is employed. The binary numbers are interpreted as 2's complement.
Every quantizer has a finite range that extends between the limits x_min and x_max. Any input value exceeding these limits is clipped. The output levels q_i may be interpreted either as integers or as binary fractions (i.e., with a step size Δq of 1 or 2^{-b}).
A quantizer exhibits small-scale nonlinearity within its individual steps and large-scale nonlinearity if operated in the saturation range. The amplitude of the input signal has to be controlled to avoid severe distortion of the signal from either nonlinearity. The joint operation of the analog-to-digital (A/D) converter and the AGC is of crucial importance to the proper operation of any receiver. The input amplitude control of the A/D converter is known as loading adjustment.
Quantizer characteristics can be categorized as possessing midstep or midriser staircases, according to their properties in the vicinity of zero input. Each type has its own advantages and drawbacks, and both are encountered extensively in practice.
In Figure 10-4 a midstep characteristic with an even number of levels, L = 2^3 = 8, is shown. The characteristic is obtained by rounding the input value to the nearest quantization level. A 2's complement representation of the binary numbers is used in this example. The characteristic exhibits a dead zone at the origin. When an error detector possesses such a dead zone, the feedback loop tends toward instability. This characteristic is thus to be avoided in such applications. The characteristic is asymmetric, since the number -1 is represented but not the number +1. If the quantizer operates in both saturation modes, it will therefore produce a nonzero mean output despite the fact that the input signal has zero mean. The characteristic can easily be made symmetric by omitting the most negative value.
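A minimal sketch of the rounding (midtread) characteristic of Figure 10-4, using an illustrative normalization in which the codes represent multiples of 1/4; it reproduces the dead zone and the asymmetry discussed above.

```python
def midtread_quantize(x, bits=3, step=0.25):
    """Rounding (midtread) quantizer as in Figure 10-4: b-bit two's complement code,
    output value = code * step. Note the dead zone around zero and the asymmetry:
    -1 is representable, +1 is not."""
    code = round(x / step)                                        # round to the nearest level
    code = max(-2 ** (bits - 1), min(2 ** (bits - 1) - 1, code))  # saturate at the range limits
    return code, code * step

for x in (-1.2, -0.1, 0.1, 0.9):
    code, value = midtread_quantize(x)
    print(f"{x:+.2f} -> code {code:+d}, value {value:+.2f}")
```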
A different characteristic is obtained by truncation. Truncation is the operation which chooses the largest integer less than or equal to x_in/Δx. For example, for x_in/Δx = 0.8 we obtain INT(0.8) = 0, but for x_in/Δx = -3.4 we obtain INT(-3.4) = -4. In Figure 10-5 the characteristic obtained by truncation is shown. A 2's complement representation of the binary numbers is used.

This characteristic is known as an offset quantizer.³ Since it is no longer symmetric, it will bias the output signal even for small signal amplitudes. This leads to difficulties in applications where a zero-mean output is required for a zero-mean input signal.
A midriser characteristic is readily obtained from the truncated characteristic in Figure 10-5 by increasing the word length of the quantizer output by 1 bit and setting the LSB (least significant bit) to 1 for all words (Figure 10-6).

³ In practical A/D converters the origin can be shifted by adjusting an input offset voltage.
Figure 10-5 Offset Quantizer Employing Truncation. b = 3; binary numbers interpreted as 2's complement.
Notice that the extra bit need not be produced in the physical A/D converter but can be added in the digital processor after A/D conversion.

The midriser characteristic is symmetric and has no dead zone around zero input. A feedback loop whose error detector has a discontinuous step at zero will dither about the step, which is preferable to the limit cycles induced by a dead zone.
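A minimal sketch of the truncation-plus-appended-LSB construction described above, again with an illustrative normalization; note the symmetric, dead-zone-free output levels at odd multiples of half a step.

```python
import math

def truncating_quantize(x, bits=3, step=0.25):
    """Offset quantizer of Figure 10-5: truncate (floor) to the next lower level."""
    code = math.floor(x / step)
    return max(-2 ** (bits - 1), min(2 ** (bits - 1) - 1, code))

def midriser_quantize(x, bits=3, step=0.25):
    """Midriser characteristic of Figure 10-6: truncate, then extend the word by one
    bit whose value is always 1, i.e. output word = 2*code + 1 (half-step levels)."""
    code = truncating_quantize(x, bits, step)
    word = 2 * code + 1              # appended LSB is fixed to 1
    return word, word * step / 2

for x in (-0.05, 0.05, 0.3):
    word, value = midriser_quantize(x)
    print(f"{x:+.2f} -> word {word:+d}, value {value:+.3f}")
```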
Figure 10-6 Midriser Characteristic Obtained by Adding an Additional Bit to the Quantizer Output
Figure 10-7 Quantizer Characteristic Obtained by Magnitude Truncation
The effect of the number representation on the quantizer characteristic is illustrated in Figure 10-7. In some applications it is advantageous to employ a sign-magnitude representation.

Figure 10-7 shows the characteristic obtained by truncation of the magnitude of the input signal. This is no longer a uniform quantizer: the center interval has double width. When the result of an operation contains more bits than can be handled downstream, the result must be shortened. The effect of rounding, truncation, or clipping is different for the various number representations.

The resulting quantization characteristic is analogous to that obtained earlier, with the exception that now both input and output are discretized. However, quantizing discrete values is more susceptible to causing biases than quantizing continuous values; it should therefore be performed even more carefully.
10.4 ASIC Design Case Study
In this case study we describe the design of a complete receiver chip for digital video broadcasting over satellite (DVB-S) [4]. The symbol rate of DVB is on the order of 40 Msymbols/s. The chip was realized in 0.5 μm CMOS technology with a (maximum) clock frequency of 88 MHz. The complexity of the operations and the symbol rate locate it in the upper-right corner of the complexity versus bandwidth plot of Figure 10-2. We outline the design flow of the project, the receiver structure, and the rationale of the decision making for its building blocks.

10.4.1 Implementation Loss

In an ASIC realization the chip area is, to a first approximation, proportional to the word length. Since the cost of a chip increases rapidly with area, choosing the quantization parameters is a major task.
The finite word length representation of numbers in a properly designed digital receiver ideally has the same effect as an additional white noise term. The resulting decrease of the signal-to-noise ratio is called the implementation loss. A second degradation with respect to perfect synchronization is caused by the variance of the synchronization parameter estimates and was called detection loss (see Chapter 7). The sum of these two losses, D_total, is the decrease of the signal-to-noise ratio with respect to a receiver with perfect synchronization and perfect implementation. It is exemplarily shown in Figure 10-8 for an 8-PSK trellis-coded modulation [5]. The left curve shows the BER for a system with perfect synchronization and infinite-precision arithmetic, while the dotted line shows the experimental results. It is seen that the experimental curve is indeed approximately obtained by shifting the perfect-system performance curve by D_total to the right.

Quantization is a nonlinear operation. It exhibits small-scale nonlinearity in its individual steps and large-scale nonlinearity in the saturation range. Its effect depends on the specific algorithm; it cannot be treated in general terms.
Figure 10-8 Loss D_total of the Experimental Receiver DIRECS (BER versus SNR (E_s/N_0); curves: trellis code, unquantized, and DIRECS experimental results)
In a digital receiver the performance measure of interest is the bit error rate. We are allowed to nonlinearly distort the signal as long as the processed signal represents a sufficient statistic for detection of acceptable accuracy. For this reason, quantization effects in digital receivers are distinctly different from those in other areas of digital signal processing (such as audio signal processing), which require a virtually quantization-error-free representation of the analog signal.
10.4.2 Design Methodology
At this level of complexity, system simulation is indispensable to evaluate the performance characteristics of the system with respect to given design alternatives. The design framework should provide the designer with a flexible and efficient environment to explore the alternatives and trade-offs on different levels of abstraction. This comprises investigations on the

- structural level, e.g., joint or separate carrier and timing synchronization
- algorithmic level, e.g., various estimation algorithms
- implementation level, e.g., architectures and word lengths

There is a strong interaction between these levels of abstraction. The principal task of a system engineer is to find a compromise between implementation complexity and system performance. Unfortunately, the complexity of the problem prevents a formalization of this optimization problem. Thus, practical system design is partly based on rough complexity estimates and experience, particularly at the structural level.

Typically a design engineer works hierarchically to cope with the problems of a complex system design. In a first step, structural alternatives are investigated. The next step is to transform the design into a model that can be used for system simulation. Based on this simulation, algorithmic alternatives and their performance are evaluated. At first this can be done without word-length considerations and may already lead to modifications of the system structure. The third step comprises developing the actual implementation, which requires a largely fixed structure in order to obtain complexity estimates of sufficient accuracy. At this step, bit-true modeling of all imperfections due to limited word lengths is indispensable to assess the final system performance.
10.4.3 Digital Video Broadcast Specification
The main points of the DVB standard are summarized in Table 10-1.

Table 10-1 Outline of the DVB Standard

Modulation: QPSK with root-raised cosine pulses (excess bandwidth α = 0.35) and Gray encoding
Convolutional channel coding: punctured inner code, rates R = 1/2 to 7/8
Example symbol rates: 20 ... 44 Msymbols/s

The data rate is not specified as a single value but is suggested to lie within the range of 18 to 68 Mb/s. The standard defines a concatenated coding scheme consisting of an inner convolutional code and an outer Reed-Solomon (RS) block code.
Figure 10-9 displays the bit error rate versus E_b/N_0 after the convolutional decoder for the two code rates R = 1/2 and R = 7/8 under the assumption of perfect synchronization and perfect convolutional decoder implementation. The output of the outer RS code is supposed to be quasi-error-free (one error per hour). The standard specifies a BER of 2 × 10^-4 at E_b/N_0 = 4.2 dB for R = 1/2 and at E_b/N_0 = 6.15 dB for code rate R = 7/8. This leaves a margin of 1 dB (see Figure 10-9) for the implementation loss of the complete receiver. This loss must also take into account the degradation due to the analog front end (AGC, filter, oscillator for down conversion). In the actual implementation the total loss was equally split into 0.5 dB for the analog front end and 0.5 dB for the digital part.
10.4.4 Receiver Structure
Figure 10-10 gives a structural overview of the receiver. A/D conversion is done at the earliest possible point of the processing chain. The costly analog parts are thus reduced to the minimum radio-frequency components. Down conversion and sampling are accomplished by free-running oscillators.

In contrast to analog receivers, where down conversion and phase recovery are performed simultaneously by a PLL, the two tasks are separated in a digital receiver. The received signal is first down converted to baseband with a free-running oscillator at approximately the carrier frequency f_c. This leaves a small residual normalized frequency offset Ω.

The digital part consists of the timing and phase synchronization units, the Viterbi and RS decoders, frame synchronizer, convolutional deinterleaver, descrambler, and the MPEG decoder for the video data. A microcontroller interacts via an I2C bus with the functional building blocks. It controls the acquisition process of the synchronizers and is used to configure the chip.
10.4.5 Input Quantization
The input signal to the A/D converter comes from a noncoherent AGC (see Volume 1, p. 278); the signal-to-noise ratio is unknown to the A/D converter. We must consider both large-scale and small-scale quantization effects.

An A/D converter can be viewed as a series connection of a soft limiter and a quantizer with an infinite number of levels. We denote the normalized overload level of the limiter by

V(ρ_i) = C_c(ρ_i) / √(P_s)

with

C_c(ρ_i): threshold of the soft limiter
P_s: signal power; P_n: noise power
ρ_i = P_s / P_n: signal-to-noise ratio of the input signal
Threshold level C_c, interval width Δx, and word length b are related by (Figure 10-11)

C_c + Δx = 2^{b-1} Δx    (10-4)
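Solving (10-4) for the quantization step gives a handy design relation; as a worked example, for the word lengths considered here:

\[
\Delta x = \frac{C_c}{2^{b-1}-1}, \qquad b = 3 \;\Rightarrow\; \Delta x = \frac{C_c}{3}, \qquad b = 4 \;\Rightarrow\; \Delta x = \frac{C_c}{7}.
\]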
Two problems arise:

1. What is the optimum overload level V(ρ_i)?
2. Since the signal-to-noise ratio is unknown, the sensitivity of the receiver performance with respect to a mismatch of the overload level V(ρ_i) must be determined.
Figure 10-11 A/D Conversion Viewed as Series Connection of Soft Limiter and Infinitely Extended Quantizer (b: number of bits, here b = 3; C_c: soft limiter threshold)
It is instructive to first consider the simple example of a sinusoidal signal plus Gaussian noise. We determine V(ρ_i) subject to the optimization criterion (the selection of this criterion will be justified later on)

E[(Q(x | C_c, b) - x)²] → min    (10-5)

with Q(x | C_c, b) the uniform midriser quantizer characteristic with parameters (C_c, b). In eq. (10-5) we are thus looking for the uniform quantizer characteristic which minimizes the quadratic error between the input signal and the quantizer output. The result of the optimization task is shown in Figure 10-12. In this figure the overload level is plotted versus ρ_i with the word length b as parameter. With P_s = A²/2, A the amplitude of the sinusoidal signal, we obtain for high SNR

V(ρ_i) = √2 · C_c(ρ_i)/A,    ρ_i ≫ 1    (10-6)

For large word length b we expect that the useful signal passes the limiter undistorted, i.e., C_c(ρ_i)/A ≈ 1 and V(ρ_i) ≈ √2. For a 4-bit quantization we obtain V(ρ_i) ≈ 1.26, which is close to √2. The value of V(ρ_i) decreases with decreasing word length. The overload level increases at low ρ_i: for a sufficiently fine quantization, V(ρ_i) becomes larger than the amplitude of the useful signal in order to pass larger amplitude values due to noise.
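A minimal Monte-Carlo sketch of the optimization criterion (10-5), assuming a unit-power sinusoid in Gaussian noise and a simple midriser quantizer whose step size follows (10-4); the function names, sample counts, and the quantizer model are illustrative only.

```python
import numpy as np

def midriser_quantize(x, clip_level, bits):
    """Uniform midriser quantizer with overload level clip_level and word length bits."""
    dx = clip_level / (2 ** (bits - 1) - 1)          # step size from (10-4)
    idx = np.floor(x / dx)
    idx = np.clip(idx, -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return (idx + 0.5) * dx                          # reconstruction at mid-step levels

def mse_for_overload(v, rho_db, bits, n=200_000, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate of E[(Q(x)-x)^2] for a unit-power sinusoid in Gaussian noise."""
    ps = 1.0                                         # normalized signal power
    pn = ps / 10 ** (rho_db / 10)                    # noise power from the SNR rho_i
    phase = rng.uniform(0, 2 * np.pi, n)
    x = np.sqrt(2 * ps) * np.sin(phase) + rng.normal(0, np.sqrt(pn), n)
    cc = v * np.sqrt(ps)                             # overload level V = C_c / sqrt(P_s)
    return np.mean((midriser_quantize(x, cc, bits) - x) ** 2)

# Sweep the normalized overload level for b = 4 and a high input SNR,
# mimicking the kind of optimization behind Figure 10-12.
vs = np.linspace(0.5, 3.0, 51)
errors = [mse_for_overload(v, rho_db=20.0, bits=4) for v in vs]
print("optimum V ~", vs[int(np.argmin(errors))])    # the text above reports ~1.26 for b = 4
```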
We return to the optimality criterion of eq. (10-5), which was selected in order not to discard information prematurely. In the present case this implies passing the input signal amplitude undistorted to the ML decoder. It is well known that the ML decoder requires soft-decision inputs for optimum performance; the bit error rate increases rapidly for hard-quantized inputs. We thus expect that minimizing the quadratic error of eq. (10-5) preserves the soft-decision information needed by the decoder.
Figure 10-12 Optimum Normalized Overload Level V(ρ_i) for a Sinusoidal Signal plus Gaussian Noise. The parameter is the word length b.

Figure 10-13 Small-Scale Effects of Input Quantization (trellis-encoded 8-PSK modulation)
The small-scale effects of input quantization are shown exemplarily in Figure 10-13 for an 8-PSK modulation over an additive Gaussian noise channel [5]. From this figure we conclude that a 4-bit quantization is sufficient and a 5-bit quantization is practically indistinguishable from an infinite-precision representation.

The bit error performance of Figure 10-13 assumes an optimum overload factor V(ρ_i). To determine the sensitivity of the bit error rate to a deviation from the optimum value, a computer experiment was carried out. The result is shown in Figure 10-14. The clipping level is normalized to the square root of the signal power, √(P_s). The BER in Figure 10-14 is plotted for the two smallest values of E_b/N_0. The input signal-to-noise ratio, E_s/N_0, is related to E_b/N_0 via eq. (3-33):

(10-7)
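As a rough illustration only (an assumption, not necessarily the book's eq. (3-33)): for QPSK with 2 bits per symbol, inner code rate R and outer RS rate 188/204, the conversion would read

\[
\frac{E_s}{N_0} = \log_2(M)\, R \,\frac{188}{204}\,\frac{E_b}{N_0}
\;\Rightarrow\;
\left.\frac{E_s}{N_0}\right|_{\mathrm{dB}} \approx \left.\frac{E_b}{N_0}\right|_{\mathrm{dB}} + 2.1\ \mathrm{dB}
\quad (M = 4,\ R = 7/8).
\]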
Figure 10-14 BER versus Normalized Clipping Level (legend: E_b/N_0 = 5.4 dB, T_s/T = 0.4002, ΩT = 0.0, R = 7/8)
For both input values of E_s/N_0 the results are plotted for zero and maximum residual frequency offset, |ΩT|_max = 0.1213. The sampling rate is T_s/T = 0.4002. The BER is minimal for a normalized clipping level larger than 1. A design point for the normalized clipping level C_c/√(P_s) (eq. (10-8)) was selected. The results show a strong asymmetry with respect to the sign of the deviation from the optimum value. Clipping, C_c/√(P_s) < (C_c/√(P_s))_opt, strongly increases the BER, since the ML receiver is fed with hard-quantized input signals, which degrades its performance. The opposite case, C_c/√(P_s) > (C_c/√(P_s))_opt, is far less critical, since it only increases the quantization noise by increasing the resolution Δx (eq. (10-9)).
Figure 10-15 Synchronizer Structure
10.4.6 Timing and Phase Synchronizer Structure
The first step in the design process is the selection of a suitable synchronizer structure. Timing and phase synchronizers are separated (see Figure 10-15), which avoids interaction between these units. From a design point of view this separation is also advantageous, since it eases performance analysis and thus reduces design time and test complexity. An error feedback structure was chosen for both units for the following reasons: video broadcasting data is transmitted as a continuous stream, so only an initial acquisition process, which is not time-critical, has to be performed. For tracking purposes, error feedback structures are well suited and realizable with reasonable complexity. Among the candidate algorithms initially considered for timing recovery was the square and filter algorithm (Section 5.4). The algorithm works independently of the phase. It delivers an unambiguous estimate, requires no acquisition unit, and is simple to implement. This ease of implementation, however, exists only for a known nominal ratio of T/T_s = 4. Since the ratio T/T_s is only known to lie in the interval [2, 2.5], the square and filter algorithm is ruled out for this application.
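For reference, a minimal sketch of the square-and-filter timing estimator of Section 5.4, assuming the convenient case of exactly four samples per symbol (T/T_s = 4); it is this reliance on a fixed, known oversampling ratio that rules the algorithm out here.

```python
import numpy as np

def square_and_filter_timing(x, samples_per_symbol=4):
    """Feedforward timing estimate from the squared envelope (sketch of the
    square-and-filter idea; assumes exactly 4 samples per symbol)."""
    k = np.arange(len(x))
    # Spectral component of |x|^2 at the symbol rate 1/T = 1/(4 Ts)
    c = np.sum(np.abs(x) ** 2 * np.exp(-1j * 2 * np.pi * k / samples_per_symbol))
    # Normalized timing estimate in [-0.5, 0.5) symbol periods
    return -np.angle(c) / (2 * np.pi)
```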
10.4.7 Digital Phase-Locked Loop (DPLL) for Phase Synchronization
The detailed block diagram of the DPLL is shown in Figure 10-16. In this figure the input word length of the individual blocks and the output truncation operations are shown. The word lengths of the DPLL are found by bit-true computer simulation. The notation used is summarized in Figure 10-17 below.
Trang 18Ki=3:15
Ks=8:13
Figure lo-16 Block Diagram of the DPLL for Carrier Phase Synchronization
be within min 5 K,., S max
Figure lo-17 Notation Used in Figure lo-16
We next discuss the functional building blocks in some detail. The incoming signal is multiplied by a rotating phasor exp[j(Ω̂ nT/2 + θ̂)] by means of a CORDIC algorithm [6,7], subsequently filtered in the matched filter, and decimated to symbol rate. One notices that the matched filter is placed inside the closed loop. This is required to achieve a sufficiently large SNR at the phase error detector input. The loop filter has a proportional plus integral path. The output rate of the loop filter is doubled to 2/T by repeating each value, which is subsequently accumulated in the NCO. The accumulator is the digital equivalent of the integrator of the VCO in an analog PLL. The modulo-2π reduction (shown as a separate block) of the accumulator is automatically performed by the adder if a 2's complement number representation is used. The DPLL is brought into lock by applying a sweep value to the accumulator in the loop filter. The sweep value and the closing of the loop after lock detection are controlled by the acquisition control block.
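A toy numerical sketch (not the chip's actual parameters) of the proportional-plus-integral loop filter driving an NCO accumulator; the wraparound of the fixed-width accumulator is what realizes the modulo-2π reduction for free, as described above. The gains, word length, and the stand-in phase detector are assumptions for illustration.

```python
import numpy as np

ACC_BITS = 16                      # illustrative accumulator word length
FULL_SCALE = 1 << ACC_BITS         # 2^16 accumulator states span one full turn (2*pi)

def loop_filter(err, integrator, kp=0.01, ki=0.001):
    """Proportional-plus-integral loop filter (gains are illustrative)."""
    integrator += ki * err
    return kp * err + integrator, integrator

def nco_update(acc, increment):
    """NCO accumulator: the adder wraps around like a two's-complement register,
    which automatically implements the modulo-2*pi phase reduction."""
    return (acc + increment) % FULL_SCALE

def acc_to_phase(acc):
    """Map the accumulator word to a phase in [0, 2*pi)."""
    return 2 * np.pi * acc / FULL_SCALE

# Example: track a small constant frequency offset with a stand-in phase detector
true_phase, acc, integ = 0.3, 0, 0.0
for _ in range(2000):
    err = np.angle(np.exp(1j * (true_phase - acc_to_phase(acc))))   # wrapped phase error
    ctrl, integ = loop_filter(err, integ)
    acc = nco_update(acc, int(round(ctrl * FULL_SCALE / (2 * np.pi))))
    true_phase += 0.001                                             # residual frequency offset
print("residual phase error [rad]:", err)
```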
Matched Filter
The complex-valued matched filter is implemented as two equivalent real FIR filters with identical coefficients. To determine the number of taps and the word length of the filter coefficients, Figure 10-18 is helpful. The lower part shows the number of coefficients which can be represented for a given numerical value of the center tap. As an example, assume a center value of h_0 = 15, which can be represented by a 5-bit word in 2's complement representation. From Figure 10-18 it follows that the number of nonzero coefficients is then nine. Increasing the word length of h_0 to 6 bits, the maximum number of coefficients is nine for h_0 ≤ 21 and 14 for 21 < h_0 ≤ 31. The quadratic approximation error is shown in the upper part of Figure 10-18, again as a function of the center tap value. The error decays slowly for values h_0 > 15, while the number of filter taps and the word length increase rapidly. As a compromise between accuracy and complexity, the design point of Figure 10-18 and Table 10-2 was chosen.

In the following we explain the detailed considerations that lead to the hardware implementation of the matched filter. This example serves to illustrate the close interaction between algorithm and architecture design found in DSP systems.

A suitable architecture for an FIR filter is the transposed direct form (Figure 10-19). Since the coefficients are fixed, the area-intensive multipliers can be replaced by sequences of shift-and-add operations. As we see in Figure 10-20, each "1" of a coefficient requires an adder. Thus, we encounter an additional constraint on the system design: the coefficients should be chosen such that the number of "1"s is minimized.
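A behavioral sketch of the transposed direct form mentioned above, in floating point for clarity; in the ASIC, the products for the fixed coefficients would be realized as shift-and-add networks rather than general multipliers.

```python
def fir_transposed(x, b):
    """Transposed direct-form FIR filter: the current input sample feeds every tap
    at once, and a chain of registers holds the running partial sums."""
    s = [0.0] * (len(b) - 1)        # partial-sum registers between the taps
    y = []
    for xn in x:
        y.append(b[0] * xn + (s[0] if s else 0.0))
        for i in range(len(s) - 1):
            s[i] = b[i + 1] * xn + s[i + 1]
        if s:
            s[-1] = b[-1] * xn
    return y

# With fixed coefficients, each product b[k]*xn becomes a few shifts and adds in hardware.
print(fir_transposed([1, 0, 0, 0], [1, 2, 3]))   # impulse response: [1, 2, 3, 0]
```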
Figure 10-18 Matched Filter Approximation (quantization results for the matched filter at 2/T, plotted versus the value of the center tap)

Figure 10-19 FIR Filter in Transposed Direct Form
Figure 10-20 Replacement of Multipliers by Shift-and-Add Operations

The number of 1's can be further reduced by changing the number representation of the coefficients. In filters with variable coefficients (e.g., in the interpolator filter) this can be performed by Booth encoding [8]. For fixed-coefficient filters, each sequence of 1's (without the most significant bit) in a 2's complement number can be described in a canonical signed digit (CSD) representation with at most two nonzero digits:

Σ_{n=N}^{M} 1·2^n = 1·2^{M+1} - 1·2^N    (10-10)

This leads to the matched filter coefficients in CSD format shown in Table 10-2.
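A small sketch of a CSD (non-adjacent form) conversion consistent with the identity (10-10); the routine and the example value are illustrative, not the coefficients of Table 10-2.

```python
def to_csd(value, bits):
    """Convert an integer to canonical signed digit (CSD) form: a list of digits
    in {-1, 0, +1}, LSB first, with no two adjacent nonzero digits."""
    digits = []
    x = value
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)          # +1 if x mod 4 == 1, -1 if x mod 4 == 3
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    digits += [0] * max(0, bits - len(digits))
    return digits

# Example: 15 = 16 - 1 needs only two nonzero digits in CSD instead of four ones
print(to_csd(15, 6))   # [-1, 0, 0, 0, 1, 0]
```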
As a result, we need nine adders to implement a branch of the matched filter. For these adders different implementation alternatives exist. The simplest is the carry ripple adder (Figure 10-21). It consists of full adder cells. This adder requires the smallest area but is the slowest one, since the critical path consists of the entire carry path.
Table 10-2 Matched Filter Coefficients (coefficient, numerical value, 2's complement, and canonical signed digit representation)

Figure 10-21 Carry Ripple versus Carry Save Adder (full adder and half adder cells; carry save representation)
By choosing an alternative number representation of the intermediate results, we obtain a speedup at a slightly increased area: the carry output of each adder is fed into the inputs of the following filter tap. This carry save format is a redundant number representation, since each numerical value can be expressed by different binary representations. Now, the critical path consists of one full adder cell only.
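A bit-level sketch of the carry-save idea: intermediate results are kept as a redundant (sum, carry) pair, so each new addition passes through only one full-adder level; a single conventional addition (the vector merging adder mentioned below) resolves the final value. Word width and operands are illustrative.

```python
def carry_save_add(sum_bits, carry_bits, addend, width=16):
    """Add 'addend' to a number held in redundant carry-save form (sum, carry).
    Each bit position is a full adder; no carry ripples across positions."""
    mask = (1 << width) - 1
    new_sum = (sum_bits ^ carry_bits ^ addend) & mask          # full-adder sum bits
    new_carry = (((sum_bits & carry_bits) |
                  (sum_bits & addend) |
                  (carry_bits & addend)) << 1) & mask          # full-adder carries, shifted left
    return new_sum, new_carry

# Accumulate a few operands; only at the end is a conventional (vector-merging) add needed
s, c = 0, 0
for operand in (13, 7, 25, 2):
    s, c = carry_save_add(s, c, operand)
print((s + c) & 0xFFFF)   # 47: the final add merges the redundant representation
```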
The word length of the intermediate results increases from left to right in Figure 10-19. This implies a growing size of the adders. By reordering the partial products, or "bitplanes" [9, 10, 11], in such a way that the smallest intermediate results are added first, the increase of word length is kept as small as possible. Thus, the silicon area is minimized. Taking into account that the requirements on processing speed and the silicon technology allow placing three bitplanes between two register slices, we get the structural block diagram of the matched filter depicted in Figure 10-22.
Figure 10-23 Detailed Block Diagram of the Matched Filter

An additional advantage of the reordering of the bitplanes can be seen in Figure 10-23, which shows the structure in detail. It consists of full and half adder cells and of registers. The carry overflow correction cells [10, 12] are required to reduce the word length of the intermediate results in carry save representation. The vector merging adder (VMA) converts the filter output from carry save back to 2's complement representation. Due to the early processing of the bitplanes with the smallest numerical coefficient values, the least significant bits are computed first and may be truncated without side effects. The word length of the VMA is decreased as well.
Phase Error Detector
A decision-directed detector with hard-quantized decisions is used (see Section 5.8):

g(θ_0 - θ̂) = Im[ â_n* z_n(ε̂) e^{-jθ̂} ]    (10-11)

with

â_n = sign{Re[z_n(ε̂) e^{-jθ̂}]} + j sign{Im[z_n(ε̂) e^{-jθ̂}]}    (10-12)

Inserting the hard-quantized symbols â_n into the previous equation, we obtain

g(θ_0 - θ̂) = Re(â_n) Im[z_n(ε̂) e^{-jθ̂}] - Im(â_n) Re[z_n(ε̂) e^{-jθ̂}]    (10-13)
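A compact sketch of the detector equations (10-11)-(10-13) for QPSK; numpy is used purely for illustration, whereas the hardware computes the same sign-and-multiply operations with adders.

```python
import numpy as np

def dd_phase_error(z_n, theta_hat):
    """Decision-directed phase error, eq. (10-13): hard decisions (signs of the
    de-rotated matched-filter output) multiplied with the quadrature components."""
    y = z_n * np.exp(-1j * theta_hat)                    # de-rotate by the phase estimate
    a_hat = np.sign(y.real) + 1j * np.sign(y.imag)       # hard-quantized decision, eq. (10-12)
    return a_hat.real * y.imag - a_hat.imag * y.real     # eq. (10-13)

# Example: a QPSK symbol received with a +0.1 rad phase offset gives a positive error
print(dd_phase_error((1 + 1j) * np.exp(1j * 0.1), theta_hat=0.0))
```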