Quantized Filter Analysis


Introduction to Digital Signal Processing and Filter Design, by B. A. Shenoi

[...] of $\sim 2^{-52} = 2.22 \times 10^{-16}$. Obviously this range is so large and the precision with which the numbers are expressed is so small that the numbers can be assumed to have almost "infinite precision." Once these digital filters and DFT-IDFT have been obtained by the procedures described so far, they can be further analyzed by mainframe computers, workstations, and PCs under "infinite precision." But when the algorithms describing the digital filters and FFT computations have to be implemented as hardware in the form of special-purpose microprocessors or application-specific integrated circuits (ASICs) or the digital signal processor (DSP) chip, many practical considerations and constraints come into play. The registers used in these hardware systems to store the numbers have finite length, and the memory capacity required for processing the data is determined by the number of bits, also called the wordlength, chosen for storing the data. More memory means more power consumption and hence the need to minimize the wordlength. In microprocessors and DSP chips and even in workstations and PCs, we would like to use registers with as few bits as possible and yet obtain high computational speed, low power, and low cost. But portable devices such as cell phones and personal digital assistants (PDAs) have a limited amount of memory and contain batteries with low voltage and short duration of power supply. These constraints become more severe in other devices such as digital hearing aids and biomedical probes embedded in capsules to be swallowed. So there is a great demand for designing digital filters and the systems in which they are embedded with the lowest possible number of bits to represent the data or to store the data in their registers.

When the filters are built with registers of finite length and the analog-to-digital converters (ADCs) are designed to operate at increasingly high sampling rates, thereby reducing the number of bits with which the samples of the input signal are represented, the frequency response of the filters and the results of DFT-IDFT computations via the FFT are expected to differ from those designed with "infinite precision." This process of representing the data with a finite number of bits is known as quantization, which occurs at several points in the structure chosen to realize the filter or the steps in the FFT computation of the DFT-IDFT. As pointed out in the previous chapter, a vast number of structures are available to realize a given transfer function when we assume infinite precision. But when we design the hardware with registers of finite length to implement their corresponding difference equation, the effect of finite wordlength is highly dependent on the structure. Therefore we find it necessary to analyze this effect for a large number of structures. This analysis is further complicated by the fact that quantization can be carried out in several ways and the arithmetic operations of addition and multiplication of numbers with finite precision yield results that are influenced by the way that these numbers are quantized.
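As a rough numerical illustration of the contrast between "infinite precision" and a short wordlength, the following Python sketch (not taken from the book, and not the FDA Tool) prints the spacing of double-precision numbers near 1 and then rounds one coefficient to a few fractional bits. The helper name `quantize` and the sample coefficient are assumptions chosen only for illustration.

```python
import sys

def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (hypothetical helper)."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step

# Spacing between 1.0 and the next larger double-precision number.
eps = sys.float_info.epsilon
print(eps == 2.0 ** -52, eps)

# An illustrative coefficient stored with different fractional wordlengths.
coefficient = 0.7071067811865476
for bits in (4, 8, 16):
    q = quantize(coefficient, bits)
    print(bits, q, abs(q - coefficient))
```

The error printed in the last column is bounded by half a quantization step, $2^{-(b+1)}$ for $b$ fractional bits, which is many orders of magnitude larger than the double-precision spacing.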

In this chapter, we discuss a new MATLAB toolbox called FDA Tool, available¹ for analyzing and designing the filters with a finite number of bits for the wordlength. The different forms of representing binary numbers and the results of adding and multiplying such numbers will be explained in a later section of this chapter. The third factor that influences the deviation of filter performance from the ideal case is the choice of FIR or IIR filter. The type of approximation chosen for obtaining the desired frequency response is another factor that also influences the effect of finite wordlength. We discuss the effects of all these factors in this chapter, illustrating their influence by means of a design example.

7.2 FILTER DESIGN – ANALYSIS TOOL

An enormous amount of research has been carried out to address these problems, but analyzing the effects of quantization on the performance of digital filters and systems is not well illustrated by specific examples. Although there is no analytical method available at present to design or analyze a filter with finite precision, some useful insight can be obtained from the research work, which serves as a guideline in making preliminary decisions on the choice of suitable structures and quantization forms. Any student interested in this research work should read the material on finite wordlength effects found in other textbooks [1,2,4]. In this chapter, we discuss the software for filter design and analysis that has been developed by The MathWorks to address the abovementioned problem.² This filter design–analysis (FDA) tool, found in the Filter Design Toolbox, works in conjunction with the Signal Processing (SP) Toolbox. Unlike the SP Toolbox, the FDA Tool has been developed by making extensive use of the object-oriented programming capability of MATLAB, and the syntax for the functions available in the FDA Tool is different from the syntax for the functions we find in MATLAB and the SP Toolbox. When we log on to MATLAB and type [...] functions as command lines to design and analyze quantized filters, whereas the other screen is a graphical user interface (GUI) to serve the same purpose. The GUI window shown in Figure 7.1a displays a dialog box with an immense array of design options as explained below.

¹ MATLAB and its Signal Processing Toolbox are found in computer systems of many schools and universities but the FDA Tool may not be available in all of them.

First we design a filter with double precision on the GUI window using the FDA Tool or on the command window using the Signal Processing Toolbox and then import it into the GUI window. In the dialog box for the FDA Tool, we can choose the following options under the Filter Type panel:

1. Lowpass
2. Highpass
3. Bandpass
4. Bandstop
5. Differentiator. By clicking the arrow on the tab for this feature, we get the following additional options: [...]

• Constrained least-pth norm

² The author acknowledges that the material on the FDA Tool described in this chapter is based on the Help Manual for Filter Design Toolbox found in MATLAB version 6.5.


To the right of the panel for design method is the one for filter order. We can either specify the order of the filter or let the program compute the minimum order (by use of SP Tool functions Chebord, Buttord, etc.). Remember to choose an odd order for the lowpass filter when it is to be designed as a parallel connection of two allpass filters, if an even number is given as the minimum order. Below this panel is the panel for other options, which are available depending on the abovementioned inputs. For example, if we choose an FIR filter with the window option, this panel displays an option for the windows that we can choose. By clicking the button for the windows, we get a dropdown list of more than 10 windows. To the right of this panel are two panels that we use to specify the frequency specifications, that is, to specify the sampling frequency, cutoff frequencies for the passband and stopband, the magnitude in the passband(s) and stopband(s), and so on depending on the type of filter and the design method chosen. These can be expressed in hertz, kilohertz, megahertz, gigahertz, or normalized frequency. The magnitude can be expressed in decibels, as magnitude squared, or as actual magnitude, as displayed when we click Analysis in the main menu bar and then click the option Frequency Specifications in the dropdown list. The frequency specifications are displayed in the Analysis panel, which is above the panel for frequency specifications, when we start with the filter design.

The options available under any of these categories are dependent on the other options chosen. All the FDA Tool functions, which are also the functions of the SP Tool, are called overloaded functions. After all the design options are chosen, we click the Design Filter button at the bottom of the dialog box. The program designs the filter and displays the magnitude response of the filter in the [...] icons shown above this area, the Analysis area displays one of the following features:

• Magnitude response

• Phase response

• Magnitude and phase response

• Group delay response


[...] are available in a figure displayed under the SP Tool. For example, by clicking the Edit button and then selecting either Figure Properties, Axis Properties, [...] properties of these three objects can be modified.

Finally, we look at the first panel titled Current Filter Information. This lists the structure, order, and number of sections of the filter that we have designed. Below this information, it indicates whether the filter is stable and points out whether the source is the designed filter (i.e., the reference filter designed with double precision) or the quantized filter with a finite wordlength. The default structure for the IIR reference filter is a cascade connection of second-order sections, and for the FIR filter, it is the direct form. When we have completed the design of the reference filter with double precision, we verify whether it meets the desired specification, and if we wish, we can convert the structure of the reference filter to any one of the other types listed below. We click the Edit button on the main menu and then the Convert Structure button. A dropdown list shows the structures to which we can convert from the default structure or the one to which we have already converted.

For IIR filters, the structures are

1. Direct form I
2. Direct form II
3. Direct form I transposed
4. Direct form II transposed

[...] $\tfrac{1}{2}[A_1(z) - A_2(z)]$, respectively. The allpass filters $A_1(z)$ and $A_2(z)$ are realized in the form of lattice allpass structures like the one shown in Figure 6.19b. The MA and AR structures are considered special cases of the lattice ARMA structure, which are also discussed in Chapter 6.

For FIR ﬁlters, the options for the structures are

• Direct-form FIR

• Direct-form FIR transposed

• Direct-form symmetric FIR

When we have converted to a new structure, the information that can be displayed in the Analysis area, like the coefficients of the filter, changes. We would also like to point out that any one of the lowpass, highpass, bandpass, and bandstop filters that we have designed can be converted to any other type by clicking the first icon on the left-hand bar in the dialog box and adding the frequency specifications for the new filter.

7.3 QUANTIZED FILTER ANALYSIS

When we have finished the analysis of the reference filter, we can move on to construct the quantized filter as an object, by clicking the last icon on the bar above the Analysis area and the second icon on the left-hand bar, which sets the quantization parameters. The panel below the Analysis area now changes as shown in Figure 7.1b. We can construct three objects inside the FDA Tool: [...] properties have values, which may be strings or numerical values. Currently we use the objects qfilt and quantizer to analyze the performance of the reference filter when it is quantized. When we click the Turn Quantization On button and the Set Quantization Parameters icon, we can choose the quantization parameters for the coefficients of the filter. Quantization of the filter coefficients alone is sufficient for finding the finite wordlength effect on the magnitude response, phase response, and group delay response of the quantized filter, which can be compared with the response of the reference filter displayed in the Analysis area. Quantization of the other data listed below is necessary when we have to filter an input signal:

• The input signal

• The output signal

• The multiplicand: the value of the signal that is multiplied by the multiplier

• The product of the multiplicand and the multiplier constant

• The output signal

The object quantizer is used to convert each of these data, and this object has four properties: Mode, Round Mode, Overflow Mode, and Format. In order to understand the values of these properties, it is necessary to review and understand the binary representation of numbers and the different results of adding them and multiplying them. These will be discussed next.
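The remark that quantizing the coefficients alone is enough to see the change in magnitude response can be tried out with a small Python sketch. It is not the FDA Tool and does not use the qfilt or quantizer objects; the function names and the five-tap FIR coefficients are hypothetical, and the coefficients are simply rounded to the nearest multiple of $2^{-b}$.

```python
import cmath
import math

def quantize_coeffs(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return [round(c / step) * step for c in coeffs]

def fir_magnitude(coeffs, omega):
    """|H(e^{j omega})| of an FIR filter with the given coefficients."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(coeffs)))

def max_deviation(reference, quantized, points=256):
    """Largest magnitude difference on a grid of frequencies in [0, pi]."""
    grid = [math.pi * k / (points - 1) for k in range(points)]
    return max(abs(fir_magnitude(reference, w) - fir_magnitude(quantized, w))
               for w in grid)

# Hypothetical symmetric FIR reference filter (double precision).
reference = [0.1, 0.25, 0.3, 0.25, 0.1]

for bits in (4, 8, 12):
    quantized = quantize_coeffs(reference, bits)
    print(bits, quantized, max_deviation(reference, quantized))
```

No input signal is filtered here, so none of the other quantities in the list above (input, output, multiplicand, product) needs to be quantized for this comparison.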

7.4 BINARY NUMBERS AND ARITHMETIC

Numbers representing the values of the signal, the coefficients of both the filter and the difference equation or the recursive algorithm, and other properties corresponding to the structure for the filter are represented in binary form. They are based on the radix of 2 and therefore consist of only two binary digits, 0 and 1, which are more commonly known as bits, just as the decimal numbers based on a radix of 10 have 10 decimal digits from 0 to 9. Placement of the bits in a string determines the binary number, as illustrated by the example $x_2 = 1001_{\Delta}1010$, which is equivalent to $x_{10} = 1 \times 2^0 + 1 \times 2^3 + 2^{-1} + 2^{-3} = 9.625$. In this discussion of binary number representation, we have used the symbol $\Delta$ to separate the integer part and the fractional part and the subscripts 2 and 10 to denote the binary number and the decimal number. Another example given by

$$x_2 = b_{I-1} b_{I-2} \cdots b_1 b_0 \,{}_{\Delta}\, b_{-1} b_{-2} \cdots b_{-F}$$

is equivalent to

$$x_{10} = \sum_{i=-F}^{I-1} b_i 2^i$$

In the binary representation (7.3), the integer part contains $I$ bits and the bit $b_{I-1}$ at the leftmost position is called the most significant bit (MSB); the fractional part contains $F$ bits, and the bit $b_{-F}$ at the rightmost position is called the least significant bit (LSB). This can only represent the magnitude of positive numbers and is known as the unsigned fixed-point binary number. In order to represent positive as well as negative numbers, one more bit called the sign bit is added to the left of the MSB. The sign bit, represented by the symbol $s$ in (7.5), assigns a negative sign when this bit is 1 and a positive sign when it is 0. So it becomes a signed magnitude fixed-point binary number. Therefore a signed magnitude number $x_2 = 11001_{\Delta}1010$ is $x_{10} = -9.625$. In general, the signed magnitude fixed-point number is given by

$$x_{10} = (-1)^s \sum_{i=-F}^{I-1} b_i 2^i$$

and the total number of bits is called the wordlength $w = 1 + I + F$. When two signed magnitude numbers with widely different values for the integer part and/or the fractional part have to be added, it is not easy to program the adders in the digital hardware to implement this operation. So it is common practice to choose $I = 0$, keeping the sign bit and the bits for the fractional part only so that $F = w - 1$ in the signed magnitude fixed-point representation. But when two numbers larger than 0.5 in decimal value are added, their sum is larger than 1, and this cannot be represented by the format shown above, where $I = 0$.
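The weighted sums above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the integer and fractional parts are passed as separate strings because the binary point itself is not stored.

```python
def unsigned_fixed_to_decimal(integer_bits, fraction_bits):
    """Sum b_i * 2**i for i = -F .. I-1, with the MSB first in the string."""
    num_integer = len(integer_bits)
    bits = integer_bits + fraction_bits
    return sum(int(b) * 2.0 ** (num_integer - 1 - k) for k, b in enumerate(bits))

def signed_magnitude_to_decimal(sign_bit, integer_bits, fraction_bits):
    """Apply the factor (-1)**s to the unsigned magnitude."""
    magnitude = unsigned_fixed_to_decimal(integer_bits, fraction_bits)
    return (-1) ** int(sign_bit) * magnitude

def wordlength(integer_bits, fraction_bits):
    """w = 1 + I + F for a signed magnitude number."""
    return 1 + len(integer_bits) + len(fraction_bits)

print(unsigned_fixed_to_decimal("1001", "1010"))         # 9.625
print(signed_magnitude_to_decimal("1", "1001", "1010"))  # -9.625
print(wordlength("1001", "1010"))                        # 9
```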


So two other forms of representing the numbers are more commonly used: the one's-complement and two's-complement forms (also termed one-complementary and two-complementary forms) for representing the signed magnitude fixed-point numbers. In the one's-complement form of a negative number, the bits of the fractional part are replaced by their complement, that is, the ones are replaced by zeros and vice versa. By adding a one as the least significant bit to the one's-complement form, we get the two's-complement form of binary representation; the sign bit is retained in both forms. But it must be observed that when the binary number is positive, the signed magnitude form, one's-complement form, and two's-complement form are the same.
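The two complement rules can be sketched in Python on bit strings that hold the sign bit followed by the fractional bits (the $I = 0$ format). The function names are illustrative assumptions, and the 5-bit words used in the usage lines are those of a $\pm 0.75$ pair.

```python
def ones_complement(word):
    """Keep the sign bit and invert every remaining bit."""
    sign, magnitude = word[0], word[1:]
    flipped = "".join("1" if b == "0" else "0" for b in magnitude)
    return sign + flipped

def twos_complement(word):
    """One's complement plus a one added at the least significant bit."""
    ones = ones_complement(word)
    width = len(ones)
    value = (int(ones, 2) + 1) % (1 << width)
    return format(value, "0{}b".format(width))

def all_forms(word):
    """Signed magnitude, one's-complement and two's-complement forms."""
    if word[0] == "0":  # positive number: the three forms coincide
        return word, word, word
    return word, ones_complement(word), twos_complement(word)

def twos_complement_value(word):
    """Decimal value of a two's-complement word with F = len(word) - 1."""
    n = int(word, 2)
    if word[0] == "1":
        n -= 1 << len(word)
    return n / 2 ** (len(word) - 1)

print(all_forms("01100"))              # ('01100', '01100', '01100')
print(all_forms("11100"))              # ('11100', '10011', '10100')
print(twos_complement_value("10100"))  # -0.75
```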

Example 7.1

Given: $x_2 = 0_{\Delta}1100$ is the 5-bit, signed magnitude fixed-point number equal to $x_{10} = +(2^{-1} + 2^{-2}) = 0.75$, and $v_2 = 1_{\Delta}1100$ is equal to $v_{10} = -0.75$. The one's complement of $v_2 = 1_{\Delta}1100$ is $1_{\Delta}0011$, whereas the two's complement of $v_2$ is $1_{\Delta}0100$.

[...]

where $M$ is the mantissa, which is usually represented by a signed magnitude, fixed-point binary number, and $E$ is a positive- or negative-valued integer with $E$ bits and is called the exponent. To get both positive and negative exponents, a bias is provided by an integer; usually the bias is chosen as $2^7 - 1 = 127$ when the exponent $E$ is 8 bits or $2^{10} - 1 = 1023$ when $E$ is 11 bits. Without the bias, an 8-bit integer number varies from 0 to 255, but with a bias of 127, the exponent varies from −127 to 128. Also, the magnitude of the fractional part $F$ is limited to $0 \le M < 1$. In order to increase the range of the mantissa, one more bit is added to the left of the most significant bit of $F$ so that it is represented as $(1_{\Delta}F)$. Now it is assumed to be normalized, but this bit is not counted in the total wordlength.

The IEEE 754-1985 standard for representing floating-point numbers is the most common standard used in DSP processors. It uses a single-precision format with 32 bits and a double-precision format with 64 bits.

The single-precision floating-point number is given by

$$X_{10} = (-1)^s (1_{\Delta}F)\, 2^{E-127}$$

According to this standard, the (32-bit) single-precision, floating-point number uses one sign bit, 8 bits for the exponent, and 23 bits for the fractional part $F$ (and one bit to normalize it). A representation of this format is shown in Figure 7.2a.

Figure 7.2 IEEE format of bits for the 32- and 64-bit floating-point numbers.

But this formula is implemented according to the following rules in order to satisfy conditions other than the first one listed below:

1. When $0 < E < 255$, then $X_{10} = (-1)^s (1_{\Delta}F)\, 2^{E-127}$.
2. When $E = 0$ and $M \neq 0$, then $X_{10} = (-1)^s (0_{\Delta}F)\, 2^{-126}$.
3. When $E = 255$ and $M \neq 0$, then $X_{10}$ is not a number and is denoted as NaN.
4. When $E = 255$ and $M = 0$, then $X_{10} = (-1)^s \infty$.
5. When $E = 0$ and $M = 0$, then $X_{10} = (-1)^s (0)$.

Here, $(1_{\Delta}F)$ is the normalized mantissa with one integer bit and 23 fractional bits, whereas $(0_{\Delta}F)$ is only the fractional part with 23 bits. Most of the commercial DSP chips use this 32-bit, single-precision, floating-point binary representation, although 64-bit processors are becoming available. Note that there is no provision for storing the binary point ($\Delta$) in these chips; their registers simply store the bits and implement the rules listed above. The binary point is used only as a notation for our discussion of the binary number representation and is not counted in the total number of bits.
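The five rules for the 32-bit format can be written out directly and compared against the bit patterns that Python's standard `struct` module produces for single-precision values. The sketch below is illustrative only; `decode_single` and `pattern_of` are hypothetical helper names, and the fraction field is tested in place of $M$ in the conditions.

```python
import math
import struct

def decode_single(word):
    """Decode a 32-bit pattern (given as an int) using the five rules."""
    s = (word >> 31) & 0x1
    E = (word >> 23) & 0xFF
    F = word & 0x7FFFFF
    sign = -1.0 if s else 1.0
    fraction = F / 2.0 ** 23                  # value of the 23 fractional bits
    if 0 < E < 255:                           # rule 1: normalized number
        return sign * (1.0 + fraction) * 2.0 ** (E - 127)
    if E == 0 and F != 0:                     # rule 2: no implied leading one
        return sign * fraction * 2.0 ** -126
    if E == 255 and F != 0:                   # rule 3: not a number
        return math.nan
    if E == 255:                              # rule 4: signed infinity
        return sign * math.inf
    return sign * 0.0                         # rule 5: signed zero

def pattern_of(x):
    """32-bit pattern stored for x when it is packed as a single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

for x in (11.0, -0.15625, 1e-40, 0.0):
    stored = struct.unpack(">f", struct.pack(">f", x))[0]
    print(hex(pattern_of(x)), decode_single(pattern_of(x)) == stored)
```

The value `1e-40` is smaller than $2^{-126}$, so it exercises rule 2, while `0x7F800000` and `0x7FC00000` are patterns that fall under rules 4 and 3, respectively.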

The IEEE 754-1985 standard for the (64-bit), double-precision, floating-point number is expressed by

$$X_{10} = (-1)^s (1_{\Delta}F)\, 2^{E-1023}$$

It uses one sign bit, 11 bits for the exponent $E$, and 52 bits for $F$ (one bit is added to normalize it but is not counted). The representation for this format is shown in Figure 7.2b.
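Because Python floats are stored in the 64-bit format, the three fields can be inspected directly. The following sketch, with hypothetical helper names, splits a float into its sign, exponent, and fraction fields and rebuilds the value with a bias of 1023; it covers only the normalized case $0 < E < 2047$.

```python
import struct

def double_fields(x):
    """Split a Python float into its sign, exponent and fraction fields."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63
    E = (word >> 52) & 0x7FF
    F = word & ((1 << 52) - 1)
    return s, E, F

def rebuild_normalized(s, E, F):
    """(-1)^s (1.F) 2^(E - 1023); valid only for 0 < E < 2047."""
    return (-1) ** s * (1 + F / 2.0 ** 52) * 2.0 ** (E - 1023)

for x in (1.0, -11.0, 0.1):
    fields = double_fields(x)
    print(x, fields, rebuild_normalized(*fields) == x)
```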

Example 7.2

Consider the 13-bit floating-point number with one sign bit, 8 bits for the exponent, and 4 bits for the denormalized fractional part, namely, $E = 8$ and $F = 4$. The binary number is represented as

$$X_2 = 0\;10000010\;0110$$

Then the exponent $E_2 = 10000010$; therefore $E_{10} = 130$, and the denormalized mantissa $F_2 = 0110$, which gives $F_{10} = 0.375$. Therefore the normalized mantissa $M = 1.375$. Finally $X_{10} = (-1)^0 (1.375)\, 2^{130-127} = +(1.375)\, 2^3 = 11$.

Consider another example:

$$Y_2 = 1\;00000111\;0110$$

Then $E_2 = 00000111$, $E_{10} = 7$, $F_2 = 0110$, $F_{10} = 0.375$, and finally $Y_{10} = (-1)^1 (1.375)\, 2^{7-127} = -(1.375)\, 2^{-120}$.
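The arithmetic of this example can be verified exactly with Python's `fractions` module. The sketch assumes the 13-bit layout read from the example (one sign bit, the 8 exponent bits 10000010 or 00000111, then the 4 fraction bits 0110), a normalized mantissa, and a bias of 127; `decode_small` is a hypothetical name.

```python
from fractions import Fraction

def decode_small(bits, exp_bits=8, frac_bits=4, bias=127):
    """Apply X = (-1)^s (1.F) 2^(E - bias) to a short bit string.

    Assumes a normalized number; the special cases are not handled.
    """
    s = int(bits[0])
    E = int(bits[1:1 + exp_bits], 2)
    F = Fraction(int(bits[1 + exp_bits:1 + exp_bits + frac_bits], 2), 2 ** frac_bits)
    return (-1) ** s * (1 + F) * Fraction(2) ** (E - bias)

X = decode_small("0100000100110")   # fields: 0 | 10000010 | 0110
Y = decode_small("1000001110110")   # fields: 1 | 00000111 | 0110

print(X)                                              # 11
print(Y == Fraction(-11, 8) * Fraction(2) ** -120)    # True
```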

In the same panel showing the quantization of the different data, there are two other columns listed as Round Mode and Overflow Mode. When we click the button for the Round Mode, we get the following options in the dropdown list: [...]

[...] rounded toward negative infinity, and positive numbers that lie halfway between two quantization levels are rounded toward positive infinity. If the number lies exactly halfway between two levels, it is rounded toward positive infinity. The operation called 'floor' is commonly known as truncation since it discards all the bits beyond the $b$ bits, and this results in the nearest quantization level toward negative infinity. These two are the most commonly used operations in binary arithmetic. They are illustrated in Figure 7.3, where the dotted line indicates the actual value of $x$ and the solid line shows the quantized value $x_Q$ with $b$ bits.

[...] toward positive infinity, and the fix operation rounds to the nearest level toward zero. The convergent operation is the same as rounding except that in the case when the number is exactly halfway, it is rounded down if the penultimate bit is zero and rounded up if it is one.
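The rounding operations can be compared side by side in a short Python sketch. It is not the quantizer object; the function name is made up, and the mode labels `ceil` and `round` are assumed names for the operations whose descriptions are incomplete above. For `round`, the sketch takes halfway cases away from zero (negative values toward negative infinity, positive values toward positive infinity), and for `convergent` it relies on the built-in `round`, which sends halfway cases to the even level.

```python
import math

def quantize(x, frac_bits, mode):
    """Quantize x to a multiple of 2**-frac_bits using the named rounding mode."""
    scaled = x * 2 ** frac_bits
    if mode == "floor":            # toward negative infinity (truncation)
        level = math.floor(scaled)
    elif mode == "ceil":           # toward positive infinity
        level = math.ceil(scaled)
    elif mode == "fix":            # toward zero
        level = math.trunc(scaled)
    elif mode == "round":          # nearest level; ties away from zero
        level = math.copysign(math.floor(abs(scaled) + 0.5), scaled)
    elif mode == "convergent":     # nearest level; ties to the even level
        level = round(scaled)
    else:
        raise ValueError("unknown mode: " + mode)
    return level / 2 ** frac_bits

modes = ("floor", "ceil", "fix", "round", "convergent")
for x in (0.375, 0.125, -0.375):   # all lie halfway between levels for 2 bits
    print(x, [quantize(x, 2, m) for m in modes])
```

With two fractional bits the levels are multiples of 0.25, so each test value sits exactly halfway between two levels and the modes separate clearly.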

Suppose that two positive numbers or two negative numbers in the fixed-point format with $b$ bits are added together. It is possible that the result could exceed the lower or upper limits of the range within which numbers with $b$ bits lie. For a signed magnitude, fixed-point number with wordlength $w$ and fraction length $f$, the numbers range from $-2^{w-f-1}$ to $2^{w-f-1} - 2^{-f}$, whereas the range for floating-point numbers is as given in Table 7.1. When the sum or difference of two fixed-point numbers or the product of two floating-point numbers exceeds its normal range of values, there is an overflow or underflow of numbers. The overflow mode in the FDA panel for the quantized filter gives two choices: to [...] outside the normal range to a value within the maximum or minimum value in the range; that is, values greater than the maximum value are set to the maximum value, and values less than the minimum value are set to the minimum value in the range. This is the default choice for the overflow mode.

There is a third choice: to scale all the data. This choice is made by clicking [...] use additional steps to adjust the quantization parameters, scale the coefficients without changing the overall gain of the filter response, and so on. The coefficients are scaled appropriately such that there is no overflow or underflow of the data at the output of every section in the realization.
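The range formula and the saturating behavior described above can be sketched as follows. The function names are illustrative. The `wrap` function shows modular wraparound, which is an assumed example of a non-saturating alternative and is not named in the surviving text.

```python
def fixed_point_range(w, f):
    """Smallest and largest values for wordlength w and fraction length f."""
    lo = -2.0 ** (w - f - 1)
    hi = 2.0 ** (w - f - 1) - 2.0 ** -f
    return lo, hi

def saturate(x, w, f):
    """Clamp x to the representable range (the default mode described above)."""
    lo, hi = fixed_point_range(w, f)
    return max(lo, min(hi, x))

def wrap(x, w, f):
    """Modular wraparound on overflow (assumed alternative, for comparison)."""
    lo, _ = fixed_point_range(w, f)
    span = 2.0 ** (w - f)
    return (x - lo) % span + lo

print(fixed_point_range(8, 7))       # (-1.0, 0.9921875)
total = 0.75 + 0.5                   # sum of two numbers larger than 0.5
print(saturate(total, 8, 7))         # 0.9921875
print(wrap(total, 8, 7))             # -0.75
```

The last two lines show why the choice matters: the same overflowing sum is held at the maximum in one mode and changes sign in the other.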

Before we investigate the effects of finite wordlength and the many realization structures by using all the options in the dialog box in the FDA Tool, it is useful to know some of the insight gleaned from the vast amount of research on this complex subject. It has been found that, in general, the IIR filters in the cascade connection of second-order sections, each of them realized in direct form II, are less sensitive to quantization than are those realized in a single section of direct form I or direct form II. The lattice ARMA structure and the special case of the AR structure are less sensitive to quantization than is the default structure described above. The lattice-coupled allpass structure, also known as "two allpass structures in parallel," is less sensitive than the lattice ARMA structure. We will determine whether realizing the two allpass filters $A_1(z)$ and $A_2(z)$ by lattice allpass structures offers any further reduction in the quantization effects. If the specified frequency response can be realized by an FIR filter, then the direct-form or the lattice MA structure realizing it may be preferable to the structures described above, because the software development and hardware design of the FIR filter are simpler and the filter is always stable, has linear phase, and is free from limit cycles.
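The claim about cascade and single-section realizations can be explored numerically. The sketch below is an assumption-laden illustration, not a result from the text: it builds a hypothetical fourth-order all-pole filter from two second-order sections, quantizes the coefficients either section by section (cascade) or after expanding the denominator (single direct-form section), and reports the largest relative change in frequency response for each. Which number is smaller depends on the pole locations and the wordlength chosen.

```python
import cmath
import math

def quantize(c, frac_bits):
    """Round c to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(c / step) * step

def section(r, theta):
    """Denominator 1 + a1 z^-1 + a2 z^-2 with poles at r e^{+-j theta}."""
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def poly_mul(p, q):
    """Multiply two polynomials in z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_at(p, omega):
    """Evaluate a polynomial in z^-1 at z = e^{j omega}."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(p))

def max_relative_deviation(frac_bits, sections, points=512):
    """Return (direct-form, cascade) worst-case values of |Hq/H - 1|."""
    direct = [1.0]
    for s in sections:
        direct = poly_mul(direct, s)
    direct_q = [quantize(c, frac_bits) for c in direct]
    sections_q = [[quantize(c, frac_bits) for c in s] for s in sections]
    worst_direct = worst_cascade = 0.0
    for k in range(points):
        omega = math.pi * k / (points - 1)
        ref = poly_at(direct, omega)
        cascade = 1.0
        for s in sections_q:
            cascade *= poly_at(s, omega)
        # H = 1/A, so Hq/H - 1 equals A/Aq - 1.
        worst_direct = max(worst_direct, abs(ref / poly_at(direct_q, omega) - 1.0))
        worst_cascade = max(worst_cascade, abs(ref / cascade - 1.0))
    return worst_direct, worst_cascade

# Hypothetical closely spaced pole pairs.
sections = [section(0.9, 0.25 * math.pi), section(0.9, 0.30 * math.pi)]
for bits in (6, 8, 12):
    print(bits, max_relative_deviation(bits, sections))
```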

We first design the reference filter that meets the desired specifications; then we try different structures for the quantized filter with different levels and types of quantization. Comparing the frequency response, phase response, and group delay response of the reference filter with those of the quantized filter, we find out which structure has the lowest deviation from the frequency response, phase response, and so on of the reference filter, with the lowest finite wordlength. The FDA Tool offers us powerful assistance in trying a large number of options available for the type of filter, design method, frequency specification, quantization of the several coefficients, and other variables, and comparing the results for the reference filter [...]

Title: Quantized Filter Analysis
Institution: John Wiley & Sons, Inc.
Subject: Digital Signal Processing
Type: PhD thesis
Year of publication: 2006
City: New York
