SPEECH CODING ALGORITHMS
Foundation and Evolution
of Standardized Coders
WAI C. CHU
Mobile Media Laboratory
DoCoMo USA Labs
San Jose, California
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: permreq@wiley.com.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Intelligence is the fruit of industriousness
Accretion of knowledge creates genii
A Chinese proverb
CONTENTS

1 INTRODUCTION
1.1 Overview of Speech Coding
1.2 Classification of Speech Coders
1.3 Speech Production and Modeling
1.4 Some Properties of the Human Auditory System
1.5 Speech Coding Standards
1.6 About Algorithms
1.7 Summary and References

2 SIGNAL PROCESSING TECHNIQUES
2.1 Pitch Period Estimation
2.2 All-Pole and All-Zero Filters
2.3 Convolution
2.4 Summary and References
Exercises

3 STOCHASTIC PROCESSES AND MODELS
3.1 Power Spectral Density
3.2 Periodogram
3.3 Autoregressive Model
3.4 Autocorrelation Estimation
3.5 Other Signal Models
3.6 Summary and References
Exercises

4 LINEAR PREDICTION
4.1 The Problem of Linear Prediction
4.2 Linear Prediction Analysis of Nonstationary Signals
4.3 Examples of Linear Prediction Analysis of Speech
4.4 The Levinson–Durbin Algorithm
4.5 The Leroux–Gueguen Algorithm
4.6 Long-Term Linear Prediction
4.7 Synthesis Filters
4.8 Practical Implementation
4.9 Moving Average Prediction
4.10 Summary and References

7 VECTOR QUANTIZATION
7.1 Introduction
7.2 Optimal Quantizer
7.3 Quantizer Design Algorithms
7.4 Multistage VQ
7.5 Predictive VQ
7.6 Other Structured Schemes
7.7 Summary and References
Exercises

8 SCALAR QUANTIZATION OF LINEAR PREDICTION COEFFICIENTS
8.1 Spectral Distortion
8.2 Quantization Based on Reflection Coefficient and Log Area Ratio
8.3 Line Spectral Frequency
8.4 Quantization Based on Line Spectral Frequency
8.5 Interpolation of LPC
8.6 Summary and References
Exercises

9 LINEAR PREDICTION CODING
9.1 Speech Production Model
9.2 Structure of the Algorithm
9.3 Voicing Detector
9.4 The FS1015 LPC Coder
9.5 Limitations of the LPC Model
9.6 Summary and References
Exercises

10 REGULAR-PULSE EXCITATION CODERS
10.1 Multipulse Excitation Model
10.2 Regular-Pulse-Excited–Long-Term Prediction
10.3 Summary and References
Exercises

11 CODE-EXCITED LINEAR PREDICTION
11.1 The CELP Speech Production Model
11.2 The Principle of Analysis-by-Synthesis
11.3 Encoding and Decoding
11.4 Excitation Codebook Search
11.5 Postfilter
11.6 Summary and References
Exercises

12 THE FEDERAL STANDARD VERSION OF CELP
12.1 Improving the Long-Term Predictor
12.2 The Concept of the Adaptive Codebook
12.3 Incorporation of the Adaptive Codebook to the CELP Framework
12.4 Stochastic Codebook Structure
12.5 Adaptive Codebook Search
12.6 Stochastic Codebook Search
12.7 Encoder and Decoder
12.8 Summary and References
Exercises

13 VECTOR SUM EXCITED LINEAR PREDICTION
13.1 The Core Encoding Structure
13.2 Search Strategies for Excitation Codebooks
13.3 Excitation Codebook Searches
13.4 Gain Related Procedures
13.5 Encoder and Decoder
13.6 Summary and References
Exercises

14 LOW-DELAY CELP
14.1 Strategies to Achieve Low Delay
14.2 Basic Operational Principles
14.3 Linear Prediction Analysis
14.4 Excitation Codebook Search
14.5 Backward Gain Adaptation
14.6 Encoder and Decoder
14.7 Codebook Training
14.8 Summary and References
Exercises

15 VECTOR QUANTIZATION OF LINEAR PREDICTION COEFFICIENTS

16 ALGEBRAIC CELP
16.3 Encoding and Decoding
16.4 Algebraic Codebook Search
16.5 Gain Quantization Using Conjugate VQ
16.6 Other ACELP Standards
16.7 Summary and References
Exercises

17 MIXED EXCITATION LINEAR PREDICTION
17.1 The MELP Speech Production Model

18 SOURCE-CONTROLLED VARIABLE BIT-RATE CELP
18.1 Adaptive Rate Decision
18.2 LP Analysis and LSF-Related Operations
18.3 Decoding and Encoding
18.4 Summary and References
Exercises

19 SPEECH QUALITY ASSESSMENT
19.1 The Scope of Quality and Measuring Conditions
19.2 Objective Quality Measurements for Waveform Coders
19.3 Subjective Quality Measures
19.4 Improvements on Objective Quality Measures

ORTHOGONALITY, BASIS, LINEAR INDEPENDENCE, AND THE
PREFACE

Soon after the hardware was finished, the focus switched to software (or firmware) design, mainly dealing with the control of various on-board peripheral devices. My true interest, however, was the program code inside the mixed signal processor, which was developed by a separate team of "advanced" engineers. I was told that voice signals were compressed using a code-excited linear prediction (CELP) algorithm. Also, it was possible to play back fixed announcement messages, such as numbers and days of the week, with the messages stored in the linear prediction coding (LPC) format. I had no idea what these algorithms were, nor how they worked to compress speech. However, I was eager to learn the details, and decided to go back to school and pursue a PhD with a concentration in speech coding.

This book is the result of my personal experience as a researcher and practitioner in the field of speech coding. Four years ago I decided to put in extra hours, usually late nights and early mornings as well as weekends, to organize the literature in speech coding and develop it into a logical presentation in terms of content and terminology. Speech coding has evolved into a highly mature branch of signal processing, with deployment in a plethora of products such as cellular phones, answering machines, communication devices, and, more recently, voice over internet protocol (VoIP). It is obvious that a thorough textbook is necessary for students, professors, and engineering professionals to handle the subject appropriately. My sincere hope is that the availability of a book that collects many of the techniques used in speech coding and presents them in an accessible fashion will create excitement and enthusiasm, ensuring continuous rapid advances in the field.
Philosophy and Approach
The title, Speech Coding Algorithms, reflects the core subject of the book, since most coding techniques are implemented as algorithms, or computational procedures performed by a processor. However, this is by no means an exhaustive documentation of all methods developed in this field; it is rather the study of the most successful techniques, defined as those incorporated in a standard. By doing so we concentrate our effort on understanding the most influential ideas, which is a rather efficient manner to navigate this vast territory of knowledge.
In my own personal learning curve, I found that there is a different and refreshing lesson to be found in every standard. To understand a new standard it is often necessary to look back into the developed techniques adopted by past standards or studies. Attempting to learn by reading the official documentation describing the standard is very often a frustrating experience, since the assumption made in preparing those materials is that the audience consists of experts in the subject, and hence the logical order and justification of a given approach are routinely omitted. Therefore the origin and the reason behind a certain practice cannot be fully understood. This might not be a problem if one's objective is to implement the algorithm without comprehending it. However, for those researchers eager to delve deeply into its roots, alternative reference sources must be explored, which can be a strenuous and prolonged process. In this book I have summarized the knowledge acquired over an extended period of time, with the intention of filling the void between principles and implementations.
In writing this book, a balance is sought between theory and practice, and between intuition and rigor. Theoretical ideas are included only if they are used to solve practical problems, and thorough proofs are provided. Speech coding is related to human perception, and therefore a degree of fuzziness exists, in the sense that no absolute right or wrong can be established for certain situations; in other words, no mathematical proofs are obtainable. In these cases, solutions are often found and justified on an intuitive basis. For the most part, the book is meant to be pragmatic, since the discussed techniques are widely used in industry.
Prerequisites
The minimum background required to understand the book is explained, with reference to popular textbooks where the relevant subjects can be found.
Advanced calculus, including complex variables [Churchill and Brown, 1990].
Discrete-time signals and systems, Fourier transforms, z-transforms, filtering, and convolution [Oppenheim and Schafer, 1989; Stearns and Hush, 1990].
Random variables and stochastic processes, expectation, probability, and wide-sense stationarity [Papoulis, 1991; Peebles, 1993].
Linear algebra, including linear equations, matrices, and vectors [Strang, 1988].
Experience with high-level programming using a language such as C.
The above list is covered in most undergraduate Electrical Engineering curricula; with this background, the book is self-contained.
Organization
The text is divided into 19 chapters. Chapter 1 provides an overview of the subjects covered, with references to various aspects of speech coding, standards, algorithms, and comments on notation and terminology. Chapter 2 is a review of some signal processing techniques; some are very general, but others are less known outside the speech coding literature. Chapter 3 contains some foundation for stochastic processes and models, which are important for an understanding of the theoretical aspects. Chapter 4 is about linear prediction, the integral part of almost all modern speech coders. Chapter 5 reviews the various aspects of scalar quantization, which are utilized routinely by many speech coding algorithms. One of the earliest digital coding techniques is pulse code modulation (PCM); it and its variants are the topic of Chapter 6. Chapter 7 deals with vector quantization, which has become more and more important for the achievement of high efficiency in coding systems. Linear prediction coefficients (LPC) are normally quantized for transmission as part of the compressed bit-stream; Chapter 8 covers the various methods for scalar quantization of these coefficients. One of the landmarks in low bit-rate speech coding is the linear prediction coding (LPC) algorithm, discussed in Chapter 9. Chapter 10 is devoted to regular pulse excitation coders, with a thorough description of the GSM 6.10 standard. Principles of code-excited linear prediction (CELP) are given in Chapter 11, covering the various aspects of analysis-by-synthesis, signal calculation, postfilter design, and efficiency. Chapters 12 and 13 present the structure of two standardized CELP coders: FS1016 and IS54, respectively; these are both milestones in speech coding development. Chapter 14 is dedicated to the G.728 low-delay CELP standard, with thorough explanations of strategies for delay reduction and detailed structures of the coder. Vector quantization of LPC is included in Chapter 15, representing a huge advance with respect to scalar quantization techniques covered in Chapter 8, and methods used by various standardized coders are analyzed. The highly influential algebraic CELP (ACELP) algorithm is covered in Chapter 16, where several ACELP-based standards are described, with focus on the G.729 standard. The mixed excitation linear prediction (MELP) algorithm is discussed in Chapter 17, and is shown to be an improvement upon the LPC coder, covered in Chapter 9. Chapter 18 is devoted to the IS96 variable bit-rate CELP algorithm, which is a source-controlled multimode coder with the operating mode selected by the input characteristics of the speech signal. Finally, Chapter 19 is concerned with various methods to assess the quality of speech signals, especially those processed by a speech coding algorithm.
The following table summarizes the chapters and their prerequisites.

Chapter  Title                                                   Prerequisites
1        Introduction
2        Signal Processing Techniques                            1
3        Stochastic Processes and Models
4        Linear Prediction                                       1, 2, 3
5        Scalar Quantization
6        Pulse Code Modulation and its Variants                  4, 5
7        Vector Quantization                                     5
8        Scalar Quantization of Linear Prediction Coefficients   4, 5
9        Linear Prediction Coding                                4, 8
10       Regular-Pulse Excitation Coders                         4, 8
11       Code-Excited Linear Prediction                          2, 4
12       The Federal Standard Version of CELP                    2, 8, 11
13       Vector Sum Excited Linear Prediction                    8, 12
14       Low-Delay CELP                                          4, 11
15       Vector Quantization of Linear Prediction Coefficients   7, 8
16       Algebraic CELP                                          7, 12, 15
17       Mixed Excitation Linear Prediction                      9, 15
18       Source-Controlled Variable Bit-Rate CELP                11
19       Speech Quality Assessment                               1
Acknowledgments
Throughout my professional career, I have had the opportunity to work with and learn from a number of people whom I should like to publicly acknowledge. My former advisor Dr. Nirmal K. Bose at the Pennsylvania State University provided me with invaluable instruction, trust, and friendship during my graduate studies; his methodical style, hard-working spirit, and commitment toward education have served as a role model to follow. I am grateful to my former supervisor Dr. Tandhoni S. Rao at Texas Instruments Inc., who guided me through projects involving adaptive filters, speech coding, and programming of digital signal processors.
I would like to dedicate this book to my parents, who have always encouraged my academic interests and provided the moral support throughout my life and career. I am deeply indebted to my cousin Chi-Ming Chu and wife Kam-Chi Chu for their help and support during my graduate studies at Stevens Tech; their industriousness and candid spirit have given me a great deal of positive influence. I am particularly indebted to my wife Laura for her love and patience, and for thoroughly reviewing and proofreading the first version of the manuscript.
I am grateful to the Wiley team for their professionalism and help during the production of this book; special thanks to George Telecki (Executive Editor) and Rosalyn Farkas (Associate Editor). I am also most grateful to Dr. Andreas Spanias and Dr. Allen Levesque, both early reviewers of the manuscript, for their encouraging comments and constructive critiques. I also wish to thank my former colleague at Texas Instruments Inc., Wai-Ming Lai, for her help in examining some chapters of the text.
Last but not least, this book is dedicated to Universidad Simón Bolívar, the school where I received most of my early engineering education. Universidad Simón Bolívar has generously given me the vigor, the strength, and the wisdom needed to conquer obstacles and overcome difficulties, both in engineering and in life. With this book I hope to give those aspiring to this branch of engineering the same that the respected university has given me.
Feedback
A book of this length is certain to contain errors and omissions. While attempts were made to provide a highly understandable and correct content, there are doubtless many places where improvements are possible. Feedback is welcome to the author via email at wcc2@ieee.org. Please note that a personal reply to all messages might not be possible.
WAI C. CHU