SPEECH CODING ALGORITHMS

Foundation and Evolution of Standardized Coders

WAI C. CHU

Mobile Media Laboratory

DoCoMo USA Labs

San Jose, California

A JOHN WILEY & SONS, INC., PUBLICATION

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: permreq@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Intelligence is the fruit of industriousness
Accretion of knowledge creates genii

A Chinese proverb

CONTENTS

1 INTRODUCTION
1.1 Overview of Speech Coding
1.2 Classification of Speech Coders
1.3 Speech Production and Modeling
1.4 Some Properties of the Human Auditory System
1.5 Speech Coding Standards
1.6 About Algorithms
1.7 Summary and References

2 SIGNAL PROCESSING TECHNIQUES
2.1 Pitch Period Estimation
2.2 All-Pole and All-Zero Filters
2.3 Convolution
Exercises

3 STOCHASTIC PROCESSES AND MODELS
3.1 Power Spectral Density
3.2 Periodogram
3.3 Autoregressive Model
3.4 Autocorrelation Estimation
3.5 Other Signal Models
Exercises

4 LINEAR PREDICTION
4.1 The Problem of Linear Prediction
4.2 Linear Prediction Analysis of Nonstationary Signals
4.3 Examples of Linear Prediction Analysis of Speech
4.4 The Levinson–Durbin Algorithm
4.5 The Leroux–Gueguen Algorithm
4.6 Long-Term Linear Prediction
4.7 Synthesis Filters
4.8 Practical Implementation
4.9 Moving Average Prediction

7 VECTOR QUANTIZATION
7.1 Introduction
7.2 Optimal Quantizer
7.3 Quantizer Design Algorithms
7.4 Multistage VQ
7.5 Predictive VQ
7.6 Other Structured Schemes
Exercises

8 SCALAR QUANTIZATION OF LINEAR PREDICTION COEFFICIENTS
8.1 Spectral Distortion
8.2 Quantization Based on Reflection Coefficient and Log Area Ratio
8.3 Line Spectral Frequency
8.4 Quantization Based on Line Spectral Frequency
8.5 Interpolation of LPC
Exercises

9 LINEAR PREDICTION CODING
9.1 Speech Production Model
9.2 Structure of the Algorithm
9.3 Voicing Detector
9.4 The FS1015 LPC Coder
9.5 Limitations of the LPC Model
Exercises

10 REGULAR-PULSE EXCITATION CODERS
10.1 Multipulse Excitation Model
10.2 Regular-Pulse-Excited–Long-Term Prediction
Exercises

11 CODE-EXCITED LINEAR PREDICTION
11.1 The CELP Speech Production Model

11.2 The Principle of Analysis-by-Synthesis
11.3 Encoding and Decoding
11.4 Excitation Codebook Search
11.5 Postfilter
Exercises

12 THE FEDERAL STANDARD VERSION OF CELP
12.1 Improving the Long-Term Predictor
12.2 The Concept of the Adaptive Codebook
12.3 Incorporation of the Adaptive Codebook to the CELP Framework
12.4 Stochastic Codebook Structure
12.5 Adaptive Codebook Search
12.6 Stochastic Codebook Search
12.7 Encoder and Decoder
Exercises

13 VECTOR SUM EXCITED LINEAR PREDICTION
13.1 The Core Encoding Structure
13.2 Search Strategies for Excitation Codebooks
13.3 Excitation Codebook Searches
13.4 Gain Related Procedures
Exercises

14 LOW-DELAY CELP
14.1 Strategies to Achieve Low Delay
14.2 Basic Operational Principles
14.3 Linear Prediction Analysis
14.4 Excitation Codebook Search
14.5 Backward Gain Adaptation
14.7 Codebook Training
Exercises

15 VECTOR QUANTIZATION OF LINEAR PREDICTION COEFFICIENTS

16 ALGEBRAIC CELP
16.3 Encoding and Decoding
16.4 Algebraic Codebook Search
16.5 Gain Quantization Using Conjugate VQ
16.6 Other ACELP Standards
Exercises

17 MIXED EXCITATION LINEAR PREDICTION
17.1 The MELP Speech Production Model

18 SOURCE-CONTROLLED VARIABLE BIT-RATE CELP
18.1 Adaptive Rate Decision
18.2 LP Analysis and LSF-Related Operations
18.3 Decoding and Encoding
Exercises

19 SPEECH QUALITY ASSESSMENT
19.1 The Scope of Quality and Measuring Conditions
19.2 Objective Quality Measurements for Waveform Coders
19.3 Subjective Quality Measures
19.4 Improvements on Objective Quality Measures

ORTHOGONALITY, BASIS, LINEAR INDEPENDENCE, AND THE

PREFACE

Soon after the hardware was finished, the focus switched to software (or firmware) design, mainly dealing with the control of various on-board peripheral devices. My true interest, however, was the program code inside the mixed signal processor, which was developed by a separate team of "advanced" engineers. I was told that voice signals were compressed using a code-excited linear prediction (CELP) algorithm. Also, it was possible to play back fixed announcement messages (such as numbers and days of the week), with the messages stored in the linear prediction coding (LPC) format. I had no idea what these algorithms were, nor how they worked to compress speech. However, I was eager to learn the details, and decided to go back to school and pursue a PhD with concentration in speech coding.

This book is the result of my personal experience as a researcher and practitioner in the field of speech coding. Four years ago I decided to put in extra hours, usually late nights and early mornings as well as weekends, to organize the literature in speech coding and develop it into a logical presentation in terms of content and terminology. Speech coding has evolved into a highly mature branch of signal processing, with deployment in a plethora of products, such as cellular phones, answering machines, communication devices, and more recently, voice over internet protocol (VoIP). It is obvious that a thorough textbook is necessary for students, professors, and engineering professionals to handle the subject appropriately. My sincere hope is that the availability of a book that collects many of the techniques used in speech coding and presents them in an accessible fashion will create excitement and enthusiasm, ensuring continuous rapid advances in the field.

Philosophy and Approach

Speech Coding Algorithms reflects the core subject of the book, since most coding techniques are implemented as algorithms, or computational procedures performed by a processor. However, this is by no means an exhaustive documentation of all methods developed in this field; it is rather the study of the most successful techniques, defined as those incorporated in a standard. By doing so we concentrate our effort on understanding the most influential ideas, which is a rather efficient manner to navigate this vast territory of knowledge.

In my own personal learning curve, I found that there is a different and refreshing lesson to be found in every standard. To understand a new standard it is often necessary to look back into the developed techniques adopted by past standards or studies. Attempting to learn by reading the official documentation describing the standard is very often a frustrating experience, since the assumption made in preparing those materials is that the audience consists of experts in the subject, and hence the logical order and justification of a given approach is routinely omitted. Therefore the origin and the reason behind a certain practice cannot be fully understood. This might not be a problem if one's objective is to implement the algorithm without comprehending it. However, for those researchers eager to delve deeply into its roots, alternative reference sources must be explored, which can be a strenuous and prolonged process. In this book I have summarized the knowledge acquired over an extended period of time, with the intention of filling the void between principles and implementations.

In writing this book, a balance is sought between theory and practice, and between intuition and rigor. Theoretical ideas are included only if they are used to solve practical problems, and thorough proofs are provided. Speech coding is related to human perception, and therefore a degree of fuzziness exists, in the sense that no absolute right or wrong can be established for certain situations; in other words, no mathematical proofs are obtainable. In these cases, solutions are often found and justified on an intuitive basis. For the most part, the book is meant to be pragmatic, since the discussed techniques are widely used in industry.

Prerequisites

The minimum background required to understand the book is explained, with reference to popular textbooks where the relevant subjects can be found.

Advanced calculus, including complex variables [Churchill and Brown, 1990].

Discrete-time signals and systems, Fourier transforms, z-transforms, filtering, and convolution [Oppenheim and Schafer, 1989; Stearns and Hush, 1990].

Random variables and stochastic processes, expectation, probability, and wide-sense stationarity [Papoulis, 1991; Peebles, 1993].

Linear algebra, including linear equations, matrices, and vectors [Strang, 1988].

Experience with high-level programming using a language such as C.

The above list is covered in most undergraduate Electrical Engineering curricula; with this background, the book is self-contained.

Organization

The text is divided into 19 chapters. Chapter 1 provides an overview of the subjects covered, with references to various aspects of speech coding, standards, algorithms, and comments on notation and terminology. Chapter 2 is a review of some signal processing techniques; some are very general, but others are less known outside the speech coding literature. Chapter 3 contains some foundation for stochastic processes and models, which are important for an understanding of the theoretical aspects. Chapter 4 is about linear prediction, the integral part of almost all modern speech coders. Chapter 5 reviews the various aspects of scalar quantization, which are utilized routinely by many speech coding algorithms. One of the earliest digital coding techniques is pulse code modulation (PCM); it and its variants are the topic of Chapter 6. Chapter 7 deals with vector quantization, which has become more and more important for the achievement of high efficiency in coding systems. Linear prediction coefficients (LPC) are normally quantized for transmission as part of the compressed bit-stream; Chapter 8 covers the various methods for scalar quantization of these coefficients. One of the landmarks in low bit-rate speech coding is the linear prediction coding (LPC) algorithm, discussed in Chapter 9. Chapter 10 is devoted to regular pulse excitation coders, with a thorough description of the GSM 6.10 standard. Principles of code-excited linear prediction (CELP) are given in Chapter 11, covering the various aspects of analysis-by-synthesis, signal calculation, postfilter design, and efficiency. Chapters 12 and 13 present the structure of two standardized CELP coders: FS1016 and IS54, respectively; these are both milestones in speech coding development. Chapter 14 is dedicated to the G.728 low-delay CELP standard, with thorough explanations of strategies for delay reduction and detailed structures of the coder. Vector quantization of LPC is included in Chapter 15, representing a huge advance with respect to scalar quantization techniques covered in Chapter 8, and methods used by various standardized coders are analyzed. The highly influential algebraic CELP (ACELP) algorithm is covered in Chapter 16, where several ACELP-based standards are described, with focus on the G.729 standard. The mixed excitation linear prediction (MELP) algorithm is discussed in Chapter 17, and is shown to be an improvement upon the LPC coder, covered in Chapter 9. Chapter 18 is devoted to the IS96 variable bit-rate CELP algorithm, which is a source-controlled multimode coder with the operating mode selected by the input characteristics of the speech signal. Finally, Chapter 19 is concerned with various methods to assess the quality of speech signals, especially those processed by a speech coding algorithm.

The following table summarizes the chapters and their prerequisites.

Chapter  Title                                                  Prerequisites
1        Introduction
2        Signal Processing Techniques                           1
3        Stochastic Processes and Models
4        Linear Prediction                                      1, 2, 3
5        Scalar Quantization
6        Pulse Code Modulation and its Variants                 4, 5
7        Vector Quantization                                    5
8        Scalar Quantization of Linear Prediction Coefficients  4, 5
9        Linear Prediction Coding                               4, 8
10       Regular-Pulse Excitation Coders                        4, 8
11       Code-Excited Linear Prediction                         2, 4
12       The Federal Standard Version of CELP                   2, 8, 11
13       Vector Sum Excited Linear Prediction                   8, 12
14       Low-Delay CELP                                         4, 11
15       Vector Quantization of Linear Prediction Coefficients  7, 8
16       Algebraic CELP                                         7, 12, 15
17       Mixed Excitation Linear Prediction                     9, 15
18       Source-Controlled Variable Bit-Rate CELP               11
19       Speech Quality Assessment                              1

Acknowledgments

Throughout my professional career, I have had the opportunity to work with and learn from a number of people whom I should like to publicly acknowledge. My former advisor Dr. Nirmal K. Bose at the Pennsylvania State University had provided me with invaluable instruction, trust, and friendship during my graduate studies; his methodical style, hard-working spirit, and commitment toward education have served as a role model to follow. I am grateful to my former supervisor Dr. Tandhoni S. Rao at Texas Instruments Inc., who had guided me through projects involving adaptive filters, speech coding, and programming of digital signal processors.

I would like to dedicate this book to my parents who have always encouraged my academic interests and provided the moral support throughout my life and career. I am deeply indebted to my cousin Chi-Ming Chu and wife Kam-Chi Chu for their help and support during my graduate studies at Stevens Tech; their industriousness and candid spirit have given me a great deal of positive influence.

I am particularly indebted to my wife Laura for her love and patience, and for thoroughly reviewing and proofreading the first version of the manuscript.

I am grateful to the Wiley team for their professionalism and help during the production of this book; special thanks to George Telecki (Executive Editor) and Rosalyn Farkas (Associate Editor). I am also most grateful to Dr. Andreas Spanias and Dr. Allen Levesque, both early reviewers of the manuscript, for their encouraging comments and constructive critiques. I also wish to thank my former colleague at Texas Instruments Inc., Wai-Ming Lai, for her help in examining some chapters of the text.

Last but not least, this book is dedicated to Universidad Simón Bolívar, the school where I received most of my early engineering education. Universidad Simón Bolívar has generously given me the vigor, the fortitude, and the wisdom needed to overcome obstacles and master difficulties, both in engineering and in life. With this book I hope to give those aspiring to this branch of engineering the same that the respected university has given me.

Feedback

A book of this length is certain to contain errors and omissions. While attempts were made to provide highly understandable and correct content, there are doubtless many places where improvements are possible. Feedback is welcome to the author via email at wcc2@ieee.org. Please note that a personal reply to all messages might not be possible.

WAI C. CHU
