A Guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC)
Contents

Section 1 Introduction to MPEG
1.1 Convergence
1.2 Why compression is needed
1.3 Applications of compression
1.4 Introduction to video compression
1.5 Introduction to audio compression
1.6 MPEG signals
1.7 Need for monitoring and analysis
1.8 Pitfalls of compression

Section 2 Compression in Video
2.1 Spatial or temporal coding?
2.2 Spatial coding
2.3 Weighting
2.4 Scanning
2.5 Entropy coding
2.6 A spatial coder
2.7 Temporal coding
2.8 Motion compensation
2.9 Bidirectional coding
2.10 I, P, and B pictures
2.11 An MPEG compressor
2.12 Preprocessing
2.13 Profiles and levels
2.14 Wavelets

Section 3 Audio Compression
3.1 The hearing mechanism
3.2 Subband coding
3.3 MPEG Layer 1
3.4 MPEG Layer 2
3.5 Transform coding
3.6 MPEG Layer 3
3.7 AC-3

Section 4 Elementary Streams
4.1 Video elementary stream syntax
4.2 Audio elementary streams

Section 5 Packetized Elementary Streams (PES)
5.1 PES packets
5.2 Time stamps
5.3 PTS/DTS

Section 6 Program Streams
6.1 Recording vs. transmission
6.2 Introduction to program streams

Section 7 Transport Streams
7.1 The job of a transport stream
7.2 Packets
7.3 Program Clock Reference (PCR)
7.4 Packet Identification (PID)
7.5 Program Specific Information (PSI)

Section 8 Introduction to DVB/ATSC
8.1 An overall view
8.2 Remultiplexing
8.3 Service Information (SI)
8.4 Error correction
8.5 Channel coding
8.6 Inner coding
8.7 Transmitting digits

Section 9 MPEG Testing
9.1 Testing requirements
9.2 Analyzing a transport stream
9.3 Hierarchic view
9.4 Interpreted view
9.5 Syntax and CRC analysis
9.6 Filtering
9.7 Timing analysis
9.8 Elementary stream testing
9.9 Sarnoff compliant bit streams
9.10 Elementary stream analysis
9.11 Creating a transport stream
9.12 Jitter generation
9.13 DVB tests

Glossary
SECTION 1
INTRODUCTION TO MPEG

MPEG is one of the most popular audio/video compression techniques because it is not just a single standard. Instead, it is a range of standards suitable for different applications but based on similar principles. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the ISO (International Standards Organization) to work on compression.

MPEG can be described as the interaction of acronyms. As ETSI stated, "The CAT is a pointer to enable the IRD to find the EMMs associated with the CA system(s) that it uses." If you can understand that sentence, you don't need this book.
1.1 Convergence
Digital techniques have made rapid progress in audio and video for a number of reasons. Digital information is more robust and can be coded to substantially eliminate error. This means that generation loss in recording and losses in transmission are eliminated. The Compact Disc was the first consumer product to demonstrate this.

While the CD has an improved sound quality with respect to its vinyl predecessor, comparison of quality alone misses the point. The real point is that digital recording and transmission techniques allow content manipulation to a degree that is impossible with analog. Once audio or video are digitized, they become data. Such data cannot be distinguished from any other kind of data; therefore, digital video and audio become the province of computer technology.

The convergence of computers and audio/video is an inevitable consequence of the key inventions of computing and Pulse Code Modulation. Digital media can store any type of information, so it is easy to utilize a computer storage device for digital video. The nonlinear workstation was the first example of an application of convergent technology that did not have an analog forerunner. Another example, multimedia, mixed the storage of audio, video, graphics, text and data on the same medium. Multimedia is impossible in the analog domain.
1.2 Why compression is needed
The initial success of digital video was in post-production applications, where the high cost of digital video was offset by its limitless layering and effects capability. However, production-standard digital video generates over 200 megabits per second of data, and this bit rate requires extensive capacity for storage and wide bandwidth for transmission. Digital video could only be used in wider applications if the storage and bandwidth requirements could be eased; easing these requirements is the purpose of compression.

Compression is a way of expressing digital audio and video by using less data. Compression has the following advantages:

A smaller amount of storage is needed for a given amount of source material. With high-density recording, such as with tape, compression allows highly miniaturized equipment for consumer and Electronic News Gathering (ENG) use. The access time of tape improves with compression because less tape needs to be shuttled to skip over a given amount of program. With expensive storage media such as RAM, compression makes new applications affordable.

When working in real time, compression reduces the bandwidth needed. Additionally, compression allows faster-than-real-time transfer between media, for example, between tape and disk.

A compressed recording format can afford a lower recording density, and this can make the recorder less sensitive to environmental factors and maintenance.
1.3 Applications of compression
Compression has a long association with television. Interlace is a simple form of compression giving a 2:1 reduction in bandwidth. The use of color-difference signals instead of GBR is another form of compression. Because the eye is less sensitive to color detail, the color-difference signals need less bandwidth. When color broadcasting was introduced, the channel structure of monochrome had to be retained, and composite video was developed. Composite video systems, such as PAL, NTSC and SECAM, are forms of compression because they use the same bandwidth for color as was used for monochrome.

Figure 1.1a shows that in traditional television systems, the GBR camera signal is converted to Y, Pr, Pb components for production and encoded into analog composite for transmission. Figure 1.1b shows the modern equivalent. The Y, Pr, Pb signals are digitized and carried as Y, Cr, Cb signals in SDI form through the production process prior to being encoded with MPEG for transmission. Clearly, MPEG can be considered by the broadcaster as a more efficient replacement for composite video. In addition, MPEG has greater flexibility because the bit rate required can be adjusted to suit the application. At lower bit rates and resolutions, MPEG can be used for video conferencing and video telephones.

DVB and ATSC (the European- and American-originated digital-television broadcasting standards) would not be viable without compression because the bandwidth required would be too great. Compression extends the playing time of DVD (digital video/versatile disc), allowing full-length movies on a standard size compact disc. Compression also reduces the cost of Electronic News Gathering and other contributions to television production.

In tape recording, mild compression eases tolerances and adds reliability in Digital Betacam and Digital-S, whereas in SX, DVC, DVCPRO and DVCAM, the goal is miniaturization. In magnetic disk drives, such as the Tektronix Profile® storage system, that are used in file servers and networks (especially for news purposes), compression lowers storage cost. Compression also lowers bandwidth, which allows more users to access a given server. This characteristic is also important for VOD (Video On Demand) applications.
1.4 Introduction to video compression
In all real program material, there are two types of components of the signal: those which are novel and unpredictable and those which can be anticipated. The novel component is called entropy and is the true information in the signal. The remainder is called redundancy because it is not essential. Redundancy may be spatial, as it is in large plain areas of picture where adjacent pixels have almost the same value. Redundancy can also be temporal, as it is where similarities between successive pictures are used. All compression systems work by separating the entropy from the redundancy in the encoder. Only the entropy is recorded or transmitted, and the decoder computes the redundancy from the transmitted signal. Figure 1.2a shows this concept.

An ideal encoder would extract all the entropy and only this would be transmitted to the decoder. An ideal decoder would then reproduce the original signal. In practice, this ideal cannot be reached. An ideal coder would be complex and cause a very long delay in order to use temporal redundancy. In certain applications, such as recording or broadcasting, some delay is acceptable, but in videoconferencing it is not. In some cases, a very complex coder would be too expensive. It follows that there is no one ideal compression system.

In practice, a range of coders is needed which have a range of processing delays and complexities. The power of MPEG is that it is not a single compression format, but a range of standardized coding tools that can be combined flexibly to suit a range of applications. The way in which coding has been performed is included in the compressed data so that the decoder can automatically handle whatever the coder decided to do.

MPEG coding is divided into several profiles that have different complexity, and each profile can be implemented at a different level depending on the resolution of the input picture. Section 2 considers profiles and levels in detail.

There are many different digital video formats and each has a different bit rate. For example, a high definition system might have six times the bit rate of a standard definition system. Consequently, just knowing the bit rate out of the coder is not very useful. What matters is the compression factor, which is the ratio of the input bit rate to the compressed bit rate, for example 2:1, 5:1, and so on.

Unfortunately, the number of variables involved makes it very difficult to determine a suitable compression factor. Figure 1.2a shows that for an ideal coder, if all of the entropy is sent, the quality is good. However, if the compression factor is increased in order to reduce the bit rate, not all of the entropy is sent and the quality falls. Note that in a compressed system, when quality loss occurs, it is steep (Figure 1.2b). If the available bit rate is inadequate, it is better to avoid this area by reducing the entropy of the input picture. This can be done by filtering. The loss of resolution caused by the filtering is subjectively more acceptable than the artifacts that would otherwise be produced.
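The compression factor itself is simple arithmetic, as the short Python sketch below shows for Rec. 601 standard-definition video; the 4 Mb/s output rate used here is only an assumed example of a typical broadcast target.

# Compression-factor arithmetic for Rec. 601 standard definition.
LUMA_RATE = 13_500_000            # Rec. 601 luminance sampling rate, Hz
BITS_PER_SAMPLE = 8
# 4:2:2 adds two color-difference samples for every two luma samples,
# doubling the total sample rate.
uncompressed = LUMA_RATE * 2 * BITS_PER_SAMPLE    # 216,000,000 bits/s

compressed = 4_000_000            # assumed coder output, bits/s
print("compression factor = %.0f:1" % (uncompressed / compressed))  # 54:1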
Figure 1.1. (a) A traditional system: GBR camera signals are matrixed to Y, Pr, Pb components for production and composite-encoded (PAL, NTSC or SECAM) for analog transmission. (b) The modern equivalent: Y, Pr, Pb is digitized, carried as Y, Cr, Cb in SDI form through production, and MPEG-coded for compressed digital transmission.
To identify the entropy perfectly, an ideal compressor would have to be extremely complex. A practical compressor may be less complex for economic reasons and must send more data to be sure of carrying all of the entropy. Figure 1.2b shows the relationship between coder complexity and performance. The higher the compression factor required, the more complex the encoder has to be.

The entropy in video signals varies. A recording of an announcer delivering the news has much redundancy and is easy to compress. In contrast, it is more difficult to compress a recording with leaves blowing in the wind, or one of a football crowd that is constantly moving and therefore has less redundancy (more information or entropy). In either case, if all the entropy is not sent, there will be quality loss. Thus, we may choose between a constant bit-rate channel with variable quality or a constant quality channel with variable bit rate. Telecommunications network operators tend to prefer a constant bit rate for practical purposes, but a buffer memory can be used to average out entropy variations if the resulting increase in delay is acceptable. In recording, a variable bit rate may be easier to handle, and DVD uses variable bit rate, speeding up the disc where difficult material exists.
Intra-coding (intra = within) is a technique that exploits spatial redundancy, or redundancy within the picture; inter-coding (inter = between) is a technique that exploits temporal redundancy. Intra-coding may be used alone, as in the JPEG standard for still pictures, or combined with inter-coding as in MPEG.

Intra-coding relies on two characteristics of typical images. First, not all spatial frequencies are simultaneously present, and second, the higher the spatial frequency, the lower the amplitude is likely to be. Intra-coding requires analysis of the spatial frequencies in an image. This analysis is the purpose of transforms such as wavelets and DCT (discrete cosine transform). Transforms produce coefficients which describe the magnitude of each spatial frequency. Typically, many coefficients will be zero, or nearly zero, and these coefficients can be omitted, resulting in a reduction in bit rate.

Inter-coding relies on finding similarities between successive pictures. If a given picture is available at the decoder, the next picture can be created by sending only the picture differences. The picture differences will be increased when objects move, but this increase can be offset by using motion compensation, since a moving object does not generally change its appearance very much from one picture to the next. If the motion can be measured, a closer approximation to the current picture can be created by shifting part of the previous picture to a new location. The shifting process is controlled by a vector that is transmitted to the decoder. The vector transmission requires less data than sending the picture-difference data.
MPEG can handle both interlaced and non-interlaced images. An image at some point on the time axis is called a "picture," whether it is a field or a frame. Interlace is not ideal as a source for digital compression because it is in itself a compression technique. Temporal coding is made more complex because pixels in one field are in a different position to those in the next.

Motion compensation minimizes but does not eliminate the differences between successive pictures. The picture difference is itself a spatial image and can be compressed using transform-based intra-coding as previously described. Motion compensation simply reduces the amount of data in the difference image.

The efficiency of a temporal coder rises with the time span over which it can act. Figure 1.2c shows that if a high compression factor is required, a longer time span in the input must be considered and thus a longer coding delay will be experienced. Clearly, temporally coded signals are difficult to edit because the content of a given output picture may be based on image data which was transmitted some time earlier. Production systems will have to limit the degree of temporal coding to allow editing, and this limitation will in turn limit the available compression factor.
Figure 1.2. (a) An ideal coder sends only the entropy of the PCM video; a non-ideal coder has to send more, and a short-delay coder even more. (b) Quality versus coder complexity. (c) Quality versus latency.
1.5 Introduction to audio compression
The bit rate of a PCM digital audio channel is only about one megabit per second, which is about 0.5% of 4:2:2 digital video. With mild video compression schemes, such as Digital Betacam, audio compression is unnecessary. But, as the video compression factor is raised, it becomes necessary to compress the audio as well.

Audio compression takes advantage of two facts. First, in typical audio signals, not all frequencies are simultaneously present. Second, because of the phenomenon of masking, human hearing cannot discern every detail of an audio signal. Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low. Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.

Audio compression is not as easy to achieve as is video compression because of the acuity of hearing. Masking only works properly when the masking and the masked sounds coincide spatially. Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage. Consequently, in stereo and surround sound systems, a lower compression factor is allowable for a given quality. Another factor complicating audio compression is that delayed resonances in poor loudspeakers actually mask compression artifacts. Testing a compressor with poor speakers gives a false result, and signals which are apparently satisfactory may be disappointing when heard on good equipment.

1.6 MPEG signals
The output of a single MPEG audio or video coder is called an Elementary Stream. An Elementary Stream is an endless near real-time signal. For convenience, it can be broken into convenient-sized data blocks in a Packetized Elementary Stream (PES). These data blocks need header information to identify the start of the packets and must include time stamps because packetizing disrupts the time axis.

Figure 1.3 shows that one video PES and a number of audio PES can be combined to form a Program Stream, provided that all of the coders are locked to a common clock. Time stamps in each PES ensure lip-sync between the video and audio.

Program Streams have variable-length packets with headers. They find use in data transfers to and from optical and hard disks, which are error free and in which files of arbitrary sizes are expected. DVD uses Program Streams.

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream. A Transport Stream differs from a Program Stream in that the PES packets are further subdivided into short fixed-size packets and in that multiple programs encoded with different clocks can be carried. This is possible because a transport stream has a program clock reference (PCR) mechanism which allows transmission of multiple clocks, one of which is selected and regenerated at the decoder. A Single Program Transport Stream (SPTS) is also possible, and this may be found between a coder and a multiplexer. Since a Transport Stream can genlock the decoder clock to the encoder clock, the SPTS is more common than the Program Stream.

A Transport Stream is more than just a multiplex of audio and video PES. In addition to the compressed audio, video and data, a Transport Stream includes a great deal of metadata describing the bit stream. This includes the Program Association Table (PAT) that lists every program in the transport stream. Each entry in the PAT points to a Program Map Table (PMT) that lists the elementary streams making up each program. Some programs will be open, but some programs may be subject to conditional access (encryption), and this information is also carried in the metadata.

The Transport Stream consists of fixed-size data packets, each containing 188 bytes. Each packet carries a packet identifier code (PID). Packets in the same elementary stream all have the same PID, so that the decoder (or a demultiplexer) can select the elementary stream(s) it wants and reject the remainder. Packet-continuity counts ensure that every packet that is needed to decode a stream is received. An effective synchronization system is needed so that decoders can correctly identify the beginning of each packet and deserialize the bit stream into words.

Figure 1.3. Video and audio encoders feed packetizers; the resulting PES (plus data) are combined either by a Program Stream multiplexer (as on DVD) or by a Transport Stream multiplexer (producing, for a single program, an SPTS).
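The fixed 188-byte packet structure makes the demultiplexer's first steps easy to express in code. The following Python sketch parses the standard header fields (sync byte 0x47, 13-bit PID, continuity counter); the packet itself would come from whatever stream source is being analyzed.

# Sketch: parse the 4-byte header of one 188-byte transport stream packet.
def parse_ts_header(packet: bytes) -> dict:
    if len(packet) != 188 or packet[0] != 0x47:       # 0x47 is the sync byte
        raise ValueError("not a valid transport stream packet")
    return {
        "transport_error":    bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid":                ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }

# A demultiplexer keeps packets whose PID matches a wanted elementary
# stream and checks that the continuity counter increments modulo 16.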
1.7 Need for monitoring and analysis
The MPEG transport stream is an extremely complex structure using interlinked tables and coded identifiers to separate the programs and the elementary streams within the programs. Within each elementary stream, there is a complex structure, allowing a decoder to distinguish between, for example, vectors, coefficients and quantization tables.

Failures can be divided into two broad categories. In the first category, the transport system correctly multiplexes and delivers information from an encoder to a decoder with no bit errors or added jitter, but the encoder or the decoder has a fault. In the second category, the encoder and decoder are fine, but the transport of data from one to the other is defective. It is very important to know whether the fault lies in the encoder, the transport, or the decoder if a prompt solution is to be found.

Synchronizing problems, such as loss or corruption of sync patterns, may prevent reception of the entire transport stream. Transport-stream protocol defects may prevent the decoder from finding all of the data for a program, perhaps delivering picture but not sound. Correct delivery of the data but with excessive jitter can cause decoder timing problems.

If a system using an MPEG transport stream fails, the fault could be in the encoder, the multiplexer, or in the decoder. How can this fault be isolated? First, verify that the transport stream is compliant with the MPEG-coding standards. If the stream is not compliant, a decoder can hardly be blamed for having difficulty. If it is, the decoder may need attention.

Traditional video testing tools, the signal generator, the waveform monitor and the vectorscope, are not appropriate in analyzing MPEG systems, except to ensure that the video signals entering and leaving an MPEG system are of suitable quality. Instead, a reliable source of valid MPEG test signals is essential for testing receiving equipment and decoders. With a suitable analyzer, the performance of encoders, transmission systems, multiplexers and remultiplexers can be assessed with a high degree of confidence. As a long-standing supplier of high-grade test equipment to the video industry, Tektronix continues to provide test and measurement solutions as the technology evolves, giving the MPEG user the confidence that complex compressed systems are correctly functioning and allowing rapid diagnosis when they are not.
1.8 Pitfalls of compression
MPEG compression is lossy in that what is decoded is not identical to the original. The entropy of the source varies, and when entropy is high, the compression system may leave visible artifacts when decoded.

In temporal compression, redundancy between successive pictures is assumed. When this is not the case, the system fails. An example is video from a press conference where flashguns are firing. Individual pictures containing the flash are totally different from their neighbors, and coding artifacts become obvious.

Irregular motion or several independently moving objects on screen require a lot of vector bandwidth, and this requirement may only be met by reducing the picture-data bandwidth. Again, visible artifacts may occur whose level varies and depends on the motion. This problem often occurs in sports-coverage video.

Coarse quantizing results in luminance contouring and posterized color. These can be seen as blotchy shadows and blocking on large areas of plain color. Subjectively, compression artifacts are more annoying than the relatively constant impairments resulting from analog television transmission systems.

The only solution to these problems is to reduce the compression factor. Consequently, the compression user has to make a value judgment between the economy of a high compression factor and the level of artifacts.

In addition to extending the encoding and decoding delay, temporal coding also causes difficulty in editing. In fact, an MPEG bit stream cannot be arbitrarily edited at all. This restriction occurs because, in temporal coding, the decoding of one picture may require the contents of an earlier picture, and the contents may not be available following an edit. The fact that pictures may be sent out of sequence also complicates editing.

If suitable coding has been used, edits can take place only at splice points, which are relatively widely spaced. If arbitrary editing is required, the MPEG stream must undergo a read-modify-write process, which will result in generation loss.

The viewer is not interested in editing, but the production user will have to make another value judgment about the edit flexibility required. If greater flexibility is required, the temporal compression has to be reduced and a higher bit rate will be needed.
SECTION 2
COMPRESSION IN VIDEO

This section shows how video compression is based on the perception of the eye. Important enabling techniques, such as transforms and motion compensation, are considered as an introduction to the structure of an MPEG coder.

2.1 Spatial or temporal coding?
As was seen in Section 1, video compression can take advantage of both spatial and temporal redundancy. In MPEG, temporal redundancy is reduced first by using similarities between successive pictures. As much as possible of the current picture is created or "predicted" by using information from pictures already sent. When this technique is used, it is only necessary to send a difference picture, which eliminates the differences between the actual picture and the prediction. The difference picture is then subject to spatial compression. As a practical matter, it is easier to explain spatial compression prior to explaining temporal compression.

Spatial compression relies on similarities between adjacent pixels in plain areas of picture and on dominant spatial frequencies in areas of patterning. The JPEG system uses spatial compression only, since it is designed to transmit individual still pictures. However, JPEG may be used to code a succession of individual pictures for video. In the so-called "Motion JPEG" application, the compression factor will not be as good as if temporal coding was used, but the bit stream will be freely editable on a picture-by-picture basis.

2.2 Spatial coding
The first step in spatial coding is to perform an analysis of spatial frequency using a transform. A transform is simply a way of expressing a waveform in a different domain, in this case, the frequency domain. The output of a transform is a set of coefficients that describe how much of a given frequency is present. An inverse transform reproduces the original waveform. If the coefficients are handled with sufficient accuracy, the output of the inverse transform is identical to the original waveform.

The most well-known transform is the Fourier transform. This transform finds each frequency in the input signal. It finds each frequency by multiplying the input waveform by a sample of a target frequency, called a basis function, and integrating the product. Figure 2.1 shows that when the input waveform does not contain the target frequency, the integral will be zero, but when it does, the integral will be a coefficient describing the amplitude of that component frequency.

Figure 2.1. Correlation with a basis function: high correlation if the frequency is the same; no correlation if the frequency is different.

The results will be as described if the frequency component is in phase with the basis function. However, if the frequency component is in quadrature with the basis function, the integral will still be zero. Therefore, it is necessary to perform two searches for each frequency, with the basis functions in quadrature with one another, so that every phase of the input will be detected.

The Fourier transform has the disadvantage of requiring coefficients for both sine and cosine components of each frequency. In the cosine transform, the input waveform is time-mirrored with itself prior to multiplication by the basis functions. Figure 2.2 shows that this mirroring cancels out all sine components (the sine component inverts at the mirror and cancels) and doubles all of the cosine components. The sine basis function is unnecessary, and only one coefficient is needed for each frequency.

The discrete cosine transform (DCT) is the sampled version of the cosine transform and is used extensively in two-dimensional form in MPEG. A block of 8 x 8 pixels is transformed to become a block of 8 x 8 coefficients. Since the transform requires multiplication by fractions, there is wordlength extension, resulting in coefficients that have longer wordlength than the pixel values.
Typically, an 8-bit pixel block results in an 11-bit coefficient block. Thus, a DCT does not result in any compression; in fact, it results in the opposite. However, the DCT converts the source pixels into a form where compression is easier.

Figure 2.3 shows the results of an inverse transform of each of the individual coefficients of an 8 x 8 DCT. In the case of the luminance signal, the top-left coefficient is the average brightness or DC component of the whole block. Moving across the top row, horizontal spatial frequency increases. Moving down the left column, vertical spatial frequency increases. In real pictures, different vertical and horizontal spatial frequencies may occur simultaneously, and a coefficient at some point within the block will represent all possible horizontal and vertical combinations.
Figure 2.3 also shows the coefficients as one-dimensional horizontal spatial frequency waveforms. Combining these waveforms with various amplitudes and either polarity can reproduce any combination of 8 pixels. Thus, combining the 64 coefficients of the 2-D DCT will result in the original 8 x 8 pixel block. Clearly, for color pictures, the color-difference samples will also need to be handled. Y, Cr, and Cb data are assembled into separate 8 x 8 arrays and are transformed individually.
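A two-dimensional DCT of a single 8 x 8 block can be demonstrated in a few lines of Python using SciPy; the smooth test block here is invented for illustration.

# Sketch: 8 x 8 DCT of one luminance block.
import numpy as np
from scipy.fft import dctn

block = np.tile(np.arange(8, dtype=float), (8, 1))  # smooth horizontal ramp
coeffs = dctn(block, norm="ortho")                  # 8 x 8 coefficients

# The top-left (DC) coefficient is proportional to the block average.
# For this smooth block, all significant energy lies in the first row
# (horizontal frequencies); most of the 64 coefficients are near zero.
print(np.round(coeffs, 1))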
In much real program material, many of the coefficients will have zero or near-zero values and, therefore, will not be transmitted. This fact results in significant compression that is virtually lossless. If a higher compression factor is needed, then the wordlength of the non-zero coefficients must be reduced. This reduction will reduce the accuracy of these coefficients and will introduce losses into the process. With care, the losses can be introduced in a way that is least visible to the viewer.
2.3 Weighting
Figure 2.4 shows that the human perception of noise in pictures is not uniform but is a function of the spatial frequency. More noise can be tolerated at high spatial frequencies. Also, video noise is effectively masked by fine detail in the picture, whereas in plain areas it is highly visible. The reader will be aware that traditional noise measurements are always weighted so that the technical measurement relates to the subjective result.

Figure 2.4. Human vision sensitivity to noise falls as spatial frequency rises.

Compression reduces the accuracy of coefficients and has a similar effect to using shorter-wordlength samples in PCM; that is, the noise level rises. In PCM, the result of shortening the wordlength is that the noise level rises equally at all frequencies. As the DCT splits the signal into different frequencies, it becomes possible to control the spectrum of the noise. Effectively, low-frequency coefficients are rendered more accurately than high-frequency ones.
Figure 2.5 shows that, in the weighting process, the coefficients from the DCT are divided by constants that are a function of two-dimensional frequency. Low-frequency coefficients will be divided by small numbers, and high-frequency coefficients will be divided by large numbers. Following the division, the least-significant bit is discarded or truncated. This truncation is a form of requantizing. In the absence of weighting, this requantizing would have the effect of doubling the size of the quantizing step, but with weighting, it increases the step size according to the division factor.

As a result, coefficients representing low spatial frequencies are requantized with relatively small steps and suffer little increased noise. Coefficients representing higher spatial frequencies are requantized with large steps and suffer more noise. However, fewer steps means that fewer bits are needed to identify the step, and a compression is obtained.

In the decoder, a low-order zero will be added to return the weighted coefficients to their correct magnitude. They will then be multiplied by inverse weighting factors. Clearly, at high frequencies the multiplication factors will be larger, so the requantizing noise will be greater. Following inverse weighting, the coefficients will have their original DCT output values, plus requantizing error, which will be greater at high frequency than at low frequency.

As an alternative to truncation, weighted coefficients may be nonlinearly requantized so that the quantizing step size increases with the magnitude of the coefficient. This technique allows higher compression factors but worse levels of artifacts.

Clearly, the degree of compression obtained and, in turn, the output bit rate obtained, is a function of the severity of the requantizing process. Different bit rates will require different weighting tables. In MPEG, it is possible to use various different weighting tables, and the table in use can be transmitted to the decoder, so that correct decoding automatically occurs.

Figure 2.5. Weighting: input DCT coefficients are divided by a quantizing matrix value (chosen according to the coefficient location) and by a quantizing scale value (one value used for a complete 8 x 8 block) before truncation. The values shown in the original figure are illustrative, not actual results.
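The weighting process can be sketched as follows; the 4 x 4 matrices are invented for brevity, whereas real MPEG uses 8 x 8 quantizing matrices together with a quantizing scale value.

# Sketch: requantize DCT coefficients with frequency-dependent step sizes.
import numpy as np

coeffs = np.array([[620., 40., 8., 2.],
                   [ 35., 12., 3., 1.],
                   [  6.,  3., 1., 0.],
                   [  2.,  1., 0., 0.]])
quant = np.array([[ 8., 16., 24., 32.],       # divisors grow with
                  [16., 24., 32., 48.],       # spatial frequency
                  [24., 32., 48., 64.],
                  [32., 48., 64., 96.]])

levels = np.round(coeffs / quant)     # small integers: cheap to transmit
restored = levels * quant             # decoder applies inverse weighting
print(coeffs - restored)              # requantizing error is largest where
                                      # the steps (high frequencies) are big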
2.4 Scanning
In typical program material, the significant DCT coefficients are generally found in the top-left corner of the matrix. After weighting, low-value coefficients might be truncated to zero. More efficient transmission can be obtained if all of the non-zero coefficients are sent first, followed by a code indicating that the remainder are all zero. Scanning is a technique which increases the probability of achieving this result, because it sends coefficients in descending order of magnitude probability.

Figure 2.6a shows that in a non-interlaced system, the probability of a coefficient having a high value is highest in the top-left corner and lowest in the bottom-right corner. A 45-degree diagonal zig-zag scan is the best sequence to use here.

In Figure 2.6b, the scan for an interlaced source is shown. In an interlaced picture, an 8 x 8 DCT block from one field extends over twice the vertical screen area, so that for a given picture detail, vertical frequencies will appear to be twice as great as horizontal frequencies. Thus, the ideal scan for an interlaced picture will be on a diagonal that is twice as steep. Figure 2.6b shows that a given vertical spatial frequency is scanned before scanning the same horizontal spatial frequency.

Figure 2.6. Scan patterns: (a) the classic zig-zag scan, nominally for frames; (b) the steeper alternate scan for interlaced sources.
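The classic zig-zag sequence can be generated by ordering coefficient positions along the anti-diagonals, as in this Python sketch:

# Sketch: generate the classic 45-degree zig-zag scan for an n x n block.
def zigzag_order(n=8):
    # Positions on one anti-diagonal share row + col; the direction of
    # travel alternates on successive diagonals.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]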
2.5 Entropy coding
In real video, not all spatial frequencies are simultaneously present; therefore, the DCT coefficient matrix will have zero terms in it. Despite the use of scanning, zero coefficients will still appear between the significant values. Run length coding (RLC) allows these coefficients to be handled more efficiently. Where repeating values, such as a string of zeros, are present, run length coding simply transmits the number of zeros rather than each individual bit.

The probability of occurrence of particular coefficient values in real video can be studied. In practice, some values occur very often; others occur less often. This statistical information can be used to achieve further compression using variable length coding (VLC). Frequently occurring values are converted to short code words, and infrequent values are converted to long code words. To aid deserialization, no code word can be the prefix of another.
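Run length coding of a scanned coefficient list can be sketched as follows; real MPEG then maps the resulting (run, value) pairs to variable length codes from standardized tables, a step only hinted at in the comments here.

# Sketch: run length code a zig-zag scanned coefficient list.
def run_length_encode(coeffs):
    trimmed = list(coeffs)
    while trimmed and trimmed[-1] == 0:   # everything after the last
        trimmed.pop()                     # non-zero value becomes EOB
    pairs, run = [], 0
    for value in trimmed:
        if value == 0:
            run += 1                      # count zeros rather than send them
        else:
            pairs.append((run, value))
            run = 0
    return pairs + ["EOB"]

print(run_length_encode([28, 6, 0, 0, 3, 0, 1, 0, 0, 0]))
# [(0, 28), (0, 6), (2, 3), (1, 1), 'EOB']; the most frequent pairs
# would then be given the shortest variable length codes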
2.6 A spatial coder
Figure 2.7 ties together all of the preceding spatial coding concepts. The input signal is assumed to be 4:2:2 SDI (Serial Digital Interface), which may have 8- or 10-bit wordlength. MPEG uses only 8-bit resolution; therefore, a rounding stage will be needed when the SDI signal contains 10-bit words. Most MPEG profiles operate with 4:2:0 sampling; therefore, a vertical low-pass filter/interpolation stage will be needed. Rounding and color subsampling introduce a small irreversible loss of information and a proportional reduction in bit rate. The raster-scanned input format will need to be stored so that it can be converted to 8 x 8 pixel blocks.

The DCT stage transforms the picture information to the frequency domain. The DCT itself does not achieve any compression. Following DCT, the coefficients are weighted and truncated, providing the first significant compression. The coefficients are then zig-zag scanned to increase the probability that the significant coefficients occur early in the scan. After the last non-zero coefficient, an EOB (end of block) code is generated.

Coefficient data are further compressed by run length and variable length coding. In a variable bit-rate system, the quantizing is fixed, but in a fixed bit-rate system, a buffer memory is used to absorb variations in coding difficulty. Highly detailed pictures will tend to fill the buffer, whereas plain pictures will allow it to empty. If the buffer is in danger of overflowing, the requantizing steps will have to be made larger, so that the compression factor is effectively raised.

In the decoder, the bit stream is deserialized and the entropy coding is reversed to reproduce the weighted coefficients. The inverse weighting is applied, and coefficients are placed in the matrix according to the zig-zag scan to recreate the DCT matrix; an inverse transform then recreates the pixel blocks. To obtain a 4:2:2 output from 4:2:0 data, a vertical interpolation process will be needed, as shown in Figure 2.8. The chroma samples in 4:2:0 are positioned halfway between luminance samples in the vertical axis so that they are evenly spaced when an interlaced source is used.

Figure 2.7. A complete spatial coder: 4:2:2 to 4:2:0 conversion (no data reduction), DCT (no loss, no reduction), weighting (fewer bits per coefficient, with preference given to certain coefficients; information lost, data reduced), zig-zag scanning, run length coding (a unique code word instead of strings of zeros) and variable length coding (short words for the most frequent values, like Morse code; data reduced with no loss), with rate control acting on the quantizing.

2.7 Temporal coding
Temporal redundancy can be exploited by inter-coding or transmitting only the differences between pictures. Figure 2.9 shows that a one-picture delay combined with a subtractor can compute the picture differences. The picture difference is an image in its own right and can be further compressed by the spatial coder as was previously described. The decoder reverses the spatial coding and adds the difference picture to the previous picture to obtain the next picture.

There are some disadvantages to this simple system. First, as only differences are sent, it is impossible to begin decoding after the start of the transmission. This limitation makes it difficult for a decoder to provide pictures following a switch from one bit stream to another (as occurs when the viewer changes channels). Second, if any part of the difference data is incorrect, the error in the picture will propagate indefinitely.

The solution to these problems is to use a system that is not completely differential. Figure 2.10 shows that periodically complete pictures are sent. These are called Intra-coded pictures (or I-pictures), and they are obtained by spatial compression only. If an error or a channel switch occurs, it will be possible to resume correct decoding at the next I-picture.

Figure 2.9. A one-picture delay and a subtractor compute picture differences.
Figure 2.10. Periodic complete (I) pictures limit error propagation.
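The purely differential scheme of Figure 2.9 can be sketched directly; the tiny "pictures" below are invented arrays.

# Sketch: differential coding and decoding as in Figure 2.9.
import numpy as np

pictures = [np.full((4, 4), 100), np.full((4, 4), 102)]
pictures[1][1, 1] = 160                    # one changed detail

sent = [pictures[0]]                       # first picture sent complete
for prev, cur in zip(pictures, pictures[1:]):
    sent.append(cur - prev)                # difference: mostly small values

decoded = [sent[0]]
for diff in sent[1:]:
    decoded.append(decoded[-1] + diff)     # decoder adds differences back

assert all((a == b).all() for a, b in zip(pictures, decoded))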
2.8 Motion compensation
Motion reduces the similarities between pictures and increases the data needed to create the difference picture. Motion compensation is used to increase the similarity. Figure 2.11 shows the principle. When an object moves across the TV screen, it may appear in a different place in each picture, but it does not change in appearance very much. The picture difference can be reduced by measuring the motion at the encoder. This is sent to the decoder as a vector. The decoder uses the vector to shift part of the previous picture to a more appropriate place in the new picture.

Figure 2.11. Motion compensation of part of a moving object. Actions: 1. Compute the motion vector. 2. Shift data from picture N using the vector to make a predicted picture N+1. 3. Compare the actual picture with the predicted picture. 4. Send the vector and the prediction error.

One vector controls the shifting of an entire area of the picture that is known as a macroblock. The size of the macroblock is determined by the DCT coding and the color subsampling structure. Figure 2.12a shows that, with a 4:2:0 system, the vertical and horizontal spacing of color samples is exactly twice the spacing of luminance. A single 8 x 8 DCT block of color samples extends over the same area as four 8 x 8 luminance blocks; therefore, this is the minimum picture area which can be shifted by a vector. One 4:2:0 macroblock contains four luminance blocks, one Cr block and one Cb block.

In the 4:2:2 profile, color is only subsampled in the horizontal axis. Figure 2.12b shows that in 4:2:2, a single 8 x 8 DCT block of color samples extends over two luminance blocks. A 4:2:2 macroblock contains four luminance blocks, two Cr blocks and two Cb blocks.

Figure 2.12. (a) 4:2:0 has one quarter as many chroma sampling points as Y. (b) 4:2:2 has twice as much chroma data as 4:2:0.

The motion estimator works by comparing the luminance data from two successive pictures. A macroblock in the first picture is used as a reference. When the input is interlaced, pixels will be at different vertical locations in the two fields, and it will, therefore, be necessary to interpolate one field before it can be compared with the other. The correlation between the reference and the next picture is measured at all possible displacements with a resolution of half a pixel over the entire search range. When the greatest correlation is found, this correlation is assumed to represent the correct motion.

The motion vector has a vertical and a horizontal component. In typical program material, motion continues over a number of pictures. A greater compression factor is obtained if the vectors are transmitted differentially. Consequently, if an object moves at constant speed, the vectors do not change and the vector difference is zero.

Motion vectors are associated with macroblocks, not with real objects in the image, and there will be occasions where part of the macroblock moves and part of it does not. In this case, it is impossible to compensate properly. If the motion of the moving part is compensated by transmitting a vector, the stationary part will be incorrectly shifted, and it will need difference data to be corrected. If no vector is sent, the stationary part will be correct, but difference data will be needed to correct the moving part. A practical compressor might attempt both strategies and select the one which required the least difference data.
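An exhaustive block-matching search of the kind the motion estimator performs can be sketched as follows; only integer-pixel positions are tried here, whereas a real estimator refines the result to half-pixel resolution.

# Sketch: find the motion vector minimizing the sum of absolute
# differences (SAD) between a macroblock and the previous picture.
# Arrays are assumed to be signed (e.g., int32) luminance data.
import numpy as np

def best_vector(prev, block, top, left, search=4):
    h, w = block.shape
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= prev.shape[0] - h and 0 <= x <= prev.shape[1] - w:
                sad = np.abs(prev[y:y+h, x:x+w] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad   # vector and residual cost of the best match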
2.9 Bidirectional coding
When an object moves, it conceals the background at its leading edge and reveals the background at its trailing edge. The revealed background requires new data to be transmitted, because the area of background was previously concealed and no information can be obtained from a previous picture. A similar problem occurs if the camera pans: new areas come into view and nothing is known about them. MPEG helps to minimize this problem by using bidirectional coding, which allows information to be taken from pictures before and after the current picture. If a background is being revealed, it will be present in a later picture, and the information can be moved backwards in time to create part of an earlier picture.

Figure 2.13 shows the concept of bidirectional coding. On an individual macroblock basis, a bidirectionally coded picture can obtain motion-compensated data from an earlier or later picture, or even use an average of earlier and later data. Bidirectional coding significantly reduces the amount of difference data needed by improving the degree of prediction possible. MPEG does not specify how an encoder should be built, only what constitutes a compliant bit stream. However, an intelligent compressor could try all three coding strategies and select the one that results in the least data to be transmitted.

Figure 2.13. Bidirectional coding: an area revealed by a moving object is not in picture N but is present in picture N+2, so picture N+1 can take that data from the later picture.

2.10 I, P and B pictures
In MPEG, three different types of pictures are needed to support differential and bidirectional coding while minimizing error propagation:

I pictures are Intra-coded pictures that need no additional information for decoding. They require a lot of data compared to other picture types, and therefore they are not transmitted any more frequently than necessary. They consist primarily of transform coefficients and have no vectors. I pictures allow the viewer to switch channels, and they arrest error propagation.

P pictures are forward Predicted from an earlier picture, which could be an I picture or a P picture. P-picture data consist of vectors describing where, in the previous picture, each macroblock should be taken from, and of transform coefficients that describe the correction or difference data that must be added to that macroblock. P pictures require roughly half the data of an I picture.

B pictures are Bidirectionally predicted from earlier and later I or P pictures. They require less data than P pictures, and they are never themselves used as the basis for further predictions.
Figure 2.14 introduces the concept of the GOP or Group Of Pictures. The GOP begins with an I picture and then has P pictures spaced throughout. The remaining pictures are B pictures. The GOP is defined as ending at the last picture before the next I picture. The GOP length is flexible, but 12 or 15 pictures is a common value.

Clearly, if data for B pictures are to be taken from a future picture, that data must already be available at the decoder. Consequently, bidirectional coding requires that picture data is sent out of sequence and temporarily stored. Figure 2.14 also shows that the P-picture data are sent before the B-picture data. Note that the last B pictures in the GOP cannot be transmitted until after the I picture of the next GOP, since this data will be needed to bidirectionally decode them. In order to return pictures to their correct sequence, a temporal reference is included with each picture. As the picture rate is also embedded periodically in headers in the bit stream, an MPEG file may be displayed by, for example, a personal computer, in the correct order and timescale.

Figure 2.14. A GOP of Rec. 601 video frames: the elementary stream carries pictures out of display order, and the temporal_reference field restores the correct sequence.

Sending picture data out of sequence requires additional memory at the encoder and decoder and also causes delay. The number of bidirectionally coded pictures used is therefore a trade-off between compression efficiency on one hand and delay, memory and editability on the other. If the ability to edit is important, an IB sequence is a useful compromise.
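The reordering can be sketched as a simple rule: each anchor (I or P) picture is transmitted before the B pictures that depend on it.

# Sketch: display order to transmission order for one GOP.
def transmission_order(display):           # e.g. "IBBPBBPBBPBB"
    out, pending_b = [], []
    for pic in display:
        if pic in "IP":                    # anchor: send it, then release
            out.append(pic)                # the B pictures held back
            out += pending_b
            pending_b = []
        else:
            pending_b.append(pic)
    return "".join(out), "".join(pending_b)

print(transmission_order("IBBPBBPBBPBB"))
# ('IPBBPBBPBB', 'BB'): the final B pictures must wait for the
# next GOP's I picture, as noted above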
2.11 An MPEG compressor
Figures 2.16a, b, and c show a typical bidirectional motion compensator structure. Preprocessed input video enters a series of frame stores that can be bypassed to change the picture order. The data then enter the subtractor and the motion estimator. To create an I picture, see Figure 2.16a, the end of the input delay is selected and the subtractor is turned off, so that the data pass straight through to be spatially coded. Subtractor output data also pass to a frame store that can hold several pictures. The I picture is held in the store.

Figure 2.16a. I pictures: the subtract stage is bypassed and the input is spatially coded directly; shaded areas of the structure (motion estimator, forward and backward predictors) are unused.
Figure 2.16b. P pictures: the motion estimator and forward predictor operate on the past picture held in the output store; shaded areas (the backward predictor) are unused.
To encode a P picture, see Figure 2.16b, the B pictures in the input buffer are bypassed, so that the future picture is selected. The motion estimator compares the I picture in the output store with the P picture in the input store to create forward motion vectors. The I picture is shifted by these vectors to make a predicted P picture. The predicted P picture is subtracted from the actual P picture to produce the prediction error, which is spatially coded and sent along with the vectors. The prediction error is also added to the predicted P picture to create a locally decoded P picture that also enters the output store. This means that the output store contains exactly what the store in the decoder will contain, so that the results of all previous coding errors are present; these will automatically be reduced when the predicted picture is subtracted from the actual picture.
TablesIn
SpatialCoder
RateControl
BackwardPredictionError
Disablefor I,P
SpatialData
VectorsOut
ForwardPredictionError
Norm
Reorder
ForwardVectors
BackwardVectors
GOPControl
CurrentPicture
OutF
MotionEstimator
PastPicture
SpatialDecoder
SubtractPass (I)
Forward
- Backward Decision
BackwardPredictor
FuturePicture
SpatialDecoder
B Pictures
(shaded area is unused)
output is spatially coded andthe vectors are added in a multi-plexer Syntactical data is alsoadded which identifies the type
of picture (I, P or B) and vides other information to help adecoder (see section 4) The out-put data are buffered to allowtemporary variations in bit rate
pro-If the bit rate shows a long termincrease, the buffer will tend tofill up and to prevent overflowthe quantization process willhave to be made more severe.Equally, should the buffer showsigns of underflow, the quantiza-tion will be relaxed to maintainthe average bit rate This meansthat the store contains exactlywhat the store in the decoderwill contain, so that the results
of all previous coding errors arepresent These will automatically
be reduced when the predicted
forward or backward data areselected according to which rep-resent the smallest differences
The picture differences are thenspatially coded and sent withthe vectors
When all of the intermediate
B pictures are coded, the inputmemory will once more bebypassed to create a new P pic-ture based on the previous
P picture
Figure 2.17 shows an MPEGcoder The motion compensator
The output store then contains
an I picture and a P picture A
B picture from the input buffercan now be selected The motioncompensator, see Figure 2.16c,will compare the B picture withthe I picture that precedes it andthe P picture that follows it toobtain bidirectional vectors
Forward and backward motioncompensation is performed toproduce two predicted B pictures
These are subtracted from thecurrent B picture On a macro-block-by-macroblock basis, the
In
OutDemandClock
QuantizingTablesSpatialDataMotionVectors
SyntacticalData
Entropy andRun Length CodingDifferentialCoder
Bidirectional
Coder
(Fig 2.13)
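The buffer-driven rate control described above amounts to a simple feedback loop, sketched here with invented constants:

# Sketch: buffer occupancy feedback adjusting the quantizer scale.
CHANNEL_BITS_PER_PICTURE = 150_000      # constant drain to the channel
BUFFER_SIZE = 1_000_000

occupancy, qscale = BUFFER_SIZE // 2, 8
for coded_bits in (450_000, 200_000, 60_000, 60_000):    # per picture
    occupancy += coded_bits - CHANNEL_BITS_PER_PICTURE
    if occupancy > 0.75 * BUFFER_SIZE:
        qscale = min(31, qscale + 2)    # nearing overflow: quantize harder
    elif occupancy < 0.25 * BUFFER_SIZE:
        qscale = max(1, qscale - 2)     # nearing underflow: relax
    print(occupancy, qscale)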
2.12 Preprocessing
A compressor attempts to eliminate redundancy within the picture and between pictures. Anything which reduces that redundancy is undesirable. Noise and film grain are particularly problematic because they generally occur over the entire picture. After the DCT process, noise results in more non-zero coefficients, which the coder cannot distinguish from genuine picture data. Heavier quantizing will be required to encode all of the coefficients, reducing picture quality. Noise also reduces similarities between successive pictures, increasing the difference data needed.

Residual subcarrier in video decoded from composite video is a serious problem because it results in high spatial frequencies that are normally at a low level in component programs. Subcarrier also alternates from picture to picture, causing an increase in difference data. Naturally, any composite decoding artifact that is visible in the input to the MPEG coder is likely to be reproduced at the decoder.

Any practice that causes unwanted motion is to be avoided. Unstable camera mountings, in addition to giving a shaky picture, increase picture differences and vector transmission requirements. This will also happen with telecine material if film weave or hop due to sprocket-hole damage is present. In general, video that is to be compressed must be of the highest quality possible. If high quality cannot be achieved, then noise reduction and other stabilization techniques will be desirable.

If a high compression factor is required, the level of artifacts can increase, especially if input quality is poor. In this case, it may be better to reduce the entropy entering the coder using prefiltering. The video signal is subject to two-dimensional low-pass filtering, which reduces the number of coefficients needed and reduces the level of artifacts. The picture will be less sharp, but less sharpness is preferable to a high level of artifacts.

In most MPEG-2 applications, 4:2:0 sampling is used, which requires a chroma downsampling process if the source is 4:2:2. In MPEG-1, the luminance and chroma are further downsampled to produce an input picture, or SIF (Source Input Format), that is only 352 pixels wide. This technique reduces the entropy by a further factor. For very high compression, the QSIF (Quarter Source Input Format) picture, which is 176 pixels wide, is used. Downsampling is a process that combines a spatial low-pass filter with an interpolator. Downsampling interlaced signals is problematic because vertical detail is spread over two fields, which may decorrelate due to motion.

When the source material is telecine, the video signal has different characteristics than normal video. In 50 Hz video, pairs of fields represent the same film frame, and there is no motion between them. Thus, the motion between fields alternates between zero and the motion between frames. Since motion vectors are sent differentially, this behavior would result in a serious increase in vector data. In 60 Hz video, 3:2 pulldown is used to obtain 60 Hz from 24 Hz film. One frame is made into two fields, the next is made into three fields, and so on. Consequently, one field in five is completely redundant. MPEG handles film material best by discarding the third field in 3:2 systems. A 24 Hz code in the transmission alerts the decoder to recreate the 3:2 sequence by re-reading a field store. In 50 and 60 Hz telecine, pairs of fields are deinterlaced to create frames, and then motion is measured between frames. The decoder can recreate interlace by reading alternate lines in the frame store.

A cut is a difficult event for a compressor to handle because it results in an almost complete prediction failure, requiring a large amount of correction data. If a coding delay can be tolerated, a coder may detect cuts in advance and modify the GOP structure dynamically, so that the cut is made to coincide with the generation of an I picture. In this case, the cut is handled with very little extra data. The last B pictures before the I frame will almost certainly need to use forward prediction. In some applications that are not real-time, such as DVD mastering, a coder could take two passes at the input video: one pass to identify the difficult or high-entropy areas and create a coding strategy, and a second pass to actually compress the input video.
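The 3:2 cadence is easy to see in code: each film frame alternately yields two or three fields, so every fifth field repeats.

# Sketch: 3:2 pulldown of 24 Hz film frames to 60 Hz fields.
def pulldown_32(frames):
    fields = []
    for i, frame in enumerate(frames):
        fields += [frame] * (2 if i % 2 == 0 else 3)
    return fields

print(pulldown_32(["A", "B", "C", "D"]))
# ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D']: one field in
# five is a repeat that the coder can discard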
2.13 Profiles and levels

MPEG is applicable to a wide range of applications requiring different performance and complexity. Using all of the encoding tools defined in MPEG, there are millions of possible combinations. For practical purposes, the MPEG-2 standard is divided into profiles, and each profile is subdivided into levels (see Figure 2.18). A profile is basically a subset of the entire coding repertoire requiring a certain complexity. A level is a parameter such as the size of the picture or the bit rate used with that profile. In principle, there are 24 combinations, but not all of these have been defined. An MPEG decoder having a given profile and level must also be able to decode lower profiles and levels.

The simple profile does not support bidirectional coding, and so only I and P pictures will be output. This reduces the coding and decoding delay and allows simpler hardware. The simple profile has only been defined at main level (SP@ML).

The main profile is designed for a large proportion of uses. The low level uses a low-resolution input having only 352 pixels per line. The majority of broadcast applications will require the MP@ML (Main Profile at Main Level) subset of MPEG, which supports SDTV (standard definition TV). The high-1440 level is a high-definition scheme that doubles the definition compared to the main level. The high level not only doubles the resolution but maintains that resolution with the 16:9 format by increasing the number of horizontal samples from 1440 to 1920.

In compression systems using spatial transforms and requantizing, it is possible to produce scaleable signals. A scaleable process is one in which the input results in a main signal and a "helper" signal. The main signal can be decoded alone to give a picture of a certain quality, but, if the information from the helper signal is added, some aspect of the quality can be improved.

For example, a conventional MPEG coder, by heavily requantizing coefficients, encodes a picture with a moderate signal-to-noise ratio. If, however, that picture is locally decoded and subtracted pixel-by-pixel from the original, a quantizing-noise picture results. This picture can be compressed and transmitted as the helper signal. A simple decoder only decodes the main, noisy bit stream, but a more complex decoder can decode both bit streams and combine them to produce a low-noise picture. This is the principle of SNR scaleability.

As an alternative, coding only the lower spatial frequencies in an HDTV picture can produce a main bit stream that an SDTV receiver can decode. If the lower-definition picture is locally decoded and subtracted from the original picture, a definition-enhancing picture results. This picture can be coded into a helper signal. A suitable decoder could combine the main and helper signals to recreate the HDTV picture. This is the principle of spatial scaleability. The high profile supports both SNR and spatial scaleability as well as allowing the option of 4:2:2 sampling.

The 4:2:2 profile has been developed for improved compatibility with digital production equipment. This profile allows 4:2:2 operation without requiring the additional complexity of using the high profile. For example, an HP@ML decoder must support SNR scaleability, which is not a requirement for production. The 4:2:2 profile has the same freedom of GOP structure as other profiles, but in practice it is commonly used with short GOPs, making editing easier. 4:2:2 operation requires a higher bit rate than 4:2:0, and the use of short GOPs requires an even higher bit rate for a given quality.
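The two-layer idea behind SNR scaleability can be made concrete with a small numeric sketch. This is illustrative only and not the MPEG-2 syntax: it assumes NumPy, models coarse requantizing as a large uniform step, and sends the quantization residual as the helper.

```python
# Illustrative SNR-scaleability sketch: the base layer is a coarsely
# requantized picture; the helper carries the residual between the
# locally decoded base and the original. Step sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
original = rng.integers(0, 256, (8, 8)).astype(float)  # stand-in for a picture

def quantize(x, step):
    return np.round(x / step) * step   # encode + local decode in one step

base = quantize(original, 32)          # main bit stream: coarse, fewer bits
helper = quantize(original - base, 4)  # helper: finely coded residual

simple_decoder = base                  # decodes the main stream only
full_decoder = base + helper           # combines both layers

print("base-layer error:", np.abs(original - simple_decoder).max())
print("two-layer error: ", np.abs(original - full_decoder).max())
```

The simple decoder's error is bounded by the coarse step, while adding the helper shrinks the error to that of the fine step, which is the quality improvement the text describes.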
Figure 2.18. MPEG-2 profiles and levels. Each defined profile/level combination specifies a maximum picture size, bit rate, picture types, and chroma sampling:

  SP@ML         720x576     15 Mb/s   I,P     4:2:0
  MP@LL         352x288      4 Mb/s   I,P,B   4:2:0
  MP@ML         720x576     15 Mb/s   I,P,B   4:2:0
  MP@H-14       1440x1152   60 Mb/s   I,P,B   4:2:0
  MP@HL         1920x1152   80 Mb/s   I,P,B   4:2:0
  SNR@LL        352x288      4 Mb/s   I,P,B   4:2:0
  SNR@ML        720x576     15 Mb/s   I,P,B   4:2:0
  Spatial@H-14  1440x1152   60 Mb/s   I,P,B   4:2:0
  HP@ML         720x576     20 Mb/s   I,P,B   4:2:0, 4:2:2
  HP@H-14       1440x1152   80 Mb/s   I,P,B   4:2:0, 4:2:2
  HP@HL         1920x1152  100 Mb/s   I,P,B   4:2:0, 4:2:2
  4:2:2P@ML     720x608     50 Mb/s   I,P,B   4:2:2
2.14 Wavelets

All transforms suffer from uncertainty: the more accurately the frequency domain is known, the less accurately the time domain is known (and vice versa). In most transforms, such as the DFT and DCT, the block length is fixed, so the time and frequency resolution is fixed. The frequency coefficients represent evenly spaced values on a linear scale. Unfortunately, because human senses are logarithmic, the even scale of the DFT and DCT gives inadequate frequency resolution at one end and excess resolution at the other.

The wavelet transform is not affected by this problem because its frequency resolution is a fixed fraction of an octave and therefore has a logarithmic characteristic. This is done by changing the block length as a function of frequency; as frequency goes down, the block becomes longer. Thus, a characteristic of the wavelet transform is that the basis functions all contain the same number of cycles, and these cycles are simply scaled along the time axis to search for different frequencies. Figure 2.19 contrasts the fixed block size of the DFT/DCT with the variable size of the wavelet.

Wavelets are especially useful for audio coding because they automatically adapt to the conflicting requirements of the accurate location of transients in time and the accurate assessment of pitch in steady tones.

For video coding, wavelets have the advantage of producing resolution-scaleable signals with almost no extra effort. In moving video, the advantages of wavelets are offset by the difficulty of assigning motion vectors to a variable-size block, but in still-picture or I-picture coding this difficulty does not arise.

Figure 2.19. The FFT uses constant-size windows, whereas the wavelet transform keeps a constant number of cycles in each basis function.
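The constant-cycles, octave-band behavior can be seen in code with a minimal Haar decomposition (assuming NumPy; Haar is chosen here only for brevity, not because MPEG uses it):

```python
# Minimal Haar-wavelet sketch: each decomposition stage halves the sample
# rate of the low band, so detail coefficients at deeper stages describe
# lower frequencies with longer effective windows -- a constant relative
# (one-octave) bandwidth, as the text describes.
import numpy as np

def haar_decompose(x, levels):
    """Repeatedly split x into averages (low band) and differences (high band)."""
    bands = []
    low = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = low[0::2], low[1::2]
        bands.append((even - odd) / np.sqrt(2))   # detail: top octave of 'low'
        low = (even + odd) / np.sqrt(2)           # average: remaining octaves
    bands.append(low)                             # final low band
    return bands

x = np.sin(2 * np.pi * 4 * np.arange(64) / 64)    # a low-frequency tone
for i, band in enumerate(haar_decompose(x, 3)):
    print(f"band {i}: {len(band)} coefficients")  # 32, 16, 8, 8: octave bands
```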
SECTION 3
AUDIO COMPRESSION

Lossy audio compression is based entirely on the characteristics of human hearing, which must be considered before any description of compression is possible. Surprisingly, human hearing, particularly in stereo, is actually more critically discriminating than human vision, and consequently audio compression should be undertaken with care. As with video compression, audio compression requires a number of different levels of complexity according to the required compression factor.

3.1 The hearing mechanism

Hearing comprises physical processes in the ear and nervous/mental processes that combine to give us an impression of sound. The impression we receive is not identical to the actual acoustic waveform present in the ear canal because some entropy is lost. Audio compression systems that lose only that part of the entropy that will be lost in the hearing mechanism will produce good results.

The physical hearing mechanism consists of the outer, middle, and inner ears. The outer ear comprises the ear canal and the eardrum. The eardrum converts the incident sound into a vibration in much the same way as does a microphone diaphragm. The inner ear works by sensing vibrations transmitted through a fluid. The impedance of fluid is much higher than that of air, and the middle ear acts as an impedance-matching transformer that improves power transfer.

Figure 3.1 shows that vibrations are transferred to the inner ear by the stirrup bone, which acts on the oval window. Vibrations in the fluid travel up the cochlea, a spiral cavity in the skull (shown unrolled in Figure 3.1 for clarity). The basilar membrane is stretched across the cochlea. This membrane varies in mass and stiffness along its length. At the end near the oval window, the membrane is stiff and light, so its resonant frequency is high. At the distant end, the membrane is heavy and soft and resonates at low frequency. The range of resonant frequencies available determines the frequency range of human hearing, which in most people is from 20 Hz to about 15 kHz.

Different frequencies in the input sound cause different areas of the membrane to vibrate. Each area has different nerve endings to allow pitch discrimination. The basilar membrane also has tiny muscles controlled by the nerves that together act as a kind of positive-feedback system that improves the Q factor of the resonance.

The resonant behavior of the basilar membrane is an exact parallel with the behavior of a transform analyzer. According to the uncertainty theory of transforms, the more accurately the frequency domain of a signal is known, the less accurately the time domain is known. Consequently, the better a transform can discriminate between two frequencies, the less able it is to discriminate between the times of two events. Human hearing has evolved with a compromise that balances time discrimination and frequency discrimination; in the balance, neither ability is perfect.

The imperfect frequency discrimination results in the inability to separate closely spaced frequencies. This inability is known as auditory masking, defined as the reduced sensitivity to sound in the presence of another.

Figure 3.1. The hearing mechanism: outer ear, eardrum, middle ear, stirrup bone, and the basilar membrane in the cochlea (shown unrolled) of the inner ear.
Figure 3.2a shows that the threshold of hearing is a function of frequency. The greatest sensitivity is, not surprisingly, in the speech range. In the presence of a single tone, the threshold is modified as in Figure 3.2b. Note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies. In the presence of a complex input spectrum, such as music, the threshold is raised at nearly all frequencies. One consequence of this behavior is that the hiss from an analog audio cassette is only audible during quiet passages in music. Companding makes use of this principle by amplifying low-level audio signals prior to recording or transmission and returning them to their correct level afterwards.

The imperfect time discrimination of the ear is due to its resonant response. The Q factor is such that a given sound has to be present for at least about 1 millisecond before it becomes audible. Because of this slow response, masking can still take place even when the two signals involved are not simultaneous. Forward and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration. Figure 3.3 shows this concept.

Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at frequencies at which there is effective masking. To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands to allow introduction of different amounts of companding and noise in each band.

3.2 Subband coding

Figure 3.4 shows a band-splitting compandor. The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth. The output in each band consists of samples representing a waveform. In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value. Noise picked up in the transmission is reduced in each band. If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking. Consequently, in each band after companding, it is possible to reduce the wordlength of samples. This technique achieves compression because the noise introduced by the loss of resolution is masked.

Figure 3.2a. The threshold of hearing as a function of frequency (from 20 Hz).
Figure 3.4. A band-splitting compandor: sub-band filter and level-detect blocks controlled by a masking threshold.
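The core arithmetic can be sketched briefly. This is illustrative only, not the MPEG filter bank or psychoacoustic model: it assumes NumPy, and the band count, block length, and masking headroom figures are invented for the demonstration.

```python
# Illustrative subband companding sketch: per band, normalize a block by
# its scale factor, then requantize to a wordlength permitted by the
# assumed masking in that band. More masking -> fewer bits.
import numpy as np

rng = np.random.default_rng(2)
bands = rng.normal(0, [0.9, 0.2, 0.05, 0.01], (12, 4)).T  # 4 bands x 12 samples

# Hypothetical masking headroom per band, in bits that can be discarded.
discard_bits = np.array([2, 4, 6, 8])
wordlength = 16 - discard_bits

for samples, bits in zip(bands, wordlength):
    scale = np.abs(samples).max() or 1.0       # scale factor for this block
    levels = 2 ** (bits - 1)
    q = np.round(samples / scale * levels)     # requantize normalized samples
    decoded = q / levels * scale               # decoder reverses both steps
    err_db = 20 * np.log10(scale / np.abs(samples - decoded).max())
    print(f"{bits:2d}-bit band: worst-case error {err_db:5.1f} dB below scale")
```

The quantizing noise grows in the heavily masked bands, which is exactly where the ear cannot hear it.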
Figure 3.5 shows a simple band-splitting coder as is used in MPEG Layer 1. The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands; in MPEG this number is 32. The time axis is divided into blocks of equal length. In MPEG Layer 1 a block is 384 input samples, so at the output of the filter there are 12 samples in each of the 32 bands. Within each band, the level is amplified by multiplication to bring it up to maximum. The gain required is constant for the duration of a block, and a single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.

The filter-bank output is also analyzed to determine the spectrum of the input signal. This analysis drives a masking model that determines the degree of masking that can be expected in each band. The more masking available, the less accurate the samples in each band need to be. The sample accuracy is reduced by requantizing to reduce wordlength. This reduction is likewise constant for every word in a band, but different bands can use different wordlengths. The wordlength needs to be transmitted as a bit-allocation code for each band to allow the decoder to deserialize the bit stream properly.

3.3 MPEG Layer 1

Figure 3.6 shows an MPEG Layer 1 audio bit stream. Following the synchronizing pattern and the header, there are 32 bit-allocation codes of four bits each. These codes describe the wordlength of samples in each subband. Next come the 32 scale factors used in the companding of each band. These scale factors determine the gain needed in the decoder to return the audio to the correct level. The scale factors are followed, in turn, by the audio data in each band.

Figure 3.7 shows the Layer 1 decoder. The synchronization pattern is detected by the timing generator, which deserializes the bit-allocation and scale-factor data. The bit-allocation data then allows deserialization of the variable-length samples. The requantizing is reversed, and the compression is reversed by the scale-factor data to put each band back to the correct level. These 32 separate bands are then combined in a combiner filter, which produces the audio output.

Figure 3.5. A Layer 1 coder: a 32-band input filter feeding a dynamic bit and scale-factor allocator and coder (driven by masking thresholds) and a multiplexer.
Figure 3.6. A Layer 1 frame: 12-bit sync, 20-bit system header, optional CRC, bit allocation, scale factors, and subband samples for 384 PCM input samples (duration 8 ms at 48 kHz).
Figure 3.7. A Layer 1 decoder: demultiplexer, bit-allocation data, 32 sets of samples and scale factors, and a 32-input combiner filter.
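A rough frame-size walk-through follows the layout just described. This is schematic, not the normative Layer 1 syntax: the sync and header widths are taken from Figure 3.6, while the 6-bit scale-factor width and the packing order are assumptions made for the illustration.

```python
# Schematic Layer 1 frame walk (not the normative syntax): after sync and
# header come 32 four-bit allocation codes, then one scale factor per
# active band, then 12 samples per active band at the allocated wordlength.
def frame_bits(allocation):
    """allocation: 32 entries, each the wordlength (0 = band not sent)."""
    assert len(allocation) == 32
    bits = 12 + 20                      # sync + system header (per Figure 3.6)
    bits += 32 * 4                      # one 4-bit allocation code per band
    active = [w for w in allocation if w > 0]
    bits += len(active) * 6             # assumed 6-bit scale factor per band
    bits += sum(12 * w for w in active) # 12 samples per band, w bits each
    return bits

# Example: busy low bands keep long words; masked high bands send little.
alloc = [12] * 8 + [6] * 8 + [2] * 8 + [0] * 8
print(frame_bits(alloc), "bits per 384-sample (8 ms) frame")
```

Scaling the printed figure by 125 frames per second gives the bit rate the allocator must fit, which is how the dynamic bit and scale-factor allocator of Figure 3.5 trades wordlength against the masking model.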