Chapter 14: Voice Codecs
Kari Järvinen¹
14.1 Overview
Five voice codecs have been standardised for GSM. These are:
• Full-Rate (FR) codec
• Half-Rate (HR) codec
• Enhanced Full-Rate (EFR) codec
• Adaptive Multi-Rate (AMR) codec
• Adaptive Multi-Rate Wideband (AMR-WB) codec
All voice codecs include speech coding (source coding), channel coding (error protection and bad frame detection), concealment of erroneous or lost frames (bad frame handling), Voice Activity Detection (VAD), and a low bit rate source-controlled mode for coding background noise. The codecs operate either in the GSM full-rate traffic channel at the gross bit rate of 22.8 kbit/s (FR, EFR, AMR-WB), or in the half-rate channel at the gross bit rate of 11.4 kbit/s (HR), or in both (AMR). AMR and AMR-WB have also been specified for use in 3G WCDMA.
The FR codec [1] was the first voice codec defined for GSM. The codec was standardised in 1989. It uses 13.0 kbit/s for speech coding and 9.8 kbit/s for channel coding. FR is the default codec to provide speech service in GSM.
The HR codec [2] was developed to bring channel capacity savings through operation in the half-rate channel. The codec was standardised in 1995. It operates at 5.6 kbit/s speech coding bit rate with 5.8 kbit/s used for channel coding. The codec provides the same level of speech quality as the FR codec, except in background noise and in tandem (two encodings in MS-to-MS calls), where the performance is somewhat lower.
The EFR codec [3] was the first codec to provide digital cellular systems with voice quality equivalent to that of a wireline telephony reference (ITU G.726-32 ADPCM standard at 32 kbit/s). The EFR codec brings substantial quality improvement over the previous GSM codecs. EFR was standardised first for the GSM-based PCS 1900 system in the US during 1995 and was adopted for GSM in 1996. The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for channel coding.
A further development in GSM voice quality was the standardisation of the AMR codec [4] in 1999. AMR offers substantial improvement over EFR in error robustness in the full-rate channel by adapting speech and channel coding depending on channel conditions. Channel capacity is gained by switching to operate in the half-rate channel during good channel conditions. The AMR codec includes several modes for use both in the full- and half-rate channel. The speech coding bit rates are between 4.75 and 12.2 kbit/s in the full-rate channel (eight modes) and between 4.75 and 7.95 kbit/s in the half-rate channel (six modes). The AMR codec was adopted in 1999 by 3GPP as the default speech codec for the 3G WCDMA system.
¹ The views expressed in this chapter are those of the author and do not necessarily reflect the views of his affiliation entity.
Copyright © 2001 John Wiley & Sons Ltd. ISBNs: 0-470-84322-5 (Hardback); 0-470-845546 (Electronic)
The AMR-WB codec [5] is the most recent voice codec. It was standardised in 2001 for both GSM and 3G WCDMA systems. Later in 2001, the rapporteur's meeting of ITU-T Q.7/16 chose the AMR-WB speech codec as the new ITU-T wideband speech coding algorithm at around 16 kbit/s. AMR-WB is an adaptive multi-rate codec like the AMR (narrowband) codec. AMR-WB brings quality improvement through the use of extended audio bandwidth. While all previous codecs in digital cellular systems operate on narrow audio bandwidth limited below 3.4 kHz, AMR-WB extends the bandwidth to 7 kHz. Wideband coding brings improved voice quality, especially in terms of increased voice naturalness. AMR-WB consists of nine modes operating at speech coding bit rates between 6.6 and 23.85 kbit/s.
A voice codec related development in GSM and 3G WCDMA was the definition of in-band Tandem Free Operation (TFO). This feature was completed, including TFO for AMR, in March 2001 [23]. TFO brings improvement in speech quality for MS-to-MS calls by avoiding double transcoding in the network. TFO can be employed when the same speech codec is used at both ends of the call.
The speech coding part in all the voice codecs is based on the use of Linear Predictive Coding (LPC). All except the FR codec belong to the class of speech coding algorithms generally known as Code Excited Linear Prediction (CELP). All codecs operate at the sampling rate of 8 kHz except AMR-WB, which uses a 16 kHz sampling rate. Channel coding in all codecs is based on convolutional coding for error correction combined with a Cyclic Redundancy Check (CRC) for error detection. Three protection classes are typically used: bits protected by the convolutional code and CRC, bits protected by the convolutional code alone, and bits without any error protection.
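As a minimal sketch of the linear prediction that underlies all five speech codecs, the Python below derives predictor coefficients from a signal's autocorrelation with the Levinson-Durbin recursion. This is a textbook method chosen for illustration only: the chapter does not state which LPC analysis procedure each codec uses, and the function names, the prediction order and the absence of any windowing are assumptions of the sketch.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of signal[n] * signal[n - k]."""
    return [
        sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, reflection_coefficients, prediction_error_energy).
    Hypothetical helper for illustration, not taken from any codec specification.
    """
    if r[0] <= 0:
        raise ValueError("signal has no energy")
    a = [1.0] + [0.0] * order
    error = float(r[0])
    reflection = []
    for i in range(1, order + 1):
        # Correlation of the current predictor's residual with lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error
        reflection.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a, reflection, error
```

With this sign convention, sample n is predicted as minus the sum of a[j] times sample n-j, and the reflection coefficients are a by-product of the recursion.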
The voice codec specifications define the speech codec bit-exactly to guarantee high basic voice quality. For bad frame handling, only an example solution is given to allow the possibility for implementation-specific performance improvements in error concealment. Tables 14.1–14.3 give a summary of the GSM voice codecs: standards, implementation complexity, and algorithmic delay.
14.2 Codec Selection Process
The development of GSM voice codecs has been carried out in ETSI SMG11 and in its predecessors. Finalisation of channel coding has taken place under SMG2. The AMR-WB codec was developed jointly by SMG11 and 3GPP TSG-SA WG4.
All the voice codecs have been chosen through a competitive selection process among several candidate codec algorithms.
Before the codec selection process starts, speech quality performance requirements and codec design constraints (e.g. implementation complexity and transmission delay) have to be defined. For the most recent codecs (AMR and AMR-WB), the launch of standardisation has been preceded by a feasibility study phase to validate the new codec concept.
Table 14.1 Voice codec standards

Voice codec | Speech coding bit rate (kbit/s) | System/traffic channel | Speech coding algorithm
FR codec | 13.0 | GSM FR | Regular Pulse Excitation – Long Term Prediction (RPE-LTP)
HR codec | 5.6 | GSM HR | Vector-Sum Excited Linear Prediction (VSELP)
EFR codec | 12.2 | GSM FR | Algebraic Code Excited Linear Prediction (ACELP)
AMR codec | 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75 | GSM FR (all eight modes), GSM HR (six lowest modes), 3G WCDMA (all modes) | Algebraic Code Excited Linear Prediction (ACELP)
AMR-WB codec | 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.60 | GSM FR (seven lowest modes), EDGE (all modes), 3G WCDMA (all modes) | Algebraic Code Excited Linear Prediction (ACELP)

Table 14.2 Implementation complexity of the voice codecs [table body not recovered]
Column headings, appearing twice in the table: WMOPS; data RAM (16-bit kwords); data ROM (16-bit kwords); program ROM (1000 assembly instructions).
Surviving entries following the AMR codec (a) row label: 2.9 (HR); 2.6 (FR), 2.4 (HR).
a Complexity of channel quality measurement and mode control is counted as part of channel coding.
The selection process typically consists of two phases: a qualification (pre-selection) phase and a selection phase. During the qualification phase, the most promising candidate codecs are chosen to enter the selection phase. The qualification is usually based on in-house listening tests. In the selection phase, the codec proposals are tested more comprehensively in several independent test laboratories and using multiple languages. The codec proposals are implemented in C code with fixed-point arithmetic. For both phases, the codec proponents need to deliver documentation of their proposal, including a justification of meeting all design constraints. The codec selection is based both on the speech quality of the candidate codecs and on fulfilling other design requirements.
After codec selection, a verification phase and a characterisation phase will take place. An optimisation phase may be launched to improve some key performances of the codec if there is sufficient promise of improvement. During the verification phase, the codec is subjected to further analysis to verify its suitability for the intended systems and applications. A detailed analysis of implementation complexity and transmission delay is also carried out during this phase. The final phase of codec standardisation is the characterisation phase. This is launched after the approval of the codec standard to characterise the codec in a large variety of operational conditions. The output is a technical report on performance characterisation which provides information on codec performance.
14.3 FR Codec
In the FR voice codec, the speech coding part is based on the Regular Pulse Excitation – Long Term Prediction (RPE-LTP) algorithm [6]. The frame length is 20 ms, i.e. a set of codec parameters is produced every 20 ms. The speech codec operates at 13.0 kbit/s while 9.8 kbit/s is used for channel coding. FR is the default codec to provide speech service in GSM.
The FR speech codec carries out short-term LPC analysis once every frame (without any lookahead over future samples). The rest of the coding is performed in 5 ms sub-frames. The short-term residual signal, after LPC analysis, is further compressed by using Long-Term Prediction (LTP) analysis. LTP removes any long-term correlation remaining in the short-term residual signal. The long-term residual is then decimated into a sparse signal in which only every third sample has a non-zero value. The non-zero samples are located on a regular grid. The grid starting position is determined separately for each sub-frame based on the energy of the sub-frame. This Regular Pulse Excitation (RPE) approach results in rather efficient coding: only the non-zero samples in the long-term residual need to be quantised and sent to the decoder. The parameters for each 20 ms frame consist of a set of LPC coefficients (reflection coefficients) and a set of parameters describing the short-term residual for each sub-frame (LTP parameters, RPE parameters). A block diagram of the encoder is shown in Figure 14.1.
Table 14.3 Algorithmic delay of the voice codecs [table body not recovered; surviving column headings: frame length (ms), lookahead in LPC analysis (ms)]
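The regular-grid decimation used in the FR speech codec can be sketched as follows. This is a simplified illustration, not the bit-exact procedure of the specification: it assumes a 40-sample sub-frame (5 ms at 8 kHz), four candidate grid offsets with 13 pulses each, and it omits the weighting filter and the quantisation that a real encoder applies. The function name `select_rpe_grid` and its parameters are hypothetical.

```python
def select_rpe_grid(residual, step=3, pulses=13, offsets=(0, 1, 2, 3)):
    """Pick the grid offset whose every-third samples carry the most energy.

    Returns (offset, sparse), where sparse keeps only the samples on the
    chosen grid and is zero elsewhere.
    """
    best_offset, best_energy = offsets[0], -1.0
    for offset in offsets:
        energy = sum(residual[offset + step * i] ** 2 for i in range(pulses))
        if energy > best_energy:
            best_offset, best_energy = offset, energy
    sparse = [0.0] * len(residual)
    for i in range(pulses):
        position = best_offset + step * i
        sparse[position] = residual[position]
    return best_offset, sparse
```

Only the grid offset and the samples on the chosen grid would then need to be coded, which is where the bit rate saving comes from.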
The FR channel codec uses convolutional coding to protect the 182 most important bits out of the 260 bits in each frame [11]. A 3 bit CRC is employed for bad frame detection. The CRC covers the most important 50 bits.
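The bit counts quoted here follow directly from the bit rates and the 20 ms frame length. The short Python check below reproduces that arithmetic for the three fixed-rate codecs, using only rates stated in this chapter; the helper names and the dictionary layout are illustrative assumptions.

```python
FRAME_MS = 20  # frame length shared by the FR, HR and EFR codecs

# (speech coding rate, channel coding rate) in kbit/s, as given in the text
CODEC_RATES = {"FR": (13.0, 9.8), "HR": (5.6, 5.8), "EFR": (12.2, 10.6)}


def bits_per_frame(rate_kbit_s, frame_ms=FRAME_MS):
    """kbit/s multiplied by ms gives bits; round away floating-point noise."""
    return round(rate_kbit_s * frame_ms)


def frame_budget(codec):
    """Return the per-frame bit budget for one of the fixed-rate codecs."""
    speech_rate, channel_rate = CODEC_RATES[codec]
    speech_bits = bits_per_frame(speech_rate)
    channel_bits = bits_per_frame(channel_rate)
    return {
        "speech_bits": speech_bits,
        "channel_bits": channel_bits,
        "gross_bits": speech_bits + channel_bits,
    }
```

For FR this gives 260 speech bits per frame, and the gross totals match the 22.8 kbit/s full-rate and 11.4 kbit/s half-rate channels.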
The FR codec, like all GSM and 3G WCDMA codecs, includes a low bit rate source-controlled mode for coding background noise only (voice activity detection with discontinuous transmission). This saves power in the mobile station and also reduces the overall interference level over the air-interface.
The complexity of the FR speech codec is about 3.0 WMOPS (weighted million operations per second). The complexity has been estimated from C code implemented with a fixed-point function library in which each operation has been assigned a weight representative of performing the operation on a typical DSP. The channel coding requires about 1.7 WMOPS [13].
14.4 HR Codec
The HR codec employs the Vector-Sum Excited Linear Prediction (VSELP) speech coding algorithm [7]. VSELP belongs to the class of CELP codecs. The codec uses a 20 ms frame length. The speech codec operates at the bit rate of 5.6 kbit/s while 5.8 kbit/s is used for channel coding.
Like most CELP codecs, the HR VSELP employs two codebooks: a fixed codebook and an adaptive codebook. The adaptive codebook is derived from the long-term filter state (and therefore the content of the codebook changes frame-by-frame). The adaptive codebook is used to generate a periodic component in the excitation, while the fixed codebook generates a random-like component. The excitation sequence is coded by choosing the best match from each of the two codebooks. The excitation that produces the least decoding error is chosen (analysis-by-synthesis). The codebook indices and gains are computed once for every 5 ms sub-frame. A lookahead of 35 samples is employed in the LPC analysis.
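The analysis-by-synthesis principle can be illustrated with a small search over a single codebook. This is a generic sketch under stated assumptions, not the HR codec's actual search: it uses an unweighted squared error (real CELP codecs minimise a perceptually weighted error and search the adaptive and fixed codebooks in sequence), and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] z^-k."""
    output = []
    for n, sample in enumerate(excitation):
        acc = sample
        for k, coeff in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= coeff * output[n - k]
        output.append(acc)
    return output


def search_codebook(target, codebook, lpc):
    """Try every code vector, synthesise it, and keep the one with least error.

    Returns (index, gain, squared_error) for the best entry.
    """
    best = None
    for index, vector in enumerate(codebook):
        synthetic = synthesize(vector, lpc)
        energy = sum(s * s for s in synthetic)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        gain = sum(t * s for t, s in zip(target, synthetic)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synthetic))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

The encoder transmits only the winning index and gain; the decoder regenerates the same excitation from its own copy of the codebook.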
A specific feature in VSELP is the structure of the fixed codebook. The fixed codebook is constructed as a linear combination (vector sum) of only a small number of basis vectors. There are four modes based on how voiced each 20 ms speech frame is. For the least voiced mode, the adaptive codebook is not used at all, and a second fixed VSELP codebook is used instead.
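A vector-sum codebook of this kind can be sketched in a few lines. The example assumes that each code vector is the sum of all basis vectors with a sign of +1 or -1 applied to each, so that M basis vectors yield 2^M code vectors; the chapter states only that a linear combination of a few basis vectors is used, so the sign-only weighting, the function name and the toy basis are assumptions.

```python
from itertools import product


def vector_sum_codebook(basis):
    """Enumerate every +/-1 combination of the basis vectors.

    basis: list of M equal-length vectors. Returns 2**M code vectors, ordered
    by the sign pattern (+1 first) applied to the basis vectors.
    """
    length = len(basis[0])
    codebook = []
    for signs in product((1, -1), repeat=len(basis)):
        vector = [
            sum(sign * vec[n] for sign, vec in zip(signs, basis))
            for n in range(length)
        ]
        codebook.append(vector)
    return codebook
```

Because only the basis vectors need to be stored and filtered, the search cost grows with M instead of with the full codebook size.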
The HR channel codec employs convolutional coding to protect 95 out of the 112 bits in each frame [11]. A 3 bit CRC covers the most important 22 bits.
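The bit counts given for the FR and HR channel codecs imply three protection classes per frame, which the sketch below makes explicit together with a CRC-based bad frame check. The class sizes are derived from the numbers in the text (for example 22 + 73 = 95 protected bits out of 112 for HR). Everything else is an assumption of the sketch: the bits are taken to be already ordered by importance, the generator polynomial x^3 + x + 1 is a placeholder, not a value quoted from the specification, and the convolutional code itself is not modelled.

```python
# (CRC + convolutional code, convolutional code only, unprotected)
FR_CLASS_SIZES = (50, 132, 78)   # 182 protected bits out of 260
HR_CLASS_SIZES = (22, 73, 17)    # 95 protected bits out of 112


def split_classes(frame_bits, sizes):
    """Split an importance-ordered frame into the three protection classes."""
    if len(frame_bits) != sum(sizes):
        raise ValueError("frame length does not match the class sizes")
    first, second, _ = sizes
    return (
        frame_bits[:first],
        frame_bits[first:first + second],
        frame_bits[first + second:],
    )


def crc_remainder(bits, poly=(1, 0, 1, 1)):
    """Remainder of bits * x^r divided by poly over GF(2); r = len(poly) - 1."""
    width = len(poly) - 1
    register = list(bits) + [0] * width
    for i in range(len(bits)):
        if register[i]:
            for j, coeff in enumerate(poly):
                register[i + j] ^= coeff
    return register[-width:]


def is_bad_frame(received_bits, received_crc, poly=(1, 0, 1, 1)):
    """Flag the frame as bad when the recomputed CRC disagrees."""
    return crc_remainder(received_bits, poly) != list(received_crc)
```

A frame flagged in this way would be handed to the bad frame handling (error concealment) instead of the speech decoder.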
The HR codec provides the same level of speech quality as the FR codec, except in background noise and during tandem (two encodings in MS-to-MS calls), where the performance lags somewhat behind the FR codec.
The computational complexity of the speech codec is about 18.5 WMOPS. The complexity of the channel codec is about 2.7 WMOPS [13,14].
14.5 EFR Codec
The EFR codec gives substantial quality improvement compared to the previous GSM codecs. EFR is the first codec to provide digital cellular systems with quality equivalent to that of a wireline telephony reference (ITU G.726-32 ADPCM standard at 32 kbit/s). The EFR codec standardisation started in ETSI in 1995. Wireline quality was set as a development target because GSM had become increasingly used in communication environments where it started to compete directly with fixed line or cordless systems. To be competitive also with respect to speech quality, GSM needed to provide wireline speech quality which is robust to typical usage conditions such as background noise and transmission errors.
A similar development of an enhanced quality full-rate codec was carried out for the GSM-based PCS 1900 system in the US during 1995. An EFR codec was standardised for the PCS 1900 system already in 1995. The PCS 1900 EFR codec was one candidate considered for the GSM EFR standard and it was adopted for GSM through a competitive selection process. In addition to voice quality performance, the advantage of using the same voice codec in PCS 1900 and in GSM was one factor in favour of this particular solution. The GSM EFR codec standard was approved in January 1996 (at SMG#17).
The EFR speech codec is based on the Algebraic Code Excited Linear Prediction (ACELP) algorithm [8,18]. The speech coding bit rate is 12.2 kbit/s whereas 10.6 kbit/s is used for channel coding. The codec operates on 20 ms frames which are divided into four 5 ms sub-frames. Two sets of LPC parameters are calculated for each 20 ms frame with no lookahead over samples in the next frame. EFR employs an adaptive and a fixed codebook. The codebook parameters are computed once for each 5 ms sub-frame. The name ACELP refers to the type of fixed codebook, in which an algebraic code is used to populate the excitation vectors. The ACELP codebook contains a small number of non-zero pulses with predefined interlaced sets of positions. In EFR, the 40 positions in each 5 ms sub-frame are divided into five tracks where each track contains two pulses. Each excitation vector contains ten non-zero pulses with amplitudes of −1 or +1. Figure 14.2 gives a block diagram of the EFR speech encoder.
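The track structure of the algebraic codebook can be sketched as below. The sketch assumes that the five tracks are interleaved with a stride of five, so that track t holds positions t, t+5, ..., t+35; the chapter says only that the position sets are interlaced. It also rejects two pulses landing on the same position purely to keep the example simple. All names are hypothetical.

```python
SUBFRAME_LENGTH = 40                                    # samples in a 5 ms sub-frame
NUM_TRACKS = 5
POSITIONS_PER_TRACK = SUBFRAME_LENGTH // NUM_TRACKS     # 8 positions per track


def track_positions(track):
    """Sample positions belonging to one interleaved track (assumed stride 5)."""
    return [track + NUM_TRACKS * slot for slot in range(POSITIONS_PER_TRACK)]


def build_excitation(pulses):
    """Build a 40-sample excitation from two (slot, sign) pulses per track.

    pulses: list of five entries, one per track, each a pair of
    (slot in 0..7, sign in {+1, -1}) tuples.
    """
    if len(pulses) != NUM_TRACKS:
        raise ValueError("expected one pulse pair per track")
    excitation = [0] * SUBFRAME_LENGTH
    for track, track_pulses in enumerate(pulses):
        if len(track_pulses) != 2:
            raise ValueError("each track carries exactly two pulses")
        positions = track_positions(track)
        for slot, sign in track_pulses:
            if sign not in (1, -1):
                raise ValueError("pulse amplitude must be +1 or -1")
            position = positions[slot]
            if excitation[position] != 0:
                raise ValueError("coinciding pulses are not handled in this sketch")
            excitation[position] = sign
    return excitation
```

Because a code vector is fully described by pulse positions and signs, no codebook table needs to be stored.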
The EFR channel codec is almost the same as the FR channel codec because a key design aim was to keep it as similar as possible. During the GSM EFR codec standardisation, the use of the existing FR channel codec (or existing GSM generator polynomials) was encouraged since this minimises hardware changes in the GSM base stations and thus potentially speeds up the introduction of the EFR codec. In the PCS 1900 EFR codec standardisation, the use of the existing FR channel codec was a mandatory requirement. Therefore, the FR channel codec was included in the EFR channel codec as a module together with additional error protection [11]. The additional 0.8 kbit/s error protection consists of an 8 bit CRC to provide improved detection of frame errors and a repetition code for improved error correction.
The implementation complexity of EFR is lower than that of the HR codec. Computational complexity of the speech codec is about 15.2 WMOPS. The complexity of the channel codec is about the same as for the FR codec [13,16].
Figure 14.3a,b shows the performance of the EFR codec compared to the FR codec and to a wireline quality reference G.726-32 (32 kbit/s ADPCM) [15,16]. Figure 14.3a shows the performance for clean speech under transmission errors. The dotted line shows the performance of (error-free) G.726-32. The EFR codec gives substantial improvement over the FR codec in the error-free channel and in error conditions down to a carrier-to-interference ratio (C/I) of 7 dB. EFR still provides wireline quality at approximately 10 dB C/I. Figure 14.3b shows the performance in background noise for the error-free channel with four types of background noise (home 20 dB SNR, car 15 and 25 dB SNR, street 10 dB SNR, and office 20 dB SNR). The results demonstrate substantial improvement over FR. In many test cases EFR exceeds the performance of the wireline quality reference of 32 kbit/s ADPCM.
14.6 AMR Codec
The EFR codec was the first codec to provide digital cellular systems with quality equivalent to that of a wireline telephony reference. However, it still left some room for improvement. In particular, the performance in severe channel error conditions could be improved by employing a different bit allocation between speech and channel coding. This led to the development of a new type of codec that is able to adapt to the channel quality conditions. The concept of adaptive multi-rate coding was born.
The standardisation of AMR was launched in October 1997 (at SMG#23). Before that, a one-year feasibility study had already been carried out to validate the novel AMR concept. The selection process consisted of a qualification and a selection phase. The qualification phase was carried out during spring 1998. Altogether 11 candidate codecs were submitted. In June 1998 (at SMG#26), the five most promising candidates were chosen to enter the selection tests. The selection phase took place from July 1998 until the selection of the codec in October 1998 (at SMG#27). After the selection, a short optimisation phase took place. The optimisation was focused on making improvements to the channel coding part and bringing corrections to the codec C code. During the optimisation, the complexity of channel coding was reduced while at the same time obtaining some performance improvements. The AMR codec standard was formally approved in February 1999 at SMG#28 (speech coding part) and in June 1999 at SMG#29 (channel coding part). A detailed description of the AMR codec standardisation process can be found in [19].
The main principle behind AMR is to adapt to radio channel and traffic load conditions and select the optimum channel mode (full-rate or half-rate) and codec mode (bit rate trade-off between speech and channel coding) to deliver the best combination of speech quality and system capacity. AMR provides good overall performance and high granularity of bit rates, making it suitable also for systems and applications other than GSM. In 1999, the AMR codec was adopted by 3GPP as the default speech codec to provide speech service in the 3G WCDMA system.
The AMR codec contains a set of fixed rate speech and channel codecs, link adaptation, and in-band signalling. Each AMR codec mode provides a different level of error protection through a different distribution of the available gross bit rate between speech and channel coding. The link adaptation process bears responsibility for measuring the channel quality and selecting the optimal speech and channel codecs. In-band signalling transmits the measured channel quality and codec mode information over the air-interface. The in-band signalling is transmitted along with the speech data.
The AMR speech codec utilises the ACELP algorithm employed also in the EFR codec. The frame length is 20 ms, which is divided into four 5 ms sub-frames. A 5 ms lookahead is used. The codec contains eight codec modes with speech coding bit rates of 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15 and 4.75 kbit/s [9,20]. As seen in Figure 14.4a, all the speech codecs are employed in the full-rate channel, while the six lowest ones are used in the half-rate channel. All the modes provide seamless switching between each other. The GSM EFR, D-AMPS EFR, and PDC EFR speech codecs are included in the AMR as the 12.2, 7.4 and 6.7 kbit/s modes, respectively. Some minor harmonisation to other modes (e.g. in the post-processing) has been carried out for these codecs when used within AMR. The AMR 12.2 kbit/s speech codec was later defined as an alternative implementation of the EFR speech codec.
Error protection in GSM is based on Recursive Systematic Convolutional (RSC) coding with puncturing to obtain the required bit rates [11]. Each codec mode also employs a 6 bit CRC for detecting bad frames. All channel codecs use convolutional code polynomials previously specified for GSM (either for speech or data traffic channels) to maximise commonality with the existing GSM system. For 3G channels, the general channel coding toolbox of the 3G WCDMA system is used.
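The relationship between codec mode, channel mode and error protection can be made concrete with a small table-driven sketch. It uses only the mode rates and gross channel rates given in this chapter; the function names are hypothetical, and the "remaining rate" is simply the gross rate minus the speech rate, i.e. what is left for channel coding and any in-band signalling overhead, without breaking that remainder down further.

```python
GROSS_RATE_KBIT_S = {"FR": 22.8, "HR": 11.4}
AMR_MODES_KBIT_S = (12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75)


def modes_for_channel(channel):
    """All eight modes in the full-rate channel, the six lowest in half-rate."""
    if channel == "FR":
        return AMR_MODES_KBIT_S
    if channel == "HR":
        return tuple(sorted(AMR_MODES_KBIT_S)[:6][::-1])
    raise ValueError("channel must be 'FR' or 'HR'")


def remaining_rate(channel, mode):
    """Gross channel rate minus speech rate, in kbit/s (rounded to 2 decimals)."""
    if mode not in modes_for_channel(channel):
        raise ValueError("mode is not available in this channel")
    return round(GROSS_RATE_KBIT_S[channel] - mode, 2)
```

Lower speech rates leave more of the gross rate for error protection, which is why the low-rate modes are the robust choice in poor channel conditions.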
Figure 14.5 shows a basic block diagram of the AMR codec in GSM. The Mobile Station (MS) and the Base Transceiver Station (BTS) both perform channel quality estimation for the receive signal path. Based on the channel quality measurements, a codec mode command (over downlink to the MS) or a codec mode request (over uplink to the network) is sent in-band over the air-interface. The receiving end uses this information to choose the best codec mode for the prevailing channel conditions. A codec mode indicator is also sent over the air-interface to indicate the current mode of operation. The codec mode in the uplink may be different from the one used in the downlink on the same air-interface, but the channel mode (FR or HR) must be the same.
The network controls the uplink and downlink codec modes and channel modes. The mobile station must obey the codec mode command from the network, while the network may use any complementing information, in addition to the codec mode request, to determine the downlink codec mode. The mobile station must implement all the codec modes. However, the network can support any combination of them, based on the choice of the operator. In GSM, the in-band signalling supports adaptation between up to four active codec modes. The set of active codec modes is selected at call set-up (and in handover). Codec mode command/request and codec mode indication are transmitted in every other speech frame in GSM (alternating within consecutive frames). Therefore, the codec mode can be changed every 40 ms. In 3G WCDMA, AMR can adapt between all the eight modes and can switch modes every 20 ms. To obtain interoperability with GSM AMR under TFO, the 3G AMR adaptation rate can be limited to 40 ms in uplink.
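The GSM constraints described above (at most four active codec modes, and a mode change at most every second 20 ms frame) can be illustrated with a toy mode controller. This is a sketch under loudly stated assumptions and not the standardised link adaptation: the C/I thresholds, the hysteresis value, the one-step-per-update rule, the choice to start in the most robust mode and the class name `CodecModeAdapter` are all hypothetical.

```python
class CodecModeAdapter:
    """Toy codec mode selection driven by a measured C/I value in dB."""

    def __init__(self, active_set, thresholds_db, hysteresis_db=1.0):
        if not 1 <= len(active_set) <= 4:
            raise ValueError("GSM in-band signalling supports up to four active modes")
        if len(thresholds_db) != len(active_set) - 1:
            raise ValueError("need one threshold between each pair of adjacent modes")
        self.modes = sorted(active_set)        # ascending speech bit rate
        self.thresholds = list(thresholds_db)  # thresholds[i] separates modes[i] and modes[i + 1]
        self.hysteresis = hysteresis_db
        self.index = 0                         # assumed start: most robust (lowest rate) mode

    def update(self, frame_number, c_over_i_db):
        """Return the codec mode for this 20 ms frame.

        The mode may change only on even frame numbers, i.e. every 40 ms.
        """
        if frame_number % 2 == 0:
            can_step_up = self.index < len(self.modes) - 1
            if can_step_up and c_over_i_db > self.thresholds[self.index] + self.hysteresis:
                self.index += 1   # channel is good: spend more bits on speech
            elif self.index > 0 and c_over_i_db < self.thresholds[self.index - 1] - self.hysteresis:
                self.index -= 1   # channel is poor: spend more bits on protection
        return self.modes[self.index]
```

For example, with the active set 4.75, 5.9, 7.4 and 12.2 kbit/s and hypothetical thresholds of 4, 7 and 11 dB, a steady 20 dB C/I walks the controller up one mode every 40 ms until it reaches 12.2 kbit/s.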
Link adaptation is an essential part of the AMR codec. It consists of channel quality measurement and codec/channel mode adaptation algorithms [12,21]. Link adaptation in AMR is two-fold: it adapts the bit-partitioning between speech and channel coding within a transmission channel (codec mode), and the operation in the GSM full- and half-rate channels (channel mode). Depending on the channel quality and possible network constraints (e.g. network load), link adaptation selects the optimal codec and channel mode. Figure 14.4b shows an example of how the codec mode adaptation operates in the GSM full-rate channel under dynamic error conditions. Channel quality varies between about 22 and 2 dB C/I. Based