Items that have changed in this revision
Specific changes that have been incorporated in this revision are:
• refinement of the STI model with respect to the level-dependent masking function …
General
Rationale for the STI method
The STI method is a rapid and objective approach for assessing the speech transmission quality of communication channels. Using the speech transmission index, it predicts speech intelligibility for various types of speech material (words and sentences) in a wide range of speech transmission systems.
In speech, the intensity of the signal varies over time, producing a fluctuating intensity envelope. Slow fluctuations in this envelope correspond to word and sentence boundaries, while rapid fluctuations correspond to individual phonemes. The extent to which a transmission channel preserves this intensity envelope is central to the Speech Transmission Index (STI) framework.
The STI differs from the traditional articulation index in that it focuses on how the transmission channel affects the intensity envelope of the speech signal, rather than only on signal-to-noise ratios in the speech frequency bands. It uses a modulation transfer function (MTF) to quantify the effect of the channel on the intensity envelope of the speech signal.
The STI is expressed on a scale from 0 to 1 and is based on weighted contributions from a range of frequency bands within the speech spectrum.
The STI method and its derivatives (see below) can be used to determine the potential intelligibility of a speech transmission channel at various locations and for various conditions.
In particular, the effect of changes in the acoustic properties of spaces can be assessed.
Applicability of the STI method
The STI method is a validated, objective measure of speech transmission quality that accounts for a wide range of acoustic and electro-acoustic distortions affecting intelligibility. However, it is based on a simplified model of human speech, and applying it outside its intended scope can lead to inaccurate predictions of intelligibility. This document gives an overview of the applications and limitations of the STI method to help users select the most appropriate STI approach for meaningful and accurate results.
The STI method has been validated for acoustic measurements made with a single omnidirectional microphone. Using a directional microphone gives inconsistent results that cannot be correlated with intelligibility, and is therefore generally not recommended. Additional details can be found in 7.10.
Where STI methods cannot be used because of limitations of the transmission channel, alternative techniques for evaluating speech intelligibility should be employed. Various methods are available for assessing the quality of speech communication, each with its own advantages and disadvantages for different users. Annex N outlines several additional measures of intelligibility.
Background of the STI method
General
The STI is based on the observation that fluctuations in the speech signal are essential to speech intelligibility. These fluctuations arise from the acoustic separation of sentences, words and phonemes, the core components of speech. The modulations can be described as a function of modulation frequency \( F \), giving a modulation spectrum. In clear speech, modulation frequencies generally range from 0.5 Hz to 16 Hz, with the maximum modulation around 3 Hz.
Deterioration of the modulation spectrum during transmission generally reduces speech intelligibility. The deterioration appears as a reduction in modulation depth at particular modulation frequencies, quantified as a modulation transfer value for each octave band within the speech frequency range. Figure 1 illustrates the reduction in modulation that can occur between a talker and a listener.
Figure 1 – Concept of the reduction in modulation due to a transmission channel (transmitted speech signal: modulation index = 1; received speech signal: modulation index m < 1)
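To make the idea of reduced modulation concrete, the sketch below uses the classic theoretical model for an ideal diffuse sound field (exponential decay with reverberation time T) combined with stationary noise at a given octave-band signal-to-noise ratio. This model is not stated in the text above and is used here as an illustrative assumption. Real rooms with echoes or non-exponential decay will deviate from it.

```python
import math


def mtf_reverb_noise(f_m: float, t60: float, snr_db: float) -> float:
    """Modulation transfer value m(F) for an ideal exponential decay plus stationary noise.

    Assumed model: reverberation time t60 in seconds, octave-band SNR in dB.
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


# Higher modulation frequencies are attenuated more strongly by reverberation.
for f_m in (0.63, 3.15, 12.5):
    print(f"{f_m:5.2f} Hz  m = {mtf_reverb_noise(f_m, t60=1.5, snr_db=10.0):.3f}")
```

In this model, noise reduces m equally at all modulation frequencies, while reverberation acts like a low-pass filter on the intensity envelope.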
The STI method has been refined and validated through subject-based intelligibility experiments using CVC (Dutch) word scores. These experiments covered a wide range of transmission channel distortions, including noise, reverberation, echoes, non-linear distortion and various digital encoding techniques.
The STI test signal was derived from the parameters of speech material and consists of seven octave-band noise carriers with centre frequencies from 125 Hz to 8 kHz. Each noise carrier is modulated at 14 modulation frequencies, spaced at one-third-octave intervals from 0.63 Hz to 12.5 Hz.
The STI method, described in Annex A, determines the modulation transfer function m(F) of the transmission channel, giving 98 values for 14 modulation frequencies in each of seven octave bands. The RMS level of each octave band corresponds to the long-term average spectrum of speech, and the contribution of each band to speech intelligibility is weighted accordingly. Each value of m(F) is converted into a transmission index, and the overall STI of the transmission channel is derived from the weighted sum of these transmission index values.
Research indicates that adjacent octave bands carry redundant information relevant to speech intelligibility. When one octave band contributes little to intelligibility, for example because of reverberation or background noise, neighbouring bands can partially compensate for the loss. This is reflected in the redundancy factors included in the STI method.
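The test-signal grid described above can be written out explicitly. The sketch lists the seven carrier bands and the nominal one-third-octave modulation frequencies. The rounded nominal values are an assumption based on the usual preferred-number series, so check them against Annex A.

```python
import math

# Octave band centre frequencies (Hz) of the seven noise carriers, 125 Hz to 8 kHz.
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# Nominal modulation frequencies (Hz): one-third-octave steps from 0.63 Hz to 12.5 Hz.
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

# Every (carrier band, modulation frequency) pair is one cell of the FULL STI matrix.
grid = [(band, f_m) for band in OCTAVE_BANDS_HZ for f_m in MODULATION_FREQS_HZ]
print(len(OCTAVE_BANDS_HZ), len(MODULATION_FREQS_HZ), len(grid))  # 7 14 98

# Consecutive nominal values should be roughly one third of an octave apart.
steps = [math.log2(b / a) for a, b in zip(MODULATION_FREQS_HZ, MODULATION_FREQS_HZ[1:])]
print(f"step range: {min(steps):.3f} to {max(steps):.3f} octaves")
```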
Theoretical overview
A test signal with modulation index \( m_i \) is transmitted into a room or through a communication channel, and is received at the listener's position with modulation index \( m_o \). To assess the STI in the scenario depicted in Figure 1, the test signal is emitted by a sound source that mimics a human talker, and a test microphone is placed at the listener's position.
For the sound source, the important characteristics are physical size and directivity, position, sound pressure level and frequency response.
The typical test signal consists of a carrier with a speech-shaped frequency spectrum and a sinusoidal intensity modulation at modulation frequency \( f_m \) (see Figure 2). The input and output intensity envelopes are

\[ I_i(t) = \bar{I}_i \left(1 + m_i \cos 2\pi f_m t\right) \]
\[ I_o(t) = \bar{I}_o \left(1 + m_o \cos 2\pi f_m (t + \tau)\right) \]

where \( m_i \) and \( m_o \) are the modulation indices of the input and output signals, respectively, and the intensities \( I_i \) and \( I_o \) are proportional to the square of the sound pressure (\( p^2 \)).

Figure 2 – Modulation transfer function – Input/output comparison (input signal passes through a transmission channel with echoes, reverberation and noise to the output; the modulation transfer function m(f_m) is plotted against modulation frequency f_m in Hz)
The reduction in modulation depth at modulation frequency \( f_m \) is quantified by the modulation transfer function \( m(f_m) \), which is determined by

\[ m(f_m) = \frac{m_o}{m_i} \]
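The following is a minimal sketch of the input/output comparison. It assumes a white-noise carrier rather than a speech-shaped octave-band carrier, and it estimates the modulation index from the first Fourier coefficient of the intensity envelope at \( f_m \). The channel is hypothetical: it only adds stationary noise of equal power, so \( m(f_m) \) should come out close to 0.5.

```python
import cmath
import math
import random


def modulation_index(intensity, fs, f_m):
    """Estimate the modulation index of a sampled intensity envelope at f_m (Hz).

    Uses the Fourier component at f_m relative to the mean intensity:
    m = 2 * |sum(I(t) * exp(-j*2*pi*f_m*t))| / sum(I(t)).
    """
    total = sum(intensity)
    component = sum(
        value * cmath.exp(-2j * math.pi * f_m * n / fs)
        for n, value in enumerate(intensity)
    )
    return 2.0 * abs(component) / total


def modulated_noise(m_i, f_m, fs, seconds, rng):
    """White-noise carrier whose intensity follows 1 + m_i*cos(2*pi*f_m*t)."""
    n_samples = int(fs * seconds)
    return [
        math.sqrt(1.0 + m_i * math.cos(2.0 * math.pi * f_m * n / fs)) * rng.gauss(0.0, 1.0)
        for n in range(n_samples)
    ]


rng = random.Random(1)
fs, f_m, seconds = 8000, 2.0, 10.0  # 10 s holds a whole number of 2 Hz periods
x = modulated_noise(0.5, f_m, fs, seconds, rng)

# Hypothetical channel: add uncorrelated stationary noise of equal mean power (SNR = 0 dB).
y = [s + rng.gauss(0.0, 1.0) for s in x]

m_in = modulation_index([s * s for s in x], fs, f_m)
m_out = modulation_index([s * s for s in y], fs, f_m)
print(f"m_i ~ {m_in:.2f}, m_o ~ {m_out:.2f}, m(f_m) = m_o/m_i ~ {m_out / m_in:.2f}")
```

Because the carrier is random, each estimate fluctuates slightly around its expected value. This is one reason why repeated direct measurements give scattered results.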
The modulation transfer value can be expressed as an effective signal-to-noise ratio, \( SNR_{eff} \). This ratio represents the combined effect of reverberation, echoes, non-linear distortion and interfering noise as if the reduction in modulation were caused by noise alone:

\[ SNR_{eff} = 10 \lg \frac{m(f_m)}{1 - m(f_m)} \ \mathrm{dB} \]
The effective signal-to-noise ratio is limited to the range –15 dB to +15 dB: values below –15 dB are set to –15 dB, and values above +15 dB are set to +15 dB.
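The conversion from a modulation transfer value to an effective signal-to-noise ratio, and then to a transmission index, can be sketched as follows. The sketch assumes the conventional linear mapping of the limited range onto 0 to 1, TI = (SNR_eff + 15)/30. The function name is hypothetical.

```python
import math


def transmission_index(m: float) -> float:
    """Convert a modulation transfer value m into a transmission index TI in [0, 1]."""
    m = min(max(m, 1e-12), 1.0 - 1e-12)          # keep the logarithm finite at m = 0 or 1
    snr_eff = 10.0 * math.log10(m / (1.0 - m))   # effective SNR in dB
    snr_eff = min(max(snr_eff, -15.0), 15.0)     # limit to -15 dB ... +15 dB
    return (snr_eff + 15.0) / 30.0               # assumed linear mapping onto 0 ... 1


for m in (0.05, 0.5, 0.9, 0.99):
    print(m, round(transmission_index(m), 3))
```

Values of m above about 0.97 all map to TI = 1, because the effective SNR is limited to +15 dB.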
The speech transmission index combines the modulation transfer index values determined in the seven octave bands into one overall weighted value.
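The final weighting step can be sketched under the following assumptions. Each band's modulation transfer index (MTI, the mean of its 14 transmission indices) is weighted by a factor α, and the redundancy between adjacent bands is subtracted using factors β. The α and β values below are the male-speech values as commonly quoted from Annex A; verify them against Annex A before use. The level-dependent masking and reception-threshold corrections of Annex A are deliberately omitted.

```python
import math

# Assumed male-speech weighting (alpha) and redundancy (beta) factors; check against Annex A.
ALPHA = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]  # 125 Hz ... 8 kHz
BETA = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]          # adjacent-band pairs


def modulation_transfer_index(ti_row):
    """MTI of one octave band: mean of its transmission indices."""
    return sum(ti_row) / len(ti_row)


def sti_from_mti(mti):
    """Weighted sum of the seven band MTIs minus the adjacent-band redundancy terms."""
    weighted = sum(a * x for a, x in zip(ALPHA, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1]) for k, b in enumerate(BETA))
    return weighted - redundancy


print(round(sti_from_mti([1.0] * 7), 3))  # 1.0, since sum(ALPHA) - sum(BETA) = 1
print(round(sti_from_mti([0.5] * 7), 3))  # 0.5
```

The factors are chosen so that a channel with equal MTI in every band yields exactly that MTI as its STI.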
Annex A provides a more detailed description of the calculation of the speech transmission index.
Measurement of STI
The FULL STI is based on a complete set of 98 (7 × 14) modulation indices.
Two simplified forms of the STI, based on measurements using fewer modulation indices, are STIPA and STITEL (see Clause 5).
STIPA uses a test signal with a predefined set of two modulations per octave band, generated simultaneously, giving a total of 14 modulation indices.
STITEL uses a test signal with a predefined set of seven modulation frequencies, one per octave band, generated simultaneously, giving a total of seven modulation indices.
There are two methods to measure STI:
• direct methods using modulated test signals;
• indirect methods based on the system's impulse response, using the Schroeder equation.
The direct methods using STIPA and STITEL have substantially shorter measurement durations than the direct FULL STI. Note that the direct FULL STI is rarely used in practice.
Annex B and Annex C give detailed descriptions of STIPA and STITEL, respectively. Annex D gives details of the now obsolete RASTI method.
Both the direct and the indirect forms of the STI method give valid results for many linear distortions in the time and frequency domains, including:
• temporal distortion, e.g. reverberation and echoes;
• strong spectral distortion, e.g. band-pass filtering.
NOTE Some types of spectral distortion may not be accounted for; see 4.5.8.
Direct STI methods can address non-linear distortion such as clipping, whereas indirect methods are suitable only for linear systems. For more details on the effect of non-linear distortion, see Clause 6. Table 1 summarizes the STI test methods with respect to the types of linear and non-linear distortion they are designed to handle.
Table 1 – Comparison of STI test methods for different types of distortion

| Type of distortion | Noise | Reverberation, echoes | Non-linear distortion | Spectral distortion ᵃ |
|---|---|---|---|---|
| Direct FULL STI | yes | yes | condition dependent | yes |
| Direct STIPA | yes | yes | condition dependent | yes |
| Direct STITEL | yes | condition dependent | condition dependent | yes |
| Indirect FULL STI using MLS ᵉ | yes ᵇ | yes | no | yes |
| Indirect FULL STI using swept sine signal ᶜ | no ᵈ | yes | no | yes |

"Condition dependent" means that the accuracy of the test signal type depends on the specific type of distortion present:
• centre clipping does not significantly reduce modulation depth, whereas peak clipping reduces modulation depth but typically does not affect speech intelligibility, so the measured STI may be overly conservative;
• STITEL can be used in reverberant environments, provided that the reverberation time is not strongly frequency dependent;
• STIPA can be used to assess PA systems with non-linear distortion components, provided that the signal is not severely clipped across the frequency bands.

a The frequency response of the transmission channel can cause a loss of perceived intelligibility that is not fully captured in the result.
b Signal averaging of time-domain data is not recommended, and the excitation spectrum should be speech-shaped.
c Including time delay spectrometry.
d The effects of noise can be calculated mathematically.
e Theoretically, other mathematically deterministic pseudo-noise signals could also be used.
Applicability of STI test methods
Table 2 gives an overview of which forms of the STI are recommended for various types of application. The + and − symbols give a general indication of the suitability of each method.
If significant parts of the listener population are non-native and/or older listeners, the STI should be interpreted as described in Annex H.
STI a STIPA STITEL Limitations Work-arounds
Assessing suitability of room acoustics for speech communication (no electronic amplification)
Suitability of STITEL depends on reverberation
Suitability of STITEL depends on reverberation and echoes
Evaluating telecommunication channels (phone, radio)
STITEL has more diagnostic power
Channel features amplitude compression STIPA + + +
Difference between male and female voices needs specific attention
STIPA not suitable for female (male spectrum only)
Strong centre clipping None − − − none
Strongly fluctuating noise STIPA +/− +/− +/− Report several STI measurements
Speech and noise clearly spatially separated, or a strong direct-field component in a highly reverberant environment
To be used with caution
Currently standardised methods are inaccurate
Channels that do not permit artificial test signals, such as vocoders
Currently standardised methods are inaccurate
Speech-based STI test signals; listener tests
The suitability ratings range from very well suited, through well suited and somewhat suitable, to not suitable.
a For a detailed assessment of the measurement methods, see Table 1.
NOTE The speech-based direct method may be incorporated into future editions of this standard.
Use of direct and indirect methods
Table 3 below compares a number of practical issues relating to the use of direct and indirect measurement methods.

| Subject | Direct method | Indirect method |
|---|---|---|
| Amplitude non-linearities | reduce the reliability of the result | reduce the reliability of the result |
| Frequency shift | not possible | not possible |
| Sample rate accuracy between the clock frequencies of the signal source and receiver | errors less than 20 × 10⁻⁶ | errors less than 0,5 × 10⁻⁶ |

a See 4.5.8 for further details.
Limitations of the STI method
General
The STI method aims to capture all changes in a transmission channel that are relevant to speech intelligibility. However, the STI model is a simplification of human auditory processing, and the STI test signal differs from natural speech in a number of temporal and spectral characteristics, including:
• the dynamic range of speech, which depends on the integration time;
• the energy distribution in each time frame;
• the distribution of signal levels over the entire length of a speech segment or test signal (percentile exceedances);
• the lack of gaps in the test signal;
• the carriers in speech are not restricted to the fixed carrier bands and modulation frequencies;
• the spectral differences between individual words and the STI signal;
• the spectral differences between various talkers.
NOTE The speech spectrum specified for STI differs from the spectrum specified by ANSI [4].
Caution is needed when the STI is applied in certain situations and to narrow-band transmission channels. In some cases intelligibility is hardly affected by a distortion even though the STI indicates a significant decrease. Conversely, in other cases the STI changes very little while intelligibility is greatly reduced. These potential limitations are discussed in more detail in the following subclauses.
Frequency shifts
This type of distortion may be caused by:
• playing a digital signal at the wrong sampling rate;
• devices for preventing acoustic feedback.
Frequency shifts can have a large effect on STI with generally little effect on intelligibility, so the STI may underestimate intelligibility for systems with frequency shifts.
Centre clipping
Centre clipping occurs when low-level signal components are poorly transmitted or suppressed entirely, often because of problems in amplifiers or corroded connectors. Because centre clipping hardly reduces the modulation depth, the STI may overestimate intelligibility for systems with significant centre clipping.
NOTE Centre clipping is also known as crossover distortion and origin distortion.
Drop outs
Regularly occurring signal drop-outs can be caused by selective fading in wireless transmission and by digital signal corruption. Although the STI may not be significantly reduced, intelligibility can be severely impaired. It is therefore advisable to analyse the fine structure of the received modulated signal to identify drop-outs and, where feasible, to compute the STI with the drop-outs excluded.
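Here is a rough sketch of how drop-outs might be located in the received signal. It compares short-term frame levels with the median frame level. The frame length and the 20 dB threshold are hypothetical choices for illustration, not values from this standard. The flagged segments could then be excluded from the modulation analysis.

```python
import math


def find_dropouts(samples, fs, frame_s=0.01, drop_db=20.0):
    """Return (start, stop) sample indices of frames far below the median frame level.

    frame_s (frame length in s) and drop_db (threshold below the median) are
    illustrative parameters only.
    """
    frame = max(1, round(frame_s * fs))
    levels = []
    for start in range(0, len(samples) - frame + 1, frame):
        power = sum(s * s for s in samples[start:start + frame]) / frame
        levels.append(10.0 * math.log10(power + 1e-20))  # tiny offset avoids log(0)
    median = sorted(levels)[len(levels) // 2]
    return [
        (k * frame, (k + 1) * frame)
        for k, level in enumerate(levels)
        if level < median - drop_db
    ]


# Synthetic example: a 1 kHz tone with a 50 ms gap starting at 100 ms.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
for n in range(800, 1200):
    signal[n] = 0.0
print(find_dropouts(signal, fs))
```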
Jitter
Time shifts applied in digital transmission to compensate for variations in transmission rate do not affect intelligibility. However, they can significantly lower the STI, so the STI may underestimate intelligibility in systems affected by jitter.
Vocoders
Digital voice coders have little effect on intelligibility; however, depending on the codec chosen, the STI may be increased. Where intelligibility is low, speech-based test signals or subject-based measures should be used.
STI is not an appropriate metric for evaluating systems like vocoders that encode speech segments Techniques such as linear predictive coding, which may involve code-book synthesis, can introduce errors associated with voiced and unvoiced speech fragments, as well as pitch inaccuracies.
Overestimation of STI under low background noise conditions
The STI model assumes a finite signal-to-noise ratio in each octave band, because the reception threshold of hearing acts as a background noise. Setting background noise levels or reception threshold values to zero in measurements or simulations can therefore produce artificially high STI values.
The problem becomes evident when the spectrum of the input signal is varied. With an MTF matrix in which all values are unity (no reverberation or background noise), the STI changes very little even when the spectrum of the input signal is altered substantially.
Realistic background noise levels should therefore be included in STI predictions and measurements. For example, when measuring acoustic output, both the background noise and the speech reception threshold should be taken into account.
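The effect described above can be demonstrated with a simplified calculation. The sketch assumes that a stationary floor (background noise or hearing threshold) is the only disturbance, so each band's m follows from its signal-to-floor ratio. It then uses an unweighted mean transmission index instead of the full STI weighting. The spectra and floor level are hypothetical.

```python
import math


def band_mtf_from_levels(speech_db, floor_db):
    """m per octave band when a stationary floor (noise or threshold) is the only disturbance."""
    return [1.0 / (1.0 + 10.0 ** ((n - s) / 10.0)) for s, n in zip(speech_db, floor_db)]


def mean_ti(m_values):
    """Unweighted mean transmission index (a simplification of the full STI weighting)."""
    tis = []
    for m in m_values:
        m = min(max(m, 1e-12), 1.0 - 1e-12)
        snr = min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0)
        tis.append((snr + 15.0) / 30.0)
    return sum(tis) / len(tis)


flat = [60.0] * 7                                    # hypothetical band levels in dB
tilted = [60.0, 60.0, 60.0, 55.0, 45.0, 35.0, 25.0]  # hypothetical low-pass coloured spectrum
no_floor = [-math.inf] * 7                           # "zero" noise and threshold
floor = [30.0] * 7                                   # hypothetical realistic floor

print(mean_ti(band_mtf_from_levels(flat, no_floor)), mean_ti(band_mtf_from_levels(tilted, no_floor)))
print(mean_ti(band_mtf_from_levels(flat, floor)), mean_ti(band_mtf_from_levels(tilted, floor)))
```

Without a floor, both spectra score the maximum. With a realistic floor, the weak high-frequency bands of the coloured spectrum lower the result.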
Frequency response
Research indicates that the frequency response of the transmission channel significantly impacts the perceived tonal balance of speech and is more crucial for intelligibility than STI measurements suggest, particularly in reverberant environments A non-flat frequency response can lead to inflated STI values that do not accurately reflect intelligibility Instances have been documented where systems with STIs above 0.5 exhibited inadequate speech intelligibility due to poor tonal balance However, applying equalization to enhance frequency response has been shown to improve perceived intelligibility.
To address this limitation of the STI method, the amplitude-versus-frequency response of the system should be measured separately, ideally with a resolution finer than one octave. However, such measurements may overlook significant factors:
• the frequency response derived from impulse response data depends strongly on the length of the measurement data and on the time window applied to it;
• no single measure accurately reflects perceived tonal balance across different acoustic environments: in spaces with little reverberation the direct-field response dominates tonal balance, whereas in highly reverberant spaces the power response of the sound source dominates;
• the influence of varying talker position on the microphone's frequency response.
Small adjustments to the frequency response of sound systems can significantly enhance speech intelligibility by minimizing audible coloration, thereby reducing the concentration required from listeners This is especially crucial in prolonged listening scenarios or when dealing with non-native speakers For instance, even minor tweaks of just 1 dB within a narrow bandwidth of 1/3 octave can lead to noticeable improvements in perceived intelligibility.
Echoes
Audible echoes, or late reflections, can lead to a notable decrease in perceived speech intelligibility, even when the measured Speech Transmission Index (STI) values suggest otherwise.
NOTE This issue is the subject of ongoing research; see e.g. [8].
In situations with audible echoes, other diagnostic acoustic methods should be used to measure and assess the severity of the echo.
Fast amplitude compression and expansion
Measured STI and STIPA values can change when compression or expansion is applied to the test signal. However, practical experience indicates that such processing usually causes only slight changes in perceived intelligibility. Note that compression often modifies the tonal balance of speech, which may reduce intelligibility.
When properly implemented, companders (complementary compression and expansion devices) are likely to have no overall effect on intelligibility.
Fast compression operates on the instantaneous amplitude envelopes across various frequency bands, effectively reducing signal level variations that exceed the compression threshold (knee point) based on the compression ratio This process decreases the dynamic range of the signal, which can also lead to a reduction in modulation depth.
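The loss of modulation depth caused by fast compression can be illustrated on a sinusoidally modulated intensity envelope. This sketch assumes an idealized instantaneous (static) compressor with a hypothetical knee point and ratio. Real compressors act on a smoothed envelope with finite attack and release times.

```python
import cmath
import math


def compress_level(level_db, knee_db, ratio):
    """Static compressor curve: above the knee, output rises 1/ratio dB per input dB."""
    if level_db <= knee_db:
        return level_db
    return knee_db + (level_db - knee_db) / ratio


def envelope_modulation_index(intensity, f_m, fs):
    """Modulation index at f_m from the Fourier component of the intensity envelope."""
    total = sum(intensity)
    comp = sum(i * cmath.exp(-2j * math.pi * f_m * n / fs) for n, i in enumerate(intensity))
    return 2.0 * abs(comp) / total


fs, f_m, m_in = 1000, 4.0, 0.8               # illustrative envelope rate and modulation
intensity = [1.0 + m_in * math.cos(2 * math.pi * f_m * n / fs) for n in range(fs)]  # 1 s
levels = [10 * math.log10(i) for i in intensity]
compressed = [10 ** (compress_level(lvl, knee_db=-3.0, ratio=3.0) / 10) for lvl in levels]

m_plain = envelope_modulation_index(intensity, f_m, fs)
m_comp = envelope_modulation_index(compressed, f_m, fs)
print(f"without compression: {m_plain:.3f}, with compression: {m_comp:.3f}")
```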
Automatic gain control (AGC), on the other hand, has a fast reaction time but a very slow recovery time, and does not reduce the short-term dynamic range.
Compression and Automatic Gain Control (AGC) techniques enhance speech intelligibility, particularly benefiting individuals with hearing impairments who experience a limited dynamic range These methods are also effective in optimizing sound quality in public address systems.
Research indicates that sentence intelligibility, assessed through the speech reception threshold (SRT), can improve by as much as 4 dB in effective signal-to-noise ratio (SNR) However, this enhancement is influenced by the specific amount and type of compression applied.
The effect of compression on intelligibility at high signal and noise levels, such as in public address systems, awaits the outcome of further research.
Non-linear distortion
Although the STI is sensitive to non-linear distortion, the result depends strongly on the measurement method adopted (see 6.3).
Impulsive and fluctuating noise
Two types of background noise should be distinguished in STI measurements:
• Impulsive noise and brief unwanted events, such as a dropped hammer, can lead to inaccurate STI results, particularly for narrow-band transmission, and may also cause the contributions of individual frequency bands to be misdiagnosed.
• Fluctuating noise, such as intermittent babble or cyclically operating machinery, can cause large variations in STI values between repeated measurements, which may result in substantial underestimation or overestimation of speech intelligibility.
Subjectively, the intelligibility of sentences in fluctuating noise is known to be higher than in stationary noise with the same time-averaged RMS level [14].
When STI measurements are made in impulsive or fluctuating noise, it is essential to use the indirect method described in Clause 6. To minimize noise interference, signal-averaging techniques such as MLS or slow sine sweeps should be employed. The effect of the noise can then be reintroduced into the MTF by post-processing the "noise-free" MTF data.
To determine the STI accurately using sine sweeps, a noise-free measurement is needed. In practice, this requires a signal-to-noise ratio of at least 20 dB in each octave band. Additional details can be found in 7.8.3.
Hearing impaired listeners
The STI method alone is not a dependable indicator of speech intelligibility for individuals with hearing impairments While it is feasible to measure the effectiveness of hearing assistive systems, additional specific corrections may be necessary for accurate results.
Conclusion
In general, the STI method is conservative and may underestimate intelligibility in some applications, but there are exceptions, such as the case described in 4.5.3.
5 Direct method of measuring STI
Overview
STI may be measured either directly, using a suitably modulated signal, or indirectly, by mathematical manipulation of a system impulse response using a relationship proposed by Schroeder [17].
The research described in [4], [5], [18], [19] and [20] established the basis and methodology of the FULL STI. This led to two simplified versions, STIPA and STITEL, which substantially reduce the measurement time. An earlier simplified version, RASTI, is now obsolete.
The FULL STI uses 98 separate test signals, covering 14 modulation frequencies in each of seven octave bands. Each test signal combines a single modulation frequency with one octave-band noise carrier, while the other octave bands are silent. The test signals are generated in sequence, each lasting approximately 10 s, giving a total measurement time of around 15 min. An alternative version of the FULL STI also applies random modulations in the other octave bands, in addition to the modulation frequency and octave band under test.
STIPA uses a single test signal with a specific combination of two modulations in each of the seven octave bands, giving a total of 14 simultaneous modulations. Each measurement typically lasts between 10 s and 15 s.
STITEL uses a single test signal with seven predefined modulation frequencies, one for each octave band, generated simultaneously. Each measurement takes about 12 s. STITEL offers greater sensitivity but should be used with care.
RASTI uses a specific set of nine modulation frequencies generated simultaneously: five in the 2 000 Hz octave band and four in the 500 Hz octave band. Each measurement takes about 30 s. RASTI is now obsolete; further information is given in Annex D.
Table 2 compares the accuracy of the two simplified test signals with that of the FULL STI for various test conditions.
To assess operational signal-to-noise ratios and absolute speech levels correctly, the mean intensity of the test signal must match the normal speech level at the test position. Following the method described in Annex J, the \( L_{Aeq} \) of the test signal is set 3 dB higher than the typical \( L_{Aeq} \) of continuous speech measured at the test position, so a 3 dB correction has to be applied.
STIPA
Simplifying the STI test signal by omitting the uncorrelated modulations allows all frequency bands to be modulated simultaneously and processed in parallel, which reduces the measurement time. However, this simplification may reduce the ability to account for certain types of non-linear distortion (see Table 1). The modulation transfer function is determined for two modulation frequencies in each octave band.
The STIPA method, described in Annex B, allows measurements with a duration of 10 s to 15 s. It is suitable both for assessing room acoustics for natural speech and for assessing the performance of sound systems.
STIPA denotes a modulated, speech-shaped test signal as described in Annex B. When STIPA is derived from an impulse response, for example in a prediction, this shall be indicated explicitly by using the term STIPA(IR) to avoid misunderstanding. Note also that the standard STIPA signal is based on a male speech spectrum.
The STIPA method lacks reliability in predicting speech intelligibility for individuals with hearing impairments unless specific corrections are applied While it is feasible to measure hearing assistive systems or channels, these measurements may also necessitate particular adjustments.
Application
The direct STI method is applicable to a wide range of digital, analogue, electro-acoustic and acoustic speech transmission channels. From the STI value, the intelligibility of different types of speech material can be predicted for many transmission systems.
All tests referencing this standard must include the relevant parameters and results in a measurement report sheet, with a sample provided in Annex K.
Limitations
In addition to the limitations of the STI method described in Clause 4, the direct method of measuring the STI has a number of other limitations.
Repeated measurements with a band-limited random or pseudo-random noise test signal generally give varying results, even under constant interference conditions. The results are distributed around a mean value with a certain deviation. The deviation depends on factors such as the number of discrete measurements of the modulation transfer function (typically 98 for the FULL STI and 14 for STIPA) and the measurement duration.
Typically, with the FULL STI, a measuring time of 10 s for each modulation index \( m(f_m) \) and stationary noise interference, the maximum deviation is about 0,02 STI. With STIPA and a measurement time of 15 s, the maximum deviation for repeated measurements is approximately 0,03 STI.
Fluctuating noise, such as a babble of voices, can lead to larger deviations and possibly to systematic errors (bias). To check for this, measurements should be made without the test signal; the residual STI should be below 0.20. In addition, measurements should be repeated for a limited set of conditions in order to estimate the deviation.
It is therefore good practice to average the STI results of two or three measurements for each condition.
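The following is a simple helper for the averaging recommended above. The 0.03 threshold used for the warning is taken from the typical STIPA deviation quoted earlier and serves only as a rough plausibility check. The result values are hypothetical.

```python
def summarise_repeats(sti_values):
    """Mean of repeated STI results and the largest deviation from that mean."""
    if not sti_values:
        raise ValueError("need at least one STI result")
    mean = sum(sti_values) / len(sti_values)
    max_dev = max(abs(v - mean) for v in sti_values)
    return mean, max_dev


mean, max_dev = summarise_repeats([0.61, 0.63, 0.60])  # hypothetical STIPA repeats
print(f"STI = {mean:.2f} (max deviation {max_dev:.2f})")
if max_dev > 0.03:  # rough check based on the typical STIPA spread
    print("spread larger than typical; check for fluctuating noise")
```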
6 Indirect method of measuring STI using the impulse response
Overview
The modulation transfer function (MTF), fundamental to the speech transmission index (STI), can be derived from the impulse response of a transmission channel through the Schroeder method This impulse response is typically obtained using computer-based equipment, from which the MTF is calculated to determine the STI.
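The following equation, of which the first factor is the Schroeder equation, should be used to calculate the modulation transfer function \( m_k(f_m) \) at modulation frequency \( f_m \) in octave band \( k \):

\[ m_k(f_m) = \frac{\left| \int_0^\infty h_k^2(t)\, e^{-j 2\pi f_m t}\, \mathrm{d}t \right|}{\int_0^\infty h_k^2(t)\, \mathrm{d}t} \cdot \frac{1}{1 + 10^{-SNR_k/10}} \]

where
\( h_k(t) \) is the impulse response of octave band \( k \);
\( f_m \) is the modulation frequency;
\( SNR_k \) is the signal-to-noise ratio in octave band \( k \), in dB.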
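The equation above can be implemented numerically as follows, assuming the octave-band impulse response is available as a list of samples at sampling rate fs. As a sanity check, the function is applied to an ideal exponential decay, for which the Schroeder factor should equal \( 1/\sqrt{1 + (2\pi f_m T/13.8)^2} \). The synthetic impulse response is purely illustrative.

```python
import cmath
import math


def schroeder_mtf(h, fs, f_m, snr_db=math.inf):
    """m_k(f_m) from an octave-band impulse response h (list of samples at fs Hz)."""
    energy = [x * x for x in h]                      # h_k^2(t)
    numerator = abs(sum(e * cmath.exp(-2j * math.pi * f_m * n / fs)
                        for n, e in enumerate(energy)))
    denominator = sum(energy)
    noise_factor = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))  # second factor (SNR_k)
    return numerator / denominator * noise_factor


# Synthetic check: ideal exponential decay, 60 dB energy decay in T seconds.
fs, T = 8000, 1.0
h = [math.exp(-6.9 * n / (fs * T)) for n in range(int(3 * T * fs))]
m_ir = schroeder_mtf(h, fs, f_m=1.0)
m_theory = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * 1.0 * T / 13.8) ** 2)
print(f"{m_ir:.4f} vs {m_theory:.4f}")
```

The sampling sum stands in for the integrals. In practice, h would first be octave-band filtered and truncated where it disappears into the noise.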
The indirect method is only applicable to linear, time-invariant systems.
Considerable experience is required to use this method, because the measurement systems allow many parameters to be adjusted, which may affect the final result.
The method can also be used to derive the simplified forms of the STI. Because the processing time is short, it is advisable to calculate the FULL STI; however, calculating the simplified derivatives can also be useful.
STIPA values derived from impulse response measurements shall be termed STIPA(IR).
Application
When STI values are derived from impulse response measurements, the following apply:
• The impulse response measurements should be noise-free, and the results should then be corrected for background noise and speech level. Alternatively, a speech-shaped MLS signal used without averaging accounts for background noise directly.
• According to ISO 18233, impulse response measurements must meet specific requirements, including a minimum length of 1.6 s and a signal-to-noise ratio of at least 20 dB in all octave bands.
• Excitation signals with a white frequency spectrum should be avoided unless the background noise is minimal; a pink spectrum or a speech-shaped MLS signal is preferred.
• Impulsive excitation signals, such as a Dirac pulse, are unsuitable in the presence of significant background noise and distortion.
• The impulse response method is applicable only to linear, time-invariant systems, so any non-linear processing should be bypassed during measurement.
• Environmental factors, such as wind and movement of components of the sound transmission path, must be controlled to ensure measurement accuracy.
• The repeatability of the results should be verified by repeated measurements.
• The effects of background noise and operational speech levels must be incorporated into the final results by post-processing.
Limitations (non-linear distortion)
The impulse response method for measuring the Speech Transmission Index (STI) has several limitations, particularly concerning non-linear distortions, in addition to those outlined in Clause 4.
To ensure accurate measurements, distortion of the measurement signal must be avoided, because the indirect method does not adequately account for its effects. The sensitivity to distortion varies significantly with the measurement procedure used. Notably, methods based on the Fourier transform yield error-free results only in linear systems.
A critical analysis of the impulse response is essential, particularly with regard to non-linearities in the transmission system, as components often operate at their performance limits. With sine sweep techniques, non-linear distortion components can be identified at the beginning or end of the recovered impulse response. However, long reverberation times may introduce errors, because the reverberant tails of these distortion components can overlap the main impulse response.
When an MLS signal is used, distortion components often appear as noise, which makes them less noticeable. In addition, DC components and time-aliasing artefacts appear as pre-arrivals (pre-echoes) before the arrival of the signal.
When a sine sweep technique is used, the distortion components must be removed from the impulse response (IR) before the STI is calculated.
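For the exponential (logarithmic) sine sweep described by Farina, which is one common sine-sweep technique and is the one assumed here, the harmonic distortion products deconvolve into separate impulse responses that arrive before the linear one. The k-th harmonic is advanced by L·ln(k), where L = T/ln(f2/f1) for a sweep of duration T from f1 to f2. The sketch below, with hypothetical helper names, computes these offsets and discards everything earlier than half the second-harmonic offset before the main peak. It cannot remove reverberant tails of distortion products that already overlap the linear response, which is the limitation noted above for long reverberation times.

```python
import math


def harmonic_offsets(duration_s, f1_hz, f2_hz, max_order=5):
    """Time (s) by which the k-th harmonic IR precedes the linear IR
    for an exponential sine sweep (Farina): dt_k = L * ln(k)."""
    rate = duration_s / math.log(f2_hz / f1_hz)  # sweep rate constant L
    return {k: rate * math.log(k) for k in range(2, max_order + 1)}


def trim_distortion(ir, fs, dt2_s):
    """Drop samples more than dt2/2 before the main (linear) peak.

    Assumes the deconvolved IR is laid out so that the harmonic IRs
    precede the linear peak within the array.
    """
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    start = max(0, peak - int(0.5 * dt2_s * fs))
    return ir[start:]


# 10 s sweep from 20 Hz to 20 kHz: the 2nd harmonic arrives about 1 s early
offsets = harmonic_offsets(10.0, 20.0, 20000.0)
print({k: round(v, 3) for k, v in offsets.items()})
```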
7 Measurement procedures, post-processing of data and applications
General
STI measurements are typically made acoustically; however, in some cases acoustic excitation or measurement may not be feasible or required. When several systems are compared for speech transmission quality, or when additional diagnostic information is needed, the test signal can be injected and received electrically.
In post-processing the MTF matrix, it is crucial to include a realistic level of background noise. For transmission channels with an acoustic output, the hearing reception threshold should also be applied.
All relevant parameters should be stated in a measurement report. A sample report is given in Annex K.
Acoustical input
To test the complete system, including factors such as ambient noise and feedback that may affect intelligibility, a special loudspeaker is used to apply the test signal acoustically to the system microphone. This method is essential for electro-acoustic systems that offer no other means of injecting the test signal. A dedicated loudspeaker, such as an artificial mouth, is therefore used so that the acoustic reproduction of the signal replicates the characteristics of a natural talker.
To ensure accurate electrical injection of the test signal, the test signal spectrum must be adjusted to match the standard speech spectrum. When the direct method is used, the standardized test signal must be used for this adjustment.
To ensure accurate testing, the integrity of the test signal should be verified by loop-back measurements. This is especially important when the signal is played from a CD player, and is also advisable for PCM sources such as WAV files. Digitally compressed signal formats such as MP3 should be avoided for testing purposes, although a bit rate of 128 kbit/s has been found to perform reliably without noticeable errors.
In addition, the 1/3-octave frequency response of the test signal source, whether an artificial mouth or a suitable test loudspeaker, must be confirmed to lie within ±1 dB over the specified frequency range. This measurement should be made under free-field conditions, free from reflections, using one of the following approaches:
• over the range 88 Hz to 11,3 kHz using a FULL STI or MLS or other impulse response measurement signal (the limits of the 125 Hz and 8 kHz octave bands) or
• individual octave band levels over the range 125 Hz to 8 kHz when using a STIPA or other speech shaped test signal
NOTE 1 For indirect measurements, the frequency response derived from an MLS or other impulse response measurement can be processed to calculate an octave-band spectrum.
Adjust the equalization of the artificial mouth or test loudspeaker as necessary. Position the sound source on the axis of the system microphone, at the correct distance and orientation, so that it points in the normal speaking direction.
If no artificial mouth is available, a high-quality loudspeaker with a cone diameter not exceeding 100 mm can serve as a suitable transducer; this should be stated with the results.
In a listening environment, speech intelligibility is influenced by the directivity of the sound source. To assess the intelligibility of unamplified talkers accurately, a mouth simulator that mimics the directivity of the human head and mouth, as described in ITU-T Recommendation P.51, should be used. The directional properties of the test source, whether a talker simulator loudspeaker or a mouth simulator, also matter in measurements, particularly in large or reverberant spaces, or when the microphone is at a distance from the source.
When a sound system is used for speech, a simulator may not be necessary unless the microphone is in a noisy or reverberant environment, or a close-talking or noise-cancelling microphone is used; in those cases a talker or mouth simulator should be used. The test signal level at the microphone position must be set to match the operational speech level intended for the system, so that speech and test signal levels are aligned as described in Annex J.
The amplifier may be stressed during this test, as indicated in section 14.9 of IEC 60268-3. To allow it to cool, it is advisable to apply the test signal for one minute, followed by several minutes with no signal.
When the test signal level cannot be aligned with a known operational speech level, a default equivalent level of 60 dB A at 1 m in front of the artificial mouth or test loudspeaker is recommended.
Smaller talker distances typically result in speech levels of approximately 86 dB A to 94 dB A for handheld microphones (distances of 5 cm to 2 cm), and of approximately 80 dB A to 86 dB A for gooseneck microphones (distances of 10 cm to 5 cm).
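The quoted ranges are consistent with scaling the 60 dB A default at 1 m by the free-field inverse distance law (20·lg of the distance ratio). The Python sketch below assumes that simple point-source model, which ignores near-field effects close to the mouth, so it is a plausibility check rather than the basis of the figures; `speech_level_at` is a hypothetical helper.

```python
import math


def speech_level_at(distance_m, level_at_1m_db=60.0):
    """Free-field inverse distance law: +6 dB per halving of distance."""
    return level_at_1m_db + 20.0 * math.log10(1.0 / distance_m)


for label, d in [("handheld", 0.05), ("handheld", 0.02),
                 ("gooseneck", 0.10), ("gooseneck", 0.05)]:
    print(f"{label:9s} {d * 100:4.0f} cm: {speech_level_at(d):5.1f} dB A")
```

Under this model, 5 cm gives about 86 dB A, 2 cm about 94 dB A and 10 cm about 80 dB A, matching the end points of the ranges above.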
In practice, talker speech levels vary considerably.
Run the STI, STIPA (or STITEL) test sequence, selecting the “with noise” option where available. When an MLS signal is used to measure the impulse response, its excitation spectrum must be filtered to the standardized speech spectrum so that background noise is accounted for; signal averaging should be disabled, or a single sequence used. When the noise-free impulse response is determined with sine sweeps, MLS or TDS, the speech and noise levels at the microphone and receiver locations must be applied to the noise-free MTF by post-processing.
The test signal should be introduced into the system in such a way that all signal-processing components relevant to speech reproduction, such as equalizers and signal delays, are included in the measurement.
Acoustical output
The measuring device, whether a microphone, artificial ear or head simulator, must be acoustically calibrated for sensitivity and frequency response. Measurements should be taken at the typical location and height of the listener, or at a designated listening height. If a single microphone is used, it must be an omnidirectional type designed for diffuse-field use.
Electrical input
Step d) in section 7.2 is modified as follows: the signal injection point should be chosen as close as possible to the normal signal input, so that as much of the system as possible is included in the test.
The STI test signal at the injection point must be calibrated to match the speech level at that location, which is established using the speech level measurement method outlined in Annex J.
Electrical output
To ensure accurate measurements, hearing-related effects such as auditory masking and the hearing reception threshold must be disabled on the measuring device, because acoustic conditions do not affect the electrical output. If these effects cannot be disabled, the electrical input level must be adjusted so that it simulates a sound pressure level above the reception threshold but below the level at which masking significantly affects the STI result. Broadband output levels should be A-weighted and reported as A-weighted voltage levels in dB relative to a stated reference, such as 1 V.
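A broadband A-weighted voltage level can be obtained from octave-band voltage levels by applying the A-weighting correction at each octave-band centre frequency and summing on a power basis. This is a sketch under those assumptions: the correction values are the usual rounded octave-band figures from IEC 61672-1, `a_weighted_dbv` is a hypothetical helper, and all levels are in dB re 1 V.

```python
import math

# A-weighting corrections (dB) at octave-band centre frequencies, rounded
A_WEIGHTING_DB = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0,
                  2000: 1.2, 4000: 1.0, 8000: -1.1}


def dbv(v_rms, v_ref=1.0):
    """Voltage level in dB relative to v_ref (default 1 V)."""
    return 20.0 * math.log10(v_rms / v_ref)


def a_weighted_dbv(band_levels_dbv):
    """Power-sum octave-band levels (dB re 1 V) after A-weighting."""
    total = sum(10.0 ** ((level + A_WEIGHTING_DB[band]) / 10.0)
                for band, level in band_levels_dbv.items())
    return 10.0 * math.log10(total)


bands = {f: dbv(0.1) for f in A_WEIGHTING_DB}  # 100 mV in every octave band
print(round(a_weighted_dbv(bands), 1))
```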
Examples of input/output combinations
Acoustical input – Acoustical output
In a standard STI measurement set-up for PA systems and auditoria, a sound source reproduces the STI test signal, calibrated to the nominal speech level, at a situation-dependent and representative talking distance, as outlined in section 7.2. A calibrated STI measuring device at the receiver location is then used to assess the STI of the transmission channel.
Electrical input – Electrical output (e.g. assessment of wired and wireless communication systems)
Purely electrical STI measurements are used primarily to evaluate the speech transmission quality of communication systems rather than to determine an absolute speech intelligibility value. They should be made over a range of input signal levels, typically from −10 dB to +10 dB relative to the reference operational level, to assess the effects of dynamic range, noise floor and signal processing on intelligibility. Such measurements are commonly performed on wired and wireless speech transmission systems, including telephone lines and radio communication systems.
Acoustical input – Electrical output (e.g. assessment of microphones)
To compare microphones in terms of their effect on speech intelligibility, the STI test signal level at the microphone should be calibrated as described in section 7.2. Measurements should be made in the presence of the relevant ambient noise spectrum, at a range of noise levels, to assess the microphone's noise rejection. Ideally, measurements should also be made at different speech levels to show how changes in vocal effort affect intelligibility.
Measuring the STI of assistive hearing systems and audio frequency induction loop systems (AFILS) may require specific methods. Nevertheless, the general guidance in this section still applies.
Electrical input – Acoustical output (e.g. assessment of PA systems)
To evaluate transducers such as loudspeakers and headsets, the STI test signal can be injected electrically. The test signal must be reproduced at the listener's location at a sound pressure level representative of typical operating conditions.
For public address and sound distribution systems, a representative number of locations should be measured. A simple mean can be misleading; the mean of the measured values minus one standard deviation is a more useful figure, because if it meets the target value then, assuming a Gaussian distribution, there is an 84% probability that any given location achieves the target. For a more complete picture, the full statistical distribution of the results should be plotted.
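The mean-minus-one-standard-deviation rule is straightforward to compute. This sketch uses the sample standard deviation, which is an assumption since the text does not name an estimator; `design_sti` is a hypothetical helper and the readings are invented for illustration.

```python
import statistics


def design_sti(values):
    """Mean minus one (sample) standard deviation of measured STI values."""
    return statistics.mean(values) - statistics.stdev(values)


measured = [0.52, 0.58, 0.61, 0.55, 0.63, 0.49]  # hypothetical readings
print(round(design_sti(measured), 3))
# Fraction of a Gaussian population above mean - 1 sigma (about 0.84)
print(round(statistics.NormalDist().cdf(1.0), 3))
```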
When assessing headsets, an in-ear microphone (MIRE) or an artificial ear should be used.
Post-processing of measured MTF data
There are a number of corrections that can be made to measured MTF data:
• elimination of noise from (de-noising) a measured MTF;
• addition of an occupancy noise level and spectrum;
• consideration of the hearing reception threshold;
• adjustment of the speech level and spectrum;
• correction for different reverberation times
Occupancy noise can be added in two ways: a) by entering the noise data manually into the noise data table of the measuring equipment; or b) by mixing an artificial or recorded noise signal of the required spectral content and level with the direct signal input to the analyzer, or with a previously recorded signal.
Annex M illustrates the process of eliminating noise from a measured Modulation Transfer Function (MTF) matrix while incorporating operational background noise and targeted speech levels, utilizing the equations provided in Annex A.
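The noise-related corrections can be illustrated with the noise term used for impulse-response-derived MTFs, m' = m · I_s/(I_s + I_n) = m · [1 + 10^(−SNR/10)]^−1, where I_s and I_n are the octave-band speech and noise intensities. This is a simplified sketch, not the full Annex A procedure: it omits auditory masking and the hearing reception threshold, and all helper names are hypothetical. It de-noises a measured value using the levels present during the measurement, then applies an operational speech level and occupancy noise.

```python
def intensity(level_db):
    """Relative intensity from a level in dB."""
    return 10.0 ** (level_db / 10.0)


def noise_factor(speech_db, noise_db):
    """I_s / (I_s + I_n), equal to 1 / (1 + 10^(-SNR/10))."""
    s, n = intensity(speech_db), intensity(noise_db)
    return s / (s + n)


def denoise(m_measured, speech_db, noise_db):
    """Remove the effect of the noise present during the measurement."""
    return min(1.0, m_measured / noise_factor(speech_db, noise_db))  # clip at 1


def add_noise(m_clean, speech_db, noise_db):
    """Apply an operational speech level and occupancy noise level."""
    return m_clean * noise_factor(speech_db, noise_db)


# One octave band, one modulation frequency (hypothetical values):
# measured m = 0.80 with a 70 dB test signal over 45 dB background noise,
# then re-evaluated for 65 dB operational speech in 55 dB occupancy noise.
m_clean = denoise(0.80, 70.0, 45.0)
print(round(m_clean, 3), round(add_noise(m_clean, 65.0, 55.0), 3))
```

In practice this would be applied to every element of the 7 × 14 MTF matrix, with band-specific levels.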
Issues concerning noise
General
In linear systems, distortions such as reverberation affect the signal independently of the amplitude response. The main variables that depend on signal level are the signal-to-noise ratio in each octave band and the associated upward masking. As a result, the STI method is relatively insensitive to variations in the amplitude frequency response of the transmission channel, particularly when background noise is low.
Incorporating low levels of background noise into the MTF matrix enhances the sensitivity of the overall Speech Transmission Index (STI) to variations in the input spectrum, reflecting realistic conditions encountered in electro-acoustic systems.
In most practical situations, a completely noise-free environment is unrealistic; even in quiet places such as libraries or courtrooms, a residual noise level of 25 dB to 35 dB SPL is not uncommon and should be taken into account. This can be done by applying a suitable criterion, such as NCB, RC or NR curves (see [26]).
Short, unwanted events such as impulsive noise can be identified by analysing the signal statistics. A more practical approach, however, is to repeat the STI measurement after the noise source has been removed, or to use indirect methods together with averaging.
Fluctuating noise can be detected by performing a direct STI measurement with no test signal present. A high STI value (for example, above 0.2) indicates that the measurement results may be inaccurate. Ideally, STI measurements should be made in the absence of noise, with the noise measured separately (see section 7.8.2) and the STI then calculated mathematically.
Measurement of background noise
To correct an STI measurement for background noise, the background noise must first be characterized accurately. This involves measuring the equivalent continuous sound pressure level (Leq) in the seven octave bands from 125 Hz to 8 kHz over an adequate duration. The measurement positions, durations and times should be documented, together with notes on any unusual circumstances that could affect the validity of the results.
For corrective calculations, a single broadband value for background noise, such as LA,eq, must not be relied on; octave-band values should be used instead. Likewise, a single broadband value for the speech signal, such as the operational speech level, is inadequate. See section 7.8.3 for further details.
Fluctuating noise
To minimize the impact of fluctuating noise, the signal can be amplified until it is at least 15 dB above the noise in each octave band. The STI is then calculated using modulation indices based on the original signal levels, that is, before amplification. This approach requires a certain level of computational expertise.
In the presence of fluctuating noise, measurements should be repeated at least three times and the STI averaged. If the three results vary by less than 0.03 STI, no further measurements are needed.
Speech intelligibility in fluctuating noise is difficult to assess and is beyond the scope of this standard. Research indicates that listeners can understand speech during gaps in fluctuating noise, so intelligibility can be higher than the STI would suggest on the basis of the equivalent continuous noise level (Leq) alone.
NOTE If the fluctuation is large (e.g. 15 dB or more), it may be necessary to use L10 in each octave band.
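The repeat-and-average rule for fluctuating noise can be sketched as below. The spread is taken as maximum minus minimum, which is an assumption about what "variation" means here; `average_sti` is a hypothetical helper, and what to do when the spread is too large is left to the operator.

```python
def average_sti(readings, max_spread=0.03):
    """Average repeated STI readings and report whether the spread is small.

    Spread is taken as max - min (an assumption about 'variation').
    """
    if len(readings) < 3:
        raise ValueError("at least three repetitions are required")
    spread = max(readings) - min(readings)
    return sum(readings) / len(readings), spread < max_spread


mean, settled = average_sti([0.61, 0.63, 0.62])
print(round(mean, 3), settled)
```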
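Both Leq and L10 can be computed per octave band from a series of short-term levels. This sketch assumes the short-term levels (for example, 1 s Leq values in one octave band) are already available; the helper names are hypothetical, and `statistics.quantiles` is used with its default "exclusive" method for the 90th percentile, so other percentile conventions may give slightly different values.

```python
import math
import statistics


def leq(levels_db):
    """Energy-average (equivalent continuous) level of short-term levels."""
    return 10.0 * math.log10(statistics.fmean(10.0 ** (lvl / 10.0) for lvl in levels_db))


def l10(levels_db):
    """Level exceeded 10 % of the time (90th percentile of the samples)."""
    return statistics.quantiles(levels_db, n=10)[-1]


# Hypothetical octave-band noise: mostly 50 dB with occasional 65 dB bursts
band = [50.0] * 8 + [65.0] * 2
print(round(leq(band), 1), round(l10(band), 1))
```

For strongly fluctuating noise such as this example, L10 lies well above Leq, which is why the choice between them affects the corrected STI.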
Analysis and interpretation of the results
It is important to examine the MTF data in each octave band to determine the reliability of the results, as follows:
• constant or slightly reducing modulation transfer ratio values as a function of modulation frequency indicate that noise is the dominant mechanism;
• modulation transfer ratio values monotonically decreasing with modulation frequency indicate that reverberation is the main mechanism;
• modulation transfer ratio values that first decrease and then increase with modulation frequency indicate significant reflections arriving after 50 ms, which may lead to an overly optimistic assessment of intelligibility.
If this effect is detected, it should be reported with the results.
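The three patterns listed above can be turned into a rough automatic check for each octave band. This is a heuristic sketch only: `classify_mtf` is a hypothetical helper, and its flatness tolerance of 0.05 is an arbitrary illustrative value, not a figure from this standard.

```python
def classify_mtf(m, tol=0.05):
    """Rough diagnosis of one octave band's MTF.

    m   -- modulation transfer ratios ordered by increasing modulation frequency
    tol -- hypothetical threshold (in modulation-ratio units) for 'flat'
    """
    lowest = min(m)
    if max(m) - lowest <= tol:
        return "noise"  # constant or only slightly reducing
    dip = m.index(lowest)
    if dip < len(m) - 1 and m[-1] - lowest > tol:
        return "late reflections"  # decreases, then increases again
    if all(b <= a for a, b in zip(m, m[1:])):
        return "reverberation"  # monotonically decreasing
    return "unclear"


print(classify_mtf([0.92, 0.88, 0.81, 0.72, 0.64, 0.55]))
```

A flat MTF close to 1 would also be labelled "noise" here; in that case the channel is simply good, so the label indicates the shape rather than a problem.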
Binaural STI measurements
The Speech Transmission Index (STI) is a widely recognized method for predicting speech intelligibility; however, it primarily relies on monaural listening This approach overlooks the benefits of binaural listening, which can enhance speech intelligibility.
The binaural advantage may be considerable, but there are no established methods for measuring it. The existing STI method may therefore underestimate intelligibility, particularly when speech and noise arrive from different directions. This issue is the subject of ongoing research.
When binaural STI measurements are made with an artificial head, the recommended approach is to use the STI result for the better ear. For further information, see [27].
8 Use of STI as a design prediction tool
Overview
During the design stage of a sound system, it is useful to predict the STI performance from the predicted room acoustic parameters Two methods are available:
• calculation based on a predicted direct field, combined with an exponential reverberant decay and simple electro-acoustic parameters. Statistically calculated reverberation times may be used here;
• prediction based on a simulated impulse response of the system in the acoustic space
Simulated impulse response predictions provide enhanced precision, particularly in scenarios where traditional statistical methods for calculating reverberation times, such as Sabine or Eyring, are likely to be inaccurate This is especially true in coupled spaces or areas with uneven absorption distribution.
For accurate STI prediction, the operational speech level must be used, because it affects both the effective signal-to-noise ratio (SNR) and masking. The prediction must assume a broadband speech signal, and the transmission channel must be able to deliver the required operational sound pressure level.
Statistical predictions
The predicted STI of a sound system is derived from an MTF matrix calculated from the anticipated room-acoustic and electro-acoustic parameters, together with the measured or estimated background noise level in each octave band relevant to the selected STI version. The calculation follows the method of Houtgast et al. [28], as detailed in Annex L.
The MTI values in each octave band and the octave-band levels of the output speech signal shall be accessible.
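A heavily simplified version of the statistical approach can be sketched as follows. It assumes a purely diffuse, exponentially decaying field with no direct sound, combines the Houtgast and Steeneken reverberation term 1/sqrt(1 + (2π f_m T/13,8)²) with the noise term, and converts the result to an STI using the usual steps: effective SNR clipped to ±15 dB, transmission index (SNR_eff + 15)/30, per-band MTI, then weighted summation. The α and β values are assumed to match the male weighting factors tabulated in IEC 60268-16 and should be verified against Annex A of the edition in use. Auditory masking, the reception threshold and the direct-field terms of Annex L are omitted, and all function names are hypothetical.

```python
import math

# 14 modulation frequencies (Hz), 1/3-octave steps from 0.63 Hz to 12.5 Hz
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
# Assumed male weighting factors, octave bands 125 Hz ... 8 kHz (verify)
ALPHA = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def mtf_diffuse(fm, t60, snr_db):
    """Reverberation term times noise term for a purely diffuse field."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * fm * t60 / 13.8) ** 2)
    return reverb / (1.0 + 10.0 ** (-snr_db / 10.0))


def transmission_index(m):
    """Effective SNR clipped to +/-15 dB, mapped linearly onto 0..1."""
    m = min(max(m, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    snr_eff = max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))
    return (snr_eff + 15.0) / 30.0


def predicted_sti(t60_per_band, snr_per_band):
    """STI from per-octave reverberation times (s) and SNRs (dB)."""
    mti = []
    for t60, snr in zip(t60_per_band, snr_per_band):
        tis = [transmission_index(mtf_diffuse(f, t60, snr)) for f in MOD_FREQS]
        mti.append(sum(tis) / len(tis))  # modulation transfer index per band
    sti = sum(a * x for a, x in zip(ALPHA, mti))
    sti -= sum(b * math.sqrt(x * y) for b, x, y in zip(BETA, mti, mti[1:]))
    return sti


print(round(predicted_sti([1.2] * 7, [15.0] * 7), 2))
```

Because the α values exceed the β values by exactly 1 in total, identical conditions in every band make the STI equal to the common MTI, which is a useful sanity check on any implementation.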
If the prediction is made using commercially-available software, the results shall state:
• that a statistical estimate has been made using the method of Houtgast et al. [28];
• that the STI has been computed using the appropriate male or female weightings; note that:
• RASTI shall not be used as an indication of the predicted STI;
• the STI shall not be estimated by converting a %Alcons value;
The statistical prediction method demonstrates reduced sensitivity compared to direct STI, particularly regarding the impact of significant discrete early and late arrivals, as well as potential intelligibility loss caused by inadequate frequency response.