Investigation of Randomly Modulated Periodicity in Musical Instruments
Acoustical musical instruments, which are considered to produce a well-defined pitch, emit waveforms that are never exactly periodic. A periodic signal can be perfectly predicted far into the future and is considered deterministic. In nature, and specifically in the sustained portion of musical sounds, there is always some variation in the waveform over time. Thus, signals that are labeled as periodic are not truly deterministic. In this paper we use a formal definition for such a varying periodic signal by means of a modulation coherence function. This measure characterizes the amount of random variation in each Fourier component and allows capturing its statistical properties. The estimation is done in a period-synchronous or pitch-synchronous manner and allows capturing even the smallest deviations away from periodicity, with only mild assumptions on the nature of the random modulating noise. This modulation coherence function is very different from the coherence function between two stationary signals, which measures second-order statistical/spectral similarity between signals. It is also different from non-linear phase coupling measures that were previously applied to musical sounds, which depend on interaction between several harmonic Fourier components using higher-order statistics. The method is applied to digitized recordings of acoustic signals from several musical instruments.
PACS 43: 60.Cg, 75.De, 75.Ef, 75.Fg
I INTRODUCTION
This paper investigates fluctuations away from perfect periodicity in pitched acoustic instruments during a sustained portion of their sounds. Acoustical musical instruments, which are considered to produce a well-defined pitch, emit waveforms that are never exactly periodic (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992, Dubnov and Rodet 2003). The paper focuses on one of the possible ways of describing those fluctuations that has not been quantitatively addressed so far, namely, randomly modulated periodicity (Hinich 2000, Hinich and Wild 2001). This type of random modulation is encountered in signals that are labeled as periodic but exhibit some variation in the waveform over time, and so are not truly deterministic. A randomly modulated periodic signal is created by some mechanism that has a more or less stable inherent periodicity with random deviations around the mean periodic value. For example, voiced speech is randomly modulated, since the oscillating vocal cord varies slowly in amplitude and phase over several pitch periods in a seemingly random fashion. Other examples include sonar reflections pinging on a target, rotating machinery (Barker et al., 1994), and so on.
In this work we investigate instrumental sounds that have a well-defined pitch during a sustained portion of their sound. Although we are dealing with sustained portions of instrumental sounds, it is important to state that these sounds are not in the "steady state" as would be produced by an artificial blowing or bowing machine, but are played by a human player, with all the attendant vibrato, amplitude and pitch variability. For instance, it should be noted that both the flute and the cello are normally played with significant vibrato at around 6 Hz, while the trumpet is normally played with no vibrato. In the case of the cello, one must also distinguish between natural playing of stopped and open strings. A note played on an open string contains only a small pitch variation due to possible variations in the force applied to the bow. A flute vibrato generally adds only a small pitch variation, and generally has a large and uncorrelated variation in the amplitudes of the upper partials and not a large variation in the amplitude of the fundamental. In stopped string bowing, the sounds have both a significant pitch variation (a few percent) over all partials and also large amplitude variations among the partials because of body resonance (Fletcher and Rossing, 1995).
Recently a method for evaluating the degree of phase synchronous vs. asynchronous deviations among harmonics of musical instruments in sustained portions of their sounds was proposed (Dubnov and Rodet, 2003). It is based on estimation of the degree of phase coupling among groups of harmonically related partials and is closely related to evaluation of bi-coherence (using Higher Order Spectral (HOS) analysis). The bi-coherence method is different from the coherence method of the current paper in several aspects. First, the bi-coherence function depends on interaction between phases of different partials, while the coherence measure is a local property of every partial. Moreover, phase coupling measures deviations between phases of sinusoidal components, while coherence captures random modulations that may contain both phase and amplitude deviations.
We use the term “modulation coherence” to denote this new measure of signal deviation from periodicity. It measures the deviations, in the frequency domain, of each signal spectral component relative to a mean signal that has perfectly coherent or constant spectral components with no amplitude or phase deviations between periods. We use the term “coherence” in analogy with the use of the term in physics, as in “coherent light”, meaning a signal of zero bandwidth that has no deviations from a single frequency (monochromatic).
One of the contributions of this paper is the derivation of a theoretical estimate for the amount of decay in modulation coherence due to vibrato (mathematical details are provided in the Appendix). It might be expected that a signal containing quasi-periodic frequency fluctuations would have little modulation coherence, since it does not have a well-defined period and accordingly no averaging period or mean signal could be determined. Our analysis shows that if vibrato is considered to be a (random) frequency modulation, then for a vibrato depth of the order of magnitude of a semitone (or less, as is typical of musical instruments), the decay in modulation coherence is actually very small. This finding is interesting when considering the experimental modulation coherence results for instruments with vibrato. For instance, comparing open and stopped notes on a cello (i.e., without and with vibrato), we come to the conclusion that the large reduction in modulation coherence in the latter case cannot be attributed to the frequency modulation aspect of the vibrato.
The experimental analyses in the paper are performed using a set of sounds similar to the ones that were used in (Dubnov and Rodet, 2003) (specifically, the sounds of the Cello, Flute and Trumpet instruments are the same recordings). The experiments include investigation of both stopped and open string cello sounds and normal playing for wind instruments containing various amounts of vibrato, with the flute having a significant vibrato, while the trumpet and the French horn have no vibrato. These samples were taken from the McGill University Music Sound Database (McGill University Master Samples).
II THE MODEL
A varying periodic signal with a randomly modulated periodicity is defined as follows:
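Definition: A signal $\{x(t)\}$ is called a randomly modulated periodicity with period $T$ if it is of the form

$$x(t) = \frac{1}{K} \sum_{k=-K/2}^{K/2} \left[ \mu_k + u_k(t) \right] \exp(i 2\pi f_k t), \qquad f_k = \frac{k}{T}, \qquad (1.1)$$

where $\mu_{-k} = \mu_k^*$, $u_{-k}(t) = u_k^*(t)$ and $E[u_k(t)] = 0$ for each $k$, and $E$ is the expectation operation. The $K/2+1$ $\{u_k(t)\}$ are jointly dependent random processes that represent the random modulation. This signal can be written as $x(t) = s(t) + u(t)$ where

$$s(t) = \frac{1}{K} \sum_{k=-K/2}^{K/2} \mu_k \exp(i 2\pi f_k t), \qquad u(t) = \frac{1}{K} \sum_{k=-K/2}^{K/2} u_k(t) \exp(i 2\pi f_k t).$$

The periodic component $s(t)$ is the mean of $x(t)$. The zero mean stochastic term $u(t)$ is a real valued non-stationary process.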
A common approach in processing signals with a periodic structure is to segment the observations into frames of length $T$ so that there is exactly an integer number of periods in each sampling frame. The term sampling frame, or simply frame, is used in this paper in order to match the terminology used in the speech and audio processing literature. The waveform in frame $m$ is slightly different from that in frame $m+1$ due to variation in the stochastic signal. To further simplify notation, let us set the time origin at the start of the first frame. Then the start of the $m$-th frame is $\tau_m = (m-1)T$ where $m = 1, \ldots, M$. The variation of the waveform from frame to frame is determined by a probability mechanism described by the joint distribution of $x(\tau_m), \ldots, x(\tau_m + T - 1)$.
Now that the concept of a randomly modulated periodicity has been defined, the next step is to develop a measure of the amount of random variation present in each Fourier component of a signal. Such a measure, called a modulation coherence function, is presented in the next section. It is important to note that in the definition of the signal (1.1) it is implicitly assumed that the signal period is some integer multiple of $1/T$ and accordingly the frequencies $f_k$ are integer multiples of this period. Since, at this point of the discussion, we are free to specify any sampling frequency, one could in principle sample any periodic analog signal so that it is also discrete periodic. The implication of this choice of the sampling frequency is that the spectral analysis involved in estimation of the modulation coherence function (i.e., the DFT operation to be performed below) does not need to employ windowing or frequency interpolation techniques in order to obtain additional spectral values “in between” the DFT bins. In practice, the signal sampling frequency is chosen a priori, independently of the signal period, a situation that indeed requires additional methods for improving the spectral analysis. This will be done in the section on estimating the coherence function, immediately following the next section. For the sake of clarity of the presentation we shall first define the modulation coherence function assuming that the sampling of the signal and the signal periodicity indeed correspond to each other (i.e., the signal is discrete periodic).
Modulation coherence
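The $m$-th frame of the signal is $\{x(\tau_m), \ldots, x(\tau_m + T - 1)\}$. Its discrete Fourier transform (DFT) at frequency $f_r = r/T$ for each $r = 1, \ldots, T/2$ is

$$X(r) = \sum_{t=0}^{T-1} x(\tau_m + t) \exp(-i 2\pi f_r t) = \mu_r + U(r), \qquad U(r) = \sum_{t=0}^{T-1} u(\tau_m + t) \exp(-i 2\pi f_r t). \qquad (2.1)$$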
Essentially, the above result says that the DFT of a randomly modulated periodic signal can be split into the mean spectral component and the contribution of the modulation component at that frequency. Although initially this may seem trivial, there are a couple of points to consider here. One is that this is a first step in preparing the estimator and defining the modulation coherence. The second is more significant: it shows that periodic modulation, which is considered here as an inherent property of the signal and not as an added noise, behaves in the frequency domain as an additive spectral component, i.e., surplus energy and possibly a phase shift in addition to the spectral components of the mean signal. Mathematically, of course, this is a manifestation of the linearity of the DFT, but it is considered here in a stochastic context, i.e., the added spectral component is a random spectral deviation and some statistics need to be extracted from it in order to use it as a signal characteristic.
To simplify the notation, the index $m$ is not used to subscript the complex valued random variables $X(r)$ and $U(r)$. The variability of the complex Fourier amplitude $X(r)$ about its mean $\mu_r$ is $E[U(r)U^*(r)] = \sigma_u^2(r)$, which is independent of the frame due to stationarity. If $\mu_r \neq 0$ and $\sigma_u(r) = 0$ then that complex amplitude is a true periodicity. The larger the value of $\sigma_u(r)$, the greater is the variability of that component from frame to frame. If $\mu_r = 0$ and $\sigma_u(r) \neq 0$, then that component does not contribute to periodicity.
In order to quantify the variability consider the function $\gamma_x(r)$, called a modulation coherence function, defined as follows for each $r = 1, \ldots, T/2$:

$$\gamma_x(r) = \sqrt{\frac{|\mu_r|^2}{|\mu_r|^2 + \sigma_u^2(r)}}. \qquad (2.2)$$

In particular, $\gamma_x(r) = 0$ where the mean value of the $f_r$ frequency component is zero, which is true for each frequency component of any stationary random process with finite energy.
A high coherence value can be either due to a large amplitude $|\mu_r|$ relative to the standard deviation $\sigma_u(r)$ or a small standard deviation relative to the amplitude $|\mu_r|$. The signal coherence value at each harmonic is dimensionless and is neither a function of the energy in the band nor of the amplitude of the partial.
One should note that this modulation coherence function is very different from the coherence function between two stationary signals (p. 352, Jenkins and Watts, 1968). The coherence (sometimes called coherency) between $x_1(t)$ and $x_2(t)$ at frequency $f_r$ is the correlation between $X_1(r)$ and $X_2(r)$. The closer the coherence value is to one, the higher the correlation between the real and imaginary parts of both Fourier components (Carter, Knapp, and Nuttall, 1973). The modulation coherence function, in contrast, is defined for one signal (see footnote 1). It measures the variability of $X(r)$ about its mean $\mu_r$. One should keep in mind that the signal in this representation is the mean of the observed signal.
In the signal plus modulation-noise representation of $\{x(t)\}$ the signal-to-modulation-noise ratio (SMNR) is

$$\rho(r) = \frac{|\mu_r|^2}{\sigma_u^2(r)}$$

for frequency $f_r$. Thus

$$\gamma_x^2(r) = \frac{\rho(r)}{\rho(r) + 1}$$

is a monotonically increasing function of SMNR. Inverting this relationship it follows that

$$\rho(r) = \frac{\gamma_x^2(r)}{1 - \gamma_x^2(r)}.$$

A modulation coherence value of 0.44 yields an SMNR of 0.24, which is -6.2 dB.
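The conversion between modulation coherence and SMNR can be checked numerically. The following is a minimal Python sketch of the two relations above; the function names (`smnr_from_coherence`, `coherence_from_smnr`, `to_db`) are hypothetical and are introduced here only for illustration.

```python
import math


def smnr_from_coherence(gamma: float) -> float:
    """Invert gamma^2 = rho / (rho + 1), giving rho = gamma^2 / (1 - gamma^2)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("coherence must lie in [0, 1) for a finite SMNR")
    g2 = gamma * gamma
    return g2 / (1.0 - g2)


def coherence_from_smnr(rho: float) -> float:
    """Forward relation: gamma = sqrt(rho / (rho + 1))."""
    return math.sqrt(rho / (rho + 1.0))


def to_db(rho: float) -> float:
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(rho)


rho = smnr_from_coherence(0.44)
print(round(rho, 2), round(to_db(rho), 1))  # 0.24 -6.2
```

The printed values reproduce the worked figure quoted in the text (a coherence of 0.44 corresponds to an SMNR of about 0.24, or about -6.2 dB).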
The measure is not shift invariant in the sense that it needs to be “synchronized” to the pitch. As will be discussed in the next section, the size of the frame is chosen in practice to include multiple periods. The size of the frame defines the resolution bandwidth: the larger the frames are, the better the frequency resolution we get, but with the tradeoff of having less averaging (a smaller number of frames for the signal duration) and accordingly noisier estimates.
1 Estimation of correlation for one signal yields a periodicity estimate, i.e., a time shift of the signal with respect to itself at which it is similar. This is again different from modulation coherence.
Estimating the Modulation Coherence Function
As mentioned earlier, the signal in practice would most likely not have a correspondence between the sampling frequency and the signal period. This situation violates the model of (1.1) and requires some changes to the modulation coherence function in (2.2). One simple solution to this problem is to assume that the sampling frequency is sufficiently high compared to the signal period. Another solution is to use multiple periods in a frame and possibly to use zero padding or other spectral interpolation methods for estimation of the signal spectrum at frequencies that do not correspond precisely to the DFT frequencies.
We shall address these problems in two stages. First, we present a simple method for finding the fundamental frequency. Then, we use a large frame size (a frame that contains multiple periods instead of a single period) for estimation of the mean signal, and include zero padding for estimation of the spectrum of the remaining difference signal.
Finding the Fundamental Frequency
It is important to know the fundamental frequency of the periodic component in order to obtain the correct frame length for correct DFT analysis and averaging of the signal. If the fundamental is unknown, it must be estimated from the signal. There are many algorithms in the literature that might be used for pitch or fundamental frequency detection. Below we describe the method for determining the fundamental that was used in our program.
To find the fundamental of a sound we subtract the mean (i.e., the DC value) of the signal from each data point $x(t_n)$, where $t_n = n\delta$ and $\delta$ is the sampling interval. In our case it is important to find the exact value of the fundamental frequency to a precision that might be higher than the DFT resolution $1/T$ in equation (2.1). For this purpose we resample the signal to a higher sampling frequency and then we compute the discrete Fourier transform $X(r) = \sum_{n=0}^{N} x(t_n) \exp(-i 2\pi f_r t_n)$ using a multiple of the fundamental period instead of a single period, a situation that also stabilizes the average frame in terms of amplitude, phase and frequency fluctuations of the instrument. The coherence function is estimated from the mean and the variance of the DFT as explained below, and the process is iterated by manually adjusting the analysis frame size (and changing the DFT analysis frequency accordingly) so as to maximize the resulting coherence values. The maximally coherent results are reported in the following graphs. It should be noted that additional zero padding is not required, since when a matching signal period and DFT analysis frequency are found, the analysis frequency is exact.
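The frame-size adjustment can be illustrated with a small grid search. The Python sketch below is not the program used in this paper: it automates what the text describes as a manual adjustment, it omits the resampling step, and it uses a synthetic, perfectly periodic test tone. The names `coherence_at` and `best_frame_length`, the candidate range, and the normalization of the variance by the number of frames are all assumptions made for illustration.

```python
import cmath
import math


def coherence_at(signal, frame_len, r):
    """Modulation coherence of DFT bin r using non-overlapping frames of frame_len samples."""
    n_frames = len(signal) // frame_len
    spectra = []
    for m in range(n_frames):
        start = m * frame_len
        spectra.append(sum(
            signal[start + t] * cmath.exp(-2j * math.pi * r * t / frame_len)
            for t in range(frame_len)))
    mean_bin = sum(spectra) / n_frames                      # mean spectral component
    var_bin = sum(abs(s - mean_bin) ** 2 for s in spectra) / n_frames
    power = abs(mean_bin) ** 2
    total = power + var_bin
    return math.sqrt(power / total) if total > 0 else 0.0


def best_frame_length(signal, candidates, periods_per_frame):
    """Pick the candidate frame length that maximizes coherence at the fundamental bin."""
    return max(candidates, key=lambda T: coherence_at(signal, T, periods_per_frame))


# Synthetic test tone with a period of 20 samples (an assumption for this sketch).
signal = [math.cos(2 * math.pi * n / 20.0) for n in range(4000)]
best = best_frame_length(signal, range(95, 106), periods_per_frame=5)
print(best)  # 100
```

With five periods per frame, only a frame length of 100 samples keeps successive frames in phase; for the other candidates the frames drift in phase, the mean spectral component nearly cancels, and the coherence collapses.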
Mean signal, modulation variance and modulation coherence function estimates
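Suppose that we have observed $M$ frames, each of length $T$, of the process $\{x(t)\}$ as denoted in the beginning of Section 2. Recall that $\tau_m = (m-1)T$ for each $m = 1, \ldots, M$. The sample mean for each $t = 0, \ldots, T-1$,

$$\bar{x}(t) = \frac{1}{M} \sum_{m=1}^{M} x(\tau_m + t),$$

is an unbiased estimator of the "signal" $s(t)$.

Let $\bar{X}(r)$ denote the $r$-th component of the DFT of $(\bar{x}(0), \ldots, \bar{x}(T-1))$, and let $X_m(r)$ denote the $r$-th component of the DFT of the $m$-th frame. We define

$$\hat{\sigma}_u^2(r) = \frac{1}{M} \sum_{m=1}^{M} \left| X_m(r) - \bar{X}(r) \right|^2, \qquad \hat{\gamma}_x(r) = \sqrt{\frac{|\bar{X}(r)|^2}{|\bar{X}(r)|^2 + \hat{\sigma}_u^2(r)}}.$$

It can be shown (Hinich 2000) that $\hat{\gamma}_x(r)$ is a consistent estimator of $\gamma_x(r)$ for frequency $f_r$ with an error of $O(M^{-1/2})$. The expression $|\bar{X}(r)|^2 / \hat{\sigma}_u^2(r)$ can be used as an estimator of the signal-to-noise ratio $\rho(r)$ for frequency $f_r$.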
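The estimation steps of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used for the experiments: the frames are assumed to be already pitch-synchronous, the variance is normalized by $M$, and the function names `dft_bin` and `modulation_coherence` are hypothetical. The sketch uses the linearity of the DFT, so that averaging the per-frame DFT values gives the DFT of the mean frame.

```python
import cmath
import math
from typing import List, Sequence


def dft_bin(frame: Sequence[float], r: int) -> complex:
    """r-th DFT component of one frame of length T, i.e. frequency f_r = r / T."""
    T = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * r * t / T) for t, x in enumerate(frame))


def modulation_coherence(frames: List[Sequence[float]], r: int) -> float:
    """Sample estimate of the modulation coherence at bin r from M frames."""
    M = len(frames)
    spectra = [dft_bin(f, r) for f in frames]
    mean_bin = sum(spectra) / M                   # DFT of the mean frame, by linearity
    var_bin = sum(abs(s - mean_bin) ** 2 for s in spectra) / M
    power = abs(mean_bin) ** 2
    total = power + var_bin
    return math.sqrt(power / total) if total > 0 else 0.0


T = 20
one_period = [math.cos(2 * math.pi * t / T) for t in range(T)]

# Identical frames: no modulation, so the coherence should be close to one.
steady = [one_period for _ in range(8)]

# Sign flips between frames: the mean frame vanishes, so the coherence should be zero.
flipped = [[(-1) ** m * x for x in one_period] for m in range(8)]

print(modulation_coherence(steady, 1), modulation_coherence(flipped, 1))
```

The two test cases correspond to the limiting situations discussed for (2.2): a true periodicity with no variance, and a component whose mean is zero.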
Example: Coherent versus modulation only signal components
In order to better explain the difference between modulation coherence estimation and other, more standard spectral estimation methods, we consider a signal comprising a single sinusoid at a frequency $f_0$ and a band-limited noise-only component at the first harmonic frequency $2f_0$. The signal can be written as

$$x(t) = \mu_1 \cos(2\pi f_0 t) + u_2(t) \cos(2\pi \, 2f_0 \, t). \qquad (4.1)$$

Note that this signal has energy at two frequencies: a component at frequency $f_0$ that has $u_1(t) = 0$ for all times, which results in a modulation coherence of value one, and a second component at frequency $2f_0$ that has $\mu_2 = 0$, resulting in a modulation coherence of zero value. It should be noted that the bandwidth of the noise component is not specified in the definition of modulation coherence, since both the definition and the analysis are asymptotic. From the point of view of spectral analysis, the second component on the right hand side of equation (4.1) is a heterodyning of the signal $u_2(t)$, which centers the energy of the noise on frequency $2f_0$, with a bandwidth that equals that of $u_2(t)$.
The following figures present one such example with a frame size of $T = 100$ samples, a fundamental period of 20 samples ($f_0 = 1/20$, or five periods in a frame), and a low-pass $u_2(t)$ with a cutoff equal to the frame rate (it was generated by band-limited interpolation from a random sequence with factor 1:100, that is, by up-sampling a random signal generated at the frame rate into a signal $u_2(t)$ at the original sampling rate). A total of 200 frames were generated. An excerpt from the signal is shown in the top panel of Figure 1. It can be seen that the signal has strong amplitude variations due to the strongly modulated second harmonic.
The mean signal was estimated by averaging the frames. It should be noted that this averaging occurs in a “pitch synchronous” manner. As can be seen from the second panel from the top of Figure 1, the resulting signal corresponds to the periodic component only.
The DFT analysis was done by multiplying the signal frames by cosine and sine matrices, each generated with an exact period $T$, resulting in a matrix of dimensions 50 x 100 (50 frequency points and 100 time samples). The mean values of the sine and cosine components were used as an estimate of the mean signal spectrum. The variances of these components were used for estimation of the variance $\sigma_u^2(r)$. Both of them were used for estimation of coherence. The bottom panel of Figure 1 shows the coherence values for the 50 DFT values.
This should be contrasted with spectral estimation using standard methods, such as the periodogram or correlogram methods. The power spectral density estimate using the Welch method appears above the modulation coherence graph (third panel from the top of Figure 1). One can see that there is no distinction between the sinusoidal and the band-limited noise components, since both contribute approximately the same energy at their respective frequencies.
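A rough version of this example can be reproduced with the Python sketch below. It follows the description in the text (frame size 100, period 20, 200 frames, cosine and sine matrices of 50 x 100), but several details are assumptions made for illustration: the low-pass modulation $u_2(t)$ is approximated by linear interpolation between one Gaussian value per frame boundary rather than by band-limited interpolation, $\mu_1 = 1$, the random seed is arbitrary, and a frame-averaged periodogram stands in for the Welch estimate.

```python
import math
import random

T, PERIOD, M = 100, 20, 200        # frame size, fundamental period, number of frames
R = T // 2                         # number of analysis frequencies
rng = random.Random(0)

# One random value per frame boundary, linearly interpolated to the sample rate
# (a crude stand-in for band-limited interpolation).
g = [rng.gauss(0.0, 1.0) for _ in range(M + 1)]


def u2(n: int) -> float:
    m, t = divmod(n, T)
    a = t / T
    return (1.0 - a) * g[m] + a * g[m + 1]


# Coherent sinusoid at f0 plus a modulation-only component at 2 * f0.
x = [math.cos(2 * math.pi * n / PERIOD)
     + u2(n) * math.cos(2 * math.pi * 2 * n / PERIOD)
     for n in range(M * T)]

# Cosine and sine analysis matrices, 50 frequencies by 100 time samples.
cos_mat = [[math.cos(2 * math.pi * r * t / T) for t in range(T)] for r in range(1, R + 1)]
sin_mat = [[math.sin(2 * math.pi * r * t / T) for t in range(T)] for r in range(1, R + 1)]

coherence, power = [], []
for r in range(R):
    c = [sum(cos_mat[r][t] * x[m * T + t] for t in range(T)) for m in range(M)]
    s = [sum(sin_mat[r][t] * x[m * T + t] for t in range(T)) for m in range(M)]
    c_bar, s_bar = sum(c) / M, sum(s) / M
    mean_pow = c_bar ** 2 + s_bar ** 2                        # |mean spectrum|^2
    var = sum((ci - c_bar) ** 2 + (si - s_bar) ** 2 for ci, si in zip(c, s)) / M
    total = mean_pow + var
    coherence.append(math.sqrt(mean_pow / total) if total > 0 else 0.0)
    power.append(sum(ci * ci + si * si for ci, si in zip(c, s)) / M)

k1, k2 = T // PERIOD - 1, 2 * T // PERIOD - 1   # list indices of bins r = 5 and r = 10
print("coherence at f0, 2f0:", coherence[k1], coherence[k2])
print("average power at f0, 2f0:", power[k1], power[k2])
```

Under these assumptions the averaged power is of the same order at both frequencies, while the coherence is close to one at $f_0$ and small at $2f_0$, which is the distinction the example is meant to show.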
PLACE FIGURE 1 ABOUT HERE
III INFLUENCE OF FREQUENCY MODULATION ON MODULATION COHERENCE FUNCTION
The coherence analysis of the previous sections is written out as an amplitude modulation component added to a coherent (i.e., zero bandwidth) sinusoid. Mathematically speaking, the way the modulation coherence model is written out, one