
Data Analysis Machine Learning and Applications Episode 1 Part 10 ppt


212 Antonello D’Ambra, Pietro Amenta and Valentin Rousson

values (a negative value, zero, and a positive value), the sum of the loadings being zero for each component (hence defining proper contrasts of categories).

The goal of Simple NSCA is to find the optimal system of components among the simple ones, where optimality is calculated according to Gervini and Rousson (2004).

The percentage of extracted variability $V(L)$ accounted for by a system $L$ of $m = \min(I, J) - 1$ components is given by

where $l_k$ is the $k$th column of $L$, and where $L_{(k-1)}$ is the $m \times (k-1)$ matrix containing the first $(k-1)$ columns of $L$.

Whereas the numerator of the first term of this sum is equal to the variance of the first component, the numerator of the $k$th term can be interpreted as the variance of the part of the $k$th component which is not explained by (which is independent of) the previous $(k-1)$ components. Thus, correlations are "penalized" by this criterion, which is hence uniquely maximized by PCA, i.e. by taking $L = E_m$, the matrix of the first $m$ eigenvectors of $S$ (Gervini and Rousson, 2004). The optimality of a system $L$ is then calculated as $V(L)/V(E_m)$.
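A form of the criterion that matches the verbal description above can be written as follows. This is a sketch under assumptions: the numerators follow directly from the text, whereas the normalisation of each term by $l_k^\top l_k$ and of the whole sum by $\operatorname{trace}(S)$ is assumed here and should be checked against Gervini and Rousson (2004).

$$
V(L) = \frac{1}{\operatorname{trace}(S)} \left[ \frac{l_1^\top S\, l_1}{l_1^\top l_1} + \sum_{k=2}^{m} \frac{l_k^\top S\, l_k - l_k^\top S L_{(k-1)} \left( L_{(k-1)}^\top S L_{(k-1)} \right)^{-1} L_{(k-1)}^\top S\, l_k}{l_k^\top l_k} \right]
$$

With orthogonal eigenvectors the correction term vanishes, so under this form $V(E_m)$ reduces to the usual share of variance explained by the first $m$ principal components.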

In our sequential algorithms below, the $k$th simple component is obtained by regressing the original row/column categories on the previous $k-1$ simple components already in the system, by computing the first eigenvector of the residual variance hence obtained, and by shrinking this eigenvector towards the simple difference component which maximizes optimality. Here are two algorithms providing simple components for the rows and the columns.

Simple solutions for the rows

1. Let $S = \Pi D_J \Pi^\top$, let $L$ be an empty matrix and let $\hat S = S$.

2. Let $a = (a_1, \ldots, a_I)^\top$ be the first eigenvector of $\hat S$.

3. For each cut-off value among $g \in \{0, |a_1|, \ldots, |a_I|\}$, consider the shrunken vector $b(g) = \{b_1(g), \ldots, b_I(g)\}^\top$ with elements $b_k(g) = \operatorname{sign}(a_k)$ if $|a_k| > g$ and $b_k(g) = 0$ otherwise (for $k = 1, \ldots, I$). Update and normalize it such that $\sum_k b_k(g) = 0$ and $\sum_k b_k^2(g) = 1$.

4. Include into the system the difference component $b(g)$ which maximizes $b(g)^\top \hat S\, b(g)$ (i.e. add the column $b(g)$ to the matrix of loadings $L$).

5. If the maximum number of components is attained, stop. Otherwise let $\hat S = S - S L (L^\top S L)^{-1} L^\top S$ and go back to step 2.
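Steps 3 and 4 above can be sketched in a few lines of Python. This is an illustration, not the authors' code: the function names are hypothetical, and the "update and normalize" rule is assumed to give all positive entries one common value and all negative entries another, which reproduces loadings such as -0.93 against six entries of 0.15 in Table 1. The eigenvector `a` and the residual matrix `S_hat` are taken as given.

```python
import math


def difference_component(signs):
    """Turn a vector of signs (-1, 0, +1) into a normalized contrast.

    Positive entries share one value and negative entries another, chosen so
    that the loadings sum to zero and their squares sum to one. Returns None
    when one of the two groups is empty, since no contrast exists then.
    """
    p = sum(1 for s in signs if s > 0)
    q = sum(1 for s in signs if s < 0)
    if p == 0 or q == 0:
        return None
    c = math.sqrt(p * q / (p + q))
    return [c / p if s > 0 else -c / q if s < 0 else 0.0 for s in signs]


def quadratic_form(b, S):
    """Compute b' S b for a square matrix S given as a list of rows."""
    n = len(b)
    return sum(b[i] * S[i][j] * b[j] for i in range(n) for j in range(n))


def shrink_eigenvector(a, S_hat):
    """Steps 3 and 4: try every cut-off g and keep the contrast maximizing b' S_hat b."""
    best_b, best_value = None, -math.inf
    for g in sorted({0.0} | {abs(x) for x in a}):
        # Entries with |a_k| <= g are set to zero, the others keep their sign.
        signs = [(1 if x > 0 else -1) if abs(x) > g else 0 for x in a]
        b = difference_component(signs)
        if b is None:
            continue
        value = quadratic_form(b, S_hat)
        if value > best_value:
            best_b, best_value = b, value
    return best_b, best_value
```

Steps 1, 2 and 5 (building $S$, extracting the first eigenvector and deflating $\hat S$) are left out; they need a linear algebra routine and can wrap around `shrink_eigenvector` in a loop.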

Simple solutions for the columns

1. Let $S = D_J^{1/2} \Pi^\top \Pi D_J^{1/2}$, let $L$ be an empty matrix and let $\hat S = S$.

2. Let $a = (a_1, \ldots, a_J)^\top$ be the first eigenvector of $\hat S$.


Simple Non Symmetrical Correspondence Analysis 213

3. For each cut-off value among $g \in \{0, |a_1|, \ldots, |a_J|\}$, consider the shrunken vector $b(g) = \{b_1(g), \ldots, b_J(g)\}^\top$ with elements $b_k(g) = \operatorname{sign}(a_k)$ if $|a_k| > g$ and $b_k(g) = 0$ otherwise (for $k = 1, \ldots, J$). Update and normalize it such that $\sum_k b_k(g) = 0$ and $\sum_k b_k^2(g) = 1$.

4 Father’s and son’s occupations data

To illustrate the technique of Simple NSCA, we applied it to the well-known Father’s and Son’s Occupations data. This data set (Perrin, 1904) was collected to study whether and how the professional occupation of a man depends on the occupation of his father. The occupations of 1550 men were cross-classified according to father’s and son’s occupation, divided into 14 categories.

The conclusion of the study was that such a dependence existed. Two measures of predictability, Goodman and Kruskal’s $\tau$ (1954) and Light and Margolin’s $C = (n-1)(I-1)\tau$ (1971), have been computed. Note that the $C$-statistic can be used to formally test for association, being asymptotically chi-squared distributed with $(I-1)(J-1)$ degrees of freedom under the hypothesis of no association (Light and Margolin, 1971).

The overall increase in predictability of a man’s occupation when knowing the occupation of his father was equal to 14% ($\tau = 0.14$; $C = 2880.8$; $df = 169$, [...] second one represents 20.7%. Therefore Figure 1 accounts for 64.4% of the total inertia.
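The reported test quantities can be checked against the definitions given earlier, using only the numbers stated in the text ($n = 1550$ men, $I = J = 14$ occupations, $C = 2880.8$):

$$
df = (I-1)(J-1) = 13 \times 13 = 169, \qquad \tau = \frac{C}{(n-1)(I-1)} = \frac{2880.8}{1549 \times 13} = \frac{2880.8}{20137} \approx 0.143 .
$$

Both values agree with the reported $df = 169$ and, after rounding, with $\tau = 0.14$.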

Unfortunately, the two-dimensional NSCA solution (Figure 1) does not give a clear description of the dependence of the two variables, nor of the association between rows and columns. Thus, NSCA is difficult to interpret and a simple solution has been calculated according to Simple NSCA.

From Table 1, one can see that the first component defined by Simple NSCA for the rows contrasts son’s occupation “Art” versus the group of occupations {Army, Divinity, Law, Medicine, Politics & Court and Scholarship & Science}. This simple component explains 42.5% of the variance compared to 43.7% for the optimal solution above. Thus, the first simple row solution is 42.5%/43.7% = 97.4% optimal. One can conclude that the influence of father’s occupation on son’s occupation mainly contrasts these two groups of occupations. The second simple row solution provided by Simple NSCA contrasts son’s occupation “Divinity” versus the group of occupations {Army and Politics & Court}.


Fig. 1. Non Symmetrical Correspondence Analysis (NSCA): Joint plot.

The same table also contains the Simple NSCA solution for the columns. The first simple column solution contrasts father’s occupation “Art” versus “Divinity”, and is 81.9% optimal. The second simple column solution contrasts the groups of father’s occupations {Army, Landownership, Law and Politics & Court} versus {Art and Divinity}, with an optimality value of 90.4%. Similarly, further simple contrasts can be defined for both the rows and the columns (see Table 1 for the first 5 solutions).


Table 1. Simple NSCA solutions for the first five axes.

                  Rows                                  Columns
                  Axis1  Axis2  Axis3  Axis4  Axis5     Axis1  Axis2  Axis3  Axis4  Axis5
Army               0.15  -0.41  -0.44  -0.37  -0.50      0.00  -0.89  -1.20   3.21   0.00
Art               -0.93   0.00   0.00   0.00   0.00     -2.04   1.77  -1.20   0.00   0.00
TCCS               0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.00   0.00   0.00
Crafts             0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Divinity           0.15   0.82  -0.44   0.00   0.00      2.04   1.77  -1.20   0.00   0.00
Agriculture        0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Landownership      0.00   0.00   0.00   0.00   0.00      0.00  -0.89  -1.20   0.00   0.00
Law                0.15   0.00   0.33   0.55  -0.50      0.00  -0.89   0.86  -1.61  -2.65
Literature         0.00   0.00   0.33   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Commerce           0.00   0.00   0.33   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Medicine           0.15   0.00   0.00  -0.37   0.50      0.00   0.00   0.86   0.00   2.65
Navy               0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.00   0.00   0.00
POLCOURT           0.15  -0.41  -0.44   0.55   0.50      0.00  -0.89  -1.20  -1.61   0.00
SCSCIENCE          0.15   0.00   0.33  -0.37   0.00      0.00   0.00   0.86   0.00   0.00

Explained variance (%)
Optimal solution  43.70  64.40  75.30  83.00  89.20     43.70  64.40  75.30  83.00  89.20
Simple solution   42.50  62.20  72.30  79.70  85.70     35.80  58.20  68.50  75.10  80.30
Optimality        97.40  96.60  96.10  96.10  96.10     81.90  90.40  91.00  90.50  90.00

Note: TCCS, POLCOURT and SCSCIENCE stand for “Teacher, Clerk and Civil Servant”, “Politics & Court” and “Scholarship & Science”, respectively.
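The optimality row of Table 1 is the ratio of the cumulative variance explained by the simple solution to that of the optimal solution. The short check below recomputes it from the two rows above it; the variable names are illustrative, and small differences from the printed values are to be expected because the inputs are themselves rounded.

```python
# Cumulative explained variance (%) for axes 1 to 5, copied from Table 1.
optimal = [43.7, 64.4, 75.3, 83.0, 89.2]
simple_rows = [42.5, 62.2, 72.3, 79.7, 85.7]
simple_cols = [35.8, 58.2, 68.5, 75.1, 80.3]


def optimality(simple, reference):
    """Optimality in percent: simple variance divided by optimal variance."""
    return [100.0 * s / r for s, r in zip(simple, reference)]


rows_optimality = optimality(simple_rows, optimal)
cols_optimality = optimality(simple_cols, optimal)
print([round(v, 1) for v in rows_optimality])
print([round(v, 1) for v in cols_optimality])
```

The recomputed ratios stay within about 0.2 percentage points of the tabulated optimality values.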

To better summarize and visualize the relationship between father’s and son’s occupation, it is helpful to plot the solutions for rows and columns for each axis on the same graph (Figure 2). One can see that the first Simple NSCA solution highlights the fact that a son has the tendency to choose the same occupation as his father if this occupation is “Art”, while father’s occupation “Divinity” is linked with a son’s occupation within {Army, Divinity, Law, Medicine, Politics & Court and Scholarship & Science}. Similarly, one can try to interpret the second Simple NSCA solution.

In summary, Simple NSCA provides a clear-cut picture of the situation, the optimality of the first two axes being in this example more than 95% (for the rows) and 90% (for the columns). Thus, the price to pay for simplicity is about 5% (for the rows) and 10% (for the columns), which is not much. In this sense, Simple NSCA may be a worthwhile alternative to NSCA.

5 Conclusions

In general, all PCA-based methods are tuned to condense information in an optimal way. However, they define some abstract scores which often are not meaningful or not well interpretable in practice. This was also the case in our example above for


Fig. 2. Summary of Simple NSCA solutions for the axes 1 and 2.

NSCA. To enhance interpretability, Simple NSCA focuses on simplicity and seeks “optimal simple components”, as illustrated in our example. It provides a clear-cut interpretation of the association between rows and columns, the price to pay for simplicity being relatively low. In this sense, Simple NSCA may be a worthwhile alternative to NSCA. Extensions of this approach to Classical Correspondence Analysis and to ordinal variables are under investigation.


References

D’AMBRA, L. and LAURO, N.C. (1989): Non symmetrical analysis of three-way contingency tables. In: R. Coppi and S. Bolasco (Eds.): Multiway Data Analysis. North Holland, 301–314.

LIGHT, R.J. and MARGOLIN, B.H. (1971): An analysis of variance for categorical data. Journal of the American Statistical Association, 66, 534–544.

GERVINI, D. and ROUSSON, V. (2004): Criteria for evaluating dimension-reducing components for multivariate data. The American Statistician, 58, 72–76.

GOODMAN, L.A. and KRUSKAL, W.H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.

PERRIN, E (1904): On the Contingency Between Occupation in the Case of Fathers and


A Comparative Study on Polyphonic Musical Time

Series Using MCMC Methods

Katrin Sommer and Claus Weihs

Lehrstuhl für Computergestützte Statistik, Universität Dortmund, 44221 Dortmund, Germany
sommer@statistik.uni-dortmund.de

Abstract. A general harmonic model for pitch tracking of polyphonic musical time series will be introduced. Based on a model of Davy and Godsill (2002), the fundamental frequencies of polyphonic sound are estimated simultaneously. To improve these results, a preprocessing step was implemented to build an extended polyphonic model. All methods are applied to real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)).

1 Introduction

The automatic transcription of musical time series data is a wide research domain. There are many methods for the pitch tracking of monophonic sound (e.g. Weihs and Ligges (2006)). Distinguishing the tones of polyphonic sound is more difficult because of the properties of the time series of musical sound.

In this research paper we describe a general harmonic model for polyphonic musical time series data, based on a model of Davy and Godsill (2002). After transforming this model into a hierarchical Bayes model, the fundamental frequencies of this data can be estimated with MCMC methods.

Then we consider a preprocessing step to improve the results. For this, we introduce the design of an alphabet of artificial tones.

After that we apply the polyphonic model to real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)). We demonstrate the building of an alphabet on real audio data and present the results of utilising such an alphabet. Further, we show first results of combining the preprocessing step and the MCMC methods. Finally, the results are discussed and an outlook on future work is given.

2 Polyphonic model

In this section the harmonic polyphonic model will be introduced and its components will be illustrated. The model is based on the model of Davy and Godsill (2002) and has the following structure:


286 Katrin Sommer and Claus Weihs

Fig. 1. Illustration of the modelling with basis functions. Modelling time-variant amplitudes of a real audio signal.

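$$
y_t = \sum_{k=1}^{K} \sum_{h=1}^{H_k} \sum_{i=0}^{I} \phi_{t,i} \left[ a_{k,h,i} \cos(2\pi h f_k / f_s \, t) + b_{k,h,i} \sin(2\pi h f_k / f_s \, t) \right] + \varepsilon_t .
$$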

The number of observations of the audio signal $y_t$ is $T$, $t \in \{0, \ldots, T-1\}$. Each signal is normalized to $[-1, 1]$ since the absolute overall loudness of different recordings is not relevant. The signal $y_t$ is made up of $K$ tones, each composed of $H_k$ partial tones (harmonics). In this paper the number of tones $K$ is assumed to be known. The first partial of the $k$-th tone is the fundamental frequency $f_k$; the other $H_k - 1$ partials are called overtones. Further, $f_s$ is the sampling rate.

To reduce the number of parameters to be estimated, the amplitudes $a_{k,h,t}$ and $b_{k,h,t}$ of the $k$-th tone and the $h$-th partial tone at each timepoint $t$ are modelled with $I + 1$ basis functions. The basis functions $\phi_{t,i}$ are equally spaced Hanning windows with 50% overlap:

$$
\phi_{t,i} := \cos^2\left[ \pi (t - i\Delta) / (2\Delta) \right] \, 1_{[(i-1)\Delta,\,(i+1)\Delta]}(t), \qquad \Delta = (T-1)/I .
$$

So the $a_{k,h,i}$ and $b_{k,h,i}$ are the amplitudes of the $k$-th tone, the $h$-th partial tone and the $i$-th basis function. Finally, $\varepsilon_t$ is the model error.
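To make the structure of the model concrete, the following sketch evaluates its noise-free part for given parameters. It is only an illustration: the function names and the nested-list layout of the amplitudes are choices made here, not taken from the paper. It relies on the property that neighbouring windows with 50% overlap sum to one, so constant amplitudes over all basis functions give back a plain sinusoid.

```python
import math


def hanning_basis(t, i, delta):
    """Basis function phi_{t,i}: squared cosine centred at i*delta, zero outside its window."""
    if (i - 1) * delta <= t <= (i + 1) * delta:
        return math.cos(math.pi * (t - i * delta) / (2 * delta)) ** 2
    return 0.0


def harmonic_signal(T, fs, tones, I):
    """Noise-free model values for t = 0, ..., T-1.

    tones is a list of (f_k, a, b), where a[h][i] and b[h][i] hold the
    amplitudes of partial h+1 and basis function i (hypothetical layout).
    """
    delta = (T - 1) / I
    y = []
    for t in range(T):
        phi = [hanning_basis(t, i, delta) for i in range(I + 1)]
        value = 0.0
        for f_k, a, b in tones:
            for h in range(len(a)):
                angle = 2 * math.pi * (h + 1) * f_k / fs * t
                for i in range(I + 1):
                    value += phi[i] * (a[h][i] * math.cos(angle) + b[h][i] * math.sin(angle))
        y.append(value)
    return y
```

For example, `harmonic_signal(512, 11025, [(262.0, [[1.0] * 5], [[0.0] * 5])], 4)` yields a 262 Hz cosine, because the five windows add up to one at every time point.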

Figure 1 shows the necessity of using basis functions and thus modelling time-variant amplitudes. In the figure the points are the observations of the real signal. The assumption of constant amplitudes over time cannot depict the higher amplitudes at the beginning of the tone (black line). Modelling with time-variant amplitudes (grey line) leads to better results.

The model can be written as a hierarchical Bayes model. The estimation of the parameters results from a stochastic search for the best coefficients in a given region with different prior distributions. The region and the probabilities are specified by distributions. This leads to the implementation of MCMC methods (Gilks et al. (1996)).


Polyphonic Musical Time Series 287

For the sampling of the fundamental frequency $f_k$, variants of the Metropolis-Hastings algorithm are used in which the candidate frequencies are generated in different ways.

In the first variant the candidate for the fundamental frequency is sampled from a uniform distribution over the range of possible frequencies. In the second variant the new candidate is half or double the current fundamental frequency. In the third variant a random walk is used, which allows small changes of the fundamental frequency $f_k$ to obtain a more precise result.

For the determination of the number of partial tones $H_k$ a reversible jump MCMC was implemented. In each iteration of the MCMC computation one of these algorithms is chosen with a distinct probability.
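The three ways of generating a candidate frequency can be sketched as below. This shows only the proposal step of a Metropolis-Hastings update, not the acceptance step or the reversible jump move for $H_k$. The selection probabilities, the Gaussian form of the random walk and its step size are assumptions made for illustration; the paper does not state them here.

```python
import random


def propose_frequency(f_current, f_min, f_max, rng,
                      weights=(0.2, 0.2, 0.6), step_sd=1.0):
    """Draw a candidate fundamental frequency with one of three proposal types.

    weights are hypothetical probabilities for the (uniform, octave, walk)
    proposals; step_sd is a hypothetical random walk standard deviation in Hz.
    """
    kind = rng.choices(["uniform", "octave", "walk"], weights=weights)[0]
    if kind == "uniform":
        # Variant 1: sample anywhere in the range of possible frequencies.
        candidate = rng.uniform(f_min, f_max)
    elif kind == "octave":
        # Variant 2: jump to half or double the current frequency.
        candidate = f_current * rng.choice([0.5, 2.0])
    else:
        # Variant 3: small random walk step around the current frequency.
        candidate = f_current + rng.gauss(0.0, step_sd)
    return kind, candidate
```

A caller would typically reject candidates outside `[f_min, f_max]` before evaluating the acceptance ratio. The octave move is a natural guard against the half and double frequency errors that harmonic models are prone to.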

The amplitude parameters $a_{k,h,i}$ and $b_{k,h,i}$ are computed conditional on the fundamental frequency $f_k$ and the number of partial tones $H_k$.

The posterior distributions are not generated in full because of the computational burden. Instead we use a stopping criterion that stops the iterations once the slope of the model error is no longer significant (Sommer and Weihs (2006)).

3 Extended polyphonic model

An extended polyphonic model, which adds a preprocessing step to the MCMC algorithms, will be established in this section. The results of this step could serve as starting values for the MCMC algorithm in order to improve the results.

For this purpose we constructed an alphabet of artificial tones. These artificial tones are compared with the audio data to be analysed. The artificial tones are composed by evaluating the periodograms of seven time intervals with 512 observations each of a real audio signal, with 50% overlap. So a time interval of 2048 observations is regarded. At a sampling rate of 11 025 Hz this corresponds to a time interval of 0.186 seconds. These seven periodograms are averaged to a mean periodogram. For better comparability, all values in this periodogram which are smaller than one percent of the maximum peak are set to zero. All artificial tones together form the alphabet.

In Figure 2 (upper part) a periodogram of a c4 (262 Hz) played by an electric guitar can be seen. The lower part of Figure 2 shows the small values of the periodogram. The horizontal line reflects the value of one percent of the maximum value of the periodogram. All values below this line are set to zero in the alphabet.
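The construction of one artificial tone can be sketched as follows. The sketch assumes a raw periodogram scaled by $1/n$ without any taper window, which the text does not specify, and uses a naive discrete Fourier transform to stay within the standard library. The function names are illustrative.

```python
import cmath
import math


def periodogram(segment):
    """Raw periodogram |DFT|^2 / n at the frequencies k/n, k = 0, ..., n/2."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def mean_periodogram(signal, seg_len=512, n_seg=7, threshold=0.01):
    """Average the periodograms of n_seg half-overlapping segments, then
    set every value below threshold times the maximum peak to zero."""
    hop = seg_len // 2
    needed = seg_len + (n_seg - 1) * hop  # 2048 for the default settings
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} observations")
    spectra = [periodogram(signal[s * hop:s * hop + seg_len]) for s in range(n_seg)]
    mean = [sum(values) / n_seg for values in zip(*spectra)]
    cutoff = threshold * max(mean)
    return [v if v >= cutoff else 0.0 for v in mean]
```

With the defaults the seven segments cover $512 + 6 \times 256 = 2048$ observations, i.e. $2048 / 11025 \approx 0.186$ seconds, in line with the figures given above. In practice an FFT routine would replace the naive transform.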

To determine the correct notes, every combination of two artificial tones of the alphabet is matched to the periodogram of the real audio signal. The modified periodograms of the two artificial tones are summed up to one periodogram. This combined periodogram is compared with the periodogram of the audio signal. The notes corresponding to the two artificial tones which cause minimal error are considered as estimates for the true notes. Finally, voting over ten time intervals leads to the estimation of the fundamental frequencies.
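The matching and voting described above can be sketched as below. Two points are assumptions made for this illustration: the error is taken to be the sum of squared differences between periodograms that are already on a common scale, and a tone may be paired with itself. The function names and the dictionary layout of the alphabet are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def best_pair(observed, alphabet):
    """Return the pair of alphabet labels whose summed periodograms fit best.

    alphabet maps a label (e.g. note and instrument) to its thresholded
    mean periodogram; observed is the periodogram of the audio signal.
    """
    best, best_error = None, float("inf")
    for first, second in combinations_with_replacement(sorted(alphabet), 2):
        combined = [u + v for u, v in zip(alphabet[first], alphabet[second])]
        error = sum((o - c) ** 2 for o, c in zip(observed, combined))
        if error < best_error:
            best, best_error = (first, second), error
    return best


def vote(pairs, n_notes=2):
    """Return the n_notes labels that occur in the most time intervals."""
    counts = Counter()
    for pair in pairs:
        counts.update(set(pair))  # count each label once per interval
    return [label for label, _ in counts.most_common(n_notes)]
```

A typical use would call `best_pair` once per time interval and pass the ten resulting pairs to `vote`.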


Fig. 2. Periodogram of note c4 played with an electric guitar. Original (upper part) and zoomed in with cut-off line (lower part).

4 Results

In this section, results of estimating the fundamental frequencies of real audio data are presented. First, the data used in our studies are introduced. Then first results are shown. Further, the construction of an alphabet is reconsidered, and the results based on this alphabet are presented. Finally, additional results are shown.

4.1 Data

The data used for our monophonic and polyphonic studies are real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)). We chose 5 instruments (electric guitar, piano, violin, flute and trumpet) each with 5 notes (262,


Table 1. 1 if both notes were correctly identified, 0 otherwise. The left-hand table requires the exact note to be estimated; the right-hand table also counts octaves of the note as correct.

notes    instrument: flu  guit  pian  trum  viol

a brass instrument

For polyphonic data we superimposed the oscillations of two tones. The first tone was a c4 (262 Hz) played by the piano. This tone was combined with each instrument–tone combination we used. So we had 25 datasets, each normalized to $[-1, 1]$. The pitches of the tones were tracked over ten time intervals of $T = 512$ observations with 50% overlap at a sampling rate of 11 025 Hz. The number of observations in one time interval is a tradeoff between the computational burden and the quality of the estimate. The estimate of the notes is the result of voting over the ten time intervals: the estimated notes are the two notes which occur most often in the ten time intervals.

4.2 First results with polyphonic model

The first step in our analysis was to consider how well the model works and whether the pitch of a tone is estimated exactly. For this purpose we made a first study with monophonic data. The results of the study with monophonic time series data were very promising. In most cases the correct note was estimated and the deviations from the correct fundamental frequencies were minor (Sommer and Weihs (2006)).

The results of the estimation of polyphonic time series data are not as promising as the results with monophonic time series data. There are many notes which are not estimated correctly. The left side of Table 1 shows 1 if both notes were estimated correctly and 0 otherwise. In 15 of the 25 experiments both notes were estimated correctly. Counting octaves of the notes as correct increases the number of correct estimates to 21 (see the right-hand side of Table 1). It can be seen that all notes of the combination c4–g4 are estimated incorrectly, but they are correct when octaves of the right notes are counted as correct (Sommer and Weihs (2007)).

Analysing the data over 20, 30 and 50 time intervals results in the same outcomes. So it seems to be adequate to examine 10 time intervals. In longer interval series, no additional correctly estimated notes could be determined.


Table 2. 1 if both notes AND instruments are correctly recognized after voting, and 1* if both notes are estimated correctly, but not the instrument (left), including octaves of the correct notes (right). In 22 (left) and 23 (right) cases both notes are estimated correctly; in 18 cases the correct instrument is recognized for both tones.

notes    instrument: flu  guit  pian  trum  viol
c4–c5                1    1     1     1     1

4.3 Results with extended polyphonic model

In a first study with an alphabet of artificial tones we used 30 notes from g3 (196 Hz) to c6 (1 047 Hz) of the same five instruments as for the studies in Section 4.2. The choice of this range is restricted by the availability of the data of the McGill University Master Samples. The mean periodogram is computed from seven periodograms, each with $T = 512$ observations, with 50% overlap at a sampling rate of 11 025 Hz. The first 1000 observations of a note were not considered for this periodogram in order to omit the attack of an instrument. Overall there are 150 artificial notes in the alphabet.
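The size of the alphabet can be checked with the usual equal temperament formula. The sketch assumes standard tuning with a4 = 440 Hz and MIDI note numbering (c4 = 60), neither of which is stated in the text; the names are illustrative.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal temperament frequency of a MIDI note number (a4 = 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)


G3, C4, C6 = 55, 60, 84                    # MIDI numbers of g3, c4 and c6
alphabet_notes = list(range(G3, C6 + 1))   # every semitone from g3 to c6
n_instruments = 5

print(len(alphabet_notes), len(alphabet_notes) * n_instruments)
print(round(midi_to_hz(G3)), round(midi_to_hz(C4)), round(midi_to_hz(C6)))
```

Under these assumptions the range g3 to c6 contains 30 semitones, which gives 150 artificial notes for five instruments, and the rounded frequencies match the values quoted in the text (196 Hz, 262 Hz and 1 047 Hz).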

With this alphabet, 11 325 pairwise comparisons of two artificial tones with the audio signal have to be computed. The results of the estimates of the same 25 note combinations used in the previous study can be seen in Table 2. The left-hand side of the table shows that in 22 of 25 cases the fundamental frequency of both notes is estimated correctly. If octaves of the correct notes are counted as correct, this number increases to 23 (right-hand side of Table 2).
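The number of comparisons is consistent with counting all unordered pairs of the 150 artificial tones, including the pairing of a tone with itself (an interpretation inferred from the count, not stated explicitly in the text):

$$
\binom{150}{2} + 150 = 11\,175 + 150 = 11\,325 .
$$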

Further, the entries in Table 2 are annotated with a star if the instruments are not recognized correctly. This means that only in 18 of 22 cases (18 of 23) the instruments of both notes are identified correctly. Moreover, it can be seen that the cases where the notes are estimated incorrectly occur only in the first and last rows of the tables. So the correct estimation of the notes seems to be a problem if both notes are the same or one is the octave of the other.

4.4 Further results

Using these estimated notes as starting values for the MCMC algorithm in order to estimate the fundamental frequencies more precisely does not lead to an improvement of the results of the preprocessing. On the contrary, the results are comparable to the results without this preprocessing step. In most of the cases the estimated notes are the octave of the correct notes. Often, the MCMC algorithm leads to an estimate $\hat H = 1$ of the number of partial tones. This often meant that only the octave of
