
Multi-label classification of music by emotion

Konstantinos Trohidis1*, Grigorios Tsoumakas2, George Kalliris1 and Ioannis Vlahavas2

Abstract

This work studies the task of automatic emotion detection in music. Music may evoke more than one different emotion at the same time. Single-label classification and regression cannot model this multiplicity. Therefore, this work focuses on multi-label classification approaches, where a piece of music may simultaneously belong to more than one class. Seven algorithms are experimentally compared for this task. Furthermore, the predictive power of several audio features is evaluated using a new multi-label feature selection method. Experiments are conducted on a set of 593 songs with six clusters of emotions based on the Tellegen-Watson-Clark model of affect. Results show that multi-label modeling is successful and provide interesting insights into the predictive quality of the algorithms and features.

Keywords: multi-label classification, feature selection, music information retrieval

1 Introduction

Humans, by nature, are emotionally affected by music. Who can argue against the famous quote of the German philosopher Friedrich Nietzsche, who said that 'without music, life would be a mistake'? As music databases grow in size and number, the retrieval of music by emotion is becoming an important task for various applications, such as song selection in mobile devices [1], music recommendation systems [2], TV and radio programs [a], and music therapy.

Past approaches towards automated detection of emotions in music modeled the learning problem as a single-label classification [3,4], regression [5], or multi-label classification [6-9] task. Music may evoke more than one different emotion at the same time. Single-label classification and regression cannot model this multiplicity. Therefore, the focus of this article is on multi-label classification methods. The primary aim of this article is twofold:

• The experimental evaluation of seven multi-label classification algorithms using a variety of evaluation measures. Previous work experimented with just a single algorithm. We employ some recent developments in multi-label classification and show which algorithms perform better for musical data.

• The creation of a new multi-label dataset with 72 music features for 593 songs categorized into one or more out of 6 classes of emotions. The dataset is released to the public [b], in order to allow comparative experiments by other researchers. Publicly available multi-label music datasets are rare, hindering the progress of research in this area.

The remainder of this article is structured as follows. Section 2 reviews related work and Sections 3 and 4 provide background material on multi-label classification and emotion modeling, respectively. Section 5 presents the details of the dataset used in this work. Section 6 presents experimental results comparing the seven multi-label classification algorithms. Finally, conclusions are drawn and future work is proposed in Section 7.

2 Related work

This section discusses past efforts on emotion detection in music, mainly in terms of the emotion model, the extracted features, and the kind of modeling of the learning problem: (a) single-label classification, (b) regression, and (c) multi-label classification.

2.1 Single-label classification

The four main emotion classes of Thayer's model were used as the emotion model in [3]. Three different feature sets were adopted for music representation, namely intensity, timbre, and rhythm. Gaussian mixture models were used to model each of the four classes. An interesting contribution of this work was a hierarchical classification process, which first classifies a song into high/low energy (the vertical axis of Thayer's model), and then into one of the two high/low stress classes.

The same emotion classes were used in [4]. The authors experimented with two fuzzy classifiers, using the 15 features proposed in [10]. They also experimented with a feature selection method, which improved the overall accuracy (around 78%), but they do not mention which features were selected.

The classification of songs into a single cluster of emotions was a new category in the 2007 MIREX (Music Information Retrieval Evaluation eXchange) competition [c]. The top two submissions of the competition were based on support vector machines (SVM). The model of mood that was used in the competition included five clusters of moods proposed in [11], which was compiled based on a statistical analysis of the relationship of mood with genre, artist, and usage metadata. Among the many interesting conclusions of the competition was the difficulty of discerning between certain clusters of moods, due to their semantic overlap. A multi-label classification approach could overcome this problem, by allowing the specification of multiple finer-grain emotion classes.

2.2 Regression

Emotion recognition is modeled as a regression task in [5]. Volunteers rated a training collection of songs in terms of arousal and valence on an ordinal scale of 11 values from -1 to 1 with a 0.2 step. The authors then trained regression models using a variety of algorithms (with SVMs having the best performance) and a variety of extracted features. Finally, a user could retrieve a song by selecting a point in the two-dimensional arousal and valence mood plane of Thayer.

Furthermore, the authors used a feature selection algorithm, leading to an increase of the predictive performance. However, it is not clear whether the authors ran the feature selection process on all input data or within each fold of the 10-fold cross-validation used to evaluate the regressors. If the former is true, then their results may be optimistic, as the feature selection algorithm had access to the test data. A similar pitfall of feature selection in music classification is discussed in [12].

2.3 Multi-label classification

Both regression and single-label classification methods suffer from the same problem: two different (clusters of) emotions cannot be simultaneously predicted. Multi-label classification allows for a natural modeling of this issue.

Li and Ogihara [6] used two emotion models: (a) the ten adjective clusters of Farnsworth (extended with three clusters of adjectives proposed by the labeler) and (b) a further clustering of those into six super-clusters. They only experimented with the BR multi-label classification method using SVMs as the underlying base single-label classifier. In terms of features, they used Marsyas [13] to extract 30 features related to the timbral texture, rhythm, and pitch. The predictive performance was low for the clusters and better for the super-clusters. In addition, they found evidence that genre is correlated with emotions.

In an extension of their work, Li and Ogihara [7] considered three bipolar adjective pairs: Cheerful vs Depressing, Relaxing vs Exciting, and Comforting vs Disturbing. Each track was initially labeled using a scale ranging from -4 to +4 by two subjects and then converted to a binary (positive/negative) label. The learning approach was the same as in [6]. The feature set was expanded with a new extraction method, called Daubechies wavelet coefficient histograms. The authors report an accuracy of around 60%.

The same 13 clusters as in [6] were used in [8], where the authors modified the k Nearest Neighbors algorithm in order to handle multi-label data directly. They found that the predictive performance was low, too. Recently, Pachet and Roy [14] used stacked binary relevance (2BR) for the multi-label classification of music samples into a large number of labels (632).

Compared to our work, none of the aforementioned approaches discusses feature selection from multi-label data, compares different multi-label classification algorithms or uses a variety of multi-label evaluation measures in its empirical study.

3 Multi-label classification

Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label λ from a set of disjoint labels L, |L| > 1. In multi-label classification, the examples are associated with a set of labels Y ⊆ L.
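
As a minimal illustration of this representation (not part of the original paper), the label sets of a few hypothetical songs can be encoded as a binary indicator matrix, which is the input format expected by most multi-label learning tools:

```python
import numpy as np

# The six emotion clusters used later in the paper (Table 1).
labels = ["amazed-surprised", "happy-pleased", "relaxing-calm",
          "quiet-still", "sad-lonely", "angry-fearful"]

# Label sets Y_i for three hypothetical songs.
label_sets = [
    {"happy-pleased", "amazed-surprised"},   # song 1
    {"relaxing-calm"},                       # song 2
    {"sad-lonely", "quiet-still"},           # song 3
]

# Binary indicator matrix: Y[i, j] = 1 iff label j is in the label set of song i.
Y = np.array([[1 if lab in s else 0 for lab in labels] for s in label_sets])
print(Y)
```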

3.1 Learning algorithms

Multi-label classification algorithms can be categorized into two different groups [15]: (i) problem transformation methods, and (ii) algorithm adaptation methods. The first group includes methods that are algorithm independent. They transform the multi-label classification task into one or more single-label classification, regression, or ranking tasks. The second group includes methods that extend specific learning algorithms in order to handle multi-label data directly.

We next present the methods that are used in the experimental part of this work. For the formal description of these methods, we will use L = {λ_j : j = 1, ..., M} to denote the finite set of labels in a multi-label learning task and D = {(x_i, Y_i) : i = 1, ..., N} to denote a set of multi-label training examples, where x_i is the feature vector and Y_i ⊆ L is the set of labels of the i-th example.

Binary relevance (BR) is a popular problem transformation method that learns M binary classifiers, one for each different label in L. It transforms the original dataset into M datasets D_{λ_j}, j = 1, ..., M, that contain all examples of the original dataset, labeled positively if the label set of the original example contained λ_j and negatively otherwise. For the classification of a new instance, BR outputs the union of the labels λ_j that are positively predicted by the M classifiers.
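
The following sketch (mine, not from the paper) illustrates the BR transformation with scikit-learn; the linear SVM base learner mirrors the experimental setup described later in Section 6, but treat it as an assumption:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_br(X, Y):
    """Binary relevance: one binary classifier per label.
    X: (N, d) feature matrix; Y: (N, M) binary indicator matrix."""
    models = []
    for j in range(Y.shape[1]):
        clf = LinearSVC(C=1.0)       # base single-label classifier
        clf.fit(X, Y[:, j])          # label j against the rest of the dataset
        models.append(clf)
    return models

def predict_br(models, X):
    """Union of positively predicted labels, as a binary indicator matrix."""
    return np.column_stack([m.predict(X) for m in models])
```

Each column of Y yields an independent binary problem; label correlations are ignored, which is exactly the criticism addressed by 2BR below.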

BR is criticized because it does not take into account label correlations and may fail to accurately predict label combinations or rank labels according to relevance for a new instance. One approach that has been proposed in the past in order to deal with this problem of BR works generally as follows: it learns a second (or meta) level of binary models (one for each label) that consider as input the output of all first (or base) level binary models. It will be called 2BR, as it uses the BR method twice, in two consecutive levels. 2BR follows the paradigm of stacked generalization [16], a method for the fusion of heterogeneous classifiers, widely known as stacking. One of the earliest accounts of 2BR is [17], where 2BR was part of the SVM-HF method, an SVM-based algorithm for training the binary models of both levels. The abstraction of SVM-HF irrespective of SVMs and its relation to stacking was pointed out in [18]. 2BR was very recently applied to the analysis of musical titles [14].
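
A rough sketch of the 2BR idea follows (my own illustration, not the authors' implementation). Meta-features are built from out-of-fold base-level predictions, one common way to realize stacking that matches the 10-fold meta-data generation mentioned in Section 6; whether the meta level also sees the original features is a design choice assumed here:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict

def train_2br(X, Y, n_folds=10):
    """Stacked binary relevance: base-level BR plus a meta-level BR over base outputs."""
    M = Y.shape[1]
    # Out-of-fold base predictions serve as meta-features (avoids label leakage).
    meta_features = np.column_stack([
        cross_val_predict(LinearSVC(C=1.0), X, Y[:, j], cv=n_folds)
        for j in range(M)
    ])
    base = [LinearSVC(C=1.0).fit(X, Y[:, j]) for j in range(M)]
    X_meta = np.hstack([X, meta_features])
    meta = [LinearSVC(C=1.0).fit(X_meta, Y[:, j]) for j in range(M)]
    return base, meta

def predict_2br(base, meta, X):
    base_out = np.column_stack([m.predict(X) for m in base])
    X_meta = np.hstack([X, base_out])
    return np.column_stack([m.predict(X_meta) for m in meta])
```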

Label powerset (LP) is a simple but effective problem transformation method that works as follows: it considers each unique set of labels that exists in a multi-label training set as one of the classes of a new single-label classification task. Given a new instance, the single-label classifier of LP outputs the most probable class, which is actually a set of labels.

The computational complexity of LP with respect to M depends on the complexity of the base classifier with respect to the number of classes, which is equal to the number of distinct label sets in the training set. This number is upper bounded by min(N, 2^M) and, despite typically being much smaller, it still poses an important complexity problem, especially for large values of N and M. Furthermore, the large number of classes, many of which are associated with very few examples, makes the learning process difficult as well.
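
A minimal LP sketch (my illustration; the linear SVM base learner is again an assumption). Each distinct label set in the training data becomes one class of a multi-class problem:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_lp(X, Y):
    """Label powerset: one multi-class problem over the distinct label sets."""
    keys = [tuple(row) for row in Y]               # label set of each example
    classes = sorted(set(keys))                    # distinct label sets
    class_id = {k: i for i, k in enumerate(classes)}
    y_single = np.array([class_id[k] for k in keys])
    clf = LinearSVC(C=1.0).fit(X, y_single)        # single-label multi-class task
    return clf, classes

def predict_lp(clf, classes, X):
    """Map predicted class ids back to binary label vectors."""
    return np.array([classes[c] for c in clf.predict(X)])
```

Note that LinearSVC resolves the resulting multi-class task one-vs-rest, whereas the paper reports using a one-against-one strategy for LP and RAkEL.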

The random k-labelsets (RAkEL) method [19] constructs an ensemble of LP classifiers. Each LP classifier is trained using a different small random subset of the set of labels. This way RAkEL manages to take label correlations into account, while avoiding LP's problems. A ranking of the labels is produced by averaging the zero-one predictions of each model per considered label. Thresholding is then used to produce a classification as well.
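
A schematic RAkEL sketch (mine), reusing the LP helpers above; the subset size k = 3, 2M models, and 0.5 threshold mirror the default parameters reported in Section 6:

```python
import numpy as np

def train_rakel(X, Y, k=3, n_models=None, seed=0):
    """Ensemble of LP classifiers, each trained on a random k-subset of the labels."""
    rng = np.random.default_rng(seed)
    M = Y.shape[1]
    n_models = n_models if n_models is not None else 2 * M
    ensemble = []
    for _ in range(n_models):
        subset = rng.choice(M, size=k, replace=False)   # a random k-labelset
        clf, classes = train_lp(X, Y[:, subset])         # LP on the label subset
        ensemble.append((subset, clf, classes))
    return ensemble

def predict_rakel(ensemble, X, n_labels, threshold=0.5):
    """Average the zero-one votes per label, then threshold the averages."""
    votes = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for subset, clf, classes in ensemble:
        votes[:, subset] += predict_lp(clf, classes, X)
        counts[subset] += 1
    scores = votes / np.maximum(counts, 1)               # per-label ranking scores
    return (scores >= threshold).astype(int), scores
```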

Ranking by pairwise comparison (RPC) [20] transforms the multi-label dataset into M(M−1)/2 binary label datasets, one for each pair of labels (λ_i, λ_j), 1 ≤ i < j ≤ M. Each dataset contains those examples of D that are annotated by at least one of the two corresponding labels, but not both. A binary classifier that learns to discriminate between the two labels is trained from each of these datasets. Given a new instance, all binary classifiers are invoked, and a ranking is obtained by counting the votes received by each label.

Calibrated label ranking (CLR) [21] extends RPC by introducing an additional virtual label, which acts as a natural breaking point of the ranking into relevant and irrelevant sets of labels. This way, CLR manages to perform multi-label classification.
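
To make the pairwise transformation concrete, here is a compact RPC voting sketch (my own; it omits CLR's calibration with the virtual label, which would add one more comparison per label):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def train_rpc(X, Y):
    """One binary model per label pair (i, j), trained only on examples
    annotated with exactly one of the two labels."""
    models = {}
    for i, j in combinations(range(Y.shape[1]), 2):
        mask = Y[:, i] != Y[:, j]                  # one label present, not both
        if mask.sum() > 0 and len(set(Y[mask, i])) == 2:
            models[(i, j)] = LinearSVC(C=1.0).fit(X[mask], Y[mask, i])
    return models

def rank_rpc(models, X, n_labels):
    """Rank labels by counting the pairwise votes."""
    votes = np.zeros((X.shape[0], n_labels))
    for (i, j), clf in models.items():
        pred = clf.predict(X)                      # 1 -> vote for label i, 0 -> label j
        votes[:, i] += pred
        votes[:, j] += 1 - pred
    return votes                                    # higher vote count = more relevant
```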

Multi-label back-propagation (BP-MLL) [22] is an adaptation of the popular back-propagation algorithm for multi-label learning. The main modification to the algorithm is the introduction of a new error function that takes multiple labels into account.

Multi-label k-nearest neighbor (ML-kNN) [23] extends the popular k nearest neighbors (kNN) lazy learning algorithm using a Bayesian approach. It uses the maximum a posteriori principle in order to determine the label set of the test instance, based on prior and posterior probabilities for the frequency of each label within the k nearest neighbors.

3.2 Evaluation measures

Multi-label classification requires different evaluation measures than traditional single-label classification. A taxonomy of multi-label classification evaluation measures is given in [19], which considers two main categories: example-based and label-based measures. A third category of measures, which is not directly related to multi-label classification but is often used in the literature, is ranking-based measures, which are nicely presented in [23].
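
Most of the measures reported in Section 6 have counterparts in scikit-learn; the mapping below is my own sketch rather than the Mulan implementation actually used in the paper, and some definitions (e.g., coverage) differ slightly between libraries:

```python
from sklearn.metrics import (hamming_loss, accuracy_score, f1_score,
                             roc_auc_score, coverage_error,
                             label_ranking_loss,
                             label_ranking_average_precision_score)

def evaluate_multilabel(Y_true, Y_pred, Y_scores):
    """Y_true, Y_pred: binary indicator matrices; Y_scores: real-valued relevance scores."""
    return {
        # example-based measures
        "hamming_loss": hamming_loss(Y_true, Y_pred),
        "subset_accuracy": accuracy_score(Y_true, Y_pred),  # exact label-set match
        # label-based measures
        "micro_f1": f1_score(Y_true, Y_pred, average="micro"),
        "macro_f1": f1_score(Y_true, Y_pred, average="macro"),
        "micro_auc": roc_auc_score(Y_true, Y_scores, average="micro"),
        # ranking-based measures
        "coverage": coverage_error(Y_true, Y_scores),
        "ranking_loss": label_ranking_loss(Y_true, Y_scores),
        "avg_precision": label_ranking_average_precision_score(Y_true, Y_scores),
    }
```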

4 Emotions and music

4.1 Emotional models

Emotions that are experienced and perceived while listening to music are somewhat different from those induced in everyday life. Many studies indicate the important distinction between one's perception of the emotion(s) expressed by music and the emotion(s) induced by music. Studies of the distinctions between perception and induction of emotion have demonstrated that both can be subject not only to the social context of the listening experience, but also to personal motivation [24]. There are different approaches as to how emotion can be conceptualized and described. The main approaches that exist in the literature are the categorical, the dimensional, and the prototype approach [25,26].

According to the categorical approach, emotions are conceptualized as discrete unique entities. According to several discrete emotion theories, there is a certain basic number of emotion categories from which all the emotion states are derived, such as happiness, sadness, anger, fear, and disgust [27-32]. Basic emotions are characterized by features having distinct functions, are found in all cultures, are associated with distinct physiological patterns, are experienced as unique feeling states, and appear early in the development of humans [27,29-32]. In studies investigating music and emotion, the categorical model of emotions has been modified to better represent the emotions induced by music. Emotions such as disgust are often replaced with the emotion of tenderness, which is more suitable in the context of music.

While the categorical approach focuses on the distinct characteristics that distinguish the emotions from each other, in the dimensional approach emotions are expressed on a two-dimensional system according to two axes such as valence and arousal. This type of model was first proposed by Izard [29,30] and later modified by Wundt [33].

The dimensional approach includes Russell's [34] circumplex model of affect, where all affective states arise from two independent systems. One is related to arousal (activation-deactivation) and the other is related to valence (pleasure-displeasure), and emotions can be perceived as varying degrees of arousal and valence. Thayer [35] suggested that the two dimensions of affect are represented by two arousal dimensions, tension and energetic arousal. The dimensional models have been criticized in the past for the lack of differentiation of neighboring emotions in the valence and arousal dimensions, such as anger and fear [36].

In our study, the Tellegen-Watson-Clark model was employed. This model (depicted in Figure 1) extends previous dimensional models, emphasizing the value of a hierarchical perspective by integrating existing models of emotional expressivity.

It analyses a three-level hierarchy incorporating at the highest level a general bipolar happiness vs unhappiness dimension, an independent positive affect (PA) versus negative affect (NA) dimension at the second-order level below it, and discrete expressivity factors of joy, sadness, hostility, guilt/shame, and fear emotions at the base. Similarly, a three-level hierarchical model of affect links the basic factors of affect at different levels of abstraction and integrates previous models into a single scheme. The key to this hierarchical structure is the recognition that the general bipolar factor of happiness and the independent dimensions of PA and NA are better viewed as different levels of abstraction within a hierarchical model, rather than as competing models at the same level of abstraction. At the highest level of this model, the general bipolar factor of happiness accounts for the tendency of PA and NA to be moderately negatively correlated. Therefore, the hierarchical model of affect accounts for both the bipolarity of pleasantness-unpleasantness and the independence of PA and NA, effectively resolving a debate that occupied the literature for decades.

Over the years, a number of different dimensions have been proposed. Wundt [33] proposed a three-dimensional scheme with the three dimensions of pleasure-displeasure, arousal-calmness, and tension-relaxation. Schlosberg [37] proposed a three-dimensional model with three main dimensions expressing arousal, valence, and control.

[Figure 1. The Tellegen-Watson-Clark model of mood (figure reproduced from [51]): mood terms such as amazed/surprised, angry/distressed/fearful, sad/discouraged, sleepy/tired, quiet/still, calm/relaxed, and happy/joyful/delighted arranged around a pleasantness-unpleasantness axis and high/low positive affect (PA) and negative affect (NA) axes.]


A similar model was proposed by Mehrabian [38], who tried to define a three-dimensional model with three basic principles related to pleasure, arousal, and dominance.

Finally, the prototype approach is based on the fact that language and knowledge structures are associated with how people conceptualize information [39]. The prototype approach effectively combines the categorical and dimensional approaches, providing the individual contents of emotions and the hierarchical relationships among them.

5 Dataset

The dataset used for this work consists of 100 songs from each of the following 7 different genres: Classical, Reggae, Rock, Pop, Hip-Hop, Techno, and Jazz. The collection was created from 233 musical albums, choosing three songs from each album. From each song, a period of 30 s after the initial 30 s was extracted.

The resulting sound clips were stored and converted into wave files with a 22,050 Hz sampling rate, 16 bits per sample, and a single (mono) channel. The following subsections present the features that were extracted from each wave file and the emotion labeling process.

5.1 Feature extraction

For the feature extraction process, the Marsyas tool [13] was used, which is a free software framework. It is modular, fast, and flexible for rapid development and evaluation of computer audition applications, and it has been commonly used for music emotion classification and MIR tasks.

The extracted features fall into two categories: rhythmic and timbre. We selected these categories of temporal and spectral features due to their high correlation with the valence and arousal dimensions of emotion. For example, songs with fast tempo are often perceived as having high arousal. A fluent, flowing rhythm is usually associated with positive valence, whereas a firm rhythm is associated with negative valence. Similarly, high arousal often correlates with a bright timbre and low arousal with a soft timbre.

(1) Rhythmic features: The rhythmic features were derived by extracting periodic changes from a beat histogram. An algorithm that identifies peaks using autocorrelation was implemented. We selected the two highest peaks and computed their amplitudes, their BPMs (beats per minute), and the high-to-low ratio of their BPMs. In addition, three features were calculated by summing the histogram bins between 40 and 90, 90 and 140, and 140 and 250 BPM, respectively. The whole process led to a total of eight rhythmic features.
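
As an illustration (mine, not the Marsyas code used by the authors), the eight rhythmic features can be computed from an already-extracted beat histogram roughly as follows; the simplistic peak picking is an assumption:

```python
import numpy as np

def rhythmic_features(beat_hist, bpm_axis):
    """Eight rhythmic features from a beat histogram.
    beat_hist: histogram amplitudes; bpm_axis: BPM value of each bin (assumed > 0)."""
    order = np.argsort(beat_hist)[::-1]         # bins sorted by amplitude
    p1, p2 = order[0], order[1]                 # two highest peaks (simplified picking)
    amp1, amp2 = beat_hist[p1], beat_hist[p2]
    bpm1, bpm2 = bpm_axis[p1], bpm_axis[p2]
    ratio = max(bpm1, bpm2) / min(bpm1, bpm2)   # high-to-low ratio of the peak BPMs
    sums = [beat_hist[(bpm_axis >= lo) & (bpm_axis < hi)].sum()
            for lo, hi in [(40, 90), (90, 140), (140, 250)]]
    return np.array([amp1, amp2, bpm1, bpm2, ratio] + sums)   # 5 + 3 = 8 features
```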

(2) Timbre features: Mel frequency cepstral coefficients (MFCCs) are used for speech recognition and music modeling [40]. To derive the MFCC features, the signal was divided into frames and the amplitude spectrum was calculated for each frame. Next, its logarithm was taken and converted to the Mel scale. Finally, the discrete cosine transform was applied. We selected the first 13 MFCCs.

Another set of three features related to timbre texture was extracted from the short-term Fourier transform (FFT): spectral centroid, spectral rolloff, and spectral flux. These features model spectral properties of the signal such as the amplitude spectrum distribution, brightness, and spectral change.
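
For readers who want to reproduce a comparable per-frame feature set, a rough equivalent can be computed with the librosa library (my substitution; the authors used Marsyas, and spectral flux is approximated here by librosa's onset strength envelope):

```python
import numpy as np
import librosa

def timbre_frames(path):
    """Per-frame timbre descriptors: 13 MFCCs plus 3 spectral features."""
    y, sr = librosa.load(path, sr=22050, mono=True)              # matches the clip format
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, n_frames)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)      # (1, n_frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)        # (1, n_frames)
    flux = librosa.onset.onset_strength(y=y, sr=sr)[np.newaxis]   # flux-like measure
    n = min(mfcc.shape[1], flux.shape[1])                         # align frame counts
    return np.vstack([mfcc[:, :n], centroid[:, :n], rolloff[:, :n], flux[:, :n]])
```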

For each of the 16 aforementioned features (13 MFCCs, 3 FFT), we calculated the mean, standard deviation (std), mean standard deviation (mean std), and standard deviation of standard deviation (std std) over all frames. This led to a total of 64 timbre features which, together with the 8 rhythmic features, gives the 72 features of the dataset.
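
The aggregation over frames can be sketched as follows; interpreting "mean std" and "std std" as statistics of a running standard deviation over a sliding texture window is my assumption, not something the paper specifies:

```python
import numpy as np

def aggregate(frames, win=40):
    """frames: (16, n_frames) per-frame features -> 64 clip-level features."""
    mean = frames.mean(axis=1)
    std = frames.std(axis=1)
    # Running std over a sliding window of `win` frames (assumed texture window).
    running = np.stack([frames[:, i:i + win].std(axis=1)
                        for i in range(0, frames.shape[1] - win + 1)], axis=1)
    # mean, std, mean of running std, std of running std: 16 * 4 = 64 values.
    return np.concatenate([mean, std, running.mean(axis=1), running.std(axis=1)])
```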

5.2 Emotion labeling

The Tellegen-Watson-Clark model was employed for labeling the data with emotions. We decided to use this particular model because it presents a powerful way of organizing emotions in terms of their affect appraisals, such as pleasant and unpleasant, and psychological reactions, such as arousal. It is also especially useful for capturing the continuous changes in emotional expression occurring during a piece of music.

The emotional space of music is abstract, with many emotions, and a music application based on mood should combine a series of moods and emotions. To achieve this goal without using an excessive number of labels, we reached a compromise, retaining only six main emotional clusters from this model. The corresponding labels are presented in Table 1.

The sound clips were annotated by a panel of experts of ages 20, 25, and 30 from the School of Music Studies of our University. All experts had a strong musical background. During the annotation process, all experts were encouraged to mark as many emotion labels induced by the music as possible. According to studies of Kivy [41], listeners make a fundamental attribution error in that they habitually take the expressive properties of music for what they feel.

Table 1 Description of emotion clusters

Label   Description         # of Examples
L1      Amazed-surprised    173
L2      Happy-pleased       166
L3      Relaxing-calm       264
L4      Quiet-still         148
L5      Sad-lonely          168
L6      Angry-fearful       189


This argument is strongly supported by other studies [42] in which listeners were instructed to describe both what they perceived and what they felt in response to different music genres. Meyer [43] argues that when a listener reports that he felt an emotion, he describes the emotion that a passage of music is supposed to indicate rather than what he experienced. Taking this notion into account, we instructed the subjects to label the sound clips according to what they felt rather than what the music produced.

Only the songs with completely identical labeling from at least two experts were kept for subsequent experimentation. This process led to a final annotated dataset of 593 songs. Potential reasons for this unexpectedly high agreement among the experts are the short track length and their common background. The last column of Table 1 indicates the number of examples annotated with each label. Out of the 593 songs, 178 were annotated with a single label, 315 with two labels, and 100 with three labels.

6 Empirical comparison of algorithms

6.1 Multi-label classification algorithms

We compare the following multi-label classification algorithms that were introduced in Section 3: BR, 2BR, LP, RAkEL, CLR, ML-kNN, and BP-MLL.

The first two approaches were selected as they are the most basic approaches for multi-label classification tasks. RAkEL and CLR were selected as two recent methods that have been shown to be more effective than the first two. Finally, ML-kNN and BP-MLL were selected as two recent high-performance representatives of algorithm adaptation methods. Apart from BR, none of the other algorithms has been evaluated on music data in the past, to the best of our knowledge.

6.2 Experimental setup

LP, BR, RAkEL, and CLR were run using an SVM as the base classifier. The SVM was trained with a linear kernel and the complexity constant C equal to 1. An SVM with the same setup was used for training both the first-level and the meta-level models of 2BR. The one-against-one strategy was used for dealing with multi-class tasks in the case of LP and RAkEL. RAkEL was run with subset size equal to 3, number of models equal to twice the number of labels (12), and a threshold of 0.5, which corresponds to a default parameter setting. 10-fold cross-validation was used for creating the necessary meta-data for 2BR. The number of neighbors in ML-kNN was set to 10 and the smoothing factor to 1, as recommended in [23]. As recommended in [22], BP-MLL was run with a 0.05 learning rate, 100 epochs, and the number of hidden units equal to 20% of the input units.

Ten different 10-fold cross-validation experiments were run for evaluation. The results that follow are averages over these 100 runs of the different algorithms.

Experiments were conducted with the aid of the Mulan software library for multi-label classification [44], which includes implementations of all algorithms and evaluation measures. Mulan runs on top of the Weka [45] machine learning library.
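
The evaluation protocol can be mirrored outside Mulan; the sketch below (my own, reusing the BR and evaluation helpers defined earlier with scikit-learn, rather than the authors' Java setup) runs repeated 10-fold cross-validation and averages the measures:

```python
import numpy as np
from sklearn.model_selection import KFold

def repeated_cv(X, Y, n_repeats=10, n_folds=10, seed=0):
    """Average the evaluation measures over n_repeats x n_folds runs."""
    results = []
    for rep in range(n_repeats):
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed + rep)
        for train, test in kf.split(X):
            models = train_br(X[train], Y[train])      # BR with a linear SVM, C = 1
            Y_pred = predict_br(models, X[test])
            scores = np.column_stack([m.decision_function(X[test]) for m in models])
            results.append(evaluate_multilabel(Y[test], Y_pred, scores))
    return {k: float(np.mean([r[k] for r in results])) for k in results[0]}
```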

6.3 Results

Table 2 shows the predictive performance of the seven competing multi-label classification algorithms using a variety of measures. We evaluate the seven algorithms using three categories of multi-label evaluation measures, namely example-based, label-based, and ranking-based measures. Example-based measures include hamming loss, accuracy, precision, recall, F1-measure, and subset accuracy. These measures are calculated based on the average differences between the actual and the predicted sets of labels over all test examples.

Label-based measures include micro and macro precision, recall, F1-measure, and area under the ROC curve (AUC). Finally, ranking-based measures include one-error, coverage, ranking loss, and average precision.

6.3.1 Example-based

As far as the example-based measures are concerned, RAkEL has a quite competitive performance, being best in Hamming loss, second best in accuracy behind LP, best in the combination of precision and recall (F1), and second best in subset accuracy, again behind LP.

Table 2 Predictive performance of the seven different multi-label algorithms based on a variety of measures

Measure         BR      LP      RAkEL   2BR     CLR     ML-kNN  BP-MLL
Hamming loss    0.1943  0.1964  0.1849  0.1953  0.1930  0.2616  0.2064
Accuracy        0.5185  0.5887  0.5876  0.5293  0.5271  0.3427  0.5626
Precision       0.6677  0.6840  0.7071  0.6895  0.6649  0.5184  0.6457
Recall          0.5938  0.7065  0.6962  0.6004  0.6142  0.3802  0.7234
F1              0.6278  0.6945  0.7009  0.6411  0.6378  0.4379  0.6814
Subset acc.     0.2759  0.3511  0.3395  0.2839  0.2830  0.1315  0.2869
Micro prec.     0.7351  0.6760  0.7081  0.7280  0.7270  0.6366  0.6541
Micro rec.      0.5890  0.7101  0.6925  0.5958  0.6103  0.3803  0.7189
Micro F1        0.6526  0.6921  0.6993  0.6540  0.6622  0.4741  0.6840
Micro AUC       0.7465  0.8052  0.8241  0.7475  0.8529  0.7540  0.8474
Macro prec.     0.6877  0.6727  0.7059  0.6349  0.7036  0.4608  0.6535
Macro rec.      0.5707  0.7018  0.6765  0.5722  0.5933  0.3471  0.7060
Macro F1        0.6001  0.6782  0.6768  0.5881  0.6212  0.3716  0.6681
Macro AUC       0.7343  0.8161  0.8115  0.7317  0.8374  0.7185  0.8344
One-error       0.3038  0.3389  0.2593  0.2964  0.2512  0.3894  0.2946
Coverage        2.4378  1.9300  1.9983  2.4482  1.6914  2.2715  1.7664
Ranking loss    0.2776  0.1867  0.1902  0.2770  0.1456  0.2603  0.1635
Avg. precision  0.7378  0.7632  0.7983  0.7392  0.8167  0.7104  0.7961


Example-based measures evaluate how well an algorithm calculates a bipartition of the emotions into relevant and irrelevant, given a music title. LP directly models the combinations of labels and manages to perform well in predicting the actual set of relevant labels. RAkEL is based on an ensemble of LP classifiers and, as expected, further improves on the good performance of LP. One of the reasons for the good performance of LP is the relatively small number of labels (six emotional clusters). As mentioned in Section 3, LP has problems scaling to large numbers of labels, but RAkEL does not suffer from such scalability issues.

6.3.2 Label-based

As far as the micro and macro averaged measures are concerned, LP and RAkEL again excel in the combination of precision and recall (F1), achieving the first two places among their competitors, while BP-MLL immediately follows as third best. The macro F1 measure evaluates the ability of the algorithms to correctly identify the relevance of each label, by averaging the performance of individual labels, while the micro F1 measure takes a more holistic approach by summing the distributions of all labels first and then computing a single measure. Both measures evaluate in this case the retrieval of relevant music titles by emotion.

6.3.3 Ranking-based

A first clear pattern that can be noticed is the superiority of CLR, as far as the ranking measures are concerned. Based on the pairwise comparisons of labels, it effectively ranks relevant labels higher than irrelevant labels. Therefore, if the goal of a music application were to present an ordered set of emotions for a music title, then CLR should definitely be the algorithm to employ. Such an application, for example, could be one that recommends emotions to human annotators, in order to assist them in their labor-intensive task. The good probability estimates that CLR obtains for the relevance of each label through the voting of all pairwise models are also indicated by the top performance of CLR in the micro and macro averaged AUC measures, which are probability based. BP-MLL is also quite good in the ranking measures (apart from one-error) and in the micro and macro averaged AUC measures, which indicates that it also computes good estimates of the probability of relevance for each label.

6.3.4 Label prediction accuracy

Table 3 shows the classification accuracy of the algorithms for each label (as if they were independently predicted), along with the average accuracy in the last column. We notice that, based on the ease of prediction, we can rank the labels in the following descending order: L4 (quiet-still), L6 (angry-fearful), L5 (sad-lonely), L1 (amazed-surprised), L3 (relaxing-calm), and L2 (happy-pleased). L4 is the easiest, with a mean accuracy of approximately 88%, followed by L6, L5, L1, and L3 with mean accuracies of approximately 81, 80, 79, and 77%, respectively. The hardest label is L2, with a mean accuracy of approximately 72%.

Based on the results, one can see that the classification model performs better for emotional labels such as L4 (quiet-still) than for L2 (happy-pleased). This is not at all in agreement with past research [46,47] claiming that a happy emotional tone tends to be among the easiest to communicate in music.

An explanation for this result is that happiness is a measure of positive valence and high activity. Expressive cues describing the happiness emotion are fast tempo, small tempo variability, staccato articulation, high sound level, bright timbre, and fast tone attacks, which are more difficult to model using the musical features extracted. On the other hand, the quiet emotion is just a measure of energy, corresponding to the activity dimension only; thus it can be more successfully described and represented by the features employed.

7 Conclusions and future work

This article investigated the task of multi-label mapping of music into emotions. An evaluation of seven multi-label classification algorithms was performed on a collection of 593 songs. Among these algorithms, CLR was the most effective in ranking the emotions according to relevance to a given song, while RAkEL was very competitive in providing a bipartition of the labels into relevant and irrelevant for a given song, as well as in retrieving relevant songs given an emotion. The overall predictive performance was high and encourages further investigation of multi-label methods. The performance for each different label varied; the subjectivity of a label may be influencing how well it can be predicted.

Multi-label classifiers such as CLR and RAkEL could be used for the automated annotation of large music collections with multiple emotions. This in turn would support the implementation of music information retrieval systems that query music collections by emotion. Such a querying capability would be useful for song selection in various applications.

Table 3 Accuracy of the seven multi-label classification algorithms per label

Label  BR      LP      RAkEL   2BR     CLR     ML-kNN  BP-MLL  Avg
L1     0.7900  0.7907  0.7976  0.7900  0.7954  0.7446  0.7871  0.7851
L2     0.7115  0.7380  0.7584  0.7113  0.7137  0.7195  0.7161  0.7241
L3     0.7720  0.7705  0.7804  0.7661  0.7735  0.7221  0.7712  0.7651
L4     0.8997  0.8992  0.9019  0.9002  0.8970  0.7969  0.8923  0.8839
L5     0.8287  0.8093  0.8250  0.8283  0.8295  0.7051  0.7894  0.8022
L6     0.8322  0.8142  0.8275  0.8320  0.8325  0.7422  0.8054  0.8123


Interesting future work directions are the incorporation of features based on song lyrics [48,49], as well as experimentation with hierarchical multi-label classification approaches [50], based on a hierarchical organization of emotions.

Endnotes

a. http://www.musicovery.com/
b. http://mulan.sourceforge.net/datasets.html
c. http://www.music-ir.org/mirex/2007

List of abbreviations

AUC: area under the ROC curve; BP-MLL: multi-label back-propagation; BR: binary relevance; CLR: calibrated label ranking; kNN: k nearest neighbors; LP: label powerset; ML-kNN: multi-label nearest neighbor; RAkEL: random k-labelsets; RPC: ranking by pairwise comparison; SVM: support vector machine.

Author details

1 Department of Journalism and Mass Communication, Aristotle University of Thessaloniki, Thessaloniki, 54124, Greece. 2 Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, 54124, Greece.

Competing interests

The authors declare that they have no competing interests.

Received: 17 January 2011 Accepted: 18 September 2011

Published: 18 September 2011

References

1. M Tolos, R Tato, T Kemp, Mood-based navigation through large collections of musical data, in Proceedings of the 2nd IEEE Consumer Communications and Networking Conference (CCNC 2005), 71-75 (3-6 January 2005)
2. R Cai, C Zhang, C Wang, L Zhang, W-Y Ma, MusicSense: contextual music recommendation using emotional allocation modeling, in Proceedings of the 15th International Conference on Multimedia, 553-556 (2007)
3. L Lu, D Liu, H-J Zhang, Automatic mood detection and tracking of music audio signals. IEEE Trans Audio Speech Lang Process 14(1), 5-18 (2006)
4. Y-H Yang, C-C Liu, H-H Chen, Music emotion classification: a fuzzy approach, in Proceedings of ACM Multimedia 2006 (MM '06), 81-84 (2006)
5. Y-H Yang, Y-C Lin, Y-F Su, H-H Chen, A regression approach to music emotion recognition. IEEE Trans Audio Speech Lang Process 16(2), 448-457 (2008)
6. T Li, M Ogihara, Detecting emotion in music, in Proceedings of the International Symposium on Music Information Retrieval, USA, 239-240
7. T Li, M Ogihara, Toward intelligent music information retrieval. IEEE Trans Multimedia 8(3), 564-574 (2006)
8. A Wieczorkowska, P Synak, ZW Ras, Multi-label classification of emotions in music, in Proceedings of the 2006 International Conference on Intelligent Information Processing and Web Mining (IIPWM '06), 307-315 (2006)
9. K Trohidis, G Tsoumakas, G Kalliris, I Vlahavas, Multilabel classification of music into emotions, in Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), USA (2008)
10. E Schubert, Measurement and Time Series Analysis of Emotion in Music. PhD thesis, University of New South Wales (1999)
11. X Hu, JS Downie, Exploring mood metadata: relationships with genre, artist and usage metadata, in Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), 67-72 (2007)
12. R Fiebrink, I Fujinaga, Feature selection pitfalls and music classification, in Proceedings of the International Conference on Music Information Retrieval (ISMIR 2006), 340-341 (2006)
13. G Tzanetakis, P Cook, Musical genre classification of audio signals. IEEE Trans Speech Audio Process 10(5), 293-302 (2002). doi:10.1109/TSA.2002.800560
14. F Pachet, P Roy, Improving multilabel analysis of music titles: a large-scale validation of the correction approach. IEEE Trans Audio Speech Lang Process 17(2), 335-343 (2009)
15. G Tsoumakas, I Katakis, I Vlahavas, Mining multi-label data, in Data Mining and Knowledge Discovery Handbook, Part 6, ed. by O Maimon, L Rokach (Springer, 2nd edition, 2010), pp. 667-685
16. DH Wolpert, Stacked generalization. Neural Netw 5, 241-259 (1992). doi:10.1016/S0893-6080(05)80023-1
17. S Godbole, S Sarawagi, Discriminative methods for multi-labeled classification, in Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004), 22-30 (2004)
18. G Tsoumakas, I Katakis, Multi-label classification: an overview. Int J Data Warehousing Mining 3(3), 1-13 (2007)
19. G Tsoumakas, I Vlahavas, Random k-labelsets: an ensemble method for multilabel classification, in Proceedings of the 18th European Conference on Machine Learning (ECML 2007), Poland, 406-417 (2007)
20. E Hüllermeier, J Fürnkranz, W Cheng, K Brinker, Label ranking by learning pairwise preferences. Artif Intell 172(16-17), 1897-1916 (2008). doi:10.1016/j.artint.2008.08.002
21. J Fürnkranz, E Hüllermeier, EL Mencia, K Brinker, Multilabel classification via calibrated label ranking. Mach Learn 73(2), 133-153 (2008). doi:10.1007/s10994-008-5064-8
22. M-L Zhang, Z-H Zhou, Multi-label neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18(10), 1338-1351 (2006)
23. M-L Zhang, Z-H Zhou, ML-KNN: a lazy learning approach to multi-label learning. Pattern Recog 40(7), 2038-2048 (2007). doi:10.1016/j.patcog.2006.12.019
24. PN Juslin, P Laukka, Expression, perception and induction of musical emotions: a review and questionnaire study of everyday listening. J New Music Res 33, 217-238 (2004). doi:10.1080/0929821042000317813
25. PN Juslin, JA Sloboda (eds.), Music and Emotion: Theory and Research (Oxford University Press, New York, 2001)
26. T Eerola, JK Vuoskoski, A comparison of the discrete and dimensional models of emotion in music. Psychol Music 39(1), 18-49 (2011). doi:10.1177/0305735610362821
27. P Ekman, An argument for basic emotions. Cognition Emotion 6, 169-200 (1992)
28. PR Farnsworth, A study of the Hevner adjective list. J Aesth Art Crit 13, 97-103 (1954). doi:10.2307/427021
29. CE Izard, The Emotions (Plenum Press, New York, 1977)
30. R Plutchik, The Psychology and Biology of Emotion (Harper Collins, New York, 1994)
31. J Panksepp, A critical role for affective neuroscience in resolving what is basic about basic emotions. Psychol Rev 99, 554-560 (1992)
32. K Oatley, Best Laid Schemes: The Psychology of Emotions (Harvard University Press, MA, 1992)
33. WM Wundt, Outlines of Psychology (Wilhelm Engelmann, Leipzig, 1897) (translated by CH Judd)
34. JA Russell, A circumplex model of affect. J Soc Psychol 39, 1161-1178 (1980)
35. RE Thayer, The Biopsychology of Mood and Arousal (Oxford University Press, 1989)
36. A Tellegen, D Watson, LA Clark, On the dimensional and hierarchical structure of affect. Psychol Sci 10(4), 297-303 (1999). doi:10.1111/1467-9280.00157
37. H Schlosberg, Three dimensions of emotion. Psychol Rev 61(2), 81-88 (1954)
38. A Mehrabian, Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Curr Psychol 14(4), 261-292 (1996). doi:10.1007/BF02686918
39. E Rosch, Principles of categorization, in Cognition and Categorization, ed. by E Rosch, BB Lloyd (Hillsdale, NJ, 1978), pp. 27-48
40. B Logan, Mel frequency cepstral coefficients for music modeling, in Proceedings of the 1st International Symposium on Music Information Retrieval (ISMIR 2000), Plymouth, Massachusetts (2000)
41. P Kivy, Sound Sentiment: An Essay on the Musical Emotions (Temple University Press, Philadelphia, PA, 1989)
42. MR Zentner, S Meylan, KR Scherer, Exploring 'musical emotions' across five genres of music, in Proceedings of the 6th International Conference of the Society for Music Perception and Cognition (ICMPC) (2000)
43. LB Meyer, Emotion and Meaning in Music (University of Chicago Press, Chicago, 1956)
44. G Tsoumakas, E Spyromitros-Xioufis, J Vilcek, I Vlahavas, Mulan: a Java library for multi-label learning. J Mach Learn Res 12, 2411-2414 (2011)
45. IH Witten, E Frank, Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2011)
46. A Gabrielsson, PN Juslin, Emotional expression in music performance: between the performer's intention and the listener's experience. Psychol Music 24(1), 68-91 (1996). doi:10.1177/0305735696241007
47. L Krumhansl, An exploratory study of musical emotions and psychophysiology. Can J Exp Psychol 51, 336-353 (1997)
48. C Laurier, J Grivolla, P Herrera, Multimodal music mood classification using audio and lyrics, in Proceedings of the International Conference on Machine Learning and Applications, USA (2008)
49. Y-H Yang, Y-C Lin, H-T Cheng, I-B Liao, Y-C Ho, H-H Chen, Toward multi-modal music emotion classification, in Proceedings of the 9th Pacific Rim Conference on Multimedia (PCM 2008), pp. 70-79 (2008)
50. C Vens, J Struyf, L Schietgat, S Džeroski, H Blockeel, Decision trees for hierarchical multi-label classification. Mach Learn 73(2), 185-214 (2008). doi:10.1007/s10994-008-5077-3
51. D Yang, W Lee, Disambiguating music emotion using software agents, in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR '04), Barcelona, Spain (2004)

doi:10.1186/1687-4722-2011-426793

Cite this article as: Trohidis et al.: Multi-label classification of music by emotion. EURASIP Journal on Audio, Speech, and Music Processing 2011, 2011:4.
