The flexibility of the MPEG-7 SpokenContent description makes it usable in many different application contexts. The main possible types of applications are:
• Spoken document retrieval. This is the most obvious application of spoken content metadata, already detailed in this chapter. The goal is to retrieve information in a database of spoken documents. The result of the query may be the top-ranked relevant documents. As SpokenContent descriptions include the time locations of recognition hypotheses, the position of the retrieved query word(s) in the most relevant documents may also be returned to the user (a minimal sketch follows this list). Mixed SpokenContent lattices (i.e. combining words and phones) could be an efficient approach in most cases.
• Indexing of audiovisual data. The spoken segments in the audio stream can be annotated with SpokenContent descriptions (e.g. word lattices yielded by an LVCSR system). A preliminary segmentation of the audio stream is necessary to spot the spoken parts. The spoken content metadata can then be used to search for particular events in a film or a video (e.g. the occurrence of a query word or sequence of words in the audio stream).
• Spoken annotation of databases. Each item in a database is annotated with a short spoken description. This annotation is processed by an ASR system and attached to the item as a SpokenContent description. This metadata can then be used to search for items in the database, by processing the SpokenContent annotations with an SDR engine. A typical example of such applications, already on the market, is the spoken annotation of photographs. In that case, speech decoding is performed on a mobile device (integrated in the camera itself) with limited storage and computational capacities. The use of a simple phone recognizer may be appropriate.
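As a minimal sketch of the first application above (an assumed in-memory structure, not the MPEG-7 SpokenContent syntax), the index below maps each hypothesised word to the documents and time offsets where it was recognised, so that a query returns both a document ranking and the positions of the matching terms:

from collections import defaultdict

# word -> list of (document id, start time in seconds), taken from the
# recognition hypotheses of a (hypothetical) SpokenContent lattice
index = defaultdict(list)

def add_hypothesis(doc_id: str, word: str, start_time: float) -> None:
    index[word.lower()].append((doc_id, start_time))

def search(query: str):
    """Rank documents by the number of matching query terms and return
    the time locations of the matches in each document."""
    hits = defaultdict(list)                 # doc_id -> [(word, time), ...]
    for word in query.lower().split():
        for doc_id, t in index.get(word, []):
            hits[doc_id].append((word, t))
    return sorted(hits.items(), key=lambda kv: len(kv[1]), reverse=True)

add_hypothesis("doc1", "spoken", 11.9)
add_hypothesis("doc1", "retrieval", 12.4)
add_hypothesis("doc2", "spoken", 3.1)
print(search("spoken retrieval"))   # doc1 first, with matching time offsets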
4.5.3 Perspectives
One of the most promising perspectives for the development of efficient spoken content retrieval methods is the combination of multiple independent index sources. A SpokenContent description can represent the same spoken information at different levels of granularity in the same lattice by merging words and sub-lexical terms.

These multi-level descriptions lead to retrieval approaches that combine the discriminative power of large-vocabulary word-based indexing with the open-vocabulary property of sub-word-based indexing, by which the problem of OOV words is greatly alleviated. As outlined in Section 4.4.6.2, some steps have already been made in this direction. However, hybrid word/sub-word-based SDR strategies have to be further investigated, with new fusion methods (Yu and Seide, 2004) or new combinations of index sources, e.g. combined use of distinct types of sub-lexical units (Lee et al., 2004) or distinct LVCSR systems (Matsushita et al., 2004).
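As an illustration only (a simple linear score fusion under assumed, comparable score scales, not the method of Yu and Seide, 2004), a hybrid word/sub-word retrieval run can be sketched as a weighted combination of the relevance scores produced by a word-based index and a phone-based index:

def fuse_scores(word_scores: dict, phone_scores: dict, alpha: float = 0.7) -> dict:
    """Linear fusion of two retrieval runs; both inputs map doc id -> score.

    alpha weights the word-based run; (1 - alpha) weights the phone-based run,
    which still contributes for OOV query terms that the word index misses.
    Scores are assumed to be already normalised to a common range, e.g. [0, 1].
    """
    docs = set(word_scores) | set(phone_scores)
    return {d: alpha * word_scores.get(d, 0.0)
               + (1 - alpha) * phone_scores.get(d, 0.0)
            for d in docs}

word_run = {"doc1": 0.9, "doc2": 0.2}    # scores from the word (LVCSR) index
phone_run = {"doc2": 0.8, "doc3": 0.6}   # scores from the phone n-gram index
ranking = sorted(fuse_scores(word_run, phone_run).items(),
                 key=lambda kv: kv[1], reverse=True)

Other fusion schemes (e.g. rank-based merging) fit the same skeleton; only the combination rule changes.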
Another important perspective is the combination of spoken content with other metadata derived from speech (Begeja et al., 2004; Hu et al., 2004). In general, the information contained in a spoken message consists of more than just words. In the query, users could be given the possibility to search for words, phrases, speakers, words and speakers together, non-verbal speech characteristics (male/female), non-speech events (like coughing or other human noises), etc. In particular, the speakers’ identities may be of great interest for retrieving information in audio. If a speaker segmentation and identification algorithm is applied to annotate the lattices with speaker identifiers (stored in SpeakerInfo metadata), this can help in searching for particular events in a film or a video (e.g. sentences or words spoken by a given character in a film). The SpokenContent descriptions enclose other types of valuable indexing information, such as the spoken language.
REFERENCES
Angelini B., Falavigna D., Omologo M. and De Mori R. (1998) “Basic Speech Sounds, their Analysis and Features”, in Spoken Dialogues with Computers, pp. 69–121, R. De Mori (ed.), Academic Press, London.
Begeja L., Renger B., Saraclar M., Gibbon D., Liu Z. and Shahraray B. (2004) “A System for Searching and Browsing Spoken Communications”, HLT-NAACL 2004 Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, pp. 1–8, Boston, MA, USA, May.
Browne P., Czirjek C., Gurrin C., Jarina R., Lee H., Marlow S., McDonald K., Murphy N., O’Connor N. E., Smeaton A. F. and Ye J. (2002) “Dublin City University Video Track Experiments for TREC 2002”, NIST, 11th Text Retrieval Conference (TREC 2002), Gaithersburg, MD, USA, November.
Buckley C. (1985) “Implementation of the SMART Information Retrieval System”, Computer Science Department, Cornell University, Report 85–686.
Chomsky N. and Halle M. (1968) The Sound Pattern of English, MIT Press, Cambridge, MA.
Clements M., Cardillo P. S. and Miller M. S. (2001) “Phonetic Searching vs. LVCSR: How to Find What You Really Want in Audio Archives”, AVIOS 2001, San Jose, CA, USA, April.
Coden A. R., Brown E. and Srinivasan S. (2001) “Information Retrieval Techniques for Speech Applications”, ACM SIGIR 2001 Workshop “Information Retrieval Techniques for Speech Applications”.
Crestani F. (1999) “A Model for Combining Semantic and Phonetic Term Similarity for Spoken Document and Spoken Query Retrieval”, International Computer Science Institute, Berkeley, CA, tr-99-020, December.
Crestani F. (2002) “Using Semantic and Phonetic Term Similarity for Spoken Document Retrieval and Spoken Query Processing”, in Technologies for Constructing Intelligent Systems, pp. 363–376, J. G.-R., B. Bouchon-Meunier and R. R. Yager (eds), Springer-Verlag, Heidelberg, Germany.
Crestani F., Lalmas M., van Rijsbergen C. J. and Campbell I. (1998) “‘Is This Document Relevant? Probably’: A Survey of Probabilistic Models in Information Retrieval”, ACM Computing Surveys, vol. 30, no. 4, pp. 528–552.
Deligne S. and Bimbot F. (1995) “Language Modelling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams”, ICASSP’95, pp. 169–172, Detroit, USA.
Ferrieux A. and Peillon S. (1999) “Phoneme-Level Indexing for Fast and Vocabulary-Independent Voice/Voice Retrieval”, ESCA Tutorial and Research Workshop (ETRW), “Accessing Information in Spoken Audio”, Cambridge, UK, April.
Gauvain J.-L., Lamel L., Barras C., Adda G. and de Kercardio Y. (2000) “The LIMSI SDR System for TREC-9”, NIST, 9th Text Retrieval Conference (TREC 9), pp. 335–341, Gaithersburg, MD, USA, November.
Glass J. and Zue V. W. (1988) “Multi-Level Acoustic Segmentation of Continuous Speech”, ICASSP’88, pp. 429–432, New York, USA, April.
Glass J., Chang J. and McCandless M. (1996) “A Probabilistic Framework for Feature-based Speech Recognition”, ICSLP’96, vol. 4, pp. 2277–2280, Philadelphia, PA, USA, October.
Glavitsch U. and Schäuble P. (1992) “A System for Retrieving Speech Documents”, ACM SIGIR, pp. 168–176.
Gold B. and Morgan N. (1999) Speech and Audio Signal Processing, John Wiley & Sons, Inc., New York.
Halberstadt A. K. (1998) “Heterogeneous acoustic measurements and multiple classifiers for speech recognition”, PhD Thesis, Massachusetts Institute of Technology (MIT), Cambridge, MA.
Hartigan J. (1975) Clustering Algorithms, John Wiley & Sons, Inc., New York.
Hu Q., Goodman F., Boykin S., Fish R. and Greiff W. (2004) “Audio Hot Spotting and Retrieval using Multiple Features”, HLT-NAACL 2004 Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, pp. 13–17, Boston, MA, USA, May.
James D. A. (1995) “The Application of Classical Information Retrieval Techniques to Spoken Documents”, PhD Thesis, University of Cambridge, Speech, Vision and Robotics Group, Cambridge, UK.
Jelinek F. (1998) Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA.
Johnson S. E., Jourlin P., Spärck Jones K. and Woodland P. C. (2000) “Spoken Document Retrieval for TREC-9 at Cambridge University”, NIST, 9th Text Retrieval Conference (TREC 9), pp. 117–126, Gaithersburg, MD, USA, November.
Jones G. J. F., Foote J. T., Spärck Jones K. and Young S. J. (1996) “Retrieving Spoken Documents by Combining Multiple Index Sources”, ACM SIGIR’96, pp. 30–38, Zurich, Switzerland, August.
Katz S. M. (1987) “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 35, no. 3, pp. 400–401.
Kupiec J., Kimber D. and Balasubramanian V. (1994) “Speech-based Retrieval using Semantic Co-Occurrence Filtering”, ARPA Human Language Technologies (HLT) Conference, pp. 373–377, Plainsboro, NJ, USA.
Larson M. and Eickeler S. (2003) “Using Syllable-based Indexing Features and Language Models to Improve German Spoken Document Retrieval”, ISCA, Eurospeech 2003, pp. 1217–1220, Geneva, Switzerland, September.
Lee S. W., Tanaka K. and Itoh Y. (2004) “Multi-layer Subword Units for Open-Vocabulary Spoken Document Retrieval”, ICSLP’2004, Jeju Island, Korea, October.
Levenshtein V. I. (1966) “Binary Codes Capable of Correcting Deletions, Insertions and Reversals”, Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710.
Lindsay A. T., Srinivasan S., Charlesworth J. P. A., Garner P. N. and Kriechbaum W. (2000) “Representation and linking mechanisms for audio in MPEG-7”, Signal Processing: Image Communication Journal, Special Issue on MPEG-7, vol. 16, pp. 193–209.
Logan B., Moreno P. J. and Deshmukh O. (2002) “Word and Sub-word Indexing Approaches for Reducing the Effects of OOV Queries on Spoken Audio”, Human Language Technology Conference (HLT 2002), San Diego, CA, USA, March.
Matsushita M., Nishizaki H., Nakagawa S. and Utsuro T. (2004) “Keyword Recognition and Extraction by Multiple-LVCSRs with 60,000 Words in Speech-driven WEB Retrieval Task”, ICSLP’2004, Jeju Island, Korea, October.
Moreau N., Kim H.-G. and Sikora T. (2004a) “Combination of Phone N-Grams for a MPEG-7-based Spoken Document Retrieval System”, EUSIPCO 2004, Vienna, Austria, September.
Moreau N., Kim H.-G. and Sikora T. (2004b) “Phone-based Spoken Document Retrieval in Conformance with the MPEG-7 Standard”, 25th International AES Conference “Metadata for Audio”, London, UK, June.
Moreau N., Kim H.-G. and Sikora T. (2004c) “Phonetic Confusion Based Document Expansion for Spoken Document Retrieval”, ICSLP Interspeech 2004, Jeju Island, Korea, October.
Morris R. W., Arrowood J. A., Cardillo P. S. and Clements M. A. (2004) “Scoring Algorithms for Wordspotting Systems”, HLT-NAACL 2004 Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, pp. 18–21, Boston, MA, USA, May.
Ng C., Wilkinson R. and Zobel J. (2000) “Experiments in Spoken Document Retrieval Using Phoneme N-grams”, Speech Communication, vol. 32, no. 1, pp. 61–77.
Ng K. (1998) “Towards Robust Methods for Spoken Document Retrieval”, ICSLP’98, vol. 3, pp. 939–942, Sydney, Australia, November.
Ng K. (2000) “Subword-based Approaches for Spoken Document Retrieval”, PhD Thesis, Massachusetts Institute of Technology (MIT), Cambridge, MA.
Ng K. and Zue V. (1998) “Phonetic Recognition for Spoken Document Retrieval”, ICASSP’98, pp. 325–328, Seattle, WA, USA.
Ng K. and Zue V. W. (2000) “Subword-based Approaches for Spoken Document Retrieval”, Speech Communication, vol. 32, no. 3, pp. 157–186.
Paul D. B. (1992) “An Efficient A∗ Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model”, ICASSP’92, pp. 25–28, San Francisco, USA.
Porter M. (1980) “An Algorithm for Suffix Stripping”, Program, vol. 14, no. 3, pp. 130–137.
Rabiner L. (1989) “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286.
Rabiner L. and Juang B.-H. (1993) Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ.
Robertson E. S. (1977) “The probability ranking principle in IR”, Journal of Documentation, vol. 33, no. 4, pp. 294–304.
Rose R. C. (1995) “Keyword Detection in Conversational Speech Utterances Using Hidden Markov Model Based Continuous Speech Recognition”, Computer, Speech and Language, vol. 9, no. 4, pp. 309–333.
Salton G. and Buckley C. (1988) “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing and Management, vol. 24, no. 5, pp. 513–523.
Salton G. and McGill M. J. (1983) Introduction to Modern Information Retrieval, McGraw-Hill, New York.
Srinivasan S. and Petkovic D. (2000) “Phonetic Confusion Matrix Based Spoken Document Retrieval”, 23rd Annual ACM Conference on Research and Development in Information Retrieval (SIGIR’00), pp. 81–87, Athens, Greece, July.
TREC (2001) “Common Evaluation Measures”, NIST, 10th Text Retrieval Conference (TREC 2001), pp. A–14, Gaithersburg, MD, USA, November.
van Rijsbergen C. J. (1979) Information Retrieval, Butterworths, London.
Voorhees E. and Harman D. K. (1998) “Overview of the Seventh Text REtrieval Conference”, NIST, 7th Text Retrieval Conference (TREC-7), pp. 1–24, Gaithersburg, MD, USA, November.
Walker S., Robertson S. E., Boughanem M., Jones G. J. F. and Spärck Jones K. (1997) “Okapi at TREC-6 Automatic Ad Hoc, VLC, Routing, Filtering and QSDR”, 6th Text Retrieval Conference (TREC-6), pp. 125–136, Gaithersburg, MD, USA, November.
Wechsler M. (1998) “Spoken Document Retrieval Based on Phoneme Recognition”, PhD Thesis, Swiss Federal Institute of Technology (ETH), Zurich.
Wechsler M., Munteanu E. and Schäuble P. (1998) “New Techniques for Open-Vocabulary Spoken Document Retrieval”, 21st Annual ACM Conference on Research and Development in Information Retrieval (SIGIR’98), pp. 20–27, Melbourne, Australia, August.
Wells J. C. (1997) “SAMPA computer readable phonetic alphabet”, in Handbook of Standards and Resources for Spoken Language Systems, D. Gibbon, R. Moore and R. Winski (eds), Mouton de Gruyter, Berlin and New York.
Wilpon J. G., Rabiner L. R. and Lee C.-H. (1990) “Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 11, pp. 1870–1878.
Witbrock M. and Hauptmann A. G. (1997) “Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents”, DARPA Speech Recognition Workshop, Chantilly, VA, USA, February.
Yu P. and Seide F. T. B. (2004) “A Hybrid Word/Phoneme-Based Approach for Improved Vocabulary-Independent Search in Spontaneous Speech”, ICSLP’2004, Jeju Island, Korea, October.
5 Music Description Tools
The purpose of this chapter is to outline how music and musical signals can be described. Several MPEG-7 high-level tools were designed to describe the properties of musical signals. Our prime goal is to use these descriptors to compare music signals and to query for pieces of music.

The aim of the MPEG-7 Timbre DS is to describe some perceptual features of musical sounds with a reduced set of descriptors. These descriptors relate to notions such as “attack”, “brightness” or “richness” of a sound. The Melody DS is a representation for melodic information which mainly aims at facilitating efficient melodic similarity matching. The musical Tempo DS is defined to characterize the underlying temporal structure of musical sounds. In this chapter we focus exclusively on MPEG-7 tools and applications. We outline how distance measures can be constructed that allow queries for music based on the MPEG-7 DS.

5.1 TIMBRE
5.1.1 Introduction
In music, timbre is the quality of a musical note which distinguishes different types of musical instrument, see (Wikipedia, 2001). The timbre is like a formant in speech; a certain timbre is typical for a musical instrument. This is why, with a little practice, it is possible for human beings to distinguish a saxophone from a trumpet in a jazz group or a flute from a violin in an orchestra, even if they are playing notes at the same pitch and amplitude. Timbre has been called the psycho-acoustician’s waste-basket as it can include so many factors.

Though the phrase tone colour is often used as a synonym for timbre, colours of the optical spectrum are not generally explicitly associated with particular sounds. Rather, the sound of an instrument may be described with words like “warm” or
“harsh” or other terms, perhaps suggesting that tone colour has more in common with the sense of touch than of sight. People who experience synaesthesia, however, may see certain colours when they hear particular instruments. Two sounds with similar physical characteristics like pitch and loudness may have different timbres. The aim of the MPEG-7 Timbre DS is to describe perceptual features with a reduced set of descriptors.
MPEG-7 distinguishes four different families of sounds:
• Harmonic sounds
• Inharmonic sounds
• Percussive sounds
• Non-coherent sounds
These families are characterized using the following features of sounds:
• Harmony: related to the periodicity of a signal, distinguishes harmonic from inharmonic and noisy signals.
• Sustain: related to the duration of excitation of the sound source, distinguishes sustained from impulsive signals.
• Coherence: related to the temporal behaviour of the signal’s spectral components, distinguishes spectra with prominent components from noisy spectra.

The four sound families correspond to these characteristics, see Table 5.1.

Table 5.1 Sound families and sound characteristics (from ISO, 2001a)

Sound family     Harmonic                  Inharmonic             Percussive                  Non-coherent
Characteristics  Sustained, harmonic       Sustained, inharmonic  Impulsive                   Sustained
Example          Violin, flute             Bell, triangle         Snare, claves               Cymbals
Timbre DS        HarmonicInstrumentTimbre                         PercussiveInstrumentTimbre

Possible target applications are, following the standard (ISO, 2001a):

• Authoring tools for sound designers or musicians (music sample database management). Consider a musician using a sample player for music production, playing the drum sounds in his or her musical recordings. Large libraries of sound files for use with sample players are already available. The MPEG-7 Timbre DS could be used to find the percussive sounds in such a library which best match the musician’s idea for his or her production.
• Retrieval tools for producers (query-by-example (QBE) search based on perceptual features). If a producer wants a certain type of sound and already has a sample sound, the MPEG-7 Timbre DS provides the means to find the most similar sound in a sound file of a music database. Note that this problem is often referred to as audio fingerprinting.
All descriptors of the MPEG-7 Timbre DS use the low-level timbral descriptors already defined in Chapter 2 of this book. The following sections describe the high-level DS InstrumentTimbre, HarmonicInstrumentTimbre and PercussiveInstrumentTimbre.

5.1.2 InstrumentTimbre
The structure of the InstrumentTimbre is depicted in Figure 5.1. It is a set of timbre descriptors in order to describe timbres with harmonic and percussive aspects:
• LogAttackTime (LAT), the LogAttackTime descriptor, see Section 2.7.2.
• HarmonicSpectralCentroid (HSC), the HarmonicSpectralCentroid descriptor, see Section 2.7.5.
• HarmonicSpectralDeviation (HSD), the HarmonicSpectralDeviation descriptor, see Section 2.7.6.
• HarmonicSpectralSpread (HSS), the HarmonicSpectralSpread descriptor, see Section 2.7.7.
Figure 5.1 The InstrumentTimbre: + signs at the end of a field indicate further structured content; – signs mean unfolded content; · · · indicates a sequence (from Manjunath et al., 2002)
• HarmonicSpectralVariation (HSV), the HarmonicSpectralVariation descriptor, see Section 2.7.8.
• SpectralCentroid (SC), the SpectralCentroid descriptor, see Section 2.7.9.
• TemporalCentroid (TC), the TemporalCentroid descriptor, see Section 2.7.3.
Together, these descriptors combine harmonic and percussive features. An InstrumentTimbre description of a particular sound, e.g. a harp, holds one value per descriptor and is written in MPEG-7 XML syntax; a rough sketch of such a description follows.
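The normative XML listing for such a harp description is not reproduced here. As a rough, non-normative sketch of the information it carries, the following record holds one field per descriptor; the numeric values are purely illustrative and not taken from the standard:

from dataclasses import dataclass

@dataclass
class InstrumentTimbre:
    """One field per MPEG-7 InstrumentTimbre descriptor."""
    log_attack_time: float              # LAT
    harmonic_spectral_centroid: float   # HSC
    harmonic_spectral_deviation: float  # HSD
    harmonic_spectral_spread: float     # HSS
    harmonic_spectral_variation: float  # HSV
    spectral_centroid: float            # SC
    temporal_centroid: float            # TC

# Illustrative values only (not measured, not taken from the standard or this book).
harp = InstrumentTimbre(
    log_attack_time=-2.1,
    harmonic_spectral_centroid=1350.0,
    harmonic_spectral_deviation=0.05,
    harmonic_spectral_spread=0.4,
    harmonic_spectral_variation=0.02,
    spectral_centroid=1400.0,
    temporal_centroid=0.3,
)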
5.1.3 HarmonicInstrumentTimbre

Figure 5.2 shows the HarmonicInstrumentTimbre. It holds the following set of timbre descriptors to describe the timbre perception among sounds belonging to the harmonic sound family, see (ISO, 2001a). A small sketch of a distance measure built on these descriptors follows the list:
• LogAttackTime (LAT), the LogAttackTime descriptor, see Section 2.7.2.
• HarmonicSpectralCentroid (HSC), the HarmonicSpectralCentroid descriptor, see Section 2.7.5.
Figure 5.2 The HarmonicInstrumentTimbre (from Manjunath et al., 2002)
• HarmonicSpectralDeviation (HSD), the HarmonicSpectralDeviation descriptor, see Section 2.7.6.
• HarmonicSpectralSpread (HSS), the HarmonicSpectralSpread descriptor, see Section 2.7.7.
• HarmonicSpectralVariation (HSV), the HarmonicSpectralVariation descriptor, see Section 2.7.8.
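A small sketch shows how a distance measure over these descriptors can support query-by-example retrieval. The min–max normalisation ranges and the unit weights below are assumptions for illustration, not the normative MPEG-7 harmonic timbre distance:

import math

FEATURES = ("LAT", "HSC", "HSD", "HSS", "HSV")

# Placeholder weights and value ranges; in a real system they would be tuned
# or derived from perceptual experiments.
WEIGHTS = {"LAT": 1.0, "HSC": 1.0, "HSD": 1.0, "HSS": 1.0, "HSV": 1.0}
RANGES = {"LAT": (-4.0, 0.0), "HSC": (0.0, 5000.0), "HSD": (0.0, 1.0),
          "HSS": (0.0, 1.0), "HSV": (0.0, 1.0)}

def timbre_distance(a: dict, b: dict) -> float:
    """Weighted Euclidean distance between two harmonic timbre descriptions."""
    total = 0.0
    for f in FEATURES:
        lo, hi = RANGES[f]
        da = (a[f] - lo) / (hi - lo)    # scale each descriptor to [0, 1]
        db = (b[f] - lo) / (hi - lo)
        total += WEIGHTS[f] * (da - db) ** 2
    return math.sqrt(total)

def rank_by_similarity(query: dict, database: dict) -> list:
    """Query by example: database item names ordered by increasing distance."""
    return sorted(database, key=lambda name: timbre_distance(query, database[name]))

In a QBE scenario the descriptors extracted from the sample sound form the query vector, and the items of the music database are ranked by increasing distance.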
Figure 5.3 The PercussiveInstrumentTimbre (from Manjunath et al., 2002)
5.1.4 PercussiveInstrumentTimbre
The PercussiveInstrumentTimbre depicted in Figure 5.3 can describe impulsive sounds without any harmonic portions. To this end it includes:
• LogAttackTime (LAT), the LogAttackTime descriptor, see Section 2.7.2.
• SpectralCentroid (SC), the SpectralCentroid descriptor, see Section 2.7.9.
• TemporalCentroid (TC), the TemporalCentroid descriptor, see Section 2.7.3.
5.2 MELODY

The MPEG-7 Melody DS provides a rich representation for monophonic melodic information to facilitate efficient, robust and expressive melodic similarity matching.

The term melody denotes a series of notes or a succession, not a simultaneity as in a chord, see (Wikipedia, 2001). However, this succession must contain change of some kind and be perceived as a single entity (possibly gestalt) to be called a melody. More specifically, this includes patterns of changing pitches and durations, while more generally it includes any interacting patterns of changing events or quality.
What is called a “melody” depends greatly on the musical genre. Rock music and folk songs tend to concentrate on one or two melodies, verse and chorus. Much variety may occur in phrasing and lyrics. In western classical music, composers often introduce an initial melody, or theme, and then create variations. Classical music often has several melodic layers, called polyphony, such as those in a fugue, a type of counterpoint. Often melodies are constructed from motifs or short melodic fragments, such as the opening of Beethoven’s Ninth Symphony. Richard Wagner popularized the concept of a leitmotif: a motif or melody associated with a certain idea, person or place.
For jazz music a melody is often understood as a sketch and widely changed by the musicians. It is more understood as a starting point for improvization. Indian classical music relies heavily on melody and rhythm, and not so much on harmony as the above forms. A special problem arises for styles like Hip Hop and Techno. This music often presents no clear melody and is more related to rhythmic issues. Moreover, rhythm alone is enough to picture a piece of music, e.g. a distinct percussion riff, as mentioned in (Manjunath et al., 2002). Jobim’s famous “One Note Samba” is a nice example where the melody switches between pure rhythmical and melodic features.
5.2.1 Melody
The structure of the MPEG-7 Melody is depicted in Figure 5.4. It contains information about meter, scale and key of the melody. The representation of the melody itself resides inside either the field MelodyContour or MelodySequence.

Figure 5.4 The MPEG-7 Melody (from Manjunath et al., 2002)
Besides the optional field Header there are the following entries:
• Meter: the time signature is held in the Meter (optional).
• Scale: in this array the intervals representing the scale steps are held (optional).
• Key: a container containing degree, alteration and mode (optional).
• MelodyContour: a structure of MelodyContour (choice).
• MelodySequence: a structure of MelodySequence (choice).
All these fields and necessary MPEG-7 types will be described in more detail in the following sections.
5.2.2 Meter
The field Meter contains the time signature. It specifies how many beats are in each bar and which note value constitutes one beat. This is done using a fraction: the numerator holds the number of beats in a bar, the denominator contains the length of one beat. For example, for the time signature 3/4 each bar contains three quarter notes. The most common time signatures in western music are 4/4, 3/4 and 2/4.

The time signature also gives information about the rhythmic subdivision of each bar, e.g. a 4/4 meter is stressed on the first and third beat of each bar by convention. For unusual rhythmical patterns in music, complex signatures like 3 + 2 + 3/8 are given. Note that this cannot be represented exactly by MPEG-7 (see the example below).
Figure 5.5 The MPEG-7 Meter (from Manjunath et al., 2002)
The Meter is shown in Figure 5.5. It is defined by:
• Numerator: contains values from 1 to 128.
• Denominator: contains powers of 2, from 2^0 to 2^7, i.e. 1, 2, 4, …, 128.

In MPEG-7, complex signatures like 3 + 2 + 3/8 have to be defined in a simplified manner like 8/8, as in the sketch below.
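As a rough illustration (a sketch with assumed field names, not the MPEG-7 DDL definition), a Meter value can be modelled as a numerator/denominator pair with exactly these constraints:

from dataclasses import dataclass

@dataclass
class Meter:
    """Time signature as held by the MPEG-7 Meter."""
    numerator: int    # 1..128, number of beats per bar
    denominator: int  # power of two between 2^0 and 2^7, i.e. 1..128

    def __post_init__(self):
        if not 1 <= self.numerator <= 128:
            raise ValueError("numerator must be in 1..128")
        if self.denominator not in (1, 2, 4, 8, 16, 32, 64, 128):
            raise ValueError("denominator must be a power of two up to 128")

waltz = Meter(3, 4)          # a 3/4 time signature
approximated = Meter(8, 8)   # 3 + 2 + 3/8 can only be stored simplified as 8/8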
5.2.3 Scale

The Scale descriptor contains a list of intervals representing the scale steps that divide the octave. The intervals result in a list of frequencies giving the pitches of the single notes of the scale. In traditional western music, scales consist of seven notes, made up of a root note and six other scale degrees whose pitches lie between the root and its first octave. Notes in the scale are separated by whole and half step intervals of tones and semitones, see (Wikipedia, 2001). There are a number of different types of scales commonly used in western music, including major, minor, chromatic, modal, whole tone and pentatonic scales. There are also synthetic scales like the diminished scales (also known as octatonic), the altered scale, the Spanish and Jewish scales, or the Arabic scale.
The relative pitches of individual notes in a scale may be determined by one of a number of tuning systems. Nowadays, in most western music, the equal temperament is the most common tuning system. Starting with a pitch at F0, the pitch of note n can be calculated using:

fn = F0 · 2^(n/12)     (5.3)

Figure 5.6 The MPEG-7 Scale. It is a simple vector of float values (from Manjunath et al., 2002)
Also mentioned in the MPEG-7 standard is the Bohlen–Pierce (BP) scale, a non-traditional scale containing 13 notes. It was independently developed in 1972 by Heinz Bohlen, a microwave electronics and communications engineer, and later by John Robinson Pierce, also a microwave electronics and communications engineer! See the examples for more details.
The information of the Scale descriptor may be helpful for reference purposes. The structure of the Scale is a simple vector of floats as shown in Figure 5.6:
• Scale: the vector contains the parameter n of Equation (5.3). Using the whole numbers 1–12 results in the equal-tempered chromatic scale, which is also the default of the Scale vector. If the frequencies fn of the pitches building a scale are given, the values scale(n) of the Scale vector can be calculated using:

scale(n) = 12 · log2(fn / F0)
The default scale, the equal-tempered chromatic scale, is thus simply represented as the vector 1, 2, 3, …, 12.
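To make the two formulas concrete, here is a minimal sketch (plain Python with an assumed reference pitch; any root note would do) that converts between scale step values and frequencies under equal temperament:

import math

F0 = 261.63   # assumed reference pitch in Hz (middle C)

def pitch_of_step(n: float, f0: float = F0) -> float:
    """Equation (5.3): frequency of scale step n above the reference pitch."""
    return f0 * 2.0 ** (n / 12.0)

def scale_value(fn: float, f0: float = F0) -> float:
    """Inverse mapping: the Scale vector entry for a note of frequency fn."""
    return 12.0 * math.log2(fn / f0)

# The default (equal-tempered chromatic) scale is just the whole numbers 1..12.
chromatic = [float(n) for n in range(1, 13)]
freqs = [pitch_of_step(n) for n in chromatic]
recovered = [round(scale_value(f), 6) for f in freqs]
assert recovered == chromatic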