P1: OTE/SPH P2: OTE
SVNY295-Petrushin October 18, 2006 15:30
Multimedia Data Mining and Knowledge Discovery
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2006924373
ISBN-10: 1-84628-436-8 Printed on acid-free paper
ISBN-13: 978-1-84628-436-6
© Springer-Verlag London Limited 2007
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
Whilst we have made considerable efforts to contact all holders of copyright material contained in this book, we may have failed to locate some of them. Should holders wish to contact the Publisher, we will be happy to come to some arrangement with them.
9 8 7 6 5 4 3 2 1
Springer Science+Business Media
springer.com
Contents
Preface xvii
List of Contributors xix
Part I Introduction
1 Introduction into Multimedia Data Mining and Knowledge Discovery 3
Valery A. Petrushin
1.1 What Is Multimedia Data Mining? 3
1.2 Who Does Need Multimedia Data Mining? 5
1.3 What Shall We See in the Future? 8
1.4 What Can You Find in This Book? 8
References 12
2 Multimedia Data Mining: An Overview 14
Nilesh Patel and Ishwar Sethi
2.1 Introduction 14
2.2 Multimedia Data Mining Architecture 15
2.3 Representative Features for Mining 18
2.3.1 Feature Fusion 21
2.4 Supervised Concept Mining 21
2.4.1 Annotation by Classification 21
2.4.2 Annotation by Association 23
2.4.3 Annotation by Statistical Modeling 24
2.5 Concept Mining Through Clustering 25
2.6 Concept Mining Using Contextual Information 27
2.7 Events and Feature Discovery 29
2.8 Conclusion 33
References 33
Part II Multimedia Data Exploration and Visualization
3 A New Hierarchical Approach for Image Clustering 41
Lei Wang and Latifur Khan
3.1 Introduction 41
3.2 Related Works 42
3.3 Hierarchy Construction and Similarity Measurement 43
3.3.1 Object Clustering 44
3.3.2 Vector Model for Images 47
3.3.3 Dynamic Growing Self-Organizing Tree (DGSOT) Algorithm 47
3.4 Experiment Results 52
3.5 Conclusion and Future Works 54
References 55
4 Multiresolution Clustering of Time Series and Application to Images 58
Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos
4.1 Introduction 58
4.2 Background and Related Work 59
4.2.1 Background on Clustering 59
4.2.2 Background on Wavelets 61
4.2.3 Background on Anytime Algorithms 62
4.2.4 Related Work 62
4.3 Our Approach: the ik-means Algorithm 62
4.3.1 Experimental Evaluation on Time Series 64
4.3.2 Data Sets and Methodology 65
4.3.3 Error of Clustering Results 65
4.3.4 Running Time 68
4.4 ik-means Algorithm vs k-means Algorithm 69
4.5 Application to Images 71
4.5.1 Clustering Corel Image Data sets 74
4.5.2 Clustering Google Images 75
4.6 Conclusions and Future Work 77
Acknowledgments 77
References 77
5 Mining Rare and Frequent Events in Multi-camera Surveillance Video 80
Valery A. Petrushin
5.1 Introduction 80
5.2 Multiple Sensor Indoor Surveillance Project 82
5.3 Data Collection and Preprocessing 83
5.4 Unsupervised Learning Using Self-Organizing Maps 86
5.4.1 One-Level Clustering Using SOM 86
5.4.2 Two-Level Clustering Using SOM 89
5.4.3 Finding Unusual Events 90
5.5 Visualization Tool 91
5.6 Summary 92
References 92
6 Density-Based Data Analysis and Similarity Search 94
Stefan Brecheisen, Hans-Peter Kriegel, Peer Kröger, Martin Pfeifle, Matthias Schubert, and Arthur Zimek
6.1 Introduction 94
6.2 Hierarchical Clustering 96
6.3 Application Ranges 98
6.3.1 Data Analysis 98
6.3.2 Navigational Similarity Search 100
6.4 Cluster Recognition for OPTICS 100
6.4.1 Recent Work 101
6.4.2 Gradient Clustering 102
6.4.3 Evaluation 106
6.5 Extracting Cluster Hierarchies for Similarity Search 108
6.5.1 Motivation 108
6.5.2 Basic Definitions 109
6.5.3 Algorithm 110
6.5.4 Choice of ε in the i-th Iteration 112
6.5.5 The Extended Prototype CLUSS 113
6.6 Conclusions 114
References 114
7 Feature Selection for Classification of Variable Length Multiattribute Motions 116
Chuanjun Li, Latifur Khan, and Balakrishnan Prabhakaran
7.1 Introduction 116
7.2 Related Work 118
7.3 Background 120
7.3.1 Support Vector Machines 120
7.3.2 Singular Value Decomposition 121
7.4 Feature Vector Extraction Based on SVD 123
7.4.1 SVD Properties of Motion Data 123
7.4.2 Feature Vector Extraction 125
7.5 Classification of Feature Vectors Using SVM 127
7.6 Performance Evaluation 128
7.6.1 Hand Gesture Data Generation 128
7.6.2 Motion Capture Data Generation 128
7.6.3 Performance Evaluation 129
7.6.4 Discussion 133
7.7 Conclusion 135
Acknowledgments 136
References 136
Part III Multimedia Data Indexing and Retrieval
8 FAST: Fast and Semantics-Tailored Image Retrieval 141
Ruofei Zhang and Zhongfei (Mark) Zhang
8.1 Introduction 141
8.2 Fuzzified Feature Representation and Indexing Scheme 144
8.2.1 Image Segmentation 144
8.2.2 Fuzzy Color Histogram for Each Region 146
8.2.3 Fuzzy Representation of Texture and Shape for Each Region 147
8.2.4 Region Matching and Similarity Determination 148
8.3 Hierarchical Indexing Structure and HEAR Online Search 150
8.4 Addressing User’s Subjectivity Using ITP and ARWU 153
8.5 Experimental Evaluations 157
8.6 Conclusions 165
References 165
9 New Image Retrieval Principle: Image Mining and Visual Ontology 168
Marinette Bouet and Marie-Aude Aufaure
9.1 Introduction 168
9.2 Content-Based Retrieval 170
9.2.1 Logical Indexation Process 171
9.2.2 Retrieval Process 172
9.3 Ontology and Data Mining Against Semantics Lack in Image Retrieval 173
9.3.1 Knowledge Discovery in Large Image Databases 174
9.3.2 Ontologies and Metadata 175
9.4 Toward Semantic Exploration of Image Databases 176
9.4.1 The Proposed Architecture 176
9.4.2 First Experimentations 179
9.5 Conclusion and Future Work 181
References 182
10 Visual Alphabets: Video Classification by End Users 185
Menno Israël, Egon L. van den Broek, Peter van der Putten, and Marten J. den Uyl
10.1 Introduction 185
10.2 Overall Approach 186
10.2.1 Scene Classification Procedure 187
10.2.2 Related Work 187
10.2.3 Positioning the Visual Alphabet Method 189
10.3 Patch Features 189
10.3.1 Distributed Color Histograms 190
10.3.2 Histogram Configurations 190
10.3.3 Human Color Categories 191
10.3.4 Color Spaces 191
10.3.5 Segmentation of the HSI Color Space 192
10.3.6 Texture 193
10.4 Experiments and Results 194
10.4.1 Patch Classification 195
10.4.2 Scene Classification 196
10.5 Discussion and Future Work 197
10.6 Applications 198
10.6.1 Vicar 199
10.6.2 Porn Filtering 200
10.6.3 Sewer Inspection 201
10.7 Conclusion 203
Acknowledgments 203
References 203
Part IV Multimedia Data Modeling and Evaluation
11 Cognitively Motivated Novelty Detection in Video Data Streams 209
James M. Kang, Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Roger Gaborski
11.1 Introduction 209
11.2 Related Work 211
11.2.1 Video Streams 211
11.2.2 Image Novelty 212
11.2.3 Clustering Novelty in Video Streams 212
11.2.4 Event vs Novelty Clustering 213
11.3 Implementation 213
11.3.1 Machine-Based Process 213
11.3.2 Human-Based System 217
11.3.3 Indexing and Clustering of Novelty 220
11.3.4 Distance Metrics 223
11.4 Results 225
11.4.1 Clustering and Indexing of Novelty 225
11.4.2 Human Novelty Detection 228
11.4.3 Human vs Machine 228
11.5 Discussion 229
11.5.1 Issues and Ideas 229
11.5.2 Summary 231
Acknowledgments 231
References 231
12 Video Event Mining via Multimodal Content Analysis and Classification 234
Min Chen, Shu-Ching Chen, Mei-Ling Shyu, and Chengcui Zhang
12.1 Introduction 234
12.2 Related Work 236
12.3 Goal Shot Detection 238
12.3.1 Instance-Based Learning 238
12.3.2 Multimodal Analysis of Soccer Video Data 239
12.3.3 Prefiltering 248
12.3.4 Nearest Neighbor with Generalization (NNG) 251
12.4 Experimental Results and Discussions 252
12.4.1 Video Data Source 252
12.4.2 Video Data Statistics and Feature Extraction 253
12.4.3 Video Data Mining for Goal Shot Detection 254
12.5 Conclusions 255
Acknowledgments 256
References 256
13 Identifying Mappings in Hierarchical Media Data 259
K. Selçuk Candan, Jong Wook Kim, Huan Liu, Reshma Suvarna, and Nitin Agarwal
13.1 Introduction 259
13.1.1 Integration of RDF-Described Media Resources 259
13.1.2 Matching Hierarchical Media Objects 260
13.1.3 Problem Statement 261
13.1.4 Our Approach 262
13.2 Related Work 262
13.3 Structural Matching 264
13.3.1 Step I: Map Both Trees Into Multidimensional Spaces 265
13.3.2 Step II: Compute Transformations to Align the Common Nodes of the Two Trees in a Shared Space 266
13.3.3 Step III: Use the Identified Transformations to Position the Uncommon Nodes in the Shared Space 272
13.3.4 Step IV: Relate the Nodes from the Two Trees in the Shared Space 272
13.4 Experimental Evaluation 272
13.4.1 Synthetic and Real Data 273
13.4.2 Evaluation Strategy 275
13.4.3 Experiment Synth1–Label Differences 276
13.4.4 Experiment Synth2-Structural Differences 277
13.4.5 Experiment Real1: Treebank Collection 279
13.4.6 Execution Time 280
13.4.7 Synth3: When the Corresponding Nodes in the Two Trees Match Imperfectly 282
13.4.8 Synth4: Many-to-Many Correspondences Between Nodes 282
13.4.9 Execution Time with Fuzzy, Many-to-Many Mappings 286
13.4.10 Real2: Experiments with the CNN Data 286
13.5 Conclusions 286
Acknowledgments 287
References 287
14 A Novel Framework for Semantic Image Classification and Benchmark Via Salient Objects 291
Yuli Gao, Hangzai Luo, and Jianping Fan
14.1 Introduction 291
14.2 Image Content Representation Via Salient Objects 292
14.3 Salient Object Detection 294
14.4 Interpretation of Semantic Image Concepts 298
14.5 Performance Evaluation 302
14.6 Conclusions 304
References 305
15 Extracting Semantics Through Dynamic Context 307
Xin Li, William Grosky, Nilesh Patel, and Farshad Fotouhi
15.1 Introduction 307
15.2 Related Work 308
15.3 System Architecture 309
15.4 Segmentation 310
15.4.1 Chi-Square Method 310
15.4.2 Kruskal–Wallis Method 314
15.5 Extracting Semantics 316
15.5.1 Image Resegmentation 317
15.6 Experimental Results 321
15.7 Supporting MPEG-7 322
15.8 Conclusion 322
References 324
16 Mining Image Content by Aligning Entropies with an Exemplar 325
Clark F. Olson
16.1 Introduction 325
16.2 Related Work 327
16.3 Matching with Entropy 328
16.4 Toward Efficient Search 330
16.5 Results 332
16.6 Large Image Databases 333
16.7 Future Work 336
16.7.1 Tracking 336
16.7.2 Grouping 337
16.8 Summary 337
Acknowledgments 338
References 338
17 More Efficient Mining Over Heterogeneous Data Using Neural Expert Networks 340
Sergio A. Alvarez, Carolina Ruiz, and Takeshi Kawato
17.1 Introduction 340
17.1.1 Scope of the Chapter 340
17.1.2 Related Work 341
17.1.3 Outline of the Chapter 342
17.2 Artificial Neural Networks 343
17.2.1 Network Topologies 343
17.2.2 Network Training 345
17.3 Efficiency and Expressiveness 346
17.3.1 Time Complexity of Training 346
17.3.2 Expressive Power 347
17.4 Experimental Evaluation 350
17.4.1 Web Images Data 351
17.4.2 Networks 352
17.4.3 Performance Metrics 354
17.4.4 Evaluation Protocol 354
17.5 Results 355
17.5.1 Classification Performance 355
17.5.2 Time Efficiency 356
17.5.3 Discussion 356
17.5.4 Additional Experimental Results 356
17.6 Conclusions and Future Work 359
References 360
18 A Data Mining Approach to Expressive Music Performance Modeling 362
Rafael Ramirez, Amaury Hazan, Esteban Maestre, and Xavier Serra
18.1 Introduction 362
18.2 Melodic Description 363
18.2.1 Algorithms for Feature Extraction 363
18.2.2 Low-Level Descriptors Computation 363
18.2.3 Note Segmentation 366
18.2.4 Note Descriptor Computation 366
18.3 Expressive Performance Knowledge Induction 366
18.3.1 Training Data 367
18.3.2 Musical Analysis 367
18.3.3 Data Mining Techniques 368
18.3.4 Results 373
18.4 Expressive Melody Generation 375
18.5 Related Work 376
18.6 Conclusion 378
Acknowledgments 378
References 379
Part V Applications and Case Studies
19 Supporting Virtual Workspace Design Through Media Mining and Reverse Engineering 383
Simeon J. Simoff and Robert P. Biuk-Aghai
19.1 Introduction 383
19.2 Principles of the Approach Toward Reverse Engineering of Processes Using Data Mining 388
19.2.1 The Information Pyramid Formalism 388
19.2.2 Integrated Collaboration Data and Data Mining Framework 391
19.3 Method for Reverse Engineering of Processes 392
19.4 Example of Reverse Engineering of Knowledge-Intensive Processes from Integrated Virtual Workspace Data 394
19.4.1 Task Analysis 397
19.4.2 Process Analysis 398
19.4.3 Temporal Analysis 401
19.5 Conclusions and Future Work 402
Acknowledgments 403
References 403
20 A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos 404
Kimiaki Shirahama, Koichi Ideno, and Kuniaki Uehara
20.1 Introduction 404
20.2 Related Works 406
20.2.1 Video Data Mining 406
20.2.2 Sequential Pattern Mining 408
20.3 Raw Level Metadata 409
20.4 Time-Constrained Sequential Pattern Mining 412
20.4.1 Formulation 412
20.4.2 Mining Algorithm 414
20.4.3 Parallel Algorithm 418
20.5 Experimental Results 419
20.5.1 Evaluations of Assigning Categorical Values of SM 420
20.5.2 Evaluation of Semantic Event Boundary Detections 420
20.5.3 Evaluations of Extracted Semantic Patterns 421
20.6 Conclusion and Future Works 424
Acknowledgments 424
References 424
21 Multiple-Sensor People Localization in an Office Environment 427
Gang Wei, Valery A. Petrushin, and Anatole V. Gershman
21.1 Introduction 427
21.2 Environment 428
21.3 Related Works 431
21.4 Feature Extraction 433
21.4.1 Camera Specification 433
21.4.2 Background Modeling 434
21.4.3 Visual Feature Extraction 436
21.4.4 People Modeling 437
21.5 People Localization 438
21.5.1 Sensor Streams 438
21.5.2 Identification and Tracking of Objects 439
21.6 Experimental Results 444
21.7 Summary 446
References 446
22 Multimedia Data Mining Framework for Banner Images 448
Qin Ding and Charles Daniel
22.1 Introduction 448
22.2 Naïve Bayesian Classification 450
22.3 The Bayesian Banner Profiler Framework 450
22.3.1 GIF Image Attribute Extraction 451
22.3.2 Attribute Quantization Algorithm 452
22.3.3 Bayesian Probability Computation Algorithm 453
22.4 Implementation 454
22.5 Conclusions and Future Work 456
References 456
23 Analyzing User’s Behavior on a Video Database 458
Sylvain Mongy, Fatma Bouali, and Chabane Djeraba
23.1 Introduction 458
23.2 Related Work 460
23.2.1 Web Usage Mining 460
23.2.2 Video Usage Mining 461
23.3 Proposed Approach 462
23.3.1 Context 462
23.3.2 Gathering Data 463
23.3.3 Modeling User’s Behavior: A Two-Level Based Model 464
23.4 Experimental Results 468
23.4.1 Creation of the Test Data Sets 468
23.4.2 Exploiting the Intravideo Behavior 468
23.4.3 Multiple Subsequence Cluster Modeling 469
23.5 Future work 470
References 470
24 On SVD-Free Latent Semantic Indexing for Iris Recognition of Large Databases 472
Pavel Praks, Libor Machala, and Václav Snášel
24.1 Introduction 472
24.2 Image Retrieval Using Latent Semantic Indexing 473
24.2.1 Image Coding 473
24.2.2 Document Matrix Scaling 474
24.3 Implementation Details of Latent Semantic Indexing 476
24.3.1 LSI and Singular Value Decomposition 476
24.3.2 LSI1—The First Enhancement of the LSI Implementation 477
24.3.3 LSI2—The Second Enhancement of LSI 479
24.4 Iris Recognition Experiments 479
24.4.1 The Small-Size Database Query 480
24.4.2 The Large-Scale Database Query 481
24.4.3 Results of the Large-Scale Database Query 483
24.5 Conclusions 483
Acknowledgments 485
References 485
25 Mining Knowledge in Computer Tomography Image Databases 487
Daniela Stan Raicu
25.1 Introduction 487
25.2 Mining CT Data: Classification, Segmentation, and Retrieval 489
25.2.1 Image Classification: Related Work 489
25.2.2 Image Segmentation: Related Work 490
25.2.3 Image Retrieval: Related Work 491
25.3 Materials and Methods 492
25.3.1 Image Database 492
25.3.2 Texture Features 493
25.3.3 Classification Model 497
25.3.4 Similarity Measures and Performance Evaluation 498
25.4 Experimental Results and Their Interpretation 501
25.4.1 Tissue Classification Results 501
25.4.2 Tissue Segmentation Results 502
25.4.3 Tissue Retrieval Results 504
25.5 Conclusions 505
References 506
Author Index 509
Subject Index 511
Preface
Today data mining efforts are going beyond databases to focus on data collected in fields like art, design, hypermedia, and digital media production, medical multimedia data analysis, and computational modeling of creativity, including evolutionary computation. These fields use a variety of data sources and structures, interrelated by the nature of the phenomenon. As a result, there is an increasing interest in new techniques and tools that can detect and discover patterns that can lead to new knowledge in the problem domain where the data have been collected.
There is also an increasing interest in the analysis of multimedia data generated by different distributed applications, like collaborative virtual environments, virtual communities, and multiagent systems. The data collected from such environments include records of the actions in them, audio and video recordings of meetings and collaborative sessions, a variety of documents that are part of the business process, asynchronous threaded discussions, transcripts from synchronous communications, and other data records. These heterogeneous multimedia data records require sophisticated
This book is based mostly on extended and updated papers that were presented at the two Multimedia Data Mining Workshops, MDM KDD 2003 and MDM KDD 2004, held in conjunction with the ACM SIGKDD Conference in Washington, DC, August 2003 and the ACM SIGKDD Conference in Seattle, WA, August 2004, respectively. The book also includes several invited surveys and papers. The book chapters give a snapshot of research and applied activities in multimedia data mining.
The editors are grateful to the founders and active supporters of the Multimedia Data Mining Workshop Series: Simeon Simoff, Osmar Zaiane, and Chabane Djeraba. We also thank the reviewers of the book papers for their well-done job and the organizers of the ACM SIGKDD Conferences for their support.
We thank Springer-Verlag’s employees Wayne Wheeler, who initiated the book project, Catherine Brett, and Frank Ganz for their help in coordinating the publication and for editorial assistance.
January 2006
List of Contributors
Fatma Bouali
LIFL UMR CNRS 8022
France
Fatma.Bouali@univ-lille2.fr
Marinette Bouet
LIMOS – UMR 6158 CNRS
Blaise Pascal University of Clermont-Ferrand II
Campus des Cézeaux
24, avenue des Landais
F-63173 Aubiere Cedex
France
Marinette.Bouet@cust.univ-bpclermont.fr
Stefan Brecheisen
Institute for Computer Science
University of Munich
Oettingenstr. 67, 80538 Munich, Germany
brecheis@dbs.ifi.lmu.de
Farshad Fotouhi
Department of Computer Science
Wayne State University
Detroit, MI 48202
USA
fotouhi@wayne.edu
Roger Gaborski
Rochester Institute of Technology
102 Lomb Memorial Drive
Rochester, NY 14623-5608
USA
rsg@cs.rit.edu
Yuli Gao
Dept. of Computer Science
University of North Carolina – Charlotte
Charlotte, NC 28223
USA
ygao@uncc.edu
anatole.v.gershman@accenture.com
Music Technology Group
Pompeu Fabra University
USA
takeshi@wpi.edu
Latifur Khan
Department of Computer Science
University of Texas at Dallas
75083 Richardson, Texas
USA
lkhan@utdallas.edu
Jong Wook Kim
Department of Computer Science and Engineering
Arizona State University
Tempe, AZ 82857
USA
Peer Kröger
Institute for Computer Science
University of Munich
Oettingenstr. 67, 80538 Munich, Germany
kroegerp@dbs.ifi.lmu.de
Chuanjun Li
Department of Computer Science
The University of Texas at Dallas
Richardson, Texas 75083
USA
chuanjun@utdallas.edu
Xin Li
Department of Computer Science
Oklahoma City University
Dept of Computer Science
University of North Carolina – Charlotte
Balakrishnan Prabhakaran
Department of Computer Science
University of Texas at Dallas,
Daniela Stan Raicu
Intelligent Multimedia Processing
Music Technology Group
Pompeu Fabra University
Computer Science Department
Worcester Polytechnic Institute
80538 Munich
Germany
schubert@dbs.ifi.lmu.de
Xavier Serra
Music Technology Group
Pompeu Fabra University
Ocata 1, 08003 Barcelona
Spain
simeon@it.uts.edu.au
Rochester Institute of Technology
102 Lomb Memorial Drive
Egon L van den Broek
Center for Telematics and Information
Technology (CTIT) and
Institute for Behavioral Research (IBR)
Ruofei Zhang
Computer Science Department
Watson School
SUNY Binghamton
Binghamton, NY 13902-6000
USA
rzhang@cs.binghamton.edu
Zhongfei (Mark) Zhang
Computer Science Department
Watson School
SUNY Binghamton
Binghamton, NY 13902-6000
USA
zhongfei@cs.binghamton.edu
zimek@dbs.ifi.lmu.de
P1: OTE/SPH P2: OTE
SVNY295-Petrushin November 1, 2006 16:9
Part I Introduction
1 Introduction into Multimedia Data Mining and Knowledge Discovery
Valery A. Petrushin
Summary. This chapter briefly describes the purpose and scope of multimedia data mining and knowledge discovery. It identifies industries that are major users or potential users of this technology, outlines the current state of the art and directions to go, and overviews the chapters collected in this book.
1.1 What Is Multimedia Data Mining?
The traditional definition of data mining is that it “is the process of automating information discovery” [1], which improves decision making and gives a company advantages on the market. Another definition is that it “is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” [2]. It is also assumed that the discovered patterns and rules are meaningful for business. Indeed, data mining is an applied discipline, which grew out of statistical pattern recognition, machine learning, and artificial intelligence and was coupled with business decision making to optimize and enhance it. Initially, data mining techniques were applied to structured data from databases. The term “knowledge discovery in databases,” which is currently obsolete, reflects this period. However, knowledge is the interpretation of data meanings, and knowledge discovery goes beyond finding simple patterns and correlations in data to identifying concepts and finding relationships. Knowledge-based modeling creates a consistent logical picture of the world. In recent years the term “predictive analytics” has been widely adopted in the business world [3].
On the one hand, growing computer power made data mining techniques affordable for small companies; on the other, the emergence of cheap massive memory and digital recording devices, such as scanners, microphones, cameras, and camcorders, allowed digitizing all kinds of corporate, governmental, and private documents. Many companies consider these electronic documents valuable assets and other sources of data for data mining. For example, e-mail messages from customers and recordings of telephone conversations between customers and operators could serve as valuable sources of knowledge about both customers’ needs and the quality of service. Unprecedented growing information on the World Wide Web made it
Recently, two branches of data mining, text data mining and Web data mining, have emerged [4, 5]. They have their own research agendas, communities of researchers, and supporting companies that develop technologies and tools. Unfortunately, multimedia data mining today is still in an embryonic state. This could be explained by immature technology, the high cost of storing and processing media data, and the absence of success stories that show the benefits and the high rate of return on investments in multimedia data mining.
To understand multimedia data mining in depth, let us consider its purpose and scope. First, let us describe what kinds of data belong to multimedia data. According to the MPEG-7 Standard [6], there are four types of multimedia data: audio data, which include sounds, speech, and music; image data (black-and-white and color images); video data, which include time-aligned sequences of images; and electronic or digital ink, which is a sequence of time-aligned 2D or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar device. All these data are generated by specific kinds of sensors.
Second, let us take a closer look at the term multimedia data mining. The word multimedia assumes that several data sources of different modalities are processed at the same time. This may or may not be the case. A data mining project can deal with only one modality of data, for example, customers’ audio recordings or surveillance video. It would be better to use the term media data mining instead, but the word media usually connotes mass media such as radio and television, which may or may not be the data source for the data mining project. The term sensor data mining extends the scope too far, covering such sensors as radars, speedometers, accelerometers, echo locators, thermometers, etc. This book is devoted to discussing the first three media types mentioned above, and we shall use the term multimedia data mining and the acronym MDM, keeping in mind the above discussion.
MDM’s primary purpose is to process media data, alone or in combination with other data, to find patterns useful for business, for example, analyzing customer traffic in a retail store using video recordings to find the optimal location for a new product display. Besides explicit data mining projects, MDM techniques can be used as a part of complex operational or manufacturing processes, for example, using images to find defective products or indexing a video database of a company’s meetings.
MDM is a part of multimedia technology, which covers the following areas [7, 8]:
• Media compression and storage
• Delivering streaming media over networks with required quality of service
• Media restoration, transformation, and editing
• Media indexing, summarization, search, and retrieval
• Creating interactive multimedia systems for learning/training and creative art production
• Creating multimodal user interfaces
The major challenge that MDM shares with multimedia retrieval is the so-called semantic gap, which is the difficulty of deriving a high-level concept such as “mountain landscape” or “abnormal customer behavior” from low-level features such as color histogram, homogeneous texture, contour-based shape, motion trajectory, etc., that are extracted from media data. A solution to this problem requires creating an ontology that covers different aspects of the concept and a hierarchy of recognizers that deduce the probability of the concept from the probabilities of its components and the relationships among them.
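As a rough illustration of such a hierarchy of recognizers, the following Python sketch combines component probabilities into a concept probability. It is not taken from the chapter: the `Concept` class, the "all"/"any" combination modes, the assumption that components are independent, and every numeric score are hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional


@dataclass
class Concept:
    """A node in a hypothetical concept hierarchy (illustrative only)."""
    name: str
    parts: List["Concept"] = field(default_factory=list)
    mode: str = "all"              # "all": every part is needed; "any": one part suffices
    score: Optional[float] = None  # leaf probability from a low-level recognizer

    def probability(self) -> float:
        # A leaf simply reports the score of its low-level recognizer.
        if not self.parts:
            if self.score is None:
                raise ValueError(f"leaf concept {self.name!r} has no recognizer score")
            return self.score
        probs = [part.probability() for part in self.parts]
        # Assumption: the parts are treated as statistically independent.
        if self.mode == "all":
            return prod(probs)
        if self.mode == "any":
            return 1.0 - prod(1.0 - p for p in probs)
        raise ValueError(f"unknown mode {self.mode!r}")


# Made-up recognizer outputs for a single image.
sky = Concept("sky", score=0.9)
peak = Concept("peak-shaped contour", score=0.8)
snow = Concept("snow texture", score=0.5)
rock = Concept("rock texture", score=0.6)

surface = Concept("snow or rock surface", parts=[snow, rock], mode="any")
landscape = Concept("mountain landscape", parts=[sky, peak, surface], mode="all")

print(round(landscape.probability(), 3))
```

A real system would replace the independence assumption with relationships encoded in the ontology, for example spatial constraints such as sky appearing above the peak.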
1.2 Who Does Need Multimedia Data Mining?
To answer the above question, we consider the application areas of MDM and the related industries and companies that are (potential) users of the technology. The following are the five major application areas of MDM:
Customer Insight. Customer insight includes collecting and summarizing information about customers’ opinions about products or services, customers’ complaints, customers’ preferences, and the level of customers’ satisfaction with products or services. All product manufacturers and service providers are interested in customer insight. Many companies have help desks or call centers that accept telephone calls from customers. These calls are recorded and stored. If an operator is not available, a customer may leave an audio message. Some organizations, such as banks, insurance companies, and communication companies, have offices where customers meet the company’s representatives. The conversations between customers and sales representatives can be recorded and stored. The audio data serve as an input for data mining to pursue the following goals:
• Topic detection: The speech from recordings is separated into turns, i.e., speech segments spoken by only one speaker. Then the turns are transcribed into text and keywords are extracted. The keywords are used for detecting topics and estimating how much time was spent on each topic. This information, aggregated by day, week, and month, gives an overview of hot topics and allows the management to plan future training, taking into account the emerging hot topics.
• Resource assignment: Call centers have a high turnover rate and only a small percentage of experienced operators, who are a valuable resource. When the call center collects messages, the problem is how to assign an operator who will call back. To solve the problem, the system transcribes the speech and detects the topic. Then it estimates the emotional state of the caller. If the caller is agitated, angry, or sad, it gives the message a higher priority so that an experienced operator responds to it. Based on the topic, the emotional state of the caller, and operators’ availability, the system assigns a proper operator to call back [9].
• Evaluation of quality of service: At a call center with thousands of calls per day, the evaluation of quality of service is a laborious task. It is done by people who listen to selected recordings and subjectively estimate the quality of service. Speech recognition in combination with emotion recognition can be used for automating the evaluation and making it more objective. Without going into
Currently, there are several small companies that provide tools and solutions for customer management for call centers. Some tools include procedures that are based on MDM techniques. The extension of this market is expected in the future.
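The topic detection and resource assignment goals described for call centers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the systems cited in the text: transcripts and emotion labels are assumed to be already produced by speech and emotion recognizers, and the keyword lists, the `Turn` and `Operator` classes, and the function names are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical keyword lists; a real system would derive them from transcripts.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "billing": {"invoice", "charge", "refund"},
    "outage": {"down", "outage", "disconnected"},
}
HIGH_PRIORITY_EMOTIONS = {"agitated", "angry", "sad"}


@dataclass
class Turn:
    speaker: str
    text: str       # transcript of one speaker turn (assumed given)
    seconds: float  # duration of the turn


@dataclass
class Operator:
    name: str
    experienced: bool
    skills: Set[str]
    available: bool = True


def detect_topic(text: str) -> str:
    """Pick the topic whose keyword list overlaps most with the transcript."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, hits = "other", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > hits:
            best, hits = topic, overlap
    return best


def time_per_topic(turns: List[Turn]) -> Dict[str, float]:
    """Estimate how much time was spent on each topic."""
    totals: Dict[str, float] = defaultdict(float)
    for turn in turns:
        totals[detect_topic(turn.text)] += turn.seconds
    return dict(totals)


def assign_operator(topic: str, emotion: str,
                    operators: List[Operator]) -> Optional[str]:
    """Prefer experienced operators for upset callers, then match the topic."""
    candidates = [op for op in operators if op.available]
    if emotion in HIGH_PRIORITY_EMOTIONS:
        experienced = [op for op in candidates if op.experienced]
        candidates = experienced or candidates
    skilled = [op for op in candidates if topic in op.skills]
    pool = skilled or candidates
    return pool[0].name if pool else None


turns = [
    Turn("customer", "My invoice shows a double charge", 12.0),
    Turn("operator", "I can issue a refund for that charge", 8.0),
    Turn("customer", "Also my line was down yesterday", 6.5),
]
operators = [
    Operator("Ann", experienced=True, skills={"billing"}),
    Operator("Bob", experienced=False, skills={"outage"}),
]

print(time_per_topic(turns))
print(assign_operator("outage", "angry", operators))
print(assign_operator("outage", "neutral", operators))
```

In this sketch an upset caller is routed to an experienced operator even when that operator lacks the matching skill; whether experience should outrank topic expertise is a design choice the text leaves open.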
Surveillance. Surveillance consists of collecting, analyzing, and summarizing audio, video, or audiovisual information about a particular area, such as battlefields, forests, agricultural areas, highways, parking lots, buildings, workshops, malls, retail stores, offices, homes, etc. [10]. Surveillance is often associated with intelligence, security, and law enforcement, and the major users of this technology are the military, police, and private companies that provide security services. The U.S. Government is supporting long-term research and development activities in this field by conducting the Video Analysis and Content Extraction (VACE) program, which is oriented toward developing technology for military and intelligence purposes. The National Institute of Standards and Technology has been conducting the annual evaluation of content-based retrieval in video (TRECVID) since 2000 [11]. However, many civilian companies use security services for protecting their assets and monitoring their employees, customers, and manufacturing processes. They can get valuable insight from mining their surveillance data. There are several goals of surveillance data mining:
• Object or event detection/recognition—The goal is to find an object in an image, in a sequence of images, or in a soundtrack that belongs to a certain class of objects or represents a particular instance. For example, detect a military vehicle in a satellite photo, detect a face in a sequence of frames taken by a camera that is watching an office, recognize the person whose face is detected, or recognize whether a sound represents music or speech. Another variant of the goal is to identify the state or attributes of the object. For example, classify an X-ray image of lungs as normal or abnormal, or identify the gender of a speaker. Finally, the goal can be the more complex one of detecting or identifying an event, which is a sequence of objects and relationships among objects. For example, detect a goal event in a soccer game, detect violence on the street, or detect a traffic accident. This goal is often a part of the high-level goals that are described below.
• Summarization—The goal is to aggregate data by summarizing activities in space and/or time. It covers summarizing activities of a particular object (for example, drawing a trajectory of a vehicle or indicating periods of time when a speaker was talking) or creating a “big picture” of activities that happened during some period of time. For example, assuming that a bank has a surveillance system that includes multiple cameras that watch tellers and ATMs indoors and outdoors, the goal is to summarize the activities that happened in the bank during 24 h. To meet the goal, unsupervised learning and visualization techniques are used. Summarization also serves as a prerequisite step for achieving another goal: finding frequent and rare events.
• Monitoring—The goal is to detect events and generate a response in real time. The major challenges are real-time processing and keeping false alarms to a minimum. Examples are monitoring areas with restricted access, monitoring a public place for threat detection, or monitoring elderly or disabled people at home.
Today we witness a real boom in video processing and video mining research and development [12–14]. However, only a few companies provide tools that have elements of data mining.
Media Production and Broadcasting—Proliferation of radio stations and TV channels makes broadcasting companies search for more efficient approaches for creating programs and monitoring their content. The MDM techniques are used to achieve the following goals:
• Indexing archives and creating new programs—A TV program maker uses raw footage called rushes and clips from commercial libraries called stockshot libraries to create a new program. A typical “shoot-to-show” ratio for a TV program is in the range from 20 to 40, which means that 20–40 hours of rushes go into one hour of a TV show. Many broadcasting companies have thousands of hours of rushes, which are poorly indexed and contain redundant, static, and low-value episodes. Using rushes for TV programs is inefficient. However, managers believe that rushes could be valuable if the MDM technology could help program makers to extract some “generic” episodes with high potential for reuse.
• Program monitoring and plagiarism detection—A company that pays for broadcasting its commercials is interested to know how many times the commercial has actually been aired. Some companies are also interested in how many times their logo has been viewed on TV during a sporting event. The broadcasting companies or independent third-party companies can provide such a service. Other goals are detecting plagiarism in, and unlicensed broadcasting of, music or video clips. This requires using MDM techniques for audio/video clip recognition [15] and robust object recognition [16].
Intelligent Content Service—According to Forrester Research, Inc. (Cambridge, MA), the Intelligent Content Service (ICS) is “a semantically smart content-centric set of software services that enhance the relationship between information workers and computing systems by making sense of content, recognizing context, and understanding the end user’s requests for information” [17]. Currently, in spite of many Web service providers’ efforts to extend their search services beyond basic keyword search to ICS, it is not yet available for multimedia data. This area will be the major battlefield among the Web search and service providers in the next 5 years. The MDM techniques can help to achieve the following goals:
• Indexing Web media and using advanced media search—It includes creating indexes of images and audio and video clips posted on the Web using audio and visual features; creating ontologies for concepts and events; implementing advanced search techniques, such as searching images or video clips by example or by sketch and searching music by humming; using Semantic Web techniques to infer the context and improve search [18]; and using context and user feedback to better understand the user’s intent.
• Advanced Web-based services—Taking into account the exponential growth of information on the Web, such services promise to be an indispensable part of
Knowledge Management—Many companies consider their archives of documents a valuable asset. They spend a lot of money to maintain their archives and provide employees with access to them. Besides text documents, these archives can contain drawings of designs, photos and other images, audio and video recordings of meetings, multimedia data for training, etc. The MDM approaches can provide the ICS for supporting knowledge management of companies.
1.3 What Shall We See in the Future?
As to the future development of the multimedia data mining field, I believe that we shall see essential progress during the next 5 years. The major driving force behind it is creating techniques for advanced Web search to provide intelligent content services to companies for supporting their knowledge management and business intelligence. It will require creating general and industry-specific ontologies, developing recognizers for entities of these ontologies, merging probabilistic and logical inferences, and using metadata and reasoning represented in different vocabularies and languages, such as MPEG-7 [6], MPEG-21 [20, 21], Dublin Core [22], RDF [23], SKOS [24], TGM [25], OWL [26], etc.
Another force, which is driven by funding mostly from governmental agencies, is video surveillance and mass media analysis for improving intelligence, military, and law enforcement decision making.
Mining audio and video recordings for customer insight and using MDM for improving broadcasting companies’ production will also grow and become more elaborate in the future.
From the viewpoint of types of media, most advances in video processing are expected in research and development for surveillance and broadcasting applications, audio processing will benefit from research for customer insight and from creating advanced search engines, and image processing will benefit from developing advanced search engines. As to electronic ink, I believe this type of multimedia data will be accumulated and used in data mining in the future, for example, for estimating the skill level of a user of an interactive tool or summarizing the contribution of each participant of a collaborative tool to a project.
1.4 What Can You Find in This Book?
As in any collection of creative material, the chapters of the book differ in style, topics, mathematical rigor, level of generality, and readiness for deployment. But altogether they build a mosaic of ongoing research in the fast-changing, dynamic field of multimedia data mining.
The book consists of five parts. The first part, which includes two chapters, gives an introduction to multimedia data mining. Chapter 1, which you are reading now, overviews multimedia data mining as an industry. Chapter 2 presents an overview of MDM techniques. It describes a typical architecture of MDM systems and covers approaches to supervised and unsupervised concept mining and event discovery.
The second part includes five chapters. It is devoted to multimedia data exploration and visualization. Chapter 3 deals with exploring images. It presents a clustering method based on unsupervised neural nets and self-organizing maps, which is called the dynamic growing self-organizing tree algorithm (DGSOT). The chapter shows that the suggested algorithm outperforms the traditional hierarchical agglomerative clustering algorithm.
Chapter 4 presents a multiresolution clustering of time series. It uses the Haar wavelet transform for representing time series at different resolutions and applies k-means clustering sequentially on each level, starting from the coarsest representation. The algorithm stops when two sequential membership assignments are equal or when it reaches the finest representation. The advantages of the algorithm are that it works faster and produces better clustering. The algorithm has been applied to clustering images, using color and texture features.
Chapter 5 describes a method for unsupervised classification of events in multicamera indoor surveillance video. The self-organizing map approach has been applied to event data for clustering and visualization. The chapter presents a tool for browsing clustering results, which allows exploring units of the self-organizing maps at different levels of hierarchy, clusters of units, and distances between units in 3D space for searching for rare events.
Chapter 6 presents density-based data analysis and similarity search. It introduces a visualization technique called the reachability plot, which allows visually exploring a data set in multiple representations and comparing multiple similarity models. The chapter also presents a new method for automatically extracting cluster hierarchies from a given reachability plot and describes a system prototype, which serves both for visual data analysis and for a new way of object retrieval called navigational similarity search.
Chapter 7 presents an approach to the exploration of variable-length multiattribute motion data captured by data glove sensors and 3-D human motion cameras. It suggests using the singular value decomposition technique to regularize multiattribute motion data of different lengths and classifying them with support vector machine classifiers. Classification of motion data using support vector machine classifiers is compared with classification by related similarity measures in terms of accuracy and CPU time.
The third part is devoted to multimedia data indexing and retrieval. It consists of three chapters. Chapter 8 focuses on developing an image retrieval methodology, which includes a new indexing method based on fuzzy logic, a hierarchical indexing structure, and the corresponding hierarchical elimination-based A∗ retrieval algorithm with logarithmic search complexity in the average case. It also deals with user relevance feedback to tailor the semantic retrieval to each user’s individualized query preferences.
Chapter 10 describes a methodology and a tool for end users that allow creating classifiers for images. The process consists of two stages: First, small image fragments called patches are classified. Second, frequency vectors of these patch classifications are fed into a second-level classifier for global scene classification (e.g., city, portrait, or countryside). The first-stage classifiers can be seen as a set of highly specialized feature detectors that define a domain-specific visual alphabet. The end user builds the second-level classifiers interactively by simply indicating positive examples of a scene. The scene classifier approach has been successfully applied to several problem domains, such as content-based video retrieval in television archives, automated sewer inspection, and pornography filtering.
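The two-stage scheme summarized for Chapter 10 can be illustrated with a small Python sketch. It assumes the first stage has already assigned a class label to every patch, and it stands in for the second-level classifier with a simple centroid-plus-threshold rule learned from positive examples only. That rule, the function names, and the `max_dist` value are assumptions made for illustration; the chapter's actual classifiers are not specified here.

```python
from collections import Counter


def patch_histogram(patch_labels, alphabet):
    """Relative frequency of each patch class (the visual alphabet) in one image."""
    counts = Counter(patch_labels)
    total = len(patch_labels) or 1
    return [counts[a] / total for a in alphabet]


def train_scene_classifier(positive_histograms):
    """Second level (assumed form): centroid of the user's positive examples."""
    n = len(positive_histograms)
    return [sum(col) / n for col in zip(*positive_histograms)]


def is_scene(histogram, centroid, max_dist=0.3):
    """Accept an image whose patch histogram lies close to the centroid."""
    dist = sum((h - c) ** 2 for h, c in zip(histogram, centroid)) ** 0.5
    return dist <= max_dist
```

The point of the sketch is the data flow: patch labels become a fixed-length frequency vector, so the second level never sees pixels, only the distribution of visual alphabet symbols.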
The fourth part is a collection of eight chapters that describe approaches to multimedia data modeling and evaluation. Chapter 11 presents an approach to automatic novelty detection in video streams and compares it experimentally to human performance. The evaluation of human versus machine-based novelty detection is quantified by metrics based on location of novel events, number of novel events, etc.
Chapter 12 presents an effective approach for event detection using both audio and visual features, with its application in the automatic extraction of goal events in soccer videos. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The approach uses the nearest neighbor classifier with a generalization scheme. The proposed approach has been tested using soccer videos of different styles that have been produced by different broadcasters.
Chapter 13 describes an approach for mining and automatically discovering mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in hierarchical data. It uses structure-based mining of relationships, which provides high degrees of precision. The approach works even when the mappings are imperfect, fuzzy, and many-to-many.
Chapter 14 proposes a new approach to high-level concept recognition in images using the salient objects as the semantic building blocks. The novel approach uses support vector machine techniques to achieve automatic detection of the salient objects, which serve as a basic visual vocabulary. Then a high-level concept is modeled using the Gaussian mixture model of weighted dominant components, i.e., salient objects. The chapter demonstrates the efficiency of the modeling approach, using the results of broad experiments obtained on images of nature.
Chapter 15 is devoted to a fundamental problem: image segmentation. It introduces a new MPEG-7 friendly system that integrates a user’s feedback with image segmentation and region recognition, supporting the user in the extraction of image region semantics in highly dynamic environments. The results obtained for aerial photos are provided and discussed.
Chapter 16 describes techniques for query by example in an image database, when the exemplar image can have a different position and scale in the target image and/or be captured by a different sensor. The approach matches images against the exemplar by comparing the local entropies in the images at corresponding positions. It employs a search strategy that combines sampling in the space of exemplar positions, the Fast Fourier Transform for efficiently evaluating object translations, and iterative optimization for pose refinement. The techniques are applied to matching exemplars with real images such as aerial and ground reconnaissance photos. Strategies for scaling this approach to multimedia databases are also described.
Chapter 17 presents a neural experts architecture that enables faster neural network training for datasets that can be decomposed into loosely interacting sets of attributes. It describes the expressiveness of this architecture in terms of functional composition. The experimental results show that the proposed neural experts architecture can achieve classification performance that is statistically identical to that of a fully connected feedforward neural network, while significantly improving training efficiency.
Chapter 18 describes a methodology that allows building models of expressive music performance. The methodology consists of three stages. First, acoustic features are extracted from recordings of both expressive and neutral musical performances. Then, using data mining techniques, a set of transformation rules is derived from performance data. Finally, the rules are applied to a description of an inexpressive melody to synthesize an expressive monophonic melody in MIDI or audio format. The chapter describes, explores, and compares different data mining techniques for creating expressive transformation models.
Finally, the fifth part unites seven chapters that describe case studies and applications. Chapter 19 presents a new approach for supporting design and redesign of virtual collaborative workspaces, based on combining integrated data mining techniques for refining the lower-level models with a reverse engineering cycle to create upper-level models. The methodology is based on a new model of vertical information integration related to virtual collaboration, which is called the information pyramid of virtual collaboration.
Chapter 20 presents a time-constrained sequential pattern mining method for extracting patterns associated with semantic events in produced videos. The video is separated into shots, and 13 streams of metadata are extracted for each shot. The metadata cover not only such shot attributes as duration, low-level color, texture, shape and motion features, and sound volume, but also advanced attributes such as the presence of weapons in the shot and sound type, i.e., silence, speech, or music. All metadata are represented as quantized values forming finite alphabets for each stream. Data mining techniques are applied to the streams of metadata to derive patterns that represent semantic events.
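To make the notion of a time-constrained sequential pattern concrete, here is a deliberately small Python sketch. It works on a single quantized stream with one symbol per shot and counts ordered symbol pairs whose second element occurs at most `max_gap` shots after the first, keeping the pairs that reach `min_support`. The function name, the pair-only pattern length, and both parameters are assumptions for illustration; the chapter's method handles 13 streams and is not reproduced here.

```python
from collections import Counter


def frequent_pairs(stream, max_gap, min_support):
    """Ordered pairs (a, b) with b at most `max_gap` shots after a.

    `stream` is a list with one quantized symbol per shot. Returns a dict
    mapping each pair that occurs at least `min_support` times to its count.
    """
    counts = Counter()
    for i, a in enumerate(stream):
        # Only look ahead within the allowed time window.
        for j in range(i + 1, min(i + 1 + max_gap, len(stream))):
            counts[(a, stream[j])] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For example, on the sound-type stream `["speech", "music", "silence", "speech", "music", "speech", "music"]` with `max_gap=1` and `min_support=2`, the only surviving pattern is speech followed by music. Several streams could be handled in the same way by using a tuple of per-stream symbols as the symbol for each shot.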
Chapter 21 describes an approach for people localization and tracking in an office environment using a sensor network that consists of video cameras, infrared tag readers, a fingerprint reader, and a pan-tilt-zoom camera. The approach is based on a Bayesian framework that uses noisy, but redundant data from multiple sensor streams
Chapter 22 presents an approach that allows estimating the potential attractiveness of a banner image based on its attributes. A banner image is an advertisement that is designed to attract Web users to view and eventually to buy the advertised product or service. The approach uses a Bayesian classifier to predict the level of the click-through rates based on the features extracted from the banner image GIF file, such as image dimensions, color features, number of frames in an animated file, and the frames’ dynamic and chromatic features. The experimental results are discussed.
Chapter 23 presents an approach that allows deriving users’ profiles from their performance data when they are working with a video search engine. The approach suggests a two-level model. The goal of the first level is modeling and clustering a user’s behavior on a single video sequence (an intravideo behavior). The goal of the second level is modeling and clustering a user’s behavior on a set of video sequences (an intervideo behavior). The two-phase clustering algorithm is presented and experimental results are discussed.
Chapter 24 presents a method for automatic identification of a person by iris recognition. The method uses arrays of pixels extracted from a raster image of a human iris and searches for similar patterns using the latent semantic indexing approach. The comparison process is expedited by replacing the time-consuming singular value decomposition algorithm with the partial symmetric eigenproblem. The results of experiments using a real biometric data collection are discussed.
Chapter 25 deals with data mining of medical images. It presents the research results obtained for texture extraction, classification, segmentation, and retrieval of normal soft tissues in computer tomography images of the chest and abdomen. The chapter describes various data mining techniques, which allow identifying different tissues in images, segmenting and indexing images, and exploring different similarity measures for image retrieval. Experimental results for tissue segmentation, classification, and retrieval are presented.
References
1. Groth R. Data Mining. A Hands-On Approach for Business Professionals. Prentice Hall, Upper Saddle River, NJ, 1998.
2. Berry MJA, Linoff G. Data Mining Techniques for Marketing, Sales, and Customer Support. Wiley Computer Publishing, New York, 1997.
3. Agosta L. The future of data mining—Predictive analytics. IT View Report, Forrester Research. October 17, 2003. Available at: http://www.forrester.com/Research/LegacyIT/0,7208,32896,00.html
4. Berry MW (Ed.). Survey of Text Mining: Clustering, Classification, and Retrieval. Springer-Verlag, New York, 2004, 244 p.
5. Zhong N, Liu J, Yao Y (Eds.). Web Intelligence. Springer, New York, 2005, 440 p.
6. Manjunath BS, Salembier Ph, Sikora T (Eds.). Introduction to MPEG-7. Multimedia Content Description Interface. Wiley, New York, 2002, 371 p.
7. Maybury MT. Intelligent Multimedia Information Retrieval. AAAI Press/MIT Press, Cambridge, MA, 1997, 478 p.
8. Furht B (Ed.). Handbook of Multimedia Computing. CRC Press, Boca Raton, FL, 1999, 971 p.
9. Petrushin V. Creating emotion recognition agents for speech signal. In Dautenhahn K, Bond AH, Canamero L, and Edmonds B (Eds.), Socially Intelligent Agents. Creating Relationships with Computers and Robots. Kluwer, Boston, 2002, pp. 77–84.
10. Remagnino P, Jones GA, Paragios N, Regazzoni CS. Video-based Surveillance Systems. Computer Vision and Distributed Processing. Kluwer, Boston, 2002, 279 p.
11. TRECVID Workshop website. http://www-nlpir.nist.gov/projects/trecvid/
12. Furht B, Marques O (Eds.). Handbook of Video Databases. Design and Applications. CRC Press, Boca Raton, FL, 2004, 1211 p.
13. Rosenfeld A, Doermann D, DeMenthon D. Video Mining. Kluwer, Boston, 2003, 340 p.
14. Zaiane OR, Simoff SJ, Djeraba Ch. Mining Multimedia and Complex Data. Lecture Notes in Artificial Intelligence, Vol. 2797. Springer, New York, 2003, 280 p.
15. Kulesh V, Petrushin VA, Sethi IK. Video clip recognition using joint audio-visual processing model. In Proceedings of 16th International Conference on Pattern Recognition
18. Stamou G, Kollias S. Multimedia Content and the Semantic Web. Wiley, New York, 2005, 392 p.
19. Kulesh V, Petrushin VA, Sethi IK. PERSEUS: Personalized multimedia news portal. In Proceedings of IASTED Intl. Conf. on Artificial Intelligence and Applications, September 4–7, 2001, Marbella, Spain, pp. 307–312.
20. Kosch H. Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, Boca Raton, FL, 2004, 258 p.
21. Bormans J, Hill K. MPEG-21 Multimedia framework. Overview. Available at: http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm
22. The Dublin Core Metadata Initiative website. http://dublincore.org/
23. Resource Description Framework (RDF). http://www.w3.org/RDF/
24. Simple Knowledge Organisation System (SKOS). http://www.w3.org/2004/02/skos/
25. Thesaurus for Graphic Materials (TGM). http://www.loc.gov/rr/print/tgm1/
26. OWL Web Ontology Language Overview. http://www.w3.org/TR/owl-features/
2 Multimedia Data Mining: An Overview
Nilesh Patel and Ishwar Sethi
Summary. Data mining has been traditionally applied to well-structured data. With the explosion of multimedia data (videos, audio, images, and Web pages), many researchers have in recent years felt the need for data mining methods that deal with unstructured data. This chapter provides an overview of data mining efforts aimed at multimedia data. We identify examples of pattern discovery models that have been addressed by different researchers and provide an overview of such methods.
The mining of multimedia data is more involved than that of traditional business data because multimedia data are unstructured by nature. There are no well-defined fields of data with precise and nonambiguous meaning, and the data must be processed to arrive at fields that can provide content information about it. Such processing often leads to nonunique results with several possible interpretations. In fact, multimedia data are often subject to varied interpretations even by human beings. For example, it is not uncommon for different experts, such as radiologists, to interpret the same image differently. Another difficulty in mining multimedia data is its heterogeneous nature. The data are often the result of outputs from various kinds of sensor modalities, with each modality needing its own way of processing. Yet another distinguishing aspect of multimedia data is its sheer volume. All these characteristics of multimedia data make mining it challenging and interesting.
The goal of this chapter is to survey the existing multimedia data mining methods and their applications. The organization of the chapter is as follows. In Section 2.2, we describe the basic data mining architecture for multimedia data and discuss aspects of data mining that are specific to multimedia data. Section 2.3 provides an overview of representative features used in multimedia data mining. It also discusses the issues of feature fusion. Section 2.4 describes multimedia data mining efforts for concept mining through supervised techniques. Methods for concept mining through clustering are discussed in Section 2.5. Section 2.6 discusses concept mining through the exploitation of contextual information. Event and feature discovery research is addressed in Section 2.7. Finally, a summary of the chapter is provided in Section 2.8.
2.2 Multimedia Data Mining Architecture
The typical data mining process consists of several stages, and the overall process is inherently interactive and iterative. The main stages of the data mining process are (1) domain understanding; (2) data selection; (3) cleaning and preprocessing; (4) discovering patterns; (5) interpretation; and (6) reporting and using discovered knowledge [1]. The domain understanding stage requires learning how the results of data mining will be used so as to gather all relevant prior knowledge before mining. Blind application of data mining techniques without the requisite domain knowledge often leads to the discovery of irrelevant or meaningless patterns. For example, while mining sports video for a particular sport, say cricket, it is important to have a good knowledge and understanding of the game to detect interesting strokes used by batsmen.
The data selection stage requires the user to target a database or select a subset of fields or data records to be used for data mining. A proper domain understanding at this stage helps in the identification of useful data. This is the most time-consuming stage of the entire data mining process for business applications; data are never clean and in the form suitable for data mining. For multimedia data mining, this stage is generally not an issue because the data are not in relational form and there are no subsets of fields to choose from.
The next stage in a typical data mining process is the preprocessing step, which involves integrating data from different sources and making choices about representing or coding certain data fields that serve as inputs to the pattern discovery stage. Such representation choices are needed because certain fields may contain data at levels of detail not considered suitable for the pattern discovery stage. The preprocessing stage is of considerable importance in multimedia data mining, given the unstructured nature of multimedia data.
The pattern discovery stage is the heart of the entire data mining process. It is the stage where the hidden patterns and trends in the data are actually uncovered. There are several approaches to the pattern discovery stage. These include association, classification, clustering, regression, time-series analysis, and visualization. Each of these approaches can be implemented through one of several competing methodologies, such as statistical data analysis, machine learning, neural networks, and pattern recognition. It is because of the use of methodologies from several disciplines that data mining is often viewed as a multidisciplinary field.
The interpretation stage of the data mining process is used to evaluate the quality of the discovery and its value, in order to determine whether previous stages should be revisited. Proper domain understanding is crucial at this stage to put a value on discovered patterns. The final stage of the data mining process consists of reporting and putting to use the discovered knowledge to generate new actions, products and services, or marketing strategies, as the case may be. An example of reporting for multimedia data mining is the scout system from IBM [2], in which the mined results are used by coaches to design new moves.
The architecture shown in Figure 2.1 captures the above stages of data mining in the context of multimedia data. The broken arrows on the left in Figure 2.1 indicate that the process is iterative. The arrows emanating from the domain knowledge block on the right indicate that domain knowledge guides certain stages of the mining process.