Multimedia Data Mining and Knowledge Discovery [Petrushin & Khan 2006-12-15]

P1: OTE/SPH P2: OTE

SVNY295-Petrushin October 18, 2006 15:30

Multimedia Data Mining and Knowledge Discovery

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2006924373

ISBN-10: 1-84628-436-8 Printed on acid-free paper

ISBN-13: 978-1-84628-436-6

© Springer-Verlag London Limited 2007

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Whilst we have made considerable efforts to contact all holders of copyright material contained in this book, we may have failed to locate some of them. Should holders wish to contact the Publisher, we will be happy to come to some arrangement with them.

9 8 7 6 5 4 3 2 1

Springer Science+Business Media

springer.com

Contents

Preface xvii

List of Contributors xix

Part I Introduction

1 Introduction into Multimedia Data Mining and Knowledge Discovery 3

Valery A. Petrushin

1.1 What Is Multimedia Data Mining? 3

1.2 Who Does Need Multimedia Data Mining? 5

1.3 What Shall We See in the Future? 8

1.4 What Can You Find in This Book? 8

References 12

2 Multimedia Data Mining: An Overview 14

Nilesh Patel and Ishwar Sethi

2.1 Introduction 14

2.2 Multimedia Data Mining Architecture 15

2.3 Representative Features for Mining 18

2.3.1 Feature Fusion 21

2.4 Supervised Concept Mining 21

2.4.1 Annotation by Classification 21

2.4.2 Annotation by Association 23

2.4.3 Annotation by Statistical Modeling 24

2.5 Concept Mining Through Clustering 25

2.6 Concept Mining Using Contextual Information 27

2.7 Events and Feature Discovery 29

2.8 Conclusion 33

References 33

Part II Multimedia Data Exploration and Visualization

3 A New Hierarchical Approach for Image Clustering 41

Lei Wang and Latifur Khan

3.1 Introduction 41

3.2 Related Works 42

3.3 Hierarchy Construction and Similarity Measurement 43

3.3.1 Object Clustering 44

3.3.2 Vector Model for Images 47

3.3.3 Dynamic Growing Self-Organizing Tree (DGSOT) Algorithm 47

3.4 Experiment Results 52

3.5 Conclusion and Future Works 54

References 55

4 Multiresolution Clustering of Time Series and Application to Images 58

Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos

4.1 Introduction 58

4.2 Background and Related Work 59

4.2.1 Background on Clustering 59

4.2.2 Background on Wavelets 61

4.2.3 Background on Anytime Algorithms 62

4.2.4 Related Work 62

4.3 Our Approach: the ik-means Algorithm 62

4.3.1 Experimental Evaluation on Time Series 64

4.3.2 Data Sets and Methodology 65

4.3.3 Error of Clustering Results 65

4.3.4 Running Time 68

4.4 ik-means Algorithm vs k-means Algorithm 69

4.5 Application to Images 71

4.5.1 Clustering Corel Image Data sets 74

4.5.2 Clustering Google Images 75

4.6 Conclusions and Future Work 77

Acknowledgments 77

References 77

5 Mining Rare and Frequent Events in Multi-camera Surveillance Video 80

Valery A. Petrushin

5.1 Introduction 80

5.2 Multiple Sensor Indoor Surveillance Project 82

5.3 Data Collection and Preprocessing 83

5.4 Unsupervised Learning Using Self-Organizing Maps 86

5.4.1 One-Level Clustering Using SOM 86

5.4.2 Two-Level Clustering Using SOM 89

5.4.3 Finding Unusual Events 90

5.5 Visualization Tool 91

5.6 Summary 92

References 92

6 Density-Based Data Analysis and Similarity Search 94

Stefan Brecheisen, Hans-Peter Kriegel, Peer Kröger, Martin Pfeifle, Matthias Schubert, and Arthur Zimek

6.1 Introduction 94

6.2 Hierarchical Clustering 96

6.3 Application Ranges 98

6.3.1 Data Analysis 98

6.3.2 Navigational Similarity Search 100

6.4 Cluster Recognition for OPTICS 100

6.4.1 Recent Work 101

6.4.2 Gradient Clustering 102

6.4.3 Evaluation 106

6.5 Extracting Cluster Hierarchies for Similarity Search 108

6.5.1 Motivation 108

6.5.2 Basic Definitions 109

6.5.3 Algorithm 110

6.5.4 Choice of ε in the i-th Iteration 112

6.5.5 The Extended Prototype CLUSS 113

6.6 Conclusions 114

References 114

7 Feature Selection for Classification of Variable Length Multiattribute Motions 116

Chuanjun Li, Latifur Khan, and Balakrishnan Prabhakaran

7.1 Introduction 116

7.2 Related Work 118

7.3 Background 120

7.3.1 Support Vector Machines 120

7.3.2 Singular Value Decomposition 121

7.4 Feature Vector Extraction Based on SVD 123

7.4.1 SVD Properties of Motion Data 123

7.4.2 Feature Vector Extraction 125

7.5 Classification of Feature Vectors Using SVM 127

7.6 Performance Evaluation 128

7.6.1 Hand Gesture Data Generation 128

7.6.2 Motion Capture Data Generation 128

7.6.3 Performance Evaluation 129

7.6.4 Discussion 133

7.7 Conclusion 135

Acknowledgments 136

References 136

Part III Multimedia Data Indexing and Retrieval

8 FAST: Fast and Semantics-Tailored Image Retrieval 141

Ruofei Zhang and Zhongfei (Mark) Zhang

8.1 Introduction 141

8.2 Fuzzified Feature Representation and Indexing Scheme 144

8.2.1 Image Segmentation 144

8.2.2 Fuzzy Color Histogram for Each Region 146

8.2.3 Fuzzy Representation of Texture and Shape for Each Region 147

8.2.4 Region Matching and Similarity Determination 148

8.3 Hierarchical Indexing Structure and HEAR Online Search 150

8.4 Addressing User’s Subjectivity Using ITP and ARWU 153

8.5 Experimental Evaluations 157

8.6 Conclusions 165

References 165

9 New Image Retrieval Principle: Image Mining and Visual Ontology 168

Marinette Bouet and Marie-Aude Aufaure

9.1 Introduction 168

9.2 Content-Based Retrieval 170

9.2.1 Logical Indexation Process 171

9.2.2 Retrieval Process 172

9.3 Ontology and Data Mining Against Semantics Lack in Image Retrieval 173

9.3.1 Knowledge Discovery in Large Image Databases 174

9.3.2 Ontologies and Metadata 175

9.4 Toward Semantic Exploration of Image Databases 176

9.4.1 The Proposed Architecture 176

9.4.2 First Experimentations 179

9.5 Conclusion and Future Work 181

References 182

10 Visual Alphabets: Video Classification by End Users 185

Menno Israël, Egon L. van den Broek, Peter van der Putten, and Marten J. den Uyl

10.1 Introduction 185

10.2 Overall Approach 186

10.2.1 Scene Classification Procedure 187

10.2.2 Related Work 187

10.2.3 Positioning the Visual Alphabet Method 189

10.3 Patch Features 189

10.3.1 Distributed Color Histograms 190

10.3.2 Histogram Configurations 190

10.3.3 Human Color Categories 191

10.3.4 Color Spaces 191

10.3.5 Segmentation of the HSI Color Space 192

10.3.6 Texture 193

10.4 Experiments and Results 194

10.4.1 Patch Classification 195

10.4.2 Scene Classification 196

10.5 Discussion and Future Work 197

10.6 Applications 198

10.6.1 Vicar 199

10.6.2 Porn Filtering 200

10.6.3 Sewer Inspection 201

10.7 Conclusion 203

Acknowledgments 203

References 203

Part IV Multimedia Data Modeling and Evaluation

11 Cognitively Motivated Novelty Detection in Video Data Streams 209

James M. Kang, Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Roger Gaborski

11.1 Introduction 209

11.2 Related Work 211

11.2.1 Video Streams 211

11.2.2 Image Novelty 212

11.2.3 Clustering Novelty in Video Streams 212

11.2.4 Event vs Novelty Clustering 213

11.3 Implementation 213

11.3.1 Machine-Based Process 213

11.3.2 Human-Based System 217

11.3.3 Indexing and Clustering of Novelty 220

11.3.4 Distance Metrics 223

11.4 Results 225

11.4.1 Clustering and Indexing of Novelty 225

11.4.2 Human Novelty Detection 228

11.4.3 Human vs Machine 228

11.5 Discussion 229

11.5.1 Issues and Ideas 229

11.5.2 Summary 231

Acknowledgments 231

References 231

12 Video Event Mining via Multimodal Content Analysis and Classification 234

Min Chen, Shu-Ching Chen, Mei-Ling Shyu, and Chengcui Zhang

12.1 Introduction 234

12.2 Related Work 236

12.3 Goal Shot Detection 238

12.3.1 Instance-Based Learning 238

12.3.2 Multimodal Analysis of Soccer Video Data 239

12.3.3 Prefiltering 248

12.3.4 Nearest Neighbor with Generalization (NNG) 251

12.4 Experimental Results and Discussions 252

12.4.1 Video Data Source 252

12.4.2 Video Data Statistics and Feature Extraction 253

12.4.3 Video Data Mining for Goal Shot Detection 254

12.5 Conclusions 255

Acknowledgments 256

References 256

13 Identifying Mappings in Hierarchical Media Data 259

K. Selçuk Candan, Jong Wook Kim, Huan Liu, Reshma Suvarna, and Nitin Agarwal

13.1 Introduction 259

13.1.1 Integration of RDF-Described Media Resources 259

13.1.2 Matching Hierarchical Media Objects 260

13.1.3 Problem Statement 261

13.1.4 Our Approach 262

13.2 Related Work 262

13.3 Structural Matching 264

13.3.1 Step I: Map Both Trees Into Multidimensional Spaces 265

13.3.2 Step II: Compute Transformations to Align the Common Nodes of the Two Trees in a Shared Space 266

13.3.3 Step III: Use the Identified Transformations to Position the Uncommon Nodes in the Shared Space 272

13.3.4 Step IV: Relate the Nodes from the Two Trees in the Shared Space 272

13.4 Experimental Evaluation 272

13.4.1 Synthetic and Real Data 273

13.4.2 Evaluation Strategy 275

13.4.3 Experiment Synth1–Label Differences 276

13.4.4 Experiment Synth2-Structural Differences 277

13.4.5 Experiment Real1: Treebank Collection 279

13.4.6 Execution Time 280

13.4.7 Synth3: When the Corresponding Nodes in the Two Trees Match Imperfectly 282

13.4.8 Synth4: Many-to-Many Correspondences Between Nodes 282

13.4.9 Execution Time with Fuzzy, Many-to-Many Mappings 286

13.4.10 Real2: Experiments with the CNN Data 286

13.5 Conclusions 286

Acknowledgments 287

References 287

14 A Novel Framework for Semantic Image Classification and Benchmark Via Salient Objects 291

Yuli Gao, Hangzai Luo, and Jianping Fan

14.1 Introduction 291

14.2 Image Content Representation Via Salient Objects 292

14.3 Salient Object Detection 294

14.4 Interpretation of Semantic Image Concepts 298

14.5 Performance Evaluation 302

14.6 Conclusions 304

References 305

15 Extracting Semantics Through Dynamic Context 307

Xin Li, William Grosky, Nilesh Patel, and Farshad Fotouhi

15.1 Introduction 307

15.2 Related Work 308

15.3 System Architecture 309

15.4 Segmentation 310

15.4.1 Chi-Square Method 310

15.4.2 Kruskal–Wallis Method 314

15.5 Extracting Semantics 316

15.5.1 Image Resegmentation 317

15.6 Experimental Results 321

15.7 Supporting MPEG-7 322

15.8 Conclusion 322

References 324

16 Mining Image Content by Aligning Entropies with an Exemplar 325

Clark F. Olson

16.1 Introduction 325

16.2 Related Work 327

16.3 Matching with Entropy 328

16.4 Toward Efficient Search 330

16.5 Results 332

16.6 Large Image Databases 333

16.7 Future Work 336

16.7.1 Tracking 336

16.7.2 Grouping 337

16.8 Summary 337

Acknowledgments 338

References 338

17 More Efficient Mining Over Heterogeneous Data Using Neural Expert Networks 340

Sergio A. Alvarez, Carolina Ruiz, and Takeshi Kawato

17.1 Introduction 340

17.1.1 Scope of the Chapter 340

17.1.2 Related Work 341

17.1.3 Outline of the Chapter 342

17.2 Artificial Neural Networks 343

17.2.1 Network Topologies 343

17.2.2 Network Training 345

17.3 Efficiency and Expressiveness 346

17.3.1 Time Complexity of Training 346

17.3.2 Expressive Power 347

17.4 Experimental Evaluation 350

17.4.1 Web Images Data 351

17.4.2 Networks 352

17.4.3 Performance Metrics 354

17.4.4 Evaluation Protocol 354

17.5 Results 355

17.5.1 Classification Performance 355

17.5.2 Time Efficiency 356

17.5.3 Discussion 356

17.5.4 Additional Experimental Results 356

17.6 Conclusions and Future Work 359

References 360

18 A Data Mining Approach to Expressive Music Performance Modeling 362

Rafael Ramirez, Amaury Hazan, Esteban Maestre, and Xavier Serra

18.1 Introduction 362

18.2 Melodic Description 363

18.2.1 Algorithms for Feature Extraction 363

18.2.2 Low-Level Descriptors Computation 363

18.2.3 Note Segmentation 366

18.2.4 Note Descriptor Computation 366

18.3 Expressive Performance Knowledge Induction 366

18.3.1 Training Data 367

18.3.2 Musical Analysis 367

18.3.3 Data Mining Techniques 368

18.3.4 Results 373

18.4 Expressive Melody Generation 375

18.5 Related Work 376

18.6 Conclusion 378

Acknowledgments 378

References 379

Part V Applications and Case Studies

19 Supporting Virtual Workspace Design Through Media Mining and Reverse Engineering 383

Simeon J. Simoff and Robert P. Biuk-Aghai

19.1 Introduction 383

19.2 Principles of the Approach Toward Reverse Engineering of Processes Using Data Mining 388

19.2.1 The Information Pyramid Formalism 388

19.2.2 Integrated Collaboration Data and Data Mining Framework 391

19.3 Method for Reverse Engineering of Processes 392

19.4 Example of Reverse Engineering of Knowledge-Intensive Processes from Integrated Virtual Workspace Data 394

19.4.1 Task Analysis 397

19.4.2 Process Analysis 398

19.4.3 Temporal Analysis 401

19.5 Conclusions and Future Work 402

Acknowledgments 403

References 403

20 A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos 404

Kimiaki Shirahama, Koichi Ideno, and Kuniaki Uehara

20.1 Introduction 404

20.2 Related Works 406

20.2.1 Video Data Mining 406

20.2.2 Sequential Pattern Mining 408

20.3 Raw Level Metadata 409

20.4 Time-Constrained Sequential Pattern Mining 412

20.4.1 Formulation 412

20.4.2 Mining Algorithm 414

20.4.3 Parallel Algorithm 418

20.5 Experimental Results 419

20.5.1 Evaluations of Assigning Categorical Values of SM 420

20.5.2 Evaluation of Semantic Event Boundary Detections 420

20.5.3 Evaluations of Extracted Semantic Patterns 421

20.6 Conclusion and Future Works 424

Acknowledgments 424

References 424

21 Multiple-Sensor People Localization in an Office Environment 427

Gang Wei, Valery A. Petrushin, and Anatole V. Gershman

21.1 Introduction 427

21.2 Environment 428

21.3 Related Works 431

21.4 Feature Extraction 433

21.4.1 Camera Specification 433

21.4.2 Background Modeling 434

21.4.3 Visual Feature Extraction 436

21.4.4 People Modeling 437

21.5 People Localization 438

21.5.1 Sensor Streams 438

21.5.2 Identification and Tracking of Objects 439

21.6 Experimental Results 444

21.7 Summary 446

References 446

22 Multimedia Data Mining Framework for Banner Images 448

Qin Ding and Charles Daniel

22.1 Introduction 448

22.2 Naïve Bayesian Classification 450

22.3 The Bayesian Banner Profiler Framework 450

22.3.1 GIF Image Attribute Extraction 451

22.3.2 Attribute Quantization Algorithm 452

22.3.3 Bayesian Probability Computation Algorithm 453

22.4 Implementation 454

22.5 Conclusions and Future Work 456

References 456

23 Analyzing User’s Behavior on a Video Database 458

Sylvain Mongy, Fatma Bouali, and Chabane Djeraba

23.1 Introduction 458

23.2 Related Work 460

23.2.1 Web Usage Mining 460

23.2.2 Video Usage Mining 461

23.3 Proposed Approach 462

23.3.1 Context 462

23.3.2 Gathering Data 463

23.3.3 Modeling User’s Behavior: A Two-Level Based Model 464

23.4 Experimental Results 468

23.4.1 Creation of the Test Data Sets 468

23.4.2 Exploiting the Intravideo Behavior 468

23.4.3 Multiple Subsequence Cluster Modeling 469

23.5 Future work 470

References 470

24 On SVD-Free Latent Semantic Indexing for Iris Recognition of Large Databases 472

Pavel Praks, Libor Machala, and Václav Snášel

24.1 Introduction 472

24.2 Image Retrieval Using Latent Semantic Indexing 473

24.2.1 Image Coding 473

24.2.2 Document Matrix Scaling 474

24.3 Implementation Details of Latent Semantic Indexing 476

24.3.1 LSI and Singular Value Decomposition 476

24.3.2 LSI1—The First Enhancement of the LSI Implementation 477

24.3.3 LSI2—The Second Enhancement of LSI 479

24.4 Iris Recognition Experiments 479

24.4.1 The Small-Size Database Query 480

24.4.2 The Large-Scale Database Query 481

24.4.3 Results of the Large-Scale Database Query 483

24.5 Conclusions 483

Acknowledgments 485

References 485

25 Mining Knowledge in Computer Tomography Image Databases 487

Daniela Stan Raicu

25.1 Introduction 487

25.2 Mining CT Data: Classification, Segmentation, and Retrieval 489

25.2.1 Image Classification: Related Work 489

25.2.2 Image Segmentation: Related Work 490

25.2.3 Image Retrieval: Related Work 491

25.3 Materials and Methods 492

25.3.1 Image Database 492

25.3.2 Texture Features 493

25.3.3 Classification Model 497

25.3.4 Similarity Measures and Performance Evaluation 498

25.4 Experimental Results and Their Interpretation 501

25.4.1 Tissue Classification Results 501

25.4.2 Tissue Segmentation Results 502

25.4.3 Tissue Retrieval Results 504

25.5 Conclusions 505

References 506

Author Index 509

Subject Index 511

Today data mining efforts are going beyond databases to focus on data collected in fields like art, design, hypermedia, and digital media production, medical multimedia data analysis, and computational modeling of creativity, including evolutionary computation. These fields use a variety of data sources and structures, interrelated by the nature of the phenomenon. As a result, there is an increasing interest in new techniques and tools that can detect and discover patterns that can lead to new knowledge in the problem domain where the data have been collected. There is also an increasing interest in the analysis of multimedia data generated by different distributed applications, like collaborative virtual environments, virtual communities, and multiagent systems. The data collected from such environments include records of the actions in them, audio and video recordings of meetings and collaborative sessions, a variety of documents that are part of the business process, asynchronous threaded discussions, transcripts from synchronous communications, and other data records. These heterogeneous multimedia data records require sophisticated

This book is based mostly on extended and updated papers that were presented at the two Multimedia Data Mining Workshops, MDM KDD 2003 and MDM KDD 2004, which were held in conjunction with the ACM SIGKDD Conference in Washington, DC, August 2003 and the ACM SIGKDD Conference in Seattle, WA, August 2004, respectively. The book also includes several invited surveys and papers. The book chapters give a snapshot of research and applied activities in multimedia data mining.

The editors are grateful to the founders and active supporters of the Multimedia Data Mining Workshop Series: Simeon Simoff, Osmar Zaiane, and Chabane Djeraba.

We also thank the reviewers of book papers for their well-done job and the organizers of the ACM SIGKDD Conferences for their support.

We thank Springer-Verlag’s employees Wayne Wheeler, who initiated the book project, Catherine Brett, and Frank Ganz for their help in coordinating the publication and for editorial assistance.

January 2006

Fatma Bouali

LIFL UMR CNRS 8022

France

Fatma.Bouali@univ-lille2.fr

Marinette Bouet

LIMOS – UMR 6158 CNRS

Blaise Pascal University of Clermont-Ferrand II

Campus des Cézeaux

24, avenue des Landais

F-63173 Aubiere Cedex

France

Marinette.Bouet@cust.univ-bpclermont.fr

Stefan Brecheisen

Institute for Computer Science

University of Munich

Oettingenstr. 67, 80538 Munich, Germany

brecheis@dbs.ifi.lmu.de

Farshad Fotouhi

Department of Computer Science

Wayne State University

Detroit, MI 48202

USA

fotouhi@wayne.edu

Roger Gaborski

Rochester Institute of Technology

102 Lomb Memorial Drive

Rochester, NY 14623-5608

USA

rsg@cs.rit.edu

Yuli Gao

Dept. of Computer Science

University of North Carolina – Charlotte

Charlotte, NC 28223

USA

ygao@uncc.edu

anatole.v.gershman@accenture.com

Music Technology Group

Pompeu Fabra University

USA

takeshi@wpi.edu

Latifur Khan

Department of Computer Science

University of Texas at Dallas

75083 Richardson, Texas

USA

lkhan@utdallas.edu

Jong Wook Kim

Department of Computer Science and Engineering

Arizona State University

Tempe, AZ 82857

USA

Peer Kröger

Institute for Computer Science

University of Munich

Oettingenstr. 67, 80538 Munich, Germany

kroegerp@dbs.ifi.lmu.de

Chuanjun Li

Department of Computer Science

The University of Texas at Dallas

Richardson, Texas 75083

USA

chuanjun@utdallas.edu

Xin Li

Department of Computer Science

Oklahoma City University

Dept of Computer Science

University of North Carolina – Charlotte

Balakrishnan Prabhakaran

Department of Computer Science

University of Texas at Dallas,

Daniela Stan Raicu

Intelligent Multimedia Processing

Music Technology Group

Pompeu Fabra University

Computer Science Department

Worcester Polytechnic Institute

80538 Munich

Germany

schubert@dbs.ifi.lmu.de

Xavier Serra

Music Technology Group

Pompeu Fabra University

Ocata 1, 08003 Barcelona

Spain

simeon@it.uts.edu.au

Rochester Institute of Technology

102 Lomb Memorial Drive

Egon L van den Broek

Center for Telematics and Information

Technology (CTIT) and

Institute for Behavioral Research (IBR)

Ruofei Zhang

Computer Science Department

Watson School

SUNY Binghamton

Binghamton, NY 13902-6000

USA

rzhang@cs.binghamton.edu

Zhongfei (Mark) Zhang

Computer Science Department

Watson School

SUNY Binghamton

Binghamton, NY 13902-6000

USA

zhongfei@cs.binghamton.edu

zimek@dbs.ifi.lmu.de

Part I Introduction

Summary. This chapter briefly describes the purpose and scope of multimedia data mining and knowledge discovery. It identifies industries that are major users or potential users of this technology, outlines the current state of the art and directions to go, and overviews the chapters collected in this book.

1.1 What Is Multimedia Data Mining?

The traditional definition of data mining is that it “is the process of automating information discovery” [1], which improves decision making and gives a company advantages on the market. Another definition is that it “is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” [2]. It is also assumed that the discovered patterns and rules are meaningful for business. Indeed, data mining is an applied discipline, which grew out of statistical pattern recognition, machine learning, and artificial intelligence and was coupled with business decision making to optimize and enhance it. Initially, data mining techniques were applied to structured data from databases. The term “knowledge discovery in databases,” which is currently obsolete, reflects this period. However, knowledge is an interpretation of the meaning of data, and knowledge discovery goes beyond finding simple patterns and correlations in data to identifying concepts and finding relationships. Knowledge-based modeling creates a consistent logical picture of the world. In recent years the term “predictive analytics” has been widely adopted in the business world [3].

On one hand, growing computer power made data mining techniques affordable for small companies; on the other, the emergence of cheap massive memory and digital recording devices, such as scanners, microphones, cameras, and camcorders, allowed digitizing all kinds of corporate, governmental, and private documents. Many companies consider these electronic documents valuable assets and additional sources of data for data mining. For example, e-mail messages from customers and recordings of telephone conversations between customers and operators could serve as valuable sources of knowledge about both customers’ needs and the quality of service. The unprecedented growth of information on the World Wide Web made it

Recently, two branches of data mining, text data mining and Web data mining, have emerged [4, 5]. They have their own research agendas, communities of researchers, and supporting companies that develop technologies and tools. Unfortunately, multimedia data mining is today still in an embryonic state. This could be explained by immature technology, the high cost of storing and processing media data, and the absence of success stories that show the benefits and the high rate of return on investments in multimedia data mining.

To understand multimedia data mining in depth, let us consider its purpose and scope. First, let us describe what kinds of data belong to multimedia data. According to the MPEG-7 Standard [6], there are four types of multimedia data: audio data, which include sounds, speech, and music; image data (black-and-white and color images); video data, which include time-aligned sequences of images; and electronic or digital ink, which is a sequence of time-aligned 2D or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar device. All these data are generated by specific kinds of sensors.

Second, let us take a closer look at the term multimedia data mining. The word multimedia assumes that several data sources of different modalities are processed at the same time. This may or may not be the case. A data mining project can deal with only one modality of data, for example, customers’ audio recordings or surveillance video. It would be better to use the term media data mining instead, but the word media usually connotes mass media such as radio and television, which may or may not be the data source for the data mining project. The term sensor data mining extends the scope too far, covering such sensors as radars, speedometers, accelerometers, echo locators, thermometers, etc. This book is devoted to discussing the first three media types mentioned above, and we shall use the term multimedia data mining and the acronym MDM, keeping in mind the above discussion.

The primary purpose of MDM is to process media data alone or in combination with other data to find patterns useful for business. For example, video recordings can be used to analyze customer traffic in a retail store in order to find the optimal location for a new product display. Besides explicit data mining projects, MDM techniques can be used as a part of complex operational or manufacturing processes, for example, using images to find defective products or indexing a video database of a company’s meetings. MDM is a part of multimedia technology, which covers the following areas [7, 8]:

• Media compression and storage

• Delivering streaming media over networks with the required quality of service

• Media restoration, transformation, and editing

• Media indexing, summarization, search, and retrieval

• Creating interactive multimedia systems for learning/training and creative art production

• Creating multimodal user interfaces

The major challenge that MDM shares with multimedia retrieval is the so-called semantic gap, which is the difficulty of deriving a high-level concept such as “mountain landscape” or “abnormal customer behavior” from low-level features such as color histogram, homogeneous texture, contour-based shape, motion trajectory, etc., that are extracted from media data. A solution to this problem requires creating an ontology that covers different aspects of the concept and a hierarchy of recognizers that deduce the probability of the concept from the probabilities of its components and the relationships among them.
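As an illustration only, the idea of a hierarchy of recognizers can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example and does not come from the chapter: the `Recognizer` class, the two combination modes, the independence of the components, and the probability values. It shows one simple way in which a concept score could be derived from the scores of its components.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional


@dataclass
class Recognizer:
    """A node in a hypothetical hierarchy of concept recognizers."""
    name: str
    probability: Optional[float] = None  # set only for leaves (low-level detectors)
    parts: List["Recognizer"] = field(default_factory=list)
    mode: str = "all"  # "all": every part is required; "any": one part suffices

    def score(self) -> float:
        """Estimate the probability that this concept is present."""
        if not self.parts:
            if self.probability is None:
                raise ValueError(f"leaf recognizer {self.name!r} has no probability")
            return self.probability
        scores = [part.score() for part in self.parts]
        if self.mode == "all":
            # Assumes independent parts: P(A and B) = P(A) * P(B)
            return prod(scores)
        if self.mode == "any":
            # Assumes independent parts: 1 - P(no part is present)
            return 1.0 - prod(1.0 - s for s in scores)
        raise ValueError(f"unknown mode {self.mode!r}")


# Made-up leaf probabilities standing in for low-level feature detectors.
sky = Recognizer("sky", probability=0.9)
snow = Recognizer("snow-like texture", probability=0.5)
ridge = Recognizer("ridge-like contour", probability=0.6)
mountain = Recognizer("mountain", parts=[snow, ridge], mode="any")
landscape = Recognizer("mountain landscape", parts=[sky, mountain], mode="all")

print(f"{landscape.name}: {landscape.score():.2f}")
```

A real system would replace the independence assumption with relationships taken from the ontology, for example spatial constraints between components.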

1.2 Who Does Need Multimedia Data Mining?

To answer the above question, we consider the application areas of MDM and the related industries and companies that are (potential) users of the technology. The following are the five major application areas of MDM:

Customer Insight. Customer insight includes collecting and summarizing information about customers’ opinions of products or services, customers’ complaints, customers’ preferences, and the level of customers’ satisfaction with products or services. All product manufacturers and service providers are interested in customer insight. Many companies have help desks or call centers that accept telephone calls from customers. These calls are recorded and stored. If an operator is not available, a customer may leave an audio message. Some organizations, such as banks, insurance companies, and communication companies, have offices where customers meet the company’s representatives. The conversations between customers and sales representatives can be recorded and stored. The audio data serve as an input for data mining to pursue the following goals:

• Topic detection: The speech from recordings is separated into turns, i.e., speech segments spoken by only one speaker. Then the turns are transcribed into text and keywords are extracted. The keywords are used for detecting topics and estimating how much time was spent on each topic. This information, aggregated by day, week, and month, gives an overview of hot topics and allows the management to plan future training taking into account the emerging hot topics.

• Resource assignment: Call centers have a high turnover rate and only a small percentage of experienced operators, who are a valuable resource. When the call center collects messages, the problem is how to assign an operator who will call back. To solve the problem, the system transcribes the speech and detects the topic. Then it estimates the emotional state of the caller. If the caller is agitated, angry, or sad, it gives the message a higher priority so that it is answered by an experienced operator. Based on the topic, the emotional state of the caller, and the operators’ availability, the system assigns a proper operator to call back [9].

• Evaluation of quality of service: At a call center with thousands of calls per day, the evaluation of quality of service is a laborious task. It is done by people who listen to selected recordings and subjectively estimate the quality of service. Speech recognition in combination with emotion recognition can be used for automating the evaluation and making it more objective. Without going into


Currently, several small companies provide tools and solutions for customer management for call centers. Some tools include procedures that are based on MDM techniques. This market is expected to expand in the future.
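As a rough illustration of the resource-assignment goal described above, the following sketch ranks callback messages so that emotionally distressed callers are served first and, where possible, by an experienced operator. It is a minimal sketch under stated assumptions, not the system described in [9]: the names `Message`, `PRIORITY_EMOTIONS`, and `assign_callbacks` are hypothetical, the topic and emotion labels are assumed to come from upstream speech and emotion recognizers, and topic-based operator matching is omitted for brevity.

```python
import heapq
from dataclasses import dataclass

# Hypothetical set of emotional states that raise a message's priority.
PRIORITY_EMOTIONS = {"agitated", "angry", "sad"}


@dataclass
class Message:
    caller: str
    topic: str    # assumed output of a topic detector (unused in this sketch)
    emotion: str  # assumed output of an emotion recognizer


def assign_callbacks(messages, experienced, regular):
    """Return a list of (caller, operator) pairs in callback order.

    High-priority messages are handled first and prefer experienced
    operators; the remaining messages prefer regular operators.
    """
    experienced, regular = list(experienced), list(regular)
    # Heap key: (0 = high priority, 1 = normal), then arrival order.
    heap = [(0 if m.emotion in PRIORITY_EMOTIONS else 1, i, m)
            for i, m in enumerate(messages)]
    heapq.heapify(heap)
    plan = []
    while heap and (experienced or regular):
        rank, _, msg = heapq.heappop(heap)
        # Pick the preferred pool, falling back to the other if it is empty.
        pool = (experienced or regular) if rank == 0 else (regular or experienced)
        plan.append((msg.caller, pool.pop(0)))
    return plan
```

A real deployment would presumably also weigh the detected topic against each operator's skills; the two-level priority used here is only the simplest rule consistent with the description.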

Surveillance: Surveillance consists of collecting, analyzing, and summarizing audio, video, or audiovisual information about a particular area, such as battlefields, forests, agricultural areas, highways, parking lots, buildings, workshops, malls, retail stores, offices, homes, etc. [10]. Surveillance is often associated with intelligence, security, and law enforcement, and the major users of this technology are the military, police, and private companies that provide security services. The U.S. Government is supporting long-term research and development activities in this field by conducting the Video Analysis and Content Extraction (VACE) program, which is oriented toward developing technology for military and intelligence purposes. The National Institute of Standards and Technology has been conducting the annual evaluation of content-based retrieval in video (TRECVID) since 2000 [11]. However, many civilian companies use security services for protecting their assets and monitoring their employees, customers, and manufacturing processes. They can get valuable insight from mining their surveillance data. There are several goals of surveillance data mining:

• Object or event detection/recognition: The goal is to find an object in an image, in a sequence of images, or in a soundtrack that belongs to a certain class of objects or represents a particular instance. For example, detect a military vehicle in a satellite photo, detect a face in a sequence of frames taken by a camera that is watching an office, recognize the person whose face is detected, or recognize whether a sound represents music or speech. Another variant of the goal is to identify the state or attributes of the object. For example, classify an X-ray image of lungs as normal or abnormal, or identify the gender of a speaker. Finally, the goal can be rather complex: detecting or identifying an event, which is a sequence of objects and relationships among objects. For example, detect a goal event in a soccer game, detect violence on the street, or detect a traffic accident. This goal is often a part of the high-level goals that are described below.

• Summarization: The goal is to aggregate data by summarizing activities in space and/or time. It covers summarizing activities of a particular object (for example, drawing a trajectory of a vehicle or indicating periods of time when a speaker was talking) or creating a “big picture” of activities that happened during some period of time. For example, assuming that a bank has a surveillance system that includes multiple cameras that watch tellers and ATMs indoors and outdoors, the goal is to summarize the activities that happened in the bank during 24 h. To meet the goal, unsupervised learning and visualization techniques are used. Summarization also serves as a prerequisite step to achieve another goal: finding frequent and rare events.


• Monitoring: The goal is to detect events and generate a response in real time. The major challenges are real-time processing and minimizing false alarms. Examples are monitoring areas with restricted access, monitoring a public place for threat detection, or monitoring elderly or disabled people at home.

Today we are witnessing a real boom in video processing and video mining research and development [12–14]. However, only a few companies provide tools that have elements of data mining.

Media Production and Broadcasting: The proliferation of radio stations and TV channels drives broadcasting companies to search for more efficient approaches for creating programs and monitoring their content. MDM techniques are used to achieve the following goals:

• Indexing archives and creating new programs: A TV program maker uses raw footage called rushes and clips from commercial libraries called stockshot libraries to create a new program. A typical “shoot-to-show” ratio for a TV program is in the range from 20 to 40, which means that 20–40 hours of rushes go into one hour of a TV show. Many broadcasting companies have thousands of hours of rushes, which are poorly indexed and contain redundant, static, and low-value episodes. Using rushes for TV programs is inefficient. However, managers believe that rushes could be valuable if MDM technology could help program makers extract some “generic” episodes with high potential for reuse.

• Program monitoring and plagiarism detection: A company that pays for broadcasting its commercials is interested in knowing how many times a commercial has actually been aired. Some companies are also interested in how many times their logo has been shown on TV during a sporting event. Broadcasting companies or independent third-party companies can provide such a service. Other goals are detecting plagiarism in, and unlicensed broadcasting of, music or video clips. This requires using MDM techniques for audio/video clip recognition [15] and robust object recognition [16].

Intelligent Content Service: According to Forrester Research, Inc. (Cambridge, MA), the Intelligent Content Service (ICS) is “a semantically smart content-centric set of software services that enhance the relationship between information workers and computing systems by making sense of content, recognizing context, and understanding the end user’s requests for information” [17]. Currently, in spite of many Web service providers’ efforts to extend their search services beyond basic keyword search to ICS, it is not yet available for multimedia data. This area will be the major battlefield among Web search and service providers in the next 5 years. MDM techniques can help to achieve the following goals:

• Indexing Web media and using advanced media search: This includes creating indexes of images and audio and video clips posted on the Web using audio and visual features; creating ontologies for concepts and events; implementing advanced search techniques, such as searching for images or video clips by example or by sketch and searching for music by humming; using Semantic Web techniques to infer the context and improve search [18]; and using context and user feedback to better understand the user’s intent.

• Advanced Web-based services: Taking into account the exponential growth of information on the Web, such services promise to be an indispensable part of


Knowledge Management: Many companies consider their archives of documents a valuable asset. They spend a lot of money to maintain their archives and provide employees with access to them. Besides text documents, these archives can contain drawings of designs, photos and other images, audio and video recordings of meetings, multimedia data for training, etc. MDM approaches can provide the ICS for supporting companies’ knowledge management.

1.3 What Shall We See in the Future?

As to the future development of the multimedia data mining field, I believe that we shall see essential progress during the next 5 years. The major driving force behind it is the creation of techniques for advanced Web search to provide intelligent content services to companies for supporting their knowledge management and business intelligence. It will require creating general and industry-specific ontologies, developing recognizers for entities of these ontologies, merging probabilistic and logical inference, and using metadata and reasoning represented in different vocabularies and languages, such as MPEG-7 [6], MPEG-21 [20, 21], Dublin Core [22], RDF [23], SKOS [24], TGM [25], OWL [26], etc.

Another force, which is driven mostly by funding from governmental agencies, is video surveillance and mass media analysis for improving intelligence, military, and law enforcement decision making.

Mining audio and video recordings for customer insight and using MDM for improving broadcasting companies’ production will also grow and mature in the future.

From the viewpoint of types of media, most advances in video processing are expected in research and development for surveillance and broadcasting applications; audio processing will benefit from research for customer insight and from creating advanced search engines; and image processing will benefit from developing advanced search engines. As to electronic ink, I believe this type of multimedia data will be accumulated and used in data mining in the future, for example, for estimating the skill level of a user of an interactive tool or summarizing the contribution of each participant of a collaborative tool to a project.

1.4 What Can You Find in This Book?

As with any collection of creative material, the chapters of the book differ in style, topics, mathematical rigor, level of generality, and readiness for deployment. But altogether they build a mosaic of ongoing research in the fast-changing, dynamic field of multimedia data mining.


The book consists of five parts. The first part, which includes two chapters, gives an introduction to multimedia data mining. Chapter 1, which you are reading now, overviews multimedia data mining as an industry. Chapter 2 presents an overview of MDM techniques. It describes a typical architecture of MDM systems and covers approaches to supervised and unsupervised concept mining and event discovery.

The second part includes five chapters. It is devoted to multimedia data exploration and visualization. Chapter 3 deals with exploring images. It presents a clustering method based on unsupervised neural nets and self-organizing maps, which is called the dynamic growing self-organizing tree algorithm (DGSOT). The chapter shows that the suggested algorithm outperforms the traditional hierarchical agglomerative clustering algorithm.

Chapter 4 presents a multiresolution clustering of time series. It uses the Haar wavelet transform for representing time series at different resolutions and applies k-means clustering sequentially on each level, starting from the coarsest representation. The algorithm stops when two sequential membership assignments are equal or when it reaches the finest representation. The advantages of the algorithm are that it works faster and produces better clustering. The algorithm has been applied to clustering images, using color and texture features.
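The level-by-level procedure summarized for Chapter 4 can be sketched roughly as follows. This is an illustrative reading of the description above, not the chapter's implementation: the function names are hypothetical, the Haar approximation is computed by plain pairwise averaging (equal to the Haar approximation coefficients up to a scale factor), all series are assumed to have the same power-of-two length, the farthest-first initialization is an assumption, and clustering starts at the length-2 level by default.

```python
def haar_levels(series):
    """Approximations of a length-2^n series, coarsest first."""
    levels = [list(series)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2
                       for i in range(0, len(prev), 2)])
    return levels[::-1]


def dist2(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]


def centers_from(points, labels, k):
    """Cluster means; an empty cluster falls back to the global mean."""
    centers = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j] or points
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


def farthest_first(points, k):
    """Deterministic initial centers (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, centers, max_iter=50):
    """Plain k-means from the given centers; returns membership labels."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = centers_from(points, labels, len(centers))
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def multires_kmeans(series_list, k, start_level=1):
    """Cluster coarse-to-fine, seeding each level with the previous labels."""
    pyramids = [haar_levels(s) for s in series_list]
    labels, prev = None, None
    for level in range(start_level, len(pyramids[0])):
        points = [p[level] for p in pyramids]
        centers = (farthest_first(points, k) if labels is None
                   else centers_from(points, labels, k))
        labels = kmeans(points, centers)
        if labels == prev:  # two sequential assignments agree: stop early
            break
        prev = labels
    return labels
```

Seeding each finer level with the memberships found at the coarser one is what lets the cheap low-resolution passes do most of the work; the early exit implements the stopping rule stated in the text.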

Chapter 5 describes a method for unsupervised classification of events in multicamera indoor surveillance video. The self-organizing map approach has been applied to event data for clustering and visualization. The chapter presents a tool for browsing clustering results, which allows exploring units of the self-organizing maps at different levels of hierarchy, clusters of units, and distances between units in 3D space for searching for rare events.

Chapter 6 presents density-based data analysis and similarity search. It introduces a visualization technique called the reachability plot, which allows visually exploring a data set in multiple representations and comparing multiple similarity models. The chapter also presents a new method for automatically extracting cluster hierarchies from a given reachability plot and describes a system prototype, which serves both for visual data analysis and for a new way of object retrieval called navigational similarity search.

Chapter 7 presents an approach to the exploration of variable-length multiattribute motion data captured by data glove sensors and 3-D human motion cameras. It suggests using the singular value decomposition technique to regularize multiattribute motion data of different lengths and classifying them with support vector machine classifiers. Classification of motion data using support vector machine classifiers is compared with classification by related similarity measures in terms of accuracy and CPU time.

The third part is devoted to multimedia data indexing and retrieval. It consists of three chapters. Chapter 8 focuses on developing an image retrieval methodology, which includes a new indexing method based on fuzzy logic, a hierarchical indexing structure, and the corresponding hierarchical elimination-based A* retrieval algorithm with logarithmic search complexity in the average case. It also deals with user relevance feedback to tailor the semantic retrieval to each user’s individualized query preferences.


Chapter 10 describes a methodology and a tool that allow end users to create classifiers for images. The process consists of two stages. First, small image fragments called patches are classified. Second, frequency vectors of these patch classifications are fed into a second-level classifier for global scene classification (e.g., city, portrait, or countryside). The first-stage classifiers can be seen as a set of highly specialized feature detectors that define a domain-specific visual alphabet. The end user builds the second-level classifiers interactively by simply indicating positive examples of a scene. The scene classifier approach has been successfully applied to several problem domains, such as content-based video retrieval in television archives, automated sewer inspection, and pornography filtering.
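The two-stage idea summarized for Chapter 10 can be illustrated with a small sketch. It assumes that the first stage has already labeled each patch with a symbol from a visual alphabet; the alphabet shown here, the function names, and the nearest-centroid rule used as the second-level classifier are all assumptions made for illustration and are not taken from the chapter.

```python
from collections import Counter

# Hypothetical visual alphabet produced by first-stage patch classifiers.
ALPHABET = ["sky", "building", "skin", "grass"]


def patch_histogram(patch_labels, alphabet=ALPHABET):
    """Normalized frequency vector of patch classes for one image."""
    counts = Counter(patch_labels)
    total = sum(counts[a] for a in alphabet) or 1  # avoid division by zero
    return [counts[a] / total for a in alphabet]


def train_scene_classifier(examples):
    """Build one centroid per scene from positive examples.

    `examples` maps a scene name to a list of images, each image being
    the list of patch labels assigned by the first stage.
    """
    centroids = {}
    for scene, images in examples.items():
        hists = [patch_histogram(img) for img in images]
        centroids[scene] = [sum(col) / len(hists) for col in zip(*hists)]
    return centroids


def classify_scene(patch_labels, centroids):
    """Return the scene whose centroid is closest to the image histogram."""
    h = patch_histogram(patch_labels)
    return min(centroids,
               key=lambda s: sum((a - b) ** 2
                                 for a, b in zip(h, centroids[s])))
```

For example, an image whose patches are mostly labeled `building` would land nearest a centroid built from city examples. Any classifier that accepts fixed-length vectors could replace the nearest-centroid rule.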

The fourth part is a collection of eight chapters that describe approaches to multimedia data modeling and evaluation. Chapter 11 presents an approach to automatic novelty detection in video streams and compares it experimentally to human performance. The evaluation of human versus machine-based novelty detection is quantified by metrics based on the location of novel events, the number of novel events, etc.

Chapter 12 presents an effective approach for event detection using both audio and visual features, with its application to the automatic extraction of goal events in soccer videos. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The approach uses the nearest neighbor classifier with a generalization scheme. The proposed approach has been tested using soccer videos of different styles that have been produced by different broadcasters.

Chapter 13 describes an approach for mining and automatically discovering mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in hierarchical data. It uses structure-based mining of relationships, which provides high degrees of precision. The approach works even when the mappings are imperfect, fuzzy, and many-to-many.

Chapter 14 proposes a new approach to high-level concept recognition in images using salient objects as the semantic building blocks. The novel approach uses support vector machine techniques to achieve automatic detection of the salient objects, which serve as a basic visual vocabulary. Then a high-level concept is modeled using a Gaussian mixture model of weighted dominant components, i.e., salient objects. The chapter demonstrates the efficiency of the modeling approach, using the results of broad experiments obtained on images of nature.

Chapter 15 is devoted to a fundamental problem: image segmentation. It introduces a new MPEG-7 friendly system that integrates a user’s feedback with image segmentation and region recognition, supporting the user in the extraction of image region semantics in highly dynamic environments. The results obtained for aerial photos are provided and discussed.

Chapter 16 describes techniques for query by example in an image database, when the exemplar image can have a different position and scale in the target image and/or be captured by a different sensor. The approach matches images against the exemplar by comparing the local entropies in the images at corresponding positions. It employs a search strategy that combines sampling in the space of exemplar positions, the Fast Fourier Transform for efficiently evaluating object translations, and iterative optimization for pose refinement. The techniques are applied to matching exemplars with real images such as aerial and ground reconnaissance photos. Strategies for scaling this approach to multimedia databases are also described.

Chapter 17 presents a neural experts architecture that enables faster neural network training for datasets that can be decomposed into loosely interacting sets of attributes. It describes the expressiveness of this architecture in terms of functional composition. The experimental results show that the proposed neural experts architecture can achieve classification performance that is statistically identical to that of fully connected feedforward neural networks, while significantly improving training efficiency.

Chapter 18 describes a methodology that allows building models of expressive music performance. The methodology consists of three stages. First, acoustic features are extracted from recordings of both expressive and neutral musical performances. Then, using data mining techniques, a set of transformation rules is derived from performance data. Finally, the rules are applied to a description of an inexpressive melody to synthesize an expressive monophonic melody in MIDI or audio format. The chapter describes, explores, and compares different data mining techniques for creating expressive transformation models.

Finally, the fifth part unites seven chapters that describe case studies and applications. Chapter 19 presents a new approach for supporting design and redesign of virtual collaborative workspaces, based on combining integrated data mining techniques for refining the lower-level models with a reverse engineering cycle to create upper-level models. The methodology is based on a new model of vertical information integration related to virtual collaboration, which is called the information pyramid of virtual collaboration.

Chapter 20 presents a time-constrained sequential pattern mining method for extracting patterns associated with semantic events in produced videos. The video is separated into shots, and 13 streams of metadata are extracted for each shot. The metadata cover not only such shot attributes as duration, low-level color, texture, shape and motion features, and sound volume, but also advanced attributes such as the presence of weapons in the shot and sound type, i.e., silence, speech, or music. All metadata are represented as quantized values forming finite alphabets for each stream. Data mining techniques are applied to the streams of metadata to derive patterns that represent semantic events.

Chapter 21 describes an approach for people localization and tracking in an office environment using a sensor network that consists of video cameras, infrared tag readers, a fingerprint reader, and a pan-tilt-zoom camera. The approach is based on a Bayesian framework that uses noisy, but redundant data from multiple sensor streams


Chapter 22 presents an approach that allows estimating the potential attractiveness of a banner image based on its attributes. A banner image is an advertisement that is designed to attract Web users to view and eventually to buy the advertised product or service. The approach uses a Bayesian classifier to predict the level of click-thru rates based on features extracted from the banner image GIF file, such as image dimensions, color features, number of frames in an animated file, and the frames’ dynamic and chromatic features. The experimental results are discussed.

Chapter 23 presents an approach that allows deriving users’ profiles from their performance data when they are working with a video search engine. The approach suggests a two-level model. The goal of the first level is modeling and clustering a user’s behavior on a single video sequence (intravideo behavior). The goal of the second level is modeling and clustering a user’s behavior on a set of video sequences (intervideo behavior). A two-phase clustering algorithm is presented and experimental results are discussed.

Chapter 24 presents a method for automatic identification of a person by iris recognition. The method uses arrays of pixels extracted from a raster image of a human iris and searches for similar patterns using the latent semantic indexing approach. The comparison process is expedited by replacing the time-consuming singular value decomposition algorithm with the partial symmetric eigenproblem. The results of experiments using a real biometric data collection are discussed.

Chapter 25 deals with data mining of medical images. It presents the research results obtained for texture extraction, classification, segmentation, and retrieval of normal soft tissues in computed tomography images of the chest and abdomen. The chapter describes various data mining techniques, which allow identifying different tissues in images, segmenting and indexing images, and exploring different similarity measures for image retrieval. Experimental results for tissue segmentation, classification, and retrieval are presented.

References

1. Groth R. Data Mining: A Hands-On Approach for Business Professionals. Prentice Hall, Upper Saddle River, NJ, 1998.
2. Berry MJA, Linoff G. Data Mining Techniques for Marketing, Sales, and Customer Support. Wiley Computer Publishing, New York, 1997.
3. Agosta L. The future of data mining: Predictive analytics. IT View Report, Forrester Research, October 17, 2003. Available at: http://www.forrester.com/Research/LegacyIT/0,7208,32896,00.html
4. Berry MW (Ed.). Survey of Text Mining: Clustering, Classification, and Retrieval. Springer-Verlag, New York, 2004, 244 p.
5. Zhong N, Liu J, Yao Y (Eds.). Web Intelligence. Springer, New York, 2005, 440 p.
6. Manjunath BS, Salembier Ph, Sikora T (Eds.). Introduction to MPEG-7: Multimedia Content Description Interface. Wiley, New York, 2002, 371 p.
7. Maybury MT. Intelligent Multimedia Information Retrieval. AAAI Press/MIT Press, Cambridge, MA, 1997, 478 p.
8. Furht B (Ed.). Handbook of Multimedia Computing. CRC Press, Boca Raton, FL, 1999, 971 p.
9. Petrushin V. Creating emotion recognition agents for speech signal. In Dautenhahn K, Bond AH, Canamero L, Edmonds B (Eds.), Socially Intelligent Agents: Creating Relationships with Computers and Robots. Kluwer, Boston, 2002, pp. 77–84.
10. Remagnino P, Jones GA, Paragios N, Regazzoni CS. Video-based Surveillance Systems: Computer Vision and Distributed Processing. Kluwer, Boston, 2002, 279 p.
11. TRECVID Workshop website. http://www-nlpir.nist.gov/projects/trecvid/
12. Furht B, Marques O (Eds.). Handbook of Video Databases: Design and Applications. CRC Press, Boca Raton, FL, 2004, 1211 p.
13. Rosenfeld A, Doermann D, DeMenthon D. Video Mining. Kluwer, Boston, 2003, 340 p.
14. Zaiane OR, Simoff SJ, Djeraba Ch. Mining Multimedia and Complex Data. Lecture Notes in Artificial Intelligence, Vol. 2797. Springer, New York, 2003, 280 p.
15. Kulesh V, Petrushin VA, Sethi IK. Video clip recognition using joint audio-visual processing model. In Proceedings of 16th International Conference on Pattern Recognition
18. Stamou G, Kollias S. Multimedia Content and the Semantic Web. Wiley, New York, 2005, 392 p.
19. Kulesh V, Petrushin VA, Sethi IK. PERSEUS: Personalized multimedia news portal. In Proceedings of IASTED Intl. Conf. on Artificial Intelligence and Applications, September 4–7, 2001, Marbella, Spain, pp. 307–312.
20. Kosch H. Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, Boca Raton, FL, 2004, 258 p.
21. Bormans J, Hill K. MPEG-21 Multimedia framework. Overview. Available at: http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm
22. The Dublin Core Metadata Initiative website. http://dublincore.org/
23. Resource Description Framework (RDF). http://www.w3.org/RDF/
24. Simple Knowledge Organisation System (SKOS). http://www.w3.org/2004/02/skos/
25. Thesaurus for Graphic Materials (TGM). http://www.loc.gov/rr/print/tgm1/
26. OWL Web Ontology Language Overview. http://www.w3.org/TR/owl-features/


2 Multimedia Data Mining: An Overview

Nilesh Patel and Ishwar Sethi

Summary. Data mining has traditionally been applied to well-structured data. With the explosion of multimedia data (videos, audio, images, and Web pages), many researchers have in recent years felt the need for data mining methods that deal with unstructured data. This chapter provides an overview of data mining efforts aimed at multimedia data. We identify examples of pattern discovery models that have been addressed by different researchers and provide an overview of such methods.

The mining of multimedia data is more involved than that of traditional business data because multimedia data are unstructured by nature. There are no well-defined fields of data with precise and unambiguous meaning, and the data must be processed to arrive at fields that can provide content information about them. Such processing often leads to nonunique results with several possible interpretations. In fact, multimedia data are often subject to varied interpretations even by human beings. For example, it is not uncommon for different experts, such as radiologists, to interpret the same image differently. Another difficulty in mining multimedia data is their heterogeneous nature. The data are often the output of various kinds of sensor modalities, with each modality needing its own way of processing. Yet another distinguishing aspect of multimedia data is their sheer volume. All these characteristics of multimedia data make mining them challenging and interesting.

The goal of this chapter is to survey the existing multimedia data mining methods and their applications. The organization of the chapter is as follows. In Section 2.2, we describe the basic data mining architecture for multimedia data and discuss aspects of data mining that are specific to multimedia data. Section 2.3 provides an overview of representative features used in multimedia data mining. It also discusses the issues of feature fusion. Section 2.4 describes multimedia data mining efforts for concept mining through supervised techniques. Methods for concept mining through clustering are discussed in Section 2.5. Section 2.6 discusses concept mining through the exploitation of contextual information. Event and feature discovery research is addressed in Section 2.7. Finally, a summary of the chapter is provided in Section 2.8.

2.2 Multimedia Data Mining Architecture

The typical data mining process consists of several stages, and the overall process is inherently interactive and iterative. The main stages of the data mining process are (1) domain understanding; (2) data selection; (3) cleaning and preprocessing; (4) discovering patterns; (5) interpretation; and (6) reporting and using discovered knowledge [1]. The domain understanding stage requires learning how the results of data mining will be used so as to gather all relevant prior knowledge before mining. Blind application of data mining techniques without the requisite domain knowledge often leads to the discovery of irrelevant or meaningless patterns. For example, while mining sports video for a particular sport, such as cricket, it is important to have a good knowledge and understanding of the game to detect interesting strokes used by batsmen.

The data selection stage requires the user to target a database or select a subset of fields or data records to be used for data mining. A proper domain understanding at this stage helps in the identification of useful data. This is the most time-consuming stage of the entire data mining process for business applications; data are never clean and in a form suitable for data mining. For multimedia data mining, this stage is generally not an issue because the data are not in relational form and there are no subsets of fields to choose from.

The next stage in a typical data mining process is the preprocessing step, which involves integrating data from different sources and making choices about representing or coding certain data fields that serve as inputs to the pattern discovery stage. Such representation choices are needed because certain fields may contain data at levels of detail not considered suitable for the pattern discovery stage. The preprocessing stage is of considerable importance in multimedia data mining, given the unstructured nature of multimedia data.

The pattern discovery stage is the heart of the entire data mining process. It is the stage where the hidden patterns and trends in the data are actually uncovered. There are several approaches to the pattern discovery stage. These include association, classification, clustering, regression, time-series analysis, and visualization. Each of these approaches can be implemented through one of several competing methodologies, such as statistical data analysis, machine learning, neural networks, and pattern recognition. It is because of the use of methodologies from several disciplines that data mining is often viewed as a multidisciplinary field.

The interpretation stage of the data mining process is used to evaluate the quality of discovery and its value, in order to determine whether previous stages should be revisited. Proper domain understanding is crucial at this stage to put a value on discovered patterns. The final stage of the data mining process consists of reporting and putting to use the discovered knowledge to generate new actions, products and services, or marketing strategies, as the case may be. An example of reporting for multimedia data mining is the scout system from IBM [2], in which the mined results are used by coaches to design new moves.

The architecture, shown in Figure 2.1, captures the above stages of data mining in the context of multimedia data. The broken arrows on the left in Figure 2.1 indicate that the process is iterative. The arrows emanating from the domain knowledge block on the right indicate that domain knowledge guides certain stages of the mining process.
