
DOCUMENT INFORMATION

Basic information

Title: Pattern Recognition and Image Analysis
Authors: Joan Martí, José Miguel Benedí, Ana Maria Mendonça, Joan Serrat
Institution: University of Girona
Field: Pattern Recognition and Image Analysis
Type: Proceedings
Publication year: 2007
City: Girona
Pages: 674
File size: 23.95 MB



Lecture Notes in Computer Science 4478

Commenced Publication in 1973

Founding and Former Series Editors:

Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen


Joan Martí José Miguel Benedí

Ana Maria Mendonça Joan Serrat (Eds.)

Pattern Recognition and Image Analysis

Third Iberian Conference, IbPRIA 2007 Girona, Spain, June 6-8, 2007

Proceedings, Part II



José Miguel Benedí

Polytechnical University of Valencia

Camino de Vera, s/n., 46022 Valencia, Spain

Joan Serrat

Centre de Visió per Computador-UAB

Campus UAB, 08193 Bellaterra (Cerdanyola), Barcelona, Spain

E-mail: joan.serrat@cvc.uab.es

Library of Congress Control Number: 2007927717

CR Subject Classification (1998): I.4, I.5, I.7, I.2.7, I.2.10

LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics

ISBN-10 3-540-72848-1 Springer Berlin Heidelberg New York

ISBN-13 978-3-540-72848-1 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media

A record number of 328 full paper submissions from 27 countries were received. Each of these submissions was reviewed in a blind process by two reviewers. The review assignments were determined by the four General Chairs, and the final decisions were made after the Chairs meeting in Girona, giving an overall acceptance rate of 47.5%. Because of the limited size of the conference, we regret that some worthy papers were probably rejected.

In keeping with the IbPRIA tradition of having a single track of oral presentations, the number of oral papers remained in line with the previous IbPRIA editions, with a total of 48 papers. The number of poster papers was settled to 108.

We were also very honored to have as invited speakers such internationally recognized researchers as Chris Williams from the University of Edinburgh, UK, Michal Irani from The Weizmann Institute of Science, Israel, and Andrew Davison from Imperial College London, UK.

For the first time, some relevant related events were scheduled in parallel to the IbPRIA main conference according to the Call for Tutorials and Workshops: Antonio Torralba from MIT, USA and Aleix Martínez from Ohio State University, USA taught relevant tutorials about object recognition and Statistical Pattern Recognition, respectively, while the “Supervised and Unsupervised Ensemble Methods and Their Applications” workshop and the first edition of the “Spanish Workshop on Biometrics” were successfully developed.

We would like to thank all the authors for submitting their papers and thus making these proceedings possible. We address special thanks to the members of the Program Committee and the additional reviewers for their great work which contributed to the high quality of these proceedings.

We are also grateful to the Local Organizing Committee for their substantial contribution of time and effort.

Finally, our thanks go to IAPR for support in sponsoring the Best Paper Prize at IbPRIA 2007.

The next edition of IbPRIA will be held in Portugal in 2009.

Joan Martí
José Miguel Benedí
Ana Maria Mendonça
Joan Serrat

IbPRIA 2007 was organized by AERFAI (Asociación Española de Reconocimiento de Formas y Análisis de Imágenes) and APRP (Associação Portuguesa de Reconhecimento de Padrões), and, as the local organizer of this edition, the Computer Vision and Robotics Group, Institute of Informatics and Applications, University of Girona (UdG).

General Conference Co-chairs

Joan Martí University of Girona, Spain

José Miguel Benedí Polytechnical University of Valencia, Spain

Ana Maria Mendonça University of Porto, Portugal

Joan Serrat Universitat Autònoma de Barcelona, Spain

Invited Speakers

Chris Williams University of Edinburgh, UK

Michal Irani The Weizmann Institute of Science, Israel

Andrew Davison Imperial College London, UK

National Organizing Committee


Francisco Casacuberta Polytechnical University of Valencia, Spain
Vicent Caselles Universitat Pompeu Fabra, Spain

Aur´elio Campilho University of Porto, Portugal

Lu´ıs Corte-Real University of Porto, Portugal

Pierre Dupont Université catholique de Louvain, Belgium
Marcello Federico ITC-irst Trento, Italy

Vito di Ges´u University of Palermo, Italy

Francisco Mario Hern´andez Tejera Universidad de Las Palmas, Spain

Laurent Heutte Universit´e de Rouen, France

Jos´e Manuel I˜nesta Quereda Universidad de Alicante, Spain

Jorge Marques Technical University of Lisbon, Portugal

Wiro Niessen University of Utrecht, The Netherlands
Francisco José Perales Universitat de les Illes Balears, Spain
Nicolás Pérez de la Blanca University of Granada, Spain

Fernando P´erez Cruz Universidad Carlos III, Spain

Ioannis Pitas University of Thessaloniki, Greece

Alberto Sanfeliu Polytechnical University of Catalonia, Spain
Gabriella Sanniti di Baja Istituto di Cibernetica CNR, Italy


Pierre Soille Joint Research Centre, Italy

M In´es Torres University of the Basque Country, Spain

Jordi Vitri`a Universitat Aut`onoma de Barcelona, Spain

Joachim Weickert Saarland University, Germany

Reyer Zwiggelaar University of Wales, Aberystwyth, UK

Reviewers

Maria Jos´e Abasolo University of the Balearic Islands, Spain

Antonio Adán Universidad de Castilla La Mancha, Spain
Francisco Javier López Aligué University of Extremadura, Spain

Joachim Buhmann ETH Zurich, Switzerland

Juan Carlos Amengual UJI-LPI, Spain

Hans Burkhard University of Freiburg, Germany

Ramon Baldrich Computer Vision Center, Spain

Jorge Pereira Batista ISR Coimbra, Portugal

Alexandre Bernardino Instituto Superior T´ecnico, Portugal

Lilian Blot University of East Anglia, UK

Marcello Federico ITC-irst Trento, Italy

Michael Breuss Saarland University, Germany

Jaime Santos Cardoso INESC Porto, Portugal

Modesto Castrillón Universidad de Las Palmas de Gran Canaria, Spain

Miguel Velhote Correia Instituto de Engenharia Biomédica, Portugal

Jorge Alves da Silva FEUB-INEB, Portugal

Hans du Buf University of Algarve, Portugal

Óscar Deniz Universidad de Las Palmas de Gran Canaria, Spain

Daniel Hernández-Sosa Universidad de Las Palmas de Gran Canaria, Spain

Claudio Eccher ITC-irst Trento, Italy

Arturo De la Escalera Universidad Carlos III de Madrid, Spain
Miquel Feixas Universitat de Girona, Spain

Francesc J Ferri Universitat de Val`encia, Spain

Jordi Freixenet University of Girona, Spain

Maria Frucci Institute of Cybernetics “E. Caianiello”, Italy
Cesare Furlanello ITC-irst Trento, Italy

Miguel Ángel García Universidad Autónoma de Madrid, Spain
Rafael García University of Girona, Spain


Yolanda Gonz´alez Universidad de las Islas Baleares, Spain

Manuel Gonz´alez Universitat de les Illes Balears, Spain

Nuno Gracias University of Girona, Spain

Nicol´as Guil University of Malaga, Spain

Alfons Juan Universitat Polit`ecnica de Val`encia, Spain

Fr´ed´eric Labrosse University of Wales, Aberystwyth, UK

Bart Lamiroy Nancy Université - LORIA - INPL, France
Xavier Lladó University of Girona, Spain

Paulo Lobato Correia IT - IST, Portugal

Ángeles López Universitat Jaume I, Spain

Javier Lorenzo Universidad de Las Palmas de Gran Canaria, Spain

Manuel Lucena Universidad de Jaén, Spain

Enric Martí Universitat Autònoma de Barcelona, Spain
Robert Martí Universitat de Girona, Spain

Elisa Martínez Enginyeria La Salle, Universitat Ramon Llull, Spain

Carlos Martínez Hinarejos Universidad Politécnica de Valencia, Spain
Fabrice Meriaudeau Le2i UMR CNRS 5158, France

Maria Luisa Mic´o Universidad de Alicante, Spain

Birgit Möller Martin Luther University Halle-Wittenberg, Germany

Ramón Mollineda Universidad Jaume I, Spain

Jacinto Nascimento Instituto de Sistemas e Robótica, Portugal
Shahriar Negahdaripour University of Miami, USA

Gabriel A Oliver-Codina University of the Balearic Islands, Spain

Jos´e Oncina Universidad de Alicante, Spain

João Paulo Costeira Instituto de Sistemas e Robótica, Portugal
Antonio Miguel Peinado Universidad de Granada, Spain

Caroline Petitjean Universit´e de Rouen, France

Andr´e Teixeira Puga Universidade do Porto, Portugal

Petia Radeva Computer Vision Center-UAB, Spain

Joao Miguel Raposo Sanches Instituto Superior T´ecnico, Portugal

Antonio Rubio Universidad de Granada, Spain

Jos´e Ruiz Shulcloper Advanced Technologies Application Center, Cuba

J Salvador S´anchez Universitat Jaume I, Spain

Joaquim Salvi University of Girona, Spain

Joan Andreu S´anchez Universitat Polit`ecnica de Val`encia, Spain

Elena S´anchez Nielsen Universidad de La Laguna, Spain


Joao Silva Sequeira Instituto Superior T´ecnico, Portugal

Margarida Silveira Instituto Superior T´ecnico, Portugal

Joao Manuel R.S Tavares Universidade do Porto, Portugal

Antonio Teixeira Universidade de Aveiro, Portugal

Javier Traver Universitat Jaume I, Spain

Maria Vanrell Computer Vision Center, Spain

Javier Varona Universitat de les Illes Balears, Spain

Martin Welk Saarland University, Germany

Laurent Wendling LORIA, France

Michele Zanin ITC-irst Trento, Italy

Sponsoring Institutions

MEC (Ministerio de Educaci´on y Ciencia, Spanish Government)

AGAUR (Agència de Gestió d’Ajuts Universitaris i de Recerca, Catalan Government)

IAPR (International Association for Pattern Recognition)

Vicerectorat de Recerca en Ci`encia i Tecnologia, Universitat de Girona


Table of Contents – Part II

Robust Automatic Speech Recognition Using PD-MEEMLIN . 1

Igmar Hern´ andez, Paola Garc´ıa, Juan Nolazco, Luis Buera, and

Eduardo Lleida

Shadow Resistant Road Segmentation from a Mobile Monocular

System . 9

Jos´ e Manuel ´ Alvarez, Antonio M L´ opez, and Ramon Baldrich

Mosaicking Cluttered Ground Planes Based on Stereo Vision . 17

Jos´ e Gaspar, Miguel Realpe, Boris Vintimilla, and

Jos´ e Santos-Victor

Fast Central Catadioptric Line Extraction . 25

Jean Charles Bazin, C´ edric Demonceaux, and Pascal Vasseur

Similarity-Based Object Retrieval Using Appearance and Geometric

Feature Combination . 33

Agn´ es Borr` as and Josep Llad´ os

Real-Time Facial Expression Recognition for Natural Interaction . 40

Eva Cerezo, Isabelle Hupont, Cristina Manresa-Yee, Javier Varona,

Sandra Baldassarri, Francisco J Perales, and Francisco J Seron

A Simple But Effective Approach to Speaker Tracking in Broadcast

News . 48

Luis Javier Rodr´ıguez, Mikel Pe˜ nagarikano, and Germ´ an Bordel

Region-Based Pose Tracking . 56

Christian Schmaltz, Bodo Rosenhahn, Thomas Brox,

Daniel Cremers, Joachim Weickert, Lennart Wietzke, and

Gerald Sommer

Testing Geodesic Active Contours . 64

A Caro, T Alonso, P.G Rodr´ıguez, M.L Dur´ an, and M.M ´ Avila

Rate Control Algorithm for MPEG-2 to H.264/AVC Transcoding . 72

Gao Chen, Shouxun Lin, and Yongdong Zhang

3-D Motion Estimation for Positioning from 2-D Acoustic Video

Imagery . 80

H Sekkati and S Negahdaripour

Progressive Compression of Geometry Information with Smooth

Intermediate Meshes . 89

Taejung Park, Haeyoung Lee, and Chang-hun Kim


Rejection Strategies Involving Classifier Combination for Handwriting

Recognition . 97

Jose A Rodr´ıguez, Gemma S´ anchez, and Josep Llad´ os

Summarizing Image/Surface Registration for 6DOF Robot/Camera

Pose Estimation . 105

Elisabet Batlle, Carles Matabosch, and Joaquim Salvi

Robust Complex Salient Regions . 113

Sergio Escalera, Oriol Pujol, and Petia Radeva

Improving Piecewise-Linear Registration Through Mesh

Optimization . 122

Vicente Ar´ evalo and Javier Gonz´ alez

Registration-Based Segmentation Using the Information Bottleneck

Method . 130

Anton Bardera, Miquel Feixas, Imma Boada, Jaume Rigau, and

Mateu Sbert

Dominant Points Detection Using Phase Congruence . 138

Francisco José Madrid-Cuevas, Rafel Medina-Carnicer,

Ángel Carmona-Poyato, and Nicolás Luis Fernández-García

Exploiting Information Theory for Filtering the Kadir Scale-Saliency

Detector . 146

Pablo Suau and Francisco Escolano

False Positive Reduction in Breast Mass Detection Using

Two-Dimensional PCA . 154

Arnau Oliver, Xavier Llad´ o, Joan Mart´ı, Robert Mart´ı, and

Jordi Freixenet

A Fast and Robust Iris Segmentation Method . 162

No´ e Otero-Mateo, Miguel ´ Angel Vega-Rodr´ıguez,

Juan Antonio G´ omez-Pulido, and

Juan Manuel S´ anchez-P´ erez

Detection of Lung Nodule Candidates in Chest Radiographs . 170

Carlos S Pereira, Hugo Fernandes, Ana Maria Mendon¸ ca, and

Aur´ elio Campilho

A Snake for Retinal Vessel Segmentation . 178

L Espona, M.J Carreira, M Ortega, and M.G Penedo

Risk Classification of Mammograms Using Anatomical Linear Structure

and Density Information . 186

Edward M Hadley, Erika R.E Denton, Josep Pont,

Elsa P´ erez, and Reyer Zwiggelaar


A New Method for Robust and Efficient Occupancy Grid-Map

Nazife Dimililer, Ekrem Varo˘ glu, and Hakan Altın¸ cay

Boundary Shape Recognition Using Accumulated Length and Angle

Information . 210

Mar¸ cal Rusi˜ nol, Philippe Dosch, and Josep Llad´ os

Extracting Average Shapes from Occluded Non-rigid Motion . 218

Alessio Del Bue

Automatic Topological Active Net Division in a Genetic-Greedy Hybrid

Approach . 226

N Barreira, M.G Penedo, O Ib´ a˜ nez, and J Santos

Using Graphics Hardware for Enhancing Edge and Circle Detection . 234

Antonio Ruiz, Manuel Ujald´ on, and Nicol´ as Guil

Optimally Discriminant Moments for Speckle Detection in Real B-Scan

Images . 242

Robert Mart´ı, Joan Mart´ı, Jordi Freixenet,

Joan Carles Vilanova, and Joaquim Barcel´ o

Influence of Resampling and Weighting on Diversity and Accuracy of

Classifier Ensembles . 250

R.M Valdovinos, J.S S´ anchez, and E Gasca

A Hierarchical Approach for Multi-task Logistic Regression . 258

Àgata Lapedriza, David Masip, and Jordi Vitrià

Modelling of Magnetic Resonance Spectra Using Mixtures for Binned

and Truncated Data . 266

Juan M Garcia-Gomez, Montserrat Robles, Sabine Van Huffel, and

Alfons Juan-C´ıscar

Atmospheric Turbulence Effects Removal on Infrared Sequences

Degraded by Local Isoplanatism . 274

Magali Lemaitre, Olivier Laligant, Jacques Blanc-Talon, and


Word Spotting in Archive Documents Using Shape Contexts . 290

Josep Llad´ os, Partha Pratim-Roy, Jos´ e A Rodr´ıguez, and

Gemma S´ anchez

Fuzzy Rule Based Edge-Sensitive Line Average Algorithm in Interlaced

HDTV Sequences . 298

Gwanggil Jeon, Jungjun Kim, Jongmin You, and Jechang Jeong

A Tabular Pruning Rule in Tree-Based Fast Nearest Neighbor Search

Algorithms . 306

Jose Oncina, Franck Thollard, Eva G´ omez-Ballester,

Luisa Mic´ o, and Francisco Moreno-Seco

A General Framework to Deal with the Scaling Problem in

Phrase-Based Statistical Machine Translation . 314

Daniel Ortiz, Ismael Garc´ıa Varea, and Francisco Casacuberta

Recognizing Individual Typing Patterns . 323

Michal Chora´ s and Piotr Mroczkowski

Residual Filter for Improving Coding Performance of Noisy Video

Sequences . 331

Won Seon Song, Seong Soo Lee, and Min-Cheol Hong

Cyclic Viterbi Score for Linear Hidden Markov Models . 339

Vicente Palaz´ on and Andr´ es Marzal

Non Parametric Classification of Human Interaction . 347

Scott Blunsden, Ernesto Andrade, and Robert Fisher

A Density-Based Data Reduction Algorithm for Robust Estimators . 355

L Ferraz, R Felip, B Mart´ınez, and X Binefa

Robust Estimation of Reflectance Functions from Polarization . 363

Gary A Atkinson and Edwin R Hancock

Estimation of Multiple Objects at Unknown Locations with Active

Contours . 372

Margarida Silveira and Jorge S Marques

Analytic Reconstruction of Transparent and Opaque Surfaces from

Texture Images . 380

Mohamad Ivan Fanany and Itsuo Kumazawa

Sedimentological Analysis of Sands . 388

Cristina Lira and Pedro Pina

Catadioptric Camera Calibration by Polarization Imaging . 396

O. Morel, R. Seulin, and D. Fofi


Stochastic Local Search for Omnidirectional Catadioptric Stereovision

Design . 404

G Dequen, L Devendeville, and E Mouaddib

Dimensionless Monocular SLAM . 412

Javier Civera, Andrew J Davison, and J.M.M Montiel

Improved Camera Calibration Method Based on a Two-Dimensional

Template . 420

Carlos Ricolfe-Viala and Antonio-Jose Sanchez-Salmeron

Relative Pose Estimation of Surgical Tools in Assisted Minimally

Invasive Surgery . 428

Agustin Navarro, Edgar Villarraga, and Joan Aranda

Efficiently Downdating, Composing and Splitting Singular Value

Decompositions Preserving the Mean Information . 436

Javier Melench´ on and Elisa Mart´ınez

On-Line Classification of Human Activities . 444

J.C Nascimento, M.A.T Figueiredo, and J.S Marques

Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy

Speech Recognition . 452

Yong-Joo Chung and Keun-Sung Bae

Development of a Computer Vision System for the Automatic Quality

Grading of Mandarin Segments . 460

Jos´ e Blasco, Sergio Cubero, Ra´ ul Arias, Juan G´ omez,

Florentino Juste, and Enrique Molt´ o

Mathematical Morphology in the HSI Colour Space . 467

M.C Tobar, C Platero, P.M Gonz´ alez, and G Asensio

Improving Background Subtraction Based on a Casuistry of

Colour-Motion Segmentation Problems . 475

I Huerta, D Rowe, M Mozerov, and J Gonz` alez

Random Forest for Gene Expression Based Cancer Classification:

Overlooked Issues . 483

Oleg Okun and Helen Priisalu

Bounding the Size of the Median Graph . 491

Miquel Ferrer, Ernest Valveny, and Francesc Serratosa

When Overlapping Unexpectedly Alters the Class Imbalance Effects . 499

V Garc´ıa, R.A Mollineda, J.S S´ anchez, R Alejo, and J.M Sotoca

A Kernel Matching Pursuit Approach to Man-Made Objects Detection

in Aerial Images . 507

Wei Wang, Xin Yang, and Shoushui Chen


Anisotropic Continuous-Scale Morphology . 515

Michael Breuß, Bernhard Burgeth, and Joachim Weickert

Three-Dimensional Ultrasonic Assessment of Atherosclerotic Plaques . 523

Jos´ e Seabra, Jo˜ ao Sanches, Lu´ıs M Pedro, and

J Fernandes e Fernandes

Measuring the Applicability of Self-organization Maps in a Case-Based

Reasoning System . 532

A Fornells, E Golobardes, J.M Martorell, J.M Garrell,

E Bernad´ o, and N Maci` a

Algebraic-Distance Minimization of Lines and Ellipses for Traffic Sign

Shape Localization . 540

Pedro Gil-Jim´ enez, Saturnino Maldonado-Basc´ on,

Hilario G´ omez-Moreno, Sergio Lafuente-Arroyo, and

Javier Acevedo-Rodr´ıguez

Modeling Aceto-White Temporal Patterns to Segment Colposcopic

Images . 548

H´ ector-Gabriel Acosta-Mesa, Nicandro Cruz-Ram´ırez,

Rodolfo Hern´ andez-Jim´ enez, and

Daniel-Alejandro Garc´ıa-L´ opez

Speech/Music Classification Based on Distributed Evolutionary Fuzzy

Logic for Intelligent Audio Coding . 556

J.E Mu˜ noz Exp´ osito, N Ruiz Reyes, S Garcia Gal´ an, and

P Vera Candeas

Breast Skin-Line Segmentation Using Contour Growing . 564

Robert Mart´ı, Arnau Oliver, David Raba, and Jordi Freixenet

New Measure for Shape Elongation . 572

Miloˇ s Stojmenovi´ c and Joviˇ sa ˇ Zuni´ c

Evaluation of Spectral-Based Methods for Median Graph

Computation . 580

Miquel Ferrer, Francesc Serratosa, and Ernest Valveny

Feasible Application of Shape-Based Classification . 588

A Caro, P.G Rodr´ıguez, T Antequera, and R Palacios

3D Shape Recovery with Registration Assisted Stereo Matching . 596

Huei-Yung Lin, Sung-Chung Liang, and Jing-Ren Wu

Blind Estimation of Motion Blur Parameters for Image

Deconvolution . 604

Jo˜ ao P Oliveira, M´ ario A.T Figueiredo, and Jos´ e M Bioucas-Dias


Dependent Component Analysis: A Hyperspectral Unmixing

Algorithm . 612

Jos´ e M.P Nascimento and Jos´ e M Bioucas-Dias

Synchronization of Video Sequences from Free-Moving Cameras . 620

Joan Serrat, Ferran Diego, Felipe Lumbreras, and

Jos´ e Manuel ´ Alvarez

Tracking the Left Ventricle in Ultrasound Images Based on Total

Variation Denoising . 628

Jacinto C Nascimento, Jo˜ ao M Sanches, and Jorge S Marques

Bayesian Oil Spill Segmentation of SAR Images Via Graph Cuts . 637

S´ onia Pelizzari and Jos´ e M Bioucas-Dias

Unidimensional Multiscale Local Features for Object Detection Under

Rotation and Mild Occlusions . 645

Michael Villamizar, Alberto Sanfeliu, and Juan Andrade Cetto

Author Index 653


Robust Automatic Speech Recognition Using

PD-MEEMLIN

Igmar Hernández¹, Paola García¹, Juan Nolazco¹, Luis Buera², and Eduardo Lleida²

¹ Computer Science Department, Tecnológico de Monterrey, Campus Monterrey, México

² Communications Technology Group (GTC), I3A, University of Zaragoza, Spain

{A00778595,paola.garcia,jnolazco,}@itesm.mx, {lbuera,lleida}@unizar.es

Abstract. This work presents a robust normalization technique by cascading a speech enhancement method followed by a feature vector normalization algorithm. To provide speech enhancement, the Spectral Subtraction (SS) algorithm is used; this method reduces the effect of additive noise by performing a subtraction of the noise spectrum estimate over the complete speech spectrum. On the other hand, an empirical feature vector normalization technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based LInear Normalization) has also shown to be effective. PD-MEMLIN models clean and noisy spaces employing Gaussian Mixture Models (GMMs), and estimates a set of linear compensation transformations to be used to clean the signal. The proper integration of both approaches is studied and the final design, PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based LInear Normalization), confirms and improves the effectiveness of both approaches. The results obtained show that for very highly degraded speech PD-MEEMLIN outperforms the SS by a range between 11.4% and 34.5%, and PD-MEMLIN by a range between 11.7% and 24.84%. Furthermore, at moderate SNR, i.e. 15 or 20 dB, PD-MEEMLIN is as good as the PD-MEMLIN and SS techniques.

1 Introduction

The robust speech recognition field plays a key role in real environment applications. Noise can degrade speech signals, causing harmful effects in Automatic Speech Recognition (ASR) tasks. Even though there have been great advances in the area, robustness still remains an issue. Noticing this problem, several techniques have been developed over the years, for instance the Spectral Subtraction algorithm (SS) [1]; and in the last decade, SPLICE (State Based Piecewise Linear Compensation for Environments) [2], PMC (Parallel Model Combination) [3], RATZ (multivariate Gaussian based cepstral normalization) [4] and RASTA (the RelAtive SpecTrAl Technique) [5]. The research that followed this evolution was to make a proper combination of algorithms in order to reduce the noise effects. A good example is described in [6], where the core scheme is composed of a Continuous SS (CSS) and PMC.

ef-J Mart´ı et al (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp 1–8, 2007.

c

 Springer-Verlag Berlin Heidelberg 2007


Pursuing the same idea, a combination of the speech enhanced signal (represented by the SS method) and a feature vector normalization technique (PD-MEMLIN [7]) are presented in this work to improve the recognition accuracy of the speech recognition system in highly degraded environments [8,9]. The first technique was selected because of its implementation simplicity and good performance. The second one is an empirical vector normalization technique that has been compared against some other algorithms [8] and has obtained important improvements.

The organization of the paper is as follows. In Section 2, a brief overview of the SS and PD-MEMLIN is given. Section 3 details the new method, PD-MEEMLIN. In Section 4, the experimental results are presented. Finally, the conclusions are shown in Section 5.

2 Spectral Subtraction and PD-MEMLIN

In order to evaluate the proposed integration, an ASR system is employed. In general, a pre-processing stage of the speech waveform is always desirable. The speech signal is divided into overlapped short windows, from which a set of coefficients, usually Mel Frequency Cepstral Coefficients (MFCCs) [10], are computed. The MFCCs are fed to the training algorithm that calculates the acoustic models. The acoustic models used in this research are the Hidden Markov Models (HMMs), which are widely used to model statistically the behaviour of the phonetic events in speech [10]. The HMMs employ a sequence of hidden states which characterises how a random process (speech in this case) evolves in time. Although the states are not observable, a sequence of realizations from these states can always be obtained. Associated to each state there is a probability density function, normally a mixture of Gaussians. The criterion used to train the HMMs is Maximum Likelihood; thus, the training process becomes an optimization problem that can be solved iteratively with the Baum-Welch algorithm.
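As a rough illustration of this front-end (not the exact configuration used in the paper), the sketch below frames a waveform, applies a Hamming window, a mel filterbank and a DCT to produce MFCC-like coefficients; the 8 kHz rate, frame sizes and filterbank size are illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frames(signal, fs=8000, frame_len=0.025, hop=0.010,
                n_fft=512, n_mels=26, n_ceps=13):
    """Illustrative MFCC front-end: framing, windowing, power spectrum,
    mel filterbank, log compression and DCT."""
    # Pre-emphasis and framing into overlapped windows
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + max(0, (len(emph) - flen) // fhop)
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(flen)

    # Short-time power spectrum of every frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log filterbank energies, then DCT to decorrelate them
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

An HMM toolkit would then be trained on these vectors with the Baum-Welch procedure described above.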

2.1 Spectral Subtraction

The Spectral Subtraction (SS) algorithm is a simple and well-known speech enhancement technique. This research is based on the SS algorithm expressed in [9]. It has the property that it does not require the use of an explicit voice activity detector, as general SS algorithms do. The algorithm is based on the existence of peaks and valleys in a short noisy speech time subband power estimate [9]. The peaks correspond to the speech activity and the valleys are used to obtain an estimate of the subband noise power. So, a reliable noise estimation is obtained using a large enough window that permits the detection of any peak of speech activity.

As shown in Figure 1, this algorithm performs a modification of the short time spectral magnitude of the noisy speech signal during the process of enhancement. Hence, the output signal can be considered close to the clean speech signal when


Fig. 1. Diagram of the Basic SS Method Used

synthesized. The appropriate computation of the spectral magnitude is obtained with the noise power estimate and the SS algorithm. Let y(i) = x(i) + n(i), where y(i) is the noisy speech signal, x(i) is the clean speech signal, n(i) is the noise signal and i denotes the time index; x(i) and n(i) are statistically independent.

Figure 1 depicts the spectral analysis in which the frames in the time domain data are windowed and converted to the frequency domain using the Discrete Fourier Transform (DFT) filter bank with W_DFT subbands and with a decimation/interpolation ratio named R [9]. After the computation of the noise power estimation and the spectral weighting, the enhanced signal can be transformed back to the time domain using the Inverse Discrete Fourier Transform (IDFT). For the subtraction algorithm it is necessary to estimate the subband noise power Pn(λ, k) and the short time signal power |Y(λ, k)|², where λ is the decimated time index and k are the frequency bins of the DFT. A first order recursive network is used to obtain the smoothed short time signal power as shown in Equation 1:

|Ȳ(λ, k)|² = γ · |Ȳ(λ − 1, k)|² + (1 − γ) · |Y(λ, k)|².   (1)

Afterwards, the subtraction algorithm is accomplished using an oversubtraction factor osub(λ, k) and a spectral flooring constant (subf) [12]. The osub(λ, k) factor is needed to eliminate the musical noise, and it is calculated as a function of the subband Signal to Noise Ratio SNRy(λ, k), λ and k (for a high SNR and high frequencies less osub factor is required; for a low SNR and low frequencies the osub is less). The subf constant keeps the resultant spectral components from going below a minimum level. It is expressed as a fraction of the original noise power spectrum. The final relation of the spectral subtraction between subf and osub is defined by Equation 2.


constant to obtain the periodograms. Then, Pn(λ, k) is calculated as a weighted minimum of Px(λ, k) in a window of D subband samples. Hence,

Pn(λ, k) = omin · Pmin(λ, k),   (3)

where Pmin(λ, k) denotes the estimated minimum power and omin is a bias compensation factor. The data window D is divided into W windows of length M, allowing the minimum to be updated every M samples without time-consuming computations. This noise estimator combined with the spectral subtraction has the ability to preserve weak speech sounds. If a short time subband power is observed, the valleys correspond to the noisy speech signal and are used to estimate the subband noise power.

The last element to be calculated is the SNRy(λ, k) in Equation 4, which controls the oversubtraction factor osub(λ, k).

Up to this stage osub(λ, k) and subf can be selected and the spectral subtraction algorithm can be computed.
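The exact forms of Equations (2) and (4) are not reproduced above, so the sketch below should be read as an assumption: it combines the smoothed subband power of Eq. (1) and the minimum-statistics noise estimate of Eq. (3) with a Berouti-style oversubtraction and flooring rule [12] and a logarithmic subband SNR, which is the usual shape of such rules rather than a transcription of the paper's expressions. The osub schedule and window length D are also illustrative.

```python
import numpy as np

def spectral_subtraction(Y_power, gamma=0.9, omin=1.5, subf=0.04,
                         D=96, osub_max=4.6, osub_min=0.4):
    """Minimum-statistics spectral subtraction sketch.

    Y_power : (frames, subbands) array of |Y(lambda, k)|^2.
    Returns the enhanced power spectrum |X_hat(lambda, k)|^2.
    """
    n_frames, _ = Y_power.shape
    P = np.zeros_like(Y_power)          # smoothed short-time power, Eq. (1)
    X_hat = np.zeros_like(Y_power)
    for lam in range(n_frames):
        prev = P[lam - 1] if lam > 0 else Y_power[0]
        P[lam] = gamma * prev + (1.0 - gamma) * Y_power[lam]

        # Noise power: bias-compensated minimum over a window of D frames, Eq. (3)
        lo = max(0, lam - D + 1)
        P_n = omin * P[lo:lam + 1].min(axis=0)

        # Subband SNR (assumed logarithmic) drives the oversubtraction factor
        snr_db = 10.0 * np.log10(np.maximum(P[lam], 1e-12) /
                                 np.maximum(P_n, 1e-12))
        osub = np.clip(osub_max - (osub_max - osub_min) * snr_db / 20.0,
                       osub_min, osub_max)

        # Berouti-style subtraction with spectral flooring (assumed form of Eq. (2))
        sub = Y_power[lam] - osub * P_n
        X_hat[lam] = np.where(sub > subf * P_n, sub, subf * P_n)
    return X_hat
```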

2.2 PD-MEMLIN

PD-MEMLIN is an empirical feature vector normalization technique which uses stereo data in order to estimate the different compensation linear transformations in a previous training process. The clean feature space is modelled as a mixture of Gaussians for each phoneme. The noisy space is split into several basic acoustic environments and each environment is modelled as a mixture of Gaussians for each phoneme. The transformations are estimated for all basic environments between a clean phoneme Gaussian and a noisy Gaussian of the same phoneme.

PD-MEMLIN approximations. Clean feature vectors, x, are modelled using a GMM for each phoneme, ph:

p^ph(x) = Σ_{s_x^ph} p(s_x^ph) · N(x; μ_{s_x^ph}, σ_{s_x^ph}),

where μ_{s_x^ph}, σ_{s_x^ph} and p(s_x^ph) are the mean vector, the diagonal covariance matrix, and the a priori probability associated with the clean model Gaussian s_x^ph.


Finally, clean feature vectors can be approximated as a linear function, f, of the noisy feature vector for each time frame t, which depends on the basic environments, the phonemes and the clean and noisy model Gaussians: x ≈ f(y_t, s_x^ph, s_y^{e,ph}) = y_t − r_{s_x^ph, s_y^{e,ph}}, where r_{s_x^ph, s_y^{e,ph}} is the bias vector transformation between noisy and clean feature vectors for each pair of Gaussians, s_x^ph and s_y^{e,ph}.

PD-MEMLIN enhancement. With those approximations, PD-MEMLIN transforms the Minimum Mean Square Error (MMSE) estimation expression, x̂_t = E[x | y_t], into a posterior-weighted combination of the bias-compensated terms, where p(e | y_t) is the a posteriori probability of the basic environment; p(ph | y_t, e) is the a posteriori probability of the phoneme, given the noisy feature vector and the environment; and p(s_y^{e,ph} | y_t, e, ph) is the a posteriori probability of the noisy model Gaussian, s_y^{e,ph}, given the feature vector, y_t, the basic environment, e, and the phoneme, ph. Those terms, p(e | y_t), p(ph | y_t, e) and p(s_y^{e,ph} | y_t, e, ph), are estimated from the noisy-space models.

3 PD-MEEMLIN

By combining both techniques, PD-MEEMLIN arises as an empirical feature vector normalization which estimates different linear transformations as PD-MEMLIN does, with the special property that a new enhanced space is obtained by applying SS to the noisy speech signal. Furthermore, this first-stage enhancement makes the noisy space get closer to the clean one, making the gap between them smaller. Figure 2 shows the PD-MEEMLIN architecture.

Next, the architecture modules are explained:

– The SS-enhancement of the noisy speech signal is performed; |X̂(λ, k)|, Pn(λ, k) and SNRy(λ, k) are calculated.

– Given the clean speech signal and the enhanced noisy speech signal, the clean and noisy-enhanced GMMs are obtained.


Fig. 2. PD-MEEMLIN Architecture

– In the testing stage, the noisy speech signal is also SS-enhanced and then normalized using PD-MEEMLIN.

– These normalized coefficients are forwarded to the decoder.

4 Experimental Results

All the experiments were performed employing the AURORA2 database [13], clean and noisy data based on TIDigits. Three types of noises were selected: Subway, Babble and Car from AURORA2, which go from -5dB to 20dB SNR. For every SNR the SS parameters osub and subf need to be configured. The parameter osub takes values from 0.4 to 4.6 (0.4 for 20dB, 0.7 for 15dB, 1.3 for 10dB, 2.21 for 5dB, 4.6 for 0dB and 4.6 for -5dB) and subf takes the values 0.03 or 0.04 (all SNR levels except 5dB optimised for 0.04). The phonetic acoustic models employed by PD-MEEMLIN are obtained from 22 phonemes and 1 silence. The model set is represented by a mixture of 32 Gaussians each. Besides, two new sets of each noise were used: PD-MEEMLIN needs one to estimate the enhanced-noisy model, and another to obtain the normalized coefficients. The feature vectors for the recognition process are built from 12 normalized MFCCs followed by the energy coefficient, its time-derivative Δ and the time-acceleration ΔΔ. For the training stage of the ASR system, the acoustic models of the 22 phonemes and the silence consist of three-state HMMs with a mixture of 8 Gaussians per state.
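The Δ and ΔΔ coefficients mentioned above are usually obtained by linear regression over a few neighbouring frames; a minimal sketch follows, where the window half-length K = 2 is an assumption rather than a value given in the paper.

```python
import numpy as np

def add_deltas(feat, K=2):
    """Append delta and delta-delta coefficients to a (frames, dims) feature
    matrix using d_t = sum_k k*(c_{t+k} - c_{t-k}) / (2*sum_k k^2)."""
    denom = 2.0 * sum(k * k for k in range(1, K + 1))

    def regression(x):
        padded = np.pad(x, ((K, K), (0, 0)), mode='edge')
        return sum(k * (padded[K + k:len(x) + K + k] -
                        padded[K - k:len(x) + K - k])
                   for k in range(1, K + 1)) / denom

    delta = regression(feat)          # time-derivative
    ddelta = regression(delta)        # time-acceleration
    return np.hstack([feat, delta, ddelta])
```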

The combined techniques show that for low noise conditions, i.e. SNR = 10, 15 or 20 dB, the difference between the original noisy space and the one approximated to the clean is similar. However, when the SNR is lower (-5dB or 0dB) the SS improves the performance of PD-MEMLIN. Comparing the combination of SS with PD-MEMLIN against the case where no techniques are applied, a significant improvement is shown. The results described before are presented in Tables 1, 2 and 3. The Tables show "Sent", which means the complete utterances


Table 1. Comparative Table for the ASR working with Subway Noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
-5dB    3.40   21.57    10.09   34.22    11.29   37.09    13.29   47.95
 0dB    9.09   29.05    20.18   53.71    27.07   61.88    30.87   69.71
 5dB   17.58   40.45    32.17   70.00    48.15   80.38    51.65   83.40
10dB   33.07   65.47    50.95   83.23    65.83   90.58    70.13   91.86
15dB   54.45   84.60    64.84   90.02    78.92   94.98    78.22   94.40
20dB   72.83   93.40    76.52   94.56    85.91   97.14    86.71   97.30

Table 2. Comparative Table for the ASR working with Babble Noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
 0dB   11.29   30.41    15.98   44.49    23.48   55.72    20.08   59.50
 5dB   20.58   44.23    30.37   65.11    48.75   80.55    49.25   83.70
10dB   40.86   72.85    50.25   80.93    74.93   94.20    69.33   91.48
15dB   69.03   90.54    69.93   90.56    84.12   96.86    81.32   95.54
20dB   82.42   96.17    83.52   95.84    88.91   98.09    88.01   97.98

Table 3. Comparative Table for the ASR working with Car Noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
10dB   28.77   58.13    54.25   82.72    70.83   92.15    70.93   91.90
15dB   57.84   84.04    68.03   90.51    82.02   96.16    81.42   95.86
20dB   78.32   94.61    81.42   95.30    87.01   97.44    87.81   97.77

percentage correctly recognised, and "Word" indicates the percentage of words correctly recognised. The gap between the clean and the noisy model, for the very highly degraded speech, has been shortened due to the advantages of both techniques. When PD-MEEMLIN is employed the performance is between 11.7% and 24.84% better than PD-MEMLIN, and between 11.4% and 34.5% better than SS.

5 Conclusions

In this work a robust normalization technique, PD-MEEMLIN, has been presented by cascading a speech enhancement method (SS) followed by a feature vector normalization algorithm (PD-MEMLIN). The results of PD-MEEMLIN show a better performance than SS and PD-MEMLIN for a very high degraded

References

3. Gales, M.J.F., Young, S.: Cepstral Parameter Compensation for HMM Recognition in Noise. Speech Communication 12(3), 231–239 (1993)

4. Moreno, P.J., Raj, B., Gouvea, E., Stern, R.M.: Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition. Department of Electrical and Computer Engineering & School of Computer Science, Carnegie Mellon University

5. Hermansky, H., Morgan, N.: RASTA Processing of Speech. IEEE Transactions on Speech and Audio Processing 2(4), 578–589 (1994)

6. Nolazco-Flores, J., Young, S.: Continuous Speech Recognition in Noise Using Spectral Subtraction and HMM adaptation. In: ICASSP, pp. I.409–I.412 (1994)

7. Buera, L., Lleida, E., Miguel, A., Ortega, A.: Multienvironment Models Based LInear Normalization for Speech Recognition in Car Conditions. In: Proc. ICASSP (2004)

8. Buera, L., Lleida, E., Miguel, A., Ortega, A.: Robust Speech Recognition in Cars Using Phoneme Dependent Multienvironment LInear Normalization. In: Proceedings of Interspeech, Lisboa, Portugal, pp. 381–384 (2005)

9. Martin, R.: Spectral Subtraction Based on Minimum Statistics. In: Proc. Eur. Signal Processing Conf., pp. 1182–1185 (1994)

10. Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing, pp. 504–512. Prentice Hall PTR, United States (2001)

11. Martin, R.: Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics. IEEE Transactions on Speech and Audio Processing 9(5) (2000)

12. Berouti, M., Schwartz, R., Makhoul, J.: Enhancement of Speech Corrupted by Acoustic Noise. In: Proc. IEEE Conf. ASSP, pp. 208–211 (1979)

13. Hirsch, H.G., Pearce, D.: The AURORA Experimental Framework for the Performance Evaluations of Speech Recognition Systems Under Noisy Conditions. In: ISCA ITRW ASR2000, Automatic Speech Recognition: Challenges for the Next Millennium, Paris, France (2000)

Shadow Resistant Road Segmentation

from a Mobile Monocular System

José Manuel Álvarez, Antonio M. López, and Ramon Baldrich

Computer Vision Center and Computer Science Dpt.,

Universitat Autònoma de Barcelona, Edifici O, 08193 Bellaterra, Barcelona, Spain

{jalvarez,antonio,ramon}@cvc.uab.es

http://www.cvc.uab.es/adas

Abstract. An essential functionality for advanced driver assistance systems (ADAS) is road segmentation, which directly supports ADAS applications like road departure warning and is an invaluable background segmentation stage for other functionalities as vehicle detection. Unfortunately, road segmentation is far from being trivial since the road is in an outdoor scenario imaged from a mobile platform. For instance, shadows are a relevant problem for segmentation. The usual approaches are ad hoc mechanisms, applied after an initial segmentation step, that try to recover road patches not included as segmented road for being in shadow. In this paper we argue that by using a different feature space to perform the segmentation we can minimize the problem of shadows from the very beginning. Rather than the usual segmentation in a color space we propose segmentation in a shadowless image, which is computable in real–time using a color camera. The paper presents comparative results for both asphalted and non–asphalted roads, showing the benefits of the proposal in presence of shadows and vehicles.

1 Introduction

Advanced driver assistance systems (ADAS) arise as a contribution to traffic safety, a major social issue in modern countries. The functionalities required to build such systems can be addressed by computer vision techniques, which have many advantages over using active sensors (e.g. radar, lidar). Some of them are: higher resolution, richness of features (color, texture), low cost, easy aesthetic integration, non–intrusive nature, low power consumption, and, besides, some functionalities can only be addressed by interpreting visual information. A relevant functionality is road segmentation, which supports ADAS applications like road departure warning. Moreover, it is an invaluable background segmentation stage for other functionalities as vehicle and pedestrian detection, since knowing the road surface considerably reduces the image region to search for such objects, thus allowing real–time and reducing false detections.

Our interest is real–time segmentation of road surfaces, both non–asphalted and asphalted, using a single forward facing color camera placed at the windshield of a vehicle. However, road segmentation is far from being trivial since the

J. Martí et al. (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp. 9–16, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Fig. 1. Roads with shadows

road is in an outdoor scenario imaged from a mobile platform. Hence, we deal with a continuously changing background, the presence of different vehicles of unknown movement, different road shapes with worn–out asphalt (or not asphalted at all), and different illumination conditions. For instance, a particularly relevant problem is the presence of shadows (Fig. 1). The usual approaches found in the literature are ad hoc mechanisms applied after an initial segmentation step (e.g. [1,2,3]). These mechanisms try to recover road patches not included as segmented road for being in shadow. In this paper we argue that by using a different feature space to perform the segmentation we can minimize the problem of shadows from the very beginning. Rather than the usual segmentation in a color space, we propose segmentation in a shadowless image, which is computable in real–time using a color camera. In particular, we use the grey–scale illuminant invariant image introduced in [4], I from now on.

In Sect. 2 we summarize the formulation of I. Moreover, we also show that automatic shutter, needed outdoors to avoid global over/under–exposure, fits well in such formulation. In order to illustrate the usefulness of I, in Sect. 3 we propose a segmentation algorithm based on standard region growing applied to I. We remark that we do not recover a shadow–free color image from the original, which would result in too large a processing time for the road segmentation problem. Section 4 presents comparative road segmentation results in presence of shadows and vehicles, both in asphalted and non–asphalted roads, confirming the validity of our hypothesis. Finally, conclusions are drawn in Sect. 5.

2 Illuminant Invariant Image

Image formation models are defined in terms of the interaction between the spectral power distribution of illumination, surface reflectance and spectral sensitivity of the imaging sensors. Finlayson et al. [4] show that under the assumptions of Planckian illumination, Lambertian surfaces and having three different narrow band sensors, it is possible to obtain a shadow–free color image. We are not interested in such an image since it requires very large processing time to be recovered. We focus on an illuminant invariant image (I) that is obtained at the first stage of the shadow–free color image recovering process. We briefly expose here the idea behind I and refer to [4] for details.

Fig. 2. Ideal log–log chromaticity plot. A Lambertian surface patch of a given chromaticity under a Planckian illumination is represented by a point. By changing the color temperature of the Planckian illuminator we obtain a straight line associated to the patch. Lambertian surface patches of different chromaticity have different associated lines. All these lines form a family of parallel lines, namely Ψθ. Let ℓθ be a line perpendicular to Ψθ and θ the angle between ℓθ and the horizontal axis. Then, by projection, we have a one–to–one correspondence between points in ℓθ and straight lines of Ψθ, so that ℓθ preserves the differences regarding chromaticity but removes differences due to illumination changes assuming Planckian radiators.

Let us denote by R, G, B the usual color channels and assume a normalizing channel (or combination of channels); e.g. without losing generality let us choose G as such normalizing channel. Then, under the assumptions regarding the sensors, the surfaces and the illuminators, if we perform a plot of r = log(R/G) vs b = log(B/G) for a set of surfaces of different chromaticity under different illuminants, we would obtain a result similar to the one in Fig. 2. This means that we obtain an axis, ℓθ, where a surface under different illuminations is represented by the same point, while moving along ℓθ implies changing the surface chromaticity. In other words, ℓθ can be seen as a grey–level axis where each grey level corresponds to a surface chromaticity, independently of the surface illumination. Therefore, we obtain an illuminant invariant image, I(p), by taking each pixel p = (x, y) of the original color image, I_RGB(p) = (R(p), G(p), B(p)), computing p′ = (r(p), b(p)) and projecting p′ onto ℓθ according to θ (a camera dependent constant angle). The reason for I being shadow–free is, roughly, that non–shadow surface areas are illuminated by both direct sunlight and skylight (a sort of scattered ambient light), while areas in the umbra are only illuminated by skylight. Since both, skylight alone and with sunlight addition, can be considered Planckian illuminations [5], areas of the same chromaticity ideally project onto the same point in ℓθ, no matter if the areas are in shadow or not.

Given this result, the first question is whether the working assumptions are realistic or not. In fact, Finlayson et al. [4] show examples where, despite the departures from the assumptions that are found in practice, the obtained results are quite good. We will see in Sect. 4 that this holds in our case, i.e., the


combination of our camera, the daylight illuminant and the surface we are interested in (the road) fits pretty well the I theory.

A detail to point out is that our acquisition system was operating in automatic shutter mode: i.e., inside predefined ranges, the shutter changes to avoid both global overexposure and underexposure. However, provided we are using sensors with linear response and the same shutter for the three channels, we can model the shutter action as a multiplicative constant s, i.e., we have sI_RGB = (sR, sG, sB) and, therefore, the channel normalization removes the constant (e.g. sR/sG = R/G).

In addition, we expect the illumination invariant image to reduce not only differences due to shadow but also differences due to asphalt degradation since, at the resolution we work, they are pretty analogous to just intensity changes. Note that the whole intensity axis is equivalent to a single chromaticity, i.e., all the patches of the last row of the Macbeth color checker in Fig. 2 (Ni) project to the same point of ℓθ.
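A minimal sketch of the computation of I described in this section, assuming an RGB image stored channel-last in a NumPy array and the camera-dependent angle θ already calibrated (the 38° default below is only a placeholder taken from Fig. 3):

```python
import numpy as np

def invariant_image(rgb, theta_deg=38.0, eps=1.0):
    """Grey-scale illuminant invariant image I from an RGB image.

    Log-chromaticities r = log(R/G), b = log(B/G) are projected onto the
    axis l_theta, giving one grey level per surface chromaticity.
    A global multiplicative shutter factor s cancels in the ratios R/G, B/G.
    """
    rgb = rgb.astype(np.float64) + eps         # avoid log(0)
    r = np.log(rgb[..., 0] / rgb[..., 1])      # log(R/G)
    b = np.log(rgb[..., 2] / rgb[..., 1])      # log(B/G)
    theta = np.deg2rad(theta_deg)
    # Projection of (r, b) onto the direction of l_theta
    return r * np.cos(theta) + b * np.sin(theta)
```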

3 Road Segmentation

With the aim of evaluating the suitability of the illuminant invariant image we have devised a relatively simple segmentation method based on region growing [6], sketched in Fig. 3. That is, we do not claim that the proposed segmentation is the best, but one of the simplest that can be expected to work in our problem. We emphasize that our aim is to show the suitability of I for road segmentation and we think that providing good results can be a proof of it, even using such a simple segmentation approach.

The region growing uses a very simple aggregation criterion: if p = (x, y) is a pixel already classified as of the road, any other pixel pn = (xn, yn) of its 8–connected neighborhood is classified as a road one if

diss(p, pn) ≤ tagg,   (1)

where diss(p, pn) is the dissimilarity metric for the aggregation and tagg a threshold that fixes the maximum dissimilarity to consider two connected pixels as of the same region. To prove the usefulness of I we use the simplest dissimilarity based on grey levels, i.e.,

dissI(p, pn) = |I(p) − I(pn)|.   (2)

Of course, region growing needs initialization, i.e., the so–called seeds. Currently, such seeds are taken from fixed positions at the bottom region of the image (Fig. 3), i.e., we assume that such region is part of the road. In fact, the lowest row of the image corresponds to a distance of about 4 meters away from the vehicle, thus, it is a reasonable assumption most of the time (other proposals require to see the full road free at the start up of the system, e.g. [1]).

fol-lowed two approaches One is the proposal in [7], based on acquiring images of

Fig. 3. Proposed algorithm. In all our experiments we have fixed values for the algorithm parameters: σ = 0.5 for Gaussian smoothing (Gaussian kernel, gσ, discretized in a 3 × 3 window for convolution '∗'); θ = 38°; tagg = 0.007 and seven seeds placed at the squares pointed out in the region growing result; structuring element (SE) of n × m = 5 × 3. Notice that we apply some mathematical morphology just to fill in some small gaps and thin grooves.

the Macbeth color checker under different day time illuminations and using the (r,b)–plot to obtain θ. The other approach consists in taking a few road images with shadows and using them as positive examples to find the θ providing the best shadow–free images for all the examples. The values of θ obtained from the two calibration methods basically give rise to the same segmentation results. We have taken θ from the example–based calibration because it provides slightly better segmentations. Besides, although not proposed in the original formulation of I, before computing it we regularize the input image I_RGB by a small amount of Gaussian smoothing (the same for each color channel).

4 Results

In this section we present comparative results based on the region growing algorithm introduced in Sect. 3 for three different feature spaces: the intensity image (I; also called luminance or brightness); the hue–saturation–intensity (HSI) color space; and the illuminant invariant image (I).


The intensity image is included in the comparison just to see what we can expect from a monocular monochrome system. Since it is a grey level image, its corresponding dissimilarity measure is defined analogously to Eq. (2), i.e.:

dissI(p, pn) = |I(p) − I(pn)|.   (3)

The HSI space is chosen because it is one of the most accepted color spaces for segmentation purposes [8]. The reason is that by having separated chrominance (H & S) and intensity (I), such a space allows reasoning in a closer way to human perception than others. For instance, it is possible to define a psychologically meaningful distance between colors as the cylindrical metric proposed in [8] for multimedia applications, and used in [1] for segmenting non–asphalted roads.

Such a metric gives rise to the following dissimilarity measure for the HSI space:

– Case achromatic pixels: use only the definition of dissI given in Eq. (3).

– Case chromatic pixels: combine the intensity difference with the cylindrical chromatic distance (Eq. (4); see the sketch after this paragraph),

where the different criterion regarding chromaticity is used to take into account the fact that the hue value (H) is meaningless when the intensity (I) is very low or very high, or when the saturation (S) is very low. For such cases only intensity is taken into account for aggregation. We use the proposal in [8,1] to define the frontier of meaningful hue, i.e., p is an achromatic pixel if either I(p) > 0.9·Imax or I(p) < 0.1·Imax or S(p) < 0.1·Smax, where Imax and Smax represent the maximum intensity and saturation values, respectively.
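Eq. (4) is not reproduced above; as a stand-in, the sketch below uses the standard cylindrical chromatic distance from [8] combined with the intensity term of Eq. (3), so the exact weighting should be checked against the original paper. Hue is assumed to be in degrees and saturation normalised.

```python
import numpy as np

def diss_hsi(p, pn, I_max=255.0, S_max=1.0):
    """Cylindrical HSI dissimilarity sketch for two pixels p = (H, S, I).

    Achromatic pixels (very dark, very bright or unsaturated) are compared
    by intensity only, as in Eq. (3); chromatic pixels use a cylindrical
    distance combining intensity, saturation and hue (assumed form of Eq. (4)).
    """
    def achromatic(h, s, i):
        return i > 0.9 * I_max or i < 0.1 * I_max or s < 0.1 * S_max

    (h1, s1, i1), (h2, s2, i2) = p, pn
    d_int = abs(i1 - i2)
    if achromatic(h1, s1, i1) or achromatic(h2, s2, i2):
        return d_int
    dh = abs(h1 - h2)
    dh = min(dh, 360.0 - dh)                       # angular hue difference
    d_chroma = np.sqrt(s1**2 + s2**2 -
                       2.0 * s1 * s2 * np.cos(np.deg2rad(dh)))
    return np.sqrt(d_int**2 + d_chroma**2)
```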

In summary, to compute Eq. (1) we use Eq. (2) for the invariant image I with threshold tagg,I, Eq. (3) for the intensity image I with threshold tagg,I, and Eq. (4) for HSI with thresholds tagg,ch (chromatic case) and tagg,ach (achromatic case). Figure 4 shows the results obtained for examples of both asphalted and non–asphalted roads. We have manually set the tagg,I, tagg,I, and tagg,ch, tagg,ach parameters to obtain the best results for each feature space, but such values are not changed from image to image, i.e., all the frames of our sequences have been processed with them fixed.

These results suggest that I is a more suitable feature space for road segmentation than the others. The road surface is well recovered most of the times, with the segmentation stopping at road limits and vehicles¹, even with a simple

¹ Other ongoing experiments, not included here for space restrictions, also show that the segmentation is quite stable regarding the chosen aggregation threshold as well as the number and position of seeds, much more stable than both I and HSI.

Fig. 4. From left to right columns: (a) original 640 × 480 color image with the seven used seeds marked in white; (b) segmentation using I with tagg,I = 0.008; (c) segmentation using I with tagg,I = 0.003; (d) segmentation using HSI with tagg,ch = 0.08 and tagg,ach = 0.008. The white pixels over the original image correspond to the segmentation results. The top four rows correspond to asphalted roads and the rest to non–asphalted areas of a parking.

segmentation method. Now, such segmentation can be augmented with road shape models like in [9,10] with the aim of estimating the not seen road in case of many vehicles in the scene. As a result, the road limits and road curvature obtained will be useful for applications as road departure warning. The processing


time required in non–optimized MatLab code to compute I is about 125 ms, and 700 ms for the whole segmentation process. We expect it to reach real–time when written in C++ code.

5 Conclusions

We have addressed road segmentation by using a shadow–free image (I). In order to illustrate the suitability of I for such a task we have devised a very simple segmentation method based on region growing. By using this method we have provided comparative results for asphalted and non–asphalted roads which suggest that I makes the segmentation process easier in comparison to another popular feature space found in road segmentation algorithms, namely the HSI. In addition, the process can run in real–time. In fact, since the computation of I only depends on a precalculated parameter, i.e., the camera characteristic angle θ, it is possible that a camera supplier would provide such an angle after calibration (analogously to calibration parameters provided with stereo rigs).

Acknowledgments. This work was supported by the Spanish Ministry of Education and Science under project TRA2004-06702/AUT.

References

1. Sotelo, M., Rodriguez, F., Magdalena, L., Bergasa, L., Boquete, L.: A color vision-based lane tracking system for autonomous driving in unmarked roads. Autonomous Robots 16(1) (2004)

2. Rotaru, C., Graf, T., Zhang, J.: Extracting road features from color images using a cognitive approach. In: IEEE Intelligent Vehicles Symposium (2004)

3. Ramstrom, O., Christensen, H.: A method for following unmarked roads. In: IEEE Intelligent Vehicles Symposium (2005)

4. Finlayson, G., Hordley, S., Lu, C., Drew, M.: On the removal of shadows from images. IEEE Trans. on Pattern Analysis and Machine Intelligence 28(1) (2006)

5. Wyszecki, G., Stiles, W.: Section 1.2. In: Color science: concepts and methods, quantitative data and formulae (2nd Edition). John Wiley & Sons (1982)

6. Gonzalez, R., Woods, R.: Section 10.4. In: Digital Image Processing (2nd Edition). Prentice Hall (2002)

7. Finlayson, G., Hordley, S., Drew, M.: Removing shadows from images. In: European Conference on Computer Vision (2002)

8. Ikonomakis, N., Plataniotis, K., Venetsanopoulos, A.: Color image segmentation for multimedia applications. Journal of Intelligent Robotics Systems 28(1-2) (2000)

9. He, Y., Wang, H., Zhang, B.: Color–based road detection in urban traffic scenes. IEEE Transactions on Intelligent Transportation Systems 5(24) (2004)

10. Lombardi, P., Zanin, M., Messelodi, S.: Switching models for vision-based on–board road detection. In: International IEEE Conference on Intelligent Transportation Systems (2005)


Mosaicking Cluttered Ground Planes Based on

Stereo Vision

José Gaspar¹, Miguel Realpe², Boris Vintimilla², and José Santos-Victor¹

¹ Computer Vision Laboratory, Inst. for Systems and Robotics, Instituto Superior Técnico, Lisboa, Portugal

{jag,jasv}@isr.ist.utl.pt

² Vision and Robotics Center, Dept. of Electrical and Computer Science Eng., Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador

{mrealpe,boris.vintimilla}@fiec.espol.edu.ec

Abstract. Recent stereo cameras provide reliable 3D reconstructions. These are useful for selecting ground-plane points, registering them and building mosaics of cluttered ground planes. In this paper we propose a 2D Iterated Closest Point (ICP) registration method, based on the distance transform, combined with a fine-tuning registration step that uses the image data directly. Experiments with real data show that the ICP is robust to 3D reconstruction differences due to motion and that the fine-tuning step minimizes the effect of the uncertainty in the 3D reconstructions.

1 Introduction

In this paper we approach the problem of building mosaics, i.e. image montages, of cluttered ground planes, using stereo vision on-board a wheeled mobile robot. Mosaics are useful for the navigation of robots and for building human-robot interfaces. One clear advantage of mosaics is the simple representation of robot localization and motion: they are simply 2D rigid transformations.

Many advances have been made recently in vision-based navigation. Flexible (and precise) tracking and reconstruction of visual features, using particle filters, allowed real-time Simultaneous Localization and Map Building (SLAM) [1]. The introduction of scale-invariant visual features brought more robustness and allowed very inexpensive navigation solutions [2,3]. Despite being effective, these navigation modalities lack dense scene representations convenient for intuitive human-robot interfaces. Recent commercial stereo cameras came to help by giving locally dense 3D scene reconstructions. Iterative methods for matching points and estimating their rigid motion allow registering the local reconstructions and obtaining global scene representations. The Iterated Closest Point (ICP) [4] is one such method that we explore in this work.

The basic ICP algorithm has been extended in a number of ways. Examples of improvements are robustifying the algorithm to the influence of features lacking correspondences, or using weighted metrics to trade off distance and feature similarity [5]. More recent improvements target real-time implementations, matching shapes with defects, or mixing probabilistic matching metrics with saturations to minimize the effect of outliers [6,7,8]. In our case, the wheeled mobile robot's


motion on the ground plane allows searching for 2D, instead of 3D, registrations. Hence we follow a 2D ICP methodology, but we take a computer vision approach, namely registering clouds of points using the distance transform [9]. Stereo cameras allow selecting ground-plane points, registering them and then building the ground plane mosaic. Stereo reconstruction is therefore an advantage; however, some specific issues arise about its use. For example, the discrete nature of the imaging process, and the variable imaging of objects and occlusions due to robot motion, imply uncertainties in the 3D reconstruction. Hence, the registration of 3D data also propagates some intrinsic uncertainty. The selection of ground-plane data is convenient for complexity reduction; however, a question of the sparsity of the data arises. In our work we investigate robust methodologies to deal with these issues, and in particular we investigate whether resorting to the raw image data can help minimize error propagation.

The paper is structured as follows: Sec. 2 details the mosaicking problem and introduces our approach to solve it; Sec. 3 shows how we build the orthographic views of the ground plane; Sec. 4 details the optimization functionals associated with mosaic construction; Sec. 5 is the results section; finally, in Sec. 6 we draw some conclusions and guidelines for future work.

2 Problem Description

The main objective of our work is mosaicking (mapping) the ground plane, considering that it can be cluttered with objects such as furniture. The sensor is a static trinocular-stereo camera mounted on a wheeled mobile robot. The stereo camera gives 3D clouds of points in the camera coordinate system, i.e. a mobile frame changed by the robot motion. See Fig. 1.

The ground plane constraint implies that the relationships between camera coordinate systems are 2D rigid motions. As in general the camera is not aligned with the floor, i.e. the camera coordinate system does not have two axes parallel to the ground plane, the relationships do not clearly show their 2D nature. In order to make the 2D nature of the problem clear, we define a new coordinate system aligned with the ground plane (three user-selected, well-separated ground points are sufficient for this purpose).
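As an illustration of this construction, the Python sketch below builds a ground-aligned rotation from three non-collinear ground points; the function and variable names are ours, not part of the original method description.

import numpy as np

def ground_frame(p0, p1, p2):
    # Rotation R mapping camera coordinates to a frame whose z-axis is the
    # ground-plane normal, built from three non-collinear ground points (3-vectors).
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    x = p1 - p0
    x /= np.linalg.norm(x)              # first in-plane axis
    n = np.cross(p1 - p0, p2 - p0)
    n /= np.linalg.norm(n)              # plane normal, used as the new z-axis
    y = np.cross(n, x)                  # completes a right-handed frame
    return np.vstack((x, y, n))         # ground-aligned coordinates: R @ (p - p0)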

Fig. 1. Mosaicking ground planes: stereo camera, image and BEV coordinate systems.


Commercial stereo cameras give dense reconstructions. For example, for each image feature, such as a corner or an edge point, there are usually about 20 to 30 reconstructed 3D points (the exact number depends on the size of the correlation windows). Considering standard registration methods such as the Iterated Closest Point (ICP, [4]), the large clouds of 3D points imply large computational costs. Hence, we choose to work with a subset of the data, namely by selecting just the points of the ground plane. The 2D clouds of points can therefore be registered with a 2D ICP method.

Noting that each 3D cloud of points results from stereo image registration, the process of registering consecutive clouds of points has some error propagated from the cloud reconstruction. In order to minimize the error propagation, we add a fine-tuning image-based registration process after the initial registration by a 2D ICP method. The image-based registration is a 2D rigid transformation in Bird's Eye Views (BEV), i.e. orthographic images of the ground plane. BEV images can also be obtained knowing some ground points and the projection geometry. To maintain a consistent unit system, despite having metric values in the 3D clouds of points, we choose to process both the 2D ICP and the image registration in pixel units, i.e. the same as the raw data.

In summary, our approach encompasses two main steps: (i) selection of ground points and 2D ICP, and (ii) BEV image registration. Despite the 2D methodology, notice that the 3D data is a principal component. The 3D sensor allows selecting the ground plane data, which is useful not only for using a faster 2D ICP method but mainly for registering the ground plane images without considering the distracting (biasing) non-ground regions.

3 Obtaining Bird’s Eye Views (BEV)

The motion of the robot implies a motion of the trinocular camera, which we denote as 2T1. The indexes 1 and 2 indicate two consecutive times, and also tag the coordinate systems at the different times, e.g. the camera frames {cam1} and {cam2}. The image plane defines new coordinate systems, {img1} and {img2}, and the BEV defines other ones, {bev1} and {bev2}. See Fig. 1.

The projection matrix P, relating {cam_i} and {img_i}, is given by the camera manufacturer or by a standard calibration procedure [10]. In this section we are mainly concerned with obtaining the homography H, relating the image plane with the BEV.

The BEV dewarping H is defined by back-projecting four image points to the ground plane (Appendix A details the back-projection matrix P∗). The four image points are chosen so as to comprehend most of the field of view imaging the ground plane. The region close to the horizon line is discarded due to poor resolution. Scaling is chosen such that it preserves the largest resolution available, i.e. no image-data loss due to sub-sampling.

It is interesting to note that knowledge of the 3D camera motion 2T1 directly gives the BEV 2D rigid transformation 2H1 (see Fig. 1):

2H1 ≃ H P 2T1 P∗ H⁻¹,  (1)

i.e. the composition of the inverse dewarping, the back-projection to the ground plane, the camera motion, the projection and the dewarping.
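As an illustration of the dewarping step, the Python sketch below computes H from four image points and their back-projected ground coordinates, assuming the latter are already scaled to BEV pixels; the function names and the example image size are ours.

import cv2
import numpy as np

def bev_homography(img_pts, ground_pts_px):
    # Homography H mapping image pixels to BEV pixels.
    # img_pts       : 4x2 image points covering most of the visible ground plane.
    # ground_pts_px : their back-projected ground-plane coordinates, already
    #                 scaled to BEV pixels (e.g. 318 px per metre, as in Sec. 5).
    src = np.asarray(img_pts, dtype=np.float32)
    dst = np.asarray(ground_pts_px, dtype=np.float32)
    return cv2.getPerspectiveTransform(src, dst)

# Hypothetical usage: dewarp an image into a 1568x855 orthographic view of the ground.
# bev = cv2.warpPerspective(image, H, (1568, 855))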


4 Mosaic Construction

The input data for mosaic creation consists of the BEV images, It and It+1, and of the clouds of ground points projected in the BEV coordinate system, {[u v]T_t,i} and {[u v]T_t+1,i}. In this frame, the camera motion is a 2D rigid transformation, 2H1, which can be represented by three parameters μ = [δu δv δθ]. We want to find μ such that the clouds of points match as closely as possible:

μ∗ = arg min_μ Σ_i ‖ [u v]T_{t+1,j} − Rot(δθ)·[u v]T_{t,i} − [δu δv]T ‖²,  (2)

where, for each point i of cloud t, j denotes its closest point in cloud t+1. The closest-point distances are evaluated on the distance transform of cloud t+1, and outliers are handled by imposing a saturation on the distance transform (constant distances imply no influence in the optimization process).
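The distance-transform flavour of the registration can be sketched as follows in Python: the saturated distance transform of the target cloud is computed once, and each candidate motion μ is scored by looking up distances at the transformed source points. The saturation value and the plain search over candidate motions are our assumptions.

import numpy as np
from scipy.ndimage import distance_transform_edt

def dt_cost(src_uv, dt_map, mu):
    # Cost of a 2D rigid motion mu = (du, dv, dtheta) for source points src_uv (Nx2),
    # scored on a precomputed, saturated distance transform of the target cloud.
    du, dv, dth = mu
    c, s = np.cos(dth), np.sin(dth)
    uv = src_uv @ np.array([[c, -s], [s, c]]).T + np.array([du, dv])
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, dt_map.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, dt_map.shape[0] - 1)
    return dt_map[v, u].sum()

# The target cloud is rasterised into a BEV-sized binary image; distances are
# saturated so that far-away (unmatched) points stop influencing the cost.
# occ = np.ones((855, 1568)); occ[v_t1, u_t1] = 0
# dt_map = np.minimum(distance_transform_edt(occ), 50.0)   # saturation at 50 px (assumed)
# best = min(candidate_motions, key=lambda mu: dt_cost(uv_t, dt_map, mu))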

Given the first estimation of the 2D motion and the knowledge of the ground points, we can now fine-tune the registration using the ground plane image data directly, by minimizing an image-based cost (Eq. 3) evaluated at the registered ground points. The cost contributions of points lacking a valid correspondence are fixed to a constant in an initial stage. These values are updated in the optimization process only if true matchings become possible, i.e. a new hypothetical 2D rigid motion between the BEV images can bring unmatched points into visibility. This allows further smoothing of the optimization process for points near the border of the field of view.
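As a sketch of the image-based fine tuning, the Python cost below compares BEV intensities only at the registered ground points, so that non-ground regions cannot bias the estimate. The exact functional of Eq. 3 is not reproduced here; the sum of squared differences and the nearest-pixel lookup are our assumptions.

import numpy as np

def photometric_cost(bev_t, bev_t1, ground_uv_t, mu):
    # Sum of squared intensity differences between BEV images at the ground points
    # of frame t, warped into frame t+1 by the 2D rigid motion mu = (du, dv, dtheta).
    du, dv, dth = mu
    c, s = np.cos(dth), np.sin(dth)
    uv = ground_uv_t @ np.array([[c, -s], [s, c]]).T + np.array([du, dv])
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < bev_t1.shape[1]) & (v >= 0) & (v < bev_t1.shape[0])
    u0 = np.round(ground_uv_t[inside, 0]).astype(int)
    v0 = np.round(ground_uv_t[inside, 1]).astype(int)
    d = bev_t1[v[inside], u[inside]].astype(float) - bev_t[v0, u0].astype(float)
    return np.sum(d ** 2)

# A simple refinement would evaluate photometric_cost on a small neighbourhood of the
# ICP estimate of mu and keep the minimiser.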

Finally, given the 2D rigid motion, the mosaic composition is just an accumulation of images. A growing 2D image buffer is defined so as to hold the image points of the ground plane along the robot's traveled path.
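A minimal Python sketch of the accumulation step, assuming a preallocated buffer large enough for the traveled path (the paper uses a growing buffer) and grayscale BEV images; the compositing rule, overwriting wherever the newly warped image has data, is our simplification.

import cv2
import numpy as np

def add_to_mosaic(mosaic, bev, H_bev_to_mosaic):
    # Warp one BEV image by its accumulated 2D rigid motion, expressed as a 3x3
    # homogeneous matrix, and paste it into the preallocated mosaic buffer.
    h, w = mosaic.shape
    warped = cv2.warpPerspective(bev, H_bev_to_mosaic, (w, h))
    mosaic[warped > 0] = warped[warped > 0]   # keep the newest data where available
    return mosaic

# The per-frame 2D rigid motions are chained (matrix products of the successive
# estimates) to express every BEV image in the coordinate system of the mosaic.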


5 Results

A total of 215 stereo images are acquired along the path traveled by the robot.

Figure 2 illustrates the dewarping to BEV images and the registration of the dewarped images. The BEV images are 1568 × 855. One meter measured on the ground plane is equivalent to 318 pixels in the BEV (this calibration information derives directly from the stereo-camera calibration). The registration is illustrated by superimposing consecutive BEV images after applying the estimated 2D rigid motion to the first image.

Fig. 2. BEV dewarping and registration. (a) Stereo camera. (b) and (c) reconstructed ground points (blue points) in the reference camera of the stereo setup, at times t and t+1. (d) BEV dewarping of (b). (e) Superposition of BEVs without registration (notice the blur). (f) Distance transform of the ground points seen in (c). (g) Correct superposition of all ground points after registration. (h) Comparison of the cost functionals F1(δθ) and F2(δθ) by perturbing δθ about the minimizing point (costs normalized to [0, 1], δθ in [−10°, 10°]): registration using Eq. 2 has a larger convergence region (dots plot) but the image-based registration, Eq. 3, is more precise (circles plot).


Fig. 3. View of the working area and of the robot (a), mosaic of the ground points chosen as landmarks while registering the sequence of BEV images (b), and a mosaic with all the visual information superimposed (c).

Notice in particular in Fig. 2c the significant shape differences of the clouds of points as compared to Fig. 2b, and in Fig. 2g the graceful degradation of the superposition for points progressively more distant from the ground plane. Fig. 2f shows the distance transform used for matching ground points. The matching is performed repeatedly in Eq. 2 in order to obtain the optimal registration shown in Fig. 2g. The existence of local clusters of points, instead of isolated points, motivates a wider-convergence but less precise registration, which can be improved by resorting to image data (Eq. 3), as shown in Fig. 2h.

The mosaicking of the BEVs clearly shows the precision of the registration process. In particular, it shows that the image-based registration significantly improves the 2D motion estimation. After one complete circle described by the robot, the 2D ICP registration gives about 2.7 meters of localization error (28% error over path
