Lecture Notes in Computer Science 4478
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Joan Martí  José Miguel Benedí
Ana Maria Mendonça Joan Serrat (Eds.)
Pattern Recognition and Image Analysis
Third Iberian Conference, IbPRIA 2007, Girona, Spain, June 6–8, 2007
Proceedings, Part II
José Miguel Benedí
Polytechnical University of Valencia
Camino de Vera, s/n., 46022 Valencia, Spain

Joan Serrat
Centre de Visió per Computador-UAB
Campus UAB, 08193 Bellaterra (Cerdanyola), Barcelona, Spain
E-mail: joan.serrat@cvc.uab.es
Library of Congress Control Number: 2007927717
CR Subject Classification (1998): I.4, I.5, I.7, I.2.7, I.2.10
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISBN-10 3-540-72848-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-72848-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
A record number of 328 full paper submissions from 27 countries were received. Each of these submissions was reviewed in a blind process by two reviewers. The review assignments were determined by the four General Chairs, and the final decisions were made after the Chairs meeting in Girona, giving an overall acceptance rate of 47.5%. Because of the limited size of the conference, we regret that some worthy papers were probably rejected.

In keeping with the IbPRIA tradition of having a single track of oral presentations, the number of oral papers remained in line with the previous IbPRIA editions, with a total of 48 papers. The number of poster papers was settled to 108.

We were also very honored to have as invited speakers such internationally recognized researchers as Chris Williams from the University of Edinburgh, UK, Michal Irani from The Weizmann Institute of Science, Israel, and Andrew Davison from Imperial College London, UK.

For the first time, some relevant related events were scheduled in parallel to the IbPRIA main conference according to the Call for Tutorials and Workshops: Antonio Torralba from MIT, USA, and Aleix Martínez from Ohio State University, USA, taught relevant tutorials about object recognition and statistical pattern recognition, respectively, while the "Supervised and Unsupervised Ensemble Methods and Their Applications" workshop and the first edition of the "Spanish Workshop on Biometrics" were successfully developed.
We would like to thank all the authors for submitting their papers and thus making these proceedings possible. We address special thanks to the members of the Program Committee and the additional reviewers for their great work, which contributed to the high quality of these proceedings.

We are also grateful to the Local Organizing Committee for their substantial contribution of time and effort.
Finally, our thanks go to IAPR for its support in sponsoring the Best Paper Prize at IbPRIA 2007.
The next edition of IbPRIA will be held in Portugal in 2009.
Ana Maria Mendonça
José Miguel Benedí
Joan Serrat
IbPRIA 2007 was organized by AERFAI (Asociación Española de Reconocimiento de Formas y Análisis de Imágenes) and APRP (Associação Portuguesa de Reconhecimento de Padrões), and, as the local organizer of this edition, the Computer Vision and Robotics Group, Institute of Informatics and Applications, University of Girona (UdG).
General Conference Co-chairs
Joan Martí University of Girona, Spain
José Miguel Benedí Polytechnical University of Valencia, Spain
Ana Maria Mendonça University of Porto, Portugal
Joan Serrat Universitat Autònoma de Barcelona, Spain
Invited Speakers
Chris Williams University of Edinburgh, UK
Michal Irani The Weizmann Institute of Science, Israel
Andrew Davison Imperial College London, UK
National Organizing Committee
Francisco Casacuberta Polytechnical University of Valencia, Spain
Vicent Caselles Universitat Pompeu Fabra, Spain
Aur´elio Campilho University of Porto, Portugal
Lu´ıs Corte-Real University of Porto, Portugal
Pierre Dupont Université catholique de Louvain, Belgium
Marcello Federico ITC-irst Trento, Italy
Vito di Gesù University of Palermo, Italy
Francisco Mario Hernández Tejera Universidad de Las Palmas, Spain
Laurent Heutte Université de Rouen, France
José Manuel Iñesta Quereda Universidad de Alicante, Spain
Jorge Marques Technical University of Lisbon, Portugal
Wiro Niessen University of Utrecht, The Netherlands
Francisco José Perales Universitat de les Illes Balears, Spain
Nicolás Pérez de la Blanca University of Granada, Spain
Fernando Pérez Cruz Universidad Carlos III, Spain
Ioannis Pitas University of Thessaloniki, Greece
Alberto Sanfeliu Polytechnical University of Catalonia, Spain
Gabriella Sanniti di Baja Istituto di Cibernetica CNR, Italy
Pierre Soille Joint Research Centre, Italy
M. Inés Torres University of the Basque Country, Spain
Jordi Vitrià Universitat Autònoma de Barcelona, Spain
Joachim Weickert Saarland University, Germany
Reyer Zwiggelaar University of Wales, Aberystwyth, UK
Reviewers
Maria José Abasolo University of the Balearic Islands, Spain
Antonio Adán Universidad de Castilla-La Mancha, Spain
Francisco Javier López Aligué University of Extremadura, Spain
Joachim Buhmann ETH Zurich, Switzerland
Juan Carlos Amengual UJI-LPI, Spain
Hans Burkhardt University of Freiburg, Germany
Ramon Baldrich Computer Vision Center, Spain
Jorge Pereira Batista ISR Coimbra, Portugal
Alexandre Bernardino Instituto Superior Técnico, Portugal
Lilian Blot University of East Anglia, UK
Marcello Federico ITC-irst Trento, Italy
Michael Breuss Saarland University, Germany
Jaime Santos Cardoso INESC Porto, Portugal
Modesto Castrillón Universidad de Las Palmas de Gran Canaria, Spain
Miguel Velhote Correia Instituto de Engenharia Biomédica, Portugal
Jorge Alves da Silva FEUP-INEB, Portugal
Hans du Buf University of Algarve, Portugal
Óscar Deniz Universidad de Las Palmas de Gran Canaria, Spain
Daniel Hernández-Sosa Universidad de Las Palmas de Gran Canaria, Spain
Claudio Eccher ITC-irst Trento, Italy
Arturo De la Escalera Universidad Carlos III de Madrid, Spain
Miquel Feixas Universitat de Girona, Spain
Francesc J. Ferri Universitat de València, Spain
Jordi Freixenet University of Girona, Spain
Maria Frucci Institute of Cybernetics "E. Caianiello", Italy
Cesare Furlanello ITC-irst Trento, Italy
Miguel Ángel García Universidad Autónoma de Madrid, Spain
Rafael García University of Girona, Spain
Yolanda González Universidad de las Islas Baleares, Spain
Manuel González Universitat de les Illes Balears, Spain
Nuno Gracias University of Girona, Spain
Nicolás Guil University of Malaga, Spain
Alfons Juan Universitat Politècnica de València, Spain
Frédéric Labrosse University of Wales, Aberystwyth, UK
Bart Lamiroy Nancy Université - LORIA - INPL, France
Xavier Lladó University of Girona, Spain
Paulo Lobato Correia IT - IST, Portugal
Ángeles López Universitat Jaume I, Spain
Javier Lorenzo Universidad de Las Palmas de Gran Canaria, Spain
Manuel Lucena Universidad de Jaén, Spain
Enric Martí Universitat Autònoma de Barcelona, Spain
Robert Martí Universitat de Girona, Spain
Elisa Martínez Enginyeria La Salle, Universitat Ramon Llull, Spain
Carlos Martínez Hinarejos Universidad Politécnica de Valencia, Spain
Fabrice Meriaudeau Le2i UMR CNRS 5158, France
Maria Luisa Micó Universidad de Alicante, Spain
Birgit Möller Martin Luther University Halle-Wittenberg, Germany
Ramón Mollineda Universidad Jaume I, Spain
Jacinto Nascimento Instituto de Sistemas e Robótica, Portugal
Shahriar Negahdaripour University of Miami, USA
Gabriel A. Oliver-Codina University of the Balearic Islands, Spain
José Oncina Universidad de Alicante, Spain
João Paulo Costeira Instituto de Sistemas e Robótica, Portugal
Antonio Miguel Peinado Universidad de Granada, Spain
Caroline Petitjean Université de Rouen, France
André Teixeira Puga Universidade do Porto, Portugal
Petia Radeva Computer Vision Center-UAB, Spain
João Miguel Raposo Sanches Instituto Superior Técnico, Portugal
Antonio Rubio Universidad de Granada, Spain
José Ruiz Shulcloper Advanced Technologies Application Center, Cuba
J. Salvador Sánchez Universitat Jaume I, Spain
Joaquim Salvi University of Girona, Spain
Joan Andreu Sánchez Universitat Politècnica de València, Spain
Elena Sánchez Nielsen Universidad de La Laguna, Spain
João Silva Sequeira Instituto Superior Técnico, Portugal
Margarida Silveira Instituto Superior Técnico, Portugal
João Manuel R.S. Tavares Universidade do Porto, Portugal
Antonio Teixeira Universidade de Aveiro, Portugal
Javier Traver Universitat Jaume I, Spain
Maria Vanrell Computer Vision Center, Spain
Javier Varona Universitat de les Illes Balears, Spain
Martin Welk Saarland University, Germany
Laurent Wendling LORIA, France
Michele Zanin ITC-irst Trento, Italy
Sponsoring Institutions
MEC (Ministerio de Educación y Ciencia, Spanish Government)
AGAUR (Agència de Gestió d'Ajuts Universitaris i de Recerca, Catalan Government)
IAPR (International Association for Pattern Recognition)
Vicerectorat de Recerca en Ciència i Tecnologia, Universitat de Girona
Table of Contents – Part II
Robust Automatic Speech Recognition Using PD-MEEMLIN . 1
Igmar Hernández, Paola García, Juan Nolazco, Luis Buera, and
Eduardo Lleida

Shadow Resistant Road Segmentation from a Mobile Monocular
System . 9
José Manuel Álvarez, Antonio M. López, and Ramon Baldrich

Mosaicking Cluttered Ground Planes Based on Stereo Vision . 17
José Gaspar, Miguel Realpe, Boris Vintimilla, and
José Santos-Victor

Fast Central Catadioptric Line Extraction . 25
Jean Charles Bazin, Cédric Demonceaux, and Pascal Vasseur

Similarity-Based Object Retrieval Using Appearance and Geometric
Feature Combination . 33
Agnés Borràs and Josep Lladós

Real-Time Facial Expression Recognition for Natural Interaction . 40
Eva Cerezo, Isabelle Hupont, Cristina Manresa-Yee, Javier Varona,
Sandra Baldassarri, Francisco J. Perales, and Francisco J. Seron

A Simple But Effective Approach to Speaker Tracking in Broadcast
News . 48
Luis Javier Rodríguez, Mikel Peñagarikano, and Germán Bordel

Region-Based Pose Tracking . 56
Christian Schmaltz, Bodo Rosenhahn, Thomas Brox,
Daniel Cremers, Joachim Weickert, Lennart Wietzke, and
Gerald Sommer

Testing Geodesic Active Contours . 64
A. Caro, T. Alonso, P.G. Rodríguez, M.L. Durán, and M.M. Ávila

Rate Control Algorithm for MPEG-2 to H.264/AVC Transcoding . 72
Gao Chen, Shouxun Lin, and Yongdong Zhang

3-D Motion Estimation for Positioning from 2-D Acoustic Video
Imagery . 80
H. Sekkati and S. Negahdaripour

Progressive Compression of Geometry Information with Smooth
Intermediate Meshes . 89
Taejung Park, Haeyoung Lee, and Chang-hun Kim
Rejection Strategies Involving Classifier Combination for Handwriting
Recognition . 97
Jose A. Rodríguez, Gemma Sánchez, and Josep Lladós

Summarizing Image/Surface Registration for 6DOF Robot/Camera
Pose Estimation . 105
Elisabet Batlle, Carles Matabosch, and Joaquim Salvi

Robust Complex Salient Regions . 113
Sergio Escalera, Oriol Pujol, and Petia Radeva

Improving Piecewise-Linear Registration Through Mesh
Optimization . 122
Vicente Arévalo and Javier González

Registration-Based Segmentation Using the Information Bottleneck
Method . 130
Anton Bardera, Miquel Feixas, Imma Boada, Jaume Rigau, and
Mateu Sbert

Dominant Points Detection Using Phase Congruence . 138
Francisco José Madrid-Cuevas, Rafel Medina-Carnicer,
Ángel Carmona-Poyato, and Nicolás Luis Fernández-García

Exploiting Information Theory for Filtering the Kadir Scale-Saliency
Detector . 146
Pablo Suau and Francisco Escolano

False Positive Reduction in Breast Mass Detection Using
Two-Dimensional PCA . 154
Arnau Oliver, Xavier Lladó, Joan Martí, Robert Martí, and
Jordi Freixenet

A Fast and Robust Iris Segmentation Method . 162
Noé Otero-Mateo, Miguel Ángel Vega-Rodríguez,
Juan Antonio Gómez-Pulido, and Juan Manuel Sánchez-Pérez

Detection of Lung Nodule Candidates in Chest Radiographs . 170
Carlos S. Pereira, Hugo Fernandes, Ana Maria Mendonça, and
Aurélio Campilho

A Snake for Retinal Vessel Segmentation . 178
L. Espona, M.J. Carreira, M. Ortega, and M.G. Penedo

Risk Classification of Mammograms Using Anatomical Linear Structure
and Density Information . 186
Edward M. Hadley, Erika R.E. Denton, Josep Pont,
Elsa Pérez, and Reyer Zwiggelaar
A New Method for Robust and Efficient Occupancy Grid-Map
Nazife Dimililer, Ekrem Varoğlu, and Hakan Altınçay

Boundary Shape Recognition Using Accumulated Length and Angle
Information . 210
Marçal Rusiñol, Philippe Dosch, and Josep Lladós

Extracting Average Shapes from Occluded Non-rigid Motion . 218
Alessio Del Bue

Automatic Topological Active Net Division in a Genetic-Greedy Hybrid
Approach . 226
N. Barreira, M.G. Penedo, O. Ibáñez, and J. Santos

Using Graphics Hardware for Enhancing Edge and Circle Detection . 234
Antonio Ruiz, Manuel Ujaldón, and Nicolás Guil

Optimally Discriminant Moments for Speckle Detection in Real B-Scan
Images . 242
Robert Martí, Joan Martí, Jordi Freixenet,
Joan Carles Vilanova, and Joaquim Barceló

Influence of Resampling and Weighting on Diversity and Accuracy of
Classifier Ensembles . 250
R.M. Valdovinos, J.S. Sánchez, and E. Gasca

A Hierarchical Approach for Multi-task Logistic Regression . 258
Àgata Lapedriza, David Masip, and Jordi Vitrià

Modelling of Magnetic Resonance Spectra Using Mixtures for Binned
and Truncated Data . 266
Juan M. Garcia-Gomez, Montserrat Robles, Sabine Van Huffel, and
Alfons Juan-Císcar

Atmospheric Turbulence Effects Removal on Infrared Sequences
Degraded by Local Isoplanatism . 274
Magali Lemaitre, Olivier Laligant, Jacques Blanc-Talon, and
Word Spotting in Archive Documents Using Shape Contexts . 290
Josep Lladós, Partha Pratim-Roy, José A. Rodríguez, and
Gemma Sánchez

Fuzzy Rule Based Edge-Sensitive Line Average Algorithm in Interlaced
HDTV Sequences . 298
Gwanggil Jeon, Jungjun Kim, Jongmin You, and Jechang Jeong

A Tabular Pruning Rule in Tree-Based Fast Nearest Neighbor Search
Algorithms . 306
Jose Oncina, Franck Thollard, Eva Gómez-Ballester,
Luisa Micó, and Francisco Moreno-Seco

A General Framework to Deal with the Scaling Problem in
Phrase-Based Statistical Machine Translation . 314
Daniel Ortiz, Ismael García Varea, and Francisco Casacuberta

Recognizing Individual Typing Patterns . 323
Michał Choraś and Piotr Mroczkowski

Residual Filter for Improving Coding Performance of Noisy Video
Sequences . 331
Won Seon Song, Seong Soo Lee, and Min-Cheol Hong

Cyclic Viterbi Score for Linear Hidden Markov Models . 339
Vicente Palazón and Andrés Marzal

Non Parametric Classification of Human Interaction . 347
Scott Blunsden, Ernesto Andrade, and Robert Fisher

A Density-Based Data Reduction Algorithm for Robust Estimators . 355
L. Ferraz, R. Felip, B. Martínez, and X. Binefa

Robust Estimation of Reflectance Functions from Polarization . 363
Gary A. Atkinson and Edwin R. Hancock

Estimation of Multiple Objects at Unknown Locations with Active
Contours . 372
Margarida Silveira and Jorge S. Marques

Analytic Reconstruction of Transparent and Opaque Surfaces from
Texture Images . 380
Mohamad Ivan Fanany and Itsuo Kumazawa

Sedimentological Analysis of Sands . 388
Cristina Lira and Pedro Pina

Catadioptric Camera Calibration by Polarization Imaging . 396
O. Morel, R. Seulin, and D. Fofi
Stochastic Local Search for Omnidirectional Catadioptric Stereovision
Design . 404
G. Dequen, L. Devendeville, and E. Mouaddib

Dimensionless Monocular SLAM . 412
Javier Civera, Andrew J. Davison, and J.M.M. Montiel

Improved Camera Calibration Method Based on a Two-Dimensional
Template . 420
Carlos Ricolfe-Viala and Antonio-Jose Sanchez-Salmeron

Relative Pose Estimation of Surgical Tools in Assisted Minimally
Invasive Surgery . 428
Agustin Navarro, Edgar Villarraga, and Joan Aranda

Efficiently Downdating, Composing and Splitting Singular Value
Decompositions Preserving the Mean Information . 436
Javier Melenchón and Elisa Martínez

On-Line Classification of Human Activities . 444
J.C. Nascimento, M.A.T. Figueiredo, and J.S. Marques

Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy
Speech Recognition . 452
Yong-Joo Chung and Keun-Sung Bae

Development of a Computer Vision System for the Automatic Quality
Grading of Mandarin Segments . 460
José Blasco, Sergio Cubero, Raúl Arias, Juan Gómez,
Florentino Juste, and Enrique Moltó

Mathematical Morphology in the HSI Colour Space . 467
M.C. Tobar, C. Platero, P.M. González, and G. Asensio

Improving Background Subtraction Based on a Casuistry of
Colour-Motion Segmentation Problems . 475
I. Huerta, D. Rowe, M. Mozerov, and J. Gonzàlez

Random Forest for Gene Expression Based Cancer Classification:
Overlooked Issues . 483
Oleg Okun and Helen Priisalu

Bounding the Size of the Median Graph . 491
Miquel Ferrer, Ernest Valveny, and Francesc Serratosa

When Overlapping Unexpectedly Alters the Class Imbalance Effects . 499
V. García, R.A. Mollineda, J.S. Sánchez, R. Alejo, and J.M. Sotoca

A Kernel Matching Pursuit Approach to Man-Made Objects Detection
in Aerial Images . 507
Wei Wang, Xin Yang, and Shoushui Chen
Anisotropic Continuous-Scale Morphology . 515
Michael Breuß, Bernhard Burgeth, and Joachim Weickert

Three-Dimensional Ultrasonic Assessment of Atherosclerotic Plaques . 523
José Seabra, João Sanches, Luís M. Pedro, and
J. Fernandes e Fernandes

Measuring the Applicability of Self-organization Maps in a Case-Based
Reasoning System . 532
A. Fornells, E. Golobardes, J.M. Martorell, J.M. Garrell,
E. Bernadó, and N. Macià

Algebraic-Distance Minimization of Lines and Ellipses for Traffic Sign
Shape Localization . 540
Pedro Gil-Jiménez, Saturnino Maldonado-Bascón,
Hilario Gómez-Moreno, Sergio Lafuente-Arroyo, and
Javier Acevedo-Rodríguez

Modeling Aceto-White Temporal Patterns to Segment Colposcopic
Images . 548
Héctor-Gabriel Acosta-Mesa, Nicandro Cruz-Ramírez,
Rodolfo Hernández-Jiménez, and
Daniel-Alejandro García-López

Speech/Music Classification Based on Distributed Evolutionary Fuzzy
Logic for Intelligent Audio Coding . 556
J.E. Muñoz Expósito, N. Ruiz Reyes, S. Garcia Galán, and
P. Vera Candeas

Breast Skin-Line Segmentation Using Contour Growing . 564
Robert Martí, Arnau Oliver, David Raba, and Jordi Freixenet

New Measure for Shape Elongation . 572
Miloš Stojmenović and Joviša Žunić

Evaluation of Spectral-Based Methods for Median Graph
Computation . 580
Miquel Ferrer, Francesc Serratosa, and Ernest Valveny

Feasible Application of Shape-Based Classification . 588
A. Caro, P.G. Rodríguez, T. Antequera, and R. Palacios

3D Shape Recovery with Registration Assisted Stereo Matching . 596
Huei-Yung Lin, Sung-Chung Liang, and Jing-Ren Wu

Blind Estimation of Motion Blur Parameters for Image
Deconvolution . 604
João P. Oliveira, Mário A.T. Figueiredo, and José M. Bioucas-Dias
Dependent Component Analysis: A Hyperspectral Unmixing
Algorithm . 612
José M.P. Nascimento and José M. Bioucas-Dias

Synchronization of Video Sequences from Free-Moving Cameras . 620
Joan Serrat, Ferran Diego, Felipe Lumbreras, and
José Manuel Álvarez

Tracking the Left Ventricle in Ultrasound Images Based on Total
Variation Denoising . 628
Jacinto C. Nascimento, João M. Sanches, and Jorge S. Marques

Bayesian Oil Spill Segmentation of SAR Images Via Graph Cuts . 637
Sónia Pelizzari and José M. Bioucas-Dias

Unidimensional Multiscale Local Features for Object Detection Under
Rotation and Mild Occlusions . 645
Michael Villamizar, Alberto Sanfeliu, and Juan Andrade Cetto

Author Index . 653
Robust Automatic Speech Recognition Using
PD-MEEMLIN
Igmar Hernández1, Paola García1, Juan Nolazco1, Luis Buera2,
and Eduardo Lleida2
1 Computer Science Department, Tecnológico de Monterrey,
Campus Monterrey, México
2 Communications Technology Group (GTC), I3A, University of Zaragoza, Spain
{A00778595,paola.garcia,jnolazco}@itesm.mx, {lbuera,lleida}@unizar.es
Abstract. This work presents a robust normalization technique by cascading a speech enhancement method followed by a feature vector normalization algorithm. To provide speech enhancement, the Spectral Subtraction (SS) algorithm is used; this method reduces the effect of additive noise by subtracting the noise spectrum estimate from the complete speech spectrum. On the other hand, an empirical feature vector normalization technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based LInear Normalization) has also been shown to be effective. PD-MEMLIN models the clean and noisy spaces employing Gaussian Mixture Models (GMMs), and estimates a set of linear compensation transformations to be used to clean the signal. The proper integration of both approaches is studied, and the final design, PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based LInear Normalization), confirms and improves the effectiveness of both approaches. The results obtained show that for very highly degraded speech PD-MEEMLIN outperforms the SS by a range between 11.4% and 34.5%, and PD-MEMLIN by a range between 11.7% and 24.84%. Furthermore, at moderate SNRs, i.e., 15 or 20 dB, PD-MEEMLIN is as good as the PD-MEMLIN and SS techniques.
1 Introduction
The robust speech recognition field plays a key role in real environment applications. Noise can degrade speech signals, causing harmful effects in Automatic Speech Recognition (ASR) tasks. Even though there have been great advances in the area, robustness still remains an issue. Noticing this problem, several techniques have been developed over the years, for instance the Spectral Subtraction algorithm (SS) [1]; and, in the last decade, SPLICE (State Based Piecewise Linear Compensation for Environments) [2], PMC (Parallel Model Combination) [3], RATZ (multivariate Gaussian based cepstral normalization) [4] and RASTA (the RelAtive SpecTrAl Technique) [5]. The research that followed this evolution was to make a proper combination of algorithms in order to reduce the noise effects. A good example is described in [6], where the core scheme is composed of a Continuous SS (CSS) and PMC.
ef-J Mart´ı et al (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp 1–8, 2007.
c
Springer-Verlag Berlin Heidelberg 2007
Pursuing the same idea, a combination of the speech enhanced signal (represented by the SS method) and a feature vector normalization technique (PD-MEMLIN [7]) are presented in this work to improve the recognition accuracy of the speech recognition system in highly degraded environments [8,9]. The first technique was selected because of its implementation simplicity and good performance. The second one is an empirical vector normalization technique that has been compared against some other algorithms [8] and has obtained important improvements.

The organization of the paper is as follows. In Section 2, a brief overview of SS and PD-MEMLIN is given. Section 3 details the new method, PD-MEEMLIN. In Section 4, the experimental results are presented. Finally, the conclusions are shown in Section 5.
2 Spectral Subtraction and PD-MEMLIN
In order to evaluate the proposed integration, an ASR system is employed. In general, a pre-processing stage of the speech waveform is always desirable. The speech signal is divided into overlapped short windows, from which a set of coefficients, usually Mel Frequency Cepstral Coefficients (MFCCs) [10], are computed. The MFCCs are fed to the training algorithm that calculates the acoustic models. The acoustic models used in this research are the Hidden Markov Models (HMMs), which are widely used to model statistically the behaviour of the phonetic events in speech [10]. The HMMs employ a sequence of hidden states which characterises how a random process (speech in this case) evolves in time. Although the states are not observable, a sequence of realizations from these states can always be obtained. Associated with each state there is a probability density function, normally a mixture of Gaussians. The criterion used to train the HMMs is Maximum Likelihood; thus, the training process becomes an optimization problem that can be solved iteratively with the Baum–Welch algorithm.
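As an illustration of the front-end just described, the sketch below frames a waveform into overlapping windows and computes a per-frame log energy. It is a minimal stand-in for a real MFCC front-end, not the authors' implementation; the 25 ms frame, 10 ms hop and 16 kHz rate are typical values assumed here, not taken from the paper.

```python
import math

def frame_signal(x, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return [x[i * hop : i * hop + frame_len] for i in range(n_frames)]

def hamming(n):
    # standard Hamming window, applied to each frame before analysis
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def log_energy(frame, win):
    # stand-in for the cepstral analysis: windowed log energy of the frame
    e = sum((s * w) ** 2 for s, w in zip(frame, win))
    return math.log(max(e, 1e-12))

x = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone
frames = frame_signal(x)
win = hamming(400)
feats = [log_energy(f, win) for f in frames]
print(len(feats))
```

In a real system each frame would go through a mel filter bank and a DCT to yield the MFCCs that feed the Baum–Welch training.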
2.1 Spectral Subtraction
The Spectral Subtraction (SS) algorithm is a simple and well-known speech enhancement technique. This research is based on the SS algorithm expressed in [9]. It has the property that it does not require the use of an explicit voice activity detector, as general SS algorithms do. The algorithm is based on the existence of peaks and valleys in a short noisy speech time subband power estimate [9]. The peaks correspond to speech activity and the valleys are used to obtain an estimate of the subband noise power. So, a reliable noise estimate is obtained using a window large enough to permit the detection of any peak of speech activity.

As shown in Figure 1, this algorithm performs a modification of the short-time spectral magnitude of the noisy speech signal during the process of enhancement. Hence, the output signal can be considered close to the clean speech signal when
Fig. 1. Diagram of the basic SS method used
synthesized. The appropriate computation of the spectral magnitude is obtained with the noise power estimate and the SS algorithm. Let y(i) = x(i) + n(i), where y(i) is the noisy speech signal, x(i) is the clean speech signal, n(i) is the noise signal and i denotes the time index; x(i) and n(i) are statistically independent.
Figure 1 depicts the spectral analysis, in which the frames of time domain data are windowed and converted to the frequency domain using a Discrete Fourier Transform (DFT) filter bank with W_DFT subbands and a decimation/interpolation ratio named R [9]. After the computation of the noise power estimate and the spectral weighting, the enhanced signal can be transformed back to the time domain using the Inverse Discrete Fourier Transform (IDFT). For the subtraction algorithm it is necessary to estimate the subband noise power Pn(λ, k) and the short-time signal power |Y(λ, k)|², where λ is the decimated time index and k are the frequency bins of the DFT. A first-order recursive network is used to obtain the short-time signal power, as shown in Equation 1,
|Ȳ(λ, k)|² = γ · |Ȳ(λ − 1, k)|² + (1 − γ) · |Y(λ, k)|², (1)

where |Ȳ(λ, k)|² denotes the smoothed short-time power. Afterwards, the subtraction algorithm is accomplished using an oversubtraction factor osub(λ, k) and a spectral flooring constant (subf) [12]. The osub(λ, k) factor is needed to eliminate the musical noise, and it is calculated as a function of the subband Signal to Noise Ratio SNRy(λ, k), λ and k (for a high SNR and high frequencies a smaller osub factor is required, and vice versa for a low SNR and low frequencies). The subf constant keeps the resultant spectral components from going below a minimum level. It is expressed as a fraction of the original noise power spectrum. The final relation of the spectral subtraction between subf and osub is defined by Equation 2.
[…] constant to obtain the periodograms. Then, Pn(λ, k) is calculated as a weighted minimum of Px(λ, k) in a window of D subband samples. Hence,

Pn(λ, k) = omin · Pmin(λ, k), (3)

where Pmin(λ, k) denotes the estimated minimum power and omin is a bias compensation factor. The data window D is divided into W windows of length M, allowing the minimum to be updated every M samples without a large computational cost. This noise estimator combined with the spectral subtraction has the ability to preserve weak speech sounds. If a short-time subband power is observed, the valleys of the noisy speech signal are used to estimate the subband noise power.
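The minimum-statistics idea in Eq. (3) can be sketched as follows. This is an assumption-laden toy version (the M, W and omin values are illustrative, not the paper's): the minimum of the subband power over the last W sub-windows of length M is tracked and bias-compensated, so a speech burst does not inflate the noise estimate.

```python
def min_statistics_noise(powers, M=4, W=3, omin=1.5):
    """Toy minimum-statistics noise estimator for one subband:
    Pn = omin * (minimum of the power over the last W*M = D samples),
    with the minimum updated once per sub-window of length M."""
    noise = []
    sub_minima = []              # one minimum per completed sub-window
    cur_min = float("inf")
    for i, p in enumerate(powers):
        cur_min = min(cur_min, p)
        if (i + 1) % M == 0:              # sub-window complete
            sub_minima.append(cur_min)
            sub_minima = sub_minima[-W:]  # keep only the last W sub-windows
            cur_min = float("inf")
        window_min = min(sub_minima + [cur_min]) if sub_minima else cur_min
        noise.append(omin * window_min)
    return noise

# toy subband power trajectory: noise floor ~1.0, speech burst in the middle
powers = [1.0, 1.1, 0.9, 1.0, 8.0, 9.0, 8.5, 9.5, 1.0, 1.1, 1.0, 0.9]
est = min_statistics_noise(powers)
print(est)
```

Note how the estimate stays near the (bias-compensated) noise floor even while the speech burst raises the instantaneous power by almost an order of magnitude.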
The last element to be calculated is SNRy(λ, k) in Equation 4, which controls the oversubtraction factor osub(λ, k). Up to this stage, osub(λ, k) and subf can be selected and the spectral subtraction algorithm can be computed.
2.2 PD-MEMLIN

PD-MEMLIN is an empirical feature vector normalization technique which uses stereo data in order to estimate the different compensation linear transformations in a previous training process. The clean feature space is modelled as a mixture of Gaussians for each phoneme. The noisy space is split into several basic acoustic environments, and each environment is modelled as a mixture of Gaussians for each phoneme. The transformations are estimated for all basic environments between a clean phoneme Gaussian and a noisy Gaussian of the same phoneme.
PD-MEMLIN approximations. Clean feature vectors, x, are modelled using a GMM for each phoneme, ph:

p(x|ph) = Σ_{s_x^{ph}} p(s_x^{ph}) N(x; μ_{s_x^{ph}}, Σ_{s_x^{ph}}),

where μ_{s_x^{ph}}, Σ_{s_x^{ph}}, and p(s_x^{ph}) are the mean vector, the diagonal covariance matrix, and the a priori probability associated with the clean model Gaussian s_x^{ph}.
Finally, clean feature vectors can be approximated as a linear function, f, of the noisy feature vector for each time frame t, which depends on the basic environments, the phonemes and the clean and noisy model Gaussians: x ≈ f(y_t, s_x^{ph}, s_y^{e,ph}) = y_t − r_{s_x^{ph}, s_y^{e,ph}}, where r_{s_x^{ph}, s_y^{e,ph}} is the bias vector transformation between noisy and clean feature vectors for each pair of Gaussians, s_x^{ph} and s_y^{e,ph}.
PD-MEMLIN enhancement. With those approximations, PD-MEMLIN transforms the Minimum Mean Square Error (MMSE) estimation expression into

x̂_t = y_t − Σ_e Σ_ph Σ_{s_y^{e,ph}} Σ_{s_x^{ph}} p(e|y_t) p(ph|y_t, e) p(s_y^{e,ph}|y_t, e, ph) p(s_x^{ph}|s_y^{e,ph}) r_{s_x^{ph}, s_y^{e,ph}},

where p(e|y_t) is the a posteriori probability of the basic environment; p(ph|y_t, e) is the a posteriori probability of the phoneme, given the noisy feature vector and the environment; and p(s_y^{e,ph}|y_t, e, ph) is the a posteriori probability of the noisy model Gaussian, s_y^{e,ph}, given the feature vector, y_t, the basic environment, e, and the phoneme, ph. Those terms, p(e|y_t), p(ph|y_t, e) and p(s_y^{e,ph}|y_t, e, ph), are estimated from the noisy space GMMs.
3 PD-MEEMLIN
By combining both techniques, PD-MEEMLIN arises as an empirical feature vector normalization which estimates different linear transformations, as PD-MEMLIN does, with the special property that a new enhanced space is obtained by applying SS to the noisy speech signal. Furthermore, this first-stage enhancement means that the noisy space gets closer to the clean one, making the gap between them smaller. Figure 2 shows the PD-MEEMLIN architecture.
Next, the architecture modules are explained:
– The SS enhancement of the noisy speech signal is performed; |X̂(λ, k)|, Pn(λ, k) and SNRy(λ, k) are calculated.
– Given the clean speech signal and the enhanced noisy speech signal, the clean and noisy-enhanced GMMs are obtained.
Fig. 2. PD-MEEMLIN architecture
– In the testing stage, the noisy speech signal is also SS-enhanced and then normalized using PD-MEEMLIN.
– These normalized coefficients are forwarded to the decoder.
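The stages listed above can be composed as a simple pipeline sketch. The three callables are placeholders for the SS enhancer, the MFCC front-end and the PD-MEMLIN-style normalizer described in the text; the lambdas below are toy stand-ins only, used to show the data flow of Fig. 2.

```python
def pd_meemlin_pipeline(noisy_signal, ss_enhance, mfcc, normalize):
    """Cascade sketch: SS enhancement of the waveform first, then feature
    extraction, then per-feature normalization, before decoding."""
    enhanced = ss_enhance(noisy_signal)
    feats = mfcc(enhanced)
    return [normalize(f) for f in feats]

# toy stand-ins just to exercise the composition
out = pd_meemlin_pipeline(
    [0.5, -0.2, 0.1],
    ss_enhance=lambda x: [v * 0.9 for v in x],  # pretend attenuation
    mfcc=lambda x: x,                           # identity "front-end"
    normalize=lambda f: f - 0.05,               # pretend bias removal
)
print(out)
```

The real system would pass the normalized coefficients on to the HMM decoder.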
4 Experimental Results
All the experiments were performed employing the AURORA2 database [13], clean and noisy data based on TIDigits. Three types of noise were selected: Subway, Babble and Car from AURORA2, ranging from -5 dB to 20 dB SNR. For every SNR the SS parameters osub and subf need to be configured. The parameter osub takes values from 0.4 to 4.6 (0.4 for 20 dB, 0.7 for 15 dB, 1.3 for 10 dB, 2.21 for 5 dB, 4.6 for 0 dB and 4.6 for -5 dB) and subf takes the values 0.03 or 0.04 (all SNR levels except 5 dB optimised for 0.04). The phonetic acoustic models employed by PD-MEEMLIN are obtained from 22 phonemes and 1 silence. The model set is represented by a mixture of 32 Gaussians each. Besides, two new sets of each noise were used: PD-MEEMLIN needs one to estimate the enhanced-noisy model, and another to obtain the normalized coefficients. The feature vectors for the recognition process are built from 12 normalized MFCCs followed by the energy coefficient, its time-derivative Δ and the time-acceleration ΔΔ. For the training stage of the ASR system, the acoustic models of the 22 phonemes and the silence consist of three-state HMMs with a mixture of 8 Gaussians per state.
The combined techniques show that for low noise conditions i.e SN R=10, 15
or 20 dB, the difference between the original noisy space and the one mated to the clean is similar However, when the SNR is lower (-5dB or 0dB)the SS improves the performance of PD-MEMLIN Comparing the combination
approxi-of SS with PD-MEMLIN against the case where no techniques are applied, asignificant improvement is shown The results described before are presented inTables 1, 2 and 3 The Tables show ”Sent” that means complete utterances
Robust Automatic Speech Recognition Using PD-MEEMLIN
Table 1. Comparative table for the ASR working with Subway noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
-5dB    3.40   21.57    10.09   34.22    11.29   37.09    13.29   47.95
 0dB    9.09   29.05    20.18   53.71    27.07   61.88    30.87   69.71
 5dB   17.58   40.45    32.17   70.00    48.15   80.38    51.65   83.40
10dB   33.07   65.47    50.95   83.23    65.83   90.58    70.13   91.86
15dB   54.45   84.60    64.84   90.02    78.92   94.98    78.22   94.40
20dB   72.83   93.40    76.52   94.56    85.91   97.14    86.71   97.30
Table 2. Comparative table for the ASR working with Babble noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
 0dB   11.29   30.41    15.98   44.49    23.48   55.72    20.08   59.50
 5dB   20.58   44.23    30.37   65.11    48.75   80.55    49.25   83.70
10dB   40.86   72.85    50.25   80.93    74.93   94.20    69.33   91.48
15dB   69.03   90.54    69.93   90.56    84.12   96.86    81.32   95.54
20dB   82.42   96.17    83.52   95.84    88.91   98.09    88.01   97.98
Table 3. Comparative table for the ASR working with Car noise

SNR    Sent %  Word %   Sent %  Word %   Sent %  Word %   Sent %  Word %
10dB   28.77   58.13    54.25   82.72    70.83   92.15    70.93   91.90
15dB   57.84   84.04    68.03   90.51    82.02   96.16    81.42   95.86
20dB   78.32   94.61    81.42   95.30    87.01   97.44    87.81   97.77
percentage correctly recognised, and "Word" the percentage of words correctly recognised. The gap between the clean and the noisy model, for very highly degraded speech, has been shortened due to the advantages of both techniques. When PD-MEEMLIN is employed, the performance is between 11.7% and 24.84% better than PD-MEMLIN, and between 11.4% and 34.5% better than SS.

5 Conclusions
In this work a robust normalization technique, PD-MEEMLIN, has been presented by cascading a speech enhancement method (SS) followed by a feature vector normalization algorithm (PD-MEMLIN). The results of PD-MEEMLIN show a better performance than SS and PD-MEMLIN for very highly degraded speech.

References

3. Gales, M.J.F., Young, S.: Cepstral Parameter Compensation for HMM Recognition in Noise. Speech Communication 12(3), 231–239 (1993)
4. Moreno, P.J., Raj, B., Gouvea, E., Stern, R.M.: Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition. Department of Electrical and Computer Engineering & School of Computer Science, Carnegie Mellon University
5. Hermansky, H., Morgan, N.: RASTA Processing of Speech. IEEE Transactions on Speech and Audio Processing 2(4), 578–589 (1994)
6. Nolazco-Flores, J., Young, S.: Continuous Speech Recognition in Noise Using Spectral Subtraction and HMM Adaptation. In: ICASSP, pp. I.409–I.412 (1994)
7. Buera, L., Lleida, E., Miguel, A., Ortega, A.: Multienvironment Models Based Linear Normalization for Speech Recognition in Car Conditions. In: Proc. ICASSP (2004)
8. Buera, L., Lleida, E., Miguel, A., Ortega, A.: Robust Speech Recognition in Cars Using Phoneme Dependent Multienvironment Linear Normalization. In: Proceedings of Interspeech, Lisboa, Portugal, pp. 381–384 (2005)
9. Martin, R.: Spectral Subtraction Based on Minimum Statistics. In: Proc. Eur. Signal Processing Conf., pp. 1182–1185 (1994)
10. Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing, pp. 504–512. Prentice Hall PTR, United States (2001)
11. Martin, R.: Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics. IEEE Transactions on Speech and Audio Processing 9(5) (2000)
12. Berouti, M., Schwartz, R., Makhoul, J.: Enhancement of Speech Corrupted by Acoustic Noise. In: Proc. IEEE Conf. ASSP, pp. 208–211 (1979)
13. Hirsch, H.G., Pearce, D.: The AURORA Experimental Framework for the Performance Evaluation of Speech Recognition Systems Under Noisy Conditions. In: ISCA ITRW ASR2000, Automatic Speech Recognition: Challenges for the Next Millennium, Paris, France (2000)

Shadow Resistant Road Segmentation
from a Mobile Monocular System
José Manuel Álvarez, Antonio M. López, and Ramon Baldrich
Computer Vision Center and Computer Science Dept., Universitat Autònoma de Barcelona, Edifici O, 08193 Bellaterra, Barcelona, Spain
{jalvarez,antonio,ramon}@cvc.uab.es
http://www.cvc.uab.es/adas
Abstract. An essential functionality for advanced driver assistance systems (ADAS) is road segmentation, which directly supports ADAS applications like road departure warning and is an invaluable background segmentation stage for other functionalities such as vehicle detection. Unfortunately, road segmentation is far from trivial since the road is in an outdoor scenario imaged from a mobile platform. For instance, shadows are a relevant problem for segmentation. The usual approaches are ad hoc mechanisms, applied after an initial segmentation step, that try to recover road patches not included as segmented road for being in shadow. In this paper we argue that by using a different feature space to perform the segmentation we can minimize the problem of shadows from the very beginning. Rather than the usual segmentation in a color space, we propose segmentation in a shadowless image which is computable in real time using a color camera. The paper presents comparative results for both asphalted and non-asphalted roads, showing the benefits of the proposal in the presence of shadows and vehicles.
1 Introduction
Advanced driver assistance systems (ADAS) arise as a contribution to traffic safety, a major social issue in modern countries. The functionalities required to build such systems can be addressed by computer vision techniques, which have many advantages over active sensors (e.g. radar, lidar): higher resolution, richness of features (color, texture), low cost, easy aesthetic integration, non-intrusive nature, and low power consumption; besides, some functionalities can only be addressed by interpreting visual information. A relevant functionality is road segmentation, which supports ADAS applications like road departure warning. Moreover, it is an invaluable background segmentation stage for other functionalities such as vehicle and pedestrian detection, since knowing the road surface considerably reduces the image region to search for such objects, thus allowing real-time operation and reducing false detections.
Our interest is real-time segmentation of road surfaces, both non-asphalted and asphalted, using a single forward-facing color camera placed at the windshield of a vehicle. However, road segmentation is far from trivial since the
J. Martí et al. (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp. 9–16, 2007. © Springer-Verlag Berlin Heidelberg 2007
Fig. 1. Roads with shadows
road is in an outdoor scenario imaged from a mobile platform. Hence, we deal with a continuously changing background, the presence of different vehicles of unknown movement, different road shapes with worn-out asphalt (or no asphalt at all), and different illumination conditions. For instance, a particularly relevant problem is the presence of shadows (Fig. 1). The usual approaches found in the literature are ad hoc mechanisms applied after an initial segmentation step (e.g. [1,2,3]). These mechanisms try to recover road patches not included as segmented road for being in shadow. In this paper we argue that by using a different feature space to perform the segmentation we can minimize the problem of shadows from the very beginning. Rather than the usual segmentation in a color space, we propose segmentation in a shadowless image, which is computable in real time using a color camera. In particular, we use the grey-scale illuminant invariant image introduced in [4], I from now on.
In Sect. 2 we summarize the formulation of I. Moreover, we also show that the automatic shutter, needed outdoors to avoid global over/under-exposure, fits well in such formulation. In order to illustrate the usefulness of I, in Sect. 3 we propose a segmentation algorithm based on standard region growing applied to I. We remark that we do not recover a shadow-free color image from the original, which would result in too large a processing time for the road segmentation problem. Section 4 presents comparative road segmentation results in the presence of shadows and vehicles, both on asphalted and non-asphalted roads, confirming the validity of our hypothesis. Finally, conclusions are drawn in Sect. 5.
2 Illuminant Invariant Image
Image formation models are defined in terms of the interaction between the spectral power distribution of illumination, surface reflectance and the spectral sensitivity of the imaging sensors. Finlayson et al. [4] show that under the assumptions of Planckian illumination, Lambertian surfaces and having three different narrow band sensors, it is possible to obtain a shadow-free color image. We are not interested in such an image since it requires a very large processing time to be recovered. We focus on an illuminant invariant image (I) that is obtained at the first stage of the shadow-free color image recovery process. We briefly expose here the idea behind I and refer to [4] for details.
Fig. 2. Ideal log–log chromaticity plot. A Lambertian surface patch of a given chromaticity under a Planckian illumination is represented by a point. By changing the color temperature of the Planckian illuminator we obtain a straight line associated to the patch. Lambertian surface patches of different chromaticity have different associated lines. All these lines form a family of parallel lines, namely Ψθ. Let θ⊥ be a line perpendicular to Ψθ, and θ the angle between θ⊥ and the horizontal axis. Then, by projection, we have a one-to-one correspondence between points in θ⊥ and straight lines of Ψθ, so that θ⊥ preserves the differences regarding chromaticity but removes differences due to illumination changes, assuming Planckian radiators.
Let us denote by R, G, B the usual color channels and assume a normalizing channel (or combination of channels); e.g., without losing generality, let us choose G as such normalizing channel. Then, under the assumptions regarding the sensors, the surfaces and the illuminators, if we plot r = log(R/G) vs. b = log(B/G) for a set of surfaces of different chromaticity under different illuminants, we would obtain a result similar to the one in Fig. 2. This means that we obtain an axis, θ⊥, where a surface under different illuminations is represented by the same point, while moving along θ⊥ implies changing the surface chromaticity. In other words, θ⊥ can be seen as a grey-level axis where each grey level corresponds to a surface chromaticity, independently of the surface illumination. Therefore, we obtain an illuminant invariant image, I(p), by taking each pixel p = (x, y) of the original color image, I_RGB(p) = (R(p), G(p), B(p)), computing (r(p), b(p)) and projecting it onto θ⊥ according to θ (a camera-dependent constant angle). The reason for I being shadow-free is, roughly, that non-shadow surface areas are illuminated by both direct sunlight and skylight (a sort of scattered ambient light), while areas in the umbra are only illuminated by skylight. Since both skylight alone and skylight with sunlight addition can be considered Planckian illuminations [5], areas of the same chromaticity ideally project onto the same point of θ⊥, no matter if the areas are in shadow or not.
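The projection onto the invariant axis can be sketched as follows, assuming linear RGB input and a known camera angle θ; the eps regularizer and the function name are implementation assumptions (eps only avoids log(0)):

```python
import numpy as np

def invariant_image(rgb, theta_deg, eps=1e-6):
    """Grey-scale illuminant invariant image (sketch after Finlayson et al.).

    rgb       : (H, W, 3) linear RGB image, float
    theta_deg : camera-dependent invariant direction, e.g. 38 degrees
                for the camera used in the paper
    """
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    r = np.log((R + eps) / (G + eps))   # log-chromaticity, G-normalized
    b = np.log((B + eps) / (G + eps))
    theta = np.deg2rad(theta_deg)
    # Project each (r, b) point onto the invariant axis: illumination
    # changes move points along the parallel lines, not along this axis
    return r * np.cos(theta) + b * np.sin(theta)
```

Because the log-chromaticities are channel ratios, a global multiplicative change (shutter or illumination intensity) leaves the invariant image essentially unchanged.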
Given this result, the first question is whether the working assumptions are realistic or not. In fact, Finlayson et al. [4] show examples where, despite the departures from the assumptions that are found in practice, the obtained results are quite good. We will see in Sect. 4 that this holds in our case, i.e., the combination of our camera, the daylight illuminant and the surface we are interested in (the road) fits the I theory pretty well.
A detail to point out is that our acquisition system was operating in automatic shutter mode: i.e., inside predefined ranges, the shutter changes to avoid both global overexposure and underexposure. However, provided we are using sensors with linear response and the same shutter for the three channels, we can model the shutter action as a multiplicative constant s, i.e., we have s·I_RGB = (sR, sG, sB) and, therefore, the channel normalization removes the constant (e.g. sR/sG = R/G).
In addition, we expect the illuminant invariant image to reduce not only differences due to shadow but also differences due to asphalt degradation since, at the resolution we work at, they are pretty analogous to just intensity changes. Note that the whole intensity axis is equivalent to a single chromaticity, i.e., all the patches of the last row of the Macbeth color checker in Fig. 2 (Ni) project to the same point of θ⊥.
3 Road Segmentation
With the aim of evaluating the suitability of the illuminant invariant image we have devised a relatively simple segmentation method based on region growing [6], sketched in Fig. 3. That is, we do not claim that the proposed segmentation is the best, but one of the simplest that can be expected to work for our problem. We emphasize that our aim is to show the suitability of I for road segmentation, and we think that providing good results, even with such a simple segmentation approach, can be a proof of it.
The region growing uses a very simple aggregation criterion: if p = (x, y) is a pixel already classified as road, any other pixel p_n = (x_n, y_n) of its 8-connected neighborhood is classified as road if

diss(p, p_n) < t_agg,   (1)

where diss(p, p_n) is the dissimilarity metric for the aggregation and t_agg a threshold that fixes the maximum dissimilarity to consider two connected pixels as of the same region. To prove the usefulness of I we use the simplest dissimilarity based on grey levels, i.e.,

diss_I(p, p_n) = |I(p) − I(p_n)|.   (2)
Of course, region growing needs initialization, i.e., the so-called seeds. Currently, such seeds are taken from fixed positions at the bottom region of the image (Fig. 3), i.e., we assume that such region is part of the road. In fact, the lowest row of the image corresponds to a distance of about 4 meters away from the vehicle; thus, it is a reasonable assumption most of the time (other proposals require seeing the full road free at the start-up of the system, e.g. [1]).
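A minimal sketch of the seeded region growing with the grey-level dissimilarity of Eq. (2); the BFS traversal order and the function name are implementation assumptions:

```python
import numpy as np
from collections import deque

def region_grow(img, seeds, t_agg):
    """Seeded 8-connected region growing on a grey-level image (sketch).

    img   : (H, W) float image, e.g. the invariant image
    seeds : list of (row, col) pixels assumed to lie on the road
    t_agg : aggregation threshold on grey-level dissimilarity
    """
    H, W = img.shape
    road = np.zeros((H, W), dtype=bool)
    queue = deque(seeds)
    for s in seeds:
        road[s] = True
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):          # 8-connected neighbourhood
                yn, xn = y + dy, x + dx
                if 0 <= yn < H and 0 <= xn < W and not road[yn, xn]:
                    if abs(img[y, x] - img[yn, xn]) <= t_agg:
                        road[yn, xn] = True
                        queue.append((yn, xn))
    return road
```

Growth stops wherever the grey-level jump exceeds t_agg, which in the invariant image ideally happens at road limits and obstacles rather than at shadow borders.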
In order to compute the angle θ corresponding to our camera, we have followed two approaches. One is the proposal in [7], based on acquiring images of
Fig. 3. Proposed algorithm. In all our experiments we have fixed values for the algorithm parameters: σ = 0.5 for Gaussian smoothing (Gaussian kernel g_σ, discretized in a 3 × 3 window for convolution '∗'); θ = 38°; t_agg = 0.007; seven seeds placed at the squares pointed out in the region growing result; and a structuring element (SE) of n × m = 5 × 3. Notice that we apply some mathematical morphology just to fill in some small gaps and thin grooves.
the Macbeth color checker under different daytime illuminations and using the (r, b)-plot to obtain θ. The other approach consists in taking a few road images with shadows and using them as positive examples to find the θ providing the best shadow-free images for all the examples. The values of θ obtained from the two calibration methods basically give rise to the same segmentation results. We have taken θ from the example-based calibration because it provides slightly better segmentations. Besides, although not proposed in the original formulation of I, before computing it we regularize the input image I_RGB with a small amount of Gaussian smoothing (the same for each color channel).
4 Results
In this section we present comparative results based on the region growing algorithm introduced in Sect. 3 for three different feature spaces: the intensity image (I; also called luminance or brightness); the hue-saturation-intensity (HSI) color space; and the illuminant invariant image (I).
The intensity image is included in the comparison just to see what we can expect from a monocular monochrome system. Since it is a grey-level image, its corresponding dissimilarity measure is defined analogously to Eq. (2), i.e.:

diss_I(p, p_n) = |I(p) − I(p_n)|.   (3)
The HSI space is chosen because it is one of the most accepted color spaces for segmentation purposes [8]. The reason is that, by separating chrominance (H & S) and intensity (I), such a space allows reasoning in a way closer to human perception than others. For instance, it is possible to define a psychologically meaningful distance between colors, such as the cylindrical metric proposed in [8] for multimedia applications and used in [1] for segmenting non-asphalted roads. Such a metric gives rise to the following dissimilarity measure for the HSI space:

– Case achromatic pixels: use only the definition of diss_I given in Eq. (3).
– Case chromatic pixels:

diss_HSI(p, p_n) = sqrt( (I(p) − I(p_n))² + S(p)² + S(p_n)² − 2 S(p) S(p_n) cos(ΔH) ),   (4)

with ΔH the hue difference wrapped to [0, π], where the different criterion regarding chromaticity is used to take into account the fact that the hue value (H) is meaningless when the intensity (I) is very low or very high, or when the saturation (S) is very low. For such cases only intensity is taken into account for aggregation. We use the proposal in [8,1] to define the frontier of meaningful hue, i.e., p is an achromatic pixel if either I(p) > 0.9 I_max or I(p) < 0.1 I_max or S(p) < 0.1 S_max, where I_max and S_max represent the maximum intensity and saturation values, respectively.
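The achromatic/chromatic case split can be sketched as follows. The cylindrical chrominance term follows the metric of [8]; the function name and the default I_max, S_max values are illustrative assumptions:

```python
import numpy as np

def diss_hsi(p, pn, I_max=1.0, S_max=1.0):
    """Cylindrical HSI dissimilarity (hedged sketch following [8,1]).

    p, pn : (H, S, I) triples, H in radians
    """
    def achromatic(H, S, I):
        # Frontier of meaningful hue as in [8,1]
        return I > 0.9 * I_max or I < 0.1 * I_max or S < 0.1 * S_max

    H1, S1, I1 = p
    H2, S2, I2 = pn
    dI = abs(I1 - I2)
    if achromatic(*p) or achromatic(*pn):
        return dI                       # hue is meaningless: intensity only
    dH = abs(H1 - H2)
    if dH > np.pi:
        dH = 2.0 * np.pi - dH           # wrap hue difference to [0, pi]
    # Chord-like chrominance term of the cylindrical metric
    dC = np.sqrt(S1**2 + S2**2 - 2.0 * S1 * S2 * np.cos(dH))
    return np.sqrt(dI**2 + dC**2)
```

Low-saturation or extreme-intensity pixels thus fall back to the plain intensity dissimilarity of Eq. (3).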
In summary, to compute Eq. (1) we use Eq. (2) for the invariant image with its threshold t_agg,I, Eq. (3) for the intensity image with its own threshold, and Eq. (4) for HSI with thresholds t_agg,ch (chromatic case) and t_agg,ach (achromatic case). Figure 4 shows the results obtained for examples of both asphalted and non-asphalted roads. We have manually set all these aggregation thresholds to obtain the best results for each feature space, but such values are not changed from image to image, i.e., all the frames of our sequences have been processed with them fixed.
These results suggest that the invariant image is a more suitable feature space for road segmentation than the others. The road surface is well recovered most of the time, with the segmentation stopping at road limits and vehicles¹, even with a simple

¹ Other ongoing experiments, not included here for space restrictions, also show that the segmentation is quite stable regarding the chosen aggregation threshold as well as the number and position of seeds, much more stable than for both I and HSI.
Fig. 4. From left to right columns: (a) original 640 × 480 color image with the seven used seeds marked in white; (b) segmentation using I with t_agg,I = 0.008; (c) segmentation using the invariant image with t_agg = 0.003; (d) segmentation using HSI with t_agg,ch = 0.08 and t_agg,ach = 0.008. The white pixels over the original image correspond to the segmentation results. The top four rows correspond to asphalted roads and the rest to non-asphalted areas of a parking.
segmentation method. Now, such segmentation can be augmented with road shape models like in [9,10], with the aim of estimating the non-visible road in case of many vehicles in the scene. As a result, the obtained road limits and road curvature will be useful for applications such as road departure warning. The processing time required in non-optimized MatLab code is about 125 ms to compute the invariant image, and 700 ms for the whole segmentation process. We expect it to reach real time when written in C++ code.
5 Conclusions
We have addressed road segmentation by using a shadow-free image (I). In order to illustrate the suitability of I for such a task we have devised a very simple segmentation method based on region growing. By using this method we have provided comparative results for asphalted and non-asphalted roads which suggest that I makes the segmentation process easier in comparison to another popular feature space found in road segmentation algorithms, namely HSI. In addition, the process can run in real time. In fact, since the computation of I only depends on a precalculated parameter, i.e. the camera characteristic angle θ, it is possible that a camera supplier would provide such an angle after calibration (analogously to the calibration parameters provided with stereo rigs).
Acknowledgments. This work was supported by the Spanish Ministry of Education and Science under project TRA2004-06702/AUT.
References
1. Sotelo, M., Rodriguez, F., Magdalena, L., Bergasa, L., Boquete, L.: A color vision-based lane tracking system for autonomous driving in unmarked roads. Autonomous Robots 16(1) (2004)
2. Rotaru, C., Graf, T., Zhang, J.: Extracting road features from color images using a cognitive approach. In: IEEE Intelligent Vehicles Symposium (2004)
3. Ramstrom, O., Christensen, H.: A method for following unmarked roads. In: IEEE Intelligent Vehicles Symposium (2005)
4. Finlayson, G., Hordley, S., Lu, C., Drew, M.: On the removal of shadows from images. IEEE Trans. on Pattern Analysis and Machine Intelligence 28(1) (2006)
5. Wyszecki, G., Stiles, W.: Section 1.2. In: Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd edn. John Wiley & Sons (1982)
6. Gonzalez, R., Woods, R.: Section 10.4. In: Digital Image Processing, 2nd edn. Prentice Hall (2002)
7. Finlayson, G., Hordley, S., Drew, M.: Removing shadows from images. In: European Conference on Computer Vision (2002)
8. Ikonomakis, N., Plataniotis, K., Venetsanopoulos, A.: Color image segmentation for multimedia applications. Journal of Intelligent Robotics Systems 28(1-2) (2000)
9. He, Y., Wang, H., Zhang, B.: Color-based road detection in urban traffic scenes. IEEE Transactions on Intelligent Transportation Systems 5(24) (2004)
10. Lombardi, P., Zanin, M., Messelodi, S.: Switching models for vision-based on-board road detection. In: International IEEE Conference on Intelligent Transportation Systems (2005)
Mosaicking Cluttered Ground Planes Based on
Stereo Vision
José Gaspar1, Miguel Realpe2, Boris Vintimilla2, and José Santos-Victor1
1 Computer Vision Laboratory, Inst. for Systems and Robotics, Instituto Superior Técnico, Lisboa, Portugal
{jag,jasv}@isr.ist.utl.pt
2 Vision and Robotics Center, Dept. of Electrical and Computer Science Eng., Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador
{mrealpe,boris.vintimilla}@fiec.espol.edu.ec
Abstract. Recent stereo cameras provide reliable 3D reconstructions. These are useful for selecting ground-plane points, registering them and building mosaics of cluttered ground planes. In this paper we propose a 2D Iterated Closest Point (ICP) registration method, based on the distance transform, combined with a fine-tuning registration step using directly the image data. Experiments with real data show that ICP is robust to 3D reconstruction differences due to motion, and the fine-tuning step minimizes the effect of the uncertainty in the 3D reconstructions.
1 Introduction
In this paper we approach the problem of building mosaics, i.e. image montages, of cluttered ground planes, using stereo vision on board a wheeled mobile robot. Mosaics are useful for the navigation of robots and for building human-robot interfaces. One clear advantage of mosaics is the simple representation of robot localization and motion: they are simply 2D rigid transformations.
Many advances have been made recently in vision-based navigation. Flexible (and precise) tracking and reconstruction of visual features, using particle filters, allowed real-time Simultaneous Localization and Map Building (SLAM) [1]. The introduction of scale-invariant visual features brought more robustness and allowed very inexpensive navigation solutions [2,3]. Despite being effective, these navigation modalities lack dense scene representations convenient for intuitive human-robot interfaces. Recent commercial stereo cameras came to help by giving locally dense 3D scene reconstructions. Iterative methods for matching points and estimating their rigid motion allow registering the local reconstructions and obtaining global scene representations. The Iterated Closest Point (ICP) [4] is one such method that we explore in this work.
The ICP basic algorithm has been extended in a number of ways. Examples of improvements are robustifying the algorithm against the influence of features lacking correspondences, or using weighted metrics to trade off distance and feature similarity [5]. More recent improvements target real-time implementations, matching shapes with defects, or mixing probabilistic matching metrics with saturations to minimize the effect of outliers [6,7,8]. In our case, the wheeled mobile robot's
J. Martí et al. (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp. 17–24, 2007. © Springer-Verlag Berlin Heidelberg 2007
motion on the ground plane allows searching for 2D, instead of 3D, transformations. Hence we follow a 2D ICP methodology, but we take a computer vision approach, namely registering clouds of points using the distance transform [9]. Stereo cameras allow selecting ground-plane points, registering them and then building the ground-plane mosaic. Stereo reconstruction is therefore an advantage; however, some specific issues arise about its use. For example, the discrete nature of the imaging process, and the variable imaging of objects and occlusions due to robot motion, imply uncertainties in the 3D reconstruction. Hence, the registration of 3D data also propagates some intrinsic uncertainty. The selection of ground-plane data is convenient for complexity reduction; however, a question of the sparsity of data arises. In our work we investigate robust methodologies to deal with these issues, and in particular we investigate whether resorting to the raw image data can help minimize error propagation.
The paper is structured as follows: Sec. 2 details the mosaicking problem and introduces our approach to solve it; Sec. 3 shows how we build the orthographic views of the ground plane; Sec. 4 details the optimization functionals associated with mosaic construction; Sec. 5 is the results section; finally, in Sec. 6 we draw some conclusions and guidelines for future work.
2 Problem Description
The main objective of our work is mosaicking (mapping) the ground plane, considering that it can be cluttered with objects such as furniture. The sensor is a static trinocular-stereo camera mounted on a wheeled mobile robot. The stereo camera gives 3D clouds of points in the camera coordinate system, i.e. a mobile frame changed by the robot motion. See Fig. 1.
The ground-plane constraint implies that the relationships between camera coordinate systems are 2D rigid motions. As in general the camera is not aligned with the floor, i.e. the camera coordinate system does not have two axes parallel to the ground plane, the relationships do not clearly show their 2D nature. In order to make the 2D nature of the problem clear, we define a new coordinate system aligned with the ground plane (three user-selected, well-separated ground points are sufficient for this purpose).
Fig. 1. Mosaicking ground planes: stereo camera, image and BEV coordinate systems
Commercial stereo cameras give dense reconstructions. For example, for each image feature, such as a corner or an edge point, there are usually about 20 to 30 reconstructed 3D points (the exact number depends on the size of the correlation windows). Considering standard registration methods such as the Iterated Closest Point (ICP, [4]), the large clouds of 3D points imply large computational costs. Hence, we choose to work with a subset of the data, namely by selecting just points of the ground plane. The 2D clouds of points can therefore be registered with a 2D ICP method.
Noting that each 3D cloud of points results from stereo image registration, the process of registering consecutive clouds of points has some error propagated from the cloud reconstruction. In order to minimize the error propagation, we add a fine-tuning image-based registration process after the initial registration by a 2D ICP method. The image-based registration is a 2D rigid transformation in Bird's Eye Views (BEV), i.e. orthographic images of the ground plane. BEV images can also be obtained knowing some ground points and the projection geometry. To maintain a consistent unit system, despite having metric values in the 3D clouds of points, we choose to process both the 2D ICP and the image registration in the pixel metric system, i.e. the same as the raw data.
In summary, our approach encompasses two main steps: (i) selection of ground points and 2D ICP; (ii) BEV image registration. Despite the 2D methodology, notice that the 3D data is a principal component. The 3D sensor allows selecting the ground-plane data, which is useful not only for using a faster 2D ICP method but mainly for registering the ground-plane images without considering the distracting (biasing) non-ground regions.
3 Obtaining Bird’s Eye Views (BEV)
The motion of the robot implies a motion of the trinocular camera, which we denote as 2T1. The indexes 1 and 2 indicate two consecutive times, and also tag the coordinate systems at the different times, e.g. the camera frames {cam1} and {cam2}. The image plane defines new coordinate systems, {img1} and {img2}, and the BEV defines other ones, {bev1} and {bev2}. See Fig. 1.
The projection matrix P relating {cam_i} and {img_i} is given by the camera manufacturer or by a standard calibration procedure [10]. In this section we are mainly concerned with obtaining the homography H relating the image plane with the BEV.
The BEV dewarping H is defined by back-projecting to the ground plane four image points (appendix A details the back-projection matrix P*). The four image points are chosen so as to comprehend most of the field of view imaging the ground plane. The region close to the horizon line is discarded due to poor resolution. Scaling is chosen such that it preserves the largest resolution available, i.e. no image-data loss due to subsampling.
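A homography such as H can be estimated from four point correspondences; the standard DLT construction is sketched below (function names are assumptions; in practice a library routine such as OpenCV's getPerspectiveTransform does the same job):

```python
import numpy as np

def homography_from_points(src, dst):
    """DLT estimate of the 3x3 homography mapping src -> dst
    from four point correspondences (sketch)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on h
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)       # null-space vector of A
    return H / H[2, 2]

def warp_point(H, p):
    """Apply homography H to a 2D point p (homogeneous normalization)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

Applying warp_point to every pixel coordinate of the image plane yields the BEV dewarping described above.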
It is interesting to note that the knowledge of the 3D camera motion, 2T1, directly gives the BEV 2D rigid transformation, 2H1 (see Fig. 1).

4 Mosaic Construction
The input data for mosaic creation consist of BEV images, I_t and I_{t+1}, and clouds of ground points projected into the BEV coordinate system, {[u v]^T_{t,i}} and {[u v]^T_{t+1,j}}. In this frame, the camera motion is a 2D rigid transformation, 2H1, which can be represented by three parameters, μ = [δu δv δθ]. We want to find the μ such that the clouds of points match as closely as possible:

μ* = arg min_μ Σ_j min_i || [u v]^T_{t+1,j} − Rot(δθ) [u v]^T_{t,i} − [δu δv]^T ||,   (2)

where Rot(δθ) denotes a 2D rotation. The inner minimization, i.e. the distance from each moved point to the closest point of the other cloud, is read directly from the distance transform of that cloud, with a saturation on the distance transform (constant distances imply no influence in the optimization process).
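The distance-transform evaluation of the saturated cost can be sketched as follows. The brute-force distance transform (a library routine such as scipy.ndimage.distance_transform_edt would be used in practice), the rotation about the origin rather than the cloud centroid, and all names are simplifying assumptions:

```python
import numpy as np

def dist_map_from_points(points, shape):
    """Brute-force Euclidean distance transform of a small 2D point set:
    each pixel stores the distance to the nearest cloud point."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    d = np.full(shape, np.inf)
    for u, v in points:
        d = np.minimum(d, np.hypot(xx - u, yy - v))
    return d

def icp2d_cost(points, dist_map, mu, d_sat=20.0):
    """Saturated cost of a candidate 2D rigid motion mu = (du, dv, dtheta),
    evaluated by sampling the distance transform of the other cloud."""
    du, dv, dth = mu
    R = np.array([[np.cos(dth), -np.sin(dth)],
                  [np.sin(dth),  np.cos(dth)]])
    p = points @ R.T + np.array([du, dv])
    # Sample the distance transform at the moved points (nearest pixel),
    # clipping to the image bounds
    u = np.clip(np.round(p[:, 0]).astype(int), 0, dist_map.shape[1] - 1)
    v = np.clip(np.round(p[:, 1]).astype(int), 0, dist_map.shape[0] - 1)
    return np.minimum(dist_map[v, u], d_sat).sum()
```

The saturation d_sat implements the remark above: beyond it, a point's distance is constant and therefore does not influence the optimization.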
Given the first estimation of the 2D motion and the knowledge of groundpoints, we can now fine tune the registration using ground plane image data:
an initial stage. These values are updated in the optimization process only if true matchings become possible, i.e. if a new hypothetical 2D rigid motion between the BEV images can bring unmatched points into visibility. This further smooths the optimization process for points near the border of the field of view.
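The image-based fine-tuning can be illustrated with a toy photometric search: warp one BEV over the other and minimize the intensity difference on the commonly visible region. This sketch restricts the search to the rotation δθ, uses nearest-neighbour warping and a grid search; it is an illustration of the idea, not the paper's cost or optimizer:

```python
import numpy as np

def warp_rot(img, theta):
    """Rotate img by theta about its centre (nearest-neighbour, inverse mapping)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y, x = np.mgrid[0:h, 0:w]
    # Inverse map: for every output pixel, sample the source at the rotated position.
    xs = np.cos(-theta) * (x - cx) - np.sin(-theta) * (y - cy) + cx
    ys = np.sin(-theta) * (x - cx) + np.cos(-theta) * (y - cy) + cy
    xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full_like(img, np.nan)                  # NaN marks pixels with no data
    out[valid] = img[yi[valid], xi[valid]]
    return out

def photometric_cost(img1, img2, theta):
    """Mean squared intensity difference over the commonly visible region."""
    w = warp_rot(img1, theta)
    m = ~np.isnan(w) & ~np.isnan(img2)
    return np.mean((w[m] - img2[m]) ** 2)

# Smooth synthetic BEV image and its rotated copy (true delta-theta = 0.10 rad).
y, x = np.mgrid[0:64, 0:64]
img1 = np.sin(x / 5.0) + np.cos(y / 7.0)
img2 = warp_rot(img1, 0.10)

# Grid search over the rotation (translation omitted for brevity).
grid = np.arange(-0.30, 0.31, 0.05)
best = min(grid, key=lambda t: photometric_cost(img1, img2, t))
```

Because the image-based cost uses dense texture rather than a few landmark points, its minimum is sharper, which is consistent with the comparison of the two cost functionals reported below for Fig. 2h.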
Finally, given the 2D rigid motion, the mosaic composition is just an accumulation of images. A growing 2D image buffer is defined so as to hold image points of the ground plane along the robot's traveled path.
Mosaicking Cluttered Ground Planes Based on Stereo Vision
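The growing-buffer accumulation can be sketched as pasting each registered BEV into a canvas that is enlarged on demand. This translation-only toy (integer offsets, later tiles overwriting overlaps) is an illustrative simplification of the accumulation step, not the paper's buffer:

```python
import numpy as np

def paste_grow(canvas, tile, top, left):
    """Paste `tile` at (top, left), growing the buffer if the tile exceeds it.

    Assumes top, left >= 0 (negative offsets would require shifting the buffer).
    """
    h = max(canvas.shape[0], top + tile.shape[0])
    w = max(canvas.shape[1], left + tile.shape[1])
    if (h, w) != canvas.shape:
        grown = np.zeros((h, w))                     # enlarge, keeping old content
        grown[:canvas.shape[0], :canvas.shape[1]] = canvas
        canvas = grown
    canvas[top:top + tile.shape[0], left:left + tile.shape[1]] = tile
    return canvas

# Toy mosaic: two 4x4 BEV "images"; registration says the second one
# sits 2 pixels to the right of the first.
canvas = np.full((4, 4), 1.0)
canvas = paste_grow(canvas, np.full((4, 4), 2.0), 0, 2)
```

In practice the registered rigid motion supplies the (top, left) offset and rotation of each new BEV relative to the buffer's world frame.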
215 stereo images are acquired along the path
Figure 2 illustrates the dewarping to BEV images and the registration of the dewarped images. The BEV images are 1568 × 855 pixels. One meter measured on the ground plane is equivalent to 318 pixels in the BEV (this calibration information derives directly from the stereo-camera calibration). The registration is illustrated by superimposing consecutive BEV images after applying to the first
Fig. 2. BEV dewarping and registration. (a) Trinocular stereo camera. (b) and (c) Reference camera of the stereo setup at times t and t+1, showing reconstructed ground points (blue points). (d) BEV dewarping of (b). (e) Superposition of BEVs without registration (notice the blur). (f) Distance transform of the ground points seen in (c). (g) Correct superposition of all ground points after registration. (h) Comparison of the cost functionals F1(δθ) and F2(δθ) by perturbing δθ about the minimizing point (costs normalized to [0, 1], δθ in [−10°, 10°]): registration using Eq. 2 has a larger convergence region (dots plot) but the image-based registration, Eq. 3, is more precise (circles plot).
J. Gaspar et al.
Fig. 3. View of the working area and of the robot (a), mosaic of the ground points chosen as landmarks while registering the sequence of BEV images (b), and a mosaic with all the visual information superimposed (c).
image the estimated 2D rigid motion. Notice in particular in Fig. 2c the significant shape differences of the clouds of points as compared to Fig. 2b, and in Fig. 2g the graceful degradation of the superposition for points progressively more distant from the ground plane. Fig. 2f shows the distance transform used for matching ground points. The matching is performed repeatedly in Eq. 2 in order to obtain the optimal registration shown in Fig. 2g. The existence of local clusters of points, instead of isolated points, motivates a wider-convergence but less precise registration, which can be improved by resorting to image data (Eq. 3), as shown in Fig. 2h.
The mosaicking of BEVs clearly shows the precision of the registration process. In particular, it shows that the image-based registration significantly improves the 2D motion estimation. After one complete circle described by the robot, the 2D ICP registration gives about 2.7 meters of localization error (28% error over path