Each particle $i$ has an associated hypothetical trajectory $\zeta^i(t')$, $t' \in [t_k - \tau,\, t_k]$, where the value $\tau$ defines some horizon of time. This trajectory ends exactly at the particle instance, i.e. $\zeta^i(t_k) = X_k^i$.
The estimated dead-reckoning trajectory is usually defined in a different coordinate system, as it is the result of an independent process. The important aspect of the dead-reckoning estimate is that its path has good quality in relative terms, i.e. locally. Its shape is, after proper rotation and translation, similar to the real path of the vehicle expressed in a different coordinate frame.
If the dead-reckoning estimate is expressed as the path $\zeta'(t') = \left[ x'(t'),\, y'(t'),\, \phi'(t') \right]$, then the process to associate it to an individual particle and to express it in the global coordinate frame is performed according to:

$$
\begin{aligned}
x^i(t') &= x_k^i + \cos\left(\phi_k^i - \phi'_k\right) x'(t') - \sin\left(\phi_k^i - \phi'_k\right) y'(t') \\
y^i(t') &= y_k^i + \sin\left(\phi_k^i - \phi'_k\right) x'(t') + \cos\left(\phi_k^i - \phi'_k\right) y'(t') \\
\phi^i(t') &= \phi'(t') + \phi_k^i - \phi'_k
\end{aligned}
$$

where $\phi_k^i$ is the heading of the particle $i$, $\left( x_k^i,\, y_k^i \right)$ are the position components of $X_k^i$, and $\phi'_k$ is the heading of the dead-reckoning path at time $t_k$. Clearly, at time $t' = t_k$ the difference $\left( x'(t'),\, y'(t') \right)$ must be $(0,0)$, and $\phi'(t_k) - \phi'_k = 0$, so each associated path ends exactly at its particle's pose.
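As a concrete illustration, here is a minimal sketch of this association step, assuming the equations as reconstructed above (Python/NumPy; function and variable names are illustrative, not from the original implementation):

```python
import numpy as np

def associate_path(particle_pose, dr_path):
    """Express a dead-reckoning path in the global frame of one particle.

    particle_pose: (x_k, y_k, phi_k) of the particle at time t_k.
    dr_path: (N, 3) array of [x'(t'), y'(t'), phi'(t')], expressed in a frame
             defined by the last point of the path, so dr_path[-1, :2] == (0, 0).
    Returns an (N, 3) array: the associated path, ending at the particle pose.
    """
    x_k, y_k, phi_k = particle_pose
    phi_dr_k = dr_path[-1, 2]        # heading of the dead-reckoning path at t_k
    dphi = phi_k - phi_dr_k          # rotation aligning the path to the particle
    c, s = np.cos(dphi), np.sin(dphi)

    out = np.empty_like(dr_path)
    out[:, 0] = x_k + c * dr_path[:, 0] - s * dr_path[:, 1]
    out[:, 1] = y_k + s * dr_path[:, 0] + c * dr_path[:, 1]
    out[:, 2] = dr_path[:, 2] + dphi  # headings shifted by the same offset
    return out
```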
Fig 3 (left) shows a dead-reckoning path and how it would be used to define the associated paths of two hypothetical particles. The associated paths, Fig 3 (right), are just versions of the original path adequately translated and rotated to match the particles' positions and headings.
Fig 3 The left picture shows a dead-reckoning path, expressed in a coordinate frame defined by the position and heading of the last point (red square). The right picture shows the same path associated to two arbitrary particles, expressed in a common coordinate frame.
The new likelihood of a particle is now evaluated through the likelihood of its associated path with respect to the road map:

$$
w^i = \frac{1}{l_s} \int_0^{l_s} L\left( \zeta^i(s) \right) ds
$$

where $\zeta^i(s)$ is the path expressed as a function of its intrinsic parameter $s$, $L(\cdot)$ is the base likelihood, and $l_s$ is the length of integration over the path. The integration over the hypothetical path can be well approximated by a discrete summation.
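A minimal sketch of that discrete approximation, assuming the base likelihood is stored as a regular grid over the region of interest (names and the grid convention are illustrative):

```python
import numpy as np

def path_likelihood(assoc_path, base_likelihood, cell_size, origin):
    """Approximate the path integral of the base likelihood by a discrete sum.

    assoc_path: (N, 3) associated path of one particle (e.g. from associate_path).
    base_likelihood: 2D grid of base-likelihood values over the region of interest.
    cell_size: metres per grid cell; origin: (x0, y0) of the grid's lower-left corner.
    """
    cols = np.clip(((assoc_path[:, 0] - origin[0]) / cell_size).astype(int),
                   0, base_likelihood.shape[1] - 1)
    rows = np.clip(((assoc_path[:, 1] - origin[1]) / cell_size).astype(int),
                   0, base_likelihood.shape[0] - 1)
    # The mean over sampled points approximates (1/l_s) * integral of L along the path.
    return base_likelihood[rows, cols].mean()
```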
Fig 4 shows an example of a base likelihood (shown as a grayscale image) and particles that represent the pose of the vehicle and their associated paths (in cyan). The particles' positions and headings are represented by blue arrows. The red arrow and the red path correspond to one of the most likely hypotheses.
By applying observations that consider the hypothetical past path of the particle, the out-of-map problem is mitigated (although not solved completely) for transition situations. The transition between being on the known map and going completely out of it (i.e. current pose and recent path are out of the map) can be performed safely by considering an approach based on hysteresis.
The approach is summarized as follows: if the maximum individual path likelihood (the likelihood of the particle with maximum likelihood) is higher than $K_H$, then the process keeps all particles with likelihood $\geq K_L$. These thresholds are defined such that $100\% \geq K_H > K_L \geq 0\%$. If the maximum likelihood is below $K_H$, then the process keeps all the particles and continues the processing in pure prediction mode. Usual values for these thresholds are $K_H = 70\%$ and $K_L = 60\%$.
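A minimal sketch of this hysteresis policy (illustrative; the chapter does not give the implementation):

```python
K_H, K_L = 0.70, 0.60   # usual threshold values quoted in the text

def select_particles(particles, likelihoods):
    """Hysteresis-based update: apply map observations only when the map is trusted.

    If the best path likelihood exceeds K_H, keep particles above K_L;
    otherwise keep the whole population and run in pure prediction mode.
    Returns (kept_particles, prediction_mode_flag).
    """
    if max(likelihoods) > K_H:
        kept = [p for p, w in zip(particles, likelihoods) if w >= K_L]
        return kept, False           # map observations applied
    return list(particles), True     # pure prediction mode
```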
Fig 4 A synthetic example. The region of interest (ROI) is a rectangle of 200 meters by 200 meters. A set of particles and their associated paths are superimposed on an image of base likelihood (axes in meters).
In the synthetic example shown in Fig 4 the region of interest (ROI) is a rectangle of 200 meters by 200 meters. This ROI is big enough to contain the current population of particles and their associated paths.
Although all the particles are located on the road (high base likelihood), many of their associated paths abandon the zones of high base likelihood. The most likely particles are those that have a path mostly contained in the nominal zones. The remarkable effect of a wrong heading can be seen: it rotates the associated path and makes it abandon the zones of high base likelihood (i.e. the road sections in gray).
Some particles have current values that escape the dark gray region (high base likelihood zones); however, their associated paths are mostly contained in the roads. That means the real vehicle could actually be abandoning the road. This situation is repeated in Fig 5 as well, where all the particles are located outside of the nominal road although many of them have paths that match the map constraints.
When the filter infers that the vehicle has been outside the map for sufficient time (i.e. no particles show a relevant part of their paths consistent with the map), no updates are performed on the particles, i.e. the filter works in pure prediction mode.
When the vehicle enters the known map and eventually there are some particles that achieve the required path likelihood, i.e. higher than $K_H$, then the filter will start to apply the updates on the particles.
However, this synchronization is not immediate. There could be some delay until some associated paths are consistent with the map: the fact that a particle is well inside the road does not mean that its likelihood is high. It needs a relevant fraction of its associated path history to match the road map in order to be considered "inside the map".
This policy clearly immunizes the filter from bias when incorrect particles are temporarily
on valid roads
4 Experimental Results
Long-term experiments have been performed in urban areas of Sydney. The road maps were created by an ad-hoc Matlab tool that allowed users to define segments on top of a satellite image obtained from Google Earth. These road maps were low-quality representations of the roads. This disregard for the quality of the definition of the road maps was done on purpose, with the goal of exposing the approach to realistic and difficult conditions. Fig 7 and Fig 8 show the road map used in the estimation process; Fig 2 shows part of the used road map as well.
The dead-reckoning process was based on the fusion of speed and heading-rate measurements. The heading rate was provided by low-cost three-dimensional gyroscopes. A diversity of additional sensors was available in the platform (PAATV/UTE project), although those were not used in the estimation process and results presented in this paper. All the experiments and realistic simulations have validated the satisfactory performance of the approach.
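For illustration, a minimal sketch of such a dead-reckoning integration, assuming a simple planar unicycle model (the chapter does not detail the fusion equations, so this is an assumption):

```python
import numpy as np

def dead_reckoning(speeds, yaw_rates, dt, pose0=(0.0, 0.0, 0.0)):
    """Integrate wheel speed and gyro heading rate into a planar path.

    speeds, yaw_rates: per-sample measurements (m/s, rad/s); dt: sample period (s).
    Returns an (N+1, 3) array of [x, y, heading]. Drift grows with distance,
    which is why only the recent, locally accurate part of the path is used.
    """
    poses = [np.asarray(pose0, dtype=float)]
    for v, w in zip(speeds, yaw_rates):
        x, y, phi = poses[-1]
        phi_new = phi + w * dt
        poses.append(np.array([x + v * dt * np.cos(phi_new),
                               y + v * dt * np.sin(phi_new),
                               phi_new]))
    return np.vstack(poses)
```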
Figures 7, 8 and 9 present the position estimates resulting from the estimation process. Those are shown in red (Figure 7) or in yellow (Figures 8 and 9) and are superimposed on the road map. In some parts of the test the vehicle went temporarily outside the known map. Although there was no predefined map on those sections, it was possible to infer that the estimator performed adequately: from the satellite image and the superimposed estimated path, a human can realize that the estimated path is actually on a road not defined in the a priori map (Fig 9).
It is difficult to define a true path in order to compare it with the estimated solution. This is because the estimator is intended to provide permanent global localization with a quality usually similar to a GPS. Figures 10, 11 and 12 present the estimated positions and corresponding GPS estimates, although those were frequently affected by multipath and other problems.
5 Conclusions and Future Work
This paper presented a method to perform global localization in urban environments using segment-based maps together with particle filters. In the proposed approach the likelihood function is locally generated as a grid derived from segment-based maps. The scheme can efficiently assign weights to the particles in real time, with minimum memory requirements and without any additional pre-filtering procedures. Multi-hypothesis cases are handled transparently by the filter. A path-based observation model is developed as an extension to consistently deal with out-of-map navigation cases. This feature is highly desirable since the map can be incomplete, or the vehicle can actually be located outside the boundaries of the provided map.
The system behaves like a virtual GPS, providing accurate global localization without using an actual GPS.
Experimental results have shown that the proposed architecture works robustly in urban environments using segment-based road maps. These particular maps provide road network connectivity in the context of the DARPA Urban Challenge. However, the proposed architecture is general and can be used with any kind of segment-based or topological a priori map.
The filter is able to provide consistent localization, for extended periods of time and long traversed courses, using only rough dead-reckoning input (affected by considerable drift) and the RNDF map.
The system performs robustly in a variety of circumstances, including extreme situations such as tunnels, where GPS-based positioning would not render any solution at all.
The continuation of this work involves different lines of research and development. One of them is the implementation of this approach as a robust and reliable module ready to be used as a localization resource by other systems. However, this process should be flexible enough to allow the integration of other sources of observations, such as biased compass measurements and even sporadic GPS measurements.
Other necessary and interesting lines are related to the initialization of the estimation process, particularly for cases where the robot starts at a completely unknown position. Defining a huge local area for the definition of the likelihood (and spreading a population of particles in it) is not feasible in real time. We are investigating efficient and practical solutions for that issue.
Another area of relevance is the application of longer paths in the evaluation of the path likelihood. In the current implementation we consider a deterministic path, i.e. we exploit the fact that for short paths the dead reckoning presents low uncertainty, to the degree of allowing us to treat the recent path as a deterministic entity. In order to extend the path validity we need to model the path in a stochastic way, i.e. by a PDF. Although this concept is mathematically easy to define and understand, it implies considerable additional computational cost.
Finally, the observability of the estimation process can be increased by considering additional sources of observation, such as the detection of road intersections. These additional observations would improve the observability of the process, particularly when the vehicle does not perform turning maneuvers for long periods.
(Figure plots: "Likelihood in selected region"; "Estimated Path and Known Map". Axes in meters.)
Fig 8 Estimated path (in yellow) for one of the experiments. The known map (cyan) and a satellite image of the region are included in the picture.
Fig 12 A close inspection shows interesting details. The estimates are provided at frequencies higher than the GPS (5 Hz). The GPS presents jumps, and the road segment appears as a continuous piecewise line (in blue); both sources of information are unreliable if used individually.
6 References
S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
F. Dellaert, D. Fox, W. Burgard, and S. Thrun, "Monte Carlo Localization for Mobile Robots," in International Conference on Robotics and Automation. Detroit: IEEE, 1999.
D. Fox, S. Thrun, W. Burgard, and F. Dellaert, "Particle Filters for Mobile Robot Localization," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. New York: Springer, 2001.
J. E. Guivant and E. M. Nebot, "Optimization of the Simultaneous Localization and Map-building Algorithm for Real-time Implementation," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 242–257, 2001.
J. Guivant and R. Katz, "Global Urban Localization Based on Road Maps," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). San Diego, CA: IEEE, Oct. 2007, pp. 1079–1084. ISBN: 978-1-4244-0912-9.
J. J. Leonard and H. F. Durrant-Whyte, "Simultaneous Map Building and Localization for an Autonomous Mobile Robot," in IEEE/RSJ International Workshop on Intelligent Robots and Systems. IEEE, 1991.
L. Liao, D. Fox, J. Hightower, H. Kautz, and D. Schultz, "Voronoi Tracking: Location Estimation Using Sparse and Noisy Sensor Data," in IEEE/RSJ International Workshop on Intelligent Robots and Systems. Las Vegas, USA: IEEE, 2003.
M. E. El Najjar and P. Bonnifait, "A Road-Matching Method for Precise Vehicle Localization Using Belief Theory and Kalman Filtering," Autonomous Robots, vol. 19, no. 2, pp. 173–191, 2005.
S. M. Oh, S. Tariq, B. N. Walker, and F. Dellaert, "Map-based Priors for Localization," in IEEE/RSJ International Workshop on Intelligent Robots and Systems. Sendai, Japan: IEEE, 2004.
G. Taylor and G. Blewitt, "Road Reduction Filtering Using GPS," in 3rd AGILE Conference on Geographic Information Science, Helsinki, Finland, 2000.
ACFR, The University of Sydney and LCR, Universidad Nacional del Sur, "PAATV/UTE Projects," Sydney, Australia, 2006.
Defence Advanced Research Projects Agency (DARPA), "DARPA Urban Challenge," http://www.darpa.mil/grandchallenge/, 2006.
Google, "Google Earth," http://earth.google.com/, 2007.
Object Localization using Stereo Vision
Sai Krishna Vuppala
Institute of Automation, University of Bremen, Germany
1 Introduction
Computer vision is the science and technology of machines that see. The theoretical explanation of the optics of the eye and the information about the existence of images formed at the rear of the eyeball were provided by Kepler and Scheiner, respectively, in the 17th century (Lin, 2002). The field of computer vision started emerging in the second half of the 20th century; since then researchers have been trying to develop methods and systems aiming at imitating the biological vision process and therefore increasing machine intelligence for many possible applications in the real world. The theoretical knowledge behind the perception of the 3D real world using multiple vision systems has been published in various literature, of which (Hartley & Zisserman, 2000), (Klette et al., 1998) and (Steger, 2007) are a few well-known examples. As there are numerous applications of computer vision in the service, industrial, surveillance, surgical and other automation sectors, researchers have been publishing numerous methods and systems that address their specific goals. Most of these researchers intend their methods and systems to be general with respect to aspects such as functionality, robustness, time effectiveness and safety. Though practical limitations hinder researchers and developers in achieving these goals, many are getting ahead, providing solutions to known and predicted problems.
The field of computer vision supports the field of robotics with many vision-based applications. In service robotics, action interpretation and object manipulation are a few examples with which computer vision supports humans. According to (Taylor & Kleeman, 2006), three things are almost certain about universal service robots of the future: many will have manipulators (and probably legs!), most will have cameras, and almost all will be called upon to grasp, lift, carry, stack or otherwise manipulate objects in our environment. Visual perception and coordination in support of robotic grasping is thus a vital area of research for the progress of universal service robots. Service robotic tasks are usually specified at a supervisory level with reference to general actions and classes of objects. An example of such a task would be: Please get 'the bottle' from 'the fridge'. Here, getting the bottle is the intended action, 'bottle' belongs to the class of objects that are manipulated, and 'fridge' belongs to the class of objects in/on which the objects to be manipulated lie. The content of the manuscript addresses object manipulation tasks using stereo vision for applications of service robotics. The motivation for the content of the manuscript stems from the needs of the service robotic systems FRIEND II and FRIEND III
(Functional Robot with dexterous arm and user-frIENdly interface for Disabled people) that are being developed at IAT (Institute of Automation, University of Bremen, Germany). The systems FRIEND II and FRIEND III are shown in figure 1 a) and b). The objects of interest to be manipulated are shown in figure 2.
Fig 1 Rehabilitation robotic systems: a) FRIEND II, b) FRIEND III.
Fig 2 The objects of interest for the manipulation purpose.
In order for robot manipulators to perform their actions autonomously, 3D environment information is necessary. Various approaches to reconstructing 3D object information using cameras have been investigated over the past few decades. According to (Lange et al., 2006), 3D object information recovery techniques are broadly classified into passive and active sensing methods. Passive sensing methods require relatively low power compared to active methods. Passive sensing methods are similar to the biological vision process: the characteristic contrast-related features in images (i.e. cues such as shading and texture) of the observed scene are used in extracting the 3D information. With active sensing methods, higher-accuracy measurements can be obtained than with passive methods. Active sensing relies on projecting energy into the environment and interpreting the modified sensory view. CMOS time-of-flight cameras can capture a complete 3D image in a single measurement cycle (Lange, 2000); however, such a technology is limited by timing resolution, and the possibility of 3D information perception depends on the object surface properties.
Considering passive sensing methods, though any number of cameras can be used in the 3D vision process, methods based on stereo vision play an optimal role considering the functional, hardware, and computational issues. Visual servoing systems (Corke, 1996; Hutchinson, 1996) come into this category. The two major classes of such systems are position- and image-based visual servoing systems. Position-based visual servoing systems are further classified into closed- and open-loop systems. Depending on the requirements of the strategy, they can be realized using a single camera or two cameras. Closed-loop systems have high accuracy since the overall system error is minimized using the feedback information, whereas in an open-loop system the error depends on the performance of the system in a single step. In contrast to the closed-loop approaches, look-then-move is an open-loop approach for autonomous task execution: the vision system identifies the target location in 3D and the robot is driven to the location of interest. In the look-then-move approach, the accuracy of the overall robotic system depends on the camera calibration accuracy and the manipulator accuracy in reaching specified locations in the real world (Garric & Devy, 1995). In a system like FRIEND II the look-then-move approach is preferred since object manipulation, collision avoidance, and path planning are planned using a standalone stereo camera unit.
Stereo triangulation is the core process in stereo-based 3D vision applications; in order to calculate a 3D point, stereo correspondence information is used. Precisely identified stereo correspondences give precise 3D reconstruction results; any uncertainty in the result of stereo correspondences can potentially yield a virtually unbounded uncertainty in the result of 3D reconstruction (Lin, 2002). Precise 3D reconstruction additionally requires well-calibrated stereo cameras, as the calibrated parameters of the stereo view geometry are used in the stereo triangulation process (Hartley & Zisserman, 2000). In addition, the 3D reconstruction accuracy depends on the length of the baseline (Gilbert et al., 2006): if the baseline is shorter the matching process is facilitated, and if the baseline is larger the 3D reconstruction accuracy is higher. Approaches for finding the stereo correspondences are broadly classified into two types (Klette et al., 1998): intensity- and feature-based matching techniques. The state-of-the-art stereo correspondence search algorithms are mostly based on various scene and geometry based constraints. These algorithms cannot be used for textureless objects, as the reconstructed object information is error-prone.
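As an illustration of the triangulation step, here is a minimal linear (DLT) triangulation sketch, assuming calibrated 3x4 projection matrices are available; this is a standard textbook formulation, not necessarily the chapter's exact implementation:

```python
import numpy as np

def triangulate(P_left, P_right, uv_left, uv_right):
    """Linear (DLT) triangulation of one 3D point from a stereo correspondence.

    P_left, P_right: 3x4 projection matrices obtained from stereo calibration.
    uv_left, uv_right: matched pixel coordinates (u, v) in the two images.
    """
    u1, v1 = uv_left
    u2, v2 = uv_right
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.vstack([u1 * P_left[2] - P_left[0],
                   v1 * P_left[2] - P_left[1],
                   u2 * P_right[2] - P_right[0],
                   v2 * P_right[2] - P_right[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # homogeneous -> Euclidean coordinates
```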
The pose (position and orientation) of an object can be estimated using the information from a single camera view or from multiple camera views. Assuming that the internal camera parameters are known, the problem of finding the object pose is nothing but finding the orientation and position of the object with respect to the camera. The problem of finding the pose of the object using image and object point correspondences is described in the literature as the perspective n-point problem. The standard perspective n-point problem can be solved using systems of linear equations if correspondences between at least six image and scene points are known. Several researchers provided solutions for this problem considering at least 3
object points. Shakunaga, in his publication on pose estimation using a single camera (Shakunaga, 1991), described that an n-vector body with n >= 3 gives at most 8 rotation candidates from object-image correspondences. In case n < 3, the recovery of the rotation of the n-vector body is not possible.
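As an illustration, the perspective n-point problem can be solved with off-the-shelf tools such as OpenCV's solvePnP; the object points, intrinsic matrix and ground-truth pose below are hypothetical, chosen only to make the sketch self-contained:

```python
import numpy as np
import cv2  # OpenCV

# Hypothetical setup: six non-coplanar object points (metres) and an assumed
# intrinsic matrix K; image points are synthesized from a known ground-truth pose.
object_pts = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
                       [0, 0, 0.1], [0.1, 0.1, 0], [0.1, 0, 0.1]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec_true = np.zeros(3)
tvec_true = np.array([0.0, 0.0, 1.0])    # object about 1 m in front of the camera
image_pts, _ = cv2.projectPoints(object_pts, rvec_true, tvec_true, K, None)

# With n >= 6 point correspondences the pose can be recovered.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)               # rotation matrix of the recovered pose
print(ok, rvec.ravel(), tvec.ravel())    # should reproduce rvec_true, tvec_true
```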
The content of the manuscript discusses the selection of stereo feature correspondences and the determination of stereo correspondences for the opted features on textureless objects in sections 2 and 3, respectively; section 4 presents tracking of the object pose using 2 object points; section 5 presents 3D object reconstruction results; the conclusion and references follow in sections 6 and 7, respectively.
2 Selection of Stereo Feature Correspondences
Scene-independent 3D object reconstruction requires information from at least two camera views. Considering stereo vision rather than multiple vision systems for this purpose eases the computational complexity of the system. Stereo vision based 3D reconstruction methods require stereo correspondence information. Traditional problems in finding the stereo correspondence are occlusion and regularity/repetitiveness. Traditionally these problems are solved using intensity- and feature-based stereo matching techniques. However, absolutely homogeneous surfaces without any sharp features (i.e. edges, corners, etc.) do not provide proper stereo correspondence information. The domestic objects to be manipulated in the FRIEND environment are examples of such a kind. As the considered object (either each part or the whole body) has uniform color information, the intensity-based correspondence search methods often provide improper stereo correspondence information. Therefore, alternatively, the edges of the green bottle are observed for the stereo correspondence analysis. Out of all possible edges of the green bottle shown in figure 3, only the orientation edges of the bottle can be considered for the 3D bottle reconstruction; this is because the stereo correspondence information between other kinds of edges is typically lost.
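For illustration only, here is a simplistic sketch of edge-based matching along the rows of a rectified stereo pair; it is not the chapter's algorithm, and the Canny thresholds, disparity range and nearest-candidate rule are all assumptions:

```python
import numpy as np
import cv2

def edge_correspondences(left_gray, right_gray, max_disparity=64):
    """Match edge pixels along the same row of a rectified stereo pair.

    Canny edges are extracted in both images; for each left edge pixel the
    nearest right edge pixel on the same row (within the disparity range)
    is taken as its correspondence.
    """
    edges_left = cv2.Canny(left_gray, 50, 150)
    edges_right = cv2.Canny(right_gray, 50, 150)
    matches = []
    for r, c in zip(*np.nonzero(edges_left)):
        lo = max(c - max_disparity, 0)
        candidates = np.nonzero(edges_right[r, lo:c + 1])[0]
        if candidates.size:
            c_right = lo + candidates[-1]   # rightmost candidate = smallest disparity
            matches.append(((r, c), (r, c_right)))
    return matches
```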
2.1 Resolutions between 3D and 2D
The 3D reconstruction accuracy depends on the accuracy of the calibration parameters, the sub-pixel accuracy of the stereo correspondences, and the length of the baseline. In addition to these factors, the 3D reconstruction accuracy further depends on the pixel size, the focal length of the camera, and the distance between the camera and the object. In the following, the feasible spatial resolution of the object surface in the projected scene is analyzed. Figure 4 illustrates the projection of the object surface onto a single image row. The feasible geometrical resolution of the object surface projected onto one row of the image plane is calculated using formula (1). Similarly, the feasible resolution of the object projection onto a column of the image plane can be calculated. The width and the height of the pictured scene can also be approximately calculated by multiplying the number of pixels along the rows and columns by the respective feasible resolutions.
Fig 3 Various possible edges of the green bottle from the FRIEND environment.
$$
\text{resolution} = \frac{p_d \cdot D}{f} \qquad (1)
$$

where $p_d$ is the pixel size, $D$ is the distance between the camera and the object, and $f$ is the focal length.
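As a quick numerical check of equation (1), a small sketch in Python (the pixel size is rounded to 4.5 µm, which reproduces the figures quoted in the example below):

```python
def feasible_resolution(pixel_size, distance, focal_length):
    """Equation (1): ground footprint of one pixel (all lengths in metres)."""
    return pixel_size * distance / focal_length

print(feasible_resolution(4.5e-6, 1.0, 4e-3))  # 0.001125 m -> 1.125 mm/column
print(feasible_resolution(4.5e-6, 1.0, 6e-3))  # 0.00075 m  -> 0.75 mm/column
```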
Fig 4 Feasible spatial resolution on the object surface in the projected scene (Klette et al., 1998).
E.g.: The SONY CCD sensor ICX204AK used in the Bumblebee stereo vision system has a pixel size of 4.54 µm x 4.54 µm, the focal length of the camera is 4 mm, and the assumed situation of the object is about 1 m from the camera. The feasible resolution of the camera along the row is then 1.125 mm/column, and the feasible resolution along the column is 1.125 mm/row. Similarly, with a higher-focal-length camera such as 6 mm, the feasible resolution becomes 0.75 mm/column. Implicitly, a decrease in feasible resolution increases the 3D reconstruction accuracy. From equation (1), the feasible resolution is proportional to the distance between the camera and the object; as the distance increases, the feasible resolution becomes coarser and the 3D reconstruction accuracy consequently decreases.