Detecting Objects and Estimating Pose with Multi-Resolution Surfel Maps

Part of the thesis Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception (pages 136–140)

Tracking requires an initial guess of the object pose. In many applications, however, the object's pose is not known a priori and needs to be estimated from the images. We adapt a state-of-the-art approach to object detection and pose estimation in point clouds to our MRSMap framework.

Our object detection method is based on the surfel-pair voting algorithm proposed by Drost et al. (2010), which has recently been extended to RGB-D images with color by Choi and Christensen (2012a). Our contribution is a pose voting

6.3. Object Detection and Real-Time Tracking

Figure 6.3.: Surfel-pairs, features, and constructed reference frames. Left: We describe geometry, luminance, and color between two surfels by distance, angular relations of normals, and luminance and color contrasts. Right: A surfel-pair defines a unique pose in the map frame by aligning the normal of the reference surfel sr with the x-axis of the map frame and rotating the paired surfel si by an angle α around the x-axis onto the half-plane spanned by the x- and positive y-direction.

scheme that utilizes surfel-pairs at multiple resolutions in varying local neighborhoods. The aim of object detection and pose estimation is to find an object in an RGB-D image by aligning an MRSMap $m_s$ of the image with an object model MRSMap $m_m$.

6.3.1.1. Local Colored Surfel-Pair Relations at Multiple Resolutions

As in Drost et al. (2010), we describe the relation between a pair of surfels $f(s_r, s_i) := (f_s(s_r, s_i), f_c(s_r, s_i))$ with the geometric descriptors

$$f_s(s_r, s_i) := \left( \left\| \mu_{p,r} - \mu_{p,i} \right\|_2,\ \angle\!\left(n_r,\, \mu_{p,r} - \mu_{p,i}\right),\ \angle\!\left(n_i,\, \mu_{p,r} - \mu_{p,i}\right),\ \angle(n_r, n_i) \right) \qquad (6.29)$$

that measure distance and angles between means and normals (see Fig. 6.3, left). In addition, we incorporate color by the three luminance and chrominance contrasts

$$f_c(s_r, s_i) := \left( \mu_{L,r} - \mu_{L,i},\ \mu_{\alpha,r} - \mu_{\alpha,i},\ \mu_{\beta,r} - \mu_{\beta,i} \right). \qquad (6.30)$$

In contrast to the approach of Drost et al. (2010), we only consider surfel-pairings for a reference surfel $s_r$ in a local neighborhood around the surfel. The

Figure 6.4.: Surfel-pair voting. Each association of surfel-pairs between scene and model votes for a 6-DoF camera pose relative to the model in a two-dimensional Hough space.

radius $r_\rho = \lambda_r\, \rho(s_r)^{-1}$ of the neighborhood is set in relation to the surfel's resolution. We also neglect surfel-pairs with similar normals, luminance, and chrominances to avoid ambiguous pose voting from planar, textureless regions.
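The descriptor of Eqs. (6.29)–(6.30) and the ambiguity filter can be sketched as follows. This is a minimal illustration, not the thesis implementation: surfels are assumed to be given by mean position, unit normal, and mean Lαβ color, and the threshold values are illustrative assumptions.

```python
import numpy as np

def pair_descriptor(mu_r, n_r, lab_r, mu_i, n_i, lab_i):
    """Surfel-pair descriptor of Eqs. (6.29)-(6.30): distance and angles
    between means and normals, plus luminance/chrominance contrasts."""
    d = mu_r - mu_i
    dist = np.linalg.norm(d)

    def angle(a, b):
        # numerically safe angle between two vectors
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return np.arccos(np.clip(np.dot(a, b) / (na * nb), -1.0, 1.0))

    f_s = (dist, angle(n_r, d), angle(n_i, d), angle(n_r, n_i))
    f_c = tuple(lab_r - lab_i)  # (L, alpha, beta) contrasts
    return f_s, f_c

def is_ambiguous(f_s, f_c, ang_eps=0.1, col_eps=0.05):
    """Neglect pairs with similar normals, luminance, and chrominances,
    which would vote ambiguously from planar, textureless regions."""
    return f_s[3] < ang_eps and all(abs(c) < col_eps for c in f_c)
```

A coplanar, identically colored pair (parallel normals, zero contrasts) is rejected by `is_ambiguous`, since it could have been generated anywhere on a uniform plane.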

6.3.1.2. Multi-Resolution Pose Voting

A surfel-pair defines a unique coordinate frame through the normal direction of the reference surfel and the difference between the means, as long as the difference is not parallel to the normal, which is unlikely to happen in practice. This frame is used to define the pose of the surfel-pair relative to the reference frame of a map. We follow the approach of Drost et al. (2010) and decompose this pose into a transformation $T^g_{s_r}$ that moves the mean $\mu_r$ of the reference surfel into the map origin and aligns its normal $n_r$ with the map x-axis (see Fig. 6.3, right).

A final rotation around the x-axis by the angle $\alpha(s_r, s_i)$ moves the paired surfel mean $\mu_i$ into the half-plane spanned by the x- and y-axes with positive y-values.

If we decompose the pose in this way, all pairings of the reference surfel share the same transformation $T^g_{s_r}$ and only differ in the angle $\alpha$.
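A minimal construction of $T^g_{s_r}$ and the angle $\alpha(s_r, s_i)$ might look as follows; the function names are ours, and Rodrigues' formula is used to align the normal with the x-axis.

```python
import numpy as np

def align_with_x(n):
    """Rotation R with R @ n = e_x, for a unit normal n (Rodrigues' formula)."""
    e_x = np.array([1.0, 0.0, 0.0])
    v, c = np.cross(n, e_x), np.dot(n, e_x)
    if np.linalg.norm(v) < 1e-9:                  # n (anti-)parallel to e_x
        return np.eye(3) if c > 0 else np.diag([-1.0, 1.0, -1.0])
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def canonical_transform(mu_r, n_r):
    """T^g_{s_r}: moves mu_r into the map origin, aligns n_r with the x-axis."""
    T = np.eye(4)
    T[:3, :3] = align_with_x(n_r)
    T[:3, 3] = -T[:3, :3] @ mu_r
    return T

def pair_angle(T_g, mu_i):
    """Angle alpha(s_r, s_i): rotating the transformed paired mean by this
    angle around the x-axis places it in the half-plane with positive y."""
    p = T_g[:3, :3] @ mu_i + T_g[:3, 3]
    return -np.arctan2(p[2], p[1])
```

All pairings of one reference surfel reuse the same `canonical_transform(mu_r, n_r)` and only recompute `pair_angle`, mirroring the decomposition above.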

From a correct match of surfel-pairs between two maps we are able to estimate the pose difference between the maps. Let $(s_{s,r}, s_{s,i})$ and $(s_{m,r}, s_{m,i})$ be two matching surfel-pairs in scene and model map, respectively. The pose difference $T^m_s$ between the map reference frames can be determined from

$$
\begin{aligned}
T^m_s &= \left( R_x(\alpha(s_{m,r}, s_{m,i}))\, T^g_{s_{m,r}} \right)^{-1} R_x(\alpha(s_{s,r}, s_{s,i}))\, T^g_{s_{s,r}} \\
      &= \left( T^g_{s_{m,r}} \right)^{-1} R_x(\alpha)\, T^g_{s_{s,r}}
\end{aligned}
\qquad (6.31)
$$


with $\alpha = \alpha(s_{s,r}, s_{s,i}) - \alpha(s_{m,r}, s_{m,i})$.
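Eq. (6.31) amounts to composing the canonical transforms of the two matched pairs. The following sketch (our own helper names) also lets one check the simplification numerically, since $(R_x(\alpha_m)\, T)^{-1} R_x(\alpha_s) = T^{-1} R_x(\alpha_s - \alpha_m)$ holds because $R_x$ is a one-parameter rotation group.

```python
import numpy as np

def Rx(alpha):
    """Homogeneous rotation about the x-axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    T = np.eye(4)
    T[1:3, 1:3] = [[c, -s], [s, c]]
    return T

def pose_from_match(T_g_m, alpha_m, T_g_s, alpha_s):
    """Pose difference T^m_s of Eq. (6.31) from one surfel-pair match:
    (R_x(alpha_m) T^g_m)^{-1} R_x(alpha_s) T^g_s."""
    return np.linalg.inv(Rx(alpha_m) @ T_g_m) @ Rx(alpha_s) @ T_g_s
```

For any rigid transforms the first and second line of Eq. (6.31) agree, which is what makes a single scalar $\alpha$ per match sufficient for voting.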

For object detection problems, however, correct matches of surfel-pairs between scene and model map are not known a priori, but need to be estimated along with the object pose. Drost et al. (2010) propose a Hough voting scheme in which surfel-pairs are matched according to their geometric descriptor and cast votes for the object pose. For efficient matching, hash keys are determined from the descriptors that map surfel-pairs into a hash table. The descriptors of the surfel-pairs are quantized into a number of bins per dimension to form the keys.
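A hash key in this spirit can be obtained by quantizing each descriptor dimension; the bin widths below are illustrative assumptions, not the thesis' settings.

```python
from collections import defaultdict
import numpy as np

def hash_key(f_s, f_c, d_bin=0.05, ang_bin=np.deg2rad(12.0), col_bin=0.1):
    """Quantize the 7-D surfel-pair descriptor into a tuple usable as a
    hash-table key for efficient matching."""
    dist, a1, a2, a3 = f_s
    return (int(dist // d_bin),
            int(a1 // ang_bin), int(a2 // ang_bin), int(a3 // ang_bin),
            *(int(c // col_bin) for c in f_c))

# Model surfel-pairs are stored under their keys; scene pairs retrieve
# candidate matches with a single look-up.
table = defaultdict(list)
table[hash_key((0.12, 0.4, 1.0, 0.9), (0.33, 0.04, -0.17))].append(("r", "alpha"))
```

Similar descriptors fall into the same bucket, so matching degenerates to a constant-time look-up instead of a nearest-neighbor search over all model pairs.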

From a matching of surfel-pairs, a potential object pose is determined by the index $r$ of the matched model reference surfel $s_{m,r}$ and the angle $\alpha = \alpha(s_{s,r}, s_{s,i}) - \alpha(s_{m,r}, s_{m,i})$ that aligns the surfel-pairs (see Fig. 6.4). Hough voting is efficiently performed in this two-dimensional pose space. Each model reference surfel is considered individually in the Hough space, while the angles $\alpha$ are discretized into a number of bins. To increase the precision of the Hough procedure, we attribute a continuous angle estimate for a surfel match to the two closest angle bins.

We process scene reference surfels per available resolution and, to achieve fast run-time, sample a fraction of the scene reference surfels uniformly without replacement. Pose votes are accumulated separately in a Hough space for each scene reference surfel $s_{s,r}$. The local surfel-pairings of the reference surfel $s_{s,r}$ with other scene surfels $s_{s,i}$ are matched with model surfel-pairs $(s_{m,r}, s_{m,i})$ via their descriptors through efficient hash map look-ups. Multiple matchings may be retrieved for the scene pair. Each matching votes for a pose in the two-dimensional Hough space. After all pairings for the scene reference surfel $s_{s,r}$ have been processed, the bin that accumulated the most pose votes is determined and a pose hypothesis is extracted. We also include pose hypotheses from Hough space bins whose vote count lies within a fraction of the maximum. Each pose hypothesis is assigned a score that corresponds to its accumulated votes.

In order to find the most consistent pose hypotheses across all scene reference surfels, we merge the pose hypotheses using agglomerative clustering with a threshold on the linear and angular distance of the poses. Since agglomerative clustering depends on the ordering of the pose hypotheses, we sort the hypotheses by their scores in descending order. The algorithm finally returns the top $C$ clusters with the highest accumulated scores of pose hypotheses.
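A greedy agglomerative merging with such thresholds could be sketched as follows; hypotheses are given as (translation, rotation matrix, score), and the threshold values are illustrative.

```python
import numpy as np

def cluster_hypotheses(hyps, t_lin=0.05, t_ang=np.deg2rad(15.0), C=3):
    """Merge (t, R, score) pose hypotheses: processed by score in
    descending order, each hypothesis joins the first cluster whose
    representative lies within the linear and angular thresholds,
    adding its score; the top C clusters by accumulated score remain."""
    hyps = sorted(hyps, key=lambda h: h[2], reverse=True)
    clusters = []                      # [t_rep, R_rep, accumulated score]
    for t, R, s in hyps:
        for cl in clusters:
            # geodesic angle between the two rotations
            cos_a = np.clip((np.trace(cl[1].T @ R) - 1.0) / 2.0, -1.0, 1.0)
            if (np.linalg.norm(cl[0] - t) < t_lin
                    and np.arccos(cos_a) < t_ang):
                cl[2] += s
                break
        else:
            clusters.append([t.copy(), R.copy(), float(s)])
    clusters.sort(key=lambda cl: cl[2], reverse=True)
    return clusters[:C]
```

Sorting by score first makes the highest-scoring hypothesis of each mode the cluster representative, so the dependence on input ordering noted above is resolved deterministically.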

6.3.1.3. Pose Verification

The pose hypotheses produced by our voting method are only coarse estimates of the object pose. Moreover, the voting method only considers positive information for matching: it does not check whether a pose hypothesis would place parts of the model in front of the actual measurements, i.e., whether measurements would be seen through the model. We therefore perform a pose verification step to increase the rate of retrieving correct hypotheses.

Each pose hypothesis is refined by registering the scene map towards the model, starting from its pose estimate, using a few iterations of LM registration (see Sec. 3.2.2). We determine the matching likelihood of scene to model map for the optimized poses according to Sec. 6.2.1.1 and reorder the pose hypotheses by their matching likelihood.
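Schematically, the verification stage reduces to refining and re-scoring. In the sketch below, `refine_pose` stands in for a few LM registration iterations (Sec. 3.2.2) and `matching_likelihood` for the observation likelihood of Sec. 6.2.1.1; both are assumed callbacks for illustration, not the thesis' API.

```python
def verify_hypotheses(poses, refine_pose, matching_likelihood):
    """Refine each coarse pose hypothesis, then reorder all hypotheses
    by the matching likelihood of scene to model under the refined pose."""
    refined = [refine_pose(T) for T in poses]
    return sorted(refined, key=matching_likelihood, reverse=True)
```

The best-verified hypothesis can then seed the tracker of the preceding section as its initial pose guess.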

