Hierarchical Object Discovery and Dense Modelling

We demonstrate and evaluate our approach to hierarchical object discovery and dense modelling on two RGB-D sequences. The first sequence contains two independently moving chairs observed by a static camera. In the second sequence, the camera moves throughout the recording and observes a container with drawers. The container is moved with respect to the static scene background, before one of the drawers is pulled open.

Both sequences have been recorded with an Asus Xtion Pro Live RGB-D camera. Ground-truth motion could not be captured with a motion capture system, as the optical markers would have been occluded during the recordings. For reference, the accuracy of our motion segmentation and SLAM methods has been assessed in Chapters 4 and 6. For the MRSMaps we use a distance dependency factor of λρ = 0.014 at a maximum resolution of 0.025 m. All formulae but F0 have been weighted with wF = 1. For F0 we use a weight of 10 to increase the influence of these evidence relations. The lower bound for the overlap was chosen as ρ0 = 0.5. Finally, we accept relations as valid if their belief exceeds a threshold of 0.8.
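For reference, these settings can be summarized in a single parameter structure. The following sketch is purely illustrative; the field names are hypothetical and do not correspond to our implementation.

```python
from dataclasses import dataclass

@dataclass
class DiscoveryParams:
    """Hypothetical parameter bundle; field names are illustrative only."""
    lambda_rho: float = 0.014       # distance dependency factor of the MRSMaps
    max_resolution: float = 0.025   # maximum map resolution in meters
    w_formula: float = 1.0          # weight of all MLN formulae except F0
    w_f0: float = 10.0              # increased weight of the evidence formula F0
    rho_0: float = 0.5              # lower bound on segment overlap
    belief_threshold: float = 0.8   # minimum belief to accept a relation as valid

params = DiscoveryParams()
```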

7.3.1.1. Chairs Sequence

Fig. 7.10 shows the sequence of the 14 key views extracted in the chairs sequence.

In addition, we show the 34 segmentations made between pairs of key views. It can be seen that many out-of-sequence segmentations between key view pairs are established. They occur most frequently where one chair stops moving while the other chair is pushed.

All valid relations between segments and objects found by our approach are shown in Fig. 7.11. At the end of the sequence, the MLN consists of 7,466 formulae. To keep the graph structure comprehensible, we do not display relations with a belief below a threshold of 0.8. In the graph, the 5 objects cluster those segments that are in equivalence relations with each object. Many relations between segments are incorporated by relating out-of-sequence segmentations, which are visible as smaller loops in the segment relations.
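To illustrate how segments cluster at objects, the following sketch filters relations by the belief threshold and groups segments by the object they are in an equivalence relation with. The data layout, identifiers, and example values are assumptions for illustration only, not our actual implementation.

```python
from collections import defaultdict

def valid_relations(relations, threshold=0.8):
    """Keep only relations whose inferred belief exceeds the acceptance threshold."""
    return [r for r in relations if r["belief"] > threshold]

def cluster_segments_by_object(relations, threshold=0.8):
    """Group segment ids by the object they are in an equivalence relation with."""
    clusters = defaultdict(set)
    for r in valid_relations(relations, threshold):
        if r["type"] == "equivalence":
            clusters[r["object"]].add(r["segment"])
    return dict(clusters)

# Made-up example: two segments cluster at object "o1"; the part relation does not.
example = [
    {"type": "equivalence", "segment": "s3", "object": "o1", "belief": 0.93},
    {"type": "equivalence", "segment": "s7", "object": "o1", "belief": 0.88},
    {"type": "part",        "segment": "s2", "object": "o1", "belief": 0.85},
]
print(cluster_segments_by_object(example))  # {'o1': {'s3', 's7'}}
```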

Figs. 7.12 and 7.13 show representative SLAM graphs of two of the found objects. Out-of-sequence relations also produce loops in the pose graphs. Each view pose is attributed with multiple segmentations of the same key view towards different other key views. While the right chair in Fig. 7.12 is only seen by equivalent segments, the pose graph of the object that subsumes the left chair and the background is more complex (Fig. 7.13). It not only has view poses for the segments that see the complete object, but also for segments that partially observe it.

Figure 7.10.: Extracted key views and segmentations on the chairs sequence.

Red arrows depict the temporal sequence of the key views. Black arrows point from segmented to connected key view.

Figure 7.11.: Graph of valid relations on the chairs sequence. Blue/cyan: part-relation, red/magenta: equivalence relation, cyan/magenta: segment-object evidence relation.

Figure 7.12.: SLAM graph of one object (black circle) on the chairs sequence.

The view poses are shown as red circles, their interior displays the key view corresponding to the view pose. Spatial constraints in the pose graph are shown as red edges.

Figure 7.13.: SLAM graph of one object (black circle) on the chairs sequence. The view poses are shown as red circles, their interior displays the key view corresponding to the view pose. Spatial constraints in the pose graph are shown as red edges.

If a segment in a key view represents only parts of the object, it appears in a different view pose node than segments of the complete object within the same key view. This is necessary since the parts move differently between the segmented key views and create different spatial constraints.
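A minimal sketch of this bookkeeping is given below. It assumes that a view pose node is keyed by the key view and the group of mutually equivalent segments; the class and identifier names are hypothetical and do not show our actual data structures.

```python
class ObjectPoseGraph:
    """Illustrative pose graph in which a view pose node is keyed by the key view
    and the group of mutually equivalent segments observing the object in it."""

    def __init__(self):
        self.nodes = {}   # (key_view_id, segment_group_id) -> pose estimate
        self.edges = []   # spatial constraints between view pose nodes

    def add_view_pose(self, key_view_id, segment_group_id, pose):
        # A partial segment gets its own node, even if the key view already
        # has a node for segments that observe the complete object.
        self.nodes.setdefault((key_view_id, segment_group_id), pose)

    def add_constraint(self, key_a, key_b, relative_pose):
        self.edges.append((key_a, key_b, relative_pose))
```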

The resulting objects and the hierarchical relations between them are shown in Fig. 7.14. Our method finds the left and the right chair as well as the background segment. It also includes two objects that are composed of the background and either one of the chairs. We display the objects by overlaying the RGB-D measurements of their segments from their estimated view poses. We use segments that are in both part and equivalence relations with the objects.
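Conceptually, this overlay amounts to transforming the points of each contributing segment by the view pose estimated in the object's SLAM graph and concatenating them into one model cloud. The sketch below assumes 4x4 homogeneous pose matrices and N x 3 point arrays; it is illustrative only and does not show our MRSMap-based implementation.

```python
import numpy as np

def overlay_segments(segment_points, view_poses):
    """Merge segment point clouds into one object model.
    segment_points: dict mapping segment id -> (N, 3) array in camera coordinates.
    view_poses:     dict mapping segment id -> 4x4 pose (camera to object frame)."""
    clouds = []
    for seg_id, points in segment_points.items():
        T = view_poses[seg_id]
        hom = np.hstack([points, np.ones((points.shape[0], 1))])
        clouds.append((hom @ T.T)[:, :3])  # transform points into the object frame
    return np.vstack(clouds)
```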

The hierarchy reflects which segment splits and merges have been observed.

Between key views, often one chair has been moving with respect to the background and the other chair. Both chairs could also be observed to move simultaneously with respect to the background. Our approach correctly recognizes that the background segment is part of the two objects that combine the background with either one of the chairs.

7.3.1.2. Container Sequence

The container sequence is more difficult than the chairs sequence. The camera is moving during the recording such that naive background subtraction would not be possible. The objects, furthermore, have to be singularized in a three-level hierarchy from drawer to container to background. Finally, large parts of the drawer are occluded while the container is closed, in which case only the front panel of the drawer is visible.

Fig. 7.15 shows the 6 key views and 20 segmentations used by our approach.

As the motion of the objects is on a smaller scale, key views are related with large temporal gaps between them. Our approach finds 4 objects in the sequence. In addition to the singularized objects background, container, and drawer, it also finds an object that combines background, container, and drawer. This is caused by the sequence of split and merge events: the container is observed static with the background while the drawer is moving. The valid relations inferred are shown in Fig. 7.16. The MLN has 4,460 formulae after the last frame. As in the chairs sequence, segments cluster at the objects with which they are equivalent.

Figs. 7.17 and 7.18 visualize the object SLAM graphs for the background and the drawer objects. For the background, all segments observe the object equivalently. Hence, each key view that is segmented for the background is included once as a view pose in the pose graph. This is also the case for the drawer for most of the key views. For one key view, however, the whole visible part of the drawer as well as the front panel alone are segmented. While the two segments are determined not to be equivalent by their overlap, the front panel is observed as part of the whole drawer. Thereby, a spatial constraint from the front panel to the whole drawer is included in the pose graph.

Figure 7.15.: Extracted key views and segmentations on the container sequence.

Red arrows depict the temporal sequence of the key views. Black arrows point from segmented to connected key view.

Figure 7.17.: SLAM graph of one object (black circle) on the container sequence.

The view poses are shown as red circles, their interior displays the key view corresponding to the view pose. Spatial constraints in the pose graph are shown as red edges.

Figure 7.18.: SLAM graph of one object (black circle) on the container sequence.

The view poses are shown as red circles, their interior displays the key view corresponding to the view pose. Spatial constraints in the pose graph are shown as red edges.

Figure 7.19.: SLAM graph of one object (black circle) on the container sequence.

The view poses are shown as red circles, their interior displays the key view corresponding to the view pose. Spatial constraints in the pose graph are shown as red edges.

Figure 7.20.: Discovered objects (black circles) and valid part-relations (blue arrows, point from part to containing object) on the container sequence.

Figure 7.21.: Our method makes the drawer inside the container explicit. The drawer segments are part of the shown container object (left). They are not in equivalence relations with the object (right).

sequence     tracking   key view    out-of-sequence   belief         pruning   pose graph   total
                        addition    relation          propagation              update
chairs       0.139      1.025       0.010             0.127          0.0002    0.001        0.301
             (0.257)    (2.042)     (0.876)           (6.597)        (0.001)   (0.003)      (7.626)
container    0.208      0.921       0.017             0.092          0.0002    0.0001       0.329
             (0.298)    (1.444)     (1.143)           (12.268)       (0.002)   (0.004)      (13.652)

Table 7.1.: Average (maximum) run-time in seconds for the individual parts of the processing pipeline of our hierarchical object discovery and dense modelling approach.

Remarkably, although these individual segments are not directly equivalent through overlap, our probabilistic reasoning approach recognizes them as equivalent to the drawer. The front panel overlaps to a large degree in both directions with many other segments, and those segments have strong evidence of being equivalent with the drawer.
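This chain of evidence can be made concrete with the overlap criterion: mutual overlap above the bound ρ0 supports an equivalence relation, whereas one-sided overlap only supports a part relation. The following sketch illustrates this rule; the function, its inputs, and the example values are assumptions for illustration and do not reproduce the actual MLN formulae.

```python
RHO_0 = 0.5  # lower bound on overlap used in our experiments

def overlap_evidence(overlap_ab, overlap_ba, rho_0=RHO_0):
    """Classify the evidence two segments a and b provide about each other.
    overlap_ab: fraction of a covered by b; overlap_ba: fraction of b covered by a."""
    if overlap_ab >= rho_0 and overlap_ba >= rho_0:
        return "equivalence"      # both segments observe the same object extent
    if overlap_ab >= rho_0:
        return "a part of b"      # a is contained in b
    if overlap_ba >= rho_0:
        return "b part of a"
    return "no evidence"

# Made-up values for front panel (a) vs. whole drawer segment (b):
print(overlap_evidence(0.9, 0.3))  # 'a part of b'
```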

Fig. 7.19 shows the object SLAM graph of the container. It is more complex than the graphs of the drawer and the background, as it also includes view poses for segments in part-relations with the container.

The discovered hierarchy between the objects can be seen in Fig. 7.20. Our approach correctly discovers that the drawer is part of the container, which in turn moves separately with respect to the background. All objects are part of the combined object of background, container, and drawer. Fig. 7.21 shows that our approach makes the drawer explicit as a part inside the container.

Figure 7.22.: Cognitive service robot Cosero manipulates an unknown watering can during the Open Challenge at RoboCup 2013.

7.3.1.3. Run-Time

The run-time of our approach on both sequences is shown in Table 7.1. Keeping track of the current image's segmentation with respect to the reference key view requires run-time similar to the timing results in Ch. 4. Instantiating a new key view v_t involves two segmentations S_t^{t-1} and S_{t-1}^t that are run for several iterations until convergence. On average, this takes 1.025 s on the chairs and 0.921 s on the container sequence. In each frame, we search for one new out-of-sequence relation. If a new relation is included, one segmentation has to be determined and the relations between segments and objects need to be updated.

On average, this amounts to 0.01 s and 0.017 s, respectively. The maximum times of 0.876 s and 1.143 s occur if a relation is established. We only search for out-of-sequence relations if no key view is added for the current image. MLN inference is also efficient on average. It can, however, take many iterations and several seconds to converge if ambiguous evidence needs to be balanced. Pruning objects and object SLAM graphs as well as updating the object SLAM graphs with new relations costs negligible time. On average, the total run-time is governed by the time required for tracking and belief propagation. If new key views or out-of-sequence relations are added, or if relational information is ambiguous, the run-time can peak up to a few seconds. In our current approach, both the estimation of new segmentations and belief propagation are run until convergence. Such peaks are infrequent and their magnitude is low; in future work, they could be avoided by distributing the computational load from a single image over multiple subsequent images.
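For completeness, the timed stages correspond roughly to the per-frame loop sketched below. The stage functions are passed in as callables since their names and interfaces are assumptions, not our actual implementation.

```python
def process_frame(frame, state, stages):
    """Illustrative per-frame loop matching the stages timed in Table 7.1.
    'stages' is a dict of callables; the names are placeholders only."""
    stages["track"](frame, state)                       # tracking against the reference key view
    if stages["needs_new_key_view"](frame, state):
        stages["add_key_view"](frame, state)            # two segmentations, run until convergence
    else:
        stages["search_out_of_sequence"](frame, state)  # at most one new relation per image
    stages["belief_propagation"](state)                 # MLN inference, run until convergence
    stages["prune"](state)                              # prune objects and object SLAM graphs
    stages["update_pose_graphs"](state)                 # incorporate new spatial constraints
```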
