Dynamic Vision for Perception and Control of Motion - Ernst D. Dickmanns, Part 13




… the general case. Therefore, it is always recommended to take into account the best estimates for the road state and for the relative state of other vehicles.

The last three columns in Figure 11.6 will be of interest for the more advanced vision systems of the future, exploiting the full potential of the sense of vision with high resolution once sufficient computing power becomes available.

It is the big advantage of vision over radar and laser range finding that vision allows recognizing the traffic situation with good resolution and up to greater ranges if multifocal vision with active gaze control is used. This is not yet the general state of the art, since the data rates to be handled are rather high (many gigabytes per second) and their interpretation requires sophisticated software.

In the case of expectation-based, multifocal, saccadic vision (EMS vision), it has been demonstrated that, from a functional point of view, visual perception as in humans is possible; until the human performance level is achieved, however, quite a bit of development still has to be done. We will come back to this point in the final outlook.

Due to this situation, industry has decided to pick radar for obstacle detection in systems already on the market for traffic applications; laser range finding (LRF) has also been studied intensively and is being prepared for market introduction in the near future. Radar-based systems for driver assistance in cruise control have been available for a few years by now. Complementing them with vision for road and lane recognition, as well as for reduction of false alarms, has been investigated for about the same time. These combined systems will not be looked at here; the basic goal of this section is to develop and demonstrate the potential of vertebrate-type vision for use in the long run. It exploits exactly the same features as human vision does and should thus be sufficient for safe driving. Multisensor adaptive cruise control will be discussed in Section 14.6.3.

11.3.1 Feature Sets for Visual Vehicle Detection

Many different approaches have been tried for solving this problem since the late 1980s. Regensburger (1993) presents a good survey on the task "visual obstacle recognition in road traffic". In [Carlson, Eklundh 1990], an object detection method using prediction and motion parallax is investigated. In [Kuehnle 1991], the use of symmetries of contours, gray levels, and horizontal lines for obstacle detection and tracking is discussed; [Zielke et al 1993] investigates a similar approach. Other approaches are the evaluation of optical flow fields [Enkelmann 1990] and model-based techniques like the one described below [Koller et al 1993]. Solder and Graefe (1990) find road vehicles by extracting the left, right, and lower object boundaries using controlled correlation. An up-to-date survey on the topic may be found in [Masaki 1992++] or in the vision bibliography [http://iris.usc.edu/Vision-Notes/bibliography/contents.html]. Some more recent papers are [Graefe, Efenberger 1996; Kalinke et al 1998; Fleischer et al 2002; Labayrade et al 2002; Broggi et al 2004].

The main goal of the 4-D approach to dynamic machine vision from the beginning has been to take advantage of the full spatiotemporal framework for internal representation and to do as little reasoning as possible in the image plane and between frames. Instead, temporal continuity in physical space according to some model for the motion of objects is being exploited in conjunction with spatial shape rigidity in this "analysis-by-synthesis" approach.

Since a high image evaluation rate had proven more beneficial in this approach than using a wide variety of features, only edge features with adjacent average intensity values in mask regions were used when computing power was very low (see Section 5.2). With increasing computing power, homogeneously shaded blobs, corner features, and, in the long run, color and texture are being added. In any case, perturbations from the motion process, from measurements, and from data interpretation tend to change rapidly over time, so that a single image in a sequence should not be given too much weight; instead, filtering likely (maybe not very precise) results at a high rate using motion models with low eigenfrequencies has proven to be a good way to go. So, the concentration in feature extraction was on fast available ones, with the selection of those used guided by expectations and statistical data of the running recursive estimation process.

For this reason, image evaluation rates of less than about ten per second were not considered acceptable from the beginning in the early 1980s; the number of processors in the system and the workload sharing had to be adjusted such that this high evaluation rate was achievable. This was in sharp contrast to the approaches to machine vision studied by most other groups around the globe at that time. Accumulated delay times could be handled by exploiting the spatiotemporal models for compensation by prediction. These short cycle times, of course, left no great choice of features to be used. On the contrary, even simple edge detection could not be used all over the image but had to be concentrated (attention controlled!) in those regions where objects of interest for the task at hand could be expected. Once the road is known from the specific perception loop for it, "obstacles" could be only those objects in a certain volume above the road region, strictly speaking only those within and somewhat to the side of the width of the wheel tracks.

11.3.1.1 Edge Features and Adjacent Average Gray Values

Edge features are robust to changes in lighting conditions; maybe this is the reason why their extraction is widespread in biological vision systems (striate cortex). Edge features on their own have three parameters for specifying them completely: position, orientation, and the value of the extreme intensity gradient. By associating the average intensity on one side of the edge as a fourth parameter with each edge, average intensities on both sides are known, since the gradient is the difference between both sides; this allows coarse area-based information to be included in the feature.
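Such a four-parameter edge feature maps naturally onto a small record; a minimal sketch in Python with illustrative field names (not from the book):

```python
from dataclasses import dataclass

@dataclass
class EdgeFeature:
    """Four-parameter edge feature (illustrative field names).

    Storing the average intensity on one side plus the gradient is
    sufficient: the gradient is the difference between both sides.
    """
    row: float              # edge position in the image (row)
    col: float              # edge position in the image (column)
    orientation: float      # edge direction in radians
    gradient: float         # extreme intensity gradient across the edge
    side_intensity: float   # average intensity on one side of the edge

    @property
    def other_side_intensity(self) -> float:
        # coarse area-based information recovered from the edge parameters
        return self.side_intensity + self.gradient
```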

Mori and Charkari (1993) have shown that the shadow underneath a vehicle is a significant pattern for detecting vehicles; it usually is the darkest region in the environment. Combining this feature with knowledge of 3-D geometric models and 4-D dynamic scene understanding leads to a robust method for obstacle detection and tracking. [Thomanek et al 1994; Thomanek 1996] developed the first vision system capable of tracking half a dozen vehicles on highways in each hemisphere with bifocal vision, based on these facts, in closed-loop autonomous driving. This approach will be taken as a starting point for discussing more modern approaches exploiting the increase in computing power.

Figure 11.7 shows a highway scene from a wide-angle camera with one car ahead in the subject's lane. A search for horizontal edge features is performed in vertical search stripes with KRONOS masks of size 5 × 7, as indicated on the right-hand side (see Section 5.2). Due to missing computer performance in the early 1990s, the search stripes did not cover the whole image below the horizon; the evaluation cycle time was 80 ms (every second video field with the same index). Stripe width and spacing as well as mask parameters had to be adjusted according to the detection range desired. For improved resolution, there was a second camera with a telelens on the gaze-controlled platform (see Figure 1.3) with a viewing range about three times as far (and a correspondingly narrower field of view) compared to the wide-angle camera. This allowed using exactly the same feature extraction algorithms for vehicles nearby and further away (see Figure 11.22 further below).

Find lower edge of a vehicle: About 30 search stripes of 100 pixels length have been analyzed by shifting the correlation mask top-down to find close-to-horizontal edge features at extreme correlation values. Potential candidates for the dark area underneath the vehicle have to satisfy the following criteria:

• The value of the mask response (correlation magnitude) at the edge has to be above a threshold value (corrmin,uv).

• The average gray value of the trailing mask region (upper part) has to be below a threshold value (darkmin,uv).

The first bullet requires a pronounced dark-to-bright transition, and the second one eliminates areas that are too bright to stem from the shaded region underneath the vehicle; adapting these threshold values to the situation actually given is the challenge for good performance. For tanker vehicles and a low standing sun, the approach very likely does not work; in this case, the big volume above the wheels may require area-based features for robust recognition (homogeneously shaded, for example).
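A minimal sketch of this two-criterion test; the function name and signature are illustrative, while corrmin_uv and darkmin_uv correspond to the thresholds named above:

```python
def is_underbody_candidate(mask_response: float,
                           upper_region_mean_gray: float,
                           corrmin_uv: float,
                           darkmin_uv: float) -> bool:
    """Accept an edge as candidate for the dark area underneath a vehicle.

    Both criteria from the text must hold:
    1. pronounced dark-to-bright transition (strong mask response), and
    2. trailing (upper) mask region dark enough to stem from the shadow.
    In practice, both thresholds must be adapted to the actual situation.
    """
    strong_edge = abs(mask_response) >= corrmin_uv
    dark_enough = upper_region_mean_gray <= darkmin_uv
    return strong_edge and dark_enough
```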

Generate horizontal contours: Edge elements satisfying certain gestalt conditions are aggregated by applying a known algorithm for chaining. The following steps are performed, starting from the left window and ending with the right one:

1. For each edge element, search for the nearest one in the neighboring stripe and store the corresponding index if the distance to it is below a threshold value.

Figure 11.7 Detection of vehicle candidates by search of horizontal edges in vertical search stripes below the horizon (arrow: search direction): Mask parameters selected such that several stripes cover a small vehicle [Thomanek 1996]


2. Tag each edge element with the number count of previous corresponding elements (e.g., six, if the contour contains six edge elements up to this point).

3. Read the starting point Ps(ys, zs) and end point Pe(ye, ze) of each extracted contour and check the slope, whose magnitude |(ze − zs)/(ye − ys)| must be below a threshold for the contour to be accepted as close to horizontal (see Figure 11.8; a sketch of the full chaining procedure is given below).
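A compact sketch of steps 1 to 3, assuming edge positions (y, z) are already grouped per vertical search stripe; the data layout and names are illustrative, and edges that would start new chains in later stripes are ignored for brevity:

```python
import math

def chain_and_filter(stripe_edges, max_gap, max_slope):
    """Chain near-horizontal edge elements across neighboring stripes.

    stripe_edges: list over stripes, each a list of (y, z) edge positions.
    Returns contours (point lists) whose overall slope magnitude
    |(ze - zs)/(ye - ys)| stays below max_slope.
    """
    contours = []
    open_chains = [[p] for p in stripe_edges[0]] if stripe_edges else []
    for edges in stripe_edges[1:]:
        grown = []
        for chain in open_chains:
            y0, z0 = chain[-1]
            # step 1: nearest element in the neighboring stripe
            cand = min(edges, default=None,
                       key=lambda p: math.hypot(p[0] - y0, p[1] - z0))
            if cand and math.hypot(cand[0] - y0, cand[1] - z0) < max_gap:
                grown.append(chain + [cand])     # step 2: count grows by one
            else:
                contours.append(chain)           # chain terminates here
        open_chains = grown
    contours.extend(open_chains)
    # step 3: keep only close-to-horizontal contours
    accepted = []
    for c in contours:
        (ys, zs), (ye, ze) = c[0], c[-1]
        if ye != ys and abs((ze - zs) / (ye - ys)) < max_slope:
            accepted.append(c)
    return accepted
```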

If lines grow too long, they very likely stem from the shadow of a bridge or from other buildings in the vicinity; they may be tracked as the hypothesis for a new stationary object (shadow or discontinuity in surface appearance), but eliminating them altogether will do no harm to tracking moving vehicles whose speed is already recognized. Within a few cycles, these elongated lines will have moved out of the actual image. With knowledge of 3-D geometry (projection equations link row number to range), the extracted contours are examined to see whether they allow association with certain object classes: Side constraints concerning width must be satisfied; a likely height is thereby hypothesized. Contours starting from inhomogeneous areas inside the objects (i.e., bumper bar or rear window) are discarded; they lie above the lower shadow region (see Figure 11.9).

Determine lateral boundaries: Depending on the lateral position relative to the lane driven in, the vertical object boundaries are additionally extracted. This is done with an edge detector which exploits the fact that the difference in brightness between the object and the background is not constant and can even change sign; in Figure 11.10, the wheels and fender are darker than the light gray of the road, while the white body is brighter than the road.

Figure 11.8 Contour generation from edge elements observing gestalt ideas of nearness and collinearity: (1) chaining of the geometrically nearest element; (2) numbering of edge elements (each branch); (3) elimination of the shorter branch, determination of start and end point. Below an upper limit for total contour length, only the longer one is kept

Figure 11.9 Extracted horizontal edge elements: The rectangular group of features is an indication of a vehicle candidate; the lower elements (aggregated shadow region under the car) allow estimation of the range to the vehicle

For this purpose, the gradient of brightness is calculated at each position in each image row, and its absolute values are summed up over the lines of interest. The calculated distribution of correlation values has significantly large maxima at the object boundaries (lower part of the figure). The maxima of the accumulated values yield the width of the obstacle in the image; knowing range and mapping parameters, the obstacle size in the real world is initialized for recursive estimation and updated until it is stable. With clearly visible extremes as in the lower part of Figure 11.10, the object width of the real vehicle is fixed, and changes in the image are from now on used to support range estimation.

For vehicles driving in the subject's own lane, the left and right object boundaries must both be present to accept the extracted horizontal contour as representing an object. In neighboring lanes, it suffices to find a vertical boundary on the side of the vehicle adjacent to the subject's lane to prove the hypothesis of an object in connection with the lower contour. This means that in the left lane, a vertical line to the right of the lower contour has to be found, while in the right lane, one to the left has to be found for acceptance of the hypothesis of a vehicle. This allows recognition of partially occluded objects, too. The algorithm was able to detect and track up to five objects in parallel with four INMOS Transputers® T222 (16 bit) for feature extraction and one T805 (32 bit) for recursive estimation at a cycle time of 80 ms.

Figure 11.10 Determination of lateral boundaries of a vehicle by accumulation of correlation values at each position in each single row of the lower part of the body with a KRONOS mask (nw = 1; nd = large). The maxima of the accumulated values yield the width of the obstacle in the image (lower part: histogram of correlation maxima from single rows over pixel position)
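The accumulation scheme just described condenses into a few lines; a minimal NumPy sketch (names and the simple peak picking are illustrative):

```python
import numpy as np

def lateral_boundaries(body_rows: np.ndarray, num_peaks: int = 2) -> np.ndarray:
    """Estimate vertical object boundaries from selected image rows.

    body_rows: 2-D array of gray values (rows of the lower vehicle body).
    Summing absolute horizontal gradients over the rows makes the result
    insensitive to the sign of the object/background contrast, which may
    change along the body (dark wheels, bright body against the road).
    Returns column indices of the strongest accumulated maxima.
    """
    grad = np.abs(np.diff(body_rows.astype(float), axis=1))  # |gradient| per row
    accumulated = grad.sum(axis=0)        # histogram over the rows of interest
    return np.sort(np.argsort(accumulated)[-num_peaks:])
```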

Applying these methods remains a powerful tool for extracting vehicle boundaries in monochrome images, also on modern high-performance microprocessors. Adding more features, however, can make the system more versatile with respect to the type of vehicle and more robust under strong perturbations in lighting conditions.

11.3.1.2 Homogeneous Intensity Blobs

Region-based methods extracting homogeneously shaded or textured areas are of importance especially for robust recognition of large vehicles. Color recognition very much alleviates object separation in complex scenes with many objects of different colors. But even regions of homogeneous intensity shading alleviate object separation considerably (especially in connection with other features).


In Figure 11.11 the homogeneously shaded areas of the road yield the background for detecting vehicles with different intensity blobs above a dark region on the ground, stemming from the vehicle shade underneath the body. Though resolution is poor (32 pixels per mel and 128 per mask) and some artifacts normal to the search direction can be seen, relatively good hypotheses for objects are derivable from this coarse scale. Five vehicle candidates can be recognized, three of which are partially occluded. The car ahead in the same lane and the bus in the right neighboring lane are clearly visible. The truck further ahead in the subject's lane can clearly be recognized by its dark upper body. For the two cars in the left neighboring lane, resolution is too poor to recognize details; however, from the shape of the road area, the presence of two cars can be hypothesized. Low resolution allows a higher evaluation frequency for limited computing power.

Figure 11.11 Highway scene with many vehicles, analyzed with the UBM method (see Section 5.3.2.4) in vertical stripes with coarse resolution (22.42C) and aggregation of homogeneous intensity blobs (see text)

Performing the search on the coarse scale for homogeneously shaded regions in both vertical and horizontal stripes yields sharp edges in the search direction; thus, close-to-vertical blob boundaries should be taken from horizontal search results, while close-to-horizontal boundaries should be taken from vertical search results.

Figure 11.12 Highway scene similar to Figure 11.11 with more vehicles, analyzed with the UBM method in horizontal stripes (panels: reconstructed image at coarse (4×4) and fine resolution); the outer regions are treated with coarse resolution (11.44R), while the central region (within the white box), covering a larger look-ahead range above the road, is analyzed on a fine scale (11.11R) (reconstructed images, see text)

Figure 11.12 shows results from a row search with different parameters (11.44R) for another image out of the same sequence (see the bus in the right neighboring lane and the dark truck in the subject's lane). Here, however, the central part of the image, into which objects further away on the road are mapped, is analyzed at fine resolution giving full details (11.11R). This yields many more details and homogeneous intensity blobs; the reconstructed image shown can hardly be distinguished from the original image. A total of eight vehicle candidates can be recognized, six of which are partially occluded. It can easily be understood from this image that large vehicles like trucks and buses should be hypothesized from the presence of larger homogeneous areas well above an elevation of one wheel diameter from the ground. For humans, it is immediately clear that in neighboring lanes, vehicles are recognized by three wheels if no occlusion is present; the far outer front wheel will be self-occluded by the vehicle body. All wheels will be only partially visible. This fact has led to the development of parameterized wheel detectors based on features defined by regional intensity elements [Hofmann 2004].

Figure 11.13 shows the basic idea and the derivation of templates that can be adapted to wheel diameter (including range) and aspect angle in pan (small tilt angles are neglected because they enter with a cosine effect (≈ 1)); since the car body occludes a large part of the wheels, the lower part of the dark tire contrasting with the road to its sides is especially emphasized. For orthogonal and oblique views of the near side of the vehicle, usually, the inner part of the wheel contrasts with the tire around it; ellipticity is continuously adapted according to the best estimate for the relative yaw (pan) angle.

Figure 11.13 Derivation of templates for wheel recognition from coarse shape representations (octagon): (a) basic geometric parameters: width, outer and inner visible radius of tire; (b) an oblique view transforms circles into ellipses as a function of aspect angle; (c) shape approximation for templates; radii and aspect angle are parameters; (d) template masks for typically visible parts of wheels [seen from left, right, ≈ orthogonal, far side (underneath body)]. Intelligently controlled 2-D search is done based on the existing hypothesis for a vehicle body (after [Hofmann 2004])

The wheels on the near side appear in pairs, usually separated by the axle distance in the longitudinal direction, which lets the front wheel appear higher up in the image due to camera elevation above the wheel axle. There is good default knowledge available on the geometric parameters involved, so that initialization poses no challenge. Again, being overly accurate in a single image does not make sense, since averaging over time will lead to a stable (maybe a little bit noisier) result with the noise doing no harm. To support estimation of the aspect conditions, taking into account other characteristic subobjects like light groups in relation to the license plate as regional features will help.

11.3.1.3 Corner Features

This class of features is especially helpful before a good interpretation of the scene or an object has been achieved. If corner localization can be achieved precisely and consistently from frame to frame, it allows determining feature flow in both image dimensions and is thus optimally suited for tracking without image understanding. However, the challenge is that checking consistency requires some kind of understanding of the feature arrangement. Recognition of complex motion patterns of articulated bodies is very much alleviated using these features. For this reason, their extraction has received quite a bit of attention in the literature (see Section 5.3.3). Even special hardware has been developed for this purpose.

With the computing power nowadays available in general-purpose microprocessors, corner detection can be afforded as a standard component in image analysis. The unified blob-edge-corner method (UBM) treated in Section 5.3 first separates candidate regions for corners in a very simple way from those for homogeneously shaded regions and edges. Only a very small percentage of usual road images qualifies as corner candidates, depending on the planarity threshold specified (see Figures 5.23 and 5.26); this allows efficient corner detection in real time together with blobs and edges. The combination then alleviates detection of joint feature flow and object candidates: Jointly moving blobs, edges, and corners in the image plane are the best indicators of a moving object.

11.3.2 Hypothesis Generation and Initialization

The center of gravity of a jointly moving group of features tells us something about the translational motion of the object normal to the optical axis; expanding or shrinking similar feature distributions contain information on radial motion. Changing relative positions of features other than expansion or shrinking carry information on rotational motion of the object. The crucial point is the jump from 2-D feature distributions, observed over a short amount of time, to an object hypothesis in 3-D space and time.
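These cues can be read off a tracked feature group with little computation; a minimal sketch (names illustrative; rotational cues would require more than the isotropic spread used here):

```python
import numpy as np

def group_motion_cues(prev_pts: np.ndarray, curr_pts: np.ndarray):
    """Translation and expansion cues from a jointly moving feature group.

    prev_pts, curr_pts: N x 2 arrays of matched feature positions.
    - centroid shift -> translation normal to the optical axis
    - spread ratio   -> expansion/shrinking, i.e., radial (range) motion
    """
    c_prev, c_curr = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
    translation = c_curr - c_prev
    spread_prev = np.linalg.norm(prev_pts - c_prev, axis=1).mean()
    spread_curr = np.linalg.norm(curr_pts - c_curr, axis=1).mean()
    expansion = spread_curr / spread_prev   # > 1: object is approaching
    return translation, expansion
```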

11.3.2.1 Influence of Domain and Actual Situation

If one had to start from scratch without any knowledge about the domain of the actual task, the problem would hardly be solvable. Even within a known domain (like "road traffic"), the challenge is still large, since there are so many types of roads, lighting, and weather conditions; the vehicle may be stationary or moving on a smooth or on a rough surface.

It is assumed here that the human operator has checked the lighting and weather conditions and has found them acceptable for autonomous perception and operation. When observation of other vehicles is started, it is also assumed that road recognition has been initiated successfully and is working properly; this provides the system (via the DOB, see Chapters 4 and 13) with the number and widths of the lanes actually available. With GPS and digital maps onboard and working, the type of road being driven is known: unidirectional or two-way traffic, motorway or general cross-country/urban road.

The type of road determines the classes of obstacles that might be expected with certain likelihoods; the levels of likelihood may be taken into account in hypothesis generation. Pedestrians are less likely on high-speed than on urban roads. The speed actually driven and the traffic density also have an influence on this choice; for example, in a traffic jam on a freeway with very low average speed, pedestrians are more likely than in normal freeway traffic.

11.3.2.2 Three Components Required for Instantiation

In the 4-D approach, there are always three components necessary for starting perception based on recursive estimation: (1) the generic object type (class and subclass with reasonable parameter settings), (2) the aspect conditions (initial values for state components), and (3) the dynamic model as knowledge (or side constraint) of evolution over time; for subjects, this includes knowledge of (stereotypical) motion capabilities and their temporal sequence. This latter component constitutes an individual capability for animation based on onsets of maneuvers visually observed; it will be needed mainly in tracking (see Section 11.3.3). However, a passing car cutting into the vehicle's lane immediately ahead will be perceived much faster and more robustly if this motion behavior (normally not allowed) is available also during the initialization phase, which usually takes about one half to one second.

Instantiation of a generic object (3-D shape): The first step always is to establish a good range estimate to the object. If stereovision or direct range measurements are available, this information should be taken from these sources. For monocular vision, this step is done with the row index zBu of the lowest features that most likely belong to the object. Then, the first part of the following procedure is, as for static obstacles, to obtain initial values of range and bearing.
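Over flat ground, this monocular range initialization reduces to one line of pinhole geometry; a sketch under exactly these assumptions (zBu counted downward from the horizon; names illustrative):

```python
def range_from_lowest_feature(z_Bu: float, f_pix: float, H_K: float) -> float:
    """Initial monocular range estimate from the lowest object features.

    z_Bu : image row of the lowest features, in pixels below the horizon
    f_pix: focal length expressed in pixels
    H_K  : camera elevation above the flat ground in meters
           (e.g., 1.3 m for the test vehicle VaMP)
    """
    if z_Bu <= 0:
        raise ValueError("features must lie below the horizon")
    return f_pix * H_K / z_Bu
```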

With range information and the known camera parameters, the object in the image can be scaled for comparison with models in the knowledge base of 3-D objects. Homogeneously shaded regions with edges and corners moving in conjunction give an indication of the vehicle type. For example, in Figure 11.11, the car up front, the truck ahead of it (obscured in the lower part), and the bus up front to the right are easily classified correctly; the two cars in the lane to the left allow only uncertain classification due to occlusion of large parts of them. Humans may feel certain in classifying the car up front left, since they interpret the intensity blobs vertically located at the top and the center of the hypothesized car: The somewhat brighter rectangle at the top may originate from the light of the sky reflected from the curved roof of the car. The bright rectangular patch between two more quadratic ones, a little bit darker, halfway from the roof to the ground, is interpreted as a license plate between light groups at each rear side of the car.

Figure 11.12 (taken a few frames apart from Figure 11.11) shows in the inner high-resolution part that this interpretation is correct. It also can be seen, by the three bright blobs reasonably distributed over the rear surface, that the car immediately ahead is now braking (in color vision, these blobs would be bright red). The two cars in the neighboring lane beside the dark truck are also braking. (Note the different locations and partial obscuration of the braking lights on the three cars, depending on make and traffic situation.) Confining image interpretation for obstacle detection to the region marked by the white rectangle (as done in the early days) would make vehicle classification much more difficult. Therefore, both peripheral low-resolution and foveal high-resolution images in conjunction allow efficient and sufficiently precise image interpretation.

Aspect conditions: The vertical aspect angle is determined by the range and the elevation of the camera in the subject vehicle above the ground. It will differ for cars, vans, and trucks/buses. Therefore, only the aspect angle in yaw has to be derived from image evaluation. In normal traffic situations with vehicles driving in the direction of the lanes, lane recognition yields the essential input for initializing the aspect angle in yaw.

On straight roads, lane width and range to the vehicle determine the yaw aspect angle. It is large for vehicles nearby and decreases with distance. Therefore, in the right neighboring lane, only the left-hand and the rear side can be seen; in the left neighboring lane, it is the right-hand and rear side. Tires of vehicles on the left have their dark contact area to the ground on the left side of the elliptically mapped vertical wheel surface (and vice versa for the other side; see Figure 11.13d). Aspect conditions and 3-D shape are closely linked together, of course, since both in conjunction determine the feature distribution in the image after perspective projection, which is the only source available for dynamic scene understanding.

Dynamic model: The third essential component for starting recursive estimation is the process model for motion, which implements continuity conditions and knowledge about the evolution of motion over time. This temporal component was the one that allowed achieving superior performance in image sequence interpretation and autonomous driving. As mentioned before, there are two big advantages in temporal embedding:

1. Known state variables in a motion process decouple future evolution from the past (by definition); so there is no need to store previous images if all objects of relevance are represented by an individual dynamic process model. Future evolution depends only on (a) the actual state, (b) the control output applied, and (c) external perturbations. Items (b) and (c) principally are the unknowns, while best estimates for (a) are derived by visual observation exploiting a knowledge base of vehicle classes (see Chapter 3).

2. Disturbance statistics can be compiled for both process and measurement noise; knowing these characteristics allows setting up a temporal filter process that (under certain constraints) yields optimal estimates for open parameters and for the state variables in the generic process model.

3. These components together are the means by which "the outside world is transduced into an internal representation in the computer". (The corresponding question often asked in biological systems is: how does the world get into your head?) Quite a bit of background knowledge has to be available for this purpose in the computer process analyzing the data stream and recognizing "the world"; features extracted from the image sequence activate the application of proper parts of this knowledge. In this closed-loop process, resulting in control output for a real vehicle (carrying the sensors), feedback of prediction-errors shows the validity of the models used and allows adaptation for improved performance.

The dynamic models used in the early days in [Thomanek 1996] were the following (separate, decoupled models for longitudinal and lateral translation, no rotational dynamics):

Simplified longitudinal dynamics: The goal was to estimate range and range rate sufficiently well for automatic transition into, and for, convoy driving. Since the control and perturbation inputs to the vehicle observed are unknown, a third-order model with colored noise for the acceleration, as given in Equations 2.34 and 2.35, has been chosen and proven sufficient [Bar-Shalom, Fortmann 1988]. The noise term n(t) is fed into a first-order system with time constant Tc = 1/α. The discrete model then is (here Φ = A)

$$ x_{k+1} = \Phi\, x_k + q_k . $$

Following [Loffeld 1990], σq should be chosen as the maximally expected acceleration of the process observed.

Simplified lateral dynamics: Since the lateral positions of the vehicles observed have only very minor effects on the subject's control behavior, a standard second-order dynamic model with the state variables lateral position yo, relative to the subject's (its own) lane center, and lateral speed vyo is sufficient. With sampling time T, the discrete model then is

$$ \begin{pmatrix} y_o \\ v_{yo} \end{pmatrix}_{k+1} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_o \\ v_{yo} \end{pmatrix}_{k} + q_k . $$
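A minimal numerical sketch of the two decoupled discrete models and one Kalman prediction step; the third-order discretization follows the standard Singer-type form from [Bar-Shalom, Fortmann 1988], and all names are illustrative:

```python
import numpy as np

def longitudinal_Phi(T: float, alpha: float) -> np.ndarray:
    """Transition matrix of the third-order colored-noise model for the
    state (range x, range rate V, acceleration a); alpha = 1/Tc."""
    e = np.exp(-alpha * T)
    return np.array([
        [1.0, T, (alpha * T - 1.0 + e) / alpha**2],
        [0.0, 1.0, (1.0 - e) / alpha],
        [0.0, 0.0, e],
    ])

def lateral_Phi(T: float) -> np.ndarray:
    """Transition matrix of the second-order model (y_o, v_yo)."""
    return np.array([[1.0, T],
                     [0.0, 1.0]])

def predict(x: np.ndarray, P: np.ndarray,
            Phi: np.ndarray, Q: np.ndarray):
    """One Kalman prediction step with process noise covariance Q."""
    return Phi @ x, Phi @ P @ Phi.T + Q
```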

11.3.2.3 Initial State Variables for Starting Recursive Estimation

Figure 11.14 visualizes the transformation of the feature set marking the lower bound of a potential vehicle into estimated positions for the vehicles in Cartesian coordinates. In the left part of the figure, dark-to-bright edges of the dark area underneath the vehicle in a top-down search are shown. For the near range, assuming a flat surface is usually sufficient. The tangents to the local lane markings are extrapolated to a common vanishing point of the near range if the road is curved. Since convergence behavior is usually good, special care is not necessary in the general case. From the right part of the figure, the bearing angles ψi to the vehicle candidates can easily be determined. Initialization and feature selection for tracking have to take the different aspect conditions in the three lanes into account.

The aspect graph in Figure 11.15 shows the distribution of characteristic features as seen from the rear left (vehicle in front in the neighboring lane to the right). The features detected, for which correspondence can be established most easily at low computational cost, are the dark area underneath the vehicle and edge features at the vehicle corners, front left and rear right (marked by bold letters in the figure). The configuration of rear groups of lights and license plate (dotted rectangle) and the characteristic set of wheel parts are the features detectable and most easily recognizable with additional area-based methods.

Figure 11.14 Transformation of the image row, in which the lower edge of the dark region underneath the vehicle appears, and of lateral position into Cartesian coordinates, based on camera elevation above the ground assumed to be flat

Getting good estimates for the velocity components needed for each second-order dynamic model is much harder. Again, trusting in good convergence behavior leads to the easy solution, in which all velocity components are initialized with zeros. Faster convergence may be achieved if an approximate estimation of the speed components can be achieved in the initial observation period of a few cycles; this is especially true if the corresponding elements of the measurement covariance are set large and the system covariance is set low (high confidence in the correctness of the model).

Figure 11.15 Aspect conditions determine feature sets to be extracted for tracking (aspect graph of a single vehicle, aspect hypothesis instantiated: straight behind; labeled features include the left front wheel, left and right rear wheels, left and right rear groups of lights, licence plate, elliptical central blob, the dark area underneath the car with its edges, and the dark tire below the body line; view classes FL, SL, RL, SB, RR, SR, FR). On the same road in normal traffic, road curvature, distance, and the lane position relative to the subject's own lane are the most essential parameters; traffic moving in the same or in the opposite direction exhibits rear/front parts of vehicles. On crossroads, views from the side predominate. The situation shown is typical for passing a vehicle in right-hand traffic

11.3.2.4 Measurement Model and Jacobian Elements

There are two essentially independent motion processes in the models given: "longitudinal" states (xo, Vo) and "lateral" states (yo, vyo) of the vehicle observed. Pitching motion of the vehicle has not yet been taken into account. However, if the sensors (cameras) have no degree of freedom for counteracting vehicle motion in pitch, this motion will affect visual measurement results appreciably (see Section 7.3.4). Depending on acceleration and deceleration, pitch angles of several degrees (≈ 0.05 rad) are not uncommon; at 70 m distance, this value corresponds to a height change in the real world of ≈ 3.5 m for a point in the same row of the image. Rough ground may easily introduce pitch vibrations with amplitudes around 0.25° (≈ 0.005 rad); at the same distance of 70 m, this corresponds to a 35 cm height change, or changes in look-ahead distances on flat ground in the range of 10 to 20 m (around 25 %). At shorter look-ahead distances, this sensitivity is much reduced; for example, at 20 m, the same vibration amplitude of 0.25° leads to look-ahead changes of only about 1.3 m (6.5 %) for the test vehicle VaMP with camera elevation HK = 1.3 m above the ground; larger elevations reduce this sensitivity.
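These numbers follow from the small-angle relation between look-ahead distance L, pitch deviation Δθ, and apparent height change Δh; as a check:

$$ \Delta h \approx L\,\Delta\theta: \qquad 70\,\mathrm{m} \times 0.05\,\mathrm{rad} = 3.5\,\mathrm{m}; \qquad 70\,\mathrm{m} \times 0.005\,\mathrm{rad} \approx 0.35\,\mathrm{m}. $$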

Of course, this sensitivity enters range estimation directly according to Figure 11.14. A sensitivity analysis of Equation 7.19 (with distance ρ instead of Lf) shows the range changes (∂ρ/∂θ) as a function of the pitch angle θ and (∂ρ/∂zB) as a function of the image row zB. For initialization, the pitch angle may be assumed to be the same as determined nearby at distance Ln, and the initial value xo computed using the pinhole camera model.

The measurement model for the width of the vehicle is given by Equation 11.3; for each single vertical edge feature, the elements of the Jacobian matrix are derived from the perspective projection equations.
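For a pinhole camera with focal length f, the lateral image position is yBo ≈ f·yo/xo, so the two elements discussed next would take the form (a reconstruction under this standard model, not necessarily the book's exact notation):

$$ \frac{\partial y_{Bo}}{\partial y_o} = \frac{f}{x_o}, \qquad \frac{\partial y_{Bo}}{\partial x_o} = -\frac{f\,y_o}{x_o^{2}} . $$

The second element is smaller than the first by the factor yo/xo << 1, which is why range changes can be neglected when predicting the lateral feature position.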


The second equation indicates that changes in range, Δxo, can be approximately neglected for predicting the lateral feature position, since yo and Δxo << xo, and range is not updated from the prediction-error ΔyBo (no direct cross-coupling between the longitudinal and lateral models is necessary). However, since the lateral position yo in the first equation is updated by inverting the Jacobian element (∂yBo/∂yo), small prediction-errors ΔyBo in the feature position in the image will lead to large increments Δyo. Note that this sensitivity results from taking only the camera coordinates as reference (xo). Determining yoL relative to the local road or lane lets the range xo cancel out, and the lateral position of the vehicle in its lane can be estimated with much less noise corruption.

11.3.2.5 Statistical Parameters for Recursive Estimation

The covariance matrices Q of the system models are required as knowledge about the process observed to achieve good convergence in recursive estimation. The covariance matrix of the longitudinal model has been given as Equation 2.38; for the lateral model, one similarly obtains a corresponding 2 × 2 matrix. A detailed discussion of this filter design may be found in [Thomanek 1996].

dis-The statistical parameters for image evaluation determine the measurement

co-variance matrix R Errors in row and column evaluation are assumed to be

uncorre-lated Since lateral speed is not measured directly but only reconstructed from the model, the matrix R can be reduced to a scalar r with

11.3.2.6 Falsification Strategies for Hypothesis Pruning

Computing power available in the early 1990s allowed putting up just one object hypothesis for each set of features found as a candidate. The increase in computational resources by two to three orders of magnitude in the meantime (and even more in the future) allows putting up several likely object hypotheses in parallel. This reduces the delay time until a stable interpretation and a corresponding internal representation have been achieved.


The early jump to full spatiotemporal object hypotheses in connection with more detailed models for object classes has the advantage that it taps into the knowledge bases with characteristic image features and motion models without running the risk of a combinatorial feature explosion, as in a pure bottom-up approach putting much emphasis on generating "the most likely single hypothesis". Each hypothesis allows predicting new characteristic features, which can then be tested in the next image, taking into account temporal changes already predictable. Those hypotheses with a high rate and good quality of feature matches are preferred over the others, which will be deleted only after a few cycles. Of course, it is possible that two (or even more) hypotheses continue to exist in parallel. Increasingly more features considered in parallel will eventually allow a final decision; otherwise, the object will be published in the DOB with certain parameters recognized but others still open.

An example is a trailer observed from the rear driving at low speed; whether the towing vehicle is a truck capable of driving at speeds up to, say, 80 km/h or an agricultural tractor with a maximal speed of, say, 40 km/h cannot be decided until an oblique view in a tighter curve or during a lane change is possible; the length of the total vehicle is also unknown. This information is essential for planning a passing maneuver in the future. The oblique view uncovers a host of new features which easily allow answering the open questions. Moving laterally in one's own lane is a maneuver often used for uncovering new features of vehicles ahead or for discovering the reason for unexpectedly slow moving traffic.

Once the tracking process is running for an instantiated object, the bottom-up detection process will rediscover its features independent of possible predictions. Therefore, the main task is to establish correspondence between the features predicted and those newly extracted. A Mahalanobis distance with the matrix Λ for proper weighting of the contributions of different features of a contour is one way to go.

Let the predicted contour of a vehicle be c*, and the measured one c. Its position in the image depends on (at least) three physical parameters: distance x, lateral position y, and vehicle width B. From the best estimates for these parameters, c* is computed for the predicted states at the time of the next measurement. The prediction-errors c − c* are taken to evaluate the set of features of a contour, minimizing the distance

$$ d^2 = (c - c^*)^{T}\, \Lambda^{-1}\, (c - c^*) . $$
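A minimal sketch of this weighted correspondence test (names are illustrative; a gate threshold would in practice come from the chi-square statistics of the residuals):

```python
import numpy as np

def mahalanobis_match(c_measured: np.ndarray, c_predicted: np.ndarray,
                      Lambda: np.ndarray, gate: float):
    """Gate a measured contour c against its prediction c*.

    Lambda weights the contributions of the different contour features.
    Returns (accepted, squared distance); hypotheses whose features
    repeatedly fail the gate would be pruned after a few cycles.
    """
    r = c_measured - c_predicted                 # prediction errors c - c*
    d2 = float(r @ np.linalg.solve(Lambda, r))   # (c - c*)^T Lambda^{-1} (c - c*)
    return d2 <= gate, d2
```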

It is the combination of robust, simple feature extraction and high-level spatiotemporal models with frequent bottom-up and top-down traversal of the representation hierarchy that provides the basis for efficient dynamic vision. In this context, time and motion in conjunction with knowledge about spatiotemporal processes constitute an efficient hypothesis-pruning device.

If both shape and motion state have to be determined simultaneously [Schick 1992], an interference problem may occur, trading shape variations against aspect conditions; these problems have just been tackled, and it is too early to make general statements on favorable ways to proceed. But again, observing both spatial rigidity and (dynamic) time constraints yields the best prospects for solving this difficult task efficiently.
