Where the difference is largest in amplitude, the gradient over two consecutive mask elements is maximal. However, due to local perturbations, this need not correspond to an actual extreme gradient on the scale of interest. Experience with images from natural environments has shown that two additional parameters may considerably improve the results obtained:

1. By allowing a yet to be specified number n0 of entries in the mask center to be dropped, the results achieved may be more robust. This can be immediately appreciated when taking into account that either the actual edge direction may deviate from the mask orientation used or the edge is not straight but curved; by setting central elements of the mask to zero, the extreme intensity gradient becomes more pronounced. The rest of Figure 5.10 shows typical mask parameters with n0 = 1 for masks three and five pixels in depth (md = 3 or 5), with n0 = 2 for md = 8, as well as n0 = 3 for md = 17 (rows b, c).
2. Local perturbations are suppressed by assigning to the mask a significant depth nd, which designates the number of pixels along the search path in each row or column in each positive and negative field. The total mask depth then is md = 2·nd + n0. Figure 5.10 shows the corresponding mask schemes. In line (b) a rather large mask for finding the transition between relatively large homogeneous areas with ragged boundaries is given (md = 17 pixels wide and each field with seven elements, so that the correlation value is formed from large averages; for a mask width nw of 17 pixels, the correlation value is formed from 7·17 = 119 pixels). With the number of zero values in between chosen as n0 = 3, the total receptive field (= mask) size is 17·17 = 289 pixels. The sum formed from nd mask elements (vector values "ColSum") divided by (nw·nd) represents the average intensity value in the oblique image region adjacent to the edge. At the maximum correlation value found, this is the average gray value on one side of the edge. This information may be used for recognizing a specific edge feature in consecutive images or for grouping edges in a scene context.

For larger mask depths, when shifting the mask along the search direction, it is more efficient to subtract the last mask element (ColSum value) from the summed field intensities and to add the next one at the front in the search direction [see line (c) in Figure 5.10]; the number of operations needed is much lower than for summing all ColSum elements anew in each field.
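This bookkeeping is easy to express in code. The following is a minimal Python/NumPy sketch, not the original implementation; the function name and interface are chosen here for illustration. It slides a ternary (nd, n0, nd) mask along a precomputed ColSum vector and updates both field sums with one subtraction and one addition each per step.

```python
import numpy as np

def mask_responses(colsum, nd, n0):
    """Slide a ternary (nd, n0, nd) mask along a ColSum vector.

    colsum: 1-D array; each entry is the sum of nw pixels normal
    to the search direction (a "ColSum" value).  Returns the raw
    correlation value (positive field minus negative field) for
    every valid mask position; normalization by nw*nd, if desired,
    can be applied afterward.
    """
    colsum = np.asarray(colsum, dtype=float)
    md = 2 * nd + n0                       # total mask depth
    n_pos = len(colsum) - md + 1           # number of valid positions
    responses = np.empty(n_pos)

    # initial field sums: negative field, n0 zeros, positive field
    neg = colsum[:nd].sum()
    pos = colsum[nd + n0:md].sum()
    responses[0] = pos - neg

    for k in range(1, n_pos):
        # incremental shift: drop the trailing element of each field,
        # add the leading one (two operations per field and step)
        neg += colsum[k + nd - 1] - colsum[k - 1]
        pos += colsum[k + md - 1] - colsum[k + nd + n0 - 1]
        responses[k] = pos - neg
    return responses
```

Which field carries the plus sign is a convention; the sign of the response then encodes the direction of the intensity transition.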
The optimal values of these additional mask parameters nd and n0, as well as the mask width nw, depend on the scene at hand and are considered knowledge gained by experience in visually similar environments.
Figure 5.10. Efficient mask evaluation with the "ColSum" vector; the nd values given are typical for sizes of "receptive fields" formed. [Figure labels: masks characterized by (nd, n0, nd); (a) md = 2, 3, 5, and 8; (b) md = 17 (total mask depth), fields (7, 3, 7); (c) incremental shifting of the mask.]
From these considerations, generic edge extraction mask sets for specific problems have resulted. In Figure 5.11, some representative receptive fields for different tasks are given. The mask parameters can be changed from one video frame to the next, allowing easy adaptation to changing scenes observed continuously, like driving on a curved road.
The large mask in the center top of Figure 5.11 may be used on dirt roads in the near region with ragged transitions from road to shoulder. For sharp, pronounced edges like well-kept lane markings, a receptive field like that in the upper right corner (probably with nd = 2, that is, md = 5) will be most efficient. The further one looks ahead, the more the mask width nw should be reduced (9 or 5 pixels); part (c) in the lower center shows a typical mask for edges on the right-hand side of a straight road further away (smaller and oblique to the right).

The 5 × 5 (2, 1, 2) mask at the left-hand side of Figure 5.11 has been the standard mask for initial detection of other vehicles and obstacles on the road through horizontal edges; collections of horizontal edge elements are good indicators for objects torn by gravity to the road surface. Additional masks are then applied for checking the object hypotheses formed.

If narrow lines like lane markings have to be detected, there is an optimal mask width depending on the width of the line in the image: If the mask depth nd chosen is too large, the line will be low-pass filtered and extreme gradients lose in magnitude; if the mask depth is too small, sensitivity to noise increases.

As an optional step, while adding up pixel values for mask elements ("ColSum") or while forming the receptive fields, the extreme intensity values of pixels in ColSum and of each ColSum vector component (max and min) may be determined. The former gives an indication of the validity of averaging (when the extreme values are not too far apart), while the latter may be used for automatically adjusting threshold parameters. In natural environments, in addition, this gives an indication of the contrasts in the scene. These are some of the general environmental parameters to be collected in parallel (right-hand part of Figure 5.1).
Col-Figure 5.11 Examples of receptive fields and search paths for efficient edge feature
ex-traction; mask parameters can be changed from one video-frame to the next, allowing easy adaptation to changing scenes observed continuously
Rec
eptive
fi
dofmask :
Search path center
ient
ation
+ 0
-nd=1
Edge
orie
ation
md= 2·nd+ n0 =14 + 3 = 17 md = 3
For fuzzy large scale edge
For sharp, nounced edge
5.2.2 Search Paths and Subpixel Accuracy
The masks defined in the previous section are applied to rectangular search ranges to find all possible candidates for an edge in these ranges. The smaller these search ranges can be kept, the more efficient the overall algorithm is going to be. If the high-level interpretation via recursive estimation is stable and good information on the variances is available, the search region for specific features may be confined to the 3σ region around the predicted value, which is usually not very large (σ = standard deviation). It does not make sense first to perform the image processing part in a large search region fixed in advance and afterward sort out the features according to the variance criterion. In order not to destabilize the tracking process, prediction errors > 3σ are considered outliers and are usually removed when they appear for the first time in a sequence.
Figure 5.6 shows an example of edge localization with a ternary mask of size nw = 17, nd = 2, and n0 = 1 (i.e., mask depth md = 5). The mask response is close to zero when the region to which it is applied is close to homogeneously gray (irrespective of the gray value); this is an important design factor for abating sensitivity to light levels. It means that the plus and minus regions have to be of the same size. The lower part of the figure shows the resulting correlation values (mask responses), which form the basis for determining edge location. If the image areas within each field of the mask are homogeneous, the response is maximal at the location of the edge. With different light levels, only the magnitude of the extreme value changes but not its location. Highly discernible extreme values are obtained also for neighboring mask orientations. The larger the parameter n0, the less pronounced is the extreme value in the search direction, and the more tolerant it is to deviations in angle. These robustness aspects make the method well suited for natural outdoor scenes.

Search directions (horizontal or vertical) are automatically chosen depending on the feature orientation specified. The horizontal search direction is used for mask orientations between 45 and 135° as well as between 225 and 315°; vertical search is applied for mask directions between 135 and 225° and between 315 and 45°. To avoid too frequent switching between search directions, a hysteresis (dead zone of about one direction increment for the larger mask widths) is often used; that means switching is actually performed (automatically) 6 to 11° beyond the diagonal lines, depending on the direction from which these are approached.
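A small sketch of this switching logic; the function and parameter names are chosen here for illustration, and the 8° dead zone is just one value within the 6 to 11° range quoted above:

```python
def search_direction(angle_deg, prev_dir, dead_zone_deg=8.0):
    """Choose the search direction for a given mask orientation.

    Horizontal search serves orientations of 45..135 (and 225..315)
    degrees, vertical search the rest; near the diagonals the
    previous direction is kept until the angle is dead_zone_deg
    beyond them, to avoid frequent toggling.  prev_dir: 'h' or 'v'.
    """
    a = angle_deg % 180.0            # orientations repeat every 180 degrees
    if prev_dir == 'h':
        keep = (45.0 - dead_zone_deg) <= a <= (135.0 + dead_zone_deg)
        return 'h' if keep else 'v'
    keep = a <= (45.0 + dead_zone_deg) or a >= (135.0 - dead_zone_deg)
    return 'v' if keep else 'h'
```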
5.2.2.1 Subpixel Accuracy by Second-Order Interpolation
Experience with several interpolation schemes, taking up to two correlation values on each side of the extreme value into account, has shown that the simple second-order parabola interpolation is the most cost-effective and robust solution (Figure 5.12). Just the neighboring correlation values around a peak serve as a basis.

If an extreme value of the magnitude of the mask response above the threshold level (see Figure 5.6) has been found by stating that the new value is smaller than the old one, the last three values are used to find the interpolating parabola of second order. Its extreme value yields the position yextr of the edge to subpixel accuracy and the corresponding magnitude Cextr; this position is obtained at the location where the derivative of the parabolic function is zero. Designating the largest correlation value found as C0 at pixel position 0, the previous one Cm at −1, and the last correlation value Cp at position +1 (which indicated that there is an extreme value by its magnitude Cp < C0), the interpolating parabola C(y) = a·y² + b·y + C0 has the coefficients

a = (Cm + Cp)/2 − C0,    b = (Cp − Cm)/2,                        (5.1)

and its extreme value lies at

yextr = −b/(2·a) = (Cm − Cp) / [2·(Cm + Cp − 2·C0)],    Cextr = C0 − b²/(4·a).    (5.2)

From the last expressions of Equations 5.1 and 5.2, it is seen that the interpolated value lies on the side of C0 on which the neighboring correlation value measured is larger. Experience with real-world scenes has shown that subpixel accuracy in the range of 0.3 to 0.1 pixel may be achieved.
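In code, the interpolation of Equations 5.1 and 5.2 costs only a few arithmetic operations per detected peak. A minimal sketch with illustrative names:

```python
def subpixel_peak(c_m, c_0, c_p):
    """Second-order interpolation around a correlation peak.

    c_m, c_0, c_p: mask responses at positions -1, 0, +1, with c_0
    the largest magnitude.  Returns the subpixel offset y_extr from
    position 0 (in the open interval -1..+1) and the interpolated
    extreme value c_extr (Equations 5.1 and 5.2).
    """
    a = 0.5 * (c_m + c_p) - c_0      # parabola curvature (a < 0 at a maximum)
    b = 0.5 * (c_p - c_m)            # parabola slope at position 0
    y_extr = -b / (2.0 * a)
    c_extr = c_0 - b * b / (4.0 * a)
    return y_extr, c_extr
```

For example, c_m = 60, c_0 = 100, c_p = 80 yields y_extr ≈ +0.17, on the side of the larger neighbor, and c_extr ≈ 100.8.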
5.2.2.2 Position and Direction of an Optimal Edge
Determining precise edge direction by applying, additionally, the two neighboring mask orientations in the same search path and performing a bivariate interpolation has been investigated, but the results were rather disappointing. Precise edge direction can be determined more reliably by exploiting results from three neighboring search paths with the same mask direction (see Figure 5.13). The central edge position to subpixel accuracy yields the position of the tangent point, while the tangent direction is determined from the straight line connecting the positions of the (equidistant) neighboring edge points; this is the result of a parabolic interpolation for the three points.

Figure 5.13. Determination of the tangent direction of a slightly curved edge by subpixel localization of edge points in three neighboring search paths and parabolic interpolation.

Figure 5.12. Subpixel edge localization by parabolic interpolation after passing a maximum in mask response.
Once it is known that the edge is curved – because the edge point at the center does not lie on the straight line connecting the neighboring edge points – the question arises whether the amount of curvature can also be determined with little effort (at least approximately). This is the case.
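A sketch of the scheme of Figure 5.13, assuming the three subpixel edge positions in the equidistant neighboring search paths have already been determined (names are illustrative):

```python
import numpy as np

def tangent_from_three_paths(p_minus, p_center, p_plus):
    """Tangent point and direction from three parallel search paths.

    p_minus, p_center, p_plus: subpixel edge positions (x, y) found
    in three equidistant search paths.  The central point is the
    tangent point; the tangent direction is that of the straight
    line through the two neighbors, the result of the parabolic
    interpolation described above.
    """
    p_minus = np.asarray(p_minus, dtype=float)
    p_center = np.asarray(p_center, dtype=float)
    p_plus = np.asarray(p_plus, dtype=float)

    chord = p_plus - p_minus
    tangent = chord / np.linalg.norm(chord)          # unit tangent direction
    deviation = p_center - 0.5 * (p_minus + p_plus)  # zero for a straight edge
    return p_center, tangent, deviation
```

A nonzero deviation of the central point from the chord midpoint is exactly the curvature hint exploited next.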
5.2.2.3 Approximate Determination of Edge Curvature
When applying a series of equidistant search stripes to an image region, the method of the previous section yields, for each point on the edge, also the corresponding edge direction, that is, its tangent. Two points and two slopes determine the coefficients of a third-order polynomial, dubbed Hermite interpolation after a French mathematician. As a third-order curve, it can have at most one inflection point. Taking the connecting line (dash-dotted in Figure 5.14) between the two tangent points P−d and P+d as reference (chord line or secant), a simple linear relationship for a smooth curve with small angles ψ relative to the chord line can be derived. Tangent directions are used in differential-geometry terms, yielding a linear curvature model; the reference is the slope of the straight line connecting the tangent points (secant). Let m−d and m+d be the slopes of the tangents at points P−d and P+d, respectively; s be the running variable in the direction of the arc (edge line); and ψ the angle between the local tangent and the chord direction (|ψ| < 0.2 radian, so that cos(ψ) ≈ 1).

The linear curvature model in differential-geometry terms, with s as the running variable along the arc from x ≈ −d to x ≈ +d, is

dψ/ds = C(s) = C0 + C1·s.                                        (5.3)

Since curvature is a second-order concept with respect to Cartesian coordinates, the lateral position y results from a second integral of the curvature model. With the origin at the center of the chord, x in the direction of the chord, y normal to it, and ψ−d = arctan(m−d) ≈ m−d as the angle between the tangent and chord directions at point P−d, the equation describing the curved arc then is given by Equation 5.4 below [with ψ in the range ± 0.2 radian (~ 11°), the cosine can be approximated by 1 and the sine by the argument ψ]:
y(x) ≈ y0 + ψ0·x + (C0/2)·x² + (C1/6)·x³.                        (5.4)

Imposing y(−d) = y(+d) = 0 at the chord ends and matching the measured tangent slopes m−d and m+d yields the coefficients

ψ0 = −0.25·(m−d + m+d),      y0 = −C0·d²/2,
C0 = (m+d − m−d)/(2·d),      C1 = 3·(m−d + m+d)/(2·d²).          (5.5)

Equation 5.5 yields the curvature parameters from the measured tangent slopes and the distance (2·d) between the tangent points. Of course, this distance has to be chosen such that the angle constraint (|ψ| < 0.2 radian) is not violated. On smooth curves, this is always possible; however, for large curvatures, the distance d allowed becomes small, and the scale for measuring edge locations and tangent directions probably has to be adapted. Very sharp curves have to be isolated and jumped over as "corners" having large directional changes over small arc lengths. In an idealized but simple scheme, they can be approximated by a Dirac impulse in curvature with a finite change in direction over zero arc length.
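The coefficients of Equation 5.5 translate directly into code; a minimal sketch under the small-angle assumption (|ψ| < 0.2 radian), with illustrative names:

```python
def curvature_from_tangents(m_minus, m_plus, d):
    """Linear curvature model C(s) = C0 + C1*s from two tangents.

    m_minus, m_plus: tangent slopes relative to the chord at the
    points P-d and P+d; 2*d is the distance between the tangent
    points (Equation 5.5).
    """
    c0 = (m_plus - m_minus) / (2.0 * d)           # curvature at chord center
    c1 = 3.0 * (m_minus + m_plus) / (2.0 * d**2)  # curvature rate along the arc
    psi0 = -0.25 * (m_minus + m_plus)             # tangent angle at x = 0
    y0 = -0.5 * c0 * d**2                         # lateral offset at x = 0
    return c0, c1, psi0, y0
```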
Due to the differencing process unavoidable for curvature determination, the results tend to be noisy. When basic properties of the objects recognized are known, a post-processing step for noise reduction exploiting this knowledge should be included.

Remark: The special advantage of subscale resolution for dynamic vision lies in the fact that the onset of changes in motion behavior may be detected earlier, yielding better tracking performance, crucial for some applications. The aperture problem inherent in edge tracking will be revisited in Section 9.5 after the basic tracking problem has been discussed.
5.2.3 Edge Candidate Selection
Usually, due to image noise, there are many insignificant extreme values in the resulting correlation vector, as can be seen in Figure 5.6. Positioning the threshold properly (and selecting the mask parameters in general) depends very much on the scene at hand. As may be seen in Figure 5.15, due to shadow boundaries and scene noise, the largest gradient values may not be those looked for in the task context (road boundary). Colinearity conditions (or even edge elements on a smoothly curved line) may be needed for proper feature selection; therefore, threshold selection in the feature extraction step should not eliminate these candidates.
Figure 5.14. Approximate determination of the curvature of a slightly curved edge by subpixel localization of edge points and tangent directions: Hermite interpolation of a third-order polynomial from two tangent points.
Figure 5.15. The challenge of edge feature selection in road scenes: Good decisions can be made only by resorting to higher level knowledge. Road scenes with shadows (and texture); extreme correlation values marking road boundaries may not be the absolutely largest ones.

Depending on the situation, these parameters have to be specified by the user (now) or by a knowledge-based component on the higher system levels of a more mature version. Average intensity levels and intensity ranges resulting from region-based methods (see Section 5.3) will yield information for the latter case.
As a service to the user, in the code CRONOS, the extreme values found in one function call may be listed according to their correlation values; the user can specify how many candidates he wants presented at most in the function call. As the extreme value of the search, either the pixel position with the largest mask response may be chosen (the simplest case, with large measurement noise), or several neighboring correlation values may be taken into account, allowing interpolation.
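A sketch of such a candidate listing; the actual function-call interface of CRONOS is not reproduced here, and names and defaults are illustrative:

```python
import numpy as np

def edge_candidates(responses, threshold, max_candidates=5):
    """List local extrema of the mask-response vector as candidates.

    Keeps every local maximum of |response| above threshold and
    returns at most max_candidates index positions, ordered by
    correlation magnitude (largest first).
    """
    r = np.abs(np.asarray(responses, dtype=float))
    peaks = [k for k in range(1, len(r) - 1)
             if r[k] >= r[k - 1] and r[k] > r[k + 1] and r[k] > threshold]
    peaks.sort(key=lambda k: r[k], reverse=True)
    return peaks[:max_candidates]
```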
5.2.4 Template Scaling as a Function of the Overall “Gestalt”
An additional degree of freedom available to the designer of a vision system is the focal length of the camera for scaling the image size of an object to its distance in the scene. To analyze as many details as possible of an object of interest, one tends to assume that a focal length which lets the object (in its largest dimension) just fill the image would be optimal. This may be the case for a static scene being observed from a stationary camera. If either the object observed or the vehicle carrying the camera or both can move, there should be some room left for searching and tracking over time. Generously granting an additional space of the actual size of the object to each side results in the requirement that perspective mapping (focal length) should be adjusted so that the major object dimension in the image is about one third of the image. This leaves some regions in the image for recognizing the environment of the object, which again may be useful in a task context.

To discover essential shape details of an object, the smallest edge element template should not be larger than about one-tenth of the largest object dimension. This yields the requirement that the size of an object in the image to be analyzed in some detail should be about 20 to 30 pixels. However, due to the poor angular resolution of masks with a size of three pixels, a factor of 2 (60 pixels) seems more comfortable. This leads to the requirement that objects in an image must be larger than about 150 pixels. Keep in mind that objects imaged with a size (region) of only about a half dozen pixels can still be noticed (discovered and roughly tracked); however, due to spurious details from discrete mapping (rectangular pixel size) into the sensor array, no meaningful shape analysis can be performed.

This has been a heuristic discussion of the effects of object size on shape recognition. A more operational consideration, based on straight edge template matching and coordinate-free differential-geometry shape representation by piecewise functions with linear curvature models, is to follow.
A lower limit to the support region required for achieving an accuracy of about one-tenth of a pixel in tangent position and about 1° in tangent direction (order of magnitude) by subpixel resolution is about eight to ten pixels. The efficient scheme given in [Dickmanns 1985] for accurately determining the curvature parameters is limited to a smooth change in the tangent direction of about 20 to 25°; for recovering a circle (360°), this means that about nelef ≈ 15 to 18 elemental edge features have to be measured. Since the ratio of circumference to diameter is π for a circle, the smallest circle satisfying these conditions for non-overlapping support regions is nelef times the mask size (8 to 10 pixels) divided by π. This yields a required size of about 40 to 60 pixels in linear extension of an object in an image. Since corners (points of finite direction change) can be included as curvature impulses measurable by adjacent tangent directions, the smallest (horizontally aligned) measurable square is ten pixels wide, while the diagonal is about 14 pixels; more irregularly shaped objects with concavities require a larger number of tangent measurements. The convex hull and its dimensions give the smallest size measurable in units of the support region. Fine internal structures may be lost.

From these considerations, for accurate shape analysis down to the percent range, the image of the object should in general be between 20 and 100 pixels in linear extension. This fits well in the template size range from 3 (or 5) to 17 (or 33) pixels. Usual image sizes of several hundred lines allow the presence of several well-recognizable objects in each image; other scales of resolution may require different focal lengths for imaging (from microscopy to far-ranging telescopes).
Template scaling for line detection: Finally, choosing the right scale for detecting (thin) lines will be discussed using a real example [Hofmann 2004]. Figure 5.16 shows results for an obliquely imaged lane marking which appears 16 pixels wide in the search direction (top: image section searched, width nw = 9 pixels). Summing up the mask elements in the edge direction corresponds to rectifying the image stripe, as shown below in the figure; however, only one intensity value remains per position, so that for the rest of the pixel operations with different mask sizes in the search direction, about one order of magnitude in efficiency is gained. All five masks investigated, (a) to (e), rely on the same "ColSum" vector; depending on the depth of the masks, the valid search ranges are reduced (see double arrows at bottom).
Figure 5.16. Optimal mask size for line recognition: For general scaling, mask size should be scaled by the line width (= 16 pixels here).
The averaged intensity profile of the mask elements is given in the vertical center (around 90 for the road and ~130 for the lane marker); the lane marking clearly sticks out. Curve (e) shows the mask response for the mask of highest possible resolution (1, 0, 1); see legend. It can be seen that the edge is correctly detected with respect to location, but due to the smaller extreme value, sensitivity to noise is higher than that for the other masks. All other masks have been chosen with n0 = 3 for reducing sensitivity to slightly different edge directions, including curved edges. In practical terms, this means that the three central values under the mask shifted over the ColSum vector need not be touched; only the nd values to the left and to the right need be summed.

Depth values for the two fields of the mask of nd = 4, 8, and 16 (curves a, b, c) yield the same gradient values and edge location; the mask response widens with increasing field depth. By scaling the field depth nd of the mask by the width lw of the line to be detected, the curves can be generalized to scaled masks of depths nd/lw = ¼, ½, and 1. Case (d) shows with nd/lw = 21/16 = 1.3 that for field depths larger than the line width, the maximal gradient decreases and the edge is localized at a wrong position. So, the field depth selected should always be smaller than the width of the line to be detected. The number of zeros at the center should be less than the field depth, probably less than half that value for larger masks; values between 1 and 3 have shown good results for nd up to 7. For the detection of dirt roads with jagged edges and homogeneous intensity values on and off the road, large n0 are favorable.
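These sizing rules can be condensed into a small heuristic. The sketch below encodes them as just stated; the concrete clipping values are illustrative choices, not prescribed by the text:

```python
def line_mask_parameters(line_width_px, n0_max=3):
    """Heuristic ternary mask sizing for a line of known image width.

    Encodes the rules discussed above: the field depth nd stays
    below the line width, and the number of central zeros n0 stays
    below the field depth (about half of it for larger masks).
    """
    nd = max(1, line_width_px - 1)          # nd < line width
    n0 = min(n0_max, max(1, nd // 2))       # n0 well below nd
    md = 2 * nd + n0                        # total mask depth
    return nd, n0, md
```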
5.3 The Unified Blob-edge-corner Method (UBM)
The approach discussed above for detecting edge features of single (sub-)objects based on receptive fields (masks) has been generalized to a feature extraction method for characterizing image regions and general image properties by oriented edges, homogeneously shaded areas, and nonhomogeneous areas with corners and texture. For characterizing textures by their statistical properties of image intensities in real time (certain types of textures), more computing power is needed; this has to be added in the future. In an even more general approach, stripe directions could be defined in any orientation, and color could be added as a new feature space. For efficiency reasons, only horizontal and vertical stripes in intensity images are considered here, for which only one matrix index and the gray values vary at a time. To achieve reusability of intermediate results, stripe widths are confined to even numbers and are decomposed into two half-stripes.
5.3.1 Segmentation of Stripes through Corners, Edges, and Blobs
In this image evaluation method, the goal is to start from as few assumptions on intensity distributions as possible. Since pixel noise is an important factor in outdoor environments, some kind of smoothing has to be taken into account, however. This is done by fitting models with planar intensity distribution to local pixel values if they exhibit some smoothness conditions; otherwise, the region will be characterized as nonhomogeneous. Surprisingly, it has turned out that the planarity check for local intensity distribution itself constitutes a nice feature for region segmentation.
5.3.1.1 Stripe Selection and Decomposition into Elementary Blocks
The field size for the least-squares fit of a planar pixel-intensity model is (2·m) × (2·n) and is called the "model support region" or mask region. For reusability of intermediate results in computation, this support region is subdivided into basic (elementary) image regions (called mask elements or briefly "mels") that can be defined by two numbers: the number of pixels in the row direction m, and the number of pixels in the column direction n. In Figure 5.17, m has been selected as 4 and n as 2; the total stripe width for row search thus is 4 pixels. For m = n = 1, the highest possible image resolution will be obtained; however, strong influence of noise on the pixel level may show up in the results in this case.

When working with video fields (sub-images with only odd or even row indices, as is often done in practical applications), it makes sense for horizontal stripes to choose m = 2n; this yields averaging of pixels at least in the row direction for n = 1. Rendering these mels as squares finally yields the original rectangular image shape with half the original full-frame resolution. By shifting stripe evaluation by only half the stripe width, all intermediate pixel results in one half-stripe can be reused directly in the next stripe by just changing sign (see below). The price to be paid for this convenience is that the results obtained have to be represented at the center point of the support region, which is exactly at pixel boundaries. However, since subpixel accuracy is looked for anyway, this is of no concern.
Figure 5.17. Stripe definition (row = horizontal, column = vertical) for the (multiple) feature extractor UBM in a pixel grid; mask elements (mels) are defined as basic rectangular units for fitting a planar intensity model. [Figure labels: stripes C1 to C4, each split into half-stripes (numbered 1 to 6); mel size n = 2, m = 4; center points of mel regions; values in the half-stripes of stripe 2 are stored for reuse in stripe 3; image region evaluated (mask) with fields left (L) and right (R) and gradient direction in search direction row (R); a mask without gradient direction is marked as nonhomogeneous; rows are evaluated top-down, columns from left to right.]
Still open is the question of how to proceed within a stripe. Figure 5.17 suggests taking steps equal to the width of mels; this covers all pixels in the stripe direction once and is very efficient. However, shifting mels by just 1 pixel in the stripe direction yields smoother (low-pass filtered) results [Hofmann 2004]. For larger mel lengths, intermediate computational results can be used as shown in Figure 5.18. This corresponds to the use of ColSum in the method CRONOS (see Figures 5.9 and 5.10). The new summed value for the next mel can be obtained by subtracting the value of the last column and adding the one of the next column [(j−2) and (j+2) in the example shown, bottom row in Figure 5.18].

Figure 5.18. Mask elements (mels) for efficient computation of gradients and average intensities: resulting cell structure (step 1); incremental computation of cell values for cells with larger extension in stripe direction.
For the vertical search direction, image evaluation progresses top-down within the stripe and from left to right in the sequence of stripes. Shifting of stripes is always done by mel size m or n (width of a half-stripe), while shifting of masks in the search direction can be specified from 1 to m or n (see Figure 5.19b below); the latter number, m or n, means pure block evaluation, however, with only coarse resolution. This yields the lowest possible computational load, with all pixels used just once in one mel. For objects in the near range, this may still be sufficient for tracking.

The goal was to obtain an algorithm allowing easy adaptation to limited computing power; since high resolution is generally required in only a relatively small part of images of outdoor scenes, only this region needs to be treated with more finely tuned parameters (see Figure 5.37 below). Specifying a rectangular region of special interest by its upper left and lower right corners, this sub-area can be precisely evaluated in a separate step. If no view stabilization is available, the decision for the corner points may even be based on actual evaluation results with coarse resolution. The initial analysis with coarse resolution guarantees that only the most promising subregions of the image are selected despite angular perturbations stemming from motion of the subject body, which shifts the inertially fixed scene around in the image. This attention focusing avoids unnecessary details in regions of less concern.
Figure 5.19 shows the definitions necessary for performing efficient multiple-scale feature evaluation. The left part (a) shows the specification of masks of different sizes (with mel sizes from 1×1 to 4×2 and 4×4, i.e., two pyramid stages). Note that the center of a pixel or of mels does not coincide with the origin O of the masks, which is for all masks at (0, 0). The mask origin is always defined as the point where all four quadrants (mels) meet. The computation of the average intensities in each mel (I12, I11, I21, I22 in quadrants Q1 to Q4) is performed with the reference point at (0.5, 0.5), the center of the first pixel nearest to the mask origin in the most recent mel; this yields a constant offset for all mask sizes when rendering pixel intensities from symbolic representations. For computing gradients, of course, the real mel centers shown in quadrant Q4 have to be used.
Figure 5.19. With the reference points chosen here for the mask and the average image intensities in quadrants Qi, fusing results from different scales becomes simple; (a) basic definitions of mask elements (masks with mels 4×4, 3×3, and 2×2 in a row/column pixel grid), (b) progressive image analysis within stripes and with sequences of stripes (shown here for rows; mask in stripe Ri+1 at index k+1).
The reconstruction of image intensities from the results of one stripe is done for the central part of the mask (± half the width of the mask element normal to the search direction). This is shown in the right part (b) of the figure by different shading. It shows (low-frequency) shifting of the stripe position by n = 2 (index i) and (high-frequency) shifting of the mask position in the search direction by 1 (index k). Following this strategy in both row and column directions will yield nice low-pass-filtered results for the corresponding edges.
5.3.1.2 Reduction of the Pixel Stripe to a Vector with Attributes
The first step is to sum up all n pixel or cell values in the direction of the width of the half-stripe (lower part in Figure 5.18). This reduces the half-stripe for search to a vector, irrespective of the stripe width specified. It is represented in Figure 5.18 by the bottom row (note the reduction in size at the boundaries). Each and every further computation is based on these values, which represent the average pixel or cell intensity at the location in the stripe if divided by the number of pixels summed. However, these individual divisions are superfluous computations and can be spared; only the final results have to be scaled properly for image intensity.

In our example with m = 4 in Figure 5.18, the first mel value has to be computed by summing up the first four values in the vector. When the mels are shifted by one pixel or cell length for smooth evaluation of image intensities in the stripe (center row), the new mel values are obtained by subtracting the trailing pixel or cell value at position j − 2 and by adding the leading one at j + 2 (see lower left in Figure 5.18). The operations to be performed for gradient computation in horizontal and vertical directions are shown in the upper left and center parts of the figure. Summing two mel values (vertically in the left and horizontally in the center sub-figure) and subtracting the corresponding other two sums yields the difference in (average) intensities in the horizontal and vertical directions of the support region. Dividing these numbers by the distances between the centers of the mels yields a measure of the (averaged) horizontal and vertical image intensity gradient at that location. Combining both results allows computing the absolute gradient direction and magnitude. This corresponds to determining a local plane tangent to the image intensity distribution for each support region (mask) selected.
However, it may not be meaningful to enforce a planar approximation if the intensities vary irregularly by a large amount. For example, the intensity distribution in the mask at the top left of Figure 5.17 shows a situation where averaging does not make sense. Figure 5.20a shows the situation with intensities as vectors above the center of each mel. For simplicity, the vectors have been chosen of equal magnitude on the diagonals. The interpolating plane is indicated by the dotted lines; its origin is located at the top of the central vector representing the average intensity IC. From the dots at the center of each mel in this plane, it can be recognized that two diagonally adjacent vectors of average mel intensity are well above, respectively, below the interpolating plane. This is typical for two corners or a textured area (e.g., four checkerboard fields or a saddle point).

Figure 5.20b represents a perfect (gray value) corner. Of course, the quadrant with the differing gray value may be located anywhere in the mask. In general, all gray values will differ from each other. The challenge is to find algorithms allowing reasonable separation of these feature types versus regions fit for interpolation with planar shading models (lower part of Figure 5.20) at low computational cost. Well known for corner detection, among many others, are the "Harris" [Harris, Stephens 1988], the KLT [Tomasi, Kanade 1991], and the "Haralick" [Haralick, Shapiro 1993] algorithms, all based on combinations of intensity gradients in several regions and directions. The basic ideas have been adapted and integrated into the algorithm UBM. The goal is to segment the image stripe into regions with smooth shading, corner points, and extended nonhomogeneous regions (textured areas). It will turn out that nonplanarity is a new, easily computable feature on its own (see Section 5.3.2.1).
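The planarity check itself is cheap: a plane fitted to four averages in a rectangular arrangement misses all four mel centers by the same amount with alternating sign (compare the note on ErrMax in Table 5.1 below), so a single residual value suffices. A sketch with illustrative threshold and names:

```python
def nonplanarity(i11, i12, i21, i22, err_max_percent=5.0):
    """Planarity check for the four mel averages of a mask.

    Returns the relative fit error in percent and a flag marking
    the region as nonhomogeneous (corner or texture candidate)
    when the error exceeds the threshold; the residual r has the
    same magnitude at all four mel centers.
    """
    r = 0.25 * (i11 + i22 - i12 - i21)        # equal-magnitude residual
    mean = 0.25 * (i11 + i12 + i21 + i22)
    rel_err = 100.0 * abs(r) / max(mean, 1e-9)
    return rel_err, rel_err > err_max_percent
```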
Figure 5.20. Feature types detectable by UBM in stripe analysis.
Corner points are of special value in tracking since they often allow determining optical feature flow in image sequences (if robustly recognizable); this is one important hint for detecting moving objects before they have been identified on higher system levels. These types of features have shown good performance for detecting pedestrians or bicyclists in the near range of a car in urban traffic [Franke et al.].

By interpolation of results from neighboring masks, extreme values of gradients, including their orientation, are determined to subpixel accuracy. Note that, contrary to the method CRONOS, no direction has to be specified in advance; the direction of the maximal gradient is a result of the interpolation process. For this reason, the method UBM is called "direction-sensitive" (instead of "direction-selective" in the case of CRONOS). It is, therefore, well suited for initial (strictly "bottom-up") image analysis [Hofmann 2004], while CRONOS is very efficient once predominant edge directions in the image are known and their changes can be estimated by the 4-D approach (see Chapter 6).
During these computations within stripes, some statistical properties of the images can be determined. In step 1, all pixel values are compared to the lowest and the highest values encountered up to then. If one of them exceeds the actual extreme value, the actual extreme is updated. At the end of the stripe, this yields the maximal (Imax-st) and the minimal (Imin-st) image intensity values in the stripe. The same statistics can be run for the summed intensities normal to the stripe direction (Iwmax-st and Iwmin-st) and for each mel (Icmax-st and Icmin-st); dividing the maximal and minimal values within each mel by the average for the mel, these scaled values will allow monitoring the appropriateness of averaging. A reasonable balance between computing statistical data and fast performance has to be found for each set of problems.
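These extremes fall out of the stripe pass at essentially no cost; a minimal single-pass sketch (illustrative name):

```python
def stripe_extremes(values):
    """Running extreme values over one stripe (Imin-st, Imax-st).

    The same single-pass pattern applies to the summed intensities
    normal to the stripe (Iwmin-st, Iwmax-st) and to each mel
    (Icmin-st, Icmax-st).
    """
    i_min = float('inf')
    i_max = float('-inf')
    for v in values:
        if v < i_min:
            i_min = v
        if v > i_max:
            i_max = v
    return i_min, i_max
```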
Table 5.1 summarizes the parameters for feature evaluation in the algorithm UBM; they are needed for categorizing the symbolic descriptions within a stripe, for selecting candidates, and for merging across stripe boundaries. Detailed meanings will be discussed in the following sections.
Table 5.1. Parameters for feature evaluation in image stripes

ErrMax: Maximally allowed percent error of the interpolated intensity plane through the centers of the four mels (typically 3 to 10%); note that the errors at all mel centers have the same magnitude (see Section 5.3.2.2).
CircMin (qmin): Minimal "circularity" required; threshold value on the scaled second eigenvalue for corner selection [0.75 corresponds to an ideal corner (Figure 5.20b), the maximal value 1 to an ideal double corner (checkerboard, Figure 5.20a)] (see Section 5.3.3).
traceNmin: (Alternate) threshold value for the selection of corner candidates; useful for adjusting the number of corner candidates.
IntensGradMin: Threshold value for intensity gradients to be accepted as edge candidates (see Section 5.3.2.3).
AngleFactHor: Factor for limiting edge directions to be found in the horizontal search direction (rows) (see Section 5.3.2.3).
AngleFactVer: Factor for limiting edge directions to be found in the vertical search direction (columns) (see Section 5.3.2.3).
VarLim: Upper bound on the variance allowed for a fit on both ends of a linearly shaded blob segment.
Lsegmin: Minimum length required of a linearly shaded blob segment to be accepted (suppression of small regions).
DelIthreshMerg: Tolerance in intensity for merging adjacent regions to 2-D blobs.
DelSlopeThrsh: Tolerance in intensity gradients for merging adjacent regions to 2-D blobs.

The five feature types treated with the method UBM are (1) textured regions (see Section 5.3.2.1), (2) edges from extreme values of gradients in the search direction (see Section 5.3.2.3), (3) homogeneous segments with planar shading models