Where the difference is largest in amplitude, the gradient over two consecutive mask elements is maximal. However, due to local perturbations, this need not correspond to an actual extreme gradient on the scale of interest. Experience with images from natural environments has shown that two additional parameters may considerably improve the results obtained:

1. By allowing a yet to be specified number n0 of entries in the mask center to be dropped, the results achieved may be more robust. This can be immediately appreciated when taking into account that either the actual edge direction may deviate from the mask orientation used or the edge is not straight but curved; by setting central elements of the mask to zero, the extreme intensity gradient becomes more pronounced. The rest of Figure 5.10 shows typical mask parameters with n0 = 1 for masks three and five pixels in depth (md = 3 or 5), with n0 = 2 for md = 8, as well as n0 = 3 for md = 17 (rows b, c).
2. Local perturbations are suppressed by assigning to the mask a significant depth nd, which designates the number of pixels along the search path in each row or column in each positive and negative field. The total mask depth then is md = 2·nd + n0. Figure 5.10 shows the corresponding mask schemes. In line (b) a rather large mask for finding the transition between relatively large homogeneous areas with ragged boundaries is given (md = 17 pixels wide and each field with seven elements, so that the correlation value is formed from large averages; for a mask width nw of 17 pixels, the correlation value is formed from 7·17 = 119 pixels). With the number of zero values in between chosen as n0 = 3, the total receptive field (= mask) size is 17·17 = 289 pixels. The sum formed from nd mask elements (vector values "ColSum") divided by (nw·nd) represents the average intensity value in the oblique image region adjacent to the edge. At the maximum correlation value found, this is the average gray value on one side of the edge. This information may be used for recognizing a specific edge feature in consecutive images or for grouping edges in a scene context.

For larger mask depths, when shifting the mask along the search direction, it is more efficient to subtract the last mask element (ColSum value) from the summed field intensities and to add the next one at the front in the search direction [see line (c) in Figure 5.10]; the number of operations needed is much lower than for summing all ColSum elements anew in each field.
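This bookkeeping is easy to express in code. The following is a minimal Python/NumPy sketch, not the original implementation; the function name and interface are chosen here for illustration. It slides a ternary (nd, n0, nd) mask along a precomputed ColSum vector and updates both field sums with one subtraction and one addition each per step.

```python
import numpy as np

def mask_responses(colsum, nd, n0):
    """Slide a ternary (nd, n0, nd) mask along a ColSum vector.

    colsum: 1-D array; each entry is the sum of nw pixels normal
    to the search direction (a "ColSum" value).  Returns the raw
    correlation value (positive field minus negative field) for
    every valid mask position; normalization by nw*nd, if desired,
    can be applied afterward.
    """
    colsum = np.asarray(colsum, dtype=float)
    md = 2 * nd + n0                       # total mask depth
    n_pos = len(colsum) - md + 1           # number of valid positions
    responses = np.empty(n_pos)

    # initial field sums: negative field, n0 zeros, positive field
    neg = colsum[:nd].sum()
    pos = colsum[nd + n0:md].sum()
    responses[0] = pos - neg

    for k in range(1, n_pos):
        # incremental shift: drop the trailing element of each field,
        # add the leading one (two operations per field and step)
        neg += colsum[k + nd - 1] - colsum[k - 1]
        pos += colsum[k + md - 1] - colsum[k + nd + n0 - 1]
        responses[k] = pos - neg
    return responses
```

Which field carries the plus sign is a convention; the sign of the response then encodes the direction of the intensity transition.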
The optimal values of these additional mask parameters nd and n0, as well as the mask width nw, depend on the scene at hand and are considered knowledge gained by experience in visually similar environments.
Figure 5.10. Efficient mask evaluation with the "ColSum" vector; the nd values given are typical for sizes of "receptive fields" formed. [Figure labels: masks characterized by (nd, n0, nd); (a) md = 2, 3, 5, and 8; (b) md = 17 (total mask depth), fields (7, 3, 7); (c) incremental shifting of the mask.]
From these considerations, generic edge extraction mask sets for specific problems have resulted. In Figure 5.11, some representative receptive fields for different tasks are given. The mask parameters can be changed from one video frame to the next, allowing easy adaptation to changing scenes observed continuously, like driving on a curved road.
The large mask in the center top of Figure 5.11 may be used on dirt roads in the near region with ragged transitions from road to shoulder. For sharp, pronounced edges like well-kept lane markings, a receptive field like that in the upper right corner (probably with nd = 2, that is, md = 5) will be most efficient. The further one looks ahead, the more the mask width nw should be reduced (9 or 5 pixels); part (c) in the lower center shows a typical mask for edges on the right-hand side of a straight road further away (smaller and oblique to the right).

The 5 × 5 (2, 1, 2) mask at the left-hand side of Figure 5.11 has been the standard mask for initial detection of other vehicles and obstacles on the road through horizontal edges; collections of horizontal edge elements are good indicators for objects torn by gravity to the road surface. Additional masks are then applied for checking the object hypotheses formed.

If narrow lines like lane markings have to be detected, there is an optimal mask width depending on the width of the line in the image: If the mask depth nd chosen is too large, the line will be low-pass filtered and extreme gradients lose in magnitude; if the mask depth is too small, sensitivity to noise increases.

As an optional step, while adding up pixel values for mask elements ("ColSum") or while forming the receptive fields, the extreme intensity values of pixels in ColSum and of each ColSum vector component (max and min) may be determined. The former gives an indication of the validity of averaging (when the extreme values are not too far apart), while the latter may be used for automatically adjusting threshold parameters. In natural environments, in addition, this gives an indication of the contrasts in the scene. These are some of the general environmental parameters to be collected in parallel (right-hand part of Figure 5.1).
Col-Figure 5.11 Examples of receptive fields and search paths for efficient edge feature
ex-traction; mask parameters can be changed from one video-frame to the next, allowing easy adaptation to changing scenes observed continuously
Rec
eptive
fi
dofmask :
Search path center
ient
ation
+ 0
-nd=1
Edge
orie
ation
md= 2·nd+ n0 =14 + 3 = 17 md = 3
For fuzzy large scale edge
For sharp, nounced edge
5.2.2 Search Paths and Subpixel Accuracy
The masks defined in the previous section are applied to rectangular search ranges to find all possible candidates for an edge in these ranges. The smaller these search ranges can be kept, the more efficient the overall algorithm is going to be. If the high-level interpretation via recursive estimation is stable and good information on the variances is available, the search region for specific features may be confined to the 3σ region around the predicted value, which is usually not very large (σ = standard deviation). It does not make sense first to perform the image processing part in a large search region fixed in advance and afterward sort out the features according to the variance criterion. In order not to destabilize the tracking process, prediction errors > 3σ are considered outliers and are usually removed when they appear for the first time in a sequence.
Figure 5.6 shows an example of edge localization with a ternary mask of size nw = 17, nd = 2, and n0 = 1 (i.e., mask depth md = 5). The mask response is close to zero when the region to which it is applied is close to homogeneously gray (irrespective of the gray value); this is an important design factor for abating sensitivity to light levels. It means that the plus and minus regions have to be of the same size. The lower part of the figure shows the resulting correlation values (mask responses), which form the basis for determining edge location. If the image areas within each field of the mask are homogeneous, the response is maximal at the location of the edge. With different light levels, only the magnitude of the extreme value changes but not its location. Highly discernible extreme values are obtained also for neighboring mask orientations. The larger the parameter n0, the less pronounced is the extreme value in the search direction, and the more tolerant it is to deviations in angle. These robustness aspects make the method well suited for natural outdoor scenes.

Search directions (horizontal or vertical) are automatically chosen depending on the feature orientation specified. The horizontal search direction is used for mask orientations between 45 and 135° as well as between 225 and 315°; vertical search is applied for mask directions between 135 and 225° and between 315 and 45°. To avoid too frequent switching between search directions, a hysteresis (dead zone of about one direction increment for the larger mask widths) is often used; that means switching is actually performed (automatically) 6 to 11° beyond the diagonal lines, depending on the direction from which these are approached.
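A small sketch of this switching logic; the function and parameter names are chosen here for illustration, and the 8° dead zone is just one value within the 6 to 11° range quoted above:

```python
def search_direction(angle_deg, prev_dir, dead_zone_deg=8.0):
    """Choose the search direction for a given mask orientation.

    Horizontal search serves orientations of 45..135 (and 225..315)
    degrees, vertical search the rest; near the diagonals the
    previous direction is kept until the angle is dead_zone_deg
    beyond them, to avoid frequent toggling.  prev_dir: 'h' or 'v'.
    """
    a = angle_deg % 180.0            # orientations repeat every 180 degrees
    if prev_dir == 'h':
        keep = (45.0 - dead_zone_deg) <= a <= (135.0 + dead_zone_deg)
        return 'h' if keep else 'v'
    keep = a <= (45.0 + dead_zone_deg) or a >= (135.0 - dead_zone_deg)
    return 'v' if keep else 'h'
```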
5.2.2.1 Subpixel Accuracy by Second-Order Interpolation
Experience with several interpolation schemes, taking up to two correlation values on each side of the extreme value into account, has shown that the simple second-order parabola interpolation is the most cost-effective and robust solution (Figure 5.12). Just the neighboring correlation values around a peak serve as a basis.

If an extreme value of the magnitude of the mask response above the threshold level (see Figure 5.6) has been found by stating that the new value is smaller than the old one, the last three values are used to find the interpolating parabola of second order. Its extreme value yields the position yextr of the edge to subpixel accuracy and the corresponding magnitude Cextr; this position is obtained at the location where the derivative of the parabolic function is zero. Designating the largest correlation value found as C0 at pixel position 0, the previous one Cm at −1, and the last correlation value Cp at position +1 (which indicated that there is an extreme value by its magnitude Cp < C0), the interpolating parabola C(y) = a·y² + b·y + C0 has the coefficients

a = (Cm + Cp)/2 − C0,    b = (Cp − Cm)/2,                        (5.1)

and its extreme value lies at

yextr = −b/(2·a) = (Cm − Cp) / [2·(Cm + Cp − 2·C0)],    Cextr = C0 − b²/(4·a).    (5.2)

From the last expressions of Equations 5.1 and 5.2, it is seen that the interpolated value lies on the side of C0 on which the neighboring correlation value measured is larger. Experience with real-world scenes has shown that subpixel accuracy in the range of 0.3 to 0.1 pixel may be achieved.
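In code, the interpolation of Equations 5.1 and 5.2 costs only a few arithmetic operations per detected peak. A minimal sketch with illustrative names:

```python
def subpixel_peak(c_m, c_0, c_p):
    """Second-order interpolation around a correlation peak.

    c_m, c_0, c_p: mask responses at positions -1, 0, +1, with c_0
    the largest magnitude.  Returns the subpixel offset y_extr from
    position 0 (in the open interval -1..+1) and the interpolated
    extreme value c_extr (Equations 5.1 and 5.2).
    """
    a = 0.5 * (c_m + c_p) - c_0      # parabola curvature (a < 0 at a maximum)
    b = 0.5 * (c_p - c_m)            # parabola slope at position 0
    y_extr = -b / (2.0 * a)
    c_extr = c_0 - b * b / (4.0 * a)
    return y_extr, c_extr
```

For example, c_m = 60, c_0 = 100, c_p = 80 yields y_extr ≈ +0.17, on the side of the larger neighbor, and c_extr ≈ 100.8.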
5.2.2.2 Position and Direction of an Optimal Edge
Determining precise edge direction by applying, additionally, the two neighboring mask orientations in the same search path and performing a bivariate interpolation has been investigated, but the results were rather disappointing. Precise edge direction can be determined more reliably by exploiting results from three neighboring search paths with the same mask direction (see Figure 5.13). The central edge position to subpixel accuracy yields the position of the tangent point, while the tangent direction is determined from the straight line connecting the positions of the (equidistant) neighboring edge points; this is the result of a parabolic interpolation for the three points.

Figure 5.13. Determination of the tangent direction of a slightly curved edge by subpixel localization of edge points in three neighboring search paths and parabolic interpolation.

Figure 5.12. Subpixel edge localization by parabolic interpolation after passing a maximum in mask response.
Once it is known that the edge is curved – because the edge point at the center does not lie on the straight line connecting the neighboring edge points – the question arises whether the amount of curvature can also be determined with little effort (at least approximately). This is the case.
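A sketch of the scheme of Figure 5.13, assuming the three subpixel edge positions in the equidistant neighboring search paths have already been determined (names are illustrative):

```python
import numpy as np

def tangent_from_three_paths(p_minus, p_center, p_plus):
    """Tangent point and direction from three parallel search paths.

    p_minus, p_center, p_plus: subpixel edge positions (x, y) found
    in three equidistant search paths.  The central point is the
    tangent point; the tangent direction is that of the straight
    line through the two neighbors, the result of the parabolic
    interpolation described above.
    """
    p_minus = np.asarray(p_minus, dtype=float)
    p_center = np.asarray(p_center, dtype=float)
    p_plus = np.asarray(p_plus, dtype=float)

    chord = p_plus - p_minus
    tangent = chord / np.linalg.norm(chord)          # unit tangent direction
    deviation = p_center - 0.5 * (p_minus + p_plus)  # zero for a straight edge
    return p_center, tangent, deviation
```

A nonzero deviation of the central point from the chord midpoint is exactly the curvature hint exploited next.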
5.2.2.3 Approximate Determination of Edge Curvature
When applying a series of equidistant search stripes to an image region, the method of the previous section yields, for each point on the edge, also the corresponding edge direction, that is, its tangent. Two points and two slopes determine the coefficients of a third-order polynomial, dubbed Hermite interpolation after a French mathematician. As a third-order curve, it can have at most one inflection point. Taking the connecting line (dash-dotted in Figure 5.14) between the two tangent points P−d and P+d as reference (chord line or secant), a simple linear relationship for a smooth curve with small angles ψ relative to the chord line can be derived. Tangent directions are used in differential-geometry terms, yielding a linear curvature model; the reference is the slope of the straight line connecting the tangent points (secant). Let m−d and m+d be the slopes of the tangents at points P−d and P+d, respectively; s be the running variable in the direction of the arc (edge line); and ψ the angle between the local tangent and the chord direction (|ψ| < 0.2 radian, so that cos(ψ) ≈ 1).

The linear curvature model in differential-geometry terms, with s as the running variable along the arc from x ≈ −d to x ≈ +d, is

dψ/ds = C(s) = C0 + C1·s.                                        (5.3)

Since curvature is a second-order concept with respect to Cartesian coordinates, the lateral position y results from a second integral of the curvature model. With the origin at the center of the chord, x in the direction of the chord, y normal to it, and ψ−d = arctan(m−d) ≈ m−d as the angle between the tangent and chord directions at point P−d, the equation describing the curved arc then is given by Equation 5.4 below [with ψ in the range ± 0.2 radian (~ 11°), the cosine can be approximated by 1 and the sine by the argument ψ]:
y(x) ≈ y0 + ψ0·x + (C0/2)·x² + (C1/6)·x³.                        (5.4)

Imposing y(−d) = y(+d) = 0 at the chord ends and matching the measured tangent slopes m−d and m+d yields the coefficients

ψ0 = −0.25·(m−d + m+d),      y0 = −C0·d²/2,
C0 = (m+d − m−d)/(2·d),      C1 = 3·(m−d + m+d)/(2·d²).          (5.5)

Equation 5.5 yields the curvature parameters from the measured tangent slopes and the distance (2·d) between the tangent points. Of course, this distance has to be chosen such that the angle constraint (|ψ| < 0.2 radian) is not violated. On smooth curves, this is always possible; however, for large curvatures, the distance d allowed becomes small, and the scale for measuring edge locations and tangent directions probably has to be adapted. Very sharp curves have to be isolated and jumped over as "corners" having large directional changes over small arc lengths. In an idealized but simple scheme, they can be approximated by a Dirac impulse in curvature with a finite change in direction over zero arc length.
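The coefficients of Equation 5.5 translate directly into code; a minimal sketch under the small-angle assumption (|ψ| < 0.2 radian), with illustrative names:

```python
def curvature_from_tangents(m_minus, m_plus, d):
    """Linear curvature model C(s) = C0 + C1*s from two tangents.

    m_minus, m_plus: tangent slopes relative to the chord at the
    points P-d and P+d; 2*d is the distance between the tangent
    points (Equation 5.5).
    """
    c0 = (m_plus - m_minus) / (2.0 * d)           # curvature at chord center
    c1 = 3.0 * (m_minus + m_plus) / (2.0 * d**2)  # curvature rate along the arc
    psi0 = -0.25 * (m_minus + m_plus)             # tangent angle at x = 0
    y0 = -0.5 * c0 * d**2                         # lateral offset at x = 0
    return c0, c1, psi0, y0
```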
Due to the differencing process unavoidable for curvature determination, the results tend to be noisy. When basic properties of the objects recognized are known, a post-processing step for noise reduction exploiting this knowledge should be included.

Remark: The special advantage of subscale resolution for dynamic vision lies in the fact that the onset of changes in motion behavior may be detected earlier, yielding better tracking performance, crucial for some applications. The aperture problem inherent in edge tracking will be revisited in Section 9.5 after the basic tracking problem has been discussed.
5.2.3 Edge Candidate Selection
Usually, due to image noise, there are many insignificant extreme values in the resulting correlation vector, as can be seen in Figure 5.6. Positioning the threshold properly (and selecting the mask parameters in general) depends very much on the scene at hand. As may be seen in Figure 5.15, due to shadow boundaries and scene noise, the largest gradient values may not be those looked for in the task context (road boundary). Colinearity conditions (or even edge elements on a smoothly curved line) may be needed for proper feature selection; therefore, threshold selection in the feature extraction step should not eliminate these candidates.
Figure 5.14. Approximate determination of the curvature of a slightly curved edge by subpixel localization of edge points and tangent directions: Hermite interpolation of a third-order polynomial from two tangent points.
Figure 5.15. The challenge of edge feature selection in road scenes: Good decisions can be made only by resorting to higher level knowledge. Road scenes with shadows (and texture); extreme correlation values marking road boundaries may not be the absolutely largest ones.

Depending on the situation, these parameters have to be specified by the user (now) or by a knowledge-based component on the higher system levels of a more mature version. Average intensity levels and intensity ranges resulting from region-based methods (see Section 5.3) will yield information for the latter case.
As a service to the user, in the code CRONOS, the extreme values found in one function call may be listed according to their correlation values; the user can specify how many candidates he wants presented at most in the function call. As the extreme value of the search, either the pixel position with the largest mask response may be chosen (the simplest case, with large measurement noise), or several neighboring correlation values may be taken into account, allowing interpolation.
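A sketch of such a candidate listing; the actual function-call interface of CRONOS is not reproduced here, and names and defaults are illustrative:

```python
import numpy as np

def edge_candidates(responses, threshold, max_candidates=5):
    """List local extrema of the mask-response vector as candidates.

    Keeps every local maximum of |response| above threshold and
    returns at most max_candidates index positions, ordered by
    correlation magnitude (largest first).
    """
    r = np.abs(np.asarray(responses, dtype=float))
    peaks = [k for k in range(1, len(r) - 1)
             if r[k] >= r[k - 1] and r[k] > r[k + 1] and r[k] > threshold]
    peaks.sort(key=lambda k: r[k], reverse=True)
    return peaks[:max_candidates]
```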
5.2.4 Template Scaling as a Function of the Overall “Gestalt”
An additional degree of freedom available to the designer of a vision system is the focal length of the camera for scaling the image size of an object to its distance in the scene. To analyze as many details as possible of an object of interest, one tends to assume that a focal length which lets the object (in its largest dimension) just fill the image would be optimal. This may be the case for a static scene being observed from a stationary camera. If either the object observed or the vehicle carrying the camera or both can move, there should be some room left for searching and tracking over time. Generously granting an additional space of the actual size of the object to each side results in the requirement that perspective mapping (focal length) should be adjusted so that the major object dimension in the image is about one third of the image. This leaves some regions in the image for recognizing the environment of the object, which again may be useful in a task context.

To discover essential shape details of an object, the smallest edge element template should not be larger than about one-tenth of the largest object dimension. This yields the requirement that the size of an object in the image to be analyzed in some detail should be about 20 to 30 pixels. However, due to the poor angular resolution of masks with a size of three pixels, a factor of 2 (60 pixels) seems more comfortable. This leads to the requirement that objects in an image must be larger than about 150 pixels. Keep in mind that objects imaged with a size (region) of only about a half dozen pixels can still be noticed (discovered and roughly tracked); however, due to spurious details from discrete mapping (rectangular pixel size) into the sensor array, no meaningful shape analysis can be performed.

This has been a heuristic discussion of the effects of object size on shape recognition. A more operational consideration, based on straight edge template matching and coordinate-free differential-geometry shape representation by piecewise functions with linear curvature models, is to follow.
A lower limit to the support region required for achieving an accuracy of about one-tenth of a pixel in tangent position and about 1° in tangent direction (order of magnitude) by subpixel resolution is about eight to ten pixels. The efficient scheme given in [Dickmanns 1985] for accurately determining the curvature parameters is limited to a smooth change in the tangent direction of about 20 to 25°; for recovering a circle (360°), this means that about nelef ≈ 15 to 18 elemental edge features have to be measured. Since the ratio of circumference to diameter is π for a circle, the smallest circle satisfying these conditions for non-overlapping support regions is nelef times the mask size (8 to 10 pixels) divided by π. This yields a required size of about 40 to 60 pixels in linear extension of an object in an image. Since corners (points of finite direction change) can be included as curvature impulses measurable by adjacent tangent directions, the smallest (horizontally aligned) measurable square is ten pixels wide, while the diagonal is about 14 pixels; more irregularly shaped objects with concavities require a larger number of tangent measurements. The convex hull and its dimensions give the smallest size measurable in units of the support region. Fine internal structures may be lost.

From these considerations, for accurate shape analysis down to the percent range, the image of the object should in general be between 20 and 100 pixels in linear extension. This fits well in the template size range from 3 (or 5) to 17 (or 33) pixels. Usual image sizes of several hundred lines allow the presence of several well-recognizable objects in each image; other scales of resolution may require different focal lengths for imaging (from microscopy to far-ranging telescopes).
Template scaling for line detection: Finally, choosing the right scale for detecting (thin) lines will be discussed using a real example [Hofmann 2004]. Figure 5.16 shows results for an obliquely imaged lane marking which appears 16 pixels wide in the search direction (top: image section searched, width nw = 9 pixels). Summing up the mask elements in the edge direction corresponds to rectifying the image stripe, as shown below in the figure; however, only one intensity value remains per position, so that for the rest of the pixel operations with different mask sizes in the search direction, about one order of magnitude in efficiency is gained. All five masks investigated, (a) to (e), rely on the same "ColSum" vector; depending on the depth of the masks, the valid search ranges are reduced (see double arrows at bottom).
Figure 5.16. Optimal mask size for line recognition: For general scaling, mask size should be scaled by the line width (= 16 pixels here).
The averaged intensity profile of the mask elements is given in the vertical center (around 90 for the road and ~130 for the lane marker); the lane marking clearly sticks out. Curve (e) shows the mask response for the mask of highest possible resolution (1, 0, 1); see legend. It can be seen that the edge is correctly detected with respect to location, but due to the smaller extreme value, sensitivity to noise is higher than that for the other masks. All other masks have been chosen with n0 = 3 for reducing sensitivity to slightly different edge directions, including curved edges. In practical terms, this means that the three central values under the mask shifted over the ColSum vector need not be touched; only the nd values to the left and to the right need be summed.

Depth values for the two fields of the mask of nd = 4, 8, and 16 (curves a, b, c) yield the same gradient values and edge location; the mask response widens with increasing field depth. By scaling the field depth nd of the mask by the width lw of the line to be detected, the curves can be generalized to scaled masks of depths nd/lw = ¼, ½, and 1. Case (d) shows with nd/lw = 21/16 = 1.3 that for field depths larger than the line width, the maximal gradient decreases and the edge is localized at a wrong position. So, the field depth selected should always be smaller than the width of the line to be detected. The number of zeros at the center should be less than the field depth, probably less than half that value for larger masks; values between 1 and 3 have shown good results for nd up to 7. For the detection of dirt roads with jagged edges and homogeneous intensity values on and off the road, large n0 are favorable.
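These sizing rules can be condensed into a small heuristic. The sketch below encodes them as just stated; the concrete clipping values are illustrative choices, not prescribed by the text:

```python
def line_mask_parameters(line_width_px, n0_max=3):
    """Heuristic ternary mask sizing for a line of known image width.

    Encodes the rules discussed above: the field depth nd stays
    below the line width, and the number of central zeros n0 stays
    below the field depth (about half of it for larger masks).
    """
    nd = max(1, line_width_px - 1)          # nd < line width
    n0 = min(n0_max, max(1, nd // 2))       # n0 well below nd
    md = 2 * nd + n0                        # total mask depth
    return nd, n0, md
```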
5.3 The Unified Blob-edge-corner Method (UBM)
The approach discussed above for detecting edge features of single (sub-)objects based on receptive fields (masks) has been generalized to a feature extraction method for characterizing image regions and general image properties by oriented edges, homogeneously shaded areas, and nonhomogeneous areas with corners and texture. For characterizing textures by their statistical properties of image intensities in real time (certain types of textures), more computing power is needed; this has to be added in the future. In an even more general approach, stripe directions could be defined in any orientation, and color could be added as a new feature space. For efficiency reasons, only horizontal and vertical stripes in intensity images are considered here, for which only one matrix index and the gray values vary at a time. To achieve reusability of intermediate results, stripe widths are confined to even numbers and are decomposed into two half-stripes.
5.3.1 Segmentation of Stripes through Corners, Edges, and Blobs
In this image evaluation method, the goal is to start from as few assumptions on intensity distributions as possible. Since pixel noise is an important factor in outdoor environments, some kind of smoothing has to be taken into account, however. This is done by fitting models with planar intensity distribution to local pixel values if they exhibit some smoothness conditions; otherwise, the region will be characterized as nonhomogeneous. Surprisingly, it has turned out that the planarity check for local intensity distribution itself constitutes a nice feature for region segmentation.
5.3.1.1 Stripe Selection and Decomposition into Elementary Blocks
The field size for the least-squares fit of a planar pixel-intensity model is (2·m) × (2·n) and is called the "model support region" or mask region. For reusability of intermediate results in computation, this support region is subdivided into basic (elementary) image regions (called mask elements or briefly "mels") that can be defined by two numbers: the number of pixels in the row direction m, and the number of pixels in the column direction n. In Figure 5.17, m has been selected as 4 and n as 2; the total stripe width for row search thus is 4 pixels. For m = n = 1, the highest possible image resolution will be obtained; however, strong influence of noise on the pixel level may show up in the results in this case.

When working with video fields (sub-images with only odd or even row indices, as is often done in practical applications), it makes sense for horizontal stripes to choose m = 2n; this yields averaging of pixels at least in the row direction for n = 1. Rendering these mels as squares finally yields the original rectangular image shape with half the original full-frame resolution. By shifting stripe evaluation by only half the stripe width, all intermediate pixel results in one half-stripe can be reused directly in the next stripe by just changing sign (see below). The price to be paid for this convenience is that the results obtained have to be represented at the center point of the support region, which is exactly at pixel boundaries. However, since subpixel accuracy is looked for anyway, this is of no concern.
Figure 5.17. Stripe definition (row = horizontal, column = vertical) for the (multiple) feature extractor UBM in a pixel grid; mask elements (mels) are defined as basic rectangular units for fitting a planar intensity model. [Figure labels: stripes C1 to C4, each split into half-stripes (numbered 1 to 6); mel size n = 2, m = 4; center points of mel regions; values in the half-stripes of stripe 2 are stored for reuse in stripe 3; image region evaluated (mask) with fields left (L) and right (R) and gradient direction in search direction row (R); a mask without gradient direction is marked as nonhomogeneous; rows are evaluated top-down, columns from left to right.]
Still open is the question of how to proceed within a stripe. Figure 5.17 suggests taking steps equal to the width of mels; this covers all pixels in the stripe direction once and is very efficient. However, shifting mels by just 1 pixel in the stripe direction yields smoother (low-pass filtered) results [Hofmann 2004]. For larger mel lengths, intermediate computational results can be used as shown in Figure 5.18. This corresponds to the use of ColSum in the method CRONOS (see Figures 5.9 and 5.10). The new summed value for the next mel can be obtained by subtracting the value of the last column and adding the one of the next column [(j−2) and (j+2) in the example shown, bottom row in Figure 5.18].

Figure 5.18. Mask elements (mels) for efficient computation of gradients and average intensities: resulting cell structure (step 1); incremental computation of cell values for cells with larger extension in stripe direction.
For the vertical search direction, image evaluation progresses top-down within the stripe and from left to right in the sequence of stripes. Shifting of stripes is always done by mel size m or n (width of a half-stripe), while shifting of masks in the search direction can be specified from 1 to m or n (see Figure 5.19b below); the latter number, m or n, means pure block evaluation, however, with only coarse resolution. This yields the lowest possible computational load, with all pixels used just once in one mel. For objects in the near range, this may still be sufficient for tracking.

The goal was to obtain an algorithm allowing easy adaptation to limited computing power; since high resolution is generally required in only a relatively small part of images of outdoor scenes, only this region needs to be treated with more finely tuned parameters (see Figure 5.37 below). Specifying a rectangular region of special interest by its upper left and lower right corners, this sub-area can be precisely evaluated in a separate step. If no view stabilization is available, the decision for the corner points may even be based on actual evaluation results with coarse resolution. The initial analysis with coarse resolution guarantees that only the most promising subregions of the image are selected despite angular perturbations stemming from motion of the subject body, which shifts the inertially fixed scene around in the image. This attention focusing avoids unnecessary details in regions of less concern.
Figure 5.19 shows the definitions necessary for performing efficient multiple-scale feature evaluation. The left part (a) shows the specification of masks of different sizes (with mel sizes from 1×1 to 4×2 and 4×4, i.e., two pyramid stages). Note that the center of a pixel or of mels does not coincide with the origin O of the masks, which is for all masks at (0, 0). The mask origin is always defined as the point where all four quadrants (mels) meet. The computation of the average intensities in each mel (I12, I11, I21, I22 in quadrants Q1 to Q4) is performed with the reference point at (0.5, 0.5), the center of the first pixel nearest to the mask origin in the most recent mel; this yields a constant offset for all mask sizes when rendering pixel intensities from symbolic representations. For computing gradients, of course, the real mel centers shown in quadrant Q4 have to be used.
Figure 5.19. With the reference points chosen here for the mask and the average image intensities in quadrants Qi, fusing results from different scales becomes simple; (a) basic definitions of mask elements (masks with mels 4×4, 3×3, and 2×2 in a row/column pixel grid), (b) progressive image analysis within stripes and with sequences of stripes (shown here for rows; mask in stripe Ri+1 at index k+1).
The reconstruction of image intensities from the results of one stripe is done for the central part of the mask (± half the width of the mask element normal to the search direction). This is shown in the right part (b) of the figure by different shading. It shows (low-frequency) shifting of the stripe position by n = 2 (index i) and (high-frequency) shifting of the mask position in the search direction by 1 (index k). Following this strategy in both row and column directions will yield nice low-pass-filtered results for the corresponding edges.
5.3.1.2 Reduction of the Pixel Stripe to a Vector with Attributes
The first step is to sum up all n pixel or cell values in the direction of the width of the half-stripe (lower part in Figure 5.18). This reduces the half-stripe for search to a vector, irrespective of the stripe width specified. It is represented in Figure 5.18 by the bottom row (note the reduction in size at the boundaries). Each and every further computation is based on these values, which represent the average pixel or cell intensity at the location in the stripe if divided by the number of pixels summed. However, these individual divisions are superfluous computations and can be spared; only the final results have to be scaled properly for image intensity.

In our example with m = 4 in Figure 5.18, the first mel value has to be computed by summing up the first four values in the vector. When the mels are shifted by one pixel or cell length for smooth evaluation of image intensities in the stripe (center row), the new mel values are obtained by subtracting the trailing pixel or cell value at position j − 2 and by adding the leading one at j + 2 (see lower left in Figure 5.18). The operations to be performed for gradient computation in horizontal and vertical directions are shown in the upper left and center parts of the figure. Summing two mel values (vertically in the left and horizontally in the center sub-figure) and subtracting the corresponding other two sums yields the difference in (average) intensities in the horizontal and vertical directions of the support region. Dividing these numbers by the distances between the centers of the mels yields a measure of the (averaged) horizontal and vertical image intensity gradient at that location. Combining both results allows computing the absolute gradient direction and magnitude. This corresponds to determining a local plane tangent to the image intensity distribution for each support region (mask) selected.
However, it may not be meaningful to enforce a planar approximation if the intensities vary irregularly by a large amount. For example, the intensity distribution in the mask at the top left of Figure 5.17 shows a situation where averaging does not make sense. Figure 5.20a shows the situation with intensities as vectors above the center of each mel. For simplicity, the vectors have been chosen of equal magnitude on the diagonals. The interpolating plane is indicated by the dotted lines; its origin is located at the top of the central vector representing the average intensity IC. From the dots at the center of each mel in this plane, it can be recognized that two diagonally adjacent vectors of average mel intensity are well above, respectively, below the interpolating plane. This is typical for two corners or a textured area (e.g., four checkerboard fields or a saddle point).

Figure 5.20b represents a perfect (gray value) corner. Of course, the quadrant with the differing gray value may be located anywhere in the mask. In general, all gray values will differ from each other. The challenge is to find algorithms allowing reasonable separation of these feature types versus regions fit for interpolation with planar shading models (lower part of Figure 5.20) at low computational cost. Well known for corner detection, among many others, are the "Harris" [Harris, Stephens 1988], the KLT [Tomasi, Kanade 1991], and the "Haralick" [Haralick, Shapiro 1993] algorithms, all based on combinations of intensity gradients in several regions and directions. The basic ideas have been adapted and integrated into the algorithm UBM. The goal is to segment the image stripe into regions with smooth shading, corner points, and extended nonhomogeneous regions (textured areas). It will turn out that nonplanarity is a new, easily computable feature on its own (see Section 5.3.2.1).
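The planarity check itself is cheap: a plane fitted to four averages in a rectangular arrangement misses all four mel centers by the same amount with alternating sign (compare the note on ErrMax in Table 5.1 below), so a single residual value suffices. A sketch with illustrative threshold and names:

```python
def nonplanarity(i11, i12, i21, i22, err_max_percent=5.0):
    """Planarity check for the four mel averages of a mask.

    Returns the relative fit error in percent and a flag marking
    the region as nonhomogeneous (corner or texture candidate)
    when the error exceeds the threshold; the residual r has the
    same magnitude at all four mel centers.
    """
    r = 0.25 * (i11 + i22 - i12 - i21)        # equal-magnitude residual
    mean = 0.25 * (i11 + i12 + i21 + i22)
    rel_err = 100.0 * abs(r) / max(mean, 1e-9)
    return rel_err, rel_err > err_max_percent
```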
Figure 5.20. Feature types detectable by UBM in stripe analysis.
Corner points are of special value in tracking since they often allow determining optical feature flow in image sequences (if robustly recognizable); this is one important hint for detecting moving objects before they have been identified on higher system levels. These types of features have shown good performance for detecting pedestrians or bicyclists in the near range of a car in urban traffic [Franke et al.].

By interpolation of results from neighboring masks, extreme values of gradients, including their orientation, are determined to subpixel accuracy. Note that, contrary to the method CRONOS, no direction has to be specified in advance; the direction of the maximal gradient is a result of the interpolation process. For this reason, the method UBM is called "direction-sensitive" (instead of "direction-selective" in the case of CRONOS). It is, therefore, well suited for initial (strictly "bottom-up") image analysis [Hofmann 2004], while CRONOS is very efficient once predominant edge directions in the image are known and their changes can be estimated by the 4-D approach (see Chapter 6).
During these computations within stripes, some statistical properties of the images can be determined. In step 1, all pixel values are compared to the lowest and the highest values encountered up to then. If one of them exceeds the actual extreme value, the actual extreme is updated. At the end of the stripe, this yields the maximal (Imax-st) and the minimal (Imin-st) image intensity values in the stripe. The same statistics can be run for the summed intensities normal to the stripe direction (Iwmax-st and Iwmin-st) and for each mel (Icmax-st and Icmin-st); dividing the maximal and minimal values within each mel by the average for the mel, these scaled values will allow monitoring the appropriateness of averaging. A reasonable balance between computing statistical data and fast performance has to be found for each set of problems.
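These extremes fall out of the stripe pass at essentially no cost; a minimal single-pass sketch (illustrative name):

```python
def stripe_extremes(values):
    """Running extreme values over one stripe (Imin-st, Imax-st).

    The same single-pass pattern applies to the summed intensities
    normal to the stripe (Iwmin-st, Iwmax-st) and to each mel
    (Icmin-st, Icmax-st).
    """
    i_min = float('inf')
    i_max = float('-inf')
    for v in values:
        if v < i_min:
            i_min = v
        if v > i_max:
            i_max = v
    return i_min, i_max
```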
Table 5.1 summarizes the parameters for feature evaluation in the algorithm UBM; they are needed for categorizing the symbolic descriptions within a stripe, for selecting candidates, and for merging across stripe boundaries. Detailed meanings will be discussed in the following sections.
Table 5.1. Parameters for feature evaluation in image stripes

ErrMax: Maximally allowed percent error of the interpolated intensity plane through the centers of the four mels (typically 3 to 10%); note that the errors at all mel centers have the same magnitude (see Section 5.3.2.2).
CircMin (qmin): Minimal "circularity" required; threshold value on the scaled second eigenvalue for corner selection [0.75 corresponds to an ideal corner (Figure 5.20b), the maximal value 1 to an ideal double corner (checkerboard, Figure 5.20a)] (see Section 5.3.3).
traceNmin: (Alternate) threshold value for the selection of corner candidates; useful for adjusting the number of corner candidates.
IntensGradMin: Threshold value for intensity gradients to be accepted as edge candidates (see Section 5.3.2.3).
AngleFactHor: Factor for limiting edge directions to be found in the horizontal search direction (rows) (see Section 5.3.2.3).
AngleFactVer: Factor for limiting edge directions to be found in the vertical search direction (columns) (see Section 5.3.2.3).
VarLim: Upper bound on the variance allowed for a fit on both ends of a linearly shaded blob segment.
Lsegmin: Minimum length required of a linearly shaded blob segment to be accepted (suppression of small regions).
DelIthreshMerg: Tolerance in intensity for merging adjacent regions to 2-D blobs.
DelSlopeThrsh: Tolerance in intensity gradients for merging adjacent regions to 2-D blobs.

The five feature types treated with the method UBM are (1) textured regions (see Section 5.3.2.1), (2) edges from extreme values of gradients in the search direction (see Section 5.3.2.3), (3) homogeneous segments with planar shading models