Fast Object Recognition Using Dynamic Programming from a Combination of Salient Line Groups
This chapter presents a new method of grouping and matching line segments to recognize objects. We propose a dynamic programming-based formulation for extracting salient line patterns, built on a robust and stable geometric representation derived from perceptual organizations. As the end point proximity, we detect several junctions from the image lines. We then search for junction groups by using the collinear constraint between the junctions. Junction groups similar to the model are searched for in the scene, based on a local comparison. A DP-based search algorithm reduces the time complexity of the search for the model lines in the scene. The system is able to find reasonable line groups in a short time.
1 Introduction
This chapter describes an algorithm that robustly locates collections of salient line segments in an image. In computer vision and related applications, we often wish to find objects based on stored models from an image containing objects of interest [1–6]. To achieve this, a model-based object recognition system
Computer-Aided Intelligent Recognition Techniques and Applications Edited by M Sarfraz
… variations, it is possible to extract enough lines to guide 2D or 3D object recognition.
Conventionally, the DP-based algorithm as a search tool is an optimization technique for problems where not all variables are interrelated simultaneously [7–9]. In the case of an inhomogeneous problem, such as object recognition, related contextual dependency among all the model features always exists [10]. Therefore, DP optimization would not give the true minimum.

On the other hand, the DP method has the advantage of greatly reducing the time complexity of a candidate search based on local similarity. Silhouette or boundary matching problems that satisfy the locality constraint can be solved by DP-based methods using local comparison of the shapes. In these approaches, both the model and the matched scene have a sequentially connected form of lines, ordered pixels, or chained points [11–13]. In some cases, there also exist many vision problems in which the ordering or local neighborhood cannot be easily defined. For example, the definition of a meaningful line connection among noisy lines is not easy, because object boundary extraction for an outdoor scene is itself a formidable job for object segmentation.

In this chapter, we do not assume known boundary lines or junctions; rather, we are open to any connection possibilities for arbitrary junction groups in the DP-based search. That is, the given problem is a local comparison between predefined, sequentially linked model junctions and all possible scene lines in an energy minimization framework.
Section 2 introduces previous research on feature grouping in object recognition. Section 3 explains a quality measure to detect two-line junctions in an input image. Section 4 describes a combination model to form local line groups and how junctions are linked to each other. Section 5 explains how related junctions are searched to form the salient line groups in a DP-based search framework. Section 6 gives a criterion to test the collinearity between lines. Section 7 tests the robustness of the junction detection algorithm by counting the number of detected junctions as a function of the junction quality, and whether a prominent junction from a single object is extracted under an experimentally decided quality threshold. Section 8 presents the results of experiments using synthetic and real images. Finally, Section 9 summarizes the results and draws conclusions.
2 Previous Research
Guiding object recognition by matching perceptual groups of features was suggested by Lowe [6]. In SCERPO, his approach is to match a few significant groupings from certain arrangements of lines found in images. Lowe has successfully incorporated grouping into an object recognition system. First, he groups together lines thought particularly likely to come from the same object. Then, SCERPO looks for groups of lines that have some property invariant with the camera viewpoint. For this purpose, he proposes three major line groups – proximity, parallelism and collinearity.

Recent results in the field of object recognition, including those of Jacobs, Grimson and Huttenlocher, demonstrate the necessity of some type of grouping, or feature selection, to make the combinatorics of object recognition manageable [9,14]. Grouping, as for the nonaccidental image features, overcomes the unfavorable combinatorics of recognition by removing the need to search the space of all matches between image and model features. Grimson has shown that, for the recognition process in cluttered environments using a constrained search, an intermediate grouping process reduces the time complexity from exponential to a low-order polynomial [9]. Only those image features considered likely to come from a single object could be included together in hypothetical matches. And these groups need only be matched with compatible groups of model features. For example, in the case of a constrained tree search, grouping may tell us which parts of the search tree to explore first, or allow us to prune sections of the tree in advance.
This chapter is related to Lowe's work using perceptual groupings. However, the SCERPO grouping has a limitation: forming only small groups of lines limits the amount by which we may reduce the search. Our work extends the small grouping to bigger perceptual groups, including more complex shapes. Among Lowe's organization groups, the proximity consisting of two or more image lines is an important clue for starting object recognition. When projected to the image plane, most manmade objects may have a polyhedral plane in which two or several sides give line junctions. First, we introduce a quality measure to detect meaningful line junctions denoting the proximity. The quality measure must be carefully defined so as not to skip salient junctions in the input image. Then, the extracted salient junctions are combined to form more complex and important local line groups. The combination between junctions is guided by collinearity, another of Lowe's perceptual groups. Henikoff and Shapiro [15] effectively use an ordered set of three lines representing a line segment with junctions at both ends. In their work, the line triples, or their relations as a local representative pattern, broadly perform the object recognition and shape indexing. However, their system cannot define the line triple when the common line sharing two junctions is broken by image noise or object occlusion. And the triple and bigger local groups are separately defined in low-level detection and discrete relaxation, respectively. The proposed system in this chapter is able to form the line triple and bigger line groups in a consistent framework. Even when the common line is broken, the combination of the two junctions can be compensated by the collinearity of the broken lines. We introduce the following:
1. A robust and stable geometric representation that is based on the perceptual organizations (i.e. the representation as a primitive search node includes two or more perceptual grouping elements).
2. A consistent search framework combining the primitive geometric representations, based on the dynamic programming formulation.
3 Junction Extraction
A junction is defined as any pair of line segments that intersect, and whose intersection point either lies on one of the line segments, or does not lie on either of the line segments. An additional requirement is that the acute angle between the two lines must lie in a range θ_min to θ_max. In order to avoid ambiguity with parallel or collinear pairs [6], θ_min could be chosen to be a predefined threshold. Various junction types are well defined by Etemadi et al. [7].
Now a perfect junction (or two-line junction) is defined as one in which the intersection point P lies precisely at the end points of the line segments. Figure 18.1 shows the schematic diagram of a typical junction. Note that there are now two virtual lines that share the end point P. The points P1 and P4, located on the opposite sides of P2 and P3, denote the remaining end points of the virtual lines, respectively. Then, the junction quality factor is:
Figure 18.1 The junction.

… the two variance factors σ_i∥ and σ_i⊥ are ignored. The defined relation penalizes pairings in which either line is far away from the junction point. The quality factor also retains the symmetry property.
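The junction computation described above can be sketched in code. The sketch below is an illustration, not the chapter's exact Equation (18.1): the line intersection test is standard geometry, but the scoring formula (penalizing pairings whose intersection point lies far from the segments' end points, relative to segment length) and the function names are assumptions.

```python
import math

def intersect(l1, l2):
    """Intersection point of the infinite lines through segments l1, l2.
    Each segment is ((x1, y1), (x2, y2)). Returns None if (near-)parallel."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def junction_quality(l1, l2, theta_min=math.radians(10)):
    """Assumed quality in [0, 1]: 1.0 for a perfect junction whose
    intersection point P coincides with an end point of both segments."""
    p = intersect(l1, l2)
    if p is None:
        return 0.0  # parallel or collinear pair: no junction
    # the acute angle between the two lines must exceed theta_min
    ang1 = math.atan2(l1[1][1] - l1[0][1], l1[1][0] - l1[0][0])
    ang2 = math.atan2(l2[1][1] - l2[0][1], l2[1][0] - l2[0][0])
    acute = abs(ang1 - ang2) % math.pi
    acute = min(acute, math.pi - acute)
    if acute < theta_min:
        return 0.0
    q = 1.0
    for (p1, p2) in (l1, l2):
        length = math.dist(p1, p2)
        gap = min(math.dist(p, p1), math.dist(p, p2))
        q *= max(0.0, 1.0 - gap / length)  # penalize far-away junction points
    return q

# A perfect 'L' junction meeting exactly at (0, 0) scores 1.0:
q = junction_quality(((0, 0), (5, 0)), ((0, 0), (0, 4)))
```

Because each segment contributes a symmetric factor, swapping the two segments leaves the score unchanged, mirroring the symmetry property noted above.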
4 Energy Model for the Junction Groups
The relational representation, made from each contextual relation of the model and scene features, provides a reliable means to compute the correspondence information in the matching problem. Suppose that the model consists of M feature nodes. Then, a linked node chain, given by the sequential connection of the nodes, can be constructed.

If the selected features are sequentially linked, then it is possible to calculate a potential energy from the enumerated feature nodes. For example, assume that any two line features of the model correspond to two features f_I and f_{I+1} of the scene. If the relational configuration of each line node depends only on the connected neighboring nodes, then the energy potential obtained from the M line nodes can be represented as:

E_total(f_1, f_2, …, f_M) = E_1(f_1, f_2) + E_2(f_2, f_3) + ⋯ + E_{M−1}(f_{M−1}, f_M)   (18.2)

where
E_I(f_I, f_{I+1}) = α·|θ(f_I) − Θ_I| + |r(f_I, f_{I+1}) − R(I, I+1)|   (18.4)

Each junction has the unary angle relation from the two lines constituting a single junction, as shown in the first term of Equation (18.4) and in Figure 18.1. θ(f_I) and Θ_I are the corresponding junction angles in a scene and a model, respectively. We do not use a relation depending on line length, because lines in a noisy scene could easily be broken. The binary relation for the scene, r, and the model, R, in the
second term is defined as a topological constraint or an angle relation between two junctions. For example, the following descriptions can represent the binary relations.
1. Two lines 1 and 4 should be approximately parallel (parallelism).
2. Scene lines corresponding to two lines 2 and 3 must be a collinear pair [6] or the same line. That is, two junctions are combined by the collinear constraint.
3. Line ordering for the two junctions J1, J2 should be maintained, for example as clockwise or counter-clockwise, as the order of line 1, line 2, line 3 and line 4.

The relation defined by the two connected junctions includes all three perceptual organization groups that Lowe used in SCERPO. These local relations can be selectively imposed according to the type of the given problem. For example, a convex line triplet [15] is simply defined by removing the above constraint 1 and letting line 2 and line 3 of constraint 2 be equal to each other. The weighting factor compensates for the line perturbation due to image noise.
5 Energy Minimization
Dynamic Programming (DP) is an optimization technique suited to problems where not all variables are interrelated simultaneously [8,16]. Suppose that the global energy can be decomposed into the following form:
… to the cluttered background. Therefore, it is difficult to extract a meaningful object boundary that corresponds to the given model. In this case, the DP-based search structure is formulated as the columns in Figure 18.3(b), in which all detected scene features are simultaneously included in each column. Each junction of the model can get a corresponding junction in the scene as well as a null node, which indicates no correspondence. The potential matches are defined in the energy accumulation form of Equation (18.5). From the binary relations of junctions (i.e. the arrows in Figure 18.3(b)) defined between two neighboring columns, the local comparison-based method using the recursive energy accumulation table of Equation (18.5) can give a fast matching solution.
The DP algorithm generates a sequence that can be written in recursive form. For I = 1, …, M − 1,

D_I(f_{I+1}) = min_{f_I} [ D_{I−1}(f_I) + E_I(f_I, f_{I+1}) ]   (18.6)

with D_0(f_1) = 0. The minimal energy solution is obtained by
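The recursion of Equation (18.6) is a standard Viterbi-style chain minimization. A minimal sketch follows, with the candidate label sets and the pairwise energy E_I supplied by the caller (their concrete forms would come from Sections 3 and 4); the function name and interface are assumptions for illustration.

```python
def dp_match(candidates, energy):
    """Minimize E_total = sum_I E_I(f_I, f_{I+1}) over a chain of M nodes.

    candidates[I] lists the scene labels available for model node I;
    energy(I, a, b) is the pairwise term E_I between consecutive choices.
    Returns (minimal energy, optimal label sequence).
    """
    M = len(candidates)
    # D[f] accumulates the best cost of reaching label f at the current node
    D = {f: 0.0 for f in candidates[0]}             # D_0(f_1) = 0
    back = []
    for I in range(M - 1):                          # I = 1, ..., M-1 in the text
        D_next, ptr = {}, {}
        for g in candidates[I + 1]:
            f_best = min(candidates[I], key=lambda f: D[f] + energy(I, f, g))
            D_next[g] = D[f_best] + energy(I, f_best, g)
            ptr[g] = f_best
        D, back = D_next, back + [ptr]
    # trace back the minimal-energy assignment from the best final label
    last = min(D, key=D.get)
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return D[last], seq[::-1]
```

With N scene candidates per column, the table is filled in O(M·N²) time, which is the polynomial cost that makes the local comparison-based search fast compared with an exhaustive search over all assignments.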
6 Collinear Criterion of Lines
Extraction of image features such as points or lines is influenced by the conditions during image acquisition. Because image noise distorts the object shape in the images, we need to handle the effect of position perturbation in the features, and to decide a threshold or criterion to discard and reduce the excessive noise. In this section, the noise model and the error propagation for the collinearity test between lines are proposed. The Gaussian noise distribution for the two end points of a line is an effective and general approach, as referred to in Haralick [17] and Roh [18], etc. In this section, we use the Gaussian noise model to compute error propagation for two-line collinearity and obtain a threshold value, resulting from the error variance test, to decide whether two lines are collinear or not. The line collinearity can be decomposed into two terms: parallelism and the normal distance defined between the two lines being evaluated.
From these noisy measurements, we define the noisy parallel function:

p̃ = p(x̃_1, ỹ_1, x̃_2, ỹ_2, x̃_3, ỹ_3, x̃_4, ỹ_4)   (18.15)
Hence, for two given lines, we can determine a threshold:
The a_i, b_i and c_i are line coefficients for the i-th line; (x_m, y_m) denotes the center point coordinate of the first line, and (x′_m, y′_m) denotes the center of the second line. Similarly to the parallel case of Section 6.1, the normal distance is also a function of eight variables:

d_norm = d(x_1, x_2, x_3, x_4)   (18.22)

Through all processes similar to the noise model of Section 6.1, we obtain:

Var(p̃) = E[(p̃ − p)²]
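The error propagation above yields an analytic variance. As an illustrative stand-in, the sketch below estimates the variance of a parallelism measure numerically by sampling the Gaussian end point noise, then accepts parallelism when the noise-free measure is within k standard deviations. The particular measure (sine of the inter-line angle), the sampling approach and the factor k are assumptions, not the chapter's closed-form Equations (18.15)–(18.19).

```python
import math, random

def parallel_measure(l1, l2):
    """Sine of the angle between two segments: 0 for perfectly parallel lines."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    ux, uy = x2 - x1, y2 - y1
    vx, vy = x4 - x3, y4 - y3
    cross = ux * vy - uy * vx
    return abs(cross) / (math.hypot(ux, uy) * math.hypot(vx, vy))

def measure_std(l1, l2, sigma0, trials=2000, seed=0):
    """Sampled standard deviation of the measure under Gaussian end point
    noise N(0, sigma0^2) -- a numeric stand-in for analytic propagation."""
    rng = random.Random(seed)
    def jitter(p):
        return (p[0] + rng.gauss(0, sigma0), p[1] + rng.gauss(0, sigma0))
    vals = [parallel_measure((jitter(l1[0]), jitter(l1[1])),
                             (jitter(l2[0]), jitter(l2[1])))
            for _ in range(trials)]
    m = sum(vals) / trials
    return math.sqrt(sum((v - m) ** 2 for v in vals) / trials)

def is_parallel(l1, l2, sigma0=0.1, k=3.0):
    """Accept parallelism when the measured value is small enough to be
    explained by end point noise alone (within k standard deviations)."""
    return parallel_measure(l1, l2) <= k * measure_std(l1, l2, sigma0)
```

The same pattern applies to the normal-distance term of Equation (18.22): compute the noise-free value, estimate its variance under the end point noise model, and threshold at a multiple of the standard deviation, so that the only free parameter is σ₀ rather than a heuristic distance threshold.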
7 Robustness Test of Junction Detection
In this section, we test the robustness of the junction detection algorithm by counting the number of detected junctions as a function of the junction quality Q_J of Equation (18.1). Figure 18.4 shows some images of 2D and 3D objects under well-controlled lighting conditions and a cluttered outdoor scene. We use Lee's method [19] to extract lines.

Most junctions (i.e. more than 80%) extracted from possible combinations of the line segments are concentrated in the range 0.0∼0.1 of the quality measure, as shown in Figure 18.5. The three experimental sets of Figure 18.4 give similar tendencies, except for a small fluctuation at the quality measure 0.9, as shown in Figure 18.5. At the quality level 0.5, the occupied portion of the junctions relative to the whole range drops to less than 1%.
When robust line features are extracted, Q_J, as a threshold for the junction detection, does not severely influence the number of extracted junctions. In good conditions, the extracted lines are clearly defined along the object boundary and few cluttered lines exist in the scene, as shown in Figure 18.4(a). Therefore, the extracted junctions are accordingly well defined and have a high junction quality factor, as shown in Figure 18.4(a). The Parts plot in Figure 18.5 graphically shows the high-quality junctions as the peak concentrated in the neighborhood of quality measure 0.9. For Q_J = 0.7, the detection ratio of 1.24 (i.e. number of junctions/number of primitive lines) for Figure 18.4(a) decreases to 0.41 for the outdoor scene of Figure 18.4(c), indicating the increased effect of the threshold (see also Table 18.1). The effect of the threshold level on the number of junctions resulting from distorted and broken lines is more pronounced for outdoor scenes. That is, junction detection
Figure 18.4 Junction extraction: the number of junctions depends on the condition of the images. Each column consists of an original image, line segments, and junctions with their intersecting points for quality measure 0.5, respectively. The small circles in the figure represent the intersection points of two-line junctions. (a) Parts: a 2D scene under controlled lighting conditions; (b) blocks: an indoor image with good lighting; and (c) cars: a cluttered image from an uncontrolled outdoor road scene.
Figure 18.5 The occupying percentage of junctions according to changes of the quality measure.

Table 18.1 Junction number vs. quality measure.
To reduce repeated computation of the relations between line segments, all possible relations, such as inter-angles, collinear properties and junction qualities between lines, were computed in advance.
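Such a precomputation step might look like the following sketch, where the angle and quality measures of Sections 3 and 4 are passed in as functions (their concrete forms, and the function names here, are assumptions). Each pairwise value is computed once and looked up in O(1) during the DP search.

```python
import math
from itertools import combinations

def precompute_relations(lines, angle_fn, quality_fn):
    """Cache pairwise inter-angles and junction qualities for a list of
    segments, so the search never recomputes them. angle_fn(line) returns
    a segment orientation in [0, pi); quality_fn(l1, l2) returns the
    junction quality of a pair."""
    n = len(lines)
    inter_angle = {}
    quality = {}
    for i, j in combinations(range(n), 2):
        a = abs(angle_fn(lines[i]) - angle_fn(lines[j])) % math.pi
        inter_angle[i, j] = inter_angle[j, i] = min(a, math.pi - a)
        q = quality_fn(lines[i], lines[j])
        quality[i, j] = quality[j, i] = q  # the quality factor is symmetric
    return inter_angle, quality
```

Storing both (i, j) and (j, i) keys trades a little memory for branch-free lookups; for n lines the tables hold n(n−1) entries.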
8 Experimental Results

8.1 Line Group Extraction
As an example of 2D line matching under viewpoint variation, the rear view of a vehicle was used. The description of the model lines was given as a trapezoidal object. The model pattern has a clockwise combination of the constituting lines. Figure 18.6(a-1) shows the first and the last image of a sequence of 30 images to be tested. With the cluttered background lines, a meaningful boundary extraction of the object of interest was difficult, as shown in Figure 18.6(a-2). Figure 18.6(a-3) shows the extraction of junctions in the two frames. The threshold for the quality measure Q_J was set at 0.5. Figure 18.6(a-4) shows the optimal matching lines having the smallest accumulation energy of Equation (18.7). In spite of some variations from the model shape, a reasonable matching result was obtained. The unary and binary properties of Equation (18.4) were both used. Figure 18.6(b) shows a few optimal matching results. In Figure 18.6(b), the model shape is well matched as the minimum DP energy of Equation (18.7), in spite of the distorted shapes in the scenes. Matching was successful for 25 frames out of 30 – a success ratio of 83%. The failing cases result from line extraction errors in low-level processing, in which lines could not be defined on the rear windows of the vehicles.
Figure 18.7 Object matching in a synthetic image with broken and noisy lines.

Figure 18.7 shows experimental results for extracting a 2D polyhedral object. Figure 18.7(a) shows a model as a pentagon shape with noisy and broken lines in the background region. All lines except for the pentagon were randomly generated. Figure 18.7(b) shows a few matching results with small DP energy. A total of six candidates were extracted as the matched results for generating hypotheses for the object recognition. Each one is similar to the pentagon model shape. It is interesting to see the last result, because we did not expect this extraction.
Two topological combinations of line junctions are shown as model shapes in Figure 18.8. The J1 and J2 junctions are combined with a collinear constraint that also denotes the same rotating condition, the clockwise direction, in the case of Figure 18.8(a). The three binary relations of Section 4 all appear in the topology of Figure 18.8. In the combination of J2 and J3, the rotating direction between the two junctions is reversed. In Figure 18.8(b), a topology similar to Figure 18.8(a) is given, except for the difference in rotating direction of the constituting junctions. Figure 18.9 presents an example of extracting topological line groups to guide 3D object recognition. The topological shapes are invariant to wide changes of view. That is, if there is no self-occlusion on the object, the interesting line groups are possible to extract. Figure 18.9(a) shows the original image to be tested. After discarding the shorter lines, Figure 18.9(b) presents the extracted lines with the numbering indicating the line index, and Figure 18.9(c) and 18.9(d) give the matched line groups corresponding to the model shape of
… or scene lines has not been used, because the scene lines are easily broken in outdoor images. Angle relations and line ordering between two neighboring junctions are well preserved, even in broken and distorted line sets.
Figure 18.9 A topological shape extraction for 3D object recognition: (a) original image; (b) line extraction; and (c), (d) found topological shapes.
Table 18.2 The topological matching results.

(39, 79) (86, 41) (41, 70) (70, 69)
(39, 86) (86, 41) (41, 70) (70, 69)
(40, 41) (41, 80) (80, 38) (38, 56)
(106, 41) (41, 80) (80, 38) (38, 56)
(94, 41) (41, 80) (80, 38) (38, 56)
(116, 41) (41, 80) (80, 38) (38, 56)
8.2 Collinearity Tests for Random Lines
We tested the stability of the two collinear functions proposed in Section 6 by changing the standard deviation as a threshold for the four end points of the two constituting lines. The noise was modeled as Gaussian random noise having mean 0 and variance σ².

A total of 70 lines were randomly generated in a rectangular region of size 100×100, as shown in Figure 18.10; hence the number of possible line pairs was 70C2 = 2415. Finally, only two line pairs were selected as satisfying the two collinear conditions of Equation (18.19) and Equation (18.25) under σ₀ = 0.1. When we reduced the variance value, there were only a few collinear sets. By a simple definition of the collinear functions and control of the variance σ₀ as Gaussian perturbation, we could systematically obtain the collinear line set without referring to heuristic parameters.
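The scale of this experiment is easy to reproduce. The sketch below generates 70 random segments in a 100×100 region and enumerates the 70C2 = 2415 candidate pairs; the collinearity predicate applied to each pair would be the variance test of Section 6, and the generator function here is an illustrative assumption.

```python
import random
from itertools import combinations
from math import comb

def random_lines(n=70, size=100.0, seed=1):
    """n random segments with end points uniform in a size x size square."""
    rng = random.Random(seed)
    pt = lambda: (rng.uniform(0, size), rng.uniform(0, size))
    return [(pt(), pt()) for _ in range(n)]

lines = random_lines()
pairs = list(combinations(lines, 2))  # C(70, 2) = 2415 candidate pairs
```

Every pair is then tested against the two collinear conditions, so tightening σ₀ directly shrinks the accepted set without any per-image parameter tuning.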
Figure 18.10 Collinear lines according to the change of standard deviation σ₀ as a threshold: (a) the randomly generated original line set; (b) line pairs detected for σ₀ = 0.4; (c) for σ₀ = 0.3; and (d) for σ₀ = 0.1.
9 Conclusions
In this chapter, a fast and reliable matching and grouping method using dynamic programming is proposed to extract collections of salient line segments. We have considered classical dynamic programming as an optimization technique for geometric matching and grouping problems. First, the importance of grouping for object recognition was emphasized. It has long been known that, by grouping together line features that are likely to have been produced by a single object, significant speed-ups in a recognition system can be achieved, compared to performing a random search. This general fact was used as a motive to develop a new feature grouping method. We introduced a general way of representing line patterns and of using the patterns to consistently match 2D and 3D objects.

The main element in this chapter is a DP-based formulation for matching and grouping of line patterns, introducing a robust and stable geometric representation that is based on the perceptual organizations. The end point proximity and collinearity of image lines are introduced as the two main perceptual organizing groups to start the object matching or recognition. We detect the junctions as the end point proximity for the grouping of line segments. Then, we search for a junction group, in which each junction is combined with the others by the collinear constraint. These local primitives, by including Lowe's perceptual organizations and acting as the search nodes, are consistently linked in the DP-based search structure.

We could also impose several constraints, such as parallelism, the same-line condition and rotational direction, to increasingly narrow down the search space for possible objects and their poses. The model description is predefined for comparison with the scene relation. The collinear constraint acts to combine two junctions as neighbors of each other. The DP-based search algorithm reduces the time complexity of the search for the model chain in the scene.

Through experiments using images from cluttered scenes, including outdoor environments, we have demonstrated that the method can be applied to optimal matching, grouping of line segments and 2D/3D recognition problems, with a simple shape description sequentially represented.
References
[1] Ayache, N and Faugeras, O D “HYPER: A New Approach for the Recognition and Positioning
of Two-Dimensional Objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1),
pp 44–54, 1986.
[2] Ballard, D H and Brown, C M Computer Vision, Prentice Hall, 1982.
[3] Grimson, W E L and Lozano-Perez, T “Localizing overlapping parts by searching the interpretation tree,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, 9, pp 469–482, 1987.
[4] Hummel, R A and Zucker, S W “On the foundation of relaxation labeling processes,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, 5(3), pp 267–286, 1983.
[5] Li, S Z “Matching: invariant to translations, rotations and scale changes,” Pattern Recognition, 25,
pp 583–594, 1992.
[6] Lowe, D G “Three-Dimensional Object Recognition from Single Two-Dimensional Images,” Artificial
Intelligence, 31, pp 355–395, 1987.
[7] Etemadi, A., Schmidt, J P., Matas, G., Illingworth, J and Kittler, J “Low-level grouping of straight line
segments,” in Proceedings of 1991 British Machine Vision Conference, pp 118–126, 1991.
[8] Fischler, M and Elschlager, R “The representation and matching of pictorial structures,” IEEE Transactions
on Computers, C-22, pp 67–92, 1973.
[9] Grimson, W E L and Huttenlocher, D “On the Sensitivity of the Hough Transform for Object Recognition,”
Proceedings of the Second International Conference on Computer Vision, pp 700–706, 1988.
[10] Li, S Z Markov Random Field Modeling in Computer Vision, Springer Verlag, New York, 1995.
[11] Bunke, H and Buhler, U “Applications of Approximate String Matching to 2D Shape Recognition,” Pattern
Recognition, 26, pp 1797–1812, 1993.
[12] Cox, I J., Higorani, S L and Rao, S B “A Maximum Likelihood Stereo Algorithm,” Computer Vision and
Image Understanding, 63(3), pp 542–567, 1996.
[13] Ohta, Y and Kanade, T “Stereo by intra- and inter- scanline search using dynamic programming,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 7(2), pp 139–154, 1985.
[14] Jacobs, D W “Robust and Efficient Detection of Convex Groups,” IEEE Conference on Computer Vision and Pattern Recognition, pp 770–771, 1993.
[15] Henikoff, J and Shapiro, L G “Representative Patterns for Model-based Matching,” Pattern Recognition,
26, pp 1087–1098, 1993.
[16] Amini, A A., Weymouth, T E and Jain, R C “Using dynamic programming for solving variational problems
in vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9), pp 855–867, 1990.
[17] Haralick, Y S and Shapiro, R M “Error Propagation in Machine Vision,” Machine Vision and Applications,
7, pp 93–114, 1994.
[18] Roh, K S and Kweon, I S “2-D object recognition using invariant contour descriptor and projective
invariant,” Pattern Recognition, 31(4), pp 441–455, 1998.
[19] Lee, J W and Kweon, I S “Extraction of Line Features in a Noisy Image,” Pattern Recognition, 30(10),
pp 1651–1660, 1997.
Holo-extraction and Intelligent Recognition of Digital Curves Scanned from Paper Drawings
This chapter introduces a holo-extraction method of information from paper drawings, i.e. the networks of Single Closed Regions (SCRs) of black pixels, which not only provide a unified base for recognizing both annotations and the outlines of projections of parts, but can also build the holo-relationships among SCRs, so that it is convenient to extract lexical, syntactic and semantic information in the subsequent phases for 3D reconstruction. Based on the holo-extraction method, this chapter further introduces an intelligent recognition method of digital curves scanned from paper drawings for subsequent pattern recognition and 3D reconstruction.
1 Introduction
In order to survive worldwide competition, enterprises have tried to use the computer’s huge memory capacity, fast processing speed and user-friendly interactive graphics capabilities to automate and tie together cumbersome and separate engineering or production tasks, including design, analysis,
In order to convert 2D mechanical paper drawings into 3D computer feature models, the 2D paper drawings are first scanned by an optical scanner, and then the scanned results are input to a computer in the form of raster (binary) images. The conversion from the raster image to the 3D model needs two processes: understanding and 3D reconstruction. The research on the understanding process has been implemented in three phases [1]:
1 The lexical phase The raster image is converted into vectorized information, such as straight lines,
arcs and circles
2 The syntactic phase The outlines of orthographic projections of a part and the annotations are
separated; the dimension sets, including their values of both the nominal dimensions and thetolerances, are aggregated; the crosshatching patterns are identified; and the text is recognized
3. The semantic phase. Much semantic information needed for 3D reconstruction is obtained by functional analyses of each view, including the analysis of symmetries, the recognition of technologically meaningful entities from symbolic representation (such as bearings and threading), and so on.

After the understanding process, a 3D computer model is reconstructed by using geometric matching techniques with the recognized syntactic and semantic information.
Up to now, the research on the conversion from 2D paper drawings to 3D computer feature models has been stuck at low-level coding: essential vectorization, basic layer separation and very limited symbol recognition [1]. One of the reasons for this is that the three phases of the understanding process have been isolated, and researchers have worked on only one of the phases, since the whole conversion is complicated and difficult. For instance, the vectorization methods were developed only for getting straight lines, arcs, circles, etc., so that much information contained in the drawing was lost after vectorization. Also, in some research, different methods were developed and applied for recognizing the text and the outlines of orthographic projections of parts, respectively.
In fact, the 3D reconstruction needs not only the vectors themselves but also their relationships and the information indicated by various symbols, from which the syntactic and semantic information can be extracted later on. This chapter introduces a holo-extraction method of information from paper drawings, i.e. the networks of Single Closed Regions (SCRs) of black pixels, which not only provide a unified base for recognizing both annotations and the outlines of projections of parts, but also build the holo-relationships among SCRs, so that it is convenient to extract lexical, syntactic and semantic information in the subsequent phases for 3D reconstruction. Based on the holo-extraction method, this chapter further introduces an intelligent recognition method of digital curves scanned from paper drawings for subsequent pattern recognition and 3D reconstruction.
2 Review of Current Vectorization Methods
Vectorization is a process that finds the vectors (such as straight lines, arcs and circles) from the raster images. Much research work in this area has been done, and many vectorization methods and their software have been developed. Although the vectorization for the lexical phase is more mature than the technologies used for the other two higher level phases, it is still quite far from being perfect.
Current vectorization methods can be categorized into six types: Hough Transform (HT)-based methods [2], thinning-based methods, contour-based methods, sparse pixel-based methods, mesh pattern-based methods and black pixel region-based methods.
2.1 The Hough Transform-based Method
This visits each pixel of the image in the x–y plane, detects peaks in its transform m–c space, and uses each peak (m, c) to form a straight line defined by the equation y = mx + c.
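A peak at (m, c) in the accumulator corresponds to the line y = mx + c. A minimal voting sketch, where the slope and intercept grids are illustrative choices, not prescribed by the chapter:

```python
# Hedged sketch of slope-intercept Hough voting; the grid resolutions and
# ranges below are assumptions made for this example.
def hough_mc(points, m_bins=41, c_bins=41, m_range=(-2.0, 2.0), c_range=(-20.0, 20.0)):
    """Vote each black pixel (x, y) into an (m, c) accumulator for y = m*x + c."""
    m_lo, m_hi = m_range
    c_lo, c_hi = c_range
    m_step = (m_hi - m_lo) / (m_bins - 1)
    c_step = (c_hi - c_lo) / (c_bins - 1)
    acc = [[0] * c_bins for _ in range(m_bins)]
    for x, y in points:
        for i in range(m_bins):
            m = m_lo + i * m_step
            c = y - m * x                      # intercept implied by slope m
            j = int(round((c - c_lo) / c_step))
            if 0 <= j < c_bins:
                acc[i][j] += 1
    # the highest-voted cell gives the detected line's (m, c)
    _, i, j = max((acc[i][j], i, j) for i in range(m_bins) for j in range(c_bins))
    return m_lo + i * m_step, c_lo + j * c_step

# Pixels lying on y = 0.5*x + 3 should yield a peak near m = 0.5, c = 3.
pts = [(x, 0.5 * x + 3.0) for x in range(10)]
m, c = hough_mc(pts)
```

Real HT implementations usually vote in the angle–distance (rho–theta) parameterization instead, which avoids the unbounded slope of vertical lines; the m–c form is kept here only because it matches the chapter's description.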
2.3 The Contour-based Method
This first finds the contour of the line object and then calculates the middle points of the pairs of points on two opposite parallel contours or edges [8–10]. Although it is much faster than thinning-based methods and the line width is also much easier to obtain, joining up the lines at a merging junction or a cross intersection is problematic, and it is inappropriate for use in vectorization of curved and multi-crossing lines [3].

2.4 The Sparse Pixel-based Method
Here, the basic idea is to track the course of a one-pixel-wide ‘beam of light’, which turns orthogonally each time it hits the edge of the area covered by the black pixels, and to record the midpoint of each run [11]. With some improvement based on the orthogonal zig-zag, the sparse pixel vectorization algorithm can record the medial axis points and the width of run lengths [3].
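The midpoint-of-run idea can be illustrated with a much-simplified sketch. This is not the algorithm of [11]: sampling every `step`-th column of a roughly horizontal stroke, instead of tracking a turning beam, is my simplification.

```python
# Much-simplified illustration of recording run midpoints along a stroke;
# the real sparse pixel algorithm [11] tracks an orthogonally turning
# 'beam', which is not reproduced here.
def medial_points(image, step=2):
    """Sample every `step`-th column; where the column holds a single
    contiguous run of black pixels, record (x, midpoint_y, width)."""
    pts = []
    height, width = len(image), len(image[0])
    for x in range(0, width, step):
        ys = [y for y in range(height) if image[y][x] == 1]
        if ys and ys[-1] - ys[0] == len(ys) - 1:      # one contiguous run
            pts.append((x, (ys[0] + ys[-1]) / 2, len(ys)))
    return pts

# A three-pixel-thick horizontal stroke: every sampled midpoint is row 2.
img = [[0] * 8, [1] * 8, [1] * 8, [1] * 8, [0] * 8]
samples = medial_points(img)
```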
2.5 Mesh Pattern-based Methods
These divide the entire image using a certain mesh and detect characteristic patterns by only checking the distribution of the black pixels on the border of each unit of the mesh [12]. A control map for the image is then prepared using these patterns. Finally, the extraction of long straight-line segments is performed by analyzing the control map. This method not only needs a characteristic pattern database, but also requires much more processing time. Moreover, it is not suitable for the detection of more complex line patterns, such as arcs and discontinuous (e.g. dashed or dash-dotted) lines [3].
It can be seen from the current vectorization methods that, except for black pixel region graph-based methods, the other methods are mainly focused on the speed and accuracy of generating the vectors themselves, not on holo-extraction of information. Although black pixel region graph-based methods build certain relationships between constructed runs, rectangles or trapezoids, the regions are so small that they are not appropriate for curve vectorization, and it is difficult to construct the relationships among vectors.
In fact, the understanding process for the subsequent 3D reconstruction is an iterative process of searching for different level relationships and performing corresponding connections. For instance, linking certain related pixels can form a vector. Connecting a certain set of vectors can form primitives or characters of the text. Combining two projection lines and a dimension line containing two arrowheads with the values for the nominal dimension and tolerance can produce a dimension set, which can then be used with a corresponding primitive for parametric modeling. Aggregating the equally spaced parallel thin lines and the contour of their area can form a section, which can then be used with certain section symbols for solid modeling. Connecting certain primitives referring to corresponding recognized symbols can form certain features (e.g. bearings and threading), which can be used for feature modeling. Matching primitives in different views according to orthogonal projective relationships can produce a 3D model. If the primitives extracted are accurate, their projective relationships can be determined by analyzing the coordinates of the end points of these vectors. But in paper drawings the primitives extracted and their projective relationships are inaccurate, so this method cannot be applied. An expert system is needed that simulates the experienced human designer's way of thinking to transform the inaccurate outlines of parts' orthographic projections into 3D object images, so that their relationships become more important and crucial. As mentioned in the first section of this chapter, the vectorization process in the first phase should not lose the information in a drawing or the information needed for 3D reconstruction, which is mainly the different level relationships contained in the raster image. Accordingly, a holo-extraction of information from the raster image is needed. In order to facilitate the iterative process of searching for different level relationships and performing corresponding connections, the method needs a compact representation of the raster image as a bridge from the raster image to understanding, which should satisfy the following requirements:
• it can distinguish different types of linking point for different relationships of the related elements (e.g. tangential point, intersecting point and merging junction), to provide necessary information for extracting lexical, syntactic and semantic information in the subsequent phases;
• it can provide a unified base for further recognizing both the outlines of orthogonal projections of parts and the annotations, and facilitate their separation;
• it can recognize line patterns and arrowheads, and facilitate the aggregation of related elements to form certain syntactic information, such as dimension sets;
• it can facilitate recognizing vectors quickly and precisely, including straight lines, arcs and circles;
• it can provide holo-graphs of all the elements as a base for the subsequent 3D reconstruction.

The networks of SCRs reported in this chapter are developed for these purposes.
3 Construction of the Networks of SCRs
The elements and workflow of constructing the networks of SCRs are shown in Figure 19.1 and illustrated in more detail as follows.
3.1 Generating Adjacency Graphs of Runs
Let the visiting sequence be from top to bottom for a raster image, and from left to right along the horizontal direction for each row. When visiting a row, the first black pixel converted from a white pixel is the starting point of a run, and the last black pixel, if the next pixel is white, is the end point of the run. The value of its end coordinate minus its starting coordinate in the horizontal direction is its run length, the unit of which is the pixel. If the difference of two runs' vertical coordinates is 1 and the value of their minimal end coordinate minus their maximal starting coordinate is larger than or equal to 1, the two runs are adjacent. If A and B are adjacent and A's vertical coordinate is larger than that of B, A is called the predecessor of B, and B is called the successor of A. There are seven types of run according to their adjacency relationships with other runs, as shown in Figure 19.2.
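These definitions can be sketched in a self-contained way as follows. The function and variable names are mine, and since the chapter's coordinate convention leaves the exact overlap threshold ambiguous, adjacency is assumed here to mean at least one shared pixel column on consecutive rows.

```python
# Sketch of runs and their adjacency graph; names are illustrative, and
# overlap of at least one pixel column is assumed for adjacency (the
# chapter's exact coordinate convention may differ).
def build_run_graph(image):
    runs = []                                  # (row, x_start, x_end)
    for y, row in enumerate(image):
        start = None
        for x, px in enumerate(row + [0]):     # sentinel closes a trailing run
            if px and start is None:
                start = x                      # first black pixel after white
            elif not px and start is not None:
                runs.append((y, start, x - 1)) # last black pixel before white
                start = None
    pred = {i: [] for i in range(len(runs))}   # adjacent runs in the row above
    succ = {i: [] for i in range(len(runs))}   # adjacent runs in the row below
    for i, (ya, sa, ea) in enumerate(runs):
        for j, (yb, sb, eb) in enumerate(runs):
            if yb - ya == 1 and min(ea, eb) >= max(sa, sb):
                succ[i].append(j)
                pred[j].append(i)
    return runs, pred, succ

# A 'Y' shape: the two top runs merge into the single run below them.
img = [[1, 1, 0, 1, 1],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 0, 0]]
runs, pred, succ = build_run_graph(img)
```

Run 2 here has two predecessors and one successor, i.e. it is a merging run in the chapter's terminology; a run with two successors would be a branching run, and one with both would be a cross run.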
By recording the adjacency relationships of runs while visiting in the assumed way, the adjacency graphs of runs can be made using a node to represent a run and a line between two nodes to represent an adjacency relationship between the two runs. Different properties of the runs (including their starting coordinates, end coordinates, lengths and types) are stored in the node data file. Figure 19.3(b) shows an adjacency graph of runs generated from the run graph in Figure 19.3(a). It is obvious that the adjacency graph of runs represents not only the runs themselves, but also their adjacency relationships.

Figure 19.1 Elements and workflow of constructing the networks of SCRs: raster image → generate adjacency graphs of runs → construct closed regions → split closed regions into single closed regions → build adjacency graphs of single closed regions → construct networks of single closed regions.

Figure 19.3 The adjacency graph of runs for a run graph.
3.2 Constructing Single Closed Regions (SCRs)
Based on the adjacency graphs of runs constructed, a higher level representation of the raster image can be obtained by aggregating related runs into closed regions. These related runs must satisfy the following requirements:
• they are adjacent to each other;
• the beginning run is not a branching run or a cross run, and the end run is not a merging run or a cross run;
• the difference between their lengths is less than their shortest run length.
Starting from the first run, the related runs can be searched via the adjacency graphs of runs, and their starting points and end points are then recorded to construct closed regions. Figure 19.4(c) shows a closed region graph constructed from the run graph in Figure 19.4(b) for the outlines in Figure 19.4(a).
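One possible reading of these aggregation rules is sketched below on a hand-written adjacency graph. The data, the names and the exact stopping conditions are my interpretation of the requirements above, not the chapter's algorithm.

```python
# Illustrative sketch of growing a closed region along the adjacency graph;
# runs, pred and succ are hand-written here (in practice they come from the
# adjacency-graph construction), and the stop rules are one reading of the
# requirements above.
runs = [(0, 0, 4), (1, 0, 4), (2, 0, 4), (3, 0, 1), (3, 3, 4)]  # (row, x0, x1)
succ = {0: [1], 1: [2], 2: [3, 4], 3: [], 4: []}
pred = {0: [], 1: [0], 2: [1], 3: [2], 4: [2]}

def run_length(i):
    _, s, e = runs[i]
    return e - s + 1

def grow_region(start):
    """Follow a single-successor chain from `start`; stop before a run that
    merges or branches, or whose length differs too much from the current run."""
    region, i = [start], start
    while len(succ[i]) == 1:
        j = succ[i][0]
        if len(pred[j]) > 1 or len(succ[j]) > 1:
            break                      # j merges or branches: close the region
        if abs(run_length(i) - run_length(j)) >= min(run_length(i), run_length(j)):
            break                      # run lengths diverge: close the region
        region.append(j)
        i = j
    return region

region = grow_region(0)                # runs 0 and 1 form one closed region
```

Run 2 branches into runs 3 and 4, so the region grown from run 0 closes before it; the branching run and its two successors would start regions of their own.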
According to this method, the possible results obtained may be the closed region for a point, a straight line, an arc, an arrowhead or a combined line, as shown in Figure 19.5(a), (b), (c), (d) and (e) respectively, which are identified using a fuzzy logic method [16]. A point may be an independent point, branching point, merging point, cross point or tangential point, as shown in Figures 19.5(a), 19.6(a), 19.6(b), 19.6(c) and 19.6(d) respectively. A straight line can be a sloping, vertical or horizontal straight line.
Although the closed region for a point, a straight line, an arc or an arrowhead is a single closed region, the closed region for a combined line is not an SCR, since a combined line may consist of straight lines and/or arcs, which are linked or tangential to each other. In order to construct the adjacency graphs of SCRs, the closed region for a combined line should be decomposed into several SCRs, each of which represents a straight line or an arc. The method for decomposition is to find the optimal split points giving the least number of segments, each of which can be fitted to a straight line or an arc with least error. This is an optimization problem, which can be solved by using a genetic algorithm [17] and will be introduced in detail in Section 6. For example, with this method,