2.4.5 Image Recognition and Decisions 2.4.5.1 Neural Networks Arti®cial neural networks ANNs can be used in image processing applications.. 2.4.6 Image Processing Applications Arti®cial
Basic operations, like linear filtering and modulation, are easily described in the Fourier domain. A common example of Fourier transforms can be seen in the appearance of stars. A star looks like a small point of twinkling light. However, the small point of light we observe is actually the far-field Fraunhofer diffraction pattern, or Fourier transform, of the image of the star. The twinkling is due to the motion of our eyes. The moon looks quite different, since we are close enough to view the near-field, or Fresnel, diffraction pattern.
While the most common transform is the Fourier transform, there are also several closely related transforms. The Hadamard, Walsh, and discrete cosine transforms are used in the area of image compression. The Hough transform is used to find straight lines in a binary image. The Hotelling transform is commonly used to find the orientation of the maximum dimension of an object [5].
2.4.2.1 Fourier Transform

The one-dimensional Fourier transform may be written as

F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-j2\pi ux}\, dx
Figure 6 Images at various gray-scale quantization ranges.

Figure 7 Digitized image.

Figure 8 Color cube shows the three-dimensional nature of color.

Figure 9 Image surface and viewing geometry effects.
In the two-dimensional case, the Fourier transform and its corresponding inverse representation are

F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{-j2\pi(ux+vy)}\, dx\, dy

f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u,v)\, e^{j2\pi(ux+vy)}\, du\, dv

The discrete two-dimensional Fourier transform and corresponding inverse relationship may be written as

F(u,v) = \frac{1}{MN} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}

f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{j2\pi(ux/M + vy/N)}
The convolution theorem, which states that the input and output of a linear, position-invariant system are related by a convolution, is an important principle. The basic idea of convolution is that if we have two images, for example, pictures A and B, then the convolution of A and B means repeating the whole of A at every point in B, or vice versa. An example of the convolution theorem is shown in Fig. 12. The convolution theorem enables us to do many important things. During the Apollo 13 space flight, the astronauts took a photograph of their damaged spacecraft, but it was out of focus. Image processing methods allowed such an out-of-focus picture to be put back into focus and clarified.
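The theorem is easy to verify numerically: convolving two arrays directly gives the same result as multiplying their Fourier transforms and transforming back. Below is a minimal sketch using NumPy and SciPy; the random test pattern and the 9x9 averaging kernel are arbitrary choices for illustration, not from the text.

```python
import numpy as np
from scipy import signal

# Two small "images": A is a 64x64 test pattern, B a 9x9 blur kernel.
rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = np.ones((9, 9)) / 81.0          # simple averaging (blur) kernel

# Direct spatial convolution.
spatial = signal.convolve2d(A, B, mode="full")

# Convolution theorem: convolution in space is multiplication in frequency.
shape = spatial.shape               # zero-pad both to the full output size
freq = np.fft.ifft2(np.fft.fft2(A, shape) * np.fft.fft2(B, shape)).real

print(np.allclose(spatial, freq))   # True (up to floating-point error)
```

For large kernels the frequency-domain route is also the faster one, which is why many libraries switch to it automatically.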
2.4.3 Image Enhancement

Image enhancement techniques are designed to improve the quality of an image as perceived by a human [1]. Some typical image enhancement techniques include gray-scale conversion, histogram modification, color composition, etc. The aim of image enhancement is to improve the interpretability or perception of information in images for human viewers, or to provide ``better'' input for other automated image processing techniques.
2.4.3.1 Histograms

The simplest types of image operations are point operations, which are performed identically on each point in an image. One of the most useful point operations is based on the histogram of an image.
Figure 10 Diffuse surface reflection.

Figure 11 Specular reflection.
Equalizing the histogram of the image enables us to generate another image with a gray-level distribution having a uniform density. This transformation can be implemented by a three-step process:

1. Compute the histogram of the image.
2. Compute the cumulative distribution of the gray levels.
3. Replace the original gray-level intensities using the mapping determined in step 2.
After these processes, the original image, shown in Fig. 13, can be transformed, scaled, and viewed as shown in Fig. 16. The new gray-level value set S_k, which represents the cumulative sum, is

S_k = {1/7, 2/7, 5/7, 5/7, 5/7, 6/7, 6/7, 7/7}
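The three-step process translates into only a few lines of code. The sketch below is a minimal illustration for images already quantized to a small number of gray levels, as in the 3-bit example above; the random test image is an assumption for demonstration.

```python
import numpy as np

def equalize(image, levels=8):
    """Histogram-equalize an image quantized to `levels` gray levels."""
    # Step 1: histogram of the gray levels.
    hist = np.bincount(image.ravel(), minlength=levels)

    # Step 2: cumulative distribution of the gray levels.
    cdf = np.cumsum(hist) / image.size          # values in (0, 1]

    # Step 3: replace each original level using the cumulative-sum
    # mapping, scaled back onto the available gray levels.
    mapping = np.round(cdf * (levels - 1)).astype(image.dtype)
    return mapping[image]

# 3-bit example image (gray levels 0..7), as in the text.
img = np.random.default_rng(1).integers(0, 8, size=(16, 16), dtype=np.uint8)
print(equalize(img))
```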
Histogram Specification. Even after the equalization process, certain levels may still dominate the image so that the eye cannot interpret the contribution of the other levels. One way to solve this problem is to specify a histogram distribution that enhances selected gray levels relative to others and then reconstitute the original image in terms of the new distribution. For example, we may decide to reduce the levels between 0 and 2, the background levels, and increase the levels between 5 and 7 correspondingly. After steps similar to those of histogram equalization, we get the new gray-level set S'_k:

S'_k = {1/7, 5/7, 6/7, 6/7, 6/7, 6/7, 7/7, 7/7}
By placing these values into the image, we get the new histogram-specified image shown in Fig. 17.

Image Thresholding. This is the process of separating an image into different regions, which may be based upon its gray-level distribution. Figure 18 shows how an image looks after thresholding.
Figure 15 An example of histogram equalization: (a) original image, (b) histogram, (c) equalized histogram, (d) enhanced image.

Figure 16 Original image before histogram equalization.
Next, we shift the window one pixel to the right and repeat the calculation. After calculating all the pixels in the line, we then reposition the matrix one pixel down and repeat this procedure. At the end of the entire process, we have a set of T values, which enable us to determine the existence of the edge. Depending on the values used in the mask template, various effects such as smoothing or edge detection will result.

Since edges correspond to areas in the image where the image varies greatly in brightness, one idea would be to differentiate the image, looking for places where the magnitude of the derivative is large. The only drawback to this approach is that differentiation enhances noise. Thus, it needs to be combined with smoothing.
Smoothing Using Gaussians. One form of smoothing the image is to convolve the image intensity with a Gaussian function. Let us suppose that the image is of infinite extent and that the image intensity is I(x, y). The Gaussian is a function of the form

G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/2\sigma^2}    (12)

The result of convolving the image with this function is equivalent to lowpass filtering the image. The higher the sigma, the greater the lowpass filter's effect. The filtered image is

\tilde{I}(x,y) = I(x,y) * G(x,y)    (13)
One effect of smoothing with a Gaussian function is a reduction in the amount of noise, because of the lowpass characteristic of the Gaussian function. Figure 20 shows the image with noise added to the original, Fig. 19. Figure 21 shows the image filtered by a lowpass Gaussian function with sigma = 3.
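In practice the convolution of Eq. (13) is rarely written out by hand. A minimal sketch using SciPy's Gaussian filter follows; the synthetic test image (a bright square plus noise) is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

# Noisy test image: a bright square on a dark background plus noise.
rng = np.random.default_rng(0)
image = np.zeros((128, 128))
image[32:96, 32:96] = 1.0
noisy = image + 0.2 * rng.standard_normal(image.shape)

# Convolve with a Gaussian, Eq. (13); sigma controls the lowpass cutoff:
# larger sigma removes more noise but blurs edges more.
smoothed = ndimage.gaussian_filter(noisy, sigma=3)
```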
Vertical Edges. To detect vertical edges we first convolve with a Gaussian function and then differentiate the resultant image in the x-direction:

I'(x,y) = \frac{\partial}{\partial x}\left[ I(x,y) * G(x,y) \right]    (14)

This is the same as convolving the image with the derivative of the Gaussian function in the x-direction, that is,

\frac{\partial G}{\partial x} = -\frac{x}{2\pi\sigma^4}\, e^{-(x^2+y^2)/2\sigma^2}    (15)

Then, one marks the peaks in the resultant image that are above a prescribed threshold as edges (the threshold is chosen so that the effects of noise are minimized). The effect of doing this on the image of Fig. 21 is shown in Fig. 22.
Horizontal Edges. To detect horizontal edges we first convolve with a Gaussian and then differentiate the resultant image in the y-direction. But this is the same as convolving the image with the derivative of the Gaussian function in the y-direction, that is,

\frac{\partial G}{\partial y} = -\frac{y}{2\pi\sigma^4}\, e^{-(x^2+y^2)/2\sigma^2}    (16)
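Equations (14)-(16) combine smoothing and differentiation into a single convolution, which is how most libraries implement them. A sketch follows, reusing the synthetic noisy image from the smoothing example; the threshold value is an arbitrary illustrative choice.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = np.zeros((128, 128))
image[32:96, 32:96] = 1.0
noisy = image + 0.2 * rng.standard_normal(image.shape)

# order=1 along an axis convolves with the Gaussian's first derivative
# along that axis (axis 1 is x, axis 0 is y), implementing Eqs. (14)-(16).
grad_x = ndimage.gaussian_filter(noisy, sigma=3, order=(0, 1))  # vertical edges
grad_y = ndimage.gaussian_filter(noisy, sigma=3, order=(1, 0))  # horizontal edges

# Mark peaks above a prescribed threshold as edges, suppressing noise.
threshold = 0.05
edges = (np.abs(grad_x) > threshold) | (np.abs(grad_y) > threshold)
```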
Figure 19 A digital image from a camera.

Figure 20 The original image corrupted with noise.

Figure 21 The noisy image filtered by a Gaussian of variance 3.
Stereometry. This is the technique of deriving a range image from a stereo pair of brightness images. It has long been used as a manual technique for creating elevation maps of the earth's surface.

Stereoscopic Display. If it is possible to compute a range image from a stereo pair, then it should be possible to generate a stereo pair given a single brightness image and a range image. In fact, this technique makes it possible to generate stereoscopic displays that give the viewer a sensation of depth.

Shaded Surface Display. By modeling the imaging system, one can compute the digital image that would result if the object existed and if it were digitized by conventional means. Shaded surface display grew out of the domain of computer graphics and has developed rapidly in the past few years.
2.4.5 Image Recognition and Decisions

2.4.5.1 Neural Networks

Artificial neural networks (ANNs) can be used in image processing applications. Initially inspired by biological nervous systems, the development of artificial neural networks has more recently been motivated by their applicability to certain types of problem and their potential for parallel processing implementations.
Biological Neurons. There are about a hundred billion neurons in the brain, and they come in many different varieties, with a highly complicated internal structure. Since we are more interested in large networks of such units, we will avoid a great level of detail, focusing instead on their salient computational features. A schematic diagram of a single biological neuron is shown in Fig. 27.

The cells at the neuron connections, or synapses, receive information in the form of electrical pulses from the other neurons. The synapses connect to the cell inputs, or dendrites, and the electrical signal output of the neuron is carried by the axon. An electrical pulse is sent down the axon, or the neuron ``fires,'' when the total input stimuli from all of the dendrites exceed a certain threshold. Interestingly, this local processing of interconnected neurons results in self-organized emergent behavior.
Artificial Neuron Model. The most commonly used neuron model, depicted in Fig. 28, is based on the model proposed by McCulloch and Pitts in 1943 [11]. In this model, each neuron's input, a_1, ..., a_n, is weighted by the values w_{i1}, ..., w_{in}. A bias, or offset, in the node is characterized by an additional constant input w_0. The output, a_i, is obtained in terms of the equation

a_i = f\left( w_0 + \sum_{j=1}^{n} w_{ij} a_j \right)

where f is a hard-threshold activation function.

Figure 26 Edges of the original image.

Figure 27 A schematic diagram of a single biological neuron.

Figure 28 ANN model proposed by McCulloch and Pitts in 1943.
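A McCulloch-Pitts unit is small enough to write out directly. The sketch below uses a hard threshold for f and hand-picked weights realizing a two-input AND gate; the gate and the specific values are illustrative assumptions.

```python
import numpy as np

def neuron(a, w, w0):
    """McCulloch-Pitts unit: weighted sum of the inputs plus a bias,
    passed through a hard threshold."""
    return 1 if np.dot(w, a) + w0 > 0 else 0

# Two-input AND gate realized by choosing the weights and bias by hand.
w, w0 = np.array([1.0, 1.0]), -1.5
for a in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(a, neuron(np.array(a), w, w0))
```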
Feedforward and Feedback Networks. Figure 29 shows a feedforward network in which the neurons are organized into an input layer, hidden layer or layers, and an output layer. The values for the input layer are set by the environment, while the output layer values, analogous to a control signal, are returned to the environment. The hidden layers have no external connections; they only have connections with other layers in the network. In a feedforward network, a weight w_{ij} is only nonzero if neuron i is in one layer and neuron j is in the previous layer. This ensures that information flows forward through the network, from the input layer to the hidden layer(s) to the output layer. More complicated forms for neural networks exist and can be found in standard textbooks.
Training a neural network involves determining the weights w_{ij} such that an input layer presented with information results in the output layer having a correct response. This training is the fundamental concern when attempting to construct a useful network.
Feedback networks are more general than feedforward networks and may exhibit different kinds of behavior. A feedforward network will normally settle into a state that is dependent on its input state, but a feedback network may proceed through a sequence of states, even though there is no change in the external inputs to the network.
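The layered structure translates directly into code: each layer's output depends only on the previous layer's output. Below is a minimal forward pass for a network with one hidden layer; the layer sizes, the random untrained weights, and the sigmoid activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 4 inputs, one hidden layer of 5 neurons, 2 outputs.
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    hidden = sigmoid(W1 @ x + b1)      # hidden layer sees only the input layer
    return sigmoid(W2 @ hidden + b2)   # output layer sees only the hidden layer

print(forward(np.array([0.5, -0.2, 0.1, 0.9])))
```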
2.4.5.2 Supervised Learning and Unsupervised Learning

Image recognition and decision making is a process of discovering, identifying, and understanding patterns that are relevant to the performance of an image-based task. One of the principal goals of image recognition by computer is to endow a machine with the capability to approximate, in some sense, a similar capability in human beings. For example, in a system that automatically reads images of typed documents, the patterns of interest are alphanumeric characters, and the goal is to achieve character recognition accuracy that is as close as possible to the superb capability exhibited by human beings for performing such tasks.

Image recognition systems can be designed and implemented for limited operational environments. Research in biological and computational systems is continually discovering new and promising theories to explain human visual cognition. However, we do not yet know how to endow these theories and applications with a level of performance that even comes close to emulating human capabilities in performing general image decision functionality. For example, some machines are capable of reading printed, properly formatted documents at speeds that are orders of magnitude faster than the speed that the most skilled human reader could achieve. However, systems of this type are highly specialized and thus have little extendibility. That means that current theoretical and implementation limitations in the field of image analysis and decision making imply solutions that are highly problem dependent.

Different formulations of learning from an environment provide different amounts and forms of information about the individual and the goal of learning. We will discuss two different classes of such formulations of learning.
Supervised Learning. For supervised learning, a ``training set'' of inputs and outputs is provided. The weights must then be determined to provide the correct output for each input. During the training process, the weights are adjusted to minimize the difference between the desired and actual outputs for each input pattern.

If the association is completely predefined, it is easy to define an error metric, for example mean-squared error, of the associated response. This in turn gives us the possibility of comparing the performance with the predefined responses (the ``supervision''), changing the learning system in the direction in which the error diminishes.

Figure 29 A feedforward neural network.
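A minimal concrete instance of this idea is the least-mean-squares (delta) rule for a single linear neuron: each weight update follows the error between the desired and actual outputs. The training data, learning rate, and target weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))      # training inputs
true_w = np.array([1.0, -2.0, 0.5])    # "unknown" target association
d = X @ true_w                         # desired outputs

w = np.zeros(3)                        # weights to be determined
eta = 0.05                             # learning rate
for _ in range(50):
    for x, target in zip(X, d):
        error = target - w @ x         # desired minus actual output
        w += eta * error * x           # step in the error-diminishing direction

print(np.round(w, 3))                  # converges toward true_w
```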
Unsupervised Learning. The network is able to discover statistical regularities in its input space and can automatically develop different modes of behavior to represent different classes of inputs. In practical applications, some ``labeling'' is required after training, since it is not known at the outset which mode of behavior will be associated with a given input class. Since the system is given no information about the goal of learning, all that is learned is a consequence of the learning rule selected, together with the individual training data. This type of learning is frequently referred to as self-organization.
A particular class of unsupervised learning rule which has been extremely influential is Hebbian learning [12]. The Hebb rule acts to strengthen often-used pathways in a network, and was used by Hebb to account for some of the phenomena of classical conditioning.
Primarily, some type of regularity in the data can be learned by this learning system. The associations found by unsupervised learning define representations optimized for their information content. Since one of the problems of intelligent information processing deals with selecting and compressing information, the role of unsupervised learning principles is crucial for the efficiency of such intelligent systems.
2.4.6 Image Processing Applications

Artificial neural networks can be used in image processing applications. Many of the techniques used are variants of other commonly used methods of pattern recognition. However, other approaches to image processing may require modeling of the objects to be found within an image, while neural network models often work by a training process. Such models also need attention devices, or invariant properties, as it is usually infeasible to train a network to recognize instances of a particular object class in all orientations, sizes, and locations within an image.
One method commonly used is to train a network using a relatively small window for the recognition of objects to be classified, then to pass the window over the image data in order to locate the sought object, which can then be classified once located. In some engineering applications this process can be performed by image preprocessing operations, since it is possible to capture the image of objects in a restricted range of orientations with predetermined locations and appropriate magnification.

Before the recognition stage, decisions such as which image transform is to be used have to be made. These transformations include Fourier transforms, or the use of polar coordinates or other specialized coding schemes, such as the chain code. One interesting neural network model is the neocognitron model of Fukushima and Miyake [13], which is capable of recognizing characters in arbitrary locations, sizes, and orientations by the use of a multilayered network.

For machine vision, the particular operations include setting the quantization levels for the image, normalizing the image size, rotating the image into a standard orientation, filtering out background detail, contrast enhancement, and edge detection. Standard techniques are available for these, and it may be more effective to use them before presenting the transformed data to a neural network.
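The window-scanning method described above can be sketched as follows. The window size, step, and the stand-in classifier are illustrative assumptions; in a real system the classifier would be the trained network.

```python
import numpy as np

def sliding_windows(image, size, step=1):
    """Pass a small window over the image; a trained classifier then
    decides, window by window, whether the sought object is present."""
    rows, cols = image.shape
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            yield r, c, image[r:r + size, c:c + size]

def classify(window):
    # Placeholder for a trained network; here, a crude brightness test.
    return window.mean() > 0.8

image = np.random.default_rng(0).random((64, 64))
hits = [(r, c) for r, c, win in sliding_windows(image, size=8) if classify(win)]
```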
2.4.6.1 Steps in Setting Up an Application

The main steps are shown below:

Physical setup: light source, camera placement, focus, field of view.
Software setup: window placement, threshold, image map.
Feature extraction: region shape features, gray-scale values, edge detection.
Decision processing: decision function, training, testing.
2.4.7 Future Development of Machine Vision

Although image processing has been successfully applied to many industrial applications, there are still many definitive differences and gaps between machine vision and human vision. Past successful applications have not always been attained easily. Many difficult problems have been solved one by one, sometimes by simplifying the background and redesigning the objects. Machine vision requirements are sure to increase in the future, as the ultimate goal of machine vision research is obviously to approach the capability of the human eye. Although it seems extremely difficult to attain, it remains a challenge to achieve highly functional vision systems.

The narrow dynamic range of detectable brightness causes a number of difficulties in image processing. A novel sensor with a wide detection range will drastically change the impact of image processing. As microelectronics technology progresses, three-dimensional compound sensor LSIs (large-scale integrated circuits) are also anticipated, to which at least preprocessing capability should be provided.
As to image processors themselves, the local parallel pipelined processor may be further improved to provide higher processing speeds. At the same time, the multiprocessor image processor may be applied in industry when the key processing element becomes more widely available. The image processor will become smaller and faster, and will have new functions, in response to the advancement of semiconductor technology, such as progress in system-on-chip configurations and wafer-scale integration. It may also be possible to realize one-chip intelligent processors for high-level processing, and to combine these with one-chip rather low-level image processors to achieve intelligent processing, such as knowledge-based or model-based processing. Based on these new developments, image processing and the resulting machine vision improvements are expected to generate new values not merely for industry but for all aspects of human life.
2.5 MACHINE VISION APPLICATIONS

Machine vision applications are numerous, as shown in the following list.

Inspection:
    Surface contour accuracy
Part identification and sorting:
    Sorting
    Shape recognition
    Inventory monitoring
    Conveyor picking (nonoverlapping parts)
    Conveyor picking (overlapping parts)
    Bin picking
Industrial robot control:
    Tracking
    Seam welding guidance
    Part positioning and location determination
2.5.1 Overview

High-speed production lines, such as stamping lines, use machine vision to meet online, real-time inspection needs. Quality inspection involves deciding whether parts are acceptable or defective, then directing motion control equipment to reject or accept them. Machine guidance applications improve the accuracy and speed of robots and automated material handling equipment. Advanced systems enable a robot to locate a part or an assembly regardless of rotation or size. In gaging applications, a vision system works quickly to measure a variety of critical dimensions. The reliability and accuracy achieved with these methods surpasses anything possible with manual methods.

In the machine tool industry, applications for machine vision include sensing tool offset and breakage, verifying part placement and fixturing, and monitoring surface finish. A high-speed processor that once cost $80,000 now uses digital signal processing chip technology and costs less than $10,000. The rapid growth of machine vision usage in electronics, assembly systems, and continuous process monitoring created an experience base and tools not available even a few years ago.
2.5.2 Inspection

The ability of an automated vision system to recognize well-defined patterns and determine if these patterns match those stored in the system's CPU memory makes it ideal for the inspection of parts, assemblies, containers, and labels. Two types of inspection can be performed by vision systems: quantitative and qualitative. Quantitative inspection is the verification that measurable quantities fall within desired ranges of tolerance, such as dimensional measurements and the number of holes. Qualitative inspection is the verification that certain components or properties are present and in a certain position, such as defects, missing parts, extraneous components, or misaligned parts.
Many inspection tasks involve comparing the given object with a reference standard and verifying that there are no discrepancies. One method of inspection is called template matching. An image of the object is compared with a reference image, pixel by pixel. A discrepancy will generate a region of high differences. On the other hand, if the observed image and the reference are slightly out of registration, differences will be found along the borders between light and dark regions in the image. This is because a slight misalignment can lead to dark pixels being compared with light pixels.
A more flexible approach involves measuring a set of the image's properties and comparing the measured values with the corresponding expected values. An example of this approach is the use of width measurements to detect flaws in printed circuits. Here the expected width values were relatively high; narrow ones indicated possible defects.
2.5.2.1 Edge-Based Systems

Machine vision systems which operate on edge descriptions of objects have been developed for a number of defense applications. Commercial edge-based systems with pattern recognition capabilities have now reached the market. The goal of edge detection is to find the boundaries of objects by marking points of rapid change in intensity. These systems operate on edge descriptions of images rather than on the raw ``gray-level'' images: they are not sensitive to the individual intensities of patterns, only to changes in pixel intensity.
2.5.2.2 Component or Attribute Measurements

An attribute measurement system calculates specific qualities associated with known object images. Attributes can be geometrical patterns, area, length of perimeter, or length of straight lines. Such systems analyze a given scene for known images with predefined attributes. Attributes are constructed from previously scanned objects and can be rotated to match an object at any given orientation. This technique can be applied with minimal preparation. However, orienting and matching are used most efficiently in applications permitting standardized orientations, since they consume significant processing time. Attribute measurement is effective in the segregating or sorting of parts, counting parts, flaw detection, and recognition decisions.
2.5.2.3 Hole Location

Machine vision is ideally suited for determining if a well-defined object is in the correct location relative to some other well-defined object. Machined objects typically consist of a variety of holes that are drilled, punched, or cut at specified locations on the part. Holes may be in the shape of circular openings, slits, squares, or shapes that are more complex. Machine vision systems can verify that the correct holes are in the correct locations, and they can perform this operation at high speeds.

A window is formed around the hole to be inspected. If the hole is not too close to another hole or to the edge of the workpiece, only the image of the hole will appear in the window, and the measurement process will simply consist of counting pixels. Hole inspection is a straightforward application for machine vision. It requires a two-dimensional binary image and the ability to locate edges, create image segments, and analyze basic features. For groups of closely located holes, it may also require the ability to analyze the general organization of the image and the position of the holes relative to each other.
2.5.2.4 Dimensional Measurements

A wide range of industries and potential applications require that specific dimensional accuracy for the finished products be maintained within tolerance limits. Machine vision systems are ideal for performing 100% inspection of items which are moving at high speeds or which have features which are difficult to measure by humans. Dimensions are typically inspected using image windowing to reduce the data processing requirements. A simple linear length measurement might be performed by positioning a long window along the edge. The length of the edge could then be determined by counting the number of pixels in the window and translating into inches or millimeters. The output of this dimensional measurement process is a ``pass-fail'' signal received by a human operator or by a robot. In the case of a continuous process, a signal that the critical dimension being monitored is outside the tolerance limits may cause the operation to stop, or it may cause the forming machine to automatically alter the process.

2.5.2.5 Defect Location
In spite of the component being present and in the correct position, it may still be unacceptable because of some defect in its construction. The two types of possible defects are functional and cosmetic.

A functional defect is a physical error, such as a broken part, which can prevent the finished product from performing as intended. A cosmetic defect is a flaw in the appearance of an object, which will not interfere with the product's performance, but may decrease the product's value when perceived by the user. Gray-scale systems are ideal for detecting subtle differences in contrast between various regions on the surface of the parts, which may indicate the presence of defects. Some examples of defect inspection include the inspection of:
Label position on bottles
Deformations on metal cans
Deterioration of dies
Glass tubing for bubbles
Cap seals for bottles
Keyboard character deformations
2.5.2.6 Surface Contour Accuracy

The determination of whether a three-dimensional curved surface has the correct shape or not is an important area of surface inspection. Complex manufactured parts such as engine block castings or aircraft frames have very irregular three-dimensional shapes. However, these complex shapes must meet a large number of dimensional tolerance specifications. Manual inspection of these shapes may require several hours for each item. A vision system may be used for mapping the surface of these three-dimensional objects.
2.5.3 Part Identification and Sorting

The recognition of an object from its image is the most fundamental use of a machine vision system. Inspection deals with the examination of objects without necessarily requiring that the objects be identified. In part recognition, however, it is necessary to make a positive identification of an object and then make a decision from that knowledge. This is used for categorization of the objects into one of several groups. The process of part identification generally requires strong geometrical feature interpretation capabilities. The applications considered often require an interface capability with some sort of part-handling equipment. An industrial robot provides this capability.

There are manufacturing situations that require that a group of varying parts be categorized into common groups and sorted. In general, parts can be sorted based on several characteristics, such as shape, size, labeling, surface markings, color, and other criteria, depending on the nature of the application and the capabilities of the vision system.
2.5.3.1 Character Recognition

Usually in manufacturing situations, an item can be identified solely on the basis of an alphanumeric character or a set of characters. Serial numbers on labels identify the separate batches in which products are manufactured. Alphanumeric characters may be printed, etched, embossed, or inscribed on consumer and industrial products. Recent developments have provided certain vision systems with the capability of reading these characters.

2.5.3.2 Inventory Monitoring

Categories of inventories, which can be monitored for control purposes, need to be created. The sorting process of parts or finished products is then based on these categories. Vision system part identification capabilities make them compatible with inventory control systems for keeping track of raw material, work in process, and finished goods inventories. Vision system interfacing capability allows them to command industrial robots to place sorted parts in inventory storage areas. Inventory level data can then be transmitted to a host computer for use in making inventory-level decisions.
2.5.3.3 Conveyor Picking: Overlap

One problem encountered during conveyor picking is overlapping parts. This problem is complicated by the fact that certain image features, such as area, lose meaning when the images are joined together. In the case of a machined part with an irregular shape, analysis of the overlap may require more sophisticated discrimination capabilities, such as the ability to evaluate surface characteristics or to read surface markings.

2.5.3.4 No Overlap

In manufacturing environments with high-volume mass production, workpieces are typically positioned and oriented in a highly precise manner. Flexible automation, such as robotics, is designed for use in the relatively unstructured environments of most factories. However, flexible automation is limited without the addition of the feedback capability that allows it to locate parts. Machine vision systems have begun to provide that capability. The presentation of parts in a random manner, as on a conveyor belt, is common in flexible automation in batch production. A batch of the same type of parts will be presented to the robot in a random distribution along the conveyor belt. The robot must first determine the location of the part and then the orientation so that the gripper can be properly aligned to grip the part.
2.5.3.5 Bin Picking

The most common form of part presentation is a bin of parts that have no order. While a conveyor belt ensures a rough form of organization in a two-dimensional plane, a bin is a three-dimensional assortment of parts oriented randomly through space. This is one of the most difficult tasks for a robot to perform. Machine vision is the most likely tool that will enable robots to perform this important task. Machine vision can be used to locate a part, identify its orientation, and direct a robot to grasp the part.
2.5.4 Industrial Robot Control

2.5.4.1 Tracking

In some applications, like machining, welding, assembly, or other process-oriented applications, there is a need for the parts to be continuously monitored and positioned relative to other parts with a high degree of precision. A vision system can be a powerful tool for controlling production operations. The ability to measure the geometrical shape and the orientation of the object, coupled with the ability to measure distance, is important. A high degree of image resolution is also needed.
2.5.4.2 Seam Welding Guidance

Vision systems used for this application need more features than the systems used to perform continuous welding operations. They must have the capability to maintain the weld torch, electrode, and arc in the proper positions relative to the weld joint. They must also be capable of detecting weld joint details, such as widths, angles, depths, mismatches, root openings, tack welds, and locations of previous weld passes. The capacity to perform under conditions of smoke, heat, dirt, and operator mistreatment is also necessary.
2.5.4.3 Part Positioning and Location Determination

Machine vision systems have the ability to direct a part to a precise position so that a particular machining operation may be performed on it. As in guidance and control applications, the physical positioning is performed by a flexible automation device, such as a robot. The vision system ensures that the object is correctly aligned. This facilitates the elimination of expensive fixturing. The main concern here is how to achieve a high level of image resolution so that the position can be measured accurately. In cases in which one part would have to touch another part, a touch sensor might also be needed.
2.5.4.4 Collision Avoidance

Occasionally there are cases in industry, where robots are being used with flexible manufacturing equipment, in which the manipulator arm can come in contact with another piece of equipment, a worker, or other obstacles, and cause an accident. Vision systems may be effectively used to prevent this. This application would need the capability of sensing and measuring relative motion as well as spatial relationships among objects. A real-time processing capability would be required in order to make rapid decisions and prevent contact before any damage is done.
2.5.4.5 Machining Monitoring

Popular machining operations like drilling, cutting, deburring, gluing, and others, which can be programmed offline, have employed robots successfully. Machine vision can greatly expand these capabilities in applications requiring visual feedback. The advantage of using a vision system with a robot is that the vision system can guide the robot to a more accurate position by compensating for errors in the robot's positioning accuracy. Human errors, such as incorrect positioning and undetected defects, can be overcome by using a vision system.
2.5.5 Mobile Robot Applications

This is an active research topic in the following areas:

Navigation
Guidance
Tracking
Hazard determination
Obstacle avoidance
2.6 CONCLUSIONS AND RECOMMENDATIONS

Machine vision, even in its short history, has been applied to practically every type of imagery with various degrees of success. Machine vision is a multidisciplinary field. It covers diverse aspects of optics, mechanics, electronics, mathematics, photography, and computer technology. This chapter attempts to collect the fundamental concepts of machine vision for a relatively easy introduction to this field.
The declining cost of both processing devices and required computer equipment makes continued growth of the field likely. Several new technological trends promise to stimulate further growth of computer vision systems. Among these are:

Parallel processing, made practical by low-cost
Inexpensive, high-resolution color display systems
Machine vision systems can be applied to many manufacturing operations where human vision is traditional. These systems are best for applications in which their speed and accuracy over long time periods enable them to outperform humans. Some manufacturing operations depend on human vision as part of the manufacturing process. Machine vision can accomplish tasks that humans cannot perform due to hazardous conditions, and can carry out these tasks at a higher confidence level than humans. Beyond inspecting products, the human eye is also valued for its ability to make measurement judgments or to perform calibration. This will be one of the most fruitful areas for using machine vision to replace labor.
REFERENCES

3. JD Murray, W Van Ryper. Encyclopedia of Graphic File Formats. Sebastopol, CA: O'Reilly and Associates, 1994.
4. G Wagner. Now that they're cheap, we have to make them smart. Proceedings of the SME Applied Machine Vision '96 Conference, Cincinnati, OH, June 3-6, 1996.
7. MD Levine. Vision in Man and Machine. New York: McGraw-Hill, 1985, pp 151-170.
8. RM Haralick, LG Shapiro. Computer and Robot Vision. Reading, MA: Addison-Wesley, 1992, pp 509-553.
9. EL Hall. Fundamental principles of robot vision. In: Handbook of Pattern Recognition and Image Processing: Computer Vision. Orlando, FL: Academic Press, 1994, pp 542-575.
10. R Schalkoff. Pattern Recognition. New York: John Wiley, 1992, pp 204-263.
11. WS McCulloch, WH Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, vol 5, 1943, pp 115-133.
12. D Hebb. Organization of Behavior. New York: John Wiley & Sons, 1949.
13. K Fukushima, S Miyake. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, vol 15, no 6, 1982, pp 455-469.
14. M Sonka, V Hlavac, R Boyle. Image Processing, Analysis and Machine Vision. Pacific Grove, CA: PWS, 1999, pp 722-754.
Three-dimensional vision concerns itself with a system that captures three-dimensional displacement information from the surface of an object. Let us start by reviewing dimensions and displacements. A displacement between two points is a one-dimensional measurement. One point serves as the origin and the second point is located by a displacement value. Displacements are described by a multiplicity of standard length units. For example, a displacement can be 3 in. Standard length units can also be used to create a co-ordinate axis. For example, if the first point is the origin, the second point may fall on the co-ordinate 3, which represents 3 in.

Determining the displacements among three points requires a minimum of two co-ordinate axes, assuming the points do not fall on a straight line. With one point as the origin, measurements are taken in perpendicular (orthogonal) directions, once again using a standard displacement unit.

Three-dimensional vision determines displacements along three co-ordinate axes. Three dimensions are required when the relationship among four points that do not fall on the same plane is desired. Three-dimensional sensing systems are usually used to acquire more than just four points. Hundreds or thousands of points are obtained, from which critical spatial relationships can be derived. Of course, simple one-dimensional measurements can still be made point to point from the captured data.
The three-dimensional vision systems discussed in this chapter can also be referred to as triangulation systems. These systems typically consist of two cameras, or a camera and a projector. The systems use geometrical relationships to calculate the location of a large number of points simultaneously. Three-dimensional vision systems are computationally intensive. Advances in computer processing and storage technologies have made these systems economical.

3.1.1 Competing Technologies

Before proceeding, let us review other three-dimensional capture technologies that are available. Acquisition of three-dimensional data can be broadly categorized into contact and noncontact methods. Contact methods require the sensing system to make physical contact with the object. Noncontact methods probe the surface unobtrusively.
Scales and calipers are traditional contact measurement devices that require a human operator. When the operator is a computer, the measuring device would be a co-ordinate measuring machine (CMM). A CMM is a rectangular robot that uses a probe to acquire three-dimensional positional data. The probe senses contact with a surface using a force transducer. The CMM records the three-dimensional position of the sensor as it touches the surface point.
Several noncontact methods exist for capturing three-dimensional data. Each has its advantages and disadvantages. One method, known as time of flight,
bounces a laser, sound wave, or radio wave off the surface of interest. By measuring the time it takes for the signal to return, one can calculate a position.
Acoustical time-of-flight systems are better known as sonar, and can span enormous distances underwater. Laser time-of-flight systems, on the other hand, are used in industrial settings but also have inherently large work volumes. Long standoffs from the system to the measured surface are required.
Another noncontact technique for acquiring three-dimensional data is image depth of focus. A camera can be fitted with a lens that has a very narrow, but adjustable, depth of field. A computer controls the depth of field and identifies locations in an image that are in focus. A group of points is acquired at a specific distance, then the lens is refocused to acquire data at a new depth.
Other three-dimensional techniques are tailored to specific applications. Interferometry techniques can be used to determine surface smoothness. They are frequently used in ultrahigh-precision applications that require accuracies up to the wavelength of light. Specialized medical imaging systems such as magnetic resonance imaging (MRI) or ultrasound also acquire three-dimensional data by penetrating the subject of interest. The word ``vision'' usually refers to an outer-shell measurement, putting these medical systems outside the scope of this chapter.
The competing technologies to three-dimensional triangulation vision, as described in this chapter, are CMM machines, time-of-flight devices, and depth-of-field systems. Table 1 shows a brief comparison among systems representing each of these technologies. The working volume of a CMM can be scaled up without loss of accuracy. Triangulation systems and depth-of-field systems lose accuracy with large work volumes. Hence, both systems are sometimes moved as a unit to increase work volume. Figure 1 shows a triangulation system, known as a laser scanner. Laser scanners can have accuracies of a thousandth of an inch, but the small work volume requires a mechanical actuator. Triangulation systems acquire an exceptionally large number of points simultaneously. A CMM must repeatedly make physical contact with the object to acquire points and therefore is much slower.
3.1.2 Note on Two-Dimensional Vision Systems

Vision systems that operate with a single camera are two-dimensional vision systems. Three-dimensional information may sometimes be inferred from such a vision system. As an example, a camera acquires two-dimensional information about a circuit board. An operator may wish to inspect the solder joints on the circuit board, a three-dimensional problem. For such a task, lighting can be positioned such that shadows of solder joints will be seen by the vision system. This method of inspecting does not require the direct measurement of three-dimensional co-ordinate locations on the surface of the board. Instead the three-dimensional information is inferred by a clever setup. Discussion of two-dimensional image processing for inspection of three dimensions by inference can be found in Chap. 5.2. This chapter will concern itself with vision systems that capture three-dimensional position locations.
Table 1 Comparison of three-dimensional technologies: work volume (in.), depth resolution (in.), and speed (points/sec) for a triangulation system (DCS Corp.) and a CMM (Brown & Sharpe Mfg. Co., probe speed of 4 in./sec).
where the slope of the line is the pixel position divided by the focal length:

\frac{x}{z} = \frac{x_{pixel}}{f} \qquad \frac{y}{z} = \frac{y_{pixel}}{f}

In homogeneous co-ordinates,

\begin{bmatrix} wx_{pixel} \\ wy_{pixel} \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (8)

Equation (8) can be used to find a pixel location (x_{pixel}, y_{pixel}) for any point (x, y, z) in space. Three-dimensional information is reduced to two-dimensional information by dividing wx_{pixel} by w. Equation (8) cannot be inverted: it is not possible to use a pixel location alone to determine a unique (x, y, z) point.

In order to represent the camera in different locations, it is helpful to define a z_{pixel} co-ordinate that will always have a constant value. The equation below is a perspective projection matrix that contains such a constant:

\begin{bmatrix} wx_{pixel} \\ wy_{pixel} \\ wz_{pixel} \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (9)
Figure 3 The pinhole camera is a widely used approximation for a camera or projector.

Figure 4 The pinhole camera model leads to the perspective projection matrix.
example, to simulate moving the focal point to a new location, d, on the z-axis, one would use the equation

\begin{bmatrix} wx_{pixel} \\ wy_{pixel} \\ wz_{pixel} \\ w \end{bmatrix} =
P
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -d \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (10)

where P is the perspective projection matrix of Eq. (9). This equation subtracts a value of d in the z-direction from every point being viewed in the co-ordinate space. That would be equivalent to moving the camera forward along the z-direction by a value of d.
The co-ordinate space orientation, and hence the camera's viewing angle, can be changed using standard rotation matrices [1]. A pinhole camera, five units away from the origin, viewing the world space at a 45 degree angle with respect to the x-z axes, would have the matrix

\begin{bmatrix} wx_{pixel} \\ wy_{pixel} \\ wz_{pixel} \\ w \end{bmatrix} =
P
\begin{bmatrix} \cos 45^{\circ} & 0 & \sin 45^{\circ} & 0 \\ 0 & 1 & 0 & 0 \\ -\sin 45^{\circ} & 0 & \cos 45^{\circ} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (11)

Once again, the world co-ordinates are changed to reflect the view of the camera, with respect to the pinhole model.
Accuracy in modeling a physical camera is important for obtaining accurate measurements. When setting up a stereo vision system, it may be possible to precisely locate a physical camera and describe that location with displacement and rotation transformation matrices. This will require precision fixtures and lasers to guide setup. Furthermore, special camera lenses should be used, as standard off-the-shelf lenses often deviate from the pinhole model. Rather than try to duplicate transformation matrices in the setup, a different approach can be taken.
Let us consider the general perspective projection matrix for a camera located at some arbitrary location and rotation:

\begin{bmatrix} wx_{pixel} \\ wy_{pixel} \\ wz_{pixel} \\ w \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (12)

Specialized fixtures are not required to assure a specific relationship to some physically defined origin. (Cameras, however, must always be mounted to hardware that prevents dislocation and minimizes vibration.) The location of the camera can be determined by the camera view itself. A calibration object, with known calibration points in space, is viewed by the camera and is used to determine the a_{ij} constants. Equation (12) has 16 unknowns. Sixteen calibration points can be located at 16 different pixel locations, generating a sufficient number of equations to solve for the unknowns [2]. More sophisticated methods of finding the a_{ij} constants exist, and take into account lens deviations from the pinhole model [3,4].
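A common way to recover the constants from calibration points is a linear least-squares solve. The sketch below uses the standard direct-linear-transformation (DLT) variant rather than the 16-unknown formulation in the text: it drops the constant z_pixel row and fixes the overall scale by setting a44 = 1, leaving 11 unknowns, so at least six non-coplanar points are needed.

```python
import numpy as np

def calibrate(points_3d, pixels):
    """Solve for a 3x4 camera matrix (rows for x', y', and w) from known
    3-D calibration points and their observed pixel locations."""
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(points_3d, pixels):
        # u = (row1 . X) / (row3 . X) with X = (x, y, z, 1) and a44 = 1
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rhs.append(u)
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.append(v)
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.concatenate([a, [1.0]]).reshape(3, 4)
```

Using well more than the minimum number of well-spread calibration points makes the system comfortably overdetermined and averages out pixel noise.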
3.2.3 System Types

3.2.3.1 Passive Stereo Imaging

Passive stereo refers to two cameras viewing the same scene from different perspectives. Points corresponding to the same location in space must be matched in the images, resulting in two lines of sight. Triangulation will then determine the (x, y, z) point location.

Assume the perspective projection transformation matrix of one of the cameras can be described by Eq. (12), where (x_{pixel}, y_{pixel}) is replaced by (x', y'). The equations below can be derived by substituting for the term w and ignoring the constant z_{pixel}:

(a_{11} - a_{41}x')x + (a_{12} - a_{42}x')y + (a_{13} - a_{43}x')z = a_{44}x' - a_{14} \qquad (13)

(a_{21} - a_{41}y')x + (a_{22} - a_{42}y')y + (a_{23} - a_{43}y')z = a_{44}y' - a_{24} \qquad (14)

The corresponding equations for the second camera, whose pixel co-ordinates are (x'', y''), are

(b_{11} - b_{41}x'')x + (b_{12} - b_{42}x'')y + (b_{13} - b_{43}x'')z = b_{44}x'' - b_{14} \qquad (15)

(b_{21} - b_{41}y'')x + (b_{22} - b_{42}y'')y + (b_{23} - b_{43}y'')z = b_{44}y'' - b_{24} \qquad (16)

where the a_{ij} constants of Eq. (12) have been replaced with b_{ij}. Equations (13)-(16) can be arranged in matrix form:

\begin{bmatrix}
a_{11} - a_{41}x' & a_{12} - a_{42}x' & a_{13} - a_{43}x' \\
a_{21} - a_{41}y' & a_{22} - a_{42}y' & a_{23} - a_{43}y' \\
b_{11} - b_{41}x'' & b_{12} - b_{42}x'' & b_{13} - b_{43}x'' \\
b_{21} - b_{41}y'' & b_{22} - b_{42}y'' & b_{23} - b_{43}y''
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} a_{44}x' - a_{14} \\ a_{44}y' - a_{24} \\ b_{44}x'' - b_{14} \\ b_{44}y'' - b_{24} \end{bmatrix} \qquad (17)
The constants a_{ij} and b_{ij} will be set based on the position of the cameras in world space. The cameras view the same point in space at locations (x', y') and (x'', y'') on their respective image planes. Hence, Eqs. (13)-(16) are four linearly independent equations with only three unknowns, (x, y, z). A solution for the point of triangulation, (x, y, z), can be achieved by using least-squares regression. However, more accurate results may be obtained by using other methods [4,5].
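Arranged as in Eq. (17), the least-squares solution is a one-line call. Below is a sketch, assuming each camera matrix is the 3x4 array returned by the calibration sketch above (rows corresponding to x', y', and w):

```python
import numpy as np

def triangulate(A, B, xy1, xy2):
    """Least-squares triangulation from two 3x4 camera matrices and the
    matched pixel observations (x', y') and (x'', y'')."""
    eqs, rhs = [], []
    for P, (u, v) in ((A, xy1), (B, xy2)):
        # Each view contributes two equations in the form of Eqs. (13)-(16).
        eqs.append(P[0, :3] - u * P[2, :3]); rhs.append(u * P[2, 3] - P[0, 3])
        eqs.append(P[1, :3] - v * P[2, :3]); rhs.append(v * P[2, 3] - P[1, 3])
    xyz, *_ = np.linalg.lstsq(np.array(eqs), np.array(rhs), rcond=None)
    return xyz
```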
Passive stereo vision is interesting because of its similarity to human vision, but it is rarely used by industry. Elements of passive stereo can be found in photogrammetry. Photogrammetry is the use of passive images, taken from aircraft, to determine geographical topology [6]. In the industrial setting, determining points that correspond in the two images is difficult and imprecise, especially on smooth manufactured surfaces. The uncertainty of the lines of sight from the cameras results in poor measurements. Industrial systems usually replace one camera with a projection system, as described in the section below.
3.2.3.2 Active Stereo Imaging (Moire Systems)

In active stereo imaging, one camera is replaced with a projector. Cameras and projectors can both be simulated with a pinhole camera model. For a projector, the focal point of the pinhole camera model is replaced with a point light source. A transmissive image plane is then placed in front of this light source.

A projector helps solve the correspondence problem of the passive system. The projector projects a shadow from a known pixel location on its image plane. The shadow falls on a surface that may be smooth and featureless. The imaging camera locates the shadow in the field of view using algorithms especially designed for the task. The system actively modifies the scene of inspection to simplify the correspondence task and make it more precise. Often the projector projects a simple pattern of parallel stripes known as a Ronchi pattern, as shown in Fig. 5.

Let us assume that the a_{ij} constants in Eq. (17) correspond to the camera. The b_{ij} constants would then describe the location of the projector. Equation (17) was overdetermined: the fourth equation, Eq. (16), which was generated by the y'' pixel position, is not needed to determine the three unknowns. A location in space can be found by
\begin{bmatrix}
a_{11} - a_{41}x' & a_{12} - a_{42}x' & a_{13} - a_{43}x' \\
a_{21} - a_{41}y' & a_{22} - a_{42}y' & a_{23} - a_{43}y' \\
b_{11} - b_{41}x'' & b_{12} - b_{42}x'' & b_{13} - b_{43}x''
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} a_{44}x' - a_{14} \\ a_{44}y' - a_{24} \\ b_{44}x'' - b_{14} \end{bmatrix} \qquad (18)

All pixels in the y''-direction can be used to project a single shadow, since the specific y'' pixel location is not needed. Hence, a pattern of striped shadows is logical. Active stereo systems use a single camera to locate projected striped shadows in the field of view. The stripes can be found using two-dimensional edge detection techniques described in Chap. 5.2. The image processing technique must assign an x'' location to the shadow. This can be accomplished by encoding the stripes [7,8].
Figure 5 Example of an active stereo vision system
Assume a simplified Ronchi grid as shown in Fig. 6. Each of the seven stripe positions is uniquely identified by a binary number. The camera images the field of view three times. Stripes are turned on and off with each new image, based on the 3-bit numerical representation. The camera tracks the appearance of shadows in the images and determines the x'' position based on the code.
Prior to the advent of active stereo imaging, moire fringe patterns were used to determine three-dimensional surface displacements. When a sinusoidal grid is projected on a surface, a viewer using the same grid will see fringes that appear as a relief map of the surface [9-11]. Figure 7 shows a conceptual example using a sphere. The stripe pattern is projected, and an observer views the shadows as a contour map of the sphere. In order to translate the scene into measurements, a baseline fringe distance must be established. The moire fringe patterns present an intuitive display for viewing displacements.

The traditional moire technique assumes the lines of sight for light source and camera are parallel. As discussed in Sec. 3.2.4, shadow interference occurs at discrete distances from the grid. This is the reason for the relief mapping. Numerous variations on the moire system have been made, including specialized projection patterns, dynamically altering projection patterns, and varying the relationship of the camera and projector. The interested reader should refer to the many optical journals available.

The moire technique is a triangulation technique. It is not necessary for the camera to view the scene through a grid. A digital camera consists of evenly spaced pixel rows that can be modeled as a grid. Active stereo imaging could be described as a moire system using a Ronchi grid projection and a digital camera.
3.2.3.3 Laser Scanner

The simplest and most popular industrial triangulation system is the laser scanner. Previously, active stereo vision systems were described as projecting several straight-line shadows simultaneously. A laser scanner projects a single line of light onto a surface, for imaging by a camera. Laser scanners acquire a single slice of the surface that intersects the laser's projected plane of light. The scanner, or object, is then translated, and additional slices are captured in order to obtain three-dimensional information.

For the laser scanner shown in Fig. 8, the laser plane is assumed to be parallel to the x-y axes. Each pixel on the image plane is represented on the laser plane. The camera views the laser light reflected from the surface at various pixel locations. Since the z co-ordinate is constant, say z = z_0, Eq. (18) reduces to

\begin{bmatrix}
a_{11} - a_{41}x' & a_{12} - a_{42}x' \\
a_{21} - a_{41}y' & a_{22} - a_{42}y'
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix}
x'(a_{43}z_0 + a_{44}) - (a_{13}z_0 + a_{14}) \\
y'(a_{43}z_0 + a_{44}) - (a_{23}z_0 + a_{24})
\end{bmatrix} \qquad (19)
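Equation (19) is a 2x2 linear solve per illuminated pixel. A sketch follows, again assuming a 3x4 camera matrix with rows corresponding to x', y', and w, as in the calibration sketch earlier, and the laser plane at z = z_0:

```python
import numpy as np

def laser_point(A, u, v, z0):
    """Solve Eq. (19): with z fixed by the laser plane, one camera pixel
    (u, v) determines (x, y) from a 2x2 linear system."""
    M = np.array([[A[0, 0] - A[2, 0] * u, A[0, 1] - A[2, 1] * u],
                  [A[1, 0] - A[2, 0] * v, A[1, 1] - A[2, 1] * v]])
    r = np.array([u * (A[2, 2] * z0 + A[2, 3]) - (A[0, 2] * z0 + A[0, 3]),
                  v * (A[2, 2] * z0 + A[2, 3]) - (A[1, 2] * z0 + A[1, 3])])
    x, y = np.linalg.solve(M, r)
    return x, y, z0
```

Applying this to every laser-lit pixel in a frame yields one slice of the surface; translating the scanner and repeating builds up the full range image.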
Figure 6 Ronchi grid stripes are turned on (value 1) and off (value 0) to distinguish the x'' position of the projector plane.

Figure 7 Classic system for moire topology measurements.