An Introduction to Digital Image Processing
Bill Silver, Chief Technology Officer, Cognex Corporation, Modular Vision Systems Division
Digital image processing allows one to enhance image features of interest while attenuating detail irrelevant to a given application, and then extract useful information about the scene from the enhanced image. This introduction is a practical guide to the challenges, and to the hardware and algorithms used to meet them.

Images are produced by a variety of physical devices, including still and video cameras, x-ray devices, electron microscopes, radar, and ultrasound, and used for a variety of purposes, including entertainment, medical, business (e.g. documents), industrial, military, civil (e.g. traffic), security, and scientific. The goal in each case is for an observer, human or machine, to extract useful information about the scene being imaged. An example of an industrial application is shown in figure 1.

Often the raw image is not directly suitable for this purpose, and must be processed in some way. Such processing is called image enhancement; processing by an observer to extract information is called image analysis. Enhancement and analysis are distinguished by their output (images vs. scene information) and by the challenges faced and methods employed.
Image enhancement has been done by chemical, optical, and electronic means, while analysis has been done mostly by humans and electronically. Digital image processing is a subset of the electronic domain wherein the image is converted to an array of small integers, called pixels, representing a physical quantity such as scene radiance, stored in a digital memory, and processed by computer or other digital hardware. Digital image processing, whether as enhancement for human observers or as autonomous analysis, offers advantages in cost, speed, and flexibility, and with the rapidly falling price and rising performance of personal computers it has become the dominant method in use.
The Challenge
An image is not a direct measurement of the properties of the physical objects being viewed. Rather, it is the product of a complex interaction among several physical processes: the intensity and distribution of illuminating radiation, the physics of the interaction of the radiation with the matter comprising the scene, the geometry of projection of the reflected or transmitted radiation from 3 dimensions to the 2 dimensions of the image plane, and the electronic characteristics of the sensor. Unlike, for example, writing a compiler, where an algorithm backed by formal theory exists for translating a high-level computer language to machine language, there is no algorithm and no comparable theory for extracting scene information of interest, such as the position or quality of an article of manufacture, from an image.

The challenge is often under-appreciated by novice users due to the seeming effortlessness with which their own visual system extracts information from scenes. Human vision is enormously more sophisticated than anything we can engineer at present and for the foreseeable future. Thus one must be careful not to evaluate the difficulty of a digital image processing application on the basis of how it looks to humans.

Figure 1. Digital image processing is used to verify that the correct tire is installed on vehicles at GM.
Perhaps the first guiding principle is that humans are better at judgement and machines are better at measurement. Thus determining the precise position and size of an automobile part on a conveyor, for example, is well suited to digital image processing, whereas grading apples or wood is quite a bit more challenging (although not impossible). Along these lines, image enhancement, which generally requires lots of numeric computation but little judgement, is well suited to digital processing.
If teasing useful information out of the soup that is an image isn't challenging enough, the problem is further complicated by often severe time budgets. Few users care if a spreadsheet takes 300 milliseconds to update rather than 200, but most industrial applications, for example, must operate within hard constraints imposed by machine cycle times. There are also many applications, such as ultrasound image enhancement, traffic monitoring, and camcorder stabilization, that require real-time processing of a video stream.

To make the speed challenge concrete, consider that the video stream from a standard monochrome video camera produces around 10 million pixels per second. As of this writing the typical desktop PC can execute maybe 50 machine instructions in the 100 ns available to process each pixel. The set of things one can do in a mere 50 instructions is rather limited.
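As a rough back-of-the-envelope check, assuming a 640 x 480 camera running at 30 frames per second and a processor sustaining about 500 million instructions per second (both figures are illustrative, not measurements), a few lines of Python reproduce the budget:

# Rough instruction budget for real-time video processing.
pixels_per_second = 640 * 480 * 30        # about 9.2 million pixels per second
ns_per_pixel = 1e9 / pixels_per_second    # about 109 ns available per pixel
instructions_per_second = 500e6           # assumed desktop CPU of the era
budget = instructions_per_second / pixels_per_second
print("%.0f ns per pixel, ~%.0f instructions per pixel" % (ns_per_pixel, budget))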
On top of this, many digital image processing applications are constrained by severe cost targets. Thus we often face the engineer's dreaded triple curse: the need to design something good, fast, and cheap all at once.
Hardware
Lights. All image processing applications start with some form of illumination, typically light but more generally some form of energy. In some cases ambient light must be used, but more typically the illumination can be designed for the application. In such cases the battle is often won or lost right here—no amount of clever software can recover information that simply isn't there due to poor illumination.
Generally one can choose illumination intensity, direction, spectrum (color), and continuous or strobed operation. Intensity is easiest to choose and least important; any decent image processing algorithm should be immune to significant variations in contrast, although applications that demand photometric accuracy will require control and calibration of intensity.

Direction is harder to choose and more important, as any professional photographer knows. The choices range from point sources at one extreme to "sky" illumination (equal intensity from every direction) at the other. In between are various extended sources such as linear and ring lights.

The goal generally is to produce consistent appearance. As a rule, matte surfaces do better with point sources, and shiny, specularly reflecting surfaces do better with diffuse, extended sources. A design that allows computer-controlled direction (usually by switching LEDs on and off) is often ideal.

Illumination color can sometimes be used as a form of image enhancement. Its primary value is that it's cheap and adds zero processing time.

High speed image acquisition for rapidly moving or vibrating objects may require a strobe. Most cameras have an electronic shutter, which is preferable for low- to medium-speed acquisition, but as exposure times get shorter the amount of light needed increases beyond what is reasonable to supply continuously.
Camera. For our purposes a camera is any device that converts a pattern of radiated energy into a digital image stored in a random-access memory. In the past this operation was divided into two pieces: conversion of energy to an electrical signal, considered to be the camera's function, and conversion and storage of the signal in digital form, performed by a digitizer. As of this writing the distinction is becoming blurred, and before long cameras will feed directly to computer memory via USB, Ethernet, or IEEE 1394 interfaces.

Camera technology and the characteristics of the resulting images are driven almost exclusively by the highest volume applications, which until recently has meant consumer television. Thus most visible-light cameras in current use for digital image processing have resolution and speed characteristics established by TV broadcast standards almost a half-century ago.

As of this writing the typical visible-light monochrome camera would have a resolution of 640 x 480 pixels, produce 30 frames per second, and support electronic shuttering and rapid reset (the ability to reset to the beginning of a frame at any time, to avoid having to wait before beginning an image acquisition). It would be based on CCD sensor technology, which produces good image quality but is expensive relative to most chips with a similar number of transistors. Significantly higher resolution and speed devices are available but often prohibitively expensive. An alternative is the line-scan camera, which uses a one-dimensional sensor and relies on scene motion to produce an image.
For the first time ever the landscape is changing, as high volume personal computer multimedia applications proliferate. First affected were monitors, which for some time have offered higher-than-broadcast speed and resolution. One can expect cameras to follow, with high-speed, high-resolution devices driven by consumer digital still camera technology and lower-resolution, ultra low cost units driven by entertainment, Internet conferencing, and perceptual user interface applications.

The low cost devices may have the greater influence. These are based on emerging CMOS sensor technology, which uses the same process as most computer chips and is therefore inexpensive due simply to higher process volume. Currently image quality is not up to CCD standards, but that is certain to change as the technology matures.

Although monochrome images have almost entirely disappeared in consumer applications, they still represent the majority in digital image processing, due primarily to camera cost and data processing burden (for color, those 50 instructions per pixel would drop to 17). Color cameras come in two forms: single sensor devices that alternate red, green, and blue pixels in some pattern, and much higher quality but more expensive devices with separate sensors for each color.
Monochrome pixels are usually 8 bits (256 gray levels), although 10- and 12-bit devices are sometimes used. Video signals tend to be noisy, however, and careful engineering is required to get more than 8 useful bits out of the signal. Furthermore, robust image analysis algorithms do not rely on photometric accuracy, so unless the application calls for accurate measurements of scene radiance, there is usually little or no benefit beyond 8 bits. Wide dynamic range is more useful than photometric accuracy, but it is usually better achieved by using a logarithmic response than by going to more bits.

Color pixels are 3-vectors (this is a fact of human physiology, not physics). Several representations, called color spaces, are commonly used for representing color. The simplest to produce is the {red, green, blue} space (RGB), although the {hue, saturation, intensity} space (HSI) may be more useful for image analysis. For the lower quality single-sensor cameras, the {luminance, chroma1, chroma2} space (YCC) is sometimes used.
Action. Until recently the computational burden of digital image processing for the most part had to be handled by dedicated hardware. Typically such hardware consisted of plug-in cards for PCI and/or VME backplanes, containing one or more application-specific integrated circuits (ASICs) designed for digital image processing.

The last few years have seen a move away from dedicated hardware towards pure software solutions, due to the advent first of DSPs and later of general-purpose CPUs that fall at or above the 1 billion operations per second mark. Of these the most significant is the development of MMX processors by Intel Corporation. MMX technology is well-suited for digital image processing. Although it is hardly alone in being so, MMX is so widely available (all Intel-compatible PCs made since 1997) that it is the de facto standard for merchant digital image processing software. This development is likely to solidify with the expected introduction, sometime in 2000, of EPIC technology on Merced processors, jointly developed by Intel and Hewlett-Packard. The EPIC architecture is superb for digital image processing.

The full power of the new processors is generally available only to skilled assembly language programmers, and this is unlikely to change in the foreseeable future. Compiler vendors and the EPIC architects may argue otherwise, but direct experience in high-performance digital image processing has consistently shown this. For time-critical applications, users should turn to specialists.
Algorithms
We divide our discussion of digital image processing algorithms into image enhancement and image analysis. The distinction is useful if not always clear-cut.

Generally, image enhancement algorithms produce modified images as output, intended for subsequent analysis by humans or machines. Their output behavior and execution speed are easy to characterize, and the basic algorithms are generally in the public domain.

Image analysis, by contrast, produces information that is much smaller in quantity but much more highly refined than an image, for example the position and orientation of an object. In many cases the output is just an accept/reject decision, the smallest quantity of information but perhaps the highest refinement. Output behavior and execution speed are generally difficult and sometimes impossible to characterize. Image analysis algorithms are often a vendor's most important intellectual property.

A simple example drawn from human experience will make these points concrete. Imagine focusing a lens, which is an act of image enhancement. It is easy to characterize what will happen (the picture gets sharper) and estimate how long it will take (a couple of seconds). The results will be fairly consistent from person to person, and there is no great secret as to how it's done.

Now imagine that you are shown a picture of a specific car and asked to find it in a parking lot and report the space number. This is image analysis. If the lot is nearly empty then the results and time needed are easy to characterize and consistent. If the lot is full, however, there is no telling how long it will take or even whether the correct answer will be reported, since many cars look alike. Characterizing the output space number as a function of the input distribution of scene radiance measurements is essentially impossible. Results may vary widely from person to person, and an individual's "proprietary" methods may have a large bearing on the outcome.

The difficulty in characterizing the behavior of automated image analysis leads to a level of risk that is far greater than that of more typical software development projects, which are already notoriously risky. The best ways to manage the risk are to rely on experienced professional developers, to share the risk between vendors and their clients, and to characterize performance empirically using a large database of stored images.
Image Enhancement
Table 1 shows a classification of digital image enhancement algorithms in common use. The classification given is useful but neither complete nor unique. The algorithms are broadly divided into two classes, point transforms and neighborhood operations.

Point transforms produce output images where each pixel is some function of a corresponding input pixel. The function is the same for every pixel, and is often derived from global statistics of the image. With neighborhood operations, each output pixel is a function of a set of corresponding input pixels. This set is called a neighborhood because it is usually some region surrounding a corresponding center pixel, for example a 3x3 neighborhood.

Point transforms generally execute rapidly but are limited to global transformations such as adjusting overall image contrast. Neighborhood operations can implement frequency and shape filtering and other sophisticated enhancements, but execute more slowly because the neighborhood must be recomputed for each output pixel.
Pixel mapping point transforms include a large set of enhancements that are useful with scalar-valued pixels (e.g. monochrome images). Often these are implemented by a single software routine (or hardware module) that uses a lookup table. Lookup tables are fast and can be programmed for any function, offering the ultimate in generality at reasonable speed. MMX and similar processors, however, can perform a variety of functions much faster by direct computation than by table lookup, at a cost of increased software complexity.

Pixel maps are most useful when the function is computed based on global statistics of the image. One can process an image to have a desired gain and offset, for example, based on the mean and standard deviation, or alternatively the minimum and maximum, of the input.
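As an illustrative sketch in Python with NumPy (the function name and the target values are mine, not a standard), such a gain/offset map can be built once from the image statistics as a 256-entry lookup table and applied to every pixel by indexing:

import numpy as np

def gain_offset_map(image, target_mean=128.0, target_std=40.0):
    # Derive gain and offset from global statistics of the input image.
    mean, std = image.mean(), image.std()
    gain = target_std / max(std, 1e-6)
    offset = target_mean - gain * mean
    # Build a 256-entry lookup table and apply it with a single indexing step.
    lut = np.clip(np.arange(256) * gain + offset, 0, 255).astype(np.uint8)
    return lut[image]

# enhanced = gain_offset_map(img)   # img: 8-bit monochrome image as a uint8 array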
Histogram specification is a powerful pixel mapping point transform wherein an input image is processed so that it has the same distribution of pixel values as some reference image. The pixel map for histogram specification is easily computed from the histograms of the input and reference images. Histogram specification is a useful enhancement prior to an analysis step whose goal is some sort of comparison between the input and the reference.
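A minimal sketch of histogram specification for 8-bit monochrome images, assuming NumPy arrays (the map is built by matching the cumulative histograms of the two images; the function name is illustrative):

import numpy as np

def histogram_specification(image, reference):
    # Cumulative distributions of the input and reference gray levels.
    src_cdf = np.cumsum(np.bincount(image.ravel(), minlength=256)) / image.size
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=256)) / reference.size
    # For each input level, pick the reference level with the nearest cumulative count.
    lut = np.interp(src_cdf, ref_cdf, np.arange(256)).astype(np.uint8)
    return lut[image]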
Thresholding is a commonly used enhancement whose goal is to segment an image into object and background. A threshold value is computed above (or below) which pixels are considered "object" and below (or above) which they are considered "background". Sometimes two thresholds are used to specify a band of values that correspond to object pixels. Thresholds can be fixed but are best computed from image statistics. Thresholding can also be done using neighborhood operations. In all cases the result is a binary image—only black and white are represented, with no shades of gray.
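For the cases where thresholding is appropriate, the threshold is best derived from image statistics. One simple statistic-based scheme (an isodata-style iteration; the details here are illustrative, not prescribed) looks like this:

import numpy as np

def threshold_image(image):
    # Start at the global mean and iterate toward the midpoint between the
    # means of the pixels below and above the current threshold.
    t = image.mean()
    for _ in range(10):
        lo, hi = image[image <= t], image[image > t]
        if lo.size == 0 or hi.size == 0:
            break
        t = 0.5 * (lo.mean() + hi.mean())
    # The result is a binary image: object pixels white, background black.
    return (image > t).astype(np.uint8) * 255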
Thresholding has a long but checkered history in digital image processing. Up until the mid-1980s thresholding was a nearly universal first step in image analysis, due to the high cost of the hardware needed to do gray-scale processing. As hardware cost dropped and sophisticated new algorithms were developed, thresholding became less important. When thresholding works it can be quite effective, because it directly identifies objects against a background and eliminates unimportant shading variation. Unfortunately, in most applications scene shading is such that objects cannot be separated from background by any threshold, and even when an appropriate threshold value exists in principle it is notoriously difficult to find it automatically. Furthermore, thresholding destroys useful shading information and applies essentially infinite gain to noise at the threshold value, resulting in a significant loss of robustness and accuracy.

As a general rule, given the performance of modern processors and gray-scale image analysis algorithms, thresholding and image analysis algorithms that depend on thresholding are best avoided.
Color space conversion is used to convert between, for example, the RGB space provided by a camera and the HSI space needed by an image analysis algorithm. Accurate color space conversion is computationally expensive, and often crude approximations are used in time-critical applications. These can be quite effective, but it is a good idea to understand the tradeoffs between speed and accuracy before choosing an algorithm.
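A sketch of one such crude approximation, converting RGB to an HSI-like {hue, saturation, intensity} representation with a hexcone hue (this assumes an (H, W, 3) floating-point array scaled to [0, 1]; an accurate conversion would be more involved):

import numpy as np

def rgb_to_hsi_approx(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = np.maximum(mx - mn, 1e-6)
    intensity = rgb.mean(axis=-1)
    saturation = np.where(intensity > 1e-6, 1.0 - mn / np.maximum(intensity, 1e-6), 0.0)
    # Hexcone hue approximation, expressed in degrees [0, 360).
    rc, gc, bc = (mx - r) / delta, (mx - g) / delta, (mx - b) / delta
    hue = np.where(mx == r, bc - gc, np.where(mx == g, 2.0 + rc - bc, 4.0 + gc - rc))
    hue = (hue * 60.0) % 360.0
    return np.stack([hue, saturation, intensity], axis=-1)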
TABLE 1
IMAGE ENHANCEMENT ALGORITHMS
Point transforms
• pixel mapping
− gain/offset control
− histogram specification
− thresholding
• color space transforms
• time averaging
Neighborhood operations
• linear filtering
− smoothing
− sharpening
• boundary detection
• non-linear filtering
− median filter
− morphology
• re-sampling
− resolution pyramids
− coordinate transforms
Time averaging is the most effective method of handling very low contrast images. Pixel maps to increase image gain are of limited utility because they affect signal and noise equally. Neighborhood operations can reduce noise, but at the cost of some loss in image fidelity. The only way to reduce noise without affecting the signal is to average multiple images over time. The amplitude of uncorrelated noise is attenuated by the square root of the number of images averaged. When time averaging is combined with a gain-amplifying pixel map, extremely low contrast scenes can be processed. The principal disadvantage of time averaging is the time needed to acquire multiple images from a camera.
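A minimal sketch of time averaging, assuming grab_frame() stands for whatever routine returns the next 8-bit frame from the camera (the name is a placeholder, not a real API):

import numpy as np

def average_frames(grab_frame, n):
    # Accumulate n frames in floating point, then divide; averaging n frames
    # attenuates uncorrelated noise by about sqrt(n) without blurring the signal.
    acc = grab_frame().astype(np.float32)
    for _ in range(n - 1):
        acc += grab_frame()
    return (acc / n).astype(np.uint8)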
Linear filters are the best understood of the neighborhood operations, due to the extensively developed mathematical framework of signal theory dating back 200 years to Fourier. Linear filters amplify or attenuate selected spatial frequencies, can achieve such effects as smoothing and sharpening, and usually form the basis of re-sampling and boundary detection algorithms.

Linear filters can be defined by a convolution operation, where output pixels are obtained by multiplying each neighborhood pixel by a corresponding element of a like-shaped set of values called a kernel, and then summing those products.
Figure 2a, for example, shows a rather noisy image of a cross within a circle. Convolution with the smoothing (low pass) kernel of figure 2b produces figure 2c. In this example the neighborhood is 25 pixels arranged in a 5x5 square. Note how the high-frequency noise has been attenuated, but at a cost of some loss of edge sharpness. Note also that the kernel elements sum to 1.0 for unity gain.

The smoothing kernel of figure 2b is a 2D Gaussian approximation. The 2D Gaussian is among the most important functions used for linear filtering. Its frequency response is also a Gaussian, which results in a well-defined pass-band and no ringing. Kernels that approximate the difference of two Gaussians of different size make excellent band-pass and high-pass filters.

Figure 2d illustrates the effect of a band-pass filter based on a difference of Gaussians approximation using a 10x10 kernel. Note that both the high frequency noise and the low frequency uniform regions have been attenuated, leaving only the mid-frequency components of the edges.
Linear filters can be implemented by direct convolution or in the frequency domain using FFTs. While frequency domain filtering is theoretically more efficient, in practice direct convolution is almost always preferred. Convolution, with its use of small integers and sequential memory addressing, is a better match for digital hardware than FFTs, is simpler to implement, and has little trouble with boundary conditions.
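A sketch of direct convolution with the 5x5 Gaussian smoothing kernel of figure 2b, using NumPy and SciPy (scipy.ndimage.convolve handles the boundary conditions; a time-critical product implementation would typically be hand-coded in fixed point):

import numpy as np
from scipy.ndimage import convolve

# The 5x5 Gaussian smoothing kernel of figure 2b; its elements sum to 1.0 for unity gain.
KERNEL = np.array([
    [0.004, 0.016, 0.023, 0.016, 0.004],
    [0.016, 0.062, 0.094, 0.062, 0.016],
    [0.023, 0.094, 0.140, 0.094, 0.023],
    [0.016, 0.062, 0.094, 0.062, 0.016],
    [0.004, 0.016, 0.023, 0.016, 0.004],
])

def smooth(image):
    # Each output pixel is the sum of the 25 neighborhood pixels weighted by the kernel.
    return convolve(image.astype(np.float32), KERNEL, mode="nearest").astype(np.uint8)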
Boundary detection has an extensive history and literature, which ranges from simple edge detection to complex algorithms that might more properly be considered under image analysis. We somewhat arbitrarily consider boundary detection under image enhancement because the goal is to emphasize features of interest (the boundaries) and attenuate everything else.
The shading produced by an object in an image is among the least reliable of an object's properties, since shading is a complex combination of illumination, surface properties, projection geometry, and sensor characteristics. Image discontinuities, on the other hand, usually correspond directly to object surface discontinuities (e.g. edges), since the other factors tend not to be discontinuous. Image discontinuities are generally consistent geometrically (i.e. in shape) even when not consistent photometrically (see figure 3). Thus identifying and localizing discontinuities, which is the goal of boundary detection, is one of the most important digital image processing tasks.

Figure 2. An image can be enhanced to reduce noise or emphasize boundaries. The smoothing kernel of figure 2b is:

0.004  0.016  0.023  0.016  0.004
0.016  0.062  0.094  0.062  0.016
0.023  0.094  0.140  0.094  0.023
0.016  0.062  0.094  0.062  0.016
0.004  0.016  0.023  0.016  0.004

Figure 3. Image discontinuities usually correspond to physical object features, while shading is often unreliable.
Boundaries are usually defined to occur at points where the rate of change of image brightness is a local maximum, i.e. at peaks of the first derivative or, equivalently, zero-crossings of the second derivative. On a discrete grid such points can only be estimated, which can be done with linear filters designed to estimate the first or second derivative. The difference of Gaussians of figure 2d, for example, is a second derivative estimator, and boundaries show up as the zero-crossings that occur at the sharp black-to-white transition points in the figure.

Figure 2e shows the output of a first derivative estimator, often called a gradient operator, applied to a noise-free version of figure 2a. The gradient operator consists of a pair of linear filters designed to estimate the first derivative horizontally and vertically, which gives the components of the gradient vector. The figure shows gradient magnitude, with boundaries defined to occur at the local magnitude peaks.
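A sketch of a simple gradient operator using the common Sobel first-derivative kernels (one of many possible estimators; the article does not prescribe a particular pair of filters):

import numpy as np
from scipy import ndimage

def gradient_magnitude(image):
    img = image.astype(np.float32)
    gx = ndimage.sobel(img, axis=1)   # estimate of the horizontal first derivative
    gy = ndimage.sobel(img, axis=0)   # estimate of the vertical first derivative
    # Boundaries are defined to occur at local peaks of this magnitude image.
    return np.hypot(gx, gy)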
Crude edge detectors simply mark image pixels corresponding to gradient magnitude peaks or second-derivative zero-crossings. Sophisticated boundary detectors produce organized chains of boundary points, with sub-pixel position and boundary orientation (accurate to a few degrees) at each point. The best commercially available boundary detectors are also tunable in spatial frequency response over a wide range, and operate at high speed.
Non-linear filters, designed to pass or block desired shapes rather than spatial frequencies, have been found useful for digital image enhancement. The first we consider is the median filter, whose output at each pixel is the median of the corresponding input neighborhood. Roughly speaking, the effect of a median filter is to attenuate image features smaller in size than the neighborhood and pass image features larger than the neighborhood.

Figure 2f shows the effect of a 3x3 median filter on the noisy image of figure 2a. Note that the noise, which generally results in features smaller than 3x3 pixels, is strongly attenuated. Unlike the linear smoothing filter of figure 2c, however, note that there is no significant loss in edge sharpness, since all of the cross and circle features are much larger than the neighborhood. Thus a median filter is often superior to linear filters for noise reduction. One of the main disadvantages of the median filter, however, is that it is very expensive to compute compared to linear filters, and the disparity gets worse as the neighborhood size increases.
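A compact 3x3 median filter sketch in NumPy (it gathers the nine shifted copies of the image and takes the per-pixel median; the per-pixel sort is what makes the median filter expensive relative to a linear filter of the same size):

import numpy as np

def median_3x3(image):
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Nine shifted views of the image, one per neighborhood position.
    stack = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    # Each output pixel is the median of its 3x3 input neighborhood.
    return np.median(stack, axis=0).astype(image.dtype)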
Morphology refers to a broad class of non-linear shape filters. Like the linear filters, the operation is defined by a matrix of elements applied to input image neighborhoods, but instead of a sum of products, a minimum or maximum of sums is computed. These operations are called erosion and dilation, and the matrix of elements is usually referred to as a probe rather than a kernel. Erosion followed by a dilation using the same probe is called an opening, and dilation followed by erosion is called a closing.

The 4 basic morphology operations have many uses, one of which is shown in figure 4. In the figure, the input image on the left is opened with a circular probe and a rectangular probe, resulting in the images shown on the right. One might imagine the probe to be a paintbrush, with the output being everything the brush can paint while placed wherever in the input it will fit (i.e. entirely on black with no white showing). Notice how the opening operation with appropriate probes is able to pass certain shapes and block others.

For simplicity the example of figure 4 illustrates opening as a binary (black/white) operation, but in general the 4 morphology operations are defined on gray-level images, with the concept of probe fitting defined on 2D surfaces in 3-space.
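A sketch of opening with a flat probe using scipy.ndimage (the footprint argument plays the role of the probe; a gray-level probe with non-zero heights would use the structure argument instead, and the probe shapes and sizes below are illustrative):

import numpy as np
from scipy.ndimage import grey_erosion, grey_dilation

def opening(image, probe):
    # Opening = erosion (minimum over the probe) followed by dilation (maximum)
    # with the same probe.
    return grey_dilation(grey_erosion(image, footprint=probe), footprint=probe)

# Example probes in the spirit of figure 4: a flat disc and a flat rectangle.
yy, xx = np.mgrid[-3:4, -3:4]
disc = (xx ** 2 + yy ** 2) <= 9
rect = np.ones((3, 9), dtype=bool)
# opened = opening(img, disc)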
Figure 4. A morphology "opening" operation acts as a shape filter, whose behavior is controlled by a "probe".

Digital re-sampling refers to a process of estimating the image that would have resulted had the continuous distribution of energy falling on the sensor been sampled differently. A different sampling, perhaps at a different resolution or orientation, is often useful.

One of the most important forms of digital re-sampling obtains a series of images at successively coarser resolution. Such a series of images is called a resolution pyramid. Conventionally each image in the series is half the resolution of the previous in each dimension (1/4 the number of pixels), but other choices are often preferable. Resolution is reduced by a combination of low-pass filtering and sub-sampling (selecting every nth pixel).
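A minimal resolution-pyramid sketch (Gaussian low-pass filtering followed by keeping every 2nd pixel in each dimension; the number of levels and the filter width are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(image, levels=4):
    pyramid = [image.astype(np.float32)]
    for _ in range(levels - 1):
        # Low-pass filter, then sub-sample by 2 in each dimension (1/4 the pixels).
        smoothed = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(smoothed[::2, ::2])
    return pyramid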
A resolution pyramid forms the basis of many image analysis algorithms that follow a coarse-to-fine strategy. The coarse resolution images allow rough information to be extracted quickly, without being distracted and confused by fine and often irrelevant detail. The algorithm proceeds to finer resolution images to localize and refine this information.
Another important class of re-sampling algorithms is coordinate transforms, which can shift images by sub-pixel amounts, rotate and size them, and convert between Cartesian and polar representations. Output pixel values are interpolated from a neighborhood of input values. Three methods in common use are nearest neighbor, which is the fastest; bilinear interpolation, which is more accurate but slower and suffers some loss of high frequency components; and cubic convolution, which is very accurate but slowest.
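A sketch of bilinear interpolation at a single sub-pixel coordinate (the building block of shift, rotate, and size operations; a production implementation would vectorize this and handle image edges more carefully):

import numpy as np

def bilinear_sample(image, y, x):
    # The output value is a weighted average of the four input pixels that
    # surround the (possibly fractional) coordinate (y, x).
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, image.shape[0] - 1), min(x0 + 1, image.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom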
Image Analysis
It's only a slight oversimplification to say that the fundamental problem of image analysis is pattern recognition, the purpose of which is to recognize image patterns corresponding to physical objects in the scene and determine their pose (position, orientation, size, etc.). Often the results of pattern recognition are all that's needed; for example, a robot guidance system supplies an object's pose to a robot. In other cases a pattern recognition step is needed to find an object so that it can, for example, be inspected for defects or correct assembly.

Pattern recognition is hard because a specific object can give rise to a wide variety of images depending on all of the factors previously discussed. Furthermore, similar-looking objects may be present in the scene that must be ignored, and the speed and cost targets may be severe.
Blob analysis is one of the earliest methods widely used for industrial pattern recognition. The premise is simple—classify image pixels as object or background by some means, join the classified pixels to make discrete objects using neighborhood connectivity rules, and compute various moments of the connected objects to determine object position (1st moments), size (0th moment), and orientation (principal axis of inertia, based on 2nd moments).
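A sketch of the measurement half of blob analysis, assuming the classification step has already produced a binary image (scipy.ndimage.label supplies the connectivity; the moment formulas give area, centroid, and principal-axis orientation):

import numpy as np
from scipy import ndimage

def blob_analysis(binary):
    labels, count = ndimage.label(binary)       # join pixels into discrete blobs
    blobs = []
    for i in range(1, count + 1):
        ys, xs = np.nonzero(labels == i)
        area = ys.size                          # 0th moment: size
        cy, cx = ys.mean(), xs.mean()           # 1st moments: position
        mu20 = ((xs - cx) ** 2).mean()          # central 2nd moments give the
        mu02 = ((ys - cy) ** 2).mean()          # principal axis of inertia
        mu11 = ((xs - cx) * (ys - cy)).mean()
        angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
        blobs.append({"area": area, "position": (cx, cy), "angle": angle})
    return blobs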
The advantages of blob analysis include high speed, sub-pixel accuracy (in cases where the image is not subject to degradation), and the ability to tolerate and measure variations in orientation and size. Disadvantages include inability to tolerate touching or overlapping objects, poor performance in the presence of various forms of image degradation, inability to determine the orientation of certain shapes (e.g. squares), and poor ability to discriminate among similar-looking objects.

Perhaps the most serious problem, however, is that in practice the only generally reliable method ever found for separating object from background was to arrange for the objects to be entirely brighter or entirely darker than the background. This requirement so severely limits the range of potential applications that before long other methods for pattern recognition were developed.
Normalized correlation (NC) has been the dominant method for pattern recognition in industry over the last decade. It is a member of a class of algorithms known as template matching, which starts with a training step wherein a picture of an object to be located (the template) is stored. At run-time the template is compared to like-sized subsets of the image over a range of positions, with the position of greatest match taken to be the position of the object. The degree of match (a numerical value) can be used for inspection, as can comparisons of individual pixels between the template and image at the position of best match.

NC is a gray-scale match function that uses no thresholds and ignores variation in overall pattern brightness and contrast. It is ideal for use in template matching algorithms.
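The NC match score between a template and a like-sized image window can be sketched as follows (subtracting the means removes overall brightness differences and the normalization removes contrast differences; the score lies in [-1, 1]):

import numpy as np

def nc_score(template, window):
    t = template.astype(np.float64).ravel()
    w = window.astype(np.float64).ravel()
    t -= t.mean()                                    # remove overall brightness (offset)
    w -= w.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())   # normalize away contrast (gain)
    return float((t * w).sum() / denom) if denom > 0 else 0.0

# Template matching evaluates nc_score over a range of positions in the image and
# takes the position of greatest match as the position of the object.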
NC template matching overcomes many of the limitations of blob analysis—it can tolerate touching or overlapping objects, performs well in the presence of various forms of image degradation, and the NC match value is useful in some inspection applications. Most significantly, perhaps, objects need not be separated from background by brightness, enabling a much wider range of applications.

Unfortunately, NC gives up some of the significant advantages of blob analysis, particularly the ability to tolerate and measure variations in orientation and size. NC will tolerate small variations, typically a few degrees and a few percent (depending on the specific template), but even within this small range of orientation and size the accuracy of the results falls off rapidly.
These limitations have been partly overcome by using re-sampling methods to extend NC, rotating and scaling the templates so as to measure orientation and size. These methods have been expensive, however, and by the time computer cost and performance made them practical they were superseded by the far superior geometric methods described below.

The Hough transform is a method for recognizing parametrically defined curves such as lines and arcs, as well as general patterns. It starts with an edge detection step, which makes it more tolerant of local and non-linear shading variations than NC. When used to find parameterized curves the Hough transform is quite effective; for general patterns NC may have a speed and accuracy advantage, as long as it can handle the shading variations.

Geometric pattern matching (GPM) is replacing NC template matching as the method of choice for industrial pattern recognition. Template methods suffer from fundamental limitations imposed by the pixel grid nature of the template itself. Translating, rotating, and sizing grids by non-integer amounts requires re-sampling, which is time consuming and of limited accuracy. This limits the pose accuracy that can be achieved with template-based pattern recognition. Pixel grids, furthermore, represent patterns using gray-scale shading, which as we've observed is often not reliable.
GPM avoids these limitations by representing an object as a geometric shape, independent of shading and not tied to a discrete grid. Sophisticated boundary detection is used to turn the pixel grid produced by a camera into a conceptually real-valued geometric description that can be translated, rotated, and sized quickly and without loss of fidelity. When combined with advanced pattern training and high-speed, high-accuracy pattern matching modules, the result is a truly general purpose pattern recognition and inspection method.
A well-designed GPM system should be as easy to train as NC template matching, yet offer rotation, size, and shading independence. It should be robust under conditions of low contrast, noise, poor focus, and missing and unexpected features.

Pattern recognition time is application-specific, as is typical of image analysis methods. For a ballpark figure, locating a 150x150 pixel pattern in a 500x500 field of view with 360° orientation uncertainty might require 30 to 50 milliseconds on PCs current as of this writing. Always test speed for a specific application, however, since times can vary considerably beyond any specified range.

GPM is capable of much higher pose accuracy than any template-based method, as much as an order of magnitude better when orientation and size vary. Table 2 shows what can be achieved in practice when patterns are reasonably close to the training image in shape and not too degraded. Accuracy is generally higher for larger patterns; the example of table 2 assumes a pattern in the 150x150 pixel range.
GPM is also capable of providing detailed data on differences between a trained pattern and an object being inspected. This difference data is also rotation, size, and shading independent.

Putting it All Together

Often a complete digital image processing system combines many of the above image enhancement and analysis methods. In the following example, the goal is to inspect objects by looking for differences in shading between an object and a pre-trained, defect-free example called a golden template.

Simply subtracting the template from an image and looking for differences does not work in practice, since the variation in gray-scale due to ordinary and acceptable conditions can be as great as that due to defects. This is particularly true along edges, where slight (i.e. subpixel) mis-registration of template and image can give rise to large variation in gray-scale. Variation in illumination and surface reflectance can also give rise to differences that are not defects, as can noise.
A practical method of template comparison for inspection uses a combination of enhancement and analysis steps, sketched in code after the list, to distinguish shading variation due to defects from that due to ordinary conditions:
1. A pattern recognition step (e.g. GPM) determines the relative pose of the template and image.
2. A digital re-sampling step uses the pose to achieve precise alignment of template to image.
3. A pixel mapping step using histogram specification compensates for variations in illumination and surface reflectance.
4. The absolute difference of the template and image is computed.
5. A threshold is used to mark pixels that may correspond to defects. Each pixel has a separate threshold, with pixels near edges having a higher threshold because their gray-scale is more uncertain.
6. A blob analysis or morphology step is used to identify those clusters of marked pixels that correspond to true defects.
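A minimal end-to-end sketch of steps 2 through 6 in NumPy/SciPy, assuming the pattern recognition step has already reported the pose as a sub-pixel (dy, dx) offset; the threshold values and the minimum defect size are illustrative, not prescriptions:

import numpy as np
from scipy import ndimage

def inspect(image, template, dy, dx, base_thresh=20, edge_thresh=60, min_defect=10):
    # 2. Re-sample the image so it is precisely aligned with the golden template.
    aligned = ndimage.shift(image.astype(np.float32), (-dy, -dx), order=1)
    a8 = np.clip(aligned, 0, 255).astype(np.uint8)

    # 3. Histogram specification: map the aligned image onto the template's
    #    gray-level distribution to compensate for illumination and reflectance.
    src_cdf = np.cumsum(np.bincount(a8.ravel(), minlength=256)) / a8.size
    ref_cdf = np.cumsum(np.bincount(template.ravel(), minlength=256)) / template.size
    normalized = np.interp(src_cdf, ref_cdf, np.arange(256))[a8]

    # 4. Absolute difference of template and normalized image.
    diff = np.abs(normalized - template.astype(np.float64))

    # 5. Per-pixel thresholds: higher near template edges, where gray-scale is
    #    more uncertain due to residual sub-pixel mis-registration.
    edges = ndimage.gaussian_gradient_magnitude(template.astype(np.float32), 1.0)
    thresholds = np.where(edges > edges.mean() + 2 * edges.std(), edge_thresh, base_thresh)
    marked = diff > thresholds

    # 6. Keep only clusters of marked pixels large enough to be true defects.
    labels, n = ndimage.label(marked)
    sizes = np.bincount(labels.ravel())[1:]
    keep_ids = np.nonzero(sizes >= min_defect)[0] + 1
    return np.isin(labels, keep_ids)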
Further Reading
Digital image processing is a broad field with an extensive literature. This introduction could only summarize some of the more important methods in common use, and may suffer from a bias towards industrial applications. We have entirely ignored image compression, 3D reconstruction, motion, texture, and many other significant topics.

The following are suggested for further reading. Ballard and Brown gives an excellent survey of the field, while the others provide more technical depth.
Ballard, D.H. and Brown, C.M. (1982). Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey.

Horn, B.K.P. (1986). Robot Vision. MIT Press, Cambridge, Massachusetts.

Pratt, W.K. (1991). Digital Image Processing, 2nd Ed. John Wiley & Sons, New York, New York.

Rosenfeld, A. and Kak, A.C. (1982). Digital Picture Processing, Vols. 1 and 2, 2nd Ed. Academic Press, Orlando, Florida.
TABLE 2
GEOMETRIC PATTERN MATCHING ACCURACY
Translation: ±0.025 pixels
Rotation: ±0.02 degrees
Cognex Corporation
One Vision Drive, Natick, MA 01760
Tel: (508) 650-3000
Fax: (508) 650-3333
Web: www.cognex.com
Email: mktg@cognex.com

© Copyright 2000, Cognex Corporation. All rights reserved.