Automating Quality Control in Manufacturing Systems
Combining Knowledge-Based and Rule Learning Approaches
Master's thesis
in the Information Systems degree program of the Faculty of
Information Systems and Applied Computer Science
at the Otto-Friedrich-Universität Bamberg

Author: Thomas Hecker
Supervisor: Prof. Dr. Ute Schmid
Submission date: 30.9.2007
Notice:
Parts of this work contain confidential information. Any disclosure to a third party without permission of the Fraunhofer Institute for Integrated Circuits, Erlangen, is hereby strictly prohibited.
Acknowledgments

I would like to thank my thesis advisor, Prof. Dr. Ute Schmid, for her valuable input during the creation of this thesis and the patience and many encouragements she had for me during my entire time at Bamberg University. Her teaching in the area of Cognitive Systems made this time a very rewarding experience.

I would also like to express my gratitude to Dr. Christian Münzenmayer and Dipl.-Ing. Klaus Spinnler for enabling me to write part of my thesis at the Fraunhofer Institute for Integrated Circuits, Erlangen. This thesis would not have been possible without their assistance. Also many thanks to Dr. Thomas Wittenberg, who brought all the aforementioned people together, enabling this thesis in the first place.

Thomas Hecker
Bamberg, Sept. 2007

This document was typeset using a modified version of the i10 diploma thesis template made by Thorsten Karrer from the Media Computing Group at RWTH Aachen University.
Abstract

This thesis explores the possibilities of using Machine Learning methods for automated quality control in the production process of cast iron workpieces. In particular, rule induction methods are combined with existing domain knowledge to train classifiers from workpieces that are each described through a variable list of defects. The research finds that most traditional learning approaches are too restricted to deal with this kind of data. Instead, methods of Inductive Logic Programming show more flexibility in this domain.
Contents

2.1 Industrial Image Processing
2.2 Visual Quality Inspection in Cast Iron Production
2.2.1 The Problem of Pores in the Production Process
2.2.2 The Pore Class PK3
2.2.3 Problems Associated with the Standard
2.2.4 Structure of the Current System
2.3 Machine Learning
2.4 Knowledge Based Systems
2.5 Related Work
3.1 Data Acquisition
3.1.1 Feature Extraction
3.1.2 The Labeling Process
3.2 Analysis of the Cleaned Datasets
3.3 Insights Gained from the Acquisition Process
4 Experiments: General Considerations
4.1 Experiment Setup
4.1.1 Environment
4.1.2 Evaluation Method
4.1.3 Parameter Tuning
4.2 Significance Testing
4.3 Initial Experiments
4.3.1 Description of the Static Model
4.3.2 Experiment 0.a: Static Model vs. Default Classifier
4.3.3 Discussion
5 Experiments I – The Traditional Approach
5.1 Feature Extraction
5.2 Description of the Learning Algorithms
5.2.1 C4.5
5.2.2 CN2
5.2.3 PART
5.2.4 Remarks
5.3 Experiments
5.4 Discussion
6 Experiments II – Combining Fuzzy and Crisp Methods
6.1 A Short Introduction to Fuzzy Logic
6.2 Feature Extraction
6.2.1 The Fuzzification Process
6.2.2 Defuzzification to Symbolic Features
6.3 Experiments
6.4 Discussion
7 Experiments III – Learning Fuzzy Rules
7.1 The FuzzConRI Learner
7.2 Own Extensions to FuzzConRI
7.2.1 Certainty Factors
7.2.2 Non-Binary Defuzzification as Conflict Resolution Strategy
7.3 Experiments III.a - Initial Results
7.4 Adjusting Membership Functions with Genetic Algorithms
7.4.1 Initial Considerations
7.4.2 Genetic Algorithms
7.4.3 Representing Membership Functions for Genetic Algorithms
7.5 Experiments III.b - Tuning Membership Functions
7.5.1 Experiment Setup
7.5.2 Experiments
7.6 Discussion of the Results
8 Experiments IV – An Approach for Learning on Different Levels of Abstraction
8.1 Learning on Different Levels of Abstraction
8.1.1 General Considerations
8.1.2 A Prolog Framework for a PK3 Classifier
8.2 The FOIL Algorithm
8.3 Transformation of Predicates
8.4 Learning Strategy
8.5 Experiments – Learning Very Bad Pores
8.5.1 Experiments IV.a - Going the Most Obvious Way
8.5.2 Experiments IV.b - Using an Informed Search Strategy for more Examples
8.5.3 Experiments IV.c - Learning Location-Dependent Rules
8.6 Discussion of the Results
10.1 Desired Extensions and Suggestions for Further Work
10.1.1 Noise-Robust FOIL
10.1.2 Beam-Search FOIL
10.1.3 Constraining Arguments of Predicates
10.1.4 Back to Fuzzy Logic
10.1.5 Growing Non-Rectangular Regions for Rules
10.1.6 Using Hierarchical Agglomerative Clustering Methods to Learn Location-Specific Rules
10.1.7 Incorporating Misclassification Costs
10.2 Summary
A Description of the Contents of the Accompanying DVD
B Prolog Framework and Ruleset for Multi-Level Classification
List of Figures
2.1 Hardware Set-up of an Industrial Image Processing System
2.2 Structure of the Quality Control System
2.3 Structure of an Expert System
3.1 Masking the Workpiece's Region of Interest
3.2 An Artefact Resulting from an Incorrect Mask Overlay
3.3 Imprecise Measurement of Pore Distance to ROI Border
3.4 Imprecise Measurement of Pore Size: Underestimating Size
3.5 Imprecise Measurement of Pore Size: Overestimating Size
3.6 Images of F1100 (top) and F2000 (bottom) Areas with Critical Edges Marked
3.7 Distribution of Defects on Workpieces
3.8 Distribution of Pore Sizes on Workpieces
4.1 Different Treatment of Small Pores in F1100 and F2000 Areas
5.1 Overview of the C4.5 Algorithm
5.2 The CN2 Algorithm
5.3 Hypotheses Produced from the F2000 Dataset with Numerical Features
6.1 Membership Functions of the Linguistic Variables for Pore Size and Distance to Border
6.2 Fuzzy Max-Min-Inference Illustrated
6.3 A Measured Pore Size of 0.5mm Fuzzified
6.4 A Measured Distance of 1.0mm to the ROI Border Fuzzified
6.5 A Measured Distance of 1.0mm between two Pores Fuzzified
7.1 The FuzzConRI Algorithm
7.2 The Linguistic Variable Used in the Rule Consequents
7.3 General Outline of a Genetic Algorithm
7.4 Representation of L-, LR- and R-Type Fuzzy Sets
7.5 Alternative Representation of L-, LR- and R-Type Fuzzy Sets
7.6 Testing the Fitness of an MF/Learning Parameter Configuration
7.7 Correlation between alpha and Accuracy
7.8 Comparison of Tuned Membership Functions
8.1 Classification on Different Levels of Abstraction
8.2 The PK3 Standard as Prolog Rules
8.3 The FOIL Algorithm
8.4 Model with Location-Specific Rules
10.1 Approximating Non-Rectangular Areas
10.2 Approximating Areas Through Connected Circles
List of Tables
3.1 Class Distributions on the Cleaned Datasets
4.1 Comparison of Prediction Accuracy between Default (Majority Vote) Classifier and the Static Model
4.2 Confusion Matrices for the Static Model
5.1 Accuracy Estimates of the Decision Tree & Rule Learners on the Datasets with Numerical Features
5.2 Improvement of the Learners on Numerical Features in Comparison to the Static Model
5.3 Confusion Matrices on the F2000 Dataset Comparing C4.5 and CN2
6.1 Accuracy Estimates of the Decision Tree & Rule Learners on the Datasets with Symbolic (Fuzzy–Crisp) Features
6.2 Improvement of the Learners on Symbolic (Fuzzy–Crisp) Features in Comparison to the Static Model
7.1 Accuracy Estimate of the FuzzConRI Learner
7.2 Accuracy Estimates of the FuzzConRI Learner with Tuned Fuzzy Membership Functions
7.3 Improvement of Fuzzy Learners with Membership Function Tuning
7.4 Comparison of alpha Values and Accuracy
8.1 Accuracy Estimates of the Initial Multi-Level Learner
8.2 Accuracy Estimates of the Multi-Level Learner with Improved Example Search
8.3 Accuracy Estimates of the Multi-Level Learner with Location-Specific Rules
9.1 Overview of the Best Accuracies Achieved
Chapter 1
Introduction
Quality control is an important step in the industrial production process and a component of the quality management process. It ensures that produced goods leaving the factory meet some minimum quality standard set by the manufacturer, its customers or government regulations. Quality control practices such as the Zero Defects methodology (Crosby, 1979) advocate on-line quality inspection during the production process to ensure that no defective product is shipped at all. But such requirements can only be effectively achieved if the quality inspection process is performed in a consistent way and the result conforms to the quality standard.

Humans, despite their ability to quickly learn complex tasks, do not have the ability to perform such tasks in a consistent way over a long time.

By contrast, machines (in particular: computers) do have this ability to consistently do what they are told to. But what may first seem like a virtue can also be a curse: machines tend to be rigorous about the precise execution of any rule, not allowing the smallest deviation.

What may first seem the perfect tool for a zero-defects production process may actually turn out to be an economic disaster, because explicit quality standards tend to be overly restrictive for the sake of simplicity. So when the quality standard is fed to a computer, the computer will apply it in a rigorous manner and thus reject more products than can rationally be justified by the actual quality requirements.
By contrast, when human quality controllers apply these standards, they enforce them in a different way which is more robust to small deviations, but may vary over a longer time, resulting in the aforementioned inconsistent performance. However, this behavior is inherently complex and hard to explicate as general rules, hence it is not possible to explicitly tell a machine to behave in exactly the same way.

A solution for this problem may come from the area of Machine Learning, where machines learn from examples produced by humans, trying to identify the general patterns and build a hypothesis for how things actually work. This approach has the advantage that it does not require explicitly specifying the expected behavior. However, the learned behavior need not necessarily be correct and might just be sufficient to perform correctly on the training examples.
An often desired property of such a hypothesis, apart from being correct, is that the hypothesis should not only be interpretable by computers but also by human beings, to allow sanity-checking or manual alteration, for example. Hypotheses such as decision trees or production rules are known to be easily interpretable by humans (Mitchell, 1997).

This thesis tries to explore the potential use of Machine Learning methods for automating quality control in the production of cast iron parts, while also investigating how existing knowledge about the problem domain can be incorporated into the process.
The remainder of this thesis is organized as follows: Chapter 2 gives a short overview of the work's overall context. Chapter 3 describes the data used in the experiments and how it was acquired. Chapter 4 contains some general considerations regarding the upcoming experiments. In Chapters 5–8, experiments are conducted using different learning approaches. Chapter 9 discusses the overall results. Finally, Chapter 10 contains concluding remarks and gives suggestions for further work.
Chapter 2
Context of this Work
"Computer vision largely deals with the analysis of pictures in order to achieve results similar to those obtained by man." (Levine, 1985)

"Automated visual inspection systems are able to deliver excellent recognition results continuously and reliably, equal to the average performance of humans over time, even better in some areas, provided the inspection task has been described precisely and in detail, in a way appropriate for the special characteristics of machine vision." (Demant, Streicher-Abel, & Waszkewitz, 1999)
2.1 Industrial Image Processing

Image processing systems are used throughout many industries. The systems are not only used for quality inspection, as in the case of this work, but also for other tasks such as process control. Figure 2.1 depicts the hardware set-up of a typical industrial image processing system.

Due to the various tasks that can be performed in industrial image processing, Jähne et al. (1995) categorize these systems based on the following objectives:

• object recognition
• position recognition
Figure 2.1: Hardware set-up of an industrial image processing system (Source: Demant et al. (1999))
• completeness check
• shape and dimension check
• surface inspection
The system discussed in the context of this work is a surface inspection system. The general steps of an inspection task are as follows (Demant et al., 1999):

5. computation of object features
6. decision as to the correctness of the segmented objects
It is essentially the segmentation step which identifies defects on the surface of the inspected part. There are various image segmentation techniques that can be chosen from; an overview of these is given in Jähne et al. (1995). However, the actual processing of images is not within the scope of this work. Rather, the next section will discuss what happens when defects are detected on the inspected workpiece.
2.2 Visual Quality Inspection in Cast Iron Production
2.2.1 The Problem of Pores in the Production Process
When hot liquid iron is cast into a form, pores may develop inside and on the surface of the created workpiece. These pores originate either from gas enclosures or arise from differences in the material's density during the cooling process.

Aside from the negative influence of pores on the visual appearance of a workpiece, pores may also compromise the structural integrity of a workpiece or its leak tightness when they occur near a sealing surface (VDG, 2002).
Pores will inevitably occur, and thus quality control must ensure that the pores contained in a workpiece do not compromise its functionality. Different methods are used to check for pores, depending on where they occur. Pores occurring inside the workpiece may be checked by x-raying, while pores appearing on the workpiece's surface are checked through visual inspection.

When a workpiece is checked for pores, it is divided into different areas, for each of which different constraints regarding the occurring pores apply. For example, the area near a sealing surface may have stricter limitations regarding pores than an area that has no function at all. These restrictions are recorded in Pore Classes, each of which applies to different types of areas.

If an area does not meet the requirements of its associated pore class, then the workpiece is rejected as a 'bad' or defective part.

The workpiece areas used in the upcoming experiments are all subject to the same pore class PK3, which is covered in the next section.
2.2.2 The Pore Class PK3
The pore class PK3 defines constraints on pores that occur on the surface of certain areas, i.e. areas that can be checked through visual inspection.
The PK3 standard is defined as follows:

• only pores > 0.4mm are checked (smaller pores are ignored)
• pores up to a size of 0.7mm are permitted on the surface
• pores must have a mutual distance of at least 15mm, otherwise they will be treated as a pore nest (see below)
• the distance of a pore to the edge of the surface is not checked
• Exceptions:
  – one pore nest with up to 3 pores is permitted, as long as the pores have a mutual distance of at least 1.5mm
  – one single pore of max. 1.0mm is permitted (i.e. the pore must not occur in a nest)
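As a concrete illustration, the checks above can be sketched as a small classifier over a defect list. This is only a sketch: how pores are grouped into nests and how the two exceptions interact with the basic rules is this sketch's own interpretation, not the official definition of the standard.

```python
from itertools import combinations

def pk3_check(pores):
    """Sketch of a PK3 check. Each pore is a tuple (diameter_mm, x_mm, y_mm).
    Nest grouping and the interplay of the exceptions are assumptions."""
    # pores of 0.4 mm or less are ignored entirely
    relevant = [p for p in pores if p[0] > 0.4]

    def dist(a, b):
        return ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5

    # pores closer than 15 mm to each other form a nest (connected components)
    nests = []
    for p in relevant:
        touching = [n for n in nests if any(dist(p, q) < 15.0 for q in n)]
        for n in touching:
            nests.remove(n)
        nests.append([p] + [q for n in touching for q in n])

    singles = [n[0] for n in nests if len(n) == 1]
    real_nests = [n for n in nests if len(n) > 1]

    # exception 1: at most one nest, at most 3 pores, pairwise >= 1.5 mm apart
    if len(real_nests) > 1:
        return "bad"
    for nest in real_nests:
        if len(nest) > 3 or any(p[0] > 0.7 for p in nest):
            return "bad"
        if any(dist(a, b) < 1.5 for a, b in combinations(nest, 2)):
            return "bad"
    # singles up to 0.7 mm are fine; exception 2: one single up to 1.0 mm
    oversized = [p for p in singles if p[0] > 0.7]
    if any(p[0] > 1.0 for p in oversized) or len(oversized) > 1:
        return "bad"
    return "good"
```

For example, a lone 0.9mm pore passes under the single-pore exception, while two such pores far apart do not.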
2.2.3 Problems Associated with the Standard
As already indicated in the introduction, the PK3 standard, as well as the other standards, tends to be overly simplistic, due to the fact that these standards must be enforced by humans, who need to be trained first.

If the standards were more complex, it would be very difficult to learn to apply them properly, considering that there are multiple pore classes to be applied on different areas of a workpiece.

However, it would be unusual if the simplification of a quality standard made it less restrictive; after all, that would not live up to the idea of a quality standard. Consequently, the standard becomes more restrictive, leading to an overly high rejection rate if applied precisely as it was defined.
The fact that PK3 is applied differently by the human quality controllers can be inferred from the following observations:

• PK3 defines hard thresholds for all numbers. For example, two pores of size 0.7mm are permitted, but not if they had a size of 0.71mm. A human will not distinguish between these two sizes, as they are too close.
• The exception regarding pore nests also seems simplified. The restrictions for size, distance and count of pores do not relate to each other. For example, three pores of up to 0.7mm size are allowed, but four pores of just 0.4mm (which is just above the size for being ignored completely) would be illegal.

• Human quality controllers tend to apply the PK3 standard more strictly if a pore is near a critical area, such as the edge of a sealing surface. Usually, this would require another pore class to be defined for these areas, but again, this would make the instruction of quality controllers very complicated.

So it is assumed that, over time, the human quality controllers gather tacit knowledge about the "real" quality standards for a workpiece, which is passed on to other controllers through the process of socialization, as described by Nonaka and Takeuchi's Knowledge Spiral (Nonaka & Takeuchi, 1995). However, this knowledge is never explicated in a way that would allow it to be used in an automated quality control system.
2.2.4 Structure of the Current System
The visual inspection system currently used for quality control consists of several components, which are depicted in Figure 2.2. The major steps of the inspection process are performed as follows:

1. Image Acquisition: A digital grayscale image of a workpiece's area is acquired.
2. Defect Detection: Potential defects (pores) are identified in the image's region of interest (ROI). A list of identified defects and their characteristics is produced in this step.
3. Classification: A static classifier model, which implements the rules of the PK standard for the respective area, processes the defect list and classifies the area as good or bad.
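The three steps can be pictured as a minimal pipeline. The function names and interfaces below are assumptions made for illustration; the real system's component interfaces are not documented here.

```python
# Minimal sketch of the three-step inspection flow; detect_defects and
# classify are hypothetical stand-ins, not the real system's components.

def detect_defects(image):
    """Step 2 stand-in: treat every bright pixel as one detected defect."""
    return [(x, y) for y, row in enumerate(image)
            for x, value in enumerate(row) if value > 200]

def classify(defects):
    """Step 3 stand-in for the static model: reject if anything was found."""
    return "bad" if defects else "good"

def inspect_area(image):
    # step 1 (image acquisition) is assumed to have produced `image`
    defects = detect_defects(image)
    return classify(defects)

print(inspect_area([[10, 10], [10, 255]]))  # one bright pixel -> "bad"
```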
Note: Although a workpiece may consist of several areas, these are checked independently of each other. If one area is classified as "bad", then the whole workpiece is rejected. Therefore,
Figure 2.2: Structure of the quality control system. The structure has been simplified to focus only on the components relevant to the current work.
the terms "workpiece" and "area" will be used interchangeably for the remainder of the work.

The aim of this thesis is to replace the static classifier model of the current classification component with a model trained from observed data, or to combine both.
2.3 Machine Learning

Machine Learning is a discipline that emerged from Statistics and Computer Science to answer the question

"How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" (Mitchell, 2006)

Unlike traditional Artificial Intelligence systems that reason about existing knowledge, machine learning systems try to create new knowledge by inducing a hypothesis over observed data.
The following definitions are adapted from Mitchell (1997):

Let f : X → O be some unknown target function that maps the elements of an instance space X to the values of some output space O.

In the context of this work, X would be the set of all workpiece images and O would be a two-valued set {good, bad}. The target function f would be performed by the human quality controllers when inspecting workpieces.
Because an explicit formal representation of the target function f is unknown, it must be learned from a set of training examples T, whose elements are pairs <x_i, f(x_i)>, where the x_i are instances from X and the f(x_i) ∈ O are their assigned target function values.

A learner L (also: inducer) will try to identify patterns in the data of the training set to explain the relation between the instances (as described through certain characteristics) and their assigned function values. At the end of this training process, the learner produces a hypothesis h : X → O, which is an approximation of f. Sometimes a hypothesis will also be referred to as a model.

During the training phase, the learner uses certain assumptions about the data as well as assumptions about which hypothesis should be preferred over another. These assumptions constitute the learner's inductive bias (also: learning bias) and are a requirement in order to generalize over the data.
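To make these definitions concrete, the following toy example (not taken from the thesis) shows a learner whose inductive bias is that the hypothesis must be a single threshold on pore size; given training pairs <x_i, f(x_i)>, it returns the hypothesis h with the fewest mistakes on T.

```python
# A toy illustration of the definitions above: T holds pairs <x_i, f(x_i)>,
# the inductive bias restricts h to one-threshold rules, and the learner
# returns the hypothesis h minimizing training error.

def learn_threshold(training_set):
    """Return h: size -> {"good", "bad"} for the best threshold found."""
    candidates = sorted({size for size, _ in training_set})
    best = min(candidates, key=lambda t: sum(
        ("bad" if size > t else "good") != label
        for size, label in training_set))
    return lambda size: "bad" if size > best else "good"

T = [(0.4, "good"), (0.6, "good"), (0.9, "bad"), (1.2, "bad")]
h = learn_threshold(T)
print(h(0.5), h(1.0))  # good bad
```

Note that h is only guaranteed to fit the training examples; as the text above points out, it need not be correct on unseen instances.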
A hypothesis can be described through various representation formalisms, some of which differ in expressiveness and comprehensibility by human beings. Which kind of formalism is used in the training process depends on the learner that is used as well as on the representation of the instance space. In general it is safe to say that symbolic representation forms such as decision trees and production rules are more easily understood by humans than other, numeric or sub-symbolic formalisms. Therefore, these formalisms will be embraced in the upcoming experiments.

Machine Learning, and in particular rule induction, has been successfully applied to many different domains, including quality control (for an overview, see for example Langley and Simon (1995)). However, for image classification tasks such as visual quality control, artificial neural networks are probably the most widespread approach, due to their robustness to noise and missing information.
2.4 Knowledge Based Systems

Knowledge Based Systems (KBS), or Expert Systems (XPS), are not trained from observed data as is done in machine learning. Instead, expert systems are engineered from existing knowledge that is acquired from domain experts.
Figure 2.3: Structure of an Expert System (Source: Nikolopoulos (1997))

The general architecture of such a system is illustrated in Figure 2.3. The following definitions of the XPS components are adapted from Nikolopoulos (1997):
• Knowledge Base
The knowledge base contains specific knowledge about the problem domain ("domain knowledge"), as well as control knowledge, which is more general knowledge about strategies to be used in the overall problem solving process.

• Inference Engine
The inference engine contains some general algorithms that are applied to the knowledge base in order to solve a problem. It is important to understand that the inference engine is a standardized piece of software that is not domain-dependent and can be exchanged for another engine as long as it supports the same knowledge representation formalism that is used for the knowledge base.

• Knowledge Acquisition Module
The knowledge acquisition module is used by the knowledge engineer to model and test the acquired knowledge.

• Explanation Module and User Interface
When using the expert system for a particular case of a problem, the user interacts in a dialog with the system, which in turn asks certain questions about known facts that help in the problem solving process. If the XPS has come to a conclusion regarding the
problem, it may use the explanation module to present the inference process to the user.
Practically every software program incorporates some kind of domain knowledge. Yet, this does not make it a KBS. The key distinction between a KBS and ordinary software is the separation between domain (and control) knowledge and the inference engine that performs the actual problem solving process. Hence, the current automated quality control system used for the classification of workpieces is no KBS, even though it contains some domain knowledge in the form of the PK3 quality standard. In Chapter 8, the same system is re-designed as a Prolog-based KBS.
Nikolopoulos (1997) also describes hybrid expert systems, whose knowledge base does not only consist of knowledge elicited from human domain experts, but also contains knowledge that was acquired using machine learning techniques. This knowledge may have the same representation and can thus be stored together with the expert knowledge, or the system accommodates a trained hypothesis such as an artificial neural network that is only used for specific sub-problems. The reason for using such hybrid expert systems is that some problem domains are not entirely well understood, so that not all the knowledge required for solving a task can be elicited from a domain expert.
2.5 Related Work

As previously mentioned, many applications of machine learning for visual quality control use artificial neural network approaches:

Konig, Windirsch, Gasteier, and Glesner (1995) demonstrate Novas, an artificial neural network architecture that can be trained to detect defects in digital images.

Bahlmann, Heidemann, and Ritter (1999) use artificial neural networks (in particular, self-organizing feature maps) for the classification of textile seams based on features acquired through visual inspection.

Chang et al. (1997) use a trained neural network to classify cork stoppers based on pre-processed defect features acquired through visual inspection.
Other approaches use Fuzzy Logic in combination with rule learning or decision trees:

Chou, Rao, Sturzenbecker, Wu, and Brecher (1997) report a system used in semiconductor manufacturing that learns fuzzy rules for the classification of single defects on semiconductors. The input data is based on visually acquired and pre-processed features. The method uses a modified version of Kosko's Fuzzy Associative Memory (Kosko, 1992) and reportedly achieves an accuracy similar to that of methods such as C4.5 and probabilistic neural networks.

Adorni, Bianchi, and Cagnoni (1998) report successfully training a fuzzy decision tree from pre-processed image features for use in quality control of pork meat.
Further approaches rely solely on image processing techniques or use them in combination with knowledge based systems:

Perner (1994) developed a quality control system for the offset printing process. The system uses image processing techniques for defect detection, but a knowledge based system for defect classification and determination of their causes and appropriate countermeasures.

Njoroge, Ninomiya, Kondo, and Toita (2002) report a visual inspection system for the automated grading of agricultural products. The system is entirely based on image processing techniques. Also, the system developed by Al-Habaibeh et al. (2004) for the quality control of a laser sealing process for food containers is based entirely on visual inspection.

More experimental approaches to image processing include the symbolic processing of images as part of image understanding. For this purpose, Behnke presents the approach of a hierarchical neural network, the Neural Abstraction Pyramid (Behnke & Rojas, 1998; Behnke, 2003).

Another approach, taken by Hennessey, Hahn, and Lin (1991), parses images of integrated circuits into a syntactic representation and uses knowledge based systems to detect and classify defects.
In conclusion, it can be said that various approaches have been tried for automating visual quality control; however, soft computing approaches like artificial neural networks or fuzzy logic seem to be the most popular choice.
Chapter 3
Description of the Data
For the experiments, two sets of images were used. Each set contained images of one area of a workpiece (denoted F1100 and F2000). As noted in the last chapter, it is not important whether the images of these two areas originate from the same workpieces or not, since the classification tasks are independent of each other.

The pictures are a sample recorded on-line during the production process, so it can be said that they were made under realistic conditions. But before being able to start with the experiments, the defect detection process had to be performed to produce the lists of defects for each image.
3.1 Data Acquisition

3.1.1 Feature Extraction
Before the actual defect detection process is started, the acquired image is compared to a reference image in order to detect and correct a potential offset resulting from a minor displacement of the workpiece. Then, a mask overlay is applied to the region of interest (ROI), i.e. the actual area that is to be checked (see Figure 3.1).¹ Because the mask may not

¹ The images of the F2000 area contained several variants of the workpiece, for which slightly different masks were used. So it may be that some locations in an F2000
perfectly fit the acquired image, the mask is refined to approximate the edges of the actual ROI borders.

Then, the defect detection algorithm is applied on the masked area, which produces a list of defects that were identified in the image's ROI. Each defect is characterized through the following features:

• size (diameter)
• x/y coordinate of center of gravity
• distance to the ROI border
• distance to all other defects found in the image
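One possible way to represent these per-defect features as a record is sketched below; the field names are illustrative assumptions, not the system's actual data format.

```python
# Hypothetical record type for the per-defect features listed above.
from dataclasses import dataclass, field

@dataclass
class Defect:
    size_mm: float                  # estimated diameter
    x_mm: float                     # center of gravity, x coordinate
    y_mm: float                     # center of gravity, y coordinate
    border_dist_mm: float           # distance to the ROI border
    neighbor_dists_mm: list = field(default_factory=list)  # distances to all other defects

# a workpiece image is then described by a variable-length list of defects
piece = [Defect(0.6, 12.0, 40.5, 2.1, [7.3]),
         Defect(0.4, 18.2, 44.0, 5.0, [7.3])]
print(len(piece))  # 2
```

The variable length of this defect list is exactly what, as the abstract notes, makes the data awkward for traditional fixed-attribute learners.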
As it turns out, there are several sources of imprecision in the
measurements:
• in some parts of the image, the masked ROI also captures parts
of the true ROI's border, resulting in an "artefact" that is
incorrectly identified as a defect (see Figure 3.2)
• in other cases, the mask refinement algorithm stops too early,
before the true ROI border, resulting in an underestimation of a
defect's distance to the true ROI border (see Figure 3.3). However, a
pore's distance to the border is never overestimated
• sometimes, a pore's size is slightly underestimated, because the
detection algorithm identified only part of the pore's true
surface (see Figure 3.4). This deviation is usually only one or two
pixels
• while the potential for underestimating pore size is relatively low,
the potential for overestimating it is much greater, because
the diameter of a pore is not measured directly. Instead, the largest
side of the pore's bounding box is taken as an estimate, which
may result in a significant deviation from the true diameter if the
pore is not round (see Figure 3.5)
• the x/y coordinates of a defect may deviate slightly due to an
uncorrected image offset
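The overestimation effect of the bounding-box rule can be illustrated numerically. The following toy sketch (not the actual detection code; coordinates are in pixels) shows how taking the largest bounding-box side overstates the diameter of a non-round pore:

```python
def bbox_diameter(pixels):
    """Estimate a pore's 'diameter' as the largest side of its bounding
    box, mirroring the estimation rule described above (toy sketch,
    units in pixels)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return max(max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

# An elongated pore, 8 pixels long and 2 pixels wide (surface: 16 pixels):
elongated = [(x, y) for x in range(8) for y in range(2)]
# A compact 4x4 pore with the same surface of 16 pixels:
compact = [(x, y) for x in range(4) for y in range(4)]

print(bbox_diameter(elongated))  # 8 -- the long side dominates
print(bbox_diameter(compact))    # 4 -- same surface, half the estimate
```

Both toy pores cover 16 pixels, yet the elongated one receives twice the diameter estimate, which is exactly the deviation described for non-round pores.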
image may not be part of the ROI in other images. These differences are rather small,
though, so all images were used for the same dataset.
Figure 3.1: Masking the Workpiece's Region of Interest
3.1.2 The Labeling Process
Since the judgments of the human quality controllers were not available
for labeling the workpiece images as good or bad, the author of this
work was instructed by a subject matter expert in how to apply the
PK3 standard correctly to the F1100 and F2000 areas.
Part of this process was the definition of critical edges, i.e. those edges of
the ROI border that adjoin sealing surfaces or require for some other
reason that the PK3 standard be interpreted more strictly near these
edges. Figure 3.6 shows these critical edges for both datasets.
Also, as part of the labeling process, all artefacts that were incorrectly
identified as defects were marked and later removed. The rationale for
Dataset   Size   Class "good"   Class "bad"

Table 3.1: Class distributions on the cleaned datasets. Note that
workpieces that contained no defects were already removed from the datasets.
this was that the images should only be judged based on their visual
appearance, not on the result of the defect detection process, which
cannot be influenced in the context of this work. So if artefacts were not
removed and fed to the learner, they would prevent the learner from
generalizing properly over the data, because good workpieces may also
contain artefacts, and the large size of the artefacts would dominate all
other defects. Also, future improvements of the mask refinement
process are likely to tackle the problem of artefacts.
After all workpiece images had been labeled, those images where no
defects were detected, or where the only defects turned out to be
artefacts, were removed from the dataset. The reason for this was that a
learner can only judge a workpiece based on the defects found. Thus, if
no defects are detected, the outcome will always be good, and therefore
this kind of data is of no help in the learning process.
3.2 Analysis of the Cleaned Datasets

Table 3.1 shows the class distributions of the cleaned datasets. As it
turns out, the class distributions are relatively balanced, which makes
the learning task easier compared to extremely imbalanced class
distributions. However, it must be noted that both datasets are relatively
small.
The average count of defects encountered on an F1100 workpiece is 4.14,
and 7.23 for the F2000 area. Figure 3.7 shows the distribution.
The mean size of a pore is 0.594 mm on the F1100 dataset and 0.464 mm
on the F2000 dataset. Figure 3.8 shows the distribution of the sizes.
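The cleaning step and the summary statistics above can be sketched in a few lines. The data below is a toy stand-in; the actual figures (4.14 and 7.23 defects on average, mean pore sizes of 0.594 mm and 0.464 mm) come from the real datasets:

```python
def clean_and_summarize(workpieces):
    """Drop workpieces whose defect list is empty (no defects detected,
    or artefacts only after artefact removal), then report the mean
    defect count and the mean pore size. Each workpiece is represented
    here simply as a list of pore sizes in mm (toy representation)."""
    cleaned = [wp for wp in workpieces if wp]
    n_defects = sum(len(wp) for wp in cleaned)
    mean_count = n_defects / len(cleaned)
    mean_size = sum(s for wp in cleaned for s in wp) / n_defects
    return cleaned, mean_count, mean_size

data = [[0.5, 0.7], [], [0.4], [0.6, 0.3, 0.5]]  # one defect-free workpiece
cleaned, mean_count, mean_size = clean_and_summarize(data)
print(len(cleaned), mean_count, round(mean_size, 3))  # 3 2.0 0.5
```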
3.3 Insights Gained from the Acquisition Process
During the process of labeling the workpiece images, some insights were
gained about how a human quality controller judges the workpieces:
• the human expert first inspects each pore as a single entity.
If one pore is somehow very critical, the workpiece gets
immediately rejected
• then, pores are examined in their environment, in particular in their
proximity to other pores. If some nest of pores is judged as
critical, the workpiece is immediately rejected
• if the preceding steps found pores that are
somewhat critical but do not alone justify the rejection of the workpiece,
then the 'big picture' of the workpiece is checked again to make a
decision
• as already mentioned, pores near critical edges are subject to
stronger constraints than pores that lie far away from such borders
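Observed in sequence, this judgment process forms a cascade of checks. The sketch below renders it schematically; the three predicate arguments are placeholders, since the expert's actual PK3-based criteria are not formalized here:

```python
def expert_judgment(pores, pore_is_critical, nest_is_critical, big_picture_ok):
    """Schematic cascade of the observed expert judgment. The three
    predicate arguments are placeholders for the real PK3-based
    criteria, which this sketch does not formalize."""
    # 1. Inspect each pore as a single entity.
    if any(pore_is_critical(p) for p in pores):
        return "bad"
    # 2. Examine pores in their environment (nests of nearby pores).
    if nest_is_critical(pores):
        return "bad"
    # 3. Somewhat-critical pores alone do not justify rejection;
    #    the 'big picture' of the workpiece decides.
    return "good" if big_picture_ok(pores) else "bad"

# Toy usage: pores given as diameters; a single pore above 0.8 mm is critical.
verdict = expert_judgment([0.9, 0.3],
                          pore_is_critical=lambda p: p > 0.8,
                          nest_is_critical=lambda ps: False,
                          big_picture_ok=lambda ps: True)
print(verdict)  # bad
```

The early-exit structure mirrors the observation that a single very critical pore or a critical nest short-circuits the decision, while the holistic check is only reached for borderline cases.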
The next chapter will show how the strict application of the PK3
standard differs from the judgments of a human expert.
Figure 3.2: An Artefact Resulting from an Incorrect Mask Overlay
Figure 3.3: Imprecise Measurement of Pore Distance to ROI Border. The
ROI border detection algorithm returned a distance of 0.08 mm for all
three pores.
Figure 3.4: Imprecise Measurement of Pore Size: Underestimating Size.
The picture on the left shows the parts of the pores' surfaces that have
been detected by the segmentation process (contours marked red inside
the squares). The picture on the right shows that the real pores are
slightly bigger.
Figure 3.5: Imprecise Measurement of Pore Size: Overestimating Size.
Both pictures show pores whose size (diameter) has been estimated at
0.8 mm. The pore on the left, however, has a significantly smaller
surface. The high estimate results from the fact that only the largest
side of the bounding box is taken as the size estimate.
Figure 3.6: Images of the F1100 (top) and F2000 (bottom) areas with
critical edges marked.
Figure 3.7: Defects Encountered on Workpieces. Note that these
figures are based on the cleaned dataset, where all images containing no
defects were removed.
Figure 3.8: Pore Sizes of Defects Encountered on Workpieces. The fact
that no pores with a diameter of less than 0.3 mm or 0.25 mm were
encountered is due to the defect detection algorithm being set to ignore
defects below these sizes.
Chapter 4

Experiments: General Considerations
This chapter describes how the experiments were conducted and what
evaluation methods their results are based upon. Because it is important to
keep experiments repeatable, all experiment setups, as well as the
software programs and libraries required to conduct the experiments, are
included on the accompanying DVD (see also Appendix A).
4.1 Experiment Setup

4.1.1 Environment
All experiments were conducted using the RAPIDMINER 4.0 Machine
Learning environment (Mierswa, Wurst, Klinkenberg, Scholz, &
Euler, 2006), formerly known as YALE. This environment was chosen
because it provides a standardized framework in which different learning
algorithms can be combined with operators for many recurring tasks
such as cross-validation and parameter tuning, thus freeing the author
from implementing these functionalities for each learning algorithm by
hand.
RAPIDMINER is Open Source software, which provides the freedom
to extend it; this was necessary in some cases during the
experiments. Also, it is possible to integrate RAPIDMINER with other
applications, which is especially useful when trained classifiers are
supposed to be used for on-line classification tasks.
Another advantage of RAPIDMINER is its de facto platform
independence due to the use of JAVA technology, which makes it accessible
to a broad circle of users, while other environments such as MLC++
(Kohavi, Sommerfield, & Dougherty, 1996) are confined to the
operating systems and hardware architectures for which they were originally
conceived.
During an earlier project (Hecker & Mennicke, 2006) it soon became
apparent that porting older machine learning software to more recent
system architectures is a non-trivial task and sometimes impossible, as
in the case of MLC++. The JAVA-based collection of popular Machine
Learning algorithms, WEKA (Witten & Frank, 2005), addresses this
problem by making the algorithms available on all JAVA-enabled platforms.
In many publications that make use of older Machine Learning
algorithms, the WEKA implementation of these algorithms is used instead
of the implementation by the original authors. RAPIDMINER fully
incorporates the WEKA library. Although RAPIDMINER provides its own
implementations of some learning algorithms used in the experiments,
the WEKA implementations were preferred due to their higher popularity.
It must be noted, though, that the WEKA algorithms do not always offer
the full functionality of the original implementations. Whenever there
are relevant differences in functionality, this will be noted in the text.
Finally, some learning algorithms used in the experiments were not
available in the WEKA library (nor in RAPIDMINER). These
algorithms had to be newly implemented in JAVA and integrated with
RAPIDMINER as a plugin. In other cases it was more appropriate to
build a JAVA wrapper around the original implementation, allowing it
to be used within RAPIDMINER.
4.1.2 Evaluation Method
When conducting experiments to assess a learner’s ability to generalize
over observed data, the main concern is to get an accurate estimate of
the learner’s error rate when classifying unseen instances
Mitchell (1997) notes that a learner's true accuracy usually cannot be
determined, since that would require evaluating the learner over the
whole population of instances. Therefore the learner, or more specifically
the hypothesis produced by the learner, is evaluated against a (hopefully
representative) sample of the general population of instances, in order
to produce an estimate of its true generalization error; this estimate is
also referred to as the sample error.
Usually there is only one labeled dataset D, which functions as this
representative sample. Hence the available data must be used both to
produce a hypothesis from the learner and to evaluate that hypothesis.
There are multiple strategies for making the best use of the available data
to produce a good estimate of the learner's generalization error. An
overview of evaluation methods is given in Mitchell (1997) and Bramer
(2007); a more comprehensive comparison from a statistical
perspective can be found in Kohavi (1995).
Probably the most common evaluation strategy found in the Machine
Learning literature is k-fold cross-validation (CV). With k-fold CV,
the dataset D is split into k disjoint subsets of equal size,
D_1, ..., D_k. The training set T_i = D - D_i is formed by removing
one fold from the dataset and assigning the latter as the test set S_i = D_i.
The learner is then trained on T_i, yielding hypothesis h_i, which is
evaluated against the test set S_i, resulting in a count e_i of misclassified
examples. This procedure is repeated k times so that each fold is
used once as the test set. The total count of misclassified examples is then
used to compute the sample error (micro-average)

    err = \frac{1}{|D|} \sum_i e_i

or, alternatively, the accuracy estimate 1 - err.
The standard deviation σ_e of the sample error (the "standard error") is then
computed as

    \sigma_e \approx \sqrt{\frac{err \cdot (1 - err)}{|D|}}

This computation treats the total number of misclassified instances as a
binomially distributed random variable, for which the probability of a
misclassification is err (Mitchell, 1997).¹
¹ Note: RAPIDMINER only computes the standard error for the macro-averaged
sample error, which computes a sample error for each fold and then averages
these errors, yielding much more extreme values if the folds are small. In this work,
however, the micro-averages are used as described above.
K-fold CV makes the best use of the data, as each instance in the dataset
is used exactly once in an error estimate and k - 1 times for training a
hypothesis (Mitchell, 1997). Hence this method is preferred when D is
relatively small.
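The scheme just described, including the micro-averaged error and its standard error, can be sketched without any ML library. The code below is schematic; a trivial majority-class learner stands in for a real one:

```python
import math

def kfold_error(data, labels, train_fn, k):
    """k-fold cross-validation with a micro-averaged sample error:
    misclassifications are counted over all folds, then divided by |D|."""
    n = len(data)
    misclassified = 0
    for i in range(k):
        test_idx = set(range(i * n // k, (i + 1) * n // k))
        train_x = [data[j] for j in range(n) if j not in test_idx]
        train_y = [labels[j] for j in range(n) if j not in test_idx]
        h = train_fn(train_x, train_y)           # hypothesis h_i trained on T_i
        misclassified += sum(h(data[j]) != labels[j] for j in test_idx)
    err = misclassified / n
    std_err = math.sqrt(err * (1 - err) / n)     # binomial approximation
    return err, std_err

# Stand-in learner: always predict the training set's majority class.
def majority_learner(train_x, train_y):
    prediction = max(set(train_y), key=train_y.count)
    return lambda x: prediction

labels = ["good"] * 6 + ["bad"] * 4
err, se = kfold_error(list(range(10)), labels, majority_learner, k=10)
print(err)  # 0.4 -- every "bad" instance is misclassified by the majority learner
```

Setting k = |D|, as in this usage example, gives exactly the leave-one-out scheme discussed next.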
Kohavi (1995) compares the sample errors for different values of k and
finds that for smaller values of k, the error estimates are
pessimistically biased. The reason for this phenomenon is that by dividing D into
only a few partitions, the test set S_i used for each run is relatively large
compared to the training set, which in turn means that more data is
withheld from the learner when it forms the hypothesis h_i, thus
lowering its chances to find the truly relevant patterns in the data.
Consequently, Kohavi finds that the most extreme form of CV, leave-one-out
CV, where k = |D| folds containing only one instance each are used,
produces the most accurate and least pessimistically biased error
estimates.
A drawback of leave-one-out is its high computational cost, rendering
it relatively unattractive for large datasets In addition, large datasets
provide sufficiently large training sets even when k is small, making the
error estimate relatively insusceptible to perturbations of the training
set
A review of the Machine Learning literature reveals that most researchers
choose 10-fold CV as a trade-off between bias in the error estimate and
computational cost. However, it must be noted that the datasets
commonly used in the literature² usually have sizes of at least 300 instances.
The F1100 and F2000 datasets used here have 138 and 155 instances,
respectively, which is very small for such a learning problem. Therefore
leave-one-out CV was chosen for error estimation, in order to
leave as many instances as possible to the learner and thus reduce the
pessimistic bias in the estimate as much as possible.
4.1.3 Parameter Tuning
Many learners allow their learning algorithms to be adjusted to the
training set by providing a set of adjustable learning parameters. When
choosing a process to determine the right parameters for each learner,
several questions arise:
2 For a collection of publicly available datasets visit
http://mlearn.ics.uci.edu/MLRepository.html
• How can the parameter tuning process bias the learner's estimate
of the true generalization error, and how can this be prevented?
• How can it be ensured that the tuning process is free from
subjective influences, i.e. did the tuning process yield the best possible
parameters for each learner?
Given that there are m different learning parameter configurations for a
learner and the best one should be selected, which one would that
be? In many cases the one that yields the hypothesis with the lowest
true generalization error would be preferred. But as explained
in the last section, the true error is unknown and must be estimated
from the misclassifications on a test set. Hence, selecting the
parameter configuration with the lowest error estimate may add an optimistic
bias to the final estimate, because the parameter configuration might
have been optimal only for that particular test set, not for other unseen
instances independently drawn from the general population.
A solution to this problem is to further remove instances from the
training set T, resulting in a validation set V and a training set T′. The
learner then trains hypotheses for each parameter configuration on T′,
which are then evaluated on the validation set V. This procedure does
not bias the generalization error estimate, as long as S and V were
drawn independently.
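This selection procedure can be sketched as follows. The split, training, and error functions are placeholder arguments for illustration, not RAPIDMINER operators:

```python
def select_configuration(train_set, configs, split_fn, train_fn, error_fn):
    """Choose the parameter configuration with the lowest error on a
    held-out validation set V. The test set S is never touched here,
    so the final error estimate on S stays unbiased (schematic sketch)."""
    t_prime, validation = split_fn(train_set)   # T is split into T' and V
    best_cfg, best_err = None, float("inf")
    for cfg in configs:
        h = train_fn(t_prime, cfg)              # hypothesis trained on T' only
        v_err = error_fn(h, validation)         # evaluated on V only
        if v_err < best_err:
            best_cfg, best_err = cfg, v_err
    return best_cfg

# Toy usage: tune the threshold of a one-dimensional classifier.
data = [(0.2, "good"), (0.3, "good"), (0.8, "bad"), (0.9, "bad")]
split = lambda d: (d[1:3], [d[0], d[3]])        # T' (middle two) and V (first and last)
train = lambda t, thr: (lambda x: "bad" if x >= thr else "good")
error = lambda h, v: sum(h(x) != y for x, y in v) / len(v)
best = select_configuration(data, [0.1, 0.5, 1.0], split, train, error)
print(best)  # 0.5 -- the only threshold that classifies all of V correctly
```

Iterating over all combinations of parameter values in `configs` is, in spirit, what a grid-based parameter optimizer does; the crucial point illustrated here is that the comparison happens on V, never on S.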
When Michie et al. (1994) conducted their experiments to compare
several learners on different datasets, the datasets were first divided into
a training set T and a test set S, and both sets were kept apart at
different research sites. Then, different researchers tuned the parameters
for each learner on the training set (potentially using a validation set
as described above). The resulting hypotheses that were deemed
optimal were sent to the other research site, where they were evaluated on
the test sets that had been kept apart so far. This procedure ensures
maximally unbiased results, except for one aspect: theoretically, the different
researchers might not have put the same effort into the parameter
tuning process, thus putting some learners at a disadvantage.
To ensure maximal objectivity in the tuning process, parameters are not
tuned manually but automatically. RAPIDMINER offers several
parameter tuning facilities, the most important being the GRIDPARAMETER-
OPTIMIZER, which simply builds parameter configurations by
combining all possible parameter values with each other. Some learner
parameters are continuous-valued, which requires defining either a list of