Automating Quality Control in Manufacturing Systems
Combining Knowledge-Based and Rule Learning Approaches
Master's thesis
in the Information Systems degree program of the Faculty of
Information Systems and Applied Computer Science
at the Otto-Friedrich-Universität Bamberg

Author: Thomas Hecker
Supervisor: Prof. Dr. Ute Schmid
Submission date: 30.9.2007
Notice:
Parts of this work contain confidential information. Any disclosure to a third party without permission of the Fraunhofer Institute for Integrated Circuits, Erlangen, is hereby strictly prohibited.
Acknowledgments

I would like to thank my thesis advisor, Prof. Dr. Ute Schmid, for her valuable input during the creation of this thesis and the patience and many encouragements she had for me during my entire time at Bamberg University. Her teaching in the area of Cognitive Systems made this time a very rewarding experience.

I would also like to express my gratitude to Dr. Christian Münzenmayer and Dipl.-Ing. Klaus Spinnler for enabling me to write part of my thesis at the Fraunhofer Institute for Integrated Circuits, Erlangen. This thesis would not have been possible without their assistance. Also many thanks to Dr. Thomas Wittenberg, who brought all the aforementioned people together, enabling this thesis in the first place.

Thomas Hecker
Bamberg, Sept. 2007

This document was typeset using a modified version of the i10 diploma thesis template made by Thorsten Karrer from the Media Computing Group at RWTH Aachen University.
Abstract

This thesis explores the possibilities of using Machine Learning methods for automated quality control in the production process of cast iron workpieces. In particular, rule induction methods are combined with existing domain knowledge to train classifiers from workpieces that are each described through a variable list of defects. The research finds that most traditional learning approaches are too restricted to deal with this kind of data. Instead, methods of Inductive Logic Programming show more flexibility in this domain.
Contents

2.1 Industrial Image Processing
2.2 Visual Quality Inspection in Cast Iron Production
2.2.1 The Problem of Pores in the Production Process
2.2.2 The Pore Class PK3
2.2.3 Problems Associated with the Standard
2.2.4 Structure of the Current System
2.3 Machine Learning
2.4 Knowledge Based Systems
2.5 Related Work
3.1 Data Acquisition
3.1.1 Feature Extraction
3.1.2 The Labeling Process
3.2 Analysis of the Cleaned Datasets
3.3 Insights Gained from the Acquisition Process
4 Experiments: General Considerations
4.1 Experiment Setup
4.1.1 Environment
4.1.2 Evaluation Method
4.1.3 Parameter Tuning
4.2 Significance Testing
4.3 Initial Experiments
4.3.1 Description of the Static Model
4.3.2 Experiment 0.a: Static Model vs. Default Classifier
4.3.3 Discussion
5 Experiments I – The Traditional Approach
5.1 Feature Extraction
5.2 Description of the Learning Algorithms
5.2.1 C4.5
5.2.2 CN2
5.2.3 PART
5.2.4 Remarks
5.3 Experiments
5.4 Discussion
6 Experiments II – Combining Fuzzy and Crisp Methods
6.1 A Short Introduction to Fuzzy Logic
6.2 Feature Extraction
6.2.1 The Fuzzification Process
6.2.2 Defuzzification to Symbolic Features
6.3 Experiments
6.4 Discussion
7 Experiments III – Learning Fuzzy Rules
7.1 The FuzzConRI Learner
7.2 Own Extensions to FuzzConRI
7.2.1 Certainty Factors
7.2.2 Non-Binary Defuzzification as Conflict Resolution Strategy
7.3 Experiments III.a - Initial Results
7.4 Adjusting Membership Functions with Genetic Algorithms
7.4.1 Initial Considerations
7.4.2 Genetic Algorithms
7.4.3 Representing Membership Functions for Genetic Algorithms
7.5 Experiments III.b - Tuning Membership Functions
7.5.1 Experiment Setup
7.5.2 Experiments
7.6 Discussion of the Results
8 Experiments IV – An Approach for Learning on Different Levels of Abstraction
8.1 Learning on Different Levels of Abstraction
8.1.1 General Considerations
8.1.2 A Prolog Framework for a PK3 Classifier
8.2 The FOIL Algorithm
8.3 Transformation of Predicates
8.4 Learning Strategy
8.5 Experiments – Learning Very Bad Pores
8.5.1 Experiments IV.a - Going the Most Obvious Way
8.5.2 Experiments IV.b - Using an Informed Search Strategy for more Examples
8.5.3 Experiments IV.c - Learning Location-Dependent Rules
8.6 Discussion of the Results
10.1 Desired Extensions and Suggestions for Further Work
10.1.1 Noise-Robust FOIL
10.1.2 Beam-Search FOIL
10.1.3 Constraining Arguments of Predicates
10.1.4 Back to Fuzzy Logic
10.1.5 Growing Non-Rectangular Regions for Rules
10.1.6 Using Hierarchical Agglomerative Clustering Methods to Learn Location-Specific Rules
10.1.7 Incorporating Misclassification Costs
10.2 Summary
A Description of the Contents of the Accompanying DVD
B Prolog Framework and Ruleset for Multi-Level Classification
List of Figures
2.1 Hardware Set-up of an Industrial Image Processing System
2.2 Structure of the Quality Control System
2.3 Structure of an Expert System
3.1 Masking the Workpiece's Region of Interest
3.2 An Artefact Resulting from an Incorrect Mask Overlay
3.3 Imprecise Measurement of Pore Distance to ROI Border
3.4 Imprecise Measurement of Pore Size: Underestimating Size
3.5 Imprecise Measurement of Pore Size: Overestimating Size
3.6 Images of F1100 (top) and F2000 (bottom) Areas with Critical Edges Marked
3.7 Distribution of Defects on Workpieces
3.8 Distribution of Pore Sizes on Workpieces
4.1 Different Treatment of Small Pores in F1100 and F2000 Areas
5.1 Overview of the C4.5 Algorithm
5.2 The CN2 Algorithm
5.3 Hypotheses Produced from the F2000 Dataset with Numerical Features
6.1 Membership Functions of the Linguistic Variables for Pore Size and Distance to Border
6.2 Fuzzy Max-Min-Inference Illustrated
6.3 A Measured Pore Size of 0.5mm Fuzzified
6.4 A Measured Distance of 1.0mm to the ROI Border Fuzzified
6.5 A Measured Distance of 1.0mm between two Pores Fuzzified
7.1 The FuzzConRI Algorithm
7.2 The Linguistic Variable Used in the Rule Consequents
7.3 General Outline of a Genetic Algorithm
7.4 Representation of L-, LR- and R-Type Fuzzy Sets
7.5 Alternative Representation of L-, LR- and R-Type Fuzzy Sets
7.6 Testing the Fitness of an MF/Learning Parameter Configuration
7.7 Correlation between alpha and Accuracy
7.8 Comparison of Tuned Membership Functions
8.1 Classification on Different Levels of Abstraction
8.2 The PK3 Standard as Prolog Rules
8.3 The FOIL Algorithm
8.4 Model with Location-Specific Rules
10.1 Approximating Non-Rectangular Areas
10.2 Approximating Areas Through Connected Circles
List of Tables
3.1 Class Distributions on the Cleaned Datasets
4.1 Comparison of Prediction Accuracy between Default (Majority Vote) Classifier and the Static Model
4.2 Confusion Matrices for the Static Model
5.1 Accuracy Estimates of the Decision Tree & Rule Learners on the Datasets with Numerical Features
5.2 Improvement of the Learners on Numerical Features in Comparison to the Static Model
5.3 Confusion Matrices on the F2000 Dataset Comparing C4.5 and CN2
6.1 Accuracy Estimates of the Decision Tree & Rule Learners on the Datasets with Symbolic (Fuzzy–Crisp) Features
6.2 Improvement of the Learners on Symbolic (Fuzzy–Crisp) Features in Comparison to the Static Model
7.1 Accuracy Estimate of the FuzzConRI Learner
7.2 Accuracy Estimates of the FuzzConRI Learner with Tuned Fuzzy Membership Functions
7.3 Improvement of Fuzzy Learners with Membership Function Tuning
7.4 Comparison of alpha Values and Accuracy
8.1 Accuracy Estimates of the Initial Multi-Level Learner
8.2 Accuracy Estimates of the Multi-Level Learner with Improved Example Search
8.3 Accuracy Estimates of the Multi-Level Learner with Location-Specific Rules
9.1 Overview of the Best Accuracies Achieved
Chapter 1
Introduction
Quality control is an important step in the industrial production process and a component of the quality management process. It ensures that produced goods leaving the factory meet some minimum quality standard set by the manufacturer, its customers or government regulations. Quality control practices such as the Zero Defects methodology (Crosby, 1979) advocate on-line quality inspection during the production process to ensure that no defective product is shipped at all. But such requirements can only be effectively achieved if the quality inspection process is performed in a consistent way and the result conforms to the quality standard.

Humans, despite their ability to quickly learn complex tasks, do not have the ability to perform such tasks in a consistent way over a long time.

By contrast, machines (in particular: computers) do have this ability to consistently do what they are told to. But what may first seem like a virtue can also be a curse: machines tend to be rigorous about the precise execution of any rule, not allowing the smallest deviation.

What may first seem the perfect tool for a zero-defects production process may actually turn out to be an economic disaster, because explicit quality standards tend to be overly restrictive for the sake of simplicity. So when the quality standard is fed to a computer, the computer will apply it in a rigorous manner and thus reject more products than can rationally be justified by the actual quality requirements.
By contrast, when human quality controllers apply these standards, they enforce them in a different way which is more robust to small deviations, but may vary over a longer time, resulting in the aforementioned inconsistent performance. However, this behavior is inherently complex and hard to explicate as general rules, hence it is not possible to explicitly tell a machine to behave in exactly the same way.

A solution for this problem may come from the area of Machine Learning, where machines learn from examples produced by humans, trying to identify the general patterns and build a hypothesis for how things actually work. This approach has the advantage that it does not require explicitly specifying the expected behavior. However, the learned behavior need not necessarily be correct and might just be sufficient to perform correctly on the training examples.
An often desired property of such a hypothesis, apart from being correct, is that the hypothesis should not only be interpretable by computers but also by human beings, to allow sanity-checking or manual alteration, for example. Hypotheses such as decision trees or production rules are known to be easily interpretable by humans (Mitchell, 1997).

This thesis tries to explore the potential use of Machine Learning methods for automating quality control in the production of cast iron parts, while also investigating how existing knowledge about the problem domain can be incorporated into the process.
The remainder of this thesis is organized as follows: Chapter 2 gives a short overview of the work's overall context. Chapter 3 describes the data used in the experiments and how it was acquired. Chapter 4 contains some general considerations regarding the upcoming experiments. In Chapters 5–8, experiments are conducted using different learning approaches. Chapter 9 discusses the overall results. Finally, Chapter 10 contains concluding remarks and gives suggestions for further work.
Chapter 2
Context of this Work
"Computer vision largely deals with the analysis of pictures in order to achieve results similar to those obtained by man." (Levine, 1985)

"Automated visual inspection systems are able to deliver excellent recognition results continuously and reliably, equal to the average performance of humans over time, even better in some areas, provided the inspection task has been described precisely and in detail, in a way appropriate for the special characteristics of machine vision." (Demant, Streicher-Abel, & Waszkewitz, 1999)
2.1 Industrial Image Processing

Image processing systems are used throughout many industries. The systems are not only used for quality inspection, as in the case of this work, but also for other tasks such as process control. Figure 2.1 depicts the hardware set-up of a typical industrial image processing system.

Due to the various tasks that can be performed in industrial image processing, Jähne et al. (1995) categorize these systems based on the following objectives:

• object recognition
• position recognition
Figure 2.1: Hardware set-up of an industrial image processing system (Source: Demant et al. (1999))
• completeness check
• shape and dimension check
• surface inspection
The system discussed in the context of this work is a surface inspection system. The general steps of an inspection task are as follows (Demant et al., 1999):

5. computation of object features
6. decision as to the correctness of the segmented objects
It is essentially the segmentation step which identifies defects on the surface of the inspected part. There are various image segmentation techniques that can be chosen from; an overview of these is given in Jähne et al. (1995). However, the actual processing of images is not within the scope of this work. Rather, the next section will discuss what happens when defects are detected on the inspected workpiece.
2.2 Visual Quality Inspection in Cast Iron Production
2.2.1 The Problem of Pores in the Production Process
When hot liquid iron is cast into a form, pores may develop inside and on the surface of the created workpiece. These pores originate either from gas enclosures or arise from differences in the material's density during the cooling process.

Aside from the negative influence of pores on the visual appearance of a workpiece, pores may also compromise the structural integrity of a workpiece or its leak tightness when they occur near a sealing surface (VDG, 2002).
Pores will inevitably occur, and thus quality control must ensure that the pores contained in a workpiece do not compromise its functionality. Different methods are used to check for pores, depending on where they occur. Pores occurring inside the workpiece may be checked by x-raying, while pores appearing on the workpiece's surface are checked through visual inspection.

When a workpiece is checked for pores, it is divided into different areas, for each of which different constraints regarding the occurring pores apply. For example, the area near a sealing surface may have stricter limitations regarding pores than an area that has no function at all. These restrictions are recorded in Pore Classes, each of which applies to different types of areas.

If an area does not meet the requirements of its associated pore class, then the workpiece is rejected as a 'bad' or defective part.

The workpiece areas used in the upcoming experiments are all subject to the same pore class PK3, which is covered in the next section.
2.2.2 The Pore Class PK3
The pore class PK3 defines constraints on pores that occur on the surface of certain areas, i.e. areas that can be checked through visual inspection.
The PK3 standard is defined as follows:

• only pores > 0.4mm are checked (smaller pores are ignored)
• pores up to a size of 0.7mm are permitted on the surface
• pores must have a mutual distance of at least 15mm, otherwise they will be treated as a pore nest (see below)
• the distance of a pore to the edge of the surface is not checked
• Exceptions:
  – one pore nest with up to 3 pores is permitted, as long as the pores have a mutual distance of at least 1.5mm
  – one single pore of max. 1.0mm is permitted (i.e. the pore must not occur in a nest)
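As a concrete illustration, the checks above can be sketched as a small classifier over a defect list. This is only a sketch: how pores are grouped into nests and how the two exceptions interact with the basic rules is this sketch's own interpretation, not the official definition of the standard.

```python
from itertools import combinations

def pk3_check(pores):
    """Sketch of a PK3 check. Each pore is a tuple (diameter_mm, x_mm, y_mm).
    Nest grouping and the interplay of the exceptions are assumptions."""
    # pores of 0.4 mm or less are ignored entirely
    relevant = [p for p in pores if p[0] > 0.4]

    def dist(a, b):
        return ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5

    # pores closer than 15 mm to each other form a nest (connected components)
    nests = []
    for p in relevant:
        touching = [n for n in nests if any(dist(p, q) < 15.0 for q in n)]
        for n in touching:
            nests.remove(n)
        nests.append([p] + [q for n in touching for q in n])

    singles = [n[0] for n in nests if len(n) == 1]
    real_nests = [n for n in nests if len(n) > 1]

    # exception 1: at most one nest, at most 3 pores, pairwise >= 1.5 mm apart
    if len(real_nests) > 1:
        return "bad"
    for nest in real_nests:
        if len(nest) > 3 or any(p[0] > 0.7 for p in nest):
            return "bad"
        if any(dist(a, b) < 1.5 for a, b in combinations(nest, 2)):
            return "bad"
    # singles up to 0.7 mm are fine; exception 2: one single up to 1.0 mm
    oversized = [p for p in singles if p[0] > 0.7]
    if any(p[0] > 1.0 for p in oversized) or len(oversized) > 1:
        return "bad"
    return "good"
```

For example, a lone 0.9mm pore passes under the single-pore exception, while two such pores far apart do not.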
2.2.3 Problems Associated with the Standard
As already indicated in the introduction, the PK3 standard, as well as the other standards, tends to be overly simplistic, due to the fact that these standards must be enforced by humans, who need to be trained first.

If the standards were more complex, it would be very difficult to learn to apply them properly, considering that there are multiple pore classes to be applied on different areas of a workpiece.

However, it would be unusual if the simplification of a quality standard made it less restrictive; after all, that would not live up to the idea of a quality standard. Consequently, the standard becomes more restrictive, leading to an overly high rejection rate if applied precisely as it was defined.
The fact that PK3 is applied differently by the human quality controllers can be inferred from the following observations:

• PK3 defines hard thresholds for all numbers. For example, two pores of size 0.7mm are permitted, but not if they had a size of 0.71mm. A human will not distinguish between these two sizes, as they are too close.
• The exception regarding pore nests also seems simplified. The restrictions for size, distance and count of pores do not relate to each other. For example, three pores of up to 0.7mm size are allowed, but four pores of just 0.4mm (which is just above the size for being ignored completely) would be illegal.

• Human quality controllers tend to apply the PK3 standard more strictly if a pore is near a critical area, such as the edge of a sealing surface. Usually, this would require another pore class to be defined for these areas, but again, this would make the instruction of quality controllers very complicated.

So it is assumed that, over time, the human quality controllers gather tacit knowledge about the "real" quality standards for a workpiece, which is passed on to other controllers through the process of socialization, as described by Nonaka and Takeuchi's Knowledge Spiral (Nonaka & Takeuchi, 1995). However, this knowledge is never explicated in a way that would allow it to be used in an automated quality control system.
2.2.4 Structure of the Current System
The visual inspection system currently used for quality control consists of several components, which are depicted in Figure 2.2. The major steps of the inspection process are performed as follows:

1. Image Acquisition: A digital grayscale image of a workpiece's area is acquired.
2. Defect Detection: Potential defects (pores) are identified in the image's region of interest (ROI). A list of identified defects and their characteristics is produced in this step.
3. Classification: A static classifier model, which implements the rules of the PK standard for the respective area, processes the defect list and classifies the area as good or bad.
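The three steps can be pictured as a minimal pipeline. The function names and interfaces below are assumptions made for illustration; the real system's component interfaces are not documented here.

```python
# Minimal sketch of the three-step inspection flow; detect_defects and
# classify are hypothetical stand-ins, not the real system's components.

def detect_defects(image):
    """Step 2 stand-in: treat every bright pixel as one detected defect."""
    return [(x, y) for y, row in enumerate(image)
            for x, value in enumerate(row) if value > 200]

def classify(defects):
    """Step 3 stand-in for the static model: reject if anything was found."""
    return "bad" if defects else "good"

def inspect_area(image):
    # step 1 (image acquisition) is assumed to have produced `image`
    defects = detect_defects(image)
    return classify(defects)

print(inspect_area([[10, 10], [10, 255]]))  # one bright pixel -> "bad"
```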
Note: Although a workpiece may consist of several areas, these are checked independently of each other. If one area is classified as "bad", then the whole workpiece is rejected. Therefore,
Figure 2.2: Structure of the quality control system. The structure has been simplified to focus only on the components relevant to the current work.
the terms "workpiece" and "area" will be used interchangeably for the remainder of the work.

The aim of this thesis is to replace the static classifier model of the current classification component with a model trained from observed data, or to combine both.
2.3 Machine Learning

Machine Learning is a discipline that emerged from Statistics and Computer Science to answer the question

"How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" (Mitchell, 2006)

Unlike traditional Artificial Intelligence systems that reason about existing knowledge, machine learning systems try to create new knowledge by inducing a hypothesis over observed data.
The following definitions are adapted from Mitchell (1997):

Let f : X → O be some unknown target function that maps the elements of an instance space X to the values of some output space O.

In the context of this work, X would be the set of all workpiece images and O would be a two-valued set {good, bad}. The target function f would be performed by the human quality controllers when inspecting workpieces.
Because an explicit formal representation of the target function f is unknown, it must be learned from a set of training examples T, whose elements are pairs <x_i, f(x_i)>, where the x_i are instances from X and the f(x_i) ∈ O are their assigned target function values.

A learner L (also: inducer) will try to identify patterns in the data of the training set to explain the relation between the instances (as described through certain characteristics) and their assigned function values. At the end of this training process, the learner produces a hypothesis h : X → O, which is an approximation of f. Sometimes a hypothesis will also be referred to as a model.

During the training phase, the learner uses certain assumptions about the data as well as assumptions about which hypothesis should be preferred over another. These assumptions constitute the learner's inductive bias (also: learning bias) and are a requirement in order to generalize over the data.
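To make these definitions concrete, the following toy example (not taken from the thesis) shows a learner whose inductive bias is that the hypothesis must be a single threshold on pore size; given training pairs <x_i, f(x_i)>, it returns the hypothesis h with the fewest mistakes on T.

```python
# A toy illustration of the definitions above: T holds pairs <x_i, f(x_i)>,
# the inductive bias restricts h to one-threshold rules, and the learner
# returns the hypothesis h minimizing training error.

def learn_threshold(training_set):
    """Return h: size -> {"good", "bad"} for the best threshold found."""
    candidates = sorted({size for size, _ in training_set})
    best = min(candidates, key=lambda t: sum(
        ("bad" if size > t else "good") != label
        for size, label in training_set))
    return lambda size: "bad" if size > best else "good"

T = [(0.4, "good"), (0.6, "good"), (0.9, "bad"), (1.2, "bad")]
h = learn_threshold(T)
print(h(0.5), h(1.0))  # good bad
```

Note that h is only guaranteed to fit the training examples; as the text above points out, it need not be correct on unseen instances.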
A hypothesis can be described through various representation formalisms, some of which differ in expressiveness and comprehensibility by human beings. Which kind of formalism is used in the training process depends on the learner that is used as well as on the representation of the instance space. In general it is safe to say that symbolic representation forms such as decision trees and production rules are more easily understood by humans than other, numeric or sub-symbolic formalisms. Therefore, these formalisms will be embraced in the upcoming experiments.

Machine Learning, and in particular rule induction, has been successfully applied to many different domains, including quality control (for an overview, see for example Langley and Simon (1995)). However, for image classification tasks such as visual quality control, artificial neural networks are probably the most widespread approach, due to their robustness to noise and missing information.
2.4 Knowledge Based Systems

Knowledge Based Systems (KBS), or Expert Systems (XPS), are not trained from observed data as is done in machine learning. Instead, expert systems are engineered from existing knowledge that is acquired from domain experts.
Figure 2.3: Structure of an Expert System (Source: Nikolopoulos (1997))

The general architecture of such a system is illustrated in Figure 2.3. The following definitions of the XPS components are adapted from Nikolopoulos (1997):
• Knowledge Base
The knowledge base contains specific knowledge about the problem domain ("domain knowledge"), as well as control knowledge, which is more general knowledge about strategies to be used in the overall problem solving process.

• Inference Engine
The inference engine contains some general algorithms that are applied to the knowledge base in order to solve a problem. It is important to understand that the inference engine is a standardized piece of software that is not domain-dependent and can be exchanged for another engine as long as it supports the same knowledge representation formalism that is used for the knowledge base.

• Knowledge Acquisition Module
The knowledge acquisition module is used by the knowledge engineer to model and test the acquired knowledge.

• Explanation Module and User Interface
When using the expert system for a particular case of a problem, the user interacts in a dialog with the system, which in turn asks certain questions about known facts that help in the problem solving process. If the XPS has come to a conclusion regarding the
problem, it may use the explanation module to present the inference process to the user.
Practically every software program incorporates some kind of domain knowledge. Yet, this does not make it a KBS. The key distinction between a KBS and ordinary software is the separation between domain (and control) knowledge and the inference engine that performs the actual problem solving process. Hence, the current automated quality control system used for the classification of workpieces is no KBS, even though it contains some domain knowledge in the form of the PK3 quality standard. In Chapter 8, the same system is re-designed as a Prolog-based KBS.
Nikolopoulos (1997) also describes hybrid expert systems, whose knowledge base does not only consist of knowledge elicited from human domain experts, but also contains knowledge that was acquired using machine learning techniques. This knowledge may have the same representation and can thus be stored together with the expert knowledge, or the system accommodates a trained hypothesis such as an artificial neural network that is only used for specific sub-problems. The reason for using such hybrid expert systems is that some problem domains are not entirely well understood, so that not all the knowledge required for solving a task can be elicited from a domain expert.
2.5 Related Work

As previously mentioned, many applications of machine learning for visual quality control use artificial neural network approaches:

Konig, Windirsch, Gasteier, and Glesner (1995) demonstrate Novas, an artificial neural network architecture that can be trained to detect defects in digital images.

Bahlmann, Heidemann, and Ritter (1999) use artificial neural networks (in particular, self-organizing feature maps) for the classification of textile seams based on features acquired through visual inspection.

Chang et al. (1997) use a trained neural network to classify cork stoppers based on pre-processed defect features acquired through visual inspection.
Other approaches use Fuzzy Logic in combination with rule learning or decision trees:

Chou, Rao, Sturzenbecker, Wu, and Brecher (1997) report a system used in semiconductor manufacturing that learns fuzzy rules for the classification of single defects on semiconductors. The input data is based on visually acquired and pre-processed features. The method uses a modified version of Kosko's Fuzzy Associative Memory (Kosko, 1992) and reportedly achieves an accuracy similar to that of methods such as C4.5 and probabilistic neural networks.

Adorni, Bianchi, and Cagnoni (1998) report successfully training a fuzzy decision tree from pre-processed image features for use in quality control of pork meat.
Further approaches rely solely on image processing techniques or use them in combination with knowledge based systems:

Perner (1994) developed a quality control system for the offset printing process. The system uses image processing techniques for defect detection, but a knowledge based system for defect classification and determination of their causes and appropriate countermeasures.

Njoroge, Ninomiya, Kondo, and Toita (2002) report a visual inspection system for the automated grading of agricultural products. The system is entirely based on image processing techniques. Also, the system developed by Al-Habaibeh et al. (2004) for the quality control of a laser sealing process for food containers is based entirely on visual inspection.

More experimental approaches to image processing include the symbolic processing of images as part of image understanding. For this purpose, Behnke presents the approach of a hierarchical neural network, the Neural Abstraction Pyramid (Behnke & Rojas, 1998; Behnke, 2003).

Another approach, taken by Hennessey, Hahn, and Lin (1991), parses images of integrated circuits into a syntactic representation and uses knowledge based systems to detect and classify defects.
In conclusion, it can be said that various approaches have been tried for automating visual quality control; however, soft computing approaches like artificial neural networks or fuzzy logic seem to be the most popular choice.
Chapter 3
Description of the Data
For the experiments, two sets of images were used. Each set contained images of one area of a workpiece (denoted F1100 and F2000). As noted in the last chapter, it is not important whether the images of these two areas originate from the same workpieces or not, since the classification tasks are independent of each other.

The pictures are a sample recorded on-line during the production process, so it can be said that they were made under realistic conditions. But before being able to start with the experiments, the defect detection process had to be performed to produce the lists of defects for each image.
3.1 Data Acquisition

3.1.1 Feature Extraction
Before the actual defect detection process is started, the acquired image is compared to a reference image in order to detect and correct a potential offset resulting from a minor displacement of the workpiece. Then, a mask overlay is applied to the region of interest (ROI), i.e. the actual area that is to be checked (see Figure 3.1).¹ Because the mask may not

¹ The images of the F2000 area contained several variants of the workpiece, for which slightly different masks were used. So it may be that some locations in an F2000
perfectly fit the acquired image, the mask is refined to approximate the edges of the actual ROI borders.

Then, the defect detection algorithm is applied on the masked area, which produces a list of defects that were identified in the image's ROI. Each defect is characterized through the following features:

• size (diameter)
• x/y coordinate of center of gravity
• distance to the ROI border
• distance to all other defects found in the image
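One possible way to represent these per-defect features as a record is sketched below; the field names are illustrative assumptions, not the system's actual data format.

```python
# Hypothetical record type for the per-defect features listed above.
from dataclasses import dataclass, field

@dataclass
class Defect:
    size_mm: float                  # estimated diameter
    x_mm: float                     # center of gravity, x coordinate
    y_mm: float                     # center of gravity, y coordinate
    border_dist_mm: float           # distance to the ROI border
    neighbor_dists_mm: list = field(default_factory=list)  # distances to all other defects

# a workpiece image is then described by a variable-length list of defects
piece = [Defect(0.6, 12.0, 40.5, 2.1, [7.3]),
         Defect(0.4, 18.2, 44.0, 5.0, [7.3])]
print(len(piece))  # 2
```

The variable length of this defect list is exactly what, as the abstract notes, makes the data awkward for traditional fixed-attribute learners.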
As it turns out, there are several sources of imprecision in the
measurements:
• in some parts of the image, the masked ROI also captures parts
of the true ROI's border, resulting in an "artefact" that is
incorrectly identified as a defect (see Figure 3.2)
• in other cases, the mask refinement algorithm stops too early,
before the true ROI border, resulting in an underestimation of a
defect's distance to the true ROI border (see Figure 3.3). However, a
pore's distance to the border is never overestimated
• sometimes, a pore's size is slightly underestimated, because the
detection algorithm identified only part of the pore's true
surface (see Figure 3.4). This deviation is usually only one or two
pixels
• while the potential for underestimating pore size is relatively low,
the potential for overestimating it is much greater, because
the diameter of a pore is not measured directly. Instead, the largest
side of the pore's bounding box is taken as an estimate, which
may result in a significant deviation from the true diameter if the
pore is not round (see Figure 3.5)
• the x/y coordinates of a defect may deviate slightly due to an
uncorrected image offset
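The overestimation effect of the bounding-box rule can be illustrated numerically. The following toy sketch (not the actual detection code; coordinates are in pixels) shows how taking the largest bounding-box side overstates the diameter of a non-round pore:

```python
def bbox_diameter(pixels):
    """Estimate a pore's 'diameter' as the largest side of its bounding
    box, mirroring the estimation rule described above (toy sketch,
    units in pixels)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return max(max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

# An elongated pore, 8 pixels long and 2 pixels wide (surface: 16 pixels):
elongated = [(x, y) for x in range(8) for y in range(2)]
# A compact 4x4 pore with the same surface of 16 pixels:
compact = [(x, y) for x in range(4) for y in range(4)]

print(bbox_diameter(elongated))  # 8 -- the long side dominates
print(bbox_diameter(compact))    # 4 -- same surface, half the estimate
```

Both toy pores cover 16 pixels, yet the elongated one receives twice the diameter estimate, which is exactly the deviation described for non-round pores.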
image may not be part of the ROI in other images. These differences are rather small,
though, so all images were used for the same dataset.
Figure 3.1: Masking the Workpiece's Region of Interest
3.1.2 The Labeling Process
Since the judgments of the human quality controllers were not available
for labeling the workpiece images as good or bad, the author of this
work was instructed by a subject matter expert in how to apply the
PK3 standard correctly to the F1100 and F2000 areas.
Part of this process was the definition of critical edges, i.e. those edges of
the ROI border that adjoin sealing surfaces or require for some other
reason that the PK3 standard be interpreted more strictly near these
edges. Figure 3.6 shows these critical edges for both datasets.
Also, as part of the labeling process, all artefacts that were incorrectly
identified as defects were marked and later removed. The rationale for
Dataset   Size   Class "good"   Class "bad"

Table 3.1: Class distributions on the cleaned datasets. Note that
workpieces that contained no defects were already removed from the datasets.
this was that the images should only be judged based on their visual
appearance, not on the result of the defect detection process, which
cannot be influenced in the context of this work. So if artefacts were not
removed and fed to the learner, they would prevent the learner from
generalizing properly over the data, because good workpieces may also
contain artefacts, and the large size of the artefacts would dominate all
other defects. Also, future improvements of the mask refinement
process are likely to tackle the problem of artefacts.
After all workpiece images had been labeled, those images where no
defects were detected, or where the only defects turned out to be
artefacts, were removed from the dataset. The reason for this was that a
learner can only judge a workpiece based on the defects found. Thus, if
no defects are detected, the outcome will always be good, and therefore
this kind of data is of no help in the learning process.
3.2 Analysis of the Cleaned Datasets

Table 3.1 shows the class distributions of the cleaned datasets. As it
turns out, the class distributions are relatively balanced, which makes
the learning task easier compared to extremely imbalanced class
distributions. However, it must be noted that both datasets are relatively
small.
The average count of defects encountered on an F1100 workpiece is 4.14,
and 7.23 for the F2000 area. Figure 3.7 shows the distribution.
The mean size of a pore is 0.594 mm on the F1100 dataset and 0.464 mm
on the F2000 dataset. Figure 3.8 shows the distribution of the sizes.
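The cleaning step and the summary statistics above can be sketched in a few lines. The data below is a toy stand-in; the actual figures (4.14 and 7.23 defects on average, mean pore sizes of 0.594 mm and 0.464 mm) come from the real datasets:

```python
def clean_and_summarize(workpieces):
    """Drop workpieces whose defect list is empty (no defects detected,
    or artefacts only after artefact removal), then report the mean
    defect count and the mean pore size. Each workpiece is represented
    here simply as a list of pore sizes in mm (toy representation)."""
    cleaned = [wp for wp in workpieces if wp]
    n_defects = sum(len(wp) for wp in cleaned)
    mean_count = n_defects / len(cleaned)
    mean_size = sum(s for wp in cleaned for s in wp) / n_defects
    return cleaned, mean_count, mean_size

data = [[0.5, 0.7], [], [0.4], [0.6, 0.3, 0.5]]  # one defect-free workpiece
cleaned, mean_count, mean_size = clean_and_summarize(data)
print(len(cleaned), mean_count, round(mean_size, 3))  # 3 2.0 0.5
```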
3.3 Insights Gained from the Acquisition Process
During the process of labeling the workpiece images, some insights were
gained about how a human quality controller judges the workpieces:
• the human expert first inspects each pore as a single entity.
If one pore is somehow very critical, the workpiece gets
immediately rejected
• then, pores are examined in their environment, in particular in their
proximity to other pores. If some nest of pores is judged as
critical, the workpiece is immediately rejected
• if the preceding steps found pores that are
somewhat critical but do not alone justify the rejection of the workpiece,
then the 'big picture' of the workpiece is checked again to make a
decision
• as already mentioned, pores near critical edges are subject to
stronger constraints than pores that lie far away from such borders
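Observed in sequence, this judgment process forms a cascade of checks. The sketch below renders it schematically; the three predicate arguments are placeholders, since the expert's actual PK3-based criteria are not formalized here:

```python
def expert_judgment(pores, pore_is_critical, nest_is_critical, big_picture_ok):
    """Schematic cascade of the observed expert judgment. The three
    predicate arguments are placeholders for the real PK3-based
    criteria, which this sketch does not formalize."""
    # 1. Inspect each pore as a single entity.
    if any(pore_is_critical(p) for p in pores):
        return "bad"
    # 2. Examine pores in their environment (nests of nearby pores).
    if nest_is_critical(pores):
        return "bad"
    # 3. Somewhat-critical pores alone do not justify rejection;
    #    the 'big picture' of the workpiece decides.
    return "good" if big_picture_ok(pores) else "bad"

# Toy usage: pores given as diameters; a single pore above 0.8 mm is critical.
verdict = expert_judgment([0.9, 0.3],
                          pore_is_critical=lambda p: p > 0.8,
                          nest_is_critical=lambda ps: False,
                          big_picture_ok=lambda ps: True)
print(verdict)  # bad
```

The early-exit structure mirrors the observation that a single very critical pore or a critical nest short-circuits the decision, while the holistic check is only reached for borderline cases.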
The next chapter will show how the strict application of the PK3
standard differs from the judgments of a human expert.
Figure 3.2: An Artefact Resulting from an Incorrect Mask Overlay
Figure 3.3: Imprecise Measurement of Pore Distance to ROI Border. The
ROI border detection algorithm returned a distance of 0.08 mm for all
three pores.
Figure 3.4: Imprecise Measurement of Pore Size: Underestimating Size.
The picture on the left shows the parts of the pores' surfaces that have
been detected by the segmentation process (contours marked red inside
the squares). The picture on the right shows that the real pores are
slightly bigger.
Figure 3.5: Imprecise Measurement of Pore Size: Overestimating Size.
Both pictures show pores whose size (diameter) has been estimated at
0.8 mm. The pore on the left, however, has a significantly smaller
surface. The high estimate results from the fact that only the largest
side of the bounding box is taken as the size estimate.
Figure 3.6: Images of the F1100 (top) and F2000 (bottom) areas with
critical edges marked.
Figure 3.7: Defects Encountered on Workpieces. Note that these
figures are based on the cleaned dataset, where all images containing no
defects were removed.
Figure 3.8: Pore Sizes of Defects Encountered on Workpieces. The fact
that no pores with a diameter of less than 0.3 mm or 0.25 mm were
encountered is due to the defect detection algorithm being set to ignore
defects below these sizes.
Chapter 4

Experiments: General Considerations
This chapter describes how the experiments were conducted and what
evaluation methods their results are based upon. Because it is important to
keep experiments repeatable, all experiment setups, as well as the
software programs and libraries required to conduct the experiments, are
included on the accompanying DVD (see also Appendix A).
4.1 Experiment Setup

4.1.1 Environment
All experiments were conducted using the RAPIDMINER 4.0 Machine
Learning environment (Mierswa, Wurst, Klinkenberg, Scholz, &
Euler, 2006), formerly known as YALE. This environment was chosen
because it provides a standardized framework in which different learning
algorithms can be combined with operators for many recurring tasks
such as cross-validation and parameter tuning, thus freeing the author
from implementing these functionalities for each learning algorithm by
hand.
RAPIDMINER is Open Source software, which provides the freedom
to extend it; this was necessary in some cases during the
experiments. Also, it is possible to integrate RAPIDMINER with other
applications, which is especially useful when trained classifiers are
supposed to be used for on-line classification tasks.
Another advantage of RAPIDMINER is its de facto platform
independence due to the use of JAVA technology, which makes it accessible
to a broad circle of users, while other environments such as MLC++
(Kohavi, Sommerfield, & Dougherty, 1996) are confined to the
operating systems and hardware architectures for which they were originally
conceived.
During an earlier project (Hecker & Mennicke, 2006) it soon became
apparent that porting older machine learning software to more recent
system architectures is a non-trivial task and sometimes impossible, as
in the case of MLC++. The JAVA-based collection of popular Machine
Learning algorithms, WEKA (Witten & Frank, 2005), addresses this
problem by making the algorithms available on all JAVA-enabled platforms.
In many publications that make use of older Machine Learning
algorithms, the WEKA implementation of these algorithms is used instead
of the implementation by the original authors. RAPIDMINER fully
incorporates the WEKA library. Although RAPIDMINER provides its own
implementations of some learning algorithms used in the experiments,
the WEKA implementations were preferred due to their higher popularity.
It must be noted, though, that the WEKA algorithms do not always offer
the full functionality of the original implementations. Whenever there
are relevant differences in functionality, this will be noted in the text.
Finally, some learning algorithms used in the experiments were not
available in the WEKA library (nor in RAPIDMINER). These
algorithms had to be newly implemented in JAVA and integrated with
RAPIDMINER as a plugin. In other cases it was more appropriate to
build a JAVA wrapper around the original implementation, allowing it
to be used within RAPIDMINER.
4.1.2 Evaluation Method
When conducting experiments to assess a learner’s ability to generalize
over observed data, the main concern is to get an accurate estimate of
the learner’s error rate when classifying unseen instances
Mitchell (1997) notes that a learner's true accuracy usually cannot be
determined, since that would require evaluating the learner over the
whole population of instances. Therefore the learner, or more specifically
the hypothesis produced by the learner, is evaluated against a (hopefully
representative) sample of the general population of instances, in order
to produce an estimate of its true generalization error; this estimate is
also referred to as the sample error.
Usually there is only one labeled dataset D, which functions as this
representative sample. Hence the available data must be used both to
produce a hypothesis from the learner and to evaluate that hypothesis.
There are multiple strategies for making the best use of the available data
to produce a good estimate of the learner's generalization error. An
overview of evaluation methods is given in Mitchell (1997) and Bramer
(2007); a more comprehensive comparison from a statistical
perspective can be found in Kohavi (1995).
Probably the most common evaluation strategy found in the Machine
Learning literature is k-fold cross-validation (CV). With k-fold CV,
the dataset D is split into k disjoint subsets of equal size,
D_1, ..., D_k. The training set T_i = D - D_i is formed by removing
one fold from the dataset and assigning the latter as the test set S_i = D_i.
The learner is then trained on T_i, yielding hypothesis h_i, which is
evaluated against the test set S_i, resulting in a count e_i of misclassified
examples. This procedure is repeated k times so that each fold is
used once as the test set. The total count of misclassified examples is then
used to compute the sample error (micro-average)

    err = \frac{1}{|D|} \sum_i e_i

or, alternatively, the accuracy estimate 1 - err.
The standard deviation σ_e of the sample error (the "standard error") is then
computed as

    \sigma_e \approx \sqrt{\frac{err \cdot (1 - err)}{|D|}}

This computation treats the total number of misclassified instances as a
binomially distributed random variable, for which the probability of a
misclassification is err (Mitchell, 1997).¹
¹ Note: RAPIDMINER only computes the standard error for the macro-averaged
sample error, which computes a sample error for each fold and then averages
these errors, yielding much more extreme values if the folds are small. In this work,
however, the micro-averages are used as described above.
K-fold CV makes the best use of the data, as each instance in the dataset
is used exactly once in an error estimate and k - 1 times for training a
hypothesis (Mitchell, 1997). Hence this method is preferred when D is
relatively small.
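The scheme just described, including the micro-averaged error and its standard error, can be sketched without any ML library. The code below is schematic; a trivial majority-class learner stands in for a real one:

```python
import math

def kfold_error(data, labels, train_fn, k):
    """k-fold cross-validation with a micro-averaged sample error:
    misclassifications are counted over all folds, then divided by |D|."""
    n = len(data)
    misclassified = 0
    for i in range(k):
        test_idx = set(range(i * n // k, (i + 1) * n // k))
        train_x = [data[j] for j in range(n) if j not in test_idx]
        train_y = [labels[j] for j in range(n) if j not in test_idx]
        h = train_fn(train_x, train_y)           # hypothesis h_i trained on T_i
        misclassified += sum(h(data[j]) != labels[j] for j in test_idx)
    err = misclassified / n
    std_err = math.sqrt(err * (1 - err) / n)     # binomial approximation
    return err, std_err

# Stand-in learner: always predict the training set's majority class.
def majority_learner(train_x, train_y):
    prediction = max(set(train_y), key=train_y.count)
    return lambda x: prediction

labels = ["good"] * 6 + ["bad"] * 4
err, se = kfold_error(list(range(10)), labels, majority_learner, k=10)
print(err)  # 0.4 -- every "bad" instance is misclassified by the majority learner
```

Setting k = |D|, as in this usage example, gives exactly the leave-one-out scheme discussed next.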
Kohavi (1995) compares the sample errors for different values of k and
finds that for smaller values of k, the error estimates are
pessimistically biased. The reason for this phenomenon is that by dividing D into
only a few partitions, the test set S_i used for each run is relatively large
compared to the training set, which in turn means that more data is
withheld from the learner when it forms the hypothesis h_i, thus
lowering its chances to find the truly relevant patterns in the data.
Consequently, Kohavi finds that the most extreme form of CV, leave-one-out
CV, where k = |D| folds containing only one instance each are used,
produces the most accurate and least pessimistically biased error
estimates.
A drawback of leave-one-out is its high computational cost, rendering
it relatively unattractive for large datasets In addition, large datasets
provide sufficiently large training sets even when k is small, making the
error estimate relatively insusceptible to perturbations of the training
set
A review of the Machine Learning literature reveals that most researchers
choose 10-fold CV as a trade-off between bias in the error estimate and
computational cost. However, it must be noted that the datasets
commonly used in the literature² usually have sizes of at least 300 instances.
The F1100 and F2000 datasets used here have 138 and 155 instances,
respectively, which is very small for such a learning problem. Therefore
leave-one-out CV was chosen for error estimation, in order to
leave as many instances as possible to the learner and thus reduce the
pessimistic bias in the estimate as much as possible.
4.1.3 Parameter Tuning
Many learners allow their learning algorithms to be adjusted to the
training set by providing a set of adjustable learning parameters. When
choosing a process to determine the right parameters for each learner,
several questions arise:
2 For a collection of publicly available datasets visit
http://mlearn.ics.uci.edu/MLRepository.html
• How can the parameter tuning process bias the learner's estimate
of the true generalization error, and how can this be prevented?
• How can it be ensured that the tuning process is free from
subjective influences, i.e. did the tuning process yield the best possible
parameters for each learner?
Given that there are m different learning parameter configurations for a
learner and the best one should be selected, which one would that
be? In many cases the one that yields the hypothesis with the lowest
true generalization error would be preferred. But as explained
in the last section, the true error is unknown and must be estimated
from the misclassifications on a test set. Hence, selecting the
parameter configuration with the lowest error estimate may add an optimistic
bias to the final estimate, because the parameter configuration might
have been optimal only for that particular test set, not for other unseen
instances independently drawn from the general population.
A solution to this problem is to further remove instances from the
training set T, resulting in a validation set V and a training set T′. The
learner then trains hypotheses for each parameter configuration on T′,
which are then evaluated on the validation set V. This procedure does
not bias the generalization error estimate, as long as S and V were
drawn independently.
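This selection procedure can be sketched as follows. The split, training, and error functions are placeholder arguments for illustration, not RAPIDMINER operators:

```python
def select_configuration(train_set, configs, split_fn, train_fn, error_fn):
    """Choose the parameter configuration with the lowest error on a
    held-out validation set V. The test set S is never touched here,
    so the final error estimate on S stays unbiased (schematic sketch)."""
    t_prime, validation = split_fn(train_set)   # T is split into T' and V
    best_cfg, best_err = None, float("inf")
    for cfg in configs:
        h = train_fn(t_prime, cfg)              # hypothesis trained on T' only
        v_err = error_fn(h, validation)         # evaluated on V only
        if v_err < best_err:
            best_cfg, best_err = cfg, v_err
    return best_cfg

# Toy usage: tune the threshold of a one-dimensional classifier.
data = [(0.2, "good"), (0.3, "good"), (0.8, "bad"), (0.9, "bad")]
split = lambda d: (d[1:3], [d[0], d[3]])        # T' (middle two) and V (first and last)
train = lambda t, thr: (lambda x: "bad" if x >= thr else "good")
error = lambda h, v: sum(h(x) != y for x, y in v) / len(v)
best = select_configuration(data, [0.1, 0.5, 1.0], split, train, error)
print(best)  # 0.5 -- the only threshold that classifies all of V correctly
```

Iterating over all combinations of parameter values in `configs` is, in spirit, what a grid-based parameter optimizer does; the crucial point illustrated here is that the comparison happens on V, never on S.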
When Michie et al. (1994) conducted their experiments to compare
several learners on different datasets, the datasets were first divided into
a training set T and a test set S, and both sets were kept apart at
different research sites. Then, different researchers tuned the parameters
for each learner on the training set (potentially using a validation set
as described above). The resulting hypotheses that were deemed
optimal were sent to the other research site, where they were evaluated on
the test sets that had been kept apart so far. This procedure ensures
maximally unbiased results, except for one aspect: theoretically, the different
researchers might not have put the same effort into the parameter
tuning process, thus putting some learners at a disadvantage.
To ensure maximal objectivity in the tuning process, parameters are not
tuned manually but automatically. RAPIDMINER offers several
parameter tuning facilities, the most important being the GRIDPARAMETER-
OPTIMIZER, which simply builds parameter configurations by
combining all possible parameter values with each other. Some learner
parameters are continuous-valued, which requires defining either a list of