Khoo, Li-Pheng et al. "RClass*: A Prototype Rough-Set and Genetic Algorithms Enhanced Multi-Concept Classification System for Manufacturing Diagnosis."
Computational Intelligence in Manufacturing Handbook.
Edited by Jun Wang et al.
Boca Raton: CRC Press LLC, 2001.
19.1 Introduction

Decision rules can possibly be induced from empirical training data obtained by experience.
In reality, due to various reasons, empirical data often has the property of granularity and may be incomplete, imprecise, or even conflicting. For example, in diagnosing a manufacturing system, the opinions of two engineers can be different, or even contradictory. Some earlier inductive learning systems, such as the once-prevailing decision-tree learning system ID3, are unable to deal with imprecise and inconsistent information present in empirical training data [Khoo et al., 1999]. Thus, the ability to handle imprecise and inconsistent information has become one of the most important requirements for a classification system.
Many theories, techniques, and algorithms have been developed in recent years to deal with the analysis of imprecise or inconsistent data. The most successful ones are fuzzy set theory and the Dempster–Shafer theory of evidence. On the other hand, rough set theory, which was introduced by Pawlak [1982] in the early 1980s, is a newer mathematical tool that can be employed to handle uncertainty and vagueness. Basically, rough set theory handles inconsistent information using two approximations, namely the upper and lower approximations. Such a technique is different from fuzzy set theory or the Dempster–Shafer theory of evidence. Furthermore, rough set theory focuses on the discovery of patterns in inconsistent data sets obtained from information sources [Slowinski and Stefanowski, 1989; Pawlak, 1996] and can be used as the basis for formal reasoning under uncertainty, machine learning, and rule discovery [Ziarko, 1994; Pawlak, 1984; Yao et al., 1997]. Compared to other approaches to handling uncertainty, rough set theory has its unique advantages [Pawlak, 1996, 1997]. It does not require any preliminary or additional information about the empirical training data, such as the probability distribution in statistics, the basic probability assignment in the Dempster–Shafer theory of evidence, or grades of membership in fuzzy set theory [Pawlak et al., 1995]. Besides, rough set theory is better justified in situations where the set of empirical or experimental data is too small to employ standard statistical methods [Pawlak, 1991].
In less than two decades, rough set theory has rapidly established itself in many real-life applications such as medical diagnosis [Slowinski, 1992], control algorithm acquisition and process control [Mrozek, 1992], and structural engineering [Arciszewski and Ziarko, 1990]. However, most literature related to inductive learning or classification using rough set theory is limited to binary concepts, such as yes or no in decision making, or positive or negative in the classification of objects.
Genetic algorithms (GAs) are stochastic and evolutionary search techniques based on the principles of biological evolution, natural selection, and genetic recombination. GAs have received much attention from researchers working on optimization and machine learning [Goldberg, 1989]. Basically, GA-based learning techniques take advantage of the unique search engine of GAs to perform machine learning or to glean probable decision rules from a search space. This chapter describes the work that led to the development of RClass*, a prototype multi-concept classification system for manufacturing diagnosis. RClass* is based on a hybrid technique that combines the strengths of rough sets, genetic algorithms, and Boolean algebra. In the following sections, the basic notions of rough set theory and GAs are presented. Details of RClass*, its validation, and a case study using the prototype system are also described.
19.2 Basic Notions
19.2.1 Rough Set Theory
The large number of applications of rough set theory has proven its robustness in dealing with uncertainty and vagueness, and many researchers have attempted to combine it with other inductive learning techniques to achieve better results. Yasdi [1995] combined rough set theory with a neural network to deal with learning from imprecise training data. Khoo et al. [1999] developed RClass, a prototype system based on rough sets and a decision-tree learning methodology and the predecessor of RClass*, for inductive learning under noisy environments.
Approximation space and the lower and upper approximations of a set form two important notions of rough set theory. The approximation space of a rough set is the classification of the domain of interest into disjoint categories [Pawlak, 1991]. Such a classification refers to the ability to characterize all the classes in the domain. The upper and lower approximations represent the classes of indiscernible objects that possess sharp descriptions on concepts but with no sharp boundaries. The basic philosophy behind rough set theory is based on equivalence relations, or indiscernibility, in the classification of objects. Rough set theory employs a so-called information table to describe objects. The information about the objects is represented in a structure known as an information system, which can be viewed as a table with its rows and columns corresponding to objects and attributes, respectively (Table 19.1). For example, an information system (S) with a 4-tuple can be expressed as follows:
S = 〈 U, Q, V, ρ 〉
where U is the universe, which contains a finite set of objects;
Q is a finite set of attributes;
V = ∪_{q ∈ Q} V_q, where V_q is the domain of the attribute q; and
ρ: U × Q → V is the information function such that ρ(x, q) ∈ V_q for every q ∈ Q and x ∈ U. Any pair (q, v), where q ∈ Q and v ∈ V_q, is called a descriptor in S.
Table 19.1 shows a typical information system used for rough set analysis, with xi (i = 1, 2, …, 10) representing the objects of the set U to be classified, qi (i = 1, 2) denoting the condition attributes, and d representing the decision attribute. Together, the qi and d form the set of attributes; more specifically, Q = {q1, q2, d}. A typical information function value, ρ(x1, q1), is simply the entry of Table 19.1 in row x1 and column q1. Any attribute–value pair, such as (q1, 1), is called a descriptor in S.
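As an illustration, the 4-tuple S = 〈U, Q, V, ρ〉 maps directly onto a small data structure. The sketch below is a hypothetical reconstruction of Table 19.1: only the values for x5 and x7 and the x5/x7 and x6/x8 conflicts are fixed by this chapter, so all other entries are assumed for illustration.

```python
# Hypothetical reconstruction of the information system S = <U, Q, V, rho> of
# Table 19.1. Only the entries for x5 and x7 (and the x6/x8 conflict) are fixed
# by the chapter; the remaining values are assumed for illustration.
U = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10"]
Q = ["q1", "q2", "d"]        # q1, q2: condition attributes; d: decision attribute

# The information function rho: U x Q -> V, stored as a nested table.
rho = {
    "x1":  {"q1": 0, "q2": 0, "d": 0},
    "x2":  {"q1": 0, "q2": 1, "d": 1},
    "x3":  {"q1": 0, "q2": 1, "d": 1},
    "x4":  {"q1": 0, "q2": 0, "d": 0},
    "x5":  {"q1": 1, "q2": 0, "d": 0},
    "x6":  {"q1": 1, "q2": 1, "d": 0},
    "x7":  {"q1": 1, "q2": 0, "d": 1},   # same conditions as x5, different d
    "x8":  {"q1": 1, "q2": 1, "d": 1},   # same conditions as x6, different d
    "x9":  {"q1": 0, "q2": 0, "d": 0},
    "x10": {"q1": 0, "q2": 0, "d": 0},
}

# The domain V_q of each attribute q, collected from the table.
V = {q: {rho[x][q] for x in U} for q in Q}
print(V["q1"])                # the domain of attribute q1
print(rho["x5"]["q1"])        # an information function value, rho(x5, q1)
```

Each row of the table is one object of U and each column one attribute of Q, exactly as in the tabular view described above.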
Indiscernibility is one of the most important concepts in rough set theory. It is caused by imprecise information about the observed objects. The indiscernibility relation (R) is an equivalence relation on the set U and can be defined in the following manner: if x, y ∈ U and P ⊆ Q, then x and y are indiscernible by the set of attributes P in S if and only if ρ(x, q) = ρ(y, q) for every q ∈ P.
For example, using the information system given in Table 19.1, objects x5 and x7 are indiscernible by the set of attributes P = {q1, q2}. The relation can be expressed as x5 P̂ x7, because the information functions for the two objects are identical and are given by ρ(x5, {q1, q2}) = ρ(x7, {q1, q2}) = {1, 0}.
TABLE 19.1 A Typical Information System Used by Rough Set Theory
Hence, it is not possible to distinguish one object from the other using the attribute set {q1, q2}.
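The indiscernibility check itself is a direct comparison of information functions. A minimal sketch (the values for x5 and x7 are from the text; x6's values are assumed for contrast):

```python
# Indiscernibility by an attribute set P: x and y are indiscernible iff their
# information functions agree on every attribute in P.
rho = {
    "x5": {"q1": 1, "q2": 0},
    "x6": {"q1": 1, "q2": 1},   # assumed values, for contrast
    "x7": {"q1": 1, "q2": 0},
}

def indiscernible(x, y, P, rho):
    """True iff rho(x, q) == rho(y, q) for every q in P."""
    return all(rho[x][q] == rho[y][q] for q in P)

print(indiscernible("x5", "x7", ["q1", "q2"], rho))   # True: identical on P
print(indiscernible("x5", "x6", ["q1", "q2"], rho))   # False: q2 differs
```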
The equivalence classes of the relation P̂ are known as P-elementary sets in S. In particular, when P = Q, these Q-elementary sets are known as the atoms in S. In an information system, concepts can be represented by the decision-elementary sets. For example, using the information system depicted in Table 19.1, the {q1}-elementary sets, atoms, and concepts can all be read directly from the table.
Table 19.1 shows that objects x5 and x7 are indiscernible by the condition attributes q1 and q2. Furthermore, they possess different decision attributes. This implies that there exists a conflict (or inconsistency) between objects x5 and x7. Similarly, another conflict exists between objects x6 and x8.
Rough set theory offers a means to deal with inconsistency in information systems. For a concept (C), the greatest definable set contained in the concept is known as the lower approximation of C, denoted R(C). It represents the set of objects Y on U that can be certainly classified as belonging to concept C by the set of attributes R, such that

R(C) = ∪{Y ∈ U/R : Y ⊆ C}
where U/R represents the set of all atoms in the approximation space (U, R). On the other hand, the least definable set containing concept C is called the upper approximation of C, denoted R̄(C). It represents the set of objects Y on U that can possibly be classified as belonging to concept C by the set of attributes R, such that

R̄(C) = ∪{Y ∈ U/R : Y ∩ C ≠ ∅}
where U/R again represents the set of all atoms in the approximation space (U, R). Elements belonging only to the upper approximation compose the boundary region (BN_R), or the doubtful area. Mathematically, a boundary region can be expressed as

BN_R(C) = R̄(C) – R(C)

A boundary region contains a set of objects that cannot be certainly classified as belonging or not belonging to concept C by a set of attributes R. Such a concept, C, is called a rough set. In other words, rough sets are sets having non-empty boundary regions.
Using the information system shown in Table 19.1 again, based on rough set theory, the upper and lower approximations of concepts C1 (for d = 0) and C2 (for d = 1) can be easily obtained. For example, the lower approximation of concept C1 (d = 0) is given by

R(C1) = {x1, x4, x9, x10}

and its upper approximation is denoted as

R̄(C1) = {x1, x4, x5, x6, x7, x8, x9, x10}

Thus, the boundary region of concept C1 is given by

BN_R(C1) = R̄(C1) – R(C1) = {x5, x6, x7, x8}

As for concept C2 (d = 1), the approximations can be similarly obtained:

R(C2) = {x2, x3},  R̄(C2) = {x2, x3, x5, x6, x7, x8},  BN_R(C2) = {x5, x6, x7, x8}
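The approximations can be computed mechanically from the elementary sets. The sketch below reproduces the approximations of concept C1; the table values are a hypothetical reconstruction of Table 19.1, in which only the x5/x7 and x6/x8 conflicts and the resulting approximation sets are fixed by the text.

```python
from collections import defaultdict

# Hypothetical reconstruction of Table 19.1 (entries other than x5 and x7 are assumed).
rho = {
    "x1":  {"q1": 0, "q2": 0, "d": 0}, "x2":  {"q1": 0, "q2": 1, "d": 1},
    "x3":  {"q1": 0, "q2": 1, "d": 1}, "x4":  {"q1": 0, "q2": 0, "d": 0},
    "x5":  {"q1": 1, "q2": 0, "d": 0}, "x6":  {"q1": 1, "q2": 1, "d": 0},
    "x7":  {"q1": 1, "q2": 0, "d": 1}, "x8":  {"q1": 1, "q2": 1, "d": 1},
    "x9":  {"q1": 0, "q2": 0, "d": 0}, "x10": {"q1": 0, "q2": 0, "d": 0},
}
U = list(rho)

def elementary_sets(U, P, rho):
    """The P-elementary sets: equivalence classes of the indiscernibility relation."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(rho[x][q] for q in P)].add(x)
    return list(classes.values())

def approximations(U, P, rho, concept):
    """Lower and upper approximations of a concept (a subset of U) by attributes P."""
    lower, upper = set(), set()
    for Y in elementary_sets(U, P, rho):
        if Y <= concept:      # Y certainly belongs to the concept
            lower |= Y
        if Y & concept:       # Y possibly belongs to the concept
            upper |= Y
    return lower, upper

C1 = {x for x in U if rho[x]["d"] == 0}              # concept "d = 0"
lower, upper = approximations(U, ["q1", "q2"], rho, C1)
print(sorted(lower))          # the certain training set of concept "0"
print(sorted(upper - lower))  # the boundary region BN_R(C1)
```

With this reconstruction, the computed lower approximation and boundary region match the sets {x1, x4, x9, x10} and {x5, x6, x7, x8} stated in the text.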
As already mentioned, rough set theory offers a powerful means to deal with inconsistency in an information system. The upper and lower approximations make it possible to mathematically describe classes of indiscernible objects that possess sharp descriptions on concepts but with no sharp boundaries. For example, universe U (Table 19.1) consists of ten objects and can be described using two concepts, namely "d = 0" and "d = 1." As already mentioned, two conflicts, namely between objects x5 and x7 and between objects x6 and x8, exist in the data set. These conflicts cause the objects to be indiscernible and constitute doubtful areas, which are denoted by BN_R(0) and BN_R(1), respectively (Figure 19.1). The lower approximation of concept "0" is given by the object set {x1, x4, x9, x10}, which forms the certain training data set of concept "0." On the other hand, the upper approximation is represented by the object set {x1, x4, x5, x6, x7, x8, x9, x10}, which contains the possible training data set of concept "0." Concept "1" can be similarly interpreted.

19.2.2 Genetic Algorithms
As already mentioned, GAs are stochastic and evolutionary search techniques based on the principles of biological evolution, natural selection, and genetic recombination. They simulate the principle of "survival of the fittest" in a population of potential solutions known as chromosomes. Each chromosome represents one possible solution to the problem, or one rule in a classification. The population evolves over time through a process of competition whereby the fitness of each chromosome is evaluated using a fitness function. During each generation, a new population of chromosomes is formed in two steps. First, the chromosomes in the current population are selected for reproduction on the basis of their relative fitness. Second, the selected chromosomes are recombined using idealized genetic operators, namely crossover and mutation, to form a new set of chromosomes to be evaluated as the new solutions of the problem. GAs are conceptually simple but computationally powerful. They are used to solve a wide variety of problems, particularly in the areas of optimization and machine learning [Grefenstette, 1994; Davis, 1991].
Figure 19.2 shows the flow of a typical GA program. It begins with a population of chromosomes either generated randomly or gleaned from some known domain knowledge. Subsequently, it proceeds to evaluate the fitness of all the chromosomes, select good chromosomes for reproduction, and produce the next generation of chromosomes. More specifically, each chromosome is evaluated according to a given performance criterion, or fitness function, and assigned a fitness score. Using the fitness value attained by each chromosome, good chromosomes are selected to undergo reproduction. Reproduction involves the creation of offspring using two operators, namely crossover and mutation (Figure 19.3). By randomly selecting a common crossover site on two parent chromosomes, two new chromosomes are produced. During the process of reproduction, mutation may take place. For example, the binary value of bit 2 in Figure 19.3 has been changed from 0 to 1. The above process of fitness evaluation, chromosome selection, and reproduction of the next generation of chromosomes continues for a predetermined number of generations or until an acceptable performance level is reached.

FIGURE 19.1 Basic notions of rough sets.

FIGURE 19.2 A typical GA program flow.
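The loop of Figure 19.2 can be sketched in a few lines. The fitness function below (counting 1-bits, the classic "one-max" problem) and all parameter values are stand-ins for illustration, not taken from the chapter:

```python
import random

random.seed(0)                         # reproducible run
LENGTH, POP_SIZE, GENERATIONS = 12, 20, 40
P_CROSSOVER, P_MUTATION = 0.9, 0.02

def fitness(chrom):
    return sum(chrom)                  # one-max: count the 1-bits

def select(population):
    # Fitness-proportional (roulette-wheel) selection; +1 avoids zero weights.
    return random.choices(population,
                          weights=[fitness(c) + 1 for c in population], k=1)[0]

def crossover(a, b):
    # Swap tails at a randomly chosen common crossover site (cf. Figure 19.3).
    site = random.randrange(1, LENGTH)
    return a[:site] + b[site:], b[:site] + a[site:]

def mutate(chrom):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < P_MUTATION else bit for bit in chrom]

# Initial population of random chromosomes.
population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):           # stop after a fixed number of generations
    next_gen = []
    while len(next_gen) < POP_SIZE:
        a, b = select(population), select(population)
        if random.random() < P_CROSSOVER:
            a, b = crossover(a, b)
        next_gen += [mutate(a), mutate(b)]
    population = next_gen[:POP_SIZE]

best = max(population, key=fitness)
print(fitness(best))                   # typically close to LENGTH
```

The same skeleton carries over to rule induction: a chromosome then encodes a candidate decision rule and the fitness function scores how well the rule covers the training data.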
19.3 A Prototype Multi-Concept Classification System
19.3.1 Twin-Concept and Multi-Concept Classification
The basic principle of rough set theory is founded on a twin-concept classification [Pawlak, 1982]. For example, in the information system shown in Table 19.1, an object belongs either to concept "0" or to concept "1." However, binary-concept classification has limited application in reality, because in most situations objects can be classified into more than two classes. For example, in describing the vibration experienced by rotary machinery, such as a turbine in a power plant or a pump in a chemical refinery, it is common to use more than two states, such as normal, slight vibration, mild vibration, and abnormal, rather than just normal or abnormal. As a result, the twin-concept classification of rough set theory needs to be generalized in order to handle multi-concept problems. Based on rough set theory, Grzymala-Busse [1992] developed an inductive learning system called LERS to deal with inconsistency in training data. Basically, LERS is able to perform multi-concept classification. However, as observed by Grzymala-Busse [1992], LERS becomes impractical when it encounters a large training data set. This can possibly be attributed to the complexity of its computational algorithm. Furthermore, the rules induced by LERS are relatively complex and difficult to interpret.
19.3.2.1 The Approach
RClass* adopts a hybrid approach that combines the basic notions of rough set theory, the unique search engine of GAs, and Boolean algebraic operations to carry out multi-concept classification. It possesses the ability of:
FIGURE 19.3 Genetic operators.
1. Handling inconsistent information. This is treated by rough set principles.
2. Inducing probable decision rules for each concept. This is achieved by using a simple but effective GA-based search engine.
3. Simplifying the decision rules discovered by the GA-based search engine. This is realized using Boolean algebraic operators to simplify the induced decision rules.
Multi-concept classification can be realized using the following procedure:

1. Treat all the concepts (classes) in a training data set as component sets (sets A, B, C, …) of a universe, U (Figure 19.4).
2. Partition the universe, U, into two sets using one of the concepts, such as A and "not A" (¬A). This implies that the rough set's twin-concept classification can be used to treat concept A and its complement, ¬A.
3. Apply the twin-concept classification to determine the upper and lower approximations of concept A in accordance with rough set theory.
4. Repeat Steps 2 and 3 to classify the other concepts on universe U.
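The four steps above reduce to a loop over concepts, each treated one-against-rest. In the sketch below, the approximation helper follows the twin-concept machinery described in Section 19.2.1, and the vibration-style observations are invented for illustration:

```python
from collections import defaultdict

def approximations(U, P, rho, concept):
    """Twin-concept step: lower/upper approximations of one concept by attributes P."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(rho[x][q] for q in P)].add(x)
    lower, upper = set(), set()
    for Y in classes.values():
        if Y <= concept:
            lower |= Y
        if Y & concept:
            upper |= Y
    return lower, upper

def multi_concept(U, P, rho, decision):
    """Steps 1-4: partition U into A vs. not-A for every concept A, then approximate."""
    results = {}
    for c in sorted({rho[x][decision] for x in U}):
        A = {x for x in U if rho[x][decision] == c}   # concept A; U - A is "not A"
        results[c] = approximations(U, P, rho, A)
    return results

# Invented machine-vibration observations (names and values are illustrative only).
rho = {
    "m1": {"q1": 0, "state": "normal"},
    "m2": {"q1": 0, "state": "normal"},
    "m3": {"q1": 1, "state": "slight"},
    "m4": {"q1": 1, "state": "abnormal"},   # conflicts with m3
    "m5": {"q1": 2, "state": "abnormal"},
}
for concept, (lower, upper) in multi_concept(list(rho), ["q1"], rho, "state").items():
    print(concept, sorted(lower), sorted(upper))
```

Because each concept is handled against its complement, nothing beyond the binary rough-set machinery is needed, which is the point of the procedure.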
19.3.2.2 Framework of RClass*

The framework of RClass* is shown in Figure 19.5. It comprises four main modules, namely a preprocessor, a rough-set analyzer, a GA-based search engine, and a rule pruner.

The raw knowledge or data gleaned from a process or from experts is stored and subsequently forwarded to RClass* for classification and rule induction. The preprocessor module performs the following tasks:
1. Access the input data.
2. Identify the attributes and their values.
3. Perform a redundancy check and reorganize the new data set, with no superfluous observations, for subsequent use.
4. Initialize all the necessary parameters for the GA-based search engine, such as the length of the chromosome, the population size, the number of generations, and the probabilities of crossover and mutation.
The rough-set analyzer carries out three subtasks, namely consistency checking, concept forming, and approximation. It scans the training data set obtained from the preprocessor module and checks its consistency. Once an inconsistency is spotted, it activates the concept partitioner and the approximation operator to carry out analysis using rough set theory. The concept partitioner performs set operations for each concept (class) according to the approach outlined previously. The approximation operator employs the lower and upper approximators to calculate the lower and upper approximations, during which the training data set is split into a certain training data set and a possible training data set. Subsequently, these training sets are forwarded to the GA-based search engine for rule extraction.
FIGURE 19.4 Partitioning of universe U.
The GA-based search engine, once invoked, performs the bespoke genetic operations, such as crossover, mutation, and reproduction, to gather certain rules and possible rules from the certain training data set and the possible training data set, respectively.
The rule pruner performs two tasks: pruning (or simplifying) and rule evaluation. It examines all the rules, both certain and possible, extracted by the GA-based search engine and employs Boolean algebraic operators, such as union and intersection, to prune and simplify the rules. During the pruning operation, redundant rules are removed, whereas related rules are clustered and generalized during simplification. As possible rules are not definitely certain, the quality and reliability of these possible rules must therefore be assessed. For every possible rule, RClass* also estimates its reliability using the following index:

Reliability = Observation_Possible_Rule / Observation_Possible_Original_Data

where Observation_Possible_Rule is the number of observations that are correctly classified by a possible rule, and Observation_Possible_Original_Data is the number of observations with condition attributes covered by the same rule in the original data set.
This index can be viewed as the probability of classifying an inconsistent training data set correctly. For each certain rule extracted from the certain training data set, RClass* uses a so-called completeness index to indicate the number of observations in the original training data set that are related to the certain rule.
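The reliability index is a simple ratio. The sketch below assumes the ratio form implied by the description above ("the probability of classifying an inconsistent training data set correctly"); the exact completeness-index formula is not given in this excerpt and is therefore omitted:

```python
def reliability(n_correct, n_covered):
    """Observation_Possible_Rule / Observation_Possible_Original_Data:
    the fraction of observations covered by a possible rule that the rule
    classifies correctly (ratio form assumed from the text's description)."""
    if n_covered == 0:
        raise ValueError("rule covers no observations")
    return n_correct / n_covered

# e.g. a possible rule whose condition attributes cover 4 observations in the
# original data set, 3 of which carry the decision the rule predicts:
print(reliability(3, 4))   # 0.75
```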
FIGURE 19.5 Framework of RClass*.