Chang, Shing I. "A Hybrid Neural Fuzzy System for Statistical Process Control."
Computational Intelligence in Manufacturing Handbook
Edited by Jun Wang et al.
Boca Raton: CRC Press LLC, 2001
A Hybrid Neural Fuzzy System for Statistical Process Control

Shing I Chang
Kansas State University

18.1 Statistical Process Control
Statistical process control (SPC) is one of the most often applied quality improvement tools in today's manufacturing as well as service industries. Instead of inspecting end products or services, SPC focuses on the processes that produce products and services. The philosophy of a successful SPC application is to identify sources of special causes of production variation as soon as possible during production rather than wait until the very end. Here "production" is defined as either a manufacturing or a service activity. SPC provides savings over traditional inspection operations on end products or services because it eliminates accumulations of special causes of variation by monitoring key quality characteristics during production. Imagine how much waste is generated when a production mistake enters a stream of products during mid-day but inspection doesn't take place until the end of an 8-hour shift. SPC can alleviate this situation by frequently monitoring the production process via product quality characteristics.
A quality characteristic (QC) is a measure of quality on a product or service. Examples of QCs are the weight of a juice can, the length of a cylinder part, the number of errors made during payroll operations, etc. A QC can be mathematically defined as a random variable, which is a function that takes values from a population or distribution. Denote a QC as random variable x. If a population Ω contains only discrete members, that is, Ω = {x1, x2, …, xn}, then QC x is a discrete random variable. For example, if x is the number of errors made during payroll operations, then member x1 is the value in January, x2 is the value in February, and so on. In this case, attribute control charts can be used to monitor a QC with a discrete distribution. A control chart for fraction nonconforming, also known as a P chart and based on the binomial distribution, is the most frequently used chart (Montgomery, 1996). However, in this chapter we will focus only on a more interesting class of control charts, for which QC x is a continuous random variable that can take a value in a continuous range, i.e., x ∈ Ω = {x | L ≤ x ≤ U}. For example, x is the weight of a juice can with a target weight of 8 oz.
The central limit theorem (CLT) implies that the sample mean of a continuous random variable x is approximately normally distributed, where the sample mean is calculated from n independently sampled observations of x. The approximation improves as n increases. In much of the quality control literature, n is chosen to be 5 to 10, for which the approximation is considered good enough. Note that the CLT does not impose any restriction on the original distribution of x, which provides the foundation for control charts. Since the sample mean of x, x̄, is approximately normally distributed, i.e., N(µ, σ²/n), where µ and σ are the mean and standard deviation of x, respectively, we can collect n observations of a QC, calculate the sample mean, and plot it against a control chart with three lines. If both µ and σ are known, the centerline is µ, with lower control limit µ − 3σ/√n and upper control limit µ + 3σ/√n. If the CLT holds and the process defined by QC x remains in control, 99.73% of the sample population will fall within the two control limits. On the other hand, if either µ or σ shifts from its target, the probability that sample points plot outside the control limits increases, which indicates an out-of-control condition.
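To make the limit computation concrete, here is a minimal Python sketch. It is ours, not the chapter's; the function names and the juice-can numbers are illustrative only.

```python
import math

def xbar_limits(mu, sigma, n):
    """Three-sigma control limits for the mean of a sample of n observations."""
    half_width = 3 * sigma / math.sqrt(n)
    return mu - half_width, mu, mu + half_width  # LCL, centerline, UCL

def in_control(sample, mu, sigma):
    """Plot one sample mean against the limits; True if it falls inside."""
    lcl, _, ucl = xbar_limits(mu, sigma, len(sample))
    xbar = sum(sample) / len(sample)
    return lcl <= xbar <= ucl

# Example: five juice-can weights with target mu = 8 oz and sigma = 0.1 oz
print(in_control([8.02, 7.95, 8.10, 7.98, 8.04], mu=8.0, sigma=0.1))
```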
A pair of control charts is often used simultaneously to monitor QC x — one for the mean µ and the other for the standard deviation σ. The goal is to make sure the process characterized by QC x is under statistical control. In other words, SPC charts are used to verify that the distribution of x remains the same over time. Since a probability distribution is usually estimated by two major parameters, µ and σ, SPC charts monitor the distribution through these two parameters. Figure 18.1 (Montgomery, 1996) demonstrates two out-of-control scenarios. At time t1, the mean µ0 of x starts to shift to µ1. One of the most often used control charts, the X̄ chart, can be used to detect this situation. On the other hand, at time t2, the mean is on target but the standard deviation has increased from σ0 to σ1, where σ1 > σ0. In this case, a control chart for ranges (R chart) can be used to detect the variation change. Notice that SPC charts are designed to detect assignable causes of variation, as indicated by mean or standard deviation shifts, while tolerating the chance variation shown by the bell-shaped distribution of x. Such chance variation is inevitable in any production process.
Statistical process control charts have been applied to a wide range of manufacturing and service industries since Shewhart first introduced the concept in the 1920s. There have been several improvements on the traditional control charts since then. Page (1954) first introduced cumulative sum (CUSUM) control charts to enhance the sensitivity of detecting small process shifts. Instead of depending solely on data collected in the most recent sample period for plotting, as the traditional Shewhart-type control chart does, the CUSUM chart's plotting statistic involves all previously collected data points and assigns an equal weight factor to every point. If a small shift occurs, the CUSUM statistic can accumulate the deviation in a short period of time and thus increase the sensitivity of the SPC chart. However, CUSUM charts cannot be plotted as easily as Shewhart-type control charts. Roberts (1959) proposed an exponentially weighted moving average (EWMA) control chart that weighs the most recent observations more heavily than remote data points. EWMA charts were developed to have the structure of the traditional Shewhart charts, yet match the CUSUM charts' capability of detecting small process shifts.
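For illustration, the two plotting statistics can be sketched as follows. The chapter does not give design parameters, so the reference value k, decision interval h, and weight λ below are common textbook defaults rather than values from the text.

```python
def cusum(xs, mu0, k=0.5, h=5.0):
    """Tabular CUSUM for an upward mean shift; returns index of first signal or None."""
    c_plus = 0.0
    for t, x in enumerate(xs):
        c_plus = max(0.0, x - (mu0 + k) + c_plus)  # accumulate deviations above mu0 + k
        if c_plus > h:
            return t
    return None

def ewma(xs, mu0, lam=0.2):
    """EWMA statistic: recent observations weighted more heavily than older ones."""
    z = mu0
    zs = []
    for x in xs:
        z = lam * x + (1 - lam) * z
        zs.append(z)
    return zs
```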
Most control chart improvements over the years have been focused on detecting process mean shifts, with a few exceptions that are discussed in the following section.
Shewhart R, S, and S² charts were the first statistical control charts for monitoring process variance changes. Johnson and Leone (1962a, 1962b) and Page (1963) later proposed CUSUM charts based on the sample variance and sample range. As an alternative, Crowder and Hamilton (1992) developed an exponentially weighted moving average (EWMA) scheme based on the log transformation of the sample variance, ln(S²). Their experimental results show that the EWMA chart outperforms the Shewhart S² chart and is comparable to the CUSUM chart for variation proposed by Page (1963). Using the concept of the log transformation of the sample variance, Chang and Gan (1995) suggested a CUSUM scheme based on ln(S²), which performs as well as the corresponding EWMA. The performances of Chang and Gan's (1995) CUSUM and Crowder and Hamilton's (1992) EWMA are not significantly better than Page's (1963) CUSUM; however, their design strategies and procedures are easier for practitioners to use.
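The following sketch illustrates the general idea of an EWMA applied to ln(S²). It is not Crowder and Hamilton's exact design; the smoothing constant and the reflecting barrier at ln(σ0²), which focuses the chart on variance increases, are our assumptions for illustration.

```python
import math

def ewma_log_s2(samples, sigma0=1.0, lam=0.1):
    """EWMA of ln(S^2), reflected at ln(sigma0^2) to track variance increases."""
    floor = math.log(sigma0 ** 2)
    w = floor
    stats = []
    for sample in samples:
        n = len(sample)
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance
        w = max(lam * math.log(s2) + (1 - lam) * w, floor)
        stats.append(w)
    return stats  # signal when a statistic exceeds a chart-specific upper limit
```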
18.2 Neural Network Control Charts
In recent years, several researchers have investigated applying neural networks to process control. Guo and Dooley (1992) proposed network models that identify positive mean or variance changes using backpropagation training. Their best network performs 40% better in average error rate than conventional control chart heuristic tests.
Pugh (1989, 1991) also successfully trained backpropagation networks for detecting process mean shifts with a subgroup size of five. He found his networks equal in average run length (ARL) performance to a 2-σ control chart in terms of both type I and type II errors.
Hwarng and Hubele (1991, 1993) trained a backpropagation pattern recognition classifier to detect six unnatural control chart patterns — trend, cycle, stratification, systematic, mixture, and sudden shift. Their results were promising in recognizing various special causes in out-of-control situations.

Smith (1994) and Smith and Yazici (1993) described a combined X-bar and R chart backpropagation model to investigate both mean and variance shifts. They found their networks performed 50% better in average error rate when compared to Shewhart control charts. However, the majority of the wrong classifications were type I errors. That is, the network signals too many out-of-control false alarms when the process is actually in control.

FIGURE 18.1 In-control and out-of-control scenarios in SPC. (From Montgomery, D.C., 1996, Introduction to Statistical Quality Control, 2nd ed., p. 131. Reproduced with the permission of John Wiley & Sons, Inc.)
Chang and Aw (1994) proposed a four-layer backpropagation network and a fuzzy inferencing system for detecting process mean shifts. Their network outperforms conventional Shewhart control charts in terms of both type I and type II errors, while Pugh's and Smith's charts have larger type I errors than that of the 3σ chart. Further, Chang and Aw's scheme has the advantage of identifying the magnitude of shifts; none of the Shewhart-type charts, or the other neural network charts, offer this feature. Chang and Ho (1999) further introduced a two-stage neural network approach for detecting and classifying process variance shifts. The performance of the proposed method is comparable to that of the other control charts for detecting variance changes, and it can also estimate the magnitude of the variance change, which the other control charts do not support. Furthermore, Ho and Chang (1999) integrated both neural network control chart schemes and compared the result with many other approaches for monitoring process mean and variance shifts. In this chapter, we summarize the proposed hybrid neural fuzzy system for monitoring both process mean and variance shifts, provide guidelines and examples for using the system, and list its properties.
18.3 A Hybrid Neural Fuzzy Control Chart
As shown in Figure 18.2 (Ho and Chang, 1999), the proposed hybrid neural fuzzy control chart, called C-NN (C stands for "combined" and NN means "neural network"), is composed of several modules — data input, data processing, decision making, and data summary. The data input module takes observations from QC x and transforms them into appropriate types for both the control chart for the mean (M-NN) and the control chart for the variance (V-NN), which are the major components of the data processing module. The decision-making module is responsible for interpreting the neural network outputs from the previous module. There are four distinct possibilities: no process shift, process mean shift only, process variance shift only, and both process mean and variance shifts. Note that two different classifiers — fuzzy and neural network — are adopted for the process mean and variance components, respectively. Finally, the data summary module calculates estimated shift magnitudes according to the appropriate diagnosis. Details of each module are discussed in the following sections.
18.3.1 Data Input Module
The data input module takes samples or observations of QC x in two ways. Sample observations x1, x2, …, and xn in the first input method are independent of each other. In the proposed system, n is chosen as five; that is, each plotting point consists of a sample of five observations. Traditional Shewhart-type control charts normally use this input method.
A moving window of five observations is used in the second method to select incoming observations. For example, the first sample point consists of observations x1, x2, …, x5, and the second sample point is composed of x2, x3, …, x6, and so on. This method is explored because both CUSUM and EWMA charts for mean shifts are capable of taking individual observations. The proposed moving window method comes close to individual observation in terms of the number of observations used for decision making. Unlike the "true" individual observation input method, the moving window method must wait until the fifth observation completes the first sample point before the proposed chart can be used. After this point, it keeps pace with the "true" individual observation input method in that it uses the most recent observation and the four immediately preceding observations. The reason for maintaining a few observations in a sample point is the need to evaluate process variation; an individual observation does not provide such information.
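Both input methods are easy to state precisely in code. A minimal sketch, with illustrative data:

```python
def fixed_samples(stream, n=5):
    """First input method: non-overlapping samples of n independent observations."""
    for i in range(0, len(stream) - n + 1, n):
        yield stream[i:i + n]

def moving_window(stream, n=5):
    """Second input method: once n observations exist, each new observation
    forms a sample with the four immediately preceding ones."""
    for i in range(len(stream) - n + 1):
        yield stream[i:i + n]

xs = [8.02, 7.95, 8.10, 7.98, 8.04, 8.01, 7.97]
print(list(moving_window(xs)))  # [x1..x5], [x2..x6], [x3..x7]
```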
Transformation is also a key component of the data input module. As we will discuss later, both neural networks were trained "off-line" on simulated observations. In order to make the proposed schemes work for various applications, data transformation is necessary to standardize the raw data into the value range that both neural network components can work with. Formulas for data transformation are as follows:
18.3.1.1 Transformation for M-NN Input
z_ti = (x_ti − x̄)/s    Equation (18.1)
where i is the index of observations in a sample or window; t is the index for the sample period; and x̄ and s are estimates of the process mean and standard deviation, respectively. In traditional control charts, it takes 100 to 125 observations, e.g., 25 samples of 4 or 5 observations each, to establish the control limits. However, in this case, 20 to 30 observations can provide reasonably good estimates.
18.3.1.2 Transformation for V-NN Input
Given the data standardization in Equation 18.1, the input for V-NN for variance detection needs to be further processed as
y_ti = z_ti − z̄_t    Equation (18.2)
where t and i are the same as those defined in Equation 18.1, and z̄_t is the average of the five transformed observations z_ti of the sample at time t.
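A minimal sketch of both transformations follows. Note that the exact form of Equation 18.2 was reconstructed from the surrounding description (centering the standardized values on their within-sample average), so v_nn_input below reflects that assumption rather than a formula confirmed by the source.

```python
def m_nn_input(sample, mu_hat, s_hat):
    """Equation 18.1: standardize raw observations with estimated mean and std dev."""
    return [(x - mu_hat) / s_hat for x in sample]

def v_nn_input(z_sample):
    """Equation 18.2 (as reconstructed here): center the standardized values on
    their within-sample average so the mean level drops out and spread remains."""
    z_bar = sum(z_sample) / len(z_sample)
    return [z - z_bar for z in z_sample]

sample = [8.02, 7.95, 8.10, 7.98, 8.04]
z = m_nn_input(sample, mu_hat=8.0, s_hat=0.1)
print(z, v_nn_input(z))
```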
18.3.2 Data Processing Module
The heart and soul of the proposed system is a module composed of two independently developed neural networks: M-NN and V-NN. M-NN, developed by Chang and Aw (1996), is a 5–8–5–1 four-layer neural network for detecting process mean shifts. Chang and Ho's (1999) V-NN, on the other hand, is a 5–12–12–1 neural network for detecting process variance shifts. Data from the transformation formulas (Equations 18.1 and 18.2) are fed into M-NN and V-NN, respectively. Both neural networks have single output nodes. M-NN's output values range from –1 to +1. A value that falls into the negative range indicates a decrease in the process mean, while a positive M-NN output value indicates a potential increase in the process mean. V-NN's output ranges from 0 to 1, with larger values meaning larger shifts. Note that both neural networks were trained off-line using simulations. By incorporating the trained weight matrices, one can start using the proposed method. The only setup required is to estimate both the process mean and variance for the transformation. The central limit theorem guarantees that the transformed data are similar to the simulated data used for training. Thus the proposed method can be applied to many applications with various data types as long as they can be defined as QC x. Before M-NN and V-NN are introduced in detail, we first summarize the calculation and training of any feedforward, multiple-layer neural network as follows.

FIGURE 18.2 A schematic diagram of the C-NN (combined neural network) control chart. (Adapted from Ho and Chang, 1999, Figure 3, p. 1891.)
18.3.2.1 Computing in a Neural Network
The most commonly implemented neural network is the multilayer backpropagation network, which adapts weights according to the steepest gradient descent rule along a nonlinear transformation function. The reason for this popularity is the versatility of its paradigm in solving diverse problems and its strong mathematical foundation. An example of a multilayer neural network is shown in Figure 18.3. In neural networks, information propagates from input nodes (or neurons) through the system's weight connections in the middle layers (or hidden layers) of nodes, finally passing out of the last layer of nodes — the output nodes.

Each node, for example node j in the hidden and output layers, contains input links with weights w_ij, an activation function (or transfer function) f, and output links to other nodes, as shown in Figure 18.4. Assuming k input links are connected to node j, the output V_j of node j is processed by the activation function
V_j = f(I_j),  I_j = w_1j·V_p1 + w_2j·V_p2 + … + w_kj·V_pk + θ_j    Equation (18.3)
where V_pi is the output of node i from the previous layer and θ_j is the threshold (bias) of node j.
Many activation functions, e.g., sigmoidal and hyperbolic-tangent functions, are available. We choose to use the sigmoidal function
f(I) = 1 / (1 + e^(−cI))    Equation (18.4)

where c is a coefficient that adjusts the abruptness of the function.
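A short sketch of Equations 18.3 and 18.4 for a single node follows; the threshold term θ_j is included per the reconstruction above, and all numbers are illustrative.

```python
import math

def sigmoid(I, c=1.0):
    """Equation 18.4: sigmoidal activation with abruptness coefficient c."""
    return 1.0 / (1.0 + math.exp(-c * I))

def node_output(inputs, weights, theta, c=1.0):
    """Equation 18.3: weighted sum of previous-layer outputs plus threshold,
    passed through the activation function."""
    I = sum(w * v for w, v in zip(weights, inputs)) + theta
    return sigmoid(I, c)

# Node j with three input links
print(node_output([0.2, -0.4, 0.9], weights=[0.5, -0.3, 0.8], theta=0.1))
```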
FIGURE 18.3 An example of a multilayer neural network.
18.3.2.2 Training of a Neural Network
Backpropagation training is the most popular supervised neural network training algorithm. The training is designed to modify the thresholds and weights so that the overall error is minimized. At each iteration, we first calculate the error signals δ_o, o = 1, 2, …, n_o, for the output layer nodes as follows:
δ_o = f′(I_o)(t_o − V_o)    Equation (18.5)
where f′(I) is the first-order derivative of the activation function f(I); t_o is the desired target value; and V_o is the actual output for output node o. We then update the weights connecting hidden layer nodes and output layer nodes:

w_ho ← w_ho + η·δ_o·V_h + α·∆w_ho    Equation (18.6)
where η is a constant chosen by users for adjusting the training rate; α is a momentum factor; δ_o is obtained from Equation 18.5; V_h is the output of node h in the last hidden layer; and ∆w_ho is the previous weight change between node h and output node o. Subsequent steps include computing the error signals for the hidden layer(s) and propagating the errors backward toward the input layer. The error signals for node h in the current hidden layer are
δ_h = c·V_h·(1 − V_h)·Σ_{i=1..n′} w_ih·δ′_i    Equation (18.7)
where V_h is the output for node h in the current hidden layer under consideration; w_ih is the weight coefficient between node h in the current hidden layer and node i in the next hidden layer; δ′_i is the error signal for node i in the next hidden layer; and n′ is the number of nodes in the next hidden layer. Given the error signals from Equation 18.7, the weight coefficient w_jh between node j in the lower hidden layer and node h in the current hidden layer can be updated as follows:

w_jh ← w_jh + η·δ_h·V_j + α·∆w_jh    Equation (18.8)
FIGURE 18.4 Node j and its input–output values in a multilayer neural network.
where η and α are defined in Equation 18.6, and V_j is the actual output from node j in the lower hidden layer. In summary, the procedure of backpropagation training is as follows:
Step 1 Initialize the weight coefficients.
Step 2 Randomly select a data entry from the training data set.
Step 3 Feed the input data of the data entry into the network under training.
Step 4 Calculate the network outputs.
Step 5 Calculate the error signals between the network outputs and the desired targets using Equation 18.5.
Step 6 Adjust the weight coefficients between the output layer and the closest hidden layer using Equation 18.6.
Step 7 Propagate the error signals and weight coefficients backward using Equations 18.7 and 18.8.
Step 8 Repeat steps 2 to 7 for each entry in the training set until the network error term drops to an acceptable level.

Note that calculations in steps 2 to 4 are done from the input layer toward the output layer, while weight updates in steps 5 to 7 are calculated in a backward manner. The term "backpropagation" comes from the way the network weight coefficients are updated.
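The eight steps can be condensed into a short sketch. For brevity this uses one hidden layer and a generic fully connected layout rather than the specific M-NN or V-NN architectures (M-NN, in particular, also connects the input layer to all downstream layers); the learning rate and momentum values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(I, c=1.0):
    return 1.0 / (1.0 + np.exp(-c * I))

class BackpropNet:
    """Plain fully connected network trained per Steps 1-8 (one hidden layer)."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.1, alpha=0.9):
        # Step 1: initialize weights (last row of each matrix is the threshold)
        self.w1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
        self.w2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
        self.dw1 = np.zeros_like(self.w1)  # previous weight changes (momentum)
        self.dw2 = np.zeros_like(self.w2)
        self.eta, self.alpha = eta, alpha

    def forward(self, x):
        # Steps 3-4: propagate the input toward the output layer
        self.v0 = np.append(x, 1.0)                 # input plus bias term
        self.v1 = np.append(sigmoid(self.v0 @ self.w1), 1.0)
        self.v2 = sigmoid(self.v1 @ self.w2)
        return self.v2

    def train_one(self, x, target):
        out = self.forward(x)
        # Step 5: output-layer error signals, Equation 18.5 (f' = f(1-f) for c = 1)
        delta_o = out * (1 - out) * (target - out)
        # Step 7, first half: hidden-layer error signals, Equation 18.7
        delta_h = self.v1[:-1] * (1 - self.v1[:-1]) * (self.w2[:-1] @ delta_o)
        # Steps 6-7: weight updates with momentum, Equations 18.6 and 18.8
        self.dw2 = self.eta * np.outer(self.v1, delta_o) + self.alpha * self.dw2
        self.dw1 = self.eta * np.outer(self.v0, delta_h) + self.alpha * self.dw1
        self.w2 += self.dw2
        self.w1 += self.dw1

net = BackpropNet(5, 8, 1)
for _ in range(200):  # Step 8: sweep the training data until error is acceptable
    net.train_one(rng.normal(0, 1, 5), np.array([0.0]))
```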
18.3.2.3 Computing and Training of M-NN
The first neural network is a backpropagation type trained by Chang and Aw (1996). It is a 5–8–5–1 four-layer network, i.e., five input nodes, two hidden layers with eight and five neurons, respectively, and one output node. This network has a unique feature in that the input layer is connected to all nodes in the other three layers, as shown in Figure 18.5. They trained M-NN using 900 samples, each with five observations, simulated from N(µo ± δσo, σo) where µo = 0, σo = 1, and δ = 0, ±1, ±2, ±3, and ±4. These observations were fed directly to the network, which was trained by a standard backpropagation algorithm to achieve a desired output between –1 and 1. The network was originally developed to detect both positive and negative mean shifts. Since we will analyze positive shifts only, our interest here is in positive output values between 0 and 1. A value close to zero indicates the process is in control, while an out-of-control signal is triggered when the output value exceeds a set of critical cutoff points. The larger the output value, the larger the process mean shift.
18.3.2.4 Computing and Training of V-NN
Chang and Ho (1999) trained a neural network, henceforth called V-NN, to detect process variation shifts. V-NN is a standard backpropagation network with a 5–12–12–1 structure. The numbers of input and output nodes were kept the same as in M-NN so that parallel use of both charts is possible.

In training V-NN, 600 exemplar samples were taken from simulated distributions N(µo, (ρσo)²) where µo = 0, σo = 1, and ρ = 1, 2, 3, 4, 5. They were then transformed into input values for the neural network using Equation 18.2.

The desired output, which represents different shift magnitudes, has values between 0 and 1. The network was trained by a standard backpropagation algorithm with adaptive learning rates. A V-NN output value close to 0 means the process variation is likely to be in control, while larger values indicate that the process variation has increased. The larger the V-NN output, the larger the magnitude of the increase.
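The two simulated training sets are easy to reproduce in outline. The per-cell sample counts below are chosen only to hit the totals of 900 and 600 exemplars mentioned in the text; the desired-output encoding is omitted, and Equation 18.2's reconstructed form is assumed for the V-NN inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def m_nn_training_data(per_cell):
    """Samples of five from N(mu0 + delta*sigma0, sigma0), mu0 = 0, sigma0 = 1."""
    data = []
    for delta in (0, 1, -1, 2, -2, 3, -3, 4, -4):
        for _ in range(per_cell):
            data.append((rng.normal(delta, 1.0, 5), delta))
    return data

def v_nn_training_data(per_cell):
    """Samples of five from N(mu0, (rho*sigma0)^2), rho = 1..5, transformed per
    Equation 18.2 as reconstructed earlier (centered standardized values)."""
    data = []
    for rho in (1, 2, 3, 4, 5):
        for _ in range(per_cell):
            z = rng.normal(0.0, rho, 5)   # already standardized: mu0 = 0, sigma0 = 1
            data.append((z - z.mean(), rho))
    return data

m_set = m_nn_training_data(100)   # 9 shift levels x 100 = 900 exemplars
v_set = v_nn_training_data(120)   # 5 shift levels x 120 = 600 exemplars
```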
18.3.3 Decision-Making Module
The decision-making module is responsible for interpreting the neural network outputs from both M-NN and V-NN. Fuzzy set theory is applied to approximate the way a human would judge these outputs. Before the decision rules for evaluating both M-NN and V-NN are given, fuzzy sets and fuzzy computing related to this module are briefly reviewed in the following sections.
18.3.3.1 Fuzzy Sets and Fuzzy Computing
Zadeh (1965) emphasized that applications based on fuzzy logic start with a human solution, which is distinctly different from a neural network solution. Motivated by the problem of modeling complex systems, Zadeh observed that system models based on first principles, such as physics, are not always able to solve the problem. Any attempt to enhance the detail of a model of a complex system often introduces more uncertainties. On the other hand, a human being is able to offer a solution for such a system from his or her experience. The fact is that human beings can handle uncertainties much better than a system model can.
18.3.3.1.1 Fuzzy Sets and Fuzzy Variables
Zadeh (1965) first introduced the concept of the fuzzy set. A member of a fuzzy set or subset has a membership value in [0, 1] that describes how likely it is that the member belongs to the fuzzy set.
Let U be a collection of objects denoted generically by {x}, which could be discrete or continuous. U is called the universe of discourse and u represents a member of U (Yager and Filev, 1994). A fuzzy set F in a universe of discourse U is characterized by a membership function µF that takes values in the interval [0, 1], namely,

µF : U → [0, 1]    Equation (18.9)

The fuzzy set F can be represented as F = {(x, µF(x)), x ∈ U}.
An ordinary set may be viewed as a special case of a fuzzy set whose membership function takes only two values, 0 or 1. An example of probability modeling is throwing a die. Assuming a fair die, the outcomes can be modeled as a precise set A = {1, 2, 3, 4, 5, 6} with probability 1/6 for the occurrence of each member of the set. To model the same event with fuzzy sets, we need six fuzzy subsets ONE, TWO, THREE, FOUR, FIVE, and SIX that contain the outcomes of throwing the die. In this case, the universe of discourse U is the same as set A, and six membership functions µ1, µ2, …, µ6 are for the members in fuzzy sets ONE, TWO, …, and SIX, respectively. We can now define fuzzy sets ONE = {(1, 1), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)}, TWO = {(1, 0), (2, 1), (3, 0), (4, 0), (5, 0), (6, 0)}, and so on. There is no ambiguity about this event — you receive a number from 1 to 6 when a die is thrown. Conventional set theory is appropriate in this case.

FIGURE 18.5 A proposed two-sided mean shift detection neural network model. (Adapted from Chang and Aw, 1996.)
In general, the grade of each member u in a fuzzy set is given by its membership function µF(u), whose value is between 0 and 1. Thus the vague nature of the real world can be modeled. In this chapter, the output from M-NN, for example, can be modeled as a fuzzy variable in that we cannot precisely define the meaning of an M-NN output value. For example, a value of 0.4 can mean either a positive small or a positive medium process mean shift because of the way we define the M-NN output target values. Target values 0.3 and 0.5 correspond to one-sigma and two-sigma positive mean shifts, respectively. One way to model this vagueness is to define the meaning of the M-NN output by several linguistic fuzzy variables, as discussed in the following section.
18.3.3.1.2 Membership Functions and Linguistic Fuzzy Variables
A fuzzy membership function, as shown in Equation 18.9, is a subjective description of how likely it is that an element belongs to a fuzzy set. We propose a fuzzy set of nine linguistic fuzzy variables to define M-NN outputs that take values within the range [–1, 1]. The universe of discourse U is [–1, 1] in this case. The nine linguistic fuzzy variables for the M-NN outputs are Extremely Negative Large Shift (XNL), Negative Large Shift (NL), Negative Medium Shift (NM), Negative Small Shift (NS), No Shift (NO), Positive Small Shift (PS), Positive Medium Shift (PM), Positive Large Shift (PL), and Extremely Positive Large Shift (XPL). Each fuzzy set is responsible for one process status; e.g., NS means that the process experiences a negative small mean shift. Due to the nature of neural network output, some fuzzy sets overlap each other; that is, different fuzzy sets share the same members.
We use two of the most popular membership functions, triangular and trapezoidal functions, to define these linguistic fuzzy variables, as shown in Figure 18.6. Note that an M-NN output value of 0.4 will generate two non-zero fuzzy membership values, i.e., µPS(x = 0.4) = 0.5 and µPM(x = 0.4) = 1. In other words, an M-NN output with value 0.4 most likely belongs to a positive medium mean shift, although there is a 50% possibility that it may be a positive small mean shift.
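A sketch of the two membership shapes follows. The corner values are assumptions chosen to reproduce the two facts stated here (µPS(0.4) = 0.5 and µPM(0.4) = 1) and the 0.5-cut of PS quoted in the next section; they are not read off Figure 18.6.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 at a, rises to 1 on [b, c], falls back to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def mu_PS(x):
    # Assumed corners; give mu_PS(0.05) = mu_PS(0.4) = 0.5
    return trapezoid(x, 0.0, 0.1, 0.3, 0.5)

def mu_PM(x):
    # Assumed corners; give mu_PM(0.4) = 1
    return trapezoid(x, 0.3, 0.4, 0.6, 0.7)

print(mu_PS(0.4), mu_PM(0.4))  # 0.5 1.0
```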
18.3.3.1.3 Fuzzy Operators
An α-cut set of a fuzzy set is defined as the collection of members whose membership values are equal to or larger than α, where α is between 0 and 1. The α-cut set of F is defined as

F_α = {x ∈ U | µF(x) ≥ α}

For example, the 0.5-cut set of PS (positive small mean shift) contains M-NN values between 0.05 and 0.4. Perhaps this concept can best be demonstrated by visualization. In Figure 18.6, this represents the x-axis interval supporting the portion of the PS trapezoid above the horizontal 0.5 alpha-level line. For the NO fuzzy variable, its α-cut members {x ∈ U | –0.15 ≤ x ≤ 0.15} support a triangular shape.
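The α-cut, and hence the interval of confidence discussed next, can be approximated numerically. A minimal sketch, using an assumed triangular NO set whose corners at ±0.3 are inferred from the 0.5-cut quoted above:

```python
def alpha_cut(membership, alpha, lo=-1.0, hi=1.0, steps=4000):
    """Approximate the alpha-cut on [lo, hi] by scanning a grid; returns the
    interval of confidence [f1(alpha), f2(alpha)], or None if the cut is empty."""
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    members = [x for x in xs if membership(x) >= alpha]
    return (min(members), max(members)) if members else None

def mu_NO(x):
    # Triangular NO set peaking at 0, reaching 0 at +/-0.3 (corners assumed)
    return max(0.0, 1.0 - abs(x) / 0.3)

print(alpha_cut(mu_NO, 0.5))  # approximately (-0.15, 0.15)
```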
We can rewrite the α-cut of a fuzzy set F as an interval of confidence (IC),

F_α = [f1(α), f2(α)],

which is a monotonically decreasing function of α; that is,

α′ > α implies F_α′ ⊆ F_α

for every α, α′ ∈ [0, 1]. Note that the closer the value of α is to 1, the more the element u belongs to the fuzzy set. On the other hand, the closer the value of α is to 0, the more uncertain we are of the set membership.
For a given α level, if a neural network output value falls into the IC of a fuzzy set, we can classify the
NN output into this fuzzy set; that is, the process status can be identified. The ICs for the fuzzy decision