Approaches for efficient tool condition monitoring based on support vector machine


Appendix A: TCM Graphs of Feature Selection


Figure A-1 AE signals, tool wear and identification results of feature set Y1 and Y3 (workpiece ASSAB705, insert SNMN120408 of material A30,


Figure A-4 AE signals, tool wear and identification results of feature set Y1 and Y3 (workpiece ASSAB705, insert SNMN120408 of material A30,


Figure A-5 AE signals, tool wear and identification results of feature set Y1 and Y3 (workpiece ASSAB705, insert SNMN120408 of material A30, v=170 m/min, f=0.2 mm/rev, d=1 mm)


Figure A-7 AE signals, tool wear and identification results of feature set Y1 and Y3 (workpiece ASSAB760, insert SNMN120408 of material A30,


Figure A-10 Force signals, tool wear and identification results of feature set Z1 and Z2 (workpiece Ti-6Al-4V, insert: SNMG120408 of material AC3000


Appendix B: TCM Graphs with Manufacturing Loss Consideration

Figure B-1 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMN120408 of material A30,


Figure B-4 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB760, insert SNMN120408 of material A30,

Figure B-5 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMN120408 of material A30,


Figure B-7 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMN120408 of material A30, v=200 m/min, f=0.2 mm/rev, d=1 mm)

Figure B-8 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000, v=200 m/min, f=0.3 mm/rev, d=1 mm)


Figure B-9 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000, v=200 m/min, f=0.3 mm/rev, d=1 mm)

Figure B-10 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000, v=170 m/min, f=0.3 mm/rev, d=1 mm)


Figure B-11 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000,


Figure B-13 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000, v=220 m/min, f=0.3 mm/rev, d=1 mm)

Figure B-14 AE, tool wear and tool state prediction from the standard and revised SVM (workpiece ASSAB705, insert SNMG120408 of material AC3000, v=150 m/min, f=0.4 mm/rev, d=1 mm)


Appendix C: TCM Graphs in Multiclassification

Figure C-1 Tool state prediction (workpiece ASSAB760, insert SNMN120408 of material A30, v=150 m/min, f=0.4 mm/rev, d=1 mm): (a) AE signals, (b) tool wear and tool state prediction from the standard SVM, (c) prediction result from the revised SVM

Figure C-2 Tool state prediction (workpiece ASSAB705, insert SNMN120408 of material A30, v=150 m/min, f=0.4 mm/rev, d=1 mm): (a) AE signals, (b) tool wear and tool state prediction from the standard SVM, (c) prediction result from the revised SVM

Panel annotations: W1=0.156, W2=0.530; W1=0.170, W2=0.392; W1=0.204, W2=0.540; W1=0.188, W2=0.353


Appendix D: TCM Graphs in Titanium Machining

Figure D-1 Cutting force, tool wear and tool state prediction from the standard and revised SVM (v=80 m/min, f=0.2 mm/rev, d=0.5mm)


Figure D-4 Cutting force, tool wear and tool state prediction from the standard and revised SVM (v=80 m/min, f=0.1 mm/rev, d=1mm)

Figure D-5 Cutting force, tool wear and tool state prediction from the standard and revised SVM (v=80 m/min, f=0.2 mm/rev, d=0.5mm)


Figure D-7 Cutting force, tool wear and tool state prediction from the standard and revised SVM (v=80 m/min, f=0.2 mm/rev, d=0.75mm)


Appendix E: SVM Theory in Classification Task


E.1 Basic theory of SVM

In the past few years there has been increasing development of the SVM, an optimization-based method for predicting the output of unseen data. In an SVM, the support vectors are the elements of the data set that are specifically chosen for the classification task and are considered important in separating the two classes from each other, while the other input vectors are ignored because of their insignificance in constructing the optimal hyperplane. Through the structural risk minimization principle, the SVM gives good generalization performance on practical problems.

The SVM was initially developed by Vapnik (1995) for the classification problem with separable data. Later it was improved to handle nonseparable data and was also adapted to solve the regression problem. As a supervised method, the SVM can take advantage of prior knowledge of tool wear and construct a hyperplane as the decision surface so that the margin of separation between samples of different tool states is maximized.

To explain the basic idea behind the SVM, we start with the simplest case: a linear machine trained on separable data.

E.1.1 Linear SVM

In the classification problem, a hyperplane is a linear function that is capable of separating the training data without error. The minimal distance from the hyperplane to the closest training samples defines the margin, and the optimal hyperplane is the one that maximizes this margin.


Let us consider a binary classification task with $l$ training samples $\mathbf{x}_i \in \mathbb{R}^d$ $(i = 1, \ldots, l)$ having corresponding class labels $y_i = \pm 1$. Suppose that the training data can be separated by a hyperplane decision function

$$ g(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) - b \qquad (E.1) $$

with appropriate coefficients $\mathbf{w}$ and $b$, where $\mathbf{x}$ is an input vector.

The problem of finding an optimal hyperplane is to find the $\mathbf{w}$ and $b$ that maximize the margin. Using the method of Lagrange multipliers, the resulting quadratic optimization problem with linear constraints may be formally stated as follows.

Given the training samples $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, l$, find the parameters $\alpha_i$ that maximize the objective function

$$ W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \qquad (E.2) $$

subject to

$$ \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad (E.3) $$

$$ \alpha_i \ge 0, \quad i = 1, \ldots, l. \qquad (E.4) $$

This leads to a hyperplane whose weight vector takes the form

$$ \mathbf{w} = \sum_{i=1}^{l} \alpha_i^{0} y_i \mathbf{x}_i \qquad (E.6) $$

where the $\alpha_i^{0}$ are the Lagrange multipliers at the solution of the optimization problem (E.2)-(E.4), and the parameter $b_0$ at the solution is computed by taking advantage of the conditions on the support vectors.
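As an illustration of (E.2)-(E.6), the following sketch (not from the thesis; the toy data and parameter values are assumptions) trains a linear SVM and rebuilds the weight vector of (E.6) from the Lagrange multipliers that scikit-learn exposes through dual_coef_.

```python
# A minimal sketch (not from the thesis): train a linear SVM on toy separable
# data and rebuild the weight vector of Eq. (E.6) from the Lagrange multipliers.
# The data set and the parameter values are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=0)
y = np.where(y == 0, -1, 1)            # class labels y_i = +/-1, as in the text

clf = SVC(kernel="linear", C=1e6)      # a very large C approximates the separable (hard-margin) case
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only; all other
# training vectors have alpha_i = 0 and do not contribute to w (Eq. E.6).
w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
print("w rebuilt from Eq. (E.6):", w_from_dual)
print("w reported by scikit-learn:", clf.coef_[0])   # should agree closely
print("number of support vectors:", clf.support_vectors_.shape[0])
```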


E.1.2 Nonlinear SVM

As the computational capability of linear functions is somewhat limited, extending the linear SVM to the nonlinear case is more practical for solving complex estimation problems.

The idea behind the nonlinear SVM is to map the original input space $X$ into a high-dimensional feature space $H$ by a function $\varphi(\mathbf{x})$ and then to construct a linear function in the high-dimensional feature space, which corresponds to a nonlinear function in the original input space.

Both the decision function $g(\mathbf{x})$ and the optimization problem keep the same form, except that the input vector $\mathbf{x}$ is replaced by $\varphi(\mathbf{x})$; in other words, the inner product $(\mathbf{x}_i \cdot \mathbf{x}_j)$ is replaced by $\varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$.

However, as the dimension of the feature space increases, so does the computational cost. How can the linear function defined in the high-dimensional feature space be computed efficiently? This problem is solved by using a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$, whose evaluation cost is unrelated to the dimension of $\varphi(\mathbf{x})$. Thus, the use of kernel functions makes it possible to map the data implicitly into a high-dimensional feature space.

Any function satisfying Mercer's condition (Mercer, 1909) can be used as the kernel function. A commonly used kernel function is the Gaussian kernel.
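The kernel substitution can be made concrete with a short sketch. This is not from the thesis; the Gaussian kernel width, the toy data and the use of scikit-learn's precomputed-kernel interface are illustrative assumptions. It builds the Gram matrix, checks that it is positive semidefinite (Mercer's condition), and trains an SVM directly on that matrix, so the feature map $\varphi(\mathbf{x})$ is never formed explicitly.

```python
# A minimal sketch (not from the thesis): the kernel stands in for the inner
# product phi(x_i).phi(x_j), so the SVM only ever needs the Gram matrix.
# Kernel width, toy data and the precomputed-kernel interface are assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Gram matrix of the Gaussian kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
K = rbf_kernel(X, X, gamma=1.0)
print("smallest eigenvalue of K:", np.linalg.eigvalsh(K).min())  # >= 0 up to round-off (Mercer)

# Training directly on the Gram matrix makes the kernel substitution explicit:
# phi(x) is never computed, only K(x_i, x_j) is used.
clf = SVC(kernel="precomputed", C=10.0).fit(K, y)
print("training accuracy:", clf.score(K, y))
```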


E.1.3 Nonlinear SVM with soft decision boundary

For training data that cannot be separated by a hyperplane without error, it is desirable to separate the data with a minimal number of errors, as in Figure E.1. Positive slack variables $\xi_i$ are introduced to quantify the nonseparable data in the defining condition of the hyperplane (Cortes, 1995), and the constraints become

$$ y_i\,[(\mathbf{w} \cdot \mathbf{x}_i) - b] \ge 1 - \xi_i, \quad i = 1, \ldots, l. \qquad (E.8) $$

For a training sample $\mathbf{x}_i$, the slack variable $\xi_i$ is the deviation from the margin border corresponding to the class $y_i$.

With the method of Lagrange multipliers, finding the optimal hyperplane for the linearly nonseparable case is a quadratic optimization problem with linear constraints, as formally stated next:

Figure E.1 Soft Margin Hyperplane

Given the training data $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, l$, determine the $\mathbf{w}$ and $b$ that minimize the objective function

$$ \Phi(\mathbf{w}, \xi) = \frac{1}{2} (\mathbf{w} \cdot \mathbf{w}) + C \sum_{i=1}^{l} \xi_i \qquad (E.9) $$

subject to

$$ y_i\,(\mathbf{w} \cdot \varphi(\mathbf{x}_i) - b) \ge 1 - \xi_i, \qquad (E.10) $$

$$ \xi_i \ge 0, \quad i = 1, \ldots, l, \qquad (E.11) $$


where $C$ is a user-specified parameter; a larger $C$ assigns a higher penalty to training errors. The weight vector can be expressed as

$$ \mathbf{w} = \sum_{i=1}^{l} \alpha_i y_i \varphi(\mathbf{x}_i). \qquad (E.12) $$
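The role of $C$ in the soft-margin problem (E.9)-(E.11) can be seen in a small sketch, not from the thesis; the overlapping toy data, the kernel and the $C$ values are illustrative assumptions.

```python
# A minimal sketch (not from the thesis): the effect of the penalty parameter C
# of the soft-margin problem (E.9)-(E.11). Data, kernel and C values are assumptions.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=1)  # overlapping classes

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C, gamma=0.1).fit(X, y)
    train_error = 1.0 - clf.score(X, y)
    # a larger C penalises the slack variables more heavily, so the fit tolerates
    # fewer misclassified training points, usually at the price of a narrower margin
    print(f"C={C:>6}: training error={train_error:.3f}, "
          f"support vectors={clf.support_vectors_.shape[0]}")
```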

E.2 Network structure and training method

Figure E.2 shows the network structure of the SVM, which includes an input layer, a hidden layer and an output layer (Haykin, 1999).

Figure E.2 SVM Configuration

A widely used training algorithm for the SVM is sequential minimal optimization (SMO), which offers fast learning even on large problems because it optimizes and alters only two Lagrange multipliers at every step, so that each subproblem can be solved easily and quickly. The improved SMO learning algorithm is introduced as follows.


For each training sample, define

$$ F_i = \sum_{j=1}^{l} \alpha_j y_j K(\mathbf{x}_i, \mathbf{x}_j) - y_i, \qquad (E.13) $$

$$ b_{up} = \min\{ F_i : i \in I_0 \cup I_1 \cup I_2 \}, \qquad b_{low} = \max\{ F_i : i \in I_0 \cup I_3 \cup I_4 \}. \qquad (E.14) $$

Then the optimality conditions hold at some $\alpha$ if and only if

$$ b_{low} \le b_{up} + 2\tau, \qquad (E.15) $$

where $\tau$ is a positive tolerance parameter.

Correspondingly, a violation occurs at $\alpha$ if one of the following sets of conditions holds:

$$ i \in I_0 \cup I_3 \cup I_4, \quad j \in I_0 \cup I_1 \cup I_2 \quad \text{and} \quad F_i > F_j + 2\tau, \qquad (E.16) $$

$$ i \in I_0 \cup I_1 \cup I_2, \quad j \in I_0 \cup I_3 \cup I_4 \quad \text{and} \quad F_j > F_i + 2\tau. \qquad (E.17) $$
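The stopping test built from (E.13)-(E.15) can be sketched in a few lines. This is not from the thesis, and the definitions of the index sets $I_0, \ldots, I_4$ are not shown in this excerpt; the sketch assumes the usual definitions from Keerthi et al.'s improved SMO (non-bound points in $I_0$, bound points split by label and by whether $\alpha_i = 0$ or $\alpha_i = C$).

```python
# A minimal sketch (not from the thesis) of the stopping test (E.13)-(E.15).
# The index sets I_0..I_4 are an assumption taken from Keerthi et al.'s improved
# SMO: I_0 non-bound points; I_1, I_4 points at alpha=0; I_3, I_2 points at
# alpha=C, split by label.
import numpy as np

def is_optimal(alpha, y, K, C, tau=1e-3):
    """Return True if no index pair violates the KKT conditions by more than 2*tau."""
    F = K @ (alpha * y) - y                      # F_i = sum_j alpha_j y_j K(x_i, x_j) - y_i  (E.13)
    I0 = (alpha > 0) & (alpha < C)               # non-bound examples
    I_up = I0 | ((y == 1) & (alpha == 0)) | ((y == -1) & (alpha == C))    # I0 u I1 u I2
    I_low = I0 | ((y == 1) & (alpha == C)) | ((y == -1) & (alpha == 0))   # I0 u I3 u I4
    # with both labels present neither set is empty, e.g. at the start (alpha = 0)
    b_up = F[I_up].min()                         # (E.14)
    b_low = F[I_low].max()
    return b_low <= b_up + 2.0 * tau             # optimality condition (E.15)
```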

If the target $y_1$ does not equal the target $y_2$, the following bounds apply to $\alpha_2$:

$$ L = \max(0,\ \alpha_2 - \alpha_1), \qquad H = \min(C,\ C + \alpha_2 - \alpha_1). \qquad (E.18) $$

If the target $y_1$ equals the target $y_2$, the following bounds apply to $\alpha_2$:

$$ L = \max(0,\ \alpha_2 + \alpha_1 - C), \qquad H = \min(C,\ \alpha_2 + \alpha_1). \qquad (E.19) $$

The second derivative of the objective function along the diagonal line can be expressed as

$$ \eta = K(\mathbf{x}_1, \mathbf{x}_1) + K(\mathbf{x}_2, \mathbf{x}_2) - 2K(\mathbf{x}_1, \mathbf{x}_2). \qquad (E.20) $$


If $\eta$ is positive, the new value $\alpha_2^{new}$ obtained along the constraint line is clipped to the feasible segment $[L, H]$:

$$ \alpha_2^{new,clipped} = \begin{cases} H, & \alpha_2^{new} \ge H \\ \alpha_2^{new}, & L < \alpha_2^{new} < H \\ L, & \alpha_2^{new} \le L. \end{cases} \qquad (E.21) $$

Otherwise, SMO moves the Lagrange multiplier to the end point ($L$ or $H$) that has the lowest value of the objective function.

If $|\alpha_2^{new,clipped} - \alpha_2| \ge \mathrm{eps} \cdot (\alpha_2^{new,clipped} + \alpha_2 + \mathrm{eps})$, then $\alpha_1$ is updated as

$$ \alpha_1^{new} = \alpha_1 + y_1 y_2\,(\alpha_2 - \alpha_2^{new,clipped}), \qquad (E.22) $$

where the constant eps is a tolerance parameter. Otherwise, another $\alpha_2$ is selected and the whole process is repeated. Note that after a successful step using a pair of indices $(i_2, i_1)$, let $\tilde{I} = I_0 \cup \{i_1, i_2\}$; then $(i_{low}, b_{low})$ and $(i_{up}, b_{up})$ are computed over $\tilde{I}$ using the updated values

$$ F_1^{*} = F_1 + y_1(\alpha_1^{*} - \alpha_1) K(\mathbf{x}_1, \mathbf{x}_1) + y_2(\alpha_2^{*} - \alpha_2) K(\mathbf{x}_1, \mathbf{x}_2), $$

$$ F_2^{*} = F_2 + y_1(\alpha_1^{*} - \alpha_1) K(\mathbf{x}_1, \mathbf{x}_2) + y_2(\alpha_2^{*} - \alpha_2) K(\mathbf{x}_2, \mathbf{x}_2). $$

This algorithm first loops over the entire training set once, then makes repeated passes over the non-bound examples ($0 < \alpha_i < C$) until all of those examples obey the optimality condition (E.15). The algorithm then alternates between a single pass over the entire training set and multiple passes over the non-bound subset until the number of $\alpha_i$ that need to be changed in the entire training set is zero. The convergence of this algorithm has been proved by Keerthi and Gilbert (2002).
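A single pair update, Eqs. (E.18)-(E.22), can be sketched as one function. This is not from the thesis: the unconstrained step along the constraint line is the textbook Platt/Keerthi form and is an assumption here, since that portion of the derivation is not legible in this excerpt, and the $\eta \le 0$ endpoint case mentioned above is simply skipped.

```python
# A minimal sketch (not from the thesis) of one SMO pair update, Eqs. (E.18)-(E.22).
# The unconstrained step alpha_2 + y_2*(F_1 - F_2)/eta follows the textbook
# Platt/Keerthi form and is an assumption; the eta <= 0 endpoint case is skipped.
def smo_pair_step(i1, i2, alpha, y, F, K, C, eps=1e-3):
    """Try to jointly optimise (alpha[i1], alpha[i2]); return True if a step was taken."""
    a1, a2 = alpha[i1], alpha[i2]
    if y[i1] != y[i2]:                                   # bounds on alpha_2, Eq. (E.18)
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:                                                # Eq. (E.19)
        L, H = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    if L >= H:
        return False

    eta = K[i1, i1] + K[i2, i2] - 2.0 * K[i1, i2]        # Eq. (E.20)
    if eta <= 0:
        return False      # endpoint evaluation of the objective is omitted in this sketch

    a2_new = a2 + y[i2] * (F[i1] - F[i2]) / eta          # unconstrained optimum (assumed form)
    a2_new = min(H, max(L, a2_new))                      # clipping to [L, H], Eq. (E.21)

    if abs(a2_new - a2) < eps * (a2_new + a2 + eps):     # progress test from the text
        return False

    alpha[i1] = a1 + y[i1] * y[i2] * (a2 - a2_new)       # Eq. (E.22)
    alpha[i2] = a2_new
    # the cached F_i values and (b_low, i_low), (b_up, i_up) would be refreshed
    # here over I~ = I_0 u {i1, i2} before the next pair is chosen
    return True
```

In a full implementation this step would be called inside the alternating passes described above, with the cached $F_i$ values and $(b_{low}, b_{up})$ refreshed after every successful step.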


A network that is trained to fit the training data too closely captures not only the useful information contained in the data but also unwanted noise; this is known as over-fitting. Such a network performs well on the training data but generalizes poorly to similar input-output patterns not used in training. In order to avoid over-fitting, generalization must be evaluated during the learning process.

Generalization performance measures how well a network performs on unseen data after the training stage has been completed. The problem of generalization is analogous to curve fitting with regularization: a good curve fit must represent the underlying trend in the data rather than provide an arbitrarily close fit to the data points without regard to complexity. Generalization performance is evaluated using the classification error.

Three factors mainly influence the ability of a network to generalize: the size and efficiency of the training set, the architecture of the network, and the physical complexity of the problem. Of these three factors, the last is the only one over which we have no control. The other two, namely whether the training samples contain sufficient information to enable the network to generalize correctly and how to optimize the structure and architecture of the network, have been investigated in this practical application. In addition to the classification error, two further statistical metrics, the training time and the number of support vectors, are therefore used to describe generalization performance.

The classification error, expressed as a percentage, measures the deviation between the actual and predicted outputs, so a smaller value indicates a better prediction. The classification error can also be used to evaluate the quality of the training set and the network performance.


The number of support vectors (SVs) reflects the complexity of the trained network; input vectors that are not support vectors are ignored because of their insignificance in constructing the classification hyperplane. For a similar classification error, a smaller number of SVs is preferred. Consider an example: if a classification task can be handled by two network structures built on SV sets A and B with similar classification errors, and the size of set A is smaller than that of set B, the trained network based on set A is considered more effective in describing this classification task than the one based on set B. Since the decision time is closely related to the number of SVs, a shorter decision time is also obtained with set A.

Regarding training time, consider another example. Under the same SVM network structure, two classification models with similar classification errors are developed from training data sets C and D drawn from the same data population. If the training time for data set C is smaller than that for D, the classification network based on C is obtained more quickly, and training data set C is therefore considered to offer better learning performance.
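The three quantities used above can be collected in a few lines. This sketch is not from the thesis; the synthetic data set, the kernel and the hyperparameter values are illustrative assumptions.

```python
# A minimal sketch (not from the thesis): collecting classification error, number
# of support vectors and training time for one trained SVM. The synthetic data
# and hyperparameters are illustrative assumptions.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

t0 = time.perf_counter()
clf = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X_tr, y_tr)
train_time = time.perf_counter() - t0

error_pct = 100.0 * (1.0 - clf.score(X_te, y_te))    # classification error on unseen data
print(f"classification error : {error_pct:.1f} %")
print(f"support vectors      : {clf.support_vectors_.shape[0]}")
print(f"training time        : {train_time:.3f} s")
```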

E.4 Selecting training data and tuning parameters

E.4.1 Training data selection

When NNs and related methods are used in classification, the independent parameters of the network are specified in terms of the training data. Hence, the network performance depends heavily on the quality of the training data, assuming an optimal topology for the network is known. In the past, training data were selected arbitrarily. However, Reeves (1995) found that different training data sets gave substantially different
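The sensitivity to the choice of training data can be illustrated with a short sketch, not from the thesis; the data population, subset size and SVM parameters are assumptions. The same SVM trained on different arbitrarily drawn subsets of one population can give noticeably different classification errors on held-out data, in line with the observation attributed to Reeves (1995).

```python
# A minimal sketch (not from the thesis): arbitrarily chosen training subsets of
# one data population can give noticeably different classification errors.
# Data population, subset size and parameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_test, y_test = X[700:], y[700:]                    # held-out evaluation data
rng = np.random.default_rng(0)

for trial in range(3):
    idx = rng.choice(700, size=100, replace=False)   # a different arbitrary training subset
    clf = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X[idx], y[idx])
    err = 100.0 * (1.0 - clf.score(X_test, y_test))
    print(f"training subset {trial}: classification error = {err:.1f} %")
```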
