or other nonideal sources with much less desirable statistical properties. But even poorly designed (or nondesigned) experiments usually contain recoverable information. On rarer occasions, we may not be able to draw firm conclusions, but even this is preferable to concluding falsehoods unawares.
We begin our analysis with plant data. With the advent of the distributed control systems (DCSs), plant data are ubiquitous. However, they almost certainly suffer from maladies that lead to correlated rather than independent errors. Also, bias due to an improper experimental design or model can lead to nonrandom errors. In such cases, a mechanical application of ANOVA and statistical tests will mislead; F ratios will be incorrect; coefficients will be biased. Since furnaces behave as integrators, we look briefly at some features of moving average processes and lag plots for serial correlation, as well as other residuals plots. The chapter shows how to orthogonalize certain kinds of data sets using source and target matrices and, more importantly, eigenvalues and eigenvectors. Additionally, we discuss canonical forms for interpreting multidimensional data and overview a variety of helpful statistics to flag troubles. Such statistics include the coefficient of determination (r2), the adjusted coefficient of determination (rA2), the prediction sum of squares (PRESS) statistic and a derivative, rP2, and variance inflation factors (VIFs) for multicollinear data. We also introduce the hat matrix for detecting hidden extrapolation.

In other cases, the phenomena are so complex or theory so lacking that we simply cannot formulate a credible theoretical or even semiempirical model. In such a case, it is preferable to produce some kind of model. For this purpose, we shall use purely empirical models, and we show how to derive them beginning with a Taylor series approximation to the true but unknown function.
This chapter also examines categorical factors and shows how to analyze designs with restricted randomization such as nested and split-plot designs. This requires rules for deriving expected mean squares, and we provide them. On occasion, the reader may need to fit parameters for categorical responses, and we touch on this subject as well.
The last part of the chapter concerns mixture designs for fuel blends and how to simulate complex fuels with many fewer components. This requires a brief overview of fuel chemistry, which we present. We conclude by showing how to combine mixture and factorial designs and fractionate them.
Plant data typically exhibit serial correlation, often strongly. Serial correlation indicates errors that correlate with run order rather than the random errors we subsume in our statistical tests. Consider a NOx analyzer attached to a municipal solid waste (MSW) boiler, for example. Suppose it takes 45 minutes for the MSW to go from trash to ash, after which the ash leaves the boiler (Figure 4.1). Then the natural burning cycle of the unit is roughly 45 minutes or so. If we pull an independent NOx sample every 4 hours, it is unlikely that there will be any correlation among the data. Except in the case of an obvious malfunction, the history of the boiler 4 hours earlier will have no measurable effect on the latest sample. However, let us investigate what will happen by merely increasing the sampling frequency.
4.1.1 Problem 1: Events Too Close in Time
DCS units provide a steady stream of continual (and correlated) information. Suppose we analyze NOx with a snapshot every hour. Will one reading be correlated with the next? How about every minute? What about every second? Surely, if the previous second's analysis shows high NOx, we would expect the subsequent second to be high as well. In other words, data that are very close in time exhibit positive serial correlation. Negative serial correlation is possible, but rarer in plant environments. However, it can occur in the plant when one effect inhibits another. Nor is this the only cause of serial correlation.
4.1.2 Problem 2: Lurking Factors
Lurking factors are an important cause of serial correlation. For example, O2 concentration affects both NOx and CO emissions. If we were so naïve as to neglect to measure the O2 level, we could easily induce a serial correlation. For example, air temperature correlates inversely to airflow, and the former relates to a diurnal cycle. Therefore, we can also expect airflow with fixed damper positions, e.g., most refinery burners, to also show a diurnal cycle. Every effect must have a cause. If we account for all the major sources of fixed variation, then the multiple minor and unknown sources should distribute normally according to the central limit theorem and collect in our error term. Therefore, it behooves us to find every major cause for our response because major fixed effects in the errors can result in correlated rather than normally distributed errors.
4.1.3 Problem 3: Moving Average Processes
FIGURE 4.1
A municipal solid waste boiler. It takes roughly 45 minutes for the trash-to-ash cycle. This particular unit is equipped with ammonia injection to reduce NOx. (From Baukal, C.E., Jr., Ed., The John Zink Combustion Handbook, CRC Press, Boca Raton, FL, 2001.)

If we consider the boiler furnace as an integrator, then flue gas emissions and components comprise a moving average process — and moving averages are highly and positively correlated. To see this, consider a random distribution — a plot of xk against the next data point in time, xk+1 (Figure 4.2a).
The first plot shows 100 nearest neighbors from a uniform random distribution plotted one against the other. The data were generated with the Excel™ function RAND()-0.5, representing a uniform distribution with zero mean between –0.5 and 0.5. The nearest-neighbor plot shows no correlation to speak of (r2 = 0.009), the mean is essentially zero (x̄ = 0.04), and the standard deviation is s = 0.28. These are very close to the expected values for these statistics, and it is not so surprising that random data show no trend when plotted against nearest neighbors.

But Figure 4.2b tells a different story. To create the second plot, we formed a moving average using the 10 nearest neighbors:

ξk = (1/10) Σj=k..k+9 xj

where k indexes each point sequentially. Note that the correlation of ξk with ξk+1 in Figure 4.2b has an r2 of 84.0% despite being drawn from an originally uniform random population with zero mean. Also note that the standard deviation of the process has fallen by a factor of three (from 0.28 to 0.094). The deflation of the standard deviation by a factor of three is not a coincidence, for the denominator in the calculation of standard deviation is √(n – 1), or √(10 – 1) = 3. However, the mean values for both data sets are virtually identical at ~0.0.

FIGURE 4.2
A moving average with random data. Figure 4.2a shows data from 100 points generated by a uniform random number generator, –0.5 < x < 0.5. The graph plots each data point against its nearest neighbor (xk+1 vs. xk). The correlation is, as expected, nearly zero (r2 = 0.009). Figure 4.2b shows the same data as 10-point moving averages. Plotting the moving average data in the same fashion gives noticeably less dispersion (s = 0.09 vs. 0.28) and high correlation, despite the fact that the moving averages comprise uniform random data. In the same way, integrating processes such as combustion furnaces can have emissions with serially correlated errors. (a) Nearest Neighbor Plot, Uniform Random Distribution. (b) Nearest Neighbor Plot, Moving Average.

Since the mean values are unaffected, we may perform regressions and generate accurate values for the coefficients. However, as the moving average
process deflates s, our F test will errantly lead us to score insignificant effects as significant ones. That is, failure to account for serial correlation in the data set before analysis will result in inflated F tests. An analysis showing many factors to be statistically significant is a red flag for the deflation of variance from whatever cause.
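The effect is easy to reproduce. The following sketch is my own illustration, not from the text; it assumes NumPy and only loosely mirrors the Figure 4.2 experiment. It draws 100 uniform random values, forms a 10-point moving average, and compares the nearest-neighbor correlation and standard deviation of the raw and averaged series.

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed for repeatability
x = rng.uniform(-0.5, 0.5, size=100)     # analog of Excel's RAND() - 0.5

# 10-point moving average: xi_k = mean(x_k ... x_k+9)
xi = np.convolve(x, np.ones(10) / 10, mode="valid")

def lag1_r2(series):
    """r2 between each point and its nearest neighbor in time."""
    r = np.corrcoef(series[:-1], series[1:])[0, 1]
    return r ** 2

print("raw data:       s = %.3f, lag-1 r2 = %.3f" % (x.std(ddof=1), lag1_r2(x)))
print("moving average: s = %.3f, lag-1 r2 = %.3f" % (xi.std(ddof=1), lag1_r2(xi)))
# Typical output: the moving average shows s smaller by roughly 3x and a
# lag-1 r2 near 0.8-0.9, even though the underlying data are purely random.
```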
4.1.4 Some Diagnostics and Remedies
Here are a few things we can do to warn of serial correlation and remedy it:
1. Always check for serial correlation as revealed by an xk vs. xk+1 plot and time-ordered residuals (see the sketch after this list).
2. Make sure that the data are sufficiently separate in time and each run condition sufficiently long to ensure that the samples are independent.
3. Carefully consider the process, not just the data. Since the serially correlated data have both fixed and random components, the problem becomes assessing which are which. One could make an a priori estimate for a moving average process using a well-stirred model of the furnace per the transient mass balance for the boiler in Chapter 2. Using such results, we could adjust the sampling period to be sufficiently large.
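As a rough sketch of diagnostic 1 (an illustration with hypothetical residual values, not part of the original text), one can score the lag-1 correlation of run-ordered residuals; values far from zero warn of serial correlation before any F test is trusted.

```python
import numpy as np

def serial_correlation_check(residuals):
    """Return the lag-1 correlation of run-ordered residuals.

    A value near zero is consistent with independent errors; a strongly
    positive value is the signature of serially correlated (e.g., moving
    average) behavior.
    """
    e = np.asarray(residuals, dtype=float)
    return np.corrcoef(e[:-1], e[1:])[0, 1]

# Example: residuals listed in run order (hypothetical values)
e = [0.4, 0.3, 0.35, 0.1, -0.2, -0.25, -0.3, -0.1, 0.05, 0.2]
print("lag-1 correlation of residuals:", round(serial_correlation_check(e), 2))
```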
4.1.5 Historical Data and Serial Correlation
For historical data, we do not have the privilege of changing how the data were collected. Therefore, we must do our best to note serial correlation and deal with it after the fact. Once we recognize serial correlation, the problem becomes recovering independent errors from correlated ones and using only the former in our F tests. As we have noted, most serial correlation will evaporate if we can identify lurking factors or the actual cause for the correlation. We then put that cause into a fixed effect in the model.
If there are cyclical trends, an analysis of batch cycles within the plant may lead to the discovery of a lurking factor. Failing this, one may be able to use time series analysis to extract the actual random error term from the correlated one.1,2 This is not so easy. Such models fall into some subset of an autoregressive integrated moving average (ARIMA) model, with the moving average (MA) model being the most likely. Time series analysis is a dedicated discipline in its own right. Often one will have to do supplemental experiments to arrive at reasonable estimates and models.
The main subject of this text is semiempirical models, i.e., theoretically derived models with some adjustable parameters. These are always preferable to purely empirical models for a variety of reasons, including a greater range of prediction, a closer relation to the underlying physics, and a requirement for the modeler to think about the system being modeled. But in some cases, we know so little about the system that we are at a loss to know how to begin. In such cases, we shall use a purely empirical model.
For the time being, let us presume that we have no preferred form for the model. That is, we have sufficient theoretical knowledge to suspect certain factors, but not their exact relationships to the response. For example, suppose we know that oxygen (ξ1), air preheat temperature (ξ2), and furnace temperature (ξ3) affect NOx. We may write the following implicit relation:

y = φ(ξ1, ξ2, ξ3)    (4.1)

where ξ represents the factors in their original metric and φ is the functional notation. Although we do not know the explicit form of the model, we can use a Taylor series to approximate the true but unknown model. Equation 4.2 represents a general Taylor series:

y = φ(a1, a2, …, ap) + Σk ∂φ/∂ξk|a (ξk – ak) + (1/2!) Σj Σk ∂²φ/∂ξj∂ξk|a (ξj – aj)(ξk – ak) + ⋯    (4.2)
Here ξ refers to the factors, subscripted to distinguish among them. We reference the Taylor series to some coordinate center in factor space (a1, a2, …, ap), where each coordinate is subscripted per its associated factor. The farther we move from the coordinate center, the more Taylor series terms we require to maintain accuracy. For Equation 4.1, the Taylor series of Equation 4.2, truncated to second order, gives the following equation:

y ≈ φ(a1, a2, a3) + Σk=1..3 ∂φ/∂ξk|a (ξk – ak) + (1/2!) Σj=1..3 Σk=1..3 ∂²φ/∂ξj∂ξk|a (ξj – aj)(ξk – ak)
Now if we code the factors to ±1 with the transforms given earlier, the Taylor series becomes the simpler Maclaurin series, which by definition is centered at zero (Equations 4.3 and 4.4).
For nonlinear models, when n < ∞, the series is no longer exact but approximate. In such a case we replace the equality (=) by an approximate equality (≈). We illustrate the use of Equations 4.3 and 4.4 with an example.
Example 4.1 The Maclaurin and Taylor Series for Two Factors

Problem statement: Use Equations 4.3 and 4.4 to derive the Maclaurin and Taylor series for y = φ(x1, x2), truncated to third order. What would the corresponding fitted equation look like?
Solution: For f = 2 and n = 3, Equation 4.3 becomes

y = φ(0, 0) + Σk=1..2 ∂φ/∂xk|0 xk + (1/2!) Σj Σk ∂²φ/∂xj∂xk|0 xj xk + (1/3!) Σi Σj Σk ∂³φ/∂xi∂xj∂xk|0 xi xj xk

Proceeding step by step, we have the following:

y = φ(0, 0) + ∂φ/∂x1|0 x1 + ∂φ/∂x2|0 x2
  + (1/2!)[∂²φ/∂x1²|0 x1² + 2 ∂²φ/∂x1∂x2|0 x1x2 + ∂²φ/∂x2²|0 x2²]
  + (1/3!)[∂³φ/∂x1³|0 x1³ + 3 ∂³φ/∂x1²∂x2|0 x1²x2 + 3 ∂³φ/∂x1∂x2²|0 x1x2² + ∂³φ/∂x2³|0 x2³]

If we were to evaluate the above equation numerically from a data set, we could fit the third-order model

ŷ = a0 + (a1x1 + a2x2) + (a11x1² + a12x1x2 + a22x2²) + (a111x1³ + a112x1²x2 + a122x1x2² + a222x2³)

Here, we have grouped the terms in parentheses by overall order. We may derive the Taylor series in the same manner, replacing xk by ξk – ak and 0 by ak.
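To connect the symbolic expansion to a regression, here is a minimal sketch (my own illustration with made-up data, not from the text) that builds the ten-column model matrix for the third-order model in x1 and x2 and fits the coefficients by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(-1, 1, n)          # coded factors
x2 = rng.uniform(-1, 1, n)
y = 1.0 + 0.5*x1 - 0.3*x2 + 0.2*x1*x2 + rng.normal(0, 0.05, n)   # hypothetical response

# Columns grouped by overall order: constant | first | second | third
X = np.column_stack([
    np.ones(n),
    x1, x2,
    x1**2, x1*x2, x2**2,
    x1**3, x1**2*x2, x1*x2**2, x2**3,
])

a, *_ = np.linalg.lstsq(X, y, rcond=None)   # a0, a1, a2, a11, a12, a22, a111, ...
print(np.round(a, 3))
```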
4.2.1 Model Bias from an Incorrect Model Specification
In the previous section, we constructed a model comprising a finite number of terms by truncating an infinite Taylor series; therefore, if higher-order derivatives exist, then they will bias the coefficients. We introduced the reader to this concept in Chapter 3 beginning with Section 3.4. Here we explore additional considerations. For example, let us suppose that Equation 4.5 gives the true model for NOx:

(4.5)

where y is the NOx, A and b are constants, and T is the furnace temperature. Further, suppose that due to our ignorance or out of convenience or whatever, we fit the following (wrong) model:

ŷ = a0 + a1x    (4.6)

So long as the series remains infinite, there is a one-to-one correspondence between the coefficients and the evaluated derivatives. However, once we truncate the model, this is no longer strictly true: higher-order derivatives
Trang 10will bias the lower-order coefficients Yet, near zero, higher-order terms will
vanish more quickly than lower-order ones So, if x is close to zero then the
model has little bias We refer to the error caused by using an incorrect
mathematical expression as model bias.
At x = 1 each term is weighted by its Maclaurin series coefficient As x
grows beyond 1, then the higher-order terms exert larger and larger ence; so mild extrapolation leads quickly to erroneous results This wouldnot be the case if the model were correct Notwithstanding, even for theincorrect empirical model, this bias may be nil so long as we are within thebounds of our original data set (coded to ±1)
influ-For x >> 1, we need to add many additional terms for the empirical model
to adequately approximate the true model As x grows larger and larger, we
need more and more empirical terms This is so, despite the fact that thetrue model comprises only two terms This is why it is much more preferable
to generate a theoretical or semiempirical form rather than a wholly ical one Nonetheless, an empirical model of second order at most (andusually less) is sufficient for interpolation In other words, empirical modelsare very good interpolators and very poor extrapolators This is true for allmodels in the sense that we may never have exactly the right model form,but it is especially so for empirical models
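The interpolation/extrapolation point is easy to see numerically. The sketch below is my own illustration, not the book's worked example: it assumes an exponential "true" model with arbitrary constants, fits the wrong first-order empirical model over the coded interval [-1, 1], and compares predictions inside and outside that interval.

```python
import numpy as np

A, b = 2.0, 1.2                      # arbitrary constants for the assumed "true" model
true = lambda x: A * np.exp(b * x)

x = np.linspace(-1, 1, 9)            # design limited to coded +/-1
y = true(x)

a1, a0 = np.polyfit(x, y, 1)         # the wrong, first-order empirical model
fit = lambda x: a0 + a1 * x

for xp in (0.0, 0.5, 1.0, 2.0, 3.0):
    err = 100 * (fit(xp) - true(xp)) / true(xp)
    print(f"x = {xp:3.1f}: true = {true(xp):7.2f}, fitted = {fit(xp):7.2f}, error = {err:6.1f}%")
# Inside the coded range the empirical model interpolates tolerably;
# beyond x = 1 the bias grows rapidly.
```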
Suppose that we could expand our model to comprise an infinite number of terms (which would require an infinite data set to fit). Then we could evaluate the coefficients for Equation 4.7, generating the following normal equations:

Σy = a0 n + a1 Σx + a2 Σx² + a3 Σx³ + ⋯
Σxy = a0 Σx + a1 Σx² + a2 Σx³ + a3 Σx⁴ + ⋯    (4.8)

Because we centered x, the sum of the odd powers is zero, but the sum of the even powers is not. Since our approximate model comprises only two terms — a0 and a1 of Equation 4.6 — the higher-order terms will bias them.
A careful examination of Equation 4.8 shows that the even terms bias a0 and the odd terms bias a1. We are actually fitting an equation something like

y ≈ (a0 + b2a2 + b4a4 + ⋯) + (a1 + c3a3 + c5a5 + ⋯)x

where bk and ck are constants accounting for the contributions of the higher-order derivatives. Again, for 0 < x < 1 the sum of the higher powers will likely be negligible, and for this reason, empirical models are excellent interpolators. Nonetheless, a good theoretical model would eliminate this bias and would require fewer terms for an adequate fit to the data.

4.2.2 Design Bias
We have seen from the previous section that an improper model specification is a problem if we extrapolate beyond the bounds of the experimental design. The proper model derived from theoretical considerations ameliorates this problem. We have also seen that a purely empirical model will do a very good job within the design boundaries even if it is wrong. However, even with the proper model, an improper experimental design may still bias the coefficients. We refer to errors introduced by a less than ideal X matrix as design bias. Conversely, proper experimental design can eliminate this bias.

Consider a classical one-factor-at-a-time design given in Table 4.1. Here, x1 is the excess oxygen concentration in the furnace, x2 is the air preheat temperature (APH) of the combustion air, and x3 is the furnace temperature, measured at the bridgewall of the furnace (BWT). Let us represent this factor space by S.
This is not a promising start, as STS contains not a single zero value; everything mutually biases everything else. Coding will zero some of the off-diagonal values. Using the coding transforms, we have
(4.11a)
(4.11b)
(We show a merely to give the coefficient references.) These coded data are better. At least a0 is unbiased, but a1 to a3 still bias one another. Figure 4.3a depicts the classical design. It forms a right-angled tetrahedron in factor space. Since it is neither scaled nor centered, the edges are not equal lengths, nor does the design center (centroid of the tetrahedron) coincide with the center of the factor space (centroid of the cubic region).

Figure 4.3b is the same design scaled to 0/1 coordinates, but not centered. Since it is not centered, the design center is not coincident with the center of the factor space. Figure 4.3c shows the design in ±1 coordinates. The design and coordinate centers are now coincident. However, the design is still not orthogonal because it is not balanced about the coordinate center. Figure 4.3d is an example of a fractional factorial design. It is centered and scaled, and since it is balanced about the origin, it also gives an orthogonal matrix. Let us represent this factor space by T.

If we could transform our design to these coordinates, we would have an orthogonal design. In fact, we can.
We have two basic remedies to make designs orthogonal: we can either change the design or morph the factor space. Changing the design means that before we begin the experiment, we think about what factors are important and how we can arrange the test matrix to be orthogonal. This generates a balanced design having an equal number of high and low values for each factor equidistant from zero in each factor direction, e.g., factorial designs. The advantage of using orthogonal designs is that one can examine independent factors with clear meaning and perform a number of statistical tests, etc. The only "disadvantage" is that it requires up-front thinking. Remember Westheimer's discovery: "a couple of months in the laboratory will save you a couple of hours at the library."
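A quick way to see design bias is to compute XTX for a candidate design: nonzero off-diagonal terms couple the coefficient estimates. The sketch below uses illustrative coded design points (not the values of Table 4.1) to contrast a one-factor-at-a-time design with a 2^(3-1) fractional factorial.

```python
import numpy as np

def cross_product(design):
    """Return X'X for a design matrix with an added intercept column."""
    X = np.column_stack([np.ones(len(design)), design])
    return X.T @ X

# One-factor-at-a-time: start at the center, move each factor alone (coded units)
ofat = np.array([[0, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=float)

# 2^(3-1) fractional factorial with x3 = x1*x2
frac = np.array([[-1, -1,  1],
                 [ 1, -1, -1],
                 [-1,  1, -1],
                 [ 1,  1,  1]], dtype=float)

print("one-factor-at-a-time X'X:\n", cross_product(ofat))   # off-diagonals nonzero
print("fractional factorial X'X:\n", cross_product(frac))   # off-diagonals all zero
```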
FIGURE 4.3
Graphical representation of various experimental designs. (a) The classical design in the original coordinates (a distorted right-angle tetrahedron); the coordinate center does not coincide with the center of the design. (b) The design coded to 0/1 coordinates. This conformally shrinks the factor directions to uniform dimension. (c) The design in ±1 coordinates. The design and coordinate center are now coincident (X). (d) A design that is orthogonal and centered in the new coordinates.
4.3.1 Source and Target Matrices: Morphing Factor Space
Suppose we want to convert a source matrix (S) that is nonorthogonal but full rank and square, such as Matrix 4.10a, into an orthogonal target matrix (T), such as Matrix 4.12a. We could postmultiply by some transformation matrix (F):

SF = T    (4.15)

Again, we could just as easily have used the original matrix, the above 0/1 coding, or the traditional ±1 coding. But as this is a classical design, one-factor-at-a-time investigations usually proceed from some origin, which is more conveniently coded as the coordinate center.
We would like to transform S in Matrix 4.16a into T of Matrix 4.12a. We will do this with a transformation matrix:
Before the transformation, we have something like y = b0 + b1s1 + b2s2 + b3s3 in s1·s2·s3 factor space. After the transformation, we have y = a0 + a1t1 + a2t2 + a3t3 in t1·t2·t3 factor space. This latter function is orthogonal in t1, t2, and t3. In other words, if y = Ta = Sb and SF = T (where F maps s1, s2, and s3 onto t1, t2, and t3), then SFa = Sb. So, on the one hand, we have gained independent coefficients. On the other hand, we are not sure what they mean. In other words, we are trading a nonorthogonal design in orthogonal s1·s2·s3 factor space for an orthogonal design in distorted t1·t2·t3 space. If the distorted space has no physical meaning, we have gained little.

We see that after the fact, it may be possible to find combinations of the original factors that represent an orthogonal design. However, this is a much weaker approach than conducting a proper design in the first place, because the factor combinations often have no real meaning.

On the other hand, sometimes a linear combination of factors does have meaning and the linear combination may actually be the penultimate factor.
For example, kinetic expressions (those determining the rate of appearance or disappearance of a species like NOx or CO) are really a function of collision frequency (Z). But it is not possible to directly observe molecular collisions and hence Z. However, Z is related to the temperature (T), pressure (P), and concentration (C) — all increase the collision frequency. Suppose, for the sake of argument, that the actual production rate of an important species, y = f(ζ), were actually a function of the log of the collision frequency, ζ = ln(Z), and that Z is given by Equation 4.20:

(4.20)

Then

where a0 = ln(b0), x1 = ln(P), …, and x3 = ln(C). So for y = φ(ζ), the most parsimonious model would actually be a linear combination of x1, x2, and x3. In such a case, orthogonal components may be useful to spot such relations in the data. However, we do not want to distort the original factors. We seek only to rotate the axes to expose these relations. Eigenvectors and eigenvalues can do this for us.
4.3.2 Eigenvalues and Eigenvectors
One may use eigenvalues and eigenvectors to decompose a matrix into orthogonal components, and they are the best alternative for that purpose because they do not distort the factor space as the source–target method may do. Eigenvalues (Λ) and eigenvectors (K) are defined for a square matrix (M) of full rank as follows:

MK = KΛ    (4.21)

where Λ = diag(λ1, λ2, …, λn) is the diagonal matrix of eigenvalues.
Theoretically, eigenvalues are solutions of an nth-order polynomial (characteristic) equation, where n is the number of rows in the starting matrix, presuming it is nonsingular and square. Matrix algebra texts give the procedure.3 However, the mechanics can become unwieldy and dedicated software is really a must for this procedure. Regrettably, Excel does not have a standard function for this, but software such as MathCAD™ does. Dedicated statistical software is the best option. The procedure can be done in a spreadsheet, but it is tedious, as we show now.
We may make use of the trace of the matrix to find the eigenvalues. The trace of a matrix is the sum of the diagonal elements. We may also define traces for higher-order square matrices:

tn = Σk mnkk (the sum of the diagonal elements of Mn)

In the above equation, we are relying on context to obviate any equivocation for n (for Mn the superscript is an authentic exponent). Thus, M2 = MM. However, for tn and mnkk, the superscript n is mere nomenclature. Once we have tn, the characteristic equation and its solutions follow:
Example 4.2 The Characteristic Equation Using the Trace Operator

Problem statement: Given Matrix 4.16b, find the characteristic equation and the eigenvalues.
Solution: Matrix 4.16b is a full-rank (nonsingular) matrix having four rows (n = 4). We solve for tn in the following manner: Let
Solving for the coefficients of the characteristic matrix according to Equation 4.24, we have (the nth coefficient of the characteristic equation is always 1) and the characteristic equation is 1 – 7λ + 12λ² – 7λ³ + λ⁴ = 0. Fortunately, this equation factors as (λ² – 5λ + 1)(λ – 1)² = 0 with the solutions

λ = (5 ± √21)/2 ≈ {4.791, 0.209} and λ = 1 (a double root).
Since these are solutions for a single variable, one may also use numerical procedures such as the goal seek algorithm in Excel to solve for them. Also, the rational roots (if they exist) will always be factors of the constant. In our case, the constant is 1, so we would try ±1, finding 1 to be a double root, as shown above. This rational roots procedure can often help to factor the equation and reduce the order of the remainder, simplifying the final solution.
Analytically, one can always find the solutions for polynomials up to fourth order using various procedures.*
Each eigenvalue has an associated eigenvector such that

(M – λI)k = 0    (4.25)

where k is an eigenvector. The eigenvectors are not unique in the sense that any scalar multiple of an eigenvector will itself be an eigenvector. To resolve this problem, we shall reduce the eigenvectors to unit magnitude, i.e., kTk = Σj kj² = 1. For real, symmetric matrices (the only kind we need to consider in this text), the eigenvectors are always orthogonal. That is, kjTkk = 0, where j and k are any two different vectors in the K matrix.
For the case at hand, Equation 4.25 reduces to
(4.27)
We illustrate the procedure for one of the eigenvalues in the next example.
Problem statement: For Matrix 4.27, we have shown that the characteristic equation is (λ² – 5λ + 1)(λ – 1)² = 0, having solutions λ = (5 ± √21)/2 and λ = 1 (a double root). Find the eigenvector associated with the eigenvalue λ = 0.2097.
* Any standard mathematical text will have solutions for up to fourth-order polynomials. See, for example, Gellert, W. et al., Eds., The VNR Concise Encyclopedia of Mathematics, American Edition, Van Nostrand Reinhold Company, New York, 1977, pp. 80–101. General equations of fifth order and higher have been proven impossible to solve by radicals, though many special equations of arbitrary order are solvable; e.g., the triquadratic equation ax⁶ + bx³ + c = 0 may be reduced to a quadratic equation with the substitution u = x³.
Solution: We can find the eigenvector numerically using a spreadsheet.

Step 1: First, we substitute a selected eigenvalue, e.g., λ = 0.2097:

Step 2: Now we arbitrarily set k3 = 1, and reduce the matrix by one column and the eigenvector and solution vector by one row, so that the system becomes soluble.

Step 3: Premultiplying by the inverse of the matrix, we obtain the reduced eigenvector. But we had arbitrarily set k3 = 1, so the full vector is an eigenvector associated with λ = 0.2097.
Step 4: Normalizing this by the root of the sum of squares, we obtain the unit eigenvector associated with λ = 0.2097, where the subscript denotes the column of the column vector in the eigenvectors' matrix K.
So long as the eigenvectors are distinct, this method will lead to the associated eigenvectors. The major advantage of this method is that spreadsheets can do all the calculations. However, if the eigenvectors are not distinct (e.g., multiple roots), we will end up with a problem — two different eigenvectors associated with two identically valued eigenvalues. We can continue without problem to obtain an eigenvector associated with the other distinct eigenvalue, λ = (5 + √21)/2 ≈ 4.791.
But we run into trouble almost immediately, solving for the eigenvectors associated with the double root, λ = {1, 1}, generating the matrix

It reduces to the following equations: 3k0 + k1 + k2 = –1 and k0 = 0. Substituting one into the other, we obtain k1 + k2 = –1, from which we may evaluate the remaining two eigenvectors.
Here a and b are undetermined coefficients. We use a and b because the remaining two eigenvectors cannot be the same; eigenvectors for real symmetric matrices are always mutually orthogonal. Note that we have not yet normalized the first two vectors in K to unit magnitude, so for now we label the eigenvector matrix as K′ rather than K. Now if the first two column vectors in K′ (let us call them k0 and k1) are mutually orthogonal, then k0Tk1 = 0, giving ab + (a + 1)(b + 1) + 1 = 0, which reduces to 2ab + a + b + 2 = 0. Arbitrarily choosing b = 1 gives a = –1. Substituting into the matrix gives the remaining eigenvectors, and normalizing the first two vectors to unit magnitude gives K.

MathCAD gives the following solution, which the reader may verify is equally correct, yielding the relations given in Equations 4.21, 4.22, and 4.25. (Multiple roots do not have unique associated eigenvectors.)
(4.28)
At any rate, once we obtain the eigenvalues and eigenvectors, we can move on to making real symmetric matrices orthogonal. Least squares solutions always generate real symmetric matrices; thus, they are amenable to this treatment.
Recall that for real symmetric matrices, eigenvectors are orthogonal in the strictest sense. And so it follows for our example that KTK = KKT = I.
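Today, dedicated software makes the spreadsheet iteration unnecessary. The following sketch (using a hypothetical symmetric matrix, since Matrix 4.16b is not reproduced here) uses NumPy to verify the properties used above: the eigenvalues sum to the trace, K is orthonormal, and KTMK is the diagonal eigenvalue matrix.

```python
import numpy as np

M = np.array([[4.0, 1.0, 0.5],       # a hypothetical real, symmetric matrix
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

lam, K = np.linalg.eigh(M)           # eigenvalues (Lambda) and eigenvectors (columns of K)

print("eigenvalues:", np.round(lam, 4))
print("sum of eigenvalues vs trace:", round(lam.sum(), 4), round(np.trace(M), 4))
print("K'K = I ?      ", np.allclose(K.T @ K, np.eye(3)))
print("K'MK diagonal ?", np.allclose(K.T @ M @ K, np.diag(lam)))
```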
4.3.3 Using Eigenvectors to Make Matrices Orthogonal
Premultiplying Equation 4.21 by KT gives
KTKΛ ≡ Λ ≡ KTMK    (4.29)
Given y = Xa, we seek another system of factors giving linear combinations of X such that y = Ub, and also where UTU = D, a diagonal matrix. Here is the procedure.
Step 3: The above equations complete the transformation: substitution of these relations into Equation 4.30 gives Ub = XKKTa = Xa because KKT = I; therefore, y = Ub represents an alternate system of factors and coefficients for y = Xa.
To see this, consider the necessary properties of UTU. Premultiplying y = Ub by UT gives UTy = UTUb. Substituting in terms of X gives UTU = (XK)TXK = KTXTXK. But M = XTX, and in light of Equation 4.29, this substitution gives UTU = KTMK = Λ. Therefore, UTU is a diagonal matrix — the eigenvalue matrix of XTX, to be exact. Collecting these equations:
u4 = 0.910 + 0.240 (x1 + x2 + x3)
Thus, the uk represent linear combinations of xk, and either system will give identical values for y: y = a0 + a1x1 + a2x2 + a3x3 = b1u1 + b2u2 + b3u3 + b4u4. The advantage of the eigenvalue procedure over the source–target matrix procedure is that:
1. We have merely rotated axes, not distorted factors.
2. Both the original and new coordinate axes are orthogonal.
3. We may apply the procedure to any nonsingular matrix, even when X is nonsquare, because M = XTX will always be square.
4. The new system, y = Ub, is orthogonal. Therefore, we estimate b without design bias.
5. If the linear combinations of factors have meaning, they may represent a more parsimonious model and help to identify an important underlying relationship.
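Here is a minimal sketch of the whole procedure with an arbitrary (nonorthogonal) design matrix X, assumed only for illustration: form M = XTX, rotate to U = XK, and confirm that UTU is diagonal and that both factor systems give identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(8, 3))           # an arbitrary, nonorthogonal design
y = rng.uniform(0, 1, size=8)                # hypothetical responses

M = X.T @ X
lam, K = np.linalg.eigh(M)                   # K is orthonormal: K K' = I

U = X @ K                                    # rotated factors (linear combinations of the x's)
print("U'U diagonal?", np.allclose(U.T @ U, np.diag(lam)))

a, *_ = np.linalg.lstsq(X, y, rcond=None)    # coefficients in the original factors
b, *_ = np.linalg.lstsq(U, y, rcond=None)    # coefficients in the rotated factors
print("same fitted values?", np.allclose(X @ a, U @ b))
print("b = K'a ?          ", np.allclose(b, K.T @ a))
```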
4.3.4 Canonical Forms
Consider a second-order surface given by the general second-order equation in summation form for f factors:

y = a0 + Σk ak xk + Σj Σk≥j ajk xj xk    (4.38)

In this formulation, the inequality in the index of the second-order terms (j ≤ k) includes the pure quadratics. Now this surface may take several possible forms, depending on the coefficients. However, by rotating and/or translating axes, we can always simplify the equation to either of two forms:

y = a0 + Σk θk uk + Σk λkk uk²    A canonical form (4.39a)

y = a0 + Σk λkk uk²    B canonical form (4.40a)
Box and Draper4 call the first the A canonical form and the second the B canonical form. The A canonical form represents a rotation of axes. The B canonical form represents both a rotation and a translation to a new design center. If we are far from the design center, the A canonical form will be more useful. If we are close to the design center, we shall prefer the B canonical form.
4.3.4.1 Derivation of A Canonical Form
We may rewrite Equation 4.38 in matrix form as

y = a0 + xTa + xTAx    (4.41)

where x is the vector of factors, a is the vector of first-order coefficients, and A is a matrix containing all the terms that are second-order overall. That is, A carries the pure quadratic coefficients akk on its diagonal and half the interaction coefficients, ajk/2, in its off-diagonal positions. Letting u = KTx, where K is the eigenvector matrix of A, rotates the axes so that all second-order coefficients vanish except the pure quadratics.
4.3.4.2 Derivation of B Canonical Form
We may also write Equation 4.41 as

y = xTBx

where xT = (1 x1 x2 … xf) and B is the symmetric matrix formed by bordering A with the constant a0 and half the first-order coefficients (a/2). Then, letting uT = xTK, where K is the eigenvector matrix of B, and Λ = KTBK, we have
y = a0 + uTΛu    B canonical form (4.40b)
Equation 4.40b corresponds to a rotation of axes and a translation to a new coordinate center. By setting first derivatives to zero, we see that Λ represents a set of coordinates in u1·u2···un space that correspond to the extremum (maximum, minimum, or saddle point, collectively referred to as the stationary point). If the stationary point is close to our design center, we will prefer the A canonical form. If not, we shall prefer the B canonical form. If we can assign a clear meaning to the linear combinations of the factors derived from the canonical analysis, we may well prefer to keep our regression in the canonical space rather than the original space.
4.3.4.3 Canonical Form and Function Shape
One may use these canonical forms to simplify second-order equations of the type given in Equation 4.41 by either rotating (A form) or rotating and translating (B form) the axes. By examining the second-order coefficients of either form, one may determine what kind of surface one is dealing with. If all λkk are positive or negative, then one is dealing with a minimum or maximum surface, respectively. If they are of differing signs, then the surface has a min-max or saddle shape. If some of the λkk are close to zero and the associated θk are positive, then the surface is a rising ridge. If the associated θk are negative, the surface is a falling ridge.

FIGURE 4.4
Various response surfaces. Values of λ and θ allow the investigator to quickly determine the shape of the response surface in any number of dimensions. When all λs are of the same sign, then the response surface is a maximum or minimum; when the signs differ, the response surface is a saddle (min-max or col) shape. When λ is close to zero, then the design is first order in that response factor.
Example 4.4 Canonical Forms
Problem statement: Consider the second-order equation

y = 1.029 + 0.326x1 – 0.085x2 + 0.282x1² – 0.031x1x2 – 0.127x2²

(a) Explicitly declare the equation according to the forms of Equation 4.39a. Use an eigenvector analysis to reduce it to the A canonical form and give the explicit equations. What can you tell about the surfaces by inspection of the coefficients? (b) Repeat the analysis using the B canonical form beginning with Equation 4.40a.
Solution: (a) From the problem statement we find
According to Equation 4.39a, we have

y = a0 + (x1 x2)(a1, a2)T + (x1 x2) [a11  a12/2; a12/2  a22] (x1, x2)T

or, explicitly,

y = 1.029 + (x1 x2)(0.326, –0.085)T + (x1 x2) [0.282  –0.0155; –0.0155  –0.127] (x1, x2)T

Making use of y = a0 + (xTK)(KTa) + (xTK)(KTAK)(KTx), or equivalently, y = a0 + uTθ + uTΛu, we find K = eigenvectors(A).
The reader may verify that this equation gives exactly the same
values as our starting equation The u vector represents a rotation
of axes that zeroes out the non-diagonal elements for Λ By ining the coefficients of Λ, we see that they are of opposite sign,
exam-indicating a saddle shape oriented along the axes of u1 and u2
The saddle is steeper in the u1 direction as indicated by a
coeffi-cient about double that of u2
(b) Likewise, from Equation 4.40a, we may write our equation as y = xTBx, with xT = (1 x1 x2) and

B = [ 1.029    0.163   –0.043
      0.163    0.282   –0.016
     –0.043   –0.016   –0.127 ]

Then we may write y = (xTK)(KTBK)(KTx), or equivalently, y = uTΛu, from which the explicit B canonical form follows.
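For the equation of Example 4.4, the eigenanalysis takes only a few lines of NumPy. This is my sketch rather than the book's tabulated solution, so rounding and eigenvector signs may differ from the hand calculation; it builds A and B from the stated coefficients and reports Λ and θ.

```python
import numpy as np

# Coefficients of y = 1.029 + 0.326 x1 - 0.085 x2 + 0.282 x1^2 - 0.031 x1 x2 - 0.127 x2^2
a0 = 1.029
a = np.array([0.326, -0.085])
A = np.array([[ 0.282,     -0.031 / 2],
              [-0.031 / 2, -0.127    ]])

lam, K = np.linalg.eigh(A)       # Lambda: quadratic coefficients along the rotated axes
theta = K.T @ a                  # first-order coefficients along the rotated axes
print("lambda:", np.round(lam, 3))   # opposite signs -> saddle (min-max) surface
print("theta: ", np.round(theta, 3))

# B canonical form: border A with a0 and a/2, then eigen-decompose
B = np.block([[np.array([[a0]]), a[None, :] / 2],
              [a[:, None] / 2,   A            ]])
lamB, KB = np.linalg.eigh(B)
print("eigenvalues of B:", np.round(lamB, 3))
```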
We may derive other statistics related to the ANOVA. Some of the more important ones are the coefficient of determination (r2), the adjusted coefficient of determination (rA2), the prediction sum of squares (PRESS) statistic, and a coefficient we shall call the coefficient of prediction (rP2). We can also define the variance inflation factor (VIF) and leverage statistics. We shall speak to each in turn.
4.4.1 The Coefficient of Determination, r2
Chapter 3 (Equation 3.43) defined an important ratio — the best known (and most misused) statistic for goodness of fit: the coefficient of determination, r2:

r2 = SSM/SST = 1 – SSR/SST

The statistic r2 gives the fraction of the total variance accounted for by the model. An r2 of 0.9 (or 90%) means that the model accounts for 90% of the total variation of the data. For the purposes of combustion modeling, we desire r2 > 0.8, according to the following scale:
• 0.9 < r2 < 1.0 — very strong correlation
• 0.8 < r2 < 0.9 — strong correlation
• 0.7 < r2 < 0.8 — good correlation
• 0.6 < r2 < 0.7 — fair correlation
• 0.5 < r2 < 0.6 — weak correlation
If r2 = 1, then the model accounts for all of the variation and fits the data perfectly. A related measure is r, the coefficient of correlation, r = √r2. Since 0 < r2 < 1, then r ≥ r2.
4.4.2 Overfit
Now r or r2 will always increase as the number of adjustable parameters in the model increases. Continuing to add adjustable model parameters eventually results in a condition known as overfit. Overfit is the unjustified addition of model parameters resulting in the fitting of random error to false factor effects. This is a statistical no-no, because random effects cannot be related to nonrandom variables. An example will help make this clear.
Suppose we have the hypothetical data given by Table 4.2. Most spreadsheets and calculators have random number (uniform distribution) generators. The Excel command RAND() simulates a uniform random distribution between 0 and 1. Therefore, the underlying model to these data is y = 0.5 + e. Suppose we fit the following models to the first four data points:

Model 0: ŷ = a0
Model 1: ŷ = a0 + a1x
Model 2: ŷ = a0 + a1x + a2x²
Model 3: ŷ = a0 + a1x + a2x² + a3x³

All of these models except model 0 are nonsense. Notwithstanding, here are the least squares solutions and the associated r2 values:
If we were to judge based on r2 only, we would prefer model 3. Clearly, we have gotten too carried away and fit random behavior as if it were determined by the factor x. Our job, using least squares, is to find the underlying model. Here the underlying model is y = 0.5 + e. But by continuing in a least squares frenzy, we have gone far beyond finding the true model and have force-fit random behavior as a function of x. Random behavior is not a function of any known factor (else it would not be random).

Now the expected value of the fifth observation is 0.5. Model 0 comes the closest to predicting the true value; model 0: 0.57. The rest of the models are way off — model 1: 0.11, model 2: –1.07, and model 3: –0.66. With each adjustable parameter added to the model, the r2 has gone up, but the predictive power is lower than for model 0.

FIGURE 4.5
On overfitting data. The data were generated by the Excel function RAND(), which gives a uniform distribution between 0 and 1. Therefore, the true model is y = 0.5 + e; i.e., model 0 is the only sensible model. Model 1 is a first-order fit to the first four data points (diamonds); likewise, model 2 is a second-order fit and model 3 a third-order fit. Data point 5 (square) is next in the random sequence. All models but model 0 are biased and give absurd results for point 5; they represent overfit — the ascription of random data to nonrandom effects. Despite this, all models have higher r2 than model 0. Therefore, r2 is not a useful statistic for revealing overfit. In this case, the model with the lowest r2 is actually the best.
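The overfit experiment is easy to repeat. The sketch below uses my own random draw, so the numbers will not match Table 4.2; it fits polynomial models of order 0 through 3 to the first four uniform random points and predicts the fifth.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.uniform(0, 1, 5)        # five draws from RAND(); the true model is y = 0.5 + e
x = np.arange(1, 6, dtype=float)

for order in range(4):          # models 0 through 3
    coef = np.polyfit(x[:4], y[:4], order)      # fit to the first four points only
    pred = np.polyval(coef, x[4])               # predict the fifth point
    resid = y[:4] - np.polyval(coef, x[:4])
    sst = np.sum((y[:4] - y[:4].mean()) ** 2)
    r2 = 1 - np.sum(resid ** 2) / sst
    print(f"model {order}: r2 = {r2:5.3f}, prediction of point 5 = {pred:6.2f} (actual {y[4]:.2f})")
# r2 climbs toward 1.0 as terms are added, while the point-5 prediction worsens.
```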
4.4.3 Parsing Data into Model and Validation Sets
If we fit random data to a factor, we overfit the model and generate nonsense. This is a great danger in judging models with only r2. One way to catch this behavior (presuming we have enough data points) is to do the following:
Trang 351 Parse the data set into two sets, a model set (comprising about twothirds of the points) and a validation set (comprising one third ofthe points).
2 Randomly assign the points to the data sets
3 Fit coefficients to the model set
4 Gauge r2 using the validation set
Because a different set validates the data, the lower limit for r2 is no longer zero and the difference between r2 coefficients for the whole and parsed data sets is an indicator for overfit. We also hope for stable model coefficients. However, other data maladies (e.g., influential and errant response values) can cause large differences in r2. In other words, this is a severe test for regressed data. Nonetheless, if the data pass this test, one can be reasonably sure that the regression is trustworthy.
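A minimal sketch of the parsing procedure, with illustrative data rather than the data of Table 4.2: fit on a randomly chosen two-thirds of the points and score r2 on the held-out third.

```python
import numpy as np

rng = np.random.default_rng(11)
x = np.linspace(-1, 1, 15)
y = 1.0 + 0.8 * x + rng.normal(0, 0.1, x.size)   # hypothetical data

idx = rng.permutation(x.size)                    # random assignment to the two sets
model_i, valid_i = idx[:10], idx[10:]            # ~2/3 model set, ~1/3 validation set

coef = np.polyfit(x[model_i], y[model_i], 1)     # fit coefficients on the model set

pred = np.polyval(coef, x[valid_i])              # gauge r2 on the validation set
ss_res = np.sum((y[valid_i] - pred) ** 2)
ss_tot = np.sum((y[valid_i] - y[valid_i].mean()) ** 2)
print("validation r2:", round(1 - ss_res / ss_tot, 3))
```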
For illustration purposes, and as we have already begun with an 80/20 split of the data, let us extend this analysis for all five possible model/validation sets. We shall use four points in the model set and one in the validation set. Table 4.3 shows the results; the statistic rk2 is the r2 statistic based on the kth point for validation.
The largest entries are bolded. From examining the table, one sees that in four out of five instances, no model is superior to model 0. Therefore, the majority report from this analysis is that model 0 is the best model. As well, we could have parsed the data into a 60/40 model/validation split. This would have generated 10 rjk2 statistics: r122, r132, r142, r152, r232, r242, r252, r342, r352, and r452, and resulted in the same conclusion. (In general, there are 2^n possible r2 statistics for all possible parsings into two groups. This means that a data set comprising only 20 values could prepare and analyze over a million such statistics. Obviously, this would be overkill.)
4.4.4 The Adjusted Coefficient of Determination, rA2
The foregoing discussion prods us to search for better goodness-of-fit statistics. Practitioners have developed several statistics to gauge more fairly the trade-off between increased fit and reduced degrees of freedom and to alert the investigator to overfit. One such statistic is the adjusted coefficient of determination, rA2:

rA2 = 1 – [SSR/(n – p)] / [SST/(n – 1)]    (4.44)

Thus, r2 makes use of the sums of squares while rA2 makes use of the mean squares. This compensates partially for reducing the degrees of freedom as we add more terms to the model. However, rA2 still overstates the case, though not as badly as r2.
4.4.5 The PRESS Statistic
A better statistic for gauging overfit is the PRESS statistic. When we overfit data, we increase goodness of fit by correlating random error. What we really have is noise, but we add a coefficient to the model and treat it as if it were information. In other words, overfit adds bias by equivocating noise with information. This bias may have great influence, especially at the outskirts of the model, or beyond. What we would really like to know is what the residual would be if we were to encounter points that were not in our original data set. We have already described one method for detecting this — splitting the data set into model and validation portions. Though a milder test, the PRESS statistic requires less effort and gives good results. Many statistical programs perform it. Effectively, we do the following:

1. Delete the first point from the model and calculate the variance between the deleted point and the model prediction.
2. Do this for all n points.
3. Cumulate the variance.

We shall call this the sum of squares residual, predicted, or SSRp, but the common term in the statistical literature is the PRESS statistic. PRESS is a composite acronym and stands for prediction sum of squares.
SSRp = PRESS = Σk (yk – ŷ(k))²    (4.45)

In Equation 4.45, ŷ(k) is the predicted value for yk but regressed from the data with the kth response deleted. Likewise, we may define the sum of squares model, predicted, SSMp, as

SSMp = Σk (ŷ(k) – ȳ)²    (4.46)
For a well-behaved model, we would expect that SSRp ≈ SSR and SSMp ≈ SSM. Calculating n – 1 regressions does not appear to be a less tedious procedure than parsing the data set. However, we can make use of an identity to calculate the PRESS statistic in a single regression run using the hat matrix.
4.4.6 The Hat Matrix
H = X(XTX)–1XT

The matrix has some remarkable properties:
• It is square symmetric (and singular, except for saturated designs where it becomes an n-row identity matrix).
• It has as many rows and columns as there are y values in the data set — that is, n rows and n columns.
• It codes for ŷ (y hat) in terms of linear combinations of y (Equation 1.80): ŷ = Hy.
• It is idempotent.
• Since it is symmetrical, H = HTH = HHT = H2 = Hn. This last point is so because if H2 = H, then we may also write H3 = H(H2) = HH = H.
• The diagonal elements of H are always between 0 and 1:

0 ≤ hk,k ≤ 1    (4.48)

where hk,k are the diagonal elements of H and k indexes them.
• The sum of the diagonal elements of H is equal to the number of parameters, p, in the model:

Σk hk,k = p    (4.49)
One may use the diagonal hat matrix elements to transform the normal residual to the deleted residual used in Equation 4.45:

yk – ŷ(k) = (yk – ŷk) / (1 – hk,k)    (4.51)

This last property is the one that allows for direct calculation of SSRp in a single regression, thus avoiding n regressions:

SSRp = Σk [(yk – ŷk) / (1 – hk,k)]²    (4.52)
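A sketch of the single-pass PRESS computation using the hat matrix (illustrative data of my own construction): it computes H, checks that the leverages sum to p, and forms PRESS from the ordinary residuals and leverages per Equations 4.51 and 4.52.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, p - 1))])   # intercept + 2 factors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 0.1, n)          # hypothetical response

H = X @ np.linalg.inv(X.T @ X) @ X.T        # the hat matrix
h = np.diag(H)                              # leverages, each between 0 and 1
print("trace(H) = p ?", round(h.sum(), 6), "=", p)

e = y - H @ y                               # ordinary residuals (y - y_hat)
press = np.sum((e / (1 - h)) ** 2)          # Eq. 4.52: deleted residuals without refitting
print("SSR   =", round(np.sum(e ** 2), 4))
print("PRESS =", round(press, 4))           # PRESS exceeds SSR, as expected
```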
We can use the PRESS statistic to derive a modified coefficient of determination that we shall call the coefficient of determination, predicted.

4.4.7 The Coefficient of Determination, Predicted, rP2
We may use SSRp to estimate a goodness of fit for the predicted values. We define the coefficient of determination, predicted (rP2), as

rP2 = 1 – SSRp/SST

Note that rP2 ≤ r2; the greater the overfit, the lower rP2.

For example, let us compare r2, rA2, and rP2 for the hypothetical data of Table 4.2. This is best done with dedicated statistical software. Table 4.4 shows them. We notice several things:

• First, we note that r2 continually increases as we add more parameters to the model. This will always be the case for any data set. This is why we cannot look at r2 alone to decide on an appropriate model.
• The cubic model (model 3) is particularly good at capturing the variation with an r2 of 92.5%. However, the PRESS statistic alerts us to a problem.
• Comparing PRESS (SSEp) to SSE shows that in all cases, the PRESS statistic exceeds the SSE. This is typical. What is more important is that the magnitude of this difference is quite large in some cases and increases with model order. This should alert us to a potential problem regarding the predictive ability of our model.
• Examining rA2 shows that models 1 and 2 are not preferred over model 0. However, rA2 for model 3 is quite respectable at 70.1%. It is less than the r2 of 92.5%, but not sufficiently. Were we to choose a model based on rA2, it would be model 3.
• In contrast, the rP2 statistic clearly shows the worsening condition with increasing model order. Were we to gauge the predictive ability of the model using this statistic, we would conclude that no model is better than model 0.
Now, suppose the data were not random and that model 3 was actually a valid model. Our statistics would be no different, rP2 would still be very negative, and this would lead us to question the validity of model 3. But such skepticism would be appropriate. If we were actually trying to fit a model with four parameters (model 3), it would be best to have more than five data points. Five data points leave only one degree of freedom to assess random error. The rP2 statistic would prod us to gather more data and let us know that deletion of points makes a big change in the model form. As such, we could say that rP2 is a measure of robustness of fit. Although no single statistic is a panacea, by looking at a variety of statistics we get a good idea of a model's fitness for purpose. So then, let us continue to build our arsenal of appropriate statistics to help us understand how well behaved our model and data are.
even these have empirical parameters. In all cases, we would prefer to avoid extrapolation, or at least know when it is occurring.
Extrapolation beyond individual factor ranges is easy to detect. For example, if –1 < x < 1, then x = 2 is clearly an extrapolation; if the new data point lies outside any of the individual factor boundaries, we can be sure that it is an extrapolation no matter what the shape of the data cloud in factor space. Now refer to Table 4.5.

Before reading on, hazard a guess. Does the last data point comprise an extrapolation compared to the previous ones?
The data point is clearly within the range of each individual factor. However, Figure 4.6 shows that the data point actually lies outside the joint region defined by both factors considered simultaneously.

Hidden extrapolation refers to data that are outside the joint region of factors but not outside the range of any individual factor. For two factors, one may easily detect hidden extrapolation by plotting data as we have done in Figure 4.6. For three or more factors, one may plot every possible two-factor projection. However, even this may not detect all possible extrapolations. Ideally, we prefer a statistic that is easy to calculate, generates a single score, and reliably flags all extrapolation, hidden or otherwise. A variant of the hat matrix will do the trick.
It turns out that H defines an ellipsoid or hyperellipsoid in p-dimensional factor space that bounds the cloud of data. If our new data point lies outside this ellipsoid, then we have an extrapolation. The diagonal elements of H measure the distance between the kth value of X and the mean value of all the X values. We can think of them as the distances in p-factor space from the point to the center of the data cloud. We know from Equation 4.49 that the diagonal elements of H sum to p, so the mean value of the diagonal elements of H must be p/n. Thus, if we have a new value and want to test for it being an outlier, it seems sensible to calculate xT(XTX)–1x and compare it to p/n. If xT(XTX)–1x ≈ p/n, then the point represents an interpolation — it