
Volume 28

Chemometrics in Food Chemistry

Edited by

Federico Marini
Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD PARIS • SAN DIEGO • SAN FRANCISCO • SYDNEY • TOKYO


First edition 2013

Copyright © 2013 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice

No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

ISBN: 978-0-444-59528-7

ISSN: 0922-3487

For information on all Elsevier publications

visit our web site at store.elsevier.com

Printed and bound in Great Britain

13 14 15 16 10 9 8 7 6 5 4 3 2 1


Contributors xi

Federico Marini

Mario Li Vigni, Caterina Durante, and Marina Cocchi


3.1 Univariate Linear Regression: Introducing the

3.2 Multivariate Generalization of the Ordinary

3.3 Principal Component Regression 136

6.1 Interpretation of the Structured Part 157

Marta Bevilacqua, Remo Bucci, Andrea D. Magrì, Antonio L. Magrì, Riccardo Nescatelli, and Federico Marini

1.1 Classification of Classification Methods 172

2 Discriminant Classification Methods 176

2.2 Extended Canonical Variates Analysis 188
2.3 Partial Least Squares Discriminant Analysis 195

2.5 Density-Based Methods (Potential Functions) 208


2.6 Other Discriminant Classification Methods 215

3 MCR Applied to Qualitative and Quantitative Analysis

José Manuel Amigo and Federico Marini

1 Introduction: Why Multiway Data Analysis? 266

2 Nomenclature and General Notation 266

3.2 PARAFAC Iterations Convergence to the Solution

3.4 Model Validation: Selection of the Number of Factors 273
3.5 Imposing Constraints to the Model 275

4.2 Resemblances and Dissimilarities Between PARAFAC and

5.4 Some Considerations on the Core Array 290


6.1 Multilinear PLS (N-PLS) 298

8 Robust Methods in Analysis of Multivariate

Ivana Stanimirova, Michał Daszykowski,

and Beata Walczak

2 Basic Concepts in Robust Statistics 317
2.1 Classic and Robust Estimators of Data Location and Scale 319
2.2 Robust Estimates of Covariance and Multivariate

3 Robust Modelling of Data Variance 323
3.1 Spherical Principal Component Analysis 324
3.2 Robust PCA using PP with the Qn Scale 326
3.3 ROBPCA: A Robust Variant of PCA 328

4.2 RSIMPLS and RSIMCD: Robust Variants of SIMPLS 331
4.3 Spatial Sign Preprocessing and Robust PLS 331
4.4 Identification of Outlying Samples using a Robust Model 332

5 Discrimination and Classification 333
5.1 Classic and Robust Discrimination 333
5.2 Classic and Robust Classification 334

6 Dealing with Missing Elements in Data Containing Outliers 334

Part II

Applications

9 Hyperspectral Imaging and Chemometrics: A Perfect Combination for the Analysis of Food Structure,

José Manuel Amigo, Idoia Martí, and Aoife Gowen

1.2 The Role of Hyperspectral Image in Food


2 Structure of a Hyperspectral Image 350

3 Hyperspectral Analysis and Chemometrics: Practical Examples 352

3.3 Unsupervised Techniques to Explore the Image: PCA 355
3.4 Supervised Techniques for Classification of Features 358
3.5 Regression Modelling for Obtaining Quantitative Information from Hyperspectral Images 362

Lucia Bertacchini, Marina Cocchi, Mario Li Vigni, Andrea Marchetti, Elisa Salvatore, Simona Sighinolfi, Michele Silvestri, and Caterina Durante

1.1 Authenticity and Traceability: The European Union

1.2 Authenticity and Traceability: A Scientific Point of View 374

2.1 Chemometrics Approaches for Soil Sampling Planning

2.2 Geographical Traceability of Raw Materials for PDO

3.1 Study of Grape Juice Heating Process in a Context

3.2 Study of Sensory and Compositional Profiles During

3.3 Characterisation and Classification of Ligurian Extra

Alberta Tomassini, Giorgio Capuani, Maurizio Delfini,

and Alfredo Miccheli

3 NMR-Based Metabolomics Applications 420

3.2 Quality Control: Geographical Origin and Authentication 426
3.3 Quality Control, Adulteration, and Safety 433


3.4 Quality Control and Processing 437

12 Interval-Based Chemometric Methods in NMR

Francesco Savorani, Morten Arendt Rasmussen, Åsmund Rinnan, and Søren Balling Engelsen

2.5 Requirements for Bilinear Models 461


Numbers in parentheses indicate the pages on which the authors’ contributions begin.

José Manuel Amigo (265, 343), Department of Food Science, Quality and Technology, Faculty of Life Sciences, University of Copenhagen, Frederiksberg

Maurizio Delfini (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

Caterina Durante (55, 371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Søren Balling Engelsen (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

Aoife Gowen (343), School of Biosystems Engineering, University College Dublin, Dublin 4, Ireland

Anna de Juan (235), Department of Analytical Chemistry, Universitat de Barcelona, Martí i Franquès, Barcelona, Spain

Riccardo Leardi (9), Department of Pharmacy, University of Genoa, Genoa, Italy

Mario Li Vigni (55, 371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Andrea D. Magrì (171), Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

Antonio L. Magrì (171), Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy


Andrea Marchetti (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Federico Marini (1, 127, 171, 265), Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

Idoia Martí (343), Analytical and Organic Chemistry Department, Universitat Rovira i Virgili, Tarragona, Spain

Sílvia Mas (235), Department of Analytical Chemistry, Universitat de Barcelona, Martí i Franquès, Barcelona, Spain

Alfredo Miccheli (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

Riccardo Nescatelli (171), Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

Morten Arendt Rasmussen (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

Åsmund Rinnan (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

Elisa Salvatore (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Francesco Savorani (449), Department of Food Science, Quality & Technology, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

Simona Sighinolfi (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Michele Silvestri (371), Department of Chemical and Geochemical Sciences, University of Modena and Reggio Emilia, Modena, Italy

Ivana Stanimirova (315), Department of Analytical Chemistry, Chemometric Research Group, Institute of Chemistry, The University of Silesia, Katowice, Poland

Alberta Tomassini (411), Department of Chemistry, Sapienza University of Rome, Rome, Italy

Beata Walczak (315), Department of Analytical Chemistry, Chemometric Research Group, Institute of Chemistry, The University of Silesia, Katowice, Poland

Frank Westad (127), CAMO Software AS, Oslo, Norway


For many years, food was not considered an important or even decent scientific subject: “Food belongs in the kitchen!” Those days are over, and for good reasons. Food still belongs in the kitchen, but at the same time food science is an extremely challenging, interesting and rewarding area of research. Food is of fundamental importance and covers complicated and cross-disciplinary aspects ranging from, e.g., sensory perception, culture, nutrition, gastronomy, physics, chemistry and engineering.

• What is the impact of seasonal variations in the raw material?

• How will the long-term stability of cream cheese change when switching to another breed of cows?

• How to evaluate the complex changes in aroma occurring over the course

• How can the Maillard reaction be controlled during cooking?

• Can we have more timely and more accurate characterization of whether production is running smoothly?

The above questions are difficult to answer without comprehensive and relevant information. Such information will almost invariably be multivariate in nature in order to comprehensively describe the complex underlying problems. Therefore, the need for advanced experimental planning and subsequent advanced data analysis is obvious. Chemometrics provides the necessary tools for digging into food-related problems. This book is a highly needed and relevant contribution to the food research area in this respect. The book provides an impressive, very detailed and illustrative tour de force through the chemometric landscape. This book will prove useful to newcomers trying to understand the field of chemometrics, to the food researcher wanting to more actively use chemometric tools in practice and to teachers and students participating in chemometrics courses.

A recurring motto in our Department of Food Science has been:

If you think rocket science is difficult—try food science.


With this book, you can actually seriously start to unravel the deep and intricate mysteries in food science, and I would like to sincerely thank Federico Marini and the many competent researchers for taking time to write this book. Enjoy!

Rasmus Bro
Frederiksberg, Denmark, May 2013


Federico Marini*

Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

* Corresponding author: federico.marini@uniroma1.it

Chapter Outline

1 Another Book on the Wall 1

2 Organisation of the Book 2

1 ANOTHER BOOK ON THE WALL

Issues related to food science and authentication are of particular importance, not only for researchers but also for consumers and regulatory entities. The need to guarantee quality foodstuff—where the word “quality” encompasses many different meanings, including, for example, nutritional value, safety of use, absence of alteration and adulterations, genuineness, typicalness, and so on [1]—has led researchers to look for more and more effective tools to investigate and deal with food chemistry problems. As even the simplest food is a complex matrix, the way to investigate its chemistry cannot be other than multivariate [2]. Therefore, chemometrics is a necessary and powerful tool in the field of food analysis and control [3–5].

Indeed, since the very beginning, chemometrics has been dealing with different problems related to food quality [6–8]. Today, when considering food science in general and food analysis and control in particular, several problems can be listed in the resolution of which chemometrics can be of utmost importance and relevance. Traceability [9,10], that is, the possibility of verifying the animal/botanical, geographical and/or productive origin of a foodstuff, is, for instance, one of the issues where the use of chemometric techniques is not only recommended but essential [11]; indeed, till date, no specific chemical and/or physico-chemical markers have been identified that can be univocally linked to the origin of a foodstuff, and the only way of obtaining a reliable traceability is by application of multivariate classification to experimental fingerprinting results [12,13]. Another area where chemometrics is of particular importance is in building the bridge between consumer preferences, sensory attributes and molecular profiling of food [14,15]; indeed, by identifying latent structures among the data tables, bilinear modelling techniques (such as PCA, MCR, PLS and its various evolutions) can provide an interpretable and reliable connection among these domains. Other problems that can be listed include process control and monitoring [16], the possibility of using RGB or hyperspectral imaging techniques to non-destructively check food quality [17,18], calibration of multidimensional or hyphenated instruments [19,20,21] and so on.

Despite these considerations, while a huge amount of the literature deals with the design of chemometric techniques and their application to different ambits of food science, a general monograph covering the main aspects of this topic as comprehensively as possible is lacking. This book aims to fill the gap, such that it can be used by both food chemists wanting to learn how chemometric techniques can help in many aspects of their work and chemometricians having to deal with food-related problems.

2 ORGANISATION OF THE BOOK

The twofold scope (and the corresponding prospective audience) of the book drives the way it is conceived and organised. Indeed, the monograph is organised in two parts: a first part (Chapters 2–8) covering the theory, and a second part (Chapters 9–12) presenting some selected applications of chemometrics to “hot topics” in food science. As it is hoped that this book will be read and used not just by “professional” chemometricians, all the topics, especially the ones in the theoretical part, are covered extensively, starting from a beginner level up to an intermediate or advanced one. In the same theoretical part, the description of the methods is accompanied by a wide variety of examples taken from food science to illustrate how the different techniques can be fruitfully applied to solve real-world food-related issues.

In particular, the first chapters of this book are suitable to be used as an introductory textbook on chemometrics or as a self-study guide, as they cover most of the principal aspects of the topic; the reader who is more interested in specific topics and/or applications can just pick the chapters that she/he prefers, as each of the chapters is self-contained. As already anticipated, the first part of the book covers the theory of the main chemometric methods and each chapter is meant to be a tutorial on the specific topic. The aim of Chapter 2 is to review the rationale and strategies for the design of experiments, which constitute a fundamental step in the set-up of any kind of experimental procedure. The topics covered include screening and two-level factorial designs, multi-level designs for both qualitative and quantitative variables, and response surface methodologies. Chapter 3 presents an extensive description of the chemometric methods used for exploratory data analysis, with the attention specifically focused on principal component analysis (PCA) and data preprocessing methods. Additional topics covered include descriptive statistics

and other projection methods such as multidimensional scaling and nonlinear mapping. Chapter 4 is devoted to calibration, from univariate to multivariate, and discusses extensively the strategies for model validation and interpretation. The topics covered include ordinary least squares, principal component regression, partial least squares (PLS) regression, identification of outliers and variable selection. The aim of Chapter 5 is to provide the reader with a comprehensive description of chemometric pattern recognition tools. A distinction is provided between discriminant and modelling approaches and the most frequently used techniques (LDA, QDA, kNN, PLS-DA, SIMCA, UNEQ and density methods) are described in detail. Taken together, Chapters 2–5 cover the theory behind the most fundamental chemometric methods; on the other hand, Chapters 6–8 describe some advanced topics that have gained more and more importance during the last years. Chapter 6 is focused on multivariate curve resolution (MCR) for single data matrices and for multi-set configuration. Basic MCR theory is reviewed together with a detailed discussion of all the different scenarios in food control where this approach could be of importance. Chapter 7 presents an overview of the chemometric techniques used for the analysis of multi-way arrays, that is, the data arrays resulting from experiments in which a signal is recorded as a function of more than two sources of variation. The topics covered include methods for deconvolution/resolution (PARAFAC and PARAFAC2), data description (TUCKER) and calibration (N-PLS and multi-way covariate regression). Finally, Chapter 8 discusses robust methods, that is, methods that provide a reliable answer even when a relatively high percentage of anomalous observations are present. The topics covered include robust measures of location and scale, robust PCA and PLS, and robust classification methods.

The second part of the book—Chapters 9–12—presents some selected applications of chemometrics to different topics of interest in the field of food authentication and control. Chapter 9 deals with the application of chemometric methods to the analysis of hyperspectral images, that is, of those images where a complete spectrum is recorded at each of the pixels. After a description of the peculiar characteristics of images as data, a detailed discussion on the use of exploratory data analytical tools, calibration and classification methods is presented. The aim of Chapter 10 is to present an overview of the role of chemometrics in food traceability, starting from the characterisation of soils up to the classification and authentication of the final product. The discussion is accompanied by examples taken from the different ambits where chemometrics can be used for tracing and authenticating foodstuffs. Chapter 11 introduces NMR-based metabolomics as a potentially useful tool for food quality control. After a description of the bases of the metabolomics approach, examples of its application for authentication, identification of adulterations, control of the safety of use, and processing are presented and discussed. Finally, Chapter 12 introduces the concept of interval methods in chemometrics, both for data pretreatment and data analysis. The topics

covered are the alignment of signals using iCoshift, and interval methods for exploration (iPCA), regression (iPLS) and classification (iPLS-DA, iECVA), and the important roles they play in the emerging discipline of foodomics.

Moreover, the book is multi-authored, collecting contributions from a selected number of well-known and active chemometric research groups across Europe, each covering one or more subjects where the group’s expertise is recognised and appreciated. This interplay of high competences represents another added value to the proposed monograph.

[5] Forina M, Casale M, Oliveri P. Application of chemometrics to food chemistry. In: Brown SD, Tauler R, Walczak B, editors. Comprehensive chemometrics, vol. 4. Oxford, UK: Elsevier; 2009. p. 75–128.

[6] Saxsberg BEH, Duewer DL, Booker JL, Kowalski BR. Pattern recognition and blind assay techniques applied to forensic separation of whiskies. Anal Chim Acta 1978;103:201–12.

[7] Kwan WO, Kowalski BR. Classification of wines by applying pattern recognition to chemical composition data. J Food Sci 1978;43:1320–3.

[8] Forina M, Armanino C. Eigenvector projection and simplified non-linear mapping of fatty acid content of Italian olive oils. Ann Chim 1982;72:127–41.

[9] Brereton P. Preface to the special issue “Food authenticity and traceability”. Food Chem 2010;118:887.

[10] Guillou C. Foreword to the special issue “Food authenticity and traceability”. Food Chem 2010;118:888–9.

[11] Available from: http://www.trace.eu.org, last accessed 22.03.2013.

[12] Reid LM, O’Donnell CP, Downey G. Recent technological advances for the determination of food authenticity. Trends Food Sci Technol 2006;17:344–53.

[13] Luykx DMAM, van Ruth SM. An overview of the analytical methods for determining the geographical origin of food products. Food Chem 2008;107:897–911.

[14] Naes T, Risvik E, editors. Multivariate analysis of data in sensory science. Amsterdam, The Netherlands: Elsevier; 1996.

[15] Naes T, Brockhoff PM, Tomic O. Statistics for sensory and consumer science. New York, NY: John Wiley and Sons; 2010.

[16] Bro R, van den Berg F, Thybo A, Andersen CM, Jørgensen BM, Andersen H. Multivariate data analysis as a tool in advanced quality monitoring in the food production chain. Trends Food Sci Technol 2002;13:235–44.

[17] Pereira AC, Reis MS, Saraiva PM. Quality control of food products using image analysis and multivariate statistical tools. Ind Eng Chem Res 2009;48:988–98.

[18] Gowen AA, O’Donnell CP, Cullen PJ, Downey G, Frias JM. Hyperspectral imaging—an emerging process analytical tool for food quality and safety control. Trends Food Sci Technol 2007;18:590–8.

[19] Amigo JM, Skov T, Bro R. ChroMATHography: solving chromatographic issues with mathematical models and intuitive graphics. Chem Rev 2010;110:4582–605.

[20] Pierce KM, Kehimkar B, Marney LC, Hoggard JC, Synovec RE. Review of chemometric analysis techniques for comprehensive two dimensional separations data. J Chromatogr A 2012;1255:3–11.

[21] de Juan A, Tauler R. Factor analysis of hyphenated chromatographic data—exploration, resolution and quantification of multicomponent systems. J Chromatogr A 2007;1158:184–95.

Part I

Theory

Experimental Design

Riccardo Leardi1

Department of Pharmacy, University of Genoa, Genoa, Italy

1 Corresponding author: riclea@difar.unige.it

at a time is the correct approach)

Instead, the “optimization” performed OVAT does not guarantee at all that the real optimum will be hit. This is because this approach would be valid only if the variables to be optimized were totally independent from each other, a condition that very seldom happens to be true.

By studying OVAT, the interactions among variables will be totally missed.

What is an interaction? Let us try to explain this concept with some examples taken from everyday life.

If somebody asks you what is the best gear in which to ride a bike, your reply would surely be: ‘It depends.’

‘What is the best cooking time for a cake?’ ‘It depends.’

‘What is the best waxing for your skis?’ ‘It depends.’

‘What is the best setup for a racing car?’ ‘It depends.’

This means that you do not have ‘the best’ gear, but the best gear depends on the levels of the other factors involved, such as the slope of the road, the direction and the speed of the wind, the quality of the cyclist, how tired the cyclist is and the speed he wants to maintain.

Similarly, when baking a cake the best time depends on the temperature of the oven, the best waxing depends on the conditions of the weather and of the snow, the best setup for a racing car depends on the circuit and so on. Every time your reply is ‘it depends’ it means that you intuitively recognize that the effect of the factor you are talking about is not independent of the levels of the other factors; this means that an interaction among those factors is relevant and that not taking it into account can give terrible results.

So, it is evident that the housewife knows very well that there is a strong interaction between cooking time and oven temperature, a cyclist knows very well that there is an interaction between the gear and the surrounding conditions and so on.

Of course, you will never hear a housewife using the word ‘interaction’, but her behaviour demonstrates clearly that she intuitively understands what an interaction is.

Could you imagine somebody looking for the best gear on a flat course (i.e., changing gear while keeping all the remaining variables constant) and then using it on any other course simply because the first set of experiments demonstrated that it was the best?

Well, chemists optimizing their procedures OVAT behave in the very same way!

Why do the very people who answer ‘it depends’ to a lot of questions about their everyday life never give the same answer when entering a lab and working as chemists?

Why, when looking for the best pH, do chemists usually behave like the foolish cyclist described earlier, changing the pH and keeping constant all the remaining variables, instead of thinking that the ‘best pH’ may depend on the setting of the other variables?

While in the OVAT approach the only points about which something is known are the points where the experiments have been performed, the experimental design, by exploring in a systematic way the whole experimental domain, also allows one to obtain a mathematical model by which the value of the response in the experimental domain can be predicted with a precision that, provided that the experimental variability is known, can be estimated even before performing the actual experiments of the design and that only depends on the arrangement of the points in space and on the postulated model (this will be explained in greater detail later on). This means going from a local knowledge to a global knowledge.

By comparing the information obtained by an OVAT approach with the information obtained by an experimental design, we can say that:

• The experimental design takes into account the interactions among the variables, while the OVAT does not;

• The experimental design provides a global knowledge (in the whole experimental domain), while the OVAT gives a local knowledge (only where the experiments have been performed);

• In each point of the experimental domain, the quality of the information obtained by the experimental design is higher than the information obtained by the OVAT;

• The number of experiments required by an experimental design is smaller than the number of experiments performed with an OVAT approach.

Summarizing, it should be clear that:

• The quality of the results depends on the distribution of the experiments in the experimental domain;

• The optimal distribution of the experiments depends on the postulated model;

• Given the model, the experimental limitations and the budget available (= maximum number of experiments), the experimental design will detect the set of experiments resulting in the highest possible information.

People should also be aware that building the experimental matrix (i.e., deciding which experiments must be performed) is the easiest part of the whole process, and that in the very great majority of the cases it can be performed by hand, without any software.

What is difficult is rather the definition of the problem: Which are the factors to be studied? Which is the domain of interest? Which model? How many experiments?

To perform an experimental design, the following five steps must be considered:

1. Define the goal of the experiments. Though it can seem totally absurd, many people start doing experiments without being clear in their minds as to what the experiments are done for. This is a consequence of the general way of thinking, according to which once you have the results you can anyway extract information from them (and the more experiments have been performed, the better).

2. Detect all the factors that can have an effect. Particular attention must be given to the words ‘all’ and ‘can’. This means that it is not correct to consider a predefined number of factors (e.g., let us take into account only three factors), and saying that a factor ‘can’ have an effect is totally different from saying that we think that a factor has an effect. One of the most common errors is indeed that of performing what has been defined a ‘sentimental screening’, often based only on some personal feelings rather than on scientific facts.

3. Plan the experiments. Once the factors have been selected, their ranges have been defined and the model to be applied has been postulated, this step requires only a few minutes.

4. Perform the experiments. While in the classical way of thinking this is the most important part of the process, in the philosophy of experimental design doing the experiments is just something that cannot be avoided in order to get results that will be used to build the model.

5. Analyse the data obtained by the experiments. This step transforms data into information and is the logical conclusion of the whole process.

Very often one single experimental design does not lead to the solution of the problem. In those cases the information obtained at point 5 is used to reformulate the problem (removal of the non-significant variables, redefinition of the experimental domain, modification of the postulated model), after which one goes back to step 3.

As the possibility of having to perform more than one single experimental design must always be taken into account, it is wise not to invest more than 40% of the available budget in the first set of experiments.

The 2^k Factorial Designs are the simplest possible designs, requiring a number of experiments equal to 2^k, where k is the number of variables under study. In these designs each variable has two levels, coded as −1 and +1, and the variables can be either quantitative (e.g., temperature, pressure, amount of an ingredient) or qualitative (e.g., type of catalyst, type of apparatus, sequence of operations).

The experimental matrix for k = 3 is reported in Table 1, and it can be seen that it is quite easy to build it also by hand. The matrix has eight rows (2³, each row corresponding to an experiment) and three columns (each column corresponding to a variable); in the first column the −1 and +1 alternate at every row, in the second column they alternate every second row and in the third column they alternate every fourth row. The same procedure can be used to build any Factorial Design, whatever the number of variables.
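As an illustration of this construction rule, the short Python sketch below (an illustrative addition, not part of the original chapter) builds the coded experimental matrix for any number of variables by alternating the −1/+1 blocks exactly as described above.

```python
import numpy as np

def full_factorial(k):
    """Coded experimental matrix of a 2**k Factorial Design in standard order:
    column j switches between -1 and +1 in blocks of 2**j runs."""
    n = 2 ** k
    runs = np.arange(n)
    return np.column_stack(
        [np.where((runs // 2 ** j) % 2 == 0, -1, 1) for j in range(k)]
    )

print(full_factorial(3))
# Eight rows and three columns: the first column alternates every row,
# the second every second row, the third every fourth row (as in Table 1).
```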

From a geometrical point of view, as shown in Figure 1, a Factorial Design explores the corners of a cube (if the variables are more than three, it will be a hypercube; our mind will no longer be able to visualize it, but from the mathematical point of view nothing will change).

Contrary to what happens in the OVAT approach, in which variable 1 is changed while variables 2 and 3 are kept constant, in the Factorial Design variable 1 is changed while variables 2 and 3 have different values (of course the same happens for all the variables).

This means that the Factorial Design is suitable for estimating the interactions between variables (i.e., the difference in changing variable 1 when variable 2 is at its higher level or at its lower level, and so on).

The mathematical model is therefore the following:

Y = b0 + b1X1 + b2X2 + b3X3 + b12X1X2 + b13X1X3 + b23X2X3 + b123X1X2X3

As a consequence, with just eight experiments it is possible to estimate a constant term, the three linear terms, the three two-term interactions and the three-term interaction.

To illustrate the application of a Factorial Design, the following example is reported [2].

A chemical company was producing a polymer whose viscosity had to be >46.0 × 10³ mPa·s. As a consequence of the variation of a raw material, they got a final product rather different from the ‘original’ product (which had been produced for several years), with a viscosity below the acceptable value. Of course, this was a very big problem for the company, as the product could not be sold anymore. The person in charge of the product started performing experiments OVAT, but after about 30 experiments he could not find any acceptable solution.

TABLE 1 A 2³ Factorial Design (Experimental Matrix)

FIGURE 1 Geometrical representation of a 2³ Factorial Design.

It was then decided to try with an experimental design.

At first, three potentially relevant variables were detected: they were the amounts of three reagents (let us call them A, B and C). The original formulation was 10 g of A, 4 g of B and 10 g of C.

Therefore, it was decided to keep this experimental setting as a starting point and to explore its surroundings. As the number of possible experiments was quite limited, it was decided to apply a 2³ Factorial Design, requiring a total of eight experiments.

The next step was to define the levels of the variables and to write down the experimental plan.

As mentioned earlier, it had been decided to keep the original recipe as the centre point and to set the levels −1 and +1 of each variable symmetrically to the original value (9 and 11 for reagents A and C, 3.6 and 4.4 for reagent B), leading to the experimental plan reported in Table 2.

As it can be seen, while the experimental matrix contains the coded values (−1 and +1), the experimental plan reports the real values of the variables and therefore can be understood by anybody.

TABLE 2 The Experimental Plan for the Polymer Factorial Design

A very important point is that the experiments must be performed in random order, in order to avoid the bias related to possible systematic effects. Let us suppose we are doing our experiments on a hot morning in July, starting at 8 a.m. and finishing at 2 p.m., following the standard order reported in Table 2. Let us also suppose that, for some unknown and unsuspected reason, the outcome of our experiments increases with external temperature, while none of the variables under study has a significant effect. As a result, the responses of the eight experiments, instead of being the same (inside the experimental error), will regularly increase. We would therefore conclude, just looking at the results, that reagent C has a very relevant positive effect (the four best experiments are all the four experiments performed when it was at a higher level), reagent B has a moderate positive effect and reagent A has a smaller but constant positive effect. This happens because an uncontrolled and unsuspected systematic trend is confounded with the effect of the variables. Instead, if the experiments are performed in random order, the same systematic and uncontrolled variations (if any) will be ‘spread’ equally among all the variables under study.
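A randomised run order is trivial to generate; the snippet below (an illustrative addition, not taken from the chapter) shuffles the eight standard-order runs of the 2³ design.

```python
import numpy as np

rng = np.random.default_rng(2013)      # fixing a seed makes the plan reproducible
run_order = rng.permutation(8) + 1     # the eight runs of Table 2 in a random sequence
print(run_order)
```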

After having performed the eight experiments and having recorded the responses (Table 3), it was immediately clear that in several cases the viscosity was much higher than the minimum acceptable value.

How is it possible not to have found those solutions in more than 30 previous experiments?

Before computing any coefficient, let us look at the results shown in Figure 2.

It can be clearly seen that all the experiments performed at a lower value of reagent A led to responses greater than the threshold value. It can therefore be said that by lowering the amount of A an increase of the response is obtained.

TABLE 3 Experimental Design, Experimental Plan and Responses of the Polymer Factorial Design

X1 | X2 | X3 | Reagent A (g) | Reagent B (g) | Reagent C (g) | Viscosity (mPa·s × 10³)

In what concerns reagent B, it can be seen that its increase leads to a decrease of the response when reagent C is at a lower level and to an increase of the response when reagent C is at a higher level. This is a clear example of interaction between two variables. The same interaction is detected when taking into account reagent C: it can be seen that an increase of reagent C improves the response when reagent B is at a higher level, while a worsening occurs when reagent B is at a lower level.

It should be clear now that the experiments performed by following an experimental design are usually very few but highly informative, and therefore some information can be obtained just by looking at the data.

FIGURE 2 Spatial representation of the results of the polymer Factorial Design.

TABLE 4 Model Matrix and Computation of the Coefficients of the Polymer Factorial Design

To compute the coefficients, we must go from the experimental matrix to the model matrix (Table 4). While the former has as many rows as experiments and as many columns as variables, the latter has as many rows as experiments and as many columns as coefficients, and it can be easily obtained in the following way: the first column (b0) is a column of +1; the columns of the linear terms are the same as the experimental matrix; the columns of the interactions are obtained by a point-to-point product of the columns of the linear terms of the variables involved in the interaction (e.g., the column b12 of the interaction between variables 1 and 2 is obtained by multiplying point to point the column b1 by the column b2). If quadratic terms were also present, their columns would be obtained by computing the square of each element of the corresponding linear term.

Computing the coefficients is very simple (again, no software required!). For each of them, multiply point to point the column corresponding to the coefficient that has to be estimated by the column of the response, and then take the average of the results. For instance, for estimating b1 (the linear term of X1), just calculate (−51.8 + 51.6 − 51.0 + 42.4 − 50.2 + 46.6 − 52.0 + 50.0)/8 = −1.8.

An interesting thing to notice is that, as every column of the model matrix has four −1 and four +1, every coefficient will be computed as half the difference between the average of the four experiments with positive sign and the average of the four experiments with negative sign. This means that each coefficient is computed with the same precision, and that this precision, being the difference of two averages of four values, is much better than that of an OVAT experiment, where the difference between two experiments (one performed at higher level and one performed at lower level) is usually computed. Once more, it can be seen how the experimental design can give much more information (the interaction terms) of much higher quality (higher precision of the coefficients).
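The following Python sketch (an illustrative addition; the eight viscosity values are those quoted in the b1 calculation above, assumed to be listed in standard run order) builds the model matrix and reproduces all eight coefficients by this point-to-point averaging.

```python
import numpy as np

# Experimental matrix of the 2^3 design in standard order (Table 1)
X = np.array([
    [-1, -1, -1], [ 1, -1, -1], [-1,  1, -1], [ 1,  1, -1],
    [-1, -1,  1], [ 1, -1,  1], [-1,  1,  1], [ 1,  1,  1],
])

# Viscosity responses (10^3 mPa·s), taken from the worked b1 example
y = np.array([51.8, 51.6, 51.0, 42.4, 50.2, 46.6, 52.0, 50.0])

# Model matrix columns: constant, linear terms, two-factor and three-factor interactions
cols = {
    "b0":   np.ones(8),
    "b1":   X[:, 0], "b2": X[:, 1], "b3": X[:, 2],
    "b12":  X[:, 0] * X[:, 1],
    "b13":  X[:, 0] * X[:, 2],
    "b23":  X[:, 1] * X[:, 2],
    "b123": X[:, 0] * X[:, 1] * X[:, 2],
}

# Each coefficient is the average of (model-matrix column * response)
for name, col in cols.items():
    print(f"{name} = {np.mean(col * y):+.2f}")
# b0 +49.45, b1 -1.80, b2 -0.60, b3 +0.25, b12 -0.85, b13 +0.40,
# b23 +1.90, b123 +1.25  ->  the model reported below, up to rounding
```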

The following model has been obtained:

Y = 49.4 − 1.8 X1 − 0.6 X2 + 0.2 X3 − 0.8 X1X2 + 0.4 X1X3 + 1.9 X2X3 + 1.2 X1X2X3

As eight coefficients have been estimated with eight experiments (and therefore no degrees of freedom are available) and as the experimental variability is not known, it is impossible to define a statistical significance of the coefficients. Anyway, the linear term of X1 (reagent A) and the interaction X2–X3 (reagent B–reagent C) have absolute values larger than the other ones. The negative coefficient of X1 indicates that by increasing the amount of reagent A, a decrease of the viscosity is obtained, and therefore better results are obtained by reducing its amount. As X1 is not involved in any relevant interaction, we can conclude that this effect is present whatever the values of the other two reagents.

In what concerns the interaction of reagent B–reagent C, it can only be interpreted by looking at the isoresponse plot shown in Figure 3. As we are plotting the response on the plane defined by two variables (think of a slice of the cube depicted in Figure 1), we must define the level of the third variable (reagent A) at which we want to represent the response (i.e., where to cut the slice). The clear effect of reagent A (the lower, the better) leads us to the choice of setting the value of X1 at its lower level (−1, corresponding to 9 g). The geometrical shape of a linear model without interactions is a plane (the isoresponse lines are parallel); if relevant interactions are present, it becomes a distorted plane (the isoresponse lines are not parallel). This is the case of the response surface on the plane of reagent B–reagent C. By looking at the plot, it can be seen that an increase of reagent B decreases viscosity when reagent C is at its lower level, while it has the opposite effect when reagent C is at its higher level. In the same way, an increase of reagent C decreases viscosity when reagent B is at its lower level, while it has the opposite effect when reagent B is at its higher level.

Looking at the plot, it can also be understood why the OVAT approach did not produce any good result. If you go to the centre point (corresponding to the original formulation) and change the amount of either reagent B or reagent C (but not both at the same time), you will realize that, whatever experiment you will do, nothing will change. Instead, owing to the strong interaction, you only have relevant variations when you change both variables at the same time. Two combinations produce the same response: 3.6 g of reagent B and 9 g of reagent C, and 4.4 g of reagent B and 11 g of reagent C. As a higher amount of reagents increases the speed of the reaction, and therefore the final throughput, the latter has been selected and therefore the best combination is 9 g of reagent A, 4.4 g of reagent B and 11 g of reagent C. All the experiments were performed at lab scale, and therefore this formulation had to be tested at the plant. When doing it, the results obtained in the lab were confirmed, with a viscosity in the range 50.0–52.0 × 10³ mPa·s, well over the acceptability value.

FIGURE 3 Isoresponse plot of the polymer Factorial Design.

Happy but not totally satisfied, the person performing the experimental design tried one more experiment. The results of the experimental design showed that a decrease of reagent A was leading to better products, and that this variable was not involved in interactions with the other variables.

Of course, this behaviour was demonstrated only inside the experimental domain, but it could have been worthwhile to check if the effect was the same also outside it. The most logical development would have been to do a further experimental design centred on the new formulation, but she did not have enough time to do eight more experiments. So, she just tried to further reduce reagent A, and she tested the formulation with 7 g of reagent A, 4.4 g of reagent B and 11 g of reagent C. This experiment was a total success, as the product obtained at the plant had a viscosity in the range 55.0–60.0 × 10³ mPa·s, well above the acceptable value.

Of course, everybody in the company was very happy with the result—everybody except one person. Can you guess who? It was the expert in charge of the problem, who could not accept that somebody else could succeed with just nine experiments where he totally failed, in spite of having performed a huge number of experiments.

One more comment: the previous example is not an optimization.

Probably, if more experiments had been performed with more experimental designs, even better results could have been obtained. Anyway, the immediate goal of the company was not to find the optimum, but rather to get out of an embarrassing situation and to find a commercially valid solution as fast as possible, and the Factorial Design, the simplest of all the experimental designs, allowed getting a substantial improvement with a very limited experimental effort.

experimen-The main problem with the previous design was that, as there were nodegrees of freedom and no previous estimate of the experimental variablewas available, it was not possible to determine which coefficients were statis-tically significant

Furthermore, as in a 2^k Factorial Design each variable has two levels, only linear models (with interactions) can be estimated. In order to use them as predictive models they must be validated. To do that, an experiment (or, better, a set of experiments) is performed at the centre point. The experimental response is then compared with the predicted response (corresponding to the b0 coefficient). If the two values are not significantly different, then the model is said to be validated and therefore it can be used to predict the outcome of the experiments in the whole experimental domain. It has to be well understood that validating a model does not mean demonstrating that it is true; instead, validating a model means that it has not been possible to demonstrate that it is false. It is a subtle, but very relevant difference (the same as between being acquitted because it has been demonstrated that you are not guilty and being acquitted because it was not possible to demonstrate that you are guilty).

A group of crystallographers at NASA was interested in studying the effect of three variables (amount of precipitant, degree of supersaturation, amount of impurities) on the growth of the crystals of a protein [3]. The goal of the study was to obtain the largest possible crystal, and the measured response (to be minimized) was the logarithm of the average number of crystals obtained in different wells (the lower the number, the greater the crystals).

As a high variability was expected, each experiment had been run in duplicate; this also allowed a better estimate of the experimental variance. In order to validate the model, a centre point had also been added. The total number of experiments was 18, much fewer than what they were used to doing. Table 5 shows the experimental design, the experimental plan and the responses.

TABLE 5 Experimental Design, Experimental Plan and Responses of the NASA Factorial Design

X1 | X2 | X3 | Precipitant % (w/v) | Supersaturation ln(c/s) | Impurity % (w/w) | Log(crystal number)

The resulting model was the following:

Y = 1.65 − 0.15 X1 + 0.33 X2 + 0.16 X3 − 0.04 X1X2 + 0.03 X1X3 − 0.11 X2X3 + 0.03 X1X2X3

For each experiment two replicates were available, and therefore the experimental standard deviation could be computed as a pooled standard deviation from the nine pairs of replicates. This value was 0.125, with nine degrees of freedom (one from each pair).

The model matrix for this design is reported in Table 6 (it has to be noticed that it has only 16 rows, because the two experiments at the centre point are only used for validation, and are not taken into account for computing the coefficients).

TABLE 6 Model Matrix of the NASA Factorial Design

The model matrix is commonly denoted as X. By premultiplying it by its transpose and then taking the inverse of this product, the dispersion matrix is obtained (D = (X′X)⁻¹). The dispersion matrix is a square matrix having as many rows and as many columns as coefficients in the model (eight in our case, see Table 7). When multiplied by the experimental variance, the diagonal terms give the variance of the coefficients, while the extradiagonal terms give the covariance of the coefficients.

The fact that the dispersion matrix is diagonal means that there is no covariance among the coefficients, and therefore all of them can be computed independently from each other (it is an orthogonal design).

It can also be seen that all the elements of the diagonal are the same, meaning that all the coefficients are estimated with the same precision. This is not a surprise because, as we have previously seen, the estimation of the coefficients of a Factorial Design is performed in the same way for all of them (it is always the average of the response vector multiplied point to point by the corresponding vector of the model matrix, having as many ‘+1’ as ‘−1’ terms). More in detail, their value is 0.0625, which is 1/16. Generally speaking, the 2^k Factorial Designs in which all the experimental points have the same number of replicates are orthogonal designs producing a diagonal dispersion matrix with the diagonal terms being equal to 1/(number of experiments). It is clear now how (and how much) performing replicates improves the quality of the design by decreasing the standard deviation (and therefore the confidence interval) of the coefficients. And once more it has to be noted that every calculation we have done till now does not require any software.

TABLE 7 Dispersion Matrix of the NASA Factorial Design

As previously said, the variance of the coefficients can be computed by multiplying the experimental variance by the terms on the diagonal of the dispersion matrix. In our case, the standard deviation of the coefficients will be sqrt(0.125² × 0.0625) = 0.031. As the experimental variance has been estimated with nine degrees of freedom, the corresponding values of t are 2.26, 3.25 and 4.78 for p = 0.05, 0.01 and 0.001, respectively. Therefore, the semi-amplitude of the confidence interval is 0.07, 0.10 and 0.15 for p = 0.05, 0.01 and 0.001. Each coefficient can now be given its significance level, and the model can be written accordingly.
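The sketch below (an illustrative Python addition, not part of the chapter) reproduces these numbers for the duplicated 2³ design: the dispersion matrix comes out diagonal with 1/16 on the diagonal, and the coefficient confidence intervals follow from the pooled standard deviation of 0.125 with nine degrees of freedom.

```python
import numpy as np
from scipy import stats

# Duplicated 2^3 design: each of the 8 corner runs performed twice (16 rows)
base = np.array([
    [-1, -1, -1], [ 1, -1, -1], [-1,  1, -1], [ 1,  1, -1],
    [-1, -1,  1], [ 1, -1,  1], [-1,  1,  1], [ 1,  1,  1],
])
E = np.repeat(base, 2, axis=0)

# Model matrix: intercept, linear terms, two-factor and three-factor interactions
X = np.column_stack([
    np.ones(len(E)),
    E[:, 0], E[:, 1], E[:, 2],
    E[:, 0] * E[:, 1], E[:, 0] * E[:, 2], E[:, 1] * E[:, 2],
    E[:, 0] * E[:, 1] * E[:, 2],
])

# Dispersion matrix D = (X'X)^-1: diagonal, every diagonal element equal to 1/16
D = np.linalg.inv(X.T @ X)
print(np.diag(D))                           # [0.0625 0.0625 ... 0.0625]

# Coefficient standard deviation from the pooled experimental SD (0.125, 9 d.o.f.)
s_exp, dof = 0.125, 9
s_coeff = np.sqrt(s_exp**2 * np.diag(D))    # ~0.031 for every coefficient

# Semi-amplitude of the confidence intervals at p = 0.05, 0.01 and 0.001
for p in (0.05, 0.01, 0.001):
    t = stats.t.ppf(1 - p / 2, dof)
    print(f"p = {p}: t = {t:.2f}, half-width = {t * s_coeff[0]:.2f}")
# t = 2.26, 3.25, 4.78  ->  half-widths of about 0.07, 0.10 and 0.15
```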

The relevant X2–X3 interaction can be interpreted by looking at the isoresponse plot (Figure 4). From this it can be seen that the best condition corresponds to no impurity and low supersaturation (agreeing with the fact that both variables have negative coefficients), but it is also clear that at lower supersaturation the effect of the impurity is quite relevant, while at higher supersaturation the impurity has no effect. The other way round, the effect of supersaturation is much higher when no impurity is present.

FIGURE 4 Isoresponse plot of the NASA Factorial Design.

In order to validate the model it is required to compare the predicted response at the test point with the experimental value.

The predicted response is 1.65. The experimental values of the two replicates are 1.75 and 1.76, and therefore the average value is 1.76. The experimental standard deviation (see above) is 0.125, with nine d.o.f. The semi-amplitude of the confidence interval of the mean is t · s/sqrt(n), where in our case t(0.05, 9) is 2.26 and n is 2 (two replicates have been performed). It should be noted that the number of d.o.f. for t is related to how the standard deviation has been estimated (in our case, it was the pooled standard deviation of nine pairs of replicates) and has nothing to do with the value of n. So, it is 2.26 × 0.125/1.41 = 0.20.

The experimental value at the centre point is 1.76 ± 0.20, which is not significantly different from the predicted value (1.65); the model is validated and can be used in the whole experimental domain. Once more, this does not mean that the model is true; it simply means that the difference between the ‘truth’ and the model is not larger than the experimental variability, and therefore we can use the model as a good approximation of the reality.
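A minimal Python sketch of this validation check (an illustrative addition; the numbers are the ones quoted above) is:

```python
import numpy as np
from scipy import stats

predicted_b0 = 1.65                     # model prediction at the centre point
centre_reps = np.array([1.75, 1.76])    # the two centre-point replicates
s_exp, dof = 0.125, 9                   # pooled SD of the nine duplicate pairs

mean_centre = centre_reps.mean()                                           # 1.755 ~ 1.76
half_width = stats.t.ppf(0.975, dof) * s_exp / np.sqrt(len(centre_reps))   # ~0.20

# The model is considered validated if the prediction lies inside the interval
validated = abs(mean_centre - predicted_b0) <= half_width
print(f"{mean_centre:.2f} ± {half_width:.2f}, validated: {validated}")     # True
```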

It is clear now that in the validation of a model the experimental variability plays a very important role, and that the higher the experimental variability, the easier it will be to validate a model. If the response has a very small variability (e.g., the elution time of a chromatographic peak), the confidence interval of the experimental test value will be very small, and therefore it will be more difficult for the model to be statistically validated. Though it can seem counterintuitive, the worse the quality of the response (in terms of experimental variability), the easier it will be for the model to be validated. Instead, if the experimental variability is small, the confidence interval of the experimental value will also be very small, and then even very small differences between the experimental value and the predicted value will be statistically significant, meaning that the model will not be validated.

This is something that must be well understood. Having a non-statistically validated model does not mean that the same model cannot be useful. It can be that the difference between the predicted value and the experimental value is so small and totally negligible from a practical point of view that the model, in spite of being non-validated from a purely statistical point of view, can be used anyway.

3 PLACKETT–BURMAN DESIGNS

A company producing brake pads selected 11 variables as having a possible effect on the quality of the final product. As a first screening they were interested in sorting out which of these variables actually had an effect (or, better, to remove those variables that did not). Table 8 reports the selected variables and the levels under study. It is clear that an approach such as the Factorial Design previously described is totally inapplicable (2¹¹ = 2048 experiments!).

On the other hand, the Factorial Design allows estimating linear terms and all the interactions among variables, which is way too much compared to the goal we are interested in at this stage (just deciding which variables are important). Instead, a Plackett–Burman Design [4] only requires a number of experiments equal to the first multiple of 4 greater than the number of variables. So, in our case, it will be just 12 experiments!
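Such a 12-run design can be written down directly from a cyclic generator. The Python sketch below is an illustrative addition: the generator row is the one commonly tabulated for the 12-run Plackett–Burman design (an assumption taken from standard design tables, not from this chapter).

```python
import numpy as np

# Commonly tabulated generator row for the 12-run Plackett-Burman design
gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

# Eleven rows by cyclic shifts of the generator, plus a closing row of all -1
pb12 = np.vstack([np.roll(gen, i) for i in range(11)] + [-np.ones(11, dtype=int)])

print(pb12.shape)        # (12, 11): 12 experiments for 11 two-level variables
print(pb12.sum(axis=0))  # every column has six +1 and six -1, so each sums to 0
```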

In the examples of the previous section all the variables were quantitative, that is, all of them could assume every possible numerical value in the range of interest. Typical quantitative variables are time, temperature, pressure, amount of reagent, speed, flow and so on. From Table 8, it can be seen that some of them (e.g., resin type, press type) can be described by a label, not a number. These are qualitative variables, such as operator, type of column, type of reagent, origin of a raw material and so on. For these variables, though a ‘numerical’ label can be applied (e.g., operator 1, operator 2, operator 3, etc.), there is no correspondence at all with a real numerical value. So, if we say that the reaction time 2 h is midway between 1 and 3 h, we cannot obviously say that operator 2 is midway between operator 1 and operator 3.

In both the Factorial Design and the Plackett–Burman Design all the variables are studied at two levels. An interesting property is that these designs can be applied to both types of variables. In the case of quantitative variables the ‘−1’ level is usually (but not always) assigned to the lower level and the ‘+1’ to the higher level; in the case of qualitative variables the ‘−1’ and ‘+1’ levels are arbitrarily assigned.

Table 9 shows the experimental matrix for a Plackett–Burman Design with 11 variables and the response (compressibility, to be minimized). It can be seen that each column has 6 ‘−’ and 6 ‘+’, meaning that each variable will have one half of the experiments performed at the ‘−’ level and one half of the experiments performed at the ‘+’ level. Again, as in the Factorial Design, the effect of each variable will be easily computed by calculating the algebraic sum of the responses, each with the appropriate sign. This means that the effect of each variable will be derived from the comparison of the

TABLE 8 The 11 Variables Studied in the Plackett–Burman Design

10 Pressure at high temperature Low High

11 Pressure at low temperature Low High
