The North American Chapter of the
International Chemometrics Society
S.D. Brown, Course Accreditation
Department of Chemistry and
Biochemistry
University of Delaware
Newark, DE 19716, USA
sdb@brahms.UDel.edu
B. Vandeginste, Chemometric Abstracts
Unilever Research Laboratory
NAmICS Newsletter #9, September 1994
Pirouette aim high; they seek to design a powerful, yet user-friendly software tool that can tackle the most frequent types of chemometric investigations. Infometrix achieves this ambitious goal by judiciously selecting techniques and then carefully implementing them. Much thought has gone into the development of this high quality product and it shows.

clustering and principal component analysis; the classification module offers K-nearest neighbors and SIMCA; the calibration module contains PLS and PCR routines. The underlying chemometric theory is solid. Necessary options are available and
From the Editor's Desk
It's a pleasure to bring you the eighth edition of the newsletter of the North American Chapter of the International Chemometrics Society. Dave Duewer and Dora Schnur roped me into, er, ah, asked me to guest-edit this issue. It's the one you've all been waiting for, yes, the Election Issue! There's quite a line-up of candidates waiting for you to cast your vote, so don't delay in returning your ballot (p. 20). All opinions expressed herein are solely those of contributing individuals; their institutions bear none of the blame.
Deborah Illman, Guest Editor
In this issue:
Candidate's Statements, 18
Ballot, 20
Miss Prim, 3
Seasholtz waxes philosophic, 4
Happy Birthday NAmICS, 5
Education, 8
Letters, 9
Vendor Information on List-Serve, 10
Calendar, 12
Chemometrics On-Line Conference, 13
spreadsheet holds the data. Generating and managing data subsets is easy and efficient

can be zoomed to full screen or resized. The toolbar also provides the means to spin 3D plots, magnify 2D plots, and identify points in plots.

Pirouette provides what it calls "array plots" for certain objects. Consider a SIMCA model containing three classes. The scores window will contain three miniature score plots; each can be successively zoomed to fill the quadrant by double-clicking. These miniatures can be surprisingly informative. Other miniatures (called multiplots) of raw data show up to 231 pairwise variable plots (i.e., 22 variables worth); linear correlations are immediately visible even in the reduced form. This variety of data views and the advantageous use of color facilitate the tedious, yet necessary, process of examining a large data set.
The installation procedure is automatic. In a few cases, some customization of config.sys and autoexec.bat might be necessary, but these matters are spelled out in the manual. The program accesses at most 16 MB of memory and requires at minimum a 386 computer with 4 MB of memory and 5 MB of hard disk space, with an EGA or VGA adapter and mouse. A math co-processor is strongly recommended, as is more memory. I ran Pirouette on a 386sx (20 MHz, 4 MB, math co-processor) and a 486dx (66 MHz, 16 MB); the times given below correspond to the flashier hardware unless explicitly noted. The program's worksheet can hold up to 8000 samples or variables, with the limit of the combination being determined by the available memory. Extracting 10 principal components from a 75 sample/66 variable data set took less than 15 sec.
Pirouette employs data linking in two imaginative ways. First, in SIMCA, PCR, and PLS, the number of model factors is linked to related objects (plots of modeling power, residuals, predictions, leverage, etc.). Thus, with a plot of eigenvalues vs. number of factors in one window and up to three linked objects in other windows, clicking on the desired number of factors in the eigenvalue plot triggers an immediate update of the results in the remaining windows. The other type of data linking allows a user to select a set of samples or variables in one view of the data and see those selections highlighted in another view. This greatly simplifies the inspection and/or deletion of outliers. They can be highlighted in the residuals plot using a rubberband box, examined in the prediction plot, and then, with a single keystroke, excluded to form a new subset. This feature, coupled with the program's computational speed, makes it realistic to investigate and compare many subsets. For example, for my 75x66 data matrix, I could delete a few variables, re-run PCA, and compare the eigenvalue plots in less than 20 sec.

Besides its own data format, Pirouette supports ASCII and WKS formats for both input and output. I had no problem getting data into the program. I occasionally exported data in WKS format, loaded the file into a spreadsheet program, sorted the data, and so forth, and then re-read the file back into Pirouette. Bundled with Pirouette is MasterKey, a utility which translates data files produced by a variety of commercial instruments into Pirouette, ASCII, and WKS formats. To export only certain results, the user can choose to save the contents of an active window. This is handy for transferring
discusses the theory behind the methods

Pirouette is its printing capabilities. For the record, I have used (or tried to use) the following printers: HP PaintJet (color), HP LaserJet II, Apple LaserWriter IINT, and two other PostScript printers whose names
it is not surprising that the process is rather slow on aging hardware; it took my 386sx about 2 min/page for the LaserJet II. (I can't give a time for the 486dx because only PostScript printers were available on that system.) It is also distressing that offline devices are not recognized as such. For those working in a PostScript environment, the otherwise fine performance of the software is compromised by printing difficulties.

The program offers the option of printing to a file or saving TIFF images, which is a fast way around printing out of Pirouette. It took less than 15 sec for the 486 tif file-write. Perhaps Windows users imagine saving TIFFs, switching into a Windows program that recognizes the format, printing via the Print Manager, and switching back to Pirouette. Good idea, except for the switching part. Pirouette CAN run inside Windows (but only in standard mode, and some applications don't like this), but it cannot task switch.

Infometrix is aware of the PostScript printing problems and is taking steps to address them. A Windows version of the program is due by the end of the year. Its release should make both the print speed and printer driver issues moot. I found the staff at Infometrix EXTREMELY responsive and knowledgeable. My phone calls and email messages were dealt with in a timely and professional fashion.
Overall, Pirouette is a very impressive product. It IS expensive (list $4000 with a 40% discount for academic users) but let's face it, good tools are never cheap! A free, almost fully functional demo is available so interested parties can investigate the program for themselves. Since Infometrix will customize a demo containing your own data, this is a no-risk way to get a real feel for Pirouette and see how it works with your system.

For further information:
Infometrix, Inc.
2200 Sixth Avenue, Suite 833
Seattle, WA 98121
phone: 206-441-4696
fax: 206-441-0841
internet: infomtrx@halcyon.com
Ask Miss Prim
Dear Miss Prim,
My name is Eddie an my nayburhood is gonna becum a enterprise zone. I wanna start a kemmometrics kumpany wit my pals on da street. What shud we do?
Signed,
Eddie and a Latin Squa
Gentle (sic) Reader: Perhaps you should look into a more lucrative business like statistical consulting. There are already gangs of unemployed chemometricians roaming the country looking for jobs. In fact, there is an international crisis, with svante or eighty such gangs throughout the world. These people are mean (not average), sum are squared, and many are in analysis. Do a target transformation on your goals.
[Questions for Miss Prim (Clare Gerlach) may be sent in care of the Editor-in-Chief.]
A rose by any other name…
Mary Beth Seasholtz, mseasholtz@dow.com, (517)636-3646

The field of chemometrics is fortunate enough to have progressed to the point where there are multiple generations of ideas. As a graduate student in 1989 eager to learn the tools of the trade, I was confronted with two sets of ‘generations’ of equations describing PCR (and who knows how many for PLS, but that is another story!). This newsletter seemed a good place to present a few lines which demonstrate the equivalence of the two approaches, and to touch on the historical context which led to the move. Please forgive the omission of the multitude of appropriate references – Deborah only gave me one page.
The older of the two approaches begins with assuming $R = TP^T$, where $T$ has orthogonal columns and $P$ has orthonormal columns. In the mid 1960's when chemometrics was born, there were a few methods available for calculating $T$ and $P$. One was the not so well behaved NIPALS algorithm. Alternatively, $T$ could be obtained by solving for the eigenvalues and eigenvectors of $RR^T$ (a square symmetric matrix), and then $P$ could be estimated given $R$ and $T$. The symmetric eigenvalue problem was studied for many years; the most famous book on the subject was published in 1965 by J.H. Wilkinson (The Algebraic Eigenvalue Problem). However, computational algorithms were not in high demand, as computers of the 60's certainly were not what they are today. The calibration problem then was $c = \tilde{T}x$ (~ indicates truncation). Solving for $x$ via the normal equations gives $\hat{x} = (\tilde{T}^T\tilde{T})^{-1}\tilde{T}^T c$, giving

$$\hat{c}_{un} = t_{un}^T \hat{x} = r_{un}^T \tilde{P}\,(\tilde{T}^T\tilde{T})^{-1}\tilde{T}^T c \qquad (1)$$

In 1969 Gene Golub made the singular value decomposition (svd) an algorithmic reality. It was long known that an arbitrary matrix could be written as the product of three matrices, $R = USV^T$, where $U$ = eigenvectors of $RR^T$, $V$ = eigenvectors of $R^TR$, and the diagonal of $S^2$ holds the nonzero eigenvalues of $R^TR$ and $RR^T$ (they are the same). But, until Gene and his coworkers came on the scene there was not a direct calculation (you had to go through the covariance matrices as described above). With the advent of the widespread availability of the svd (and other useful code) through facilities like LINPACK, EISPACK, Numerical Recipes and Matlab, the PCR story has since evolved and a new word is being used: pseudoinverse. The calibration equation now reads $c = Rb$, and $\hat{b} = R^+ c$, where $R^+ = \tilde{V}\tilde{S}^{-1}\tilde{U}^T$ is the pseudoinverse. Prediction is simply

$$\hat{c}_{un} = r_{un}^T \hat{b} = r_{un}^T \tilde{V}\tilde{S}^{-1}\tilde{U}^T c \qquad (2)$$

Equation (2) sure looks different from (1)! Well, recall $T$ could be calculated from an eigenvector problem of $RR^T$ … in fact $T = US$ and $P = V$. Making these substitutions into (1) gives

$$\hat{c}_{un} = r_{un}^T \tilde{V}\,\big((\tilde{U}\tilde{S})^T(\tilde{U}\tilde{S})\big)^{-1}(\tilde{U}\tilde{S})^T c$$

which is equation (2), after reduction using standard linear algebra rules: since $\tilde{U}^T\tilde{U} = I$, we have $(\tilde{U}\tilde{S})^T(\tilde{U}\tilde{S}) = \tilde{S}^2$, so the middle factors collapse to $\tilde{S}^{-2}\tilde{S}\tilde{U}^T = \tilde{S}^{-1}\tilde{U}^T$.
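The equivalence of the two PCR formulations can also be checked numerically. The following is a minimal sketch in Python with NumPy, not anything taken from the original article: the matrix sizes, the random data, the number of retained factors `k`, and all variable names are assumptions chosen for illustration. It builds the scores from the eigenvectors of $RR^T$ (the older route) and compares the resulting prediction with the truncated-SVD pseudoinverse route.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set (sizes are arbitrary, chosen for illustration):
# 20 samples, 8 response channels, k = 3 factors retained.
R = rng.normal(size=(20, 8))      # response matrix
c = rng.normal(size=20)           # known concentrations
r_un = rng.normal(size=8)         # response vector of an unknown sample
k = 3

# Older route: scores from the eigenvectors of R R^T.
evals, evecs = np.linalg.eigh(R @ R.T)        # eigenvalues in ascending order
order = np.argsort(evals)[::-1][:k]           # indices of the k largest
T = evecs[:, order] * np.sqrt(evals[order])   # T = U S (orthogonal columns)
P = R.T @ T @ np.linalg.inv(T.T @ T)          # P estimated from R and T
x_hat = np.linalg.solve(T.T @ T, T.T @ c)     # normal equations
c_hat_1 = r_un @ P @ x_hat                    # prediction, older form

# Newer route: truncated SVD and pseudoinverse.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
b_hat = Vt[:k].T @ ((U[:, :k].T @ c) / s[:k])  # b = V S^-1 U^T c
c_hat_2 = r_un @ b_hat                         # prediction, pseudoinverse form

print("older form:         ", c_hat_1)
print("pseudoinverse form: ", c_hat_2)
print("agree:", np.isclose(c_hat_1, c_hat_2))
```

The sign ambiguity of the eigenvectors does not matter here, because a sign flip in a column of `T` flips the matching column of `P` and the matching element of `x_hat`, leaving the prediction unchanged.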
As can be seen, it was some relatively new technology in the area of numerical linear algebra which gave rise to the new look for PCR. In addition to all the other things that keep us busy, I think we must be as diligent as we can to continue to bring into chemometrics new developments from disciplines such as applied mathematics, statistics and numerical analysis.
See what too much tequila can do?

Happy 20th Birthday to the Chemometrics Society! It was on June 10, 1974 that the Laboratory for Chemometrics met with Svante Wold in Seattle over some great Mexican food and too much tequila and formed the Society. Our focus was on improving communication between chemists, statisticians and mathematicians. We also wanted all chemists to be about 10% chemometricians to ensure that experiments would be designed optimally and all information would be extracted from chemical measurements. Well, the Mexican restaurant is no longer in business, but the Society and field of chemometrics are alive and doing very well.
I was reading Chemometrics Society Newsletter Number 1 (I have a complete set), published in January 1976, and it reported 101 members worldwide, with half of them owning the program ARTHUR, which some of you may remember. The newsletter announced that the second FACSS meeting had an attendance of 200 at the "Chemometrics in Analytical Chemistry" session, with papers from Wold, Deming, Horlick, Duewer and Kowalski. Also announced was the "Chemometrics: Theory and Applications" session at the Summer 1976 ACS meeting in San Francisco that later produced the first book on chemometrics. The newsletter ended with comments, requests and suggestions from Richard Cramer, Ken Loach and Harold Martens. So much for ancient history.
Two journals, thousands of papers and reviews, and dozens of books later, we find ourselves today with rich areas of application, powerful chemometrics tools and essentially infinite computer power. We are very busy scientists. Also, scientists, statisticians, mathematicians and even chemical engineers have discovered chemometrics and the race is on. What will our science be like in the next century, the year 2000?

Allow me to make a few predictions. You can use leave-one-out cross-validation to estimate the PRESS, SEP or RHSCV if you choose. First and foremost, we should all see the necessity to have chemometrics permeate the formal education of all chemists, not only with graduate level courses, training courses and workshops but also at the beginning levels of chemical education. The old "scientific method" that relies on a lot of theory and few definite measurements must die. It should be replaced with equal amounts of theory, experimentation, measurements and simulation and emphasize the multivariate nature of the world around us. There is no place for univariate thinking in our multivariate, dynamic world.

Next, chemometrics will no longer be just a collection of our data analysis methods. The tools of chemometrics will spawn new measurement theories that will guide chemists in all areas of research. To this end a young chemometrician, Karl Booksh, and I offer a special report in the August issue of ANALYTICAL CHEMISTRY titled "Theory of Analytical Chemistry." I invite you to read this paper and incorporate it in your research and education activities. I also encourage you to expand this theory and move it into areas of chemistry beyond chemical analysis.
Finally, the tools of chemometrics will move from mathematics and software to firmware so as to be transparent to the user and very easy to use and hard to abuse. Our current software, while a great improvement over ARTHUR and SIMCA, is still too difficult to use. The younger generation of chemists have good backgrounds in linear algebra and have little difficulty with multivariate methods. However, the older generation that will be with us into the next century doesn't understand our methods and therefore prefers to separate one peak from all the rest or correlate one molecular property at a time to molecular activity, thereby missing the most important part of nature, covariance. We must accept the responsibility to make it easy for all chemists to incorporate multivariate methods into their work. Use this as an analogy. We are all expert users of TVs, VCRs, cellular phones and the like, but how many of us are truly familiar with the complex subsystems of these devices? To really be of use, our methods must be integrated into instruments and experiments to the point of being transparent.

I'll stop here. Many of you have your own vision of the future of chemometrics that I for one would surely love to hear. Perhaps you can use the newsletter as your vehicle to share your thoughts with the rest of us and thereby continue a twenty year tradition.
Q: How many Seattlites does it take to screw in a light bulb?
A: Two. One to change the bulb and one to hold both lattes.
Q: How many Zen masters does it take to screw in a light bulb?
A: A tree in a golden forest.
Q: How many surrealists does it take to change a light bulb?
A: Two: one to hold the giraffe, and the other to fill the bathtub with brightly colored machine tools.
Q: How many IBM types does it take to change a light bulb?
A: 100. Ten to do it and 90 to write document number GC7500439-0003, Multitasking Incandescent Source System Facility, of which 10% of the pages state only “This page intentionally left blank.”
Q: How many Vulcans does it take to change a light bulb?
A: Approximately 1.00000000000000000000000
Q: How many existentialists does it take to screw in
Open a Window on Chemometrics
by
Judith Barnsby
BARNSBYJ@rsc.org
Window on Chemometrics is a new monthly publication reporting on the latest work in the computer handling of analytical data. It covers the science of chemometrics and its applications in spectroscopy, chromatography, and other analytical techniques.
Produced by scanning the international scientific
literature, including the major chemometrics and
analytical chemistry publications, Window on
Chemometrics gives reports of developments in the
following key areas:
General Techniques & Statistics
Calibration & Validation
Computer Programs, Expert Systems and
Applications
Spectrometry
Chromatography
Other Analytical Techniques
Each report gives title, detailed abstract and
Second order analytical instruments or instrumental methods (i.e., those that give a response matrix when analyzing a pure analyte) have the advantage of the ability to analyze mixtures which contain unknown interferences. However, this advantage can be lost if a suitable calibration method is not used. A medium-rank second-order calibration method is proposed (full details given), based on least-squares restricted Tucker models. With this method the second-order advantage is retained.
Ordering details:
Window on Chemometrics ISSN 0966-9086
12 issues per year
1994 subscription prices:
US $162.00
Canada £95.00 (+GST)
EC & Rest of World £90.00

Please order from:
The Royal Society of Chemistry
Turpin Distribution Services Ltd
Blackhorse Road
Letchworth, Herts SG6 1HN
UK
Tel: +44 (0)462 672555
Fax: +44 (0)462 480947

For further details and a sample copy of Window on Chemometrics, please contact:
Judith Barnsby
The Royal Society of Chemistry
Thomas Graham House
Science Park, Milton Road
Cambridge CB4 4WF
UK
Tel: +44 (0)223 420066 (Toll free in US: 1-800-473-9234)
Fax: +44 (0)223 423429
E-mail: barnsbyj@rsc.org
Chemometrics is for undergraduates, too
by Nick C. Thanasoulias
[Thanasoulias writes to us from the Chemistry Department at the University of Ioannina, Greece. Ed.]
Many colleagues who teach in higher education institutions keep wondering whether undergraduate students should be presented with chemometrics or not. Common questions involve the student's background in mathematics, knowledge of computers and ability to catch the idea behind the calculations that chemometrics requires. One common argument is that chemometrics is needed only if you plan to do research. So why not stick to the old familiar Gaussian least squares?

However, the fact is that every day we are more and more confronted with the problem of agreement between results of different laboratories. Most industrial processes require pilot experiments which can be performed only by means of applying statistical and mathematical techniques to chemical problems. Quality control, especially in the clinical laboratory, is another major application of chemometrics.

It is more than certain that chemometrics cannot be ignored. In my opinion it should be taught at the undergraduate level and students should become familiar with as many statistical techniques as possible.

At the University of Ioannina, Greece, chemometrics is taught during the last semester of the fourth year. Since students are able to choose their classes during that semester, chemometrics teaching can be modified according to the number of students attending the class. It usually consists of two modules: one theoretical and one experimental. The theoretical part includes some six to ten hours of classroom teaching and the chapters covered include: fundamentals of statistics, significance testing, analysis of variance, experimental errors, simple and multiple linear regression, factorial design and cumulative sum techniques.
The second module consists of a series of experimental classes which are designed so that they resemble a "miniature" thesis preparation. Each student is asked to apply to a real chemical problem some of the theoretical principles learned so far. The outline is as follows:

Definition of the chemical problem;

Literature study related to the chosen chemical problem (usually covers the last 2-5 years and students are asked to concentrate on review papers);

A normality test is applied to a set of repeated measurements in order for the student to decide whether to use parametric or non-parametric techniques. We usually take care that parametric statistics can be applied since students are not familiar with non-parametric methods and algorithms;

Classic one-at-a-time factor-change experiments are performed, in which change in an appropriately chosen variable (the response) as a function of one factor is followed while the other factors are kept constant. The choice of the factors and response relies on previous knowledge (usually from the literature study) and care is taken not to include more than one factor which may not have a significant effect on the response. The results are treated with the usual regression techniques;

One-way ANOVA tests are carried out for each factor in turn for the student to get an idea of how significant the effect of each factor may be;

Up to four factors are selected in a complete factorial design (five, if the measurements are very easy to perform). The factors are chosen in two levels and all the trials, consisting of all the possible combinations, are carried out twice for an estimation of the effects and the residual variance;

Finally, the student presents a report that follows closely the principles of scientific paper writing and is asked to make conclusions such as the suitability of the system for analytical purposes (limit of detection, limit of determination, sensitivity) and to propose ways of maximizing system response.
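The complete factorial step in the outline above (two levels per factor, every combination run twice) can be sketched as follows. This is an illustration only, not the course's actual procedure: the three factor names, the effect sizes and the responses are synthetic and invented for the example, and Python with NumPy is simply a convenient choice. The sketch estimates each main effect as the mean response at the high level minus the mean response at the low level, and pools the duplicate runs to estimate the residual variance.

```python
from itertools import product

import numpy as np

# Hypothetical factors; a real study would use e.g. pH, temperature, etc.
factors = ["A", "B", "C"]

# Full two-level design in coded units (-1 = low, +1 = high): 2^3 = 8 runs.
design = np.array(list(product([-1, 1], repeat=len(factors))))

# Synthetic duplicate responses, generated for illustration only.
rng = np.random.default_rng(1)
assumed_effects = np.array([4.0, -2.0, 0.0])        # invented "true" effects
signal = 10.0 + design @ (assumed_effects / 2)
y = signal[:, None] + rng.normal(scale=0.5, size=(len(design), 2))

# Main effect = mean response at high level minus mean at low level.
run_means = y.mean(axis=1)
effects = {
    name: run_means[design[:, j] == 1].mean()
    - run_means[design[:, j] == -1].mean()
    for j, name in enumerate(factors)
}

# Pooled residual variance from duplicates: each run contributes
# (y1 - y2)^2 / 2 with one degree of freedom.
residual_var = np.sum((y[:, 0] - y[:, 1]) ** 2 / 2) / len(design)

# An effect is a difference of two means of N/2 observations each,
# so its variance is 4 * sigma^2 / N with N = total number of trials.
se_effect = np.sqrt(4 * residual_var / y.size)

for name, value in effects.items():
    print(f"effect of {name}: {value:+.2f} (standard error {se_effect:.2f})")
print(f"residual variance: {residual_var:.3f}")
```

Comparing each estimated effect with its standard error gives the student a first indication of which factors matter before any formal significance test is applied.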
So far, we have tried systems such as the oxidation of pyrogallol by various oxidants with chemiluminescence detection, the determination of gallic acid and tannins, the correlation of analytical methods for determination of glucose content of foodstuffs, and the synthesis of zeolites and their ion-exchange capability, among others.
Letters to the Editor
Communication gap between QSAR and Physiological Modelers: What to do?
Dear Editor,
… an increasing percentage of activities supported by public and private sector organizations are framed against the complexities associated with assessing and managing the many forms of risk as may be posed to humans and the environment by chemical, biological, and physical agents.

A "Workshop on Decision Support Methodologies for Human Health Risk Assessment of Toxic Substances" was held November last year. The focus of the Workshop was on the status, direction, and utility of models described as PBPKPD (Physiologically Based / Pharmacokinetic / Pharmacodynamic means to model metabolic disposition of chemical substances) and similar discussions on efforts in QSAR (Quantitative Structure Activity Relationships).

The Workshop was funded by: Agency for Toxic Substances and Disease Registry (ATSDR), National Institute of Environmental Health Sciences (NIEHS), National Cancer Institute's (NCI) Division of Cancer Etiology (DCE), Environmental Protection Agency (EPA), Wright Patterson Air Force Base, Toxicology Division, and the National Library of Medicine (NLM).
I had two roles, one as a member of the Workshop's steering committee and one as a wrap-up speaker. My presentation focused on the array of data and information resources needed to efficiently and effectively support the development, testing, application, and validation of means to model the effect of chemical, biological, and physical agents on biological systems. The resources generally useful in organizational decision making are relevant to assessing and managing the many forms of risk.

What was distressing to me and others at the Workshop was the almost complete lack of communication between the PBPKPD and QSAR modelers, which raises the challenge as to what should be done to ensure that understanding evolves of the inter-relationships of these models.

The steering committee will be maintained, and I will continue as a member. I would like to be able to use it as a platform to identify needs for data and information resources which so far have not been identified, and to use this platform to help prioritize forms of such resources to support scientific efforts.

Sidney Siegel, Ph.D.
Chief, Office of Hazardous Substances Information
301-496-5022; FAX 301-480-3537
Expert System Available
An expert system is now available, at no cost, which will determine chemical class, molecular weight and target compound identity from low resolution mass spectra. The target compounds are 75 volatile toxic and related compounds. Class and MW information is valid for compounds other than the target set. A description is provided in D.R. Scott, Anal. Chim. Acta, 285, 209-222 (1994), and in a forthcoming paper by D.R. Scott in Chemometrics and Intell. Lab. Sys., accepted February 1994. For a copy of the program and instructions, send a 3.5 or 5.25-inch MS-DOS formatted diskette to D.R. Scott, AREAL, MD-77, U.S. EPA, Research Triangle Park, N.C. 27711, USA.
News from the President-Elect
Vendor information on List-Serve
Message to List-Serve Readers:
An important role of NAmICS is to provide a conduit for communication between those who create and/or apply chemometrics and those who provide the instrumentation and software to implement it. In the case of software we have attempted to do this through reviews in our newsletter.

Unfortunately this approach takes a great deal of time. I spent about 60 hours on the review that I wrote for the newsletter. Although I personally hope that others will offer reviews of software and instrumentation that they use, we have had little success in finding those willing to do so.

It has been suggested that we allow software and hardware companies an opportunity to present their products to our members. We, the officers of NAmICS, agree with this suggestion, but wish to avoid long sales pitches or monologues. Yet it would be helpful if the vendors provided the philosophy, scope, contents, references, prices, and purchasing procedures for their products. It will take some trial and error and feedback from the membership in order to define the line between useful information and annoying advertisements.

There is nothing to prevent any member of the list-server from posting an advertisement (other than the VERY real risk of alienating possible customers). For this reason, we suggest that presentations first be sent to me for initial screening and negotiated editing.

Our first attempt at providing information about a software package will follow this message. Please let me know what you think about its form and presentation. In making these comments please also remember that you signed onto this list-serve in order to be kept current in the development and application of chemometrics.
Donald Dahlberg, Ph.D.
Department of Chemistry
Lebanon Valley College
Annville, PA 17003-0501
office: (717)867-6143
fax: (717)867-6124
E-Mail: Dahlberg@ACAD.LVC.EDU
Chemometrics Software Upgrade
PLS_Toolbox Version 1.4 For Use with MATLAB*
Now Available
Submitted by Barry M. Wise
***************************************
I am pleased to announce that the PLS_Toolbox Version 1.4 is now available. This upgrade of the PLS_Toolbox is the most extensive since it was first released in 1991. The toolbox is now completely compatible with MATLAB 4.x, and takes advantage of many of the new MATLAB features. In addition, I am now offering technical support for the PLS_Toolbox through my home e-mail account and home FAX.

Many new functions have been added to the toolbox, and others have been improved considerably. Some examples are:
* Multivariate instrument standardization with additive background correction
* Two and three dimensional scores and loadings plots with labelled points
* Locally weighted regression with y-distance weighting
* K-means statistical cluster analysis with dendrograms
* Savitzky-Golay smoothing and derivatives