A.6 Data Constructs in MATLAB
Basic Data Constructs
We do not cover the object-oriented aspects of MATLAB here. Thus, we are
concerned mostly with data that are floating point (type double) or strings (type char). The elements in the arrays will be of these two data types.
The fundamental data element in MATLAB is an array. Arrays can be:
• The empty array created using [ ]
• A scalar, which is a 1 × 1 array
• A row vector, which is a 1 × n array
• A column vector, which is an n × 1 array
• A matrix with two dimensions, say m × n or n × n
• A multi-dimensional array, say m × ... × n
Arrays must always be dimensionally conformal and all elements must be
of the same data type. In other words, a 2 × 3 matrix must have 3 elements (e.g., numbers) on each of its 2 rows. Table A.5 gives examples of how to access elements of arrays.
Building Arrays
In most cases, the statistician or engineer will be using outside data in an
analysis, so the data would be imported into MATLAB using load or some
other method described previously. Sometimes, we need to type in simple arrays for testing code or entering parameters, etc. Here we cover some of the ways to build small arrays. Note that this can also be used to concatenate arrays.
Commas or spaces concatenate elements (which can be arrays) as columns. Thus, separating elements with commas or spaces yields a row vector.
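A minimal sketch of these concatenation rules follows. The variable names are illustrative only, and the semicolon form for stacking rows is standard MATLAB syntax that is included here as an assumption about what a fuller treatment would cover.

```matlab
% Commas or spaces join elements side by side (as columns).
r1 = [1, 4, 5];        % 1 x 3 row vector
r2 = [1 4 5];          % same result using spaces

% Semicolons stack elements as rows.
c  = [1; 4; 5];        % 3 x 1 column vector

% The same operators concatenate whole arrays.
wide = [r1, r2];       % 1 x 6 row vector
tall = [r1; r2];       % 2 x 3 matrix
```

Arrays joined with commas must have the same number of rows, and arrays joined with semicolons must have the same number of columns.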
Cell Arrays
Cell arrays and structures allow for more flexibility. Cell arrays can have elements that contain any data type (even other cell arrays), and they can be of different sizes. The cell array has an overall structure that is similar to the basic data arrays. For instance, the cells are arranged in dimensions (rows, columns, etc.). If we have a 2 × 3 cell array, then each of its 2 rows has to have 3 cells. However, the content of the cells can be different sizes and can contain
different types of data. One cell might contain char data, another double,
and some can be empty. Mathematical operations are not defined on cell arrays.
The parenthesis notation shown in Table A.5 applies to arrays, which can be cell arrays or basic arrays. With cell arrays, this accesses the cell element, but not the contents of the cells. Curly braces, { }, are used to get to the elements inside the cell. For example, A{1,1} would give us the contents of the cell (type double or char), whereas A(1,1) is the cell itself and has data type cell. The two notations can be combined to access part of the contents of a cell. To get the first two elements of the contents of A{1,1}, assuming it contains a vector, we can use A{1,1}(1:2).

Functions for building special arrays:
zeros, ones     These build arrays containing all 0's or all 1's, respectively.
rand, randn     These build arrays containing uniform (0,1) random variables or standard normal random variables, respectively.
eye             This creates an identity matrix.
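The difference between the two kinds of indexing can be sketched as follows. The cell array A and its contents are made up for illustration.

```matlab
% A 2 x 3 cell array whose cells hold different types and sizes.
A = cell(2,3);
A{1,1} = [10 20 30 40];   % double vector
A{1,2} = 'some text';     % char array
A{2,1} = {1, 'nested'};   % another cell array

c = A(1,1);               % parentheses: a 1 x 1 cell
v = A{1,1};               % braces: the double vector inside the cell
firstTwo = A{1,1}(1:2);   % combined: first two elements of the contents
```

Cells that were never assigned, such as A{2,3} here, remain empty.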
TABLE A.5
Examples of Accessing Elements of Arrays

a(i)        Denotes the i-th element (cell) of a row or column vector array (cell array).
A(:,i)      Accesses the i-th column of a matrix or cell array. In this case, the colon in the row dimension tells MATLAB to access all rows.
A(i,:)      Accesses the i-th row of a matrix or cell array. The colon tells MATLAB to gather all of the columns.
A(1,3,4)    This accesses the element in the first row, third column on the fourth entry of dimension 3 (sometimes called the page).

A.7 Script Files and Functions

MATLAB programs are saved in M-files. These are text files that contain
MATLAB commands, and they are saved with the .m extension. Any text
editor can be used to create them, but the one that comes with MATLAB is
recommended. This editor can be activated using the File menu or the toolbar.
When script files are executed, the commands are implemented just as if you typed them in interactively. The commands have access to the workspace, and any variables created by the script file are in the workspace when the script finishes executing. To execute a script file, simply type the name of the
file at the command line or use the option in the File menu.
Script files and functions both have the same .m extension. However, a
function has a special syntax for the first line. In the general case, this syntax is
function [out1,...,outM] = func_name(in1,...,inN)
A function does not have to be written with input or output arguments. Whether you have these or not depends on the application and the purpose
of the function. The function corresponding to the above syntax would be
saved in a file called func_name.m. These functions are used in the same
way any other MATLAB function is used.
It is important to keep in mind that functions in MATLAB are similar to those in other programming languages. The function has its own workspace.
So, communication of information between the function workspace and the main workspace is done via input and output variables.
It is always a good idea to put several comment lines at the beginning of
your function. These are returned by the help command.
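As a small sketch of these points, the following hypothetical function (the name data_range and its arguments are invented for illustration) uses the first-line syntax, leading comment lines, and two output variables.

```matlab
function [lo, hi] = data_range(x)
% DATA_RANGE Smallest and largest values in an array.
%   [LO, HI] = DATA_RANGE(X) returns the minimum LO and the
%   maximum HI taken over all elements of X.
lo = min(x(:));   % x(:) stacks all elements into one column
hi = max(x(:));
end
```

Saved in a file called data_range.m, the function would be called as [lo, hi] = data_range(A), and typing help data_range would display the comment lines.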
We use a special type of MATLAB function in several examples contained
in this book. This is called the inline function. This makes a MATLAB inline object from a string that represents some mathematical expression or the commands that you want MATLAB to execute. As an optional argument,
you can specify the input arguments to the inline function object. For example, the variable gfunc represents an inline object:
gfunc = inline('sin(2*pi*f + theta)','f','theta');
This calculates sin(2πf + θ) based on two input variables: f and theta.
We can now call this function just as we would any MATLAB function:
x = 0:.1:4*pi;
thet = pi/2;
ys = gfunc(x, thet);
In particular, the inline function is useful when you have a simple function
and do not want to keep it in a separate file.
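In later MATLAB releases, anonymous functions serve the same purpose, and the MathWorks documentation recommends them in place of inline. The sketch below repeats the example in that form, assuming a release that supports the @ syntax.

```matlab
% Anonymous-function version of gfunc; no separate file is needed.
gfunc = @(f, theta) sin(2*pi*f + theta);

x = 0:.1:4*pi;
thet = pi/2;
ys = gfunc(x, thet);   % one output value per element of x
```

The call syntax is unchanged, so code that uses gfunc does not need to know which form created it.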
A.8 Control Flow
Most computer languages provide features that allow one to control the flow
of execution depending on certain conditions. MATLAB has similar constructs:
• For loops
• While loops
• If-else statements
• Switch statement
These should be used sparingly. In most cases, it is more efficient in MATLAB
to operate on an entire array rather than looping through it.
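The following sketch contrasts the two styles on a made-up task, squaring each element of a vector. Both produce the same result, but the array version is shorter and is usually the faster of the two.

```matlab
x = 0:0.5:10;

% Loop version: square each element one at a time.
y_loop = zeros(size(x));
for i = 1:length(x)
    y_loop(i) = x(i)^2;
end

% Array version: one element-wise operation on the whole array.
y_vec = x.^2;
```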
For Loop
The basic syntax for a for loop is
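As an illustration, a for loop consists of a for line that names the loop variable and the values it takes, a block of commands, and a closing end. The example below is a sketch with made-up variable names.

```matlab
% Sum the integers 1 through 10 with a for loop.
total = 0;
for i = 1:10          % i takes the values 1, 2, ..., 10 in turn
    total = total + i;
end
```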
can be used also. In the case of arrays, all elements of the resulting array must
be true for the commands to execute.
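A sketch of a while loop, assuming the standard while expression ... end form, is given below. The second part illustrates the rule for array-valued conditions, using an if statement, which follows the same rule. All variable names are hypothetical.

```matlab
% Halve a value until it drops below a tolerance.
val = 1;
count = 0;
while val > 0.01
    val = val/2;
    count = count + 1;
end

% With an array condition, the body runs only if ALL elements are true.
ran = false;
if [1 1 0]
    ran = true;   % not reached, because one element is false (0)
end
```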
If-Else Statements
Sometimes, commands must be executed based on a relational test. The if-else statement is suitable here. The basic syntax is
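For instance, an if statement can be followed by any number of elseif branches and an optional else branch, closed by end. The values in this sketch are made up.

```matlab
x = -3;
if x > 0
    s = 'positive';
elseif x < 0
    s = 'negative';
else
    s = 'zero';
end
```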
Switch Statement
The switch statement is useful if one needs a lot of if, elseif statements
to execute the program. This construct is very similar to that in the C language. The basic syntax is:
Expression must be either a scalar or a character string.
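A short sketch of a switch on a character string follows. The variable names and case labels are invented for illustration. A cell array of labels lets one case match several values, and otherwise handles everything else.

```matlab
method = 'median';
data = [2 9 4];
switch method
    case 'mean'
        est = mean(data);
    case {'median', 'med'}     % either label selects this branch
        est = median(data);
    otherwise
        est = NaN;             % unrecognized method
end
```

Unlike C, MATLAB executes only the first matching case, so no break statement is needed.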
A.9 Simple Plotting
For more information on some of the plotting capabilities of MATLAB, the reader is referred to Chapter 5 of this text. Other useful resources are the
MATLAB documentation Using MATLAB Graphics and Graphics and GUI's with MATLAB [Marchand, 1999]. In this appendix, we briefly describe some
of the basic uses of plot for plotting 2-D graphics and plot3 for plotting 3-D graphics. The reader is strongly urged to view the help file for more
information and options for these functions.
When the function plot is called, it opens a Figure window, if one is not
already there, scales the axes to fit the data and plots the points. The default
is to plot the points and connect them using straight lines. For example,
plot(x,y)
plots the values in vector x on the horizontal axis and the values in vector y
on the vertical axis, connected by straight lines. These vectors must be the same size or you will get an error.
Any number of pairs can be used as arguments to plot. For instance, the
following command plots two curves,
plot(x,y1,x,y2)
on the same axes. If only one argument is supplied to plot, then MATLAB
plots the vector versus the index of its values.
The default is a solid line, but MATLAB allows other choices. These are given in Table A.6.
If several lines are plotted on one set of axes, then MATLAB plots them as different colors. The predefined colors are listed in Table A.7.
Plotting symbols (e.g., *, x, o, etc.) can be used for the points. Since the list
of plotting symbols is rather long, we refer the reader to the online help for plot for more information. To plot a curve where both points and a connected curve are displayed, use
plot(x, y, x, y, 'b*')
This command first plots the points in x and y, connecting them with straight lines. It then plots the points in x and y using the symbol * and the color blue.
The plot3 function works the same as plot, except that it takes three vectors for plotting:
plot3(x, y, z)
TABLE A.6
Line Styles for Plots
All of the line styles, colors and plotting symbols apply to plot3. Other forms of 3-D plotting (e.g., surf and mesh) are covered in Chapter 5. Titles
and axes labels can be created for all plots using title, xlabel, ylabel and zlabel.
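The sketch below combines line styles, a color, a title and axis labels in one plot. The data are made up, and the specifiers '--' (dashed), ':' (dotted) and 'r' (red) are standard MATLAB line specifications. The invisible figure is an assumption made so that the example can run without a display; omit that option to see the window.

```matlab
x = 0:0.1:2*pi;
y1 = sin(x);
y2 = cos(x);

figure('Visible','off');               % assumption: draw off-screen
h = plot(x, y1, '--', x, y2, 'r:');    % dashed line, then red dotted line
title('Sine and cosine')
xlabel('x')
ylabel('value')
```

The returned vector h holds one handle per curve, which can be used later to query or change that curve's properties.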
Before we finish this discussion on simple plotting techniques in MATLAB,
we present a way to put several axes or plots in one figure window. This is through the use of the subplot function. This creates an m × n matrix of
plots (or axes) in the current figure window. We provide an example below,
where we show how to create two plots side-by-side.
% Create the left-most plot.
by any subsequent plotting commands To access a previous plot, simply use
the subplot function again with the proper value for the third argument p.
You can think of the subplot function as a pointer that tells MATLAB what
set of axes to work with.
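A minimal sketch of this pointer behavior, with made-up data, is shown below. The call subplot(m,n,p) selects the p-th plot in an m × n layout, and the invisible figure is again an assumption for running without a display.

```matlab
x = 0:0.1:2*pi;

figure('Visible','off');   % assumption: off-screen figure
ax1 = subplot(1,2,1);      % 1 x 2 layout, work on plot 1 (left)
plot(x, sin(x))
ax2 = subplot(1,2,2);      % same layout, plot 2 (right)
plot(x, cos(x))

subplot(1,2,1)             % point back to the left plot
title('sin(x)')            % the title is applied to the left axes
```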
Through the use of MATLAB's low-level Handle Graphics functions, the data analyst has complete control over graphical output. We do not present any of that here, because we make limited use of these capabilities. However,
we urge the reader to look at the online help for propedit. This graphical
user interface allows the user to change many aspects or properties of the plots.
TABLE A.7
Line Colors for Plots
A.10 Contact Information
For MATLAB product information, please contact:
The MathWorks, Inc
3 Apple Hill Drive
Natick, MA, 01760-2098 USA
electronic newsletter called the MATLAB Digest. Another is called MATLAB News
& Notes, published quarterly. You can subscribe to both of these at
www.mathworks.com or send an email request to
subscribe@mathworks.com
Back issues of these documents are available on-line.
n    Sample size
p    Probability
Quantile
Sample variance
Z    Standard normal random variable

Other

Expected value of X
Probability mass or density function
Cumulative distribution function
Nearest neighbor point-event cdf
Joint probability (mass) function
Nearest neighbor event-event cdf
K-function
Kernel
L-function
Likelihood function
Likelihood ratio
Probability of event E
Conditional probability
Class-conditional probability
Prior probability
Posterior probability
Proposal distribution - MCMC
Roughness
Variance of X

Greek Symbols

Probability of Type I error
Probability of Type II error
Projection vector - grand tour
Projection vector - grand tour
Acceptance probability - MCMC
Residuals
Bootstrap replicate
Intensity
r-th central moment
Mean
Histogram bin heights
Target distribution - MCMC
Correlation coefficient
Variance
Covariance matrix
Standard normal probability density function
Standard normal cdf
Stationary distribution - MCMC
Class j
Acronyms
CSR Complete spatial randomness
EDA Exploratory data analysis
IQR Interquartile range
ISE Integrated squared error
MCMC Markov chain Monte Carlo
MIAE Mean integrated absolute error
MISE Mean integrated squared error
MSE Mean squared error
Appendix C
Projection Pursuit Indexes
In this appendix, we list several indexes for projection pursuit [Posse, 1995b], and we also provide the M-file source code for the functions included in the Computational Statistics Toolbox.
C.1 Indexes
Since structure is considered to be departures from normality, these indexes are developed to detect non-normality in the projected data. There are some criteria that we can use to assess the usefulness of projection indexes. These include affine invariance [Huber, 1985], speed of computation, and sensitivity to departure from normality in the core of the distribution rather than the tails. The last criterion ensures that we are pursuing structure and not just outliers.
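Friedman-Tukey Index
This projection pursuit index [Friedman and Tukey, 1974] is based on interpoint distances and is calculated using the following
\[
PI_{FT}(\alpha,\beta) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( R^2 - r_{ij}^2 \right)^3 \mathbf{1}\!\left( R^2 - r_{ij}^2 \right),
\]
where r_{ij} is the interpoint distance between the i-th and j-th projected observations, R is a cutoff radius, and 1( ) is the indicator function for positive values.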
This index has been revised from the original to be affine invariant [Swayne, Cook and Buja, 1991] and has computational order O(n^2).
Entropy Index
This projection pursuit index [Jones and Sibson, 1987] is based on the entropy. Its formula uses the bivariate standard normal density together with bandwidths obtained from the projected data. This index is also O(n^2).
Moment Index
This index was developed in Jones and Sibson [1987] and is based on bivariate third and fourth moments. This is very fast to compute, so it is useful for large data sets. However, a problem with this index is that it tends to locate structure in the tails of the distribution. It is given by
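\[
PI_M(\alpha,\beta) = \frac{1}{12} \left\{ \kappa_{30}^2 + 3\kappa_{21}^2 + 3\kappa_{12}^2 + \kappa_{03}^2 + \frac{1}{4} \left( \kappa_{40}^2 + 4\kappa_{31}^2 + 6\kappa_{22}^2 + 4\kappa_{13}^2 + \kappa_{04}^2 \right) \right\},
\]
where
\[
\kappa_{30} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^3, \qquad
\kappa_{03} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\beta} \right)^3,
\]
\[
\kappa_{21} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^2 z_i^{\beta}, \qquad
\kappa_{12} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} z_i^{\alpha} \left( z_i^{\beta} \right)^2,
\]
\[
\kappa_{31} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^3 z_i^{\beta}, \qquad
\kappa_{13} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} z_i^{\alpha} \left( z_i^{\beta} \right)^3,
\]
\[
\kappa_{40} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left[ \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^4 - \frac{3(n-1)^3}{n(n+1)} \right], \qquad
\kappa_{04} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left[ \sum_{i=1}^{n} \left( z_i^{\beta} \right)^4 - \frac{3(n-1)^3}{n(n+1)} \right],
\]
\[
\kappa_{22} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left[ \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^2 \left( z_i^{\beta} \right)^2 - \frac{(n-1)^3}{n(n+1)} \right].
\]
polynomials with J terms. Note that MATLAB has a function for obtaining
these polynomials called legendre.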
Here the polynomial of order a is the Legendre polynomial. This index is not affine
invariant, so Morton [1989] proposed a revised index. It is based on a conversion to polar coordinates, and the resulting index uses Fourier series and Laguerre polynomials,
where L_a represents the Laguerre polynomial of order a. Two more indexes
based on the L_2 distance using expansions in Hermite polynomials are given
in Posse [1995b].
C.2 MATLAB Source Code
The first function we look at is the one to calculate the chi-square projection pursuit index.
function ppi = csppind(x,a,b,n,ck)
% x is the data, a and b are the projection vectors,
% n is the number of data points, and ck is the value
% of the standard normal bivariate cdf for the boxes.
% find # points in each box
for i=1:(nr-1) % loop over each ring
    for k=1:(na-1) % loop over each wedge
        ind = find(r>rd(i) & r<rd(i+1) & ...
            th>angles(k) & th<angles(k+1));
Any of the other indexes can be coded in an M-file function and called by the
csppeda function given below. You would call your function instead of
csppind.
function [as,bs,ppm]=csppeda(Z,c,half,m)
% Z is the sphered data.
% get the necessary constants
% find the probability of bivariate standard normal
% over each radial box.
% NOTE: the user could put the values in to ck to
% prevent re-calculating each time. We thought the
% reader would be interested in seeing how we did
% it.
% NOTE: MATLAB 5 users should use the function
% quad8 instead of quadl.
% generate a random starting plane
% this will be the current best plane
% find the projection index for this plane
% this will be the initial value of the index
ppimax = csppind(Z,astar,bstar,n,ck);
% keep repeating this search until the value
% c becomes less than cstop or until the
% number of iterations exceeds maxiter
mi = 0;
% number of iterations without increase in index
h = 0;
c = cs;
while (mi < maxiter) & (c > cstop)
% generate a p-vector on the unit sphere
% Transform data using the matrix U.
% To match Friedman's treatment: T is d x n.
T = U*Z';
% These should be the 2-d projection that is 'best'
x1 = T(1,:);
x2 = T(2,:);
% Gaussianize the first two rows of T.
% set of vector of angles
gam = [0,pi/4, pi/8, 3*pi/8];
Appendix D
MATLAB Code
In this appendix, we provide the MATLAB functions for some of the more complicated techniques covered in this book. This includes code for the bootstrap confidence interval, the adaptive mixtures algorithm for probability density estimation, classification trees, and regression trees.
D.1 Bootstrap Confidence Interval
% Loop over each resample and
% calculate the bootstrap replicates.
for i = 1:B
% generate the indices for the B bootstrap
% resamples, sampling with
% replacement using the discrete uniform.
ind = ceil(n.*rand(n,1));
% extract the sample from the data
% each row corresponds to a bootstrap resample
xstar = data(ind,:);
% use feval to evaluate the estimate for
% the i-th resample
bvals(i) = feval(fname, xstar);
for i = 1:n
% use feval to evaluate the estimate
% with the i-th observation removed
% These are the jackknife replications.
jvals(i) = feval(fname, [data(1:(i-1));data((i+1):n)]);
end
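The two resampling loops above can be tried on their own with the following self-contained sketch. The sample, the number of resamples B, and the choice of the mean as the estimator are all assumptions made for illustration. The last line computes a plain bootstrap standard error, which is a simpler quantity than the confidence interval that the full function produces.

```matlab
data = [2.1 3.4 1.9 5.0 4.2 3.3 2.8 4.7]';   % made-up sample
n = length(data);
B = 200;                                      % number of bootstrap resamples
fname = 'mean';                               % estimator passed by name to feval

% Bootstrap replicates: resample with replacement, then re-estimate.
bvals = zeros(B,1);
for i = 1:B
    ind = ceil(n.*rand(n,1));                 % indices from the discrete uniform
    bvals(i) = feval(fname, data(ind));
end

% Jackknife replications: leave out one observation at a time.
jvals = zeros(n,1);
for i = 1:n
    jvals(i) = feval(fname, [data(1:(i-1)); data((i+1):n)]);
end

se_boot = std(bvals);                         % bootstrap standard error
```

Passing the estimator by name keeps the loops unchanged when a different statistic, such as 'median', is wanted.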
D.2 Adaptive Mixtures Density Estimation
First we provide some of the helper functions that are used in csadpmix.
This first function calculates the estimated posterior probability, given the current estimated model and the new observation.
% function post=rpostup(x,pies,mus,vars,nterms)
% This function will return the posterior
function post = rpostup(x,pies,mus,vars,nterms)
% This function will update all of the parameters for
% the adaptive mixtures density estimation approach
% This function will update the variances
% in the AMDE. Call with nterms-1,
% since new term is based only on previous terms
function newvar = cssetvar(mus,pies,vars,x,nterms)
Here is the main MATLAB function csadpmix that ties everything together.
For brevity, we show only the part of the function that corresponds to the univariate case. View the M-file for the multivariate case.
function [pies,mus,vars] = csadpmix(x,maxterms)
else % create a new term
end % end if statement
% to prevent spiking of variances
index = find(vars(1:nterms)<1/(sievebd*nterms));
vars(index) = ones(size(index))/(sievebd*nterms);
end % for i loop
% clean up the model - get rid of the 0 terms
function tree = csgrowc(X,maxn,clas,Nk,pies)
[n,dd] = size(X);
if nargin == 4 % then estimate the pies
    pies = Nk/n;
end
% The tree will be implemented as a structure.
% get the initial tree - which is the data set itself
tree.pies = pies;
% need for node impurity calcs:
% This will be a 2 element vector of
% node numbers to the children.
tree.node.data = X;
% Now get started on growing the very large tree.
% first we have to extract the number of terminal nodes
% that qualify for splitting.
% get the data needed to decide to split the node
[term,nt,imp]=getdata(tree);
% find all of the nodes that qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );
% now start splitting
function tree = csgrowr(X,y,maxn)
Appendix E
MATLAB Statistics Toolbox
The following tables list the functions that are available in the MATLAB Statistics Toolbox, Version 3.0. This toolbox is available for purchase from The MathWorks, Inc.
TABLE E.1
Functions for Parameter Estimation (fit) and Distribution Statistics - Mean and Variance (stat)
betafit, betastat Beta distribution.
binofit, binostat Binomial distribution.
expfit, expstat Exponential distribution.
gamfit, gamstat Gamma distribution.
geostat Geometric distribution
hygestat Hypergeometric distribution
lognstat Lognormal distribution
mle Maximum likelihood parameter estimation.
nbinstat Negative binomial distribution
ncfstat Noncentral F distribution
nctstat Noncentral t distribution
ncx2stat Noncentral Chi-square distribution
normfit, normstat Normal distribution.
poissfit, poisstat Poisson distribution.
raylfit Rayleigh distribution.
unidstat Discrete uniform distribution
unifit, unifstat Uniform distribution.
weibfit, weibstat Weibull distribution.
TABLE E.2
Probability Density Functions (pdf) and Cumulative Distribution
Functions (cdf)
betapdf, betacdf Beta distribution
binopdf, binocdf Binomial distribution
chi2pdf, chi2cdf Chi-square distribution
exppdf, expcdf Exponential distribution
fpdf, fcdf F distribution
gampdf, gamcdf Gamma distribution
geopdf, geocdf Geometric distribution
hygepdf, hygecdf Hypergeometric distribution
lognpdf, logncdf Log normal distribution
nbinpdf, nbincdf Negative binomial distribution
ncfpdf, ncfcdf Noncentral F distribution
nctpdf, nctcdf Noncentral t distribution
ncx2pdf, ncx2cdf Noncentral chi-square distribution
normpdf, normcdf Normal distribution
pdf, cdf Probability density/Cumulative distribution
poisspdf, poisscdf Poisson distribution
raylpdf, raylcdf Rayleigh distribution
tpdf, tcdf T distribution
unidpdf, unidcdf Discrete uniform distribution
unifpdf, unifcdf Continuous uniform distribution
weibpdf, weibcdf Weibull distribution
TABLE E.3
Critical Values (inv) and Random Number Generation (rnd) for
Probability Distribution Functions
betainv, betarnd Beta distribution
binoinv, binornd Binomial distribution
chi2inv, chi2rnd Chi-square distribution
expinv, exprnd Exponential distribution
finv, frnd F distribution
gaminv, gamrnd Gamma distribution
geoinv, geornd Geometric distribution
hygeinv, hygernd Hypergeometric distribution
logninv, lognrnd Log normal distribution
nbininv, nbinrnd Negative binomial distribution
ncfinv, ncfrnd Noncentral F distribution
nctinv, nctrnd Noncentral t distribution
ncx2inv, ncx2rnd Noncentral chi-square distribution
norminv, normrnd Normal distribution
poissinv, poissrnd Poisson distribution
raylinv, raylrnd Rayleigh distribution
tinv, trnd T distribution
unidinv, unidrnd Discrete uniform distribution
unifinv, unifrnd Continuous uniform distribution
weibinv, weibrnd Weibull distribution
icdf Specified inverse cdf