Computational Statistics Handbook with MATLAB, Part 10


A.6 Data Constructs in MATLAB

Basic Data Constructs

We do not cover the object-oriented aspects of MATLAB here. Thus, we are concerned mostly with data that are floating point (type double) or strings (type char). The elements in the arrays will be of these two data types.

The fundamental data element in MATLAB is an array. Arrays can be:

• The empty array created using [ ]

• A scalar array

• A row vector, which is a 1 × n array

• A column vector, which is an n × 1 array

• A matrix with two dimensions, say m × n or n × n

• A multi-dimensional array, say m × ... × n

Arrays must always be dimensionally conformal and all elements must be of the same data type. In other words, a 2 × 3 matrix must have 3 elements (e.g., numbers) on each of its 2 rows. Table A.5 gives examples of how to access elements of arrays.

Building Arrays

In most cases, the statistician or engineer will be using outside data in an analysis, so the data would be imported into MATLAB using load or some other method described previously. Sometimes, we need to type in simple arrays for testing code or entering parameters, etc. Here we cover some of the ways to build small arrays. Note that this can also be used to concatenate arrays.


Commas or spaces concatenate elements (which can be arrays) as columns. Thus, a command such as [1, 4, 5] returns a row vector.
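A minimal sketch of concatenation in both directions. The variable names are illustrative and do not come from the text, and the use of semicolons to stack rows is standard MATLAB syntax assumed here as the counterpart of the column rule.

```matlab
% Commas or spaces concatenate as columns, giving a 1-by-3 row vector.
rowv = [1, 4, 5];
% Semicolons concatenate as rows, giving a 3-by-1 column vector.
colv = [1; 4; 5];
% Elements can themselves be arrays: two row vectors joined side by side.
wide = [rowv, rowv];        % 1-by-6
% Stacking two row vectors with a semicolon gives a 2-by-3 matrix.
M = [rowv; 2*rowv];
disp(size(M))
```

The pieces being joined must be dimensionally conformal: row-wise stacking needs equal column counts, and side-by-side joining needs equal row counts.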

Cell Arrays

Cell arrays and structures allow for more flexibility. Cell arrays can have elements that contain any data type (even other cell arrays), and they can be of different sizes. The cell array has an overall structure that is similar to the basic data arrays. For instance, the cells are arranged in dimensions (rows, columns, etc.). If we have a 2 × 3 cell array, then each of its 2 rows has to have 3 cells. However, the content of the cells can be different sizes and can contain different types of data. One cell might contain char data, another double, and some can be empty. Mathematical operations are not defined on cell arrays.

zeros, ones    These build arrays containing all 0's or all 1's, respectively.
rand, randn    These build arrays containing uniform (0,1) random variables or standard normal random variables, respectively.
eye            This creates an identity matrix.

The indexing notation in Table A.5 applies to arrays, which can be cell arrays or basic arrays. With cell arrays, this accesses the cell element, but not the contents of the cells. Curly braces, { }, are used to get to the elements inside the cell. For example, A{1,1} would give us the contents of the cell (type double or char), whereas A(1,1) is the cell itself and has data type cell. The two notations can be combined to access part of the contents of a cell. To get the first two elements of the contents of A{1,1}, assuming it contains a vector, we can use A{1,1}(1:2).
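A minimal sketch of the difference between parentheses and curly braces on a cell array. The contents placed in A are illustrative assumptions.

```matlab
% A 2-by-3 cell array whose cells hold different types and sizes.
A = cell(2,3);
A{1,1} = [10 20 30 40];     % double vector
A{1,2} = 'some text';       % char array
A{2,3} = {1, 'nested'};     % another cell array; other cells stay empty

c = A(1,1);                 % parentheses return the cell itself
v = A{1,1};                 % curly braces return the contents
firstTwo = A{1,1}(1:2);     % combine both to index into the contents

disp(class(c))              % cell
disp(class(v))              % double
disp(firstTwo)
```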

TABLE A.5
Examples of Accessing Elements of Arrays

Notation    Usage
a(i)        Denotes the i-th element (cell) of a row or column vector array (cell array).
A(:,i)      Accesses the i-th column of a matrix or cell array. In this case, the colon in the row dimension tells MATLAB to access all rows.
A(i,:)      Accesses the i-th row of a matrix or cell array. The colon tells MATLAB to gather all of the columns.
A(1,3,4)    This accesses the element in the first row, third column on the fourth entry of dimension 3 (sometimes called the page).

A.7 Script Files and Functions

MATLAB programs are saved in M-files. These are text files that contain MATLAB commands, and they are saved with the .m extension. Any text editor can be used to create them, but the one that comes with MATLAB is recommended. This editor can be activated using the File menu or the toolbar.

When script files are executed, the commands are implemented just as if you typed them in interactively. The commands have access to the workspace and any variables created by the script file are in the workspace when the script finishes executing. To execute a script file, simply type the name of the file at the command line or use the option in the File menu.

Script files and functions both have the same .m extension. However, a function has a special syntax for the first line. In the general case, this syntax is

function [out1,...,outM] = func_name(in1,...,inN)

A function does not have to be written with input or output arguments. Whether you have these or not depends on the application and the purpose of the function. The function corresponding to the above syntax would be saved in a file called func_name.m. These functions are used in the same way any other MATLAB function is used.

It is important to keep in mind that functions in MATLAB are similar to those in other programming languages. The function has its own workspace. So, communication of information between the function workspace and the main workspace is done via input and output variables.

It is always a good idea to put several comment lines at the beginning of your function. These are returned by the help command.

We use a special type of MATLAB function in several examples contained in this book. This is called the inline function. This makes a MATLAB inline object from a string that represents some mathematical expression or the commands that you want MATLAB to execute. As an optional argument, you can specify the input arguments to the inline function object. For example, the variable gfunc represents an inline object:

gfunc = inline('sin(2*pi*f + theta)','f','theta');

This calculates $\sin(2\pi f + \theta)$ based on two input variables: f and theta.

We can now call this function just as we would any MATLAB function.

x = 0:.1:4*pi;

thet = pi/2;

ys = gfunc(x, thet);

In particular, the inline function is useful when you have a simple function and do not want to keep it in a separate file.
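The same kind of one-line function can also be written as an anonymous function handle, a standard MATLAB feature that the text itself does not use. A minimal sketch, reusing the names gfunc, x and thet from the example above:

```matlab
% Anonymous-function counterpart of the inline object above.
gfunc = @(f, theta) sin(2*pi*f + theta);

x = 0:.1:4*pi;
thet = pi/2;
ys = gfunc(x, thet);        % evaluated elementwise over the vector x

% With theta = pi/2 the result should agree with cos(2*pi*x).
disp(max(abs(ys - cos(2*pi*x))))
```

As with the inline object, no separate M-file is needed, and the handle is called like any other function.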


A.8 Control Flow

Most computer languages provide features that allow one to control the flow of execution depending on certain conditions. MATLAB has similar constructs:

• For loops

• While loops

• If-else statements

• Switch statement

These should be used sparingly. In most cases, it is more efficient in MATLAB to operate on an entire array rather than looping through it.
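A minimal sketch contrasting a loop with the equivalent whole-array operation. The vector x and the squaring operation are illustrative assumptions.

```matlab
x = linspace(0, 1, 1000);

% Loop version: square each element one at a time.
yloop = zeros(size(x));
for i = 1:numel(x)
    yloop(i) = x(i)^2;
end

% Array version: the elementwise operator .^ acts on the whole vector.
yvec = x.^2;

% The two results should agree up to rounding.
disp(max(abs(yloop - yvec)))
```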

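For Loop

The basic syntax for a for loop is the keyword for followed by an index assignment (for i = array), the commands to repeat, and a closing end.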
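A minimal sketch of a for loop. The running sum is an illustrative assumption.

```matlab
% General form:  for index = array, commands, end
total = 0;
for i = 1:5
    total = total + i;      % executed once for each value 1, 2, ..., 5
end
disp(total)                 % 15
```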

[...] can be used also. In the case of arrays, all elements of the resulting array must be true for the commands to execute.

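If-Else Statements

Sometimes, commands must be executed based on a relational test. The if-else statement is suitable here. The basic syntax is an if line with a test expression, the commands to run when the test is true, optional elseif and else branches with their own commands, and a closing end.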
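A minimal sketch of an if-else statement with an elseif branch. The value of x and the messages are illustrative assumptions.

```matlab
x = -3;                     % illustrative value
if x > 0
    msg = 'positive';
elseif x < 0
    msg = 'negative';
else
    msg = 'zero';
end
disp(msg)                   % negative
```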

Switch Statement

The switch statement is useful if one needs a lot of if, elseif statements to execute the program. This construct is very similar to that in the C language. The basic syntax is a switch line with an expression, one case branch per candidate value with its commands, an optional otherwise branch, and a closing end.

Expression must be either a scalar or a character string.
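A minimal sketch of a switch statement driven by a character string. The variable names and the choice of statistics are illustrative assumptions.

```matlab
method = 'median';          % illustrative character string
data = [3 1 4 1 5];
switch method
    case 'mean'
        val = mean(data);
    case 'median'
        val = median(data);
    otherwise
        error('Unknown method: %s', method);
end
disp(val)                   % 3
```

Unlike C, MATLAB executes only the matching case and does not fall through to the next one, so no break is needed.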

A.9 Simple Plotting

For more information on some of the plotting capabilities of MATLAB, the reader is referred to Chapter 5 of this text. Other useful resources are the MATLAB documentation Using MATLAB Graphics and Graphics and GUI's with MATLAB [Marchand, 1999]. In this appendix, we briefly describe some of the basic uses of plot for plotting 2-D graphics and plot3 for plotting 3-D graphics. The reader is strongly urged to view the help file for more information and options for these functions.

When the function plot is called, it opens a Figure window, if one is not already there, scales the axes to fit the data and plots the points. The default is to plot the points and connect them using straight lines. For example,

plot(x,y)

plots the values in vector x on the horizontal axis and the values in vector y on the vertical axis, connected by straight lines. These vectors must be the same size or you will get an error.

Any number of pairs can be used as arguments to plot. For instance, the following command plots two curves,

plot(x,y1,x,y2)

on the same axes. If only one argument is supplied to plot, then MATLAB plots the vector versus the index of its values.

The default is a solid line, but MATLAB allows other choices. These are given in Table A.6.

If several lines are plotted on one set of axes, then MATLAB plots them as different colors. The predefined colors are listed in Table A.7.

Plotting symbols (e.g., *, x, o, etc.) can be used for the points. Since the list of plotting symbols is rather long, we refer the reader to the online help for plot for more information. To plot a curve where both points and a connected curve are displayed, use

plot(x, y, x, y, 'b*')

This command first plots the points in x and y, connecting them with straight lines. It then plots the points in x and y using the symbol * and the color blue.

The plot3 function works the same as plot, except that it takes three vectors for plotting:

plot3(x, y, z)
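A minimal sketch that combines these plot calls with line style and color specifiers. The data are illustrative assumptions, and the specifier strings 'r--' (red dashed) and 'b:' (blue dotted) are standard MATLAB ones rather than entries copied from Tables A.6 and A.7.

```matlab
x = 0:.1:4*pi;
y1 = sin(x);
y2 = cos(x);

figure
% Two curves on the same axes: a dashed red line and a dotted blue line.
plot(x, y1, 'r--', x, y2, 'b:')
title('Two curves on one set of axes')
xlabel('x')
ylabel('y')

figure
% A 3-D curve (a helix) drawn with plot3.
h = plot3(cos(x), sin(x), x);
zlabel('z')
```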

TABLE A.6
Line Styles for Plots

All of the line styles, colors and plotting symbols apply to plot3. Other forms of 3-D plotting (e.g., surf and mesh) are covered in Chapter 5. Titles and axes labels can be created for all plots using title, xlabel, ylabel and zlabel.

Before we finish this discussion on simple plotting techniques in MATLAB, we present a way to put several axes or plots in one figure window. This is through the use of the subplot function. This creates an m × n matrix of plots (or axes) in the current figure window. We provide an example below, where we show how to create two plots side-by-side.

% Create the left-most plot.
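A minimal sketch of two side-by-side plots with subplot(m,n,p). The data x, y and z are illustrative assumptions.

```matlab
x = 0:.1:4*pi;
y = sin(x);
z = cos(x);

figure
% Create the left-most plot: 1 row, 2 columns, position 1.
subplot(1,2,1)
plot(x, y)
title('sin(x)')
% Create the right-most plot: position 2.
subplot(1,2,2)
plot(x, z)
title('cos(x)')
% Calling subplot again makes the left axes current once more.
subplot(1,2,1)
xlabel('x')
```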

[...] by any subsequent plotting commands. To access a previous plot, simply use the subplot function again with the proper value for the third argument p. You can think of the subplot function as a pointer that tells MATLAB what set of axes to work with.

Through the use of MATLAB's low-level Handle Graphics functions, the data analyst has complete control over graphical output. We do not present any of that here, because we make limited use of these capabilities. However, we urge the reader to look at the online help for propedit. This graphical user interface allows the user to change many aspects or properties of the plots.

TABLE A.7
Line Colors for Plots

A.10 Contact Information

For MATLAB product information, please contact:

The MathWorks, Inc

3 Apple Hill Drive

Natick, MA, 01760-2098 USA

[...] electronic newsletter called the MATLAB Digest. Another is called MATLAB News & Notes, published quarterly. You can subscribe to both of these at www.mathworks.com or send an email request to

subscribe@mathworks.com

Back issues of these documents are available on-line.

n Sample size
p Probability
Quantile
Sample variance
Z Standard normal random variable

Other

Expected value of X
Probability mass or density function
Cumulative distribution function
Nearest neighbor point-event cdf
Joint probability (mass) function
Nearest neighbor event-event cdf
K-function
Kernel
L-function
Likelihood function
Likelihood ratio
Probability of event E
Conditional probability
Class-conditional probability
Prior probability
Posterior probability
Proposal distribution - MCMC
Roughness
Variance of X

Greek Symbols

Probability of Type I error
Probability of Type II error
Projection vector - grand tour
Projection vector - grand tour
Acceptance probability - MCMC
Residuals
Bootstrap replicate
Intensity
r-th central moment
Mean
Histogram bin heights
Target distribution - MCMC
Correlation coefficient
Variance
Covariance matrix
Standard normal probability density function
Standard normal cdf
Stationary distribution - MCMC
Class j

Acronyms

CSR Complete spatial randomness

EDA Exploratory data analysis

IQR Interquartile range

ISE Integrated squared error

MCMC Markov chain Monte Carlo

MIAE Mean integrated absolute error

MISE Mean integrated squared error

MSE Mean squared error


Appendix C

Projection Pursuit Indexes

In this appendix, we list several indexes for projection pursuit [Posse, 1995b], and we also provide the M-file source code for the functions included in the Computational Statistics Toolbox.

C.1 Indexes

Since structure is considered to be departures from normality, these indexes are developed to detect non-normality in the projected data. There are some criteria that we can use to assess the usefulness of projection indexes. These include affine invariance [Huber, 1985], speed of computation, and sensitivity to departure from normality in the core of the distribution rather than the tails. The last criterion ensures that we are pursuing structure and not just outliers.

Friedman-Tukey Index

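This projection pursuit index [Friedman and Tukey, 1974] is based on interpoint distances and is calculated using the following

$$ PI_{FT}(\alpha,\beta) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( R^2 - r_{ij}^2 \right)^3 \, \mathbf{1}\!\left( R^2 - r_{ij}^2 \right), $$

where $r_{ij}$ is the distance between the projections of observations $i$ and $j$, $R$ is a cutoff radius, and $\mathbf{1}(\cdot)$ is the indicator function for positive values,

$$ \mathbf{1}(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0. \end{cases} $$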

This index has been revised from the original to be affine invariant [Swayne, Cook and Buja, 1991] and has computational order $O(n^2)$.
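A minimal sketch of how an index of this form could be evaluated, assuming it sums $(R^2 - r_{ij}^2)^3$ over all pairs of projected points closer than $R$. The data, the projection vectors and in particular the value used for R are assumptions made for illustration and should be checked against the cited references.

```matlab
n = 100;
Z = randn(n, 3);            % stand-in for sphered data (illustrative)
a = [1; 0; 0];              % projection vectors spanning the plane
b = [0; 1; 0];
R = 2.29 * n^(-1/5);        % ASSUMED cutoff radius, not taken from the text

za = Z*a;                   % projected coordinates
zb = Z*b;
[A1, A2] = meshgrid(za);    % all pairs (i,j) of first coordinates
[B1, B2] = meshgrid(zb);    % all pairs (i,j) of second coordinates
r2 = (A1 - A2).^2 + (B1 - B2).^2;   % squared interpoint distances

d = R^2 - r2;
ppi = sum(sum(d.^3 .* (d > 0)));    % only pairs closer than R contribute
disp(ppi)
```

The n-by-n matrix of squared distances makes the quadratic cost in the sample size explicit.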

Entropy Index

This projection pursuit index [Jones and Sibson, 1987] is based on the entropy and is given by [...], where [...] is the bivariate standard normal density. The bandwidths [...] are obtained from [...]. This index is also $O(n^2)$.

Moment Index

This index was developed in Jones and Sibson [1987] and is based on bivariate third and fourth moments. This is very fast to compute, so it is useful for large data sets. However, a problem with this index is that it tends to locate structure in the tails of the distribution. It is given by

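$$ PI_M(\alpha,\beta) = \frac{1}{12} \left\{ \kappa_{30}^2 + 3\kappa_{21}^2 + 3\kappa_{12}^2 + \kappa_{03}^2 + \frac{1}{4} \left( \kappa_{40}^2 + 4\kappa_{31}^2 + 6\kappa_{22}^2 + 4\kappa_{13}^2 + \kappa_{04}^2 \right) \right\}, $$

where

$$ \kappa_{30} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^3, \qquad \kappa_{03} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\beta} \right)^3, $$

$$ \kappa_{21} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^2 z_i^{\beta}, \qquad \kappa_{12} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} z_i^{\alpha} \left( z_i^{\beta} \right)^2, $$

$$ \kappa_{31} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^3 z_i^{\beta}, \qquad \kappa_{13} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} z_i^{\alpha} \left( z_i^{\beta} \right)^3, $$

$$ \kappa_{40} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left\{ \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^4 - \frac{3(n-1)^3}{n(n+1)} \right\}, \qquad \kappa_{04} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left\{ \sum_{i=1}^{n} \left( z_i^{\beta} \right)^4 - \frac{3(n-1)^3}{n(n+1)} \right\}, $$

$$ \kappa_{22} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \left\{ \sum_{i=1}^{n} \left( z_i^{\alpha} \right)^2 \left( z_i^{\beta} \right)^2 - \frac{(n-1)^3}{n(n+1)} \right\}. $$

[...] polynomials with J terms. Note that MATLAB has a function for obtaining these polynomials called legendre.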

where $P_a$ is the Legendre polynomial of order a. This index is not affine invariant, so Morton [1989] proposed the following revised index. This is based on a conversion to polar coordinates as follows [...].

We then have the following index, where Fourier series and Laguerre polynomials are used: [...], where $L_a$ represents the Laguerre polynomial of order a. Two more indexes based on the $L^2$ distance using expansions in Hermite polynomials are given in Posse [1995b].

C.2 MATLAB Source Code

The first function we look at is the one to calculate the chi-square projection pursuit index.

function ppi = csppind(x,a,b,n,ck)

% x is the data, a and b are the projection vectors,

% n is the number of data points, and ck is the value

% of the standard normal bivariate cdf for the boxes.

% find # points in each box
for i=1:(nr-1)          % loop over each ring
    for k=1:(na-1)      % loop over each wedge
        ind = find(r>rd(i) & r<rd(i+1) & ...
            th>angles(k) & th<angles(k+1));

Any of the other indexes can be coded in an M-file function and called by the csppeda function given below. You would call your function instead of csppind.

function [as,bs,ppm]=csppeda(Z,c,half,m)

% Z is the sphered data.


% get the necessary constants

% find the probability of bivariate standard normal

% over each radial box.

% NOTE: the user could put the values in to ck to
% prevent re-calculating each time. We thought the
% reader would be interested in seeing how we did
% it.

% NOTE: MATLAB 5 users should use the function

% quad8 instead of quadl.

% generate a random starting plane

% this will be the current best plane

% find the projection index for this plane

% this will be the initial value of the index
ppimax = csppind(Z,astar,bstar,n,ck);

% keep repeating this search until the value
% c becomes less than cstop or until the
% number of iterations exceeds maxiter

mi = 0;

% number of iterations without increase in index

h = 0;

c = cs;

while (mi < maxiter) & (c > cstop)

% generate a p-vector on the unit sphere


% Transform data using the matrix U.

% To match Friedman's treatment: T is d x n.

T = U*Z';

% These should be the 2-d projection that is 'best'
x1 = T(1,:);

x2 = T(2,:);

% Gaussianize the first two rows of T.

% set of vector of angles

gam = [0,pi/4, pi/8, 3*pi/8];


Appendix D

MATLAB Code

In this appendix, we provide the MATLAB functions for some of the more complicated techniques covered in this book. This includes code for the bootstrap confidence interval, the adaptive mixtures algorithm for probability density estimation, classification trees, and regression trees.

D.1 Bootstrap Confidence Interval

% Loop over each resample and

% calculate the bootstrap replicates.

for i = 1:B

% generate the indices for the B bootstrap

% resamples, sampling with

% replacement using the discrete uniform.

ind = ceil(n.*rand(n,1));

% extract the sample from the data

% each row corresponds to a bootstrap resample

xstar = data(ind,:);

% use feval to evaluate the estimate for

% the i-th resample

bvals(i) = feval(fname, xstar);


for i = 1:n

% use feval to evaluate the estimate

% with the i-th observation removed

% These are the jackknife replications.

    jvals(i) = feval(fname, [data(1:(i-1));data((i+1):n)]);
end
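The two loops above can be run end to end as in the following minimal sketch. The data vector, the statistic passed as fname and the value of B are illustrative assumptions, and only the replicate calculations are shown, not the confidence interval itself.

```matlab
% Illustrative data and statistic; neither comes from the text.
data = [2.1; 3.4; 1.9; 5.6; 4.2; 3.3; 2.8; 4.9];
fname = @mean;              % feval accepts a function handle or a name
n = length(data);
B = 200;                    % number of bootstrap resamples

% Bootstrap replicates: resample the rows with replacement.
bvals = zeros(B,1);
for i = 1:B
    ind = ceil(n.*rand(n,1));   % indices from the discrete uniform on 1..n
    xstar = data(ind,:);
    bvals(i) = feval(fname, xstar);
end

% Jackknife replicates: leave out one observation at a time.
jvals = zeros(n,1);
for i = 1:n
    jvals(i) = feval(fname, [data(1:(i-1)); data((i+1):n)]);
end

disp(std(bvals))            % bootstrap estimate of the standard error
```

Preallocating bvals and jvals avoids growing the arrays inside the loops.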

D.2 Adaptive Mixtures Density Estimation

First we provide some of the helper functions that are used in csadpmix. This first function calculates the estimated posterior probability, given the current estimated model and the new observation.

% function post=rpostup(x,pies,mus,vars,nterms)

% This function will return the posterior

function post = rpostup(x,pies,mus,vars,nterms)

% This function will update all of the parameters for

% the adaptive mixtures density estimation approach

% This function will update the variances
% in the AMDE. Call with nterms-1,
% since new term is based only on previous terms

function newvar = cssetvar(mus,pies,vars,x,nterms)

Here is the main MATLAB function csadpmix that ties everything together.

For brevity, we show only the part of the function that corresponds to the univariate case. View the M-file for the multivariate case.

function [pies,mus,vars] = csadpmix(x,maxterms)

else % create a new term

end % end if statement

% to prevent spiking of variances

index = find(vars(1:nterms)<1/(sievebd*nterms));
vars(index) = ones(size(index))/(sievebd*nterms);
end % for i loop

% clean up the model - get rid of the 0 terms


function tree = csgrowc(X,maxn,clas,Nk,pies)

[n,dd] = size(X);

if nargin == 4 % then estimate the pies

pies = Nk/n;

end

% The tree will be implemented as a structure.

% get the initial tree - which is the data set itself
tree.pies = pies;

% need for node impurity calcs:

% This will be a 2 element vector of

% node numbers to the children.


tree.node.data = X;

% Now get started on growing the very large tree.

% first we have to extract the number of terminal nodes

% that qualify for splitting.

% get the data needed to decide to split the node
[term,nt,imp]=getdata(tree);
% find all of the nodes that qualify for splitting
ind = find( (term==1) & (imp>0) & (nt>maxn) );

% now start splitting

function tree = csgrowr(X,y,maxn)

Appendix E

MATLAB Statistics Toolbox

The following tables list the functions that are available in the MATLAB Statistics Toolbox, Version 3.0. This toolbox is available for purchase from The MathWorks, Inc.

TABLE E.1

Functions for Parameter Estimation (fit) and Distribution Statistics - Mean and Variance (stat)

betafit, betastat Beta distribution.

binofit, binostat Binomial distribution.

expfit, expstat Exponential distribution.

gamfit, gamstat Gamma distribution.

geostat Geometric distribution

hygestat Hypergeometric distribution

lognstat Lognormal distribution

mle Maximum likelihood parameter estimation.

nbinstat Negative binomial distribution

ncfstat Noncentral F distribution

nctstat Noncentral t distribution

ncx2stat Noncentral Chi-square distribution

normfit, normstat Normal distribution.

poissfit, poisstat Poisson distribution.

raylfit Rayleigh distribution.

unidstat Discrete uniform distribution

unifit, unifstat Uniform distribution.

weibfit, weibstat Weibull distribution.

TABLE E.2

Probability Density Functions (pdf) and Cumulative Distribution

Functions (cdf)

betapdf, betacdf Beta distribution

binopdf, binocdf Binomial distribution

chi2pdf, chi2cdf Chi-square distribution

exppdf, expcdf Exponential distribution

fpdf, fcdf F distribution

gampdf, gamcdf Gamma distribution

geopdf, geocdf Geometric distribution

hygepdf, hygecdf Hypergeometric distribution

lognpdf, logncdf Log normal distribution

nbinpdf, nbincdf Negative binomial distribution

ncfpdf, ncfcdf Noncentral F distribution

nctpdf, nctcdf Noncentral t distribution

ncx2pdf, ncx2cdf Noncentral chi-square distribution

normpdf, normcdf Normal distribution

pdf, cdf Probability density/Cumulative distribution

poisspdf, poisscdf Poisson distribution

raylpdf, raylcdf Rayleigh distribution

tpdf, tcdf T distribution

unidpdf, unidcdf Discrete uniform distribution

unifpdf, unifcdf Continuous uniform distribution

weibpdf, weibcdf Weibull distribution

TABLE E.3

Critical Values (inv) and Random Number Generation (rnd) for

Probability Distribution Functions

betainv, betarnd Beta distribution

binoinv, binornd Binomial distribution

chi2inv, chi2rnd Chi-square distribution

expinv, exprnd Exponential distribution

finv, frnd F distribution

gaminv, gamrnd Gamma distribution

geoinv, geornd Geometric distribution

hygeinv, hygernd Hypergeometric distribution

logninv, lognrnd Log normal distribution

nbininv, nbinrnd Negative binomial distribution

ncfinv, ncfrnd Noncentral F distribution

nctinv, nctrnd Noncentral t distribution

ncx2inv, ncx2rnd Noncentral chi-square distribution

norminv, normrnd Normal distribution

poissinv, poissrnd Poisson distribution

raylinv, raylrnd Rayleigh distribution

tinv, trnd T distribution

unidinv, unidrnd Discrete uniform distribution

unifinv, unifrnd Continuous uniform distribution

weibinv, weibrnd Weibull distribution

icdf Specified inverse cdf
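The suffixes in Tables E.1 through E.3 follow a common pattern, which the following minimal sketch illustrates for the normal distribution. It assumes the Statistics Toolbox is installed, and the argument values are illustrative.

```matlab
% Requires the Statistics Toolbox.
p = normpdf(0, 0, 1);       % density of N(0,1) at 0, i.e. 1/sqrt(2*pi)
F = normcdf(1.96, 0, 1);    % P(Z <= 1.96)
q = norminv(0.975, 0, 1);   % 0.975 quantile of N(0,1)
r = normrnd(0, 1, 5, 1);    % five N(0,1) random draws in a column
[m, v] = normstat(2, 3);    % mean and variance of N(2, 3^2)
```

The other distributions in the tables are used the same way, with the parameters appropriate to each distribution.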
