SAS/ETS 9.22 User's Guide


Chapter 23: The SIMILARITY Procedure

Example 23.3: Sliding Similarity Analysis

This example illustrates how to use sliding similarity analysis to compare two time sequences. The SASHELP.WORKERS data set contains two similar time series variables (ELECTRIC and MASONRY), which represent employment over time. The following statements create an example data set that contains two time series of differing lengths, where the variable MASONRY has the first 12 and last 7 observations set to missing to simulate the lack of data associated with the target series:

   data workers;
      set sashelp.workers;
      /* keep MASONRY only from JAN1978 through DEC1981; set it to missing elsewhere */
      if '01JAN1978'D <= date < '01JAN1982'D then masonry = masonry;
      else masonry = .;
   run;

The goal of sliding similarity measures analysis is to find the slide index that corresponds to the most similar subsequence of the input series when compared to the target sequence. The following statements perform sliding similarity analysis on the example data set:

   proc similarity data=workers out=_NULL_ print=(slides summary);
      id date interval=month;
      input electric;
      target masonry / slide=index measure=msqrdev
                       expand=(localabs=3 globalabs=3)
                       compress=(localabs=3 globalabs=3);
   run;

The DATA=WORKERS option specifies that the input data set WORK.WORKERS is to be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The PRINT=(SLIDES SUMMARY) option specifies that the ODS tables related to the sliding similarity measures and their summary be produced. The INPUT statement specifies that the input variable is ELECTRIC. The TARGET statement specifies that the target variable is MASONRY and that the similarity measure be computed using mean squared deviation (MEASURE=MSQRDEV). The SLIDE=INDEX option specifies observation index sliding. The COMPRESS=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute compression to 3. The EXPAND=(LOCALABS=3 GLOBALABS=3) option limits local and global absolute expansion to 3.
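The core idea of observation index sliding can be illustrated outside SAS. The following Python sketch is a simplification, not the procedure's algorithm: it ignores warping (the EXPAND= and COMPRESS= options), uses made-up numbers, and the function names are hypothetical. It slides the target along the input one observation at a time, computes the mean squared deviation at each zero-based slide index, and reports the index with the smallest measure.

```python
def mean_squared_deviation(a, b):
    """Mean of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def slide_measures(input_seq, target_seq):
    """Return (slide_index, measure) for every alignment of the target
    against a same-length window of the input (no warping)."""
    n, m = len(input_seq), len(target_seq)
    return [
        (k, mean_squared_deviation(input_seq[k:k + m], target_seq))
        for k in range(n - m + 1)
    ]


# Hypothetical data: the target resembles the input starting at index 2.
input_seq = [5.0, 7.0, 1.0, 2.0, 3.0, 9.0, 4.0]
target_seq = [1.0, 2.0, 4.0]

measures = slide_measures(input_seq, target_seq)
best_index, best_measure = min(measures, key=lambda pair: pair[1])
print(best_index, best_measure)
```

With warping enabled, the procedure also allows the sequences to be locally stretched or compressed within the stated limits, so its slide count and measures generally differ from this fixed-window version.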


Output 23.3.1 Summary of the Slide Measures

The SIMILARITY Procedure

Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

[Table columns: Slide Index, DATE, Slide Target Sequence Length, Slide Input Sequence Length, Slide Warping Amount, Slide Minimum Measure]

Output 23.3.2 Minimum Measure

Minimum Measure Summary

   Input Variable     MASONRY
   ELECTRIC           322.5460

This analysis results in 23 slides based on the observation index. The minimum measure (322.5460) occurs at slide index 13, which corresponds to the time value FEB1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978. This similarity analysis justifies the belief that ELECTRIC lags MASONRY by one month based on the time series cross-correlation analysis despite the lack of target data (MASONRY).

The goal of seasonal sliding similarity measures is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements repeat the preceding similarity analysis on the example data set with seasonal sliding:

   proc similarity data=workers out=_NULL_ print=(slides summary);
      id date interval=month;
      input electric;
      target masonry / slide=season measure=msqrdev;
   run;

Output 23.3.3 Summary of the Seasonal Slide Measures

The SIMILARITY Procedure

Slide Measures Summary for Input=ELECTRIC and Target=MASONRY

[Table columns: Slide Index, DATE, Slide Target Sequence Length, Slide Input Sequence Length, Slide Warping Amount, Slide Minimum Measure]

Output 23.3.4 Seasonal Minimum Measure

Minimum Measure Summary

   Input Variable     MASONRY
   ELECTRIC           641.9273

The analysis differs from the previous analysis in that the slides are performed based on the seasonal index (SLIDE=SEASON) with no warping. With a seasonality of 12, two seasonal slides are considered at slide indices 0 and 12, with the minimum measure (641.9273) occurring at slide index 12, which corresponds to the time value JAN1978. Note that the original data set SASHELP.WORKERS was modified beginning at the time value JAN1978. This similarity analysis justifies the belief that ELECTRIC and MASONRY have similar seasonal properties based on seasonal decomposition analysis despite the lack of target data (MASONRY).
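Seasonal sliding can be pictured as the same window comparison restricted to slide indices that are multiples of the seasonality. The following Python sketch is an illustration with hypothetical function names and made-up data, not the procedure's implementation. It assumes no warping and assumes that leading and trailing missing target values are trimmed, which for this example would leave 48 target values against 67 input values (12 + 48 + 7).

```python
def mean_squared_deviation(a, b):
    """Mean of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def seasonal_slide_indices(input_len, target_len, seasonality):
    """Zero-based slide indices that are multiples of the seasonality and
    still leave room for the whole target inside the input."""
    return list(range(0, input_len - target_len + 1, seasonality))


def seasonal_slide_measures(input_seq, target_seq, seasonality):
    """Map each seasonal slide index to its mean squared deviation."""
    m = len(target_seq)
    return {
        k: mean_squared_deviation(input_seq[k:k + m], target_seq)
        for k in seasonal_slide_indices(len(input_seq), m, seasonality)
    }


# Lengths taken from the example text: only indices 0 and 12 qualify.
example_indices = seasonal_slide_indices(67, 48, 12)

# Hypothetical series with a seasonality of 4.
input_seq = [1.0, 2.0, 3.0, 2.0, 1.5, 2.5, 3.5, 2.5, 2.0, 3.0, 4.0, 3.0]
target_seq = [2.0, 3.0, 4.0, 3.0]
measures = seasonal_slide_measures(input_seq, target_seq, seasonality=4)
best_index = min(measures, key=measures.get)
print(example_indices, best_index)
```

Under these assumptions the index arithmetic reproduces the two seasonal slides (0 and 12) reported for the MASONRY example.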

Example 23.4: Searching for Historical Analogies

This example illustrates how to search for historical analogies by using seasonal sliding similarity analysis of transactional time-stamped data. The SASHELP.TIMEDATA data set contains the variable VOLUME, which represents activity over time. The following statements create an example data set that contains two time series of differing lengths, where the variable HISTORY represents the historical activity and RECENT represents the more recent activity:

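   data timedata;
      set sashelp.timedata;
      drop volume;
      /* by default the observation belongs to the historical series */
      recent = .;
      history = volume;
      /* from 20AUG2000 onward the observation belongs to the recent series */
      if datetime >= '20AUG2000:00:00:00'DT then do;
         recent = volume;
         history = .;
      end;
   run;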

The goal of seasonal sliding similarity measures is to find the seasonal slide index that corresponds to the most similar seasonal subsequence of the input series when compared to the target sequence. The following statements perform similarity analysis on the example data set with seasonal sliding:

   proc similarity data=timedata out=_NULL_ outsequence=sequences
                   outsum=summary;
      id datetime interval=dtday accumulate=total
                  start='27JUL1997:00:00:00'dt end='21OCT2000:11:59:59'DT;
      input history / normalize=absolute;
      target recent / slide=season normalize=absolute measure=mabsdev;
   run;

The DATA=TIMEDATA option specifies that the input data set WORK.TIMEDATA be used in the analysis. The OUT=_NULL_ option specifies that no output time series data set is to be created. The OUTSEQUENCE=SEQUENCES and OUTSUM=SUMMARY options specify the output sequences and summary data sets, respectively. The ID statement specifies that the time ID variable is DATETIME, which is to be accumulated on a daily basis (INTERVAL=DTDAY) by summing the transactions (ACCUMULATE=TOTAL). The ID statement also specifies that the data is accumulated on the weekly boundaries starting on the week of 27JUL1997 and ending on the week of 15OCT2000 (START='27JUL1997:00:00:00'DT END='21OCT2000:11:59:59'DT). The INPUT statement specifies that the input variable is HISTORY, which is to be normalized using absolute normalization (NORMALIZE=ABSOLUTE). The TARGET statement specifies that the target variable is RECENT, which is to be normalized by using absolute normalization (NORMALIZE=ABSOLUTE) and that the similarity measure be computed by using mean absolute deviation (MEASURE=MABSDEV). The SLIDE=SEASON option specifies season index sliding.

To illustrate the results of the similarity analysis, the output sequence data set must be subset by using the output summary data set:

   /* store the minimum measure for RECENT in the macro variable MEASURE */
   data _NULL_;
      set summary;
      call symput('MEASURE', left(trim(putn(recent,'BEST20.'))));
   run;

   /* keep only the slide whose measure matches the minimum (within a tolerance) */
   data result;
      set sequences;
      by _SLIDE_;
      retain flag 0;
      if first._SLIDE_ then do;
         if (&measure - 0.00001 < _SIM_ < &measure + 0.00001)
            then flag = 1;
      end;
      if flag then output;
      if last._SLIDE_ then flag = 0;
   run;

The following statements generate a cross series plot of the results:

   proc timeseries data=result out=_NULL_ crossplot=series;
      id datetime interval=dtday;
      var _TARSEQ_;
      crossvar _INPSEQ_;
   run;

The cross series plot illustrates that the historical time series analogy most similar to the most recent time series data that started on 20AUG2000 occurred on 02AUG1998.

Output 23.4.1 Cross Series Plot of the Historical Time Series




Chapter 24: The SIMLIN Procedure

Contents

Overview: SIMLIN Procedure
Getting Started: SIMLIN Procedure
   Prediction and Simulation
Syntax: SIMLIN Procedure
   Functional Summary
   PROC SIMLIN Statement
   BY Statement
   ENDOGENOUS Statement
   EXOGENOUS Statement
   ID Statement
   LAGGED Statement
   OUTPUT Statement
Details: SIMLIN Procedure
   Defining the Structural Form
   Computing the Reduced Form
   Dynamic Multipliers
   Multipliers for Higher Order Lags
   EST= Data Set
   DATA= Data Set
   OUTEST= Data Set
   OUT= Data Set
   Printed Output
   ODS Table Names
Examples: SIMLIN Procedure
   Example 24.1: Simulating Klein's Model I
   Example 24.2: Multipliers for a Third-Order System
References

Overview: SIMLIN Procedure

The SIMLIN procedure reads the coefficients for a set of linear structural equations, which are usually produced by the SYSLIN procedure. PROC SIMLIN then computes the reduced form and, if input data are given, uses the reduced form equations to generate predicted values. PROC SIMLIN is especially useful when dealing with sets of structural difference equations. The SIMLIN procedure can perform simulation or forecasting of the endogenous variables.

The SIMLIN procedure can be applied only to models that are:

linear with respect to the parameters

linear with respect to the variables

square (as many equations as endogenous variables)

nonsingular (the coefficients of the endogenous variables form an invertible matrix)

Getting Started: SIMLIN Procedure

The SIMLIN procedure processes the coefficients in a data set created by the SYSLIN procedure using the OUTEST= option or by another regression procedure such as PROC REG. To use PROC SIMLIN, you must first produce the coefficient data set and then specify this data set on the EST= option of the PROC SIMLIN statement. You must also tell PROC SIMLIN which variables are endogenous and which variables are exogenous. List the endogenous variables in an ENDOGENOUS statement, and list the exogenous variables in an EXOGENOUS statement.

The following example illustrates the creation of an OUTEST= data set with PROC SYSLIN and the computation and printing of the reduced form coefficients for the model with PROC SIMLIN.

   proc syslin data=in outest=e;
      model y1 = y2 x1;
      model y2 = y1 x2;
   run;

   proc simlin est=e;
      endogenous y1 y2;
      exogenous x1 x2;
   run;
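To make the notion of a reduced form concrete, the following Python sketch works through a two-equation model of the same shape as the one above. It is an illustration only: the coefficient values are made up, intercepts are omitted, and the matrix layout (G times y equals B times x) and function name are assumptions for this sketch rather than the procedure's internal representation. The reduced form expresses each endogenous variable in terms of exogenous variables alone, which requires the endogenous coefficient matrix to be invertible.

```python
def reduced_form_2x2(G, B):
    """Solve G*y = B*x for y, returning Pi = inverse(G) * B for 2x2 matrices."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    if det == 0:
        raise ValueError("endogenous coefficient matrix is singular")
    G_inv = [
        [G[1][1] / det, -G[0][1] / det],
        [-G[1][0] / det, G[0][0] / det],
    ]
    return [
        [sum(G_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


# Hypothetical structural coefficients:
#   y1 = a12*y2 + b11*x1
#   y2 = a21*y1 + b22*x2
a12, b11 = 0.5, 2.0
a21, b22 = 0.25, 1.0

# Move the endogenous terms to the left-hand side: G*y = B*x.
G = [[1.0, -a12], [-a21, 1.0]]
B = [[b11, 0.0], [0.0, b22]]

Pi = reduced_form_2x2(G, B)

# Predicted endogenous values for one set of exogenous values.
x1, x2 = 1.0, 1.0
y1 = Pi[0][0] * x1 + Pi[0][1] * x2
y2 = Pi[1][0] * x1 + Pi[1][1] * x2
print(Pi, y1, y2)
```

The values of y1 and y2 computed from the reduced form satisfy both structural equations simultaneously, which is the property that makes the reduced form usable for prediction.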

If the model contains lagged endogenous variables, you must also use a LAGGED statement to tell PROC SIMLIN which variables contain lagged values, which endogenous variables they are lags of, and the number of periods of lagging. For dynamic models, the TOTAL and INTERIM= options can be used on the PROC SIMLIN statement to compute and print total and impact multipliers. (See "Dynamic Multipliers" later in this section for an explanation of multipliers.)

In the following example, the variables Y1LAG1, Y2LAG1, and Y2LAG2 contain lagged values of the endogenous variables Y1 and Y2. Y1LAG1 and Y2LAG1 contain values of Y1 and Y2 for the previous observation, while Y2LAG2 contains 2 period lags of Y2. The LAGGED statement specifies the lagged relationships, and the TOTAL and INTERIM= options request multiplier analysis. The INTERIM=2 option prints matrices showing the impact that changes to the exogenous variables have on the endogenous variables after 1 and 2 periods.

   data in;
      set in;
      y1lag1 = lag(y1);
      y2lag1 = lag(y2);
      y2lag2 = lag2(y2);
   run;

   proc syslin data=in outest=e;
      model y1 = y2 y1lag1 y2lag2 x1;
      model y2 = y1 y2lag1 x2;
   run;

   proc simlin est=e total interim=2;
      endogenous y1 y2;
      exogenous x1 x2;
      lagged y1lag1 y1 1 y2lag1 y2 1 y2lag2 y2 2;
   run;

After the reduced form of the model is computed, the model can be simulated by specifying an input data set on the PROC SIMLIN statement and using an OUTPUT statement to write the simulation results to an output data set. The following example modifies the PROC SIMLIN step from the preceding example to simulate the model and stores the results in an output data set.

   proc simlin est=e total interim=2 data=in;
      endogenous y1 y2;
      exogenous x1 x2;
      lagged y1lag1 y1 1 y2lag1 y2 1 y2lag2 y2 2;
      output out=sim predicted=y1hat y2hat
                     residual=y1resid y2resid;
   run;

Prediction and Simulation

If an input data set is specified with the DATA= option in the PROC SIMLIN statement, the procedure reads the data and uses the reduced form equations to compute predicted and residual values for each of the endogenous variables. (If no data set is specified with the DATA= option, no simulation of the system is performed, and only the reduced form and multipliers are computed.)

The character of the prediction is based on the START= value. Until PROC SIMLIN encounters the START= observation, actual endogenous values are found and fed into the lagged endogenous terms. Once the START= observation is reached, dynamic simulation begins, where predicted values are fed into lagged endogenous terms until the end of the data set is reached.
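The difference between feeding actual and predicted values into the lagged terms can be sketched in Python with a single-equation model, y(t) = a*y(t-1) + b*x(t). This is a simplified illustration with made-up data and a hypothetical function name, not the procedure's implementation. It assumes zero-based observation numbers and assumes that the start observation itself still uses the actual lagged value, since no earlier prediction is fed forward before that point.

```python
def simulate(actual_y, x, a, b, start):
    """Predict y(t) = a*y(t-1) + b*x(t).

    Up to and including observation `start` (zero-based), the lagged term
    uses the actual value of y. After `start`, it uses the previous
    predicted value (dynamic simulation).
    """
    predicted = [None]  # the first observation has no lagged value
    for t in range(1, len(actual_y)):
        if t <= start:
            lag = actual_y[t - 1]   # actual endogenous value
        else:
            lag = predicted[t - 1]  # predicted value fed back in
        predicted.append(a * lag + b * x[t])
    return predicted


# Hypothetical data.
actual_y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [0.0, 1.0, 1.0, 1.0, 1.0]

dynamic = simulate(actual_y, x, a=0.5, b=1.0, start=2)
one_step = simulate(actual_y, x, a=0.5, b=1.0, start=len(actual_y) - 1)
print(dynamic)
print(one_step)
```

In the dynamic run, prediction errors carry forward from one observation to the next after the start point; in the one-step run, each prediction is anchored to the actual previous value.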

The predicted and residual values generated here are different from those produced by the SYSLIN procedure since PROC SYSLIN uses the structural form with actual endogenous values. The
