Process or Product Monitoring and Control
6.4 Introduction to Time Series Analysis
6.4.4 Univariate Time Series Models
6.4.4.3 Seasonality
Seasonality
Many time series display seasonality. By seasonality, we mean periodic fluctuations. For example, retail sales tend to peak for the Christmas season and then decline after the holidays. So time series of retail sales will typically show increasing sales from September through December and declining sales in January and February.
Seasonality is quite common in economic time series. It is less common in engineering and scientific data.
If seasonality is present, it must be incorporated into the time series model. In this section, we discuss techniques for detecting seasonality. We defer modeling of seasonality until later sections.
Detecting Seasonality
The following graphical techniques can be used to detect seasonality:
1. A run sequence plot will often show seasonality.
2. A seasonal subseries plot is a specialized technique for showing seasonality.
3. Multiple box plots can be used as an alternative to the seasonal subseries plot to detect seasonality.
4. The autocorrelation plot can help identify seasonality.
Examples of each of these plots will be shown below.
The run sequence plot is a recommended first step for analyzing any time series. Although seasonality can sometimes be indicated with this plot, seasonality is shown more clearly by the seasonal subseries plot or the box plot. The seasonal subseries plot does an excellent job of showing both the seasonal differences (between-group patterns) and also the within-group patterns. The box plot shows the seasonal difference (between-group patterns) quite well, but it does not show within-group patterns. However, for large data sets, the box plot is usually easier to read than the seasonal subseries plot.
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm (1 of 5) [5/1/2006 10:35:20 AM]
Both the seasonal subseries plot and the box plot assume that the seasonal periods are known. In most cases, the analyst will in fact know this. For example, for monthly data, the period is 12 since there are 12 months in a year. However, if the period is not known, the autocorrelation plot can help. If there is significant seasonality, the autocorrelation plot should show spikes at lags equal to the period. For example, for monthly data, if there is a seasonality effect, we would expect to see significant peaks at lags 12, 24, 36, and so on (although the intensity may decrease the further out we go).
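As a sketch of this check, the sample autocorrelation of a hypothetical monthly series with a period-12 cycle shows large spikes at lags 12, 24, and 36; the data and amplitudes below are made up purely for illustration.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags of a 1-D series."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xd * xd)
    return np.array([np.sum(xd[:-k] * xd[k:]) / denom
                     for k in range(1, nlags + 1)])

# Hypothetical monthly series: an annual (period-12) cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(240)                      # 20 years of monthly data
series = np.sin(2 * np.pi * months / 12) + 0.3 * rng.standard_normal(240)

r = sample_acf(series, nlags=36)
# With a period-12 seasonal effect, the ACF peaks at lags 12, 24, 36.
print(r[11], r[23], r[35])
```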
Example without Seasonality
The following plots are from a data set of southern oscillations for predicting El Niño.
Run Sequence Plot
No obvious periodic patterns are apparent in the run sequence plot.
Seasonal Subseries Plot
The means for each month are relatively close and show no obvious pattern.
Box Plot
As with the seasonal subseries plot, no obvious seasonal pattern is apparent. Due to the rather large number of observations, the box plot shows the difference between months better than the seasonal subseries plot.
Example with Seasonality
The following plots are from a data set of monthly CO2 concentrations. A linear trend has been removed from these data.
Run Sequence Plot
This plot shows periodic behavior. However, it is difficult to determine the nature of the seasonality from this plot.
Seasonal Subseries Plot
The seasonal subseries plot shows the seasonal pattern more clearly. In this case, the CO2 concentrations are at a minimum in September and October. From there, the concentrations steadily increase until June and then begin declining until September.
Box Plot
As with the seasonal subseries plot, the seasonal pattern is quite evident in the box plot.
This plot allows you to detect both between-group and within-group patterns. If there is a large number of observations, then a box plot may be preferable.
Definition
Seasonal subseries plots are formed by:
Vertical axis: Response variable
Horizontal axis: Time ordered by season. For example, with monthly data, all the January values are plotted (in chronological order), then all the February values, and so on.
In addition, a reference line is drawn at the group means.
The user must specify the length of the seasonal pattern before generating this plot. In most cases, the analyst will know this from the context of the problem and data collection.
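The grouping just described can be sketched in a few lines of Python. The data below are hypothetical; the code only arranges the subseries and computes the group means that the plot would display as reference lines.

```python
import numpy as np

# Hypothetical 4 years of monthly data with a seasonal pattern.
rng = np.random.default_rng(1)
n_years, period = 4, 12
y = np.tile(np.sin(2 * np.pi * np.arange(period) / period), n_years)
y = y + 0.1 * rng.standard_normal(n_years * period)

# Group the series by season: row = year, column = month, so column j
# holds every January (j=0), February (j=1), ... in chronological order.
subseries = y.reshape(n_years, period)

# The reference line for each month is the group mean.
group_means = subseries.mean(axis=0)

# Plot order for the seasonal subseries plot: all Januaries first,
# then all Februaries, and so on (the transpose, flattened).
plot_order = subseries.T.ravel()
print(group_means.round(2))
```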
Questions
The seasonal subseries plot can provide answers to the following questions:
1. Do the data exhibit a seasonal pattern?
2. What is the nature of the seasonality?
3. Is there a within-group pattern (e.g., do January and July exhibit similar patterns)?
4. Are there any outliers once seasonality has been accounted for?
Importance
It is important to know when analyzing a time series if there is a significant seasonality effect. The seasonal subseries plot is an excellent tool for determining if there is a seasonal pattern.
Related Techniques
Box Plot
Run Sequence Plot
Autocorrelation Plot
Software
Seasonal subseries plots are available in a few general purpose statistical software programs. They are available in Dataplot. It may be possible to write macros to generate this plot in most statistical software programs that do not provide it directly.
6.4.4.3.1 Seasonal Subseries Plot
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4431.htm (2 of 2) [5/1/2006 10:35:20 AM]
Autoregressive (AR) Models
A common approach for modeling univariate time series is the autoregressive (AR) model:

    X_t = \delta + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + A_t,

where X_t is the time series, A_t is white noise, and

    \delta = \left(1 - \sum_{i=1}^{p} \phi_i\right) \mu,

with \mu denoting the process mean.
An autoregressive model is simply a linear regression of the current value of the series against one or more prior values of the series. The value of p is called the order of the AR model.
AR models can be analyzed with one of various methods, including standard linear least squares techniques. They also have a straightforward interpretation.
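A minimal sketch of the least squares approach, using only NumPy; the simulated series and coefficients below are illustrative, not from the Handbook's data sets.

```python
import numpy as np

def fit_ar(x, p):
    """Fit an AR(p) model by ordinary least squares.

    Regresses x[t] on (1, x[t-1], ..., x[t-p]); returns (intercept, phi).
    """
    x = np.asarray(x, dtype=float)
    # Design matrix: an intercept column plus the p lagged-value columns.
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - i:len(x) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef[0], coef[1:]

# Recover known coefficients from a simulated AR(2) series.
rng = np.random.default_rng(2)
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()

intercept, phi_hat = fit_ar(x, p=2)
print(intercept, phi_hat)
```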
Moving Average (MA) Models
Another common approach for modeling univariate time series is the moving average (MA) model:

    X_t = \mu + A_t - \theta_1 A_{t-1} - \theta_2 A_{t-2} - \cdots - \theta_q A_{t-q},

where X_t is the time series, \mu is the mean of the series, the A_{t-i} are white noise terms, and \theta_1, \ldots, \theta_q are the parameters of the model. The value of q is called the order of the MA model.
That is, a moving average model is conceptually a linear regression of the current value of the series against the white noise or random shocks of one or more prior values of the series. The random shocks at each point are assumed to come from the same distribution, typically a normal distribution, with location at zero and constant scale. The distinction in this model is that these random shocks are propagated to future values of the time series. Fitting the MA estimates is more complicated than with AR models because the error terms are not observable. This means that iterative non-linear fitting procedures need to be used in place of linear least squares. MA models also have a less obvious interpretation than AR models.
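To illustrate why the unobservable error terms force an iterative fit, the following sketch reconstructs the shocks recursively and minimizes the conditional sum of squares by a simple grid search, a stand-in for the non-linear optimizers real packages use. The MA(1) series and the value theta = 0.7 are made up for illustration.

```python
import numpy as np

def ma1_css(x, theta):
    """Conditional sum of squares for MA(1): x[t] = mu + a[t] - theta*a[t-1].

    The shocks a[t] are not observed, so they are reconstructed
    recursively from the data (the pre-sample shock is taken as zero).
    """
    mu = np.mean(x)
    prev = 0.0
    total = 0.0
    for xt in x:
        prev = xt - mu + theta * prev    # a[t] = x[t] - mu + theta*a[t-1]
        total += prev * prev
    return total

# Simulate an MA(1) series with theta = 0.7 and recover it by
# searching the CSS objective over a grid of candidate values.
rng = np.random.default_rng(3)
a = rng.standard_normal(2001)
x = a[1:] - 0.7 * a[:-1]                 # mu = 0

grid = np.linspace(-0.95, 0.95, 191)
theta_hat = grid[np.argmin([ma1_css(x, th) for th in grid])]
print(theta_hat)
```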
Sometimes the ACF and PACF will suggest that an MA model would be a better model choice, and sometimes both AR and MA terms should be used in the same model (see Section 6.4.4.5).
Note, however, that the error terms after the model is fit should be independent and follow the standard assumptions for a univariate process.
6.4.4.4 Common Approaches to Univariate Time Series
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc444.htm (2 of 3) [5/1/2006 10:35:21 AM]
Box-Jenkins Approach
Box and Jenkins popularized an approach that combines the moving average and the autoregressive approaches in the book "Time Series Analysis: Forecasting and Control" (Box, Jenkins, and Reinsel, 1994).
Although both autoregressive and moving average approaches were already known (and were originally investigated by Yule), the contribution of Box and Jenkins was in developing a systematic methodology for identifying and estimating models that could incorporate both approaches. This makes Box-Jenkins models a powerful class of models. The next several sections will discuss these models in detail.
Stages in Box-Jenkins Modeling
There are three primary stages in building a Box-Jenkins time series model:
1. Model Identification
2. Model Estimation
3. Model Validation
Remarks
The following remarks regarding Box-Jenkins models should be noted:
1. Box-Jenkins models are quite flexible due to the inclusion of both autoregressive and moving average terms.
2. Based on the Wold decomposition theorem (not discussed in the Handbook), a stationary process can be approximated by an ARMA model. In practice, finding that approximation may not be easy.
3. Chatfield (1996) recommends decomposition methods for series in which the trend and seasonal components are dominant.
4. Building good ARIMA models generally requires more experience than commonly used statistical methods such as regression.
Sufficiently Long Series Required
Typically, effective fitting of Box-Jenkins models requires at least a moderately long series. Chatfield (1996) recommends at least 50 observations. Many others would recommend at least 100 observations.
6.4.4.5 Box-Jenkins Models
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc445.htm (2 of 2) [5/1/2006 10:35:21 AM]
Identify p and q
Once stationarity and seasonality have been addressed, the next step is to identify the order (i.e., the p and q) of the autoregressive and moving average terms.
Autocorrelation and Partial Autocorrelation Plots
The primary tools for doing this are the autocorrelation plot and the partial autocorrelation plot. The sample autocorrelation plot and the sample partial autocorrelation plot are compared to the theoretical behavior of these plots when the order is known.
Order of Autoregressive Process (p)
Specifically, for an AR(1) process, the sample autocorrelation function should have an exponentially decreasing appearance. However, higher-order AR processes are often a mixture of exponentially decreasing and damped sinusoidal components.
For higher-order autoregressive processes, the sample autocorrelation needs to be supplemented with a partial autocorrelation plot. The partial autocorrelation of an AR(p) process becomes zero at lag p+1 and greater, so we examine the sample partial autocorrelation function to see if there is evidence of a departure from zero. This is usually determined by placing a 95% confidence interval on the sample partial autocorrelation plot (most software programs that generate sample autocorrelation plots will also plot this confidence interval). If the software program does not generate the confidence band, it is approximately ±2/√N, with N denoting the sample size.
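A sketch of this procedure, computing the sample partial autocorrelations with the Durbin-Levinson recursion and comparing them to the ±2/√N band; the AR(2) series is simulated purely for illustration.

```python
import numpy as np

def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xd * xd)
    r = np.array([1.0] + [np.sum(xd[:-k] * xd[k:]) / denom
                          for k in range(1, nlags + 1)])
    pacf = np.zeros(nlags + 1)
    phi = np.zeros((nlags + 1, nlags + 1))
    pacf[1] = phi[1, 1] = r[1]
    for k in range(2, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k] = phi[k, k]
    return pacf[1:]

# For an AR(2) series the sample PACF should be large at lags 1 and 2
# and fall inside the +/- 2/sqrt(N) band from lag 3 on.
rng = np.random.default_rng(4)
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()

p = sample_pacf(x, nlags=6)
band = 2 / np.sqrt(n)
print(p.round(3), band)
```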
Order of Moving Average Process (q)
The autocorrelation function of an MA(q) process becomes zero at lag q+1 and greater, so we examine the sample autocorrelation function to see where it essentially becomes zero. We do this by placing the 95% confidence interval for the sample autocorrelation function on the sample autocorrelation plot. Most software that can generate the autocorrelation plot can also generate this confidence interval.
The sample partial autocorrelation function is generally not helpful for identifying the order of the moving average process.
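A sketch of the corresponding check for an MA process: the sample autocorrelation of a simulated MA(1) series shows a single significant spike at lag 1 and stays within the ±2/√N band thereafter. Series and parameter are illustrative.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags of a 1-D series."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xd * xd)
    return np.array([np.sum(xd[:-k] * xd[k:]) / denom
                     for k in range(1, nlags + 1)])

# Simulate an MA(1) series: x[t] = a[t] - 0.7*a[t-1].
rng = np.random.default_rng(5)
n = 5000
a = rng.standard_normal(n + 1)
x = a[1:] - 0.7 * a[:-1]

r = sample_acf(x, nlags=6)
band = 2 / np.sqrt(n)
# The lag-1 autocorrelation is near -theta/(1 + theta^2); later lags
# should sit inside the confidence band, suggesting q = 1.
print(r.round(3), band)
```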
6.4.4.6 Box-Jenkins Model Identification
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc446.htm (2 of 4) [5/1/2006 10:35:27 AM]
Shape of Autocorrelation Function
The following table summarizes how we use the sample autocorrelation function for model identification.
Exponential, decaying to zero: autoregressive model; use the partial autocorrelation plot to identify the order of the autoregressive model.
Alternating positive and negative, decaying to zero: autoregressive model; use the partial autocorrelation plot to help identify the order.
One or more spikes, rest are essentially zero: moving average model; order identified by where the plot becomes zero.
Decay starting after a few lags: mixed autoregressive and moving average model.
All zero or close to zero: data are essentially random.
High values at fixed intervals: include a seasonal autoregressive term.
No decay to zero: series is not stationary.
Mixed Models Difficult to Identify
In practice, the sample autocorrelation and partial autocorrelation functions are random variables and will not give the same picture as the theoretical functions. This makes model identification more difficult. In particular, mixed models can be particularly difficult to identify.
Although experience is helpful, developing good models using these sample plots can involve much trial and error. For this reason, in recent years information-based criteria such as FPE (Final Prediction Error) and AIC (Akaike Information Criterion) and others have been preferred and used. These techniques can help automate the model identification process. They require computer software to use. Fortunately, these techniques are available in many commercial statistical software programs that provide ARIMA modeling capabilities.
For additional information on these techniques, see Brockwell and Davis (1987, 2002).
Examples
We show a typical series of plots for performing the initial model identification for:
1. the southern oscillations data, and
2. the CO2 monthly concentrations data.
Seasonal Subseries Plot
The seasonal subseries plot indicates that there is no significant seasonality.
Since the above plots show that this series does not exhibit any significant non-stationarity or seasonality, we generate the autocorrelation and partial autocorrelation plots of the raw data.
Autocorrelation Plot
The autocorrelation plot shows a mixture of exponentially decaying and damped sinusoidal components. This indicates that an autoregressive model, with order greater than one, may be appropriate for these data. The partial autocorrelation plot should be examined to determine the order.
6.4.4.6.1 Model Identification for Southern Oscillations Data
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4461.htm (2 of 3) [5/1/2006 10:35:28 AM]
Partial Autocorrelation Plot
The partial autocorrelation plot suggests that an AR(2) model might be appropriate.
In summary, our initial attempt would be to fit an AR(2) model with no seasonal terms and no differencing or trend removal. Model validation should be performed before accepting this as a final model.