Center for Teaching, Research & Learning
Social Science Research Lab
American University, Washington, D.C.
http://www.american.edu/provost/ctrl/
202-885-3862
Stata & Time series
Stata is a general-purpose statistical software package. Stata's full range of capabilities includes data management, statistical analysis, graphics, simulations, and custom programming.
Course Objective
This course is designed to give a basic understanding of some of the features available in Stata for time series analysis. Time series data are observations on one or more variables recorded at successive points in time. For this tutorial, we are going to use the “Time series.dta” data set, which contains the following variables: date, unemployment, consumer price index (CPI), interest rate, and GDP growth. “Time series.dta” contains one observation for each quarter from 1960q1 to 2005q1 (181 observations).
Learning Outcomes
1 Opening the data set and data description
2 Declaring the data to be Time Series
3 Useful time series commands
4 Autocorrelation and cross-correlation analysis
5 Unit Root test
1 Opening the data set and data description
We recommend that you create a log file before you start working in Stata. This way, you will have a record of all your computations to review afterwards.
To do this, go to: File > Log > Begin. This file will record all the input that you type, as well as all the output produced by Stata. Alternatively, you can start the log from the command window with the log using command.
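As a minimal sketch of the command-window route, a log can be started and closed as follows. The file name timeseries_tutorial.log is a hypothetical choice, not one given in this guide. The replace option overwrites any earlier file of that name, and text writes plain text rather than Stata's SMCL format.

```stata
* Start a plain-text log; the file name is an illustrative assumption
log using "timeseries_tutorial.log", replace text

* ... run the commands you want recorded ...

* Close the log when the session is finished
log close
```

Everything typed between log using and log close, together with its output, ends up in the file.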
Opening the data file
For this tutorial, we will use Time series.dta, which can be downloaded from:
http://www.american.edu/provost/ctrl/trainingguides.cfm
In Stata 11 and earlier versions, before you open the dataset, you may need to set the memory size. (In this instance, this isn’t necessary, as the example dataset is relatively small and does not require a lot of memory.) To tell Stata how much memory to set aside for data, type:
set mem 100m
(This command is not needed in Stata 12 or later.)
Once you have downloaded and unzipped the dataset, you can open it by going to: File > Open.
Alternatively, you can type:
use "C:\Users\CTRL\Desktop\Time series.dta", clear
Here the clear option has been appended. It clears any data currently in Stata’s memory, allowing you to open a new dataset.
To get a sense of what the data file contains, we can use two commands: summarize and describe. Both provide useful information about our data set and variables.
summarize calculates and displays a variety of univariate summary statistics. If no variable list is specified, summary statistics are calculated for all the variables in the dataset.
describe produces a summary of the dataset in memory or of the data stored in a Stata-format dataset.
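Both commands also accept a variable list and options. The sketch below assumes Time series.dta is already open and restricts the output to two of its variables. The detail option adds percentiles, skewness, and kurtosis to the summarize output.

```stata
* Assumes the tutorial dataset is already in memory
* Extended summary statistics for two variables only
summarize unemp cpi, detail

* Storage type, display format, and label for the same variables
describe unemp cpi
```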
Example using “Time series.dta”
summarize
Variable     Obs       Mean    Std. Dev.         Min        Max
unemp        181   5.914917    1.453928          3.4   10.66667
cpi          181   95.91184    54.13317     29.39667   192.1667
interest     181   6.167403      3.3706          .98       19.1
gdp          181   2.031231    2.001162    -1.703726   9.718504
datevar      181         90    52.39434            0        180
describe
Contains data from C:\Users\CTRL\Desktop\Time series.dta
  obs:           181
 vars:             5                          12 Oct 2011 10:00
 size:         3,620
                storage  display    value
variable name   type     format     label     variable label
unemp           float    %9.0g                Unemployment Rate
cpi             float    %9.0g                Consumer Price Index
interest        float    %9.0g                Federal Funds Interest Rate
gdp             float    %9.0g                GDP annual growth
datevar         float    %tq                  Date variable
Sorted by: datevar
Note: dataset has changed since last saved
2 Declaring the data to be Time Series
Using the time variable “datevar”, we can declare the data to be a time series in order to use the time series operators.
Using the tsset command
tsset declares the data in memory to be a time series. tssetting the data is what makes Stata's time-series operators, such as L and F (lag and lead), work. You must also tsset the data before using the other time-series commands. If you save the data after tsset, Stata will remember that the data are a time series, and you will not have to tsset again.
Example using “Time series.dta”
tsset datevar
        time variable:  datevar, 1960q1 to 2005q1
                delta:  1 quarter
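If a dataset does not already contain a formatted time variable like datevar, one can be built before calling tsset. The following is a sketch on a small artificial dataset. The variable name qdate and the eight-observation size are assumptions made purely for illustration and are not part of Time series.dta.

```stata
* Build an empty dataset with 8 observations (illustrative size)
clear
set obs 8

* Quarterly dates: start at 1960q1 and advance one quarter per row
generate qdate = tq(1960q1) + _n - 1
format qdate %tq

* Declare the data to be a quarterly time series
tsset qdate
list
```

Because qdate carries the %tq display format, tsset treats the unit of time as one quarter, just as it does for datevar.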
3 Useful Time Series commands
In this section, we introduce a few basic but very helpful commands.
tin (times in, from time A to time B, including both end points) function:
list datevar unemp if tin(2000q1,2000q4)
       datevar      unemp
161     2000q1   4.033333
162     2000q2   3.933333
163     2000q3          4
164     2000q4        3.9
twithin (times within time A and time B, excluding the two end points) function:
list datevar unemp if twithin(2001q1,2001q3)
       datevar      unemp
166     2001q2        4.4
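tin() and twithin() are not limited to list; they can appear in the if qualifier of other commands. The sketch below assumes the dataset path used earlier in this tutorial. The choice of summarize and count, and the 1990s window, are illustrative.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Summary statistics for the 1990s only (both end points included)
summarize unemp if tin(1990q1, 1999q4)

* Count the quarters strictly between 2001q1 and 2001q3
count if twithin(2001q1, 2001q3)
```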
Generating values based on past observations using the lag operator (L) and forward-looking values using the lead operator (F):
generate unempL1=L1.unemp
generate unempL2=L2.unemp
list datevar unemp unempL1 unempL2 in 1/5
     datevar      unemp    unempL1    unempL2
1     1960q1   5.133333          .          .
2     1960q2   5.233333   5.133333          .
3     1960q3   5.533333   5.233333   5.133333
4     1960q4   6.266667   5.533333   5.233333
5     1961q1        6.8   6.266667   5.533333
generate unempF1=F1.unemp
generate unempF2=F2.unemp
list datevar unemp unempF1 unempF2 in 1/5
     datevar      unemp    unempF1    unempF2
1     1960q1   5.133333   5.233333   5.533333
2     1960q2   5.233333   5.533333   6.266667
3     1960q3   5.533333   6.266667        6.8
4     1960q4   6.266667        6.8          7
5     1961q1        6.8          7   6.766667
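The operators can also be used directly inside other commands, without first generating new variables. The sketch below assumes the dataset path used earlier in this tutorial. The regression is only an illustration of the syntax and is not a model proposed in this guide.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* List the current value next to its first lag and first lead
list datevar unemp L.unemp F.unemp in 1/5

* L(1/2).unemp is shorthand for L1.unemp L2.unemp
regress unemp L(1/2).unemp
```

Using the operators this way keeps the dataset free of helper variables such as unempL1 and unempL2.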
To generate the difference between current and previous values, use the D operator. The transformations are as follows: D1 = Y_t - Y_{t-1} and D2 = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}).
generate unempD1=D1.unemp
generate unempD2=D2.unemp
list datevar unemp unempD1 unempD2 in 1/5
     datevar      unemp    unempD1    unempD2
1     1960q1   5.133333          .          .
2     1960q2   5.233333   .0999999          .
3     1960q3   5.533333   .3000002   .2000003
4     1960q4   6.266667   .7333336   .4333334
5     1961q1        6.8   .5333333  -.2000003
4 Autocorrelation and cross-correlation analysis
In this section, we show you how to explore autocorrelation and cross-correlation.
Autocorrelation represents the correlation between a variable and its own previous values; to explore it, use the ac and pac commands. To explore the relationship between two time series, use the xcorr command, making sure that you always list the independent variable first and the dependent variable second.
ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals that are based on Bartlett's formula for MA(q) processes.
pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals calculated using a standard error of 1/sqrt(n). The residual variances for each lag may optionally be included on the graph.
xcorr plots the sample cross-correlation function.
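When numbers are more convenient than graphs, the autocorrelations and partial autocorrelations can be tabulated with corrgram. This command is not covered elsewhere in this guide. The sketch assumes the dataset path used earlier in this tutorial and the same ten lags used in the graphs.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Table of autocorrelations, partial autocorrelations, and Q statistics
corrgram unemp, lags(10)
```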
Example using “Time series.dta”
ac unemp, lags(10)
[Figure: correlogram of unemp; x-axis: Lag; Bartlett's formula for MA(q) 95% confidence bands]
In this case, the autocorrelation graph indicates that unemployment is correlated with up to eight of its previous quarters.
pac unemp, lags(10)
[Figure: partial correlogram of unemp; x-axis: Lag; 95% confidence bands, se = 1/sqrt(n)]
xcorr gdp unemp
[Figure: cross-correlogram of gdp and unemp; x-axis: Lag]
The graph above indicates that GDP is negatively correlated with unemployment at lags of six to nine months (two to three quarters).
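To read the cross-correlations as numbers rather than from the graph, xcorr offers a table option. The sketch below assumes the dataset path used earlier in this tutorial. The choice of ten lags is an assumption made for illustration.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Print the cross-correlations for lags -10 to 10 as a table
xcorr gdp unemp, lags(10) table
```

As in the graph, gdp is listed first because it is treated as the independent variable.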
5 Unit Root test
In this section, we demonstrate how to evaluate whether a series has a unit root.
When working with time series data sets, it is important to check for a unit root. A series with a unit root is non-stationary: it contains a stochastic trend, so shocks to the series have permanent effects.
Let’s look at unemployment across time and test for a unit root.
line unemp datevar
[Figure: line plot of unemp against datevar; x-axis: Date variable, 1960q1 to 2005q1]
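Once the data have been tsset, the same plot can be drawn with tsline, which uses the declared time variable for the x-axis. This is offered as an alternative sketch; it assumes the dataset path used earlier in this tutorial, and the graph title is an illustrative assumption.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Time-series line plot; the x-axis is taken from tsset
tsline unemp, title("Unemployment rate")
```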
To assess whether the series has a unit root, we can use the augmented Dickey-Fuller test, which examines the series for a stochastic trend, using the following command:
dfuller unemp, lag(5)
Augmented Dickey-Fuller test for unit root          Number of obs = 175
                        Interpolated Dickey-Fuller
          Test          1% Critical    5% Critical    10% Critical
          Statistic     Value          Value          Value
Z(t)      -2.481        -3.485         -2.885         -2.575
MacKinnon approximate p-value for Z(t) = 0.1201
In this case, the null hypothesis is that unemployment has a unit root. The test statistic Z(t) is smaller in absolute value than the critical values (e.g. |-2.481| < |-3.485| at the 1% level), so it falls within the acceptance region and we cannot reject the null hypothesis: unemployment appears to have a unit root.
When we test the first difference of unemployment, we find that it does not have a unit root, because the test statistic exceeds every critical value in absolute value (e.g. |-4.593| > |-3.485|):
dfuller unempD1, lag(5)
Augmented Dickey-Fuller test for unit root          Number of obs = 174
                        Interpolated Dickey-Fuller
          Test          1% Critical    5% Critical    10% Critical
          Statistic     Value          Value          Value
Z(t)      -4.593        -3.485         -2.885         -2.575
MacKinnon approximate p-value for Z(t) = 0.0001
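Because dfuller accepts time-series operators, the first difference can be tested without creating unempD1 beforehand. The sketch below assumes the dataset path used earlier in this tutorial. The second command, with the trend and regress options, is an illustrative variation and not a specification recommended by this guide.

```stata
* Load and declare the tutorial data (path as used earlier in this guide)
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Test the first difference directly using the D operator
dfuller D.unemp, lags(5)

* Variation: include a time trend and display the underlying regression
dfuller unemp, lags(5) trend regress
```

The first command should reproduce the test on unempD1, since unempD1 was defined as D1.unemp.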