Stata time series fall 2011


Center for Teaching, Research & Learning

Social Science Research Lab, American University, Washington, D.C.

http://www.american.edu/provost/ctrl/

202-885-3862

Stata & Time series

Stata is a general-purpose statistical software package. Its capabilities include data management, statistical analysis, graphics, simulations, and custom programming.

Course Objective

This course is designed to give a basic understanding of some of the features available in Stata for time series analysis. Time series data are observations on one or more variables recorded over time. For this tutorial we use the “Time series.dta” data set, which contains the following variables: date, unemployment, consumer price index (CPI), interest rate, and GDP growth. “Time series.dta” contains observations for each quarter from 1960 to 2005.

Learning Outcomes

1. Opening the data set and data description
2. Declaring the data to be Time Series
3. Useful time series commands
4. Autocorrelation and cross-correlation analysis
5. Unit Root test

1. Opening the data set and data description

We recommend that you create a log file before you start working in Stata. This way you will have a record of all your computations to review afterwards.

To do this, go to: File > Log > Begin. This file will record all the input that you type, as well as all the output produced by Stata. Alternatively, you can type (in the command window):
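A minimal sketch using Stata's log using command. The file name timeseries_session.log is a placeholder chosen for illustration, so substitute your own path:

```stata
* Hypothetical file name; Stata writes it to the current working directory
* text     = plain-text log instead of the default SMCL format
* replace  = overwrite the file if it already exists
log using "timeseries_session.log", text replace

* ... run your commands here ...

* Close the log when you are done
log close
```

If you prefer to add to an existing log rather than overwrite it, the append option can be used in place of replace.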


Opening the data file. For this tutorial, we will use Time series.dta, which can be downloaded from:

http://www.american.edu/provost/ctrl/trainingguides.cfm

In Stata 11 and earlier versions, you may need to set the memory size before you open the dataset. (In this instance, this isn’t necessary, as the example dataset is relatively small and does not require a lot of memory.) To tell Stata how much memory to set aside for data, type:

set mem 100m

(This command is not needed in Stata 12 or later, which manages memory automatically.)

Once you have downloaded and unzipped the dataset, you can open it by going to: File > Open.

Alternatively, you can type:

use "C:\Users\CTRL\Desktop\Time series.dta", clear

where the clear option has been appended. This clears Stata’s memory, allowing you to open a new dataset.

To get a sense of what the data file contains, we can use two commands: summarize and describe. Both provide useful information about our data set and variables.

summarize calculates and displays a variety of univariate summary statistics. If no variable list is specified, summary statistics are calculated for all the variables in the dataset.

describe produces a summary of the dataset in memory or of the data stored in a Stata-format dataset.

Example using “Time series.dta”:

summarize

    Variable      Obs        Mean    Std. Dev.         Min         Max
    unemp         181    5.914917    1.453928          3.4    10.66667
    cpi           181    95.91184    54.13317     29.39667    192.1667
    interest      181    6.167403      3.3706          .98        19.1
    gdp           181    2.031231    2.001162    -1.703726    9.718504
    datevar       181          90    52.39434            0         180


describe

    Contains data from C:\Users\CTRL\Desktop\Time series.dta
      obs:           181
     vars:             5                      12 Oct 2011 10:00
     size:         3,620

                   storage   display    value
    variable name  type      format     label    variable label
    unemp          float     %9.0g               Unemployment Rate
    cpi            float     %9.0g               Consumer Price Index
    interest       float     %9.0g               Federal Funds Interest Rate
    gdp            float     %9.0g               GDP annual growth
    datevar        float     %tq                 Date variable

    Sorted by: datevar
    Note: dataset has changed since last saved

2. Declaring the data to be Time Series

Using the time variable “datevar”, we can declare the data to be time series in order to use the time series operators.

Using the tsset command

tsset declares the data in memory to be a time series. tssetting the data is what makes Stata's time-series operators such as L and F (lag and lead) work. You must also tsset the data before using the other time-series commands. If you save the data after tsset, Stata will remember that the data are time series and you will not have to tsset again.

tsset datevar

    time variable:  datevar, 1960q1 to 2005q1
            delta:  1 quarter
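The example dataset already stores datevar in quarterly (%tq) format. If your own data lack such a variable, one possible way to build it is sketched below on a small synthetic dataset. The eight-observation dataset and the 1960q1 start date are assumptions made purely for illustration:

```stata
* Synthetic illustration only: 8 consecutive quarters starting in 1960q1
clear
set obs 8

* tq() converts a quarterly literal into Stata's internal quarter count
generate datevar = tq(1960q1) + _n - 1

* Display the numeric values as quarters (1960q1, 1960q2, ...)
format datevar %tq

* Declare the data to be a quarterly time series
tsset datevar

list datevar
```

Running tsset with no arguments afterwards redisplays the current time-series settings.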


3. Useful Time Series commands

In this section, we introduce a few basic but very helpful commands.

The tin() function ("times in") selects observations from time A to time B, including both endpoints:

list datevar unemp if tin(2000q1,2000q4)

           datevar      unemp
    161.   2000q1    4.033333
    162.   2000q2    3.933333
    163.   2000q3           4
    164.   2000q4         3.9

The twithin() function ("times within") selects observations strictly between time A and time B, excluding the two endpoints:

list datevar unemp if twithin(2001q1,2001q3)

           datevar      unemp
    166.   2001q2         4.4
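As a rough illustration of the difference between the two functions, the sketch below counts the observations each one selects over the same window. It assumes the dataset is stored at the path used earlier in this tutorial. The open-ended forms of tin() are shown as an extra convenience and are not part of the original examples:

```stata
* Assumes the example dataset is at the path used earlier in this tutorial
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* tin() keeps both endpoints: 2001q1, 2001q2 and 2001q3
count if tin(2001q1, 2001q3)

* twithin() drops both endpoints, leaving only 2001q2
count if twithin(2001q1, 2001q3)

* Leaving one argument empty makes the range open-ended
summarize unemp if tin(2000q1, )
summarize unemp if tin(, 1969q4)
```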


To generate values based on past observations, use the lag operator (L):

generate unempL1=L1.unemp
generate unempL2=L2.unemp
list datevar unemp unempL1 unempL2 in 1/5

         datevar      unemp    unempL1    unempL2
    1.   1960q1    5.133333          .          .
    2.   1960q2    5.233333   5.133333          .
    3.   1960q3    5.533333   5.233333   5.133333
    4.   1960q4    6.266667   5.533333   5.233333
    5.   1961q1         6.8   6.266667   5.533333

To generate forward-looking values, use the lead operator (F):

generate unempF1=F1.unemp
generate unempF2=F2.unemp
list datevar unemp unempF1 unempF2 in 1/5

         datevar      unemp    unempF1    unempF2
    1.   1960q1    5.133333   5.233333   5.533333
    2.   1960q2    5.233333   5.533333   6.266667
    3.   1960q3    5.533333   6.266667        6.8
    4.   1960q4    6.266667        6.8          7
    5.   1961q1         6.8          7   6.766667
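The lag and lead operators can also be applied on the fly, without first storing the result in a new variable. The sketch below assumes the dataset is at the path used earlier in this tutorial and has not yet been tsset in the current session:

```stata
* Assumes the example dataset is at the path used earlier in this tutorial
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* L. and F. are shorthand for L1. and F1.
list datevar unemp L.unemp F.unemp in 1/5

* L(0/2).unemp expands to unemp, L1.unemp and L2.unemp
list datevar L(0/2).unemp in 1/5
```

Using the operators directly keeps the dataset free of extra variables, which can be convenient when many lags are needed.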


To generate the difference between current and previous values, use the D operator. The transformations are as follows: D1 = Y_t - Y_{t-1} and D2 = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}).

generate unempD1=D1.unemp
generate unempD2=D2.unemp
list datevar unemp unempD1 unempD2 in 1/5

         datevar      unemp     unempD1     unempD2
    1.   1960q1    5.133333           .           .
    2.   1960q2    5.233333    .0999999           .
    3.   1960q3    5.533333    .3000002    .2000003
    4.   1960q4    6.266667    .7333336    .4333334
    5.   1961q1         6.8    .5333333   -.2000003

4. Autocorrelation and cross-correlation analysis

In this section, we show how to explore autocorrelation and cross-correlation. Autocorrelation is the correlation between a variable and its own previous values; to examine it, use the ac and pac commands. To explore the relationship between two time series, use the xcorr command, making sure that you always list the independent variable first and the dependent variable second.

ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals based on Bartlett's formula for MA(q) processes.

pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals calculated using a standard error of 1/sqrt(n). The residual variances for each lag may optionally be included on the graph.

xcorr plots the sample cross-correlation function.


ac unemp, lags(10)

[Figure: autocorrelations of unemp by lag, with Bartlett's formula for MA(q) 95% confidence bands]

In this case, the autocorrelation graph indicates that unemployment is correlated with up to eight previous quarters.


pac unemp, lags(10)

[Figure: partial autocorrelations of unemp by lag, with 95% confidence bands, se = 1/sqrt(n)]

xcorr gdp unemp

[Figure: cross-correlogram of gdp and unemp by lag]


The graph above indicates that GDP growth is negatively correlated with unemployment at a lag of about six to nine months (two to three quarters).
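Reading exact values off a graph can be difficult, so it may help to see the same quantities as numbers. The sketch below uses corrgram and the table option of xcorr, neither of which appears in the original examples. It assumes the dataset is at the path used earlier in this tutorial:

```stata
* Assumes the example dataset is at the path used earlier in this tutorial
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Tabulate autocorrelations and partial autocorrelations for 10 lags
corrgram unemp, lags(10)

* Cross-correlations of gdp and unemp for 10 lags and leads,
* shown as a table instead of the default graph
xcorr gdp unemp, lags(10) table
```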

5. Unit Root test

In this section, we demonstrate how to evaluate whether a series has a unit root.

When working with time series data, it is important to check for a unit root. A series with a unit root is non-stationary: it contains a stochastic trend, so shocks to the series have permanent effects.

Let’s look at unemployment across time and test for a unit root.

line unemp datevar

[Figure: line plot of unemp against datevar (Date variable), 1960q1 to 2005q1]
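Once the data have been tsset, the tsline command offers a possible shortcut for the same kind of plot, since it places the time variable on the horizontal axis automatically. The sketch below also plots the first difference for comparison. It assumes the dataset is at the path used earlier in this tutorial:

```stata
* Assumes the example dataset is at the path used earlier in this tutorial
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Unemployment in levels
tsline unemp

* First difference of unemployment, computed on the fly with the D operator
tsline D.unemp
```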


To test for a unit root, we can use the augmented Dickey-Fuller test, which examines the series for a stochastic trend, using the following command:

dfuller unemp, lag(5)

    Augmented Dickey-Fuller test for unit root       Number of obs = 175

                          Interpolated Dickey-Fuller
               Test        1% Critical    5% Critical    10% Critical
             Statistic        Value          Value           Value
    Z(t)       -2.481        -3.485         -2.885          -2.575

    MacKinnon approximate p-value for Z(t) = 0.1201

In this case the null hypothesis is that unemployment has a unit root. The test statistic Z(t) does not allow us to reject that hypothesis: it is smaller in absolute value than the 1% critical value (i.e. |-2.481| < |-3.485|), and also smaller than the 10% critical value (|-2.575|), with a MacKinnon approximate p-value of 0.1201. We therefore treat unemployment as having a unit root.

When we test the first difference of unemployment, we find that it does not have a unit root:

dfuller unempD1, lag(5)

    Augmented Dickey-Fuller test for unit root       Number of obs = 174

                          Interpolated Dickey-Fuller
               Test        1% Critical    5% Critical    10% Critical
             Statistic        Value          Value           Value
    Z(t)       -4.593        -3.485         -2.885          -2.575

    MacKinnon approximate p-value for Z(t) = 0.0001

Here |-4.593| > |-3.485| and the p-value is 0.0001, so we reject the null hypothesis of a unit root at the 1% level.
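The two tests can also be run back to back without creating unempD1 first, because dfuller accepts time-series operators. The sketch below additionally displays the stored results r(Zt) and r(p) after each test. It assumes the dataset is at the path used earlier in this tutorial:

```stata
* Assumes the example dataset is at the path used earlier in this tutorial
use "C:\Users\CTRL\Desktop\Time series.dta", clear
tsset datevar

* Test the level of unemployment with 5 lagged differences
dfuller unemp, lags(5)
display "Level:            Z(t) = " r(Zt) "   p-value = " r(p)

* Test the first difference, using the D operator directly
dfuller D.unemp, lags(5)
display "First difference: Z(t) = " r(Zt) "   p-value = " r(p)
```

Adding the regress option to either dfuller call would also show the underlying regression table, which can help when choosing the number of lags.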
