A good book on econometric theory and forecasting models in economics. The models are implemented with the Stata 15.1 software.
Econometrics in Theory and Practice
Analysis of Cross Section, Time Series and Panel Data with Stata 15.1
Panchanan Das
© Springer Nature Singapore Pte Ltd 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Dedicated to my father late Bibhuti Bhusan Das
This book is an outcome of my experience in learning and teaching econometrics for more than three decades. Good-quality books on econometrics are available, but there is a dearth of user-friendly books with a proper combination of theory and application with statistical software. The books by Maddala, Wooldridge, Greene, Enders, Maddala and Kim, Hsiao and Baltagi, in particular, are invaluable. The book by Gujarati is also a good one in its ability to elaborate econometric theories for graduate students. However, many scholars, students and researchers today use statistical software to do empirical analysis. I have used both EViews and Stata in my teaching and research and have personally found Stata to be as powerful and flexible as EViews. Furthermore, Stata is used extensively to process large data sets. This book is a proper combination of econometric theory and application with Stata 15.1.
The basic purpose of this text is to introduce econometric analysis of cross section, time series and panel data with the application of statistical software. This book may serve as a basic text for those who wish to learn and apply econometric analysis in empirical research. The level of presentation is kept as simple as possible to make it useful for undergraduate as well as graduate students. It contains several examples with real data and Stata programmes and interpretation of the results.
This book is intended primarily for graduate and post-graduate students in universities in India and abroad and researchers in the social sciences, business, management, operations research, engineering or applied mathematics. In this book, we view econometrics as a subject dealing with a set of data analytic techniques that are used extensively in empirical research. The aim is to provide students with the skills required to undertake independent applied research using modern econometric methods. It covers the statistical tools needed to understand empirical economic research and to plan and execute independent research projects. It attempts to provide a balance between theory and applied research. Various concepts and techniques of econometric analysis are supported by carefully developed examples with the use of the statistical software package Stata 15.1. Hopefully, this book will successfully bridge the gap between learning econometrics and learning how to use Stata.
It is an attempt to incorporate econometric theories in a student-friendly manner to understand properly the techniques needed for empirical research. It is aimed at both students and professional analysts because of its balanced discussion of the theories with software applications. However, this book does not claim to be a substitute for the well-established texts that are being used in academia; rather, it can serve as a supplementary text in both undergraduate- and post-graduate-level econometrics courses. The discussion in this book is based on the assumption that the reader is somewhat familiar with the Stata software and other statistical programming. The Stata help manuals from the Stata Corporation offer detailed explanations and syntax for all the commands used in this book. The data used for illustration are taken mainly from official sources like CSO, NSSO and ILO.

The topics covered in this book are divided into four parts. Part I is the discussion on introductory econometric methods covering the syllabus of graduate courses in the University of Calcutta, Delhi University and other leading universities in India and abroad. This part of the book provides an introduction to basic econometric methods for data analysis that economists and other social scientists use to estimate economic and social relationships, and to test hypotheses about them, using real-world data. There are 5 chapters in this part covering data management issues, details of linear regression models and the related problems due to the violation of the classical assumptions. Chapter 1 provides some basic steps used in econometrics and statistical software, Stata 15.1, for useful application of econometric theories. Chapter 2 discusses the linear regression model and its application with cross section data. Chapter 3 deals with the problem of statistical inference in a linear regression model. Chapter 4 relaxes the homoscedasticity and non-autocorrelation assumptions on the random error of a linear regression model and shows how the parameters of the linear model are correctly estimated. Chapter 5 discusses the detection of multicollinearity and alternatives for handling the problem.
Part II discusses some advanced topics used frequently in empirical research with cross section data. This part contains 3 chapters covering some specific problems of regression analysis. Chapter 6 explains how qualitative explanatory variables can be incorporated into a linear model. Chapter 7 provides econometric models with limited dependent variables and problems of truncated distribution, sample selection bias and multinomial logit. Special emphasis is given to multivariate analysis, particularly principal component analysis and factor analysis, because of their popularity in empirical research with cross section data. Chapter 8 captures these issues.
Part III deals with time series econometric analysis. Time series data have some special features, and they should be handled with great care. Time series econometrics has developed in its modern form since the early 1980s with the publications of Engle and Granger, and it has become very popular in empirical research with the development of user-friendly software. This book covers intensively both the univariate and multivariate time series econometric models and their applications with software programming in 6 chapters. This part starts with the discussion on the data generating process of time series data in Chapter 9. Chapter 10 deals with different features of the data generating process (DGP) of a time series in a univariate framework. The presence of unit roots in macroeconomic time series has been a major area of theoretical and applied research since the early 1980s. Chapter 11 presents some issues regarding unit root tests and explores some of the implications for macroeconomic theory and policy. Chapter 12 explores the basic conceptual issues involved in estimating the relationship between two or more nonstationary time series with unit roots. Chapter 13 examines the behaviour of volatility in terms of conditional heteroscedasticity models. Forecasting is important in economics, commerce and various disciplines of social science and pure science. Chapter 14 aims to provide an overview of forecasting based on time series analysis.
Part IV takes care of panel data analysis in 4 chapters. Panel data have several advantages over cross section and time series data. Panel data econometrics has gained popularity because of the availability of panel data in the public domain today. Different aspects of fixed effects and random effects are discussed here. I have extended panel data analysis by taking up dynamic panel data models, which are the most suitable for macroeconomic research. Chapter 15 discusses different types of panel data model in a static framework. Chapter 16 deals with testing of hypotheses to examine panel data in a static framework. Panel data with long time periods have been used predominantly in applied macroeconomic research on topics like purchasing power parity, growth convergence, business cycle synchronisation and so on. Chapter 17 provides some theoretical issues and their application in testing for unit roots in panel data. Dynamic models in a panel data framework are very popular in empirical research. Chapter 18 focuses on some issues of dynamic panel data models.
All chapters in this book provide applications of econometric models by using Stata. Simple presentation of some difficult topics in a rigorous manner is the major strength of this book. While Bayesian, nonparametric and semiparametric econometrics are popular methods today for capturing the behaviour of data in more complex real situations, I do not attempt to cover these topics because of my comparative disadvantage in these areas and to keep the technical difficulty at the lowest possible level. Despite these limitations, the topics covered in this book are basic and necessary for the econometrics training of every student in economics and other disciplines. I hope the students of econometrics will share my enthusiasm and optimism about the importance of the different econometric methods they will learn through reading this book. Hopefully, it will enhance their interest in empirical research in economics and other fields of social science.
May 2019
My interest in econometrics was initiated by my teachers at different levels more than three decades ago. I acknowledge the contribution of Amiya Kumar Bagchi, my teacher and Ph.D. supervisor, in the field of empirical research, which encouraged me, at least indirectly, to learn econometrics. Among others, I should mention Dipankor Coondoo of the Indian Statistical Institute, Kolkata, who helped me to understand clearly different issues of the subject. Sankar Kumar Bhoumik, my senior colleague and friend, helped me a lot to learn the subject by providing access to teaching at the post-graduate level at the Department of Economics, University of Calcutta, well before I joined the Department as a permanent faculty member. I also gratefully acknowledge my teacher, Manoj Kumar Sanyal, who is a continuous source of encouragement in learning and thinking. I think, in some way, they have prepared the background for this book being written.
A number of friends and colleagues have commented on earlier drafts of the book, or helped in other ways. I am grateful to Maniklal Adhikary, Anindita Sengupta, Pradip Kumar Biswas and others for their assistance and encouragement. Discussions with Oleg Golichenko and Kirdina Svetlana of the Higher School of Economics, Moscow, were helpful in clarifying some of my ideas.

Any remaining errors and omissions are, of course, my responsibility, and I shall be glad to have them brought to my attention.

I am grateful to the Department of Economics, University of Calcutta, for providing an adequate infrastructure where I spent time during my learning and teaching of economics. Special thanks are due to the Head of the Department of Economics and the authority of the University of Calcutta.

I am extremely grateful to my wife, Krishna, who took over many of my roles in the household during the preparation of the manuscripts.

Finally, thanks to the editorial team of Springer for help with indexing and proof-reading. I am grateful to Sagarika Ghosh of Springer for encouragement for this project.
May 2019
Part I Introductory Econometrics
1 Introduction to Econometrics and Statistical Software
1.1 Introduction
1.2 Economic Model and Econometric Model
1.3 Population Regression Function and Sample Regression Function
1.4 Parametric and Nonparametric or Semiparametric Model
1.5 Steps in Formulating an Econometric Model
1.5.1 Specification
1.5.2 Estimation
1.5.3 Testing of Hypothesis
1.5.4 Forecasting
1.6 Data
1.6.1 Cross Section Data
1.6.2 Time Series Data
1.6.3 Pooled Cross Section
1.6.4 Panel Data
1.7 Use of Econometric Software: Stata 15.1
1.7.1 Data Management
1.7.2 Generating Variables
1.7.3 Describing Data
1.7.4 Graphs
1.7.5 Logical Operators in Stata
1.7.6 Functions Used in Stata
1.8 Matrix Algebra
1.8.1 Matrix and Vector: Basic Operations
1.8.2 Partitioned Matrices
1.8.3 Rank of a Matrix
1.8.4 Inverse Matrix
1.8.5 Positive Definite Matrix
1.8.6 Trace of a Matrix
1.8.7 Orthogonal Vectors and Matrices
1.8.8 Eigenvalues and Eigenvectors
References
2 Linear Regression Model: Properties and Estimation
2.1 Introduction
2.2 The Simple Linear Regression Model
2.3 Multiple Linear Regression Model
2.4 Assumptions of Linear Regression Model
2.4.1 Non-stochastic Regressors
2.4.2 Linearity
2.4.3 Zero Unconditional Mean
2.4.4 Exogeneity
2.4.5 Homoscedasticity
2.4.6 Non-autocorrelation
2.4.7 Full Rank
2.4.8 Normal Distribution
2.5 Methods of Estimation
2.5.1 The Method of Moments (MM)
2.5.2 The Method of Ordinary Least Squares (OLS)
2.5.3 Maximum Likelihood Method
2.6 Properties of the OLS Estimation
2.6.1 Algebraic Properties
2.6.2 Statistical Properties
References
3 Linear Regression Model: Goodness of Fit and Testing of Hypothesis
3.1 Introduction
3.2 Goodness of Fit
3.2.1 The R² as a Measure of Goodness of Fit
3.2.2 The Adjusted R² as a Measure of Goodness of Fit
3.3 Testing of Hypothesis
3.3.1 Sampling Distributions of the OLS Estimators
3.3.2 Testing of Hypothesis for a Single Parameter
3.3.3 Use of P-Value
3.3.4 Interval Estimates
3.3.5 Testing of Hypotheses for More Than One Parameter: t Test
3.3.6 Testing Significance of the Regression: F Test
3.3.7 Testing for Linearity
3.3.8 Tests for Stability
3.3.9 Analysis of Variance
3.3.10 The Likelihood-Ratio, Wald and Lagrange Multiplier Test
3.4 Linear Regression Model by Using Stata 15.1
3.4.1 OLS Estimation in Stata
3.4.2 Maximum Likelihood Estimation (MLE) in Stata
References
4 Linear Regression Model: Relaxing the Classical Assumptions
4.1 Introduction
4.2 Heteroscedasticity
4.2.1 Problems with Heteroscedastic Data
4.2.2 Heteroscedasticity Robust Variance
4.2.3 Testing for Heteroscedasticity
4.2.4 Problem of Estimation
4.2.5 Illustration of Heteroscedastic Linear Regression by Using Stata
4.3 Autocorrelation
4.3.1 Linear Regression Model with Autocorrelated Error
4.3.2 Testing for Autocorrelation: Durbin–Watson Test
4.3.3 Consequences of Autocorrelation
4.3.4 Correcting for Autocorrelation
4.3.5 Illustration by Using Stata
References
5 Analysis of Collinear Data: Multicollinearity
5.1 Introduction
5.2 Multiple Correlation and Partial Correlation
5.3 Problems in the Presence of Multicollinearity
5.4 Detecting Multicollinearity
5.4.1 Determinant of (X′X)
5.4.2 Determinant of Correlation Matrix
5.4.3 Inspection of Correlation Matrix
5.4.4 Measure Based on Partial Regression
5.4.5 Theil’s Measure
5.4.6 Variance Inflation Factor (VIF)
5.4.7 Eigenvalues and Condition Numbers
5.5 Dealing with Multicollinearity
5.6 Illustration by Using Stata
References
Part II Advanced Analysis of Cross Section Data
6 Linear Regression Model: Qualitative Variables as Predictors
6.1 Introduction
6.2 Regression Model with Intercept Dummy
6.2.1 Dichotomous Factor
6.2.2 Polytomous Factors
6.3 Regression Model with Interaction Dummy
6.4 Illustration by Using Stata
7 Limited Dependent Variable Model
7.1 Introduction
7.2 Linear Probability Model
7.3 Binary Response Models: Logit and Probit
7.3.1 The Logit Model
7.3.2 The Probit Model
7.3.3 Difference Between Logit and Probit Models
7.4 Maximum Likelihood Estimation of Logit and Probit Models
7.4.1 Interpretation of the Estimated Coefficients
7.4.2 Goodness of Fit
7.4.3 Testing of Hypotheses
7.4.4 Illustration of Binary Response Model by Using Stata
7.5 Regression Model with Truncated Distribution
7.5.1 Illustration of Truncated Regression by Using Stata
7.6 Problem of Censoring: Tobit Model
7.6.1 Illustration of Tobit Model by Using Stata
7.7 Models with Sample Selection Bias
7.7.1 Illustration of Sample Selection Model by Using Stata
7.8 Multinomial Logit Regression
7.8.1 Illustration by Using Stata
References
8 Multivariate Analysis
8.1 Introduction
8.2 Displaying Multivariate Data
8.2.1 Multivariate Observations
8.2.2 Sample Mean Vector
8.2.3 Population Mean Vector
8.2.4 Covariance Matrix
8.2.5 Correlation Matrix
8.2.6 Linear Combination of Variables
8.3 Multivariate Normal Distribution
8.4 Principal Component Analysis
8.4.1 Calculation of Principal Components
8.4.2 Properties of Principal Components
8.4.3 Illustration by Using Stata
8.5 Factor Analysis
8.5.1 Orthogonal Factor Model
8.5.2 Estimation of Loadings and Communalities
8.5.3 Factor Loadings Are not Unique
8.5.4 Factor Rotation
8.5.5 Illustration by Using Stata
8.6 Multivariate Regression
8.6.1 Structure of the Regression Model
8.6.2 Properties of Least Squares Estimators of B
8.6.3 Model Corrected for Means
8.6.4 Canonical Correlations
References
Part III Analysis of Time Series Data
9 Time Series: Data Generating Process
9.1 Introduction
9.2 Data Generating Process (DGP)
9.2.1 Stationary Process
9.2.2 Nonstationary Process
9.3 Methods of Time Series Analysis
9.4 Seasonality and Seasonal Adjustment
9.5 Creating a Time Variable by Using Stata
References
10 Stationary Time Series
10.1 Introduction
10.2 Univariate Time Series Model
10.3 Autoregressive Process (AR)
10.3.1 The First-Order Autoregressive Process
10.3.2 The Second-Order Autoregressive Process
10.3.3 The Autoregressive Process of Order p
10.3.4 General Linear Processes
10.4 The Moving Average (MA) Process
10.4.1 The First-Order Moving Average Process
10.4.2 The Second-Order Moving Average Process
10.4.3 The Moving Average Process of Order q
10.4.4 Invertibility in Moving Average Process
10.5 Autoregressive Moving Average (ARMA) Process
10.6 Autocorrelation Function
10.6.1 Autocorrelation Function for AR(1)
10.6.2 Autocorrelation Function for AR(2)
10.6.3 Autocorrelation Function for AR(p)
10.6.4 Autocorrelation Function for MA(1)
10.6.5 Autocorrelation Function for MA(2)
10.6.6 Autocorrelation Function for MA(q)
10.6.7 Autocorrelation Function for ARMA Process
10.7 Partial Autocorrelation Function (PACF)
10.7.1 Partial Autocorrelation for AR Series
10.7.2 Partial Autocorrelation for MA Series
10.8 Sample Autocorrelation Function
10.8.1 Illustration by Using Stata
References
11 Nonstationarity, Unit Root and Structural Break
11.1 Introduction
11.2 Analysis of Trend
11.2.1 Deterministic Function of Time
11.2.2 Stochastic Function of Time
11.2.3 Stochastic and Deterministic Function of Time
11.3 Concept of Unit Root
11.4 Unit Root Test
11.4.1 Dickey–Fuller Unit Root Test
11.4.2 Augmented Dickey–Fuller (ADF) Unit Root Test
11.4.3 Phillips–Perron Unit Root Test
11.4.4 Dickey–Fuller GLS Test
11.4.5 Stationarity Tests
11.4.6 Multiple Unit Roots
11.4.7 Some Problems with Unit Root Tests
11.4.8 Macroeconomic Implications of Unit Root
11.5 Testing for Structural Break
11.5.1 Tests with Known Break Points
11.5.2 Tests with Unknown Break Points
11.6 Unit Root Test with Break
11.6.1 When Break Point is Exogenous
11.6.2 When Break Point is Endogenous
11.7 Seasonal Adjustment
11.7.1 Unit Roots at Various Frequencies: Seasonal Unit Root
11.7.2 Generating Time Variable and Seasonal Dummies in Stata
11.8 Decomposition of a Time Series into Trend and Cycle
References
12 Cointegration, Error Correction and Vector Autoregression
12.1 Introduction
12.2 Regression with Trending Variables
12.3 Concept of Cointegration
12.4 Granger’s Representation Theorem
12.5 Testing for Cointegration: Engle–Granger’s Two-Step Method
12.5.1 Illustrations by Using Stata
12.6 Vector Autoregression (VAR)
12.6.1 Stationarity Restriction of a VAR Process
12.6.2 Autocovariance Matrix of a VAR Process
12.6.3 Estimation of a VAR Process
12.6.4 Selection of Lag Length of a VAR Model
12.6.5 Illustration by Using Stata
12.7 Vector Moving Average Processes
12.8 Impulse Response Function
12.8.1 Illustration by Using Stata
12.9 Variance Decomposition
12.10 Granger Causality
12.10.1 Illustration by Using Stata
12.11 Vector Error Correction Model
12.11.1 Illustration by Using Stata
12.12 Estimation and Testing of Hypotheses of Cointegrated Systems
12.12.1 Illustration by Using Stata
References
13 Modelling Volatility Clustering
13.1 Introduction
13.2 Modelling Non-constant Conditional Variance
13.3 The ARCH Model
13.4 The GARCH Model
13.5 Asymmetric ARCH Models
13.6 ARCH-in-Mean Model
13.7 Testing and Estimation of a GARCH Model
13.7.1 Testing for ARCH Effect
13.7.2 Maximum Likelihood Estimation for GARCH (1, 1)
13.8 The ARCH Regression Model in Stata
13.8.1 Illustration with Market Capitalisation Data
References
14 Time Series Forecasting
14.1 Introduction
14.2 Simple Exponential Smoothing
14.3 Forecasting: Univariate Model
14.4 Forecasting with General Linear Processes
14.5 Multivariate Forecasting
14.6 Forecasting of a VAR Model
14.7 Forecasting GARCH Processes
14.8 Time Series Forecasting by Using Stata
References
Part IV Analysis of Panel Data
15 Panel Data Analysis: Static Models
15.1 Introduction
15.2 Structure and Types of Panel Data
15.2.1 Data Description by Using Stata 15.1
15.3 Benefits of Panel Data
15.4 Sources of Variation in Panel Data
15.5 Unrestricted Model with Panel Data
15.6 Fully Restricted Model: Pooled Regression
15.6.1 Illustration by Using Stata
15.7 Error Component Model
15.8 First-Differenced (FD) Estimator
15.8.1 Illustration by Using Stata
15.9 One-Way Error Component Fixed Effects Model
15.9.1 The “Within” Estimation
15.9.2 Least Squares Dummy Variable (LSDV) Regression
15.10 One-Way Error Component Random Effects Model
15.10.1 The GLS Estimation
15.10.2 Maximum Likelihood Estimation
15.10.3 Illustration by Using Stata
Reference
16 Panel Data Static Model: Testing of Hypotheses
16.1 Introduction
16.2 Measures of Goodness of Fit
16.3 Testing for Pooled Regression
16.4 Testing for Fixed Effects
16.4.1 Illustration by Using Stata
16.5 Testing for Random Effects
16.5.1 Illustration by Using Stata
16.6 Fixed or Random Effect: Hausman Test
16.6.1 Illustration by Using Stata
References
17 Panel Unit Root Test
17.1 Introduction
17.2 First-Generation Panel Unit Root Tests
17.2.1 Wu (1996) Unit Root Test
17.2.2 Levin, Lin and Chu Unit Root Test
17.2.3 Im, Pesaran and Shin (IPS) Unit Root Test
17.2.4 Fisher-Type Unit Root Tests
17.3 Stationarity Tests
17.3.1 Illustration by Using Stata
17.4 Second-Generation Panel Unit Root Tests
17.4.1 The Covariance Restrictions Approach
17.4.2 The Factor Structure Approach
References
18 Dynamic Panel Model
18.1 Introduction
18.2 Linear Dynamic Model
18.3 Fixed and Random Effects Estimation
18.3.1 Illustration by Using Stata
18.4 Instrumental Variable Estimation
18.4.1 Illustration by Using Stata
18.5 Arellano–Bond GMM Estimator
18.5.1 Illustration by Using Stata
18.6 System GMM Estimator
18.6.1 Illustration by Using Stata
Appendix: Generalised Method of Moments
References
Panchanan Das is a Professor of Economics, currently teaching Time Series and Panel Data Econometrics at the Department of Economics, University of Calcutta. His main research areas are Development Economics, Indian Economics, and Applied Macroeconomics. He has published several articles and book chapters on growth, inequality and poverty, and is a principal author of Economics I and Economics II, graduate-level textbooks published by Oxford University Press, New Delhi. He is also a major contributor to the West Bengal Development Report – 2008, published by the Academic Foundation, New Delhi, in collaboration with the Planning Commission, Government of India.
List of Figures
Fig. 1.1 Income demand relation
Fig. 1.2 Conditional mean function
Fig. 1.3 Sample regression function
Fig. 2.1 Spending–income relationship for households in West Bengal
Fig. 2.2 Relation between projection and error vectors
Fig. 3.1 Histogram
Fig. 3.2 (a) Two-tailed test, (b) one-tailed test (left tail), (c) one-tailed test (right tail)
Fig. 3.3 Comparison of LR, W and LM tests
Fig. 3.4 Log-likelihood
Fig. 4.1 Distribution of Y with heteroscedastic error
Fig. 4.2 Variability of ln(wage) with year of schooling. Source: NSS 68th round (2011–2012) data on employment and unemployment
Fig. 4.3 Scattered plot of residuals
Fig. 4.4 Pattern of residual
Fig. 4.5 Pattern of corrected residual
Fig. 6.1 Relation between education and income among men and women
Fig. 6.2 Conditional mean functions for female and male groups
Fig. 7.1 Predicted probability function
Fig. 7.2 Density function for logit (green) and probit (red) models
Fig. 7.3 CDF for logit (blue) and probit (red) models
Fig. 9.1 Different shapes of time series
Fig. 9.2 Time behaviour of BSE sensex
Fig. 9.3 Time behaviour of first difference of BSE sensex
Fig. 10.1 Stationarity region for AR(2) process
Fig. 10.2 Autocorrelation function of log GDP series
Fig. 10.3 Autocorrelation function of the first difference of log GDP series
Fig. 10.4 Partial autocorrelation function of log GDP series
Fig. 11.1 Time path of a series without trend
Fig. 11.2 Time path of a series with trend
Fig. 11.3 Wald test statistics
Fig. 11.4 Index of industrial production
Fig. 11.5 Seasonally adjusted iip
Fig. 12.1 Impulse response function
Fig. 12.2 Movement of GDP and consumption expenditure
Fig. 13.1 Time path of stock price and return
Fig. 13.2 Autocorrelation function of returns and squared returns
Fig. 13.3 Time path of first-differenced series of market capitalisation
Fig. 15.1 Line plots of GDP growth
Fig. 15.2 Line plots of GDP growth (overlay)
Fig. 15.3 Relation between labour employment and labour productivity
Fig. 15.4 Relation between labour employment and GDP growth
Fig. 15.5 Mean values of variables by country
Fig. 15.6 Estimated relationship between labour employment and labour productivity
List of Tables
Table 7.1 Distribution of random error
Table 15.1 Data matrix of a single variable (X)
Part I Introductory Econometrics
Chapter 1
Introduction to Econometrics and Statistical Software
Abstract This chapter discusses some basic steps used in econometrics and statistical software, Stata 15.1, for useful application of econometric theories with real-life data. Econometric methods are helpful in explaining the stochastic relationship among variables in mathematical form. Applied econometrics is the application of econometric theory to analyse economic phenomena with economic data. While an economic model provides a theoretical relation, an econometric model is a relationship used to analyse real-life situations. The formulation of economic models in an empirically testable form is an econometric model. The random error, or disturbance term, plays a very powerful role in econometric analysis. One of the major tasks of statistics and econometrics is to obtain information about populations. The main aim of econometric analysis is to obtain information about the population through the analysis of the sample. Regression analysis is an important tool used in econometrics to analyse quantitative data for estimating model parameters and making forecasts. Data are the main inputs in econometric analysis. Therefore, a researcher should have a clear idea about the data.
© Springer Nature Singapore Pte Ltd 2019
P. Das, Econometrics in Theory and Practice,
https://doi.org/10.1007/978-981-32-9019-8_1
1.1 Introduction
Econometrics is the application of statistical and mathematical methods to analyseeconomic theory with data by using different techniques of estimation and testing ofhypotheses relating to economic theories.1It uses statistical methods for the analysis
of economic phenomena Econometrics is by no means the same as economic tics or application of mathematics to economics The unification of economic theory,mathematics and statistics constitutes what is called econometrics Economic theo-ries are usually expressed in mathematical forms Statistical methods are adopted inexplaining the economic phenomenon in stochastic form that constitutes the econo-metric methods Econometrics is used to estimate the values of the parameters whichare essentially the coefficients of mathematical equations representing economicrelationships The econometric relationships can capture the random behaviour ofeconomic relationships which are not considered in theories in economics
statis-Econometrics differs from statistics Statistical models describe the methods ofmeasurement which are developed on the basis of controlled experiments The econo-metric methods are generally developed for the analysis of non-experimental data.Econometrics uses statistical methods to test the validity of economic theories byintroducing randomness in economic relationships Thus, econometrics basicallyattempts to specify the stochastic element in the model on the basis of the real-worlddata
Econometrics has emerged as a separate discipline because the straightforward application of statistical methods usually fails to answer many economic questions. Economic problems can rarely be studied in a fully controlled, experimental environment. Real-world data are needed to infer economic regularities. Economic research questions based on economic theory suggest the structure of an appropriate econometric model, which is estimated with data to make inferences on the research questions. Econometric analysis is of two types: theoretical econometrics and applied econometrics. Theoretical econometrics deals with the development of new methods appropriate for the measurement of economic relationships. Applied econometrics, on the other hand, is the application of econometric theory to the analysis of economic phenomena and the forecasting of economic behaviour.
The following is a good example of how economic theory structures the statistical method to develop an econometric model for the empirical analysis of an economic problem. Human capital theory states that workers with similar productive characteristics, like education and work experience, should get the same wages. We can express this theoretical proposition as a wage equation by taking the wage as the dependent variable and workers' characteristics as a vector of independent variables. By incorporating statistical regularities, this wage regression equation forms an econometric model that can be estimated with real-world data to test the validity of human capital theory. The estimation of an econometric model also provides several economic implications. One of the popular research areas in labour economics relates to gender wage discrimination. An economic model based on human capital theory can give some substance to this conjecture. To see whether there is discrimination, we can compare the estimated wages of women and men who are similar with respect to these characteristics. If the null hypothesis of gender discrimination is not rejected, then it has serious implications for women's participation in the labour market. A related issue is the comparison of wages between groups over time or space. If the gender wage differential has narrowed over time, it seems to indicate that there has been an improvement in the labour market outcome. Statistics is sceptical about labour market participation.
¹ Econometrics means measurement in economics. It has developed systematically since the establishment of the Econometric Society in 1930 and the publication of the journal Econometrica in January 1933.
Another important research issue in human capital theory is the return to education, that is, the effect of education on employment and wages. It is important to know the return to schooling when deciding on investment in schooling. The return to education is defined as the wage increase per additional year of schooling. If we want to estimate the return to schooling under this definition by applying statistical methods in a straightforward way, we face a real problem. We could measure the return to education statistically by randomly allocating different education levels to different individuals and inferring the effect on their earnings. But the problem of measurement is not this straightforward when we use real-world data on actual education levels and earnings. In reality, workers are heterogeneous in terms of their ability. Persons with high ability earn more than persons with low ability at any given level of education. If we compare returns to education while ignoring the possible effects of ability, we cannot properly explain the wage differences, because of the inherent differences in ability between the groups.
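The ability problem described above can be illustrated with a small simulation. The following is only a sketch in Python, not part of the original exposition: the data generating process, the coefficient values (an assumed true return of 0.08) and the helper names `ols_slope` and `residuals` are all hypothetical. Ability is observable here only because the data are simulated.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    b = ols_slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(12345)
TRUE_RETURN = 0.08  # assumed effect of one more year of schooling on log wage

ability, schooling, log_wage = [], [], []
for _ in range(5000):
    a = random.gauss(0, 1)               # ability, unobserved in real data
    s = 10 + 2 * a + random.gauss(0, 1)  # abler workers obtain more schooling
    w = 1.0 + TRUE_RETURN * s + 0.10 * a + random.gauss(0, 0.2)
    ability.append(a)
    schooling.append(s)
    log_wage.append(w)

# Ignoring ability: the schooling coefficient also picks up the ability effect.
naive_return = ols_slope(schooling, log_wage)

# Holding ability fixed: partial ability out of both variables,
# then regress residual on residual.
controlled_return = ols_slope(residuals(ability, schooling),
                              residuals(ability, log_wage))

print(f"naive: {naive_return:.3f}, controlled: {controlled_return:.3f}")
```

Under these assumptions the naive estimate overstates the return to schooling, while the estimate that holds ability fixed stays close to the assumed value.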
In econometrics, we combine economic theory and statistics to formulate and analyse an interesting economic question. Economic theory, for example, provides different models of stock prices to test for stock market efficiency. If stock markets are efficient, all available information is properly reflected in the movement of stock prices. If the stock market is inefficient, arbitrage opportunities appear in the market, and an investor can profit by exploiting them. Econometrics is useful in finding out the effect of arbitrage with the help of economic theory and appropriate statistical tools. Econometrics is used not only in economics but also in other areas like engineering sciences, biological sciences, medical sciences, geosciences and agricultural sciences. Econometric methods are helpful in explaining the stochastic relationship among variables in mathematical form.
This introductory chapter is organised in the following way. Section 1.2 distinguishes between the economic model and the econometric model. Section 1.3 provides the meaning of the population regression function and the sample regression function. The basic difference between parametric and nonparametric models is discussed in Sect. 1.4. Section 1.5 deals with the steps in formulating an econometric model. Data are the primary inputs in econometric analysis. Section 1.6 discusses the data. Application of econometric theories with real-life data needs the use of appropriate software. A number of econometric and statistical software packages are available today. Section 1.7 of this chapter provides some basic steps of Stata 15.1. Some basic operations of matrix algebra frequently used in econometrics are shown in Sect. 1.8.
1.2 Economic Model and Econometric Model
A model is a simplified representation of a real-world process. An economic model is a set of mathematical equations, formed on the basis of a set of assumptions, that approximately describes the behaviour of an economy. For example, utility maximisation subject to a budget constraint by an individual is described well by economic models. The problem of constrained utility maximisation is solved for the demand functions. The rationality assumptions on consumers' behaviour are needed to formulate a demand function for a commodity showing the relationship between the quantity demanded of a commodity and its own price, the prices of other related commodities, the consumer's income and the consumer's preferences. This demand equation obtained from the economic model is the basis of an econometric analysis of consumer demand.
Suppose that we would like to examine the effects of income on demand for a commodity. Economic theory suggests that the quantity demanded of a commodity depends on its own price, prices of other commodities, the consumer's income, consumers' tastes and preferences and so on. An economic model relating to demand is expressed in terms of the demand function:
$$y = f(x_1, x_2, x_3, \ldots) \qquad (1.2.1)$$
Here, $y$ denotes the quantity demanded of a commodity, $x_1$ is income, $x_2$ is its own price, and $x_3$ is the price of other related commodities.
Under the ceteris paribus assumption, there exists a unique relationship between quantity demanded ($y$) and household income ($x_1$). If we assume that the relationship is linear, the income demand relation is expressed as
$$y = \beta_0 + \beta_1 x_1 \qquad (1.2.2)$$
But, in reality, we have no power to keep all other factors the same. Now, let us incorporate the effects on quantity demanded of the other factors, which are not available in the data set. To confirm the theoretical claim shown in Eq. (1.2.2), we have to incorporate the effects of the other factors that are not considered explicitly in the model by introducing a new variable (let it be $u$) in Eq. (1.2.2):
$$y = \beta_0 + \beta_1 x_1 + u \qquad (1.2.3)$$
Because $u$ is not observed, we cannot deal with Eq. (1.2.3) in a straightforward manner as for Eq. (1.2.2). After introducing the random disturbance term into the economic model, the income demand relationship becomes stochastic. Most of the relationships between economic or other variables are stochastic in reality. This stochastic relationship forms an econometric model. In an econometric model, the dependent variable is called the explained variable and the independent variables are called explanatory variables.
The random error or disturbance term tells us about the part of the dependent variable that cannot be predicted by the independent variables in the equation. It captures the effect of a large number of omitted variables. In our example of household demand for a commodity ($y$), income ($x_1$) is not the only variable influencing $y$. The family size, tastes of the family, spending habits and so on also affect the variable $y$. The error incorporates all other variables not included in the model, some of which may not even be quantifiable and some of which may not even be identifiable. We can reduce the effect of the unobserved disturbance by increasing the number of explanatory variables in an econometric model. In Eq. (1.2.3), $u$ contains the price of the commodity ($x_2$), the price of other commodities ($x_3$) and the tastes and preferences of the buyers determined by the utility function. As $x_2$ and $x_3$ are observable in the sample, we can separate them out from the random disturbance $u$:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (1.2.4)$$
Here, $u = \beta_2 x_2 + \beta_3 x_3 + \varepsilon$.
The coefficients $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ are the parameters describing the nature of the relationship between quantity demanded and the consumer's income, the price of the commodity concerned and the price of other related commodities. In Eq. (1.2.4), $\varepsilon$ contains buyers' preferences, which are still unobserved.
While an economic model provides a theoretical relation, an econometric model is a relationship used to analyse real-life situations. An econometric model does not provide a unique relationship between the explained and explanatory variables. In the income demand relationship, for example, Eq. (1.2.3) provides different values of $y$ for a given value of $x_1$. Different buyers can buy different quantities of a commodity at the same income level depending on their preferences, which are included in $u$. This is the main difference between economic modelling and econometric modelling. Therefore, the formulation of economic models in an empirically testable form is an econometric model. An econometric model is derived from the economic model and has deterministic components and stochastic components. The stochastic part is unobserved and is represented by a disturbance term that follows a probability distribution. In econometrics, the disturbance term plays an important role in describing the nature of the relationship, and we have to exploit the probability distribution of the disturbance term to analyse the empirical relationship between the variables. The ambiguities inherent in the economic model are resolved by specifying a particular econometric model. The choice of variables in the econometric model is determined by economic theory as well as data considerations. The error term or disturbance term is used to capture the effects of other variables which are not considered in the model.
An econometric model is constructed on the basis of an economic model that explains the rationale for the possible relation between the variables included in the analysis. However, an economic model provides only a logical description of an issue. In order to verify that the logical relation and the assumptions related to it are in accordance with reality, we need to specify an econometric model on the basis of the formulation of the economic model and to test the hypotheses relating to the economic model by using data from the sample.
1.3 Population Regression Function and Sample Regression Function
A population is defined as the set of all elements that are of interest for econometric analysis. It is similar to the universal set in set theory. Theory provides some propositions which are assumed to be applicable for all. In other words, theory focuses on the population. Therefore, the econometric model shown in (1.2.3) or (1.2.4), derived from the economic model, relates to the population. One of the major objectives of econometrics is to make inferences about populations.
The econometric model shown in Eq. (1.2.3) does not provide a unique value of $y$ for a given value of $x_1$ because of the presence of $u$. It provides a stochastic relationship between $y$ and $x_1$ that can be described by the probability distribution of $u$. If the error term $u$ follows a normal distribution, then $y$ in Eq. (1.2.3) will also follow a normal distribution for a given $x_1$. If the mean and variance of $u$ are 0 and $\sigma^2$, respectively, then the conditional mean and variance of $y$ are given by
$$E(y \mid x_1) = \beta_0 + \beta_1 x_1 \qquad (1.3.1)$$
$$V(y \mid x_1) = \sigma^2 \qquad (1.3.2)$$
The conditional mean function is called the population regression function (PRF). The PRF is the relation between the expectation of the population regressand ($y$) conditional on the population regressors ($x$). It provides the theoretical relation in the econometric framework. The conditional mean function shown in (1.3.1) states that the values of $y$ on average depend on $x_1$. Changes in $x_1$ change the average value of $y$, not its actual value. This is the basic outcome of regression analysis, which is discussed in detail in Chap. 2.
For each value of $x_1$, we have different values of $y$ with corresponding probabilities obtained from the normal density curve as shown in Fig. 1.2. By joining the mean values of $y$ at different values of $x_1$, we have a straight line known as the population regression line (Fig. 1.2).
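The idea that the PRF joins the conditional means of $y$ can be checked numerically. The following Python sketch is illustrative only: the parameter values `BETA0`, `BETA1` and `SIGMA` and the function `draw_y` are assumptions introduced here, not values from the text.

```python
import random

random.seed(2019)
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0  # assumed population parameters

def draw_y(x1):
    """One realisation of y = beta0 + beta1*x1 + u, with u ~ N(0, sigma^2)."""
    return BETA0 + BETA1 * x1 + random.gauss(0, SIGMA)

# For each fixed x1, y takes many different values; their average
# approximates the conditional mean E(y | x1) = beta0 + beta1*x1.
cond_means = {}
for x1 in (10, 20, 30):
    draws = [draw_y(x1) for _ in range(20000)]
    cond_means[x1] = sum(draws) / len(draws)

for x1, m in cond_means.items():
    print(x1, round(m, 2), BETA0 + BETA1 * x1)
```

Individual draws of $y$ scatter around the line, but the averages at each $x_1$ lie close to it, which is what the population regression line represents.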
The population is the universal set containing all possible outcomes of a random experiment. The population is unobserved, and it deals with the theoretical part of an econometric model. What is observed is a finite subset of observations drawn from the population. This subset is called a sample, a part of the population which is used to verify the theoretical model. The objective of econometric analysis is to make inferences on the unobserved population on the basis of the observed sample. This process is known as statistical inference.
The econometric model based on a sample is called the sample regression function (SRF). Using data from the sample, we estimate the model and make inferences on the population. Suppose that $x_{1i}$ and $y_i$ are the actual values of the variables $x_1$ and $y$ corresponding to observation unit $i$ in the sample. Therefore, the relationship between $y$ and $x_1$ for cross section unit $i$ in the sample is given by
$$y_i = \beta_0 + \beta_1 x_{1i} + u_i \qquad (1.3.3)$$
Fig. 1.2 Conditional mean function
Fig. 1.3 Sample regression function
Thus, Eq. (1.3.3) is the sample counterpart of Eq. (1.2.3). It presents the relationship for observation $i$, where $u_i$ is the realisation (sampled value) of the error variable. If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated values of $\beta_0$ and $\beta_1$, respectively, obtained from the sample observations, then the estimated conditional mean value of $y$ will be
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} \qquad (1.3.4)$$
Equation (1.3.4) is the sample counterpart of (1.3.1) and is called the sample regression function (SRF). The SRF is the estimated relation between the estimated $y_i$ and $x_{1i}$.
The main concern of any econometric model is with population characteristics, the parameters, which are unknown. The estimated forms of the parameters are statistics, the sample characteristics, which are known. On the basis of the statistics, we have to make inferences about the parameters. In a linear regression model, we have to estimate the SRF to investigate the relationship between $y$ and $x$ as suggested by the theory or the hypothesis put forward by a researcher (Fig. 1.3).
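To make the sample regression function concrete, the sketch below computes $\hat{\beta}_0$ and $\hat{\beta}_1$ by least squares for a tiny sample and forms the fitted values. The five observations and the function name `fit_srf` are hypothetical, chosen only for illustration.

```python
def fit_srf(x, y):
    """Least squares estimates (b0, b1) of y_i = b0 + b1*x_i + u_i."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1

income = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical x_1i
demand = [2.2, 4.1, 6.2, 7.9, 10.1]  # hypothetical y_i

b0_hat, b1_hat = fit_srf(income, demand)

# Sample regression function: estimated conditional mean of y at each x_1i.
fitted = [b0_hat + b1_hat * xi for xi in income]
resid = [yi - fi for yi, fi in zip(demand, fitted)]

print(round(b0_hat, 2), round(b1_hat, 2))
```

The estimates are statistics computed from this particular sample; a different sample from the same population would give a different SRF.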
1.4 Parametric and Nonparametric or Semiparametric Model
The parametric econometric model is based on prior knowledge of the functional form of the relationship. If the prior information is correct, the parametric model can explain the data sets well. But, if the functional form is chosen wrongly on the basis of a priori information, the estimated results will be biased (Fan and Yao 2003). A parametric model summarises all information about the data through the parameters of the model alone. In a linear regression model with one regressor, for example, two parameters (the coefficient and the intercept) are estimated by analysing the data. A parametric model has a fixed number of parameters, each with a fixed meaning. The simplest example is the Gaussian model parametrised by its mean and variance.
The nonparametric regression model relaxes the assumption of linearity in regression analysis and enables one to explore the data more flexibly. The nonparametric econometric model is specified endogenously on the basis of the data. The structure of the data tells what the regression model looks like. It uses more information from the data for estimating the model. The parameters as well as the current state of the data that have been observed are used for forecasting. The parameters of the nonparametric model are assumed to be infinite in dimension. It has more degrees of freedom and is more flexible. For example, the kernel density estimator tries to capture small details in the distribution by adding successive correction terms. The number of such terms is not fixed a priori, even though each term is parametrised. However, there is no inherent difference between parametric and nonparametric regression models in the sense that the functional form in the nonparametric model is approximated by an infinite number of parameters. In many cases, the parametric model is preferred because it is easier to estimate, easier to interpret, and the estimates have better statistical properties compared to those of nonparametric regression. In this book, we have dealt mostly with parametric econometric models.
1.5 Steps in Formulating an Econometric Model
There are four basic steps in formulating an econometric model: model specification, model estimation, testing of hypotheses and forecasting. These steps are described one by one as follows:
1.5.1 Specification
In formulating an econometric model, we have to specify a relationship based on theory and incorporate a random error. An econometric model is an empirically testable form of an economic model. Economic theory determines the relevant independent variables and the nature of the relationship between the dependent variable and the independent variables. Lack of theoretical understanding leads to model misspecification either in functional form or in the form of omission of relevant variables or inclusion of irrelevant variables. Misspecification in functional form means that the model fails to account for some form of nonlinearity. Functional form misspecification causes bias in the parameter estimators. The Regression Specification Error Test (RESET) developed by Ramsey (1969), or the methodologies proposed by Davidson and MacKinnon (1981) or Wooldridge (1994), may be useful for testing misspecification in this context.
The classical regression model is specified in linear form as shown in (1.2.3). A model may instead be nonlinear in the variables, as in the Cobb–Douglas production function:
$$y = \beta_0 x_1^{\beta_1} u \qquad (1.5.1)$$
Here, $y$ denotes output and $x_1$ denotes labour, with fixed capital and technological parameter $\beta_0$. The conventional specification of the Cobb–Douglas production function can be converted into log-linear form as shown in (1.5.2):
$$\ln y = \ln \beta_0 + \beta_1 \ln x_1 + \ln u \qquad (1.5.2)$$
When a regression model is specified in linear form in terms of logs of the variables, the regression coefficients measure proportional change. In economics, the coefficients of the log-linear model provide elasticity measures. In this example, $\beta_1$ measures the output elasticity of labour.
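The elasticity interpretation of the log-linear form can be illustrated by simulating a Cobb–Douglas relation and regressing $\ln y$ on $\ln x_1$. This Python sketch rests on assumed values ($\beta_0 = 3$, $\beta_1 = 0.7$, a log-normal multiplicative error); none of them come from the text.

```python
import math
import random

random.seed(7)
BETA0, BETA1 = 3.0, 0.7  # assumed technology parameter and labour elasticity

log_x, log_y = [], []
for _ in range(2000):
    x = random.uniform(1, 100)          # labour input
    u = math.exp(random.gauss(0, 0.1))  # multiplicative error, so ln u ~ N(0, 0.1^2)
    y = BETA0 * x ** BETA1 * u          # Cobb-Douglas output
    log_x.append(math.log(x))
    log_y.append(math.log(y))

# Least squares on the log-transformed variables.
n = len(log_x)
mx, my = sum(log_x) / n, sum(log_y) / n
elasticity = (sum((a - mx) * (b - my) for a, b in zip(log_x, log_y))
              / sum((a - mx) ** 2 for a in log_x))
intercept = my - elasticity * mx        # estimates ln(beta0)

print(round(elasticity, 3), round(math.exp(intercept), 3))
```

The slope of the regression in logs recovers the assumed output elasticity of labour, and the exponential of the intercept recovers the technology parameter.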
In some cases, the regression model is specified in semi-log-linear (linear-log or log-linear) form by transforming either the independent or the dependent variable into log form:
$$y = \beta_0 + \beta_1 \ln x_1 + u \qquad (1.5.3)$$
$$\ln y = \beta_0 + \beta_1 x_1 + u \qquad (1.5.4)$$
Both the linear-log model (1.5.3) and the log-linear model (1.5.4) are linear in the parameters, although they are not linear in the variables. The log-linear model is sometimes called the exponential model because it is derived from the following exponential form:
$$y = e^{\beta_0 + \beta_1 x_1 + u}$$
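If the regression equation is specified in linear-log form, the conditional mean of $y$ will increase by approximately $\beta_1/100$ units when $x_1$ increases by 1%:
$$\Delta E(y \mid x_1) = \beta_1 \, \Delta \ln x_1 = \frac{\beta_1}{100} \times \left(100 \times \Delta \ln x_1\right)$$
where $100 \times \Delta \ln x_1$ is approximately the percentage change in $x_1$.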
In a log-linear model as shown in (1.5.4), the conditional mean of $y$ will increase by approximately $100\beta_1$ per cent with a one-unit increase in $x_1$.
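Both semi-log interpretations are approximations, and a short numerical check shows how close they are. The coefficient values below are hypothetical and chosen only to make the arithmetic visible.

```python
import math

# Linear-log model: y = b0 + b1*ln(x1). Effect of a 1% rise in x1.
beta1_linlog = 50.0                       # hypothetical coefficient
x_old = 200.0
x_new = x_old * 1.01                      # x1 increases by 1%
exact_change = beta1_linlog * (math.log(x_new) - math.log(x_old))
approx_change = beta1_linlog / 100        # rule of thumb: beta1/100 units

# Log-linear model: ln(y) = b0 + b1*x1. Effect of a one-unit rise in x1.
beta1_loglin = 0.05                       # hypothetical coefficient
exact_pct = (math.exp(beta1_loglin) - 1) * 100
approx_pct = 100 * beta1_loglin           # rule of thumb: 100*beta1 per cent

print(round(exact_change, 4), approx_change)
print(round(exact_pct, 3), approx_pct)
```

The rules of thumb work well when the percentage change in $x_1$ or the value of $\beta_1$ is small; for a large $\beta_1$ in the log-linear model, the exact figure $100(e^{\beta_1} - 1)$ is noticeably larger than $100\beta_1$.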
1.5.2 Estimation
Econometric models are estimated on the basis of observed data from the sample by applying a suitable method and are tested for the validity of the hypotheses. In the parametric model, there are three popular methods of estimation:
• the method of moments,
• the method of least squares and
• the method of maximum likelihood.
The method of moments utilises the moment conditions relating to the zero unconditional and conditional means of the random errors. The most popular method of estimation is ordinary least squares (OLS). The least squares principle suggests that we should select the estimators of the parameters so as to minimise the residual sum of squares (RSS). The method of maximum likelihood is the broad platform for parametric classical estimation in econometrics. The statistics, functions of the observed data, obtained by maximising the probability of observing the responses are called maximum likelihood estimators.
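The link between the least squares principle and the moment conditions can be seen in a small example. In the Python sketch below, the data and the function name `rss` are hypothetical. It checks that the OLS estimates satisfy the sample analogues of the two moment conditions (residuals average to zero and are uncorrelated with the regressor) and that moving away from them raises the residual sum of squares.

```python
def rss(b0, b1, x, y):
    """Residual sum of squares for the line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical regressor values
y = [3.1, 4.9, 7.2, 8.8, 11.0]   # hypothetical responses

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1_hat = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
b0_hat = my - b1_hat * mx

resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]

# Sample moment conditions used by the method of moments.
moment_1 = sum(resid) / n                                 # analogue of E(u) = 0
moment_2 = sum(xi * ei for xi, ei in zip(x, resid)) / n   # analogue of E(x*u) = 0

rss_min = rss(b0_hat, b1_hat, x, y)
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
rss_nearby = [rss(b0_hat + d0, b1_hat + d1, x, y) for d0, d1 in perturbations]

print(round(b0_hat, 3), round(b1_hat, 3), round(rss_min, 4))
```

In the simple linear model the two approaches therefore give the same estimates; under normally distributed errors, maximum likelihood gives the same coefficient estimates as well.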
In the nonparametric model, the method of estimations includes
Let $y_1, \ldots, y_n$ be a random sample of size $n$ from a population distribution with a parameter $\beta$. A random variable which is a function of the random sample, $\hat{\beta} = T(y_1, \ldots, y_n)$, is called an estimator of the population parameter $\beta$, while its value is called an estimate of the population parameter $\beta$. An estimator $\hat{\beta}$ of a parameter $\beta$ is a random variable, and the estimate is a single value taken from the distribution of $\hat{\beta}$. Since an estimate should be close to the parameter, the random variable $\hat{\beta}$ should be centred close to $\beta$ and have a small variance. Also, an estimator should be such that, as $n \to \infty$, $\hat{\beta} \to \beta$ with probability tending to one. The estimator $\hat{\beta}$ defined in this way is called the point estimator.
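An interval $(\hat{\beta}_1, \hat{\beta}_2)$ such that $P(\hat{\beta}_1 \le \beta \le \hat{\beta}_2) = 1 - \alpha$, $\alpha \in (0, 1)$, is called a $100(1 - \alpha)\%$ confidence interval of $\beta$. The random variables $\hat{\beta}_1$ and $\hat{\beta}_2$ are called the lower and upper limits, respectively; $1 - \alpha$ is called the confidence coefficient.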
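The meaning of the confidence coefficient can be illustrated by repeated sampling. The sketch below builds a large-sample (normal approximation) confidence interval for a population mean and counts how often it covers the true value. The population values `MU` and `SIGMA`, the sample size and the function name `confidence_interval` are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width

random.seed(42)
MU, SIGMA, N, REPS = 5.0, 2.0, 50, 2000   # assumed population and design

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    lower, upper = confidence_interval(sample)
    covered += lower <= MU <= upper        # the limits are random, MU is fixed

coverage = covered / REPS
print(round(coverage, 3))
```

The lower and upper limits change from sample to sample, and about 95 per cent of the intervals contain the fixed population mean. With small samples, critical values from the t distribution would be more appropriate than the normal ones used in this sketch.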
1.5.3 Testing of Hypothesis
After estimation, testing for goodness of fit of the model is necessary. Testing of hypotheses relates to the statistical inference of the model. It is a process through which a sample is used to have an idea about the characteristics of a population. For example, a sample mean is used to learn about the population mean. We begin by stating the value of a population mean and test whether this claim is true or not on the basis of the sample mean by exploiting the behaviour of the sampling distribution of the sample mean. Hypothesis testing is a statistical process to test the likelihood of claims or ideas about a population on the basis of a sample drawn from it.
We know that the sample mean is an unbiased estimator of the population mean. This means that, on average, the value of the sample mean will be equal to the population mean. Suppose that the population mean of household income is Rs 15,000. If this claim is true, on average, the sample mean will be Rs 15,000 (the population mean).
We can illustrate the steps involved in hypothesis testing mostly used in econometrics in the following way. After specifying an econometric model, we can put forward various hypotheses on the basis of the theory. For example, in Eq. (1.2.4) we might hypothesise that the price of other commodities ($x_3$) has no effect on demand for a commodity. The hypothesis is equivalent to the population parameter $\beta_3 = 0$. To test this hypothesis, $\beta_3$ is to be estimated from a random sample drawn from the population. We have to compare the estimated value of $\beta_3$ with the value we would expect if the claim being tested is true, on the basis of some criteria. We expect the estimated value of $\beta_3$ to be around 0. If the discrepancy between the statistic and the parameter is small, then we will not reject the claim. If the discrepancy is too large, then we will reject the claim.
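The decision rule described above can be sketched for the household income example. The Python code below is a minimal illustration: the two samples are invented, the function name `z_test_mean` is hypothetical, and a normal approximation is assumed (for samples this small, a t test would normally be preferred).

```python
from statistics import NormalDist, mean, stdev

def z_test_mean(sample, mu0):
    """Test statistic and two-sided p-value for H0: population mean = mu0."""
    se = stdev(sample) / len(sample) ** 0.5
    z = (mean(sample) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

CLAIMED_MEAN = 15000.0  # claimed population mean of household income (Rs)

# Hypothetical sample whose mean equals the claimed value.
sample_a = [14200.0, 15800.0, 15100.0, 14900.0, 15300.0,
            14700.0, 15000.0, 15400.0, 14600.0, 15000.0]
# Hypothetical sample with every income Rs 2,000 higher.
sample_b = [v + 2000.0 for v in sample_a]

z_a, p_a = z_test_mean(sample_a, CLAIMED_MEAN)
z_b, p_b = z_test_mean(sample_b, CLAIMED_MEAN)

for label, p in (("sample_a", p_a), ("sample_b", p_b)):
    decision = "reject" if p < 0.05 else "do not reject"
    print(label, decision, "the claim at the 5% level")
```

For the first sample the discrepancy between the statistic and the claimed parameter is zero, so the claim is not rejected; for the second, the discrepancy is many standard errors and the claim is rejected.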
1.5.4 Forecasting
Forecasting is an integral part of economic decision-making. Forecasting or prediction is useful for policy-makers to evaluate economic policies. A forecast is merely a prediction about the future values of data. Forecasting is made by using the estimated model. Normally, regression analysis is used to make forecasts. Forecasts from a regression model are made by assuming that the relationship stated in the regression model continues to exist in future.
There are two types of forecasts in time series econometrics: ex-post forecasts and ex-ante forecasts. Ex-post forecasts are made beyond the period of estimation, but within the period where actual information is available. Ex-post forecasts are useful for studying the behaviour of forecasting models. Ex-ante forecasts are those that are made for the period where actual information is not available. In order to generate ex-ante forecasts, the model requires forecasts of the predictors.
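An ex-post forecasting exercise of the kind described above can be sketched as follows. The series is artificial (a linear trend plus a small alternating component), the split into 16 estimation periods and 4 hold-out periods is arbitrary, and the function name `fit_line` is hypothetical.

```python
def fit_line(t, y):
    """Least squares estimates (b0, b1) of y_t = b0 + b1*t + u_t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    b1 = (sum((a - mt) * (b - my) for a, b in zip(t, y))
          / sum((a - mt) ** 2 for a in t))
    return my - b1 * mt, b1

times = list(range(1, 21))
series = [100 + 2 * t + 0.5 * (-1) ** t for t in times]  # artificial data

# Estimation period: t = 1..16. Hold-out period: t = 17..20,
# where the actual values are known, so the forecasts are ex-post.
train_t, train_y = times[:16], series[:16]
test_t, test_y = times[16:], series[16:]

b0, b1 = fit_line(train_t, train_y)
forecasts = [b0 + b1 * t for t in test_t]
errors = [actual - f for actual, f in zip(test_y, forecasts)]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5

print(round(b1, 3), round(rmse, 3))
```

Because the hold-out values are observed, the forecast errors and a summary such as the root mean squared error can be computed. An ex-ante forecast would apply the same fitted line to periods beyond t = 20, where no such comparison is possible.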
1.6 Data
Data, particularly non-experimental data,² are the main inputs in econometric analysis. Therefore, a researcher should have a clear idea about the data, their basic characteristics and the process through which the data are generated, before using them in an econometric model. Data are not merely some numerical figures, but they are generated through a process called the data generating process. The nature of the data generating process largely depends on the time period over which the data are collected. Data sets used in econometric analysis are of three types: cross section data, time series data and panel data.
1.6.1 Cross Section Data
Cross section data are collected through a sample survey or the complete enumeration method. Information collected across cross section units, like households, firms or countries, at a given point in time forms the cross section data. In most cases, the information cannot be collected at precisely the same time. Data may be collected during a very short period, normally one year, and we can call the result a cross section data set. The data generating process in a very short period is deterministic in the sense that a single realisation of a variable is not a stochastic process. In other words, the factors determining the observed value of a variable (e.g. income) during a short period of time are well known to the respondent. Therefore, cross section data are not stochastic in nature.
Cross section data are generated by an individual researcher through a field survey or by the official agencies in different countries. In India, the National Sample Survey Office (NSSO) under the Ministry of Statistics and Programme Implementation (MOSPI) conducts surveys to collect cross section data on several issues. The household consumer expenditure survey and the survey on employment and unemployment are very popular cross section data sets in Indian official statistics. Cross section data are widely used in economics and other social sciences. In economics, cross section data are mostly used in labour economics, industrial organisation, demography, health economics and other areas of applied microeconomics.
Cross section data are obtained by random sampling from the underlying population through a survey. Random sampling simplifies the analysis of cross section data. Sometimes sampling may not be random in cross section data. For example, suppose that we are interested in studying the factors that influence buying a new car. We can collect information by taking a random sample of households, but some households do not have sufficient income or wealth to buy a new car and they might refuse to respond. Although the data were collected by using random sampling, the resulting sample in this case is not a random sample. This problem is known as the sample selection problem.
² Non-experimental data are not collected through controlled experiments on the observation units. Experimental data, on the other hand, are collected in laboratory environments.
1.6.2 Time Series Data
Time series data consist of observations on a variable or several variables collected over time. Time is an important dimension in time series data. Most macroeconomic data are time series. The data generating process of a time series is stochastic, and the realisation of time series data is characterised by a joint probability density function. As time series data are stochastic, a researcher has to examine the stochastic behaviour of the variables before using them in an econometric model. Time series data are not collected through surveys as cross section data are. Most time series data are estimated and available in official statistics. As time series data are estimated, they are stochastic in nature. The National Accounts Division (NAD) of the Central Statistics Office (CSO) prepares the National Accounts Statistics (NAS), which is the primary source of macroeconomic time series in India. Time series data are useful in analysing trends and forecasting in macroeconometric models. In finance, they are used in forecasting volatility along with the mean return from a financial asset.
A key feature of time series data is that they are related, often strongly related, to their recent histories. This feature creates a critical problem in using time series data in a standard econometric model. More steps are needed in specifying econometric models for time series data before using them in standard econometric methods.
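The dependence of a time series on its recent history can be measured by the first-order sample autocorrelation. The Python sketch below compares a simulated autoregressive series with independent draws. The autoregressive coefficient `RHO = 0.8`, the series length and the function name `lag1_autocorr` are assumptions made for illustration.

```python
import random

def lag1_autocorr(series):
    """Sample correlation between y_t and y_(t-1)."""
    m = sum(series) / len(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[1:], series[:-1]))
    den = sum((a - m) ** 2 for a in series)
    return num / den

random.seed(1)
RHO = 0.8  # assumed dependence of y_t on y_(t-1)

# Time-series-like data: each value depends on the previous one.
ar_series = [0.0]
for _ in range(4999):
    ar_series.append(RHO * ar_series[-1] + random.gauss(0, 1))

# Independent draws, as under random sampling in a cross section.
iid_series = [random.gauss(0, 1) for _ in range(5000)]

r_ar = lag1_autocorr(ar_series)
r_iid = lag1_autocorr(iid_series)
print(round(r_ar, 3), round(r_iid, 3))
```

The strong autocorrelation in the first series violates the independence across observations that standard methods for random samples rely on, which is why additional steps are needed before such data are used.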
1.6.3 Pooled Cross Section
Pooling two or more sets of cross section data on similar issues, obtained from different samples drawn from the same population at different points in time, forms what is called pooled cross section data. The features of pooled cross section data are similar to those of cross section data. Suppose that two cross section data sets are taken from the employment and unemployment survey in India undertaken by the National Sample Survey Office (NSSO), one in 2004 and the other in 2011. The surveys were conducted by using the same sample design with a different sample of households chosen randomly from the same population in both 2004 and 2011. If we combine these two different random samples from the same population in two different time periods, we get a pooled cross section. The use of pooled cross section data provides more robust results because it contains a larger number of observations for different time periods. Pooled cross section data are useful for looking into changing behaviour over two or more time points.
1.6.4 Panel Data
Panel data are a mix of cross section and time series. Panel data are obtained by repeating a survey with the same set of sample units for information on similar issues over time. A time series for each cross section unit forms a set of panel data or longitudinal data. If the cross section units are micro units like households and firms, the panel is called a micro panel. In a micro panel, the time dimension is smaller than the cross section dimension. If, on the other hand, the cross section units are macro units like countries, the panel is called a macro panel. The time dimension is very large compared to the cross section dimension in a macro panel. Panel data may also be balanced or unbalanced depending on whether all information is available for all units at every time point.
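Whether a panel is balanced can be checked directly from the list of unit and period identifiers. The following Python sketch uses hypothetical household labels and survey years; the function name `is_balanced` is likewise an assumption.

```python
def is_balanced(records):
    """True if every unit is observed in every period.

    records: list of (unit, period) pairs, one per observation.
    """
    units = {unit for unit, _ in records}
    periods = {period for _, period in records}
    observed = set(records)
    return len(observed) == len(units) * len(periods)

# Three hypothetical households, each observed in both years.
balanced_panel = [(hh, year)
                  for hh in ("hh1", "hh2", "hh3")
                  for year in (2004, 2011)]

# The same panel with the last observation missing.
unbalanced_panel = balanced_panel[:-1]

print(is_balanced(balanced_panel), is_balanced(unbalanced_panel))
```

In a pooled cross section, by contrast, the units differ between the two years, so there would be no repeated unit identifiers to check in this way.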
The key feature of panel data is that they follow the same cross section units over a given time period. Panel data sets, especially those on individuals, households and firms, are more difficult to obtain in developing countries. There are no panel data, particularly micro panel data, in official statistics in India. For this reason, pooled cross section data have gained popularity in econometric modelling in the developing world.
Application of econometric theories with data needs econometric or statistical software. In this section, some basic points on operational issues of Stata 15.1 are discussed briefly. Stata is a powerful statistical software package used in carrying out statistical and econometric techniques. Stata is now available in version 15.1 for Windows, Unix and Mac computers.
Main windows in Stata
There are five docked windows in Stata. The Command window, located at the bottom of the startup window, is used for typing commands. The larger window immediately above the Command window is the Results window, which shows the results after executing any command. The Review window on the left keeps track of the commands already used. The variables in the data set are listed in the Variables window on the top right. Properties of the variables are displayed in the Properties window just below the Variables window. In addition, there are some subsidiary windows like the Graph, Viewer, Variables Manager, Data Editor and Do-file Editor in Stata.
Menu and dialogue system
Stata allows selecting commands and options from a menu and dialogue system. There are a number of menus at the top of the Stata main window that can be used for econometric analysis. For example, going to 'FILE', then 'OPEN', and selecting the file will open the data set. This is a useful way to learn commands at the beginning. The alternative is to type commands directly into the Command window. Stata can work as a calculator using the display command. Stata commands are case-sensitive. The command display cannot be written as Display.
Data browser
The data browser and data editor look like an Excel sheet showing the data held in memory in Stata. If we want to look at the actual data in a data file (.dta), we can open the data browser from the Stata menu. The first row of the data browser displays the variable names. Each column is a variable, and each row is an observation of the data set. The data editor looks similar to the data browser, but in the editor we can edit the data.
Log file
A log file records all commands and output during a particular session. To keep track of our analysis, we should open a log at the start of every session in Stata.
1.7.1.1 Stata Data Files
Stata data sets are rectangular arrays with n observations on m variables.