A Guide to Modern Econometrics
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Verbeek, Marno.
A guide to modern econometrics / Marno Verbeek – 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-470-85773-0 (pbk. : alk. paper)
1. Econometrics. 2. Regression analysis. I. Title.
HB139.V465 2004
330.015195 – dc22
2004004222
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-470-85773-0
Typeset in 10/12pt Times by Laserwords Private Limited, Chennai, India
Printed and bound in Great Britain by TJ International, Padstow, Cornwall
This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.
Contents

2.5.4 A Joint Test of Significance of Regression Coefficients
2.6 Asymptotic Properties of the OLS Estimator
4.3.2 Estimator Properties and Hypothesis Testing
4.3.4 Heteroskedasticity-consistent Standard Errors for OLS
4.4.1 Testing Equality of Two Unknown Variances
4.4.2 Testing for Multiplicative Heteroskedasticity
4.10.2 Heteroskedasticity-and-autocorrelation-consistent Standard Errors for OLS
4.11 Illustration: Risk Premia in Foreign Exchange Markets
4.11.2 Tests for Risk Premia in the One-month Market
4.11.3 Tests for Risk Premia Using Overlapping Samples
5.1 A Review of the Properties of the OLS Estimator
5.2.1 Autocorrelation with a Lagged Dependent Variable
5.3.1 Estimation with a Single Endogenous Regressor and a Single Instrument
5.4 Illustration: Estimating the Returns to Schooling
5.5 The Generalized Instrumental Variables Estimator
5.5.1 Multiple Endogenous Regressors with an Arbitrary Number of Instruments
5.7 Illustration: Estimating Intertemporal Asset Pricing Models
5.8 Concluding Remarks
6 Maximum Likelihood Estimation and Specification Tests
6.4 Quasi-maximum Likelihood and Moment Conditions Tests
7.3.1 The Poisson and Negative Binomial Models
7.3.2 Illustration: Patents and R&D Expenditures
7.4.3 Illustration: Expenditures on Alcohol and Tobacco (Part 1)
7.5.4 Illustration: Expenditures on Alcohol and Tobacco (Part 2)
7.6.2 Semi-parametric Estimation of the Sample Selection Model
7.8.3 Illustration: Duration of Bank Relationships
8.1.2 Stationarity and the Autocorrelation Function
8.4.1 Testing for Unit Roots in a First Order Autoregressive Model
8.7.5 Illustration: Modelling the Price/Earnings Ratio
8.8 Predicting with ARMA Models
8.9 Illustration: The Expectations Theory of the Term Structure
8.10 Autoregressive Conditional Heteroskedasticity
8.10.3 Illustration: Volatility in Daily Exchange Rates
9.2.3 Cointegration and Error-correction Mechanisms
9.3 Illustration: Long-run Purchasing Power Parity (Part 2)
9.5.2 Example: Cointegration in a Bivariate VAR
9.5.4 Illustration: Long-run Purchasing Power Parity (Part 3)
10.2.5 Alternative Instrumental Variables Estimators
10.2.7 Testing for Heteroskedasticity and Autocorrelation
10.5 Illustration: Wage Elasticities of Labour Demand
10.6 Nonstationarity, Unit Roots and Cointegration
10.7.5 Dynamics and the Problem of Initial Conditions
10.8.3 Estimation with Nonrandomly Missing Data
Preface

Emperor Joseph II: "Your work is ingenious. It's quality work. And there are simply too many notes, that's all. Just cut a few and it will be perfect."
Wolfgang Amadeus Mozart: "Which few did you have in mind, Majesty?"
from the movie Amadeus, 1984 (directed by Milos Forman)
The field of econometrics has developed rapidly in the last two decades, while the use of up-to-date econometric techniques has become more and more standard practice in empirical work in many fields of economics. Typical topics include unit root tests, cointegration, estimation by the generalized method of moments, heteroskedasticity and autocorrelation consistent standard errors, modelling conditional heteroskedasticity, models based on panel data, and models with limited dependent variables, endogenous regressors and sample selection. At the same time econometrics software has become more and more user friendly and up-to-date. As a consequence, users are able to implement fairly advanced techniques even without a basic understanding of the underlying theory and without realizing potential drawbacks or dangers. In contrast, many introductory econometrics textbooks pay a disproportionate amount of attention to the standard linear regression model under the strongest set of assumptions. Needless to say, these assumptions are hardly satisfied in practice (but not really needed either). On the other hand, the more advanced econometrics textbooks are often too technical or too detailed for the average economist to grasp the essential ideas and to extract the information that is needed. This book tries to fill this gap.
The goal of this book is to familiarize the reader with a wide range of topics in modern econometrics, focusing on what is important for doing and understanding empirical work. This means that the text is a guide to (rather than an overview of) alternative techniques. Consequently, it does not concentrate on the formulae behind each technique (although the necessary ones are given) nor on formal proofs, but on the intuition behind the approaches and their practical relevance. The book covers a wide range of topics that is usually not found in textbooks at this level. In particular, attention is paid to cointegration, the generalized method of moments, models with limited dependent variables and panel data models. As a result, the book discusses developments in time series analysis, cross-sectional methods as well as panel data modelling. Throughout, a few dozen full-scale empirical examples and illustrations are provided, taken from fields like labour economics, finance, international economics, consumer behaviour, environmental economics and macro-economics. In addition, a number of exercises are of an empirical nature and require the use of actual data.

For the second edition, I have tried to fine-tune and update the text, adding additional discussion, material and more recent references, whenever necessary or desirable. The material is organized and presented in a similar way as in the first edition. Some topics that were not included, or only to a limited extent, in the first edition now receive much more attention. Most notably, new sections have been added covering count data models, duration models and the estimation of treatment effects in Chapter 7, and panel data unit root and cointegration tests in Chapter 10. Moreover, Chapter 2 now includes a subsection on Monte Carlo simulation. At several places, I pay more attention to the possibility that small sample distributions of estimators and test statistics may differ from their asymptotic approximations. Several new tests have been added to Chapters 3 and 5, and the presentation in Chapters 6 and 8 has been improved. At a number of places, empirical illustrations have been updated or added. As before, (almost) all data sets are available through the book's website.
This text originates from lecture notes used for courses in Applied Econometrics in the M.Sc. programs in Economics at K.U. Leuven and Tilburg University. It is written for an intended audience of economists and economics students who would like to become familiar with up-to-date econometric approaches and techniques, important for doing, understanding and evaluating empirical work. It is very well suited for courses in applied econometrics at the master's or graduate level. At some schools this book will be suited for one or more courses at the undergraduate level, provided students have a sufficient background in statistics. Some of the later chapters can be used in more advanced courses covering particular topics, for example, panel data, limited dependent variable models or time series analysis. In addition, this book can serve as a guide for managers, research economists and practitioners who want to update their insufficient or outdated knowledge of econometrics. Throughout, the use of matrix algebra is limited.
I am very much indebted to Arie Kapteyn, Bertrand Melenberg, Theo Nijman, and Arthur van Soest, who all have contributed to my understanding of econometrics and have shaped my way of thinking about many issues. The fact that some of their ideas have materialized in this text is a tribute to their efforts. I also owe many thanks to several generations of students who helped me to shape this text into its current form. I am very grateful to a large number of people who read through parts of the manuscript and provided me with comments and suggestions on the basis of the first edition. In particular, I wish to thank Peter Boswijk, Bart Capéau, Geert Dhaene, Tom Doan, Peter de Goeij, Joop Huij, Ben Jacobsen, Jan Kiviet, Wim Koevoets, Erik Kole, Marco Lyrio, Konstantijn Maes, Wessel Marquering, Bertrand Melenberg, Paulo Nunes, Anatoly Peresetsky, Max van de Sande Bakhuyzen, Erik Schokkaert, Arthur van Soest, Frederic Vermeulen, Guglielmo Weber, Olivier Wolthoorn, Kuo-chun Yeh and a number of anonymous reviewers. Of course I retain sole responsibility for any remaining errors.
1 Introduction

1.1 About Econometrics

Traditionally econometrics has focused upon aggregate economic relationships. Macro-economic models consisting of several up to many hundreds of equations were specified, estimated and used for policy evaluation and forecasting. The recent theoretical developments in this area, most importantly the concept of cointegration, have generated increased attention to the modelling of macro-economic relationships and their dynamics, although typically focusing on particular aspects of the economy. Since the 1970s econometric methods have increasingly been employed in micro-economic models describing individual, household or firm behaviour, stimulated by the development of appropriate econometric models and estimators which take into account problems like discrete dependent variables and sample selection, by the availability of large survey data sets, and by the increasing computational possibilities. More recently, the empirical analysis of financial markets has required and stimulated many theoretical developments in econometrics. Currently econometrics plays a major role in empirical work in all fields of economics, almost without exception, and in most cases it is no longer sufficient to be able to run a few regressions and interpret the results. As a result, introductory econometrics textbooks usually provide insufficient coverage for applied researchers. On the other hand, the more advanced econometrics textbooks are often too technical or too detailed for the average economist to grasp the essential ideas and to extract the information that is needed. Thus there is a need for an accessible textbook that discusses the recent and relatively more advanced developments.
The relationships that economists are interested in are formally specified in mathematical terms, which lead to econometric or statistical models. In such models there is room for deviations from the strict theoretical relationships due to, for example, measurement errors, unpredictable behaviour, optimization errors or unexpected events. Broadly, econometric models can be classified in a number of categories.

A first class of models describes relationships between present and past. For example, how does the short-term interest rate depend on its own history? This type of model, typically referred to as a time series model, usually lacks any economic theory and is mainly built to get forecasts for future values and the corresponding uncertainty or volatility.

A second type of model considers relationships between economic quantities over a certain time period. These relationships give us information on how (aggregate) economic quantities fluctuate over time in relation to other quantities. For example, what happens to the long-term interest rate if the monetary authority adjusts the short-term one? These models often give insight into the economic processes that are operating.

Third, there are models that describe relationships between different variables measured at a given point in time for different units (for example households or firms). Most of the time, this type of relationship is meant to explain why these units are different or behave differently. For example, one can analyse to what extent differences in household savings can be attributed to differences in household income. Under particular conditions, these cross-sectional relationships can be used to analyse 'what if' questions. For example, how much more would a given household, or the average household, save if income would increase by 1%?
Finally, one can consider relationships between different variables measured for different units over a longer time span (at least two periods). These relationships simultaneously describe differences between different individuals (why does person 1 save much more than person 2?), and differences in behaviour of a given individual over time (why does person 1 save more in 1992 than in 1990?). This type of model usually requires panel data, repeated observations over the same units. They are ideally suited for analysing policy changes on an individual level, provided that it can be assumed that the structure of the model is constant into the (near) future.
The job of econometrics is to specify and quantify these relationships. That is, econometricians formulate a statistical model, usually based on economic theory, confront it with the data, and try to come up with a specification that meets the required goals. The unknown elements in the specification, the parameters, are estimated from a sample of available data. Another job of the econometrician is to judge whether the resulting model is 'appropriate'. That is, check whether the assumptions made to motivate the estimators (and their properties) are correct, and check whether the model can be used for what it is made for. For example, can it be used for prediction or analysing policy changes? Often, economic theory implies that certain restrictions apply to the model that is estimated. For example, (one version of) the efficient market hypothesis implies that stock market returns are not predictable from their own past. An important goal of econometrics is to formulate such hypotheses in terms of the parameters in the model and to test their validity.

The number of econometric techniques that can be used is numerous and their validity often depends crucially upon the validity of the underlying assumptions. This book attempts to guide the reader through this forest of estimation and testing procedures, not by describing the beauty of all possible trees, but by walking through this forest in a structured way, skipping unnecessary side-paths, stressing the similarity of the different species that are encountered, and by pointing out dangerous pitfalls. The resulting walk is hopefully enjoyable and prevents the reader from getting lost in the econometric forest.
1.2 The Structure of This Book

The first part of this book consists of Chapters 2, 3 and 4. Like most textbooks, it starts with discussing the linear regression model and the OLS estimation method. Chapter 2 presents the basics of this important estimation method, with some emphasis on its validity under fairly weak conditions, while Chapter 3 focuses on the interpretation of the models and the comparison of alternative specifications. Chapter 4 considers two particular deviations from the standard assumptions of the linear model: autocorrelation and heteroskedasticity of the error terms. It is discussed how one can test for these phenomena, how they affect the validity of the OLS estimator and how this can be corrected. This includes a critical inspection of the model specification, the use of adjusted standard errors for the OLS estimator and the use of alternative (GLS) estimators. These three chapters are essential for the remaining part of this book and should be the starting point in any course.
In Chapter 5 another deviation from the standard assumptions of the linear model is discussed which is, however, fatal for the OLS estimator. As soon as the error term in the model is correlated with one or more of the explanatory variables, all good properties of the OLS estimator disappear and we necessarily have to use alternative estimators. The chapter discusses instrumental variables (IV) estimators and, more generally, the generalized method of moments (GMM). This chapter, at least its earlier sections, is also recommended as an essential part of any econometrics course.
Chapter 6 is mainly theoretical and discusses maximum likelihood (ML) estimation. Because in empirical work maximum likelihood is often criticized for its dependence upon distributional assumptions, it is not discussed in the earlier chapters where alternatives are readily available that are either more robust than maximum likelihood or (asymptotically) equivalent to it. Particular emphasis in Chapter 6 is on misspecification tests based upon the Lagrange multiplier principle. While many empirical studies tend to take the distributional assumptions for granted, their validity is crucial for consistency of the estimators that are employed and should therefore be tested. Often these tests are relatively easy to perform, although most software does not routinely provide them (yet). Chapter 6 is crucial for understanding Chapter 7 on limited dependent variable models and for a small number of sections in Chapters 8 to 10.
The last part of this book contains four chapters. Chapter 7 presents models that are typically (though not exclusively) used in micro-economics, where the dependent variable is discrete (e.g. zero or one), partly discrete (e.g. zero or positive) or a duration. It also includes discussions of the sample selection problem and the estimation of treatment effects that go further than their typical textbook treatment.
Chapters 8 and 9 discuss time series modelling including unit roots, cointegration and error-correction models. These chapters can be read immediately after Chapter 4 or 5, with the exception of a few parts that relate to maximum likelihood estimation. The theoretical developments in this area over the last 20 years have been substantial and many recent textbooks seem to focus upon it almost exclusively. Univariate time series models are covered in Chapter 8. In this case models are developed that explain an economic variable from its own past. This includes ARIMA models, as well as GARCH models for the conditional variance of a series. Multivariate time series models that consider several variables simultaneously are discussed in Chapter 9. This includes vector autoregressive models, cointegration and error-correction models.
Finally, Chapter 10 covers models based on panel data. Panel data are available if we have repeated observations of the same units (for example households, firms or countries). During the last decade the use of panel data has become important in many areas of economics. Micro-economic panels of households and firms are readily available and, given the increase in computing resources, more manageable than in the past. In addition, it is more and more common to pool time series of several countries. One of the reasons for this may be that researchers believe that a cross-sectional comparison of countries provides interesting information, in addition to a historical comparison of a country with its own past. This chapter also discusses the recent developments on unit roots and cointegration in a panel data setting.
At the end of the book the reader will find two short appendices discussing mathematical and statistical results that are used at several places in the book. This includes a discussion of some relevant matrix algebra and distribution theory. In particular, a discussion of properties of the (bivariate) normal distribution, including conditional expectations, variances and truncation, is provided.
In my experience the material in this book is too much to be covered in a single course. Different courses can be scheduled on the basis of the chapters that follow. For example, a typical graduate course in applied econometrics would cover Chapters 2, 3, 4, parts of Chapter 5, and then continue with selected parts of Chapters 8 and 9 if the focus is on time series analysis, or continue with Section 6.1 and Chapter 7 if the focus is on cross-sectional models. A more advanced undergraduate or graduate course may focus attention on the time series chapters (Chapters 8 and 9), the micro-econometric chapters (Chapters 6 and 7) or panel data (Chapter 10 with some selected parts from Chapters 6 and 7).
Given the focus and length of this book, I had to make many choices of which material to present or not. As a general rule I did not want to bother the reader with details that I considered not essential or that do not have empirical relevance. The main goal was to give a general and comprehensive overview of the different methodologies and approaches, focusing on what is relevant for doing and understanding empirical work. Some topics are only very briefly mentioned and no attempt is made to discuss them at any length. To compensate for this I have tried to give references at appropriate places to other, often more advanced, textbooks that do cover these issues.
1.3 Illustrations and Exercises

In most chapters a variety of empirical illustrations is provided in separate sections or subsections. While it is possible to skip these illustrations essentially without losing continuity, these sections do provide important aspects concerning the implementation of the methodology discussed in the preceding text. In addition, I have attempted to provide illustrations that are of economic interest in themselves, using data that are typical for current empirical work and covering a wide range of different areas. This means that most data sets are used in recently published empirical work and are fairly large, both in terms of number of observations and number of variables. Given the current state of computing facilities, it is usually not a problem to handle such large data sets empirically.
Learning econometrics is not just a matter of studying a textbook. Hands-on experience is crucial in the process of understanding the different methods and how and when to implement them. Therefore, readers are strongly encouraged to get their hands dirty and to estimate a number of models using appropriate or inappropriate methods, and to perform a number of alternative specification tests. With modern software becoming more and more user-friendly, the actual computation of even the more complicated estimators and test statistics is often surprisingly simple, sometimes dangerously simple. That is, even with the wrong data, the wrong model and the wrong methodology, programs may come up with results that are seemingly all right. At least some expertise is required to prevent the practitioner from such situations and this book plays an important role in this.
To stimulate the reader to use actual data and estimate some models, almost all data sets used in this text are available through the web site http://www.wileyeurope.com/go/verbeek2ed. Readers are encouraged to re-estimate the models reported in this text and check whether their results are the same, as well as to experiment with alternative specifications or methods. Some of the exercises make use of the same or additional data sets and provide a number of specific issues to consider. It should be stressed that for estimation methods that require numerical optimization, alternative programs, algorithms or settings may give slightly different outcomes. However, you should get results that are close to the ones reported.
I do not advocate the use of any particular software package. For the linear regression model any package will do, while for the more advanced techniques each package has its particular advantages and disadvantages. There is typically a trade-off between user-friendliness and flexibility. Menu driven packages often do not allow you to compute anything else than what's on the menu, but if the menu is sufficiently rich that may not be a problem. Command driven packages require somewhat more input from the user, but are typically quite flexible. For the illustrations in the text, I made use of EViews 3.0, LIMDEP 7.0, MicroFit 4.0, RATS 5.1 and Stata 7.0. Several alternative econometrics programs are available, including ET, PcGive, TSP and SHAZAM. Journals like the Journal of Applied Econometrics and the Journal of Economic Surveys regularly publish software reviews.
The exercises included at the end of each chapter consist of a number of questions that are primarily intended to check whether the reader has grasped the most important concepts. Therefore, they typically do not go into technical details nor ask for derivations or proofs. In addition, several exercises are of an empirical nature and require the reader to use actual data.
2 An Introduction to Linear Regression
One of the cornerstones of econometrics is the so-called linear regression model and the ordinary least squares (OLS) estimation method. In the first part of this book we shall review the linear regression model with its assumptions, how it can be estimated, how it can be used for generating predictions and for testing economic hypotheses.

Unlike many textbooks, I do not start with the statistical regression model with the standard, Gauss–Markov, assumptions. In my view the role of the assumptions underlying the linear regression model is best appreciated by first treating the most important technique in econometrics, ordinary least squares, as an algebraic tool rather than a statistical one. This is the topic of Section 2.1. The linear regression model is then introduced in Section 2.2, while Section 2.3 discusses the properties of the OLS estimator in this model under the so-called Gauss–Markov assumptions. Section 2.4 discusses goodness-of-fit measures for the linear model, and hypothesis testing is treated in Section 2.5. In Section 2.6, we move to cases where the Gauss–Markov conditions are not necessarily satisfied and the small sample properties of the OLS estimator are unknown. In such cases, the limiting behaviour of the OLS estimator when – hypothetically – the sample size becomes infinitely large, is commonly used to approximate its small sample properties. An empirical example concerning the capital asset pricing model (CAPM) is provided in Section 2.7. Sections 2.8 and 2.9 discuss multicollinearity and prediction, respectively. Throughout, an empirical example concerning individual wages is used to illustrate the main issues. Additional discussion on how to interpret the coefficients in the linear model, how to test some of the model's assumptions and how to compare alternative models, is provided in Chapter 3.
2.1 Ordinary Least Squares as an Algebraic Tool
2.1.1 Ordinary Least Squares
Suppose we have a sample with $N$ observations on individual wages and some background characteristics. Our main interest lies in the question how in this sample wages are related to the other observables. Let us denote wages by $y$ and the other $K - 1$ characteristics by $x_2, \ldots, x_K$. It will become clear below why this numbering of variables is convenient. Now we may ask the question: which linear combination of $x_2, \ldots, x_K$ and a constant gives a good approximation of $y$? To answer this question, first consider an arbitrary linear combination, including a constant, which can be written as

$$\tilde{\beta}_1 + \tilde{\beta}_2 x_2 + \cdots + \tilde{\beta}_K x_K, \qquad (2.1)$$

where $\tilde{\beta}_1, \ldots, \tilde{\beta}_K$ are constants to be chosen. Let us index the observations by $i$ such that $i = 1, \ldots, N$. Now, the difference between an observed value $y_i$ and its linear approximation is

$$y_i - [\tilde{\beta}_1 + \tilde{\beta}_2 x_{i2} + \cdots + \tilde{\beta}_K x_{iK}]. \qquad (2.2)$$

Collecting the coefficients in a $K$-dimensional vector $\tilde{\beta} = (\tilde{\beta}_1, \ldots, \tilde{\beta}_K)'$ and the regressors, including the constant, in a vector $x_i = (1, x_{i2}, \ldots, x_{iK})'$, the approximation error in (2.2) can be written compactly as

$$y_i - x_i'\tilde{\beta}. \qquad (2.3)$$

We would like to choose values for $\tilde{\beta}_1, \ldots, \tilde{\beta}_K$ such that these differences are as small as possible. That is, we determine $\tilde{\beta}$ to minimize the following objective function:

$$S(\tilde{\beta}) = \sum_{i=1}^{N} (y_i - x_i'\tilde{\beta})^2. \qquad (2.4)$$
This approach is referred to as the ordinary least squares or OLS approach. Taking squares makes sure that positive and negative deviations do not cancel out when taking the summation.
To solve the minimization problem, we can look at the first order conditions, obtained by differentiating $S(\tilde{\beta})$ with respect to the vector $\tilde{\beta}$. (Appendix A discusses some rules on how to differentiate a scalar expression, like (2.4), with respect to a vector.) This gives the following system of $K$ conditions:

$$-2 \sum_{i=1}^{N} x_i (y_i - x_i'\tilde{\beta}) = 0 \qquad (2.5)$$

or

$$\left( \sum_{i=1}^{N} x_i x_i' \right) \tilde{\beta} = \sum_{i=1}^{N} x_i y_i. \qquad (2.6)$$
These equations are sometimes referred to as normal equations. As this system has $K$ unknowns, one can obtain a unique solution for $\tilde{\beta}$ provided that the symmetric matrix $\sum_{i=1}^{N} x_i x_i'$, which contains sums of squares and cross products of the regressors $x_i$, can be inverted. For the moment, we shall assume that this is the case. The solution to the minimization problem, which we shall denote by $b$, is then given by

$$b = \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} \sum_{i=1}^{N} x_i y_i. \qquad (2.7)$$

By checking the second order conditions, it is easily verified that $b$ indeed corresponds to a minimum; that is, the objective function (the sum of squared approximation errors) is minimal for the least squares solution $b$.
In deriving the linear approximation we have not used any economic or statistical theory. It is simply an algebraic tool and it holds irrespective of the way the data are generated. That is, given a set of variables we can always determine the best linear approximation of one variable using the other variables. The only assumption that we had to make (which is directly checked from the data) is that the $K \times K$ matrix $\sum_{i=1}^{N} x_i x_i'$ is invertible. This says that none of the $x_{ik}$s is an exact linear combination of the other ones and thus redundant. This is usually referred to as the no-multicollinearity assumption. It should be stressed that the linear approximation is an in-sample result (that is, in principle it does not give information about observations (individuals) that are not included in the sample) and, in general, there is no direct interpretation of the coefficients.
useful Defining a residuale i as the difference between the observed and the imated value,e i = yi − ˆyi = yi − x
approx-i b, we can decompose the observed y i as
Trang 28which is referred to as the residual sum of squares It can be shown that the
approx-imated value x ib and the residual e i satisfy certain properties by construction For
example, if we rewrite (2.5), substituting the OLS solution b, we obtain
This means that the vectore = (e1, , e N )is orthogonal1 to each vector of
observa-tions on an x-variable For example, if x i contains a constant, it implies thatN
i=1e i =
0 That is, the average residual is zero This is an intuitively appealing result If theaverage residual were nonzero, this would mean that we could improve upon theapproximation by adding or subtracting the same constant for each observation, i.e bychangingb1 Consequently, for the average observation it follows that
where $\bar{y} = (1/N)\sum_{i=1}^{N} y_i$ and $\bar{x} = (1/N)\sum_{i=1}^{N} x_i$, a $K$-dimensional vector of sample means. This shows that for the average observation there is no approximation error. Similar interpretations hold for the other $x$-variables: if the derivative of the sum of squared approximation errors with respect to $\tilde{\beta}_k$ is positive, that is if

$$-2 \sum_{i=1}^{N} x_{ik} e_i > 0,$$

it means that we can improve the objective function by decreasing $\tilde{\beta}_k$.
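The algebra above can be verified directly on a computer. Below is a minimal sketch, assuming NumPy and randomly generated data, that solves the normal equations (2.6) and checks the by-construction properties in (2.10); because OLS is a purely algebraic tool at this stage, any $y$ will do.

```python
import numpy as np

# Minimal sketch of OLS as an algebraic tool: find the linear combination
# of a constant and x2, ..., xK that best approximates y.
rng = np.random.default_rng(0)
N, K = 100, 3

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # each row is x_i'
y = rng.normal(size=N)                                          # arbitrary y: no model is assumed

# Solve the normal equations (2.6): (sum_i x_i x_i') b = sum_i x_i y_i
b = np.linalg.solve(X.T @ X, X.T @ y)

# By-construction properties (2.10): residuals are orthogonal to every
# regressor, so with a constant included the average residual is zero.
e = y - X @ b
print(b)
print(X.T @ e)   # numerically a zero vector
print(e.mean())  # numerically zero
```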
2.1.2 Simple Linear Regression
In the case where $K = 2$ we only have one regressor and a constant. In this case, the observations² $(y_i, x_i)$ can be drawn in a two-dimensional graph with $x$-values on the horizontal axis and $y$-values on the vertical one. This is done in Figure 2.1 for a hypothetical data set. The best linear approximation of $y$ from $x$ and a constant is obtained by minimizing the sum of squared residuals, which – in this two-dimensional case – equal the vertical distances between an observation and the fitted value. All fitted values are on a straight line, the regression line.

Because a $2 \times 2$ matrix can be inverted analytically we can derive solutions for $b_1$ and $b_2$ in this special case from the general expression for $b$ above. Equivalently, we can minimize the residual sum of squares with respect to the unknowns directly. Thus we have

$$S(\tilde{\beta}_1, \tilde{\beta}_2) = \sum_{i=1}^{N} (y_i - \tilde{\beta}_1 - \tilde{\beta}_2 x_i)^2. \qquad (2.12)$$

The basic elements in the derivation of the OLS solutions are the first order conditions

$$\frac{\partial S}{\partial \tilde{\beta}_1} = -2 \sum_{i=1}^{N} (y_i - \tilde{\beta}_1 - \tilde{\beta}_2 x_i) = 0, \qquad (2.13)$$

$$\frac{\partial S}{\partial \tilde{\beta}_2} = -2 \sum_{i=1}^{N} x_i (y_i - \tilde{\beta}_1 - \tilde{\beta}_2 x_i) = 0. \qquad (2.14)$$
¹ Two vectors $x$ and $y$ are said to be orthogonal if $x'y = 0$, that is if $\sum_i x_i y_i = 0$ (see Appendix A).
² In this subsection $x$ will be used to denote the single regressor, so that it does not include the constant.
Figure 2.1 Simple linear regression: fitted line and observation points
From (2.13) we can write

$$b_1 = \bar{y} - b_2 \bar{x}, \qquad (2.15)$$

and substituting this into (2.14) and solving for $b_2$ gives

$$b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}. \qquad (2.16)$$

Through adding a factor $1/(N-1)$ to numerator and denominator it appears that the OLS solution $b_2$ is the ratio of the sample covariance between $x$ and $y$ and the sample variance of $x$. From (2.15), the intercept is determined so as to make the average approximation error (residual) equal to zero.
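A similar sketch, again assuming NumPy and made-up data, confirms that the simple-regression formulas (2.15) and (2.16) coincide with the general matrix solution:

```python
import numpy as np

# Sketch: the simple-regression solutions (2.15)-(2.16) via sample moments.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

b2 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # sample covariance / sample variance
b1 = y.mean() - b2 * x.mean()                        # intercept from (2.15)

# The same numbers come out of the general matrix solution:
X = np.column_stack([np.ones_like(x), x])
print(np.linalg.solve(X.T @ X, X.T @ y), (b1, b2))
```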
2.1.3 Example: Individual Wages
An example that will appear frequently in this chapter is based on a sample of individual wages with background characteristics, like gender, race and years of schooling. We use a subsample of the US National Longitudinal Survey (NLS) that relates to 1987 and we have a sample of 3294 young working individuals, of which 1569 are female.³ The average hourly wage rate in this sample equals $6.31 for males and $5.15 for females. Now suppose we try to approximate wages by a linear combination of a constant and a 0–1 variable denoting whether the individual is male or not. That is, $x_i = 1$ if individual $i$ is male and zero otherwise. Such a variable, which can only take on the values of zero and one, is called a dummy variable. Using the OLS approach the result is

$$\hat{y}_i = 5.15 + 1.17 x_i.$$

This means that for females our best approximation is $5.15 and for males it is $5.15 + $1.17 = $6.31. It is not a coincidence that these numbers are exactly equal to the sample means in the two subsamples. It is easily verified from the results above that, with a single dummy regressor, $b_1$ equals the sample average of $y_i$ for the observations with $x_i = 0$, while $b_1 + b_2$ equals the sample average for those with $x_i = 1$.

³ The data for this example are available as WAGES1.
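Because the NLS subsample itself is not reproduced here, a sketch with simulated wages (all values made up) illustrates the same point: OLS on a constant and a dummy variable returns exactly the two subsample means.

```python
import numpy as np

# Sketch: OLS on a constant and a 0-1 dummy reproduces the group means.
rng = np.random.default_rng(2)
male = rng.integers(0, 2, size=1000)                 # hypothetical gender dummy
wage = 5.0 + 1.2 * male + rng.normal(size=1000)      # hypothetical wages

X = np.column_stack([np.ones_like(wage), male])
b1, b2 = np.linalg.solve(X.T @ X, X.T @ wage)

print(b1, wage[male == 0].mean())        # intercept equals the x=0 group mean
print(b1 + b2, wage[male == 1].mean())   # intercept + slope equals the x=1 group mean
```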
2.1.4 Matrix Notation

Deriving the least squares solution using matrix notation is faster, but it requires some knowledge of matrix differential calculus. We introduce the following notation:

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ \vdots \\ x_N' \end{pmatrix},$$

so that $y$ is an $N$-dimensional vector stacking all observations on $y_i$ and $X$ is an $N \times K$ matrix whose $i$-th row is $x_i'$. The objective function, as given in (2.4), can be rewritten in matrix notation using that the inner product of a given vector $a$ with itself ($a'a$) is the sum of its squared elements (see Appendix A). That is,

$$S(\tilde{\beta}) = (y - X\tilde{\beta})'(y - X\tilde{\beta}) = y'y - 2y'X\tilde{\beta} + \tilde{\beta}'X'X\tilde{\beta}, \qquad (2.17)$$
from which the least squares solution follows from differentiating⁴ with respect to $\tilde{\beta}$ and setting the result to zero:

$$\frac{\partial S(\tilde{\beta})}{\partial \tilde{\beta}} = -2X'y + 2X'X\tilde{\beta} = 0. \qquad (2.18)$$

Solving (2.18) gives the least squares solution

$$b = (X'X)^{-1}X'y, \qquad (2.19)$$

which is the same solution as before, but now written in matrix notation. It again requires that the matrix $X'X$ can be inverted, that is, that there is no exact (or perfect) multicollinearity.
As before, we can decompose $y$ as

$$y = Xb + e, \qquad (2.20)$$

where $e$ is an $N$-dimensional vector of residuals. The first order conditions imply that $X'(y - Xb) = 0$ or

$$X'e = 0, \qquad (2.21)$$
which means that each column of the matrix X is orthogonal to the vector of residuals.
With (2.19) we can also write (2.20) as

$$y = Xb + e = X(X'X)^{-1}X'y + e = \hat{y} + e, \qquad (2.22)$$

so that the predicted value for $y$ is given by

$$\hat{y} = Xb = X(X'X)^{-1}X'y = P_X y. \qquad (2.23)$$
In linear algebra, the matrix $P_X \equiv X(X'X)^{-1}X'$ is known as a projection matrix (see Appendix A). It projects the vector $y$ upon the columns of $X$ (the column space of $X$). This is just the geometric translation of finding the best linear approximation of $y$ from the columns (regressors) in $X$. The residual vector of the projection $e = y - Xb = (I - P_X)y = M_X y$ is the orthogonal complement. It is a projection of $y$ upon the space orthogonal to the one spanned by the columns of $X$. This interpretation is sometimes useful. For example, projecting twice on the same space should leave the result unaffected, so that it holds that $P_X P_X = P_X$ and $M_X M_X = M_X$. More importantly, it holds that $M_X P_X = 0$ as the column space of $X$ and its orthogonal complement do not have anything in common (except the null vector). This is an alternative way to interpret the result that $\hat{y}$ and $e$ and also $X$ and $e$ are orthogonal. The interested reader is referred to Davidson and MacKinnon (1993, Chapter 1) for an excellent discussion on the geometry of least squares.
⁴ See Appendix A for some rules for differentiating matrix expressions with respect to vectors.
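These properties are easy to check numerically. The following minimal sketch, assuming NumPy and random data, verifies idempotence and mutual orthogonality of $P_X$ and $M_X$:

```python
import numpy as np

# Sketch: verify the projection-matrix properties stated above.
rng = np.random.default_rng(3)
N, K = 20, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

PX = X @ np.linalg.inv(X.T @ X) @ X.T   # projects onto the column space of X
MX = np.eye(N) - PX                     # projects onto its orthogonal complement

print(np.allclose(PX @ PX, PX))  # True: projecting twice changes nothing
print(np.allclose(MX @ MX, MX))  # True
print(np.allclose(MX @ PX, 0))   # True: the spaces share only the null vector
```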
2.2 The Linear Regression Model
Usually, economists want more than just finding the best linear approximation of one variable given a set of others. They want economic relationships that are more generally valid than the sample they happen to have. They want to draw conclusions about what happens if one of the variables actually changes. That is: they want to say something about things that are not observed (yet). In this case, we want the relationship that is found to be more than just a historical coincidence; it should reflect a fundamental relationship. To do this it is assumed that there is a general relationship that is valid for all possible observations from a well-defined population (for example all US households, or all firms in a certain industry). Restricting attention to linear relationships, we specify a statistical model as

$$y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i \qquad (2.24)$$

or

$$y_i = x_i'\beta + \varepsilon_i, \qquad (2.25)$$
where $y_i$ and $x_i$ are observable variables and $\varepsilon_i$ is unobserved and referred to as an error term or disturbance term. The elements in $\beta$ are unknown population parameters. The equality in (2.25) is supposed to hold for any possible observation, while we only observe a sample of $N$ observations. We shall consider this sample as one realization of all potential samples of size $N$ that could have been drawn from the same population. In this way we can view $y_i$ and $\varepsilon_i$ (and often $x_i$) as random variables. Each observation corresponds to a realization of these random variables. Again we can use matrix notation and stack all observations to write

$$y = X\beta + \varepsilon, \qquad (2.26)$$
where $y$ and $\varepsilon$ are $N$-dimensional vectors and $X$, as before, is of dimension $N \times K$. Notice the difference between this equation and (2.20).

In contrast to (2.20), equations (2.25) and (2.26) are population relationships, where $\beta$ is a vector of unknown parameters characterizing the population. The sampling process describes how the sample is taken from the population and, as a result, determines the randomness of the sample. In a first view, the $x_i$ variables are considered as fixed and non-stochastic, which means that every new sample will have the same $X$ matrix. In this case one refers to $x_i$ as being deterministic. A new sample only implies new values for $\varepsilon_i$, or – equivalently – for $y_i$. The only relevant case where the $x_i$s are truly deterministic is in a laboratory setting, where a researcher can set the conditions of a given experiment (e.g. temperature, air pressure). In economics we will typically have to work with non-experimental data. Despite this, it is convenient and in particular cases appropriate in an economic context to treat the $x_i$ variables as deterministic. In this case, we will have to make some assumptions about the sampling distribution of $\varepsilon_i$. A convenient one corresponds to random sampling where each error $\varepsilon_i$ is a random drawing from the population distribution, independent of the other error terms. We shall return to this issue below.
In a second view, a new sample implies new values for both $x_i$ and $\varepsilon_i$, so that each time a new set of $N$ observations for $(y_i, x_i)$ is drawn. In this case random sampling means that each set $(x_i, \varepsilon_i)$ or $(y_i, x_i)$ is a random drawing from the population distribution. In this context, it will turn out to be important to make assumptions about the joint distribution of $x_i$ and $\varepsilon_i$, in particular regarding the extent to which the distribution of $\varepsilon_i$ is allowed to depend upon $X$. The idea of a (random) sample is most easily understood in a cross-sectional context, where interest lies in a large and fixed population, for example all UK households in January 1999, or all stocks listed at the New York Stock Exchange on a given date. In a time series context, different observations refer to different time periods and it does not make sense to assume that we have a random sample of time periods. Instead, we shall take the view that the sample we have is just one realization of what could have happened in a given time span and the randomness refers to alternative states of the world. In such a case we will need to make some assumptions about the way the data are generated (rather than the way the data are sampled).
It is important to realize that without additional restrictions the statistical model in (2.25) is a tautology: for any value for $\beta$ one can always define a set of $\varepsilon_i$s such that (2.25) holds exactly for each observation. We thus need to impose some assumptions to give the model a meaning. A common assumption is that the expected value of $\varepsilon_i$ given all the explanatory variables in $x_i$ is zero, that is $E\{\varepsilon_i \mid x_i\} = 0$. Usually, people refer to this as the assumption saying that the $x$-variables are exogenous. Under this assumption it holds that

$$E\{y_i \mid x_i\} = x_i'\beta, \qquad (2.27)$$
so that the regression line $x_i'\beta$ describes the conditional expectation of $y_i$ given the values for $x_i$. The coefficients $\beta_k$ measure how the expected value of $y_i$ is changed if the value of $x_{ik}$ is changed, keeping the other elements in $x_i$ constant (the ceteris paribus condition). Economic theory, however, often suggests that the model in (2.25) describes a causal relationship, in which the $\beta$ coefficients measure the changes in $y_i$ caused by a ceteris paribus change in $x_{ik}$. In such cases, $\varepsilon_i$ has an economic interpretation (not just a statistical one) and imposing that it is uncorrelated with $x_i$, as we do by imposing $E\{\varepsilon_i \mid x_i\} = 0$, may not be justified. As in many cases it can be argued that unobservables in the error term are related to observables in $x_i$, we should be cautious interpreting our regression coefficients as measuring causal effects. We shall come back to these issues in Chapter 5.
Now that our $\beta$ coefficients have a meaning, we can try to use the sample $(y_i, x_i)$, $i = 1, \ldots, N$ to say something about them. The rule which says how a given sample is translated into an approximate value for $\beta$ is referred to as an estimator. The result for a given sample is called an estimate. The estimator is a vector of random variables, because the sample may change. The estimate is a vector of numbers. The most widely used estimator in econometrics is the ordinary least squares (OLS) estimator. This is just the ordinary least squares rule described in Section 2.1 applied to the available sample. The OLS estimator for $\beta$ is thus given by

$$b = \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} \sum_{i=1}^{N} x_i y_i = (X'X)^{-1}X'y. \qquad (2.28)$$
Because we have assumed an underlying 'true' model (2.25), combined with a sampling scheme, $b$ is now a vector of random variables. Our interest lies in the true unknown parameter vector $\beta$, and $b$ is considered an approximation to it. While a given sample only produces a single estimate, we evaluate the quality of it through the properties of the underlying estimator. The estimator $b$ has a sampling distribution because its value depends upon the sample that is taken (randomly) from the population.

2.3 Small Sample Properties of the OLS Estimator
2.3.1 The Gauss–Markov Assumptions
In this section we shall discuss several important properties of the OLS estimator $b$. To do so, we need to make some assumptions about the error term and the explanatory variables $x_i$. The first set of assumptions we consider are the so-called Gauss–Markov assumptions. These assumptions are usually standard in the first chapters of econometrics textbooks, although – as we shall see below – they are not all strictly needed to justify the use of the ordinary least squares estimator. They just constitute a simple case in which the small sample properties of $b$ are easily derived.

The standard set of Gauss–Markov assumptions is given by

$$E\{\varepsilon_i\} = 0, \quad i = 1, \ldots, N \qquad \text{(A1)}$$

$$\{\varepsilon_1, \ldots, \varepsilon_N\} \text{ and } \{x_1, \ldots, x_N\} \text{ are independent} \qquad \text{(A2)}$$

$$V\{\varepsilon_i\} = \sigma^2, \quad i = 1, \ldots, N \qquad \text{(A3)}$$

$$\text{cov}\{\varepsilon_i, \varepsilon_j\} = 0, \quad i, j = 1, \ldots, N, \ i \neq j. \qquad \text{(A4)}$$
Assumption (A1) says that the expected value of the error term is zero, which means that, on average, the regression line should be correct. Assumption (A3) states that all error terms have the same variance, which is referred to as homoskedasticity, while assumption (A4) imposes zero correlation between different error terms. This excludes any form of autocorrelation. Taken together, (A1), (A3) and (A4) imply that the error terms are uncorrelated drawings from a distribution with expectation zero and constant variance $\sigma^2$. Using the matrix notation from above, it is possible to rewrite these three conditions as

$$E\{\varepsilon\} = 0 \quad \text{and} \quad V\{\varepsilon\} = \sigma^2 I_N, \qquad (2.29)$$

where $I_N$ is the $N \times N$ identity matrix. This says that the covariance matrix of the vector of error terms $\varepsilon$ is a diagonal matrix with $\sigma^2$ on the diagonal. Assumption (A2) implies that $X$ and $\varepsilon$ are independent. This is a fairly strong assumption, which can be relaxed somewhat (see below). It implies that

$$E\{\varepsilon \mid X\} = E\{\varepsilon\} = 0 \qquad (2.30)$$

and

$$V\{\varepsilon \mid X\} = V\{\varepsilon\} = \sigma^2 I_N. \qquad (2.31)$$
That is, the matrix of regressor values $X$ does not provide any information about the expected values of the error terms or their (co)variances. The two conditions (2.30) and (2.31) combine the necessary elements from the Gauss–Markov assumptions needed for the results below to hold. Often, assumption (A2) is stated as: the regressor matrix $X$ is a deterministic nonstochastic matrix. The reason for this is that the outcomes in the matrix $X$ can be taken as given without affecting the properties of $\varepsilon$, that is, one can derive all properties conditional upon $X$. For simplicity, we shall take this approach in this section and Section 2.5. Under the Gauss–Markov assumptions (A1) and (A2), the linear model can be interpreted as the conditional expectation of $y_i$ given $x_i$, i.e. $E\{y_i \mid x_i\} = x_i'\beta$. This is a direct implication of (2.30).
2.3.2 Properties of the OLS Estimator
Under assumptions (A1)–(A4), the OLS estimator $b$ for $\beta$ has several desirable properties. First of all, it is unbiased. This means that, in repeated sampling, we can expect that our estimator is on average equal to the true value $\beta$. We formulate this as $E\{b\} = \beta$. It is instructive to see the proof:

$$E\{b\} = E\{(X'X)^{-1}X'y\} = E\{\beta + (X'X)^{-1}X'\varepsilon\} = \beta + E\{(X'X)^{-1}X'\varepsilon\} = \beta.$$

The latter step here is essential and it follows from

$$E\{(X'X)^{-1}X'\varepsilon\} = E\{(X'X)^{-1}X'\}E\{\varepsilon\} = 0,$$

because $X$ and $\varepsilon$ are independent and $E\{\varepsilon\} = 0$. Note that we did not use assumptions (A3) and (A4) in the proof. This shows that the OLS estimator is unbiased as long as the error terms are mean zero and independent of all explanatory variables, even if heteroskedasticity or autocorrelation are present. We shall come back to this issue in Chapter 4.
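Unbiasedness is a statement about repeated sampling, which makes it natural to illustrate by simulation (the preface notes that this chapter includes a subsection on Monte Carlo simulation). Below is a minimal sketch along those lines, assuming NumPy and made-up parameter values, holding $X$ fixed and redrawing the errors:

```python
import numpy as np

# Monte Carlo sketch: under (A1)-(A4), OLS is unbiased. X is held fixed
# across replications ("deterministic" regressors); only the errors change.
rng = np.random.default_rng(4)
N, R = 50, 5000
beta = np.array([1.0, 0.5])                       # assumed true parameters
X = np.column_stack([np.ones(N), rng.normal(size=N)])

estimates = np.empty((R, 2))
for r in range(R):
    eps = rng.normal(scale=2.0, size=N)           # homoskedastic, uncorrelated errors
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))   # close to (1.0, 0.5)
print(estimates.std(axis=0))    # close to the square roots of the diagonal of 4*(X'X)^{-1}
```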
In addition to knowing that we are, on average, correct, we would also like to make statements about how (un)likely it is to be far off in a given sample. This means we would like to know the distribution of $b$. First of all, the variance of $b$ (conditional upon $X$) is given by

$$V\{b \mid X\} = \sigma^2 (X'X)^{-1} = \sigma^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1}, \qquad (2.32)$$

which, for simplicity, we shall denote by $V\{b\}$. Implicitly, this means that we treat $X$ as deterministic. The proof is fairly easy and goes as follows:

$$V\{b\} = E\{(b - \beta)(b - \beta)'\} = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\} = (X'X)^{-1}X'(\sigma^2 I_N)X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$
Without using matrix notation the proof goes as follows:

$$V\{b\} = V\left\{ \left( \sum_i x_i x_i' \right)^{-1} \sum_i x_i \varepsilon_i \right\} = \left( \sum_i x_i x_i' \right)^{-1} V\left\{ \sum_i x_i \varepsilon_i \right\} \left( \sum_i x_i x_i' \right)^{-1} = \left( \sum_i x_i x_i' \right)^{-1} \sigma^2 \left( \sum_i x_i x_i' \right) \left( \sum_i x_i x_i' \right)^{-1} = \sigma^2 \left( \sum_i x_i x_i' \right)^{-1}.$$

The last result is collected in the Gauss–Markov Theorem, which says that under assumptions (A1)–(A4) the OLS estimator $b$ is the best linear unbiased estimator for $\beta$. In short we say that $b$ is BLUE for $\beta$. To appreciate this result, consider the class of linear unbiased estimators. A linear estimator is a linear function of the elements in $y$ and can be written as $\tilde{b} = Ay$, where $A$ is a $K \times N$ matrix. The estimator is unbiased if $E\{Ay\} = \beta$. (Note that the OLS estimator is obtained for $A = (X'X)^{-1}X'$.) Then the theorem states that the difference between the covariance matrices of $\tilde{b} = Ay$ and the OLS estimator $b$ is always positive semi-definite. What does this mean? Suppose we are interested in some linear combination of $\beta$ coefficients, given by $d'\beta$ where $d$ is a $K$-dimensional vector. Then the Gauss–Markov result implies that the variance of any linear unbiased estimator $d'\tilde{b}$ is at least as large as the variance of the OLS estimator $d'b$ for $d'\beta$, that is

$$V\{d'\tilde{b}\} \geq V\{d'b\} \quad \text{for any vector } d.$$

As a special case this holds for the $k$-th element and we have that

$$V\{\tilde{b}_k\} \geq V\{b_k\}. \qquad (2.33)$$
To estimate the variance of $b$ we need to replace the unknown error variance $\sigma^2$ by an estimate. An obvious candidate is the sample variance of the residuals $e_i = y_i - x_i'b$, that is

$$\tilde{s}^2 = \frac{1}{N-1} \sum_{i=1}^{N} e_i^2 \qquad (2.34)$$

(recalling that the average residual is zero). However, because $e_i$ is different from $\varepsilon_i$, it can be shown that this estimator is biased for $\sigma^2$. An unbiased estimator is given by

$$s^2 = \frac{1}{N-K} \sum_{i=1}^{N} e_i^2. \qquad (2.35)$$
The argument for this is that $K$ parameters were chosen so as to minimize the residual sum of squares and thus to minimize the sample variance of the residuals. Consequently, $\tilde{s}^2$ is expected to underestimate the variance of the error term $\sigma^2$. The estimator $s^2$, with a degrees of freedom correction, is unbiased under assumptions (A1)–(A4); see Hayashi (2000, Section 1.3) or Greene (2003, Section 4.6) for a proof. The variance of $b$ can thus be estimated by

$$\hat{V}\{b\} = s^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} = s^2 (X'X)^{-1}. \qquad (2.36)$$

The estimated variance of an element $b_k$ is given by $s^2 c_{kk}$, where $c_{kk}$ is the $(k,k)$ element in $(\sum_i x_i x_i')^{-1}$. The square root of this estimated variance is usually referred to as the standard error of $b_k$. We shall denote it as $\text{se}(b_k)$. It is the estimated standard deviation of $b_k$ and is a measure for the accuracy of the estimator. Under assumptions (A1)–(A4), it holds that $\text{se}(b_k) = s\sqrt{c_{kk}}$. When the error terms are not homoskedastic or exhibit autocorrelation, the standard error of the OLS estimator $b_k$ will have to be computed in a different way (see Chapter 4).
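A minimal sketch, assuming NumPy, simulated data and an assumed true model, of how $s^2$ and the standard errors follow from (2.35) and (2.36):

```python
import numpy as np

# Sketch: residual variance estimate s^2 (2.35) and standard errors se(b_k).
rng = np.random.default_rng(5)
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)   # made-up true model

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (N - K)                 # unbiased estimator with df correction
V_hat = s2 * np.linalg.inv(X.T @ X)  # estimated covariance matrix of b, (2.36)
se = np.sqrt(np.diag(V_hat))         # standard errors s * sqrt(c_kk)
print(b, se)
```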
So far, we made no assumption about the shape of the distribution of the error terms $\varepsilon_i$, except that they were mutually uncorrelated, independent of $X$, had zero mean and a constant variance. For exact statistical inference from a given sample of $N$ observations, explicit distributional assumptions have to be made.⁵ The most common assumption is that the errors are jointly normally distributed.⁶ In this case the uncorrelatedness of (A4) is equivalent to independence of all error terms. The precise assumption is as follows:

$$\varepsilon \sim N(0, \sigma^2 I_N), \qquad \text{(A5)}$$

saying that $\varepsilon$ has an $N$-variate normal distribution with mean vector $0$ and covariance matrix $\sigma^2 I_N$. Assumption (A5) thus replaces (A1), (A3) and (A4). An alternative way of formulating (A5) is

$$\varepsilon_i \sim \text{NID}(0, \sigma^2),$$

which is a shorthand way of saying that the error terms $\varepsilon_i$ are independent drawings from a normal distribution (n.i.d.) with mean zero and variance $\sigma^2$. Even though error terms are unobserved, this does not mean that we are free to make any assumption we like. For example, if error terms are assumed to follow a normal distribution this means that $y_i$ (for a given value of $x_i$) also follows a normal distribution. Clearly, we can think of many variables whose distribution (conditional upon a given set of $x_i$ variables) is not normal, in which case the assumption of normal error terms is inappropriate. Fortunately, not all assumptions are equally crucial for the validity of the results that follow and, moreover, the majority of the assumptions can be tested empirically; see Chapters 3, 4 and 6 below.

⁵ Later we shall see that for approximate inference in large samples this is not necessary.
⁶ The distributions used in this text are explained in Appendix B.
To make things simpler let us consider the $X$ matrix as fixed and deterministic or, alternatively, let us work conditional upon the outcomes $X$. Then the following result holds. Under assumptions (A2) and (A5) the OLS estimator $b$ is normally distributed with mean vector $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$, i.e.

$$b \sim N(\beta, \sigma^2 (X'X)^{-1}). \qquad (2.37)$$
The proof of this follows directly from the result that $b$ is a linear combination of all $\varepsilon_i$ and is omitted here. From this it also follows that each element in $b$ is normally distributed, for example

$$b_k \sim N(\beta_k, \sigma^2 c_{kk}), \qquad (2.38)$$

where $c_{kk}$ is the $(k,k)$ element in $(X'X)^{-1}$. These results provide the basis for statistical tests based upon the OLS estimator $b$.
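Result (2.38) can also be illustrated by simulation. The sketch below, assuming NumPy and made-up values for $\beta$ and $\sigma$, repeatedly draws normal errors for a fixed $X$ and compares the simulated standard deviation of $b_2$ with $\sigma\sqrt{c_{22}}$:

```python
import numpy as np

# Sketch: the exact sampling distribution (2.38) under (A2) and (A5).
rng = np.random.default_rng(7)
N, R, sigma = 40, 10000, 1.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # fixed regressors
beta = np.array([0.0, 1.0])
c = np.linalg.inv(X.T @ X)                             # (X'X)^{-1}

b2 = np.empty(R)
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=N)     # normal errors, as in (A5)
    b2[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]

# The simulated standard deviation matches sigma * sqrt(c_22)
# (c[1, 1] is the (2,2) element in zero-based indexing):
print(b2.std(), sigma * np.sqrt(c[1, 1]))
```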
2.3.3 Example: Individual Wages (Continued)
Let us now turn back to our wage example. We can formulate a statistical model as

$$\text{wage}_i = \beta_1 + \beta_2 \text{male}_i + \varepsilon_i, \qquad (2.39)$$

where $\text{wage}_i$ denotes the hourly wage rate of individual $i$ and $\text{male}_i = 1$ if $i$ is male and 0 otherwise. Imposing that $E\{\varepsilon_i\} = 0$ and $E\{\varepsilon_i \mid \text{male}_i\} = 0$ gives $\beta_1$ the interpretation of the expected wage rate for females, while $E\{\text{wage}_i \mid \text{male}_i = 1\} = \beta_1 + \beta_2$ is the expected wage rate for males. These are unknown population quantities and we may wish to estimate them. Assume that we have a random sample, implying that different observations are independent. Also assume that $\varepsilon_i$ is independent of the regressors, in particular, that the variance of $\varepsilon_i$ does not depend upon gender ($\text{male}_i$). Then the OLS estimator for $\beta$ is unbiased and its covariance matrix is given by (2.32). The estimation results are given in Table 2.1.

Table 2.1  OLS results wage equation
Dependent variable: wage

Variable    Estimate    Standard error
constant    5.1469      0.0812
male        1.1661      0.1122

s = 3.2174   R² = 0.0317   F = 107.93

In addition to the OLS estimates, identical to those presented before, we now also know something about the accuracy of the estimates, as reflected in the reported standard errors. We can now say that our estimate of the expected hourly wage differential $\beta_2$ between males and females is $1.17 with a standard error of $0.11. Combined with the normal distribution, this allows us to make statements about $\beta_2$. For example, we can test the hypothesis that $\beta_2 = 0$. If this is the case, the wage differential between males and females in our sample is nonzero only by chance. Section 2.5 discusses how to test hypotheses regarding $\beta$.
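Because the design matrix here contains only a constant and a dummy, $(X'X)^{-1}$ has a simple closed form ($c_{11} = 1/N_f$ and $c_{22} = 1/N_m + 1/N_f$, with $N_f$ females and $N_m$ males), so the standard errors in Table 2.1 can be recomputed from $s$ and the subsample sizes reported in the text. A minimal sketch, assuming NumPy:

```python
import numpy as np

# For a constant plus one dummy, X'X = [[N, N_m], [N_m, N_m]], whose inverse
# has diagonal elements c_11 = 1/N_f and c_22 = 1/N_m + 1/N_f.
N, N_f = 3294, 1569          # sample size and number of females (from the text)
N_m = N - N_f
s = 3.2174                   # residual standard deviation from Table 2.1

se_const = s * np.sqrt(1 / N_f)            # approximately 0.0812
se_male = s * np.sqrt(1 / N_m + 1 / N_f)   # approximately 0.1122
print(se_const, se_male)
```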
2.4 Goodness-of-fit

Having estimated a particular linear model, a natural question is: how well does the estimated regression line fit the observations? A popular measure for this is the $R^2$, which measures the proportion of the sample variance of $y_i$ that is explained by the model, that is

$$R^2 = \frac{\hat{V}\{\hat{y}_i\}}{\hat{V}\{y_i\}} = \frac{\frac{1}{N-1}\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2}, \qquad (2.40)$$

where $\hat{V}$ denotes the sample variance, $\hat{y}_i = x_i'b$ and $\bar{y} = (1/N)\sum_i y_i$ denotes the sample mean of $y_i$. Note that $\bar{y}$ also corresponds with the sample mean of $\hat{y}_i$, because of (2.11).
From the first order conditions (compare (2.10)) it follows directly that

$$\sum_{i=1}^{N} e_i x_{ik} = 0, \quad k = 1, \ldots, K.$$

Consequently, we can write $y_i = \hat{y}_i + e_i$, where $\sum_i e_i \hat{y}_i = 0$. In the most relevant case where the model contains an intercept term, it holds that
$$\hat{V}\{y_i\} = \hat{V}\{\hat{y}_i\} + \hat{V}\{e_i\}, \qquad (2.41)$$
where $\hat{V}\{e_i\} = \tilde{s}^2$. Using this, the $R^2$ can be rewritten as

$$R^2 = 1 - \frac{\hat{V}\{e_i\}}{\hat{V}\{y_i\}} = 1 - \frac{\frac{1}{N-1}\sum_{i=1}^{N} e_i^2}{\frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2}. \qquad (2.42)$$
If the model of interest contains an intercept term, the two expressions for $R^2$ in (2.40) and (2.42) are equivalent. Moreover, in this case it can be shown that $0 \leq R^2 \leq 1$. Only if all $e_i = 0$ does it hold that $R^2 = 1$, while the $R^2$ is zero if the model does not explain anything in addition to the sample mean of $y_i$. That is, the $R^2$ of a model with just an intercept term is zero by construction. In this sense, the $R^2$ indicates how much better the model performs than a trivial model with only a constant term.
From the results in Table 2.1, we see that the $R^2$ of the very simple wage equation is only 0.0317. This means that only approximately 3.2% of the variation in individual wages can be attributed to gender differences. Apparently, many other observable and unobservable factors affect a person's wage besides gender. This does not automatically imply that the model that was estimated in Table 2.1 is incorrect or useless: it just indicates the relative (un)importance of gender in explaining individual wage variation.
In the exceptional cases that the model does not contain an intercept term, the two expressions for $R^2$ are not equivalent. The reason is that (2.41) is violated because $\sum_{i=1}^{N} e_i$ is no longer equal to zero. In this situation it is possible that the $R^2$ computed from (2.42) becomes negative. An alternative measure, which is routinely computed by some software packages if there is no intercept, is the uncentred $R^2$, which is defined as

$$\text{uncentred } R^2 = \frac{\sum_{i=1}^{N} \hat{y}_i^2}{\sum_{i=1}^{N} y_i^2} = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} y_i^2}. \qquad (2.43)$$
Because the $R^2$ measures the explained variation in $y_i$, it is also sensitive to the definition of this variable. For example, explaining wages is something different than explaining log wages, and the $R^2$s will be different. Similarly, models explaining consumption, changes in consumption or consumption growth will not be directly comparable in terms of their $R^2$s. It is clear that some sources of variation are much harder to explain than others. For example, variation in aggregate consumption for a given country is usually easier to explain than the cross-sectional variation in consumption over individual households. Consequently, there is no absolute benchmark to say that an $R^2$ is 'high' or 'low'. A value of 0.2 may be high in certain applications but low in others, and even a value of 0.95 may be low in certain contexts.

Sometimes the $R^2$ is interpreted as a measure of quality of the statistical model, while in fact it measures nothing more than the quality of the linear approximation. As the OLS approach is developed to give the best linear approximation, irrespective of the 'true' model and the validity of its assumptions, estimating a linear model by OLS will always give the best $R^2$ possible. Any other estimation method, and we will see several below, will lead to lower $R^2$ values even though the corresponding estimator may have much better statistical properties under the assumptions of the model. Even worse, when the model is not estimated by OLS the two definitions (2.40) and (2.42) are not equivalent and it is not obvious how an $R^2$ should be defined. For later use, we shall present an alternative definition of the $R^2$, which for OLS is equivalent to (2.40) and (2.42), and for any other estimator is guaranteed to be between zero and one. It is given by
$$R^2 = \text{corr}^2\{y_i, \hat{y}_i\} = \frac{\left( \sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{y}) \right)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2 \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}, \qquad (2.44)$$

which is the squared (sample) correlation coefficient between the actual and the fitted values.
Another drawback of the $R^2$ is that it will never decrease if the number of regressors is increased, even if the additional variables have no real explanatory power. A common way to solve this is to correct the variance estimates in (2.42) for the degrees of freedom. This gives the so-called adjusted $R^2$, or $\bar{R}^2$, defined as

$$\bar{R}^2 = 1 - \frac{\frac{1}{N-K}\sum_{i=1}^{N} e_i^2}{\frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2}. \qquad (2.45)$$

This measure incorporates a small punishment for the inclusion of additional explanatory variables, so that it will not necessarily increase when a variable is added to the set of regressors. Note that, in extreme cases, the $\bar{R}^2$ may become negative.
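As a closing sketch, assuming NumPy and simulated data, the three goodness-of-fit measures can be computed from the same residuals:

```python
import numpy as np

# Sketch: the goodness-of-fit measures (2.42), (2.43) and (2.45).
rng = np.random.default_rng(6)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)   # made-up true model

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
tss = ((y - y.mean()) ** 2).sum()

r2 = 1 - (e @ e) / tss                                # (2.42)
r2_uncentred = 1 - (e @ e) / (y @ y)                  # (2.43)
r2_adj = 1 - ((e @ e) / (N - K)) / (tss / (N - 1))    # (2.45)
print(r2, r2_uncentred, r2_adj)
```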