
Class Notes in Statistics and Econometrics


freely available. The source code for these notes can be downloaded from www.econ.utah.edu/ehrbar/ecmet-sources.zip

Copyright Hans G. Ehrbar under the GNU Public License

Contents

2.9 Bayes Theorem 48
3.10 Location Parameters and Dispersion Parameters of a Random Variable 89
4.1 Alternatives to the Linear Congruential Random Generator 125
4.4 Public Key Cryptology 133
Chapter 7 Chebyshev Inequality, Weak Law of Large Numbers, and Central
7.1 Chebyshev Inequality 189
8.3 Conditional Probability Distribution and Conditional Mean 212
9.2 Means and Variances of Quadratic Forms in Random Matrices 249
10.3 Special Case: Bivariate Normal 265
Chapter 13 Estimation Principles and Classification of Estimators 355
13.9 Sufficient Statistics and Estimation 397
15.1 Duality between Significance Tests and Confidence Regions 433
15.6 The Wald, Likelihood Ratio, and Lagrange Multiplier Tests 465
18.4 The Adjusted R-Square 509
19.2 Correlation Coefficients and the Associated Least Squares Problem 519
19.4 Some Remarks about the Sample Partial Correlation Coefficients 524
21.5 Instructions for Statistics 5969, Hans Ehrbar’s Section 547
22.1 Cobb Douglas Aggregate Production Function 563
Chapter 23 The Mean Squared Error as an Initial Criterion of Precision 629
24.5 Mallow’s Cp-Statistic as Estimator of the Mean Squared Error 668
Chapter 25 Variance Estimation: Should One Require Unbiasedness? 675
25.2 Derivation of the Best Bounded MSE Quadratic Estimator of the
25.3 Unbiasedness Revisited 688
27.3 Prediction of Future Observations in the Regression Model 720
Chapter 28 Updating of Estimates When More Observations become Available 731
29.2 Conversion of an Arbitrary Constraint into a Zero Constraint 740
29.4 Constrained Least Squares as the Nesting of Two Simpler Models 748
29.9 Application: Biased Estimators and Pre-Test Estimators 764
32.4 Sensitivity of Estimates to Omission of One Observation 820
Chapter 34 Asymptotic Properties of the OLS Estimator 847
Chapter 35 Least Squares as the Normal Maximum Likelihood Estimate 855
39.1 Strongest Assumption: Error Term Well Behaved Conditionally on
39.3 Disturbances Correlated with Regressors in Same Observation 896
40.3 First Scenario: Minimizing relative increase in Mahalanobis distance
40.5 Third Scenario: one additional observation in a Regression Model 909
41.4 Interpretation in terms of Studentized Mahalanobis Distance 932
42.3 The F-Test Statistic is a Function of the Likelihood Ratio 966
43.3 Large-Sample Simultaneous Confidence Regions 983
45.1 Categorical Variables: Regression with Dummies and Factors 998
46.1 Alternating Least Squares and Alternating Conditional Expectations 1020
46.2 Additivity and Variance Stabilizing Transformations (avas) 1027
47.7 Other Approaches to Density Estimation 1036
Chapter 51 Distinguishing Random Variables from Variables Created by a
53.9 Estimation When the Error Covariance Matrix is Exactly Known 1165
54.4 Exchange Rate Forecasts 1186
Chapter 57 Applications of GLS with Nonspherical Covariance Matrix 1233
Chapter 60 Bootstrap Estimators 1299
Chapter 63 Independent Observations from the Same Multivariate Population 1333
64.4 Relation between the three Models so far: 1365
Chapter 65 Disturbance Related (Seemingly Unrelated) Regressions 1375
67.3 Nonstationary Processes 1460
69.1 Fisher’s Scoring and Iteratively Reweighted Least Squares 1487
A.7 Determinants 1528

P. 547 gives instructions on how to download it.

Here are some features by which these notes may differ from other teaching material available:

• A typographical distinction is made between random variables and the values taken by them (page 63).

• Best linear prediction of jointly distributed random variables is given as a second basic building block next to the least squares model (chapter 27).

• Appendix A gives a collection of general matrix formulas in which the g-inverse is used extensively.

• The “deficiency matrix,” which gives an algebraic representation of the null space of a matrix, is defined and discussed in Appendix A.4.

• A molecule-like notation for concatenation of higher-dimensional arrays is introduced in Appendix B and used occasionally, see (10.5.7), (64.3.2), (65.0.18).

Other unusual treatments can be found in chapters/sections 3.11, 18.3, 25, 29, 40, 36, 41–42, and 64. There are a number of plots of density functions, confidence ellipses, and other graphs which use the full precision of TeX, and more will be added in the future. Some chapters are carefully elaborated, while others are still under construction. In some topics covered in these notes I am an expert, in others I am still a beginner.

This edition also includes a number of comments from a critical realist perspective, inspired by [Bha78] and [Bha93]; see also [Law89]. There are many situations in the teaching of probability theory and statistics where the concepts of totality, transfactual efficacy, etc., can and should be used. These comments are still at an experimental stage, and students are not required to know them for the exams. In the on-line version of the notes they are printed in a different color.

After some more cleaning out of the code, I am planning to make the AMS-LaTeX source files for these notes publicly available under the GNU public license, and upload them to the TeX archive network CTAN. Since I am using Debian GNU/Linux, the materials will also be available as a deb archive.

The most up-to-date version will always be posted at the web site of the Economics Department of the University of Utah, www.econ.utah.edu/ehrbar/ecmet.pdf. You can contact me by email at ehrbar@econ.utah.edu

Hans Ehrbar

• Games of chance: throwing dice, shuffling cards, drawing balls out of urns.

• Quality control in production: you take a sample from a shipment, count how many defectives.

• Actuarial Problems: the length of life anticipated for a person who has just applied for life insurance.

• Scientific Experiments: you count the number of mice which contract cancer when a group of mice is exposed to cigarette smoke.

• Markets: the total personal income in New York State in a given month.

• Meteorology: the rainfall in a given month.

• Uncertainty: the exact date of Noah’s birth.

• Indeterminacy: The closing of the Dow Jones industrial average or the temperature in New York City at 4 pm on February 28, 2014.

• Chaotic determinacy: the relative frequency of the digit 3 in the decimal representation of π.

• Quantum mechanics: the proportion of photons absorbed by a polarization filter.

• Statistical mechanics: the velocity distribution of molecules in a gas at a given pressure and temperature.

In the probability-theoretical literature, the situations in which probability theory applies are called “experiments,” see for instance [Rén70, p. 1]. We will not use this terminology here, since probabilistic reasoning applies to several different types of situations, and not all of these can be considered “experiments.”

Problem 1. (This question will not be asked on any exams.) Rényi says: “Observing how long one has to wait for the departure of an airplane is an experiment.” Comment.

Answer. Rényi commits the epistemic fallacy in order to justify his use of the word “experiment.” Not the observation of the departure but the departure itself is the event which can be theorized probabilistically, and the word “experiment” is not appropriate here.

What does the fact that probability theory is appropriate in the above situations tell us about the world? Let us go through our list one by one:

• Games of chance: Games of chance are based on the sensitivity to initial conditions: you tell someone to roll a pair of dice or shuffle a deck of cards, and despite the fact that this person is doing exactly what he or she is asked to do and produces an outcome which lies within a well-defined universe known beforehand (a number between 1 and 6, or a permutation of the deck of cards), the question which number or which permutation comes up is beyond their control. The precise location and speed of the die or the precise order of the cards varies, and these small variations in initial conditions give rise, by the “butterfly effect” of chaos theory, to unpredictable final outcomes.

A critical realist recognizes here the openness and stratification of the world: If many different influences come together, each of which is governed by laws, then their sum total is not determinate, as a naive hyper-determinist would think, but indeterminate. This is not only a condition for the possibility of science (in a hyper-deterministic world, one could not know anything before one knew everything, and science would also not be necessary because one could not do anything), but also for practical human activity: the macro outcomes of human practice are largely independent of micro detail (the postcard arrives whether the address is written in cursive or in printed letters, etc.). Games of chance are situations which deliberately project this micro indeterminacy into the macro world: the micro influences cancel each other out without one enduring influence taking over (as would be the case if the die were not perfectly symmetric and balanced) or deliberate human corrective activity stepping into the void (as a card trickster might do if the cards being shuffled somehow were distinguishable from the backside).

The experiment in which one draws balls from urns shows clearly another aspect of this paradigm: the set of different possible outcomes is fixed beforehand, and the probability enters in the choice of one of these predetermined outcomes. This is not the only way probability can arise; it is an extensionalist example, in which the connection between success and failure is external. The world is not a collection of externally related outcomes collected in an urn. Success and failure are not determined by a choice between different spatially separated and individually inert balls (or playing cards or faces on a die), but are the outcome of development and struggle that is internal to the individual unit.

• Quality control in production: you take a sample from a shipment, count how many defectives. Why are statistics and probability useful in production? Because production is work, it is not spontaneous. Nature does not voluntarily give us things in the form in which we need them. Production is similar to a scientific experiment because it is the attempt to create local closure. Such closure can never be complete, there are always leaks in it, through which irregularity enters.

• Actuarial Problems: the length of life anticipated for a person who has just applied for life insurance. Not only production, but also life itself is a struggle with physical nature, it is emergence. And sometimes it fails: sometimes the living organism is overwhelmed by the forces which it tries to keep at bay and to subject to its own purposes.

• Scientific Experiments: you count the number of mice which contract cancer when a group of mice is exposed to cigarette smoke: There is local closure regarding the conditions under which the mice live, but even if this closure were complete, individual mice would still react differently, because of genetic differences. No two mice are exactly the same, and despite these differences they are still mice. This is again the stratification of reality. Two mice are two different individuals but they are both mice. Their reaction to the smoke is not identical, since they are different individuals, but it is not completely capricious either, since both are mice. It can be predicted probabilistically. Those mechanisms which make them mice react to the smoke. The probabilistic regularity comes from the transfactual efficacy of the mouse organisms.

• Meteorology: the rainfall in a given month. It is very fortunate for the development of life on our planet that we have the chaotic alternation between cloud cover and clear sky, instead of a continuous cloud cover as on Venus or a continuous clear sky. Butterfly effect all over again, but it is possible to make probabilistic predictions since the fundamentals remain stable: the transfactual efficacy of the energy received from the sun and radiated back out into space.

• Markets: the total personal income in New York State in a given month. Market economies are very much like the weather; planned economies would be more like production or life.

• Uncertainty: the exact date of Noah’s birth. This is epistemic uncertainty: assuming that Noah was a real person, the date exists and we know a time range in which it must have been, but we do not know the details. Probabilistic methods can be used to represent this kind of uncertain knowledge, but other methods to represent this knowledge may be more appropriate.

• Indeterminacy: The closing of the Dow Jones Industrial Average (DJIA) or the temperature in New York City at 4 pm on February 28, 2014: This is ontological uncertainty, not only epistemological uncertainty. Not only do we not know it, but it is objectively not yet decided what these data will be. Probability theory has limited applicability for the DJIA since it cannot be expected that the mechanisms determining the DJIA will be the same at that time, therefore we cannot base ourselves on the transfactual efficacy of some stable mechanisms. It is not known which stocks will be included in the DJIA at that time, or whether the US dollar will still be the world reserve currency and the New York stock exchange the pinnacle of international capital markets. Perhaps a different stock market index located somewhere else will at that time play the role the DJIA is playing today. We would not even be able to ask questions about that alternative index today.

Regarding the temperature, it is more defensible to assign a probability, since the weather mechanisms will probably have stayed the same, except for changes due to global warming (unless mankind has learned by that time to manipulate the weather locally by cloud seeding etc.).

• Chaotic determinacy: the relative frequency of the digit 3 in the decimal representation of π: The laws by which the number π is defined have very little to do with the procedure by which numbers are expanded as decimals, therefore the former has no systematic influence on the latter. (It has an influence, but not a systematic one; it is the error of actualism to think that every influence must be systematic.) But it is also known that laws can have remote effects: one of the most amazing theorems in mathematics is the formula $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$, which establishes a connection between the geometry of the circle and some simple arithmetic.

• Quantum mechanics: the proportion of photons absorbed by a polarization filter: If these photons are already polarized (but in a different direction than the filter) then this is not epistemic uncertainty but ontological indeterminacy, since the polarized photons form a pure state, which is atomic in the algebra of events. In this case, the distinction between epistemic uncertainty and ontological indeterminacy is operational: the two alternatives follow different mathematics.

• Statistical mechanics: the velocity distribution of molecules in a gas at a given pressure and temperature. Thermodynamics cannot be reduced to the mechanics of molecules, since mechanics is reversible in time, while thermodynamics is not. An additional element is needed, which can be modeled using probability.
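The series for π/4 mentioned in the item on chaotic determinacy above (the Leibniz series, with odd denominators 1, 3, 5, 7, ...) can be checked numerically. The following is only a minimal Python sketch; the function name `leibniz_partial_sum` and the chosen numbers of terms are illustrative assumptions, not part of the notes.

```python
import math


def leibniz_partial_sum(n_terms: int) -> float:
    """Sum of the first n_terms terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


# Compare 4 times the partial sum with math.pi for a growing number of terms.
for n in (10, 1_000, 100_000):
    approx = 4 * leibniz_partial_sum(n)
    print(n, approx, abs(approx - math.pi))
```

The convergence is slow: since the series is alternating with decreasing terms, the error of the approximation to π after n terms is bounded by 4/(2n + 1).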

Problem 2. Not every kind of uncertainty can be formulated stochastically. Which other methods are available if stochastic means are inappropriate?

Pure uncertainty is as hard to generate as pure certainty; it is needed for encryption and numerical methods.

Here is an encryption scheme which leads to a random looking sequence of numbers (see [Rao97, p. 13]): First a string of binary random digits is generated which is known only to the sender and receiver. The sender converts his message into a string of binary digits. He then places the message string below the key string and obtains a coded string by changing every message bit to its alternative at all places where the key bit is 1 and leaving the others unchanged. The coded string, which appears to be a random binary sequence, is transmitted. The received message is decoded by making the changes in the same way as in encrypting, using the key string which is known to the receiver.
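The scheme just described amounts to a bitwise exclusive-or of message and key. The following Python sketch illustrates it under stated assumptions: the helper names (`text_to_bits`, `bits_to_text`, `flip_where_key_is_one`), the UTF-8 encoding of the message, the sample message, and the use of the standard `secrets` module to draw the key bits are all illustrative choices, not taken from the notes or from [Rao97].

```python
import secrets


def text_to_bits(message: str) -> list[int]:
    """Convert a text message into a list of bits (UTF-8, most significant bit first)."""
    data = message.encode("utf-8")
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_text(bits: list[int]) -> str:
    """Inverse of text_to_bits: pack groups of 8 bits back into bytes and decode."""
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")


def flip_where_key_is_one(bits: list[int], key: list[int]) -> list[int]:
    """Change every bit to its alternative wherever the key bit is 1 (bitwise XOR)."""
    if len(bits) != len(key):
        raise ValueError("key must have the same length as the bit string")
    return [b ^ k for b, k in zip(bits, key)]


message = "meet at noon"                            # hypothetical message
message_bits = text_to_bits(message)
key = [secrets.randbits(1) for _ in message_bits]   # shared secret key string
coded = flip_where_key_is_one(message_bits, key)    # what is transmitted
decoded = flip_where_key_is_one(coded, key)         # receiver applies the same flips
print(bits_to_text(decoded))
```

Because flipping a bit twice restores it, the same function serves for coding and decoding, which is what the description means by "making the changes in the same way."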

Problem 4. Why is it important in the above encryption scheme that the key string is purely random and does not have any regularities?

Problem 5. [Knu81, pp. 7, 452] Suppose you wish to obtain a decimal digit at random, not using a computer. Which of the following methods would be suitable?

• a. Open a telephone directory to a random place (i.e., stick your finger in it somewhere) and use the units digit of the first number found on the selected page.

Answer. This will often fail, since users select “round” numbers if possible. In some areas, telephone numbers are perhaps assigned randomly. But it is a mistake in any case to try to get several successive random numbers from the same page, since many telephone numbers are listed

• b. Same as a, but use the units digit of the page number.

Answer. But do you use the left-hand page or the right-hand page? Say, use the left-hand

• c. Roll a die which is in the shape of a regular icosahedron, whose twenty faces have been labeled with the digits 0, 0, 1, 1, …, 9, 9. Use the digit which appears on top, when the die comes to rest. (A felt table with a hard surface is recommended for rolling dice.)

Answer. The markings on the face will slightly bias the die, but for practical purposes this method is quite satisfactory. See Math. Comp. 15 (1961), 94–95, for further discussion of these

• d. Expose a geiger counter to a source of radioactivity for one minute (shielding yourself) and use the units digit of the resulting count. (Assume that the geiger counter displays the number of counts in decimal notation, and that the count is initially zero.)

Answer. This is a difficult question thrown in purposely as a surprise. The number is not uniformly distributed! One sees this best if one imagines the source of radioactivity is very low level, so that only a few emissions can be expected during this minute. If the average number of emissions per minute is λ, the probability that the counter registers k is $e^{-\lambda}\lambda^k/k!$ (the Poisson distribution). So the digit 0 is selected with probability $e^{-\lambda}\sum_{k=0}^{\infty}\lambda^{10k}/(10k)!$, etc.

• e. Glance at your wristwatch, and if the position of the second-hand is between 6n and 6(n + 1), choose the digit n.

Answer. Okay, provided that the time since the last digit selected in this way is random. A bias may arise if borderline cases are not treated carefully. A better device seems to be to use a stopwatch which has been started long ago, and which one stops arbitrarily, and then one has all

• f. Ask a friend to think of a random digit, and use the digit he names.

Answer. No, people usually think of certain digits (like 7) with higher probability.

• g. Assume 10 horses are entered in a race and you know nothing whatever about their qualifications. Assign to these horses the digits 0 to 9, in arbitrary fashion, and after the race use the winner’s digit.

Answer. Okay; your assignment of numbers to the horses had probability 1/10 of assigning a
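The answer to part d of Problem 5 can be illustrated numerically. The Python sketch below adds up the Poisson probabilities over all counts k that share the same units digit. The function names, the truncation point `max_count` (adequate only for moderate rates), and the sample values of the rate are illustrative assumptions, not part of the notes.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability exp(-lam) * lam**k / k!, computed in log space (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def unit_digit_probabilities(lam: float, max_count: int = 2000) -> list[float]:
    """Probability of each units digit 0..9 of a Poisson(lam) count.

    The infinite sum is truncated at max_count, an assumption that is
    harmless as long as lam is far below max_count.
    """
    probs = [0.0] * 10
    for k in range(max_count + 1):
        probs[k % 10] += poisson_pmf(k, lam)
    return probs


# A weak source (small lam) gives a very uneven distribution of the units digit;
# a strong source gives one that is close to uniform.
for lam in (0.5, 2.0, 100.0):
    print(lam, [round(p, 4) for p in unit_digit_probabilities(lam)])
```

For a low-level source the digits 0 and 1 dominate, which is the point of the answer; the bias only fades when the expected count is large.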

2.2 Events as Sets

With every situation with uncertain outcome we associate its sample space U, which represents the set of all possible outcomes (described by the characteristics which we are interested in).

Events are associated with subsets of the sample space, i.e., with bundles of outcomes that are observable in the given experimental setup. The set of all events we denote with F. (F is a set of subsets of U.)

Look at the example of rolling a die. U = {1, 2, 3, 4, 5, 6}. The event of getting an even number is associated with the subset {2, 4, 6}; getting a six with {6}; not getting a six with {1, 2, 3, 4, 5}, etc. Now look at the example of rolling two indistinguishable dice. Observable events may be: getting two ones, getting a one and a two, etc. But we cannot distinguish between the first die getting a one and the second a two, and vice versa. I.e., if we define the sample set to be U = {1, …, 6} × {1, …, 6}, i.e., the set of all pairs of numbers between 1 and 6, then certain subsets are not observable. {(1, 5)} is not observable (unless the dice are marked or have different colors etc.), only {(1, 5), (5, 1)} is observable.
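The example of the two indistinguishable dice can be made concrete by enumeration. The following Python sketch builds the sample set U of ordered pairs and the smallest observable events, each of which contains a pair together with its mirror image. The names `sample_space`, `observable_event`, and `atoms` are illustrative assumptions.

```python
from itertools import product

faces = range(1, 7)

# Sample set U = {1, ..., 6} x {1, ..., 6}: all ordered pairs of faces.
sample_space = set(product(faces, faces))


def observable_event(a: int, b: int) -> frozenset:
    """Smallest observable event containing the outcome (a, b).

    With indistinguishable dice, (a, b) and (b, a) cannot be told apart,
    so both outcomes must belong to the same event.
    """
    return frozenset({(a, b), (b, a)})


# The smallest observable events ("atoms") partition the sample set.
atoms = {observable_event(a, b) for (a, b) in sample_space}

print(len(sample_space))                      # 36 ordered pairs
print(len(atoms))                             # 21 distinguishable results
print(frozenset({(1, 5)}) in atoms)           # False: {(1, 5)} is not observable
print(frozenset({(1, 5), (5, 1)}) in atoms)   # True
```

Every observable event in this setup is a union of such atoms: 6 doubles plus 15 unordered pairs of different faces.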

If the experiment is measuring the height of a person in meters, and we make the idealized assumption that the measuring instrument is infinitely accurate, then all possible outcomes are numbers between 0 and 3, say. One is usually interested in whether the height falls within a given interval; therefore all intervals within the given range represent observable events.

If the sample space is finite or countably infinite, very often all subsets are observable events. If the sample set contains an uncountable continuum, it is not desirable to consider all subsets as observable events. Mathematically one can define quite crazy subsets which have no practical significance and which cannot be meaningfully given probabilities. For the purposes of Econ 7800, it is enough to say that all the subsets which we may reasonably define are candidates for observable events.

The “set of all possible outcomes” is well defined in the case of rolling a die and other games; but in social sciences, situations arise in which the outcome is open and the range of possible outcomes cannot be known beforehand. If one uses a probability theory based on the concept of a “set of possible outcomes” in such a situation, one reduces a process which is open and evolutionary to an imaginary predetermined and static “set.” Furthermore, in social theory, the mechanisms by which these uncertain outcomes are generated are often internal to the members of the statistical population. The mathematical framework models these mechanisms as an extraneous “picking an element out of a pre-existing set.”

From given observable events we can derive new observable events by set theoretical operations. (All the operations below involve subsets of the same U.)

Mathematical Note: Notation of sets: there are two ways to denote a set: either by giving a rule, or by listing the elements. (The order in which the elements are listed, or the fact whether some elements are listed twice or not, is irrelevant.)

Here are the formal definitions of set theoretic operations. The letters A, B, etc. denote subsets of a given set U (events), and I is an arbitrary index set. ω stands
