Introduction to Mathematical Statistics - Hogg & McKean & Craig



Introduction

to Mathematical Statistics

University of Iowa

Pearson Education International


If you purchased this book within the United States or Canada, you should be aware that it has been wrongly imported without the approval of the Publisher or the Author.

Executive Acquisitions Editor: George Lobell
Executive Editor-in-Chief: Sally Yagan
Vice President/Director of Production and Manufacturing: David W. Riccardi
Production Editor: Bayani Mendoza de Leon
Senior Managing Editor: Linda Mihatov Behrens
Executive Managing Editor: Kathleen Schiaparelli
Assistant Manufacturing Manager/Buyer: Michael Bell
Manufacturing Manager: Trudy Pisciotti
Marketing Manager: Halee Dinsey
Marketing Assistant: Rachael Beckman
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Art Editor: Thomas Benfatti
Editorial Assistant: Jennifer Brody
Cover Image: Tun shell (Tonna galea): David Roberts/Science Photo Library/Photo Researchers, Inc.

© 2005, 1995, 1978, 1970, 1965, 1958 Pearson Education, Inc.
Pearson Prentice Hall
Pearson Education, Inc.
Upper Saddle River, NJ 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Pearson Prentice Hall® is a trademark of Pearson Education, Inc.

Printed in the United States of America

10 9 8 7 6 5 4 3

ISBN: 0-13-122605-3

Pearson Education, Ltd., London
Pearson Education Australia PTY Limited, Sydney
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Education de Mexico, S.A. de C.V.
Pearson Education - Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Upper Saddle River, New Jersey


To Ann and to Marge


1.3 The Probability Set Function

1.4 Conditional Probability and Independence

1.8 Expectation of a Random Variable

1.9 Some Special Expectations

1.10 Important Inequalities

2 Multivariate Distributions

2.1 Distributions of Two Random Variables

2.1.1 Expectation

2.2 Transformations: Bivariate Random Variables

2.3 Conditional Distributions and Expectations

2.4 The Correlation Coefficient

2.5 Independent Random Variables

2.6 Extension to Several Random Variables

2.6.1 *Variance-Covariance

2.7 Transformations: Random Vectors

3 Some Special Distributions

3.1 The Binomial and Related Distributions

3.2 The Poisson Distribution

3.3 The Γ, χ², and β Distributions

3.4 The Normal Distribution


4.3.3 Moment Generating Function Technique

4.4 Central Limit Theorem

4.5 * Asymptotics for Multivariate Distributions

5 Some Elementary Statistical Inferences

5.1 Sampling and Statistics

5.2 Order Statistics

5.2.1 Quantiles

5.2.2 Confidence Intervals of Quantiles

5.3 *Tolerance Limits for Distributions

5.4 More on Confidence Intervals

5.4.1 Confidence Intervals for Differences in Means

5.4.2 Confidence Interval for Difference in Proportions

5.5 Introduction to Hypothesis Testing

5.6 Additional Comments About Statistical Tests

5.7 Chi-Square Tests

5.8 The Method of Monte Carlo

5.8.1 Accept-Reject Generation Algorithm

5.9 Bootstrap Procedures

5.9.1 Percentile Bootstrap Confidence Intervals

5.9.2 Bootstrap Testing Procedures

6 Maximum Likelihood Methods

6.1 Maximum Likelihood Estimation

6.2 Rao-Cramer Lower Bound and Efficiency

6.3 Maximum Likelihood Tests

6.4 Multiparameter Case: Estimation

6.5 Multiparameter Case: Testing


7 Sufficiency

7.1 Measures of Quality of Estimators

7.2 A Sufficient Statistic for a Parameter

7.3 Properties of a Sufficient Statistic

7.4 Completeness and Uniqueness

7.5 The Exponential Class of Distributions

7.6 Functions of a Parameter

7.7 The Case of Several Parameters

7.8 Minimal Sufficiency and Ancillary Statistics

7.9 Sufficiency, Completeness and Independence

8 Optimal Tests of Hypotheses

8.1 Most Powerful Tests

8.2 Uniformly Most Powerful Tests

8.3 Likelihood Ratio Tests

8.4 The Sequential Probability Ratio Test

8.5 Minimax and Classification Procedures

9.8 The Distributions of Certain Quadratic Forms

9.9 The Independence of Certain Quadratic Forms

10 Nonparametric Statistics

10.1 Location Models

10.2 Sample Median and Sign Test

10.2.1 Asymptotic Relative Efficiency

10.2.2 Estimating Equations Based on Sign Test

10.2.3 Confidence Interval for the Median


10.5.1 Efficacy

10.5.2 Estimating Equations Based on General Scores

10.5.3 Optimization: Best Estimates

11.2.1 Prior and Posterior Distributions

11.2.2 Bayesian Point Estimation

11.2.3 Bayesian Interval Estimation

11.2.4 Bayesian Testing Procedures

11.2.5 Bayesian Sequential Procedures

11.3 More Bayesian Terminology and Ideas

12.5.1 Distribution Theory for the LS Test for Normal Errors

12.5.3 Examples


Preface

Since Allen T. Craig's death in 1978, Bob Hogg has revised the later editions of this text. However, when Prentice Hall asked him to consider a sixth edition, he thought of his good friend, Joe McKean, and asked him to help. That was a great choice, for Joe made many excellent suggestions on which we both could agree, and these changes are outlined later in this preface.

In addition to Joe's ideas, our colleague Jon Cryer gave us his marked-up copy of the fifth edition, from which we changed a number of items. Moreover, George Woodworth and Kate Cowles made a number of suggestions concerning the new Bayesian chapter; in particular, Woodworth taught us about a "Dutch book" used in many Bayesian proofs. Of course, in addition to these three, we must thank others, both faculty and students, who have made worthwhile suggestions. However, our greatest debts of gratitude are for our special friend, Tom Hettmansperger of Penn State University, who used our revised notes in his mathematical statistics course during the 2002-2004 academic years, and Suzanne Dubnicka of Kansas State University, who used our notes in her mathematical statistics course during Fall of 2003. From these experiences, Tom and Suzanne and several of their students provided us with new ideas and corrections.

While in earlier editions Hogg and Craig had resisted adding any "real" problems, Joe did insert a few among his more important changes. While the level of the book is aimed at beginning graduate students in statistics, it is also suitable for senior undergraduate mathematics, statistics, and actuarial science majors. The major differences between this edition and the fifth edition are:

• It is easier to find various items because more definitions, equations, and theorems are given by chapter, section, and display numbers. Moreover, many theorems, definitions, and examples are given names in boldfaced type for easier reference.

• Many of the distribution-finding techniques, such as transformations and moment generating methods, are in the first three chapters. The concepts of expectation and conditional expectation are treated more thoroughly in the first two chapters.

• Chapter 3 on special distributions now includes contaminated normal distributions, the multivariate normal distribution, the t- and F-distributions, and a section on mixture distributions.

• Maximum likelihood methods, Chapter 6, have been expanded. For illustration, the regularity conditions have been listed, which allows us to provide better proofs of a number of associated theorems, such as the limiting distributions of the maximum likelihood procedures. This forms a more complete inference for these important methods. The EM algorithm is discussed and is applied to several maximum likelihood situations.

• Chapters 7-9 contain material on sufficient statistics, optimal tests of hypotheses, and inferences about normal models.

• Chapters 10-12 contain new material. Chapter 10 presents nonparametric procedures for the location models and simple linear regression. It presents estimation and confidence intervals as well as testing. Sections on optimal scores and adaptive methods are presented. Chapter 11 offers an introduction to Bayesian methods. This includes traditional Bayesian procedures as well as Markov Chain Monte Carlo procedures, including the Gibbs sampler, for hierarchical and empirical Bayes procedures. Chapter 12 offers a comparison of robust and traditional least squares methods for linear models. It introduces the concepts of influence functions and breakdown points for estimators. Not every instructor will include these new chapters in a two-semester course, but those interested in one of these areas will find their inclusion very worthwhile. These last three chapters are independent of one another.

• We have occasionally made use of the statistical softwares R (Ihaka and Gentleman, 1996) and S-PLUS (S-PLUS, 2000) in this edition; see Venables and Ripley (2002). Students do not need recourse to these packages to use the text, but the use of one (or that of another package) does add a computational flavor. The package R is freeware which can be downloaded for free at the site

http://lib.stat.cmu.edu/R/CRAN/


There are versions of R for Unix, PC, and Mac platforms. We have written some R functions for several procedures in the text. These we have listed in Appendix B, but they can also be downloaded at the site

http://www.stat.wmich.edu/mckean/HMC/Rcode

These functions will run in S-PLUS also.

• The reference list has been expanded so that instructors and students can more easily find the original sources.

• The order of presentation has been greatly improved and more exercises have been added. As a matter of fact, there are now over one thousand exercises and, further, many new examples have been added.

Most instructors will find selections from the first nine chapters sufficient for a two-semester course. However, we hope that many will want to insert one of the three topic chapters into their course. As a matter of fact, there is really enough material for a three-semester sequence, which at one time we taught at the University of Iowa. A few optional sections have been marked with an asterisk.

We would like to thank the following reviewers who read through earlier versions of the manuscript: Walter Freiberger, Brown University; John Leahy, University of Oregon; Bradford Crain, Portland State University; Joseph S. Verducci, Ohio State University; and Hosam M. Mahmoud, George Washington University. Their suggestions were helpful in editing the final version.

Finally, we would like to thank George Lobell and Prentice Hall, who provided funds to have the fifth edition converted to LaTeX2ε, and Kimberly Crimin, who carried out this work. It certainly helped us in writing the sixth edition in LaTeX2ε. Also, a special thanks to Ash Abebe for technical assistance. Last, but not least, we must thank our wives, Ann and Marge, who provided great support for our efforts. Let's hope the readers approve of the results.

Bob Hogg

Joe McKean

joe@stat.wmich.edu


a drug that is to be administered; or an economist may be concerned with the prices of three specified commodities at various time intervals; or the agronomist may wish to study the effect that a chemical fertilizer has on the yield of a cereal grain. The only way in which an investigator can elicit information about any such phenomenon is to perform the experiment. Each experiment terminates with an outcome. But it is characteristic of these experiments that the outcome cannot be predicted with certainty prior to the performance of the experiment.

Suppose that we have such an experiment, the outcome of which cannot be predicted with certainty, but the experiment is of such a nature that a collection of every possible outcome can be described prior to its performance. If this kind of experiment can be repeated under the same conditions, it is called a random experiment, and the collection of every possible outcome is called the experimental space or the sample space.

Example 1.1.1. In the toss of a coin, let the outcome tails be denoted by T and let the outcome heads be denoted by H. If we assume that the coin may be repeatedly tossed under the same conditions, then the toss of this coin is an example of a random experiment in which the outcome is one of the two symbols T and H; that is, the sample space is the collection of these two symbols. •

Example 1.1.2. In the cast of one red die and one white die, let the outcome be the ordered pair (number of spots up on the red die, number of spots up on the white die). If we assume that these two dice may be repeatedly cast under the same conditions, then the cast of this pair of dice is a random experiment. The sample space consists of the 36 ordered pairs: (1, 1), …, (1, 6), (2, 1), …, (2, 6), …, (6, 6). •

Let 𝒞 denote a sample space, let c denote an element of 𝒞, and let C represent a collection of elements of 𝒞. If, upon the performance of the experiment, the outcome is in C, we shall say that the event C has occurred. Now conceive of our having made N repeated performances of the random experiment. Then we can count the number f of times (the frequency) that the event C actually occurred throughout the N performances. The ratio f/N is called the relative frequency of the event C in these N experiments. A relative frequency is usually quite erratic for small values of N, as you can discover by tossing a coin. But as N increases, experience indicates that we associate with the event C a number, say p, that is equal or approximately equal to that number about which the relative frequency seems to stabilize. If we do this, then the number p can be interpreted as that number which, in future performances of the experiment, the relative frequency of the event C will either equal or approximate. Thus, although we cannot predict the outcome of a random experiment, we can, for a large value of N, predict approximately the relative frequency with which the outcome will be in C. The number p associated with the event C is given various names. Sometimes it is called the probability that the outcome of the random experiment is in C; sometimes it is called the probability of the event C; and sometimes it is called the probability measure of C. The context usually suggests an appropriate choice of terminology.

Example 1.1.3. Let 𝒞 denote the sample space of Example 1.1.2 and let C be the collection of every ordered pair of 𝒞 for which the sum of the pair is equal to seven. Thus C is the collection (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1). Suppose that the dice are cast N = 400 times and let f, the frequency of a sum of seven, be f = 60. Then the relative frequency with which the outcome was in C is f/N = 60/400 = 0.15. Thus we might associate with C a number p that is close to 0.15, and p would be called the probability of the event C. •
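The stabilizing behavior of the relative frequency f/N can be sketched with a short simulation. The event (a sum of seven, as in Example 1.1.3) comes from the text; the sample sizes and fixed seed below are illustrative choices, not part of the original.

```python
import random

random.seed(1)  # fixed seed so repeated runs give the same frequencies

def relative_frequency(n_casts):
    """Relative frequency f/N of the event 'the two dice sum to seven'."""
    f = sum(1 for _ in range(n_casts)
            if random.randint(1, 6) + random.randint(1, 6) == 7)
    return f / n_casts

for n in (100, 10_000, 100_000):
    # As N increases, f/N stabilizes near 6/36, about 0.167
    print(n, relative_frequency(n))
```

For a fair pair of dice there are six favorable pairs out of 36, so the printed frequencies settle near 1/6; this is the number p that the relative frequency approach attaches to the event.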

Remark 1.1.1. The preceding interpretation of probability is sometimes referred to as the relative frequency approach, and it obviously depends upon the fact that an experiment can be repeated under essentially identical conditions. However, many persons extend probability to other situations by treating it as a rational measure of belief. For example, the statement p = 2/5 would mean to them that their personal or subjective probability of the event C is equal to 2/5. Hence, if they are not opposed to gambling, this could be interpreted as a willingness on their part to bet on the outcome of C so that the two possible payoffs are in the ratio p/(1 − p) = (2/5)/(3/5) = 2/3. Moreover, if they truly believe that p = 2/5 is correct, they would be willing to accept either side of the bet: (a) win 3 units if C occurs and lose 2 if it does not occur, or (b) win 2 units if C does not occur and lose 3 if it does. However, since the mathematical properties of probability given in Section 1.3 are consistent with either of these interpretations, the subsequent mathematical development does not depend upon which approach is used. •
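The fairness of both sides of such a bet is a one-line arithmetic check. The sketch below (illustrative, not part of the text) takes p = 2/5, the value consistent with the 3-to-2 payoffs described in the remark, and shows each side has expected payoff zero.

```python
from fractions import Fraction  # exact rational arithmetic avoids float error

p = Fraction(2, 5)  # subjective probability of the event C

# (a) win 3 units if C occurs, lose 2 units if it does not
bet_a = 3 * p - 2 * (1 - p)
# (b) win 2 units if C does not occur, lose 3 units if it does
bet_b = 2 * (1 - p) - 3 * p

print(bet_a, bet_b)  # both 0: either side of the bet is fair under p = 2/5
```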

The primary purpose of having a mathematical theory of statistics is to provide mathematical models for random experiments. Once a model for such an experiment has been provided and the theory worked out in detail, the statistician may, within this framework, make inferences (that is, draw conclusions) about the random experiment. The construction of such a model requires a theory of probability. One of the more logically satisfying theories of probability is that based on the concepts of sets and functions of sets. These concepts are introduced in Section 1.2.

1.2 Set Theory

The concept of a set or a collection of objects is usually left undefined. However, a particular set can be described so that there is no misunderstanding as to what collection of objects is under consideration. For example, the set of the first 10 positive integers is sufficiently well described to make clear that the numbers 3/4 and 14 are not in the set, while the number 3 is in the set. If an object belongs to a set, it is said to be an element of the set. For example, if C denotes the set of real numbers x for which 0 ≤ x ≤ 1, then 3/4 is an element of the set C. The fact that 3/4 is an element of the set C is indicated by writing 3/4 ∈ C. More generally, c ∈ C means that c is an element of the set C.

The sets that concern us will frequently be sets of numbers. However, the language of sets of points proves somewhat more convenient than that of sets of numbers. Accordingly, we briefly indicate how we use this terminology. In analytic geometry considerable emphasis is placed on the fact that to each point on a line (on which an origin and a unit point have been selected) there corresponds one and only one number, say x; and that to each number x there corresponds one and only one point on the line. This one-to-one correspondence between the numbers and points on a line enables us to speak, without misunderstanding, of the "point x" instead of the "number x." Furthermore, with a plane rectangular coordinate system and with x and y numbers, to each symbol (x, y) there corresponds one and only one point in the plane; and to each point in the plane there corresponds but one such symbol. Here again, we may speak of the "point (x, y)," meaning the "ordered number pair x and y." This convenient language can be used when we have a rectangular coordinate system in a space of three or more dimensions. Thus the "point (x1, x2, …, xn)" means the numbers x1, x2, …, xn in the order stated. Accordingly, in describing our sets, we frequently speak of a set of points (a set whose elements are points), being careful, of course, to describe the set so as to avoid any ambiguity. The notation C = {x : 0 ≤ x ≤ 1} is read "C is the one-dimensional set of points x for which 0 ≤ x ≤ 1." Similarly, C = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} can be read "C is the two-dimensional set of points (x, y) that are interior to, or on the boundary of, a square with opposite vertices at (0, 0) and (1, 1)." We now give some definitions (together with illustrative examples) that lead to an elementary algebra of sets adequate for our purposes.

Definition 1.2.1. If each element of a set C1 is also an element of set C2, the set C1 is called a subset of the set C2. This is indicated by writing C1 ⊂ C2. If C1 ⊂ C2 and also C2 ⊂ C1, the two sets have the same elements, and this is indicated by writing C1 = C2.

Example 1.2.1. Let C1 = {x : 0 ≤ x ≤ 1} and C2 = {x : −1 ≤ x ≤ 2}. Here the one-dimensional set C1 is seen to be a subset of the one-dimensional set C2; that is, C1 ⊂ C2. Subsequently, when the dimensionality of the set is clear, we shall not make specific reference to it. •

Example 1.2.2. Define the two sets C1 = {(x, y) : 0 ≤ x = y ≤ 1} and C2 = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Because the elements of C1 are the points on one diagonal of the square, then C1 ⊂ C2. •


Definition 1.2.2. If a set C has no elements, C is called the null set. This is indicated by writing C = φ.

Definition 1.2.3. The set of all elements that belong to at least one of the sets C1 and C2 is called the union of C1 and C2. The union of C1 and C2 is indicated by writing C1 ∪ C2. The union of several sets C1, C2, C3, … is the set of all elements that belong to at least one of the several sets, denoted by C1 ∪ C2 ∪ C3 ∪ ⋯ or by C1 ∪ C2 ∪ ⋯ ∪ Ck if a finite number k of sets is involved.

Example 1.2.3. Define the sets C1 = {x : x = 8, 9, 10, 11, or 11 < x ≤ 12} and C2 = {x : x = 0, 1, …, 10}. Then

C1 ∪ C2 = {x : x = 0, 1, …, 8, 9, 10, 11, or 11 < x ≤ 12}
= {x : x = 0, 1, …, 8, 9, 10 or 11 ≤ x ≤ 12}. •

Example 1.2.4. Define C1 and C2 as in Example 1.2.1. Then C1 ∪ C2 = C2. •

Example 1.2.5. Let C2 = φ. Then C1 ∪ C2 = C1, for every set C1. •

Example 1.2.6. For every set C, C ∪ C = C. •

Example 1.2.7. Let

Ck = {x : 1/(k + 1) ≤ x ≤ 1}, k = 1, 2, 3, … .

Then C1 ∪ C2 ∪ C3 ∪ ⋯ = {x : 0 < x ≤ 1}. Note that the number zero is not in this set, since it is not in one of the sets C1, C2, C3, …. •

Definition 1.2.4. The set of all elements that belong to each of the sets C1 and C2 is called the intersection of C1 and C2. The intersection of C1 and C2 is indicated by writing C1 ∩ C2. The intersection of several sets C1, C2, C3, … is the set of all elements that belong to each of the sets C1, C2, C3, …. This intersection is denoted by C1 ∩ C2 ∩ C3 ∩ ⋯ or by C1 ∩ C2 ∩ ⋯ ∩ Ck if a finite number k of sets is involved.

Example 1.2.8. Let C1 = {(0, 0), (0, 1), (1, 1)} and C2 = {(1, 1), (1, 2), (2, 1)}. Then C1 ∩ C2 = {(1, 1)}. •

Example 1.2.9. Let C1 = {(x, y) : 0 ≤ x + y ≤ 1} and C2 = {(x, y) : 1 < x + y}. Then C1 and C2 have no points in common and C1 ∩ C2 = φ. •

Example 1.2.10. For every set C, C ∩ C = C and C ∩ φ = φ. •

Example 1.2.11. Let

Ck = {x : 0 < x < 1/k}, k = 1, 2, 3, … .

Then C1 ∩ C2 ∩ C3 ∩ ⋯ is the null set, since there is no point that belongs to each of the sets C1, C2, C3, …. •


Figure 1.2.1: (a) C1 ∪ C2 and (b) C1 ∩ C2

Example 1.2.12. Let C1 and C2 represent the sets of points enclosed, respectively, by two intersecting circles. Then the sets C1 ∪ C2 and C1 ∩ C2 are represented, respectively, by the shaded regions in the Venn diagrams in Figure 1.2.1. •

Example 1.2.13. Let C1, C2, and C3 represent the sets of points enclosed, respectively, by three intersecting circles. Then the sets (C1 ∪ C2) ∩ C3 and (C1 ∩ C2) ∪ C3 are depicted in Figure 1.2.2. •

Definition 1.2.5. In certain discussions or considerations, the totality of all elements that pertain to the discussion can be described. This set of all elements under consideration is given a special name. It is called the space. We shall often denote spaces by letters such as 𝒞 and 𝒟.

Example 1.2.14. Let the number of heads, in tossing a coin four times, be denoted by x. Of necessity, the number of heads will be one of the numbers 0, 1, 2, 3, 4. Here, then, the space is the set 𝒞 = {0, 1, 2, 3, 4}. •


Example 1.2.15. Consider all nondegenerate rectangles of base x and height y. To be meaningful, both x and y must be positive. Then the space is given by the set 𝒞 = {(x, y) : x > 0, y > 0}. •

Definition 1.2.6. Let 𝒞 denote a space and let C be a subset of the set 𝒞. The set that consists of all elements of 𝒞 that are not elements of C is called the complement of C (actually, with respect to 𝒞). The complement of C is denoted by C^c. In particular, 𝒞^c = φ.

Example 1.2.16. Let 𝒞 be defined as in Example 1.2.14, and let the set C = {0, 1}. The complement of C (with respect to 𝒞) is C^c = {2, 3, 4}. •

Example 1.2.17. Given C ⊂ 𝒞. Then C ∪ C^c = 𝒞, C ∩ C^c = φ, and C ∪ 𝒞 = 𝒞. The reader is asked to prove these in Exercise 1.2.4. •

In the calculus, functions such as

f(x) = 2x, 0 < x ≤ 1, zero elsewhere,

or

g(x, y) = e^(−x−y), 0 < x < ∞, 0 < y < ∞, zero elsewhere,

or possibly

h(x1, x2, …, xn) = 3x1x2⋯xn, 0 < xi ≤ 1, i = 1, 2, …, n, zero elsewhere,

are of common occurrence. The value of f(x) at the "point x = 1" is f(1) = 2; the value of g(x, y) at the "point (−1, 3)" is g(−1, 3) = 0; the value of h(x1, x2, …, xn) at the "point (1, 1, …, 1)" is 3. Functions such as these are called functions of a point or, more simply, point functions because they are evaluated (if they have a value) at a point in a space of indicated dimension.

There is no reason why, if they prove useful, we should not have functions that can be evaluated, not necessarily at a point, but for an entire set of points. Such functions are naturally called functions of a set or, more simply, set functions. We shall give some examples of set functions and evaluate them for certain simple sets.

Example 1.2.19. Let C be a set in one-dimensional space and let Q(C) be equal to the number of points in C which correspond to positive integers. Then Q(C) is a function of the set C. Thus, if C = {x : 0 < x < 5}, then Q(C) = 4; if C = {−2, −1}, then Q(C) = 0; if C = {x : −∞ < x < 6}, then Q(C) = 5. •
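The set function of Example 1.2.19 can be sketched in code by representing a set C through its membership (indicator) function; capping the search at a finite bound is an implementation shortcut for the sketch, not part of the example.

```python
def Q(in_C, bound=1000):
    """Number of positive integers in C, where in_C(x) tests membership.

    The search is capped at `bound`, which suffices for the sets below.
    """
    return sum(1 for k in range(1, bound + 1) if in_C(k))

print(Q(lambda x: 0 < x < 5))       # 4: the integers 1, 2, 3, 4
print(Q(lambda x: x in (-2, -1)))   # 0: this C contains no positive integers
print(Q(lambda x: x < 6))           # 5: the integers 1, 2, 3, 4, 5
```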


At this point we introduce the following notations. The symbol

∫_C f(x) dx

will mean the ordinary (Riemann) integral of f(x) over a prescribed one-dimensional set C; the symbol

∫∫_C g(x, y) dx dy

will mean the Riemann integral of g(x, y) over a prescribed two-dimensional set C; and so on. To be sure, unless these sets C and these functions f(x) and g(x, y) are chosen with care, the integrals will frequently fail to exist. Similarly, the symbol

Σ_C f(x)

will mean the sum extended over all x ∈ C; the symbol

Σ Σ_C g(x, y)

will mean the sum extended over all (x, y) ∈ C; and so on.
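These set-function integrals can be evaluated concretely. As an illustration (the intervals echo the worked e^(−x) example that follows), take Q(C) = ∫_C e^(−x) dx over intervals, evaluated through the antiderivative −e^(−x); the computation also verifies the inclusion-exclusion identity Q(C1 ∪ C2) = Q(C1) + Q(C2) − Q(C1 ∩ C2).

```python
import math

def Q(a, b):
    """Q([a, b]) = integral of e^(-x) over [a, b] = e^(-a) - e^(-b)."""
    return math.exp(-a) - math.exp(-b)

# C1 = [0, 2] and C2 = [1, 3], so C1 ∪ C2 = [0, 3] and C1 ∩ C2 = [1, 2]
lhs = Q(0, 3)
rhs = Q(0, 2) + Q(1, 3) - Q(1, 2)
print(lhs, rhs)  # equal: Q(C1 ∪ C2) = Q(C1) + Q(C2) - Q(C1 ∩ C2)
assert abs(lhs - rhs) < 1e-12
```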

Example 1.2.22. Let C be a set in one-dimensional space and let Q(C) = Σ_C f(x).


If C = C1 ∪ C2, where C1 = {x : 0 ≤ x ≤ 2} and C2 = {x : 1 ≤ x ≤ 3}, then

Q(C) = Q(C1 ∪ C2) = ∫₀³ e^(−x) dx
= ∫₀² e^(−x) dx + ∫₁³ e^(−x) dx − ∫₁² e^(−x) dx
= Q(C1) + Q(C2) − Q(C1 ∩ C2). •

Example 1.2.25. Let C be a set in n-dimensional space.

EXERCISES

1.2.1. Find the union C1 ∪ C2 and the intersection C1 ∩ C2 of the two sets C1 and C2, where:

1.2.4. Referring to Example 1.2.18, verify DeMorgan's Laws (1.2.1) and (1.2.2) by using Venn diagrams and then prove that the laws are true. Generalize the laws to arbitrary unions and intersections.

1.2.5. By the use of Venn diagrams, in which the space 𝒞 is the set of points enclosed by a rectangle containing the circles, compare the following sets. These laws are called the distributive laws.

(a) C1 ∩ (C2 ∪ C3) and (C1 ∩ C2) ∪ (C1 ∩ C3)

(b) C1 ∪ (C2 ∩ C3) and (C1 ∪ C2) ∩ (C1 ∪ C3)

1.2.6. If a sequence of sets C1, C2, C3, … is such that Ck ⊂ Ck+1, k = 1, 2, 3, …, the sequence is said to be a nondecreasing sequence. Give an example of this kind of sequence of sets.


1.2.9. If C1, C2, C3, … are sets such that Ck ⊃ Ck+1, k = 1, 2, 3, …, lim_{k→∞} Ck is defined as the intersection C1 ∩ C2 ∩ C3 ∩ ⋯. Find lim_{k→∞} Ck if:

Hint: Recall that Sn = a + ar + ⋯ + ar^(n−1) = a(1 − r^n)/(1 − r) and, hence, it follows that lim_{n→∞} Sn = a/(1 − r) provided that |r| < 1.

1.2.11. For every one-dimensional set C for which the integral exists, let Q(C) = ∫_C f(x) dx, where f(x) = 6x(1 − x), 0 < x < 1, zero elsewhere; otherwise, let Q(C) be undefined. If C1 = {x : 1/4 < x < 3/4}, C2 = {1/2}, and C3 = {x : 0 < x < 10}, find Q(C1), Q(C2), and Q(C3).

1.2.12. For every two-dimensional set C contained in R² for which the integral exists, let Q(C) = ∫∫_C (x² + y²) dx dy. If C1 = {(x, y) : −1 ≤ x ≤ 1, −1 ≤ y ≤ 1}, C2 = {(x, y) : −1 ≤ x = y ≤ 1}, and C3 = {(x, y) : x² + y² ≤ 1}, find Q(C1), Q(C2), and Q(C3).

1.2.13. Let 𝒞 denote the set of points that are interior to, or on the boundary of, a square with opposite vertices at the points (0, 0) and (1, 1). Let Q(C) = ∫∫_C dy dx.

(a) If C ⊂ 𝒞 is the set {(x, y) : 0 < x < y < 1}, compute Q(C).

(b) If C ⊂ 𝒞 is the set {(x, y) : 0 < x = y < 1}, compute Q(C).

(c) If C ⊂ 𝒞 is the set {(x, y) : 0 < x/2 ≤ y ≤ 3x/2 < 1}, compute Q(C).

1.2.14. Let 𝒞 be the set of points interior to or on the boundary of a cube with edge of length 1. Moreover, say that the cube is in the first octant with one vertex at the point (0, 0, 0) and an opposite vertex at the point (1, 1, 1). Let Q(C) = ∫∫∫_C dx dy dz.

(a) If C ⊂ 𝒞 is the set {(x, y, z) : 0 < x < y < z < 1}, compute Q(C).

(b) If C is the subset {(x, y, z) : 0 < x = y = z < 1}, compute Q(C).

1.2.15. Let C denote the set {(x, y, z) : x² + y² + z² ≤ 1}. Evaluate

Q(C) = ∫∫∫_C √(x² + y² + z²) dx dy dz.

Hint: Use spherical coordinates.

1.2.16. To join a certain club, a person must be either a statistician or a mathematician or both. Of the 25 members in this club, 19 are statisticians and 16 are mathematicians. How many persons in the club are both a statistician and a mathematician?


1.2.17. After a hard-fought football game, it was reported that, of the 11 starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a knee, and no one hurt all three. Comment on the accuracy of the report.

1.3 The Probability Set Function

Let 𝒞 denote the sample space. What should be our collection of events? As discussed in Section 1.2, we are interested in assigning probabilities to events, complements of events, and unions and intersections of events (i.e., compound events). Hence, we want our collection of events to include these combinations of events. Such a collection of events is called a σ-field of subsets of 𝒞, which is defined as follows.

Definition 1.3.1 (σ-Field). Let B be a collection of subsets of 𝒞. We say B is a σ-field if

(1) φ ∈ B (B is not empty);

(2) if C ∈ B, then C^c ∈ B (B is closed under complements);

(3) if the sequence of sets {C1, C2, …} is in B, then C1 ∪ C2 ∪ ⋯ ∈ B (B is closed under countable unions).

Note that by (1) and (2), a σ-field always contains φ and 𝒞. By (2) and (3), it follows from DeMorgan's laws that a σ-field is closed under countable intersections, besides countable unions. This is what we need for our collection of events. To avoid confusion, please note the equivalence: let C ⊂ 𝒞. Then

the statement C is an event is equivalent to the statement C ∈ B.

We will use these expressions interchangeably in the text. Next, we present some examples of σ-fields.

1. Let 𝒞 be any set and let C ⊂ 𝒞. Then B = {C, C^c, φ, 𝒞} is a σ-field.

2. Let 𝒞 be any set and let B be the power set of 𝒞 (the collection of all subsets of 𝒞). Then B is a σ-field.


The σ-field B0 is often referred to as the Borel σ-field on the real line. As Exercise 1.3.21 shows, it contains not only the open intervals, but the closed and half-open intervals of real numbers. This is an important σ-field.
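The first example of a σ-field above can be checked mechanically on a small finite space; the particular space and event below are illustrative choices, not from the text. On a finite collection, checking pairwise unions suffices for the countable-union condition.

```python
# Space C = {1, 2, 3, 4}, an event C1, and B = {C1, C1^c, null set, C}
space = frozenset({1, 2, 3, 4})
C1 = frozenset({1, 2})
B = {C1, space - C1, frozenset(), space}

assert frozenset() in B                       # condition (1): contains the null set
assert all(space - c in B for c in B)         # condition (2): closed under complements
assert all(a | b in B for a in B for b in B)  # condition (3): closed under (finite) unions
print("B = {C1, C1^c, null, space} satisfies the sigma-field conditions")
```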

Now that we have a sample space, 𝒞, and our collection of events, B, we can define the third component in our probability space, namely a probability set function. In order to motivate its definition, we consider the relative frequency approach to probability.

Remark 1.3.1. The definition of probability consists of three axioms which we will motivate by the following three intuitive properties of relative frequency. Let C be an event. Suppose we repeat the experiment N times. Then the relative frequency of C is f_C = #{C}/N, where #{C} denotes the number of times C occurred in the N repetitions. Note that f_C ≥ 0 and f_C ≤ 1. These are the first two properties. For the third, suppose that C1 and C2 are disjoint events. Then f_{C1∪C2} = f_{C1} + f_{C2}. These three properties of relative frequencies form the axioms of a probability, except that the third axiom is in terms of countable unions. As with the axioms of probability, the readers should check that the theorems we prove below about probabilities agree with their intuition of relative frequency. •

Definition 1.3.2 (Probability). Let C be a sample space and let B be a σ-field on C. Let P be a real-valued function defined on B. Then P is a probability set function if P satisfies the following three conditions:

1. P(C) ≥ 0, for all C ∈ B.

2. P(C) = 1.

3. If {Cn} is a sequence of sets in B and Cm ∩ Cn = ∅ for all m ≠ n, then

P(C1 ∪ C2 ∪ C3 ∪ · · ·) = P(C1) + P(C2) + P(C3) + · · · .

A probability set function tells us how the probability is distributed over the set of events B. In this sense we speak of a distribution of probability. We will often drop the word set and refer to P as a probability function.

The following theorems give us some other properties of a probability set function. In the statement of each of these theorems, P(C) is taken, tacitly, to be a probability set function defined on a σ-field B of a sample space C.

Theorem 1.3.1. For each event C ∈ B, P(C) = 1 - P(Cᶜ).

Proof: We have C = C ∪ Cᶜ and C ∩ Cᶜ = ∅. Thus, from (2) and (3) of Definition 1.3.2, it follows that

1 = P(C) + P(Cᶜ),

which is the desired result. •

Theorem 1.3.2. The probability of the null set is zero; that is, P(∅) = 0.


Proof: In Theorem 1.3.1, take C = ∅ so that Cᶜ = C. Accordingly, we have

P(∅) = 1 - P(C) = 1 - 1 = 0

and the theorem is proved. •

Theorem 1.3.3. If C1 and C2 are events such that C1 ⊂ C2, then P(C1) ≤ P(C2).

Proof: Now C2 = C1 ∪ (C1ᶜ ∩ C2) and C1 ∩ (C1ᶜ ∩ C2) = ∅. Hence, from (3) of Definition 1.3.2,

P(C2) = P(C1) + P(C1ᶜ ∩ C2).

From (1) of Definition 1.3.2, P(C1ᶜ ∩ C2) ≥ 0. Hence, P(C2) ≥ P(C1). •

Theorem 1.3.4. For each C ∈ B, 0 ≤ P(C) ≤ 1.

Proof: Since ∅ ⊂ C ⊂ C, we have by Theorem 1.3.3 that

P(∅) ≤ P(C) ≤ P(C) or 0 ≤ P(C) ≤ 1,

the desired result. •

Part (3) of the definition of probability says that P(C1 ∪ C2) = P(C1) + P(C2) if C1 and C2 are disjoint, i.e., C1 ∩ C2 = ∅. The next theorem gives the rule for any two events.

Theorem 1.3.5. If C1 and C2 are events in C, then

P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2).

Proof: Each of the sets C1 ∪ C2 and C2 can be represented, respectively, as a union of nonintersecting sets as follows:

C1 ∪ C2 = C1 ∪ (C1ᶜ ∩ C2) and C2 = (C1 ∩ C2) ∪ (C1ᶜ ∩ C2).

Thus, from (3) of Definition 1.3.2,

P(C1 ∪ C2) = P(C1) + P(C1ᶜ ∩ C2)

and

P(C2) = P(C1 ∩ C2) + P(C1ᶜ ∩ C2).

If the second of these equations is solved for P(C1ᶜ ∩ C2) and this result is substituted in the first equation, we obtain

P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2).

This completes the proof. •


Remark 1.3.2 (Inclusion-Exclusion Formula). It is easy to show (Exercise 1.3.9) that

P(C1 ∪ C2 ∪ C3) = p1 - p2 + p3,

where

p1 = P(C1) + P(C2) + P(C3)
p2 = P(C1 ∩ C2) + P(C1 ∩ C3) + P(C2 ∩ C3)
p3 = P(C1 ∩ C2 ∩ C3).

This can be generalized to the inclusion-exclusion formula for k events,

P(C1 ∪ C2 ∪ · · · ∪ Ck) = p1 - p2 + p3 - · · · + (-1)^{k+1} pk, (1.3.4)

where pi equals the sum of the probabilities of all possible intersections involving i of the sets. It is clear in the case k = 3 that p1 ≥ p2 ≥ p3, but more generally p1 ≥ p2 ≥ · · · ≥ pk. As shown in Theorem 1.3.7,

p1 = P(C1) + P(C2) + · · · + P(Ck) ≥ P(C1 ∪ C2 ∪ · · · ∪ Ck).

This is known as Boole's inequality. For k = 2, we have

1 ≥ P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2),

which gives Bonferroni's inequality,

P(C1 ∩ C2) ≥ P(C1) + P(C2) - 1. •

Example 1.3.2. Two coins are to be tossed and the outcome is the ordered pair (face on the first coin, face on the second coin). Thus the sample space may be represented as C = {(H, H), (H, T), (T, H), (T, T)}. Let the probability set function assign a probability of 1/4 to each element of C. Let C1 = {(H, H), (H, T)} and C2 = {(H, H), (T, H)}. Then P(C1) = P(C2) = 1/2, P(C1 ∩ C2) = 1/4, and, in accordance with Theorem 1.3.5, P(C1 ∪ C2) = 1/2 + 1/2 - 1/4 = 3/4. •
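Theorem 1.3.5 can be checked by brute-force enumeration of this four-point sample space. The sketch below (illustrative, using exact fractions) verifies the computation of Example 1.3.2:

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))     # (H,H), (H,T), (T,H), (T,T)

def P(event):
    """Equilikely assignment: 1/4 to each of the four ordered pairs."""
    return Fraction(len(event), len(space))

C1 = {c for c in space if c[0] == "H"}    # head on the first coin
C2 = {c for c in space if c[1] == "H"}    # head on the second coin

lhs = P(C1 | C2)
rhs = P(C1) + P(C2) - P(C1 & C2)          # Theorem 1.3.5
print(lhs, rhs)   # 3/4 3/4
```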


Let C denote a sample space and let C1, C2, C3, ... denote events of C. If these events are such that no two have an element in common, they are called mutually disjoint sets and the corresponding events C1, C2, C3, ... are said to be mutually exclusive events. Then P(C1 ∪ C2 ∪ C3 ∪ · · ·) = P(C1) + P(C2) + P(C3) + · · ·, in accordance with (3) of Definition 1.3.2. Moreover, if C = C1 ∪ C2 ∪ C3 ∪ · · ·, the mutually exclusive events are further characterized as being exhaustive and the probability of their union is then equal to 1.

Example 1.3.3 (Equilikely Case). Let C be partitioned into k mutually disjoint subsets C1, C2, ..., Ck in such a way that the union of these k mutually disjoint subsets is the sample space C. Thus the events C1, C2, ..., Ck are mutually exclusive and exhaustive. Suppose that the random experiment is of such a character that it is reasonable to assume that each of the mutually exclusive and exhaustive events Ci, i = 1, 2, ..., k, has the same probability. It is necessary, then, that P(Ci) = 1/k, i = 1, 2, ..., k; and we often say that the events C1, C2, ..., Ck are equally likely. Let the event E be the union of r of these mutually exclusive events, say

E = C1 ∪ C2 ∪ · · · ∪ Cr, r ≤ k.

Then

P(E) = P(C1) + P(C2) + · · · + P(Cr) = r/k.

Frequently, the integer k is called the total number of ways (for this particular partition of C) in which the random experiment can terminate and the integer r is called the number of ways that are favorable to the event E. So, in this terminology, P(E) is equal to the number of ways favorable to the event E divided by the total number of ways in which the experiment can terminate. It should be emphasized that in order to assign, in this manner, the probability r/k to the event E, we must assume that each of the mutually exclusive and exhaustive events C1, C2, ..., Ck has the same probability 1/k. This assumption of equally likely events then becomes a part of our probability model. Obviously, if this assumption is not realistic in an application, the probability of the event E cannot be computed in this way. •

In order to illustrate the equilikely case, it is helpful to use some elementary counting rules. These are usually discussed in an elementary algebra course. In the next remark, we offer a brief review of these rules.

Remark 1.3.3 (Counting Rules). Suppose we have two experiments. The first experiment results in m outcomes, while the second experiment results in n outcomes. The composite experiment, the first experiment followed by the second experiment, has mn outcomes, which can be represented as mn ordered pairs. This is called the multiplication rule or the mn-rule. It is easily extended to more than two experiments.

Let A be a set with n elements. Suppose we are interested in k-tuples whose components are elements of A. Then by the extended multiplication rule, there are n · n · · · n = nᵏ such k-tuples whose components are elements of A. Next, suppose k ≤ n and we are interested in k-tuples whose components are distinct (no repeats) elements of A. There are n elements from which to choose for the first


component, n - 1 for the second component, ..., and n - (k - 1) for the kth. Hence, by the multiplication rule, there are n(n - 1) · · · (n - (k - 1)) such k-tuples with distinct elements. We call each such k-tuple a permutation and use the symbol Pⁿₖ to denote the number of permutations of k elements taken from a set of n elements. Hence, we have the formula

Pⁿₖ = n(n - 1) · · · (n - (k - 1)) = n!/(n - k)!.

Next suppose order is not important, so instead of counting the number of permutations we want to count the number of subsets of k elements taken from A. We will use the symbol (n choose k) to denote the total number of these subsets. Consider a subset of k elements from A. By the permutation rule it generates Pᵏₖ = k(k - 1) · · · 1 = k! permutations. Furthermore, all these permutations are distinct from the permutations generated by other subsets of k elements from A. Finally, each permutation of k distinct elements drawn from A must be generated by one of these subsets. Hence, we have just shown that Pⁿₖ = (n choose k) k!; that is,

(n choose k) = n!/[k!(n - k)!]. (1.3.7)

We often use the terminology combinations instead of subsets. So we say that there are (n choose k) combinations of k things taken from a set of n things. Another common symbol for (n choose k) is Cⁿₖ.
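Python's standard library implements both counting symbols directly (math.perm and math.comb), so the identity Pⁿₖ = (n choose k) k! above is easy to check numerically:

```python
import math

n, k = 52, 5
perms = math.perm(n, k)      # n(n-1)...(n-k+1) = n!/(n-k)!
combs = math.comb(n, k)      # n!/(k!(n-k)!)

# The identity derived in the text: P^n_k = (n choose k) * k!
assert perms == combs * math.factorial(k)
print(combs)   # 2598960, the number of 5-card subsets of a 52-card deck
```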

It is interesting to note that if we expand the binomial,

(a + b)ⁿ = Σ_{k=0}^{n} (n choose k) aᵏ bⁿ⁻ᵏ,

the quantities (n choose k) appear as the coefficients of the expansion; for this reason they are also called binomial coefficients.

Example 1.3.4 (Poker Hands). Let a card be drawn at random from an ordinary deck of 52 playing cards that has been well shuffled. The sample space C is the union of k = 52 outcomes, and it is reasonable to assume that each of these outcomes has the same probability 1/52. Accordingly, if E1 is the set of outcomes that are spades, P(E1) = 13/52 = 1/4 because there are r1 = 13 spades in the deck; that is, 1/4 is the probability of drawing a card that is a spade. If E2 is the set of outcomes that are kings, P(E2) = 4/52 = 1/13 because there are r2 = 4 kings in the deck; that is, 1/13 is the probability of drawing a card that is a king. These computations are very easy because there are no difficulties in the determination of the appropriate values of r and k.

However, instead of drawing only one card, suppose that five cards are taken, at random and without replacement, from this deck; i.e., a 5-card poker hand. In this instance, order is not important. So a hand is a subset of 5 elements drawn


from a set of 52 elements. Hence, by (1.3.7) there are (52 choose 5) poker hands. If the deck is well shuffled, each hand should be equilikely; i.e., each hand has probability 1/(52 choose 5). We can now compute the probabilities of some interesting poker hands. Let E1 be the event of a flush, all 5 cards of the same suit. There are (4 choose 1) = 4 suits to choose for the flush, and in each suit there are (13 choose 5) possible hands; hence, using the multiplication rule, the probability of getting a flush is

P(E1) = (4 choose 1)(13 choose 5) / (52 choose 5) = 0.00198.

Now suppose that E3 is the set of outcomes in which exactly three cards are kings and exactly two cards are queens. Select the kings in (4 choose 3) ways and select the queens in (4 choose 2) ways. Hence, the probability of E3 is

P(E3) = (4 choose 3)(4 choose 2) / (52 choose 5) = 0.0000093.

The event E3 is an example of a full house: 3 of one kind and 2 of another kind. Exercise 1.3.19 asks for the determination of the probability of a full house. •
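The probabilities in Example 1.3.4 are quick to reproduce with math.comb (an illustrative check, not part of the text):

```python
from math import comb

hands = comb(52, 5)                       # total number of 5-card poker hands
p_flush = 4 * comb(13, 5) / hands         # choose a suit, then 5 of its 13 cards
p_kq = comb(4, 3) * comb(4, 2) / hands    # three kings and two queens

print(p_flush, p_kq)   # approximately 0.00198 and 0.0000093
```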

Example 1.3.4 and the previous discussion allow us to see one way in which we can define a probability set function, that is, a set function that satisfies the requirements of Definition 1.3.2. Suppose that our space C consists of k distinct points, which, for this discussion, we take to be in a one-dimensional space. If the random experiment that ends in one of those k points is such that it is reasonable to assume that these points are equally likely, we could assign 1/k to each point and let, for C ⊂ C,

P(C) = (number of points in C)/k = Σ_{x∈C} f(x),

where f(x) = 1/k, x ∈ C.

For illustration, in the cast of a die, we could take C = {1, 2, 3, 4, 5, 6} and f(x) = 1/6, x ∈ C, if we believe the die to be unbiased. Clearly, such a set function satisfies Definition 1.3.2.


The word unbiased in this illustration suggests the possibility that all six points might not, in all such cases, be equally likely. As a matter of fact, loaded dice do exist. In the case of a loaded die, some numbers occur more frequently than others in a sequence of casts of that die. For example, suppose that a die has been loaded so that the relative frequencies of the numbers in C seem to stabilize proportionally to the number of spots that are on the up side. Thus we might assign f(x) = x/21, x ∈ C, and the corresponding

P(C) = Σ_{x∈C} f(x), for C ⊂ C,

would satisfy Definition 1.3.2.
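Such a set function is easy to tabulate. The sketch below (our illustration) builds P for the loaded die with f(x) = x/21 and evaluates a couple of events exactly:

```python
from fractions import Fraction

# Loaded-die assignment: f(x) = x/21 for x = 1, ..., 6
f = {x: Fraction(x, 21) for x in range(1, 7)}

def P(C):
    """Probability set function induced by f over a subset C of {1,...,6}."""
    return sum(f[x] for x in C)

assert P(range(1, 7)) == 1            # P of the whole space is 1
print(P({2, 4, 6}), P({1}))           # 4/7 1/21
```

The event "an even number of spots" now has probability (2 + 4 + 6)/21 = 4/7, rather than 1/2 as for the unbiased die.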

We end this section with another property of probability which will be useful in the sequel. Recall that in Exercise 1.2.8 we said that a sequence of events {Cn} is an increasing sequence if Cn ⊂ Cn+1, for all n, in which case we wrote lim_{n→∞} Cn = ∪_{n=1}^{∞} Cn. Consider lim_{n→∞} P(Cn). The question is: can we interchange the limit and P? As the following theorem shows, the answer is yes. The result also holds for a decreasing sequence of events. Because of this interchange, this theorem is sometimes referred to as the continuity theorem of probability.

Theorem 1.3.6. Let {Cn} be an increasing sequence of events. Then

lim_{n→∞} P(Cn) = P( lim_{n→∞} Cn ) = P( ∪_{n=1}^{∞} Cn ). (1.3.9)

Let {Cn} be a decreasing sequence of events. Then

lim_{n→∞} P(Cn) = P( lim_{n→∞} Cn ) = P( ∩_{n=1}^{∞} Cn ). (1.3.10)

Proof: We prove the result (1.3.9) and leave the second result as Exercise 1.3.22. Define the sets, called rings, as R1 = C1 and, for n > 1, Rn = Cn ∩ Cᶜ_{n-1}. It follows that ∪_{n=1}^{∞} Cn = ∪_{n=1}^{∞} Rn and that Rm ∩ Rn = ∅, for m ≠ n. Also, P(Rn) = P(Cn) - P(C_{n-1}). Applying the third axiom of probability yields the following string of equalities:

P[ lim_{n→∞} Cn ] = P( ∪_{n=1}^{∞} Cn ) = P( ∪_{n=1}^{∞} Rn ) = Σ_{n=1}^{∞} P(Rn) = lim_{n→∞} Σ_{j=1}^{n} P(Rj)

= lim_{n→∞} { P(C1) + Σ_{j=2}^{n} [P(Cj) - P(C_{j-1})] } = lim_{n→∞} P(Cn). (1.3.11)


This is the desired result. •
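As a concrete check of the increasing case, consider the coin-tossing sample space of Exercise 1.3.3 and the increasing events Cn = {a head appears within the first n tosses}, so that P(Cn) = 1 - (1/2)ⁿ. The limit of P(Cn) is 1 = P(∪ Cn), the probability that a head eventually appears. A quick illustrative computation:

```python
from fractions import Fraction

# C_n = "at least one head in the first n tosses" of a fair coin,
# an increasing sequence of events with P(C_n) = 1 - (1/2)^n.
probs = [1 - Fraction(1, 2) ** n for n in range(1, 31)]

assert all(a <= b for a, b in zip(probs, probs[1:]))   # monotone in n
print(float(probs[-1]))    # close to 1.0, consistent with P(union C_n) = 1
```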

Another useful result for arbitrary unions is given by

Theorem 1.3.7 (Boole's Inequality). Let {Cn} be an arbitrary sequence of events. Then

P( ∪_{n=1}^{∞} Cn ) ≤ Σ_{n=1}^{∞} P(Cn). (1.3.12)

Proof: Let Dn = ∪_{i=1}^{n} Ci. Then {Dn} is an increasing sequence of events that go up to ∪_{n=1}^{∞} Cn. Also, for all j, Dj = D_{j-1} ∪ Cj. Hence, by Theorem 1.3.5,

P(Dj) = P(D_{j-1}) + P(Cj) - P(D_{j-1} ∩ Cj) ≤ P(D_{j-1}) + P(Cj).

Hence, by induction,

P(Dn) ≤ Σ_{i=1}^{n} P(Ci),

and the result follows by letting n → ∞ and applying Theorem 1.3.6. •

EXERCISES

1.3.1 A positive integer from one to six is to be chosen by casting a die. Thus the elements c of the sample space C are 1, 2, 3, 4, 5, 6. Suppose C1 = {1, 2, 3, 4} and C2 = {3, 4, 5, 6}. If the probability set function P assigns a probability of 1/6 to each of the elements of C, compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.2 A random experiment consists of drawing a card from an ordinary deck of 52 playing cards. Let the probability set function P assign a probability of 1/52 to each of the 52 possible outcomes. Let C1 denote the collection of the 13 hearts and let C2 denote the collection of the 4 kings. Compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.3 A coin is to be tossed as many times as necessary to turn up one head. Thus the elements c of the sample space C are H, TH, TTH, TTTH, and so forth. Let the probability set function P assign to these elements the respective probabilities 1/2, 1/4, 1/8, 1/16, and so forth. Show that P(C) = 1. Let C1 = {c : c is H, TH, TTH, TTTH, or TTTTH}. Compute P(C1). Next, suppose that C2 = {c : c is TTTTH or TTTTTH}. Compute P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.4 If the sample space is C = C1 ∪ C2 and if P(C1) = 0.8 and P(C2) = 0.5, find P(C1 ∩ C2).


1.3.5 Let the sample space be C = {c : 0 < c < ∞}. Let C ⊂ C be defined by C = {c : 4 < c < ∞} and take P(C) = ∫_C e^{-x} dx. Evaluate P(C), P(Cᶜ), and P(C ∪ Cᶜ).

1.3.6 If the sample space is C = {c : -∞ < c < ∞} and if C ⊂ C is a set for which the integral ∫_C e^{-|x|} dx exists, show that this set function is not a probability set function. What constant do we multiply the integrand by to make it a probability set function?

1.3.7 If C1 and C2 are subsets of the sample space C, show that

P(C1 ∩ C2) ≤ P(C1) ≤ P(C1 ∪ C2) ≤ P(C1) + P(C2).

1.3.8 Let C1, C2, and C3 be three mutually disjoint subsets of the sample space C. Find P[(C1 ∪ C2) ∩ C3] and P(C1ᶜ ∪ C2ᶜ).

1.3.9 Consider Remark 1.3.2.

(a) If C1, C2, and C3 are subsets of C, show that

P(C1 ∪ C2 ∪ C3) = P(C1) + P(C2) + P(C3) - P(C1 ∩ C2) - P(C1 ∩ C3) - P(C2 ∩ C3) + P(C1 ∩ C2 ∩ C3).

(b) Now prove the general inclusion-exclusion formula given by the expression (1.3.4).

1.3.10 Suppose we turn over cards simultaneously from two well-shuffled decks of ordinary playing cards. We say we obtain an exact match on a particular turn if the same card appears from each deck; for example, the queen of spades against the queen of spades. Let pM equal the probability of at least one exact match.

(a) Show that

pM = 1 - 1/2! + 1/3! - 1/4! + · · · - 1/52!.

Hint: Let Ci denote the event of an exact match on the ith turn. Then pM = P(C1 ∪ C2 ∪ · · · ∪ C52). Now use the general inclusion-exclusion formula given by (1.3.4). In this regard, note that P(Ci) = 1/52 and, hence, p1 = 52(1/52) = 1. Also, P(Ci ∩ Cj) = 50!/52! and, hence, p2 = (52 choose 2)/(52 · 51).

(b) Show that pM is approximately equal to 1 - e⁻¹ = 0.632.
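The alternating series in part (a) converges to 1 - e⁻¹ very quickly, which is the point of part (b). A brief numerical check (our illustration):

```python
from math import factorial, exp

# p_M = 1 - 1/2! + 1/3! - ... - 1/52!
p_match = sum((-1) ** (k + 1) / factorial(k) for k in range(1, 53))

print(p_match)    # approximately 0.6321, i.e. about 1 - 1/e
```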

Remark 1.3.4. In order to solve a number of exercises, like 1.3.11 through 1.3.19, certain reasonable assumptions must be made. •

1.3.11 A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue. If four chips are taken at random and without replacement, find the probability that: (a) each of the 4 chips is red; (b) none of the 4 chips is red; (c) there is at least 1 chip of each color.


1.3.12 A person has purchased 10 of 1000 tickets sold in a certain raffle. To determine the five prize winners, 5 tickets are to be drawn at random and without replacement. Compute the probability that this person will win at least one prize. Hint: First compute the probability that the person does not win a prize.

1.3.13 Compute the probability of being dealt at random and without replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2 diamonds, and 1 club; (b) 13 cards of the same suit.

1.3.14 Three distinct integers are chosen at random from the first 20 positive integers. Compute the probability that: (a) their sum is even; (b) their product is even.

1.3.15 There are 5 red chips and 3 blue chips in a bowl. The red chips are numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2, 3, respectively. If 2 chips are to be drawn at random and without replacement, find the probability that these chips have either the same number or the same color.

1.3.16 In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines 5 bulbs, which are selected at random and without replacement.

(a) Find the probability of at least 1 defective bulb among the 5.

(b) How many bulbs should be examined so that the probability of finding at least 1 bad bulb exceeds 1/2?

1.3.17 If C1, ..., Ck are k events in the sample space C, show that the probability that at least one of the events occurs is one minus the probability that none of them occurs; i.e.,

P(C1 ∪ · · · ∪ Ck) = 1 - P(C1ᶜ ∩ · · · ∩ Ckᶜ). (1.3.13)

1.3.18 A secretary types three letters and the three corresponding envelopes. In a hurry, he places at random one letter in each envelope. What is the probability that at least one letter is in the correct envelope? Hint: Let Ci be the event that the ith letter is in the correct envelope. Expand P(C1 ∪ C2 ∪ C3) to determine the probability.

1.3.19 Consider poker hands drawn from a well-shuffled deck as described in Example 1.3.4. Determine the probability of a full house; i.e., three of one kind and two of another.

1.3.20 Suppose V is a nonempty collection of subsets of C. Consider the collection of events

B = ∩{E : V ⊂ E and E is a σ-field}.

Note that ∅ ∈ B because it is in each σ-field and, hence, in particular, it is in each σ-field E ⊃ V. Continue in this way to show that B is a σ-field.


1.3.21 Let C = R, where R is the set of all real numbers. Let I be the set of all open intervals in R. Recall from (1.3.2) the Borel σ-field on the real line; i.e., the σ-field B0 given by

B0 = ∩{E : I ⊂ E and E is a σ-field}.

Show that B0 contains all of the closed and half-open intervals of real numbers.

1.3.24 Consider the events C1, C2, C3.

(a) Suppose C1, C2, C3 are mutually exclusive events. If P(Ci) = pi, i = 1, 2, 3, what is the restriction on the sum p1 + p2 + p3?

(b) In the notation of part (a), if p1 = 4/10, p2 = 3/10, and p3 = 5/10, are C1, C2, C3 mutually exclusive?

1.4 Conditional Probability and Independence

In some random experiments, we are interested only in those outcomes that are elements of a subset C1 of the sample space C. This means, for our purposes, that the sample space is effectively the subset C1. We are now confronted with the problem of defining a probability set function with C1 as the "new" sample space.

Let the probability set function P(C) be defined on the sample space C and let C1 be a subset of C such that P(C1) > 0. We agree to consider only those outcomes of the random experiment that are elements of C1; in essence, then, we take C1 to be a sample space. Let C2 be another subset of C. How, relative to the new sample space C1, do we want to define the probability of the event C2? Once defined, this probability is called the conditional probability of the event C2, relative to the hypothesis of the event C1; or, more briefly, the conditional probability of C2, given C1. Such a conditional probability is denoted by the symbol P(C2|C1). We now return to the question that was raised about the definition of this symbol. Since C1 is now the sample space, the only elements of C2 that concern us are those, if any, that are also elements of C1, that is, the elements of C1 ∩ C2. It seems desirable, then, to define the symbol P(C2|C1) in such a way that

P(C2|C1) = P(C1 ∩ C2|C1) and P(C1|C1) = 1.


Moreover, from a relative frequency point of view, it would seem logically inconsistent if we did not require that the ratio of the probabilities of the events C1 ∩ C2 and C1, relative to the space C1, be the same as the ratio of the probabilities of these events relative to the space C; that is, we should have

P(C1 ∩ C2|C1) / P(C1|C1) = P(C1 ∩ C2) / P(C1).

These three desirable conditions imply that the relation

P(C2|C1) = P(C1 ∩ C2) / P(C1)

is a suitable definition of the conditional probability of the event C2, given the event C1, provided that P(C1) > 0. Moreover, we have

1. P(C2|C1) ≥ 0.

2. P(C2 ∪ C3 ∪ · · · |C1) = P(C2|C1) + P(C3|C1) + · · ·, provided that C2, C3, ... are mutually disjoint sets.

3. P(C1|C1) = 1.

Properties (1) and (3) are evident; proof of property (2) is left as Exercise 1.4.1.

But these are precisely the conditions that a probability set function must satisfy. Accordingly, P(C2|C1) is a probability set function, defined for subsets of C1. It may be called the conditional probability set function, relative to the hypothesis C1, or the conditional probability set function, given C1. It should be noted that this conditional probability set function, given C1, is defined at this time only when P(C1) > 0.

Example 1.4.1. A hand of 5 cards is to be dealt at random without replacement from an ordinary deck of 52 playing cards. The conditional probability of an all-spade hand (C2), relative to the hypothesis that there are at least 4 spades in the hand (C1), is, since C1 ∩ C2 = C2,

P(C2|C1) = P(C2)/P(C1) = [(13 choose 5)/(52 choose 5)] / {[(13 choose 4)(39 choose 1) + (13 choose 5)]/(52 choose 5)} = (13 choose 5) / [(13 choose 4)(39 choose 1) + (13 choose 5)] = 0.044.

Note that this is not the same as drawing for a spade to complete a flush in draw poker; see Exercise 1.4.3. •
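The conditional probability of Example 1.4.1 can be verified by counting hands with math.comb (an illustrative check; C1, "at least 4 spades," is the union of the exactly-4 and exactly-5 spade cases):

```python
from math import comb

hands = comb(52, 5)
p_c1 = (comb(13, 4) * comb(39, 1) + comb(13, 5)) / hands   # at least 4 spades
p_c2 = comb(13, 5) / hands                                 # all 5 spades
p_cond = p_c2 / p_c1    # P(C2|C1), since C1 ∩ C2 = C2

print(round(p_cond, 3))   # 0.044
```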

From the definition of the conditional probability set function, we observe that

P(C1 ∩ C2) = P(C1)P(C2|C1).

This relation is frequently called the multiplication rule for probabilities. Sometimes, after considering the nature of the random experiment, it is possible to make


reasonable assumptions so that both P(C1) and P(C2|C1) can be assigned. Then P(C1 ∩ C2) can be computed under these assumptions. This will be illustrated in Examples 1.4.2 and 1.4.3.

Example 1.4.2. A bowl contains eight chips. Three of the chips are red and the remaining five are blue. Two chips are to be drawn successively, at random and without replacement. We want to compute the probability that the first draw results in a red chip (C1) and that the second draw results in a blue chip (C2). It is reasonable to assign the following probabilities:

P(C1) = 3/8 and P(C2|C1) = 5/7.

Thus, under these assignments, we have P(C1 ∩ C2) = (3/8)(5/7) = 15/56 = 0.2679. •

Example 1.4.3. From an ordinary deck of playing cards, cards are to be drawn successively, at random and without replacement. The probability that the third spade appears on the sixth draw is computed as follows. Let C1 be the event of two spades in the first five draws and let C2 be the event of a spade on the sixth draw. Thus the probability that we wish to compute is P(C1 ∩ C2). It is reasonable to take

P(C1) = (13 choose 2)(39 choose 3) / (52 choose 5) = 0.274 and P(C2|C1) = 11/47 = 0.234.

The desired probability P(C1 ∩ C2) is then the product of these two numbers, which, to four places, is 0.064. •

The multiplication rule can be extended to three or more events. In the case of three events, we have, by using the multiplication rule for two events,

P(C1 ∩ C2 ∩ C3) = P[(C1 ∩ C2) ∩ C3] = P(C1 ∩ C2)P(C3|C1 ∩ C2).

But P(C1 ∩ C2) = P(C1)P(C2|C1). Hence, provided P(C1 ∩ C2) > 0,

P(C1 ∩ C2 ∩ C3) = P(C1)P(C2|C1)P(C3|C1 ∩ C2).

This procedure can be used to extend the multiplication rule to four or more events. The general formula for k events can be proved by mathematical induction.

Example 1.4.4. Four cards are to be dealt successively, at random and without replacement, from an ordinary deck of playing cards. The probability of receiving a spade, a heart, a diamond, and a club, in that order, is (13/52)(13/51)(13/50)(13/49) = 0.0044. This follows from the extension of the multiplication rule. •
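The chain of conditional probabilities in Example 1.4.4 multiplies out exactly as follows (an illustrative check using exact fractions):

```python
from fractions import Fraction

# Spade, then heart, then diamond, then club, drawing without replacement:
# each factor is P(next suit | cards already drawn).
p = Fraction(13, 52) * Fraction(13, 51) * Fraction(13, 50) * Fraction(13, 49)

print(p, float(p))   # 2197/499800, approximately 0.0044
```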

Consider k mutually exclusive and exhaustive events C1, C2, ..., Ck such that P(Ci) > 0, i = 1, 2, ..., k. Suppose these events form a partition of C. Here the events C1, C2, ..., Ck do not need to be equally likely. Let C be another event. Thus C occurs with one and only one of the events C1, C2, ..., Ck; that is,

C = C ∩ (C1 ∪ C2 ∪ · · · ∪ Ck) = (C ∩ C1) ∪ (C ∩ C2) ∪ · · · ∪ (C ∩ Ck).


Since C ∩ Ci, i = 1, 2, ..., k, are mutually exclusive, we have

P(C) = P(C ∩ C1) + P(C ∩ C2) + · · · + P(C ∩ Ck).

However, P(C ∩ Ci) = P(Ci)P(C|Ci), i = 1, 2, ..., k; so

P(C) = P(C1)P(C|C1) + P(C2)P(C|C2) + · · · + P(Ck)P(C|Ck) = Σ_{i=1}^{k} P(Ci)P(C|Ci).

This result is sometimes called the law of total probability.

Suppose, also, that P(C) > 0. From the definition of conditional probability, we have, using the law of total probability, that

P(Cj|C) = P(C ∩ Cj)/P(C) = P(Cj)P(C|Cj) / Σ_{i=1}^{k} P(Ci)P(C|Ci), (1.4.1)

which is the well-known Bayes' theorem. This permits us to calculate the conditional probability of Cj, given C, from the probabilities of C1, C2, ..., Ck and the conditional probabilities of C, given Ci, i = 1, 2, ..., k.

Example 1.4.5. Say it is known that bowl C1 contains 3 red and 7 blue chips and bowl C2 contains 8 red and 2 blue chips. All chips are identical in size and shape. A die is cast and bowl C1 is selected if five or six spots show on the side that is up; otherwise, bowl C2 is selected. In a notation that is fairly obvious, it seems reasonable to assign P(C1) = 1/3 and P(C2) = 2/3. The selected bowl is handed to another person and one chip is taken at random. Say that this chip is red, an event which we denote by C. By considering the contents of the bowls, it is reasonable to assign the conditional probabilities P(C|C1) = 3/10 and P(C|C2) = 8/10. Thus the conditional probability of bowl C1, given that a red chip is drawn, is

P(C1|C) = P(C1)P(C|C1) / [P(C1)P(C|C1) + P(C2)P(C|C2)] = (1/3)(3/10) / [(1/3)(3/10) + (2/3)(8/10)] = 3/19.

In a similar manner, we have P(C2|C) = 16/19. •
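The Bayes computation of Example 1.4.5 can be carried out exactly with fractions (an illustrative check):

```python
from fractions import Fraction

prior = {1: Fraction(1, 3), 2: Fraction(2, 3)}    # P(C1), P(C2): die picks bowl
like = {1: Fraction(3, 10), 2: Fraction(8, 10)}   # P(C|Ci): red chip from bowl i

p_red = sum(prior[i] * like[i] for i in (1, 2))   # law of total probability
posterior = {i: prior[i] * like[i] / p_red for i in (1, 2)}  # Bayes' theorem (1.4.1)

print(posterior[1], posterior[2])   # 3/19 16/19
```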

In Example 1.4.5, the probabilities P(C1) = 1/3 and P(C2) = 2/3 are called prior probabilities of C1 and C2, respectively, because they are known to be due to the random mechanism used to select the bowls. After the chip is taken and observed to be red, the conditional probabilities P(C1|C) = 3/19 and P(C2|C) = 16/19 are called posterior probabilities. Since C2 has a larger proportion of red chips than does C1, it appeals to one's intuition that P(C2|C) should be larger than P(C2) and, of course, P(C1|C) should be smaller than P(C1). That is, intuitively the chances of having bowl C2 are better once a red chip has been observed than before a chip is taken. Bayes' theorem provides a method of determining exactly what those probabilities are.
