Mathematical Statistics
21.1 Introduction to Mathematical Statistics
21.1.1 Basic Notions and Problems of Mathematical Statistics
21.1.1-1 Basic problems of mathematical statistics
The term “statistics” derives from the Latin word “status,” meaning “state.” Statistics comprises three major divisions: collection of statistical data, their statistical analysis, and development of mathematical methods for processing and using the statistical data to draw scientific and practical conclusions. It is the last division that is commonly known as mathematical statistics.
The original material for a statistical study is a set of results specially gathered for this study or a set of results of specially performed experiments. The following problems arise in this connection.
1. Estimating the unknown probability of a random event.

2. Finding the unknown theoretical distribution function.

The problem is stated as follows. Given the values $x_1, \dots, x_n$ of a random variable X obtained in n independent trials, find, at least approximately, the unknown distribution function F(x) of the random variable X.
3. Determining the unknown parameters of the theoretical distribution function.

The problem is stated as follows. A random variable X has the distribution function $F(x; \theta_1, \dots, \theta_k)$ depending on k parameters $\theta_1, \dots, \theta_k$, whose values are unknown. The main goal is to estimate the unknown parameters $\theta_1, \dots, \theta_k$ using only the results $X_1, \dots, X_n$ of observations of the random variable X.

Instead of seeking approximate values of the unknown parameters $\theta_1, \dots, \theta_k$ in the form of functions $\theta^*_1, \dots, \theta^*_k$, in a number of problems it is preferable to seek functions $\theta^*_{i,\mathrm{L}}$ and $\theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$) depending on the results of observations and known variables and such that with sufficient reliability one can claim that $\theta^*_{i,\mathrm{L}} < \theta_i < \theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$). The functions $\theta^*_{i,\mathrm{L}}$ and $\theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$) are called the confidence boundaries for $\theta_1, \dots, \theta_k$.
4. Testing statistical hypotheses.

The problem is stated as follows. Some reasoning suggests that the distribution function of a random variable X is F(x); the question is whether the observed values are compatible with the hypothesis to be tested that the random variable X has the distribution F(x).
5. Estimation of dependence.

A sequence of observations is performed simultaneously for two random variables X and Y. The results of observations are given by pairs of values $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. It is required to find a functional or correlation relationship between X and Y.
21.1.1-2 Population and sample
The set of all possible results of observations that can be made under a given set of conditions is called the population. In some problems, the population is treated as a random variable X.
An example of a population is the entire population of a country. In this population we are, for example, interested in the age of people. Another example of a population is the set of parts produced by a given machine. These parts can be either accepted or rejected.

The number of entities in a population is called its size and is usually denoted by the symbol N.
A set of entities randomly selected from a population is called a sample. A sample must be representative of the population; i.e., it must show the right proportions characteristic of the population. This is achieved by the randomness of the selection, when all entities in the population can be selected with the same probability.

The number of elements in a sample is called its size and is usually denoted by the symbol n. The elements of a sample will be denoted by $X_1, \dots, X_n$.
Note that sampling itself can be performed by various methods. Having selected an element and measured its value, one can delete this element from the population so that it cannot be selected in subsequent trials (sampling without replacement). Alternatively, after measuring the value of an element, one can return it to the population (sampling with replacement). Obviously, for a sufficiently large population size the difference between sampling with and without replacement disappears.
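As an illustration, the following Python sketch contrasts the two sampling methods using only the standard library (the population of integers is hypothetical, chosen for the example):

```python
import random

population = list(range(1, 101))  # a hypothetical population of size N = 100

# Sampling without replacement: a selected element cannot recur.
sample_without = random.sample(population, k=10)

# Sampling with replacement: every draw is made from the full population,
# so repeated elements are possible.
sample_with = random.choices(population, k=10)

print(sample_without)  # 10 distinct elements
print(sample_with)     # elements may repeat
```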
21.1.1-3 Theoretical distribution function
Each element $X_i$ in a sample has the distribution function F(x), and the elements $X_1, \dots, X_n$ are assumed to be independent for sampling with replacement or, for sampling without replacement, approximately as the population size $N \to \infty$. A sample $X_1, \dots, X_n$ is interpreted as a set of n independent identically distributed random variables with distribution function F(x) or as n independent realizations of an observable random variable X with distribution function F(x). The distribution function F(x) is called the theoretical distribution function.

The joint distribution function $F_{X_1,\dots,X_n}(x_1, \dots, x_n)$ of the sample $X_1, \dots, X_n$ is given by the formula

$$ F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 < x_1, \dots, X_n < x_n) = F(x_1) F(x_2) \cdots F(x_n). \qquad (21.1.1.1) $$
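Formula (21.1.1.1) can be evaluated numerically; a minimal Python sketch, assuming for definiteness that F(x) is the standard normal distribution function:

```python
import math

def F(x):
    # Standard normal CDF (the assumed theoretical distribution function).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_cdf(points):
    # Formula (21.1.1.1): F(x1) * F(x2) * ... * F(xn) for an i.i.d. sample.
    result = 1.0
    for x in points:
        result *= F(x)
    return result

# P(X1 < 1, X2 < 0.5, X3 < 2) for three independent observations.
print(joint_cdf([1.0, 0.5, 2.0]))
```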
21.1.2 Simplest Statistical Transformations
21.1.2-1 Series of order statistics
By arranging the elements of a sample $X_1, \dots, X_n$ in ascending order, $X_{(1)} \le \dots \le X_{(n)}$, we obtain the series of order statistics $X_{(1)}, \dots, X_{(n)}$. Obviously, this transformation does not lead to a loss of information about the theoretical distribution function. The variables $X_{(1)}$ and $X_{(n)}$ are called the extreme order statistics.

The difference

$$ R = X_{(n)} - X_{(1)} \qquad (21.1.2.1) $$

of the extreme order statistics is called the range statistic, or the sample range R.
The series of order statistics is used to construct the empirical distribution function (see Paragraph 21.1.2-6).
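A minimal Python sketch of these transformations (the sample values are hypothetical):

```python
sample = [3.1, 0.7, 2.4, 5.9, 1.8]

# Series of order statistics X_(1) <= ... <= X_(n).
order_stats = sorted(sample)

# Extreme order statistics and the sample range, formula (21.1.2.1).
x_min, x_max = order_stats[0], order_stats[-1]
R = x_max - x_min

print(order_stats)  # [0.7, 1.8, 2.4, 3.1, 5.9]
print(R)            # 5.2
```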
21.1.2-2 Statistical series
If a sample $X_1, \dots, X_n$ contains coinciding elements, which may happen in observations of a discrete random variable, then it is expedient to group the elements. For a common value of several variates in the sample $X_1, \dots, X_n$, the size of the corresponding group of coinciding elements is called the frequency or the weight of this variate value. By $n_i$ we denote the number of occurrences of the ith variate value.

The set $Z_1, \dots, Z_L$ of distinct variate values arranged in ascending order with the corresponding frequencies $n_1, \dots, n_L$ represents the sample $X_1, \dots, X_n$ and is called a statistical series (see Example 1 in Paragraph 21.1.2-7).
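A statistical series is easy to build with a frequency counter; a short Python sketch with a hypothetical sample:

```python
from collections import Counter

# Observations of a discrete random variable; coinciding elements occur.
sample = [2, 3, 2, 5, 3, 3, 2, 5, 2, 3]

# Distinct variate values Z_1 < ... < Z_L with frequencies n_1, ..., n_L.
frequencies = Counter(sample)
statistical_series = sorted(frequencies.items())

print(statistical_series)  # [(2, 4), (3, 4), (5, 2)]
```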
21.1.2-3 Interval series
Interval series are used in observations of continuous random variables. In this case, the entire sample range is divided into finitely many bins, or class intervals, and then the number of variates in each bin is calculated.

The ordered sequence of class intervals with the corresponding frequencies or relative frequencies of occurrences of variates in each of these intervals is called an interval series. It is convenient to represent an interval series as a table with two rows (e.g., see Example 2 in Paragraph 21.1.2-7). The first row of the table contains the class intervals $[x_0, x_1), [x_1, x_2), \dots, [x_{L-1}, x_L)$, which are usually chosen to have the same length. The interval length h is usually determined by the Sturges formula

$$ h = \frac{X_{(n)} - X_{(1)}}{1 + \log_2 n}, \qquad (21.1.2.2) $$

where $L = 1 + \log_2 n$ is the number of intervals ($\log_2 n \approx 3.322 \lg n$). The second row of the interval series contains the frequencies or relative frequencies of occurrences of the sample elements in each of these intervals.

Remark. It is recommended to take $X_{(1)} - \frac{1}{2}h$ for the left boundary of the first interval.
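The construction of an interval series by the Sturges formula can be sketched in Python as follows (rounding the number of intervals up is one possible convention, not fixed by the formula itself):

```python
import math

def interval_series(sample):
    # Interval length h by the Sturges formula (21.1.2.2).
    n = len(sample)
    h = (max(sample) - min(sample)) / (1 + math.log2(n))
    # Recommended left boundary of the first interval: X_(1) - h/2.
    left = min(sample) - h / 2
    # Number of class intervals needed to cover the whole sample.
    L = math.ceil((max(sample) - left) / h)
    counts = [0] * L
    for x in sample:
        i = min(int((x - left) // h), L - 1)  # index of the bin containing x
        counts[i] += 1
    intervals = [(left + i * h, left + (i + 1) * h) for i in range(L)]
    return list(zip(intervals, counts))
```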
21.1.2-4 Relative frequencies
Let H be the event that the value of a random variable X belongs to a set $S_H$. Suppose also that a random sample $X_1, \dots, X_n$ is given. The number $n_H$ of sample elements lying in $S_H$ is called the frequency of the event H. The ratio of the frequency $n_H$ to the sample size is called the relative frequency and is denoted by

$$ p^*_H = \frac{n_H}{n}. \qquad (21.1.2.3) $$

Since a random sample can be treated as the result of a sequence of n Bernoulli trials (Paragraph 20.1.3-2), it follows that the random variable $n_H$ has the binomial distribution with parameter p = P(H), where P(H) is the probability of the event H. One has

$$ E\{p^*_H\} = P(H), \qquad \operatorname{Var}\{p^*_H\} = \frac{P(H)[1 - P(H)]}{n}. \qquad (21.1.2.4) $$

The relative frequency $p^*_H$ is an unbiased consistent estimator for the corresponding probability P(H). As $n \to \infty$, the estimator $p^*_H$ is asymptotically normal with the parameters (21.1.2.4).
Let $H_i$ ($i = 1, 2, \dots, L$) be the random events that the random variable takes the value $Z_i$ (in the discrete case) or lies in the ith interval of the interval series (in the continuous case), and let $n_i$ and $p^*_i$ be their frequencies and relative frequencies, respectively. The cumulative frequencies $N_l$ are determined by the formula

$$ N_l = \sum_{i=1}^{l} n_i. \qquad (21.1.2.5) $$

The cumulative relative frequencies $W_l$ are given by the expression

$$ W_l = \sum_{i=1}^{l} p^*_i = \frac{N_l}{n}. \qquad (21.1.2.6) $$
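Formulas (21.1.2.3)–(21.1.2.6) translate directly into code; a minimal Python sketch for the frequencies of a hypothetical statistical series:

```python
from itertools import accumulate

frequencies = [4, 4, 2]        # n_1, ..., n_L
n = sum(frequencies)           # sample size

rel_frequencies = [n_i / n for n_i in frequencies]  # p*_i, formula (21.1.2.3)
cum_frequencies = list(accumulate(frequencies))     # N_l,  formula (21.1.2.5)
cum_rel = [N_l / n for N_l in cum_frequencies]      # W_l,  formula (21.1.2.6)

print(rel_frequencies)  # [0.4, 0.4, 0.2]
print(cum_frequencies)  # [4, 8, 10]
print(cum_rel)          # [0.4, 0.8, 1.0]
```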
21.1.2-5 Notion of statistic
To make justified statistical conclusions, one needs a sample of sufficiently large size n. Obviously, it is rather difficult to use and store such samples. The notion of statistic allows one to avoid these problems.

A statistic $S = (S_1, \dots, S_k)$ is an arbitrary k-dimensional function of the sample $X_1, \dots, X_n$:

$$ S_i = S_i(X_1, \dots, X_n) \quad (i = 1, 2, \dots, k). \qquad (21.1.2.7) $$

Being a function of the random vector $(X_1, \dots, X_n)$, the statistic $S = (S_1, \dots, S_k)$ is also a random vector, and its distribution function

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = P(S_1 < x_1, \dots, S_k < x_k) $$

is given by the formula

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = \sum \cdots \sum P(y_1) \cdots P(y_n) $$

for a discrete random variable X and by the formula

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = \int \cdots \int p(y_1) \cdots p(y_n)\, dy_1 \cdots dy_n $$

for a continuous random variable, where the summation or integration extends over all possible values $y_1, \dots, y_n$ (in the discrete case, each $y_i$ belongs to the set $Z_1, \dots, Z_L$) satisfying the inequalities

$$ S_1(y_1, \dots, y_n) < x_1, \quad S_2(y_1, \dots, y_n) < x_2, \quad \dots, \quad S_k(y_1, \dots, y_n) < x_k. $$
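In the discrete case, this distribution function can be computed by direct enumeration of all possible samples; a small Python sketch with a hypothetical three-point distribution and the sample mean as the statistic:

```python
from itertools import product

values = [0, 1, 2]                 # possible values Z_1, Z_2, Z_3
probs = {0: 0.5, 1: 0.3, 2: 0.2}   # their probabilities P(Z_i)
n = 3                              # sample size

def S(ys):
    # An example of a (one-dimensional) statistic: the sample mean.
    return sum(ys) / len(ys)

def cdf_of_statistic(x):
    # Sum P(y_1) * ... * P(y_n) over all tuples with S(y_1,...,y_n) < x.
    total = 0.0
    for ys in product(values, repeat=n):
        if S(ys) < x:
            p = 1.0
            for y in ys:
                p *= probs[y]
            total += p
    return total

print(cdf_of_statistic(1.0))  # P(sample mean < 1)
```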
21.1.2-6 Empirical distribution function
The empirical (sample) distribution function corresponding to a random sample $X_1, \dots, X_n$ is defined for each real x by the formula

$$ F^*_n(x) = \frac{\mu_n(X_1, \dots, X_n; x)}{n}, \qquad (21.1.2.8) $$

where $\mu_n(X_1, \dots, X_n; x)$ is the number of sample elements whose values are less than x. It is a nondecreasing step function such that $F^*_n(-\infty) = 0$ and $F^*_n(+\infty) = 1$. Since each $X_i$ is less than x with probability $p_x = F(x)$, while the $X_i$ themselves are independent, it follows that $\mu_n(X_1, \dots, X_n; x)$ is an integer random variable distributed according to the binomial law

$$ P(\mu_n(X_1, \dots, X_n; x) = k) = C_n^k [F(x)]^k [1 - F(x)]^{n-k}, \qquad (21.1.2.9) $$

with $E\{F^*_n(x)\} = F(x)$ and $\operatorname{Var}\{F^*_n(x)\} = F(x)[1 - F(x)]/n$. By the Glivenko–Cantelli theorem,

$$ D_n = \sup_x |F^*_n(x) - F(x)| \xrightarrow{\text{a.s.}} 0 $$

as $n \to \infty$; i.e., the variable $D_n$ converges to 0 with probability 1, or almost surely (see Paragraph 20.3.1-2). The random variable $D_n$ measures how close $F^*_n(x)$ and $F(x)$ are. The empirical distribution function $F^*_n(x)$ is an unbiased consistent estimator of the theoretical distribution function.

If a sample is given by a statistical series, then the following formula can be used:

$$ F^*(x) = \sum_{Z_i < x} p^*_i. \qquad (21.1.2.10) $$

It is convenient to construct the empirical distribution function $F^*_n(x)$ using the series of order statistics $X_{(1)} \le \dots \le X_{(n)}$. In this case,

$$ F^*_n(x) = \begin{cases} 0 & \text{if } x \le X_{(1)}, \\ k/n & \text{if } X_{(k)} < x \le X_{(k+1)}, \\ 1 & \text{if } x > X_{(n)}; \end{cases} \qquad (21.1.2.11) $$

i.e., the function $F^*_n(x)$ is constant on each interval $(X_{(k)}, X_{(k+1)}]$ and increases by $1/n$ at the point $X_{(k)}$.
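The empirical distribution function and the variable $D_n$ are easy to compute; a Python sketch, assuming for the illustration that the theoretical distribution is standard normal:

```python
import bisect
import math
import random

def ecdf(sample):
    # F*_n(x): the fraction of sample elements less than x, cf. (21.1.2.8).
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_left(xs, x) / n

def F(x):
    # Theoretical CDF (standard normal, assumed for this illustration).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
Fn = ecdf(sample)
print(Fn(0.0))  # should be close to F(0) = 0.5

# D_n = sup_x |F*_n(x) - F(x)|; the supremum is attained at sample points,
# where F*_n jumps from k/n to (k + 1)/n.
n, xs = len(sample), sorted(sample)
D_n = max(max(abs(k / n - F(x)), abs((k + 1) / n - F(x)))
          for k, x in enumerate(xs))
print(D_n)  # close to 0 for large n (Glivenko-Cantelli)
```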
21.1.2-7 Graphical representation of statistical distribution
1◦. A broken line passing through the points with coordinates $(Z_i, n_i)$ ($i = 1, 2, \dots, L$), where $Z_i$ are the variate values in a statistical series and $n_i$ are the corresponding frequencies, is called the frequency polygon or a distribution polygon.

If the relative frequencies $p^*_1 = n_1/n, \dots, p^*_L = n_L/n$ are used instead of the frequencies $n_i$ ($n_1 + \dots + n_L = n$), then the polygon is called the relative frequency polygon.
Example 1. For the statistical series with variate values $Z_1 < Z_2 < \dots < Z_6$ and the corresponding relative frequencies

$p^*_j$: 0.1, 0.15, 0.3, 0.25, 0.15, 0.05 ($j = 1, \dots, 6$),

the relative frequency polygon has the form shown in Fig. 21.1.
[Figure 21.1. Example of a relative frequency polygon: the broken line through the points $(Z_j, p^*_j)$, $j = 1, \dots, 6$; vertical axis $p^*$.]
2◦. The bar graph consisting of rectangles whose bases are class intervals of length $\Delta_i = x_{i+1} - x_i$ and whose heights are equal to the frequency densities $n_i/\Delta_i$ is called the frequency histogram. The area of a frequency histogram is equal to the size of the corresponding random sample.

The bar graph consisting of rectangles whose bases are class intervals of length $\Delta_i = x_{i+1} - x_i$ and whose heights are equal to the relative frequency densities $p^*_i/\Delta_i = n_i/(n\Delta_i)$ is called the relative frequency histogram. The area of the relative frequency histogram is equal to 1. The relative frequency histogram is an estimator of the probability density.
Example 2. For an interval series with class intervals [0, 5), [5, 10), [10, 15), [15, 20), [20, 25) and the corresponding relative frequencies, the relative frequency histogram has the form shown in Fig. 21.2.

[Figure 21.2. Example of a relative frequency histogram; vertical axis $p^*_i/\Delta_i$, horizontal axis x.]
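A relative frequency histogram is readily computed without plotting; a minimal Python sketch (the sample values are hypothetical, with class intervals of length $\Delta = 5$ as in Example 2):

```python
def rel_freq_histogram(sample, edges):
    # Heights p*_i / Delta_i = n_i / (n * Delta_i) over each class interval;
    # the total area of the rectangles equals 1.
    n = len(sample)
    heights = []
    for a, b in zip(edges[:-1], edges[1:]):
        n_i = sum(1 for x in sample if a <= x < b)
        heights.append(n_i / (n * (b - a)))
    return heights

sample = [1, 3, 4, 7, 8, 9, 12, 14, 16, 22]   # hypothetical observations
edges = [0, 5, 10, 15, 20, 25]                # class interval boundaries
heights = rel_freq_histogram(sample, edges)
print(heights)                      # [0.06, 0.06, 0.04, 0.02, 0.02]
print(sum(h * 5 for h in heights))  # total area = 1.0
```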
21.1.2-8 Main distributions of mathematical statistics
The normal distribution, the chi-square distribution, and the Student distribution were considered in Paragraphs 20.2.4-3, 20.2.4-5, and 20.2.4-6, respectively.
1◦. A random variable Ψ has a Fisher–Snedecor distribution, or an F-distribution, with $n_1$ and $n_2$ degrees of freedom if

$$ \Psi = \frac{n_2 \chi_1^2}{n_1 \chi_2^2}, \qquad (21.1.2.12) $$

where $\chi_1^2$ and $\chi_2^2$ are independent random variables obeying the chi-square distribution with $n_1$ and $n_2$ degrees of freedom. The F-distribution is characterized by the probability density function

$$ \Psi(x) = \frac{\Gamma\!\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\, n_1^{n_1/2}\, n_2^{n_2/2}\, x^{n_1/2 - 1}\, (n_2 + n_1 x)^{-\frac{n_1 + n_2}{2}} \quad (x > 0), \qquad (21.1.2.13) $$

where Γ(x) is the gamma function. The quantiles of the F-distribution are usually denoted by $\varphi_\alpha$.
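The density (21.1.2.13) can be evaluated directly with the gamma function from the standard library; a minimal Python sketch:

```python
import math

def f_density(x, n1, n2):
    # Probability density of the F-distribution, formula (21.1.2.13).
    if x <= 0:
        return 0.0
    c = math.gamma((n1 + n2) / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    return (c * n1 ** (n1 / 2) * n2 ** (n2 / 2)
            * x ** (n1 / 2 - 1) * (n2 + n1 * x) ** (-(n1 + n2) / 2))

print(f_density(1.0, 5, 10))  # density of the F(5, 10) distribution at x = 1
```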
2◦. The Kolmogorov distribution function has the form

$$ K(x) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2 x^2} \quad (x > 0). \qquad (21.1.2.14) $$

The Kolmogorov distribution is the distribution of the random variable $\eta = \max_{0 \le t \le 1} |\xi(t)|$, where ξ(t) is a Wiener process on the interval $0 \le t \le 1$ with fixed endpoints ξ(0) = 0 and ξ(1) = 0.
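The series (21.1.2.14) converges very rapidly, so a short truncation suffices for numerical evaluation; a Python sketch:

```python
import math

def kolmogorov_cdf(x, terms=100):
    # K(x) by truncating the series (21.1.2.14) at |k| <= terms.
    if x <= 0:
        return 0.0
    return sum((-1) ** k * math.exp(-2 * k * k * x * x)
               for k in range(-terms, terms + 1))

print(kolmogorov_cdf(1.36))  # approximately 0.95
```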
21.1.3 Numerical Characteristics of Statistical Distribution
21.1.3-1 Sample moments
The kth sample moment of a random sample $X_1, \dots, X_n$ is defined as

$$ \alpha^*_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k. \qquad (21.1.3.1) $$

The kth sample central moment of a random sample $X_1, \dots, X_n$ is defined as

$$ \mu^*_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \alpha^*_1)^k. \qquad (21.1.3.2) $$

The sample moments satisfy the following formulas:
$$ E\{\alpha^*_k\} = \alpha_k, \qquad \operatorname{Var}\{\alpha^*_k\} = \frac{\alpha_{2k} - \alpha_k^2}{n}, \qquad (21.1.3.3) $$

$$ E\{\mu^*_k\} = \mu_k + O(1/n), \qquad \operatorname{Var}\{\mu^*_k\} = \frac{\mu_{2k} - 2k\mu_{k-1}\mu_{k+1} - \mu_k^2 + k^2 \mu_2 \mu_{k-1}^2}{n} + O(1/n^2). \qquad (21.1.3.4) $$

The sample moment $\alpha^*_k$ is an unbiased consistent estimator of the corresponding population moment $\alpha_k$. The sample central moment $\mu^*_k$ is a biased consistent estimator of the corresponding population central moment $\mu_k$.

If there exists a moment $\alpha_{2k}$, then the sample moment $\alpha^*_k$ is asymptotically normally distributed with parameters $(\alpha_k, (\alpha_{2k} - \alpha_k^2)/n)$ as $n \to \infty$.
Unbiased consistent estimators for $\mu_3$ and $\mu_4$ are given by

$$ \mu^{**}_3 = \frac{n^2 \mu^*_3}{(n-1)(n-2)}, \qquad \mu^{**}_4 = \frac{n(n^2 - 2n + 3)\,\mu^*_4 - 3n(2n - 3)\,(\mu^*_2)^2}{(n-1)(n-2)(n-3)}. \qquad (21.1.3.5) $$
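A Python sketch of the sample moments and of the unbiased estimator for $\mu_3$ from (21.1.3.5) (the sample values are hypothetical):

```python
def sample_moment(sample, k):
    # kth sample moment alpha*_k, formula (21.1.3.1).
    return sum(x ** k for x in sample) / len(sample)

def sample_central_moment(sample, k):
    # kth sample central moment mu*_k, formula (21.1.3.2).
    a1 = sample_moment(sample, 1)
    return sum((x - a1) ** k for x in sample) / len(sample)

def unbiased_mu3(sample):
    # Unbiased consistent estimator for mu_3, formula (21.1.3.5).
    n = len(sample)
    return n ** 2 * sample_central_moment(sample, 3) / ((n - 1) * (n - 2))

sample = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4]
print(sample_moment(sample, 2))          # alpha*_2
print(sample_central_moment(sample, 2))  # mu*_2 (biased estimator of mu_2)
print(unbiased_mu3(sample))
```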
21.1.3-2 Sample mean
The sample mean of a random sample $X_1, \dots, X_n$ is defined as the first-order sample moment, i.e.,

$$ m^* = \alpha^*_1 = \frac{1}{n} \sum_{i=1}^{n} X_i. \qquad (21.1.3.6) $$

The sample mean of a random sample $X_1, \dots, X_n$ is also denoted by $\bar{X}$. It satisfies the following formulas:

$$ E\{m^*\} = m \ \ (m = \alpha_1), \qquad \operatorname{Var}\{m^*\} = \frac{\sigma^2}{n}, \qquad (21.1.3.7) $$

$$ E\{(m^* - m)^3\} = \frac{\mu_3}{n^2}, \qquad E\{(m^* - m)^4\} = \frac{3(n-1)\sigma^4 + \mu_4}{n^3}. \qquad (21.1.3.8) $$

The sample mean $m^*$ is an unbiased consistent estimator of the population expectation $E\{X\} = m$. If the population variance $\sigma^2$ exists, then the sample mean $m^*$ is asymptotically normally distributed with parameters $(m, \sigma^2/n)$.
The sample mean for the function Y = f(X) of a random variable X is

$$ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} f(X_i). $$
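A Python sketch of the sample mean, including the sample mean of a function Y = f(X) (the sample values are hypothetical):

```python
import math

def sample_mean(sample, f=lambda x: x):
    # (1/n) * sum of f(X_i); with the default f this is the ordinary
    # sample mean m* = alpha*_1, formula (21.1.3.6).
    return sum(f(x) for x in sample) / len(sample)

sample = [1.0, 2.0, 4.0, 8.0]
print(sample_mean(sample))            # m* = 3.75
print(sample_mean(sample, math.log))  # sample mean of Y = ln X
```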