Mathematical Statistics
21.1 Introduction to Mathematical Statistics
21.1.1 Basic Notions and Problems of Mathematical Statistics
21.1.1-1 Basic problems of mathematical statistics
The term “statistics” derives from the Latin word “status,” meaning “state.” Statistics comprises three major divisions: collection of statistical data, their statistical analysis, and development of mathematical methods for processing and using the statistical data to draw scientific and practical conclusions. It is the last division that is commonly known as mathematical statistics.
The original material for a statistical study is a set of results specially gathered for this study or a set of results of specially performed experiments. The following problems arise in this connection.
1. Estimating the unknown probability of a random event.

2. Finding the unknown theoretical distribution function.

The problem is stated as follows. Given the values $x_1, \dots, x_n$ of a random variable X obtained in n independent trials, find, at least approximately, the unknown distribution function F(x) of the random variable X.
3. Determining the unknown parameters of the theoretical distribution function.

The problem is stated as follows. A random variable X has the distribution function $F(x; \theta_1, \dots, \theta_k)$ depending on k parameters $\theta_1, \dots, \theta_k$, whose values are unknown. The main goal is to estimate the unknown parameters $\theta_1, \dots, \theta_k$ using only the results $X_1, \dots, X_n$ of observations of the random variable X.

Instead of seeking approximate values of the unknown parameters $\theta_1, \dots, \theta_k$ in the form of functions $\theta^*_1, \dots, \theta^*_k$, in a number of problems it is preferable to seek functions $\theta^*_{i,\mathrm{L}}$ and $\theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$) depending on the results of observations and known variables and such that with sufficient reliability one can claim that $\theta^*_{i,\mathrm{L}} < \theta_i < \theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$). The functions $\theta^*_{i,\mathrm{L}}$ and $\theta^*_{i,\mathrm{R}}$ ($i = 1, 2, \dots, k$) are called the confidence boundaries for $\theta_1, \dots, \theta_k$.
4. Testing statistical hypotheses.

The problem is stated as follows. Some reasoning suggests that the distribution function of a random variable X is F(x); the question is whether the observed values are compatible with the hypothesis to be tested that the random variable X has the distribution F(x).
5. Estimation of dependence.

A sequence of observations is performed simultaneously for two random variables X and Y. The results of observations are given by pairs of values $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. It is required to find a functional or correlation relationship between X and Y.
21.1.1-2 Population and sample
The set of all possible results of observations that can be made under a given set of conditions is called the population. In some problems, the population is treated as a random variable X.
An example of a population is the entire population of a country. In this population we are, for example, interested in the age of people. Another example of a population is the set of parts produced by a given machine. These parts can be either accepted or rejected.

The number of entities in a population is called its size and is usually denoted by the symbol N.
A set of entities randomly selected from a population is called a sample. A sample must be representative of the population; i.e., it must show the right proportions characteristic of the population. This is achieved by the randomness of the selection, when all entities in the population can be selected with the same probability.

The number of elements in a sample is called its size and is usually denoted by the symbol n. The elements of a sample will be denoted by $X_1, \dots, X_n$.
Note that sampling itself can be performed by various methods. Having selected an element and measured its value, one can delete this element from the population so that it cannot be selected in subsequent trials (sampling without replacement). Alternatively, after measuring the value of an element, one can return it to the population (sampling with replacement). Obviously, for a sufficiently large population size the difference between sampling with and without replacement disappears.
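As an illustration, the following Python sketch contrasts the two sampling methods using only the standard library (the population of integers is hypothetical, chosen for the example):

```python
import random

population = list(range(1, 101))  # a hypothetical population of size N = 100

# Sampling without replacement: a selected element cannot recur.
sample_without = random.sample(population, k=10)

# Sampling with replacement: every draw is made from the full population,
# so repeated elements are possible.
sample_with = random.choices(population, k=10)

print(sample_without)  # 10 distinct elements
print(sample_with)     # elements may repeat
```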
21.1.1-3 Theoretical distribution function
Each element $X_i$ in a sample has the distribution function F(x), and the elements $X_1, \dots, X_n$ are assumed to be independent for sampling with replacement or, for sampling without replacement, approximately as the population size $N \to \infty$. A sample $X_1, \dots, X_n$ is interpreted as a set of n independent identically distributed random variables with distribution function F(x) or as n independent realizations of an observable random variable X with distribution function F(x). The distribution function F(x) is called the theoretical distribution function.

The joint distribution function $F_{X_1,\dots,X_n}(x_1, \dots, x_n)$ of the sample $X_1, \dots, X_n$ is given by the formula

$$ F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 < x_1, \dots, X_n < x_n) = F(x_1) F(x_2) \cdots F(x_n). \qquad (21.1.1.1) $$
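Formula (21.1.1.1) can be evaluated numerically; a minimal Python sketch, assuming for definiteness that F(x) is the standard normal distribution function:

```python
import math

def F(x):
    # Standard normal CDF (the assumed theoretical distribution function).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_cdf(points):
    # Formula (21.1.1.1): F(x1) * F(x2) * ... * F(xn) for an i.i.d. sample.
    result = 1.0
    for x in points:
        result *= F(x)
    return result

# P(X1 < 1, X2 < 0.5, X3 < 2) for three independent observations.
print(joint_cdf([1.0, 0.5, 2.0]))
```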
21.1.2 Simplest Statistical Transformations
21.1.2-1 Series of order statistics
By arranging the elements of a sample $X_1, \dots, X_n$ in ascending order, $X_{(1)} \le \dots \le X_{(n)}$, we obtain the series of order statistics $X_{(1)}, \dots, X_{(n)}$. Obviously, this transformation does not lead to a loss of information about the theoretical distribution function. The variables $X_{(1)}$ and $X_{(n)}$ are called the extreme order statistics.

The difference

$$ R = X_{(n)} - X_{(1)} \qquad (21.1.2.1) $$

of the extreme order statistics is called the range statistic, or the sample range R.
The series of order statistics is used to construct the empirical distribution function (see Paragraph 21.1.2-6).
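A minimal Python sketch of these transformations (the sample values are hypothetical):

```python
sample = [3.1, 0.7, 2.4, 5.9, 1.8]

# Series of order statistics X_(1) <= ... <= X_(n).
order_stats = sorted(sample)

# Extreme order statistics and the sample range, formula (21.1.2.1).
x_min, x_max = order_stats[0], order_stats[-1]
R = x_max - x_min

print(order_stats)  # [0.7, 1.8, 2.4, 3.1, 5.9]
print(R)            # 5.2
```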
21.1.2-2 Statistical series
If a sample $X_1, \dots, X_n$ contains coinciding elements, which may happen in observations of a discrete random variable, then it is expedient to group the elements. For a common value of several variates in the sample $X_1, \dots, X_n$, the size of the corresponding group of coinciding elements is called the frequency or the weight of this variate value. By $n_i$ we denote the number of occurrences of the ith variate value.

The set $Z_1, \dots, Z_L$ of distinct variate values arranged in ascending order with the corresponding frequencies $n_1, \dots, n_L$ represents the sample $X_1, \dots, X_n$ and is called a statistical series (see Example 1 in Paragraph 21.1.2-7).
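A statistical series is easy to build with a frequency counter; a short Python sketch with a hypothetical sample:

```python
from collections import Counter

# Observations of a discrete random variable; coinciding elements occur.
sample = [2, 3, 2, 5, 3, 3, 2, 5, 2, 3]

# Distinct variate values Z_1 < ... < Z_L with frequencies n_1, ..., n_L.
frequencies = Counter(sample)
statistical_series = sorted(frequencies.items())

print(statistical_series)  # [(2, 4), (3, 4), (5, 2)]
```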
21.1.2-3 Interval series
Interval series are used in observations of continuous random variables. In this case, the entire sample range is divided into finitely many bins, or class intervals, and then the number of variates in each bin is calculated.

The ordered sequence of class intervals with the corresponding frequencies or relative frequencies of occurrences of variates in each of these intervals is called an interval series. It is convenient to represent an interval series as a table with two rows (e.g., see Example 2 in Paragraph 21.1.2-7). The first row of the table contains the class intervals $[x_0, x_1), [x_1, x_2), \dots, [x_{L-1}, x_L)$, which are usually chosen to have the same length. The interval length h is usually determined by the Sturges formula

$$ h = \frac{X_{(n)} - X_{(1)}}{1 + \log_2 n}, \qquad (21.1.2.2) $$

where $L = 1 + \log_2 n$ is the number of intervals ($\log_2 n \approx 3.322 \lg n$). The second row of the interval series contains the frequencies or relative frequencies of occurrences of the sample elements in each of these intervals.

Remark. It is recommended to take $X_{(1)} - \frac{1}{2}h$ for the left boundary of the first interval.
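The construction of an interval series by the Sturges formula can be sketched in Python as follows (rounding the number of intervals up is one possible convention, not fixed by the formula itself):

```python
import math

def interval_series(sample):
    # Interval length h by the Sturges formula (21.1.2.2).
    n = len(sample)
    h = (max(sample) - min(sample)) / (1 + math.log2(n))
    # Recommended left boundary of the first interval: X_(1) - h/2.
    left = min(sample) - h / 2
    # Number of class intervals needed to cover the whole sample.
    L = math.ceil((max(sample) - left) / h)
    counts = [0] * L
    for x in sample:
        i = min(int((x - left) // h), L - 1)  # index of the bin containing x
        counts[i] += 1
    intervals = [(left + i * h, left + (i + 1) * h) for i in range(L)]
    return list(zip(intervals, counts))
```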
21.1.2-4 Relative frequencies
Let H be the event that the value of a random variable X belongs to a set $S_H$. Suppose also that a random sample $X_1, \dots, X_n$ is given. The number $n_H$ of sample elements lying in $S_H$ is called the frequency of the event H. The ratio of the frequency $n_H$ to the sample size is called the relative frequency and is denoted by

$$ p^*_H = \frac{n_H}{n}. \qquad (21.1.2.3) $$

Since a random sample can be treated as the result of a sequence of n Bernoulli trials (Paragraph 20.1.3-2), it follows that the random variable $n_H$ has the binomial distribution with parameter p = P(H), where P(H) is the probability of the event H. One has

$$ E\{p^*_H\} = P(H), \qquad \operatorname{Var}\{p^*_H\} = \frac{P(H)[1 - P(H)]}{n}. \qquad (21.1.2.4) $$

The relative frequency $p^*_H$ is an unbiased consistent estimator for the corresponding probability P(H). As $n \to \infty$, the estimator $p^*_H$ is asymptotically normal with the parameters (21.1.2.4).
Let $H_i$ ($i = 1, 2, \dots, L$) be the random events that the random variable takes the value $Z_i$ (in the discrete case) or lies in the ith interval of the interval series (in the continuous case), and let $n_i$ and $p^*_i$ be their frequencies and relative frequencies, respectively. The cumulative frequencies $N_l$ are determined by the formula

$$ N_l = \sum_{i=1}^{l} n_i. \qquad (21.1.2.5) $$

The cumulative relative frequencies $W_l$ are given by the expression

$$ W_l = \sum_{i=1}^{l} p^*_i = \frac{N_l}{n}. \qquad (21.1.2.6) $$
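Formulas (21.1.2.3)–(21.1.2.6) translate directly into code; a minimal Python sketch for the frequencies of a hypothetical statistical series:

```python
from itertools import accumulate

frequencies = [4, 4, 2]        # n_1, ..., n_L
n = sum(frequencies)           # sample size

rel_frequencies = [n_i / n for n_i in frequencies]  # p*_i, formula (21.1.2.3)
cum_frequencies = list(accumulate(frequencies))     # N_l,  formula (21.1.2.5)
cum_rel = [N_l / n for N_l in cum_frequencies]      # W_l,  formula (21.1.2.6)

print(rel_frequencies)  # [0.4, 0.4, 0.2]
print(cum_frequencies)  # [4, 8, 10]
print(cum_rel)          # [0.4, 0.8, 1.0]
```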
21.1.2-5 Notion of statistic
To make justified statistical conclusions, one needs a sample of sufficiently large size n. Obviously, it is rather difficult to use and store such samples. The notion of statistic allows one to avoid these problems.

A statistic $S = (S_1, \dots, S_k)$ is an arbitrary k-dimensional function of the sample $X_1, \dots, X_n$:

$$ S_i = S_i(X_1, \dots, X_n) \quad (i = 1, 2, \dots, k). \qquad (21.1.2.7) $$

Being a function of the random vector $(X_1, \dots, X_n)$, the statistic $S = (S_1, \dots, S_k)$ is also a random vector, and its distribution function

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = P(S_1 < x_1, \dots, S_k < x_k) $$

is given by the formula

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = \sum \cdots \sum P(y_1) \cdots P(y_n) $$

for a discrete random variable X and by the formula

$$ F_{S_1,\dots,S_k}(x_1, \dots, x_k) = \int \cdots \int p(y_1) \cdots p(y_n)\, dy_1 \cdots dy_n $$

for a continuous random variable, where the summation or integration extends over all possible values $y_1, \dots, y_n$ (in the discrete case, each $y_i$ belongs to the set $Z_1, \dots, Z_L$) satisfying the inequalities

$$ S_1(y_1, \dots, y_n) < x_1, \quad S_2(y_1, \dots, y_n) < x_2, \quad \dots, \quad S_k(y_1, \dots, y_n) < x_k. $$
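In the discrete case, this distribution function can be computed by direct enumeration of all possible samples; a small Python sketch with a hypothetical three-point distribution and the sample mean as the statistic:

```python
from itertools import product

values = [0, 1, 2]                 # possible values Z_1, Z_2, Z_3
probs = {0: 0.5, 1: 0.3, 2: 0.2}   # their probabilities P(Z_i)
n = 3                              # sample size

def S(ys):
    # An example of a (one-dimensional) statistic: the sample mean.
    return sum(ys) / len(ys)

def cdf_of_statistic(x):
    # Sum P(y_1) * ... * P(y_n) over all tuples with S(y_1,...,y_n) < x.
    total = 0.0
    for ys in product(values, repeat=n):
        if S(ys) < x:
            p = 1.0
            for y in ys:
                p *= probs[y]
            total += p
    return total

print(cdf_of_statistic(1.0))  # P(sample mean < 1)
```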
21.1.2-6 Empirical distribution function
The empirical (sample) distribution function corresponding to a random sample $X_1, \dots, X_n$ is defined for each real x by the formula

$$ F^*_n(x) = \frac{\mu_n(X_1, \dots, X_n; x)}{n}, \qquad (21.1.2.8) $$

where $\mu_n(X_1, \dots, X_n; x)$ is the number of sample elements whose values are less than x. It is a nondecreasing step function such that $F^*_n(-\infty) = 0$ and $F^*_n(+\infty) = 1$. Since each $X_i$ is less than x with probability $p_x = F(x)$, while the $X_i$ themselves are independent, it follows that $\mu_n(X_1, \dots, X_n; x)$ is an integer random variable distributed according to the binomial law

$$ P(\mu_n(X_1, \dots, X_n; x) = k) = C_n^k [F(x)]^k [1 - F(x)]^{n-k}, \qquad (21.1.2.9) $$

with $E\{F^*_n(x)\} = F(x)$ and $\operatorname{Var}\{F^*_n(x)\} = F(x)[1 - F(x)]/n$. By the Glivenko–Cantelli theorem,

$$ D_n = \sup_x |F^*_n(x) - F(x)| \xrightarrow{\text{a.s.}} 0 $$

as $n \to \infty$; i.e., the variable $D_n$ converges to 0 with probability 1, or almost surely (see Paragraph 20.3.1-2). The random variable $D_n$ measures how close $F^*_n(x)$ and $F(x)$ are. The empirical distribution function $F^*_n(x)$ is an unbiased consistent estimator of the theoretical distribution function.

If a sample is given by a statistical series, then the following formula can be used:

$$ F^*(x) = \sum_{Z_i < x} p^*_i. \qquad (21.1.2.10) $$

It is convenient to construct the empirical distribution function $F^*_n(x)$ using the series of order statistics $X_{(1)} \le \dots \le X_{(n)}$. In this case,

$$ F^*_n(x) = \begin{cases} 0 & \text{if } x \le X_{(1)}, \\ k/n & \text{if } X_{(k)} < x \le X_{(k+1)}, \\ 1 & \text{if } x > X_{(n)}; \end{cases} \qquad (21.1.2.11) $$

i.e., the function $F^*_n(x)$ is constant on each interval $(X_{(k)}, X_{(k+1)}]$ and increases by $1/n$ at the point $X_{(k)}$.
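The empirical distribution function and the variable $D_n$ are easy to compute; a Python sketch, assuming for the illustration that the theoretical distribution is standard normal:

```python
import bisect
import math
import random

def ecdf(sample):
    # F*_n(x): the fraction of sample elements less than x, cf. (21.1.2.8).
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_left(xs, x) / n

def F(x):
    # Theoretical CDF (standard normal, assumed for this illustration).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
Fn = ecdf(sample)
print(Fn(0.0))  # should be close to F(0) = 0.5

# D_n = sup_x |F*_n(x) - F(x)|; the supremum is attained at sample points,
# where F*_n jumps from k/n to (k + 1)/n.
n, xs = len(sample), sorted(sample)
D_n = max(max(abs(k / n - F(x)), abs((k + 1) / n - F(x)))
          for k, x in enumerate(xs))
print(D_n)  # close to 0 for large n (Glivenko-Cantelli)
```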
21.1.2-7 Graphical representation of statistical distribution
1◦. A broken line passing through the points with coordinates $(Z_i, n_i)$ ($i = 1, 2, \dots, L$), where $Z_i$ are the variate values in a statistical series and $n_i$ are the corresponding frequencies, is called the frequency polygon or a distribution polygon.

If the relative frequencies $p^*_1 = n_1/n, \dots, p^*_L = n_L/n$ are used instead of the frequencies $n_i$ ($n_1 + \dots + n_L = n$), then the polygon is called the relative frequency polygon.
Example 1. For the statistical series with variate values $Z_1 < Z_2 < \dots < Z_6$ and the corresponding relative frequencies

$p^*_j$: 0.1, 0.15, 0.3, 0.25, 0.15, 0.05 ($j = 1, \dots, 6$),

the relative frequency polygon has the form shown in Fig. 21.1.
[Figure 21.1. Example of a relative frequency polygon: the broken line through the points $(Z_j, p^*_j)$, $j = 1, \dots, 6$; vertical axis $p^*$.]
2◦. The bar graph consisting of rectangles whose bases are class intervals of length $\Delta_i = x_{i+1} - x_i$ and whose heights are equal to the frequency densities $n_i/\Delta_i$ is called the frequency histogram. The area of a frequency histogram is equal to the size of the corresponding random sample.

The bar graph consisting of rectangles whose bases are class intervals of length $\Delta_i = x_{i+1} - x_i$ and whose heights are equal to the relative frequency densities $p^*_i/\Delta_i = n_i/(n\Delta_i)$ is called the relative frequency histogram. The area of the relative frequency histogram is equal to 1. The relative frequency histogram is an estimator of the probability density.
Example 2. For an interval series with class intervals [0, 5), [5, 10), [10, 15), [15, 20), [20, 25) and the corresponding relative frequencies, the relative frequency histogram has the form shown in Fig. 21.2.

[Figure 21.2. Example of a relative frequency histogram; vertical axis $p^*_i/\Delta_i$, horizontal axis x.]
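A relative frequency histogram is readily computed without plotting; a minimal Python sketch (the sample values are hypothetical, with class intervals of length $\Delta = 5$ as in Example 2):

```python
def rel_freq_histogram(sample, edges):
    # Heights p*_i / Delta_i = n_i / (n * Delta_i) over each class interval;
    # the total area of the rectangles equals 1.
    n = len(sample)
    heights = []
    for a, b in zip(edges[:-1], edges[1:]):
        n_i = sum(1 for x in sample if a <= x < b)
        heights.append(n_i / (n * (b - a)))
    return heights

sample = [1, 3, 4, 7, 8, 9, 12, 14, 16, 22]   # hypothetical observations
edges = [0, 5, 10, 15, 20, 25]                # class interval boundaries
heights = rel_freq_histogram(sample, edges)
print(heights)                      # [0.06, 0.06, 0.04, 0.02, 0.02]
print(sum(h * 5 for h in heights))  # total area = 1.0
```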
21.1.2-8 Main distributions of mathematical statistics
The normal distribution, the chi-square distribution, and the Student distribution were considered in Paragraphs 20.2.4-3, 20.2.4-5, and 20.2.4-6, respectively.
1◦. A random variable Ψ has a Fisher–Snedecor distribution, or an F-distribution, with $n_1$ and $n_2$ degrees of freedom if

$$ \Psi = \frac{n_2 \chi_1^2}{n_1 \chi_2^2}, \qquad (21.1.2.12) $$

where $\chi_1^2$ and $\chi_2^2$ are independent random variables obeying the chi-square distribution with $n_1$ and $n_2$ degrees of freedom. The F-distribution is characterized by the probability density function

$$ \Psi(x) = \frac{\Gamma\!\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\, n_1^{n_1/2}\, n_2^{n_2/2}\, x^{n_1/2 - 1}\, (n_2 + n_1 x)^{-\frac{n_1 + n_2}{2}} \quad (x > 0), \qquad (21.1.2.13) $$

where Γ(x) is the gamma function. The quantiles of the F-distribution are usually denoted by $\varphi_\alpha$.
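The density (21.1.2.13) can be evaluated directly with the gamma function from the standard library; a minimal Python sketch:

```python
import math

def f_density(x, n1, n2):
    # Probability density of the F-distribution, formula (21.1.2.13).
    if x <= 0:
        return 0.0
    c = math.gamma((n1 + n2) / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    return (c * n1 ** (n1 / 2) * n2 ** (n2 / 2)
            * x ** (n1 / 2 - 1) * (n2 + n1 * x) ** (-(n1 + n2) / 2))

print(f_density(1.0, 5, 10))  # density of the F(5, 10) distribution at x = 1
```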
2◦. The Kolmogorov distribution function has the form

$$ K(x) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2 x^2} \quad (x > 0). \qquad (21.1.2.14) $$

The Kolmogorov distribution is the distribution of the random variable $\eta = \max_{0 \le t \le 1} |\xi(t)|$, where ξ(t) is a Wiener process on the interval $0 \le t \le 1$ with fixed endpoints ξ(0) = 0 and ξ(1) = 0.
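The series (21.1.2.14) converges very rapidly, so a short truncation suffices for numerical evaluation; a Python sketch:

```python
import math

def kolmogorov_cdf(x, terms=100):
    # K(x) by truncating the series (21.1.2.14) at |k| <= terms.
    if x <= 0:
        return 0.0
    return sum((-1) ** k * math.exp(-2 * k * k * x * x)
               for k in range(-terms, terms + 1))

print(kolmogorov_cdf(1.36))  # approximately 0.95
```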
21.1.3 Numerical Characteristics of Statistical Distribution
21.1.3-1 Sample moments
The kth sample moment of a random sample $X_1, \dots, X_n$ is defined as

$$ \alpha^*_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k. \qquad (21.1.3.1) $$

The kth sample central moment of a random sample $X_1, \dots, X_n$ is defined as

$$ \mu^*_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \alpha^*_1)^k. \qquad (21.1.3.2) $$

The sample moments satisfy the following formulas:
$$ E\{\alpha^*_k\} = \alpha_k, \qquad \operatorname{Var}\{\alpha^*_k\} = \frac{\alpha_{2k} - \alpha_k^2}{n}, \qquad (21.1.3.3) $$

$$ E\{\mu^*_k\} = \mu_k + O(1/n), \qquad \operatorname{Var}\{\mu^*_k\} = \frac{\mu_{2k} - 2k\mu_{k-1}\mu_{k+1} - \mu_k^2 + k^2 \mu_2 \mu_{k-1}^2}{n} + O(1/n^2). \qquad (21.1.3.4) $$

The sample moment $\alpha^*_k$ is an unbiased consistent estimator of the corresponding population moment $\alpha_k$. The sample central moment $\mu^*_k$ is a biased consistent estimator of the corresponding population central moment $\mu_k$.

If there exists a moment $\alpha_{2k}$, then the sample moment $\alpha^*_k$ is asymptotically normally distributed with parameters $(\alpha_k, (\alpha_{2k} - \alpha_k^2)/n)$ as $n \to \infty$.
Unbiased consistent estimators for $\mu_3$ and $\mu_4$ are given by

$$ \mu^{**}_3 = \frac{n^2 \mu^*_3}{(n-1)(n-2)}, \qquad \mu^{**}_4 = \frac{n(n^2 - 2n + 3)\,\mu^*_4 - 3n(2n - 3)\,(\mu^*_2)^2}{(n-1)(n-2)(n-3)}. \qquad (21.1.3.5) $$
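A Python sketch of the sample moments and of the unbiased estimator for $\mu_3$ from (21.1.3.5) (the sample values are hypothetical):

```python
def sample_moment(sample, k):
    # kth sample moment alpha*_k, formula (21.1.3.1).
    return sum(x ** k for x in sample) / len(sample)

def sample_central_moment(sample, k):
    # kth sample central moment mu*_k, formula (21.1.3.2).
    a1 = sample_moment(sample, 1)
    return sum((x - a1) ** k for x in sample) / len(sample)

def unbiased_mu3(sample):
    # Unbiased consistent estimator for mu_3, formula (21.1.3.5).
    n = len(sample)
    return n ** 2 * sample_central_moment(sample, 3) / ((n - 1) * (n - 2))

sample = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4]
print(sample_moment(sample, 2))          # alpha*_2
print(sample_central_moment(sample, 2))  # mu*_2 (biased estimator of mu_2)
print(unbiased_mu3(sample))
```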
21.1.3-2 Sample mean
The sample mean of a random sample $X_1, \dots, X_n$ is defined as the first-order sample moment, i.e.,

$$ m^* = \alpha^*_1 = \frac{1}{n} \sum_{i=1}^{n} X_i. \qquad (21.1.3.6) $$

The sample mean of a random sample $X_1, \dots, X_n$ is also denoted by $\bar{X}$. It satisfies the following formulas:

$$ E\{m^*\} = m \ \ (m = \alpha_1), \qquad \operatorname{Var}\{m^*\} = \frac{\sigma^2}{n}, \qquad (21.1.3.7) $$

$$ E\{(m^* - m)^3\} = \frac{\mu_3}{n^2}, \qquad E\{(m^* - m)^4\} = \frac{3(n-1)\sigma^4 + \mu_4}{n^3}. \qquad (21.1.3.8) $$

The sample mean $m^*$ is an unbiased consistent estimator of the population expectation $E\{X\} = m$. If the population variance $\sigma^2$ exists, then the sample mean $m^*$ is asymptotically normally distributed with parameters $(m, \sigma^2/n)$.
The sample mean for the function Y = f(X) of a random variable X is

$$ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} f(X_i). $$
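A Python sketch of the sample mean, including the sample mean of a function Y = f(X) (the sample values are hypothetical):

```python
import math

def sample_mean(sample, f=lambda x: x):
    # (1/n) * sum of f(X_i); with the default f this is the ordinary
    # sample mean m* = alpha*_1, formula (21.1.3.6).
    return sum(f(x) for x in sample) / len(sample)

sample = [1.0, 2.0, 4.0, 8.0]
print(sample_mean(sample))            # m* = 3.75
print(sample_mean(sample, math.log))  # sample mean of Y = ln X
```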