Lecture notes: Applied Statistics in Construction Management
Lê Hoài Long
1.1 Population and sample
1.2 Variables
1.3 Measures of data
1.4 Pattern of data
1.5 Scales of measurement
1.6 Between variables
2.1 Data collection
2.2 Probability
2.3 Variable
2.4 Sampling distribution
3.1 One sample estimation
3.2 One sample hypothesis testing
3.3 Two sample
3.4 Multisample estimation and testing
3.5 Nonparametric techniques
Part 2 – Section 3
VARIABLE AND DISTRIBUTION
When the numerical value of a variable is determined by a chance event, that variable is called a random variable.
Random variables can be discrete or continuous.
A probability distribution is a table or an equation that links each possible value that a random variable can assume with its probability of occurrence.
Two types:
Discrete Probability Distributions
Continuous Probability Distributions
Discrete Probability Distributions
The probability distribution of a discrete random variable can always be represented by a table.
Given a probability distribution, you can find cumulative probabilities.
Continuous Probability Distributions
The probability distribution of a continuous random variable is represented by an equation, called the probability density function (pdf).
All probability density functions satisfy the following conditions:
The random variable Y is a function of X; that is, y = f(x).
The value of y is greater than or equal to zero for all values of x.
The total area under the curve of the function is equal to one.
Continuous Probability Distributions
The probability that a continuous random variable falls in the interval between a and b is equal to the area under the pdf curve between a and b.
There are an infinite number of values between any two data points. As a result, the probability that a continuous random variable will assume a particular value is always zero.
Mean of a Discrete Random Variable
The mean of the discrete random variable X is also called the expected value of X, denoted by E(X):
$$E(X) = \mu_x = \sum_i x_i \, P(x_i)$$
where x_i is the value of the random variable for outcome i, μ_x is the mean of random variable X, and P(x_i) is the probability that the random variable will be outcome i.
Median of a Discrete Random Variable
The median of a discrete random variable is the "middle" value. It is the value x of X for which P(X ≤ x) is greater than or equal to 0.5 and P(X ≥ x) is greater than or equal to 0.5.
The number of hits made by each player is described by the following probability distribution:

Number of hits, x    0     1     2     3     4
Probability, P(x)    0.10  0.20  0.30  0.25  0.15

What is the mean of the probability distribution?
What is the median?
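The two questions can be checked numerically. The following is a minimal sketch, assuming the median is taken as the smallest value whose cumulative probability reaches 0.5; the variable names are illustrative and not part of the lecture.

```python
# Probability distribution of the number of hits (values from the table).
hits = [0, 1, 2, 3, 4]
probs = [0.10, 0.20, 0.30, 0.25, 0.15]

# Mean (expected value): sum of x_i * P(x_i).
mean = sum(x * p for x, p in zip(hits, probs))

# Median (assumed convention): smallest x with P(X <= x) >= 0.5.
cumulative = 0.0
median = None
for x, p in zip(hits, probs):
    cumulative += p
    if cumulative >= 0.5:
        median = x
        break

print(f"mean = {mean:.2f}, median = {median}")
```

Under these assumptions the mean is 2.15 hits and the median is 2 hits.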
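Variability of a Discrete Random Variable
The equation for computing the variance of a discrete random variable is:
$$\sigma^2 = \sum_i \left[ x_i - E(X) \right]^2 P(x_i)$$
where x_i is the value of the random variable for outcome i, P(x_i) is the probability of outcome i, and E(X) is the expected value of X.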
Sums and Differences of Random Variables:
If X and Y are random variables, then
$$E(X + Y) = E(X) + E(Y)$$
$$E(X - Y) = E(X) - E(Y)$$
where E(X) is the expected value (mean) of X, E(Y) is the expected value of Y, E(X + Y) is the expected value of X plus Y, and E(X - Y) is the expected value of X minus Y.
Sums and Differences of Random Variables:
If X and Y are independent, the variance of (X + Y) and the variance of (X - Y) are described by the following equation:
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$
where Var(X + Y) is the variance of the sum of X and Y, Var(X - Y) is the variance of the difference between X and Y, Var(X) is the variance of X, and Var(Y) is the variance of Y.
Independence of Random Variables
If two random variables, X and Y, are independent, they satisfy the following conditions (either one suffices):
$$P(x \mid y) = P(x) \quad \text{for all values of } X \text{ and } Y$$
$$P(x \cap y) = P(x) \, P(y) \quad \text{for all values of } X \text{ and } Y$$
Considering only these types of discrete distribution: binomial, negative binomial, geometric, multinomial, hypergeometric, and Poisson.
BINOMIAL DISTRIBUTION
Binomial Experiment
The experiment consists of n repeated trials
Each trial can result in just two possible outcomes (success and failure)
The probability of success, denoted by p, is the same on every trial
The trials are independent
Notation
x: The number of successes that result from the binomial experiment
n: The number of trials in the binomial experiment
p: The probability of success on an individual trial
q: The probability of failure on an individual trial (q = 1 - p)
f(x): Binomial probability function
nCx: The number of combinations of n things, taken x at a time
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a binomial experiment.
The probability distribution of a binomial random variable is called a binomial distribution. The special case of a single trial (n = 1) is known as the Bernoulli distribution.
The binomial distribution has the following properties:
The mean of the distribution (μ_x) is equal to np
The variance (σ²_x) is np(1 - p)
The standard deviation (σ_x) is sqrt[np(1 - p)]
It is generally agreed that if the ratio of sample size to population size is no more than 0.05, trials made without replacement are essentially independent.
Binomial Probability
The binomial probability refers to the probability that a binomial experiment results in exactly x successes. Suppose a binomial experiment consists of n trials and results in x successes. If the probability of success on an individual trial is p, then the binomial probability is:
$$f(x) = {}_nC_x \, p^x (1 - p)^{n - x}$$
Cumulative Binomial Probability
A cumulative binomial probability refers to the probability that the binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper limit). For the range from 0 to a:
$$F(a) = P(X \le a) = \sum_{x=0}^{a} f(x) = \sum_{x=0}^{a} {}_nC_x \, p^x (1 - p)^{n - x}$$
Example: Flooding of a road.
Suppose a road is flooded with probability p = 0.1 during a year and not more than one flood
occurs during a year
What is the probability that it will be flooded at least once during a 5-year period?
What is the probability that it will be flooded
twice during a 5-year period?
What is the probability distribution?
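The flooding example can be worked with the binomial probability function, treating each year as one trial. This is a sketch only: it reads "flooded twice" as exactly two floods, and the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.1  # five years, flood probability 0.1 in any one year

# Full probability distribution of the number of flood years X.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

p_at_least_once = 1 - distribution[0]  # complement of "no flood at all"
p_exactly_twice = distribution[2]      # assumed reading of "flooded twice"

for x, prob in distribution.items():
    print(f"P(X = {x}) = {prob:.5f}")
print(f"P(X >= 1) = {p_at_least_once:.5f}")
print(f"P(X = 2)  = {p_exactly_twice:.5f}")
```

With these inputs the chance of at least one flood in five years is about 0.41, and the chance of exactly two is about 0.073.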
Applied: Lot-acceptance sampling:
- A quality control procedure
- A standard is set and if the products meet this standard, then they are accepted
- A lot is a large number of the same items, so it is impossible to test them all
- Therefore, a much smaller random sample is taken from each lot to test
Applied: Lot-acceptance sampling
- For a given sampling plan with parameters n (sample size) and r (acceptance number, i.e. the largest number of defective items allowed in the sample), the probability that a lot will be accepted is P_r, where p is the expected proportion of defective items in the lot:
$$P_r = P(X \le r) = F(r) = \sum_{x=0}^{r} \frac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}$$
Applied: Lot-acceptance sampling
- The sampling plan: sample size n = 10 items and no more than 1 defective item (r = 1)
Applied: Lot-acceptance sampling
n = 15
With various sample sizes n selected, specify the resulting effects.
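One way to explore the effect of the sample size is to tabulate the acceptance probability, i.e. the cumulative binomial probability of finding at most r defectives in a sample of n. The sketch below assumes the acceptance number r = 1 applies to both n = 10 and n = 15, and the defect proportions p are illustrative values chosen here, not taken from the slides.

```python
from math import comb


def acceptance_probability(n, r, p):
    """P(X <= r): at most r defectives in a sample of n, defect proportion p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(r + 1))


r = 1  # acceptance number (assumed the same for both plans)
for p in (0.01, 0.05, 0.10, 0.20):  # illustrative lot defect proportions
    row = ", ".join(
        f"n={n}: {acceptance_probability(n, r, p):.4f}" for n in (10, 15)
    )
    print(f"p = {p:.2f} -> {row}")
```

For a fixed r, the larger sample gives a lower acceptance probability at every p; at p = 0.10, for instance, it falls from about 0.74 (n = 10) to about 0.55 (n = 15).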
Consumer’s risk and producer’s risk
Consumer’s risk is the probability of accepting a lot that has a higher p than the consumer can tolerate.
Producer’s risk is the probability of a lot being rejected that is actually in conformity with the standard.
$$\text{The consumer's risk} = P_r$$
$$\text{The producer's risk} = 1 - P_r$$
NEGATIVE BINOMIAL
Negative Binomial Experiment
The experiment consists of x repeated trials
Each trial can result in just two possible
outcomes
The probability of success, denoted by p, is the
same on every trial
The trials are independent
The experiment continues until r successes are observed, where r is specified in advance
f*(x): Negative binomial probability
nCr: The number of combinations of n things, taken r at a time.
Negative Binomial Distribution
A negative binomial random variable is the number x of repeated trials to produce r successes in a negative binomial experiment.
The probability distribution of a negative binomial random variable is called a negative binomial distribution (Pascal distribution).
Negative Binomial Probability
The negative binomial probability refers to the probability that a negative binomial experiment results in r - 1 successes after trial x - 1 and r successes after trial x:
$$f^*(x) = {}_{x-1}C_{r-1} \, p^r (1 - p)^{x - r}$$
The Mean of the Negative Binomial Distribution
If we define the mean of the negative binomial distribution as the average number of trials required to produce r successes, then the mean is equal to:
$$\mu = \frac{r}{p}$$
where μ is the mean number of trials, r is the number of successes, and p is the probability of a success on any given trial.
Example: Delivery of equipment.
A company has bid to supply equipment units for a system in a region, having quoted a low price for the job. However, the supervising engineer has estimated from previous experience that 10% of the units supplied by this company are defective in some way. If 5 units are required, determine the minimum number of units to be ordered to be 95% sure that a sufficient number of nondefective units are delivered. It is assumed that the delivery of each unit is an independent trial and that any fault that may occur in one unit is not related to faults in any other.
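The delivery example can be treated as a negative binomial problem: a "success" is a nondefective item (p = 0.9), and we look for the smallest number of items n such that the 5th success arrives on or before item n with probability at least 0.95. The following is a sketch under that reading; the function name is hypothetical.

```python
from math import comb


def prob_r_successes_within(n, r, p):
    """P(the r-th success occurs on or before trial n): negative binomial CDF."""
    return sum(
        comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r) for x in range(r, n + 1)
    )


r = 5          # nondefective items required
p = 0.9        # probability that a delivered item is nondefective
target = 0.95  # required confidence

# Increase the order size until the target confidence is reached.
n = r
while prob_r_successes_within(n, r, p) < target:
    n += 1

print(f"order {n} items, confidence = {prob_r_successes_within(n, r, p):.4f}")
```

Under these assumptions, ordering 6 items gives a confidence of about 0.886, which is not enough, while 7 items gives about 0.974.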
GEOMETRIC DISTRIBUTION
Geometric Distribution
The geometric distribution is a special case of the negative binomial distribution. It deals with the number of trials required for a single success. Thus, the geometric distribution is a negative binomial distribution where the number of successes (r) is equal to 1.
Geometric Probability Formula.
Suppose a negative binomial experiment consists of x trials and results in one success. If the probability of success on an individual trial is p, then the geometric probability is:
$$f(x) = p \, (1 - p)^{x - 1}$$
MULTINOMIAL DISTRIBUTION
MULTINOMIAL EXPERIMENT
Consisting of n identical trials
For each trial, there are k
possible mutually exclusive
events (A1, …, Ak)
The probability of Ai is pi and
pi remains constant over trials
Sum of pi equals unity
The trials are independent
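MULTINOMIAL PROBABILITY FUNCTION
The multinomial probability function gives the probability that, in the n trials of a multinomial experiment, event A1 occurs x1 times, ..., and event Ak occurs xk times (with x1 + ... + xk = n):
$$f(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}$$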
MULTINOMIAL DISTRIBUTION
Example: Bids for contracts.
A city engineer invites separate bids for widening four roads. Three contractors submit their quotations. The first contractor is usually successful in getting 60% of similar work in the area, whereas the other two have equal chances of 15%.
What is the probability that the first contractor will be given at least three of the jobs on the basis of past performances?
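Because the question concerns only the first contractor, every other outcome of a bid can be lumped into "not the first contractor", so the number of jobs won is binomial with n = 4 and p = 0.6. The sketch below relies on that simplification and on the four awards being independent trials, as the multinomial experiment requires.

```python
from math import comb

n_jobs = 4
p_first = 0.6  # first contractor's historical success rate

# Number of jobs won by the first contractor: binomial(n_jobs, p_first),
# with all other bid outcomes lumped into the complementary event.
p_at_least_three = sum(
    comb(n_jobs, x) * p_first**x * (1 - p_first) ** (n_jobs - x)
    for x in (3, 4)
)

print(f"P(first contractor wins >= 3 of 4 jobs) = {p_at_least_three:.4f}")
```

This gives a probability of about 0.475.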
HYPERGEOMETRIC DISTRIBUTION
HYPERGEOMETRIC EXPERIMENT
Resembles the binomial experiment, except that the hypergeometric experiment involves sampling from a finite population without replacement
A random sample of n objects is taken one at a time from a finite population of N_T objects by sampling without replacement
Of the N_T objects, N_S are of one type, called ‘successes’, and N_F are of another type, called ‘failures’
The random variable X is used to count the number of successes
HYPERGEOMETRIC PROBABILITY
$$f(x) = \frac{{}_{N_S}C_x \; {}_{N_F}C_{n-x}}{{}_{N_T}C_n} = \frac{\dfrac{N_S!}{x!\,(N_S - x)!} \cdot \dfrac{N_F!}{(n - x)!\,(N_F - n + x)!}}{\dfrac{N_T!}{n!\,(N_T - n)!}}$$
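MEAN, VARIANCE AND SD
$$E(X) = \mu = n \, \frac{N_S}{N_T}$$
$$\sigma^2 = n \, \frac{N_S}{N_T} \cdot \frac{N_F}{N_T} \cdot \frac{N_T - n}{N_T - 1}$$
$$\sigma = \sqrt{\sigma^2}$$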
HYPERGEOMETRIC DISTRIBUTION
Example: Personnel organization
A manager must select a committee of three from his staff of six men (M) and four women (W). He writes their names on separate pieces of paper, puts them in a bowl, then blindly picks a sequence of three papers.
Find the probability that he picks two women (W).
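The committee example is a direct application of the hypergeometric probability: 4 women are the "successes", 6 men the "failures", and 3 names are drawn without replacement. The sketch below reads the question as exactly two women; the helper name and argument names are illustrative.

```python
from math import comb


def hypergeom_pmf(x, n, n_s, n_f):
    """P(exactly x successes) when drawing n items without replacement
    from a population of n_s successes and n_f failures."""
    return comb(n_s, x) * comb(n_f, n - x) / comb(n_s + n_f, n)


# Committee of 3 drawn from 4 women (successes) and 6 men (failures).
p_two_women = hypergeom_pmf(x=2, n=3, n_s=4, n_f=6)

print(f"P(exactly two women) = {p_two_women:.2f}")
```

The result is 6 × 6 / 120 = 0.30.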
POISSON DISTRIBUTION
1. For a given continuous unit of time or space, there is a known, empirically determined positive constant, denoted by λ, that is the average rate of occurrence of successes in the given unit.
2. For any size of subunit of the given unit, the number of successes occurring in the subunit is independent of the number of successes in any other nonoverlapping subunit.
3. If the specified unit is divided into very small subunits denoted by Δt, the probability of exactly one success occurring in a Δt is very small and it is the same for all Δt's in the unit no matter when (where) they appear.
4. The probability of more than one success occurring in any very small subunit Δt is essentially zero.
The Poisson probability function utilizes the constant λt to determine the probability of occurrence of x successes (X = x) in some multiple t of the defining unit for a Poisson experiment, where λ is the average rate of occurrence per unit:
$$f(x) = P(X = x) = \frac{(\lambda t)^x \, e^{-\lambda t}}{x!}$$
POISSON DISTRIBUTION
The mean:
$$\mu = E(X) = \lambda t$$
The variance:
$$\sigma^2 = \lambda t$$
The standard deviation:
$$\sigma = \sqrt{\lambda t}$$
The cumulative Poisson probability values are calculated with this equation:
$$F(a) = P(X \le a) = \sum_{x=0}^{a} \frac{(\lambda t)^x \, e^{-\lambda t}}{x!}$$
A manufacturer of cable, knowing that defects appear ‘randomly’ in the cable as it is produced, wants to use Poisson techniques to determine the probabilities for different numbers of defects in a fixed length (unit) of cable. He decides to use 4 meters as the fixed unit and, after counting defects in many 4-meter lengths, he finds that the average of these counts is 4 defects per 4 meters.
Find the average rate of occurrence of defects in 1 meter.
Find the probability of two defects occurring in 1 meter.
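Both parts of the cable example follow from the Poisson probability function once the rate is rescaled to a 1-meter unit. The sketch below is one way to do the arithmetic; `poisson_pmf` is an illustrative helper, with `lam` the rate per unit and `t` the number of units.

```python
from math import exp, factorial


def poisson_pmf(x, lam, t=1.0):
    """P(X = x) for a Poisson count with rate lam per unit over t units."""
    mu = lam * t
    return mu**x * exp(-mu) / factorial(x)


# 4 defects per 4 meters is a rate of 1 defect per meter.
rate_per_meter = 4 / 4

# Probability of exactly two defects in a 1-meter length.
p_two_defects = poisson_pmf(2, rate_per_meter, t=1.0)

print(f"rate = {rate_per_meter:.1f} defect per meter")
print(f"P(X = 2 in 1 m) = {p_two_defects:.4f}")
```

The rate is 1 defect per meter, and the probability of two defects in 1 meter is e^(-1)/2, about 0.184.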
Floorboards supplied by a contractor have some imperfections. A builder decides that two imperfections per 40 m² is acceptable.
Is there at least a 95% chance of meeting such requirements if, from previous experience with the same material, an average of one imperfection per 65 m² has been found?
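The floorboard question can be answered with the cumulative Poisson probability of at most two imperfections in 40 m², using a rate of 1/65 imperfections per m². This is a sketch that interprets "acceptable" as no more than two imperfections; the helper name is illustrative.

```python
from math import exp, factorial


def poisson_cdf(a, mu):
    """P(X <= a) for a Poisson count with mean mu."""
    return sum(mu**x * exp(-mu) / factorial(x) for x in range(a + 1))


rate_per_m2 = 1 / 65      # one imperfection per 65 m^2 on average
area = 40                 # m^2 considered by the builder
mu = rate_per_m2 * area   # expected imperfections in 40 m^2

p_acceptable = poisson_cdf(2, mu)  # at most two imperfections (assumed reading)

print(f"mean = {mu:.4f}, P(X <= 2) = {p_acceptable:.4f}")
print("meets 95% requirement:", p_acceptable >= 0.95)
```

With a mean of about 0.615 imperfections per 40 m², the probability is roughly 0.975, so the 95% requirement is met under these assumptions.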
We deal with some types:
If a continuous random variable X can assume any value in the interval [a,b] and only these values, and if its probability density function f(x) is constant over that interval and equal to zero elsewhere, then X is said to be uniformly distributed.
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & \text{for } a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$$
An industrial psychologist has determined that it takes a worker between 9 and 15 minutes to complete a task on an assembly line. Assume that the continuous random variable X, the time to complete the task, is uniformly distributed over the interval [9,15].
Find the probability that a worker will complete the task in less than 13 minutes.
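For a uniform distribution the probability is simply the fraction of the interval that lies below the threshold. A minimal sketch follows; `uniform_cdf` is an illustrative helper.

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Task time uniform on [9, 15] minutes; probability of finishing before 13.
p_under_13 = uniform_cdf(13, 9, 15)

print(f"P(X < 13) = {p_under_13:.4f}")
```

The area under the constant density 1/6 between 9 and 13 is 4/6, about 0.667.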
A continuous random variable X is said to be exponentially distributed if, for any λ > 0, its probability density function is given by the exponential probability density function:
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases}$$
The cumulative distribution function:
$$F(x) = P(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases}$$
Example: Floods affecting construction.
An engineer constructing a bridge across a river is concerned about the possible occurrence of a flood exceeding 100 m³/s, which can seriously affect his work. If a flow of such magnitude is exceeded once in 5 years on average, on the basis of recorded data, what is the chance that the work, which is scheduled to last 14 months, can proceed without interruption or detrimental effects?
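If exceedances are assumed to occur as a Poisson process, the waiting time to the next one is exponentially distributed, and the chance of no interruption is the probability that this waiting time exceeds the construction period. The sketch below rests on that modeling assumption, with the rate expressed per year.

```python
from math import exp

rate_per_year = 1 / 5      # one exceedance every 5 years on average
duration_years = 14 / 12   # scheduled construction time of 14 months

# P(T > t) = 1 - F(t) = exp(-lambda * t) for an exponential waiting time T.
p_no_flood = exp(-rate_per_year * duration_years)

print(f"P(no exceedance in 14 months) = {p_no_flood:.4f}")
```

Under these assumptions the work proceeds without interruption with probability of about 0.79.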
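The beta distribution models a random variable that takes values in the interval given by 0–1.
The distribution plays a special role in decision methods.
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{B(\alpha, \beta)} \, x^{\alpha - 1} (1 - x)^{\beta - 1}, & \text{for } 0 \le x \le 1 \text{ and } \alpha, \beta > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx$$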
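$$E(X) = \frac{\alpha}{\alpha + \beta}$$
$$\mathrm{Var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$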
Example: Maintenance of major roads.
There are 10 major roads in province A and a similar
number and length of roads in province B The
proportion of roads that require substantial
maintenance works during an annual period can be
approximated by beta(4, 3) and beta(1, 4) distributions, respectively, in the two provinces.
(1) Which province should spend more on annual
maintenance?
(2) What is the probability that not more than two roads will require substantial maintenance work in province B during an annual period?
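Both questions can be approached numerically. The sketch below compares the beta means α/(α + β) for question (1), and for question (2) it interprets "not more than two roads out of 10" as the proportion being at most 0.2, integrating the beta density with Simpson's rule. That interpretation, the helper names, and the choice of numerical integration are assumptions made here for illustration.

```python
from math import gamma


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


def beta_pdf(x, a, b):
    """Beta density; gamma(a + b) / (gamma(a) * gamma(b)) equals 1 / B(a, b)."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_cdf(x, a, b, steps=1000):
    """P(X <= x) by composite Simpson's rule (steps must be even, a >= 1)."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


mean_a = beta_mean(4, 3)  # province A
mean_b = beta_mean(1, 4)  # province B
p_b_at_most_two = beta_cdf(0.2, 1, 4)  # proportion <= 2/10 in province B

print(f"mean proportion: A = {mean_a:.3f}, B = {mean_b:.3f}")
print(f"P(proportion in B <= 0.2) = {p_b_at_most_two:.4f}")
```

On this reading, province A has the higher expected proportion (about 0.571 against 0.200), and the probability for province B is 1 - 0.8^4 = 0.5904.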
The value of the random variable Y is:
$$Y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
where X is a normal random variable, μ is the mean, and σ is the standard deviation.