Applied Structural and Mechanical Vibrations: Theory, Methods and Measuring Instrumentation

11 Probability and statistics: preliminaries to random vibrations

11.1 Introduction

This chapter covers some fundamental concepts of probability and statistics. Rather than attempting a complete treatment of the subject (more detailed discussions can be found in the references and in the appendices), the idea is to introduce and discuss some basic concepts with the intention of following a continuous line of reasoning from simple to more complex topics and the hope of giving the reader a useful source of reference for a clear understanding of this text in the first place, but of other more specialized books as well.
In everyday conversation, probability is a loosely defined term employed to indicate the measure of one's belief in the occurrence of a future event when this event may or may not occur. Moreover, we use this word by indirectly making some common assumptions: (1) probabilities near 1 (100%) indicate that the event is extremely likely to occur, (2) probabilities near zero indicate that the event is very unlikely to occur and (3) probabilities near 0.5 (50%) indicate a 'fair chance', i.e. that the event is just as likely to occur as not.
If we try to be more specific, we can consider the way in which we assign probabilities to events and note that, historically, three main approaches have developed through the centuries. We can call them the personal approach, the relative frequency approach and the classical approach. The personal approach reflects a personal opinion and, as such, is always applicable because anyone can have a personal opinion about anything. However, it is not very fruitful for our purposes. The relative frequency approach is more objective and pertains to cases in which an 'experiment'
can be repeated many times and the results observed; P[A], the probability of occurrence of event A, is given as
P[A] = n_A/n    (11.1)
where n_A is the number of times that event A occurred and n is the total number of times that the experiment was run. This approach is surely useful in itself but, obviously, cannot deal with a one-shot situation and, in any case, is a definition of an a posteriori probability (i.e. we must perform the experiment to determine P[A]). The idea behind this definition is that the ratio on the r.h.s. of eq (11.1) is almost constant for sufficiently large values of n.
Finally, the classical approach can be used when it can be reasonably assumed that the possible outcomes of the experiment are equally likely; then
P[A] = n(A)/n(S)    (11.2)
where n(A) is the number of ways in which outcome A can occur and n(S) is the number of ways in which the experiment can proceed. Note that in this case we do not really need to perform the experiment because eq (11.2) defines an a priori probability. A typical example is the tossing of a fair coin; without an experiment we can say that n(S) = 2 (head or tail) and the probability of, say, a head is P[head] = 1/2. Pictorially (and also for historical reasons), we may view eq (11.2) as the 'gambler's definition' of probability.
However, consider the following simple and classical 'meeting problem': two people decide to meet at a given place anytime between noon and 1 p.m. The one who arrives first is obliged to wait 20 min and then leave. If their arrival times are independent, what is the probability that they actually meet? The answer is 5/9 (as the reader is invited to verify) but the point is that this problem cannot be tackled with the definitions of probability given above.
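As a plausibility check, the 5/9 result can also be verified numerically. The following Python sketch (not part of the original text; the number of trials is an arbitrary choice) simulates the two independent arrival times and counts how often they fall within 20 minutes of each other.

```python
import random

def meeting_probability(trials: int = 1_000_000, wait: float = 20.0) -> float:
    """Estimate the probability that two arrivals, uniform over a 60-minute
    window and independent of each other, occur within 'wait' minutes."""
    meetings = 0
    for _ in range(trials):
        t1 = random.uniform(0.0, 60.0)   # first arrival (minutes after noon)
        t2 = random.uniform(0.0, 60.0)   # second arrival
        if abs(t1 - t2) <= wait:
            meetings += 1
    return meetings / trials

print(meeting_probability())             # ~0.555, i.e. close to 5/9
```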
We will not pursue the subject here, but it is evident that the definitions above cannot deal with a large number of problems of great interest. As a matter of fact, a detailed analysis of both definitions (11.1) and (11.2)—because of their intrinsic limitations, logical flaws and lack of stringency—shows that they are inadequate to form a solid basis for a more rigorous mathematical theory of probability. Also, the von Mises definition, which extends the relative frequency approach by writing
P[A] = lim_{n→∞} (n_A/n)    (11.3)

suffers from serious limitations and runs into insurmountable logical difficulties.
The solution to these difficulties was given by the axiomatic theory of probability introduced by Kolmogorov. Before introducing this theory, however, it is worth considering some basic ideas which may be useful as guidelines for Kolmogorov's abstract formulation.
If we consider eq (11.2), we note that, in order to determine what is 'probable', we must first determine what is 'possible'; this means that we have to make a list of possibilities for the experiment. Some common definitions are as follows: a possible outcome of our experiment is called an event and we can distinguish between simple events, which can happen only in one way, and compound events, which can happen in more than one distinct way. In the rolling of a die, for example, a simple event is the observation of a 6, whereas a compound event is the observation of an even number (2, 4 or 6). In other words, simple events cannot be decomposed and are also called sample points. The set of all possible sample points is called a sample space. Now, adopting the notation of elementary set theory, we view the sample
space as a set W whose elements E_j are the sample points. If the sample space is discrete, i.e. contains a finite or countable number of sample points, any compound event A is a subset of W and can be viewed as a collection of two or more sample points, i.e. as the 'union' of two or more sample points. In the die-rolling experiment above, for example, we can write

A = E_2 ∪ E_4 ∪ E_6

where we call A the event 'observation of an even number', E_2 the sample point 'observation of a 2' and so on. In this case, it is evident that P[A] = P[E_2 ∪ E_4 ∪ E_6] and, since E_2, E_4 and E_6 are mutually exclusive,

P[A] = P[E_2] + P[E_4] + P[E_6]    (11.4a)

The natural extension of eq (11.4a) is
the addition rule P[∪_j E_j] = Σ_j P[E_j], which holds for any finite (or countably infinite) collection of mutually exclusive events. For two events B and C which are not necessarily mutually exclusive we have instead

P[B ∪ C] = P[B] + P[C] − P[B ∩ C]    (11.4d)

where P[B ∩ C] is often called the compound probability, i.e. the probability that events B and C occur simultaneously. (Note that one often finds also the symbols A + B for A ∪ B and AB for A ∩ B.) Again, in the rolling of a fair die, for example, let B be the observation of an even number and C the observation of a number greater than 3; then P[B] = P[C] = 1/2, P[B ∩ C] = P[{4, 6}] = 1/3 and eq (11.4d) gives P[B ∪ C] = 1/2 + 1/2 − 1/3 = 2/3.
For three non-mutually exclusive sets, it is not difficult to extend eq (11.4d) to

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]    (11.4e)

as the reader is invited to verify.
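One way to carry out this verification is by direct enumeration on a small sample space. The sketch below is an illustration added here, not taken from the text; the three die events A, B and C are arbitrary choices.

```python
from fractions import Fraction

sample_space = set(range(1, 7))          # fair die: outcomes 1..6, equally likely

def prob(event):
    """Classical probability, eq (11.2): favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}          # even number
B = {4, 5, 6}          # number greater than 3
C = {3, 6}             # multiple of 3

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))
print(lhs, rhs, lhs == rhs)              # 5/6 5/6 True, as predicted by eq (11.4e)
```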
Incidentally, it is evident that the method that we are following requires counting; for example, the counting of sample points and/or a complete itemization of equiprobable sets of sample points. For large sample spaces this may not be an easy task. Fortunately, aid comes from combinatorial analysis, from which we know that the number of permutations (arrangements of objects in a definite order) of n distinct objects taken r at a time is given by

P_{n,r} = n!/(n − r)!    (11.5)

while the number of combinations (arrangements of objects without regard to order) of n distinct objects taken r at a time is

C_{n,r} = n!/[r!(n − r)!]    (11.6)
For example, if n = 3 (objects a, b and c) and r = 2, the fact that the number of combinations is less than the number of permutations is evident if one thinks that in a permutation the arrangement of objects {a, b} is considered different from the arrangement {b, a}, whereas in a combination they count as one single arrangement.
These tools simplify the counting considerably. For example, suppose that a big company has hired 15 new engineers for the same job in different plants. If a particular plant has four vacancies, in how many ways can they fill these positions? The answer is now straightforward and is given by C_{15,4} = 1365. Moreover, note also that the calculation of factorials can often be made easier by using Stirling's formula, i.e. n! ≅ √(2πn)(n/e)^n, which results in errors smaller than 1% for n ≥ 9.
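For readers who want to reproduce these numbers, the following snippet evaluates eqs (11.5) and (11.6) and Stirling's formula; the values of n used for the error check are arbitrary, while C_{15,4} = 1365 is the hiring example above.

```python
import math

def permutations(n: int, r: int) -> int:
    """Ordered arrangements of n distinct objects taken r at a time, eq (11.5)."""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n: int, r: int) -> int:
    """Unordered arrangements of n distinct objects taken r at a time, eq (11.6)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations(3, 2), combinations(3, 2))   # 6 3: {a,b} and {b,a} count once as a combination
print(combinations(15, 4))                      # 1365, the hiring example

def stirling(n: int) -> float:
    """Stirling's approximation n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

for n in (5, 9, 15):
    exact = math.factorial(n)
    rel_err = abs(exact - stirling(n)) / exact
    print(n, exact, rel_err)                    # relative error drops below 1% from n = 9 onwards
```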
Returning now to our main discussion, we can make a final comment before introducing the axiomatic theory of probability: the fact that two events B and C are mutually exclusive is formalized in the language of sets as B ∩ C = Ø, where Ø is the empty set. So, we need to include this event in the sample space and require that P[Ø] = 0. By so doing, we obtain the expected result that eq (11.4d) reduces to the sum P[B] + P[C] whenever events B and C are mutually exclusive. In probability terminology, Ø is called the 'impossible event'.
11.2.1 Probability—axiomatic formulation and some fundamental results
We define a probability space as a triplet (W, ℱ, P) where:

1 W is a set whose elements are called elementary events.
2 ℱ is a σ-algebra of subsets of W which are called events.
3 P is a probability function, i.e. a real-valued function with domain ℱ which satisfies the following axioms:
  1 P[A] ≥ 0 for every event A ∈ ℱ;
  2 P[W] = 1;
  3 If A_j ∈ ℱ and A_i ∩ A_j = Ø for every pair of indexes i ≠ j (j = 1, 2, 3, …), then P[∪_j A_j] = Σ_j P[A_j].
Two observations can be made immediately. First—although it may not seem obvious—the axiomatic definition includes as particular cases both the classical and the relative frequency definitions of probability without suffering their limitations; second, this definition does not tell us what value of probability to assign to a given event. This is in no way a limitation of this definition but simply means that we will have to model our experiment in some way in order to obtain values for the probability of events. In fact, many problems of interest deal with sets of identical events which are not equally likely (for example, the rolling of a biased die).
Let us now introduce two other definitions of practical importance: conditional probability and the independence of events. Intuitively, we can argue that the probability of an event can vary depending upon the occurrence or nonoccurrence of one or more related events: in fact, it is different to ask in the die-rolling experiment 'What is the probability of a 6?' or 'What is the probability of a 6 given that an even number has fallen?' The answer to the first question is 1/6 while the answer to the second question is 1/3. This is the concept of conditional probability, i.e. the probability of an event A
given that an event B has already occurred. The symbol for conditional probability is P[A|B] and its definition is
P[A|B] = P[A ∩ B]/P[B]    (11.7)
provided that P[B] ≠ 0. It is not difficult to see that, for a given probability space (W, ℱ, P) and a fixed event B, the function P[·|B] satisfies the three axioms above and is a probability function in its own right. Equation (11.7) yields immediately the multiplication rule for probabilities, i.e.
P[A ∩ B] = P[A|B] P[B]    (11.8a)

which can be generalized to a number of events as follows:

P[A_1 ∩ A_2 ∩ … ∩ A_n] = P[A_1] P[A_2|A_1] P[A_3|A_1 ∩ A_2] … P[A_n|A_1 ∩ A_2 ∩ … ∩ A_{n−1}]    (11.8b)
If the occurrence of event B has no effect on the probability assigned to an event A, then A and B are said to be independent and we can express this fact in terms of conditional probability as

P[A|B] = P[A]    (11.9a)

or, equivalently,
P[B|A] = P[B]    (11.9b)

Clearly, two mutually exclusive events are not independent because, from eq (11.7), we have P[A|B] = 0 when A ∩ B = Ø. Also, if A and B are two independent events, we get from eq (11.7)

P[A ∩ B] = P[A] P[B]    (11.10a)

which is referred to as the multiplication theorem for independent events. (Note that some authors give eq (11.10a) as the definition of independent
events.) For n mutually (or collectively) independent events eq (11.8b) yields

P[A_1 ∩ A_2 ∩ … ∩ A_n] = P[A_1] P[A_2] … P[A_n]    (11.10b)
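The die-rolling figures quoted earlier (1/6 versus 1/3) and the multiplication theorem (11.10a) can be checked by direct enumeration, as in the following sketch; the event 'number less than 3' used for the independence check is an arbitrary illustrative choice, not one discussed in the text.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                    # fair die

def P(event):
    return Fraction(len(event & outcomes), len(outcomes))

def P_cond(A, B):
    """Conditional probability P[A|B] = P[A ∩ B]/P[B], eq (11.7)."""
    return P(A & B) / P(B)

six  = {6}
even = {2, 4, 6}
print(P(six), P_cond(six, even))               # 1/6 and 1/3, as stated above

# Independence check, eq (11.10a): P[A ∩ B] = P[A] P[B]
low = {1, 2}                                   # 'number less than 3' (illustrative event)
print(P(even & low) == P(even) * P(low))       # True: these two events happen to be independent
```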
Example 11.1 Consider a lottery with eight numbers (1–8) and let E_1, E_2, …, E_8, respectively, be the simple events of extraction of 1, extraction of 2 and so on, each with probability 1/8. Three compound events A_1, A_2 and A_3 built from these sample points can satisfy the pairwise condition P[A_i ∩ A_j] = P[A_i] P[A_j] for every pair i ≠ j and yet fail to satisfy eq (11.10b), meaning that the three events are not mutually, or collectively, independent.
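The specific events of Example 11.1 are not reproduced here; the following sketch illustrates the same phenomenon with three hypothetical events on the eight equally likely numbers (chosen only for illustration) that satisfy the pairwise product rule but violate eq (11.10b).

```python
from fractions import Fraction
from itertools import combinations

numbers = set(range(1, 9))                      # lottery numbers 1..8, equally likely

def P(event):
    return Fraction(len(event & numbers), len(numbers))

# Hypothetical events, each of probability 1/2 (illustrative choices, not from the text)
A1 = {1, 2, 3, 4}
A2 = {1, 2, 5, 6}
A3 = {1, 2, 7, 8}

# Pairwise independence: P[Ai ∩ Aj] = P[Ai] P[Aj] for every pair
for X, Y in combinations((A1, A2, A3), 2):
    print(P(X & Y) == P(X) * P(Y))              # True, True, True

# ...but not collective independence: P[A1 ∩ A2 ∩ A3] differs from P[A1] P[A2] P[A3]
print(P(A1 & A2 & A3), P(A1) * P(A2) * P(A3))   # 1/4 versus 1/8
```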
Another important result is known as the total probability formula. Let A_1, A_2, …, A_n be n mutually exclusive events such that A_1 ∪ A_2 ∪ … ∪ A_n = W, where W is the sample space. Then, a generic event B can be expressed as

B = B ∩ W = (B ∩ A_1) ∪ (B ∩ A_2) ∪ … ∪ (B ∩ A_n)    (11.11)

where the n events B ∩ A_j are mutually exclusive. Owing to the third axiom of probability, this implies

P[B] = Σ_{j=1}^{n} P[B ∩ A_j]

so that, by using the multiplication rule (11.8a), we get the total probability formula

P[B] = Σ_{j=1}^{n} P[B|A_j] P[A_j]    (11.12)

which remains true for a countably infinite collection of events A_j (n → ∞).
With the same assumptions as above on the events A_1, A_2, …, A_n, let us now consider a particular event A_k; the definition of conditional probability yields

P[A_k|B] = P[A_k ∩ B]/P[B] = P[A_k ∩ B]/(Σ_{j=1}^{n} P[B|A_j] P[A_j])    (11.13)
where eq (11.12) has been taken into account. Also, by virtue of eq (11.8a) we can write P[A_k ∩ B] = P[B|A_k] P[A_k], so that substituting in eq (11.13) we get

P[A_k|B] = P[B|A_k] P[A_k]/(Σ_{j=1}^{n} P[B|A_j] P[A_j])    (11.14)
which is known as Bayes' formula and deserves some comments. First, the formula is true provided that P[B] ≠ 0. Second, eq (11.14) is particularly useful for experiments
consisting of stages. Typically, the A_j are events defined in terms of a first stage (or, otherwise, the P[A_j] are known for some reason), while B is an event defined in terms of the whole experiment including a second stage; asking for P[A_k|B] is then, in a sense, 'backward': we ask for the probability of an event defined at the first stage conditioned by what happens in a later stage. In Bayes' formula this probability is given in terms of the 'natural' conditioning, i.e. conditioning on what happens at the first stage of the experiment. This is why the P[A_j] are called the a priori (or prior) probabilities, whereas P[A_k|B] is called the a posteriori (posterior or inverse) probability. The advantage of this approach is to be able to modify the original predictions by incorporating new data. Obviously, the initial hypotheses play an important role in this case; if the initial assumptions are based on an insufficient knowledge of the mechanism of the process, the prior probabilities are no better than reasonable guesses.
Example 11.2 Among voters in a certain area, 40% support party 1 and 60% support party 2. Additional research indicates that a certain election issue is favoured by 30% of supporters of party 1 and by 70% of supporters of party 2. One person at random from that area—when asked—says that he/she favours the issue in question. What is the probability that he/she is a supporter of party 2? Now, let

• A_1 be the event that a person supports party 1, so that P[A_1] = 0.4;
• A_2 be the event that a person supports party 2, so that P[A_2] = 0.6;
• B be the event that a person at random in the area favours the issue in question.
Prior knowledge (the results of the research) indicates that P[B|A_1] = 0.3 and P[B|A_2] = 0.7. The problem asks for the a posteriori probability P[A_2|B], i.e. the probability that the person who was asked supports party 2 given the fact that he/she favours that specific election issue. From Bayes' formula we get

P[A_2|B] = P[B|A_2] P[A_2]/(P[B|A_1] P[A_1] + P[B|A_2] P[A_2]) = 0.42/0.54 ≅ 0.78

Then, obviously, we can also infer that P[A_1|B] = 1 − P[A_2|B] ≅ 0.22.
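The arithmetic of this example is easy to reproduce; the snippet below simply evaluates eqs (11.12) and (11.14) for the numbers given above.

```python
def bayes(prior, likelihood, k):
    """Posterior P[A_k|B] from eq (11.14), given priors P[A_j] and likelihoods P[B|A_j]."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))   # total probability P[B], eq (11.12)
    return prior[k] * likelihood[k] / evidence

prior      = [0.4, 0.6]    # P[A_1], P[A_2]: party support in the area
likelihood = [0.3, 0.7]    # P[B|A_1], P[B|A_2]: fraction favouring the issue

print(bayes(prior, likelihood, 1))   # P[A_2|B] = 0.42/0.54 ≈ 0.78
print(bayes(prior, likelihood, 0))   # P[A_1|B] ≈ 0.22
```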
11.3 Random variables, probability distribution functions and probability density functions
Events of major interest in science and engineering are those identified by numbers. Moreover—since we assume that the reader is already familiar with the term 'variable'—we can state that a random variable is a real variable whose observed values are determined by chance or by a number of causes beyond our control which defy any attempt at a deterministic description. In this regard, it is important to note that the engineer's and applied scientist's approach is not so much to ask whether a certain quantity is a random variable or not (which is often debatable), but to ask whether that quantity can be modelled as a random variable and if this approach leads to meaningful results.
In mathematical terms, let x be any real number; then a random variable on the probability space (W, ℱ, P) is a function X: W → R (R is the set of real numbers) such that the sets

B_x = {w ∈ W : X(w) ≤ x}

are events, i.e. B_x ∈ ℱ. In words, let X be a real-valued function defined on W; given a real number x, we call B_x the set of all elementary events w for which X(w) ≤ x. If, for every x, the sets B_x belong to the σ-algebra ℱ, then X is a (one-dimensional) random variable.
The above definition may seem a bit intricate at first glance, but a little thought will show that it provides us precisely with what we need. In fact, we can now assign a definite meaning to the expression P[B_x], i.e. the probability that the random variable X corresponding to a given experiment will assume a value less than or equal to x. It is then straightforward, for a given random variable X, to define the function F_X(x) as

F_X(x) = P[B_x] = P[X ≤ x]    (11.15)
which is called the cumulative distribution function (cdf, or the distribution function) of the random variable X. From the definition, the following properties can be easily proved:
0 ≤ F_X(x) ≤ 1,   F_X(−∞) = 0,   F_X(+∞) = 1,   F_X(x_1) ≤ F_X(x_2)    (11.16)

where x_1 and x_2 are any two real numbers such that x_1 ≤ x_2. In other words, distribution functions are monotonically non-decreasing functions which start at zero for x → −∞ and increase to unity for x → +∞. It should be noted that every random variable defines uniquely its distribution function, but a given distribution function corresponds to an arbitrary number of different random variables. Moreover, the probabilistic properties of a random variable can be completely characterized by its distribution function.
Among all possible random variables, an important distinction can be made between discrete and continuous random variables. The term discrete means that the random variable can assume only a finite or countably infinite number of distinct possible values x_1, x_2, x_3, …. Then, a complete description can be obtained by knowing the probabilities P[X = x_k] for k = 1, 2, 3, … and by defining the distribution function as

F_X(x) = Σ_{x_k ≤ x} P[X = x_k]

a 'staircase' function with jump discontinuities occurring at any point x_k. A typical and simple example is
provided by the die-rolling experiment where X is the numerical value observed in the rolling of the die. In this case x_1 = 1, x_2 = 2, etc. and P[X = x_k] = 1/6 for every k = 1, 2, …, 6. Then

F_X(x) = 0 for x < 1,   F_X(x) = k/6 for k ≤ x < k + 1 (k = 1, 2, …, 5),   F_X(x) = 1 for x ≥ 6
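The staircase character of this distribution function can be seen by evaluating F_X(x) = P[X ≤ x] directly from the sample-point probabilities; the evaluation points in the sketch below are arbitrary.

```python
from fractions import Fraction

values = {k: Fraction(1, 6) for k in range(1, 7)}   # P[X = k] = 1/6 for a fair die

def cdf(x: float) -> Fraction:
    """Distribution function F_X(x) = P[X <= x], eq (11.15), for a discrete random variable."""
    return sum((p for k, p in values.items() if k <= x), Fraction(0))

for x in (0.5, 1, 2.3, 5.999, 6, 10):
    print(x, cdf(x))   # 0, 1/6, 1/3, 5/6, 1, 1: a non-decreasing staircase from 0 to 1
```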
A continuous random variable, on the other hand, can assume any value in some interval of the real line. For a large and important class of random variables there exists a certain non-negative function p_X(x) which satisfies the relationship

F_X(x) = ∫_{−∞}^{x} p_X(η) dη    (11.19)

where p_X(x) is called the probability density function (pdf) and η is a dummy
variable of integration. The main properties of p_X(x) can be summarized as follows:

p_X(x) = dF_X(x)/dx ≥ 0,   ∫_{−∞}^{+∞} p_X(x) dx = 1    (11.20)

The second property is often called the normalization condition and is equivalent to F_X(+∞) = 1. Also, it is important to notice a fundamental difference with respect to discrete random variables: the probability that the
continuous random variable X assumes a specific value x is zero and probabilities must be defined over an interval. Specifically, if p_X(x) is continuous at x we have

P[x < X ≤ x + dx] = p_X(x) dx    (11.21a)

and, obviously,

P[x_1 < X ≤ x_2] = ∫_{x_1}^{x_2} p_X(x) dx = F_X(x_2) − F_X(x_1)    (11.21b)
Example 11.3 Discrete random variables—binomial, Poisson and geometric distributions. Let us consider a fixed number (n) of typical 'Bernoulli trials'. A 'Bernoulli trial' is an experiment with only two possible outcomes, which are usually called 'success' and 'failure'. Furthermore, the probability of success is p and does not change from trial to trial, the probability of failure is q = 1 − p and the trials are independent. The discrete random variable of interest X is the number of successes during the n trials. It is shown in every book on statistics that the probability of having x successes is given by

p_X(x) = P[X = x] = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}    (11.22)

where x = 0, 1, 2, …, n and 0 < p < 1. We say that a random variable has a binomial distribution with parameters n and p when its density function is given by eq (11.22).
Now, suppose that p is very small and suppose that n becomes very large in such a way that the product np is equal to a constant λ. In mathematical terms, provided that np = λ remains constant, we can let n → ∞ and p → 0, and then

lim_{n→∞} [n!/(x!(n − x)!)] p^x (1 − p)^{n−x} = (λ^x/x!) e^{−λ}

because lim_{n→∞} (1 − λ/n)^n = e^{−λ}. A random variable X with a pdf given by

p_X(x) = (λ^x/x!) e^{−λ}   (x = 0, 1, 2, …)    (11.23)
is said to have a Poisson distribution with parameter λ. Equation (11.23) is a good approximation for the binomial equation (11.22) when n is large and p is small. Poisson-distributed random variables arise in a number of situations, the most common of which concern 'rare' events, i.e. events with a small probability of occurrence. The parameter λ then represents the average number of occurrences of the event per measurement unit (i.e. a unit of time, length, area, space, etc.). For example, knowing that at a certain intersection we have on average 1.7 car accidents per month, the probability of zero accidents in a month is given by P[X = 0] = e^{−1.7} ≅ 0.18. The fact that the number of accidents follows a Poisson distribution can be roughly established as follows. Divide a month into n intervals, each of which is so small that at most one accident can occur, with probability p. Then, during each interval (if the occurrence of accidents can be considered as independent from interval to interval) we have a Bernoulli trial where the probability of 'success' p is relatively small if n is large, with np equal to the average number of accidents per month. Note that we do not need to know the values of n and/or p (which can be, to a certain extent, arbitrary), but it is sufficient to verify that the underlying assumptions of the Poisson distribution hold.
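A short numerical check of both the accident calculation and the binomial-to-Poisson limit is given below; the subdivision of the month into n = 1000 intervals is an arbitrary choice made only to illustrate the limiting argument.

```python
import math

lam = 1.7                                   # average accidents per month

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability, eq (11.23): P[X = x] = lam**x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

print(poisson_pmf(0, lam))                  # ≈ 0.183, probability of no accidents in a month

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

# Many small intervals with np = 1.7: the binomial values approach the Poisson ones
n = 1000
p = lam / n
for x in range(4):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, lam))   # the two columns nearly coincide
```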
If now, in a series of Bernoulli trials, we consider X to be the number of trials before the first success occurs we are, broadly speaking, dealing with the same problem as in the first case but we are asking a different question (the number of trials is not fixed in this case). It is not difficult to show that this circumstance leads to the geometric distribution, which is written
p_X(x) = P[X = x] = p(1 − p)^x   (x = 0, 1, 2, …)

Example 11.4 Continuous random variables—the Gaussian (normal) distribution. This distribution was originally obtained by de Moivre and was developed about a century later by Gauss and Laplace. Its importance is due to the central limit theorem, which we will discuss in a later section. A random variable X is said to have a Gaussian (or normal) distribution with parameters µ and σ² if its pdf is

p_X(x) = [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)]

Introducing the standardized variable Z = (X − µ)/σ, the distribution function of X can be expressed in terms of the standard Gaussian distribution function Φ(z) ≡ F_Z(z) as

F_X(x) = F_Z((x − µ)/σ) = Φ((x − µ)/σ)    (11.28)

Equation (11.28) has been given because either F_Z(z) or Φ(z) are commonly found in statistical tables.
Also, it can be shown (local Laplace-de Moivre theorem, see for example Gnedenko [1]) that when n and np are both large—i.e. for npq sufficiently large—we have

P[X = x] = [n!/(x!(n − x)!)] p^x q^{n−x} ≅ [1/√(2πnpq)] exp[−(x − np)²/(2npq)]    (11.29)

meaning that the binomial distribution can be approximated by a Gaussian distribution. The r.h.s. of eq (11.29) is called the Gaussian approximation to the binomial distribution.
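The quality of the approximation (11.29) can be judged by comparing the two sides numerically; in the sketch below the values n = 100 and p = 0.4 are arbitrary illustrative choices.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def gauss_approx(x: int, n: int, p: float) -> float:
    """Right-hand side of eq (11.29): normal density with mean np and variance np(1-p)."""
    mu, var = n * p, n * p * (1.0 - p)
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

n, p = 100, 0.4
for x in (30, 35, 40, 45, 50):
    print(x, round(binomial_pmf(x, n, p), 5), round(gauss_approx(x, n, p), 5))
```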
Example 11.5 For purposes of illustration, let us take a probabilistic approach to a deterministic problem. Consider the sinusoidal deterministic signal x(t) = x_0 sin(ωt), where ω = 2π/T and T is the period. We ask, for any given value of amplitude x < x_0, what is the probability that the amplitude of our signal lies between x and x + dx?
From our previous discussion it is evident that we are asking for the pdf of the 'random' variable X, i.e. the amplitude of our signal. This can be obtained by calculating the time that the signal amplitude spends between x and x + dx during an entire period. Now, from x = x_0 sin(ωt) we get t = (1/ω) arcsin(x/x_0), which yields

dt = dx/(ω√(x_0² − x²))    (11.30)
Within a period T the amplitude passes in the interval from x to x + dx twice, so that the total amount of time that it spends in such an interval is 2dt; hence

2dt = 2dx/(ω√(x_0² − x²)) = (T/π) dx/√(x_0² − x²)    (11.31)

where the last expression holds because ωT = 2π. But, noting that 2dt/T is exactly p_X(x) dx, i.e. the probability that, within a period, the amplitude lies between x and x + dx, we get

p_X(x) = 1/(π√(x_0² − x²))    (11.32)
which is shown in Fig 11.1 for x_0 = 1. From this graph it can be noted that a sinusoidal wave spends more time near its peak values than it does near its abscissa axis (i.e. its mean value).
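This behaviour can also be checked by sampling the sine wave at many equally spaced instants within one period and comparing the fraction of time spent in a small amplitude interval with eq (11.32); the number of samples, the interval width and the test amplitudes below are arbitrary choices.

```python
import math

x0 = 1.0
N  = 200_000                                  # equally spaced samples over one period
samples = [x0 * math.sin(2.0 * math.pi * k / N) for k in range(N)]

def pdf(x: float, x0: float = 1.0) -> float:
    """Amplitude density of a sine wave, eq (11.32)."""
    return 1.0 / (math.pi * math.sqrt(x0 ** 2 - x ** 2))

dx = 0.02
for x in (0.0, 0.5, 0.9):
    frac = sum(1 for s in samples if x <= s < x + dx) / N   # fraction of time in [x, x + dx)
    print(x, frac, pdf(x + dx / 2) * dx)                    # empirical and analytical values agree
```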
11.4 Descriptors of random variable behaviour
From the discussion of preceding sections, it is evident that the complete description of the behaviour of a random variable is provided by its distribution function. However, a certain degree of information—although not complete in many cases—can be obtained by well-known descriptors such as the mean value, the standard deviation, etc. These familiar concepts are special cases of a series of descriptors called moments of a random variable. For a continuous random variable X, we define the first moment as

E[X] = ∫_{−∞}^{+∞} x p_X(x) dx    (11.33a)

while for a discrete random variable

E[X] = Σ_k x_k P[X = x_k]    (11.33b)
Equations (11.33a) or (11.33b) define what is usually called in engineering terms the mean (or also the 'expected value') of X and is indicated by the symbol µ_X. Similarly, the second moment is the expected value of X²—i.e. E[X²]—and has a special name, the mean squared value of X, which for a continuous random variable is written as

E[X²] = ∫_{−∞}^{+∞} x² p_X(x) dx

More generally, the mth moment of X is

E[X^m] = ∫_{−∞}^{+∞} x^m p_X(x) dx    (11.35b)

so that the mean and the mean squared value above are just particular cases of eq (11.35b).
When we first subtract its mean from the random variable and then calculate the expected value, we speak of central moments, i.e. the mth central moment is given by

E[(X − µ_X)^m] = ∫_{−∞}^{+∞} (x − µ_X)^m p_X(x) dx    (11.36)
In particular, the second central moment E[(X − µ_X)²] is well known and has a special name: the variance, usually indicated with the symbols σ_X² or Var[X]. Note that the variance can also be evaluated as

σ_X² = E[(X − µ_X)²] = E[X²] − µ_X²    (11.37)
which is just a particular case of the fact that central moments can be evaluated in terms of ordinary (noncentral) moments by virtue of the binomial theorem. In formulas we have

E[(X − µ_X)^m] = Σ_{k=0}^{m} [m!/(k!(m − k)!)] (−µ_X)^{m−k} E[X^k]    (11.38)
The square root of the variance, i.e. σ_X = √(Var[X]), is called the standard deviation and we commonly find the symbols σ_X or SD[X].
Example 11.6 Let us consider some of the pdfs introduced in previous sections and calculate their mean and variance. For the binomial distribution, for example, we can show that

E[X] = np,   Var[X] = np(1 − p)    (11.39)
The first of eqs (11.39) can be obtained as follows:

E[X] = Σ_{x=0}^{n} x [n!/(x!(n − x)!)] p^x q^{n−x} = np Σ_{x=1}^{n} [(n − 1)!/((x − 1)!(n − x)!)] p^{x−1} q^{n−x} = np

where the last equality holds because the summation represents the sum of all the ordinates of the binomial distribution (with parameters n − 1 and p) and must be equal to 1 for the normalization condition. For the second of eqs (11.39) we can use eq (11.37), so that we only need the term E[X²]. This is given by

E[X²] = Σ_{x=0}^{n} x² [n!/(x!(n − x)!)] p^x q^{n−x} = np[(n − 1)p + 1]

so that

Var[X] = E[X²] − µ_X² = np[(n − 1)p + 1] − n²p² = np(1 − p)
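Both results in eqs (11.39) can be verified numerically from the defining sums and from eq (11.37); the values of n and p in the sketch below are arbitrary.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

n, p = 12, 0.3
mean    = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))        # E[X], eq (11.33b)
mean_sq = sum(x ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))   # E[X^2]
var     = mean_sq - mean ** 2                                         # eq (11.37)

print(mean, n * p)                  # 3.6  3.6
print(var, n * p * (1.0 - p))       # 2.52 2.52, i.e. np(1 - p)
```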