Applied Structural and Mechanical Vibrations: Theory, Methods and Measuring Instrumentation

11 Probability and statistics: preliminaries to random vibrations

11.1 Introduction

This chapter covers some fundamental concepts of probability and statistics. Rather than attempting a complete treatment of the subject (more detailed discussions can be found in the references and in the appendices), the idea is to introduce and discuss some basic concepts with the intention of following a continuous line of reasoning from simple to more complex topics and the hope of giving the reader a useful source of reference for a clear understanding of this text in the first place, but of other more specialized books as well.
In everyday conversation, probability is a loosely defined term employed to indicate the measure of one's belief in the occurrence of a future event when this event may or may not occur. Moreover, we use this word by indirectly making some common assumptions: (1) probabilities near 1 (100%) indicate that the event is extremely likely to occur, (2) probabilities near zero indicate that the event is very unlikely to occur and (3) probabilities near 0.5 (50%) indicate a 'fair chance', i.e. that the event is just as likely to occur as not.
If we try to be more specific, we can consider the way in which we assign probabilities to events and note that, historically, three main approaches have developed through the centuries. We can call them the personal approach, the relative frequency approach and the classical approach. The personal approach reflects a personal opinion and, as such, is always applicable because anyone can have a personal opinion about anything. However, it is not very fruitful for our purposes. The relative frequency approach is more objective and pertains to cases in which an 'experiment'
can be repeated many times and the results observed; P[A], the probability of occurrence of event A, is given as
P[A] = n_A/n    (11.1)
where n_A is the number of times that event A occurred and n is the total number of times that the experiment was run. This approach is surely useful in itself but, obviously, cannot deal with a one-shot situation and, in any case, is a definition of an a posteriori probability (i.e. we must perform the experiment to determine P[A]). The idea behind this definition is that the ratio on the r.h.s. of eq (11.1) is almost constant for sufficiently large values of n.
Finally, the classical approach can be used when it can be reasonably assumed that the possible outcomes of the experiment are equally likely; then
P[A] = n(A)/n(S)    (11.2)
where n(A) is the number of ways in which outcome A can occur and n(S) is the number of ways in which the experiment can proceed. Note that in this case we do not really need to perform the experiment because eq (11.2) defines an a priori probability. A typical example is the tossing of a fair coin; without an experiment we can say that n(S) = 2 (head or tail) and the probability of, say, a head is P[head] = 1/2. Pictorially (and also for historical reasons), we may view eq (11.2) as the 'gambler's definition' of probability.
However, consider the following simple and classical 'meeting problem': two people decide to meet at a given place anytime between noon and 1 p.m. The one who arrives first is obliged to wait 20 min and then leave. If their arrival times are independent, what is the probability that they actually meet? The answer is 5/9 (as the reader is invited to verify) but the point is that this problem cannot be tackled with the definitions of probability given above.
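As a plausibility check, the 5/9 result can also be verified numerically. The following Python sketch (not part of the original text; the number of trials is an arbitrary choice) simulates the two independent arrival times and counts how often they fall within 20 minutes of each other.

```python
import random

def meeting_probability(trials: int = 1_000_000, wait: float = 20.0) -> float:
    """Estimate the probability that two arrivals, uniform over a 60-minute
    window and independent of each other, occur within 'wait' minutes."""
    meetings = 0
    for _ in range(trials):
        t1 = random.uniform(0.0, 60.0)   # first arrival (minutes after noon)
        t2 = random.uniform(0.0, 60.0)   # second arrival
        if abs(t1 - t2) <= wait:
            meetings += 1
    return meetings / trials

print(meeting_probability())             # ~0.555, i.e. close to 5/9
```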
We will not pursue the subject here, but it is evident that the definitions above cannot deal with a large number of problems of great interest. As a matter of fact, a detailed analysis of both definitions (11.1) and (11.2)—because of their intrinsic limitations, logical flaws and lack of stringency—shows that they are inadequate to form a solid basis for a more rigorous mathematical theory of probability. Also, the von Mises definition, which extends the relative frequency approach by writing
P[A] = lim_{n→∞} (n_A/n)    (11.3)

suffers from serious limitations and runs into insurmountable logical difficulties.
The solution to these difficulties was given by the axiomatic theory of probability introduced by Kolmogorov. Before introducing this theory, however, it is worth considering some basic ideas which may be useful as guidelines for Kolmogorov's abstract formulation.
If we consider eq (11.2), we note that, in order to determine what is 'probable', we must first determine what is 'possible'; this means that we have to make a list of possibilities for the experiment. Some common definitions are as follows: a possible outcome of our experiment is called an event and we can distinguish between simple events, which can happen only in one way, and compound events, which can happen in more than one distinct way. In the rolling of a die, for example, a simple event is the observation of a 6, whereas a compound event is the observation of an even number (2, 4 or 6). In other words, simple events cannot be decomposed and are also called sample points. The set of all possible sample points is called a sample space. Now, adopting the notation of elementary set theory, we view the sample
space as a set W whose elements E_j are the sample points. If the sample space is discrete, i.e. contains a finite or countable number of sample points, any compound event A is a subset of W and can be viewed as a collection of two or more sample points, i.e. as the 'union' of two or more sample points. In the die-rolling experiment above, for example, we can write

A = E_2 ∪ E_4 ∪ E_6

where we call A the event 'observation of an even number', E_2 the sample point 'observation of a 2' and so on. In this case, it is evident that P[A] = P[E_2 ∪ E_4 ∪ E_6] and, since E_2, E_4 and E_6 are mutually exclusive,

P[A] = P[E_2] + P[E_4] + P[E_6]    (11.4a)

The natural extension of eq (11.4a) is
the addition rule P[∪_j E_j] = Σ_j P[E_j], which holds for any finite (or countably infinite) collection of mutually exclusive events. For two events B and C which are not necessarily mutually exclusive we have instead

P[B ∪ C] = P[B] + P[C] − P[B ∩ C]    (11.4d)

where P[B ∩ C] is often called the compound probability, i.e. the probability that events B and C occur simultaneously. (Note that one often finds also the symbols A + B for A ∪ B and AB for A ∩ B.) Again, in the rolling of a fair die, for example, let B be the observation of an even number and C the observation of a number greater than 3; then P[B] = P[C] = 1/2, P[B ∩ C] = P[{4, 6}] = 1/3 and eq (11.4d) gives P[B ∪ C] = 1/2 + 1/2 − 1/3 = 2/3.
For three non-mutually exclusive sets, it is not difficult to extend eq (11.4d) to

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]    (11.4e)

as the reader is invited to verify.
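One way to carry out this verification is by direct enumeration on a small sample space. The sketch below is an illustration added here, not taken from the text; the three die events A, B and C are arbitrary choices.

```python
from fractions import Fraction

sample_space = set(range(1, 7))          # fair die: outcomes 1..6, equally likely

def prob(event):
    """Classical probability, eq (11.2): favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}          # even number
B = {4, 5, 6}          # number greater than 3
C = {3, 6}             # multiple of 3

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))
print(lhs, rhs, lhs == rhs)              # 5/6 5/6 True, as predicted by eq (11.4e)
```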
Incidentally, it is evident that the method that we are following requires counting; for example, the counting of sample points and/or a complete itemization of equiprobable sets of sample points. For large sample spaces this may not be an easy task. Fortunately, aid comes from combinatorial analysis, from which we know that the number of permutations (arrangements of objects in a definite order) of n distinct objects taken r at a time is given by

P_{n,r} = n!/(n − r)!    (11.5)

while the number of combinations (arrangements of objects without regard to order) of n distinct objects taken r at a time is

C_{n,r} = n!/[r!(n − r)!]    (11.6)
For example, if n = 3 (objects a, b and c) and r = 2, the fact that the number of combinations is less than the number of permutations is evident if one thinks that in a permutation the arrangement of objects {a, b} is considered different from the arrangement {b, a}, whereas in a combination they count as one single arrangement.
These tools simplify the counting considerably. For example, suppose that a big company has hired 15 new engineers for the same job in different plants. If a particular plant has four vacancies, in how many ways can they fill these positions? The answer is now straightforward and is given by C_{15,4} = 1365. Moreover, note also that the calculation of factorials can often be made easier by using Stirling's formula, i.e. n! ≅ √(2πn)(n/e)^n, which results in errors smaller than 1% for n ≥ 9.
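For readers who want to reproduce these numbers, the following snippet evaluates eqs (11.5) and (11.6) and Stirling's formula; the values of n used for the error check are arbitrary, while C_{15,4} = 1365 is the hiring example above.

```python
import math

def permutations(n: int, r: int) -> int:
    """Ordered arrangements of n distinct objects taken r at a time, eq (11.5)."""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n: int, r: int) -> int:
    """Unordered arrangements of n distinct objects taken r at a time, eq (11.6)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations(3, 2), combinations(3, 2))   # 6 3: {a,b} and {b,a} count once as a combination
print(combinations(15, 4))                      # 1365, the hiring example

def stirling(n: int) -> float:
    """Stirling's approximation n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

for n in (5, 9, 15):
    exact = math.factorial(n)
    rel_err = abs(exact - stirling(n)) / exact
    print(n, exact, rel_err)                    # relative error drops below 1% from n = 9 onwards
```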
Returning now to our main discussion, we can make a final comment before introducing the axiomatic theory of probability: the fact that two events B and C are mutually exclusive is formalized in the language of sets as B ∩ C = Ø, where Ø is the empty set. So, we need to include this event in the sample space and require that P[Ø] = 0. By so doing, we obtain the expected result that eq (11.4d) reduces to the sum P[B] + P[C] whenever events B and C are mutually exclusive. In probability terminology, Ø is called the 'impossible event'.
11.2.1 Probability—axiomatic formulation and some fundamental results
We define a probability space as a triplet (W, ℱ, P) where:

1 W is a set whose elements are called elementary events.
2 ℱ is a σ-algebra of subsets of W which are called events.
3 P is a probability function, i.e. a real-valued function with domain ℱ which satisfies the following axioms:
  1 P[A] ≥ 0 for every event A ∈ ℱ;
  2 P[W] = 1;
  3 If A_j ∈ ℱ and A_i ∩ A_j = Ø for every pair of indexes i ≠ j (j = 1, 2, 3, …), then P[∪_j A_j] = Σ_j P[A_j].
Two observations can be made immediately. First—although it may not seem obvious—the axiomatic definition includes as particular cases both the classical and the relative frequency definitions of probability without suffering their limitations; second, this definition does not tell us what value of probability to assign to a given event. This is in no way a limitation of this definition but simply means that we will have to model our experiment in some way in order to obtain values for the probability of events. In fact, many problems of interest deal with sets of identical events which are not equally likely (for example, the rolling of a biased die).
Let us now introduce two other definitions of practical importance: conditional probability and the independence of events. Intuitively, we can argue that the probability of an event can vary depending upon the occurrence or nonoccurrence of one or more related events: in fact, it is different to ask in the die-rolling experiment 'What is the probability of a 6?' or 'What is the probability of a 6 given that an even number has fallen?' The answer to the first question is 1/6 while the answer to the second question is 1/3. This is the concept of conditional probability, i.e. the probability of an event A
given that an event B has already occurred. The symbol for conditional probability is P[A|B] and its definition is
P[A|B] = P[A ∩ B]/P[B]    (11.7)
provided that P[B] ≠ 0. It is not difficult to see that, for a given probability space (W, ℱ, P) and a fixed event B, the function P[·|B] satisfies the three axioms above and is a probability function in its own right. Equation (11.7) yields immediately the multiplication rule for probabilities, i.e.
P[A ∩ B] = P[A|B] P[B]    (11.8a)

which can be generalized to a number of events as follows:

P[A_1 ∩ A_2 ∩ … ∩ A_n] = P[A_1] P[A_2|A_1] P[A_3|A_1 ∩ A_2] … P[A_n|A_1 ∩ A_2 ∩ … ∩ A_{n−1}]    (11.8b)
If the occurrence of event B has no effect on the probability assigned to an event A, then A and B are said to be independent and we can express this fact in terms of conditional probability as

P[A|B] = P[A]    (11.9a)

or, equivalently,
P[B|A] = P[B]    (11.9b)

Clearly, two mutually exclusive events are not independent because, from eq (11.7), we have P[A|B] = 0 when A ∩ B = Ø. Also, if A and B are two independent events, we get from eq (11.7)

P[A ∩ B] = P[A] P[B]    (11.10a)

which is referred to as the multiplication theorem for independent events. (Note that some authors give eq (11.10a) as the definition of independent
events.) For n mutually (or collectively) independent events eq (11.8b) yields

P[A_1 ∩ A_2 ∩ … ∩ A_n] = P[A_1] P[A_2] … P[A_n]    (11.10b)
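The die-rolling figures quoted earlier (1/6 versus 1/3) and the multiplication theorem (11.10a) can be checked by direct enumeration, as in the following sketch; the event 'number less than 3' used for the independence check is an arbitrary illustrative choice, not one discussed in the text.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                    # fair die

def P(event):
    return Fraction(len(event & outcomes), len(outcomes))

def P_cond(A, B):
    """Conditional probability P[A|B] = P[A ∩ B]/P[B], eq (11.7)."""
    return P(A & B) / P(B)

six  = {6}
even = {2, 4, 6}
print(P(six), P_cond(six, even))               # 1/6 and 1/3, as stated above

# Independence check, eq (11.10a): P[A ∩ B] = P[A] P[B]
low = {1, 2}                                   # 'number less than 3' (illustrative event)
print(P(even & low) == P(even) * P(low))       # True: these two events happen to be independent
```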
Example 11.1 Consider a lottery with eight numbers (1–8) and let E_1, E_2, …, E_8, respectively, be the simple events of extraction of 1, extraction of 2 and so on, each with probability 1/8. Three compound events A_1, A_2 and A_3 built from these sample points can satisfy the pairwise condition P[A_i ∩ A_j] = P[A_i] P[A_j] for every pair i ≠ j and yet fail to satisfy eq (11.10b), meaning that the three events are not mutually, or collectively, independent.
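The specific events of Example 11.1 are not reproduced here; the following sketch illustrates the same phenomenon with three hypothetical events on the eight equally likely numbers (chosen only for illustration) that satisfy the pairwise product rule but violate eq (11.10b).

```python
from fractions import Fraction
from itertools import combinations

numbers = set(range(1, 9))                      # lottery numbers 1..8, equally likely

def P(event):
    return Fraction(len(event & numbers), len(numbers))

# Hypothetical events, each of probability 1/2 (illustrative choices, not from the text)
A1 = {1, 2, 3, 4}
A2 = {1, 2, 5, 6}
A3 = {1, 2, 7, 8}

# Pairwise independence: P[Ai ∩ Aj] = P[Ai] P[Aj] for every pair
for X, Y in combinations((A1, A2, A3), 2):
    print(P(X & Y) == P(X) * P(Y))              # True, True, True

# ...but not collective independence: P[A1 ∩ A2 ∩ A3] differs from P[A1] P[A2] P[A3]
print(P(A1 & A2 & A3), P(A1) * P(A2) * P(A3))   # 1/4 versus 1/8
```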
Another important result is known as the total probability formula. Let A_1, A_2, …, A_n be n mutually exclusive events such that A_1 ∪ A_2 ∪ … ∪ A_n = W, where W is the sample space. Then, a generic event B can be expressed as

B = B ∩ W = (B ∩ A_1) ∪ (B ∩ A_2) ∪ … ∪ (B ∩ A_n)    (11.11)

where the n events B ∩ A_j are mutually exclusive. Owing to the third axiom of probability, this implies

P[B] = Σ_{j=1}^{n} P[B ∩ A_j]

so that, by using the multiplication rule (11.8a), we get the total probability formula

P[B] = Σ_{j=1}^{n} P[B|A_j] P[A_j]    (11.12)

which remains true for a countably infinite collection of events A_j (n → ∞).
With the same assumptions as above on the events A_1, A_2, …, A_n, let us now consider a particular event A_k; the definition of conditional probability yields

P[A_k|B] = P[A_k ∩ B]/P[B] = P[A_k ∩ B]/(Σ_{j=1}^{n} P[B|A_j] P[A_j])    (11.13)
where eq (11.12) has been taken into account. Also, by virtue of eq (11.8a) we can write P[A_k ∩ B] = P[B|A_k] P[A_k], so that substituting in eq (11.13) we get

P[A_k|B] = P[B|A_k] P[A_k]/(Σ_{j=1}^{n} P[B|A_j] P[A_j])    (11.14)
which is known as Bayes' formula and deserves some comments. First, the formula is true provided that P[B] ≠ 0. Second, eq (11.14) is particularly useful for experiments
consisting of stages. Typically, the A_j are events defined in terms of a first stage (or, otherwise, the P[A_j] are known for some reason), while B is an event defined in terms of the whole experiment including a second stage; asking for P[A_k|B] is then, in a sense, 'backward': we ask for the probability of an event defined at the first stage conditioned by what happens in a later stage. In Bayes' formula this probability is given in terms of the 'natural' conditioning, i.e. conditioning on what happens at the first stage of the experiment. This is why the P[A_j] are called the a priori (or prior) probabilities, whereas P[A_k|B] is called the a posteriori (posterior or inverse) probability. The advantage of this approach is to be able to modify the original predictions by incorporating new data. Obviously, the initial hypotheses play an important role in this case; if the initial assumptions are based on an insufficient knowledge of the mechanism of the process, the prior probabilities are no better than reasonable guesses.
Example 11.2 Among voters in a certain area, 40% support party 1 and 60% support party 2. Additional research indicates that a certain election issue is favoured by 30% of supporters of party 1 and by 70% of supporters of party 2. One person at random from that area—when asked—says that he/she favours the issue in question. What is the probability that he/she is a supporter of party 2? Now, let

• A_1 be the event that a person supports party 1, so that P[A_1] = 0.4;
• A_2 be the event that a person supports party 2, so that P[A_2] = 0.6;
• B be the event that a person at random in the area favours the issue in question.
Prior knowledge (the results of the research) indicates that P[B|A_1] = 0.3 and P[B|A_2] = 0.7. The problem asks for the a posteriori probability P[A_2|B], i.e. the probability that the person who was asked supports party 2 given the fact that he/she favours that specific election issue. From Bayes' formula we get

P[A_2|B] = P[B|A_2] P[A_2]/(P[B|A_1] P[A_1] + P[B|A_2] P[A_2]) = 0.42/0.54 ≅ 0.78

Then, obviously, we can also infer that P[A_1|B] = 1 − P[A_2|B] ≅ 0.22.
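The arithmetic of this example is easy to reproduce; the snippet below simply evaluates eqs (11.12) and (11.14) for the numbers given above.

```python
def bayes(prior, likelihood, k):
    """Posterior P[A_k|B] from eq (11.14), given priors P[A_j] and likelihoods P[B|A_j]."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))   # total probability P[B], eq (11.12)
    return prior[k] * likelihood[k] / evidence

prior      = [0.4, 0.6]    # P[A_1], P[A_2]: party support in the area
likelihood = [0.3, 0.7]    # P[B|A_1], P[B|A_2]: fraction favouring the issue

print(bayes(prior, likelihood, 1))   # P[A_2|B] = 0.42/0.54 ≈ 0.78
print(bayes(prior, likelihood, 0))   # P[A_1|B] ≈ 0.22
```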
11.3 Random variables, probability distribution functions and probability density functions
Events of major interest in science and engineering are those identified by numbers. Moreover—since we assume that the reader is already familiar with the term 'variable'—we can state that a random variable is a real variable whose observed values are determined by chance or by a number of causes beyond our control which defy any attempt at a deterministic description. In this regard, it is important to note that the engineer's and applied scientist's approach is not so much to ask whether a certain quantity is a random variable or not (which is often debatable), but to ask whether that quantity can be modelled as a random variable and if this approach leads to meaningful results.
In mathematical terms, let x be any real number; then a random variable on the probability space (W, ℱ, P) is a function X: W → R (R is the set of real numbers) such that the sets

B_x = {w ∈ W : X(w) ≤ x}

are events, i.e. B_x ∈ ℱ. In words, let X be a real-valued function defined on W; given a real number x, we call B_x the set of all elementary events w for which X(w) ≤ x. If, for every x, the sets B_x belong to the σ-algebra ℱ, then X is a (one-dimensional) random variable.
The above definition may seem a bit intricate at first glance, but a little thought will show that it provides us precisely with what we need. In fact, we can now assign a definite meaning to the expression P[B_x], i.e. the probability that the random variable X corresponding to a given experiment will assume a value less than or equal to x. It is then straightforward, for a given random variable X, to define the function F_X(x) as

F_X(x) = P[B_x] = P[X ≤ x]    (11.15)
which is called the cumulative distribution function (cdf, or the distribution function) of the random variable X. From the definition, the following properties can be easily proved:
0 ≤ F_X(x) ≤ 1,   F_X(−∞) = 0,   F_X(+∞) = 1,   F_X(x_1) ≤ F_X(x_2)    (11.16)

where x_1 and x_2 are any two real numbers such that x_1 ≤ x_2. In other words, distribution functions are monotonically non-decreasing functions which start at zero for x → −∞ and increase to unity for x → +∞. It should be noted that every random variable defines uniquely its distribution function, but a given distribution function corresponds to an arbitrary number of different random variables. Moreover, the probabilistic properties of a random variable can be completely characterized by its distribution function.
Among all possible random variables, an important distinction can be made between discrete and continuous random variables. The term discrete means that the random variable can assume only a finite or countably infinite number of distinct possible values x_1, x_2, x_3, …. Then, a complete description can be obtained by knowing the probabilities P[X = x_k] for k = 1, 2, 3, … and by defining the distribution function as

F_X(x) = Σ_{x_k ≤ x} P[X = x_k]

a 'staircase' function with jump discontinuities occurring at any point x_k. A typical and simple example is
provided by the die-rolling experiment where X is the numerical value observed in the rolling of the die. In this case x_1 = 1, x_2 = 2, etc. and P[X = x_k] = 1/6 for every k = 1, 2, …, 6. Then

F_X(x) = 0 for x < 1,   F_X(x) = k/6 for k ≤ x < k + 1 (k = 1, 2, …, 5),   F_X(x) = 1 for x ≥ 6
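The staircase character of this distribution function can be seen by evaluating F_X(x) = P[X ≤ x] directly from the sample-point probabilities; the evaluation points in the sketch below are arbitrary.

```python
from fractions import Fraction

values = {k: Fraction(1, 6) for k in range(1, 7)}   # P[X = k] = 1/6 for a fair die

def cdf(x: float) -> Fraction:
    """Distribution function F_X(x) = P[X <= x], eq (11.15), for a discrete random variable."""
    return sum((p for k, p in values.items() if k <= x), Fraction(0))

for x in (0.5, 1, 2.3, 5.999, 6, 10):
    print(x, cdf(x))   # 0, 1/6, 1/3, 5/6, 1, 1: a non-decreasing staircase from 0 to 1
```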
A continuous random variable, on the other hand, can assume any value in some interval of the real line. For a large and important class of random variables there exists a certain non-negative function p_X(x) which satisfies the relationship

F_X(x) = ∫_{−∞}^{x} p_X(η) dη    (11.19)

where p_X(x) is called the probability density function (pdf) and η is a dummy
variable of integration. The main properties of p_X(x) can be summarized as follows:

p_X(x) = dF_X(x)/dx ≥ 0,   ∫_{−∞}^{+∞} p_X(x) dx = 1    (11.20)

The second property is often called the normalization condition and is equivalent to F_X(+∞) = 1. Also, it is important to notice a fundamental difference with respect to discrete random variables: the probability that the
continuous random variable X assumes a specific value x is zero and probabilities must be defined over an interval. Specifically, if p_X(x) is continuous at x we have

P[x < X ≤ x + dx] = p_X(x) dx    (11.21a)

and, obviously,

P[x_1 < X ≤ x_2] = ∫_{x_1}^{x_2} p_X(x) dx = F_X(x_2) − F_X(x_1)    (11.21b)
Example 11.3 Discrete random variables—binomial, Poisson and geometric distributions. Let us consider a fixed number (n) of typical 'Bernoulli trials'. A 'Bernoulli trial' is an experiment with only two possible outcomes, which are usually called 'success' and 'failure'. Furthermore, the probability of success is p and does not change from trial to trial, the probability of failure is q = 1 − p and the trials are independent. The discrete random variable of interest X is the number of successes during the n trials. It is shown in every book on statistics that the probability of having x successes is given by

p_X(x) = P[X = x] = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}    (11.22)

where x = 0, 1, 2, …, n and 0 < p < 1. We say that a random variable has a binomial distribution with parameters n and p when its density function is given by eq (11.22).
Now, suppose that p is very small and suppose that n becomes very large in such a way that the product np is equal to a constant λ. In mathematical terms, provided that np = λ remains constant, we can let n → ∞ and p → 0, and then

lim_{n→∞} [n!/(x!(n − x)!)] p^x (1 − p)^{n−x} = (λ^x/x!) e^{−λ}

because lim_{n→∞} (1 − λ/n)^n = e^{−λ}. A random variable X with a pdf given by

p_X(x) = (λ^x/x!) e^{−λ}   (x = 0, 1, 2, …)    (11.23)
is said to have a Poisson distribution with parameter λ. Equation (11.23) is a good approximation for the binomial equation (11.22) when n is large and p is small. Poisson-distributed random variables arise in a number of situations, the most common of which concern 'rare' events, i.e. events with a small probability of occurrence. The parameter λ then represents the average number of occurrences of the event per measurement unit (i.e. a unit of time, length, area, space, etc.). For example, knowing that at a certain intersection we have on average 1.7 car accidents per month, the probability of zero accidents in a month is given by P[X = 0] = e^{−1.7} ≅ 0.18. The fact that the number of accidents follows a Poisson distribution can be roughly established as follows. Divide a month into n intervals, each of which is so small that at most one accident can occur, with probability p. Then, during each interval (if the occurrence of accidents can be considered as independent from interval to interval) we have a Bernoulli trial where the probability of 'success' p is relatively small if n is large, with np equal to the average number of accidents per month. Note that we do not need to know the values of n and/or p (which can be, to a certain extent, arbitrary), but it is sufficient to verify that the underlying assumptions of the Poisson distribution hold.
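A short numerical check of both the accident calculation and the binomial-to-Poisson limit is given below; the subdivision of the month into n = 1000 intervals is an arbitrary choice made only to illustrate the limiting argument.

```python
import math

lam = 1.7                                   # average accidents per month

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability, eq (11.23): P[X = x] = lam**x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

print(poisson_pmf(0, lam))                  # ≈ 0.183, probability of no accidents in a month

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

# Many small intervals with np = 1.7: the binomial values approach the Poisson ones
n = 1000
p = lam / n
for x in range(4):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, lam))   # the two columns nearly coincide
```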
If now, in a series of Bernoulli trials, we consider X to be the number of trials before the first success occurs we are, broadly speaking, dealing with the same problem as in the first case but we are asking a different question (the number of trials is not fixed in this case). It is not difficult to show that this circumstance leads to the geometric distribution, which is written
p_X(x) = P[X = x] = p(1 − p)^x   (x = 0, 1, 2, …)

Example 11.4 Continuous random variables—the Gaussian (normal) distribution. This distribution was originally obtained by de Moivre and was developed about a century later by Gauss and Laplace. Its importance is due to the central limit theorem, which we will discuss in a later section. A random variable X is said to have a Gaussian (or normal) distribution with parameters µ and σ² if its pdf is

p_X(x) = [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)]

Introducing the standardized variable Z = (X − µ)/σ, the distribution function of X can be expressed in terms of the standard Gaussian distribution function Φ(z) ≡ F_Z(z) as

F_X(x) = F_Z((x − µ)/σ) = Φ((x − µ)/σ)    (11.28)

Equation (11.28) has been given because either F_Z(z) or Φ(z) are commonly found in statistical tables.
Also, it can be shown (local Laplace-de Moivre theorem, see for example Gnedenko [1]) that when n and np are both large—i.e. for npq sufficiently large—we have

P[X = x] = [n!/(x!(n − x)!)] p^x q^{n−x} ≅ [1/√(2πnpq)] exp[−(x − np)²/(2npq)]    (11.29)

meaning that the binomial distribution can be approximated by a Gaussian distribution. The r.h.s. of eq (11.29) is called the Gaussian approximation to the binomial distribution.
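The quality of the approximation (11.29) can be judged by comparing the two sides numerically; in the sketch below the values n = 100 and p = 0.4 are arbitrary illustrative choices.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def gauss_approx(x: int, n: int, p: float) -> float:
    """Right-hand side of eq (11.29): normal density with mean np and variance np(1-p)."""
    mu, var = n * p, n * p * (1.0 - p)
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

n, p = 100, 0.4
for x in (30, 35, 40, 45, 50):
    print(x, round(binomial_pmf(x, n, p), 5), round(gauss_approx(x, n, p), 5))
```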
Example 11.5 For purposes of illustration, let us take a probabilistic approach to a deterministic problem. Consider the sinusoidal deterministic signal x(t) = x_0 sin(ωt), where ω = 2π/T and T is the period. We ask, for any given value of amplitude x < x_0, what is the probability that the amplitude of our signal lies between x and x + dx?
From our previous discussion it is evident that we are asking for the pdf of the 'random' variable X, i.e. the amplitude of our signal. This can be obtained by calculating the time that the signal amplitude spends between x and x + dx during an entire period. Now, from x = x_0 sin(ωt) we get t = (1/ω) arcsin(x/x_0), which yields

dt = dx/(ω√(x_0² − x²))    (11.30)
Within a period T the amplitude passes in the interval from x to x + dx twice, so that the total amount of time that it spends in such an interval is 2dt; hence

2dt = 2dx/(ω√(x_0² − x²)) = (T/π) dx/√(x_0² − x²)    (11.31)

where the last expression holds because ωT = 2π. But, noting that 2dt/T is exactly p_X(x) dx, i.e. the probability that, within a period, the amplitude lies between x and x + dx, we get

p_X(x) = 1/(π√(x_0² − x²))    (11.32)
which is shown in Fig 11.1 for x_0 = 1. From this graph it can be noted that a sinusoidal wave spends more time near its peak values than it does near its abscissa axis (i.e. its mean value).
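This behaviour can also be checked by sampling the sine wave at many equally spaced instants within one period and comparing the fraction of time spent in a small amplitude interval with eq (11.32); the number of samples, the interval width and the test amplitudes below are arbitrary choices.

```python
import math

x0 = 1.0
N  = 200_000                                  # equally spaced samples over one period
samples = [x0 * math.sin(2.0 * math.pi * k / N) for k in range(N)]

def pdf(x: float, x0: float = 1.0) -> float:
    """Amplitude density of a sine wave, eq (11.32)."""
    return 1.0 / (math.pi * math.sqrt(x0 ** 2 - x ** 2))

dx = 0.02
for x in (0.0, 0.5, 0.9):
    frac = sum(1 for s in samples if x <= s < x + dx) / N   # fraction of time in [x, x + dx)
    print(x, frac, pdf(x + dx / 2) * dx)                    # empirical and analytical values agree
```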
11.4 Descriptors of random variable behaviour
From the discussion of preceding sections, it is evident that the complete description of the behaviour of a random variable is provided by its distribution function. However, a certain degree of information—although not complete in many cases—can be obtained by well-known descriptors such as the mean value, the standard deviation, etc. These familiar concepts are special cases of a series of descriptors called moments of a random variable. For a continuous random variable X, we define the first moment as

E[X] = ∫_{−∞}^{+∞} x p_X(x) dx    (11.33a)

while for a discrete random variable

E[X] = Σ_k x_k P[X = x_k]    (11.33b)
Equations (11.33a) or (11.33b) define what is usually called in engineering terms the mean (or also the 'expected value') of X and is indicated by the symbol µ_X. Similarly, the second moment is the expected value of X²—i.e. E[X²]—and has a special name, the mean squared value of X, which for a continuous random variable is written as

E[X²] = ∫_{−∞}^{+∞} x² p_X(x) dx

More generally, the mth moment of X is

E[X^m] = ∫_{−∞}^{+∞} x^m p_X(x) dx    (11.35b)

so that the mean and the mean squared value above are just particular cases of eq (11.35b).
When we first subtract its mean from the random variable and then calculate the expected value, we speak of central moments, i.e. the mth central moment is given by

E[(X − µ_X)^m] = ∫_{−∞}^{+∞} (x − µ_X)^m p_X(x) dx    (11.36)
In particular, the second central moment E[(X − µ_X)²] is well known and has a special name: the variance, usually indicated with the symbols σ_X² or Var[X]. Note that the variance can also be evaluated as

σ_X² = E[(X − µ_X)²] = E[X²] − µ_X²    (11.37)
which is just a particular case of the fact that central moments can be evaluated in terms of ordinary (noncentral) moments by virtue of the binomial theorem. In formulas we have

E[(X − µ_X)^m] = Σ_{k=0}^{m} [m!/(k!(m − k)!)] (−µ_X)^{m−k} E[X^k]    (11.38)
The square root of the variance, i.e. σ_X = √(Var[X]), is called the standard deviation and we commonly find the symbols σ_X or SD[X].
Example 11.6 Let us consider some of the pdfs introduced in previous sections and calculate their mean and variance. For the binomial distribution, for example, we can show that

E[X] = np,   Var[X] = np(1 − p)    (11.39)
The first of eqs (11.39) can be obtained as follows:

E[X] = Σ_{x=0}^{n} x [n!/(x!(n − x)!)] p^x q^{n−x} = np Σ_{x=1}^{n} [(n − 1)!/((x − 1)!(n − x)!)] p^{x−1} q^{n−x} = np

where the last equality holds because the summation represents the sum of all the ordinates of the binomial distribution (with parameters n − 1 and p) and must be equal to 1 for the normalization condition. For the second of eqs (11.39) we can use eq (11.37), so that we only need the term E[X²]. This is given by

E[X²] = Σ_{x=0}^{n} x² [n!/(x!(n − x)!)] p^x q^{n−x} = np[(n − 1)p + 1]

so that

Var[X] = E[X²] − µ_X² = np[(n − 1)p + 1] − n²p² = np(1 − p)
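Both results in eqs (11.39) can be verified numerically from the defining sums and from eq (11.37); the values of n and p in the sketch below are arbitrary.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

n, p = 12, 0.3
mean    = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))        # E[X], eq (11.33b)
mean_sq = sum(x ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))   # E[X^2]
var     = mean_sq - mean ** 2                                         # eq (11.37)

print(mean, n * p)                  # 3.6  3.6
print(var, n * p * (1.0 - p))       # 2.52 2.52, i.e. np(1 - p)
```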