Nguyen An Khuong, Huynh Tuong Nguyen
Contents: Randomness; Sampling with R; Probability; Probability Rules; Probability with R; Discrete RVs; Some Discrete Probability Models (Geometric Model, Binomial Model); The built-in distributions in R (Densities, Cdf, Quantiles, Random numbers); References
Chapter 7
Discrete Probability with R
Discrete Structures for Computer Science (CO1007) on
5 Probability calculations and combinatorics with R
6 Discrete Random variables
7 Some Discrete Probability Models
Motivations
• Gambling
• Real life problems
• Computer Science: cryptology, coding theory, algorithmic complexity, etc.
Randomness
Which of these are random phenomena?
• The number you get when rolling a fair die
• The winning number of the lottery's special prize (random by law!)
• Your blood type (No!)
• Whether you hit a red light on the way to school
• The traffic light itself is not random: it runs on a timer
• But the pattern of your riding is random
So what is special about randomness?
Random outcomes cannot be predicted one at a time, but in the long run they are predictable: each outcome has a stable relative frequency
(the fraction of times the event occurs when the trial is repeated over and over)
Randomness in Statistics
• Randomness and probability: central to statistics
• Empirical fact: Most experiments and investigations are not
perfectly reproducible
• The degree of irreproducibility may vary:
• Some experiments in physics may yield data that are accurate
to many decimal places,
• whereas data on biological systems are typically much less
reliable
• View of data as something coming from a statistical
distribution: vital to understanding statistical methods
• We outline the basic ideas of probability and the functions
that R has for random sampling and handling of theoretical
distributions
Random Numbers with R
• Much of the earliest work in probability theory was about
games and gambling issues, based on symmetry
considerations
• The basic notion then is that of a random sample: dealing
from a well-shuffled pack of cards or picking numbered balls
from a well-stirred urn
• In R, we can simulate these situations with the sample
function
• To pick five numbers at random from the set
1:40, we can write
> sample(1:40,5)
[1] 4 30 28 40 13
Sample function
• The first argument (x) is a vector of values to be sampled
• The second (size) is the sample size
• Actually, sample(40, 5) would suffice, since a single number is
interpreted as the length of a sequence of integers (here 1:40)
• Notice that the default behavior of sample is sampling
without replacement
• That is, the sample will not contain the same number twice,
and size cannot be larger than the length of the
vector being sampled
• If we want sampling with replacement, we need to add
the argument replace = TRUE
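A quick sketch of these rules in R (the seed 42 and the variable names are arbitrary choices for illustration):

```r
set.seed(42)                       # arbitrary seed, for reproducibility

# Default: sampling WITHOUT replacement, so values cannot repeat
x <- sample(40, 5)                 # same as sample(1:40, 5)
print(x)
print(any(duplicated(x)))          # FALSE

# With replacement: values may repeat and size may exceed the population
y <- sample(1:6, 20, replace = TRUE)
print(y)

# Without replacement, a size larger than the population is an error
res <- tryCatch(sample(1:3, 5), error = function(e) conditionMessage(e))
print(res)
```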
Sampling with replacement
• Sampling with replacement is suitable for modelling coin
tosses or throws of a die
• So, for instance, to simulate 10 coin tosses we could write
> sample(c("H","T"), 10, replace=TRUE)
[1] "T" "T" "T" "T" "T" "H" "H" "T" "H" "T"
• In fair coin-tossing, the probability of heads should equal the
probability of tails, but the idea of a random event is not
restricted to symmetric cases
Data with unequal probabilities
• You can simulate data with unequal probabilities for the
outcomes (say, a 90% chance of success) by using the prob
argument to sample, as in
> sample(c("succ", "fail"), 10, replace=TRUE,
prob=c(0.9, 0.1))
[1] "succ" "succ" "succ" "succ" "succ"
"fail" "succ" "succ" "succ" "fail"
• This may not be the best way to generate such a sample,
though; see the later discussion of the binomial distribution
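As a hedged illustration: with many draws, the fraction of "succ" settles near 0.9. The last lines use `rbinom`, R's built-in binomial generator, to draw the number of successes in 10 trials directly. The seed and sample sizes are arbitrary choices.

```r
set.seed(1)                        # arbitrary seed

# Many draws with unequal probabilities
draws <- sample(c("succ", "fail"), 10000, replace = TRUE, prob = c(0.9, 0.1))
prop_succ <- mean(draws == "succ") # relative frequency of "succ"
print(prop_succ)                   # close to 0.9

# Number of successes in 10 trials, drawn from Binomial(size = 10, prob = 0.9)
n_succ <- rbinom(1, size = 10, prob = 0.9)
print(n_succ)
```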
Terminology
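• Experiment/trial (thí nghiệm (ngẫu nhiên)/phép thử): a procedure that yields
one of a given set of possible outcomes
• Sample space: the set of all possible outcomes of an experiment
• Event (sự kiện): a subset of the sample space
• E.g., seeing Head after tossing a coin: {Head} is an event
• E.g., getting an odd number when rolling a die: {1, 3, 5} is an event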
Example
Experiment: rolling two dice. What is the sample space?
Answer: It depends on what we're going to ask!
• The total number?
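One way to see that the sample space depends on the question is to enumerate the 36 ordered outcomes with `expand.grid` and then look only at the totals (a sketch; the names `omega` and `totals` are illustrative):

```r
# All 36 equally likely ordered outcomes (first die, second die)
omega <- expand.grid(d1 = 1:6, d2 = 1:6)
print(nrow(omega))                 # 36

# Asking only about the total gives the sample space {2, ..., 12},
# but these totals are NOT equally likely
totals <- omega$d1 + omega$d2
print(sort(unique(totals)))        # 2 3 4 5 6 7 8 9 10 11 12
print(table(totals))               # counts 1 2 3 4 5 6 5 4 3 2 1
```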
The Law of Large Numbers (LLN)
Definition
The Law of Large Numbers (Luật số lớn) states that the long-run
relative frequency of repeated independent events gets closer and
closer to the true relative frequency (the probability) as the number of trials increases.
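A small simulation sketch of the LLN for a fair coin, assuming 1 codes Head and 0 codes Tail: the running relative frequency of heads wanders early on and then settles near 0.5. The seed and the number of tosses are arbitrary.

```r
set.seed(2024)                                 # arbitrary seed
n <- 10000
tosses <- sample(c(0, 1), n, replace = TRUE)   # 1 = Head, 0 = Tail (fair coin)
running_freq <- cumsum(tosses) / seq_len(n)    # relative frequency after each toss

# The relative frequency drifts toward 0.5 as the number of tosses grows
print(running_freq[c(10, 100, 1000, 10000)])
# plot(running_freq, type = "l"); abline(h = 0.5, lty = 2)   # optional picture
```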
Be Careful!
Don't misunderstand the Law of Large Numbers (LLN): it can
lead to lost money and poor business decisions. The LLN describes only the long run;
independent trials have no memory, so past outcomes never make an outcome "due".
Example
I have 8 children, all of them girls. "Thanks to the LLN" (!?), there
is a high chance that the next one will be a boy.
(Overpopulation!!!)
Example
I'm playing Bầu cua tôm cá (a Vietnamese dice game). The fish has not appeared in the last 5
games, so it must be more likely to come up in the next game. Thus, I bet all
my money on the fish. (Sorry, you lose!)
Probability
Definition
The probability (xác suất) of an event E in a finite nonempty
sample space Ω of equally likely outcomes is:
$p(E) = \frac{|E|}{|\Omega|}$
Examples
Example (1)
What is the probability of getting a Head when tossing a coin?
Answer:
• There are |Ω| = 2 possible outcomes
• Getting a Head is |E| = 1 outcome, so p(E) = 1/2
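The classical definition translates directly into code. The helper `prob_classical` below is a hypothetical name, and it assumes the outcomes in `omega` are equally likely:

```r
# Hypothetical helper: classical probability |E| / |Omega|,
# valid only when all outcomes in omega are equally likely
prob_classical <- function(E, omega) {
  stopifnot(all(E %in% omega))          # E must be a subset of the sample space
  length(unique(E)) / length(unique(omega))
}

print(prob_classical("H", c("H", "T"))) # Example (1): 0.5
print(prob_classical(c(1, 3, 5), 1:6))  # odd number on a fair die: 0.5
```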
Examples
Example (3)
We toss a coin 6 times. What is the probability of H on the 6th toss, if all
the previous 5 are T?
Answer:
Don't be silly! Still 1/2 (the tosses are independent).
Example (4)
Which is more likely:
• Rolling a total of 8 when 2 dice are rolled?
• Rolling a total of 8 when 3 dice are rolled?
Answer:
Two dice: 5/36 ≈ 0.139
Three dice: 21/216 ≈ 0.097
So a total of 8 is more likely with two dice.
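The answers to Example (4) can be checked by brute-force enumeration of all ordered outcomes, which are equally likely for fair dice (a sketch):

```r
# Two fair dice: 36 equally likely ordered outcomes
two <- expand.grid(1:6, 1:6)
p_two <- mean(rowSums(two) == 8)        # 5/36

# Three fair dice: 216 equally likely ordered outcomes
three <- expand.grid(1:6, 1:6, 1:6)
p_three <- mean(rowSums(three) == 8)    # 21/216

print(round(c(two = p_two, three = p_three), 3))   # 0.139 0.097
```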
Formal Probability
Rule 1
A probability is a number between 0 and 1:
0 ≤ p(E) ≤ 1
Rule 2: Something has to happen rule
The probability of the set of all possible outcomes of a trial must
be 1:
p(Ω) = 1
Rule 3: Complement Rule
The probability of an event occurring is 1 minus the probability
that it doesn't occur:
$p(E) = 1 - p(\overline{E})$
Example (Birthday Problem)
Consider a group of n < 365 students. We'll ignore leap years and
assume that all birthdays are equally likely.
i) If we pick a specific day (say December 7th), what is the
chance that at least one student was born on that day?
ii) What is the probability that at least one student has the
same birthday as some other student?
Answer i)
• The sample space is the set of all $365^n$ possible choices of
birthdays for n individuals
• $p_1(n) = P(\text{at least one student was born on December 7th})$
$= 1 - P(\text{no student was born on December 7th})$
$= 1 - \left(\frac{364}{365}\right)^n$
• We have $p_1(30) \approx 7.9\%$ and $p_1(91) \approx 22.1\%$
• In order for the probability that at least one other person
shares your birthday to exceed 50%, we need n large enough
that
$p_1(n) = 1 - \left(\frac{364}{365}\right)^n > 0.5$, i.e. $n \ge 253$
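A sketch that evaluates $p_1(n)$ and searches for the smallest n with $p_1(n) > 0.5$; `p1` and `n_min` are illustrative names:

```r
# p1(n): probability that at least one of n people was born on a fixed day
p1 <- function(n) 1 - (364 / 365)^n

print(round(p1(c(30, 91)), 3))          # 0.079 0.221

# Smallest n for which p1(n) exceeds 0.5
n_min <- which(p1(1:1000) > 0.5)[1]
print(n_min)                            # 253
```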
Birthday Problem (cont’d)
Answer ii)
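• $p_2(n) = P(\text{at least two students share a birthday})$
$= 1 - P(\text{no two students share a birthday})$
$= 1 - \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}$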
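The product formula for $p_2(n)$ can be evaluated directly. R's stats package also provides `pbirthday` and `qbirthday` for this problem, which serve as a cross-check; `p2` and `n_half` are illustrative names:

```r
# p2(n): probability that at least two of n people share a birthday,
# i.e. 1 minus (365/365) * (364/365) * ... * ((365 - n + 1)/365)
p2 <- function(n) 1 - prod((365:(365 - n + 1)) / 365)

print(p2(22))                 # just under 0.5
print(p2(23))                 # just over 0.5
print(pbirthday(23))          # stats::pbirthday, same value
n_half <- qbirthday(0.5)      # smallest n with probability >= 0.5
print(n_half)                 # 23
```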
Generalization and Variation of Birthday Problem
• More generally, suppose we have N objects, where N is large.
There are r people, and each chooses an object at random
• Then, similarly to the approximation above,
$P(\text{there is a match}) \approx 1 - e^{-r^2/(2N)}$
• If there are N possibilities and we have a list of length $\sqrt{N}$,
then there is a good chance of a match: $1 - e^{-1/2} \approx 40\%$
• If we want to increase the chance of a match, we can make a
list whose length is a constant times $\sqrt{N}$
• As a variation, suppose there are N objects and there are
two groups of r people. Each person from each group selects
an object. What is the probability that someone from the first
group chooses the same object as someone from the second
group?
• $P(\text{there is a match between the two groups}) \approx 1 - e^{-r^2/N}$ (the derivation is
rather difficult!)
• E.g., if we take N = 365 and r = 30, then
$P(\text{there is a match between the two groups}) \approx 1 - e^{-30^2/365} \approx$
0.915
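A Monte Carlo sketch (seed and number of repetitions are arbitrary) that checks the two-group approximation for N = 365 and r = 30, and compares the exact one-group probability with $1 - e^{-r^2/(2N)}$ for $r \approx \sqrt{N}$:

```r
set.seed(7)                             # arbitrary seed
N <- 365; r <- 30; reps <- 10000

# Two groups of r people; each person picks one of N objects uniformly at random.
# Record whether some object is picked by someone in BOTH groups.
match_between <- replicate(reps, {
  g1 <- sample(N, r, replace = TRUE)
  g2 <- sample(N, r, replace = TRUE)
  any(g1 %in% g2)
})
print(mean(match_between))              # simulated probability
print(1 - exp(-r^2 / N))                # approximation from the text, about 0.915

# One group with r close to sqrt(N): exact value vs 1 - exp(-r^2 / (2N))
r1 <- round(sqrt(N))                    # 19
exact_one  <- 1 - prod((N:(N - r1 + 1)) / N)
approx_one <- 1 - exp(-r1^2 / (2 * N))
print(round(c(exact = exact_one, approx = approx_one), 3))   # both near 0.4
```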
A birthday attack on discrete logarithm
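• We want to solve $\alpha^x \equiv \beta \pmod{p}$, with p a prime
• Make two lists, both of length around $\sqrt{p}$:
• 1st list: $\alpha^k \bmod p$ for random k
• 2nd list: $\beta\alpha^{-h} \bmod p$ for random h
• There is a good chance that there is a match:
$\alpha^k \equiv \beta\alpha^{-h} \pmod{p}$
• Then $\alpha^{h+k} \equiv \beta \pmod{p}$; hence $x \equiv h + k \pmod{p-1}$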
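A toy sketch of this attack in R, assuming a small prime p so that all products stay far below $2^{53}$ and are exact in double precision. The functions `mod_pow` and `birthday_dlog`, the list length $3\lceil\sqrt{p}\rceil$, and the parameters p = 1019, α = 2 are illustrative choices, not part of the slide. The inverse $\alpha^{-1}$ comes from Fermat's little theorem (p prime), and the loop only terminates if β really is a power of α.

```r
# Modular exponentiation by repeated squaring.
# Exact only while p^2 < 2^53, i.e. for small p (toy example).
mod_pow <- function(base, e, p) {
  result <- 1
  base <- base %% p
  while (e > 0) {
    if (e %% 2 == 1) result <- (result * base) %% p
    base <- (base * base) %% p
    e <- e %/% 2
  }
  result
}

# Hypothetical helper: birthday attack for alpha^x = beta (mod p), p prime
birthday_dlog <- function(alpha, beta, p, list_len = 3 * ceiling(sqrt(p))) {
  n <- p - 1                                  # exponents only matter mod p - 1
  alpha_inv <- mod_pow(alpha, p - 2, p)       # alpha^(-1) by Fermat's little theorem
  repeat {
    k <- sample(0:(n - 1), list_len, replace = TRUE)
    h <- sample(0:(n - 1), list_len, replace = TRUE)
    list1 <- sapply(k, function(e) mod_pow(alpha, e, p))                   # alpha^k
    list2 <- sapply(h, function(e) (beta * mod_pow(alpha_inv, e, p)) %% p) # beta * alpha^(-h)
    idx <- match(list2, list1)                # position of each list2 value in list1
    hit <- which(!is.na(idx))
    if (length(hit) > 0) {                    # alpha^k = beta * alpha^(-h)
      j <- hit[1]
      return((k[idx[j]] + h[j]) %% n)         # so x = h + k (mod p - 1)
    }
  }
}

set.seed(5)                                   # arbitrary seed
p <- 1019; alpha <- 2
beta <- mod_pow(alpha, 345, p)                # a target that has a solution
x <- birthday_dlog(alpha, beta, p)
print(x)
print(mod_pow(alpha, x, p) == beta)           # TRUE
```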
Formal Probability
General Addition Rule
p(E1∪ E2) = p(E1) + p(E2) − p(E1∩ E2)
• If E1 ∩ E2 = ∅, they are disjoint, which means they can't
occur together
• then, p(E1 ∪ E2) = p(E1) + p(E2)
Example
Example (1)
If you choose a number between 1 and 100 at random, what is the probability
that it is divisible by either 2 or 5?
Answer: 50/100 + 20/100 − 10/100 = 60/100 = 0.6
Example (2)
A survey found that about 45% of the Vietnamese population has type O
blood, 40% type A, 11% type B, and the rest type AB. What is the
probability that a blood donor has type A or type B?
Short Answer:
40% + 11% = 51% (the blood types are disjoint)
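The addition rule in Example (1) can be verified by counting directly over 1:100 (a sketch):

```r
nums <- 1:100
div2 <- nums %% 2 == 0                  # E1: divisible by 2
div5 <- nums %% 5 == 0                  # E2: divisible by 5

p_direct <- mean(div2 | div5)           # count the union directly
p_rule <- mean(div2) + mean(div5) - mean(div2 & div5)   # general addition rule

print(c(direct = p_direct, rule = p_rule))   # both 0.6
```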
Conditional Probability (Xác suất có điều kiện)
• "Knowledge" changes probabilities: the probability of F given that E has occurred is
$p(F \mid E) = \frac{p(E \cap F)}{p(E)}$, provided $p(E) > 0$
Example
What is the probability of drawing a red card and then another red
card without replacement (không hoàn lại) from a standard 52-card deck?
Solution
E: the event that the first card is red
F: the event that the second card is red
p(E) = 26/52 = 1/2
p(F | E) = 25/51
So the probability of drawing a red card and then another red card is
p(E ∩ F) = p(E) × p(F | E) = 1/2 × 25/51 = 25/102 ≈ 0.245
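A simulation sketch of the red-card example, assuming a standard deck of 26 red and 26 black cards; the estimate should be close to $25/102 \approx 0.245$ (seed and number of repetitions are arbitrary):

```r
set.seed(3)                                     # arbitrary seed
deck <- rep(c("red", "black"), each = 26)       # 26 red and 26 black cards

# Draw two cards without replacement (sample's default), many times
both_red <- replicate(50000, all(sample(deck, 2) == "red"))

print(mean(both_red))                           # simulated probability
print(25 / 102)                                 # exact value, about 0.245
```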
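Independence
• E and F are independent if knowing that E occurred does not change the probability of F: p(F | E) = p(F)
• Example: p("Head" | "It's raining outside") = p("Head")
• If E and F are independent,
p(E ∩ F) = p(E) × p(F)
Disjoint ≠ Independent
Disjoint events with nonzero probabilities cannot be independent: they have no outcomes in
common, so knowing that one occurred means the other did not.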
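Both points can be checked by enumerating two fair dice (a sketch; the events are illustrative): "first die even" and "second die shows 6" are independent, while "first die even" and "first die odd" are disjoint and therefore not independent.

```r
omega <- expand.grid(d1 = 1:6, d2 = 1:6)   # 36 equally likely outcomes
even_first <- omega$d1 %% 2 == 0           # E: first die is even
six_second <- omega$d2 == 6                # F: second die shows 6
odd_first  <- omega$d1 %% 2 == 1           # G: first die is odd (disjoint from E)

# Independent: p(E and F) equals p(E) * p(F), both 1/12
print(c(mean(even_first & six_second), mean(even_first) * mean(six_second)))

# Disjoint with positive probabilities: p(E and G) = 0, but p(E) * p(G) = 1/4
print(c(mean(even_first & odd_first), mean(even_first) * mean(odd_first)))
```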
Probability of sampling without replacement
• Let us return to the case of sampling without replacement,
specifically sample(1:40, 5)
• The probability of obtaining a given number as the first one of
the sample should be 1/40, the next one 1/39, and so forth
• The probability of a given sample should then be
$\frac{1}{40 \times 39 \times 38 \times 37 \times 36} \approx 1.27 \times 10^{-8}$
Probability of a set
• The above probability is the probability of getting given
numbers in a given order
• If this were a Lotto-like game, then you would rather be
interested in the probability of guessing a given set of five
numbers correctly
• Thus you need also to include the cases that give the same
numbers in a different order
• Since obviously the probability of each such case is going to
be the same, all we need to do is to figure out how many
such cases there are and multiply by that
• There are five possibilities for the first number, and for each
of these there are four possibilities for the second, and so
forth;
• that is, the number is 5 × 4 × 3 × 2 × 1
• This number is also written as 5! (5 factorial)
• So the probability of a “winning Lotto coupon” would be
> prod(5:1)/prod(40:36)
[1] 1.519738e-06
The choose function
• There is another way of arriving at the same result
• Notice that since the actual set of numbers is immaterial, all
sets of five numbers must have the same probability
• So all we need to do is to calculate the number of ways to
choose 5 numbers out of40
• This is denoted by $\binom{40}{5} = \frac{40!}{5! \cdot 35!} = 658008$
• In R, the choose function can be used to calculate this
number, and the probability is thus
> 1/choose(40,5)
[1] 1.519738e-06
Random Variables
• When looking at independent replications of a binary
experiment, we would not usually be interested in whether
each case is a success or a failure but rather in the total
number of successes (or failures)
• Obviously, this number is random since it depends on the
individual random outcomes,
• and it is consequently called a random variable
• In this case it is a discrete-valued random variable that can
take values 0, 1, …, n, where n is the number of replications
• A random variable X has a probability distribution that can
be described using point probabilities $f_X(x) = p(X = x)$,
• or the cumulative distribution function $F(x) = p(X \le x)$
• Expected value (giá trị kỳ vọng): $E(X) = \sum_x x \, f_X(x)$
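A sketch computing the point probabilities, the cdf, and the expected value for a concrete random variable, X = number of heads in 3 tosses of a fair coin (an example chosen for illustration). The last line compares with R's built-in `dbinom`:

```r
# X = number of heads in 3 tosses of a fair coin (1 = Head, 0 = Tail)
omega <- expand.grid(t1 = 0:1, t2 = 0:1, t3 = 0:1)   # 8 equally likely outcomes
X <- rowSums(omega)

f_X <- table(X) / nrow(omega)            # point probabilities f_X(x) = p(X = x)
F_X <- cumsum(f_X)                       # cdf F(x) = p(X <= x)
x_vals <- as.numeric(names(f_X))         # the values 0, 1, 2, 3
EX <- sum(x_vals * f_X)                  # expected value: sum of x * f_X(x)

print(f_X)                               # 0.125 0.375 0.375 0.125
print(F_X)                               # 0.125 0.500 0.875 1.000
print(EX)                                # 1.5

# Cross-check with R's built-in binomial point probabilities
print(all.equal(as.numeric(f_X), dbinom(0:3, size = 3, prob = 0.5)))   # TRUE
```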
Expected Value: An Example
An insurance company charges $50 a year. Can the company make a
profit? Assume that it has surveyed 1000 people, and let
X, the amount of payment, be a discrete random variable (biến ngẫu
nhiên rời rạc). The company expects that it has to pay each