Introduction to Probability and Distributions with R
Nguyen An Khuong, HCMUT, VNU-HCM
15 September 2016
Probability and distributions with R
Randomness
Motivations
• Gambling
• Real-life problems
• Computer science: cryptology, coding theory, algorithmic complexity, ...
Randomness
Which of these are random phenomena?
• The number you get when rolling a fair die
• The number sequence drawn for the lottery special prize (random, by law!)
• Your blood type (No!)
• Hitting a red light on the way to school
  • The traffic light is not random: it runs on a timer
  • The pattern of your riding is random
So what is special about randomness?
In the long run, random phenomena are predictable: each event has a relative frequency (the fraction of times that the event occurs when the experiment is repeated over and over)
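The long-run behaviour described above can be checked with a small simulation. The sketch below is an illustration, not part of the original slides: it rolls a simulated fair die many times with sample and replace = TRUE and tracks the relative frequency of rolling a six. The seed, the number of rolls (n_rolls) and all variable names are arbitrary choices made for this example.

```r
# Illustrative sketch (not from the slides): relative frequency of a six
set.seed(2016)          # arbitrary seed, only for reproducibility
n_rolls <- 100000       # assumed number of rolls

# Roll a fair die n_rolls times; replace = TRUE lets faces repeat
rolls <- sample(1:6, n_rolls, replace = TRUE)

# Running relative frequency of the event "a six is rolled"
rel_freq <- cumsum(rolls == 6) / seq_len(n_rolls)

# Relative frequency after 10, 100, 1000, ... rolls
checkpoints <- c(10, 100, 1000, 10000, 100000)
print(round(rel_freq[checkpoints], 4))
```

A single roll cannot be predicted, but the printed values should settle close to 1/6 (about 0.167) as the number of rolls grows.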
Randomness in Statistics
• Randomness and probability are central to statistics
• Empirical fact: most experiments and investigations are not perfectly reproducible
• The degree of irreproducibility may vary:
  • Some experiments in physics may yield data that are accurate to many decimal places,
  • whereas data on biological systems are typically much less reliable
• Viewing data as something coming from a statistical distribution is vital to understanding statistical methods
• We outline the basic ideas of probability and the functions that R has for random sampling and handling of theoretical distributions
Sampling with R
Random Numbers with R
• Much of the earliest work in probability theory was about games and gambling issues, based on symmetry considerations
• The basic notion then is that of a random sample: dealing from a well-shuffled pack of cards or picking numbered balls from a well-stirred urn
• In R, we can simulate these situations with the sample function
• If we want to pick five numbers at random from the set 1:40, then we can write
> sample(1:40,5)
[1] 4 30 28 40 13
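The card-dealing picture can be simulated in the same way. The following is a minimal sketch, not taken from the slides: the deck representation (rank and suit pasted into one string), the seed and the variable names are assumptions made for illustration. It relies only on sample, paste, rep and set.seed from base R.

```r
# Illustrative sketch (not from the slides): dealing from a shuffled deck
set.seed(1)   # arbitrary seed so that the draw can be repeated

ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("clubs", "diamonds", "hearts", "spades")

# Build all 52 rank/suit combinations
deck <- paste(rep(ranks, times = 4), rep(suits, each = 13))

# Deal a five-card hand; by default no card can appear twice
hand <- sample(deck, 5)
print(hand)

# Shuffle the whole deck: sample(x) returns a random permutation of x
shuffled <- sample(deck)
```

Calling set.seed first makes the draw repeatable. Without it, each call to sample generally returns a different hand, which is why the numbers on the slide will not match your own output.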
Sample function
• The first argument (x) is a vector of values to be sampled
• The second (size) is the sample size
• Actually, sample(40, 5) would suffice, since a single number is interpreted to represent the length of a sequence of integers
• Notice that the default behavior of sample is sampling without replacement
• That is, the sample will not contain the same number twice, and size obviously cannot be bigger than the length of the vector to be sampled
• If we want sampling with replacement, then we need to add the argument replace = TRUE
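The points above can be tried out directly. This is a small sketch under stated assumptions rather than material from the slides: the seed, the die example and the names a, too_many and dice are chosen only for illustration, and tryCatch is used here just to capture the error message instead of stopping the script.

```r
# Illustrative sketch (not from the slides): arguments of sample()
set.seed(42)   # arbitrary seed for reproducibility

# sample(40, 5) draws from 1:40, just like sample(1:40, 5)
a <- sample(40, 5)
print(a)

# Without replacement, asking for more values than exist is an error;
# tryCatch stores the error message instead of halting
too_many <- tryCatch(sample(1:6, 10), error = function(e) conditionMessage(e))
print(too_many)

# With replace = TRUE the same value may appear several times,
# so size may exceed the length of x (here: ten die rolls)
dice <- sample(1:6, 10, replace = TRUE)
print(dice)
```

The shortcut sample(40, 5) is convenient, but it also means that a vector which happens to have length one is treated as an upper limit rather than as the single value to sample from, so writing 1:40 explicitly is often the safer habit.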