ENGINEERING SYSTEMS & DECISION ANALYSIS (CIVE3066)
COURSE INTRODUCTION
Instructor:
Nguyen-Tuan-Thanh LE (thanhlnt@tlu.edu.vn)
Based on the CIVE203 - Engineering system and decision analysis - Colorado State University, USA
& Role of statistics in engineering – California State University, USA
COURSE OBJECTIVES
▪ Prepare you to effectively use models and statistics in your courses and career using MATLAB/Octave and GIS skills
▪ Provide you with the capabilities of:
▪ Understanding the concepts of mathematical modeling and statistical data analysis as
applied to civil engineering systems.
▪ Estimating parameters for various statistical distributions, determining which distribution
best describes a set of data, and generating random samples from those distributions.
▪ Demonstrating the proper application of confidence limits and hypothesis testing to
examples from civil engineering systems.
▪ Demonstrating the proper application of simple linear or multiple regression for building
empirical models of engineering and scientific data.
▪ Demonstrating the use of Geographic Information Systems (GIS) for spatial data
collection, organization, and analysis.
EXAMS AND GRADING
▪ The course will include two multiple-choice quarterly tests, a midterm, and
a comprehensive final examination
▪ Grading will be based on the following components:
HOMEWORK
▪ Assigned weekly on the Piazza forum
▪ Students submit homework on Piazza as a zip file including all
necessary files. The name of the zip file, as well as the title of the
email, should be: CIVE3066_59NKN_HWKn_StudentName
▪ where n is the number of the current homework
▪ for example: CIVE3066_59NKN_HWK1_NguyenVanA
▪ Late homework is not accepted
▪ Solutions are posted on Piazza or presented in class after the due date
▪ Homework must be your own work. Copied work will receive a grade of 0.
GENERAL CLASS POLICIES
Students are expected to:
▪ Attend regularly
▪ Ask questions
▪ If you do not understand what the lecturer is saying or if you detect any errors
▪ Access the course forum (Piazza) regularly
▪ Respect the lecture time
▪ Turn off or silence your cell phones before the start of class
▪ Respect assignment deadlines: late submissions will not be accepted.
▪ Be honest:
▪ Violations of the academic integrity policies may include: cheating, plagiarism, aiding
academic dishonesty, fabrication, lying, bribery, and threatening behavior.
7 Decision making for a single sample
8 Building empirical models
9 Simple linear regression
RECOMMENDED TEXTBOOKS
▪ Engineering Statistics, Fifth Edition, Montgomery, D.C., G.C. Runger and N.F.
Hubele, John Wiley & Sons, Inc., 2011.
▪ Probability Concepts in Engineering, Second Edition, Ang, A. and W. Tang,
John Wiley & Sons, Inc.
▪ Geographic Information Systems and Science, Longley, P., John Wiley & Sons,
Inc.
▪ Applied Numerical Methods with MATLAB for Engineers and Scientists,
Chapra, S.C., McGraw Hill.
OTHER MATERIALS
▪ MATLAB Statistics Toolbox, MathWorks Inc.
▪ MATLAB Curve Fitting Toolbox, MathWorks Inc.
▪ Numerical Computing with MATLAB, Moler, C.
▪ Think Stats: Exploratory Data Analysis in Python, Allen B. Downey, 2014.
THE SCIENTIFIC METHOD
DATA SAMPLING (1/2)
▪ Sample: a selection taken from a larger/total group (the
“Population”) so that you can examine it to find out something
about the larger/total group [1]
▪ Example: if 100 people are chosen from the crowd at a match, your sample is the 100 chosen people, while the population is all
the people at that match
[1] https://www.mathsisfun.com/definitions/sample.html
DATA SAMPLING (2/2)
▪ Data sampling is a statistical analysis technique used to select and analyze a
representative subset of data points [2]
[2] http://searchbusinessanalytics.techtarget.com/definition/data-sampling
DATA COLLECTION:
SAMPLE DESIGN – STRATEGY
▪ Non-probability Sampling: samples are selected based on the subjective
judgment of researchers rather than by random selection
▪ Haphazard Sampling (Convenience Sampling)
▪ Judgment Sampling (Purposive/Expert Sampling)
▪ Probability Sampling: samples are chosen using a method based on
probability theory
▪ Simple Random Sampling
▪ Stratified Random Sampling
▪ Cluster Sampling
▪ Multistage Sampling
▪ Systematic Sampling
NON-PROBABILITY SAMPLING:
HAPHAZARD/CONVENIENCE
SAMPLING
and recreate true randomness.
▪ Example: you stand on a busy corner during rush hour and interview
people who pass by.
unbiased estimates
NON-PROBABILITY SAMPLING:
JUDGMENT/PURPOSIVE/EXPERT SAMPLING
credibility
(with respect to attributes and representation of a population) to
participate in a research study.
completely accessible so that sample selection bias is not a problem.
PROBABILITY SAMPLING:
SIMPLE RANDOM SAMPLING (1/3)
▪ Each of the population units has an equal chance of being
selected for measurement.
other units
locations haphazardly.
population does not contain major trends, cycles, or
patterns.
PROBABILITY SAMPLING:
STEPS
1 A list of all the members of the population is prepared
initially and then each member is marked with a specific
number (for example, if there are N members, they will be
numbered from 1 to N).
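Method of lottery
Excel functions:
• RANDBETWEEN(a,b): returns a random integer between a and b, inclusive
• RAND(): returns a random real number greater than or equal to 0 and less than 1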
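For readers working outside Excel, the sketch below shows rough Python analogues using the standard `random` module. `random.randint(a, b)` includes both ends, like RANDBETWEEN, and `random.random()` returns a float in [0, 1), like RAND. Calling either one repeatedly can produce the same number twice. For the lottery method, which draws distinct members, `random.sample` is the safer choice. The function names `randbetween`, `rand` and `lottery` and the seed value are hypothetical choices made for this illustration only.

```python
import random

rng = random.Random(2024)  # arbitrary seed (an assumption) for reproducible draws


def randbetween(a: int, b: int) -> int:
    """Hypothetical analogue of Excel RANDBETWEEN(a, b): integer in [a, b]."""
    return rng.randint(a, b)


def rand() -> float:
    """Hypothetical analogue of Excel RAND(): float in [0, 1)."""
    return rng.random()


def lottery(N: int, n: int) -> list:
    """Lottery method: draw n distinct 'tickets' from members numbered 1..N."""
    return rng.sample(range(1, N + 1), n)


print(randbetween(1, 6), rand(), lottery(10, 3))
```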
PROBABILITY SAMPLING:
SIMPLE RANDOM SAMPLING (3/3) – EXAMPLE
▪ Suppose an organization has 500 employees and a simple random sample of 100 is to be selected from them.
▪ Step 1: Make a list of all the employees working in the organization
▪ As mentioned above, there are 500 employees in the organization, so the list must contain
500 names.
▪ Step 2: Assign a sequential number to each employee (1, 2, 3, …, 500). This is
your sampling frame (the list from which you draw your simple random sample).
▪ Step 3: Figure out what your sample size is going to be.
▪ In this case, the sample size is 100
▪ Step 4: Use a random number generator to select the sample, using your
sampling frame (population size) from Step 2 and your sample size from Step 3
▪ In this case, your sample size is 100 and your population is 500, so generate 100
random numbers between 1 and 500.
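The four steps can be sketched in Python as follows. This is an illustration, not a prescribed tool. `random.sample` draws without replacement, so every employee number appears at most once, and each of the 500 employees has the same chance of selection. The helper name `simple_random_sample` and the seed are assumptions made for this example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement,
    giving every unit the same chance of being selected (hypothetical helper)."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed population size")
    return random.Random(seed).sample(frame, n)


# Steps 1-2: sampling frame of 500 employees numbered 1..500
frame = list(range(1, 501))
# Steps 3-4: sample size 100, drawn with a random number generator
sample = simple_random_sample(frame, 100, seed=42)
print(sorted(sample)[:10])
```

The same seed gives the same sample each time, which is convenient when a homework answer has to be reproduced.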
PROBABILITY SAMPLING:
STRATIFIED RANDOM SAMPLING (1/2)
▪ The target population is divided into non-overlapping,
homogeneous sub-regions/groups called strata (singular: stratum) to obtain
a better estimate of the population mean.
▪ Age, socioeconomic divisions, nationality, religion, educational achievements, … are common bases for forming strata.
▪ Samples within each stratum are selected by Simple Random Sampling.
PROBABILITY SAMPLING:
STRATIFIED RANDOM SAMPLING (2/2) - EXAMPLE
▪ Let’s consider a situation where a research team is seeking
opinions about religion amongst various age groups.
▪ Instead of collecting feedback from 326,044,985 U.S. citizens, a random sample of around 10,000 can be selected for research.
▪ These 10,000 citizens can be divided into strata according to age, i.e., groups of 18–29, 30–39, 40–49, 50–59, and 60 and above.
▪ Each stratum will have distinct members and number of members.
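The sketch below shows the idea in Python: group the units into strata, then run a simple random sample inside each stratum. The slides do not say how many units to take from each stratum. This example assumes proportional allocation, where each stratum's share of the sample equals its share of the population. The 1,000-person population and its age counts are hypothetical, chosen so that the allocation comes out in whole numbers. With other counts, rounding can make the total differ slightly from n.

```python
import random


def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified random sampling (hypothetical helper).

    population : list of units
    stratum_of : function returning the stratum label of a unit
    n          : desired total sample size
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:  # split the population into non-overlapping strata
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for label, members in strata.items():
        k = round(n * len(members) / len(population))  # proportional allocation
        sample[label] = rng.sample(members, k)  # simple random sample within the stratum
    return sample


# Hypothetical population of 1,000 respondents grouped by age
counts = {"18-29": 200, "30-39": 180, "40-49": 220, "50-59": 200, "60+": 200}
population = [(group, i) for group, c in counts.items() for i in range(c)]
result = stratified_sample(population, lambda unit: unit[0], 100, seed=1)
for group, units in result.items():
    print(group, len(units))  # 20, 18, 22, 20, 20 units respectively
```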
PROBABILITY SAMPLING:
CLUSTER SAMPLING
▪ The target population is divided into clusters of
individual units
▪ Some clusters are selected at random, and all units
in the chosen clusters are measured
▪ Useful when population units cluster together and each unit in the randomly selected cluster can be measured
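A minimal Python sketch of one-stage cluster sampling: a simple random sample is taken of the cluster labels, not of the individual units. Every unit inside each chosen cluster is then kept. The city blocks, the households and the helper name `cluster_sample` are hypothetical, used only for illustration.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling (hypothetical helper).

    clusters : dict mapping cluster id -> list of units in that cluster
    m        : number of clusters to select
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # simple random sample of cluster ids
    # Every unit inside each chosen cluster is measured
    return {cid: list(clusters[cid]) for cid in chosen}


# Hypothetical example: 8 city blocks with 5 households each
blocks = {f"block{b}": [f"b{b}-h{h}" for h in range(5)] for b in range(8)}
selected = cluster_sample(blocks, 3, seed=7)
print(sorted(selected))
```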
CLUSTER SAMPLING VS
STRATIFIED SAMPLING
Cluster Sampling
▪ Elements of a population are randomly
selected to be a part of groups (clusters).
▪ Members from randomly selected clusters are
a part of this sample.
▪ Homogeneity is maintained between clusters.
▪ Heterogeneity is maintained within the clusters.
▪ The clusters are divided naturally.
▪ The key objective is to minimize the cost
involved and improve efficiency.
Stratified Random Sampling
▪ The entire population is divided into non-overlapping segments (strata).
▪ Individual units within each stratum are randomly selected to be part of the sample.
▪ Homogeneity is maintained within the strata.
▪ Heterogeneity is maintained between strata.
▪ The strata division is primarily decided by the researchers or statisticians.
▪ The key objective is to conduct accurate sampling with a properly represented population.
PROBABILITY SAMPLING:
MULTISTAGE SAMPLING
▪ The target population is divided into primary units
(clusters)
▪ Then, a set of primary units is selected by using
Simple Random Sampling and each is randomly
sub-sampled
▪ Needed when measurements are made on
sub-samples of the field sample
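The two-stage case can be sketched in Python. Stage 1 takes a simple random sample of the primary units. Stage 2 takes a simple random sub-sample inside each selected unit. The soil-plot scenario, the sizes m = 2 and k = 3, and the helper name `two_stage_sample` are assumptions for illustration. More stages would repeat the same pattern.

```python
import random


def two_stage_sample(primary_units, m, k, seed=None):
    """Two-stage (multistage) sampling (hypothetical helper).

    primary_units : dict mapping primary-unit id -> list of sub-units
    m             : number of primary units selected in stage 1
    k             : number of sub-units sub-sampled within each selected unit
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), m)  # stage 1: SRS of primary units
    # stage 2: SRS of k sub-units within each chosen primary unit
    return {pu: rng.sample(primary_units[pu], k) for pu in chosen}


# Hypothetical field campaign: 6 plots, each with 10 soil cores
plots = {f"plot{p}": [f"p{p}-core{c}" for c in range(10)] for p in range(6)}
subsample = two_stage_sample(plots, m=2, k=3, seed=11)
for plot, cores in subsample.items():
    print(plot, cores)
```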
PROBABILITY SAMPLING:
SYSTEMATIC SAMPLING
▪ The elements are chosen from a target population by selecting a
random starting point and selecting other members after a fixed
‘sampling interval’.
▪ The sampling interval is calculated by dividing the entire population
size by the desired sample size.
▪ Example:
▪ A local NGO is seeking to form a systematic sample of 500
volunteers from a population of 5000.
▪ They can select every 10th person in the population to
systematically form a sample (sampling interval 5000/500 = 10).
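A Python sketch of linear systematic sampling applied to the NGO example, under stated assumptions. The interval is k = N/n rounded to the nearest integer. Python's `round` sends exact ties such as 5.5 to the even integer, which is only one possible reading of "closest integer". If rounding pushes k up, the method can return fewer than n units. The helper name and the seed are hypothetical.

```python
import random


def linear_systematic_sample(population, n, seed=None):
    """Linear systematic sampling of n units from an ordered population (hypothetical helper)."""
    N = len(population)
    k = round(N / n)  # sampling interval: closest integer to N/n
    start = random.Random(seed).randint(1, k)  # random start between 1 and k, inclusive
    positions = range(start, N + 1, k)  # start, start + k, start + 2k, ... (1-based)
    return [population[p - 1] for p in positions][:n]


# NGO example: 500 volunteers from a population of 5000, so k = 10
volunteers = list(range(1, 5001))
ngo_sample = linear_systematic_sample(volunteers, 500, seed=3)
print(ngo_sample[:5], len(ngo_sample))
```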
PROBABILITY SAMPLING:
LINEAR VS CIRCULAR SYSTEMATIC SAMPLING
Linear Systematic Sampling
1 Arrange the entire population in a classified sequence
2 Select the sample size (n)
3 Calculate the sampling interval k = N/n
4 Select a random number between 1 and k (inclusive)
5 Add the sampling interval (k) to the chosen random number to
add the next member to the sample, and repeat this procedure to
add the remaining members of the sample
6 If k is not an integer, the closest integer to N/n can be used.
Circular Systematic Sampling
1 Calculate the sampling interval k = N/n, rounded down to an integer (if N = 11 and n
= 2, then k is taken as 5 and not 6)
2 Choose a random start between 1 and N
3 Select every k-th unit from the start, wrapping around to the beginning of the list after the last unit, until n units are selected
4 With this method there are N possible samples, unlike the k samples of the linear systematic sampling method
▪ Example: if N = 7 (units labeled a to g), n = 2 and k = 3, the samples
will be: ad, be, cf, dg, ea, fb and gc.
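Circular systematic sampling can be sketched in Python with the modulo operator, which wraps the selection back to the start of the list. Following the rule stated for N = 11 and n = 2, the interval is rounded down with `//`. The random start is passed in explicitly, so the code can list all N possible samples for N = 7 units labeled a to g. The helper name `circular_systematic_sample` is hypothetical.

```python
def circular_systematic_sample(labels, n, start):
    """Circular systematic sampling (hypothetical helper).

    labels : the N population units in order
    n      : sample size
    start  : random start, 1-based, between 1 and N
    """
    N = len(labels)
    k = N // n  # interval rounded down (N = 11, n = 2 gives k = 5)
    # Take start, start + k, start + 2k, ..., wrapping past the end of the list
    return [labels[(start - 1 + j * k) % N] for j in range(n)]


units = "abcdefg"  # N = 7 units labeled a..g
# With n = 2 (so k = 3), each of the N possible starts gives one sample
all_samples = ["".join(circular_systematic_sample(units, 2, s)) for s in range(1, 8)]
print(all_samples)  # ['ad', 'be', 'cf', 'dg', 'ea', 'fb', 'gc']
```

In practice the start would be drawn at random, for example with `random.randint(1, N)`.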
THE SCIENTIFIC METHOD
2 DESCRIPTIVE STATISTICS
▪ Descriptive statistics are statistical measures used to
describe a set of samples (or observations)
▪ Three kinds of descriptive statistics:
DESCRIPTIVE STATISTICS:
CENTRAL TENDENCY - MODE
▪ The mode is the value with the largest number of observations (the most frequently occurring value)
▪ MATLAB Syntax: M = mode(X); M = mode(X, dim);
▪ Description
▪ M = mode(X)
▪ If X is a vector, M is the sample mode (the most frequently occurring value) of X
▪ If X is a matrix, M is a row vector containing the mode of each column of that matrix
▪ When there are multiple values occurring equally frequently, mode returns the smallest
of those values
▪ M = mode(X, dim) computes the mode along the dimension dim of X
▪ dim = 1 or 2
▪ 1: return a row vector (default)
▪ 2: return a column vector
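As a cross-check of the behavior described above, here is a small Python sketch of the same rules. It returns the most frequent value and breaks ties by choosing the smallest value. For a matrix stored as a list of rows, it computes one mode per column, like `mode(X)` or `mode(X, 1)`. The function names are hypothetical and are not part of any library.

```python
from collections import Counter


def matlab_style_mode(values):
    """Most frequent value; ties are broken by returning the smallest,
    mirroring the tie rule described for MATLAB's mode() (hypothetical helper)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def column_modes(matrix):
    """Mode of each column of a list-of-rows matrix (like mode(X) or mode(X, 1))."""
    return [matlab_style_mode(col) for col in zip(*matrix)]


print(matlab_style_mode([3, 1, 2, 3, 1]))      # 1 (1 and 3 tie; the smallest wins)
print(column_modes([[1, 2], [1, 3], [2, 3]]))  # [1, 3]
```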
▪ std(X) (or std(X, 0)) normalizes with N−1: this provides the square root
of the best unbiased estimator of the variance
▪ std(X, 1) normalizes with N: this provides the square root
of the second moment around the mean
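Python's standard `statistics` module offers both normalizations, which makes the difference easy to check. `statistics.stdev` divides by N − 1, and `statistics.pstdev` divides by N. The eight data values are hypothetical, chosen so the arithmetic can be followed by hand. The mean is 5 and the squared deviations sum to 32. That gives sqrt(32/8) = 2 with N and sqrt(32/7) ≈ 2.138 with N − 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

s_sample = statistics.stdev(data)       # normalizes with N - 1 (like std(X) in MATLAB)
s_population = statistics.pstdev(data)  # normalizes with N (like std(X, 1))

print(s_sample, s_population)
```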
3 DATA PROCESSING
▪ Data processing involves verification, coding, classification,
and tabulation of data
▪ Verification: the data are checked to ensure that they are accurate
▪ Coding: the verified data are converted into machine-readable form so
that they can be processed by computer
▪ Classification: data are classified on the basis of common
characteristics, which may be qualitative (descriptive) or
quantitative (numerical)
▪ Tabulation: a concise, logical, and orderly arrangement of data in
columns and rows
MISSING DATA (1/2)
▪ In MATLAB, missing data are usually represented by the special value NaN, which stands for
Not-a-Number.
▪ any(X): returns logical 1 (true) if any element of the vector is nonzero
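MISSING DATA (2/2)
▪ X = X(~isnan(X)) : remove NaNs from X
▪ X(isnan(X)) = [] : remove NaNs from X
▪ M(any(isnan(M), 2), :) = [] : remove any rows containing NaNs
▪ Example: for M = [NaN 6 10 2; 8 1 4 2; 8 6 7 2], removing the rows containing NaNs leaves M = [8 1 4 2; 8 6 7 2]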
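The same two clean-up operations can be sketched in plain Python with `float("nan")` and `math.isnan`. An explicit NaN test is needed because NaN compares unequal to everything, itself included, so a check such as `x == nan` is always false. The vector `X` is a hypothetical example. The matrix `M` is the one used above, stored as a list of rows.

```python
import math

nan = float("nan")

X = [1.5, nan, 3.0, nan, 4.2]  # hypothetical vector with missing values
# Equivalent of X = X(~isnan(X)): keep only the non-NaN elements
X_clean = [x for x in X if not math.isnan(x)]

M = [[nan, 6, 10, 2],
     [8,   1,  4, 2],
     [8,   6,  7, 2]]
# Equivalent of M(any(isnan(M), 2), :) = []: drop every row containing a NaN
M_clean = [row for row in M if not any(math.isnan(v) for v in row)]

print(X_clean)  # [1.5, 3.0, 4.2]
print(M_clean)  # [[8, 1, 4, 2], [8, 6, 7, 2]]
```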
SCATTER PLOT
▪ scatter(X, Y) : create a scatter plot with circles at the
locations specified by the vectors X and Y
TIME SERIES PLOT
▪ plot(X) : plot the time series data X against time (the index of each element of X).
BAR CHART
▪ bar(Y) : create a bar graph with one bar for each element in Y
▪ bar(X, Y) : draws the bars of Y at the locations specified in X
STEM PLOT
▪ stem(Y) : plot the data sequence Y as stems that extend from the
baseline along the x-axis
▪ stem(X, Y) : plot the data sequence Y at values specified by X
BOX PLOT
▪ boxplot(X) : create a box plot of the data in X
▪ boxplot([X1, X2, X3]) : create one box for each group of data X1, X2, X3 (given as equal-length column vectors)
PROBABILITY VS STATISTICS (1/2)
▪ Model: an idealized version of how the world works
▪ Data: collected observations
Trang 59PROBABILITY VS STATISTICS (2/2)
▪ Probability:
▪ Statistics:
1 Basic concepts
Based on Probability and statistics for engineers – Radim Bris – Technical University of Ostrava and
Probability and statistics for engineers and scientists – Eighth Edition – Ronald E. Walpole and Raymond H. Myers
1 BASIC CONCEPTS
Examples of Probability
▪ Flip a coin N times and track the proportion of heads as the number of flips increases
▪ The probability of a certain size of flood flow occurring in any one year
▪ The probability of a certain kind of vehicle crossing a certain point on a road
Probability theory
▪ Probability is a measure of the likelihood of a random phenomenon or chance
behavior
▪ Probability allows us to model the frequency of realization of random events.
▪ Probability theory is a mathematical framework for computing the probability of
complex events
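The coin-flip example can be simulated in a few lines of Python. The sketch records the running proportion of heads after each flip, and for a large number of flips that proportion settles close to the probability of heads. It assumes a fair coin (P(heads) = 0.5) and an arbitrary seed. The helper name `running_proportion_of_heads` is hypothetical.

```python
import random


def running_proportion_of_heads(n_flips, seed=None):
    """Simulate n_flips fair coin flips (assumed P(heads) = 0.5) and return
    the proportion of heads observed after each flip (hypothetical helper)."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as one head
        proportions.append(heads / i)
    return proportions


p = running_proportion_of_heads(10_000, seed=0)
for n in (10, 100, 1_000, 10_000):
    print(n, p[n - 1])
```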
SOME DEFINITIONS
▪ Probability experiment (ε): an action or trial through which specific
results (counts, measurements or responses) are obtained
▪ Outcome (ω): the result of a single trial (a probability experiment)
▪ Sample space (Ω): the set of all possible outcomes of a probability
experiment
▪ Event (A): a subset of the sample space, consisting of one or more
outcomes
▪ We have: ω ∊ Ω and A ⊂ Ω
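Python's built-in sets map directly onto these definitions. The example assumes a probability experiment of rolling one six-sided die, chosen only for illustration. The sample space Ω is a set, an outcome ω is one of its elements, and an event A is a subset. Python's `<=` operator tests "subset of or equal to", which matches A ⊂ Ω in the sense used here.

```python
# Probability experiment (assumed for illustration): roll one six-sided die
omega = {1, 2, 3, 4, 5, 6}  # sample space Ω: all possible outcomes
A = {2, 4, 6}               # event A: "the roll is even", a subset of Ω
outcome = 4                 # one outcome ω of a single trial

print(outcome in omega)  # ω ∊ Ω  -> True
print(A <= omega)        # A ⊂ Ω  -> True
print(outcome in A)      # the event A occurred on this trial -> True
```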