
Numerical Methods and DATA COLLECTION AND SAMPLING


Trang 1

Descriptive Statistics:

Numerical Methods

Trang 2

4.1 Measures of Central Location

With one data point, the central location is clearly at the point itself.

• The central data point reflects the locations of all the actual data points.

With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

Trang 3


If the third data point appears in the center, the measure of central location will remain in the center.

But if the third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.

Trang 5

The Arithmetic Mean (average)

• This is the most popular and useful measure of central location.

$$\text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}}$$

Trang 6

The Arithmetic Mean

Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Trang 7

Example 1 The Arithmetic Mean

Find the mean rate of return for a portfolio equally invested in five stocks having the following annual rates of return: 11.2%, 8.07%, 5.55%, 13.7%, 21%.

Solution

$$\bar{x} = \frac{11.2 + 8.07 + 5.55 + 13.7 + 21}{5} = \frac{59.52}{5} = 11.904\%$$
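As a quick check of Example 1, the sketch below recomputes the mean in Python. The variable names are illustrative and not part of the original slides.

```python
from statistics import fmean

# Annual rates of return (%) of the five stocks in Example 1.
returns = [11.2, 8.07, 5.55, 13.7, 21.0]

# Arithmetic mean = sum of the measurements / number of measurements.
mean_return = fmean(returns)
print(f"Mean rate of return: {mean_return:.3f}%")
```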

Trang 8

3 Geometric mean

• A specialized measure, used to find the average growth rate, or rate of change, of a variable over time.

• Example:

The number of students attending the music class last Tuesday was 160. This Tuesday, the number is expected to increase by 15%. How many of them are likely to attend this Tuesday?

Trang 9

The number of students likely to attend this Tuesday: 160 × (1 + 0.15) = 184

Growth rate/rate of change?

Trang 10

(i) Simple geometric mean: applied when each rate of change appears once only

$$R_g = \sqrt[n]{(1 + R_1)(1 + R_2)\cdots(1 + R_n)} - 1$$

Trang 12

Number of employees in each year from 2000 to 2006: 200, 220, 250, 262, 284, 300, 312

What is the average rate of change in the

number of employees?

Trang 13

Year               2000   2001   2002   2003   2004   2005   2006
No. of employees   200    220    250    262    284    300    312
(1+R)              -      1.1    1.136  1.048  1.084  1.056  1.04

Trang 14

The average rate of change:

$$R_g = \sqrt[6]{1.1 \times 1.136 \times 1.048 \times 1.084 \times 1.056 \times 1.04} - 1 = 0.077 \approx 7.7\%$$

Trang 15

[Figure: growth rate (%)]

Trang 17

Characteristics of the mean

A representative of a data set

Takes every single value into account so it is likely to be affected by extreme values

Used to compare different-sized data sets.

Trang 18

• The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.

• When determining the median, pay attention to the number of observations (k).

• If k is odd:

Median = the number at the ((k+1)/2)th location of the ordered array

• If k is even:

Median = the average of the two numbers in the middle (the numbers at the (k/2)th and the [(k/2)+1]th locations of the ordered array)

The Median

Trang 19

The Median

Find the median salary.

There are seven salaries (k = 7), an odd number of observations. The ((k+1)/2)th salary of the ordered array is the number at the ((7+1)/2)th = 4th location. The median is 29.

Suppose an additional salary of $31,000 is added to the group of salaries recorded before. Find the median salary.

There are eight salaries (k = 8), an even number of observations. The two salaries in the middle are 29 (in the (k/2)th = 4th location) and 30 (in the [(k/2)+1]th = 5th location). The median is the average of the two: 29.5.
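The location rule for odd and even k can be sketched as a small Python function. The salary list below is hypothetical (the original data did not survive extraction); it was chosen only so that the two medians match the 29 and 29.5 quoted above.

```python
def median(values):
    """Median by the location rule (k = number of observations)."""
    ordered = sorted(values)
    k = len(ordered)
    if k % 2 == 1:
        # Odd k: the value at location (k + 1) / 2 (1-based).
        return ordered[(k + 1) // 2 - 1]
    # Even k: average of the values at locations k / 2 and k / 2 + 1.
    return (ordered[k // 2 - 1] + ordered[k // 2]) / 2

# Hypothetical salaries in $1,000s; NOT the data from the original slide.
salaries = [26, 26, 28, 29, 30, 32, 60]
print(median(salaries))          # seven salaries
print(median(salaries + [31]))   # eight salaries
```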

Trang 20

• The mode of a set of measurements is the value that occurs most frequently.

• A set of data may have one mode (or modal class), or two or more modes.

The modal class: for large data sets, the modal class is much more relevant than a single-value mode.

The Mode
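A data set with one mode and a data set with two modes can be illustrated with Python's standard library. The sizes below are hypothetical values in inches, not the data set behind the slides.

```python
from statistics import multimode

# Hypothetical sizes in inches; not the data set behind the slides.
sizes = [32, 34, 34, 36, 34, 33, 36, 31]

# multimode returns every value tied for the highest frequency.
print(multimode(sizes))           # one mode
print(multimode(sizes + [36]))    # two modes
```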

Trang 21

• The mode of this data set is 34 in.

This information seems to be valuable (for example, for the design of a new display in the store), much more than “the median is 33.5 in.”

The Mode


Trang 23

• If a distribution is symmetrical, the mean, median and mode coincide.

• If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.

A positively skewed distribution (“skewed to the right”): Mode < Median < Mean

A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode

Relationship among Mean, Median, and Mode

Trang 24

Using the Mean, Median, and Mode

• The mean is very sensitive to extreme values.

• The median is not affected by extreme values, yet it does not reflect all the values included in the data set, but rather the location of the observation in the middle.

• The mode – should be used mainly for

Trang 25

4.2 Measures of Variability

• Measures of central location fail to tell the whole story about the distribution.

• A question of interest still remains unanswered:

How much are the values of a given set spread

out around the mean value?

Trang 26

• Think of a sample portfolio composed of three stocks

o 100 shares, ARR = 10%

o 200 shares, ARR = 15%

o 100 shares, ARR = 20%

A central measure for this portfolio’s ARR is 15%.
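One way to arrive at the 15% figure is a mean weighted by the number of shares. The slides do not state which method was used, so the weighting below is an assumption; the names are illustrative.

```python
# (number of shares, annual rate of return in %) for the three stocks.
portfolio = [(100, 10.0), (200, 15.0), (100, 20.0)]

total_shares = sum(shares for shares, _ in portfolio)

# Assumption: each stock's ARR is weighted by its share count.
portfolio_arr = sum(shares * arr for shares, arr in portfolio) / total_shares
print(portfolio_arr)
```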

 Now observe the following portfolio

Trang 27

• Considering only the average ARR, the two portfolios are equal. But are they really?

• Is the dispersion (variability) of ARR the same for the two portfolios?

• The dispersion is as important as the central location.

Trang 28

The Range

Range = Largest measurement - Smallest measurement

But how do all the measurements spread out? The range cannot assist in answering this question.

Trang 29

The Variance

This measure reflects the dispersion of all the measurement values.

• The variance of a population of N measurements x1, x2, …, xN having a mean $\mu$ is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

• The variance of a sample of n measurements x1, x2, …, xn having a mean $\bar{x}$ is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Trang 30

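The Variance

Consider two small populations:

A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16

Both populations have a mean of 10, but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation.

Can the sum of deviations from the mean be a good measure of dispersion?

Deviations for B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6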

Trang 31

The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion, since their dispersion is clearly not equal.

The Variance

Trang 32

Let us calculate the variance of the two populations:

$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$

$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$

The Variance
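A short Python sketch makes the contrast concrete: the deviations from the mean sum to zero for both populations, while the variances differ. Which list is labelled A and which B is an assumption here, since the labels were detached from the data in the extracted slides.

```python
from statistics import fmean, pvariance

# Assumed labelling: A is the tightly packed population, B the spread-out one.
populations = {"A": [8, 9, 10, 11, 12], "B": [4, 7, 10, 13, 16]}

for name, values in populations.items():
    mu = fmean(values)
    # Deviations from the mean always cancel out, whatever the spread.
    deviation_sum = sum(x - mu for x in values)
    # Population variance: mean of the squared deviations.
    print(name, mu, deviation_sum, pvariance(values))
```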

Trang 33

• Example 6

• Find the variance of the following set of numbers, representing annual rates of return for a group of mutual funds. Assume the set is (i) a sample, (ii) a population: -2, 4, 5, 6.9, 10

Mean:

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{-2 + 4 + 5 + 6.9 + 10}{5} = \frac{23.9}{5} = 4.78$$

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{5 - 1}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 19.59\ \text{percent}^2$$
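Both parts of Example 6 can be checked with Python's standard library, which provides separate functions for the sample (divide by n - 1) and population (divide by N) definitions. The variable names are illustrative.

```python
from statistics import pvariance, variance

# Annual rates of return (%) from Example 6.
returns = [-2, 4, 5, 6.9, 10]

sample_var = variance(returns)       # (i) sample: divides by n - 1
population_var = pvariance(returns)  # (ii) population: divides by N
print(round(sample_var, 4), round(population_var, 4))
```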

Trang 34

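Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{1}{5}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 15.6736\ \text{percent}^2$$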

Trang 37

Sample standard deviation: $s = \sqrt{s^2}$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Trang 38

Standard Deviation

Trang 39

The Empirical Rule for a Bell-Shaped Data Set

Approximately 68% of all observations fall

within one standard deviation of the mean.

Approximately 95% of all observations fall

within two standard deviations of the mean.

Approximately 99.7% of all observations fall

within three standard deviations of the mean.
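These percentages can be reproduced under the assumption that "bell-shaped" means exactly normally distributed, in which case the share within k standard deviations is erf(k/√2). The function name below is illustrative.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```

Real data sets are only approximately bell-shaped, which is why the rule is stated with "approximately".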

Trang 40

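• The proportion of observations in any sample that lie within k standard deviations of the mean is at least $1 - 1/k^2$ for any k > 1.

• This theorem is valid for any set of measurements (sample, population) of any shape!

k = 2: at least 75% of the observations lie in $(\bar{x} - 2s,\ \bar{x} + 2s)$
k = 3: at least 88.9% of the observations lie in $(\bar{x} - 3s,\ \bar{x} + 3s)$
k = 4: at least 93.75% of the observations lie in $(\bar{x} - 4s,\ \bar{x} + 4s)$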

Trang 41

If the histogram is not at all bell-shaped, we can say that at least 75% of the marks fell between 60 and 80, and at least 88.9% of the marks fell between 55 and 85. (We can use other values of k.)
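The bound (commonly known as Chebyshev's theorem) is easy to tabulate. In the sketch below, the mean of 70 and standard deviation of 5 are inferred from the intervals quoted for the marks; they are not stated explicitly in the extracted slides.

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k**2

# Inferred from the intervals quoted above: mean mark 70, standard deviation 5.
mean, s = 70, 5
for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"at least {chebyshev_lower_bound(k):.1%} of marks in [{low}, {high}]")
```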

Trang 42

The pth percentile of a set of measurements is the value for which:

• At most p% of the measurements are less than that value

• At most (100-p)% of all the measurements are greater than that value.

Trang 43

A demonstration of commonly used percentiles

• First (lower) decile = 10th percentile

• First (lower) quartile, Q1, = 25th percentile

• Third quartile, Q3, = 75th percentile

• Ninth (upper) decile = 90th percentile


Trang 45

• Median = 50th percentile

Trang 46

$$L_P = (n + 1)\frac{P}{100}$$

where $L_P$ is the location of the $P$th percentile.

Trang 47

• Example 12-solution continued

• Finding the location of the 20th percentile:

• 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9

$$L_{20} = (n + 1)\frac{P}{100} = (10 + 1)\frac{20}{100} = 2.2$$

• Finding the value of the 20th percentile:

The 20th percentile is at location 2.2, that is, at 0.2 of the distance from 3.1 to 5.2. Therefore, the 20th percentile is 3.1 + 0.2(5.2 - 3.1) = 3.52.
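The location rule and the interpolation step can be sketched as a small Python function, shown here for P = 20 and P = 25 on the ten ordered values of the example. The function name and the clamping of locations that fall outside the data are assumptions, not part of the slides.

```python
def percentile(values, p):
    """Value of the p-th percentile using the location L = (n + 1) * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100          # 1-based, may be fractional
    lower = int(location)                 # whole part of the location
    fraction = location - lower
    # Assumption: clamp to the extremes when the location falls outside 1..n.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Interpolate between the two neighbouring ordered values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9]
print(percentile(data, 20))   # location 2.2
print(percentile(data, 25))   # location 2.75
```

Statistical packages use several slightly different percentile conventions, so their results may not match this rule exactly.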

Trang 48

Quartiles and Variability

• Quartiles can provide an idea about the shape of a histogram

[Figure: positively skewed histogram with Q1, Q2 and Q3 marked]

[Figure: negatively skewed histogram with Q1, Q2 and Q3 marked]

Trang 49

• This is a measure of the spread of the middle 50% of the observations

• A large value indicates a large spread of the observations

Interquartile range = Q3 – Q1

Inter-quartile Range

Trang 50

[Figure: box plot with a length of 1.5(Q3 – Q1) marked on each side of the box]

• A box plot is a pictorial display that provides the main descriptive measures of the measurement set:

• L - the largest measurement

• Q3 - The upper quartile

• Q2 - The median

• Q1 - The lower quartile

• S - The smallest measurement

Trang 51

Smallest = 449
Q1 = 512
Median = 537
Q3 = 575
Largest = 788
IQR = 63
Outliers = (788, 788, 766, 763, 756, 719, 712, 707, 703, 694, 690, 675, …)
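The interquartile range and the outlier cut-offs for this example can be recomputed as follows. The 1.5 × IQR rule is the usual box plot convention and is assumed here; the variable names are illustrative.

```python
q1, q3 = 512, 575            # quartiles from the example above
iqr = q3 - q1                # interquartile range

# Assumed convention: points more than 1.5 * IQR beyond a quartile are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers listed in the example (the original list is truncated).
listed = [788, 788, 766, 763, 756, 719, 712, 707, 703, 694, 690, 675]
print(iqr, lower_fence, upper_fence)
print(all(x > upper_fence for x in listed))
```

Under this convention the smallest value, 449, lies above the lower cut-off, so all the outliers are on the high side.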

Trang 52

DATA COLLECTION AND

SAMPLING

CHAPTER 4

Trang 53

• Methods of collecting data

• Simple Random Sampling

• Stratified Random Sampling

Trang 55

I Methods of collecting data

1 Observation

The investigator observes characteristics of a subset of members of one or more existing populations.

• Goal: draw conclusions about the corresponding population or about the difference between two or more populations.

• Advantage vs Disadvantage

o Advantage: easy to conduct, relatively inexpensive

o Disadvantage: provides little useful information; impossible to draw cause-and-effect conclusions due to confounding variables

Trang 56

Observation

A researcher for a pharmaceutical company wants to determine whether aspirin reduces the incidence of heart attacks. He selects a sample of men and women and asks each whether he or she has taken aspirin regularly over the past 2 years. Each person is also asked whether he or she has suffered a heart attack over the same period. The proportions reporting heart attacks are then compared, and a conclusion can be drawn as to whether aspirin is effective in reducing the likelihood of heart attacks.

Trang 57

2 Experiment

The investigator observes how a response variable behaves when the researcher manipulates one or more explanatory variables (factors).

• Goal: determine the effect of the manipulated factors on the response variable

Trang 58

Experiment

A researcher for a pharmaceutical company wants to determine whether aspirin reduces the incidence of heart attacks. He selects a sample of men and women. The sample is divided into two groups: one group takes aspirin regularly and the other does not. After 2 years, the researcher determines the proportion of people in each group who have suffered a heart attack. It is then possible to draw a conclusion as to whether aspirin is effective in reducing the likelihood of heart attacks.

Trang 59

I Methods of collecting data

3 Survey

One of the most familiar methods of collecting data

Goal: used to solicit information from people concerning such things as income, family size, and opinions on various issues

• The majority of surveys are conducted for private use

• Examples:

o Market researchers conduct a survey to determine the preferences and attitudes of consumers, which will help target a new product;

o A company surveys its customers’ satisfaction with its products and services.

Trang 60

- Inexpensive

- Low response rate, high

number of incorrect answers

Trang 61

Survey Design Steps

• Define the issue

o What are the purpose and objectives of the survey?

o Identify the questions to answer

• Decide what to measure and how to measure

o Decide what information is needed to answer the questions

o Think about how you intend to tabulate and analyze the responses

• Define the population of interest

Trang 62

Survey Design Steps

Design questionnaire

• The questionnaire should be kept as short as possible

• The questions should be short, simple, clear, and unambiguous

• Begin with simple demographic questions

• Use both dichotomous (closed-ended) questions and open-ended questions

• Avoid using leading questions

Trang 63

Survey Design Steps

• Pre-test the survey

o Pilot test with a small group of participants

o Assess clarity and length

• Determine the sample size and sampling method

• Select the sample and administer the survey

Trang 64

Types of Questions

Closed-ended Questions

• Select from a short list of defined choices

Example: Major: business / liberal arts / science / other

• Questions about the respondents’ personal characteristics

Example: Gender: Female / Male

Trang 65

1/ Why Sampling

- Less time consuming than a census

- Less costly to administer than a census

- It is possible to obtain statistical results of a sufficiently high precision based on samples.

- Sometimes, it’s impossible to identify the whole population

II SAMPLING METHODS

Trang 66

 A few parts selected for destructive testing

selected for auditVS

Trang 68

Simple Random Samples

• Every individual or item from the population has an equal chance of being selected

• Selection may be with replacement or without replacement

• Samples can be obtained from a table of random numbers or computer random number generators
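A computer random number generator can draw a simple random sample in a few lines. The sketch below uses Python's standard library on a hypothetical frame of 100 numbered items and shows selection both without and with replacement.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical frame of 100 numbered items

without_replacement = rng.sample(population, k=10)   # no item can repeat
with_replacement = rng.choices(population, k=10)     # items may repeat
print(without_replacement)
print(with_replacement)
```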

Trang 69

• Population divided into subgroups (called strata) according to some common characteristic

• Simple random sample selected from each subgroup

• Samples from subgroups are combined into one
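The three steps of stratified sampling can be sketched as follows. The strata, the student IDs, and the choice to sample the same fraction from every stratum (proportional allocation) are all assumptions made for illustration.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    combined = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        combined.extend(rng.sample(members, size))   # sample within the stratum
    return combined                                  # samples combined into one

# Hypothetical strata: student IDs grouped by year of study.
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(10)],
}
sample = stratified_sample(strata, fraction=0.1, rng=random.Random(0))
print(sample)
```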

Trang 70

• Decide on sample size: n

• Divide frame of N individuals into groups of k individuals: k = N/n

• Randomly select one individual from the 1st group

• Select every kth individual thereafter
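The steps above can be sketched in Python. The frame of 100 individuals is hypothetical, and the function assumes n is no larger than N; when N is not a multiple of n, k is rounded down.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random start in the first group of k, then every k-th item."""
    k = len(frame) // n              # group size k = N / n (rounded down)
    start = rng.randrange(k)         # random position within the first group
    return frame[start::k][:n]

frame = list(range(1, 101))          # hypothetical frame of N = 100 individuals
sample = systematic_sample(frame, n=10, rng=random.Random(1))
print(sample)
```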

Trang 71

Cluster Samples

• Population is divided into several “clusters,” each representative of the population

• A simple random sample of clusters is selected

• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique

[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
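A minimal sketch of cluster sampling, using every item in the selected clusters. The 16 clusters echo the figure; the cluster contents and the choice of 4 clusters are hypothetical.

```python
import random

rng = random.Random(7)

# Hypothetical population: 16 clusters of 5 items each.
clusters = {c: [f"c{c}-item{i}" for i in range(5)] for c in range(16)}

# Simple random sample of clusters, then use every item in the chosen clusters.
chosen = rng.sample(sorted(clusters), k=4)
sample = [item for c in chosen for item in clusters[c]]
print(chosen, len(sample))
```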

Trang 73

1/ Sampling Error

- An error is expected to occur when making a statement about the population that is based on the observations contained in a sample taken from the population

- The difference/deviation between the true (unknown)

value of a population parameter (mean, standard

deviation…) and its estimate, the sample statistic is the

Date posted: 03/04/2019, 11:12
