
Page 1

Lecture 3 – Descriptive Statistics

Page 2

• Measures of Location and Variability

• Exploratory Data Analysis

• Association Between Two Variables

Page 3

Measures of Location

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

Page 4

• The mean of a data set is the average of all the data values.

• The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Page 5

Sample Mean $\bar{x}$

$$\bar{x} = \frac{\sum x_i}{n}$$

where $\sum x_i$ is the sum of the values of the $n$ observations and $n$ is the number of observations in the sample.

Page 6

Population Mean $\mu$

$$\mu = \frac{\sum x_i}{N}$$

where $\sum x_i$ is the sum of the values of the $N$ observations and $N$ is the number of observations in the population.
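As a minimal sketch of the mean computation, the function below divides the sum of the values by the number of observations. The same arithmetic applies to a sample and to a population; only the notation differs. The rent figures are hypothetical and serve only as an illustration.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical sample of five monthly rents (illustrative numbers only).
sample_rents = [445, 475, 500, 525, 555]
sample_mean = mean(sample_rents)  # 2500 / 5 = 500.0
print(sample_mean)
```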

Page 7

• The median of a data set is the value in the middle when the data items are arranged in ascending order.

• Whenever a data set has extreme values, the median is the preferred measure of central location.

• The median is the measure of location most often reported for annual income and property value data.

• A few extremely large incomes or property values can inflate the mean.
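The effect of an extreme value on the mean and the median can be illustrated with a short sketch. The income figures below are hypothetical (in thousands); the point is only that one very large value pulls the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands (illustrative numbers only).
incomes = [32, 35, 38, 41, 45]
with_extreme = incomes + [900]  # add one extremely large income

print(mean(incomes), median(incomes))            # mean and median are close
print(mean(with_extreme), median(with_extreme))  # mean is inflated, median barely moves
```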

Page 11

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.

• Admission test scores for colleges and universities are frequently reported in terms of percentiles.

• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
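The definition above does not fix a single computing rule, and software packages differ. The sketch below assumes one common textbook convention (not stated in these slides): sort the data, compute the index i = (p/100)n, round up if i is not an integer, and otherwise average the values in positions i and i + 1. The function name and the test scores are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (p an integer from 1 to 99) of values.

    Assumed convention: i = (p / 100) * n; round up when i is fractional,
    otherwise average the values in positions i and i + 1 (1-based).
    """
    if not values:
        raise ValueError("percentile requires at least one value")
    if not (isinstance(p, int) and 1 <= p <= 99):
        raise ValueError("p must be an integer between 1 and 99")
    data = sorted(values)
    n = len(data)
    # Keep i as quotient and remainder to avoid floating-point rounding.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is fractional: rounding up gives position whole + 1, i.e. index whole.
        return data[whole]
    # i is an integer: average positions i and i + 1 (indices whole - 1 and whole).
    return (data[whole - 1] + data[whole]) / 2


# Hypothetical test scores (illustrative numbers only).
scores = [52, 55, 61, 64, 68, 70, 73, 77, 81, 85, 90, 96]
p85 = percentile(scores, 85)
p50 = percentile(scores, 50)
print(p85, p50)
```

With this rule the result always satisfies the definition: at least p percent of the values are at or below it and at least (100 - p) percent are at or above it.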

Page 12

Quartiles

Quartiles are specific percentiles:

• First Quartile = 25th Percentile

• Second Quartile = 50th Percentile = Median

• Third Quartile = 75th Percentile

Page 13

Measures of Variability

• It is often desirable to consider measures of variability (dispersion), as well as measures of location.

• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Page 14

Measures of Variability

• Range

• Interquartile Range

• Variance

• Standard Deviation

• Coefficient of Variation

Page 15

• The range of a data set is the difference between the largest and smallest data values.

• It is the simplest measure of variability.

• It is very sensitive to the smallest and largest data values.

Page 16

Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.

• It is the range for the middle 50% of the data.

• It overcomes the sensitivity to extreme data values.
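A short sketch can contrast the range with the interquartile range. It uses `statistics.quantiles` from the Python standard library with its default method; other software may use a slightly different quartile convention and return slightly different values. The delivery times are hypothetical.

```python
from statistics import quantiles

# Hypothetical delivery times in days, including one extreme value.
delivery_days = [7, 9, 10, 10, 11, 12, 25]

# Range: largest value minus smallest value.
data_range = max(delivery_days) - min(delivery_days)

# Interquartile range: third quartile minus first quartile.
q1, _median, q3 = quantiles(delivery_days, n=4)
iqr = q3 - q1

print(data_range, iqr)  # the single extreme value widens the range, not the IQR
```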

Page 17

Variance

The variance is a measure of variability that utilizes all the data.

The variance is useful in comparing the variability of two or more variables.

Page 18

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$ for a sample

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$ for a population
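The two variance computations can be sketched as follows, assuming the standard definitions: the sum of squared deviations from the mean, divided by n - 1 for a sample and by N for a population. The function names and the data are hypothetical.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    if big_n < 1:
        raise ValueError("population variance requires at least one value")
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


# Hypothetical data (illustrative numbers only); the mean is 44.
data = [46, 54, 42, 46, 32]
print(sample_variance(data))      # 256 / 4
print(population_variance(data))  # 256 / 5
```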

Page 19

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Page 20

Standard Deviation

The standard deviation is computed as follows:

$$s = \sqrt{s^2}$$ for a sample

$$\sigma = \sqrt{\sigma^2}$$ for a population

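Page 21

The coefficient of variation is computed as follows:

$$\left(\frac{s}{\bar{x}} \times 100\right)\%$$ for a sample

$$\left(\frac{\sigma}{\mu} \times 100\right)\%$$ for a population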
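As a sketch, the sample standard deviation and the coefficient of variation can be computed together, assuming the usual definition of the coefficient of variation as the standard deviation expressed as a percentage of the mean (the mean must be nonzero). The two suppliers' delivery times are hypothetical and chosen so that both have the same mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / x_bar * 100


# Hypothetical delivery times in days; both suppliers average 10 days.
supplier_a = [9, 10, 10, 11, 10]
supplier_b = [6, 14, 8, 13, 9]

cv_a = coefficient_of_variation(supplier_a)
cv_b = coefficient_of_variation(supplier_b)
print(round(cv_a, 1), round(cv_b, 1))  # supplier B is far more variable
```

Because the means are equal here, the comparison of the two coefficients mirrors the comparison of the standard deviations; the coefficient of variation matters most when the means differ.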

Page 22

Measures of Distribution Shape, Relative Location, and Detecting Outliers

• Distribution Shape

• z-Scores

• Chebyshev’s Theorem

• Empirical Rule

• Detecting Outliers

Page 23

Distribution Shape: Skewness

• An important measure of the shape of a distribution is called skewness.

• The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$$

• Skewness can be easily computed using statistical software.
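A minimal sketch of the sample skewness computation follows, assuming the adjusted formula n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, with s the sample standard deviation. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Skewness = n / ((n - 1)(n - 2)) * sum(((x - x_bar) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("sample skewness requires at least three values")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


symmetric = [1, 2, 3, 4, 5]      # symmetric data: skewness of zero
right_tail = [1, 2, 3, 4, 20]    # long right tail: positive skewness
left_tail = [-20, -4, -3, -2, -1]  # long left tail: negative skewness
print(sample_skewness(symmetric), sample_skewness(right_tail), sample_skewness(left_tail))
```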

Page 26

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean:

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel’s STANDARDIZE function can be used to compute the z-score.

Page 27

• An observation’s z-score is a measure of the relative location of the observation in a data set.

• A data value less than the sample mean will have a z-score less than zero.

• A data value greater than the sample mean will have a z-score greater than zero.

• A data value equal to the sample mean will have a z-score of zero.
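The sign behaviour of z-scores can be checked with a short sketch, assuming the usual definition z = (x - mean) / s with s the sample standard deviation. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the z-score of every value: (x - sample mean) / sample std dev."""
    x_bar = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


# Hypothetical data with mean 44 and sample standard deviation 8.
data = [46, 54, 42, 46, 32]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))  # values above 44 get positive z, values below get negative z
```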

Page 28

We simply sort the data values into ascending order, identify the five-number summary, and then construct a box plot.

Page 29

Five-Number Summary

1. Smallest Value

2. First Quartile

3. Median

4. Third Quartile

5. Largest Value
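The five-number summary can be sketched with the standard library. The quartiles come from `statistics.quantiles` with its default method, which is one of several conventions in use, so other tools may report slightly different quartiles. The function name and the data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (smallest, first quartile, median, third quartile, largest)."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)
    return data[0], q1, med, q3, data[-1]


# Hypothetical data (illustrative numbers only).
observations = [27, 12, 20, 40, 15, 22, 18]
summary = five_number_summary(observations)
print(summary)
```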

Page 30

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Page 31

[Figure: box plot; surviving axis labels: 400, 625]

• A box is drawn with its ends located at the first and third quartiles.

Page 32

Box Plot

• Limits are located (not drawn) using the interquartile range (IQR).

• Data outside these limits are considered outliers.

• The location of each outlier is shown with the symbol * .
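The slides do not state how far from the box the limits are placed. The sketch below assumes the common convention of 1.5 IQR below the first quartile and 1.5 IQR above the third quartile; the multiplier `k`, the function name, and the data are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower limit, upper limit, outliers) using limits at k * IQR.

    k = 1.5 is an assumed, commonly used multiplier, not taken from the slides.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


# Hypothetical data with one unusually large value.
observations = [12, 15, 18, 20, 22, 27, 60]
lower, upper, outliers = iqr_outliers(observations)
print(lower, upper, outliers)
```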

Page 33

Box Plot

The box plot is an excellent graphical technique for making comparisons among two or more groups.

Page 34

Measures of Association Between Two Variables

Thus far we have examined numerical methods used to summarize the data for one variable at a time.

Often a manager or decision maker is interested in the relationship between two variables.

Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.

Page 35

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

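Page 36

The covariance is computed as follows:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$ for samples

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$ for populations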
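A minimal sketch of the sample covariance follows, assuming the standard definition: the sum of the products of paired deviations from the two means, divided by n - 1 (the population version divides by N instead). The function name and the paired data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sum of (x - x_bar) * (y - y_bar) over all pairs, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance requires at least two pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired observations (illustrative numbers only).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # tends to rise with x
y_down = [5, 4, 5, 4, 2]  # tends to fall as x rises

print(sample_covariance(x, y_up))    # positive: positive relationship
print(sample_covariance(x, y_down))  # negative: negative relationship
```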

Page 37

Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

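Page 38

The correlation coefficient is computed as follows:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$ for samples

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$ for populations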
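As a sketch, the sample correlation coefficient can be computed as the sample covariance divided by the product of the two sample standard deviations, which is the standard (Pearson) definition assumed here. The function name and the data are hypothetical.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("correlation requires at least two pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)


# Hypothetical paired observations (illustrative numbers only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
exact_line = [2 * v + 1 for v in x]  # points exactly on a rising straight line

print(round(sample_correlation(x, y), 4))
print(round(sample_correlation(x, exact_line), 4))
```

Unlike the covariance, the result does not depend on the units of measurement and always lies between -1 and +1.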

Date posted: 16/09/2021, 17:23
