Chapter Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Compute the range, variance, and standard deviation and know what these values mean
Construct and interpret a box and whisker graph
Compute and explain the coefficient of variation and z-scores
Use numerical measures along with graphs, charts, and tables to describe data
Chapter Topics
Measures of Center and Location
Mean, median, mode
Other measures of Location
Weighted mean, percentiles, quartiles
Measures of Variation
Range, interquartile range, variance and standard deviation, coefficient of variation
Using the mean and standard deviation together
Coefficient of variation, z-scores
(Diagram labels: Percentiles, Quartiles, Range, Interquartile Range)
Measures of Center and Location: Overview
Center and Location
Mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Weighted mean:
\[ \bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} \]
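Mean (Arithmetic Average)
Sample mean (sample size n):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population mean (population size N):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]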
Mean (Arithmetic Average)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Median
To find the median, sort the n data values from low to high (sorted data is called a data array)
Find the value in the i = (1/2)n position
The ith position is called the Median Index Point
If i is not an integer, round up to the next highest integer
Median Example
Data array:
4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24
Note that n = 13
Find the i = (1/2)n position:
i = (1/2)(13) = 6.5
Since 6.5 is not an integer, round up to 7
The median is the value in the 7th position:
Md = 12
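The index-point rule for the median can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `median_by_index_point` is made up here, and the handling of an integer i (averaging the values in positions i and i + 1) is the usual convention but is an assumption, since the text above only covers the non-integer case.

```python
import math

def median_by_index_point(values):
    """Median via the index-point rule i = (1/2)n on a sorted data array."""
    data = sorted(values)              # sort low to high: the data array
    n = len(data)
    if n % 2 == 1:
        # i = n/2 is not an integer: round up and use that 1-based position
        position = math.ceil(n / 2)
        return data[position - 1]
    # Assumption (not stated in the text): when i is an integer,
    # average the values in positions i and i + 1
    i = n // 2
    return (data[i - 1] + data[i]) / 2

data_array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
median_value = median_by_index_point(data_array)
print(median_value)  # 12, the value in the 7th position
```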
Shape of a Distribution
Describes how data is distributed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Mode
A measure of location
The value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
(Figure: two dot plots, one on a 0 to 14 axis with Mode = 5 and one on a 0 to 6 axis with No Mode)
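Weighted Mean Example
\[ \bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} \approx 6.31 \]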
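As a quick check of the weighted mean arithmetic, the sketch below uses the values 5, 6, 7, 8 with weights 4, 12, 8, 2 (weighted sum 164, total weight 26). The function name `weighted_mean` is illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(w * x for x, w in zip(values, weights))
    return weighted_sum / total_weight

values = [5, 6, 7, 8]
weights = [4, 12, 8, 2]
result = weighted_mean(values, weights)
print(round(result, 2))  # 6.31 (= 164 / 26)
```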
Five houses on a hill by the beach
Which measure of location is the “best”?
Mean is generally used, unless extreme values (outliers) exist
Then Median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region – less sensitive to outliers
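To see how an outlier moves the mean but not the median, here is a small sketch using the five house prices listed with the Excel output later in this chapter. It relies on Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# House price data from the chapter's Excel example
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # pulled upward by the $2,000,000 house
median_price = statistics.median(house_prices)  # middle value of the sorted prices
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)  # 600000 300000 100000

# Dropping the most expensive house changes the mean far more than the median
without_outlier = [p for p in house_prices if p < 2_000_000]
print(statistics.mean(without_outlier), statistics.median(without_outlier))
```

With the outlier removed the mean falls from 600,000 to 250,000, while the median only moves from 300,000 to 200,000.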
Other Location Measures
The pth percentile in a data array:
p% are less than or equal to this value
(100 – p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
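Percentiles
Position of the pth percentile in an ordered array of n values (here p = 60, n = 19):
i = (p/100)(n) = (60/100)(19) = 11.4
If i is not an integer, round up to the next higher integer value
So use the value in the i = 12th position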
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Example: Find the first quartile
Q1 = 25th percentile, so find i: i = (25/100)(9) = 2.25
so round up and use the value in the 3rd position: Q1 = 13
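The percentile position rule i = (p/100)(n) can be sketched as below. The function name `percentile_by_index` is hypothetical, and two points are assumptions rather than something the slides state: an integer i is handled by averaging the values in positions i and i + 1, and p is restricted to 0 < p < 100 so that both positions exist. Integer arithmetic (`divmod`) is used to avoid floating-point surprises when deciding whether i is a whole number.

```python
def percentile_by_index(sorted_data, p):
    """Value at the pth percentile using the position i = (p/100)(n)."""
    if not 0 < p < 100:
        raise ValueError("this sketch only handles 0 < p < 100")
    n = len(sorted_data)
    whole, remainder = divmod(p * n, 100)   # i = whole + remainder/100
    if remainder != 0:
        # i is not an integer: round up to the next position (1-based)
        position = int(whole) + 1
        return sorted_data[position - 1]
    # Assumption: for an integer i, average positions i and i + 1
    i = int(whole)
    return (sorted_data[i - 1] + sorted_data[i]) / 2

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile_by_index(sample, 25)
print(q1)  # 13: i = 2.25 rounds up to the 3rd position
```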
Box and Whisker Plot
A graphical display of data using a central “box” and extended “whiskers”:
Constructing the Box and Whisker Plot
(Figure labels, left to right: Outliers, Lower Limit, 1st Quartile, Median, 3rd Quartile, Upper Limit)
The center box extends from Q1 to Q3
The line within the box is the median
The whiskers extend to the smallest and largest values within the calculated limits
Outliers are plotted outside the calculated limits
Shape of Box and Whisker Plots
The Box and central line are centered between the endpoints if data is symmetric around the median
(A Box and Whisker plot can be shown in either vertical or horizontal format)
Distribution Shape and Box and Whisker Plot
Right-Skewed
Box-and-Whisker Plot Example
Box-and-Whisker plot values for the following data:
0 2 2 2 3 3 4 5 6 11 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 6, Max = 27
Upper limit = Q3 + 1.5 (Q3 – Q1) = 6 + 1.5 (6 – 2) = 12
27 is above the upper limit, so it is shown as an outlier (*)
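The quartiles and limits in this example can be reproduced with a short sketch. The helper `value_at_percentile` is a made-up name that applies the round-up position rule i = (p/100)(n). The lower limit Q1 - 1.5 (Q3 - Q1) mirrors the upper-limit formula and is an assumption here, since only the upper limit is worked out in the example.

```python
def value_at_percentile(sorted_data, p):
    """Round-up position rule: i = (p/100)(n), 1-based position."""
    whole, remainder = divmod(p * len(sorted_data), 100)
    if remainder != 0:
        return sorted_data[int(whole)]       # position whole + 1, zero-based index whole
    # Assumption: integer i averages positions i and i + 1
    i = int(whole)
    return (sorted_data[i - 1] + sorted_data[i]) / 2

data = [0, 2, 2, 2, 3, 3, 4, 5, 6, 11, 27]   # already sorted

q1 = value_at_percentile(data, 25)
q2 = value_at_percentile(data, 50)
q3 = value_at_percentile(data, 75)
iqr = q3 - q1

upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr                  # assumed symmetric counterpart

outliers = [x for x in data if x < lower_limit or x > upper_limit]
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, q2, q3)        # 2 3 6
print(upper_limit)       # 12.0
print(outliers)          # [27]
```

The whiskers end at the smallest and largest values inside the limits (0 and 11 here), and 27 is plotted separately.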
Measures of Variation
Variation
Range
Interquartile Range
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation
Variation
Measures of variation give information on the spread or variability of the data values.
Same center, different variation
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = x maximum – x minimum
Example (values plotted on a 0 to 14 axis):
Range = 14 - 1 = 13
The range ignores the way in which data are distributed
The range is sensitive to outliers
Example (values plotted on a 7 to 12 axis): Range = 12 - 7 = 5
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high- and low-valued observations and calculate the range from the remaining values.
Interquartile range = 3rd quartile – 1st quartile
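A small sketch contrasting the range with the interquartile range, using the box-and-whisker example data (0 2 2 2 3 3 4 5 6 11 27) and the quartiles reported for it (Q1 = 2, Q3 = 6). The quartiles are taken as given rather than recomputed, and the variable names are illustrative.

```python
data = [0, 2, 2, 2, 3, 3, 4, 5, 6, 11, 27]
q1, q3 = 2, 6                      # quartiles as reported in the box plot example

data_range = max(data) - min(data)  # uses only the two extreme values
iqr = q3 - q1                       # spread of the middle 50% of the data

# Removing the single outlier (27) changes the range sharply
trimmed = [x for x in data if x != 27]
range_without_outlier = max(trimmed) - min(trimmed)

print(data_range)             # 27
print(range_without_outlier)  # 11
print(iqr)                    # 4
```

One extreme value accounts for most of the range, while the interquartile range does not depend on it.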
Interquartile Range Example
(Figure: box and whisker plot with labels including Median (Q2) and X maximum)
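Variance
Average of squared deviations of values from the mean
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]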
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
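The population and sample versions of the variance and standard deviation differ only in the divisor (N versus n - 1). The sketch below writes both out by hand and cross-checks them against Python's standard `statistics` module. The function names are illustrative, and the data are the nine-value ordered array from the quartile example, reused here purely for illustration.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample variance")
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

s_squared = sample_variance(sample)
s = math.sqrt(s_squared)                  # same units as the original data
sigma = math.sqrt(population_variance(sample))

# Cross-check against the standard library
print(math.isclose(s, statistics.stdev(sample)))
print(math.isclose(sigma, statistics.pstdev(sample)))
```

The sample standard deviation is always slightly larger than the population version computed from the same numbers, because it divides by n - 1 instead of n.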
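Sample standard deviation calculation (n = 8, mean = 16):
\[ s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} \]
\[ = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]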
Comparing Standard Deviations
Same mean, but different standard deviations:
Mean = 15.5, s = 3.338
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data measured in different units
\[ CV = \left( \frac{s}{\bar{x}} \right) \cdot 100\% \]
Both stocks have the same standard deviation, but stock B is less variable relative to its mean
\[ CV_B = \left( \frac{s}{\bar{x}} \right) \cdot 100\% \]
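A minimal sketch of the coefficient of variation for comparing two stocks. The stock figures below are hypothetical, chosen only so that both stocks share the same standard deviation; they are not taken from the slides. The function name `coefficient_of_variation` is also illustrative.

```python
def coefficient_of_variation(mean, std_dev):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

# Hypothetical figures (assumptions for illustration only)
stock_a_mean, stock_a_sd = 50, 5
stock_b_mean, stock_b_sd = 100, 5

cv_a = coefficient_of_variation(stock_a_mean, stock_a_sd)
cv_b = coefficient_of_variation(stock_b_mean, stock_b_sd)

print(cv_a)  # 10.0 (%)
print(cv_b)  # 5.0 (%)
```

With identical standard deviations, the stock with the higher mean has the smaller CV, so it is less variable relative to its mean.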
The Empirical Rule
If the data distribution is bell-shaped, then the interval:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
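As an illustration, the sketch below turns the Empirical Rule into concrete intervals for the IQ example used later in this chapter (μ = 100, σ = 15). The function name `empirical_rule_intervals` is made up, and the percentages are the approximate figures from the rule, which only apply to bell-shaped data.

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals mean ± k*std_dev for k = 1, 2, 3 (bell-shaped data only)."""
    approx_coverage = {1: 68.0, 2: 95.0, 3: 99.7}   # approximate percentages
    return {
        k: (mean - k * std_dev, mean + k * std_dev)
        for k in approx_coverage
    }, approx_coverage

intervals, coverage = empirical_rule_intervals(100, 15)
for k, (low, high) in intervals.items():
    print(f"about {coverage[k]}% of values fall between {low} and {high}")
```

For the IQ distribution this gives roughly 68% of scores between 85 and 115, 95% between 70 and 130, and 99.7% between 55 and 145.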
Tchebysheff’s Theorem
Regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean
Examples:
At least               within
(1 - 1/1²) = 0%        k=1 (μ ± 1σ)
(1 - 1/2²) = 75%       k=2 (μ ± 2σ)
(1 - 1/3²) = 89%       k=3 (μ ± 3σ)
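The bound 1 - 1/k² is easy to tabulate and to check against real data. The sketch below does both, using the chapter's skewed house price data as a test case. The function names are illustrative, and the check uses the population standard deviation (`statistics.pstdev`), which is an assumption about which standard deviation to apply.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 2))   # 0.0, 0.75, 0.89

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(fraction_within(house_prices, 2) >= chebyshev_lower_bound(2))  # True
```

The observed fraction is usually well above the bound, since the theorem is a guarantee for any distribution rather than an estimate.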
Standardized Data Values
A standardized data value refers to the number of standard deviations a value is from the mean
Standardized data values are sometimes referred to as z-scores
Standardized Population Values
\[ z = \frac{x - \mu}{\sigma} \]
where z is the standardized value (number of standard deviations x is from μ)
Standardized Sample Values
\[ z = \frac{x - \bar{x}}{s} \]
where z is the standardized value (number of standard deviations x is from the sample mean)
Standardized Value Example
IQ scores in a large population have a bell-shaped distribution with mean μ = 100 and standard deviation σ = 15
Find the standardized score (z-score) for a person with an IQ of 121
Answer:
\[ z = \frac{x - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4 \]
Someone with an IQ of 121 is 1.4 standard deviations above the mean
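The IQ calculation can be checked with a short function. The name `z_score` is illustrative, and the same function works for sample values by passing the sample mean and sample standard deviation instead.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

z = z_score(121, mean=100, std_dev=15)
print(z)  # 1.4: an IQ of 121 is 1.4 standard deviations above the mean

# A value below the mean gives a negative z-score
print(z_score(85, mean=100, std_dev=15))  # -1.0
```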
Using Microsoft Excel
Descriptive Statistics are easy to obtain from Microsoft Excel
Select: Data / data analysis / descriptive statistics
Enter details in dialog box
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000, $500,000, $300,000, $100,000, $100,000
Chapter Summary
Described measures of center and location
Mean, median, mode, weighted mean
Discussed percentiles and quartiles
Created Box and Whisker Plots
Illustrated distribution shapes
Symmetric, skewed
Chapter Summary (continued)
Described measures of variation
Range, interquartile range, variance, standard deviation, coefficient of variation
Discussed Tchebysheff’s Theorem
Calculated standardized data values