Describing Numerical Data
Chapter 4
4.1 Summaries of Numerical Variables
Can 500 different songs fit on the iPod Shuffle?
To answer this question we must understand the typical length of a song and the variation of song sizes around the typical length.
We can do this using summary statistics.
Copyright © 2011 Pearson Education, Inc.
4.1 Summaries of Numerical Variables
A Subset of the Data
4.1 Summaries of Numerical Variables
The Median
Value in the middle of a sorted list of numerical
values (a typical value)
Half of the values fall below the median; half fall above
It is the 50th Percentile
4.1 Summaries of Numerical Variables
Common Percentiles
Lower Quartile = 25th Percentile
Upper Quartile = 75th Percentile
One quarter of the values fall below the lower
quartile and one quarter fall above the upper
quartile
4.1 Summaries of Numerical Variables
The Interquartile Range (IQR)
IQR = 75th Percentile – 25th Percentile
A measure of variation based on quartiles
Used to accompany the median
4.1 Summaries of Numerical Variables
The Range
Range = Maximum - Minimum
Maximum Value = 100th Percentile
Minimum Value = 0th Percentile
Another measure of variation; not preferred
because it is based on extreme values
4.1 Summaries of Numerical Variables
The Five Number Summary
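The five-number summary lists the minimum, lower quartile, median, upper quartile, and maximum. The sketch below computes it, along with the IQR and range defined earlier, for a small made-up list of song sizes. The data values and the function name `five_number_summary` are illustrative assumptions, and software packages use slightly different quartile conventions, so results may differ a little from the textbook's.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # "inclusive" interpolates between data points, keeping the quartiles
    # inside the observed range; other conventions give slightly different cuts.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical song sizes in MB (not the data set shown in the slides)
song_sizes_mb = [2.1, 2.8, 3.0, 3.3, 3.5, 3.6, 4.0, 4.4, 5.2, 21.6]

low, q1, med, q3, high = five_number_summary(song_sizes_mb)
iqr = q3 - q1             # 75th percentile minus 25th percentile
value_range = high - low  # maximum minus minimum

print("five-number summary:", low, q1, med, q3, high)
print("IQR:", iqr, "range:", value_range)
```

With one very large song in the list, the range is far larger than the IQR, which illustrates why the range is not the preferred measure of variation.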
4.1 Summaries of Numerical Variables
The Five Number Summary for Song Sizes
4.1 Summaries of Numerical Variables
Summary Statistics for Song Sizes
4.1 Summaries of Numerical Variables
The Mean (Average)
Arithmetic average; divide the sum of the values
by the number of values (another typical value)
The symbol y represents the variable of interest
The symbol $\bar{y}$, read “y bar,” represents the mean
4.1 Summaries of Numerical Variables
The Mean (Average)
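Written as a formula, the verbal definition of the mean looks like this. The number of values n and the subscripted values y_1, ..., y_n are notation assumed here for illustration.

```latex
\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i
```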
4.1 Summaries of Numerical Variables
The Variance ($s^2$)
Is a measure of variation based on the mean
How far a value is from the mean is known as its
deviation; the variance is the average of the squared
deviations
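As a formula, the variance can be sketched as follows. This assumes the usual sample convention of dividing the sum of squared deviations by n - 1 rather than n, so the "average" is taken over n - 1 terms. The symbols n and y_i are assumed notation.

```latex
s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}
```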
4.1 Summaries of Numerical Variables
The Standard Deviation (SD)
Is the square root of the variance
Is a measure of variability in the original units of the data (the variance results in squared units)
$s = \sqrt{s^2}$
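A short sketch of these two calculations, using made-up song sizes in MB. It assumes the sample convention of dividing by n - 1. The variance comes out in squared MB, and taking the square root returns the SD to MB.

```python
import math
import statistics

# Hypothetical song sizes in MB (illustrative values, not the slide data)
song_sizes_mb = [3.0, 3.5, 4.0, 4.5, 5.0]

mean = statistics.mean(song_sizes_mb)
# Deviation: how far each value is from the mean
deviations = [y - mean for y in song_sizes_mb]
# Sample variance: sum of squared deviations divided by n - 1 (units: MB squared)
variance = sum(d ** 2 for d in deviations) / (len(song_sizes_mb) - 1)
# Standard deviation: square root of the variance (units: MB)
sd = math.sqrt(variance)

print(f"mean = {mean} MB, variance = {variance} MB^2, SD = {sd:.3f} MB")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and should agree with these hand calculations.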
4.1 Summaries of Numerical Variables
Summary Statistics for Song Sizes
4M Example 4.1: MAKING M&M’s
Motivation
How many M&M’s are needed to fill a bag labeled to weigh 1.6 ounces?
4M Example 4.1: MAKING M&M’s
Method
Data are weights of 72 plain chocolate M&M’s taken from several packages. To get a measure of the amount of variation relative to the typical size, we use the ratio of the standard deviation to the mean (known as the coefficient of variation):
$c_v = s / \bar{y}$
4M Example 4.1: MAKING M&M’s
Mechanics
Mean Weight = 0.86 gm
SD = 0.04 gm
4M Example 4.1: MAKING M&M’s
Message
Since the SD is quite small compared to the mean
(with a $c_v$ of about 5%), the results suggest that 53
pieces are usually enough to fill a bag.
A bag labeled 1.6 ounces weighs about 45.36 grams.
Since there is little variability around the typical weight of
an M&M, we can calculate the number of pieces needed to fill a
1.6 ounce bag as 45.36/0.86, which is about 52.7 and rounds up to 53.
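The arithmetic in this example can be checked with a few lines of Python. The mean, SD, and label weight come from the slides. The conversion factor of 28.35 grams per ounce is an assumption chosen because it reproduces the 45.36 grams quoted above.

```python
import math

GRAMS_PER_OUNCE = 28.35   # assumed conversion factor; 1.6 * 28.35 = 45.36
label_weight_oz = 1.6     # weight printed on the bag
mean_weight_g = 0.86      # mean weight of one piece
sd_weight_g = 0.04        # standard deviation of the piece weights

# Coefficient of variation: SD relative to the mean
cv = sd_weight_g / mean_weight_g
bag_weight_g = label_weight_oz * GRAMS_PER_OUNCE
# Round up, since a partial piece cannot be added to a bag
pieces_needed = math.ceil(bag_weight_g / mean_weight_g)

print(f"c_v = {cv:.1%}")
print(f"bag weight = {bag_weight_g:.2f} g, pieces needed = {pieces_needed}")
```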
4.2 Histograms and the Distribution of Numerical Data
Histograms
Plot the distribution of a numerical variable by
showing counts of values occurring within
adjacent intervals
Similar to bar charts but designed for continuous quantitative data (bar charts are only appropriate for discrete categories)
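The counting step behind a histogram can be sketched in a few lines. The function name `histogram_counts`, the interval settings, and the song sizes are all illustrative assumptions. A plotting library would normally choose the intervals and draw the bars.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each of n_bins adjacent intervals.

    Interval k covers [start + k*width, start + (k+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical song sizes in MB (not the data set shown in the slides)
song_sizes_mb = [2.1, 2.8, 3.0, 3.3, 3.5, 3.6, 4.0, 4.4, 5.2, 21.6]
counts = histogram_counts(song_sizes_mb, start=2, width=1, n_bins=4)

# Text version of the histogram: one row per interval, one mark per song
for k, count in enumerate(counts):
    low = 2 + k
    print(f"{low}-{low + 1} MB: {'#' * count}")
```

The 21.6 MB song falls outside the four intervals and is left out here. Extending the axis to include it would add many empty intervals.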
4.2 Histograms and the Distribution of Numerical Data
Histogram of Song Sizes
4.2 Histograms and the Distribution of Numerical Data
Histogram of Song Sizes
Indicates a few very long songs (outliers)
The graph devotes more than half of its area to
show less than 1% of the songs (white space
rule: graphs with mostly white space can be
improved by changing the interval of the plot to
focus on the data rather than the white space)
4.3 Boxplot
Graph of the Five Number Summary
4.3 Boxplot
Combining Boxplots with Histograms
Boxplots locate the median and quartiles
and highlight outliers
The median splits the area of the histogram
in half (unlike the mean, it is resistant or
robust to the effects of outliers)
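A small illustration of this resistance, using made-up song sizes: adding a single very long song moves the mean a great deal but barely moves the median.

```python
import statistics

# Hypothetical song sizes in MB (illustrative values only)
sizes = [3.0, 3.2, 3.5, 3.8, 4.0]
with_outlier = sizes + [21.6]   # add one very long song

# How far each summary moves when the outlier is added
mean_shift = statistics.mean(with_outlier) - statistics.mean(sizes)
median_shift = statistics.median(with_outlier) - statistics.median(sizes)

print(f"mean moves by {mean_shift:.2f} MB")
print(f"median moves by {median_shift:.2f} MB")
```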
4.3 Boxplot
Boxplot with Histogram of Song Sizes
4.4 Shape of a Distribution
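Modes
A mode is a peak in the histogram; a distribution with one mode is unimodal; two is
bimodal; three or more is multimodal
A distribution whose histogram bars are all about the same
height is uniform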
4.4 Shape of a Distribution
Symmetry and Skewness
A distribution is symmetric if the two sides
of its histogram are mirror images
A distribution is skewed if one tail of the
histogram stretches out farther than the
other
4.4 Shape of a Distribution
Distribution of Song Sizes
The mode lies between 3 and 4 MB
The distribution is right skewed (the right
tail stretches out farther than the left tail)
4M Example 4.2: EXECUTIVE COMPENSATION
Message
The distribution of annual salaries of CEOs
in 2003 is unimodal, nearly symmetric
around the median of $650,000, and right
skewed. The average is $697,000. The
largest salary is $4,000,000.
4.4 Shape of a Distribution
Bell-Shaped Distributions and Empirical Rule
A bell-shaped distribution is symmetric and unimodal
The empirical rule uses the standard
deviation to describe how data with a
bell-shaped distribution cluster around the mean
4.4 Shape of a Distribution
The Empirical Rule
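The empirical rule is commonly stated as: about 68% of the data fall within one standard deviation of the mean, about 95% within two, and nearly all (about 99.7%) within three. These percentages come from the normal curve. The sketch below recovers them, assuming an exactly normal distribution as the model for "bell-shaped".

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Proportion of the distribution within k standard deviations of the mean
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.1%}")
```

Real data are only approximately bell-shaped, so observed percentages will be close to these values rather than equal to them.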
4.4 Shape of a Distribution
Standardizing
Converting data to z-scores
Z-scores measure the distance from the
mean in standard deviations
$z = \dfrac{y - \bar{y}}{s}$
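A minimal sketch of standardizing. The helper names `z_score` and `standardize` are illustrative, and the sample SD (n - 1 divisor) is assumed. The single-value example reuses the M&M mean and SD from Example 4.1 with a hypothetical piece weighing 0.94 grams.

```python
import statistics

def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd

def standardize(values):
    """Convert a list of values to z-scores using the sample mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 divisor)
    return [z_score(y, mean, sd) for y in values]

# A hypothetical 0.94 g piece, with mean 0.86 g and SD 0.04 g from Example 4.1
z = z_score(0.94, mean=0.86, sd=0.04)
print(f"z = {z:.1f}")   # two standard deviations above the mean

# Standardized data have mean 0 and SD 1
zs = standardize([3.0, 3.5, 4.0, 4.5, 5.0])
print([round(v, 2) for v in zs])
```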
4.5 Epilog
Can 500 different songs fit on the iPod
Shuffle?
Because of variation, not every collection of 500
songs will fit. The longest 500 songs won’t fit.
However, based on the typical song size, the
amount of variation in song sizes and the shape
of its distribution, we can say that most
collections of 500 songs will fit!
Best Practices
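Be sure that data are numerical when using histograms and summaries such as the mean and standard deviation
Summarize the distribution of a numerical variable with a graph
Choose interval widths appropriate to the data when preparing a histogram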
Best Practices (Continued)
Scale your plots to show data, not empty space
Anticipate what you will see in a histogram
Label clearly
Check for gaps
Do not ignore the presence of outliers.