Describing Categorical Data
Chapter 3
3.1 Looking At Data
Which hosts send the most visitors to
Amazon’s Web site?
Data set consists of 188,996 visits
To answer this question we must describe the
variation in Host
Frequency and Relative Frequency Tables
The distribution of a categorical variable is a list
of its values, each with its associated count (frequency)
A frequency table summarizes the distribution of
a categorical variable
A relative frequency table shows the proportion
(or percentage) in each category
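As a minimal sketch of the two tables described above, the following builds a frequency table and a relative frequency table from a list of category labels. The `visits` list is hypothetical sample data, not the 188,996 Amazon visits, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical sample of referring hosts (illustrative, not the Amazon data).
visits = ["msn.com", "yahoo.com", "google.com", "msn.com",
          "yahoo.com", "msn.com", "aol.com", "google.com"]

def frequency_table(values):
    """Return {category: count}, ordered from most to least frequent."""
    return dict(Counter(values).most_common())

def relative_frequency_table(values):
    """Return {category: proportion}; the proportions sum to 1."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

freq = frequency_table(visits)
rel = relative_frequency_table(visits)

# Print the host, its count, and its share of all visits.
for host in freq:
    print(f"{host:<12}{freq[host]:>5}{rel[host]:>8.1%}")
```

Multiplying each proportion by 100 gives the percentage form of the same table.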
3.2 Charts of Categorical Data
Bar Charts and Pie Charts
Unless you need to know exact counts, charts are better than tables for summarizing more than five categories
The two most common displays of a categorical
variable are a bar chart and a pie chart
The Bar Chart
Uses horizontal or vertical bars to show the
distribution of a categorical variable
Is called a Pareto chart when the categories are sorted by frequency (popular in quality control)
Becomes cluttered with too many categories
Is appropriate for ordinal categorical variables
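The Pareto ordering mentioned above amounts to sorting the categories by count before drawing the bars. The sketch below assumes a hypothetical list of defect labels and draws a plain-text horizontal bar chart so that it needs no plotting library. A real chart would normally come from a graphics package.

```python
from collections import Counter

# Hypothetical defect categories, as might arise in quality control.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent"]

def pareto_order(values):
    """Return (category, count) pairs sorted from most to least frequent."""
    return Counter(values).most_common()

def text_bar_chart(pairs, symbol="#"):
    """Render one horizontal bar per category; bar length equals the count."""
    width = max(len(category) for category, _ in pairs)
    return "\n".join(f"{category:<{width}} | {symbol * count} {count}"
                     for category, count in pairs)

print(text_bar_chart(pareto_order(defects)))
```

Because every bar has the same thickness, its length alone carries the count, which is what makes bars easy to compare.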
Bar Chart (Horizontal) of Top 10 Hosts
Bar Chart (Vertical) of Top 10 Hosts
The Pie Chart
Uses wedges of a circle to show the distribution of
a categorical variable
Commonly chosen to illustrate market shares or
sources of revenue for a company
Less useful than bar charts for comparing actual counts (bar lengths are easier to compare than
wedge angles)
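To make the wedge idea concrete, each category's angle is its share of the total multiplied by 360 degrees. This sketch uses hypothetical market-share counts, and `wedge_angles` is an illustrative name.

```python
def wedge_angles(counts):
    """Map each category to its wedge angle in degrees (angles sum to 360)."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}

# Hypothetical market-share counts (illustrative only).
shares = {"A": 50, "B": 30, "C": 20}
angles = wedge_angles(shares)

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

Categories B and C differ by 10 percentage points, which is only a 36-degree difference in angle. Differences like this are harder to judge by eye than a difference in bar length.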
Pie Chart of Top 10 Hosts
3.3 The Area Principle
The Fundamental Rule for Data Displays
The area occupied by a part of the graph/chart
that displays data should be proportional to the
amount of data it represents
Charts decorated to attract attention often violate the area principle
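One common way decorated charts break the area principle is by scaling a picture in both width and height by the data ratio, so its area grows with the square of that ratio. The sketch below illustrates this under that assumption. The function names are hypothetical.

```python
import math

def naive_scale(ratio):
    """Scale width and height by the data ratio (violates the area principle)."""
    return ratio, ratio

def area_preserving_scale(ratio):
    """Scale width and height by sqrt(ratio) so that area grows by ratio."""
    side = math.sqrt(ratio)
    return side, side

ratio = 4.0  # one value is four times the other

naive_w, naive_h = naive_scale(ratio)
fair_w, fair_h = area_preserving_scale(ratio)

naive_area = naive_w * naive_h  # 16 times the reference area
fair_area = fair_w * fair_h     # 4 times the reference area, matching the data

print(f"naive area factor: {naive_area:.0f}, proportional area factor: {fair_area:.0f}")
```

A bar chart avoids the issue because only one dimension of each bar changes.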
An Example Violating the Area Principle
The Same Example Respecting the Area Principle
4M Example 3.1: ROLLING OVER
Motivation
Are certain types of vehicles more prone to
roll-over accidents than others?
Method
Data gathered from the Fatality Analysis Reporting
System (FARS) for roll-over accidents on
interstate highways. The cases that make up the
rows are accidents resulting in roll-overs in 2000.
The column of interest is the model of the car
involved.
Mechanics
Message
Ford Broncos were involved in more than twice as many roll-over accidents as the next-closest
model
4M Example 3.2: CHIP SALES
Motivation
Infineon pled guilty to price fixing for DRAMs
in September 2004. Did Infineon gain a
larger share of the market for chips during this period?
Method
Mechanics
3.4 Mode and Median
Mode
Category with the highest frequency
The longest bar in a bar chart
The widest slice in a pie chart
Two or more categories can tie for the highest
frequency (bimodal or multimodal)
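A short sketch of finding the mode, including ties, might look like the following. The sample data are hypothetical, and returning a sorted list of every tied category is a design choice made here for illustration.

```python
from collections import Counter

def modes(values):
    """Return every category that attains the highest frequency, sorted by label."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(category for category, n in counts.items() if n == top)

# Hypothetical samples: one with a single mode, one with a two-way tie.
single = ["sedan", "suv", "suv", "truck"]
tied = ["sedan", "suv", "suv", "sedan", "truck"]

print(modes(single))  # one mode
print(modes(tied))    # two modes (bimodal)
```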
Median
Not appropriate for nominal data
Data must be ordinal
It is the category label of the middle observation
in ordered data
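The median of an ordinal variable can be sketched as below: sort the observations by the category order, then read off the label in the middle. The rating scale and data are hypothetical. With an even number of observations there are two middle positions, and this sketch assumes the lower one is reported.

```python
def ordinal_median(values, order):
    """Return the label of the middle observation after sorting by `order`.

    `order` lists the categories from lowest to highest. For an even number
    of observations, the lower of the two middle observations is returned
    (an assumption of this sketch).
    """
    rank = {label: i for i, label in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)
    return ordered[(len(ordered) - 1) // 2]

# Hypothetical ordinal scale and survey responses.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "fair", "good", "excellent", "fair", "good"]

print(ordinal_median(ratings, scale))
```

The same function cannot be applied to nominal data because there is no meaningful `order` to pass in.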
Best Practices (Continued)
Respect the area principle
Show the best plots to answer the motivating
question
Label your chart to show the categories and
indicate whether some have been combined or
omitted
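As one way to follow the labelling advice above, the sketch below keeps the `k` most frequent categories and combines the rest into a single group whose label states how many categories were merged. The data and the label wording are hypothetical.

```python
from collections import Counter

def top_k_with_other(values, k, other_label="Other"):
    """Keep the k most frequent categories and combine the rest, with a clear label."""
    counts = Counter(values).most_common()
    top = counts[:k]
    rest = counts[k:]
    if rest:
        combined = sum(n for _, n in rest)
        top.append((f"{other_label} ({len(rest)} categories combined)", combined))
    return top

# Hypothetical referring hosts.
hosts = ["msn.com"] * 4 + ["yahoo.com"] * 3 + ["aol.com"] * 2 + ["ebay.com"]

for label, count in top_k_with_other(hosts, k=2):
    print(f"{label}: {count}")
```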
Avoid elaborate plots that may be deceptive
Do not show too many categories
Do not put ordinal data in a pie chart
Do not carelessly round data
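One way rounding can go wrong, shown here as a hedged illustration with hypothetical counts, is that percentages rounded to whole numbers may no longer add up to 100.

```python
# Hypothetical counts for three equally common categories.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each percentage to the nearest whole number.
rounded = {category: round(100 * n / total) for category, n in counts.items()}
rounded_sum = sum(rounded.values())

print(rounded)      # each category shows 33
print(rounded_sum)  # the displayed percentages add to 99, not 100
```

Keeping one more digit, or noting that the figures are rounded, avoids a chart whose labels appear not to add up.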