Chapter 2
Graphical descriptive methods
Introduction and Re-cap…
Descriptive statistics
involves arranging, summarising, and presenting a
set of data in such a way that useful information is
produced.
Populations and Samples
The graphical and tabular methods presented here
apply to both entire populations and samples drawn
from populations.
A variable is some characteristic of a population
or sample
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z…
The values of a variable are the possible
observations of the variable.
E.g. student marks (0…100)
Data are the observed values of a variable.
2.1 Types of Data
Data (at least for purposes of Statistics) fall into three main groups:
• Numerical (interval or quantitative) data
• Nominal (categorical or qualitative) data
• Ordinal (ranked) data
Numerical Data…
Numerical data
• Real numbers, e.g. heights, weights, prices,
waiting time at a medical practice, etc.
• Also referred to as quantitative or interval.
• Arithmetic operations can be performed on
numerical data, thus it is meaningful to talk
about 2*Height, or Price + $1, and so on.
Nominal Data
• The values of nominal data are categories.
E.g. responses to questions about marital status are
categories, coded as: Single = 1, Married = 2, …
Ordinal Data
• Ordinal data appear to be categorical in
nature, but their values have an order; a
ranking. E.g. we can say things like
excellent > poor or fair < very good.
That is, order is maintained no matter what
numeric values are assigned to the categories.
Types of data – Examples
Ordinal data
Exam grade: HD, D, C, P, F
Food quality: Excellent, Good, Satisfactory, Poor
With ordinal data, all we can use is computations involving the ordering
process.
Calculations for Types of Data
As mentioned above,
• All calculations are permitted on interval data
• No calculations are allowed for nominal data,
except counting the number of observations in each category and calculating their proportions
• Only calculations involving a ranking process
are allowed for ordinal data.
This lends itself to the following ‘hierarchy of
data’…
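Hierarchy of Data…
Numerical
• Values are real numbers.
• All calculations are valid.
• Data may be treated as ordinal or nominal.
Ordinal
• Values must represent the ranked order of the data.
• Calculations based on an ordering process are valid.
• Data may be treated as nominal but not as numerical.
Nominal
• Values are labels that represent categories.
• Only counting the observations in each category (and calculating proportions) is valid.
• Data may not be treated as ordinal or numerical.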
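The hierarchy can be sketched in Python. This is only an illustration: the sample values are invented, and `GRADE_ORDER` is an assumed ranking of the exam grades (F lowest, HD highest). Each data type is given only the calculations that are valid for it: counting for nominal data, an order-based measure for ordinal data, and ordinary arithmetic for numerical data.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical observations, invented purely for illustration.
marital_status = ["Single", "Married", "Married", "Single", "Married"]  # nominal
exam_grades = ["P", "HD", "C", "D", "C"]                                 # ordinal
heights_cm = [172.0, 165.5, 180.2, 158.9]                                # numerical

# Nominal: only counting (and proportions) is meaningful.
status_counts = Counter(marital_status)

# Ordinal: map each grade to its rank, then use order-based measures only.
GRADE_ORDER = ["F", "P", "C", "D", "HD"]  # assumed ranking, lowest to highest
ranks = [GRADE_ORDER.index(g) for g in exam_grades]
median_grade = GRADE_ORDER[median_low(ranks)]

# Numerical: all arithmetic is valid.
mean_height = mean(heights_cm)

print(status_counts, median_grade, mean_height)
```

Averaging the grade codes would be possible mechanically, but the result would depend on the arbitrary codes chosen, which is why only the ordering is used here.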
Other Forms of Data
• Cross-sectional data is collected at a certain
point in time across a number of units of interest
– marketing survey (observe preferences by
gender, age)
– test score in a statistics course exam
– starting salaries of graduates of an MBA
program in a particular year.
• Time-series data is collected over successive
points in time
– weekly closing price of gold
– monthly tourist arrivals in Australia.
2.2 Graphical and tabular
techniques for nominal data
The only allowable calculation on nominal data is
to count the frequency of each value of the
variable
We can summarise the data in a table that
presents the categories and their counts, called a
frequency distribution.
A relative frequency distribution lists the
categories and the proportion with which each occurs
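A minimal Python sketch of both tables follows. The coded responses are invented for illustration; `Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical coded responses (each code stands for one category).
responses = [1, 3, 3, 2, 6, 4, 4, 4, 5, 1]

# Frequency distribution: category -> count.
frequency = Counter(responses)

# Relative frequency distribution: category -> proportion of all observations.
n = len(responses)
relative_frequency = {cat: count / n for cat, count in sorted(frequency.items())}

for cat, proportion in relative_frequency.items():
    print(f"category {cat}: count={frequency[cat]}, proportion={proportion:.2f}")
```

The proportions always sum to 1, which is a quick check that no observation was dropped.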
Introduction
• The methods presented apply to both
– the entire population, and
– a sample selected from the population
Graphical techniques
for nominal data
• Bar charts and pie charts
are used primarily for nominal data.
• They can be used
when the raw data can be naturally
categorised in a meaningful manner.
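Bar charts
• A bar chart is often used to display the frequencies of nominal
data.
• A bar chart graphically represents the
frequency of each category as a bar rising vertically from the horizontal axis.
• The height of each bar is proportional to the
frequency of the corresponding category.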
Pie charts
• Another useful chart to present nominal data is the pie chart.
• Pie charts
represent the proportions of appearance for nominal data.
• A pie chart is a circle subdivided into
slices whose areas are proportional to the
frequencies (or relative frequencies),
thereby displaying the proportion of
occurrences of each category.
Example 2.1
• To determine the approximate market share of various women’s magazines in New Zealand, a women’s magazine readership survey was
conducted using a sample of 200 readers
• Data was collected and the count of the
occurrences (frequencies) was recorded for each magazine
• The frequencies were presented in a bar chart
• Then the frequencies were converted to
proportions and the results were presented in a
pie chart
1 = Australian Women’s Weekly (NZ Edition); 2 = Next;
3 = NZ New Idea; 4 = NZ Woman’s Day; 5 = NZ Women’s Weekly; and 6 = That’s Life.
Australian Women’s Weekly, NZ Edn (1): frequency 36, relative frequency 18%
Example 2.1 cont. (Excel representation)
The size of each slice in a pie chart is proportional
to the percentage corresponding to the category it
represents.
E.g. a category with 10% of the observations gets a slice of
(10/100)(360°) = 36°
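The same arithmetic as a small Python helper. The function name `slice_angle` is hypothetical and used only for this sketch.

```python
def slice_angle(percentage):
    """Central angle, in degrees, of a pie slice covering `percentage` of the data."""
    return percentage * 360 / 100

angle_10 = slice_angle(10)   # the 10% case worked above
angle_18 = slice_angle(18)   # e.g. a category with an 18% share

print(angle_10, angle_18)
```

Because the percentages of all categories sum to 100, the slice angles sum to 360°.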
– Use bar charts also when the order in which
data are presented is meaningful
Trend in total exports, Australia, 1992–2009
2.3 Graphical Techniques for
Numerical Data
There are several graphical methods that are
used when the data are numerical (i.e.
quantitative, non-categorical).
The most important of these graphical methods
is the histogram.
The histogram is not only a powerful graphical
technique used to summarise interval data, but
it is also used to help explain probabilities.
Example 2.5
• Providing information concerning the
monthly bills of new subscribers in the first month after signing on with a
telephone company
– collect data
– prepare a frequency distribution
– draw a histogram
As part of a larger study, a long-distance company wanted to acquire information about the monthly bills of new subscribers in the first month after signing with the company. The company’s marketing manager conducted a survey of 200 new residential subscribers wherein the first month’s bills were recorded. These data are stored
in file XM02-05. The general manager planned to present his findings to senior executives. What information can be extracted from these data?
In Example 2.1 we created a frequency distribution
of the 6 categories. In this example we also create
a frequency distribution, by counting the number of observations that fall into a series of intervals,
called classes.
The justification for the classes chosen below will
be discussed later.
We have chosen eight classes defined in such a way that each observation falls into one and only one class. These classes are defined as follows:
Classes
Amounts that are less than or equal to 15
Amounts that are more than 15 but less than or equal to 30
Amounts that are more than 30 but less than or equal to 45
Amounts that are more than 45 but less than or equal to 60
Amounts that are more than 60 but less than or equal to 75
Amounts that are more than 75 but less than or equal to 90
Amounts that are more than 90 but less than or equal to 105
Amounts that are more than 105 but less than or equal to 120
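The class definitions above can be sketched as a small Python function. The name `class_index` and the sample bills are hypothetical; the width of 15 and the eight classes come from the list above. Each class includes its upper limit, so an amount of exactly 15 belongs to the first class.

```python
import math
from collections import Counter

CLASS_WIDTH = 15   # width of each class in the list above
NUM_CLASSES = 8

def class_index(amount):
    """0-based class of a bill: class k covers (15k, 15(k+1)];
    class 0 also includes an amount of exactly 0."""
    if amount < 0 or amount > CLASS_WIDTH * NUM_CLASSES:
        raise ValueError("amount outside the range covered by the classes")
    return max(0, math.ceil(amount / CLASS_WIDTH) - 1)

# Hypothetical bills, invented for illustration.
bills = [0.0, 8.5, 15.0, 15.01, 29.99, 30.0, 74.2, 119.63]
class_counts = Counter(class_index(b) for b in bills)

print(sorted(class_counts.items()))
```

Using `ceil` rather than integer division is what puts the boundary values 15, 30, … into the lower of the two adjacent classes, so every observation falls into one and only one class.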
About half (71+37=108)
of the bills are ‘small’,
i.e. $30 or less.
There are only a few telephone bills in the middle range.
(18+28+14=60)÷200 = 30%, i.e. nearly a third of the phone bills are more than $75.
Building a Histogram…
1) Collect the data
2) Create a frequency distribution for the data…
Class width
• It is generally best to use equal class
widths, but sometimes unequal class
widths are called for.
• Unequal class widths are called for when the
frequency associated with some classes is too low. Then,
– several classes are combined together to form
a wider and ‘more populated’ class
– it is possible to form an open-ended class at the higher or lower end of the histogram.
1) Collect the data
2) Create a frequency distribution for the data…
How?
a) Determine the number of classes to use [8]
b) Determine how wide to make each class
(assuming equal class width). How?
Look at the range of the data, that is,
Range = Largest observation – Smallest observation
Range = $119.63 – $0 = $119.63
Then each class width becomes:
Class width = Range ÷ (number of classes) = $119.63 ÷ 8 ≈ $15
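The width calculation can be checked with a few lines of Python. The largest and smallest observations and the choice of eight classes are taken from the slide; rounding the raw width up to a whole dollar is an assumption made here so that the eight classes cover the whole range.

```python
import math

largest, smallest = 119.63, 0.0   # values quoted on the slide
num_classes = 8                   # number of classes chosen in step a)

data_range = largest - smallest
raw_width = data_range / num_classes   # about 14.95
class_width = math.ceil(raw_width)     # assumed: round up to a convenient width

print(data_range, raw_width, class_width)
```

With a width of 15, eight classes span 0 to 120, which is enough to contain the largest bill.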
Shapes of Histograms…
Symmetry
A histogram is said to be symmetric if, when
we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
Skewness
A skewed histogram is one with a long tail
extending to either the right or the left.
Modality
A unimodal histogram is one with a single peak, while
a bimodal histogram is one with two peaks.
A modal class is the class with
the largest number of observations.
Bell Shape
A special type of symmetric unimodal
histogram is one that is bell shaped.
Many statistical techniques require
that the population be bell
shaped.
Drawing the histogram helps
verify the shape of the population.
Compare and contrast the following histograms based
on data from Example 2.7: The marks from the
computer-based statistics course and the manual statistics course have very different histograms…
• Retains information about individual observations that would normally be lost in the creation of a histogram.
• Split each observation into two parts, a stem and a leaf:
• e.g. Observation value: 42.19
• There are several ways to split it up…
• We could split it at the decimal point: stem 42, leaf 19.
• Or split it at the ‘tens’ position (while rounding to the nearest integer in the ‘ones’ position): stem 4, leaf 2.
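Both ways of splitting can be sketched in Python. The function names are hypothetical, and apart from 42.19 the sample values are invented. Note that Python's built-in `round` sends exact halves to the nearest even integer, which may differ from the rounding convention used in a textbook.

```python
from collections import defaultdict

def split_at_decimal(value):
    """Stem = whole-number part, leaf = the two decimal digits."""
    hundredths = round(value * 100)
    return hundredths // 100, hundredths % 100

def split_at_tens(value):
    """Round to the nearest integer, then stem = tens part, leaf = ones digit."""
    return divmod(round(value), 10)

def stem_and_leaf(values):
    """Group the leaves (ones digits) under their stems (tens), in sorted order."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = split_at_tens(v)
        display[stem].append(leaf)
    return dict(display)

sample = [42.19, 38.45, 45.77, 31.2]   # invented apart from 42.19
for stem, leaves in sorted(stem_and_leaf(sample).items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each printed row is one class of the histogram, and the leaves show the rounded value of every observation in that class.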
• Continue this process for all the observations.
Then, use the ‘stems’ for the classes and each
leaf becomes part of the histogram (based on
Example 2.5 data) as follows…
Thus, we still have access to
our original data point’s value!
Stem & Leaf Display…
Histogram and Stem and Leaf
Relative frequency
• It is often preferable to show the
relative frequency (proportion) of
observations falling into each class,
rather than the frequency itself.
Class relative frequency = Class frequency ÷ Total number of observations
• Relative frequencies should be used
when
– the population relative frequencies are
studied
– comparing two or more histograms
– the numbers of observations in the samples studied are different
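Cumulative frequency
• The cumulative frequency of a class is the number of measurements that are less
than the upper limit of that class.
• To obtain the cumulative frequency of a
class, we add the frequency of that class
and the frequencies of all previous classes.
• The cumulative relative frequency of a
particular class is the proportion of
measurements that are less than the upper limit of that class.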
• An ogive (pronounced ‘Oh-jive’) is a graph of
a cumulative frequency distribution.
• We create an ogive in three steps…
• First, from the frequency distribution created
earlier, calculate relative frequencies:
Class relative frequency = Class frequency ÷ Total number of observations
• An ogive is a graph of a cumulative frequency
distribution
• We create an ogive in three steps…
1) Calculate relative frequencies
2) Calculate cumulative relative
frequencies by adding the current
class’ relative frequency to the previous
class’ cumulative relative frequency
(For the first class, its cumulative relative
frequency is just its relative frequency.)
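Steps 1 and 2 can be sketched in Python. The frequencies 71, 37, 18, 28 and 14 are quoted earlier for Example 2.5; the three middle values (13, 9, 10) and the assignment of the counts to classes are assumptions made here so that the total is 200. The running totals are accumulated as counts and then divided, which gives the same result as adding relative frequencies while avoiding accumulated rounding error.

```python
from itertools import accumulate

# Class frequencies for the eight classes of Example 2.5.
# ASSUMED for illustration: the middle values 13, 9, 10 and the class order.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]

total = sum(frequencies)

# Step 1: relative frequency of each class.
relative = [f / total for f in frequencies]

# Step 2: cumulative relative frequency up to each class's upper limit.
cumulative_relative = [c / total for c in accumulate(frequencies)]

for limit, rel, cum in zip(upper_limits, relative, cumulative_relative):
    print(f"<= {limit:>3}: relative={rel:.3f}, cumulative={cum:.3f}")
```

The last cumulative relative frequency is always 1, and plotting `cumulative_relative` against `upper_limits` gives the points of the ogive.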
TABLE 2.15 Cumulative relative frequencies for Example 2.5
• An ogive is a graph of a cumulative frequency
distribution
• 1) Calculate relative frequencies
• 2) Calculate cumulative relative
frequencies
• 3) Graph the cumulative relative
frequencies…
Example 2.5 Ogive
Ogive…
The ogive can be used to
answer questions like:
What telephone bill value
is at the 50th percentile?
Around $35.
(Refer also to Fig 2.21 in your textbook.)
Example 2.5 Ogive
2.4 Describing Time Series Data
• Observations measured at the same point in
time across individual units are called
cross-sectional data.
• Observations measured at successive points in
time on a single unit are called time-series
data
• Time-series data are graphed on a line chart,
which plots the value of the variable on the vertical axis against the time periods on the horizontal axis
• Time series data graphed on a line chart is…
Time Series Data
We recorded the value of Australian exports from
1992 to 2009 (Figure 2.22). Draw a line chart to describe these data and briefly describe the
results.
Line Chart
• Plot the frequency of a category above the point on the horizontal axis
representing that category.
• Use line charts when the categories are points in time.
• Line charts are particularly useful when the trend over time is to be
emphasised.
Figure 2.22 Line chart showing change in Australian exports over time
• Australian exports have had a slow but steady increase from 1992 to 2004
• After 2004, Australian exports have
been increasing steadily at a much
higher rate.
Describing the Relationship between Two Nominal Variables
A cross-classification table (or
cross-tabulation table) is used to describe the
relationship between two nominal variables.
A cross-classification table lists the
frequency of each combination of the
values of the two variables…
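A minimal Python sketch of building such a table follows. The paired observations and the labels "a", "b", "x", "y" are invented for illustration.

```python
from collections import Counter

# Hypothetical paired observations of two nominal variables.
pairs = [("a", "x"), ("a", "y"), ("b", "x"),
         ("a", "x"), ("b", "y"), ("b", "y")]

# Count how often each combination of values occurs.
cell_counts = Counter(pairs)

rows = sorted({r for r, _ in pairs})
cols = sorted({c for _, c in pairs})

# Cross-classification table: one row per value of the first variable,
# one column per value of the second. Missing combinations count as 0.
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, [table[r][c] for c in cols])
```

The cell counts add up to the number of observations, since each observation falls into exactly one combination.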
Example 2.8
• There are four
competing newspapers: N1, N2, N3 and N4
• The
advertising managers of the newspapers
need to know which segments of the
newspaper market are reading their papers
• A survey was conducted to analyse the
relationship between newspapers read and occupation
A sample of newspaper readers was
asked to report which newspaper they
read: N1, N2, N3, N4, and to indicate
whether they were a blue-collar worker (1), a white-collar worker (2), or a professional
(3).
The responses are stored in file XM02-08.
By counting the number of times each of
the 12 combinations occurs, we produced Table 2.16.
• If occupation and newspaper are related,
then there will be differences in the
newspapers read among the occupations
• An easy way to see this is to convert the
frequencies in each row to relative
frequencies in each row
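The row conversion can be sketched in Python. The counts below are invented and are not the contents of Table 2.16; rows use the occupation codes 1 to 3 and columns follow the order N1 to N4.

```python
# Hypothetical counts: one row per occupation code, columns N1..N4.
newspapers = ["N1", "N2", "N3", "N4"]
counts = {
    1: [10, 40, 15, 35],   # blue collar
    2: [30, 20, 30, 20],   # white collar
    3: [35, 15, 30, 20],   # professional
}

def row_relative_frequencies(row):
    """Divide each count in a row by that row's total."""
    total = sum(row)
    return [count / total for count in row]

row_props = {occ: row_relative_frequencies(row) for occ, row in counts.items()}

for occ, props in row_props.items():
    print(occ, [f"{p:.2f}" for p in props])
```

Dividing by the row total removes the effect of different group sizes, so the rows can be compared directly: similar rows suggest similar newspaper choices.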
• Interpretation: The relative frequencies in
rows 2 and 3 are similar, but there are large differences between rows 1 and 2, and between rows 1 and 3.
• Row 1: Blue collar (1); Row 2: White collar (2);
Row 3: Professional (3)
• This tells us that blue collar workers tend to read different newspapers from both white collar
workers and professionals, and that white collar
workers and professionals are quite similar in their
newspaper choice.
… newspaper N2 more than twice
as often as newspaper N3.
Describing the Relationship
between Two Numerical Variables
• Often we are interested in the relationships
between two numerical variables
Example 2.9
• A small-business owner wants to assess the effects of advertising on sales levels
• Paired observation data were collected
• Each pair consisted of monthly advertising
expenditure and monthly sales levels
• A scatter diagram can describe the
relationship between advertising
expenditure and sales.
Patterns of Scatter Diagrams…
Linearity and direction are two concepts we are interested in
Weak or non-linear relationship
• That is, they believed that when the price of oil increased the price of petrol also increased, but…
Chapter-Opening Example
WERE OIL COMPANIES GOUGING CUSTOMERS 1999-2006?: SOLUTION
• To determine whether this perception is
accurate we determined the monthly figures for both commodities. CH02:\Oil
• Graphically depict these data and describe the findings
Interpreting the results:
• The scatter diagram reveals that the two
prices are strongly related linearly
• As the oil price increases, petrol price also
increases. When the price of oil was below
A$85, the relationship between the two
variables was stronger than when the price of oil exceeded A$85.