
Chap 2: Graphical descriptive methods


Document information

Title: Graphical Descriptive Methods
Institution: Standard University
Subject: Statistics
Type: Article
City: Standard City
Pages: 77
File size: 1.77 MB


Page 2

Chapter 2

Graphical descriptive methods

Page 3

Introduction and Re-cap…

Descriptive statistics

involves arranging, summarising, and presenting a

set of data in such a way that useful information is

Page 4

Populations and Samples

The graphical and tabular methods presented here

apply to both entire populations and samples drawn

Page 5

A variable is some characteristic of a population

or sample

E.g. student grades.

Typically denoted with a capital letter: X, Y, Z…

The values of the variable are the range of possible values for a variable.

E.g. student marks (0…100)

Data are the observed values of a variable.

Page 6

2.1 Types of Data

Data (at least for purposes of Statistics) fall into three main groups:

• Numerical (interval or quantitative) data

• Nominal (categorical or qualitative) data

• Ordinal (ranked) data

Page 7

Numerical Data…

Numerical data

• Real numbers, e.g. heights, weights, prices, waiting time at a medical practice, etc.

Also referred to as quantitative or interval.

• Arithmetic operations can be performed on numerical data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.

Page 8

Nominal Data

The values of nominal data are categories.

E.g. responses to questions about marital status are categories, coded as: Single = 1, Married = 2, …

Page 9

Ordinal Data

Ordinal data appear to be categorical in nature, but their values have an order, e.g. excellent > poor or fair < very good. That is, order is maintained no matter what …

Page 10

Types of data – Examples

Ordinal data

Exam grade: HD, D, C, P, F

Food quality: Excellent, Good, Satisfactory, Poor

With ordinal data, the only computations we can use are those involving the ordering process.

Page 11

Calculations for Types of Data

As mentioned above:

• All calculations are permitted on interval data.

• No calculations are allowed for nominal data, except counting the number of observations in each category and calculating their proportions.

• Only calculations involving a ranking process are allowed for ordinal data.

This lends itself to the following ‘hierarchy of data’…
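As a rough illustration of the rules above, the sketch below applies one permitted calculation to each data type. All sample values and variable names are invented for illustration, and the rank numbers assigned to the ordinal categories are just one arbitrary order-preserving coding.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, invented for illustration only.
heights_cm = [160.0, 172.5, 181.0, 168.5]                                # numerical
food_quality = ["Poor", "Good", "Excellent", "Good", "Satisfactory"]     # ordinal
marital_status = ["Single", "Married", "Married", "Single", "Married"]   # nominal

# Numerical data: all arithmetic is meaningful, e.g. the mean.
mean_height = mean(heights_cm)

# Ordinal data: only order-based calculations, e.g. the middle-ranked value.
rank = {"Poor": 1, "Satisfactory": 2, "Good": 3, "Excellent": 4}
label = {r: name for name, r in rank.items()}
median_quality = label[median(rank[q] for q in food_quality)]

# Nominal data: only counts and proportions per category.
counts = Counter(marital_status)
proportions = {cat: n / len(marital_status) for cat, n in counts.items()}

print(mean_height, median_quality, dict(counts), proportions)
```

The median lookup works here because the sample has an odd number of observations; with an even number the middle two ranks could differ and a different order-based summary would be needed.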

Page 12

Hierarchy of Data…

Numerical

• Values are real numbers.

• All calculations are valid.

• Data may be treated as ordinal or nominal.

Ordinal

• Values must represent the ranked order of the data.

• Calculations based on an ordering process are valid.

• Data may be treated as nominal but not as numerical.

Page 13

Other Forms of Data

Cross-sectional data is collected at a certain point in time across a number of units of interest

– marketing survey (observe preferences by gender, age)

– test score in a statistics course exam

– starting salaries of graduates of an MBA program in a particular year.

Time-series data is collected over successive points in time

– weekly closing price of gold

– monthly tourist arrivals in Australia.

Page 14

2.2 Graphical and tabular

techniques for nominal data

The only allowable calculation on nominal data is to count the frequency of each value of the variable.

We can summarise the data in a table, called a frequency distribution, that presents the categories and their counts.

A relative frequency distribution lists the categories and the proportion with which each occurs.
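A minimal sketch of how a frequency distribution and a relative frequency distribution could be tabulated. The coded responses are hypothetical; the coding 1 = Single, 2 = Married follows the marital status example earlier in these slides.

```python
from collections import Counter

# Hypothetical coded responses (1 = Single, 2 = Married), for illustration only.
responses = [1, 2, 2, 1, 2, 2, 1, 2]

# Frequency distribution: category -> count.
frequency = Counter(responses)

# Relative frequency distribution: category -> proportion of all observations.
total = sum(frequency.values())
relative_frequency = {cat: n / total for cat, n in frequency.items()}

for cat in sorted(frequency):
    print(cat, frequency[cat], relative_frequency[cat])
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.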

Page 15

Introduction

• The methods presented apply to both

– the entire population, and

– a sample selected from the population

Page 16

Graphical techniques

for nominal data

are used primarily for nominal data.

when the raw data can be naturally

categorised in a meaningful manner.

Page 17

Bar charts

• A bar chart displays the frequency of each category as a bar rising vertically from the horizontal axis.

• The height of each bar is proportional to the frequency of the corresponding category.

Page 18

Pie charts

• Another useful chart to present nominal data is the pie chart.

• Pie charts represent the proportions of appearance for nominal data.

• A pie chart is a circle subdivided into slices whose areas are proportional to the frequencies (or relative frequencies), thereby displaying the proportion of occurrences of each category.

Page 19

Example 2.1

• To determine the approximate market share of various women’s magazines in New Zealand, a women’s magazine readership survey was

conducted using a sample of 200 readers

• Data was collected and the count of the

occurrences (frequencies) was recorded for each magazine

• The frequencies were presented in a bar chart

• Then the frequencies were converted to

proportions and the results were presented in a

pie chart

Page 20

Example 2.1

1 = Australian Women’s Weekly (NZ Edition); 2 = Next;

3 = NZ New Idea; 4 = NZ Woman’s Day; 5 = NZ Women’s Weekly; and 6 = That’s Life.

Australian Women’s Weekly, NZ Edn (1): frequency 36, relative frequency 18%

Page 21

Example 2.1 cont. (Excel representation)

Page 22

The size of each slice in a pie chart is proportional to the percentage corresponding to the category it represents.

For example, a category that accounts for 10% of the observations gets a slice of (10/100)(360°) = 36°.
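The slice calculation can be sketched as a one-line function. The function name `slice_angle` is hypothetical, and 18 is used only as a second illustrative percentage.

```python
def slice_angle(percentage: float) -> float:
    """Return the central angle, in degrees, of a pie slice for a given percentage."""
    # Multiply before dividing so that whole-number percentages stay exact.
    return percentage * 360 / 100

angle_10 = slice_angle(10)   # the 10% case worked above
angle_18 = slice_angle(18)   # a second, illustrative percentage

print(angle_10, angle_18)
```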

Page 23

– Use bar charts also when the order in which

data are presented is meaningful

Trend in total exports, Australia, 1992–2009

Page 24

2.3 Graphical Techniques for

Numerical Data

There are several graphical methods that are used when the data are numerical (i.e. quantitative, non-categorical).

The most important of these graphical methods

is the histogram.

The histogram is not only a powerful graphical

technique used to summarise interval data, but

it is also used to help explain probabilities.

Page 25

Example 2.5

• Providing information concerning the

monthly bills of new subscribers in the first month after signing on with a

telephone company

– collect data

– prepare a frequency distribution

– draw a histogram

Page 26

As part of a larger study, a long-distance company wanted to acquire information about the monthly bills of new subscribers in the first month after signing with the company. The company’s marketing manager conducted a survey of 200 new residential subscribers wherein the first month’s bills were recorded. These data are stored in file XM02-05. The general manager planned to present his findings to senior executives. What information can be extracted from these data?

Example 2.5 …

Page 27

In Example 2.1 we created a frequency distribution of the 6 categories. In this example we also create a frequency distribution by counting the number of observations that fall into a series of intervals, called classes.

The justification for the classes chosen below will be discussed later.

Example 2.5 …

Page 28

We have chosen eight classes defined in such a way that each observation falls into one and only one class. These classes are defined as follows:

Classes

Amounts that are less than or equal to 15

Amounts that are more than 15 but less than or equal to 30

Amounts that are more than 30 but less than or equal to 45

Amounts that are more than 45 but less than or equal to 60

Amounts that are more than 60 but less than or equal to 75

Amounts that are more than 75 but less than or equal to 90

Amounts that are more than 90 but less than or equal to 105

Amounts that are more than 105 but less than or equal to 120

Example 2.5 …
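The class definitions above (each class includes its upper limit but not its lower limit) can be sketched as a small binning routine. The function names and the sample bills are hypothetical and are not the XM02-05 data.

```python
import math

CLASS_WIDTH = 15   # width of each class, as in the list of classes
NUM_CLASSES = 8

def class_index(amount: float) -> int:
    """Return 0 for amounts <= 15, 1 for (15, 30], ..., 7 for (105, 120]."""
    if not 0 <= amount <= CLASS_WIDTH * NUM_CLASSES:
        raise ValueError("amount outside the range covered by the classes")
    # ceil() puts an amount equal to an upper limit into the lower class;
    # max() keeps an amount of exactly 0 in the first class.
    return max(0, math.ceil(amount / CLASS_WIDTH) - 1)

def frequency_distribution(amounts):
    """Count how many amounts fall into each of the eight classes."""
    counts = [0] * NUM_CLASSES
    for amount in amounts:
        counts[class_index(amount)] += 1
    return counts

# Hypothetical bills, for illustration only.
bills = [0.0, 15.0, 15.01, 30.0, 42.19, 119.63]
counts = frequency_distribution(bills)
print(counts)
```

Because every amount maps to exactly one index, each observation falls into one and only one class, which is the property the class definitions are meant to guarantee.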

Page 30

About half (71+37=108) of the bills are ‘small’, i.e. less than $30.

There are only a few telephone bills in the middle range.

(18+28+14=60)÷200 = 30%, i.e. nearly a third of the phone bills are in the three highest classes (more than $75).

Page 31

Building a Histogram…

1) Collect the data

2) Create a frequency distribution for the data…

Page 32

Class width

widths, but sometimes unequal class

widths are called for.

frequency associated with some classes is too low. Then,

– several classes are combined together to form

a wider and ‘more populated’ class

– it is possible to form an open-ended class at the higher or lower end of the histogram.

Building a Histogram…

Page 33

1) Collect the data

2) Create a frequency distribution for the data…

How?

a) Determine the number of classes to use [8]

b) Determine how wide to make each class

(assuming equal class width) How?

Look at the range of the data, that is,

Range = Largest observation – Smallest observation

Range = $119.63 – $0 = $119.63

Then each class width becomes:

Range ÷ (number of classes) = 119.63 ÷ 8 ≈ 15
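A short sketch of the class-width arithmetic, using the range values quoted above and the eight classes chosen in step a). Rounding the raw width up to a whole number is an assumption made here for convenience; the slides do not state the rounding rule.

```python
import math

largest, smallest = 119.63, 0.0   # observations quoted in the range calculation
num_classes = 8                   # number of classes chosen in step a)

data_range = largest - smallest
raw_width = data_range / num_classes    # roughly 14.95
class_width = math.ceil(raw_width)      # assumed rule: round up to a whole number

print(data_range, raw_width, class_width)
```

Rounding up rather than down ensures that the eight classes together cover the whole range of the data.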

Building a Histogram…

Page 34

Building a Histogram…

Page 35

Building a Histogram…

Page 36

Symmetry

A histogram is said to be symmetric if, when

we draw a vertical line down the center of the histogram, the two sides are identical in shape and size:

Page 37

A skewed histogram is one with a long tail

extending to either the right or the left:

Page 38

Modality

A unimodal histogram is one with a single peak, while

a bimodal histogram is one with two peaks:

A modal class is the class with

the largest number of observations

Shapes of Histograms…

Page 39

Bell Shape

A special type of symmetric unimodal

histogram is one that is bell shaped:

Many statistical techniques require

that the population be bell

shaped.

Drawing the histogram helps

Shapes of Histograms…

Page 40

Compare and contrast the following histograms based

on data from Example 2.7: The marks from the

computer-based statistics course and the manual statistics course have very different histograms…

Page 41

• Retains information about individual observations that would normally be lost in the creation of a histogram.

Split each observation into two parts, a stem and a leaf:

e.g. Observation value: 42.19

• There are several ways to split it up…

• We could split it at the decimal point:

• Or split it at the ‘tens’ position (while rounding to the nearest integer in the ‘ones’ position).
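The two ways of splitting the observation 42.19 can be sketched as follows. `Decimal` is used only to avoid floating-point noise in the decimal part, and rounding halves upward is an assumption, since the slides say only "rounding to the nearest integer".

```python
from decimal import Decimal, ROUND_HALF_UP

observation = Decimal("42.19")

# Option 1: split at the decimal point -> stem 42, leaf 19.
stem_a = int(observation)
leaf_a = int((observation - stem_a) * 100)

# Option 2: round to the nearest integer, then split at the tens position
# -> 42.19 rounds to 42, giving stem 4 and leaf 2.
rounded = int(observation.to_integral_value(rounding=ROUND_HALF_UP))
stem_b, leaf_b = divmod(rounded, 10)

print(stem_a, leaf_a, stem_b, leaf_b)
```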

Page 42

• Continue this process for all the observations

Then, use the ‘stems’ for the classes and each

leaf becomes part of the histogram (based on

Example 2.5 data) as follows…

Thus, we still have access to

our original data point’s value!

Stem & Leaf Display…

Page 43

Histogram and Stem and Leaf

Page 44

Relative frequency

• It is often preferable to show the

relative frequency (proportion) of

observations falling into each class,

rather than the frequency itself.

Class relative frequency = Class frequency ÷ Total number of observations

Page 45

• Relative frequencies should be used

when

– the population relative frequencies are

studied

– comparing two or more histograms

– the numbers of observations in the samples studied are different

Page 46

• The cumulative frequency of a class is the number of measurements that are less than the upper limit of that class.

• To obtain the cumulative frequency of a class, we add the frequency of that class and the frequencies of all previous classes.

• The cumulative relative frequency of a particular class is the proportion of measurements that are less than the upper limit of that class.

Page 47

• An ogive (pronounced ‘Oh-jive’) is a graph of a cumulative frequency distribution.

• We create an ogive in three steps…

• First, from the frequency distribution created earlier, calculate relative frequencies:

Class relative frequency = Class frequency ÷ Total number of observations

Page 49

An ogive is a graph of a cumulative frequency distribution.

• We create an ogive in three steps…

1) Calculate relative frequencies

2) Calculate cumulative relative frequencies by adding the current class’s relative frequency to the previous class’s cumulative relative frequency

(For the first class, its cumulative relative frequency is just its relative frequency.)
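Steps 1 and 2 can be sketched in a few lines. The class frequencies below are hypothetical and are not the Example 2.5 counts.

```python
from itertools import accumulate

# Hypothetical class frequencies, for illustration only.
frequencies = [10, 20, 40, 20, 10]
total = sum(frequencies)

# Step 1: relative frequency of each class.
relative = [f / total for f in frequencies]

# Step 2: cumulative relative frequencies. Counts are accumulated first and
# divided afterwards, which avoids floating-point drift from repeated addition.
cumulative_relative = [c / total for c in accumulate(frequencies)]

print(relative)
print(cumulative_relative)
```

The first cumulative value equals the first relative frequency, and the last value is always 1.0.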

Page 50

TABLE 2.15 Cumulative relative frequencies for Example 2.5

Page 51

An ogive is a graph of a cumulative frequency distribution.

• 1) Calculate relative frequencies

• 2) Calculate cumulative relative frequencies

• 3) Graph the cumulative relative frequencies…

Example 2.5 Ogive

Page 52

Ogive…

The ogive can be used to answer questions like:

What telephone bill value is at the 50th percentile? Around $35.

(Refer also to Fig. 2.21 in your textbook.)

Example 2.5 Ogive
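Reading a percentile off an ogive amounts to interpolating along the straight line segments that join its points. The sketch below assumes linear interpolation within a class; the function name and the data are hypothetical and are not the Example 2.5 values, so the result is not the figure quoted above.

```python
def percentile_from_ogive(upper_limits, cumulative_relative, p, lower_bound=0.0):
    """Interpolate the value below which a proportion p of the data falls."""
    prev_limit, prev_cum = lower_bound, 0.0
    for limit, cum in zip(upper_limits, cumulative_relative):
        # Use the first class whose cumulative proportion reaches p.
        if p <= cum and cum > prev_cum:
            fraction = (p - prev_cum) / (cum - prev_cum)
            return prev_limit + fraction * (limit - prev_limit)
        prev_limit, prev_cum = limit, cum
    raise ValueError("p must lie between 0 and 1")

# Hypothetical class upper limits and cumulative relative frequencies.
upper_limits = [15, 30, 45, 60, 75]
cumulative_relative = [0.1, 0.3, 0.7, 0.9, 1.0]

median_estimate = percentile_from_ogive(upper_limits, cumulative_relative, 0.5)
print(median_estimate)
```

With these made-up values the 50th percentile falls halfway through the third class, i.e. at about 37.5.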

Page 53

2.4 Describing Time Series Data

• Observations measured at the same point in

time across individual units are called

cross-sectional data.

• Observations measured at successive points in

time on a single unit are called time-series

data

Time-series data are graphed on a line chart,

which plots the value of the variable on the vertical axis against the time periods on the horizontal axis

• Time series data graphed on a line chart is

Page 54

Time Series Data

We recorded the value of Australian exports from 1992 to 2009 (Figure 2.22). Draw a line chart to describe these data and briefly describe the results.

Page 55

Line Chart

• Plot the frequency of a category above the point on the horizontal axis

representing that category.

• Use line charts when the categories are points in time.

• Line charts are particularly useful when the trend over time is to be

emphasised.

Page 56

Line Chart

Figure 2.22 Line chart showing change in Australian exports over time

Page 57

Line Chart

• Australian exports have had a slow but steady increase from 1992 to 2004

• After 2004, Australian exports have

been increasing steadily at a much

higher rate.

Page 58

techniques.

Page 59

Describing the Relationship between Two Nominal Variables

A cross-classification table (or cross-tabulation table) is used to describe the relationship between two nominal variables.

A cross-classification table lists the frequency of each combination of the values of the two variables…

Page 60

Example 2.8

competing newspapers: N1, N2, N3 and N4

advertising managers of the newspapers

need to know which segments of the

newspaper market are reading their papers

relationship between newspapers read and occupation

Page 61

A sample of newspaper readers was asked to report which newspaper they read (N1, N2, N3 or N4) and to indicate whether they were a blue-collar worker (1), a white-collar worker (2), or a professional (3).

The responses are stored in file XM02-08.

Page 62

Example 2.8

By counting the number of times each of the 12 combinations occurs, we produced Table 2.16.
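Counting the combinations can be sketched as below. The responses are hypothetical and are not the XM02-08 data; the occupation codes follow the example (1 = blue collar, 2 = white collar, 3 = professional).

```python
from collections import Counter

# Hypothetical (occupation, newspaper) responses, for illustration only.
responses = [
    (1, "N1"), (1, "N1"), (1, "N4"),
    (2, "N2"), (2, "N3"),
    (3, "N2"), (3, "N2"),
]

# Count how often each (occupation, newspaper) combination occurs.
pair_counts = Counter(responses)

# Lay the counts out as a table: one row per occupation, one column per newspaper.
newspapers = ["N1", "N2", "N3", "N4"]
table = {
    occupation: [pair_counts[(occupation, paper)] for paper in newspapers]
    for occupation in (1, 2, 3)
}

for occupation, row in table.items():
    print(occupation, row)
```

Combinations that never occur show up as zeros, because a `Counter` returns 0 for missing keys.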

Page 63

Example 2.8

• If occupation and newspaper are related, then there will be differences in the newspapers read among the occupations.

• To see this, convert the frequencies in each row to relative frequencies in each row.
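Converting row frequencies to row relative frequencies can be sketched as follows. The counts are hypothetical and are not the Table 2.16 values; the function name is also an illustration.

```python
# Hypothetical cross-classification counts (columns = N1, N2, N3, N4).
table = {
    "blue collar": [20, 10, 5, 15],
    "white collar": [5, 20, 10, 15],
}

def row_relative_frequencies(counts_by_row):
    """Divide every count by its row total so that each row sums to 1."""
    result = {}
    for row, counts in counts_by_row.items():
        total = sum(counts)
        result[row] = [count / total for count in counts]
    return result

row_rel = row_relative_frequencies(table)
for row, proportions in row_rel.items():
    print(row, proportions)
```

Working row by row removes the effect of different group sizes, so the newspaper preferences of the occupations can be compared directly.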

Page 65

Example 2.8

• Interpretation: The relative frequencies in the rows

2 and 3 are similar, but there are large differences between rows 1 and 2, and between rows 1 and 3

• Row 1: Blue collar (1); Row 2: White collar (2);

Row 3: Professional (3)

• This tells us that blue-collar workers tend to read different newspapers from both white-collar workers and professionals, and that white-collar workers and professionals are quite similar in their newspaper choice.

Page 66

newspaper N2 more than twice

as often as newspaper N3.

Page 67

Describing the Relationship

between Two Numerical Variables

• Often we are interested in the relationships

between two numerical variables

Page 68

Example 2.9

• A small-business owner wants to assess the effects of advertising on sales levels

• Paired observation data were collected

• Each pair consisted of monthly advertising

expenditure and monthly sales levels

Page 69

• A scatter diagram can describe the

relationship between advertising

expenditure and sales.

Page 70

Patterns of Scatter Diagrams…

Linearity and direction are two concepts we are interested in

Weak or non-linear relationship

Page 71

• That is, they believed that when the price of oil increased the price of petrol also increased, but

Page 72

Chapter-Opening Example

WERE OIL COMPANIES GOUGING CUSTOMERS 1999-2006?: SOLUTION

• To determine whether this perception is accurate we determined the monthly figures for both commodities CH02:\Oil

• Graphically depict these data and describe the findings

Page 73

Chapter-Opening Example

Page 74

Chapter-Opening Example

Interpreting the results:

• The scatter diagram reveals that the two

prices are strongly related linearly

• As the oil price increases, petrol price also increases. When the price of oil was below A$85, the relationship between the two variables was stronger than when the price of oil exceeded A$85.

Date posted: 05/06/2014, 08:34
