Quantitative Methods for Business chapter 4 doc


4 Filling up – fuelling quantitative analysis

Chapter objectives

This chapter will help you to:

■ understand key statistical terms

■ distinguish between primary and secondary data

■ recognize different types of data

■ arrange data using basic tabulation and frequency distributions

■ use the technology: arrange data in EXCEL, MINITAB and SPSS

In previous chapters we have concentrated on techniques or models involving single values that are known with certainty. Examples of these are break-even analysis and linear programming, which we looked at in Chapter 2, and the Economic Order Quantity model featured in Chapter 3. In break-even analysis the revenue per unit, the fixed cost and the variable cost per unit are in each case a specified single value. In linear programming we assume that both profit per unit and resource usage are constant amounts. In the Economic Order Quantity model the order cost and the stock-holding cost per unit are each known single values. Because these types of models involve values that are fixed or predetermined they are called deterministic models.

Deterministic models can be useful means of understanding and resolving business problems. Their reliance on known single-value inputs makes them relatively easy to use but is also their key shortcoming. Companies simply cannot rely on a figure such as the amount of raw material used per unit of production being a single constant value. In practice, such an amount may not be known with certainty, because it is subject to chance variation. Because of this, company managers may well need to study the variation and incorporate it within the models they use to guide them.

Models that use input values that are uncertain rather than certain, values that are subject to chance variation rather than known, are called probabilistic models, after the field of probability, which involves the measurement and analysis of chance. We shall be dealing with probability in later chapters.

Before you can use probability to reflect the chance variation in business situations you need to know how to get some idea of the variation. To do this we have to start by ascertaining where relevant information might be found. Having identified these sources you need to know how to arrange and present what you find from them in forms that will help you understand and communicate the variation. In order to do this in appropriate ways it is important that you are aware of the different types of data that you may meet.

The purpose of this chapter is therefore to acquaint you with some essential preliminaries for studying variation. We will start with definitions of some key terms, before looking into sources of data and considering the different types of data. Subsequently we shall look at basic methods of arranging data.

4.1 Some key words you need to know

There are several important terms that you will find mentioned frequently in this and subsequent chapters. They are:

Data The word data is a plural noun (the singular form is datum), which means a set of known or given things, facts. Data can be numerical (e.g. wages of employees) or non-numerical (e.g. job titles of employees).

Variable A variable is a quantity that varies, the opposite of a constant. For example, the number of telephone calls made to a call centre per hour is a variable, whereas the number of minutes in an hour is a constant. Often a capital letter, usually X or Y, is used to represent a variable.

Value A value is a specific amount that a variable could be. For example, the number of telephone calls made to a call centre per hour could be 47 or 71. These are both possible values of the variable ‘number of calls made’.


Observation or Observed value This is a value of a variable that has actually occurred, i.e. been counted or measured. For example, if 58 telephone calls are made to a call centre in a particular hour that is an observation or observed value of the variable ‘number of calls made’.

An observation is represented by the lower case of the letter used to represent the variable; for instance ‘x’ represents a single observed value of the variable ‘X’. A small numerical suffix is added to distinguish particular observations in a set; x1 would represent the first observed value, x2 the second and so on.

Data set A data set consists of all the observations of all the variables collected in the course of a study or investigation, together with the variable names.

Random This describes something that occurs in an unplanned way, by chance.

Random variable A random variable has observed values that arise by chance. The number of new cars a car dealer sells during a month is a random variable, whereas the number of days in a month is a variable that is not random because its observed values are predetermined.

Distribution The pattern exhibited by the observed values of a variable when they are arranged in order of magnitude. A theoretical distribution is one that has been deduced, rather than compiled from observed values.

Population Generally this means the total number of persons residing in a defined area at a given time. In quantitative methods a population is the complete set of things or elements we want to investigate. These may be human, such as all the people who have purchased a particular product, or inanimate, such as all the cars repaired at a garage.

Sample A sample is a subset of a population, that is, a smaller number of items picked from the population. A random sample is a sample that has components chosen in a random way, on the basis that any single item in the population has no more or less chance than any other to be included in the sample.
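The idea of a random sample can be illustrated with a short sketch in Python. The population of 500 account numbers, the sample size of 20 and the function name are all hypothetical, chosen only for illustration; the standard-library function random.sample picks items without replacement so that each item in the population has the same chance of being included.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct items, each with the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: account numbers 1 to 500
population = list(range(1, 501))

# Draw a random sample of 20 elements from the population
sample = simple_random_sample(population, 20, seed=1)
print(sorted(sample))
```

Fixing the seed is only a convenience for checking work; in a real investigation the draw would normally be left unseeded.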

A typical quantitative investigation of a business problem might involve defining the population and specifying the variables to be studied. Following this a sample of elements from the population is selected and observations of the variables for each element in the sample recorded. Once the data set has been assembled work can begin on arranging and presenting the data so that the patterns of variation in the distributions of values can be examined.

At this point you may find it useful to try Review Question 4.1 at the end of the chapter.


4.2 Sources of data

The data that form the basis of an investigation might be collected at first hand in response to a specific problem. This type of data, collected by direct observation or measurement, is known as primary data. The procedures used to gather primary data are surveys, experiments and observational methods. A survey might involve asking consumers their opinion of a product. A series of experiments might be conducted on products to assess their quality. Observation might be used to ascertain the hazards at building sites.

The advantages of using primary data are that they should match the requirements of those conducting the investigation and they are up-to-date. The disadvantages are that gathering such data is both costly and time-consuming.

An alternative might be to find data that have already been collected by someone else. This is known as secondary data. A company looking for data for a specific study will have access to internal sources of secondary data, but as well as those there are a large number of external sources: government statistical publications, company reports, academic and industry publications, and specialist information services such as the Economist Intelligence Unit. The advantages of using secondary data are that they are usually easier and cheaper to obtain. The disadvantages are that they could be out of date and may not be entirely suitable for the purposes of the investigation.

4.3 Types of data

Collecting data is usually not an end in itself. When collected the data will be in ‘raw’ form, a state that might lead someone to refer to it as ‘meaningless data’. Once it is collected the next stage is to begin transforming it into information, literally to enable it to inform us about the issue being investigated.

There is a wide range of techniques that you can use to organize, display and represent data. Selecting which ones to use depends on the type of data you have. The nature of the raw material you are working with determines your choice of tools. Scissors are fine for cutting paper but no good for cutting wood. A saw will cut wood but is useless for cutting paper.

It is therefore essential that you understand the nature of the data you want to analyse before embarking on the analysis, so in this section we will look at several ways of distinguishing between different types of data.


There are different types of data because there are different ways in which facts are gathered. Some data may exist because specific things have characteristics that have been categorized whereas other data may exist as a result of things being counted, or measured, on some sort of scale.

Perhaps the most important way of contrasting data types is on the basis of the scales of measurement used in obtaining them. The acronym NOIR stands for Nominal, Ordinal, Interval, Ratio: the four basic data types. Nominal is the ‘lowest’ form of data, which contains the least information. Ratio is the ‘highest’ form of data, which contains the most information.

The word nominal comes from the same Latin root as the word name. Nominal data are data that consist solely of names or labels. These labels might be numeric such as a bank account number, or they might be non-numeric such as gender. Nominal data can be categorized using the labels themselves to establish, for instance, the number of males and females. It is possible to represent and analyse nominal data using proportions and modes (the modal category is the one that contains the most observations), but carrying out more sophisticated analysis such as calculating an average is inappropriate; for example, adding a set of telephone numbers together and dividing by the number there are to get an average would be meaningless.

Like nominal data, ordinal or ‘order’ data consist of labels that can be used to categorize the data, but order data can also be ranked. Examples of ordinal data are academic grades and finishing positions in a horse race. An academic grade is a label (an ‘A’ grade student) that also belongs to a ranking system (‘A’ is better than ‘B’). Because ordinal data contain more information than nominal data we can use a wider variety of techniques to represent and analyse them. As well as proportions and modes we can also use order statistics, such as identifying the middle or median observation. However, any method involving arithmetic is not suitable for ordinal data because although the data can be ranked the intervals between the ranks are not consistent. For instance, the difference between the horse finishing first in a race and the one finishing second is one place. The difference between the horse finishing third and the one finishing fourth is also one place, but this does not mean that there is the same distance between the third- and fourth-placed horses as there is between the first- and second-placed horses.

Example 4.1

Holders of a certain type of investment account are described as ‘wealthy’. To investigate this we could use socio-economic definitions of class to categorize each account holder, or we could count the number of homes owned by each account holder, or we could measure the income of each account holder.

Interval data consist of labels and can be ranked, but in addition the intervals are measured in fixed units so the differences between values have meaning. It follows from this that unlike nominal and ordinal data, both of which can be either numeric or non-numeric, interval data are always numeric. Because interval data are based on a consistent numerical scale, techniques using arithmetical procedures can be applied to them. Temperatures measured in degrees Fahrenheit are interval data. The difference between 30° and 40° is the same as the difference between 80° and 90°.

What distinguishes interval data from the highest data form, ratio data, is that interval data are measured on a scale that does not have a meaningful zero point to ‘anchor’ it. The zero point is arbitrary; for instance 0° Fahrenheit does not mean a complete lack of heat, nor is it the same as 0° Celsius. The lack of a meaningful zero also means that ratios between the data are not consistent; for example 40° is not half as hot as 80°. (The Celsius equivalents of these temperatures are 4.4° and 26.7°, the same heat levels yet they have a completely different ratio between them.)

Ratio-type data have all the characteristics of interval data: they consist of labels that can be ranked as well as being measured in fixed amounts on a numerical scale. The difference is that the scale has a meaningful zero and ratios between observations are consistent. Distances are ratio data whether we measure them in miles or kilometres. Zero kilometres and zero miles mean the same: no distance. Ten miles is twice as far as five, and their kilometre equivalents, 16 and 8, have the same ratio between them.
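The contrast between interval and ratio scales can be checked with a little arithmetic. The sketch below, in Python, uses the temperatures and distances quoted in the text; the function name is an illustrative choice, and the conversion factor of 1.609344 kilometres per mile is the standard one (the text rounds the results to 16 and 8).

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9


KM_PER_MILE = 1.609344

# Interval scale: the ratio between two temperatures depends on the unit
f_low, f_high = 40.0, 80.0
c_low, c_high = fahrenheit_to_celsius(f_low), fahrenheit_to_celsius(f_high)
ratio_fahrenheit = f_high / f_low
ratio_celsius = c_high / c_low

# Ratio scale: the ratio between two distances is the same in any unit
miles_short, miles_long = 5.0, 10.0
km_short, km_long = miles_short * KM_PER_MILE, miles_long * KM_PER_MILE
ratio_miles = miles_long / miles_short
ratio_km = km_long / km_short

print(f"{f_low:.0f}F = {c_low:.1f}C, {f_high:.0f}F = {c_high:.1f}C")
print(f"temperature ratios: {ratio_fahrenheit:.1f} (F) vs {ratio_celsius:.1f} (C)")
print(f"distance ratios: {ratio_miles:.1f} (miles) vs {ratio_km:.1f} (km)")
```

The temperature ratio changes when the unit changes because each scale has its own arbitrary zero, whereas the distance ratio stays at two because zero means ‘no distance’ in both units.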

Example 4.2

Identify the data types of the variables in Example 4.1.

The socio-economic classes of account holders are ordinal data because they are labels for the account holders and they can be ranked.

The numbers of homes owned by account holders and the incomes of account holders are both ratio data. Four homes are twice as many as two, and £60,000 is twice as much income as £30,000.

At this point you may find it useful to try Review Question 4.2 at the end of the chapter.

Another important distinction you need to make is between qualitative data and quantitative data. Qualitative data consist of categories or types of a characteristic or attribute and are always either nominal or ordinal. The categories form the basis of the analysis of qualitative data. Quantitative data are based on counting ‘how many’ or measuring ‘how much’ and are always of interval or ratio type. The numerical scale used to produce the figures forms the basis of the analysis of quantitative data.

There are two different types of quantitative data: discrete and continuous. Discrete data are quantitative data that can take only a limited number of values because they are produced by counting in distinct or ‘discrete’ steps, or measuring against a scale made up of distinct steps. There are three types of discrete data that you may come across.

First, data that can only take certain values because other values simply cannot occur, for example the number of hats sold by a clothing retailer in a day. There could be 12 sold one day and 7 on another, but selling 9.3 hats in a day is not possible because there is no such thing as 0.3 of a hat. Such data are discrete by definition.

Secondly, data that take only certain values because those are the ones that have been established by long-standing custom and practice, for example bars in the UK sell draught beer in whole and half pints. You could try asking for three-quarters of a pint, but the bar staff would no doubt insist that you purchase the smaller or larger quantity. They simply would not have the equipment or pricing information to hand to do otherwise.

There are also data that only take certain values because the people who have provided the data or the analysis have decided, for convenience, to round values that do not have to be discrete. This is what you are doing when you give your age to the last full year. Similarly, the temperatures given in weather reports are rounded to the nearest degree, and the distances on road signs are usually rounded to the nearest mile. These data are discrete by convention rather than by definition. They are really continuous data.

Discrete data often but not always consist of whole number values. The number of visitors to a website will always be a whole number, but shoe sizes include half sizes. In other cases, like the UK standard sizes of women’s clothing, only some whole numbers occur.

The important thing to remember about discrete data is that there are gaps between the values that can occur; that is why this type of data is sometimes referred to as discontinuous data. In contrast, continuous data consist of numerical values that are not restricted to specific numbers. Such data are called continuous because there are no gaps between feasible values. This is because measuring on a continuous scale such as distance or temperature yields continuous data.

The precision of continuous data is limited only by how precisely the quantities are measured. For instance, we measure both the length of bus journeys and athletic performances using the scale of time. In the first case a clock or a wristwatch is sufficiently accurate, but in the second case we would use a stopwatch or an even more sophisticated timing device.

The terms discrete variable and continuous variable are used in describing data sets. A discrete variable has discrete values whereas a continuous variable has continuous values.

At this point you may find it useful to try Review Questions 4.3 and 4.4 at the end of the chapter.

In most of your early work on analysing variation you will probably be using data that consist of observed values of a single variable. However, you may need to analyse data that consist of observed values of two variables in order to find out if there is a connection between them. For instance, we might want to ascertain how cab fares are related to journey times.

In dealing with a single variable we apply univariate analysis, whereas in dealing with two variables we apply bivariate analysis. The prefixes uni- and bi- in these words convey the same meanings as they do in other words like unilateral and bilateral. You may also find reference to multivariate analysis, which involves exploring relationships between more than two variables.

Example 4.3

A motoring magazine describes cars using the following variables:

Type of vehicle – Hatchback/Estate/MPV/Off-Road/Performance

Number of passengers that can be carried

Fuel type – petrol/diesel

Fuel efficiency in miles per gallon

Which variables are qualitative and which quantitative?

The type of vehicle and fuel type are qualitative; the number of passengers and the fuel efficiency are quantitative.

Which quantitative variables are discrete and which continuous?

The number of passengers is discrete; the fuel efficiency is continuous.


You may come across data referred to as either hard or soft. Hard data are facts, measurements or characteristics arising from situations that actually exist or were in existence. Temperatures recorded at a weather station and the nationalities of tourists are examples of hard data. Soft data are about beliefs, attitudes and behaviours. Asking consumers what they know about a product or how they feel about an advertisement will yield soft data. The implication of this distinction is that hard data can be subjected to a wider range of quantitative analysis. Soft data are at best ordinal and therefore offer less scope for quantitative analysis.

A further distinction you need to know is between cross-section and time series data. Cross-section data are data collected at the same point in time or based on the same period of time. Time series data consist of observations collected at regular intervals over time. The volumes of wine produced in European countries in 2002 are cross-section data whereas the volumes of wine produced in Italy in the years 1992 to 2002 are time series data.

At this point you may find it useful to try Review Question 4.5 at the end of the chapter.

4.4 Arrangement of data

Arranging or classifying data in some sort of systematic manner is the vital first stage you should take in transforming the data into information, and hence getting it to ‘talk to you’. The way you approach this depends on the type of data you wish to analyse.

4.4.1 Arranging qualitative data

Dealing with qualitative data is quite straightforward as long as the number of categories of the characteristic being studied is relatively small. Even if there are a large number of categories, the task can be made easier by merging categories.

The most basic way you can present a set of qualitative data is to tabulate it, to arrange it in the form of a summary table. A summary table consists of two parts, a list of categories of the characteristic, and the number of things that fall into each category, known as the frequency of the category. Compiling such a table is simply a matter of counting how many elements in the study fall into each category.
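Counting how many elements fall into each category can be sketched in a few lines of Python. The list of outlets below is hypothetical: only the figure of 12 shoe shops among 39 outlets comes from the discussion of Table 4.1, while the other categories and their counts are invented for illustration. The function name is likewise an assumption.

```python
from collections import Counter


def summary_table(observations):
    """Return (category, frequency, percentage of total) rows, largest first."""
    counts = Counter(observations)
    total = sum(counts.values())
    return [
        (category, frequency, 100 * frequency / total)
        for category, frequency in counts.most_common()
    ]


# Hypothetical raw data: one label per outlet (39 outlets in all)
outlets = (
    ["Shoe shop"] * 12
    + ["Sports shop"] * 11
    + ["Department store"] * 7
    + ["Other"] * 9
)

rows = summary_table(outlets)
for category, frequency, percentage in rows:
    print(f"{category:<18}{frequency:>4}{percentage:>8.1f}%")
```

The third column is the frequency of each category expressed as a percentage of the total, which is often easier to communicate than the raw count.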


In Table 4.1 the outlet types are qualitative data. The ‘Other’ category, which might contain several different types of outlet, such as hypermarkets and market stalls, has been created in order to keep the summary table to manageable proportions.

Notice that for each category, the number of outlets as a percentage of the total, the relative frequency of the category, is listed on the right hand side. This is to make it easier to communicate the contents; saying 30.8% of the outlets are shoe shops is more effective than saying 12/39ths of them were shoe shops, although they are different ways of saying the same thing.

You may want to use a summary table to present more than one attribute. Such a two-way tabulation is also known as a contingency table because it enables us to look for connections between the attributes, in other words to find out whether one attribute is contingent upon another.

Example 4.4

Suppose we want to find how many different types of retail outlet in an area sell trainers. We could tour the area or consult the telephone directory in order to compile a list of outlets, but the list itself may be too crude a form in which to present our results. By listing the types of outlet and the number of each type of outlet we find, we can construct a summary table:

Example 4.5

Four large retailers each operate their own loyalty scheme. Customers can apply for loyalty cards and receive points when they present them whilst making purchases. These points are accumulated and can subsequently be used to obtain gifts or discounts.


At this point you may find it useful to try Review Questions 4.6 to 4.8 at the end of the chapter.

4.4.2 Arranging quantitative data

The nature of quantitative data is different to qualitative data and therefore the methods used to arrange quantitative data are rather different. However, the most appropriate way of arranging some quantitative data is the same as the approach we have used to arrange qualitative data.

This applies to the analysis of a discrete quantitative variable that has very few feasible values. You simply treat the values as you would the categories of a characteristic and tabulate the data to show how often each value occurs. When quantitative data are tabulated, the resulting table is called a frequency distribution because it demonstrates how frequently each value in the distribution occurs.
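A frequency distribution for a discrete variable with few feasible values can be compiled in the same way as a summary table. The Python sketch below uses hypothetical observations (invented counts for 20 customers, not data taken from any example in this chapter), and the function name is an illustrative assumption.

```python
from collections import Counter


def frequency_distribution(values):
    """Map each observed value to its frequency, in order of magnitude."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


# Hypothetical observations of a discrete variable for 20 customers
observations = [0, 1, 1, 0, 2, 1, 0, 3, 1, 0, 0, 2, 1, 0, 1, 2, 0, 1, 0, 1]

distribution = frequency_distribution(observations)
print("Value  Frequency")
for value, frequency in distribution.items():
    print(f"{value:>5}  {frequency:>9}")
```

Sorting the values puts the table in order of magnitude, which is what makes the pattern of the distribution visible.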

A survey of usage levels of loyalty cards provided the information in the following table:

Table 4.2

Number of transactions by loyalty card use

                   Transactions
Retailer     With card   Without card   Total
Aptyeka            236            705     941
Botinky            294            439     733
Crassivy           145            759     904
Total              675           1903    2578
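Table 4.2 is a two-way tabulation, so its margins and row percentages can be recomputed directly from the cell counts. The sketch below, in Python, uses the figures from Table 4.2; the function names and the nested-dictionary layout are illustrative choices rather than anything prescribed by the text.

```python
# Cell counts from Table 4.2: transactions by retailer and card use
transactions = {
    "Aptyeka": {"With card": 236, "Without card": 705},
    "Botinky": {"With card": 294, "Without card": 439},
    "Crassivy": {"With card": 145, "Without card": 759},
}


def margins(table):
    """Return row totals, column totals and the grand total."""
    columns = list(next(iter(table.values())))
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {
        column: sum(cells[column] for cells in table.values())
        for column in columns
    }
    return row_totals, column_totals, sum(row_totals.values())


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return {
        row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
        for row, cells in table.items()
    }


row_totals, column_totals, grand_total = margins(transactions)
percentages = row_percentages(transactions)

for retailer, cells in percentages.items():
    print(f"{retailer:<10}{cells['With card']:>6.1f}% with card")
print("Column totals:", column_totals, "Grand total:", grand_total)
```

The row percentages show the share of card transactions at each retailer (roughly 25%, 40% and 16%), which is the kind of connection between attributes that a two-way table is meant to reveal.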

Example 4.6

The UREA department store offers free refills when customers purchase hot beverages in its café. The numbers of refills taken by 20 customers were:


At this point you may find it useful to try Review Questions 4.9 to 4.11 at the end of the chapter.

We can present the data in Example 4.6 in the form of a simple table because there are only a very limited number of values. Unfortunately this is not always the case, even with discrete quantitative data.

For instance, if Example 4.6 included customers who spent all day in the café and drank 20 or so cups of coffee each then the number of refills might go from none to 30. This would result in a table with far too many rows to be of use.

To get around this problem we can group the data into fewer categories or classes by compiling a grouped frequency distribution. This shows the frequency of observations in each class.

These figures can be tabulated as follows:

Table 4.3

Number of hot beverage refills taken


In order to compile a grouped frequency distribution you will need to exercise a little judgement because there are many sets of classes that could be used for a specific set of data. To help you, there are three rules:

1 Don’t use classes that overlap.

2 Don’t leave gaps between classes.

3 The first class must begin low enough to include the lowest observation and the last class must finish high enough to include the highest observation.

In Example 4.7 it would be wrong to use the classes 0–20, 20–40, 40–60 and so on because a value on the very edge of the classes like 20 could be put into either one, or even both, of two classes. Although there are numerical gaps between the classes that have been used in Example 4.7, they are not real gaps because no feasible value could fall into them. The first class finishes on 19 and the second begins on 20, but since the number of messages received is a discrete variable a value like 19.6, which would fall into the gap, simply will not occur. Since there are no observed values lower than zero or higher than 99, the third rule is satisfied.

We could sum up these rules by saying that anyone looking at a grouped frequency distribution should be in no doubt where each feasible value belongs. Every piece of data must have one and only one place for it to be. To avoid any ambiguity whatsoever, you may like to use the phrase ‘and under’ between the beginning and end of each class. The classes in Example 4.7 could be rewritten as:

0 and under 20

20 and under 40 … and so on
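The ‘and under’ style of class can be sketched in Python as follows. The observations are hypothetical whole-number counts between 0 and 99, invented for illustration (they are not the data of Example 4.7), and the function name and class width of 20 are assumptions. Integer division puts each value into exactly one class, so a value on a class edge such as 20 can only fall into ‘20 and under 40’.

```python
def grouped_frequency(values, width, start=0):
    """Count whole-number values in classes 'start and under start + width', etc."""
    if not values:
        raise ValueError("no observations to group")
    if min(values) < start:
        raise ValueError("first class must begin at or below the lowest observation")
    # Enough classes to include the highest observation (rule 3)
    number_of_classes = (max(values) - start) // width + 1
    counts = [0] * number_of_classes
    for value in values:
        counts[(value - start) // width] += 1  # one and only one class per value
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical observations of a discrete variable
messages = [5, 19, 20, 23, 37, 40, 41, 58, 60, 62, 75, 99, 12, 20, 39]

classes = grouped_frequency(messages, width=20)
for lower, upper, frequency in classes:
    print(f"{lower:>2} and under {upper:<3} {frequency}")
```

Because every class has the same width and the classes run on without a break, the three rules above are satisfied by construction.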

It is especially important to apply these rules when you are dealing with continuous quantitative data. Unless you decide to use ‘and under’ or a similar style of words, it is vital that the beginning and end of each class is specified to at least the same degree of precision as the data.

Example 4.8

The results of measuring the contents (in millilitres) of a sample of 30 bottles of ‘Nogat’ nail polish labelled as containing 10 ml were:

10.30 10.05 10.06  9.82 10.09  9.85  9.98  9.97 10.28 10.01
 9.92 10.03 10.17  9.95 10.23  9.92 10.05 10.11 10.02 10.06
10.21 10.04 10.12  9.99 10.19  9.89 10.05 10.11 10.00  9.92
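One possible way of grouping these measurements, offered as a sketch rather than as the arrangement used in the original example, is to use ‘and under’ classes that are 0.10 ml wide. The class width and the function name are assumptions. The Python code below converts each reading to whole hundredths of a millilitre before grouping, so that values on a class edge such as 10.00 are not misplaced by floating-point rounding.

```python
contents_ml = [
    10.30, 10.05, 10.06, 9.82, 10.09, 9.85, 9.98, 9.97, 10.28, 10.01,
    9.92, 10.03, 10.17, 9.95, 10.23, 9.92, 10.05, 10.11, 10.02, 10.06,
    10.21, 10.04, 10.12, 9.99, 10.19, 9.89, 10.05, 10.11, 10.00, 9.92,
]


def grouped_frequency_ml(values, width_hundredths=10):
    """Group readings given to 0.01 ml into 'and under' classes."""
    hundredths = [round(value * 100) for value in values]  # e.g. 9.82 -> 982
    # Begin the first class on a multiple of the width, at or below the minimum
    start = (min(hundredths) // width_hundredths) * width_hundredths
    number_of_classes = (max(hundredths) - start) // width_hundredths + 1
    counts = [0] * number_of_classes
    for reading in hundredths:
        counts[(reading - start) // width_hundredths] += 1
    return [
        (
            (start + i * width_hundredths) / 100,
            (start + (i + 1) * width_hundredths) / 100,
            count,
        )
        for i, count in enumerate(counts)
    ]


bottle_classes = grouped_frequency_ml(contents_ml)
for lower, upper, frequency in bottle_classes:
    print(f"{lower:.2f} and under {upper:.2f}  {frequency}")
```

The class limits are printed to two decimal places, the same degree of precision as the data.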
