Chapter 7 Analyzing quantitative data
Chapter 7 Quantitative Research Methods
• Raw quantitative data, which have not been processed or analyzed, convey very little meaning to most people
• For these da[.]
7.1 Preparing, checking and inputting data
Types of data:
• Quantitative data can be divided into two groups: categorical data and numerical data.
• Categorical data are those whose values cannot be measured numerically but can either be classified into sets (categories) according to the characteristics that describe or identify the variable, or be placed in rank order.
• There are two types of categorical data:
• Descriptive/nominal data – these data simply count the number of occurrences in each category of a variable. When a variable is divided into only two categories (female/male, for example), the data are known as dichotomous data.
• Ranked/ordinal data – these are a more precise form of categorical data, because the relative position (rank) of each case is known.
• Alternatively, numerical data are those whose values are measured or counted numerically as quantities (Berman 2008).
• Numerical data are therefore more precise than categorical data, because each data value can be assigned a position on a numerical scale.
• Numerical data can be subdivided in two ways: into interval and ratio data, or into continuous and discrete data.
• With interval data one can state the difference (interval) between any two data values of a variable, whereas with ratio data one can also calculate the relative difference (ratio) between any two data values.
• Continuous data can take any value within a range (provided they are measured accurately enough), while discrete data can be measured precisely because each case takes one of a limited number of values, often whole numbers (integers).
• After determining the types of data to be collected, the researcher can start to enter the data into data processing software (for example SPSS or Excel).
• To do this, the data need to be coded using numerical codes. This enables the researcher to enter the data quickly and with fewer errors.
• Once the data have been entered, they should be checked for errors.
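The coding and checking steps above can be sketched in a few lines of Python. This is only an illustration: the codebook `GENDER_CODES`, the missing-value code `-99` and the helper names are assumptions made for the example, not part of any particular software package.

```python
# Hypothetical codebook: each category of a variable gets a numerical code.
GENDER_CODES = {"female": 1, "male": 2}
MISSING = -99  # assumed code for missing or unrecognised answers


def code_responses(responses, codebook):
    """Translate raw text answers into numerical codes."""
    coded = []
    for answer in responses:
        key = answer.strip().lower()  # tolerate stray spaces and capitals
        coded.append(codebook.get(key, MISSING))
    return coded


def find_illegitimate_codes(coded, codebook):
    """Return the positions whose code is not in the codebook."""
    legitimate = set(codebook.values())
    return [i for i, code in enumerate(coded) if code not in legitimate]


raw = ["Female", "male", " male ", "femle", "female"]
coded = code_responses(raw, GENDER_CODES)
errors = find_illegitimate_codes(coded, GENDER_CODES)

print(coded)   # [1, 2, 2, -99, 1]
print(errors)  # [3]  (the misspelt answer needs to be checked by hand)
```

Checking for illegitimate codes is only one kind of error check; it does not catch an answer that was entered with a valid but wrong code.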
7.2 Exploring and presenting the data
• Tukey’s (1977) exploratory data analysis (EDA) is a useful approach for starting the analysis of quantitative data. This approach focuses on the use of diagrams to explore and understand the data. It may also reveal other relationships in the data that the research was not designed to test.
• When looking at the collected data, it is best to explore specific values, highest and lowest values, trends over time, proportions and distributions.
• Once these have been explored, one can start to compare them and look for (causal) relationships between variables.
• Exploring variables
• Shapes of diagrams
• Comparing variables
• Exploring variables:
• The easiest way of summarizing the data is by using tables. However, tables give no visual emphasis to the highest or lowest values, so diagrams may be a better option for summarizing the data.
• One way to present data in a diagram is a bar chart, where the height or length of each bar represents the frequency of occurrence.
• Histograms are similar to bar charts, but here the area of each bar represents the frequency of occurrence, and the continuous nature of the data is emphasized by the absence of gaps between bars.
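The difference between the two diagrams lies in how the frequencies are counted. The sketch below, using invented example data and hypothetical helper names, counts occurrences per category (as for a bar chart) and per contiguous interval (as for a histogram). With equal-width intervals, as assumed here, the area of each bar is proportional to its height.

```python
from collections import Counter


def category_frequencies(values):
    """Frequency of each category, as plotted in a bar chart."""
    return Counter(values)


def histogram_bins(values, start, width, n_bins):
    """Count values into n_bins contiguous intervals [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the range are ignored
            counts[index] += 1
    return counts


transport = ["car", "bus", "car", "bike", "car", "bus"]  # categorical data
ages = [21, 25, 34, 38, 39, 42, 47, 55]                  # continuous data

freq = category_frequencies(transport)
bins = histogram_bins(ages, start=20, width=10, n_bins=4)

# Simple text rendering: one '#' per occurrence.
for category, count in freq.most_common():
    print(f"{category:<6}{'#' * count}")
for i, count in enumerate(bins):
    lower = 20 + 10 * i
    print(f"{lower}-{lower + 10} {'#' * count}")
```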
• Comparing variables
• Contingency tables (cross tabulations) are one approach to examining the interdependence between variables. Other approaches are:
• Multiple bar chart – used to compare highest and lowest values.
• Percentage component bar chart – used to compare proportions between variables.
• Multiple line graph – used to compare trends and conjunctions (the points where the lines cross).
• Stacked bar chart – used to compare totals between variables.
• Comparative proportional pie chart – used to compare the proportions of each category or value as well as the totals between variables.
• Scatter graph or scatter plot – often used to explore possible relationships between ranked and numerical data variables by plotting one variable against another.
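A contingency table can be built by counting how often each combination of values occurs. The following is a minimal sketch with invented data; `cross_tabulate` is a hypothetical helper written for this example.

```python
from collections import Counter


def cross_tabulate(rows, cols):
    """Build a two-way contingency table as {row_value: {col_value: count}}."""
    counts = Counter(zip(rows, cols))  # count each (row, column) pair
    row_values = sorted(set(rows))
    col_values = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_values} for r in row_values}


gender = ["female", "male", "female", "male", "female", "male"]
owns_car = ["yes", "yes", "no", "no", "yes", "yes"]

table = cross_tabulate(gender, owns_car)
for row_value, cells in table.items():
    print(row_value, cells)
```

Each cell holds the number of cases that share one value of the first variable and one value of the second, which is the input a chi square test works from.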
7.3 Describing data with use of statistics
• Tukey’s exploratory data analysis is a good approach for understanding the data using diagrams.
• Descriptive statistics, on the other hand, enable one to describe the variables numerically. Statistics that describe a variable focus on two aspects: the central tendency and the dispersion.
• Central tendency gives a general impression of the values that could be seen as common, middling or average. It is measured by:
• The mode – the value that occurs most often
• The median – the middle value or mid-point after the data have been ranked
• The mean – also known as the average
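The three measures can be computed with Python's standard `statistics` module. The values below are invented for illustration; note how the single extreme value pulls the mean well above the median.

```python
import statistics

# Invented example: seven values, one of them extreme.
salaries = [20, 22, 22, 25, 30, 31, 95]

mode = statistics.mode(salaries)      # value that occurs most often
median = statistics.median(salaries)  # middle value of the ranked data
mean = statistics.mean(salaries)      # sum of values divided by their number

print(mode, median, mean)  # 22 25 35
```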
• The dispersion (how the data are spread around the central tendency) can be described by:
• Inter-quartile range – the difference between the upper and lower quartiles, that is, the spread of the middle 50 per cent of values
• Standard deviation – the extent to which values differ from the mean
• Range – the difference between the lowest and the highest values
• Coefficient of variation – used to compare the relative spread of data between distributions of different magnitudes, for example hundreds of tons with billions of tons (calculated by dividing the standard deviation by the mean and multiplying the answer by 100)
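As a rough sketch, the dispersion measures listed above can be computed with the standard `statistics` module. The data are invented, and two choices here are assumptions: the values are treated as a whole population (`pstdev`; `statistics.stdev` would be the choice for a sample), and the quartiles use the module's default method, which may differ slightly from the convention used by other software.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # invented example data

# Range: difference between the highest and the lowest value.
data_range = max(values) - min(values)

# Inter-quartile range: upper quartile minus lower quartile.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Standard deviation, treating the values as a population (assumption).
sd = statistics.pstdev(values)

# Coefficient of variation: standard deviation / mean * 100.
cv = sd / statistics.mean(values) * 100

print(data_range, sd, cv)  # 7 2.0 40.0
```

Because the coefficient of variation is a percentage of the mean, it can be compared between variables measured on very different scales.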
7.4 Explore relationships, differences and trends using statistics
• In research one often wishes to examine the relationships between variables.
• This is done through hypothesis testing, in which one compares the collected data with what one would expect to happen.
• There are two general groups of statistical significance tests: non-parametric tests (used when the data are not normally distributed) and parametric tests (used with numerical data).
• Testing for normal distribution
• Testing for significance
• Type 1 and 2 errors
• Testing for normal distribution:
• A way to test for normality is to use statistics to determine whether the distribution of a variable differs significantly from a comparable normal distribution.
• This can be done with statistical software that offers the Kolmogorov-Smirnov test and the Shapiro-Wilk test. A probability of 0.05 means that a difference this large between the data distribution and a comparable normal distribution would occur by chance alone only 5 per cent of the time.
• Thus, if the probability is lower than 0.05, the distribution differs significantly from a normal distribution and the data should not be treated as normally distributed.
• Testing for significance
• If the test shows a statistically significant relationship between variables, then the researcher rejects the null hypothesis and accepts the alternative hypothesis.
• It is difficult to obtain a significant test statistic with a small sample. As the sample size increases, more of the relationships found will be statistically significant.
• This is because a larger sample more closely resembles the population from which it was selected.
• Type 1 and 2 errors
• A Type 1 error occurs when the null hypothesis has been wrongly rejected and the alternative hypothesis should not have been accepted. In other words, the researcher states that two variables are related when they are actually not. Testing for statistical significance amounts to determining the probability of making a Type 1 error.
• A Type 2 error occurs when a researcher does not reject the null hypothesis when it should have been rejected. The researcher then states that two variables are not related when they actually are.
• Testing for association: the chi square test
• When descriptive or numerical data are summarized as a two-way contingency table, it is helpful to use a chi square test.
• A chi square test makes it possible to assess whether two variables are associated, by calculating the probability that the pattern in the table could have occurred by chance alone.
• In order to do this test two assumptions should be met:
• The categories of the contingency table are mutually exclusive. Each observation falls into one category only.
• Not more than 25 per cent of the cells can have expected values of less than 5. When the table consists of two rows and two columns, no expected value can be less than 10.
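The calculation behind the test can be sketched as follows. The observed counts are invented, the helper names are hypothetical, and the probability is computed only for a table of two rows and two columns (one degree of freedom), without any continuity correction. For larger tables one would normally rely on statistical software.

```python
import math


def expected_counts(observed):
    """Expected cell values if the two variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def assumptions_met(expected):
    """Check the expected-value rules stated above."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return min(cells) >= 10
    return sum(e < 5 for e in cells) / len(cells) <= 0.25


observed = [[30, 20], [20, 30]]  # invented 2 x 2 contingency table
expected = expected_counts(observed)
chi2 = chi_square_statistic(observed, expected)

# For one degree of freedom the probability of a value at least this
# large under independence is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(assumptions_met(expected), chi2, round(p_value, 4))
```

In this invented example the probability is just below 0.05, so the association would be regarded as statistically significant at the 5 per cent level.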
• Exploring the strength of a relationship
• There are two kinds of relationships:
• Correlation: a change in one variable is accompanied by a change in another variable, but it is not clear which variable has caused the other to change
• Cause-and-effect relationship: a change in one or more variables causes a change in another variable
• The correlation coefficient quantifies the strength of a linear relationship between two ranked or numerical variables as a number between +1 and -1
• A value of +1 represents a perfect positive correlation, which means that the two variables are exactly related and when one increases, the other one will increase as well
• A value of -1 represents a perfect negative correlation,