The researcher must be extremely aware of issues such as the level of data used and the characteristics of the population, namely distributional assumptions.
Levels of Data
Data conforms to one of four levels:
Nominal (categorical) - the value is either present or not
Ordinal - the value is ranked relative to others
Interval - the value is scored absolute to others
Ratio - the value is scored absolute to others and to a meaningful zero
An example:
Consider three horses in a race. Coding the race times under a nominal level will tell us whether any particular horse won the race or not (e.g. Guttman’s Folly did not win). Coding under an ordinal level, we can tell where a given horse came relative to the others (e.g. Guttman’s Folly came in second). Coding under an interval level, we know where a given horse came absolute to the others (e.g. Guttman’s Folly was 1.5 seconds faster than Galloping Galton, but 2.3 seconds slower than Cattell’s Chance). Coding under a ratio level, we would know where a given horse came absolute to the others and to a meaningful zero point common to all of them (e.g. Guttman’s Folly came home in 67.5 seconds, Galloping Galton in 69.0 seconds, and Cattell’s Chance in 65.2 seconds).
Sometimes we use dichotomies. A dichotomy is a variable that can take only one of two values, either present (1) or absent (0). Its level is therefore nominal.
Descriptive Statistics
Measures of Central Tendency
There are three measures that give an indication of the ‘average’ value of a data set:
Mode - this is the most common value in the data set (most appropriate for nominal level data)
Mean - this is the arithmetic average, the one most familiar to people (most appropriate for interval and ratio level data)
Median - this is the middle value in the data set (most appropriate for ordinal level data)
As an example, the following are the numbers of children for seven families:
0 0 0 1 1 5 7
The mode (most common value) is 0.
The mean is calculated as 14 (the sum of all seven scores) divided by seven (the number of cases), which equals 2.
The median is 1 (the middle case of the seven cases).
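As a quick check, these three values can be reproduced outside SPSS; a minimal sketch in Python using only the standard library (the variable name children is just for illustration):

    from statistics import mean, median, mode

    children = [0, 0, 0, 1, 1, 5, 7]   # number of children in the seven families
    print(mode(children))    # 0 - the most common value
    print(mean(children))    # 2 - the sum of 14 divided by the 7 cases
    print(median(children))  # 1 - the middle value of the sorted scores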
Measures of Dispersion
Some measures of dispersion include:
Range - this is basically just the difference between the highest and lowest scores, e.g. in the above example of families the range would be 7 minus 0, which is 7.
Standard Deviation - this represents an average deviation from the mean, essentially. In this case, it is 2.6. This measure of dispersion is normally calculated through SPSS.
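The dispersion figures can be checked the same way. Note that the 2.6 above corresponds to the population formula (dividing by n); the sample formula (dividing by n - 1), which statistics packages usually report by default, gives roughly 2.8. A minimal sketch in Python:

    from statistics import pstdev, stdev

    children = [0, 0, 0, 1, 1, 5, 7]
    print(max(children) - min(children))  # 7   - the range
    print(round(pstdev(children), 1))     # 2.6 - population SD (divide by n)
    print(round(stdev(children), 1))      # 2.8 - sample SD (divide by n - 1)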
Normal (and Abnormal) Distributions
A normal distribution is a reflection of a naturally occurring distribution of values, also known as the bell curve, where the mean, median and mode are all equal, e.g. IQ scores. If this is the case, then the researcher is able to make certain assumptions about the population parameters. This assumption enables specific methods of analysis to be used.
A normal distribution: [figure omitted - the symmetrical bell curve, with mean, median and mode coinciding at the centre]
However, normal distributions are something of an ideal. For example, upon examining the earlier data on the number of children, we see that the mean, median and mode are not equal. Therefore we cannot make assumptions about the population parameters; in other words, nonparametric methods of analysis must be used.
Typically, the measures used to represent these kinds of distributions are the median and range, as opposed to the mean and standard deviation. Often, the data you will be using will not allow you to make assumptions about the population parameters, so nonparametric methods must be used (more on this later).
Frequencies
Frequencies represent the number of occurrences of the values of a given variable. If, for example, the ten participants in an experiment were made up of five males and five females, then the frequencies for the values of 1 (male) and 2 (female) would both be 50%. In other words, a frequency score for a given value is the percentage of all the subjects/cases/participants that have that value as a score. There are different forms of frequency counts in SPSS, all of which are detailed in the SPSS section on descriptive statistics.
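Outside SPSS, a simple frequency table like this can be produced with a few lines of Python; a minimal sketch (the coding 1 = male, 2 = female follows the example above):

    from collections import Counter

    sex = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]   # 1 = male, 2 = female
    counts = Counter(sex)
    for value, count in sorted(counts.items()):
        print(value, count, f"{100 * count / len(sex):.0f}%")
    # 1 5 50%
    # 2 5 50%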
Inferential Statistics
Parametric Statistics
Briefly, traditional statistics are used after conducting an experiment testing a research hypothesis. This hypothesis is about a relationship between the independent and dependent variables. Inferential (i.e. hypothesis testing) statistical methods do this by applying the findings of the descriptive statistics discussed earlier. Thus we can infer an aspect of the characteristics of the population from the samples we take of these populations.
Take an easy example. Let us try to determine whether or not our group of MSc students is ‘normal’ with respect to the population of postgraduate students in the UK. We hypothesise that you are not. We obtain scores for normality from the files and observations of your behaviour, and calculate the central tendency and variation of the group. The ‘N’ rating for this group is 40, with a standard deviation of 10. We know that other postgraduate students have an ‘N’ rating of 60 with a standard deviation of 20. In terms of probability, calculations would show that the chance of the MScs coming from the ‘normal’ population is 2%; therefore you are statistically unlikely to come from that population.
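The handout does not say how large the MSc group is, so the exact arithmetic behind the 2% cannot be reproduced here, but the kind of calculation involved looks roughly like the following sketch (the group size of four is purely hypothetical, chosen because it happens to give a figure close to 2%):

    from math import sqrt
    from scipy.stats import norm

    pop_mean, pop_sd = 60, 20    # 'N' rating of postgraduate students in general
    group_mean, n = 40, 4        # MSc group mean; n = 4 is an assumption for illustration

    se = pop_sd / sqrt(n)        # standard error of a group mean from that population
    z = (group_mean - pop_mean) / se
    print(norm.cdf(z))           # about 0.023, i.e. roughly a 2% chance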
However, the fun begins when we:
1. try to set up the experimental designs that allow the independent variable to be manipulated to cause a change in the dependent variable, and/or
2. have to estimate the population parameters from the sample(s) because we don’t know enough about the population.
So, looking at (2) first, estimating population parameters when they are unknown is done using the sample itself. ‘Aha’, you think, ‘surely that’s got to be wrong, because we are testing the sample against a population which is estimated using the sample.’ That’s where (1) comes in - by performing certain experimental manipulations (random selection, large sample size, etc.) we can ensure that the sample provides an unbiased estimate of the population. If we do this, then the error in the sample is minimised, though never eliminated - hence the need for p values. Shall we look at this in more detail?
Ways to allow the experimental design to overcome the difficulties in estimating population parameters include, most importantly, random assignment of subjects and random assignment of treatments. This includes levels of treatments as well.
The use of experimental controls is also important, to ensure that the participants are not biasing the sample in any way, i.e. independent or between-groups designs. Alternatively, and preferably, subjects may act as their own controls, i.e. dependent or within-groups designs. Placebos are an important way to reduce subject error and experimenter bias. ‘Blind’ and ‘double-blind’ experiments are those that take this into consideration. If such things are fully accounted for, then the parameters of the population (i.e. central tendency and variation in scores) are estimated using the sample, and the accuracy of this estimate is given by statistical levels of probability.
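Random assignment itself is easy to carry out in practice; a minimal sketch in Python of assigning subjects to two independent groups (the subject labels and group names are invented for illustration):

    import random

    subjects = [f"S{i:02d}" for i in range(1, 21)]   # twenty hypothetical subjects
    random.shuffle(subjects)                         # randomise the order
    drug_group = subjects[:10]                       # first half receive the drug
    placebo_group = subjects[10:]                    # second half receive the placebo
    print(drug_group)
    print(placebo_group)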
Effect Size and Other Concerns
So, we’ve designed the experiment adequately and we’ve gathered the data, bearing in mind all those things described above. Now we must check that the statistics we perform on the data are capable of rejecting or accepting the hypotheses we proposed. This depends on, among other things, the effect size of the manipulation.
The risk of falsely rejecting the null hypothesis when it is in fact true is traditionally set at 5%, i.e. α = 0.05. Traditional statistics is very conservative, and has a morbid fear of rejecting the null hypothesis when it is in fact true. In more applied settings, the ability of a test to be sensitive enough to reject the null hypothesis when it is in fact not true is also important. The complementary risk of failing to detect a true effect is not usually mentioned, but is implicitly assumed to be 20%, i.e. β = 0.20, which corresponds to a power of 80%. These levels play a strong part in determining the effect size that can be detected, as does whether the hypothesised relationship is tested one- or two-tailed.
Other influences on the detectable effect size include sample size. Often, this is limited by the number of subjects available, though ideally the sample size should be determined by the desired effect size. The other determining influence is the statistical test used. For more details on the theories underlying statistics, consult a statistics book, such as Howell.
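These quantities - α, β (and hence power), effect size and sample size - are tied together, so fixing any three determines the fourth. If you want to do this kind of power calculation outside SPSS, the statsmodels package offers one way; a minimal sketch (the effect size of 0.5 is just an illustrative ‘medium’ value):

    from statsmodels.stats.power import TTestIndPower

    # How many subjects per group are needed to detect a medium effect (d = 0.5)
    # with alpha = 0.05 and power = 0.80 (i.e. beta = 0.20), two-tailed?
    n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                              power=0.80, alternative='two-sided')
    print(n_per_group)   # roughly 64 per group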
Fundamentals of Statistical Testing
All parametric statistical tests have one basic idea in common: each produces a test statistic (t, F, etc.) that is associated with a significance value given the size of the sample. This statistic is a summary of the following ratio:

test statistic = amount of systematic variation / amount of error variation
Systematic variation comes from the (desirable) effect of the manipulation of the independent variable, and error variation comes from the (undesirable) effect of error-ridden noise. Hence the larger the error in sampling, the more powerful the manipulation of the independent variable must be to create a ‘significant’ effect. A sensible way to obtain a ‘good’ test statistic is to reduce the error in the sample (the denominator in the equation), though many psychologists prefer to have HUGE samples and increase the systematic variation (the numerator in the equation).
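The independent-samples t statistic is a concrete instance of this ratio: its numerator is the difference between the group means (systematic variation) and its denominator is the standard error of that difference (error variation). A minimal sketch in Python with invented scores:

    from scipy.stats import ttest_ind

    drug    = [12, 15, 14, 16, 13, 17]   # invented scores for a drug group
    placebo = [10, 11, 9, 12, 10, 11]    # invented scores for a placebo group

    t, p = ttest_ind(drug, placebo)
    print(t, p)   # a large t means systematic variation dominates error variation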
Which Statistical Test?
Parametric inferential tests can be divided based on the design of the experiment, the number of conditions being tested, and the number of levels in the study.
Designs can be of two types - between subjects and within subjects. The former is when you divide subjects into independent groups, such as on the basis of gender, or into one group that receives a drug and a second that receives a placebo. Within subjects designs are when all subjects are subjected to all conditions, e.g. testing reaction times before and after receiving a drug.
The number of conditions is merely how many “tests” you administer for an independent variable. So, in the above example, the between subjects design would have two conditions (drug and placebo); the within subjects design would also have two (before and after the drug). For two conditions, you run a t-test; for three or more, you run an ANOVA. Finally, the design can have multiple levels, e.g. two independent variables such as drug treatment (drug vs placebo) and participant gender, creating four combinations. Different levels can also result in mixed designs; an example could have a between subjects independent variable (gender) and a within subjects IV (the test-retest of reaction times).
Data Level                              Design
                                        (Between Subjects)   (Within Subjects)
Ratio/Interval (2 conditions)           Unrelated T          Related T
Ratio/Interval (3 or more conditions)   Unrelated ANOVA      Related ANOVA
Nonparametric Statistics
Unlike parametric statistics, which (as mentioned before) test hypotheses about specific population parameters, nonparametric statistics test hypotheses about such things as the similarity of distributions or measures of central tendency. It is important to note that the assumptions for these tests are weaker than those for parametric tests, so the tests are not as powerful. On the other hand, there are a lot of analyses where parametric tests are not particularly appropriate, e.g. situations with very unequal sample sizes.
In Investigative Psychology, significant amounts of your data will not be of a nature that lends itself to parametric tests. Data quality and experimental control are not among our strong points, but this is not a weakness in our research as long as we are aware of the limitations and act accordingly. Nonparametric tests are one of the ways in which we try to deal with our problematic data.
Referring to the table below, there are three basic tests listed (we won’t go into the Sign test here; it’ll be in most statistics books) - Chi-square, Mann-Whitney and Wilcoxon. In addition, there are ANOVAs for nonparametric testing of more than two conditions.
Data Level                              Design
                                        (Between Subjects)   (Within Subjects)
Nominal                                 Chi-square           Sign
Ordinal                                 Mann-Whitney         Wilcoxon
Chi-square tests look at associations between variables, while the nonparametric equivalents of t-tests and ANOVAs examine differences in the shape or location of the populations.
Chi-square Tests
Essentially, the Chi-square test uses frequency scores for a variable or variables to determine whether the actual observed frequencies (those that are recorded) are different from those we would expect if there were no differences between the values, in a between-subjects design. The closer the observed frequencies are to the expected ones, the lower the value of the Chi-square. If the two are similar enough, this indicates that no significant difference exists between the values.
Using the Chi-square with Crosstab
Often, the test is used in conjunction with a crosstab, which shows the frequency counts for each combination of values of the two variables. In the table below, there are two variables, both with two values (present/not present). The frequencies of occurrence for each of the four possible combinations of values are listed (e.g. blindfolding and threats not to report co-occurred 10 times).
                                   Threat - No Report
                                   Present      Absent/not recorded
Blindfold   Present                10           5
            Absent/not recorded    5            5
A Chi-square might reveal that there is a significant difference between the cells, and examining the table would suggest that the difference lies in how often these behaviours co-occur versus when they occur alone or when they both don’t occur.
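The test can be run directly on the crosstab above; a minimal sketch in Python using SciPy (with counts this small the example is purely illustrative):

    from scipy.stats import chi2_contingency

    #            threat present   threat absent
    observed = [[10,              5],             # blindfold present
                [ 5,              5]]             # blindfold absent
    chi2, p, dof, expected = chi2_contingency(observed)
    print(chi2, p)    # how far the observed counts are from chance, and the p value
    print(expected)   # the counts expected if the two behaviours were unrelated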
The Mann-Whitney
This is the nonparametric equivalent of the independent samples t-test. The major difference is that this test looks at the ranks of the scores for the two distributions, regardless of which sample they belong to, rather than the actual scores. Ranking is a pretty straightforward concept. Looking at the table below, we see that there are four scores for age, one of which would appear to be an extreme outlier and so skews the distribution (making it far from normal). If we rank the scores (listed in brackets beside the actual scores), the rank of Age 2 is 4. The scores for age shift, by ranking, from interval to ordinal data, and the effect of the extreme outlier is eliminated.
         Age 1     Age 2     Age 3     Age 4
Score    24 (1)    78 (4)    28 (3)    27 (2)
In the case of the Mann-Whitney, all the scores for both samples are listed together and ranked. If there is a difference between the two distributions, then there will be some sort of significant ordering effect in the ranking (i.e. a significant portion of one of the two samples will make up the lower ranks, rather than there being a random mix). The null hypothesis of no difference between the two samples will be accepted if no significant ordering effect is found. The actual results will depend on such things as sample sizes, but SPSS will adjust itself accordingly.
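A minimal sketch in Python using SciPy (the two samples are invented; in practice they would be the scores of your two independent groups):

    from scipy.stats import mannwhitneyu

    group_a = [24, 27, 28, 31, 33]   # e.g. ages in one group
    group_b = [29, 35, 36, 40, 78]   # the extreme value has little effect once ranked
    u, p = mannwhitneyu(group_a, group_b)
    print(u, p)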
The Wilcoxon T-test
This, unsurprisingly, is the nonparametric equivalent of the dependent samples t-test. Ranks in this case are calculated based on the differences between the two scores for each subject over the two conditions, e.g. if one subject scored 3 acts of aggression before taking speed and 6 after, the difference score would be -3. These differences are then ranked, ignoring the sign, and then the statistics are carried out to identify whether the two conditions differ.
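Again this is available directly in SciPy; a minimal sketch using the aggression example (the scores are invented, apart from the 3-before/6-after subject mentioned above):

    from scipy.stats import wilcoxon

    before = [3, 2, 5, 4, 6, 1, 4, 3]   # acts of aggression before taking the drug
    after  = [6, 4, 6, 7, 9, 3, 6, 5]   # acts of aggression after taking the drug
    w, p = wilcoxon(before, after)      # ranks the signed differences internally
    print(w, p)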
Kruskal-Wallis One-Way ANOVA
Used, as with the parametric ANOVA, when a variable has more than two levels (independent of each other), the Kruskal-Wallis ANOVA (KWANOVA) tests for differences using the ranks of the scores rather than the actual scores. Like the ANOVA, the KWANOVA is a test for differences in the averages of the values, but these averages are drawn from the relative ranking rather than the actual scores. Again, a significant result indicates that differences do exist. As far as I can tell, SPSS does not have a post-hoc test option for the KWANOVA, which means you’ll have to do it by hand; just find yourself a good statistics book and the information you need should lie within. This ANOVA, and its equivalent measure for related samples, are described in the SPSS section below.
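A minimal sketch in Python using SciPy, with three invented independent groups (a significant result only says that at least one group differs; any pairwise follow-ups have to be run separately, as noted above):

    from scipy.stats import kruskal

    group_1 = [12, 14, 11, 15, 13]
    group_2 = [18, 20, 17, 21, 19]
    group_3 = [13, 16, 14, 15, 12]
    h, p = kruskal(group_1, group_2, group_3)   # compares the groups on ranked scores
    print(h, p)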
Correlations and Associations
When trying to determine the relationship between two variables, graphing each case using the scores as x and y co-ordinates can give you something of an initial impression of what associations may be occurring. However, to statistically test the relationship - to see how strong it is, in a sense - you need to determine how correlated they are. In a nutshell, the results will show to what degree the scores of two variables relate to one another. The more they coincide, the stronger the degree of association between the two.
In general, the correlation coefficients you will use are appropriate in situations where there is, to some degree, a linear relationship between the two variables. If the relationship is strongly curvilinear (i.e. if you plotted the two variables and the line did a crazy zigzag pattern across the graph), then there are alternatives, which we won’t go into here. For most purposes, you will use one of two correlation coefficients - Pearson’s Product Moment and Spearman’s Rank Order.
Deciding between the two is fairly easy. If you are using an ordinal scale, Spearman’s is the one to use. If the variables are interval, and the actual plot of the variables is weakly curvilinear (not a straight line, but generally when x goes up, y goes up, just to varying degrees), you use Spearman’s. If the variables are interval and the graph is linear, then you use Pearson’s. You can easily run both at the same time, so you might as well. However, it’s important to understand which one of the two is more appropriate for your analysis, so that you include the right one in your assignments or dissertations. In short: Pearson’s for ratio (and linear interval) data, Spearman’s for the rest.
With all the correlations, you will end up with a score between +1.00 and -1.00. The best way to think of what it means is to split it into two parts. The sign of the coefficient indicates whether the relationship is positive (+) or negative (-). The former means that as x increases in value, so does y; the latter means that as the value of x increases, the value of y decreases (or, as y goes up, x goes down). The size of the coefficient, ignoring the sign, represents how powerful the relationship is. A score of 1.00 (with a + or - sign) would represent a perfect correlation, while a value of, say, +0.85 would be very strong - as x increases, so does y to a roughly equivalent degree. A value of 0.00 would indicate that there is no relationship between the two.
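Both coefficients are easy to obtain outside SPSS as well; a minimal sketch in Python using SciPy with invented scores (in practice you would report whichever is appropriate for your data level and the shape of the plot):

    from scipy.stats import pearsonr, spearmanr

    x = [2, 4, 5, 7, 9, 11, 12]
    y = [1, 3, 4, 9, 12, 20, 24]   # y rises with x, but not in a straight line

    r, p_r = pearsonr(x, y)        # linear association between the raw scores
    rho, p_s = spearmanr(x, y)     # association between the ranked scores
    print(r, rho)                  # rho is exactly 1.0 here because the rank orders agree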
Some warnings
1. Remember that the coefficients indicate a degree of association, but not causality. Unless you have strong theoretical reasons to indicate such, you cannot clearly state that x influences y. It could be the case that y is influencing x, or a third variable, z, could exist that is influencing both.
2. A number of factors can influence a non-ranking correlation: the use of extreme groups, the presence of outliers, combining different groups, and curvilinearity. Each of these can lead to inaccurate findings.
Pearson’s Product Moment Correlation Coefficient
I won’t bore you with detailed descriptions of covariance and such. As usual, if you want a deeper understanding of the inner workings of this procedure, you’ll have to find a book on it. The value that is of importance to you is the squared correlation coefficient (r²). This indicates how much of the variance in y can be accounted for by x - their common variance, as it were. Note that since r lies between -1.00 and +1.00, r² is never larger than the absolute value of r.
Spearman’s Rank Order Correlation Coefficient
This is basically the nonparametric version of Pearson’s, by way of ranking the scores for the two variables rather than using the raw scores themselves. Interpretation of the results is the same.
Other Measures of Association
1) Dichotomous Data:
Jaccard’s
Jaccard’s is the appropriate measure of association to use for dichotomous variables where mutual non-occurrence does not indicate anything about the relationship between the two variables. This is typically the case in content analysis, e.g. when using police records.
Yule’s Q/Guttman’s Mu
This is the best measure to use if you do know that mutual non-occurrence does indicate something about the relationship between the two variables. There was a tendency last year to automatically run SSAs using Jaccard’s. This was appropriate most of the time, but there were times when Mu could have been used instead. Keep in mind that Jaccard’s is the weakest of all possible measures of association - it is used for the type of variables with the least information (dichotomous) and then does not use all the information available from those variables.
If you are using materials that aren’t subject to the problems police records suffer from - e.g. if you are analysing a drug abuser’s personal diaries, where you know whether the variables are present or not - then use Guttman’s Mu.
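For dichotomous variables these coefficients are simple enough to compute by hand from a 2x2 table of counts, where a = both present, b and c = only one present, and d = both absent. A minimal sketch of the two formulas in Python (the counts are invented, reusing the blindfold/threat crosstab from earlier):

    def jaccard(a, b, c):
        # joint absences (d) are deliberately ignored
        return a / (a + b + c)

    def yules_q(a, b, c, d):
        # uses all four cells, including joint absences
        return (a * d - b * c) / (a * d + b * c)

    a, b, c, d = 10, 5, 5, 5
    print(jaccard(a, b, c))     # 0.5
    print(yules_q(a, b, c, d))  # about 0.33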
2) Ordinal Data:
Kendall’s Tau/Guttman’s Mu
Use both of these for non-metric analyses (e.g. non-metric SSA). Use the former when you have equal numbers of categories between the two variables, and the latter, which is weaker, when you have unequal numbers.
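Kendall’s tau itself is available in SciPy (Guttman’s Mu is not, so only tau is shown); a minimal sketch with invented rankings:

    from scipy.stats import kendalltau

    ranks_x = [1, 2, 3, 4, 5, 6]
    ranks_y = [2, 1, 3, 5, 4, 6]   # rankings of the same cases on a second variable
    tau, p = kendalltau(ranks_x, ranks_y)
    print(tau, p)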
3) Interval data:
Pearson’s for parametric analyses (see above). Alternatively, in SSA you can use Guttman’s Mu for nonparametric analyses of interval data.