While numerous studies have described the reliability of two-point discrimination testing, interpreting these results to apply them to clinical practice has been hampered by the lack of standardized testing procedures and the inability to quantify subject cognitive function.
Reported reliability has ranged from poor to moderate to good. The cooperation of the subject and the ability of the subject to attend to the stimulus have been suggested to influence two-point discrimination measures, as have central training effects.
There appears to be little carry-over between static two-point discrimination tests and function, although moving two-point discrimination testing (which tests rapidly adapting afferent fibers) has been shown to correlate with object identification tests. Likewise, the sensitivity of two-point discrimination testing to detect change over time is poor.
Reports of the reliability of two-point discrimination testing vary according to the age and sex of subjects, the peripheral nerve tested, and whether the subject is symptomatic or asymptomatic. Testing procedures also vary with the starting position (wide or narrow distances), the amount of pressure applied, and the instrument used to apply the stimulus.
It is questionable whether any reliability measures of sensibility can be used as a reference to judge the presence of pathology. It is recommended that results from any sensory testing procedures not be used as the sole means of developing diagnoses of peripheral or central nervous system origin.
Bibliography
Adams RD, Victor M, Ropper AH: Principles of neurology, New York, 1997, McGraw-Hill.
Feinberg JH: Burners and stingers, Phys Med Clin N Am 11:771-784, 2000.
Gilroy J: Basic neurology, ed 3, New York, 2000, McGraw-Hill.
Haymore J: A neuron in a haystack: advanced neurologic assessment, AACN Clin Issues 15:568-581, 2004.
Lance JW: The Babinski sign, J Neurol Neurosurg Psychiatry 73:360-362, 2002.
Lundborg G, Rosen B: The two-point discrimination tests—time for a re-appraisal? J Hand Surg (Br) 29B(5):418-422, 2004.
Lundy-Eckman L: Neuroscience: fundamentals for rehabilitation, Philadelphia, 1998, WB Saunders.
Magee DJ: Orthopedic assessment, ed 3, Philadelphia, 1997, WB Saunders.
Milhorat TH: Classification of syringomyelia, Neurosurg Focus 8:1-6, 2000.
Novak C et al: Establishment of reliability in the evaluation of hand sensibility, Plast Reconstruct Surg 93:311-322, 1993.
Peters EW et al: The reliability of assessment of vibration sense, Acta Neurol Scand 107:293-298, 2003.
Rainville J et al: Comparison of four tests of quadriceps strength in L3 or L4 radiculopathies, Spine 28:2466-2471, 2003.
Rozental TD et al: Intra- and interobserver reliability of sensibility testing in asymptomatic individuals, Ann Plast Surg 44:605-609, 2000.
Shy ME et al: Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology. Quantitative sensory testing, Neurology 60:898-904, 2003.
Umphred DA: Neurological rehabilitation, ed 2, St Louis, 1990, Mosby.
Viikari-Juntra E, Porras M, Laasonen EM: Validity of clinical tests in the diagnosis of root compression in cervical disc disease, Spine 14:253-257, 1989.
Waxman SG: Correlative neuroanatomy, ed 23, Stamford, Conn, 1996, Appleton & Lange.
Yoss RE et al: Significance of symptoms and signs in localization of involved root in cervical disc protrusions, Neurology 7:673-683, 1999.
Yuras S: Syringomyelia: an expanding problem, J Am Acad Nurse Pract 12:22-24, 2000.
Chapter 21
Clinical Research and Data Analysis
Frank B. Underwood, PT, PhD, ECS
1. What is research?
Research is a controlled, systematic approach to obtain an answer to a question. Experimental research involves the manipulation of a variable and measurement of the effects of this
manipulation. Nonexperimental research does not manipulate the environment but may describe the relationship between different variables, obtain information about opinions or policies, or describe current practice. Basic research is generally thought of as laboratory-based research, in which the researcher has control over nearly all aspects of the environment and subjects. Clinical or applied research usually uses entire, intact organisms in a more natural environment.
2. What are variables?
Variables are measurements or phenomena that can assume more than one value or more than one category. A categorical or discrete variable is one that can assume only certain values and often is qualitative (no quantity or numerical value implied). Continuous variables are ones that can assume a wide range of possible values and are usually quantitative in nature.
3. Define independent variable and dependent variable.
• Independent variable—the variable that is manipulated by the researcher
• Dependent variable—the variable that is measured by the researcher
Independent variables often are qualitative, and dependent variables usually are quantitative. The different permutations of the independent variable are called levels. A variable must have at least two levels to be an independent variable; if some aspect of the research has only one possible value or category, it is a constant.
4. Describe other types of variables.
Extraneous or confounding variables are phenomena that are not of interest to the researcher but may have an effect on the value of the dependent variable. Extraneous variables must be controlled as much as possible, usually by holding some aspect of the research constant. A covariate is a phenomenon that affects the dependent variable and is not of interest to the researcher, but that the researcher is unable to control.
5. How accurate are measurements?
The observed measurement of any phenomenon is composed of a true score and error. Error may be systematic, in which case all scores are increased or decreased by a constant amount, or random.
Systematic error generally is the result of using the measurement instrument incorrectly or improper calibration of the instrument. Random error is precisely that—random. Even if the true score is constant, and there is no systematic error, repeated measurements of a phenomenon do not produce identical scores. It is generally assumed that the effects of all of the sources of random error cancel each other, such that the measured score is the best estimate of the true score. If the true score is constant, repeating the measurement and calculating an average score may be a better estimate of the true score. If the true score is labile or is altered as a consequence of the measurement, repeated measurements may reduce the accuracy of the measurement.
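The difference between systematic and random error can be illustrated with a small simulation. This is a minimal sketch, not a model of any particular instrument: the true score (90 degrees), the systematic error (3 degrees), and the spread of the random error (standard deviation of 2 degrees) are hypothetical values, and the random error is assumed to be normally distributed.

```python
import random
import statistics


def simulate_measurements(true_score, systematic_error, random_sd, n, seed=0):
    """Return n observed scores: true score + systematic error + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        true_score + systematic_error + rng.gauss(0, random_sd)
        for _ in range(n)
    ]


TRUE_SCORE = 90.0  # hypothetical true joint angle, in degrees

# Properly calibrated instrument: random error only.
unbiased = simulate_measurements(TRUE_SCORE, 0.0, 2.0, 10_000)
# Miscalibrated instrument: every reading is 3 degrees too high.
biased = simulate_measurements(TRUE_SCORE, 3.0, 2.0, 10_000)

mean_unbiased = statistics.mean(unbiased)
mean_biased = statistics.mean(biased)

print(f"mean with random error only:   {mean_unbiased:.2f}")
print(f"mean with systematic error too: {mean_biased:.2f}")
```

Averaging many repeated measurements brings the first mean close to the true score, whereas the second mean stays about 3 degrees too high: averaging cancels random error but leaves systematic error untouched.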
6. Define measurement reliability.
Reliability is related to consistency or repeatability. In the absence of a change in the true score, how similar are repeated measurements of the same phenomenon? Intra-rater reliability is a measure of how consistent an individual is at measuring a constant phenomenon; inter-rater reliability refers to how consistent different individuals are at measuring the same phenomenon; and instrument reliability pertains to the tool used to obtain the measurement. If a measurement cannot be performed reliably, it is difficult to ascribe changes in the dependent variable to the effects of the independent variable, rather than measurement error.
7. Describe statistical procedures used to estimate reliability.
The intraclass correlation coefficient (ICC), which is based on an analysis of variance (ANOVA) statistical procedure, is a popular means of estimating reliability. A means of measuring absolute
they measure covariance, not agreement.
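The difference between covariance and agreement can be shown with a short calculation. The sketch below rests on several assumptions: the ratings are invented, the coefficient contrasted with the ICC is taken to be the Pearson correlation, and the ICC is computed in its one-way random-effects form, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1) MSW), which is only one of several ICC variants.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_one_way(ratings):
    """One-way random-effects ICC(1,1).

    ratings: one list per subject, each holding k scores (one per rater).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of ratings per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subjects and within-subjects mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (score - m) ** 2
        for row, m in zip(ratings, subject_means)
        for score in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: rater B always reads exactly 5 units higher than rater A.
rater_a = [10, 12, 14, 16, 18]
rater_b = [a + 5 for a in rater_a]

r = pearson_r(rater_a, rater_b)
icc = icc_one_way([list(pair) for pair in zip(rater_a, rater_b)])

print(f"Pearson r = {r:.3f}")
print(f"ICC(1,1)  = {icc:.3f}")
```

The two raters never agree on a single score, yet the Pearson coefficient is perfect because the scores rise and fall together. The ICC is much lower because the constant 5-unit disagreement counts as error.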
8. Define measurement validity.
It is an indication of whether the measurement is an accurate representation of the phenomenon of interest. Some clinical measurements have obvious validity. For example, using a goniometer to measure the angle between two bones with the joint as the axis is generally accepted as a valid indication of the status of the tissue that limits motion at that joint. For other measurements, the relationship between what is measured and what is inferred from the measurement is more tenuous. To establish the validity of a clinical test, a more direct measurement that is considered a gold standard is established. If acceptable numbers of patients with a positive Lachman’s test have anterior cruciate ligament tears and those without tears have a negative Lachman’s test, the Lachman’s test is considered a valid test for anterior cruciate ligament integrity. There is no universal definition of acceptable numbers; this is left to the researcher to defend and the clinician to accept or reject.
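Comparing a clinical test with a gold standard amounts to counting how often the two agree. The sketch below uses invented counts for 100 hypothetical patients; the terms sensitivity, specificity, and predictive value are the conventional names for these proportions and do not appear in the passage above.

```python
def diagnostic_accuracy(true_pos, false_pos, false_neg, true_neg):
    """Summarize agreement between a clinical test and a gold standard."""
    return {
        # Of patients with a tear, the proportion with a positive test.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Of patients without a tear, the proportion with a negative test.
        "specificity": true_neg / (true_neg + false_pos),
        # Of positive tests, the proportion that truly have a tear.
        "positive_predictive_value": true_pos / (true_pos + false_pos),
        # Of negative tests, the proportion that truly have no tear.
        "negative_predictive_value": true_neg / (true_neg + false_neg),
    }


# Hypothetical counts: 50 patients with a confirmed tear, 50 without.
result = diagnostic_accuracy(true_pos=45, false_pos=5, false_neg=5, true_neg=45)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Whether proportions of 0.90 count as "acceptable numbers" remains a judgment for the researcher to defend and the clinician to accept or reject.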
9. What is a research design?
A research design is a plan or structure of the means used to answer the research question or to gather the information for a nonexperimental study. There are three basic designs for experimental research:
1. A completely randomized design uses a single independent variable and assigns different groups of subjects to each level of the independent variable. Because each subject receives only one type of treatment, this design is also called a between-subjects design. If the independent variable is the type of brace and there are three levels (i.e., three different braces are being used), then an individual subject would be measured while using only one of the three braces.
2. A repeated measures design uses a single independent variable and measures each subject under all levels of the independent variable. If the independent variable is the dosage of a drug and levels are 200, 400, and 600 mg/day, then each subject would be measured while taking each of the three dosages.
3. A factorial design uses two or more independent variables. A completely randomized factorial design is one in which all of the independent variables are independent factors, meaning an individual subject is measured under only one condition. If the two independent variables are type of brace and dosage of a drug, and there are three levels of each variable, then nine groups of subjects would be studied. A within-subjects factorial design measures each subject in all levels of all variables. Using the brace and dosage variables, each subject would be measured with each brace and dosage (e.g., brace A and 200 mg, brace A and 400 mg, brace A and 600 mg). A mixed factorial design uses at least one independent factor and at least one repeated factor. If subjects are assigned to only one brace, but are measured with all three drug dosages, the design is mixed.
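The nine conditions in the brace-by-dosage example can be listed by crossing the levels of the two independent variables. A minimal sketch follows; the labels brace B and brace C are assumed names for the two braces the text does not name.

```python
from itertools import product

braces = ["brace A", "brace B", "brace C"]  # B and C are assumed labels
dosages_mg = [200, 400, 600]                # mg/day, as in the example

# Every combination of one brace level with one dosage level.
conditions = list(product(braces, dosages_mg))

for brace, dose in conditions:
    print(f"{brace} and {dose} mg")

# How the same nine conditions are divided among subjects in each design.
groups_needed = {
    "completely randomized factorial": len(braces) * len(dosages_mg),
    "within-subjects factorial": 1,
    "mixed (brace between, dosage repeated)": len(braces),
}
print(groups_needed)
```

The conditions are identical in all three designs. What changes is whether each condition gets its own group (nine groups), one group experiences all nine, or each of three brace groups is measured at all three dosages.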
10. Which descriptive statistics are most useful for describing a set of data?
It depends on the data. If the data are distributed normally, the three measures of central tendency are equal; in this case, the mean is most often used to describe the typical performance. If there are a few scores at one extreme or the other in the set of data, the median is considered the best measure of central tendency. For example, in the data set 2, 4, 5, 7, 83, the mean is 20.2, and the median is 5; 5 is more descriptive of the typical score than 20.2. The standard deviation (or variance) is the most descriptive value for the variability of a data set that is distributed normally, and minimum-maximum may be the best measure of variability in data sets that are best described with the median.
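The values quoted for the data set 2, 4, 5, 7, 83 can be checked with Python's standard `statistics` module. This is a small sketch; the choice of the sample (n − 1) standard deviation is an assumption, since the text does not say which form is meant.

```python
import statistics

scores = [2, 4, 5, 7, 83]

mean_score = statistics.mean(scores)      # pulled upward by the extreme score
median_score = statistics.median(scores)  # middle value of the sorted scores
sample_sd = statistics.stdev(scores)      # sample standard deviation (n - 1)
score_range = (min(scores), max(scores))  # minimum-maximum

print(f"mean = {mean_score}, median = {median_score}")
print(f"standard deviation = {sample_sd:.1f}, min-max = {score_range}")
```

The single extreme score of 83 pulls the mean well above four of the five scores and inflates the standard deviation. The median and the minimum-maximum describe this data set more faithfully.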
11. Are the terms normal distribution, bell curve, and gaussian distribution equivalent?
Yes, in that all three terms refer to the shape of a frequency histogram constructed using the scores from any measurement that is the sum of a true score and multiple, small, independent sources of error. Nearly any physiologic or anatomic parameter that is measured in a large group of individuals falls into a normal distribution. For example, suppose the maximal aerobic capacity is measured in 500 individuals selected at random. The scores are counted and grouped into increments of 5 (e.g., the number of subjects with a maximal aerobic capacity of 0 to 5, 6 to 10, 11 to 15), and the results are used to construct a bar plot with the increments on the x-axis and the number of individuals in each bin on the y-axis. If the average value was 32, and the standard deviation was 6, the resulting plot might look like the figure. Most scores were between 31 and 35, with fewer scores at each extreme. For example, the number of scores in the 6-10 range is approximately equal to the number of scores in the 51-55 range. In a perfectly normal distribution, about 68% of the scores will be found within 1 standard deviation of the mean (in this example, 340 of the 500 scores should be between 26 and 38, or 32 ± 6), about 95% of the scores will be within 2 standard deviations of the mean, and more than 99% of the scores will be within 3 standard deviations of the mean.
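The 68%, 95%, and 99% figures follow from the normal curve itself: the proportion of scores within k standard deviations of the mean is erf(k / √2). The sketch below computes these proportions with the standard library and applies them to the worked example, taking the mean as 32 and the standard deviation as 6 (the "32 ± 6" interval) with 500 subjects.

```python
import math


def fraction_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))


MEAN, SD, N_SUBJECTS = 32, 6, 500  # values from the worked example

for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    fraction = fraction_within(k)
    print(
        f"within {k} SD ({low} to {high}): "
        f"{fraction:.1%}, about {fraction * N_SUBJECTS:.0f} of {N_SUBJECTS}"
    )
```

The exact proportions are slightly higher than the rounded percentages, so the expected count within 1 standard deviation is about 341 rather than 340.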
12. Are there distributions other than a normal distribution?
Yes, especially with small samples, skewed distributions are possible. A skewed distribution results when there are a few extreme scores at one end or the other of the distribution. For example, if most of the scores are low, but there are a few high scores, the distribution might be similar to the figure. This distribution is skewed to the right by the few extremely high scores. If there are a few extremely low scores, the distribution is skewed to the left. The direction of the skew is determined by drawing (or imagining) a line connecting the top of each bar in the histogram and stating to which side of the plot the tail extends.
[Figure: histogram. x-axis: Maximal Oxygen Consumption (0 to 55); y-axis: Number of Subjects (0 to 120).]
13. Can the same concepts be used with a skewed distribution; that is, are 68% of the scores within 1 standard deviation of the mean?
No. These values hold true only for a normal distribution. In the case of a skewed distribution, the median is a better descriptor of the typical score, and the minimum-maximum better describes the variability in the set of data.
14. What are inferential statistics?
When data are collected, the researcher needs to determine the probability of obtaining a particular set of scores by chance alone. The procedures used to calculate this probability are called inferential statistics and are the heart of testing an experimental hypothesis. There are different procedures used based on the research design, the nature of the research question (what the researcher is trying to answer), and the nature of the data.
15. Describe the fundamental concept of inferential statistics.
In the simplest case, consider a randomized design, with a single independent variable having two levels and a single dependent variable. Suppose a researcher posed the following question: What is the effect of adding neural glide techniques for the median nerve to the standard treatment for patients with carpal tunnel syndrome? The independent variable is treatment, and the levels are standard and neural glide. The dependent variable could be number of days until the patient is free of symptoms for 10 consecutive days. A sample of patients with carpal tunnel syndrome is selected at random from the population of patients with carpal tunnel syndrome, and the patients in the sample are assigned at random to one of the two treatment levels. Because the patients have been selected at random from the population, and then assigned at random to one of the two treatment groups, it is a reasonable assumption that the mean and standard deviation for the dependent variable would be the same for both treatment groups if there is no effect of adding neural glide to the standard treatment. All of the subjects are treated until the criterion for discharge is met (i.e., free of symptoms for 10 consecutive days), and the data are summarized. If the standard group recovered in an average of 40 days, with a standard deviation of 7 days, and the neural glide group recovered in 32 days, with a standard deviation of 6 days, did the treatment work? There is a difference in the average days to recovery, but is that difference large enough to conclude that it was due to the neural glide, or could it be attributed to chance alone? Perhaps the subjects in the neural
[Figure: histogram. x-axis: Maximal Oxygen Consumption (0 to 55); y-axis: Number of Subjects (0 to 16).]
glide group did not have as severe compression of the median nerve at the beginning of the study and would have recovered more quickly regardless of the neural glide. The essence of inferential hypothesis testing is to answer the following question: What is the probability of having obtained a difference in days to recovery of this magnitude as a result of random factors? If this probability is low enough, the researcher can conclude that the treatment had a beneficial effect and should become a part of standard practice.
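One way to put a number on the carpal tunnel example is an independent-samples t statistic computed from the group summaries. The sketch below is illustrative only: the text gives no sample sizes, so 20 patients per group is an assumption, as is the use of a pooled-variance t-test. The critical value of about 2.02 is the tabled two-tailed 5% value for 38 degrees of freedom.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


N_PER_GROUP = 20       # assumed; not stated in the example
T_CRITICAL_38 = 2.024  # tabled two-tailed critical t, alpha = 0.05, df = 38

# Standard treatment: 40 +/- 7 days; neural glide: 32 +/- 6 days.
t_stat, df = pooled_t(40, 7, N_PER_GROUP, 32, 6, N_PER_GROUP)
unlikely_by_chance = abs(t_stat) > T_CRITICAL_38

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"exceeds the 5% critical value: {unlikely_by_chance}")
```

Under these assumptions the 8-day difference is nearly four standard errors wide, so a difference this large would be expected less than 5% of the time from random factors alone. With much smaller groups, the same means and standard deviations could fail to reach that threshold.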
16. How is the correct statistical test chosen?
The short answer is that it depends on the question being asked:
• If the desire is to learn about the association between two variables (e.g., the relationship between thigh girth and knee extensor force), a correlation coefficient should be calculated.
• If the question concerns prediction (e.g., if a patient has knee range of motion of 5 to 60 degrees on the second postoperative day, how many days will the patient likely remain in the hospital?), a regression analysis is appropriate.
• If the question is whether a treatment has an effect (e.g., does spinal traction reduce the signs and symptoms of a lumbosacral root compression?), a chi-square, analysis of variance (ANOVA), or t-test, which is a special case of the ANOVA, is appropriate.
However, because there are different types of data and different types of restrictions placed upon the testing, the answer is more complicated. There are four levels of data: nominal, ordinal, interval, and ratio. Information measured on a nominal scale results in a name only; that is, it does not imply a quantity. Left versus right and red versus blue are examples of nominal data. If a numeral is assigned to information on a nominal scale, a quantity is not implied; if red is coded 1 and blue is coded 2, it does not mean that blue is twice as much as red.
An ordinal scale implies a rank order, with some quantitative value. The person who finishes a race first receives the number 1, meaning this person finished the race in a shorter time than the second-place finisher. However, the amount of time between first and second place is not likely the same as the amount of time between fifth and sixth place.
For statistical purposes, there are no meaningful differences between an interval and a ratio scale; both imply not only a rank order but also an equivalence between points on the scale. The difference between 80 and 95 is the same as the difference between 25 and 40; in both cases, it is 15.
For a correlation study, a Spearman rho (named for Spearman, who developed the procedure, and based on rank order) is used for ordinal data. A Pearson correlation coefficient is calculated for interval data. In both cases, the coefficient can vary between −1.00 and +1.00. A value of zero means that there is no correlation, and a value of 1.00 (with either sign) signifies the correlation is perfect. If the sign is +, the value of one variable increases as the other increases. If the sign is −, the value of one variable decreases as the other increases.
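Both coefficients can be computed in a few lines. The sketch below uses invented data and assumes there are no tied scores, so the ranking helper is deliberately simple. Spearman rho is computed as the Pearson coefficient of the ranks, which is a standard way to obtain it.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank scores from 1 (lowest) upward; assumes no tied scores."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rho: the Pearson coefficient of the rank orders."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y_curved = [1, 8, 27, 64, 125]  # always increases with x, but not linearly
y_falling = [10, 8, 6, 4, 2]    # decreases by a constant step as x increases

r_curved = pearson_r(x, y_curved)
rho_curved = spearman_rho(x, y_curved)
r_falling = pearson_r(x, y_falling)

print(f"curved:  Pearson r = {r_curved:.3f}, Spearman rho = {rho_curved:.3f}")
print(f"falling: Pearson r = {r_falling:.3f}")
```

For the curved data the rank order is perfectly preserved, so rho is +1.00 even though the Pearson coefficient is below 1.00. For the falling data the Pearson coefficient is −1.00, a perfect correlation in the negative direction.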
For experimental studies, those designed to determine if there is a difference, a chi-square is computed for data that are nominal. There is some disagreement regarding the appropriate analysis when the data meet the definition of ordinal or interval. It is almost universally agreed that to perform a traditional ANOVA, the sets of data should have a normal distribution, and the variance of the sets of data should be similar (the definition of similar is usually lacking; a rule of thumb is that the variance of one set should be no more than twice the other set). There are formal tests that can be used to determine whether the data are normally distributed, and whether the variances are equal; these are beyond the scope of this book, and are generally of little or no interest to the clinician. Some authors further state that the data must meet the definition of interval or ratio data; in fact, some researchers ignore the more important requirements of normal distribution and equality of variance and claim that the tests are robust enough that any data on an interval or ratio scale can be analyzed with a traditional ANOVA. However, the scale of the data was not an issue when the traditional ANOVA approach was developed. Therefore if the data are normally distributed, and the variances are equal, then a traditional ANOVA is appropriate, regardless of the scale of the data. Often, especially with the small sample sizes usually used in rehabilitation