The larger your sample, the more precise the measurement and the closer you will be to the true mean. This is because, based on the actual distribution of blood pressures in the population, more individuals have a value near 100 mm Hg, and with larger samples each individual value contributes less to the total, so extreme values have less effect on the mean.
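A minimal simulation sketch in Python makes this concrete (our illustration, not from the text; the population standard deviation of 15 mm Hg is an assumed value):

```python
import random
import statistics

random.seed(0)

def sample_mean(n: int) -> float:
    # One simulated study: mean blood pressure of n randomly sampled individuals.
    # True population mean 100 mm Hg; SD of 15 mm Hg assumed for illustration.
    return statistics.mean(random.gauss(100, 15) for _ in range(n))

for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(200)]
    # Larger samples: the means cluster more tightly around the true value of 100.
    print(f"n = {n:>4}: spread (SD) of 200 sample means = {statistics.pstdev(means):.2f} mm Hg")
```

The spread of the sample means shrinks roughly in proportion to 1/√n, which is why larger samples land closer to the true mean.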
How do we tell whether measurements are different from each other by chance or truly different? Consider this situation: a researcher polls a random sample of 100 pediatric cardiologists regarding their preferred initial therapy for heart failure, finding that 72% of the sampled physicians prefer using ACE inhibitors (ACEIs) over β-blockers. Since the sample was chosen at random, the researcher decides that it is a reasonable assumption that this group is representative of all pediatric cardiologists. A report is published titled, “ACEIs Are Preferred Over β-Blockers for the Treatment of Heart Failure in Children.” Had all pediatric cardiologists been polled, would 72% of them have chosen ACEIs? If another researcher had selected a second random sample of 100 pediatric cardiologists, would 72% of them also have chosen ACEIs over β-blockers? The answer in both cases is probably not, but if both samples came from the same population and were chosen randomly, the results should be close.
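A quick simulation (ours, not the text's) shows what "close" means here: repeated random polls of 100 cardiologists from a population in which exactly 72% prefer ACEIs rarely return exactly 72%, but they cluster near it.

```python
import random

random.seed(1)
# Five simulated polls of 100 cardiologists from a population where 72% prefer ACEIs.
polls = [sum(random.random() < 0.72 for _ in range(100)) for _ in range(5)]
print(polls)  # e.g., values from the high 60s to the mid 70s: near 72, seldom exactly 72
```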
Next, suppose that a new study is published reporting that β-blockers are actually better at improving ventricular function than ACEIs. You subsequently poll a new sample of 100 pediatric cardiologists and find that only 44% now prefer ACEIs. Is the difference between your original sample and your new sample due to random error, or did the publication of the new study have an effect on preference regarding therapy for children in heart failure? The key to answering this question is to estimate the probability, by chance alone, of obtaining a sample in which 44% of respondents prefer ACEIs when, in fact, 72% of the population from which the sample is drawn actually prefers ACEIs. In such a situation, inferential statistics can be used to assess the difference between the distribution in the sample and that in the population, and the likelihood, or probability, that the difference is due to chance or random error.
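One straightforward way to estimate that probability is an exact binomial test (our choice of test; the text does not name one): how likely is a poll of 100 to return a result as extreme as 44 if the true population proportion is 0.72?

```python
from scipy.stats import binomtest

# Probability of a result as extreme as 44/100 if 72% of the population prefers ACEIs.
result = binomtest(k=44, n=100, p=0.72, alternative="two-sided")
print(f"P = {result.pvalue:.1e}")  # vanishingly small, far below .05
```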
Relationship Between Probability and Inference
Statistical testing comparing two groups starts with the hypothesis that both groups are equivalent, also called the null hypothesis. A two-tailed test assesses the probability that group A is different from group B, either higher or lower, whereas a one-tailed test assesses the probability that group A is specifically higher or specifically lower than group B, but not both. Two-tailed tests are generally used in medical research statistics (with a common exception being noninferiority trials).
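As a sketch of the distinction (with illustrative numbers of our own), the same data give different P values under two-tailed and one-tailed alternatives:

```python
from scipy.stats import binomtest

# Illustrative poll: 60 of 100 respondents prefer treatment A over treatment B.
# Null hypothesis: preferences are evenly split (p = 0.5).
two_tailed = binomtest(60, 100, 0.5, alternative="two-sided")  # A differs in either direction
one_tailed = binomtest(60, 100, 0.5, alternative="greater")    # A is specifically preferred
print(f"two-tailed P = {two_tailed.pvalue:.3f}")  # ~0.057, not significant at .05
print(f"one-tailed P = {one_tailed.pvalue:.3f}")  # ~0.028, roughly half the two-tailed value
```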
Statistical significance is conventionally declared when the P value obtained from the test is under .05, meaning that, if the two groups were truly equivalent, a difference at least as large as the one observed would arise by chance alone less than 5% of the time. The P value is thus an expression of the confidence we might have that the findings are true and not the result of random error. Using our previous example of preferred treatment for heart failure, suppose the P value was <.001 for 44% being different from 72%. This means that the chance of our sample measure of 44% differing this much from 72% through random error alone was less than 1 in 1000. We can confidently conclude, therefore, that the second sample is truly different from the original sample and that the opinions of the pediatric cardiologists have changed.
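A figure of that order is plausible: comparing the two polls directly with a two-proportion z-test (again our choice of test) gives a P value well below .001.

```python
from statsmodels.stats.proportion import proportions_ztest

# Original poll: 72/100 prefer ACEIs; new poll: 44/100.
z, p = proportions_ztest(count=[72, 44], nobs=[100, 100])
print(f"z = {z:.2f}, P = {p:.1e}")  # P on the order of 1e-5, well below .001
```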
Relevance of P Values
Limitations.
P < .05 is the standard value for defining statistical significance, meaning that we typically accept a result as significant if the chance of its occurrence by random error alone was less than 5%. However, there is nothing particularly unique about the specific value P < .05. If the measured P value for a statistic is .06, is the 6% probability that a result was due to chance really nonsignificant, whereas the 4% chance implied by a P value of .04 is significant? Yet many assume that if the P value is <.05, then the results are important and meaningful. This is not the case, since a P value is only a measure of confidence. Thus it is important to understand the implications and meaning of a P value. The results must always be considered in light of the magnitude of the observed difference or association, the variation in the data, the number of subjects or observations, and the clinical importance of the observed results.
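The dependence on sample size is easy to demonstrate. In this sketch (hypothetical numbers), the identical 5-percentage-point difference moves from nonsignificant to overwhelmingly significant purely because the studies grow larger:

```python
from statsmodels.stats.proportion import proportions_ztest

# The same observed difference (55% vs 50%) at three different study sizes.
for n in (100, 1_000, 10_000):
    counts = [round(0.55 * n), round(0.50 * n)]
    _, p = proportions_ztest(counts, nobs=[n, n])
    print(f"n = {n:>6} per group: P = {p:.4g}")
# P falls from ~0.48 to ~0.025 to ~1e-12 with no change in the effect itself.
```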
Clinical Relevance Versus Statistical Significance.
A primary consideration regarding statistical analysis in medical research is the difference between statistical significance and clinical relevance. There is no real value to an association that is highly statistically significant but clinically or biologically implausible, a more common finding in studies with very large populations. Likewise, studies with results that are clinically important but not statistically significant are of uncertain value, a more common finding in studies with very small populations. Statistical significance does not guarantee clinical importance, and conversely a result that is not statistically significant might be very clinically important.
Confidence Intervals.
Confidence intervals are important tools for assessing clinical relevance because they are intrinsically linked to a statistical P value, but they give much more information. The confidence interval is a representation of the underlying probability distribution of the observed result based on the sample size and variation in the data. A 95% confidence interval would encompass 95% of the results from similar samples with the same sample size and variation properties, and thus represents the range of values within which we can be reasonably confident that the true result lies.
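As a brief sketch of how such an interval is computed in practice (reusing the earlier 72/100 poll; the Wilson method is our choice):

```python
from statsmodels.stats.proportion import proportion_confint

# 95% confidence interval for the proportion preferring ACEIs (72 of 100 polled).
low, high = proportion_confint(count=72, nobs=100, alpha=0.05, method="wilson")
print(f"observed 72%; 95% CI: {low:.1%} to {high:.1%}")  # roughly 62% to 80%
```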
The confidence interval answers two questions: first, what is the likely difference between the groups, and second, what range of values might this difference plausibly take? For example, suppose that a randomized clinical trial testing the effect of a new drug called “pumpamine,” for children with low cardiac output, found a reduction in mortality with pumpamine compared with placebo, but the P value was >.05; hence the result did not achieve statistical significance. Before we conclude that pumpamine conferred no benefit, we must examine the result and the confidence interval. The trial randomized 30 patients to pumpamine and 30
to placebo. The mortality in the intensive care unit was 20% with pumpamine and 40% with placebo, giving a 20% absolute reduction in mortality with pumpamine. When assessing the significance of the results, the 95% confidence interval ranges from −41% to +3%, and the P value is .09. Based on the P value,
we might conclude that there is no benefit from pumpamine. But we also know that we are 95% confident that the truth lies somewhere between a 41% reduction in mortality and a 3% increase in mortality. Since the interval includes the value 0%, we cannot confidently conclude that there is a difference between the two groups. However, we also cannot confidently conclude that there was not an important difference, including a benefit potentially as great as a 41% absolute reduction in mortality with pumpamine. It would be difficult to know what to conclude from this study. This situation most commonly arises when we have an insufficient number of study subjects or observations, that is, when the study lacks adequate power.
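The pumpamine figures can be reproduced by hand. This sketch assumes 6/30 deaths with pumpamine and 12/30 with placebo (matching the 20% and 40% mortality rates) and uses a simple normal-approximation (Wald) interval and a pooled two-proportion z-test; the exact interval method behind the example is not stated, so the limits differ slightly from −41% to +3%:

```python
import math
from scipy.stats import norm

d1, n1 = 6, 30    # pumpamine: 6 deaths in 30 patients (20% mortality, assumed counts)
d2, n2 = 12, 30   # placebo: 12 deaths in 30 patients (40% mortality, assumed counts)
p1, p2 = d1 / n1, d2 / n2
diff = p1 - p2    # -0.20: a 20% absolute mortality reduction with pumpamine

# 95% Wald confidence interval for the risk difference.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"95% CI: {diff - 1.96 * se:+.1%} to {diff + 1.96 * se:+.1%}")  # about -43% to +3%

# Pooled two-proportion z-test.
pooled = (d1 + d2) / (n1 + n2)
z = diff / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
print(f"P = {2 * norm.sf(abs(z)):.2f}")  # about 0.09, matching the example
```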
Type I Error, Type II Error, and Power