Essentials of Clinical Research - part 8



14 Research Methodology for Studies of Diagnostic Tests

Note that the FP percentage is 1 - specificity (that is, if the specificity is 90%, then in 100 patients without the index disease, 90 will have a negative test, which means 10 will have a positive test; i.e., the FP rate is 10%).

Predictive Value

Another concept is that of the predictive value (PV+ and PV-) of a test. This asks the question differently than sensitivity and specificity do: rather than asking what the TP and TN rates of a test are, the PV+ of a test asks how likely it is that a positive test is a true positive (TP), i.e., TP/(TP + FP) (for PV-, it is TN/(TN + FN)).

Ways of Determining Test Accuracy and/or Clinical Usefulness

There are at least six ways of determining test accuracy, and they are all interrelated, so the determination of which to use is based on the question being asked and one's personal preference. They are:

Sensitivity and specificity

Sensitivity can be expressed as P(T+|D+) and PV+ as P(D+|T+). Bayes' formula can then be written as follows: the post-test probability of disease = (sensitivity x prevalence) / [sensitivity x prevalence + (1 - specificity) x (1 - prevalence)].

Table 14.1 The relationship between disease and test result (column headings: Abnormal test, Normal test)


In the example in Table 14.2, this yields a sensitivity of 62% and a specificity of 89%. Note what happens when one changes the definition of what a positive test is, by using 0.5 mm ST depression as the cut-point for calling a test positive or negative. Another important axiom is that the prevalence of disease in the population you are studying does not significantly influence the sensitivity or specificity of a test (to derive those variables the denominators are defined as subjects with or without the disease; i.e., if you are studying a population with a 10% disease prevalence, one is determining the sensitivity of a test, against a gold standard, only in those 10%).

In contrast, PV is very dependent on disease prevalence, because more individuals will have a FP test in populations with a disease prevalence of 10% than they would if the disease prevalence were 90%. Consider the example in Table 14.3.

Receiver Operator Characteristic Curves (ROC)

The ROC curve is another way of expressing the relationship between sensitivity and specificity (actually 1 - specificity). It plots the TP rate (sensitivity) against the FP rate over a range of "cut-point" values. It thus provides visual information on the "trade off" between sensitivity and specificity, and the area under the curve (AUC) of a ROC curve is a measure of overall test accuracy (Fig. 14.3). ROC analysis was born during WW II as a way of analyzing the accuracy of sonar detection of submarines and differentiating signals from noise.6 In Fig. 14.4, a theoretic "hit" means a submarine was correctly identified, a false alarm means that a noise was incorrectly identified as a submarine, and so on. You should recognize this figure as the equivalent of the table above discussing false and true positives.

Table 14.2 Pre vs post-test probability

Another way to visualize the tradeoff of sensitivity and specificity, and how ROC curves are constructed, is to consider the distribution of test results in a population. In Fig. 14.5, the vertical line describes the threshold chosen for a test to be called positive or negative (in this example the right hand curve is the distribution of subjects within the population that have the disease, and the left hand curve those who do not have the disease). The uppermost figure is an example of choosing a very low threshold value for separating positive from negative. By so doing, very few of the subjects with disease (recall the right hand curve) will be missed by this test (i.e., the sensitivity is high, 97.5%), but notice that 84% of the subjects without disease will also be classified as having a positive test (the false alarm or false positive rate is 84%, and the specificity of the test for this threshold value is 16%). By moving the vertical line (threshold value) we can construct different sensitivity to false positive rates and construct a ROC curve, as demonstrated in Fig. 14.6.

An AUC can be calculated; the closer to 1, the better the test. Most good tests run 0.7-0.8 AUC. Tests that discriminate well crowd toward the upper left corner of the graph.

Table 14.3 Pre vs post-test probability

As mentioned before, ROC curves also allow for an analysis of test accuracy (a combination of TP and TN) by calculating the area under the curve, as shown in the figure above. Test accuracy can also be calculated by dividing the TP and TN by all possible test responses (i.e., TP, TN, FP, FN), as is shown in Fig. 14.4. One way ROC curves can be used during the research of a new test is to compare the new test to existing tests, as shown in Fig. 14.7.
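The AUC can also be computed without plotting at all: it equals the probability that a randomly chosen diseased subject scores higher than a randomly chosen disease-free subject (the Mann-Whitney formulation, equivalent to the area under the ROC curve). The sketch below uses made-up scores purely for illustration.

```python
# Sketch: Mann-Whitney estimate of the area under the ROC curve.
# Scores here are invented for illustration (e.g. mm of ST depression).

def auc(diseased_scores, healthy_scores):
    """Probability a diseased subject outscores a disease-free one."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5  # ties count half
    return wins / (len(diseased_scores) * len(healthy_scores))

diseased = [2.1, 1.8, 1.5, 0.9]
healthy = [0.4, 0.7, 1.0, 0.2]
print(auc(diseased, healthy))  # 0.9375 -- a well-discriminating test
```

A useless test (scores drawn from the same distribution for both groups) would hover near 0.5, the diagonal of the ROC plot.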

Fig. 14.4 Depiction of true and false responses based upon the correct sonar signal for submarines

Fig. 14.5 Demonstrates how changing the threshold for what divides true from false signals affects one's interpretation


Fig 14.6 Comparison of ROC curves

Fig 14.7 Box 12-1 Receiver operating characteristic curve for cutoff levels of B-type natriuretic peptide in differentiating between dyspnea due to congestive heart failure and dyspnea due to other causes


Likelihood Ratios

Positive and Negative Likelihood Ratios (PLR and NLR) are another way of analyzing the results of diagnostic tests. Essentially, the PLR is the odds that a person with a disease would have a particular test result, divided by the odds that a person without disease would have that result. In other words, how much more likely is a test result to occur in a person with disease than in a person without disease? If one multiplies the pretest odds of having a disease by the PLR, one obtains the posttest odds of having that disease. The PLR for a test is calculated as the test's sensitivity/(1 - specificity) (i.e., the FP rate). So a test with a sensitivity of 70% and a specificity of 90% has a PLR of 7 (0.70/(1 - 0.90)).

Unfortunately, it is made a bit more complicated by the fact that we generally want to convert odds to probabilities. That is, the PLR of 7 is really an odds of 7 to 1, and that is more difficult to interpret than a probability. Recall that the odds of an event are calculated as the number of events occurring divided by the number of events not occurring (i.e., non-events, or p/(1 - p)). So if blood type O occurs in 42% of people, the odds of someone having blood type O are 0.42/(1 - 0.42); i.e., the odds of a randomly chosen person having blood type O are 0.72:1. Probability is calculated as odds/(odds + 1), so in the example above 0.72/1.72 = 42% (or 0.42; that is, one can say the odds of having blood type O are 0.72 to 1, or the probability is 42%, and the latter is easier for most to understand). Recall that probability is the extent to which something is likely to happen. To review, take an event that has a 4 in 5 probability of occurring (i.e., 80% or 0.8). The odds of its occurring are 0.8/(1 - 0.8), or 4:1. Odds, then, are a ratio of probabilities. Note that an odds ratio (often used in the analysis of clinical trials) is also a ratio of odds.
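The two conversions just described can be sketched directly; the function names are mine, chosen for illustration.

```python
# Sketch of the odds/probability conversions described above.

def prob_to_odds(p):
    """odds = p / (1 - p)"""
    return p / (1 - p)

def odds_to_prob(odds):
    """probability = odds / (odds + 1)"""
    return odds / (odds + 1)

print(round(prob_to_odds(0.42), 2))  # 0.72 -- the blood type O example
print(odds_to_prob(4))               # 0.8  -- odds of 4:1
```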

If one has estimated the pretest odds of disease, one can multiply those odds by the LR to obtain the post-test odds, i.e.:

Post-test odds = pre-test odds × LR

To use an exercise test example, consider the sensitivity for the presence of CAD (by coronary angiography) based on 1 mm ST segment depression. In this aforementioned example, the sensitivity of a "positive" test is 70% and the specificity is 90% (PLR = 7; NLR = 0.33). Let's assume that based upon our history and physical exam we feel the chance of a patient having CAD before the exercise test is 80% (0.8), i.e., pretest odds of 0.8/0.2 = 4 (to 1). If the exercise test demonstrated 1 mm ST segment depression, the post-test odds of CAD would be 4 x 7, or 28 (to 1), and the probability of that patient having CAD is then 28/(1 + 28) = 0.97 (97%). Conversely, if the exercise test did not demonstrate 1 mm ST segment depression, the post-test odds of CAD would be 4 x 0.33 = 1.33 (to 1), a probability of 57%, so the probability of his not having CAD is 43%. In other words, before the exercise test there was an 80% chance of CAD, while after a positive test it was 97%. Likewise, before the test the chance of the patient not having CAD was 20%, and if the test was negative it was 43%.
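The whole calculation can be sketched as follows. Note that, per the chapter's own definition of odds (p/(1 - p)), the pre-test *probability* must first be converted to odds before multiplying by the likelihood ratio; the function name is mine.

```python
# Sketch: post-test odds = pre-test odds x LR, then odds back to
# probability. Values match the exercise-test example in the text.

def posttest_prob(pretest_p, lr):
    pretest_odds = pretest_p / (1 - pretest_p)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (posttest_odds + 1)

# Exercise test: sensitivity 70%, specificity 90%.
plr = 0.70 / (1 - 0.90)   # positive likelihood ratio, = 7 up to rounding
nlr = (1 - 0.70) / 0.90   # negative likelihood ratio, about 0.33

print(round(posttest_prob(0.8, plr), 2))  # 0.97 after a positive test
print(round(posttest_prob(0.8, nlr), 2))  # 0.57 after a negative test
```

This matches what one reads off a Fagan nomogram for a pretest probability of 80% with LRs of 7 and 0.33.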


To add a bit to the confusion of using LRs, there are two lesser-used derivations of the LR, as shown in Table 14.4. One can usually assume that if not otherwise designated, the descriptions for PLR and NLR above apply.

Fig 14.8 Nomogram for interpreting diagnostic test results (Adapted from Fagan8 )

Table 14.4 Pre vs post-test probabilities

Clinical presentation | Pre test P (%) | Post test P (%) T+ | Post test F (%)


If one wanted to avoid doing the calculations, the nomogram in Fig. 14.8 can be used instead.

In summary, the usefulness of diagnostic data depends on making an accurate diagnosis based upon the use of diagnostic tests, whether the tests are radiologic, laboratory based, or physiologic. The questions to be considered by this approach include: "How does one know how good a test is in giving you the answers that you seek?" and "What are the rules of evidence against which new tests should be judged?" Diagnostic data can be sought for a number of reasons, including: diagnosis, assessing disease severity, predicting the clinical course of a disease, and predicting therapy response. That is, what is the probability my patient has disease X, what do my history and PE tell me, what is my threshold for action, and how much will the available tests help me in patient management? An example of the use of diagnostic research is provided by Miller and Shaw.7 From Table 14.5, one can see how the coronary artery calcium (CAC) score can be stratified by age and the use of the various definitions described above.

References

1. Bayes T. An essay toward solving a problem in the doctrine of chances. Phil Trans Roy Soc London 1764; 53:370-418.

2. Ledley RS, Lusted LB. Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science July 3, 1959;

4. Rifkin RD, Hood WB Jr. Bayesian analysis of electrocardiographic exercise stress testing. N Engl J Med Sept 29, 1977; 297(13):681-686.

5. McGinn T, Wyer PC, Newman TB, Keitz S, Leipzig R, For GG. Tips for learners of evidence-based medicine: 3. Measures of observer variability (kappa statistic). CMAJ Nov 23, 2004; 171(11):1369-1373.

6. Green DM, Swets JM. Signal Detection Theory and Psychophysics. New York: Wiley; 1966.

7. Miller DD, Shaw LJ. Coronary artery disease: diagnostic and prognostic models for reducing patient risk. J Cardiovasc Nurs Nov-Dec 2006; 21(6 Suppl 1):S2-16; quiz S17-19.

8. Fagan TJ. Nomogram for Bayes's theorem (C). N Engl J Med 1975; 293:257.


Part III

This Part addresses statistical concepts important for the clinical researcher. It is not a Part that is for statisticians, but rather approaches statistics from a basic foundation standpoint.

Statistician: Oh, so you already have calculated the p-value?
Surgeon: Yes, I used multinomial logistic regression.
Statistician: Really? How did you come up with that?
Surgeon: Well, I tried each analysis on the SPSS drop-down menu, and that was the one that gave the smallest p-value.

Vickers A. Shoot first and ask questions later. Medscape Bus Med 2006; 7(2), posted 07/26/2006.


S.P. Glasser (ed.), Essentials of Clinical Research. © Springer Science + Business Media B.V. 2008

Chapter 15
Statistical Power and Sample Size: Some Fundamentals for Clinician Researchers

J. Michael Oakes

Surgeon: Say, I’ve done this study but my results are disappointing.

Statistician: How so?

Surgeon: The p-value for my main effect was 0.06.

Statistician: And?

Surgeon: I need something less than 0.05 to get tenure.

Abstract This chapter aims to arm clinical researchers with the necessary conceptual and practical tools (1) to understand what sample size or power analysis is, (2) to conduct such analyses for basic low-risk studies, and (3) to recognize when it is necessary to seek expert advice and input. I hope it is obvious that this chapter aims to serve as a general guide to the issues; specific details and mathematical presentations may be found in the cited literature. Additionally, it should be obvious that this discussion of statistical power is focused, appropriately, on quantitative investigations into real or hypothetical effects of treatments or interventions. It does not address qualitative study designs. The ultimate goal here is to help the practicing clinical researcher get started with power analyses.

Introduction

My experience as both an educator and a collaborator is that clinical researchers are frequently perplexed, if not unnerved, by questions of statistical power, detectable effect, number-needed-to-treat, sample size calculations, and related concepts. Those who have taken a masters-level biostatistics course may even become paralyzed by authoritative cautions, supporting the quip that a little knowledge can be a dangerous thing. Unfortunately, anxiety and misunderstanding seem to push some to ignore the issues while others appear rigid in their interpretations, rejecting all 'under-powered' studies as useless. Neither approach is helpful to researchers or medical science.

I do not believe clinician researchers, especially, are to blame for the trouble. My take is that when it comes to statistical power and related issues, instructors, usually


biostatisticians, are too quick to present equations and related algebra instead of the underlying concepts of uncertainty and inference. Such presentations are understandable, since the statistically-minded often think in terms of equations and are obviously equipped with sufficient background information and practice to make sense of them. But the same is not usually true of clinicians, or perhaps even some epidemiologists. Blackboards filled with Greek letters and algebraic expressions, to say nothing of terms like 'sampling distribution,' only seem to intimidate if not turn off students eager to understand and implement the ideas. What is more, I have come across strikingly few texts or articles aimed at helping clinician-researchers understand key issues. Most seem to address only experimental (e.g., drug trial) research, offer frightening cautions, or consider only painfully simple studies. Little attention is paid to less glorious but common clinical studies such as sample-survey research or perhaps the effects of practice/cultural changes to an entire clinic. Too little has been written about the conceptual foundations of statistical power, and even less of this is tailored for clinician-researchers.

I find that clinical researchers gain a more useful understanding of, and appreciation for, the concepts of statistical power when the ideas are first presented with some utilitarian end in mind, and when the ideas are located in the landscape of inference and research design. Details and special cases are important, but an emphasis must be placed on simple and concrete examples relevant to the audience. Mathematical nuance and deep philosophical issues are best reserved for the few who express interest. Still, I agree with Baussel and Li,1 who write,

… a priori consideration of power is so integral to the entire design process that its consideration should not be delegated to individuals not integrally involved in the conduct of an investigation…

Importantly, emphasis on concepts and understanding may also be sufficient for clinical researchers, since I believe the following three points are critical to a successful power analysis:

1. The More, the Merrier – Except for exceptional cases when study subjects are exposed to more than minimal risk, there is hardly any pragmatic argument for not enrolling as many subjects as the budget permits. Over-powered studies are not much of a threat, especially when authors and readers appreciate the abundant limitations of p-values and other summary measures of 'significance.' While perhaps alarming, I have found analytic interest in subgroup comparisons or other 'secondary' aims to be universal; few researchers are satisfied when 'real' analyses are limited to main study hypotheses. It follows that more subjects are always needed. But let me be clear: when risk is elevated, clinical researchers must seek expert advice.

2. Use Existing Software – Novice study designers should rely on one or more of the high-quality and user-friendly software packages available for calculating statistical power. Novice researchers should not attempt to derive new equations, nor should they attempt to implement any such equation into a spreadsheet package. The possibility of error is too great, and efforts to 're-invent the wheel' will likely lead to mistakes. Existing software packages have been tested and will


give the correct answer, provided researchers input the correct information. This means, of course, that the researcher must understand the function of each input parameter and the reasonableness of the values entered.

3. If No Software, Seek Expert – If existing sample-size software cannot accommodate a particular study design or an analysis plan, novice researchers should seek expert guidance from biostatistical colleagues or like-minded scholars. Since existing software accommodates many (sophisticated) analyses, exceptions mean something unusual must be considered. Expert training, experience, and perhaps an ability to simulate data are necessary in such circumstances. Expert advice is also necessary when the risks of research extend beyond the minimal threshold.

The upshot is that clinical researchers need to minimally know what sort of sample size calculation they need and, at most, what related information should be entered into existing software. Legitimate and accurate interpretation of output is then paramount, as it should be. Concepts matter most here, and are what seem to be retained anyway.2

Accordingly, this chapter aims to arm clinical researchers with the necessary conceptual and practical tools (1) to understand what sample size or power analysis is, (2) to conduct such analyses for basic low-risk studies, and (3) to recognize when it is necessary to seek expert advice and input. I hope it is obvious that this chapter aims to serve as a general guide to the issues; specific details and mathematical presentations may be found in the cited literature. Additionally, it should be obvious that this discussion of statistical power is focused, appropriately, on quantitative investigations into real or hypothetical effects of treatments or interventions. I do not address qualitative study designs. The ultimate goal here is to help the practicing clinical researcher get started with power analyses. Alternative approaches to inference and 'statistical power' continue to evolve and merit careful consideration if not adoption, but such a discussion is far beyond the simple goals here; see.3,4

Fundamental Concepts

Inference

Confusion about statistical power often begins with a misunderstanding about the point of conducting research. In order to appreciate the issues involved in a power calculation, one must appreciate that the goal of research is to draw credible inferences about the phenomena under study. Of course, drawing credible inferences is difficult because of the many errors and complications that can cloud or confuse our understanding. Note that, ultimately, power calculations aim to clarify and quantify some of these potential errors.

To make the issues concrete, consider patient A with a systolic pressure of 140 mm Hg and patient B with a reading of 120 mm Hg. Obviously, the difference between the two measurements is 20 mm Hg.


So, what is the issue? Well, as any clinician knows, either or both of the blood-pressure measures could (probably do!) incorporate error. Perhaps the cuff was incorrectly applied, or the clinician misread the sphygmomanometer. Or perhaps the patient suffers white-coat hypertension, making the office-visit measure different from the patient's 'true' measure. Any number of measurement errors can be at work, making the calculation of the observed difference between patients an error-prone measure of the true difference, symbolized by ∆, the uppercase Greek letter 'D', for the True or philosophically perfect difference.

It follows that what we actually measure is a mere estimate of the thing we are trying to measure, the True or parameter value. We measure blood pressures in both patients and calculate a difference, 20, but no clinician will believe that the true or real difference in pressures between these two individuals is precisely 20, now or for all time. Instead, most would agree that the quantity 20 is an estimate of the true difference, which we may believe is 20 plus or minus 5 mm Hg, or whatever. And this difference changes over time, if not place.

This point about the observed difference of 20 being an estimate for the true difference is key. One takes measures, but appreciates that imprecision is the rule. How can we gauge the degree of measurement error in our estimate d = 20 of ∆? One way is to take each patient's blood pressure (BP) multiple times and, say, average them. It may turn out that patient A's BP was measured as 140, 132, 151, 141 mm Hg, and patient B might have measures 120, 121, 123, 119, 117. The average of patient A's four measurements is, obviously, 141 mm Hg, while patient B's five measurements yield an average of 120 mm Hg. If we use these presumably more accurate average BPs, we now have an estimated difference of 141 - 120 = 21 mm Hg.
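The averaging step can be checked numerically; this is only the arithmetic from the text, restated.

```python
# The repeated blood-pressure measures from the text, averaged.
patient_a = [140, 132, 151, 141]
patient_b = [120, 121, 123, 119, 117]

mean_a = sum(patient_a) / len(patient_a)
mean_b = sum(patient_b) / len(patient_b)

print(mean_a)           # 141.0
print(mean_b)           # 120.0
print(mean_a - mean_b)  # 21.0 -- the revised estimated difference, mm Hg
```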


When comparing blood pressures, which is more important: claiming there is no difference when in fact there is one, or claiming there is a difference when in fact there is not one? It is questions like these that motivate our discussion of statistical power.

The basic goal of a 'power analysis' is to appreciate approximately how many subjects are needed to detect a meaningful difference between two or more experimental groups. In other words, the goal of power analysis is to consider the naturally occurring variance of the outcome variable, errors in measurement, and the impact of making certain kinds of inferential errors (e.g., claiming a difference when in truth the two persons or groups are identical). Statistical power calculations are about inference, or making (scientific) leaps of faith from real-world observations to statements about the underlying truth.

Notice above that I wrote 'approximately.' This is neither a mistake nor a subtle nuance. Power calculations are useful to determine if a study needs 50 or 100 subjects; the calculations are not useful in determining whether a study needs 50 or 52 subjects. The reason is that power calculations are loaded with assumptions, too often hidden, about distributions, measurement error, statistical relationships, and perfectly executed study designs. As mentioned above, it is rare for such perfection to exist in the real world. Believing a given power analysis is capable of differentiating the utility of a proposed study to within a handful of study subjects is an exercise in denial and is sure to inhibit scientific progress.

I also wrote that power was concerned with differences between 'two groups.' Of course, study designs with more groups are possible and perhaps even desirable. But power calculations are best done by keeping comparisons simple, as when only two groups are involved. Furthermore, this discussion centers on elementary principles, and so simplicity is paramount.

The other important word is 'meaningful.' It must be understood that power calculations offer nothing by way of meaning; manipulation of arbitrary quantities through some algebraic exercise is a meaningless activity. The meaningfulness of a given power calculation can only come from scientific/clinical expertise. To be concrete, while some may believe a difference of, say, 3 mm Hg of systolic blood pressure between groups is important enough to act on, others may say such a difference is not meaningful even if it is an accurate measure of the difference. The proper attribution of meaningfulness, or perhaps importance or utility, requires extra-statistical knowledge. Clinical expertise is paramount.

Standard Errors

A fundamental component of statistical inference is the idea of the 'standard error.' As an idea, a standard error can be thought of as the standard deviation of a test statistic in the sampling distribution. You may be asking, what does this mean?

Essentially, our simplified approach to inference is one of replicating a given study over and over again. This replication is not actually done, but is instead a thought experiment, or theory, that motivates inference. The key is to appreciate that


for each hypothetical and otherwise identical study we observe a treatment effect or some other outcome measure. Because of natural variation and such, for some studies the test statistic is small/low, for others, large/high. Hypothetically, the test statistic is distributed in a bell-shaped curve, with one point/measure for each hypothetical study. This distribution is called the sampling distribution. The standard deviation (or spread) of this sampling distribution is the standard error of the test statistic. The smaller the standard deviation, the smaller the standard error.

We calculate standard errors in several ways depending on the study design and the chosen test statistics. Standard error formulas for common analytic estimators (i.e., tests) are shown in Fig. 15.1. Notice the key elements of each standard error formula are the variance of the outcome measure, s², and the sample size, n. Researchers must have a sound estimate of the outcome measure variance at planning. Reliance on existing literature and expertise is a must. Alternative approaches are discussed by Browne.5

Since smaller standard errors are usually preferred (as they imply a more precise test statistic), one is encouraged to use quality measurement tools and/or larger sample sizes.

Hypotheses

A fundamental idea is that of the 'hypothesis,' or 'testable conjecture.' The term 'hypothesis' may be used synonymously with 'theory.' A necessary idea here is that the researcher has a reasoned and a priori guess or conjecture about the outcome


of their analysis or experiment. The a priori (or in advance) aspect is critical, since power analysis is done in the planning stage of a study.

For purposes here, hypotheses may be of just two types: the null and the alternative. The null hypothesis is, oddly, what is not expected from the study. The alternative hypothesis is what is expected given one's theory. This odd reversal of terms or logic may be a little tricky at first, but everyone gets used to it. Regardless, the key idea is that researchers marshal information and evidence from their study to either confirm or disconfirm (essentially reject) their a priori null hypothesis. For us, a study is planned to test a theory by setting forth a null and alternative hypothesis and evaluating data/results accordingly. Researchers will generally be glad to observe outcomes that refute null hypotheses.

Several broad kinds of hypotheses are important for clinical researchers, but two merit special attention:

1. Equality of groups – The null hypothesis is that, say, the mean in the treatment group is strictly equal to the mean in the control group; symbolically μT = μC, where μT represents the mean of the treatment group and μC represents the mean of the control group. The analysis conducted aims to see if the treatment is strictly different from control; symbolically μT ≠ μC. As can be imagined, this strict equality-or-difference hypothesis is not much use in the real world.

2. Equivalence of groups – In contrast to equality designs, equivalence designs do not consider just any difference to be important, even if statistically significant! Instead, equivalence studies require that the identified difference be clinically meaningful, above some pre-defined value, d. The null hypothesis in equivalence studies is that the (absolute value of the) difference between treatment and control groups is larger than some meaningful value; symbolically, |μT − μC| ≥ d. The alternative hypothesis is then that the observed difference is smaller than the predefined threshold value d, or in symbols |μT − μC| < d. If the observed difference is less than d, then the two 'treatments' are viewed as equivalent, though this does not mean strictly equal.

Finally, it is worth pointing out that authors typically abbreviate the null hypothesis as H0 and the alternative hypothesis as HA.

Type I and Type II Error

When it comes to elementary inference, it is useful to define two kinds of errors. Using loose terms, we may call them errors of commission and omission, with respect to stated hypotheses.

Errors of commission are those of inferring a relationship between study variables when in fact there is not one. In other words, errors of commission are rejecting a null hypothesis (no relationship) when in fact it should have been accepted. In other words, you have done something you should not have.

Errors of omission are those of not inferring a relationship between study variables when in fact there is a relationship. In other words, not rejecting a null in favor of the alternative when in fact the alternative is true.


(These two error types are conventionally denoted α and β, the first two letters of the Greek alphabet.)

Both Type I and Type II errors are quantified as probabilities. The probability of incorrectly rejecting a true null hypothesis – or accepting that there is a relationship when in fact there is not – is α (ranging from 0 to 1). So, the Type I error may be 0.01, 0.05, or any other such value. The same goes for the Type II error.

For better or worse, by convention researchers typically plan studies with a Type I error rate of 0.05, or 5%, and a Type II error rate of 0.20 (20%) or less. Notice this implies that making an error of commission (5% alpha, or Type I, error) is four times more worrisome than making an error of omission (20% beta, or Type II, error). By convention, we tolerate less Type I error than Type II error. Essentially, this relationship reflects the conservative stance of science: scientists should accept the null (no relationship) unless there is strong evidence to reject it and accept the alternative hypothesis. That is the scientific method.

Statistical Power

We can now define statistical power. Technically, power is the complement of the Type II error (i.e., the difference between 1 and the amount of Type II error in the study). A simple definitional equation is:

Power = 1 − β
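As a sketch of how the definition cashes out, the power of a two-sided two-sample z-test for a difference in means can be computed under the normal approximation, power ≈ Φ(Δ/SE − z at 1 − α/2). The parameter values are illustrative, and, as the chapter advises, a real study should use dedicated power software rather than hand-rolled code.

```python
# Sketch: power of a two-sided, two-sample z-test under the normal
# approximation. For planning only; real studies should use tested
# power-analysis software.
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_ppf(p):
    """Inverse standard normal CDF by bisection (adequate for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Power to detect a true mean difference delta, equal-sized groups."""
    se = sd * math.sqrt(2 / n_per_group)      # SE of the difference
    z_crit = norm_ppf(1 - alpha / 2)          # two-sided critical value
    return norm_cdf(delta / se - z_crit)

# Detect a 5 mm Hg difference, sd 15, 100 subjects per group:
print(round(power_two_sample(delta=5, sd=15, n_per_group=100), 2))  # 0.65
```

At 65%, this hypothetical study falls short of the conventional 80% power (β ≤ 0.20) discussed above, so a planner would enlarge the groups.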

Researcher's decision vs. Mother Nature (the true state of the null hypothesis):

              H0 true         H0 false
Reject H0     Type I error    Correct
Accept H0     Correct         Type II error
