Measures of Association (Đo lường kết hợp)
Dao Thi Minh An, MD PhD - Faculty of Public Health (Khoa y tế công cộng) - Hanoi Medical University (Đại học y Hà Nội)
03/10/15
Cohort study
◆ Relative risk (RR) = CIe/CI0 = [a/(a + b)] / [c/(c + d)]
◆ CIe: cumulative incidence in the exposed group
◆ CI0: cumulative incidence in the unexposed group
Case-control study
◆ Odds ratio (OR) = ad/bc
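As a minimal sketch of the two measures above, assuming the usual 2 x 2 layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease). The function names and the counts are illustrative and do not come from the lecture.

```python
def relative_risk(a, b, c, d):
    """RR = CIe / CI0 = [a/(a+b)] / [c/(c+d)] (cohort study)."""
    ci_exposed = a / (a + b)      # cumulative incidence, exposed group
    ci_unexposed = c / (c + d)    # cumulative incidence, unexposed group
    return ci_exposed / ci_unexposed


def odds_ratio(a, b, c, d):
    """OR = ad / bc (case-control study)."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only
a, b, c, d = 20, 80, 10, 90
print(relative_risk(a, b, c, d))  # 2.0
print(odds_ratio(a, b, c, d))     # 2.25
```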
◆ Attributable risk (AR)
◆ AR = Ie - Io = CIe - CIo = a/(a+b) - c/(c+d)
◆ Attributable risk percent (AR%)
◆ AR% = AR/Ie x 100 = (Ie - Io)/Ie x 100
◆ Population attributable risk (PAR)
◆ PAR = IT - Io
◆ or PAR = (AR)(Pe)
◆ IT: disease rate in the total population
◆ Io: disease rate in the unexposed group
◆ Pe: proportion of exposed individuals in the population
◆ Population attributable risk percent (PAR%)
◆ PAR% = PAR / IT x 100
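The attributable-risk formulas can be sketched in a few lines. This assumes the whole population is described by one 2 x 2 table (a, b = exposed with and without disease; c, d = unexposed with and without disease), so that IT and Pe can be read off the same table. The function name and counts are hypothetical.

```python
def attributable_measures(a, b, c, d):
    """Return AR, AR%, PAR and PAR% from 2 x 2 cell counts."""
    n = a + b + c + d
    ie = a / (a + b)          # Ie: incidence in the exposed
    io = c / (c + d)          # Io: incidence in the unexposed
    it = (a + c) / n          # IT: incidence in the total population
    pe = (a + b) / n          # Pe: proportion exposed
    ar = ie - io
    par = it - io             # equals ar * pe
    return {
        "AR": ar,
        "AR%": ar / ie * 100,
        "PAR": par,
        "PAR%": par / it * 100,
        "AR*Pe": ar * pe,
    }


# Hypothetical counts, for illustration only
result = attributable_measures(20, 80, 10, 90)
for name, value in result.items():
    print(name, round(value, 4))
```

With these counts the two expressions for PAR (IT - Io and AR x Pe) agree, which is a quick check on the arithmetic.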
Prevalence
◆ Prevalence (a proportion):
– the proportion of the population at a given time that
have the factor of interest
– Prevalence of an exposure
» what proportion of this class have BMI > 25
» what proportion of this class have hypertension
◆ Point Prevalence - existing cases at a point in time
◆ Period Prevalence - existing cases plus those
developing over a specified period of time
$$P = \frac{\text{Number of prevailing cases}}{\text{Number in the population}}$$
Prevalence
Choice of denominator may be difficult
◆ in 1997 there were 1854 cases of syphilis in Harris County
◆ what should be used for the denominator?
◆ 55 cases of a new disease reported in three states
◆ what should be used for the denominator?
Incidence
◆ Incidence density: the rate at which individuals develop the disease (outcome) during a specific period of time, using total person-time as the denominator. One subject followed for one year contributes one person-year (PY).
$$I_{\text{density}} = \frac{\text{Number of cases of disease during a given time period}}{\text{Total person-time}}$$
$$I = \frac{\text{Number of new cases of disease over a study period}}{\text{Population at risk at the start of the study period}}$$
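To make the difference between the two incidence measures concrete, here is a small sketch. The follow-up times and case count are invented for illustration, and the function names are not from the lecture.

```python
def cumulative_incidence(new_cases, population_at_risk_at_start):
    """New cases divided by the population at risk at the start."""
    return new_cases / population_at_risk_at_start


def incidence_density(new_cases, person_years):
    """New cases divided by total person-time of follow-up."""
    return new_cases / sum(person_years)


# Hypothetical cohort of 5 subjects; each entry is years of follow-up
follow_up_years = [2.0, 2.0, 1.5, 0.5, 2.0]   # 8.0 person-years in total
new_cases = 2

print(cumulative_incidence(new_cases, len(follow_up_years)))  # 0.4
print(incidence_density(new_cases, follow_up_years))          # 0.25 per PY
```

The first figure is a proportion over the study period. The second is a rate per person-year, which matters when subjects are followed for unequal lengths of time.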
Incidence, Prevalence
What was prevalence of disease in 1992?
What is risk of developing disease within 2 years?
Measure of Disease Association
Case Reports and Medical Advancement
These all started with case reports - what study design next?
What next?
– create case definition
– active case finding
The case-control study
◆ Retrospective study design
Retrospective vs prospective study designs
                            Disease
Risk Factor                 Present (Cases)    Absent (Controls)
Present (Exposed)           a                  b
Absent (Not Exposed)        c                  d

Prospective (Cohort): subjects are grouped by risk factor (rows)
Retrospective (Case-control): subjects are grouped by disease (columns)
Case-control studies
                 Disease Status
Exposure         Yes      No
Yes              a        b
No               c        d
Case-control studies
◆ Selection of cases
– Case definition is very important
– All cases have an equal probability for
selection: reduce selection bias
◆ Selection of controls
– Identical in every respect except disease of
interest
Case control studies
◆ More easily replicated
◆ Can test hypotheses
Weaknesses
◆ Uncertainty in the exposure-disease time relationship
◆ Representativeness of cases or controls
Case-control and the Odds Ratio
                 Disease
Exposure         Y        N        Total
Y                a        b        a+b
N                c        d        c+d
Total            a+c      b+d      N

Odds of exposure if case = [a / (a+c)] / [c / (a+c)] = a/c
Odds of exposure if control = [b / (b+d)] / [d / (b+d)] = b/d
Odds ratio (odds of exposure in cases relative to controls) = (a/c)/(b/d) = (a*d)/(b*c)
How much risk
is too much risk?
Case-control and the Odds Ratio
                Cases    Controls   Total
Exposed         70       30         100       RR = (70/100) ÷ (30/100) = 2.3
Not Exposed     30       70         100       OR = (70/30) ÷ (30/70) = 5.4
Total           100      100        200

                Cases    Controls   Total
Exposed         70       300        370       RR = (70/370) ÷ (30/730) = 4.6
Not Exposed     30       700        730       OR = (70/30) ÷ (300/700) = 5.4
Total           100      1000       1100
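The point of the two tables can be checked directly: when the number of controls is multiplied by ten, the "RR" computed from the row totals changes, while the OR does not. The sketch below uses the counts from the tables; the function names are illustrative only.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure in cases (a/c) over odds in controls (b/d)."""
    return (a / c) / (b / d)


def risk_ratio_from_rows(a, b, c, d):
    """[a/(a+b)] / [c/(c+d)]; only meaningful when the rows are true cohorts."""
    return (a / (a + b)) / (c / (c + d))


# (a, b, c, d) = exposed cases, exposed controls,
#                unexposed cases, unexposed controls
tables = {
    "100 controls": (70, 30, 30, 70),
    "1000 controls": (70, 300, 30, 700),
}

for label, cells in tables.items():
    rr = risk_ratio_from_rows(*cells)
    or_ = odds_ratio(*cells)
    print(f"{label}: RR = {rr:.1f}, OR = {or_:.1f}")
```

Because the investigator chooses how many controls to sample, the row totals in a case-control study are arbitrary. This is why the odds ratio, not the relative risk, is the measure reported.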
TSS - 3 case-control studies
                Cases    Controls
Tampon users    50       43         reported p = 0.02
Non-users       0        7

                Cases    Controls
Tampon users    30       71         reported p = 0.014
Non-users       1        22

                Cases    Controls
Tampon users    12       32         p = 0.20
Non-users       0        8
OR = (12/0) / (32/8) = 6.5
NOTE: A correction factor of 0.5 was added to each cell when 1 cell contained 0
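The 0.5 correction in the note can be reproduced as follows. This is a sketch that assumes the correction is applied to all four cells whenever any cell is zero, as the note describes; the function name is hypothetical.

```python
def odds_ratio_corrected(a, b, c, d, correction=0.5):
    """OR = ad/bc, adding `correction` to every cell if any cell is 0."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# Third table: 12 cases and 32 controls used tampons,
# 0 cases and 8 controls did not.
print(round(odds_ratio_corrected(12, 32, 0, 8), 1))  # 6.5
```

Without the correction the odds among cases would be 12/0, which is undefined.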
Study Methods
◆ CDC - 1: 52 TSS cases with age-matched
acquaintance controls
◆ Wisconsin Study: 31 cases, 93 controls from
gynecologic clinics, matched only for menstruation
◆ Utah: 12 TSS cases, 40 neighborhood-matched
controls
Matched Pair analysis
How many cases used tampons continually?
How many cases did not use tampons continually?
What about controls?
How Big is Big?
– Is an OR of 16 big?
– Is an OR of 16 statistically significant?
BRIEF INTERLUDE - STATISTICS
about inference and statistical association
How Big is Big?
– Is an OR of 16 big?
– Is an OR of 16 statistically significant?
$$\frac{\text{Outcome} - \text{Expected Outcome}}{\sqrt{\text{Variance of Expected Outcome}}}$$
◆ The whole purpose for doing research is to learn
something new.
– The result of a research project is the goal
» this is the important information that the researchers want the informed public to remember
– As we read the literature - we should ask ourselves:
» What is the major result?
» What does this result mean?
Statistical Issues in Epidemiology
◆ We have to remember that epidemiologic studies draw inferences about the experiences of an entire population based on an evaluation of only a sample.
◆ When studying a sample of the population the observed associations can be due to:
Statistical Issues in Epidemiology
◆ What do we mean by chance, and how does this relate to determining a “true association”?
◆ Where do we start?
Statistical Issues in Epidemiology
◆ Association does not mean cause and effect
◆ Assessing causality involves judgement based on
the totality of evidence
◆ Making judgements about causality involves a chain
of logic that addresses two major areas:
1 Whether the observed association is valid
2 Whether the totality of evidence supports a
judgement of causality
Statistical Issues in Epidemiology
◆ The evaluation of the role of chance is done
in 2 steps
1 Estimate the magnitude of the association
– We do this with OR, RR, correlations, AR
2 Hypothesis testing:
– Calculate a test statistic, obtain a p value or
confidence interval
Statistical Issues in Epidemiology
◆ p-value: the probability of obtaining a sample showing
an association of the observed size or larger by chance alone under the hypothesis that no association exists
◆ Confidence interval: a range of values that one can say,
with a specific degree of confidence, contains the true population value.
◆ Sample statistic: a number which describes some aspect
of a sample which represents a population.
Statistical Issues in Epidemiology
◆ This can be done by calculating a test statistic of the general format:
$$\frac{\text{Outcome} - \text{Expected Outcome}}{\sqrt{\text{Variance of Expected Outcome}}}$$
◆ The selection of the particular test used depends on the specific hypothesis being tested and the characteristics of the collected data.
Statistical Issues in Epidemiology
◆ If we were to toss a coin 30 times while trying to determine if it was a fair coin, and we got 24 heads, how would we determine if 24 was different from the expected number?
$$\frac{\text{Observed} - \text{Expected (under the null)}}{\text{Estimated variability in the sample}}$$
Statistical Issues in Epidemiology
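$$\frac{\text{Observed} - \text{Expected (under the null)}}{\text{Estimated variability in the sample}} = \frac{24/30 - 15/30}{\text{Variability}}$$
Variability = [p(1-p)/n]^(1/2) = [(24/30 * 6/30)/30]^(1/2) = 0.07
[ (24/30) - (15/30) ] / 0.07 = 4.3 (about 4.1 if the variability is kept unrounded at 0.073)
p < 0.001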
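The coin-toss calculation can be reproduced in a few lines. The sketch follows the slide in estimating variability from the observed proportion, and adds an exact binomial tail probability for a fair coin as a cross-check. The exact-tail step is an addition for illustration and is not part of the slide.

```python
from math import comb, sqrt

n, heads, p_null = 30, 24, 0.5

p_hat = heads / n                              # observed proportion, 0.8
variability = sqrt(p_hat * (1 - p_hat) / n)    # [p(1-p)/n]^(1/2), unrounded
z = (p_hat - p_null) / variability             # (observed - expected) / variability

# Exact probability of 24 or more heads in 30 tosses of a fair coin
tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n

print(round(variability, 3))   # 0.073
print(round(z, 1))             # 4.1 (a hand calculation rounding to 0.07 gives 4.3)
print(tail < 0.001)            # True
```

The exact tail is a little under 1 in 1000, which agrees with p < 0.001.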
Statistical Issues in Epidemiology
◆ The p value indicates the probability that findings at least as extreme as those observed would occur by chance alone if the null hypothesis were true.
◆ In 1000 experiments of 30 tosses each with a fair coin, we would expect only about 1 to result in 24 heads or more
Statistical Issues in Epidemiology
◆ A statistically significant finding does not mean that the results DID NOT occur by chance - only that it is unlikely that they occurred by chance
◆ A non-significant finding does not mean that the
results DID occur by chance.
Statistical Issues in Epidemiology
◆ More often in epidemiology we are examining discrete data - the 2 x 2 table presents discrete data. Here we are testing whether the distribution of counts in the 4 cells is different from that expected under the null hypothesis
Statistical Issues in Epidemiology
◆ But how do we determine the expected value for the cells of a 2 x 2 table?
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed Count in a category
E = Expected Count in a category
Σ = Sum of all categories
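One common way to answer the question on this slide is sketched below. It assumes the standard rule that, under the null hypothesis of no association, the expected count in a cell is (row total x column total) / grand total. The slide text does not spell out this rule, so treat it as an assumption; the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: exposed / not exposed; columns: cases / controls
table = [[70, 30], [30, 70]]
print(expected_counts(table))  # [[50.0, 50.0], [50.0, 50.0]]
print(chi_square(table))       # 32.0
```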
Statistical Issues in Epidemiology
◆ All tests of statistical significance lead to a
– probability statement
– usually expressed as a p value
◆ The p-value obtained is based on the principle that, given the distribution of interest, it is possible to
calculate the exact probability or likelihood of
obtaining a result at least as extreme as that
observed by chance alone assuming there is truly no association.
Statistical Issues in Epidemiology
◆ A probability of 0.05 is the usual (arbitrary) cut-off
level for statistical significance
◆ If p <0.05, we conclude that chance is an unlikely explanation for the finding. The null hypothesis is rejected, and the statistical association is said to be significant.
◆ If p >0.05, we conclude that chance cannot be
excluded as an explanation for the finding; we fail to reject the null hypothesis.
Statistical Issues in Epidemiology
◆ No p value, however small, completely excludes chance
◆ No p value, however large, completely mandates chance
◆ p values only evaluate the role of chance
– they say nothing about other alternative explanations or about causality
◆ p values reflect the strength of the association and the study sample size
Statistical Issues in Epidemiology
◆ A small difference may achieve statistical
significance if the sample size is large
◆ A large difference may not achieve statistical
significance if the sample size is too small
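The effect of sample size can be illustrated with two hypothetical 2 x 2 tables that have identical proportions (and therefore the same odds ratio) but differ 100-fold in size. The sketch uses the shortcut chi-square formula for a 2 x 2 table and the 1-degree-of-freedom p value; both function names are illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return erfc(sqrt(chi2 / 2))


small = (11, 9, 9, 11)                    # hypothetical study, n = 40
large = tuple(100 * x for x in small)     # same proportions, n = 4000

for label, cells in (("small", small), ("large", large)):
    chi2 = chi_square_2x2(*cells)
    print(label, round(chi2, 1), p_value_1df(chi2) < 0.05)
```

The odds ratio is about 1.5 in both tables, but only the larger study reaches p < 0.05.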
Statistical Issues in Epidemiology
◆ We address these problems by calculating
confidence intervals (CI)
– CI indicates the range within which the true magnitude of effect lies with a certain degree of assurance. The degree of assurance is defined by the significance level you choose (for example, a 0.05 level corresponds to a 95% CI).
◆ The CI gives all the information of a p value PLUS the expected range of effect sizes.
Statistical Issues in Epidemiology
◆ If the null value is included in a 95% confidence
interval, then the corresponding p value is, by
definition, greater than 0.05.
◆ If the null value is not included, the association is considered to be statistically significant.
◆ WHAT IS THE NULL VALUE for Odds Ratios and
Relative Risks (Rate Ratios)?
Statistical Issues in Epidemiology
◆ Test-based method: the confidence interval can be estimated using the chi-square test statistic (Miettinen, Am J Epidemiol 103:226-235, 1976)
$$95\%\ CI = OR^{\left(1 \pm 1.96/\sqrt{\chi^2}\right)}$$
Statistical Issues in Epidemiology
Taylor Series: to estimate the lnOR variance (Woolf, Ann)
$$95\%\ CI = OR \times e^{\left[\pm 1.96\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\right]}$$
Note: e is a function on your calculator. You need a key marked e^x, and you enter the OR times e raised to the power of the result between the brackets [ ]
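The same calculation without a calculator key: a sketch of the Taylor-series (Woolf) interval, in which the standard error of ln(OR) is the square root of 1/a + 1/b + 1/c + 1/d. The counts are the 70/30/30/70 table from earlier in the lecture and the function name is illustrative.

```python
from math import exp, sqrt


def woolf_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the Taylor-series variance of ln(OR)."""
    or_hat = (a * d) / (b * c)
    se_ln_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = or_hat * exp(-z * se_ln_or)
    upper = or_hat * exp(z * se_ln_or)
    return or_hat, lower, upper


or_hat, lower, upper = woolf_ci(70, 30, 30, 70)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```

The formula breaks down if any cell is zero, because 1/0 is undefined. That is one reason a 0.5 correction is sometimes added to every cell.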
Statistical Issues in Epidemiology
Taylor Series: to estimate the lnRR variance (Katz, Biometrics, 34:469, 1978)
$$95\%\ CI = RR \times e^{\left[\pm 1.96\sqrt{\frac{b/a}{a+b}+\frac{d/c}{c+d}}\right]}$$
Note: e is a function on your calculator. You need a key marked e^x, and you enter the RR times e raised to the power of the result between the brackets [ ]
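A sketch of the Taylor-series interval for the relative risk, assuming the usual variance of ln(RR), b/[a(a+b)] + d/[c(c+d)], which equals 1/a - 1/(a+b) + 1/c - 1/(c+d). The counts are hypothetical and treated as a cohort (a, b = exposed with and without disease; c, d = unexposed with and without disease).

```python
from math import exp, sqrt


def katz_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) using the Taylor-series variance of ln(RR)."""
    rr = (a / (a + b)) / (c / (c + d))
    se_ln_rr = sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    lower = rr * exp(-z * se_ln_rr)
    upper = rr * exp(z * se_ln_rr)
    return rr, lower, upper


# Hypothetical cohort: 70 of 100 exposed and 30 of 100 unexposed develop disease
rr, lower, upper = katz_ci(70, 30, 30, 70)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```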
Statistical Issues in Epidemiology
◆ Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample
◆ The p value indicates the probability or likelihood of obtaining a result at least as extreme as that
observed in a study by chance alone, assuming that there is truly no association between the study
variables.