National Institute of Infectious Disease
Validity of a new test:
• Sensitivity
• Specificity
Validity: the ability of a test to correctly predict or identify those who truly have the characteristic the test is trying to detect, and to exclude those who do not have the characteristic.
• In molecular epidemiology, a new test is validated by its ability to discriminate strains that are epidemiologically related from those that are not.
Validity is determined by comparing the observed results to a reference standard (the “truth” or “gold standard”).
• In molecular epidemiology, validity is determined empirically.
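As a rough illustration of the two validity measures named above, the sketch below computes sensitivity and specificity from a 2x2 comparison against a reference standard. The function name and the counts are hypothetical and are not taken from the lecture; here a "positive" is assumed to mean that the typing test calls an isolate part of the epidemiologically related group.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts against a reference standard.

    tp: truly related isolates that the test calls related
    fn: truly related isolates that the test calls unrelated
    tn: truly unrelated isolates that the test calls unrelated
    fp: truly unrelated isolates that the test calls related
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=18, fp=3, fn=2, tn=27)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```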
Steps used to validate a new molecular strain typing test for epidemiologic investigation of infectious diseases
Assess whether or not the
Demonstrate that the typing information generated by the test is indistinguishable for all isolates from persons with disease in a recognized outbreak.
Select appropriate comparison isolates (geographic and temporal controls), and show that the typing information from these isolates is distinct from that of the outbreak isolates.
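The two checks described above can be sketched as a simple rule: every outbreak isolate shares one type, and no comparison (control) isolate has that type. This is a minimal sketch under that reading; the function name and the type labels ("P1", "P2", ...) are hypothetical.

```python
def passes_outbreak_validation(outbreak_types, control_types):
    """Check the two validation criteria for a recognized outbreak.

    outbreak_types: type assigned by the new test to each outbreak isolate
    control_types:  type assigned to each geographic/temporal control isolate
    """
    outbreak_set = set(outbreak_types)
    # Criterion 1: all outbreak isolates are indistinguishable (one type only).
    all_indistinguishable = len(outbreak_set) == 1
    # Criterion 2: no control isolate shares the outbreak type.
    controls_distinct = outbreak_set.isdisjoint(control_types)
    return all_indistinguishable and controls_distinct


# Hypothetical typing results, for illustration only.
print(passes_outbreak_validation(["P1", "P1", "P1"], ["P2", "P3", "P4"]))
print(passes_outbreak_validation(["P1", "P1", "P2"], ["P3", "P4"]))
```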
Steps used to validate a new test in molecular epidemiology (cont.)
• Ascertain fidelity of the typing information used:
  • temporal stability of the taxonomic unit;
  • clonality of the isolates obtained from a single host.
• Perform new analysis in
Steps used to validate a new test in molecular epidemiology (cont.)
If outbreak occurrence is uncertain:
• Show that isolates that do not belong to the clonal group do not have the same epidemiologic association.
• Ascertain fidelity of the typing information.
• If possible, eliminate the identified or putative risk factor and evaluate whether this leads to control or amelioration of the problem.
Final test of validity of a molecular typing technique:
For molecular epidemiology, a strain typing test that cannot yield any epidemiologically useful or meaningful information, no matter how simple, discriminating, or taxonomically relevant, is not valid!
Lecture 4, Part 2:
Analysis of similarity and relatedness
Principles of Molecular Epidemiology
National Institute of Infectious Disease
January 16, 2017
• Describe cladistic vs phenetic methods of classifying microbes.
• Understand the appropriate applications of similarity coefficient calculations to analyze patterns generated from strain-typing methods.
• Describe different ways to measure reliability of the relationships portrayed by a dendrogram.
• Name different analytical tools needed to conduct molecular epidemiologic investigations.
What can you do with this pattern?
All molecular techniques used to type organisms can be divided into 3 general methods:
1) Direct comparison of nucleotide sequences.
2) Gel electrophoretic fingerprinting methods (e.g., REA, RFLP/Southern blot hybridization, PFGE, and some PCR-generated patterns).
3) Hybridization matrix patterns (e.g., spoligotyping).
Questions about strain relatedness that arise in epidemiologic investigations
• Determining relatedness between two or more strains isolated from seemingly unrelated infected persons or contaminated sources.
• Distinguishing strain typing data on the basis of variations within a range of such data (e.g., the number of bands in electrophoretic patterns, or nucleotide substitutions in DNA sequences).
Questions about strain relatedness that arise in epidemiologic investigations (cont.)
• Identifying hidden groupings in a large collection of strain typing data.
• Selecting criteria for assigning a new pattern or sequence into existing sets of strain typing data.
Types of errors:
Type 1 probability error: the chance of erroneously rejecting a null hypothesis that is in fact true.
• Molecular epidemiology: the chance of erroneously rejecting a subtyping assignment that concludes there is no epidemiologic relationship.
Type 2 probability error: the chance of erroneously failing to reject a null hypothesis that is in fact false.
• Molecular epidemiology: the chance of failing to reject a subtyping assignment that concludes there is no epidemiologic relationship.
Similarity (or difference) analyses of patterns (electrophoretic, hybridization matrix) and of sequences all rely on methods that minimize these probability errors.
Phenetic methods (numerical taxonomy)
• Similarity or difference (distance) is compared between OTUs (operational taxonomic units).
Phenetic methods of classification used in epidemiology
• Methods that measure similarity or difference (dissimilarity, distance) between individual OTUs.
• Clustering methods, based on a similarity (distance) index, that identify patterns among a collection of OTUs.
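To make the first of these two kinds of methods concrete, the sketch below builds a pairwise distance matrix for several OTUs from binary band profiles. It assumes a simple mismatch proportion as the distance, which is only one possible choice; the OTU names and profiles are hypothetical.

```python
def simple_matching_distance(p, q):
    """Proportion of positions at which two equal-length binary profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(p, q)) / len(p)


def distance_matrix(profiles):
    """Return (names, matrix) of pairwise distances between all OTUs."""
    names = sorted(profiles)
    matrix = [
        [simple_matching_distance(profiles[r], profiles[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Hypothetical band presence (1) / absence (0) profiles, for illustration only.
profiles = {
    "OTU-1": "1110010101",
    "OTU-2": "1110010001",
    "OTU-3": "0011101100",
}
names, matrix = distance_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [f"{d:.2f}" for d in row])
```

A matrix of this kind is the usual input to the clustering methods that group OTUs.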
Epidemiologic applicability of the phenetic classification methods
• For epidemiologic applications, the “correctness” of the OTU assignment is ultimately based on how well this assignment explains and solves the epidemiologic problem posed.
• The advantage of the phenetic methods applied to epidemiology is that, for a given pathogen, the predictability of a classification scheme can be maximized and validated empirically through multiple epidemiologic studies.
Similarity coefficient calculation:
Simple matching index:
[Figure: band presence or absence for OTU-A and OTU-B at lane positions 1 to 10, with the cell assigned to each position]
Similarity calculations based on different indices for the above example
a = 5 (band present in both OTUs)
b = 1 (band present only in OTU-B)
c = 2 (band present only in OTU-A)
d = 2 (band absent from both OTUs)
Similarity values (1.0 = identical):
Simple matching index: S = 0.70
Sokal and Sneath’s index: Sss = 0.82
Jaccard index: SJ = 0.62
Dice index or coefficient: SD = 0.77
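The four values above can be reproduced from the counts a, b, c, and d. The sketch below uses the standard forms of the simple matching, Jaccard, and Dice indices; for Sokal and Sneath's index it assumes the form 2(a + d) / (2(a + d) + b + c), which is the variant that matches the slide's 0.82. The band vectors are a hypothetical arrangement chosen only to give the same counts, because the original lane figure is not available.

```python
def band_counts(otu_a, otu_b):
    """Count the four cells of the 2x2 band comparison table."""
    pairs = list(zip(otu_a, otu_b))
    a = sum(1 for x, y in pairs if x and y)          # present in both
    b = sum(1 for x, y in pairs if not x and y)      # present only in OTU-B
    c = sum(1 for x, y in pairs if x and not y)      # present only in OTU-A
    d = sum(1 for x, y in pairs if not x and not y)  # absent from both
    return a, b, c, d


def similarity_indices(a, b, c, d):
    """Return the four similarity indices for counts a, b, c, d."""
    return {
        "simple_matching": (a + d) / (a + b + c + d),
        "sokal_sneath": 2 * (a + d) / (2 * (a + d) + b + c),
        "jaccard": a / (a + b + c),
        "dice": 2 * a / (2 * a + b + c),
    }


# Hypothetical band positions giving a = 5, b = 1, c = 2, d = 2.
otu_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
otu_b = [1, 1, 1, 1, 1, 0, 0, 1, 0, 0]

counts = band_counts(otu_a, otu_b)
indices = similarity_indices(*counts)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Note that the indices which ignore shared absences (Jaccard, Dice) and those which count them (simple matching, Sokal and Sneath) give different values for the same pair of patterns. The Jaccard value is exactly 0.625, which the slide reports as 0.62.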
Multiple OTUs
Which similarity coefficient to use?
Cluster analysis
Comparing relatedness among multiple strains (Cluster analysis)
• Nearest neighbor (also called single linkage clustering)
• Farthest neighbor (also called complete linkage clustering)
• Unweighted pair group method with arithmetic averages (UPGMA) cluster analysis
• Ordination analysis: non-hierarchical cluster analysis method (e.g., principal component analysis or PCA)
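The three hierarchical methods listed above differ only in how the distance between two clusters is defined. The following is a minimal pure-Python sketch of agglomerative clustering, not the algorithm of any particular software package: "single" takes the smallest member-to-member distance, "complete" the largest, and "average" the mean over all member pairs (UPGMA). The distance matrix is hypothetical.

```python
from itertools import combinations


def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters (tuples of OTU indices)."""
    pair_d = [dist[i][j] for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbor
        return min(pair_d)
    if linkage == "complete":    # farthest neighbor
        return max(pair_d)
    if linkage == "average":     # UPGMA
        return sum(pair_d) / len(pair_d)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(dist, linkage="average"):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist, linkage),
        )
        height = cluster_distance(c1, c2, dist, linkage)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, height))
    return merges


# Hypothetical distances (e.g., 1 - similarity) among four OTUs.
dist = [
    [0.0, 0.1, 0.4, 0.6],
    [0.1, 0.0, 0.5, 0.7],
    [0.4, 0.5, 0.0, 0.2],
    [0.6, 0.7, 0.2, 0.0],
]

for method in ("single", "complete", "average"):
    heights = [round(h, 2) for _, _, h in agglomerate(dist, method)]
    print(method, heights)
```

With this matrix all three methods join the same pairs first and differ only in the height of the final join, which is where the choice of linkage changes the shape of a dendrogram.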
Algorithms for electrophoretic banding pattern analysis
Comparison of band positions (binary character state):
• Jaccard or Dice coefficients
Comparison of width or intensity of bands (continuous character state; curve-based):
• Pearson product-moment correlation coefficient
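For the curve-based comparison, the Pearson product-moment correlation can be computed directly on two intensity profiles sampled at the same positions along the lane. The sketch below is a plain implementation of the standard formula; the densitometric values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length curves."""
    if len(x) != len(y):
        raise ValueError("curves must have the same number of points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical intensity profiles along two gel lanes, for illustration only.
lane_1 = [2, 5, 40, 8, 3, 30, 6, 2, 25, 4]
lane_2 = [3, 6, 35, 9, 4, 10, 5, 3, 28, 5]

print(f"r = {pearson(lane_1, lane_2):.3f}")
```

Because the coefficient is unchanged by a linear rescaling of either curve, overall differences in staining intensity between lanes do not by themselves lower the similarity.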
[Figure: dendrogram (phenogram) constructed from IS6110 RFLP analysis of M. tuberculosis isolates from Sao Paulo, Brazil (Ferrazoli et al.): Dice coefficient. Settings shown on the figure: Dice (Opt: 2.00%) (Tol 2.0%-2.0%) (H>0.0% S>0.0%) [0.0%-100.0%]]
[Figure: dendrogram (phenogram) constructed from IS6110 RFLP analysis of M. tuberculosis isolates from Sao Paulo, Brazil (Ferrazoli et al.): Pearson coefficient. Settings shown on the figure: Pearson correlation (Opt: 2.00%) [0.0%-100.0%]]
Assessing reliability of relatedness measures
Phenetic methods depict mathematical relationships that attempt to predict biological relationships (evolutionary or epidemiologic).
Evolutionary vs epidemiologic relatedness
Based on a model or consensus definition
Based on evolutionary relatedness data
Based on empirically-validated data
[Figure: phylogenetic tree based on 1278 core genes of 186 E. coli strains (Kaas et al., 2012)]
[Figure: phylogenetic tree of E. coli O157:H7 by their core genes (Kaas RS et al., 2012)]
“What we observe is not nature itself, but nature
exposed to our method of questioning.”
Assessing reliability of relatedness measures (cont.)
Their “true” relationship needs to be empirically determined. In some situations, this can be done (e.g., outbreaks). If this cannot be done (e.g., the relationship of multiple nucleic acid sequences in an alignment), the data points need to be examined for their reliability by a stochastic method.
Measures of reliability of data points
Resampling methods:
Data points from the original data set containing n data points are randomly and repeatedly sampled until new sample sets, each containing n points, are created.
Example of resampling of nucleotide sequences (bootstrapping):
OTU
A: 5'-atgggcgacttcatcacgatgaggtcaggaggccactatt (ref)
B: 5'-atgggctacttcttcacgatcaggtcaggaggccactatt
C: 5'-atcggcgacttcatcacgatgaggtgtggaggccactatt
D: 5'-aagggcgacttcatcaccatgaggtcaggaggccactata
E: 5'-atgggcgattttaccactttgaggtcaggtggccggtatt
F: 5'-atggcttgctttataacgattaggtgagaaggccactatt
G: 5'-cagggcgacttcatcttagcctggtcagcaggccacgatt
Dendrogram generated from the original data set
Dendrogram generated from the resampled data set
Resampling methods (cont.)
The degree of deviation from the original tree among the pseudosamples measures the reliability of the original tree. If there is no deviation, then the original tree can be said to be unaffected by any stochastic effects.
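The resampling idea can be sketched on the seven OTU sequences from the bootstrapping example. Alignment columns are drawn with replacement until each pseudosample has the original number of positions, and a relationship seen in the original data is then re-checked in every pseudosample. This is a simplified stand-in, stated as an assumption: a full analysis would rebuild the whole tree from each pseudosample and compare groupings, whereas here only one relation (A is closer to B than to G, by proportion of differing sites) is tested. The function names, the number of replicates, and the seed are illustrative choices.

```python
import random

# The seven OTU sequences from the bootstrapping example (A is the reference).
ALIGNMENT = {
    "A": "atgggcgacttcatcacgatgaggtcaggaggccactatt",
    "B": "atgggctacttcttcacgatcaggtcaggaggccactatt",
    "C": "atcggcgacttcatcacgatgaggtgtggaggccactatt",
    "D": "aagggcgacttcatcaccatgaggtcaggaggccactata",
    "E": "atgggcgattttaccactttgaggtcaggtggccggtatt",
    "F": "atggcttgctttataacgattaggtgagaaggccactatt",
    "G": "cagggcgacttcatcttagcctggtcagcaggccacgatt",
}


def p_distance(s1, s2):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns) for name, seq in alignment.items()
    }


def a_closer_to_b_than_g(aln):
    """Relation checked in each pseudosample (an illustrative choice)."""
    return p_distance(aln["A"], aln["B"]) < p_distance(aln["A"], aln["G"])


def support(alignment, relation, replicates=1000, seed=0):
    """Fraction of pseudosamples in which the relation still holds."""
    rng = random.Random(seed)
    hits = sum(
        relation(bootstrap_alignment(alignment, rng)) for _ in range(replicates)
    )
    return hits / replicates


bootstrap_support = support(ALIGNMENT, a_closer_to_b_than_g)
print(f"support for 'A closer to B than to G': {bootstrap_support:.3f}")
```

A support value near 1 means the relation is recovered in almost every pseudosample, that is, it shows little deviation under resampling.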
Typical software requirements in a laboratory for database analysis involving a molecular epidemiology project

Study design methods                                   Software
Power, sample size calculation                         EpiInfo
Questionnaire design                                   EpiInfo
Data entry, storage, line listing                      EpiInfo, Access, Excel, etc.
Data analysis                                          EpiInfo
Advanced statistical methods                           EpiInfo, STATA, SAS, SPSS, R
Capturing and storing pattern images                   tiff, jpeg, gif, etc.
Image normalization, similarity/distance and
  cluster analysis, storage, tree generation           GelCompar, Molecular Analyst
Sequence alignment                                     ClustalX
Command-line search programs used to analyze high-throughput sequences
Nucleic acid sequence assembly to create contigs
Artemis Comparison Tool (ACT)
BRIG (reference comparison)
(phage genes)