All nucleotides contain three components:
1. A nitrogen heterocyclic base
2. A pentose sugar
3. A phosphate group
Chemical Structure of DNA vs RNA
Ribonucleotides have a 2’-OH
Deoxyribonucleotides have a 2’-H
Structure of Nucleotide Bases
Bases are classified as pyrimidines or purines
The nucleus contains the cell’s DNA (genome). RNA is synthesized in the nucleus and exported to the cytoplasm.
[Figure: in the nucleus, DNA undergoes replication and is transcribed into RNA (mRNA); in the cytoplasm, mRNA is translated into proteins.]
Deoxyribonucleotides found in DNA: dA, dG, dT, dC
Nucleotides are linked by phosphodiester bonds
Bases form a specific hydrogen bond pattern
DNA is double stranded
Properties of a DNA double helix
• The strands of DNA are antiparallel.
• The strands are complementary.
• Hydrogen bonds hold the two strands together.
• There are base stacking interactions.
• There are about 10 base pairs per turn.
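The antiparallel, complementary relationship between the two strands can be sketched in a few lines of Python. The function name `reverse_complement` is an illustrative choice, not something from the slides; it simply applies the A-T and G-C pairing rules and reverses the result so that the partner strand is also written 5’ to 3’.

```python
# Watson-Crick partners for the four DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, written 5'->3'.

    Because the strands are antiparallel, the partner is read in the
    opposite direction, hence the reversal.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("AATCTAGGG"))  # partner of a short example strand
print(reverse_complement("GAATTC"))     # a site that equals its own partner
```

A sequence that equals its own reverse complement, such as GAATTC, reads the same on both strands.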
DNA is a Double-Helix
Transcription of a DNA molecule results in an mRNA molecule that is single-stranded.
RNA molecules do not have a regular structure like DNA.
Structures of RNA molecules are complex and unique.
RNA molecules can base pair with complementary DNA or RNA sequences.
G pairs with C, A pairs with U, and G pairs with U.
[Figure: secondary structure of RNase P M1 RNA, showing a bulge, an internal loop, and a hairpin.]
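The RNA pairing rules stated above (G with C, A with U, and the G-U wobble pair) can be written down directly. The sketch below is a minimal illustration; `can_pair` and `is_hairpin_stem` are hypothetical helper names, and the stem check ignores loops, bulges, and energetics, which real structure prediction must account for.

```python
# Allowed RNA base pairs, including the G-U wobble pair.
RNA_PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a: str, b: str) -> bool:
    """True if RNA bases a and b can form a base pair."""
    return (a.upper(), b.upper()) in RNA_PAIRS


def is_hairpin_stem(five_prime_arm: str, three_prime_arm: str) -> bool:
    """True if the two arms can zip up into a fully paired stem.

    Both arms are written 5'->3', so the 3' arm is read backwards
    when it folds back onto the 5' arm.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        return False
    return all(
        can_pair(x, y)
        for x, y in zip(five_prime_arm, reversed(three_prime_arm))
    )


print(is_hairpin_stem("GGAC", "GUUC"))  # stem containing a G-U wobble pair
```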
Nucleic Acids in Acid and Base
The glycosidic bond of DNA and RNA is hydrolyzed by acids.
Order of stability: dA, dG < rA, rG < dC, dT < rC, rU
dA, dG hydrolyzed in boiling 0.1 M hydrochloric acid in 30 min
rA, rG hydrolyzed in boiling 1 M hydrochloric acid in 60 min
rC, rU hydrolyzed in boiling 12 M perchloric acid in 60 min
DNA is quite stable under basic conditions.
RNA is readily hydrolyzed by base.
RNA is hydrolyzed under alkaline (basic) conditions
Methylation of Nucleotide Bases
Certain nucleotide bases in DNA molecules are methylated in reactions catalyzed by enzymes. Adenine and cytosine are methylated more often than guanine and thymine.
Methylation is confined to specific regions of DNA and aids in biological processes.
E. coli DNA is methylated to distinguish its DNA from that of foreign invaders.
In eukaryotic cells, about 5% of cytidines are methylated, producing 5-methylcytidine.
Spontaneous Alterations in Nucleic Acids
In a human cell, DNA undergoes spontaneous alterations in structure (mutations)
As a cell ages, the number of mutations increases, making it likely that a cell’s
normal processes may be altered
There is a link between spontaneous mutation, aging, and carcinogenesis
Depurination
Why does DNA contain thymine and not uracil?
If DNA contained uracil, during replication of DNA the uracils would be base-paired with adenine.
Deamination of cytosine produces uracil, so deaminated cytosines would be indistinguishable from normal uracils and would also be base-paired with adenine. This would decrease the number of G-C base pairs over time and increase the number of A-U base pairs.
Eventually all the G-C base pairs could be lost.
The genetic code would not exist as we know it.
Ultraviolet light is damaging to DNA
Near-UV radiation (wavelengths of 200 – 400 nm) is a significant portion of the solar spectrum.
Upon exposure to ultraviolet radiation, two adjacent pyrimidine bases can dimerize.
This happens most often between two adjacent thymines.
Two products often form:
• cyclobutane thymine dimer
• 6-4 photoproduct
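Since dimers form between adjacent pyrimidines on the same strand, the candidate positions in a sequence are easy to enumerate. The following is only a sketch: `dipyrimidine_sites` is a hypothetical helper that lists every pair of neighboring pyrimidines (C or T), and it says nothing about how likely each site is to react.

```python
PYRIMIDINES = {"C", "T"}


def dipyrimidine_sites(strand: str) -> list[tuple[int, str]]:
    """Return (0-based index, dinucleotide) for each pair of adjacent pyrimidines."""
    seq = strand.upper()
    return [
        (i, seq[i:i + 2])
        for i in range(len(seq) - 1)
        if seq[i] in PYRIMIDINES and seq[i + 1] in PYRIMIDINES
    ]


# TT sites are the ones the slide singles out as most frequently dimerized.
for index, pair in dipyrimidine_sites("GATTCA"):
    print(index, pair)
```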
Nucleic Acids: Where are they found in nature, and what do they look like?
Source of DNA        Size (bases)     Type
Escherichia coli     9,200,000        Closed-circular double-stranded DNA
Bacillus subtilis    4,200,000        Closed-circular double-stranded DNA
F plasmid            95,000           Closed-circular double-stranded DNA
λ phage              48,500           Linear double-stranded DNA
T7 phage             40,000           Linear double-stranded DNA
M13 phage            6,400            Closed-circular single-stranded DNA
MS2 phage            3,600            Linear single-stranded RNA
Human                6,000,000,000    Linear double-stranded DNA
Fruit fly            270,000,000      Linear double-stranded DNA
HIV                  9,700            Linear single-stranded RNA
DNA molecules are packaged in the cell as structures called chromosomes. Bacteria have a single chromosome. Eukaryotes have multiple chromosomes.
A single chromosome contains thousands of genes, each encoding a protein. All of an organism’s chromosomes make up the genome.
Humans have 46 chromosomes.
The human genome has about 3 billion nucleotide base pairs.
The Human Genome
http://www.ncbi.nlm.nih.gov/genome/guide/human/
How is DNA packaged into a cell?
E. coli has a single double-stranded DNA molecule as its genome.
There are 4,639,221 base pairs in the E. coli genome.
The DNA is 1.7 mm long, 850 times the length of an E. coli cell.
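The length quoted above can be checked with a rough calculation. The sketch below assumes a rise of 0.34 nm per base pair (the usual textbook value for B-form DNA) and a cell length of 2 µm; neither number appears on the slide, so both are labeled as assumptions in the code. With these inputs the estimate comes out slightly below the slide's figure but on the same scale.

```python
ECOLI_GENOME_BP = 4_639_221   # base pairs, from the slide
RISE_PER_BP_NM = 0.34         # assumption: B-form DNA rise per base pair
CELL_LENGTH_UM = 2.0          # assumption: typical E. coli cell length

NM_PER_MM = 1_000_000
UM_PER_MM = 1_000


def contour_length_mm(base_pairs: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Length of fully extended double-stranded DNA, in millimetres."""
    return base_pairs * rise_nm / NM_PER_MM


length_mm = contour_length_mm(ECOLI_GENOME_BP)
ratio = length_mm * UM_PER_MM / CELL_LENGTH_UM

print(f"Extended genome length: {length_mm:.2f} mm")
print(f"Genome length / cell length: {ratio:.0f}")
```

The point of the comparison is the mismatch in scale: a molecule hundreds of times longer than the cell must be heavily compacted to fit inside it.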
[Figure: plasmid]
Large DNA molecules are compacted in a cell by supercoiling.
[Figure: relaxed versus supercoiled DNA]
DNA in eukaryotic cells is packaged into nucleosomes, which contain proteins called histones.
[Figure: DNA wrapped around a histone core (side view)]
Nucleosomes are packaged to form 30 nm fibers.
Compaction of 30 nm fibers uses nuclear scaffolds.
Telomeres are sequences at the end of eukaryotic chromosomes that help stabilize the chromosome.
Telomeres are repeats of the following sequence:
5’-(TxGy)n
3’-(AxCy)n
where x and y = 1 to 4. The TG strand is longer.
5’-TTTGGTTTGGTTTGGTTTGGTTTGGTTTGG…
3’-AAACCAAACCAAACC…
Can be >10,000 nucleotides in mammals.
The ends of the chromosome are replicated by the enzyme telomerase.
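The (TxGy)n notation is compact, so it may help to expand it. In this sketch, `telomere_strands` is a hypothetical helper that writes out both strands for given x, y, and repeat counts; the two strands take separate repeat counts because the TG strand is the longer one. The AC strand is returned in 3’ to 5’ orientation so that it lines up under the TG strand as in the example above.

```python
def telomere_strands(x: int, y: int, n_tg: int, n_ac: int) -> tuple[str, str]:
    """Expand the (TxGy)n / (AxCy)n telomere notation.

    Returns (TG strand written 5'->3', AC strand written 3'->5').
    """
    if not (1 <= x <= 4 and 1 <= y <= 4):
        raise ValueError("x and y must each be between 1 and 4")
    if n_ac > n_tg:
        raise ValueError("the TG strand is the longer strand")
    tg_strand = ("T" * x + "G" * y) * n_tg
    ac_strand = ("A" * x + "C" * y) * n_ac
    return tg_strand, ac_strand


# x = 3, y = 2 reproduces the example repeat TTTGG / AAACC.
tg, ac = telomere_strands(3, 2, n_tg=6, n_ac=3)
print("5'-" + tg + "...")
print("3'-" + ac + "...")
```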
Telomeres and aging
There appears to be a relationship between the length of
telomeres at the end of chromosomes and the age of
an individual.
The older you are, the shorter your telomeres are.
Germ-line cells (reproductive cells) contain telomerase activity.
Non-germ-line cells (somatic cells) do not contain telomerase activity.
We have a certain length of telomeres that we are born with.
As we age, the telomeres get shorter.
Is our life-span pre-determined by the length of our telomeres?
Internet Resources
Nucleic Acids
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM) National Institutes of Health (NIH)
http://www.ncbi.nlm.nih.gov/
GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2011 Jan;39(Database issue):D32-7). There are approximately 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS division as of April 2011.
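One way to get a feel for these totals is to compute the mean record length in each division. The short sketch below uses only the April 2011 figures quoted above; the dictionary layout and the name `mean_record_length` are illustrative choices.

```python
# Base and record counts as of April 2011, taken from the text above.
GENBANK_APRIL_2011 = {
    "traditional": {"bases": 126_551_501_141, "records": 135_440_924},
    "WGS": {"bases": 191_401_393_188, "records": 62_715_288},
}


def mean_record_length(division: str) -> float:
    """Average number of bases per sequence record in a division."""
    counts = GENBANK_APRIL_2011[division]
    return counts["bases"] / counts["records"]


for name in GENBANK_APRIL_2011:
    print(f"{name}: {mean_record_length(name):,.0f} bases per record")
```

The traditional divisions average a little under a thousand bases per record, while WGS records are a few times longer on average.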
BLAST SEARCH
What is BLAST?
BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses an algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity.
The core of NCBI's BLAST services is BLAST 2.0, otherwise known as "Gapped BLAST". This service is designed to take protein and nucleic acid sequences and compare them against a selection of NCBI databases.
Instead of relying on global alignments (commonly seen in multiple sequence alignment programs), BLAST emphasizes regions of local alignment to detect relationships among sequences which share only isolated regions of similarity. Therefore, BLAST is not just a tool to view sequences aligned with each other or to calculate percent homology, but a program to locate regions of sequence similarity with a view to comparing structure and function.
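The idea of a local alignment score can be illustrated with the Smith-Waterman dynamic-programming recurrence. This is not BLAST's algorithm: BLAST uses a faster heuristic, whereas Smith-Waterman is the exact method for the same local-alignment problem. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example, and the function name is hypothetical.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b.

    Scores are never allowed to drop below zero, so an alignment can
    start and end anywhere. That is what makes it local, not global.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only an isolated region (GAATTC).
print(local_alignment_score("TTTTGAATTCAAAA", "CCCCGAATTCGGGG"))
```

Because the flanking regions differ completely, a global alignment of these two sequences would score poorly, while the local score reflects only the shared six-base region.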
The BLAST search pages allow you to select from several different programs. Below is a table of these programs.
… products of an unknown nucleotide sequence.
tblastn    Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx    Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
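Translating a nucleotide sequence in all reading frames, as the program descriptions mention, means reading it in three offsets on each of the two strands. The sketch below uses the standard genetic code; the function names are illustrative, and incomplete or ambiguous codons are simply reported as "X", which is a simplification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Partner strand of dna, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translations for frames +1..+3 (given strand) and -1..-3 (partner strand)."""
    frames = {}
    for label, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```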
Nucleotide Databases
Database            Description
nr                  All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).
month               All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest               Non-redundant database of GenBank+EMBL+DDBJ EST Divisions.
dbsts               Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.
mouse ests          The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism mouse.
human ests          The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism human.
other ests          The non-redundant database of GenBank+EMBL+DDBJ EST Divisions, all organisms except mouse and human.
yeast               Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all yeast nucleotide sequences, but the sequence fragments from the yeast complete genome.
E. coli             E. coli (Escherichia coli) genomic nucleotide sequences.
pdb                 Sequences derived from the 3-dimensional structure of proteins.
kabat [kabatnuc]    Kabat's database of sequences of immunological interest.
patents             Nucleotide sequences derived from the Patent division of GenBank.
vector              Vector subset of GenBank(R), NCBI.
mito                Database of mitochondrial sequences.
alu                 Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at
epd                 Eukaryotic Promoter Database, ISREC in Epalinges s/Lausanne (Switzerland).
gss                 Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs                High Throughput Genomic Sequences.
Nucleic Acid Sequence: What does it encode?
CGTGATGAACGGCTTCGAGCGATACGAGGGAGTGCGTCACTGCCGCTATGTGGACGAGTTGCA
GATCGTCCAGAATGCGCCATGGACTCTGTCCGATGAATTCATCGCCGACAACAAAATCGACTT
TGTGGCCCACGACGACATTCCGTATGTAACCGATGGCATGGACGACATCTATGCTCCTCTCAA
GGCGCGCGGCATGTTTGTGGCCACGGAGCGCACTGAGGGTGTGTCCACCTCGGACATCGTAGC
CCGGATCGTCAAGGATTACGATCTGTATGTGCGTCGTAATCTGGCCAGAGGCTATTCGGCCAA
GGAACTCAATGTGTCGTTCCTGTCCGAGAAGAAGTTCCGGCTGCAGAACAA
… suspect the organism from which the gene came may be harmful to the public
• 5’-CATCCAGGGAATCACCAAGCCCGCCATTCGCCGTCTGGCTCGCCG-3’
• Determine if you should shut down public access to the
lake, or if the lake is safe.
Problem #2
• In the middle of the swimming season you re-test the lake to make sure it is safe for human use.
• In a sample isolated from the lake you discover the following nucleic acid molecule that you again believe is part of a larger gene sequence. You suspect the organism from which the gene came may be harmful to the public.
• 5’-GTCGAAGCGCCACTCGAAGGAGAAGGACACGCTCGGGGGCATCAC-3’
• As before, determine if you should shut down public access
to the lake, or if the lake is safe.
Regulatory Proteins
DNA sequences recognized by regulatory proteins are often inverted repeats of a short DNA sequence. These repeats form a palindrome with two-fold symmetry about a central axis.
DNA binding proteins are often dimeric, with two identical protein subunits.
Each subunit binds to one strand of the DNA.
5’-TACGGTACT GTGCTCGAGCAC TGCTGTACT-3’
3’-ATGCCATGA CACGAGCTCGTG ACGACATGA-5’
(The central axis passes through the middle of the palindrome.)
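A DNA palindrome in this sense is a sequence that equals its own reverse complement, which is straightforward to test. The sketch below checks the central block of the example, GTGCTCGAGCAC, and one of its flanking blocks; `is_inverted_repeat` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Partner strand of dna, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


print(is_inverted_repeat("GTGCTCGAGCAC"))  # central block of the example
print(is_inverted_repeat("TACGGTACT"))     # flanking block of the example
```

Two-fold symmetry of this kind is what allows two identical protein subunits to make the same contacts on either side of the central axis.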
Protein – DNA interaction
Proteins often bind to specific sequences of DNA.
Example: Restriction enzyme EcoRI binds to the DNA sequence
5’-GAATTC-3’
3’-CTTAAG-5’
Restriction Fragment Length Polymorphism (RFLP)
A variation in sizes of DNA seen after cutting with restriction enzymes.
Restriction enzymes cut DNA at a specific site. For example, the EcoRI restriction enzyme cuts DNA wherever it finds the sequence GAATTC:
DNA before cutting by EcoRI:
5’-AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA-3’
3’-TTAGATCCCTTAAGTGTCGCTACGCTTAAGCGTTAAT-5’
DNA after cutting by EcoRI:
5’-AATCTAGGG AATTCACAGCGATGCG AATTCGCAATTA-3’
3’-TTAGATCCCTTAA GTGTCGCTACGCTTAA GCGTTAAT-5’
In this example, EcoRI has cut one DNA molecule of 37 base pairs into 3 smaller fragments of DNA. If another person has slightly different DNA, EcoRI may cut the DNA into pieces of different lengths. (For example: if the second GAATTC is GAATTT, EcoRI will cut this other person's DNA in only one place, producing 2 smaller fragments of DNA.)
The words "fragment length polymorphism" mean "DNA pieces of different lengths." RFLPs are a quick way to see if two pieces of DNA are identical, without having to look at the entire DNA sequence.
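The cutting example can be reproduced with a small simulation. This sketch tracks only the top strand and cuts between G and AATTC at every GAATTC site, matching the fragments shown in the example; `ecori_fragments` is a hypothetical helper, and the staggered cut on the bottom strand is not modeled.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts the top strand between G and AATTC


def ecori_fragments(top_strand: str) -> list[str]:
    """Split the top strand at every EcoRI site and return the fragments."""
    seq = top_strand.upper()
    fragments = []
    start = 0
    site = seq.find(ECORI_SITE)
    while site != -1:
        cut = site + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        site = seq.find(ECORI_SITE, site + 1)
    fragments.append(seq[start:])
    return fragments


person_1 = "AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA"
person_2 = "AATCTAGGGAATTCACAGCGATGCGAATTTGCAATTA"  # second site is GAATTT

print([len(f) for f in ecori_fragments(person_1)])
print([len(f) for f in ecori_fragments(person_2)])
```

The two lists of fragment lengths differ, which is exactly the polymorphism that shows up as a different band pattern on a gel.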
IS6110 Fingerprints of M. tuberculosis
DNA Profiling
Each person has a unique set of fingerprints. As with a person’s fingerprint, no two individuals share the same genetic makeup. This genetic makeup, which is the hereditary blueprint imparted to us by our parents, is stored in the chemical deoxyribonucleic acid (DNA), the basic molecule of life. Examination of DNA from individuals, other than identical twins, has shown that variations exist and that a specific DNA pattern or profile could be associated with an individual. These DNA profiles have revolutionized criminal investigations and have become powerful tools in the identification of individuals in criminal and paternity cases.
The first widespread use of DNA tests involved RFLP (restriction fragment length polymorphism) analysis, a test designed to detect variations in the DNA from different individuals. In the RFLP method, DNA is isolated from a biological specimen (e.g., blood, semen, vaginal swabs) and cut by an enzyme into restriction fragments. The DNA fragments are separated by size into discrete bands in a gel (gel electrophoresis), transferred onto a membrane, and identified using probes (known DNA sequences that are "tagged" with a chemical tracer). The resulting DNA profile is visualized by exposing the membrane to a piece of x-ray film, which allows the scientist to determine which specific fragments the probe identified among the thousands in a sample of human DNA. A "match" is made when similar DNA profiles are observed between an evidentiary sample and those from a suspect’s DNA. A determination is then made as to the probability that a person selected at random from a given population would match the evidence sample as well as the suspect. The entire analysis may require from 6 to 10 weeks for completion.
restriction fragment length polymorphism (= RFLP)
Technique, also known as DNA fingerprinting, that allows familial relationships to be established by comparing the characteristic polymorphic patterns that are obtained when certain regions of genomic DNA are amplified (typically by PCR) and cut with certain restriction enzymes. In principle, an individual can be identified unambiguously by RFLP (hence the use of RFLP in forensic analysis of blood, hair or semen). Similarly, if a polymorphism can be identified close to the locus of a genetic defect, it provides a valuable marker for tracing the inheritance of the defect.
Parentage Testing
The matching process for identifying DNA profile patterns which either "exclude" or "include" a person as being the parent of a child is shown in the figure below. In this instance man 1 is excluded from paternity and man 2 is included as a possible father of the child.
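The include/exclude logic can be sketched with sets of band sizes. Every number below is hypothetical, since the figure's actual bands are not reproduced in the text. The assumption is the usual one for this kind of test: any band in the child that is absent from the mother must have come from the father, so a candidate who lacks one of those bands is excluded.

```python
def paternity_status(child: set[int], mother: set[int], candidate: set[int]) -> str:
    """Return 'included' or 'excluded' for a candidate father.

    Bands are compared by fragment size. A candidate is excluded if he
    lacks any child band that the mother cannot account for.
    """
    paternal_bands = child - mother
    return "included" if paternal_bands <= candidate else "excluded"


# Hypothetical band sizes in base pairs, for illustration only.
child = {2100, 3500, 5000, 7200}
mother = {2100, 5000, 6400}
man_1 = {1800, 3500, 4400}
man_2 = {3500, 7200, 9000}

print("man 1:", paternity_status(child, mother, man_1))
print("man 2:", paternity_status(child, mother, man_2))
```

Inclusion only means the candidate cannot be ruled out; how strongly it supports paternity depends on how common the shared bands are in the population.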