COMPARATIVE GENETIC ANALYSIS OF THE TRANSCRIPTIONAL REGULATORY DNA OF
Patrick Gilligan
(M.Sc University of Waikato)
A THESIS SUBMITTED FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
INSTITUTE OF MOLECULAR AND CELL BIOLOGY NATIONAL UNIVERSITY OF SINGAPORE
2004
Nothing in biology makes sense except in the light of evolution
Theodosius Dobzhansky
The American Biology Teacher, 35:125-129
Acknowledgements
I thank:
Firstly, my supervisor Byrappa Venkatesh
Past and present members of the Marine Molecular Genomics lab, including Tay Boon Hui (who helped with cloning and other stuff), Michael Richardson, Choong Po Loong, Meng Hwee, Diane Tan, Goh Boon Young, Eugene Kroll, Hawys, Alan Christoffels, Esther Koh, variously for help and making the lab a good place to work
Roland Degenkolbe, for copious discussions, biochemical advice, style editing and general generously provided opinions
Sathivel Poniah for a great deal of excellent advice
Members of several core facilities, including: from the mouse house, Esther Wong who taught me mouse transgenesis, Nachia, Arun, Din, Florida, Jean; from histology, Yong Tan Foong, who taught me basic histology and Gou Ke who provided lots of help, Li Jia
Members of my supervisory committee: Hans-Ulrich Bernard (who kept suggesting I do some gel-shifts somewhere along the line), Wang Yue and Sydney Brenner
The Institute of Molecular and Cell Biology funded this research
Lastly I would like to thank my wife Joanna, without whom I probably wouldn’t have begun the doctorate, whose idea it was to come to Singapore in the first place, and who encouraged me through the thesis
TABLE OF CONTENTS
Acknowledgements
TABLE OF CONTENTS
List of Figures
List of Tables
Abbreviations and Acronyms
List of publications
Summary
Chapter 1 Introduction
1.1 Oxytocin and vasopressin peptides
1.1.1 Neuroanatomy of oxytocin and vasopressin neurosecretory neurons
1.1.2 Oxytocin and vasopressin receptors
1.1.3 Evolution of the oxytocin and vasopressin neurons
1.2 Structure of oxytocin and vasopressin related genes
1.3 Regulatory DNA
1.3.1 Definitions related to regulatory DNA
1.3.2 Enhancers
1.3.3 Identification and characterization of enhancers
1.3.3.1 Biochemical approach
1.3.3.2 Genetic approach
1.3.3.3 Sequence comparison
1.4 Previous work on oxytocin and vasopressin gene expression
1.4.1 Expression studies in transgenics
1.4.1.1 Oxytocin transgenes
1.4.1.2 Vasopressin transgenes
1.4.2 Expression studies in explants
1.4.3 Expression studies in lung cancer cells
1.4.4 Summary of previous work on oxytocin-vasopressin regulation
1.5 Objectives of the present study
2 Materials and Methods
2.1 Isolation and sequencing of cosmid 197K21
2.2 Generation of transgenic mice
2.2.2 Preparation of DNA for microinjection
2.2.3 Embryo culture media
2.2.4 Microinjection
2.2.5 Genotyping
2.2.6 Transferring DNA
2.2.7 Labeling the probe
2.2.8 Hybridization
2.3 Northern hybridization
2.3.1 Extracting total RNA from tissues
2.3.2 Fractionation of total RNA
2.4 In Situ Hybridization
2.4.1 Mounting and sectioning tissues
2.4.2 Labeling oligo probes for in situ hybridization
2.4.3 Fixing tissue sections for in situ hybridisation
2.4.4 Hybridisation, washing, and visualisation
2.5 Electrophoretic Mobility-shift Assay
2.5.1 Nuclear extracts
2.5.2 Radiolabelling probes
2.5.3 Cold oligo for competition
2.5.4 Gel Shifts
2.6 Computer programs and databases
2.6.1 Gene prediction
2.6.2 Sequence alignments
Chapter 3 Results: Extension and annotation of Fugu isotocin/vasotocin contig
3.1 Extension of the isotocin-vasotocin locus sequence
3.2 Annotation of the Fugu isotocin-vasotocin locus
3.2.1 Novel gene predictions
3.2.2 Compact and overlapping Fugu promoters
3.3 Conservation of contiguity between the Fugu and human loci
3.4 Conserved non-coding sequences
Chapter 4 Results: Expression of Fugu isotocin and vasotocin genes in transgenic mice
4.1 Introduction
4.2 Transgenic mice carrying Fugu cosmids
4.3 Transgenic mice with isotocin- and vasotocin-subclones
4.3.1 Expression from isotocin-subclone
4.3.2 Salt-loading and isotocin expression
4.3.3 Expression from the vasotocin-subclone
4.3.4 Vasotocin expression and salt-loading
4.4 Discussion
4.4.1 Possible mosaic expression of isotocin in oxytocin neurons
4.4.2 Oxytocin expression detected in Vasopressin neurons
4.4.3 Vasotocin gene insensitivity to salt-loading in mice
4.4.4 Sequence similarity between Fugu and mouse orthologues
Chapter 5 Results: Detection of regulatory DNA – Theoretical considerations
5.1 Conservation of regulatory information
5.1.1 Conservation of regulatory DNA
5.1.2 The short sequence matches are probably not informative
5.1.3 Comparing mouse and human non-coding sequences
5.2 How can sequence not be conserved while function is?
5.3 Where is the oxytocin CRM?
5.3.1 Transgenic evidence for the oxytocin CRM
5.3.2 Sequence alignment evidence for the oxytocin CRM
5.3.3 Explant transfection evidence for the oxytocin CRM
5.4 Summary
Chapter 6 Results: Gel-shift analysis of oxytocin CRM
6.1 Introduction
6.1.2 Gel-shifts to identify TFBSs
6.1.3 The nuclear extracts
6.2 The initial screen
6.2.2 Shift C
6.2.2.1 Binding site C present in both mouse and human oxytocin CRMs
6.2.2.2 TF C is Sp1-like
6.2.2.3 An anti-Sp1 antibody super-shifts Shift C
6.2.2.4 The weak Sp1-like sites in the mouse CRM might be cooperative
6.2.3 Shift A
6.2.3.1 Shift A reveals the same binding site on h15 and m19
6.2.3.2 Factor A binds a long, G-rich sequence
6.2.4 Shift D
6.2.4.1 Site D is present in mouse and human presumed CRMs
6.2.5 Shift B
6.2.5.1 TFBS B is present in the mouse and human CRMs
6.3 Derived maps
6.4 Discussion
6.4.1 Evidence for the TFBSs being functional
6.4.2 Validation of identified TFBSs
6.4.3 These gel-shifts and previous in silico work
6.4.3 TFBS turnover
6.5 Outlook
6.5.1 Limitations of the protocol
6.5.2 And on to syntax…
References
Appendix 1
Appendix II
Appendix III
Appendix IV
List of Figures
1.2 Schematic diagram of the mouse oxytocin/vasopressin locus
1.3 Expression patterns of oxytocin transgenes
1.4 Expression patterns of vasopressin transgenes
1.5 Oxytocin constructs transfected into hypothalamic slice explants
1.6 Vasopressin constructs transfected into hypothalamic slice explants
1.7 “The intergenic region” hypothesis
3.1 Schematic diagram of the Fugu isotocin-vasotocin locus
3.2 Alignment of the Fugu ZF1 protein with its human ortholog
3.3 Alignment of the Fugu ZF2 protein with its human ortholog
3.4 Expression patterns of the Fugu ZF1, CL1, ZF2, PK, IHABP and PG1 genes
3.5 Alignment of the novel Fugu PG1 protein with its human ortholog
3.6 Amino acid alignment of Fugu chemokine gene CL1 with human CCL28 and
3.7 Alignment of the Fugu PK (fPK) and its closest human protein, HIPK2
3.8 Conserved ZF1 and ZF2 non-coding sequences
3.9 Conserved contiguity between Fugu isotocin contig and human orthologous fragments
4.1 Schematic diagram of the Fugu isotocin/vasotocin locus
4.2 Northern analysis of vasotocin expression in cosmid transgenics
4.3 In situ detection of isotocin mRNA in 155F06 transgenic following
4.4 Representative photomicrographs of SON from transgenic mice bearing Fugu cosmids
4.5 Double in situ detection of isotocin and oxytocin or isotocin and vasopressin transcripts in transgenic mice bearing a 5-kb isotocin transgene
4.6 Higher-expressing lines do not have higher transgene copy-number
4.7 Expression of both isotocin and oxytocin is increased in response to salt loading, in
4.8 Double in situ detection of vasotocin and vasopressin or vasotocin and oxytocin transcripts in transgenic mice bearing a 9-kb vasotocin transgene
4.9 Vasotocin expression does not respond to salt loading in transgenic mice
5.1 The alignment of sequences 3’ of the mouse and human oxytocin polyA signals
5.2 The alignment of sequence 3’ of the mouse and human vasopressin polyA signals
5.3 Apparent “blocks” of contiguous nucleotide identity in a human/mouse oxytocin non-coding alignment break down in multi-species comparisons
5.4 Hypothesised location of the oxytocin and vasopressin CRMs
5.5 Evidence for a downstream oxytocin enhancer
5.6 Each gene likely has two enhancers
6.2 Shift C is shared between the mouse and human sequences
6.3 Weak, adjacent Sp1-like sites identified by tiling oligos
6.4 Shift A is present in both mouse and human regions
6.5 Shift D is present in both human and mouse sequences
6.6 Mouse (aligned with rat) and human oligos containing site D
6.8 Shift B is present in both the human and mouse sequences
6.9 Inferred maps of the TFBSs in the presumed CRMs of human and mouse oxytocin
6.10 A partial TFBS map plus BLASTZ alignment of the human and mouse oxytocin CRM
6.11 Comparison of TFBS map with MEME/MAST results
6.12 Conservation of regulatory DNA between mouse and human in the HoxA and
List of Tables
4.1 Relative copy and expression level of different transgenic lines
Abbreviations and Acronyms
BSA bovine serum albumin
EDTA ethylenediamine-N,N,N’,N’-tetraacetic acid
EMSA electrophoretic mobility shift assay
MOPS 3-(N-morpholino)-propanesulfonic acid
MYA million years ago
PAGE polyacrylamide gel electrophoresis
PBS phosphate buffered saline
PCR polymerase chain reaction
pH negative log(10) of the proton concentration
PMSF phenylmethylsulfonyl fluoride
PWM position weight matrix
RACE rapid amplification of cDNA ends
RT room temperature, 20-25˚C
RT-PCR reverse-transcription PCR
SDS sodium dodecyl sulfate
TEMED N,N,N’,N’-tetramethylethylenediamine
TFBS transcription factor binding site
tss transcription start site
UTR untranslated region (of an mRNA)
List of publications
Gilligan P., and Venkatesh B.; Application of comparative genomics to the analysis of vertebrate regulatory elements. Briefings in Functional Genomics & Proteomics. In press.
Gilligan P., Brenner S., Venkatesh B.; Neurone-specific expression and regulation of the pufferfish isotocin and vasotocin genes in transgenic mice. J Neuroendocrinol 2003, 11:1027-36.
Gilligan P., Brenner S., Venkatesh B.; Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences. Gene 2002, 294:35-44.
Venkatesh B., Gilligan P., Brenner S.; Fugu: a compact vertebrate reference genome. FEBS Lett 2000, 476:3-7.
Summary
In the work described in this thesis, I investigate the regulatory DNA (enhancers, or cis-regulatory modules) of the oxytocin and vasopressin genes. Oxytocin and vasopressin, the first-discovered neuropeptide hormones, are homologs; they are secreted from the posterior pituitary and are involved in fluid homeostasis and reproductive behavior. The genes encoding oxytocin and vasopressin are expressed in oxytocinergic and vasopressinergic neurons in the hypothalamus. Considerable work has been directed towards identifying the DNA that regulates transcription of these genes, but results so far are inconclusive.
I chose comparative genetics to localize the oxytocin and vasopressin enhancers and identify transcription factor binding sites. Transgenic experiments demonstrated that the enhancers that direct hypothalamic transcription are conserved between the mouse oxytocin and vasopressin genes and their pufferfish (Fugu) orthologs, the isotocin and vasotocin genes, respectively. The Fugu genes were expressed precisely in the neurons that express the respective mouse genes. However, I could not show any informative non-coding sequence similarity between the mouse and Fugu genes, or even between the mouse and human genes. I postulated that transcription factor binding sites might, while being retained in the enhancers, be rapidly gained and lost at positions within the enhancers. This would rapidly obliterate sequence similarity between orthologous enhancers. Careful analysis of previous transgenic work indicated that there was an oxytocin enhancer within the 600 bp region immediately downstream of the oxytocin polyA signal. I analyzed this region by 'comparative gel-shift' experiments between human and mouse, and identified several transcription factor binding sites that are 'conserved' between the human and mouse loci. The 'conserved' transcription factor binding sites were indeed 'rearranged', as I predicted. These results demonstrate that, in principle, enhancers can be flexible in the way they are coded. If so, then mutations to such an enhancer are likely to alter the expression pattern it drives, rather than abolish it; the enhancer would not be 'rigid and brittle', but 'plastic'. If, as this suggests, enhancers can be evolutionarily plastic, it becomes possible to imagine how metazoan evolution takes place primarily in regulatory DNA rather than in protein-coding sequences. This is an important theoretical contribution to the study of syntax in regulatory DNA.
Chapter 1 Introduction
The work described in this thesis is intended to improve our understanding of the regulation of the transcription of the oxytocin and vasopressin genes. The information necessary for correctly regulating the expression of genes is itself written in the DNA, in the genes that encode the necessary (trans-acting) transcription factors and in the cis-regulatory sequences, which are mostly binding sites for the transcription factors. The information is encoded in the DNA in an abstract ‘language’. This language is generally interpreted biochemically, in interactions between proteins, DNA, RNA and other macromolecules. The way proteins are encoded is fairly strict; there are only 64 possible codons ('words'), and every single one specifies a known amino acid or stop. In contrast, cis-regulatory DNA is poorly understood, and since the information is probably in the strength of various transcription factor binding sites, and in whether and how strongly proteins bound at those sites interact with each other, it is unlikely that it will be summarised in anything so simple as a codon table. Since we do not really even know the ‘words’ in regulatory DNA, it is cryptic to us. However, evolution leaves footprints. While non-functional DNA accumulates mutations over generations, functional DNA is constrained, so comparison of non-protein-coding genomic sequence between two related species has the potential to reveal regulatory DNA and, in principle, its constituent elements. The ‘strength’ of footprints depends on how sensitive regulatory machinery is to sequence changes, which will, in turn, depend on how it works and how the machinery itself evolves.
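The idea of an evolutionary 'footprint' can be illustrated with a minimal sketch. It assumes two non-coding sequences that have already been aligned (gaps written as "-") and simply reports windows of high identity. The function names, the window size, the threshold and the toy sequences are all hypothetical choices for illustration; this is not the alignment procedure used in this thesis.

```python
def window_identity(seq_a, seq_b, window=10):
    """Fraction of identical, non-gap positions in each sliding window
    of a pairwise alignment (both strings must have the same length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # True where both sequences carry the same base (a gap never counts)
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


def conserved_windows(seq_a, seq_b, window=10, threshold=0.9):
    """Start positions (0-based) of windows at or above the identity threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy, made-up aligned sequences (not real genomic data): a shared
# GGGCGGG core flanked by unrelated sequence.
SPECIES_A = "AAAAAGGGCGGGAAAAA"
SPECIES_B = "CCCCCGGGCGGGCCCCC"

print(conserved_windows(SPECIES_A, SPECIES_B, window=7, threshold=1.0))
```

A sketch like this only finds footprints when binding sites stay at the same aligned positions; if sites are gained and lost at different positions within an enhancer, window identity would drop even though function is retained.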
In this introduction, I describe in some detail the oxytocin and vasopressin peptides, and the structure and evolution of the genes encoding them. Then I discuss the known characteristics of regulatory DNA, and the approaches used to identify and characterize regulatory DNA, including comparative genetics, and, lastly, previous work on oxytocin and vasopressin gene regulation.
1.1 Oxytocin and vasopressin peptides
The mammalian neuropeptides oxytocin and arginine-vasopressin (vasopressin) are structurally related nonapeptides having the sequences CYIQNCPLG-NH2 and CYFQNCPRG-NH2, respectively. They were first experimentally encountered as pituitary extracts with oxytocic (“rapid birth”)/milk-ejecting and vasopressor/antidiuretic activities, respectively (Burbach et al. 2001). Oxytocin and vasopressin are synthesized as preprohormones mainly in the magnocellular neurones of the supraoptic nuclei (SON) and paraventricular nuclei (PVN) in the hypothalamus (Figure 1.1). The preprohormones comprise a signal peptide, followed by the respective nonapeptide hormone and a neurophysin molecule. The neurophysin molecule is separated from the nonapeptide hormone by the tripeptide cleavage signal “Gly-Lys-Arg”. The neurophysin is thought to act as a carrier molecule for the hormone between translation into the endoplasmic reticulum and secretion. The vasopressin preprohormone contains an additional C-terminal peptide termed copeptin, of unknown function. The prohormone is cleaved from the signal peptide and a prohormone convertase releases the nonapeptide (Burbach et al. 2001). The mature hormones are sorted into secretory vesicles, and stored at the axon terminal in the posterior pituitary. When the neuron fires, the hormone is released into the circulation.
Oxytocin and vasopressin have some distinct as well as overlapping physiological roles, including the following. Oxytocin induces contraction of the smooth muscle of the full-term uterus and of the lactating mammary gland (Russell and Leng 1998; Winslow and Insel 2002; Goodson and Bass 2001). On the other hand, the main role of vasopressin is maintenance of blood osmolarity and volume homeostasis (Orloff and Handler 1967). Oxytocin is also known to stimulate Na+ excretion in the kidney (Verbalis et al. 1991).
Peptides related to oxytocin and vasopressin have been identified in a wide range of animal taxa (Table 1.1). Whereas all jawed vertebrates have at least one homologue each of oxytocin and vasopressin, jawless vertebrates have a single oxytocin/vasopressin-related peptide (Table 1.1). Mesotocin is the oxytocin homolog in reptiles, birds, amphibians and lungfishes, whereas in teleosts it is isotocin. At least five homologs of oxytocin have been identified in cartilaginous fishes (Chondrichthyes) (Chauvet et al. 1994). Vasotocin is the vasopressin homolog found in all non-mammalian vertebrates (Murphy et al. 1998). Vasotocin is also the single oxytocin-vasopressin related peptide found in jawless vertebrates such as the lampreys and hagfishes (Heierhorst et al. 1992; Suzuki et al. 1995). Among invertebrates, a single oxytocin-vasopressin related peptide is found in the pond snail (van Kesteren et al. 1992b) and the earthworm (Satake et al. 1999). However, in the octopus two peptides, cephalotocin and octopressin, have been identified (Kanda et al. 2003; Takuwa-Kuroda et al. 2003). These peptides are more similar to each other than to those from other invertebrates, suggesting that they result from a lineage-specific gene duplication.
The structure of the oxytocin-vasopressin related nonapeptides is highly conserved across taxa (Table 1.1). They all contain a bulky aromatic residue in the second position, which is critical for binding to the neurophysin carrier. There is a conserved disulfide bridge between the cysteines at the first and sixth positions, forming a ring structure. The amino acid at the eighth position confers either oxytocin- or vasopressin-like properties on the peptides. If this amino acid is neutral (Leu, Ile, Val), the peptide is considered to be an oxytocin homologue, and if it is charged (Arg or Lys), a vasopressin homologue (Acher 1993).
Table 1.1 Oxytocin- and vasopressin-related peptides. The peptide sequence of oxytocin is given on the first line. In the other sequences, matches to oxytocin are indicated by dots and mismatches by the letter of the substituted amino acid. Adapted from (Acher 1993) and (Reich 1992; Takuwa-Kuroda et al. 2003; Oumi et al. 1994; van Kesteren et al. 1992b; van Kesteren et al. 1992a).
Oxytocin         CYIQNCPLG-NH2   Placentals, some marsupials, ratfish
Mesotocin        .......I.       Marsupials, nonmammalian tetrapods, lungfishes
Isotocin         ...S...I.       Osteichthyes
Glumitocin       ...S...Q.       Skates (Chondrichthyes)
Valitocin        .......V.       Sharks (Chondrichthyes)
Aspargtocin      ...N.....       Sharks (Chondrichthyes)
Asvatocin        ...N...V.       Sharks (Chondrichthyes)
Phasvatocin      ..FN...V.       Sharks (Chondrichthyes)
Cephalotocin     ..FR...I.       Octopus (Octopus vulgaris, mollusc)
Octopressin      .FWTS..I.       Octopus
Annetocin        .FVR...T.       Earthworm (Eisenia foetida, annelid)
Vasopressin      ..F....R.       Mammals
Vasotocin        .......R.       Nonmammalian vertebrates, cyclostomes
Lysipressin      ..F....K.       Pig, some marsupials
Phenypressin     .FF....K.       Macropodids (Marsupials)
Locupressin      .L.T...R.       Locust (Locusta migratoria, Insect)
Arg-conopressin  .I.R...R.       Cone snail (Conus geographicus, mollusc)
Lys-conopressin  .F.R...K.       Pond snail (Lymnaea stagnalis, mollusc)
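The dot notation of Table 1.1 and the position-8 rule described above can be sketched in a few lines of code. This is an illustrative sketch only: the function names are hypothetical, the dotted strings are assumed to be nine characters aligned position by position against oxytocin, and the set of 'neutral' residues is extended with Gln and Thr simply because they occur at position 8 in the oxytocin-like entries of the table.

```python
OXYTOCIN = "CYIQNCPLG"  # reference nonapeptide (C-terminal amide omitted)


def expand(dotted, reference=OXYTOCIN):
    """Expand dot notation: '.' means 'same residue as the reference'."""
    if len(dotted) != len(reference):
        raise ValueError("dotted sequence must be as long as the reference")
    return "".join(r if d == "." else d for d, r in zip(dotted, reference))


def classify(peptide):
    """Classify a nonapeptide by the residue at position 8."""
    if len(peptide) != 9:
        raise ValueError("expected a nine-residue peptide")
    if peptide[0] != "C" or peptide[5] != "C":
        raise ValueError("expected cysteines at positions 1 and 6")
    residue8 = peptide[7]  # position 8 in 1-based numbering
    if residue8 in "RK":      # charged (basic) residue
        return "vasopressin-like"
    if residue8 in "LIVQT":   # neutral residues seen in Table 1.1 (assumption)
        return "oxytocin-like"
    return "unclassified"


vasotocin = expand(".......R.")
isotocin = expand("...S...I.")
print(vasotocin, classify(vasotocin))
print(isotocin, classify(isotocin))
```

The check on the cysteines at positions 1 and 6 reflects the conserved disulfide bridge; any peptide lacking them is rejected rather than classified.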
1.1.1 Neuroanatomy of oxytocin and vasopressin neurosecretory neurons
The oxytocin and vasopressin destined for circulation are produced in magnocellular neurons of the hypothalamus. The cell bodies of these magnocellular (neurosecretory) oxytocinergic and vasopressinergic neurons lie in two pairs of hypothalamic nuclei, the supraoptic nuclei (SON) and paraventricular nuclei (PVN). In the mouse, the supraoptic nuclei are ~100 µm in diameter and ~800 µm in length. These neurons have large somata, approximately 20 – 30 µm in diameter (hence “magnocellular”). The transcription of the oxytocin and vasopressin genes in these neurons is massive; mRNA copy number is variously estimated at around 20,000 to 500,000 per cell (Burbach et al. 2001). Oxytocin and vasopressin are released from the posterior lobe of the pituitary, which is largely composed of axon terminals of specialized ‘neurosecretory’ neurons.
The expression patterns of the oxytocin and vasopressin genes, as visualized by immunohistochemistry (IHC) and in situ hybridization (ISH), appear to be mutually exclusive. The neurons differ physiologically; oxytocinergic neurons fire more frequently in response to steady depolarization, whereas vasopressinergic neurons generate phasic bursting patterns (Hatton 1990). However, 3 – 5% of the magnocellular neurons express both oxytocin and vasopressin mRNAs, and this ‘co-expressing’ subpopulation increases to around 10 to 20% under conditions that stimulate transcription (Gainer and Young 2001). Furthermore, recent single-cell reverse transcriptase-polymerase chain reaction (RT-PCR) studies revealed that most of the oxytocinergic and vasopressinergic neurons also synthesize vasopressin and oxytocin, respectively, albeit in lower amounts (~100-fold less than the main hormone; Xi et al. 1999). This co-expression, perhaps due to ‘leakiness’, underscores the homology of the regulatory DNA of oxytocin and vasopressin.
1.1.2 Oxytocin and vasopressin receptors
Mammals have three pharmacologically distinct vasopressin receptors and a single oxytocin receptor, which fall into two subfamilies. The V2 vasopressin receptor constitutes the first subfamily. It is coupled to the adenylate cyclase pathway (Birnbaumer et al. 1992; Gorbulev et al. 1993; Lolait et al. 1992), and promotes antidiuresis by causing the water channel, aquaporin, to cycle to the surface in the distal nephron of the kidney (Orloff and Handler 1967). In contrast, receptors of the second subfamily, composed of the vasopressin receptors V1 (formerly V1a; Morel et al. 1992) and V3 (formerly V1b; Sugimoto et al. 1994) and the oxytocin receptor (Gorbulev et al. 1993; Kimura et al. 1992), couple to the inositol trisphosphate/Ca2+ pathway. In mammals, the V1 receptor is expressed principally in the liver (where it mediates the effects of vasopressin on glycogenolysis), platelets, smooth muscle cells and the CNS (Morel et al. 1992; Thibonnier et al. 1994). The V3 receptor mediates the release of ACTH from the anterior pituitary (Sugimoto et al. 1994). The oxytocin receptor is expressed predominantly in the smooth muscle of reproductive tracts (e.g., uterus), various parts of the brain (variable between species), and the myoepithelium of the mammary gland (Kimura et al. 1992; Gorbulev et al. 1993; Rozen et al. 1995).
1.1.3 Evolution of the oxytocin and vasopressin neurons
As briefly summarized above, oxytocin and vasopressin have homologous receptors, overlapping functions and target tissues that look functionally homologous on an evolutionary timescale. The oxytocin and vasopressin magnocellular neurons are probably ‘duplicate neuron populations’, by analogy with the duplicate genes. It seems likely that oxytocin and vasopressin neurons are ‘subpopulations’ of an ancestral population, which expressed the single ancestral gene. The ‘last common ancestral neuronal population’ would likely be a vasotocinergic population in the last common ancestor of jawed and jawless vertebrates. This population can probably be traced quite far back in evolution; the earthworm homologue, annetocin, is expressed in ten neurons of the subpharyngeal ganglion, which is homologous to the vertebrate central nervous system (Satake et al. 1999). The central nervous system in lower invertebrates such as molluscs and annelids consists of collections of ganglia, whereas the brains of mammals are vastly more complex. This is true also of the neuroendocrine system. The simplest way to generate a more complex system is to increase the number of cells, and differentiate them over evolutionary time (Carroll 2000; Carroll 2001). This seems to me to emphasize plasticity in the ancestral oxytocin/vasopressin enhancer, and homology between the modern oxytocin and vasopressin enhancers.
1.2 Structure of oxytocin and vasopressin related genes
The genes encoding oxytocin and vasopressin are homologous in the sequence and organization of their exons (see Figure 1.2). The two genes are closely linked in a tail-to-tail array in all the mammals investigated. The intergenic region in various mammals ranges from 3 kb to 12 kb (Figure 1.2) (Schmitz et al. 1991; Gainer and Young 2001). The exon-intron organization of the isotocin and vasotocin genes in the teleost fish Fugu rubripes (Fugu) is identical to that of their mammalian homologs. However, in contrast to the tail-to-tail organization of the mammalian homologs, the Fugu genes are linked head-to-tail and separated by 24.4 kb, with five intervening genes (Venkatesh et al. 1997). Thus, it appears that an inversion has occurred in this locus, either in the mammalian or the teleost lineage, since the divergence of the two lineages. Interestingly, in another teleost fish, Catostomus commersoni (white sucker), two copies of isotocin genes have been cloned and both are intronless (Figueroa et al. 1989). Since the mammalian and Fugu oxytocin-vasopressin related genes contain introns at identical positions, it is likely that isotocin introns were secondarily lost in the white sucker, possibly via recombination with an isotocin cDNA produced by an endogenous retrovirus.
Figure 1.2 Schematic diagram of the mouse oxytocin/vasopressin locus. In the mammals investigated, the oxytocin and vasopressin genes are transcribed towards each other, separated by 3.5 to 12 kb. The expanded intergenic region in humans and rat is due to additional repetitive DNA. The horizontal bars represent introns and UTRs; the vertical bars, coding sequence; and the bent arrows, transcription start sites.
The single gene encoding vasotocin in the hagfish (a jawless vertebrate) has an exon-intron structure similar to that of the mammalian and teleost oxytocin-vasopressin related genes (Heierhorst et al. 1992; Suzuki et al. 1995). In addition to the two introns in the coding sequence, the hagfish vasotocin gene contains an intron in the 5’ untranslated region (Heierhorst et al. 1992). A single oxytocin/vasopressin family gene is found in the invertebrates pond snail (van Kesteren, Smit et al. 1992) and earthworm (Satake, Takuwa et al. 1999), in both cases having the canonical exon/intron organisation. However, in the octopus, the two genes encoding cephalotocin and octopressin are reported to be intronless (Kanda, Takuwa-Kuroda et al. 2003), like the two isotocin genes in the white sucker (Figueroa, Morley et al. 1989). The introns may have been secondarily lost in the octopus (though the cephalotocin and octopressin cDNA sequences become almost identical exactly at the point where intron 1 would be, which is difficult to explain if there is no intron).
The presence of a single oxytocin-vasopressin related gene in the jawless vertebrates, and at least one homologue each of oxytocin- and vasopressin-related genes in jawed vertebrates, indicates that a single neuropeptide gene in jawless fishes was duplicated in the ancestor of jawed fishes to give rise to the oxytocin- and vasopressin-related genes (Acher 1980). The close linkage of the oxytocin-vasopressin related genes in mammals and Fugu supports this hypothesis. The peptides encoded by the newly duplicated genes would have had the same function initially, and subsequently undergone 'subfunctionalization' (Force et al. 1999) to give rise to two genes that divided between them the expression domains and functions of the parent gene.
1.3 Regulatory DNA
1.3.1 Definitions related to regulatory DNA
The terms gene, promoter and enhancer are often used vaguely. The definitions of these terms as used in this thesis are given below:
A ‘gene’ is the unit of heritability, i.e., a genetic locus, or a region of chromosome that is ‘expressed’ in the phenotype of the individual. A protein-coding gene includes the exons and introns, the 5’ and 3’ UTRs, and the promoter and regulatory DNA.
The ‘promoter’ is the stretch of DNA from approximately –40 to +10 bp relative to the transcription start site; it is the sequence to which the RNA polymerase binds, and from which it initiates transcription. By itself, a eukaryotic promoter is insufficient for transcription, i.e., it is ‘default silent’.
‘Cis-regulatory DNA’ is a generic term for the sequence that regulates transcription. It may be adjacent to the promoter, within the introns, or upstream or downstream of the coding sequence, often tens of kilobases away. Individual stretches of cis-regulatory DNA are referred to as ‘enhancers’ or ‘cis-regulatory modules’ (‘CRMs’).
1.3.2 Enhancers
Enhancers are discrete modules about 200 to 2000 bp long, containing clusters of binding sites for multiple transcription factors (TFs) that activate and/or repress transcription. They were originally observed as fragments of DNA that enhanced transcription of genes. However, many ‘enhancers’ possess both enhancing and silencing activities.
The dispersed, modular enhancer organization probably allows more than one enhancer to communicate with a target promoter without interfering with one another. This organization likely requires that the module that includes the promoter is to some extent ‘neutral’, and that it can interact with distant CRMs. Such a ‘promoter-including module’ might allow a gene to readily acquire new enhancers, and therefore expression domains. Thus a promoter should have linked sequence that allows it to interact with distant enhancers, neither too selectively nor too promiscuously.
Transcription factor binding sites (TFBSs) are generally short and degenerate. TFs normally do not have stringent DNA recognition sequences like restriction enzyme cutting sites, but preferentially bind sequences with sufficient similarity to (conceptual) ‘consensus binding sites’. The binding preferences of a transcription factor can be represented by a position weight matrix (PWM). A PWM is compiled from a collection of binding sites of a factor; the numbers indicate how frequently each base was observed at each position. In principle, a table of PWMs for the TFs in the genome (analogous to the codon table; a ‘lexicon’) could be used to identify the TF binding sites in genomic DNA. What would be required for such a table? It has been estimated that for the TF CTF/NF1, several thousand medium-affinity sites (a collection of high-affinity sites would have lost information about base preferences at some positions) would be required to accurately define the binding site (Roulet et al. 2002). It is not obvious that CTF/NF1 should be unique in requiring such a large set of examples to derive an accurate PWM. However, PWMs in the literature are typically derived from a few to a few dozen sites.
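How a PWM is compiled from a collection of sites, and then used to scan a sequence, can be sketched as follows. This is a minimal illustration under stated assumptions: the four 'sites' are made up (loosely GC-box-like), the log-odds form with a pseudocount and a uniform background of 0.25 is one common convention rather than the method of any cited study, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Count how often each base occurs at each position of equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for position, base in enumerate(site.upper()):
            counts[position][base] += 1
    return counts


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(observed frequency / background frequency)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({base: math.log2((column[base] + pseudocount) / total / background)
                    for base in BASES})
    return pwm


def score(pwm, sequence):
    """Sum of per-position log-odds for a sequence as long as the matrix."""
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)


# Made-up example sites; a real matrix would need far more examples.
TOY_SITES = ["GGGCGG", "GGGCGG", "GGGCGT", "GGGTGG"]
toy_pwm = log_odds_matrix(count_matrix(TOY_SITES))
print(best_hit(toy_pwm, "ATATGGGCGGATAT"))
```

With only four sites, the pseudocount dominates the estimate for every base that was not observed, which is one way of seeing why a matrix built from a few sites defines the binding preference poorly.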
Our inability to accurately recognize TFBSs, together with the fact that enhancers are generally dispersed in copious junk DNA (so we do not even know where to start looking for them), is a serious obstacle to studying enhancers. Hence progress in the identification and characterization of enhancers has been slow compared to progress in predicting and characterizing coding regions. Only about a dozen enhancers have been thoroughly described (Arnone and Davidson 1997). Some well characterized enhancers are the even-skipped stripe 2 (Small et al. 1992; Small et al. 1991; Stanojevic et al. 1991) and mesodermal/heart enhancers in the fruit fly (Halfon et al. 2000), the ‘enhanceosome’ of the human interferon-β gene (Merika and Thanos 2001), and the enhancers of the sea urchin Endo16 and CyIIIa genes (Arnone and Davidson 1997; Yuh et al. 1998).
1.3.3 Identification and characterization of enhancers
The location and composition of enhancers can be analysed biochemically, genetically, or by sequence comparisons.
1.3.3.1 Biochemical approach
DNA is usually packaged in chromatin, and genes have to be ‘unpacked’ when they are transcribed. It is thought that histones and other chromatin proteins are bound less tightly to DNA in active genes, and some of the DNA is likely to be naked, or otherwise accessible, so enhancers can be detected as DNase I hypersensitive sites. To do this, nuclei are prepared from cells or a tissue, and incubated with various concentrations of DNase I. The DNA is then extracted and digested with a restriction enzyme to make a defined end, from which the hypersensitive sites can be located. A Southern blot is performed and probed with a fragment adjacent to the restriction enzyme site (‘indirect end labeling’). New, more rapidly migrating bands that appear represent hypersensitive sites. The technique detects enhancers experimentally in a simple, quick procedure, and has quite a long range, dependent on the agarose gel, from ~1 kb up to 20–30 kb. However, it is drastically limited by sensitivity.
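The arithmetic behind indirect end labeling is simple: each new band has a length equal to the distance from the probed restriction site to a hypersensitive site. The short Python sketch below makes this explicit; the coordinates and site names are hypothetical, and the function assumes an idealized probe that abuts the restriction site and detects only fragments beginning there.

```python
def expected_bands(restriction_site, hs_sites, parent_fragment_end):
    """Band sizes (bp) detected by a probe abutting `restriction_site`.

    `hs_sites` maps a site name to its coordinate (bp). Only sites that lie
    inside the parent restriction fragment can produce a shorter band.
    """
    lo, hi = sorted((restriction_site, parent_fragment_end))
    bands = {"parent": hi - lo}
    for name, position in hs_sites.items():
        if lo < position < hi:
            # The DNase I cut truncates the probed fragment at the site.
            bands[name] = abs(position - restriction_site)
    return bands

# Hypothetical coordinates, for illustration only.
bands = expected_bands(
    restriction_site=0,
    hs_sites={"HS1": 2500, "HS2": 7000, "HS3": 15000},
    parent_fragment_end=12000,
)
```

In this invented case HS1 and HS2 would appear as 2.5 kb and 7 kb bands below the 12 kb parent band, while HS3 lies outside the fragment and would not be seen with this enzyme and probe.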
DNase I can also be used to dissect enhancers into their constituent TF binding sites. Bound TFs will tend to protect DNA from DNase I and produce a characteristic ‘footprint’ when fractionated on a gel. DNase I footprinting is usually performed by incubating an end-labeled ~200–1000 bp cloned fragment containing an enhancer together with TFs that are known to bind it, digesting with various concentrations of DNase I, and resolving the DNA on a sequencing gel. Protected sites are observed as bands that are less intense when the TFs are added to the DNA before DNase I digestion. The technique is attractive in quickly identifying TF binding sites, but it is not very sensitive, since the protein must protect a substantial percentage of the sites (~20% or more) to cause a detectable footprint. Thus it requires either purified TFs (in which case one must already have candidates), or exceptionally good extract and favorable binding conditions.
TF binding sites can also be detected or tested in gel-shift assays. Typically, a labeled DNA (often a synthetic oligo) is incubated with TF, and the mix is fractionated on an acrylamide gel. A TF will retard DNA to which it is bound (i.e., cause a ‘mobility shift’). Since the shifted band only needs to be strong enough to be detected, the technique is very sensitive, and can function well with crude extracts. It is limited in range (the length of the oligo), but is easy to perform and is a very popular technique.
Another biochemical approach is to use TFs known to act directly on the gene in question, in DNase I footprinting or gel-shift assays, to locate and partly analyse the enhancers (this was done for the eve stripe 2 enhancer, for instance (Small et al. 1991; Stanojevic et al. 1989)). There are several mouse TF mutants that fail to express oxytocin and vasopressin (Burbach et al. 2001). Unfortunately, this is because the mutant animals have deformed hypothalami, so it is quite likely that the TFs in question do not act directly on the oxytocin or vasopressin enhancers.
1.3.3.2 Genetic approach
All candidate enhancers, and their constituents, must eventually be tested for transcription activating activity. This activity is only manifest in a complex system (the appropriate ensemble of TFs, target promoter, etc.), so it cannot be assayed in isolation. The best assay system is a whole organism, but in certain circumstances, tissue explants, cell lines, or even cellular extracts may be used as alternatives. In principle, one locates or tests candidate regions for activity in the system. The candidate might be a hypersensitive site, a conserved non-coding region, or simply a large piece of DNA. The enhancer is then narrowed down to the minimal region required for activity. In mammals, the severe bottleneck imposed by transgenesis is probably the main reason that work on regulatory DNA has gone so slowly. It is no coincidence that, of well-characterized enhancers, most come from genetically tractable organisms (Drosophila, sea urchin), or can function in cell lines (e.g., the interferon-β enhanceosome mentioned above).
1.3.3.3 Sequence comparison
Enhancers are bound by sequence-specific transcription factors. Therefore, the sequence of an enhancer should be conserved between related species, and should be identifiable in comparative sequence analysis. Several species can be used in the comparisons, so that conserved features are identified more clearly, and with greater resolution. The choice of the species used in such comparisons is critical, since there must be sufficient conservation to allow detection, and sufficient divergence that functional sequences can be distinguished from non-functional sequences. If the work stays at the level of comparison, it can be called ‘comparative genomics’; and if extended to characterization using transgenic assay systems, it can be called ‘comparative genetics’.
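One common way to look for conserved non-coding regions is to slide a window along a pairwise alignment and report the windows whose identity exceeds a threshold. The following Python sketch assumes two pre-aligned, equal-length sequences with ‘-’ for gaps; the window size, step and 70% threshold are arbitrary illustrative choices, not values used in this work.

```python
def window_identity(aligned_a, aligned_b, window=50, step=10):
    """Percent identity in sliding windows over a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        # Gap columns never count as matches.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results

def conserved_windows(aligned_a, aligned_b, threshold=70.0, **kwargs):
    """Windows whose identity is at or above `threshold` percent."""
    return [
        (start, identity)
        for start, identity in window_identity(aligned_a, aligned_b, **kwargs)
        if identity >= threshold
    ]

# Toy alignment: a conserved block followed by a diverged block.
seq_a = "ACGTACGTAC" + "TTTTTTTTTT"
seq_b = "ACGTACGTAC" + "GGGGGGGGGG"
hits = conserved_windows(seq_a, seq_b, window=10, step=10)
```

The trade-off described above shows up directly in such a scan: between closely related species nearly every window passes the threshold, whereas between very distant species functional elements may fall below it.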
A good example of characterization of enhancers using comparative genomics and genetics approaches is the human SCL locus. Comparisons of the human and mouse loci identified eight transcriptional regulatory modules, including two lineage-specific promoters (Aplan et al. 1990) and six enhancers (Gottgens et al. 2000; Sanchez et al. 1999; Sinclair et al. 1999). However, comparison of the mouse and chicken loci identified the two promoters but only three of the six enhancers (Gottgens et al. 2002). In [...] were identified (Gottgens et al., 2002). Since in situ hybridizations reveal that SCL expression in zebrafish is homologous to that in mouse (Gering et al. 1998; Liao et al. 1998), and Fugu sequences can recapitulate this pattern and rescue zebrafish SCL mutants (Barton et al. 2001), the enhancers are probably homologous from mammals to fish, but cannot be detected by present comparative analyses (Barton et al., 2001).
Comparative sequence analysis can also be used to dissect enhancers, if the resolution is sufficient. For instance, the TF binding sites in the stripe 2 enhancer of the Drosophila even skipped gene were first identified biochemically. These sites can also be identified clearly in a multispecies sequence alignment (Ludwig et al. 1998), giving an analogous result to DNase I footprinting. This approach has recently been called “phylogenetic shadowing” (Boffelli et al. 2003). Again, putative sites would need to be assayed in a system such as transgenics.
1.4 Previous work on oxytocin and vasopressin gene expression
An understanding of the neuroendocrine role of oxytocin and vasopressin requires an understanding of how oxytocin and vasopressin are expressed in the hypothalamic magnocellular neurosecretory neurons (MCNs). In theory, we might investigate this expression in transgenics, tissue explants, cell lines, or nuclear extracts (nuclear run-on experiments, or in vitro transcription assays). In fact, the MCNs are very few, so it is not practical to make nuclear extracts from them. There were early attempts to derive MCN cell lines (Murphy et al. 1987), but these were not successful. Thus, most of the work on regulation of oxytocin and vasopressin genes has been carried out in transgenic rodents.
Oxytocin and vasopressin promoters direct inappropriate expression in some heterologous cell lines (reviewed in Burbach et al. 2001). This phenomenon has been exploited to study modulation of the basal mis-expression, but it is not obviously relevant to studying how legitimate expression is driven in MCNs. Oxytocin and vasopressin are routinely expressed in some small cell lung cancers (SCLC), and elements of the vasopressin promoter that drive expression in SCLC cell lines have been investigated (Coulson et al. 1999).
1.4.1 Expression studies in transgenics
Transgenic experiments are the gold standard, since the experiments are closest to what the genes must actually achieve in vivo. However, transgenic experiments are time-consuming and expensive. Since there are no hypothalamic cell lines that specifically express oxytocin and vasopressin genes, transgenics has become the method of choice for analysing their neuron-specific expression. Transgenic experiments generally involve generation of transgenic mice or rats bearing structural transgenes flanked with different sizes of 5' and 3' flanking sequences. A few transgene constructs include both oxytocin and vasopressin sequences (Gainer and Young 2001; Zhang et al. 2002). In some studies, reporter genes such as lacZ and CAT have been incorporated into the structural genes. However, lacZ is known to interfere with the expression patterns of transgenes (Cohen-Tannoudji et al. 2000; Paldi et al. 1993; Chevalier-Mariette et al. 2003). In trying to locate enhancers, one is generally interested in the smallest transgene that expresses correctly, since it will generally narrow the enhancer down to the smallest interval. A further deletion that eliminates enhancer activity should, in theory, locate the enhancer. The results of the oxytocin and vasopressin transgenes have been summarized in Figures 1.3 and 1.4, respectively.

Figure 1.3 Expression patterns of oxytocin transgenes. A “tick” indicates expression in oxytocinergic neurons. Numbers indicate distance in kb the transgene extends upstream of the transcription start site, or downstream of the polyA signal. The roman [...]
1.4.1.1 Oxytocin transgenes
The smallest transgene that correctly expresses oxytocin is a fragment of the bovine oxytocin gene (Figure 1.3, viii) which extends from 0.6 kb upstream of the tss to 1.9 kb downstream of the polyA signal (bOT3.5, (Ho et al. 1995)), indicating that there is an enhancer mediating oxytocinergic expression within this construct, most likely in the downstream sequence (i.e., not adjacent to the promoter, or in the (small) introns). When this transgene is extended a further 0.6 kb downstream (Figure 1.3, x) (bOT, (Ho et al. 1995)), it loses MCN expression, implying a repressor between 1.9 and 2.5 kb downstream of the polyA signal. Two of the transgenes (Figure 1.3, i and x) have been reported to be impossible to derive transgenic animals with, which suggests that they are lethal (Murphy and Wells 2003), most likely due to mis-expression during development. Interestingly, when a fragment of the vasopressin gene, minimally spanning intron 2, exon 3 and 180 bp of downstream sequence, was added to transgene construct i (Figure 1.3, iii and ii, see (Gainer and Young 2001)), transgenic pups were obtained. In these lines, the transgene was expressed specifically in oxytocinergic neurons. This has been interpreted as revealing an oxytocin enhancer downstream of the vasopressin gene (Gainer, 1998; Burbach et al., 2001; Gainer et al., 2001; Gainer and Young, 2001).
1.4.1.2 Vasopressin transgenes
Figure 1.4 Expression patterns of vasopressin transgenes. A tick indicates expression in vasopressinergic neurons. The roman numerals to the right are used in the text to identify transgenes; Arabic numerals indicate length in kb of upstream and downstream regions. References: xii, xiii (Gainer and Young 2001); xiv (Grant et al. 1993); xvi (Zeng et al. 1994a; Takuwa-Kuroda et al. 2003; [...]

The rat transgenes xiv through xviii (Grant et al. 1993; Zeng et al. 1994a; Davies et al. 2003), bovine transgenes xxii and xxiii (Ang et al. 1993), and mouse transgenes xxiv and xxv (Jeong et al. 2001) (Figure 1.4) all express in vasopressinergic magnocellular neurons. The smallest of these is a rat vasopressin genomic fragment containing 3 kb upstream of the tss and 0.2 kb downstream of the polyA signal, with a CAT reporter gene inserted into the third exon (Figure 1.4, xviii) (Davies et al. 2003), and sequences equivalent to this transgene are contained in all of the above expressing vasopressin transgenes. There are two smaller transgenes which do not express correctly, namely the rat-derived transgene xii and the bovine-derived transgene xxi (Figure 1.4). Both are similar to xviii, except that they lack sequences upstream of –1.4 and –1.2 kb, respectively. This suggests that sequences necessary and sufficient to drive expression in vasopressinergic neurons are located between –3.0 and –1.4 kb. As an aside, one of the three lines derived with the bovine transgene xxi (Figure 1.4), in addition to general non-specific expression in neurons, expresses high levels of vasopressin mRNA in the SON and PVN (Ang et al. 1993), perhaps indicating that it has ‘almost’ enough information to drive correct expression.
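The interval reasoning applied to the vasopressin transgenes can be written out as a small calculation: the distal boundary is set by the shortest construct that expresses, and the proximal boundary by the longest shorter construct that does not. The Python sketch below uses the upstream extents quoted in the text for xviii, xii and xxi; it assumes the constructs are otherwise equivalent and that rat and bovine coordinates are directly comparable, which is a simplification.

```python
def upstream_interval(constructs):
    """Infer the upstream interval (kb relative to the tss) needed for expression.

    `constructs` is a list of (name, upstream_kb, expresses) tuples, where
    upstream_kb is the length of 5' flanking sequence carried by the transgene.
    Returns (distal, proximal) as negative kb coordinates.
    """
    expressing = [up for _, up, ok in constructs if ok]
    if not expressing:
        raise ValueError("need at least one expressing construct")
    distal = min(expressing)
    shorter_silent = [up for _, up, ok in constructs if not ok and up < distal]
    if not shorter_silent:
        raise ValueError("need a non-expressing construct shorter than the "
                         "smallest expressing one")
    proximal = max(shorter_silent)
    return (-distal, -proximal)

# Upstream extents (kb) as quoted in the text for Figure 1.4.
vasopressin_constructs = [
    ("xviii", 3.0, True),   # rat, expresses in vasopressinergic neurons
    ("xii", 1.4, False),    # rat, does not express correctly
    ("xxi", 1.2, False),    # bovine, does not express correctly
]
interval = upstream_interval(vasopressin_constructs)
```

The calculation only shows that something required lies within the interval; it cannot by itself establish that the interval is sufficient, since elements elsewhere in the shared sequence may also contribute.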
1.4.2 Expression studies in explants
In this technique, hypothalamic explant cultures, derived from neonatal rats, are used as an assay system. It has been shown that oxytocin and vasopressin neurons survive for some days in culture, and can be transfected (Gainer et al., 2001). Oxytocin and vasopressin gene constructs linked to a reporter such as GFP are transfected into these hypothalamic explant cultures and the expression pattern is observed. This is obviously a very useful technique, as results can be obtained within a few days and a large number of constructs can be assayed. Results of some recent studies using oxytocin and vasopressin constructs are summarised in Figures 1.5 and 1.6. Studies with the oxytocin constructs show that the 450 bp region downstream of the mouse oxytocin polyA signal (presumably homologous sequences were contained in the apparently lethal rat transgene ‘i’, Figure 1.3) is necessary and sufficient to drive expression in oxytocin MCNs, and that the 180 bp downstream of the mouse vasopressin polyA signal can drive oxytocin MCN-specific expression from the oxytocin promoter, or vasopressin MCN-specific expression from the vasopressin promoter (Fields et al., 2003). This is interesting, since the vasopressin 180 bp sequence is contained in transgene xii (Figure 1.4), which does not express, indicating different requirements in the explant culture and transgenic assays.

Figure 1.5 Oxytocin constructs transfected into hypothalamic slice explants. Expression was specific for oxytocin MCNs. Roman numerals to the right are used to identify constructs in [...]
1.4.3 Expression studies in lung cancer cells
Lung cancers are classified as either small cell lung cancers (SCLCs) or non-small cell
Figure 1.6 Vasopressin constructs transfected into hypothalamic slice explants. All of the expressing constructs were specific for vasopressin neurons. Roman numerals to the right are used to identify constructs in the text; Arabic numerals indicate length in kb of upstream and