ROLES OF LONG NON-CODING RNAS IN HUMAN
EMBRYONIC STEM CELL PLURIPOTENCY AND
NEURAL DIFFERENTIATION
NG SHI YAN
B.Sc. (Honors), National University of Singapore, 2008
A THESIS SUBMITTED FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
NUS GRADUATE SCHOOL FOR INTEGRATIVE SCIENCES AND ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2012
Acknowledgements
I would like to express my deepest gratitude to the many people who made this thesis possible. I thank my supervisor, Dr Lawrence Stanton, for his invaluable guidance, advice, support and his belief in me. His mentorship encouraged independence and creativity, and allowed me the freedom to grow and develop. I have learnt a lot during the past four years, which have been an extremely enriching and inspiring experience. I would also like to thank my thesis advisory committee, Dr Wang Hongyan, Dr Paul Robson, and Dr Gerald Udolph, for their critical feedback along the way.
I am especially grateful to Dr Rory Johnson, who introduced me to the world of lncRNAs. I am thankful for the insightful discussions we had, the custom lncRNA microarray which he designed, as well as his feedback and comments. I also thank Gireesh Bogu for the RNA-seq analyses. Dr Irene Aksoy was most generous in providing me with a human iPS cell line. I am also thankful for stimulating discussions with Dr Akshay Bhinge. It is also a pleasure to thank everyone in the GIS Stem Cell groups for companionship, support and helpful discussions.
Most importantly, my family has given me incredible support and encouragement. Their love and understanding helped me overcome the obstacles in the process. My parents have provided me with an incredible environment in which to grow; my sister was amazingly supportive, understanding and a great companion. I also thank my fiancé, whose patience and companionship made long hours in the lab more pleasant.
1.1.1 The transcription factors OCT4, NANOG and SOX2 constitute the human embryonic stem cell transcriptional core
1.1.2 An expanded transcriptional regulatory circuit maintaining pluripotency
1.1.3 Non-coding RNAs modulate pluripotency by regulating expression of the core transcription factors and/or downstream genes
1.2 Directed neural differentiation of hESCs
1.2.1 Neural induction – lessons from the embryo
1.2.1.1 Inhibiting the TGF-β signaling pathway enhances neural induction
1.2.1.2 Stromal co-culture induces neural conversion
1.2.2 Regional specification of midbrain dopamine neurons
1.2.3 Radial glia cells are neuronal progenitors in vivo
1.3.1 Long non-coding RNAs in pluripotency
1.3.2 Long non-coding RNAs in neural development
1.4 Molecular mechanisms of long non-coding RNA function
1.4.1 LncRNAs behave as scaffolds that target protein complexes to specific genomic loci
1.4.2 LncRNAs with enhancer-like functions
1.4.3 LncRNAs regulate gene expression by behaving as competing endogenous RNAs or
1.4.4 Other molecular functions of lncRNAs
Chapter II: Aims and Objectives
Chapter III: Materials and Methods
3.1 Feeder-free culture of human pluripotent stem cells
3.2 Expansion and mitotic inactivation of MEF cells
3.3 Preparation of MEF-conditioned medium
3.4 Culture of PA6 mouse skull bone marrow stromal cells
3.5 Differentiation of hESCs into neural progenitors and dopaminergic
3.5.1 Co-culture of PA6 and hESCs for stromal-derived inducing activity (SDIA)
3.5.2 Isolation and expansion of neural progenitor cells (NPCs)
3.5.3 Differentiation of NPCs into dopaminergic (DA) neurons
3.6 Culture of human fetal mesencephalon-derived neural stem cells
3.7 Small-interfering RNA (siRNA)-mediated gene silencing
3.7.1 Transfection of H1 hESCs with siRNAs
3.7.2 Transfection of ReN-VM cells with siRNAs
3.9.2 Extraction of nuclear and cytoplasmic RNA (RNA fractionation)
3.10 Reverse transcription of RNA to cDNA
3.12 Analysis of gene expression by microarray
3.12.1 RNA amplification for Illumina bead chips
3.12.2 Illumina bead chip hybridization
3.12.3 RNA amplification and hybridization on custom designed Agilent arrays
3.12.4 Statistical analysis of microarray data
3.18 Chromatin immunoprecipitation (ChIP)
3.21 RNA fluorescence in situ hybridization
Chapter IV: Neural Differentiation of Human Pluripotent Stem Cells
4.2.1 A homogeneous population of neural progenitors was derived from hESCs by the
4.2.2 Human ESC-derived neural progenitors differentiated into functional dopamine
Chapter V: Identification of Long Non-coding RNAs Associated with Pluripotency and Neural Differentiation
5.2.1 Microarray expression profiling identifies differentially expressed lncRNAs
5.2.2 Identification of lncRNAs associated with pluripotency (pluripotent lncRNAs)
5.2.3 Identification of lncRNAs associated with neural progenitors (NPC lncRNAs)
5.2.4 Identification of lncRNAs associated with neuronal differentiation (neuronal
8.2.2 RMST is developmentally regulated by transcription factor REST
8.2.3 RMST is indispensable for neurogenesis, but not required for maintenance of neuronal
8.2.4 Nuclear-retained RMST physically associates with RNA-binding protein hnRNPA2B1 and transcription factor SOX2
8.2.5 RMST and SOX2 co-regulate a common pool of genes
8.2.6 RMST does not regulate SOX2 expression
8.3.1 RMST forms part of a complex that is required for neurogenesis
8.3.2 RMST may change the binding patterns of SOX2 to chromatin
8.3.3 RMST may bind to proteins other than hnRNPA2B1 and SOX2
9.2.2 Epigenetic regulation of pluripotency
9.2.5 Long non-coding RNAs or short peptides
References
Appendices
Appendix I: List of 152 genes downregulated upon RMST and SOX2 knockdown
Appendix II: List of 331 genes upregulated upon RMST and SOX2 knockdown
Abstract
Long non-coding RNAs (lncRNAs) are a recently discovered class of transcripts encoded within the human genome. LncRNAs have been proposed to be key regulators of biological processes, including stem cell pluripotency and neurogenesis. However, at present very little functional characterization of lncRNAs involved in differentiation has been carried out in human cells. In this thesis, functional characterization of lncRNAs in human development is addressed using human embryonic stem cells (hESCs) as a paradigm for pluripotency and neuronal differentiation. Human ESCs were robustly and efficiently differentiated into neurons, and expression of lncRNAs was profiled using a custom-designed microarray. Some hESC-specific lncRNAs involved in pluripotency maintenance were identified, and shown to physically interact with SOX2 and the PRC2 complex component SUZ12. Using a similar approach, we identified lncRNAs required for neurogenesis. Knockdown studies indicated that loss of any of these lncRNAs blocked neurogenesis, and immunoprecipitation studies revealed physical association with REST and SUZ12.
In particular, a neuronal lncRNA, RMST, was found to be essential for neurogenesis. Knockdown of RMST in human neural stem cells prevented neurogenesis. RNA pulldown and RNA immunoprecipitation indicated that RMST physically associated with the RNA-binding protein hnRNPA2B1 and the transcription factor SOX2. Perturbation studies, followed by genome-wide transcriptional profiling, indicated that RMST and SOX2 co-regulate a large pool of targets. Interestingly, knockdown of RMST resulted in reduced SOX2 occupancy at its target gene promoters, suggesting that RMST may alter SOX2 binding to chromatin during neurogenesis. Together, this study provides important evidence for an indispensable role of lncRNAs in human brain development.
List of figures
Figure 1.1 OCT4, SOX2 and NANOG form the core transcription factors governing pluripotency of hESCs
Figure 1.2 An extended ES transcriptional network and regulatory circuit
Figure 1.3 Patterning of the neural tube generates unique domains for neuronal progenitors
Figure 1.4 The differentiation of pluripotent stem cells into neuroepithelial stem cells and radial glia
Figure 1.5 Correlation of expression profiles of lncRNAs with protein-coding gene markers during embryoid body (EB) differentiation
Figure 1.6 A model for lincRNA integration into the molecular circuitry of
Figure 1.7 Paradigms for how lncRNAs function at the molecular level
Figure 4.1 Stromal co-culture of H1 hESCs resulted in neural differentiation
Figure 4.2 Differentiation of hESCs into a monolayer neural progenitor
Figure 4.3 Neural progenitors derived from H1 hESCs (H1-NPCs) homogeneously expressed neural stem cell and radial glia markers
Figure 4.4 Schematic representation of the differentiation of hESC-derived radial glia-like NPCs into midbrain dopaminergic (DA) neurons
Figure 4.5 Immunofluorescence characterization of H1-derived neurons indicating their midbrain dopaminergic identity
Figure 4.6 Quantitative PCR (qPCR) analysis confirming midbrain dopaminergic
Figure 4.7 Flow cytometry analysis showed that H1-NPCs differentiated into TH+ midbrain dopamine neurons with high efficiency
Figure 4.8 Dopamine enzyme-linked immunosorbent assay indicated that H1-derived dopamine neurons were responsive to depolarization
Figure 5.1 Microarray expression profiling identified differentially expressed lncRNAs during neural differentiation of hESCs
Figure 5.2 Identification of lncRNAs important in pluripotency, neural induction,
Figure 6.1 Three lncRNAs were exclusively expressed in hESCs and iPSCs
Figure 6.2 Pluripotent lncRNAs are low abundance transcripts
Figure 6.3 RNA-seq analysis of pluripotent lncRNAs in H1 hESCs, indicating
Figure 6.4 Schematic showing OCT4 and NANOG binding sites in the vicinity of
component SUZ12, and the pluripotent transcription factor SOX2
Figure 6.12 Proposed mechanism for role of lncRNA in hESC pluripotency
Figure 6.13 In silico prediction of lncRNA-protein interactions supporting the proposed mechanism of lncRNAs functioning as a modular scaffold for chromatin modifiers and transcription factors
Figure 7.1 In situ hybridization images of lncRNA expression showing that lncRNAs were specifically localized to specific brain regions
Figure 7.2 RNA-seq analysis indicating transcription start and end sites of
Figure 7.3 Tissue specificity of the neuronal lncRNAs RMST, lncRNA_N1,
Figure 7.4 Relative abundance of neuronal lncRNAs, compared to that of GAPDH
Figure 7.5 Schematic representation of differentiation of ReN-VM neural stem cells following transfection of siRNAs
Figure 7.6 Neuronal lncRNAs can be efficiently targeted by siRNAs
Figure 7.7 Knockdown of neuronal lncRNAs prevents neurogenesis
Figure 7.8 Loss of TUJ1+ cells upon knockdown of neuronal lncRNAs
Figure 7.9 Knockdown of neuronal lncRNAs resulted in cells adopting a glia fate
Figure 7.10 Apart from lncRNA_N2, the other neuronal lncRNAs were
Figure 7.11 RNA immunoprecipitation indicated that lncRNA_N1 and lncRNA_N3
Figure 7.12 Quantification of changes in hosted miRNAs in response to
Figure 8.12 Biotinylated RMST pulldown, coupled to LC-MS/MS mass spectrometry identified hnRNPA2B1 and SOX2 as protein partners of
Figure 8.13 Western blot confirmed that hnRNPA2B1 and SOX2 specifically
Figure 8.14 RNA immunoprecipitation (RIP) established in vivo binding of RMST
Figure 8.15 Co-immunoprecipitation (Co-IP) of hnRNPA2-FLAG and SOX2 in the
Figure 8.16 Proposed model of the RMST complex
Figure 8.17 Knockdown of components of the RMST complex prevented
Figure 8.18 RMST and SOX2 regulate a common pool of targets
Figure 8.19 Transcript levels of SOX2 remained unchanged following RMST
Figure 8.20 Depletion of RMST resulted in decreased SOX2 occupancy at target
List of tables
Table 3.2 List of primers for protein-coding genes and Universal Library Probe
Table 3.3 List of primers used for long non-coding RNAs in qPCR
Table 3.4 Primary antibodies and respective dilutions used in Western blot
Table 3.5 Primary antibodies and respective dilutions used in immunofluorescence (IF) and flow cytometry (FACS)
Table 3.6 Sequences of primers used for ChIP-PCR
Table 5.1 Genes expressed in H1-derived neurons were highly enriched for Gene Ontology (GO) terms relating to neuronal differentiation
Table 5.2 List of the 36 pluripotency lncRNAs
Table 6.1 List of pluripotent lncRNAs that occupy a unique location in the
Table 6.2 Pluripotent lncRNAs in this study
Table 7.1 List of 25 neuronal lncRNAs that occupied a unique location in the human genome, and can be targeted by RNAi
Table 8.1 Table summarizing microarray findings upon knockdown of RMST or
Table 8.2 Gene Ontology (GO) analysis of the 152 genes in the RMST and
BSA Bovine serum albumin
cDNA Complementary DNA
DMSO Dimethyl sulfoxide
DNA Deoxyribonucleic acid
ES cells Embryonic stem cells
FACS Fluorescence-activated cell sorting
FBS Fetal bovine serum
FDR False discovery rate
GFP Green fluorescent protein
GO Gene ontology
hnRNP Heterogeneous nuclear ribonucleoprotein
IgG Immunoglobulin G
LiCl Lithium chloride
MAP2 Microtubule-associated protein 2
MEF Mouse embryonic fibroblast
OCT4 Octamer-binding transcription factor 4
PAGE Polyacrylamide gel electrophoresis
PBS Phosphate buffered saline
PCR Polymerase chain reaction
qPCR Quantitative PCR
RISC RNA-induced silencing complex
RNA Ribonucleic acid
RNAi RNA interference
SDS Sodium dodecyl sulfate
siRNA Small interfering RNA
SOX SRY-related HMG Box
TBST Tris-buffered saline/Tween-20
Chapter I – Introduction
1.1 Transcriptional control of stem cell pluripotency
Embryonic stem cells (ESCs) are derived from the inner cell mass of blastocysts, and can be maintained undifferentiated in culture. Pluripotency and self-renewal are hallmarks of ESCs. Pluripotency, or the undifferentiated ESC state that can give rise to mature cells of the three germ layers, is in part maintained by an intricate interplay between transcription factors and their genomic targets. Forced expression of the right cocktail of transcription factors is also known to induce pluripotency in adult somatic cells (Takahashi and Yamanaka, 2006), further demonstrating the pivotal role of transcription factors in maintaining and inducing the pluripotent cell state.
1.1.1 The transcription factors OCT4, NANOG and SOX2 constitute the human embryonic stem cell transcriptional core
The transcription factors OCT4, NANOG and SOX2 play essential roles in early embryonic development, and are among the pluripotency-associated factors that maintain ESCs (Boiani and Scholer, 2005; Chambers et al., 2003; Niwa et al., 2000). Genome-wide chromatin immunoprecipitation (ChIP) studies aimed at elucidating transcription factor binding sites and regulation of pluripotency have led to the discovery of a transcriptional regulatory circuitry in hESCs (Boyer et al., 2005; Ivanova et al., 2006; Loh et al., 2006; Rao and Orkin, 2006). Numerous target genes co-bound and co-regulated by OCT4, SOX2 and NANOG have been identified (Figure 1.1A). In addition, genes that are co-bound by OCT4, NANOG and SOX2 include those that promote pluripotency and self-renewal, such as OCT4, SOX2, NANOG, STAT3, ZIC3, and components of the TGF-β and Wnt signaling pathways. These observations suggested that OCT4, SOX2 and NANOG promote pluripotency by positively regulating their own expression and genes encoding components of key signaling pathways (Figure 1.1B).
Apart from pluripotency targets, the genes bound by OCT4, SOX2 and NANOG were also enriched for genes implicated in developmental processes. These included genes that specify transcription factors important for differentiation into the extra-embryonic, endodermal, mesodermal, and ectodermal lineages (Figure 1.1C). The observation that OCT4, SOX2 and NANOG co-occupy a set of repressed genes that are key to developmental processes suggested that, in addition to activating pluripotency-associated genes, they also repress genes associated with differentiation.
1.1.2 An expanded transcriptional regulatory circuit maintaining pluripotency
In a landmark report by Takahashi and Yamanaka (2006), it was demonstrated that pluripotency could be induced in somatic cells such as fibroblasts by the forced expression of four factors, namely Oct4, Sox2, Klf4 and c-Myc. The induced pluripotent stem cells (iPSCs) were similar to ESCs in morphology, proliferation, surface antigens, telomerase activity, gene expression and epigenetic marks (Takahashi et al., 2007). Surprisingly, they observed that Nanog was dispensable for the reprogramming of fibroblasts into iPSCs, implying that other transcription factors, such as Klf4, were also part of an extended regulatory network governing pluripotency (Figure 1.2).
Figure 1.1: (Adapted from Boyer et al., 2005) OCT4, SOX2 and NANOG form the core transcription factors governing pluripotency of hESCs. (A) Venn diagram representing the extent of overlap of OCT4, SOX2 and NANOG promoter-bound regions. (B) The interconnected autoregulatory loop formed by OCT4, SOX2, and NANOG. Regulators are represented as blue circles while gene promoters are shown as red rectangles. OCT4 and SOX2 are known to physically interact. (C) A model for the core transcriptional regulatory network in hESCs, whereby OCT4, SOX2 and NANOG are the core transcription factors that co-activate pluripotency-associated factors, and co-repress differentiation genes.
From the iPSC studies, it was evident that Kruppel-like factor 4 (Klf4) was an important component of the ES circuitry. It was a mysterious player among the four “Yamanaka factors”, since there were no apparent defects in a loss-of-function assay. It was later discovered that other Klf members, Klf2 and Klf5, could compensate for the loss of Klf4 in maintaining pluripotency, and depletion of all three Klf members led to differentiation. Genome-wide ChIP assays revealed that these Klfs share many common downstream targets of Nanog, and a NANOG promoter luciferase reporter assay established that KLF4 directly regulates NANOG expression (Chan et al., 2009; Jiang et al., 2008). This indicates that the core KLF circuit is integrated into the NANOG transcriptional network, to specify gene expression unique to ESCs.
Several other transcription factors are also integral members of the ES transcriptional regulatory network, including Dax1, Zfp281, Rex1 and Esrrb (Figure 1.2A; Kim et al., 2008). Another example is Zfp206 or Zscan10, which encodes a zinc finger transcription factor specifically expressed in pluripotent stem cells, and is required for self-renewal of undifferentiated cells. Zfp206 physically associates with both Oct4 and Sox2, and genome-wide mapping of Zfp206 binding sites in ESCs
Zfp206 is also a key component of the regulatory network that maintains ESCs (Yu et al., 2009).
Figure 1.2: (Adapted from Kim et al., 2008) An extended ES transcriptional network and regulatory circuit. (A) A transcriptional regulatory circuit maintaining pluripotency in ESCs, with five factors (Nanog, Oct4, Sox2, Dax1, and Klf4) showing an auto-regulatory mechanism. (B) Transcriptional regulatory circuit within the four Yamanaka reprogramming factors and their integration into the Nanog circuitry.
1.1.3 Non-coding RNAs modulate pluripotency by regulating expression of the core transcription factors and/or downstream genes
Non-coding RNAs (ncRNAs) are an important class of regulatory molecules that are changing our concept of gene regulation. In particular, microRNAs (miRNAs) are well-characterized regulators of gene expression. MiRNAs are short, approximately 22-nucleotide RNAs that attenuate gene expression post-transcriptionally by base-pairing with target mRNAs that have a sequence in the 3’ untranslated region (UTR) that imperfectly matches the six- to eight-nucleotide “seed sequence” of the miRNA. The miRNA then inhibits translation, or causes degradation of the mRNA by the RISC when there is perfect complementarity with the seed sequence (Brodersen et al., 2008; Farazi et al., 2008; Filipowicz et al., 2008), a phenomenon known as RNAi.
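The seed-pairing rule described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the seed is taken as nucleotides 2 to 8 from the 5’ end of the miRNA (a seven-nucleotide window within the six- to eight-nucleotide range mentioned above) and that only perfect Watson-Crick matches are reported. The sequences and the function names `seed_site` and `find_seed_matches` are hypothetical and are not taken from the methods of this thesis.

```python
# Toy illustration of miRNA seed matching in a 3' UTR.
# Assumptions (not from the thesis): seed = miRNA positions 2-8 (1-based),
# perfect Watson-Crick pairing only, sequences written 5'->3' in RNA alphabet.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA motif (5'->3') that pairs perfectly with the miRNA seed."""
    seed = mirna[start - 1:end]               # 1-based, inclusive positions
    return seed.translate(COMPLEMENT)[::-1]   # reverse complement of the seed


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed-match sites in the UTR."""
    site = seed_site(mirna)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping sites
    return hits


# Hypothetical example sequences, chosen only to exercise the functions.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCA"

print(seed_site(toy_mirna))
print(find_seed_matches(toy_mirna, toy_utr))
```

Real target prediction tools also weigh site context, conservation and pairing outside the seed, none of which is modeled in this sketch.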
Recent reports have identified miRNAs that were upregulated in differentiating ESCs and played a critical role in the control of pluripotency genes (Judson et al., 2009; Suh et al., 2004; Tay et al., 2008a; Tay et al., 2008b; Xu et al., 2009). miR-296, miR-470 and miR-134 were found to be up-regulated during retinoic acid-induced differentiation of mESCs, and were computationally predicted to target the protein-coding sequences (CDS) of the pluripotent core transcription factors Oct4, Sox2 and Nanog. Forced expression of the miRNAs led to ESC differentiation, as well as down-regulation of Oct4, Nanog and Sox2 at both the mRNA and protein levels (Tay et al., 2008a).
More recently, it was demonstrated that the forced expression of miRNAs alone could reprogram adult somatic cells to attain pluripotency (Anokye-Danso et al., 2011). The miR302/367 cluster, specifically expressed in undifferentiated ESCs and iPSCs, has been shown to be a direct target of Oct4 and Sox2, two of the most important pluripotency transcription factors that are also required for iPSC reprogramming. The authors also reported that miR367 expression was required for reprogramming and activated Oct4 gene expression. Together, studies on miRNAs in pluripotency point to an important role of miRNAs in regulating the transcriptional network of pluripotency.
In recent years, long non-coding RNAs (lncRNAs) became prominent in the research of ncRNAs, in part fueled by whole genomic and transcriptomic analyses and deep sequencing technologies (Chen et al., 2011; Guttman et al., 2009; Lin et al., 2011). LncRNAs are defined as RNA transcripts that are longer than 200 nucleotides, and have little or no protein coding capacity. In contrast to the small ncRNAs such as miRNAs, which are highly conserved and silence gene expression through specific base-pairing with target mRNAs, lncRNAs are poorly conserved and regulate gene expression by diverse mechanisms that are not entirely understood. The roles of lncRNAs in pluripotency will be described in Section 1.3.
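The working definition above (longer than 200 nucleotides, little or no protein-coding capacity) can be expressed as a simple filter. The sketch below is only a rough illustration: the 200-nucleotide length cut-off comes from the definition in the text, whereas the open reading frame (ORF) cut-off of 100 codons, the restriction to forward-strand ORFs closed by a stop codon, and the function names are assumptions introduced here, not criteria used in this thesis.

```python
# Toy filter for candidate lncRNAs based on transcript length and ORF size.
# Assumption (not from the thesis): "little or no coding capacity" is
# approximated as "no forward-strand ATG...stop ORF of max_orf_codons or more".

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest forward-strand ORF in codons (ATG included, stop excluded).

    Only ORFs that are closed by an in-frame stop codon are counted.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = 0  # codons in the currently open ORF (0 means none is open)
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current == 0:
                if codon == "ATG":
                    current = 1
            elif codon in STOPS:
                best = max(best, current)
                current = 0
            else:
                current += 1
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """True if the transcript is longer than min_length and lacks a long ORF."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


# Hypothetical sequences used only to exercise the filter.
short_transcript = "A" * 150
long_noncoding = "A" * 250
long_coding = "ATG" + "GCC" * 120 + "TAA"

for name, s in [("short", short_transcript),
                ("long, no ORF", long_noncoding),
                ("long, 121-codon ORF", long_coding)]:
    print(name, is_candidate_lncrna(s))
```

In practice, coding potential is usually assessed with dedicated tools and evolutionary evidence rather than ORF length alone, so this filter should be read as a simplification of the definition, not as an annotation method.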
1.2 Directed neural differentiation of hESCs
1.2.1 Neural induction – lessons from the embryo
One of the characteristics of pluripotent stem cells is the ability to differentiate into mature cell types of the three germ layers: ectoderm (including neural lineages), mesoderm and endoderm. However, the efficient differentiation of pluripotent stem cells into specific neural cell types requires exposure of the cells to the right culture conditions (an optimal concentration of growth factors, signaling molecules, inhibitors, and cell-cell signaling) for the right duration of time. Development of the nervous system can be divided roughly into three processes: neural induction, neurulation, and regional specification. Cues taken from the embryo during each of these processes would be useful in establishing efficient methods for in vitro neural differentiation.
Explant studies in Xenopus had shown that the Spemann organizer could induce a complete new dorsal axis, resulting in a twinned embryo with two distinct heads. The Spemann organizer is involved in the induction of neural tissue from the ectoderm by secreting bone morphogenetic protein (BMP) antagonists such as chordin, noggin and cerberus. BMP4 is pivotal in neural induction, as it inhibits cells from forming neural tissue (Finley et al., 1999). Therefore, inhibition of BMP signaling is crucial for initiating neural differentiation. However, double knockout of noggin and chordin in zebrafish did not prevent a neural plate from developing, although there were defects in the neural tube. This suggests that other signaling pathways, in addition to the inhibition of BMP signaling, could be involved in neural development. With insights from in vivo neural development, methods aimed at efficient neural induction have been reported, and are described below.
1.2.1.1 Inhibiting the TGF-β signaling pathway enhances neural induction
The transforming growth factor beta (TGF-β) signaling pathway is involved in many cellular processes, including cellular differentiation. The TGF-β superfamily of ligands includes the BMPs, growth and differentiation factors (GDFs), anti-Müllerian hormone (AMH), Activin, Nodal and the TGF-βs. The TGF-β superfamily ligands bind to a type II receptor, which sets off a phosphorylation cascade culminating in the eventual phosphorylation of receptor-regulated SMADs. The SMAD complexes then accumulate in the nucleus, where they act as transcription factors that regulate gene expression of their targets.
Noggin is a secreted factor expressed in the Spemann organizer. In Xenopus, the Spemann organizer induces neural tissue from dorsal ectoderm and dorsalizes lateral and ventral mesoderm. Noggin binds and inactivates the BMPs, and the inhibition of the BMP signaling pathway in the ectoderm is the hallmark of neural fate acquisition (Munoz-Sanjuan and Brivanlou, 2002). By recapitulating neural development, Lamb et al. (1993) and Valenzuela et al. (1995) showed that Noggin directly induced neural tissue, and was an endogenous neural inducer. Subsequently, recombinant Noggin has been used in several hESC neural induction protocols (Chambers et al., 2009; Elkabetz et al., 2008; Lee et al., 2007).
Recently, Dorsomorphin or Compound C was discovered to be a small molecule substitute for recombinant Noggin (Yu et al., 2008; Zhou et al., 2010). Dorsomorphin is an inhibitor of BMP signaling, identified in a screen for compounds that perturb dorsoventral axis formation in zebrafish, and selectively inhibits the BMP type I receptors ALK2, ALK3 and ALK6 (Yu et al., 2008). This eventually results in blockade of SMAD1/5/8 phosphorylation, and induces neural conversion of hESCs by suppressing endoderm, mesoderm and trophectoderm differentiation (Morizane et al., 2011; Zhou et al., 2010).
In addition, the small molecule SB431542 was shown to enhance neural induction by inhibiting the Lefty/Activin/TGF-β pathways through blocking phosphorylation of the ALK4, ALK5 and ALK7 receptors. The combined blockade of the TGF-β pathway using SB431542 in combination with Noggin or Dorsomorphin was reported to achieve very efficient neural induction (Chambers et al., 2009; Morizane et al., 2011).
1.2.1.2 Stromal co-culture induces neural conversion
It was discovered by Kawasaki and colleagues (2000) that the co-culture of ES cells with mouse PA6 stromal cells could induce efficient dopamine neuronal differentiation. They described the neuralizing activity conferred by the stromal cells as “stromal cell-derived inducing activity” or SDIA. Since then, research groups have attempted to unravel the basis of SDIA for efficient neural differentiation of hESCs (Swistowska et al., 2010; Vazin et al., 2009; Vazin et al., 2008). To this end, Vazin et al. (2008) found that PA6 cell surface activity was required for neural differentiation of hESCs, as the PA6-conditioned medium (PA6-CM) was ineffective in neural induction. In addition, it was reported that PA6-CM could induce dopaminergic differentiation in neural progenitors derived from hESCs, but not directly from hESCs, indicating that soluble factors, including Sonic hedgehog (SHH), secreted by the PA6 stromal cells act on neural precursor cells to specify a dopaminergic fate (Swistowska et al., 2010). SHH is a ligand of the Hedgehog signaling pathway, and a ventralizing signal required for the development of midbrain dopamine neurons.
In attempts to chemically define SDIA, a genome-wide expression analysis was performed to compare global gene expression differences between PA6 cells and various cell lines lacking the SDIA effect. Among the factors highly expressed by PA6 cells, Vazin et al. (2009) discovered that the combination of stromal cell-derived factor 1 (SDF-1/CXCL12), pleiotrophin (PTN), insulin-like growth factor 2 (IGF2) and ephrin B1 (EFNB1) induced undifferentiated hESCs to differentiate into dopamine neurons, mimicking the effects of SDIA.
1.2.2 Regional specification of midbrain dopamine neurons
After neuroectoderm induction, the embryo undergoes neurulation, in which the epithelial neural plate begins to furrow and the neural tube forms, which will eventually differentiate into the spinal cord and the brain, forming the central nervous system. Following neural tube closure, patterning is important for determining regional cell fate, and this is mediated by signaling between adjacent cells by cell-surface proteins, and by gradients of signaling molecules. The neural tube becomes regionalized along the antero-posterior (AP) axis, as well as the dorso-ventral (DV) axis.
Cell fate decisions along the DV and AP axes are dictated by factors from signaling and organizing centers (Rowitch and Kriegstein, 2010). The formation of midbrain (mesencephalic) dopamine neurons is directed by diffusible signals from the notochord, floor plate, and isthmic organizer. Sonic hedgehog (SHH), secreted by the notochord and floor plate, and fibroblast growth factor 8 (FGF8), secreted by the isthmic organizer, are key molecules involved in dopaminergic differentiation (Ye et al., 1998).
SHH is secreted initially by the notochord, then the floor plate, and antagonizes the BMPs secreted by the roof plate, thereby creating a SHH gradient along the DV axis (Figure 1.3A). In response to SHH signaling, several homeodomain genes are expressed in a dose-dependent manner. A strong SHH signal ventralizes neural tissues, resulting in expression of ventral genes NKX6.1 and NKX2.2.
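The dose-dependent read-out of a morphogen gradient can be made concrete with a toy model. The sketch below is purely illustrative: the exponential shape of the gradient, the decay length, the activation threshold and the function names `shh_level` and `ventral_genes_on` are all hypothetical choices made here, and none of the numbers are measured values or taken from the cited studies.

```python
# Toy model of a dose-dependent response to a SHH gradient along the DV axis.
# All parameters are hypothetical; position 0.0 = floor plate (ventral source),
# position 1.0 = roof plate (dorsal end), in arbitrary normalized units.
import math


def shh_level(position: float, source: float = 1.0,
              decay_length: float = 0.25) -> float:
    """SHH concentration at a normalized DV position (exponential decay)."""
    return source * math.exp(-position / decay_length)


def ventral_genes_on(position: float, threshold: float = 0.5) -> bool:
    """True where the SHH signal is strong enough to switch on ventral genes.

    Stands in for the statement that a strong SHH signal results in
    expression of ventral genes such as NKX6.1 and NKX2.2.
    """
    return shh_level(position) >= threshold


for pos in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"position {pos:.2f}: SHH {shh_level(pos):.3f}, "
          f"ventral genes on: {ventral_genes_on(pos)}")
```

A single threshold is used for simplicity; several thresholds at different concentrations would be needed to represent distinct progenitor domains.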
Along the AP axis, FGF8 is secreted by the isthmic organizer at the midbrain-hindbrain boundary (Figure 1.3B). Transgenic mouse analyses, quail-chick chimeras, and FGF8 delivery by ligand-soaked beads have demonstrated that FGF8 signaling is critical to midbrain patterning (Crossley et al., 1996; Lee et al., 1997). In the midbrain, FGF8 shifts the isthmus forward and converts midbrain into hindbrain, while in the caudal diencephalon, FGF8 induces an ectopic isthmic organizer which converts the surrounding diencephalon into midbrain tissue. The difference in activity is again due to the dose-dependent response to the classic diffusible morphogen FGF8. A strong FGF8 signal activates the extracellular signal-regulated kinase (ERK) pathway to induce cerebellar development. On the other hand, a lower level of FGF8 signaling induces midbrain development (Sato and Nakamura, 2004), by inducing expression of midbrain transcription factors such as EN1/2, PAX2 and WNT1.
Efficient differentiation can be achieved by mimicking developmental processes. By recapitulating the process of regional specification of the midbrain through addition of the right concentrations of SHH and FGF8, several groups reported the efficient generation of dopamine neurons from embryonic stem cells (Cho et al., 2008; Iacovitti et al., 2007; Kriks et al., 2011; Perrier et al., 2004).
Figure 1.3: (Modified from Rowitch and Kriegstein, 2010) Patterning of the neural tube generates unique domains for neuronal progenitors. (A) The primitive neuroepithelium of the neural tube is patterned by organizing signals. These signals emanate from the ventral floor plate (such as SHH, purple) and the roof plate (such as BMPs, green). The two signals with opposing actions from two ends of the dorso-ventral axis create a SHH gradient which results in specification of neuronal types. (B) Along the antero-posterior axis, FGF8 is secreted by the isthmic organizer or the rostral centre. Midbrain dopamine neurons are specified at the region where FGF8 and SHH signals intersect each other.
1.2.3 Radial glia cells are neuronal progenitors in vivo
During development, radial glia cells function as neural progenitors (Malatesta et al., 2000; Noctor et al., 2001), as well as a scaffold along which nascent neurons migrate (Poluch and Juliano, 2007). In vivo evidence also suggests that radial glia cells serve as neuronal progenitors in all regions of the central nervous system including the mesencephalon (Anthony et al., 2004; Hebsgaard et al., 2009), which gives rise to dopaminergic neurons.
Radial glia cells differentiate from neuroepithelial cells; they therefore represent more fate-restricted progenitors than neuroepithelial cells, and successively replace them (Gotz and Huttner, 2005). As a result, most of the neurons in the brain are derived from radial glia cells. The neuroepithelial properties retained by radial glia include expression of “neural stem cell markers” such as NESTIN, MUSASHI1 (MSI1) and SOX2. Neurogenic radial glia cells also express PAX6 (Haubst et al., 2004; Heins et al., 2002). However, in contrast to neuroepithelial cells, radial glia cells show several astroglial properties, such as expression of the astrocyte-specific glutamate transporter (GLAST), glial fibrillary acidic protein (GFAP), brain-lipid-binding protein (BLBP), and vimentin (Figure 1.4). In terms of potential, while neuroepithelial cells can differentiate into neurons, astrocytes and oligodendrocytes, most radial glia are restricted to the generation of a single cell type (reviewed in Gotz and Huttner, 2005). Therefore, it appears that differentiation of pluripotent stem cells into a specific radial glia population may result in more efficient neuronal differentiation.
Figure 1.4: The differentiation of pluripotent stem cells into neuroepithelial stem cells and radial glia. Differentiation of hESCs or iPSCs into neuroepithelial cells requires the inhibition of the BMP and TGFβ pathways. Subsequently, neuroepithelial cells may differentiate into radial glia cells, with the loss of the neuroepithelial marker SOX1 and the simultaneous gain of the astroglial markers GFAP, BLBP, VIMENTIN and GLAST.
1.3 Long non-coding RNAs in biology
Transcription is widespread in the mammalian genome. Large-scale cDNA cloning projects (Carninci et al., 2005; Katayama et al., 2005), genomic tiling arrays (Bertone et al., 2004; Birney et al., 2007; Kapranov et al., 2007), and more recently deep transcriptome sequencing (Chen et al., 2011; Khalil et al., 2009) indicate that transcription is not limited to protein-coding regions. Although non-coding transcripts were once thought to be transcriptional “noise” or spurious transcription, it is becoming increasingly clear that these non-coding RNAs, including long non-coding RNAs (lncRNAs), are functional.
LncRNAs are defined as RNA transcripts that are longer than 200 nucleotides and have little or no protein-coding capacity. They are transcribed by either RNA Polymerase II (RNA Pol II) or RNA Pol III. Some lncRNAs resemble mRNAs in that they are spliced, capped at the 5’ end and polyadenylated at the 3’ end, except that they do not code for proteins. Depending on their proximity to protein-coding genes in the genome, lncRNAs can be categorized into five classes: sense, antisense, bidirectional, intronic and intergenic. A sense lncRNA overlaps the coding sequence (CDS) of a protein-coding gene on the sense strand and is transcribed in the same direction. Similarly, an antisense lncRNA overlaps the CDS of a protein-coding gene, but is transcribed in the opposite orientation. A bidirectional lncRNA is located on the strand opposite a protein-coding gene, with its transcription initiated less than 1 kb away. Intronic lncRNAs are derived entirely from an intron of another transcript, and intergenic lncRNAs are not located near any protein-coding locus (that is, they lie more than 10 kb away). The category of a lncRNA is not an indicator of its function. However, it was found that a specific subset of functional intergenic lncRNAs, termed large intergenic non-coding RNAs or lincRNAs, is defined by epigenetic marks: trimethylation of lysine 4 of histone H3 (H3K4me3) at their promoters, and trimethylation of lysine 36 of histone H3 (H3K36me3) along the length of the transcribed region (Guttman et al., 2009).
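The five positional classes described above can be expressed as a small decision procedure. The following Python sketch is illustrative only and is not a method used in this thesis: the `Transcript` type, the helper names, the order in which the rules are tested, and the simplification of treating a coding gene's exons as its CDS are all assumptions. Only the 1 kb and 10 kb thresholds come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in base pairs


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (an assumption for this sketch)."""
    chrom: str
    strand: str             # "+" or "-"
    exons: List[Interval]   # sorted by start; for coding genes, treated as CDS

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    @property
    def tss(self) -> int:
        # Transcription start site: left end on "+", right end on "-".
        return self.start if self.strand == "+" else self.end


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def introns(t: Transcript) -> List[Interval]:
    return [(t.exons[i][1], t.exons[i + 1][0]) for i in range(len(t.exons) - 1)]


def gap(a: Transcript, b: Transcript) -> int:
    """Distance between two transcript spans; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def classify(lnc: Transcript, coding_genes: List[Transcript]) -> str:
    nearby = [g for g in coding_genes if g.chrom == lnc.chrom]
    # Sense / antisense: any lncRNA exon overlaps coding sequence.
    for g in nearby:
        if any(overlaps(e, c) for e in lnc.exons for c in g.exons):
            return "sense" if g.strand == lnc.strand else "antisense"
    # Intronic: the whole lncRNA lies inside one intron of another transcript.
    for g in nearby:
        if any(i[0] <= lnc.start and lnc.end <= i[1] for i in introns(g)):
            return "intronic"
    # Bidirectional: opposite strand, start sites less than 1 kb apart.
    for g in nearby:
        if g.strand != lnc.strand and abs(g.tss - lnc.tss) < 1_000:
            return "bidirectional"
    # Intergenic: more than 10 kb from every protein-coding locus.
    if all(gap(lnc, g) > 10_000 for g in nearby):
        return "intergenic"
    return "unclassified"  # e.g. 1-10 kb from a gene; not covered by the five classes
```

A transcript that lies between 1 kb and 10 kb from a gene without meeting any other rule falls outside the five classes as stated, so the sketch reports it as "unclassified" rather than forcing a label.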
1.3.1 Long non-coding RNAs in pluripotency
Recent studies in mESCs suggest that lncRNAs are integral members of the ES cell regulatory network (Dinger et al., 2008; Guttman et al., 2011; Sheik Mohamed et al., 2010). In a study by Dinger et al. (2008), global expression of lncRNAs during a 16-day differentiation course was profiled by means of a customized lncRNA microarray. LncRNAs associated with pluripotency exhibited expression profiles that correlated well with those of OCT4, NANOG and SOX2, which are core components of the transcriptional network that maintains pluripotency (Chapter 1.1.1; Figure 1.5). The functions of these lncRNAs were, however, not established in the study.
Figure 1.5: (Adapted from Dinger et al., 2008) Correlation of expression profiles of lncRNAs with protein-coding gene markers during embryoid body (EB) differentiation. Genes with well-characterized roles in pluripotency (Pou5f1 or Oct4, Sox2 and Fbxo15) were used to identify lncRNAs with correlated expression profiles (Pearson’s coefficient > 0.9).
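The correlation filter named in the caption of Figure 1.5 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline of Dinger et al. (2008): the function names, the example time-course values and the lncRNA labels are hypothetical, and only the threshold of 0.9 is taken from the caption.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)


def correlated_lncrnas(
    marker_profile: Sequence[float],
    lncrna_profiles: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Names of lncRNAs whose profile tracks the marker above the threshold."""
    return sorted(
        name
        for name, profile in lncrna_profiles.items()
        if pearson(marker_profile, profile) > threshold
    )


# Hypothetical expression values over five time points of differentiation.
marker = [10, 8, 5, 2, 1]            # a pluripotency marker that declines
profiles = {
    "lnc_A": [20, 16, 10, 4, 2],     # declines in step with the marker
    "lnc_B": [1, 2, 5, 8, 10],       # rises as the marker falls
    "lnc_C": [5, 1, 5, 1, 5],        # unrelated fluctuation
}
candidates = correlated_lncrnas(marker, profiles)
```

With these made-up values only `lnc_A` passes the filter; a lncRNA that rises during differentiation, such as `lnc_B`, is anti-correlated with the marker and is excluded.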
In another large-scale, systematic study of lncRNAs in mESC pluripotency, Guttman et al. (2011) performed loss-of-function experiments on each of the 226 lincRNAs known to be expressed in mESCs. Of the 226 lincRNA promoters, approximately 75% are bound by at least one of nine pluripotency-associated transcription factors (Oct4, Sox2, Nanog, c-Myc, n-Myc, Klf4, Zfx, Smad and Tcf3). From the loss-of-function experiments, the authors identified 26 lincRNAs that maintain the pluripotent state. Knockdown of these lincRNAs resulted in a loss of pluripotency markers and reduced Nanog promoter activity. At the same time, knockdown of 30 lincRNAs produced expression patterns similar to differentiation into specific lineages, suggesting that lincRNAs repress differentiation programs in mESCs. Taken together, this indicates that lincRNAs are integral members of the pluripotency network, with key pluripotency transcription factors regulating lincRNA expression, which in turn modulates pluripotency and lineage-specific differentiation pathways (Figure 1.6).
Figure 1.6: (Adapted from Guttman et al., 2011) A model for lincRNA integration into the molecular circuitry of embryonic stem cells. ES-specific lincRNAs are bound and regulated by pluripotency transcription factors. The lincRNAs then form specific lincRNA-protein complexes that promote pluripotency and simultaneously repress differentiation.
1.3.2 Long non-coding RNAs in neural development
Long non-coding RNAs are abundant in the brain, and a study scrutinizing in situ hybridization data from the Allen Brain Atlas discovered that some lncRNAs are specifically expressed in particular regions of the brain. These lncRNAs were also localized in specific compartments of neural cells, which indicated biological meaning and function (Mercer et al., 2008). Another recent study demonstrated that 169 lncRNAs were differentially expressed during the differentiation of mouse neural progenitors into GABA neurons and oligodendrocytes (Mercer et al., 2010). These dynamically regulated lncRNAs were associated with key neural developmental protein-coding genes, indicating a possible role in neuronal-glial fate decisions. Although many lncRNAs have been predicted to be involved in neural development, only a handful have been functionally characterized. Some of these lncRNAs are described below.
1.3.2.1 Nkx2.2AS
Nkx2.2AS is an endogenous 4.3 kb transcript antisense to Nkx2.2, and is localized in the cytoplasm. Nkx2.2AS is also polyadenylated, but does not code for a protein. Tochitani and Hayashizaki (2008) showed that overexpressing Nkx2.2AS in mouse neural stem cells enhanced differentiation into the oligodendrocytic lineage, possibly by upregulating Nkx2.2. Nkx2.2 is a transcription factor known to direct the differentiation of neural stem cells into oligodendrocytes (Guillemot, 2007).
1.3.2.2 Evf2
The 3.8 kb polyadenylated Evf2 lncRNA is transcribed from the intergenic region between the Dlx5 and Dlx6 loci, and overlaps the conserved Dlx5/6 intergenic enhancer. Evf2 is a downstream target of SHH signaling in the developing telencephalon, exhibiting trans-acting transcriptional cooperativity with DLX homeodomain proteins and increasing Dlx5/6 enhancer activity in a neural stem cell line (Feng et al., 2006).
In the developing ventral forebrain, Evf2 recruits Dlx and Mecp2 transcription factors to regulate Gad1 expression through trans-acting mechanisms. Gad1, or glutamate decarboxylase 1, is a gamma-aminobutyric acid (GABA) neuron marker responsible for catalyzing the production of GABA from L-glutamic acid. Consequently, Evf2-deficient mice showed reduced numbers of GABA interneurons in the dentate gyrus and hippocampus (Bond et al., 2009).
1.3.2.3 Malat1
Malat1 is a 6.7 kb nuclear-retained lncRNA that is highly abundant in neurons. DNA microarray analysis in Malat1-depleted cells indicated that Malat1 controls genes involved in synapse function. Bernard et al. (2010) performed knockdown of Malat1 in cultured hippocampal neurons, which resulted in decreased synaptic density, whereas increased synaptic density was observed following overexpression of Malat1, indicating that Malat1 regulates synapse formation in neurons.
1.4 Molecular mechanisms of long non-coding RNA function
Unlike small ncRNAs such as miRNAs, siRNAs and piRNAs, which are highly conserved and involved in transcriptional and posttranscriptional gene silencing through specific base pairing with their targets, lncRNAs are a heterogeneous group of molecules that exhibit poor conservation and regulate gene expression in many ways. Earlier reports on lncRNAs support a role for the transcripts acting in cis, regulating the expression of genes in the vicinity of the lncRNA (Mercer et al., 2008; Ponjavic et al., 2009; Tochitani and Hayashizaki, 2008). However, numerous studies on lncRNAs in recent years have shed light on the trans-acting mechanisms of lncRNAs, which are discussed below.
The molecular function of lncRNAs depends on the cellular localization of the non-coding transcripts. Nuclear-retained lncRNAs have been reported to interact with chromatin-modifying complexes and transcription factors in the nucleus (Khalil et al., 2009), modulate splicing by associating with splicing proteins (Bernard et al., 2010; Tripathi et al., 2010) and interact with chromatin (Chu et al., 2011; Sarma et al., 2010). Cytoplasmic lncRNAs, on the other hand, have been known to mediate mRNA decay (Gong and Maquat, 2011) or act as competing endogenous RNAs (ceRNAs) (Salmena et al., 2011). As more lncRNAs have been identified, numerous paradigms of their molecular function are beginning to emerge (Wang and Chang, 2011), and a few are described in detail below.
Figure 1.7: (Wilusz et al., 2009) Paradigms for how lncRNAs function at the molecular level. Transcription from an upstream non-coding region (orange) can [1] negatively or [2] positively affect expression of the downstream protein-coding gene (blue) by inhibiting RNA Pol II recruitment or inducing chromatin remodeling, respectively. [3] An antisense lncRNA (purple) is able to hybridize to the overlapping sense mRNA (blue) and block the recognition of splice sites by the spliceosome, thereby modulating alternative splicing patterns. [4] Alternatively, hybridization of the sense mRNA and antisense lncRNA allows Dicer to generate endogenous siRNAs that mediate gene silencing. By binding to specific protein partners, a lncRNA (green) can [5] modulate protein activity, [6] serve as a scaffold which allows a larger RNA-protein complex to form, or [7] alter protein localization. [8] LncRNAs (pink) can also be processed to yield small RNAs, such as miRNAs, siRNAs or piRNAs. [9] Cytoplasmic lncRNAs can also act as competing endogenous RNAs (ceRNAs), which compete for shared miRNAs, thereby imposing an additional level of posttranscriptional regulation.
1.4.1 LncRNAs behave as scaffolds that target protein complexes to specific genomic loci to regulate gene transcription
A high-throughput RNA immunoprecipitation study using antibodies against proteins involved in chromatin remodeling (such as PRC2, CoREST, SMCX) provided the first evidence that large numbers of lncRNAs in cells are physically bound by chromatin-modifying complexes. In addition, knockdown of PRC2-associated lncRNAs resulted in the activation of genes known to be repressed by PRC2, implying that lncRNAs function in regulating the epigenetic landscape at specific gene loci (Khalil et al., 2009). LncRNAs such as HOTAIR and KCNQ1OT1 are known to recruit chromatin-modifying complexes, including the PRC2 complex, G9a and DNMT1, to specific genomic loci to regulate transcription (Chu et al., 2011; Mohammad et al., 2010; Pandey et al., 2008; Rinn et al., 2007).
Further studies on the well-characterized lncRNA HOTAIR revealed that it serves as a scaffold for at least two distinct histone modification complexes. The 5’ domain of HOTAIR binds the PRC2 polycomb repressive complex, while a 3’ domain is bound by the LSD1/CoREST/REST complex (Tsai et al., 2010). With its ability to associate with two distinct protein complexes, and thereby coordinate the targeting of PRC2 and LSD1 to chromatin for coupled H3K27 methylation and H3K4 demethylation, HOTAIR may function as a modular scaffold to which protein complexes bind. Subsequent studies suggest that the modular scaffold hypothesis is not restricted to HOTAIR. Guttman et al. (2011), in their study of lincRNAs in mESCs, suggested that the mESC lincRNAs may also function as scaffolds by interacting with multiple different protein complexes, forming cell-type-specific RNA-protein complexes that coordinate gene expression in a specific manner.
1.4.2 LncRNAs with enhancer-like functions
Following the observation that some lncRNAs tend to act in cis, affecting expression of genes neighboring the lncRNAs, Orom et al. (2010) presented evidence indicating that lncRNAs are associated with enhancer regions, and that such non-coding transcription correlates with increased activity of the neighboring genes. Depletion of these enhancer-like lncRNAs led to decreased expression of their neighboring protein-coding genes, including several master regulators of cellular differentiation.
In mouse neurons, it was found that upon membrane depolarization, the number of p300/CBP binding sites increased from fewer than 1,000 to 28,000 (Kim et al., 2010). CBP binding is a mark of enhancer regions. At promoters, CBP recruits components of the basal transcriptional machinery, including RNA Pol II, which facilitates transcription. Kim et al. (2010) found that CBP binding at enhancer regions also resulted in RNA Pol II assembly, and transcription of enhancer RNAs (eRNAs) was detected. The level of eRNA expression positively correlated with the level of mRNA synthesis at nearby genes, suggesting that eRNA transcription occurs specifically at enhancers that are actively engaged in promoting transcription.
1.4.3 LncRNAs regulate gene expression by behaving as competing endogenous RNAs or promoting mRNA decay
LncRNAs can control gene expression posttranscriptionally by base-pairing with other RNA molecules. Gong and Maquat (2011) reported that Staufen 1-mediated mRNA decay (SMD) can be mediated by imperfect base-pairing between an Alu element in the 3’ UTR of an SMD target and another Alu element in a cytoplasmic, polyadenylated lncRNA. SMD involves the degradation of mRNAs whose 3’ UTRs bind to Staufen 1, a protein that binds double-stranded RNA. Through this mechanism, an individual lncRNA can downregulate a subset of SMD targets, and distinct lncRNAs can downregulate the same SMD target.
Yet another novel molecular function of lncRNAs is that of competing endogenous RNAs (ceRNAs). Cytoplasmic ceRNAs regulate gene expression by “sponging up” miRNAs, that is, by competing with mRNAs for common microRNAs (Cesana et al., 2011; Tay et al., 2011). MicroRNAs are small (about 22 nt long) RNAs that negatively regulate target gene expression by base-pairing with target transcripts and directing them for cleavage or translational inhibition. One example of such a lncRNA is linc-MD1, which controls muscle differentiation by depleting miR-133 and miR-135 in the cells to regulate the expression of MAML1 and MEF2C, which are transcription factors that activate muscle-specific gene expression (Cesana et al., 2011).
1.4.4 Other molecular functions of lncRNAs
The functions of lncRNAs are diverse, and three of the most distinctive mechanisms are described above. Other possible functions of lncRNAs include serving as small RNA precursors, or as part of cellular structural components, as with the lncRNAs in nuclear paraspeckles (Souquere et al., 2010). LncRNAs such as Malat1 also participate in modulating alternative splicing (Tripathi et al., 2010). Others, such as Evf2, modulate the activity of their protein partners (Feng et al., 2006).
Chapter II – Aims and Objectives
2.1 Main goals of the thesis
The central aim of this thesis involves two important characteristics of hESCs: their ability to self-renew, and their ability to differentiate into specific cell types when exposed to certain culture conditions. While the pluripotency transcriptional regulatory network in hESCs is well established, it is not completely understood how lncRNAs regulate the transcriptional network to maintain the ES state. At the same time, it is not known whether lncRNAs play a role in the specification of neuronal differentiation of hESCs. Therefore, the main objectives are to identify functional lncRNAs in human pluripotency and neural development, and to elucidate the molecular mechanisms underlying their biological functions. As such, I set out to address the following questions in this thesis:
1. Does the lncRNA transcriptome change as hESCs differentiate into neural progenitors and eventually into specific neurons?
2. Is there a group of hESC-specific lncRNAs that is required for the maintenance of pluripotency, and how do these lncRNAs function to achieve their role in hESCs?
3. Do the hESC-specific lncRNAs interact with pluripotency-related proteins, and how do the lncRNAs integrate into the ES gene regulatory network?
4. Similarly, are the lncRNAs upregulated during neuronal differentiation important for neurogenesis?