Chapter V – Identification of Long Non-coding RNAs Associated with Pluripotency and Neural Differentiation
LncRNAs are emerging players in embryogenesis and in developmental processes (Amaral and Mattick, 2008; Dinger et al., 2008). Recent studies in embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) indicate that lncRNAs are integral members of the ESC self-renewal regulatory circuit (Guttman et al., 2011; Sheik Mohamed et al., 2010). In addition, Loewer et al. (2010) showed that a large intergenic non-coding RNA (lincRNA), lincRNA-RoR, enhanced the reprogramming of fibroblasts into iPSCs. LncRNAs such as MALAT1, Evf2 and
[...] al., 2010; Bond et al., 2009; Rapicavoli et al., 2010; Tochitani and Hayashizaki, 2008). LncRNAs are also dynamically expressed during neuronal-glia fate specification, and they appear to regulate the expression of protein-coding genes within the same genomic locus, suggesting that these lncRNAs are functional (Mercer et al., 2010). Additional evidence suggesting functional roles of lncRNAs in the brain includes a computational analysis of in situ hybridization data from the Allen Brain Atlas, which identified 849 lncRNAs showing specific expression in the mouse brain (Mercer et al., 2008). Furthermore, neural lncRNAs have been shown to be regulated by transcription factors (Johnson et al., 2009) and epigenetic processes (Mercer et al., 2010). So far, most efforts aimed at understanding lncRNA functions in pluripotency and neural differentiation have focused on the mouse as a model system (Dinger et al., 2008; Guttman et al., 2011; Mercer et al., 2010; Sheik Mohamed et al., 2010; Tochitani and Hayashizaki, 2008). To date, the roles of lncRNAs in human embryonic and neural developmental gene networks have not been investigated. Given the generally poor evolutionary conservation of lncRNAs (Pang et al., 2006), there is a clear need to investigate whether lncRNAs are also important in human embryonic and neuronal developmental networks.
The enriched and highly homogeneous cultures of neural progenitors and neurons derived from hESCs provided an ideal source of cells for expression profiling to identify lncRNAs that are necessary for pluripotency and neural development. I hypothesized that if the lncRNAs were important in pluripotency, they should be highly expressed in undifferentiated hESCs and downregulated upon differentiation. Likewise, lncRNAs involved in neuronal differentiation would probably be silenced
[...] describe the identification of lncRNAs possibly involved in pluripotency or neurogenesis by means of microarray expression profiling.
5.2 Results
5.2.1 Microarray expression profiling identifies differentially expressed lncRNAs
The highly enriched cultures of NPCs and neurons described in Chapter IV, together with undifferentiated hESCs, were used for global expression profiling, to examine gene expression changes as hESCs differentiate into NPCs and subsequently into neurons. For this purpose, two types of microarrays were utilized. A custom-designed microarray was used for detecting lncRNA transcripts, while an Illumina beadchip microarray was used for protein-coding transcripts.
The lncRNA microarray design included 6671 transcripts identified in a number of published sources, and described in a previous publication (Jia et al., 2010). Importantly, the non-coding status of these transcripts was independently validated in that study. In total, the microarray contained 43800 probes such that each lncRNA was represented by 6 to 8 probes, which achieved high sensitivity and specificity.
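As a quick consistency check on the array design figures quoted above, the short sketch below confirms that 43800 probes is compatible with 6671 transcripts carrying 6 to 8 probes each. It assumes that all 43800 probes target lncRNAs (the text does not say whether control probes are included in that total), and the variable names are illustrative only.

```python
# Figures quoted in the text for the custom lncRNA array.
n_transcripts = 6671
n_probes = 43800
min_per_transcript, max_per_transcript = 6, 8

# Average number of probes per transcript.
mean_probes = n_probes / n_transcripts

# Smallest and largest probe totals allowed by "6 to 8 probes each".
lower = min_per_transcript * n_transcripts
upper = max_per_transcript * n_transcripts

# Assumption: every probe on the array targets one of the lncRNAs.
consistent = lower <= n_probes <= upper

print(f"mean probes per lncRNA: {mean_probes:.2f}")
print(f"allowed total: {lower} to {upper}; consistent: {consistent}")
```

Under that assumption the average is about 6.6 probes per transcript, which sits inside the stated 6 to 8 range.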
To summarize the microarray findings, comparing the NPC to hESC stages, we found that 25% of protein-coding probes were detected above background (6153 out of 24526) and that 4500 probes (18%) were significantly differentially detected (FDR < 0.01; fold change > 2). Of the lncRNA subset, 16% of probes were detected above
[...] (p < 0.05; fold change > 2). When the DA neuron stage was compared to the NPC stage, 24% of protein-coding probes were detected above background (5852 out of 24526), with 13% of all protein-coding probes (3076 probes) being differentially detected. Similarly, a smaller percentage (11.5%) of lncRNA probes (5058 probes) was expressed above background, with 6% being differentially expressed (2622 probes). Altogether, a total of 5051 differentially regulated mRNAs and 934 differentially regulated lncRNAs were identified (Figure 5.1).
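The percentages in this summary can be reproduced from the probe counts. The sketch below is only an arithmetic check: it assumes that every percentage uses the total number of probes on the relevant array as its denominator (24526 protein-coding probes, and the 43800 probes of the custom lncRNA array), which is the reading under which the quoted figures agree with the counts.

```python
PROTEIN_CODING_PROBES = 24526  # total quoted in the text
LNCRNA_PROBES = 43800          # total probes on the custom lncRNA array

def pct(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

# (description, probe count, assumed denominator)
rows = [
    ("NPC vs hESC, mRNA above background", 6153, PROTEIN_CODING_PROBES),
    ("NPC vs hESC, mRNA differential", 4500, PROTEIN_CODING_PROBES),
    ("neuron vs NPC, mRNA above background", 5852, PROTEIN_CODING_PROBES),
    ("neuron vs NPC, mRNA differential", 3076, PROTEIN_CODING_PROBES),
    ("neuron vs NPC, lncRNA above background", 5058, LNCRNA_PROBES),
    ("neuron vs NPC, lncRNA differential", 2622, LNCRNA_PROBES),
]

for label, count, total in rows:
    print(f"{label}: {count}/{total} = {pct(count, total):.1f}%")
```

Run as written, this gives 25.1%, 18.3%, 23.9%, 12.5%, 11.5% and 6.0%, matching the rounded values in the text.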
As a further confirmation that the neural cell types derived from hESCs were expressing neural genes, a gene ontology (GO) analysis of the protein-coding genes upregulated in the neurons compared to undifferentiated hESCs was performed. This indicated an enrichment of GO terms related to neuronal differentiation and is presented in Table 5.1.
Table 5.1: Genes expressed in H1-derived neurons were highly enriched for Gene Ontology terms relating to neuronal differentiation. The top 10 terms are shown.
No.   Gene Ontology Biological Process                  GO Term       Percentage of Genes
6     central nervous system development                GO:0007417    9.07
7     cell morphogenesis involved in differentiation    GO:0000904    8.54
9     negative regulation of biosynthetic process       GO:0009890    6.41
10    positive regulation of gene expression            GO:0010628    6.23
Gene clusters were categorized into biological processes at levels 6-9 when analyzed with FatiGO (P-value < 0.01).
Figure 5.1: Microarray expression profiling identified differentially expressed lncRNAs during neural differentiation of hESCs. Amplified total RNA from undifferentiated hESCs (ES), derived neural progenitors (NPC) and derived neurons (N) was hybridized onto the coding (mRNA) microarray and the lncRNA microarray simultaneously. Expression profiling identified 5051 differentially expressed mRNAs and 934 differentially expressed lncRNAs. LncRNAs that are important for maintenance of stem cell identity should be highly expressed in undifferentiated hESCs (ES), while lncRNAs important for neuronal differentiation should be upregulated in the neurons.
5.2.2 Identification of lncRNAs associated with pluripotency (pluripotent lncRNAs)
One hypothesis is that lncRNA transcripts important for hESC pluripotency maintenance would have an expression pattern similar to that of known pluripotency drivers such as OCT4, NANOG, and ZNF206, which are highly expressed in undifferentiated hESCs and downregulated upon differentiation (Figure 5.2A). To identify lncRNAs that may control pluripotency, I filtered for lncRNA transcripts that had at least 4 probes showing a greater than 5-fold downregulation (p < 0.05) upon differentiation from hESCs to NPCs. Thirty-six lncRNAs were identified (Figure 5.2B and Table 5.2), including the telomerase RNA component TERC (Agarwal et al., 2010), indicating that our custom-designed array was able to identify pluripotency-associated lncRNAs.
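The probe-level criterion described above (at least 4 probes, each with more than 5-fold downregulation and p < 0.05) can be sketched as follows. This is a minimal illustration rather than the analysis pipeline actually used: the data layout, the function names and the toy transcripts lnc_A to lnc_C are all hypothetical.

```python
from typing import Dict, List, Tuple

# One probe = (fold downregulation from hESC to NPC, p-value).
Probe = Tuple[float, float]

def count_passing(probes: List[Probe], min_fold: float, alpha: float) -> int:
    """Count probes exceeding the fold threshold with p below alpha."""
    return sum(1 for fold, p in probes if fold > min_fold and p < alpha)

def select_transcripts(data: Dict[str, List[Probe]],
                       min_probes: int = 4,
                       min_fold: float = 5.0,
                       alpha: float = 0.05) -> List[str]:
    """Return transcripts with at least min_probes passing probes."""
    return sorted(name for name, probes in data.items()
                  if count_passing(probes, min_fold, alpha) >= min_probes)

# Hypothetical probe-level values for three transcripts.
toy = {
    "lnc_A": [(8.1, 0.001), (6.3, 0.01), (5.5, 0.02),
              (7.0, 0.003), (2.0, 0.2), (9.4, 0.001)],   # 5 probes pass
    "lnc_B": [(8.1, 0.001), (6.3, 0.01), (5.5, 0.2),
              (4.0, 0.003), (2.0, 0.2), (1.1, 0.6)],     # 2 probes pass
    "lnc_C": [(5.2, 0.04), (5.1, 0.04), (5.3, 0.03),
              (5.6, 0.01), (1.0, 0.9), (1.2, 0.8)],      # 4 probes pass
}

print(select_transcripts(toy))  # ['lnc_A', 'lnc_C']
```

Requiring several concordant probes per transcript, rather than a single probe, reduces the chance that one noisy probe alone flags a transcript.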
Figure 5.2: Identification of lncRNAs important in pluripotency, neural induction, and neuronal differentiation. (A) The expression profiles of known pluripotency markers in ES, NPC and N stages. (B) To identify pluripotency lncRNAs, we filtered for transcripts that showed at least 5-fold downregulation upon differentiation. These lncRNAs show an expression pattern similar to that of known pluripotency genes. (C, D) NPC lncRNAs were identified by at least 3-fold enrichment in NPC compared to ES or N. This yielded an expression profile similar to that of known NPC markers such as NOTCH1 and PAX6. (E, F) Neuronal lncRNAs in this study were defined as transcripts enriched by at least 3-fold in N, relative to ES and NPC.
Table 5.2: List of the 36 pluripotency lncRNAs identified from the custom lncRNA array. These lncRNAs were expressed at least 5-fold higher in hESCs than in NPCs.
5.2.3 Identification of lncRNAs associated with neural progenitors (NPC lncRNAs)
Apart from pluripotent lncRNAs, transcripts whose expression peaked in the NPC stage of neural differentiation were also detected. Since these lncRNAs showed the highest level of transcription in the neural progenitors, they were referred to as NPC lncRNAs. These NPC lncRNAs were defined as transcripts having at least 4 probes showing greater than 3-fold higher expression in NPCs than in both hESCs and neurons, and they mirrored the expression of genes known to be important in the maintenance of neural stem cell identity, such as PAX6, POU3F2 and SOX1 (Figures 5.2C-D).
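A stage-enrichment criterion of this kind can be written as a single test applied to every probe. The sketch below assumes log2-transformed intensities (so a 3-fold difference corresponds to log2(3), about 1.585) and a hypothetical per-probe dictionary keyed by stage; neither detail is specified in the text.

```python
import math

LOG2_THREEFOLD = math.log2(3.0)  # 3-fold expressed as a log2 difference

def is_stage_enriched(probes, stage, min_probes=4,
                      min_log2_fold=LOG2_THREEFOLD):
    """True if at least min_probes probes are higher in `stage` than in
    every other stage by more than min_log2_fold (log2 intensities)."""
    hits = 0
    for probe in probes:
        others = [value for name, value in probe.items() if name != stage]
        if all(probe[stage] - value > min_log2_fold for value in others):
            hits += 1
    return hits >= min_probes

# Hypothetical transcript with six probes: four peak in NPC, two are flat.
peaked = {"ES": 7.0, "NPC": 9.0, "N": 7.2}
flat = {"ES": 8.0, "NPC": 8.5, "N": 8.0}
npc_like = [peaked, peaked, peaked, peaked, flat, flat]

print(is_stage_enriched(npc_like, "NPC"))  # True
print(is_stage_enriched(npc_like, "N"))    # False
```

Calling the same function with stage "N" expresses a neuron-enriched pattern in the same terms, so one test covers both the NPC and the neuronal definitions apart from any additional p-value cut-off.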
Interestingly, SOX2OT was also identified as an NPC lncRNA from the transcriptome-wide study. SOX2OT, or SOX2-overlapping transcript, is a 2.4 kb long non-coding RNA that encompasses the entire SOX2 gene. Both SOX2 and SOX2OT are transcribed in the same orientation, with the single-exon SOX2 gene embedded within an intron of SOX2OT. Amaral et al. (2009) previously made a similar observation: expression of Sox2ot increased as mouse ES cells differentiated into neural lineages. This again indicated the sensitivity and specificity of the custom-designed lncRNA microarray.
Table 5.3: List of the 24 NPC lncRNAs identified from the lncRNA microarray. These lncRNAs were at least 3-fold upregulated in NPCs compared to hESCs and neurons.
lncRNA ID genomic location (hg18/NCBI36)
5.2.4 Identification of lncRNAs associated with neuronal differentiation (neuronal lncRNAs)
From the microarray, it was evident that there was a group of neuronal lncRNAs that were highly expressed in neurons and weakly expressed in undifferentiated hESCs and NPCs (Figure 5.1). In addition, the expression profiles of these lncRNAs mirrored those of known drivers of neuronal differentiation (Figures 5.2E-F), which suggests that the lncRNAs may have important roles in neurogenesis.
A group of 35 lncRNAs was found to be highly expressed in differentiated neurons compared to NPCs and undifferentiated hESCs (Figure 5.2F); these are listed in Table 5.4. These 35 lncRNAs were selected on the basis of having at least 4 probes showing a greater than 3-fold upregulation (p < 0.05) compared to NPCs and hESCs.
Table 5.4: List of the 35 neuronal lncRNAs identified from the custom lncRNA array. These lncRNAs were at least 3-fold upregulated in mature neurons compared to hESCs and NPCs.
lncRNA ID genomic location (hg18/NCBI36)
5.3 Discussion
The underlying rationale for the identification of lncRNAs involved in pluripotency and different stages of neural differentiation is that lncRNAs important for such biological processes tend to be spatially and temporally regulated (Dinger et al., 2008; Efroni et al., 2008; Mercer et al., 2010). For example, Loewer et al. (2010) identified lincRNA-RoR, a lncRNA highly expressed in human iPSCs compared to hESCs, and found that lincRNA-RoR improved the efficiency of reprogramming fibroblasts into pluripotent stem cells.
Even though some lncRNAs are weakly expressed, the custom-designed lncRNA microarray was still able to detect changes in expression of these low abundance transcripts. The probes printed onto the microarray represent previously identified lncRNA sequences from published sources (Engstrom et al., 2006; Imanishi et al., 2004; Pang et al., 2005; Willingham et al., 2005; Zhang et al., 2006). Therefore, novel lncRNA transcripts were not identified in this study.
LncRNAs are estimated to be as abundant as mRNA transcripts (Carninci et al., 2005). Because the microarray could detect only 6671 lncRNAs, a number of possibly functional transcripts were left out of this study. With recently developed de novo sequencing technologies such as deep transcriptome sequencing (RNA-seq), previously undiscovered lncRNA transcripts could be detected and analyzed. However, in this thesis, RNA-seq was not applied to detect expression changes of novel transcripts.
5.4 Conclusion
This chapter describes the identification of differentially expressed lncRNAs, which could possibly be important for pluripotency, neural stem cell identity, or neuronal differentiation. RNA from the highly homogeneous cell types derived from hESCs was hybridized onto two separate arrays: the custom-designed human lncRNA microarray and the Illumina microarray for protein-coding genes. From the lncRNA microarray, 36 pluripotent lncRNAs, 24 NPC lncRNAs and 35 neuronal lncRNAs were identified. In the next two chapters, I will describe the roles of lncRNAs in the maintenance of hESC pluripotency and their involvement in neuronal differentiation, respectively.
Chapter VI – Long Non-coding RNAs Regulate Human Embryonic Stem Cell Pluripotency
6.1 Introduction
LncRNAs are integral members of the ESC self-renewal regulatory circuit. In one of the earliest studies of lncRNAs in mouse ESC (mESC) pluripotency and early lineage specification, a lncRNA microarray was used to profile the lncRNA transcriptome of mESCs at various stages of differentiation (Dinger et al., 2008). This study provided evidence that lncRNAs were functional. First, profiling of the transcriptome during mESC differentiation revealed that various mouse lncRNAs were dynamically expressed, with profiles that closely resembled those of well-characterized differentiation markers. Furthermore, these lncRNAs showed cell-type specific expression, indicating tight regulation of transcription. Second, the promoters of the lncRNAs were decorated with H3K4me3 (trimethylation of histone 3 lysine 4) and H3K27me3 (trimethylation of histone 3 lysine 27) bivalent epigenetic marks, reminiscent of “poised” protein-coding genes in mESCs. These bivalent domains usually mark developmental genes whose expression is “poised”, that is, lineage-specifically expressed or repressed during differentiation. LncRNAs with this bivalent domain may therefore also fulfill roles in lineage differentiation. Third, Dinger et al. showed that lncRNAs could physically associate with chromatin marks and the chromatin-modifying complex MLL1, suggesting that lncRNAs could modulate MLL1 activity to affect the expression of developmental genes in ESCs that control differentiation.
A specific class of lncRNAs known as large intergenic non-coding RNAs (lincRNAs) was also shown to be integrated into the ESC circuitry. Key pluripotent transcription factors regulate expression of the lincRNAs, which in turn regulate global gene expression by binding to multiple chromatin regulatory proteins, thereby controlling the ES cell state. In this study by Guttman et al. (2011), loss-of-function experiments were performed using short hairpin RNAs (shRNAs) to target 226 lincRNAs expressed in mESCs, and the authors identified 26 lincRNAs whose knockdown resulted in loss of Nanog and Oct4 expression, indicating that lincRNAs maintain the pluripotent cell state.
The studies on lncRNAs and pluripotency were all performed in mouse embryonic stem cells (mESCs). Results from the mouse studies may not be directly applicable to hESCs, because of the intrinsic differences between mESCs and hESCs (Ginis et al., 2004). In the previous chapter, a class of pluripotent human lncRNAs was identified from the microarray. In this chapter, I describe the screening for functional lncRNAs in the maintenance of pluripotency, and provide insight into their molecular functions.
6.2 Results
6.2.1 Screening for possibly functional pluripotent lncRNAs
From the microarray study described in Chapter V, 36 lncRNAs were found to be highly expressed in undifferentiated hESCs, and weakly expressed in the differentiated cells. However, not all 36 lncRNAs may be functional in maintaining the pluripotent state of hESCs. Therefore, a loss-of-function or RNA interference (RNAi) assay was designed to identify functional pluripotent lncRNAs.
Of the 36 pluripotency-associated lncRNAs, only 16 could have specific siRNAs designed to target them for knockdown, as the other 20 substantially overlapped protein-coding genes, making it difficult to design specific siRNA sequences. The list of lncRNAs that could be targeted by RNAi is provided in Table 6.1. To select candidates for knockdown studies, I hypothesized that if the identified lncRNAs were functional in maintaining pluripotency, their expression would be specific to pluripotent cells. Thus, the expression levels of the 16 lncRNAs in undifferentiated human pluripotent stem cells and a panel of somatic tissues were quantified. Three of the pluripotency lncRNAs were exclusively expressed in undifferentiated hESCs and iPSCs (Figure 6.1), indicating that they were likely to play a role in pluripotency. Their expression was low (~0.9 to 2.5% of the OCT4 mRNA level in undifferentiated hESCs; Figure 6.2), suggesting that they might be playing a regulatory role. These transcripts are subsequently referred to as lncRNA_ES1 (AK056826), lncRNA_ES2 (EF565083) and lncRNA_ES3 (BC026300). Notably, mouse orthologs for these three transcripts do not exist, indicating that these are probably human-specific lncRNAs.
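To give a sense of scale for an abundance of 0.9 to 2.5% of OCT4 mRNA, the sketch below converts between such percentages and qPCR threshold-cycle (Ct) differences. It assumes qPCR quantification with perfect doubling per cycle (efficiency 2.0) and the common 2^-ΔCt formula; the text does not state which formula was used, and the Ct values in the example are hypothetical.

```python
import math

def percent_of_reference(ct_target, ct_reference, efficiency=2.0):
    """Abundance of a target as a percentage of a reference transcript,
    using the 2^-dCt formula (assumes equal amplification efficiency)."""
    return 100.0 * efficiency ** -(ct_target - ct_reference)

def delta_ct_for_percent(percent, efficiency=2.0):
    """Ct difference (target minus reference) implied by a percentage."""
    return math.log(100.0 / percent, efficiency)

# Hypothetical Ct values: target detected 5 cycles later than OCT4.
print(percent_of_reference(25.0, 20.0))        # 3.125

# Ct gaps implied by the range reported in the text.
print(f"{delta_ct_for_percent(2.5):.2f}")      # 5.32
print(f"{delta_ct_for_percent(0.9):.2f}")      # 6.80
```

Under these assumptions, the reported range corresponds to the lncRNAs crossing threshold roughly 5 to 7 cycles after OCT4.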
Table 6.1: List of pluripotent lncRNAs that occupy a unique location in the genome, and can be targeted by RNAi
lncRNA ID
Genomic location (NCBI36/hg18)
Figure 6.1: Three lncRNAs were exclusively expressed in hESCs and iPSCs. Expression levels of the 16 lncRNAs listed in Table 6.1 were quantified in undifferentiated human pluripotent stem cells and a panel of somatic tissues. Of these, AK056826 (lncRNA_ES1), EF565083 (lncRNA_ES2) and BC026300 (lncRNA_ES3) were specifically expressed in the three pluripotent stem cell lines (H1, H9 and iPS cell lines).
Figure 6.2: Pluripotent lncRNAs are low abundance transcripts. Relative to the abundance of OCT4 mRNA (100%) in undifferentiated hESCs, lncRNA_ES1 to lncRNA_ES3 were present at between 0.9% and 2.5% of the OCT4 mRNA level, indicating that expression of the pluripotent lncRNAs was very low.
To validate that the pluripotency lncRNAs are bona fide non-coding transcripts, the Coding Potential Calculator (CPC) tool was employed to predict the protein-coding potential of the transcripts. CPC combines a variety of parameters in conjunction with a support vector machine, and its reported prediction accuracy is more than 95% (Kong et al., 2007). CPC indicated that lncRNA_ES1 and lncRNA_ES2 are very likely non-coding, while lncRNA_ES3 could be a “weakly coding” transcript; its putative 40 amino-acid peptide has neither BLAST hits nor protein domains (Table 6.2). The transcription start and end sites were inferred by deep sequencing of the hESC transcriptome (RNA-seq) and are presented in Figure 6.3.
Table 6.2: Pluripotent lncRNAs in this study
lncRNA        Accession   Genomic location               Transcript length (bp)   Class of lncRNA   CPC score*
lncRNA_ES1    AK056826    chr6:14,388,338-14,393,355     3150                     intergenic        -1.15338
lncRNA_ES2    EF565083    chr1:198,709,840-198,710,182   343                      intergenic        -0.922722
lncRNA_ES3    BC026300    chr13:53,593,076-53,605,002    1053                     intergenic        0.777772
* A negative score assigned by the Coding Potential Calculator (CPC) indicates a non-coding transcript, while a value between 0 and 1 indicates a “weakly coding” transcript.
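The interpretation rule in the table footnote can be written out explicitly. The sketch below applies it to the three scores in Table 6.2; the label for scores above 1 ("coding") is an assumption added for completeness, since the footnote covers only negative scores and values between 0 and 1.

```python
def interpret_cpc(score):
    """Map a CPC score to the labels used in the Table 6.2 footnote."""
    if score < 0:
        return "non-coding"
    if score <= 1:
        return "weakly coding"
    # Assumption: the footnote does not describe scores above 1.
    return "coding"

# CPC scores as listed in Table 6.2.
cpc_scores = {
    "lncRNA_ES1": -1.15338,
    "lncRNA_ES2": -0.922722,
    "lncRNA_ES3": 0.777772,
}

for name, score in cpc_scores.items():
    print(f"{name}: {score} -> {interpret_cpc(score)}")
```

This reproduces the classification given in the text: lncRNA_ES1 and lncRNA_ES2 as non-coding, and lncRNA_ES3 as weakly coding.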
Figure 6.3: RNA-seq analysis of pluripotent lncRNAs in H1 hESCs, indicating transcriptional start and end sites. (A) The presence of transcription upstream of lncRNA_ES1 indicated that lncRNA_ES1 might be part of a longer transcript that has not been validated yet. (B) lncRNA_ES2 is transcribed from the minus strand, and overlaps with another lncRNA transcribed from the plus strand. (C) lncRNA_ES3 is possibly alternatively spliced, giving rise to the two lncRNAs shown: BC026300 and BC018008.
6.2.2 Pluripotent lncRNAs are regulated by transcription factors
Since the expression levels of lncRNA_ES1, lncRNA_ES2 and lncRNA_ES3 were high in undifferentiated hESCs, and very weak in differentiated cell types or somatic tissues (Figure 6.1), they could possibly be regulated by pluripotent transcription factors such as OCT4 and NANOG, which also display hESC-specific expression patterns. To investigate if lncRNAs could be regulated by OCT4 and NANOG, chromatin-immunoprecipitation (ChIP) libraries were analyzed and loss-of-function studies were carried out.
Available ChIP-sequencing libraries in hESCs (Chia et al., 2010) revealed that there are OCT4 and NANOG binding sites located near the transcription start sites of the lncRNAs (Figure 6.4). The proximity of these binding sites suggests that the lncRNAs may be direct, downstream targets of pluripotency factors OCT4 and NANOG. To test this, the expression patterns of the pluripotency lncRNAs over a period of 5 days were monitored following either OCT4 RNAi or NANOG RNAi. lncRNA_ES1 has an OCT4 binding site in its vicinity, and its expression decreased in response to OCT4 RNAi (Figure 6.5A). Pluripotency lncRNAs with a neighboring NANOG binding site (namely lncRNA_ES1 and lncRNA_ES3) also showed decreased expression upon NANOG RNAi (Figure 6.5B). Together, these results suggest that pluripotency lncRNAs are integrated into known pluripotency transcriptional networks.
Figure 6.4: Schematic showing OCT4 and NANOG binding sites in the vicinity of the lncRNAs. Black arrows indicate transcription start sites, and thick black bars indicate exons. Binding site locations were obtained from the ChIP-seq data of Chia et al. (2010). The proximity of OCT4 and NANOG binding sites indicates the possibility that the lncRNAs are regulated by the pluripotent transcription factors.
Figure 6.5: Pluripotent lncRNAs are possibly regulated by OCT4 and NANOG. Changes in expression of lncRNAs were measured by qPCR in response to knockdown of OCT4 and NANOG by RNAi in (A) and (B) respectively. (A) In response to OCT4 RNAi, expression of lncRNA_ES1 and lncRNA_ES2 decreased. (B) Likewise, in response to NANOG RNAi, expression of lncRNA_ES2 and lncRNA_ES3 decreased. ACTB is a housekeeping gene.
6.2.3 Knockdown of lncRNAs results in hESC differentiation
To determine if lncRNAs affect the pluripotent status of hESCs, a loss-of-function or RNAi assay was performed. Two siRNAs were designed for each lncRNA and the more effective siRNA was subsequently used (Figure 6.6). The siRNAs were transfected into undifferentiated hESCs, which were then maintained in hESC medium. Seven days later, pluripotency was assessed by OCT4 immunofluorescence, and RNA was also isolated for global gene expression profiling by microarray. Knockdown of any of the three pluripotency lncRNAs resulted in a loss of OCT4 protein (Figures 6.7 and 6.8) and mRNA (Figure 6.9B). In addition, knockdown of lncRNAs resulted in downregulation of a panel of pluripotency markers and simultaneous upregulation of lineage markers corresponding to the neuroectoderm, endoderm and mesoderm germ layers (Figure 6.9B).
Figure 6.6: Pluripotent lncRNAs can be effectively targeted by siRNAs. Two siRNA duplexes were used for the knockdown of each pluripotency lncRNA. Efficiency of knockdown was compared relative to the non-target siRNA (si-NT) control. Subsequently, the more effective siRNA was used.
Figure 6.7: Knockdown of pluripotent lncRNAs resulted in hESC differentiation. H1 hESCs were transfected with the indicated siRNAs, and OCT4 protein level was assayed by immunofluorescence 7 days later. In the non-target siRNA (si-NT) control, most of the cells retained OCT4 protein levels. However, OCT4 was lost when the pluripotent lncRNAs were knocked down. OCT4 siRNA (si-OCT4) was used as a positive control. The scale bar indicates 100 µm.
From the microarray data, hierarchical clustering revealed that lncRNA_ES3 RNAi expression patterns clustered closely with those from the NANOG RNAi, in accordance with the regulation of the lncRNA by pluripotency transcription factors (Figure 6.9A). However, lncRNA_ES1 and lncRNA_ES2 knockdown showed a global transcriptome profile most similar to SOX2 RNAi, suggesting that lncRNA_ES1 and lncRNA_ES2 could be maintaining pluripotency in a SOX2-dependent manner.
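The idea of asking which reference knockdown a given expression profile most resembles can be illustrated with a simple correlation comparison. The sketch below is a simplified stand-in, not the hierarchical clustering used in this study: it ranks reference profiles by Pearson correlation with a query profile. All profile values are hypothetical log2 fold changes over five genes, and the function names are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def most_similar(query, references):
    """Name of the reference profile best correlated with the query."""
    return max(references, key=lambda name: pearson(query, references[name]))

# Hypothetical log2 fold changes (knockdown vs control) for five genes.
references = {
    "si-NANOG": [-2.0, -1.5, 0.5, 1.8, 0.2],
    "si-SOX2": [-1.0, 0.3, 2.0, -0.5, 1.5],
}
query_profile = [-1.8, -1.2, 0.4, 1.5, 0.1]

print(most_similar(query_profile, references))  # si-NANOG
```

Hierarchical clustering builds a full tree from pairwise distances of this kind, whereas this sketch reports only the single best match.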
Figure 6.8: Knockdown of pluripotent lncRNAs resulted in loss of OCT4. The percentage of OCT4+ cells from Figure 6.7 was quantified by flow cytometry. The number of OCT4+ cells was significantly reduced upon knockdown of any of the three pluripotent lncRNAs. * and ** indicate p-values of < 0.05 and < 0.01 respectively.