characteristics for plant microRNA genes
Meng Zhou1,*, Jie Sun1,*, Qiang-Hu Wang1,*, Li-Qun Song2, Guang Zhao1, Hong-Zhi Wang2, Hai-Xiu Yang1 and Xia Li1
1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
2 Department of Internal Medicine, Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
Keywords
clustering patterns; flanking regions; motif; plant microRNA gene; sequence characteristics
Correspondence
Xia Li, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
Fax: +86 045186615922
Tel: +86 045186669617
E-mail: lixia@hrbmu.edu.cn
*These authors contributed equally to this work
(Received 11 October 2010, revised 7 December 2010, accepted 7 January 2011)
doi:10.1111/j.1742-4658.2011.08008.x
MicroRNAs (miRNAs) have been proven to play important roles at the post-transcriptional level in animals and plants. To investigate clustering patterns and specific sequence characteristics in the flanking regions of plant miRNA genes, we performed genome-wide analyses of Arabidopsis thaliana, Populus trichocarpa, Oryza sativa and Sorghum bicolor. Our results showed that miRNA pair distances were significantly higher than would have been expected to occur at random and that the number of miRNA gene pairs separated by very short distances of < 1 kb was higher than that of protein-coding gene pairs. Analysis of the promoter architecture of different miRNA genes in plants revealed significant differences in the number and distribution of core promoters between intergenic miRNAs and intragenic miRNAs, and between highly conserved miRNAs and low conserved or nonconserved miRNAs. We applied two motif-finding algorithms to search for over-represented, statistically significant sequence motifs, and discovered six species-specific motifs across the four plant species studied. Moreover, we identified, for the first time, several significantly over-represented motifs that were associated with conserved miRNAs, and these motifs may be useful for understanding the mechanism of origin of new plant miRNAs. The results presented provide new insight into the transcriptional regulation and processing of plant miRNAs.
Abbreviations
miRNA, microRNA; Pol II, RNA polymerase II; pri-miRNAs, primary miRNA transcripts; TSSs, transcription start sites.
Introduction
MicroRNAs (miRNAs), 21–24 nucleotides in length, are a large class of endogenous, noncoding small RNA molecules that regulate gene expression at the post-transcriptional level in animals and plants [1–4]. The first microRNA, lin-4, was discovered in 1993 in Caenorhabditis elegans through forward genetic screens [5]. The first plant miRNA was discovered in Arabidopsis thaliana in 2002 [6,7]. Plant miRNA genes are mostly transcribed into primary miRNA transcripts (pri-miRNAs) by RNA polymerase II (Pol II). The pri-miRNAs are processed by DICER-LIKE 1 (DCL1) into stem–loop pre-miRNAs in the nucleus. Then, pre-miRNAs are processed by DCL1 in the nucleus and exported to the cytoplasm, possibly through the action of the plant exportin 5 orthologue HASTY and other unknown factors. Mature RNA duplexes excised from pre-miRNAs (miRNA/miRNA*, where miRNA is the guide strand and miRNA* is the degraded strand) are methylated by HEN1. The guide miRNA strand is then incorporated into AGO proteins to carry out the silencing reactions [1,2].
In plants, Xie et al. [8] identified transcription start sites (TSSs) for 63 miRNA primary transcripts in A. thaliana and found the TATA box motif in their core promoter regions. Unlike animal miRNAs, the vast majority of plant miRNAs are intergenic rather than intronic [2,9]. Several studies have characterized the upstream sequences of intergenic miRNAs in model organisms and found that most intergenic miRNAs have the same type of promoters as protein-coding genes [10–12]. Furthermore, Zhou et al. [11] discovered some interesting sequence motifs that are specific to intergenic miRNAs in four different model species. For the other miRNAs, which are located within the introns of protein-coding genes, little is known about their transcriptional regulatory elements. These intragenic miRNAs are possibly transcribed with, or independently of, the host genes. Recently, Heikkinen et al. [13] examined the upstream sequences of miRNAs in C. elegans and Caenorhabditis briggsae, and discovered a sequence motif, GANNNNGA, common to all miRNAs, including intragenic miRNAs. In rice (Oryza sativa), some intragenic miRNAs were found to contain class II promoters in upstream sequences [10]. However, the complex transcriptional regulation mechanisms of plant miRNAs remain largely unknown.
Although many efforts have been directed towards examining clustering patterns and the sequence characteristics of the upstream sequences of miRNA genes in animals in an attempt to understand transcriptional regulation [11,13–16], similar analyses have been performed only for a relatively small number of miRNAs in plants, and these were limited to A. thaliana and O. sativa. Recently, increasing numbers of plant miRNAs have been identified through forward genetics, direct cloning and computational prediction. This growing set of plant miRNAs provides a good opportunity to uncover the complex transcriptional regulation mechanisms of plant miRNAs. In our study, we used computational approaches, based on genome-wide analyses, to examine the clustering patterns of plant miRNAs. In addition, we analyzed regions up to 2 kb upstream and up to 1 kb downstream of miRNA stem–loop sequences in four plant species to identify characteristic sequence motifs. We hope that the present results can improve the current understanding of the transcriptional regulation and processing of plant miRNAs and provide useful knowledge for understanding the origin of new miRNAs in plants and for their computational identification.
Results and Discussion
Analysis of clustering patterns of miRNA genes in four plant genomes
Many previous studies have shown that miRNA genes tend to be present as clusters within a region of several kilobases in animal genomes [17–20]. In contrast, plant miRNA genes are rarely arranged in tandem [1]. To further explore the clustering patterns of miRNAs in plant genomes, we computed the distances between same-strand consecutive miRNA genes in four plant species, based on reported miRBase coordinates, and analyzed the resulting distance distributions. The cumulative distance distribution of the miRNA gene pairs is presented in Fig. 1 and shows that 17.71%, 26.94% and 29.07% of the miRNA gene pairs are separated by regions of < 1, < 10 and < 100 kb, respectively, which are much smaller than the regions separating animal miRNA gene pairs. Furthermore, we compared the distance distribution of the miRNA gene pairs with the distance distribution of protein-coding genes in four plant genomes (Fig. 1). We found that more miRNA gene pairs than protein-coding gene pairs were separated by very short distances of < 1 kb. To evaluate the statistical significance of the clustering patterns of miRNA genes in the four plant species studied, we also compared the distances of the miRNA gene pairs with random distances, as described in the Materials and methods, and found that the miRNA gene pair distances were statistically significantly higher than expected at random (P < 0.001).
To identify more characteristics of miRNA clusters in plant genomes, we defined 10 kb as the maximum inter-miRNA distance for two miRNA genes to be considered as clustered, because 26.94% of the miRNA gene-pair distances were < 10 kb and extending the threshold to 100 kb added relatively few miRNA gene pairs. Furthermore, the relatively small distance prevented overestimation of the number of clusters and made our analysis more stringent. According to this definition, we examined the characteristics of potential clusters within maximum inter-miRNA distances of 2, 5 and 10 kb (Table 1). Our study revealed that the number of members in miRNA clusters at very short gene-pair distances in O. sativa and Sorghum bicolor was significantly larger than in A. thaliana and Populus trichocarpa (P < 0.01; two-sample t-test). This may suggest that miRNA clusters in monocots are larger than those in eudicots. This specific clustering pattern of miRNAs may be indicative of functional divergence of the miRNA cluster in miRNA-mediated gene regulation between monocots and eudicots. Furthermore, miRNA clusters in plants are frequently found to be smaller than miRNA clusters in animals (P < 0.01; two-sample t-test).
In animals, a large proportion of known miRNAs are arranged in clusters. For example, 48% of human miRNAs appear as clusters within a maximum inter-miRNA distance of 10 kb [21] and 50% of miRNAs appear as clusters within a maximum inter-miRNA distance of 3 kb in the zebrafish genome [22]. In contrast to the patterns of clustering found in animal miRNAs, only a small proportion of plant miRNAs (25.35% in A. thaliana, 17.09% in P. trichocarpa, 22.29% in O. sativa and 21.62% in S. bicolor) were found to be clustered within a 10-kb region in our study. It has been demonstrated that miRNA families are preferentially expressed in eudicots relative to monocots [23]. Our analysis further indicated that most plant miRNA clusters are composed of family members and are located in intergenic regions, which is consistent with previous studies in plants [10,24,25]. Our results imply that the size of the miRNA cluster may contribute to preferential expression in eudicots relative to monocots. Li et al. [25] suggested that the co-transcription of similar or identical miRNAs in clusters in plants may be involved in a gene dosage effect.
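The clustering definition used in this section can be illustrated with a short Python sketch. It is not the authors' code: the function name `cluster_mirnas`, the tuple layout of the input, and the choice to measure the gap from the end of one gene to the start of the next are assumptions made for illustration. Only the same-chromosome, same-strand grouping and the 10-kb threshold come from the text.

```python
from collections import defaultdict

def cluster_mirnas(genes, max_gap=10_000):
    """Group same-strand miRNA genes whose neighbour gap is below max_gap.

    genes: iterable of (name, chromosome, strand, start, end) tuples
           (hypothetical layout; coordinates in bp on the forward strand).
    Returns a list of clusters, each a list of gene names with >= 2 members.
    """
    # Collect genes per (chromosome, strand) so only same-strand
    # neighbours on the same chromosome are ever compared.
    tracks = defaultdict(list)
    for name, chrom, strand, start, end in genes:
        tracks[(chrom, strand)].append((start, end, name))

    clusters = []
    for track in tracks.values():
        track.sort()  # order genes by start coordinate
        current = [track[0]]
        for gene in track[1:]:
            # Assumed distance definition: start of this gene minus
            # end of the previous gene.
            gap = gene[0] - current[-1][1]
            if gap < max_gap:
                current.append(gene)
            else:
                if len(current) > 1:
                    clusters.append([g[2] for g in current])
                current = [gene]
        if len(current) > 1:
            clusters.append([g[2] for g in current])
    return clusters
```

Calling the function with `max_gap` set to 2000, 5000 or 10000 would correspond to the three thresholds examined above; singletons are dropped because a cluster needs at least two members.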
Fig. 1. Cumulative distance distribution of miRNA genes and protein-coding genes in four plant species. The neighbour distances between every two same-strand miRNA genes or protein-coding genes in the same chromosome were calculated. The distance is drawn on a logarithmic scale.
Analysis of the core promoter of the class II promoter in plant miRNA genes
miRNA genes were determined to be part of a polycistronic transcript if the pairwise distance of two miRNAs on the same chromosome was < 10 kb. For miRNAs in polycistronic transcripts, only sequences upstream of the 5′ pre-miRNA and downstream of the 3′ pre-miRNA were chosen to represent the polycistronic transcript. As described in the Materials and methods, we used the TSSP-TCM program to search for the putative core promoter of the class II promoter in the 2-kb upstream sequences of miRNAs in the four plant species studied. We identified 130 (77.8%) miRNAs in A. thaliana, 145 (89%) miRNAs in P. trichocarpa, 233 (71.5%) miRNAs in O. sativa and 102 (81.6%) miRNAs in S. bicolor that contain the core promoter of the class II promoter, suggesting that a significant proportion of plant miRNA genes have resident Pol II promoters in their upstream regions. It is generally accepted that miRNA genes located in intronic regions are expressed from the host gene promoters as part of the host gene [26,27]. However, a recent study on intergenic/intronic and conserved/nonconserved miRNA genes in rice revealed that several intronic miRNA genes have a class II promoter, and that rice miRNAs with more than one promoter appear to be conserved [10], implying that different sequence characteristics may be present in the upstream regions of different miRNA genes in plants.
To further explore the promoter architecture of different miRNAs in plants and the relationship between the number of Pol II promoters and the degree of conservation of miRNAs, we classified the miRNA genes of the four plant species into two types (intergenic miRNAs and intragenic miRNAs) based on their genomic locations. Then, the miRNAs from the four plant species studied were divided into three groups (based on evolutionary conservation across all plant species, as described in the Materials and methods): highly conserved miRNAs, low conserved miRNAs and nonconserved miRNAs. The results are summarized in Fig. 2. As shown in Fig. 2A, we found a significant difference between intergenic miRNAs and intragenic miRNAs in the numbers of class II promoters in the upstream regions (P < 0.001; two-sample t-test). The miRNAs lying between protein-coding genes usually contained more class II promoters in their upstream sequences (on average 1.4 per miRNA) than those miRNAs lying within introns (on average 0.7 per miRNA) in the four plant species studied. These results strongly indicate that most intergenic miRNAs are transcribed by RNA polymerase II in plants, and provide additional evidence that a significant proportion of intragenic miRNAs have Pol II promoters. This suggests that these intragenic miRNAs may be transcribed as independent units from their own promoters. However, in plants, a small number of miRNAs with no class II promoter may be transcribed through other transcriptional mechanisms, such as the host gene promoter.
Further analysis of the relationship between the number of Pol II promoters and the degree of miRNA conservation revealed that the number of Pol II promoters in the upstream sequences of highly conserved miRNAs was significantly higher than in low conserved (P < 0.001; two-sample t-test) and in nonconserved (P < 0.001; two-sample t-test) miRNAs. As shown in Fig. 2B, only 13.67% of highly conserved miRNAs had no Pol II promoter, which is significantly lower than in low conserved miRNAs (31.14%) (P < 0.01; Fisher's exact test) and in nonconserved miRNAs (26.76%) (P < 0.05; Fisher's exact test). In contrast, 50.13% of highly conserved miRNAs have at least two Pol II promoters, whereas only 27.38% of low conserved miRNAs (P < 0.001; Fisher's exact test) and 23.94% of nonconserved miRNAs (P < 0.0001; Fisher's exact test) have at least two Pol II promoters. However, there was no significant difference in the number of Pol II promoters in upstream sequences between low conserved and nonconserved miRNAs in plants. Taken together with the findings of the study performed by Cui et al. [10], our results provide a more comprehensive understanding of the relationship between the number of Pol II promoters and the degree of miRNA conservation in plant genomes. Highly conserved miRNAs may be associated with more Pol II promoters (on average 1.72 per miRNA) than low conserved and nonconserved miRNAs (on average 1.13 and 1.05 per miRNA, respectively) in plants. It has been demonstrated that the highly conserved miRNAs are likely to be central regulators and are highly expressed [28,29]. The results of one study suggested that less conserved miRNAs rarely had obvious effects on plant morphology [30]. Therefore, we speculate that the increased number of Pol II promoters located in the upstream regions of highly conserved miRNAs may have an important effect on the high levels of expression of highly conserved miRNAs.
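The comparisons of proportions reported above rely on Fisher's exact test. As a minimal sketch of how a two-sided P value can be obtained for a 2 × 2 table, the following Python function sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name is hypothetical and this is one common definition of the two-sided test, not necessarily the exact procedure the authors used. The underlying miRNA counts are not given in the text, so any counts passed to it would have to come from the data set itself.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example layout (assumed): rows are conservation groups, columns are
    'has no Pol II promoter' and 'has at least one Pol II promoter'.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left
        # cell, given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is at most as probable as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
```

For large tables an established statistics library would normally be preferred; the pure-Python version is shown only to make the calculation explicit.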
Fig. 2. Distribution of miRNA genes with the same number of putative core promoters. (A) The percentage of miRNA genes occurring between protein-coding genes or within the introns in four plant species. (B) The percentage of miRNA genes with different degrees of conservation. Horizontal axes: the number of core promoters (0, 1, 2, ≥ 3). Vertical axes: percentage of miRNA genes. Series: intragenic and intergenic miRNAs in (A); nonconserved, low conserved and highly conserved miRNAs in (B).
To further characterize the putative core promoter of the Pol II promoter in the upstream sequences of plant miRNAs, we examined the distribution of the putative core promoter in the 2-kb upstream regions of miRNAs in the four species of plant studied. In these four plant species, the vast majority of the predicted core promoters of the Pol II promoters were found to lie within a 900-bp region upstream of the miRNAs. Distribution analysis of core promoter localization in the 2-kb regions upstream of the miRNAs from the four species of plants studied showed that 50.4% of the putative core promoters of the Pol II promoter were located within 0–1 kb, 26.8% within 1–1.5 kb and 22.8% within 1.5–2 kb of the miRNA. A recent study on rice (O. sativa) suggested that the majority of TSSs and TATA-boxes are found within 0–400 bp upstream of the miRNA [10]. Here, we found a similar distribution of the putative core promoter in upstream regions of miRNAs in four plant species. As shown in Fig. 3A, a significant number of putative core promoters of the Pol II promoter were located within the 400-bp upstream regions in three plant species, although the putative promoters in O. sativa were distributed mainly from 0 to 0.4 kb and from 1.6 to 2 kb. Together, these results indicate that this distribution pattern of putative core promoters seems to be conserved in the 2-kb region upstream of miRNAs in different plant species, and provide additional evidence that the core promoter regions of most miRNAs are close to pre-miRNA hairpins in plants.
Fig. 3B shows the distribution of the core promoter in upstream sequences in view of the evolutionary conservation of plant miRNAs. We found that the distribution pattern of the core promoter in upstream regions was different between highly conserved miRNAs and low conserved or nonconserved miRNAs. Highly conserved miRNAs tend to contain more core promoters within the 400-bp region upstream of the miRNA. However, core promoters are distributed mainly in the 0 to −0.4 kb, −0.8 to −1.2 kb and −1.6 to −2 kb regions upstream of low conserved miRNAs, and, in contrast, core promoters are evenly distributed in the upstream regions of nonconserved miRNAs. These results suggest that there is a relationship between the distribution pattern of core promoters and the degree of miRNA conservation in plants. Based on these observations, we propose that core promoters of Pol II promoters in the close proximal promoter region of miRNAs may play a greater role in efficient transcription initiation.
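The positional distributions discussed above amount to binning promoter-to-hairpin distances into 0.4-kb windows and reporting percentages. The following Python sketch shows one way to do this. The function name, the half-open bin convention and the input format (distances in bp upstream of the stem–loop) are assumptions for illustration and are not taken from the original analysis.

```python
def bin_promoter_distances(distances_bp, bin_width=400, max_distance=2000):
    """Return the percentage of core promoters falling in each upstream bin.

    distances_bp: distances (in bp) of predicted core promoters upstream
                  of the miRNA stem-loop; values outside [0, max_distance)
                  are ignored.
    The result has max_distance // bin_width entries, ordered from the
    bin nearest the hairpin to the most distal bin.
    """
    n_bins = max_distance // bin_width
    counts = [0] * n_bins
    for distance in distances_bp:
        if 0 <= distance < max_distance:
            # Half-open bins: [0, 400), [400, 800), ...
            counts[int(distance // bin_width)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * count / total for count in counts]
```

With the default arguments the five returned values correspond to the 0 to −0.4 kb through −1.6 to −2 kb windows used on the horizontal axis of Fig. 3.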
Fig. 3. Histograms of distances between putative core promoters and miRNA stem–loop sequences. The horizontal axis shows the positions of putative core promoters with respect to the corresponding miRNA stem–loop sequences (in 0.4-kb steps from −0.4 kb to −2 kb), and the vertical axis shows the percentage of putative core promoters at the specified positions. (A) Percentage of putative core promoters at the specified positions in different plant species (S. bicolor, O. sativa, A. thaliana and P. trichocarpa). (B) Percentage of putative core promoters at the specified positions for miRNAs with a different degree of conservation (highly conserved, low conserved and nonconserved).
Analysis of specific sequence motifs in four plant species
To further identify specific characteristic motifs in the flanking regions of miRNAs in four plant species, we performed motif analysis to search for over-represented and statistically significant motifs in the flanking regions up to 2 kb upstream and 1 kb downstream from the miRNA stem–loop sequences. First, we used RepeatMasker with default settings to mask repeats in all upstream and downstream sequences, and then used two motif-finding tools, MEME and MotifSampler, to identify over-represented motifs. Finally, we carried out whole-genome Monte Carlo simulation analysis to assess the specificity and significance of the motifs identified, as described in the Materials and methods. Motifs whose Z-scores were > 2.0 were considered as over-represented and statistically significant. Several significantly over-represented species-specific motifs were identified in the flanking regions of miRNAs in the plant species studied. All the species-specific motifs found are shown in Table 2. The motif M2, represented by the consensus sequence TTAGGGTTTC, has also been found in A. thaliana by Zhou et al. [11]. Moreover, we discovered a novel motif, M1, with a Z-score of 10.62, that is specific to A. thaliana.
In order to gain a deeper insight into the function of these species-specific motifs, we compared them against known transcription factor binding sites in plants from the PlantCARE database [31]. Only one motif (M5) corresponded to a known element in plant promoters. We found that M5, with the consensus sequence GCATGCATGC, is an RY cis-acting regulatory element involved in seed-specific regulation in both monocot and eudicot species of plants [32,33]. Although the functions of the other species-specific motifs are still unknown, we found that some motifs have repeat sequences in their consensus: M5 has two copies of GCAT, and M3 can be considered as GCA repeats. Palindromic patterns have been found in the binding sites of some transcription factors in plants and animals [34,35]. In contrast to A. thaliana, P. trichocarpa and S. bicolor, we could not detect any significant species-specific motifs in the flanking regions of miRNAs in O. sativa, although a previous study identified three specific motifs in the promoters of miRNAs in O. sativa [11]. Our analysis suggests that these species-specific motifs are associated with different specific functions, and may play an important role in species-specific transcriptional regulation networks of miRNA genes or contribute to the formation of species-specific miRNAs in plants. However, their functions need to be investigated in further studies. Furthermore, these species-specific motifs may be useful in the computational identification of species-specific miRNAs in plants.
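The repeat and palindrome properties mentioned for the motif consensus sequences can be checked mechanically. The Python sketch below uses hypothetical helper names and treats a DNA motif as palindromic when it equals its own reverse complement, which is the usual convention for double-stranded binding sites. It is offered only as an illustration of the idea, not as part of the original analysis.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)

def repeat_count(seq, unit):
    """Count non-overlapping occurrences of a repeat unit in a sequence."""
    return seq.upper().count(unit.upper())
```

Applied to the consensus sequences quoted in the text, GCATGCATGC (M5) equals its own reverse complement and contains two copies of GCAT, whereas TTAGGGTTTC (M2) is not palindromic.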
Table 2. Significantly over-represented species-specific sequence motifs identified in the flanking regions of the three plant species studied.
Consensus
a The consensus sequence represents a sequence of the most frequent base at each position.
b The motif logos show the information content present at each position in the sequence.
c The expected frequencies of motifs in a random database of the same size.
d The Z-score value was obtained by whole-genome Monte Carlo simulation analysis.
The mechanism by which new plant miRNAs originate is not fully understood. It is believed that the origin of new plant miRNAs is dependent on duplication and inversion events [36–38]. However, several lines of evidence have also suggested that new plant miRNA genes can arise from foldback sequences, which are under the control of transcriptional regulatory sequences [39,40]. In order to determine whether some significantly over-represented sequence motifs are related to the degree of conservation of miRNA genes in plants, we classified the miRNA genes of four plant species into highly conserved miRNAs, low conserved miRNAs and nonconserved miRNAs, as described in the Materials and methods. We then examined the upstream and downstream sequences of these miRNA genes to reveal characteristic sequence motifs. Several significantly over-represented motifs associated with the degree of miRNA conservation were identified and are listed in Table 3. Two motifs, which have repetitive and palindromic patterns in their consensus sequences, were found to be significantly over-represented in highly conserved plant miRNAs; these motifs can be considered as CATG repeats and CTAG repeats, respectively. However, we did not find any significantly over-represented sequence motifs in the flanking sequences of nonconserved miRNAs in the four plant species. In contrast to nonconserved miRNA genes, which have a single copy, conserved miRNA genes are usually multicopy [25]. miRNAs that are highly conserved across plant species are likely to have originated a long time ago and experienced many genome-duplication events. It has been shown that the duplication events for miRNA gene evolution in plants involve not only the region that is transcribed but also the miRNA promoter regions [41,42]. This might indicate that these significantly over-represented sequence motifs in highly conserved and low conserved miRNAs are evolutionarily related elements that play important functional roles in evolutionarily conserved regulatory systems in plants or are associated with duplication events for miRNA gene evolution in plants, although the functionality of these computationally identified conserved motifs remains to be experimentally validated.
Table 3. Significantly over-represented sequence motifs related to the conservation of miRNAs.
a The consensus sequence represents a sequence of the most frequent base at each position.
b The motif logos show the information content present at each position in the sequence.
c The expected frequencies of motifs in a random database of the same size.
d The Z-score value was obtained by whole-genome Monte Carlo simulation analysis.
Conclusions
In this study, we concentrated on clustering patterns and flanking characteristics that might be involved in the transcriptional regulation and processing of plant miRNAs, including the miRNAs located in intergenic areas and in protein-coding areas, whose possible sequence characteristics had not been studied earlier. Previous studies have revealed that miRNAs located in close genomic proximity to each other are co-transcribed as polycistronic units [24,43,44]. Therefore, we performed genome-wide analysis to examine the clustering patterns of the miRNAs in four species of plant. The pairwise distance analysis of same-strand consecutive miRNAs suggested that the distances between miRNAs in the four plant species are statistically significantly higher than expected at random (P < 0.001). Comparison of the miRNA pair distances with the pair distances of protein-coding genes revealed that plant miRNAs are more clustered than protein-coding genes at very short pairwise distances of < 1 kb. Then, we characterized the putative core promoter of Pol II promoters in plant miRNA upstream sequences. Our results suggest that most plant miRNAs contain core promoters of Pol II promoters that are close to pre-miRNA hairpins. Analysis of promoter architecture for different miRNA genes in plants reveals significant differences in the number and distribution of core promoters between intergenic miRNAs and intragenic miRNAs, and between highly conserved miRNAs and low conserved or nonconserved miRNAs. We applied two motif-finding tools to search for over-represented, statistically significant sequence motifs in the flanking regions of miRNAs in different plant species. Six motifs were found to be species-specific in three plant species and included some previously known species-specific motifs and some novel species-specific motifs. We also identified three specific motifs associated with the degree of miRNA conservation.
Compared with previous studies, our study systematically explored clustering patterns and the characteristics of flanking regions up to 2 kb upstream and 1 kb downstream of miRNA stem–loop sequences, and extended the results obtained for a small number of miRNAs in A. thaliana and in O. sativa to all known miRNAs in four plant species. It remains largely unknown whether there are motifs related to the degree of conservation of miRNAs. To address this question, we classified the miRNA genes of the four plant species studied into three groups, according to their conservation, and examined characteristic sequence motifs in the flanking sequences of these miRNA genes. Several significant motifs appeared to be related to the degree of miRNA conservation. We hope that our results can contribute to a better understanding of the transcriptional regulation and processing of miRNAs and provide useful data for further computational identification of miRNAs in plants. We also anticipate that these motifs related to the degree of miRNA conservation may be useful for understanding the mechanism of the origin of new plant miRNAs.
Materials and methods
Data sets
We chose four species of plant (A. thaliana, P. trichocarpa, O. sativa and S. bicolor) to study clustering patterns and sequence characteristics in the flanking regions of plant miRNA genes because the number of miRNA genes in these four plant species is relatively large and the genome sequences are relatively complete. All known miRNAs and genome coordinates in these four plant species were downloaded from the miRBase Sequence Database, release 16 (http://www.mirbase.org/) [45]. The genome sequences and the protein-coding genes of A. thaliana and S. bicolor were downloaded from MapViewer at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). The genome sequences and the protein-coding genes of P. trichocarpa and O. sativa were downloaded from the Poplar site on Phytozome v6.0 (P. trichocarpa v2.0) (http://www.phytozome.net/poplar) [23] and TIGR Oryza Pseudomolecules (version_6.0) [46], respectively. Then, we extracted sequences up to 2 kb upstream and up to 1 kb downstream from all available miRNA precursors in the four plant species. A detailed description of the data set used in our study is shown in Table 4.
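The extraction of flanking sequences described here can be sketched in Python as follows. This is an illustrative assumption of how the step might be implemented, not the authors' pipeline: the function names are hypothetical, coordinates are taken to be 1-based and inclusive on the forward strand, and minus-strand genes are handled by reverse-complementing so that "upstream" always means 5′ of the precursor.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def _reverse_complement(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def flanking_regions(chrom_seq, start, end, strand,
                     upstream=2000, downstream=1000):
    """Return (upstream_seq, downstream_seq) for one miRNA precursor.

    chrom_seq: chromosome sequence as a string (forward strand).
    start, end: 1-based inclusive precursor coordinates (assumed).
    strand: '+' or '-'.
    Regions are truncated at chromosome ends, so they may be shorter
    than the requested lengths ("up to" 2 kb and 1 kb).
    """
    left, right = start - 1, end  # convert to 0-based, half-open
    if strand == "+":
        up = chrom_seq[max(0, left - upstream):left]
        down = chrom_seq[right:right + downstream]
    else:
        # On the minus strand the 5' flank lies to the right of the gene.
        up = _reverse_complement(chrom_seq[right:right + upstream])
        down = _reverse_complement(chrom_seq[max(0, left - downstream):left])
    return up, down
```

For polycistronic transcripts, the text indicates that only the region upstream of the 5′ pre-miRNA and downstream of the 3′ pre-miRNA is used, so the function would be called once with the outer coordinates of the cluster.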
Table 4. Detailed description of the data set in our study.
Species | Version of genome annotation | No. of miRNAs | No. of polycistronic transcripts | No. of upstream sequences | No. of downstream sequences
Conservation analysis of miRNA in the four plant species studied
To determine the degree of conservation of miRNA in the four plant species, we performed a sequence-based homology search for known miRNAs from the four plants to detect both closely related and distantly related homologues. First, known miRNA hairpin sequences from the four plants were aligned against all known miRNA hairpin sequences in monocots and eudicots using standalone BLAST (blastn, version 2.2.27). The hairpin sequences were considered as homologues when they exhibited a minimum sequence identity of 85% over an alignment length of at least 90%. Second, ClustalW [47] was used to compare mature miRNA sequences in a search for homologues. We accepted mature miRNA sequences matching at least 18 nucleotides, leaving 0–3 nucleotides for possible sequence variations [19]. Finally, we divided the miRNAs of the four plant species into three groups: the miRNAs whose homologues were found simultaneously in monocots and eudicots were considered as highly conserved miRNAs; those found only in monocots or only in eudicots were considered as low conserved miRNAs; and those found only in one species were considered as nonconserved miRNAs.
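The three-way grouping rule can be written down compactly. The Python sketch below is an illustration only: the function name and the species-to-clade table are hypothetical, and the table lists just the four species studied here, whereas the homology search in the text covered all monocot and eudicot hairpins in miRBase. The input is assumed to be the set of species in which a miRNA or its homologues were found, including its species of origin.

```python
# Illustrative clade lookup restricted to the four species in this study.
CLADE_OF = {
    "A. thaliana": "eudicot",
    "P. trichocarpa": "eudicot",
    "O. sativa": "monocot",
    "S. bicolor": "monocot",
}

def classify_conservation(species_with_homologues, clade_of=CLADE_OF):
    """Assign a conservation group from the species in which a miRNA occurs.

    species_with_homologues: iterable of species names, including the
    species in which the miRNA itself was annotated.
    """
    species = set(species_with_homologues)
    clades = {clade_of[name] for name in species}
    if len(clades) == 2:
        # Homologues in both monocots and eudicots.
        return "highly conserved"
    if len(species) > 1:
        # More than one species, but all within a single clade.
        return "low conserved"
    # Found in one species only.
    return "nonconserved"
```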
Analysis of clustering patterns
To study the clustering patterns of miRNA genes in different plant species, we computed the neighbour distances between every two same-strand consecutive miRNA genes in the same chromosome. The average distance of the neighbour miRNA pairs was calculated across all chromosomes in the four plant species studied. To evaluate the statistical significance of miRNA clustering patterns in the four plant species, we used a sampling approach. First, we selected random positions whose number was equal to the number of miRNA genes on each chromosome. Then we computed the neighbour distances between consecutive random points and their average. By repeating the random sampling 1000 times, we set the P value as the fraction of times for which the random averages were smaller (or larger) than the average distance of the miRNA pairs.
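A minimal Python sketch of this sampling procedure is given below for a single chromosome. It is an assumption-laden illustration rather than the authors' implementation: the function names are hypothetical, positions are sampled uniformly without replacement, strand is ignored, and only one tail is computed (the fraction of random averages that are no larger than the observed average). The text pools chromosomes and also considers the other tail.

```python
import random

def mean_neighbour_distance(positions):
    """Average distance between consecutive positions (needs >= 2 points)."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def clustering_p_value(mirna_positions, chrom_length, n_iter=1000, seed=0):
    """Fraction of random placements whose mean gap is <= the observed one.

    mirna_positions: one coordinate per miRNA gene on the chromosome.
    chrom_length: chromosome length in bp.
    seed: fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    observed = mean_neighbour_distance(mirna_positions)
    n_genes = len(mirna_positions)
    hits = 0
    for _ in range(n_iter):
        # Draw as many random positions as there are miRNA genes.
        sample = rng.sample(range(1, chrom_length + 1), n_genes)
        if mean_neighbour_distance(sample) <= observed:
            hits += 1
    return hits / n_iter
```

Replacing `<=` with `>=` in the comparison would give the opposite tail, matching the "smaller (or larger)" wording of the method.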
Prediction of the core promoter of the plant miRNA gene
The core promoter of the class II promoter, including the TSS and the TATA-box, in the upstream sequences of plant miRNA genes was detected using the TSSP-TCM program (http://mendel.cs.rhul.ac.uk/mendel.php?topic=fgen) with its default parameters; this program is well established and is the most commonly used plant promoter prediction software [48].
Motif analysis
To identify characteristic motifs in the flanking regions for microRNA genes in the four plant species, we first used RepeatMasker (version 3.2.9; http://www.repeatmasker.org) with default settings to mask repeats in all upstream and downstream sequences. Then we applied the MEME Suite software (version 4.3.0; http://meme.sdsc.edu/), which is a probabilistic local alignment tool [49]. The significance of a detected motif was represented by the E-value, which refers to the expected number of motifs of equal width with the same or higher likelihood in a random sequence set with the same size and nucleotide composition as the considered set of sequences. Here, MEME was used to identify the 10 top-ranking motifs for each species with a width of 10 bp. All other options were left as default.
Furthermore, we also applied MotifSampler, which is based on Gibbs sampling [50], to find over-represented motifs. MotifSampler is a stochastic algorithm and the results may vary between runs. Therefore, we carried out 50 repeated runs of MotifSampler for each analysis. The number of different motifs was set to 10 and the width of the motifs was set to 10. All other options were tried at a variety of sensible settings. The results of these two programs were integrated to identify motifs that were frequently reported with a low E-value across these settings and by both motif-finding tools in the flanking regions of the microRNA genes from the four plant species. Sequence logos for all motifs found by these two programs were created using WebLogo version 2.8.2 (http://weblogo.berkeley.edu) [51].
In order to determine whether a motif is statistically significant in the flanking regions of plant miRNA genes, whole-genome Monte Carlo simulation, resulting in a Z-score, was used to take into account the specificity and significance of a motif, as previously described by Zhou et al. [11]. For a given motif, we first obtained the average number of occurrences per target sequence, denoted as $N_t$, and then randomly generated the same number of reference sets from protein-coding genes and intergenic sequence far upstream of the miRNA, as an appropriate background. Next, the MEME motifs were individually aligned to the reference sets using the MAST program with default values [52] to compute the average number of occurrences of a motif, $N_r$, and its standard deviation, $\sigma_r$, over the reference sets. The Z-score was computed as $Z = (N_t - N_r)/\sigma_r$, which measures the normalized difference between the average occurrence of the motif in the target set and the sample mean in the reference sets [11].
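The Z-score described above is the difference between the target mean and the reference mean, divided by the standard deviation over the reference sets. A minimal Python sketch follows. The function name and argument layout are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev

def motif_z_score(target_mean, reference_means):
    """Z-score of a motif's occurrence rate against reference sets.

    target_mean: average occurrences per sequence in the miRNA flanking
                 set (N_t in the text).
    reference_means: average occurrences per sequence in each random
                     reference set; their mean is N_r and their
                     standard deviation is sigma_r.
    """
    n_r = mean(reference_means)
    sigma_r = stdev(reference_means)  # sample standard deviation (assumed)
    return (target_mean - n_r) / sigma_r
```

Under the criterion stated in the Results, a motif would be retained as over-represented when the returned value exceeds 2.0.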
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (grant nos 30871394, 30600367 and 30571034), the National High Tech Development Project of China, the 863 Program (grant no. 2007AA02Z329), the National Basic Research Program of China, the 973 Program (grant no. 2008CB517302) and the National Science Foundation of Heilongjiang Province (grant nos ZJG0501, 1055HG009, GB03C602-4, JC2007H and BMFH060044).
References
1 Voinnet O (2009) Origin, biogenesis, and activity of plant microRNAs. Cell 136, 669–687.
2 Chen X (2008) MicroRNA metabolism in plants. Curr Top Microbiol Immunol 320, 117–136.