MYB family proteins are one of the most abundant transcription factors in the cotton plant and play diverse roles in cotton growth and evolution. Previously, few studies have been conducted in upland cotton, Gossypium hirsutum. The recent release of the G. hirsutum genome sequence provides a great opportunity to identify and characterize the entire upland cotton MYB protein family.
Trang 1R E S E A R C H A R T I C L E Open Access
Genome-wide characterization and
expression analysis of MYB transcription
Haron Salih1,2,4, Wenfang Gong1, Shoupu He1, Gaofei Sun3, Junling Sun1*and Xiongming Du1*
Abstract
Background: MYB family proteins are one of the most abundant transcription factors in the cotton plant and play diverse roles in cotton growth and evolution Previously, few studies have been conducted in upland cotton,
Gossypium hirsutum The recent release of the G hirsutum genome sequence provides a great opportunity to
identify and characterize the entire upland cotton MYB protein family
Results: In this study, we undertook a comprehensive genome-wide characterization and expression analysis of the MYB transcription factor family during cotton fiber development A total of 524 non-redundant cotton MYB genes, among 1986 MYB and MYB-related putative proteins, were identified and classified into four subfamilies including 1R-MYB, 2R-MYB, 3R-MYB, and 4R-MYB Based on phylogenetic tree analysis, MYB transcription factors were divided into 16 subgroups The results showed that the majority (69.1 %) of GhMYBs genes belong to the 2R-MYB subfamily
in upland cotton
Conclusion: Our comparative genomics analysis has provided novel insights into the roles of MYB transcription factors in cotton fiber development These results provide the basis for a greater understanding of MYB regulatory networks and to develop new approaches to improve cotton fiber development
Keywords: Comparative genomics analysis, Upland cotton, MYB genes, Fiber development
Background
Plant growth and development are controlled by
multi-gene families Transcription factors play a key role in the
regulation of gene transcription and commonly comprise
four distinct domains: a DNA-binding domain, a nuclear
localization signal, a transcription activation domain, and
an oligomerization site [1] These four domains work
to-gether to control many aspects of plant growth and
devel-opment by activating or suppressing the transcriptional
process [2] Additionally, transcription factors are
regu-larly encoded by multigene families which makes
analyz-ing their individual roles more complex [1] Compared
with fungi and animals, the MYB transcription factors of
higher plants are more broadly dispersed in the genome
[3] The MYB domain is highly conserved among plants,
and proteins usually contain between one and four repeats
(SONT domains) named R1, R2, R3, and R4 Each repeat
is comprised of 50–53 amino acids which encode three α-helices, the second and third of which form a helix–turn–
transcription factor DNA recognition site and interacts with the major groove of DNA [5] Moreover, it is com-prised of regularly spread triplet tryptophan residues that group together to make a hydrophobic core [6] In con-trast, the C-terminal promoter domain of different MYB proteins is quite diverse, leading to the broad variety of regulatory roles of the MYB gene family [7, 8] The MYB domain was first identified in the avian myeloblastosis virus (v-myb) [9] and three additional MYB genes (c-myb, A-myb, and B-myb) were identified in different organisms such as vertebrates, insects, fungi, and slime molds [7, 10, 11] The corn C1 gene was the first MYB gene identified
in plants, and encodes a c-myb-like transcription factor responsible for the regulation of anthocyanin biosynthesis [12] Generally, R2R3-MYB domain proteins are the pre-dominant form found in higher plants [8]
* Correspondence: sunjl000@163.com ; dujeffrey8848@hotmail.com
1 State Key Laboratory of Cotton Biology/Institute of Cotton Research,
Chinese Academy of Agricultural Science (ICR, CAAS), Anyang 455000, China
Full list of author information is available at the end of the article
© 2016 The Author(s) Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made The Creative Commons Public Domain Dedication waiver
Trang 2To date, the plant transcription factor database (http://
planttfdb.cbi.edu.cn/) contains approximately 8746 MYB,
and 6410 MYB-related, sequences [13] These genes may
be involved in various plant cell activities such as
second-ary metabolism, hormone signaling [8, 14], environmental
stress, cell development [15], and organ growth [16, 17]
Recently, several genome-wide analyses of MYB
tran-scription factors have been conducted in Arabidopsis, rice
[18], maize [19], Salvia miltiorrhiza [20], soybean [21],
apple [22], sugarcane [15], and Chinese cabbage [23] These
studies can be utilized to identify MYB transcription factors
in other plants, including cotton However, very little
infor-mation about MYB gene diversity and abundance in upland
cotton is available Several studies have shown that the
MYB transcription factor family has a role in regulating
fiber progress in cotton The GhMYB109 and GhMYB2
genes, that belong to the R2R3-MYB subfamily, are
associ-ated with positive regulation of cotton fiber development
[24, 25] Moreover, silencing of GhMYB25 is involved in
the production of short fibers in cotton [26], while
suppres-sion of GhMYB25-like produces fiber-less cotton [27] The
negative regulator of cotton fiber elongation [28] A recent
report has revealed that ten MYB (MIXTA-like) genes were
highly expressed during early fiber development in
expression in three naked seed (fiber-less mutants) cotton
mutants revealed that only one group of MIXTA-like genes
had decreased expression levels [29]
Upland cotton is one of the most important fiber crops
in the world and provides raw material for the textile
in-dustry [30] Transcriptome analyses showed that several
pathways were regulated in developing cotton fibers [31]
The status of those up-regulated or down-regulated
path-ways, and the molecular mechanisms by which they are
controlled requires further investigation [32, 33] As MYB
proteins are one of the largest transcription factor families
in higher plants, they may play key roles in regulating
di-verse pathways in cotton during fiber development [30]
Comprehensive analysis of upland cotton MYB proteins
and their evolutionary variations, through tetraploid
cot-ton, might help to reveal critical molecular mechanisms of
cotton development and growth In addition, the recent
release of the G hirsutum genome sequence [34] provides
a great tool to identify and characterize the entire MYB
protein family in upland cotton
Here, we conducted a comprehensive genome-wide
analysis of the MYB transcription factor family in G
hir-sutum A total of 524 MYB transcription factor encoding
genes were identified and subsequently subjected to
sys-tematic analyses including: phylogenetic tree analysis,
chromosomal location determination, gene structure
(RNA-seq) analysis, and qRT-PCR analysis of selected
MYB genes Our genome-wide analysis of the GhMYB gene family might contribute to future studies on the functional characterization of MYB proteins in G hirsu-tum These research findings will provide information fun-damental to determining the molecular and regulatory mechanisms of MYB transcription factors in cotton
Methods
Identification of MYB gene family in upland cotton
Upland cotton protein sequences were downloaded from the Cotton Genome Project (http://cgp.genomics.org.cn/ page/species/download.jsp?category=hirsutum) for compu-tational analysis Corresponding protein sequences were downloaded from the Arabidopsis database (TAIR; http:// www.Arabidopsis.org/), cacao, mays, Vitis vinifera, Populus
downloaded from the plant transcription factor database (http://planttfdb.cbi.edu.cn/) A local BLASTP search was performed to identify candidate MYB members, using Ara-bidopsis, cacao, mays, Vitis vinifera, Populus trichocarpa and G raimondii MYB protein sequences as the query Hits with e-values of 1e- 10 were deemed to be members
of the MYB family Furthermore, to confirm the protein se-quences derived from the selected cotton MYB, candidate genes were examined using the domain analysis programs
of Pfam (Protein family: http://pfam.sanger.ac.uk/) and SMART (Simple Modular Architecture Research Tool: http://smart.embl-heidelberg.de/) All redundant sequences were manually discarded, resulting in 524 MYB pro-tein sequences Additional analysis was based on clus-ter W alignment results
Mapping MYB genes on chromosomes
The chromosomal position of all GhMYB genes was deter-mined through BLASTN searches against the G hirsutum genome project (Cotton Genome Project of the Institute
of Cotton Research of Chinese Academy of Agricultural Sciences) GhMYB genes were mapped on the chromo-some using the Map Chart software Two types of gene duplications were identified: tandem and segment duplica-tion events Gene duplicaduplica-tions were identified provided that the length of the aligned sequence covered >80 % of the longer gene, and that basic on the similarity of the gene alignment regions was >80 % [35] In addition, to further estimate GhMYB genes duplication events, the synonymous (Ks) and non-synonymous (Ka) substitution rates of evolution were calculated using the DnaSP soft-ware (version 5.10) [36] Ka/Ks calculator was run on those GhMYB gene pairs to estimate their synonymous and non-synonymous rates of evolution To estimate the evolutionary time of duplicated genes, Ks values were trans-lated into duplication time in millions of years based on a rate of 1 substitution per synonymous site per year The du-plication events time (T) was calculated as T = Ks/2λ × 10−6
Trang 3Mya (approximate value for clock-like rate λ = 1.5 × 10−8
years for cotton) [37]
Phylogenetic analysis
The phylogenetic tree of MYB transcription factor genes
was generated using multiple sequence alignments of
upland cotton, Arabidopsis and cacao MYB protein
se-quences using Cluster W (http://www.ebi.ac.uk/Tools/msa/
clustalw2/) Phylogenetic and molecular evolutionary
ana-lyses were performed using MEGA 6.0 software (http://
www.megasoftware.net) with pairwise distance and the
Neighbor-Joining (NJ) method The tree was constructed
with the following parameters: Substitution, Poisson Model;
data subset to use, the p-distance, complete deletion;
replication, bootstrap analysis with 1,000 replicates
More-over, maximum likelihood and minimum evolution methods
were also used in our phylogenetic tree to validate the result
from the NJ method Additionally, a separate phylogenetic
tree was constructed with all the GhMYB protein sequences
in G hirsutum for further analysis
Gene structure analysis and identified motifs
MYB genomic and cDNA sequences were obtained from
the Cotton Genome Project of the Institute of Cotton
Research of Chinese Academy of Agricultural Sciences
(http://cgp.genomics.org.cn/page/species/download.jsp?Ca-tegory=hirsutum) The Online Gene Structure Display
Ser-ver (GSDS 2.0) (http://gsds.cbi.pku.edu.cn/index.php) was
used to examine gene structure by comparing each cDNA
sequence with the corresponding genomic sequence
Con-served protein motifs in G hirsutum MYBs were identified
using the MEME program (version 4.8.2)
(http://mem-e.nbcr.net/meme/intro.html) The following parameters
were used: any number of repetitions, the maximum
num-ber of motifs-20, and optimum width from 6 to 250
Plant materials, RNA extraction and qRT-PCR analysis
Upland cotton (Gossypium hirsutum L.) Ligon-lintless 1
(Li1) mutants and wild-types (TM-1) seeds were provided
from Institute of Cotton Research, Chinese Academy of
Agricultural Sciences (CAAS, Anyang, China) and planted
in the experimental field at the Institute of Cotton
Re-search under conventional field management conditions
Flowers on one day before anthesis were tagged for
self-pollination To detect the MYB gene expression, samples
were collected from ligon-lintless1 and wild-type cotton at
different stages of cotton fiber development: 0, 3, 5, 8, 10,
and 15 DPA RNA was extracted from cotton ovules and
fi-bers using the RNA Aprep Pure Plant Kit (Tiangen) The
quality and concentration of each RNA sample was
deter-mined using gel electrophoresis and a NanoDrop 2000
spectrophotometer (Only that met the criterion 260/280
ratio of 1.8-2.1, 260/230 ratio≥ 2.0) were used for further
were treated with DNase I (TaKaRa, Japan) to eliminate contaminating genomic DNA The cDNA was synthesized
Ace qPCR RT kit (TOYOBO, Japan) according to the man-ufacturer’s manual qRT-PCR experiments were conducted
to measure expression levels of MYB transcription factor family genes during cotton fiber development qRT-PCR analysis was performed using the Applied Biosystems 7500 Real-Time PCR system and the SYBER premix ExTaq kit (TaKaRa Japan) Target gene amplification was checked by SYBR Green fluorescence signal The cotton constitutive β-actin gene was used as a reference gene and specific MYB primers were used for qRT-PCR The following ther-mal cycle conditions were used: 95 °C for 2 min, followed
by 40 cycles of 95 °C for 5 s, products collected at 60 °C for 34 s All reactions were repeated three times with three biological replicates Expression levels were calculated as the mean signal intensity across the three replicates Fol-lowing the PCR, a melting curve analysis was performed
Ct or threshold cycle was used for relative quantification of the input target number Relative fold difference (N) is the number of treated target gene transcript copies relative to that of the untreated gene transcript copies, and is calcu-lated according to Schmittgen et al 2001 [38] as follows:
N¼ 2ΔΔCt¼ 2ð ΔCt treated‐ΔCt control Þ
differ-ence in threshold cycles for GhNAC18 target and the GhActin1internal reference
RNA-seq data analysis
To analyze upland cotton MYB expression patterns, we used Illumina RNA-seq data, including five stages of cot-ton fiber development (−1, 1, 3, 5, and 10 DPA) from
calculated as reads per kilobase of exon model per million mapped reads (RPKM) units (Additional file 1: Table S1) Fold changes of different genes expression analysis and the related statistical computations of the two tested condi-tions were performed using the DESeq R package (1.10.1) The resulting P-values were adjusted using Benjamini’s and Hochberg’s method to control the false rate [39] Only genes with an adjusted P-value < 0.05 found using DESeq were categorized as differentially expressed Heat maps were generated and hierarchical clustering was performed using genesis_v1.7.6.30.09.10-DIGERATI software
Results and discussion
Genome-wide identification of upland cottonGhMYB transcription factors
MYB transcription factor encoding genes of G hirsutum and homologous MYB genes collected from Arabidopsis,
Trang 4cacao, mays, Vitis vinifera, Populus trichocarpa and G
DNA-binding domain, contains ~52 amino acid residues
in length and forms a helix-turn-helix fold with three
regu-larly spaced tryptophan residues) and MYB-related
(MYB-related genes are those transcription factors they have only
one MYB domain MYB domain proteins are more
preva-lent) [18] putative protein sequences were associated with
the upland cotton genome Of these, 582 non-redundant
MYB sequences, which met the crucial value of 1 e- 10,
were obtained Furthermore, GhMYB candidate genes were
examined using domain analysis programs of Pfam and
SMART A total of 524 non-redundant GhMYB genes
were identified and considered for further analysis These
genes were classified into four distinct subfamilies
includ-ing: 1R-MYB, 2R-MYB (R2R3-MYB), 3R-MYB
(R1R2R3-MYB), and 4R-MYB (Additional files 2: Table S2 and
Additional file 3: Figure S1) based on the number and
loca-tion of MYB repeats Our results indicate that, consistent
with results observed in rice, Arabidopsis [18], Chinese
cabbage [23], and apple [22], the majority of GhMYBs in
upland cotton belong to the 2R-MYB sub-family (69.1 %)
The second largest group 1R-MYB, accounted for 27.67 %
of all GhMYB genes, while 3R-MYB and 4R-MYB
accounted for 2.86 and 0.38 %, respectively (Table 1)
Chromosomal distribution and annotation MYB genes
Analysis of the G hirsutum genome sequence revealed
524 possible members of the GhMYB gene family Of
these genes, 114 had been annotated previously Three
hundred and seventy three (373) GhMYB transcription
factor genes were mapped onto upland cotton
chromo-somes and named according to their chromosomal order
(from chromosome 1 to 26) as GhMYB1 to GhMYB373
One hundred and fifty one (151) GhMYB genes were not
obviously mapped to any chromosome (scaffolds), and
named GhMYB374 to GhMYB524, respectively
(Add-itional file 4: Table S3) The distribution and density of
MYB transcription factor genes on chromosomes was not
uniform Some chromosomes, and chromosomal regions,
have a high density of MYB transcription factor genes
while others do not (Fig 1) The highest density of MYB
genes was observed on chromosome At 9 and its homolog
chromosome Dt 9 (23) with 58 genes, and the lowest
density of MYB genes was observed on chromosome At 3
and its homolog chromosome Dt 3 (17), with 11 genes In
addition, the majority of MYB transcription factor genes
were found at the upper and centromeric regions of the
chromosomes In addition, a greater number of MYB
genes were located on Dt chromosomes (tetraploid D)
than on At chromosomes (tetraploid A) with 201 and 172
genes, respectively (Table 1)
Tandem and segmental duplication events are the main
causes of gene-family expansion in upland cotton Based on
the whole genome analysis of gene duplications, 73 dupli-cated GhMYB gene pairs were made by segmental and tan-dem duplication, including 40 duplication events within the
At and Dt chromosomes as well as 33 duplication events between the At, Dt chromosomes and scaffold (Additional file 5: Table S4), indicating that segmental duplications and tandem duplications contributed to the expansion of
genes reside on the same chromosome or on different chromosomes A tandem duplication event is when gene duplication happens within the same chromosome while segmental duplication is when duplicated genes are located
in different chromosomes In this study, clusters formed by GhMYBs in the upland cotton (AD) genome were identi-fied to explain the mechanism behind the expansion of the GhMYBfamily in cotton We found that 5 gene pairs dupli-cated tandemly into chromosomes (At_chr5, At_chr7, At_chr12, Dt_chr5 and Dt_chr7) and 68 gene pairs dupli-cated segmentally, which deeply contributed to the expan-sion of the GhMYB transcription factors in upland cotton The results also indicated that, among the duplication events in the GhMYB transcription factor family, the gene pairs that appeared to be derived from segmental duplica-tion events occurred earlier than those that arose from tan-dem duplication (Additional file 5: Table S4) A gene duplication event, occurring during the course of cotton evolution, has led to the creation of new gene functions [40] The origin of multigene families has been attributed
to a region-specific gene duplication that occurred in up-land cotton [34] Furthermore, to calculate the evolution-ary time of these identified MYB in G hirsutum, an estimation of their synonymous and non-synonymous substitution rates during evolution, Ks and Ka values were calculated using the DnaSp software Nucleotide substitu-tions in protein-coding genes can be categorized as syn-onymous or non-synsyn-onymous substitutions as elaborated
in (Additional file 5: Table S4) The Ka/Ks ratio is a meas-ure used to examine the mechanisms of gene duplication evolution after divergence from their ancestors A Ka/Ks value of 1 suggests neutral selection, a Ka/Ks value of <1 suggests purifying selection, and a Ka/Ks value of >1 sug-gests positive selection (Hurst 2002) Here, we estimated synonymous and non-synonymous substitutions ratios (Ka/Ks) for the 73 pairs of segmentally and tandemly du-plicated genes It was found that most of GhMYB genes had Ka/Ks values of less than 1, implying that GhMYB genes have evolved under the effect of purifying selection while 17 duplicated genes had the Ka/Ks ratio more than
1, implying that those had evolved under positive selec-tion We further used Ks to estimate the time of GhMYB genes duplication events during the evolutionary time of upland cotton genome The tandem and segmental dupli-cation events in upland cotton that occurred between 0.26 (Ks = 0.0078) and 124.42 mya (Ks = 3.7326), with an
Trang 5average of 46.494 mya (million years ago) The Ks of
tan-dem duplications of GhMYB genes occurred from 2.76
(Ks = 0.0828) mya to 44.343 (Ks = 1.3303) mya, with
aver-age 26.0493 mya The results suggest that the expansion
of the GhMYB genes in upland cotton which
origi-nated from At and Dt genomes mostly arose from
evolution
Phylogenetic analysis of MYB transcription factors genes
in upland cotton
To identify potential relationships between the various
phylogenetic tree was constructed Examination of protein sequence similarity and phylogenetic tree analysis allowed
us to divide the 524 upland cotton MYB genes into 16 sub-groups, which ranged in size from 2 to 68 MYB genes (Additional file 6: Figure S2A) The bootstrap values for some subgroups of the NJ tree were low as a result of rela-tively large number of gene sequences that were also found
in earlier study [23] Supporting the phylogenetic analysis
of subgroups, most GhMYB protein domain repeats exhib-ited high similarities within subgroups Hence, we strongly sought other evidence to check the reliability of our phylo-genetic tree The phylophylo-genetic trees of MYB TFs were re-constructed with maximum likelihood and minimum
Table 1 The different subfamilies of GhMYB transcription factor types distributed on upland cotton chromosomes
Chromosome Transcription factor Types
Trang 6Fig 1 Distribution of GhMYB genes on cotton chromosomes The chromosomal position of each GhMYB was mapped to the upland cotton genome
Trang 7evolution methods to validate the result from the NJ and
pairwise distance method The trees constructed by the
three methods mentioned above, are almost the same with
only minimal differences at some subgroups (subgroup 7
and 16), implying that the tree methods were mainly
con-sistent with each other (Additional file 7: Figure S3)
MYB gene structure analysis and conserved motif
identification
Gene structure analysis of 524 GhMYB transcription
fac-tors was performed To provide greater insight into their
intron/exon structure, cDNA and corresponding
gen-omic sequences were compared Approximately 90.84 %
of upland cotton GhMYB genes contained between 1 and
12 introns Similar to that described in Arabidopsis, Vitis
vinifera, and Eucalyptus grandis [41, 42], most GhMYB
transcription factors contained 1 (17.4 %) or 2 introns
(52.7 %) The remaining 22.51 % of GhMYB genes
con-tained more than two introns However, only 8.2 % of
Most of the intronless genes were clustered into the 13
and 21 subgroups Furthermore, an uprooted phylogenetic
tree was constructed using GhMYB protein sequences to
assess the similarities in intron/exon structure within
Within subgroups the majority of GhMYB genes contained
similar exon/intron distribution arrangements, particularly
related to exon length and intron number Most of the
2R-MYB transcription factor genes had a conserved gene
structure with three exons and two introns Additionally,
the size of the third exon was more variable than that of
the first and second exon (Additional file 6: Figure S2B)
High levels of variation in the sequence of the third exon
are reported to be associated with functional divergence
among R2R3-MYB genes [43] Whilst few R2R3-MYB
genes contained no intron, they were clustered into 13
subgroups It was noted that the duplication of 2R during
the early development of MYB proteins containing two
re-peats gave rise to the 3R-MYB domains Therefore, it was
found that most of the 1R-MYB, 3R-MYB, and 4R-MYB
genes were disrupted by more than four introns This is
consistent with previous reports which have suggested that
most MYB-related genes in higher plants contained more
than 2 introns [44] Within each subgroup, most of the
struc-tures, as it was described previously by Jiang, Gu, and
Peterson and Matus [42, 45] There was a strong
connec-tion between the phylogeny tree analysis and the intron/
exon structure of the GhMYB transcription factor family
in upland cotton
Further investigation of the variation within the
con-served motifs of GhMYB proteins using the MEME
pro-gram identified 20 conserved motifs, which we designated
motifs 1 to 20 Most of the GhMYB proteins within the
same subgroup showed similar motif compositions, while high variance was observed between the different sub-groups This is consistent with previous reports suggesting that MYB family members with similar protein arrange-ments were classified into the same subfamily [8] For ex-ample, all GhMYB proteins in subgroup1 possessed motifs
1, 2, 3, and 12 while all members in subgroup 15 contained motifs 2, 3, 4, 6, 7, and 15 (Additional file 8: Figure S4) In addition, some motifs were specific to a distinctive sub-group, indicative of a particular function of that subgroup Though the functions of most of the conserved motifs re-main to be identified, they are likely to play an important role in the transcriptional regulation of target genes, and may indicate further functional diversification in specific species Our results suggest that these motifs are evolu-tionarily conserved and functionally important This result was similar to Stracke and Dubos who suggested that if MYB genes from the same subgroup share similar protein motifs they probably share similar functions [8, 46]
Upland cotton MYB family relationships with other plant
To understand the relationship between the members of the MYB gene family, we constructed an NJ phylogen-etic tree of upland cotton, Arabidopsis, and cacao MYB proteins Comparison of protein sequences and phylo-genetic tree analysis enabled us to categorize the 524,
197, 141 and 256 MYB genes of upland cotton,
identified 15 subgroups containing 4 to 195 MYB genes (Fig 2) The bootstrap values for some subgroups of the
NJ tree were low as a result of relatively large number of gene sequences that were also found in earlier study [23] The low bootstrap support for the internal sub-groups of those trees was in agreement with phylogen-etic analysis of MYBs in other plants [23] It was likely due to the fact that the MYB domains are comparatively short, and members within a subgroup are highly con-served, with relatively few informative character posi-tions Most MYB subgroups contained more upland cotton members than Arabidopsis, cacao and Gossyoium
(Additional file 9: Table S5) Moreover, the classification and identification of the MYB protein sequences was consistent with the previous classification described by Stracke and Dubos [4, 8] For example, subgroup 1 con-tained 32 upland cotton, 13 Arabidopsis, 9 cacao and 22
14 contained 1 Arabidopsis, 3 upland cotton MYB genes Curiously, some homologs were clustered by species within a subgroup, which referred to that species (Additional file 9: Table S5) Identification of putative orthologous MYB genes of upland cotton, Arabidopsis, ca-cao and Gossypium raimondii was relatively easy because they were clustered in pairs within a subgroup One
Trang 8hundred forty-nine (90 homologous gene pairs between
D5/Dt and 59 between D5/At) orthologous gene pairs
were found between upland cotton and Gossypium
rai-mondii, while only 1 orthologous was identified between
upland cotton and cacao, and 3 orthologous were found
between upland cotton and Arabidopsis, which might be
due to the closer relationships between upland cotton and
G raimondii
In addition, to gain more insights on divergence of the
MYB genes after polyploidization, the non-synonymous
(Ka) and synonymous (Ks) nucleotide substitutions and
their ratio (Ka/Ks) were analyzed for the homologous
gene pairs between G raimondii (D5) and upland cotton
homologous, 45 were identical (Ka = Ks = 0 or Ka/Ks
ra-tio = 0), 41 had a Ka/Ks less than 1, suggesting that MYB
genes have evolved mainly under the effect of purifying
se-lection However, only 3 MYB genes had a Ka/Ks ratio
more than 1, suggesting that these genes have been
evolved by positive selection (Additional file 10: Table S6)
This result implying that most of the ancestral MYB
genes have been retained in upland cotton G
hirsu-tum after polyploidization
Expression profiles of MYB genes inG hirsutum
MYB gene expression was analyzed using RNA-seq data
from different stages of cotton fiber development
(82.3 %) MYB genes were expressed in at least one stage
of cotton fiber development and the expression 93/524
(17.7 %) MYB genes were not detected by RNA-seq
(Additional file 1: Table S1) In addition, of the 431
low expression levels during different stages of fiber de-velopment (Additional file 6: Figure S2C) According to the phylogenetic tree analysis, the expression of MYB transcription factors can also be divided into 16 sub-groups All genes within subgroups 2, 3 and 5 exhibited low or undetectable expression levels during the five early stages of cotton fiber development In Arabidopsis, this subgroup has been shown to be involved in salt tol-erance [47, 48] GhMYBs showed elevated transcript levels in the five cotton fiber developmental periods, suggesting that they might be important for maintenance
of metabolic processes and normal cotton fiber develop-ment Most MYB genes in subgroup 1, 4 and subgroups
12 to 15 were highly expressed in the five analyzed stages of fiber development Many MYB MIXTA-like transcription factors could be involved in the regulation
of epidermal cell differentiation in different plant spe-cies, including specifying cell shape in petals, vegetative trichome initiation, and branching and seed fiber initi-ation [26, 27] Recently, it has been reported that ten MYB (MIXTA-like) genes were highly expressed during early fiber development in G hirsutum In contrast, only one group of MIXTA-like genes had low expression levels in three natural fiber-less mutants [29] These mu-tants provide a means to analyze the roles of MYB tran-scription factors in the control fiber development in upland cotton Interestingly, expression of the rice
hor-mone treatment Moreover, in Arabidopsis the MYB91 gene has been shown to integrate endogenous develop-mental signals with different environdevelop-mental conditions [49] In addition, MYB88 normally maintains fate and de-velopmental progression throughout the stomatal cell lineage [50] These results indicate that MYB genes can have multiple functions in plant growth and stress re-sponses Sixty six genes in subgroup 7, CotAD_42675 (MYB2 or GL1), CotAD_18666 (MYB109), CotAD_46807 (MYB109), CotAD_02818 and ect, were highly expressed during the initiation and elongation stages of fiber devel-opment, implying that these genes may be involved in a complex network of fiber development In fact, MYB2 stimulates cotton fiber development [25] and MYB109 is specifically expressed in fiber initiation and elongation stages [24] Recently, it was found that MYB2 and MYB109 promote normal fiber development in cotton [29] In this study, 15 MYB-3R genes were identified and clustered into subgroups 10 and 13 The expression of three of the 15 MYB-3R genes could not be detected dur-ing cotton fiber development Therefore, the fact that the MYB-3R family is easily identifiable and characterized could make them suitable targets for genetic engineering approaches aimed at improving cotton fiber development
Fig 2 Phylogenetic tree of 492 upland cotton MYB proteins, 197
Arabidopsis MYB proteins and cacao 249 MYB proteins The phylogenetic
tree was constructed by MEGA 6.0 using the Neighbor-Joining method.
The bootstrap test was performed with 1,000 iterations The 29 subgroups
are shown with different colors
Trang 9Fig 3 Expression levels of 20 GhMYB genes measured by qRT-PCR analysis of Ligon-lintless1 mutant and wild-type at different stages of cotton fiber development Black and grey represent the expression levels of Li1 mutant and wild-type, respectively
Trang 10Previous reports suggest that GhCPC, belonging to the
MYB-3R subfamily, negatively controls cotton fiber
devel-opment during the initiation and early elongation
stages in mutant cotton [28, 51] These results are
consistent with previous studies which have suggested
that MYBs played a crucial role in a wide variety of
biological processes including: cell growth, cell cycle
control, signal transduction, metabolic and
physio-logical stability, and response to environmental
stim-uli [37, 52] Therefore, the MYB family might provide
a means to regulate cotton fiber development and
offer a path to understanding cell fiber development
during the initiation and elongation stages
Expression verification ofGhMYB genes involved in
cotton fiber development
MYB transcription factors play roles in many plant specific
processes, such as primary and secondary metabolism, cell
shape, anthers development, cellular proliferation,
differen-tiation, and stress responses [53, 54] We randomly selected
20 MYB genes to undergo expression verification using
qRT-PCR (Fig 3) The GhMYB genes MYB25, MYB2,
MYB109, MYB5, and MYB3 were highly expressed in
wild-type G hirsutum, and exhibited lower expression levels in
G hirsutumLigon-lintless1 (Li1) mutants after 5DPA
Pre-viously, it found that 8DPA was the critical point for the
Ligon-lintless1 mutant [32] In addition, MYB109 and
CotAD_02818 (GL1) promoted cotton fiber development
[29] MYB25 was expressed in ovules (initiation) and fiber
development [26] Our results indicate these genes may
play an essential in maintaining normal cotton fiber
development In contrast, some selected GhMYB genes such
as CotAD_29631 (CPC-like), CotAD_47467 (MYB103),
CotAD_11820 (CPC-3R-MYB), CotAD_64719 (MYB1),
CotAD_42115 (MYB83), and CotAD_21852 (MYB69)
were significantly expressed in theLigon-lintless1 mutant,
but not in wild-type A previous study reported that
CPC-3R-MYB negatively controlled cotton fiber development
[28], consistent with this the genes that are up-regulated in
the Ligon-lintless1 mutant could be responsible for the
short fiber phenotype observed Other groups of genes such
as CotAD_04154 (MYB_255), CotAD_02811 (MYB52),
CotAD_41041 (MYB_198), CotAD_64081 (MYBML5),
CotAD_71681 (MYB42), CotAD_13600 (MYB46), and
CotAD_27106 (MYB20) were expressed at different
levels in the Li1 mutant and wild-type which may
in-dicate functional divergence of GhMYB genes during
cotton fiber development Previous reports mentioned
that MYB genes showed significant expression differences
between Ligon-lintless2 and wild-type expression during
the later stage of cotton fiber development at 20DPA [55]
Several MYB transcription factors were readjusted by the
Ligon-lintless1 mutant at 5 DPA [56], 6 DPA [31], 1 DPA,
3 DPA, and 8 DPA ovules [33] Overall, it can be seen that
our RNA-seq data is consistent with qRT-PCR results In addition, a comparative expression profile analysis of MYBs in upland cotton revealed that GhMYB might have diverse functions at different stages of cell fiber develop-ment Taken together, the RNA-seq and qRT-PCR expres-sion analyses in G hirsutum support the hypothesis that GhMYBs are involved in fiber development during differ-ent developmdiffer-ental stages, and may have diverse functions
in Arabidopsis and other species The functions of most MYBs in higher plants remain unclear, and fur-ther investigation is required to elucidate their exact func-tions Our results provide a comprehensive understanding
of GhMYBs and provide the foundation for future func-tional analyses of MYB genes and their roles in cotton fiber development
Conclusions
The MYB gene family is part of the biggest transcription factor family in higher plants and plays an important role in plant growth and development We undertook a comprehensive genome-wide characterization and ex-pression analysis of the MYB transcription factor family
in cotton fiber development A total of 524 MYB genes were identified and classified into four subfamilies Based on phylogenetic tree analysis, these MYB tran-scription factors were classified into 16 subgroups Pro-teins within the same subgroup contained very similar gene structures and protein motifs Additionally, our results revealed that MYB genes were distributed across the entire upland cotton genome Moreover, RNA-seq data showed that MYB genes play an important role in plants The expression profiles of 20 genes during cot-ton fiber development, obtained by qRT-PCR, show that different MYB genes can positively or negatively regulate cotton fiber development Additionally, other MYB genes are expressed in both mutant and wild-type fiber, further highlighting the diverse functions of MYB proteins in the development of the cotton fiber cell This study provides strong evidence that GhMYB genes play a major role in cotton fiber development and pro-vides a platform for the characterization of interesting MYB genes in the future
Additional files
Additional file 1: Table S1 Expression patterns of MYB genes in different stages of cotton fiber development measured by RNA-seq (XLSX 63 kb) Additional file 2: Table S2 Classification of MYB genes according to MYB repeat domains and protein length (XLSX 26 kb)
Additional file 3: Figure S1 Schematic representation of the general structure of upland cotton R1-MYB, 2R-MYB, 3R-MYB and 4R-MYB domain proteins (TIF 400 kb)
Additional file 4: Table S3 Location of MYB genes in the upland cotton genome The positive (+) and negative ( −) symbols following