genes in Brassica rapa
Yun-Xiang Zang1,2,*, Hyun Uk Kim1,*, Jin A Kim1, Myung-Ho Lim1, Mina Jin1, Sang Choon Lee1, Soo-Jin Kwon1, Soo-In Lee1, Joon Ki Hong1, Tae-Ho Park1, Jeong-Hwan Mun1, Young-Joo Seol1, Seung-Beom Hong3 and Beom-Seok Park1
1 Genomics Division, Department of Agricultural Bio-resources, National Academy of Agricultural Science (NAAS), Rural Development Administration (RDA), Suwon, Korea
2 School of Agricultural and Food Science, Zhejiang Forestry University, Lin’an, Hangzhou, China
3 Department of Biology, San Jacinto College, Houston, TX, USA
Keywords
bioinformatics; biosynthesis pathway;
Brassica rapa; gene identification;
glucosinolate
Correspondence
B. S. Park, Genomics Division, Department
of Agricultural Bio-resources, National
Academy of Agricultural Science (NAAS),
Rural Development Administration (RDA),
Suwon 441-707, Korea
Fax: +82 31 299 1672
Tel: +82 31 299 1671
E-mail: pbeom@rda.go.kr
*These authors contributed equally to this work
Database
The following have been deposited in the
GenBank database. Accession numbers are
shown in parentheses: BrBCAT4 (FJ376036–
FJ376037), BrMAM (FJ376038–FJ376041),
BrBCAT3 (FJ376042–FJ376043), BrCYP79F1
(FJ376044), BrCYP79B2 (FJ376045–
FJ376046), BrCYP79B3 (FJ376047),
BrCYP79A2-1 (FJ376048), BrCYP83A1
(FJ376049–FJ376050), BrCYP83B1
(FJ376051), BrC-S lyase (FJ376052–
FJ376053), BrUGT74B1-1 (FJ376054),
BrUGT74C1 (FJ376055–FJ376057), BrST5a
(FJ376058–FJ376059), BrST5b (FJ376060–
FJ376068), BrST5c-1 (FJ376069), BrFMOGS-OX1
(FJ376070), BrFMOGS-OX5 (FJ376071),
BrAOP2 (FJ376073), BrGSL-OH (FJ376074),
BrBZO1p (FJ376075), BrDof1.1 (FJ584284–
FJ584285), BrIQD1-1 (FJ584286), BrMYB28
(FJ584287–FJ584289), BrMYB29 (FJ584290–
FJ584292), BrMYB34 (FJ584293–FJ584295),
BrMYB51 (FJ584296–FJ584299),
BrMYB122-1 (FJ584300)
(Received 15 February 2009, revised 31
March 2009, accepted 24 April 2009)
doi:10.1111/j.1742-4658.2009.07076.x
Glucosinolates play important roles in plant defense against herbivores and microbes, as well as in human nutrition. Some glucosinolate-derived isothiocyanate and nitrile compounds have been clinically proven for their anticarcinogenic activity. To better understand glucosinolate biosynthesis in Brassica rapa, we conducted a comparative genomics study with Arabidopsis thaliana and identified a total of 56 putative biosynthetic and regulator genes. This established a high colinearity in the glucosinolate biosynthesis pathway between Arabidopsis and B. rapa. Glucosinolate genes in B. rapa share 72–94% nucleotide sequence identity with the Arabidopsis orthologs and exist in different copy numbers. The exon/intron split pattern of B. rapa is almost identical to that of Arabidopsis, although inversion, insertion, deletion and intron size variations commonly occur. Four genes appear to be nonfunctional as a result of frame shift mutations or a retrotransposon insertion. At least 12 paralogs of desulfoglucosinolate sulfotransferase were found in B. rapa, whereas only three were found in Arabidopsis. The expression of those paralogs was not tissue-specific but varied greatly depending on B. rapa tissue types. Expression was also developmentally regulated in some paralogs but not in others. Most of the regulator genes are present as triple copies. Accordingly, glucosinolate synthesis and regulation in B. rapa appear to be more complex than in Arabidopsis. With the isolation and further characterization of the endogenous genes, health-beneficial vegetables or desirable animal feed crops could be developed by metabolically engineering the glucosinolate pathway.
Abbreviations
BAC, bacterial artificial chromosome; CDS, coding sequence; EST, expressed sequence tag; LTR, long terminal repeat; MAM,
methylthioalkylmalate synthase; NCBI, National Center for Biotechnology Information.
Glucosinolates, a group of sulfur-rich secondary metabolites, have received much attention because their breakdown products display several potent bioactivities, serving as plant defense compounds as well as anticarcinogenic compounds in mammals [1–3]. Upon tissue disruption, the enzyme myrosinase cleaves off the glucose group from a glucosinolate, and the remaining molecule then quickly converts to a bioactive substance (i.e. an isothiocyanate, nitrile or thiocyanate). Among the isothiocyanates, sulforaphane, a derivative of glucoraphanin, is known to be the most promising anticancer agent because of its strong and broad-spectrum activity against several types of cancer cells [3–10]. Indole-3-carbinol, a derivative of glucobrassicin, is also a good anticarcinogen. Both exhibit their effects by inducing phase II detoxification enzymes, altering estrogen metabolism, blocking the cell cycle or protecting against oxidative damage [11–15]. Phenethyl isothiocyanate, a derivative of gluconasturtiin, was reported to be effective for chemoprotection [16–18], although it possesses genotoxic activity [19–21]. Crambene (1-cyano-2-hydroxy-3-butene), an aliphatic nitrile derived from progoitrin, upregulates the synthesis of glutathione S-transferase in the liver and other organs [22].
Glucosinolates are classified into three major groups, namely aliphatic, indolyl and aromatic glucosinolates, based on the amino acids from which they are synthesized [23]. Biosynthesis of aliphatic and aromatic glucosinolates generally involves three steps (Fig. 1) and begins with the elongation of methionine and phenylalanine, respectively. The initial step of aliphatic glucosinolate synthesis is catalyzed by methylthioalkylmalate synthase (MAM) to form the elongated homologs [24,25]. The core structures are made via oxidation by cytochrome P450 enzymes, CYP79 and CYP83, followed by C-S cleavage, glucosylation and sulfation. Finally, the side chains are modified by oxidation, elimination, alkylation or esterification. Some of the genes involved in this step, FMOGS-OX1–5, AOP, GSL-OH and BZO1, have been isolated recently [26–31].
Cruciferous vegetables, including broccoli, cabbage, Chinese cabbage, cauliflower, and brussels sprouts, are rich in glucosinolates. A high intake of cruciferous vegetables was shown to significantly reduce the risk of certain cancers and cardiovascular diseases [32–34]. Chinese cabbage (Brassica rapa ssp. pekinensis) is one of the most highly consumed vegetable crops in Asia. However, unlike broccoli, many Chinese cabbage cultivars do not produce detectable levels of glucoraphanin.
To date, most of the structural genes responsible for the biosynthesis of the three groups of glucosinolates have been identified and characterized in Arabidopsis [23,35]. In addition, several regulators that control glucosinolate biosynthesis have been identified recently in Arabidopsis [36–43]. However, little is known about the specific genes existing in Brassica crops, except for the MAM and AOP genes in Brassica oleracea [44–46]. The glucosinolate profile is highly dependent on genotype, although it is also affected by developmental or environmental changes [47–49]. Previously, we reported that the ectopic expression of Arabidopsis glucosinolate synthesis genes altered the glucosinolate profile in Chinese cabbage [50,51]. Because most of the Arabidopsis genes encoding glucosinolate biosynthesis pathways have been identified and Chinese cabbage is a close relative of Arabidopsis, comparative genomic studies will allow for the easy identification of relevant genes in Brassicas. The identification and characterization of glucosinolate synthesis genes in Chinese cabbage would pave the way for further improvement of agronomic traits via genetic engineering. In the present study, we report the genome-wide identification of B. rapa glucosinolate synthesis (BrGS) and regulator genes using our B. rapa genome sequence in conjunction with the available Arabidopsis sequence. We also show that many BrGS genes exist in a small multigene family and that at least 12 desulfoglucosinolate sulfotransferase (BrST) paralogs are present and are differentially expressed.
Results
BrGS gene identification from cDNA and bacterial artificial chromosome (BAC) libraries
As part of the B. rapa genome sequencing project, we produced 127 143 expressed sequence tags (ESTs) from 28 different cDNA libraries that were released to the National Center for Biotechnology Information (NCBI) database and a new B. rapa EST database, BrEMD (http://www.brassica-rapa.org/BrEMD/) with microarray data. Furthermore, we determined more than 127 000 BAC end sequences, and approximately 589 seed BACs were sequenced and anchored in Arabidopsis whole chromosomes. The 65.8 Mb seed BAC sequence information covered approximately 75.3% of the Arabidopsis genome and 40% of the B. rapa euchromatin region [52]. On the basis of these databases, homologous genes were identified by a blastn search using the Arabidopsis gene sequence as query. All the ESTs that matched each query sequence were aligned to remove the redundant clones, and EST clones containing a start codon were resequenced to generate the full-length cDNA sequence. Through this alignment, a total of 35 different genes was found from ESTs. In the same way, blastn searches were performed against the BAC sequence databases, yielding 44 different genes, among which 23 overlapped the EST sequences. Thus, a total of 56 individual genes was identified from both EST and BAC clones, of which 44 contained the full-length coding sequence (CDS) (Fig. 1, Tables 1 and 2). They contain all the homologs of Arabidopsis except for CYP79F2, FMOGS-OX2–4, AOP3 and MYB76. In Arabidopsis, AOP2 and AOP3 are tandemly located on chromosome IV [29]; however, only AOP2 was found in B. rapa. The same observation was also made in B. oleracea [45]. This suggests that duplication occurred in Arabidopsis after its divergence from Brassica. Four genes, BrUGT74C1-1, BrST5b-6, BrST5b-4 and BrMYB122-1, appear to be nonfunctional as a result of frame shift or retrotransposon insertion mutations (Fig. 2).
Fig. 1. Biosynthesis pathways of the three major groups of glucosinolates in B. rapa. The genes involved in each step are shown. Numbers in parentheses denote gene copy numbers.
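The total of 56 genes follows from inclusion-exclusion over the EST and BAC search results (35 + 44 − 23). A minimal Python sketch of that tally is given here; the gene labels are placeholders invented for illustration, and only the three counts come from the text.

```python
# Counts reported in the text; the labels generated below are placeholders,
# not real gene names.
N_EST = 35      # genes found from EST clones
N_BAC = 44      # genes found from BAC sequences
N_SHARED = 23   # genes found in both sources

# Build two label sets that overlap in exactly N_SHARED members.
shared = {f"shared_{i}" for i in range(N_SHARED)}
est_genes = shared | {f"est_only_{i}" for i in range(N_EST - N_SHARED)}
bac_genes = shared | {f"bac_only_{i}" for i in range(N_BAC - N_SHARED)}

# The union counts each gene once, however many sources recovered it.
total_genes = len(est_genes | bac_genes)
print(total_genes)  # 35 + 44 - 23 = 56
```

With real data, the two sets would hold the gene identifiers recovered from each database, and the same union gives the nonredundant gene count.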
To estimate the total number of putative BrGS genes in the whole genome of B. rapa, a genomic blot was performed using the CYP79F1/F2, CYP79B2/B3, CYP83A1 and CYP83B1 genes as probes (see Supporting information, Fig. S1) [53]. This analysis predicted the presence of a total of eight genes (two, three, two and one copies for each gene, respectively). On the other hand, a total of seven genes was found from our database search for those genes, suggesting that the percentage of BrGS genes identified in the present study is approximately 87.5%.
Table 1. Comparison of putative BrGS biosynthetic genes with the Arabidopsis orthologs. The nucleotide sequence of the coding region was used for comparison analysis; the BrGS gene sequence is from the partial- or full-length CDS; the single percentage indicates the single B. rapa orthologous sequence that was available. Most of the genes are full length except those marked with an asterisk.
Columns: Glucosinolate pathway | B. rapa gene name | Corresponding AGI | No. of genes found | Corresponding clones | % Identity (At and B. rapa)
Amino acid side chain elongation
78.4–87.0
Core structure formation step
References [24], [25], [26], [27], [28], [29], [31], [57], [58], [59], [79]
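The 87.5% estimate is the ratio of genes recovered by the database search (seven) to copies predicted by the genomic blot (eight). The short Python sketch below reproduces this arithmetic; the function name is an illustrative assumption, and the copy numbers are those quoted in the text.

```python
def fraction_identified(found: int, predicted: int) -> float:
    """Return the percentage of predicted gene copies recovered by the search."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return 100.0 * found / predicted

# Copy numbers predicted by the genomic blot for the four probes (from the text).
blot_copies = [2, 3, 2, 1]

coverage = fraction_identified(found=7, predicted=sum(blot_copies))
print(f"{coverage:.1f}%")  # 87.5%
```

Because this estimate rests on only four probe sets, it is best read as a rough indication of coverage rather than a genome-wide measurement.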
BrGS gene identity with Arabidopsis and other
Brassica orthologs
BrGS biosynthetic genes share 72–90% nucleotide sequence identity with Arabidopsis orthologs, and 28 genes exist in a small multigene family (Table 1). This close relatedness is further substantiated by our phylogenetic tree analyses (Fig. 3; see also Supporting information, Figs S2–S11). However, most of the BrGS genes share more than 90% identity with other Brassica orthologs (Table 3). This is consistent with the notion that the Brassica species evolved after divergence from the Arabidopsis lineage. Notably, BrAOP2 has the lowest sequence identity with the orthologs of Arabidopsis and B. oleracea. Identities within the BrGS paralogs are usually higher than those with Arabidopsis and other Brassica species. All of the BrST5b paralogous genes except BrST5b-4 share more than 80% sequence identity with AtST5b (Table 4). Identities between BrST5b and AtST5b (76–86%) are comparable to those between tandem BrST5b repeats (77–89%) and between nontandem repeats BrST5b-6 and BrST5b-9 (88%) (Fig. 4, Table 4). This suggests that duplication occurred after a very recent divergence between Arabidopsis and B. rapa. One putative benzoate-CoA ligase gene, BrBZO1p, was identified (see also Supporting information, Fig. S11). It has a similarity of 81% compared to both BZO1 and At1g65890.
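For readers unfamiliar with how percentage identity values of this kind are obtained, the following Python sketch computes identity over the ungapped columns of a pairwise alignment. It is one common convention only: the text does not state how gaps were treated, and the two short fragments are invented for illustration rather than taken from any BrGS or Arabidopsis gene.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments, invented for illustration only.
at_fragment = "ATGGCTTCAA-GGTT"
br_fragment = "ATGGCATCAACGGTA"
print(round(percent_identity(at_fragment, br_fragment), 1))  # 85.7
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so identities are comparable only when computed the same way.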
Similar to the biosynthetic genes, BrGS regulator genes share 81–94% nucleotide sequence identity with Arabidopsis orthologs, and 15 genes exist in a small multigene family (Table 2). Most of the genes are triplicated, indicating that regulator genes are mostly retained after the Brassica genome triplication.
Structure of BrGS genes
Ordered assembly of the overlapping sequences of BAC and EST clones yielded the overall gene structures shown in Fig. 5. The exon and intron structures of the BrGS genes were identical to those of Arabidopsis homologs. However, insertion, deletion and intron size variations were commonly noted in BrGS genes. One of the two BrC-S lyase genes had a 2 bp deletion at the last exon, which resulted in a 3′ truncated protein with a 16 amino acid deletion compared to the Arabidopsis homolog. The truncation of the 3′ end exon might alter either gene function or the expression pattern in such a way as to change feedback regulation, as previously proposed by Gao et al. [46]. Desulfoglucosinolate sulfotransferase genes did not have any intron in either Arabidopsis or B. rapa (Fig. 5A). The AOP2 structure of B. rapa was compared with that of B. oleracea and Arabidopsis. All three species contained four exons and three introns, along with considerable changes in intron sizes (Fig. 5B). One of the two BrST5a genes contained a 3 bp insertion (Fig. 5A), which did not lead to a frame shift mutation.
Insertion or deletion often gives rise to a frame shift mutation that causes the loss of gene function. This type of mutation occurred in two BrGS genes with premature stop codons immediately after the deletion sites (Fig. 2A). Among nine BrST5b paralogs, BrST5b-4 appears to be a pseudogene because it contains an approximately 6 kb insert of a putative non-long terminal repeat (LTR) retrotransposon that encodes a reverse transcriptase (Fig. 2B). Transposon insertion mutations in coding sequences or intergenic regions were also previously observed in B. oleracea [45]. Another gene, BrST5b-x, with only a 150 bp 3′ end partial sequence, was found to be located in the intergenic region approximately 500 bp downstream of BrST5b-4 (Fig. 2B). However, we did not consider this as another copy of BrST5b because only a small amount of remainder sequence is present as a result of a massive deletion event. Pseudogenes are assumed to arise frequently during genome evolution and are often regarded as ‘molecular fossils’ in evolutionary genomics [54]. Pseudogenes might be the result of natural selection reducing functional redundancy. However, the divergent copies of duplicated genes would be further diversified to evolve for neofunctionalization or subfunctionalization [55,56].
Fig. 2. Structures of the predicted nonfunctional BrGS genes. (A) Three of the four carried deletion mutations and (B) the fourth one carried a putative retrotransposon insertion. A non-LTR retrotransposon insertion is marked by an approximately 6 kb insertion. Asterisks indicate the position of a premature stop codon. Thick, thin and dotted lines denote the exon, intron and the gap between BrST5b-4 and BrST5b-x, respectively.
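The effect of a frame shift described above can be illustrated with a toy example. The Python sketch below translates a short invented coding sequence before and after a single-base deletion, using the standard genetic code; the sequence is hypothetical and is not taken from any BrGS gene.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Hypothetical wild-type CDS: ATG GCA CTG ATC GGT AAA TAA
wild_type = "ATGGCACTGATCGGTAAATAA"
# Delete one base (index 5); the reading frame shifts and TGA appears in frame.
mutant = wild_type[:5] + wild_type[6:]

print(translate(wild_type))  # MALIGK
print(translate(mutant))     # MA
```

Here the deletion brings a TGA codon into frame directly after the deletion site, so the product is truncated after two residues. This is the same kind of event as the premature stop codons marked in Fig. 2A.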
Divergent duplication and differential expression
of BrST genes
In comparison with three orthologs in Arabidopsis, desulfoglucosinolate sulfotransferase exists in a small multigene family with 12 paralogs in B. rapa, two of which are represented by EST clones that have not been mapped (Table 1, Fig. 4). Most of them are clustered in a tandem array, as shown in the chromosomal loci of BAC clones (Fig. 4). In addition, they are usually clustered together in the phylogenetic tree (Fig. 3).
Table 2. Comparison of putative BrGS regulator genes with the Arabidopsis orthologs. The nucleotide sequence of the coding region was used for comparison; the BrGS gene sequences used are either partial or full-length CDS. Most of the genes are full length except those marked with an asterisk.
Columns: Transcription factors | B. rapa gene name | Corresponding AGI | No. of genes found | Corresponding clones | % Identity (At and B. rapa)
R2R3-Myb transcription factors for aliphatic glucosinolates
R2R3-Myb transcription factors for indole glucosinolates
KBrB118H07R & KBrH078K01R
BR115967
References [36], [37], [38], [39], [40], [41], [42], [43]
Fig. 3. Nonrooted neighbor-joining phylogenetic tree of B. rapa desulfoglucosinolate sulfotransferases and Arabidopsis sulfotransferases. Coding sequences of AtST5b were used to identify the orthologs between these two species because some of BrST5b are pseudogenes. Bootstrap values with 500 replicates are denoted as percentages.
Fig. 4. Comparative map of the five BACs containing BrST paralogs and their counterparts in Arabidopsis. At Chr1, Arabidopsis chromosome 1; Br R7 (Chr7), B. rapa linkage group R7 (chromosome 7); Br R9 (Chr1), B. rapa linkage group R9 (chromosome 1); Mb, megabase; cM, centimorgan. The loci of AtST5a,b (At1g74100 and At1g74090) and BrST counterparts are indicated by oval-shaped bars. The loci that correspond to the five Brassica BACs are all located on Arabidopsis chromosome 1 and are marked by stick bars. Colinear and noncolinear genes are indicated by dashed and dotted lines, respectively. The location of KBrB034H04 on the B. rapa chromosome has not yet been established.
Table 3. Sequence similarities between BrGS genes and other Brassica orthologs. The nucleotide sequence of the coding region was used for comparison analysis; the highest percentages are shown when a gene has several copies.
Columns: B. rapa gene | Gene of other Brassica species | % of
Table 4. Similarity and divergence among desulfoglucosinolate sulfotransferase genes of Arabidopsis and B. rapa. Values represent the percentage similarity in the upper triangle area and percentage divergence in the lower triangle area as demarcated by the diagonally aligned black squares; full-length CDS was employed for the analyses using DNASTAR software (DNASTAR Inc., Madison, WI, USA).
Two Arabidopsis desulfoglucosinolate sulfotransferases, AtST5a (At1g74100) and AtST5b (At1g74090), are involved in the biosynthesis of indolyl and aliphatic glucosinolates, respectively [57]. Nevertheless, they share 80% nucleotide identity and are tandemly located on chromosome 1 (Fig. 4). Thus, we examined the expression patterns of BrST genes in different tissues by RT-PCR (Fig. 6). BrActin1, an actin gene of B. rapa, was used as an internal control to adjust the amount of cDNA template for PCR because it is constitutively expressed in all types of tissues. Primers were designed from the gene-specific untranslated region (see Supporting information, Table S1). All of the genes except BrST5b-4 were expressed in all six different tissues, although the expression profiles were different. Generally, BrST5b was expressed at higher levels than BrST5c but at lower levels than BrST5a. Specifically, BrST5b-6 and BrST5b-7 were expressed at the lowest levels, as their products were not detected until after 40 cycles of PCR (Fig. 6). All of the PCR products were sequenced and were matched to individual gene sequences (data not shown). BrST5a-1 was strongly expressed in all tissues except the stamen. By contrast, BrST5a-2 was strongly expressed in the stamen, but weakly in the floral bud and carpel. Overall, BrST5b paralogs were expressed at a very low level in the floral bud. However, some genes (i.e. BrST5b-1 and BrST5b-2) were expressed strongly in the carpel, whereas others (i.e. BrST5b-2, BrST5b-8 and BrST5b-9) were expressed strongly in the stamen. The results obtained demonstrate that the expression of the paralogs is not tissue-specific but varies greatly depending on tissue type. In terms of the overall expression level, mature leaf and root expressed BrST paralogs at higher levels than other tissues, demonstrating functional redundancy for differential expression. In seedling tissue, BrST5a paralogs were more strongly expressed than BrST5b paralogs. No significant differences in the expression levels of BrST5a paralogs were noted between the seedling and mature leaf and root tissues. On the other hand, significant differences between those tissues were found for the expression levels of BrST5b paralogs except BrST5b-1. Thus, expression is developmentally regulated in some BrST5b paralogs but not in BrST5a paralogs.
Fig. 5. Structures of representative BrGS genes. (A) Comparison with Arabidopsis orthologs. (B) Structural comparison of AOP2 orthologs of Arabidopsis (At), B. rapa (Br) and B. oleracea (Bo). Representative BrGS gene structures were composed based on the full-length genomic, cDNA, or coding sequences of BAC and EST clones. Arabidopsis gene structures were generated according to NCBI sequence information. Each pair of genes was aligned in a colinear form. Positions of introns are indicated by the triangles, above which intron sizes (bp) are shown as numerals. The position and size of the nucleotide insertion and deletion are also marked as In/Del.
Fig. 6. RT-PCR analysis of BrST genes in different types of tissues. L, mature leaf; R, mature root; FB, floral bud; SL, seedling; S, stamen; C, carpel. The PCR products of the BrST genes are approximately 1 kb; BrActin1 is approximately 450 bp, which serves as an internal control.
Discussion
Similarity between B. rapa and Arabidopsis
in the glucosinolate biosynthesis pathway
Our B. rapa genome sequence database searches identified the counterparts of most Arabidopsis glucosinolate synthesis genes, and they are present in various copy numbers (Fig. 1). Only a few genes, those corresponding to Arabidopsis CYP79F2, FMOGS-OX2–4 and AOP3, were not found in B. rapa. Thus, a high colinearity in the glucosinolate biosynthesis pathway exists between Arabidopsis and B. rapa despite the difference in gene copy numbers.
As the first step, two different genes, BCAT and MAM, are known to be involved in the chain elongation of Met-derived aliphatic glucosinolate biosynthesis. BCAT4 and BCAT3 enzymes catalyze the deamination and transamination, respectively [58,59]. B. rapa contains two BCAT4 paralogs that have 92% nucleotide sequence identity and are the same size. B. rapa also carries two BCAT3 paralogs, one of which has a full-length CDS. The MAM enzyme catalyzes the condensation of acetyl-coenzyme A with a series of ω-methylthio-2-oxoalkanoic acids. MAM1/MAM2, two tandem paralogs found in some Arabidopsis ecotypes, are responsible for the first two cycles of chain elongation [24]. The MAM3 enzyme catalyzes all the different cycles of Met chain elongation [25]. We identified four MAM paralogs in the B. rapa genome that share approximately 78–87% identity with the Arabidopsis orthologs, although we were unable to determine which of these is individually equivalent to MAM1, 2 and 3. Two of them are not identical in an approximately 200 bp region of the 3′ ends. This is also the case in the B. oleracea MAM (BoGSL-ELONG) gene family [46], where it did not affect the enzymatic function, which was equivalent to that of the Arabidopsis ortholog MAM1 [60]. In addition to tissue-dependent differential expression, the members of the BrMAM gene family may encode enzymes of different biochemical properties with respect to chain elongation, as do the Arabidopsis MAM orthologs. Two Arabidopsis genes, IPMS1 and IPMS2, that encode isopropylmalate synthase are similar to the MAM family genes, with 60% similarity in their amino acid sequence [46,61]. Nevertheless, they are not involved in Met chain elongation but are involved in leucine biosynthesis. Phylogenetic analysis indicates that the BrMAM genes do not belong to the IPMS family but, instead, belong to the MAM gene family. We were unable to identify the genes responsible for phenylalanine chain elongation, an initial step of aromatic glucosinolate synthesis, because the corresponding genes have not yet been isolated in Arabidopsis and other Brassica species.
As the second step, the formation of the glucosinolate core structure is initiated by the conversion of amino acid to the corresponding aldoxime, and this is catalyzed by the CYP79 enzymes [23]. Three groups of CYP79 family genes, CYP79F1/F2, CYP79B2/B3 and CYP79A2, are involved in aliphatic, indolyl and aromatic glucosinolate biosynthesis, respectively. Our database searches indicate two copies of the CYP79B2 and C-S lyase genes and three copies of UGT74C1 in B. rapa, unlike the single copy genes in Arabidopsis. Such duplication may necessitate a redundant function for tissue- or development-dependent differential expression. Excluding two copies of nonfunctional BrST5 carrying frameshift and transposon insertion mutations, eight copies of BrST5 are actually involved in aliphatic glucosinolate synthesis in B. rapa (Table 1, Figs 2 and 4), whereas two copies of the orthologs exist in Arabidopsis. The expression level of BrST5 is not only developmentally regulated, but also highly dependent on tissue type (Fig. 6). Because sulfonation is a penultimate step of glucosinolate biosynthesis, the expression of BrST5 may play a crucial role in the tissue-specific and developmental accumulation of glucosinolates. It remains to be determined whether BrST5 transcript levels are correlated with the accumulated levels of indole and aliphatic glucosinolates.
The final step of glucosinolate synthesis is side chain modification and, currently, this step is well characterized only for aliphatic glucosinolate in Arabidopsis. Glucoraphanin (4-methylsulfinylbutyl-glucosinolate) is abundant in Columbia but is absent in the Landsberg ecotype of Arabidopsis. This difference is attributed to the AOP2 gene, whose expression diverts glucoraphanin into alkenyl-glucosinolate [29]. Our genome database search revealed a single copy of AOP2 in B. rapa. However, two AOP2 quantitative trait loci, Ali-QTL3.1 and Ali-QTL9.1, were recently reported to be involved in determining the type and concentration of glucosinolates found in B. rapa leaves [62]. Consistent with this finding, our Southern blot analysis indicated the presence of two copies of AOP2 in B. rapa (data not shown). The presence of AOP2 explains why glucoraphanin was not detectable in Chinese cabbage [50,51]. In B. oleracea, two tandemly repeated copies of AOP2 contain a 2 bp deletion at the third exon, which is responsible for the high accumulation of glucoraphanin (Fig. 5B) [45]. Brassica napus, a species resulting from interspecific hybridization between B. rapa and B. oleracea, was reported to lack glucoraphanin [63]. This is most likely a result of the AOP2 gene introduced from B. rapa. The content of glucoraphanin in B. rapa and B. napus could be elevated by inhibiting AOP2 expression via antisense or RNA interference approaches.
Similarity of the genes controlling glucosinolate
biosynthesis between B. rapa and Arabidopsis
B. rapa contains the orthologs of the Arabidopsis glucosinolate regulators, except MYB76 (Table 2). Unlike in Arabidopsis, they are normally triplicated, consistent with the Brassica genome triplication event. The duplication and divergence of the regulators in a small multigene family along with multiple duplications of their target biosynthesis genes may result in phenotypic variation. AOP2/AOP3 null accessions of Arabidopsis were shown to accumulate an increased level of the precursor methylsulfinylalkyl glucosinolate but also a considerably lower level of total aliphatic glucosinolates than the accessions with a functional AOP2 allele, which has been explained by the differential feedback regulation of transcript regulators MYB28, 76 and 29 by the side chain modification end products [29,30,64]. Similarly, epistatic interactions between AOP2 and transcript regulators MYB28 and MYB29 may exist in B. rapa.
BrGS gene duplication
The Brassica genome is believed to have triplicated soon after its divergence from Arabidopsis [65–67]. The genome sizes of B. rapa (550 Mb) and B. oleracea (696 Mb) are more than four and five times that of Arabidopsis (125 Mb), respectively [68,69]. This could be explained in part by the presence of bigger gene families as a result of genome diploidization, segmentation or gene duplication. In B. oleracea, genome rearrangement is commonly followed by gene loss, fragmentation and dispersal [70]. Many gene duplications arose as a result of the triplication event, and those genes involved in signal transduction or transcriptional control are more extensively retained than others during the evolution process [70]. Some tandemly duplicated genes in B. rapa and B. oleracea are likely to be the result of an unequal crossover during the rearrangement process after Brassica genome triplication [45,46]. Approximately 14% of the B. rapa genome is estimated to consist of transposable elements, the majority of which are retrotransposons [69]. It has been proposed that gene duplication is also facilitated by retrotransposons carrying an LTR [71]. BrST is a good example of a multigene family with tandem arrays of genes in B. rapa. The genes adjacent to two tandem repeats, BrST5b-1 and BrST5b-2, were colinear with their Arabidopsis counterparts, and all the other BrST genes jumped to completely new positions (Fig. 4). BrST5b-1, 3, 4 and 5 were found to be flanked by LTR Copia-like retrotransposons (data not shown). BrST5b-4 was disrupted by insertion of a putative non-LTR retrotransposon and shares 89% sequence identity with BrST5b-5 in a tandem array. They are also tandemly arranged with BrST5b-3 in the same BAC clone, but with lower sequence identities compared to that between them. This suggests that two consecutive steps of duplication occurred at the same locus.
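The genome size comparison quoted above can be checked with a few lines of Python. The sizes are those given in the text; the variable names are illustrative.

```python
# Genome sizes in Mb, as quoted in the text.
genome_mb = {"Arabidopsis": 125, "B. rapa": 550, "B. oleracea": 696}

# Size of each Brassica genome relative to the Arabidopsis genome.
fold = {
    species: size / genome_mb["Arabidopsis"]
    for species, size in genome_mb.items()
    if species != "Arabidopsis"
}

for species, ratio in fold.items():
    print(f"{species}: {ratio:.2f}x the Arabidopsis genome")
```

Both ratios (about 4.4 and 5.6) exceed the threefold increase expected from triplication alone, which is why additional factors such as gene family expansion and transposable elements are invoked.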
Sequence comparison of glucosinolate synthesis genes reflects evolution of Brassica lineage
Soon after divergence from the Arabidopsis lineage and genome triplication, extensive interspersed gene loss or gain events and large-scale chromosomal rearrangements gave rise to three basic diploid species: B. rapa (AA genome), Brassica nigra (BB genome) and B. oleracea (CC genome) [66]. Our data support this presumptive evolution order; BrGS sequence similarities among the Brassicas (mostly > 90%) are normally higher than those between Brassica and Arabidopsis (mostly 80–90%). Individual tandem repeats or dispersed duplication events are indicative of the self-rearrangements occurring within each species. A convincing example is that AOP3 is only present in Arabidopsis and that AOP2 is tandemly duplicated in B. oleracea but not in B. rapa. Even within B. oleracea,