Transcriptional regulation of protein complexes in yeast


Addresses: * Service de Conformation des Macromolécules Biologiques, Centre de Biologie Structurale et Bioinformatique, CP 263, Université Libre de Bruxelles, Bld du Triomphe, B-1050 Bruxelles, Belgium. † Institut Pasteur, Unité d'Expression des Gènes Eucaryotes, Institut Pasteur, rue du Docteur Roux, 75724 Paris Cedex 15, France.

Correspondence: Shoshana J Wodak. E-mail: shosh@ucmb.ulb.ac.be

media for any purpose, provided this notice is preserved along with the article's original URL.


Abstract

Background: Multiprotein complexes play an essential role in many cellular processes, but our knowledge of the mechanism of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes, and regulons were predicted on the basis of these motifs.

Results: Only a very small fraction of the analyzed complexes (5-6%) have subsets of their components mapping onto known regulons. Likewise, regulatory motifs are detected in only about 8-15% of the complexes, and in those, about half of the components are on average part of predicted regulons. In the manually curated complexes, the so-called 'permanent' assemblies have a larger fraction of their components belonging to putative regulons than 'transient' complexes. For the noisier set of complexes identified by high-throughput screens, valuable insights are obtained into the function and regulation of individual genes.

Conclusions: A small fraction of the known multiprotein complexes in yeast seems to have at least a subset of their components co-regulated on the transcriptional level. Preliminary analysis of the regulatory motifs for these components suggests that the corresponding genes are likely to be co-regulated either together or in smaller subgroups, indicating that transcriptionally regulated modules might exist within complexes.

Background

Multiprotein complexes such as the ribosome, spliceosome, cyclosome, proteasome and the nuclear pore complex have an essential role in cellular processes [1-3]. Until recently, information about the building blocks of specific complexes has been rather selective, and the mechanisms underlying the formation of these complexes, and their regulation, lifetimes and degradation remain largely unknown.

Published: 30 April 2004

Genome Biology 2004, 5:R33

Received: 26 November 2003 Revised: 30 March 2004 Accepted: 6 April 2004 The electronic version of this article is the complete one and can be

found online at http://genomebiology.com/2004/5/5/R33


One can surmise that the formation of multiprotein complexes might be regulated at different levels, including transcriptional regulation, post-translational modification and degradation. In prokaryotes a significant proportion of the genes that are co-regulated at the transcriptional level code for proteins that interact physically. This proportion is even higher for gene groups whose co-regulation is conserved in different genomes [4]. In some multiprotein complexes in bacteria, the individual components were reported to be expressed 'as needed', in a time-dependent fashion related to their role in the complex [5].

In eukaryotes, and mainly in yeast, gene-expression profiles have been shown to correlate with protein function and protein-protein interactions [6-8]. More particularly, genes corresponding to components of multiprotein complexes were found to exhibit correlated expression profiles, especially for complexes that form over a wide range of cellular conditions [8]. In contrast, the relationships between gene expression and genome-scale two-hybrid interaction data appear to be more tenuous [6,7,9].

Yeast is an ideal model system in which to investigate the relations between protein interactions and gene co-regulation. It is one of the few organisms in which many individual protein complexes have been characterized by biochemical and other methods, with results available in the Comprehensive Yeast Genome Database (CYGD) [10]. In addition, two independent studies recently characterized multiprotein complexes in yeast by a large-scale experimental approach involving tandem affinity purification and MS analysis (TAP [11]) and high-throughput MS protein complex identification (HMS [12]). Each study identified several hundred complexes, containing on average about eight and eleven polypeptides, respectively. Many of these were shown to be associated with known cellular processes.

Yeast has also served as a model for the analysis of gene expression [13-15] and transcriptional regulation [16,17]. Information about the target genes of transcription factors has been compiled in specialized databases such as TRANSFAC [18,19], SCPD [19], YPD [20] and aMAZE [21,22]. Most recently, the genes bound by 106 yeast transcription factors were identified by a high-throughput approach [16], producing for the first time a global view of the transcriptional regulation network in this organism.

Here we investigate the transcriptional regulation of multiprotein complexes in yeast. In particular, we aimed to find out to what extent components of such complexes are co-regulated. We first determined the overlap between known sets of co-regulated genes in yeast and groups of genes coding for components of individual multiprotein complexes. A set of co-regulated genes is defined here as the group of target genes of the same transcription factor, and is denoted a 'regulon'. Two categories of regulons are considered: the manually curated regulons stored in the databases, and the regulons defined by the gene-factor associations identified in the high-throughput analyses mentioned above [16]. The protein complexes examined are those manually curated in databases and the two datasets derived from the recent genome-scale analyses.

We then applied pattern-discovery algorithms [24,25] to the upstream sequences of genes coding for the proteins involved in each of the complexes in the three datasets under consideration. These algorithms are used to detect sequence patterns shared by some or all of these genes, which are likely to represent binding sites for transcription factors. These patterns take the form of short oligonucleotides (hexamers or pairs of trimers) that occur much more frequently in the upstream regions of these genes than in the corresponding regions across the entire yeast nuclear genome.

We have shown recently that these algorithms have the important advantage of returning predictions with a very small rate of false positives (over-represented patterns in groups of randomly selected genes) when stringent enough statistical criteria are used [26]. Alternative methods based on matrix descriptions [27-31] allow a more refined description of pattern degeneracy, in which a given sequence position need not be strictly conserved. But, unlike the approach used here, they have the inconvenience of nearly always returning a prediction, even for random sequences. This is particularly problematic when analyzing large groups of genes, of which a sizable proportion might not be regulated at the transcriptional level, or at least not by the same transcription factor, as might be the case for many of the protein complexes examined here.

Using the set of patterns detected for each complex, we proceeded to predict the components of the complex that are likely to be co-regulated. This is a difficult task, as the upstream regions of genes often contain multiple binding sites for the same factor or can be regulated by a combination of different factors that bind to distinct sites [32,33]. In addition, pattern-discovery algorithms generally return a number of strongly overlapping patterns for a given transcription factor, indicating the presence of a partial degeneracy [24,25]. Therefore, identifying sets of co-regulated genes usually involves assembling the patterns into longer motifs, and searching for upstream regions that score highly against these motifs, an approach that often yields ambiguous results. Here we use an alternative approach in which a discriminant analysis is performed directly on the detected short patterns and their multiple occurrences [26], thereby avoiding the difficult task of pattern assembly. This analysis is done for all the complexes considered and the results are discussed in terms of our current knowledge of these complexes and their regulation.

Table 1
Statistically significant associations between annotated complexes and known regulons

(a) Associations between the annotated complexes and annotated regulons

Columns: Annotated complex | Annotated regulon | Components in complex | Genes in regulon | Common genes | E-value | Total overlap

Surviving row label: Fatty acid synthetase, cytoplasmic

Together, the approaches presented here provide valuable insights into the transcriptional regulation of multiprotein complexes in yeast and help in extracting information on function from genome-scale datasets for these complexes.

Results

Correspondence between multiprotein complexes and known regulons

The genes coding for the components of each protein complex were compared with the known regulons, with the aim of detecting complex-regulon pairs where the overlap between the components is more extensive than would be expected by chance.

The analyzed datasets of complexes comprised 243 annotated protein complexes from CYGD [10] and 725 complexes identified by the high-throughput studies [11,12]. The complexes from the latter two studies were taken as defined by their authors, without further grouping [34]. The regulon datasets comprised the 200 annotated and the 106 high-throughput regulons.

(b) Associations between annotated complexes and high-throughput regulons, identified by a genome-wide location analysis [16]

Columns: High-throughput regulon | Components in complex | Genes in regulon | Common genes | E-value | Total overlap

Surviving row labels: Permanent; Respiration chain complexes; Cytoplasmic ribosomes; Cytoplasmic ribosomal large subunit

Only the most statistically significant associations (E-value ≤ 0.01) between complexes and regulons are listed (see Additional data file 1 (Figure b) for a complete list). Each line lists the association detected between a multiprotein complex denoted by its CYGD name (column 2) and a regulon denoted by its common name (column 3). Column 4 lists the number of genes in the complex and column 5 lists the number of genes in the regulon. Column 6 lists the number of common genes between the regulon and complex, and column 7 lists the statistical significance criterion (E-value) for the detected overlap (see Materials and methods). The far right column lists the total number of genes in the complex that are common between it and all the regulons that map into it. Complexes have been subdivided into three categories, 'permanent', 'transient' or 'others', as indicated in column 1, and described in Materials and methods. When a smaller complex is completely included within a larger one and detected associations map into it, only the smaller complex is listed. For example, the larger assembly 'Cyclin-CDK complexes' is not listed because the detected association is with only one of its components, the 'Cdc28p complexes'. When associations are detected with more than one complex of a larger assembly, as is the case for the small and large subunits of the cytoplasmic ribosomes, the name of the larger assembly is given first, with no details of the identified associations; those are listed for each of the component complexes. Information on the annotated regulons in (a) was obtained from the TRANSFAC and aMAZE databases, from the list compiled by Young and colleagues [16,48] and from the recent literature.


To determine whether the number of common components for a given complex-regulon pair is above chance level, or statistically significant, we compute the expectation value (E-value) of observing at least that number by chance, and retain only pairs with an E-value below a certain threshold (see Materials and methods).
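The overlap test described above can be illustrated with a short sketch. The exact statistic is defined in Materials and methods, which is not reproduced here, so the following rests on two assumptions: that chance overlap is modelled by a hypergeometric distribution over all genes in the genome, and that the E-value is the resulting P-value multiplied by the number of complex-regulon pairs tested. The function names and the numbers in the usage lines are hypothetical.

```python
from math import comb


def overlap_p_value(n_genome, n_complex, n_regulon, n_common):
    """Probability of at least n_common shared genes by chance.

    Assumed model (not taken from the paper): the n_complex genes are
    drawn at random, without replacement, from n_genome genes, of which
    n_regulon belong to the regulon (hypergeometric upper tail).
    """
    upper = min(n_complex, n_regulon)
    total = comb(n_genome, n_complex)
    tail = sum(
        comb(n_regulon, k) * comb(n_genome - n_regulon, n_complex - k)
        for k in range(n_common, upper + 1)
    )
    return tail / total


def overlap_e_value(p_value, n_comparisons):
    """Expected number of pairs reaching this P-value by chance.

    Assumption: a simple multiplication by the number of pairs tested.
    """
    return p_value * n_comparisons


# Hypothetical usage: a complex of 8 genes sharing 7 genes with a regulon
# of 10 genes, in a genome of 6,000 genes, with 243 x 200 pairs tested.
p = overlap_p_value(6000, 8, 10, 7)
e = overlap_e_value(p, 243 * 200)
print(e <= 0.01)
```

Under these assumptions a pair would be retained when its E-value falls below the chosen threshold, such as the 0.01 cutoff used for Table 1.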

Correspondence between regulons and annotated protein complexes

Table 1 lists the complex-regulon pairs whose overlap is above chance level (E-value ≤ 0.01), obtained when mapping the annotated complexes onto the annotated (Table 1a) and high-throughput (Table 1b) regulons, respectively. It is striking to see that the 243 annotated complexes and 306 known regulons form a total of only 57 pairs with a statistically significant overlap. Forty of those are with the annotated regulons, and the remaining ones (only 17 in total) are with the high-throughput regulons. Those pairs involve only about 8% of complexes (20 out of 243) and 14% of the regulons (44 out of 306). The overlap between known regulons and annotated complexes is thus on the whole quite limited.

Relating protein complexes to gene-expression data, Jansen et al. [7] found it useful to distinguish between two major categories of complexes. 'Permanent' complexes are defined as those that are detected under a wide range of different cellular conditions, whereas 'transient' ones are defined as complexes that form under a specific set of conditions. While keeping in mind that this division is probably oversimplified and could sometimes be misleading, we follow these authors in considering it a helpful working hypothesis. The list of complexes in each category was derived from Jansen et al. [7] with some editing. We classified complexes that did not clearly fit either of the first two categories, and some larger assemblies composed of several complexes, as 'other'.

Table 1 reveals that meaningful overlaps between complexes and known regulons occur for both permanent and non-permanent complexes. Associations with the annotated regulons involve fewer complexes of the permanent category than of non-permanent ones (Table 1a). In contrast, the associations with the high-throughput regulons involve more permanent complexes than transient ones (Table 1b), in better agreement with the reported stronger relations of permanent versus transient complexes with mRNA expression profiles [7].

Another interesting observation is that the set of complexes into which regulons map, and the extent of overlap between complexes and regulons, are also quite different for the annotated and high-throughput regulon datasets. Regulons from both datasets map into the nucleosomal protein complex, ribonucleoside diphosphate reductase and fatty-acid synthetase. On the other hand, complexes such as the proteasome, the Cdc28p cyclins and RNA polymerase II are only involved in associations with annotated regulons (Table 1a), whereas the ribosomal subunits or cytochrome c oxidase complexes are only involved in associations with high-throughput regulons (Table 1b).

These and other differences are most likely due to the different composition of the regulon repertoires in the two datasets. The annotated dataset contains nearly twice as many regulons as the high-throughput one. But the regulons in the latter dataset are significantly larger, with on average six times more genes than in the annotated regulons (see Materials and methods). It is therefore not too surprising that for associations involving high-throughput regulons, the fraction of the components of a given complex covered by a regulon is in general higher than for annotated regulons. It should at the same time be cautioned that the high-throughput regulons probably contain a fair number of spurious members (false positives) [26].

Zoom-in on the overlaps between regulons and annotated complexes

We see that a complex is often associated with several regulons. This is due in part to the substantial overlap that often exists between the components of individual regulons. The most severe cases occur when different transcription factors are annotated as regulating the exact same set of genes, a situation that is often encountered for small regulons, and probably results from incomplete information or because some transcription factors act in combination or as complexes [35]. We see for example that seven regulons map into the nucleosomal protein complex, six map into the ribonucleoside diphosphate reductase complex, and as many as 10 regulons map into the modular Cdc28p cyclin complexes (Table 1a).

A given regulon also maps, in general, into more than one complex, often onto two, and occasionally onto three. These multiple associations form a patchy network, with several disconnected clusters, which link complexes to regulons. The network graphs built from the associations of the annotated complexes, with annotated and high-throughput regulons, respectively, are illustrated in Additional data file 1 (Figures S1 and S2).

Details of some of these clusters are illustrated in Figures 1 and 2, highlighting the common genes involved. The nucleosomal protein complex (Figure 1a) has seven out of its eight components in common with seven small regulons - Hta1/Hta2, Spt10/Spt21 and Hir1/Hir2/Hir3 - whose genes partially overlap one another. The ribonucleoside diphosphate reductase complex (Figure 1b) has all its four components in common with a total of six partially overlapping regulons.

The picture is significantly more complicated for the cyclin-Cdc28p complexes (Figure 1c). As many as 10 regulons map into the 10 components of this complex. The Cln1 and Cln2 genes are regulated by as many as five different transcription factors, and the regulons of two transcription factors, Swi4 and Mcm1, also map into the glucan synthases and the pre-replication complex, respectively.


Correspondence between regulons and high-throughput protein complexes

The total number of statistically significant overlaps (E-value ≤ 0.01) is also very low (66 in total) when the known regulons are mapped onto TAP complexes and HMS complexes, even though the number of complexes considered is much larger (725).

The majority of the complex-regulon pairs with meaningful overlap (53) involve annotated regulons, whereas only 13 pairs involve high-throughput regulons. Matches with regulons from either dataset generally involve only a very small subset of the complex components, and there are twice as many matches with complexes from the HMS than from the TAP datasets, in line with the larger size of the former dataset (for a complete list of associations, see Additional data file 2 (Table S2)).

Owing to the appreciable overlap between the components of different complexes within and between the TAP and HMS datasets, the network of associations between these complexes and the regulons is much more intricate than for the annotated complexes. A network graph was built from the larger set of 125 complex-regulon pairs with meaningful overlaps (E-value ≤ 0.1) involving the annotated regulons (Figure 3). This network features seven separate dense clusters of connections (Figure 3a-g). Details of the regulon-complex overlaps in some of these clusters, highlighting the common genes involved, are depicted in Figure 4a-c. The remaining clusters are detailed in Additional data file 1 (Figure S3). In Figure 4h the set of remaining very small clusters, each involving mostly one or two connections, is grouped.
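The separate clusters mentioned above correspond to connected components of a bipartite graph whose nodes are complexes and regulons and whose edges are the significant complex-regulon pairs. The following is a minimal sketch of how such clusters could be extracted from a list of pairs. It is not the procedure used to draw the figures, and the identifiers in the usage lines are hypothetical placeholders.

```python
from collections import defaultdict


def association_clusters(pairs):
    """Group (complex, regulon) pairs into connected clusters.

    Each pair is an edge of a bipartite graph. Nodes are tagged with
    their type so that a complex and a regulon sharing a name are
    still kept apart.
    """
    neighbours = defaultdict(set)
    for complex_name, regulon_name in pairs:
        c = ("complex", complex_name)
        r = ("regulon", regulon_name)
        neighbours[c].add(r)
        neighbours[r].add(c)

    seen = set()
    clusters = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [start]
        seen.add(start)
        members = set()
        while stack:
            node = stack.pop()
            members.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(members)
    return clusters


# Hypothetical usage with placeholder identifiers.
example_pairs = [("C1", "R1"), ("C2", "R1"), ("C3", "R2")]
for cluster in association_clusters(example_pairs):
    print(sorted(cluster))
```

Complexes that share a regulon, or regulons that share a complex, end up in the same cluster, which is why overlapping complexes in the TAP and HMS datasets produce a denser network.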

The first cluster (Figure 4a) corresponds chiefly to the overlap between the Rpn4 regulon and 12 rather large complexes (six TAP and six HMS complexes). Nine of the 11 genes of this regulon map onto these complexes. All the complexes contain components of the yeast proteasome, and some other functionally related proteins in variable proportions. Interestingly, six of the nine common genes correspond to proteins from the 19S regulatory subunit, encoding four of the six ATPases in the subunit (Rpt2, Rpt4, Rpt5, Rpt6). A further two genes, PRE6 and PRE2, code, respectively, for alpha and beta subunits of the catalytic domain [36], and another gene (RAD23) encodes a ubiquitin-like protein, which links DNA repair to the ubiquitin/proteasome pathway [37].

The second cluster (Figure 4b) involves four partially overlapping regulons of three genes each, totaling five genes. These genes map into three medium-sized complexes (6-16 genes) and one large complex of 40 genes, with no more than two to three genes mapping into the same complex. Here, too, the majority of the five genes correspond to a biologically active assembly - the ribonucleoside diphosphate reductase complex and associated kinase. The third cluster (Figure 4c) involves genes of the nucleosomal protein complex. A similar analysis can be made for the remaining four clusters (data not shown), and similar observations are made when analyzing the largest clusters in the network graph built from the 46 statistically significant overlapping pairs (E-value ≤ 0.1) involving the TAP and HMS complexes and high-throughput regulons (see Additional data files 1 and 2 (Figures S4, S5 and Table S2d, respectively)).

This detailed analysis shows that although the subset of the components of the multiprotein complexes that corresponds to known regulons is usually quite small, it tends to be composed of proteins with close physical interactions and/or clear functional relations. We also find that the bulk of the overlaps involve genes that map into both permanent complexes such as the proteasome or the nucleosomal-protein complex, as well as into non-permanent ones, such as the ribonucleoside diphosphate reductase and the cyclin-Cdc28p complexes. No clear trends can therefore be identified from these data on the regulation of any one category of complexes in particular.

Figure 1

Detailed view of the main clusters in the network linking annotated protein complexes and regulons. The network (shown in Additional data file 1 (Figure S1)) was built from the multiple links corresponding to associations with E-value ≤ 0.1, identified between the 243 CYGD yeast multiprotein complexes and the 200 annotated regulons (see text). Ellipsoid frames represent complexes and rectangular frames represent regulons, with individual complexes and regulons appearing in different colors in a given cluster. Individual complexes are identified by their name in the CYGD complexes catalog [10] and regulons are denoted by the name of the bound transcription factor. Genes involved in complexes or regulons are enclosed, respectively, in rounded frames or rectangles of the same color as the complex or regulon, and are displayed by their common name. The two numbers given in parentheses indicate the number of genes involved in this cluster for the complex or regulon, and the total number of genes in the complex or regulon, respectively. (a) Cluster involving associations between three groups of regulons (Hta1-Hta2, Hir1-2-3, and Spt10-Spt21) and seven of the eight genes of the nucleosomal protein complex. (b) The ribonucleoside diphosphate reductase cluster, involving associations between all four genes of the corresponding complex and four groups of co-regulated genes belonging to six regulons. (c) Cluster involving associations between all the 10 components of the Cdc28p complexes, and seven distinct groups of genes belonging to 11 regulons. Five regulons - Cln3, Sit4, Spt16, Bck2, and Swi4 - map onto the exact same cyclin genes (CLN1, CLN2). Two regulons, Swi4 and Mcm1, also map into the glucan synthases and pre-replication complex, respectively.

[Figure 1, panels (a)-(c): network diagrams of annotated complexes, annotated regulons and their shared genes.]

[Figure 2, panels (a)-(c): network diagrams of annotated complexes, high-throughput regulons and their shared genes.]

The very limited overlap between complexes and regulons detected above might be biologically meaningful, or might be due to the limited information that is currently available on the nature of protein complexes and regulatory networks in yeast. Given these uncertainties, it seemed of interest to complement the above analyses by an approach in which regulons are directly predicted from the components of protein complexes.

If the components of a given protein complex are co-regulated on the transcriptional level, one would expect to find common regulatory sequence elements, corresponding to transcription factor binding sites, in the upstream regions of the corresponding genes. The problem of identifying regulatory sites is notoriously difficult [33]. To tackle it we applied algorithms for the discovery of oligonucleotides (here, hexanucleotides) [24] and spaced pairs of trinucleotides [25], which occur more frequently in the upstream regions of the genes coding for the components of each complex than in the corresponding regions across the entire yeast nuclear genome. For this approach we considered only complexes with at least five components.
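The idea of detecting over-represented hexanucleotides can be sketched as follows. This is a simplified illustration, not the published algorithms [24,25]: it assumes a binomial model for word occurrences, treats overlapping positions as independent, scans a single strand, and corrects for multiple testing by multiplying the P-value by the number of possible words. The significance index is taken as Sig = -log10(E-value), which matches the correspondence between Sig ≥ 1 and E-value ≤ 0.1 quoted in the text. Function names, the toy sequences and the uniform background are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log, log10


def count_words(sequences, k=6):
    """Count every k-letter word (ACGT only) in the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def binomial_tail(n_trials, n_success, prob):
    """P(X >= n_success) for X ~ Binomial(n_trials, prob), 0 < prob < 1."""
    if n_success <= 0:
        return 1.0
    log_p, log_q = log(prob), log(1.0 - prob)
    total = 0.0
    for x in range(n_success, n_trials + 1):
        # Log-space terms avoid overflow in the binomial coefficient.
        log_term = (lgamma(n_trials + 1) - lgamma(x + 1)
                    - lgamma(n_trials - x + 1)
                    + x * log_p + (n_trials - x) * log_q)
        total += exp(log_term)
    return min(total, 1.0)


def over_represented(group_seqs, background_freq, k=6, min_sig=1.0):
    """Return (word, occurrences, sig) for words with sig >= min_sig.

    background_freq maps each word to its expected frequency per
    position, e.g. estimated from all upstream regions of the genome.
    """
    counts = count_words(group_seqs, k)
    n_positions = sum(counts.values())
    n_words = 4 ** k  # assumed multiple-testing correction
    hits = []
    for word, observed in counts.items():
        prob = background_freq.get(word, 0.0)
        if not 0.0 < prob < 1.0:
            continue
        e_value = binomial_tail(n_positions, observed, prob) * n_words
        sig = -log10(e_value) if e_value > 0 else float("inf")
        if sig >= min_sig:
            hits.append((word, observed, sig))
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical usage with toy sequences and a uniform background.
toy = ["TTACGCGTTT", "GGACGCGTGG", "CCACGCGTCC", "AAACGCGTAA"]
uniform = {word: 1 / 4096 for word in count_words(toy)}
print(over_represented(toy, uniform))
```

A stringent threshold on Sig is what keeps the rate of false positives low in this kind of analysis: words seen once in a small group of sequences do not reach Sig ≥ 1 under the assumed model.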

Highly significant patterns are detected for only a small subset of the complexes

Figure 5 plots the number and fraction of the protein complexes in each of the three datasets examined (the annotated, TAP and HMS complexes) for which regulatory-sequence patterns were identified by our prediction method using three different reliability thresholds. Plotted alongside are the corresponding results obtained here for sets of randomly selected genes (used as negative control) and results for known regulons (positive control) obtained in another study [26].

A first observation is that the fraction of complexes for which regulatory patterns are identified with some reliability is quite low. No more than 27-28% of the complexes from any of the three analyzed datasets have at least one pattern with statistical significance Sig ≥ 1 (corresponding to an E-value ≤ 0.1). At this threshold the fraction of complexes with identified patterns is nonetheless about 7-10% higher than for gene groups selected at random. With the more stringent significance threshold (Sig ≥ 2), the fraction of complexes with at least one pattern drops further, but less for the curated (15%) and TAP complexes (13%), than for the HMS complexes (8%).

We recently applied the same algorithms to the dataset of annotated regulons [26]. As the genes belonging to the same regulon are expected to be co-regulated and hence to exhibit common regulatory-sequence patterns, our algorithms should perform well on these genes. This was indeed the case. Patterns with Sig = 1 were identified in as many as 84% of the annotated regulons, as illustrated in Figure 5.

The fraction of the complexes in which regulatory patterns can be reliably detected is thus clearly much smaller, confirming that the components of complexes are on average much less consistently co-regulated than the genes that belong to known regulons.

Assigning components of protein complexes to putative regulons on the basis of predicted patterns

Having shown that highly reliable regulatory patterns can be detected in genes corresponding to at least a fraction of the complexes, we now proceeded to determine, for each complex, which of its components are likely to be co-regulated, and what fraction of the complex they represent. To this end, complexes with at least five component genes, featuring at least one significant pattern (Sig ≥ 1), are selected. A stepwise linear discriminant analysis [38] with a leave-one-out procedure is applied to assign a gene involved in a given complex, either to its original complex or to a control group of randomly selected genes, according to the number of occurrences of the discovered patterns in its upstream region. The assigned group (complex or control) is then compared to the group from which the gene was drawn to evaluate the coverage and positive predictive power (PPP) of the assignment.

Coverage is defined as the fraction of the total number of genes in the complex that were reassigned to it by the discriminant procedure. PPP is defined as the fraction of the total number of genes assigned to the complex that actually belong to it (see Materials and methods for details).
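The two measures defined above can be written out as a short sketch. It only evaluates an assignment that has already been made; the discriminant analysis itself is not reproduced. The function name and the gene identifiers are hypothetical.

```python
def coverage_and_ppp(complex_genes, assigned_to_complex):
    """Return (coverage, PPP) for one complex.

    complex_genes: genes that actually belong to the complex.
    assigned_to_complex: genes (from the complex or from the control
    group) that the classifier assigned to the complex.
    """
    complex_genes = set(complex_genes)
    assigned = set(assigned_to_complex)
    correctly_assigned = complex_genes & assigned
    coverage = len(correctly_assigned) / len(complex_genes)
    # PPP is undefined when nothing is assigned; 0.0 is an arbitrary
    # convention chosen for this sketch.
    ppp = len(correctly_assigned) / len(assigned) if assigned else 0.0
    return coverage, ppp


# Hypothetical usage: a complex of 10 genes, 5 of which are reassigned
# to it, plus 1 control gene wrongly assigned to the complex.
members = [f"g{i}" for i in range(1, 11)]
assigned = ["g1", "g2", "g3", "g4", "g5", "control1"]
print(coverage_and_ppp(members, assigned))
```

In this example the coverage is 5/10 and the PPP is 5/6: coverage penalizes complex members that are missed, whereas PPP penalizes control genes that are wrongly attributed to the complex.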

Figure 6 displays the coverage versus PPP values for a total of

140 individual complexes from the three datasets analyzed

Figure 2

Detailed view of the main clusters in the network linking annotated protein complexes and high-throughput regulons. The network was built considering all the associations with E-value ≤ 0.1; regulons and complexes are denoted and depicted as described in the legend of Figure 1. (a) Cluster of associations involving seven of the eight components of the nucleosomal protein complex. Unlike in the equivalent cluster of Figure 1a, here only two distinct groups of, respectively, two and six genes belonging to three rather large regulons (respectively, Met4 and Hir1-2) map into this complex. Note that here Hir1-2 comprises a much larger group of genes than in the annotated regulons. (b) Cluster of the respiratory chain complexes. It comprises three complexes: the F0-F1-ATP-synthase complex, and the cytochrome bc1 and cytochrome c oxidase complexes. Twenty-five genes of the Hap4 regulon, and four and five genes of the Hap3 and Hap2 regulons, respectively, map into these complexes. As noted in the text, the Hap4 transcription factor is known as a respiratory-chain activator that does not bind DNA but fosters DNA binding by Hap2 and Hap3 [45]. The reasons for the more limited overlap between these latter two regulons and components of the respiration complexes are not clear. (c) An interesting cluster where the main node is the large Mbp1 regulon of 112 genes, of which 10 overlap with components of three complexes: the small replication factor A complex (3 genes), the replication complexes (49 genes) and the Cdc28p complexes (10 genes).

[Figure 3: network graph of the associations between TAP and HMS complexes and annotated regulons.]

here (34 TAP, 75 HMS, and 31 annotated ones). The coverage obtained for these complexes has a mean value of 48% and a standard deviation of about 25%. The mean PPP is 80%, with a standard deviation of about 10%, and only a single case with perfect assignment (PPP = 100%). There is very little difference between the results obtained for the annotated, TAP, and HMS complexes (see Additional data file 2 (Table S3) for details). It is noteworthy that significantly higher average values for the coverage and PPP (72% and 92%, respectively) were obtained by applying the same procedure to the annotated regulons [26].
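As a rough illustration of how such summary figures could be computed, the sketch below assumes that coverage is the fraction of a complex's genes recovered by the procedure (TP / (TP + FN)) and that PPP is the fraction of assigned genes that truly belong to the complex (TP / (TP + FP)). These definitions, the function names and the toy counts are assumptions made for the example; they are not taken from this section.

```python
from statistics import mean, pstdev


def coverage(true_pos, false_neg):
    """Assumed definition: fraction of the complex's genes that are recovered."""
    return true_pos / (true_pos + false_neg)


def ppp(true_pos, false_pos):
    """Assumed definition: fraction of assigned genes that belong to the complex."""
    return true_pos / (true_pos + false_pos)


# Hypothetical (TP, FN, FP) counts for three made-up complexes.
toy_counts = {
    "complexA": (4, 4, 1),
    "complexB": (9, 3, 1),
    "complexC": (2, 6, 0),
}

coverages = [coverage(tp, fn) for tp, fn, _ in toy_counts.values()]
ppps = [ppp(tp, fp) for tp, _, fp in toy_counts.values()]

# Mean and (population) standard deviation over the toy complexes.
summary = {
    "coverage": (mean(coverages), pstdev(coverages)),
    "ppp": (mean(ppps), pstdev(ppps)),
}
```

The same two summaries, computed per dataset, are what would be compared across the annotated, TAP and HMS complexes.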

Putative regulons in the annotated complexes

We determined whether the putative regulons identified by our procedures can provide useful information on the transcriptional regulation of protein complexes. As a first step, we discuss several aspects of the prediction results for patterns and putative regulons obtained for the annotated complexes, summarized in Table 2. The table lists the results for all the complexes for which at least one statistically significant (Sig ≥ 1) regulatory pattern has been detected. A complete list of the predicted co-regulated components in each of the complexes considered is given in Additional data file 2 (Table S4).

Table 2 reveals a clear difference between results for the permanent and the non-permanent complexes. Most strikingly, the fraction of the components of a given complex covered by our putative regulons is noticeably higher for most permanent complexes (0.7-1.0) than for the non-permanent ones (0.06-0.6). The number of significant regulatory patterns and the significance value of the 'best' pattern are also generally higher in these complexes. Among the complexes with the highest coverage by putative regulons and a large number of statistically significant patterns we find the proteasome, the large and small subunits of the cytoplasmic ribosome, three complexes of the respiratory chain, the translation elongation complex, as well as the nucleosomal protein and cyclin Cdc28p complexes. To illustrate the information provided by our approach, we will discuss in detail our findings for the nucleosomal protein complex and the replication fork complexes.

Nucleosomal protein complex

This complex has all of its eight components predicted to be part of a regulon, with a large number (20) of significant patterns. Details of the patterns discovered, of which the most statistically significant are spaced dyads, and their locations in the upstream regions of the corresponding genes are shown in the feature map (Figure 7). All but one of these dyads are mutually overlapping, and can be aligned to form the larger motif cGCGAan{5}caGAACg, where upper-case letters denote the most conserved residues, which seem to be the 'core' of the binding site, and the number in brackets is the length of the spacer in terms of the number of intervening nucleotides. The feature map shows that each upstream sequence contains at least two occurrences of this 'core', with some differences in the bases flanking this core. Although several regulons - Hta1/Hta2, Hir2/Hir3/Hir4 and Spt10/Spt21 - are known (and were found here) to map into this complex, covering a total of seven out of the eight components of the complex (Figures 1, 2), our findings represent the first instance where a regulatory motif is proposed for all the members of the nucleosomal complex.
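To make the motif notation concrete, the following sketch scans a sequence for the conserved 'core' of cGCGAan{5}caGAACg on both strands. It assumes, as a simplification, that only the upper-case residues (GCGA and GAAC) are required and that the eight positions between them (the lower-case a, the five-nucleotide spacer, and the lower-case ca) may be any base. The regular expression, the function names and the toy sequence are hypothetical and do not reproduce the pattern-discovery programs used in the study.

```python
import re

# Assumed core: GCGA, then 8 unconstrained positions (a + n{5} + ca), then GAAC.
# The lookahead lets overlapping occurrences be reported as well.
CORE = re.compile(r"(?=(GCGA[ACGT]{8}GAAC))")
CORE_LENGTH = 16

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_core_sites(upstream):
    """Return sorted (start, strand) pairs; start is given on the forward strand."""
    seq = upstream.upper()
    hits = [(m.start(), "+") for m in CORE.finditer(seq)]
    rc = reverse_complement(seq)
    # Map reverse-strand match positions back to forward-strand coordinates.
    hits += [(len(seq) - (m.start() + CORE_LENGTH), "-") for m in CORE.finditer(rc)]
    return sorted(hits)


# Toy upstream sequence (hypothetical) with one site on each strand.
site = "GCGAATTTTTCAGAAC"
toy_upstream = "TT" + site + "AA" + reverse_complement(site) + "T"
sites = find_core_sites(toy_upstream)
```

Counting the entries returned for each upstream region is one simple way to check a statement such as "at least two occurrences of this core" on real sequences.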

Replication fork complexes

The replication fork complex is an assembly of proteins involved in DNA replication (Table 2). It is encoded by a total of 30 genes, which can be subdivided into several smaller complexes such as the DNA polymerase δ, DNA polymerase ε, DNA α1 primase and replication factor C complexes. Analysis of the entire assembly detected 12 patterns with a maximum significance of 13.3, corresponding to an E-value of 2 × 10⁻¹³. The discriminant analysis carried out on the basis of these patterns allowed us to assign about half (17) of the 30 components of this assembly to putative regulons (Table 2).

Table 3 lists the probabilities for individual components to be assigned to the complex by the discriminant analysis. It reveals a striking observation: the predicted co-regulated genes correspond almost perfectly to seven out of the 14 individual complexes or entities that make up the assembly. The 17 genes that belong to the putative regulons include three of the four components of the DNA polymerase α1 primase complex, all the components of the DNA δ and ε complexes, the replication factor A and topoisomerase complexes, as well as the proliferating cell nuclear antigen (PCNA) and exonucleases. Furthermore, the majority of these genes were assigned to the replication fork assembly with high probability (0.8-0.99).

Interestingly, Jansen et al. [7] reported a poor correlation with expression data for the replication complex, a large com-

Figure 3 (see previous page)

Network graph of the statistically significant links between the TAP and HMS complexes and annotated regulons. Each node represents a complex (red ellipse) or a regulon (blue rectangle). Individual complexes are identified by a number, prefixed by TAP [11] or HMS [12]. Regulons are denoted by the name of the bound transcription factor. The number of genes in each group (complex or regulon) is given in parentheses. The number of genes common to a given complex-regulon pair is indicated along the lines (arcs) joining the pair. (a-g) Seven dense clusters of connections. (h) The remaining very small clusters, grouped together, each involving mostly one or two connections. Clusters (a-c) are detailed in Figure 4.
