… factor target genes
Ralf Mrowka1,2,3, Nils Blüthgen4 and Michael Fähling1,3
1 Paul-Ehrlich-Zentrum für Experimentelle Medizin, Berlin, Germany
2 AG Systems Biology – Computational Physiology, Berlin, Germany
3 Johannes-Müller-Institut für Physiologie, Charité-Universitätsmedizin Berlin, Germany
4 School of Chemical Engineering and Analytical Sciences, Manchester Interdisciplinary Biocentre, University of Manchester, UK
Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, owing to the short and degenerate DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are expressed similarly to known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes and thereby giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with the typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-κB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-κB. Thus, our data provide a missing link in the signalling of NF-κB and the damping function of optineurin in signalling feedback of NF-κB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today.

Keywords
feedback; glaucoma; NF-κB; optineurin; transcription factor target prediction

Correspondence
R. Mrowka, Paul-Ehrlich-Zentrum für Experimentelle Medizin, AG Systems Biology – Computational Physiology, Tucholskystr. 2, D-10117 Berlin, Germany
Fax: +49 30 450528972
Tel: +49 30 450528218
E-mail: ralf.mrowka@charite.de

(Received 26 February 2008, revised 1 April 2008, accepted 16 April 2008)

doi:10.1111/j.1742-4658.2008.06471.x

Abbreviations
CASP4, caspase 4; ChIP, chromatin immunoprecipitation; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HEK, human embryonic kidney; HIF-1, hypoxia-inducible factor 1; HNF4, hepatocyte nuclear factor 4; IKK, IκB kinase; NEMO, nuclear factor kappaB essential modulator; NF-κB, nuclear factor kappaB; OPTN, optineurin; RGA, reporter gene analysis; STAT5A, signal transducer and activator of transcription 5A; TNF-α, tumor necrosis factor-α.

The prediction and analysis of the regulatory networks underlying gene expression is a central challenge in systems biology and functional genomics [1,2]. Regulation of transcription is the initial mechanism for controlling the expression of genes. Key regulators of transcription are transcription factors, which bind to DNA motifs in noncoding regions that control gene transcription. Therefore, the identification of transcription factor target genes is one major element in the understanding and reconstruction of the regulatory network. Although many DNA motifs for transcription factor binding are known and are contained as consensus sequences and binding matrices in databases such as TRANSFAC [3] and JASPAR [4], their direct use for genome-wide matching in promoter sequences of higher organisms is greatly limited [5]. Current methods that use sequence data give results that are dominated by false predictions [5]. The issue of a high proportion of false positives in purely sequence-based methods has been known for a long time [6], and also applies to the transcription factors analysed in this study. The major problem is the short length and high degeneracy of the DNA-binding motifs, which give rise to one predicted binding site per 1000–10 000 bp by sheer chance. Therefore, other resources, such as phylogenetic footprinting, have been explored to further restrict and ‘purify’ potential targets to more likely candidates [7,8]. Such methods decrease the number of false predictions by about one order of magnitude, which is still not good enough for genome-wide predictions. Because the potential list of targets is too large, further information needs to be exploited to concentrate the analysis on the genes that have a higher probability of being true target genes.
Gene ontology, as a controlled and computer-readable way to annotate genes, has been used extensively to characterize clusters of genes from microarray data [9,10] and also to validate microarray data [11]. Despite the enormous number of false-positive predictions for transcription factor targets with current methods, significant correlations with gene ontology terms have been found that can be used to enhance prediction quality [12,13]. In addition, statistical methods have been developed to associate genes with disease [14], and seed-based computational procedures have been applied to identify brain cancer-related genes [15].
Currently, experience and knowledge of pathways and an educated literature search may help us to focus on possible candidates. The inclusion of information from expression experiments conducted under different experimental conditions may point to potential candidates for further evaluation, as these data reflect the biological functions of transcription factors, which directly influence mRNA concentrations in the cell. Well-designed, small-scale expression profile experiments have been successfully used to identify transcription factors involved in certain pathways [16,17]. Especially when applied to time-series data, seed-based clustering methods have been very successful in identifying novel p53 targets, by comparing expression kinetics with those of known targets, and in picking up genes regulated in different cell-cycle phases [18,19]. However, these approaches require dedicated microarray experiments. We therefore asked whether it is feasible to exploit the large body of expression information that is already stored in public databases. These datasets might contain information about expression at different time points for different cell lines that might be only marginally related to the transcription factor under investigation, and we wondered whether they would allow us to extract the relevant information about the action of transcription factors on their targets.
In recent years, several microarray techniques have been developed to measure mRNA concentration on a genome-wide scale [20]. In addition, efforts have been made to store individual microarray experiments in databases. Microarray expression data have recently been used to improve transcription factor target prediction [21]. In this work, we developed a method to exploit a dataset of approximately 1200 microarray experiments in conjunction with a seed group of known transcription factor target genes, and show that the information available in the databases is sufficient to increase the accuracy of prediction drastically. We explain and illustrate our seed-distribution-distance method by predicting novel nuclear factor kappaB (NF-κB) targets.

NF-κB is involved in pathways important for both physiological processes and disease conditions. It plays an important role in the control of immune function, differentiation, inflammation, stress response, apoptosis, cell survival, developmental processes, and the progression of cancers [22]. Thus, NF-κB has become one of the most widely studied transcription factors. Five genes (NFKB1, NFKB2, RELA, c-REL and RELB) belong to the NF-κB gene family, and the resulting proteins are able to form homodimers or heterodimers [23]. Prior to activation, NF-κB is localized in the cytoplasm and is tightly associated with its inhibitors (IκB proteins) and p100 proteins. Multiple stimuli, such as tumor necrosis factor-α (TNF-α), UV radiation and free radicals, activate NF-κB signalling through activation of IκB kinases (IKKs), which phosphorylate IκBs and p100 proteins, subsequently leading to their polyubiquitination and degradation [24].
Results
The seed-distribution-distance method
We started by defining a ‘seed’ group of known NF-κB targets: we collected the NF-κB targets mentioned in an NF-κB review paper [25] that matched Ensembl entries, resulting in 91 genes. Intersecting these 91 target genes with the genes in the microarray set resulted in 81 genes, which were used as the seed. We obtained the large-scale microarray expression data [26] (detailed description of data in supplementary Doc. S1) from the Stanford microarray database [27]. The set contains genome-wide data from 1202 hybridization experiments on human tissues and cell lines. Subsequently, we ranked each gene x according to the similarity L(x) of its expression to the seed group (detailed results given in supplementary Doc. S2). We defined the similarity L(x) of a gene x as the median correlation of gene x to the seed minus its median correlation to all genes (typical distributions of correlations of genes to the seed group are shown in supplementary Fig. S1). Thus, a high value of L(x) indicates that the gene is regulated similarly to the seed gene group. In contrast, a value of L(x) close to zero indicates that the median of the gene's correlation distribution to the seed is close to that of its correlation distribution to a randomly selected group of genes. Using the similarity measure L, we then sorted all remaining human genes and thereby obtained a ranking of the genes according to their similarity to the seed group. To avoid a circular argument, we stress that the seed group was excluded from all statistical analyses and from the characterization of the rank.
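The definition of L(x) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: expression profiles are assumed to be a dict mapping gene identifiers to equal-length lists of values without missing entries, the correlation is assumed to be Pearson's, and a gene's correlation with itself is left out of the genome-wide median. In formula form, L(x) = median over seed genes s of r(x, s) minus median over all other genes g of r(x, g).

```python
import math
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant expression profiles.

    Pearson is an assumption made for this sketch; the section does not name the metric.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def seed_distance(profiles, seed):
    """Return {gene: L(gene)} for every gene that is not in the seed.

    L(x) = median correlation of x to the seed genes
           - median correlation of x to all other genes (self excluded).
    """
    scores = {}
    for gene, profile in profiles.items():
        if gene in seed:
            continue  # seed genes are excluded from the ranking itself
        to_seed = [pearson(profile, profiles[s]) for s in seed]
        to_all = [pearson(profile, other)
                  for g, other in profiles.items() if g != gene]
        scores[gene] = median(to_seed) - median(to_all)
    return scores


def rank_genes(scores):
    """Gene identifiers sorted by decreasing L(x), i.e. most seed-like first."""
    return sorted(scores, key=scores.get, reverse=True)
```

For roughly 20 000 genes and 1202 arrays, one would precompute the full correlation matrix once (for example with NumPy) instead of recomputing pairwise correlations per gene; the loop form is kept here only for readability.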
A schematic representation of this procedure is given in Fig. 1. The essence of the method is that if a gene's correlation to the genes in the seed set (represented by the median) is larger than its median correlation to all genes, then the gene is more likely to be related to the seed set and therefore more likely to be a target of the transcription factor. This method requires that at least an initial seed set of true targets is known, and that other targets are correlated with several genes in the seed set. Furthermore, the method is based on the assumption that there is a relationship between gene coexpression and gene coregulation.

Fig. 1. Schematic diagram of the workflow in this study. Expression profiles of a gene g are compared to the expression profiles of the seed genes and of randomly selected genes. A distance score L(x) is calculated that quantifies the specific expression similarity to the seed. The genes are then ranked on the basis of L(x), searched for putative binding sites in their promoter region, and subjected to a reporter gene assay.
The ranking can also be based on scores other than the median correlation. For instance, we ranked the genes using a one-sided P-value derived from a computationally more expensive Mann–Whitney rank-sum test, and found performance similar to that obtained with L(x) (see supplementary Fig. S3).
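The alternative score can be sketched as a one-sided Mann–Whitney U test asking whether a gene's correlations to the seed tend to be larger than its correlations to all genes. This is an assumption about how such a score might be set up, not the authors' code: it uses the large-sample normal approximation, assigns midranks to ties but applies no tie or continuity correction, and the function names are illustrative. In practice, a library routine such as scipy.stats.mannwhitneyu with alternative='greater' would be the usual choice.

```python
import math


def midranks(values):
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test, H1: values in x tend to exceed values in y.

    Normal approximation without tie or continuity correction. Returns (U, P).
    Here x would be a gene's correlations to the seed and y its correlations
    to all genes; genes could then be ranked by increasing P.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return u, p
```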
Top members in the rank show typical NF-κB functions
We next analysed the top members of the obtained rank with regard to their gene ontology classification. For the top 600 genes, we examined whether any gene ontology classification is significantly enriched, using rigorous statistics [12]. It turns out that the list of significant gene functions of the top 600 genes (supplementary Table S1) is congruent with the functions of NF-κB described in the literature.
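The statistical procedure itself is described in [12] and is not reproduced here. As a stand-in, the sketch below uses a one-sided hypergeometric test, a common choice for asking whether a gene ontology term is overrepresented among the top 600 genes of the rank. It applies no multiple-testing correction, and the annotation input (a dict mapping each term to a set of gene identifiers) is a hypothetical format.

```python
from math import comb


def hypergeom_upper_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them annotated,
    n genes drawn (the top of the rank), k annotated genes among those drawn."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


def go_enrichment(ranking, annotations, top_n=600):
    """One-sided enrichment P-value for every term among the top_n ranked genes.

    ranking: gene identifiers, best first (seed genes already removed).
    annotations: {term: set of gene identifiers} (hypothetical input format).
    """
    universe = set(ranking)
    top = set(ranking[:top_n])
    N, n = len(universe), len(top)
    pvalues = {}
    for term, genes in annotations.items():
        annotated = genes & universe  # only count genes that are in the rank
        pvalues[term] = hypergeom_upper_p(len(annotated & top), n, len(annotated), N)
    return pvalues
```

Python integers are arbitrary-precision, so the exact binomial coefficients stay valid even for a genome-sized universe.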
We further analysed the occurrences of typical NF-κB functions within the rank. We found a steep increase in the density of genes involved in ‘immune response’, starting at approximately rank 700 when moving from the lowest to the highest ranks. The probability of a gene being involved in the immune response is therefore greatly increased for the top members of the rank, as seen in Fig. 2.
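A density curve of this kind can be approximated by the fraction of annotated genes in a window sliding down the rank. The window size and the plain moving-window estimator below are assumptions for illustration; the smoothing actually used for Fig. 2 is not specified in this section.

```python
def running_density(ranking, annotated, window=500):
    """Fraction of annotated genes in each window ranking[i:i + window].

    ranking: gene identifiers ordered from the top of the rank downwards.
    annotated: set of genes carrying the term of interest (e.g. 'immune response').
    Returns len(ranking) - window + 1 values; element i describes the window
    that starts at rank position i.
    """
    if not 0 < window <= len(ranking):
        raise ValueError("window must be between 1 and len(ranking)")
    prefix = [0]  # prefix[i] = number of annotated genes among ranking[:i]
    for gene in ranking:
        prefix.append(prefix[-1] + (gene in annotated))
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(ranking) - window + 1)]
```

Prefix sums make each window count a constant-time subtraction, so the whole curve costs one pass over the rank.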
High density of putative NF-κB DNA-binding sites in promoters in the top group of the rank
As the overrepresentation of typical NF-κB-related biological functions might be due to coexpression mediated by other transcription factors, we decided to analyse the sequences of the putative promoter regions of the high-ranking genes.
We predicted binding sites for all vertebrate transcription factors contained in the TRANSFAC database in the 500 bp putative promoter region of all genes in the ranking. We derived the 500 bp sequences upstream of the transcriptional start site from the Ensembl database. We chose to limit our search to 500 bp because we and others have observed that the majority of promoter sequences fall within this region [12,28].
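Scanning a 500 bp upstream sequence for a consensus string can be sketched by translating IUPAC codes into a regular expression, searching both strands and allowing overlapping hits. The consensus GGGRNWYYCC used in the checks is a commonly cited κB-site consensus chosen only for illustration; it is not necessarily identical to the TRANSFAC NFKAPPAB50 or NFKAPPAB65 entries, and retrieving the upstream sequence from Ensembl is not shown.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_regex(consensus):
    """Compile a consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile(f"(?={body})")


def find_sites(promoter, consensus):
    """Return sorted (start, strand) hits of `consensus` in `promoter`.

    Minus-strand sites are found by scanning the supplied sequence for the
    reverse complement of the consensus, so every start is a 0-based position
    on the sequence as given. A palindromic match is reported once per strand.
    """
    seq = promoter.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        hits.extend((m.start(), strand) for m in consensus_regex(motif).finditer(seq))
    return sorted(hits)
```

A gene would count as carrying a putative site if find_sites returns at least one hit in its upstream window.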
To illustrate our method, we chose to search for consensus sequences from the TRANSFAC database in the putative promoter regions, as this approach does not require an additional parameter, unlike more sophisticated weight-matrix methods, which typically require a cut-off score (see also supplementary Table S5). We analysed the distribution of occurrence of all predicted factor-binding sites in the promoters of genes along the rank. For each predicted binding motif, we calculated the ratio of the number of occurrences in the upper 5% of the rank to the expected occurrence in the top 5% (given by 0.05 times the total number of occurrences). In the list of motifs sorted by this ratio, NF-κB-binding motifs are among the top ranks, namely NFKAPPAB65 (P = 0.0028) and NFKAPPAB50 (P = 0.0239) (P-values from the binomial test; see Experimental procedures). In addition, this list includes motifs of the transcription factors BACH2 (P = 0.0025), signal transducer and activator of transcription 5A (STAT5A) (P = 0.0036) and VBP (P = 0.0106), which are on average enriched in the top group. A graphical representation is given in Fig. 3 (see also supplementary Table S4).
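The enrichment ratio (occurrences in the top 5% divided by 0.05 times all occurrences) and an accompanying binomial P-value can be sketched as follows. The exact test is given in the Experimental procedures, which are not part of this section; here we assume an upper-tail binomial test in which each predicted site independently falls into the top group with probability 0.05. The input format, a dict from gene to per-motif site counts, is hypothetical.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def motif_enrichment(ranking, sites_per_gene, top_fraction=0.05):
    """Return {motif: (observed / expected, binomial P)} for the top of the rank.

    ranking: gene identifiers, best first (seed genes already removed).
    sites_per_gene: {gene: {motif: number of predicted sites}} (hypothetical format).
    """
    n_top = round(top_fraction * len(ranking))
    top = set(ranking[:n_top])
    totals, in_top = {}, {}
    for gene in ranking:
        for motif, count in sites_per_gene.get(gene, {}).items():
            totals[motif] = totals.get(motif, 0) + count
            if gene in top:
                in_top[motif] = in_top.get(motif, 0) + count
    result = {}
    for motif, total in totals.items():
        if total == 0:
            continue  # no occurrences anywhere: ratio undefined
        observed = in_top.get(motif, 0)
        expected = top_fraction * total  # 0.05 x all occurrences, as in the text
        result[motif] = (observed / expected,
                         binom_upper_tail(observed, total, top_fraction))
    return result
```

Summing in log space matters here because motif totals can run into the thousands, where exact binomial coefficients no longer fit into a float.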
[Figure 2: density of genes involved in immune response; x-axis: position of gene in the ranking, from "high rank" to "low rank"]
Fig. 2. Density of occurrences of genes annotated with the term ‘immune response’ in the ranking after applying the seed-distribution-distance method. Immune response genes are highly enriched among the top members of the rank (P < 0.0001, two-sided Mann–Whitney rank-sum test). Red, individual occurrences of immune response genes; black line, density of genes that are annotated with the term. Inset: density for all genes in the rank.

Robustness of the seed-distribution-distance method
The original seed group contained 81 known NF-κB targets (supplementary Table S2). As fewer targets are known for most transcription factors, we investigated whether the seed-distribution-distance method might also give reliable results if the seed was substantially smaller. We applied a cross-validation strategy by randomly dividing the original 81 targets into two groups: one group served as the seed, and the remaining genes constituted the other group, named the test group, t. Several seed sizes were used (results for 1, 10, 20 and 50 are shown in Fig. 4; cumulative representations of the distributions are provided in supplementary Fig. S2). After rank construction using the reduced seed, the test group was analysed with regard to its position in the rank. This procedure was repeated 100 times. The test group members were strongly present in the top positions of the rank, and this was preserved even if a considerable part of the original targets was not used for the seed. Even if, for example, only 10 of the 81 members were used as the seed, the remaining 71 genes in the test group were highly enriched in the top ranks, as shown in Fig. 4.
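The cross-validation can be sketched as a loop that repeatedly draws a random seed of a given size from the known targets, ranks all non-seed genes by L(x), and records the relative rank position (0 at the top, 1 at the bottom) of each held-out target. This illustrative sketch assumes Pearson correlation and precomputes the gene-by-gene correlation table once, which keeps 100 repetitions cheap; for a genome-scale dataset, a NumPy correlation matrix would replace the pure-Python table.

```python
import math
import random
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles (assumed metric)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def correlation_table(profiles):
    """{gene: {other_gene: r}} for all pairs; self-correlations are omitted."""
    genes = list(profiles)
    table = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            table[g][h] = table[h][g] = pearson(profiles[g], profiles[h])
    return table


def ranking_for_seed(table, seed):
    """Non-seed genes sorted by decreasing L(x) = median r to seed - median r to all."""
    scores = {g: median([row[s] for s in seed]) - median(row.values())
              for g, row in table.items() if g not in seed}
    return sorted(scores, key=scores.get, reverse=True)


def recovery_positions(table, targets, seed_size, repeats=100, rng_seed=0):
    """Relative rank positions of held-out targets over `repeats` random splits."""
    rng = random.Random(rng_seed)
    targets = sorted(targets)
    positions = []
    for _ in range(repeats):
        seed = set(rng.sample(targets, seed_size))
        ranking = ranking_for_seed(table, seed)
        index = {g: i for i, g in enumerate(ranking)}
        positions.extend(index[t] / (len(ranking) - 1)
                         for t in targets if t not in seed)
    return positions
```

A histogram of the returned positions corresponds to one curve of the recovery test; repeating the call for seed_size = 1, 10, 20 and 50 gives the family of curves.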
Moreover, we addressed the question of whether the seed-distribution-distance method is also effective in enriching targets of other transcription factors. We chose E2F [29,30], ETS1 [31,32], hypoxia-inducible factor 1 (HIF-1) [33], hepatocyte nuclear factor 4 (HNF4) and c-Myc [34], and collected seed groups for these factors (supplementary Tables S2 and S3). We applied our method to these seed groups in a jack-knife manner (i.e. we iteratively left one seed member out and determined its position in the rank). For all of these additional transcription factors, the seed members left out were strongly enriched at the top of the rank (Fig. 5). Moreover, for E2F and HNF4, the top members of the rank were strongly enriched with gene ontology terms typical of the factors. For ETS1, HIF-1 and c-Myc, this ontology enrichment is not as clear as for the other factors tested. One reason could be the considerably lower number of genes annotated with the specific gene ontology terms and, in the case of c-Myc, its broad spectrum of ontologies [34]. The results of this jack-knife procedure also provide an estimate of how many of the true positives will lie in the upper 5%: about 18–39% of all targets would be in the upper 5% of genes in the rank (26% for NF-κB, 39% for E2F, 29% for ETS1, 18% for HIF-1, 36% for HNF4, and 20% for c-Myc). As only 5% of the targets would be expected there by chance, applying the seed-distribution-distance method will enrich the true targets in the top 5% of the rank by a factor of approximately 4–8.
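The step from ‘fraction of held-out targets in the top 5%’ to ‘enrichment factor’ is a division by the 5% expected by chance. The sketch below makes that arithmetic explicit using the percentages reported above, and adds a generic leave-one-out loop; the rank_fn callback is a hypothetical interface standing for whatever ranking procedure (for example L(x)) is being evaluated.

```python
# Fractions of left-out seed members recovered in the top 5% (reported above).
REPORTED_TOP5 = {"NF-kB": 0.26, "E2F": 0.39, "ETS1": 0.29,
                 "HIF-1": 0.18, "HNF4": 0.36, "c-Myc": 0.20}


def enrichment_factor(fraction_in_top, top_fraction=0.05):
    """How many times more targets lie in the top group than expected by chance."""
    return fraction_in_top / top_fraction


def fraction_in_top(relative_positions, top_fraction=0.05):
    """Share of relative rank positions (0 = top, 1 = bottom) inside the top group."""
    return sum(p < top_fraction for p in relative_positions) / len(relative_positions)


def jackknife_positions(seed, rank_fn):
    """Leave each seed gene out once and return its relative rank position.

    rank_fn(reduced_seed) must return all candidate genes ordered best first,
    including the gene that was left out but excluding the reduced seed.
    """
    positions = []
    for left_out in seed:
        reduced = [g for g in seed if g != left_out]
        ranking = rank_fn(reduced)
        positions.append(ranking.index(left_out) / (len(ranking) - 1))
    return positions


factors = {tf: round(enrichment_factor(f), 1) for tf, f in REPORTED_TOP5.items()}
print(factors)
# {'NF-kB': 5.2, 'E2F': 7.8, 'ETS1': 5.8, 'HIF-1': 3.6, 'HNF4': 7.2, 'c-Myc': 4.0}
```

The resulting range of roughly 3.6–7.8 is what the text rounds to a factor of 4–8.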
[Figure 3: enrichment of putative transcription factor-binding sites in the top group; labelled motifs NF-κB 65, STAT5a, VBP1, NF-κB 50 and BACH2, shown against binding sites for 234 other vertebrate transcription factors; legend: occurrence, enriched (P < 0.025), depleted (P < 0.025)]
Fig. 3. Distribution of enrichment of putative transcription factor-binding motifs in the ranking after applying the seed-distribution-distance method. The seed-distribution-distance method enriches genes with putative NF-κB-binding sites in the respective promoter. The top gene group of the seed rank was analysed with regard to transcription factor-binding motif enrichment within the −500 bp promoter region. The binding motifs for NF-κB 50 and NF-κB 65 are among the transcription factor-binding sites that are most strongly enriched. Note that the initial seed group was not contained in this analysis.
[Figure 4: histogram of recovery test; x-axis: recovered position in gradient (0–14 000); y-axis: 0–0.5; curves for the original seed (n = 81) and for seeds of n = 50, 20, 10 and 1]
Fig. 4. Recovery of target genes in a cross-validation test. The original seed was divided into two parts: (a) a group of members for rank construction; and (b) a test group with the remaining members of the original seed. Histograms of the recovery positions of the test group are shown for the newly constructed ranks using the seed without the test group (medians are marked in the figure). If, for example, 10 genes are used as a seed (71 in the test group), the relative occurrence of the recovered positions at the top of the rank is still very high, i.e. the enrichment capability of the seed-distribution-distance method is largely preserved. For comparison, the relative occurrence of members of the original seed in the corresponding rank is given. The error bars indicate the 5th and 95th percentiles of the distribution. Corresponding cumulative histograms are given in supplementary Fig. S2.
Taken together, these results suggest that the seed-distribution-distance method is applicable to other transcription factors as well, and might be used with much smaller seeds than the 81 genes used in the NF-κB seed.
The list of predicted NF-κB targets and experimental verification
We assembled a list of predicted NF-κB target genes by selecting all genes that showed a putative NF-κB-binding site (a match of a TRANSFAC consensus motif of NF-κB) in the 500 bp upstream of the transcription start site and were members of the upper 5% of the rank. The resulting list is shown in Table 1. Eight of the 16 predicted targets have already been reported in the literature to be direct targets of NF-κB, but were not in the seed.
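The selection rule for Table 1 combines the two filters with a logical AND: a gene must lie in the upper 5% of the rank and carry at least one consensus match in its 500 bp upstream region. A minimal sketch, assuming the ranking already excludes the seed and that the set of genes with a predicted site has been computed beforehand (for example with a consensus scan); the names are illustrative.

```python
def predicted_targets(ranking, genes_with_site, top_fraction=0.05):
    """Genes in the top `top_fraction` of the rank that also carry a predicted site.

    ranking: gene identifiers, best first, seed genes already removed.
    genes_with_site: set of genes with >= 1 consensus match in the upstream window.
    The returned list keeps the order of the rank.
    """
    n_top = round(top_fraction * len(ranking))
    return [gene for gene in ranking[:n_top] if gene in genes_with_site]
```

Applying the sequence filter only to the top of the rank is what lets the expression data cut down the otherwise very large number of consensus matches.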
We decided to validate three of the novel predicted targets by performing luciferase reporter assays. We focused on optineurin (OPTN), SPI-B and caspase 4 (CASP4), and chose NFKBIA as a positive control and DARS from the bottom of our rank as a negative control. We cloned their human promoters into a luciferase reporter plasmid and generated identical plasmids in which the predicted consensus sequence of the NF-κB-binding site was deleted. A widely used method to induce NF-κB is stimulation with TNF-α. Human HEK293 cells were transiently transfected with the reporter plasmids, and TNF-α stimulation (1.25–20 ng·mL−1) was applied. For all three unmodified promoters, luciferase activity was strongly induced by TNF-α in a concentration-dependent manner, very similar to our positive control NFKBIA. In contrast, with the plasmids in which we had deleted the putative NF-κB sites, the concentration-dependent stimulation effect was not seen for the OPTN and CASP4 promoters and was strongly reduced for the Spi-B promoter (Fig. 6), indicating that NF-κB action was blocked in the deletion mutants. The negative control (DARS) did not show any significant dose-dependent change in expression.
Furthermore, we applied chromatin immunoprecipitation (ChIP) analysis in order to verify NF-κB interaction with the predicted NF-κB-binding sites. A positive ChIP signal was obtained for OPTN and SPI-B as well as for NFKBIA in stimulated cells (Fig. 6). NF-κB-dependent activation of the CASP4 promoter was not indicated by ChIP analysis in HEK293 cells (Fig. 6B, panel e). This correlates well with a very low basal promoter activity, and may therefore be attributed to a silenced CASP4 promoter in the cellular model used.
Discussion
We have described the seed-distribution-distance method for the identification of specific transcription factor target genes. This strategy extracts relevant information about gene regulation from large-scale microarray experiments to generate a distribution-distance-derived target prediction based on a seed set of known target genes of a specific transcription factor. The target prediction is based on a combination of transcription factor-binding site information and the distribution distance. We took particular care to keep our method simple and the number of free parameters as low as possible, so our results do not depend on any parameter fine-tuning. Despite the simplicity of the method, our predictions are very reliable, with 11 of the 16 predictions being true targets, corresponding to an upper bound of the false discovery rate of 33%. On the basis of a jack-knife method, we estimate that our seed-based method of ranking genes will enrich true target genes within the top 5% by a factor of 4–8. Thus, incorporating the vast amount of microarray data stored in databases can help to reduce the extraordinarily high number of false positives obtained with purely sequence-based methods [5,7,35]. More sophisticated clustering methods might improve the prediction quality even further. We provide both statistical and biological evidence that the seed-distribution-distance method is robust and applicable to other transcription factors, and is hence very useful in predicting specific transcription factor target genes.

Table 1. Potential NF-κB targets identified by the seed-distribution-distance method that are in the top group of the rank and have predicted NF-κB-binding motifs within their −500 bp upstream promoter region. Interestingly, eight of the 16 identified new targets are known targets of NF-κB. Note that none of the potential new targets was in the initial seed group, so the otherwise known targets constitute a good validation of our method. The third column contains additional information about the results of the ChIP assays and the reporter gene analysis (RGA), followed by + or − in the case of a positive or negative result, respectively.
… RGA+ (positive control)
ENSG00000197635  Dipeptidyl peptidase 4 (DPP4)
ENSG00000081041  Macrophage inflammatory protein 2α precursor (CXCL2)  Guitart et al. [61]
ENSG00000169245  Small inducible cytokine B10 precursor (CXCL10)  O’Donnell et al. [60], suggested
ENSG00000117151  Di-N-acetylchitobiase precursor (CTBS)
ENSG00000023445  Baculoviral IAP repeat-containing protein 3 (BIRC3)  Hosokawa et al. [62]
ENSG00000166718  Hypothetical protein
ENSG00000158714  SLAM family member 8 precursor (SLAMF8)

[Figure 5: left column, jack-knife recovery positions (x-axis: position in rank) for the transcription factor seed groups, with panels labelled E2F, ETS1, HIF-1, HNF4 and NF-κB; right column, gene ontology density along the rank for typical terms (immune response; liver development; blood coagulation; lipid metabolic process; response to hypoxia; angiogenesis; extracellular matrix; cell cycle)]
Fig. 5. Left column: cross-validation of the seed distribution method for six different transcription factors. By means of a jack-knife method, the recovery position of the left-out gene in the rank was calculated for each transcription factor seed group. There is a clear and high enrichment in the top ranks for each transcription factor tested. Right column: we applied the seed distribution method to rank genes and calculated the gene ontology density for typical ontologies of the corresponding factor. Enrichment corresponds to an increased density at the top ranks as compared with the density at the bottom ranks.
Top rank members are involved in typical NF-κB-regulated functions and are enriched with putative NF-κB-binding sites
The distance criterion for generating the rank is a kind of expression profile similarity measure with respect to the seed group. It is not clear a priori that similarly regulated genes share the same gene function. The NF-κB analysis, however, reveals that the seed-distribution-distance method strongly enriches the top ranks for genes that share typical NF-κB-regulated functions. For instance, the processes immune response, complement activation, regulation of T-cell differentiation and immune cell activation are significantly overrepresented in the top group (supplementary Table S1). Moreover, we found specific enrichment of predicted binding motifs for NF-κB 50 and NF-κB 65 in the top 5% of the genes, together with the motifs of three other factors. We would expect these other factors to be functionally related to NF-κB. This is the case for STAT5A, which has been reported to be involved in severe combined immunodeficiency [36] and is involved in the immune response [37]. Note that these statistics were obtained without the initial seed group. Therefore, in our example, it would have been possible to determine with high certainty from the constructed rank which seed group was used to build up the rank, namely a group of NF-κB targets.
OPTN is a direct NF-κB target
We predict a list of new NF-κB targets that were not in the initial seed (Table 1). Eight of the 16 predicted novel targets have been previously confirmed. Three other predicted NF-κB targets from our list, OPTN, Spi-B and CASP4, were experimentally investigated in this study and were identified as direct NF-κB targets. Deletions in the OPTN gene are causative for adult-onset primary open-angle glaucoma [38]. Glaucoma affects 67 million people worldwide [39], and is the second largest cause of bilateral blindness in the world [40]. It has been suggested that OPTN is involved in the TNF-α signalling pathway [41]; however, its molecular mode of action has so far remained unknown. It has been suggested that OPTN blocks the protective effect of E3-14.7K on TNF-α-mediated cell killing, and hence OPTN may be part of the TNF-α signalling pathway that can shift the equilibrium towards induction of apoptosis [38,41]. Recently, it has been shown that OPTN increases cell survival and translocates to the nucleus upon an apoptotic stimulus, in a manner that is dependent upon the GTPase activity of Rab8, an interaction partner of OPTN [42]. Interestingly, this protective function of OPTN is lost when the OPTN protein is changed to the mutated form E50K, which is typical for patients with normal tension glaucoma [42]. We show that deletion of a putative NF-κB-binding site in the promoter region of OPTN completely abolishes the enhancing and modulatory effect of NF-κB on OPTN (Fig. 6). Our experiments clearly show that OPTN is a direct target of NF-κB. Recent findings indicated that TNF-α potentiates glutamate neurotoxicity through the blockade of glutamate transporter activity [43,44]. Furthermore, it was shown that OPTN and NF-κB essential modulator (NEMO) are competitive inhibitors of one another [45]. NEMO represents the regulatory subunit of IKK, which is essential for NF-κB activation [46]. Together with our data, this makes it apparent that OPTN is part of a negative feedback system that is important for NF-κB action. Elevated OPTN expression reduces induced NF-κB activation [45], and is therefore protective against induced neuronal cell death, which depends on NF-κB activity. This is in line with findings indicating that the protective function of OPTN is lost upon truncation resulting from the insertion of a premature stop codon, and when the OPTN protein is changed to the mutated form E50K, which is markedly reduced in patients suffering from glaucoma [42]. Our data provide the missing link in the signalling of NF-κB and the damping function of OPTN in signalling feedback of NF-κB.
Knowledge of the direct action of NF-κB on OPTN will greatly enhance our understanding of the signalling pathways relevant for antiapoptosis, and will be helpful in designing possible new cell survival strategies in glaucoma patients.
The two other newly identified and verified target genes of the NF-κB transcription factor seem to be involved in important physiological processes related to typical known functions of NF-κB. It is known that the Spi-B transcription factor is expressed in adult pro-T cells, with Spi-B expression being maximal in the newly committed cells at the DN3 stage [47]. Furthermore, Spi-B can interfere with T-cell development [47]. CASP4 can function as an endoplasmic reticulum stress-specific caspase in humans, and may be involved in the pathogenesis of Alzheimer’s disease [48].

[Figure 6: (A) Reporter gene activity of luciferase reporters containing the putative NF-κB site or carrying a deletion of that site; panels include the NFKBIA, DARS, SPI-B and CASP4 promoters; TNF-α: control and 1.25, 2.5, 5, 10 and 20 ng·mL−1. (B) ChIP analysis, panels (a)–(e); labels include OPTN, SPIB, CASP4 and control DNA; lanes: input, anti-rabbit antibody and anti-NF-κB antibody, each under control and TNF-α conditions.]
When does the seed-distribution-distance method work?
The major assumption of our method is that genes regulated by the same factor show at least some coregulation. We use a genome-wide similarity measure, L(x), based on a comparison of the medians of two correlation distributions. For each gene x in the genome, L(x) is the median correlation of x with all genes in the seed set minus the median correlation of x with all remaining genes in the genome.
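Written as a formula (the symbols are introduced here for illustration and are not taken from the source): let S be the seed set, G the set of all genes in the dataset, and r(x, y) the correlation between the expression profiles of genes x and y across all experiments. Excluding x itself from both sets is an assumption of this sketch.

$$
L(x) = \operatorname*{median}_{s \in S \setminus \{x\}} r(x, s) \;-\; \operatorname*{median}_{g \in G \setminus (S \cup \{x\})} r(x, g)
$$

A gene that is co-expressed with most seed genes, but not with the genome at large, receives a large positive L(x); a gene whose correlations with the seed resemble its correlations with everything else scores near zero.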
Our approach thus ‘adds up’ contributions from all genes in the seed set, and because it uses the median rather than the mean, it is insensitive to a reasonable number of outliers. Subtracting the median correlation with the rest of the genome corrects for the correlation structure of the expression dataset as a whole. We also tried a more sophisticated scoring scheme that ranked genes on the basis of a Mann–Whitney rank-sum test; this did not improve the performance of the ranking procedure.
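A minimal sketch of the scoring step in Python, assuming an in-memory mapping from gene identifiers to expression profiles (one value per experiment) and Pearson correlation as the similarity; the function names `pearson`, `rank_sum_auc` and `score_genes` are hypothetical, and excluding x from its own comparison sets is an assumption. For comparison, it also computes one possible Mann–Whitney-style score (U divided by n1·n2, i.e. the probability that a correlation with a seed gene exceeds a correlation with a non-seed gene); this is an illustrative reading of the rank-sum alternative, not the authors' implementation.

```python
from statistics import median
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles (assumed similarity)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = (sum(p * p for p in da) * sum(q * q for q in db)) ** 0.5
    return num / den if den > 0 else 0.0


def rank_sum_auc(to_seed: List[float], to_rest: List[float]) -> float:
    """Mann-Whitney U divided by n1 * n2 (ties count one half)."""
    wins = 0.0
    for a in to_seed:
        for b in to_rest:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(to_seed) * len(to_rest))


def score_genes(expr: Dict[str, List[float]], seed: Set[str],
                method: str = "median") -> Dict[str, float]:
    """Score every gene against the seed set.

    "median":  L(x) = median r(x, seed) - median r(x, non-seed genes)
    "ranksum": Mann-Whitney-style alternative (hypothetical variant)
    """
    scores: Dict[str, float] = {}
    for x, profile in expr.items():
        # Correlations with seed genes; x itself is excluded (an assumption).
        to_seed = [pearson(profile, expr[s]) for s in seed if s in expr and s != x]
        # Correlations with all remaining (non-seed) genes in the dataset.
        to_rest = [pearson(profile, expr[g]) for g in expr if g not in seed and g != x]
        if not to_seed or not to_rest:
            continue  # score undefined without both distributions
        if method == "median":
            scores[x] = median(to_seed) - median(to_rest)
        elif method == "ranksum":
            scores[x] = rank_sum_auc(to_seed, to_rest)
        else:
            raise ValueError(f"unknown method: {method}")
    return scores


# Toy usage: seed genes s1-s3, a co-expressed candidate t, an anti-correlated u.
expr = {
    "s1": [1, 2, 3, 4, 5], "s2": [2, 4, 6, 8, 10], "s3": [1, 3, 2, 5, 4],
    "t": [1, 2, 3, 5, 4], "u": [5, 4, 3, 2, 1], "v": [3, 1, 4, 1, 5],
}
seed = {"s1", "s2", "s3"}
scores = score_genes(expr, seed)
candidates = sorted((g for g in scores if g not in seed), key=scores.get, reverse=True)
print(candidates)  # ['t', 'v', 'u']
```

Computing correlations pair by pair like this costs O(n²·m) for n genes and m experiments; at genome scale one would more realistically standardize all profiles once and obtain every correlation from a single matrix product (for example with NumPy), which is not shown here.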
The seed-distribution-distance method is highly robust: it produces high enrichment even when a considerable part of the seed is missing. This was shown by the cross-validation procedure and the subsequent recovery test.
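The robustness claim can be probed with a hold-out test of the following kind. This is a hypothetical sketch, not the authors' cross-validation code: `recovery_test` hides a random fraction of the seed, ranks the remaining genes by L(x) computed from the reduced seed, and reports how many hidden seed genes reappear among the top `top_k` candidates, together with the enrichment over random expectation. `toy_module_data` builds synthetic data (one co-expressed module plus independent noise genes) purely for demonstration, and Pearson correlation is again assumed.

```python
import random
from statistics import median
from typing import Dict, List, Set, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two expression profiles (assumed similarity)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = (sum(p * p for p in da) * sum(q * q for q in db)) ** 0.5
    return num / den if den > 0 else 0.0


def l_scores(expr: Dict[str, List[float]], seed: Set[str]) -> Dict[str, float]:
    """L(x) = median r(x, seed) - median r(x, non-seed genes), x excluded."""
    out: Dict[str, float] = {}
    for x, px in expr.items():
        to_seed = [pearson(px, expr[s]) for s in seed if s != x]
        to_rest = [pearson(px, expr[g]) for g in expr if g not in seed and g != x]
        if to_seed and to_rest:
            out[x] = median(to_seed) - median(to_rest)
    return out


def recovery_test(expr: Dict[str, List[float]], seed: Set[str],
                  holdout_frac: float = 0.5, top_k: int = 10,
                  rng_seed: int = 0) -> Tuple[int, float]:
    """Hide part of the seed, rank with the rest, count hidden genes in the top_k."""
    rng = random.Random(rng_seed)
    seed_list = sorted(seed)
    n_hold = max(1, int(len(seed_list) * holdout_frac))
    if n_hold >= len(seed_list):
        raise ValueError("at least one seed gene must remain")
    held_out = set(rng.sample(seed_list, n_hold))
    reduced = set(seed_list) - held_out
    scores = l_scores(expr, reduced)
    # Candidates are all scored genes outside the reduced seed.
    candidates = sorted((g for g in scores if g not in reduced),
                        key=scores.get, reverse=True)
    k = min(top_k, len(candidates))
    hits = sum(1 for g in candidates[:k] if g in held_out)
    # Number of hidden seed genes expected in the top k under a random ranking.
    expected = k * len(held_out) / len(candidates)
    return hits, hits / expected


def toy_module_data(n_module: int = 20, n_noise: int = 100, n_exp: int = 40,
                    noise: float = 0.3, rng_seed: int = 1) -> Dict[str, List[float]]:
    """Synthetic data: one co-expressed module (m*) plus independent genes (n*)."""
    rng = random.Random(rng_seed)
    signal = [rng.gauss(0, 1) for _ in range(n_exp)]
    expr = {f"m{i}": [s + noise * rng.gauss(0, 1) for s in signal]
            for i in range(n_module)}
    expr.update({f"n{i}": [rng.gauss(0, 1) for _ in range(n_exp)]
                 for i in range(n_noise)})
    return expr
```

Repeating the test over many random hold-outs (different `rng_seed` values) and several hold-out fractions would show how enrichment degrades as more of the seed is removed.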
The seed-distribution-distance method is expected to produce a biologically meaningful rank if the seed group is homogeneous with respect to its expression correlation. If, for instance, the seed group contains completely unrelated expression clusters that occupy linearly independent directions in expression space, the resulting distance measure might not be capable of building a transcription factor-specific rank. In this case, one would need to cluster the seed group into subseeds and build individual cluster-specific ranks. This might be necessary, for example, for transcription factors that target different genes depending on their splice form. Interestingly, however, in our analysis the performance of the method does not seem to depend crucially on the homogeneity of expression within the seed group, as some seed groups that performed well in the cross-validation test had large intraseed variations (supplementary Fig. S4).
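The subseed strategy mentioned above could be sketched as follows. `split_seed` groups seed genes by single-linkage clustering on pairwise correlation (genes chained by correlations at or above a threshold share a subseed), and `subseed_ranks` builds one L(x) ranking per subseed. The clustering method, the 0.6 threshold, the treatment of other subseeds' genes as part of the "rest" of the genome, and all helper names are assumptions for illustration; the source does not specify how subseeds would be formed.

```python
import random
from statistics import median
from typing import Dict, List, Set


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two expression profiles (assumed similarity)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = (sum(p * p for p in da) * sum(q * q for q in db)) ** 0.5
    return num / den if den > 0 else 0.0


def split_seed(expr: Dict[str, List[float]], seed: Set[str],
               threshold: float = 0.6) -> List[Set[str]]:
    """Single-linkage split: seed genes chained by pairwise correlations
    >= threshold end up in the same subseed (threshold is hypothetical)."""
    remaining = sorted(seed)
    subseeds: List[Set[str]] = []
    while remaining:
        frontier = [remaining.pop(0)]
        component = set(frontier)
        while frontier:
            g = frontier.pop()
            linked = [h for h in remaining if pearson(expr[g], expr[h]) >= threshold]
            for h in linked:
                remaining.remove(h)
                component.add(h)
                frontier.append(h)
        subseeds.append(component)
    return subseeds


def subseed_ranks(expr: Dict[str, List[float]], seed: Set[str],
                  threshold: float = 0.6) -> List[List[str]]:
    """One L(x) ranking (best first) per subseed instead of one pooled ranking."""
    rankings: List[List[str]] = []
    for sub in split_seed(expr, seed, threshold):
        scores: Dict[str, float] = {}
        for x, px in expr.items():
            to_seed = [pearson(px, expr[s]) for s in sub if s != x]
            # Genes of other subseeds count as "rest" here (an assumption).
            to_rest = [pearson(px, expr[g]) for g in expr if g not in sub and g != x]
            if to_seed and to_rest:
                scores[x] = median(to_seed) - median(to_rest)
        rankings.append(sorted(scores, key=scores.get, reverse=True))
    return rankings


def toy_two_modules(n_per: int = 6, n_noise: int = 30, n_exp: int = 60,
                    noise: float = 0.3, rng_seed: int = 7) -> Dict[str, List[float]]:
    """Synthetic data: modules a* and b* with independent signals, plus noise n*."""
    rng = random.Random(rng_seed)
    expr: Dict[str, List[float]] = {}
    for name in ("a", "b"):
        signal = [rng.gauss(0, 1) for _ in range(n_exp)]
        for i in range(n_per):
            expr[f"{name}{i}"] = [s + noise * rng.gauss(0, 1) for s in signal]
    for i in range(n_noise):
        expr[f"n{i}"] = [rng.gauss(0, 1) for _ in range(n_exp)]
    return expr
```

In the toy data the two modules have independent expression signals, so a single pooled seed mixes two unrelated correlation patterns; splitting it gives each module its own ranking, in which the held-back module member (a5 or b5) rises to the top alongside its subseed.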
A second consideration relates to the expression dataset. The seed-distribution-distance method relies on the assumption that the transcription factor of interest shows some biological activity in the data. If, for example, the transcription factor of interest is completely shut down in all experiments, one would not expect to be able to recover its regulatory response. This issue might be important for genes that are active only during narrow time windows in development. One solution would be to generate expression experiments with artificial expression of that transcription factor, or to include native material from the relevant developmental period in the microarray analysis.
The third consideration relates to the size of the seed. One would expect that if the seed is too small to define the target response adequately, the rank will be poorly defined. However, our bootstrapping test showed that 10 seed genes are capable of enriching
Fig. 6. Experimental validation of predicted NF-κB targets by functional analyses and by physical NF-κB interaction with the predicted NF-κB-binding sites in the nuclear chromatin context. (A) RGA. HEK293 cells were transfected and treated for 24 h with increasing concentrations of TNF-α (1.25, 2.5, 5, 10 and 20 ng·mL⁻¹; n = 4). (a) Schematic illustration of the experimental design. RGA was measured with unmodified native promoter constructs (left column) and with constructs in which the putative NF-κB-binding sites were deleted (right column, NF-κB del). (b) Promoter activity for NFKBIA, which is known to be a target of NF-κB, and for a negative control (DARS). Only the NFKBIA promoter responded in a dose-dependent manner to stimulation with TNF-α. (c–e) RGA for the (c) OPTN, (d) SPI-B and (e) CASP4 promoters. All showed a dose-dependent increase in promoter activity under stimulation with TNF-α, and deletion of the putative NF-κB-binding site resulted in significantly attenuated dose-dependent responses. (B) ChIP analysis. HEK293 cells were cultured with (10 ng·mL⁻¹) or without (control) TNF-α for 24 h prior to crosslinking and ChIP using anti-rabbit serum (negative control) or an antibody to NF-κB. Relative amounts of immunoprecipitated DNA were assessed by real-time PCR (n = 3). (a) Amplification of part of the coding region of the intronless gene encoding GAPDH, which should show no promoter-like activity and contains no potential NF-κB-binding element, served as control DNA. (b–e) The predicted NF-κB-binding sites were verified for (b) the positive control NFKBIA as well as for (c) OPTN and (d) SPI-B. NF-κB-dependent activation of (e) the CASP4 promoter is not indicated by ChIP analysis in HEK293 cells.