LymphTF Database
A DATABASE OF TRANSCRIPTION FACTOR ACTIVITY
IN LYMPHOCYTE DEVELOPMENT
PAUL CHILDRESS
Submitted to the faculty of the Bioinformatics Graduate Program
in partial fulfillment of the requirements
for the degree Master of Science
in the School of Informatics, Indiana University, September 2005
Accepted by the Bioinformatics Program Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics.
Master’s Committee
Narayanan B Perumal, PhD
Malika Mahoui, PhD
Mark H Kaplan, PhD
This thesis and indeed my entire graduate education are dedicated to Mason and Joshua Childress. They are wonderful sons and a continuing source of inspiration and motivation.
we have created the LymphTF Database. This database holds interactions between individual transcription factors and their specific targets at a given developmental time. It is our hope that storing the interactions in developmental time will allow for elucidation of the regulatory networks which guide the process. Work for this project also included construction of a custom data entry web page that automates many tasks associated with populating the database tables. These tables have also been related in multiple ways to allow for storage of incomplete information on transcription factor activity, without having to replace existing records as details become available. The LymphTF DB is a relational MySQL database which can be accessed freely on the web at http://www.iupui.edu/~tfinterx/
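The storage idea described above (interactions keyed by factor, target and developmental stage, with room for incomplete records that are completed later) can be sketched as follows. This is only an illustration under stated assumptions: the table names, column names and helper functions are hypothetical and are not the actual LymphTF DB schema, and Python's built-in SQLite module stands in for MySQL so that the example is self-contained. The sample record (EBF acting on the λ5 gene at the pro-B stage) is taken from the discussion of EBF and E2A later in this text.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not the real
# LymphTF DB tables. SQLite is used here in place of MySQL.
SCHEMA = """
CREATE TABLE transcription_factor (
    tf_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE target_gene (
    gene_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE stage (
    stage_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    tf_id    INTEGER NOT NULL REFERENCES transcription_factor(tf_id),
    gene_id  INTEGER REFERENCES target_gene(gene_id),  -- NULL: target unknown
    stage_id INTEGER REFERENCES stage(stage_id),       -- NULL: stage unknown
    effect   TEXT CHECK (effect IN ('activates', 'represses'))  -- NULL allowed
);
"""


def build_demo_db():
    """Create an in-memory database holding one incomplete interaction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO transcription_factor (name) VALUES ('EBF')")
    conn.execute("INSERT INTO target_gene (name) VALUES ('lambda5')")
    conn.execute("INSERT INTO stage (name) VALUES ('pro-B')")
    # Factor and target are known; stage and effect are left NULL for now.
    conn.execute(
        "INSERT INTO interaction (tf_id, gene_id) "
        "SELECT tf_id, gene_id FROM transcription_factor, target_gene"
    )
    return conn


def complete_interaction(conn, interaction_id, stage_name, effect):
    """Fill in missing details on an existing record instead of replacing it."""
    conn.execute(
        "UPDATE interaction SET effect = ?, "
        "stage_id = (SELECT stage_id FROM stage WHERE name = ?) "
        "WHERE interaction_id = ?",
        (effect, stage_name, interaction_id),
    )


def interactions_at_stage(conn, stage_name):
    """Return (factor, target, effect) tuples recorded for one stage."""
    return conn.execute(
        "SELECT tf.name, g.name, i.effect FROM interaction i "
        "JOIN transcription_factor tf ON tf.tf_id = i.tf_id "
        "JOIN target_gene g ON g.gene_id = i.gene_id "
        "JOIN stage s ON s.stage_id = i.stage_id "
        "WHERE s.name = ?",
        (stage_name,),
    ).fetchall()
```

In this sketch the nullable columns are what allow a partially characterized interaction to be stored; calling complete_interaction later updates the same row, so no record has to be deleted and re-entered.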
Table of Contents
Introduction
    Lymphocyte development
        B cell development
        T cell development
    Lymphocyte stages
Importance
    Developmental model system
    Clinical importance
    Genetic control model system
    Transcriptional regulatory networks
Knowledge Gap
    Transcriptional regulatory networks
    Gene expression
    Related research
The Database
Materials and Methods
    Definitions and general comments
    Article selection
    Database design
        PHP code: data entry and retrieval
    Data entry
Results/Discussion
    Database contents
    Interesting observations
    Searches
Final comments/Future Possibilities
References
    Literature
    Websites
Appendix
Immune system development is an area of active research which holds the potential to develop knowledge about disease processes, serve as a model system for the study of stem cell differentiation and provide insights into the broader understanding of control of genetic expression. Much advancement in technique and the accumulation of a large amount of information have also allowed for the development of rudimentary transcriptional regulatory networks (TRNs), both by automated means and by manual creation. These networks are in the nascent stages, and rely on a dependable base of information to assemble. Such a base exists within the peer-reviewed literature. B and T cells occupy this unique position in biological study due to several factors. These cells arise from a continuously regenerating supply of hematopoietic stem cell precursors (Busslinger, 2004), collection of samples is comparatively easy, animal models exist which mimic human immune cell biology closely, and this is a particularly well-studied area of science which has yielded an enormous amount of information (Siu, 2002).
Lymphocyte development
B and T lymphocytes bear many similarities in development, and their cooperation in immune system function makes them natural companions for study. Both populations are derived from the same precursor cells (see following two sections for details) and then proceed in a step-wise fashion through various stages of development. These stages are characterized by appearance (or disappearance) of cell surface markers, genomic rearrangements and progressive loss of developmental potential. This process begins in the embryo and persists throughout life (Bommhardt et al, 2004). Each cell type will be discussed independently and a discussion of their cooperative function follows.
B cell development
Mammalian B cells originate within the liver of embryos or the bone marrow of adults. The common lymphoid progenitor (CLP) is widely recognized as the stem cell (multipotential cell) that displays the first commitment to a lymphoid cell fate. This is a progression from the less restricted hematopoietic stem cell (HSC), which has the ability to become any type of blood cell, and the similar multipotent progenitor (MPP), which cannot self-renew its population. Figure 1 highlights B cell development, showing the staging scheme used in the database design (see Materials and Methods).
Figure 1. This figure shows the stages of B cell development used in the LymphTF Database and indicates the anatomical location of developing cells; arrows describe the pathway(s) of progression of developing cells.
The action of the receptor tyrosine kinase Flk2/Flt3 separates a subset of MPPs from the others to give a population of cells that are directed to a lymphoid cell fate. This means that they can no longer assume the myeloid lineage cell types (Singh et al, 2005). The action of this kinase results in the earliest recognized committed B cell progenitor, the pre-pro B cell. This population does not yet contain rearranged immunoglobulin µ (Igµ) heavy chain genes. Cells at this stage show the B cell surface markers AA4+, B220+ and interleukin-7 receptor α-chain (IL-7Rα+), but still show plasticity to develop into non-B cell types under certain conditions. Rearrangement of the Igµ heavy chain diversity (DH) region and joining (JH) region genes signals the early pro-B cell stage. Recombination of these genes is mediated by the RAG1 and RAG2 enzymes. Next, the variable (VH) gene is rearranged to combine with the D-JH genes to form the IgH complex, which is displayed on the cell surface. Cells that successfully rearrange the heavy chains are then able to associate IgH with the surrogate light chain gene products of the VpreB and λ5 genes. The Igα and Igβ signaling molecules are associated with the developing complex (IgH, VpreB and λ5) to create the pre B cell receptor (preBCR). At this point, the cells are referred to as large preB cells, and signal transduction activity of the preBCR constitutes a major checkpoint in B cell development (Matthias and Rolink, 2005). Autoreactive cells are removed from the population by apoptosis (programmed cell death) or anergy if their reaction to self antigens is of sufficient strength and cannot be attenuated by a process known as receptor editing (a second rearrangement event). Anergy is a functional silencing of autoreactive B cells which confers tolerance to these cells and predisposes them to an apoptotic fate (Loder et al, 2001). Signaling through the preBCR is also important for allelic exclusion of one of the IgH locus genes. Because of the diploid complement, exclusion is necessary so that cells express only one allele for the heavy chain genes. The end result is developing B cells that have recombined successfully at the IgH locus, and have subsequently restricted expression to one allele at that locus. Signals from the preBCR serve to induce an expansion of the developing population (Matthias and Rolink, 2005). The expression of the surrogate light chains is then downregulated, leading to resting, smaller pre-B cells that begin to rearrange the kappa light chain gene locus. If this proves unproductive, the lambda chain begins rearrangement. B cells at this stage of development are referred to as immature B cells (Melchers et al, 2000).
These cells can then leave the marrow (or fetal liver) and enter the spleen to await further developmental progress that is dependent on interacting with an antigen (Siebenlist et al, 2005). At this point in development, the B cells are also known as transitional B cells, and their eventual fate is dependent on their location within the spleen and the concentration of BCR complexes on the cell surface. At least two transitional stages, T1 and T2, are required for B cell development into a mature B cell (termed lymphoblast in this project’s database). Given their location in the marginal zone of the spleen, these B cells are the first cells to encounter foreign antigens and thus constitute the first line of defense in the humoral response to antigenic challenge. A process known as repertoire selection promotes proliferation of these marginal zone B cells based on their reactivity to T cell-independent epitopes. After this selection, a portion of the cells leave the spleen to become circulating plasmablasts which no longer express immunoglobulin as part of their cell surface.
A series of genomic events modifies antibody affinity for its specific antigen during B cell development. One such mechanism is class switch recombination, which creates antibodies of different classes, such as the secretory class A and the high affinity class G, by replacing the M class heavy chain with rearranged chains of different types. Also, affinity maturation (or somatic hypermutation) serves to ‘reverse-engineer’ the variable region which binds antigens. This process requires high affinity binding of surface antibodies to antigens in order to better compete with other cells that have undergone similar mutation events for these antigens. This effectively sorts out those cells in which induced mutation yields a stronger binding to foreign bodies (Calame et al, 2003).
Progression from transitional B cells to antibody-producing plasma cells is not the only path of development for splenic B cells. Transitional B cells can move into secondary lymphoid organs for further development. These cells can move to germinal centers of peripheral lymph nodes (termed germinal center B cells) and differentiate into long lived memory cells, or adopt a plasma cell phenotype from these peripheral locations (Matthias and Rolink, 2005).
of data obtained in the literature (see Stages section). B and T cells share Rag-1 and Rag-2 dependent genomic rearrangements. In the case of T cells, the recombination yields a T cell receptor. Also similar is the necessity for signals from this receptor to convey survival from anergic and negative selections by inducing positive selection (Siebenlist et al, 2005). Figure 2 highlights T cell development, showing the staging scheme used in the database design (see Materials and Methods).
Figure 2. This figure shows the stages of T cell development used in the LymphTF Database and indicates the anatomical location of developing cells; arrows describe the pathway(s) of progression of developing cells.
The identity of the earliest cells destined for a T cell fate has been a source of some controversy. A more traditional view of lymphocyte development suggests that B and T cells each arise from a CLP, but an early T cell precursor (ETP) has been proposed that preferentially develops into thymocytes. The CLP cells, while capable of T cell development if placed in the thymus, are more likely to become B cells while in the bone marrow (Bommhardt et al, 2004). ETPs are distinguished from CLPs by a stronger presence of the Sca-1 molecular marker and the absence of IL-7R on the surface. Whether ETP cells are derived from CLP cells or represent another cell type derived from an earlier multipotential stem cell is unclear. Generally accepted are the next four double negative (DN) stages, so called because these primitive T cells lack the CD4 and CD8 markers associated with more mature and terminally differentiated T cells. The DN stage is further broken down into four parts based on the combination of expression and disappearance of the CD44 and CD25 (interleukin-2 receptor α-chain) cell surface proteins. These stages (DN1-DN4) are shown in Figure 3, with their cell surface marker assignments.
Figure 3. Double negative distribution of cell surface markers CD44 and CD25.
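Because the four DN stages are defined purely by the presence or absence of two markers, the classification can be written as a small lookup. The sketch below is illustrative only: the marker assignments follow the conventional textbook scheme (Figure 3 itself is not reproduced in this text, so this is an assumption about its content), and the function name is hypothetical.

```python
# Conventional DN stage assignments keyed by (CD44 positive, CD25 positive).
# Assumed from the standard textbook scheme, not read from Figure 3.
DN_STAGES = {
    (True, False): "DN1",   # CD44+ CD25-
    (True, True): "DN2",    # CD44+ CD25+
    (False, True): "DN3",   # CD44- CD25+
    (False, False): "DN4",  # CD44- CD25-
}


def dn_stage(cd44_positive, cd25_positive):
    """Return the double negative stage for a CD4- CD8- thymocyte."""
    return DN_STAGES[(bool(cd44_positive), bool(cd25_positive))]
```

The DN4 entry agrees with the description later in this section, where expression of both CD44 and CD25 has been lost.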
Double negative T cells begin to appear after CLP (or ETP) migration from the bone marrow to the thymus. As with the B cells, genomic rearrangements are an important hallmark of T cell development. At the DN2-DN3 interface the T cell receptor (TCR) β-, γ- and δ-chains are rearranged using the Rag-1/2 recombination machinery. Successful recombination is essential for progression to an αβ-T cell fate. Late in the DN3 stage the TCRβ chain pairs with the invariant pre-Tα chain to form the pre-TCR complex. Signaling through this receptor, or β-selection, is essential for progression to the DN4 stage. The DN3 population can then be sub-divided into “L” and “E” cells (for “large” and “expected”). The L cells constitute about 15% of the total DN3 population, are enriched for in-frame β-chain rearrangement and lack the cell cycle inhibitor p27KIP1. Thus, this minority population of cells moves on to become the DN4 developing T cells. Developing T cells which cannot receive pre-TCR signaling undergo apoptosis. This is generally the E cell population. Successful β-chain rearrangement also triggers two exclusionary mechanisms which prevent an identity crisis of sorts in the L population. The first is known as allelic exclusion and involves shutting down further rearrangement of the β-chain gene. The second is called isotypic exclusion. This prevents rearrangement and subsequent display of the TCR γ-chain (Möröy and Karsunky 2004). A small portion of these DN3 cells do not display the pre-TCR complex; rather, these cells have a successful recombination of the δ-chain gene and continue to a γδ-T cell fate. This fate choice is more common in fetal development, and rapidly declines until birth to reach a final proportion of about 5% of the total T cells in an adult. The E subset of cells seems to be the pool from which the γδ-T cells originate; these cells may experience a δ-selection analogous to the mechanism mentioned above (Bommhardt et al, 2004). Following these cell fate decisions (αβ- or γδ-T cell lineages), surviving T cells move to the DN4 stage, where expression of both CD44 and CD25 is quashed (see Figure 2).
Next in the developmental program is the emergence of double positive (DP) T cells. The name refers to the appearance of the CD4 and CD8 cell surface markers. These proteins are used to classify the maturing thymocytes from this point until terminal differentiation. Genomic rearrangement of the α-chain of the TCR occurs at this DP stage, yielding a functional T cell receptor (TCR). A productive rearrangement is again used as a checkpoint in development, by requiring that a complete TCR recognize antigens presented by major histocompatibility complex (MHC) molecules. Strong binding between the two leads to negative selection, and these cells are removed by apoptosis. Similarly, DP cells that do not have a correct genomic recombination of the α-chain gene are removed by neglect (via apoptosis). This occurs in the border area between the medulla and cortex of the thymus. This corticomedullary junction thus serves as a broad division for the DN cells having originated from multipotent precursors. Cells in the first stages of development occupy the outer cortex, move to the medulla as they become DP cells, and eventually achieve a final cell type fate and are released into the circulatory system (Bommhardt et al, 2004). The TCR signals a transient down regulation of both CD4 and CD8 and positive autoregulation to increase its own expression. The resulting cells, referred to as CD8low and CD4low, then experience a resurgence in CD4 expression in preparation for lineage commitment to either a CD4+ or CD8+ restricted fate. At this stage expression is restricted to one of the two markers, and the cells are designated single positive (SP). CD4+ cells interact specifically with MHC class II self-antigens and CD8+ cells interact specifically with MHC class I self-antigens. The details of this choice are not completely understood. However, binding between the TCR and the two MHC molecules is known to participate in the decision (Möröy and Karsunky 2004).
In the thymus, a more recently described CD4+ cell type participates in attenuating the immune response. The T regulatory cell resembles CD4+ cells phenotypically in cytokine and cell surface profiles. The developmental origins of this cell type are not known, but retroviral transfer of the transcription factor FoxP3 into naïve CD4+ T cells results in a T regulatory phenotype, suggesting a CD4+ origin (Hori et al, 2003). The majority of this CD4+ population, however, develops into T helper 1 (Th1) or T helper 2 (Th2) cells as a final stage in development. Naïve CD4+ cells leave the thymus and complete this final step in the peripheral circulation. A model is emerging in which the transcription factors T-bet and GATA3 control development into Th1 and Th2 cells, respectively, but exact mechanisms are not known (Glimcher and Murphy 2000). After leaving the thymus, CD8+ T cells progress into cytotoxic effector cells and memory T cells in the periphery, but retain their single positive CD8+ phenotype.
Lymphocyte stages
Because the interactions that populate our database are broken down into stages by developmental time, a discussion of some of the nuances inherent in staging techniques and controversies is warranted.
Figure 4. Lymphocyte stages used for classification in the LymphTF Database. Stages in bold represent terminal stages of differentiation, purple B stages are antigen dependent, blue T stages develop in the thymic cortex (note: DP and γδ-T cells develop as they migrate to the medulla), brown T stages develop in the medulla, and red T stages develop in the periphery.
Figure 4 depicts the staging scheme employed in this project. This is not a complete picture of all stages that are used to describe lymphocyte development. As has been mentioned previously, naïve CD8+ T cells leave the thymus and await interaction with antigens, which recruits production of the cytotoxic machinery so that they become effector cells. These cells are then capable of destroying virally infected cells. Improvements in staining techniques have allowed for the recent advances in studying these cells. Additionally, improvements in cell-sorting techniques, identification of new cell surface markers and characterization of cytokine production profiles have led to a theory of late-stage development of CD8+ cells. The distinctions available with this new information can conceivably be considered an extension of T cell development (van Baarle et al, 2002).
Many examples of debate exist in the field of immunology when considering lymphocyte development. The common lymphoid progenitor had been roundly accepted as the developmental ancestor of B and T cells (Kondo et al, 1997). However, Allman and co-workers have recently shown an early T lineage progenitor that seems to generate T cells in Ikaros deficient mice whose B cell production had been halted. Ikaros is known to act in the very early CLP cells (Georgopoulos, 2002). T cell production did go awry in the mutant mouse, speaking to the necessity of Ikaros at later stages of development, but this finding has put into doubt the theory that CLPs are the only source for thymopoiesis (Allman et al, 2003). In fact, the existence of a common lymphoid progenitor cell has been directly challenged. It has been suggested that rather than a progenitor cell type, a ‘profile’ of genetic expression determines function. These profiles can be described using graph theory as collections of smaller motifs. With this perspective it is possible to define development as a progression of accumulating motifs that define the phenotype of a cell by its underlying structure of genetic expression profiles. This type of arrangement fits nicely with the observation that development is partially defined by what potential exists. For example, a pre-pro B cell might differentiate into a thymocyte if transplanted to the thymic cortex. This relocation could be seen as changing the motifs that make up an expression profile to that of a T cell (Warren and Rothenberg, 2003). In an alternate study, Ikaros-deficient lymphoid precursors in the thymus gave B cells as opposed to T cells, which would seem to refute the work by Allman and others (Hardy, 2003). For the purposes of this project, the controversy around any particular stage of development is troublesome because, in most cases, the literature reflects the thinking on stage delineation at the time of publication. This has led us to a pragmatic approach of assigning stages when a specific stage was not mentioned in the literature (please see Materials and Methods for our strategy).
Study of B and T lymphocyte development has many features which distinguish the field as one of significance in many diverse disciplines. Examples range from serving as a model for progressive cellular differentiation, to clinical importance in several types of human cancers, to serving as a test system for the study of gene expression. This section will attempt to highlight the importance of this field of study, and set up a subsequent description of an existing knowledge gap within the community.
Developmental model system
The process of mammalian development proceeds from a diploid, fertilized egg to an enormously complex adult. The egg’s progression through development depends on subtle differences in gene expression and soluble factors which become exaggerated as cells divide. The entire process involves partially understood acquisition of unique cellular properties. This description is remarkably similar to that of B and T cell development. A population of hematopoietic stem cells is capable of becoming many different types of lymphocytes (akin to a fertilized egg). These cells respond to environmental stimuli by following a particular developmental program. The pathways gradually become more restricted in their ability to differentiate into various cell types. Again, the parallels with whole organism development can be seen. Because lymphocyte development is an ongoing process and sample collection is easy, the system lends itself as a model for studying development of cell types by a series of binary choices. Stem cell progression to a differentiated cell type has received much attention in recent years due to the anticipation of developing cell-based therapy for human disease. However, the accretion of properties such as progressively restrictive developmental potential or epigenetic changes is not fully understood. B and T cells are in the enviable position of being very well characterized, plentiful, and relevant for such study. As they develop we see the need for cell surface receptors to transmit survival signals. The molecular cascades triggered have begun to be elucidated, and involve transcription factors in positive, negative and auto-feedback loops, as well as effecting change on genes known to be involved in cell cycle progression and immune system function.
Just as importantly, lymphocyte development depends upon culling a massive population by removing unnecessary or harmful cells. This process proceeds either by direct apoptosis of cells without productive genomic rearrangements, or by neglect of cells which cannot interact properly with the so-called ‘self antigens’ MHC-I and MHC-II. Apoptosis exists elsewhere in the developmental process. Pruning of the nervous system is a well known example, with approximately 50% of neurons present in developing vertebrates succumbing to programmed cell death or apoptosis. Undifferentiated neural stem cells undergo apoptosis as well. This begins in mice as young as E6.5 (6.5 days post fertilization). Disruption of the process leads to developmental defects before terminal neural differentiation, indicating that multipotential cells are selectively being destroyed. This is analogous to the apoptosis seen in developing B and T cells (Yeo and Gautier 2004). It is not immediately clear if a similar mechanism is employed by both populations. However, as an indication of the potential for using lymphocytes as a model system to study stem cells (or multipotential cells), Notch1 has been shown to be important for both neural development and lymphocyte development. Notch1 in lymphocytes is involved in early lineage decisions and in later stages of T cell development in the β-selection process. Notch1 is also part of the signaling pathway that the T cell receptor uses to promote an αβ-T cell fate, and to avoid apoptosis (Kruisbeek et al, 2000). Notch1 was recently shown to also function in early neurodevelopment via apoptosis and in neurodegeneration associated with aging. Characterizing functions and pathways in blood cells is a vastly easier task than in neural tissue, which shows the power of using lymphocyte development as a model system.
Clinical importance
In addition to serving as a general model system for development, the study of lymphocyte development has practical, clinical importance. Study of stem cells for use in therapy has already been mentioned. There are other, more direct examples. Many leukemias and lymphomas are the results of chromosomal aberrations that lead to dysregulation of normal transcription factor signaling. Among malignancies of lymphoid origin, mantle cell lymphoma represents about 12% of all non-Hodgkin lymphomas and has one of the poorest prognoses for any type of lymphoma. A reciprocal translocation, t(11;14)(q13;q32), is characteristic of this disease. The aberration places the cyclin D1 gene under the control of the Ig heavy chain enhancer (Thelander et al, 2005). During the pre-B cell stage (see below) the Ig gene locus is activated by the transcription factors E2A, DR1 and ELK3 (Romanow et al, 2000; Rajaiya et al, 2005; Lopez et al, 1993), and is repressed by PAX5 (Singh, 1993) at other stages of B cell development. From a clinical point of view, the pathways involving these transcription factors may yield a more complete understanding of the disease pathology and may also yield novel drug targets. Rituximab (an anti-CD20 monoclonal antibody) when conjugated with radioactive iodine (I-131) has been used as an adjunct to chemotherapy. Future therapies may combine the antibody with drugs designed against molecules involved in the otherwise inappropriate action of the signals generated from the chimeric transcription factor, rather than the generalized killing action of a radioisotope.
In addition to cancers, dysregulation has been implicated in autoimmune conditions such as Crohn’s disease (CD), which is a form of inflammatory bowel disease (IBD). The etiology of this condition is still under some debate. Mutations in the CARD15/NOD2 gene have been associated with increased incidence; stress, abnormal antibody production to intestinal flora, and increased platelet count are also among the proposed causative factors. A theme that seems reasonably consistent is the irregular cytokine production seen in gut lymphocytes from Crohn’s patients. The CD4+ lymphocytes of the mucosal lining show an increase in interferon-γ (IFN-γ) secretion (Fuss, 1996). Tumor necrosis factor α (TNF-α) and interleukin-12 (IL-12) are also known to be up-regulated in Th1 lymphocytes from CD patients (Cominelli 2004). The connection between genetic susceptibility from CARD15 mutations and aberrant cytokine production is not established, but CARD15 is known to be activated by IFN-α and IFN-γ and is involved in the NF-κB signaling pathway response to enteric bacteria. Although far from a direct link, this pathway may participate in the autoimmune nature of Crohn’s disease. In this context, understanding which expression profiles are affected by cytokine fluctuations is important for understanding the underlying disease process, identifying possible targets for therapy and improving diagnostic ability.
Genetic control model system
In addition to using B and T cell development as a model system of differentiation and for their obvious clinical importance, more fundamental questions of molecular biology are being answered by research with these cells. Genetic expression has been the topic of intense and often difficult work. Control of expression has been revealed to consist of many layers. Epigenetic changes, heterochromatic vs euchromatic super-structure, DNA methylation, the presence of CpG islands and specific protein binding sites (promoter, enhancer and repressor elements and locus control regions) in the DNA sequence constitute important methods of transcriptional control, the first and arguably most fundamental layer of control. Transcription factor binding to specific nucleic acid sequences, which can activate or repress transcription, has been recognized as an important part of this process since the description of the lac operon by Jacob and Monod (Jacob et al, 1960). This process is conserved and has been extensively studied in mammals, aided by improvements in cell culture techniques and in vivo methods of transplantation, genetic engineering and gene targeting.
Murine B and T cell development has generated a very large amount of data in this regard. Indeed, evidence has emerged which places transcription factor activity into more than one layer of control. For example, the early B cell factor (EBF) has many direct targets for action, including the λ5 gene and the B29 gene (Smith et al, 2002; Åkerblad et al, 1999). Recently, however, an additional function for EBF in effecting epigenetic changes in conjunction with the Runx1 transcription factor has been described (Maier et al, 2004). Complex methods of regulation are also seen with Ikaros (Georgopoulos 2002) and NF-κB (Dejardin 2002). These two examples involve higher order DNA-protein interactions beyond the classic activation of a gene locus by transcription factor binding. Another recent example is the Aire transcription factor, thought to contribute to clumping, on the actual chromosome, of similarly controlled genes (Johnnidis et al, 2005). As an added layer of complexity, cooperation between transcription factors affects cell lineage decisions. The work of O’Riordan and Grosschedl in 1999 demonstrated a synergy between the E2A and EBF factors, which act in concert to up-regulate target genes such as λ5 and the RAG genes. This cooperation seems to be a unique feature of the pro B cell stage, with each factor having its own additional requirements in various stages of B cell development. While the authors could not determine the exact nature of the synergistic effects, they did suggest at least three possible mechanisms: cooperative binding; synergy lacking physical interactions, perhaps via accessory molecules; and finally a dose dependent mechanism that may have been related to the number of binding sites occupied (O’Riordan and Grosschedl, 1999). An additional mechanism can be inferred from the results of an interesting study which implicated EBF in the process of re-localizing an EBF responsive gene from a transcriptionally silent state in the pericentric heterochromatin to a more open conformation. The researchers used a fluorescence in situ hybridization technique to visualize the gene’s context. In both studies a dosage effect was seen. A stochastic relationship was noted between the amount of EBF and re-localization events; however, the relationship did not continue on to include transcription. This is consistent with a two part mechanism for synergy between EBF and E2A (Lundgren et al, 2000). The details of the cooperation aside, we can now add to the list of TF functions cooperation beyond simple association (O’Riordan and co-workers were unable to detect EBF-E2A complexes via ChIP analysis) and chromosomal structure modification. Clearly the role of transcription factors in gene expression is expanding and becoming more complex. As more interactions are described, organizing and storing this information is fast becoming a necessity.
Transcriptional Regulatory Networks
Transcriptional regulatory networks have begun to surface which attempt to place activity into an abstract model that explains functionality. These networks can be generated algorithmically from the results of specific high throughput experiments, or created with a bottom-up approach utilizing existing data. Using a genome-wide screen for regulator binding and subsequent gene activation, Lee and others constructed a preliminary transcriptional regulatory network in S. cerevisiae. The genome-wide screen identified regulator-DNA interactions (a total of nearly 4000), but could not give details on the nature of action such as synergy or chromosomal rearrangements. Also, the effects of dosage could not be assessed due to the nature of the expression measurements, which were based on massive chromatin immunoprecipitation detected by microarray analysis. The number of interactions deemed reliable was dependent on the p-value given to each detection event. Despite this drawback, six types of network motifs (or modes of action) were identified. The authors then developed an algorithm for transcriptional regulatory network generation which integrated the promoter sequence binding information with expression analysis from more than 500 expression studies. The resulting network was capable of reliably predicting constituents of various network motifs (Lee et al, 2002). This example shows the use of high throughput data, integration of existing knowledge and algorithmically based networks. The accuracy of a network is, however, still dependent on the accuracy and completeness of the body of information available on the subject.
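The dependence of a network on the p-value cut-off can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the regulator names, the p-values, the default threshold, and the function names are hypothetical and are not taken from Lee et al. The sketch filters regulator-target binding events by p-value and then looks for two simple motifs, autoregulation and the feed-forward loop.

```python
from itertools import permutations

# Hypothetical binding events: (regulator, target, p_value).
# Names and values are illustrative only, not data from any study.
BINDING_EVENTS = [
    ("A", "B", 0.0005),
    ("A", "C", 0.0002),
    ("B", "C", 0.0008),
    ("C", "C", 0.0001),
    ("B", "A", 0.2),
]


def reliable_edges(events, p_threshold=0.001):
    """Keep only regulator -> target pairs whose p-value passes the cut-off."""
    return {(reg, tgt) for reg, tgt, p in events if p <= p_threshold}


def autoregulators(edges):
    """Regulators that bind their own promoter (self-loops)."""
    return sorted(reg for reg, tgt in edges if reg == tgt)


def feed_forward_loops(edges):
    """Triples (x, y, z) where x regulates y, y regulates z, and x regulates z."""
    nodes = {node for edge in edges for node in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edges and (y, z) in edges and (x, z) in edges
    )


if __name__ == "__main__":
    strict = reliable_edges(BINDING_EVENTS)
    print(autoregulators(strict))
    print(feed_forward_loops(strict))
    # A looser cut-off admits the weak B -> A event and changes the motif count.
    print(feed_forward_loops(reliable_edges(BINDING_EVENTS, p_threshold=1.0)))
```

Loosening the threshold admits one additional weak binding event and thereby adds a second feed-forward loop, which is the sense in which the predicted motifs are only as reliable as the underlying detection events.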
Yeast is a simple eukaryotic organism for which much information regarding expression regulation is available, making this opportunity unique. Information on the transcriptional regulation of murine T and B lymphocyte development is, however, more disparate, and its study is hindered by the complexity of performing certain molecular techniques (e.g. mutational analysis). Much progress has been made in recent years, and transcriptional regulatory networks are beginning to emerge. The indispensable roles of E2A and EBF in early B cell development are well known and have already been mentioned. Singh and coworkers have attempted to create a model network to explain transcription factor action. The network is based on expert knowledge of the subject and was manually created from the results of many studies. The resulting self-sustaining network encompasses seven transcription factors, five of which are present in our database in its present form. However, we have also included two factors not represented in the proposed model: STAT5 and ETS1. It is possible that these factors are not important for the network (which attempts to model commitment to the B cell fate), but STAT5 activates two genes known to be involved in cell cycle progression (p21 and cyclin dependent kinase 2) and a cytokine signal modifier (SOCS1). Developmental progress is known to be dependent on meeting checkpoints prior to progression (such as gene recombination of the J and D genes in the case of early B cells) and on cytokine signaling through the interleukin 7 receptor. It is possible that the proposed network might be augmented by these gene products controlling the cell cycle or by the SOCS gene attenuating signals from various interleukins. Importantly, SOCS1 does not attenuate the IL-7 signal. Cyclin D3 has been identified as a direct target of E2A in human cells (Song et al, 2004). This may lend support for an expanded transcriptional regulatory network (TRN) that incorporates more cell cycle regulatory molecules. The evidence for this interaction was based on ChIP analysis, but mutation in E-box sites on the genome did not abolish the interaction. The possibility of E2A modifying chromosomal structure was not explored; however, that seems to be a possibility given this demonstrated activity. The main point is that when assembling such a network model, a reservoir of pertinent knowledge is essential. Our database project attempts to fill this need.
Knowledge Gap
An enormous amount of work has been done describing B and T cell development. The last several years in particular have seen a large increase in the amount of information related to transcription factor activity and its role in gene expression control networks (Hardy 2003, Bommhardt et al, 2004). The function of B and T cells in immunity is well known, and the interplay between the two as related to this function has been an area of intense research for many years. This research has been especially active given the role of these cells in HIV infection and the AIDS pandemic. The disease process attacks the CD4+ T helper cell population, which is a vital component of the cooperation seen between the B and T cells.
Transcriptional Regulatory Networks
Recently, progress has allowed for construction of transcriptional regulatory networks (see also ‘Importance’ section) to model the commitment of a progenitor cell toward a particular lineage. These networks are in the nascent stages of development, but are an important step toward understanding the complexities of the regulatory process. Most algorithmically driven networks share similar characteristics. They depend on very basic modes of action; that is, a transcription factor either turns a gene on or off. A nested hierarchical structure is then built up from basic modules, which are combined to create complex networks that potentially model different cell types and the process of development (or differentiation), where the emergence of a different network topology defines a different cell type. Fundamentally, there are two types of mathematical representations of network circuit behavior: Boolean and continuous functions. The details of how the two differ are not critical here, but both rely on a simple mathematical statement describing the action of transcriptional activators: “if <condition> then <outcome>”. Measurement of gene expression (from microarrays, for example), transcription factor concentration (either a binary value or a continuous measurement), the dissociation constant between TFs and target DNA, and the concentration of target gene mRNA (which serves as a feedback signal) are among the physical measurements that can be fed into the models. These parameters are then modeled mathematically and describe either a ‘condition’ or an ‘outcome’ in the statement above (Bolouri and Davidson 2002). High throughput experiments have provided a plethora of data to feed into these models, but the devil is in the details of the network configuration. Here the detail is the ‘if <condition>’ statement, the assumption being that the model can reliably recapitulate the many modes of action cells use to regulate gene expression. The relatively newly described action of E2A chromatin remodeling constitutes a contingency-based parameter that potentiates EBF function and complicates the ‘on or off’ paradigm upon which the model is based. Once again, storing transcription factor interactions for target genes is important so that when models are built they are sufficiently complex to reflect the modes of action seen in vivo.
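The “if <condition> then <outcome>” statement can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the formulation used by Bolouri and Davidson: the function names are hypothetical, the continuous rule uses a standard Hill-type saturation curve chosen here for simplicity, and the contingent rule is only one possible way to encode the E2A-dependent chromatin state described above.

```python
def boolean_rule(tf_present: bool) -> bool:
    """Boolean form: if the activator is present, then the target is on."""
    return tf_present


def continuous_rule(tf_conc: float, k_d: float, hill_n: float = 1.0) -> float:
    """Continuous form: fractional activation between 0 and 1.

    Assumed Hill-type curve driven by TF concentration and the
    TF-DNA dissociation constant (both in the same units).
    """
    if tf_conc < 0 or k_d <= 0:
        raise ValueError("tf_conc must be >= 0 and k_d must be > 0")
    return tf_conc ** hill_n / (k_d ** hill_n + tf_conc ** hill_n)


def contingent_rule(ebf_present: bool, chromatin_open: bool) -> bool:
    """Contingency-based form: EBF acts only if the locus is accessible.

    The chromatin_open flag stands in for prior remodeling activity,
    which a plain on/off rule for EBF alone cannot express.
    """
    return ebf_present and chromatin_open


if __name__ == "__main__":
    print(boolean_rule(True))
    print(continuous_rule(tf_conc=2.0, k_d=2.0))
    print(contingent_rule(ebf_present=True, chromatin_open=False))
```

In the third rule the presence of EBF alone is not sufficient for an ‘on’ outcome, which is the kind of contingency that must be recorded somewhere before a model can include it.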
Gene Expression
Beyond networks, our understanding of gene expression as an important biological phenomenon has continued to evolve. Recent estimates of the number of genes in humans have been revised from near 100,000 to a smaller number of around 20,000-25,000 (although recent challenges to this estimate have come, most notably from Cheng and co-workers analyzing the human transcriptome) (Cheng et al, 2005). The final count may not be agreed upon for some time. It seems clear, though, that the complexity of our species does not come solely from the numbers and types of genes we possess. As mentioned above, the methods controlling gene expression can be very complex, and are still being characterized and described. Indeed, the latest entry into our database, ThPOK (T helper-inducing POZ/Krueppel factor), was only assigned functionality in T cell development earlier this year (He et al, 2005). However, information at this level of detail has begun to accumulate. Data from microarray and other high throughput experiments have also begun to make a mark on the field. In a massively parallel experiment, Greenbaum and Zhuang identified target genes for E2A using a gene-tagging system with ChIP analysis (Greenbaum and Zhuang, 2002). In another study, thymocytes of varying stages of development were sampled by microarray analysis to identify transcription factors expressed during this process. This particular study looked for genes annotated by Gene Ontology as possessing transcription factor activity and having been either expressed or repressed during the time frame, as measured by a fold change of two or greater compared to all other stages where that gene was present. Although this type of analysis is dependent on the accuracy of the GO annotations, and genes with consistently high expression might be missed, a large number of candidate genes have been reported. Many of the factors have well-known roles in T cell development (Tabrizifard et al, 2004).
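The fold-change criterion described above can be sketched as follows. This is a hedged reading of the criterion as it is summarized here, not the published analysis pipeline: the function name, the stage labels, and the expression values are hypothetical, and a level of zero is assumed to mean that the gene was not detected at that stage.

```python
def stage_specific_changes(expression, min_fold=2.0):
    """Flag stages where a gene is up or down by at least min_fold
    relative to every other stage in which the gene is present.

    expression: dict mapping stage name -> expression level
                (0 is taken to mean 'not detected' at that stage).
    Returns a dict mapping stage name -> 'up' or 'down'.
    """
    hits = {}
    for stage, level in expression.items():
        others = [v for s, v in expression.items() if s != stage and v > 0]
        if level <= 0 or not others:
            continue
        if all(level / other >= min_fold for other in others):
            hits[stage] = "up"
        elif all(other / level >= min_fold for other in others):
            hits[stage] = "down"
    return hits


if __name__ == "__main__":
    # Illustrative values only.
    print(stage_specific_changes({"DN": 10, "DP": 50, "SP": 12}))
    # Consistently high expression produces no hit at any stage.
    print(stage_specific_changes({"DN": 100, "DP": 110, "SP": 105}))
```

The second call shows the limitation noted in the text: a factor expressed at a uniformly high level never reaches the fold-change cut-off and is therefore not reported.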
Undoubtedly, the use of such techniques will continue and will add to the interactions that have been demonstrated more rigorously. For the specific arena of B and T cell development the number of interactions is relatively small, but is still quickly becoming more than can be completely assimilated by experts in the field. Recently, many good reviews have been written that attempt to highlight the key players involved in the development of these two cell types. Given the breadth of information available, however, these reviews typically focus on a specific part of the process such as early B cell development (Busslinger, 2004), a particular transcription factor such as NF-κB (Beinke and Ley, 2004) or include only a portion of the TFs and targets involved (Kuo and Leiden, 1999; Barholdy and Matthias, 2004). Storing the transcription factor information that has been described so far in a freely accessible database would allow researchers quick access to information about molecules with which they are less familiar. However, this is an area in need of work. The scientific community clearly sees the need to store the deluge of interactions, elucidated through time consuming laboratory work, in freely accessible repositories or databases. The following section will attempt to highlight similar research efforts in the area of database storage of such information, while avoiding the topic of regulatory networks, which has been described elsewhere.
Related Research
Curated databases have emerged as an important topic in biology and bioinformatics. The sheer amount of information available calls for such computerized storage methods. These databases can take several forms. Entire web-based databases have been devoted to a single transcription factor’s target genes. An example of this type is the nf-kb.org (http://people.bu.edu/gilmore/nf-kb/target/index.html) website maintained by Dr. Gilmore at Boston University. This site is a repository of information related to the Rel/NF-κB genes. Although the site is not a database in the sense of storing records in a structured manner, it does maintain information in a list format separated in a logical organization. A true database is maintained at the Ohio State University Medical School which holds important promoter sequences involved in hematopoiesis. This database, HemoPDB (http://bioinformatics.med.ohio-state.edu/HemoPDB/), is a web-accessible application which is searchable and provides useful links to additional information when presenting results. Genetic regulators in E. coli, both cis- and trans-acting, have been stored in the RegulonDB (http://www.cifn.unam.mx/Computational_Genomics/regulondb/). This is another example of a database that is manually curated, and it is also based on computational predictions. Original scientific literature was used as the data source for the curated data. Maintained by the Program of Computational Genomics in Mexico, this web based application stores information on transcriptional regulation, cis-element organization, and different expression conditions in the K-12 strain of E. coli.
To our knowledge, a database containing interactions between transcription factors and their targets during lymphocyte development is not freely available. This information is, however, slowly being published in the scientific literature. Despite the progress in characterizing the transcription factors involved in B and T cell development, identification of their targets has proven much more difficult (Kuo and Leiden 1999). This gap in the molecular details of genetic interactions in lymphocyte development presents problems in defining networks because transcription factors are invariably targets of other transcription factors. Also, these factors are known to regulate even their own expression (Smith et al, 2002; Glimcher and Murphy, 2000). What is not known is to what extent these factors regulate themselves and each other, because not all the targets are known. This project attempts to fill the gap that exists in warehousing important genetic interactions in two different ways. The first is to provide a searchable database of the transcription factor activity known to be present in developing murine lymphocytes. Secondly, by storing these interactions between TF and target genes in developmental time, the database can serve as a base to develop transcriptional regulatory networks for this process.
The Database
Here we introduce a web based database designed to hold transcription factor interactions involved in B and T lymphocyte development. Entitled ‘LymphTF Database,’ the database is accessible on the World Wide Web at http://www.iupui.edu/~tfinterx/. This database has been designed to logically follow transcription factor activity during lymphocyte development, including the sometimes ephemeral nature of transcription factors and their target genes. We have also designed the site to be specifically useful for immunologists by creating intuitive searches and easy to interpret results. Bioinformatics professionals will find the information valuable when modeling the complex process of lymphocyte development.
Materials & Methods
At the most basic level, this is a literature review project which collects protein-DNA interactions and presents that data in a simple format. The project is designed to collect and organize the data in a manner that makes sense to researchers in the field of lymphocyte developmental biology, to serve as a reservoir of links to relevant information such as gene information databases and literature databases, and to provide a working definition of developmental stages as defined for the purposes of classification. A MySQL database, searched via a web page front end written in the PHP scripting language, was used to present the information in a freely available format.
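The idea of storing an interaction together with its developmental stage can be sketched in a few lines. The sketch below is an assumption-laden illustration and not the LymphTF schema: the table and column names are hypothetical, a single table is used for brevity, and Python's built-in SQLite module stands in for the MySQL and PHP combination actually used. The three example rows restate the E2A and EBF activity at the pro B cell stage discussed in the Importance section.

```python
import sqlite3

# Hypothetical single-table layout; the real database relates several tables.
SCHEMA = """
CREATE TABLE interaction (
    id      INTEGER PRIMARY KEY,
    tf      TEXT NOT NULL,                              -- transcription factor
    target  TEXT NOT NULL,                              -- target gene
    lineage TEXT NOT NULL CHECK (lineage IN ('B', 'T')),
    stage   TEXT NOT NULL,                              -- developmental stage
    effect  TEXT CHECK (effect IN ('activate', 'repress'))  -- NULL if unknown
);
"""

EXAMPLE_ROWS = [
    ("E2A", "lambda5", "B", "pro-B", "activate"),
    ("EBF", "lambda5", "B", "pro-B", "activate"),
    ("E2A", "RAG", "B", "pro-B", "activate"),
]


def build_demo_db():
    """Create an in-memory database holding the example interactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO interaction (tf, target, lineage, stage, effect) "
        "VALUES (?, ?, ?, ?, ?)",
        EXAMPLE_ROWS,
    )
    conn.commit()
    return conn


def targets_at_stage(conn, tf, stage):
    """Return (target, effect) pairs for one factor at one stage."""
    cursor = conn.execute(
        "SELECT target, effect FROM interaction "
        "WHERE tf = ? AND stage = ? ORDER BY target",
        (tf, stage),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(targets_at_stage(db, "E2A", "pro-B"))
```

Leaving the effect column nullable is one simple way to admit a record whose direction of regulation has not yet been reported, so that it can be filled in later without replacing the row.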
Definitions and General Comments
Some definitions are appropriate for completely explaining the contents of this database. Transcription factors have been defined as proteins with a known DNA binding site which either activate or repress expression of one or more target genes. Target genes are defined as a verified locus encoding one or more protein products. Stages of lymphocyte development have been in use for several years as a method to describe the progression of differentiation. The development of improved sorting techniques and differences in cellular potential have contributed to this study (Hardy 2003). Interactions have been defined as a transcription factor’s effect on a specific target at a given stage of either T or B cell development. Work in immunology (molecular and otherwise) has been done using mice, humans, rats, goats, rabbits and other animals. For consistency, the