of the interaction of the sunflower KNOX protein HAKN1 with DNA
Mariana F Tioni, Ivana L Viola, Raquel L Chan and Daniel H Gonzalez
Cátedra de Biología Celular y Molecular, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Santa Fe, Argentina
Homeobox genes encode a group of eukaryotic transcription factors generally involved in the regulation of developmental processes [1]. These genes contain a region coding for the homeodomain, a 60 amino acid protein motif that interacts specifically with DNA [2]. The homeodomain folds into a characteristic three-helix structure. Helices I and II are connected by a loop, while helices II and III are separated by a turn, resembling prokaryotic helix-turn-helix transcription factors. However, unlike helix-turn-helix-containing proteins, most homeodomains are able to bind DNA as monomers with high affinity, through interactions made by helix III (the so-called recognition helix) and a disordered N-terminal arm located beyond helix I [3–6].
In plants, the first homeobox was identified in the maize gene Knotted1 (kn1; [7]). Dominant mutations in kn1, which is normally active only in meristematic cells, affect leaf development due to its aberrant expression in these organs [8]. Additional kn1-like genes (also termed knox genes) have been isolated from maize and other monocot and dicot species
Keywords
DNA-binding specificity; footprinting; homeodomain; KNOX protein; recognition code
Correspondence
D H Gonzalez, Cátedra de Biología Celular y Molecular, Facultad de Bioquímica y Ciencias Biológicas (UNL), CC 242 Paraje El Pozo, 3000 Santa Fe, Argentina
Fax/Tel: +54 342 4575219
E-mail: dhgonza@fbcb.unl.edu.ar
(Received 13 July 2004, revised 31 August 2004, accepted 21 September 2004)
doi:10.1111/j.1432-1033.2004.04402.x
The interaction of the homeodomain of the sunflower KNOX protein HAKN1 with DNA was studied by site-directed mutagenesis, hydroxyl radical footprinting and missing nucleoside experiments. Binding of HAKN1 to different oligonucleotides indicated that HAKN1 prefers the sequence TGACA (TGTCA), with changes within the GAC core more profoundly affecting the interaction. Footprinting and missing nucleoside experiments using hydroxyl radical cleavage of DNA showed that HAKN1 interacts with a 6-bp region of the strand carrying the GAC core, covering the core and nucleotides towards the 3′ end. On the other strand, protection was observed along an 8-bp region, comprising two additional nucleotides complementary to those preceding the core. Changes in the residue present at position 50 produced proteins with different specificities. An I50S mutant showed a preference for TGACT, while the presence of lysine shifted the preference to TGACC, suggesting that residue 50 interacts with nucleotide(s) 3′ to GAC. Mutation of Lys54 → Val produced a protein with reduced affinity and relaxed specificity, able to recognize the sequence TGAAA, while the conservative change of Arg55 → Lys completely abolished binding to DNA. Based on these results, we propose a model for the interaction of HAKN1 with DNA in which helix III of the homeodomain accommodates along the major groove with Arg55, Asn51, Lys54 and Ile50, establishing specific contacts with bases of the GACA sequence or their complements. This model can be extended to other KNOX proteins given the conservation of these amino acids in all members of the family.
Abbreviations
TALE, three-amino-acid loop extension.
(reviewed in [9]), indicating that this class of genes constitutes a family present throughout the plant kingdom. The knox family of genes can be subdivided into two classes, I and II, by sequence relatedness and expression patterns [10]. Based on the expression patterns [11–13], analysis of mutants [14–17] and overexpression studies [18–21], it was proposed that class I knox genes are involved in the maintenance of meristematic cells in an undifferentiated state. Indeed, overexpression of some class I genes in Arabidopsis and tobacco produces the proliferation of meristems on the surface of leaves.
The proteins encoded by knox genes belong to the three-amino-acid loop extension (TALE) superclass. Members of this superclass contain three extra amino acids within the loop connecting helices I and II [22] and are present in several eukaryotic kingdoms, suggesting that they represent an early evolutionary acquisition.
Concerning their interaction with DNA, studies with proteins from barley [23], tobacco [24], rice [25] and maize [26] indicate that they bind related sequences containing a TGAC core (GTCA in the complementary strand), considerably different from the sequence TAAT recognized by most homeodomains [27]. Elucidating the structural basis for this difference would help to understand at the molecular level how KNOX transcription factors recognize their DNA target site.
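The relationship between a core sequence and its reading on the complementary strand can be checked with a few lines of Python. This is only an illustrative sketch; the function name `reverse_complement` is our own and is not part of the original study.

```python
# Watson-Crick pairing for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Cores mentioned in the text: the KNOX core, the preferred HAKN1 site,
# and the canonical TAAT box recognized by most homeodomains.
for site in ("TGAC", "TGACA", "TAAT"):
    print(site, "<->", reverse_complement(site))
```

Applied to TGAC and TGACA, the function returns GTCA and TGTCA, the complementary-strand readings quoted in the text.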
In this study, we analysed the interaction of the homeodomain of HAKN1, a sunflower class I KNOX protein [28], with DNA. Based on studies of wild-type and mutant forms of the homeodomain, we propose a model for the complex between HAKN1 and its target site. This model must be applicable to all KNOX homeodomains, as important amino acids are conserved within this family.
Results
Expression and DNA binding analysis of the HAKN1 homeodomain
The homeodomain of the KNOX transcription factor HAKN1 was expressed in Escherichia coli as a fusion with the maltose binding protein using vector pMALc2. The fusion protein was purified by affinity chromatography in amylose resin and used for DNA–protein interaction studies. A 24-bp oligonucleotide (HAKN1 binding site; BS1) containing the sequence TGT(G/C)ACA was used as DNA target. This sequence was designed against a compilation of sequences bound by KNOX transcription factors from different species, and contains the TGAC (GTCA) core that is present in all of them.
Figure 1A shows an electrophoretic mobility shift assay performed with HAKN1 and oligonucleotide BS1 or variants containing changes at single positions (sequences shown in the right panel). We have arbitrarily numbered from 1 to 7 those positions present in the strand that contains the central G. Two shifted bands of similar intensity were observed in this experiment. The relative intensity of the low mobility complex varied when different protein preparations were used. We speculate that this behavior may arise from aggregation of the protein. Nevertheless, different protein preparations showed the same specificity and affinity when considering the amount of bound protein as the sum of both shifted bands. These bands displayed similar footprinting patterns (see below), suggesting that a single HAKN1 homeodomain is bound to DNA in both complexes. This is strengthened by the fact that only monomer–DNA complexes were observed in crosslinking experiments (data not shown).
Fig. 1. Binding of HAKN1 to different oligonucleotides. (A) Electrophoretic mobility shift assay performed with 30 ng of HAKN1 and oligonucleotides containing different variants of the sequence TGT(G/C)ACA (numbers indicated above each lane). (B) Competition assay of HAKN1 binding to BS1 using a 15-fold molar excess of different oligonucleotides (numbers indicated above each lane) as competitors. The sequence of the 7-bp core present in each oligonucleotide is shown in (C) for reference. Modifications with respect to BS1 are shown within black boxes.
Analysis of the interaction of HAKN1 with different oligonucleotides indicates that modifications in the outermost positions (1 and/or 7) do not significantly affect binding (Fig. 1A, lanes BS1, 1,7, 7T, 1 and 7C), while certain inner nucleotides, notably those located at positions 4–6, are critical for binding (Fig. 1A, lanes 4, 6A and 5). Regarding position 7, the change of A for T does not seem to affect binding, while the introduction of C partially reduces the amount of complex formed. Mutations at positions 2 (not shown) and 3 (lane 3) have only a moderate effect. Similar observations could be made in experiments in which the binding to oligonucleotide BS1 was competed with a 15-fold molar excess of different oligonucleotides (Fig. 1B). These results indicate that HAKN1 mainly recognizes the GAC (GTC) trinucleotide and displays lower specificity at outer positions. The GAC triplet is contained within the TGAC sequence, found to be part of the binding sites of the barley KNOX protein Hooded [23] and of maize Knotted1 [26]. This element is also present in the sequence GTNAC, postulated to be important for the binding of the tobacco protein NTH15 to DNA [24], provided that N is G or C.
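The position numbering used for the binding-site variants can be made explicit with a short Python sketch. It assumes the 7-bp core of BS1 is TGTGACA on the strand carrying the central G (positions 1 to 7), and it includes only those variants whose substitutions are stated in the text; the helper `mutate` and the dictionary `VARIANTS` are hypothetical names introduced here for illustration.

```python
# Top-strand 7-bp core of BS1; positions are numbered 1..7 as in the text.
BS1_CORE = "TGTGACA"

# Substitutions stated in the text (1-based position -> new base).
# Variants whose replacement base is not given in the text are omitted.
VARIANTS = {
    "BS(mut1,7)": {1: "A", 7: "T"},
    "BS(mut7T)": {7: "T"},
    "BS(mut7C)": {7: "C"},
    "BS(mut6A)": {6: "A"},
    "BS(mut6G)": {6: "G"},
}


def mutate(core: str, changes: dict[int, str]) -> str:
    """Apply single-base substitutions given as 1-based positions."""
    bases = list(core)
    for pos, base in changes.items():
        if not 1 <= pos <= len(core):
            raise ValueError(f"position {pos} is outside the {len(core)}-bp core")
        bases[pos - 1] = base
    return "".join(bases)


for name, changes in VARIANTS.items():
    print(f"{name:12s} {mutate(BS1_CORE, changes)}")
```

With this numbering, positions 4 to 6 spell the GAC trinucleotide that the binding assays identify as critical.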
Analysis of DNA binding by hydroxyl radical footprinting and interference assays
A more detailed picture of the binding of HAKN1 to its target site was obtained by the analysis of footprinting patterns after cleavage of free and protein-bound DNA with hydroxyl radicals generated by Fe–EDTA complexes. For this purpose, a dimer of the corresponding oligonucleotide ligated through its EcoRI cohesive site was cloned into the BamHI site of pBluescript SK–. Cleavage with HindIII and XbaI produces a 94-bp fragment that contains two HAKN1 binding sites in opposite orientations. After HAKN1 binding to the 94-bp oligonucleotide, labeled specifically at one of its 3′ ends by filling-in the HindIII site, the complex was subjected to hydroxyl radical attack, and free and bound DNA were separated, recovered from the gel and analysed by denaturing polyacrylamide gel electrophoresis (Fig. 2A). Because the oligonucleotide contains two sites in opposite orientation, both strands of the binding site can be observed in a single footprinting assay. Analysis of the cleavage patterns indicates that HAKN1 protects six nucleotides from the strand carrying the sequence TGTGACA (hereafter named the top strand). The protected area includes GACA and two adjacent nucleotides (GA) towards the 3′ end (Fig. 2A). On the bottom strand, the protected region covers two additional nucleotides, AC complementary to GT in TGTGACA (Fig. 2A). For both strands, the highest protection is observed within the GAC core, suggesting that the protein makes closer contacts in this region. This agrees with the important role of these nucleotide positions in determining the binding strength of HAKN1 to DNA shown by electrophoretic mobility shift assays. When the oligonucleotide labeled at its XbaI site (at the opposite 3′ end) was used, footprinting patterns were identical to those described above, indicating that HAKN1 makes equivalent contacts with both binding sites present in the 94-bp fragment.
Footprinting analysis was also performed with a similar oligonucleotide containing two mutated sites [BS(mut1,7); AGTGACT instead of TGTGACA, mutations underlined]. The results obtained were essentially the same (not shown), suggesting that HAKN1 contacts the nucleotide adjacent to the GAC core and its complement on the other strand whether they are A or T.
Information about the nucleotide positions that influence binding of HAKN1 to DNA was obtained from missing nucleoside (interference) experiments. Here, DNA is treated with hydroxyl radical-generating agents before protein binding, thus producing a population of molecules with single cleavages along the phosphodiester backbone. This population is incubated with the protein of interest and subjected to an electrophoretic mobility shift assay from which the free and bound fractions are recovered. Molecules with cleavages at positions important for binding are then under-represented in the bound fraction and, depending on the binding conditions, over-represented in the free fraction. Figure 2B shows a missing nucleoside experiment using HAKN1 and the 94-bp DNA fragment containing two binding sites previously labeled in one of its 3′ ends (HindIII or XbaI sites) and treated with Fe–EDTA. It is noteworthy that there is a good correlation between the region protected by HAKN1 and the nucleotide positions important for binding. This means that all nucleotides in the protected area establish contacts that contribute to binding efficiency. Again, the GAC core seems to be particularly important, but outside positions are also required (Fig. 2B). Within the core, modifications to G and A or their complements influence binding more markedly. These results agree with the fact that mutations of these two nucleotides abolish binding of HAKN1 to DNA. On the other hand, because nucleotides at outside positions can be mutated without significant loss in binding efficiency, it can be assumed that they mainly participate in nonspecific contacts, such as those established with the sugar–phosphate backbone.
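The logic of a missing nucleoside experiment can be sketched in Python as a comparison of band intensities between the bound and free lanes. This is not the quantification procedure used by the authors: normalizing each lane to its total signal, the 0.5 cutoff and the function names are all assumptions made for illustration, and the numbers in the usage example are invented toy values, not experimental data.

```python
def normalise(lane: dict[str, float]) -> dict[str, float]:
    """Express each band as a fraction of the total signal in its lane."""
    total = sum(lane.values())
    return {pos: value / total for pos, value in lane.items()}


def interference_ratios(bound: dict[str, float], free: dict[str, float]) -> dict[str, float]:
    """Bound/free ratio per position; low values mean the nucleoside matters."""
    b, f = normalise(bound), normalise(free)
    return {pos: b[pos] / f[pos] for pos in bound}


def important_positions(bound, free, cutoff=0.5):
    """Positions whose cleavage products are under-represented in bound DNA."""
    ratios = interference_ratios(bound, free)
    return [pos for pos, ratio in ratios.items() if ratio < cutoff]


# Toy intensities (arbitrary units), invented purely to exercise the functions.
toy_bound = {"n1": 100.0, "n2": 20.0, "n3": 80.0}
toy_free = {"n1": 100.0, "n2": 100.0, "n3": 100.0}
print(important_positions(toy_bound, toy_free))
```

In this toy case only the second position falls below the cutoff, mirroring how cleavage at a nucleoside required for binding depletes the corresponding band from the bound fraction.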
The results of footprinting and missing nucleoside experiments also indicate that HAKN1 does not make symmetrical contacts with its target site. The protein establishes contacts with both strands at the right side of the GAC core, while only one strand seems to be contacted at the left side. This lack of symmetry and the extension of the contacts most probably indicate that only one molecule of HAKN1 is bound at each target site.
Fig. 2. Hydroxyl radical footprinting and interference analysis of HAKN1 binding to DNA. An oligonucleotide containing two HAKN1 binding sites (BS1) in opposite orientations was labeled at the 3′ end of either strand (HindIII or XbaI sites) and subjected to hydroxyl radical attack either after (A) or before (B) HAKN1 binding. Free (F) and bound (B) DNA were separated and analysed. A portion of the same fragment digested with defined restriction enzymes was used as a standard (S) to calculate the position of the footprint. Letters to the right of each panel indicate the DNA sequence (5′ end in the upper part) of the corresponding strand in this region. In the lower part, the sequence of the binding site is shown and the protected positions are indicated in bold and underlined. The GAC (GTC) core that shows the highest protection is shaded.
Binding of HAKN1 single-site mutants to DNA
The picture that emerges from our results is that HAKN1 binds an 8-bp region of DNA with a tGACa (tGTCa) specificity core. An interesting question is how the HAKN1 homeodomain interacts with this sequence and which amino acids are involved in sequence-specific contacts. To answer this, we have analysed the effect of single-site mutations on HAKN1 binding to TGACA and variants of this sequence. It is logical to assume that changes in amino acids involved in the interaction must influence binding efficiency. In addition, some substitutions may alter binding specificity, indicating the existence of contacts between a given residue and defined positions within the DNA.
Residue 50 (53 in TALE homeodomains) is usually involved in determining the different specificities among related homeodomains [27,29–31]. In homeodomains that bind the canonical TAAT sequence, residue 50 interacts with nucleotides located 3′ to this site [27,31]. We reasoned, then, that changing Ile50, present in HAKN1 and all KNOX proteins, may influence sequence preferences at external positions of the core. As a first approach, we mutated Ile50 to Ser, present in the yeast TALE protein MATα2 [32]. The analysis of binding of I50S–HAKN1 to variants of the HAKN1 binding site indicates a preference for an oligonucleotide containing the sequence TGACT, while the wild-type HAKN1 homeodomain binds TGACA and TGACT with similar efficiency (Fig. 3A). This suggests that residue 50 interacts with the 3′ region of the top strand (and/or the 5′ region of the bottom strand), outside the GAC core. This is also evident in competition experiments (Fig. 3B), where oligonucleotides BS(mut1,7) and BS(mut7T) compete more efficiently than variants with A [BS1 and BS(mut1)] or C [BS(mut7C)] at this position. Changes at other positions within the target DNA sequence produced effects on binding similar to those observed with the wild-type protein (Fig. 3).
To further explore the hypothesis that residue 50 is oriented towards the 3′ end of the top strand, we also mutated Ile50 to Lys, present in Drosophila bicoid [33]. I50K–HAKN1 shows a clear preference for an oligonucleotide containing the sequence TGACCC [BS(mut7C)] over the original TGACAG, present in BS1 and BS(mut1) (Fig. 3C). This result confirms that residue 50 interacts with nucleotides adjacent to the TGAC core. Binding analysis with different oligonucleotides indicated that I50K–HAKN1 is also able to interact with oligonucleotide BS(mut6G), which contains a TGAG core (Fig. 3C). In fact, when higher protein concentrations were used in the assays, binding to TGAGAG was considerably better than to TGACAG (not shown), suggesting that Lys50 may also be able to contact the fourth position of the core, thus changing the preference to G. The inclusion of Lys at position 50, in addition to promoting a change in specificity, resulted in a protein with increased affinity towards its preferred binding site (Fig. 3D). An additional, fast-migrating band observed in this experiment is present in free DNA and may represent noncovalent oligonucleotide dimers interacting through their cohesive ends. We have observed that the presence of this species does not affect the intensity of the shifted band. The increased affinity displayed by I50K–HAKN1 may arise from the fact that lysine is able to establish hydrogen bonds with DNA, which are more stable than the van der Waals contacts established by Ile.
Fig. 3. DNA binding preferences of HAKN1 mutants at position 50. (A) Electrophoretic mobility shift assay of I50S–HAKN1 (30 ng) binding to BS1 and BS(mut1,7). (B) Binding of I50S–HAKN1 to BS(mut1,7) was competed with a 100-fold molar excess of oligonucleotides with different sequences (depicted in Fig. 1). (C) Binding of I50K–HAKN1 (30 ng) to different oligonucleotides was analysed by an electrophoretic mobility shift assay. In (D), the binding of different amounts (50, 100 and 250 ng) of either HAKN1 or I50K–HAKN1 to oligonucleotides BS1 and BS(mut7C) is shown. Oligonucleotide sequences are shown in Fig. 1.
The interaction of mutants at position 50 with their preferred binding sites was also analysed by footprinting experiments. I50S–HAKN1 protects a region covering five nucleotides of the top strand and six nucleotides of the bottom strand (Fig. 4A). This region is coincident with the one more strongly protected by wild-type HAKN1, but is shorter towards the 3′ end of the top strand. This result further suggests that Ile50 contacts the nucleotides located 3′ to the TGAC core, as its replacement by a smaller residue such as Ser allows better access of this region to the modifying agent. Conversely, I50K–HAKN1 shows an extended footprinting pattern towards the 3′ end of the top strand and the 5′ end of the bottom strand (Fig. 4B). This agrees with the presence of a larger residue that makes stable contacts with this region of DNA.
Fig. 4. Hydroxyl radical footprinting of I50S–HAKN1 (A) and I50K–HAKN1 (B) bound to their preferred binding sites. After binding and hydroxyl radical attack, free (F) and bound (B) DNA were separated and analysed. The left and right panels in (A) and (B) represent the top and bottom strands of the binding site, respectively. A portion of the same fragment digested with defined restriction enzymes was used as a standard to calculate the position of the footprint. Letters to the right of each panel indicate the DNA sequence (5′ end in the upper part) of the corresponding strand in this region. Below the footprints, the sequence of the corresponding binding site is shown and the protected positions are indicated in bold and underlined.
The interaction of mutants at position 50, and particularly of I50K–HAKN1 with DNA, provides a framework to build a model of HAKN1–DNA interactions, taking into account experiments performed with other homeodomains. The protein bicoid, for example, is able to bind the sequence TAATCC that contains the canonical TAAT box [31]. Lys50 of bicoid putatively interacts with the CC dinucleotide, as its mutation to Gln changes its preference to TAATTG [29]. A reciprocal change, Gln50 to Lys, in engrailed or fushi tarazu shifts sequence preferences from TAATTA or TAATTG to TAATCC [30,34]. We postulate, then, that positioning of the HAKN1 homeodomain along the TGAC core in DNA must be equivalent to that adopted by other homeodomains along the TAAT sequence. The third position of both sequences contains an adenine, known to interact with Asn51, universally conserved among homeodomains [2,31]. The importance of this interaction is reflected by the fact that this nucleotide cannot be mutated without a complete loss of HAKN1 binding. The fourth base in TAAT is usually recognized by a nonpolar amino acid (mostly Ile or Val) present at position 47 [2,31]. HAKN1 contains Asn at this position, which may be too small to establish specific contacts with bases. Asn47 does not make specific contacts in the homeodomain–DNA complexes of MATα2 and extradenticle [5,35]. Here, we favour the hypothesis that the fourth position of the core is contacted by Lys54, because the nucleotide next to that contacted by Asn51 is recognized by residue 54 in other homeodomains (see below). In support of a prominent role of Lys54, its mutation to Val produces a significant decrease in DNA binding (not shown). In addition, K54V–HAKN1 binds with similar efficiency to sequence variants containing either A [BS(mut6A)] or C (BS1) at the fourth position of TGAC, suggesting that it has a decreased discrimination capacity with respect to wild-type HAKN1 (Fig. 5). An oligonucleotide containing TGAG [BS(mut6G)], however, is bound with reduced efficiency, suggesting that the mutant homeodomain retains partial specificity. Discrimination at other positions of the bound region is similar to that displayed by the wild-type protein. Although the results obtained do not necessarily indicate a direct role of Lys54 in establishing contacts with DNA, a plausible explanation is that this residue interacts with at least one of the members of the C·G pair at the fourth position of TGAC in the HAKN1–DNA complex.
The two leftmost positions of the core interact through the minor groove with the N-terminal arm in most homeodomains [2,31]. In yeast MATα2, for example, the N-terminal arm makes base-specific contacts with the first two nucleotides of a TTAC core [5]. Hence, we replaced the N-terminal arm (residues 1–9) of HAKN1 with the same portion of MATα2, to determine if a change in specificity was observed. The resulting protein, Na–HAKN1, showed an overall reduced affinity but the same sequence preferences as HAKN1 (Fig. 6). It is noteworthy that it did not bind oligonucleotide BS(mut4), which contains a TTAC core on the complementary strand. This indicates that the N-terminal arm of MATα2 is not able to interact with DNA within the context of the HAKN1 homeodomain as it does within MATα2. Poor binding may arise from incorrect folding of the chimeric protein or from the fact that important contacts with DNA are lost upon replacement of the HAKN1 N-terminal arm. In addition to a role of the N-terminal arm in contacting the first two nucleotides of the core, examination of other homeodomain–DNA complexes suggests the possibility that Arg55 recognizes the second position of TGAC. Arg55 participates in binding to G residues in other homeodomains, such as yeast MATa1 (GATG; [36]) or Drosophila extradenticle (TGAT; [35]). Consistent with a role in DNA binding, an Arg55 to Ala mutation completely disrupts the interaction of HAKN1 with DNA (not shown). To further analyse its involvement in base-specific contacts, we reasoned that a conservative substitution for Lys would not affect nonspecific interactions (i.e. electrostatic interactions with the phosphate backbone), but would preclude the establishment of hydrogen bonds with the guanine base. The results shown in Fig. 6 indicate that R55K–HAKN1 is unable to bind DNA, supporting the hypothesis that Arg55 is involved in base-specific contacts, which are disrupted upon mutation to Lys. Another explanation would be that this change disturbs the overall folding of the homeodomain, but this seems unlikely because several homeodomains, notably MATα2, contain Lys at position 55. Assuming that the N-terminal arm of MATα2 and Arg55 may be incompatible as both portions may interact with the same positions of the target site, we also constructed a mutant in which the N-terminal arm of MATα2 was inserted into the R55K mutant of HAKN1. This protein was also ineffective in binding to the HAKN1 target site or its variants (Fig. 6).
Fig. 5. K54V–HAKN1 shows relaxed specificity. (A) Binding of K54V–HAKN1 (150 ng) to different oligonucleotides was analysed in an electrophoretic mobility shift assay. (B) Competition of K54V–HAKN1 binding to BS1 with a 25-fold molar excess of different oligonucleotides (depicted in Fig. 1).
Fig. 6. Effect of changes within the N-terminal arm and position 55 on the binding of HAKN1 to oligonucleotides BS1 and BS(mut4). Binding to oligonucleotides containing the sequences TGACA (BS1) or TTACA [BS(mut4)] was analysed using 30 ng of proteins HAKN1, R55K–HAKN1, Na–HAKN1 (a protein containing the N-terminal arm of MATα2) or R55K–Na–HAKN1 (a protein with both modifications).
A model for the HAKN1–DNA interaction
Based on the analysis of the binding of wild-type and mutant HAKN1 homeodomains to different DNA target sites, we propose a model for the interaction of HAKN1 with DNA. A set of four amino acids, located within helix III of the homeodomain, would make base-specific contacts with defined nucleotides within the tGACAg sequence. Arg55 would establish a pair of hydrogen bonds with positions O6 and N7 of guanine in GACA. As mentioned above, similar interactions have been observed in complexes of other homeodomains with DNA [35,36]. Asn51 would interact with the first adenine in GACA, also establishing a pair of hydrogen bonds, as in most homeodomain–DNA complexes. The next position (C, or G in the opposite strand) would be contacted by Lys54. Although there is no evidence in the literature about a specific contact made by a lysine at this position, residue 54 interacts with the nucleotide adjacent to that bound by Asn51 in several homeodomains, for example MATα2 (Arg54, TTAC; [5]), TTF1 (Tyr54, CAAG; [37]), bicoid (Arg54, TAAT or TAAG; [38]) and Hahr1 (Thr54, TAAA, in this case in combination with Phe47; [39]). Additionally, lysine determines a preference for C at an adjacent position when present at position 50 in bicoid and other mutant homeodomains (including HAKN1, see above), presumably by interacting with guanine bases through hydrogen bonds as observed in the Lys50–engrailed crystal structure [40]. Finally, our results also indicate that Ile50 is involved in establishing a preference for A or T at the 3′ side of the core. Mutations of this residue to Ser or Lys were able to confer a new binding specificity to HAKN1, changing to a clear preference for T or C, respectively. Ile50 is present in MATa1, where it interacts with a TA dinucleotide adjacent to the position contacted by Met54 [36]. Accordingly, Ile50 may also be involved in contacts with an adjacent position, which is protected by HAKN1 in footprinting experiments and interferes with binding when modified by hydroxyl radical attack.
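The proposed contacts and the observed effect of residue 50 on the base 3′ to the core can be summarized as a small lookup table. The Python sketch below only restates what the text proposes; the names `PROPOSED_CONTACTS`, `RESIDUE50_PREFERENCE` and `preferred_sites` are hypothetical, and the table is a reading aid rather than a predictive model.

```python
# Helix III residue -> base of the top-strand GACA sequence it is proposed
# to contact (directly or through the complementary base), per the model.
PROPOSED_CONTACTS = [
    ("Arg55", "G"),
    ("Asn51", "A"),
    ("Lys54", "C"),
    ("Ile50", "A"),
]

# Base(s) preferred immediately 3' to TGAC for each residue tested at
# position 50, as reported for wild-type (Ile), I50S and I50K proteins.
RESIDUE50_PREFERENCE = {
    "Ile": ("A", "T"),
    "Ser": ("T",),
    "Lys": ("C",),
}


def preferred_sites(residue50: str) -> list[str]:
    """Return the preferred pentamer(s) for a given residue at position 50."""
    return ["TGAC" + base for base in RESIDUE50_PREFERENCE[residue50]]


for residue in RESIDUE50_PREFERENCE:
    print(residue, preferred_sites(residue))
```

Read in order, the contacted bases spell GACA, and the residue 50 entries reproduce the TGACA/TGACT, TGACT and TGACC preferences described for the wild-type, I50S and I50K proteins.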
To examine the consistency of the interactions described above, we have constructed a theoretical model of the HAKN1–DNA complex using the program SWISS-MODEL [41] available in the ExPASy web server. Different models for wild-type and mutant HAKN1 were obtained using the homeodomain–DNA complexes of extradenticle [35], MATa1 [36] and MATα2 [5] as templates. Figure 7 shows the alignment of helix III of the HAKN1 homeodomain along the major groove of DNA (the MATa1 binding site in this case). Amino acids in red are those present in wild-type HAKN1 that putatively contact the GAC core. Note that Arg55 and Asn51 establish hydrogen bonds with adjacent G and A, respectively. Interestingly, Lys54 also appears to make a hydrogen bond with the N7 of an adjacent purine (adenine in this case) present in the complementary strand. A similar contact could be made with a guanine complementary to C in GAC, further suggesting that Lys54 is likely to contact this position. The position of Lys55 in the corresponding mutant is shown in yellow. Clearly, the specific contacts made by Arg55 are lost and are replaced by an interaction with the phosphate backbone. Val54 is shown in pink. The shorter side chain and the loss of a hydrogen bond explain the decrease in affinity and relaxed specificity. Finally, the variants at position 50 (Ile in orange, Ser in green and Lys in blue) are also represented. All these residues are oriented towards the 3′ end of the core and probably establish contacts with the complementary strand. It should be emphasized that the mutagenesis experiments described here do not prove that certain amino acids make base-specific contacts, especially when a new specificity at a defined position was not achieved. The combination of these experiments with the footprinting results and previous knowledge, however, is highly indicative that this is the case. Determination of the three-dimensional structure of the complex will be required to evaluate the accuracy of the DNA–protein contacts proposed by this model.
Discussion
In this study, we investigated the interaction of the homeodomain of the KNOX protein HAKN1 with DNA. As no structural studies on the interaction of any KNOX protein with DNA have been reported, ours constitutes a first approach to understanding these interactions at the molecular level. Electrophoretic mobility shift assays, footprinting analyses and missing nucleoside experiments using different binding sites and mutated proteins allowed us to establish a model for HAKN1–DNA interaction. This model postulates that HAKN1 binds to a TGACNN core primarily through interactions of certain helix III amino acids (Ile50, Asn51, Lys54 and Arg55) with DNA. This particular combination of amino acids is present only in KNOX proteins, indicating that they may have been selected through evolution to generate a defined specificity. Among them, the incorporation of Ile50 and Arg55 must have been particularly important. Other homeodomains that contain Ile50 and Arg55 are those of the TGIF, Meis and Bell families [22] and yeast MATa1 [36]. TGIF and Meis proteins bind the sequence TGTCA (TGACA on the complementary strand [42,43]), which is identical to that recognized by HAKN1. They possess Arg at position 54, suggesting that Lys54 in HAKN1 may not be the only means of recognizing the TGAC core. Accordingly, the Bell protein ATH1, which contains Val54, also selects the sequence TGACA from a random population (I Viola, unpublished results). The presence of Val54 produces, however, a relaxed specificity at the fourth position and reduced affinity within the context of both the HAKN1 (this study) and the ATH1 homeodomain (I Viola, unpublished results). MATa1, in turn, binds a completely different sequence (GATGT/ACATC [44]), indicating that other factors apart from these residues also influence specificity. The GA dinucleotide in GATGT is recognized by Arg55 and Asn51 of MATa1, as proposed here for the GA dinucleotide in TGAC. The GT dinucleotide is, in turn, contacted by Met54 and Ile50 through interactions with the complementary strand [36]. This means that, in MATa1, positions contacted by residues 55/51 and 54/50 are separated by one additional base pair. This may be caused by the presence of Val47, which binds the nucleotide adjacent to the A recognized by Asn51 in many homeodomains. The above-mentioned TGAC binding proteins (including KNOX proteins) contain Asn47, which may not establish specific contacts with DNA. These differences may also cause changes in the relative orientation of DNA-contacting amino acids.
The model presented here can also be compared with the structures determined for the TALE proteins extradenticle and PBX1 bound to DNA [35,45]. These proteins bind the sequence TGAT, with Arg55 and Asn51 establishing hydrogen bonds with the GA dinucleotide, as proposed here for HAKN1. The initial T is also contacted by Arg55 through van der Waals interactions in the PBX1 complex [45]. The second T makes van der Waals contacts with Asn47. As PBX1 and extradenticle contain Ile54, this situation may resemble the binding behaviour of K54V–HAKN1, which shows relaxed specificity at this position.
Our results with HAKN1 clearly support the idea that there is a general recognition code for homeodomains. Accordingly, recognition at the left side of the conserved A that is contacted by the universally present Asn51 is determined by the N-terminal arm and/or Arg55. The presence of Arg55 determines a G 5′ to the conserved A, while the N-terminal arm seems to determine a preference for A/T base pairs. The set of residues present at positions 47, 50 and 54 influences binding preferences at the right side.
Fig. 7. A model for the interaction of HAKN1 with DNA. (A) Diagram of the HAKN1 DNA binding site with the residues putatively involved in binding each position. (B) Spatial model of the interaction of helix III of wild-type and mutant HAKN1 homeodomains with DNA. The model was constructed with the program SWISS-MODEL [41] using the structures of the DNA complexes of MATa1 (1YRN), extradenticle (1B8I) and MATα2 (1APL) as templates. Amino acids in red are those present in wild-type HAKN1 that putatively contact the GAC core. Residues at position 50 are: Ile in orange, Ser in green and Lys in blue. Val54 and Lys55, present in the mutants, are shown in pink and yellow, respectively.
The putative DNA-contacting amino acids of HAKN1 are also present in all described KNOX proteins, indicating that they may all recognize identical or similar sequences. This raises the question of how the specificity of interaction is achieved in vivo, because different KNOX proteins have different functions. A similar paradox has been noted for animal homeodomains, for which current evidence suggests that specificity arises from the interaction of homeodomain proteins with other factors that somehow influence their DNA binding properties [46]. Plant KNOX proteins interact with proteins from the Bell family, which also belong to the TALE superclass [47,48] and bind similar sequences [49]. Chen et al. [49] have shown that potato KNOX and Bell proteins bind two tandem copies of a TGAC motif separated by one additional nucleotide. As both types of homeodomains seem to establish similar contacts with their target sites (I. Viola, unpublished results), this indicates that the respective recognition helices must lie in an antiparallel orientation within the major groove at opposite sides of the DNA. According to the footprinting data presented here, the central nucleotide pair and the first two pairs of the second TGAC would be contacted by both proteins. This may indicate that some rearrangements occur upon complex formation by KNOX and Bell proteins, either before or after binding to DNA. In the complexes formed by PBX1 and extradenticle, the presence of Gly50, which does not contact DNA, may allow the binding of an additional homeodomain in tandem immediately following TGAT [35,43,45]. The presence of Ile50, which interacts with nucleotides located at the 3′ side of the core, may explain the requirement of a larger distance between both binding sites in the complexes formed by KNOX and Bell proteins.
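The arrangement reported by Chen et al. [49], two tandem TGAC copies separated by one additional nucleotide, can be expressed as a simple pattern search. The following is a minimal sketch, assuming a plain DNA string as input. The function name `find_tandem_tgac` is hypothetical, only the strand given is scanned, and a match says nothing about whether a site is actually bound.

```python
import re

# Two TGAC copies separated by exactly one nucleotide (TGAC-N-TGAC).
# The lookahead lets overlapping candidate sites be reported as well.
TANDEM_TGAC = re.compile(r"(?=(TGAC[ACGT]TGAC))")


def find_tandem_tgac(seq):
    """Return (0-based start, matched site) for every TGAC-N-TGAC occurrence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in TANDEM_TGAC.finditer(seq)]


if __name__ == "__main__":
    # Illustrative sequence, not taken from the paper.
    print(find_tandem_tgac("aattgacttgacgg"))
```

A spacer of zero or two nucleotides is deliberately not matched, reflecting the spacing described in the text.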
Sequences outside the homeodomain may also influence the binding properties of the protein. Indeed, a stretch of 16 amino acids located immediately C-terminal to the homeodomain forms an α-helix that has been shown to influence the DNA binding affinity of the PBX1 homeodomain [50,51]. As the protein used in our assays includes a C-terminal portion, we have analysed the structure of the region immediately following the HAKN1 homeodomain using several secondary structure prediction programs. We have only observed a short region (five to eight amino acids depending on the program) that has a propensity to form an α-helix. Therefore, we consider it unlikely that an effect of the C-terminal tail, similar to that observed with PBX1, occurs in HAKN1 or other KNOX proteins.
In summary, the results presented here constitute a framework to understand at the molecular level how KNOX proteins interact with DNA and how these interactions contribute to the establishment of active transcription complexes that influence defined developmental pathways within plant cells.
Experimental procedures
Cloning, expression and purification of recombinant proteins
HAKN1 homeodomain and C-terminal sequences were amplified and cloned in-frame into the EcoRI and PstI sites of the expression vector pMAL-c2 (New England Biolabs, Beverly, MA, USA). Amplifications were performed using Pfu DNA polymerase and oligonucleotides MALN1: 5′-GCGGAATTCAAAAAGAGAAAGAAAGGG-3′ and MALC: 5′-GGCCTGCAGCTAGAGAAGTGAAACATC-3′ with HAKN1 cDNA [28] as the template.
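As a quick check on the primer design, the sketch below looks for the EcoRI and PstI recognition sequences in MALN1 and MALC. The recognition sequences used (GAATTC and CTGCAG) are the standard ones for these enzymes and are not stated in the text. The helper name `sites_in_primer` is hypothetical, and any spaces introduced by line wrapping in the printed primers are removed before searching.

```python
# Standard recognition sequences (assumed here; not given in the text).
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG"}

# Primer sequences as listed above, written 5' to 3'.
PRIMERS = {
    "MALN1": "GC GGAATTCAAAAAGAGAAAGAAAGGG",
    "MALC": "GGCCTGCAGCTAGAGAAGTGAAACATC",
}


def sites_in_primer(primer):
    """Map each enzyme whose site occurs in the primer to its 0-based position."""
    cleaned = "".join(primer.split()).upper()  # drop line-wrap spaces
    return {name: cleaned.find(site) for name, site in SITES.items() if site in cleaned}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in_primer(seq))
```

Each primer carries exactly one of the two sites near its 5′ end, which is consistent with directional cloning between the EcoRI and PstI sites of the vector.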
An I50S mutant was constructed using complementary oligonucleotides I50SF and I50SR (5′-CAACTGGTTCAGCAACCAAAGGAA-3′ and 5′-TTCCTTTGGTTGCTGAACCAGTTG-3′; introduced mutations underlined) together with primers MALC and MALN1, respectively, to amplify partially overlapping N-terminal and C-terminal HAKN1 fragments. The resulting products were mixed in buffer containing 50 mM Tris/HCl (pH 7.2), 10 mM MgSO4 and 0.1 mM dithiothreitol, incubated at 95 °C for 5 min, and annealed by allowing the solution to cool to 24 °C in approximately 1 h. After this, 0.5 mM of each dNTP and 5 units of the Klenow fragment of E. coli DNA polymerase I were added, and incubation was continued for 1 h at 37 °C. An aliquot of this reaction was used directly to amplify the annealed fragments using primers MALN1 and MALC. Mutants I50K, K54V, R55A and R55K were constructed in a similar way, using oligonucleotides I50KF (5′-CAACTGGTTCAAAAACCAAAGGAA-3′), I50KR (5′-TTCCTTTGGTTTTTGAACCAGTTG-3′), K54VF (5′-TAAACCARAGGGTGCGGCAYTGGA-3′), K54VR (5′-TCCARTGCCGCACCCTYTGGTTTA-3′), R55AF (5′-CAAAGGAAGGCGCACTGGAA-3′), R55AR (5′-TTCCAGTGCGCCTTCCTTTG-3′), R55KF (5′-CAAAGGAAGAAGCACTGGAA-3′) and R55KR (5′-TTCCAGTGCTTCTTCCTTTG-3′) to introduce the mutations. The N-terminal arm of the MATα2 homeodomain (amino acids 1–9) was introduced into the HAKN1 homeodomain using two successive rounds of amplification with oligonucleotides