Identification of novel membrane proteins by searching for patterns
in hydropathy profiles
John D. Clements and Rowena E. Martin
School of Biochemistry and Molecular Biology, Australian National University, Canberra, Australia
A technique has been developed to search a proteome database for new members of a functional class of membrane protein. It takes advantage of the highly conserved secondary structure of functionally related membrane proteins. Such proteins typically have the same number of transmembrane domains located at similar relative positions in their polypeptide sequence. This gives rise to a characteristic pattern of peaks in their hydropathy profiles. To conduct a search, each member of a polypeptide database is converted to a hydropathy profile, peaks are automatically detected, and the pattern of peaks is compared with a template. A template was designed for the acetylcholine (ACh) and glycine receptors of the cys-loop receptor superfamily. The key feature was a closely spaced triplet of hydropathy peaks bracketed by deep valleys. When applied to the human proteome, the search procedure retrieved 153 profiles with a receptor-like triplet of peaks. The approach was highly selective, with 70% of the retrieved profiles annotated as known or putative receptors. These included ACh, glycine, γ-amino butyric acid and serotonin receptors, which are all related by sequence. However, ionotropic glutamate receptors, which have almost no sequence homology with ACh receptors, were also retrieved. Thus, the strategy can find members of a functional class that cannot be identified by sequence alignment. To demonstrate that the strategy can easily be extended to other membrane protein families, a template was developed for the neurotransmitter/Na+ symporter family, and similar results were obtained. This approach should prove a useful adjunct to sequence-based retrieval tools when searching for novel membrane proteins.
Keywords: hydropathy profile; integral membrane protein; ligand-gated channel; neurotransmitter receptor; proteomics; transporter.
Integral membrane proteins are responsible for the majority of interactions between a cell and its external environment. Approximately 20% of the genes in animal, plant, yeast and bacterial genomes encode integral membrane proteins, consistent with their fundamental importance to cellular function [1–3]. Transmembrane α-helices are encoded by a long stretch of predominantly hydrophobic residues (typically 15–19), which is sufficient to cross the hydrophobic region of the membrane bilayer (2.5 nm) [4]. The pronounced compositional bias arises because these residues must be capable of hydrophobic interactions with the lipid environment in the interior of the membrane. Most membrane-associated domains produce an easily identified peak in the hydropathy profile of the polypeptide. Standard software tools are available that can identify the putative transmembrane domains of a membrane protein based on its hydropathy profile [5,6]. Sophisticated algorithms that combine hydropathy and sequence analysis can predict up to 95% of transmembrane helices [7–12], but simple hydropathy peak detection strategies are also very effective [13].
The primary function of most membrane proteins is to transfer molecules, ions or signals between the exterior and interior of a cell, or subcellular compartment, and transmembrane domains provide the physical conduit for the transfer. Typically, several transmembrane domains combine to form a tightly coupled structure that is intimately involved in the function of the protein [14]. It follows that the number and the pattern of transmembrane domains will be strongly conserved within a functionally related family. Protein families within which secondary structure is highly conserved include neurotransmitter receptors, voltage-gated channels, connexins and transporters (Fig. 1).
The majority of neurotransmitter-activated channels can be assigned either to the glutamate cationic receptor (iGluR) superfamily, or the cys-loop receptor superfamily, which includes acetylcholine (ACh), glycine, γ-amino butyric acid (GABA) and serotonin receptors [15]. Channels from both superfamilies are formed from subunits that have four membrane-associated domains. These four domains are organized as a cluster of three closely spaced domains near the centre of the polypeptide, and a fourth well-separated domain close to the C-terminal end of the polypeptide (Fig. 1A) [14]. Despite the similarity of their secondary structure, there is almost no sequence homology between the two superfamilies.
Neuronal voltage-gated Na+, Ca2+ and K+ channel families diverged from a common ancestor long ago and there is very little sequence homology between the families, yet all three have retained a similar secondary structure. They are formed from four subunits, each containing six membrane-associated domains (Fig. 1B) [14]. In voltage-gated Na+ and Ca2+ channels the four subunits are linked together as a single protein with a series of internal repeats. In the case of voltage-gated K+ channels the subunits are expressed as separate proteins, and the channel forms as a tetramer of these subunits (Fig. 1B) [14].
Correspondence to J. Clements, School of Biochemistry and Molecular Biology, Australian National University, Canberra, ACT 0200, Australia. Fax: + 61 26125 0313, Tel.: + 61 26125 3465, E-mail: John.Clements@anu.edu.au
Abbreviations: ACh, acetylcholine; AChR, acetylcholine receptor; GABA, γ-amino butyric acid; HU, hydrophobicity unit; LGIC, ligand-gated ion channel; iGluR, glutamate cationic receptor; NMDA, N-methyl-D-aspartate; AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazole propionate; GlyR, glycine receptor; NSS, neurotransmitter/Na+ symporter.
(Received 31 December 2001, revised 18 February 2002, accepted 27 February 2002)
Two separate families of membrane proteins form gap junctions between mammalian cells (connexins), and between invertebrate cells (innexins). There is negligible sequence homology between these families, but they share a similar secondary structure. Subunits of both connexins and innexins contain four transmembrane domains, and combine to form dodecamers [14,16–19]. In contrast to ligand-gated channels, the four transmembrane domains of connexin and innexin are organized into two closely spaced pairs, which are separated by an intracellular hydrophilic loop (Fig. 2D). Many other functionally related protein families have been identified where secondary structural features are better conserved than the underlying amino acid sequences [20,21].
Despite clear evidence for conservation of secondary structure, little systematic use has been made of structural information in proteomic analysis. Most genomic software packages can generate a hydropathy profile from an amino-acid sequence, but in general they only permit one or a few profiles to be generated at a time. The resulting hydropathy profiles are typically examined by eye for significant features. Efforts have been made to improve and automate this process. For example, the web-based programs TMPRED, TMHMM and MEMSTAT identify and count putative transmembrane helices, and suggest their orientation in the membrane [7–11]. These programs are effective when applied to individual amino acid sequences, but no software tools are available to automatically analyse the pattern of putative transmembrane domains (secondary structure).
Fig. 1. Schematic diagram showing that the pattern of transmembrane domains is conserved within a functional class of membrane protein. (A) LGICs typically have a closely spaced cluster of three transmembrane domains (dark bars) and a fourth well-separated domain. This secondary structure is conserved across the cys-loop superfamily and the iGluR superfamily, even though there is no sequence homology between these families. Selected subunits from both families are shown. (B) Distantly related voltage-gated channels also exhibit a characteristic pattern of transmembrane domains. Channels are formed by four groups of six transmembrane domains. Within each group, the first five transmembrane domains are closely spaced, with the sixth domain separated by a relatively long extracellular loop.
Fig. 2. The highly conserved secondary structure of LGICs is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of the human AChR alpha-1 subunit reveals a typical cluster of three peaks bracketed by deep valleys. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. Peaks located at < 20 residues are likely to be cleaved signal sequences and are ignored. (B,C) A similar pattern of peaks and valleys is seen in the profiles of the GABAA receptor alpha-1 subunit and glutamate receptor GluR1 subunit. (D) A human connexin subunit also exhibits four hydropathy peaks, but they are organized in a different pattern. The peaks occur in two pairs separated by a deep valley.
A method for alignment of hydropathy profiles has been developed [20,21], and an experimental web-based server uses this approach to align pairs of sequences submitted by the user, or to search a database for hydropathy profiles that match a submitted sequence (Bioinformatics Unit, Weizmann Institute of Science). At present, it is limited to the SwissProt database, and to Hopp–Woods or Kyte–Doolittle hydrophobicity scales. In principle, this approach can be used to search for proteins with conserved secondary structure, but there are technical issues that limit its performance. For example, a profile with a similar pattern of peaks, but differently shaped peaks and valleys, may be missed. It is equally sensitive to mismatches in both peak (transmembrane) and valley (intra- and extracellular loop) regions, even though evolutionary changes in valley shape will have relatively little effect on secondary structure.
In this paper we develop and test a new automated proteome search technique. Every member of a polypeptide database is converted to a hydropathy profile, hydropathy peaks are automatically detected, and the pattern of peaks is compared with a template. Sequences that match the template are output to a new database, and their profiles are displayed in a convenient format. This approach can be used to search for new members of a family or functional class of membrane protein. It can assist with functional analysis, and may also be useful in proteome database annotation.
M E T H O D S
An algorithm was developed for searching a large polypeptide sequence database for proteins that are likely to be new members of a functionally related family of membrane proteins. The program runs on a personal computer, and the analysis of an organism’s total proteome takes about 1 min. The test is applied to the hydropathy profile of each sequence. A standard (Kyte–Doolittle) algorithm [5,6] is used to convert a sequence into a profile. The amino acids are each assigned a hydropathy value based on experimental measures, and the resulting profile is filtered to reduce noise. We chose a set of hydropathy values and a filter width that are near-optimal for detection of transmembrane regions [6]. The filter function is a rectangular averaging window (box-car filter) with a length of 17 amino acid residues. With these settings, the amplitude of the peak produced by a transmembrane α-helix is typically in the range 1–3 hydrophobicity units (HU) (Fig. 2). For example, the four transmembrane domains are clearly visible in the hydropathy profiles of three different ligand-gated ion channels (LGICs) (Fig. 2A–C) and the connexin alpha-1 subunit (Fig. 2D).
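The conversion from sequence to profile can be sketched in a few lines of Python. This is a minimal illustration, not the AxoGraph plug-in used in the study: it assumes the published Kyte–Doolittle hydropathy values (the exact value set chosen from ref. [6] may differ), and the function name `hydropathy_profile` and the treatment of unknown residue codes are hypothetical choices made for the example.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the study's exact
# values, chosen following ref. [6], may differ slightly).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=17):
    """Return the box-car averaged hydropathy profile of a sequence.

    One value is produced per full window, so the result has
    len(sequence) - window + 1 points; point i is the mean hydropathy of
    residues i .. i + window - 1. Unknown residue codes are scored as 0.0
    (an assumption of this sketch).
    """
    values = [KD.get(residue, 0.0) for residue in sequence.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile
```

Because each point averages 17 residues, index i of the profile corresponds roughly to residue i + 8 of the sequence; any residue-based criterion (such as ignoring peaks in the first 20 residues) should account for that offset.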
Peak detection
Each polypeptide sequence in a database is subject to a series of three tests. The first test simply rejects the sequence if it is too short or too long. The range of acceptable lengths is determined from known members of the membrane protein family, but this restriction can be relaxed if necessary. Membrane proteins always have both hydrophobic and hydrophilic regions, so profiles that do not cross both an upper and lower threshold are also rejected. These thresholds are the same as those used for peak detection (Fig. 2). Next, a simple peak-detection procedure is applied to each hydropathy profile, resulting in an estimate of the number and the locations of putative transmembrane helices. The algorithm identifies a peak when the profile rises from below a base threshold, crosses above a peak detection threshold, then crosses back below both the peak and base thresholds. In Fig. 2, the peak and base thresholds are indicated with the upper two dashed lines. Different threshold settings are used depending on the target protein. For example, the base threshold selected for LGICs is higher than for connexins (Figs 2A–D). The location and amplitude of each peak is measured at the maximum point between the two peak threshold crossings. The width of each peak is measured between the two base threshold crossings. This gives a more consistent result than measuring the width at the peak threshold level. The location and amplitude of each valley minimum is also measured.
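The two-threshold peak detector described above can be sketched as a small state machine. This is an illustrative reading of the description rather than the authors' code: the function name `detect_peaks` and the tuple layout are hypothetical, a peak that is still open at the end of the profile is not counted, and valley measurement is omitted for brevity.

```python
def detect_peaks(profile, peak_threshold, base_threshold):
    """Return one (location, amplitude, width) tuple per detected peak.

    A peak is recorded when the profile rises from below the base
    threshold, exceeds the peak threshold, and then falls back below the
    base threshold. Location and amplitude are taken at the maximum of the
    excursion; width is the number of points at or above the base level.
    """
    peaks = []
    seen_below = False  # profile has already been below the base threshold
    start = None        # index of the upward base-threshold crossing
    best = None         # index of the highest point above the peak threshold
    for i, value in enumerate(profile):
        if value < base_threshold:
            # Downward base crossing: close the excursion if it was a peak.
            if start is not None and best is not None:
                peaks.append((best, profile[best], i - start))
            seen_below = True
            start = None
            best = None
        elif seen_below:
            if start is None:
                start = i
            if value > peak_threshold and (best is None or value > profile[best]):
                best = i
    return peaks
```

For example, with a peak threshold of 1.1 HU and a base threshold of 0.8 HU, an excursion that reaches only 1.0 HU is ignored, while one that reaches 2.0 HU is reported with its width measured at the 0.8 HU level.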
Comparing a profile to a template
After the peaks and valleys are identified, a test is performed to determine whether they conform to a template. The simplest test is to count the peaks and ask whether this number falls within a specified range. The peak count may be adjusted by rejecting narrow peaks, or by counting a broad peak as two merged peaks. For example, when the base threshold is set below zero, the majority of transmembrane regions will produce a peak that is wider than 10 residues. If the width of a peak is > 30 residues it is possible that two or more closely spaced transmembrane regions have produced a single peak in the hydropathy profile. A peak located within the first 20 residues is likely to be a signal sequence (destined in most cases to be cleaved from the mature protein), and can optionally be removed from the peak count (Fig. 2A). Sometimes a false hydropathy peak is detected at a location that is not a transmembrane domain, and true transmembrane peaks are occasionally missed. Thus, when searching for proteins with four transmembrane domains, a profile with three to five peaks would typically be accepted.
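The peak-count adjustments described above might be expressed as follows. This is a sketch under stated assumptions: peaks are given as (location, amplitude, width) tuples with locations in residues, the function names and default values (10, 30 and 20 residues, and the three-to-five range) are taken from the examples in the text rather than from the authors' implementation, and a broad peak is counted as exactly two.

```python
def adjusted_peak_count(peaks, min_width=10, max_single_width=30,
                        signal_cutoff=20):
    """Count peaks after applying the optional adjustments.

    peaks is an iterable of (location, amplitude, width) tuples.
    """
    count = 0
    for location, _amplitude, width in peaks:
        if location < signal_cutoff:
            continue  # probable signal sequence near the N-terminus
        if width < min_width:
            continue  # too narrow to be a transmembrane region
        # A very broad peak may hide two merged transmembrane regions.
        count += 2 if width > max_single_width else 1
    return count


def count_in_range(peaks, low=3, high=5, **options):
    """True if the adjusted peak count lies within [low, high]."""
    return low <= adjusted_peak_count(peaks, **options) <= high
```

Setting `min_width=0` or `signal_cutoff=0` switches off the corresponding adjustment, which mirrors the optional nature of these rules in the text.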
If the number of peaks falls within the specified range, then more sophisticated template-matching tests can be applied. For example, the separation between adjacent peaks (interpeak intervals) can be calculated. A candidate profile can be rejected if the interpeak intervals fall outside the specified ranges. Another strategy is to scan for a particular feature, such as a closely spaced cluster of peaks bracketed by deep valleys. A strategy of this type is developed below for detecting ligand-gated ion channels.
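As an illustration of the interpeak-interval test, the following sketch compares each interval with an allowed range. The function names and the example ranges are hypothetical; the text does not specify how the ranges are stored or whether the number of peaks must match the template exactly, so this version assumes one range per interval.

```python
def interpeak_intervals(locations):
    """Separations between adjacent peak locations (in residues)."""
    return [b - a for a, b in zip(locations, locations[1:])]


def intervals_match(locations, allowed_ranges):
    """True if every interpeak interval lies inside its allowed range.

    allowed_ranges is a list of (low, high) pairs, one per interval.
    """
    intervals = interpeak_intervals(locations)
    if len(intervals) != len(allowed_ranges):
        return False
    return all(low <= gap <= high
               for gap, (low, high) in zip(intervals, allowed_ranges))
```

For instance, peaks at residues 230, 260, 290 and 450 give intervals of 30, 30 and 160 residues, which would pass a template allowing 20–40, 20–40 and 100–300.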
Designing and refining a template
When designing a search strategy, the peak detection thresholds and the selection parameters are adjusted with the dual goals of maximizing detection and minimizing false-positives. The first goal is achieved by applying the algorithm to a sequence database containing all proteins that belong to the family of interest. The parameters are refined by trial and error until almost all members of the family are selected. Next, the same set of search parameters is applied to a database containing unrelated membrane protein sequences. If necessary, the parameters are fine-tuned until all members of the unrelated family are rejected. Finally, the search procedure is applied to a large database, for example one containing the proteome of an organism.
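The tuning loop above amounts to measuring two retrieval rates for each candidate parameter set. A minimal sketch, assuming the template test is available as a function that takes a sequence and returns True or False (the names `retrieval_rate`, `evaluate_template` and `matches_template` are hypothetical):

```python
def retrieval_rate(sequences, matches_template):
    """Fraction of sequences accepted by the template test."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if matches_template(seq))
    return hits / len(sequences)


def evaluate_template(family, unrelated, matches_template):
    """Summarize one round of the trial-and-error tuning.

    sensitivity should approach 1.0 on the family of interest, and
    false_positive_rate should approach 0.0 on the unrelated family.
    """
    return {
        "sensitivity": retrieval_rate(family, matches_template),
        "false_positive_rate": retrieval_rate(unrelated, matches_template),
    }
```

Each adjustment of the thresholds or selection parameters would be followed by a call to `evaluate_template`, and the parameter set is kept when both figures are acceptable.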
The search algorithm and several related utilities were written using a development environment that is built into AxoGraph (Axon Instruments, CA), a scientific data analysis and graphics program for Macintosh computers (http://www.axon.com/CN_AxoGraph4.html). The AxoGraph plug-in programs that implement the search algorithm are available on request, or from http://johnc3.anu.edu.au/proteomic_plugins.sea. AxoGraph was chosen for this study because it can plot and overlay several thousand hydropathy profiles in a single window, and analyse them in a single operation. It also has convenient features for browsing and organizing the large number of profiles generated by the search algorithm.
R E S U L T S
A search strategy was designed for LGICs. The strategy was refined by applying it to custom polypeptide databases, and tested by applying it to a database containing the complete human proteome. This database was chosen because it is well annotated, which aids in the assessment of the algorithm’s performance. The results presented below are essentially a proof of concept. In general, this technique will be more useful when applied to a database that is not complete or well annotated.
Search strategy for LGICs
The following procedure was used to develop the search strategy for LGICs. First, a custom database containing two members of the cys-loop superfamily was constructed. ACh receptors (AChRs) and glycine receptors (GlyRs) were selected using a text search of the Entrez database. Truncated sequences, duplicate sequences and sequences that were not LGICs were removed manually. This left 119 unique, full-length sequences from many different animal species (including human, chicken, frog, fish, locust, fruit-fly and nematode); these were converted to hydropathy profiles in AxoGraph. Features common to all of the profiles were identified by eye. AxoGraph’s convenient browsing features aided in this task. Every profile had a cluster of three peaks located approximately 200–300 residues from the start of the sequence (Fig. 2A). Each of the three peaks had an amplitude of 1–2 HU, and the cluster of peaks was bracketed with deep valleys extending below −2.5 HU. The cluster of three peaks was followed by a fourth peak close to the end of the profile.
Based on these observations, and following a period of trial-and-error refinement, the following selection criteria were chosen. Only sequences with lengths between 300 and 1800 residues were accepted. A peak threshold of 1.1 HU and a base threshold of 0.8 HU reliably detected all four peaks in every profile. However, some of the peaks were measured as very narrow (only two residues) because the base threshold was set relatively high. Therefore, narrow peaks were not rejected. A putative transmembrane domain occasionally appeared as two narrow peaks. Therefore, a pair of peaks separated by fewer than six residues was counted as a single peak. We noted that the first and last peaks in the characteristic cluster of peaks were separated by between 55 and 66 residues. Thus, the template criterion for a LGIC was the presence of a cluster of three peaks separated by between 50 and 75 residues, bounded by deep valleys of < −2.5 HU. The cluster had to be followed by at least one additional peak, but no more than three peaks.
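The LGIC template criterion can be sketched as a scan over consecutive triplets of peaks. This is one possible reading of the criteria, not the authors' implementation: the 50–75 residue window is applied to the separation between the first and last peak of the triplet, a bracketing valley is taken as the profile minimum between the triplet and its neighbouring peak (or the start of the profile), and the length filter and the merging of peaks fewer than six residues apart are omitted. The function name `matches_lgic_template` and its parameters are hypothetical.

```python
def matches_lgic_template(profile, peak_locations,
                          min_span=50, max_span=75, valley_level=-2.5,
                          min_trailing=1, max_trailing=3):
    """True if the profile contains a receptor-like triplet of peaks.

    profile is a list of hydropathy values (HU); peak_locations are
    indices into that profile.
    """
    locs = sorted(peak_locations)
    for i in range(len(locs) - 2):
        first, last = locs[i], locs[i + 2]
        trailing = len(locs) - (i + 3)  # peaks after the triplet
        if not (min_span <= last - first <= max_span):
            continue
        if not (min_trailing <= trailing <= max_trailing):
            continue
        # Valley before the triplet: back to the previous peak or the start.
        left_start = locs[i - 1] if i > 0 else 0
        # Valley after the triplet: up to the next peak or the end.
        right_end = locs[i + 3] if trailing > 0 else len(profile) - 1
        left_valley = min(profile[left_start:first + 1])
        right_valley = min(profile[last:right_end + 1])
        if left_valley < valley_level and right_valley < valley_level:
            return True
    return False
```

With these defaults, a profile whose peaks lie at positions 230, 260, 290 and 450, with the profile dropping below −2.5 HU on either side of the first three, would be accepted; the same peaks without deep bracketing valleys would not.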
Testing the LGIC search strategy
A search of the AChR and GlyR database using the above detection criteria correctly retrieved every one of the 119 profiles. Thus, the search strategy exhibits excellent sensitivity, as it was able to detect 100% of known GlyR and AChR across a range of species.
The accuracy and sensitivity of the search strategy were tested by applying it to a custom database containing GABAA receptor sequences retrieved via a text search of the Entrez database. GABAA receptors are also members of the cys-loop superfamily, but they were not used during the selection and tuning of the search parameters. The algorithm retrieved 39 out of 41 sequences (95%), demonstrating excellent sensitivity for proteins that are related in both function and sequence to the target group.
Next, the selectivity of the search strategy was examined. We chose two families of integral membrane proteins which are functionally distinct from LGICs, but which also have four transmembrane domains. A custom database of known and putative connexins and innexins was constructed using a series of text searches of the Entrez database. The search algorithm was applied to the database and retrieved only one out of 122 sequences. Thus, the LGIC search strategy exhibits good selectivity.
The entire human proteome (Entrez) was searched and 153 profiles with a receptor-like triplet of peaks were retrieved. Of these, 105 (70%) were annotated as known or putative receptors. As expected, many of these were GlyR or AChR (31). Other members of the cys-loop superfamily were also identified, including receptors for GABA (18) and serotonin (5). Of particular note, 13 members of the iGluR superfamily were also retrieved, including the N-methyl-D-aspartate (NMDA) and kainate receptor subtypes. Thus, the search algorithm succeeded in its central goal of identifying proteins that were functionally related to the target group (GlyR and AChR), but were not related by sequence homology.
Of the profiles that were not annotated as receptors, six were voltage-gated potassium channels and two were transporters. They were retrieved because they contained six or seven transmembrane domains, three of which formed a cluster separated by deep valleys (Fig. 3A). It was noted that the valleys between the triplet peaks were usually deeper for potassium channels and transporters than for LGICs. The receptor detection algorithm was refined to eliminate profiles where the deeper of the two valleys between the triplet peaks extended below −1.5 HU. This refined algorithm was still able to detect 99% of known GlyR and AChR. It retrieved 87 profiles from the human proteome, of which 90% were receptors. Although this refined search procedure increased the selectivity for receptors, it also failed to retrieve any iGluRs. This illustrates the inevitable trade-off between the selectivity of the search algorithm and the likelihood of detecting distantly related functional homologues.
The search strategy’s sensitivity to membrane proteins that were related to the target group by function, but not by sequence, was investigated further. A custom database containing 84 sequences from the iGluR superfamily was constructed using Entrez. It included the NMDA, kainate and α-amino-3-hydroxy-5-methyl-4-isoxazole propionate (AMPA) receptor subtypes. These receptors are functionally related to GlyRs and AChRs, but share almost no sequence homology. Also, iGluRs are thought to form tetrameric channels, in contrast with the cys-loop superfamily that forms pentameric channels. Despite these differences, the search algorithm retrieved 30 sequences (36%) from the iGluR database. By subtype, 90% of the kainate receptors in the database were detected, but only 36% of the NMDA receptors, and 1% of the AMPA receptors. Examination of the AMPA receptor hydropathy profiles revealed that the peak associated with their second membrane-associated domain did not reach the peak threshold in most cases. A small reduction in this threshold would have resulted in many more AMPA and NMDA receptors being retrieved. Nevertheless, these results demonstrate the remarkable sensitivity of the original search strategy for membrane proteins that are related to AChRs only by function.
Candidate LGICs retrieved by the search strategy
Four proteins with receptor-like profiles from the second search were annotated as having no known or putative function. In principle, these could be novel receptors, so we examined them in greater detail. The profile with accession number AAF86374 is a member of the ancient conserved domain protein family (ACDP), which has sequence elements conserved from nematode to human. Intriguingly, its secondary structure is very similar to that of a LGIC, with a clear triplet of peaks followed by a well-separated fourth peak (Fig. 3B). It has a shorter section preceding the triplet than a typical receptor, but it is reasonable to speculate that it is a membrane protein, and possibly an ancient ion channel or receptor. The next two profiles came from an uncharacterized membrane protein expressed in the hypothalamus (accession numbers NP_060945 and AAG09678). These proteins had six or possibly seven transmembrane domains and are unlikely to be receptors, but could be novel transporters or voltage-gated channel subunits (Fig. 3C). The profile BAA18909 is simply annotated ‘unknown’, but a BLAST search revealed weak homology with a section of an intrinsic factor-vitamin B12 receptor. The profile is quite similar to a typical LGIC, although a small narrow peak precedes the main triplet (Fig. 3D). These findings demonstrate how the hydropathy peak detection algorithm may be used to search for truly novel members of a functional class of membrane proteins.
Fig. 3. Hydropathy profiles of four proteins that were retrieved from the human proteome by a search strategy designed to detect LGICs, but were not annotated as receptors. (A) A voltage-gated potassium channel was incorrectly retrieved because its first two hydropathy peaks fell just below the detection threshold. Potassium channels typically have a cluster of five peaks followed by a sixth well-separated peak. Note that although only one peak following the valley is highlighted, the template will accept up to three peaks. (B) An ancient conserved domain protein with no known function was retrieved because of its receptor-like cluster of three transmembrane peaks bracketed by deep valleys. The separation between the cluster and the fourth peak was larger than for a typical LGIC, but otherwise the secondary structure is strikingly similar. (C) An uncharacterized hypothalamus protein is unlikely to be a LGIC, despite the fact that it is expressed in a brain region. It has two or three extra peaks before and after the triplet, giving it a secondary structure that has more in common with a voltage-gated channel or a transporter. (D) A retrieved protein that was simply annotated ‘unknown’, but which has weak sequence homology with an intrinsic factor-vitamin B12 receptor.
Search strategy for neurotransmitter/Na+ symporters
To demonstrate that our approach can be applied to other functional classes of membrane protein, we developed a search strategy for the neurotransmitter/Na+ symporter (NSS) family. A custom database was constructed containing 40 GABA and dopamine transporters, which have 10–12 putative transmembrane domains. The corresponding peaks in the transporter profiles could be detected using a peak threshold of 1.4 HU and a base threshold of 0.6 HU. The minimum peak width was set to 10 residues, and peaks with a width of up to 60 residues were accepted. Profiles were accepted only if they had between 10 and 13 peaks, arranged as a pair of peaks, followed by a deep valley (< −1.9 HU), then a cluster of 8–11 peaks, extending over no more than 300 residues (Fig. 4A,B). It is likely that the initial pair of peaks actually represents three transmembrane domains. The second peak was typically 40 residues in width, and is probably produced by two closely spaced transmembrane domains. This search strategy identified all 40 of the targeted NSS transporter profiles.
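The NSS template can be sketched in the same style. Several details are assumptions made for illustration: peaks are supplied as (location, width) pairs, peaks outside the 10–60 residue width range are simply dropped from the count, the deep valley is taken as the profile minimum between the second and third retained peaks, and the 300-residue limit is applied to the distance between the first and last peak of the cluster. The function name `matches_nss_template` is hypothetical.

```python
def matches_nss_template(profile, peaks, valley_level=-1.9,
                         max_cluster_span=300, min_width=10, max_width=60):
    """True if the peak pattern resembles an NSS transporter profile.

    profile is a list of hydropathy values (HU); peaks is a list of
    (location, width) pairs with locations indexing into profile.
    """
    locs = sorted(loc for loc, width in peaks
                  if min_width <= width <= max_width)
    # 10-13 peaks in total: an initial pair plus a cluster of 8-11.
    if not (10 <= len(locs) <= 13):
        return False
    # Deep valley separating the initial pair from the cluster.
    if min(profile[locs[1]:locs[2] + 1]) >= valley_level:
        return False
    cluster = locs[2:]
    return cluster[-1] - cluster[0] <= max_cluster_span
```

As with the LGIC sketch, the peak list would come from a two-threshold peak detector run with the NSS settings (peak threshold 1.4 HU, base threshold 0.6 HU).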
The entire human proteome (Entrez) was searched and 59 profiles with an NSS transporter-like pattern of peaks were retrieved. Of these, 51 were annotated as known or putative transporters (86%). As expected, many of these were NSS transporters (54%), but several other transporters were also identified, including Na+/Ca2+ antiporters (9%), Na+/glucose symporters (7%), K+/Cl− symporters (5%), Na+/nucleoside transporters (3%), and organic ion transporters (3%) (Fig. 4C). Thus, the search algorithm again succeeded in identifying proteins that were functionally related to the target group, but were not related by sequence homology.
D I S C U S S I O N
We have developed and tested an algorithm that can scan a large polypeptide database, and retrieve membrane proteins on the basis of secondary structure rather than sequence homology. The algorithm locates putative transmembrane domains in each sequence, and tests whether their spatial pattern matches a template. In the past this process has been performed manually, by visual inspection of hydropathy plots generated one at a time. Our major innovation was to automate the process, and apply it on the proteome scale. A computer program performs the peak detection and template matching. The complete proteome of an organism can be scanned in about 1 min using a desktop personal computer. This represents a qualitative increase in the power of the technique, and it permits new questions to be addressed. An analogy may be drawn with modern sequence-based search programs, such as BLAST, which can scan multiple genomes. Although BLAST was directly based on earlier sequence analysis programs that could align small groups of sequences, its development opened an entirely new field.
In principle, our technique could be extended by complementing hydropathy peak detection with a more sophisticated analysis of the underlying sequence [8–12]. Several web-based programs use such an approach to improve the reliability with which transmembrane domains can be identified, and to predict topology. Incorporating additional sequence analysis into our technique would permit an orientation to be assigned to each transmembrane α-helix, which would assist structural analysis. However, the additional processing would substantially slow the search run, and it is unclear how much improvement would be achieved in practice. A recent study evaluated all of the current methods for predicting transmembrane domains, and found TMHMM to be the best performing program [13]. However, the standard Kyte–Doolittle algorithm, which forms the basis of our search technique, was a close runner-up. Some membrane proteins incorporate a hydrophobic pore-lining region that does not cross the membrane, but instead forms a beta hairpin structure that dips into the membrane then re-emerges on the same side [22]. These membrane-associated domains represent an important component of the highly conserved secondary structure of voltage-gated potassium channels, and similar hairpin structures may also be present in other membrane proteins [22]. A sophisticated α-helix-detection algorithm may reject or misinterpret such regions.
Fig. 4. The conserved secondary structure of neurotransmitter/Na+ symporters is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of a rat dopamine symporter reveals a pair of peaks followed by a deep valley, then a cluster of nine peaks. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. (B) A similar pattern of peaks and valleys is seen in the profile of a closely related rat GABA symporter. (C) A human Na+-independent organic anion transporter retrieved by the NSS symporter template exhibits a similar pattern of peaks, although it has no sequence homology with the neurotransmitter symporters.
Our approach is loosely analogous with a strategy that uses alignment of hydropathy profiles to search for conserved secondary structural features in polypeptide sequences [20,21]. This alignment technique is based on the same algorithm that is used in standard peptide and nucleotide sequence alignment, but is applied to sequences of hydropathy values. Profile alignment will generally provide a more stringent test for conserved structure than our template-matching approach. However, a more stringent test will be less likely to detect unusual or distantly related family members. For example, a LGIC containing a triplet of unusually high hydropathy peaks will be reliably detected by our approach, but will receive a low score in an alignment-based search. Another problematic issue for the alignment algorithm is what penalty should be assigned when introducing gaps into one or both profiles, and how this penalty should be weighted for transmembrane domains vs. extra-membrane loops.
We tested the performance of the hydropathy alignment
approach by submitting the sequence of the GlyR
alpha-1 subunit to the web-based search engine
http://bioinformatics.weizmann.ac.il/hydroph/, and analysing the
first 200 sequences retrieved from the SwissProt database.
Only 43% of these sequences were annotated as receptors,
and all were close relatives of AChR (ACh, glycine and
GABA receptors). No receptors for serotonin or glutamate
were identified. Thus, hydropathy alignment is much less
sensitive to distantly related functional homologues, and less
selective for the membrane protein family of interest, than
the template-matching approach.
We chose the human genome to test our search strategy,
because the thorough annotations permitted a detailed
assessment of the algorithm’s performance. In practice, the
hydropathy profile search tool will be more useful when
applied to an actively growing proteome database that is
not yet well annotated. The most important use for the
technique will be to search for new members of established
functional families of membrane proteins, especially those
that are missed by standard sequence-based search
techniques. We have demonstrated how this can be achieved for
LGICs, and for neurotransmitter symporters. Other
candidate families include voltage-gated ion channels, G-protein
coupled receptors, connexins and a wide variety of
transporters.
ACKNOWLEDGEMENTS
This work was supported by a Senior Research Fellowship from the
Australian Research Council (J. D. C.) and an Australian
Postgraduate Award (R. E. M.).
REFERENCES
1 Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C. & Herrmann, R. (1996) Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449.
2 Frishman, D. & Mewes, H.W. (1997) Protein structural classes in five complete genomes. Nat. Struct. Biol. 4, 626–628.
3 Wallin, E. & von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7, 1029–1038.
4 Deisenhofer, J., Remington, S.J. & Steigemann, W. (1985) Experience with various techniques for the refinement of protein structures. Methods Enzymol. 115, 303–323.
5 Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
6 Engelman, D.M., Steitz, T.A. & Goldman, A. (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15, 321–353.
7 Jones, D.T., Taylor, W.R. & Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33, 3038–3049.
8 Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995) Transmembrane helices predicted at 95% accuracy. Protein Sci. 4, 521–533.
9 Cserzo, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673–676.
10 Sonnhammer, E.L., von Heijne, G. & Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol.
11 Tusnady, G.E. & Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. 283, 489–506.
12 Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.
13 Moller, S., Croning, M.D. & Apweiler, R. (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653.
14 Hille, B. (1992) Ionic Channels of Excitable Membranes, 2nd edn. Sinauer Associates, Sunderland, MA.
15 Le Novere, N. & Changeux, J.P. (2001) LGICdb: the ligand-gated ion channel database. Nucleic Acids Res. 29, 294–295.
16 Landesman, Y., White, T.W., Starich, T.A., Shaw, J.E., Goodenough, D.A. & Paul, D.L. (1999) Innexin-3 forms connexin-like intercellular channels. J. Cell Sci. 112, 2391–2396.
17 Unger, V.M., Kumar, N.M., Gilula, N.B. & Yeager, M. (1999) Three-dimensional structure of a recombinant gap junction membrane channel. Science 283, 1176–1180.
18 Bennett, M.V., Barrio, L.C., Bargiello, T.A., Spray, D.C., Hertzberg, E. & Saez, J.C. (1991) Gap junctions: new tools, new answers, new questions. Neuron 6, 305–320.
19 Ganfornina, M.D., Sanchez, D., Herrera, M. & Bastiani, M.J. (1999) Developmental expression and molecular characterization of two gap junction channel proteins expressed during embryogenesis in the grasshopper Schistocerca americana. Dev. Genet. 24, 137–150.
20 Lolkema, J.S. & Slotboom, D.J. (1998) Estimation of structural similarity of membrane proteins by hydropathy profile alignment. Mol. Membr. Biol. 15, 33–42.
21 Lolkema, J.S. & Slotboom, D.J. (1998) Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins. FEMS Microbiol. Rev. 22, 305–322.
22 Wood, M.W., VanDongen, H.M. & VanDongen, A.M. (1995) Structural conservation of ion conduction pathways in K channels and glutamate receptors. Proc. Natl. Acad. Sci. USA 92, 4882–4886.