Identification of novel membrane proteins by searching for patterns in hydropathy profiles

John D Clements and Rowena E Martin

School of Biochemistry and Molecular Biology, Australian National University, Canberra, Australia

A technique has been developed to search a proteome database for new members of a functional class of membrane protein. It takes advantage of the highly conserved secondary structure of functionally related membrane proteins. Such proteins typically have the same number of transmembrane domains located at similar relative positions in their polypeptide sequence. This gives rise to a characteristic pattern of peaks in their hydropathy profiles. To conduct a search, each member of a polypeptide database is converted to a hydropathy profile, peaks are automatically detected, and the pattern of peaks is compared with a template. A template was designed for the acetylcholine (ACh) and glycine receptors of the cys-loop receptor superfamily. The key feature was a closely spaced triplet of hydropathy peaks bracketed by deep valleys. When applied to the human proteome, the search procedure retrieved 153 profiles with a receptor-like triplet of peaks. The approach was highly selective, with 70% of the retrieved profiles annotated as known or putative receptors. These included ACh, glycine, γ-aminobutyric acid and serotonin receptors, which are all related by sequence. However, ionotropic glutamate receptors, which have almost no sequence homology with ACh receptors, were also retrieved. Thus, the strategy can find members of a functional class that cannot be identified by sequence alignment. To demonstrate that the strategy can easily be extended to other membrane protein families, a template was developed for the neurotransmitter/Na+ symporter family, and similar results were obtained. This approach should prove a useful adjunct to sequence-based retrieval tools when searching for novel membrane proteins.

Keywords: hydropathy profile; integral membrane protein; ligand-gated channel; neurotransmitter receptor; proteomics; transporter.

Integral membrane proteins are responsible for the majority of interactions between a cell and its external environment. Approximately 20% of the genes in animal, plant, yeast and bacterial genomes encode integral membrane proteins, consistent with their fundamental importance to cellular function [1–3]. Transmembrane α-helices are encoded by a long stretch of predominantly hydrophobic residues (typically 15–19), which is sufficient to cross the hydrophobic region of the membrane bilayer (2.5 nm) [4]. The pronounced compositional bias arises because these residues must be capable of hydrophobic interactions with the lipid environment in the interior of the membrane. Most membrane-associated domains produce an easily identified peak in the hydropathy profile of the polypeptide. Standard software tools are available that can identify the putative transmembrane domains of a membrane protein based on its hydropathy profile [5,6]. Sophisticated algorithms that combine hydropathy and sequence analysis can predict up to 95% of transmembrane helices [7–12], but simple hydropathy peak detection strategies are also very effective [13].

The primary function of most membrane proteins is to transfer molecules, ions or signals between the exterior and interior of a cell, or subcellular compartment, and transmembrane domains provide the physical conduit for the transfer. Typically, several transmembrane domains combine to form a tightly coupled structure that is intimately involved in the function of the protein [14]. It follows that the number and the pattern of transmembrane domains will be strongly conserved within a functionally related family. Protein families within which secondary structure is highly conserved include neurotransmitter receptors, voltage-gated channels, connexins and transporters (Fig. 1).

The majority of neurotransmitter-activated channels can be assigned either to the glutamate cationic receptor (iGluR) superfamily, or the cys-loop receptor superfamily, which includes acetylcholine (ACh), glycine, γ-aminobutyric acid (GABA) and serotonin receptors [15]. Channels from both superfamilies are formed from subunits that have four membrane-associated domains. These four domains are organized as a cluster of three closely spaced domains near the centre of the polypeptide, and a fourth well separated domain close to the C-terminal end of the polypeptide (Fig. 1A) [14]. Despite the similarity of their secondary structure, there is almost no sequence homology between the two superfamilies.

Neuronal voltage-gated Na+, Ca2+ and K+ channel families diverged from a common ancestor long ago and there is very little sequence homology between the families, yet all three have retained a similar secondary structure.

Correspondence to J. Clements, School of Biochemistry and Molecular Biology, Australian National University, Canberra, ACT 0200, Australia. Fax: +61 2 6125 0313, Tel.: +61 2 6125 3465, E-mail: John.Clements@anu.edu.au

Abbreviations: ACh, acetylcholine; AChR, acetylcholine receptor; GABA, γ-aminobutyric acid; HU, hydrophobicity unit; LGIC, ligand-gated ion channel; iGluR, glutamate cationic receptor; NMDA, N-methyl-D-aspartate; AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazole propionate; GlyR, glycine receptor; NSS, neurotransmitter/Na+ symporter.

(Received 31 December 2001, revised 18 February 2002, accepted 27 February 2002)


They are formed from four subunits, each containing six membrane-associated domains (Fig. 1B) [14]. In voltage-gated Na+ and Ca2+ channels the four subunits are linked together as a single protein with a series of internal repeats. In the case of voltage-gated K+ channels the subunits are expressed as separate proteins, and the channel forms as a tetramer of these subunits (Fig. 1B) [14].

Two separate families of membrane proteins form gap junctions between mammalian cells (connexins), and between invertebrate cells (innexins). There is negligible sequence homology between these families, but they share a similar secondary structure. Subunits of both connexins and innexins contain four transmembrane domains, and combine to form dodecamers [14,16–19]. In contrast to ligand-gated channels, the four transmembrane domains of connexin and innexin are organized into two closely spaced pairs, which are separated by an intracellular hydrophilic loop (Fig. 2D). Many other functionally related protein families have been identified where secondary structural features are better conserved than the underlying amino acid sequences [20,21].

Despite clear evidence for conservation of secondary structure, little systematic use has been made of structural information in proteomic analysis. Most genomic software packages can generate a hydropathy profile from an amino-acid sequence, but in general they only permit one or a few profiles to be generated at a time. The resulting hydropathy profiles are typically examined by eye for significant features. Efforts have been made to improve and automate this process. For example, the web-based programs TMPRED, TMHMM and MEMSTAT identify and count putative transmembrane helices, and suggest their orientation in the membrane [7–11]. These programs are effective when applied to individual amino acid sequences, but no software tools are available to automatically analyse the pattern of putative transmembrane domains (secondary structure).

Fig. 1. Schematic diagram showing that the pattern of transmembrane domains is conserved within a functional class of membrane protein. (A) LGICs typically have a closely spaced cluster of three transmembrane domains (dark bars) and a fourth well-separated domain. This secondary structure is conserved across the cys-loop superfamily and the iGluR superfamily, even though there is no sequence homology between these families. Selected subunits from both families are shown. (B) Distantly related voltage-gated channels also exhibit a characteristic pattern of transmembrane domains. Channels are formed by four groups of six transmembrane domains. Within each group, the first five transmembrane domains are closely spaced, with the sixth domain separated by a relatively long extracellular loop.

Fig. 2. The highly conserved secondary structure of LGICs is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of the human AChR alpha-1 subunit reveals a typical cluster of three peaks bracketed by deep valleys. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. Peaks located at < 20 residues are likely to be cleaved signal sequences and are ignored. (B,C) A similar pattern of peaks and valleys is seen in the profiles of the GABAA receptor alpha-1 subunit and glutamate receptor GluR1 subunit. (D) A human connexin subunit also exhibits four hydropathy peaks, but they are organized in a different pattern. The peaks occur in two pairs separated by a deep valley.

A method for alignment of hydropathy profiles has been developed [20,21], and an experimental web-based server uses this approach to align pairs of sequences submitted by the user, or to search a database for hydropathy profiles that match a submitted sequence (Bioinformatics Unit, Weizmann Institute of Science). At present, it is limited to the SwissProt database, and to the Hopp–Woods or Kyte–Doolittle hydrophobicity scales. In principle, this approach can be used to search for proteins with conserved secondary structure, but there are technical issues that limit its performance. For example, a profile with a similar pattern of peaks, but differently shaped peaks and valleys, may be missed. It is equally sensitive to mismatches in both peak (transmembrane) and valley (intra- and extracellular loop) regions, even though evolutionary changes in valley shape will have relatively little effect on secondary structure.

In this paper we develop and test a new automated proteome search technique. Every member of a polypeptide database is converted to a hydropathy profile, hydropathy peaks are automatically detected, and the pattern of peaks is compared with a template. Sequences that match the template are output to a new database, and their profiles are displayed in a convenient format. This approach can be used to search for new members of a family or functional class of membrane protein. It can assist with functional analysis, and may also be useful in proteome database annotation.

METHODS

An algorithm was developed for searching a large polypeptide sequence database for proteins that are likely to be new members of a functionally related family of membrane proteins. The program runs on a personal computer, and the analysis of an organism’s total proteome takes about 1 min. The test is applied to the hydropathy profile of each sequence. A standard (Kyte–Doolittle) algorithm [5,6] is used to convert a sequence into a profile. The amino acids are each assigned a hydropathy value based on experimental measures, and the resulting profile is filtered to reduce noise. We chose a set of hydropathy values and a filter width that are near-optimal for detection of transmembrane regions [6]. The filter function is a rectangular averaging window (box-car filter) with a length of 17 amino acid residues. With these settings, the amplitude of the peak produced by a transmembrane α-helix is typically in the range 1–3 hydrophobicity units (HU) (Fig. 2). For example, the four transmembrane domains are clearly visible in the hydropathy profiles of three different ligand-gated ion channels (LGICs) (Fig. 2A–C) and the connexin alpha-1 subunit (Fig. 2D).
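The conversion of a sequence into a smoothed hydropathy profile can be sketched in a few lines of Python. This is only an illustration, not the AxoGraph plug-in used in the study: it assumes the standard published Kyte–Doolittle scale (the exact value set chosen by the authors is not listed in the text), the function name hydropathy_profile is hypothetical, and the treatment of the sequence ends (no padding) is an assumption.

```python
# Standard Kyte-Doolittle hydropathy scale (assumed; the paper does not list
# the exact values it used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=17):
    """Return the box-car averaged hydropathy profile of a sequence.

    Element i is the mean hydropathy of residues i .. i + window - 1, so the
    profile has len(sequence) - window + 1 points (no edge padding, which is
    an assumption). Unknown residue codes are scored as 0.0.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])          # running sum of the current window
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one
        profile.append(total / window)
    return profile
```

A run of 17 or more leucines gives a plateau near 3.8 HU, whereas real transmembrane helices, which mix strongly and weakly hydrophobic residues, fall in the 1–3 HU range quoted above.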

Peak detection

Each polypeptide sequence in a database is subject to a series of three tests. The first test simply rejects the sequence if it is too short or too long. The range of acceptable lengths is determined from known members of the membrane protein family, but this restriction can be relaxed if necessary. Membrane proteins always have both hydrophobic and hydrophilic regions, so profiles that do not cross both an upper and lower threshold are also rejected. These thresholds are the same as those used for peak detection (Fig. 2). Next, a simple peak-detection procedure is applied to each hydropathy profile, resulting in an estimate of the number and the locations of putative transmembrane helices. The algorithm identifies a peak when the profile rises from below a base threshold, crosses above a peak detection threshold, then crosses back below both the peak and base thresholds. In Fig. 2, the peak and base thresholds are indicated with the upper two dashed lines. Different threshold settings are used depending on the target protein. For example, the base threshold selected for LGICs is higher than for connexins (Figs 2A–D). The location and amplitude of each peak are measured at the maximum point between the two peak threshold crossings. The width of each peak is measured between the two base threshold crossings. This gives a more consistent result than measuring the width at the peak threshold level. The location and amplitude of each valley minimum are also measured.
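The two-threshold rule described above can be written as a small state machine. The sketch below is one possible reading of the text, not the authors' code: the name detect_peaks and the dictionary fields are hypothetical, a profile that starts above the base threshold is treated as having risen at index 0, and an excursion that has not returned below the base threshold by the end of the profile is discarded.

```python
def detect_peaks(profile, peak_thr, base_thr):
    """Detect peaks using a base threshold and a higher peak threshold.

    A peak is recorded when the profile rises above base_thr, exceeds
    peak_thr at some point, and then falls back to or below base_thr.
    Location and amplitude are taken at the maximum; width is measured
    between the two base-threshold crossings.
    """
    peaks = []
    start = None   # index of the upward base-threshold crossing
    best = None    # index of the highest point seen since `start`
    for i, h in enumerate(profile):
        if h > base_thr:
            if start is None:
                start, best = i, i
            elif h > profile[best]:
                best = i
        elif start is not None:
            # The profile has dropped back below the base threshold.
            if profile[best] > peak_thr:
                peaks.append({"location": best,
                              "amplitude": profile[best],
                              "width": i - start})
            start = best = None
    return peaks


# Usage with the LGIC thresholds quoted later in the text (1.1 and 0.8 HU):
example = [0.0, 0.9, 1.5, 2.0, 1.2, 0.5, 0.0, 0.9, 1.0, 0.2]
print(detect_peaks(example, peak_thr=1.1, base_thr=0.8))
# prints [{'location': 3, 'amplitude': 2.0, 'width': 4}]
```

The second bump in the example rises above the base threshold but never reaches the peak threshold, so it is not counted.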

Comparing a profile to a template

After the peaks and valleys are identified, a test is performed to determine whether they conform to a template. The simplest test is to count the peaks and ask whether this number falls within a specified range. The peak count may be adjusted by rejecting narrow peaks, or by counting a broad peak as two merged peaks. For example, when the base threshold is set below zero, the majority of transmembrane regions will produce a peak that is wider than 10 residues. If the width of a peak is > 30 residues it is possible that two or more closely spaced transmembrane regions have produced a single peak in the hydropathy profile. A peak located within the first 20 residues is likely to be a signal sequence (destined in most cases to be cleaved from the mature protein), and can optionally be removed from the peak count (Fig. 2A). Sometimes a false hydropathy peak is detected at a location that is not a transmembrane domain, and true transmembrane peaks are occasionally missed. Thus, when searching for proteins with four transmembrane domains, a profile with three to five peaks would typically be accepted.

If the number of peaks falls within the specified range, then more sophisticated template-matching tests can be applied. For example, the separation between adjacent peaks (interpeak intervals) can be calculated. A candidate profile can be rejected if the interpeak intervals fall outside the specified ranges. Another strategy is to scan for a particular feature, such as a closely spaced cluster of peaks bracketed by deep valleys. A strategy of this type is developed below for detecting ligand-gated ion channels.
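The peak-count adjustments and the interpeak-interval test can be illustrated as follows. This is a hedged sketch: the function names are hypothetical, peaks are assumed to be dictionaries with "location" and "width" fields, and the default limits (10, 30 and 20 residues) are simply the example figures from the text rather than fixed settings of the program.

```python
def adjusted_peak_count(peaks, min_width=10, split_width=30, signal_limit=20):
    """Count peaks after the optional adjustments described in the text."""
    count = 0
    for p in peaks:
        if p["location"] < signal_limit:
            continue                      # probable signal sequence, ignored
        if p["width"] < min_width:
            continue                      # too narrow to be counted
        # A very broad peak is counted as two merged transmembrane domains.
        count += 2 if p["width"] > split_width else 1
    return count


def interpeak_intervals(peaks):
    """Return the separations between adjacent peak locations."""
    locs = [p["location"] for p in peaks]
    return [b - a for a, b in zip(locs, locs[1:])]


def intervals_within(peaks, ranges):
    """True if every interpeak interval falls inside its (low, high) range."""
    intervals = interpeak_intervals(peaks)
    return (len(intervals) == len(ranges) and
            all(lo <= d <= hi for d, (lo, hi) in zip(intervals, ranges)))
```

For a family with four transmembrane domains, a candidate would then be kept when 3 <= adjusted_peak_count(peaks) <= 5, which tolerates one missed or one false peak.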


Designing and refining a template

When designing a search strategy, the peak detection thresholds and the selection parameters are adjusted with the dual goals of maximizing detection and minimizing false-positives. The first goal is achieved by applying the algorithm to a sequence database containing all proteins that belong to the family of interest. The parameters are refined by trial and error until almost all members of the family are selected. Next, the same set of search parameters is applied to a database containing unrelated membrane protein sequences. If necessary, the parameters are fine-tuned until all members of the unrelated family are rejected. Finally, the search procedure is applied to a large database, for example one containing the proteome of an organism.
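The two tuning goals can be tracked with a pair of simple ratios computed on the family database and on the unrelated database. The helper below is a minimal sketch under stated assumptions: evaluate_template is a hypothetical name, and accepts stands for any function that returns True when a profile matches the current template.

```python
def evaluate_template(accepts, family, unrelated):
    """Return (sensitivity, false_positive_rate) for a template predicate.

    family    : items that should be accepted (known family members)
    unrelated : items that should be rejected (unrelated membrane proteins)
    """
    sensitivity = sum(1 for item in family if accepts(item)) / len(family)
    false_rate = sum(1 for item in unrelated if accepts(item)) / len(unrelated)
    return sensitivity, false_rate


# Toy example in which each "profile" is reduced to its peak count and the
# template accepts three to five peaks.
accepts = lambda n_peaks: 3 <= n_peaks <= 5
print(evaluate_template(accepts, family=[4, 4, 3, 5, 6], unrelated=[1, 2, 4, 7]))
# prints (0.8, 0.25)
```

In a trial-and-error loop, the thresholds would be changed and the two numbers recomputed until the first is close to 1 and the second close to 0.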

The search algorithm and several related utilities were written using a development environment that is built into AxoGraph (Axon Instruments, CA), a scientific data analysis and graphics program for Macintosh computers (http://www.axon.com/CN_AxoGraph4.html). The AxoGraph plug-in programs that implement the search algorithm are available on request, or from http://johnc3.anu.edu.au/proteomic_plugins.sea. AxoGraph was chosen for this study because it can plot and overlay several thousand hydropathy profiles in a single window, and analyse them in a single operation. It also has convenient features for browsing and organizing the large number of profiles generated by the search algorithm.

RESULTS

A search strategy was designed for LGICs. The strategy was refined by applying it to custom polypeptide databases, and tested by applying it to a database containing the complete human proteome. This database was chosen because it is well annotated, which aids in the assessment of the algorithm’s performance. The results presented below are essentially a proof of concept. In general, this technique will be more useful when applied to a database that is not complete or well annotated.

Search strategy for LGICs

The following procedure was used to develop the search strategy for LGICs. First, a custom database containing two members of the cys-loop superfamily was constructed. ACh receptors (AChRs) and glycine receptors (GlyRs) were selected using a text search of the Entrez database. Truncated sequences, duplicate sequences and sequences that were not LGICs were removed manually. This left 119 unique, full-length sequences from many different animal species (including human, chicken, frog, fish, locust, fruit-fly and nematode); these were converted to hydropathy profiles in AxoGraph. Features common to all of the profiles were identified by eye. AxoGraph’s convenient browsing features aided in this task. Every profile had a cluster of three peaks located approximately 200–300 residues from the start of the sequence (Fig. 2A). Each of the three peaks had an amplitude of 1–2 HU, and the cluster of peaks was bracketed with deep valleys extending below −2.5 HU. The cluster of three peaks was followed by a fourth peak close to the end of the profile.

Based on these observations, and following a period of trial-and-error refinement, the following selection criteria were chosen. Only sequences with lengths between 300 and 1800 residues were accepted. A peak threshold of 1.1 HU and a base threshold of 0.8 HU reliably detected all four peaks in every profile. However, some of the peaks were measured as very narrow (only two residues) because the base threshold was set relatively high. Therefore, narrow peaks were not rejected. A putative transmembrane domain occasionally appeared as two narrow peaks. Therefore, a pair of peaks separated by fewer than six residues was counted as a single peak. We noted that the first and last peaks in the characteristic cluster of peaks were separated by between 55 and 66 residues. Thus, the template criterion for a LGIC was the presence of a cluster of three peaks separated by between 50 and 75 residues, bounded by deep valleys of < −2.5 HU. The cluster had to be followed by at least one additional peak, but no more than three peaks.
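The LGIC template can be expressed as a scan over consecutive triplets of peaks. The sketch below rests on several assumptions that the text leaves open: the 50–75 residue window is applied to the distance between the first and last peak of the triplet, each bracketing valley is the profile minimum between the triplet and the neighbouring peak (or the start of the profile), and peaks closer than six residues have already been merged. The name matches_lgic_template and its parameters are hypothetical.

```python
def matches_lgic_template(profile, peak_locs, seq_len,
                          length=(300, 1800), span=(50, 75),
                          valley_thr=-2.5, trailing=(1, 3)):
    """Return True if any triplet of consecutive peaks fits the LGIC template.

    profile   : list of hydropathy values
    peak_locs : sorted peak locations (indices into profile), already merged
    seq_len   : length of the polypeptide in residues
    """
    if not length[0] <= seq_len <= length[1]:
        return False
    for i in range(len(peak_locs) - 2):
        first, last = peak_locs[i], peak_locs[i + 2]
        n_after = len(peak_locs) - (i + 3)       # peaks following the triplet
        if not span[0] <= last - first <= span[1]:
            continue
        if not trailing[0] <= n_after <= trailing[1]:
            continue
        # Valley regions run out to the neighbouring peak or the profile edge.
        left_start = peak_locs[i - 1] if i > 0 else 0
        right_end = peak_locs[i + 3] if n_after else len(profile) - 1
        left_valley = min(profile[left_start:first + 1])
        right_valley = min(profile[last:right_end + 1])
        if left_valley < valley_thr and right_valley < valley_thr:
            return True
    return False
```

Scanning all triplets, rather than only the first three peaks, lets a profile with an extra leading peak still match, which is consistent with the retrieval of channels and transporters reported in the Results.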

Testing the LGIC search strategy

A search of the AChR and GlyR database using the above detection criteria correctly retrieved every one of the 119 profiles. Thus, the search strategy exhibits excellent sensitivity, as it was able to detect 100% of known GlyR and AChR across a range of species.

The accuracy and sensitivity of the search strategy were tested by applying it to a custom database containing GABAA receptor sequences retrieved via a text search of the Entrez database. GABAA receptors are also members of the cys-loop superfamily, but they were not used during the selection and tuning of the search parameters. The algorithm retrieved 39 out of 41 sequences (95%), demonstrating excellent sensitivity for proteins that are related in both function and sequence to the target group.

Next, the selectivity of the search strategy was examined. We chose two families of integral membrane proteins which are functionally distinct from LGICs, but which also have four transmembrane domains. A custom database of known and putative connexins and innexins was constructed using a series of text searches of the Entrez database. The search algorithm was applied to the database and retrieved only one out of 122 sequences. Thus, the LGIC search strategy exhibits good selectivity.

The entire human proteome (Entrez) was searched and 153 profiles with a receptor-like triplet of peaks were retrieved. Of these, 105 (70%) were annotated as known or putative receptors. As expected, many of these were GlyR or AChR (31). Other members of the cys-loop superfamily were also identified, including receptors for GABA (18) and serotonin (5). Of particular note, 13 members of the iGluR superfamily were also retrieved, including the N-methyl-D-aspartate (NMDA) and kainate receptor subtypes. Thus, the search algorithm succeeded in its central goal of identifying proteins that were functionally related to the target group (GlyR and AChR), but were not related by sequence homology.

Of the profiles that were not annotated as receptors, six were voltage-gated potassium channels and two were transporters. They were retrieved because they contained six or seven transmembrane domains, three of which formed a cluster separated by deep valleys (Fig. 3A). It was noted that the valleys between the triplet peaks were usually deeper for potassium channels and transporters than for LGICs. The receptor detection algorithm was refined to eliminate profiles where the deeper of the two valleys between the triplet peaks extended below −1.5 HU. This refined algorithm was still able to detect 99% of known GlyR and AChR. It retrieved 87 profiles from the human proteome, of which 90% were receptors. Although this refined search procedure increased the selectivity for receptors, it also failed to retrieve any iGluRs. This illustrates the inevitable trade-off between the selectivity of the search algorithm and the likelihood of detecting distantly related functional homologues.
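The refinement amounts to one extra test on the two valleys inside the triplet. A minimal sketch, assuming the triplet is given as three peak locations that index into the profile (the function name inner_valleys_ok is hypothetical):

```python
def inner_valleys_ok(profile, triplet, inner_thr=-1.5):
    """Return False if the deeper valley inside the triplet drops below inner_thr.

    triplet : (first, middle, last) peak locations, indices into profile.
    The deeper of the two inner valleys is the minimum of the profile
    between the first and last peaks of the triplet.
    """
    first, _middle, last = triplet
    deepest = min(profile[first:last + 1])
    return deepest >= inner_thr
```

A profile would then be accepted only if it passes both the original triplet template and this inner-valley check, which is how the refined search trades sensitivity to iGluRs for selectivity.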

The search strategy’s sensitivity to membrane proteins that were related to the target group by function, but not by sequence, was investigated further. A custom database containing 84 sequences from the iGluR superfamily was constructed using Entrez. It included the NMDA, kainate and α-amino-3-hydroxy-5-methyl-4-isoxazole propionate (AMPA) receptor subtypes. These receptors are functionally related to GlyRs and AChRs, but share almost no sequence homology. Also, iGluRs are thought to form tetrameric channels, in contrast with the cys-loop superfamily that forms pentameric channels. Despite these differences, the search algorithm retrieved 30 sequences (36%) from the iGluR database. By subtype, 90% of the kainate receptors in the database were detected, but only 36% of the NMDA receptors, and 1% of the AMPA receptors. Examination of the AMPA receptor hydropathy profiles revealed that the peak associated with their second membrane-associated domain did not reach the peak threshold in most cases. A small reduction in this threshold would have resulted in many more AMPA and NMDA receptors being retrieved. Nevertheless, these results demonstrate the remarkable sensitivity of the original search strategy for membrane proteins that are related to AChRs only by function.

Candidate LGICs retrieved by the search strategy

Four proteins with receptor-like profiles from the second search were annotated as having no known or putative function. In principle, these could be novel receptors, so we examined them in greater detail. The profile with accession number AAF86374 is a member of the ancient conserved domain protein family (ACDP), which has sequence elements conserved from nematode to human. Intriguingly, its secondary structure is very similar to that of a LGIC, with a clear triplet of peaks followed by a well-separated fourth peak (Fig. 3B). It has a shorter section preceding the triplet than a typical receptor, but it is reasonable to speculate that it is a membrane protein, and possibly an ancient ion channel or receptor. The next two profiles came from an uncharacterized membrane protein expressed in the hypothalamus (accession numbers NP_060945 and AAG09678). These proteins had six or possibly seven transmembrane domains and are unlikely to be receptors, but could be novel transporters or voltage-gated channel subunits (Fig. 3C). The profile BAA18909 is simply annotated 'unknown', but a BLAST search revealed weak homology with a section of an intrinsic factor-vitamin B12 receptor. The profile is quite similar to a typical LGIC, although a small narrow peak precedes the main triplet (Fig. 3D). These findings demonstrate how the hydropathy peak detection algorithm may be used to search for truly novel members of a functional class of membrane proteins.

Fig. 3. Hydropathy profiles of four proteins that were retrieved from the human proteome by a search strategy designed to detect LGICs, but were not annotated as receptors. (A) A voltage-gated potassium channel was incorrectly retrieved because its first two hydropathy peaks fell just below the detection threshold. Potassium channels typically have a cluster of five peaks followed by a sixth well-separated peak. Note that although only one peak following the valley is highlighted, the template will accept up to three peaks. (B) An ancient conserved domain protein with no known function was retrieved because of its receptor-like cluster of three transmembrane peaks bracketed by deep valleys. The separation between the cluster and the fourth peak was larger than for a typical LGIC, but otherwise the secondary structure is strikingly similar. (C) An uncharacterized hypothalamus protein is unlikely to be a LGIC, despite the fact that it is expressed in a brain region. It has two or three extra peaks before and after the triplet, giving it a secondary structure that has more in common with a voltage-gated channel or a transporter. (D) A retrieved protein that was simply annotated 'unknown', but which has weak sequence homology with an intrinsic factor-vitamin B12 receptor.

Search strategy for neurotransmitter/Na+ symporters

To demonstrate that our approach can be applied to other functional classes of membrane protein, we developed a search strategy for the neurotransmitter/Na+ symporter (NSS) family. A custom database was constructed containing 40 GABA and dopamine transporters, which have 10–12 putative transmembrane domains. The corresponding peaks in the transporter profiles could be detected using a peak threshold of 1.4 HU and a base threshold of 0.6 HU. The minimum peak width was set to 10, and peaks with a width of up to 60 residues were accepted. Profiles were accepted only if they had between 10 and 13 peaks, arranged as a pair of peaks, followed by a deep valley (< −1.9 HU), then a cluster of 8–11 peaks, extending over no more than 300 residues (Fig. 4A,B). It is likely that the initial pair of peaks actually represents three transmembrane domains. The second peak was typically 40 residues in width, and is probably produced by two closely spaced transmembrane domains. This search strategy identified all 40 of the targeted NSS transporter profiles.
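The NSS template can be sketched in the same style as the LGIC template. Several details below are assumptions rather than statements from the text: peaks outside the 10–60 residue width range are dropped before counting, the deep valley is the profile minimum between the second and third peaks, and the extent of the cluster is measured between its first and last peak locations. The name matches_nss_template is hypothetical.

```python
def matches_nss_template(profile, peaks, width=(10, 60), total=(10, 13),
                         cluster=(8, 11), valley_thr=-1.9, max_extent=300):
    """Return True if the peaks form a pair, a deep valley, then a cluster.

    profile : list of hydropathy values
    peaks   : list of dicts with "location" and "width", sorted by location
    """
    kept = [p for p in peaks if width[0] <= p["width"] <= width[1]]
    if not total[0] <= len(kept) <= total[1]:
        return False
    locs = [p["location"] for p in kept]
    pair, rest = locs[:2], locs[2:]          # leading pair, then the cluster
    if not cluster[0] <= len(rest) <= cluster[1]:
        return False
    # A deep valley must separate the pair from the cluster.
    if min(profile[pair[1]:rest[0] + 1]) >= valley_thr:
        return False
    return rest[-1] - rest[0] <= max_extent
```

Only the thresholds and the arrangement test differ from the LGIC case, which reflects the claim that the approach extends to other families by designing a new template.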

The entire human proteome (Entrez) was searched and 59 profiles with an NSS transporter-like pattern of peaks were retrieved. Of these, 51 were annotated as known or putative transporters (86%). As expected, many of these were NSS transporters (54%), but several other transporters were also identified, including Na+/Ca2+ antiporters (9%), Na+/glucose symporters (7%), K+/Cl− symporters (5%), Na+/nucleoside transporters (3%), and organic ion transporters (3%) (Fig. 4C). Thus, the search algorithm again succeeded in identifying proteins that were functionally related to the target group, but were not related by sequence homology.

DISCUSSION

We have developed and tested an algorithm that can scan a large polypeptide database, and retrieve membrane proteins on the basis of secondary structure rather than sequence homology. The algorithm locates putative transmembrane domains in each sequence, and tests whether their spatial pattern matches a template. In the past this process has been performed manually, by visual inspection of hydropathy plots generated one at a time. Our major innovation was to automate the process, and apply it on the proteome scale. A computer program performs the peak detection and template matching. The complete proteome of an organism can be scanned in about 1 min using a desktop personal computer. This represents a qualitative increase in the power of the technique, and it permits new questions to be addressed. An analogy may be drawn with modern sequence-based search programs, such as BLAST, which can scan multiple genomes. Although BLAST was directly based on earlier sequence analysis programs that could align small groups of sequences, its development opened an entirely new field.

In principle, our technique could be extended by complementing hydropathy peak detection with a more sophisticated analysis of the underlying sequence [8–12]. Several web-based programs use such an approach to improve the reliability with which transmembrane domains can be identified, and to predict topology. Incorporating additional sequence analysis into our technique would permit an orientation to be assigned to each transmembrane α-helix, which would assist structural analysis. However, the additional processing would substantially slow the search run, and it is unclear how much improvement would be achieved in practice. A recent study evaluated all of the current methods for predicting transmembrane domains, and found TMHMM to be the best performing program [13]. However, the standard Kyte–Doolittle algorithm, which forms the basis of our search technique, was a close runner-up. Some membrane proteins incorporate a hydrophobic pore-lining region that does not cross the membrane, but instead forms a beta hairpin structure that dips into the membrane then re-emerges on the same side [22]. These membrane-associated domains represent an important component of the highly conserved secondary structure of voltage-gated potassium channels, and similar hairpin structures may also be present in other membrane proteins [22]. A sophisticated α-helix-detection algorithm may reject or misinterpret such regions.

Fig. 4. The conserved secondary structure of neurotransmitter/Na+ symporters is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of a rat dopamine symporter reveals a pair of peaks followed by a deep valley, then a cluster of nine peaks. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. (B) A similar pattern of peaks and valleys is seen in the profile of a closely related rat GABA symporter. (C) A human Na+-independent organic anion transporter retrieved by the NSS symporter template exhibits a similar pattern of peaks, although it has no sequence homology with the neurotransmitter symporters.

Our approach is loosely analogous to a strategy that uses alignment of hydropathy profiles to search for conserved secondary structural features in polypeptide sequences [20,21]. This alignment technique is based on the same algorithm that is used in standard peptide and nucleotide sequence alignment, but is applied to sequences of hydropathy values. Profile alignment will generally provide a more stringent test for conserved structure than our template-matching approach. However, a more stringent test will be less likely to detect unusual or distantly related family members. For example, an LGIC containing a triplet of unusually high hydropathy peaks will be reliably detected by our approach, but will receive a low score in an alignment-based search. Another problematic issue for the alignment algorithm is what penalty should be assigned when introducing gaps into one or both profiles, and how this penalty should be weighted for transmembrane domains vs. extra-membrane loops.
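
The gap-penalty issue can be made concrete with a minimal sketch of a global, Needleman–Wunsch-style alignment applied to two hydropathy profiles. This is a generic illustration and not the published method of refs [20,21]: the pair score (a bonus minus the absolute hydropathy difference), the two gap penalties, the hydropathy level used to treat a position as transmembrane, and the function name are all assumptions.

```python
def align_profiles(p, q, loop_gap=1.0, tm_gap=3.0, tm_level=1.0, bonus=1.0):
    """Global alignment score of two hydropathy profiles.

    p and q are lists of hydropathy values. All scoring parameters are
    hypothetical and chosen only to illustrate the weighting problem.
    """
    def gap(h):
        # Assumption: skipping a hydrophobic (putative transmembrane)
        # position costs more than skipping a loop position.
        return tm_gap if h >= tm_level else loop_gap

    n, m = len(p), len(q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column and row: one profile aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap(p[i - 1])
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap(q[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similar hydropathy values score close to the bonus;
            # dissimilar values score negatively.
            pair = bonus - abs(p[i - 1] - q[j - 1])
            score[i][j] = max(
                score[i - 1][j - 1] + pair,       # align p[i-1] with q[j-1]
                score[i - 1][j] - gap(p[i - 1]),  # gap in q
                score[i][j - 1] - gap(q[j - 1]),  # gap in p
            )
    return score[n][m]
```

With these assumed parameters, an extra position in a loop lowers the score much less than an extra position in a hydrophobic segment, so the outcome of a search depends directly on how the two penalties are set.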

We tested the performance of the hydropathy alignment approach by submitting the sequence of the GlyR alpha-1 subunit to the web-based search engine http://bioinformatics.weizmann.ac.il/hydroph/, and analysing the first 200 sequences retrieved from the SwissProt database. Only 43% of these sequences were annotated as receptors, and all were close relatives of AChR (ACh, glycine and GABA receptors). No receptors for serotonin or glutamate were identified. Thus, hydropathy alignment is much less sensitive to distantly related functional homologues, and less selective for the membrane protein family of interest, than the template-matching approach.

We chose the human genome to test our search strategy, because the thorough annotations permitted a detailed assessment of the algorithm's performance. In practice, the hydropathy profile search tool will be more useful when applied to an actively growing proteome database that is not yet well annotated. The most important use for the technique will be to search for new members of established functional families of membrane proteins, especially those that are missed by standard sequence-based search techniques. We have demonstrated how this can be achieved for LGICs, and for neurotransmitter symporters. Other candidate families include voltage-gated ion channels, G-protein coupled receptors, connexins and a wide variety of transporters.

ACKNOWLEDGEMENTS

This work was supported by a Senior Research Fellowship from the Australian Research Council (J. D. C.) and an Australian Postgraduate Award (R. E. M.).

REFERENCES

1. Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C. & Herrmann, R. (1996) Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449.

2. Frishman, D. & Mewes, H.W. (1997) Protein structural classes in five complete genomes. Nat. Struct. Biol. 4, 626–628.

3. Wallin, E. & von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7, 1029–1038.

4. Deisenhofer, J., Remington, S.J. & Steigemann, W. (1985) Experience with various techniques for the refinement of protein structures. Methods Enzymol. 115, 303–323.

5. Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.

6. Engelman, D.M., Steitz, T.A. & Goldman, A. (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15, 321–353.

7. Jones, D.T., Taylor, W.R. & Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33, 3038–3049.

8. Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995) Transmembrane helices predicted at 95% accuracy. Protein Sci. 4, 521–533.

9. Cserzo, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673–676.

10. Sonnhammer, E.L., von Heijne, G. & Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol.

11. Tusnady, G.E. & Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. 283, 489–506.

12. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.

13. Moller, S., Croning, M.D. & Apweiler, R. (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653.

14. Hille, B. (1992) Ionic Channels of Excitable Membranes, 2nd edn. Sinauer Associates, Sunderland, MA.

15. Le Novere, N. & Changeux, J.P. (2001) LGICdb: the ligand-gated ion channel database. Nucleic Acids Res. 29, 294–295.

16. Landesman, Y., White, T.W., Starich, T.A., Shaw, J.E., Goodenough, D.A. & Paul, D.L. (1999) Innexin-3 forms connexin-like intercellular channels. J. Cell Sci. 112, 2391–2396.

17. Unger, V.M., Kumar, N.M., Gilula, N.B. & Yeager, M. (1999) Three-dimensional structure of a recombinant gap junction membrane channel. Science 283, 1176–1180.

18. Bennett, M.V., Barrio, L.C., Bargiello, T.A., Spray, D.C., Hertzberg, E. & Saez, J.C. (1991) Gap junctions: new tools, new answers, new questions. Neuron 6, 305–320.

19. Ganfornina, M.D., Sanchez, D., Herrera, M. & Bastiani, M.J. (1999) Developmental expression and molecular characterization of two gap junction channel proteins expressed during embryogenesis in the grasshopper Schistocerca americana. Dev. Genet. 24, 137–150.

20. Lolkema, J.S. & Slotboom, D.J. (1998) Estimation of structural similarity of membrane proteins by hydropathy profile alignment. Mol. Membr. Biol. 15, 33–42.

21. Lolkema, J.S. & Slotboom, D.J. (1998) Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins. FEMS Microbiol. Rev. 22, 305–322.

22. Wood, M.W., VanDongen, H.M. & VanDongen, A.M. (1995) Structural conservation of ion conduction pathways in K channels and glutamate receptors. Proc. Natl. Acad. Sci. USA 92, 4882–4886.
