
Antibody Engineering Protocols


DOCUMENT INFORMATION

Title: Antibody Engineering Protocols
Authors: George Johnson, Tai Te Wu, Elvin A. Kabat
Publisher: Humana Press Inc.
Subject: Immunology and Bioinformatics
Type: protocols
Year of publication: unknown
City: Totowa, NJ
Pages: 440


CHAPTER 1

SEQHUNT

A Program to Screen Aligned Nucleotide and Amino Acid Sequences

George Johnson, Tai Te Wu, and Elvin A. Kabat

1 Introduction

We have been collecting nucleotide and amino acid sequences of proteins of immunological interest, and aligning them in order to understand the structure and function relations of these proteins (1). To aid in organizing and analyzing this collection, a computer program called SEQHUNT was written.

SEQHUNT uses a preprocessed form of the database as its search data. SEQHUNT can pattern match nucleotide and amino acid sequences against the aligned data, pattern match phrases in the annotation fields of the sequences, and compare specified regions in similarly aligned sequences. The SEQHUNT program can be used only on a machine with the PL/PROPHET environment present and with the PL/PROPHET table representation of the database present. To allow greater accessibility to the matching capabilities of the program, a partial implementation of SEQHUNT is available via electronic mail.

2 Materials

The variable and constant regions of immunoglobulins and T-cell receptors for antigen, and the various domains of MHC class I and class II molecules, have been aligned (1). These aligned sequences and sequences of related proteins (1), together with new sequences published recently, have been stored in the NIH-supported PROPHET computer system (2,3) in the form of PL/PROPHET data tables. SEQHUNT uses this Kabat database (1) for its searching and region analysis.

From: Methods in Molecular Biology, Vol. 51: Antibody Engineering Protocols. Edited by S. Paul. Humana Press Inc., Totowa, NJ.

3 Methods

SEQHUNT is a computer program written in PL/PROPHET for use in the PL/PROPHET environment. The program performs three main types of analyses.

The first is matching. Given a nucleotide or amino acid sequence, a limit on the number of allowable mismatches, and the data tables to search through, SEQHUNT returns aligned matches of all sequences whose number of mismatches is equal to or less than the allowable number.

The second function of SEQHUNT allows searching for specified patterns in the sequence annotations. Name, antibody specificity, T-cell receptor classification, and reference fields may be searched for the desired pattern.

Third, the full implementation of SEQHUNT allows region analysis of any one or a number of sequence stretches in similarly aligned sequences, e.g., all immunoglobulin heavy (H) chains. The program queries for the given region, such as the entire light (L) chain variable region (positions 1-107) or a combination of several complementarity determining regions (CDRs), for example, CDRL1, CDRL2, and CDRL3 together. All sequences are called as search patterns, and the entire set of sequences is used as the search pool. Redundant matching is eliminated to reduce output. Any number of mismatches may be specified, although the output for mismatches above 1 or 2 is usually massive.

These three types of searches may be performed on nucleotide or amino acid data, and matching and annotation searches also may be performed on unaligned data.
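The first type of analysis, matching with a bounded number of mismatches, can be illustrated with a short Python sketch. It is not SEQHUNT itself, which is written in PL/PROPHET; the names `Hit`, `count_mismatches`, and `find_matches` are hypothetical, and the database is assumed to be a plain mapping from sequence name to sequence string.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str        # name of the database sequence
    start: int       # 0-based offset of the matching window
    mismatches: int  # number of differing positions


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ (case-insensitive)."""
    return sum(x != y for x, y in zip(a.lower(), b.lower()))


def find_matches(target: str, database: dict, max_mismatches: int) -> list:
    """Slide the target along every sequence and keep windows within the mismatch limit."""
    hits = []
    for name, seq in database.items():
        for start in range(len(seq) - len(target) + 1):
            window = seq[start:start + len(target)]
            d = count_mismatches(target, window)
            if d <= max_mismatches:
                hits.append(Hit(name, start, d))
    # Report in order of increasing mismatches, as in the program output described here.
    return sorted(hits, key=lambda h: (h.mismatches, h.name, h.start))
```

With `max_mismatches=0` the same function performs an exact search for a short segment, which is the kind of query used for the D-minigene examples.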

SEQHUNT, as written, must be called from the PL/PROPHET environment. To allow greater access to the program, an interface has been developed that allows specially formatted queries to be sent via electronic mail for processing. The interface supports all functions of the original SEQHUNT, except region analysis.

The nucleotide sequence pattern-matching capabilities of SEQHUNT are shown in Fig. 1. In this example, the nucleotide sequence to match (TARGET SEQUENCE) is the H-chain variable region of the IgM BALB/c murine monoclonal antibody (MAb) PR1 (4), which has specificity for the PR1 antigen on human prostate cancer cells and normal human prostate cells. This SEQHUNT search was restricted to 12 or fewer mismatches among the sequences of all H-chain variable regions of all species currently in the database. In Fig. 1, several sequences with 6, 7, or 11 mismatches are shown. They are listed in order of increasing mismatches. An upper-case base is a mismatch, and all lower-case bases are matches. Dashes are for alignment (1). To save space, several other sequences with fewer than 12 mismatches are not listed (see Notes 1 and 2 for other examples).
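The display convention described for Fig. 1 (upper-case for a mismatch, lower-case for a match, dashes for alignment) can be reproduced with a few lines of Python. This is a minimal sketch: the function name `mark_mismatches` is hypothetical, and leaving alignment dashes untouched without scoring them is an assumption, since the text does not say how SEQHUNT treats them.

```python
def mark_mismatches(target: str, candidate: str) -> str:
    """Render an aligned candidate: matches lower-case, mismatches upper-case."""
    if len(target) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for t, c in zip(target, candidate):
        if c == "-":
            out.append("-")          # alignment dash, left as is (assumption)
        elif c.lower() == t.lower():
            out.append(c.lower())    # match
        else:
            out.append(c.upper())    # mismatch
    return "".join(out)
```

For example, `mark_mismatches("gaggtg-aag", "gatgtg-cag")` marks the third and eighth positions in upper case.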

Figure 2 shows the results of a search of all H-chain variable regions for matches with a segment of the human D-minigene D2 (5). Human D-minigenes sometimes match segments other than the third CDR of human H chains (6). As shown in Fig. 2, a segment of 14 nucleotides from human D2 is found in the second CDR of human, mouse, and rabbit H chains (see Note 3). For nucleotide sequences in the human CDRH3 region, additional matches are found on both sides of the 14 nucleotides. RF-SJ2 matches human D2 for 24 bases, ttgtagtggtggtagctgctactc, and L42 for 28 bases, ggatattgtagtggtggtagctgctact. The 14 matches in Fig. 2 are underlined. Usually, only short segments of human D-minigenes are incorporated into CDRH3s (7). When some of these short segments of the human D-minigenes, e.g., aactgg, a segment of DHQ52 (5), are searched, identical matches occur frequently over the entire H-chain variable region (Fig. 3).

An example of antibody specificity searching is shown in Fig. 4 for a SEQHUNT search called with the specified pattern "HIV," the abbreviation for Human Immunodeficiency Virus. Only a few of the matches are shown. The search was restricted to all H-chain variable region sequences in the database. SEQHUNT scans the antibody specificities and looks for exact matches with the "HIV" pattern. This search (Fig. 4) found antibodies directed against p24, gp120, and gp41. Even for the same protein, the numbers of dashes in the last three lines of the sequences are different, indicating that the length of H-chain CDR3 can vary more extensively than those of CDR1 and CDR2 (8). Most likely, the antibodies are directed toward different parts of one of the HIV proteins. Searches of the name and reference fields are also allowed.
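An annotation search of this kind amounts to an exact substring test on one field of each record. The sketch below assumes a simple record type; `Entry`, `search_annotation`, and the two toy entries are invented for illustration and are not taken from the Kabat database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    species: str
    specificity: str
    reference: str


SEARCHABLE = {"name", "specificity", "reference"}


def search_annotation(entries, pattern: str, field: str = "specificity"):
    """Return entries whose chosen annotation field contains the exact pattern."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsupported field: {field}")
    return [e for e in entries if pattern in getattr(e, field)]


# Toy records, invented for this example.
toy = [
    Entry("toy-1", "MOUSE", "ANTI-P24 OF HIV-1", "EXAMPLE REF A"),
    Entry("toy-2", "HUMAN", "ANTI-DIGOXIN", "EXAMPLE REF B"),
]
hiv_hits = search_annotation(toy, "HIV")
```

Here `hiv_hits` contains only `toy-1`; passing `field="reference"` or `field="name"` searches those fields instead.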

Fig. 1. Matching a nucleotide sequence of an H-chain variable region, positions 1-94. The sequence, PR1, is shown in row 1, labeled as TARGET SEQUENCE. Some of the sequences in the database with 6, 7, or 11 mismatches are listed in the order of increasing mismatches, as shown in column 2. Names of these sequences are given in column 1. Columns 4-8 indicate species, beginning position, ending position, antibody specificity, and reference, respectively.

Fig. 2. Matching segments of D-minigenes. The format of this figure is identical to that of Fig. 1. The TARGET SEQUENCE pattern consists of 14 bp from the human D2 minigene. It matches identically to nucleotide segments in the CDRH2 region of 914 (mouse), RVH720 (rabbit), and 1819 (human), as shown in rows 2, 3, and 4, respectively. It also matches human CDRH3 segments of RF-SJ2 and LA2, shown in rows 5 and 6. For details, see Section 3.1.

Fig. 3. Matching of a segment of the human D-minigene, DHQ52 (TARGET SEQUENCE), with different regions of mouse and human H chains.

Fig. 4. Immunoglobulin H chains with anti-HIV activity. This search looked for the pattern "HIV" in the antibody specificity field, as shown in column 4. Name, sequence, species, and reference for each sequence are listed in columns 1, 2, 3, and 5, respectively.


3.3 Region Analysis (see Notes 4 and 5)

Along with sequence, name, antibody specificity, and reference matching, the fully implemented SEQHUNT can perform region analysis on one or more stretches of sequence from any group of similarly aligned sequences. Figure 5 shows the partial output of a region analysis for [...] all species. This analysis was done allowing no mismatches. Identical matches with the same specificity are represented by a single entry. Sequences with no known specificity were omitted. Figure 6 is the output of a region analysis done on the three CDRs (positions 24-34, 50-56, [...]). [...] The symbol ( | ) delineates each CDR in consecutive order (CDR1 | CDR2 | CDR3 |).

As in Fig. 5, sequences with identical specificities are represented by a single entry, and those with unknown specificity are omitted. The same CDR associated with antibodies having different specificities had been noted previously (9). Similar instances were found for CDRL2, CDRL3, CDRH1, and CDRH2. In the case of CDRH3, a given sequence is nearly always associated with a unique antibody specificity (8).
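With no mismatches allowed, region analysis reduces to grouping aligned sequences by the content of the chosen positions and listing the distinct specificities in each group. The Python sketch below assumes each record is an (aligned sequence, specificity) pair and that regions are given as 1-based inclusive position ranges; `extract_region` and `region_analysis` are hypothetical names, and the toy records are invented.

```python
from collections import defaultdict


def extract_region(aligned: str, spans) -> str:
    """Join the requested 1-based inclusive spans, separated by '|'."""
    return "|".join(aligned[start - 1:end] for start, end in spans)


def region_analysis(records, spans) -> dict:
    """Group aligned sequences by identical region; collect distinct specificities."""
    groups = defaultdict(set)
    for aligned, specificity in records:
        if not specificity:   # sequences with no known specificity are omitted
            continue
        # A set keeps one entry per specificity, so identical matches collapse.
        groups[extract_region(aligned, spans)].add(specificity)
    return {region: sorted(specs) for region, specs in groups.items()}


# Toy aligned records, invented for this example.
records = [
    ("aaacccggg", "ANTI-X"),
    ("aaatttggg", "ANTI-Y"),
    ("aaaccctta", "ANTI-X"),
    ("cccgggaaa", ""),
]
result = region_analysis(records, [(1, 3), (7, 9)])
```

A region that maps to more than one specificity, such as `aaa|ggg` here, corresponds to the situation described for identical CDRs found in antibodies of different specificities.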

2 SEQHUNT is also useful for designing artificial antibodies with required specificities. Searches for a desired specificity can be made, and all known antibodies with that specificity can be found (see Fig. 4). Sequences of these antibodies or their segments may be used as starting materials for detailed designing (8). Thorough analyses of these sequences may provide some insight into the fine structures of interaction between antigen and antibody molecules.

3 The question of why segments of human D-minigenes occur at locations other than CDRH3 in H-chain variable regions is unanswered (6). Even more puzzling is the finding that these segments can appear in H chains of other species (see Fig. 3). Without SEQHUNT, such occurrences might never have been found.

Fig. 5 (partial output; caption not recovered in this copy)

aga tct agt cag agc ctt gta cac agt - aat gga aac act tat tta cat

9-40

ANTI-FLUORESCEIN (Ka=3.7X10EXP7)
BEDZYK,W.D., HERRON,J.N., EDMUNDSON,A.B. & VOSS,E.W.,JR. (1990) J. BIOL. CHEM., 265, 133-138

ANTI-INFLUENZA VIRUS HEMAGGLUTININ HYBRIDOMA
CATON,A.J., BROWNLEE,G.G., STAUDT,L.M. & GERHARD,W. (1986) EMBO J., 5, 1577-1587

ANTI-dsDNA (12%), ssDNA (90%), POLY(dT), POLY(dU)
SMITH,R.G. & VOSS,E.W.,JR. (1990) MOL. IMMUNOL., 27, 463-470

ANTI-DIGOXIN HYBRIDOMA (BINDING CONSTANT=6.7X10EXP9)
HUDSON,N.W., BRUCCOLERI,R.E., STEINRAUF,L.K., HAMILTON,J.A., MUDGETT-HUNTER,M. & MARGOLIES,M.N. (1990) J. IMMUNOL., 145, 2718-2724 (CHECKED WITH GENBANK 02/11/91)

VSl
ANTI-IGG1 MONOCLONAL AUTOANTIBODY (RHEUMATOID FACTOR)
SHLOMCHIK,M., NEMAZEE,D., VAN SNICK,J. & WEIGERT,M. (1987) J. EXP. MED., 165, 970-987

02
ANTI-IDIOTYPIC ANTIBODY AGAINST THE THYROTROPIN (TSH) RECEPTOR
TAUB,R., HSU,J.-C., GARSKY,V.M., HILL,B.L., ERLANGER,B.F. & KOHN,L.D. (1992) J. BIOL. CHEM., 267, 5977-5984

NQ19.16.37
ANTI-2-PHENYL OXAZOLONE HYBRIDOMA
BEREK,C., JARVIS,J.M. & MILSTEIN,C. (1987) EUR. J. IMMUNOL., 17, 1121-1129

Fig. 6. Same as Fig. 5, except that the regions searched are CDRL1, CDRL2, and CDRL3 together.

4 Matching sequences in a given region or combination of regions (Figs. 5 and 6) have provided a unique tool for studying the underlying mechanisms of antibody specificity. Based on the idea of random assortment of the six CDRs generating the antibody repertoire, a given CDR, e.g., CDRL1, should be associated with randomly assorted CDRL2, CDRL3, CDRH1, CDRH2, and CDRH3, which can lead to many different specificities, as illustrated in Fig. 5. However, CDRH3 seems exceptional, since a given CDRH3 sequence is nearly always associated with a unique specificity (9). Specificities associated with identical CDRL1, CDRL2, and CDRL3 together are more limited (see Fig. 6). If amino acid sequences are searched, more specificities will be found for a given CDR or combination of CDRs.

5 Our collection of nucleotide and amino acid sequences of proteins of immunological interest (1) is distinct from other databases because of the alignment of sequences. This alignment is essential for the study of the structure and functions of these proteins. SEQHUNT is the only computer program that can analyze this large aligned database. Additional features will be incorporated into the program when other analyses become important.

References

1. Kabat, E. A., Wu, T. T., Perry, H. M., Gottesman, K. S., and Foeller, C. (1991) Sequences of Proteins of Immunological Interest, 5th ed. US Department of Health and Human Services, NIH Publication No. 91-3242.

2. Raub, W. F. (1974) The PROPHET system and resource sharing. Fed. Proc. 33, 2390-2392.

6. Wu, T. T. and Kabat, E. A. (1982) Fourteen nucleotides in the second complementarity-determining region of human heavy-chain variable region gene are identical with a sequence in a human D minigene. Proc. Natl. Acad. Sci. USA 79, 5031-5032.

7. Taylor, L. D., Carmack, C. E., Schramm, S. R., Mashayekh, R., Higgins, K. M., Kuo, C.-C., Woodhouse, C., Kay, R. M., and Lonberg, N. (1992) A transgenic mouse that expresses a diversity of human heavy and light chain immunoglobulins. Nucleic Acids Res. 20, 6287-6295.

8. Wu, T. T., Johnson, G., and Kabat, E. A. (1993) Length distributions of CDRH3 in antibodies. Proteins Struct. Funct. Genet. 16, 1-7.

9. Wu, T. T. and Kabat, E. A. (1992) Possible use of similar framework region amino acid sequences between human and mouse immunoglobulins for humanizing mouse antibodies. Mol. Immunol. 29, 1141-1146.

10. Wilson, M. R., Middleton, D., and Warr, G. W. (1988) Immunoglobulin heavy-chain-variable region gene evolution: Structure and family relationships of two genes and pseudogenes in teleost fish. Proc. Natl. Acad. Sci. USA 85, 1566-1570.

11. Schroeder, H. W., Jr., Hillson, J. L., and Perlmutter, R. M. (1987) Early restriction of the human antibody repertoire. Science 238, 791-793.


[...] of free and antigen-bound antibodies. The number of reported antibody structures grows each year. Yet the number of structures deposited with the Brookhaven Protein Data Bank (PDB) (1) remains relatively small, with 43 deposited entries at the time of writing, when compared to the available sequence data. It is therefore important to develop an effective method of predicting the structure of antibody-combining sites. The validity of predicted structures can then be confirmed by mutagenesis in the combining site. The models can also provide valuable structural information to "humanize" antibodies for therapy effectively, to develop immunosensors, and even for the complete de novo design of new antibodies with different functions.

This chapter outlines the structure of antibodies and methods currently available for modeling antibodies. An example of the modeling of an anti-N-(p-cyanophenyl)-N'-(diphenylmethyl)guanidineacetic acid antibody (1CGS) is also provided (2). The method used to model this antibody is based on the CAMAL algorithm (3-9), which combines structural and ab initio approaches to determine antibody structure, and is embodied in the commercial version of the program AbM (10).


18 Webster and Rees

1.1.1 The Antibody Fold

Antibodies have a distinctive structure often depicted as a Y or T shape, with the two distal arms (Fab) containing the sites for antigen binding (Fig. 1). An antibody consists of two identical light (L) chains and two identical heavy (H) chains that fold into domains. The structure of the now classic immunoglobulin fold was established with the determination of the structure of a Fab fragment by Poljak and coworkers (11), and the presence of this fold in the Fc fragment was shown by Deisenhofer and coworkers (12). This fold and its variants have also been observed in nonantibody molecules, including T-cell receptors. The Fab contains a variable domain (VL/VH) and a constant domain (CL/CH1), with the two halves of each domain contributed by the L chain and the H chain.

Since the antibody contains two Fab arms, two antigen molecules may be bound by the same antibody. The constant domains for L and H chains are constant for their particular class. L chains may be one of two classes, κ or λ, and H chains, one of five classes, α, γ, δ, ε, and μ. The two Fabs are attached to the Fc region by a flexible hinge, giving the antibody an intrinsic flexibility.

1.1.2 The Variable Domain

The variable domains (VL/VH) associate noncovalently to form a twisted antiparallel β-sheet structure. Although the framework is well conserved between known antibody structures (Table 1A and B), variations in the packing of β-sheets and strands do occur. The sheets may vary in their orientation with respect to one another by as much as 30° ± 18° (13,14), and the strands at the interface are inclined to each other by [...] the most variable regions of the antibody both in sequence and in structure. These regions are known as the hypervariable or complementarity determining regions (CDRs). Each CDR interconnects a β-strand, with three CDRs (L1, L2, L3) derived from the L chain and three from the H chain (H1, H2, H3). This interconnection of the antiparallel β-strands brings the CDRs close together in space at the distal end of the antibody. Some or all of the CDR loops may be involved in antigen binding. Three classes of antibody-combining-site topology are recognized: a cavity type that typically binds haptens, a groove type that binds peptides, carbohydrates, or nucleic acids, and a planar type that binds proteins (16).

Antibody-Combining Sites 1 9

Fig. 1. A cartoon of an IgG antibody displaying two L chains and two H chains. These two chains fold to form a series of antiparallel β-sheet domains that are classed as either variable (VL and VH) or constant (CL and CH). Association of VL and VH domains gives rise to the Fv or variable region. The complementarity determining regions interspace the β-strands of this domain. A larger fragment is the Fab region, which contains the variable domain and a constant domain. The circle depicting the Fab is merely illustrative, since it does not include all of the Fab region. The Fc region consists of two constant domains. The hinge and elbow angles for the Fab are different, illustrating potential flexibility in this region.

2 Antibody Modeling

Antibody modeling has attracted increased interest over the past few years (3,8,9,17-20), in part owing to an explosion in the number of published antibody sequences (21), a gradual increase in availability of good crystal structures (1), and availability of fast workstations capable of addressing problems encountered in molecular modeling. The first attempts at antibody modeling were based on simple homology-based methods (22,23). Soon, rules began to emerge governing the conformations of short antibody loops (24,25). Later, improvements were made in the accuracy of longer loops. These methods can broadly be grouped into knowledge-based approaches (26-31) and ab initio-based approaches that include methods such as conformational searching (32-35), simulated annealing (36,37), multicopy sampling (38-40), and molecular dynamics (41,42).

Table 1. (A) L-Chain and (B) H-Chain Sequences of Antibodies Used for Framework Construction, Canonical Structure Construction, and Calculation of Cα Database Constraints (a)

[Sequence rows not recoverable. Column headings of part B: Structure, HFR1, H1, HFR2, H2, HFR3, H3, HFR4.]

(a) Sequence tracts in the framework regions used for fitting of structures for r.m.s.d. calculations are highlighted in dark gray.

Structures (reference numbers in parentheses): 1baf (94), 1bbd (95), 1bbj (96), 1dfb (97), 1fdl (98), 1fvc (99), 1ggi (100), 1hil (90), 1igf (101), 1igi (85), 1igm (102), 1mam (103), 1mcw (104), 1ncd (105), 1rei (106), 2f19 (107), 1fb4 and 2fb4 (108), 2fbj (109), 2hfl (86), 2mcp (110), 2rhe (111), 3hfm (112), 3mcg (113), 4fab (114), 6fab (115), 7fab (116), 8fab (117), glb2 (118).

2.1.1 Framework Construction

As outlined in Section 2.2, the antiparallel β-sheet structure forming the Fv β-barrel framework is well conserved in structure. Most modeling studies of antibody Fvs have relied on the conserved nature of the β-barrel framework to construct a scaffold on which the CDR loops are built. In this method, an Fv β-barrel framework most identical in sequence to the antibody structure being modeled is chosen as the starting structure. The method relies on the premise that the framework is conserved between different antibodies, but a recent analysis of a set of 12 antibody structures has revealed that the spatial orientation of all strands of the Fv framework may not always be conserved (4). Strands 1-6 are highly conserved, but strands 7 and 8 from the heavy chain are more variable. It is interesting to note that these strands interconnect CDRH3, the most variable CDR loop in both sequence and structure. Incorrect orientation of the framework strands can have important consequences for the construction of CDR loops, particularly where strand orientation affects the takeoff trajectories from the Fv framework. The AbM protocol described in Section 2.2.1 attempts to minimize this problem by selecting the most homologous light and heavy chain from the database of known structures. If these are not derived from the same antibody, the respective chains are fitted by their most homologous regions (see Table 1) to an averaged hyperboloid function derived from known antibody structures.
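The idea of choosing the framework "most identical in sequence" can be made concrete with a small Python sketch. It is not the AbM procedure: it assumes the framework sequences are already aligned to equal length, scores plain percent identity over ungapped columns, and uses the hypothetical names `percent_identity` and `pick_framework` with invented toy sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def pick_framework(query: str, candidates: dict) -> str:
    """Return the name of the candidate framework with the highest identity to the query."""
    return max(candidates, key=lambda name: percent_identity(query, candidates[name]))


# Toy framework fragments, invented for this example.
candidates = {"candidate_a": "DIVMTQSP", "candidate_b": "QSVLTPPP"}
best = pick_framework("DIVLTQSP", candidates)
```

In this toy case `best` is `candidate_a`, which differs from the query at one of eight positions. In practice the light and heavy chains would be scored separately, as the text describes.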

2.1.2 CDR Construction

Proteins are not static entities. They exist in a fluid aqueous environment in which backbone loops and side chains may adopt many well-packed conformations that are energetically feasible. The conformations of protein loops are dependent on their length, their packing with other loops or secondary structural elements, the formation of salt bridges, covalent bonds, hydrogen bonds, and their interactions with solvent. Many algorithms have been developed for the construction of protein loops, all of which are applicable to the modeling of CDR loops (26-42). CDRH3 presents unique problems because of the structural diversity of its loop conformations and takeoff angles from the Fv framework. Long H3 loops are particularly difficult to predict accurately. Since most H3 loops fall into this category (9), much effort has been invested in developing suitable construction methods to model this loop.

2.1.2.1 CANONICAL LOOPS

Although the conformations of the CDR loops vary, a relationship between the structure of loop conformations and loop length has been noted (26,27). Chothia and coworkers (28,29,43-47; see also 48,49) have established the concept of "canonical families" for five of the six CDR loops (L1-L3, H1, and H2). Canonical loops are defined on the basis of their length, and the position of key residues in the loop and in the framework (see Table 2). The distribution of canonical loops among the various classes used in the modeling protocol is shown in Fig. 2. Canonical loops adopt their conserved configuration as a result of their length, their packing with other CDR loops or part of the framework regions, the formation of conserved hydrogen bonding patterns, and even their ability to adopt unusual backbone configurations. Unfortunately, not all loops are canonical, and these loops must be constructed by other means. Useful as the canonical concept is, it has been noted that not all loops classified as canonical obey the "canonical rules" (4,50). CDRL1 of HyHEL10 (51) and REI (52), for example, both belong to canonical class 2 loops, but have different conformations owing to a 1-4 peptide flip between the central four residues (see ref. 4 for further examples). Furthermore, there is the question of whether all possible pairings of canonical loops are permitted in nature (6,10). It is possible that particular pairings of canonical loops may disrupt the association of VL-VH domains, thus limiting the repertoire of canonical pairings.

2.1.2.2 DATABASE/KNOWLEDGE-BASED METHOD

Table 2. The Canonical Loops L1, L2, L3, H1, and H2 Defined on the Basis of Length and the Occurrence of Canonical Residues at the Indicated Positionsᵃ

11glAfab 2fw,1fc4,2rhe Sfab

ZhR,2fbf,lbaf,glb2,3hfm,lre~.l~gm,2fl9,lman 6fab,l ncd,lfvc,2mcp.l h~l,lbM,4fab,l~gf,llgl

2fb4,lggl lbbf, ldfb, lfo4, lfdl, lmw3mcg

glb2,3hfm,lrs!,l~gm,2fl9,1mam.6fab,lfvc, 2mcp,lh~l.lbbd,4fab,l~gf,l~g~,lgg~, lbbf, lna lfdl ire!

m

2hR

2f19,1fd1,1~g1,1mam,2fb4,2fbf2hfl,2mcp,41ab 6tab,6fab,glb2,l!gm,l btd,lncd,ltgl.lfvc.l hll

1 bbf, ldfb 3hfm, 7fab lbaf

ᵃThe first residue of the canonical pattern corresponds to the first canonical in the list of key residues. Residues in square brackets indicate sites where more than one residue type may be present, and x(n) denotes the number of residues linking the preceding and succeeding residues. Residues in braces { } mean “not this residue” in that position. The AbM (10) numbering and Kabat et al. (21) numbering are shown for loops L1-H2. The numbering for H3 is 101-119 (AbM) and 95-102 (Kabat). See Table 1 for references.

Fig. 2. Distribution of the number of complementarity determining loops falling into canonical L1, L2, L3, H1, and H2 classes. Antibodies used for this analysis are listed in Table 1.

As the database of antibody structures has increased, the use of knowledge-based methods in determining loop conformations has gained in importance (3,31). The advantage of knowledge-based approaches is twofold. First, the starting structures are known to exist in nature. Second, these approaches are computationally more efficient in saturating conformational space when compared with ab initio methods. The method of Jones and Thirup (53), for example, identifies useful database loops by searching for loops that satisfy a set of α-carbon distance constraints (Fig. 3). This method is implemented in AbM and is described in more detail in Section 2.2.6. Other methods include those of Sutcliffe and coworkers (54), who use a high-resolution database to identify structurally conserved regions, and of Stanford and Wu (55), who generate backbones from an analysis of tripeptides from β-sheet proteins.
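A database search of the kind described above can be illustrated by a filter that keeps only those candidate loops whose α-carbon separations fall inside a set of allowed ranges. This is a minimal sketch, not the Jones and Thirup or AbM implementation: the function names, the dictionary layout of the constraints, and the coordinates used are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical constraint layout: (i, j) residue index pair -> (low, high) in Å
Constraints = Dict[Tuple[int, int], Tuple[float, float]]


def satisfies_constraints(ca_coords: Sequence[Point],
                          constraints: Constraints) -> bool:
    """True if every constrained Cα-Cα distance lies within its range."""
    for (i, j), (low, high) in constraints.items():
        d = math.dist(ca_coords[i], ca_coords[j])
        if not (low <= d <= high):
            return False
    return True


def search_database(candidates: Dict[str, Sequence[Point]],
                    constraints: Constraints) -> List[str]:
    """Return the names of candidate loops that satisfy all constraints."""
    return [name for name, coords in candidates.items()
            if satisfies_constraints(coords, constraints)]
```

Because only interatomic distances are compared, no superposition of the candidate on the target framework is needed at the screening stage.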

2.1.2.3 AB INITIO METHODS

Fig. 3. Cartoon depicting the Cα distance constraints used for the database search in AbM. The constraints are generated from the N- to C-terminal end and the reverse C- to N-terminal end.

These methods provide an alternative way to saturate conformational space. The generation of all possible loops may be accomplished by conformational search methods (33,56-58). One such method is the program CONGEN developed by Bruccoleri and coworkers (33). Conformational space is searched by rotation about the backbone φ and ψ dihedral angles, generating large numbers of loop conformations. The problem with this approach is that the number of loop conformations increases exponentially as the number of degrees of freedom (i.e., the length of the loop) increases. This method is only suitable, therefore, for small loops. Conformational searching has two advantages over random methods, such as Monte Carlo, simulated annealing (36,37), or molecular dynamics methods (41,42). First, conformational search methods search on a regular grid (in some cases, this may be a disadvantage, since discrete regular search steps may miss some conformations), unlike dynamics or Monte Carlo methods, which sequentially perturb one conformation into another by small increments and, hence, may sample the same space many times. Second, conformational searching does not entail the cost of determining energy derivatives as in molecular dynamics, and permits the examination of noncontinuous energy surfaces (33).
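The exponential growth of a grid search can be made concrete with a small count. The sketch below assumes that each residue contributes one φ and one ψ angle sampled on the same regular grid; the 30° step is an arbitrary illustrative choice, not a CONGEN setting, and the result is an upper bound because real searches discard sterically or energetically disallowed combinations.

```python
def grid_conformations(n_residues: int, step_deg: int = 30) -> int:
    """Upper bound on backbone conformations for a regular phi/psi grid."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    if step_deg <= 0 or 360 % step_deg != 0:
        raise ValueError("step_deg must be a positive divisor of 360")
    states_per_angle = 360 // step_deg
    states_per_residue = states_per_angle ** 2  # one phi and one psi
    return states_per_residue ** n_residues


for n in (3, 5, 7):
    print(n, grid_conformations(n))
```

With a 30° grid, three residues give about 3 million combinations and five residues about 62 billion, which is why unrestricted searching is practical only for short segments.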

The ab initio methods generate multiple conformations that must be evaluated at some stage by an objective function, usually consisting of an energy term. A problem exists in that many low-energy conformations are produced that differ only slightly from each other. The development of force fields and the inclusion of solvation models in free-energy calculations is an area of continuing development (59-64).


2.1.2.4 COMBINED METHOD

A combined approach to the construction of antibody loops was proposed by Martin and coworkers (3,65) to overcome some of the deficiencies of the above methods. Loops are constructed using database methods, and conformational searching is applied to the central region of the loop. When insufficient loops are found using database methods, the number of loops can be increased by conformational searching of the region of the loop that is most likely to be variable. When the loops are long, conformational searching alone is computationally intractable. By first building the base of the loop by database methods, the computational cost is kept within reasonable bounds. This method is described in Section 2.2.

The simplest and most cost-effective method of side-chain construction has been the use of rotamer libraries, which depend on a statistical distribution of side-chain χ angles from known protein structures. Analysis of such distributions has shown that side-chain dihedral angles cluster around preferred χ angles. This property has been exploited in a number of side-chain rotamer libraries (66-70). Where the structure of the backbone is known, homologous templates can be used, overlaying as many atoms as possible in the side chains that show correspondence (71,72). It should be noted that the accuracy of side-chain prediction is dependent on the availability of good backbone coordinates. With the increasing numbers of high-resolution structures in the PDB, rotamer libraries have been generated that take into account the preferred χ angle distribution of side chains with respect to particular backbone conformations (73-76). Reid and Thornton (77) used a rotamer library and performed manual adjustment of side-chain χ angles from the preferred distributions when clashes occurred, followed by energy minimization. The CONGEN method (33,78) used in the AbM modeling protocol (see Section 2.2.) searches conformational space by rotation about χ angles on a rotational grid concurrent with energy evaluation. Approximate solutions to the placement of side chains have been achieved using optimization techniques, such as simulated annealing (79,80). Genetic algorithms (70) have been used to search conformational space. Other approaches that attempt to include the influence of all side-chain positions are the use of the dead-end theorem (81,82), molecular dynamics (83), and self-consistent mean field theory (84).
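The basic idea of a rotamer library, trying the statistically preferred χ angles first and falling back to less common ones when a clash occurs, can be sketched as follows. The library entries and frequencies below are invented for illustration and do not come from any of the cited libraries; the clash test is supplied by the caller and stands in for whatever steric or energy check a real program would apply.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (relative frequency, chi angles in degrees); values are hypothetical
Rotamer = Tuple[float, Tuple[float, ...]]

EXAMPLE_LIBRARY: Dict[str, List[Rotamer]] = {
    "SER": [(0.5, (60.0,)), (0.3, (-60.0,)), (0.2, (180.0,))],
    "VAL": [(0.7, (180.0,)), (0.2, (-60.0,)), (0.1, (60.0,))],
}


def pick_rotamer(residue: str,
                 library: Dict[str, List[Rotamer]],
                 clashes: Callable[[Tuple[float, ...]], bool]
                 ) -> Optional[Tuple[float, ...]]:
    """Return the most frequent rotamer that does not clash, else None."""
    ranked = sorted(library[residue], key=lambda r: r[0], reverse=True)
    for _freq, chis in ranked:
        if not clashes(chis):
            return chis
    return None  # no library rotamer fits; manual adjustment would be needed
```

Returning `None` when every rotamer clashes mirrors the situation in which χ angles have to be adjusted away from the preferred distributions before minimization.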


Surface side chains, unlike core side chains, present particular problems, since they are unlikely to adopt a unique conformation owing to a lack of packing constraints and their accessibility to solvent. This is particularly true of the longer side-chain amino acids. Consideration must also be given to the packing of bulky hydrophobic residues found on the surfaces of proteins, often shielded from solvent by hydrophilic residues. Objective functions to evaluate side-chain positions would ideally contain terms that include the effects of solvent and hydrophobicity (59-64).

2.2 Methods for Modeling Antibody 1CGS

2.2.1 Modeling According to AbM Protocol

The modeling presented here was done with the commercial version of AbM v2.0 (10). Where deviations occur in the research version of the modeling program, these will be noted in the text. The AbM protocol takes a holistic view of available antibody construction methods and utilizes canonical structures, database, and conformational searching, or a combination of the database approach with conformational searching where appropriate. This approach takes advantage of the wealth of crystallographic information and maintains the ability to saturate space using

2.2.2 Methods and Test Example

The anti-N-(p-cyanophenyl)-N’-(diphenylmethyl)guanidine acetic acid antibody (1CGS) (2) was chosen for modeling. Its structure has been solved at an average resolution of 2.6 Å, and it contains different types of hypervariable loops that may be modeled using the canonical loop, database, and combined database/CONGEN approaches. This antibody has been chosen here to demonstrate the need for an understanding of antibody structure when interpreting the results of the automated assembly method. The 1CGS antibody was not present in the database of structures used for the construction of the model.

2.2.3 Sequence Alignment

The variable domain sequences of the L and H chains of 1CGS are aligned against a database of antibody sequences that contains 29 L-chain sequences and 24 H-chain sequences (Table 1). Insertions are introduced into the CDR regions at positions of highest sequence variability. The numbering scheme followed in this text is according to AbM, which differs from that of Kabat and coworkers (21). The numbering scheme is outlined in Table 2. Loop L1 is defined as six residues greater than the Kabat et al. definition, having a maximum length of 17 residues. CDRH2, with a maximum of 12 residues, is shorter than the Kabat et al. H2 definition, since residues 61-65 in H2 in the Kabat definition are conserved in structure. The AbM protocol allows redefinition of CDR regions if required.

2.2.4 Framework Construction

The interface of the VL/VH framework is known to be well conserved. Small variations in VL/VH orientations combined with differences in packing may cause large errors in the positioning of CDR loops on the framework, particularly when the incorrect orientation affects CDR takeoff points on the Fv. L and H chains are chosen based on greatest homology from a database of antibody structures. Where these chains are not derived from the same antibody, a fitting procedure is used to reconstruct a new Fv framework. The L and H chains of the 1CGS antibody show greatest homology with the corresponding subunits derived from two different antibodies. Therefore, the model framework for 1CGS was constructed as a chimera of the L chain from 1igi ([85]; 2.7-Å resolution) and the H chain from 2hfl ([86]; 2.54-Å resolution). As outlined in Section 2.1., the positioning of strands 1-6 is well conserved. The two chosen domains are least-squares fitted to an averaged hyperboloid based on 12 antibody structures (4) using the most conserved regions of strands 1-6 (Table 1). Strands 7 and 8 are not used in the fitting procedure. These take their position in the model from the fitted database structure. The sequence of the framework is adjusted to the model, and side chains are added using a template-based approach, where side-chain torsions of the model are adjusted to match the equivalent parent torsions of the crystal structure. Nonequivalent atoms are positioned using the iterative conformational search program CONGEN.
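The selection of a framework chain by greatest homology can be sketched as a ranking of prealigned database sequences. Percent identity is used here as a simple stand-in for the homology measure, which the text does not specify, and the template names and sequences in the usage note are hypothetical.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def best_template(target: str, database: Dict[str, str]) -> Tuple[str, float]:
    """Return (name, score) of the database chain most similar to the target."""
    scored = {name: percent_identity(target, seq)
              for name, seq in database.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]
```

Calling `best_template` once with the L-chain sequence and once with the H-chain sequence may return chains from two different antibodies, which is the situation that makes the fitting step necessary.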

2.2.5 CDR Construction

Because of the conserved structure of canonical loops, they are given highest priority in the construction process and are generated first. For short loops less than six residues in length, the conformational search method CONGEN is employed to saturate conformational space. For loops of greater length, conformational searching becomes restrictive in computer time, and either the database or the combined method is used. If sufficient numbers of database loops are found for loops of six or seven residues in length, the database method is preferred because of its computational efficiency. For loops greater than seven residues in length, a combined approach may be used to construct the base of the loop using a database of structures, followed by reconstruction of the central portion of the loop using the ab initio search method CONGEN. Each of the methods (CONGEN, database, or combined method) overlaps in its useful range with other methods. Thus, the choice of a particular method depends to some degree on the best judgment of the researcher. Side chains are constructed using a template-based approach in which nonequivalent atoms are positioned using the conformational search algorithms of CONGEN. The objective function of this procedure uses a solvent-modified energy term in which the electrostatic and attractive nonbond terms have been removed. The force field used in the energy evaluation is the Eureka force field of Osguthorpe and coworkers, which is derived from the CVFF force field (87,88).
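The length-based guidelines above can be summarized as a small decision function. This is only a sketch of the default logic as stated in the text; the function name and the `min_hits` threshold for "sufficient" database loops are assumptions, and, because the useful ranges of the methods overlap, a modeler may reasonably override the result.

```python
def choose_build_method(length: int,
                        is_canonical: bool = False,
                        database_hits: int = 0,
                        min_hits: int = 1) -> str:
    """Suggest a construction method for a CDR loop of the given length."""
    if is_canonical:
        return "canonical"   # canonical loops are built first
    if length < 6:
        return "CONGEN"      # short loops: full conformational search
    if length <= 7:
        # six or seven residues: database if enough loops were found
        return "database" if database_hits >= min_hits else "combined"
    return "combined"        # longer loops: database base + CONGEN center
```

For example, a 16-residue loop with three database hits is assigned the combined method, whereas a seven-residue loop with many hits is assigned the database method.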

As described in Section 2.1.3., side chains in protein cores are well packed and can be evaluated adequately using a simple energy function. However, surface side chains, owing to their lack of packing constraints and accessibility to solvent, may occupy many low-energy sites. The use of a simple energy term may not be appropriate in this case. The packing of bulky hydrophobic side chains is particularly important when they are found at the surface, since they are normally shielded from solvent by hydrophilic side chains. The research version of AbM (and future releases of the software) (see ref. 4) uses Monte Carlo simulated annealing with an energy function that contains a full nonbond term and a simple torsional term:

$$E = \sum \varepsilon \left[ \left(\frac{r_0}{r}\right)^{12} - 2\left(\frac{r_0}{r}\right)^{6} \right] + q\cos(3\omega) \qquad (1)$$
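An energy of this general kind, a full nonbond term plus a simple torsional term, can be evaluated as in the following sketch. The functional form is an assumption: a 12-6 nonbond term with well depth `eps` and minimum-energy separation `r0`, and a threefold cosine torsional term with coefficient `q`. The parameter names and default values are illustrative only and are not AbM parameters.

```python
import math
from typing import Iterable


def nonbond_energy(r: float, r0: float, eps: float) -> float:
    """12-6 nonbond energy; equals -eps at r == r0 (assumed form)."""
    ratio = r0 / r
    return eps * (ratio ** 12 - 2.0 * ratio ** 6)


def torsion_energy(omega_deg: float, q: float) -> float:
    """Threefold cosine torsional term (assumed form)."""
    return q * math.cos(math.radians(3.0 * omega_deg))


def total_energy(pair_distances: Iterable[float],
                 torsions_deg: Iterable[float],
                 r0: float = 4.0, eps: float = 0.1, q: float = 1.0) -> float:
    """Sum nonbond terms over atom pairs and torsional terms over dihedrals."""
    e_nonbond = sum(nonbond_energy(r, r0, eps) for r in pair_distances)
    e_torsion = sum(torsion_energy(w, q) for w in torsions_deg)
    return e_nonbond + e_torsion
```

In a Monte Carlo simulated annealing run, a function of this sort would be called after each trial change of side-chain torsions to decide whether the move is accepted.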


starting with the canonical loops. The remaining loops are generated from the outside of the combining site inward to the center, with CDRH3 always constructed last. For antibody 1CGS, CDR loop L1 was built onto the framework in the presence of the four canonical loop backbones. The AbM protocol offers the option to include side chains from the other CDRs, though this was not done in the present model. CDR loop H3 was built in the presence of the other five CDR loop backbones, after construction of the L1 loop (Table 3).

2.2.6.1 MODELING CANONICAL LOOPS L2-H2 (SEE NOTE 1)

Since the structure of the canonical loops L2-H2 is well conserved, these loops are constructed first. A canonical loop from the database of antibodies (Table 2) is chosen on the basis of sequence homology and placed onto the framework (see also ref. 4), taking into account the takeoff angle of the base of the CDR with the framework. Each loop is built on a bare framework in isolation from the other canonical loops. Side chains are constructed as described in Section 2.2.5.

2.2.6.2 MODELING THE L1 LOOP (SEE NOTE 2)

CDR loop L1 is constructed using the combined database/CONGEN method. The base of the loop is constructed using the database method, which utilizes information from known structures to saturate conformational space. Database searching has the advantage over purely ab initio-based approaches, such as those employed in CONGEN, of being computationally efficient as well as generating structures that are already known to exist in nature. This method uses a predetermined set of α-carbon distance constraints. These are derived from all of the CDR loops from known antibody structures. The α-carbon constraints specify the geometry of CDR loops of a particular length within a tolerance of ±3.5σ. An α-carbon database containing all structures from the Brookhaven PDB is searched for loops of the same length and geometric fit. Selected loops are clustered to eliminate redundant structures, and the central portion of the remaining loops is removed for construction by the CONGEN method.
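The derivation of one distance constraint from a set of known loops can be sketched as a mean and spread calculation. Reading the quoted tolerance as a multiple `k` of the standard deviation of the observed α-carbon distances is an assumption made here, as are the function name and the use of the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def derive_constraint(distances: Sequence[float],
                      k: float = 3.5) -> Tuple[float, float]:
    """Allowed (low, high) range for one Cα-Cα distance: mean ± k·SD."""
    m = mean(distances)
    s = pstdev(distances)  # population SD of the observed distances
    return (m - k * s, m + k * s)
```

One such range would be computed for every constrained residue pair of a given loop length, and the resulting set would then be used to screen the structural database.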

A 16-residue loop has very tight α-carbon distance constraints. Only three database loops were selected that conformed to the required geometry for the L1 loop of antibody 1CGS. The central five residues corresponding to [V(HSN)G] in the model were removed from each of the selected loops and reconstructed using CONGEN. Conformational space was searched by rotation of the backbone φ and ψ angles restricted by


Table 3. Outline of the Protocol for the Initial Modeling and Subsequent Remodeling of the L1 and H3 Loopsᵃ

CDR   Sequence                Length   Build method               Priority   Database hits   CONGENᵉ
Initial model
L1    RPSQSL[V(HS-N)G]NTYLH   16       Combined database/CONGEN   2          3               3567
Remodel
H3    GYSS-M                  5        Database search only       2          6798gc/41 lad   ᵇ

ᵃThe H3 loop was shortened at the base of the loop, and the five-residue loop reconstructed using the database method in the presence of the original L1 backbone and canonical loops. The shoulder region of the L1 loop was reconstructed in the presence of the remodeled H3 and canonical backbones using the combined database/CONGEN method. The square brackets [ ] indicate the region to be constructed, and the round brackets indicate the region of chain closure using the Go and Scheraga (89) algorithm.

ᵇDoes not apply.

ᶜTotal number of database hits.

ᵈNumber remaining after initial clustering.

ᵉNumber of loops after reconstruction of central portion using CONGEN.


Ramachandran energies of a specified cutoff value. Conformational searching was also used in side-chain construction (see Section 2.2.6.1.). Chain closure of the central three residues (HSN) was performed using the chain-closure algorithm of Go and Scheraga (89). Each of the resulting 3567 putative loops was screened using the solvent-modified energy function. The loops were clustered, and the five lowest energy unique conformations were selected. Loops derived from database searches were filtered by the structurally determining region algorithm of Sutcliffe and coworkers (54) (see ref. 41). If only a CONGEN search is made, then the lowest energy structure would be selected.
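The step of clustering the screened loops and keeping the five lowest energy unique conformations can be sketched as a greedy selection. The clustering scheme, the `cutoff` value, and the caller-supplied `distance` function (for example, a backbone R.M.S.D.) are assumptions; the text does not specify how AbM defines uniqueness.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # any representation of a conformation


def select_unique_lowest(conformations: Sequence[Tuple[float, C]],
                         distance: Callable[[C, C], float],
                         n_keep: int = 5,
                         cutoff: float = 1.0) -> List[Tuple[float, C]]:
    """Greedy selection of the n_keep lowest-energy, mutually distinct loops."""
    selected: List[Tuple[float, C]] = []
    # Visit conformations from lowest to highest energy
    for energy, conf in sorted(conformations, key=lambda item: item[0]):
        # Keep a loop only if it is far from every loop already kept
        if all(distance(conf, kept) > cutoff for _, kept in selected):
            selected.append((energy, conf))
            if len(selected) == n_keep:
                break
    return selected
```

Because candidates are visited in order of increasing energy, each cluster is represented by its lowest energy member.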

2.2.6.3 MODELING THE H3 LOOP

CDR loop H3 is generally the most difficult loop to model accurately because of variability in its structure and takeoff angle from the framework. It was noted in Section 2.1.1. that strands 7 and 8 interconnecting CDRH3 vary in their positioning in different antibodies. It is important to model the takeoff trajectory accurately, since small differences can produce large variations in the placement and, hence, the overall structure of the CDRH3 loop. The AbM protocol attempts to take this variability into account by defining four H3 families (8,9) that differ in length, the position of key residues, and loop takeoff angles. Currently, seven H3 structural classes are incorporated in our research version of the AbM program and will be included in the next planned release of the commercial version.

The CDRH3 loop of antibody 1CGS is seven residues in length (GYSSMDY). This loop was constructed using the database method without reconstruction of the central region. Because of the structural variability of H3 loops, the constraints on their modeling are weak. For CDRH3 of 1CGS, a large number of database loops were identified (29978). This number was reduced by the structurally determining region algorithm of Sutcliffe and coworkers (54), the loops were clustered, and evaluation of putative structures was as described in Section 2.2.6.2. A comparison of the initial model with the crystal structure is shown in Fig. 4.

Examination of the sequence of the H3 loop revealed an aspartic acid at position 105 (AbM numbering) and a conserved arginine at position 100 in the framework. In known structures containing this configuration, a salt bridge is formed between the two charged residues, tying down the


Fig. 4. Panels A and C show orthogonal views of the backbone of the initial

The smaller circle indicates the region near the N-terminal section of the loop where the chain deviates from the crystal structure. The larger circle indicates a bulge characteristic of H3 loops that do not make a salt bridge between Arg-212 and Asp-218 (AbM numbering). Panels B and D show orthogonal views of the remodeled H3 loop.

base of the H3 loop. The 1CGS modeled loop did not possess this salt bridge. In addition, the H3 loop displayed a bulge (Fig. 4) often seen when a salt bridge is absent at the base of the loop. These observations do not inspire confidence in the initial model of the H3 loop. The H3 loop was therefore remodeled in the presence of the other five loops by moving the region defined as framework by two residues (framework underlined):


Fig. 5. Comparison of the backbone of the final six modeled CDR loops. The crystal structure is depicted in white, and the model in dark gray. The region that showed a clash in L1 is circled (see text). All figures were aligned along the axis joining the N- and C-terminus and the axis entering the page and the plane of the loop orthogonal to the base of the page. The loops were then rotated, so that they are orthogonal to the first view. This representation does not bias the viewing in favor of any particular orientation.

known antibody structure. Construction of the shortened five-residue loop proceeded as before (Section 2.2.6.3.).

The region near the C-terminus of the L1 loop (Fig. 5) showed a small clash with part of a side chain from the light chain of the Fv framework. This region lends characteristic structural features to L1 loops that


Table 4. Comparison of the R.M.S.D. (Å) of the Model CDRs from the Crystal Structureᵃ

CDR   Initial modelᵇ   Remodel of H3ᵇ   Minimizationᶜ   Change after minimizationᶜ   Change from initial model   Minimized crystalᵈ
L1    1.63             →                1.23            -0.40                        -0.40                       0.51
L2    0.61             →                0.74            +0.13                        +0.13                       0.58
L3    0.94             →                1.20            +0.26                        +0.26                       0.53
H1    1.29             →                1.33            +0.04                        +0.04                       0.42
H2    1.18             →                0.97            -0.21                        -0.21                       0.43
H3    4.29             1.72             1.66            -0.06                        -2.63                       0.52

ᵃThe initial and remodeled structures are compared with the crystal structure. The final model was minimized, and further comparison was made with the crystal structure that had undergone the same minimization protocol. The modeled antibody is fitted to the crystal structure based on the framework regions outlined in Table 1, and R.M.S.D. values are reported for the loops constructed on this fitted framework.

ᵇComparison of model with crystal structure.

ᶜComparison of minimized model structure with minimized crystal structure.

ᵈComparison of crystal structure with minimized crystal structure.

become more prominent as the length of the loop increases. More structural variability is seen in this region than in the rest of the loop. Serious clashes in this region may indicate a need to alter the CONGEN-constructed loop to cover the region in which the clashing segment and the framework side chain are found:

3. The question of where to place the reconstruction region is still very much a subjective one. Good guidelines on this point have yet to emerge from the literature despite many attempts to define such rules. Under these circumstances, reconstruction of the CONGEN-built region would merely

2.2.9 Minimization

Construction of the framework, CDR loops, and side chains may result in some steric clashes, as noted above in Section 2.2.8. These clashes may


Fig. 6. Plot of the difference in φ and ψ angles of the initial and final models of CDR H3 loop. Differences in φ and ψ angles are calculated as positive if the change from crystal to model is clockwise when viewed along the N to m axis and negative if the rotation is counterclockwise.

program like Discover (90). The protocol followed here to this end fixed all

minimization for 50 cycles. The tethering force was then reduced on all side-chain atoms (both loop and framework) for 50 further cycles of steepest descent minimization. The tethering force on CDR loop backbone atoms was reduced in steps over 100 further cycles. Finally, 100 cycles of conjugate gradient minimization without any tethering force on the CDR backbones or framework and CDR side chains were performed. Typically, the backbone framework atoms remain fixed during the minimization. This protocol is followed to allow regions that are most likely to contain clashes (such as hydrogens) to relax their conformation without affecting other regions of the model unduly during the initial stages of minimization. In order to compare the crystal and model structures, both of these structures were subjected to the same minimization protocol.
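The comparison of a modeled loop with the crystal structure is reported as an R.M.S.D. over equivalent atoms. A minimal sketch of that calculation is given below; it assumes that both coordinate sets are already in the same frame (here, after fitting on the framework regions), so no additional superposition of the loop itself is performed. The function name and coordinate layout are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def loop_rmsd(model: Sequence[Point], crystal: Sequence[Point]) -> float:
    """R.M.S.D. (same units as the input) over equivalent atoms, no refitting."""
    if len(model) != len(crystal) or len(model) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(math.dist(m, c) ** 2 for m, c in zip(model, crystal))
    return math.sqrt(squared / len(model))
```

Computing the deviation on the framework-fitted coordinates means that errors in loop placement, and not only in loop shape, contribute to the reported value.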

3 Notes

1. As expected, the canonical loops (L2-H2) are modeled well using the yardstick of similarity to the crystal structure, with root mean square deviations (R.M.S.D.) of 0.61-1.29 Å for the initial model (Table 4).


Fig. 7. Ramachandran plots of the minimized crystal and minimized model structures are depicted in A and B, respectively. The dark gray areas show the most favored regions. Plots C and D show the χ1/χ2 side-chain distributions of the crystal and model structures, respectively. Tighter clustering of the points around the crosses indicates that the distribution of side-chain χ angles more closely approximates the ideal χ angle distribution determined from statistical distributions of side chains from known structures.


These loops either improved slightly on minimization or became slightly worse, with changes between +0.26 and -0.21 Å. The R.M.S.D. values are within the error limits for proteins at the resolution of the 1CGS X-ray structure determination.

2. Compared with the minimized crystal structure, the L1 loop was also initially modeled well, with an R.M.S.D. of 1.63 Å that improved to 1.23 Å after minimization.

3. The clash noted in Section 2.2.7 was removed after minimization. CDR H3 was modeled poorly, with an R.M.S.D. of 4.29 Å (see Fig. 4). This loop deviated from the crystal structure as soon as the constraint on the framework was lost. This is clearly shown in a φ, ψ difference plot (see Fig. 6), in which the differences in dihedral angles at the ends of the loop are significantly greater than in the central region of the loop. By constraining the base of the loop and extending the framework by two residues, the correct conformation of Asp105 was maintained in order to make a salt bridge with Arg100 (AbM numbering). In the H-chain framework of antibody 1FDL, position 100 is a histidine residue that was mutated to arginine to conform to the 1CGS sequence. The appropriate φ, ψ, and overlapping χ side-chain torsions oriented the side chain in the correct direction. The remodeling in the presence of the other five loops resulted in a dramatic improvement in the H3 conformation to 1.72 Å, which improved only slightly on energy minimization.

4. The decision to remodel the H3 loop resulted from the knowledge of other antibody sequences and structures. This point emphasizes the need for familiarity with the details of database X-ray crystallographic structures of antibodies. An examination of the entire model is also useful in identifying

of the distribution of the backbone φ and ψ dihedral angles in Ramachandran plots of the crystal structure and the model structure. Chi angles are also shown, indicating the distribution of χ1 and χ2 torsions around the most favored positions. Residues that have nonstandard torsions should be investigated by graphical means to determine why they adopt such torsions. A detailed analysis of the Ramachandran plots of 1CGS is beyond the scope of this chapter. However, it should be noted that such plots are only guides. Obviously, the background and experience of the modeler are a key factor in interpreting the structural information. As can be seen in Fig. 7, the model scores as well as, if not better than, the crystal structure on the basis of standard parameters. The model structures built by homol-

antibody structures are either of such low resolution or have been incorrectly built that their use in modeling must be suspect. It should also be
