Interallelic recombination is probably responsible for the occurrence
Claudia Bevilacqua1,2,*, Pasquale Ferranti3,4, Giuseppina Garro3,4, Cristina Veltri1, Raffaella Lagonigro1, Christine Leroux2, Emilio Pietrolà1, Francesco Addeo3,4, Fabio Pilla1, Lina Chianese3 and Patrice Martin2
1 Dipartimento di Scienze Animali, Vegetali e dell’Ambiente, Facoltà di Agraria dell’Università del Molise, Campobasso, Italy;
2 Laboratoire de Génétique biochimique et de Cytogénétique, INRA, Domaine de Vilvert, Jouy-en-Josas, France;
3 Dipartimento di Scienza degli Alimenti, Facoltà di Agraria, Università di Napoli ‘Federico II’, Portici, Italy;
4 Istituto di Scienze dell’Alimentazione del CNR, Avellino, Italy
The αs1-casein (αs1-Cas) locus in the goat is characterized by a polymorphism that is qualitative as well as quantitative. A systematic analysis performed in an autochthonous southern Italian breed identified a new rare allele (M), which was characterized at both the protein and genomic level. The M protein displays the slowest electrophoretic mobility of the αs1-Cas variants described so far. MS and automated Edman degradation experiments showed that this behavior was due to the loss of two phosphate residues in the multiple phosphorylation site (64SP-SP-SP-SP-SP-E-70E) as a consequence of a Ser→Leu substitution at position 66 of the peptide chain (64S-SP-L-SP-SP-E-70E). This was confirmed by sequencing a genomic DNA fragment encompassing exon 9, where the 8th codon (TCG) was shown to be mutated to TTG. Sequencing of amplified genomic DNA segments spanning the 5′ and 3′ flanking regions of each exon allowed us to identify 23 single nucleotide polymorphisms and two insertion/deletion events in the coding as well as the noncoding regions. A comparison of specific haplotypes defined for each of the αs1-Cas F, A and M alleles indicates that the M allele probably arises from interallelic recombination between alleles A and B2, followed by a C→T transition at nucleotide 23 of the ninth exon. The region encompassing the recombination break point was putatively located between nucleotide 86 upstream and nucleotide 40 downstream of exon 8. Interallelic recombination therefore appears to be a possible means of generating allelic diversity at the αs1-Cas locus, at least in the goat. The previously proposed molecular phylogeny must now be revised, possibly starting from two ancestral allelic lineages.
Keywords: αs1-casein gene; allelic recombination; genetic polymorphism; goat milk
Caseins comprise the main protein fraction of ruminant milk. They are encoded by four tightly linked genes [1], clustered in a 250-kb genomic DNA segment [2] in the following order: αs1, β, αs2 and κ [3]. They have been mapped on chromosome 6 in cattle and goats [4,5]. The αs1-casein locus (αs1-Cas) is characterized in the goat by a polymorphism that is qualitative as well as quantitative. Indeed, more than 11 alleles have so far been characterized [6], distributed among seven different classes of protein variants (αs1-CasA to αs1-CasG), associated with four levels of expression ranging between 0 (αs1-Cas0) and 3.5 g·L⁻¹ (αs1-CasA, B, and C) per allele.
Whereas the αs1-CasE variant, which is 199 amino-acid residues in length, only differs from variants A, B and C by single amino-acid substitutions [7], the F variant displays an internal deletion of 37 residues [8], leading to the loss of a hydrophilic cluster of five contiguous phosphoseryl residues: 64SerP-SerP-SerP-SerP-SerP-Glu-70Glu. This deletion arises from the outsplicing of three exons (9, 10 and 11) during the processing of primary transcripts, probably because of a single nucleotide deletion occurring within the first unspliced exon (exon 9) [9]. More recently, the B allele has been split up into four alleles giving rise to the synthesis of four protein variants B1, B2, B3, and B4, which differ as a result of amino-acid substitutions [6]. These substitutions have no effect on the net charge of the protein, which therefore makes the relevant variants indistinguishable on PAGE. Variant B1 is considered to be the original type in goat because it shows the closest homology to its bovine and ovine counterparts [6].
The distribution of these different alleles or variants has been investigated in a great variety of breeds and populations [6,10–13]. Breeds from the Mediterranean area usually display a high frequency of ‘strong’ alleles (mainly A and B). However, local and now rare breeds generally do not follow this rule and are often the source of rare ‘germoplasms’. Three novel αs1-Cas variants (H, I and L) have been identified by Chianese et al. [14] in southern Italian goat populations. More recently, a further novel and rare
Correspondence to P. Martin, Laboratoire de Génétique biochimique et de Cytogénétique, INRA, Domaine de Vilvert, 78 352 Jouy-en-Josas, France. Fax: + 33 1 34 65 24 78, Tel.: + 33 1 34 65 25 82, E-mail: martin@jouy.inra.fr
Abbreviations: αs1-Cas, αs1-casein; UTLIEF, ultra-thin-layer isoelectric focusing; LC/ES/MS, liquid chromatography/electrospray/mass spectrometry; ACRS-PCR, amplified created restriction site-PCR.
*Present address: INSERM E9925, Interactions de l’épithélium intestinal avec le système immunitaire, Faculté Necker-Enfants Malades, 156, rue de Vaugirard, 75 743 Paris Cedex 15, France.
(Received 29 August 2001, revised 17 December 2001, accepted 9 January 2002)
variant, named M, was detected in the Molisane Montefalcone goat breed [15], which was shown, in addition, to display a rather high frequency of the F allele [16].
In this paper, we report the characterization of this new variant at both the protein and genomic level. The complete amino-acid sequence of the M variant has been determined. Starting from genomic DNA, we amplified, by PCR, the coding regions (exons) and their flanking intron regions, which were subsequently sequenced. Such a dual approach has made it possible to identify the mutation specific for the αs1-CasM allele. Extensive comparisons of these sequences with those of previously characterized alleles have allowed the identification of additional polymorphic sites, the arrangements (haplotypes) of which strongly suggest an interallelic recombination (or a gene conversion) event at the origin of the αs1-CasM allele. This is, to our knowledge, the first hypothesis of a genomic recombination event to account for genetic polymorphism at a locus encoding a milk protein.
MATERIALS AND METHODS
Animals
A total of 147 individual milk samples were analysed from Montefalcone goats, which are localized in southern Italy (Molise region). Peripheral blood (15–30 mL) was collected from eight goats and two bucks and subsequently used for DNA extraction.
Casein preparation
Whole casein was prepared by acid precipitation of individual skimmed milk as described by Aschaffenburg & Drewry [17].
Gel electrophoresis
Vertical disc PAGE at pH 8.6, preparation of casein samples and polyclonal antibodies against αs1-Cas, and immunoblotting experiments were performed as described elsewhere [18].
Preparation of polyacrylamide gel ultra-thin layers (0.25 mm) and isoelectric focusing (UTLIEF) were carried out as recommended by EEC Regulation no. 690/92 [19]. The pH gradient in the range 2.5–6.5 was obtained by mixing Ampholine (Pharmacia LKB) 2.5–5, 4.5–5.4, and 4–6.5 in the volume ratio 1.6 : 1.4 : 1.
2D gel electrophoresis (PAGE in the first dimension followed by UTLIEF in the second) has been described elsewhere [18].
Enzymatic hydrolyses
Trypsin (Boehringer Mannheim) hydrolysis was carried out in 0.4% NH4HCO3, pH 8.5, at 37 °C, for 4 h, in a substrate/enzyme ratio of 50 : 1 (w/w). Dephosphorylation with calf intestine alkaline phosphatase (Boehringer Mannheim) was performed in the same buffer by using 1 mU enzyme/mg casein at 37 °C for 18 h; these conditions have been previously shown to produce complete dephosphorylation of the sample [20]. Reactions were stopped by freeze-drying.
Liquid chromatography/mass spectrometry analysis of proteins and peptides
The whole caprine casein samples were fractionated by the procedure of Jaubert and Martin [21], modified by Ferranti et al. [22].
Liquid chromatography/electrospray/mass spectrometry (LC/ES/MS) was performed using a HP1100 modular system connected on-line to a Platform (Micromass) single quadrupole mass spectrometer. The selectively precipitated casein phosphopeptides were fractionated by RP-HPLC on a 214TP54, 5 µm Vydac C18, 250 × 2.1 mm internal diameter column (Vydac, Hesperia, CA, USA). Solvent A was 0.3 mL trifluoroacetic acid per L water. Solvent B was 0.2 mL trifluoroacetic acid per L acetonitrile. Samples (500 µg) were dissolved in 200 µL water and injected on to the HPLC column equilibrated in solvent A. A linear gradient from 0% to 37% B was applied at a flow rate of 0.5 mL·min⁻¹ over 60 min. The column effluent was split 1 : 25 to give a flow rate of 4 µL·min⁻¹ into the electrospray nebulizer. The bulk of the flow was run through the detector for peak collection as measured by following A220. The ES-mass spectra were scanned from m/z 1800 to 400 at a scan cycle of 5 s per scan. The source temperature was 120 °C and the orifice voltage 40 V. Mass values were reported as average masses. Signals recorded in the mass spectra of peptides were associated with the corresponding tryptic peptides on the basis of the molecular mass, taking into account the enzyme specificity and the reported amino-acid sequence of αs1-Cas from different species. Quantitative analysis of components was performed by integration of the multiply charged ions of the single species [22].
Sequence analysis
Automated Edman degradation was performed using an Applied Biosystems model 477A Protein Sequencer with on-line phenylthiohydantoinyl amino acid (Pth-Xaa)-HPLC analyzer. Phosphorylated peptides were modified by the procedure of Ferranti et al. [20].
Genomic DNA preparation
Goat genomic DNA was prepared from leucocytes isolated from the plasma fraction of EDTA-anticoagulated peripheral blood samples, as described previously [23,24].
Oligonucleotides
Intronic primers used either for amplification from genomic DNA or for sequencing of amplified DNA fragments were provided by Genosys Biotechnologies Inc. (Cambridge, UK) and Primm (Milano, Italy). Their sequences are given in Table 1, together with those used for genotyping.
PCR conditions
In vitro amplification was performed with the thermostable DNA polymerase of Thermus aquaticus (Taq polymerase) using either a 480 or a 2400 thermal cycler (PerkinElmer), essentially as described [25]. A typical 50-µL reaction mixture consisted of 5 µL 10× PCR buffer (500 mM KCl, 100 mM Tris/HCl, pH 9.0, 1% Triton X-100), 3 µL 25 mM MgCl2, 2.5 µL 5 mM dNTPs mixture, 0.5 µL (25 pmol) each primer, 2 µL template DNA, and 0.25 µL (1.25 U) Taq polymerase (Promega). To avoid evaporation (with the 480 thermal cycler), the mixture was covered with 70 µL mineral oil. After an initial denaturing step of 5 min (or 10 min) at 94 °C, the reaction mixture was subjected to the following three-step cycle, which was repeated 35 times: denaturation for 30 s (or 1 min) at 94 °C, annealing for 30 s (or 2 min) at 47–60 °C, and extension for 30–60 s (or 3 min) at 72 °C, using the 2400 (or 480) thermal cycler. To estimate the concentration of PCR products, 5 µL of each reaction mixture was analysed by electrophoresis, in the presence of ethidium bromide (0.5 µL·mL⁻¹), in a 2% SeaKem (FMC) or Gibco BRL Life Technologies agarose slab gel in Tris/borate/EDTA (8.9 mM Tris, 8.9 mM boric acid, 0.2 mM EDTA, pH 8.0) buffer.
For genotyping αs1-CasM using the amplified created restriction site (ACRS)-PCR procedure [26], experimental conditions were essentially the same as those mentioned above, except that the primer concentration was 50 pmol·50 µL⁻¹ reaction mix and the concentration of the agarose slab gel used to visualize the PCR products was 4% (2% Gibco-BRL and 2% high-resolution agarose FMC).
Sequencing of amplified genomic DNA fragments
PCR products were either directly sequenced or sequenced after cloning (fragments amplified between primers C9U and C9L) into SmaI-digested pUC18 plasmid vector, using fluorescent Cycle Sequencing (AmpliTaq FS, Dye Terminator Cycle Sequencing Kit; PerkinElmer) with an ABI 377A or an ABI 310 DNA sequencer.
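As a quick check of the recipe above, the final concentration of each component in the 50-µL reaction can be worked out from the stated stock concentrations and volumes. The sketch below is illustrative only: the function and variable names are not part of the original protocol, and the volume left over after the listed components is simply computed, since the text does not state how the mixture is brought to 50 µL.

```python
# Final concentrations in the 50-µL PCR mixture, from the stated stocks.
TOTAL_UL = 50.0

def final_conc(stock, volume_ul, total_ul=TOTAL_UL):
    """Dilution rule C1*V1 = C2*V2; returns C2 in the same units as `stock`."""
    return stock * volume_ul / total_ul

mgcl2_mM = final_conc(25.0, 3.0)    # 3 µL of 25 mM MgCl2
dntp_mM = final_conc(5.0, 2.5)      # 2.5 µL of 5 mM dNTPs mixture
kcl_mM = final_conc(500.0, 5.0)     # KCl contributed by 5 µL of 10x buffer
tris_mM = final_conc(100.0, 5.0)    # Tris/HCl contributed by 5 µL of 10x buffer
primer_uM = 25.0 / TOTAL_UL         # 25 pmol in 50 µL; pmol/µL equals µM

# Volumes listed in the text: buffer, MgCl2, dNTPs, two primers, template, Taq.
listed_volumes_ul = [5.0, 3.0, 2.5, 0.5, 0.5, 2.0, 0.25]
remaining_ul = TOTAL_UL - sum(listed_volumes_ul)  # not specified in the text

print(mgcl2_mM, dntp_mM, kcl_mM, tris_mM, primer_uM, remaining_ul)
```

Under these figures the reaction contains 1.5 mM MgCl2, 0.25 mM dNTPs and 0.5 µM of each primer; for the ACRS-PCR step (50 pmol per 50 µL) the same arithmetic gives 1 µM primer.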
RESULTS
PAGE analysis and immunoblotting of whole casein
Figure 1A shows the typical electrophoretic patterns yielded, in polyacrylamide gel at pH 8.6, by the new αs1-Cas phenotype, subsequently shown to be a heterozygous M/F (M being the new variant), in comparison with two reference phenotypes AA (lane 1) and FF (lane 2). This new phenotype is characterized, under these conditions, by the presence of a protein band with a slower mobility (lane 3, *) occurring within the αs complex. As the αs1-Cas and αs2-Cas overlap in the same zone of the gel, the αs1-Cas composition of each phenotype was analysed by immunostaining after
Table 1. Primers used in the present study. Each pair of primers amplifies the target exon and its flanking regions (from 60 to 200 nucleotides upstream and downstream). Primers ending with U (upper) and L (lower) are positioned 5′ and 3′ from the target exon, respectively. Given the small size of introns 4 and 10, primers C45U/C45L and C1011U/C1011L were designed to amplify together exons 4 and 5 and exons 10 and 11, respectively. Sequencing of exon 7 was performed starting from a genomic DNA fragment produced by amplification between C7U and C8L. Primers in italics were used in the genotyping of allele M.
C2U 5′ AAT CAA ATT TTA TTA TAA GAC C 3′
C3U 5′ GGT GTC AAA TTT AGC TGT TAA A 3′
C3L 5′ GCC CTC TTC TCT AAA AAG GTT T 3′
C45U 5′ TGA CTG TGT TTT TCA CTT CT 3′
C45L 5′ GCT TTG TTA ATT CTG CAG TA 3′
C7U 5′ CAT GAA GCA ATA TAT CTG CTC C 3′
C7L 5′ TGG TCA ACA TAC ATG TTG CAT C 3′
C8L 5′ TGG CAC AAC ATT GTA CAT TCT TGG G 3′
C9U 5′ GTA TGG AAG TGT GGA ATA GTT T 3′
C9L 5′ GGA CAC CAC AGA TAT CCA ATA G 3′
C1011U 5′ CAT AAA ACT AAC AAT ACA TGT 3′
C1011L 5′ TAG CAG ATA TTG AAA AGG AG 3′
C12U 5′ CCA GTG AAT ATT CAG GAC TGA T 3′
C12L 5′ AGG CTC TAG CAT GAT TTG ATG T 3′
C13U 5′ GCA TTT TTA TTT TGA ATG TAA A 3′
C13L 5′ TAG TTC AAA TGC ACA TCT TAT 3′
C14U 5′ GGC AGA GAA TAC GTT TAT ACT AA 3′
C14L 5′ TCT CAG ATT GAC TAC TAC AAC TT 3′
C15U 5′ CAT GAA AAG CAT TTC AAA AA 3′
C15L 5′ TAA AAA ACA GTG GTT ACC AA 3′
C16U 5′ CTA AAG AGT ACA CTA TCC TCA C 3′
C16L 5′ TTG CTG TGG TTG CCT ATC CTA 3′
C17U 5′ TGA TTT CTC ATA CAC TGT TG 3′
C17L 5′ TTG ATA AGG CAA CAA TAT GC 3′
C18U 5′ GTC CCA ACT TGA AAT CCT GAT C 3′
C18L 5′ CAA GTT TAT AGT CTA CAC GTT GTA C 3′
C19U 5′ CTT AGC ATC TTC CAT GGC TTG ATC 3′
C19L 5′ ATA CAC ACA AAC TCA CAA GG 3′
MWU 5′ CAA CAT ATT TTA AAT AAA ATT GAC AAT 3′
C9LM* 5′ ATA AAA ATG GTA TAC CTC ACT TGT*C 3′
C9UM1 5′ TAA CAA TGA TTC TCT TTC TTT TAG 3′
C9LM1 5′ AAT CTT TAT TTT GTC TCT GAC AA 3′
Fig. 1. Disc-PAGE at pH 8.6 of individual whole caprine casein samples containing different αs1-Cas variants AA, FF and MF. Phenotypes are indicated at the top of each lane. Staining was with (A) Coomassie Brilliant Blue and (B) polyclonal antibodies against αs1-Cas. a–e identify αs1-Cas bands of the MF sample in order of increasing mobility towards the anode.
transfer to nitrocellulose (NC) paper with specific polyclonal antibodies raised against αs1-Cas; the result is shown in Fig. 1B. The new αs1-Cas phenotype (M/F) comprises at least five components (a, b, c, d, and e). Two of these (a and c) appear to be shared with variants A and F, while components e and d seem to be in common with the A variant. Therefore, band b represents the only component specific to the M variant. The intensities of the bands in the MF pattern indicate that variant M is a ‘strong’ variant like variants A, B and C, i.e. it has a high level of expression. However, as the intensities of three apparently homologous components (a, c, and e) in the AA and MF profiles were different, further heterogeneity of the PAGE components may be suspected.
To understand the high degree of heterogeneity observed with goat αs1-Cas and to try to explain the difference in band intensities, further electrophoretic experiments were carried out, including UTLIEF analysis and 2D electrophoresis followed by staining with polyclonal antibodies. In UTLIEF (results not shown) the αs1-Cas M/F phenotype comprised at least seven major components, two of which were in common with variant F. Using 2D electrophoresis (Fig. 2), at least two main spots surrounded by a number of minor components differing in their pI were found in each PAGE band. This large microheterogeneity, which also occurs for other casein phenotypes (results not shown), may be attributable to nonallelic αs1-Cas forms generated by defective mRNA splicing and to differently phosphorylated αs1-Cas chains, as reported by Ferranti et al. [20].
MS and sequence analyses
To determine the molecular mass of the new variant (M), whole caseins of individual milks of the phenotypes A/A, F/F, and M/F were subjected to HPLC separation (Fig. 3). The retention time of variant M was shorter than that of the A variant, while the relative percentage was the same. The HPLC fractions were analysed by ES/MS, and the molecular masses of αs1-CasA, B, and F were in agreement with the expected masses [7,9]. The molecular mass determined by ES/MS of the αs1-Cas components occurring in the sample containing M/F variants was 23 134/23 214/23 294 Da (Fig. 4). After alkaline phosphatase hydrolysis, the molecular mass of the three main peaks shifted to the single value of 22 734 Da, indicating the occurrence of three αs1-Cas species carrying five, six, and seven phosphate groups, respectively. A set of small HPLC peaks eluted before the main αs1-Cas peak gave a molecular mass of
Fig. 2. 2D electrophoretic analysis of a whole casein sample prepared from the milk of a single goat, heterozygous M/F at the αs1-Cas locus. Disc-PAGE was performed in the first dimension followed by UTLIEF in the second dimension. The UTLIEF pattern in the pH range 2.5–6.5 is shown on the left. Staining was with polyclonal antibodies raised against αs1-Cas.
Fig. 3. RP-HPLC analysis of casein fractions from goats of different genotypes F/F (A), M/F (B) and A/A (C) at the αs1-Cas locus.
18 715/18 795/18 875 Da (18 555 Da after alkaline phosphatase hydrolysis), corresponding to that expected for the F variant. This result is the first evidence for the heterozygous status (M/F) of the individual goat milk analysed. In addition to this, the HPLC profile confirms that the M variant is abundantly expressed. Thus, as previously mentioned, we were working with a mixture of two unresolvable variants, one of which (M) accounts for more than 80% of the whole αs1-Cas. This overrepresentation of the M variant allowed us to continue the molecular characterization with this material.
The αs1-Cas fraction containing the M variant was digested with trypsin, and the resulting peptide mixture analysed by LC/ES/MS (Fig. 5). The peptide sequence determined for the M variant was identical with that yielded by variant B2 (from the published sequence [6,7]) except for two substitutions located in peptide 62–79. MS and automated sequence analysis actually demonstrated that peptide 62–79 (molecular mass 1833 Da and sequence AG-S-SP-L-SP-SP-EEIVPNSAQQK, where SP indicates a phosphorylated serine residue) contains the two substitutions Ser66→Leu and Glu77→Gln, as compared with the B2 variant. The substitution Ser→Leu at position 66 first makes this site unphosphorylatable and secondly impairs the phosphorylation of Ser64 in the M variant. The sequence determined is consistent with the molecular mass measured for the native protein. The phosphorylated residues are therefore Ser46, 48, 65, 67, 68 (fully), and Ser41 and Ser115 (partly), which give rise to proteins with five, six and seven phosphates/mol, explaining the heterogeneity of phosphorylation observed for the native protein by ES/MS analysis (Fig. 4). Finally, peptide E96QLLR100, diagnostic of the F variant, was present among the peptides identified by Edman degradation after tryptic digestion and RP-HPLC fractionation, confirming the heterozygous status (M/F) of the sample analysed.
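The phosphate counts quoted above follow directly from the mass shift on dephosphorylation, because each phosphate group adds about 80 Da (HPO3, average mass 79.98 Da) to the polypeptide. The following minimal sketch reproduces that arithmetic with the masses reported in the text; the function name is ours, and the counts for the F-variant peaks are derived here by the same rule rather than being stated in the original.

```python
PHOSPHATE_DA = 79.98  # average mass added per phosphate group (HPO3)

def phosphate_counts(native_masses, dephosphorylated_mass):
    """Number of phosphate groups for each native species, from the mass shift."""
    return [round((m - dephosphorylated_mass) / PHOSPHATE_DA)
            for m in native_masses]

# Main peaks of the M/F sample (M variant), Da.
m_variant = phosphate_counts([23134, 23214, 23294], 22734)

# Minor peaks attributed to the F variant, Da (counts derived, not quoted).
f_variant = phosphate_counts([18715, 18795, 18875], 18555)

print(m_variant, f_variant)
```

The 80-Da spacing between successive peaks in each series is itself the signature of species differing by one phosphate group.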
Experimental strategy designed to analyse the new αs1-Cas variant at the nucleotide level
To determine the coding sequence of a gene, there are at least two possible strategies: it can be analysed at either the genomic level or the messenger level. The most straightforward option is undoubtedly mRNA extraction to construct a cDNA molecule. The structure of the coding region is then readily obtained by sequencing the cDNA.
In our situation, however, such a strategy was not possible. Given the low number of animals in the population, it was not possible to slaughter individuals of interest. In addition, as the animals were from private flocks bred in mountain meadows, it was not possible to take mammary tissue biopsy samples under appropriate hygienic conditions.
To overcome this, we tried to extract mammary mRNA from milk somatic cells, using the technique first described by Martin et al. [27]. Unfortunately, we could not obtain enough material to synthesize cDNA. Moreover, as expected from the phenotypic analysis (at the protein level), the few animals yielding the αs1-CasM variant in their milk were exclusively heterozygous M/F at the αs1-Cas locus. Therefore, analysis of their transcripts could have been rather difficult because of the occurrence of at least nine different forms of messenger arising from the F allele [9]. Finally, to integrate this new allele into the phylogenetic tree proposed by Grosclaude & Martin [6], we also needed to obtain information about relevant noncoding regions in which specific and informative mutations are localized.
For these reasons, we decided to analyse the sequence of the M allele at the genomic level. After amplification of each exon and its intron-flanking regions, amplified genomic DNA fragments were sequenced. The knowledge of the structural organization of the goat gene encoding the αs1-Cas [9] made this strategy possible. In addition, the complete sequence of the bovine gene [28] was also available and showed that the two genes display the same organization (number and size of exons) and 95% similarity at the exon sequence level. As goats and cattle are phylogenetically close and known intron sequences in the goat show strong similarity to their bovine counterparts, we designed primers upstream and downstream of each exon to amplify and analyse genomic regions including flanking intron
Fig. 4. Deconvoluted electrospray mass spectrum of caprine αs1-Cas M variant.
Fig. 5. LC/ES/MS analysis of the tryptic digest of the αs1-CasM variant. The purified protein was digested with appropriate concentrations of trypsin (see Materials and methods). The peptide mixture was analyzed using a Vydac C18 column (250 × 2.1 mm, 5 µm), on-line with a Platform mass spectrometer, as described in Materials and methods. The peak of the variant peptide is indicated by an arrow.
sequences, starting from both the bovine and the goat sequences.
Analysis of the exon sequences at the genomic level
As the samples analysed were from goats that were heterozygous (M/F) at the αs1-Cas locus, to discriminate between the two alleles and therefore determine the 19 exon sequences coming from the M allele, sequence data were compared with those from a homozygous F/F goat genomic DNA sample. All the sequences yielded were unambiguously determined except that corresponding to the PCR fragment encompassing the ninth exon, in which a single nucleotide deletion has been shown to occur in the F allele [9]. This makes the sequence chromatogram unreadable from that point for the DNA template amplified from the heterozygous M/F sample. To overcome this problem, the amplified fragment was cloned. Of the 10 clones sequenced, four displayed a typical F exon-9 sequence, and five showed the same sequence, which was different from that of the F allele, with a 33-nucleotide exon 9. Taken together, the exonic sequence data allowed us to construct a sequence corresponding to the complete cDNA of the M allele. This sequence is given in Fig. 6, where it is compared with that of alleles F and A.
Only four polymorphic nucleotides were identified, three of which yielded amino-acid substitutions: (a) the transition T→C on the second nucleotide of the third codon within exon 4, leading to a Leu→Pro substitution at position 16 of the peptide chain, as compared with the A variant; (b) a transversion G→C on the first nucleotide of the last exon-10 codon, leading to a Glu→Gln substitution at position 77 of the peptide chain, as compared with the F variant; (c) the deoxycytidyl phosphate residue at position 23 in the 9th exon of the A allele, which is deleted in the F allele, is mutated to T in the M allele, giving rise to a Ser→Leu substitution.
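The link between nucleotide 23 of exon 9 and the Ser→Leu change can be made explicit with a few lines of code. The sketch below assumes that exon 9 begins on a codon boundary, which is consistent with the text (nucleotide 23 falls in the 8th codon, TCG); the helper names are ours, and the codon table holds only the two codons needed.

```python
# Minimal codon lookup: only the wild-type and mutant codons of interest.
CODON_TO_AA = {"TCG": "Ser", "TTG": "Leu"}

def codon_position(nt_position, codon_length=3):
    """Map a 1-based nucleotide position within an in-frame exon to
    (1-based codon number, 1-based position inside that codon)."""
    codon_number = (nt_position - 1) // codon_length + 1
    offset = (nt_position - 1) % codon_length + 1
    return codon_number, offset

def substitute(codon, offset, base):
    """Return `codon` with the base at 1-based `offset` replaced by `base`."""
    return codon[:offset - 1] + base + codon[offset:]

codon_number, offset = codon_position(23)   # nucleotide 23 of exon 9
mutant_codon = substitute("TCG", offset, "T")  # C -> T transition

print(codon_number, offset, CODON_TO_AA["TCG"], "->", CODON_TO_AA[mutant_codon])
```

Nucleotide 23 is thus the second base of the 8th codon, so the C→T transition converts TCG (Ser) into TTG (Leu), as reported at the protein level for residue 66.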
Analysis of the intronic flanking sequences
The flanking intronic regions directly upstream and downstream of each exon were sequenced over 50–200 nucleotides and the complete sequences of introns 4, 7, and 10 were determined for alleles A, F, and M. In this way, 20 further polymorphic sites were identified besides the four polymorphic exon nucleotides (Fig. 7). In addition, an RsaI restriction site was found between exon 6 and exon 8 of alleles F and M, which is lacking in the A allele, giving a total of 25 polymorphic sites useful for phylogenetic allele comparisons. Taking into account these data, it is worth noting that in the 5′ part of the gene, up to exon 8, the nucleotide combination (haplotype) observed for the M allele is identical with that shown by the F allele. In contrast, in its 3′ part, beyond exon 8, the haplotype of the M allele is identical with that of the A allele, except at the polymorphic site located in exon 9.
In addition, intron 5 was completely sequenced starting from genomic DNA isolated from blood of two goats, genotyped as M/F and F/F at the αs1-Cas locus. Compared with the bovine sequence, a deletion spanning nucleotides 376 to 594 was observed for both goats. The deleted region in this intron did not match any known sequence in the EMBL databank. Subsequently, the existence of this deletion was confirmed by PCR for six goats of different αs1-Cas genotypes (A/A, F/F, M/F) from different Italian breeds (Montefalcone, Teramana, Garganica, Girgentana, and Sarda) and for six sheep of different Italian breeds (Comisana, Gentile di Puglia, and Valle del Belice). These results: (a) confirm the difference in size (~200 bp) previously reported [23] between goat (~450 bp) and cattle (641 bp) intron 5, and (b) show that the ovine intron 5 is also shorter than the cattle one. This could be expected, given the phylogenetic proximity between sheep and goats.
Genotyping of the M allele
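The reasoning used here, locating a cross-over by finding where a haplotype switches from matching one allele to matching another, can be sketched in code. Everything in the example is hypothetical: the ten-site strings are toy data, not the polymorphisms of Fig. 7, and the function name is ours. A site where the recombinant matches neither donor is treated as a private mutation and ignored, mirroring the exon 9 site of the M allele.

```python
def breakpoint_interval(recombinant, donor5, donor3):
    """Return (i, j): indices of the last informative site matching the
    5' donor and the first informative site matching the 3' donor.
    A single cross-over must lie between them. Returns None if the
    pattern is not a single 5'-donor -> 3'-donor switch."""
    calls = []
    for k, (r, a, b) in enumerate(zip(recombinant, donor5, donor3)):
        if a == b:
            continue              # donors identical here: uninformative
        if r == a:
            calls.append((k, "5"))
        elif r == b:
            calls.append((k, "3"))
        # otherwise: private mutation in the recombinant, skipped
    labels = "".join(label for _, label in calls)
    switch = labels.find("3")
    if switch <= 0 or "5" in labels[switch:]:
        return None
    return calls[switch - 1][0], calls[switch][0]

# Toy haplotypes over ten ordered sites (hypothetical data).
f_like = "ACGTACGTAC"   # stands in for the 5' donor
a_like = "TCGAACCTTG"   # stands in for the 3' donor
m_like = "ACGTACCTGG"   # 5' part like f_like, 3' part like a_like,
                        # with a private change at index 8

interval = breakpoint_interval(m_like, f_like, a_like)
print(interval)
```

With real data the resolution of the interval depends on the spacing of informative sites on either side of the switch, which is why the break point can only be bracketed rather than pinpointed.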
The genotyping procedure designed consists of two steps. The first one is an ACRS-PCR technique [26], the principle of which is to create a TaqI restriction site (TCGA) by using a mismatching primer (C9LM*), which allows both the F and M alleles to be discriminated from all the others (Fig. 8, Step IA). These two alleles are subsequently distinguished after a second amplification, which allows discrimination between the alleles on the basis of the fragment sizes (Fig. 8, Step IIA).
In the first step, a 266-bp (265 bp for the F allele) DNA fragment is amplified between primers MWU and C9LM* with every allele. After digestion with TaqI, the 265/266-bp fragment is split into two fragments (240 and 26 bp) for each allele except the M and F alleles, for which no TaqI site has been created (Fig. 8, Step IB), because of mutations (deletion or substitution) occurring at position 23 in exon 9 (TTGA instead of TCGA).
To discriminate between the M and the F alleles, we took advantage of the presence of an 11-bp insertion in intron 9 of the F allele, which is lacking in the M allele. Thus, using two primers, C9UM1 (forward) and C9LM1 (reverse), located just upstream from exon 9 and 82 nucleotides downstream of the 11-bp insertion site, respectively, a 238-bp DNA fragment was yielded by PCR starting from the M allele, whereas the F allele gives a 248-bp fragment (Fig. 8, Step IIA).
The individuals analysed here, which allowed the M allele to be characterized, were heterozygous M/F. Consistent with our structural results, they gave the two fragments (238 and 248 bp), as shown for one of them in Fig. 8, Step IIB (lane 1). It is worth noting that the third band observed with this sample is due to the occurrence of a heteroduplex structure; this was confirmed by analysing an amplification product from the mix of samples F/F and X/X (Fig. 8, Step IIB, lane 4).
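The two-step procedure amounts to a small decision rule on band sizes, which can be written down as follows. This is a sketch under stated assumptions, not the authors' scoring scheme: the animal is taken to be diploid, the 265- and 266-bp products are treated as indistinguishable on the gel, alleles other than M and F are lumped as "other", and it is assumed that only the F allele yields the 248-bp fragment in step II. Extra bands such as the heteroduplex are ignored because only the diagnostic sizes are tested.

```python
def call_genotype(taqi_bands, step2_bands):
    """Infer a genotype from band sizes (bp).
    taqi_bands:  bands seen after TaqI digestion of the MWU/C9LM* product.
    step2_bands: bands seen after C9UM1/C9LM1 amplification.
    Returns a tuple of two allele labels, or None if uninterpretable."""
    cut = 240 in taqi_bands                           # TaqI site created
    uncut = any(b in (265, 266) for b in taqi_bands)  # M or F present
    if cut and not uncut:
        return ("other", "other")
    if cut and uncut:
        # One allele resisted TaqI: F if the 248-bp band is present, else M.
        return ("other", "F" if 248 in step2_bands else "M")
    if uncut:
        has_m, has_f = 238 in step2_bands, 248 in step2_bands
        if has_m and has_f:
            return ("F", "M")
        if has_f:
            return ("F", "F")
        if has_m:
            return ("M", "M")
    return None

mf = call_genotype({265, 266}, {238, 248})       # heterozygous M/F pattern
others = call_genotype({240, 26}, {238})         # neither M nor F
m_carrier = call_genotype({266, 240, 26}, {238}) # M with another allele
print(mf, others, m_carrier)
```

The size difference in step II is 10 bp rather than 11 because the F allele also carries the single-nucleotide deletion in exon 9, which lies within the amplified fragment.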
DISCUSSION
We report the identification and the molecular characterization of a new allele, named M, occurring at the αs1-Cas locus in the goat. This novel allele, characterized by the transition C→T at position 23 in the 9th exon of the gene, was found in the Montefalcone breed, at very low frequency (< 2%) after phenotypic analysis of 147 individual milk samples. All goats bearing the M variant were shown to be heterozygous (M/F and M/B).
Interestingly, the mutation specific for the M allele affects the same nucleotide as that which is deleted in the F allele, and shown to be responsible for the internal deletion of 37 amino-acid residues in the F variant, as a consequence of the skipping of three successive exons during the course of the processing of the primary transcripts [9]. At the peptide level, the C→T transition, which leads to a Ser66→Leu66 substitution, gives rise to the loss of two of the five phosphate groups clustered in the multiple phosphorylation site of the αs1-Cas. This loss explains the lower electrophoretic mobility of the M protein compared with the other caprine αs1-Cas variants described so far. This situation is similar to that observed in sheep, with the αs1-CasD variant (previously called the Welsh variant). Actually, this ovine variant has only two phosphoserine residues instead of five in the homologous region of the αs1-CasA and C variants [22]. However, whereas the structural alteration in the D (Welsh) variant is associated with a reduction in milk casein content [29,30], the M variant, like the goat αs1-CasA and B variants, must be considered a ‘strong’ variant, given the intensity of the isoelectrofocusing bands and the surface of the relevant peak in RP-HPLC.
Fig. 6. Nucleotide sequence of the expected αs1-CasM cDNA obtained by genomic exon sequencing analysis: comparison with its A and F allele counterparts. Numbering begins with the first nucleotide of the first exon (up) and the first amino-acid residue of the mature M protein (down). Dashes indicate nucleotides identical with those of the M allele. The stop codon is symbolized by ***. Numbers in vertical framed arrows indicate the position of the introns. The boxes indicate amino-acid substitutions.
Unexpectedly, placing variant M in the phylogeny
(Fig. 9) proposed by Grosclaude and Martin [6] turned
out to be rather difficult. Indeed, a comparison of the
different variants at the peptide sequence level suggests a
hybrid structure for the M protein. Taking into account
amino-acid combinations at the polymorphic residues
(haplophenotypes), the M variant, with a proline and a
glutamine residue at positions 16 and 77, respectively,
could be placed in both lineages (A and B) arising from
the putative ancestral protein B1. This possible dual
membership strongly suggests the involvement of a
recombination/gene conversion event between alleles from
the two lineages. This hypothesis was strengthened by
genomic sequence data. Although a mutation-driven
convergence cannot be excluded, an interallelic
recombination/gene conversion event seems to be the most
plausible explanation. Intronic sequences of the A, F and M
alleles (Fig. 7) strongly support such a notion. Indeed, a
detailed comparative analysis at 25 polymorphic sites,
including 23 single nucleotide polymorphisms, spanning a
large part of the transcription unit provides a haplotype
formula allowing each allele to be precisely characterized.
Thus, the M allele unequivocally appears to be a hybrid
structure made of F-type allele sequences in its 5′ part
followed by A-type allele sequences in its 3′ part (except
the C→T transition at nucleotide 23 in the ninth exon).
Following such a scheme, a recombination event would
have occurred around exon 8 (Fig. 7). However, the
genomic sequences analysed do not allow us to determine
whether allele M originated from a double crossover
(gene conversion) or from a single crossover
(recombination). Nevertheless, as over the 10 kb
separating exon 8 from the end of the transcription unit
there are no sequence clues indicating a second crossover,
it seems most likely that the M allele originated in an
interallelic recombination event. Gene conversion events,
which usually account for exchanges over short sequence
tracts [31], have been mainly described and intensively
investigated as mechanisms generating allelic diversity in
highly polymorphic genetic systems, such as the loci
encoding the class-II cell surface antigens of the major
histocompatibility complex in humans [32,33]. Both mechanisms have also been thought to account for genetic disorders in humans, such as sporadic Alzheimer disease cases [34] and diabetic pathology involving the gene encoding insulin [35].
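The reasoning above (one switch of parental origin along the transcription unit points to a single crossover, whereas two switches would point to a gene conversion tract) can be sketched in a few lines of Python. This is only an illustration: the three sequences below are invented toy strings, not the sites of Fig. 7, and the function names `assign_origin` and `count_switches` are hypothetical.

```python
def assign_origin(hybrid, parent_5p, parent_3p, name_5p="F", name_3p="A"):
    """Label each informative site of `hybrid` with the parent it matches.

    Sites must be ordered 5' to 3'. Sites where both parents carry the same
    nucleotide are skipped because they cannot discriminate the parents.
    A site matching neither parent is labelled "?" (allele-specific mutation).
    """
    if not (len(hybrid) == len(parent_5p) == len(parent_3p)):
        raise ValueError("all haplotypes must cover the same sites")
    labels = []
    for h, p5, p3 in zip(hybrid, parent_5p, parent_3p):
        if p5 == p3:
            continue  # non-informative site
        if h == p5:
            labels.append(name_5p)
        elif h == p3:
            labels.append(name_3p)
        else:
            labels.append("?")
    return labels


def count_switches(labels):
    """Count changes of parental origin, ignoring allele-specific sites."""
    known = [x for x in labels if x != "?"]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)


# Toy data (assumption, for illustration only).
toy_f = "ACGTTGCA"
toy_a = "GCATTACG"
toy_single = "ACGTTTCG"      # F-type, then A-type, plus one private mutation
toy_conversion = "ACATTGCA"  # F-type with a short A-type tract inside

single_labels = assign_origin(toy_single, toy_f, toy_a)
conversion_labels = assign_origin(toy_conversion, toy_f, toy_a)
print(single_labels, count_switches(single_labels))
print(conversion_labels, count_switches(conversion_labels))
```

A single switch is only compatible with one crossover within the region sampled; as noted in the text, a second crossover lying outside the sequenced segments would go undetected.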
Simplified haplotype formulae strongly suggest that the allele that provided the 3′ part of the recombinant allele (M)
is the A allele (Fig. 10). In contrast, one can wonder whether the donor allele of the 5′ part is the F allele or another allele belonging to the same B allelic lineage (excluding B1 and C),
as they share the same simplified haplotype formula up to exon 8. To reach a definite conclusion, the complete sequence of the 5′ region of each allele would be required, because no differences have been found in the available sequences (exons and intron-flanking regions).
If our recombination hypothesis is correct, the break point should be located between nucleotide 86 upstream and nucleotide 40 downstream from exon 8, and the crossover should have been accompanied by a reciprocal exchange. One can therefore expect to find the reciprocal recombinant allele among the alleles so far described. The structural features of such a recombinant allele should be
an A-type sequence in the 5′ part followed by a B-type (B2/B3/B4 or F) sequence in the 3′ part. The only allele found so far that combines these characteristics is allele B1, with a B2 simplified haplotype formula in its 3′ part (Fig. 10). This supports our assumption and suggests that the M allele probably results from an interallelic recombination event involving alleles A and B2, whereas the reciprocal event might have given B1. However, with a leucine residue at position 66, it is clear that the M allele does not arise directly from the recombination event between alleles A and B2. It is probably derived from an intermediate hybrid allele (B2:A), putatively W, not yet identified, which subsequently mutated at nucleotide
23 of the ninth exon.
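The reciprocal exchange described above can be illustrated with the simplified haplotype formulae given in the legend of Fig. 10 (HPSPERT for B2 and HLSPQRT for A). The sketch below is a minimal illustration, not the authors' procedure: the function name `reciprocal_recombinants` is hypothetical, the formula for A is written without the spacing found in the legend, and the break point index is an assumption. Because the two formulae differ only at their second and fifth residues, any break point falling between those two residues yields the same pair of products.

```python
def reciprocal_recombinants(parent_1, parent_2, break_index):
    """Return both products of a single reciprocal crossover.

    The first product carries parent_1 on the 5' side of the break point and
    parent_2 on the 3' side; the second product is the reciprocal.
    """
    if len(parent_1) != len(parent_2):
        raise ValueError("formulae must have the same length")
    if not 0 < break_index < len(parent_1):
        raise ValueError("break point must lie inside the formula")
    first = parent_1[:break_index] + parent_2[break_index:]
    second = parent_2[:break_index] + parent_1[break_index:]
    return first, second


# Simplified haplotype formulae taken from the legend of Fig. 10.
B2 = "HPSPERT"
A = "HLSPQRT"

# break_index=3 is an assumed position between the two differing residues.
b2_then_a, a_then_b2 = reciprocal_recombinants(B2, A, break_index=3)
print("B2:A product:", b2_then_a)   # proline and glutamine, as in variant M
print("A:B2 product:", a_then_b2)   # A-type 5' part with a B2-type 3' part
```

Under these assumptions the B2:A product corresponds to the putative W allele, from which M would differ by the additional Ser to Leu substitution at position 66, and the A:B2 product has the structure expected for the reciprocal recombinant.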
Because of its close similarity to its bovine and ovine
αs1-Cas counterparts, allele B1 was considered to be the ancestral allele in the goat [6]. The results reported here indicate that B1 might result from an interallelic recombination between alleles A and B, which can therefore be
Fig. 7. Polymorphisms occurring at 25 sites in the goat αs1-CasA, F and M alleles. The position of each polymorphic site is identified and numbered relative to the nearest exon. Intronic nucleotides are preceded by a ‘–’ or ‘+’ when they are upstream or downstream, respectively (e.g. –11/1 corresponds to the nucleotide located 11 nucleotides upstream from the first exon). Polymorphic sites in an exon sequence are identified without a sign (e.g. 8/4 identifies the 8th nucleotide of the 4th exon). RsaI/6–8 indicates the loss (–) or gain (+) of an RsaI restriction site within the DNA fragment spanning exon 6 to exon 8. Mutations specific for alleles M and F at position 23 in the 9th exon are highlighted. The symbol Δ indicates the nucleotide deletion in allele F [6]. The hatched boxes, identified by i7-e8-i8, encompass the putative recombination region.
Fig. 8. Genotyping the M allele at the αs1-Cas locus. Step I: ACRS-PCR using the primer pair MWU and C9LM* yields a 265/266-bp fragment, whatever the allele. Amplicons are then digested with TaqI (A). The TaqI restriction site (TCGA) created in exon 9 is indicated. Nucleotides C and A* correspond to the mutation characteristic for allele M and the substitution introduced within the primer C9LM*, respectively. Fragments generated are finally analysed by agarose gel (2% MetaPhor + 2% agarose) electrophoresis (B). Lane 1, molecular mass marker (pBR322 digested by HaeIII); lane 2, nondigested PCR product; lanes 3–5, homozygous X/X, heterozygous M/F and heterozygous X/F samples, respectively, where X represents an allele different from F, B, E, and C. Sizes (in bp) of DNA fragments are given on the right of the gel. Step II: AS-PCR to discriminate between alleles F and M (A). Amplification between primers C9UM1 and C9LM1 generates DNA fragments of characteristic size for the allele. (B) Agarose gel (2% MetaPhor + 2% agarose) analysis of amplicons from heterozygous M/F (lane 1), homozygous X/X (lane 2), homozygous F/F (lane 3), F/F + X/X mix (lane 4), with X different from F, B, E, and C. Lane 5 shows a molecular mass marker (pBR322, HaeIII digested). Sizes (in bp) of DNA fragments are given on the left of the gel.
considered representatives of two ancestral allelic lineages. The reciprocal proposal, i.e. that B1 and W are parental alleles, the recombinant products of which are A and B2, cannot be ruled out (Fig. 10). The latter proposal is, however, less plausible, given the low frequencies at which alleles B1 and
M have been found in the goat populations analysed so far.
It is worth noting that both alleles are characteristic of local breeds, Poitevine (France) and Montefalcone (Italy), respectively. Nevertheless, whatever the hypothesis retained, the existence of two ancestral allelic lineages seems to be the most likely scenario. Thus, interallelic recombination between two alleles may be responsible for the generation
of four possible allelic lineages (represented by A, B2, B1, and W), one of which (W/M) is revealed by this work. The high polymorphism of the goat αs1-Cas system provides further evidence that allelic diversity can arise from multiple pathways, including shuffling of polymorphic sequences generated by point mutations, through interallelic recombination events.
Fig. 9. Phylogeny proposed by Grosclaude and Martin [6] for the
αs1-Cas alleles and differences between the corresponding variants. The
phylogenetic tree proposed is based on the existence of a single
ancestral allele (B1), which was considered to be the original one given
its close similarity to its ovine and bovine αs1-Cas counterparts.
Fig. 10. A new phylogenetic tree integrating the possible interallelic recombination between two allelic lineages. The four alleles (B2, A, B1, and W) putatively involved in the recombination event are schematically represented as a chain of six boxes (mimicking exons) on which are indicated polymorphic amino-acid residues and their position in the peptide chain. A simplified haplotype formula is thus provided (e.g. HPSPERT and HLS P QRT for alleles B2 and A, respectively). The RsaI polymorphic restriction site and insertions occurring, respectively, between exons 6 and 8 and within intron 9 are indicated. Alleles deriving from these four ‘potentially recombinant’ alleles (boxed) are circled. Arrows indicate a possible pathway of evolution to alleles associated with high (black) or reduced (red) amounts of casein synthesized. The M allele is derived from allele W by
a single nucleotide transition C→T (nucleotide 23/exon 9) leading to the occurrence of a leucine residue (allele M) instead of the Ser (putative allele W) in the multiple phosphorylation site of αs1-Cas. The new phylogeny has been enriched with three novel variants (H, I and L) reported in 1997 by Chianese et al. [14].