Y-chromosome Genotyping and Genetic Structure of Zhuang Populations
CHEN Jing1,2, LI Hui2,3,①, QIN Zhen-Dong2, LIU Wen-Hong2, LIN Wei-Xiong4, YIN Rui-Xing5, JIN Li2, PAN Shang-Ling1,①
1 Department of Pathophysiology, Guangxi Medical University, Nanning 530021, China;
2 MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200433, China;
3 Department of Genetics, School of Medicine, Yale University, New Haven CT 06520-8005, USA;
4 Medical Research Center of Guangxi Medical University, Nanning 530021, China;
5 Institute of Cardiovascular Diseases, the First Affiliated Hospital, Guangxi Medical University, Nanning 530021, China
Abstract: Zhuang, the largest ethnic minority population in China, is one of the descendant groups of the ancient Bai-Yue. Linguistically, Zhuang languages are grouped into northern and southern dialects. To characterize its genetic structure, 13 East Asian-specific Y-chromosome biallelic markers and 7 Y-chromosome short tandem repeat (STR) markers were used to infer the haplogroups of Zhuang populations. Our results showed that O*, O2a, and O1 are the predominant haplogroups in Zhuang. Frequency distribution and principal component analysis showed that Zhuang was closely related to groups of Bai-Yue origin and therefore was likely to be the descendant of Bai-Yue. The results of principal component analysis and hierarchical clustering analysis contradicted the linguistically derived north-south division. Interestingly, a west-east clinal trend of haplotype frequency changes was observed, which was supported by AMOVA showing that the between-population variance of the east-west division was larger than that of the north-south division. The O* network suggested that the Hongshuihe branch was the center of Zhuang. Our study suggests that there are three major components in Zhuang. O* and O2a constituted the original component; later, O1 was brought into Zhuang, especially eastern Zhuang; and finally, northern Han populations brought O3 into the Zhuang populations.
Key words: Y chromosome; Zhuang; internal genetic structure
Received: 2006-02-23; Accepted: 2006-04-07
This work was supported by the Priority Project of the National Natural Science Foundation of China (No. 39993420), the Science Foundation of Guangxi Province (No. GSN0339041, GSY0542044) and the Genographic Project of National Geography.
① Corresponding authors. LI Hui, E-mail: LiHui.Fudan@gmail.com, Tel: +86-21-6564 2419;
PAN Shang-Ling, E-mail: s.pan@gxmu.net.cn, Tel: +86-771-535 8292
The nonrecombining portion of the human Y-chromosome (NRY), which is paternally inherited and does not undergo recombination during cell division, is prone to form population-specific polymorphisms. In addition, single nucleotide polymorphisms (SNPs) on the Y-chromosome, which have a lower probability of recurrent mutation and higher reliability, are more group- and area-specific and can record human historical migrations and evolutionary events more accurately; for this reason, they are rapidly being accepted as one of the most effective markers for studying human evolution and origin[1,2]. Using denaturing high-performance liquid chromatography (DHPLC) and single-stranded conformation polymorphism (SSCP), Underhill et al.[1,2] investigated several Y-chromosome biallelic markers in populations worldwide, constructed 131 Y-haplogroups, and mapped the human evolutionary genealogy. In Asia, on the basis of 19 East Asian-specific polymorphic markers on the Y-chromosome, Su et al.[3,4] established 17 Y-haplogroups, 7 of which were specific to the East Asian population. By clearly tracing the paternal migration route in East Asia and the Pacific Region, Su et al. found that the South-Asian group had more Y-haplogroups than the North-Asian group, indicating that the East Asians originated from the south. On the basis of the data of Su et al. [...] southern China using these Y-SNP markers and explained their origin, migration, mixture, and evolution, thereby adding important genetic information and evidence to the origin of these groups.
With a population of more than 16 million, Zhuang is China’s largest minority population, with 94% of its population living in the Guangxi autonomous region. The Zhuang language belongs to the Kam-Tai linguistic family, the Tai-Kadai sublinguistic family, and the Tai-Sek branch[7], and can be subclassified into Southern and Northern dialects bounded by the Yongjiang River (for details on the distribution of the Zhuang branches, see Fig. 1). It should be noted that the Bouyei ethnic group in the Guizhou Province actually belongs to the same population as Zhuang, as shown by their language and culture, and the so-called Shui Hu in the Yunnan Province, which is completely different from the Shui people in Guizhou, is in fact Bouyei. Therefore, all of these ethnic groups are considered Zhuang academically. Unfortunately, the Zhuang population does not have its own written script and has historically had to use Han characters to record events, and these records might be incomplete. On the basis of the few available historical records, Zhuang can be traced back to the ‘Luo-Yue’ and ‘Xi’Ou’ groups of 2000 years ago[7]. However, the origin of Zhuang might be more complicated than expected because Zhuang might have experienced complex evolution and migration and, to a great extent, may have a close relationship with the origin of Thai and Lao in southeast Asia. At the same time, Zhuang might be mixed with other surrounding ethnic groups, especially with Han. Therefore, many questions emerge: what is the exact origin of Zhuang? Is there any genetic evidence to support their historical migration events? Is it reasonable to classify Zhuang into South and North groups just based on dialects without any genetic indication? Can the genetic nature of Zhuang provide any information for the study of the origin and migration of the surrounding populations? Furthermore, when the early East Asian population moved northward via the Guangxi region, were there any original genetic materials retained in the native Zhuangs?
Fig. 1 Distribution of Zhuang branches
North Zhuang dialect group: Guibei, Liujiang, Hongshuihe, Yongbei, Youjiang, Bouyei, Qiubei, Nhang, Tai-Mène. South Zhuang dialect group: Yongnan, Tày, Man-Caolan, Nung, YanGuang, WenMa, E (Wuse), Tsün-Lao.
Very few published reports are available on Zhuang’s paternal genetic structure, and most of these have focused on some of its special branches. In this study, the Y-haplogroups were typed for eight branches of the Zhuang population with 13 biallelic markers and 7 Y-chromosome short tandem repeats (STR); for every haplogroup, the frequency was calculated, principal component analysis was carried out, and the genetic framework was drawn. The authors of this study hope to determine Zhuang’s inherited structure at a genetic level and to provide genetic data for further studies on Zhuang’s linguistics, origin, transformation, and admixing with other ethnic groups.
1 Materials and Methods
1.1 Sample collection and DNA extraction
A total of 129 blood samples were collected from eight representative Zhuang-inhabited areas in the Guangxi Province, representing eight Zhuang branches. DNA was extracted from white blood cells using the traditional phenol-chloroform method[8]. Table 1 shows the details of the different branches and the sample sizes. All participants were unrelated healthy males whose families had been Zhuang for more than three generations, and all were asked to sign informed consent at the time of recruitment.
1.2 Y-chromosome biallelic typing
Two strategies were used to type the Y-SNPs. For SNPs with length variation, i.e., deletion or insertion, fluorescence PCR (primer information shown in Table 2) was used, and the product was electrophoresed on a 3100 genetic analyzer (ABI company, USA) to determine the individual’s genotype. For SNPs without length changes, i.e., substitution or transversion, PCR-RFLP (restriction fragment length polymorphism) was used[4,6,8-10]. The Y-haplogroup of every subject was determined by integrated analysis of the Y-SNP typing results of the two typing methods.
Table 1 Sample distribution of eight branches of Zhuang from Guangxi province
Branches        Abbreviation    Size
Youjiang        YJ              5
Hongshuihe      HSH             39
Yongnan         YN              19
Zuojiang/Tày    ZJ              15
Dejing/Nung     DJ              3
For PCR, a primer mix was used, which contained M175, 0.04 μL; M121, 0.03 μL; M134, 0.03 μL; M117, 0.06 μL; M111, 0.08 μL; and M15, 0.02 μL. Each PCR reaction (volume 5 μL) included KodDash polymerase (TOYOBO) 0.5 U, 0.26 μL of primer mix, 10 ng of genomic DNA, and buffer. The PCR conditions were as follows: 30 cycles of 98℃ for 10 s, 55℃ for 2 s, and 74℃ for 2 s, and a final extension at 4℃.
To test the degree of variation among different branches, Y-STRs were typed using the same method that was used for typing Y-SNPs. Information on the Y-STR primers is also shown in Table 2. The volume of each PCR system was 5 μL, containing KodDash polymerase, 0.05 μL; 10× buffer, 0.5 μL; dNTP (2.5 mmol/L each), 0.4 μL; genomic DNA, 10 ng; and primer mixture, 0.4 μL. The PCR reaction was usually carried out in two panels. Panel 1 included primer pairs DYS389, 0.05 μL × 2 (forward and reverse); DYS390, 0.07 μL × 2; and DYS391, 0.08 μL × 2, with a total volume of 0.4 μL. Panel 2 included DYS388, 0.05 μL × 2; DYS392, 0.07 μL × 2; DYS393, 0.02 μL × 2; and DYS19, 0.06 μL × 2.
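The primer volumes listed above can be checked against the stated totals (0.26 μL for the SNP primer mix and 0.4 μL for each STR panel). The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative only and are not part of the original protocol.

```python
# Primer volumes in microlitres, copied from the text above.
SNP_MIX = {"M175": 0.04, "M121": 0.03, "M134": 0.03,
           "M117": 0.06, "M111": 0.08, "M15": 0.02}
# For the STR panels each value is the volume of ONE primer of the pair.
STR_PANEL_1 = {"DYS389": 0.05, "DYS390": 0.07, "DYS391": 0.08}
STR_PANEL_2 = {"DYS388": 0.05, "DYS392": 0.07, "DYS393": 0.02, "DYS19": 0.06}


def total_volume(volumes, copies=1):
    """Sum primer volumes in integer hundredths of a microlitre to avoid float drift."""
    hundredths = sum(round(v * 100) for v in volumes.values()) * copies
    return hundredths / 100


snp_total = total_volume(SNP_MIX)                    # single mix, no doubling
panel1_total = total_volume(STR_PANEL_1, copies=2)   # forward + reverse primers
panel2_total = total_volume(STR_PANEL_2, copies=2)   # forward + reverse primers
```

Under this reading, the six SNP primer volumes sum to the 0.26 μL of primer mix given for each reaction, and both STR panels sum to the stated 0.4 μL.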
Table 2 Y-Chromosome SNP, STR fluorescence primers
Marker    Primer sequence                 Label   Product size (bp)
                                                  Wild type   Mutant type
M121      F: ACAAAGACCTGGACAGATTAC        FAM     123         118
          R: CCCTTAAAAACAGCATGATA
M117      F: GTACGAAGAAAATCAAGGCTATTA     FAM     317         313
          R: TTGGGTAGAAAAACTGCAAGTAG
M175      F: TTGAGCAAGAAAAATAGTACCCA      FAM     226         221
          R: TTCAGTTAGCCTTGATTGACTGT
M134      F: AGAATCATCAAACCCAGAAGG        NED     232         231
          R: TCTTTGGCTTCTCTTTGAACAG
M15       F: ACAAATCCTGAACAATCGC          FAM     151         142
          R: GTCTGGGAAGAGTAGAGAAAAG
M111      F: TAACATAAACAGTATGCCAAA        HEX     197         195
          R: TGCCCTAAAGTTAATACCAG
DYS388    F: GTGAGTTAGCCGTTTAGCGA         FAM
          R: CAGATCGCAACCACTGCG
DYS389    F: CCAACTCTCATCTGTATTATCTATG    FAM
          R: TCTTATCTCCACCCACCAGA
DYS390    F: TATATTTTACACATTTTTGGGCC      NED
          R: TGACAGTAAAATGAACACATTGC
DYS391    F: CTATTCATTCAATCATACACCCA      HEX
          R: GATTCTTTGTGGTGGGTCTG
DYS392    F: TCATTAATCTAGCTTTTAAAAACAA    NED
          R: AGACCCAGTTGATGCAATGT
DYS393    F: GTGGTCTTCTACTTGTGTCAATAC     HEX
          R: AACTCAAGTCCAAAAAATGAGG
DYS394    F: CTACTGAGTTTCTGTTATAGT        HEX
          R: ATGGCATGTAGTGAGGACA
F: forward; R: reverse.
1.3 Statistical analysis
The Y-haplogroup of every individual was defined based on the experimental results of the authors of this study and the NRY haplogroup tree of East Asia shown in reference [11].
In practice, to ensure an effective genetic sample size for every branch, other published data, such as those on Bouyeis, Zhuangs, and Suis in Yunnan[12] and Bouyeis in Guizhou[13], and unpublished data (Fudan University, data not shown), such as those on Tianlin Zhuangs, Shangsi Zhuangs, Wuse Zhuangs, and Man-Caolan Zhuangs in Guangxi, were added to the relevant Zhuang branches.
Hierarchical clustering analysis was carried out in SPSS 13.0 on the frequencies of the different haplogroups in every branch to show the genetic distance (affinity) among Guangxi Zhuang branches. The phylogenetic relationship among Zhuang branches was examined by combining hierarchical clustering, principal component analysis, and association analysis to determine the association between phylogeny and Y-chromosome haplogroups; each principal component was then displayed as a gradient distribution chart drawn with Surfer 7.0 according to the geographic distribution of the branches, with the principal component value serving as the height value.
Variances among and within populations resulting from the Y-haplotype frequencies of the different branches were calculated using AMOVA (Analysis of Molecular Variance) in Arlequin 1.1 to further elucidate the phylogeny of the different Zhuang branches. Finally, Zhuang’s genetic framework was drawn using the Network 11.0 software with the Y-STR haplogroups of the different branches within the same Y-SNP haplogroup to show the detailed differences and associations among Zhuang branches.
2 Results
2.1 Distribution of NRY haplogroups in Zhuang branches
The haplogroup frequencies of the different Zhuang branches were calculated from the Y-SNP typing results. As shown in Table 3, Zhuang’s Y-haplogroups mainly cluster around O*, O1, O2a, and O3, the four most common haplogroups in East Asia. Haplogroup O* is the most frequent, followed by O2a and O1, showing that Zhuang is a typical southern group in East Asia and possesses more ancient Y-haplogroups. Interestingly, O3, O3e, and O3e1, the characteristic haplogroups of the northern East Asian group, were also frequently found in Zhuangs, indicating frequent genetic exchange between the two populations.
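The percentages in Table 3 are carrier counts divided by the branch sample size. The sketch below illustrates the calculation; the counts are an assumption, back-calculated from the Hongshuihe (HSH) row of Table 3 (n = 39), because the per-haplogroup counts themselves are not given in the text, and the function name is illustrative.

```python
def frequencies_percent(counts):
    """Convert per-haplogroup carrier counts into percentages of the branch sample."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


# Hypothetical counts, back-calculated from the HSH row of Table 3 (n = 39);
# they are listed in the same left-to-right order as the percentages in that row.
hsh_counts = [1, 2, 2, 9, 4, 2, 3, 8, 2, 6]
hsh_freqs = frequencies_percent(hsh_counts)
```

With these assumed counts the function reproduces the ten percentages printed for HSH, including 23.08% for the nine O* carriers mentioned in the network analysis.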
2.2 Principal component analysis
Principal component analysis was carried out using SPSS 13.0, and the principal component plot of Y-SNP frequencies was drawn (Fig. 2) from the Y-SNP typing results of the different branches obtained by the authors of this study and additional data on Yunnan Bouyeis, Zhuangs, and Suis; Guangxi Tianlin, Shangsi, Wuse, and Man-Caolan Zhuangs; and Guizhou Bouyeis. In this analysis, Tianlin was merged with the Youjiang branch, Shangsi with the Zuojiang branch, and Bouyei with the Guibian branch.
The principal component analysis showed that principal component 1 (pc1) and principal component 2 (pc2) together accounted for 82.5% of the total variance. It is obvious from Fig. 2 that the 12 Zhuang branches and other related groups mainly gathered into two larger groups. Hongshuihe, Guibei, Yongbei, Yongnan, and Man-Caolan were located in the upper part of the principal components plot and constituted the first group, whereas Guibian, Zuojiang, Youjiang, Yunnan Zhuang, and Yunnan Bouyei clustered in the lower part of the plot and constituted the second group. Yunnan Shui Hu seemed to be isolated from all branches but was somewhat closer to the second group on pc2. As shown in Fig. 2, it is pc2 that separated these branches into two groups. Considering the geographic location of each branch, the difference between the two groups was, genetically, east-west rather than south-north, contradicting the traditional south-north grouping of Zhuangs by linguistic factors.
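The "cumulative contribution" of the components is the share of total variance carried by the leading eigenvalues of the covariance (or correlation) matrix of the haplogroup frequencies. As a hedged illustration only, the sketch below works this out in closed form for two variables; the frequencies are hypothetical, the covariance-based formulation is an assumption (the text does not state which matrix SPSS was run on), and the real analysis involved many haplogroups.

```python
import math


def pca_two_variables(x, y):
    """Return the shares of total variance carried by pc1 and pc2 for two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    c = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eig1, eig2 = half_trace + spread, half_trace - spread
    total = eig1 + eig2
    return eig1 / total, eig2 / total


# Hypothetical frequencies (%) of two haplogroups in four branches.
freq_a = [10.0, 20.0, 30.0, 40.0]
freq_b = [40.0, 30.0, 20.0, 10.0]
share_pc1, share_pc2 = pca_two_variables(freq_a, freq_b)
```

In this toy case the two haplogroups vary in exact opposition, so a single component carries all of the variance; with real data the shares are spread over several components, as in the 82.5% reported for pc1 and pc2 together.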
Using the values of pc1 and pc2 as height values, the principal component gradient distribution diagram was drawn in the same manner as a contoured relief map, according to the geographic location of each Zhuang branch (Fig. 3). In the relief map of pc1, the peak value was near the Guangxi-Vietnam border and gradually changed north-eastward and north-westward. This might be a clue to the spread of the East Asian population in ancient Guangxi when it first entered East Asia. Because pc2 classifies Zhuang branches into two main groups, its significance might be more definite. The peak value of pc2 appeared in the east Hongshuihe basin, exhibiting an east-to-west gradient. In addition, a higher value of pc2 was seen in northwest Guangxi bordering the Yunnan Province. This might be influenced by the data of the Yunnan Bouyeis. Furthermore, from Fig. 3 it is quite clear that contour lines run along rivers, indicating that native Zhuangs migrated upstream along rivers in Guangxi in early times.
Table 3 Y-SNP haplogroup frequencies of branches of Zhuang
Branches Size C D D1a F* K* O O1 O2a* O2a1 O3 O3a O3e* O3e1
YB 23 8.70 4.35 4.35 21.704 17.39 8.70 21.74 8.70 4.35 4.35
HSH 39 2.56 5.13 5.13 23.08 10.26 5.13 7.69 20.51 5.13 15.38
GN 21 4.76 4.76 4.76 38.10 4.76 4.76 4.76 14.29 14.29
YN 19 5.26 5.26 15.79 10.53 10.53 10.53 31.58 5.26 5.26
Fig. 2 The principal components plot of Y-SNP frequencies of Zhuang populations
Fig. 3 Geographic map of Y-SNP principal components of Zhuang populations
X coordinate: longitude; Y coordinate: latitude. The geographic map of Y-SNP principal components of Zhuang branches was drawn using the principal component values as contours; the lower the value, the darker the color. For pc1 (left), the peak value was near the Guangxi-Vietnam border and gradually changed north-eastward and north-westward. The peak value of pc2, which classifies Zhuang branches into two main groups, appeared in the east Hongshuihe basin (right), exhibiting an east-to-west gradient. It is quite clear that contour lines run along rivers, indicating that native Zhuangs migrated upstream along rivers in Guangxi in early times.
2.3 Correlation analysis
To understand the meaning of each principal component, correlation analysis was carried out to seek the origin of each principal component. For each principal component, the haplogroups were sorted by their correlation coefficient so that the positively and negatively correlated haplogroups of the principal components could be observed. Theoretically, the more positively correlated haplogroups a principal component has, the more distinct its genetic structure is and the more practical meaning it has. For more detail, see Figs. 4 and 5.
Apparently, the structure of pc2 is clearer than that of pc1, as shown by comparing the correlation coefficients of the two principal components. There were many contradictions in the variables among positively correlated haplogroups in pc1, whereas in pc2, most positively associated variables fell within the positive correlation area. Moreover, pc2 was markedly positively correlated with longitude, showing that pc2 was more significant than pc1.
Further analysis of pc1 showed that the number of positively correlated haplogroups, despite their weak correlations, was larger than that of the negatively correlated ones. This is attributed mainly to the difference between O2a and O3e, where O2a was a southern aboriginal haplogroup in the East Asian population and O3e was probably a northern haplogroup. In this study, pc1 was positively associated with O2a (r = 0.69, p = 0.02), and the values of pc1 were all positive. Therefore, the meaning of pc1 underscores the positive correlation of Zhuang branches and Yunnan Shui Hu with O2a, showing that these ethnic groups are all typical southern groups of the East Asian population, a conclusion that is consistent with the historical records of Zhuang’s Baiyue origin.
For pc2, the number of positively correlated haplogroups was almost the same as that of the negatively correlated ones, with an opposite trend being found between O* and O2a: haplogroup O* was positively but O2a was negatively associated with pc2. Both pc2 and haplogroup O* were clearly related to longitude, implying that haplogroup O* was the main component of pc2, i.e., it is O* that separates the Zhuang branches into two main groups. With the westward movement of haplogroup O*, O2a retreated in the same direction. This process agrees with the lower frequency of O2a in east Zhuang branches.
No significant correlation was found between other haplogroups and the two principal components.
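The correlation coefficients discussed above (for example, between pc2 and longitude) are presumably ordinary Pearson coefficients, although the text does not name the statistic. A minimal sketch under that assumption follows; the scores and longitudes are hypothetical values chosen only to show an east-west trend and are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical pc2 scores and longitudes (degrees east) for five branches.
pc2_scores = [-1.2, -0.6, 0.1, 0.7, 1.0]
longitudes = [105.5, 106.8, 107.9, 108.6, 109.4]
r_pc2_longitude = pearson_r(pc2_scores, longitudes)
```

A coefficient close to +1, as in this toy example, corresponds to the kind of marked positive association with longitude that the text reports for pc2.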
2.4 Hierarchical clustering analysis
To further elucidate the relationship among Guangxi Zhuang branches, hierarchical clustering analysis was carried out with average linkage (between groups) in SPSS 13.0; the results are shown in Fig. 6, in which Mien and Yi are the outgroups. It is clear that the center of all Zhuang branches emerges in the Hongshuihe area, evolving gradually toward Yongbei, Yongnan, and Guibian along the Hongshuihe river, then toward Guibei and Man-Caolan, and finally toward Zuojiang, Youjiang, and Yunnan, providing strong evidence once again that the difference among Zhuang branches is fundamentally east-west and not south-north, as traditionally believed. Intriguingly, Wuse Zhuangs living in Yongle, Rongshui, a northwest county in Guangxi, are genetically close to the remote Zuojiang Zhuangs who live in southwestern Guangxi, implying a special migration event in ancient times.
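Between-groups average linkage merges, at each step, the two clusters whose members have the smallest mean pairwise distance. The following is a rough sketch of that procedure, not the SPSS implementation: the Euclidean distance between frequency profiles is an assumption (the distance measure is not stated in the text), and the branch names and frequencies are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(profiles):
    """Agglomerative clustering with between-groups average linkage.

    profiles maps a label to a list of haplogroup frequencies.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def dist(p, q):
        # Euclidean distance between two frequency profiles (assumed measure).
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Mean of all pairwise distances between members of the two clusters.
            d = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
            d /= len(c1) * len(c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges


# Hypothetical frequency profiles (%), not taken from Table 3.
toy_profiles = {
    "East1": [40.0, 10.0, 20.0],
    "East2": [38.0, 12.0, 20.0],
    "West1": [15.0, 35.0, 20.0],
    "Outgroup": [5.0, 5.0, 60.0],
}
merge_history = average_linkage(toy_profiles)
```

In this toy data set the two eastern profiles join first, the western profile joins them next, and the outgroup joins last, which is the kind of nesting a dendrogram such as Fig. 6 displays.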
Fig. 4 Correlations among principal components, Y-SNP frequencies, longitude, and latitude of Zhuang branches
Principal components were ranked after their correlation coefficients with the relevant haplogroups were calculated, to identify the positively and negatively correlated haplogroups of the principal components. Each correlation coefficient was marked with a color according to its value. There were many contradictions in the variables among positively correlated haplogroups in pc1 (left), whereas in pc2, most positively associated variables fell within the positive correlation area (right).
Fig. 5 Correlations and statistical significance among principal components, Y-SNP frequencies, longitude, and latitude of Zhuang branches
r: correlation coefficient; P: probability value; long.: longitude; lat.: latitude. The color scale shows the different p values. pc1 was positively associated with O2a, a typical southern aboriginal haplogroup in the East Asian population, underscoring the positive correlation of Zhuang branches and Yunnan Shui Hu with O2a. For pc2, the number of positively correlated haplogroups was almost the same as that of the negatively correlated ones, with an opposite trend being observed between O* and O2a, in which haplogroup O* was positively but O2a was negatively associated with pc2.
Fig. 6 Dendrogram of Y-SNP of Zhuang branches
Yi and Mien, two non-Baiyue populations living in southwest China, were introduced to illustrate the cluster of Zhuang and its relationship with surrounding non-Baiyue groups. This dendrogram showed that the ancestors of Zhuang gathered in the Hongshuihe area first and then spread toward the Yongjiang and Youjiang basins and the surrounding area. Genetically, Zhuang was far from Yi and Mien.
2.5 AMOVA analysis
Linguists divide the Zhuangs into a southern group and a northern group according to their dialects. If this classification has a genetic basis, the variance among populations should be greater, whereas the variance within populations should be smaller, in the analysis of molecular variance. As suggested by the pc2 results, the AMOVA results were compared between the southern and northern groups and between the eastern and western groups (Table 4). However, the anticipated north-south difference was not observed. Instead, the variance among populations of the eastern-western grouping was far greater than that of the northern-southern grouping, whereas the variance within populations showed the opposite pattern, further suggesting that classifying Zhuang into east and west is more reasonable than the south-north grouping if these groups need to be distinguished. However, because the east-west difference is graded owing to Zhuang population migration, it is very difficult to divide Zhuangs into two distinct groups genetically.
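The among-population and within-population variance components compared above can be illustrated with a one-level AMOVA. The sketch below is a simplification and not the Arlequin procedure used in the study: it ignores the higher grouping level (east/west or north/south groups of branches), it assumes a distance of 0 between identical haplogroups and 1 otherwise, and the two toy populations are hypothetical.

```python
def one_level_amova(populations):
    """Partition haplotype variation among and within populations.

    populations is a list of lists of haplotype labels (one inner list per population).
    Distance between two individuals is 0 for identical labels and 1 otherwise.
    Returns (sigma2_among, sigma2_within, phi_st).
    """
    def ssd(sample):
        # Sum of squared deviations: differing ordered pairs divided by 2n.
        n = len(sample)
        differing = sum(1 for a in sample for b in sample if a != b)
        return differing / (2 * n)

    pooled = [h for pop in populations for h in pop]
    n_total, k = len(pooled), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(pooled) - ssd_within
    sigma2_within = ssd_within / (n_total - k)
    # Average effective sample size, which allows for unequal population sizes.
    n0 = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (k - 1)
    sigma2_among = (ssd_among / (k - 1) - sigma2_within) / n0
    phi_st = sigma2_among / (sigma2_among + sigma2_within)
    return sigma2_among, sigma2_within, phi_st


# Hypothetical haplogroup labels for two toy populations.
east = ["O*", "O*", "O*", "O1"]
west = ["O2a", "O2a", "O2a", "O*"]
sigma2_among, sigma2_within, phi_st = one_level_amova([east, west])
```

A grouping that reflects real genetic structure raises the among-population component relative to the within-population component, which is the comparison the text makes between the east-west and north-south divisions.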
haplogroup of Guangxi Zhuang branches
The combination of several short tandem repeats (STRs) in the NRY comprises another type of Y-chromosome haplogroup, which reflects minute genetic variance among populations with the same Y-SNP. This differs from the Y-SNP haplogroup, which can only be used roughly in the classification of systematic phylogeny. After sorting the Zhuang branches on the basis of Y-SNP information, the genetic network was drawn using Network 11.0 with Y-STR data from individuals of all branches to show the relationships and differences between branches with the same Y-SNP haplogroups.
As shown in Fig. 7, the STR network of haplogroup O* is the most informative owing to the higher frequency of O* in all Zhuang branches. Nine individuals from the Hongshuihe branch with haplogroup O* had 8 Y-STR haplogroups, the highest number of all, three of which were shared by Guibian, Guibei, and Yongbei individuals. In addition, Hongshuihe Zhuangs were closely related with individuals from other branches, especially Yongbei and Guibei individuals, establishing the central position of Hongshuihe among all Zhuang branches. Only one Guibei individual shared a Y-STR haplogroup with Hongshuihe, although the Guibei branch also had 8 STR haplogroups, which were distributed at the edge of the O* haplogroup network. More linkage was also found among the Zuojiang, Youjiang, and Dejing branches, which, as a closed group, were more isolated from the others. Taken together, these findings agree with the results of the principal component analysis and the above-mentioned clustering analysis.
Likewise, in the O1 network, the Hongshuihe branch had more STR haplogroup polymorphisms than the others and occupied the center of the network. However, it failed to maintain its leading position in O2a and gave way to the Zuojiang branch as a result of fewer individuals carrying the O2a haplogroup. There was no shared STR haplogroup or any remarkable difference among branches in the O3* network. One possible explanation is that O3* is not the dominant and characteristic haplogroup in Zhuangs.
Table 4 Results of AMOVA of northern-southern group and eastern-western group of Zhuang branches
          Eastern-western                  Northern-southern
Groups    Eastern: HSH, YN, YB, CL, GN     Northern: YB, YJ, HB, HSH, GB
          Western: ZJ, YJ, GB, E, DJ       Southern: YN, ZJ, DJ, CL, E
Fig. 7 STR network of haplogroups O*, O1, O2a, and O3* of Zhuang branches
In the networks of haplogroups O* and O1, the Hongshuihe branch had more STR haplogroup polymorphisms and was closely related with the others, especially Yongbei and Guibei individuals, establishing its central position among all Zhuang branches. In the O2a network, the Zuojiang branch had more Y-STRs. In the O3* network, no shared STR haplogroup was seen.
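The counts discussed in the network analysis (the number of distinct STR haplogroups per branch and the haplogroups shared between branches) can be tallied with simple set operations before any network is drawn. The sketch below is illustrative only: the repeat counts are hypothetical, only three loci are shown instead of the seven typed in the study, and it does not reproduce the algorithm of the Network software.

```python
from itertools import combinations


def haplotype_summary(branches):
    """Count distinct STR haplotypes per branch and list those shared between branches.

    branches maps a branch name to a list of haplotypes, each a tuple of repeat
    counts in a fixed locus order.
    """
    distinct = {name: set(haps) for name, haps in branches.items()}
    counts = {name: len(haps) for name, haps in distinct.items()}
    shared = {
        (a, b): distinct[a] & distinct[b]
        for a, b in combinations(sorted(distinct), 2)
        if distinct[a] & distinct[b]
    }
    return counts, shared


# Hypothetical repeat counts at three loci (for example DYS19, DYS390, DYS391).
toy_branches = {
    "HSH": [(15, 24, 10), (15, 25, 10), (16, 24, 10), (15, 24, 10)],
    "YB": [(15, 24, 10), (14, 23, 11)],
    "ZJ": [(17, 22, 9)],
}
counts, shared = haplotype_summary(toy_branches)
```

In this toy example HSH has the most distinct haplotypes and shares one with YB, while ZJ shares none, mirroring the pattern of a central branch and an isolated branch described for the O* network.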