VIETNAM NATIONAL UNIVERSITY OF AGRICULTURE
FACULTY OF BIOTECHNOLOGY
UNDERGRADUATE THESIS
TOPIC:
IDENTIFICATION OF TANGERINE VARIETIES IN
VIETNAM USING ITS, MATK, RBCL PRIMERS
School year : 2017 – 2022
Major : Biotechnology
Lecturers : Assoc. Prof. Dr. TRAN DANG KHANH, Agricultural Genetics Institute
Assoc. Prof. Dr. DONG HUY GIOI, Vietnam National University of Agriculture
HANOI – 2022
COMMITMENT
I hereby declare that all results in this thesis are my own work.
The data and results published in the thesis are completely honest and accurate and have not been published in any other work.
Hanoi, March 2022
Student
To Hoang Anh Minh
ACKNOWLEDGMENTS
First of all, I would like to express my deep respect and gratitude to Assoc. Prof. Dr. Tran Dang Khanh, Head of the Genetic Engineering Department, Agricultural Genetics Institute, and Assoc. Prof. Dr. Dong Huy Gioi, Head of the Department of Biology, Vietnam National University of Agriculture, who directly guided me, enthusiastically instructed me, and created the best conditions for me during my study and scientific research.
I would like to thank the staff of the Department of Genetic Engineering, Agricultural Genetics Institute, for always encouraging me and for providing valuable professional contributions that helped me complete this thesis.
I would also like to express my deep gratitude to the teachers of the Faculty of Biotechnology, Vietnam National University of Agriculture, for helping me find the right orientation to carry out my thesis.
Finally, I would like to express my deep gratitude to my father, mother, family members, and friends, who have always supported and encouraged me throughout the study and research process.
CONTENTS
COMMITMENT
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
SUMMARY
SECTION I: INTRODUCTION
1.1 The urgency of the subject
1.2 The goal of the subject
1.3 Research range
SECTION II: OVERVIEW
2.1 Overview of tangerine
2.2 Methods used in determining genetic relationships in citrus fruits
2.2.1 Genetic diversity, classification of citrus fruit groups based on morphological markers
2.2.2 Genetic diversity, classification of citrus fruit groups based on isozyme markers
2.2.3 Genetic diversity, classification of citrus fruit groups based on DNA molecular markers
2.3 DNA barcode
2.3.1 The loci used in DNA barcoding of plants
2.3.2 Nuclear gene sequence
2.3.3 Ribosome coding region
2.3.4 Chloroplast gene sequence
2.3 Research on genetic diversity and classification in Vietnam citrus fruit trees
SECTION III: RESEARCH MATERIALS AND METHODS
3.1 Research materials
3.2 Chemicals
3.3 Research methods
3.3.1 Total DNA extraction
3.3.2 PCR reaction
3.3.3 PCR cycle
3.3.4 Electrophoresis to check PCR products
3.3.5 Electrophoresis method on agarose gel
3.3.6 Electrophoresis method on polyacrylamide gel
3.3.7 Gel purification with the Qiagen kit
3.3.8 Sequencing
3.3.9 Data analysis
SECTION IV: RESULTS AND DISCUSSION
4.1 Results of nucleotide sequence analysis of the ITS gene region between 14 studied tangerine varieties
4.2 Results of nucleotide sequence analysis of the matK gene region between 14 studied tangerine varieties
4.3 Results of nucleotide sequence analysis of the rbcL gene region between the studied tangerine samples
SECTION V: CONCLUSIONS AND PETITIONS
REFERENCES
ABBREVIATIONS
Acronyms Full name
AFLP Amplified Fragment Length Polymorphism
CTAB Cetrimonium bromide
EDTA Ethylenediaminetetraacetic acid
FAO Food and Agriculture Organization
ISSR Inter-Simple Sequence Repeats
ITS Internal transcribed spacer
RAPD Randomly Amplified Polymorphic DNA
RFLP Restriction Fragment Length Polymorphism
SCAR Sequence Characterized Amplified Region
LIST OF TABLES
Table 3.1 List of tangerine leaf samples in the study
Table 3.3 List of components in the PCR reaction
Table 3.4 List of phases in the PCR cycle
Table 3.5 List of components in electrophoresis
Table 4.1 Evaluation of the similarity and coverage of the rbcL region DNA sequences of the research samples with the corresponding sequences on NCBI
Table 4.2 Some position differences in nucleotide sequences between the studied tangerine samples and reference samples
Table 4.3 Genetic similarity coefficient between the 14 studied tangerine samples and the reference gene
Table 4.4 Coefficient of nucleotide sequence similarity in the matK gene region between 14 studied tangerine varieties and reference samples
Table 4.5 Some position differences in nucleotide sequences between the studied tangerine-like samples and reference samples
Table 4.6 Genetic similarity coefficient between the 14 studied tangerine samples and the reference gene
Table 4.7 Evaluation of the similarity and coverage of the rbcL region DNA sequences of the research samples
Table 4.8 Some position differences in nucleotide sequences between the research and reference samples
Table 4.9 The nucleotide sequence similarity coefficient of the rbcL gene region between the studied tangerine samples and the reference sample
LIST OF FIGURES
Figure 2.1 Lists some of the studies using DNA barcoding as a plant identification tool (Peter M. Hollingsworth, 2011)
Table 3.2 List of primer pairs used in the study
Figure 4.1 PCR results of 14 samples with primers ITS1/ITS4
Figure 4.2 Images comparing the nucleotide sequences of the ITS region of the studied tangerine samples using the primer pair ITS1/ITS4
Figure 4.3 Phylogenetic tree of the ITS gene region of 14 studied tangerine varieties and reference genes
Figure 4.4 PCR results of 14 samples studied with the matKCi1 primer pair
Figure 4.5 Some images comparing matK region nucleotide sequences of studied tangerine samples with reference samples using primer pairs
Figure 4.6 Phylogenetic tree of the matK gene region of 14 studied tangerine varieties and reference genes
Figure 4.7 PCR results of 44 research samples with primer pair rbcLCi2
Figure 4.8 Some images comparing the nucleotide sequences of the rbcL region of the studied tangerine samples
Figure 4.9 Phylogenetic tree of the rbcL gene region of the studied and reference samples
SUMMARY
The ITS region is one of the most useful tools for phylogenetic assessment in both plants and animals because it is widespread in nature. In addition, because the chloroplast genome is highly conserved and species-specific, the use of chloroplast genome analysis in phylogenetic studies and plant taxonomy is of great interest to scientists and has been widely applied to many different plant species. As Vietnam promotes the export of its agricultural products, there is a need to improve the ability to conserve, classify, exploit, and verify the origin of mandarin varieties. Therefore, the topic "Identification of tangerine varieties in Vietnam using ITS, matK, rbcL primers" was carried out to identify local mandarin varieties, with the aim of building a database of genetic resources of native tangerine varieties. Through research, comparison, and reference to published studies, the work shows that some citrus varieties in the sample group can be identified based on the ITS, matK, and rbcL gene regions. On that basis, further research should refine and expand the construction of DNA barcodes for tangerine genetic resources, thereby helping to classify and identify them.
SECTION I: INTRODUCTION
1.1 The urgency of the subject
Tangerine belongs to the citrus group, the most widely produced group of fruit trees in the world. There are two clearly differentiated markets: the fresh fruit market and the processed juice market. World citrus production grew at a relatively stable rate in the last decades of the twentieth century and brought great economic benefits. Citrus is produced in many countries around the world, with a total output of about 158.9 million tons in 2019 (FAOSTAT, 2020). In Vietnam, there are many famous tangerine-growing regions with an annual output of up to tens of thousands of tons. However, because the area of citrus trees has increased rapidly despite the recommendations of the authorities and localities, farmers have begun to face consequences such as falling prices and price differences between citrus-growing provinces. According to statistics of the Ministry of Industry and Trade, in 2018 the area of fruit trees increased, concentrated in the citrus group (oranges, tangerines, pomelos). As of September 2018, the area of citrus trees in the whole country reached 192,700 hectares, an increase of 3% over the same period in 2017. Meanwhile, most citrus is consumed domestically, and exports are insignificant. For fruit products that meet importers' requirements for food safety and hygiene, high quality, and Vietnamese branding, the available quantity is not enough to satisfy importers. The biggest limitation of fruit production, including tangerines, is its small scale, which makes investment, quality control, and product consumption difficult. To meet domestic demand and move toward the export of Vietnam's agricultural products, including tangerines, it is necessary to identify and study each variety and its genetic resources and to determine their origin, which helps preserve, identify, and develop mandarin varieties with economic value in each region. For these reasons, I carried out the research entitled "Identification of tangerine varieties in Vietnam by ITS, matK, rbcL primers".
1.2 The goal of the subject
- Sequence the chloroplast gene regions (matK and rbcL) and the ITS region
- Gene regions ITS, matK, rbcL
SECTION II: OVERVIEW
2.1 Overview of tangerine
Tangerine, scientifically known as Citrus reticulata (family: Rutaceae), is a small, thorny tree with a dense top of slender branches. The tree is usually green and is widely grown in India, Vietnam, and South Asian countries. The trees begin to bear fruit when they are around 3 years old. The fruit is globose or oblong; the rind is bright orange or red-orange when ripe and is loose, separating easily from the segments. The seeds are small and pointed at one end, and the cotyledons are green in most cultivars (Das et al., 2014; Tiwari, 2009).
Citrus reticulata (order: Sapindales, family: Rutaceae) is a shrub or small tree growing to 20 feet tall. The plants have fragrant flowers and glossy leaves, as well as spherical fruits with sweet aromatic pulp and a pale orange-yellow to burnt orange peel that is loose and easily removed (Nogata et al., 2003). The fruit is edible; it is called tangerine (a name usually reserved for red-skinned varieties), mandarin, or Kamala lebu in Bengali. There are numerous reports on the antibacterial, antifungal, and antioxidant effects of C. reticulata essential oil (CREO), as well as on direct food-related applications of various plant parts, including the fruits, fruit rinds, flowers, and leaves.
The fruit consists of three layers: the outer skin, with oil glands that secrete the essential oils giving the characteristic orange smell; a white, thread-like middle layer; and an inner membrane with 8–10 segments containing juice vesicles. The leaves are ovate and elongated. The petioles are narrowly winged, and the sweet-scented flowers are borne singly or in groups in the leaf axils. The fruits range from spherical to sub-spherical and from small to medium in size. The rind varies in thickness from thin to medium, with a surface texture ranging from smooth to rough or pebbled, and it can be either easy to peel or tight. The color of both the rind and the flesh ranges from yellow to light or dark orange. Maturity ranges from very early to late, depending on cultivar and growing conditions.
The tangerine tree is native to China and Southeast Asia. The term "mandarin" was first used in the 19th century to describe tangerines with a deep orange-red exterior. The mandarin group is so diverse that attempts have been made to divide its members into different types and species. Citrus unshiu (satsumas), C. deliciosa (Mediterranean mandarin), C. nobilis (king mandarin), and C. reticulata (common mandarin) are known worldwide, but only C. reticulata and its related hybrids are of economic importance in the US. Tangerines usually have a red-orange skin darker than that of oranges and are hard and sour; they are often used in cooking or are suitable for export, and they have a more rounded shape.
Quality characteristics: High-quality tangerines will have a dark brown skin that is relatively unblemished. The fruit should be elliptical and firm. The peel should be easily removed from the flesh. The edible portion should be juicy and contain few or no seeds.
Mandarins (called tangerines in some parts of the world) and their related hybrids are the largest and most variable group of edible citrus species. They include many cultivars of varying importance according to local needs and export markets, growing conditions, and climate zones. Their large and growing yields are important in domestic and world trade, especially for fresh fruit. Tangerines are mainly consumed as fresh fruit. There are a number of health benefits associated with consuming fresh citrus: citrus fruits are a good source of many essential nutrients and compounds, are low in calories and fat, and provide vitamin C, fiber, folate, potassium, and several phytochemicals. These diverse nutritional benefits of citrus, and especially of the very convenient tangerine group, can only be obtained through consumption of fresh fruit and juice and cannot currently be obtained from dietary supplements (Forsyth, 2003).
In essence, both "mandarin" and "tangerine" refer to tangerines, which belong to the citrus family and originate from southwestern China, but there are differences between them that consumers can hardly distinguish. Mandarin is usually brighter orange, with thin, smooth skin, and is juicy and sweet; it is eaten directly or as a dessert. The shape of a "mandarin" is flatter than that of a "tangerine", and it looks like a small orange.
Tangerine is more common, has a darker orange-red rind than mandarin, is hard, and has a sour taste; it is often used in cooking or is suitable for export. Tangerine's shape is more rounded. Botanically, mandarin refers to three classes of oranges: satsumas, tangerines, and miscellaneous hybrids. Therefore, tangerine is classified as a mandarin, and the two terms are often used interchangeably. However, while all tangerines are mandarins, not all mandarins are tangerines.
2.2 Methods used in determining genetic relationships in citrus fruits
There are many research methods for identifying and classifying the relationships between animals and plants. All of them are based on principles such as shared ancestry, common traits, and similar properties. With the development of biotechnology, more accurate species identification increasingly relies on genetic engineering. When combined with conventional morphological observations or folk empirical knowledge, it provides greater reference value and accuracy. Scientists all across the globe have recently been studying the use of DNA barcodes to identify species, and this has made major contributions to species categorization. The primary types of genes usually employed for gene identification or the study of species evolution are ribosomal rRNA genes, mitochondrial genes, and chloroplast (plant) genes, of which the 18S, 5S, and 16S rRNA genes are often used to assess evolutionary links between organisms. DNA markers are more accurate than morphological markers and chemical indicators since they are not dependent on external circumstances.
Rhodes, 1976), only three species (C. medica, C. reticulata, and C. maxima) have phylogenetic value; this agrees with the view of early taxonomists that citrus comprised only three or four species.
Typically, morphological markers are used to identify hybrid citrus fruit trees. When genotypes exhibit dominant traits, hybrids derived from polyploid cultivars are easily identified and used as parent plants. For example, three-lobed leaves are a taxonomic feature used as a morphological marker to identify hybrids of Poncirus trifoliata (L.) Raf. with other citrus species; they are a single-gene dominant trait characteristic of this species. However, when both parent plants have dominant traits, it is extremely difficult to identify hybrids from crosses between multifunctional citrus genotypes.
IPGRI has created descriptor tables for the description and identification of different cultivars, species, and genera, including the citrus group, in order to classify genera within this group. They are a useful tool for characterizing and classifying citrus genera and species based on morphological criteria.
2.2.2 Genetic diversity, classification of citrus fruit groups based on isozyme markers
It is extremely difficult to distinguish and identify citrus fruit trees based only on morphological, physiological, and agronomic criteria. To determine the genetic diversity of citrus cultivars, as well as to collect and conserve citrus genetic resources, a variety of molecular markers have been used. Torres et al. (1978) used isozymes for the first time in citrus fruits to examine genetic diversity in 33 citrus varieties. Isozyme markers have since been used in many research directions on citrus fruits. Embryology is a topic of great interest in the scientific community. Embryos can develop from two genetic origins: from vegetative cells or from the fertilized cell formed when male and female gametes fuse. The isozyme marker was chosen to identify fertilized seedlings because it is a co-dominant molecular marker that can easily detect the presence of the paternal gene in the embryo by analyzing two isozyme systems, PGI (phosphoglucose isomerase) and PGM (phosphoglucomutase). The efficiency of this procedure was determined by vegetative cytology examination, which demonstrated that 86% of fertilized seeds were differentiated from self-mating seeds and more than 99% of fertilized seeds from all paired pairs in species.
In addition, Jarrell et al. used isozyme and RFLP techniques to construct a genetic map of the citrus genome. The map was generated by analyzing the segregation of 8 isozyme, 1 protein, and 37 RFLP loci in 60 progeny of a cross involving (Citrus paradisi Macf. × Poncirus trifoliata (L.) Raf.) and (Citrus sinensis (L.) Osbeck × P. trifoliata). The map contains 38 of the 46 analyzed loci, distributed over ten linkage groups. From the linkage data, a genome size of 1700 cM was estimated; 35% of the genome should be within 10 cM, and 58% within 20 cM, of a marker on the map. Eight loci in three linkage groups and one unlinked locus do not conform to Mendel's law of segregation.
2.2.3 Genetic diversity, classification of citrus fruit groups based on DNA molecular markers
The main directions of marker research in citrus are identifying varieties, building phylogenetic trees, and studying genetic variation. Salvo and colleagues compared taxonomic data not based on molecular evidence with chloroplast genomes to determine the evolutionary relationships within the family Rutaceae. ISSR amplification is a PCR-based molecular method that can rapidly distinguish closely related individuals. Twenty-two ISSR primers were used to detect variation in 94 plants from 68 citrus cultivars, dividing them into 6 crop groups (Fang and Roose, 1997). According to Fang and Roose, other molecular approaches have difficulty distinguishing the variants in these six taxa because they are so closely related. To explore citrus phylogenetics, Nicolosi and colleagues used a combination of three molecular techniques: analysis of RAPD, SCAR, and cpDNA using AFLP techniques. Phylogenetic studies based on cpDNA analysis are among the best. The citrus plants examined were classified into 8 groups using 262 RAPD primers and 14 SCAR primers, as well as cpDNA analyses. C. grandis and C. sinensis were classified in the same group as Grapefruit. The ISSR method has also been used to study the phylogeny of this group of fruit trees, including closely related and economically valuable lemon varieties, for the conservation and development of lemon grown in southern Italy (Capparelli et al., 2004). Among these markers, SSR is co-dominant and highly polymorphic, and it is widely used to analyze the genetic diversity of citrus within species, between species, and within populations (Froelicher et al., 2008), as well as for rapid identification of seedlings produced from hybrids and of embryos used in crossbreeding, in order to eliminate unwanted genotypes. The results show that the SSR technique is more effective than the isozyme technique in discriminating heterozygous plants and in identifying hybrid citrus fruit trees when SSR markers are combined with morphological markers (Ruiz et al., 2000). SSR markers have also been used in Citrus and Poncirus for association and phylogenetic analysis, as well as to study the diversity and characteristics of Grapefruit (Corazza-Nunes et al., 2002). Froelicher also used 43 SSR markers derived from C. reticulata to assess the genetic diversity of wild C. reticulata concentrated in the northern mountainous region of Vietnam.
There are many different methods of classifying plants, and most of them are built on the concept that species of common origin share comparable traits and that the more traits they share, the closer they are to each other. The ISSR technique is widely used in genetic diversity research, the study of genetic characteristics in populations, genetic and gene marking, plant identification, origin analysis, detection of genome changes, and progeny evaluation. The ISSR technique has several advantages over other techniques in that it can distinguish close genotypes and does not require information about the gene sequence of the studied tree. RAPD is widely used in the study of genetic diversity between plant species, in the study of cultivar characteristics and evaluation of genetic variation, in species identification and identification of
Simple Sequence Repeats (SSR)
Simple Sequence Repeats (SSRs), also called microsatellites, Short Tandem Repeats (STRs), or simple sequence length polymorphisms (SSLPs), are tandem repetitions of short nucleotide sequences of 2-6 nucleotides. SSR (SSLP, STMS) has become an important molecular marker technique in both animals and plants. SSRs are highly polymorphic because mutations affect the number of repeat units. SSR variation, or polymorphism, results from differences in the length of repeats in the genome caused by unequal crossing over or by the loss or gain of nucleotides during replication. SSRs are not only common in the eukaryotic genome but also highly variable in the number of repeats. The allelic variation of an SSR is the result of a change in the number of repeat units in the microsatellite structure. The repeated sequences are usually simple and composed of 2, 3, or 4 nucleotides. The SSR technique is performed by PCR with forward and reverse SSR primers. PCR products are separated on silver-stained (AgNO3) polyacrylamide gels or by automated sequencing. The development of SSR markers is carried out in a number of steps: building an SSR library, identifying SSR loci, determining suitable regions for primer design, running PCR with the designed primers, evaluating and analyzing the sample bands, and evaluating the polymorphism of the PCR products. The SSR technique has several advantages over other markers, such as:
i. Multiple alleles per locus;
ii. Even distribution in the genome;
iii. More specific information than maternally inherited mitochondrial markers (because of the high mutation rate) and paternally inherited markers;
iv. Co-dominant inheritance;
v. High polymorphism and specificity;
vi. Reproducibility in experiments, low DNA requirement, low cost and ease of use, semi-automated analysis, no use of radiation, and applicability to ancient DNA (aDNA).
SSR can distinguish closely related individuals. An important limitation of the SSR marker technique is the need for genome sequence information in order to design specific primer pairs and optimize primer conditions for each species before use. Currently, SSR is the marker of choice for forensic studies, population genetics, and wildlife studies. In plants, SSR is used in genetic diversity studies, in the selection of hybrid pairs, in hybrid identification, and in molecular linkage mapping.
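The definition of an SSR given above (a motif of 2-6 nucleotides repeated in tandem) can be illustrated with a short Python sketch. This is only a minimal example under stated assumptions: the function name find_ssrs, the threshold of at least 3 repeats, and the toy sequence are chosen here for illustration and are not taken from the thesis methods.

```python
import re

def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    sequence = sequence.upper()
    # Group 1 captures the shortest candidate motif; \1{n,} demands that the
    # same motif follows at least (min_repeats - 1) more times in tandem.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip runs of a single base such as AAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Toy sequence (invented for illustration): the dinucleotide AC repeated 5 times.
print(find_ssrs("TTGACACACACACGG"))  # [(3, 'AC', 5)]
```

Two alleles of the same locus would differ in the reported repeat count, which is the length polymorphism that SSR primers flanking the repeat reveal on a gel.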
2.3 DNA barcode
The concept of DNA barcoding was first introduced in 2003 by Paul Hebert, a researcher at the University of Guelph, Ontario, to help identify samples (Hebert, 2003). DNA barcodes use a short sequence of DNA located in an organism's genome as a unique sequence of characters to help distinguish two species of organisms, similar to a supermarket scanner that reads the barcodes of two products that look very similar on the outside but are actually different. Thus, DNA barcoding is a method of identification that uses a short standard DNA fragment located in the genome of the organism under study to determine to which species the organism belongs. After about 10 years of research and development of DNA barcodes, scientists have published over 5,000 scientific works in specialized journals, with 3,483,696 DNA barcode sequences from 215,513 species, of which 144,402 are species of animals, 54,478 are species of plants, and 16,633 are species of fungi and other living organisms (according to the statistics of the Barcode of Life Data System (BOLD) up to November 1, 2014; http://www.boldsystems.org/). From the above data, it can be seen that building DNA barcode databases is a research direction being developed by many countries and scientists around the world, especially in recent years, and it will be a significant research trend in the near future. DNA barcoding is considered a new tool that effectively supports research on taxonomy, the discovery of new species, and the identification of species and of samples derived from organisms (https://www.jellyfishdata.com.vn/).
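The identification principle described above, comparing a short standard fragment from an unknown sample with reference barcodes, can be sketched in Python as follows. This is a simplified illustration, not the analysis pipeline of this thesis: it assumes the sequences are already aligned to the same length, and the names percent_identity, identify, and TOY_REFERENCES, as well as the sequences themselves, are invented for the example.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

def identify(query, references):
    """Return the reference name with the highest identity to the query."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy barcodes; real barcode regions are hundreds of bases long.
TOY_REFERENCES = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
}

print(identify("ACGTACGTAT", TOY_REFERENCES))  # ('species_A', 90.0)
```

In practice the comparison is made against public databases with alignment tools, but the logic is the same: the unknown sample is assigned to the species whose reference barcode it matches best.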
Up to now, many studies have shown that a number of specific DNA fragments can be used as DNA barcodes. These fragments can be nuclear DNA (nDNA) segments, such as 18S, 5.8S, 26S, the 5S spacer, and the ITS region; mitochondrial DNA (mtDNA) segments, such as Cyt b and the control region; or chloroplast DNA (cpDNA) segments, such as matK, rbcL, atpβ, ndhF, and 16S (Cuenoud, 2002; Kress et al., 2008; Aron et al., 2008; Spooner et al., 2009). The genes belonging to cpDNA are highly conserved and can be divided into 3 groups as follows: Group 1 is the genes encoding elements of the photosynthetic system such as the photosystems (psaA, psaB, psbA, psbB, ...), cytochrome b6f (petA, petB, ...), ATP synthase (atpA, atpB, ...), Rubisco (rbcL), and NAD(P)H dehydrogenase (ndhA, ndhB, ...) (Storchova, 2007); Group 2 is the genes encoding rRNAs (rrn16, rrn5, ...),
atpβ, ndhF, trnL and matK introns, etc., ranging from order to subspecies level. Each DNA barcode has its own characteristics and is capable of distinguishing organisms at different levels (Ha Van Huan, 2018).
The focus of DNA barcoding research is changing away from performance comparisons of various DNA regions and toward practical applications. There are two types of applications. One goal is to contribute to the taxonomic process of identifying and delimiting species by providing insights on species-level taxonomy. The second and most important application is to help with the identification of unknown specimens to known species. Many activities need the development or use of plant identifications (e.g. those of taxonomists, ecologists, conservationists, foresters, agriculturalists, forensic scientists, and customs and quarantine officers). When it comes to applying DNA barcoding for plant identification, it is important to match the question at hand to the technique's discriminatory ability (Peter M. Hollingsworth, 2011).
2.3.1 The loci used in DNA barcoding of plants
The amplification, sequencing, and species discrimination success rates of six loci, namely matK, rbcL, rpoB, rpoC1, and the trnH-psbA spacer from the chloroplast genome and ITS from the nuclear genome, were compared across various accessions of 36 Dendrobium species. ITS, which has been suggested as a viable plant barcode, yielded 100 percent species identification. Another locus, matK, which is likewise proposed as a universal plant barcode, resolved 80.56 percent of the species. Even when the percentage species resolution was calculated using sequences from additional Dendrobium species available in NCBI GenBank (93, 33, 20, 18, and 17 for ITS, matK, rbcL, rpoB, and rpoC1, respectively), ITS remained the best (Hemant Kumar Singh, 2012).
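The species resolution percentages quoted above are simple proportions, as the following Python sketch shows. The count of 29 resolved species is an assumption made here for illustration: it is not stated in the source, but it is the whole number that gives 80.56 percent for 36 species. The function name resolution_rate is likewise hypothetical.

```python
def resolution_rate(resolved, total):
    """Percentage of species that a barcode locus distinguishes, to 2 decimals."""
    return round(100.0 * resolved / total, 2)

# Assumed counts: 29 of 36 species resolved by matK, 36 of 36 by ITS.
print(resolution_rate(29, 36))  # 80.56
print(resolution_rate(36, 36))  # 100.0
```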
2.3.2 Nuclear gene sequence
Internal transcribed spacer (ITS) is a nonfunctional RNA found between the structural ribosomal RNAs (rRNA) of a shared precursor transcript and is particularly valuable for clarifying links between congeneric species and closely
Figure 2.1 Lists some of the studies using DNA barcoding as a plant identification tool (Peter M. Hollingsworth, 2011)
is substantially conserved within species but varies between species, it is frequently employed in taxonomy (Lackie, J.M., 2013).
2.3.3 Ribosome coding region
Within nuclear DNA, rDNA is another large region of repeated DNA sequence. Because of its repeating structure, rDNA is vulnerable to recombination and, as a result, to deletions and insertions, making it one of the genome's most fragile locations. Due to the existence of many replication fork barriers, which serve to avoid collisions between DNA replication and transcription, rDNA is a typical location for replication fork arrest. The rDNA region is the most frequently transcribed region of the genome. Due to their distinct appearance (also known as "Christmas tree" structures), various stages of rDNA transcription may be seen using electron microscopy and the Miller chromatin spreading method ("Miller spread"). rDNA is made up of gene repeats, and more than half of them are not transcribed. In general, mutations can accumulate in repeated sequences like this, especially in non-transcribed units, resulting in non-functional pseudogenes. However, all repetitions in rDNA are quite
different species (Van et al., 2000).
Currently, the nuclear ribosomal ITS (nrITS) region is considered one of the most useful tools for phylogenetic assessment in both plants and animals because it is widespread in nature, inherited from both parents, and highly variable owing to fewer functional constraints. Several recent studies in sexually and asexually reproducing plants have shown some degree of variation in the copy numbers of ITS1 and ITS2 sequences; multiple causes, such as inbreeding, cleavage, recombination, high mutation rates, and pseudogene formation of functional genes, lead to such changes (Yao et al., 2010).
2.3.4 Chloroplast gene sequence
The plant chloroplast genome, 120-160 kb in size, is made up of two single-copy regions, the large single-copy region and the small single-copy region. Two inverted repeats (IRa and IRb) with an average length of 20 - 30 kb separate the two regions. All of the rRNA genes (4 genes in higher plants), tRNA genes (35 genes), and other genes that code for chloroplast proteins (about 100 genes) that are required for chloroplast formation are found in the chloroplast genome. Because the chloroplast genome is highly conserved and species-specific, scientists are interested in using chloroplast genome analysis results in phylogenetic research and plant taxonomy.
*)rbcL gene sequence
The plastid gene rbcL, which encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), is the most well-studied of the plastid genes. rbcL was the first plant gene to be sequenced. With over 10,000 rbcL sequences available in GenBank, rbcL has been frequently employed in phylogenetic research and plant taxonomy. CBOL has identified rbcL as one of the most promising gene sequences for DNA barcode investigations in plants, owing to its ease of PCR amplification in various plant groups. However, because of its limited species discrimination, most groups consider that rbcL should be used in conjunction with other barcode markers, such as matK; the two are common barcode loci for plants (Vijayan et al., 2010; Yao et al., 2010).
*)matK gene sequence
MatK is one of the most rapidly evolving chloroplast genes, measuring roughly 1550 bp in length and coding for the enzyme maturase, which is involved in the splicing of group II introns from RNA transcripts. MatK has been employed as a marker in the study of relationships between species and of phylogenetics in plants because it evolves quickly and is found in practically all plants. CBOL investigated matK on approximately 550 plant species and discovered that 90 percent of angiosperm samples amplified sequences with a single primer pair, indicating that matK might be used as a standard plant barcode locus (Vijayan et al., 2010; Yong et al., 2010).
2.3 Research on genetic diversity and classification in Vietnam citrus fruit trees
In Vietnam, the research team of Nguyen Thi Lan Hoa (2015) carried out the project "Building DNA barcodes for endemic plant varieties with economic value in Vietnam" in the period 2013-2016. Eleven sets of DNA profiles (SSR, SCoT, SNP, RAPD loci) were built to identify 212 genetic resources of 11 plants and orchids, with 3520 specific alleles that help identify 140 gene sources. The project established 11 sets of specific barcode sequences for 11 crops (rice, beans, soybean, tea, pepper, longan, mango, etc.) based on the gene/genomic regions matK, rbcL, trnH-psbA, rpoB, and rpoC, depending on the crop. Each type of genetic resource has at least 1 barcode sequence specific to the Vietnamese variety, for a total of 39 sequences.
An analysis of the genetic diversity of orange variety samples in Ha Giang (Vu Van Hieu, 2015), using two types of genetic markers, RAPD and ISSR, together with the construction of a genetic taxonomic tree, revealed the genetic relationships among the studied orange cultivars. The results showed that all 25 RAPD primers and 5 ISSR primers used were polymorphic (100%), but the number of polymorphic bands per sample varied greatly, with the genetic similarity of the 39 lines ranging from 0.62 to 0.98.
An evaluation of the genetic diversity of some local mandarin orange varieties in Vietnam by SSR markers (Le Thi Thu Trang, 2020) prepared DNA samples of a group of 29 local mandarin varieties and used 30 SSR markers to study polymorphism among them. The results show that a total of 93 different alleles were detected at the 30 loci, an average of 3.1 alleles/locus. The polymorphic information content (PIC) of the primers ranged from 0.33 to 0.81, with an average of 0.55, and genetic similarity ranged between 0.55 and 0.89 (0.78). The study identified 5 markers for characteristic identification that are meaningful in variety identification for the conservation and selection of mandarin varieties in Vietnam.
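The PIC values quoted above can be illustrated with a short calculation. The cited study does not state which PIC formula it used, so the sketch below assumes the widely used form commonly attributed to Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i is the frequency of allele i at a locus. The function name and the example frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphism information content for one locus.

    Assumed formula (Botstein-style):
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    allele_freqs: allele frequencies at the locus; they must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Homozygosity term: probability of drawing the same allele twice.
    sum_sq = sum(p * p for p in allele_freqs)
    # Correction term over every unordered pair of distinct alleles.
    cross = 0.0
    n = len(allele_freqs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cross += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return 1.0 - sum_sq - cross


# Hypothetical locus with three equally frequent alleles.
example_pic = pic([1 / 3, 1 / 3, 1 / 3])
```

Some SSR studies instead report the simpler 1 - sum(p_i^2); the two agree closely when there are many alleles at low frequency.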
SECTION III: RESEARCH MATERIALS AND METHODS
3.1 Research materials
- 14 samples of tangerine leaves were collected from many provinces such as Son La, Dong Thap, Hanoi, Hue, Lang Son, Yen Bai, Phu Tho, Tuyen Quang, Ha Giang, Bac Kan
Table 3.1 List of tangerine leaf samples in the study
Sample code
3
Tangerine hong (Tieu
Thuong bang la - Van Chan -
3.2 Chemicals
Common molecular biology chemicals from Sigma and Merck were used: CTAB, Tris base, boric acid, NaCl, dNTPs, EDTA, 6X orange loading dye solution, Taq polymerase, ethanol, 2-propanol, glacial acetic acid, phenol, chloroform, isoamyl alcohol, and agarose.
3.3 Research methods
3.3.1 Total DNA extraction
For each variety, 3 representative trees were selected and their leaves were collected. Total DNA was extracted according to the method of Cheng et al. (2003).
Steps to take:
1 Grind 1.5 – 2 g of fresh leaves in liquid nitrogen to a fine powder with
a pestle and mortar
Table 3.2 List of primer pairs used in the study
5 Centrifuge at 13000 rpm at room temperature.
6 Then transfer the clear liquid from the top of the falcon tube to a new tube (4 ml). Add an equal volume (4 ml) of isopropanol, mix well, and keep at -20°C for 30 min.
7 Precipitate the DNA by centrifuging at 10000 rpm for 10 min. Wash the precipitate 2 - 3 times (2 - 3 h each time) with 1 ml of 70% ethanol.
8 Dry the precipitate at room temperature. Add 2 ml of TE buffer and 15 µl of RNase A (10 mg/ml). Incubate at 37°C for 3 h.
9 Transfer the DNA solution to a new tube. Add 2 ml of phenol-chloroform-isoamyl alcohol (25:24:1), mix for 5-10 minutes and centrifuge for 10 minutes at 10000 rpm.
10 Transfer the supernatant to a new tube. Add 750 µl of 5M NaCl and 3 ml of water-saturated ether. Mix well and centrifuge for 10 min at 10000 rpm.
11 Remove the upper layer of ether. Carefully transfer the lower part to a new tube.
12 Add 3 ml of isopropanol, mix well and keep at -20°C for 30 min. Remove the DNA with the tip of a pipette or a glass rod.
13 Wash the precipitate 3 times with 1.5 ml of 70% ethanol.
14 Dry the precipitate at room temperature. Add 800 µl of TE buffer to dissolve the precipitate.
15 Keep the DNA solution at -20°C.
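The protocol gives centrifuge speeds in rpm only, which depend on the rotor. A minimal sketch of the standard conversion to relative centrifugal force, RCF = 1.118 x 10^-5 x r x rpm^2 (r in cm), is given below for reproducing the steps on a different centrifuge. The rotor radius is not stated in the source; the 10 cm value used in the example is purely a hypothetical placeholder, as are the function names.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed in rpm needed to reach a given RCF on a rotor of this radius."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


# Hypothetical rotor radius of 10 cm (NOT given in the protocol).
assumed_radius_cm = 10.0
rcf_at_10000_rpm = rcf_from_rpm(10000, assumed_radius_cm)
```

With the assumed 10 cm radius, 10000 rpm corresponds to roughly 11180 x g; the actual value should be recomputed from the radius of the rotor in use.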
3.3.2 PCR reaction
Table 3.3 List of components in PCR reaction
1 Double distilled deionized water 4.9
3.3.4 Electrophoresis to check PCR products
PCR products were electrophoresed on a 6.0% polyacrylamide gel and detected under ultraviolet light by ethidium bromide staining
+ The polyacrylamide gel includes the following components (the gel sheet has dimensions of 30cm x 12cm):
Table 3.5 List of components in Electrophoresis
+ Polyacrylamide gel electrophoresis method
Mix the above solutions and use a syringe to inject the mixture between the two glass plates. After 30 min the gel is completely polymerized. Assemble the electrophoresis machine and run the PCR products alongside a size marker at 150 V.
Electrophoresis time is about 40 - 60 minutes. Remove the gel from the glass, stain with 0.5% EtBr for 10 minutes, and rinse with clean water. Scan and image gels with a DigiDoc-It gel imager.
3.3.5 Electrophoresis method on agarose gel
Weigh 0.6 g of agarose into 40 ml of 1X TAE and bring to a boil until the agarose is completely dissolved. Cool to 45-50°C, add 2.5 µl ethidium bromide, and pour into the prepared gel mold. After 30-60 minutes, when the gel has cooled and solidified, transfer the tray containing the gel to the electrophoresis machine and pour 1X TAE running buffer into the electrophoresis chamber so that the gel is submerged under about 0.5-1 cm of buffer.
Sample loading: the PCR product was mixed with 4 µl loading dye and placed in wells on the gel.
Run the electrophoresis: after sample loading is completed, the electrophoresis machine is connected to the power supply, set at 130 V.
Observation: the gel is examined under an ultraviolet lamp; the DNA fluoresces because it is bound to EtBr.
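The agarose recipe above (0.6 g in 40 ml of 1X TAE) corresponds to a 1.5% (w/v) gel. A small sketch of that arithmetic, useful when scaling the gel to a different tray volume, is shown below; the function names are illustrative only.

```python
def agarose_percent(mass_g, volume_ml):
    """Gel concentration in % (w/v): grams of agarose per 100 ml of buffer."""
    return 100.0 * mass_g / volume_ml


def agarose_mass_g(percent, volume_ml):
    """Grams of agarose needed for a gel of the given % (w/v) and volume."""
    return percent * volume_ml / 100.0


# Recipe from the text: 0.6 g agarose in 40 ml 1X TAE.
gel_percent = agarose_percent(0.6, 40)
```

For example, keeping the same concentration in a 100 ml gel would call for `agarose_mass_g(gel_percent, 100)` grams of agarose.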
3.3.6 Electrophoresis method on polyacrylamide gel
Mix the above solutions and use a syringe to inject the mixture between the two glass plates. After 30 min the gel is completely polymerized. Assemble the electrophoresis machine and run the PCR products alongside a size marker at 150 V.
Electrophoresis time is about 40 - 60 minutes. Remove the gel from the glass, stain with 0.5% EtBr for 10 minutes, and rinse with clean water. Scan and image gels with a DigiDoc-It gel imager.
3.3.7 Gel purification with the Qiagen kit
1 Cut the desired DNA fragment from the agarose gel and place the gel slice into a 2 ml Eppendorf tube.
2 Add buffer QG at a ratio of 3 volumes of QG to 1 volume of gel (100 mg ~ 100 μl).
3 Incubate at 50°C for about 10 minutes until the gel is completely dissolved.
4 After the gel is completely dissolved, check that the color of the solution is yellow; if it is orange or purple-blue, add 10 μl of 3M sodium acetate, pH 5.
5 Add the dissolved sample solution to the QIAquick column and centrifuge at 13 000 rpm for 1 min.
6 Add 500 μl of buffer QG to the QIAquick column and centrifuge at 13 000 rpm for 1 min to remove excess agarose.
7 Add 750 μl of buffer PE to the QIAquick column, allow the column to stand upright for 5 min, then centrifuge at 13 000 rpm for 1 min.
8 Transfer the QIAquick column to a clean 1.5 ml microcentrifuge tube.
9 To elute the DNA, add 30 μl of water (pH 7 – 8.5) to the center of the membrane of the QIAquick column and centrifuge at 13 000 rpm for 1 min, collecting the purified DNA.
3.3.8 Sequencing
After purification, PCR products were sequenced at Apical Scientific (Malaysia). Sequencing results were compared with homologous sequences on NCBI. The sequences were then assembled and analyzed using the program MEGA v6.06.
3.3.9 Data analysis:
Data were processed by using the Excel 2019 program
SECTION IV: RESULTS AND DISCUSSION
Results of analysis of nucleotide sequences of studied tangerine varieties
4.1 Results of nucleotide sequence analysis of ITS gene region between 14 studied tangerine varieties
PCR with primers ITS1/ITS4 showed that all studied samples gave monomorphic bands of about 750 bp (Figure 4.1).
Figure 4.1 PCR results of 14 samples with primers ITS1/ITS4
M: GeneRuler 100 bp Plus DNA marker
After purification, PCR products obtained with primers ITS1/ITS4 were sequenced directly on the ABI 3730 XL sequencer of Apical Scientific (Malaysia). ITS sequences of the 14 samples/tangerine varieties studied were compared for similarity with reference sequences published on NCBI using the BLAST program. The analysis showed that the sequence similarity of the samples/varieties with the reference genes ranged from 94.00% to 99.59% with a coverage of 97%. The sample of Ngoc Hoi tangerine has the lowest level of sequence similarity with the reference gene, at 94.00%. The samples of hong tangerine (Tieu Son), Trang Dinh tangerine, chum tangerine and Bac Kan mandarin have the highest level of sequence similarity with the reference gene, at 99.59%. This shows that the amplified ITS region sequences of some studied samples/varieties have a high degree of similarity with those published on NCBI (Table 4.1).
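The similarity and coverage figures above come from BLAST. As a rough illustration of what these two numbers mean, the sketch below computes percent identity and query coverage from a pair of already aligned sequences, assuming that identity is counted as identical columns over the alignment length and that coverage is the share of query bases included in the alignment. It is not a replacement for BLAST, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical columns as a percentage of the alignment length.

    Both inputs are rows of one pairwise alignment, with '-' for gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(aligned_query, query_length):
    """Percentage of the full query sequence that takes part in the alignment."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    aligned_bases = sum(1 for q in aligned_query if q != "-")
    return 100.0 * aligned_bases / query_length


# Toy example (not thesis data): one mismatch in eight aligned columns.
identity = percent_identity("ACGTACGT", "ACGTACGA")
```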
Table 4.1 Evaluation of the similarity and coverage of the rbcL region DNA sequences of the research samples with the corresponding sequences on NCBI
No Code Reference sequence Similarity (%) Coverage (%)
the reference gene MH721728.1 Citrus reticulata (table 4.1.2)
The Tich Giang tangerine (Q4) sample had substitutions of nucleotides C, C, and G to A at positions 635, 680 and 690, respectively. At positions 680 and 690, Tich Giang tangerine is the only sample whose substitutions (C and G to A) differ from the reference genes and from the other samples/varieties in the research group, so this tangerine sample can be accurately identified thanks to this difference.
The tangerine Huong Can (Q5) has a nucleotide substitution G to A at position 99, similar to the reference gene MH721728.1 Citrus reticulata, and C to T at position 119, which is different from the 5 published reference varieties on NCBI and the remaining 13 varieties. The Huong Can tangerine sample can be accurately identified by this difference.
The Dong Khe tangerine (Q8) sample can be identified by its differences from the reference genes and the other studied samples at positions 628 and 714. At these two positions, nucleotide A is substituted by T and G, respectively.
The sample of red tangerine Ngoc Hoi (Q9) has substitutions of nucleotides T and G to A at positions 638 and 714, respectively. Therefore, the Ngoc Hoi red tangerine variety can be accurately identified by this difference compared with other varieties in the research group.
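The sample-specific substitutions described in this section can be located programmatically once the sequences are aligned. The sketch below is one possible approach, not the procedure used in the thesis (which relied on MEGA): it scans an alignment column by column and reports positions where exactly one sequence carries a base that differs from a base shared by all the others. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def private_substitutions(alignment):
    """Find substitutions carried by exactly one sequence in an alignment.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    Returns a dict mapping sample name -> list of
    (1-based position, shared base, private base) tuples.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    result = {name: [] for name in alignment}
    for pos in range(length):
        column = {name: seq[pos].upper() for name, seq in alignment.items()}
        counts = Counter(column.values())
        if len(counts) != 2 or "-" in counts:
            continue  # invariant, multi-allelic, or gapped column
        (major, major_n), (minor, minor_n) = counts.most_common(2)
        if minor_n != 1 or major_n < 2:
            continue  # the variant must belong to a single sample
        carrier = next(name for name, base in column.items() if base == minor)
        result[carrier].append((pos + 1, major, minor))
    return result


# Toy alignment (not thesis data): only sample_b differs, at position 5.
toy = {"reference": "ACGTCA", "sample_a": "ACGTCA", "sample_b": "ACGTTA"}
toy_hits = private_substitutions(toy)
```

In this toy case `toy_hits["sample_b"]` holds a single entry, `(5, "C", "T")`, which mirrors the kind of position-and-base statement made for each variety in the text.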
The sample of tangerine (Q11) has substitutions of nucleotides A and C to G at positions 37 and 212, respectively, which are different from the other varieties in the research group, so it can be accurately identified thanks to these nucleotide substitutions.
The sample of Ha Giang sweet tangerine (Q12) has a unique nucleotide substitution at position 419, from C to T, compared with the 5 reference varieties published on NCBI and the remaining varieties. Therefore, the sample of Ha Giang sweet tangerine can be accurately identified by this difference.
Similarly, the sample of tangerine (Q13) has a unique nucleotide substitution at position 530, from C to G, which is different from the other varieties and reference genes published on NCBI. Therefore, thanks to this difference, the