Identification of Tangerine Varieties in Vietnam Using ITS, matK, rbcL Primers



VIETNAM NATIONAL UNIVERSITY OF AGRICULTURE

FACULTY OF BIOTECHNOLOGY

UNDERGRADUATE THESIS

TOPIC:

IDENTIFICATION OF TANGERINE VARIETIES IN

VIETNAM USING ITS, MATK, RBCL PRIMERS

School year: 2017–2022

Major: Biotechnology

Supervisors: Assoc. Prof. Dr. TRAN DANG KHANH, Agricultural Genetics Institute

Assoc. Prof. Dr. DONG HUY GIOI, Vietnam National University of Agriculture

HANOI – 2022


COMMITMENT

I hereby declare that all results in this thesis are my own work.

The data and results presented in this thesis are honest and accurate and have not been published in any other work.

Hanoi, March 2022

Student

To Hoang Anh Minh


ACKNOWLEDGMENTS

First of all, I would like to express my deep respect and gratitude to Assoc. Prof. Dr. Tran Dang Khanh, Head of the Genetic Engineering Department, Agricultural Genetics Institute, and Assoc. Prof. Dr. Dong Huy Gioi, Head of the Department of Biology, Vietnam National University of Agriculture, who directly guided and enthusiastically instructed me and created the best possible conditions for me throughout my study and scientific research.

I would like to thank the staff of the Department of Genetic Engineering, Agricultural Genetics Institute, for always encouraging me and for their valuable professional contributions to the completion of this thesis.

I would also like to express my deep gratitude to the teachers of the Faculty of Biotechnology, Vietnam National University of Agriculture, for helping me choose the right orientation for my thesis.

Finally, I would like to express my deep gratitude to my parents, family members and friends, who have always supported and encouraged me throughout my study and research.


CONTENTS

COMMITMENT

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

SUMMARY

SECTION I: INTRODUCTION

1.1 The urgency of the subject

1.2 The goal of the subject

1.3 Research range

SECTION II: OVERVIEW

2.1 Overview of tangerine

2.2 Methods used in determining genetic relationships in citrus fruits

2.2.1 Genetic diversity, classification of citrus fruit groups based on morphological markers

2.2.2 Genetic diversity, classification of citrus fruit groups based on isozyme markers

2.2.3 Genetic diversity, classification of citrus fruit groups based on DNA molecular markers

2.3 DNA barcode

2.3.1 The loci used in the DNA barcoding method in plants

2.3.2 Nuclear gene sequence

2.3.3 Ribosome coding region

2.3.4 Chloroplast gene sequence

2.4 Research on genetic diversity and classification of citrus fruit trees in Vietnam

SECTION III: RESEARCH MATERIALS AND METHODS

3.1 Research materials

3.2 Chemicals

3.3 Research methods

3.3.1 Total DNA extraction

3.3.2 PCR reaction

3.3.3 PCR cycle

3.3.4 Electrophoresis to check PCR products

3.3.5 Electrophoresis on agarose gel

3.3.6 Electrophoresis on polyacrylamide gel

3.3.7 Gel purification using the Qiagen kit

3.3.8 Sequencing

3.3.9 Data analysis

SECTION IV: RESULTS AND DISCUSSION

4.1 Results of nucleotide sequence analysis of the ITS gene region of the 14 studied tangerine varieties

4.2 Results of nucleotide sequence analysis of the matK gene region of the 14 studied tangerine varieties

4.3 Results of nucleotide sequence analysis of the rbcL gene region of the studied tangerine samples

SECTION V: CONCLUSIONS AND RECOMMENDATIONS

REFERENCES


ABBREVIATION

Acronyms Full name

AFLP Amplified Fragment Length Polymorphism

CTAB Cetrimonium bromide

EDTA Ethylenediaminetetraacetic acid

FAO Food and Agriculture Organization

ISSR Inter-Simple Sequence Repeats

ITS Internal transcribed spacer

RAPD Randomly Amplified Polymorphic DNA

RFLP Restriction Fragment Length Polymorphism

SCAR Sequence Characterized Amplified Region


LIST OF TABLES

Table 3.1 List of tangerine leaf samples used in the study

Table 3.2 List of primer pairs used in the study

Table 3.3 List of components in the PCR reaction

Table 3.4 List of phases in the PCR cycle

Table 3.5 List of components for electrophoresis

Table 4.1 Evaluation of the similarity and coverage of the rbcL region DNA sequences of the research samples with the corresponding sequences on NCBI

Table 4.2 Some position differences in the nucleotide sequences of the studied tangerine samples and reference samples

Table 4.3 Genetic similarity coefficients between the 14 studied tangerine samples and reference genes

Table 4.4 Coefficients of nucleotide sequence similarity in the matK gene region between the 14 studied tangerine varieties and reference samples

Table 4.5 Some position differences in the nucleotide sequences of the studied tangerine-like samples and reference samples

Table 4.6 Genetic similarity coefficients between the 14 studied tangerine samples and reference genes

Table 4.7 Evaluation of the similarity and coverage of the rbcL region DNA sequences of the research samples

Table 4.8 Some position differences in the nucleotide sequences of the research and reference samples

Table 4.9 Nucleotide sequence similarity coefficients of the rbcL gene region between the studied tangerine samples and the reference sample

LIST OF FIGURES

Figure 2.1 Some studies using DNA barcoding as a plant identification tool (Peter M Hollingsworth, 2011)

Figure 4.1 PCR results of 14 samples with primers ITS1/ITS4

Figure 4.2 Images comparing the nucleotide sequences of the ITS region of the studied tangerine samples using the primer pair ITS1/ITS4

Figure 4.3 Phylogenetic tree of the ITS gene region of the 14 studied tangerine varieties and reference genes

Figure 4.4 PCR results of the 14 studied samples with the matKCi1 primer pair

Figure 4.5 Some images comparing matK region nucleotide sequences of the studied tangerine samples with reference samples using primer pairs

Figure 4.6 Phylogenetic tree of the matK gene region of the 14 studied tangerine varieties and reference genes

Figure 4.7 PCR results of 44 research samples with the primer pair rbcLCi2

Figure 4.8 Some images comparing the nucleotide sequences of the rbcL region of the studied tangerine samples

Figure 4.9 Phylogenetic tree of the rbcL gene region of the studied and reference samples


SUMMARY

The nuclear ITS region is one of the most useful tools for phylogenetic assessment in both plants and animals. In addition, because chloroplast genomes are highly conserved and species-specific, chloroplast genome analysis is of great interest to scientists for phylogenetic studies and plant taxonomy and has been widely applied in many plant species. In particular, as Vietnam seeks to promote the export of its agricultural products, there is a need to improve the conservation, classification and exploitation of mandarin varieties and to certify their origin. Therefore, the topic "Identification of tangerine varieties in Vietnam using ITS, matK, rbcL primers" was carried out to identify local mandarin varieties, with the aim of building a database of the genetic resources of native tangerine varieties. Through research, comparison and reference to published studies, the results show the ability to identify some citrus varieties in the sample group based on the ITS, matK and rbcL gene regions. On that basis, further research is needed to refine and expand the construction of DNA barcodes for tangerine genetic resources, thereby helping to classify and identify them.


SECTION I: INTRODUCTION

1.1 The urgency of the subject

Tangerine belongs to the citrus group, the most widely produced fruit crop in the world. There are two clearly differentiated markets: the fresh fruit market and the processed juice market. World citrus production increased steadily over the last decades of the twentieth century and has generated great economic value; citrus is grown in many countries around the world, with a total production of about 158.9 million tons in 2019 (FAOSTAT, 2020). In Vietnam, there are many famous tangerine-growing regions with an annual output of up to tens of thousands of tons. However, because the citrus area has expanded rapidly despite the recommendations of the authorities and localities, farmers have begun to face consequences such as falling prices and price differences between citrus-growing provinces. According to statistics of the Ministry of Industry and Trade, the area of fruit trees increased in 2018, mainly in the citrus group (oranges, tangerines, pomelos). As of September 2018, the citrus area of the whole country reached 192,700 hectares, an increase of 3% over the same period in 2017. Meanwhile, most citrus fruit is consumed domestically, and exports are insignificant. For fruit products that meet importers' requirements on food safety and hygiene, high quality and Vietnamese branding, the available quantity is not sufficient to satisfy importers. The biggest limitation of fruit production, including tangerines, is its small scale, which hinders investment, quality control and product consumption. To meet domestic demand and move towards exporting Vietnam's agricultural products, including tangerines, it is necessary to identify and study each variety and genetic resource and to determine its origin, in order to help preserve, identify and develop mandarin varieties with economic value in each region. For these reasons, I carried out the research entitled "Identification of tangerine varieties in Vietnam using ITS, matK, rbcL primers".

1.2 The goal of the subject

- Sequence the chloroplast gene regions (matK and rbcL) and the nuclear ITS region.

1.3 Research range

- Gene regions ITS, matK, rbcL


SECTION II: OVERVIEW

2.1 Overview of tangerine

Tangerine, scientifically known as Citrus reticulata (family Rutaceae), is a small, thorny tree with a dense crown of slender branches. It is evergreen and widely grown in India, Vietnam and South Asian countries. The trees begin to bear fruit at around 3 years of age. The fruit is globose or oblong; the rind is bright orange or red-orange when ripe and loose, separating easily from the segments. The seeds are small and pointed at one end, and the cotyledons are green in most cultivars (Das et al., 2014; Tiwari, 2009).

Citrus reticulata (order Sapindales, family Rutaceae) is a shrub or small tree growing to 20 feet (about 6 m) tall. The plants have fragrant flowers and glossy leaves, and spherical fruits with sweet, aromatic pulp and a pale orange-yellow to burnt-orange peel that is loose and easily removed (Nogata et al., 2003). The fruit is eaten fresh; it is called tangerine (a name usually reserved for red-skinned varieties), mandarin, or Kamala lebu in Bengali. There are numerous reports on the antibacterial, antifungal and antioxidant effects of C. reticulata essential oil (CREO), as well as on direct food-related applications of various plant parts, including the fruit, rind, flowers and leaves.

The fruit consists of three layers: the outer skin, with oil glands that secrete the essential oils responsible for the characteristic orange smell; a white, thread-like middle layer; and an inner membrane with 8–10 segments containing juice vesicles. The leaves are ovate to elongated, with narrowly winged petioles, and the sweet-scented flowers are borne singly or in groups in the leaf axils. The fruits range from spherical to sub-spherical and from small to medium in size. The rind varies from thin to medium in thickness, with a surface texture ranging from smooth to rough or pebbled, and it may be either loose or tightly adherent. The color of both the rind and the flesh ranges from yellow to light or dark orange. Maturity ranges from very early to late, depending on cultivar and growing conditions.

The tangerine tree is native to China and Southeast Asia. The term "tangerine" was first used in the 19th century to describe mandarins with a deep orange-red rind. The mandarin group is so diverse that attempts have been made to divide its members into different types and species. Citrus unshiu (satsuma), C. deliciosa (Mediterranean mandarin), C. nobilis (king mandarin) and C. reticulata (common mandarin) are known worldwide, but only C. reticulata and its related hybrids are of economic importance in the US.

Tangerines usually have a red-orange rind, darker than that of oranges; the fruit is firm and sour, is often used in cooking or suited for export, and has a more rounded shape.

Quality characteristics: high-quality tangerines have a deep orange skin that is relatively unblemished. The fruit should be elliptical and firm, the peel should be easily removed from the flesh, and the edible portion should be juicy and contain few or no seeds.

Mandarin oranges (called tangerines in some parts of the world) and their related hybrids are the largest and most variable group of edible citrus species. They include many cultivars whose importance varies with local needs and export markets, growing conditions and climate zones. Large and growing yields are important in domestic and world trade, especially for fresh fruit. Tangerines are mainly consumed as fresh fruit. There are a number of health benefits associated with consuming fresh citrus: citrus fruits are a good source of many essential nutrients and compounds, are low in calories and fat, and provide vitamin C, fiber, folate, potassium and several phytochemicals. These diverse nutritional benefits of citrus, and especially of the very convenient tangerine group, can only be obtained by consuming fresh fruit and juice and cannot currently be obtained from dietary supplements (Forsyth, 2003).

In essence, the words "mandarin" and "tangerine" both refer to citrus fruits originating from southwestern China, but there are differences between them that consumers can hardly distinguish. Mandarins are usually a brighter orange, with a thin, smooth skin, and are juicy and sweet; they are eaten fresh as a dessert. A mandarin is flatter than a tangerine and looks like a small orange.

Tangerine is more common; it has a darker orange-red rind than mandarin, is firm and sour, and is often used in cooking or suited for export. Its shape is more rounded. Botanically, "mandarin" covers three groups: satsumas, tangerines and miscellaneous hybrids. Tangerines are therefore classified as mandarins, which is why the two terms are often used interchangeably; however, while all tangerines are mandarins, not all mandarins are tangerines.

2.2 Methods used in determining genetic relationships in citrus fruits

There are many research methods for identifying and classifying relationships among animals and plants. All of them are based on principles such as common ancestry, shared traits and similar properties. With the development of biotechnology, species identification increasingly relies on molecular genetic techniques, which, when combined with conventional morphological observation or traditional empirical knowledge, give results of greater reference value and accuracy. Scientists around the world have recently been studying the use of DNA barcodes to identify species, and this approach has made major contributions to species classification. The main types of genes usually employed for species identification or evolutionary studies are ribosomal RNA (rRNA) genes, mitochondrial genes and, in plants, chloroplast genes; among these, the 18S, 5S and 16S rRNA genes are often used to assess evolutionary relationships between organisms. DNA markers are more accurate than morphological markers and chemical indicators because they are not affected by environmental conditions.


Rhodes, 1976), only three species (C. medica, C. reticulata and C. maxima) have phylogenetic value; this is consistent with the view of early taxonomists that citrus comprises only three or four true species.

Morphological markers are typically used to identify hybrid citrus trees. When genotypes exhibit dominant traits, hybrids derived from polyploid cultivars are easily identified and can be used as parent plants. For example, trifoliate (three-lobed) leaves, a single-gene dominant trait that distinguishes Poncirus trifoliata (L.) Raf. from most other species, are used as a morphological marker to identify hybrids between this species and other citrus species. However, when both parents carry the same dominant traits, it is extremely difficult to identify hybrids from crosses between two or more citrus genotypes.

IPGRI has developed descriptor lists for describing and identifying cultivars, species and genera, including a descriptor list for the citrus group that is used to classify genera within this group. These descriptors are a useful tool for characterizing and classifying citrus genera and species on the basis of morphological criteria.

2.2.2 Genetic diversity, classification of citrus fruit groups based on isozyme markers

It is extremely difficult to distinguish and identify citrus trees based only on morphological, physiological and agronomic criteria. A variety of molecular markers have therefore been used to determine the genetic diversity of citrus cultivars and to collect and conserve citrus genetic resources. Torres et al. (1978) were the first to use isozymes in citrus, examining genetic variation in 33 citrus varieties. Isozyme markers have since been applied in many directions of citrus research. Embryo origin is a topic of great interest: citrus embryos can arise either from vegetative (nucellar) cells or from zygotes formed by the fusion of male and female gametes. Isozyme markers were chosen to identify zygotic seedlings because they are co-dominant and can readily reveal the presence of paternal alleles in the embryo; two isozyme systems were analyzed, PGI (phosphoglucose isomerase) and PGM (phosphoglucomutase). The efficiency of this procedure was verified by cytological examination, which showed that 86% of zygotic seedlings were distinguished from self-pollinated seedlings and that more than 99% of zygotic seedlings were identified in all crosses between species.
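Because PGI and PGM are co-dominant, the screening logic described above can be sketched in a few lines: a seedling is scored as zygotic if, at any locus, it carries an allele that the seed parent lacks. This is a minimal sketch, not the procedure of the cited study; the allele labels and genotypes are hypothetical.

```python
from typing import Dict, Tuple

# One co-dominant isozyme locus scored as a pair of allele labels
Genotype = Tuple[str, str]


def is_zygotic(mother: Dict[str, Genotype], seedling: Dict[str, Genotype]) -> bool:
    """Return True if the seedling carries, at any scored locus, an allele
    that the seed parent lacks; such an allele can only come from pollen.

    A nucellar seedling is a clone of the mother, so it never shows a foreign
    allele. A False result means "not proven zygotic": zygotic seedlings whose
    paternal alleles match the mother's (e.g. after selfing) look maternal too.
    """
    for locus, alleles in seedling.items():
        maternal_alleles = set(mother[locus])
        if any(allele not in maternal_alleles for allele in alleles):
            return True
    return False


# Hypothetical allele scores for the PGI and PGM systems
mother = {"PGI": ("a", "b"), "PGM": ("c", "c")}
seedlings = {
    "S1": {"PGI": ("a", "b"), "PGM": ("c", "c")},  # same banding as the mother
    "S2": {"PGI": ("a", "d"), "PGM": ("c", "c")},  # allele "d" is paternal
}
for name, genotype in seedlings.items():
    if is_zygotic(mother, genotype):
        label = "zygotic"
    else:
        label = "maternal-type (nucellar or undetected zygotic)"
    print(name, label)
```

The asymmetry in the docstring is why self-pollinated seedlings are harder to detect than seedlings from other crosses: selfing introduces no allele foreign to the mother.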

In addition, Jarrell et al. used isozyme and RFLP techniques to construct a genetic linkage map of citrus. The map was generated by analyzing the segregation of 8 isozyme, 1 protein and 37 RFLP loci in 60 progeny of intergeneric crosses [(Citrus paradisi Macf. × Poncirus trifoliata (L.) Raf.) and (Citrus sinensis (L.) Osbeck × P. trifoliata)]. The map contains 38 of the 46 analyzed loci, distributed across ten linkage groups. From the linkage data, the genome length was estimated at about 1,700 cM, with 35% of the genome expected to lie within 10 cM and 58% within 20 cM of a marker on the map. Eight loci in three linkage groups and one unlinked locus did not conform to Mendelian segregation ratios.
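Whether a locus conforms to Mendelian segregation is usually decided with a chi-square goodness-of-fit test against the expected ratio. The sketch below assumes a locus with two genotype classes and an expected 1:1 ratio; the counts are hypothetical (only the progeny size of 60 comes from the text), and the 0.05 threshold is a conventional assumption.

```python
import math
from typing import Sequence, Tuple


def chi_square_two_classes(observed: Sequence[int], ratio: Sequence[float]) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for a two-class segregation ratio.

    Returns (statistic, p-value); with two classes there is 1 degree of freedom.
    """
    if len(observed) != 2 or len(ratio) != 2:
        raise ValueError("this sketch handles exactly two genotype classes")
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical genotype counts among 60 progeny, expected ratio 1:1
counts = {"locus_A": (32, 28), "locus_B": (42, 18)}
for locus, observed in counts.items():
    statistic, p = chi_square_two_classes(observed, (1, 1))
    verdict = "distorted" if p < 0.05 else "consistent with 1:1"
    print(f"{locus}: chi2 = {statistic:.2f}, p = {p:.4f}, {verdict}")
```

Using `math.erfc` keeps the example dependency-free. Loci with more classes (e.g. 1:2:1) need more degrees of freedom and a general chi-square distribution function.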

2.2.3 Genetic diversity, classification of citrus fruit groups based on DNA molecular markers

The main directions of molecular marker research in citrus are variety identification, phylogenetic tree construction and the study of genetic variation. Salvo et al. combined non-molecular taxonomic data with chloroplast genome data to determine evolutionary relationships within the family Rutaceae. ISSR amplification is a PCR-based method that can rapidly distinguish closely related individuals. Fang and Roose (1997) used 22 ISSR primers to detect variation among 94 plants from 68 citrus cultivars, dividing them into six groups. According to Fang and Roose, the cultivars within these six groups are so closely related that other molecular approaches have difficulty distinguishing them. To explore citrus phylogeny, Nicolosi et al. combined three molecular approaches: RAPD, SCAR and cpDNA analysis (the latter using AFLP techniques). Phylogenetic inferences from cpDNA analysis are among the most reliable. Using 262 RAPD primers, 14 SCAR primers and cpDNA analyses, the citrus accessions examined were classified into eight groups, with C. grandis and C. sinensis placed in the same group as grapefruit. The ISSR method has also been used to study the phylogeny of this group of fruit trees: Capparelli et al. (2004) used ISSR to study closely related, economically valuable lemon varieties for the conservation and development of lemons grown in southern Italy.
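Dominant markers such as RAPD and ISSR are usually scored as band presence (1) or absence (0), and cultivars are grouped from a pairwise similarity matrix. The sketch below computes two common coefficients, Jaccard and Dice; the band scores are hypothetical, and the cited studies may have used other coefficients.

```python
from itertools import combinations
from typing import Dict, List


def jaccard(a: List[int], b: List[int]) -> float:
    """Shared bands divided by bands present in either sample (shared absences ignored)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def dice(a: List[int], b: List[int]) -> float:
    """Twice the shared bands divided by the total bands in both samples."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * both / total if total else 1.0


# Hypothetical presence/absence scores for 6 ISSR bands
bands: Dict[str, List[int]] = {
    "cultivar_1": [1, 1, 0, 1, 0, 1],
    "cultivar_2": [1, 1, 0, 1, 1, 1],
    "cultivar_3": [0, 1, 1, 0, 1, 0],
}
for x, y in combinations(bands, 2):
    print(x, y, f"Jaccard={jaccard(bands[x], bands[y]):.3f}", f"Dice={dice(bands[x], bands[y]):.3f}")
```

Both coefficients ignore bands absent from both samples, because a shared absence is weak evidence of relatedness for dominant markers. The resulting matrix is the usual input to clustering methods such as UPGMA.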

Among these, SSR markers are co-dominant and highly polymorphic, and they are widely used to analyze the genetic diversity of citrus within species, between species and within populations (Froelicher et al., 2008). They are also used for the rapid identification of seedlings derived from hybrid (zygotic) and nucellar embryos in crossbreeding programs, so that unwanted genotypes can be eliminated. Ruiz et al. (2000) identified hybrid citrus plants using SSR markers combined with morphological markers and showed that the SSR technique was more effective than the isozyme technique in discriminating hybrid plants. SSR markers have also been used in Citrus and Poncirus for phylogenetic analysis and to study the diversity and characteristics of grapefruit (Corazza-Nunes et al., 2002). Froelicher also used 43 SSR markers derived from C. reticulata to assess the genetic diversity of wild C. reticulata concentrated in the northern mountainous region of Vietnam.

There are many different methods of classifying plants, and most of them are based on the concept that taxa sharing a common origin are more similar the more closely related they are. The ISSR technique is widely used in genetic diversity research, population genetics, marker development and gene tagging, plant identification, origin analysis, detection of genomic changes and progeny evaluation. Its advantages over other techniques are that it can distinguish closely related genotypes and does not require prior sequence information for the species studied. Currently, SSR is the marker of choice for forensic studies, population genetics and wildlife studies. In plants, SSRs are used in genetic diversity studies, in the selection of parental pairs for hybridization, in hybrid identification and in molecular linkage mapping. RAPD is widely used in the study of genetic diversity between plant species, in the study of cultivar characteristics and evaluation of genetic variation, in species identification and identification of


Simple Sequence Repeats (SSR)

Simple sequence repeats (SSRs), also called microsatellites, short tandem repeats (STRs) or simple sequence length polymorphisms (SSLPs), are tandem repetitions of short nucleotide motifs of 2–6 nucleotides. SSR (SSLP, STMS) analysis has become an important molecular marker technique in both animals and plants. SSRs are highly polymorphic because mutations change the number of repeat units: SSR polymorphism results from differences in repeat length caused by unequal crossing-over or by slippage during DNA replication. SSRs are not only common in eukaryotic genomes but also highly variable in repeat number, and allelic variation at an SSR locus results from a change in the number of repeat units in the microsatellite. Repeat motifs are usually simple, composed of 2, 3 or 4 nucleotides. The SSR technique is performed by PCR with forward and reverse primers designed in the sequences flanking the repeat, and the PCR products are separated on polyacrylamide gels stained with silver nitrate (AgNO3) or by automated sequencing. SSR markers are developed in several steps: building an SSR library, identifying SSR loci, determining suitable regions for primer design, PCR with the designed primers, evaluating and analyzing the sample bands, and evaluating the polymorphism of the PCR products. The SSR technique has several advantages over other markers:

i. Multiple alleles per locus;

ii. Even distribution in the genome;

iii. More specific information than maternally inherited mitochondrial markers and paternally inherited markers, owing to a high mutation rate;

iv. Co-dominant inheritance;

v. High polymorphism and specificity;

vi. Reproducible in experiments, requires little DNA, is inexpensive and easy to perform, can be analyzed semi-automatically, does not use radioactivity, and can be applied to ancient DNA (aDNA).
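The polymorphism of an SSR locus (advantage v above) is commonly summarized from allele frequencies as expected heterozygosity (He) and polymorphism information content (PIC). The sketch below applies the standard formulas to hypothetical allele sizes; it does not use data from this thesis.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Allele frequencies from diploid genotypes recorded as allele sizes (bp)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(freqs: Dict[int, float]) -> float:
    """He = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs.values())


def pic(freqs: Dict[int, float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return expected_heterozygosity(freqs) - correction


# Hypothetical SSR allele sizes (bp) scored in five individuals
genotypes = [(150, 154), (150, 150), (154, 158), (150, 158), (154, 154)]
freqs = allele_frequencies(genotypes)
print(freqs)
print(f"He = {expected_heterozygosity(freqs):.3f}, PIC = {pic(freqs):.4f}")
```

PIC is always slightly below He because it discounts matings that cannot reveal which allele a progeny inherited. Loci with more, evenly frequent alleles score higher on both measures.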

SSRs can distinguish closely related individuals. An important limitation of the SSR technique is that genome sequence information is needed to design specific primer pairs, and the reaction conditions must be optimized for each species before use.
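The "identifying SSR loci" step is typically carried out by scanning sequences for short motifs repeated in tandem. The sketch below is a simplified scanner based on a regular expression. Its thresholds (motifs of 2–6 nt, at least 5 copies, perfect repeats only) are illustrative assumptions rather than values used in this thesis, and dedicated SSR-mining tools apply more careful rules.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_copies: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The lazy {min,max}? quantifier tries the shortest motif first, and the
    backreference \\1 requires at least min_copies - 1 further exact copies.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip motifs that are repeats of a shorter unit (e.g. "AA" from a poly-A run)
        if any(motif == motif[:k] * (len(motif) // k)
               for k in range(1, len(motif)) if len(motif) % k == 0):
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence with an (AG)8 and an (AAT)6 repeat
example = "GGCT" + "AG" * 8 + "TTCA" + "AAT" * 6 + "CG"
for start, motif, copies in find_ssrs(example):
    print(f"position {start}: ({motif}){copies}")
```

Primers would then be designed in the unique sequence on either side of each hit, which is why sequence information is needed before SSR markers can be used.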

2.3 DNA barcode

The concept of DNA barcoding was first introduced in 2003 by Paul Hebert, a researcher at the University of Guelph, Ontario, to help identify specimens (Hebert, 2003). DNA barcoding uses a short DNA sequence in an organism's genome as a unique string of characters to distinguish species, much as a supermarket scanner reads barcodes to tell apart two products that look very similar on the outside but are actually different. Thus, DNA barcoding is an identification method that uses a short, standard DNA fragment in the genome of the organism under study to determine which species the organism belongs to. After about 10 years of research and development, scientists had published over 5,000 papers on DNA barcodes in specialized scientific journals, with 3,483,696 DNA barcode sequences from 215,513 species, comprising 144,402 species of animals, 54,478 species of plants and 16,633 species of fungi and other organisms (Barcode of Life Data System, BOLD, statistics as of November 1, 2014; http://www.boldsystems.org/). These data show that building DNA barcode databases is a research direction pursued by many countries and scientists around the world, especially in recent years, and it will be a significant research trend in the near future. DNA barcoding is considered a new tool that effectively supports taxonomic research, the discovery of new species, and the identification of species and of samples derived from organisms (https://www.jellyfishdata.com.vn/).
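The identification step can be illustrated with a simple best-match rule: compare the query barcode with reference barcodes and report the species with the smallest uncorrected p-distance. This is a minimal sketch that assumes the sequences are already aligned to equal length. The reference sequences and species labels are hypothetical, and practical work typically relies on BLAST searches against GenBank or on curated barcode databases.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def best_match(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return (species, distance) for the closest reference barcode."""
    return min(((species, p_distance(query, seq)) for species, seq in references.items()),
               key=lambda item: item[1])


# Hypothetical aligned reference barcodes and an unknown query
references = {
    "species_A": "ATGGCTTCACCGTAAGT",
    "species_B": "ATGGCATCACCGTTAGT",
    "species_C": "ATGACATCTCCGTTAGA",
}
query = "ATGGCTTCACCGTAGGT"
species, distance = best_match(query, references)
print(species, round(distance, 4))
```

A best match alone does not prove identity; in practice a distance threshold or a check that the query falls clearly inside one species' cluster is also applied.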

Many studies have shown that a number of specific DNA fragments can be used as DNA barcodes. These fragments can be located in nuclear DNA (nDNA), such as the 18S, 5.8S, 26S and 5S spacer and ITS regions; in mitochondrial DNA (mtDNA), such as Cyt b and the control region; or in chloroplast DNA (cpDNA), such as matK, rbcL, atpB, ndhF and 16S (Cuenoud, 2002; Kress et al., 2008; Aron et al., 2008; Spooner et al., 2009). The cpDNA genes are highly conserved and can be divided into three groups: Group 1 comprises genes encoding components of the photosynthetic system, such as photosystems (psaA, psaB, psbA, psbB, ...), cytochrome b6f (petA, petB, ...), ATP synthase (atpA, atpB, ...), Rubisco (rbcL) and NAD(P)H dehydrogenase (ndhA, ndhB, ...) (Storchova, 2007); Group 2 comprises genes encoding rRNAs (rrn16, rrn5, ...),


atpB, ndhF, trnL and matK introns, etc., ranging from the order to the subspecies level.

Each DNA barcode has its own characteristics and can distinguish organisms at different taxonomic levels (Ha Van Huan, 2018).

The focus of DNA barcoding research is shifting away from comparing the performance of different DNA regions and toward practical applications. There are two types of application. The first is to contribute to the taxonomic process of identifying and delimiting species by providing insights into species-level taxonomy. The second, and most important, is to help assign unknown specimens to known species. Many groups need to identify plants (e.g., taxonomists, ecologists, conservationists, foresters, agriculturalists, forensic scientists, and customs and quarantine officers). When applying DNA barcoding to plant identification, it is important to match the question at hand to the discriminatory power of the technique (Peter M Hollingsworth, 2011).


2.3.1 The loci used in the DNA barcoding method in plants

The amplification, sequencing and species discrimination success rates of six loci (matK, rbcL, rpoB, rpoC1 and the trnH-psbA spacer from the chloroplast genome, and ITS from the nuclear genome) were compared across various accessions of 36 Dendrobium species. ITS, which has been suggested as a viable plant barcode, gave 100% species identification. matK, which has likewise been proposed as a universal plant barcode, resolved 80.56% of the species. Even when species resolution was recalculated using sequences from additional Dendrobium species available in NCBI GenBank (93%, 33%, 20%, 18% and 17% for ITS, matK, rbcL, rpoB and rpoC1, respectively), ITS remained the best (Hemant Kumar Singh, 2012).
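Species resolution figures such as these are simple proportions; for example, 29 of 36 species gives 80.56%. One common way to decide whether a species is "resolved" is a barcode-gap rule: its closest sequence from another species must be more distant than its most divergent pair of conspecific sequences. The sketch below applies that rule to hypothetical, gap-free aligned sequences; the cited study may have used a different criterion, such as tree-based resolution.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resolved_species(seqs: Dict[str, List[str]]) -> List[str]:
    """Species whose closest sequence from another species is more distant than
    the most divergent pair of its own sequences (a barcode-gap criterion)."""
    resolved = []
    for species, own in seqs.items():
        # Single-sequence species have no intraspecific distance, so use 0
        intra = max((p_distance(a, b) for a, b in combinations(own, 2)), default=0.0)
        inter = min(
            p_distance(a, b)
            for other, theirs in seqs.items() if other != species
            for a in own
            for b in theirs
        )
        if inter > intra:
            resolved.append(species)
    return resolved


# Hypothetical aligned barcodes; species_3 overlaps species_2
seqs = {
    "species_1": ["AAAACCCC", "AAAACCCA"],
    "species_2": ["GGGGTTTT"],
    "species_3": ["AAAACTAA", "GGGGTTTA"],
}
resolved = resolved_species(seqs)
print(resolved, f"{100 * len(resolved) / len(seqs):.2f}% of species resolved")
print(f"{100 * 29 / 36:.2f}%")  # 29 of 36 species, the matK figure above
```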

2.3.2 Nuclear gene sequence

The internal transcribed spacer (ITS) is a non-functional RNA located between the structural ribosomal RNAs (rRNAs) on a common precursor transcript, and it is particularly valuable for clarifying relationships between congeneric species and closely

Figure 2.1 Some studies using DNA barcoding as a plant identification tool (Peter M Hollingsworth, 2011)


is substantially conserved within species but varies between species, it is frequently employed in taxonomy (Lackie, J.M., 2013).

2.3.3 Ribosome coding region

Within nuclear DNA, rDNA is another large region of repeated DNA sequence. Because of its repetitive structure, rDNA is prone to recombination and hence to deletions and insertions, making it one of the most fragile regions of the genome. Owing to the presence of many replication fork barriers, which serve to prevent collisions between DNA replication and transcription, rDNA is a common site of replication fork arrest. rDNA is the most frequently transcribed region of the genome. Because of their distinctive appearance (the so-called "Christmas tree" structures), the various stages of rDNA transcription can be visualized by electron microscopy using the Miller chromatin spreading method ("Miller spread"). rDNA consists of tandem gene repeats, more than half of which are not transcribed. In general, mutations can accumulate in repeated sequences like these, especially in non-transcribed units, producing non-functional pseudogenes. However, all repeats in rDNA are quite


different species (Van et al., 2000).

Currently, the nuclear ribosomal ITS (nrITS) region is considered one of the most useful tools for phylogenetic assessment in both plants and animals because it is widespread in nature, biparentally inherited, and highly variable owing to weak functional constraints. Several recent studies of sexually and asexually reproducing plants have shown some variation among copies of the ITS1 and ITS2 sequences, arising from multiple causes such as inbreeding, cleavage and recombination, high mutation rates, and the formation of pseudogenes from functional genes (Yao et al., 2010).

2.3.4 Chloroplast gene sequence

The plant chloroplast genome, typically 120-160 kb in size, is made up of two single-copy regions, the large single-copy (LSC) region and the small single-copy (SSC) region. Two inverted repeats (IRa and IRb), with an average length of 20-30 kb, separate the two single-copy regions. The chloroplast genome contains all of the rRNA genes (4 genes in higher plants), the tRNA genes (35 genes), and other genes encoding chloroplast proteins (about 100 genes) that are required for chloroplast formation. Because the chloroplast genome is highly conserved and species-specific, scientists are interested in using chloroplast genome analysis results in phylogenetic research and plant taxonomy.

*)rbcL gene sequence

The plastid gene rbcL, which encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), is the best-studied plastid gene, and rbcL was the first plant gene to be sequenced. With over 10,000 rbcL sequences available in GenBank, rbcL has been widely used in phylogenetic research and plant taxonomy. The Consortium for the Barcode of Life (CBOL) has identified rbcL as one of the most promising sequences for DNA barcoding in plants, owing to the ease of its PCR amplification across diverse plant groups. However, because of its limited power to discriminate species, most groups consider that rbcL should be used in combination with other barcode markers such as matK, the two being the common barcode loci for plants (Vijayan et al., 2010; Yao et al., 2010).

*)matK gene sequence

MatK is one of the most rapidly evolving chloroplast genes. It is roughly 1550 bp long and encodes a maturase enzyme involved in the removal of group II introns during RNA splicing. Because it evolves quickly and is found in practically all plants, matK has been used as a marker in studies of interspecific relationships and phylogenetics in plants. CBOL tested matK on approximately 550 plant species and found that 90% of angiosperm samples could be amplified with a single primer pair, indicating that matK could serve as a standard plant barcode locus (Vijayan et al., 2010; Yong et al., 2010).

2.4 Research on genetic diversity and classification of citrus fruit trees in Vietnam

In Vietnam, in 2015, the research team of Nguyen Thi Lan Hoa carried out the project "Building DNA barcodes for endemic plant varieties with economic value in Vietnam" (2013-2016 period). Eleven DNA marker sets (SSR, SCoT, SNP, and RAPD loci) were developed to identify 212 genetic resources of 11 plant species and orchids, with 3520 specific alleles that helped identify 140 genetic resources. The project established 11 sets of specific barcode sequences for 11 crops (rice, beans, soybean, tea, pepper, longan, mango, etc.) based on the matK, rbcL, trnH-psbA, rpoB, and rpoC gene/genomic regions, depending on the crop. Each type of genetic resource has at least one barcode sequence specific to the Vietnamese variety, with a total of 39 sequences.

An analysis of the genetic diversity of orange varieties in Ha Giang (Vu Van Hieu, 2015) using two types of genetic markers, RAPD and ISSR, together with the construction of a genetic dendrogram, revealed the genetic relationships among the studied orange cultivars. All 25 RAPD primers and 5 ISSR primers used were polymorphic (100%), but the number of polymorphic bands per sample varied greatly, and the genetic similarity among the 39 lines ranged from 0.62 to 0.98.
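Similarity values such as the 0.62-0.98 range above are usually computed from a binary matrix in which each RAPD or ISSR band is scored as present (1) or absent (0). The study does not state which coefficient was used; the sketch below assumes the Jaccard and Dice coefficients, two common choices for dominant markers, and the band scores are hypothetical, not data from that study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient: bands shared / bands present in either sample."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 1.0


def dice(a, b):
    """Dice coefficient: 2 * bands shared / (bands in a + bands in b)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0


def similarity_matrix(profiles, coefficient):
    """Pairwise similarities for {sample_name: [0/1 band scores]}."""
    return {
        (p, q): coefficient(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical band scores (1 = band present, 0 = absent), not study data.
bands = {
    "line_A": [1, 1, 0, 1, 0, 1],
    "line_B": [1, 1, 0, 1, 1, 1],
    "line_C": [0, 1, 1, 0, 1, 0],
}

for pair, value in similarity_matrix(bands, jaccard).items():
    print(pair, round(value, 2))
```

For these toy profiles, line_A and line_B share 4 of the 5 bands scored in either sample, giving a Jaccard similarity of 0.8. A dendrogram is then typically built by clustering such a matrix, for example with UPGMA.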

An evaluation of the genetic diversity of local mandarin varieties in Vietnam using SSR markers (Le Thi Thu Trang, 2020) used DNA samples from a group of 29 local mandarin varieties and 30 SSR markers to study polymorphism among the varieties. A total of 93 alleles were detected at the 30 loci, an average of 3.1 alleles per locus. The polymorphic information content (PIC) of the primers ranged from 0.33 to 0.81, with an average of 0.55, and genetic similarity ranged from 0.55 to 0.89 (0.78). The study identified 5 characteristic markers that are meaningful for variety identification in the conservation and selection of mandarin varieties in Vietnam.
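The allele count and PIC values above are derived from allele frequencies at each SSR locus. The study summarized here does not say which PIC formula was applied, so the sketch below shows both the widely used simplified form and the original Botstein et al. (1980) form; the genotypes are hypothetical diploid fragment sizes, not data from that study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Relative frequency of each allele (e.g. SSR fragment size) at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid sample.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def pic_simple(freqs):
    """Simplified PIC, 1 - sum(p_i^2), used in many SSR studies."""
    return 1 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Botstein et al. (1980) PIC: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return pic_simple(freqs) - correction


# Hypothetical genotypes at one SSR locus (fragment sizes in bp), not study data.
locus = [(150, 150), (150, 154), (154, 158), (158, 158)]
freqs = allele_frequencies(locus)
print(round(pic_simple(freqs), 3), round(pic_botstein(freqs), 3))

# Mean alleles per locus for the reported totals: 93 alleles over 30 loci.
print(93 / 30)
```

The Botstein form is always slightly lower than the simplified form because it subtracts the probability that both parents share the same heterozygous genotype, which makes a cross uninformative.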


SECTION III: RESEARCH MATERIALS AND METHODS

3.1 Research materials

- 14 tangerine leaf samples were collected from many provinces, including Son La, Dong Thap, Hanoi, Hue, Lang Son, Yen Bai, Phu Tho, Tuyen Quang, Ha Giang, and Bac Kan.

Table 3.1 List of tangerine leaf samples used in the study

Sample code

3

Tangerine hong (Tieu

Thuong bang la - Van Chan -

3.2 Chemicals

Common molecular biology chemicals from Sigma and Merck were used: CTAB, Tris base, boric acid, NaCl, dNTPs, EDTA, 6X Orange Loading Dye solution, Taq polymerase, ethanol, 2-propanol, glacial acetic acid, phenol, chloroform, isoamyl alcohol, and agarose.

3.3 Research methods

3.3.1 Total DNA extraction

For each variety, 3 representative trees were selected and their leaves collected. Total DNA was extracted following the method of Cheng et al. (2003).

Steps to take:

1 Grind 1.5-2 g of fresh leaves to a fine powder in liquid nitrogen using a mortar and pestle.

Table 3.2 List of primer pairs used in the study

5 Centrifuge at 13000 rpm at room temperature.

6 Transfer the clear supernatant (4 ml) from the Falcon tube to a new tube. Add an equal volume (4 ml) of isopropanol, mix well, and keep at -20°C for 30 min.

7 Pellet the DNA by centrifugation at 10000 rpm for 10 min. Wash the pellet 2-3 times (2-3 h), each time with 1 ml of 70% ethanol.

8 Dry the pellet at room temperature. Add 2 ml of TE buffer and 15 µl of RNase A (10 mg/ml) and incubate at 37°C for 3 h.

9 Transfer the DNA solution to a new tube. Add 2 ml of phenol-chloroform-isoamyl alcohol (25:24:1), mix for 5-10 minutes, and centrifuge for 10 minutes at 10000 rpm.

10 Transfer the supernatant to a new tube. Add 750 µl of 5 M NaCl and 3 ml of water-saturated ether, mix well, and centrifuge for 10 min at 10000 rpm.

11 Remove the upper ether layer and carefully transfer the lower phase to a new tube.

12 Add 3 ml of isopropanol, mix well, and keep at -20°C for 30 min. Collect the DNA with a pipette tip or a glass rod.

13 Wash the pellet 3 times with 1.5 ml of 70% ethanol.

14 Dry the pellet at room temperature and dissolve it in 800 µl of TE buffer.

15 Store the DNA solution at -20°C.
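Two quantities in this protocol can be checked with simple arithmetic: the component volumes of the 25:24:1 phenol-chloroform-isoamyl alcohol mixture in step 9, and the relative centrifugal force behind the rpm settings. The sketch assumes the standard relation RCF = 1.118 x 10^-5 x r x rpm^2 (r in cm); the 8.5 cm rotor radius is a hypothetical value, since the protocol does not name the rotor.

```python
def split_by_ratio(total_volume, ratio):
    """Volume of each component when `total_volume` is mixed in the given ratio."""
    parts = sum(ratio)
    return [total_volume * r / parts for r in ratio]


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) = 1.118e-5 * radius_cm * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Step 9: 2 ml of phenol:chloroform:isoamyl alcohol at 25:24:1.
phenol, chloroform, isoamyl = split_by_ratio(2.0, (25, 24, 1))
print(f"phenol {phenol:.2f} ml, chloroform {chloroform:.2f} ml, isoamyl alcohol {isoamyl:.2f} ml")

# Hypothetical rotor radius of 8.5 cm; the rotor used is not stated in the protocol.
for rpm in (10000, 13000):
    print(rpm, "rpm ->", round(rcf_from_rpm(rpm, 8.5)), "x g")
```

Reporting x g rather than rpm makes centrifugation steps transferable between centrifuges with different rotor radii.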


3.3.2 PCR reaction

Table 3.3 List of components in PCR reaction

1 Double distilled deionized water 4.9

3.3.4 Electrophoresis to check PCR products

PCR products were electrophoresed on a 6.0% polyacrylamide gel and visualized under ultraviolet light after ethidium bromide staining.


+ The polyacrylamide gel contains the following components (gel sheet dimensions: 30 cm x 12 cm):

Table 3.5 List of components in Electrophoresis

+ Polyacrylamide gel electrophoresis method

Mix the above solutions and inject the mixture between the two glass plates with a syringe. The gel polymerizes completely after 30 min. Assemble the electrophoresis apparatus and run the PCR products alongside the size marker at 150 V.

Electrophoresis takes about 40-60 minutes. Remove the gel from the glass, stain it with 0.5% EtBr for 10 minutes, and rinse with clean water. Scan and image the gel with a DigiDoc-It gel imager.

3.3.5 Agarose gel electrophoresis method

Weigh 0.6 g of agarose into 40 ml of 1X TAE and heat to boiling until the agarose has completely dissolved. Cool to 45-50°C, add 2.5 µl of ethidium bromide, and pour into the prepared gel mold. After 30-60 minutes, when the gel has cooled and solidified, transfer the gel tray to the electrophoresis unit and add 1X TAE running buffer to the chamber so that the gel is submerged by about 0.5-1 cm of buffer.

Sample loading: PCR products were mixed with 4 µl of loading dye and loaded into the wells of the gel.

Running: After the samples have been loaded, connect the electrophoresis unit to the power supply and run at 130 V.

Visualization: Examine the gel under a UV lamp; the DNA fluoresces because it is bound to EtBr.
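The agarose recipe in this section corresponds to a 1.5% (w/v) gel (0.6 g in 40 ml). The sketch below shows that arithmetic and the C1V1 = C2V2 dilution for the ethidium bromide; the 10 mg/ml EtBr stock concentration is an assumption, since the protocol gives only the 2.5 µl volume. Under that assumption the final EtBr concentration in the gel is about 0.63 µg/ml.

```python
def percent_w_v(grams, volume_ml):
    """Concentration in % (w/v), i.e. grams per 100 ml."""
    return grams / volume_ml * 100


def diluted_concentration(stock_conc, stock_volume, final_volume):
    """C2 = C1 * V1 / V2; both volumes must use the same unit."""
    return stock_conc * stock_volume / final_volume


agarose = percent_w_v(0.6, 40)  # 0.6 g agarose in 40 ml 1X TAE

# Assumed EtBr stock of 10 mg/ml (10,000 µg/ml); the stock is not stated above.
etbr = diluted_concentration(10_000, 2.5e-3, 40)  # 2.5 µl = 0.0025 ml into 40 ml
print(f"agarose {agarose:.1f}% (w/v), EtBr {etbr:.3f} µg/ml")
```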

3.3.6 Polyacrylamide gel electrophoresis method

Polyacrylamide gels were prepared, run, stained, and imaged as described in Section 3.3.4.

3.3.7 Gel purification with the Qiagen QIAquick kit

1 Cut the desired DNA fragment from the agarose gel and place the gel slice in a 2 ml Eppendorf tube.

2 Add Buffer QG at a ratio of 3 volumes of QG to 1 volume of gel (100 mg of gel ~ 100 µl).

3 Incubate at 50°C for about 10 minutes until the gel has completely dissolved.

4 After the gel has dissolved, check that the solution is yellow. If it is orange or purple-blue, add 10 µl of 3 M sodium acetate, pH 5.0.

5 Apply the dissolved sample to the QIAquick column and centrifuge at 13000 rpm for 1 min.

6 Add 500 µl of Buffer QG to the QIAquick column and centrifuge at 13000 rpm for 1 min to remove residual agarose.

7 Add 750 µl of Buffer PE to the QIAquick column, let the column stand upright for 5 min, then centrifuge at 13000 rpm for 1 min.

8 Transfer the QIAquick column to a clean 1.5 ml microcentrifuge tube.

9 To elute the DNA, add 30 µl of water (pH 7-8.5) to the center of the QIAquick membrane and centrifuge at 13000 rpm for 1 min, collecting the purified DNA.
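Step 2 scales Buffer QG to the mass of the excised gel slice, and the resulting volume determines whether step 5 needs more than one load-and-spin round. A minimal sketch, assuming 1 mg of gel is about 1 µl (as in step 2) and a hypothetical 800 µl column reservoir capacity that should be checked against the kit handbook:

```python
import math


def qg_volume_ul(gel_mg, ratio=3):
    """Buffer QG volume: `ratio` volumes per gel volume, taking 100 mg gel ~ 100 µl."""
    return ratio * gel_mg  # 1 mg of gel is treated as ~1 µl


def column_loads(gel_mg, reservoir_ul=800, ratio=3):
    """Load-and-spin rounds needed to pass the dissolved gel through one column.

    reservoir_ul is an assumed column reservoir capacity; check the kit handbook.
    """
    total_ul = gel_mg + qg_volume_ul(gel_mg, ratio)
    return math.ceil(total_ul / reservoir_ul)


for gel in (120, 250):  # hypothetical gel-slice masses in mg
    print(gel, "mg gel ->", qg_volume_ul(gel), "µl QG,", column_loads(gel), "load(s)")
```

Excising the band with as little surrounding agarose as possible keeps the dissolved volume, and therefore the number of loads, small.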

3.3.8 Sequencing

After purification, the PCR products were sequenced at Apical Scientific (Malaysia). The sequencing results were compared with homologous sequences on NCBI, and the sequences were then assembled and analyzed using MEGA v6.06.

3.3.9 Data analysis:

Data were processed using Microsoft Excel 2019.


SECTION IV: RESULTS AND DISCUSSION

Results of analysis of nucleotide sequences of the studied tangerine varieties

4.1 Results of nucleotide sequence analysis of the ITS gene region among the 14 studied tangerine varieties

PCR with primers ITS1/ITS4 showed that all studied samples gave monomorphic bands of about 750 bp (Figure 4.1).

Figure 4.1 PCR results of 14 samples with primers ITS1/ITS4

M: GeneRuler 100 bp Plus DNA Ladder
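The "about 750 bp" size is read against the ladder in lane M. One common way to make that reading quantitative, assumed here rather than taken from this study, is a standard curve of log10(fragment size) against migration distance, fitted over the ladder bands and then used to interpolate the unknown band. The distances below are hypothetical, not measurements from Figure 4.1.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size_bp) = a + b * distance over ladder bands.

    ladder: list of (migration_distance, size_bp) pairs read from the gel image.
    Returns (intercept a, slope b).
    """
    n = len(ladder)
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def estimate_size(distance, intercept, slope):
    """Predicted fragment size (bp) for a band at the given migration distance."""
    return 10 ** (intercept + slope * distance)


# Hypothetical ladder: distances in mm for the 1000, 500, 300 and 100 bp bands.
ladder = [(20.0, 1000), (26.0, 500), (30.5, 300), (40.0, 100)]
a, b = fit_standard_curve(ladder)
print(round(estimate_size(22.5, a, b)), "bp")
```

The log-linear relation holds only approximately within the ladder's resolving range, so unknown bands should be sized by interpolation between ladder bands rather than by extrapolation.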

After purification, the ITS1/ITS4 PCR products were sequenced directly on an ABI 3730xl sequencer at Apical Scientific (Malaysia). The ITS sequences of the 14 studied samples/tangerine varieties were compared with reference sequences published on NCBI using the BLAST program. Sequence similarity between the samples and the reference sequences ranged from 94.00% to 99.59%, with a coverage of 97%. The Ngoc Hoi tangerine sample had the lowest similarity to the reference sequence (94.00%), while the hong tangerine (Tieu Son), Trang Dinh tangerine, chum tangerine, and Bac Kan mandarin samples had the highest (99.59%). This shows that the amplified ITS sequences of several studied samples are highly similar to those published on NCBI (Table 4.1).
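Table 4.1 reports two BLAST statistics for each sample: similarity (percent identity) and coverage. The sketch below shows, in simplified form, how they are defined for a single aligned region. BLAST computes them from its own local alignments and merges multiple aligned regions when reporting query coverage, so this illustrates the definitions rather than reimplementing BLAST; the sequences are toy examples.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical columns / alignment length x 100 (gap columns count in the length)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100 * matches / len(aligned_query)


def query_coverage(aligned_query, query_length):
    """Query bases inside the alignment / full query length x 100 (one aligned region)."""
    aligned_bases = sum(1 for base in aligned_query if base != "-")
    return 100 * aligned_bases / query_length


# Toy alignment, not study data: one mismatch and one gap over 10 columns,
# for a 12 bp query of which 3 bases fall outside the aligned region.
query = "ACGTTGCA-T"
subject = "ACGATGCAGT"
print(percent_identity(query, subject), query_coverage(query, 12))
```

Because the two statistics are computed over different denominators, a sample can show high identity with incomplete coverage, which is why both are reported side by side.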


Table 4.1 Evaluation of the similarity and coverage of the ITS region DNA sequences of the research samples with the corresponding sequences on NCBI

No | Code | Reference sequence | Similarity (%) | Coverage (%)

the reference gene MH721728.1 Citrus reticulata (Table 4.1.2).

The Tich Giang tangerine sample (Q4) had substitutions of C, C, and G to A at positions 635, 680, and 690, respectively. At positions 680 and 690, however, only the Tich Giang sample carries these C-to-A and G-to-A substitutions, which distinguish it from the reference sequences and the other samples/varieties in the study; this tangerine sample can therefore be accurately identified by this difference.
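The reasoning in these paragraphs, that a substitution identifies a sample only when no other sample or reference sequence carries the same base at that position, can be expressed as a column-by-column scan of the alignment. A minimal sketch on a toy alignment (sequence names and bases are hypothetical, not the study's data); it mirrors the Tich Giang case, in which only the substitutions at positions 680 and 690 are unique to that sample.

```python
def unique_substitutions(alignment, references):
    """Sites where one sample carries a base found in no other sequence.

    alignment: {name: aligned sequence}, all of equal length.
    references: names in `alignment` that are reference (e.g. NCBI) sequences.
    Returns {sample: [(position, reference_base, sample_base), ...]} with
    1-based positions; reference_base is taken from the first reference by name.
    """
    names = list(alignment)
    first_ref = sorted(references)[0]
    length = len(alignment[first_ref])
    hits = {name: [] for name in names if name not in references}
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        for sample in hits:
            base = column[sample]
            others = [column[name] for name in names if name != sample]
            if base != "-" and base not in others:
                hits[sample].append((i + 1, column[first_ref], base))
    return {sample: sites for sample, sites in hits.items() if sites}


# Toy alignment, not study data. Q4 and Q5 share a C->T change at position 2,
# so it is not diagnostic; Q4 (position 4) and Q8 (position 7) have unique ones.
example = {
    "ref1": "ACGTACGT",
    "ref2": "ACGTACGT",
    "Q4": "ATGAACGT",
    "Q5": "ATGTACGT",
    "Q8": "ACGTACTT",
}
print(unique_substitutions(example, {"ref1", "ref2"}))
```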

The Huong Can tangerine sample (Q5) has a G-to-A substitution at position 99, shared with the reference sequence MH721728.1 Citrus reticulata, and a C-to-T substitution at position 119 that differs from the 5 reference sequences published on NCBI and from the remaining 13 varieties. The Huong Can tangerine sample can therefore be accurately identified by this difference.

The Dong Khe tangerine sample (Q8) can be identified by its differences from the reference sequences and the other studied samples at positions 628 and 714, where A is replaced by T and G, respectively.

The Ngoc Hoi red tangerine sample (Q9) has T-to-A and G-to-A substitutions at positions 638 and 714, respectively. The Ngoc Hoi red tangerine can therefore be accurately distinguished from the other varieties in the study by this difference.

The tangerine sample (Q11) has A-to-G and C-to-G substitutions at positions 37 and 212, respectively, which distinguish it from the other varieties in the study, so it can be accurately identified by these nucleotide changes.

The Ha Giang sweet tangerine sample (Q12) has a unique C-to-T substitution at position 419 compared with the 5 reference sequences published on NCBI and the remaining varieties. The Ha Giang sweet tangerine sample can therefore be accurately identified by this difference.

Similarly, the tangerine sample (Q13) has a unique C-to-G substitution at position 530 that differs from the other varieties and from the reference sequences published on NCBI. Therefore, thanks to this difference, the

Title	Identification of tangerine varieties in Vietnam using ITS, MATK, RBCL primers
Author	To Hoang Anh Minh
Supervisors	Assoc. Prof. Dr. Tran Dang Khanh, Assoc. Prof. Dr. Dong Huy Gioi
University	Vietnam National University of Agriculture
Major	Biotechnology
Type	Undergraduate thesis
Year of publication	2022
City	Hanoi

Format
Number of pages	75
File size	8.89 MB