Identification of rice varieties specialties in Vietnam using Raman spectroscopy
Le Truong Giang 1,2*, Pham Quoc Trung 1, Dao Hai Yen 1
1 Institute of Chemistry, Vietnam Academy of Science and Technology,
18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam
2 Graduate University of Science and Technology, Vietnam Academy of Science and Technology,
18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam
Received February 20, 2019, Accepted April 17, 2020
Abstract
The characteristics and quality of rice are significantly affected by its variety. However, discriminating between varieties is an urgent but difficult and time-consuming task in Vietnam. In this study, an effective and reliable identification method was established using Raman spectroscopy (RS). Raman spectra of 32 rice samples were acquired from 400 to 1600 cm-1, and the sensitive fundamental vibrations of less polar groups and bonds in rice were analyzed. Initially, the raw Raman spectra were processed by standard normal variate (SNV) combined with Savitzky-Golay smoothing (SGD2), and the effect of this pretreatment was confirmed by principal component analysis (PCA). Next, three multivariate analysis methods, PCA, hierarchical cluster analysis (HCA), and K-nearest neighbor (KNN), were compared on their ability to classify rice varieties. All three methods classified the four rice varieties well. PCA identified four main factors, namely starch chains, amylose, amylopectin, and protein content, that distinguish the four rice varieties, whereas HCA only distinguished well between rice with high and low amylopectin content and did not reveal the main contributing components.
Keywords: Rice varieties, Raman spectroscopy, PCA, HCA, KNN
1 INTRODUCTION
Rice is an important food for more than half of the world's population. It provides the body with energy in the form of carbohydrates, as well as proteins, vitamins, and various trace elements.[1] Vietnam is known as a leading rice-exporting country, with many kinds of high-quality rice such as ST25, Huong Lai, Tam, and Seng Cu. These specialty types of rice have higher economic value than conventional rice types. In recent years, some traders have changed product labels or mixed different types of rice for profit. This has seriously harmed specialty rice brands and the interests of consumers and businesses. Reliable identification and classification are therefore of great significance for protecting products with geographical indications.
Over the last decade, several methods have been described for the traceability of rice. These methods include detecting differences in inorganic, organic, and flavor components.[2,3] A few studies have used stable isotope methods to differentiate rice from different regions such as Vietnam, Japan, and China.[4] In general, chemical properties play an essential role in defining rice types. However, these techniques still have many drawbacks, including long detection times, high cost, and destruction of the sample. In recent years, non-destructive and rapid detection methods have become important. For example, low-field nuclear magnetic resonance (NMR) and near-infrared (NIR) spectroscopy combined with chemometric methods have been reported as approaches for classifying rice.[5,6] Like NMR and NIR, Raman spectroscopy is a fast, non-destructive method used to identify materials based on the frequency of molecular vibrations.[7] Different components have different molecular rotational and vibrational energy levels, which appear as differences in the Raman shift. Therefore, each component in a material is characterized by its own spectrum. Notably, Raman spectroscopy is particularly useful for water-rich samples compared to infrared spectroscopy. For example, Raman spectroscopy has been used to detect organic compounds such as pesticide residues in foods,[8] glucose in blood,[9] and vitamins.[10] Moreover, the adulteration of cooking oil with mineral oil has been detected using Raman and near-infrared spectra. In a study of rice collected from different agricultural areas in Korea, Hwang and colleagues used Raman spectra to determine the geographical origin of rice grains.[11] Currently, there is no specific report on the classification of different varieties of Vietnamese rice. In this research, we aimed to (i) analyze and compare the characteristics of the Raman spectra of samples of different rice varieties, and (ii) pretreat the Raman spectra and use multivariate analysis such as PCA, KNN, and HCA to evaluate and identify rice varieties.
2 MATERIALS AND METHODS
2.1 Materials
A total of 32 samples were studied, including 16 Seng Cu rice (MV), 8 Tam rice (T), 4 Ki Deo rice (K), and 4 sticky rice (N) samples. The samples comprised different varieties and were cultivated in diverse geographical regions of Vietnam. Each sample was washed with deionized water and dried at 40 °C to constant weight, and all the rice kernel samples were then ground with a sample mill (LM-3100, Perten, Sweden) to obtain a fine powder.[12]
2.2 Methods
2.2.1 Spectral collection method
A LabRAM HR Evolution instrument (HORIBA Jobin Yvon S.A.S., France) was used to collect the Raman spectra of the rice samples. The instrument was set as follows: 50x objective lens, 20 mW laser power, and 1.5 cm-1 resolution, at room temperature (25 °C) and relative humidity below 60 %. The excitation wavelength and acquisition time were set at 632.8 nm and 30 s, respectively, with a scanning range from 100 to 1600 cm-1.[12] Each rice sample was scanned three times.
2.2.2 Raman spectra pre-processing
Because the spectra of the samples may have been recorded over several days, it is very difficult to calibrate the Raman instrument precisely enough to obtain the same Raman shift axis, laser power, and spectral resolution (which depends on the gratings). Before multivariate analysis, Raman spectra should therefore be pretreated by methods such as mean centering (MC) or multiplicative scatter correction (MSC).[13] In this study, a Savitzky-Golay smoothing filter[14] with second-order polynomial deconvolution (SGD2) combined with the Standard Normal Variate (SNV) method[15] was applied to the data to obtain the best results. First, SNV was used to normalize the Raman data of rice samples measured at different times. The spectral data were then processed to reduce background noise with a second-order polynomial, 100-point S-G smoothing algorithm.
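The two pretreatment steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (snv, savgol_smooth, preprocess) are hypothetical, the spectra are assumed to be stored one per row, and a 101-point window is assumed in place of the reported 100 points because this sketch requires an odd window centred on each point.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def savgol_smooth(y, window=101, polyorder=2):
    """Savitzky-Golay smoothing of a 1-D signal via local least-squares fits."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    # Design matrix of the local polynomial on offsets -half..half.
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse holds the weights that give the fitted
    # value at offset 0, i.e. the smoothed centre point of each window.
    weights = np.linalg.pinv(design)[0]
    # Reflect the signal at both ends so the output keeps the input length.
    padded = np.pad(np.asarray(y, dtype=float), half, mode="reflect")
    # Reversing the weights turns np.convolve into a sliding weighted sum.
    return np.convolve(padded, weights[::-1], mode="valid")


def preprocess(spectra, window=101, polyorder=2):
    """SNV first, then Savitzky-Golay smoothing, applied row by row."""
    normalized = snv(spectra)
    return np.apply_along_axis(savgol_smooth, 1, normalized, window, polyorder)
```

Points closer than half a window to either end of the spectrum are smoothed against reflected data, so they are less reliable than interior points.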
2.3 Multivariate data analysis
Multivariate analysis is divided into three main groups: exploration methods, calibration methods, and classification methods.[16] In this paper, the exploration methods principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used to analyze the distribution of the rice samples. Subsequently, the K-nearest neighbor (KNN) classification method was evaluated to identify the best-fitting model for rice varieties.
2.3.1 Principal component analysis (PCA)
Principal component analysis uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.[12,17] In this study, after pre-processing, all spectral data for the rice samples were subjected to PCA to find patterns in the complex data by reducing its dimensions. Nine principal components (PCs) were found, and a subset of them (referred to as PCk) with a high cumulative contribution rate (> 90 %) was selected. A distribution chart was then plotted for the PCk, in which the distances between points represent the magnitude of the differences between samples. The main characteristics that distinguish the rice varieties can then be identified from the loading plot.
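The selection of PCk by cumulative contribution rate can be sketched as below. This is an illustrative example under stated assumptions, not the software the authors used: the function name pca and its return values are hypothetical, samples are assumed to be rows and variables columns, and PCA is computed through a singular value decomposition of the mean-centred data.

```python
import numpy as np


def pca(X, threshold=0.90):
    """PCA by SVD; keeps the fewest PCs whose cumulative variance exceeds threshold."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes (loadings), ordered by variance.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular**2 / (X.shape[0] - 1)
    ratio = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(ratio)
    # Index of the first component at which the cumulative ratio passes the threshold.
    k = int(np.argmax(cumulative > threshold)) + 1
    loadings = vt[:k]
    scores = centered @ loadings.T
    return scores, loadings, ratio, k
```

The columns of scores give the coordinates used for a score scatter plot, and each row of loadings shows how strongly every input variable contributes to that component.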
2.3.2 Hierarchical cluster analysis (HCA)
Hierarchical clustering starts by treating each Raman spectrum as a separate cluster. It then repeats the following two steps: (1) the differences between clusters are calculated using the squared Euclidean distance; (2) the two most similar clusters are merged using the Ward method.[18]
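The two repeated steps can be sketched with a naive agglomerative loop. This is a minimal sketch under assumptions, not the authors' software: the function name ward_clustering is hypothetical, and the Ward merge cost is taken as the increase in within-cluster sum of squares, n_a n_b / (n_a + n_b) times the squared Euclidean distance between the two cluster centroids.

```python
import numpy as np


def ward_clustering(X, n_clusters):
    """Merge clusters by Ward's criterion until n_clusters remain."""
    X = np.asarray(X, dtype=float)
    # Every observation starts as its own cluster.
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > n_clusters:
        best = None
        keys = list(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Ward cost: growth in within-cluster sum of squares on merging.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, cost))
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, merges
```

The exhaustive pair search is slow for large data sets but adequate for a few dozen spectra; the recorded merge costs are the quantities a dendrogram would display.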
2.3.3 K-nearest neighbor (KNN)
KNN, proposed by Fix and Hodges, calculates the distances between all data points. The K closest neighbors are then found by sorting the distance matrix.[19] The K closest data points are examined to determine which class label is the most common among them, and that label is assigned to the new sample. KNN performs well on multiclass problems.[20]
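A minimal sketch of this voting procedure is given below. The function names are hypothetical and the data layout (one training sample per row) is an assumption; the defaults of K = 4 and cosine distance follow the settings reported in Section 3.5.

```python
from collections import Counter

import numpy as np


def cosine_distance(query, points):
    """Cosine distance (1 - cosine similarity) from query to each row of points."""
    query = np.asarray(query, dtype=float)
    points = np.asarray(points, dtype=float)
    similarity = (points @ query) / (
        np.linalg.norm(points, axis=1) * np.linalg.norm(query)
    )
    return 1.0 - similarity


def knn_predict(train_X, train_y, query, k=4):
    """Label a query sample by majority vote among its k nearest training samples."""
    distances = cosine_distance(query, train_X)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    # With an even k a tie is possible; this sketch defines no special tie-break.
    return votes.most_common(1)[0][0]
```

Because cosine distance depends only on direction, two samples whose feature vectors differ by a constant scale factor are treated as identical.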
3 RESULTS AND DISCUSSION
3.1 Spectral analysis
The composition of rice is complex and, moreover, unevenly distributed within the grain. Figure 1 shows the characteristic bands of the different functional groups. In the Raman spectra of four randomly chosen rice samples, the main bands were at 308, 356, 408, 446, 479, 579, 766, 874, 947, 1001, 1061, 1085, 1127, 1203, 1265, 1341, 1405, and 1460 cm-1. Table 1 assigns each characteristic spectral feature to the vibrational or rotational modes (stretching, bending, torsional fundamental vibrations, etc.) of the different functional groups and to the skeletal modes of the ring component.
Characteristic peaks of the glucose unit in starch were found at 446 cm-1, 479 cm-1, and 579 cm-1. The strong band at approximately 479 cm-1 is probably an important skeletal vibration that can reflect the degree of crystallinity in rice starch. The fingerprint region of the Raman spectrum, from 800 to 1500 cm-1, contains highly overlapping and complex vibration modes of different functional groups. Polysaccharides, which are condensed from multiple glucose units, can be assigned by the different vibrational states of glucose in this region, such as the deformation vibration of CH2 (or CH3) at 1460 cm-1, CH bending at 1341 cm-1, COH deformation and CO stretching between 1085 and 1127 cm-1, and C1-H deformation at 874 cm-1.[21] In rice, starch, which is the main component, can be assigned to α-1,4 linkage vibrations (stretching vibration of COC) at 947 cm-1, and a slight shift in position may be associated with the amylopectin α-1,6 linkage.[12] A band near 1265 cm-1 was attributed to a CH2OH deformation vibration, which is closely related to crystalline structures in starch.[22] Other components of rice, such as protein and lipids, were associated with vibrations at 1341 to 1460 cm-1 (CH2 twisting vibration) and with the COH stretching vibration and OH twisting vibration at 1000 to 1200 cm-1.[21] The bands at 1001 and 1061 cm-1 originated from protein side chains.
Figure 1: Raman spectra of selected rice samples
Figure 2a shows the spectral features of sixteen rice samples randomly chosen from the collected spectral library. Overall, the bands in the spectra are analogous to each other, which suggests that the samples have similar compositions. However, the band intensities differ between rice samples because amylopectin branching and amylose chain lengths differ among cultivars and varieties. Since these differences in the spectra cannot be confidently distinguished by eye, a clearer method of differentiation is needed. Thus, multivariate analysis methods such as PCA, HCA, and KNN were combined with the Raman spectra to further interpret the data.
3.2 Preprocessing of Raman spectra
Pretreatment is performed to eliminate the effects of unevenness, baseline offset, and noise in the spectral data collected for the rice samples. In this study, before further spectral processing, all the spectra were pre-processed according to Section 2.2.2. In the raw Raman spectra (figure 2a), the background signal fluctuates dramatically, from 500 to approximately 3000 counts, and the background noise is relatively large. The opposite is true of the corrected spectra (figure 2b), in which the background interference and baseline drift of the raw spectra have been effectively eliminated.
In this study, PCA was applied to both the original and the corrected spectra of rice grains for classification (using six samples selected from the Seng Cu and Tam rice). The results are shown in figure 3. Overall, the rice grains are not classified before baseline correction (figure 3a). In more detail, the PC2 scores differ clearly among samples in the same group: the PC2 scores of MV05 and T01, at 0.365 and -0.183 respectively, differ greatly from those of the other samples in their groups. After baseline correction, the samples were classified into two groups, corresponding to the Seng Cu and Tam varieties (figure 3b).
Table 1: Raman band assignments for rice
Wavenumber (cm-1) | Assignment
579 | C-O bending vibration
[...] | Skeletal modes, C–C stretch; skeletal modes of the pyranose ring; glucan
766 | O=C-N deformation vibration and OH [...]
[...] | [...] linkage (C–O–C); glycogen and branched-chain starch
1061 | C–C stretching
1127 | C-O stretching vibration and C-O-H flexural [...]
1265 | Amide III band, C-N stretching vibration peak
1460 | C-H in-plane bending vibration and CH2 and [...]
Figure 2: Comparing Raman spectra of four rice varieties
a - Raman raw data, b - preprocessing by SNV-SGD2
Figure 3: Score scatter plot for the first two PCs of rice samples
a - raw data; b - after preprocessing using SNV-SGD2
From the above results, it can be suggested that SNV-SGD2 effectively eliminated the interfering factors introduced during the acquisition of the Raman spectra and thereby improved the ability to classify rice varieties. Therefore, SNV-SGD2 pre-treatment is necessary when distinguishing rice varieties by Raman spectroscopy.
3.3 Principal Components Analysis
It is not possible to distinguish rice varieties based on only one factor, because the differences in the amylose or amylopectin signals between rice samples are not clear. Therefore, it is necessary to evaluate the signals of all rice components, namely amylose, amylopectin, protein, and lipid, in order to discriminate between rice varieties.
Figure 4: Full-scale Raman spectra of four rice varieties after preprocessing by SNV-SGD2
Principal component analysis was used in this study to discriminate among the four rice varieties, with the peak areas of the following characteristic Raman shift bands as input data: S1 (420-450 cm-1); S2 (470-560 cm-1); S3 (570-580 cm-1); S4 (710-720 cm-1); S5 (860-880 cm-1); S6 (920-980 cm-1); S7 (1000-1200 cm-1); and S8 (1300-1400 cm-1) (figure 4). The PCA results indicate that the first nine principal components (PCs) explained 100 % of the variance of the data (table 2). PC1 represented 73.62 % of the variance in the Raman spectra, whereas PC2 accounted for 22.21 % and PC3 for 1.74 %. The cumulative variance of PC1 to PC3 was 97.57 % (> 90 %); hence PC1, PC2, and PC3 were analyzed further. The relationships between the variables and the principal components are shown in equations (1), (2), and (3).
PC1 = 0.35S1 + 0.37S2 + 0.38S3 + 0.33S4 + 0.26S5 + 0.37S6 + 0.38S7 (1)
PC2 = -0.21S1 - 0.22S2 + 0.32S4 + 0.52S5 + 0.69S8 (2)
PC3 = 0.72S1 + 0.13S2 - 0.41S4 + 0.32S8 (3)
Applying PCA to the Raman shifts revealed the major characteristic bands that contribute significantly to the classification of rice varieties. The main bands that distinguish between rice varieties are indicated in equations (1)-(3) by the loading factors of each component from PC1 to PC3. The main characteristic bands were 420-560 cm-1, 860-980 cm-1, 1000-1200 cm-1, and 1300-1400 cm-1, with 420-560 cm-1 showing the strongest correlation (the total loading of S1 and S2 in PC1 was 0.72). This result confirmed that the main starch chains are affected by the rice variety. Other detected bands are related to amylose, amylopectin, and protein content. Therefore, the different quantities or structures of amylose, amylopectin, and protein are also the main reference indices for the discrimination of Seng Cu, Tam, Ki Deo, and sticky rice. The score scatter plot for the first two PCs is shown in figure 5, which demonstrates that Seng Cu, Tam, Ki Deo, and sticky rice were grouped in different clusters. These results confirm that PCA separates the four rice varieties into distinct clusters.
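As a worked illustration of equations (1)-(3), the sketch below integrates a spectrum over the eight bands and combines the band areas with the reported loadings. Several points are assumptions rather than details given in the paper: the names BANDS, LOADINGS, band_areas, and pc_scores are hypothetical, the areas are computed with the trapezoidal rule, the band limits are read as Raman shifts in cm-1, and the loadings are applied to the areas directly, without any centring or scaling the authors may have used.

```python
import numpy as np

# Band limits (Raman shift, cm-1) for the variables S1 to S8.
BANDS = {
    "S1": (420, 450), "S2": (470, 560), "S3": (570, 580), "S4": (710, 720),
    "S5": (860, 880), "S6": (920, 980), "S7": (1000, 1200), "S8": (1300, 1400),
}

# Loadings copied from equations (1)-(3); omitted variables have no term there.
LOADINGS = {
    "PC1": {"S1": 0.35, "S2": 0.37, "S3": 0.38, "S4": 0.33,
            "S5": 0.26, "S6": 0.37, "S7": 0.38},
    "PC2": {"S1": -0.21, "S2": -0.22, "S4": 0.32, "S5": 0.52, "S8": 0.69},
    "PC3": {"S1": 0.72, "S2": 0.13, "S4": -0.41, "S8": 0.32},
}


def band_areas(shift, intensity):
    """Integrate the spectrum over each band with the trapezoidal rule."""
    shift = np.asarray(shift, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    areas = {}
    for name, (low, high) in BANDS.items():
        mask = (shift >= low) & (shift <= high)
        x, y = shift[mask], intensity[mask]
        areas[name] = float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
    return areas


def pc_scores(areas):
    """Evaluate equations (1)-(3) for one sample's band areas."""
    return {
        pc: sum(weight * areas[band] for band, weight in terms.items())
        for pc, terms in LOADINGS.items()
    }
```

Calling pc_scores(band_areas(shift, intensity)) for each preprocessed spectrum would give one (PC1, PC2, PC3) triple per sample, which is the kind of coordinate a score scatter plot displays.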
Table 2: Eigenvalues and contribution ratios of principal components
PC | Eigenvalue | Percentage of variance (%) | Cumulative (%)
Figure 5: Score scatter plot for the first two PCs of rice grain samples
3.4 Hierarchical cluster analysis (HCA)
One preliminary way to study data is to explore the natural groupings among the samples. HCA was used to perform a preliminary scan of the data and to uncover its underlying structure. The dendrogram in figure 6 shows the clustering pattern of the data set of 32 samples. The rice samples were segregated into four clusters: G1, G2, G3, and G4. The G1 cluster contained the Seng Cu samples (MV01 to MV16), while the G2 cluster contained the Tam rice samples (T01 to T08). The G4 cluster consisted of the Ki Deo samples. Notably, the G3 cluster, which contained the sticky rice samples, split into two sub-clusters (G31, G32) when the dendrogram was cut at a distance of about 100000 (brown line, figure 6). This splitting of the sticky rice samples into sub-clusters may be related to differences in sticky rice type and collection region. The HCA results in table 3 show that the distances between clusters are very large (> 100000). Specifically, clusters 1 and 2 are far from clusters 3 and 4, which may indicate that sample groups 1 and 2 are normal rice, while groups 3 and 4 are sticky-type rice with high amylopectin content. These results show that the HCA algorithm is suitable for an initial grouping of the data, but it cannot evaluate or reveal the main components that contribute to the classification of rice varieties.
Table 3: Distances between clusters
Figure 6: Hierarchical cluster analysis (HCA) dendrogram for concatenated data obtained from Raman spectra of rice samples. Colors indicate proposed groupings
3.5 K-nearest neighbor (KNN)
The K-nearest neighbor method classifies data by measuring the distances between data points. In this study, K was 4 and the cosine distance was used. The PCA-KNN classification model was established by using the variables obtained from the PCA of the original data as the input of the KNN method. The classification results are shown in table 4. The classification results are good, with an accuracy of approximately 90 %.
Table 4: Classification of sample groups by the KNN algorithm
Sample | Membership | Sample | Membership
4 CONCLUSIONS
The results described in this study open the possibility of differentiating rice varieties by Raman spectroscopy combined with multivariate analysis methods such as PCA, HCA, and KNN. The spectra showed that Raman spectroscopy reflects the sensitive fundamental vibrations of less polar groups and bonds in rice. The combination of SNV and SGD2 in Raman spectra preprocessing enhances the ability to classify rice varieties. All three algorithms, PCA, HCA, and KNN, classify rice varieties well, but PCA can also show the characteristic bands that contribute most to the classification. Therefore, the Raman technique, being non-destructive and cost-efficient, is suitable for determining rice varieties, especially as a fast screening tool for rice producers and regulatory authorities.
Acknowledgment. We are grateful for funding support from project TDNDTP.03/19-21.
REFERENCES
1. Bhattacharya S., S. Tyagi, S. Srisuma, D. L. DeMeo, S. D. Shapiro, R. Bueno, E. K. Silverman, J. J. Reilly, T. J. Mariani. Peripheral blood gene expression profiles in COPD subjects, Journal of Clinical Bioinformatics, 2011, 1(1), 12.
2. Maione C., B. L. Batista, A. D. Campiglia, F. Barbosa, R. M. Barbosa. Classification of geographic origin of rice by data mining and inductively coupled plasma mass spectrometry, Computers and Electronics in Agriculture, 2016, 121, 101-107.
3. Tokalıoğlu Ş., B. Çiçek, N. İnanç, G. Zararsız, A. Öztürk. Multivariate Statistical Analysis of Data and ICP-MS Determination of Heavy Metals in Different Brands of Spices Consumed in Kayseri, Turkey, Food Analytical Methods, 2018, 11(9), 2407-2418.
4. Korenaga T. Traceability Studies for Analyzing the Geographical Origin of Rice by Isotope Ratio Mass Spectrometry, Bunseki Kagaku, 2014, 63, 233-244.
5. Monakhova Y., D. Rutledge, A. Roßmann, H.-U. Waiblinger, M. Mahler, M. Ilse, T. Kuballa, D. Lachenmeier. Determination of rice type by 1H NMR spectroscopy in combination with different chemometric tools, Journal of Chemometrics, 2014, 28, 83-92.
6. Sampaio P., A. Soares, A. Castanho, A. S. Almeida, J. Oliveira, C. Brites. Dataset of Near-infrared spectroscopy measurement for amylose determination using PLS algorithms, Data Brief, 2017, 15, 389-396.
7. Wu Z., J. Long, E. Xu, F. Wang, X. Xu, Z. Jin, A. Jiao. A Feasibility Study on the Evaluation of Quality Properties of Chinese Rice Wine Using Raman Spectroscopy, Food Analytical Methods, 2016, 9(5), 1210-1219.
8. Xu M.-L., Y. Gao, X. X. Han, B. Zhao. Detection of Pesticide Residues in Food Using Surface-Enhanced Raman Spectroscopy: A Review, Journal of Agricultural and Food Chemistry, 2017, 65(32), 6719-6726.
9. Pandey R., S. K. Paidi, T. A. Valdez, C. Zhang, N. Spegazzini, R. R. Dasari, I. Barman. Noninvasive Monitoring of Blood Glucose with Raman Spectroscopy, Acc. Chem. Res., 2017, 50(2), 264-272.
10. Junior B. R. A., F. L. F. Soares, J. A. Ardila, L. G. C. Durango, M. R. Forim, R. L. Carneiro. Determination of B-complex vitamins in pharmaceutical formulations by surface-enhanced Raman spectroscopy, Spectrochim. Acta A Mol. Biomol. Spectrosc., 2018, 188, 589-595.
11. Jinyoung Hwang, S. K., Kangjin Lee, Hoeil Chung. Enhanced Raman spectroscopic discrimination of the geographical origins of rice samples via transmission spectral collection through packed grains, Talanta, 2012, 101, 488-494.
12. Zhu L., J. Sun, G. Wu, Y. Wang, H. Zhang, L. Wang, H. Qian, X. Qi. Identification of rice varieties and determination of their geographical origin in China using Raman spectroscopy, Journal of Cereal Science, 2018, 82, 175-182.
13. Gautam R., S. Vanga, F. Ariese, S. Umapathy. Review of multidimensional data processing approaches for Raman and infrared spectroscopy, EPJ Techniques and Instrumentation, 2015, 2(1).
14. Savitzky A., M. J. E. Golay. Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem., 1964, 36, 1627-1639.
15. Liland K. H., A. Kohler, N. K. Afseth. Model-based pre-processing in Raman spectroscopy of biological samples, Journal of Raman Spectroscopy, 2016, 47(6), 643-650.
16. Granato D., J. S. Santos, G. B. Escher, B. L. Ferreira, R. M. Maggio. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective, Trends in Food Science & Technology, 2018, 72, 83-90.
17. Murakami K., N. Shinozaki, A. Fujiwara, X. Yuan, A. Hashimoto, H. Fujihashi, H.-C. Wang, M. B. E. Livingstone, S. Sasaki. A Systematic Review of Principal Component Analysis–Derived Dietary Patterns in Japanese Adults: Are Major Dietary Patterns Reproducible Within a Country?, Advances in Nutrition, 2019, 10(2), 237-249.
18. Nielsen. Hierarchical Clustering, Introduction to HPC with MPI for Data Science, 2016, 195-211.
19. Aman Kataria, M. D. S. A Review of Data Classification Using K-Nearest Neighbour Algorithm, International Journal of Emerging Technology and Advanced Engineering, 2013, 3(6), 354-360.
20. Kanj S., F. Abdallah, T. Denœux, K. Tout. Editing training data for multi-label classification with the k-nearest neighbor rule, Pattern Analysis and Applications, 2016, 19(1), 145-161.
21. Feng X., Q. Zhang, P. Cong, Z. Zhu. Preliminary study on classification of rice and detection of paraffin in the adulterated samples by Raman spectroscopy combined with multivariate analysis, Talanta, 2013, 115, 548-55.
22. Tian F., F. Tan, H. Li. An rapid nondestructive testing method for distinguishing rice producing areas based on Raman spectroscopy and support vector machine, Vibrational Spectroscopy, 2020, 107.
Corresponding author: Le Truong Giang
Institute of Chemistry, Vietnam Academy of Science and Technology
18, Hoang Quoc Viet, Cau Giay, Hanoi 10000, Viet Nam
Tel: +84- 98-585-9795, E-mail: hoasinhmoitruong.vast@gmail.com