
Identification of rice varieties specialties in Vietnam using Raman spectroscopy

Le Truong Giang 1,2*, Pham Quoc Trung 1, Dao Hai Yen 1

1 Institute of Chemistry, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam

2 Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi 10000, Vietnam

Received February 20, 2019, Accepted April 17, 2020

Abstract

The characteristics and quality of rice are significantly affected by its variety. However, discriminating between varieties is an urgent but difficult and time-consuming task in Vietnam. In this study, an effective and reliable identification method was established using Raman spectroscopy (RS). Raman spectra of 32 rice samples were acquired from 400 to 1600 cm⁻¹, and the sensitive fundamental vibrations of less polar groups and bonds in rice were analyzed. Initially, the raw Raman spectra were processed by standard normal variate (SNV) combined with Savitzky [...] confirmed by principal component analysis (PCA). Next, three multivariate analysis methods, PCA, hierarchical cluster analysis (HCA), and K-nearest neighbor (KNN), were compared on their ability to classify rice varieties. All three methods classified the four rice varieties very well. PCA identified four main factors (starch chains, amylose, amylopectin, and protein content) that distinguish the four rice varieties, whereas HCA only distinguished well between rice with high and low amylopectin content and did not provide the main components.

Keywords: Rice varieties, Raman spectroscopy, PCA, HCA, KNN.

1 INTRODUCTION

Rice is an important food for more than half of the world's population. It provides the body with energy and nutrients in the form of carbohydrates, proteins, vitamins, and various trace elements.[1] Vietnam is known as a leading rice-exporting country, with many kinds of high-quality rice such as ST25, Huong Lai, Tam, and Seng Cu. These specialty rice types have a higher economic value than conventional rice types. In recent years, some traders have changed their product labels or mixed different types of rice for profit. This has seriously harmed specialty rice brands, consumers, and businesses. It is therefore of great significance to protect products with geographical indications through reliable identification and classification.

Over the last decade, several methods have been described for the traceability of rice. These methods include detecting differences in inorganic, organic, and flavor components.[2,3] A few studies have used stable isotope methods to differentiate rice from different regions such as Vietnam, Japan, and China.[4] In general, chemical properties play an essential role in defining rice types. However, these techniques still have many drawbacks, including long detection times, high cost, and destruction of the sample. In recent years, non-destructive and rapid detection methods have therefore become important. For example, low-field nuclear magnetic resonance (NMR) and near-infrared (NIR) spectroscopy combined with chemometric methods have been reported as approaches for classifying rice.[5,6] Like NMR and NIR, Raman spectroscopy is a fast and non-destructive method that identifies materials based on the frequencies of their molecular vibrations.[7] Different components have different molecular rotational and vibrational energy levels, which appear as differences in the Raman shift. Therefore, each component in a material is characterized by its own specific spectrum. Notably, Raman spectroscopy is particularly useful for water-rich samples compared to infrared spectroscopy. For example, Raman spectroscopy has been used to detect pesticide residues in foods,[8] glucose in blood,[9] vitamins,[10] etc. Moreover, the adulteration of cooking oil with mineral oil has been detected using Raman and near-infrared spectra. In a study of rice collected from different agricultural areas in Korea, Hwang and colleagues used Raman spectra to determine the geographical origin of rice grains.[11] Currently, there is no specific report on the classification of different varieties of Vietnamese rice. In this research, we aimed to: (i) analyze and compare the characteristics of the Raman spectra of samples of different rice varieties; and (ii) pretreat the Raman spectra and use multivariate analysis methods such as PCA, KNN, and HCA to evaluate and identify rice varieties.

2 MATERIALS AND METHODS

2.1 Materials

A total of 32 samples were studied: 16 Seng Cu rice (MV), 8 Tam rice (T), 4 Ki Deo rice (K), and 4 sticky rice (N). The samples comprised different rice types cultivated in diverse geographical regions of Vietnam. Each sample was washed with deionized water and dried at 40 °C to constant weight; the rice kernels were then ground with a laboratory mill (LM-3100, Perten, Sweden) to obtain a fine powder.[12]

2.2 Methods

2.2.1 Spectral collection method

A LabRAM HR Evolution instrument (HORIBA Jobin Yvon S.A.S., France) was used to collect the Raman spectra of the rice samples. The instrument was set up as follows: 50x objective lens, 20 mW laser power, and 1.5 cm⁻¹ resolution, at room temperature (25 °C) and a relative humidity below 60 %. The excitation wavelength and acquisition time were 632.8 nm and 30 s, respectively, with a scanning range from 100 to 1600 cm⁻¹.[12] Each rice sample was scanned three times.

2.2.2 Raman spectra pre-processing

Because the spectra of the samples may be recorded over several days, it is very difficult to calibrate the Raman instrument precisely enough to maintain the same Raman shift axis, laser power, and spectral resolution (which depends on the grating). Before multivariate analysis, the Raman spectra should therefore be treated by methods such as mean centering (MC) or multiplicative scatter correction (MSC).[13] In this study, a Savitzky-Golay smoothing filter[14] with second-order polynomial deconvolution (SGD2), combined with the Standard Normal Variate (SNV) method,[15] was applied to the data to obtain the best results. First, SNV was used to normalize the Raman data of rice samples measured at different times. After that, the spectral data were processed to reduce background noise with a second-order polynomial, 100-point S-G smoothing algorithm.
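The two preprocessing steps described in this section can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `snv` and `snv_sg`, the synthetic spectra, and the 101-point window (an odd length chosen as a conservative stand-in for the 100 points stated above) are assumptions, and SciPy's `savgol_filter` stands in for whatever software was actually used.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard Normal Variate: centre and scale every spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def snv_sg(spectra, window_length=101, polyorder=2):
    """SNV followed by Savitzky-Golay smoothing along the Raman shift axis."""
    return savgol_filter(snv(spectra), window_length, polyorder, axis=1)


# Synthetic stand-in for measured spectra (hypothetical data): one band near
# 479 cm^-1 recorded with three different offsets and gains, plus noise.
rng = np.random.default_rng(0)
shift = np.linspace(100.0, 1600.0, 1501)
band = np.exp(-(((shift - 479.0) / 15.0) ** 2))
offsets = np.array([[500.0], [1500.0], [3000.0]])
gains = np.array([[800.0], [1000.0], [1200.0]])
raw = offsets + gains * band + rng.normal(scale=20.0, size=(3, shift.size))

# Each row of `processed` is normalized and smoothed.
processed = snv_sg(raw)
```

After SNV each spectrum has zero mean and unit standard deviation, so spectra recorded with different offsets and gains become directly comparable before smoothing.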

2.3 Multivariate data analysis

Multivariate analysis methods fall into three main groups: exploration methods, calibration methods, and classification methods.[16] In this paper, the exploration methods principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used to analyze the distribution of the rice samples. Subsequently, the classification method K-nearest neighbor (KNN) was applied and compared to identify the best-fitting model for rice varieties.

2.3.1 Principal component analysis (PCA)

Principal component analysis uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.[12,17] In this study, after pre-processing, all spectral data for the rice samples were subjected to PCA to find patterns in the complex data by reducing its dimensions. Nine principal components (PCs) were found, and a subset of them (referred to as PCk) with a high cumulative contribution rate (> 90 %) was then selected. After that, a distribution chart (score plot) was plotted for the PCk, in which the distance between points represents the magnitude of the difference between samples. The main characteristics that distinguish the rice varieties can then be identified from the loading graph.
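As a rough illustration of this procedure, the sketch below computes principal components with NumPy and counts how many are needed before the cumulative contribution exceeds 90 %. The function names `pca` and `components_needed` and the synthetic 32 x 8 data matrix are assumptions made for the example; the sketch does not reproduce the software or the data used in this study.

```python
import numpy as np


def pca(X):
    """Return scores, loadings (one component per row) and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)                 # mean-centre every variable
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)        # variance carried by each component
    ratio = eigenvalues / eigenvalues.sum()
    scores = U * s                               # sample coordinates on the components
    return scores, Vt, ratio


def components_needed(ratio, threshold=0.90):
    """Smallest number of leading components whose cumulative ratio exceeds threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.argmax(cumulative > threshold)) + 1


# Hypothetical data: 32 samples, 8 variables driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(32, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(32, 8))

scores, loadings, ratio = pca(X)
k = components_needed(ratio)
```

The rows of `loadings` play the role of the loading graph: large absolute entries mark the variables that contribute most to a component.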

2.3.2 Hierarchical cluster analysis (HCA)

Hierarchical clustering starts by treating each Raman spectrum as a separate cluster. It then repeatedly executes the following two steps: (1) the differences between spectra are calculated as squared Euclidean distances; (2) the two most similar clusters are merged using the Ward method.[18]
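The agglomerative procedure above can be sketched with SciPy as follows. The feature matrix is hypothetical (four well-separated groups of five points), and `linkage(..., method="ward")` is used as a stand-in for the clustering software of this study: it takes Euclidean distances as input and merges the pair of clusters that gives the smallest increase in within-cluster variance, which is assumed here to correspond to the Ward method with squared Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical feature matrix: four groups of five samples in two dimensions.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centres])

# Each row of `merges` records one merge: the two clusters joined,
# the distance at which they were joined, and the size of the new cluster.
merges = linkage(features, method="ward")

# Cut the tree so that at most four clusters remain.
labels = fcluster(merges, t=4, criterion="maxclust")
```

Cutting the same tree at a smaller distance yields sub-clusters, which is how a dendrogram can be read at several levels of detail.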


2.3.3 K-nearest neighbor (KNN)

KNN, proposed by Fix and Hodges, calculates the distances between all data points. The K closest neighbors of a sample are then found by sorting the distance matrix.[19] These K closest data points are examined to determine which class label is the most common among them, and that label is assigned to the sample. KNN performs well on multiclass problems.[20]

3 RESULTS AND DISCUSSION

3.1 Spectral analysis

The composition of rice is complex and, moreover, unevenly distributed within the grain. Figure 1 shows the characteristic bands of the different groups. In the Raman spectra of four randomly chosen rice samples, the main bands were at 308, 356, 408, 446, 479, 579, 766, 874, 947, 1001, 1061, 1085, 1127, 1203, 1265, 1341, 1405, and 1460 cm⁻¹. Table 1 assigns each characteristic spectral feature to the vibrational or rotational modes (stretching, bending, torsional fundamental vibrations, etc.) of the different functional groups and to the skeletal modes of the ring component.

Characteristic peaks of the glucose unit in starch were found at 446, 479, and 579 cm⁻¹. The strong band at approximately 479 cm⁻¹ is probably an important skeletal vibration that reflects the degree of crystallinity in rice starch. The Raman fingerprint region from 800 to 1500 cm⁻¹ contains highly overlapping and complex vibrational modes of different functional groups. Polysaccharides, which are condensed from multiple glucose units, can be assigned through the different vibrational states of glucose in this region, such as the deformation vibration of CH2 (or CH3) at 1460 cm⁻¹, CH bending at 1341 cm⁻¹, COH deformation and CO stretching between 1085 and 1127 cm⁻¹, and C1-H deformation at 874 cm⁻¹.[21] In rice, starch, the main component, can be assigned to α-1,4 linkage vibrations (COC stretching) at 947 cm⁻¹, and a slight shift in position may be associated with the α-1,6 linkage of amylopectin.[12] A band near 1265 cm⁻¹ was attributed to a CH2OH deformation vibration, which is closely related to crystalline structures in starch.[22] Other components of rice, such as protein and lipids, were associated with vibrations at 1460 to 1341 cm⁻¹ (CH2 twisting vibration) and with the COH stretching and OH twisting vibrations at 1200 to 1000 cm⁻¹.[21] The bands at 1001 and 1061 cm⁻¹ originated from protein side chains.

Figure 1: Raman spectra of selected rice samples (T05, K01, MV03, and N02)

Figure 2a shows the spectral features of sixteen rice samples randomly chosen from the collected spectral library. Overall, the bands in the spectra are clearly analogous to each other, which suggests that the samples have similar compositions. However, the band intensities differ between rice samples because amylopectin branching and amylose chain lengths differ among the cultivars and varieties. Since these differences cannot be confidently identified by visual inspection of the spectra, a clearer method of differentiation is needed. Thus, multivariate analysis methods such as PCA, HCA, and KNN were combined with the Raman spectra to further interpret the data.

3.2 Preprocessing of Raman spectra

Pretreatment is performed to eliminate the effects of unevenness, baseline offset, and noise in the spectral data collected for the rice samples. In this study, before further spectral processing, all spectra were pre-processed as described in Section 2.2.2. In the raw Raman spectra (figure 2a), the background signal fluctuates dramatically, from 500 to approximately 3000 counts, and the background noise is relatively large. The opposite is true for the corrected spectra (figure 2b), which show that the background interference and baseline drift of the raw spectra have been effectively eliminated.

In this study, PCA was applied to both the original and the corrected spectra of rice grains for classification (with six samples selected from the Seng Cu and Tam rice). The results are shown in figure 3. Overall, it is obvious that the rice grains were not classified before baseline correction (figure 3a). In more detail, the PC2 scores differed clearly among samples of the same group: the PC2 scores of MV05 (0.365) and T01 (-0.183) differ greatly from those of the other samples in their respective groups. After baseline correction, by contrast, the samples were classified into two groups corresponding to the Seng Cu and Tam varieties (figure 3b).

Table 1: Raman band assignments for rice

Wavenumber

579

C-O bending vibration

Skeletal modes C–C stretch Skeletal modes of the pyranose ring

Glucan

766 O=C-N deformation vibration and OH

linkage (C–O–C)

Glycogen and branched-chain

starch

1061 C–C stretching

1127 C-O stretching vibration and C-O-H flexural

1265 Amide III band C-N stretching vibration peak

1460 C-H In-plane bending vibration and CH2 and

Figure 2: Comparison of Raman spectra of four rice varieties: (a) raw Raman data; (b) after preprocessing by SNV-SGD2


Figure 3: Score scatter plot for the first two PCs of rice samples: (a) raw data; (b) after preprocessing using SNV-SGD2

These results suggest that the factors that interfere with the acquisition of Raman spectra were effectively eliminated by SNV-SGD2, which also improved the ability to classify rice varieties. Therefore, SNV-SGD2 pre-treatment is necessary when distinguishing rice varieties by Raman spectroscopy.

3.3 Principal Components Analysis

Clearly, it is impossible to distinguish rice varieties based on only one factor, because the differences between the amylose or amylopectin signals in the spectra of rice samples are not clear. Therefore, it is necessary to evaluate the signals of all rice components (amylose, amylopectin, protein, and lipid) in order to discriminate between rice varieties.

Figure 4: Full-scale Raman spectra of four rice varieties after preprocessing by SNV-SGD2

Principal component analysis was used in this study to discriminate among the four rice varieties. The input data were the peak areas in the following characteristic Raman shift ranges: S1 (420-450 cm⁻¹); S2 (470-560 cm⁻¹); S3 (570-580 cm⁻¹); S4 (710-720 cm⁻¹); S5 (860-880 cm⁻¹); S6 (920-980 cm⁻¹); S7 (1000-1200 cm⁻¹); and S8 (1300-1400 cm⁻¹) (figure 4). The PCA results indicate that the first nine principal components (PCs) explained 100 % of the variance of the data (table 2). PC1 represented 73.62 % of the variance in the Raman spectra, whereas PC2 accounted for 22.21 % and PC3 for 1.74 %. The cumulative variance of PC1 to PC3 was 97.57 % (> 90 %); hence PC1, PC2, and PC3 were analyzed further. The relationship between the variables and the principal components is shown in equations (1), (2), and (3).

PC1 = 0.35S1 + 0.37S2 + 0.38S3 + 0.33S4 + 0.26S5 + 0.37S6 + 0.38S7 (1)

PC2 = -0.21S1 - 0.22S2 + 0.32S4 + 0.52S5 + 0.69S8 (2)

PC3 = 0.72S1 + 0.13S2 - 0.41S4 + 0.32S8 (3)

Applying PCA to all Raman shift ranges revealed the major characteristic bands that contribute significantly to the classification of rice varieties. The main bands that distinguish the rice varieties are indicated in equations (1)-(3) by the loading factors of each component from PC1 to PC3.
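To make equations (1)-(3) concrete, the sketch below evaluates the three component scores for a vector of band areas S1 to S8. The coefficients are taken from the equations; treating the terms that do not appear in an equation as zero, the function name `pc_scores`, and the example inputs are assumptions for illustration. Whether the band areas are centred or scaled before the equations are applied is not stated in the text, so the inputs are used as given.

```python
import numpy as np

# Rows: PC1, PC2, PC3. Columns: S1 ... S8.
# Terms absent from equations (1)-(3) are assumed to have a zero coefficient.
LOADINGS = np.array([
    [0.35, 0.37, 0.38, 0.33, 0.26, 0.37, 0.38, 0.00],
    [-0.21, -0.22, 0.00, 0.32, 0.52, 0.00, 0.00, 0.69],
    [0.72, 0.13, 0.00, -0.41, 0.00, 0.00, 0.00, 0.32],
])


def pc_scores(band_areas):
    """Evaluate PC1, PC2 and PC3 for one vector of eight band areas."""
    areas = np.asarray(band_areas, dtype=float)
    if areas.shape[-1] != 8:
        raise ValueError("expected eight band areas, S1 to S8")
    return areas @ LOADINGS.T


# Hypothetical inputs: a unit area in S1 only, and unit areas in every band.
only_s1 = pc_scores([1, 0, 0, 0, 0, 0, 0, 0])
all_ones = pc_scores(np.ones(8))

# Combined PC1 coefficient of the bands S1 and S2.
s1_s2_loading = LOADINGS[0, :2].sum()
```

With a unit area in S1 only, the scores are simply the S1 coefficients of the three equations, which shows how each band pushes a sample along each component.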

The main characteristic bands were 420-560 cm⁻¹, 860-980 cm⁻¹, 1000-1200 cm⁻¹, and 1300-1400 cm⁻¹, with 420-560 cm⁻¹ showing the strongest correlation (the total loading of S1 and S2 in PC1 was 0.72). This result confirmed that the main starch chains are affected by the rice variety. The other detected bands are related to amylose, amylopectin, and protein content. Therefore, the different quantities or structures of amylose, amylopectin, and protein are also the main reference indices for discriminating Seng Cu, Tam, Ki Deo, and sticky rice. The score scatter plot for the first two PCs is shown in figure 5, which demonstrates that Seng Cu, Tam, Ki Deo, and sticky rice were grouped in different clusters. The results confirmed that PCA separates the four rice varieties into distinct clusters.

Table 2: Eigenvalues and contribution ratios of principal components

PC    Eigenvalue    Percentage of variance (%)    Cumulative (%)

Figure 5: Score scatter plot for the first two PCs of rice grain samples

3.4 Hierarchical cluster analysis (HCA)

A preliminary way to study data is to explore the natural groupings among the samples. HCA was used to perform a preliminary scan of the data and to uncover its underlying structure. The dendrogram in figure 6 shows the clustering pattern of the data set of 32 samples. The rice samples were segregated into four clusters: G1, G2, G3, and G4. The G1 cluster contained the rice samples of the Seng Cu variety (MV01 to MV16), while the G2 cluster contained the Tam rice samples (T01-T08). The G4 cluster consisted of the Ki Deo rice samples. Notably, the G3 cluster contained the sticky rice varieties and split into two sub-clusters (G31, G32) when the distance from the cluster center was set at about 100000 (brown line, figure 6). This splitting of the sticky rice samples into sub-clusters may be related to differences in the type of sticky rice and in the region of the collection sites. The results of the HCA in table 3 show that the distance between clusters is very large (> 100000). Specifically, clusters 1 and 2 are far from clusters 3 and 4, which may indicate that sample groups 1 and 2 are normal rice while groups 3 and 4 are sticky-textured rice with high amylopectin content. These results show that the HCA algorithm is suitable for grouping the initial data, but it is not strong enough to evaluate and provide the main components that contribute to the classification of rice varieties.

Table 3: Distance between clusters

Figure 6: Hierarchical cluster analysis (HCA) dendrogram for concatenated data obtained from Raman spectra of rice samples. Colors indicate the proposed groupings.

[Figure 5 plot: PC1 axis; points labeled MV01-MV16, T01-T08, N01-N04, and K01-K04; groups G1-G4]

[Figure 6 plot: distance axis from 0 to 500000 against observations; clusters G1, G2, G3 (G31, G32), and G4]


3.5 K-nearest neighbor (KNN)

The K-nearest neighbor method classifies data by measuring the distances between data points. In this study, K was set to 4 and the cosine distance was used. The PCA-KNN classification model was established by using the variables obtained from the PCA of the original data as the input of the KNN method. The classification results are shown in table 4. The classification results are good, with an accuracy of approximately 90 %.
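A minimal sketch of a KNN classifier with K = 4 and cosine distance is given below. It illustrates the voting rule only: the function names, the two-dimensional training scores, and the class labels attached to them are hypothetical and are not the data of table 4. Ties in the vote are resolved in favor of the label encountered first among the nearest neighbors, which is a choice made for this example.

```python
from collections import Counter

import numpy as np


def cosine_distance(query, points):
    """Cosine distance (1 - cosine similarity) from one query to every row of points."""
    query = np.asarray(query, dtype=float)
    points = np.asarray(points, dtype=float)
    similarity = (points @ query) / (np.linalg.norm(points, axis=1) * np.linalg.norm(query))
    return 1.0 - similarity


def knn_predict(train_scores, train_labels, query, k=4):
    """Assign the most common label among the k nearest training samples."""
    distances = cosine_distance(query, train_scores)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical PCA scores (PC1, PC2) for three groups of three samples.
train_scores = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.1, 0.0],      # labeled "MV"
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.1],      # labeled "T"
    [-1.0, 0.1], [-0.9, -0.1], [-1.1, 0.0],  # labeled "K"
])
train_labels = ["MV"] * 3 + ["T"] * 3 + ["K"] * 3

predicted = [
    knn_predict(train_scores, train_labels, q)
    for q in ([0.8, 0.1], [0.1, 0.7], [-0.5, 0.0])
]
```

Because cosine distance depends only on the direction of a score vector and not on its length, two samples whose scores differ only by an overall scale factor are treated as identical.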

Table 4: Classification of sample groups by the KNN algorithm

Sample    Membership    Sample    Membership

4 CONCLUSIONS

The results described in this study open the possibility of differentiating rice varieties by Raman spectroscopy combined with multivariate analysis methods such as PCA, HCA, and KNN. The spectral information showed that Raman spectroscopy reflects the sensitive fundamental vibrations of less polar groups and bonds in rice. The combination of SNV and SGD2 in the preprocessing of Raman spectra enhances the ability to classify rice varieties. The three algorithms PCA, HCA, and KNN all classify rice varieties well, but PCA can also show the characteristic bands that contribute most to the classification. Therefore, the Raman technique is suitable for identifying rice varieties in a nondestructive and cost-efficient way, especially as a fast screening tool for rice producers and regulatory authorities.

Acknowledgment. We are grateful for funding support from project TDNDTP.03/19-21.

REFERENCES

1. Bhattacharya S., S. Tyagi, S. Srisuma, D. L. DeMeo, S. D. Shapiro, R. Bueno, E. K. Silverman, J. J. Reilly, T. J. Mariani. Peripheral blood gene expression profiles in COPD subjects, Journal of Clinical Bioinformatics, 2011, 1(1), 12.

2. Maione C., B. L. Batista, A. D. Campiglia, F. Barbosa, R. M. Barbosa. Classification of geographic origin of rice by data mining and inductively coupled plasma mass spectrometry, Computers and Electronics in Agriculture, 2016, 121, 101-107.

3. Tokalıoğlu Ş., B. Çiçek, N. İnanç, G. Zararsız, A. Öztürk. Multivariate Statistical Analysis of Data and ICP-MS Determination of Heavy Metals in Different Brands of Spices Consumed in Kayseri, Turkey, Food Analytical Methods, 2018, 11(9), 2407-2418.

4. Korenaga T. Traceability Studies for Analyzing the Geographical Origin of Rice by Isotope Ratio Mass Spectrometry, Bunseki Kagaku, 2014, 63, 233-244.

5. Monakhova Y., D. Rutledge, A. Roßmann, H.-U. Waiblinger, M. Mahler, M. Ilse, T. Kuballa, D. Lachenmeier. Determination of rice type by 1H NMR spectroscopy in combination with different chemometric tools, Journal of Chemometrics, 2014, 28, 83-92.

6. Sampaio P., A. Soares, A. Castanho, A. S. Almeida, J. Oliveira, C. Brites. Dataset of Near-infrared spectroscopy measurement for amylose determination using PLS algorithms, Data Brief., 2017, 15, 389-396.

7. Wu Z., J. Long, E. Xu, F. Wang, X. Xu, Z. Jin, A. Jiao. A Feasibility Study on the Evaluation of Quality Properties of Chinese Rice Wine Using Raman Spectroscopy, Food Analytical Methods, 2016, 9(5), 1210-1219.

8. Xu M.-L., Y. Gao, X. X. Han, B. Zhao. Detection of Pesticide Residues in Food Using Surface-Enhanced Raman Spectroscopy: A Review, Journal of Agricultural and Food Chemistry, 2017, 65(32), 6719-6726.

9. Pandey R., S. K. Paidi, T. A. Valdez, C. Zhang, N. Spegazzini, R. R. Dasari, I. Barman. Noninvasive Monitoring of Blood Glucose with Raman Spectroscopy, Acc. Chem. Res., 2017, 50(2), 264-272.

10. Junior B. R. A., F. L. F. Soares, J. A. Ardila, L. G. C. Durango, M. R. Forim, R. L. Carneiro. Determination of B-complex vitamins in pharmaceutical formulations by surface-enhanced Raman spectroscopy, Spectrochim. Acta A Mol. Biomol. Spectrosc., 2018, 188, 589-595.

11. Jinyoung Hwang, S. K., Kangjin Lee, Hoeil Chung. Enhanced Raman spectroscopic discrimination of the geographical origins of rice samples via transmission spectral collection through packed grains, Talanta, 2012, 101, 488-494.

12. Zhu L., J. Sun, G. Wu, Y. Wang, H. Zhang, L. Wang, H. Qian, X. Qi. Identification of rice varieties and determination of their geographical origin in China using Raman spectroscopy, Journal of Cereal Science, 2018, 82, 175-182.

13. Gautam R., S. Vanga, F. Ariese, S. Umapathy. Review of multidimensional data processing approaches for Raman and infrared spectroscopy, EPJ Techniques and Instrumentation, 2015, 2(1).

14. Savitzky A., M. J. E. Golay. Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem., 1964, 36, 1627-1639.

15. Liland K. H., A. Kohler, N. K. Afseth. Model-based pre-processing in Raman spectroscopy of biological samples, Journal of Raman Spectroscopy, 2016, 47(6), 643-650.

16. Granato D., J. S. Santos, G. B. Escher, B. L. Ferreira, R. M. Maggio. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective, Trends in Food Science & Technology, 2018, 72, 83-90.

17. Murakami K., N. Shinozaki, A. Fujiwara, X. Yuan, A. Hashimoto, H. Fujihashi, H.-C. Wang, M. B. E. Livingstone, S. Sasaki. A Systematic Review of Principal Component Analysis-Derived Dietary Patterns in Japanese Adults: Are Major Dietary Patterns Reproducible Within a Country?, Advances in Nutrition, 2019, 10(2), 237-249.

18. Nielsen. Hierarchical Clustering, Introduction to HPC with MPI for Data Science, 2016, 195-211.

19. Aman Kataria, M. D. S. A Review of Data Classification Using K-Nearest Neighbour Algorithm, International Journal of Emerging Technology and Advanced Engineering, 2013, 3(6), 354-360.

20. Kanj S., F. Abdallah, T. Denœux, K. Tout. Editing training data for multi-label classification with the k-nearest neighbor rule, Pattern Analysis and Applications, 2016, 19(1), 145-161.

21. Feng X., Q. Zhang, P. Cong, Z. Zhu. Preliminary study on classification of rice and detection of paraffin in the adulterated samples by Raman spectroscopy combined with multivariate analysis, Talanta, 2013, 115, 548-55.

22. Tian F., F. Tan, H. Li. An rapid nondestructive testing method for distinguishing rice producing areas based on Raman spectroscopy and support vector machine, Vibrational Spectroscopy, 2020, 107.

Corresponding author: Le Truong Giang

Institute of Chemistry, Vietnam Academy of Science and Technology

18, Hoang Quoc Viet, Cau Giay, Hanoi 10000, Viet Nam

Tel: +84- 98-585-9795, E-mail: hoasinhmoitruong.vast@gmail.com
