Linear and nonlinear analysis for transduced current curves of electrochemical biosensors



Doctor of Philosophy Dissertation

Linear and Nonlinear Analysis for Transduced Current Curves of Electrochemical Biosensors

Graduate School of Chonnam National University

Department of Computer Engineering

HUYNH TRUNG HIEU

Directed by Professor Yonggwan Won

February 2009


TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
Abstract

CHAPTER I INTRODUCTION
1.1 Statement of the Problem
1.2 Objective and Approach
1.2.1 Overview
1.2.2 Approaches and Contributions
1.2.3 Data Acquisition
1.3 Organization

CHAPTER II LITERATURE REVIEW
2.1 Linear Models
2.1.1 Overview
2.1.2 Parameter Estimation in the Linear Model
a) Minimum Variance Unbiased estimation
b) Maximum Likelihood Estimation (MLE)
c) Least Squares (LS)
d) Linear Bayesian Estimators
2.2 Feedforward Neural Networks
2.2.1 Neural Networks and Feedforward Operation
2.2.2 Gradient-descent based Learning Algorithms
2.2.3 Practical Techniques for Improving Backpropagation
2.2.4 Theoretical Foundations for Improving Backpropagation
2.2.5 Approximation Capabilities of Feedforward networks and SLFNs
2.3 Support Vector Machine

CHAPTER III TRAINING ALGORITHMS FOR SINGLE HIDDEN LAYER FEEDFORWARD NEURAL NETWORKS
3.1 Single Hidden Layer Feedforward Neural Networks
3.2 Extreme Learning Machine (ELM)
3.3 Evolutionary Extreme Learning Machine (E-ELM)
3.4 Least-Squares Extreme Learning Machine
3.4.1 Least-Squares Extreme Learning Machine (LS-ELM)
3.4.2 Online Training with LS-ELM
3.5 Regularized Least-Squares Extreme Learning Machine (RLS-ELM)
3.6 Evolutionary Least-Squares Extreme Learning Machine (ELS-ELM)

CHAPTER IV OUTLIER DETECTION AND ELIMINATION
4.1 Distance-based outlier detection
4.2 Density-based local outlier detection
4.3 The Chebyshev outlier detection
4.4 Area-descent-based outlier detection
4.5 Two-stage area-descent outlier detection
4.6 ELM-based outlier Detection and Elimination

CHAPTER V HEMATOCRIT ESTIMATION FROM TRANSDUCED CURRENT CURVE
5.1 Review of Hematocrit and Previous Measurement Methods
5.1.1 Typical Methods for Measuring Hematocrit
5.1.2 Hematocrit Determination from Impedance
5.1.3 Hematocrit Measurement by Dielectric Spectroscopy
5.2 Hematocrit Estimation from Transduced Current Curve
5.2.1 Transduced Current Curve from Electrochemical Biosensor for Glucose Measurement
5.2.2 Linear Models for Hematocrit Estimation
5.2.3 Neural Network for Hematocrit Estimation
5.2.4 Hematocrit Estimation by Using Support Vector Machine

CHAPTER VI ERROR CORRECTION FOR GLUCOSE BY REDUCING EFFECTS OF HEMATOCRIT
6.1 Effects of Hematocrit on Glucose Measurement
6.2 Error Correction for Glucose Measured by a Handheld Device
6.3 Error Correction for Glucose Computed Using a Single Transduced Current Point

CHAPTER VII DIRECT ESTIMATION FOR GLUCOSE DENSITY FROM TRANSDUCED CURRENT CURVE
7.1 Effects of Critical Care Variables
7.2 Glucose Estimation from the Transduced Current Curve

CHAPTER VIII EXPERIMENTAL RESULTS
8.1 Experimental Results for Hematocrit Estimation
8.2 Experimental Results for Glucose Correction
8.2.1 Error Correction for Glucose Measured by the Handheld Device
8.2.2 Error Correction for Glucose Computed Using a Single Transduced Current Point
8.3 Experimental Results for Direct Estimation for Glucose from the Transduced Current Curve

CHAPTER IX CONCLUSIONS AND FUTURE WORKS
9.1 Conclusions
9.2 Future Works
9.2.1 Feature Selection
9.2.2 Optical Biosensors
9.2.3 Reducing Effects of Other Factors
9.2.4 Applying Improvements of ELM in Medical Diagnosis

REFERENCES
ACKNOWLEDGMENTS
CURRICULUM VITAE


LIST OF FIGURES

Figure 1.1 Overview of the proposed systems: (a) Error correction for glucose values by reducing the effects of hematocrit. (b) Glucose estimation from transduced current curve.
Figure 1.2 The transduced current curve. The first eight seconds may be incubation time which waits for chemical reaction.
Figure 2.1 A typical feedforward neural network.
Figure 2.2 Loss functions can be used in SVR, in which ε-insensitive loss function allows obtaining a sparse set of support vectors.
Figure 2.3 Soft margin loss setting corresponds for a linear SV machine.
Figure 3.1 The architecture of single hidden layer feedforward neural network (SLFN).
Figure 4.1 A simple 2D dataset contains points belonging to two clusters C1 and C2. C1 forms a denser cluster than C2. Two additional points o1 and o2 can be considered as outliers.
Figure 4.2 Detecting outliers by the area descent method.
Figure 4.3 A simple dataset with closed outliers o1 and o2. These outliers cannot be detected by area-descent based method.
Figure 5.1 An example of anodic current curve corresponding to the first 14s. The first 8 seconds may be incubation time, which waits for chemical reaction.
Figure 5.2 Transduced anodic current points used in estimation of hematocrit. They are obtained by sampling the second part of current curve at frequency of 10Hz.
Figure 5.3 Current measurements at the time instants. They seem to be an exponential function of time.
Figure 5.4 Hematocrit estimation by using LRCP approach. Current curve together with its two extra features are the input of linear model.
Figure 5.5 Hematocrit estimation using the neural network model. Input features are current points sampled from the transduced current curve with/without extra features.
Figure 6.1 Effects of Hematocrit on Glucose Measurement: (a) same measured value on current curve but different glucose value, (b) different measured value on current curve but same glucose value.
Figure 6.2 Plot of the paired-differences of glucose measurements by portable device minus the primary reference glucose measurements as function of hematocrit [5].
Figure 6.3 Glucose correction process. Finding a mapping from tm to tc so that dependency of hematocrit is reduced and errors are also reduced.
Figure 6.4 An illustration of glucose correction of handheld devices.
Figure 6.5 An illustration of glucose correction measured from a single point on the transduced current curve.
Figure 6.6 Plot of the primary reference glucose against current point x57. We can diagnose that there would be a linear relationship between the primary reference glucose and current-point xk.
Figure 7.1 Effects of PO2 on glucose measurement by handheld devices [5]. The glucose is underestimated at higher levels of PO2.
Figure 7.2 Effects of PCO2 on glucose measurement by handheld devices [5]. The measured glucose is underestimated at the higher levels of PCO2.
Figure 7.3 Effects of pH on glucose measurement by handheld devices [5].
Figure 7.4 Illustration of estimating glucose from the transduced current curve. Glucose values are estimated directly from multiple current points, which include changing information of the transduced current curve.
Figure 7.5 SLFNs for estimating glucose. Input features are current points sampled from the transduced current curve.
Figure 8.1 Distribution of collected hematocrit. This distribution is fairly representing the general trend of hematocrit values for human.
Figure 8.2 Distribution of glucose collected from YSI 2300.
Figure 8.3 Plot of paired-differences of glucose measurements by handheld device minus the YSI2300 glucose measurements. The dependency of hematocrit on residuals is significant.
Figure 8.4 The paired-differences of a testing set corresponding to glucose measurements by handheld device without error correction. The dependency of hematocrit on residuals is significant.
Figure 8.5 The paired-differences of a testing set corresponding to glucose measurements by handheld device after error correction. The dependency of hematocrit on residuals is reduced significantly.
Figure 8.6 Comparison of glucose results from handheld meter and the primary reference instrument, YSI 2300: (a) before error correction and (b) after error correction.
Figure 8.7 The plot of paired-differences of estimated glucose on the test set minus the YSI 2300 glucose measurements with respect to the hematocrit density. The dependency of hematocrit on residuals is almost removed.
Figure 8.8 The comparison of glucose value between the neural network and the primary reference instrument corresponding to criterion of ±15 mg/dL for glucose levels ≤100 mg/dL and ±15% for glucose levels >100 mg/dL.


LIST OF TABLES

Table 4.1 Symbols and Notations
Table 6.1 Correlation coefficients between the current points and the primary reference glucose
Table 6.2 Correlation test for normality corresponding to time points
Table 8.1 Root mean square errors (RMSE) compared to the reference hematocrit measurements
Table 8.2 Mean percentage error (MPE) compared to the reference hematocrit measurement
Table 8.3 Comparison results for different criteria of error tolerance
Table 8.4 Comparison results on different criteria of error tolerance
Table 8.5 Comparison results on RMSE of approaches
Table 8.6 Comparison results for different criteria of error tolerance


LIST OF ABBREVIATIONS

Abbr Description

BLUE Best linear unbiased estimator

Bmse Bayesian mean square error

BP Backpropagation

C(Rp) Set of all continuous functions defined in the extended Rp

E-ELM Evolutionary extreme learning machine

ELM Extreme learning machine

ELS-ELM Evolutionary least squares extreme learning machine

HCT Hematocrit

KKT Karush-Kuhn-Tucker condition

LMMSE Linear minimum mean square error

LRCP Linear model with Reduced Current Points

LS-ELM Least squares extreme learning machine

LWCP Linear model with Whole Current Points

MCV Mean corpuscular volume

MLE Maximum likelihood estimator

MMSE Minimum mean square error

MP Moore-Penrose generalized inverse

MPE Mean percentage error

MSE Mean square error


MVU Minimum variance unbiased estimator

OS-ELM Online sequential extreme learning machine

PCO2 Carbon dioxide partial pressure

PDF Probability density function

pH A measure of acidity or alkalinity

PO2 Oxygen partial pressure

POCT Point-of-care testing

RBC Red blood cell

RLS-ELM Regularized least-squares extreme learning machine

RMSE Root mean squared error

SLFN Single hidden layer feedforward neural network

SVC Support vector classification

SVD Singular value decomposition

SVM Support vector machine

SVR Support vector regression

ε-SVR Support vector regression with ε-insensitive loss function

TCC Transduced current curve

WBC White blood cell

WGN White Gaussian noise

WHO World Health Organization

WLS Weighted least squares


Linear and Nonlinear Analysis for Transduced Current

Curves of Electrochemical Biosensors

HUYNH TRUNG HIEU

Department of Computer Engineering Graduate School of Chonnam National University (Directed by Professor Yonggwan Won)

Abstract

With advances in science and technology, a wide range of diagnostic tests can now be done quickly and simply without sophisticated laboratory equipment, and biosensors are a key technology in making this possible. They are very useful in medicine and healthcare, as well as in the chemical and biochemical industries, for determining and analyzing complex mixtures and analytes. Biosensors are normally classified and evaluated by design and functional characteristics such as accuracy, cost, availability, range, and simplicity. On these bases, electrochemical biosensors are favored for their accuracy, cost, and availability.

After an enormous amount of effort, electrochemical biosensors for blood glucose measurement have become the most widespread commercial biosensors to date. Used with handheld devices, they let diabetic patients monitor their daily blood glucose levels and keep their blood glucose concentrations at or near normal levels, which can substantially reduce complications due to diabetes. Although handheld devices are convenient for monitoring and controlling blood glucose levels, their accuracy is greatly affected by interferences such as uric acid, ascorbic acid, PO2, PCO2, pH, and hematocrit, of which hematocrit has the strongest influence on glucose measurements by handheld devices. While interferences from oxidizable substances can be reduced by chemical methods, few practical solutions have been proposed to reduce the effects of hematocrit. Moreover, these solutions increase the cost and complexity of manufacturing procedures, and they are very difficult to implement in handheld devices.

This research focuses on developing intelligent computing methods for improving the accuracy of handheld devices that use electrochemical biosensors for glucose measurement. The analytical principle of these biosensors is based on a bio-interaction process, in which an electrochemical current signal called the transduced current is produced by the interaction of blood glucose with glucose oxidase and the oxidation of the reduced form of the enzyme by the electrode. This transduced current changes over time, and its time course is represented by a curve called the transduced current curve (TCC). Handheld devices have used the TCC in some ways to determine the concentration of glucose. However, our research started from the belief that the changing pattern of the TCC carries not only glucose information but also information on various other factors, including interferences. Therefore, analysis of the TCC can play a crucial role in enhancing the performance of measurements made with electrochemical biosensors.


In this research, linear and nonlinear models, including support vector machines (SVM) and neural networks, are investigated for analyzing transduced current curves. They provide suitable methods for determining critical factors such as hematocrit and for improving the accuracy of glucose measurement in whole blood. These models are simple: they require no complicated chemical procedures and add little cost to handheld devices.

Novel methods have been devised for hematocrit estimation from the transduced current curve. The first uses linear models, in which hematocrit is estimated by a linear combination of current points sampled from the transduced current curve. The second uses single hidden layer feedforward neural networks (SLFNs), which are trained by the extreme learning machine (ELM) algorithm and its improvements to obtain compact networks; the input features are again the sampled current points. The third applies the support vector machine (SVM), with support vector regression (SVR) mapping the current points to hematocrit. The hematocrit estimates obtained by these methods are an important factor in reducing or eliminating the effects of hematocrit level on portable glucose meters. In addition, the results show that this clinical indicator can be estimated by cheap handheld devices with a short measuring time.

An approach for improving the accuracy of glucose measurements by reducing the effects of hematocrit in handheld devices has also been investigated. For this purpose, the error distribution function related to hematocrit density was modeled by regression methods. For new samples, this function takes as input the hematocrit density estimated from the changing pattern of the transduced current curve, and the error it returns is subtracted from the value measured by the handheld device. This approach reduced the error significantly on average, although the error for some individual samples can increase.

Finally, another novel approach was devised that measures the glucose value directly from a set of points sampled from the transduced current curve. This approach does not require the intermediate step of estimating the hematocrit density. Several nonlinear regression methods, such as neural networks, were applied, and they produced better performance in glucose measurement. This is because the approach may reduce the error caused not only by hematocrit but also by other critical factors such as PO2, PCO2, and pH.


CHAPTER I INTRODUCTION

1.1 Statement of the Problem

Glucose is a major component derived from carbohydrate foods and is the body's main source of energy. The measurement of glucose in the blood plays an important role in diagnosis and treatment, especially in the effective treatment of diabetes. Typically, there are two types of insulin treatment in diabetic therapy: basal and mealtime. Basal insulin, also called “background” insulin, corresponds to the continuous secretion of the pancreas; it is the insulin that works behind the scenes and is often taken before bed. Mealtime insulin treatment is the injection of additional doses of faster-acting insulin to control fluctuations in blood glucose levels, which arise for different reasons, including the metabolism of sugars and carbohydrates. Such fluctuation control requires accurate measurement of blood glucose levels. Failure to do so can result in extreme complications such as blindness and loss of circulation in the extremities. In addition, Krinsley [1] reported that even a modest degree of hyperglycemia occurring after intensive care unit admission was associated with a substantial increase in hospital mortality in patients with a wide range of medical and surgical diagnoses. Patients with glucose concentrations of 80 to 99 mg/dL had the lowest hospital mortality (9.6%), and it increased up to 27% for patients with glucose concentrations between 100 mg/dL and 119 mg/dL. Further increases in glucose concentration were associated with still higher mortality, with the highest hospital mortality (42.5%) among patients with glucose concentrations exceeding 300 mg/dL (P<0.001). Therefore, accurate measurement of glucose concentrations is important and allows healthcare professionals to intervene in a timely manner in hypoglycemic and hyperglycemic conditions.

There are several methods for measuring the concentration of analytes, such as glucose, in a blood sample. Although laboratory analysis is the most accurate method for determining glucose values, point-of-care testing (POCT) is often used for glucose measurement because of the cost and time delays of laboratory analysis. POCT can give direct readings of blood glucose concentrations in a timely and simple manner. It also allows individuals with diabetes and healthcare professionals to measure and record glucose values frequently with handheld devices. It should be emphasized that the large market for such devices drives their research and development. The American Diabetes Association (ADA) estimated that the national costs of diabetes can increase to 192 billion U.S. dollars in 2020. Recent estimates of the World Health Organization (WHO) also indicated that there were 171 million people in the world with diabetes in the year 2000, and this is projected to increase to 366 million by 2030. In this situation, handheld glucose meters can be major tools for managing the disease. Such a large need promises an adequate return on the development of handheld devices for glucose measurement.

Because they must be portable, handheld devices are limited in measurement accuracy. Many studies have shown that the accuracy of glucose measurement is affected by a number of factors such as PO2, PCO2, pH, and hematocrit [2-8]. Hematocrit, defined as the fractional level of red blood cells in whole blood, is the most influential factor causing erroneous glucose measurements. Glucose results are overestimated at lower hematocrit levels and underestimated at higher hematocrit levels. Therefore, reducing the dependency on influencing factors (especially hematocrit) in glucose measurements plays a crucial role in enhancing the accuracy of handheld devices.

Some approaches try to remove the dependency on influencing factors through additional chemical procedures loaded on the biosensors. However, they complicate the chemical reaction and raise the production cost. Hematocrit can be determined by centrifuging a blood specimen in a tiny capillary tube and comparing the red blood cell level to the total volume on a calibrated scale. In a laboratory, hematocrit can be determined with an automated analyzer that calculates this proportion from other measurements instead of measuring it directly. In addition, hematocrit can also be estimated by dielectric spectroscopy [9] or other techniques. However, these methods cannot be used in handheld devices. Therefore, methods that can be used in handheld devices to reduce the effects of dependent factors or critical care variables on glucose measurements are needed to improve their accuracy.

1.2 Objective and Approach

1.2.1 Overview

In glucose measurement by electrochemical biosensors, the enzyme glucose oxidase (GOD) in the glucose biosensor catalyzes the oxidation of glucose by oxygen to produce gluconic acid and hydrogen peroxide. The reduced form of the enzyme (GO/FADH2) is oxidized to its original state by an electron mediator (ferrocene). The resulting reduced mediator is then oxidized by the active electrode to produce a current; the time course of this current is called the transduced current curve.

Figure 1.1 Overview of the proposed systems: (a) Error correction for glucose values by reducing the effects of hematocrit. (b) Glucose estimation from transduced current curve. [Diagram labels: transduced current curve; glucose with errors (Glu_e); hematocrit; linear and nonlinear systems; corrected glucose (Glu_c).]

We believe that this transduced current curve reflects characteristics of the chemical reaction and therefore contains clues for removing or reducing measurement errors in handheld devices. This research focuses on developing intelligent computational methods (including linear and nonlinear models) to analyze such current curves. These methods can be applied to enhance the accuracy of glucose meters by reducing the influence of critical care variables. They are simple, require no complicated chemical procedures, and add little cost to handheld devices. The goals of this research are to devise, investigate, develop, and demonstrate the following methods:

(1) Error correction for glucose values by reducing the effects of hematocrit. This idea is depicted in Figure 1.1(a). In this method, linear and nonlinear systems are developed to estimate hematocrit from the transduced current curve. A relationship between hematocrit and glucose measurement errors is then approximated and used to reduce the errors in glucose values.

(2) Estimating glucose values directly from the transduced current curves. This method determines glucose values directly from the transduced current curve by using machine-learning techniques. It may reduce the effects of not only hematocrit but also other critical care variables. This idea is depicted in Figure 1.1(b).

Figure 1.2 The transduced current curve. The first eight seconds may be incubation time, which waits for the chemical reaction.

1.2.2 Approaches and Contributions

Figure 1.2 displays a transduced anodic current curve obtained in the first 14 seconds using the glucose biosensors; this current curve is used in our study. The first 8 seconds may be the incubation time that initiates the chemical reaction. We found that this part carries no information, which is a special case for a specific biosensor product rather than a general property. Thus, we concentrate on the second part of the current curve, the following six seconds. Over these six seconds, the transduced current curves are sampled at a frequency of 10 Hz to produce current points. Each transduced current curve yields 59 current points, and these 59 points are used in our study.
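As a rough illustration of the sampling step described above, the following sketch extracts current points from the post-incubation part of a curve. The curve is synthetic (a hypothetical exponential decay), and the function names are illustrative. Sampling strictly inside the 8 s to 14 s window, which yields 59 points at 10 Hz, is an assumption chosen to match the count stated in the text; the actual sampling instants of the acquisition system are not given here.

```python
import math

def sample_current_points(curve, t_start=8.0, t_end=14.0, rate_hz=10):
    """Sample curve(t) at rate_hz strictly inside (t_start, t_end).

    Assumption: both window endpoints are excluded, so a 6 s window
    sampled at 10 Hz gives 59 points (t = 8.1, 8.2, ..., 13.9 s).
    """
    n = int(round((t_end - t_start) * rate_hz)) - 1
    return [curve(t_start + k / rate_hz) for k in range(1, n + 1)]

def synthetic_curve(t):
    """Hypothetical anodic current (arbitrary units), for illustration only."""
    if t < 8.0:
        return 0.0  # incubation period: treated here as carrying no signal
    return 20.0 * math.exp(-0.5 * (t - 8.0)) + 5.0

points = sample_current_points(synthetic_curve)
print(len(points))
```

The resulting list plays the role of the input feature vector used by the linear models, neural networks, and SVM discussed in this dissertation.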

Our first approach is to estimate hematocrit from current points sampled from the transduced current curve. Nonlinear models, including neural networks and the support vector machine, are used, with the 59 current points as input features. The neural network architecture is the single hidden layer feedforward neural network (SLFN), which can be trained by gradient-descent-based algorithms such as backpropagation (BP) or by an effective training algorithm called the extreme learning machine (ELM), recently proposed by Huang et al. [10]. However, to reduce the training time of the networks and the response time of trained networks, we propose new training algorithms for SLFNs that improve on the ELM algorithm. Another method within our first approach uses linear models to estimate hematocrit levels: a linear combination of all current points, or of a smaller number of current points, is determined to approximate hematocrit levels.

An approximation function of the glucose error due to hematocrit is then determined. The glucose error is defined as the difference between glucose values measured by handheld devices and accurate glucose values from the primary reference methods. We call this approximation function the error-compensation function. It receives an estimated hematocrit and returns an error-compensation value, which is used to adjust glucose values obtained by a meter with glucose biosensors.
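The error-compensation idea can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions: the data are synthetic, the error is assumed to be a first-order polynomial in hematocrit, and the names `error_compensation` and `correct_glucose` are hypothetical. The regression models actually used are described in later chapters.

```python
import numpy as np

# Synthetic training data (illustrative only, not from the dissertation).
hct = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0])  # hematocrit, %
reference = np.full_like(hct, 120.0)                         # reference glucose, mg/dL
# Assumed meter behavior: reads high at low hematocrit, low at high hematocrit.
measured = reference - 1.5 * (hct - 42.0)

# Glucose error = meter value minus reference value, as a function of hematocrit.
error = measured - reference
coeffs = np.polyfit(hct, error, deg=1)  # assumed first-order regression model

def error_compensation(hct_estimate):
    """Return the modeled glucose error for an estimated hematocrit."""
    return np.polyval(coeffs, hct_estimate)

def correct_glucose(meter_value, hct_estimate):
    """Subtract the modeled error from the meter reading."""
    return meter_value - error_compensation(hct_estimate)

corrected = correct_glucose(measured, hct)
print(np.round(corrected, 3))
```

In practice the hematocrit passed to `correct_glucose` would itself be an estimate obtained from the current points, so any error in that estimate propagates into the corrected glucose value.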

Our other approach is to estimate glucose values directly from current points by using SLFNs. The input features are all 59 current points. As in hematocrit estimation, the training algorithms for SLFNs are backpropagation (BP), ELM, or our improved ELM algorithms.

One of the novel contributions of this research is the development of new training algorithms for SLFNs. Compared with gradient-descent-based algorithms, these algorithms obtain good generalization performance at very high learning speed and overcome problems such as overtraining, learning-rate selection, and local minima. Compared with ELM, our algorithms can yield compact networks with a small number of hidden units, so the trained networks respond quickly to new input patterns. Specifically, they can significantly reduce the number of network parameters. This is very important for saving memory in handheld devices, which have limited storage and computing power.
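For context, the baseline ELM that these algorithms build on can be sketched as follows. This is the basic scheme (random, fixed hidden-layer parameters and output weights obtained from the Moore-Penrose generalized inverse), not the improved algorithms proposed in this dissertation. The toy regression target, the weight ranges, and the number of hidden units are assumptions made for illustration.

```python
import numpy as np

def hidden_output(X, W, b):
    """Sigmoid hidden-layer output matrix H for inputs X."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, rng):
    """Basic ELM: random hidden layer, least-squares output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases, never trained
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T                             # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return hidden_output(X, W, b) @ beta

# Toy one-dimensional regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
T = np.sin(3.0 * X[:, 0])

W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(rmse)
```

Because only `beta` is fitted, training reduces to a single linear solve, which is the source of the learning speed mentioned above; the number of stored parameters grows with `n_hidden`, which is why compact networks matter on memory-limited devices.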

Another key contribution of this research is applying intelligent computational methods, rather than chemical procedures, to the correction of glucose values in handheld devices. New approaches for estimating hematocrit levels are proposed using linear systems, SVM, and neural networks. The potential advantage of these approaches is that hematocrit values are determined from transduced current curves generated in the glucose-measuring process of handheld devices. The estimated hematocrit can be used to adjust glucose values automatically by error compensation, which can reduce the effects of hematocrit. New approaches for glucose estimation using neural networks are also proposed. Their advantage is that the effects of multiple critical care variables can be reduced, which can give better glucose measurement results from handheld devices.

1.2.3 Data Acquisition

The data set used in this study was obtained from blood samples of randomly selected volunteers. The following values were measured for each sample:

- The accurate hematocrit using centrifugation method

- The accurate glucose values using YSI2700 or YSI2300 (Yellow Springs Instrument, Inc, Yellow Springs, Ohio)

- The glucose values using a handheld meter

- The transduced current curves

To measure the transduced current curve changing over time, we developed a special system equipped with modified handheld meters of a type currently on the market.

1.3 Organization

This dissertation consists of two parts. The first part reviews the literature and theories related to our study, including outlier detection and machine learning, and is composed of three chapters. The literature on topics related to our study is reviewed in Chapter 2. In Chapter 3, we present new algorithms for training SLFNs that can obtain good performance with very high speed in both training and testing. Methods for outlier detection and elimination are shown in Chapter 4. The second part covers approaches for error correction by reducing the effects of critical care variables and includes five chapters. Approaches for hematocrit estimation from the transduced current curve, involving linear models, SVM, and neural networks, are presented in Chapter 5. In Chapter 6, we present a method that reduces the effects of hematocrit in glucose measurement. Another approach for improving glucose measurement, which uses neural networks with current-curve input features, is shown in Chapter 7. In Chapter 8, we discuss experimental results. Finally, conclusions and future work are given in Chapter 9.


CHAPTER II LITERATURE REVIEW

This research is related to several topics, including function approximation, outlier detection, and methods for hematocrit estimation and glucose measurement. In this chapter, we review topics related to function approximation, which plays a crucial role in, and is related to, all other topics in our research. These topics include linear and nonlinear systems. First, we cover linear models in Section 2.1. The linear model is one of the simplest models based on linear operators and has important applications in automatic control theory, signal processing, and function approximation. In Section 2.2, we review neural networks, which are widely used in machine learning, especially in regression analysis. They can approximate continuous functions with arbitrarily small error if the activation functions are chosen properly. A brief description of backpropagation training algorithms and techniques for improving them is provided. The approximation capabilities of feedforward networks, especially single hidden layer feedforward neural networks (SLFNs), are emphasized because they provide the neural network architecture used to solve our problems. Section 2.3 reviews the support vector machine (SVM) for regression, or support vector regression (SVR). Its advantage is that a nonlinear function can be learned by a linear machine in a kernel-induced feature space, while the capacity of the system is controlled by parameters that do not depend on the dimensionality of the input space.


2.1 Linear Models

To understand phenomena occurring around us, when observations on a phenomenon can be quantified, one often builds a mathematical model. One of the simplest models is the linear model. Because it is well understood and easy to interpret, its methods of analysis and inference are well developed. Furthermore, many nonlinear models can be reduced to linear models by means of transformation. This section provides a brief description of basic operations in linear models and of their parameter estimation with or without assumptions on the distribution of the errors and parameters.

2.1.1 Overview

In statistics, a linear model is given by

$$t = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p + \varepsilon, \qquad (2.1)$$

where $t$ is the response variable or the dependent variable, and $x_1, x_2, \dots, x_p$ are predictors. In some special cases, the $x_i$ are called independent variables or factors. The coefficients $w_0, w_1, \dots, w_p$ are the parameters of the model, and $\varepsilon$ is the unobservable error. If there are $N$ observations of the response and predictors, the explicit form of the equations would be

$$t_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, N. \qquad (2.2)$$

In matrix notation, the model is written as

T=Xw+ε, (2.3)

where

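$$ T = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{bmatrix}, \quad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_p \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}. $$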
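The matrix form can be made concrete with a small sketch. The helper names `design_matrix` and `predict`, and all numeric values, are hypothetical and chosen only for illustration; the code simply builds one row $[1, x_{i1}, \dots, x_{ip}]$ per observation and evaluates the noise-free part $Xw$.

```python
# Illustrative sketch: helper names and values are hypothetical, not from the dissertation.

def design_matrix(samples):
    """Build X for T = Xw + eps: one row [1, x_i1, ..., x_ip] per observation."""
    return [[1.0] + [float(v) for v in row] for row in samples]


def predict(X, w):
    """Noise-free model output Xw, one value per observation."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N = 3 observations, p = 2 predictors
w = [0.5, 2.0, -1.0]                             # w0, w1, w2 (made-up values)
X = design_matrix(samples)
T_model = predict(X, w)
```

The leading column of ones lets the intercept $w_0$ be treated like any other coefficient.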

To complete the description of the model, some assumptions about the errors are necessary. It is assumed that the errors are uncorrelated, with zero mean and covariance matrix $C_\varepsilon$. These assumptions can be summarized in matrix notation as

$$ E(\varepsilon) = 0, \qquad E(\varepsilon \varepsilon^T) = C_\varepsilon, $$

where $E$ stands for the expected value and $0$ denotes a vector whose elements are all zero.

Besides being used directly as in Equation 2.3, linear models can be used with some modifications in other situations, such as polynomial regression [11], curve fitting, and Fourier analysis. Linear models are used in many applications. An important one is regression analysis, where the dependent variable is modeled as a function of the independent variables. This can be used for prediction, inference, hypothesis testing, and modeling of causal relationships. In addition, linear models can be used in calibration [12], control, estimation of response surfaces [13], and imputation of missing data [14].

2.1.2 Parameter Estimation in the Linear Model

This section presents methods for estimating the parameters $w$ in the linear model: minimum variance unbiased (MVU) estimation, maximum likelihood estimation (MLE), least squares (LS), and linear Bayesian estimation.

a) Minimum Variance Unbiased estimation

A minimum variance unbiased (MVU) estimator guarantees that, on average, it attains the true value and that the mean squared deviation from the true value is minimized. This means that

$$ E(\hat{w}) = w, \quad \text{and} \quad E\left[(w - \hat{w})^2\right] \text{ is minimized.} $$

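One of the important theorems for this estimator is the Cramer-Rao lower bound (CRLB) theorem [15]: if the equality constraint

$$ \frac{\partial \ln p(T; w)}{\partial w} = I(w)\,\big(\psi(T) - w\big), $$

where $p(T; w)$ is the PDF of the data, is satisfied for some functions $I$ and $\psi$, then the MVU estimator exists and is given by $\hat{w} = \psi(T)$, with minimum variance $I^{-1}(w)$.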

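Assuming that the linear model (2.3) has white Gaussian noise (WGN), that is, $C_\varepsilon = \sigma^2 I$, the probability density function (PDF) of the data is given by

$$ p(T; w) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}(T - Xw)^T (T - Xw)\right]. $$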

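Using the identities

$$ \frac{\partial (b^T w)}{\partial w} = b, \qquad \frac{\partial (w^T A w)}{\partial w} = 2Aw $$

for a vector $b$ and a symmetric matrix $A$, we have

$$ \frac{\partial \ln p(T; w)}{\partial w} = \frac{1}{\sigma^2}\left(X^T T - X^T X w\right) = \frac{X^T X}{\sigma^2}\left[(X^T X)^{-1} X^T T - w\right]. $$

By the CRLB theorem, and provided that $X^T X$ is invertible, the MVU estimator is therefore $\hat{w} = (X^T X)^{-1} X^T T$, with covariance matrix $\sigma^2 (X^T X)^{-1}$.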

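In situations where the error has zero mean and an arbitrary covariance, $\varepsilon \sim \mathcal{N}(0, C_\varepsilon)$, and we restrict the estimator to be linear in the data $T$, the best linear unbiased estimator (BLUE) of $w$ can be used. By the Gauss-Markov theorem [15], the BLUE of $w$ is given by

$$ \hat{w} = (X^T C_\varepsilon^{-1} X)^{-1} X^T C_\varepsilon^{-1} T, \tag{2.10} $$

and the minimum variance of $\hat{w}_i$ is

$$ \operatorname{var}(\hat{w}_i) = \left[(X^T C_\varepsilon^{-1} X)^{-1}\right]_{ii}. $$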
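As a hedged sanity check (an illustration added here under stated assumptions, not an example from the dissertation), consider the simplest linear model with no predictors, $t_i = w_0 + \varepsilon_i$, so that $X$ is a column of $N$ ones, written $\mathbf{1}$ below, and assume $C_\varepsilon = \sigma^2 I$. Substituting into the standard Gauss-Markov form $\hat{w} = (X^T C_\varepsilon^{-1} X)^{-1} X^T C_\varepsilon^{-1} T$ gives the sample mean:

$$ \hat{w}_0 = \left(\frac{\mathbf{1}^T \mathbf{1}}{\sigma^2}\right)^{-1} \frac{\mathbf{1}^T T}{\sigma^2} = \frac{1}{N}\sum_{i=1}^{N} t_i, \qquad \operatorname{var}(\hat{w}_0) = \left(\frac{\mathbf{1}^T \mathbf{1}}{\sigma^2}\right)^{-1} = \frac{\sigma^2}{N}. $$

The variance shrinks as $1/N$, which matches the familiar behavior of averaging independent measurements.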

b) Maximum Likelihood Estimation (MLE)

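An alternative to the MVU estimator is based on the maximum likelihood principle. It is desirable in situations where the MVU estimator does not exist or cannot be found even though it exists. The MLE approximates the MVU estimator because it is approximately efficient, and its performance is optimal for large enough data records. Assuming that $\varepsilon$ has the PDF $\mathcal{N}(0, C_\varepsilon)$, the PDF of the data is given by

$$ p(T; w) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(C_\varepsilon)} \exp\left[-\frac{1}{2}(T - Xw)^T C_\varepsilon^{-1} (T - Xw)\right]. $$

The MLE of $w$ maximizes this PDF, which is equivalent to minimizing

$$ J(w) = (T - Xw)^T C_\varepsilon^{-1} (T - Xw). \tag{2.14} $$

Since this is a quadratic function of the elements of $w$ and $C_\varepsilon^{-1}$ is a positive definite matrix, differentiation will produce the global minimum. The partial derivative is

$$ \frac{\partial J(w)}{\partial w} = -2 X^T C_\varepsilon^{-1} T + 2 X^T C_\varepsilon^{-1} X w, $$

and setting it to zero gives $\hat{w} = (X^T C_\varepsilon^{-1} X)^{-1} X^T C_\varepsilon^{-1} T$.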

c) Least Squares (LS)

The previous methods tried to find an optimal or nearly optimal estimator by considering the class of unbiased estimators and determining the one with minimum variance. In this section, we review a class of estimators that in general have no optimality properties associated with them but make good sense for many problems. A salient property of this method is that only a data model is assumed, and no probabilistic assumptions about the data are required, so it applies to a broader range of problems. It is widely used in practice because minimizing a least squares criterion is easy to implement.

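Unlike MVU estimators, which try to be unbiased and have minimum variance, the least squares (LS) approach tries to minimize the squared difference between the given data and the assumed model. For observations $X$, $T$ assumed to be linear in the unknown parameters $w$, the least squares estimate of $w$ is found by minimizing

$$ J(w) = \sum_{i=1}^{N} (t_i - w_0 - w_1 x_{i1} - \dots - w_p x_{ip})^2 = (T - Xw)^T (T - Xw), $$

which gives $\hat{w} = (X^T X)^{-1} X^T T$. This estimate is identical to the MVU estimation as well as to the MLE that assumes white Gaussian noise. The minimum least squares error is found by

$$ J_{\min} = J(\hat{w}) = (T - X\hat{w})^T (T - X\hat{w}). $$

In other forms, we have

$$ J_{\min} = T^T (T - X\hat{w}) = T^T T - T^T X \hat{w}. $$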
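A minimal sketch of the LS estimate follows, assuming the normal-equation form $X^T X w = X^T T$ and a small, well-conditioned problem. The function names and the data are hypothetical; the plain Gaussian-elimination solver is used only to keep the example dependency-free and is not meant as a recommendation for large or ill-conditioned problems.

```python
# Illustrative sketch: function names and data are hypothetical.

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def least_squares(X, T):
    """Return (w_hat, J_min) for J(w) = (T - Xw)^T (T - Xw)."""
    p1 = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p1)] for a in range(p1)]
    XtT = [sum(row[a] * t for row, t in zip(X, T)) for a in range(p1)]
    w_hat = solve(XtX, XtT)                      # normal equations X^T X w = X^T T
    residuals = [t - sum(x * w for x, w in zip(row, w_hat)) for row, t in zip(X, T)]
    return w_hat, sum(r * r for r in residuals)


# Made-up data near the line t = 1 + 2x, with the last point nudged off the line.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
T = [1.0, 3.0, 5.0, 7.5]
w_hat, J_min = least_squares(X, T)
```

For these four points the fitted intercept and slope are 0.9 and 2.15, and the minimum squared error is 0.075.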

In situations where the error variance differs across ranges of the predictor (independent variable) values, an extension of LS estimation called weighted least squares (WLS) estimation should be used. WLS introduces weighting factors $\alpha_i$ into the error criterion to emphasize the contributions of individual data samples:

$$ J(w) = \sum_{i=1}^{N} \alpha_i\, (t_i - w_0 - w_1 x_{i1} - \dots - w_p x_{ip})^2. $$
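The effect of the weighting factors can be sketched for a single predictor, where the weighted normal equations are 2x2 and have a closed-form solution. The function name, the data, and the choice of weights are hypothetical and serve only as an illustration.

```python
# Illustrative sketch: names, data, and weights are hypothetical.

def weighted_line_fit(x, t, alpha):
    """Minimize sum_i alpha_i * (t_i - w0 - w1 * x_i)^2 for a single predictor."""
    s = sum(alpha)
    sx = sum(a * xi for a, xi in zip(alpha, x))
    st = sum(a * ti for a, ti in zip(alpha, t))
    sxx = sum(a * xi * xi for a, xi in zip(alpha, x))
    sxt = sum(a * xi * ti for a, xi, ti in zip(alpha, x, t))
    det = s * sxx - sx * sx          # determinant of the 2x2 weighted normal equations
    w1 = (s * sxt - sx * st) / det
    w0 = (st - w1 * sx) / s
    return w0, w1


x = [0.0, 1.0, 2.0, 3.0]
t = [1.0, 3.0, 5.0, 9.0]             # the last sample is assumed to be unreliable
uniform = weighted_line_fit(x, t, [1.0, 1.0, 1.0, 1.0])
down_weighted = weighted_line_fit(x, t, [1.0, 1.0, 1.0, 0.0])
```

With uniform weights the fit is pulled toward the last sample; giving that sample zero weight recovers the line $t = 1 + 2x$ through the remaining three points.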

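The LS approach also has a geometric interpretation. Writing the model output as a weighted sum of the column vectors $x_i$ of $X$, we have

$$ Xw = \sum_{i=1}^{p} w_i x_i. $$

This shows that the response is modeled as a linear combination of the vectors $\{x_1, x_2, \dots, x_p\}$. Furthermore, if we define the Euclidean norm by $\|\xi\| = \sqrt{\xi^T \xi}$, then the LS error can also be written as

$$ J(w) = \|T - Xw\|^2 = \left\| T - \sum_{i=1}^{p} w_i x_i \right\|^2. $$

Thus, the LS approach tries to minimize the squared distance from the data vector $T$ to $\sum_{i=1}^{p} w_i x_i$, and this problem can be considered as the problem of fitting or approximating a vector $T$ in $R^N$ by the vector $\sum_{i=1}^{p} w_i x_i$.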

d) Linear Bayesian Estimators

The previous methods assumed that the parameter $w$ of the linear model is a deterministic but unknown constant. In this section, we briefly introduce another method, in which the parameter is assumed to be a random variable. This is the Bayesian approach, so called because it is implemented by directly applying Bayes' theorem. If the prior knowledge (prior PDF) of $w$ is given, then we can attempt to find an estimator that minimizes the Bayesian MSE defined by

$$ Bmse(\hat{w}) = E\left[(w - \hat{w})^2\right] = \iint (w - \hat{w})^2\, p(T, w)\, dT\, dw. $$

We use Bayes' theorem to write

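$$ p(T, w) = p(w|T)\, p(T), $$

so that

$$ Bmse(\hat{w}) = \int \left[ \int (w - \hat{w})^2\, p(w|T)\, dw \right] p(T)\, dT. $$

Because $p(T) \ge 0$ for all $T$, minimizing the integral in brackets for each $T$ minimizes the Bayesian MSE, and we have

$$ \hat{w} = E(w|T) = \int w\, p(w|T)\, dw. \tag{2.26} $$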

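Assume that $w \sim \mathcal{N}(E(w), C_{ww})$ and that the noise $\varepsilon \sim \mathcal{N}(0, C_\varepsilon)$ is independent of $w$. The linear model (2.3) then becomes the Bayesian general linear model. In order to estimate $w$ by (2.26), we must have an explicit expression for the posterior PDF $p(w|T)$.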
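The step from the bracketed inner integral of the Bayesian MSE to the conditional-mean estimator can be sketched as follows for a scalar parameter. This is the standard argument, written here as an illustration and assuming that differentiation and integration may be interchanged:

$$ \frac{\partial}{\partial \hat{w}} \int (w - \hat{w})^2\, p(w|T)\, dw = -2 \int w\, p(w|T)\, dw + 2\hat{w} \int p(w|T)\, dw = -2E(w|T) + 2\hat{w}. $$

Setting the derivative to zero gives $\hat{w} = E(w|T)$, and the second derivative equals $2 > 0$, so this stationary point is a minimum.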

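Since $w$ and $\varepsilon$ are independent of each other and each is Gaussian, $T = Xw + \varepsilon$ is also Gaussian, and $T$ and $w$ are jointly Gaussian. A theorem states that if $T$ and $w$ are jointly Gaussian with mean vector $[E(T)^T\ E(w)^T]^T$ and partitioned covariance matrix

$$ C = \begin{bmatrix} C_{TT} & C_{Tw} \\ C_{wT} & C_{ww} \end{bmatrix}, $$

then the posterior PDF $p(w|T)$ is also Gaussian, with mean

$$ E(w|T) = E(w) + C_{wT} C_{TT}^{-1} \big(T - E(T)\big) $$

and covariance $C_{w|T} = C_{ww} - C_{wT} C_{TT}^{-1} C_{Tw}$. For the Bayesian general linear model, $E(T) = X E(w)$, $C_{TT} = X C_{ww} X^T + C_\varepsilon$, and $C_{wT} = C_{ww} X^T$, so that

$$ \hat{w} = E(w|T) = E(w) + C_{ww} X^T (X C_{ww} X^T + C_\varepsilon)^{-1} \big(T - X E(w)\big). $$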

Thus, under the jointly Gaussian assumption, the optimal Bayesian estimators are easily found. However, it is not always possible to make the Gaussian assumption. In this situation, another method, linear MMSE (LMMSE) estimation, is used. It considers the class of all estimators that are linear in the data $T$, with weighting coefficients $A$. We must choose $A$ to minimize

$$ Bmse(\hat{w}_i) = E\left[(w_i - \hat{w}_i)^2\right], \quad i = 0, 1, \dots, p. $$

If we assume that the random parameter vector $w$ has mean $E(w)$ and covariance matrix $C_{ww}$, and that the error vector $\varepsilon$ has zero mean and covariance matrix $C_\varepsilon$ and is uncorrelated with $w$, then the LMMSE estimator of $w$ is also given by

$$ \hat{w} = E(w) + C_{ww} X^T (X C_{ww} X^T + C_\varepsilon)^{-1} \big(T - X E(w)\big). $$

The performance of the estimator is measured by the error $e = w - \hat{w}$, which has zero mean and covariance matrix

$$ C_e = (C_{ww}^{-1} + X^T C_\varepsilon^{-1} X)^{-1}. $$
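To see how the LMMSE estimator blends prior knowledge with data, the sketch below specializes it to a scalar parameter observed as $t_i = w + \varepsilon_i$, so that $X$ is a column of ones, $C_{ww}$ is a scalar prior variance, and $C_\varepsilon = \sigma^2 I$. Under these assumptions the matrix expressions reduce to the closed forms used in the code. The function name and the numbers are hypothetical.

```python
# Illustrative sketch: the scalar setup and all names are hypothetical.

def lmmse_constant(T, prior_mean, prior_var, noise_var):
    """LMMSE estimate of a scalar w from t_i = w + eps_i (X is a column of ones)."""
    n = len(T)
    sample_mean = sum(T) / n
    gain = prior_var / (prior_var + noise_var / n)    # weight given to the data
    w_hat = prior_mean + gain * (sample_mean - prior_mean)
    error_var = 1.0 / (1.0 / prior_var + n / noise_var)
    return w_hat, error_var


w_hat, error_var = lmmse_constant([2.0, 2.0, 2.0, 2.0],
                                  prior_mean=0.0, prior_var=1.0, noise_var=1.0)
```

With four observations the gain is 0.8, so the estimate moves most of the way from the prior mean toward the sample mean, and the error variance drops from the prior value 1.0 to 0.2.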

2.2 Feedforward Neural Networks

Feedforward neural networks have been frequently used in machine learning due to their ability to approximate complex nonlinear mappings directly from input patterns. They can also provide proper classification models for a wide range of problems that are difficult to handle with classical parametric techniques. Several references present the details of neural networks, covering models, principles, operations, properties, and applications [16-22]. This section briefly describes the basic operations of feedforward neural networks, including feedforward operation, gradient-descent based learning algorithms, and practical techniques and theoretical foundations for improving gradient-descent based algorithms. In particular, the approximation capabilities of neural networks are emphasized.

2.2.1 Neural Networks and Feedforward Operation

The feedforward neural network consists of a hierarchy of processing units (Fig. 2.1), organized in a series of two or more layers. The first layer serves as a holding site for the inputs applied to the network and is called the input layer. The last one is called the output layer, at which the overall mapping of the network inputs is available. Between the input layer and the output layer, there are zero or more hidden layers, at which additional remapping or computing takes place. The function of the processing units is loosely based on characteristics of biological neurons, so they are also called neurons. Each neuron or unit at layer $L_l$ has weights connected from the previous lower layer $L_{l-1}$. Let $o^{l-1} = [o^{l-1}_1, \dots, o^{l-1}_{L_{l-1}}]^T$ be the output of the previous layer ($L_{l-1}$) and $w^l_j = [w^l_{j1}, \dots, w^l_{jL_{l-1}}]^T$ be the weights connected from the layer $L_{l-1}$ to the $j$-th unit of layer $L_l$. The net activation $net^l_j$ is formed by the inner product of the previous-layer output and the weights:

$$ net^l_j = (w^l_j)^T o^{l-1}. $$

Each unit emits an output that is a nonlinear mapping of its net activation by an activation function $f$:

$$ o^l_j = f(net^l_j). $$

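Figure 2.1 A typical feedforward neural network (unit $j$ of layer $l$: inputs $o^{l-1}_1, o^{l-1}_2, \dots, o^{l-1}_{L_{l-1}}$, weights $w^l_{j1}, w^l_{j2}, \dots, w^l_{jL_{l-1}}$, bias, net activation $net^l_j$, and output $o^l_j$)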

The activation function $f(\cdot)$ is usually differentiable, nonlinear, and non-decreasing. Two forms of this activation function are often used: $f(net) = \tanh(net)$, whose range is from -1 to 1, and $f(net) = (1 + \exp(-net))^{-1}$, whose range is from 0 to 1.
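The computation performed by a single unit can be sketched as follows. The function names and numeric values are hypothetical, and adding the bias to the inner product is an assumption suggested by the bias input shown in Figure 2.1.

```python
import math

# Illustrative sketch: function names and numbers are hypothetical.

def logistic(net):
    """f(net) = 1 / (1 + exp(-net)), with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))


def unit_output(prev_outputs, weights, bias, f):
    """Net activation (inner product plus bias, an assumption) followed by f."""
    net = sum(w * o for w, o in zip(weights, prev_outputs)) + bias
    return f(net)


prev = [0.5, -1.0, 2.0]          # outputs of the previous layer
w_j = [0.2, 0.4, 0.1]            # weights into unit j
out_tanh = unit_output(prev, w_j, 0.1, math.tanh)
out_logistic = unit_output(prev, w_j, 0.1, logistic)
```

Here the net activation is zero up to rounding, so the two activation functions return values at the centers of their ranges, about 0 for tanh and about 0.5 for the logistic function.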

Mathematically, a multilayer feedforward neural network (FFNN) can be described as a composite application of functions. Each of them represents a particular layer and may be specific to individual units in the layer. Let $o$ be the output of the FFNN corresponding to the input pattern $x$. The FFNN implements a composition of these layer functions that maps $x$ to $o$.
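The composite view can be sketched by representing each layer as a function and chaining them. The layer representation, the helper names `layer` and `compose`, and all weights are hypothetical; a linear output unit is assumed purely for illustration.

```python
import math

# Illustrative sketch: the layer representation and all values are hypothetical.

def layer(weights, biases, f):
    """Return a function mapping a previous-layer output vector to this layer's output."""
    def apply(o_prev):
        return [f(sum(w * o for w, o in zip(w_j, o_prev)) + b_j)
                for w_j, b_j in zip(weights, biases)]
    return apply


def compose(layers):
    """Chain layer functions so the network output is the last layer applied to the rest."""
    def network(x):
        o = x
        for apply in layers:
            o = apply(o)
        return o
    return network


hidden = layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], math.tanh)   # 2 inputs -> 2 units
output = layer([[1.0, 1.0]], [0.0], lambda net: net)               # linear output unit
ffnn = compose([hidden, output])
o = ffnn([1.0, 1.0])
```

For the input $[1, 1]$ the hidden units produce $\tanh(0)$ and $\tanh(1)$, and the linear output unit sums them.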

2.2.2 Gradient-descent based Learning Algorithms

Once an appropriate network architecture is chosen, we must train it for a specific application. The main goal of the training process is to find the network parameters $w$, including weights and biases, that minimize an error function.
