Doctor of Philosophy Dissertation
Linear and Nonlinear Analysis for Transduced Current Curves of Electrochemical Biosensors
Graduate School of Chonnam National University
Department of Computer Engineering
HUYNH TRUNG HIEU
Directed by Professor Yonggwan Won
February 2009
TABLE OF CONTENTS
TABLE OF CONTENTS i
LIST OF FIGURES v
LIST OF TABLES ix
LIST OF ABBREVIATIONS x
Abstract xii
CHAPTER I INTRODUCTION 1
1.1 Statement of the Problem 1
1.2 Objective and Approach 3
1.2.1 Overview 3
1.2.2 Approaches and Contributions 5
1.2.3 Data Acquisition 8
1.3 Organization 8
CHAPTER II LITERATURE REVIEW 10
2.1 Linear Models 11
2.1.1 Overview 11
2.1.2 Parameter Estimation in the Linear Model 13
a) Minimum Variance Unbiased estimation 13
b) Maximum Likelihood Estimation (MLE) 15
c) Least Squares (LS) 16
d) Linear Bayesian Estimators 19
2.2 Feedforward Neural Networks 22
2.2.1 Neural Networks and Feedforward Operation 23
2.2.2 Gradient-descent based Learning Algorithms 24
2.2.3 Practical Techniques for Improving Backpropagation 26
2.2.4 Theoretical Foundations for Improving Backpropagation 27
2.2.5 Approximation Capabilities of Feedforward networks and SLFNs 30
2.3 Support Vector Machine 34
CHAPTER III TRAINING ALGORITHMS FOR SINGLE HIDDEN LAYER FEEDFORWARD NEURAL NETWORKS 43
3.1 Single Hidden Layer Feedforward Neural Networks 43
3.2 Extreme Learning Machine (ELM) 45
3.3 Evolutionary Extreme Learning Machine (E-ELM) 49
3.4 Least-Squares Extreme Learning Machine 51
3.4.1 Least-Squares Extreme Learning Machine (LS-ELM) 51
3.4.2 Online Training with LS-ELM 54
3.5 Regularized Least-Squares Extreme Learning Machine (RLS-ELM) 59
3.6 Evolutionary Least-Squares Extreme Learning Machine (ELS-ELM) 61
CHAPTER IV OUTLIER DETECTION AND ELIMINATION 64
4.1 Distance-based outlier detection 64
4.2 Density-based local outlier detection 66
4.3 The Chebyshev outlier detection 68
4.4 Area-descent-based outlier detection 69
4.5 Two-stage area-descent outlier detection 72
4.6 ELM-based outlier Detection and Elimination 74
CHAPTER V HEMATOCRIT ESTIMATION FROM TRANSDUCED CURRENT CURVE 78
5.1 Review of Hematocrit and Previous Measurement Methods 79
5.1.1 Typical Methods for Measuring Hematocrit 79
5.1.2 Hematocrit Determination from Impedance 80
5.1.3 Hematocrit Measurement by Dielectric Spectroscopy 82
5.2 Hematocrit Estimation from Transduced Current Curve 83
5.2.1 Transduced Current Curve from Electrochemical Biosensor for Glucose Measurement 84
5.2.2 Linear Models for Hematocrit Estimation 86
5.2.3 Neural Network for Hematocrit Estimation 90
5.2.4 Hematocrit Estimation by Using Support Vector Machine 91
CHAPTER VI ERROR CORRECTION FOR GLUCOSE BY REDUCING EFFECTS OF HEMATOCRIT 92
6.1 Effects of Hematocrit on Glucose Measurement 92
6.2 Error Correction for Glucose Measured by a Handheld Device 95
6.3 Error Correction for Glucose Computed Using a Single Transduced Current Point 99
CHAPTER VII DIRECT ESTIMATION FOR GLUCOSE DENSITY FROM TRANSDUCED CURRENT CURVE 107
7.1 Effects of Critical Care Variables 107
7.2 Glucose Estimation from the Transduced Current Curve 109
CHAPTER VIII EXPERIMENTAL RESULTS 114
8.1 Experimental Results for Hematocrit Estimation 115
8.2 Experimental Results for Glucose Correction 119
8.2.1 Error Correction for Glucose Measured by the Handheld Device 120
8.2.2 Error Correction for Glucose Computed Using a Single Transduced Current Point 125
8.3 Experimental Results for Direct Estimation for Glucose from the Transduced Current Curve 127
CHAPTER IX CONCLUSIONS AND FUTURE WORKS 130
9.1 Conclusions 130
9.2 Future Works 133
9.2.1 Feature Selection 133
9.2.2 Optical Biosensors 133
9.2.3 Reducing Effects of Other Factors 134
9.2.4 Applying Improvements of ELM in Medical Diagnosis 134
REFERENCES 135
ACKNOWLEDGMENTS 148
CURRICULUM VITAE 150
LIST OF FIGURES
Figure Page
Figure 1.1 Overview of the proposed systems: (a) error correction for glucose values by reducing the effects of hematocrit; (b) glucose estimation from the transduced current curve 4
Figure 1.2 The transduced current curve. The first eight seconds may be incubation time, which waits for the chemical reaction 5
Figure 2.1 A typical feedforward neural network 24
Figure 2.2 Loss functions that can be used in SVR, in which the ε-insensitive loss function yields a sparse set of support vectors 35
Figure 2.3 Soft-margin loss setting for a linear SV machine 37
Figure 3.1 The architecture of a single hidden layer feedforward neural network (SLFN) 44
Figure 4.1 A simple 2D dataset containing points belonging to two clusters C1 and C2. C1 forms a denser cluster than C2. Two additional points o1 and o2 can be considered as outliers 65
Figure 4.2 Detecting outliers by the area descent method 70
Figure 4.3 A simple dataset with close outliers o1 and o2. These outliers cannot be detected by the area-descent-based method 71
Figure 5.1 An example of an anodic current curve corresponding to the first 14 s. The first 8 seconds may be incubation time, which waits for the chemical reaction 84
Figure 5.2 Transduced anodic current points used in the estimation of hematocrit. They are obtained by sampling the second part of the current curve at a frequency of 10 Hz 85
Figure 5.3 Current measurements at the time instants. They appear to be an exponential function of time 87
Figure 5.4 Hematocrit estimation using the LRCP approach. The current curve together with its two extra features is the input of the linear model 89
Figure 5.5 Hematocrit estimation using the neural network model. Input features are current points sampled from the transduced current curve, with or without extra features 91
Figure 6.1 Effects of hematocrit on glucose measurement: (a) same measured value on the current curve but different glucose values; (b) different measured values on the current curve but the same glucose value 93
Figure 6.2 Plot of the paired differences of glucose measurements by a portable device minus the primary reference glucose measurements as a function of hematocrit [5] 94
Figure 6.3 Glucose correction process: finding a mapping from tm to tc so that both the dependency on hematocrit and the errors are reduced 95
Figure 6.4 An illustration of glucose correction for handheld devices 97
Figure 6.5 An illustration of glucose correction measured from a single point on the transduced current curve 100
Figure 6.6 Plot of the primary reference glucose against current point x57. There appears to be a linear relationship between the primary reference glucose and current point xk 101
Figure 7.1 Effects of PO2 on glucose measurement by handheld devices [5]. The glucose is underestimated at higher levels of PO2 108
Figure 7.2 Effects of PCO2 on glucose measurement by handheld devices [5]. The measured glucose is underestimated at higher levels of PCO2 108
Figure 7.3 Effects of pH on glucose measurement by handheld devices [5] 109
Figure 7.4 Illustration of estimating glucose from the transduced current curve. Glucose values are estimated directly from multiple current points, which carry the changing information of the transduced current curve 110
Figure 7.5 SLFNs for estimating glucose. Input features are current points sampled from the transduced current curve 111
Figure 8.1 Distribution of collected hematocrit. This distribution fairly represents the general trend of human hematocrit values 114
Figure 8.2 Distribution of glucose collected from YSI 2300 119
Figure 8.3 Plot of paired differences of glucose measurements by the handheld device minus the YSI 2300 glucose measurements. The dependency of the residuals on hematocrit is significant 120
Figure 8.4 The paired differences of a testing set corresponding to glucose measurements by the handheld device without error correction. The dependency of the residuals on hematocrit is significant 122
Figure 8.5 The paired differences of a testing set corresponding to glucose measurements by the handheld device after error correction. The dependency of the residuals on hematocrit is reduced significantly 122
Figure 8.6 Comparison of glucose results from the handheld meter and the primary reference instrument, YSI 2300: (a) before error correction and (b) after error correction 124
Figure 8.7 Plot of paired differences of estimated glucose on the test set minus the YSI 2300 glucose measurements with respect to hematocrit density. The dependency of the residuals on hematocrit is almost removed 127
Figure 8.8 Comparison of glucose values between the neural network and the primary reference instrument according to the criterion of ±15 mg/dL for glucose levels ≤ 100 mg/dL and ±15% for glucose levels > 100 mg/dL 129
LIST OF TABLES
Table Page
Table 4.1 Symbols and Notations 69
Table 6.1 Correlation coefficients between the current points and the primary reference glucose 102
Table 6.2 Correlation test for normality corresponding to time points 104
Table 8.1 Root mean square errors (RMSE) compared to the reference hematocrit measurements 116
Table 8.2 Mean percentage error (MPE) compared to the reference hematocrit measurement 118
Table 8.3 Comparison results for different criteria of error tolerance 123
Table 8.4 Comparison results on different criteria of error tolerance 126
Table 8.5 Comparison results on RMSE of approaches 126
Table 8.6 Comparison results for different criteria of error tolerance 128
LIST OF ABBREVIATIONS
Abbr Description
BLUE Best linear unbiased estimator
Bmse Bayesian mean square error
BP Backpropagation
C(Rp) Set of all continuous functions defined in the extended Rp
E-ELM Evolutionary extreme learning machine
ELM Extreme learning machine
ELS-ELM Evolutionary least squares extreme learning machine
HCT Hematocrit
KKT Karush-Kuhn-Tucker condition
LMMSE Linear minimum mean square error
LRCP Linear model with Reduced Current Points
LS-ELM Least squares extreme learning machine
LWCP Linear model with Whole Current Points
MCV Mean corpuscular volume
MLE Maximum likelihood estimator
MMSE Minimum mean square error
MP Moore-Penrose generalized inverse
MPE Mean percentage error
MSE Mean square error
MVU Minimum variance unbiased estimator
OS-ELM Online sequential extreme learning machine
PCO2 Carbon dioxide partial pressure
PDF Probability density function
pH A measure of the acidity or alkalinity
PO2 Oxygen partial pressure
POCT Point-of-care testing
RBC Red blood cell
RLS-ELM Regularized least-squares extreme learning machine
RMSE Root mean squared error
SLFN Single hidden layer feedforward neural network
SVC Support vector classification
SVD Singular value decomposition
SVM Support vector machine
SVR Support vector regression
ε-SVR Support vector regression with ε-insensitive loss function
TCC Transduced current curve
WBC White blood cell
WGN White Gaussian noise
WHO World Health Organization
WLS Weighted least squares
Linear and Nonlinear Analysis for Transduced Current
Curves of Electrochemical Biosensors
HUYNH TRUNG HIEU
Department of Computer Engineering Graduate School of Chonnam National University (Directed by Professor Yonggwan Won)
Abstract
Advances in science and technology have made it possible to perform a wide range of diagnostic tests quickly and simply without sophisticated laboratory equipment, and biosensors are a key technology in this. They are very useful in medicine and healthcare, as well as in the chemical and biochemical industries, for determining and analyzing complex mixtures and analytes. Biosensors are normally classified and evaluated by their design and functional characteristics, such as accuracy, cost, availability, range, and simplicity. On this basis, electrochemical biosensors are favored for their accuracy, cost, and availability.
After an enormous amount of development effort, electrochemical biosensors for blood glucose measurement have become the most widespread commercial biosensors to date. They are used with handheld devices to monitor the daily blood glucose levels of diabetic patients so that their blood glucose concentrations can be kept at or near normal levels, which can substantially reduce complications due to diabetes. Although handheld devices are convenient for monitoring and controlling blood glucose levels, their accuracy is greatly affected by interferences such as uric acid, ascorbic acid, PO2, PCO2, pH, and hematocrit, of which hematocrit is the most influential factor in glucose measurements by handheld devices. While interference from oxidizable substances can be reduced by chemical methods, only a few practical solutions have been proposed to reduce the effects of hematocrit. Moreover, these solutions increase the cost and complexity of manufacturing procedures and are very difficult to implement in handheld devices.
This research focuses on developing intelligent computing methods for improving the accuracy of glucose measurements made by handheld devices that use electrochemical biosensors. The analytical principle of electrochemical biosensors in glucose measurement is based on a bio-interaction process. In this process, an electrochemical current signal, called the transduced current, is produced by the interaction of blood glucose with glucose oxidase and by the oxidation of the reduced form of the enzyme at the electrode. This transduced current changes over time, and the resulting curve is called the transduced current curve (TCC). Handheld devices already use the TCC, in some form, to determine the glucose concentration. However, our research started from the belief that the changing pattern of the TCC carries information not only about glucose but also about various other factors, including interferences. Therefore, analysis of the TCC can play a crucial role in enhancing the performance of measurements made with electrochemical biosensors.
In this research, linear and nonlinear models, including support vector machines (SVM) and neural networks, are investigated for analyzing transduced current curves. They provide suitable methods for determining critical factors such as hematocrit and for improving the accuracy of glucose measurement in whole blood. These models are simple; they do not require complicated chemical procedures and add little cost to handheld devices.
Novel methods have been devised for estimating hematocrit from the transduced current curve. The first uses linear models, in which hematocrit is estimated as a linear combination of current points sampled from the transduced current curve. The second uses single hidden layer feedforward neural networks (SLFNs) trained by the extreme learning machine (ELM) algorithm and its improvements, which yield compact networks; the input features are again the sampled current points. The third applies the support vector machine (SVM), in which support vector regression (SVR) maps the current points to hematocrit. The results of these estimation methods are important for reducing or eliminating the effects of hematocrit on portable glucose meters. In addition, they show that this clinical indicator can be estimated by cheap handheld devices with a short measuring time.
An approach for improving the accuracy of glucose measurements by reducing the effects of hematocrit in handheld devices has also been investigated. For this purpose, the error distribution function related to hematocrit density was modeled by regression methods. For a new sample, the hematocrit density is estimated from the changing pattern of its transduced current curve and fed into this function; the resulting error is then subtracted from the value measured by the handheld device. This approach reduced the error significantly on average, although the error for some individual samples can increase.
Finally, another novel approach was devised for measuring glucose directly from a set of points sampled from the transduced current curve. This approach does not require the intermediate step of estimating hematocrit density. Several nonlinear regression methods, such as neural networks, were applied, and they produced better glucose measurement performance. This is because the approach may reduce the errors caused not only by hematocrit but also by other critical factors such as PO2, PCO2, and pH.
CHAPTER I INTRODUCTION
1.1 Statement of the Problem
Glucose is a major component of carbohydrate foods and is the body's main source of energy. The measurement of blood glucose plays an important role in diagnosis and treatment, especially in the effective treatment of diabetes. Typically, there are two types of insulin treatment in diabetic therapy: basal and mealtime. Basal insulin, also called "background" insulin, corresponds to the continuous secretion of the pancreas; it works behind the scenes and is often taken before bed. Mealtime insulin treatment is the injection of additional doses of faster-acting insulin to control fluctuations in blood glucose levels. These fluctuations result from various causes, including the metabolism of sugars and carbohydrates. Controlling them requires accurate measurement of blood glucose levels, and failure to do so can result in severe complications such as blindness and loss of circulation in the extremities. In addition, Krinsley [1] reported that even a modest degree of hyperglycemia occurring after intensive care unit admission was associated with a substantial increase in hospital mortality in patients with a wide range of medical and surgical diagnoses. Patients with glucose concentrations of 80 to 99 mg/dL had the lowest hospital mortality (9.6%), and mortality increased up to 27% for patients with glucose concentrations between 100 mg/dL and 119 mg/dL. Further increases in glucose concentration were associated with still higher mortality. The highest hospital mortality (42.5%) occurred among patients with glucose concentrations exceeding 300 mg/dL (P < 0.001). Therefore, accurate measurement of glucose concentrations is important, as it allows healthcare professionals to intervene in a timely manner in hypoglycemic and hyperglycemic conditions.
There are several methods for measuring the concentration of analytes such as glucose in a blood sample. Laboratory analysis is the most accurate method for determining glucose values, but because of its cost and time delays, point-of-care testing (POCT) is often used for glucose measurement instead. It gives direct readings of blood glucose concentrations in a timely and simple manner. Handheld devices also allow individuals with diabetes and healthcare professionals to measure and record glucose frequently. It should be emphasized that the large market for such devices drives their research and development. The American Diabetes Association (ADA) estimated that the national costs of diabetes could increase to 192 billion U.S. dollars in 2020. Recent estimates by the World Health Organization (WHO) also indicated that there were 171 million people in the world with diabetes in the year 2000, and this number is projected to increase to 366 million by 2030. In this situation, handheld glucose meters can be major tools for patients managing their disease. Such a large need promises an adequate return on the development of handheld devices for glucose measurement.
Because they are designed for portability, handheld devices have limited measurement accuracy. Many studies have shown that the accuracy of glucose measurement is affected by a number of factors such as PO2, PCO2, pH, and hematocrit [2-8]. Hematocrit, defined as the volume fraction of red blood cells in whole blood, is the most influential factor causing erroneous glucose measurements. Glucose results are overestimated at lower hematocrit levels and underestimated at higher hematocrit levels. Therefore, reducing the dependency of glucose measurements on influencing factors (especially hematocrit) plays a crucial role in enhancing the accuracy of handheld devices.
Some approaches try to remove the dependency on influencing factors by adding chemical procedures to the biosensors. However, these procedures complicate the chemical reaction and raise production costs. Hematocrit can be determined by centrifuging a blood specimen in a thin capillary tube and comparing the red blood cell level with the total volume on a calibrated scale. In a laboratory, hematocrit can be determined with an automated analyzer that calculates this proportion from other measurements rather than measuring it directly.
Hematocrit can also be estimated by dielectric spectroscopy [9] or other techniques. However, these methods cannot be used in handheld devices. Therefore, methods that can be used in handheld devices to reduce the effects of dependent factors or critical care variables on glucose measurements are needed to improve their accuracy.
1.2 Objective and Approach
1.2.1 Overview
In glucose measurement by electrochemical biosensors, the enzyme glucose oxidase (GOD) in the glucose biosensor catalyzes the oxidation of glucose by oxygen to produce gluconic acid and hydrogen peroxide. The reduced form of the enzyme (GO/FADH2) is oxidized back to its original state by an electron mediator (ferrocene). The resulting reduced mediator is then oxidized at the active electrode to produce a current, whose variation over time is called the transduced current curve.
[Figure 1.1 diagram. Labels: transduced current curve; glucose with errors (Glu_e); hematocrit; linear and nonlinear systems; corrected glucose (Glu_c).]
Figure 1.1 Overview of the proposed systems: (a) error correction for glucose values by reducing the effects of hematocrit; (b) glucose estimation from the transduced current curve.
We believe that the transduced current curve reflects characteristics of the chemical reaction, and that it contains clues for removing or reducing measurement errors in handheld devices. This research focuses on developing intelligent computational methods (including linear and nonlinear models) to analyze such current curves. These methods can be applied to enhance the accuracy of glucose meters by reducing the influence of critical care variables. They are simple, require no complicated chemical procedures, and add little cost to handheld devices. The goals of this research are to devise, investigate, develop, and demonstrate the following methods:
(1) Error correction for glucose values by reducing the effects of hematocrit. This idea is depicted in Figure 1.1(a). In this method, linear and nonlinear systems are developed to estimate hematocrit from the transduced current curve. A relationship between hematocrit and glucose measurement errors is then approximated and used to reduce the errors in glucose values.
(2) Estimating glucose values directly from the transduced current curves. This method determines glucose values directly from the transduced current curve by using machine learning techniques. It may reduce the effects of not only hematocrit but also other critical care variables. This idea is depicted in Figure 1.1(b).
Figure 1.2 The transduced current curve. The first eight seconds may be incubation time, which waits for the chemical reaction.
1.2.2 Approaches and Contributions
Figure 1.2 displays a transduced anodic current curve obtained during the first 14 seconds with the glucose biosensors; this type of current curve is used in our study. The first 8 seconds may be the incubation time that initiates the chemical reaction. We found that this part carries no useful information, although this holds for one specific biosensor product and is not true in general. Thus, we concentrate on the second part of the current curve, covering the next six seconds. During this period, the transduced current curve is sampled at a frequency of 10 Hz to produce current points. Each transduced current curve yields 59 current points, and these fifty-nine current points are used in our study.
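The sampling step described above can be sketched as follows. This is a minimal illustration, not the acquisition code used in this study. The helper name `sample_current_points` is hypothetical. It assumes the recorded curve is available as arrays of time stamps and currents, and it takes the 59 points as the interior 10 Hz grid points between 8 s and 14 s (8.1 s to 13.9 s), which is one way to arrive at that count.

```python
import numpy as np

def sample_current_points(time_s, current, start=8.0, stop=14.0, rate_hz=10.0):
    """Sample a transduced current curve after the incubation period.

    Hypothetical helper: keeps only the grid points strictly between
    `start` and `stop` (8.1 s, 8.2 s, ..., 13.9 s for the defaults, i.e.
    59 points) and linearly interpolates the recorded curve at them.
    """
    step = 1.0 / rate_hz
    n = int(round((stop - start) * rate_hz)) - 1   # interior grid points only
    grid = start + step * np.arange(1, n + 1)
    return grid, np.interp(grid, time_s, current)

# Toy curve (assumed shape, for illustration only)
t = np.linspace(0.0, 14.0, 1401)
grid, points = sample_current_points(t, np.exp(-0.3 * t))
```

The 59 values in `points` would then form one input vector for the linear and nonlinear models.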
Our first approach is the estimation of hematocrit from current points sampled from the transduced curve. Nonlinear models, including neural networks and support vector machines, are used with the fifty-nine current points as input features. The neural network architecture is the single hidden layer feedforward neural network (SLFN). It can be trained by gradient-descent-based algorithms such as backpropagation (BP), or by an efficient training algorithm called the extreme learning machine (ELM), proposed recently by Huang et al. [10]. However, to reduce the training time of the networks and the response time of the trained networks, we propose new training algorithms for SLFNs that improve on the ELM algorithm. Another method in our first approach uses linear models to estimate hematocrit levels. In this method, a linear combination of all the current points, or of a smaller number of them, is determined to approximate hematocrit levels.
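For readers unfamiliar with ELM, the basic algorithm of Huang et al. [10] can be sketched in a few lines. The input weights and biases of the hidden layer are drawn at random and left fixed. Only the output weights are computed, by applying the Moore-Penrose generalized inverse of the hidden-layer output matrix to the targets. This is a minimal sketch of plain ELM, not the improved algorithms proposed in this dissertation. The function names, the uniform [-1, 1] initialization, and the sigmoid activation are illustrative assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Basic ELM: random, fixed hidden layer; output weights by pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights (never trained)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # hidden biases (never trained)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                    # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ T                              # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Propagate inputs through the fixed hidden layer and the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

In the setting of this study, each row of `X` would hold the 59 current points of one curve, and `T` would hold the corresponding reference hematocrit or glucose values. Because no iterative tuning is involved, training reduces to a single pseudoinverse computation.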
An approximation function of the glucose error due to hematocrit is then determined. The glucose error is defined as the difference between the glucose value measured by a handheld device and the accurate glucose value from the primary reference method. We call this approximation function the error-compensation function. It receives an estimated hematocrit and returns an error-compensation value, which is used to adjust the glucose values obtained by a meter with glucose biosensors.
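The error-compensation idea can be illustrated with a small sketch. Purely for illustration, the compensation function here is a low-order polynomial in hematocrit, fitted by least squares to the paired differences between meter and reference glucose. The regression models used later in this dissertation may differ, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def fit_error_compensation(hct, glucose_meter, glucose_ref, degree=2):
    """Fit e(hct) ~ meter - reference with a polynomial of the given degree."""
    error = np.asarray(glucose_meter, dtype=float) - np.asarray(glucose_ref, dtype=float)
    coeffs = np.polyfit(np.asarray(hct, dtype=float), error, degree)
    return np.poly1d(coeffs)  # callable: hematocrit -> predicted glucose error

def correct_glucose(glucose_meter, hct_estimated, compensation):
    """Subtract the predicted hematocrit-related error from the meter reading."""
    return (np.asarray(glucose_meter, dtype=float)
            - compensation(np.asarray(hct_estimated, dtype=float)))

# Toy data (hypothetical): the meter overestimates at low and underestimates at high hematocrit
hct = np.linspace(20.0, 60.0, 41)
ref = np.full_like(hct, 120.0)
meter = ref - 1.5 * (hct - 42.0)
comp = fit_error_compensation(hct, meter, ref, degree=1)
corrected = correct_glucose(meter, hct, comp)
```

In practice, the hematocrit passed to `correct_glucose` would be the value estimated from the transduced current curve, not the reference value. Estimation errors in hematocrit therefore propagate into the corrected glucose.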
Our other approach estimates glucose values directly from current points by using SLFNs, with all 59 current points as input features. As in hematocrit estimation, the SLFNs are trained by backpropagation (BP), ELM, or our improved ELM algorithms.
One of the novel contributions of this research is the development of new training algorithms for SLFNs. Compared with gradient-descent-based algorithms, these algorithms achieve good generalization performance at very high learning speed. They also avoid problems such as overtraining, learning-rate selection, and local minima. Compared with ELM, our algorithms yield compact networks with a small number of hidden units, so the trained networks respond quickly to new input patterns. Specifically, they can significantly reduce the number of network parameters. This is very important for saving memory in handheld devices, which have limited storage and computing power.
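To make the memory argument concrete, the number of stored SLFN parameters grows linearly with the number of hidden units. The sketch below assumes one bias per hidden unit and no output bias, as in the basic ELM formulation. The function name and the hidden-layer sizes are hypothetical and are not results from this study.

```python
def slfn_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Count input weights, hidden biases and output weights of an SLFN.

    Assumes one bias per hidden unit and no bias on the output layer.
    """
    input_weights = n_inputs * n_hidden
    hidden_biases = n_hidden
    output_weights = n_hidden * n_outputs
    return input_weights + hidden_biases + output_weights

# With the 59 current points as inputs and a single output:
print(slfn_parameter_count(59, 100))  # 6100
print(slfn_parameter_count(59, 20))   # 1220
```

Under these assumptions, shrinking the hidden layer from 100 to 20 units reduces the stored parameters five-fold. It also reduces the per-sample arithmetic by the same factor.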
Another key contribution of this research is the use of intelligent computational methods, rather than chemical procedures, to correct glucose values in handheld devices. New approaches for estimating hematocrit levels are proposed using linear systems, SVM, and neural networks. The potential advantage of these approaches is that hematocrit values are determined from the transduced current curves generated during the glucose-measuring process of handheld devices. The estimated hematocrit can be used to adjust glucose values automatically by error compensation, which reduces the effects of hematocrit. New approaches for glucose estimation using neural networks are also proposed. Their advantage is that the effects of multiple critical care variables can be reduced, giving better glucose measurement results in handheld devices.
1.2.3 Data Acquisition
The data set used in this study was obtained from blood samples of randomly selected volunteers. The following values were measured for each sample:
- The accurate hematocrit, using the centrifugation method
- The accurate glucose values, using YSI 2700 or YSI 2300 (Yellow Springs Instrument, Inc., Yellow Springs, Ohio)
- The glucose values using a handheld meter
- The transduced current curves
To measure the transduced current curve as it changes over time, we developed a dedicated system equipped with a modified version of a handheld meter currently on the market.
1.3 Organization
This dissertation consists of two parts. The first part, composed of three chapters, covers the literature and theories related to our study, including outlier detection and machine learning. The literature on topics related to our study is reviewed in Chapter 2. In Chapter 3, we present new algorithms for training SLFNs that achieve good performance with very high speed in both training and testing. Methods for outlier detection and elimination are presented in Chapter 4. The second part, comprising five chapters, presents approaches for error correction by reducing the effects of critical care variables. Approaches for hematocrit estimation from the transduced current curve, using linear models, SVM, and neural networks, are presented in Chapter 5. In Chapter 6, we present a method that reduces the effects of hematocrit in glucose measurement. Another approach for improving glucose measurement, which uses neural networks with current-curve input features, is presented in Chapter 7. Chapter 8 discusses the experimental results. Finally, conclusions and future work are presented in Chapter 9.
CHAPTER II LITERATURE REVIEW
This research is related to several topics, including function approximation, outlier detection, and methods for hematocrit estimation and glucose measurement. In this chapter, we review topics related to function approximation, which plays a crucial role and connects all the other topics in our research. These include linear and nonlinear systems. First, we cover linear models in Section 2.1. The linear model is one of the simplest models. It is based on linear operators and has important applications in automatic control theory, signal processing, and function approximation. In Section 2.2, we review neural networks, which are widely used in machine learning, especially in regression analysis. They can approximate any continuous function with arbitrarily small error if the activation functions are chosen properly. A brief description of backpropagation training algorithms and of techniques for improving them is provided. The approximation capabilities of feedforward networks, especially single hidden layer feedforward neural networks (SLFNs), are emphasized because SLFNs provide the neural network architecture used to solve our problems. Section 2.3 reviews the support vector machine (SVM) for regression, or support vector regression (SVR). Its advantage is that a nonlinear function can be learned by a linear machine in a kernel-induced feature space. At the same time, the capacity of the system is controlled by parameters that do not depend on the dimensionality of the input space.
2.1 Linear Models
To understand phenomena around us whose observations can be quantified, one often builds a mathematical model. One of the simplest is the linear model. Because it is well understood and easy to interpret, its methods of analysis and inference are well developed. Furthermore, many nonlinear models can be reduced to linear models by means of transformation. This section briefly describes the basic operations of linear models and their parameter estimation, with or without assumptions about the distributions of the errors and parameters.
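As an example of reducing a nonlinear model to a linear one by transformation, consider the exponential model $t = a\,e^{bx}$ with $t > 0$. After taking logarithms, it becomes linear in the parameters $\ln a$ and $b$. The sketch below fits it by ordinary least squares on the log scale. This minimizes errors in $\ln t$ rather than in $t$, so it is not identical to nonlinear least squares on the original scale. The function name is illustrative.

```python
import numpy as np

def fit_exponential(x, t):
    """Fit t = a * exp(b * x) (t > 0) via least squares on ln t = ln a + b * x."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    b, ln_a = np.polyfit(x, np.log(t), 1)  # polyfit returns the highest power first
    return np.exp(ln_a), b

# Noise-free toy data generated with a = 2.0, b = -0.5
x = np.linspace(0.0, 5.0, 11)
a, b = fit_exponential(x, 2.0 * np.exp(-0.5 * x))
```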
2.1.1 Overview
In statistics, a linear model is given by
$$t = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p + \varepsilon, \tag{2.1}$$
where $t$ is the response (dependent) variable and $x_1, x_2, \dots, x_p$ are the predictors. In some special cases, the $x_i$'s are called independent variables or factors. The coefficients $w_0, w_1, \dots, w_p$ are the parameters of the model, and $\varepsilon$ is the unobservable error. If there are $N$ observations of the response and predictors, the explicit form of the equations is
$$t_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \cdots + w_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, N. \tag{2.2}$$
In matrix notation, the model is written as
T=Xw+ε, (2.3)
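where

$$T = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{bmatrix}, \quad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_p \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}.$$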
To complete the description of the model, some assumptions about the errors are necessary. It is assumed that the errors are uncorrelated, with zero mean and covariance matrix $C_{\varepsilon}$. These assumptions can be summarized in matrix notation as

$$E(\varepsilon) = 0, \qquad E(\varepsilon \varepsilon^T) = C_{\varepsilon},$$

where $E$ denotes the expected value and $0$ denotes the vector whose elements are all zero.
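As a quick illustration that (2.2) and (2.3) describe the same set of equations, the sketch below builds $X$ by prepending a column of ones to the predictors, so that $w_0$ acts as the intercept, and compares the two forms. It is a minimal Python/NumPy sketch, not code from this work; the helper name `design_matrix`, the sizes, and all numeric values are illustrative assumptions.

```python
import numpy as np

def design_matrix(x: np.ndarray) -> np.ndarray:
    """Prepend a column of ones to the N x p predictor array so that w0 acts as the intercept."""
    return np.hstack([np.ones((x.shape[0], 1)), x])

rng = np.random.default_rng(0)
N, p = 5, 3
x = rng.normal(size=(N, p))            # predictors x_{i1}, ..., x_{ip} (illustrative)
w = np.array([0.5, 1.0, -2.0, 3.0])    # parameters w0, w1, ..., wp (illustrative)
eps = rng.normal(scale=0.1, size=N)    # zero-mean errors eps_i

# Scalar form (2.2): t_i = w0 + w1 x_{i1} + ... + wp x_{ip} + eps_i
t_scalar = np.array([w[0] + sum(w[k + 1] * x[i, k] for k in range(p)) + eps[i]
                     for i in range(N)])

# Matrix form (2.3): T = X w + eps
X = design_matrix(x)
T = X @ w + eps

print(X.shape)                   # (5, 4)
print(np.allclose(T, t_scalar))  # True
```

The column of ones is what lets the intercept $w_0$ be treated like any other parameter in the matrix form.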
Besides being used directly as in Equation (2.3), linear models can be used with some modifications in other situations, such as polynomial regression [11], curve fitting, and Fourier analysis. Linear models have many applications. An important one is regression analysis, where the dependent variable is modeled as a function of the independent variables; this can be used for prediction, inference, hypothesis testing, and modeling of causal relationships. In addition, linear models can be used in calibration [12], control, estimation of response surfaces [13], and imputation of missing data [14].
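Polynomial regression shows why such models remain linear: the model is nonlinear in $x$ but linear in $w$. The following minimal sketch (Python/NumPy, with illustrative coefficients of our own choosing) builds a design matrix with columns $1, x, x^2$ via `np.vander` and checks that $Xw$ reproduces direct polynomial evaluation; with noiseless data, an ordinary linear least squares solve recovers $w$.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 7)
w = np.array([1.0, -0.5, 2.0])  # t = w0 + w1*x + w2*x^2 (illustrative values)

# Columns [1, x, x^2]; increasing=True puts the constant column first,
# matching the column of ones that multiplies w0 in (2.3).
X = np.vander(x, N=len(w), increasing=True)
t_model = X @ w

# np.polyval expects coefficients from the highest power down, hence w[::-1].
t_direct = np.polyval(w[::-1], x)

# Because the model is linear in w, ordinary linear LS recovers w from noiseless data.
w_fit, *_ = np.linalg.lstsq(X, t_model, rcond=None)
print(np.allclose(t_model, t_direct), np.allclose(w_fit, w))  # True True
```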
2.1.2 Parameter Estimation in the Linear Model
This section presents methods for estimating the parameters $w$ of the linear model, including minimum variance unbiased (MVU) estimation, maximum likelihood estimation (MLE), least squares (LS) estimation, and linear Bayesian estimation.
a) Minimum Variance Unbiased estimation
A minimum variance unbiased (MVU) estimator attains the true value on average, and its mean squared deviation from the true value is minimized. This means that

$$E(\hat{w}) = w \quad \text{and} \quad E\left[(\hat{w} - w)^2\right] \text{ is minimized.}$$
One of the important theorems for this estimator is the Cramér-Rao Lower Bound (CRLB) theorem [15]: if the equality

$$\frac{\partial \ln p(T; w)}{\partial w} = I(w)\left(\psi(T) - w\right)$$

is satisfied for some functions $I$ and $\psi$, then the MVU estimator exists and is given by $\hat{w} = \psi(T)$, with minimum variance (covariance) $I^{-1}(w)$.
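Assuming that the linear model (2.3) has white Gaussian noise (WGN), that is, $C_{\varepsilon} = \sigma^2 I$, the probability density function (PDF) of the data is given by

$$p(T; w) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}(T - Xw)^T (T - Xw)\right].$$

Using the identities

$$\frac{\partial\, b^T w}{\partial w} = b, \qquad \frac{\partial\, w^T A w}{\partial w} = 2Aw$$

for a symmetric matrix $A$, we have

$$\frac{\partial \ln p(T; w)}{\partial w} = \frac{1}{\sigma^2}\left(X^T T - X^T X w\right) = \frac{X^T X}{\sigma^2}\left[(X^T X)^{-1} X^T T - w\right],$$

which has the CRLB form with $I(w) = X^T X / \sigma^2$. Hence, assuming $X$ has full column rank, the MVU estimator is $\hat{w} = (X^T X)^{-1} X^T T$, with covariance matrix $\sigma^2 (X^T X)^{-1}$.

In situations where the error has zero mean and arbitrary covariance, $\varepsilon \sim \mathcal{N}(0, C_{\varepsilon})$, if we restrict the estimator to be linear in the data $T$, then the best linear unbiased estimator (BLUE) of $w$ can be used. Following the Gauss-Markov theorem [15], the BLUE of $w$ is given by

$$\hat{w} = (X^T C_{\varepsilon}^{-1} X)^{-1} X^T C_{\varepsilon}^{-1} T, \tag{2.10}$$

and the minimum variance of $\hat{w}_i$ is

$$\operatorname{var}(\hat{w}_i) = \left[(X^T C_{\varepsilon}^{-1} X)^{-1}\right]_{ii}.$$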
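To see what the Gauss-Markov result claims, a Monte Carlo sketch can estimate $w$ repeatedly from fresh noise, using the BLUE $\hat{w} = (X^T C_{\varepsilon}^{-1} X)^{-1} X^T C_{\varepsilon}^{-1} T$, and compare the sample mean and variance of $\hat{w}$ with $w$ and with $[(X^T C_{\varepsilon}^{-1} X)^{-1}]_{ii}$. This is an illustrative Python/NumPy sketch, not the procedure used in this work; the AR(1)-type covariance, the sample sizes, and the helper name `blue` are assumptions.

```python
import numpy as np

def blue(X, T, C_eps):
    """BLUE (2.10): (X^T C^-1 X)^-1 X^T C^-1 T, computed with linear solves instead of inverses."""
    Ci_X = np.linalg.solve(C_eps, X)  # C_eps^{-1} X
    Ci_T = np.linalg.solve(C_eps, T)  # C_eps^{-1} T
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_T)

rng = np.random.default_rng(1)
N = 40
X = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, N)])
w_true = np.array([1.0, 2.0])

# Assumed correlated (AR(1)-type) error covariance: C_ij = 0.5 * 0.8^|i-j|.
idx = np.arange(N)
C_eps = 0.5 * 0.8 ** np.abs(np.subtract.outer(idx, idx))
L = np.linalg.cholesky(C_eps)  # L @ z with z ~ N(0, I) gives eps ~ N(0, C_eps)

# Theoretical minimum variances [(X^T C^-1 X)^-1]_ii.
var_theory = np.diag(np.linalg.inv(X.T @ np.linalg.solve(C_eps, X)))

trials = 5000
est = np.empty((trials, 2))
for r in range(trials):
    T = X @ w_true + L @ rng.standard_normal(N)
    est[r] = blue(X, T, C_eps)

print(est.mean(axis=0))              # close to w_true (unbiasedness)
print(est.var(axis=0) / var_theory)  # ratios close to 1
```

Gaussian noise is used here only to generate data; the BLUE formula itself requires only the mean and covariance of the error.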
b) Maximum Likelihood Estimation (MLE)
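An alternative to the MVU estimator is based on the maximum likelihood principle, which is desirable in situations where the MVU estimator does not exist, or cannot be found even though it exists. The MLE approximates the MVU estimator because it is approximately efficient, and its performance is optimal for sufficiently large data records. Assuming that $\varepsilon \sim \mathcal{N}(0, C_{\varepsilon})$, the PDF of the data is given by

$$p(T; w) = \frac{1}{(2\pi)^{N/2} \det(C_{\varepsilon})^{1/2}} \exp\left[-\frac{1}{2}(T - Xw)^T C_{\varepsilon}^{-1} (T - Xw)\right].$$

Maximizing this PDF over $w$ is equivalent to minimizing

$$J(w) = (T - Xw)^T C_{\varepsilon}^{-1} (T - Xw). \tag{2.14}$$

Since this is a quadratic function of the elements of $w$ and $C_{\varepsilon}^{-1}$ is a positive definite matrix, setting the derivative to zero yields the global minimum (for $X$ of full column rank). The partial derivatives are

$$\frac{\partial J(w)}{\partial w} = -2 X^T C_{\varepsilon}^{-1} (T - Xw),$$

and setting them to zero gives the MLE

$$\hat{w} = (X^T C_{\varepsilon}^{-1} X)^{-1} X^T C_{\varepsilon}^{-1} T,$$

which coincides with the BLUE (2.10).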
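The gradient of (2.14) is easy to get wrong by a sign or a factor of two, so a finite-difference check is a useful habit. The sketch below (Python/NumPy; the random positive definite $C_{\varepsilon}$ and all sizes are illustrative assumptions) compares the analytic gradient $-2X^T C_{\varepsilon}^{-1}(T - Xw)$ with central differences of $J(w)$, and confirms that the gradient vanishes at the closed-form estimate $\hat{w} = (X^T C_{\varepsilon}^{-1} X)^{-1} X^T C_{\varepsilon}^{-1} T$.

```python
import numpy as np

def J(w, X, T, C_inv):
    """Criterion (2.14): (T - Xw)^T C_eps^{-1} (T - Xw)."""
    r = T - X @ w
    return r @ C_inv @ r

def grad_J(w, X, T, C_inv):
    """Analytic gradient: -2 X^T C_eps^{-1} (T - Xw)."""
    return -2.0 * X.T @ C_inv @ (T - X @ w)

rng = np.random.default_rng(2)
N, n_par = 30, 3
X = rng.normal(size=(N, n_par))
A = rng.normal(size=(N, N))
C_eps = A @ A.T / N + np.eye(N)  # an illustrative positive definite covariance
C_inv = np.linalg.inv(C_eps)
T = X @ np.array([1.0, -1.0, 0.5]) + np.linalg.cholesky(C_eps) @ rng.standard_normal(N)

# Closed-form minimizer of J (the MLE under Gaussian noise).
w_mle = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv @ T)

# Central differences are exact for a quadratic, up to rounding error.
w_probe = rng.normal(size=n_par)
h = 1e-3
fd = np.array([(J(w_probe + h * e, X, T, C_inv) - J(w_probe - h * e, X, T, C_inv)) / (2 * h)
               for e in np.eye(n_par)])

print(np.allclose(fd, grad_J(w_probe, X, T, C_inv), atol=1e-6))  # True
print(np.allclose(grad_J(w_mle, X, T, C_inv), 0.0, atol=1e-8))   # True
```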
c) Least Squares (LS)
The previous methods tried to find an optimal or nearly optimal estimator by considering the class of unbiased estimators and determining the one with minimum variance. In this section, we review a class of estimators that in general have no optimality properties associated with them but make good sense for many problems. A salient property of this method is that only a data model is assumed, while no probabilistic assumptions about the data are required, so it can be applied to a broader range of problems. It is widely used in practice because it is easy to implement, amounting to the minimization of a least squares criterion.
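Unlike MVU estimators, which try to be unbiased and have minimum variance, the least squares (LS) approach minimizes the squared difference between the given data and the assumed model. For observations $X$, $T$ with a model that is linear in the unknown parameters $w$, the LS estimate of $w$ is found by minimizing

$$J(w) = \sum_{i=1}^{N}\left(t_i - w_0 - \sum_{k=1}^{p} w_k x_{ik}\right)^2 = (T - Xw)^T (T - Xw).$$

Setting the gradient to zero gives

$$\hat{w} = (X^T X)^{-1} X^T T,$$

which is identical to the MVU estimate, as well as to the MLE, under white Gaussian noise. The minimum least squares error is found by

$$J_{\min} = J(\hat{w}) = (T - X\hat{w})^T (T - X\hat{w}).$$

In other forms, we have

$$J_{\min} = T^T (T - X\hat{w}) = T^T T - T^T X (X^T X)^{-1} X^T T.$$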
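The LS estimate and the equivalent forms of $J_{\min}$ can be checked numerically. This minimal Python/NumPy sketch (illustrative data of our own choosing) solves the normal equations $\hat{w} = (X^T X)^{-1} X^T T$, compares the result with NumPy's SVD-based `np.linalg.lstsq`, and evaluates $J_{\min}$ in the three forms $(T - X\hat{w})^T (T - X\hat{w})$, $T^T (T - X\hat{w})$, and $T^T T - T^T X (X^T X)^{-1} X^T T$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = rng.uniform(-2.0, 2.0, N)
X = np.column_stack([np.ones(N), x])               # intercept column + one predictor
T = 0.3 + 1.7 * x + rng.normal(scale=0.2, size=N)  # illustrative data

# Normal equations: w_hat = (X^T X)^{-1} X^T T (solved, not inverted).
w_ne = np.linalg.solve(X.T @ X, X.T @ T)
# SVD-based LS solver; it should agree with the normal equations for well-conditioned X.
w_ls, *_ = np.linalg.lstsq(X, T, rcond=None)

residual = T - X @ w_ls
J_min_residual = residual @ residual  # (T - X w_hat)^T (T - X w_hat)
J_min_alt = T @ (T - X @ w_ls)        # T^T (T - X w_hat)
J_min_proj = T @ T - T @ X @ np.linalg.solve(X.T @ X, X.T @ T)

print(np.allclose(w_ne, w_ls),
      np.isclose(J_min_residual, J_min_alt),
      np.isclose(J_min_alt, J_min_proj))  # True True True
```

For ill-conditioned $X$, forming $X^T X$ squares the condition number, which is why an SVD- or QR-based solver is usually preferred in practice.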
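When the error variance differs across different ranges of the predictor values, an extension of LS estimation called weighted least squares (WLS) estimation should be used. WLS introduces weighting factors $\alpha_i > 0$ into the error criterion to emphasize or de-emphasize the contributions of individual data samples:

$$J(w) = \sum_{i=1}^{N} \alpha_i \left(t_i - w_0 - \sum_{k=1}^{p} w_k x_{ik}\right)^2 = (T - Xw)^T W (T - Xw), \quad W = \operatorname{diag}(\alpha_1, \ldots, \alpha_N).$$

From a geometrical viewpoint, if the columns of $X$ are denoted by $x_1, x_2, \ldots, x_p$, the model output can be written as

$$Xw = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{bmatrix} = \sum_{i=1}^{p} w_i x_i.$$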
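A common way to compute the WLS estimate is to scale row $i$ of $X$ and $T$ by $\sqrt{\alpha_i}$ and solve an ordinary LS problem. The sketch below (Python/NumPy) compares that approach with the closed form $\hat{w} = (X^T W X)^{-1} X^T W T$, the standard minimizer of the weighted criterion, which is not written out in the text above. The inverse-variance weights $\alpha_i = 1/\sigma_i^2$, the noise model, and the helper name `wls` are illustrative assumptions.

```python
import numpy as np

def wls(X, T, alpha):
    """Minimize sum_i alpha_i (t_i - (row i of X) . w)^2 by scaling rows by sqrt(alpha_i)."""
    sw = np.sqrt(alpha)
    w, *_ = np.linalg.lstsq(X * sw[:, None], T * sw, rcond=None)
    return w

rng = np.random.default_rng(4)
N = 60
x = np.linspace(0.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
sigma = 0.1 + 0.3 * x                        # error std grows with x (assumed noise model)
T = 1.0 + 0.8 * x + rng.normal(scale=sigma)  # one draw per sample, std sigma_i
alpha = 1.0 / sigma**2                       # assumed choice: inverse-variance weights

w_wls = wls(X, T, alpha)
W = np.diag(alpha)
w_closed = np.linalg.solve(X.T @ W @ X, X.T @ W @ T)  # (X^T W X)^{-1} X^T W T

print(np.allclose(w_wls, w_closed))  # True
```

Row scaling avoids forming the $N \times N$ matrix $W$, which matters when $N$ is large.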
This shows that the model output is a linear combination of the vectors $\{x_1, x_2, \ldots, x_p\}$. Furthermore, if we define the Euclidean norm by $\|\xi\| = \sqrt{\xi^T \xi}$, then the LS error can also be written as

$$J(w) = \|T - Xw\|^2 = \left\|T - \sum_{i=1}^{p} w_i x_i\right\|^2.$$

Thus, the LS approach minimizes the squared distance from the data vector $T$ to $\sum_{i=1}^{p} w_i x_i$, and the problem can be viewed as approximating a vector $T$ in $\mathbb{R}^N$ by a vector $\sum_{i=1}^{p} w_i x_i$ in the subspace spanned by $\{x_1, x_2, \ldots, x_p\}$.
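The geometric view can be checked directly: at the LS solution, the residual $T - X\hat{w}$ is orthogonal to every column $x_i$, and $X\hat{w}$ is the orthogonal projection of $T$ onto the column space of $X$. This Python/NumPy sketch uses random data purely for illustration; the projection matrix `P` is a standard construction, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 8, 3
X = rng.normal(size=(N, p))  # columns x_1, ..., x_p span a p-dimensional subspace of R^N
T = rng.normal(size=N)

w_hat, *_ = np.linalg.lstsq(X, T, rcond=None)
T_hat = X @ w_hat            # sum_i w_i x_i, the closest point of the subspace to T
residual = T - T_hat

# Orthogonality: x_i^T (T - X w_hat) = 0 for every column x_i.
print(np.allclose(X.T @ residual, 0.0))  # True

# P = X (X^T X)^{-1} X^T projects onto the column space and is idempotent.
P = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(P @ T, T_hat), np.allclose(P @ P, P))  # True True

# Any other point of the subspace is at least as far from T.
w_other = w_hat + rng.normal(size=p)
print(np.linalg.norm(T - X @ w_other) >= np.linalg.norm(residual))  # True
```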
d) Linear Bayesian Estimators
The previous methods assumed that the parameter $w$ is a deterministic but unknown constant. In this section, we briefly introduce another method, in which the parameter is assumed to be a random variable. This is called the Bayesian approach because it is implemented by applying Bayes' theorem directly. If prior knowledge of $w$ is given as a prior PDF $p(w)$, then we can attempt to find an estimator that minimizes the Bayesian mean squared error (MSE), defined by

$$\operatorname{Bmse}(\hat{w}) = E\left[(w - \hat{w})^2\right] = \iint (w - \hat{w})^2\, p(T, w)\, dT\, dw.$$
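We use Bayes' theorem to write

$$p(T, w) = p(w \mid T)\, p(T),$$

so that

$$\operatorname{Bmse}(\hat{w}) = \int \left[\int (w - \hat{w})^2\, p(w \mid T)\, dw\right] p(T)\, dT.$$

Because $p(T) \ge 0$ for all $T$, minimizing the integral in brackets for each $T$ minimizes the Bayesian MSE. Setting the derivative of the bracketed integral with respect to $\hat{w}$ to zero gives $-2\int w\, p(w \mid T)\, dw + 2\hat{w} = 0$, so the optimal estimator is the posterior mean

$$\hat{w} = E(w \mid T) = \int w\, p(w \mid T)\, dw. \tag{2.26}$$

Now assume that $w \sim \mathcal{N}(E(w), C_{ww})$ and that the error $\varepsilon \sim \mathcal{N}(0, C_{\varepsilon})$ is independent of $w$. The linear model (2.3) then becomes the Bayesian general linear model. To estimate $w$ by (2.26), we must have an explicit expression for the posterior PDF $p(w \mid T)$.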
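Since $w$ and $\varepsilon$ are independent of each other and each is Gaussian, $T = Xw + \varepsilon$ is also Gaussian, and $T$ and $w$ are jointly Gaussian. By a standard theorem, if $T$ and $w$ are jointly Gaussian with mean vector $[E(T)^T \; E(w)^T]^T$ and partitioned covariance matrix

$$C = \begin{bmatrix} C_{TT} & C_{Tw} \\ C_{wT} & C_{ww} \end{bmatrix},$$

then the posterior PDF $p(w \mid T)$ is also Gaussian, with mean

$$E(w \mid T) = E(w) + C_{wT} C_{TT}^{-1} \left(T - E(T)\right).$$

For the Bayesian general linear model, $E(T) = XE(w)$, $C_{TT} = XC_{ww}X^T + C_{\varepsilon}$, and $C_{wT} = C_{ww}X^T$, so that

$$\hat{w} = E(w \mid T) = E(w) + C_{ww} X^T (X C_{ww} X^T + C_{\varepsilon})^{-1} \left(T - XE(w)\right).$$

Thus, under the jointly Gaussian assumption, the optimal Bayesian estimators are easily found. In general, however, the Gaussian assumption cannot always be made. In this situation, another method, linear MMSE (LMMSE) estimation, is used. It considers the class of all linear (affine) estimators of the form

$$\hat{w} = AT + b,$$

and we must choose the weighting coefficients $A$ and $b$ to minimize

$$\operatorname{Bmse}(\hat{w}_i) = E\left[(w_i - \hat{w}_i)^2\right], \quad i = 0, 1, \ldots, p.$$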
If we assume that the random parameter vector $w$ has mean $E(w)$ and covariance matrix $C_{ww}$, and that the error vector $\varepsilon$ has zero mean and covariance matrix $C_{\varepsilon}$ and is uncorrelated with $w$, then the LMMSE estimator of $w$ is also given by

$$\hat{w} = E(w) + C_{ww} X^T (X C_{ww} X^T + C_{\varepsilon})^{-1} \left(T - XE(w)\right).$$

The performance of the estimator is measured by the error $e = w - \hat{w}$, which has zero mean and covariance matrix

$$C_e = \left(C_{ww}^{-1} + X^T C_{\varepsilon}^{-1} X\right)^{-1}.$$
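The LMMSE formulas can be checked by simulation: draw $w$ from its prior, generate $T = Xw + \varepsilon$, apply the estimator, and compare the empirical error covariance with $C_e$. The sketch below (Python/NumPy) also checks that $C_e$ equals the "gain form" $C_{ww} - KXC_{ww}$ with $K = C_{ww} X^T (X C_{ww} X^T + C_{\varepsilon})^{-1}$, an equivalent expression obtained via the matrix inversion lemma that is not stated in the text. The prior, noise level, sizes, and helper name `lmmse_gain` are illustrative assumptions; Gaussian draws are used only for convenience.

```python
import numpy as np

def lmmse_gain(X, C_ww, C_eps):
    """K = C_ww X^T (X C_ww X^T + C_eps)^{-1}, computed via a solve (C_ww and S are symmetric)."""
    S = X @ C_ww @ X.T + C_eps
    return np.linalg.solve(S, X @ C_ww).T

rng = np.random.default_rng(6)
N = 10
X = rng.normal(size=(N, 2))
mean_w = np.array([1.0, -1.0])  # assumed prior mean E(w)
C_ww = np.diag([1.0, 0.5])      # assumed prior covariance
C_eps = 0.25 * np.eye(N)        # assumed error covariance

K = lmmse_gain(X, C_ww, C_eps)
C_e_gain_form = C_ww - K @ X @ C_ww
C_e_info_form = np.linalg.inv(np.linalg.inv(C_ww) + X.T @ np.linalg.solve(C_eps, X))

# Monte Carlo: one row per trial.
trials = 20000
W = rng.multivariate_normal(mean_w, C_ww, size=trials)
E = rng.multivariate_normal(np.zeros(N), C_eps, size=trials)
Ts = W @ X.T + E
W_hat = mean_w + (Ts - X @ mean_w) @ K.T  # w_hat = E(w) + K (T - X E(w)) for each trial
err = W - W_hat

print(np.allclose(C_e_gain_form, C_e_info_form))  # True
print(err.mean(axis=0))                           # close to zero
print(np.cov(err, rowvar=False))                  # close to C_e
```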
2.2 Feedforward Neural Networks
Feedforward neural networks have been frequently used in machine learning because of their ability to approximate complex nonlinear mappings directly from input patterns. They can also provide suitable classification models for a wide range of problems that are difficult to handle with classical parametric techniques. Many references present the details of neural networks, covering models, principles, operations, properties, and applications [16-22]. This section briefly describes the basic operations of feedforward neural networks, including the feedforward operation, gradient-descent based learning algorithms, and practical techniques and theoretical foundations for improving gradient-descent based algorithms. In particular, the approximation capabilities of neural networks are emphasized.
2.2.1 Neural Networks and Feedforward Operation
The feedforward neural network consists of a hierarchy of processing units (Fig. 2.1), organized in a series of two or more layers. The first layer, called the input layer, serves as a holding site for the inputs applied to the network. The last layer, called the output layer, is where the overall mapping of the network inputs is available. Between the input layer and the output layer, there are zero or more hidden layers, in which additional remapping or computing takes place. The function of the processing units is loosely based on characteristics of biological neurons, so they are also called neurons. Each neuron (unit) in layer $L_l$ has weights connected from the previous, lower layer $L_{l-1}$. Let $o^{l-1} = [o_1^{l-1}, o_2^{l-1}, \ldots, o_{L_{l-1}}^{l-1}]^T$ be the output of the previous layer $L_{l-1}$, and let $w_j^l = [w_{j1}^l, w_{j2}^l, \ldots, w_{jL_{l-1}}^l]^T$ be the weights connected from layer $L_{l-1}$ to the $j$-th unit of layer $L_l$. The net activation $net_j^l$ is formed by the inner product of the previous-layer output and the weights:

$$net_j^l = (w_j^l)^T o^{l-1} = \sum_{k=1}^{L_{l-1}} w_{jk}^l\, o_k^{l-1}.$$

Each unit then emits an output $o_j^l$ that is a nonlinear mapping of its net activation by an activation function $f$:

$$o_j^l = f(net_j^l).$$
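[Figure 2.1: the $j$-th unit of layer $L_l$ receives the outputs $o_1^{l-1}, o_2^{l-1}, \ldots, o_{L_{l-1}}^{l-1}$ of layer $L_{l-1}$ and a bias $bias_j^l$ through the weights $w_{j1}^l, w_{j2}^l, \ldots, w_{jL_{l-1}}^l$, and produces the output $o_j^l$.]
Figure 2.1 A typical feedforward neural network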
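The per-unit equations $net_j^l = (w_j^l)^T o^{l-1}$ and $o_j^l = f(net_j^l)$ vectorize naturally: stacking the weight vectors $w_j^l$ as the rows of a matrix computes every unit of the layer at once. The Python/NumPy sketch below is illustrative only; the `bias` argument is an assumption suggested by the bias input drawn in Fig. 2.1 (set to zero here to match the pure inner-product form in the text), and `layer_forward`, the sizes, and the choice of `np.tanh` are hypothetical.

```python
import numpy as np

def layer_forward(o_prev, W, bias, f):
    """Outputs of layer L_l: row j of W is w_j^l, so W @ o_prev stacks net_j^l = (w_j^l)^T o^{l-1}."""
    net = W @ o_prev + bias  # assumed bias term (Fig. 2.1); zeros give the pure inner product
    return f(net)

rng = np.random.default_rng(8)
n_prev, n_units = 4, 3
o_prev = rng.normal(size=n_prev)           # o^{l-1}
W = rng.normal(size=(n_units, n_prev))     # rows w_1^l, ..., w_3^l
bias = np.zeros(n_units)

o_vec = layer_forward(o_prev, W, bias, np.tanh)
# Same result, one unit at a time: o_j^l = f((w_j^l)^T o^{l-1}).
o_loop = np.array([np.tanh(np.dot(W[j], o_prev)) for j in range(n_units)])
print(np.allclose(o_vec, o_loop))  # True
```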
The activation function $f(\cdot)$ is usually differentiable, nonlinear, and nondecreasing. Two forms of this function are often used: the hyperbolic tangent $f(net) = \tanh(net)$, whose range is $(-1, 1)$, and the logistic sigmoid $f(net) = (1 + \exp(-net))^{-1}$, whose range is $(0, 1)$.
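The two activation functions are closely related, and both have derivatives that can be written in terms of their outputs, which is convenient for the gradient-based training discussed in the next subsection. This Python/NumPy sketch (the function name `logistic` is ours) checks the ranges, the identity $\tanh(net) = 2f_{\text{logistic}}(2\,net) - 1$, and the derivatives $f'(net) = f(1 - f)$ for the logistic function and $1 - \tanh^2(net)$ for the hyperbolic tangent against central differences.

```python
import numpy as np

def logistic(net):
    """f(net) = 1 / (1 + exp(-net)), with range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

net = np.linspace(-5.0, 5.0, 101)
s, t = logistic(net), np.tanh(net)

# Ranges (0, 1) and (-1, 1) on this interval.
print(s.min() > 0.0 and s.max() < 1.0, t.min() > -1.0 and t.max() < 1.0)  # True True

# tanh is a shifted, rescaled logistic: tanh(net) = 2 * logistic(2 * net) - 1.
print(np.allclose(t, 2.0 * logistic(2.0 * net) - 1.0))  # True

# Derivatives expressed through the outputs, checked by central differences.
ds = s * (1.0 - s)
dt = 1.0 - t**2
h = 1e-5
print(np.allclose(ds, (logistic(net + h) - logistic(net - h)) / (2 * h), atol=1e-8),
      np.allclose(dt, (np.tanh(net + h) - np.tanh(net - h)) / (2 * h), atol=1e-8))  # True True
```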
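Mathematically, a multilayer feedforward neural network (FFNN) can be described as a composition of functions, each of which represents a particular layer and may be specific to individual units in that layer. Let $o$ be the output of the FFNN corresponding to the input pattern $x$. The FFNN implements the composite mapping

$$o = f_M\big(f_{M-1}(\cdots f_1(x) \cdots)\big),$$

where $f_l$ denotes the mapping performed by layer $l$ and $M$ is the number of layers after the input layer.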
2.2.2 Gradient-descent based Learning Algorithms
Once an appropriate network architecture is chosen, the network must be trained for a specific application. The main goal of the training process is to find the network parameters $w$, including weights and biases, that minimize an error function defined by: