11.7 Hidden Markov Models for ECG Segmentation

11.7.1 Overview
The first step in applying hidden Markov models to the task of ECG segmentation is to associate each state in the model with a particular region of the ECG. As discussed previously in Section 11.6.5, this can either be achieved in a supervised manner (i.e., using expert measurements) or an unsupervised manner (i.e., using the EM algorithm). Although the former approach requires each ECG waveform in the training data set to be associated with expert measurements of the waveform feature boundaries (i.e., the Pon, Q, Toff points, and so forth), the resulting models generally produce more accurate segmentation results compared with their unsupervised counterparts.
Figure 11.5 shows a variety of different HMM architectures for ECG interval analysis. A simple way of associating each HMM state with a region of the ECG is to use individual hidden states to represent the P wave, QRS complex, JT interval, and baseline regions of the ECG, as shown in Figure 11.5(a). In practice, it is advantageous to partition the single baseline state into multiple baseline states [9]: one is used to model the baseline region between the end of the P wave and the start of the QRS complex (termed "baseline 1"), and another is used to model the baseline region following the end of the T wave (termed "baseline 2"). This model architecture, which is shown in Figure 11.5(b), will be used throughout the rest of this chapter.5
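As an illustrative sketch (not taken from the text), the five-state architecture of Figure 11.5(b) can be written as a cyclic transition matrix in which each state either persists or advances to its successor; the self-transition probability used here is an assumed placeholder, not a value from the chapter:

```python
import numpy as np

# State order follows one heartbeat cycle in the architecture of Figure 11.5(b).
states = ["P wave", "baseline 1", "QRS complex", "JT interval", "baseline 2"]
p_stay = 0.95  # assumed self-transition probability (placeholder value)

n = len(states)
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = p_stay                    # remain in the current ECG region
    A[i, (i + 1) % n] = 1.0 - p_stay    # or advance; baseline 2 wraps to the next P wave

assert np.allclose(A.sum(axis=1), 1.0)  # rows of a stochastic matrix sum to one
```

The wrap-around entry from "baseline 2" back to "P wave" lets the same model segment every beat in a multi-beat recording.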
Following the choice of model architecture, the next step in training an HMM is to decide upon the specific type of observation model which will be used to capture the statistical characteristics of the signal samples from each hidden state. Common choices for the observation models in an HMM are the Gaussian density, the Gaussian mixture model (GMM), and the autoregressive (AR) model. Section 11.7.4 discusses the different types of observation models in the context of ECG segmentation. Before training a hidden Markov model for ECG segmentation, however, it is beneficial to consider the use of preprocessing techniques for ECG signal normalization.
11.7.2 ECG Signal Normalization
In many pattern recognition tasks it is advantageous to normalize the raw input data prior to any subsequent modeling [24]. A particularly simple and effective form of signal normalization is a linear rescaling of the signal sample values. In the case of the ECG, this procedure can help to normalize the dynamic range of the signal and to stabilize the baseline sections.

A useful form of signal normalization is given by range normalization, which linearly scales the signal samples such that the maximum sample value is set to +1 and the minimum sample value to −1. This can be achieved in a simple two-step process. First, the signal samples are "amplitude shifted" such that the minimum and maximum sample values are equidistant from zero. Next, the signal samples are linearly scaled by dividing by the new maximum sample value. These two steps
5 Note that it is also possible to use an “optional” U wave state (following the T wave) to model any U waves
that may be present in the data, as shown in Figure 11.5(c).
Probabilistic Approaches to ECG Segmentation and Feature Extraction
Figure 11.5 (a–e) Hidden Markov model architectures for ECG interval analysis.
can be stated mathematically as

    x̃n = (xn − (xmin + xmax)/2) / ((xmax − xmin)/2)

where xn is the nth sample of the original signal, x̃n is the corresponding normalized sample, and xmin and xmax are the minimum and maximum values in the original signal, respectively. The range normalization procedure can be made more robust to the presence of artefact or "spikes" in the ECG signal by computing the median of the minimum and maximum signal values over a number of different signal segments. Specifically, the ECG signal is divided evenly into a number of contiguous segments, and the minimum and maximum signal values within each segment are computed. The ECG signal is then range normalized (i.e., scaled) to the median of the minimum and maximum values over the given segments.
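The robust variant described above can be sketched as follows; the function name and the default number of segments are our own assumptions, not values from the text:

```python
import numpy as np

def range_normalize(x, n_segments=8):
    """Scale a signal toward [-1, +1] using the medians of per-segment
    minima and maxima (n_segments=1 recovers plain range normalization)."""
    segments = np.array_split(np.asarray(x, dtype=float), n_segments)
    x_min = np.median([s.min() for s in segments])
    x_max = np.median([s.max() for s in segments])
    shifted = x - (x_min + x_max) / 2.0        # step 1: centre min/max about zero
    return shifted / ((x_max - x_min) / 2.0)   # step 2: divide by the new maximum
```

With `n_segments=1` the minimum maps exactly to −1 and the maximum to +1; with several segments an isolated spike can exceed ±1 but no longer dominates the scaling.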
11.7.3 Types of Model Segmentations
Before considering in detail the results for HMMs applied to the task of ECG segmentation, it is advantageous to consider first the different types of ECG segmentations that can occur in practice. In particular, we can identify two distinct forms of model segmentations when a trained HMM is used to segment a given 10-second ECG signal:

• Single-beat segmentations: Here the model correctly infers only one heartbeat where there is only one beat present in a particular region of the ECG signal.

• Double-beat segmentations: Here the model incorrectly infers two or more heartbeats where there is only one beat present in a particular region of the ECG signal.

Figure 11.6(a, b) shows examples of single-beat and double-beat segmentations, respectively. In the example of the double-beat segmentation, the model incorrectly infers two separate beats in the ECG signal shown. The first beat correctly locates the QRS complex but incorrectly locates the end of the T wave (in the region of baseline prior to the T wave). The second beat then "locates" another QRS complex (of duration one sample) around the onset of the T wave, but correctly locates the end of the T wave in the ECG signal. The specific reason for the occurrence of double-beat segmentations and a method to alleviate this problem are covered in Section 11.9.
In the case of a single-beat segmentation, the segmentation errors can be evaluated by simply computing the discrepancy between each individual automated annotation (e.g., Toff) and the corresponding expert analyst annotation. In the case of a double-beat segmentation, however, it is not possible to associate uniquely each expert annotation with a corresponding automated annotation. Given this, it is therefore not meaningful to attempt to evaluate a measure of annotation "error" for double-beat segmentations. Thus, a more informative approach is simply to report the percentage of single-beat segmentations for a given ECG data set, along with the segmentation errors for the single-beat segmentations only.
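This reporting scheme can be sketched as below; the record field names (`single_beat`, `auto`, `expert`) are illustrative choices, not names from the text:

```python
def summarize_segmentations(results):
    """results: one dict per ECG, with 'single_beat' (bool) and, for
    single-beat segmentations, 'auto' and 'expert' annotation times in ms.
    Returns the percentage of single-beat segmentations and the mean
    absolute annotation error computed over those segmentations only."""
    single = [r for r in results if r["single_beat"]]
    pct_single = 100.0 * len(single) / len(results)
    errors = [abs(r["auto"] - r["expert"]) for r in single]
    mae = sum(errors) / len(errors) if errors else float("nan")
    return pct_single, mae
```

Double-beat segmentations count against the percentage but contribute nothing to the error average, exactly as the text prescribes.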
11.7.4 Performance Evaluation
The technique of cross-validation [24] was used to evaluate the performance of
a hidden Markov model for automated ECG segmentation. In particular, five-fold cross-validation was used. In the first stage, the data set of annotated ECG waveforms was partitioned into five subsets of approximately equal size (in terms of the number of annotated ECG waveforms within each subset). For each "fold" of the cross-validation procedure, a model was trained in a supervised manner using all the annotated ECG waveforms from four of the five subsets. The trained model was then tested on the data from the remaining subset. This procedure was repeated for each of the five possible test subsets. Prior to performing cross-validation, the complete data set of annotated ECG waveforms was randomly permuted in order to remove any possible ordering which could affect the results.

Figure 11.6 Examples of the two different types of HMM segmentations which can occur in practice: (a) single- and (b) double-beat segmentation.
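A minimal sketch of this permute-then-partition procedure (the random seed is an arbitrary choice for reproducibility):

```python
import numpy as np

def five_fold_indices(n_waveforms, seed=0):
    """Randomly permute the waveform indices, split them into five roughly
    equal folds, and yield (train, test) index arrays for each fold."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_waveforms)          # removes any ordering effects
    folds = np.array_split(perm, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```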
As previously stated, for each fold of cross-validation a model was trained in a supervised manner. The transition matrix was estimated from the training waveform annotations using the supervised estimator given in (11.18). For Gaussian observation models, the mean and variance of the full set of signal samples were computed for each model state. For Gaussian mixture models, a combined MDL and EM algorithm was used to compute the optimal number of mixture components and the associated parameter values [25]. For autoregressive6 or AR models, the Burg algorithm [26] was used to infer the model parameters, and the optimal model order was computed using an MDL criterion.
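A minimal sketch of Burg's method for fitting the AR parameters is given below; the MDL-based order selection used in the chapter is omitted, and the function name is our own. Note the sign convention: relative to the AR model in footnote 6, the coefficients satisfy c_i = −a[i].

```python
import numpy as np

def arburg(x, order):
    """Burg's method: estimate the AR polynomial a (a[0] = 1) and the final
    forward/backward prediction error power by successive reflection steps."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    a = np.array([1.0])
    E = np.dot(x, x) / N           # zeroth-order prediction error power
    f = x.copy()                   # forward prediction errors
    b = x.copy()                   # backward prediction errors
    for k in range(order):
        num = -2.0 * np.dot(f[k + 1:], b[k:N - 1])
        den = np.dot(f[k + 1:], f[k + 1:]) + np.dot(b[k:N - 1], b[k:N - 1])
        if den == 0.0:             # signal already perfectly predicted
            break
        rc = num / den             # reflection coefficient, |rc| <= 1
        f_prev = f.copy()
        f[k + 1:] = f_prev[k + 1:] + rc * b[k:N - 1]   # update error sequences
        b[k + 1:] = b[k:N - 1] + rc * f_prev[k + 1:]
        a = np.concatenate([a, [0.0]]) + rc * np.concatenate([[0.0], a[::-1]])
        E *= 1.0 - rc ** 2         # error power shrinks at every order
    return a, E
```

For a perfectly alternating signal, a first-order fit recovers x_t = −x_{t−1} with zero residual error, which is a quick sanity check on the recursion.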
Following the model training for each fold of cross-validation, the trained HMM was then used to segment each 10-second ECG signal in the test set. The segmentation was performed by using the Viterbi algorithm to infer the most probable underlying sequence of hidden states for the given signal. Note that the full 10-second ECG signal was processed, as opposed to just the manually annotated ECG beat, in order to more closely match the way an automated system would be used for ECG interval analysis in practice.
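The Viterbi decoding step can be sketched as follows, working in log probabilities for numerical stability; the per-sample log-likelihoods would come from whichever observation model (Gaussian, GMM, or AR) the HMM uses:

```python
import numpy as np

def viterbi(log_A, log_pi, log_obs):
    """Most probable hidden state path.
    log_A: (S, S) log transition matrix; log_pi: (S,) log initial probabilities;
    log_obs: (T, S) log-likelihood of each sample under each state's model."""
    T, S = log_obs.shape
    delta = log_pi + log_obs[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[i, j]: reach j via i
        psi[t] = scores.argmax(axis=0)         # best predecessor of each state
        delta = scores.max(axis=0) + log_obs[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):             # backtrack through predecessors
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

Applied to a full 10-second record, the resulting state path directly yields the waveform boundaries (each boundary is a transition between two states).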
Next, for each ECG, the model annotations corresponding to the particular beat which had been manually annotated were then extracted. In the case of a single-beat segmentation, the absolute differences between the model annotations and the associated expert analyst annotations were computed. In the case of a double-beat segmentation, no annotation errors were computed. Once the cross-validation procedure was complete, the five sets of annotation "errors" were then averaged to produce the final results.

Table 11.1 shows the cross-validation results for HMMs trained on the raw ECG signal data. In particular, the table shows the percentage of single-beat segmentations and the annotation errors for different types of HMM observation models, with and without range normalization, for ECG leads II and V2.
The results for each lead demonstrate the utility of normalizing the ECG signals (prior to training and testing) with the range normalization method. In each case, the percentage of single-beat segmentations produced by an HMM (with a Gaussian observation model) is considerably increased when range normalization is employed. For lead V2, it is notable that the annotation errors (evaluated on the single-beat segmentations only) for the model with range normalization are greater than those for the model with no normalization. This is most likely due to the fact that the latter model produces double-beat segmentations for those waveforms that naturally give rise to larger annotation errors (and hence these waveforms are excluded from the annotation error computations for this model).
The most important aspect of the results is the considerable performance improvement gained by using autoregressive observation models as opposed to Gaussian or Gaussian mixture models. The use of AR observation models enables each HMM state to capture the statistical dependencies between successive groups of observations. In the case of the ECG, this allows the HMM to take account of the shape of each of the ECG waveform features. Thus, as expected, these models lead to a significant performance improvement (in terms of both the percentage of single-beat segmentations and the magnitude of the annotation errors) compared with models which assume the observations within each state are i.i.d.

6 In autoregressive modeling, the signal sample at time t is considered to be a linear combination of a number of previous signal samples plus an additive noise term. Specifically, an AR model of order m is given by x_t = Σ_{i=1}^{m} c_i x_{t−i} + ε_t, where the c_i are the AR model coefficients and ε_t can be viewed as a random residual noise term at each time step.
Table 11.1 Five-Fold Cross-Validation Results for HMMs Trained on the Raw ECG Signal Data from Leads II and V2

Lead II

Hidden Markov Model Specification                               % of Single-Beat   Mean Absolute Errors (ms)
                                                                Segmentations      Pon     Q       J      Toff
Standard HMM, Gaussian observation model, no normalization       5.7%              175.3   108.0   99.0   243.7
Standard HMM, Gaussian observation model, range normalization   69.8%              485.0    35.8   73.8   338.4
Standard HMM, GMM observation model, range normalization        57.5%              272.9    48.7   75.6   326.1
Standard HMM, AR observation model, range normalization         71.7%               49.2    10.3   12.5    52.8

Lead V2

Hidden Markov Model Specification                               % of Single-Beat   Mean Absolute Errors (ms)
                                                                Segmentations      Pon     Q       J      Toff
Standard HMM, Gaussian observation model, no normalization      33.6%              211.5    14.5   20.7    31.5
Standard HMM, Gaussian observation model, range normalization   77.9%              293.1    49.2   50.7   278.5
Standard HMM, GMM observation model, range normalization        57.4%              255.2    49.9   65.0   249.5
Standard HMM, AR observation model, range normalization         87.7%               43.4     5.4    7.6    32.4
Despite the advantages offered by AR observation models, the mean annotation errors for the associated HMMs are still considerably larger than the inter-analyst variability present in the data set annotations. In particular, the T wave offset annotation errors for leads II and V2 are 52.8 ms and 32.4 ms, respectively. This "level of accuracy" is not sufficient to enable the trained model to be used as an effective means for automated ECG interval analysis in practice.

The fundamental problem with developing HMMs based on the raw ECG signal data is that the state observation models must be flexible enough to capture the statistical characteristics governing the overall shape of each of the ECG waveform features. Although AR observation models provide a first step in this direction, these models are not ideally suited to representing the waveform features of the ECG. In particular, it is unlikely that a single AR model can successfully represent the statistical dependencies across whole waveform features for a range of ECGs. Thus, it may be advantageous to utilize multiple AR models (each with a separate model order) to represent the different regions of each ECG waveform feature.

An alternative approach to overcoming the i.i.d. assumption within each HMM state is to encode information from "neighboring" signal samples into the representation of the signal itself. More precisely, each individual signal sample is transformed to a vector of transform coefficients which captures (approximately) the shape of the signal within a given region of the sample itself. This new representation can then be used as the basis for training a hidden Markov model, using any of the standard observation models previously described. We now consider the utility of this approach for automated ECG interval analysis.
11.8 Wavelet Encoding of the ECG
11.8.1 Wavelet Transforms
Wavelets are a class of functions that possess compact support and form a basis for all finite energy signals. They are able to capture the nonstationary spectral characteristics of a signal by decomposing it over a set of atoms which are localized in both time and frequency. These atoms are generated by scaling and translating a single mother wavelet.
The most popular wavelet transform algorithm is the discrete wavelet transform (DWT), which uses the set of dyadic scales (i.e., those based on powers of two) and translates of the mother wavelet to form an orthonormal basis for signal analysis. The DWT is therefore most suited to applications such as data compression, where a compact description of a signal is required. An alternative transform is derived by allowing the translation parameter to vary continuously, whilst restricting the scale parameter to a dyadic scale (thus, the set of time-frequency atoms now forms a frame). This leads to the undecimated wavelet transform (UWT),7 which produces one wavelet coefficient per signal sample at each level of the decomposition. In practice the UWT for a signal of length N can be computed in O(N log N) operations using an efficient filter bank structure [27]. Figure 11.7 shows a schematic illustration of the UWT filter bank algorithm, where h and g represent the lowpass and highpass "conjugate mirror filters" for each level of the UWT decomposition.

The UWT is particularly well suited to ECG interval analysis as it provides a time-frequency description of the ECG signal on a sample-by-sample basis. In addition, the UWT coefficients are translation-invariant (unlike the DWT coefficients), which is important for pattern recognition applications.
7 The undecimated wavelet transform is also known as the stationary wavelet transform and the
translation-invariant wavelet transform.
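The à trous filter bank can be sketched as below; Haar filters are used here purely for brevity (the chapter itself uses the coif1 wavelet), and circular convolution keeps one coefficient per input sample at every level, which is what makes the coefficients translation-invariant:

```python
import numpy as np

def circular_conv(x, h):
    """Circular convolution via the FFT; output length equals input length."""
    N = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, N)))

def uwt(x, h, g, levels):
    """À trous UWT: at level k the filters are dilated by inserting 2**k - 1
    zeros between taps, and no decimation takes place."""
    approx = np.asarray(x, dtype=float)
    details = []
    for k in range(levels):
        hk = np.zeros((len(h), 2 ** k)); hk[:, 0] = h   # dilate lowpass filter
        gk = np.zeros((len(g), 2 ** k)); gk[:, 0] = g   # dilate highpass filter
        details.append(circular_conv(approx, gk.ravel()))
        approx = circular_conv(approx, hk.ravel())
    return details, approx

# Haar analysis filters (illustrative stand-in for the coif1 pair)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)
```

Because every level retains one coefficient per sample, stacking the detail outputs of levels 1 to L gives an L-dimensional feature vector for each signal sample, which is exactly the kind of encoding used in the next section.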
Figure 11.7 Filter bank for the undecimated wavelet transform. At each level k of the transform, the operators g and h correspond to the highpass and lowpass conjugate mirror filters at that particular level.
11.8.2 HMMs with Wavelet-Encoded ECG
In our experiments we found that the Coiflet wavelet with two vanishing moments resulted in the best overall segmentation performance. Figure 11.8 shows the squared magnitude responses for the lowpass, bandpass, and highpass filters associated with this wavelet (which is commonly known as the coif1 wavelet).

In order to use the UWT for ECG encoding, the UWT wavelet coefficients from levels 1 to 7 were used to form a seven-dimensional encoding for each ECG signal. Table 11.2 shows the five-fold cross-validation results for HMMs trained on ECG waveforms from leads II and V2 which had been encoded in this manner (using range normalization prior to the encoding).
The results presented in Table 11.2 clearly demonstrate the considerable performance improvement of HMMs trained with the UWT encoding (albeit at the expense of a relatively low percentage of single-beat segmentations), compared with similar models trained using the raw ECG time series. In particular, the Q and Toff single-beat segmentation errors of 5.5 ms and 12.4 ms for lead II, and 3.3 ms and 9.5 ms for lead V2, are significantly better than the corresponding errors for the HMM with an autoregressive observation model.
Despite the performance improvement gained from the use of wavelet methods with hidden Markov models, the models still suffer from the problem of double-beat segmentations. In the following section we consider a modification to the HMM architecture in order to overcome this problem. In particular, we make use of the knowledge that double-beat segmentations are characterized by the model inferring a number of states with a duration that is much shorter than the minimum state duration observed with real ECG signals. This observation leads on to the subject of duration constraints for hidden Markov models.
11.9 Duration Modeling for Robust Segmentations
A significant limitation of the standard HMM is the manner in which it models state durations. For a given state i with self-transition coefficient a_ii, the probability mass
Figure 11.8 Squared magnitude responses of the highpass, bandpass, and lowpass filters associated with the coif1 wavelet (and associated scaling function) over a range of different levels of the undecimated wavelet transform.
Table 11.2 Five-Fold Cross-Validation Results for HMMs Trained on the Wavelet-Encoded ECG Signal Data from Leads II and V2

Lead II

Hidden Markov Model Specification                        % of Single-Beat   Mean Absolute Errors (ms)
                                                         Segmentations      Pon    Q     J     Toff
Standard HMM, Gaussian observation model, UWT encoding   29.2%              26.1   3.7   5.0   26.8
Standard HMM, GMM observation model, UWT encoding        26.4%              12.9   5.5   9.6   12.4

Lead V2

Hidden Markov Model Specification                        % of Single-Beat   Mean Absolute Errors (ms)
                                                         Segmentations      Pon    Q     J     Toff
Standard HMM, Gaussian observation model, UWT encoding   73.0%              20.0   4.1   8.7   15.8
Standard HMM, GMM observation model, UWT encoding        59.0%               9.9   3.3   5.9    9.5

The encodings are derived from the seven-dimensional coif1 wavelet coefficients resulting from a level 7 UWT decomposition of each ECG signal. In each case range normalization was used prior to the encoding.
function for the state duration d is a geometric distribution, given by

    p_i(d) = (a_ii)^(d−1) (1 − a_ii)                                    (11.23)

For the waveform features of the ECG signal, this geometric distribution is inappropriate. In particular, the distribution naturally favors state sequences of a very short duration. Conversely, real-world ECG waveform features do not occur for arbitrarily short durations, and there is typically a minimum duration for each of the ECG features. In practice this "mismatch" between the statistical properties of the model and those of the ECG results in unreliable "double-beat" segmentations, as discussed previously in Section 11.7.3.
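A quick numerical check of (11.23), using an assumed self-transition probability, confirms that the geometric distribution always places its largest mass on d = 1 and has mean duration 1/(1 − a_ii), which is why short spurious states are so readily inferred:

```python
import numpy as np

a_ii = 0.9                        # assumed self-transition probability
d = np.arange(1, 200)             # durations in samples (truncated tail)
p = a_ii ** (d - 1) * (1 - a_ii)  # equation (11.23)

# d = 1 is always the single most probable duration under this model
assert p[0] == p.max()

# the mean duration approaches 1 / (1 - a_ii) = 10 samples
mean_duration = np.sum(d * p) / np.sum(p)
```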
Unfortunately, double-beat segmentations can significantly impact upon the reliability of the automated QT interval measurements produced by the model. Thus, in order to make use of the model for automated QT interval analysis, the robustness of the segmentation process must be improved. This can be achieved by incorporating duration constraints into the HMM architecture. Each duration constraint takes the form of a number specifying the minimum duration for a particular state in the model. For example, the duration constraint for the T wave state is simply the minimum possible duration (in samples) for a T wave. Such values can be estimated in practice by examining the durations of the waveform features for a large number of annotated ECG waveforms.
ro-Once the duration constraints have been chosen, they are incorporated into
the model in the following manner: For each state k with a minimum duration of
dmin(k), we augment the model with dmin(k)− 1 additional states directly preceding
Figure 11.9 Graphical illustration of incorporating a duration constraint into an HMM (the dashed box indicates tied observation distributions).
Table 11.3 Five-Fold Cross-Validation Results for HMMs with Built-In Duration Constraints Trained on the Wavelet-Encoded ECG Signal Data from Leads II and V2

Lead II

Hidden Markov Model Specification                               % of Single-Beat   Mean Absolute Errors (ms)
                                                                Segmentations      Pon   Q     J     Toff
Duration-constrained HMM, GMM observation model, UWT encoding   100.0%             8.3   3.5   7.2   12.7

Lead V2

Hidden Markov Model Specification                               % of Single-Beat   Mean Absolute Errors (ms)
                                                                Segmentations      Pon   Q     J     Toff
Duration-constrained HMM, GMM observation model, UWT encoding   100.0%             9.7   3.9   5.5   11.4
the original state k. Each additional state has a self-transition probability of zero, and a probability of one of transitioning to the state to its right. Thus, taken together, these states form a simple left-right Markov chain, where each state in the chain is only occupied for at most one time sample (during any run through the chain). The most important feature of this chain is that the parameters of the observation density for each state are identical to the corresponding parameters of the original state k (this is known as "tying"). Thus the observations associated with the d_min states identified with a particular waveform feature are governed by a single set of parameters (which is shared by all d_min states). The overall procedure for incorporating duration constraints into the HMM architecture is illustrated graphically in Figure 11.9.
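The state-expansion step can be sketched as follows; only the within-feature transitions are built here (the residual 1 − a_kk probability of leaving the final state for the next waveform state is omitted), and in a full model the observation parameters of all d_min states would be tied to those of the original state k:

```python
import numpy as np

def expand_state(a_kk, d_min):
    """Transition matrix of the left-right chain that enforces a minimum
    duration: d_min - 1 forced-advance states followed by the original
    self-looping state k (whose self-transition probability is a_kk)."""
    A = np.zeros((d_min, d_min))
    for i in range(d_min - 1):
        A[i, i + 1] = 1.0       # zero self-transition: must move right
    A[-1, -1] = a_kk            # the original state keeps its self-loop;
                                # its exit mass (1 - a_kk) goes to the next
                                # waveform state, not shown in this submatrix
    return A

A = expand_state(0.9, d_min=5)
```

Any path through this chain must spend at least d_min samples in the feature, which is precisely what eliminates the one-sample spurious states behind double-beat segmentations.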
Table 11.3 shows the five-fold cross-validation results for a hidden Markov model with built-in duration constraints. For each fold of the cross-validation procedure, the minimum state duration d_min(k) was calculated as 80% of the minimum duration present in the annotated training data for each particular state. The set of duration constraints was then incorporated into the HMM architecture and the resulting model was trained in a supervised fashion.
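The 80% rule can be sketched as below; the dictionary layout is our own, not a structure from the text:

```python
def duration_constraints(durations_by_state, fraction=0.8):
    """durations_by_state: maps each state name to the list of feature
    durations (in samples) observed in the annotated training data.
    Returns the minimum-duration constraint for each state."""
    return {state: max(1, int(fraction * min(durs)))
            for state, durs in durations_by_state.items()}
```

Taking a fraction of the observed minimum (rather than the minimum itself) leaves some slack for unseen test waveforms that are slightly shorter than any training example.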
The results demonstrate that the duration-constrained HMM eliminates the problem of double-beat segmentations. In addition, the annotation errors for lead II are of a comparable standard to the best results presented for the single-beat segmentations only in the previous section.
11.10 Conclusions
In this chapter we have focused on the two core issues in utilizing a probabilistic modeling approach for the task of automated ECG interval analysis: the choice of representation for the ECG signal and the choice of model for the segmentation. We have demonstrated that wavelet methods, and in particular the undecimated wavelet transform, can be used to generate an encoding of the ECG which is tuned to the unique spectral characteristics of the ECG waveform features. With this representation, the performance of the models on new unseen ECG waveforms is significantly better than that of similar models trained on the raw time series data. We have also shown that the robustness of the resulting segmentations can be considerably improved through the use of state duration constraints with hidden Markov models.
A key advantage of probabilistic modeling over traditional techniques for ECG segmentation is the ability of the model to generate a statistical confidence measure in its analysis of a given ECG waveform. As discussed previously in Section 11.3, current automated ECG interval analysis systems are unable to differentiate between normal ECG waveforms (for which the automated annotations are generally reliable) and abnormal or unusual ECG waveforms (for which the automated annotations are frequently unreliable). By utilizing a confidence-based approach to automated ECG interval analysis, however, we can automatically highlight those waveforms which are least suitable for analysis by machine (and thus most in need of analysis by a human expert). This strategy therefore provides an effective way to combine the twin advantages of manual and automated ECG interval analysis [3].
References

[1] Morganroth, J., and H. M. Pyper, "The Use of Electrocardiograms in Clinical Drug Development: Part 1," Clinical Research Focus, Vol. 12, No. 5, 2001, pp. 17–23.
[2] Houghton, A. R., and D. Gray, Making Sense of the ECG, London, U.K.: Arnold, 1997.
[3] Hughes, N. P., and L. Tarassenko, "Automated QT Interval Analysis with Confidence Measures," Computers in Cardiology, Vol. 31, 2004.
[4] Jané, R., et al., "Evaluation of an Automatic Threshold Based Detector of Waveform Limits in Holter ECG with QT Database," Computers in Cardiology, IEEE Press, 1997, pp. 295–298.
[5] Pan, J., and W. J. Tompkins, "A Real-Time QRS Detection Algorithm," IEEE Trans. Biomed. Eng., Vol. 32, No. 3, 1985, pp. 230–236.
[6] Lepeschkin, E., and B. Surawicz, "The Measurement of the Q-T Interval of the Electrocardiogram," Circulation, Vol. VI, September 1952, pp. 378–388.
[7] Xue, Q., and S. Reddy, "Algorithms for Computerized QT Analysis," Journal of Electrocardiology, Supplement, Vol. 30, 1998, pp. 181–186.
[8] Malik, M., "Errors and Misconceptions in ECG Measurement Used for the Detection of Drug Induced QT Interval Prolongation," Journal of Electrocardiology, Supplement, Vol. 37, 2004, pp. 25–33.
[9] Hughes, N. P., "Probabilistic Models for Automated ECG Interval Analysis," Ph.D. dissertation, University of Oxford, 2006.
[10] Rabiner, L. R., "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, 1989, pp. 257–286.
[11] Coast, D. A., et al., "An Approach to Cardiac Arrhythmia Analysis Using Hidden Markov Models," IEEE Trans. Biomed. Eng., Vol. 37, No. 9, 1990, pp. 826–835.
[12] Koski, A., "Modeling ECG Signals with Hidden Markov Models," Artificial Intelligence
[17] Nabney, I. T., Netlab: Algorithms for Pattern Recognition, London, U.K.: Springer, 2002.
[18] Jordan, M. I., "Graphical Models," Statistical Science, Special Issue on Bayesian Statistics, Vol. 19, 2004, pp. 140–155.
[19] Brand, M., Coupled Hidden Markov Models for Modeling Interactive Processes, Technical Report 405, MIT Media Lab, 1997.
[20] Ghahramani, Z., and M. I. Jordan, "Factorial Hidden Markov Models," Machine Learning, Vol. 29, 1997, pp. 245–273.
[21] Viterbi, A. J., "Error Bounds for Convolutional Codes and An Asymptotically Optimal Decoding Algorithm," IEEE Trans. on Information Theory, Vol. IT-13, April 1967, pp. 260–269.
[22] Forney, G. D., "The Viterbi Algorithm," Proc. of the IEEE, Vol. 61, March 1973, pp. 268–278.
[23] Dempster, A. P., N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data Via the EM Algorithm," Journal of the Royal Statistical Society Series B, Vol. 39, No. 1, 1977, pp. 1–38.
[24] Bishop, C. M., Neural Networks for Pattern Recognition, Oxford, U.K.: Oxford University Press, 1995.
[25] Figueiredo, M. A. T., and A. K. Jain, "Unsupervised Learning of Finite Mixture Models," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 3, 2002, pp. 381–396.
[26] Hayes, M. H., Statistical Digital Signal Processing and Modeling, New York: Wiley, 1996.
[27] Mallat, S., A Wavelet Tour of Signal Processing, 2nd ed., London, U.K.: Academic Press, 1999.
CHAPTER 12

Supervised Learning Methods for ECG Classification/Neural Networks and SVM Approaches

Stanislaw Osowski, Linh Tran Hoai, and Tomasz Markiewicz
an important approach. Many solutions based on this approach have been proposed. Some of the best known techniques are the multilayer perceptron (MLP) [2], self-organizing maps (SOM) [1, 3], learning vector quantization (LVQ) [1], linear discriminant systems [6], fuzzy or neuro-fuzzy systems [8], support vector machines (SVM) [5], and the combinations of different neural-based solutions, so-called hybrid systems [4].

A typical heartbeat recognition system based on neural network classifiers usually builds (trains) different models, exploiting either different classifier network structures or different preprocessing methods of the data, and then the best one is chosen, while the rest are discarded. However, each method of data processing might be sensitive to artifacts and outliers. Hence, a consensus of experts, integrating available information into one final pattern recognition system, is expected to produce a classifier of the highest quality, that is, of the least possible classification errors.
In this chapter we will discuss different solutions for ECG classification based on the application of supervised learning networks, including neural networks and SVM. Two different preprocessing methods for generation of features are illustrated: higher-order statistics (HOS) and Hermite characterization of the QRS complex of the registered ECG waveform. To achieve better performance of the recognition system, we propose the combination of multiple classifiers by a weighted voting principle. This technique will be illustrated using SVM-based classifiers. In this example the weights of the integrating matrix are adjusted according to the results of the individual classifiers' performance on the learning data. The proposed solutions are verified on the MIT-BIH Arrhythmia Database [9] heartbeat recognition problems.
12.2 Generation of Features
The recognition and classification of patterns, including ECG signals, require the generation of features [7] that accurately characterize these patterns in order to enable their type or class differentiation.
Such features represent the patterns in such a way that the differences of morphology of the ECG waveforms are suppressed for the same type (class) of heartbeats, and enhanced for waveforms belonging to different types of beats. This is a very important capability, since we observe great morphological variations in signals belonging to different clinical classes. This is, for example, observed in the ECG waveforms contained in the MIT-BIH Arrhythmia Database [9]. In this database there are ECG waveforms of 12 types of abnormal beats: left bundle branch block (L), right bundle branch block (R), atrial premature beat (A), aberrated atrial premature beat (a), nodal (junctional) premature beat (J), ventricular premature beat (V), fusion of ventricular and normal beat (F), ventricular flutter wave (I), nodal (junctional) escape beat (j), ventricular escape beat (E), supraventricular premature beat (S), and fusion of paced and normal beat (f), as well as the waveforms corresponding to the normal sinus rhythm (N). Exemplary waveforms of ECG from one patient [9], corresponding to the normal sinus rhythm (N) and three types of abnormal rhythms (L, R, and V), are presented in Figure 12.1. The vertical axis y is measured in µV and the horizontal axis x in points (at a 360-Hz sampling rate one point corresponds to approximately 2.8 ms).
It is clear that there is a great variety of morphologies among the heartbeats belonging to one class, even for the same patient. Moreover, beats belonging to different classes can be morphologically similar to each other (look, for example, at the L-type rhythms and some V-type rhythms). They occupy a similar range of values and frequencies; thus, it is difficult to recognize one from the other on the basis of only time or frequency representations. Different feature extraction techniques have been applied. Traditional representations include features describing the morphology of the QRS complex, such as RR intervals, the width of the QRS complex [1, 3, 4, 6], and wave interval and wave shape features [6]. Some authors have processed features resulting from Fourier [2] or wavelet transformations [10] of the ECG. Clustering of the ECG data, using methods such as self-organizing maps [3] or learning vector quantization [1], as well as internal features resulting from the neural preprocessing stages [1], have also been exploited. Other important feature extraction methods generate statistical descriptors [5] or orthogonal polynomial representations [3, 8]. None of these methods is, of course, perfect and fully satisfactory. In this chapter we will illustrate supervised classification applications that rely on the processing of features originating from the description of the QRS complex by using higher-order statistics and the Hermite basis function expansion.
12.2.1 Hermite Basis Function Expansion
In the Hermite basis function expansion method, the QRS complex is represented by a series of Hermite functions. This approach successfully exploits existing similarities between the shapes of Hermite basis functions and the QRS complexes of the ECG waveforms under analysis. Moreover, this characterization includes a width parameter, which provides a good representation of beats with large differences in QRS duration. Let us denote the QRS complex of the ECG curve by x(t). Its expansion into a Hermite series may be written in the following way:
x(t) = Σ_{n=0}^{N−1} c_n φ_n(t, σ)   (12.1)
where c_n are the expansion coefficients, σ is the width parameter, and φ_n(t, σ) are the Hermite basis functions of the nth order, defined as follows [3]:

φ_n(t, σ) = (σ 2^n n! √π)^{−1/2} e^{−t²/(2σ²)} H_n(t/σ)   (12.2)

where H_n(t/σ) is the Hermite polynomial of the nth order. The Hermite polynomials satisfy the following recurrence relation:

H_n(x) = 2x H_{n−1}(x) − 2(n − 1) H_{n−2}(x)   (12.3)

with H_0(x) = 1 and H_1(x) = 2x, for n = 2, 3, .... The higher the order of the
Hermite polynomial, the higher its frequency of changes in the time domain, and the better its capability to reconstruct the quick changes of the ECG signal. The coefficients c_n of the Hermite basis function expansion may be treated as the features used in the recognition process. They may be obtained by minimizing the sum squared error, defined as

E = Σ_t [x(t) − Σ_{n=0}^{N−1} c_n φ_n(t, σ)]²   (12.4)
Minimizing this error function leads to a set of linear equations with respect to the coefficients c_n. These have been solved by using singular value decomposition (SVD) and the pseudo-inverse technique [11]. In numerical calculations, we have represented the QRS segment of the ECG signal by 91 data points around the R peak (45 points before and 45 after). A data sampling rate equal to 360 Hz generates a window of 250 ms, which is long enough to cover a typical QRS complex. The data have been additionally expanded by adding 45 zeros to each end of the QRS segment. This additional information is added to reinforce the idea that beats do not exist outside the QRS complex. Subtracting the mean level of the first and the last points normalizes the ECG signals. The width σ was chosen proportional to the width
of the QRS complex. These modified QRS complexes of the ECG have been decomposed into a linear combination of Hermite basis functions. Empirical analyses have shown that 15 Hermite coefficients allow a satisfactorily good reconstruction of the QRS curve in terms of the representation of its most important details [3]. Figure 12.2 depicts the representation of an exemplary normalized QRS complex by 15 Hermite basis functions. The horizontal axis of the figure is measured in points, identically as in Figure 12.1.
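Under the settings above (91 samples plus 45 zero-padding samples per side, 15 basis functions), the whole procedure of (12.1)–(12.4) can be sketched in Python. Here `numpy.linalg.lstsq` stands in for the SVD/pseudo-inverse solver of [11], and the "QRS" is a synthetic stand-in for real data, with σ chosen to match it:

```python
import numpy as np
from math import factorial, pi, sqrt

def hermite_basis(n, t, sigma):
    """phi_n(t, sigma): the nth-order Hermite basis function of (12.2)."""
    x = t / sigma
    # Hermite polynomial H_n(x) via the recurrence (12.3)
    h_prev, h = np.ones_like(x), 2.0 * x
    if n == 0:
        h = h_prev
    for k in range(2, n + 1):
        h_prev, h = h, 2.0 * x * h - 2.0 * (k - 1) * h_prev
    norm = 1.0 / sqrt(sigma * (2.0 ** n) * factorial(n) * sqrt(pi))
    return norm * np.exp(-t ** 2 / (2.0 * sigma ** 2)) * h

# 91 samples around the R peak plus 45 zero-padding samples per side = 181 points
t = np.arange(-90, 91, dtype=float)
qrs = np.exp(-t ** 2 / 50.0)        # synthetic stand-in for a normalized QRS
sigma = 5.0                         # width parameter, here matched to the test pulse
N = 15                              # number of basis functions / features

Phi = np.column_stack([hermite_basis(n, t, sigma) for n in range(N)])
coeffs, *_ = np.linalg.lstsq(Phi, qrs, rcond=None)   # the 15 features c_n
reconstruction = Phi @ coeffs
print(coeffs.shape)                 # -> (15,)
```

Because the synthetic pulse lies exactly in the span of the basis, the reconstruction here is essentially exact; for real QRS complexes the 15-term fit is only approximate, as Figure 12.2 illustrates.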
These coefficients, together with two classical signal features, namely the instantaneous RR interval length of the beat (the time span between two consecutive R points) and the average RR interval of the 10 preceding beats, form the 17-element feature vector x applied to the input of the classifiers. These two features are usually included to relate the currently processed waveform segment to the average length of the last processed segments.
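A sketch of how these two temporal features and the resulting 17-element vector might be computed (the R-peak sample locations and all names here are hypothetical, for illustration only):

```python
import numpy as np

FS = 360.0   # sampling rate of the MIT-BIH records, in Hz

def temporal_features(r_peaks):
    """Instantaneous RR interval of the current beat and the average RR
    interval of up to 10 preceding beats, both in seconds."""
    rr = np.diff(np.asarray(r_peaks, dtype=float)) / FS
    rr_inst = rr[-1]                                       # current beat's RR
    rr_mean = rr[-11:-1].mean() if len(rr) > 1 else rr_inst
    return rr_inst, rr_mean

def feature_vector(hermite_coeffs, r_peaks):
    """15 Hermite coefficients + 2 temporal features = 17 elements."""
    rr_inst, rr_mean = temporal_features(r_peaks)
    return np.concatenate([hermite_coeffs, [rr_inst, rr_mean]])

# hypothetical R-peak sample indices: a steady rhythm of one beat per 0.8 s
peaks = np.arange(12) * 288          # 288 samples = 0.8 s at 360 Hz
feat = feature_vector(np.zeros(15), peaks)
print(feat.shape)                    # -> (17,)
```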
Figure 12.2 The approximation of the QRS complex by 15 Hermite basis functions. (From: [8]. © 2004 IEEE. Reprinted with permission.)
12.2.2 HOS Features of the ECG
Another important approach to ECG feature generation is the application of statistical descriptions of the QRS curves. Three types of statistics have been applied: the second-, third-, and fourth-order cumulants. The cumulants are the coefficients of the Taylor expansion around s = 0 of the cumulant generating function of the variable x, defined as φ_x(s) = ln{E[e^{sx}]}, where E denotes the expectation operator [12]. They can also be expressed in terms of the well-known statistical moments as
their linear or nonlinear combinations. For a zero-mean stationary process x(t), the second- and third-order cumulants are equal to their corresponding moments:

c_{2x}(τ_1) = m_{2x}(τ_1)   (12.5)

c_{3x}(τ_1, τ_2) = m_{3x}(τ_1, τ_2)   (12.6)
The nth-order moment of x(k), m_{nx}(τ_1, τ_2, ..., τ_{n−1}), is formally defined [12] as the coefficient in the Taylor expansion around s = 0 of the moment generating function ϕ_x(s), where ϕ_x(s) = E[e^{sx}]. Equivalently, each nth-order statistical moment can be calculated by taking an expectation over the process multiplied by (n − 1) lagged versions of itself. The expression for the fourth-order cumulant is a bit more complex [12]:
c_{4x}(τ_1, τ_2, τ_3) = m_{4x}(τ_1, τ_2, τ_3) − m_{2x}(τ_1) m_{2x}(τ_3 − τ_2)
− m_{2x}(τ_2) m_{2x}(τ_3 − τ_1) − m_{2x}(τ_3) m_{2x}(τ_2 − τ_1)   (12.7)
In these expressions c_{nx} denotes the nth-order cumulant and m_{nx} the nth-order statistical moment of the process x(k), while τ_1, τ_2, τ_3 are the time lags.
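Estimates of these cumulants for a finite-length, zero-mean signal can be sketched as follows. The diagonal slices τ_1 = τ_2 = τ_3 = τ are the ones used for feature generation below; the estimates are biased and all names are illustrative:

```python
import numpy as np

def moment(x, lags):
    """Biased estimate of the moment E[x(k) * prod_i x(k + tau_i)]."""
    n = len(x)
    tmax = max(lags)
    prod = x[: n - tmax].copy()
    for tau in lags:
        prod *= x[tau : n - tmax + tau]
    return prod.mean()

def diagonal_cumulants(x, tau):
    """Diagonal slices of the second-, third-, and fourth-order cumulants
    at lag tau, per (12.5)-(12.7), for a zero-mean process."""
    c2 = moment(x, [tau])                 # (12.5): c_2x(tau) = m_2x(tau)
    c3 = moment(x, [tau, tau])            # (12.6), diagonal slice
    m4 = moment(x, [tau, tau, tau])
    m2_0 = moment(x, [0])                 # m_2x(0): the signal power
    c4 = m4 - 3.0 * c2 * m2_0             # (12.7) with tau1 = tau2 = tau3 = tau
    return c2, c3, c4

# for zero-mean Gaussian noise all cumulants above second order vanish
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
x -= x.mean()
c2, c3, c4 = diagonal_cumulants(x, 15)
print(c3, c4)   # both should be close to 0 for Gaussian noise
```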
Table 12.1 The Variance of the Chosen Heart Rhythms of the MIT-BIH AD and Their Cumulant Characterizations

Rhythm Type   Original QRS Signal   Second-Order Cumulants   Third-Order Cumulants   Fourth-Order Cumulants
N             0.74E-2               0.31E-2                  0.28E-2                 0.24E-2
L             1.46E-2               0.60E-2                  1.03E-2                 0.51E-2
R             1.49E-2               0.94E-2                  1.06E-2                 0.55E-2
A             1.47E-2               0.67E-2                  0.85E-2                 0.38E-2
V             1.64E-2               0.68E-2                  0.71E-2                 0.54E-2
I             1.72E-2               0.52E-2                  0.34E-2                 0.24E-2
E             0.59E-2               0.42E-2                  0.40E-2                 0.60E-2
We have chosen the values of the cumulants of the second, third, and fourth orders at five points distributed evenly within the QRS length (for the third- and fourth-order cumulants the diagonal slices have been applied) as the features used for the heart rhythm recognition application examples. We have chosen a five-point representation to achieve a feature coding scheme (number of features) comparable with the Hermite representation. For a 91-element vector representation of the QRS complex, the cumulants corresponding to the time lags of 15, 30, 45, 60, and 75 have been chosen. Additionally, we have added two temporal features: one corresponding to the instantaneous RR interval of the beat and the second representing the average RR interval duration of the 10 preceding beats. In this way each beat has been represented by a 17-element feature vector, with the first 15 elements corresponding to the higher-order statistics of the QRS complex (the second-, third-, and fourth-order cumulants, each represented by five values) and the last two being the temporal features of the actual QRS signal. The application of the cumulant characterization of QRS complexes reduces the relative spread of the ECG characteristics belonging to the same type of heart rhythm and in this way makes the classification relatively easier. This is well seen in the example of the variance of the signals corresponding to the normal (N) and abnormal (L, R, A, V, I, E) beats. Table 12.1 presents the values of variance for the chosen seven types of normalized heartbeats (the original QRS complex) and their cumulant characterizations for over 6,600 beats of the MIT-BIH AD [9].
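A sketch of the resulting 17-element HOS feature vector, grouped by cumulant order as described above (the synthetic QRS and the RR values are hypothetical placeholders, and the helper names are illustrative):

```python
import numpy as np

def diag_cum(x, tau):
    """Second-, third-, and fourth-order cumulants (diagonal slices) of a
    zero-mean signal at lag tau, using biased moment estimates."""
    n = len(x)
    a, b = x[: n - tau], x[tau:]
    c2 = (a * b).mean()                    # c_2x(tau)
    c3 = (a * b * b).mean()                # c_3x(tau, tau)
    m4 = (a * b * b * b).mean()
    c4 = m4 - 3.0 * c2 * (x * x).mean()    # (12.7), diagonal slice
    return c2, c3, c4

def hos_features(qrs, rr_inst, rr_mean, lags=(15, 30, 45, 60, 75)):
    """17-element vector: five values per cumulant order, grouped by order,
    followed by the two temporal RR features."""
    x = qrs - qrs.mean()
    per_lag = [diag_cum(x, tau) for tau in lags]
    cums = [c for order in zip(*per_lag) for c in order]   # group by order
    return np.array(cums + [rr_inst, rr_mean])

qrs = np.sin(np.linspace(0.0, np.pi, 91)) ** 3   # synthetic stand-in for a QRS
f = hos_features(qrs, 0.8, 0.79)                 # RR features are hypothetical
print(f.shape)                                   # -> (17,)
```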
It is evident that the variance of the cumulant characteristics has been significantly reduced with respect to the variance of the original signals. This means that the spreads of the parameter values characterizing the ECG signals belonging to the same class are now smaller, which makes the recognition problem much easier. This phenomenon has been confirmed by many numerical experiments for all types of beats existing in the MIT-BIH AD.

12.3 Supervised Neural Classifiers
The components of the input vector, x, containing the features of the ECG pattern represent the input applied to the classifiers. Supervised learning neural classifiers are currently considered to be among the most effective classification approaches [7, 13, 14]. We will concentrate on the following models: the MLP, the hybrid