

IMPROVED ALGORITHM FOR ADABOOST WITH SVM BASE CLASSIFIERS

Xiaodan WANG, Chongming WU, Chunying ZHENG, Wei WANG
Department of Computer Engineering, Air Force Engineering University
afeu_wg@yahoo.com.cn

Abstract: The relation between the performance of AdaBoost and the performance of its base classifiers was analyzed, and the approach of improving the classification performance of AdaBoostSVM was studied. There is an inconsistency between the accuracy and the diversity of base classifiers, and this inconsistency affects the generalization performance of the algorithm. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier based on the distribution of the training samples. The proposed algorithm improves the classification performance by striking a balance between the accuracy and the diversity of the base classifiers. Experimental results indicate the effectiveness of the proposed algorithm.

Keywords: Support Vector Machine; AdaBoost

1 INTRODUCTION

Boosting is a machine-learning method that works in Valiant's PAC (probably approximately correct) learning model [1]. A "weak" learning algorithm that performs just slightly better than random guessing in the PAC model can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [2] came up with the first provable polynomial-time boosting algorithm. Freund [3] developed a much more efficient boosting algorithm which, although optimal in a certain sense, nevertheless suffered from certain practical drawbacks. The AdaBoost algorithm, introduced by Freund and Schapire [4], solved many of the practical difficulties of the earlier boosting algorithms and can easily be applied to practical problems. AdaBoost [4] creates a collection of weak learners by maintaining a set of weights over the training samples and adjusting these weights adaptively after each weak learning cycle: the weights of the samples that are misclassified by the current weak learner are increased, while the weights of the samples that are correctly classified are decreased. The success of AdaBoost can be explained as enlarging the margin of the training samples [5], which improves its generalization capability.

Support vector machine [6] was developed from the theory of structural risk minimization. By using a kernel trick to map the training samples from an input space to a high-dimensional feature space, SVM finds an optimal separating hyperplane in the feature space and uses a regularization parameter, C, to balance its model complexity and training error.

How good is the generalization performance of using SVM as the base learner of AdaBoost? Does this AdaBoost have advantages over the existing ones? Also, compared with using a single SVM, what is the benefit of using this AdaBoost, which is a combination of multiple SVMs? These have been attractive research issues in recent years [7][8][9][10].

After analyzing the relation between the performance of AdaBoost and the performance of its base classifiers, the approach of improving the classification performance of AdaBoost with SVM base classifiers was studied in this paper. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier based on the distribution of the training samples. The proposed algorithm improves the classification performance by striking a balance between the accuracy and the diversity of the base classifiers. Experimental results for benchmark datasets indicate the effectiveness of the proposed algorithm.

2 THE ADABOOST ALGORITHM

Given a set of training samples {(x_1, y_1), ..., (x_n, y_n)}, where each training sample x_i belongs to some domain or instance space X and each class label y_i is in the label set Y = {-1, +1}, AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, ..., T. One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set.


The weight of this distribution on training example x_i on round t is denoted w_t(i), i.e., w_t(i) is the weight of sample x_i at iteration round t. Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set.

The base learner's job is to find a base classifier C_t for the distribution w_t; h_t is the decision function of base classifier C_t, and h_t(x_i) is its prediction on sample x_i. In the simplest case, the range of each h_t is binary, i.e., restricted to {-1, +1}; the base learner's job then is to minimize the error

ε_t = Pr_{i~w_t}[h_t(x_i) ≠ y_i] = Σ_{i: h_t(x_i) ≠ y_i} w_t(i)   (1)

Notice that the error is measured with respect to the distribution w_t on which the base learner was trained. Alternatively, a subset of the training examples can be sampled according to w_t, and these (unweighted) resampled examples can be used to train the base learner; both resampling and reweighting can be used.

Once the base classifier h_t has been received, AdaBoost chooses a coefficient α_t that measures the importance assigned to h_t. For binary h_t, as in the original description of AdaBoost given by Freund and Schapire [4], α_t is typically set to

α_t = (1/2) ln((1 − ε_t)/ε_t)   (2)

Note that α_t > 0 if ε_t < 1/2, and that α_t gets larger as ε_t gets smaller. The weight update in step 3(4) below serves to increase the weight of examples misclassified by h_t and to decrease the weight of correctly classified examples; thus, the weights tend to concentrate on "hard" examples. The final or combined classifier H is a weighted majority vote of the T base classifiers, where α_t is the weight assigned to h_t.

The algorithm for AdaBoost is given below:

1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {-1, +1}; the base learner algorithm; the number of cycles T.
2. Initialize the weight of the samples: w_1(i) = 1/n, for all i = 1, ..., n.
3. Do for t = 1, ..., T:
   (1) Train the base classifier C_t on the weighted training sample set. Alternatively, a subset of the training examples can be sampled according to w_t, and these resampled examples can be used to train C_t. The decision function of C_t is h_t.
   (2) Calculate the training error ε_t of C_t: ε_t = Σ_{i: y_i ≠ h_t(x_i)} w_t(i).
   (3) Set the weight of base classifier C_t: α_t = (1/2) ln((1 − ε_t)/ε_t).
   (4) Update the training samples' weights: w_{t+1}(i) = w_t(i) exp{−α_t y_i h_t(x_i)} / Z_t, where Z_t is a normalization factor such that Σ_i w_{t+1}(i) = 1.
4. Output the final classifier: H(x) = sign[Σ_{t=1}^{T} α_t h_t(x)].

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error, i.e., the fraction of mistakes on the training set. Write the error ε_t of h_t as 1/2 − γ_t; γ_t measures how much better than random (which has an error rate of 1/2) the classifications of h_t are. Freund and Schapire [4] prove that the training error (the fraction of mistakes on the training set) of the final classifier H is at most

∏_t [2√(ε_t(1 − ε_t))] = ∏_t √(1 − 4γ_t²) ≤ exp(−2 Σ_t γ_t²)   (3)

Thus, if each base hypothesis is slightly better than random, so that γ_t ≥ γ for some γ > 0, then the training error drops exponentially fast. Earlier boosting algorithms required such a lower bound γ to be known a priori before boosting begins, and in practice knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual base hypotheses. This is the basis of its name: "Ada" is short for "adaptive".
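For concreteness, the following is a minimal Python sketch of the loop above, assuming NumPy and scikit-learn are available and labels are in {-1, +1}. The decision-stump base learner and the function names (adaboost_fit, adaboost_predict, make_base) are illustrative stand-ins rather than the SVM base classifiers studied in this paper, and the early stop when ε_t ≥ 1/2 is an added convenience.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # illustrative stand-in base learner

def adaboost_fit(X, y, T=10, make_base=lambda: DecisionTreeClassifier(max_depth=1)):
    """Sketch of the boosting loop above; labels y must be in {-1, +1}."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # step 2: w_1(i) = 1/n
    learners, alphas = [], []
    for t in range(T):                           # step 3: for t = 1, ..., T
        clf = make_base()
        clf.fit(X, y, sample_weight=w)           # (1) train on the weighted sample set
        h = clf.predict(X)
        eps = float(np.sum(w[h != y]))           # (2) weighted training error eps_t
        if eps == 0.0 or eps >= 0.5:             # stop if perfect or no better than random
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # (3) alpha_t = 1/2 ln((1 - eps_t)/eps_t)
        w = w * np.exp(-alpha * y * h)           # (4) raise weights of mistakes, lower the rest
        w /= w.sum()                             #     normalize so that sum_i w_{t+1}(i) = 1
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """Final classifier H(x) = sign(sum_t alpha_t h_t(x))."""
    scores = sum(a * clf.predict(np.asarray(X)) for a, clf in zip(alphas, learners))
    return np.sign(scores)
```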


3 AN IMPROVED ALGORITHM FOR ADABOOST WITH SVM BASE CLASSIFIERS

Diversity is known to be an important factor affecting the generalization performance of ensemble methods [11][12]; it means that the errors made by different base classifiers are uncorrelated. If each base classifier is moderately accurate and the base classifiers disagree with each other, the uncorrelated errors of these base classifiers will be removed by the voting process, so as to achieve good ensemble results [13]. This also applies to AdaBoost.

Studies that use SVM as the base learner of AdaBoost have been reported [7][8][9][10], and these studies showed the good generalization performance of such SVM ensembles.

For AdaBoost, it is known that there exists a dilemma between the base learner's accuracy and diversity [14], which means that the more accurate two base learners become, the less they can disagree with each other. How, then, should SVM base learners be selected for AdaBoost? Should accurate but not diverse base learners be selected, or diverse but not too accurate ones?

If a balance between accuracy and diversity can be kept among the different base classifiers, a superior AdaBoost result can be obtained, but there is no generally effective way to reach such a result. We therefore analyze the case of using RBFSVM as the base classifier of AdaBoost.

The problem of model selection is very important for SVM, since the classification performance of SVM is affected by its parameters. For RBFSVM, they are the Gaussian width σ and the regularization parameter C; varying either of them changes the classification performance. However, as reported in [7], although RBFSVM cannot learn well when a very low value of C is used, its performance largely depends on the Gaussian width σ.

How should the σ value be set for the base learners when using RBFSVM as the base learner for AdaBoost? Problems are encountered when applying a single, fixed σ to all RBFSVM base learners. In detail [10], an over-large σ often results in too weak an RBFSVM: its classification accuracy is often less than 50% and cannot meet the requirement on a base learner in AdaBoost. On the other hand, a smaller σ often makes the RBFSVM stronger, and boosting such learners may become inefficient because their errors are highly correlated. Furthermore, a too-small σ can even make the RBFSVM overfit the training samples, so such learners also cannot be used as base learners. Hence, finding a suitable σ for AdaBoost with SVM base learners becomes a problem [10].

In order to avoid the problem resulting from using a single, fixed σ for all RBFSVM base classifiers, and to get good classification performance, it is necessary to find a suitable σ for each RBFSVM base classifier. Because an SVM can achieve comparably good classification performance when a roughly suitable C is given and the variance of the training samples is used as the Gaussian width σ of the RBFSVM, in this paper we use the variance of the training samples of each base classifier as its Gaussian width σ. This generates a set of moderately accurate RBFSVM classifiers for AdaBoost, and an improved algorithm is obtained; we call it the variable σ-AdaBoostRBFSVM.

In the proposed algorithm, the obtained SVM base learners are mostly moderately accurate, which gives a chance to obtain more uncorrelated base learners. Because the Gaussian width of each base learner is derived from the distribution of its own training samples, a set of SVM base learners with different learning abilities is obtained. The proposed variable σ-AdaBoostRBFSVM is therefore expected to achieve higher generalization performance than AdaBoostSVM, which uses a single, fixed σ for all RBFSVM base classifiers. In the proposed algorithm, without loss of generality, the re-sampling technique is used.

The algorithm for variable σ-AdaBoostRBFSVM:

1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {-1, +1}; the base learner; the number of cycles T.
2. Initialize the weight of the samples: w_1(i) = 1/n, for all i = 1, ..., n.
3. Do for t = 1, ..., T:
   (1) Sample a subset of the training examples according to w_t; these resampled examples constitute the new training data set d_t, which will be used to train the base classifier C_t.
   (2) Calculate the variance σ of d_t: σ = sqrt(mean(var(d_t))).
   (3) Using d_t as the training sample set and σ as the Gaussian width, train the base classifier C_t, an RBFSVM with Gaussian width σ; h_t is the decision function of C_t.
   (4) Calculate the training error ε_t of C_t: ε_t = Σ_{i: y_i ≠ h_t(x_i)} w_t(i).


   (5) Set the weight of base classifier C_t: α_t = (1/2) ln((1 − ε_t)/ε_t).
   (6) Update the training samples' weights: w_{t+1}(i) = w_t(i) exp{−α_t y_i h_t(x_i)} / Z_t, where Z_t is a normalization factor such that Σ_i w_{t+1}(i) = 1.
4. Output the final classifier: H(x) = sign[Σ_{t=1}^{T} α_t h_t(x)].
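As an illustration of steps (1)-(4) above, here is a hedged Python sketch of one round of the loop, assuming scikit-learn's SVC as the RBFSVM. The mapping gamma = 1/(2σ²) between the Gaussian width σ and scikit-learn's gamma parameter, the subset fraction, the value C = 1000 borrowed from the experiments below, and the helper name variable_sigma_round are assumptions of this illustration rather than the authors' implementation; rng is a NumPy generator, e.g. np.random.default_rng(0).

```python
import numpy as np
from sklearn.svm import SVC

def variable_sigma_round(X, y, w, rng, subset_frac=0.5, C=1000.0):
    """One round t: resample d_t, derive sigma from its variance, train an RBF SVM
    with that Gaussian width, and return it with its weighted training error."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    idx = rng.choice(n, size=max(2, int(subset_frac * n)), p=w)  # (1) resample d_t according to w_t
    Xt, yt = X[idx], y[idx]
    sigma = np.sqrt(np.mean(np.var(Xt, axis=0)))                 # (2) sigma = sqrt(mean(var(d_t)))
    gamma = 1.0 / (2.0 * sigma ** 2)                             # assumed mapping to sklearn's gamma
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(Xt, yt)        # (3) RBFSVM with Gaussian width sigma
    eps = float(np.sum(w[clf.predict(X) != y]))                  # (4) weighted training error eps_t
    return clf, eps                                              # (a full implementation would also guard
                                                                 #  against single-class d_t and sigma == 0)

# Steps (5) and (6) and the final vote reuse the standard AdaBoost update sketched earlier:
#   alpha_t = 0.5 * ln((1 - eps_t) / eps_t)
#   w_{t+1}(i) = w_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t,   H(x) = sign(sum_t alpha_t h_t(x))
```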

4 EXPERIMENTS AND RESULTS

To compare AdaBoostSVM, which uses a single, fixed σ for all RBFSVM base classifiers, with our improved algorithm, experiments on the Westontoynonliner data set and the Wine data set [8] were conducted; the results of the classification experiments are given below. The SVM implementation we used is from Steve Gunn's SVM Toolbox.

The Westontoynonliner data set consists of 1000 samples of 2 classes, each sample having 52 attributes. The Wine data set consists of 178 samples of 3 classes, each sample having 13 attributes; class 1 is used as the positive class and the other two classes form the negative class in the classification experiments.

The SVMs for the variable σ-AdaBoostRBFSVM, AdaBoostSVM and the single SVM are trained under the same parameter, C = 1000, when comparing the performance of the algorithms. Let T be the number of base classifiers; T = 10 in the experiments.

For the Westontoynonliner data set, the training and testing samples are chosen randomly from the given data set; 50, 150, 200, 300 and 500 are the numbers of training samples used in the experiments, and 128 is the number of testing samples. For the single SVM and AdaBoostSVM, the Gaussian width σ of RBFSVM is set to 12. For variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/10 of the training samples are used to train the base classifiers, and the average correct classification rates over 3 randomly chosen testing data sets are calculated.

Fig.1 gives the results of the performance comparison for the Westontoynonliner data set; axis X indicates the number of training samples, and axis Y gives the correct classification rates.

[Fig.1 Performance comparison for the Westontoynonliner data set; curves: SVM, Ada-SVM, Improved Ada-SVM]

For the Wine data set, the training and testing samples were also chosen randomly from the given data set; 50, 80, 100, 130 and 150 are the numbers of training samples used in the experiments. For the single SVM and AdaBoostSVM, the Gaussian width σ of RBFSVM is set to 2, 6 and 12, and the average correct classification rates for randomly chosen testing data sets are calculated. For variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/8 of the training samples are used to train the base classifiers, and the average correct classification rates over 3 randomly chosen testing data sets are calculated.

Fig.2 gives the results of the performance comparison for the Wine data set; axis X indicates the number of training samples, and axis Y gives the correct classification rates. In Fig.2, Ada-SVM stands for AdaBoostSVM, and Improved Ada-SVM stands for variable σ-AdaBoostRBFSVM.

[Fig.2 Performance comparison for the Wine data set; curves: SVM, Ada-SVM, Improved Ada-SVM]

From Fig.1 and Fig.2 we can see that AdaBoostSVM and a single SVM have almost the same classification performance, but our improved AdaBoostRBFSVM improves the average correct classification rates noticeably. For the Wine data set, the distribution of the training samples is unbalanced: there are 59 samples in the positive class and 119 samples in the negative class.
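The original data splits and the MATLAB toolbox are not reproduced here. As a hedged sketch of the kind of single-SVM versus boosted-SVM comparison reported above, the snippet below uses scikit-learn's bundled Wine data (its class 0 corresponds to the positive class with 59 samples) and scikit-learn's AdaBoostClassifier over RBF SVMs in the role of AdaBoostSVM with a fixed σ. It assumes scikit-learn ≥ 1.2 (the estimator keyword), applies no feature scaling, and its numbers will not match the figures.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
y = np.where(y == 0, 1, -1)                      # class 0 (59 samples) as positive, the rest as negative

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=100, random_state=0)

sigma = 12.0                                     # fixed Gaussian width used for SVM / AdaBoostSVM above
gamma = 1.0 / (2.0 * sigma ** 2)                 # assumed mapping from sigma to sklearn's gamma

single_svm = SVC(kernel="rbf", C=1000.0, gamma=gamma).fit(X_tr, y_tr)

ada_svm = AdaBoostClassifier(                    # plays the role of AdaBoostSVM with a fixed sigma
    estimator=SVC(kernel="rbf", C=1000.0, gamma=gamma, probability=True),
    n_estimators=10,                             # T = 10 base classifiers, as in the experiments above
    random_state=0,
).fit(X_tr, y_tr)

print("single RBF SVM      :", single_svm.score(X_te, y_te))
print("AdaBoost of RBF SVMs:", ada_svm.score(X_te, y_te))
```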


From Fig.2 we can also see that the variable σ-AdaBoostRBFSVM is more efficient for this unbalanced data set.

Compared with using a single SVM, the benefit of using the improved AdaBoostRBFSVM is its advantage in model selection; and compared with using AdaBoost with a single, fixed σ for all RBFSVM base classifiers, it has better generalization performance.

5 CONCLUSIONS

AdaBoost is a general method for improving the accuracy of any given learning algorithm. After analyzing the relation between the performance of AdaBoost and the performance of its base classifiers, the approach of improving the classification performance of AdaBoostSVM was studied in this paper. There is an inconsistency between the accuracy and the diversity of base classifiers, and this inconsistency affects the generalization performance of the algorithm; how to deal with the dilemma between base classifier accuracy and diversity is therefore very important for improving the performance of AdaBoost. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier based on the distribution of the training samples. The proposed algorithm improves the classification performance by striking a balance between the accuracy and the diversity of the base classifiers. Experimental results for the benchmark datasets indicate the effectiveness of the proposed algorithm, and they also indicate that the proposed algorithm is more efficient for an unbalanced data set.

ACKNOWLEDGMENT

This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2004F36, and partially supported by NSFC.

REFERENCES

[1] L. G. Valiant, "A theory of the learnable", Communications of the ACM, vol. 27, no. 11, pp. 1134-1142, November 1984.
[2] R. E. Schapire, "The strength of weak learnability", Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[3] Y. Freund, "Boosting a weak learning algorithm by majority", Information and Computation, vol. 121, no. 2, pp. 256-285, 1995.
[4] Y. Freund, R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, August 1997.
[5] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: A new explanation for the effectiveness of voting methods", The Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, 1998.
[6] V. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
[7] G. Valentini, T. G. Dietterich, "Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods", Journal of Machine Learning Research, vol. 5, pp. 725-775, 2004.
[8] D. Pavlov, J. Mao, "Scaling-up support vector machines using boosting algorithm", in Proceedings of ICPR, 2000.
[9] H.-C. Kim, S. Pang, H.-M. Je, D. Kim, and S. Y. Bang, "Constructing support vector machine ensemble", Pattern Recognition, vol. 36, no. 12, pp. 2757-2767, December 2003.
[10] X. Li, L. Wang, E. Sung, "A study of AdaBoost with SVM based weak learners", in Proceedings of IJCNN, 2005.
[11] P. Melville, R. J. Mooney, "Creating diversity in ensembles using artificial data", Information Fusion, vol. 6, no. 1, pp. 99-111, March 2005.
[12] L. I. Kuncheva, C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy", Machine Learning, vol. 51, no. 2, pp. 181-207, May 2003.
[13] H. W. Shin and S. Y. Sohn, "Selected tree classifier combination based on both accuracy and error diversity", Pattern Recognition, vol. 38, pp. 191-197.
[14] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization", Machine Learning, vol. 40, no. 2, pp. 139-157, 2000.
