Robust hierarchical feature selection with a capped ℓ2-norm
Xinxin Liu a,b,c, Hong Zhao a,b,*

a School of Computer Science in Minnan Normal University, Zhangzhou, Fujian 363000, China
b Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou, Fujian 363000, China
c Fujian Key Laboratory of Granular Computing and Application (Minnan Normal University), Zhangzhou, Fujian 363000, China
Article info
Article history:
Received 11 June 2020
Revised 20 February 2021
Accepted 2 March 2021
Available online 10 March 2021
Keywords:
Inter-level error propagation
Capped ‘ 2 -norm
Data outliers
Feature selection
Hierarchical classification
Abstract

Feature selection methods face new challenges in large-scale classification tasks because massive categories are managed in a hierarchical structure. Hierarchical feature selection can take full advantage of the dependencies among hierarchically structured classes. However, most of the existing hierarchical feature selection methods are not robust for dealing with the inevitable data outliers, resulting in a serious inter-level error propagation problem in the following classification process. In this paper, we propose a robust hierarchical feature selection method with a capped ℓ2-norm (HFSCN), which can reduce the adverse effects of data outliers and learn relatively robust and discriminative feature subsets for the hierarchical classification process. Firstly, a large-scale global classification task is split into several small local sub-classification tasks according to the hierarchical class structure and the divide-and-conquer strategy, which makes feature selection modeling easy. Secondly, a capped ℓ2-norm based loss function is used in the feature selection process of each local sub-classification task to eliminate the data outliers, which can alleviate the negative effects of outliers and improve the robustness of the learned feature weighted matrix. Finally, an inter-level relation constraint based on the similarity between the parent and child classes is added to the feature selection model, which can enhance the discriminative ability of the selected feature subset for each sub-classification task with the learned robust feature weighted matrix. Compared with seven traditional and state-of-the-art hierarchical feature selection methods, the superior performance of HFSCN is verified on 16 real and synthetic datasets.
© 2021 Elsevier B.V. All rights reserved.
1 Introduction
In this era of rapid information development, the scale of data in many domains increases dramatically. Meanwhile, such data are often vulnerable to outliers, which usually decrease the density of valuable data for a specific task. These problems are challenging to machine learning and data mining tasks such as classification.
On the one hand, high-dimensional data bring the curse of dimensionality, and feature selection is considered an effective technique to alleviate this problem [9,10]. This method focuses on the features that relate to the classification task and excludes the irrelevant and redundant ones.
On the other hand, data outliers usually disturb the learning models and reduce the relevance between the selected features and the corresponding classes. This may lead to serious inter-level error propagation, particularly in the following hierarchical classification process. Thus, how to deal with data outliers and how to exploit the hierarchical information of classes in feature selection processes is an interesting challenge.
Feature selection methods can be categorized into flat feature selection and hierarchical feature selection methods depending on whether the class hierarchy is considered. The flat feature selection method selects one feature subset to distinguish all the classes. Thus far, many flat feature selection methods based on different criteria have been proposed. For example, the classical feature selection method Relief is based on statistical methods; it selects a relevant feature subset by statistical analysis and uses few heuristics to avoid a complex heuristic search. mRMR is based on the mutual information measure; it selects a feature subset based on the criteria of maximal dependency and maximal relevance. Other works developed a feature selection method based on feature manifold learning and proposed an effective feature selection method based on the backward
elimination approach for web spam detection. Meanwhile, some flat feature selection methods using different regularization terms have been developed. One representative approach applies joint ℓ2,1-norm minimization on both the loss function and the regularization term to optimize the feature selection process. Lan et al. proposed a method to reduce the effect of data outliers and optimize the flat feature selection process. These flat feature selection methods perform well on selecting feature subsets for a two-class classification or a multi-class classification. However, these methods fail to consider the ubiquitous and crucial information of local class relationships and do not perform well when applied directly to hierarchical classification tasks. This has been verified by a series of experiments.
The hierarchical feature selection method selects several local feature subsets by taking full advantage of the dependency relationships among the hierarchically structured classes. Relying on different feature subsets to discriminate among different classes can help achieve more significant effects in hierarchical classification tasks. For example, texture and color features are suitable for distinguishing among different animals, while the edge feature is more appropriate for discriminating among various furniture items.
Some researchers combined the process of feature selection and hierarchical classifier design with genetic algorithms to improve the classification performance. Others developed a hierarchical feature selection algorithm based on the fuzzy rough set theory. These methods can achieve high classification accuracy but fail to use the dependencies in the hierarchical structure of the classes.
One line of work proposed a hierarchical feature selection method with three penalty terms: a sparsity term is used to select a compact feature subset; a parent–child constraint is added as a term to select the common features shared by parent and child categories; and an independence constraint is added to maximize the uniqueness between sibling categories. Following that work, they then proposed a recursive regularization based hierarchical feature selection method with the parent–child relationship constraint and the sibling relationship constraint. Another study focused on the two-way dependence among different classes and proposed a hierarchical feature selection with subtree-based graph regularization. These methods have a good performance in selecting feature subsets for large-scale classification tasks with hierarchical class structures. However, these existing hierarchical feature selection methods are not robust to data outliers and suffer from a serious inter-level error propagation problem. There is no outlier filtering mechanism in these models, and the commonly used least-squares loss function squares the misclassification loss of these outliers, which will further aggravate the negative impacts of these outliers. This makes these models achieve relatively low performance when dealing with practical tasks with ubiquitous outliers.
In this paper, we propose a robust hierarchical feature selection method with a capped ℓ2-norm (HFSCN), which reduces the adverse effects of data outliers and selects unique and compact local feature subsets to control the inter-level error propagation in the following classification process. Firstly, HFSCN decomposes a complex large-scale classification task into several simple sub-classification tasks according to the hierarchical structure of classes and the divide-and-conquer strategy. Compared with the initial classification task, these sub-classification tasks are small-scale and easy to handle, and only discriminative features need to be retained for the current local sub-classification task. Secondly, HFSCN excludes the data outliers with a capped ℓ2-norm based loss function according to the regression analysis. In contrast to the existing hierarchical feature selection methods, HFSCN can improve the robustness of the selected local feature subsets and alleviate the error propagation problem in the classification process. Finally, HFSCN selects a unique and compact feature subset for the current sub-classification task using an inter-level regularization of the parent–child relationship in the feature selection process of the current sub-classification task. The dependency between the current child sub-classification task and its parent sub-classification task is emphasized to drop out the features related to the local sub-classification tasks sharing different parent classes with the current sub-classification task.
A series of experiments are conducted to compare HFSCN with seven of the existing hierarchical feature selection methods. The experimental datasets consist of two protein sequence datasets, two image datasets, and their 12 corrupted datasets with three types of sample noise. Six evaluation metrics are used to discuss the significant differences between our method and the compared methods. The experimental results demonstrate that the feature subsets selected by the proposed HFSCN algorithm are superior to those selected by the compared methods for the classification tasks with hierarchical class structures.
The rest of this paper is organized as follows. We present the basic knowledge of hierarchical classification and feature selection and describe the modeling process of HFSCN in Section 2. Section 3 introduces the experimental datasets, the compared methods, the parameter settings, and some evaluation metrics. Section 4 presents and discusses the performance of the compared methods. Finally, Section 5 concludes this paper and gives ideas for further study.
2 HFSCN method
In this section, we present the proposed robust hierarchical feature selection method HFSCN.

2.1 Framework of the HFSCN method

There are two motivations to design our robust hierarchical feature selection method. Firstly, the hierarchical class structure in the large-scale classification task has to be taken into account for the prevailing hierarchical management of numerous classes. Secondly, the adverse effects of noises such as data outliers, which may result in a serious inter-level error problem in the following hierarchical classification, have to be reduced in the optimization process.
A framework of HFSCN based on these considerations is shown in Fig. 1. The process of HFSCN can be roughly decomposed into the following two steps:

(1) Divide a complex large-scale classification task into a group of small sub-classification tasks according to the divide-and-conquer strategy and the class's hierarchical information.
(2) Develop a robust hierarchical feature selection for each sub-classification task, considering the elimination of the outliers and the inter-level relation constraints.
2.2 Hierarchical classification
In most real-world and practical large-scale classification tasks, categories are usually managed in a hierarchical structure. A tree structure and a directed acyclic graph structure are two common representations of the class hierarchical information. In this study, we focus on the classes with a hierarchical tree structure.
The hierarchical tree structure of classes is usually defined by the "IS-A" relationship "$\prec$" between a class and its parent class, where the former is the subclass of the latter, and $C_T$ denotes the class set organized in the tree. The "IS-A" relationship has the following properties:
– the root class node is the only greatest element in the tree structure of classes;
– $\forall c_i, c_j \in C_T$, if $c_i \prec c_j$, then $c_j \nprec c_i$;
– $\forall c_i \in C_T$, $c_i \nprec c_i$;
– $\forall c_i, c_j, c_k \in C_T$, if $c_i \prec c_j$ and $c_j \prec c_k$, then $c_i \prec c_k$.
The asymmetry property means that if the former class is a subclass of the latter, the latter is not a subclass of the former; that is, the former class is not a child of itself in reverse. An example of object classes with hierarchical information represented by a tree structure is shown in Fig. 2. The root class Object, which contains all of the classes below it, is the unique top-level node. There are several internal class nodes, which have parent coarse-grained class nodes and child fine-grained class nodes. For instance, the Furniture class has the child class set of Seating and Dining table. Class nodes without a child node are termed "leaf class nodes". The root node and all of the internal nodes are called "non-leaf class nodes". Moreover, the classification process of all the samples stops at a leaf node in the experiments; i.e., leaf node classification is mandatory. Several examples in Fig. 2 illustrate the properties of "IS-A". (1) The asymmetry property: Sofa is a type of Seating, but it is incorrect that all seating are Sofa. (2) The transitivity property implies that Chair belongs to Seating and Seating belongs to Furniture, so Chair belongs to Furniture as well. In this manner, the class hierarchies in all hierarchical classification tasks satisfy the four properties mentioned above.
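To make the tree structure and the "IS-A" properties concrete, the following Python sketch (illustrative only, not part of the paper's released code; the class names are a small hypothetical subset of the Fig. 2 hierarchy) stores the hierarchy as a child-to-parent map and checks the asymmetry, anti-reflexivity, and transitivity properties.

```python
# Minimal sketch: a class tree stored as a child -> parent map, with the
# transitive "IS-A" test and ancestor lookup used throughout hierarchical
# classification. Class names follow the VOC example of Fig. 2.
parent = {
    "Furniture": "Object", "Seating": "Furniture", "Dining table": "Furniture",
    "Sofa": "Seating", "Chair": "Seating",          # hypothetical subset of Fig. 2
}

def ancestors(c):
    """All coarser-grained classes above c (excluding c itself)."""
    out = []
    while c in parent:
        c = parent[c]
        out.append(c)
    return out

def is_a(ci, cj):
    """True if ci is a (transitive) subclass of cj; asymmetric and anti-reflexive."""
    return cj in ancestors(ci)

print(is_a("Chair", "Furniture"))   # True  (transitivity)
print(is_a("Seating", "Sofa"))      # False (asymmetry)
print(is_a("Sofa", "Sofa"))         # False (anti-reflexivity)
```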
Fig. 1. Framework of HFSCN. Firstly, a complex large-scale classification task is divided into some sub-classification tasks with small scales and different inputs and outputs according to the divide-and-conquer strategy and the hierarchy of the classes. Then, a corresponding training dataset is grouped for these subtasks from bottom to top along the hierarchy of classes. Finally, robust and discriminative feature subsets are selected recursively for those subtasks by the capped ℓ2-norm based noise filtering mechanism and the relation constraint between the parent class and its child classes.

Fig. 2. Hierarchical tree structure of object classes of the VOC dataset [30].

A classification task with object classes managed in a hierarchical tree structure is called hierarchical classification. A sample is classified into one class node at each level in turn in a coarse-to-fine fashion. In the hierarchical tree structure of classes, the root and internal nodes are abstract coarse-grained categories
summarized from their child classes. The class labels of the training samples correspond to the leaf classes, which are fine-grained categories. The closer a node is to the root node, the coarser the granularity of the category. As a result, a sample belongs to several categories from the coarse-grained level to the fine-grained level. However, most hierarchical classification methods have one general and serious problem, called inter-level error propagation; the classification errors at the parent class are easily transported to its child classes and propagated to the leaf classes along with the top-down classification process.
2.3 Decomposition of a complex classification task
A divide-and-conquer strategy is used to divide a complex classification task with a hierarchical class structure. A group of small sub-classification tasks corresponding to the non-leaf classes in the hierarchical class structure can be obtained according to the decomposition process. The fine-grained classes under the child classes of a non-leaf class are ignored, and only the direct child classes of this non-leaf node are included in the searching space for the corresponding sub-classification task. For example, the classification task of VOC is divided into several small sub-classification tasks in Fig. 3, where the root node is denoted as class 0. For the sub-classification task corresponding to the non-leaf class 0, we only need to distinguish its four direct child classes (Vehicles, Animal, Household, and Person), and do not discriminate the fine-grained classes under Vehicles, Animal, and Household according to the above task decomposition process.
Therefore, each sub-classification task's searching space is significantly decreased, which makes it simple to model the feature selection and the classification process.
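As an illustration of this divide-and-conquer decomposition, the hypothetical sketch below groups the direct children of every non-leaf class into one sub-classification task; the helper name and the small class map are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: build one sub-classification task per non-leaf class; each
# subtask only needs to separate the direct children of that class.
from collections import defaultdict

def decompose(parent):
    """parent: child -> parent map of the class tree.
    Returns {non_leaf_class: [direct child classes]}."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    return dict(children)

parent = {"Vehicles": "Object", "Animal": "Object", "Household": "Object",
          "Person": "Object", "Furniture": "Household", "Chair": "Furniture"}
subtasks = decompose(parent)
# The subtask for the root "Object" distinguishes only its four direct children,
# ignoring the finer-grained classes below them.
print(subtasks["Object"])   # ['Vehicles', 'Animal', 'Household', 'Person']
```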
Meanwhile, the classification task is represented according to the decomposed sub-classification tasks. Let $C_{\max}$ denote the maximum number of child nodes in all the sub-classification tasks to facilitate the calculation. The class label matrix set is redefined as follows: $Y = \{Y_0, \ldots, Y_i, \ldots\}$, where $Y_i \in \{0, 1\}^{m_i \times C_{\max}}$.

2.4 Robust hierarchical feature selection method

One unique feature selection process for each small sub-classification task is obtained by the aforementioned task decomposition, and each sub-classification task has its own training data, including the feature matrix and the class label matrix.
Not all of the dimensions of object features are suitable for predicting a specific category. We select a unique and compact feature subset and drop out the relatively irrelevant features for each sub-classification task to alleviate the curse of dimensionality problem. Feature selection methods can be categorized into three groups according to different selection strategies: filter feature selection, wrapper feature selection, and embedding feature selection. In this study, we focus on the third one, namely embedding feature selection. Different norm-based regularization terms of the feature weighted matrix W are usually used as penalty terms in the embedding feature selection.
The feature selection model for the i-th sub-classification task can be formulated as a common penalized optimization problem
as follows:
$$\min_{W_i} \; L(X_i, Y_i; W_i) + \lambda R(W_i). \tag{1}$$
Fig. 3. Classification task of VOC is divided into several small sub-classification tasks.
The common and traditional empirical loss functions include the least squares loss and the logistic loss. Assume that the classical least squares loss $\|X_i W_i - Y_i\|_F^2$ is used in the feature selection model for the i-th sub-classification task, where $X_i$ and $Y_i$ are the feature matrix and the class label matrix of this subtask. For the data outliers existing in the training set, the classification losses are particularly large, because the squared residual $\|x_i^j W_i - y_i^j\|^2$ of an outlier is further amplified by the square operation. This leads to a serious inter-level error propagation problem in the following hierarchical classification process. An outlier is a case that does not follow the same model as the rest of the data and appears as if it were generated by a different mechanism.
In order to reduce the adverse effects of data outliers, we use a capped ℓ2-norm based loss function to remove the outliers according to the regression analysis of the features and the class labels:

$$\min_{W_i} \sum_{j=1}^{m_i} \min\left(\left\|x_i^j W_i - y_i^j\right\|_2, \varepsilon_i\right), \tag{2}$$

where $x_i^j$ and $y_i^j$ denote the feature vector and the class label vector of the j-th sample, and $\varepsilon_i$ is the outlier threshold for the i-th sub-classification task. No matter how serious the classification error caused by a data point is, the classification loss of this data point is at most $\varepsilon_i$. Thus, the negative effects of data outliers on the learned feature weighted matrix are considerably reduced to obtain a robust and discriminative feature subset for the i-th sub-classification task.
We take the i-th sub-classification task as an example to illustrate the outlier filtering process. Firstly, the losses of all the training samples are calculated for the i-th sub-classification task. For example, the classification losses of 20 samples are calculated according to the loss function. Then, the obtained 20 losses are sorted, and the following descending-order sequence is obtained: [92.1, 87.4, 50.3, 29.9, ...]. If the outlier percentage is set to 10%, the two samples with the largest losses are regarded as outliers, and the classification losses for these data outliers are limited to 50.3 at most; these serious data outliers are eliminated by the capped ℓ2-norm based loss function, which can improve the discriminative ability of the selected features and decrease the inter-level errors in the hierarchical classification process.
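The outlier filtering described above can be sketched as follows; the matrices, the threshold value, and the helper name are hypothetical placeholders used only to illustrate the per-sample capped ℓ2-norm loss of Eq. (2). Note that the paper derives the per-task threshold from the outlier percentage, whereas the sketch takes the threshold directly.

```python
import numpy as np

# Minimal sketch of the capped l2-norm loss in Eq. (2); X, Y, W and eps are
# hypothetical placeholders, not values from the paper.
def capped_l2_losses(X, Y, W, eps):
    """Per-sample regression losses min(||x_j W - y_j||_2, eps)."""
    residual_norms = np.linalg.norm(X @ W - Y, axis=1)   # ||x_j W - y_j||_2
    return np.minimum(residual_norms, eps)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 5)), rng.normal(size=(20, 3))
W = rng.normal(size=(5, 3))
losses = capped_l2_losses(X, Y, W, eps=2.0)
# Samples whose raw loss exceeds eps contribute only eps, so extreme outliers
# cannot dominate the learned feature weighted matrix.
print(np.sort(losses)[::-1])
```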
The following are two penalty terms used in our feature selection model: a structural sparsity term on the feature weighted matrix, and an inter-level relation term based on the similarity between the coarse-grained parent class and the fine-grained child class.
An ℓ2,1-norm regularization term $\|W_i\|_{2,1}$ enables the model to select a compact and unique local feature subset for discriminating the classes in the current i-th sub-classification task, and to discard features that are suitable for distinguishing the categories in other sub-classification tasks. This penalty term is called the structural sparsity for the classes in the current subtask. For the i-th sub-classification task, the feature selection objective function with the sparsity regularization term is as follows:

$$\min_{W_i} \sum_{j=1}^{m_i} \min\left(\left\|x_i^j W_i - y_i^j\right\|_2, \varepsilon_i\right) + \lambda \|W_i\|_{2,1}, \tag{3}$$
where λ is the sparsity regularization parameter. The ℓ2,1-norm regularization tends to select the features with large weights across all the classes, and the features with small weights are not selected.
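A small sketch of the structural sparsity term may help: it computes $\|W_i\|_{2,1}$ as the sum of row ℓ2-norms and ranks features by those norms, which is also how the final feature subsets are picked in Section 2.5. The matrix sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of the structural sparsity term ||W_i||_{2,1}: the sum of the l2-norms
# of the rows of W_i; features (rows) with large norms are the ones retained.
def l21_norm(W):
    return np.linalg.norm(W, axis=1).sum()

def rank_features(W, top_k):
    """Indices of the top_k features with the largest row l2-norms."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(row_norms)[::-1][:top_k]

W = np.random.default_rng(1).normal(size=(473, 4))   # e.g. 473 features, 4 child classes
print(l21_norm(W), rank_features(W, top_k=47)[:5])
```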
An inter-level relation regularization term defined according to the parent–child relationship is used to optimize the feature selection process of the i-th sub-classification task. A coarse-grained parent class is abstracted and generalized from the fine-grained child classes under it. The samples in one child class have to first be classified into a coarser-grained category (the parent class) and then into the current class. Therefore, it is reasonable to believe that the selected feature subset for a child sub-classification task is similar to that selected for its parent sub-classification task. The features related to the classes sharing different parent classes with the current class need to be dropped out. The Frobenius norm $\|W_i - W_{p_i}\|_F^2$ between the feature weighted matrix of the current subtask and that of its parent subtask is adopted for the convenience of calculation. This penalty term is called the inter-level constraint and is added to the objective function. Thus, the final objective function for the i-th sub-classification task can be expressed as follows:
$$\min_{W_i} \sum_{j=1}^{m_i} \min\left(\left\|x_i^j W_i - y_i^j\right\|_2, \varepsilon_i\right) + \lambda \|W_i\|_{2,1} + \alpha \left\|W_i - W_{p_i}\right\|_F^2, \tag{4}$$

where $W_{p_i}$ is the feature weighted matrix of the parent sub-classification task of the i-th subtask and α is the paternity regularization coefficient.
Finally, the hierarchical feature selection objective function of the entire hierarchical classification task is written as follows:

$$\min_{W} \sum_{i=0}^{N} \left( \sum_{j=1}^{m_i} \min\left(\left\|x_i^j W_i - y_i^j\right\|_2, \varepsilon_i\right) + \lambda \|W_i\|_{2,1} \right) + \alpha \sum_{i=1}^{N} \left\|W_i - W_{p_i}\right\|_F^2, \tag{5}$$

where the 0-th sub-classification task corresponds to the root class, which has no parent subtask.
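For clarity, the per-subtask objective of Eq. (4) can be evaluated as in the following hedged sketch; all names and parameter values are placeholders rather than the authors' code.

```python
import numpy as np

# Sketch of the per-subtask objective in Eq. (4): capped l2-norm loss,
# l_{2,1} sparsity, and the parent-child (inter-level) Frobenius penalty.
def subtask_objective(X, Y, W, W_parent, lam, alpha, eps):
    loss = np.minimum(np.linalg.norm(X @ W - Y, axis=1), eps).sum()
    sparsity = lam * np.linalg.norm(W, axis=1).sum()          # lam * ||W||_{2,1}
    inter_level = alpha * np.linalg.norm(W - W_parent) ** 2   # alpha * ||W - W_p||_F^2
    return loss + sparsity + inter_level
```

Summing this quantity over all non-root subtasks, plus the root term without the inter-level penalty, gives the global objective of Eq. (5).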
2.5 Optimization of HFSCN
In this section, we describe the optimization process for solving the objective function of HFSCN. Assume diagonal matrices $D_i$ and $S_i$ whose diagonal elements are defined as the following values, respectively:

$$d_i^{jj} = \frac{1}{2\left\|w_i^j\right\|_2}, \tag{6}$$

$$s_i^{jj} = \frac{\mathrm{Ind}\left(\left\|x_i^j W_i - y_i^j\right\|_2 \le \varepsilon_i\right)}{2\left\|x_i^j W_i - y_i^j\right\|_2}, \tag{7}$$

where $w_i^j$ is the j-th row of $W_i$, and $x_i^j$ and $y_i^j$ are the feature vector and the class label vector of the j-th sample in the i-th feature selection subtask. $\mathrm{Ind}(\cdot)$ is an indicator function:

$$\mathrm{Ind}(\cdot) = \begin{cases} 1, & \text{if } \left\|x_i^j W_i - y_i^j\right\|_2 \le \varepsilon_i, \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$
The hierarchical feature selection objective function can be rewritten as:
$$\min_{W} \sum_{i=0}^{N} \left( \mathrm{Tr}\left((X_i W_i - Y_i)^T S_i (X_i W_i - Y_i)\right) + \lambda \, \mathrm{Tr}\left(W_i^T D_i W_i\right) \right) + \alpha \sum_{i=1}^{N} \left\|W_i - W_{p_i}\right\|_F^2. \tag{9}$$
The optimization objective function of the root node (the 0-th sub-classification task) needs to be updated separately because it has no parent class. The objective function of the root node can
be expressed as follows:
$$\min_{W_0, W_i} \mathrm{Tr}\left((X_0 W_0 - Y_0)^T S_0 (X_0 W_0 - Y_0)\right) + \lambda \, \mathrm{Tr}\left(W_0^T D_0 W_0\right) + \alpha \sum_{i=1}^{|C_0|} \mathrm{Tr}\left((W_i - W_0)^T (W_i - W_0)\right), \tag{10}$$

where $|C_0|$ is the number of child sub-classification tasks of the root. Setting the derivative of Eq. (10) with respect to $W_0$ to zero yields

$$2\left(X_0^T S_0 X_0 + \lambda D_0 + \alpha |C_0| I\right) W_0 - 2\left(X_0^T S_0 Y_0 + \alpha \sum_{i=1}^{|C_0|} W_i\right) = 0, \tag{11}$$

$$W_0 = \left(X_0^T S_0 X_0 + \lambda D_0 + \alpha |C_0| I\right)^{-1} \left(X_0^T S_0 Y_0 + \alpha \sum_{i=1}^{|C_0|} W_i\right). \tag{12}$$
Then, the following are the optimization processes of the other sub-classification tasks that have parent classes. The feature selection objective function of the i-th sub-classification task can be expressed as follows:

$$\min_{W_i, W_{p_i}} \mathrm{Tr}\left((X_i W_i - Y_i)^T S_i (X_i W_i - Y_i)\right) + \lambda \, \mathrm{Tr}\left(W_i^T D_i W_i\right) + \alpha \, \mathrm{Tr}\left((W_i - W_{p_i})^T (W_i - W_{p_i})\right). \tag{13}$$

Setting the derivative of Eq. (13) with respect to $W_i$ to zero gives the update as follows:

$$2\left(X_i^T S_i X_i + \lambda D_i + \alpha I\right) W_i - 2\left(X_i^T S_i Y_i + \alpha W_{p_i}\right) = 0, \tag{14}$$

$$W_i = \left(X_i^T S_i X_i + \lambda D_i + \alpha I\right)^{-1} \left(X_i^T S_i Y_i + \alpha W_{p_i}\right). \tag{15}$$
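A minimal numerical sketch of one update step may clarify Eqs. (6)-(8) and (15); the small delta that guards against division by zero is our assumption and not part of the derivation, and the threshold eps is passed directly rather than derived from the outlier percentage.

```python
import numpy as np

# One-iteration sketch of the non-root update (Eqs. (6)-(8) and (15)).
def update_Wi(X, Y, W, W_parent, lam, alpha, eps, delta=1e-8):
    d = W.shape[0]
    D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + delta))      # Eq. (6)
    r = np.linalg.norm(X @ W - Y, axis=1)                             # ||x_j W - y_j||_2
    S = np.diag(np.where(r <= eps, 1.0, 0.0) / (2.0 * r + delta))     # Eqs. (7)-(8)
    A = X.T @ S @ X + lam * D + alpha * np.eye(d)
    b = X.T @ S @ Y + alpha * W_parent
    return np.linalg.solve(A, b)                                      # Eq. (15)
```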
Algorithm 1: Robust hierarchical feature selection with a capped ℓ2-norm (HFSCN).

Input:
(1) Data matrices $X_i \in \mathbb{R}^{m_i \times d}$, $Y_i \in \mathbb{R}^{m_i \times C_{\max}}$, and $C_T$;
(2) Sparsity regularization parameter λ > 0;
(3) Paternity regularization coefficient α > 0;
(4) Outlier percentage ε ≥ 0.
Output:
(1) Feature weighted matrix set $W = \{W_0, W_1, \ldots, W_N\}$, where $W_i \in \mathbb{R}^{d \times C_{\max}}$;
(2) Selected feature subsets $F = \{F_0, F_1, \ldots, F_N\}$.
1: Set iteration number t = 1; initialize $W^{(t)}$ with each element of $W_i^{(t)}$ set to 1;
2: repeat
3:   for i = 0 to N do
4:     Compute $D_i^{(t)}$ and $S_i^{(t)}$ according to Eqs. (6) and (7), respectively;
5:   end for
     // Update the feature weighted matrix for the 0-th sub-classification task
6:   Update $W_0$: $W_0^{(t+1)} = \left(X_0^T S_0^{(t)} X_0 + \lambda D_0^{(t)} + \alpha |C_0| I\right)^{-1}\left(X_0^T S_0^{(t)} Y_0 + \alpha \sum_{i=1}^{|C_0|} W_i^{(t)}\right)$;
7:   for i = 1 to N do
       // Update the feature weighted matrix for the i-th sub-classification task
8:     Update $W_i$: $W_i^{(t+1)} = \left(X_i^T S_i X_i + \lambda D_i + \alpha I\right)^{-1}\left(X_i^T S_i Y_i + \alpha W_{p_i}\right)$;
9:   end for
10:  Update $W^{(t+1)} = \left\{W_0^{(t+1)}, W_1^{(t+1)}, \ldots, W_N^{(t+1)}\right\}$;
11:  t = t + 1;
12: until convergence of the objective function in Eq. (5)
13: for i = 0 to N do
14:   Rank $\|w_i^j\|_2$ (j = 1, ..., n) in descending order for the i-th sub-classification task;
15:   Select the top ranked feature subset $F_i$ for the i-th sub-classification task;
16: end for
The feature selection process of the initial hierarchical classification task is completed when the feature weighted matrices are obtained for the whole global hierarchical classification task. The n features for the i-th sub-classification task are sorted according to $\|w_i^j\|_2$, and the top-ranked features are selected.

Based on the above analysis, the detailed algorithm to solve the objective function of HFSCN is summarized in Algorithm 1. Firstly, the data outliers are eliminated according to the least squares regression analysis and the capped threshold. The feature weighted matrices are then calculated according to the sparse regularization and the parent–child relationship constraint, as listed from Line 6 to Line 10. The iteration of the two aforementioned processes continues until the convergence of the objective function. Finally, the local feature subsets are selected for the hierarchical classification task through the processes in Lines 13 and 14. It takes approximately six iterations before convergence in the experiments.
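Putting the pieces together, the following self-contained sketch mirrors the structure of Algorithm 1 (all-ones initialization, alternating $D$/$S$ and $W$ updates, and feature ranking); the task-list data structure, the fixed iteration count, and the direct use of eps as a loss threshold are assumptions for illustration, not the released implementation.

```python
import numpy as np

# Compact sketch of Algorithm 1. "tasks" is a hypothetical list of dicts
# {"X": ..., "Y": ..., "parent": index or None}; lam, alpha, eps correspond
# to the regularization and outlier parameters in Eq. (5).
def hfscn(tasks, lam, alpha, eps, n_iter=6, delta=1e-8):
    W = [np.ones((t["X"].shape[1], t["Y"].shape[1])) for t in tasks]   # Line 1: all-ones init
    for _ in range(n_iter):                                            # ~6 iterations suffice
        for i, t in enumerate(tasks):
            X, Y, p = t["X"], t["Y"], t["parent"]
            D = np.diag(1.0 / (2.0 * np.linalg.norm(W[i], axis=1) + delta))       # Eq. (6)
            r = np.linalg.norm(X @ W[i] - Y, axis=1)
            S = np.diag(np.where(r <= eps, 1.0, 0.0) / (2.0 * r + delta))          # Eqs. (7)-(8)
            if p is None:                                               # root update, Eq. (12)
                kids = [j for j, s in enumerate(tasks) if s["parent"] == i]
                A = X.T @ S @ X + lam * D + alpha * len(kids) * np.eye(X.shape[1])
                b = X.T @ S @ Y + alpha * sum(W[j] for j in kids)
            else:                                                       # non-root update, Eq. (15)
                A = X.T @ S @ X + lam * D + alpha * np.eye(X.shape[1])
                b = X.T @ S @ Y + alpha * W[p]
            W[i] = np.linalg.solve(A, b)
    # Lines 13-15: rank features of each subtask by the row l2-norms of W_i
    return W, [np.argsort(np.linalg.norm(Wi, axis=1))[::-1] for Wi in W]
```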
3 Experiment setup

In this section, we introduce the experiment setup from the following four aspects: (1) the real and synthetic experimental datasets; (2) the compared methods; (3) the evaluation metrics used to discuss the performance of all methods; and (4) the parameter settings in the experiments.
All experiments are conducted on a PC with a 3.40 GHz Intel Core i7-3770 CPU, 16 GB memory, and the Windows 7 operating system. The source code of HFSCN is available at https://github.com/fhqxa/HFSCN.
3.1 Datasets

Experiments are conducted on four practical datasets (including two protein sequence datasets, one object image dataset, and one fine-grained car image dataset) from the machine learning repository and their 12 corrupted synthetic datasets. These datasets from different application scenarios provide a good test ground for evaluating the performance of different hierarchical feature selection methods. Detailed information about the initial datasets is as follows:
(1) DD and F194 are two protein sequence datasets from Bioinformatics. Their samples are described by 473 features. Two levels of the protein hierarchy, namely class and fold, are used in the experiment. The DD dataset contains 27 various protein folds, which originate from four major structural classes. The F194 dataset contains protein sequences from 194 folds which belong to seven classes. The class hierarchies of DD and F194 are shown in Fig. 4 and Fig. 5, respectively.
(2) VOC is a benchmark dataset in visual object category recognition and detection. It contains 12,283 annotated consumer photographs collected from the Flickr photo-sharing website, where 1,000 features are extracted to characterize each image sample. Its class hierarchy is shown in Fig. 2, and the leaf nodes correspond to the 20 fine-grained object categories.
(3) Car196 is a fine-grained image dataset of cars. It consists of 196 classes and 15,685 images, covering sedans, SUVs, coupes, convertibles, pickups, hatchbacks, and station wagons. Each image sample is characterized by 4,096 features. Fig. 6 shows the hierarchical class structure of Car196, and the leaf nodes correspond to the 196 fine-grained car classes.
For the corrupted datasets, some sample outliers are generated according to different distributions and then included in the initial training datasets. The quantity of sample outliers in each corrupted training set is 10% of the number of samples in the corresponding initial training set. But the test sets in all the corrupted datasets are the same as those in all the initial datasets and are not corrupted with data outliers. In addition, the feature dimensionality and hierarchical structures of the classes are not changed. Three common distributions are used to obtain three new synthetic datasets for each initial dataset. The detailed information about these 16 initial and corrupted datasets is listed in Table 2.
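The corruption procedure can be sketched as below; the helper name, the random seed, and the way the outliers are appended are assumptions, while the three noise distributions follow the descriptions in Table 2.

```python
import numpy as np

# Sketch of how a corrupted training set can be generated: extra outlier
# samples equal to 10% of the training size are appended to the feature matrix.
def corrupt(X_train, kind, seed=0):
    rng = np.random.default_rng(seed)
    n_out, d = int(0.10 * X_train.shape[0]), X_train.shape[1]
    if kind == "random":          # random noise in [0, 10]
        noise = rng.uniform(0.0, 10.0, size=(n_out, d))
    elif kind == "gaussian":      # Gaussian distribution N(0, 1)
        noise = rng.normal(0.0, 1.0, size=(n_out, d))
    else:                         # uniform distribution U(0, 1)
        noise = rng.uniform(0.0, 1.0, size=(n_out, d))
    return np.vstack([X_train, noise])
```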
3.2 Compared methods
HFSCN is compared with the baselines and seven hierarchical feature selection methods in the hierarchical classification tasks to evaluate the effectiveness and the efficiency of HFSCN for hierarchical feature selection. Top-down SVM classifiers are used in the classification experiments. The compared methods are as follows:

(1) Baselines, including F-Baseline and H-Baseline, classify the test samples with all the initial features. F-Baseline assumes that the categories are independent of each other and directly distinguishes among all the categories. Different from F-Baseline, H-Baseline considers the semantic hierarchy between the categories and classifies the samples from coarse to fine.
(2) HRelief is a hierarchical feature selection method that is a hierarchical variant of the classical Relief algorithm, which selects features based on instance-based learning.
(3) HFisher uses the classical feature selection method Fisher score in the class hierarchy, where the Fisher score selects the features of the labeled training data on the basis of the criterion of the best discriminatory ability and can avoid the process of heuristic search.
(4) HFSNM is a hierarchical feature selection method that selects features with joint ℓ2,1-norm minimization on both the loss function and the regularization term on all the data points. In this method, a sparsity regularization parameter needs to be predefined.
(5) HmRMR applies mRMR in the class hierarchy; the classical method mRMR selects a feature subset based on the minimal-redundancy and maximal-relevance criteria and avoids the difficult multivariate density estimation in maximizing dependency.
(6) HFSRR-PS is a recursive regularization based hierarchical feature selection method that uses both the parent–child and sibling relationships in the class hierarchy to optimize the feature selection process.
(7) HFSRR-P is similar to HFSRR-PS but uses the parent–child relationship only during the hierarchical feature selection process.
(8) HFS-O is a hierarchical feature selection method modified from a robust flat feature selection method. HFS-O uses the sparsity regularization to obtain the feature subsets and an outlier filtering loss function to filter out the data outliers. The parent–child and sibling relationships are not taken into consideration in its feature selection process.
There are three methods, HFSRR-PS, HFSRR-P, and HFS-O, closely related to HFSCN. Table 3 lists the differences among these three compared methods and HFSCN in terms of the outlier filtering loss function, the parent–child relationship constraint, and the sibling relationship constraint. We compare HFSCN with HFSRR-PS, HFSRR-P, and HFS-O as follows:

(1) Compared with HFSRR-PS and HFSRR-P, HFSCN is robust in dealing with the inevitable data outliers. The outlier filtering loss function can improve the discriminative ability of the selected local feature subset and further alleviate the error propagation problem in the hierarchical classification process.
(2) The sibling relationship constraint between classes is not used in HFSCN given its relatively minor performance improvement in HFSRR-PS compared with HFSRR-P and the model complexity of HFSCN.
(3) In contrast to HFS-O, HFSCN uses the similarity constraint between the parent and child classes to select a unique and compact local feature subset recursively for a sub-classification task. However, the relation among classes is not taken into consideration in HFS-O, and the local feature subsets are selected independently.

Table 1. Initial dataset description.

Fig. 4. Hierarchical tree structure of the object classes in DD.
3.3 Evaluation metrics
Six evaluation metrics are used in the experiments to evaluate the performance of our method and the compared methods. On the one hand, the running time of the feature selection process and the classification time for testing the selected feature subsets are evaluated for these methods. On the other hand, the effectiveness of the selected feature subsets for classification is evaluated in terms of the classification accuracy, the hierarchical F1-measure, the tree induced error, and the F-measure based on the least common ancestor. These metrics are described as follows:

(1) Feature selection time. This metric measures the running time of selecting feature subsets by the different feature selection algorithms. F-Baseline and H-Baseline input all the initial features directly in the classification without the feature selection process. Therefore, only the proposed method and the other seven hierarchical feature selection methods are compared based on this evaluation metric.
(2) Classification time. This metric measures the running time of the classifier learning and prediction process on the test set with the selected features.
(3) Classification accuracy. This is the simplest evaluation metric for flat or hierarchical classification. It is calculated as the ratio of the number of correctly predicted samples to the total number of test samples.
(4) Hierarchical F1-measure ($F_H$). The evaluation metrics for hierarchical classification models include the hierarchical precision $P_H$, the hierarchical recall $R_H$, and the hierarchical F1-measure $F_H$:

$$P_H = \frac{|\hat{D}_{aug} \cap D_{aug}|}{|\hat{D}_{aug}|}, \quad R_H = \frac{|\hat{D}_{aug} \cap D_{aug}|}{|D_{aug}|}, \quad F_H = \frac{2 \cdot P_H \cdot R_H}{P_H + R_H},$$

where $D_{aug} = D \cup \mathrm{An}(D)$ and $\hat{D}_{aug} = \hat{D} \cup \mathrm{An}(\hat{D})$; $D$ is the real label of the test sample, $\mathrm{An}(D)$ represents the parent node set of $D$, and $\hat{D}$ indicates the predicted class label of the test sample.
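A short sketch of how $P_H$, $R_H$, and $F_H$ can be computed for a single test sample follows; the ancestors lookup is the hypothetical helper sketched in Section 2.2, and conventions on whether the root class is counted in the augmented sets may differ from the paper.

```python
# Sketch of the hierarchical precision/recall/F1 for one test sample: augment
# the real and predicted leaf labels with their ancestors and compare the sets.
# Whether the root class is included in the ancestor sets is a convention detail.
def hierarchical_prf(pred_leaf, true_leaf, ancestors):
    D_aug = {true_leaf, *ancestors(true_leaf)}
    D_hat = {pred_leaf, *ancestors(pred_leaf)}
    overlap = len(D_aug & D_hat)
    P_H, R_H = overlap / len(D_hat), overlap / len(D_aug)
    F_H = 2 * P_H * R_H / (P_H + R_H) if P_H + R_H else 0.0
    return P_H, R_H, F_H
```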
Fig. 5. Hierarchical tree structure of object classes in F194.

Fig. 6. Hierarchical tree structure of object classes in Car196.
(5) Tree induced error (TIE). In hierarchical classification, we should give different punishments to different types of classification errors. In the proposed model, the penalty is defined by the distance of nodes in the class hierarchy, which is termed the tree induced error. For a predicted class $c_p$ and a real class $c_r$, $TIE(c_p, c_r)$ is the number of edges along the path from $c_p$ to $c_r$ in the class tree.
(6) F-measure based on the least common ancestor ($F_{LCA}$). This metric is based on but is different from the hierarchical F1-measure. The least common ancestor (LCA) is derived from graph theory, and $LCA(c_p, c_r)$ is defined as the least common ancestor of $c_p$ and $c_r$. The hierarchical precision, recall, and F1-measure are then computed on label sets augmented considering the least common ancestor.
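The tree induced error can be computed from the same ancestor lists, as in the hedged sketch below; the least common ancestor found along the way is also the quantity used by the $F_{LCA}$ metric. The ancestors helper is the hypothetical one sketched in Section 2.2.

```python
# Sketch of the tree induced error TIE(c_p, c_r): the number of edges on the
# path between the predicted class c_p and the real class c_r, i.e. the distance
# from each side up to their least common ancestor.
def tie(c_p, c_r, ancestors):
    anc_p = [c_p] + ancestors(c_p)    # path from c_p up to the root
    anc_r = [c_r] + ancestors(c_r)
    lca = next(a for a in anc_p if a in anc_r)       # least common ancestor
    return anc_p.index(lca) + anc_r.index(lca)
```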
3.4 Parameter settings

Some experimental parameters in the compared methods have to be set in advance. Parameter k of HRelief is varied over the set {1, 2, 3, 4, 5, 6, 7}, and the best result is reported for each classification task. The parameters shared by HFSCN and the compared methods except HRelief are set to the same values: (1) the sparsity regularization parameter λ shared by HFSCN, HFSNM, HFSRR-PS, HFSRR-P, and HFS-O; (2) the paternity regularization coefficient α shared by HFSCN, HFSRR-PS, and HFSRR-P. For HFSCN, the three parameters are determined by a grid search, where the outlier percentage ε is selected from a set including {3%, 4%, 5%, 6%, 8%, 10%, 12%, 14%, 16%, 18%, 20%}. Different parameter values with the best results are determined for different datasets. A top-down C-SVC classifier modified from the classical implementation is used for testing the selected features.

In addition, all experiments are conducted using the 10-fold cross-validation strategy, and the average result is reported for each method.
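The grid search over the three parameters can be organized as in the following sketch; the λ and α grids and the scoring stub are placeholders (only the listed ε values come from the text, and the real scorer would be the 10-fold cross-validated hierarchical classification result).

```python
from itertools import product

# Hedged sketch of the parameter grid search for HFSCN.
def cv_score(lam, alpha, eps):
    return 0.0   # placeholder for the 10-fold cross-validation result

lam_grid = alpha_grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0]     # hypothetical grids
eps_grid = [0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20]
best = max(product(lam_grid, alpha_grid, eps_grid), key=lambda p: cv_score(*p))
```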
4 Experimental results and analysis
In this section, we present the experimental results and discuss the performance of the proposed method from the following five aspects: (1) the classification results with different numbers of features selected by HFSCN; (2) the effects of the outlier filtering mechanism and the inter-level constraint in HFSCN; (3) two running time comparisons for feature selecting and feature testing; (4) performance comparisons using the flat and hierarchical evaluation metrics, where the differences among HFSCN and the compared methods are evaluated statistically; and (5) the convergence analysis of HFSCN.

In all the tables presenting the experimental results, the best result is marked in bold and the next best result is underlined. The mark "↑" represents that the bigger the value, the better the performance; "↓" indicates that the smaller the value, the better the performance.
4.1 Classification results with different numbers of selected features
An experiment on the four initial real datasets is conducted to verify the changes of each evaluation metric when classifying samples with different numbers of features selected by HFSCN. The experimental results show that the changes of the results on each metric are consistent, so only the $F_H$ results of the proposed method and the compared methods on the four initial datasets are presented in Fig. 7. From these results, we can obtain the following two observations:

Table 2
Detailed information about 16 experimental datasets.

No.  Dataset     Sample group         Feature description of sample outliers
(a) Initial and corrupted datasets for DD
1    DD          DD                   (no sample outliers)
2    DD-R        DD + Random          Random noise in [0, 10]
3    DD-N        DD + Gaussian        Random noise obeys Gaussian distribution N(0, 1)
4    DD-U        DD + Uniform         Random noise obeys uniform distribution U(0, 1)
(b) Initial and corrupted datasets for F194
5    F194        F194                 (no sample outliers)
6    F194-R      F194 + Random        Random noise in [0, 10]
7    F194-N      F194 + Gaussian      Random noise obeys Gaussian distribution N(0, 1)
8    F194-U      F194 + Uniform       Random noise obeys uniform distribution U(0, 1)
(c) Initial and corrupted datasets for VOC
9    VOC         VOC                  (no sample outliers)
10   VOC-R       VOC + Random         Random noise in [0, 10]
11   VOC-N       VOC + Gaussian       Random noise obeys Gaussian distribution N(0, 1)
12   VOC-U       VOC + Uniform        Random noise obeys uniform distribution U(0, 1)
(d) Initial and corrupted datasets for Car196
13   Car196      Car196               (no sample outliers)
14   Car196-R    Car196 + Random      Random noise in [0, 10]
15   Car196-N    Car196 + Gaussian    Random noise obeys Gaussian distribution N(0, 1)
16   Car196-U    Car196 + Uniform     Random noise obeys uniform distribution U(0, 1)

Table 3
Differences among HFSRR-PS, HFSRR-P, HFS-O, and HFSCN.
(1) On the two protein datasets DD and F194, the classification results keep increasing as more and more features are selected. In addition, the results no longer change or change little when HFSCN selects more than 47 features (approximately 10%).
(2) On the two image datasets VOC and Car196, the classification results are relatively good and stable when more than 20% of the features are selected by the proposed method.

Based on these results, we select 10% of the features for the protein datasets and 20% of the features for the image datasets.
4.2 Effects of the outlier filtering mechanism and inter-level constraint
The first phase of the experiment explores the effects of HFSCN on controlling the inter-level error propagation by removing the data outliers. During the feature selection process, the values of the outlier percentage ε are varied while the other parameters are fixed (the exact outlier ratio of 10% in these synthetic datasets is shown as red dotted lines in the corresponding figures). We can obtain the following observations:

(1) When the percentage of data outliers is set to or almost close to 10% (the exact outlier ratio in the synthetic datasets), HFSCN obtains the best classification performance. The larger the percentage beyond this value, the worse the performance of HFSCN. This result is likely because some samples that are important and discriminative to the classes are removed.
(2) The proposed algorithm's performance is poor when the percentage is set much smaller than the exact outlier ratio, because the remaining data outliers can lower the learning ability of algorithms.

The second stage of the experiment explores the effects of the proposed algorithm on avoiding inter-level error propagation. The values of the paternity coefficient α are varied while the other parameters are fixed for different classification datasets, so the parent–child relationship constraint is emphasized in varying degrees. The results lead to the following conclusions:

(1) An appropriate emphasis on the parent–child relationship improves the classification results. Therefore, an appropriate paternity constraint can help HFSCN to achieve more effective feature subsets and avoid inter-level error propagation.
(2) An overemphasis on the shared common features of the parent class and child class will neglect the uniqueness between categories. This conclusion is not very apparent for the VOC-R dataset, which could be attributed to the specific parameter value on this dataset.
4.3 Running time comparisons for selecting and testing features
We discuss the performance of HFSCN and the compared methods on two metrics: the running time for selecting features and the running time for testing the discriminative ability of the selected features in classification processes.
Fig. 7. $F_H$ results of HFSCN with different numbers of selected features: (a) DD; (b) F194; (c) VOC; (d) Car196.