CLASSIFICATION MODELS FOR INTRUSION DETECTION SYSTEMS


email: srinivas@cs.nmt.edu email: sung@cs.nmt.edu email: rajeev@nmt.edu

Department of Computer Science, New Mexico Tech, Socorro, NM 87801, USA
Institute of Complex Additive Systems Analysis, New Mexico Tech, Socorro, NM 87801, USA

Key words: Machine learning, Intrusion detection systems, CART, MARS, TreeNet

ABSTRACT

This paper describes results concerning the classification capability of supervised machine learning techniques in detecting intrusions using network audit trails. We investigate three well-known machine learning techniques: classification and regression trees (CART), multivariate adaptive regression splines (MARS), and TreeNet. The best model is chosen based on classification accuracy (ROC curve analysis). The results show that high classification accuracies can be achieved in a fraction of the time required by the well-known support vector machines and artificial neural networks. TreeNet performs the best for the normal, probe, and denial of service (DoS) classes. CART performs the best for the user to super user (U2Su) and remote to local (R2L) classes.

I INTRODUCTION

Since the ability of an Intrusion Detection System (IDS) to identify a large variety of intrusions in real time with high accuracy is of primary concern, in this paper we consider the performance of machine learning-based IDSs with respect to classification accuracy and false alarm rates.

AI techniques have been used to automate the intrusion detection process; they include neural networks, fuzzy inference systems, evolutionary computation, machine learning, and support vector machines (SVMs) [1-6]. Model selection using SVMs and other popular machine learning methods often requires extensive resources and long execution times [7,8]. In this paper, we present a few machine learning methods (MARS, CART, TreeNet) that can perform model selection with higher or comparable accuracies in a fraction of the time required by SVMs.

MARS is a nonparametric regression procedure based on the “divide and conquer” strategy, which partitions the input space into regions, each with its own regression equation [9]. CART is a tree-building algorithm that determines a set of if-then logical (split) conditions that permit accurate prediction or classification of cases [10]. TreeNet is a tree-building algorithm that uses stochastic gradient boosting to combine trees via a weighted voting scheme, achieving accuracy without the tendency to be misled by bad data [11,12].

We performed experiments using MARS, CART, and TreeNet to classify each of the five classes (normal, probe, denial of service, user to super user, and remote to local) of network traffic patterns in the DARPA data.

A brief introduction to MARS and model selection is given in section II. CART and a tree generated for classifying normal vs intrusions in the DARPA data are explained in section III. TreeNet is briefly described in section IV. The intrusion detection data used for the experiments is explained in section V. In section VI, we analyze the classification accuracies of MARS, CART, and TreeNet using ROC curves. Conclusions of our work are given in section VII.

II MARS

Multivariate Adaptive Regression Splines (MARS) is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and independent variables. Instead, MARS constructs this relation from a set of coefficients and basis functions that are entirely driven by the data. The method is based on the “divide and conquer” strategy, which partitions the input space into regions, each with its own regression equation. This makes MARS particularly suitable for problems with higher input dimensions, where the curse of dimensionality would likely create problems for other techniques.

Basis functions: MARS uses two-sided truncated functions of the form $(t-x)_+$ and $(x-t)_+$ as basis functions for linear or nonlinear expansion, which approximates the relationships between the response and predictor variables [9,11]. Parameter $t$ is the knot of the basis functions (defining the "pieces" of the piecewise linear regression); these knots (parameters) are also determined from the data. The "+" signs next to the terms $(t-x)$ and $(x-t)$ simply denote that only positive results of the respective expressions are considered; otherwise the respective functions evaluate to zero.
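The two truncated basis functions can be illustrated with a few lines of Python. This is a minimal sketch; the function names `hinge_pos` and `hinge_neg` and the knot value are illustrative assumptions, not taken from the paper or from the MARS software.

```python
def hinge_pos(x: float, t: float) -> float:
    """(x - t)+ : zero to the left of the knot t, linear to the right."""
    return max(0.0, x - t)


def hinge_neg(x: float, t: float) -> float:
    """(t - x)+ : linear to the left of the knot t, zero to the right."""
    return max(0.0, t - x)


if __name__ == "__main__":
    knot = 2.0  # hypothetical knot; MARS would estimate it from the data
    for x in (0.0, 2.0, 3.5):
        print(x, hinge_pos(x, knot), hinge_neg(x, knot))
```

At most one of the two functions is nonzero for any given input, which is what lets each pair describe a separate linear piece on either side of the knot.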


The MARS Model

The basis functions together with the model parameters (estimated via least squares estimation) are combined to produce the predictions given the inputs. The general MARS model equation is given as:

$$ y = f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X) $$

where the summation is over the $M$ nonconstant terms in the model. Here $y$ is predicted as a function of the predictor variables $X$ (and their interactions); this function consists of an intercept parameter ($\beta_0$) and the weighted (by $\beta_m$) sum of one or more basis functions $h_m(X)$.

Model Selection

After implementing the forward stepwise selection of basis functions, a backward procedure is applied in which the model is pruned by removing those basis functions that contribute the smallest improvement in the (least squares) goodness-of-fit. A least squares error function (inverse of goodness-of-fit) is computed. The so-called Generalized Cross Validation (GCV) error is a measure of the goodness of fit that takes into account not only the residual error but also the model complexity. It is given by

$$ \mathrm{GCV} = \frac{\sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2}{\left( 1 - C/N \right)^2} \quad \text{with} \quad C = 1 + c\,d $$

where $N$ is the number of cases in the data set and $d$ is the effective degrees of freedom, which is equal to the number of independent basis functions. The quantity $c$ is the penalty for adding a basis function. Experiments have shown that the best value for $c$ can be found somewhere in the range $2 < c < 3$ [9].
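The effect of the complexity penalty can be seen in a short computation. The sketch below assumes the form GCV = RSS / (1 - C/N)^2 with C = 1 + c d; the function name `gcv` and the default penalty c = 3 are illustrative choices rather than settings reported in the paper.

```python
from typing import Sequence


def gcv(y_true: Sequence[float], y_pred: Sequence[float], d: int, c: float = 3.0) -> float:
    """Generalized cross validation error for a model with d basis functions."""
    n = len(y_true)
    complexity = 1.0 + c * d  # C = 1 + c * d
    if complexity >= n:
        raise ValueError("C must be smaller than the number of cases N")
    # Residual sum of squares of the fitted model.
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return rss / (1.0 - complexity / n) ** 2


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    fit = [1.0, 2.0, 4.0, 4.0, 5.0, 6.0, 6.0, 8.0, 9.0, 10.0]  # hypothetical predictions
    # Same residual error, but the larger model is penalized more heavily.
    print(gcv(y, fit, d=1), gcv(y, fit, d=2))
```

With the residual error held fixed, adding basis functions shrinks the denominator and raises the GCV value, so a term is kept during pruning only if it lowers the residual error by enough to offset its penalty.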

III CART

CART builds classification and regression trees for predicting continuous dependent variables (regression) and categorical dependent variables (classification) [10,11].

CART analysis consists of four basic steps [12] (see footnote 1):

• The first step consists of tree building, during which a tree is built using recursive splitting of nodes. Each resulting node is assigned a predicted class, based on the distribution of classes in the learning dataset that would occur in that node and the decision cost matrix.

• The second step consists of stopping the tree building process. At this point a “maximal” tree has been produced, which probably greatly overfits the information contained within the learning dataset.

• The third step consists of tree “pruning,” which results in the creation of a sequence of simpler and simpler trees, through the cutting off of increasingly important nodes.

• The fourth step consists of optimal tree selection, during which the tree that fits the information in the learning dataset, but does not overfit it, is selected from among the sequence of pruned trees.

Footnote 1: Reference [12] was accidentally omitted during the editing process of the original manuscript. The complete reference is: R. J. Lewis, An Introduction to Classification and Regression Tree (CART) Analysis. Annual Meeting of the Society for Academic Emergency Medicine, 2000.

The decision tree begins with a root node $t$ derived from whichever variable in the feature space minimizes a measure of the impurity of the two sibling nodes. The measure of the impurity or entropy at node $t$, denoted by $i(t)$, is as shown in the following equation [11]:







$$ i(t) = -\sum_{j} p(w_j \mid t) \log p(w_j \mid t) $$

where $p(w_j \mid t)$ is the proportion of patterns $x_i$ allocated to class $w_j$ at node $t$, and the sum runs over the classes. Each non-terminal node is then divided into two further nodes, $t_L$ and $t_R$, such that $p_L$ and $p_R$ are the proportions of entities passed to the new nodes $t_L$ and $t_R$ respectively. The best division is that which maximizes the difference given in [11]:

$$ \Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R) $$



The decision tree grows by means of successive sub-divisions until a stage is reached in which there is no significant decrease in the measure of impurity when a further division $s$ is implemented. When this stage is reached, the node $t$ is not sub-divided further and automatically becomes a terminal node. The class $w_j$ associated with the terminal node $t$ is that which maximizes the conditional probability $p(w_j \mid t)$. The number of nodes generated and the terminal node values for each class, for the DARPA data set described in section V, are presented in Table 1.
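The entropy impurity of a node and the decrease in impurity achieved by a candidate split can be sketched as follows. The function names and the toy labels are illustrative assumptions; the natural logarithm is used, which does not affect which split is ranked best.

```python
import math
from collections import Counter
from typing import Sequence


def impurity(labels: Sequence[str]) -> float:
    """Entropy i(t) = -sum_j p(w_j | t) log p(w_j | t) of the labels at a node."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def impurity_decrease(parent: Sequence[str], left: Sequence[str], right: Sequence[str]) -> float:
    """Delta i(s, t) = i(t) - p_L i(t_L) - p_R i(t_R) for a split of parent into left and right."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return impurity(parent) - p_left * impurity(left) - p_right * impurity(right)


if __name__ == "__main__":
    node = ["normal"] * 4 + ["attack"] * 4
    # A split that separates the classes perfectly versus one that changes nothing.
    print(impurity_decrease(node, node[:4], node[4:]))
    print(impurity_decrease(node, node[2:6], node[:2] + node[6:]))
```

A node stops being sub-divided when no candidate split yields a meaningful decrease; the threshold for what counts as meaningful is a tuning choice that this sketch leaves out.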

Figure 1. Tree for classifying normal vs intrusions


Figure 1 represents a classification tree generated from the DARPA data described in section V for classifying normal activity vs intrusive activity. Each terminal node describes a data value; each record is classified into one of the terminal nodes through the decisions made at the non-terminal nodes that lead from the root to that leaf.

Table 1. Summary of tree splitters for all five classes (columns: Class, No of Nodes, Terminal Node Value).

IV TREENET

In a TreeNet model, classification and regression models are built up gradually through a potentially large collection of small trees. A model typically consists of a few dozen to several hundred trees, each normally no larger than two to eight terminal nodes. The model is similar to a long series expansion (such as a Fourier or Taylor series): a sum of terms that becomes progressively more accurate as the expansion continues. The expansion can be written as [11,13]:

$$ F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_M T_M(X) $$

where $T_i$ is a small tree.

Each tree improves on its predecessors through an error-correcting strategy. Individual trees may be as small as one split, but the final models can be accurate and are resistant to overfitting.
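The error-correcting expansion can be illustrated with a small boosting loop in which every tree is a single split fitted to the current residuals. This is a simplified sketch and not the TreeNet implementation: it assumes one numeric predictor with at least two distinct values, squared-error loss, a constant weight `beta` for every tree, and no random subsampling (the "stochastic" part of stochastic gradient boosting). All names are hypothetical.

```python
from typing import List, Tuple

# A one-split tree: (threshold, value if x <= threshold, value otherwise).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides of the split non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (threshold, left_mean, right_mean))
    return best[1]


def stump_predict(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs: List[float], ys: List[float], n_trees: int = 50, beta: float = 0.1):
    """Build F(x) = F0 + beta*T1(x) + ... + beta*TM(x), each tree fitted to the current errors."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(xs)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + beta * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return f0, stumps


def predict(f0: float, stumps: List[Stump], x: float, beta: float = 0.1) -> float:
    return f0 + beta * sum(stump_predict(s, x) for s in stumps)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # toy target: 1 marks an intrusion
    f0, stumps = boost(xs, ys)
    print(predict(f0, stumps, 0.0), predict(f0, stumps, 7.0))
```

Each added tree removes only a fraction of the remaining error, which is why many small trees are needed and why the sum keeps improving as the expansion grows.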

V DATA USED FOR ANALYSIS

A subset of the DARPA intrusion detection data set is used for offline analysis. In the DARPA intrusion detection evaluation program, an environment was set up to acquire raw TCP/IP dump data for a network by simulating a typical U.S. Air Force LAN. The LAN was operated like a real environment, but was blasted with multiple attacks [14,15]. For each TCP/IP connection, 41 quantitative and qualitative features were extracted [16] for intrusion analysis. The 41 features fall into three categories: “intrinsic” features, which describe the individual TCP/IP connections and can be obtained from network audit trails; “content-based” features, which describe the payload of the network packet and can be obtained from the data portion of the packet; and “traffic-based” features, which are computed using a specific window (connection time or number of connections). DoS and probe attacks involve several connections in a short time frame, whereas R2L and U2Su attacks are embedded in the data portions of the connection and often involve just a single connection; “traffic-based” features therefore play an important role in deciding whether a particular network activity is engaged in probing or not.
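To make the idea of a window-based traffic feature concrete, the sketch below counts, for each connection, how many earlier connections went to the same destination host within a fixed time window. The record layout, the function name, and the 2-second window are assumptions made for illustration; the actual feature definitions are those of [16].

```python
from typing import List, Tuple

# Hypothetical record layout: (start time in seconds, destination host).
Connection = Tuple[float, str]


def same_host_counts(connections: List[Connection], window: float = 2.0) -> List[int]:
    """For each connection, count earlier connections to the same host within `window` seconds.

    Assumes the connections are sorted by start time.
    """
    counts = []
    for i, (start, host) in enumerate(connections):
        count = 0
        for prev_start, prev_host in connections[:i]:
            if prev_host == host and start - prev_start <= window:
                count += 1
        counts.append(count)
    return counts


if __name__ == "__main__":
    trace = [(0.0, "a"), (0.5, "a"), (1.0, "b"), (1.5, "a"), (5.0, "a")]
    print(same_host_counts(trace))
```

A burst of connections to one host drives the count up, which is the kind of signal that separates probing or flooding from an attack carried inside a single connection.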

Attack types fall into four main categories:

• Denial of Service (DoS) Attacks: A denial of service attack is a class of attacks in which an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are Apache2, Back, Land, Mail bomb, SYN Flood, Ping of death, Process table, Smurf, Syslogd, Teardrop, Udpstorm.

• User to Superuser or Root Attacks (U2Su): User to root exploits are a class of attacks in which an attacker starts out with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system. Examples are Eject, Ffbconfig, Fdformat, Loadmodule, Perl, Ps, Xterm.

• Remote to User Attacks (R2L): A remote to user attack is a class of attacks in which an attacker who does not have an account on a machine sends packets to that machine over a network and exploits some vulnerability to gain local access as a user of that machine. Examples are Dictionary, Ftp_write, Guest, Imap, Named, Phf, Sendmail, Xlock, Xsnoop.

• Probing (Probe): Probing is a class of attacks in which an attacker scans a network of computers to gather information or find known vulnerabilities. An attacker with a map of machines and services that are available on a network can use this information to look for exploits. Examples are Ipsweep, Mscan, Nmap, Saint, Satan.

In our experiments, we perform 5-class classification. The (training and testing) data set contains 11982 randomly selected points from the data set representing the five classes, with the number of data points from each class proportional to its size, except that the smallest class is completely included. The 5092 training data points and 6890 testing data points are divided into five classes: normal, probe, denial of service, user to super user, and remote to local. The attack data is a collection of 22 different types of instances that belong to the four attack classes described above, and the remainder is the normal data. Note that the two randomly generated, separate data sets of sizes 5092 and 6890 are used for training and testing MARS, CART, and TreeNet, respectively. Section VI summarizes the classifier accuracies.
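The sampling rule described above (class sizes proportional to the full data, with the smallest class kept whole) could be sketched as follows. This is an illustrative reading of that rule, not the authors' procedure; the function name, the rounding of quotas, and the toy class sizes are all assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def proportional_sample(
    records: Sequence[T],
    label_of: Callable[[T], Hashable],
    total: int,
    seed: int = 0,
) -> List[T]:
    """Draw about `total` records: the smallest class in full, the others in proportion to size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    # The smallest class is included completely.
    smallest = min(by_class, key=lambda label: len(by_class[label]))
    sample = list(by_class[smallest])

    # The remaining budget is shared among the other classes in proportion to their sizes.
    budget = max(0, total - len(sample))
    rest_size = sum(len(items) for label, items in by_class.items() if label != smallest)
    for label, items in by_class.items():
        if label == smallest:
            continue
        quota = min(len(items), round(budget * len(items) / rest_size))
        sample.extend(rng.sample(items, quota))
    return sample


if __name__ == "__main__":
    # Toy data with hypothetical class sizes.
    data = [("normal", i) for i in range(900)] + [("dos", i) for i in range(90)] + [("u2su", i) for i in range(10)]
    picked = proportional_sample(data, lambda r: r[0], total=110)
    print(len(picked))
```

Because the quotas are rounded, the sample size can differ from `total` by a few records.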

VI ROC CURVES

Detection rates and false alarms are evaluated for the five-class pattern in the DARPA data set, and the obtained results are used to form the ROC curves. The point (0,1) is the perfect classifier, since it classifies all positive cases and negative cases correctly. Thus an ideal system will begin by identifying all the positive examples, so the curve will rise to (0,1) immediately, having a zero rate of false positives, and then continue along to (1,1).

Figures 2 to 6 show the ROC curves of the detection models by attack categories as well as on all intrusions. In each of these ROC plots, the x-axis is the false positive rate, calculated as the percentage of normal connections considered as intrusions; the y-axis is the detection rate, calculated as the percentage of intrusions detected. A data point in the upper left corner corresponds to optimal high performance, i.e., high detection rate with low false alarm rate. The area under the ROC curves and the numbers of false positives and false negatives are presented in Tables 2 to 6.
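The construction of an ROC curve and its area can be sketched in a few lines. The code assumes each connection has a numeric score where higher means "more likely an intrusion" and that both classes are present; the function names and the toy scores are illustrative, and the area is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_intrusion: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, detection rate) pairs, one per distinct score threshold."""
    positives = sum(is_intrusion)
    negatives = len(is_intrusion) - positives
    ranked = sorted(zip(scores, is_intrusion), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all connections that share this score together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as points ordered by increasing x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [True, False, True, False]  # True marks an intrusion
    curve = roc_points(scores, labels)
    print(curve, area_under_curve(curve))
```

A classifier that ranks every intrusion above every normal connection passes through (0,1) and has area 1; ranking errors pull the curve away from the upper left corner and reduce the area.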

Table 2. Summary of classification accuracy for normal (columns: Curve Area, False Positives, False Negatives).

Figure 2. Classification accuracy for normal (ROC curves for MARS, CART, and TreeNet; x-axis: specificity (false positives)).

Table 3. Summary of classification accuracy for probe.

Figure 3. Classification accuracy for probe (ROC curves for MARS, CART, and TreeNet).

Table 4. Summary of classification accuracy for DoS.

Figure 4. Classification accuracy for DoS (ROC curves for MARS, CART, and TreeNet).

Table 5. Summary of classification accuracy for U2Su.

Figure 5. Classification accuracy for U2Su (ROC curves for MARS, CART, and TreeNet).

Table 6. Summary of classification accuracy for R2L.

Figure 6. Classification accuracy for R2L (ROC curves for MARS, CART, and TreeNet).

VII CONCLUSIONS

A number of observations and conclusions are drawn from the results reported in this paper:

• TreeNet easily achieves high detection accuracy (higher than 99%) for each of the 5 classes of DARPA data. TreeNet performed the best for normal with 18 false positives (FP) and 0 false negatives (FN), probe with 14 FP and 0 FN, and denial of service attacks (DoS) with 3 FP and 9 FN.

• CART performed the best for user to super user (U2Su) with 3 FP and 14 FN and remote to local (R2L) with 15 FP and 6 FN.

We demonstrate that using these fast-executing machine learning methods we can achieve high classification accuracies in a fraction of the time required by the well-known support vector machines and artificial neural networks.

We note, however, that the differences in accuracy figures tend to be small and may not be statistically significant, especially in view of the fact that the 5 classes of patterns differ tremendously in their sizes. More definitive conclusions can perhaps only be drawn after analyzing more comprehensive sets of network data.

ACKNOWLEDGEMENTS

Partial support for this research from ICASA (Institute for Complex Additive Systems Analysis, a division of New Mexico Tech), a DoD IASP grant, and an NSF SFS Capacity Building grant is gratefully acknowledged.

REFERENCES

1. S. Mukkamala, G. Janowski, A. H. Sung, Intrusion Detection Using Neural Networks and Support Vector Machines. Proceedings of IEEE International Joint Conference on Neural Networks 2002, IEEE Press, pp. 1702-1707, 2002.

2. M. Fugate, J. R. Gattiker, Computer Intrusion Detection with Classification and Anomaly Detection, Using SVMs. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 17(3), pp. 441-458, 2003.

3. W. Hu, Y. Liao, V. R. Vemuri, Robust Support Vector Machines for Anomaly Detection in Computer Security. International Conference on Machine Learning, pp. 168-174, 2003.

4. K. A. Heller, K. M. Svore, A. D. Keromytis, S. J. Stolfo, One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses. Proceedings of IEEE Conference Data Mining Workshop on Data Mining for Computer Security, 2003.

5. A. Lazarevic, L. Ertoz, A. Ozgur, J. Srivastava, V. Kumar, A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection. Proceedings of Third SIAM Conference on Data Mining, 2003.

6. S. Mukkamala, A. H. Sung, Feature Selection for Intrusion Detection Using Neural Networks and Support Vector Machines. Journal of the Transportation Research Board of the National Academies, Transportation Research Record No. 1822, pp. 33-39, 2003.

7. S. J. Stolfo, F. Wei, W. Lee, A. Prodromidis, P. K. Chan, Cost-based Modeling and Evaluation for Data Mining with Application to Fraud and Intrusion Detection: Results from the JAM Project, 1999.

8. S. Mukkamala, B. Ribeiro, A. H. Sung, Model Selection for Kernel Based Intrusion Detection Systems. Proceedings of International Conference on Adaptive and Natural Computing Algorithms (ICANNGA), Springer-Verlag, pp. 458-461, 2005.

9. T. Hastie, R. Tibshirani, J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

10. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees. Wadsworth and Brooks/Cole Advanced Books and Software, 1986.

11. Salford Systems, TreeNet, CART, MARS Manual.

12. R. J. Lewis, An Introduction to Classification and Regression Tree (CART) Analysis. Annual Meeting of the Society for Academic Emergency Medicine, 2000.

13. J. H. Friedman, Stochastic Gradient Boosting. Journal of Computational Statistics and Data Analysis, Elsevier Science, Vol. 38, pp. 367-378, 2002.

14. K. Kendall, A Database of Computer Attacks for the Evaluation of Intrusion Detection Systems. Master's Thesis, Massachusetts Institute of Technology (MIT), 1998.

15. S. E. Webster, The Development and Analysis of Intrusion Detection Algorithms. Master's Thesis, MIT, 1998.

16. W. Lee, S. J. Stolfo, A Framework for Constructing Features and Models for Intrusion Detection Systems. ACM Transactions on Information and System Security, Vol. 3, pp. 227-261, 2000.
