
Immunity-based Method for Anti-Spam Model¹

Jin Yang
Department of Computer Science, LeShan Normal University, LeShan 614004, China
jinnyang@163.com

Yi Liu
Department of Computer Science, LeShan Normal University, LeShan 614004, China
bigluckboy@163.com

Qin Li
Department of Computer Science, LeShan Normal University, LeShan 614004, China
wkywawa@tom.com

Abstract: The widespread use of information technology has made email networks one of the largest-scale applications in cyberspace. Traditional anti-spam solutions, however, are mostly static, and adaptive, real-time analysis of mail is seldom considered. Inspired by the theory of artificial immune systems (AIS), this paper presents a novel distributed anti-spam model that leverages the topological properties of e-mail networks. The concepts and formal definitions of immune cells are given; dynamic evolution equations for self, antigen, immune tolerance, and the mature-lymphocyte lifecycle are presented; and a hierarchical, distributed management framework for the proposed model is built. The experimental results show that the proposed model processes mail in real time and is more efficient than client-server-based solutions, making it a promising approach for anti-spam systems.

Keywords: spam; artificial immune systems; anti-spam system

I. INTRODUCTION

The amount of unsolicited email has increased dramatically in the past few years. Spam has become a serious problem because it causes organizations substantial losses: it wastes bandwidth, costs users time in dealing with worthless mail, increases the processing load on mail servers, and can even cause mail servers to crash [1]. Anti-spam work applies data investigation and analysis techniques, currently mainly through blocking and filtering procedures [2]. Current techniques classify a message as either spam or legitimate by identifying keywords, phrases, sending addresses, and so on. Keeping a blacklist of addresses to be blocked, or a whitelist of addresses to be allowed, is also widely used. This technique has several disadvantages. Because spammers can create many false "from" e-mail addresses, it is difficult to keep a blacklist up to date with the correct addresses to block [3].

Message filtering is straightforward and does not require any modification to existing e-mail protocols. However, message filters often rely on humans to create detectors based on the spam they have received. A dedicated spam sender can use the frequently publicly available information about such heuristics and their weightings to evade detection [4]. Several other approaches have been proposed. Neural networks have been used for detecting spam [5], and data mining methods have been described as well. However, methods that adaptively capture potentially sensitive traffic and analyze mail in real time are seldom considered. Traditional technology therefore lacks learning, self-adaptation, and the ability of parallel distributed processing, which calls for an effective and adaptive analysis system for anti-spam.

¹ This work was supported by the Scientific Research Fund of Sichuan Provincial Education Department (No. 08ZA130) and the Scientific Research Fund of LeShan Normal University (No. Z0863).

Researchers have gradually turned their attention to the biological immune system, exploring new approaches to bionic computation. Artificial immune systems (AIS) are now receiving increasing attention and have become a research hotspot in biologically inspired computational intelligence, following genetic algorithms, neural networks, and evolutionary computation in intelligent systems research. Burnet proposed the clonal selection theory in 1958 [6], and Forrest proposed the negative selection algorithm and the concept of computer immunity in 1994 [7]. Artificial immune systems have many appealing features [8-9], such as diversity, dynamics, parallel management, self-organization, and self-adaptation, and have been widely used in fields such as data mining, network security, pattern recognition, learning, and optimization [10-11]. In this paper, we propose a new spam detection technique based on artificial immunity theory.

II. SPAM SURVEILLANCE MODEL BASED ON AIS

The aim of this paper is to establish an immune-based model for dynamic spam detection. The model is composed of three processes: the Process of Email Character Distilling, the Process of Email Surveillance, and the Process of Training. The Process of Email Character Distilling uses the vector space model and represents each received mail as a set of discrete words. The Process of Training generates immature detectors from the gene library to distinguish Self from Non-self. According to the immune principle, some of these new immature detectors are false detectors, and they are removed by the negative selection process, which matches them against the training mails: if the match strength between an immature detector and one of the training mails exceeds the predefined threshold, the immature detector is considered a false detector. The Process of Email Surveillance matches received mails against the mature detectors. If the match strength between a received mail and one of the detectors exceeds the threshold, the mail is considered spam. The training phases are detailed as follows.
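The negative selection step and the surveillance match described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the implementation used in the experiments: mails and detectors are taken to be equal-length bit strings, the training mails are assumed to be legitimate (self) mail, and the match strength is a simple count of agreeing positions. The names `match_strength`, `train_detectors`, `is_spam`, and `max_tries` are hypothetical.

```python
import random


def match_strength(detector: str, mail: str) -> int:
    """Count the positions at which two equal-length bit strings agree."""
    return sum(a == b for a, b in zip(detector, mail))


def train_detectors(self_mails, n_detectors, length, threshold, rng, max_tries=100000):
    """Negative selection: keep only random candidates that match no training mail.

    A candidate whose match strength with any training (self) mail exceeds
    the threshold is treated as a false detector and discarded.
    """
    detectors = []
    tries = 0
    while len(detectors) < n_detectors:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not generate enough detectors")
        candidate = "".join(rng.choice("01") for _ in range(length))
        if all(match_strength(candidate, m) <= threshold for m in self_mails):
            detectors.append(candidate)
    return detectors


def is_spam(mail, detectors, threshold):
    """Surveillance: a mail is flagged when any detector matches it strongly."""
    return any(match_strength(d, mail) > threshold for d in detectors)
```

By construction, no training mail can be flagged by the surviving detectors, which is the tolerance property that negative selection is meant to provide.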

A. Self and Non-self

A biological immune system produces antibodies to resist pathogens through B cells distributed all over the human body, while T cells regulate the antibody concentration.

2009 International Conference on Networks Security, Wireless Communications and Trusted Computing

An immune system can distinguish between self and non-self in order to detect potentially dangerous elements. These non-self elements include bacteria and viruses. In a spam immune system, we distinguish legitimate messages from spam. We treat the text of an email, including the headers and the body, as the antigen of a spam message. In the model, we define antigens (Ag) to be the features of the email service and the email information, given by:

$$Ag = \{\, ag \mid ag \in D \,\}, \qquad D = \{0, 1\}^{l}$$

Antigens are binary strings extracted from the email information received in the network environment. An antigen is built from the gene libraries of an email, which include the sender, sending organization, email service provider, receiving organization, recipient fields, etc.

The structure of an antibody is the same as that of an antigen. For spam detection, the nonself set (Nonself) represents abnormal information from a malicious email service, while the self set (Self) represents normal email service. The set Ag contains two subsets [12], $Self \subseteq Ag$ and $Nonself \subseteq Ag$, such that

$$Self \cup Nonself = Ag, \qquad Self \cap Nonself = \Phi \tag{1}$$

For convenience in referring to the fields of an antigen x, the operator "." is used to extract a specified field of x, so that x.fieldname is the value of the field fieldname of x.

In the model, all detectors form the detector set SD:

$$SD = \{\, \langle d, age, count \rangle \mid d \in D,\ age \in N,\ count \in N \,\} \tag{2}$$

where d is the antibody gene used to match an antigen, age is the age of detector d, count (affinity) is the number of antigens matched by antibody d, and N is the set of natural numbers. SD contains two subsets, the mature detectors and the memory detectors, denoted M and T respectively. A mature detector is one that is tolerant to self but has not yet been activated by antigens. A memory detector evolves from a mature one that matches enough antigens in its lifecycle. Therefore,

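$$SD = M \cup T, \qquad M \cap T = \Phi$$

$$M = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y \rangle \notin Match \wedge x.count < \beta) \,\} \tag{3}$$

$$T = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y \rangle \notin Match \wedge x.count \ge \beta) \,\} \tag{4}$$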

where β (> 0) represents the activation threshold. Match is a match relation defined by

$$Match = \{\, \langle x, y \rangle \mid x, y \in D,\ f_{match}(x, y) = 1 \,\} \tag{5}$$

Here, β is the affinity threshold above which a detector is activated. The affinity function $f_{match}(x, y)$ may be based on Hamming distance, Manhattan distance, Euclidean distance, r-contiguous matching, etc. In this model, we use the r-contiguous matching algorithm to compute the affinity of mature detectors.
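As an illustration of the matching rule, the following Python sketch implements the usual r-contiguous bits rule: two equal-length bit strings match when they agree in at least r consecutive positions. The function name `r_contiguous_match` is hypothetical, and representing antigens and detector genes as Python strings of "0" and "1" is an assumption made for readability.

```python
def r_contiguous_match(x: str, y: str, r: int) -> bool:
    """Return True if x and y agree in at least r consecutive positions."""
    if len(x) != len(y):
        raise ValueError("bit strings must have equal length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

For example, `r_contiguous_match("10110100", "10111011", 4)` holds because the first four bits agree, while the same pair does not match for r = 5. A larger r makes each detector more specific, so more detectors are needed to cover the non-self space.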

B. The Dynamic Model of Self

As in the biological immune system, the self in the anti-spam immune system changes over time. Legitimate mail changes along with the environment and the user's behavior: for example, the user's contact list grows, or the user develops new interests, discusses new issues, or writes email in a new language. To prevent an antibody from matching a self element, a newly formed antibody must pass a self-tolerance test before it is matched against antigens. We use the following formulation to describe the self tolerance of new antibodies:

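$$Self(t) = Self(0) = \{x_1, x_2, \ldots, x_n\}, \quad t = 0 \tag{6}$$

$$Self(t + \Delta t_1) = Self(t), \quad t \ge 1 \wedge \Delta t_1 \bmod \delta_1 \ne 0 \tag{7}$$

$$Self(t + \Delta t_2) = Self(t) + Self_{new}(\Delta t_2) - \frac{\partial Self_{variation}}{\partial x} \cdot \Delta t_2, \quad t \ge 1 \wedge \Delta t_2 \bmod \delta_1 = 0 \tag{8}$$

$$Self_{variation} = \{\, x \mid x \text{ is a self antigen forbidden at time } t \,\}$$

$$Self_{new} = \{\, x \mid x \text{ is a self antigen permitted at time } t \,\}$$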
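Equations (6) to (8) describe a self set that is held fixed most of the time and occasionally revised by adding newly permitted self antigens and removing forbidden ones. A minimal discrete sketch of that behavior is given below. It rests on two assumptions that are not stated in the text: that the revision happens once every δ1 time steps, and that the derivative term can be approximated by plain set removal. The function name `update_self` and its parameters are hypothetical.

```python
def update_self(self_set, step, delta, newly_permitted, newly_forbidden):
    """Return the self set after one time step.

    Between revisions the set is returned unchanged; on every delta-th
    step, forbidden elements are removed and permitted ones are added.
    """
    current = set(self_set)
    if step % delta != 0:
        return current
    return (current - set(newly_forbidden)) | set(newly_permitted)
```

Keeping the revision periodic means detectors are re-tested against a stable self set most of the time, and only need to be re-checked for tolerance when the set actually changes.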

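C. The Dynamic Mature Detector Model

[Equations (12) to (17) are garbled in the source and cannot be reconstructed term by term. The surviving fragments indicate the following. Equation (12) updates the mature detector set from $M(t)$ to $M(t + \Delta t)$ using the terms $M_{new}$, $M_{from\_other}$ and $M_{dead}$, with an initial condition at $t = 0$, under the condition $f_{match}(M(t), Ag(t)) \ne 1$. Equation (13) relates $M_{clone}$, $M_{active}$ and partial derivatives with respect to x, under the condition $f_{match}(M(t), Ag(t)) = 1$. Equation (14) includes the count update $M.count(t + \Delta t) = M.count(t) + 1$. Equation (15) involves $M_{new}(t)$ and $T_{active}$. Equation (16) concerns $M_{death}(t)$ under the condition $f_{match}(M(t - 1), Self(t - 1)) = 1$. Equation (17) expresses $M_{from\_other}$ through terms indexed by i from 1 to k.]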

Equation (12) depicts the lifecycle of the mature detectors, simulating the process by which mature detectors evolve into the next generation. All mature detectors have a fixed lifecycle (λ). If a mature detector matches enough antigens (≥ β) in its lifecycle, it evolves into a memory detector. A detector that does not match enough antigens in its lifecycle is eliminated and replaced by a newly generated mature detector. $M_{new}(t)$ is the set of newly generated mature detectors. $M_{dead}(t)$ is the set of detectors that have not matched enough antigens (< β) in their lifecycle or that have classified self antigens as nonself at time t. $M_{active}(t)$ is the set of the least recently used memory detectors, which degrade into mature detectors and are given a new age T > 0 and count β > 1. When the same antigens arrive again, they are detected immediately by the memory detectors. During the mature detector lifecycle, detectors that are inefficient at classifying antigens are killed through the process of clonal selection. The method therefore improves detection efficiency when the same abnormal behaviors intrude into the email system again.
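The lifecycle rules in the paragraph above can be sketched in a few lines of Python. This is a simplified, discrete-time illustration rather than the differential form of the model: one call advances every mature detector by one step, a detector is promoted once its count reaches β, and it dies when its age reaches the lifecycle λ without promotion. The class `Detector` and the function `step_mature` are hypothetical names, and matching is abstracted as membership of the gene in a set of matched genes.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    gene: str       # antibody gene d
    age: int = 0    # time steps lived as a mature detector
    count: int = 0  # number of antigens matched so far


def step_mature(mature, matched_genes, beta, lifecycle):
    """Advance mature detectors by one step.

    Returns (survivors, promoted, dead): detectors that stay mature,
    detectors that become memory detectors, and detectors that are removed.
    """
    survivors, promoted, dead = [], [], []
    for det in mature:
        det.age += 1
        if det.gene in matched_genes:
            det.count += 1
        if det.count >= beta:
            promoted.append(det)   # matched enough antigens: becomes memory
        elif det.age >= lifecycle:
            dead.append(det)       # lifecycle expired without enough matches
        else:
            survivors.append(det)
    return survivors, promoted, dead
```

In a full system the dead detectors would be replaced by newly generated mature detectors, which this sketch leaves to the caller.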

As Figure 1 shows, the system first creates immature detectors at random and then computes the affinity between each immature detector and every element of the training set. If the affinity of an immature detector is over the threshold, it becomes a mature detector and is added to the mature detector set. The system repeats this procedure until the mature detectors have been created.

Figure 1. The Dynamic Mature Detector Model

D. The Dynamic Evolution of Antibody Density

The antibody density of the memory detectors expresses the quantity and categories of spam and malicious intrusions, and thus reflects the security level of the current system. The antibody density changes in two ways.

1) Increase: When a memory detector captures a particular antigen, we simulate the human immune system by increasing the antibody density, representing an increase in the quantity of spam and malicious intrusions. Let $V_{\rho}$ denote the rate of increase of the antibody density. The density $\rho(t)$ of the antibodies $Mem_{SD}(t)$ at time t is then:

$$\rho(t) = \rho(t - 1) + V_{\rho} \cdot \Delta t \tag{18}$$

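[Equation (19), which defines $V_{\rho}$, is garbled in the source. The surviving fragments show that it involves the parameters u, A and σ and a variable x bounded by +∞.]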

The more intensive the antigen invasion, the faster the antibody density increases. Conversely, if the memory detectors match relatively few invading antigens, the antibody density increases slowly. Because each invading antigen (spam) harms the host or network to a different degree, we introduce the parameter u to reflect the degree of damage caused; its value is determined by experiment. To prevent memory detectors from cloning without limit, we set A as the upper limit on the growth of the antibody density.

2) Decrease: If a memory detector fails to clone for a cycle time, the antibody density decays according to equation (20):

$$\rho(t) = \frac{1}{2}\,\rho(t - \tau), \quad t \ge \tau \tag{20}$$

Here τ is the half-life of the antibody density. When the antibody density falls to 0.05, that is, $\rho(t) \le \varepsilon_{\tau} = 0.05$, we stop the attenuation. At this point the alarm corresponding to the antibody is cleared.

E. The Antibody Variation

To prevent the algorithm from converging prematurely, we apply a variation (mutation) operation to the gene set $G_1 = \{g_1, g_2, \ldots, g_i, \ldots, g_n\}$ after the crossover process. Variation points are selected at random and varied with variation probability $p_m$ to generate the new generation $G_{new} = \{g_1, g_2, \ldots, g_i', \ldots, g_n\}$. The variation points are selected according to the Poisson distribution

$$P\{X = k\} = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots \tag{21}$$

where $E(X) = D(X) = \lambda > 0$ and X is the number of variation points. $G_1$ then turns into the offspring $G_{new}$ through the variation process.
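One possible reading of this variation step is sketched below in Python: the number of candidate variation points is drawn from a Poisson distribution with mean λ, the points are placed uniformly at random, and each point is changed with probability $p_m$. The Poisson sampler uses Knuth's multiplication method so that only the standard library is needed. The binary gene alphabet and the names `poisson_sample` and `mutate` are assumptions made for illustration.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mutate(genes, lam, p_m, rng, alphabet="01"):
    """Return (child, points): a mutated copy of genes and the chosen points.

    The number of variation points is Poisson distributed (capped at the
    gene length); each chosen point is changed with probability p_m.
    """
    n_points = min(poisson_sample(lam, rng), len(genes))
    points = sorted(rng.sample(range(len(genes)), n_points))
    child = list(genes)
    for i in points:
        if rng.random() < p_m:
            child[i] = rng.choice([s for s in alphabet if s != child[i]])
    return child, points
```

With a small λ most offspring differ from the parent in only a few genes, which preserves good detectors while still introducing diversity.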

F. The Process of Email Surveillance

Our model uses detector state conversion in the dynamic evolution of mature and memory detectors, erasing detectors that match self. As Figure 2 shows, undetected emails are first compared with the memory detectors. If an email matches any element of the memory detector set, it is classified as spam and an alarm is sent to the user. The remaining emails, which pass the memory detectors, are then compared with the mature detectors. A mature detector must have become stimulated in order to classify an email as junk, so it is assumed that the first stimulatory signal has already occurred. Feedback from the administrator is then interpreted as a co-stimulation signal. If the system receives affirmative co-stimulation within a fixed period, the matched email is classified as spam; otherwise it is considered normal email and delivered to the user's client in the normal way. During the filtering phase, when a mature detector matches an email, its count field is incremented. If the value of the count field exceeds the threshold, the detector is activated and becomes a memory detector. Meanwhile, if a memory detector does not match any email within a fixed period, it degenerates into a mature detector. When unsolicited emails and malicious intrusions increase, we simulate immune system functions to increase the antibody density; when they decrease, we simulate immune feedback functions to reduce the density of the corresponding antibody, restoring it to its normal level.
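The two-stage filtering flow described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the matching rule and the administrator's co-stimulation are passed in as functions, the fixed waiting period for co-stimulation is not modeled, and the degeneration of idle memory detectors is omitted. The names `Detector`, `surveil`, `matches`, and `co_stimulated` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    gene: str
    count: int = 0  # number of emails matched so far


def surveil(mail, memory, mature, matches, co_stimulated, beta):
    """Classify one email as "spam" or "normal".

    Memory detectors classify immediately. A match by a mature detector
    increments its count, promotes it to memory once the count exceeds
    beta, and yields "spam" only if co-stimulation is affirmative.
    """
    # Stage 1: memory detectors classify without further confirmation.
    if any(matches(d.gene, mail) for d in memory):
        return "spam"
    # Stage 2: mature detectors need affirmative co-stimulation.
    verdict = "normal"
    for det in list(mature):
        if not matches(det.gene, mail):
            continue
        det.count += 1
        if det.count > beta:
            mature.remove(det)
            memory.append(det)
        if co_stimulated(mail):
            verdict = "spam"
    return verdict
```

Checking the memory set first means that previously confirmed spam patterns are handled without waiting for feedback, which is where the real-time behavior of the model comes from.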


Figure 2. The Process of Email Surveillance

III. EXPERIMENTAL RESULTS AND ANALYSIS

Simulation experiments were carried out in our laboratory. The main aim was to test the feasibility of applying AIS to spam detection, and we conducted a series of experiments. The coefficients for the model are shown in Table I.

TABLE I. COEFFICIENTS FOR THE MODEL

Parameter                                   Value
r (r-contiguous bits matching rule)         8
Size of the initial self set, n             40
Initial scale of detectors                  100
Match threshold, β                          40~60
Lifecycle of the mature detectors           120 s

The first series of experiments was carried out to verify the feasibility of our anti-spam solution, as follows. We used the Ling-Spam dataset for analysis and experiments. It is a mixture of 481 spam messages and 2412 messages sent via the Linguist list, a moderated list about the profession and science of linguistics. Attachments, HTML tags, and duplicate spam messages received on the same day are not included. The whole experiment is divided into two phases: a training phase and an application phase. The main difference between the two phases is that the former does not use the filtering module and only generates detectors for the system. We partitioned the emails randomly into ten parts and chose one part at random as the training set; the remaining nine parts were used for testing, giving nine groups of recall and precision ratios. The average of these nine groups of values is taken as the model's recall and precision. Figure 3 shows the average performance of the Bayesian method and our model in the comparison experiment. The experiments indicate that artificial immune-based detection of spam can be a useful technique.
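For reference, the evaluation measure described above can be computed as in the following Python sketch. It uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN) with spam as the positive class, and averages the per-group values. The label strings and the function names `precision_recall` and `average_over_groups` are illustrative assumptions, and the sketch contains no data from the experiments.

```python
def precision_recall(actual, predicted, positive="spam"):
    """Return (precision, recall) for one test group."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_over_groups(groups):
    """Average precision and recall over (actual, predicted) test groups."""
    results = [precision_recall(a, p) for a, p in groups]
    n = len(results)
    return (sum(r[0] for r in results) / n, sum(r[1] for r in results) / n)
```

With nine test groups, `average_over_groups` would be called with a list of nine (actual, predicted) pairs, one per held-out part.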

Figure 3. Results of Comparison Experiments

IV. CONCLUSIONS

Traditional spam filtering systems and technologies mostly adopt static measures; they lack self-adaptation and the ability of parallel distributed processing and are consequently unable to adjust to the current network security situation. In this paper, we have presented a spam detection model based on the theory of artificial immune systems and illustrated its advantages over traditional models. The concepts and formal definitions of immune cells are given, and we have quantitatively depicted the dynamic evolution of self, antigens, immune tolerance, and immune memory. Additionally, the model uses a distributed, multi-hierarchy framework to provide an effective solution for spam. Finally, the experimental results show that the proposed model is a good solution for anti-spam systems.

REFERENCES

[1] D. D'Ambra, "Killer spam: clawing at your door", Inf. Prof. 4, vol. 28, no. 4, 2007.

[2] Le Zhang, Jingbo Zhu, Tianshun Yao, "An Evaluation of Statistical Spam Filtering Techniques", ACM Transactions on Asian Language Information Processing (TALIP), vol. 3, 2004, pp. 243-269.

[3] M. N. Marsono, M. Watheq, and F. Gebali, "Binary LNS-based naive Bayes inference engine for spam control: noise analysis and FPGA implementation", IET Comput. Digit. Tech., vol. 56, no. 2, 2008.

[4] A. T. Mizrak, S. Savage, "Detecting compromised routers via packet forwarding behavior", IEEE Network, vol. 22, no. 2, 2008, pp. 34-39.

[5] O. Villa, F. Petrini, "Accelerating real-time string searching with multicore processors", Computer, vol. 41, no. 4, 2008, pp. 42-44.

[6] F. M. Burnet, "The Clonal Selection Theory of Acquired Immunity", Cambridge: Cambridge University Press, 1959.

[7] T. B. Kepler, "Somatic hypermutation in B cells: An optimal control treatment", Theoret. Biol., 1993, pp. 37-64.

[8] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, "Self-Nonself Discrimination in a Computer", Proceedings of IEEE Symposium on Research in Security and Privacy, Oakland, 1994.

[9] J. Kim, P. Bentley, "The Artificial Immune Model for Network Intrusion Detection", 7th European Congress on Intelligent Techniques and Soft Computing, 1999.

[10] G. Artin-Herran, O. Rubel, G. Zaccour, "Competing for consumer's attention", Automatica, vol. 44, 2008, pp. 361-370.

[11] M. Hanke, "On the effects of stock spam e-mails", Journal of Financial Markets, vol. 11, 2008, pp. 57-83.

[12] T. Li, "An Introduction to Computer Network Security", 1st edition, Publishing House of Electronics Industry, Beijing, 2004.

Title	Immunity-based method for anti-spam model
Authors	Jin Yang, Yi Liu
Institution	LeShan Normal University
Field	Computer Science
Type	Conference Paper
Year of publication	2009
City	LeShan
