Immunity-based Method for Anti-Spam Model¹
Jin Yang
Department of Computer Science
LeShan Normal University
LeShan 614004, China
jinnyang@163.com

Yi Liu
Department of Computer Science
LeShan Normal University
LeShan 614004, China
bigluckboy@163.com

Qin Li
Department of Computer Science
LeShan Normal University
LeShan 614004, China
wkywawa@tom.com
Abstract—The widespread use of information technology has made e-mail networks one of the large-scale application networks in cyberspace. However, traditional anti-spam solutions are mostly static, and adaptive, real-time analysis of mail is seldom considered. Inspired by the theory of artificial immune systems (AIS), a novel distributed anti-spam model that leverages the topological properties of e-mail networks is presented. The concepts and formal definitions of immune cells are given, dynamic evolution equations for self, antigen, immune tolerance, and the mature-lymphocyte lifecycle are presented, and the hierarchical, distributed management framework of the proposed model is built. The experimental results show that the proposed model processes mail in real time and is more efficient than client-server-based solutions, thus providing a promising solution for anti-spam systems.
Keywords-spam; artificial immune systems; anti-spam system
I. INTRODUCTION
The amount of unsolicited email has increased dramatically in the past few years. Spam has become a serious problem because it causes huge losses to organizations: it wastes bandwidth, costs users time dealing with insignificant mail, increases the processing load on mail servers, and can cause mail servers to crash [1]. Anti-spam is the application of data investigation and analysis techniques, currently mainly by means of blocking and filtering procedures [2]. Current techniques classify a message as either spam or legitimate by identifying keywords, phrases, sending addresses, etc. Keeping a blacklist of addresses to be blocked, or a whitelist of addresses to be allowed, is also widely used. This technique has several disadvantages. Because spammers can create many false "from" e-mail addresses, it is difficult to keep a blacklist up to date with the correct addresses to block [3].
Message filtering is straightforward and does not require any modifications to existing e-mail protocols. However, message filters often rely on humans to create detectors based on the spam they have received. A dedicated spam sender can use the frequently publicly available information about such heuristics and their weightings to evade detection [4]. Several other approaches have been proposed. Neural networks have been used for detecting spam [5], and data mining methods have been described as well. However, methods that adaptively capture potentially sensitive traffic and analyze mail in real time are seldom considered. Traditional techniques therefore lack learning, self-adaptation, and parallel distributed processing abilities, which calls for an effective and adaptive analysis system for anti-spam.

¹ This work was supported by the Scientific Research Fund of Sichuan Provincial Education Department (No. 08ZA130) and the Scientific Research Fund of LeShan Normal University (No. Z0863).
Gradually, researchers have turned their attention to the biological immune system in search of new approaches to bionic computation. Artificial immune systems (AIS) are now receiving more attention and have become a new research hotspot in biologically inspired computational intelligence, following genetic algorithms, neural networks, and evolutionary computation in the research of intelligent systems. Burnet proposed the clonal selection theory in 1958 [6]. The negative selection algorithm and the concept of computer immunity were proposed by Forrest in 1994 [7]. Artificial immune systems have many appealing features [8-9], such as diversity, dynamics, parallel management, self-organization, and self-adaptation, and have been widely used in fields [10-11] such as data mining, network security, pattern recognition, learning, and optimization. In this paper, we propose a new spam detection technique based on artificial immunity theory.
II. SPAM SURVEILLANCE MODEL BASED ON AIS
The aim of this paper is to establish an immune-based model for dynamic spam detection. The model is composed of three processes: the Process of Email Character Distilling, the Process of Email Surveillance, and the Process of Training. The Process of Email Character Distilling uses the vector space model and represents each received mail as discrete words. The Process of Training generates various immature detectors from the gene library to distinguish self from non-self. According to the immune principle, some of these new immature detectors are false detectors, and they are removed by the negative selection process, which matches them against the training mails. If the match strength between an immature detector and one of the training mails exceeds the pre-defined threshold, the new immature detector is considered a false detector. The Process of Email Surveillance matches received mails against the mature detectors. If the match strength between a received mail and one of the detectors exceeds the threshold, the mail is considered spam. The detailed training phases are as follows.
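The negative selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bit-string encoding of mails, and the position-wise agreement used as match strength are all assumptions made for illustration.

```python
import random


def match_strength(detector, mail_bits):
    """Fraction of positions where the two bit strings agree (assumed affinity measure)."""
    same = sum(1 for d, m in zip(detector, mail_bits) if d == m)
    return same / len(detector)


def negative_selection(training_mails, n_detectors, length, threshold, rng, max_tries=10000):
    """Generate random immature detectors and keep those that never exceed
    the threshold against any training mail (the others are false detectors)."""
    mature = []
    tries = 0
    while len(mature) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if all(match_strength(candidate, mail) <= threshold for mail in training_mails):
            mature.append(candidate)
    return mature
```

A call such as `negative_selection(mails, 100, 16, 0.6, random.Random(0))` would return at most 100 surviving detectors; the `max_tries` guard is a hypothetical safeguard against thresholds that no candidate can satisfy.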
2009 International Conference on Networks Security, Wireless Communications and Trusted Computing

A. Self and Non-self
A biological immune system produces antibodies to resist pathogens through B cells distributed all over the human body, and T cells regulate the antibody concentration. An immune system can distinguish between self and non-self to detect potentially dangerous elements. These non-self elements include antibodies and viruses. In a spam immune system, we distinguish legitimate messages from spam. We consider the text of the email, including the headers and the body, as the antigen of a spam message. In the model, we define antigens (Ag) to be the features of the email service and the email information, given by:
$$Ag = \{\, ag \mid ag \in D \,\}, \quad D = \{0,1\}^{l}$$
Antigens are binary strings extracted from the email information received in the network environment. An antigen consists of gene segments of the email, including the sender, sending organization, email service provider, receiving organization, recipient fields, etc.
The structure of an antibody is the same as that of an antigen. For spam detection, the nonself set (Nonself) represents abnormal information from a malicious email service, while the self set (Self) represents normal email service.
The set Ag contains two subsets [12], $Self \subseteq Ag$ and $Nonself \subseteq Ag$, such that
$$Self \cup Nonself = Ag, \quad Self \cap Nonself = \Phi \tag{1}$$
For convenience in using the fields of an antigen x, the subscript operator "." is used to extract a specified field of x, where x.fieldname is the value of the field fieldname of x.
In the model, all the detectors form a detector set called SD:
$$SD = \{\langle d, age, count\rangle \mid d \in D,\ age \in N,\ count \in N\} \tag{2}$$
where d is the antibody gene that is used to match an antigen, age is the age of detector d, count (affinity) is the number of antigens matched by antibody d, and N is the set of natural numbers. SD contains two subsets, the mature detectors and the memory detectors, denoted by the sets M and T respectively. A mature detector is tolerant to self but has not yet been activated by antigens. A memory detector evolves from a mature one that matches enough antigens in its lifecycle. Therefore,
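$$SD = M \cup T, \quad M \cap T = \Phi$$
$$M = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y\rangle \notin Match \wedge x.count < \beta) \,\} \tag{3}$$
$$T = \{\, x \mid x \in SD,\ \forall y \in Self\ (\langle x.d, y\rangle \notin Match \wedge x.count \geq \beta) \,\} \tag{4}$$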
where β (> 0) represents the activation threshold. Match is a match relation defined by
$$Match = \{\, \langle x, y\rangle \mid x, y \in D,\ f_{match}(x, y) = 1 \,\} \tag{5}$$
Here, β is the affinity threshold for activating detectors. The affinity function $f_{match}(x, y)$ may be based on Hamming distance, Manhattan distance, Euclidean distance, r-continuous matching, etc. In this model, we use the r-continuous (r-contiguous bits) matching algorithm to compute the affinity of mature detectors.
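As an illustration of the r-continuous matching rule, the following sketch returns 1 when two equal-length strings agree in at least r contiguous positions and 0 otherwise. The function name and the string encoding are assumptions made for this example; the paper does not give an implementation.

```python
def r_contiguous_match(x: str, y: str, r: int) -> int:
    """Return 1 if x and y agree in at least r contiguous positions, else 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return 1
    return 0
```

For example, "1011001110" and "0011001011" agree in positions 1 through 6 (counting from 0), a run of six, so they match for r = 6 but not for r = 7.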
B. The Dynamic Model of Self
As in the biological immune system, the self in the anti-spam immune system changes over time. Legitimate mail changes along with the user's environment and personal behavior, for example when the user's contact list grows, or when the user develops new interests, discusses new issues, or writes email in a new language.
In order to prevent an antibody from matching a self, a newly formed antibody must pass the self-tolerance test before matching an antigen. We use the following formulation to describe the evolution of the self set against which new antibodies are tested:
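$$Self(t) = Self(0) = \{x_1, x_2, \ldots, x_n\}, \quad t = 0 \tag{6}$$
$$Self(t+\Delta t_1) = Self(t), \quad t \geq 1 \wedge \Delta t_1 \bmod \delta_1 \neq 0 \tag{7}$$
$$Self(t+\Delta t_2) = Self(t) + Self_{new}(\Delta t_2) - \frac{\partial Self_{variation}}{\partial x} \cdot \Delta t_2, \quad t \geq 1 \wedge \Delta t_2 \bmod \delta_1 = 0 \tag{8}$$
$$Self_{variation} = \{\, x \mid x \text{ is the self antigen forbidden at time } t \,\}$$
$$Self_{new} = \{\, x \mid x \text{ is the self antigen permitted at time } t \,\}$$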
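The periodic update of the self set can be sketched in discrete form as follows. This is an illustration under assumptions: the function name is hypothetical, the step counter t stands in for the elapsed interval, and the continuous variation term is replaced by a plain set difference.

```python
def update_self(self_set, t, delta, permitted, forbidden):
    """Return the self set for the next step.

    Between update points (t not a multiple of delta) the set is unchanged;
    at an update point, newly permitted elements are added and forbidden
    elements are removed.
    """
    if t % delta != 0:
        return set(self_set)
    return (set(self_set) | set(permitted)) - set(forbidden)
```

With `delta = 5`, a call at t = 3 leaves the set unchanged, while a call at t = 5 applies both the additions and the removals.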
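C. The Dynamic Mature Detector Model
$$M(0) = 0, \quad t = 0$$
$$M(t+\Delta t) = M(t) + M_{new}(\Delta t) + M_{from\_other}(\Delta t) - M_{dead}(\Delta t), \quad \text{when } f_{match}(M(t), Ag(t)) \neq 1 \tag{12}$$
$$M_{clone}(t) = \frac{\partial M_{clone}}{\partial x_{clone}} \cdot \frac{\partial M_{active}}{\partial x_{active}} \cdot (\Delta t - 1), \quad \text{when } f_{match}(M(t), Ag(t)) = 1 \tag{13}$$
$$M(t+\Delta t) = M(t) + V \cdot \Delta t, \qquad M.count(t+\Delta t) = M.count(t) + 1 \tag{14}$$
$$M_{new}(\Delta t) = \frac{\partial M_{new}}{\partial x_{new}} \cdot \frac{\partial T_{active}}{\partial x_{active}} \tag{15}$$
$$M_{dead}(\Delta t) = \frac{\partial M_{death}}{\partial x_{death}} \cdot \Delta t, \quad \text{when } f_{match}(M(t-1), Self(t-1)) = 1 \tag{16}$$
$$M_{from\_other}(\Delta t) = \sum_{i=1}^{k} \frac{\partial M^{i}_{from\_other}}{\partial x^{i}_{other}} \tag{17}$$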
Equation (12) depicts the lifecycle of the mature detector, simulating the process by which mature detectors evolve into the next generation. All mature detectors have a fixed lifecycle (λ). If a mature detector matches enough antigens (≥β) in its lifecycle, it evolves into a memory detector. However, detectors are eliminated and replaced by newly generated mature detectors if they do not match enough antigens in their lifecycle. $M_{new}(t)$ is the set of newly generated mature SD. $M_{dead}(t)$ is the set of SD that have not matched enough antigens (≤β) in their lifecycle or that classified self antigens as nonself at time t. $M_{active}(t)$ is the set of the least recently used mature SD which degrade into memory SD and are given a new age T>0 and count β>1. When the same antigens arrive again, they are detected immediately by the memory SD. In the mature detector lifecycle, detectors that are inefficient at classifying antigens are killed through the process of clone selection. Therefore, the method can enhance detection efficiency when abnormal behaviors intrude into the email system again.
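The mature detector lifecycle described above can be sketched as a discrete update. This is a simplified illustration, not the paper's differential formulation: the `Detector` class, the function name, and the caller-supplied `matches` predicate are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    gene: str
    age: int = 0
    count: int = 0


def step_mature(mature, antigens, self_set, matches, beta, lifecycle):
    """Advance every mature detector by one step.

    Returns (still_mature, promoted_to_memory, dead).
    """
    still_mature, promoted, dead = [], [], []
    for det in mature:
        det.age += 1
        if any(matches(det.gene, s) for s in self_set):
            dead.append(det)  # matched self: fails tolerance
            continue
        det.count += sum(1 for ag in antigens if matches(det.gene, ag))
        if det.count >= beta:
            promoted.append(det)  # matched enough antigens: becomes memory
        elif det.age > lifecycle:
            dead.append(det)  # lifecycle expired without activation
        else:
            still_mature.append(det)
    return still_mature, promoted, dead
```

Here `matches` could be any affinity test returning a truthy value on a match, for instance an r-contiguous rule; the sketch leaves that choice to the caller.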
As Figure 1 shows, the system first randomly creates the immature detectors and then computes the affinity between the immature detectors and every element of the training examples. If the affinity of an immature detector is over the threshold, it becomes a mature detector and is added to the mature detector set. The system repeats this procedure until the mature detectors are created.
Figure 1 The Dynamic Mature Detector Model
D. The Dynamic Evolution of Antibody Density
The antibody density of the memory detectors expresses the quantity and categories of spam and malicious intrusions, reflecting the security level of the current system. There are two major changes in antibody density.
1) Increase: When a memory detector captures a particular antigen, we simulate human immune system functions to increase the antibody density, representing an increase in the quantity of spam and malicious intrusions. We use $V_\rho$ to denote the rate of increase of the antibody density; the density $\rho(t)$ of the antibodies $Mem_{SD}(t)$ at time t is then:
$$\rho(t) = \rho(t-1) + V_\rho \cdot \Delta t \tag{18}$$
$$V_\rho = V_\rho(x;\, A, u, h, \sigma), \qquad 0 < x < +\infty,\ u > 0,\ h > 0 \tag{19}$$
The more intensive the antigen invasion, the faster the antibody density increases. Conversely, if the memory detector matches relatively few invading antigens, the antibody density increases slowly. As each invading antigen (spam) damages the host or network to a different degree, we introduce the parameter u to reflect the degree of damage caused, which is determined by experiment. To avoid unlimited cloning of memory detectors, we set A as the upper limit on the growth of antibody density.
2) Decrease: If a memory detector fails to clone within a cycle time, we let the antibody density decay according to equation (20):
$$\rho(t) = \frac{1}{2}\,\rho(t-\tau), \quad t \geq \tau \tag{20}$$
Here τ is the half-life of the antibody density. When the antibody density falls to 0.05, we stop the attenuation, i.e. when $\rho(t) \leq \varepsilon_\tau = 0.05$. At this point, the alarm corresponding to the antibody is cleared.
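The two density changes can be sketched numerically as follows. The function names are hypothetical, and treating A as a simple cap on the density is an assumption made for this illustration; the decay applies one halving per elapsed half-life and stops at the 0.05 floor.

```python
def increase_density(rho_prev, v_rho, dt, cap):
    """Equation (18), with growth limited to cap (the role the text gives to A)."""
    return min(rho_prev + v_rho * dt, cap)


def decay_density(rho, elapsed, half_life, floor=0.05):
    """Halve rho once per elapsed half-life, stopping once it is at or below floor."""
    periods = int(elapsed // half_life)
    for _ in range(periods):
        if rho <= floor:
            break
        rho = 0.5 * rho
    return rho
```

Starting from a density of 0.8, three half-lives without cloning bring it down to 0.1; a density that has already reached the floor is left unchanged.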
E. The Antibody Variation
In order to prevent the algorithm from converging prematurely, we apply a variation (mutation) operation to the gene set $G_1 = \{g_1, g_2, \ldots, g_i, \ldots, g_n\}$ after the crossover process. Variation points are selected randomly and varied with variation probability $p_m$ to generate the new generation $G_{new} = \{g_1, g_2, \ldots, g_i', \ldots, g_n\}$. The variation points are selected according to the Poisson distribution
$$P_m\{X = k\} = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots \tag{21}$$
where $E(X) = D(X) = \lambda > 0$ is the expected number of variation points. $G_1$ then turns into the offspring $G_{new}$ through the variation process.
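A minimal sketch of the variation step is given below. It assumes genes are encoded as a bit string and that "varying" a point means flipping that bit; both are assumptions for illustration. The number of variation points is drawn from a Poisson distribution with Knuth's multiplication method, since the Python standard library has no Poisson sampler.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mutate(genes, lam, rng):
    """Flip the bits at k randomly chosen positions, k ~ Poisson(lam), capped at len(genes)."""
    k = min(poisson_sample(lam, rng), len(genes))
    points = rng.sample(range(len(genes)), k)
    child = list(genes)
    for i in points:
        child[i] = "1" if child[i] == "0" else "0"
    return "".join(child), sorted(points)
```

Calling `mutate("00000000", 2.0, random.Random(1))` returns the offspring string together with the sorted list of positions that were changed.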
F. The Process of Email Surveillance
Our model uses detector state conversion in the dynamic evolution of mature and memory detectors, and in erasing self-matching detectors. As Figure 2 shows, the undetected emails are first compared with the memory detectors. If an e-mail matches any element of the memory detector set, it is classified as spam and an alarm is sent to the user. Then, the remaining emails, which have passed the memory detectors, are compared with the mature detectors. A mature detector must have become stimulated to classify an e-mail as junk, and therefore it is assumed that the first stimulatory signal has already occurred. Feedback from the administrator is then interpreted to provide a co-stimulation signal. If the system receives affirmative co-stimulation within a fixed period, the matched email is classified as spam. Otherwise it is considered normal email and delivered to the user's client in the normal way. During the filtering phase, when a mature detector matches an e-mail, the count field of the mature detector is incremented. If the value of the count field exceeds the threshold, the detector is activated and becomes a memory detector. Meanwhile, if a memory detector cannot match any e-mail within a fixed period, it degenerates into a mature detector. When unsolicited emails and malicious intrusions increase, we simulate immune system functions to increase the antibody density; when they decrease, we simulate immune feedback functions to reduce the density of the corresponding antibody, restoring it to its normal level.
Figure 2 The Process of Email Surveillance
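The surveillance flow of Figure 2 can be sketched as a single classification routine. This is an illustration under assumptions: detectors are plain dictionaries with `gene` and `count` keys, and `matches` and `costimulated` are hypothetical callables standing in for the affinity test and the administrator feedback. The degeneration of idle memory detectors is omitted for brevity.

```python
def classify_email(mail, memory, mature, matches, costimulated, beta):
    """Classify one mail as 'spam' or 'normal', updating the detector sets in place."""
    # Memory detectors are consulted first and flag spam immediately.
    if any(matches(d["gene"], mail) for d in memory):
        return "spam"
    # Mature detectors that match are stimulated: their count is incremented.
    hits = [d for d in mature if matches(d["gene"], mail)]
    for d in hits:
        d["count"] += 1
        if d["count"] > beta:
            mature.remove(d)  # activated: promoted to memory detector
            memory.append(d)
    # A match by a mature detector needs affirmative co-stimulation.
    if hits and costimulated(mail):
        return "spam"
    return "normal"
```

A mail matched only by mature detectors is delivered as normal unless the co-stimulation callback confirms it, which mirrors the two-signal requirement in the text.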
III. EXPERIMENTAL RESULTS AND ANALYSIS
Simulation experiments were carried out in our laboratory. The main aim of the experiments was to test the feasibility of applying AIS to spam detection, and we conducted a series of experiments. The coefficients for the model are shown in Table I.
TABLE I. COEFFICIENTS FOR THE MODEL
Parameter                                   Value
r-contiguous bits matching rule (r)         8
Size of initial self set n                  40
Initial scale of detectors                  100
Match threshold β                           40~60
Life cycle of the mature detectors          120 s
The first series of experiments was carried out to verify the feasibility of our anti-spam solution. We used the Ling-Spam dataset for analysis and experiments. It is a mixture of 481 spam messages and 2412 messages sent via the Linguist list, a moderated list about the profession and science of linguistics. Attachments, HTML tags, and duplicate spam messages received on the same day are not included. The whole experiment is divided into two phases: a training phase and an application phase. The main difference between the two phases is that the former does not use the filtering module and only generates detectors for the system. We partitioned the emails randomly into ten parts and chose one part at random as the training set; the remaining nine parts were used for testing, yielding nine groups of recall and precision ratios. The average of these nine groups is taken as the model's recall and precision. Figure 3 shows the average performance of the Bayesian method and our model in the comparison experiment. The experiments indicate that artificial immune-based detection can be a useful technique for spam filtering.
Figure 3 Results of Comparison Experiments
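The evaluation protocol described above (ten random parts, one used for training, the other nine for testing, scores averaged) can be sketched as follows. The function names and the `train`/`classify` callables are hypothetical placeholders; no results from the paper are reproduced here.

```python
import random


def precision_recall(predicted, actual):
    """Spam precision and recall from parallel lists of booleans (True = spam)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def split_into_parts(items, n_parts, rng):
    """Shuffle a copy of items and deal it into n_parts nearly equal parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def average_scores(parts, train, classify, rng):
    """Train on one randomly chosen part, then average precision and recall over the others."""
    train_idx = rng.randrange(len(parts))
    model = train(parts[train_idx])
    scores = []
    for i, part in enumerate(parts):
        if i == train_idx:
            continue
        predicted = [classify(model, text) for text, _ in part]
        actual = [label for _, label in part]
        scores.append(precision_recall(predicted, actual))
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Each item is assumed to be a `(text, is_spam)` pair; with ten parts, `average_scores` averages over nine test groups as in the experiment description.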
IV. CONCLUSIONS
Traditional spam filtering systems and techniques mostly adopt static measures and lack self-adaptation and the ability of parallel distributed processing; consequently, they are unable to adjust to the current network security situation. In this paper, we have presented a model of spam detection based on the theory of artificial immune systems, and we have illustrated the advantages of this model over traditional models. The concepts and formal definitions of immune cells are given, and we have quantitatively described the dynamic evolution of self, antigens, immune tolerance, and immune memory. Additionally, the model uses a distributed, multi-hierarchy framework to provide an effective solution to spam. Finally, the experimental results show that the proposed model is a good solution for anti-spam systems.
REFERENCES
[1] D. D'Ambra, "Killer spam: clawing at your door", Inf Prof 4, vol 28, no 4, 2007.
[2] Le Zhang, Jingbo Zhu, Tianshun Yao, "An Evaluation of Statistical Spam Filtering Techniques", ACM Transactions on Asian Language Information Processing (TALIP), vol 3, 2004, pp 243-269.
[3] M.N. Marsono, M. Watheq, and F. Gebali, "Binary LNS-based naïve Bayes inference engine for spam control: noise analysis and FPGA implementation", IET Comput Digit Tech, vol 56, no 2, 2008.
[4] A.T. Mizrak, S. Savage, "Detecting compromised routers via packet forwarding behavior", IEEE Network, vol 22, no 2, 2008, pp 34-39.
[5] O. Villa, F. Petrini, "Accelerating real-time string searching with multicore processors", Computer, vol 41, no 4, 2008, pp 42-44.
[6] F.M. Burnet, "The Clonal Selection Theory of Acquired Immunity", Cambridge University Press, 1959.
[7] T.B. Kepler, "Somatic hypermutation in B cells: An optimal control treatment", J Theoret Biol, 1993, pp 37-64.
[8] S. Forrest, A.S. Perelson, L. Allen, and R. Cherukuri, "Self-Nonself Discrimination in a Computer", Proceedings of IEEE Symposium on Research in Security and Privacy, Oakland, 1994.
[9] J. Kim, P. Bentley, "The Artificial Immune Model for Network Intrusion Detection", 7th European Congress on Intelligent Techniques and Soft Computing, 1999.
[10] G. Martin-Herran, O. Rubel, G. Zaccour, "Competing for consumer's attention", Automatica, vol 44, 2008, pp 361-370.
[11] M. Hanke, "On the effects of stock spam e-mails", Journal of Financial Markets, vol 11, 2008, pp 57-83.
[12] T. Li, "An Introduction to Computer Network Security", 1st edition, Publishing House of Electronics Industry, Beijing, 2004.