Master Thesis
Master program, Graduate school of Information Management
College of Business, Chinese Culture University
HTTP Botnet detection using decision tree
Advisor: Gu-Hsin Lai, Ph.D.
Graduate Student: Dang Thi Kim Ngan
2014
HTTP Botnet detection using decision tree
Student: Dang Thi Kim Ngan Advisor: Prof. Gu-Hsin Lai
Chinese Culture University
ABSTRACT
The botnet is the most dangerous and widespread threat among the diverse forms of malware-based Internet attack today. A botnet is a group of compromised computers connected via the Internet that are remotely accessed and controlled by hackers to carry out various network attacks. Malicious activities include DDoS attacks, spam, click fraud, identity theft and phishing. The most basic characteristic of a botnet is its use of command and control (C&C) channels, through which the bots can be updated and commanded. Botnets have become a common and effective tool used by botmasters in many cyber-attacks. Recently, malicious botnets have moved from typical IRC botnets to HTTP botnets. HTTP botnets are the latest generation of botnets, and they use the standard HTTP protocol to communicate with their bots. Because they use normal HTTP traffic, the bots are treated as normal users of the network, and current network security systems cannot detect them. To solve this problem, a method based on network behavior analysis was developed that improves, modifies and adds new features to the current methods of detecting HTTP-based botnets and their bots.
In this research we apply decision tree data mining algorithms to automate the detection of malicious characteristics in large amounts of data, which common heuristic and signature-based methods cannot handle. We design and develop feature filters and algorithms that analyze the collected network packets for evidence of suspicious HTTP-based botnet activity. In addition to detecting HTTP-based botnets, some of the HTTP header fields are used to assess the level of danger of the detected suspicious activities.
Key words: Botnet detection, HTTP botnet, Data mining, C & C channels, HTTP
ACKNOWLEDGMENT
I cannot precisely say when I began work on this thesis, although I am sure to mourn, a little, its completion. The submission of this dissertation marks the end of a somewhat long journey in my pursuit of a Master's degree at the Chinese Culture University, Taiwan. The journey would have been difficult if not for all the help, understanding and kindness of many people.
Without doubt, I would like to express my sincere gratitude to my supervisor, Dr. Gu-Hsin Lai, for his kindness in taking me under his charge to conduct this research. His patience and encouragement gave me the motivation to work on this research until its successful completion. His guidance and readiness to share his knowledge and experience have greatly contributed to the direction I should take and what I should do to achieve my goal. I cannot thank him enough.
While doing my studies and research for this thesis, I was never working alone. I have had the friendship, goodwill and support of my course-mates and friends, who never hesitated to offer their advice and moral support when it was needed. To my good friend, in particular, thank you for being there whenever I needed someone to go to for advice.
I would like to express my gratitude and love to my family for their care and understanding while I was doing my research. To the two special people in my life, my mother and my father: with your boundless love and your confidence in me, you have been my pillars of strength and determination in helping me to complete this thesis. If I have succeeded, then you have been a big part of my success, and I dedicate it to both of you together with my love.
Dang Thi Kim Ngan
17 June 2014
CONTENTS
ABSTRACT
ACKNOWLEDGMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE INTRODUCTION
1.1 Botnets: Current Largest Security Threat
1.2 Statement of Problem
1.3 Statement of Objectives
1.4 Thesis Scope
1.5 Thesis Organization
CHAPTER TWO RELATED WORK
2.1 Overview
2.2 Botnet Centralized Command and Control (C&C) Mechanism
2.3 IRC-based Botnets
2.4 HTTP-based Botnets
2.5 Existing Botnet Detection Methods
2.5.1 Honeypot and Honeynet
2.5.2 Detection by Signature
2.5.3 Detection by DNS Monitoring
2.5.4 Detection using Attack Behavior Analysis
2.6 Detection Based on Network Behavior Analysis
2.6.1 Why Choose Network Behaviour Analysis?
2.6.2 Existing Detection Methods Based on NBA
2.7 Evaluation of Existing NBA Methods for Botnet Detection
2.8 Conclusion
CHAPTER THREE PROPOSAL METHOD
3.2 Proposed flow chart
3.3 Data collection
3.3.1 HTTP traffic filter
3.3.2 GET and POST method filter
3.4 Feature extraction
3.5 Feature Selection
3.6 Decision tree algorithms
3.6.1 Information Gain
3.6.2 Gain ratio
CHAPTER FOUR EXPERIMENTAL RESULT
4.1 Introduction
4.1 The Dataset
4.2 Preprocessing
4.3 Transformation
4.3.1 Attribute Reduction
4.3.2 Discretization
4.4 Data mining
CHAPTER FIVE CONCLUSION AND FUTURE WORK
5.1 Introduction
5.2 Achievement of Objectives
5.3 Limitations and Future Work
5.3.1 Real Time Detection
5.3.2 Small data
5.3.3 Other Type of Bots and Botnets
5.3.4 Prevention Methods
5.3.5 Advanced the HTTP field header for Botnet detection
5.4 Conclusion
REFERENCES
LIST OF FIGURES
Figure 2-1 The rational relationship between these three factors
Figure 2-2 The structure of a Centralized Command and Control Botnet
Figure 2-3 The structure of a Decentralized Command and Control Botnet
Figure 2-4 The IRC-based Command and Control Botnets
Figure 2-5 The HTTP based C&C Botnet
Figure 3-2 The flowchart of HTTP traffic filter
Figure 3-3 The flowchart of GET and POST filter
Figure 4-1 The ranked attribute computed by gain ratio and information gain
Figure 4-2 The data table in WEKA
Figure 4-3 The ranges of content length attribute after discretization
Figure 4-4 The Weka Explorer
Figure 4-5 The decision tree obtained from J48
Figure 4-6 Result experiment 1 of dataset split 50% training
Figure 4-7 Result experiment 1 of dataset split 30% training
Figure 4-8 Result experiment 2 of dataset split 50% training
LIST OF TABLES
Table 3-1 List of attributes in this study
Table 4-1 List of attributes in normal dataset
Table 4-2 List of attributes in infected dataset
Table 4-2 List of attributes in infected dataset (continued)
Table 4-3 List of attributes in our dataset
Table 4-4 Describe method usage in HTTP properties
Table 4-5 Describe version usage in HTTP properties
Table 4-6 Version token from User-agent in HTTP traffic
Table 4-7 Platform token from User-agent in HTTP traffic
Table 4-8 The number of leaves and size of the tree before and after pruning
Table 4-9 Experimental result 1
Table 4-10 Compare result between experiment 1 and experiment 2