Scientific advisor: Dr. Trần Ngọc Thịnh
(Full name, academic rank, degree, and signature)
Reviewer 1: Dr. Đinh Đức Anh Vũ
(Full name, academic rank, degree, and signature)
Reviewer 2: Dr. Trần Mạnh Hà
(Full name, academic rank, degree, and signature)
The master's thesis was defended at Ho Chi Minh City University of Technology (Trường Đại học Bách Khoa), VNU-HCM, on January 6, 2012.
The thesis evaluation committee consisted of:
(Full name, academic rank, and degree of each member of the thesis defense committee)
1. Dr. Trần Văn Hoài
2. Dr. Trần Ngọc Thịnh
3. Dr. Đinh Đức Anh Vũ
4. Dr. Trần Mạnh Hà
5. Dr. Vũ Đức Lung
Confirmation of the committee chair and the head of the managing department after the thesis has been revised (if applicable).
MASTER'S THESIS ASSIGNMENT
Student name: Trần Huy Vũ Student ID: 09070923
Date of birth: 09/12/1896 Place of birth: Đồng Nai
Major: Computer Science Code: 604801
I THESIS TITLE: Research and implementation of a preprocessor for a network intrusion detection system (NIDS)
II TASKS AND CONTENT:
- Study the preprocessors of existing network intrusion detection systems (NIDS) worldwide
- Propose an improved solution for the preprocessor
- Choose a platform to implement the preprocessor and test the system
III DATE OF ASSIGNMENT: 04/07/2011
IV DATE OF COMPLETION: 06/01/2012
V ADVISOR: Dr. Trần Ngọc Thịnh
Ho Chi Minh City, December 5, 2011
ADVISOR
(Full name and signature) MANAGING DEPARTMENT (Full name and signature)
Dr. Trần Ngọc Thịnh Dr. Đinh Đức Anh Vũ
Note: The student must bind this assignment sheet as the first page of the thesis.
ACKNOWLEDGEMENT
Foremost, I would like to thank my advisor, Dr. Tran Ngoc Thinh, Head of the Department of Computer Engineering, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology. His encouragement and guidance were the motivation for me to carry out my thesis. I greatly appreciate his comments on this thesis as well as on my conference papers.
I would also like to thank the Computer Engineering graduate committee for their comments. They offered me many ideas to improve my work in the future. They also gave me a chance to prove myself capable.
Finally, I would like to thank my family members and friends for their support and encouragement.
Tran Huy Vu
THESIS SUMMARY
Nowadays, networks play a very important role in every field. This very importance makes the network one of the most vulnerable components. To address this weakness, Network Intrusion Detection Systems (NIDS) have been introduced. When these systems, especially hardware-based NIDSs, process TCP packets, they need a Preprocessor that reassembles TCP streams in order to strengthen the NIDS. In this thesis, we propose a hybrid method to implement a Preprocessor for an NIDS. The system in this thesis not only supports hundreds of thousands of concurrent connections, including out-of-sequence connections, but also uses memory more efficiently than previous systems. Experimental results show that the system supports up to 256 thousand concurrent connections and about 46 thousand out-of-sequence connections with only 64MB of DRAM. The system also helps NIDSs detect attack patterns that are spread over multiple packets.
ABSTRACT
Nowadays, networking plays a very important role in every field of life. The importance of the network also makes it a vulnerable part of many organizations. To overcome this weak point, Network Intrusion Detection Systems (NIDS) have been introduced. When these NIDSs, especially hardware NIDSs, process Transmission Control Protocol (TCP) packets, they need a preprocessor that reassembles the discrete TCP packets of a flow in order to strengthen the NIDS. In this thesis, we propose a hybrid method to implement a preprocessor for an NIDS. Our system not only supports thousands of TCP connections with multiple out-of-sequence data segments but also uses memory more efficiently than other systems. The experimental results show that our system can hold about 256K connections simultaneously and support up to 46K out-of-sequence connections with only 64MB of DRAM. The system also helps NIDSs detect attack patterns that span multiple packets.
COMMITMENT
I commit that this thesis is based on my own research.
I do not copy or use any illegal material in this thesis. All references in this thesis are cited from public resources or used with the permission of the authors or publishers.
I am directly responsible for any contradiction with the content of this thesis.
Tran Huy Vu
Table of Contents
ACKNOWLEDGEMENT
THESIS SUMMARY
ABSTRACT
COMMITMENT
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Motivation
1.2 Statement of problem
1.3 Contribution
1.4 Organization
Chapter 2 Background
2.1 Transmission Control Protocol
2.1.1 TCP/IP model
2.1.2 Flow control mechanism
2.1.3 Three way hand shaking
2.2 TCP reassembly
2.2.1 TCP packets re-ordering
2.2.2 TCP flow reassembly
2.3 Network intrusion detection system
2.3.1 Snort IDS
2.3.2 NIDS project at Faculty of Computer Science and Engineering HCMUT
2.4 NetFPGA board
2.4.1 Board specification
2.4.2 Design with the reference design
Chapter 3 Related works
3.1 The TCP processor [4]
3.2 Out-of-order TCP stream scanning [3]
3.3 Robust TCP reassembly for backbone traffic [1]
3.4 TCP reassembly for Sachet IDS [5]
3.5 Robust TCP stream reassembly of Sarang Dharmapurikar and Vern Paxson [2]
Chapter 4 Method of TCP reassembly
4.1 Method for reordering TCP packets
4.1.1 The data structure
4.1.2 Operation on reassembly memory
4.2 Method for reassembling TCP flow
Chapter 5 System implementation
5.1 The Input Controller module
5.2 The Packet Manager module
5.3 The Flow Controller module
5.4 The Reassembler module
5.5 The Memory controller module
5.6 The Output controller module
Chapter 6 Evaluation of the TCP Reassemble Engine
6.1 Deployment model
6.2 Experimental result
6.2.1 Concurrent connections
6.2.2 Memory utilization
6.2.3 Throughput
6.2.4 Capability of supporting NIDS
Chapter 7 Conclusion and future work
REFERENCE
LIST OF FIGURES
Figure 1-1 Out-of-sequence packets passing an NIDS
Figure 1-2 Deployment model of the Preprocessor and NIDS
Figure 2-1 TCP/IP model and packing data of TCP packet
Figure 2-2 IPv4 header format
Figure 2-3 TCP header format
Figure 2-4 The flow control mechanism
Figure 2-5 Three way hand shaking in TCP connection
Figure 2-6 Out-of-order TCP packets passing an NIDS
Figure 2-7 Five situations of TCP hole filling up
Figure 2-8 NIDS and the deployment model
Figure 2-9 HIDS and deployment model
Figure 2-10 Architecture of Snort
Figure 2-11 The NetFPGA board
Figure 4-1 A segment of packet S, packet S+1, packet S+2 in a linked list
Figure 4-2 Data structure of reassembly memory with a 4-hole out-of-sequence connection
Figure 4-3 Structure of the segment array. Each element is divided and stored at different places
Figure 4-4 Structure of a memory block. If a packet does not use a whole block, other packets in the same segment can fill in the block
Figure 4-5 Structure of a connection record
Figure 4-6 Creating a new segment
Figure 4-7 Inserting a packet to a segment
Figure 4-8 Merging two segments
Figure 4-9 Releasing of a segment array
Figure 4-10 The original one-edge buffering scheme, l = 6; a) the succeeding packet arrives, b) the preceding packet arrives
Figure 4-11 Modified Two-edge buffering scheme for ordered TCP stream, with l = 5
Figure 5-1 Block diagram of the Preprocessor
Figure 5-2 The Input controller
Figure 5-3 The Packet manager
Figure 5-4 The Flow Controller
Figure 5-5 The Reassembler
Figure 5-6 The Memory controller
Figure 5-7 The Output controller
Figure 6-1 Deployment model of the preprocessor and NIDS
Figure 6-2 Individual test for the Preprocessor
Figure 6-3 The incoming packet, rx_ll_data holds the data
Figure 6-4 The payload of the output packet is inserted with the last 32 bytes of the previous packet
Figure 6-5 Maximum throughput of the system when percentage of out-of-sequence packets is 0%
Figure 6-6 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 0%
Figure 6-7 Maximum throughput of the system when percentage of out-of-sequence packets is 5%
Figure 6-8 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 5%
Figure 6-9 Maximum throughput of the system when percentage of out-of-sequence packets is 10%
Figure 6-10 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 10%
Figure 6-11 Number of rules with different length
LIST OF TABLES
Table 6-1 Percentage of supported connection types of the TCP Reassembly Engine and other systems
Table 6-2 Memory utilization of the TCP Reassembly Engine and other systems for single-hole connections only
Chapter 1 Introduction
1.1 Motivation
Nowadays, the network is vital to almost every organization. E-commerce is one example; it is growing rapidly on top of the web infrastructure. E-government has been deployed in some countries and continues to expand, and many companies and schools use the network to communicate with their staff and customers. Because of this importance, the security of the network is a serious issue. The network can be the most vulnerable part of an organization, and it should be protected from many kinds of crime. Statistics show that information crime, widely known as cybercrime, has increased dramatically in recent years. Interpol reports that the worldwide cost of cybercrime, or computer crime [9], reached $8 billion in 2007 and 2008. This type of crime always uses a computer and a network [10] to carry out an illegal intrusion or simply to disable a server with a DoS (Denial of Service) attack. These intrusions are based on many types of protocols. However, the Transmission Control Protocol (TCP) is the most popular; the authors of [4] showed that 85% of network traffic is TCP. This explains why many cybercrimes use TCP packets to attack a server. To protect information systems from these intrusions, Network Intrusion Detection / Prevention Systems (NIDS/NIPS) have been developed. However, because of the nature of TCP, packets can reach a destination in the original sequence or in a different sequence. If an intrusion pattern is contained in a single packet, it can be detected by a traditional NIDS/NIPS; but if the pattern spans several packets and these packets do not arrive in the original sequence (out-of-sequence), it cannot be detected. Thus, out-of-sequence TCP packets should be re-ordered before they enter an NIDS/NIPS. Moreover, because the network speed can reach 1Gbps or more and there can be a large number of concurrent connections, keeping track of all of these connections can exhaust the memory. Study [4] shows that full TCP reassembly requires a large amount of memory, up to 2GB for each 1Gbps connection. Therefore, it is necessary to develop a TCP Reassembly Engine (TCPRE) which supports high throughput (more than 1Gbps), monitors a large number of concurrent connections and uses memory efficiently.
In addition, FPGAs are now a solution for many hardware-related problems. When FPGAs were first introduced in the late 1980s, their main function was prototyping hardware designs. Since then, FPGAs have developed rapidly. Recently, many high-speed FPGAs have been released that can fulfill the requirements of many hardware designs. Besides their high speed, new FPGAs have other excellent properties: they can be reconfigured easily and quickly, they are very well suited to parallel processing and pipelining, and they are a low-cost solution.
Because of these advantages, there is a worldwide trend to use FPGAs to solve network problems. Several research projects have reassembled out-of-sequence TCP packets using FPGAs [1, 2, 3, 4, 5]. These works fall into three types, which use three different methods: (1) dropping out-of-sequence packets, (2) buffering out-of-sequence packets and (3) out-of-sequence matching (for TCP stream scanning). All these systems use an FPGA to implement the design.
Figure 1-1 Out-of-sequence packets passing an NIDS. The payload fragments "This", "attack", "is an" and "pattern" reach the NIDS out of order; the stream should be reassembled as "this is an attack pattern".
1.2 Statement of problem
The rapid growth of networks, in which TCP/IP is the most widely used protocol suite, motivates the development of network applications. The release of many modern FPGAs offers high-speed, high-throughput solutions for these applications. Many of these applications require TCP packets to be reassembled beforehand so that the applications are more efficient and more robust. An NIDS is such an application. NIDS systems can deploy a "deep packet inspection" function, which includes both static matching and Perl Compatible Regular Expression (PCRE) matching. Therefore, this thesis aims to build a TCP preprocessor system with a special technique that supports both matching schemes efficiently. The main function of this TCP Preprocessor is to analyze the packet protocol and send supported packets, such as UDP, TCP…, to the application circuit (NIDS). It also re-orders TCP packets and reassembles TCP flows before passing them to the application circuit (NIDS), as shown in Figure 1-2. Besides, it manages the traffic on the line and keeps the preprocessor transparent to the user.
Figure 1-2 Deployment model of the Preprocessor and NIDS (blocks shown: Incoming packet, Preprocessor, NIDS, Packet decoder, Packet classification (Header processing), Deep Packet inspection (Content processing), Management, Alert, Outgoing packet)
1.3 Contribution
This thesis introduces a new method of TCP reassembly which takes advantage of both reassembly techniques mentioned above: buffering out-of-sequence packets and out-of-sequence matching. Its contributions are as follows:
- It proposes a new method of TCP reassembly that supports both TCP re-ordering and flow reassembly.
- It proposes a new data structure that manages the reassembly memory efficiently and supports buffering multi-hole connections.
- The Preprocessor is implemented on an FPGA platform. This Preprocessor supports hundreds of thousands of concurrent connections and tens of thousands of multi-hole connections.
1.4 Organization
The thesis is organized as follows.
- Chapter 2 gives background on networking, the TCP/IP protocol suite and TCP reassembly; Network Intrusion Detection Systems are also introduced.
- Chapter 3 briefly describes related research.
- Chapter 4 explains our method of TCP reassembly and the data structure used in this thesis.
- Chapter 5 presents the implementation of our technique on the targeted hardware platform.
- Chapter 6 presents our experimental results and evaluation.
- Chapter 7 gives our conclusion and future work.
Chapter 2 Background
2.1 Transmission Control Protocol
2.1.1 TCP/IP model
The lowest layer, the network interface layer, is closest to the physical media (optical fiber, twisted-pair cable, coaxial cable…). Its function is to encode the data from the internet layer and send it to the transmission media, or to receive data from the transmission media, decode it and pass it to the internet layer. Ethernet, Token Ring, ATM… belong to this layer; these protocols transmit data from one machine to other machines in the same network. Among these protocols, Ethernet is the most widely used.
Figure 2-1 TCP/IP model and packing data of TCP packet. Layers and example protocols: Application Layer {Telnet, FTP, HTTP, SMTP,…}; Transport Layer {TCP, UDP, IGMP, ICMP,…}; Internet Layer {IP, IPSEC,…}; Network interface Layer {Ethernet, Token Ring, Frame Relay, ATM,…}. Packing: Data; TCP Header + Data; IP Header + TCP Header + Data; Frame Header + IP Header + TCP Header + Data + Frame Footer.
Several protocols are built on top of the network interface layer. IP (Internet Protocol) is an example; it allows data to be transmitted from a machine in one network to a machine in another network. The data is packed with an IP header, shown in Figure 2-2, before it is sent to the network interface layer. Version is always 4 for IPv4 (IPv6 has a different header format). IHL, TOS and Total Length are the IP Header Length, the Type Of Service and the length of the IP packet, respectively. When an IP packet travels from a network with a large MTU (Maximum Transmission Unit) to a network with a smaller MTU, the size of the packet can exceed the MTU; in this case, the IP packet must be fragmented into smaller packets so that they can be transmitted in the network. The 3-bit Flags field carries the fragmentation control bits. A Flags value of 2 (the Don't Fragment bit) means the packet must not be fragmented. A Flags value of 1 (the More Fragments bit) means the packet is a fragment and more fragments follow; in this case, the Identification field identifies the original packet to which the fragment belongs, and the Fragment Offset field gives the offset, in units of 8 bytes, of the first data byte of the fragment from the first data byte of the original packet. TTL means Time To Live; it is the maximum number of stations (hops) that the packet can travel through. Protocol indicates the protocol of the next higher layer. Header Checksum is a 16-bit checksum calculated over the header only. Each machine that communicates via IP is addressed by a 32-bit IP address (IPv4) or a 128-bit IP address (IPv6). The Options field is optional and can be omitted; when present, it is padded with zeros to a 32-bit boundary.
Figure 2-2 IPv4 header format (fields: Version, IHL, TOS, Total Length, Identification, Flags, Fragment Offset, TTL, Protocol, Header Checksum, Source IP Address, Destination IP Address, Options, Padding)
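The header fields listed in Figure 2-2 can be decoded in a few lines. The following is a minimal sketch in Python, assuming the standard 20-byte fixed IPv4 header layout; the function name `parse_ipv4_header`, the dictionary keys and the sample bytes are illustrative choices and are not part of the thesis design.

```python
import struct
from ipaddress import IPv4Address

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_bytes": (ver_ihl & 0x0F) * 4,       # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,               # 3-bit Flags field
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": protocol,                    # 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }

# Hypothetical sample header: TCP packet from 192.168.0.1 to 192.168.0.199
sample = bytes.fromhex("450000281c46400040060000c0a80001c0a800c7")
header = parse_ipv4_header(sample)
print(header["src"], "->", header["dst"], "protocol", header["protocol"])
```

A hardware preprocessor would extract the same fields with fixed bit slices of the incoming data word; the sketch only makes the field boundaries explicit.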
In the TCP/IP model, TCP is located in the transport layer and is built on top of the Internet Protocol (IP), a connectionless protocol. TCP itself, however, is connection-oriented: it requires both endpoints to set up a connection before they communicate with each other over it. The header of a TCP packet is illustrated in Figure 2-3. Each connection is identified by the source and destination IP addresses in the IP header and the source and destination ports in the TCP header; the port fields identify the applications that process the data. The data is divided into smaller parts (if necessary) so that each of them can be packed in an IP packet. These IP packets are passed to the network interface layer and then to the physical media in the original order of the parts. The order of the packets is expressed by a 32-bit sequence number. The Acknowledgement number is usually the next sequence number the destination machine expects to receive. The Offset field gives the offset of the first data byte from the start of the header. There are many flags in the Flags field; however, three of them are used most frequently: SYN, ACK and FIN. The meaning of these flags is explained later. The Window size field is used for flow control and is explained in the next part as well. Urgent Pointer is an offset from the sequence number to the last urgent data byte; this field is meaningful only if the URG flag in the Flags field is set. The Options field is optional and can be omitted; when present, it is padded with zeros to a 32-bit boundary.
Figure 2-3 TCP header format (fields: Source Port, Destination Port, Sequence Number, Acknowledge Number, Offset, Resv, Flags, Window size, Checksum, Urgent Pointer, Options, Data)
2.1.2 Flow control mechanism
Depending on the network infrastructure, a packet can travel through many routers, and it can be dropped at any of them because of errors. The source machine has to detect this situation and retransmit any dropped packet. Flow control mechanisms were introduced to ensure that the destination machine receives all packets and that packets are retransmitted efficiently. There are two main mechanisms, Go-Back-N ARQ (Automatic Repeat Request) and Selective Repeat ARQ, as described in Figure 2-4. In Go-Back-N ARQ, if the transmitter does not receive the acknowledgement of a packet within a reasonable time after transmission, it automatically retransmits that packet and all subsequent packets. This method is simple, but the many retransmitted packets can overload the network. Selective Repeat requires the receiver to be able to buffer packets. The transmitter keeps track of a number of packets equal to the window size and retransmits only those packets that are not acknowledged correctly. This mechanism uses the bandwidth more efficiently; however, the number of bytes to be buffered can reach the maximum Window size. Moreover, the Window size can be left-shifted by up to 14 bits, so the maximum number of bytes to be buffered can reach 1GB for each direction of a connection.
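The 1GB figure follows directly from the field widths. As a small worked example, assuming the 16-bit Window size field and the maximum shift of 14 bits mentioned above (the constant and function names are illustrative):

```python
MAX_WINDOW_FIELD = 0xFFFF   # largest value of the 16-bit Window size field
MAX_WINDOW_SHIFT = 14       # largest left shift applied to the window

def max_buffer_bytes(window_field=MAX_WINDOW_FIELD, shift=MAX_WINDOW_SHIFT):
    """Largest number of bytes a receiver may have to buffer per direction."""
    return window_field << shift

print(max_buffer_bytes())            # 1073725440 bytes
print(max_buffer_bytes() / 2 ** 30)  # slightly below 1 GiB
```

This worst case is what makes full buffering of every connection impractical and motivates a memory-efficient reassembly scheme.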
Figure 2-4 the Flow control mechanism
2.1.3 Three way hand shaking
Before data can be transmitted from one machine to another using TCP, a connection must be established between the two machines. The procedure to establish a connection is called three way hand shaking and is described in Figure 2-5. The client first sends a TCP packet with the SYN flag set and the sequence number set to a random number called the Initial Sequence Number (ISN_A), and waits for the reply of the server. The server sends a TCP packet with both the SYN and ACK flags set, the acknowledgement number set to ISN_A + 1 and the sequence number set to another random number, ISN_B, and waits for the reply of the client. The client then sends a TCP packet with the ACK flag set and the acknowledgement number set to ISN_B + 1. From this point on, both client and server can send and receive data.
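The sequence and acknowledgement numbers exchanged during the handshake can be written down explicitly. The sketch below is illustrative only: the function name and the tuple layout (flags, sequence number, acknowledgement number) are assumptions made for this example, and the arithmetic is done modulo 2^32 because sequence numbers are 32 bits wide.

```python
def three_way_handshake(isn_a: int, isn_b: int):
    """Return the three packets of the handshake as (flags, seq, ack) tuples."""
    mod = 2 ** 32                                        # 32-bit sequence space
    syn = ("SYN", isn_a, None)                           # client -> server
    syn_ack = ("SYN+ACK", isn_b, (isn_a + 1) % mod)      # server -> client
    ack = ("ACK", (isn_a + 1) % mod, (isn_b + 1) % mod)  # client -> server
    return [syn, syn_ack, ack]

for packet in three_way_handshake(isn_a=1000, isn_b=5000):
    print(packet)
```

A reassembly engine that observes the SYN packet can therefore derive the first expected sequence number of each direction as the ISN plus one.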
Figure 2-5 Three way hand shaking in TCP connection
2.2 TCP reassembly
2.2.1 TCP packets re-ordering
Because of transmission errors and the retransmission mechanisms, packets can reach a receiver in an order different from the original order. The first TCP packet of a connection (the SYN packet) is always in the right order, so the SYN packet is in-sequence. A TCP packet is called in-sequence if its sequence number is the next expected sequence number (ACK number) of the last in-sequence packet; otherwise it is called out-of-sequence. We call one or more consecutive missing TCP packets a TCP hole, or simply a hole, and we call a run of consecutive TCP packets a TCP segment, or simply a segment, as illustrated in Figure 2-6. A connection can have one or more concurrent holes, and one hole can consist of one or more missing packets.
Once a hole has been created, there are five situations in which a packet fills in a hole. Figure 2-7a is the situation in which a packet is in-sequence but does not fill the whole hole; it only makes the hole narrower. In Figure 2-7b, the packet is in-sequence and fills the hole completely; therefore, the first out-of-sequence segment becomes in-sequence and should be processed by the application. An out-of-sequence packet can be prepended or appended to a segment, as in Figure 2-7c and Figure 2-7d respectively, and it only makes the hole narrower. In Figure 2-7e, a packet is out-of-sequence, fills a hole completely and is adjacent to both neighboring segments; in this case, the two segments and the packet become a single segment.
Figure 2-6 Out-of-order TCP packets passing an NIDS (labels: SYN, Packet0 to Packet7, incoming packet, in-sequence packets, hole, segment)
Figure 2-7 Five situations of TCP hole filling up (labels: in-sequence, out-of-sequence, hole, segment)
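The five situations can be told apart by comparing the boundaries of the arriving packet with the next expected sequence number and with the boundaries of the buffered segments. The following is a minimal sketch under simplifying assumptions that are not taken from the thesis: packets never overlap buffered data, sequence numbers do not wrap around, and segments are kept as a list of half-open (start, end) intervals. The function name `classify` and the returned labels are hypothetical.

```python
def classify(expected, segments, seq, length):
    """Classify packet [seq, seq + length) against the next expected sequence
    number and the buffered out-of-sequence segments [(start, end), ...]."""
    end = seq + length
    starts = {s for s, _ in segments}
    ends = {e for _, e in segments}
    if seq == expected:
        # in-sequence: either reaches the first segment (b) or not (a)
        return "b" if end in starts else "a"
    if seq in ends and end in starts:
        return "e"      # fills a hole and merges two segments
    if end in starts:
        return "c"      # prepended to a segment
    if seq in ends:
        return "d"      # appended to a segment
    return "new"        # not adjacent to anything: creates a new segment

segments = [(300, 400), (500, 600)]   # two buffered segments, expected = 100
print(classify(100, segments, 100, 100))   # case a
print(classify(100, segments, 400, 100))   # case e
```

The label "new" covers a packet that opens an additional hole, which is outside the five hole-filling situations of Figure 2-7.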
2.2.2 TCP flow reassembly
Re-ordering TCP packets alone does not help the NIDS engine detect attack patterns that span several packets. The packets must also be reassembled so that a TCP flow appears logically continuous to the NIDS engine. If the application uses an FSM, correctly ordered packets that are interleaved with packets of other flows can be joined by storing the FSM state of the application circuit at the end of a segment and restoring it at the beginning of the next one. However, many applications do not deploy an FSM; in this case, flow reassembly can be carried out by storing and reloading the overlapped data at the edges of packets.
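The second approach, keeping overlapped data at the packet edges, can be illustrated in software. The sketch below assumes that packets of each flow already arrive in order and that the longest pattern has a known length l, so that keeping the last l - 1 bytes of each packet is sufficient; the class name `OverlapReassembler` and the flow identifiers are hypothetical and do not describe the hardware design of this thesis.

```python
class OverlapReassembler:
    """Prepends the tail of the previous packet of a flow to the next packet."""

    def __init__(self, max_pattern_len: int):
        self.keep = max_pattern_len - 1   # bytes carried over between packets
        self.tails = {}                   # flow id -> tail of the last packet

    def feed(self, flow_id, payload: bytes) -> bytes:
        data = self.tails.get(flow_id, b"") + payload
        # data[-0:] would return everything, so guard the keep == 0 case
        self.tails[flow_id] = data[-self.keep:] if self.keep > 0 else b""
        return data                       # what the matching engine scans

pattern = b"attack"
reassembler = OverlapReassembler(len(pattern))
first = reassembler.feed(1, b"this is an att")    # flow 1, first packet
other = reassembler.feed(2, b"hello")             # interleaved packet of flow 2
second = reassembler.feed(1, b"ack pattern")      # flow 1, second packet
print(pattern in first, pattern in other, pattern in second)
```

Only l - 1 bytes per flow are stored, so the matching engine needs no saved state of its own; the cost is that a few bytes per packet are scanned twice.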
2.3 Network intrusion detection system
An Intrusion Detection System (IDS) is a software or hardware system which monitors a network and attempts to detect any illegal intrusion activities [10]. An IDS can be a Network-based Intrusion Detection System (NIDS), a Host-based Intrusion Detection System (HIDS) or a Network-based Intrusion Prevention System (NIPS). An NIDS is usually installed on a backbone network [12, 13] to monitor all traffic from and to the protected network, as illustrated in Figure 2-8.
Figure 2-8 NIDS and the deployment model
A HIDS is usually installed on a host of the network that needs to be protected [12, 13]. As the name indicates, a HIDS only monitors the host on which it is installed; it does not monitor the entire network.
Figure 2-9 HIDS and deployment model
An NIPS is more powerful than an NIDS. An NIDS only monitors the traffic on the network; it does not make changes to the traffic. An NIPS, on the contrary, not only monitors the traffic but also drops or redirects data once that data is judged to be intrusion activity.
Recently, IDSs have attracted many researchers. Initially, IDSs were usually software running on servers or personal computers. This was adequate because the network speed was not very high at that time. However, the network infrastructure has grown rapidly, and the network speed now reaches tens of gigabits per second (Gbps). Software cannot keep up with such high speeds; therefore, several hardware IDSs/IPSs have been introduced to protect a network at line rate. The following are some software and hardware IDSs currently being developed.
2.3.1 Snort IDS
Snort is an open-source, lightweight IDS developed by Sourcefire [11]. First introduced in 1998 as a sniffer, Snort has been developed continually and has gained more powerful functions. Snort is now the most widely deployed IDS, with millions of downloads. It runs on many platforms such as Windows, Linux, Solaris, MacOS… Snort is easy to use; it can be configured to operate as an NIDS or an NIPS. The operation of Snort is mainly based on a predefined rule set; the architecture of Snort is described in Figure 2-10.
- Sniffer: captures all packets from the network
- Preprocessor: reassembles TCP flows and defragments fragmented IP packets
- Detection engine: matches the header and the content of packets against the rule set
- Alert/Logging: logs packets or generates alerts
Figure 2-10 Architecture of Snort (blocks shown: Sniffer, Preprocessor, Detection)
As stated above, the operation of Snort is based on the rule set. A rule in the Snort rule set is well-formatted, so that not only experts but also ordinary users can write a specific rule for their purposes. The syntax of a rule is quite simple, as in the following example:
alert udp $EXTERNAL_NET any -> $HOME_NET 5060 (msg:"VOIP-SIP MultiTech INVITE field buffer overflow"; content:"INVITE"; pcre:"/^INVITE\s[^\s\r\n]{60}/smi"; reference:bugtraq,15711; reference:cve,2005-4050; classtype:attempted-user; sid:11981; rev:4;)
Though a Snort rule can have many fields, only the content and pcre fields are discussed here. The content keyword specifies a static pattern, in this case INVITE. It tells Snort to scan the entire payload for the text INVITE; if the text is found, Snort issues an alert message. The pcre keyword specifies a regular expression to be searched for in the payload. The regular expression is written in Perl-compatible syntax, hence the name Perl Compatible Regular Expression (PCRE)[]. If any text in the payload matches the pcre, Snort issues an alert as well.
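The two kinds of matching can be reproduced in software for illustration. The sketch below uses Python's `re` module as a stand-in for a PCRE engine; it assumes that the Snort modifiers s, m and i correspond to `re.S`, `re.M` and `re.I`, and the function name `inspect` is hypothetical. It reports each option separately, whereas a real Snort rule typically fires only when all of its options match.

```python
import re

CONTENT = b"INVITE"   # static pattern of the content option (case-sensitive)
PCRE = re.compile(rb"^INVITE\s[^\s\r\n]{60}", re.S | re.M | re.I)

def inspect(payload: bytes) -> list:
    """Return the names of the rule options that match the payload."""
    matched = []
    if CONTENT in payload:        # static pattern matching
        matched.append("content")
    if PCRE.search(payload):      # regular expression matching
        matched.append("pcre")
    return matched

print(inspect(b"INVITE " + b"A" * 60))                 # long INVITE field
print(inspect(b"INVITE sip:bob@example.com SIP/2.0"))  # ordinary request
```

The static pattern has a fixed, known length, while the length of a string matching the regular expression depends on the input; this difference matters for the buffering schemes discussed in Chapter 3.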
2.3.2 NIDS project at Faculty of Computer Science and Engineering HCMUT
Though Snort is a good IDS, it is not reliable when operating in a high-speed network, for example on Gigabit lines. Several hardware solutions for NIDS have been proposed to meet the requirements of high-speed networks. At the Faculty of Computer Science and Engineering, University of Technology, there is a research project to implement an NIDS on an FPGA platform. This NIDS also uses Snort rules to detect intrusion patterns. It deploys both static pattern matching and PCRE matching. The detection engine includes two main parts: the packet classification module, which classifies packets based on the header and the Snort rules, and the content inspection module, which matches the payload of packets against the Snort rules.
- The packet classification module uses the Cuckoo hashing method to classify packets. The 5-tuple {source IP, destination IP, source port, destination port, protocol} of the header is used to calculate the hash value. This module classifies UDP and TCP packets only.
- To carry out static pattern matching, the engine scans the input data and matches each byte against the patterns using Cuckoo hashing. For long patterns, it uses the method proposed by Dr Tran Ngoc Thinh [8]. In this method, long patterns are split into smaller sub-patterns with a length of 1 to 16 characters. These sub-patterns are marked to distinguish prefix, infix and short patterns.
- The PCRE matching sub-engine is introduced in [9]. It uses an NFA approach, with many Sub-RegEx Units for matching the characters in a PCRE and many CRBs (Constraint Repetition Blocks) for matching the repetition operators in a PCRE.
These matching engines can currently detect intrusion patterns only within an individual packet, because when an engine finishes matching a packet and a packet from another flow arrives, all matching state of the old packet is lost. Besides, neither of them has an explicit FSM, so the system cannot store and restore the FSM state when the next packet of the same flow arrives. Therefore, a preprocessor that supports this system needs to apply another technique to reassemble flows, for example storing the overlapped data between two consecutive packets.
2.4 NetFPGA board
2.4.1 Board specification
NetFPGA is a low-cost, reconfigurable hardware platform developed by Stanford University and optimized for high-speed networking. It is equipped with an FPGA chip and several Gigabit Ethernet interfaces. Figure 2-11 shows a NetFPGA board. The following is the specification of the NetFPGA 1G board:
- Xilinx FPGA: Virtex XC2VP50
- Xilinx FPGA: Spartan, which controls the PCI interface and programs the Virtex chip
- 4.5MB SRAM
- 64MB DDR2 SDRAM
- 4 Gigabit Ethernet ports
- Standard PCI interface
- JTAG debug interface support
Figure 2-11 The NetFPGA board
2.4.2 Design with the reference design
The NetFPGA package supplies the user with many reference designs, such as the reference router, reference NIC, DRAM controller, Ethernet MAC… When designing with the NetFPGA board, the user can save much time by using these reference designs. The structure of a reference project is as follows:
Project directory
- src: contains all Verilog code to be synthesized
- synth: contains the XCO files and the Makefile needed to implement the design
- sw: contains all software programs
- include: contains all header files and files that define macros
Chapter 3 Related works
The main requirements of a TCP Reassembly Engine are high throughput, a large number of concurrent connections and efficient memory utilization. However, it is very difficult to fulfill these requirements together; for example, if the system supports many concurrent connections, it has to use a lot of memory. Several research works on TCP reassembly try to solve these problems efficiently and to balance the requirements. These works either solve only part of the TCP reassembly problem or solve it in ways that are not very efficient. Some of them are described below.
3.1 The TCP processor [4]
As stated above, if the source machine does not receive the acknowledgement of a TCP packet, it retransmits the packet automatically. The TCP Processor in [4] uses this retransmission mechanism to re-order out-of-sequence packets: it drops all out-of-sequence packets. Because the destination machine does not receive these packets, it does not acknowledge them to the source machine; therefore, the source machine retransmits the missing packet and all out-of-sequence packets regardless of the flow control mechanism in use. In this way, the flow control mechanism is forced to behave like Go-Back-N. The advantages of this approach are simplicity and memory saving, but it loads the network heavily and prevents the destination terminal from acknowledging efficiently. The authors chose this approach because a statistical result in [7] shows that only about 5% of TCP packets are out-of-sequence. However, although the percentage of out-of-sequence connections is small, these connections are usually long ones, so the number of retransmitted packets can be very large.
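The dropping policy needs only one state variable per connection direction, the next expected sequence number. The sketch below illustrates the idea in software; it is not the implementation of [4], the class name `DroppingReorderer` is hypothetical, and retransmitted packets that overlap already accepted data are not handled.

```python
class DroppingReorderer:
    """Forwards in-sequence packets and drops everything else."""

    def __init__(self, isn: int):
        self.expected = (isn + 1) % 2 ** 32   # first data byte follows the SYN

    def accept(self, seq: int, length: int) -> bool:
        if seq != self.expected:
            return False                      # dropped; the sender must retransmit
        self.expected = (seq + length) % 2 ** 32
        return True                           # forwarded to the application

flow = DroppingReorderer(isn=999)
print(flow.accept(1000, 100))   # in-sequence: forwarded
print(flow.accept(1200, 100))   # packet 1100 is missing: dropped
print(flow.accept(1100, 100))   # missing packet arrives: forwarded
print(flow.accept(1200, 100))   # retransmission of the dropped packet: forwarded
```

The example also shows the cost of the approach: the packet starting at 1200 crosses the network twice although it was never lost.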
3.2 Out-of-order TCP stream scanning [3]
In this research, the authors design a TCP stream scanning engine that does not require packets to be re-ordered. The paper introduces two schemes, the Two-edge buffering scheme and the One-edge buffering scheme. In the Two-edge buffering scheme, the system stores l-1 data bytes at both the starting edge and the ending edge of each TCP fragment, where l is the length of the longest pattern. If the preceding packet of the fragment arrives, the matching engine scans the packet and then the starting edge of the fragment buffered in memory. If the succeeding packet of the fragment arrives, the scanning engine scans the ending edge of the fragment and then the packet. In the One-edge buffering scheme, the system stores l-1 data bytes at the starting edge of each TCP fragment, together with the final state of the Finite State Machine (FSM) of the matching engine. As in the Two-edge buffering scheme, the preceding packet of a fragment is scanned first, followed by the starting edge of the fragment. When the succeeding packet of the fragment arrives, however, the system only restores the FSM state and then scans the packet. This technique can only be applied to a static pattern scanning engine, because it requires the maximum pattern length to be known in advance. In practice, many applications use both static patterns and regular expressions (REs). The length of a string that matches an RE cannot be known in advance; therefore, this technique is not applicable to REs. The advantage of this method is that the system does not need to re-order packets. The system also solves the problem of packet normalization, which prevents inconsistent retransmission of packets.
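The Two-edge buffering scheme can be sketched as follows. This is only an illustration under simplifying assumptions, not the engine from [3]: it uses a naive substring search instead of a hardware matching engine, and the helper names (`find_matches`, `edges`, `scan_preceding`, `scan_succeeding`) are hypothetical. Only matches that touch the newly arrived packet are reported, since matches lying wholly inside the fragment were found when the fragment itself was scanned.

```python
def find_matches(data, patterns):
    """Return sorted (offset, pattern) pairs for every occurrence in data."""
    hits = []
    for pat in patterns:
        start = data.find(pat)
        while start != -1:
            hits.append((start, pat))
            start = data.find(pat, start + 1)
    return sorted(hits)


def edges(fragment, max_len):
    """Keep max_len - 1 bytes from each end of an out-of-order fragment."""
    k = max_len - 1
    if k == 0:
        return b"", b""
    return fragment[:k], fragment[-k:]


def scan_preceding(packet, start_edge, patterns):
    """Scan the packet, then the fragment's starting edge.

    Only matches that begin inside the packet are reported.
    """
    hits = find_matches(packet + start_edge, patterns)
    return [(off, pat) for off, pat in hits if off < len(packet)]


def scan_succeeding(end_edge, packet, patterns):
    """Scan the fragment's ending edge, then the packet.

    Only matches that reach into the packet are reported.
    """
    hits = find_matches(end_edge + packet, patterns)
    return [(off, pat) for off, pat in hits if off + len(pat) > len(end_edge)]


patterns = [b"attack"]
l = max(len(p) for p in patterns)      # longest pattern length
fragment = b"ackXYZatt"                # arrived out of order, buffered
start_edge, end_edge = edges(fragment, l)

before = scan_preceding(b"hello att", start_edge, patterns)
after = scan_succeeding(end_edge, b"ack!", patterns)
print(before, after)
```

Offsets are relative to the concatenated buffer that was scanned. Storing l-1 bytes is sufficient because a pattern of length at most l that crosses the boundary can extend at most l-1 bytes into the fragment.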
3.3 Robust TCP reassembly for backbone traffic [1]
In another approach [1], the authors use a buffer for each out-of-sequence connection. The size of the buffer is fixed, and every out-of-sequence connection has only one buffer. When an out-of-sequence TCP packet arrives, its sequence number is used to compute the offset from the start of the buffer at which the packet is stored. This method is not efficient: a large packet may not fit in the buffer, while a tiny packet can waste a lot of buffer memory. The method may require a lot of memory and does not support a large number of concurrent out-of-sequence connections. For example, if the system has to hold just 64 thousand connections simultaneously, the percentage of out-of-sequence connections is 5%, and the recommended buffer size is 64KB, the total memory needed is about 205MB.
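The offset computation and the memory estimate can be checked with a short sketch. The function names are hypothetical, and the choice of the next expected sequence number as the buffer base (`base_seq`) is an assumption; the text only states that the offset is derived from the sequence number. The estimate reads "64 thousand" as 64 × 1024 connections and "64KB" as 64 × 1024 bytes.

```python
KIB = 1024
MIB = 1024 * KIB


def buffer_offset(seq, base_seq, buffer_size):
    """Offset of a packet inside a connection's fixed-size buffer.

    base_seq is assumed to be the sequence number mapped to offset 0.
    Returns None if the packet starts beyond the end of the buffer.
    """
    offset = (seq - base_seq) % 2**32  # TCP sequence numbers wrap at 2^32
    return offset if offset < buffer_size else None


def total_buffer_memory(connections, out_of_seq_ratio, buffer_size):
    """Memory needed when every out-of-sequence connection owns one buffer."""
    return connections * out_of_seq_ratio * buffer_size


mem = total_buffer_memory(64 * 1024, 0.05, 64 * KIB)
print(f"{mem / MIB:.1f} MiB")
```

With these readings the estimate comes to roughly 204.8 MiB, which agrees with the figure of about 205MB quoted above.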
3.4 TCP reassembly for Sachet IDS [5]
Using a similar buffering approach, the TCP reassembly in [5] uses a linked list to store out-of-sequence packets, while the control information of each packet is stored in SRAM. Moreover, the reassembly memory is organized as a linked list of separate packets. This structure is not memory-efficient, because the system has to reserve for each packet a memory block large enough to hold the largest packet; in this case, the largest Ethernet packet is 1500 bytes. Each buffer is reserved for only one packet, so two or more small packets cannot share the same buffer. The system supports quite few simultaneous connections: it requires 1 MB of SRAM and 93.75 MB of DRAM to hold 64K packets, so the number of connections it can support may be even smaller.
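The memory figures above follow from the fixed slot size, as the following sketch shows. The per-packet control size is derived here, not stated in [5], under the assumption that the 1 MB of SRAM holds nothing but per-packet control information; `wasted_fraction` is a hypothetical helper that quantifies the waste for a small packet.

```python
MIB = 1024 * 1024
MAX_PACKET_BYTES = 1500   # largest Ethernet packet, as stated in the text
PACKET_SLOTS = 64 * 1024  # "64K packets"

# One full-size DRAM slot is reserved per packet.
dram_bytes = PACKET_SLOTS * MAX_PACKET_BYTES

# Assumption: the SRAM is used only for per-packet control information.
sram_bytes = 1 * MIB
control_bytes_per_packet = sram_bytes // PACKET_SLOTS


def wasted_fraction(payload_len, slot_size=MAX_PACKET_BYTES):
    """Fraction of a fixed-size slot left unused by a single packet."""
    return 1 - payload_len / slot_size


print(dram_bytes / MIB, control_bytes_per_packet, wasted_fraction(64))
```

The DRAM total reproduces the 93.75 MB quoted above, and a 64-byte packet leaves more than 95% of its slot unused, which is the inefficiency the text points out.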
3.5 Robust TCP stream reassembly of Sarang Dharmapurikar and Vern Paxson [2]
Sarang Dharmapurikar and Vern Paxson [2] limit each connection to a single hole: if an arriving packet would create another hole in the same connection, it is dropped. The system also uses a linked list to store out-of-sequence packets, all of which are stored in DRAM. In this system, the memory is divided into blocks; each packet can be stored in more than 2 blocks, 1 block can