
Research and implement a preprocessor for network intrusion detection system NIDS


DOCUMENT INFORMATION

Basic information

Title: Research And Implement A Preprocessor For Network Intrusion Detection System NIDS
Author: Trần Huy Vũ
Supervisor: Dr. Trần Ngọc Thịnh
Institution: HCMC National University, University of Technology
Major: Computer Science
Document type: Graduate thesis
Year: 2011
City: Ho Chi Minh City
Pages: 69
Size: 1.61 MB


Structure

  • Chapter 1 Introduction
    • 1.1 Motivation
    • 1.2 Statement of problem
    • 1.3 Contribution
    • 1.4 Organization
  • Chapter 2 Background
    • 2.1 Transmission Control Protocol
      • 2.1.1 TCP/IP model
      • 2.1.2 Flow control mechanism
      • 2.1.3 Three-way handshaking
    • 2.2 TCP reassembly
      • 2.2.1 TCP packets re-ordering
      • 2.2.2 TCP flow reassembly
    • 2.3 Network intrusion detection system
      • 2.3.1 Snort IDS
    • 2.4 NetFPGA board
      • 2.4.1 Board specification
      • 2.4.2 Design with the reference design
  • Chapter 3 Related works
    • 3.1 The TCP processor [4]
    • 3.2 Out-of-order TCP stream scanning [3]
    • 3.3 Robust TCP reassembly for backbone traffic [1]
    • 3.4 TCP reassembly for Sachet IDS [5]
    • 3.5 Robust TCP stream reassembly of Sarang Dharmapurikar and Vern Paxson [2]
  • Chapter 4 Method of TCP reassembly
    • 4.1 Method for re-ordering TCP packets
      • 4.1.1 The data structure
      • 4.1.2 Operation on reassembly memory
    • 4.2 Method for reassembling TCP flow
  • Chapter 5 System implementation
    • 5.1 The Input Controller module
    • 5.2 The Packet Manager module
    • 5.3 The Flow Controller module
    • 5.4 The Reassembler module
    • 5.5 The Memory controller module
    • 5.6 The Output controller module
  • Chapter 6 Evaluation of the TCP Reassembly Engine
    • 6.1 Deployment model
    • 6.2 Experimental result
      • 6.2.1 Concurrent connections
      • 6.2.2 Memory utilization
      • 6.2.3 Throughput
      • 6.2.4 Capability of supporting NIDS
  • Chapter 7 Conclusion and future work

Content

Introduction

Motivation

In today's digital landscape, the network is essential for nearly all organizations, driving the rapid growth of e-commerce and enabling e-government initiatives in various countries. Many businesses and educational institutions rely on networks for effective communication with staff and customers. Given this critical importance, network security has become a pressing concern, as networks are often the most vulnerable aspect of an organization. The rise in cybercrime, which has escalated significantly in recent years, underscores the need for robust protection against such threats. According to Interpol, the global cost of cybercrime has reached a staggering $8 billion, highlighting the urgent need for enhanced network security measures.

Cybercrime, which grew sharply in 2007 and 2008, typically involves the use of computers and networks to carry out illegal activities such as Denial of Service (DoS) attacks. These attacks predominantly use the Transmission Control Protocol (TCP), which accounts for about 85% of network traffic, making it a common target for cybercriminals. To safeguard information systems against such intrusions, Network Intrusion Detection and Prevention Systems (NIDS/NIPS) have been developed. However, the nature of TCP allows packets to arrive in varying order, which complicates detection. While traditional NIDS/NIPS can identify intrusion patterns within single packets, they struggle to detect attacks whose patterns span multiple packets that arrive out of sequence, necessitating the re-ordering of TCP packets for effective detection.

NIDS/NIPS also face challenges from high network speeds exceeding 1 Gbps and from the potentially large number of concurrent connections, which can lead to memory exhaustion. Research indicates that full TCP reassembly can demand significant memory resources, up to 2 GB for each 1 Gbps link. Consequently, a TCP Reassembly Engine (TCPRE) must support high throughput, efficiently monitor a substantial number of concurrent connections, and optimize memory usage.

FPGA technology, first introduced in the late 1980s primarily for hardware design prototyping, has rapidly evolved to address various hardware challenges. Recent advancements have led to the release of high-speed FPGAs that meet the demanding requirements of modern hardware designs. Beyond their impressive speed, these FPGAs offer excellent reconfigurability, making them ideal for parallel processing and pipelining, while also providing a cost-effective solution for developers.

The growing advantages of FPGAs have led to their increasing application to network problems. Numerous studies have focused on handling out-of-sequence TCP packets on FPGAs, and these can be categorized into three distinct methods: (1) dropping out-of-sequence packets, (2) buffering out-of-sequence packets, and (3) out-of-sequence matching (for TCP stream scanning). All of these systems use an FPGA to implement the design.

Figure 1-1 Out-of-sequence packets passing an NIDS. The out-of-order fragments should be reassembled as "this is an attack pattern".

Statement of problem

The rapid expansion of networks, primarily based on the TCP/IP protocol suite, drives the development of advanced network applications. Modern FPGAs provide high-speed, high-throughput platforms for these applications, many of which require the reassembly of TCP packets for efficiency and robustness. One such application is the Network Intrusion Detection System (NIDS), which employs deep packet inspection techniques, including static matching and Perl Compatible Regular Expression (PCRE) matching. This thesis focuses on building a TCP preprocessor designed to support both matching schemes efficiently. The preprocessor analyzes packet protocols and forwards supported packets, such as UDP and TCP, to the NIDS application circuit. It also reorders TCP packets and reassembles TCP flows before passing them to the NIDS, while forwarding line traffic and remaining transparent to the user.

Figure 1-2 Deployment model of the preprocessor (with packet decoder and deep packet inspection / content processing blocks)

Contribution

This thesis presents a novel TCP reassembly method that combines the strengths of both out-of-sequence packet buffering and out-of-sequence matching. Its main contributions are:

• It proposes a new method of TCP reassembly that supports both TCP re-ordering and flow reassembly.

• It proposes a new data structure that manages the reassembly memory efficiently and supports buffering multi-hole connections.

• The Preprocessor is implemented on an FPGA platform. It supports hundreds of thousands of concurrent connections and tens of thousands of multi-hole connections.

Organization

The thesis is organized as follows:

• Chapter 2 states some background knowledge about networking, the TCP/IP protocol, and TCP reassembly; Network Intrusion Detection Systems are also introduced.

• Chapter 3 briefly describes related research around the world.

• Chapter 4 explains our method of TCP reassembly and the data structure used in this thesis.

• Chapter 5 presents the implementation of our technique on the targeted hardware platform.

• Chapter 6 presents our experimental results and evaluation.

• Chapter 7 gives our conclusion and future work.

Background

Transmission Control Protocol

In networking, data transmitted between machines passes through multiple protocol layers before reaching the physical medium. The two primary communication models are the OSI model and the TCP/IP model, the latter being the most widely adopted globally. The TCP/IP model consists of four layers, each containing various protocols, as depicted in Figure 2-1. When data passes from one layer to a lower layer, it is encapsulated with extra information, allowing it to be correctly unpacked at the destination machine.

The lowest level of the network architecture is the physical layer, which includes technologies such as optical fiber, twisted pair cables, and coaxial cables. Its primary function is to encode and transmit data from the upper layers onto the transmission media, as well as to receive and decode data coming from the transmission media.


Figure 2-1 TCP/IP model and packing data of TCP packet

The network access layer encompasses protocols such as Ethernet, Token Ring, and ATM, which transfer data between machines within the same network; among these, Ethernet is the most prevalent. Built on top of it, the internet layer, with protocols such as the Internet Protocol (IP), enables data transfer between devices across different networks. Before transmission, data is encapsulated with an IP header, which includes fields such as the Version (always 4 for IPv4), the IP Header Length (IHL), the Type Of Service (TOS), and the Total Length.

When an IP packet travels from a network with a large Maximum Transmission Unit (MTU) to one with a smaller MTU, it may need to be fragmented into smaller packets for successful transmission. The 3-bit Flags field controls fragmentation: the DF bit (value 2) indicates the packet must not be fragmented, while the MF bit (value 1) marks every fragment except the last. All fragments of a packet share the same Identification field value, and the Fragment Offset field gives the position of the fragment's first data byte relative to the original packet. The Time To Live (TTL) field limits the number of hops the packet may traverse. The Protocol field specifies the protocol of the next higher layer, and the Header Checksum is a 16-bit value calculated over the header only. Each device communicating via IP is assigned a 32-bit address (IPv4) or a 128-bit address (IPv6). The Options field carries optional information and may be omitted; it is padded with zeros to ensure 32-bit alignment.
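As an illustration (software only, not part of the thesis's hardware design), the fragmentation-related fields described above can be extracted from a raw IPv4 header with a few lines of Python; the layout follows the standard IPv4 header:

```python
import struct

def parse_ipv4_frag_info(header: bytes):
    """Extract fragmentation-related fields from a raw IPv4 header."""
    version_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum = \
        struct.unpack("!BBHHHBBH", header[:12])
    version = version_ihl >> 4
    ihl_bytes = (version_ihl & 0x0F) * 4          # IHL is in 32-bit words
    flags = flags_frag >> 13                      # 3-bit Flags field
    frag_offset = (flags_frag & 0x1FFF) * 8       # offset is in 8-byte units
    return {
        "version": version,
        "header_len": ihl_bytes,
        "identification": ident,
        "dont_fragment": bool(flags & 0b010),     # DF bit (value 2)
        "more_fragments": bool(flags & 0b001),    # MF bit (value 1)
        "fragment_offset": frag_offset,
        "ttl": ttl,
        "protocol": proto,
    }

# A minimal header: version 4, IHL 5, ident 0x1234, DF set, TTL 64, protocol 6 (TCP)
hdr = struct.pack("!BBHHHBBH", 0x45, 0, 20, 0x1234, 0b010 << 13, 64, 6, 0) + b"\x00" * 8
info = parse_ipv4_frag_info(hdr)
```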

(IPv4 header layout: Version | IHL | TOS | Total Length | ... | Source IP Address | Destination IP Address)

In the TCP/IP model, the TCP protocol operates at the transport layer and is built upon the connectionless Internet Protocol (IP). Unlike IP, TCP is connection-oriented, requiring both endpoints to establish a connection before data transmission. Each TCP connection is uniquely identified by the source and destination IP addresses in the IP header, along with the source and destination ports in the TCP header, which identify the respective applications handling the data. Data is segmented into smaller packets, if necessary, and each packet carries a 32-bit sequence number so that the original order can be recovered. The acknowledgment number indicates the next sequence number expected at the destination. The TCP header also includes a Data Offset field giving the position of the first data byte, a Flags field with commonly used flags such as SYN, ACK, and FIN, and a Window size field for flow control. The Urgent Pointer indicates the offset to the last byte of urgent data and applies only when the URG flag is set. Additionally, the Options field may contain optional information and is padded for 32-bit alignment.
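The same kind of software sketch can decode the TCP header fields named above (the fixed 20-byte part of the standard TCP header; this is illustrative, not the thesis's implementation):

```python
import struct

def parse_tcp_header(segment: bytes):
    """Decode the fixed 20-byte part of a TCP header."""
    src_port, dst_port, seq, ack, off_resv, flags, window, checksum, urg = \
        struct.unpack("!HHIIBBHHH", segment[:20])
    data_offset = (off_resv >> 4) * 4            # header length in bytes
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset": data_offset,
        "syn": bool(flags & 0x02),
        "ack_flag": bool(flags & 0x10),
        "fin": bool(flags & 0x01),
        "window": window,
    }

# A SYN packet from ephemeral port 12345 to port 80, sequence number 1000
syn_hdr = struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
fields = parse_tcp_header(syn_hdr)
```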

(TCP header layout: Offset | Resv | Flags | Window size | ...)

In the network infrastructure, packets can traverse multiple routers and may be dropped due to errors, necessitating retransmission from the source machine. To ensure all packets reach the destination efficiently, flow control mechanisms like Go-back-N ARQ and Selective Repeat ARQ are utilized. Go-back-N ARQ retransmits a packet and all subsequent packets if an acknowledgment is not received in a timely manner, which can lead to network congestion due to excessive retransmissions. In contrast, Selective Repeat ARQ allows the receiver to buffer packets and enables the transmitter to resend only the unacknowledged packets, optimizing bandwidth usage. However, this method requires sufficient buffering capacity: with the maximum window size, up to 1 GB of data may need to be buffered in each direction of a connection.

Figure 2-4 The flow control mechanism

To transmit data between machines using the TCP protocol, a connection must first be established through a process known as the three-way handshake. It begins when the client sends a TCP packet with the SYN flag set and a random Initial Sequence Number, ISN_A. The server responds with a packet that has both the SYN and ACK flags set, with the acknowledgment number equal to ISN_A + 1 and its own random sequence number, ISN_B. Finally, the client sends a packet with the ACK flag set, acknowledging ISN_B + 1. Once this handshake is complete, both the client and server can begin to send and receive data.
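The sequence and acknowledgment numbers of the three handshake packets can be written out explicitly; a minimal sketch (sequence numbers wrap modulo 2^32):

```python
def three_way_handshake(isn_a: int, isn_b: int):
    """Return the (flags, seq, ack) fields of the three handshake packets."""
    m = 2 ** 32                                          # sequence numbers wrap at 2^32
    syn     = ("SYN",     isn_a % m,       0)            # client: random ISN_A, no ack yet
    syn_ack = ("SYN+ACK", isn_b % m, (isn_a + 1) % m)    # server acknowledges ISN_A + 1
    ack     = ("ACK", (isn_a + 1) % m, (isn_b + 1) % m)  # client acknowledges ISN_B + 1
    return [syn, syn_ack, ack]

handshake = three_way_handshake(100, 500)
```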

TCP reassembly

Due to transmission errors and retransmission mechanisms, packets may arrive at a receiver out of their original order. However, the initial TCP packet of a connection, the SYN packet, is always in-sequence.

A TCP packet is in-sequence when its sequence number equals the next expected sequence number (the ACK number) following the last in-sequence packet. Otherwise, the packet is classified as out-of-sequence.
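This classification rule is easy to state in code; a minimal sketch (sequence numbers wrap modulo 2^32):

```python
def classify(expected_seq: int, pkt_seq: int, pkt_len: int):
    """Classify a packet against the connection's next expected sequence number.

    Returns (classification, new expected sequence number).
    """
    if pkt_seq == expected_seq:
        # in-sequence: the expected value advances past this packet's payload
        return "in-sequence", (pkt_seq + pkt_len) % 2 ** 32
    return "out-of-sequence", expected_seq
```

For example, a packet with sequence number 1000 and 100 bytes of payload is in-sequence when 1000 is expected, and the next expected number becomes 1100.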

Figure 2-5 Three-way handshaking in a TCP connection

In TCP communication, a run of consecutive missed packets is referred to as a TCP hole, while a run of consecutive received packets forms a TCP segment. A connection can experience multiple concurrent holes, with each hole consisting of one or more missed packets, as illustrated in Figure 2-6.

When a packet arrives at a hole in the data stream, it can fill the hole in five distinct ways. In the first situation, depicted in Figure 2-7a, an in-sequence packet narrows the hole but does not completely fill it. Figure 2-7b illustrates a situation where an in-sequence packet fully fills the hole, so the first out-of-sequence segment becomes in-sequence and ready for application processing. Figures 2-7c and 2-7d show out-of-sequence packets that prepend or append to a segment, respectively, which also narrows a hole. Finally, in Figure 2-7e, an out-of-sequence packet fills a hole exactly and is adjacent to both neighbouring segments, merging them into a single segment.
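The five situations can be modelled in software. The sketch below tracks the in-sequence frontier and a sorted list of out-of-sequence segments; it is a behavioural illustration of the cases above, not the thesis's hardware logic:

```python
def insert_packet(next_seq, segments, pkt_start, pkt_len):
    """Insert a packet and report which of the five situations occurred.

    next_seq -- in-sequence frontier (next expected sequence number)
    segments -- sorted, disjoint out-of-sequence segments as (start, end) pairs
    Returns (case, next_seq, segments) where case is 'a'..'e' or 'new'.
    """
    pkt_end = pkt_start + pkt_len
    segments = list(segments)
    if pkt_start == next_seq:                      # in-sequence packet
        next_seq = pkt_end
        if segments and segments[0][0] == next_seq:
            next_seq = segments[0][1]              # (b) hole filled: first segment
            return "b", next_seq, segments[1:]     #     becomes in-sequence
        return "a", next_seq, segments             # (a) hole narrowed, not filled
    for i, (s, e) in enumerate(segments):
        if pkt_end == s:                           # (c) prepend to a segment
            segments[i] = (pkt_start, e)
            return "c", next_seq, segments
        if pkt_start == e:
            nxt = segments[i + 1] if i + 1 < len(segments) else None
            if nxt and pkt_end == nxt[0]:          # (e) fills the hole exactly:
                segments[i:i + 2] = [(s, nxt[1])]  #     merge the two segments
                return "e", next_seq, segments
            segments[i] = (s, pkt_end)             # (d) append to a segment
            return "d", next_seq, segments
    segments.append((pkt_start, pkt_end))          # a new segment opens a new hole
    segments.sort()
    return "new", next_seq, segments
```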


Figure 2-6 Out-of-order TCP packets passing an NIDS

Figure 2-7 Five situations of TCP hole filling up


Reordering TCP packets alone is insufficient for a Network Intrusion Detection System (NIDS) to identify attack patterns that span multiple packets; the packets must also be reassembled into a logically continuous TCP flow. For applications built on a Finite State Machine (FSM), interleaved packets can be handled in the correct order by saving and restoring the FSM state at the beginning and end of a segment. However, many applications do not implement an FSM, so flow reassembly must instead be achieved by loading and storing the overlapping data at the packet edges.
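The overlap-retention idea can be illustrated with a simple software scanner: by carrying the last l-1 bytes of each packet into the next one (where l is the pattern length), no match spanning a packet boundary is lost. This is an illustrative sketch, not the thesis's engine:

```python
def scan_flow(packets, pattern):
    """Count matches of a pattern across consecutive in-order packets.

    Keeps the last len(pattern)-1 bytes of each packet and prepends them to
    the next, so a match straddling a packet boundary is still found.
    """
    overlap = b""
    hits = 0
    for payload in packets:
        window = overlap + payload
        hits += window.count(pattern)
        # retain only the tail that could still start a cross-boundary match
        keep = len(pattern) - 1
        overlap = window[-keep:] if keep > 0 else b""
    return hits
```

Note that a retained tail of l-1 bytes can never contain a whole pattern of length l, so no match is counted twice.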

Network intrusion detection system

An Intrusion Detection System (IDS) is a software or hardware solution designed to monitor a network for unauthorized intrusion activities. There are several types of IDS, including Network-based Intrusion Detection Systems (NIDS), Host-based Intrusion Detection Systems (HIDS), and Network-based Intrusion Prevention Systems (NIPS). A NIDS is typically deployed on a backbone network to oversee all traffic to and from the protected network.

Figure 2-8 NIDS and the deployment model

A HIDS is usually installed on any host of a network that needs to be protected [12, 13]. As the name indicates, a HIDS only monitors the host on which it is installed; it does not monitor the entire network.

Figure 2-9 HIDS and deployment model

A Network Intrusion Prevention System (NIPS) is more effective than a Network Intrusion Detection System (NIDS) because it not only monitors network traffic but also takes action by dropping or redirecting data identified as potential intrusions, while a NIDS merely observes traffic without making any modifications.

Intrusion Detection Systems (IDSs) have attracted significant interest from researchers. Initially, these systems ran as software on servers or personal computers, which was adequate given the lower network speeds of the time. However, as network infrastructure has rapidly advanced, the need for more capable IDS solutions has become increasingly critical.

Network speeds have now reached tens of Gigabits per second (Gbps), which is difficult for software to handle. As a solution, various hardware Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) are being developed to protect networks at line rate.

Snort, an open-source lightweight Intrusion Detection System (IDS) developed by Sourcefire, was first introduced in 1998 as a sniffer and has since evolved into a powerful tool with millions of downloads, making it the most widely deployed IDS available. Compatible with various platforms including Windows, Linux, Solaris, and MacOS, Snort is user-friendly and can be configured to function as either a Network Intrusion Detection System (NIDS) or a Network Intrusion Prevention System (NIPS). Its operation relies primarily on a predefined rule set, and its architecture, illustrated in Figure 2-10, comprises the following components:

• Sniffer: captures all packets from the network.

• Preprocessor: reassembles TCP flows and defragments fragmented IP packets.

• Detection engine: matches the header and the content of packets against the rule set.

• Alert/Logging: logs packets or generates alerts.

Snort operates based on a well-structured rule set that allows both experts and novices to create specific rules tailored to their needs. The rule syntax is straightforward, as the following example shows: `alert udp $EXTERNAL_NET any -> $HOME_NET 5060 (msg:"VOIP-SIP MultiTech INVITE field buffer overflow attempt"; content:"INVITE"; depth:6; nocase; pcre:"/^INVITE\s[^\s\r\n]{60}/smi"; reference:bugtraq,15711; reference:cve,2005-4050; classtype:attempted-user; sid:11981; rev:4;)`. This simplicity empowers users to effectively monitor and respond to network threats.

In Snort rules, the content and pcre fields are essential for detecting specific patterns within network payloads. The content keyword specifies a static pattern, such as "INVITE": Snort scans the entire payload for this text and generates an alert if it is found. The pcre keyword uses Perl-Compatible Regular Expressions to identify dynamic patterns within the payload, also triggering an alert upon a match.

NIDS project at Faculty of Computer Science and Engineering

While Snort is a capable Intrusion Detection System (IDS), its performance diminishes on high-speed networks, such as Gigabit lines. To address this challenge, various hardware solutions for Network Intrusion Detection Systems (NIDS) have been proposed. At the Faculty of Computer Science and Engineering at the University of Technology, a research project is underway to implement an NIDS on an FPGA platform. This system uses Snort rules to identify intrusion patterns and incorporates both static pattern matching and Perl Compatible Regular Expression (PCRE) matching. The detection engine consists of two key components: a packet classification module that categorizes packets based on their headers and the Snort rules, and a content inspection module that examines packet payloads against the Snort rules.

The packet classification module employs the Cuckoo hashing method to efficiently classify packets based on their 5-tuple records, which consist of the source IP, destination IP, source port, destination port, and protocol. This classification is applied only to UDP and TCP packets.
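As an illustration of the technique (not the project's actual implementation), a two-table cuckoo hash keyed by the 5-tuple can be sketched as follows; the hash functions and table sizes here are arbitrary choices:

```python
import hashlib

def _h(key: bytes, salt: bytes, size: int) -> int:
    """One of two hash functions, derived from SHA-256 (an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(salt + key).digest()[:4], "big") % size

class CuckooTable:
    """Toy two-table cuckoo hash keyed by the 5-tuple; illustrative only."""
    def __init__(self, size=1024, max_kicks=32):
        self.size, self.max_kicks = size, max_kicks
        self.tables = [[None] * size, [None] * size]

    def insert(self, key, value):
        entry = (key, value)
        for _ in range(self.max_kicks):
            for t in (0, 1):
                i = _h(entry[0], bytes([t]), self.size)
                if self.tables[t][i] is None:
                    self.tables[t][i] = entry
                    return True
                # evict the occupant and try to re-place it elsewhere
                self.tables[t][i], entry = entry, self.tables[t][i]
        return False            # table too loaded; a real design would rehash

    def lookup(self, key):
        for t in (0, 1):
            e = self.tables[t][_h(key, bytes([t]), self.size)]
            if e is not None and e[0] == key:
                return e[1]
        return None

# A 5-tuple: source IP, destination IP, source port, destination port, protocol
flow = b"\x0a\x00\x00\x01\x0a\x00\x00\x02" \
       + (1234).to_bytes(2, "big") + (80).to_bytes(2, "big") + b"\x06"
```

Every key sits at one of only two possible slots, so a lookup costs at most two memory accesses, which is what makes the scheme attractive for line-rate classification.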

The static pattern matching sub-engine scans the input data and uses Cuckoo hashing to match each byte position against the defined patterns. For longer patterns, the technique developed by Dr. Tran Ngoc Thinh is employed: long patterns are divided into smaller sub-patterns of 1 to 16 characters, which are then categorized to identify prefixes, infixes, or shorter patterns.

The PCRE matching sub-engine, as introduced in [9], utilizes a Non-deterministic Finite Automaton (NFA) approach that incorporates multiple Sub-RegEx Units for character matching, along with several Constraint Repetition Blocks (CRBs) to effectively handle repetition operators in PCRE.

The matching engines can identify intrusion patterns within individual packets; however, once a packet has been processed, the arrival of a packet from a different flow causes the matching status of the previous packet to be lost. Moreover, these engines lack an explicit finite state machine (FSM), so the system cannot save and restore an FSM state when a subsequent packet of the same flow arrives. Consequently, the preprocessor must use another technique for reassembling flows, namely retaining the overlapping data between consecutive packets.

NetFPGA board

NetFPGA, developed at Stanford University, is a cost-effective, reconfigurable hardware platform designed for high-speed networking. It features an FPGA chip and multiple Gigabit Ethernet interfaces, making it well suited to advanced networking applications.

The following is the specification of the board NetFPGA 1G:

• Xilinx FPGAs: a Spartan chip controls the PCI interface and programs the main Virtex chip

2.4.2 Design with the reference design

The NetFPGA package offers a variety of reference designs, including a reference router, a NIC, a DRAM controller, and an Ethernet MAC, which significantly streamline the design process; by starting from these reference designs, users can save considerable time when working with the NetFPGA board. The structure of a reference project is well defined:

• src: contains all Verilog code to be synthesized

• synth: contains the XCO files and the Makefile used to implement the design

• sw: contains all software programs

• include: contains all header files and macro definitions

Related works

The TCP processor [4]

If a source machine fails to receive an acknowledgment for a TCP packet, it automatically retransmits the packet. The TCP Processor exploits this retransmission mechanism to manage out-of-sequence packets: it drops all such packets, so the destination cannot acknowledge them, and the source retransmits the missed packet along with all subsequent out-of-sequence packets, effectively forcing the flow control into Go-Back-N behaviour. This method is simple and saves memory, but it significantly increases network traffic and hampers efficient acknowledgment at the destination terminal. The authors opted for this strategy based on statistics indicating that only about 5% of TCP packets are out-of-sequence; however, such connections are often long-lived, so the number of retransmitted packets can be large.

Out-of-order TCP stream scanning [3]

The authors of this research introduce a TCP stream scanning engine that eliminates the need for packet re-ordering, featuring two schemes: the Two-edge buffering scheme and the One-edge buffering scheme. In the Two-edge buffering scheme, the system retains l-1 data bytes at both the starting and ending edges of each TCP fragment, allowing the matching engine to scan either edge when a preceding or succeeding packet arrives. The One-edge buffering scheme stores l-1 data bytes only at the starting edge, together with the final state of the matching engine's Finite State Machine (FSM), enabling the engine to scan preceding packets and to restore the FSM for succeeding packets. However, this approach is limited to static scanning engines, since it requires prior knowledge of the maximum pattern length l, making it unsuitable for regular expressions (RE), where match lengths are unpredictable. Notably, this system also addresses packet normalization, preventing inconsistent retransmissions.
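The quantity each scheme retains per fragment can be made concrete. Assuming l is the length of the longest static pattern, the two-edge scheme keeps l-1 bytes at each edge of a fragment, which a few lines of Python can express:

```python
def edge_buffers(fragment: bytes, max_pat_len: int):
    """Return the (l-1)-byte edges retained for a TCP fragment under the
    two-edge buffering scheme, where l is the longest static pattern length."""
    keep = max_pat_len - 1
    start_edge = fragment[:keep]                      # scanned when the preceding
    end_edge = fragment[-keep:] if keep > 0 else b""  # or succeeding packet arrives
    return start_edge, end_edge
```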

Robust TCP reassembly for backbone traffic [1]

The authors propose a method that uses a dedicated, fixed-size buffer for each out-of-sequence connection. When an out-of-sequence TCP packet is received, its sequence number is used to calculate the offset from the start of the buffer at which the packet is stored. This approach is inefficient: small packets can waste significant memory, while large packets may exceed the buffer capacity. It also requires a large amount of memory and struggles to support many concurrent out-of-sequence connections. For instance, for a system managing 64,000 simultaneous connections with an out-of-sequence rate of 5%, the recommended buffer size of 64KB leads to a total memory requirement of approximately 205MB.
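The quoted figure can be checked with a line of arithmetic; whether "64KB" means 64,000 or 65,536 bytes is not stated, so the total is approximate (with 65,536-byte buffers it comes to exactly 200 MiB, roughly 205 MB in decimal terms):

```python
# Sanity check of the memory estimate quoted above
connections = 64_000
oos_fraction = 0.05          # 5% of connections are out-of-sequence
buffer_bytes = 64 * 1024     # one fixed 64 KB buffer per such connection

buffers = int(connections * oos_fraction)   # 3,200 dedicated buffers
total_bytes = buffers * buffer_bytes        # total reassembly memory required
```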

TCP reassembly for Sachet IDS [5]

The TCP reassembly method discussed in [5] uses a linked list to manage out-of-sequence packets, while storing packet control information in SRAM. However, this approach is not memory-efficient, because it reserves a memory block large enough for the largest Ethernet packet (1500 bytes) for every packet; each buffer is dedicated to a single packet, so multiple smaller packets cannot share the same space. As a result, the system supports only a limited number of simultaneous connections, requiring 1 MB of SRAM and 93.75 MB of DRAM to accommodate 64K packets.

Robust TCP stream reassembly of Sarang Dharmapurikar and Vern Paxson [2]

Sarang Dharmapurikar and Vern Paxson proposed a system that restricts a connection to a single hole, dropping any packet that would create an additional hole. The system uses a linked list to manage out-of-sequence packets, all of which are stored in DRAM. The memory is organized into blocks, and each packet may be stored across multiple blocks.

The system's design is optimized for memory utilization by supporting only single-hole connections, since over 95% of out-of-sequence connections fall into this category. However, more recent traces indicate that 2-hole connections account for 7.3% of out-of-sequence connections in CAIDA_10G and 17.8% in WA_1G. The design compresses the buffer information into the connection record, which makes it difficult to scale the system to multi-hole connections. This research also addresses packet normalization, contributing to the overall robustness of the system.

Method of TCP reassembly

Method for re-ordering TCP packets

Our system uses a linked list of memory blocks to buffer out-of-sequence packets, storing the payloads of packets belonging to the same segment together. This allows multiple small packets to share a single block, while a large packet can span several linked blocks. Since all packets within a segment are locally ordered, each payload can be stored consecutively after the previous packet's payload.

Figure 4-1 A segment of packet S, packet S+1, packet S+2 in a linked list

We use a segment array to manage segments; each element contains the information of one segment's linked list. An out-of-sequence connection is assigned a single segment array, while a normal connection does not need one. This structure exploits the characteristics of DRAM, since read and write operations occur in bursts. Our system is designed to support multiple concurrent holes within a single connection, with a practical array size of 4, allowing up to 4 concurrent holes. Research indicates that approximately 99% of out-of-sequence connections have fewer than 4 concurrent holes, so it is more efficient to drop packets when too many holes exist than to buffer them; capping the number of holes at the array size therefore works better than supporting an unlimited number of concurrent holes. Figure 4-2 shows how a 4-hole out-of-sequence connection is represented in the reassembly memory, with linked lists storing the segments sequentially.

Figure 4-2 Data structure of reassembly memory with a 4-hole out-of-sequence connection

Each segment array element contains the following information:

• Start seq. is the sequence number of the first byte of the segment.

• Next seq. is the sequence number expected after the segment, computed from the start sequence number and the segment length; for instance, if the start sequence number is 10 and the segment length is 20, the next expected sequence number is 30.

• Head is the address of the first byte of the segment in DRAM.

• Tail is the address of the last byte of the segment in DRAM.
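Using the four fields above, the first-stage decision for an incoming out-of-sequence packet can be sketched in software (a behavioural illustration only; the drop policy follows the description above, not the actual RTL):

```python
MAX_SEGMENTS = 4   # segment-array size: at most 4 concurrent holes per connection

class SegmentArray:
    """Per-connection segment array; a behavioural sketch, not the hardware."""
    def __init__(self):
        self.entries = []      # each entry: {"start", "next", "head", "tail"}

    def decide(self, pkt_seq, pkt_len):
        """Mimic the first-stage decision: where does this packet go?"""
        for e in self.entries:
            if pkt_seq + pkt_len == e["start"]:
                return "prepend"               # packet ends where a segment starts
            if pkt_seq == e["next"]:
                return "append"                # packet continues a segment
        if len(self.entries) < MAX_SEGMENTS:
            return "new-segment"               # open a new hole/segment pair
        return "drop"                          # a 5th hole would form: drop instead

sa = SegmentArray()
sa.entries = [{"start": 200, "next": 300, "head": 0, "tail": 0}]
```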

To optimize the retrieval of segment data, we organize the segment array so that all sequence fields are grouped together and all address fields are grouped together, rather than keeping the sequence and address fields of a segment side by side. With this layout, all sequence fields (Start seq.0 to Start seq.3 and Next seq.0 to Next seq.3) can be read in a single DRAM access, so the system can quickly compare the sequence number of an incoming packet with these fields to determine the segment for insertion or retrieval; the address fields are processed separately.


Figure 4-3 Structure of the segment array: each element is divided and stored at different places

The first pipeline stage quickly accesses the sequence fields for an incoming packet to determine the necessary action, which is then relayed to the second stage for execution; this decision requires only one DRAM access. The second stage independently handles the more time-consuming operations on the packet payload, so reassembly is pipelined and does not reduce the overall throughput of the system.

To manage the data in a segment efficiently, we divide the memory space into blocks; the structure of a block is simple, as shown in Figure 4-4.

• Data len is the number of valid data bytes stored in the block.

• Next ptr. is the address of the subsequent block in the linked list. Note that the address of the head of the linked list and the Next ptr. are not necessarily aligned to a block boundary; this can happen when a packet's payload is prepended before an existing linked list.

Each segment's payload is organized as a linked list of blocks. When a packet's payload exceeds the block size, it is divided and stored across two or more blocks, with the first block's Next pointer linking to the subsequent block.

Figure 4-4 Structure of a memory block. If the payload of a packet is smaller than the block size, the next packet in the same segment can fill in that block

The main function of our system is to reassemble TCP packets within the same connection, so the status of each connection must be managed. The system maintains a connection record for every connection to track its status. When a packet arrives, the system accesses the corresponding connection record and compares the packet's sequence number with the expected sequence number to identify out-of-sequence packets. Each connection record also contains the address of the connection's out-of-sequence buffer and the state of the application finite state machine (FSM). All of this information is packed into a 32-byte record, called the connection record, of which 24 bytes are used and 8 bytes are reserved for future needs.
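The 32-byte record can be sketched with Python's struct module. The field widths below are assumptions for illustration only (the thesis fixes the total of 32 bytes with 24 in use, but not the exact bit layout):

```python
import struct

# Hypothetical layout: 4-tuple (12B), expected sequence (4B), flags (1B),
# app FSM state (2B), buffer address (4B), 9 pad bytes reserved = 32B.
RECORD_FMT = '>IIHHIBHI9x'

def pack_record(src_ip, dst_ip, src_port, dst_port, seq, flags, fsm, buf_addr):
    """Pack a connection record into its fixed 32-byte form."""
    return struct.pack(RECORD_FMT, src_ip, dst_ip, src_port, dst_port,
                       seq, flags, fsm, buf_addr)
```
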

A connection is identified by its Source IP, Destination IP, Source Port, and Destination Port, which total 96 bits. This is too long to use directly as a memory address, so a hashing technique is employed: when a packet arrives, an 18-bit hash value is computed from these four fields and used as the address of the connection record in memory.
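A toy version of this hashing step, assuming a simple XOR-fold since the thesis does not specify the hash function used:

```python
HASH_BITS = 18  # the record address width described above

def connection_hash(src_ip, dst_ip, src_port, dst_port):
    """Reduce the 96-bit 4-tuple to an 18-bit record address (illustrative)."""
    h = src_ip ^ dst_ip ^ ((src_port << 16) | dst_port)
    # fold the 32-bit intermediate value down to 18 bits
    h ^= h >> HASH_BITS
    return h & ((1 << HASH_BITS) - 1)
```

Because different 4-tuples can hash to the same address, the record must still store the full 4-tuple so collisions can be detected, as explained below.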


Figure 4-5 Structure of a connection record

When a hash collision occurs on a connection record in DRAM, the four fields of the packet must be compared with the four fields stored in the connection record to verify that the record belongs to the current connection.

 Sequence field is the next expected sequence number of the in-sequence data

For example, packet #0 and packet #1 arrive, packet #2 does not arrive due to an error, and then packet #3 arrives. The sequence number of packet #0 is 1 and its length is 10. The sequence number of packet #1 is 11 and its length is 20. The sequence number of packet #3 is 51 and its length is 20. In this case, the sequence field stored in the connection record is 31, the next expected sequence number after packet #1.
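The example above can be checked with a small sketch; advance_expected_seq is a hypothetical helper, not part of the design:

```python
def advance_expected_seq(expected, packets):
    """Consume in-sequence packets; out-of-sequence ones are buffered.

    packets is a list of (sequence number, length) pairs in arrival order."""
    buffered = []
    for seq, length in packets:
        if seq == expected:
            expected += length
        else:
            buffered.append((seq, length))
    return expected, buffered
```

With packet #0 (seq 1, len 10), packet #1 (seq 11, len 20), and packet #3 (seq 51, len 20), the expected sequence number ends at 31 and packet #3 is buffered, matching the text.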

The Flags field includes the EST, SYN, and ACK flags. The EST flag indicates that the record is valid; the SYN flag indicates that the source machine has sent a SYN packet; and the ACK flag indicates that the destination machine has responded with an ACK packet during the 3-way handshake. When the initial SYN packet of a new connection arrives and the EST flag is 0, both the EST and SYN flags are set to 1 and the four fields of the packet are recorded. If an ACK packet arrives while the EST flag is 1, the ACK flag in the connection record is set to 1. If a SYN packet arrives and a hash collision occurs, the system checks whether the existing connection is still completing its 3-way handshake; if so, the new connection takes over the record; otherwise, the new packet is discarded.
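A sketch of this flag handling, with the connection record modeled as a dictionary; the function names are illustrative:

```python
def on_syn(rec, four_tuple):
    """Handle an initial SYN against a (possibly colliding) record."""
    if not rec['EST']:
        # empty record: claim it for the new connection
        rec.update(EST=1, SYN=1, ACK=0, tuple=four_tuple)
        return 'recorded'
    if rec['tuple'] != four_tuple and not rec['ACK']:
        # hash collision with a connection still mid-handshake:
        # the new connection takes over the record
        rec.update(EST=1, SYN=1, ACK=0, tuple=four_tuple)
        return 'taken over'
    return 'dropped'

def on_ack(rec, four_tuple):
    """Mark the 3-way handshake complete for a matching record."""
    if rec['EST'] and rec['tuple'] == four_tuple:
        rec['ACK'] = 1
```
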

The Buffer address field holds the location of the segment array used to store out-of-sequence packets in the reassembly memory. If a connection is in order, the Buffer address field is null. When the first out-of-sequence packet of the connection is received, the system allocates a new segment array and writes its location into the Buffer address field.

When the segment array of a connection does not store any data, it is released, and the Buffer address is set to null

The application finite state machine (FSM) field stores the state of the application circuit. During a context switch, when the application circuit moves from one connection to another, its FSM state is saved into the app FSM field of the current connection and restored from the app FSM field of the new connection.

When an out-of-sequence packet is received, the system handles it in one of four ways: allocating a new segment array and linked list, inserting the packet payload into an existing linked list, merging two existing linked lists, or releasing an existing linked list and segment array. These cases are depicted in Figures 4-6 to 4-9. However, if the out-of-sequence packet would cause the number of concurrent holes in the connection to exceed the segment array size, the packet is dropped and no change is made to the reassembly memory.

4.1.2.1 Allocating a new segment array and a new linked list

Figure 4-6 Creating a new segment


Method for reassembling TCP flow

Reassembling the TCP flow ensures that packets within the same flow are logically consecutive, as discussed in chapter 2. This is crucial because attack patterns can span multiple packets. When using an algorithm such as Aho-Corasick, which employs an explicit finite state machine (FSM), the system only needs to retain the FSM state at the end of each packet and can restore it when the next packet of the flow arrives. However, if the NIDS does not implement this method, another technique is required to handle patterns that cross packet boundaries.

Figure 4-9 Releasing a segment array


The NIDS implemented at HCMUT does not save and restore an explicit FSM state, so an additional technique is required for reassembling TCP flows. Our reassembly process combines the reordering of TCP packets with a modified Two-edge scheme. The original Two-edge scheme stores the first and last l-1 bytes of the payload, as illustrated in Figure 4-10.

In our system, all TCP packets are already ordered, so there is no need to store the first l-1 bytes of the payload shown in Figure 4-10 b); only the last l-1 bytes of the payload need to be stored. The process of buffering and loading edge data is illustrated in Figure 4-11.
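A software sketch of the modified scheme for ordered packets: only the trailing edge bytes are kept and prepended to the next payload before scanning. EdgeBuffer is a hypothetical name; the hardware stores the edge data in the reassembly memory rather than in a Python object:

```python
class EdgeBuffer:
    """Modified Two-edge scheme for ordered packets: keep only the last
    edge_len (= l-1) bytes of each payload and prepend them to the next
    in-sequence payload, so patterns crossing the boundary are matched."""

    def __init__(self, edge_len):
        self.edge_len = edge_len
        self.edge = b''  # last edge_len bytes of the previous payload

    def scan_window(self, payload):
        """Return the bytes the pattern matcher should scan for this payload."""
        window = self.edge + payload
        # keep the trailing edge_len bytes for the next packet
        self.edge = (self.edge + payload)[-self.edge_len:]
        return window
```

With an edge length of 4, scanning "abcdef" then "ghij" re-examines "cdef" together with the second payload, so a pattern such as "efgh" spanning the packet boundary is still found.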

Figure 4-10 The original Two-edge buffering scheme with l = 6: (a) the following packet (packet 1) arrives before packet 0; (b) the stored edge data is scanned together with packet 2

Figure 4-11 Modified Two-edge buffering scheme for ordered packets

System implementation

The Input Controller module


Figure 5-1 Block diagram of the Preprocessor

The Input Controller functions as an extractor: it receives packets from the network and forwards them to the Packet Manager, which buffers them so that other modules can access the reassembly buffer. Concurrently, it extracts information from the packet headers. It first checks whether the packet is an IP packet and sets the signals for the protocol, source IP address, and destination IP address. For TCP and UDP packets, it also sets the signals for the source and destination ports; for TCP packets, it additionally extracts the flags, sequence, and acknowledgement fields. The Input Controller also calculates the TCP checksum and notifies the Flow Controller of any checksum error.

On the NetFPGA board, the Input Controller receives packets from a pre-designed receive FIFO, which is connected to a Tri-mode Ethernet MAC module that manages the Ethernet port interface. The receive FIFO exposes the following signals:

 rx_ll_data: 8-bit data

 rx_ll_src_rdy_n: Active-low signal, this signal indicates the availability of rx_ll_data

 rx_ll_dst_rdy_n: Active-low signal, this signal informs the FIFO to output the next data

 rx_ll_sof_n: Active-low signal, it indicates the start-of-frame flag

 rx_ll_eof_n: Active-low signal, it indicates the end-of-frame flag

The Packet Manager module

The Packet Manager operates as a FIFO buffer between the Input and Output Controllers. Unlike a standard FIFO, it is built on a dual-port block RAM. Two pointers represent the head and tail of a circular FIFO: incoming data is written to the block RAM at the tail address, which is then incremented, and outgoing data is read from the head address, after which the head pointer is incremented. A separate FIFO tracks the length of each packet, so the Packet Manager can drop a packet quickly by reading its length from this FIFO and advancing the head pointer accordingly.
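The circular FIFO with a side length-FIFO can be sketched in software as follows; the byte-at-a-time copy stands in for the dual-port block RAM accesses:

```python
class PacketManager:
    """Circular FIFO in (simulated) dual-port RAM with a side FIFO of
    packet lengths, so a whole packet can be dropped by advancing the
    head pointer without reading the data out."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.size = size
        self.head = 0
        self.tail = 0
        self.lengths = []  # separate FIFO of packet lengths

    def write_packet(self, data):
        for b in data:
            self.ram[self.tail] = b
            self.tail = (self.tail + 1) % self.size
        self.lengths.append(len(data))

    def read_packet(self):
        n = self.lengths.pop(0)
        out = bytes(self.ram[(self.head + i) % self.size] for i in range(n))
        self.head = (self.head + n) % self.size
        return out

    def drop_packet(self):
        # fast drop: skip the whole packet by moving the head pointer
        self.head = (self.head + self.lengths.pop(0)) % self.size
```
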

The Flow Controller module


Figure 5-4 The Flow Controller

The Flow Controller manages the connection records and directs the Reassembler in handling the reassembly memory. It receives four header fields (source IP, destination IP, source port, and destination port) from the Input Controller. It calculates a hash value from these fields and uses it to access the connection record through the Memory Controller. Hash collisions are resolved as described in chapter 4. If the fields of the packet match those in the connection record, the Flow Controller compares the sequence numbers to determine the appropriate action: drop, forward, buffer, or send the packet to the application.

The Flow Controller compares the sequence number of each packet with the sequence number stored in the connection record; a packet whose sequence number matches is in-sequence. It then instructs the Reassembler to forward the packet to the application circuit. The Flow Controller also examines the segment array for a segment whose sequence number is consecutive with the next expected packet sequence. If such a consecutive segment exists, the Flow Controller passes extra parameters to the Reassembler specifying which segment should be read out.

When a packet's sequence number is lower than the sequence number in the connection record, the packet is a retransmission; the Flow Controller instructs the Reassembler to send it back to the network.

A packet is out-of-sequence when its sequence number is greater than the sequence number recorded in the connection record. If the buffer address in the connection record is null, the Flow Controller creates a new segment array. It then examines the array to determine whether the packet's sequence number is consecutive with any existing segment; if a match is found, the Flow Controller instructs the Reassembler to insert the packet payload into that segment.

In this case, the additional parameter specifies the segment for insertion. If no matching segment is found, the first null segment is selected to hold the packet payload. If there is no available space for the packet, it is discarded.
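The three sequence-number cases above can be summarized in a small decision function. Note that a real implementation must also handle 32-bit sequence-number wraparound, which this sketch ignores:

```python
def classify(pkt_seq, expected_seq):
    """Decide the action for a TCP payload given the connection's next
    expected sequence number (no 32-bit wraparound handling here)."""
    if pkt_seq == expected_seq:
        return 'in-sequence'      # forward to the application circuit
    if pkt_seq < expected_seq:
        return 'retransmission'   # send back to the network only
    return 'out-of-sequence'      # buffer in the reassembly memory
```
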

 If the checksum result indicates an error in the packet, it is dropped, because the destination will also drop it and the source machine will retransmit the packet

The Reassembler module

The Reassembler manages the reassembly memory, as detailed in chapter 4. It receives commands from the Flow Controller and instructs the Output Controller to either read or remove packets. When it receives a command to send a packet to the application circuit, additional parameters specify the action required.


When a command to buffer a packet is received, additional parameters specify the insertion position. The Reassembler employs three FIFOs: Rsm.FIFO buffers incoming packets, Dsm.FIFO manages packets read from memory, and App.FIFO handles packets from the Output Controller. Once a packet has been fully read from the Pkt FIFO and sent to the application, the Dsm.FIFO is checked for any available data. The Reassembler can also instruct the Output Controller to discard a packet if it would exceed the supported number of concurrent holes in the connection or fails the checksum verification.

The Memory Controller module

The main function of the Memory Controller is to arbitrate read and write requests from the Flow Controller and the Reassembler. It also manages the allocation and release of the memory blocks, each 1KB in size, where the actual packet payload is stored, and of the 32-byte segment arrays. The Memory Controller uses two FIFO queues: one for memory blocks and one for segment arrays. When a memory block or segment array is requested, the Memory Controller reads an address from the corresponding FIFO; when one is released, its address is written back into the corresponding FIFO for future allocation.
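A software model of the two free-address FIFOs; in hardware these hold addresses of 1KB blocks and 32B segment arrays, modeled here with deques:

```python
from collections import deque

class FreeListAllocator:
    """Two FIFOs of free addresses, one for memory blocks and one for
    segment arrays, mirroring the Memory Controller's allocation scheme."""

    def __init__(self, n_blocks, n_segments):
        self.blocks = deque(range(n_blocks))      # free block addresses
        self.segments = deque(range(n_segments))  # free segment-array addresses

    def alloc_block(self):
        return self.blocks.popleft() if self.blocks else None

    def free_block(self, addr):
        self.blocks.append(addr)

    def alloc_segment(self):
        return self.segments.popleft() if self.segments else None

    def free_segment(self, addr):
        self.segments.append(addr)
```
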

If we change to another platform with more memory, we can implement a FIFO in DRAM and use a FIFO in Block RAM as a cache of the DRAM FIFO.

To interface with the DDR2 SDRAM on the NetFPGA board, the Memory Controller uses a pre-designed module, the ddr_controller in the reference design of the NetFPGA package.

The Output Controller module

The Output Controller instructs the Packet Manager to either remove or read packets. Packets can be sent to both the network and the Reassembler, or to only one of them: in-sequence packets are sent to both destinations, out-of-sequence packets are sent only to the Reassembler, and retransmitted packets are forwarded only to the network. Packets that fail the checksum verification are removed from the buffer at the request of the Output Controller.


Evaluation of the TCP Reassembly Engine

Deployment model

Figure 6-1 describes the deployment model of the full system, in which the preprocessor and the NIDS are integrated on a NetFPGA board. This model requires a

Figure 6-1 Deployment model of the preprocessor and NIDS

A comprehensive system test will be conducted once all system components are complete. In the meantime, the Preprocessor can be tested individually using a simplified model, as illustrated in Figure 6-2.

Using the testing model depicted in Figure 6-2, the system correctly reorders out-of-sequence packets and stores and loads edge data. For batch testing, we simulate our design across various data patterns. As illustrated in Figure 6-3, the last 32 bytes of the incoming packet are stored as edge data.

Figure 6-4 shows that the next in-sequence packet is prefixed with 32 bytes, testing the edge storage. The Post-Route simulation yields correct results when our design is tested across various data patterns.

Figure 6-3 The incoming packet, rx_ll_data holds the data

Figure 6-2 Individual test for the Preprocessor

Figure 6-4 The payload of the output packet is inserted with the last 32 bytes of the previous packet

Experimental result

In this section, we compare our system with other existing systems based on experimental data. The system in [3] is excluded because it uses out-of-order matching and therefore does not need to reorder out-of-sequence packets. We assume that the network traffic follows the CAIDA_10G dataset from [1], which contains statistics recorded by the Cooperative Association for Internet Data Analysis in 2009.

Our system supports 96.9% of out-of-sequence connections, compared with 89.6% for the system in [2]. It can also scale to 4-hole connections, raising the fraction of supported out-of-sequence connections above 98.8%. Although methods such as the fixed-length buffer [1] and the simple linked list [5] can in theory handle more than 4 concurrent holes, such connections are rare, so it is more practical to drop packets that create more than 4 holes in a connection.

Table 6-1 Percentage of supported connection types of the TCP Reassembly Engine and other systems (columns: number of holes, percentage, simple linked list [5], our system)

Table 6-2 Memory utilization of the TCP Reassembly Engine and other systems for single-hole connections only (columns: number of out-of-sequence packets in a single-hole connection, simple linked list [5], our system)

Table 6-2 compares the reassembly memory utilization of our system with that of other systems, focusing on single-hole connections because of the limitation of the design in [2]. The first column is the number of out-of-sequence packets in these connections. Using the statistical data from [1], we calculate memory utilization based on a mean packet size of 441 bytes. In the fixed-length buffer method of [1], each out-of-sequence connection requires a fixed buffer; the experimental results indicate a minimum buffer size of 16KB for handling 64K packets.

Storing 64K such 16KB buffers requires 1024MB of memory. The 1-hole linked list method with a 2KB page size requires the amount of memory shown in Table 6-2; with a 1KB page size, the requirement decreases slightly to around 2MB, but this method can only handle connections with one hole. The simple linked list method requires a larger memory allocation to accommodate the maximum packet size of 1500B. As Table 6-2 shows, our system uses memory more efficiently than these methods.
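The arithmetic behind the 1024MB figure, as a quick check (64K fixed buffers of 16KB each):

```python
KB = 1024
buffers = 64 * 1024       # 64K fixed-length buffers
buffer_size = 16 * KB     # 16KB per buffer
total_mb = buffers * buffer_size // (1024 * 1024)
# total_mb == 1024, i.e. the 1024MB required by the fixed-buffer method
```
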

Figure 6-5 Maximum throughput of the system when percentage of out-of- sequence packets is 0%


Figure 6-6 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 0%

Figure 6-7 Maximum throughput of the system when percentage of out-of- sequence packets is 5%


Figure 6-8 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 5%

Figure 6-9 Maximum throughput of the system when percentage of out-of- sequence packets is 10%


Figure 6-10 Throughput of the system with clock rate = 125MHz when percentage of out-of-sequence packets is 10%

The throughput of the TCP Reassembly Engine depends on several factors: the average packet length, the edge data length, the percentage of out-of-sequence packets, and the number of bytes the application can process per clock cycle. Currently, the targeted NIDS developed at the University of Technology can only process 1 byte per clock cycle, which creates a bottleneck. The system is tested with a higher ratio of short packets, because the ratio of edge length to packet length is larger for short packets, which makes the throughput vary more. The packet payload lengths tested are 10, 20, 50, 100, 500, 1000, and 1400 bytes, and the edge lengths are 8, 16, and 32 bytes; we plan to support longer edge data when moving to a platform with more memory. In practice, the percentage of out-of-sequence packets is usually less than 10%. Throughput is measured with two methods.


Two counters are used to measure throughput: the first counts the clock ticks needed to receive all packets, and the second counts the clock ticks needed to transmit them. The throughput is computed as T = counter0 * 1000 / counter1.

Figures 6-5 to 6-10 show the maximum throughput and the onboard throughput of the system at out-of-sequence packet ratios of P = 0%, 5%, and 10%. The results indicate that longer packets allow the system to sustain higher throughput, and that shorter edge data also increases throughput. The maximum throughput is measured by sending packets to the full system and gradually increasing the speed until packet loss occurs.

Figure 6-11 Number of rules with different lengths

Figure 6-11 illustrates the distribution of rule lengths in the NIDS rule set used at HCMUT, which is currently a subset of the Snort rule set. More than 80% of the rules are shorter than 32 characters. Figures 6-5 to 6-10 also show that an edge length of 32 does not significantly reduce throughput; however, the final choice of edge length will be made after comprehensive system testing, once the full system is operational.

Conclusion and future work

This thesis introduces an efficient TCP reassembly technique that uses a multi-linked list method to manage out-of-sequence packets. Our system can handle approximately 256K concurrent connections and 46K out-of-sequence connections with only 64MB of DRAM. It supports connections with multiple concurrent holes, covering up to 99% of out-of-sequence connections. The system can also be scaled to support more concurrent holes simply by changing the size of the segment array.

In the future, we plan to integrate a DoS prevention feature into the TCP preprocessor, improve the Flow Controller's mechanism for handling hash collisions, and add a function to prevent inconsistent retransmissions, strengthening the system against attackers.

[1] Ruan Yuan, Yang Weibing, Chen Mingyu, Zhao Xiaofang, Fan Jianping – Robust TCP Reassembly with a Hardware-based Solution for Backbone Traffic, Fifth IEEE International Conference on Networking, Architecture and Storage, 2010

[2] Dharmapurikar, S., Paxson, V. – Robust TCP Reassembly in the Presence of Adversaries, Proceedings of the 14th USENIX Security Symposium, Volume 14, 2005, pp.65-80

[3] Sugawara Y., Inaba M., Hiraki, K – High-speed and Memory Efficient TCP Stream Scanning Using FPGA, Field Programmable Logic and Applications International Conference, 2005, pp.45-50

[4] David V Schuehler - Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates, Doctoral Dissertation, December

[5] Palak Agarwal – TCP Stream Reassembly and Web Based GUI for Sachet IDS, Master Thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India, 2007

[6] Hao Chen, Yu Chen, Douglas H.Summerville – A survey on the Application of FPGAs for Network Infrastructure Security, the IEEE Communications Surveys and Tutorials, 2010, pp.1-21

[7] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, D. Towsley – Measurement and classification of out-of-sequence packets in a tier-1 IP backbone, Technical Report 02-17, CS Dept, UMass, May 2002, pp.54-66

[8] T.N.Thinh and S.Kittitornkun Massively parallel Cuckoo pattern matching applied for NIDS/NIPS Fifth IEEE International Symposium on Electronic Design, Test and Application, 2010, pp.217-221

[9] T.T.Hieu, L.H.Long, V.T.Tai – Research, design and implement a Regular Expression processing system on FPGA for the Network Intrusion Detection System NIDS, Master Thesis

[10] http://www.interpol.int/Crime-areas/Cybercrime/Cybercrime

[11] http://en.wikipedia.org/wiki/Computer_crime

[12] http://www.real-time.com/linuxsolutions/nids.html

[13] http://ciscosecurity.org.ua/1587051672/ch10lev1sec2.html

[14] http://www.sourcefire.com/security-technologies/snort

1. … on Information and Communication Technology 2011 (SoICT 2011), Hanoi, Vietnam, October

2. Tran Huy Vu, Tran Ngoc Thinh, Nguyen Quoc Tuan, Nguyen Tran Huu Nguyen, "An Efficient TCP Reassembly Technique on FPGA", International Conference on Advanced Computing and Applications 2011 (ACOMP 2011), Ho Chi Minh city, Vietnam

3. Tran Huy Vu, Tran Ngoc Thinh, Nguyen Quoc Tuan, Nguyen Tran Huu Nguyen, "An Efficient TCP Reassembly Technique on FPGA", Vietnamese Academy of Science and Technology, Journal of Science and Technology, ISSN 0866-708x, Vol 49, No.4A, 2011, pp.

Name Tran Huy Vu Date and Place of Birth 09 Dec.1986

Dept of Computer Engineering, Faculty of Computer Science & Engineering, Ho Chi Minh City University of Technology (HCMUT)

Address: Block A3, 268 Ly Thuong Kiet Street, District 10, Hochiminh City, Vietnam

Tel 84-8-3856489- ext 5843 Fax E-mail: vutran@cse.hcmut.edu.vn

2-1 Academic Qualification (Repeat as necessary)

Name of Institution: Ho Chi Minh city University of Technology (HCMUT)

Number of years of experience in the field related to the project: 3 years

Field of specialization: chip design
