Introduction to tcpdump
All material Copyright Novak, 2000, 2001 All rights reserved
Writing tcpdump Filters
Examination of Datagram Fields
Beginning Analysis
Real World Examples
Step by Step Analysis
References
Course Objectives
• Introduce the fundamentals of tcpdump
• Explain how to write tcpdump filters
• Examine fields in datagram for uses/misuses
• Analyze traffic by placing it in categories
• Demonstrate “real-world” analysis using tcpdump
• Let you participate in the analysis process
The objectives of this course are to introduce you to the fundamentals and benefits of using tcpdump as a tool to analyze your network traffic. We’ll start by introducing the concepts and output of tcpdump. One of the most important aspects of using tcpdump is being able to write tcpdump filters to look for specific traffic. Filter writing is fairly basic unless you want to examine fields in an IP datagram that don’t fall on byte boundaries, which is why an entire section is devoted to the art of writing filters.
Before we start to use tcpdump to analyze traffic, we’ll examine many of the fields found in the IP datagram. This is done to familiarize you with those fields in theory and with how they might be used in practice. We’ll study how and why fields might be changed and for what purpose. Next, we’ll start the basic analysis process by looking at tcpdump output and categorizing the kinds of traffic that you can see.
Then, we’ll take a look at some real-world examples of how tcpdump was used on monitored networks to discover what was happening. Next, the analysis process will be inspected step by step, often with missteps, to get you comfortable with it.
As a note, all tcpdump output shown in this course is activity that actually occurred. Source and destination hosts and IP addresses have been altered to obfuscate the true identities.
Overview
• Introduction to tcpdump
• Writing tcpdump filters
• Examination of Datagram Fields
• Beginning Analysis
• Real World Examples
• Step by Step Analysis
Introduction to tcpdump
Objectives
• Examine the strengths/weaknesses of tcpdump
• Organize collection/analysis process of tcpdump data via
• Interpretation of payload/hex output
Introduction
• Provides absolute fidelity
• Universally available and used
One of the most important parts of your security infrastructure is at least one tool or software package that captures an audit trail, or historical record, of the traffic that enters or leaves your network. There will be times when you are required to examine activity or connections that occurred in your network, and not just traffic that caused an alarm to sound. For instance, what if you suspect that the packet filtering router that acts as your perimeter defense was acting strangely after some major network changes were made? You would have to examine the traffic that was allowed into your network to help determine the problem. That is where tcpdump is invaluable.
Also, many tools, even firewall logs, will display suspicious traffic, yet only partial data is displayed. What if you get a log of rejected traffic, but it doesn’t display or keep TCP flags? You’ll never know what kind of connection was attempted. tcpdump allows the analyst to examine all the bits and fields that are collected. If nothing is “wrong” with the connection, examination at the bit level is unnecessary. Yet, if you suspect something “foul” with the traffic, you really need access to all the data down to the bit level.
tcpdump is also universally used and very portable. If you become familiar with this software or its Windows counterpart, windump, you can use it on just about any platform to assist you in analyzing traffic.
Weaknesses
• By default, doesn’t collect all the payload
• Does not scale well on large networks
tcpdump can collect a large volume of data on larger networks. This can be alleviated by not collecting all the data on the network, perhaps by omitting web traffic (port 80). Another way to deal with this is more disk space and faster processors to analyze all the collected data. But, at some point, the volume gets unwieldy.
tcpdump blindly collects packet after packet. It has no notion of state and cannot know that a given packet is anomalous because it does not follow the flow of a normal connection. And while tcpdump has some primitive arithmetic operations and ways to manipulate bits, it cannot do complex operations for analyzing data.
Finally, while it is an excellent way to collect data, tcpdump does not attempt to interpret what it sees. It does have some integrity checking operations for certain data to make sure that the data is not irregular, but the analyst has to have the training and savvy to interpret the data. For the sophisticated analyst, this is a bonus because she or he can make the correct call. Compare this with a tool that is prone to false positives and gives no way of verifying the alarmed event. But, for an analyst who has little training, tcpdump can be daunting since it does not interpret events.
tcpdump Versions
• tcpdump: Unix version; official current version 3.4
• ftp://ftp.ee.lbl.gov/ tcpdump tar.Z
• ftp://ftp.ee.lbl.gov/ libpcap tar.Z
• windump: Windows version
tcpdump is officially supported by the Lawrence Berkeley Labs. The current version is 3.4. There is also an effort to improve tcpdump and patch known problems with tcpdump and libpcap that appears to be a collective effort of anyone interested. The software for this effort can be found at www.tcpdump.org. Their current version is 3.5.
For the Unix versions of tcpdump, you need to download software known as libpcap, which implements a portable framework for capturing low-level network traffic. windump is a Windows variant of tcpdump. It also requires an application program interface to collect the traffic, known as winpcap.
The unofficial version of tcpdump has some nice enhancements. It decodes more of the protocols at the application layer and has a very nice capability of converting hexadecimal payload to character output.
07:00:48.036776 myhost.com > ping.net: icmp: echo reply (DF)
07:02:12.622460 log.net.3155 > syslog.com.514: udp 101
07:03:01.132414 send.net.32938 > mail.com.25: S 248631:248631(0) win 8760
tcpdump running on a host “sniffing” network packets
On this slide we see a host running tcpdump and gathering records from the network interface, along with the records that tcpdump has collected. tcpdump has a default standard output based on the protocol (TCP, UDP, ICMP) of the record that is displayed. While the formats for the various protocols are similar, each is distinct in what it displays.
By default, tcpdump will collect and print, in a standard format, all the traffic passing on the network. There are command line options for tcpdump that alter the default behavior: collecting only specified records, printing in a more verbose mode, printing in hexadecimal, or writing records as “raw packets” to a file instead of printing to standard output.
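As a rough illustration of how those options combine, the Python sketch below assembles a tcpdump argument list without running anything. The helper name `build_tcpdump_args`, the file name `capture.dmp`, and the example filter are assumptions made for this sketch; the flags themselves (`-n`, `-x`, `-s`, `-w`) are standard tcpdump options.

```python
def build_tcpdump_args(snaplen=None, hex_output=False, no_resolve=False,
                       write_file=None, filter_expr=None):
    """Assemble a tcpdump argument list (hypothetical helper; nothing is executed)."""
    args = ["tcpdump"]
    if no_resolve:
        args.append("-n")               # do not resolve addresses and ports to names
    if hex_output:
        args.append("-x")               # print each packet in hexadecimal
    if snaplen is not None:
        args += ["-s", str(snaplen)]    # number of bytes to capture per packet
    if write_file is not None:
        args += ["-w", write_file]      # write raw packets to a file
    if filter_expr:
        args.append(filter_expr)        # filter expression selecting the records
    return args


# Example: write raw packets to a file while omitting web traffic.
print(build_tcpdump_args(no_resolve=True, write_file="capture.dmp",
                         filter_expr="not port 80"))
```

The resulting list could be handed to a process launcher on a host where tcpdump is installed and the user has capture privileges.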
Sample tcpdump Output
Sample UDP Record
09:39:19.470000 nmap.edu.728 > dns.net.111: udp 56
Fields: timestamp, source.port > dest.port, protocol, bytes
Sample TCP Record
09:35:53.660000 nmap.edu.4 > dns.net.111: SF 136747297:136747297(0) win 1028
Fields after the ports: flags, beginning seq #:ending seq #(data bytes)
09:32:43.910000 nmap.edu.1171 > dns.net.139: S 2490962508:2490962508(0) win 512
09:32:43.910000 nmap.edu.1173 > dns.net.21: S 62697789:62697789(0) win 512
09:32:43.910000 nmap.edu.1193 > dns.net.22: S 1360146849:1360146849(0) win 512
09:32:43.920000 nmap.edu.1194 > dns.net.1114: S 372884098:372884098(0) win 512
Since we’ll review a lot of tcpdump output in this course, here’s a chance to get more comfortable with it. This is sample output from what appears to be an nmap scan, a popular and informative scan.
All records have a timestamp. The sensor host (Red Hat Linux 5.2) that captured these records has the precision to capture hundredths of seconds, although tcpdump allows places for up to millionths.
Different protocols have different representations in tcpdump output. One of the first challenges is to identify the protocol (TCP, UDP, ICMP). Most will be labeled, and while TCP isn’t explicitly labeled, it is the only one with flag bits and sequence and acknowledgment numbers, to name a few. Some protocols like DNS will be interpreted at the application layer. Because of this, you may not see the normal clues that you are used to. It may not be obvious whether a record is UDP or TCP, so it is important to look for clues as to which it is.
In general, tcpdump gives details in the form source host > destination host.
Note that the number of data bytes on a SYN packet, shown in parentheses, is normally 0: a SYN carries no payload because it is only part of establishing the three-way handshake.
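To make the TCP field layout concrete, here is a minimal Python sketch that splits a default-format TCP record into its parts. The regular expression is an assumption based only on the sample lines above (numeric ports, no acknowledgment number or options), and the function name `parse_tcp_record` is hypothetical; real tcpdump output has more variants than this handles.

```python
import re

# Pattern for the simple TCP line format in the samples above (an assumption).
TCP_RECORD = re.compile(
    r"^(?P<time>\S+) (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"(?P<flags>[SFPRU.]+) (?P<seq_start>\d+):(?P<seq_end>\d+)\((?P<nbytes>\d+)\) "
    r"win (?P<win>\d+)"
)


def parse_tcp_record(line):
    """Return the fields of a TCP record as a dict, or None if the line does not match."""
    match = TCP_RECORD.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("sport", "dport", "seq_start", "seq_end", "nbytes", "win"):
        record[key] = int(record[key])
    return record


line = "09:35:53.660000 nmap.edu.4 > dns.net.111: SF 136747297:136747297(0) win 1028"
print(parse_tcp_record(line))
```

A UDP record such as the first sample returns None here, which mirrors the point that only TCP lines carry flags and sequence numbers.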
• Collects tcpdump data in hourly files
• Analyzes each hour’s data for anomalies
• Formats anomalous data in html for browsing
• Comes with scripts to assist in examining data
Shadow (Secondary Heuristics for Defensive Online Warfare) is an intrusion detection system available to all for free. It can be found at http://www.nswc.navy.mil/ISSEC/CID. Shadow uses tcpdump as its underlying collection and processing tool, turning tcpdump from a packet collecting tool into an intrusion detection system. Shadow collects data from the network interface and stores it in hourly files in raw tcpdump compressed format. It analyzes each hour’s collected data after the fact, running a series of tcpdump filters against it to look for anomalies and for one-to-many source IP to destination IP traffic.
Shadow formats into html all the events of interest detected by the tcpdump filters and processed by some perl programs. The analyst can examine the output with a browser and further investigate activity using some additional perl scripts to look through an hour’s or day’s worth of data.
Using Shadow relieves the analyst from having to worry about the collection of tcpdump data; it automates this process. Further, it gives the analyst an automated way of examining activity. Still, the analyst has to interpret the output. As with any other intrusion detection system, it requires a savvy analyst to interpret the output accurately. However, since it is predicated upon tcpdump, the analyst has the ability to examine all the collected data down to the bit level.
• Performs traffic analysis
• Primary focus on datagram headers
• Pull-based architecture
• Analyst reviews hourly events of interest via web browser
• Requires a savvy analyst to interpret output
• Freeware available from www.nswc.navy.mil
Shadow is a Unix-based intrusion detection system. It has a sensor component and an analysis component. The sensor component collects network traffic, and the analysis component fetches that traffic and analyzes it. Both the sensor and the analysis host process data in an hourly timeframe.
The entire IP datagram is not captured because Shadow is mostly concerned with anomalies or events of interest found in the header portions of the datagram. The headers examined are the IP, TCP, UDP and ICMP headers. Much insight can be gained from examining these headers. By default, some payload or data is captured in the datagram. Shadow does not attempt to analyze this, but it is there in case you want to analyze it.
Each hour the analysis host analyzes the previous hour’s traffic for events of interest. These events of interest are formatted in html for viewing by an analyst using a browser. This is known as a pull-based approach since the analyst is required to examine the records; the analyst is not informed of, or pushed alerts about, anomalous events.
Shadow was developed by the Shadow team at the Naval Surface Warfare Center. It is still maintained and upgraded by this team. Shadow can be downloaded at no cost from http://www.nswc.navy.mil/ISSEC/CID. Click on the link for Current Shadow Software.
• Provides an audit trail of activity to/from network
• Provides an intimate view of activity
While not the only reason to install and use Shadow, a very compelling reason is the price tag. In many cases, but not this one, you get what you pay for. Shadow is an excellent no-cost traffic analysis tool.
Another benefit is that once you master Shadow, you can change it liberally at any time that you want. For instance, if you hear of a new exploit and can fashion a signature with a tcpdump filter, you can modify Shadow immediately. Compare this with some intrusion detection systems that do not offer the capability to change filters or signatures. You have to wait for the software company to update the filters when they get around to it, and the updates may not include signatures that you would like to see.
Also, since you get all the source code with Shadow, you can customize it for your whims and needs. This is highly unusual and allows you to make changes in line with your proficiency with the software.
Shadow uses tcpdump as its collection software. By default, you will collect most activity going into and out of your network. This can be very beneficial in providing an audit trail of activity in the network. If you ever find yourself in the midst of some kind of incident, this may be a very valuable attribute for an intrusion detection system to have.
Finally, some of the more GUI-oriented intrusion detection systems do not allow the user to examine the actual traffic at the IP datagram level. Shadow, by virtue of tcpdump, allows the user a very intimate view of the data collected. You maintain fidelity of data, and you can use all fields for interpretation and analysis. If the traffic you are analyzing is corrupted in some way, you want to be able to inspect the entire datagram.
Figure: Shadow architecture. Labels: hour 02 data on the sensor in the DMZ, secure copy to the analysis host, tcpdump filters, html output.
The Shadow architecture is a two-host system. Typically, the sensor resides on the DMZ, but it can be placed anywhere on the network. It collects the traffic from the network interface and stores the data in hourly files, which are in raw tcpdump compressed format.
Each hour, the analysis host securely copies the files from the sensor. Using perl scripts, it orchestrates the process of running the previous hour’s tcpdump data through a set of tcpdump filters that look for anomalous activity. Another filter and perl script examine the data for signs of scans: one source IP attempting connections to multiple destination IPs. All of this information is then formatted into html for viewing by the analyst.
• Traffic sent to broadcast address
• Traffic from reserved private networks
• Fragmentation
• Initial SYN connections
• Particular UDP ports
• Specific ICMP traffic
For TCP records, the initial SYN connections are examined. This doesn’t necessarily mean that the connection was successful; it just indicates that the connection was attempted. Also, certain ports or hosts may have to be excluded to avoid false alarms. For UDP records, you have to maintain a list of UDP destination ports that are of interest to you.
Shadow looks for signs of a one-to-many relationship of one source IP to multiple destination hosts, which is often indicative of a scan. Finally, Shadow can be tuned to look at more granular activity directed at the core infrastructure hosts in your network.
Sample Shadow Output
Shadow output is sorted tcpdump output. It is sorted by source IP and time to allow the analyst to group the activity by source IP. The above activity indicates a probe of port 3128 (the squid proxy server port) by host 1.2.3.4. A second host, displayed because it was extracted by one of the tcpdump filters, is 2.2.2.2, which appears to be probing mydns.com for destination port 139, a NetBIOS port. Typically DNS servers do not have the NetBIOS ports open.
The final set of activity appears to be a full-blown scan from source IP 5.5.5.5. It is scanning the hosts on the 172.16.1 subnet for port 1243, which is associated with a trojan known as SubSeven or BackDoorG. Having the output displayed in html makes it easier for the analyst to examine the hour’s traffic.
Examining tcpdump Output
11:55:52.069484 192.168.143.5 > 192.168.143.101: icmp: echo request
tcpdump displays any collected or processed output on standard output, typically the console or terminal. It also attempts to resolve IP numbers to host names and to translate port numbers to known services. For instance, if a port number is 23 and it is found in the file /etc/services as being associated with telnet, tcpdump will print the service and not the port number, unless the -n option has been used to disable resolution.
As you can see, this does not display all the captured fields in the datagram. Other fields are available for display, but different command line options have to be supplied in order to see them. In the above record, we have an ICMP echo request captured.
Figure: hexadecimal tcpdump output of the record, with byte offsets labeled. Underlined: IP header. The rest is the embedded protocol header and data (ICMP header).
Suppose you want to examine all the bits that are captured when tcpdump is run. There are many reasons for wanting this level of detail, especially when you believe that there has been some kind of deliberate crafting or alteration of the datagram.
In order to dump the bits, tcpdump has an option to display the output in hexadecimal. This is done by using the -x command line option. From the hexadecimal output, the bits can be determined. When output is displayed in hex, you will have to have some idea of what the fields are that you are examining. A most excellent resource to assist in this task is the “bible” of TCP/IP: TCP/IP Illustrated, Volume 1 by Richard Stevens. Not only are the protocol headers conveniently located directly inside the cover, but this book uses tcpdump output to assist in the understanding of TCP/IP.
One of the first things you will need to do upon looking at the hex output is to determine where the IP header is and how long it is. We’ll see how to do that in upcoming slides. Also, you want to examine the embedded protocol and determine where that header starts and stops. Finally, you may have some kind of interest in the embedded protocol payload.
Default snaplen
• Default number of bytes captured is 68
• Why do we see only 54 bytes of data from tcpdump?
In the above slide, we see that there appears to be a 20 byte header that is underlined. Each line of tcpdump output is 16 bytes. We see that 54 bytes of data have been captured. For the above output, the actual datagram is longer than 68 bytes (we’ll see how to compute the datagram length), but we only have 54 bytes of output. Any ideas why?
Answer: Frame Header
Frame header IP Header ICMP ICMP Data
Header
Ethernet =
14 bytes 20 bytes 8 bytes 26 bytes
Only 54 bytes of IP datagram data appear on the previous slide, even though the datagram is greater than 68 bytes, because the frame header is collected as well and counts against the 68 byte snaplen. In this case, we are running on a host that has an Ethernet connection. Ethernet has a 14 byte frame header which holds fields such as the source and destination MAC addresses and the kind of embedded datagram: IP, arp or rarp. This is why we only see 54 bytes of IP datagram; 14 bytes are used to record the Ethernet header.
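The arithmetic behind that answer can be sketched in a few lines of Python. The constant and function names are assumptions chosen for this illustration; the numbers come from the slide.

```python
DEFAULT_SNAPLEN = 68        # default number of bytes captured per packet
ETHERNET_HEADER_LEN = 14    # Ethernet frame header length in bytes


def datagram_bytes_captured(snaplen, frame_header_len=ETHERNET_HEADER_LEN):
    """Bytes of the IP datagram left after the frame header uses part of the snaplen."""
    return max(snaplen - frame_header_len, 0)


captured = datagram_bytes_captured(DEFAULT_SNAPLEN)
print(captured)             # 54

# The same 54 bytes split as on the slide: IP header + ICMP header + ICMP data.
print(20 + 8 + 26)          # 54
```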
Increasing the snaplen
• For Ethernet, maximum frame size (frame header
As a test case, let’s say we want to capture the entire datagram for each record we read or process on an Ethernet network. In this case, we need to increase the snaplen to the maximum size of the datagram plus the frame header. Ethernet has a Maximum Transmission Unit (MTU) of 1500. If you add 14 bytes for the frame header, the snaplen must be 1514 bytes.
Now, to check whether we’ve collected the entire datagram, we run tcpdump with a snaplen of 1514. If we dump the collected record in hexadecimal, we find we’ve collected more than the 54 bytes. The actual datagram length is found in bytes 2-3 of the IP header (counting starts at byte 0). We discover a hex 54 in this field. In decimal, that translates to 5*16^1 + 4*16^0 = 84. And we see that we’ve collected all 84 bytes.
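The following Python sketch shows one way to read that length field from raw header bytes. The sample header is made up for illustration (it is not the slide’s hex dump): it uses the addresses from the earlier echo request, a total length of 0x0054, and zeroed identification and checksum fields. The function name `ip_total_length` is likewise an assumption.

```python
ETHERNET_MTU = 1500
ETHERNET_HEADER_LEN = 14

# Snaplen needed to capture a full-size datagram plus the frame header.
print(ETHERNET_MTU + ETHERNET_HEADER_LEN)   # 1514


def ip_total_length(ip_header: bytes) -> int:
    """Read the 16-bit total length from bytes 2-3 of an IP header (big-endian)."""
    return int.from_bytes(ip_header[2:4], "big")


# Illustrative 20-byte IP header (hypothetical values, checksum left as zero).
sample_header = bytes.fromhex("4500 0054 0000 0000 4001 0000 c0a8 8f05 c0a8 8f65")
print(ip_total_length(sample_header))       # 84
```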
Converting Hex Output to a Decimal Value
• IP datagram length is 16 bits
• 16 bits = 4 hex characters
• Start at the right-most character
• Take each hex character and represent it as a power of 16
Starting with 16^0 at the right-most character, continue labeling the remaining hex characters of the field you are converting as increasing powers of 16. After all of the characters are labeled, multiply each hex character by its power of 16 and add the products.
In the above example, we are looking at the length field. We have 4 hex characters because the length is a 16-bit field. We really only need to label the two right-most characters because they are the only non-zero ones. After we do this, we find we have a 4 in the 16^0 position; this is really the one’s position, meaning we have 4*1 or 4. The next character, 5, is in the 16^1 position, so we multiply 5*16 for a product of 80. Therefore, the decimal conversion is 80 + 4 = 84.
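The labeling procedure translates directly into a short loop. This Python sketch follows the steps literally rather than calling the built-in `int(value, 16)`, purely to mirror the slide; the function name `hex_to_decimal` is an assumption.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_decimal(hex_chars: str) -> int:
    """Convert hex characters to decimal by summing digit * 16**position."""
    total = 0
    # Start at the right-most character, which is the 16**0 position.
    for power, char in enumerate(reversed(hex_chars)):
        digit = HEX_DIGITS.index(char.lower())
        total += digit * 16 ** power
    return total


print(hex_to_decimal("0054"))   # 5*16 + 4*1 = 84
```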
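Figure: frame header fields displayed with -e. Source MAC address (6 bytes), destination MAC address (6 bytes), type (2 bytes), length (calculated).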
There will be times when you will be interested in examining the frame header. One reason would be to identify the source MAC address to try to determine where the packet came from: a host or perhaps a router. The frame header can be displayed for Ethernet using the -e option.
You see the source and destination MAC addresses followed by the type of packet that follows the frame header. The types of traffic you are likely to see are IP, arp and rarp. These fields are all stored in the frame header. The final field is the length, in bytes, of the frame (not including the trailing 4 bytes of cyclic redundancy check, or CRC). In this case, it is the length of the datagram plus 14 bytes since this is Ethernet. This field is not stored in the header; it is calculated and displayed in decimal when the -e option is selected.
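A minimal Python sketch of pulling those fields out of a raw Ethernet frame follows. The function names, the sample MAC addresses, and the zero-filled payload are all made up for illustration. Note that on the wire the destination address comes before the source address, even though the display described above lists the source first.

```python
import struct

# EtherType values for the traffic types named in the text.
ETHERTYPES = {0x0800: "ip", 0x0806: "arp", 0x8035: "rarp"}


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode the 14-byte Ethernet header; the length is calculated, not stored."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "src": format_mac(src),
        "dst": format_mac(dst),
        "type": ETHERTYPES.get(ethertype, hex(ethertype)),
        "length": len(frame),   # datagram length + 14, CRC not included
    }


# Hypothetical frame: made-up addresses, type IP, and an 84-byte placeholder datagram.
frame = bytes.fromhex("001122334455" "66778899aabb" "0800") + bytes(84)
print(parse_ethernet_header(frame))
```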
Length Fields
There are several different length fields in the IP datagram. These can be somewhat confusing since the stored value often must be multiplied by some factor to determine the true length. We’ll examine these fields in this section.
Figure: IP header layout (recoverable rows)
flags | 13-bit fragment offset
8-bit time to live (TTL) | 8-bit protocol | 16-bit header checksum
32-bit source IP address
32-bit destination IP address
Length fields called out on the slide:
• 4-bit IP header length: multiply by 4 to convert to bytes
• 16-bit IP datagram total length: already expressed in bytes
• 13-bit fragment offset: multiply by 8 to convert to bytes
If you look at the IP header above, you’ll see that there are three different fields containing lengths. Of these, only the 16-bit IP datagram total length is stored directly in bytes; the other two must be multiplied by a factor. Let’s examine these fields in more detail.
The IP header length is found in the low-order nibble (4 bits) of the first byte (byte 0) of the IP header. In the above slide, we see that the IP header length is 5. This is not 5 bytes as one might assume. It is actually 5 words, where a word is defined as a 32-bit field. Considering that a byte has 8 bits, a word is 4 bytes. So, you have to use a multiplication factor of 4 to figure the actual number of bytes. In this case, we have 20 bytes. Since this field is 4 bits long, the greatest value that can be found in it is binary 1111, or hexadecimal 0x0f, which is decimal 15. This means the longest IP header can be 15 * 4 = 60 bytes.
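In code, extracting the header length amounts to a mask and a multiply. The Python sketch below assumes you already have the first byte of the IP header as an integer; the function name `ip_header_length_bytes` is hypothetical.

```python
def ip_header_length_bytes(first_byte: int) -> int:
    """Return the IP header length in bytes from the first byte of the header."""
    words = first_byte & 0x0F      # low-order nibble: length in 32-bit words
    return words * 4               # 4 bytes per word


print(ip_header_length_bytes(0x45))   # 5 words  -> 20 bytes
print(ip_header_length_bytes(0x4F))   # 15 words -> 60 bytes, the maximum
```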
You may be wondering why you have to go through this conversion. Why didn’t they just make the field long enough to express the length in bytes? That would require 2 additional bits (2^6 = 64) to represent the maximum of 60 bytes. Every IP datagram would then be 2 bits longer, increasing the volume of traffic. So, any kind of representation that shortens the datagram improves efficiency.
If we look at the type of IP option that is in this datagram, it is found in the 20th byte. There is a hex value of 0x44 there, which translates to decimal 68 and indicates that we want to record timestamps. This option attempts to collect timestamps from all routers through which the datagram travels. Each timestamp takes up 4 bytes. The timestamp option itself requires 4 bytes in the IP header as overhead. This means that if the maximum IP header is 60 bytes and we must have 20 bytes for the standard header, we only have 40 bytes left for the option, of which 36 are available for timestamps. This allows 9 timestamps to be collected, which may not be enough to record timestamps from all routers through which the datagram travels.
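The limit of 9 timestamps follows from simple arithmetic, sketched here in Python. The constant names are assumptions for the illustration; the values are the ones given in the text.

```python
MAX_IP_HEADER = 60               # longest possible IP header, in bytes
STANDARD_IP_HEADER = 20          # header without options
TIMESTAMP_OPTION_OVERHEAD = 4    # bytes the timestamp option itself needs
TIMESTAMP_SIZE = 4               # bytes per recorded timestamp


def max_timestamps() -> int:
    """Number of 4-byte timestamps that fit in the IP options area."""
    option_space = MAX_IP_HEADER - STANDARD_IP_HEADER          # 40 bytes
    usable = option_space - TIMESTAMP_OPTION_OVERHEAD          # 36 bytes
    return usable // TIMESTAMP_SIZE


print(max_timestamps())   # 9
```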
Fragmentation – Total Length
16:21:35.686860 ping.com> your.net: icmp: echo
If you look at the IP datagram length field, you see that we have a hex value of 0x05dc, which computes to 1500 decimal. As you will recall, this is the MTU for Ethernet. So, it appears that this datagram went from a link layer with an MTU larger than 1500 to an Ethernet network. The 1500 represents the 1480 bytes of embedded fragment data plus the 20 byte IP header.
Figure: IP header layout (recoverable rows)
flags | 13-bit fragmentation offset
8-bit time to live (TTL) | 8-bit protocol | 16-bit header checksum
32-bit source IP address
32-bit destination IP address
Looking at the above slide, the fragmentation offset field occupies part of the 6th and 7th bytes of the IP header (counting from 0); the remaining 3 bits of those bytes hold the flags. It is a 13-bit field. When a datagram is fragmented, this field is set to reflect the offset at which this fragment’s data is found in the reassembled datagram.
• Fragment offset length 2^13 = 8192 bytes
• How do you specify a fragment offset > 8192?
• 65,536 / 8192 = 8
• Need to multiply fragment offset by 8
Theoretically, it is possible to have a datagram that is 65,535 bytes since the datagram length field is 16 bits. Given this, it is also theoretically possible that a fragment offset can be very close to this 65,535 limit. But the fragment offset field is only 13 bits, which can represent just 8192 distinct values (0 through 8191). Therefore, some multiplication factor must be applied to the offset to be able to represent all possible fragments.
If you divide the maximum possible IP datagram size, 2^16 (actually 2^16 - 1), by the maximum fragment offset size, 2^13 (actually 2^13 - 1), you have 2^3, which is 8. More simply, 8192 * 8 = 65536. This is how we arrive at the multiplicative factor of 8 for the fragment offset length.
We find a fragment offset of 0xb9, which translates to decimal 185. But this must be multiplied by 8 to compute the actual offset, which is 1480. This most likely indicates that this fragment traveled to an Ethernet network with an MTU of 1500 (1480 bytes of data plus a 20 byte IP header).
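The same calculation can be sketched in Python, masking off the 3 flag bits that share bytes 6-7 with the offset. The function name `fragment_offset_bytes` and the 8-byte sample are assumptions; in particular, the flag bits in the sample are zero only for simplicity, which may not match the datagram on the slide.

```python
def fragment_offset_bytes(ip_header: bytes) -> int:
    """Return the fragment offset in bytes from bytes 6-7 of an IP header."""
    flags_and_offset = int.from_bytes(ip_header[6:8], "big")
    offset_units = flags_and_offset & 0x1FFF    # keep the low 13 bits
    return offset_units * 8                     # offset is stored in 8-byte units


# Hypothetical first 8 bytes of an IP header: total length 0x05dc, offset 0xb9.
sample = bytes.fromhex("4500 05dc 0000 00b9")
print(fragment_offset_bytes(sample))            # 185 * 8 = 1480
```

Masking first matters because a set flag bit would otherwise inflate the result well beyond the real offset.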
Figure: TCP header layout
16-bit source port number | 16-bit destination port number
32-bit sequence number
32-bit acknowledgement number
4-bit header length | reserved (6 bits) | URG ACK PSH RST SYN FIN | 16-bit window size
16-bit checksum | 16-bit urgent pointer
options (if any)
• 4-bit TCP header length: multiply by 4 to convert to bytes
Another length field is found in the TCP header. This represents the length of the TCP header itself. Like the IP header, the TCP header can have options. And, like the IP header, the TCP header is normally 20 bytes long. The TCP header length is found in the high-order nibble of the 12th byte offset in the TCP header. One final similarity between the IP and TCP header lengths is that they are both expressed as 32-bit words and must be multiplied by 4 to be converted to bytes.
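For comparison with the IP case, here is a Python sketch of the TCP header length calculation. This time the value sits in the high-order nibble, so it is shifted rather than masked. The function name `tcp_header_length_bytes` and the zero-filled sample header are assumptions for illustration.

```python
def tcp_header_length_bytes(tcp_header: bytes) -> int:
    """Return the TCP header length in bytes from byte offset 12 of the header."""
    words = tcp_header[12] >> 4    # high-order nibble: length in 32-bit words
    return words * 4               # 4 bytes per word


# Hypothetical 20-byte TCP header: zeros except the length byte (0x50) and SYN flag (0x02).
sample_tcp = bytes(12) + bytes([0x50, 0x02]) + bytes(6)
print(tcp_header_length_bytes(sample_tcp))   # 5 words -> 20 bytes
```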