Chapter 23


Network debugging

In this chapter:

• network problems

• problems

• application layers


The chances are quite good that you’ll have some problems somewhere when you set up your network. FreeBSD gives you a large number of tools with which to find and solve them.

In this chapter, we’ll consider a methodology for debugging network problems. In the process, we’ll look at the programs that help with debugging. It will help to keep a finger in Chapter 16 while reading this section.

How to approach network problems

Recall from Chapter 16 that network software and hardware operate on at least four layers. If one layer doesn’t work, the ones above won’t either. When solving problems, it obviously makes sense to start at the bottom and work up.

Most people understand this up to a point. Nobody expects a PPP connection to the Internet to work if the modem can’t dial the ISP. On the other hand, a large number of messages to the FreeBSD-questions mailing list show that many people seem to think that once this connection has been established, everything else will work automatically. If it doesn’t, they’re puzzled.

Unfortunately, the Net isn’t that simple. In fact, it’s too complicated to give a hard-and-fast methodology at all. Much network debugging can look more like magic than anything rational. Nevertheless, a surprising number of network problems can be solved by using the steps below. Even if they don’t solve your problem, read through them: they might give you some ideas about where to look.

netdebug.mm,v v4.15 (2003/04/02 03:23:15) 401


Link layer problems

To test your link layer, start with ping. ping is a relatively simple program that sends an ICMP echo packet to a specific IP address and checks the reply. ICMP is the Internet Control Message Protocol; see TCP/IP Illustrated, by Richard Stevens, for more information.

A typical ping output might look like:

$ ping bumble

PING bumble.example.org (223.147.37.156): 56 data bytes

64 bytes from 223.147.37.156: icmp_seq=0 ttl=255 time=1.137 ms

^C

--- bumble.example.org ping statistics ---

4 packets transmitted, 4 packets received, 0% packet loss

round-trip min/avg/max/stddev = 0.612/0.765/1.137/0.216 ms

In this case, we are sending the messages to the system bumble.example.org. By default, ping sends messages of 56 bytes; with the 8-byte ICMP header, this makes packets of 64 bytes. By default, ping continues until you stop it: notice the ^C indicating that this invocation was stopped by pressing Ctrl-C.

The information that ping gives you isn’t much, but it’s useful:

• It tells you how long it takes for each packet to get to its destination and back.

• It tells you how many packets didn’t make it.

• It also prints a summary of packet statistics.
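To make the summary line concrete, here is a minimal Python sketch of how such statistics can be derived from the individual replies. It assumes, as we believe FreeBSD’s ping does, that stddev is the population standard deviation; the function name ping_summary is hypothetical and not part of any ping interface.

```python
import math


def ping_summary(rtts_ms, transmitted):
    """Summarize echo replies the way ping's final lines do (a sketch).

    rtts_ms:     round-trip times, in ms, of the replies that came back
    transmitted: number of echo requests sent
    """
    received = len(rtts_ms)
    summary = {
        "transmitted": transmitted,
        "received": received,
        "loss_pct": 100.0 * (transmitted - received) / transmitted,
    }
    if received == 0:
        return summary  # nothing to time: 100% packet loss
    avg = sum(rtts_ms) / received
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum(t * t for t in rtts_ms) / received - avg * avg
    summary.update(
        min=min(rtts_ms),
        avg=avg,
        max=max(rtts_ms),
        stddev=math.sqrt(max(variance, 0.0)),  # guard against tiny negative rounding
    )
    return summary


print(ping_summary([1.0, 2.0, 3.0], transmitted=4))
```

The three round-trip times in the usage line are arbitrary test values, not measurements.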

But what if this doesn’t work? You enter your ping command, and all you get is:

$ ping wait

PING wait.example.org (223.147.37.4): 56 data bytes

^C

--- wait.example.org ping statistics ---

5 packets transmitted, 0 packets received, 100% packet loss

Obviously, something’s wrong here. We’ll look at it in more detail below. This is very different, however, from this situation:

$ ping presto

^C

In the second case, even after waiting a reasonable amount of time, nothing happened at all: ping didn’t print the PING message, and when we hit Ctrl-C there was no further output. This is indicative of a name resolution problem: ping can’t print the first line (PING presto ...) until it has found the IP address of the system, in other words, until it has performed a DNS lookup. If we wait long enough, the lookup will time out, and we get the message ping: cannot resolve presto: Unknown host. If this happens, use the IP address instead of the name. DNS is an application, so we won’t even try to debug it until we’ve debugged the link and network layers.

If things don’t work out, there are two possibilities:

• If both systems are on the same network, it’s a link layer problem. We’ll look at that first.

• If the systems are on two different networks, it might be a network layer problem. That’s more complicated: we don’t know which network to look at. It could be either of the networks on which the systems are located, or it could be a problem with one of the networks on the way. How do you find out where your packets get lost? First you check the link layer. If it checks out OK, and the problem still exists, continue with the network layer on page 405.

So what can cause link layer problems? There are a number of possibilities:

• One of the interfaces (source or destination) could be misconfigured. They should both be on the same range of network addresses. For example, the following two interface configurations cannot talk to each other directly, even if they’re on the same physical network:

machine 1

dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500

inet 223.147.37.81 netmask 0xffffff00 broadcast 223.147.37.255

machine 2

xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500

options=3<RXCSUM,TXCSUM>

• If you see something like this on an Ethernet interface, it’s pretty clear that it has a cabling problem:

xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500

options=3<RXCSUM,TXCSUM>

media: Ethernet autoselect (none)

status: no carrier

In this case, check the physical connections. If you’re using UTP, check that you have the right kind of cable, normally a “straight-through” cable. If you accidentally use a crossover cable where you need a straight-through cable, or vice versa, you will not get any connection. Also, many hubs and switches have a “crossover” switch that achieves the same result.

• If you’re on an RG-58 thin Ethernet, the most likely problem is a break in the cabling. You can check the static resistance between the central pin and the external part of the connector with a multimeter. It should be approximately 25Ω, the value of the two 50Ω terminators in parallel. If it’s 50Ω, it indicates that there is a break in the cable, or that one of the terminators has been disconnected.

• If your interface is configured correctly, and you’re using a 10 Mb/s card, check whether you are using the correct connection to the network. Some older Ethernet boards support multiple physical connections (for example, both BNC and UTP). For


example, if your network runs on RG58 thin Ethernet, and your interface is set to AUI, you may still be able to send data on the RG58, but you won’t be able to receive any.

The method of setting the connection depends on the board you are using. PCI boards are not normally a problem, because the driver can set the parameters directly, but ISA boards can drive you crazy. In the case of very old boards, such as the Western Digital 8003, you may need to set jumpers. With others, you may need to run the setup utility under DOS, and with others again you can set it with the link flags to ifconfig. For example, on a 3Com 3c509 “combo” board, you can set the connection like this:

# ifconfig ep0 link0 -link1                # set AUI

This example is correct for the ep driver, but not necessarily for other Ethernet boards: each board has its own flags. Read the man page for the board for the correct flags.

• If your interface looks OK, the next thing to do is to see whether you can send data to other machines on the network. If so, of course, you should continue your search on the machine that isn’t responding. If none are working, you probably have a cabling problem.
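The first point in the list above, that both interfaces must be on the same range of addresses, can be checked mechanically. The following Python sketch converts the hexadecimal netmask that ifconfig prints into dotted-quad form and tests whether each address falls inside the other interface’s network. The helper names are hypothetical, and so are the second addresses used in the checks (223.147.38.5 and 223.147.37.200), since the example output does not show machine 2’s address.

```python
import ipaddress


def netmask_from_hex(mask_hex):
    """Convert an ifconfig-style netmask such as '0xffffff00' to dotted quad."""
    return str(ipaddress.IPv4Address(int(mask_hex, 16)))


def same_subnet(addr1, mask1, addr2, mask2):
    """True if each interface's address lies inside the other's network."""
    net1 = ipaddress.IPv4Interface(f"{addr1}/{netmask_from_hex(mask1)}").network
    net2 = ipaddress.IPv4Interface(f"{addr2}/{netmask_from_hex(mask2)}").network
    return (ipaddress.IPv4Address(addr2) in net1
            and ipaddress.IPv4Address(addr1) in net2)


# Machine 1 from the example against two hypothetical partners.
print(same_subnet("223.147.37.81", "0xffffff00", "223.147.38.5", "0xffffff00"))
print(same_subnet("223.147.37.81", "0xffffff00", "223.147.37.156", "0xffffff00"))
```

Checking both directions matters: a mismatched netmask on only one side (for example 0xffffff80 on the second machine) can leave one system believing the other is local while the reverse is not true.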

On a wireless network, you need to check for a number of additional problems. ifconfig should show something like this:

wi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500

inet6 fe80::202:2dff:fe04:93a%wi0 prefixlen 64 scopeid 0x3

ether 00:02:2d:21:54:4c

media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)

status: associated

ssid "FreeBSD IBSS" 1:""

stationname "FreeBSD WaveLAN/IEEE node"

channel 3 authmode OPEN powersavemode OFF powersavesleep 100

wepmode OFF weptxkey 1

wepkey 2:64-bit 0x123456789a 3:128-bit 0x123456789abcdef123456789ab

There are many things to check here:

• Do you have the same operating mode? This example shows a card operating in BSS or IBSS mode. By contrast, you might see this:

media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps <adhoc, flag0>)

In this case, the interface is operating in so-called “Lucent demo ad-hoc” mode, which is not the same thing as “ad-hoc” mode (which in turn is better called IBSS mode). IBSS mode (“ad-hoc”) and BSS mode are compatible; IBSS mode and “Lucent demo ad-hoc” mode are not. See Chapter 17, page 306, for further details.


• Is the status associated? The alternative is no carrier. Some cards, including this one, show no carrier when communicating with a station operating in IBSS mode, but they never show associated unless they are really associated.

• If the card is not associated, check the frequencies and the network name.

• Check the WEP (encryption) parameters to ensure that they match. Note that ifconfig does not display the WEP key unless you are root.

Your card may show associated even if the WEP key doesn’t match. In such a case, it knows about the network, but it can’t communicate with it.

After checking all these things, you should have a connection. But you may not be home yet:

• If you have a connection, check whether all packets got there. Lost packets could mean line quality problems. That’s not very likely on an Ethernet, but it’s very possible on a PPP or DSL link. There’s an uncertainty about dropped packets: you might hit Ctrl-C after the last packet went out, but before it came back. If the line is very slow, you might lose multiple packets this way. Compare the sequence number of the last packet that returns with the total number returned. If it’s one less (sequence numbers start at 0), all the packets except the ones at the end made it.

• Check that each packet comes back only once. If not, there’s definitely something wrong, or you have been pinging a broadcast address. That looks like this:

$ ping 223.147.37.255

PING 223.147.37.255 (223.147.37.255): 56 data bytes

64 bytes from 223.147.37.88: icmp_seq=0 ttl=255 time=0.785 ms (DUP!)

FreeBSD systems do not respond to broadcast pings, but most other systems do, so this effectively counts the number of non-BSD machines on a network.

• Check the times. A ping across an Ethernet should take between about 0.2 and 2 ms, a ping across a wireless connection between 2 and 12 ms, a ping across an ISDN connection about 30 ms, a ping across a 56 kb/s analogue connection about 100 ms, and a ping across a satellite connection about 250 ms in each direction. All of these times are for idle lines; the time can go up to over 5 seconds for a slow line transferring large blocks of data across a serial line (for example, ftping a file). In this example, some line traffic delayed the response to individual pings.
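The sequence-number reasoning in the first two points can be sketched in Python. Given the icmp_seq values of the reply lines (duplicates included) and the number of requests transmitted, the hypothetical helper analyze_replies separates gaps before the last reply, which are real losses, from requests after it, which may simply have been in flight when you pressed Ctrl-C. It also lists any sequence numbers that came back more than once, as happens with a broadcast ping.

```python
from collections import Counter


def analyze_replies(seqs_received, transmitted):
    """Classify ping losses from the icmp_seq values of the reply lines.

    seqs_received: icmp_seq of every reply line, in arrival order,
                   including the ones ping marks with (DUP!)
    transmitted:   number of requests sent; sequence numbers run 0..transmitted-1
    """
    counts = Counter(seqs_received)
    duplicates = sorted(seq for seq, n in counts.items() if n > 1)
    last = max(counts) if counts else -1
    missing = [seq for seq in range(transmitted) if seq not in counts]
    return {
        # Gaps before the last reply cannot be explained by Ctrl-C timing.
        "lost": [seq for seq in missing if seq < last],
        # Requests after the last reply may still have been on their way back.
        "possibly_in_flight": [seq for seq in missing if seq > last],
        "duplicates": duplicates,
    }


print(analyze_replies([0, 1, 2, 4], transmitted=6))
```

In the usage line, sequence 3 is a genuine loss, while sequence 5 may only have been cut off by Ctrl-C.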


Network layer problems

Once we know the link layer is working correctly, we can turn our attention to the next layer up, the network layer. Well, first we should check whether the problem is still with us.

We need additional tools for the network layer. ping is a useful tool for telling you whether data is getting through to the destination, and if so, how much is getting through. But what if your local network checks out just fine, and you can’t reach a remote network? Or what if you’re losing 40% of your packets to foo.bar.org, and the remaining ones are taking up to 5 seconds to get through? Where’s the problem? Based on the recent “upgrade” your ISP performed, and the fact that you’ve had trouble getting to other sites, you suspect that the performance problems might be occurring in the ISP’s net. How can you find out?

As we saw while investigating the link layer, a complete failure is often easier to fix than a partial failure. If nothing at all is getting through, you probably have a routing problem. Check the routing table with netstat. On bumble, you might see:

$ netstat -r

Routing tables

Internet:

The default route is via gw, which is correct. The first thing is to ensure that you can ping gw; that’s a link level issue, so we’ll assume that you can. But what if you try to ping a remote system and you see something like this?

# ping rider.fc.net

PING rider.fc.net (207.170.123.194): 56 data bytes

36 bytes from gw.example.org (223.147.37.5): Destination Host Unreachable

4 5 00 6800 c5da 0 0000 fe 01 246d 223.147.37.2 207.170.123.194

36 bytes from gw.example.org (223.147.37.5): Destination Host Unreachable

4 5 00 6800 c5e7 0 0000 fe 01 2460 223.147.37.2 207.170.123.194

^C

--- rider.fc.net ping statistics ---

These are ICMP messages from gw indicating that it does not know where to send the data. This is almost certainly a routing problem; on gw you might see something like:


$ netstat -r

Routing tables

Internet:

The problem here is that there is no default route. Add it with the route command:

# route add default free-gw.example.net

# netstat -r

Routing tables

Internet:

default free-gw.example.ne UGSc 24 5724 ppp0

etc

See Chapter 17, page 310, for more details, including how to ensure that the routes will be added automatically at boot time.
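Why does a missing default route produce Destination Host Unreachable? A router picks the most specific matching entry, and default is simply the least specific one (0.0.0.0/0). The Python sketch below shows longest-prefix matching over a hypothetical, much simplified table for gw; entry names such as link#1 only imitate netstat’s notation, and real routing tables carry far more information.

```python
import ipaddress


def lookup(routes, dest):
    """Return the gateway for dest by longest-prefix match, or None (a sketch).

    routes: list of (destination, gateway) pairs; the destination "default"
            stands for 0.0.0.0/0, which matches every IPv4 address.
    """
    addr = ipaddress.ip_address(dest)
    best_net, best_gw = None, None
    for destination, gateway in routes:
        net = ipaddress.ip_network("0.0.0.0/0" if destination == "default" else destination)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_gw = net, gateway
    return best_gw  # None means no route: the router reports the host unreachable


# Hypothetical table for gw before the fix: only the local Ethernet is known.
routes = [("223.147.37.0/24", "link#1")]
print(lookup(routes, "207.170.123.194"))  # → None
routes.append(("default", "free-gw.example.net"))
print(lookup(routes, "207.170.123.194"))  # → free-gw.example.net
```

Local addresses such as 223.147.37.156 still match the /24 entry after the default route is added, because its prefix is longer.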

But what if the routes look right, you don’t get any ICMP messages, and no data gets through? You don’t always get ICMP messages when the data can’t get through. The logical next place to look is free-gw.example.net, but there’s a problem with that: as the administrator of example.org, you don’t have access to example.net’s machines. You can call them up, of course, but before you do, you should be reasonably sure it’s their problem. You can find out more information with traceroute.

traceroute

traceroute sends UDP packets to the destination, but it modifies the time-to-live field in the IP header (see page 280) so that, initially at any rate, they don’t get there. As we saw there, the time-to-live field specifies the number of hops that a packet can go before it is discarded. When it is, the system that discards it should send back an ICMP time exceeded message. traceroute uses this feature and sends out packets with time-to-live set first to one, then to two, and so on. When a packet finally reaches the destination, that system replies with an ICMP port unreachable message, because nothing is listening on the UDP port traceroute uses. traceroute prints the IP address of the system that sends each ICMP message and the time it took, thus giving something like a two-dimensional ping. Here’s an example to hub.FreeBSD.org:


$ traceroute hub.freebsd.org

traceroute to hub.freebsd.org (204.216.27.18), 30 hops max, 40 byte packets

1 gw (223.147.37.5) 1.138 ms 0.811 ms 0.800 ms

2 free-gw.example.net (139.130.136.129) 131.913 ms 122.231 ms 134.694 ms

3 Ethernet1-0.way1.Adelaide.example.net (139.130.237.65) 118.229 ms 120.040 ms 118.723 ms

4 Fddi0-0.way-core1.Adelaide.example.net (139.130.237.226) 171.590 ms 117.911 ms 123.513 ms

5 Serial5-0.lon-core1.Melbourne.example.net (139.130.239.21) 129.267 ms 226.927 ms 125.547 ms

6 Fddi0-0.lon5.Melbourne.example.net (139.130.239.231) 144.372 ms 133.998 ms 136.699 ms

7 borderx2-hssi3-0.Bloomington.mci.net (204.70.208.121) 962.258 ms 482.393 ms 754.989 ms

8 core2-fddi-1.Bloomington.mci.net (204.70.208.65) 821.636 ms * 701.920 ms

9 bordercore3-loopback.SanFrancisco.mci.net (166.48.16.1) 424.254 ms 884.033 ms 645.302 ms

10 pb-nap.crl.net (198.32.128.20) 435.907 ms 438.933 ms 451.173 ms

11 E0-CRL-SFO-02-E0X0.US.CRL.NET (165.113.55.2) 440.425 ms 430.049 ms 447.340 ms

12 T1-CDROM-00-EX.US.CRL.NET (165.113.118.2) 553.624 ms 460.116 ms *

13 hub.FreeBSD.ORG (204.216.27.18) 642.032 ms 463.661 ms 432.976 ms

By default, traceroute tries each hop three times and prints out the times as they happen, so if the response time is more than about 300 ms, you’ll notice it as it happens. If there is no reply after a timeout period (default 5 seconds), traceroute prints an asterisk (*). You’ll also occasionally notice a significant delay at the beginning of a line, although the response time seems reasonable. In this case, the delay is probably caused by a DNS reverse lookup for the name of the system. If this becomes a problem (maybe because the global DNS servers aren’t reachable), you can turn off DNS reverse lookup using the -n flag.
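The probing loop described above can be sketched as a small Python simulation. It does not send any packets: the path and the reply function are hypothetical stand-ins for the network, with reply returning a round-trip time in milliseconds, or None when no ICMP reply arrives before the timeout. It shows how stepping the time-to-live one hop at a time yields one line per router, with an asterisk for each unanswered probe.

```python
def traceroute_sim(path, reply, max_hops=30, probes=3):
    """Simulate traceroute's TTL-stepping loop over a hypothetical path.

    path:  list of (name, address), one per router, ending with the destination;
           the destination answers (port unreachable), so the loop stops there
    reply: function(hop_index, probe_number) -> RTT in ms, or None on timeout
    """
    lines = []
    for ttl in range(1, min(max_hops, len(path)) + 1):
        hop = ttl - 1  # a packet sent with this TTL expires at path[hop]
        name, addr = path[hop]
        times = [reply(hop, probe) for probe in range(probes)]
        fields = ["*" if t is None else f"{t:.3f} ms" for t in times]
        if all(t is None for t in times):
            lines.append(f"{ttl} " + " ".join(fields))  # nobody answered: "* * *"
        else:
            lines.append(f"{ttl} {name} ({addr}) " + " ".join(fields))
    return lines


# Hypothetical path in which the second hop never answers.
path = [("gw", "223.147.37.5"),
        ("free-gw.example.net", "139.130.136.129"),
        ("far.example.com", "192.0.2.1")]
reply = lambda hop, probe: None if hop == 1 else 1.0 + hop
for line in traceroute_sim(path, reply):
    print(line)
```

The host far.example.com and its address are placeholders; only gw and free-gw.example.net come from the example above.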

If you look more carefully at the times in the example above, you’ll see three groups of times:

1. The times to gw are around 1 ms. This is typical of an Ethernet network.

2. The times for hops 2 to 6 are on the order of 100 to 150 ms. This indicates that the link between gw.example.org and free-gw.example.net is running PPP over a telephone line. The delay between free-gw.example.net and Fddi0-0.lon5.Melbourne.example.net is negligible compared to the delay across the PPP link, so you don’t see much difference.

3. The times from borderx2-hssi3-0.Bloomington.mci.net to hub.FreeBSD.ORG are significantly higher, between 400 and 1000 ms. We also note a couple of dropped packets. This indicates that the line between Fddi0-0.lon5.Melbourne.example.net and borderx2-hssi3-0.Bloomington.mci.net is overloaded. The length of the link (about 13,000 km) also plays a role: the round trip is a total distance of 26,000 km, which takes about 85 ms even at the speed of light. If this were a satellite connection, things would be much slower: the total distance from ground station to satellite and back to the ground is 72,000 km, which takes a total of 240 ms to propagate.
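The propagation figures in the third point are simple arithmetic: distance divided by the speed of light. Here is a minimal Python sketch, assuming propagation in vacuum; signals in optical fibre travel at roughly two thirds of that speed, so real links are slower still.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in vacuum


def propagation_delay_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """One-way propagation time in milliseconds over distance_km."""
    return distance_km / speed_km_s * 1000.0


print(round(propagation_delay_ms(26_000), 1))  # round trip over the 13,000 km link → 86.7
print(round(propagation_delay_ms(72_000), 1))  # ground to satellite and back → 240.2
```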

Back to our problem. If we see something like the output in the previous example, we know that there’s no reason to call up the people at example.net: it’s not their problem. This might just be overloading on the global Internet. On the other hand, what about this?


1 gw (223.147.37.5) 1.138 ms 0.811 ms 0.800 ms

2 * * *

3 * * *

^C

You’ve fixed your routing problems, but you still can’t get data off the system. There are a number of possibilities here:

• The link to the next system may be down. The solution’s obvious: bring it up and try again.

• gw may not be configured as a gateway. You can check this with:

$ sysctl net.inet.ip.forwarding

net.inet.ip.forwarding: 1

For a router, this value should be 1. If it’s 0, change it with:

# sysctl -w net.inet.ip.forwarding=1

net.inet.ip.forwarding: 0 -> 1

See page 313 for further details, including how to ensure that this sysctl is set correctly when the system starts.

• You may be trying to use a non-routable IP address such as those in the range 192.168.x.x. You can’t do that. If you don’t have enough globally visible IP addresses, you’ll need to run some kind of aliasing package, such as NAT. See Chapter 22, page 393, for further details.

• Maybe there is something wrong with routing to your network. This is a difficult one to check, but in the case of the reference network, one possibility is to repeat the traceroute from the machine gw. gw’s external address on the tun0 interface is 139.130.136.133, which is on the ISP’s network, so traceroutes from that address are not affected by a routing problem for network 223.147.37.x. If the traceroute works from gw but not from the machines on 223.147.37.x, contact your ISP to solve it.

• Maybe there is something wrong with the other end; if everything else fails, you may have to call the admins at example.net even if you have no hard evidence that it’s their problem.
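Whether an address falls into one of the non-routable ranges mentioned in the third point can be checked with Python’s standard ipaddress module. This sketch tests only the three private blocks reserved by RFC 1918; other special-purpose ranges (loopback, link-local and so on) are deliberately left out, and the helper name is hypothetical.

```python
import ipaddress

# The three private address blocks reserved by RFC 1918.
RFC1918_BLOCKS = [ipaddress.ip_network(block)
                  for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(addr):
    """True if addr lies in a private block that is not routed on the Internet."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


for addr in ("192.168.1.10", "223.147.37.5"):
    print(addr, "private" if is_rfc1918(addr) else "globally routable")
```

An address that tests as private here can only reach the Internet through NAT or a similar aliasing package.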

But maybe the data gets one hop further:

1 gw (223.147.37.5) 1.138 ms 0.811 ms 0.800 ms

2 free-gw.example.net (139.130.136.129) 131.913 ms 122.231 ms 134.694 ms

3 * * *

4 * * *

^C

In this case, there is almost certainly a problem at example.net. This would be the correct time to use the telephone.


High packet loss

But maybe data is getting through. Well, some data, anyway. Consider this ping session:

$ ping freefall.FreeBSD.org

PING freefall.FreeBSD.org (216.136.204.21): 56 data bytes

^C

--- 216.136.204.21 ping statistics ---

round-trip min/avg/max/stddev = 441.352/530.039/839.671/113.674 ms

In this case, we have a connection. But look carefully at those sequence numbers: at one point, four packets in a row (sequence 19 to 22) get lost. How high a packet drop rate is still acceptable? 1% or 2% is probably still (barely) acceptable. By the time you get to 10%, though, things look a lot worse. A 10% packet drop rate doesn’t mean that your connection slows down by 10%. For every dropped packet, you have a minimum delay of one second until TCP retries it. If that retried packet gets dropped too (which, with a 10% drop rate, happens to one in every 10 dropped packets), the second retry takes another three seconds. If you’re transmitting packets of 64 bytes over a 33.6 kb/s link, you can normally get about 60 packets through per second. With 10% packet loss, the time to get these packets through is about eight seconds, a throughput loss of 87.5%.

With 20% packet loss, the results are even more dramatic. Now 12 of the 60 packets have to be retried, 2.4 of them will be retried a second time (for three seconds delay), and 0.48 of them will be retried a third time (six seconds delay). This makes a total of 22 seconds delay, a throughput degradation of nearly 96%.
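The back-of-envelope arithmetic in the last two paragraphs can be reproduced with a few lines of Python. This sketch follows the text’s simplified model rather than real TCP retransmission timing: 60 packets, the same loss fraction applied again at each retry, retry waits of 1, 3 and 6 seconds, and delays simply added together. It gives about 8.2 seconds of extra delay at 10% loss and about 22 seconds at 20%. Counting the one second the packets would have taken anyway, the degradation works out to roughly 89% and 96%; the 87.5% figure above treats the eight seconds as the total time.

```python
def retry_delay_s(loss, packets=60, waits=(1, 3, 6)):
    """Extra delay in seconds from retransmissions under the text's simple model.

    A fraction `loss` of the packets is dropped; each of those waits waits[0]
    seconds for its first retry, a fraction `loss` of the retries is dropped
    again and waits waits[1] seconds, and so on.
    """
    delay = 0.0
    needing_retry = float(packets)
    for wait in waits:
        needing_retry *= loss  # packets that need this round of retries
        delay += needing_retry * wait
    return delay


def throughput_loss(loss, packets=60, base_s=1.0, waits=(1, 3, 6)):
    """Fraction of throughput lost, if the packets alone would take base_s seconds."""
    return 1.0 - base_s / (base_s + retry_delay_s(loss, packets, waits))


for p in (0.10, 0.20):
    print(f"{p:.0%} loss: {retry_delay_s(p):.2f} s extra, "
          f"{throughput_loss(p):.1%} throughput lost")
```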

Theoretically, you might think that the degradation would not be as bad for big packets, such as you might have with file transfers with ftp. In fact, the situation is worse: in most cases the packet drop rate rises sharply with the packet size, and it’s common enough that ftp times out completely before it can transfer a file.

To get a better overview of what’s going on, let’s look at another program, tcpdump.
