howto://troubleshoot networks with ping

Welcome to the first in a series of posts on basic network troubleshooting. These posts are designed to help someone with some basic networking knowledge expand their repertoire of troubleshooting skills and tricks. This series will cover the following Windows commands: ping, tracert, pathping, arp, nbtstat, and telnet. We'll also cover third-party command-line tools, including tcping, tracetcp, and the name resolution tools dig, host, and whois. This first post is all about ping. The rest will be released as I get to them, but I hope to have them all done by the end of the summer. Stay tuned for more.

The first five are included in all current versions of Windows. Telnet used to be, too, but in the interests of sekuritah, Microsoft decided to make you opt in by turning it into an optional feature in Vista and later, which is a classic case of right idea, wrong execution. You might get pushback trying to make a third-party tool part of a default build, but trust me on this: make sure the Telnet Client feature is added to EVERYTHING. With the exception of pathping, you should also be able to use these same commands, with almost the same syntax, on Linux, Unix, and Mac systems, too (tracert is called traceroute there). With these tools, we're set to diagnose many of the networking problems that might pop up, with just a little bit of cmd-line voodoo. If you'd like to learn how to use ping, you know what to do next. So without further ado, I give you…


Level 100-The origin of PING

Ping is a command, a particular type of Internet Control Message Protocol (ICMP) message, a funny sound made by sonar, and a maker of fine golf putters. We're only really interested in the first two, though, starting with the ICMP message. If you remember your OSI model <warning, majorly boring stuff approaching> and how the TCP/IP stack maps to it, then you'll remember that ICMP operates at the network layer…right alongside IP. ICMP can be used to deliver messages about the state of the network, such as whether a host or a network is unreachable. When a host receives traffic on a port where no service is listening, UDP relies on ICMP Port Unreachable messages in the same way that TCP relies on a RST ACK. Firewalls configured to block (reject) traffic, as opposed to silently dropping it, can use ICMP Administratively Prohibited messages to tell the sender that the traffic is not permitted.

There are a number of other ICMP messages, but we are interested in just two: echo request and echo reply. When we ping from HostA to HostB, the idea is that HostA wants to check on the network path between itself and HostB. HostA sends some discrete data to HostB, and HostB echoes that data back exactly as received. HostA compares the received data with what was sent, and can determine the following:

  • that HostB is alive and well,
  • that data sent from HostA to HostB can get there and back,
  • how long the round trip takes,
  • any packet loss,
  • and any data corruption.
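To make the echo request/reply exchange concrete, here is a minimal Python sketch that builds an ICMP echo request (type 8, code 0) as raw bytes, including the RFC 1071 Internet checksum that lets the receiver spot corrupted data. It only builds and checks bytes; actually sending it needs a raw socket and admin rights, which is the part the ping command handles for you. The function names are hypothetical, and the a-to-w filler is an assumption modeled on the alphabet pattern Windows uses for its 32 bytes of echo data.

```python
import struct

ICMP_ECHO_REQUEST = 8  # RFC 792: type 8 = echo request, type 0 = echo reply


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    # Header: type, code, checksum (0 while computing), identifier, sequence
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def windows_style_payload(size: int = 32) -> bytes:
    # Assumption: letters a..w repeated, modeled on Windows' echo data
    pattern = b"abcdefghijklmnopqrstuvw"
    return (pattern * (size // len(pattern) + 1))[:size]


def payload_matches(sent: bytes, echoed_icmp: bytes) -> bool:
    # HostA's comparison: the data after the 8-byte ICMP header must be unchanged
    return echoed_icmp[8:] == sent


packet = build_echo_request(identifier=1, sequence=1, payload=windows_style_payload())
print(len(packet))                # 40: 8-byte ICMP header + 32 bytes of data
print(internet_checksum(packet))  # 0: a valid checksum re-sums to zero
```

A single flipped bit anywhere in the message makes the re-computed checksum non-zero, which is how corruption in transit gets caught.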

Mike Muuss wrote the original program to take advantage of this functionality back in 1983, and named it ping after the sound sonar makes when trying to detect ships. Since that time, essentially every operating system that uses TCP/IP has a ping command, though each major operating system tends to insert its own data pattern into the echo requests, along with different default starting TTL values.

Level 200-PING basics

So what can we do with ping? At its simplest, ping just tells us whether a host is alive. Try this.

  1. Open a cmd prompt
  2. Do an ipconfig to get your default gateway
  3. Ping your default gateway
  4. Consider the results.
    By default, Windows sends four ICMP echo requests with 32 bytes of data. We got back four good responses, with an average round-trip time of 1ms. The TTL we see is set by the responding host, and since our gateway is on our own network, no router has decremented it, so a TTL of 128 lets us infer that it is a Windows host. It is…our TMG. Had it been a Linux or other *nix host, the TTL would have been 64 (Cisco IOS devices typically reply with 255).

As a general rule, current Windows hosts start pings and replies with a TTL of 128; modern Linux and Unix hosts use 64, and older systems use 255. If you have a rough idea of how many hops away a host is, you can infer its operating system from the TTL in the reply.
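That rule of thumb can be written as a tiny Python function: round the observed TTL up to the nearest common starting value, and the difference is the number of routers that decremented it. The function name and the table are illustrative; the table just restates the 64/128/255 heuristic above, so treat the result as a guess, not an identification.

```python
# Common starting TTLs from the rule of thumb above: a heuristic, not a guarantee
INITIAL_TTLS = {64: "Linux/Unix", 128: "Windows", 255: "older systems"}


def infer_from_ttl(observed_ttl: int) -> tuple[int, str, int]:
    """Return (assumed starting TTL, likely OS family, estimated hops away)."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("a TTL in a reply is between 1 and 255")
    # Assume the sender started at the smallest common value >= what we saw
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    # Each router on the way back decremented the TTL by one
    return initial, INITIAL_TTLS[initial], initial - observed_ttl


print(infer_from_ttl(128))  # (128, 'Windows', 0): same subnet, like our gateway
print(infer_from_ttl(57))   # (64, 'Linux/Unix', 7)
```

The guess breaks down for distant hosts: a Windows host 70 hops out would arrive with a TTL of 58 and be misread as a Linux box 6 hops away, which is why you need that rough idea of the hop count.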

Since we are on the same subnet, there was no router to decrement the TTL of the replies. Four packets don't make much of a packet-loss test, but we sent four and got four back, so we see 0% loss. Bottom line…our target is alive and well, and we can reach it.

So why does the first ping take longer to respond in each case than the subsequent ones? ARP. In both cases my host did not have the MAC address of the target in cache, so it had to ARP for the address. The target would likely also have to do an ARP before replying. Ping starts its clock when the packet hits the stack at layer three…layer two just needs to catch up.

Request timed out versus destination unreachable

Sometimes a host won’t answer, either because it is down, or firewalled. The response you see in your ping command depends on what operating system you have, and whether or not you are on the same subnet as your target, and sometimes even what kind of network devices are between you and your target.

Before a host can ping another, it has to ARP either the target (same subnet) or the router (different subnet). If no response to that ARP request is received, older Windows systems report "Request timed out"…the same message you get when pinging a host that is down, or where ICMP is blocked. Windows 7 and 2008 are smart enough to tell you "Destination host unreachable." Check the source ip.addr of the "Destination host unreachable" message. If it is yours, then you cannot ARP either the host or your gateway. If it is a remote router or firewall, then it could be telling you that the host is down, or that ICMP is not permitted to reach the target. If you are on a Windows 7 host pinging another host on the same subnet, and you get "Request timed out" instead of "Destination host unreachable", your target is dropping ICMP: it has to answer the ARP request, but then it drops your echo requests. Bad host, no! (See below for more on that.)
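The reasoning above is essentially a small decision table. This Python sketch encodes it for a Windows 7/2008-era client; the string labels and parameter names are hypothetical, and it ignores the older-Windows behaviour where an ARP failure also shows up as a time-out.

```python
def diagnose(message: str, from_own_address: bool = False, same_subnet: bool = False) -> str:
    """Rough reading of a failed ping from a Windows 7/2008-era host.

    message: "timed out" or "unreachable" (hypothetical labels for
             "Request timed out" and "Destination host unreachable").
    from_own_address: the unreachable message came from your own ip.addr.
    same_subnet: the target is on your local subnet.
    """
    if message == "unreachable":
        if from_own_address:
            # Our own stack gave up: no ARP reply from the target or the gateway
            return "cannot ARP the target (same subnet) or the gateway"
        # A router or firewall further along answered on the target's behalf
        return "remote device says the host is down or ICMP is not permitted"
    if message == "timed out":
        if same_subnet:
            # ARP must have worked, or we would have seen "unreachable"
            return "target answers ARP but drops echo requests"
        return "host is down, or ICMP is dropped somewhere along the path"
    raise ValueError(f"unknown message: {message!r}")


print(diagnose("timed out", same_subnet=True))
```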

Like any good command line tool, ping also supports several switches. They are all optional (a couple are even like the appendix…still around but no longer useful) but with the right combination, you can learn a lot more about a host or the network between.

Ping without switches really only tells us two things about a host…it is alive and we can reach it. As mentioned above, we can infer from the TTL what type of operating system it is running, but that is about it. Here is where switches come in.

the -n switch, for the number of pings

Windows hosts will send four pings by default. If you wish to set a specific number, use the -n switch. To ping a host ten times, you would do this.

ping -n 10 a.b.c.d [enter]

the -t switch, for continuous ping

Switch back to your cmd prompt, and enter this command, substituting your default gateway ip.addr.

ping -t a.b.c.d [enter]

The -t switch starts a continuous ping, which lets us check for packet loss and spot any general trend in response times, up or down. It is also dead useful for monitoring a server you rebooted, to see when it starts to come back up. You can also put the switch after the target.


The ping will just run continuously until you stop it by hitting ctrl-c or ctrl-break. Both give you a summary of all pings sent and received, but ctrl-c stops the pings completely, while ctrl-break shows the summary up to that point and keeps on going. In my test run, the first summary came from a ctrl-break, and four more pings went out before I hit ctrl-c.


Notice what the ping statistics tell you: the number of packets sent, received, and lost, plus the minimum, maximum, and average response times. (0% loss is good. The longer you ping, the more likely you are to drop one or more, but if the percentage stays at 0 you are doing just fine.) On a 100Mbit LAN on the same subnet, you should definitely see very low times and 0% loss. The one spike to 140ms when I hit ctrl-break was more likely a processing hiccup on my PC than a network issue. Windows sends roughly one ping per second, so if you let this run for twelve hours, you should have statistics for around 43,200 pings (12 × 3,600 seconds).
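The summary ping prints is simple arithmetic over the individual replies. This Python sketch reproduces it from a list of round-trip times, using None for a lost ping; the function name and the rounding choices are this sketch's own and are not necessarily how Windows rounds its figures.

```python
def summarize(rtts_ms):
    """Ping-style statistics from round-trip times in ms; None marks a lost ping."""
    sent = len(rtts_ms)
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    received = len(replies)
    lost = sent - received
    stats = {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_pct": round(100 * lost / sent) if sent else 0,
    }
    if replies:  # min/max/avg only make sense if something came back
        stats["min"] = min(replies)
        stats["max"] = max(replies)
        stats["avg"] = round(sum(replies) / received)
    return stats


# Quick replies, one spike, and one drop
print(summarize([1, 1, 140, 1, 1, None]))

# At roughly one ping per second, twelve hours of ping -t is about:
print(12 * 60 * 60)  # 43200
```

Note how a single 140ms spike drags the average well above the typical 1ms reply, which is why it is worth reading min and max alongside the average.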

ping -a for name resolution

Sometimes you have an ip.addr and you want to resolve it to a name. While you should do a host or an nslookup to get that information, if you want to combine pinging an ip.addr to see if it is up with a lookup to find its name, use the -a switch, like this.

ping -a 65.55.21.250 [enter]

From that you should see that 65.55.21.250 belongs to microsoft.com, and you can infer that either they fell down and went boom, or they are blocking ICMP echo requests at their border and therefore do not comply with RFC 1122. Considering that the ping of death famously crashed Windows hosts (among many others), I'm guessing they will never be compliant.
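If you only want the name half of ping -a, a reverse (PTR) lookup does the job without sending any ICMP. This Python sketch uses the standard library's socket.gethostbyaddr; the function name is hypothetical, and the name it returns depends on your resolver and on whatever PTR records exist today, so don't expect it to match the result described above.

```python
import ipaddress
import socket


def reverse_lookup(addr: str):
    """Return the name for an IP address, or None: the lookup half of ping -a."""
    try:
        ipaddress.ip_address(addr)  # only accept a literal IPv4/IPv6 address
    except ValueError:
        return None
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(addr)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None
    return hostname
```

For example, reverse_lookup("65.55.21.250") asks your resolver for that address's PTR record; unlike ping -a, it tells you nothing about whether the host is up.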

Level 300-Advanced PINGing

There are a couple of switches that you will use for special troubleshooting.

ping -l to set the size for larger packets

The -l switch specifies the data payload size for your ping. Windows boxes default to a very small 32 bytes of data. Use -l # to test larger payloads.

ping -f to set the don't fragment flag

By default, routers can take packets that are larger than the maximum transmission unit (MTU) of the next link and break them into smaller fragments before forwarding them. The -f switch sets the "Don't Fragment (DF)" flag in the IP header, which tells routers to reject oversized packets instead of fragmenting them. If a router rejects a packet because it is too large and the DF bit is set, it responds to the sending host with an ICMP "fragmentation needed and DF set" message.
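To see where the DF flag (and the TTL from earlier) actually live, here is a Python sketch that unpacks the fixed 20-byte IPv4 header defined in RFC 791. The sample header is made up to resemble a ping -f -l 1000 request (1000 bytes of data + 8-byte ICMP header + 20-byte IP header = 1028 bytes), and the function name is hypothetical.

```python
import struct

DF_BIT = 0x4000  # "Don't Fragment": second of the three flag bits (RFC 791)
MF_BIT = 0x2000  # "More Fragments": set on every fragment except the last


def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the fields ping cares about from a 20-byte IPv4 header (no options)."""
    (ver_ihl, _tos, total_len, _ident, flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "total_len": total_len,
        "dont_fragment": bool(flags_frag & DF_BIT),
        "more_fragments": bool(flags_frag & MF_BIT),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,  # 1 = ICMP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


# Made-up header: version 4, IHL 5, 1028 bytes, DF set, TTL 128, ICMP,
# header checksum left at 0 for brevity
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1028, 1, DF_BIT, 128, 1, 0,
                     bytes([192, 168, 1, 10]), bytes([192, 168, 1, 1]))
print(parse_ipv4_header(sample))
```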

combine -l and -f to test for MTU issues

Frame Relay, cable, and DSL typically support the same 1500-byte MTU as Ethernet, which leaves 1472 bytes of ping data once you subtract the 20-byte IP header and the 8-byte ICMP header. (DSL links that use PPPoE are a common exception, with an MTU of 1492.) Wi-Fi can handle an even larger MTU, but since it usually connects to an Ethernet network and acts as a bridge, you won't want to go above 1472. To test your MTU, open a cmd prompt and enter this command.

ping -f -l 1000 www.yahoo.com [enter]

That says to ping yahoo.com (which allowed ICMP echo requests at the time of writing) with 1000 bytes of payload, and to set the DF bit. You can increase that number right on up to 1472…if you have cable or DSL with a 1500-byte MTU, you will get responses until you use 1473, and then you will get an error. When the path MTU is too small to carry the value you set with -l (plus 28 bytes of headers), and you have also set -f to tell routers not to fragment, you will get an error stating that the packet needs to be fragmented but the DF bit is set.
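Stepping -l up by hand works, but the same search takes about eleven probes as a binary search. The Python sketch below shows the arithmetic (payload = MTU - 20-byte IP header - 8-byte ICMP header) and the search. The probe is a simulated, hypothetical stand-in for running ping -f -l <size> against a host; it does not wrap the real command.

```python
IP_HEADER = 20    # bytes: IPv4 header without options
ICMP_HEADER = 8   # bytes: ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest -l value that fits in one unfragmented packet for a given MTU."""
    return mtu - IP_HEADER - ICMP_HEADER


def find_max_payload(probe, low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes probe(low) succeeds and that once a size fails, every larger
    size fails too (true for a fixed path MTU).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid needs fragmenting: the answer is smaller
    return low


def simulated_probe(path_mtu: int):
    """Hypothetical stand-in for ping -f -l size: True means a reply came back."""
    return lambda size: size <= max_ping_payload(path_mtu)


print(find_max_payload(simulated_probe(1500)))  # 1472: Ethernet-sized path
print(find_max_payload(simulated_probe(1492)))  # 1464: e.g. a PPPoE DSL link
```

Add the 28 bytes of headers back to the result to get the path MTU.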

This becomes useful when troubleshooting throughput (smaller MTU means more packets for the same amount of data, and therefore less throughput) and Windows authentication errors. Some VPN technologies have a lot of overhead in the layer three header <cough>Cisco</cough> which can break Kerberos authentication. If you have an MTU much below 1200, you will want to use the "set mtu" applet in the Cisco client to set a lower value, and force Kerberos to use TCP on your XP clients. Of course, modern Windows clients default to TCP for Kerberos on their own.

ping -w timeout for slooooow (high latency) links

If you want to tell ping to wait a specific number of milliseconds before giving up on a ping, use the -w switch. By default, Windows' ping will wait 4000 ms for a response.

ping -i TTL to change your TTL

If you want to find routers that don't respond to ping but might still send you a "TTL Expired In Transit" message, you can set a smaller TTL with the -i switch. Remember, though, that Windows always starts with a TTL far larger than needed to reach the destination, and tracert does a better job of using ICMP with incrementing TTLs to find the path. In other words, you will probably never use this switch.
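Here is a toy Python model of what -i (and tracert) exploit: each router decrements the TTL, and the router that takes it to zero sends back "TTL expired in transit". The router names are made up, and real tracert sends three probes per hop and reports timings, which this sketch skips.

```python
def send_with_ttl(path, ttl):
    """Simulate one echo request along path (routers..., target).

    Returns (who answered, what they said).
    """
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            return router, "TTL expired in transit"
    return path[-1], "Reply"


def trace(path, max_hops=30):
    """tracert's idea: raise the TTL one step at a time until the target replies."""
    hops = []
    for ttl in range(1, max_hops + 1):
        who, message = send_with_ttl(path, ttl)
        hops.append((ttl, who, message))
        if message == "Reply":
            break
    return hops


# Hypothetical path: two routers, then the target
path = ["gateway", "isp-router", "target"]
print(send_with_ttl(path, 2))    # ('isp-router', 'TTL expired in transit'), like ping -i 2
print(send_with_ttl(path, 128))  # ('target', 'Reply'): Windows' default TTL is plenty
for hop in trace(path):
    print(hop)
```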

Level 400-Arcane PING

There are some other switches available, though you will likely never use them. Some that are still around no longer actually work. They used to be useful for learning some of the hops our packets travel on the WAN, or testing TOS, but alas, no more. Traceroute will tell us the outgoing path, and to learn the return you can trace from the other side.

ping -r count

This would record the route for up to nine hops, storing each router's address in the IP header's options. We use tracert to see that information now.

ping -s count

This would record timestamps for up to four hops. Again, we use tracert for this now.
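The hop limits for these options fall out of IPv4 header arithmetic: the header can be at most 60 bytes, 20 of which are fixed, leaving 40 bytes for options (RFC 791). This short Python calculation assumes the timestamp option records address-and-time pairs, which is the variant that yields a four-hop limit.

```python
MAX_HEADER = 60    # bytes: IHL is 4 bits counting 32-bit words, so 15 * 4
FIXED_HEADER = 20
MAX_OPTIONS = MAX_HEADER - FIXED_HEADER  # 40 bytes left for options

# Record Route (ping -r): type, length, pointer = 3 bytes, then 4 bytes per address
record_route_hops = (MAX_OPTIONS - 3) // 4

# Timestamp (ping -s): type, length, pointer, overflow/flags = 4 bytes,
# then an 8-byte entry (4-byte address + 4-byte timestamp) per hop
timestamp_hops = (MAX_OPTIONS - 4) // 8

print(record_route_hops, timestamp_hops)  # 9 4
```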

ping -k host-list

The -k switch specifies a strict source route through the hosts in host-list, which can be handy for testing one particular path when you have multiple paths of equal cost.

ping -S srcaddr

On a multihomed host, use -S to specify the source ip.addr.

ping -4

If your host runs both IPv4 and IPv6, and the target name resolves to both an A and an AAAA record, use the -4 switch to specify that you want to use IPv4.

ping -6

If your host runs both IPv4 and IPv6, and the target name resolves to both an A and an AAAA record, use the -6 switch to specify that you want to use IPv6.
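Under the hood, -4 and -6 just pick which address family to resolve and use. This Python sketch makes the same choice with the standard library's socket.getaddrinfo; the function name is hypothetical, and what you get back for a real name depends on its DNS records and on whether your host has IPv6 configured.

```python
import socket


def resolve(host: str, family: int = socket.AF_UNSPEC) -> list[str]:
    """List addresses for host: AF_INET is roughly -4, AF_INET6 roughly -6."""
    try:
        infos = socket.getaddrinfo(host, None, family)
    except socket.gaierror:
        return []  # no address of that family (or the name doesn't resolve)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})


print(resolve("127.0.0.1", socket.AF_INET))  # ['127.0.0.1']
```

Passing socket.AF_INET6 with a real name would list only its IPv6 (AAAA) addresses, if it has any.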

ping -R

For IPv6 only, use the -R switch to specify using the routing header to test reverse route.

Sometimes it’s not that easy…

Our biggest challenge with using PING is that a lot of organisations are choosing to block or drop ICMP to reduce their exposure. Ping sweeps, oversized or invalid pings, and DDoS attacks using ping floods are all risks, and despite the fact that RFC 1122 requires hosts to accept and respond to ICMP echo requests, many don't. Sometimes your side will allow ping, and your target will allow ping, but something in between doesn't. As a result, don't assume that an unanswered ping means a host is down unless you know for certain that you have pinged it before and gotten a reply. Ping new hosts with a degree of skepticism until you are sure of the intervening networks. My recommendation is to follow the RFCs, but if you feel you must restrict ICMP traffic, then permit it as follows:

  • allow ICMP ECHO requests to your web server,
  • allow ICMP ECHO requests to your VPN concentrator,
  • allow ICMP ECHO requests to your Internet router,
  • allow ICMP ECHO requests throughout your internal network to all hosts,
  • allow the corresponding ICMP ECHO REPLIES to all of the above.

You can read more about that in my earlier post, "…and then there's complete paranoia." You can also see RFC 792 (the ICMP specification) and RFC 1812 (requirements for IPv4 routers) for more on why blocking pings is bad.

You might also be interested in our two-part series on MacGyvering Netstat as a protocol analyser. Here's part one and part two of that series. Okay, enough brain exertion for one post. Let's reset our gray matter by trying to figure out how they did this without the Wachowski Brothers' (er, siblings') incredible use of bullet-time cameras.

direct link for RSS and email subscribers…http://www.youtube.com/watch?v=-dcmDscwEcI

What tricks do you have for troubleshooting with ping?