Traceroute is the program that shows you the route over the network between two systems, listing all the intermediate routers a connection must pass through to get to its destination. It can help you determine why your connections to a given server might be poor, and can often help you figure out where exactly the problem is. It also shows you how systems are connected to each other, letting you see how your ISP connects to the Internet as well as how the target system is connected.
This tutorial was written for users of premium Usenet services, but can be useful for anyone wanting to learn to use traceroute.
The traceroute program is available on most computers which support networking, including most Unix systems, Mac OS X, and Windows 95 and later.
On a Unix system, including Mac OS X, run a traceroute at the command line like this:
traceroute news.server.name
If the traceroute command is not found, it may be present but not in your shell's search path. On some systems, traceroute can be found in /usr/sbin, which is often not in the default user path. In this case, run it with the full path:
/usr/sbin/traceroute news.server.name
On Mac OS X, if you would rather not open a terminal and use the command line, a GUI front-end for traceroute (and several other utilities) called Network Utility can be found in the Utilities folder within the Applications folder. Run it, click the “Traceroute” tab, and enter an address to run a trace to.
MTR is an alternate implementation of traceroute for Unix. It combines a trace with continuing pings of each hop to provide a more complete report all at once. It is available here.
If you're stuck with Windows, the command is called tracert. Open a DOS window and enter the command:
tracert news.server.name
You can also download VisualRoute, a graphical traceroute program available for Windows, Sparc Solaris, and Linux. VisualRoute helps you analyze the traceroute, and provides a nifty world map showing you where your packets are going (it's not always geographically accurate). View a screenshot (I have obscured my local addresses).
Here is some example traceroute output, from a Unix system:
traceroute to library.airnews.net (22.214.171.124), 30 hops max, 40 byte packets
 1 rbrt3 (126.96.36.199) 4.867 ms 4.893 ms 3.449 ms
 2 519.Hssi2-0-0.GW1.EWR1.ALTER.NET (188.8.131.52) 6.918 ms 8.721 ms 16.476 ms
 3 113.ATM3-0.XR2.EWR1.ALTER.NET (184.108.40.206) 6.323 ms 6.123 ms 7.011 ms
 4 192.ATM2-0.TR2.EWR1.ALTER.NET (220.127.116.11) 6.955 ms 15.400 ms 6.684 ms
 5 105.ATM6-0.TR2.DFW4.ALTER.NET (18.104.22.168) 49.105 ms 49.921 ms 47.371 ms
 6 298.ATM7-0.XR2.DFW4.ALTER.NET (22.214.171.124) 48.162 ms 48.052 ms 47.565 ms
 7 194.ATM9-0-0.GW1.DFW1.ALTER.NET (126.96.36.199) 47.886 ms 47.380 ms 50.690 ms
 8 iadfw3-gw.customer.ALTER.NET (188.8.131.52) 69.827 ms 68.112 ms 66.859 ms
 9 library.airnews.net (184.108.40.206) 174.853 ms 163.945 ms 147.501 ms
Here, I am tracing the route to library.airnews.net, the news server name at Airnews. The first line of output is information about what I'm doing; it shows the target system, that system's IP address, the maximum number of hops that will be allowed, and the size of the packets being sent.
Then we have one line for each system or router in the path between me and the target system. Each line shows the name of the system (as determined from DNS), the system's IP address, and three round trip times in milliseconds. The round trip times (or RTTs) tell us how long it took a packet to get from me to that system and back again, called the latency between the two systems. By default, three packets are sent to each system along the route, so we get three RTTs.
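To make the layout of a hop line concrete, here is a minimal Python sketch that splits one line of Unix traceroute output into its parts. The function name and the dictionary keys are my own choices for illustration, not part of any traceroute tool. It only handles lines that carry a name and an address; a probe with no reply (an asterisk in the output) is recorded as None.

```python
import re

# One hop line from Unix traceroute output, for example:
#   "5 105.ATM6-0.TR2.DFW4.ALTER.NET (18.104.22.168) 49.105 ms 49.921 ms 47.371 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([^)]+)\)\s+(.*)$")

# Either a round trip time followed by "ms", or an asterisk for a lost probe.
PROBE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms|(\*)")


def parse_hop(line):
    """Return hop number, name, address, and RTTs (in ms) for one hop line."""
    match = HOP_RE.match(line)
    if match is None:
        raise ValueError("not a hop line with name and address: %r" % line)
    hop, name, address, rest = match.groups()
    rtts = []
    for number, star in PROBE_RE.findall(rest):
        rtts.append(None if star else float(number))
    return {"hop": int(hop), "name": name, "address": address, "rtts_ms": rtts}


if __name__ == "__main__":
    print(parse_hop("1 rbrt3 (126.96.36.199) 4.867 ms 4.893 ms 3.449 ms"))
```

Lines that consist only of asterisks have no name or address, so this sketch rejects them with a ValueError rather than guessing.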
Sometimes, a line in the output may have one or two of the times missing, with an asterisk where it should be:
9 host230-142.uuweb.com (220.127.116.11) 12.619 ms * *
In this case, the machine is up and responding, but for whatever reason it did not respond to the second and third packets. This does not necessarily indicate a problem; in fact, it is usually normal, and just means that the system discarded the packet for some reason. Many systems do this normally. These are most often computers, rather than dedicated routers. Systems running Solaris routinely show an asterisk instead of the second RTT.
It's important to remember that an occasional timeout in a trace is not necessarily an indication of packet loss on your connection; assuming that it is is a common misconception. Since only three probes are sent to each hop, one dropped response is no big deal.
Sometimes you will see an entry with just an IP address and no name:
1 18.104.22.168 (22.214.171.124) 0.858 ms 1.003 ms 1.152 ms
This simply means that a reverse DNS lookup on the address failed, so the name of the system could not be determined.
If your trace ends in all timeouts, like this:
12 al-fa3-0-0.austtx.ixcis.net (126.96.36.199) 84.585 ms 92.399 ms 87.805 ms
13 * * *
14 * * *
15 * * *
This means that the target system could not be reached. More accurately, it means that the packets could not make it there and back; they may actually be reaching the target system but encountering problems on the return trip (more on this later). This is possibly due to some kind of problem, but it may also be an intentional block due to a firewall or other security measures, and the block may affect traceroute but not actual server connections.
A trace can end with one of several error indications that show why it cannot proceed. In this example, the router is indicating that it has no route to the target host:
4 rbrt3.exit109.com (188.8.131.52) 35.931 ms !H * 39.970 ms !H
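The !H is a “host unreachable” error message (it indicates that an ICMP error message was received). The trace will stop at this point. Possible ICMP error messages of this nature include:
!H: host unreachable
!N: network unreachable
!P: protocol unreachable
!S: source route failed
!F: fragmentation needed
!X: communication administratively prohibited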
Sometimes, with some versions of traceroute, you will see TTL warnings after the times:
6 qwest-nyc-oc12.above.net (184.108.40.206) 90.0 ms (ttl=251!) 90.0 ms (ttl=251!) 90.0 ms (ttl=251!)
This merely indicates that the TTL (time-to-live) value on the reply packet was different from what was expected. This probably means that your route is asymmetric (see below). This is not shown by all versions of traceroute, and can be safely ignored.
The output of the Windows version of traceroute is slightly different from the Unix examples (I have censored my router's name and IP address from the listing):
Tracing route to news-east.usenetserver.com [220.127.116.11]
over a maximum of 30 hops:
  1 3 ms 3 ms 2 ms my.router [xxx.xxx.xx.xxx]
  2 35 ms 36 ms 35 ms rbtserv5.exit109.com [18.104.22.168]
  3 36 ms 37 ms 36 ms rbrt3.exit109.com [22.214.171.124]
  4 41 ms 40 ms 41 ms 571.Hssi5-0.GW1.EWR1.ALTER.NET [126.96.36.199]
  5 42 ms 44 ms 52 ms 113.ATM2-0.XR1.EWR1.ALTER.NET [188.8.131.52]
  6 43 ms 41 ms 41 ms 193.at-1-0-0.XR1.NYC9.ALTER.NET [184.108.40.206]
  7 61 ms 41 ms 41 ms 181.ATM6-0.BR2.NYC9.ALTER.NET [220.127.116.11]
  8 41 ms 42 ms 47 ms 18.104.22.168
  9 47 ms 42 ms 42 ms so-6-0-0.mp2.NewYork1.level3.net [22.214.171.124]
 10 65 ms 63 ms 68 ms loopback0.hsipaccess1.Atlanta1.Level3.net [126.96.36.199]
 11 104 ms 68 ms 80 ms news-east.usenetserver.com [188.8.131.52]
Trace complete.
The Windows version does not show ICMP error messages in the manner described above. Errors are shown as (possibly ambiguous or confusing) text. For example, a “host unreachable” error will be shown as “Destination net unreachable” on Windows.
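The difference in layout is easiest to see by parsing it. The following Python sketch reads one hop line in the Windows format, where the three times come first and the name and bracketed address come last. The function name and dictionary keys are illustrative assumptions. A hop shown with only an address gets a name of None, and the sketch does not handle the text messages Windows prints for timeouts or errors.

```python
import re

# One probe result: a time such as "35 ms" (or "<1 ms"), or "*" for no reply.
PROBE = r"(<?\d+\s+ms|\*)"

# Hop number, three probe results, then a host and an optional [address].
WIN_HOP_RE = re.compile(
    r"^\s*(\d+)\s+" + PROBE + r"\s+" + PROBE + r"\s+" + PROBE
    + r"\s+(\S+)(?:\s+\[([^\]]+)\])?\s*$"
)


def _rtt(token):
    """Convert "35 ms" to 35; "<1 ms" is recorded as 1; "*" becomes None."""
    if token == "*":
        return None
    return int(token.lstrip("<").split()[0])


def parse_tracert_hop(line):
    """Return hop number, name, address, and RTTs (in ms) for one tracert line."""
    match = WIN_HOP_RE.match(line)
    if match is None:
        raise ValueError("not a tracert hop line: %r" % line)
    hop, p1, p2, p3, host, address = match.groups()
    if address is None:
        # Only an address was printed, so there is no name for this hop.
        name, address = None, host
    else:
        name = host
    return {
        "hop": int(hop),
        "name": name,
        "address": address,
        "rtts_ms": [_rtt(p1), _rtt(p2), _rtt(p3)],
    }


if __name__ == "__main__":
    print(parse_tracert_hop("8 41 ms 42 ms 47 ms 18.104.22.168"))
```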
The rest of the examples will be in Unix format.
Any connection over the Internet actually depends on two routes: the route from your system to the server, and the route from that server back to your system. These routes may be (and often are) completely different (asymmetric). If they differ, a problem in your connection could be a problem with either the route to the server, or with the route back from the server. A problem reflected in a traceroute output may actually not lie with the obvious system in your trace; it may rather be with some other system on the reverse route back from the system that looks, from the trace, to be the cause of the problem.
So a traceroute from you to the server is only showing you half of the picture. The other half is the return route or reverse route. So how can you see that route?
In the good old days, you could use source routing with traceroute to see the reverse trace back to you from a host. The idea is to specify what is called a loose source route, which specifies a system your packets should pass through before proceeding on to their destination.
The ability to use loose source routing to see the reverse route could be pretty handy. Unfortunately, source routing has a great potential for abuse, and therefore most network administrators block all source-routed packets at their border routers. So, in practice, loose source routes aren't going to work.
These days, the only hope you likely have of running a reverse traceroute is if the system you want to trace from has a traceroute facility on their web site. Many systems, and Usenet providers in particular, have a web page where you can run a traceroute from their system back to yours. In combination with your trace to their system, this can give you the other half of the picture. I have a list of Usenet provider traceroute pages here.
It can also be useful to see the result of a traceroute from somewhere else on the net. There are many public traceroute pages available which let you trace from those systems to other systems or back to your own system. There is an exhaustive list at www.traceroute.org.
Since many systems are multi-homed (have more than one connection to the Internet), you may have to run traces to a system from multiple locations in order to “see” all of its connections. In addition to diagnosing technical problems, this can be useful to determine what kind of connections a system has to the Internet.
If your trace to a system ends in timeouts, and never completes, there could be a problem. (The other explanation is that a system is blocking traceroute attempts, either by filtering all ICMP messages or by other means.) Your next step is to figure out where the problem is.
Well, obviously, if the trace stops at a particular system and can't go any further, then that system is where the problem lies, right? Possibly, but not necessarily.
If your traceroute ends in timeouts at a certain system, it's likely that either the connection between that system and the next system on the route, or the next system itself, is the source of the problem. The system may be down, or the network connecting them may be down. You may just have to wait for the problem to be fixed, especially if the problem system is not at your ISP and thus you aren't a paying customer of that network.
The problem could, however, not be with that system. Recall that the packets must travel from your system to the router and back again before you can see the results, and that the return route may be different from the forward route. Thus, the problem could lie somewhere on the return route between the system giving the timeouts and your own system, and that problem may not be reflected in the previous parts of the trace because the route may be entirely different.
Let's say you have a timeout like this:
16 c1-pos5-3.snjsca1.home.net (184.108.40.206) 136.612 ms 129.795 ms 129.133 ms
17 bb1-pos6-0-0.rdc1.sfba.home.net (220.127.116.11) 130.473 ms 137.609 ms 134.162 ms
18 * * *
The last reachable system on the route is at hop 17. The problem may be with the system at hop 18, or with the network connection between hops 17 and 18. Or it may be on the return route. It's very possible that the routers at hop 17 and hop 18 have different return routes to your system. The return route from 17 may work just fine, while the return route from 18 has a problem. That problem could be with that system, or it could be a totally different system, many hops away. It could even be a problem at your own ISP. The only way to tell is to see the reverse trace. A reverse trace from hop 17 would be useful here as well, to verify that the routes are indeed different. Of course, it may be difficult or impossible to obtain traceroutes from those systems, because the network administrator at home.net would have to run them for you, and is probably too busy to worry about such a request.
In this case, you can try running traces to the target system from various other places (use the list at traceroute.org) to see if it is reachable from elsewhere. In the above example, if you knew what router was normally at hop 18 (from seeing it in previous traces), you could try a trace to that router from another site.
If your route to a server is very long, performance is going to suffer. A long route can be due to less-than-optimal configuration within some network along the way. Take a look at this route:
traceroute to 18.104.22.168 (22.214.171.124), 30 hops max, 40 byte packets
 1 main2-249-97.iad.above.net (126.96.36.199) 1.143 ms 0.559 ms 0.382 ms
 2 core1-main2-oc3-1.iad.above.net (188.8.131.52) 0.574 ms 0.886 ms 0.429 ms
 3 sjc-iad-oc12-1.sjc.above.net (184.108.40.206) 82.134 ms 82.537 ms 82.158 ms
 4 sl-gw8-sj-0-1.sprintlink.net (220.127.116.11) 82.523 ms 82.383 ms 82.949 ms
 5 sl-bb12-sj-6-0.sprintlink.net (18.104.22.168) 82.348 ms 82.762 ms 83.029 ms
 6 sl-bb10-sj-8-0.sprintlink.net (22.214.171.124) 83.346 ms 83.012 ms 83.006 ms
 7 sl-bb10-rly-6-0.sprintlink.net (126.96.36.199) 136.004 ms 135.804 ms 136.274 ms
 8 sl-bb6-dc-0-0-0.sprintlink.net (188.8.131.52) 137.625 ms 137.204 ms 136.794 ms
 9 gip-dc-2-fddi1-0.gip.net (184.108.40.206) 137.344 ms 138.156 ms 139.390 ms
10 gip-arch-1-atm2-0-0-132-atm.gip.net (220.127.116.11) 311.850 ms 325.246 ms 285.607 ms
11 gip-telehouse-1-atm0-0-0-333-atm.gip.net (18.104.22.168) 281.472 ms 291.957 ms 314.661 ms
12 gip-linx-fddi0.gip.net (22.214.171.124) 277.425 ms 297.364 ms 248.030 ms
13 linx-gw1.UK.EU.net (126.96.36.199) 291.800 ms 213.447 ms 221.377 ms
14 Nyk-nr01.NY.US.EU.net (188.8.131.52) 266.863 ms 301.220 ms 320.008 ms
15 nyc-core-02.inet.qwest.net (184.108.40.206) 206.191 ms 233.207 ms *
16 nyc-core-03.inet.qwest.net (220.127.116.11) 235.085 ms 270.805 ms 252.668 ms
17 nyc-core-01.inet.qwest.net (18.104.22.168) 281.931 ms 277.519 ms 278.152 ms
18 wdc-core-02.inet.qwest.net (22.214.171.124) 265.548 ms 233.789 ms 219.698 ms
19 wdc-core-03.inet.qwest.net (126.96.36.199) 200.913 ms 225.456 ms 246.335 ms
20 atl-core-01.inet.qwest.net (188.8.131.52) 237.049 ms 253.304 ms 215.435 ms
21 atl-edge-04.inet.qwest.net (184.108.40.206) 234.406 ms 289.490 ms 300.829 ms
22 220.127.116.11 (18.104.22.168) 296.876 ms 333.235 ms 272.397 ms
23 Adelphia-pvc55-t3-gw.aibusiness.net (22.214.171.124) 287.180 ms 268.736 ms 276.649 ms
24 surf4-145-237.pbc.adelphia.net (126.96.36.199) 382.868 ms 420.165 ms 393.398 ms
In this example, both the source and destination of the trace are in the United States. However, note that between hops 11 and 14, the route goes to London and back (LINX is the London Internet Exchange). Obviously, this is a problem; there are two transatlantic hops here which are completely unnecessary. Sprintlink is handing the traffic off to gip.net, which is taking it across the ocean before giving it to Qwest.
Recall that the three numbers given on each line of output show the round trip times (latency) in milliseconds. Smaller numbers generally mean better connections. As the latency of a connection increases, interactive response suffers. Download speed can also suffer as a result of high latency (due to TCP windowing), or as a result of whatever is actually causing that high latency.
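The TCP windowing effect can be put into rough numbers. A sender can have at most one window of unacknowledged data in flight per round trip, so a single connection cannot move more than the window size divided by the RTT. The Python sketch below is a back-of-the-envelope calculation under my own assumptions: a fixed 65,535-byte window (the largest possible without window scaling), no packet loss, and no slow start. The function name is illustrative.

```python
def max_tcp_throughput(window_bytes, rtt_ms):
    """Upper bound, in bytes per second, for one TCP connection.

    Assumes a fixed window and ignores loss, slow start, and link speed.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 1000.0 / rtt_ms


if __name__ == "__main__":
    window = 65535  # assumed window size in bytes
    for rtt in (40, 140, 400):
        rate = max_tcp_throughput(window, rtt)
        print("RTT %3d ms: at most %.0f bytes/sec" % (rtt, rate))
```

The point of the calculation is only that the ceiling falls in proportion as the RTT rises; real connections usually fall short of it for other reasons.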
Typically, a modem connection's inherent latency will be around 120-130ms. The latency on an ISDN line is usually around 40-45ms. If you use a connection of this type, you won't see any better than these numbers.
If you see, in a trace output, a large “jump” in latency from one hop to the next, that could indicate a problem. It could be a saturated (overused) network link; a slow network link; an overloaded router; or some other problem at that hop. Of course, it could also be a problem anywhere on the return route from the high-latency hop as well. You can use the ping program (described below) to get a better idea of the latency as well as the packet loss to a given site or router; traceroute only does three probes per router (by default), which isn't a very good sample on its own.
A jump in latency can also indicate a long hop, such as a cross-country link or one that crosses an ocean. A long line is naturally going to have higher latency than a short one. For example:
 4 core1.telehouse.level3.net (188.8.131.52) 2.355 ms 4.932 ms 3.473 ms
 5 core1.London1.Level3.net (184.108.40.206) 2.550 ms 1.934 ms 3.110 ms
 6 atm10-0-100.core1.NewYork1.Level3.net (220.127.116.11) 77.629 ms 75.664 ms 75.351 ms
The link between hops 5 and 6 is transatlantic, and thus is adding more than 70ms to the latency. This is normal.
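Spotting a jump like this can be automated. The Python sketch below compares the best (lowest) RTT at each hop with the best RTT at the previous answering hop and reports any increase above a threshold. The 50 ms threshold and the function name are arbitrary assumptions of mine; the sample data is the three-hop example above. Like the trace itself, it cannot tell a long line from a congested one.

```python
def latency_jumps(hops, threshold_ms=50.0):
    """Return (from_hop, to_hop, increase_ms) for each large step in best RTT.

    hops is a list of (hop_number, [rtt_ms, ...]) in path order;
    None in the RTT list marks a probe that got no reply.
    """
    jumps = []
    previous = None  # (hop_number, best_rtt) of the last hop that answered
    for number, rtts in hops:
        replies = [r for r in rtts if r is not None]
        if not replies:
            continue  # a hop that never answered gives nothing to compare
        best = min(replies)
        if previous is not None and best - previous[1] >= threshold_ms:
            jumps.append((previous[0], number, round(best - previous[1], 3)))
        previous = (number, best)
    return jumps


hops = [
    (4, [2.355, 4.932, 3.473]),
    (5, [2.550, 1.934, 3.110]),
    (6, [77.629, 75.664, 75.351]),
]

if __name__ == "__main__":
    print(latency_jumps(hops))
```

For this data the only step reported is the one between hops 5 and 6, a little over 73 ms.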
One example of “weirdness” that you might see in traceroute output is exposure of private address space. Certain ranges of IP addresses are reserved for private, non-Internet use. These address ranges are not assigned to anyone, and are open for use by any system. They cannot be routed over the Internet, and thus are for internal use only. Sending traffic between private address space and outside networks must be done via internal routing or address translation.
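The reserved private address ranges are:
10.0.0.0 through 10.255.255.255
172.16.0.0 through 172.31.255.255
192.168.0.0 through 192.168.255.255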
Private addresses should never be visible over the Internet. But, sometimes you will see them in traceroute output. If they appear within your local network, this is okay; private addresses inside your own network can be visible to you. If, however, they appear within someone else's network, this can be problematic:
10 ebay-2-gw.customer.ALTER.NET (18.104.22.168) 114.204 ms 123.232 ms 120.957 ms
11 10.1.2.5 (10.1.2.5) 110.693 ms 114.475 ms 107.747 ms
12 * * *
13 * * *
The private address 10.1.2.5 within another network should not be visible to us. In this case, though, it is the last visible address before the trace ends in timeouts.
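If you want to check the addresses in a trace mechanically, Python's standard ipaddress module makes it simple. This sketch tests an address against the three private blocks defined in RFC 1918 (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). The function name is my own; the explicit list is used instead of the module's broader is_private attribute, which also covers other reserved ranges.

```python
import ipaddress

# The three private IPv4 blocks set aside by RFC 1918.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the IPv4 address falls in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


if __name__ == "__main__":
    for address in ("18.104.22.168", "10.1.2.5"):
        print(address, is_rfc1918(address))
```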
Visibility of private IP addresses doesn't necessarily (or even usually) mean that the route does not work. It is often simply the way the administrators of the target network have set up their system. In fact, the output above, despite the private IP address and the timeouts, shows a route that works perfectly well for web access.
However, a route which includes private addresses is difficult to troubleshoot. You can't ping the private routers to see if there is any packet loss. You can't trace directly to them from other sites. And in general, they show a certain level of cluelessness in how the network is set up.
Here is another example of routing weirdness:
11 USW-phx-gw.customer.ALTER.NET (22.214.171.124) 142.840 ms 151.245 ms 129.564 ms
12 126.96.36.199 (188.8.131.52) 127.569 ms vdsla121.phnx.uswest.net (184.108.40.206) 185.214 ms *
13 vdsla121.phnx.uswest.net (220.127.116.11) 442.912 ms 205.956 ms 221.537 ms
14 vdsla121.phnx.uswest.net (18.104.22.168) 164.728 ms 186.997 ms 190.414 ms
15 vdsla121.phnx.uswest.net (22.214.171.124) 306.964 ms 189.152 ms 221.288 ms
All looks well until hop 12. At that hop, the first packet is replied to from 126.96.36.199, but the second and third (which should be coming from the same place) are being returned from a different address, and timing out, respectively. After that, hops 13, 14, and 15 are all showing the same address! Since the response times are actually different, though, we can guess that they are, in reality, different systems. The trace ends normally at hop 15.
So what the heck is going on here? US West says this is a security measure, to hide the details of their internal network. The last few hops all return the address of the end-user's ADSL line, rather than their actual address. I'm not entirely sure what kind of “security” this is meant to provide.
Obviously, this makes any kind of troubleshooting of this connection next to impossible. If you encounter problems in this situation, the best you can do is contact the network provider and let them deal with it.
Sometimes you might see a route start “looping” back and forth between two routers, until the 30-hop limit is reached. This is a routing loop. This usually means that one router has lost communication (BGP) with another, and thus has dropped that route. Since the router has lost the route it needs, it sends the packet back where it came from, thinking maybe that is the best route. That router knows better and sends it back to the other one, over and over. Here's an example of a loop:
14 hou-core-03.inet.qwest.net (188.8.131.52) 165.484 ms 164.335 ms 175.928 ms
15 hou-core-02.inet.qwest.net (184.108.40.206) 162.291 ms 172.713 ms 171.532 ms
16 kcm-core-01.inet.qwest.net (220.127.116.11) 212.967 ms 193.454 ms 199.457 ms
17 dal-core-01.inet.qwest.net (18.104.22.168) 206.296 ms 212.383 ms 189.592 ms
18 kcm-core-01.inet.qwest.net (22.214.171.124) 210.201 ms 225.674 ms 208.124 ms
19 dal-core-01.inet.qwest.net (126.96.36.199) 189.089 ms 201.505 ms 201.659 ms
20 kcm-core-01.inet.qwest.net (188.8.131.52) 334.19 ms 320.39 ms 245.182 ms
21 dal-core-01.inet.qwest.net (184.108.40.206) 218.519 ms 210.519 ms 246.635 ms
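A loop of this kind has a simple signature: two different routers alternating, A, B, A, B. The Python sketch below looks for that pattern in the list of router names from a trace and returns the position where it starts. The function name is an assumption for illustration, and the check only catches two-router loops of the sort shown here, not longer cycles.

```python
def find_loop(names):
    """Return the index where an A, B, A, B alternation starts, or None.

    names is the list of router names (or addresses) in hop order.
    """
    for i in range(len(names) - 3):
        a, b = names[i], names[i + 1]
        # Require two distinct routers so that one name repeated on
        # consecutive hops is not mistaken for a loop.
        if a != b and names[i + 2] == a and names[i + 3] == b:
            return i
    return None


route = [
    "hou-core-03.inet.qwest.net",
    "hou-core-02.inet.qwest.net",
    "kcm-core-01.inet.qwest.net",
    "dal-core-01.inet.qwest.net",
    "kcm-core-01.inet.qwest.net",
    "dal-core-01.inet.qwest.net",
    "kcm-core-01.inet.qwest.net",
    "dal-core-01.inet.qwest.net",
]

if __name__ == "__main__":
    start = find_loop(route)
    if start is not None:
        print("loop between", route[start], "and", route[start + 1])
```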
The ping program is used to determine whether a route is experiencing packet loss, and to measure latency.
On a Unix SVR4 system (such as Solaris), use the command:
ping -s news.server.name
On BSD Unix, Mac OS X, or Linux, use:
ping news.server.name
And if you're stuck with Windows, open a DOS window and type:
ping -t news.server.name
The output will consist of one line per ping (one per second), giving you the round-trip response time (RTT, or latency). The lower, the better. Note that if you can't traceroute to a system due to administrative blocking, you may not be able to ping it either.
Let the pings go for a while, then press control-C to stop them. You'll see a summary like this, on Unix:
----usenet73.supernews.com PING Statistics----
76 packets transmitted, 76 packets received, 0% packet loss
round-trip (ms) min/avg/max = 138/144/179
Or like this, on Windows:
Ping statistics for 220.127.116.11:
    Packets: Sent = 73, Received = 73, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 132ms, Maximum = 164ms, Average = 139ms
First you see an indication of packet loss. The more loss you see, the worse your connection will be, because every lost packet on a data connection must be retransmitted. If you see 20% packet loss, it's going to be painful. This number is more meaningful if you let ping run for a while; if you only do five pings, 20% packet loss means it dropped one packet, which could be no big deal. Let it go for a while.
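The effect of sample size is easy to see with the arithmetic written out. This small Python sketch (the function name is mine) computes the loss percentage the same way the ping summary does, and shows how much a single dropped packet weighs in a short run compared with a longer one.

```python
def packet_loss_percent(sent, received):
    """Return the percentage of packets that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # One dropped packet in a run of 5 pings versus a run of 76 pings.
    print("%.1f%% loss" % packet_loss_percent(5, 4))
    print("%.1f%% loss" % packet_loss_percent(76, 75))
```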
Latency times are important for performance; the lower the better. If you play online games like Quake you are probably familiar with this concept. For Usenet reading, this will matter most if you read news online, interactively, staying connected to the server the whole time. If you use an offline newsreader which downloads articles all at once and lets you read them from your local disk, latency is much less important (it can affect sustained download speeds, but that is beyond the scope of this document). What the output is showing you is the minimum, average, and maximum latency times seen during the ping run. A few systems may include a fourth number showing the standard deviation.
If you see packet loss on a connection, you can use ping with your traceroute output to find the source of the loss. Start by pinging the next to last router in the trace. If you still see packet loss, ping the one before that. Eventually the packet loss will disappear, and you have found the part of the path where the problem begins.
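The walk backward along the path can be sketched in Python as follows. Here measure_loss is a hypothetical callable that you would supply yourself (for example, something that runs ping against an address and returns the percent loss); nothing in this sketch sends any packets. The function returns the router nearest to you that still shows loss, or None if even the next-to-last router is clean.

```python
def first_lossy_hop(routers, measure_loss, tolerance=0.0):
    """Walk backward from the next-to-last router until loss disappears.

    routers: addresses from the trace in path order (nearest first),
             ending with the next-to-last router, not the target itself.
    measure_loss: hypothetical callable returning percent loss for an address.
    tolerance: loss at or below this percentage counts as no loss.
    """
    lossy = None
    for address in reversed(routers):
        if measure_loss(address) > tolerance:
            lossy = address  # still seeing loss; keep moving toward home
        else:
            break  # loss has disappeared; the previous router is the start
    return lossy


if __name__ == "__main__":
    # Made-up loss figures standing in for real ping runs.
    sample = {"router-a": 0.0, "router-b": 0.0, "router-c": 12.0, "router-d": 15.0}
    path = ["router-a", "router-b", "router-c", "router-d"]
    print(first_lossy_hop(path, sample.get))
```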
Note, however, that as with other problems, the cause of the loss could be the first router on the path showing packet loss, or it could be anywhere on the return path from that router. Remember that the return path can be totally different from what you see in your trace output. But, this gives you a good place to start pointing fingers.
You don't need to worry about the low-level details of how traceroute works in order to use it. But, if you're interested, here they are.
Traceroute works by causing each router along a network path to return an ICMP (Internet Control Message Protocol) error message. An IP packet contains a time-to-live (TTL) value which specifies how long it can go on its search for a destination before being discarded. Each time a packet passes through a router, its TTL value is decremented by one; when it reaches zero, the packet is dropped, and an ICMP Time-To-Live Exceeded error message is returned to the sender.
The traceroute program sends its first group of packets with a TTL value of one. The first router along the path will therefore discard the packet (its TTL is decremented to zero) and return the TTL Exceeded error. Thus, we have found the first router on the path. Packets can then be sent with a TTL of two, and then three, and so on, causing each router along the path to return an error, identifying it to us. Eventually either the final destination is reached, or the maximum value (default is 30) is reached and the traceroute ends.
At the final destination, a different error is returned. Most traceroute programs work by sending UDP datagrams to some random high-numbered port where nothing is likely to be listening. When that final system is reached, since nothing is answering on that port, an ICMP Port Unreachable error message is returned, and we are finished.
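The mechanism is easier to follow in a toy model. The Python sketch below does not touch the network at all; it simulates a path as a list of names and shows which system answers a probe sent with a given TTL, and with which ICMP message. The router names and function names are invented for illustration, and the model assumes the UDP method with nothing listening on the target port.

```python
def send_probe(path, ttl):
    """Simulate one probe along path (routers in order, target last).

    Returns (responder, icmp_message).
    """
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            return router, "time exceeded"
    # The probe survived every router, so it reached the target,
    # where nothing is listening on the chosen UDP port.
    return path[-1], "port unreachable"


def simulate_traceroute(path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... until the target answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "port unreachable":
            break  # the final destination has been reached
    return hops


if __name__ == "__main__":
    path = ["router-a", "router-b", "router-c", "target"]  # made-up names
    for ttl, responder, message in simulate_traceroute(path):
        print(ttl, responder, message)
```

A real implementation also has to time each reply and give up on probes that get no answer, which is where the RTTs and asterisks in the output come from.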
The Windows version of traceroute uses ICMP Echo Request packets (ping packets) rather than UDP datagrams. In practice, this seems to make little difference in the outcome, unless a system along the route is blocking one type of traffic but not the other.
In the unlikely event that some program happens to be listening on the UDP port that traceroute is trying to contact, the trace will fail at the last hop. You can run another trace using ICMP Echo Requests, which will probably succeed, or specify a different target port for the UDP datagrams.
A few versions of traceroute, such as the one on Solaris, allow you to choose either method (high-port UDP or ICMP echo requests).