Defaults, lurking in the dark

As the title suggests, today we’ll discuss one of the defaults on Cisco IOS platforms. The idea is twofold: to show behavior that looks unexpected at first glance, and to walk through troubleshooting it.

The topology for this article is the following:

All routers are Cisco 7200s running IOS 15.2(4)M11; R3 emulates a host (a server), while R4 plays the role of the Internet. The R1-Switch1 link uses a filter to emulate higher delay. Here are the relevant configuration snippets:

R4(config)# interface Loopback0
R4(config-if)# ip address 4.4.4.4 255.255.255.255
R4(config)# interface FastEthernet1/0
R4(config-if)# ip address 192.168.14.4 255.255.255.0
R4(config)# router eigrp 1
R4(config-router)# network 0.0.0.0
R3(config)# interface FastEthernet0/0
R3(config-if)# ip address 192.168.0.3 255.255.255.0
R3(config)# ip route 0.0.0.0 0.0.0.0 192.168.0.1
R2(config)# interface FastEthernet0/0
R2(config-if)# ip address 192.168.0.2 255.255.255.0
R2(config)# interface FastEthernet0/1
R2(config-if)# ip address 192.168.12.2 255.255.255.0
R2(config)# router eigrp 1
R2(config-router)# network 0.0.0.0
R1(config)# interface FastEthernet0/0
R1(config-if)# ip address 192.168.0.1 255.255.255.0
R1(config)# interface FastEthernet0/1
R1(config-if)# ip address 192.168.12.1 255.255.255.0
R1(config)# interface FastEthernet1/0
R1(config-if)# ip address 192.168.14.1 255.255.255.0
R1(config)# router eigrp 1
R1(config-router)# network 0.0.0.0

At some point R2 was out of service, and now it has been brought back online. A network engineer observes a heavy load of delay-sensitive, one-way downstream traffic from the Internet towards R3. The R1-Switch1 link is congested by this traffic, so the engineer decides to steer it through the newer R2. Upstream traffic is negligible, so R1 will remain the default gateway for the time being.

The most obvious way to implement this plan is to configure a static route on R1 pointing to R2. R1 would then forward the traffic to R2 according to the static /32 route, and R2 would use its connected /24 route to deliver it to R3. So the engineer set out to make the necessary configuration change:

R1(config)# ip route 192.168.0.3 255.255.255.255 192.168.12.2
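
The intended forwarding can be sketched with a longest-prefix-match lookup. This is a minimal illustration in Python, not IOS behavior verbatim: the route lists are a hand-written, simplified view of R1 and R2 (EIGRP-learned routes are omitted), and the `lookup` helper is a hypothetical name introduced for this example.

```python
import ipaddress

# Simplified, hand-written route tables (assumption: only the routes
# relevant to 192.168.0.3 are listed; EIGRP-learned routes are omitted).
R1_ROUTES = [
    ("192.168.0.0/24", "connected, FastEthernet0/0"),
    ("192.168.12.0/24", "connected, FastEthernet0/1"),
    ("192.168.14.0/24", "connected, FastEthernet1/0"),
    ("192.168.0.3/32", "static, via 192.168.12.2"),
]
R2_ROUTES = [
    ("192.168.0.0/24", "connected, FastEthernet0/0"),
    ("192.168.12.0/24", "connected, FastEthernet0/1"),
]


def lookup(routes, destination):
    """Return (prefix, description) of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), desc)
        for prefix, desc in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, desc = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), desc


# R1 prefers the /32 static route over its own connected /24 ...
print("R1:", lookup(R1_ROUTES, "192.168.0.3"))
# ... while R2 only has the connected /24 and should deliver directly.
print("R2:", lookup(R2_ROUTES, "192.168.0.3"))
```

On paper the plan is sound: the more specific route wins on R1, and R2 treats R3 as directly attached.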

However, as soon as the engineer applied the configuration, R3 went completely offline:

R3#ping 4.4.4.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.4.4.4, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
R4#ping 192.168.0.3 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
Packet sent with a source address of 4.4.4.4 
.....
Success rate is 0 percent (0/5)

Neither of the upstream routers is reachable either:

R3#ping 192.168.0.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
R3#ping 192.168.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

No issues are found on Switch1, and its CAM table looks perfectly fine. Let’s run traceroute on both R3 and R4:

R3#traceroute 4.4.4.4 
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1  *  *  * 
  2  *  *  * 
  3  *  *  * 
  <output trimmed>
R4#traceroute 192.168.0.3 source loopback 0
Type escape sequence to abort.
Tracing the route to 192.168.0.3
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.14.1 16 msec 12 msec 8 msec
  2 192.168.12.2 20 msec 20 msec 20 msec
  3 192.168.0.1 1388 msec 1300 msec 1296 msec
  4 192.168.12.2 1420 msec 1296 msec 1300 msec
  5 192.168.0.1 2696 msec 2604 msec 2696 msec
  6 192.168.12.2 2612 msec 2704 msec 2696 msec
<output trimmed>

The traceroute clearly shows a loop in the network; more precisely, R2 seems to misbehave by sending traffic back to R1. However, from R2’s point of view, it is doing everything right: the destination IP address falls into a connected subnet, an ARP request for R3’s address is issued, and an ARP reply is received:

R2#sho ip cef 192.168.0.3
192.168.0.3/32
  attached to FastEthernet0/0
R2#sho adjacency 192.168.0.3
Protocol Interface                 Address
IP       FastEthernet0/0           192.168.0.3(7)

The routing part looks perfectly normal, so let’s dig deeper into the ARP part:

R2#sho ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.0.1            53   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.2             -   ca02.224b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.3            54   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.12.1           55   ca01.221b.0006  ARPA   FastEthernet0/1
Internet  192.168.12.2            -   ca02.224b.0006  ARPA   FastEthernet0/1

Notice anything special? That’s right, 192.168.0.1 and 192.168.0.3 have the same MAC address, ca01.221b.0008, which belongs to R1. Well, it seems to be a bug of some sort, so let’s clear the ARP table to get rid of it:

R2#clear arp 
R2#sho ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.0.1             0   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.2             -   ca02.224b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.3             0   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.12.1            0   ca01.221b.0006  ARPA   FastEthernet0/1
Internet  192.168.12.2            -   ca02.224b.0006  ARPA   FastEthernet0/1
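
In a larger table this kind of duplication is easy to miss by eye. Here is a small Python sketch that scans `show ip arp` text for one MAC address mapped to several IP addresses on the same interface. The `shared_macs` helper is a hypothetical name, and the parser assumes the six-column row layout shown above; a shared MAC is not always a fault, so treat a hit as a hint rather than a verdict.

```python
from collections import defaultdict

# Output copied from R2 above.
ARP_OUTPUT = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.0.1             0   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.2             -   ca02.224b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.3             0   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.12.1            0   ca01.221b.0006  ARPA   FastEthernet0/1
Internet  192.168.12.2            -   ca02.224b.0006  ARPA   FastEthernet0/1
"""


def shared_macs(show_ip_arp):
    """Map (interface, mac) -> list of IPs, keeping only MACs with 2+ IPs."""
    by_mac = defaultdict(list)
    for line in show_ip_arp.splitlines():
        fields = line.split()
        # Data rows: Internet <ip> <age> <mac> <type> <interface>.
        # The header splits into more fields and is skipped.
        if len(fields) != 6 or fields[0] != "Internet":
            continue
        _, ip, _age, mac, _type, interface = fields
        by_mac[(interface, mac)].append(ip)
    return {key: ips for key, ips in by_mac.items() if len(ips) > 1}


for (interface, mac), ips in shared_macs(ARP_OUTPUT).items():
    print(f"{mac} on {interface} answers for: {', '.join(ips)}")
```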

Too persistent for a bug, don’t you think? A reminder for those who wonder why all the entries are refreshed rather than deleted: when the ARP table is cleared, IOS sends unicast (that’s right) ARP requests to refresh the entries. Flapping the interface on R2, however, gives a glimpse of hope:

R4#ping 192.168.0.3 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
Packet sent with a source address of 4.4.4.4 
...!.

It seems that the traffic initially flows as intended; however, after a while something disrupts the flow and rewrites R3’s ARP entry on R2 to point to R1.

Remember the default that is related to ARP and is usually advised to be turned off? If you guessed Proxy ARP, you’re correct. As you might remember, Proxy ARP was designed to provide connectivity for hosts without a default gateway or with an incorrect subnet:

Since the router knows that the target address (172.16.20.200) is on another subnet and can reach Host D, it replies with its own MAC address to Host A.

Cisco.com, Troubleshooting TechNotes
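
The decision described in the quote can be modeled roughly as follows. This is a deliberately simplified sketch, not the actual IOS algorithm: it assumes the router answers on behalf of a target whenever Proxy ARP is enabled and its best route to the target leaves through a different interface than the one the request arrived on. The function name `would_proxy_reply` and the route tuples are made up for illustration.

```python
import ipaddress


def would_proxy_reply(routes, target_ip, in_interface, proxy_arp_enabled=True):
    """Simplified model: reply if the best route to the target exits
    through an interface other than the one the ARP request came in on."""
    if not proxy_arp_enabled:
        return False
    target = ipaddress.ip_address(target_ip)
    candidates = [
        (ipaddress.ip_network(prefix), out_if)
        for prefix, out_if in routes
        if target in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return False  # no route to the target, nothing to proxy for
    _, out_if = max(candidates, key=lambda c: c[0].prefixlen)
    return out_if != in_interface


# R1 as seen from its FastEthernet0/0 segment (simplified route list).
before_static = [("192.168.0.0/24", "FastEthernet0/0")]
after_static = before_static + [("192.168.0.3/32", "FastEthernet0/1")]

print(would_proxy_reply(before_static, "192.168.0.3", "FastEthernet0/0"))
print(would_proxy_reply(after_static, "192.168.0.3", "FastEthernet0/0"))
print(would_proxy_reply(after_static, "192.168.0.3", "FastEthernet0/0",
                        proxy_arp_enabled=False))
```

Under this model, the /32 static route is exactly what makes R1 start answering ARP requests for 192.168.0.3 on the shared segment.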

However, that’s just one piece of the puzzle. Another one is related to ARP processing. According to RFC 826, a received ARP message either updates an existing entry for the sender’s protocol address or, if the receiver is the target of the message, creates a new one. In our case this means that R1’s reply on behalf of R3 does not even need to create an ARP entry on R2; unfortunately, it can simply “update” the one that is already there. Consider the following series of events:

  1. R2 receives a packet for R3 and finds no ARP entry for it;
  2. R2 sends an ARP request and receives a legitimate ARP reply from R3, thus allowing some pings to pass through;
  3. Since the ARP request is a broadcast, R1 also receives it, although after a certain delay due to the R1-Switch1 link characteristics;
  4. R1 has a /32 route for 192.168.0.3, so a Proxy ARP response is triggered, since R1 believes 192.168.0.3 resides in another subnet;
  5. The Proxy ARP response from R1 arrives at R2 later than the legitimate frame from R3, so the ARP entry on R2 is overwritten with R1’s MAC address;
  6. Alas, a permanent routing loop is created.
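
The race in the steps above can be reproduced with a toy model of R2’s ARP cache in which the last reply to arrive wins. This is a sketch under stated assumptions: the arrival times are invented for illustration, and `simulate` is a hypothetical helper, not a model of the real IOS ARP implementation.

```python
R1_MAC = "ca01.221b.0008"
R3_MAC = "ca03.225a.0008"
R3_IP = "192.168.0.3"


def simulate(replies):
    """Apply ARP replies to an empty cache in order of arrival time.

    Each reply is (arrival_ms, sender_ip, sender_mac). A later reply for
    the same IP overwrites the earlier one.
    """
    cache = {}
    history = []
    for arrival_ms, ip, mac in sorted(replies):
        cache[ip] = mac
        history.append((arrival_ms, ip, mac))
    return cache, history


# Assumed timings: R3 answers almost immediately, while R1's Proxy ARP
# reply crosses the delayed R1-Switch1 link and shows up much later.
with_proxy_arp = [
    (1300, R3_IP, R1_MAC),  # proxy reply from R1 (illustrative delay)
    (1, R3_IP, R3_MAC),     # legitimate reply from R3
]
cache, history = simulate(with_proxy_arp)
for arrival_ms, ip, mac in history:
    print(f"t={arrival_ms:>4} ms: {ip} -> {mac}")
print("final entry:", cache[R3_IP])

# With Proxy ARP disabled on R1, only R3's reply ever arrives.
cache_fixed, _ = simulate([(1, R3_IP, R3_MAC)])
print("without Proxy ARP:", cache_fixed[R3_IP])
```

Between the two arrivals the entry is correct, which matches the brief window in which a ping got through after the interface flap.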

Let’s check whether disabling Proxy ARP fixes the issue:

R1(config)#int f0/0
R1(config-if)#no ip proxy-arp 
R2#clear arp
R4#ping 192.168.0.3 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
Packet sent with a source address of 4.4.4.4 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 116/120/124 ms
R2#sho ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.0.1             1   ca01.221b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.2             -   ca02.224b.0008  ARPA   FastEthernet0/0
Internet  192.168.0.3             1   ca03.225a.0008  ARPA   FastEthernet0/0
Internet  192.168.12.1            1   ca01.221b.0006  ARPA   FastEthernet0/1
Internet  192.168.12.2            -   ca02.224b.0006  ARPA   FastEthernet0/1

At last, the scheme works as intended. It might seem that the issue is caused by the faulty R1-Switch1 link and cannot happen in a modern network, but consider the length of the L2 segment and the delays introduced by network equipment. Taking these factors into account, it is quite possible that the response from a Proxy ARP router will occasionally arrive later than the correct one, making the corresponding host unreachable. True, in designs built according to best practices there is little room for asymmetric routing at the first hop; however, there is always a chance that not everything was taken into account initially and some fine-tuning will be required. In short: disable Proxy ARP unless you really need it.

The idea and the explanation were inspired by Routing TCP/IP, Volume 1, 2nd Edition, specifically the section about troubleshooting static routes.

Kudos for review: Anastasiia Kuraleva, Maxim Klimanov
