Network slowness is one of those complaints that can mean just about anything. Users say “the internet is slow,” and suddenly you’re chasing a dozen possible causes across your entire infrastructure. Recently, I worked through one of these cases, and the root cause turned out to be something I hadn’t encountered before: a bloated IPv6 Neighbor Discovery (ND) cache on a distribution switch silently degrading performance across the campus.

The Complaint

It started with tickets from both wireless and wired users reporting sluggish connectivity. Pages were loading slowly, applications were timing out, and VoIP quality had dropped noticeably. The usual suspects.

The Standard Playbook

When slowness hits, there’s a mental checklist most network engineers run through:

Start at the WAN edge. Are the uplinks saturated? Any interface errors or packet drops? In our case, the WAN links looked healthy. Bandwidth utilization was within normal range, no CRC errors, no input/output drops worth worrying about.

Move to the core. Check CPU and memory utilization on the core switches. High CPU on a core switch can cause control plane instability, slow routing convergence, and degraded forwarding performance. Our core switches showed slightly elevated CPU, but nothing that screamed “problem here.”

Check the distribution layer. Interface stats, spanning tree topology, any flapping links. This is where things got interesting.

Look at the access layer. Client DHCP times, AP association counts, channel utilization for wireless. We checked these too, but the problem clearly wasn’t isolated to a single access switch or a single AP.
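On Cisco IOS gear, the checklist above maps roughly onto a handful of show commands. Interface names here are placeholders, and exact command availability varies by platform and role:

```
! WAN edge: utilization, errors, drops
show interfaces GigabitEthernet0/0
! Core: control plane load
show processes cpu sorted
show memory statistics
! Distribution: link state and STP health
show interfaces status
show spanning-tree summary
! Access: client-facing symptoms (if the switch serves DHCP)
show ip dhcp server statistics
```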

Narrowing It Down

What caught my attention was the distribution switch. While the CPU wasn’t pegged, there was a noticeable delay in CLI responses. Running show processes cpu sorted revealed that the IPv6 ND process was consuming more CPU cycles than expected. That’s unusual for a distribution switch that shouldn’t be doing heavy IPv6 lifting.

I pulled up the IPv6 neighbor table and counted entries for our internal prefix:

di_gw2#sh ipv6 neighbors | count 2001
Number of lines which match regexp = 138732
di_gw2#

138,732 entries. That’s not a neighbor table; that’s a phone book.

Understanding the Problem

IPv6 Neighbor Discovery Protocol (NDP) is the IPv6 equivalent of ARP in IPv4. When a device needs to communicate with another IPv6 host on the same link, it sends a Neighbor Solicitation (NS) message, and the target replies with a Neighbor Advertisement (NA). That response gets cached in the neighbor table so the switch doesn’t have to resolve the same address repeatedly.

The problem is what happens when those entries go stale but never get cleaned up. In a healthy network, neighbor cache entries cycle through states: INCOMPLETE, REACHABLE, STALE, DELAY, and PROBE. Under normal conditions, stale entries should eventually time out and be removed. But in our case, they weren’t.
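To see why the aging matters, here’s a toy Python model of a neighbor cache. This is an illustration only, not the IOS implementation; the timeout and the traffic pattern are made-up numbers:

```python
# Toy model of an ND neighbor cache, for illustration only (this is not
# the IOS implementation; the timeout and traffic pattern are made up).
STALE_TIMEOUT = 14400  # seconds an entry may sit stale before removal

class NeighborCache:
    def __init__(self):
        self.entries = {}  # ipv6 address -> last-confirmed timestamp

    def learn(self, addr, now):
        # An NS/NA exchange (re)confirms the entry and resets its clock.
        self.entries[addr] = now

    def reap(self, now):
        # Healthy behavior: remove entries stale longer than the timeout.
        # The failure mode described above amounts to this never running.
        for addr in [a for a, t in self.entries.items() if now - t > STALE_TIMEOUT]:
            del self.entries[addr]

# Simulate a day of privacy-extension hosts churning through temporary
# addresses: 100 hosts, each showing up with a fresh address every hour.
healthy, buggy = NeighborCache(), NeighborCache()
for hour in range(24):
    now = hour * 3600
    for host in range(100):
        addr = f"2001:db8::{hour:x}:{host:x}"
        healthy.learn(addr, now)
        buggy.learn(addr, now)
    healthy.reap(now)  # the buggy switch never reaps

print(len(buggy.entries), len(healthy.entries))  # 2400 vs. 500
```

With reaping in place the table stabilizes at the working set of live addresses; without it, the table grows without bound, exactly the shape of the problem we hit.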

What was happening here was that the switch was accumulating neighbor entries for IPv6 addresses it had seen over time: addresses from hosts that had long since left the network, temporary addresses generated by privacy extensions, and addresses from devices that had cycled through multiple SLAAC allocations. The neighbor table kept growing, and the switch was spending an increasing share of CPU cycles maintaining, searching through, and attempting to resolve entries in this massive table. The result was general network sluggishness: the switch’s control plane was bogged down with ND processing instead of handling real traffic efficiently.

The Fix

The immediate fix was straightforward:

di_gw2#clear ipv6 neighbors

Within seconds, CLI responsiveness improved. The CPU utilization dropped back to normal. Users started confirming that connectivity was back to full speed. The neighbor table rebuilt itself organically with only the entries that were actually needed, settling at a few hundred entries instead of 138,000+.

The Bug

This behavior was traced to Cisco bug CSCws15533, where the IPv6 neighbor cache fails to properly age out stale entries under certain conditions. The entries accumulate over time, gradually degrading switch performance until the control plane is effectively overwhelmed.

This is the kind of bug that doesn’t announce itself. There’s no syslog message screaming “your neighbor table is full.” The switch just gets progressively slower, and because the degradation is gradual, it often gets blamed on something else entirely: the WAN, the wireless controller, or the application itself.

Preventing Recurrence

Clearing the neighbor cache is a band-aid. The real question is: how do you prevent this from happening again?

Tune the ND cache timers. You can configure how aggressively the switch expires stale neighbor entries:

interface Vlan100
 ipv6 nd cache expire 7200 refresh

The ipv6 nd cache expire command sets how long (in seconds) a neighbor cache entry is retained before being removed. The refresh keyword is key here: it tells the switch to actively probe the neighbor before removing it. If the neighbor responds, the entry stays and the timer resets. If it doesn’t, the entry gets cleaned up. The default behavior without this command is more passive and, as we saw, can result in entries lingering far longer than they should. Apply this to your SVIs (VLAN interfaces) where IPv6 is active.
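The refresh-before-remove behavior can be sketched in Python. This is a toy model of the semantics described above, not actual switch code; the expiry pass and probe function are stand-ins:

```python
# Toy sketch of the "expire ... refresh" semantics (an illustration of
# the behavior described above, not the IOS code): when an entry's timer
# runs out, the switch probes the neighbor instead of dropping it outright.
EXPIRE = 7200  # seconds, matching the config above

def expire_pass(entries, now, is_reachable):
    """entries: addr -> last_refreshed timestamp; is_reachable: probe stand-in."""
    for addr in list(entries):
        if now - entries[addr] >= EXPIRE:
            if is_reachable(addr):    # neighbor answered the probe:
                entries[addr] = now   # keep the entry, reset the timer
            else:                     # no answer:
                del entries[addr]     # clean it up

# Example: one live host, one host that left the network hours ago.
cache = {"2001:db8::a": 0, "2001:db8::b": 0}
alive = {"2001:db8::a"}
expire_pass(cache, now=7200, is_reachable=lambda a: a in alive)
print(sorted(cache))  # only the live host's entry survives
```

The point of the refresh keyword is exactly this branch: reachable neighbors are retained, departed ones are purged, so the table converges on the live population.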

Limit the neighbor cache size. On platforms that support it, you can cap the maximum number of entries:

ipv6 nd cache interface-limit 8000

This prevents the table from growing unbounded, though you’ll want to set the limit based on the actual scale of your network.

Consider IPv6 RA Guard and ND Inspection. If you’re running IPv6 SLAAC in your environment, deploying RA Guard on access ports prevents rogue router advertisements from creating unnecessary neighbor entries. IPv6 ND inspection (part of the First Hop Security suite) provides another layer of control.

Monitor proactively. Add the IPv6 neighbor count to your monitoring. A simple EEM script or periodic SNMP poll that alerts when the count crosses a threshold would have caught this issue weeks before it impacted users:

event manager applet ipv6-nd-check
 event timer watchdog time 3600
 action 1.0 cli command "enable"
 action 2.0 cli command "show ipv6 neighbors | count 2001"
 action 3.0 regexp "= ([0-9]+)" "$_cli_result" match count
 action 4.0 if $count gt "10000"
 action 5.0 syslog msg "IPv6 ND cache count high: $count"
 action 6.0 end
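If you poll from an external script instead (say, over SSH with your automation library of choice), the parsing and threshold logic is simple. This sketch assumes the standard output format of the IOS count filter; the threshold value is arbitrary and should match your network’s real scale:

```python
import re

# Parsing side of an assumed external poller: fetch the output of
# "show ipv6 neighbors | count <prefix>" over SSH, then check it here.
ND_THRESHOLD = 10000  # alert threshold; arbitrary, tune to your network

def neighbor_count(cli_output: str) -> int:
    """Extract the entry count from IOS 'count' filter output."""
    m = re.search(r"Number of lines which match regexp = (\d+)", cli_output)
    if m is None:
        raise ValueError("unexpected CLI output")
    return int(m.group(1))

def should_alert(cli_output: str) -> bool:
    return neighbor_count(cli_output) > ND_THRESHOLD

# The sample line is the output we saw on di_gw2.
sample = "Number of lines which match regexp = 138732"
print(neighbor_count(sample), should_alert(sample))  # 138732 True
```

Wire should_alert into whatever notifies your team; a count north of the threshold weeks before the tickets start is exactly the early warning we lacked.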

Lessons Learned

This was a good reminder that IPv6 is not just “IPv4 with more bits.” The ancillary protocols (NDP, SLAAC, privacy extensions) introduce their own operational overhead and failure modes. If your infrastructure is dual-stack, you need to monitor and tune the IPv6 side with the same rigor you apply to IPv4.

When chasing slowness, don’t stop at the obvious. If the WAN is clean and the core is fine, dig deeper. The answer might be hiding in a protocol table you never thought to check.