The mystery of Cisco 2960s and strange ARP cache

This all started when I introduced a new network monitoring tool on to the network, the tool was “cisco prime”, but before I say any more lets be clear the issue here has nothing to do with prime which is a great tool for managing cisco devices. I noticed that when pushing new IOS files and backing up switch configuration that some time they would seem to lose network connectivity. I was able to ping them and ssh to them from my desktop, but they would simple not speak to the prime server.

So lets start with the set up (ip address modified of course)

Prime server – a vmware guest , in vlan 1, ip address 10.10.224.98/21

my desktop – physical machine, in vlan 1 also, ip address 10.10.226.46/21

Switch – management interface in vlan 666, ip address 20.20.255.6/24

interface Vlan666
ip address 20.20.255.6 255.255.255.0
no ip proxy-arp
end
switch#sh ru int vlan 1
Building configuration…
Current configuration : 65 bytes
!
interface Vlan1
no ip address
no ip proxy-arp
shutdown
end

Router – 6506 with a live interface in both vlan 1 and 666 set as DFGW on clients.

So to start with I can ping every thing and every thing can ping every thing else, and on the switch with a show arp I see

switch#sh arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 20.20.255.1 0 4403.a754.8300 ARPA Vlan666
Internet 20.20.255.6 – 7010.5c99.f241 ARPA Vlan666
switch#

So all looking good, I can see the switch IP address and that of the DFGW

switch#ping 10.10.224.98
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.224.98, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/6 ms

But then the ping stops working? and the switch can no longer contact the prime server, however I can still see it from my desktop? Logging back on to the switch I again look at the ARP cache..

Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.10.224.99 2 0050.5683.61ce ARPA Vlan1
Internet 20.20.255.1 0 4403.a754.8300 ARPA Vlan666
Internet 20.20.255.6 – 7010.5c99.f241 ARPA Vlan666

What?? why has the switch learnt a mac address on vlan 1? I have two issues with it doing this, first Prime is still trying to contact it via the DFGW ( I can see this in a packet trace) so the switch is not seeing the packets coming in on vlan 1, second the interface for vlan 1 is shut down so there should be no ARP entries on it! to get it working I clear the ARP cache of this entry and all is good… well for a few days / hours and then it happens again, but the time between issues seems very random. Keeping an eye on it I see it happen with anther monitoring server, and then another? The one thing I notice is that all the times it happens it is always a server on vmware. Physical servers/appliance and desktops never seem to have this issue. This is the first piece of the puzzle, what does Vmware do different to other servers? They Migrate! And when they migrate between the physical hosts the vmware system sends a gratuitous arp on to the network to alert switches what port in the network to now find the server on? And some switches that have “ip arp gleaning” switched on which is the default hear this and place an entry in to there ARP table. Even though the switch had the interface vlan 1 shut down, it still passed vlan1 traffic through the switch at layer 2 and this seemed to be enough that it saw the ARP packet, and added the entry to its ARP table. Then of course the it try’s to use this entry for communication but as the interface is indeed shut it will not work!

A little bit of time with CISCO TAC and the solution was to disable IP arp gleaning on all the access switches, it might be useful for provisioning switches but as I found it can cause issues.

no ip arp gleaning tftp
no ip arp gleaning udp

The fact it was learning on a disabled interface is a bug and something CISCO are looking in to.

However that’s not quite the end of the story, disabling “IP ARP Gleaning” did not work! and this had us scratching out heads for a while, until one of the cisco engineers I was talking to noticed this..

Cisco Bug CSCun38166

“no ip arp gleaning tftp and udp” doesn’t work

CSCun38166

 Description
Symptom:
On 2960,when we configure “no ip arp gleaning tftp” and “no ip arp gleaning udp” then do
tftp use command”copy flash:teset tftp:”
It is expected following behavior:1.do not learn a IP address from other segment.
2.add a arp entry corresponding with HSRP virtual mac addressHowever in our case,it turns out like below:
1.generate a arp entry of IP address from other segment
2.add a arp entry corresponding with HSRP PHY mac address.

Conditions:
1.no ip arp gleaning tftp
OR
no ip arp gleaning udp
2.do tftp use following command
copy flash:teset tftp:

And looking at the version code, oh yes I would be running one of the affected versions. 15.0(2)SE6, a quick upgrade to version SE7 and all is good.

I am impressed though, hitting 2 cisco bugs in one issue.

And in the end it was prime that sorted it all out, a single click and it pushed the IOS update and the configuration for “no ip arp gleaning” to all 2960s affected devices, not going to ake this a post for plugging prime, but it does have its good points.

Generating traffic from a router

I was looking how to do this to test a monitoring system using GNS 3, and came across this little gem

http://etherealmind.com/the-poor-mans-ios-traffic-generator/

nothing fancy just enabling the small TCP servers (#service tcp-small-servers), but a nice simple way to push some data across a link between two routers. The article also mentions some other methods so will be looking in to them as well.

Always nice to be able to test some things with out having to fire up servers or other hard/soft ware :)

SecureCRT sending commands to multiple sessions.

I came across this in secureCRT and thought I would share it.

When labing things up (and indeed on real networks), there are times when you need to send the same command to multiple devices. you can of course copy and paste between the sessions but what about if you want to past the exact same block of configuration to 20 devices, or just want to do something simple like save the running configuration on your devices in you lab before you close down?

Well SecureCRT has a nice little feature to do this, so before enabled secure CRT looks much like below, as you can see I have several tabs open.

Default SecureCRT Window

However by going in to the view menu up the top there is a option to enable the “chat window”, this will bring up an extra panel at the bottom of the screen. Then by right clicking in this new panel you can enable the “send chat to all tabs” option as shown below.

Chat window enabled

Now any command typed in the chat window will be sent to all devices. Commands typed in the main terminal are still only sent to a single device.

What would be even nicer is if you could highlight multiple tabs and have the commands only sent to those terminal sessions. At the moment it is an all or nothing solution, maybe I will go suggest it to them as an improvement for future versions :)

The more I use CRT the more I like it, written quite a few scripts for it now, if you know any VB script or Java you can pretty much do what ever you like as SecureCRT has a nice simple API in to it.

I am finally moving house this week, so after that should have more time to post on here, and will take some of the script I have and tidy them up and post them for people.

Take care

DevilWAH

Switch Vs Route

I see a lot of people as Questions such as what is harder the Switch or the Route exam, Or Why is the Route coarse materials so much larger than the Switch, does this mean there is less to it? 642-902 ROUTE  642-813 SWITCH

So having now completed both foundation and cert guides here are my views.

First the two have very different goals that they are trying to teach, and approach things in the same way as you would likely see in the Real world.  

ROUTE

In the real world generally Routing protocols stand apart, while you may run EIGRP and OSPF with in he same organisation, most people will keep them separate and they will only interact at the borders. And there are only 3/4 major routing protocals that you woudl expect to see.

RIP,

OSPF,

EIGRP and

BGP.

While there are others these are the common ones that most people will using there jobs. So the ROUTE exam deals with these along with redistributing the routes between them.

This give the following Topics to study

EIGRP
OSPF
BRP
Redistribution and Patch control
IPv6

And each is covered in some detail.

SWITCH

On the other hand has many more topics, and in the case of switch’s many of these will be run on the same devices across the entire network, (eg. VLANS, Spanning Tree, ACL’s Switch Security) so the number of topics in the SWITCH exam is much higher. They are covered in less depth individual than the topics in ROUTE, however you are expected to understand how they all work together and how issues configuring one can cause issues in others.

A partial list of topics covered in switch are.

VLANS
Switch Operation (CAM TCAM and other switch tables)
CEF
VLANS
STP (all modes)
STP enhancements like BPDU guard and ULD detection.
Ether channels and port channels
Multilayer switches
High availabilities (redundet router and redundant supervisors)
IP telephony
Wireless
Securing switch devices
Port security
ACL’s
Vlan ACL’s
Private VLANS
QOS
and the list goes on….

So the question of what one is hard and what one is easy will very much depend on the person taking them, and the current experience they have. Many people do seem to find the Routing exam nicer and I think this is because you can take each topic seperatly and concentrate with out worrying about the rest. While I enjoyed Switch as it was lots of bite size chunks to get stuck in to.

People also ask what one to take first, honestly I don’t think knowing either one will help learning the other one, as long as you have  a decent understanding of networks. Personal I would first go for the one you have most experience with, and get it under your belt first.

The only one I would suggest leaving till last is the Trouble Shoot as this assumes you have knowlage of both Switch and Route.

Using Syslog while Studying in GNS3 (or indeed and cisco Lab)

I have been getting back in to my studying a lot lately and one thing I have found is the need to use a lot of debug commands so I can watch what is happening during things like routing updates and neighbour formation. One thing I do find though is that I am forever having to turn debug on and off, forgetting to do one or the other, and when it is on it clutters up the screen a breaks up the config I am entering making it difficult to read back.

Which got me thinking, I have used syslog servers a lot in the past, so why not send all the debugging out put to a syslog server and turn of logging to the console? This way I can have all the debugs in one place, and keep the console of the devices tidy so I can see what I am doing.

Now if you are doing this through GNS3 you will need a cloud connection so your PC can talk to your GNS3 network. If you are not sure how to do this there are lots of videos and walk though on the net, however the one below is one of the best I have found, very clear and complete.

How-To: Using the Cloud in GNS3 to Provide Internet Access from Matthew on Vimeo.

So once you have your cloud set up you then need to set up a simple GNS3 topology, Here I have set up 4 routers running OSPF connected through a switch as I am looking at the DR and BDR election process.

I have given R1 and R2 F0/1 address 192.168.10.10 and 192.168.10.20, and the loopback adapter used by the cloud is 192.168.10.254. Once the routers are booted and connected to the cloud, check they can ping the loop-back address (you may need to disable your fire wall on the loop-back connection.)

then of course you will need a SFTP server, in windows there are two good free choises, for a realy simple server that can run with out install try, http://tftpd32.jounin.net/tftpd32.html simple but does all you need, just make sure you disable dhcp and other none necessaries services in the settings. For a more complete tool try http://kiwisyslog.com/, they have a free syslog server offering that allows filtering and more.

In either case set it up and insure it is listing on the loopback interface, in the case of TFTP32d this is simple a case of choosing the interface from the drop down list.

Finale we need to change the logging setting of R1 and R2 to direct debugging message to the syslog server and not to the console. Remember debug messages are level 7 so we need to set console logging to level 6 or lower and trap logging to level 7. the following code will do just this from global config mode.

#logging 192.168.10.254
#logging console 6
#logging trap 7

So now we can enable the debugging and reset the neighbour relation ships to see what it looks like.

From the console

So not much there apart from we see the neighbours bounce as I clear the OSPF process.

So how about on the syslog server?? 

So here are all our debug messages, for us to scroll through and review at our leisure, If you have something like Kiwicat syslog server you could filer them in to views, based on device that sent it, or text with in message, ect.

You need to make sure of course that you either have the device connected directly to the syslog server network, or it has a route to get there. Directly connected is always best of course as you will insure that as long as that interface on the device is up you will catch all messages. On real hardware simply use a spare switch or create a separate VLAN and do exactly the same thing.

I have found for large labs this works great, indeed for testing setups for clients its great as well. once you have insured the correct debugging is enabled you can walk though test scripts and plans, safe in the knowledge that you have a full detailed log of every thing that has happened.

Simple to set up and hopefully some of you will find it useful.

DevilWAH

Spanning Tree enhancements (Backbone Fast)

Last time I look at the spanning tree enhancment I covered uplink fast, this is for detecting when a directly connected root port fails and switching over to a back up in the shortest time possible. But what happens if the link that fails is not directly connected. When a switch loses its link back to the root and needs to find an alternate path back. In the digram below switch B is blocking its port to Switch A to prevent loops.

The question is what happens if the link between Switch A and the Root fails? Well with out backbone fast the following sequince takes place.

When the link fails Switch A will no longer be receiving BPDU’s from the root, the direct link is down and the port on switch B is blocking so not forwarding BPDU’s.

Switch A will assume it is the new root and start to send BPDU’s towards Switch B declaring it is the root. However Switch B will see these are inferior BPDU’s to the on it has stored for the port connected to Switch A and ignore them.

This will continue to happen until the BPDU on the port times out, after which the port will go in to the listing and learning state before starting to forward. This is 20 seconds (max age timer) plus 2 x 15 seconds for the listing and learning stage. so a total of 50 seconds.

The idea behind Backbone fast is to cut this by 20 seconds by bypassing the max age timer. The idea is that if Switch B can confirm it still has a link back it’s current known root switch, then it can ignore the max age timer and start the listing and learning process on a port immidatly it receives a inferior BPDU.

Once backbone fast is enabled, when a switch receives a inferior BPDU on one of its ports, it will send a RLQ (root link query) packet out all it’s non designated ports including its root port (so all ports that lead back to the root). If it receives a RLQ response (these are sent from the bridge) then it knows it still has a link to root. It can then age out the port it is receiving the inferior BPDU’s on and start the listing learning stages. If it does not receive any responses then the switch has lost connectivity to the rest of the network and needs to start recomputing the whole STP.

Either way the max age time has been eliminated and 20 seconds have been shaved of the re convergence / fail over time.

Just like Uplink fast Backbone fast is configured on a switch level with the following command.

Switch(config)#spanning-tree backbonefast

and it needs to be configured on all switches on the network.

CISCO’s document HERE explains it in much more details and more examples.

DevilWAH

Simulating PC’s in GNS3

One of my main issues with GNS3 for studying that there are no end devices (PC’s) by default. Ok you can set up loopback interfaces, or add in an extra router and just configure one interface with an IP to act an an end device. But loopbacks don’t really show well when looking at the topology on screen, and adding a whole router (or creating a Qemu host) seems to be a bit much when all you need is a device that can reply to and send pings and/or run trace routes.

However the other day I came across VPCS! There is actuly a down load for this at the bottom of the GNS3 site (here), and it is so simple and easy to get up and running that I suggest any one who uses GNS3 has a look.

Simple unpack the zip file to a folder and then run the VPCS.exe file. this will by default create 3 PC’s with various IP addresses. However it is simple to configure it to create up to 9 separate simulated PC’s with either static or DHCP assigned addresses. These can then be easily added to GNS3 topologies by adding a cloud (see later in the post for how to make them look pretty), and configuring a NIO_UDP port. Really it takes all of 10 seconds to get it up and running. Then you have a simple CLI interface to the “virtual PC’s” where you can run Pings and Traceroutes.

There is only one thing to be careful of. You may find the cygwin1.dll file is a different version in GNS3 as to the version that comes with VPCS and this can casue issues. I find the simple way around this is to delete the cygwin1.dll file from the VPCS folder, and then copy the one form the GNS3 program folder in. (even better copy it to a system path such as c:\window\system32, and then delete it from both folders, and just keep a single copy they both can use).

The following is a link to a blog post on www.firstdigest.com with a video tutorial of how to set it up. (GNS3 and VPCS Video)

Now your GNS3 topology’s can actuly look and run like proper topologies and with our the overheads of emulating entire routers of operating systems. OK if you really need more complex hosts then there are other ways to do that, but for simple end devices that can ping and be pinged it is GREAT!!

DevilWAH

Spanning Tree enhancements (Uplink fast)

In my last job, I jumped straight in to configuring Rapid spanning Tree, I mean what is the point of running Standard STP with its 50second fail over times, when you can enable Rapid-STP and gain sub second fail over??

Well if you want to pass your CCNP SWITCH you need to know it, and you need to know how to configure the enhancements. Actually having read through them and labed them up. They do help in understanding how STP works and how the original protocol was improved in a number of way, before CISCO took all the enhancements and came up with Rapid-STP.

Over the next few post I will be covering all of the basic enhancements, including uplinkfast, backbone fast, portfast, loopguard etc..

Uplinkfast.

This is normaly configured on access switchs that have two links back to the root, in these cases after the initial STP algrothem has run, one of the ports (lowest priority back to the root bridge) will be designated as the root port, while the other will be blocked. See digram below.

Now with standard STP, if the active link fails, the switch sees the root port link has fail and as it is receiving root BPDU’s on the backup blocked port it starts to bring this up. However with out uplink fast enabled this requires the port to go through the listening and learning stages. By default this is 30 seconds of outage, and even with best STP tuning it still results in a 14 second outage.

However with uplink fast configured the switch keeps track of the blocked ports that point back to bridge and forms them in to an “uplink group”. Now if the primary link goes down the switch can pick the next best root port and immediately places it in the forwarding mode as this will not be creating a loop. This creates an almost instant fail over of the primary link. However switch CAM tables will now be out of sync, which could result in frames being sent down the wrong links. To sort this out, the switch creates dummy frames with source address from its CAM table, and destination of multicast address. this updates the other switches on the network.

Now when the link comes back up, the switch waits twice the forward delay + 5 seconds before it switches back over. This allows the core switch at the other end of the link to have time to run through STP and start forwarding on the port.

And that’s Uplink fast. Providing a method to allow instant fail over of directly redundant links towards the root.

Configuration is very simple and is carried out in global config mode.

Switch(config)Spanning-tree uplinkfast

DevilWAH

Upgrading Redundant Supervisors on 6500’s

I need to know how to upgrade the IOS on CISCO 6500’s, So here is the outline along with a download of the official PDF from CISCO.

First confirm what IOS image is running, which supervisor is running as active and the redundancy mode that is active. This can be achieved with the following commands.

Router#show version
Router#show module
Router#show redundancy 

You should now know the current IOS version and the module numbers of the supervisor units and which one is active.
Now we need to set up the boot variable (so the correct IOS boots) and copy the new IOS over if necessary.

Router#show bootvar  (Find out the current IOS booting)

Router#copy start tftp (copy the current start up configuration to TFTP for safe keeping)

Router#copy tftp disk0:          (Copy the new IOS to the two supervisors)
Router#copy tftp slavedisk0:

Router#no boot system disk0:old_IOS.bin
Router#boot system disk0:new_IOS.bin   (set up the new boot variable on the active supervisor)

Router#copy run start  (syncs running configuration on both supervisor units)

Now the boot variables are set up and the new IOS images are on the supervisors we can upgrade each in turn, starting with the one currently set as the standby.

Router#hw-module module  reset  (chose standby supervisor to reset)

Standby supervisor will now reboot and restart running the new IOS image. During this time the redundancy mode will fall back to RPR mode due to the mismatch in IOS images running. You can see this by running the “show” commands above.

Once the standby is fully back up and running you can now update the active module, this is done by forcing the standby to become active, this will result in a reload of the current active supervisor.

Router#redundancy force-switchover (switches standby to become active and reloads current active)

(you will need to swap console cables over to the standby supervisor as only the currently active one can be consoled in to.)

Again use the above “show” commands to check the modules have reloaded correctly, both supervisors are running the same IOS version, and the correct redundancy method. Now the original active router will be the standby. If you wish you apply the last command above to force a second switch over back to the original.

And there it is, upgrade of IOS with out taking the switch off line. Although it is suggested it is done out of hours as there may be some minor traffic interruptions during the switch over.

Here is the CISCO PDF, with more detail and outputs of the commands..

DevilWAH