Troubleshooting Switched Networks
As the number of switch features grows, so does the possibility that things will go wrong. This section presents recommendations for implementing a functional network. It also addresses some of the common reasons that port connectivity, VLAN configuration, VTP, and STP can fail, as well as what information to look for to identify the source of a problem.
There are many ways to troubleshoot a switch. Developing a troubleshooting approach or test plan works much better than using a hit-or-miss approach. Here are some general suggestions to make troubleshooting more effective:
- Take the time to become familiar with normal switch operation: The Cisco website (Cisco.com) has a lot of technical information that describes how its switches work. The configuration guides in particular are helpful.
- For larger multiswitch environments, have an accurate physical and logical map of the network on hand: A physical map shows how the devices and cables are connected. A logical map shows what segments (VLANs) exist in the network and which routers provide routing services to these segments. A spanning-tree map is also useful for troubleshooting complex issues. Because a switch can create different segments by implementing VLANs, the physical connections alone do not tell the whole story. You must know how the switches are configured to determine which segments (VLANs) exist and how they are logically connected.
- Have a plan: Some problems and solutions are obvious; others are not. The symptoms that you see in the network can be the result of problems in another area or layer. Before jumping to conclusions, try to verify in a structured way what is working and what is not. Because networks can be complex, it is helpful to isolate possible problem domains. One way to do this is to use the OSI seven-layer model. For example: Check the physical connections involved (Layer 1), check connectivity issues within the VLAN (Layer 2), check connectivity issues across different VLANs (Layer 3), and so on. Assuming that the switch is configured correctly, many of the problems you encounter will be related to physical layer issues (physical ports and cabling).
- Do not assume a component is working without first verifying that it is: If a PC is not able to log into a server across the network, it could be due to any number of things. Do not assume basic components are working correctly without testing them first; someone else might have altered their configurations and not informed you of the change. It usually takes only a minute to verify the basics (for example, that the ports are correctly connected and active), and it can save you valuable time.
Figure 2-37 outlines a basic flow for troubleshooting switch problems that will be used in this section.
Figure 2-37 Troubleshooting Flow
Troubleshooting Port Connectivity
If you are experiencing connectivity problems, the first thing to check is the port. Ports are the foundation of the switched network. If they do not work, nothing works! Some ports have special significance because of their location in the network and the amount of traffic they carry. These include ports that have connections to other switches, routers, and servers. They can be more complicated to troubleshoot because they often take advantage of special features, such as trunking and EtherChannel. However, do not overlook the other ports; they, too, are significant because they connect users in the network. Figure 2-38 shows the flow for troubleshooting port connectivity.
Figure 2-38 Troubleshooting Port Connectivity
Hardware issues can be one of the reasons a switch has connectivity issues. To rule out hardware issues, verify the following:
- The port status for both ports involved in the link: Ensure that neither is shut down. The administrator may have manually shut down one or both ports, or the switch software may have shut down one of the ports because of a configuration error. If one side is shut down and the other is not, the status on the enabled side will be "notconnect" (because it does not sense a neighbor on the other side of the wire). The status on the shutdown side will say something like "disabled" or "err-disabled" (depending on what actually shut down the port). The link will not be active unless both ports are enabled.
- The type of cable used for the connection: You should use at least Category 5 cable for 100-Mbps connections, and Category 5e or better for 1-Gbps connections. You use a straight-through RJ-45 cable to connect end stations, routers, or servers to a switch or hub. You use an Ethernet crossover cable for switch-to-switch or hub-to-switch connections. The maximum distance for Ethernet, FastEthernet, or Gigabit Ethernet over copper wire is 100 meters.
- Whether a software process has disabled the port: A solid orange light on the port indicates that the switch software has shut down the port, either by way of the user interface or by internal processes such as spanning-tree BPDU guard; Root Guard, which prevents a port from becoming the root port; or port security violations.
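The port-status checks above can be sketched with the following commands. This is a minimal illustration, assuming a Catalyst switch named SwitchX; FastEthernet0/2 is used only as an example port, and the exact status strings and output vary by platform and software release.

```
! List every port with its status (for example connected, notconnect, disabled, err-disabled)
SwitchX# show interfaces status
! List only the ports that the switch software has shut down
SwitchX# show interfaces status err-disabled
! After correcting the cause, bounce the port to bring it back up
SwitchX# configure terminal
SwitchX(config)# interface FastEthernet0/2
SwitchX(config-if)# shutdown
SwitchX(config-if)# no shutdown
SwitchX(config-if)# end
```

Bouncing the port clears the symptom only. If the underlying cause (for example, a BPDU guard or port security violation) is still present, the port is likely to be shut down again.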
Configuration of the port is another possible reason the port may be experiencing connectivity issues. Some of the common configuration issues are as follows:
- The VLAN to which the port belongs has disappeared: Each port in a switch belongs to a VLAN. If the VLAN is deleted, the port becomes inactive. The following output illustrates that the show interfaces interface command does not reveal a problem when a port is configured to be part of a VLAN that does not exist, whereas the show interfaces interface switchport command reports the access VLAN as inactive.
SwitchX# sh int fa0/2
FastEthernet0/2 is up, line protocol is up (connected)
Hardware is Fast Ethernet, address is 0017.596d.2a02 (bia 0017.596d.2a02)
Description: Interface to RouterA F0/0
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100BaseTX
SwitchX# sh int fa0/2 switchport
Administrative Mode: static access
Operational Mode: static access
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: native
Negotiation of Trunking: Off
Access Mode VLAN: 5 (Inactive)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
- Autonegotiation is enabled: Autonegotiation is an optional function of the FastEthernet (IEEE 802.3u) standard that enables devices to automatically exchange information about speed and duplex abilities over a link. You should not use autonegotiation for ports that support network infrastructure devices, such as switches, routers, or other nontransient end systems, such as servers and printers.
Autonegotiating speed and duplex settings is the typical default behavior on switch ports that have this capability. However, you should always configure ports that connect to fixed devices for the correct speed and duplex setting, rather than allow them to autonegotiate these settings. This configuration eliminates any potential negotiation issues and ensures that you always know exactly how the ports should be operating.
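Both configuration issues above might be addressed along the following lines. This is a sketch only: it reuses VLAN 5 and FastEthernet0/2 from the earlier output, and it assumes that the switch is in VTP server or transparent mode (a VTP client cannot create VLANs locally) and that the attached device is really meant to run at 100 Mbps full duplex.

```
! Re-create the missing VLAN so that the access port becomes active again
SwitchX# configure terminal
SwitchX(config)# vlan 5
SwitchX(config-vlan)# exit
! Hard-set speed and duplex on a port that connects to a fixed device
SwitchX(config)# interface FastEthernet0/2
SwitchX(config-if)# speed 100
SwitchX(config-if)# duplex full
SwitchX(config-if)# end
! Confirm that the VLAN exists and that the port is no longer inactive
SwitchX# show vlan brief
SwitchX# show interfaces FastEthernet0/2 switchport
```

The device at the other end of the link must be set to the same speed and duplex values. Hard-coding only one side typically produces a duplex mismatch.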
Troubleshooting VLANs and Trunking
To effectively troubleshoot switches, you must know how to identify and resolve VLAN performance issues, assuming you have identified and resolved any port connectivity and autonegotiation issues. Figure 2-39 shows the flow for troubleshooting VLANs and trunks.
Figure 2-39 Troubleshooting VLANs
Native VLAN Mismatches
The native VLAN that is configured on each end of an IEEE 802.1Q trunk must be the same. Remember that a switch receiving an untagged frame assigns the frame to the native VLAN of the trunk. If one end of the trunk is configured for native VLAN 1 and the other end is configured for native VLAN 2, a frame sent from VLAN 1 on one side is received on VLAN 2 on the other. VLAN 1 “leaks” into the VLAN 2 segment. There is no reason this behavior would be required, and connectivity issues will occur in the network if a native VLAN mismatch exists.
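A native VLAN mismatch can be checked and corrected roughly as follows. FastEthernet0/1 is a hypothetical trunk port, and VLAN 1 is used only because it matches the example above; the same commands must be run on the switch at the other end of the trunk.

```
! Compare the "Native vlan" column on both ends of the trunk
SwitchX# show interfaces trunk
! Set the same native VLAN on both ends
SwitchX# configure terminal
SwitchX(config)# interface FastEthernet0/1
SwitchX(config-if)# switchport trunk native vlan 1
SwitchX(config-if)# end
```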
Trunk Mode Mismatches
You should statically configure trunk links whenever possible. However, Cisco Catalyst switch ports run DTP by default, which tries to automatically negotiate a trunk link. This Cisco proprietary protocol can determine an operational trunking mode and protocol on a switch port when it is connected to another device that is also capable of dynamic trunk negotiation. Table 2-15 outlines DTP mode operations.
Table 2-15 DTP Mode Examples
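A statically configured trunk, as recommended above, might look like the following sketch. FastEthernet0/1 is a hypothetical interswitch port. On platforms that support both ISL and 802.1Q, the encapsulation usually has to be selected before the mode can be set to trunk, so that line is shown as a comment.

```
SwitchX# configure terminal
SwitchX(config)# interface FastEthernet0/1
! Only on platforms that support more than one trunking encapsulation:
! switchport trunk encapsulation dot1q
! Force the port to trunk instead of negotiating the mode
SwitchX(config-if)# switchport mode trunk
! Stop sending DTP frames on this port
SwitchX(config-if)# switchport nonegotiate
SwitchX(config-if)# end
! Check the administrative and operational modes
SwitchX# show interfaces FastEthernet0/1 switchport
```

Because DTP is disabled by the last configuration command, the neighboring port must also be configured statically as a trunk.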
VLANs and IP Subnets
Each VLAN should correspond to a unique IP subnet. Two devices in the same VLAN should have addresses in the same subnet. With intra-VLAN traffic, the sending device recognizes the destination as local and sends an ARP broadcast to discover the MAC address of the destination.
Two devices in different VLANs should have addresses in different subnets. With inter-VLAN traffic, the sending device recognizes the destination as remote and sends an ARP broadcast for the MAC address of the default gateway.
Most of the time, inter-VLAN connectivity issues are the result of user misconfiguration. For example, if you incorrectly configure a router on a stick or Multilayer Switching (Cisco Express Forwarding), then packets from one VLAN may not reach another VLAN. To avoid misconfiguration and to troubleshoot efficiently, you should understand the mechanism used by the Layer 3 forwarding device. If you are sure that the equipment is properly configured, yet hardware switching is not taking place, then a software bug or hardware malfunction may be the cause.
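For reference when checking a router-on-a-stick setup, a minimal working configuration could resemble the following sketch. The VLAN numbers (10 and 20) and the IP addresses are hypothetical; the switch port and router interface follow the earlier output, where FastEthernet0/2 is described as the interface to RouterA F0/0.

```
! Switch side: the port facing the router must be a trunk
SwitchX(config)# interface FastEthernet0/2
SwitchX(config-if)# switchport mode trunk

! Router side: one subinterface per VLAN, each addressed in that VLAN's subnet
RouterA(config)# interface FastEthernet0/0
RouterA(config-if)# no shutdown
RouterA(config-if)# interface FastEthernet0/0.10
RouterA(config-subif)# encapsulation dot1Q 10
RouterA(config-subif)# ip address 10.1.10.1 255.255.255.0
RouterA(config-subif)# interface FastEthernet0/0.20
RouterA(config-subif)# encapsulation dot1Q 20
RouterA(config-subif)# ip address 10.1.20.1 255.255.255.0
RouterA(config-subif)# end
! Each VLAN subnet should appear as a connected route
RouterA# show ip route connected
```

Hosts in each VLAN would then use the matching subinterface address as their default gateway.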
Another type of misconfiguration that affects inter-VLAN routing is misconfiguration on end-user devices such as PCs. A common situation is a misconfigured PC default gateway. Too many PCs having the same default gateway can cause high CPU utilization on the gateway, which affects the forwarding rate.
Troubleshooting VTP
Another important aspect of troubleshooting is the ability to identify and resolve VTP issues, assuming port connectivity and VLAN problems have been identified and resolved. Figure 2-40 shows the flow used for troubleshooting VTP issues.
Figure 2-40 VTP Troubleshooting
Unable to See VLAN Details in the show run Command Output
VTP client and server systems require VTP updates from other VTP servers to be immediately saved without user intervention. A VLAN database was introduced into Cisco IOS Software as a method to immediately save VTP updates for VTP clients and servers.
In some versions of software, this VLAN database is in the form of a separate file in Flash, called the vlan.dat file. You can view VTP and VLAN information that is stored in the vlan.dat file for the VTP client or VTP server if you issue the show vtp status command.
VTP server and client mode switches do not save the entire VTP and VLAN configuration to the startup-config file in NVRAM when you issue the copy running-config startup-config command on these systems. VTP saves the configuration in the vlan.dat file. This behavior does not apply to systems that run in VTP transparent mode. VTP transparent switches save the entire VTP and VLAN configuration to the startup-config file in NVRAM when you issue the copy running-config startup-config command. For example, if you delete the vlan.dat file on a VTP server or client mode switch after you have configured VLANs, and then you reload the switch, VTP is reset to the default settings. (All user-configured VLANs are deleted.) But if you delete the vlan.dat file on a VTP transparent mode switch and then reload the switch, it retains the VTP configuration.
You can configure normal-range VLANs (2 through 1001) when the switch is in either VTP server or transparent mode. But on the Cisco Catalyst 2960 switch, you can configure extended-range VLANs (1006 through 4094) only on VTP-transparent switches.
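The commands below sketch how the VLAN database might be inspected and how an extended-range VLAN could be added on a transparent-mode switch. VLAN 2000 is a hypothetical example, and the file system name (flash:) can differ between platforms.

```
! VTP mode, domain name, and configuration revision
SwitchX# show vtp status
! VLANs currently known to the switch
SwitchX# show vlan brief
! Check whether a vlan.dat file is present
SwitchX# dir flash:
! Extended-range VLANs require transparent mode on this platform
SwitchX# configure terminal
SwitchX(config)# vtp mode transparent
SwitchX(config)# vlan 2000
SwitchX(config-vlan)# end
! In transparent mode the VLAN configuration is saved with the startup-config
SwitchX# copy running-config startup-config
```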
Cisco Catalyst Switches Do Not Exchange VTP Information
When Cisco switches do not exchange VTP information, you need to be able to determine why. VTP can fail to exchange VLAN information for several reasons; verify the following items:
- VTP information passes only through a trunk port. Ensure that all ports that interconnect switches are configured as trunks and are actually trunking.
- Ensure that the VLANs are active on all the VTP server switches.
- One of the switches must be a VTP server in the VTP domain. All VLAN changes must be done on this switch to have them propagated to the VTP clients.
- The VTP domain name must match, and it is case sensitive. For example, CISCO and cisco are two different domain names.
- Ensure that no password is set between the server and client. If any password is set, ensure that it is the same on both sides. The password is also case sensitive.
- Every switch in the VTP domain must use the same VTP version. VTP version 1 (VTPv1) and VTP version 2 (VTPv2) are not compatible on switches in the same VTP domain. Do not enable VTPv2 unless every switch in the VTP domain supports version 2.
- A switch that is in VTP transparent mode and uses VTPv2 propagates all VTP messages, regardless of the VTP domain that is listed. However, a switch running VTPv1 propagates only VTP messages that have the same VTP domain as the domain that is configured on the local switch. VTP transparent mode switches that are using VTPv1 drop VTP advertisements if they are not in the same VTP domain.
- The extended-range VLANs are not propagated, so you must configure extended-range VLANs manually on each network device.
- The updates from a VTP server are not updated on a client if the client already has a higher VTP revision number. In addition, the client does not propagate the VTP updates to its downstream VTP neighbors if the client has a higher revision number than that which the VTP server sends.
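Most of the items in this checklist can be verified with a few show commands and, where needed, corrected in global configuration mode. The following is a sketch: the domain name CISCO comes from the example above, the password ExamplePass is hypothetical, and the VTP version can generally be changed only on a switch that is not in client mode.

```
! Confirm that the links between switches are actually trunking
SwitchX# show interfaces trunk
! Compare version, domain name, operating mode, and revision number on each switch
SwitchX# show vtp status
! Compare the configured password (if any) on each switch
SwitchX# show vtp password
! Align the settings; domain name and password are case sensitive
SwitchX# configure terminal
SwitchX(config)# vtp domain CISCO
SwitchX(config)# vtp password ExamplePass
SwitchX(config)# vtp version 2
SwitchX(config)# end
```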
Recently Installed Switch Causes Network Problems
A newly installed switch can cause problems in the network when all the switches in your network are in the same VTP domain, and you add a switch into the network that does not have the default VTP and VLAN configuration.
If the configuration revision number of the switch that you insert into the VTP domain is higher than the configuration revision number on the existing switches of the VTP domain, your recently introduced switch overwrites the VLAN database of the domain with its own VLAN database. This happens whether the switch is a VTP client or a VTP server. A VTP client can erase VLAN information on a VTP server. A typical indication that this has happened is when many of the ports in your network go into an inactive state but continue to be assigned to a nonexistent VLAN.
To prevent this problem from occurring, always ensure that the configuration revision number of all switches that you insert into the VTP domain is lower than the configuration revision number of the switches that are already in the VTP domain. You can accomplish this by changing the VTP mode to transparent and then back to server or client. You can also accomplish it by changing the VTP domain name and then changing it back.
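The mode-change method described above can be sketched as follows, run on the new switch before it is connected to the production network. Changing to transparent mode resets the configuration revision number to 0; client mode is used here only as an example of the final mode.

```
! Check the configuration revision before connecting the switch
SwitchX# show vtp status
SwitchX# configure terminal
! Moving to transparent mode resets the revision number to 0
SwitchX(config)# vtp mode transparent
! Return to the intended mode
SwitchX(config)# vtp mode client
SwitchX(config)# end
! The configuration revision should now read 0
SwitchX# show vtp status
```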
All Ports Inactive After Power Cycle
Switch ports move to the inactive state when they are members of VLANs that do not exist in the VLAN database. A common issue is all the ports moving to this inactive state after a power cycle. Generally, you see this issue when the switch is configured as a VTP client with the uplink trunk port on a VLAN other than VLAN 1. Because the switch is in VTP client mode, when the switch resets, it loses its VLAN database, which causes the uplink port and any other ports that were not members of VLAN 1 to become inactive.
Complete these steps to solve this problem:
Step 1 Temporarily change the VTP mode to transparent.
Step 2 Add the VLAN to which the uplink port is assigned to the VLAN database.
Step 3 Change the VTP mode back to client after the uplink port begins forwarding.
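The three steps above map onto the following commands. VLAN 10 is a hypothetical stand-in for whichever VLAN the uplink port is assigned to.

```
SwitchX# configure terminal
! Step 1: leave client mode so that VLANs can be created locally
SwitchX(config)# vtp mode transparent
! Step 2: add the VLAN to which the uplink port is assigned
SwitchX(config)# vlan 10
SwitchX(config-vlan)# exit
SwitchX(config)# end
! Wait until the uplink trunk is forwarding
SwitchX# show interfaces trunk
! Step 3: return to client mode
SwitchX# configure terminal
SwitchX(config)# vtp mode client
SwitchX(config)# end
```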
Troubleshooting Spanning Tree
It is also important to know how to identify and resolve spanning-tree issues, assuming port connectivity, VLAN, and VTP problems have been identified and resolved. Figure 2-41 shows the flow for troubleshooting STP.
Figure 2-41 Troubleshooting STP
Use the Diagram of the Network
Before you troubleshoot a bridging loop, you must at least be aware of the following:
- The topology of the bridge network
- The location of the root bridge
- The location of the blocked ports and the redundant links
This knowledge is essential for the following reasons:
- Before you can determine what to fix in the network, you must know how the network looks when it is functioning correctly.
- Most of the troubleshooting steps simply use show commands to identify error conditions. Knowledge of the network helps you focus on the critical ports on the key devices.
Identify a Bridging Loop
It used to be that a broadcast storm could have a disastrous effect on the network. Today, with high-speed links and devices that provide switching at the hardware level, it is not likely that a single host, such as a server, will bring down a network through broadcasts.
The best way to identify a bridging loop is to capture the traffic on a saturated link and verify that you see similar packets multiple times. Realistically, however, if all users in a certain bridge domain have connectivity issues at the same time, you can already suspect a bridging loop. Check the port utilization on your devices to determine whether abnormal values are present.
Restore Connectivity Quickly
Bridging loops have extremely severe consequences on a switched network. Administrators generally do not have time to look for the cause of the loop, and they prefer to restore connectivity as soon as possible. The easy way out in this case is to manually disable every port that provides redundancy in the network.
Disable Ports to Break the Loop
If you can identify the part of the network that is affected most, begin to disable ports in this area. Or, if possible, initially disable ports that should be blocking. Each time you disable a port, check to see if you have restored connectivity in the network. By identifying which disabled port stops the loop, you also identify the redundant path of this port. If this port should have been blocking, you have probably found the link on which the failure appeared.
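One possible way to work through this procedure is shown below. FastEthernet0/11 is a hypothetical redundant port; repeat the shutdown for one port at a time and test connectivity after each change.

```
! List the ports that STP currently reports as blocked
SwitchX# show spanning-tree blockedports
! Disable one redundant port, then re-test connectivity
SwitchX# configure terminal
SwitchX(config)# interface FastEthernet0/11
SwitchX(config-if)# shutdown
SwitchX(config-if)# end
```

Keeping a note of each port that is disabled makes it easier to re-enable the ports in a controlled way once the cause of the loop is found.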
Log STP Events
If you cannot precisely identify the source of the problem, or if the problem is transient, enable the logging of STP events on the switches of the network that experiences the failure. If you want to limit the number of devices to configure, at least enable this logging on devices that host blocked ports; the transition of a blocked port is what creates a loop. Issue the privileged EXEC command debug spanning-tree events to enable STP debug information. Issue the global configuration mode command logging buffered to capture this debug information in the device buffers. You can also try to send the debug output to a syslog device. Unfortunately, when a bridging loop occurs, you seldom maintain connectivity to a syslog server.
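The logging setup described above could be entered as follows. This is a sketch that uses the default buffer size and logging level, which may need adjusting on a busy switch.

```
! Store log messages in the local buffer (global configuration mode)
SwitchX# configure terminal
SwitchX(config)# logging buffered
SwitchX(config)# end
! Enable STP event debugging (privileged EXEC mode)
SwitchX# debug spanning-tree events
! Review what was captured
SwitchX# show logging
! Turn debugging off when finished
SwitchX# undebug all
```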
Temporarily Disable Unnecessary Features
Disable as many features as possible to help simplify the network structure and ease the identification of the problem. For example, EtherChannel logically bundles several different links into a single link that STP must treat as one port, so disabling this feature during troubleshooting makes sense. As a rule, make the configuration as simple as possible to ease troubleshooting.
Designate the Root Bridge
Often, information about the location of the spanning-tree root bridge is not available at troubleshooting time. Do not let STP decide which switch becomes the root bridge. For each VLAN, you can usually identify which switch can best serve as the root bridge. Which switch would make the best root bridge depends on the design of the network. Generally, choose a powerful switch in the middle of the network. If you put the root bridge in the center of the network with direct connection to the servers and routers, you generally reduce the average distance from the clients to the servers and routers. For each VLAN, hard code which switches will serve as the root bridge and which will serve as the backup (secondary) root bridge.
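Root bridge placement can be fixed per VLAN as in the following sketch. SwitchY is a hypothetical second switch acting as the backup root, and VLAN 1 is used only as an example.

```
! On the switch chosen as the root bridge for VLAN 1
SwitchX(config)# spanning-tree vlan 1 root primary
! On the switch chosen as the backup root bridge for VLAN 1
SwitchY(config)# spanning-tree vlan 1 root secondary
! Verify which switch is the root for the VLAN
SwitchX# show spanning-tree vlan 1
```

The root primary and root secondary keywords lower the bridge priority for that VLAN; an explicit value can be set instead with the spanning-tree vlan priority command.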
Verify the Configuration of RSTP
The 802.1D and PVST+ spanning-tree protocols have convergence times between 30 and 50 seconds. The RSTP and PVRST+ spanning-tree protocols typically converge within one or two seconds. A slow convergence time may indicate that not all of the switches in your network have been configured with RSTP, which can slow the convergence times globally in your network. Use the show spanning-tree command to verify the spanning-tree mode on each switch.
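The spanning-tree mode can be checked and, if necessary, changed as sketched below. The commands assume a Catalyst switch that supports Rapid PVST+ and need to be repeated on each switch in the network.

```
! The summary output states which spanning-tree mode the switch is running
SwitchX# show spanning-tree summary
! Switch to Rapid PVST+ if the switch is still running PVST+
SwitchX# configure terminal
SwitchX(config)# spanning-tree mode rapid-pvst
SwitchX(config)# end
```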