What Is Spanning Tree and Why Use Spanning Tree?
In its most basic sense, the Spanning-Tree Protocol (STP) is a loop-prevention protocol. It is a technology that allows bridges to communicate with each other to discover physical loops in the network. The protocol then specifies an algorithm that bridges can use to create a loop-free logical topology. In other words, STP creates a tree structure of loop-free leaves and branches that spans the entire Layer 2 network. The actual mechanics of how the bridges communicate and how the STP algorithm works are the subject of the rest of the chapter.
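To make the idea of a "loop-free logical topology" concrete, here is a minimal Python sketch. It is deliberately not the 802.1D algorithm: it assumes a centralized view of a connected topology, elects the bridge with the lowest (hypothetical) integer ID as root, and uses hop count instead of path cost. Every link that is not on a fewest-hop path to the root ends up blocked, which is the end result STP aims for.

```python
from collections import deque


def build_spanning_tree(bridges, links):
    """Toy, centralized spanning tree over a connected bridge topology.

    bridges: integer bridge IDs (lowest ID becomes root, a simplification)
    links:   list of (bridge_a, bridge_b) tuples; parallel links are allowed
    Returns (root, forwarding_links, blocked_links).
    """
    root = min(bridges)
    neighbors = {b: [] for b in bridges}
    for idx, (a, b) in enumerate(links):
        neighbors[a].append((b, idx))  # remember which physical link this is
        neighbors[b].append((a, idx))

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:  # breadth-first search: each bridge keeps its fewest-hop path
        bridge = queue.popleft()
        for peer, idx in sorted(neighbors[bridge]):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(idx)
                queue.append(peer)

    forwarding_links = [links[i] for i in sorted(forwarding)]
    blocked_links = [links[i] for i in range(len(links)) if i not in forwarding]
    return root, forwarding_links, blocked_links


# Three switches wired in a triangle: one link must be blocked.
print(build_spanning_tree([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
# (1, [(1, 2), (1, 3)], [(2, 3)])
```

Real bridges reach the same kind of result in a distributed way, by exchanging messages rather than sharing a global map of the network.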
Loops occur in networks for a variety of reasons. Most often, they are the result of a deliberate attempt to provide redundancy: in case one link or switch fails, another link or switch can take over. However, loops can also occur by mistake (of course, that would never happen to you). Figure 6-1 shows a typical switch network and how loops can be intentionally used to provide redundancy.
Figure 6-1. Networks Often Include Bridging Loops to Provide Redundancy
The catch is that loops are potentially disastrous in a bridged network for two primary reasons: broadcast loops and bridge table corruption.
Broadcast Loops
Broadcasts and Layer 2 loops can be a dangerous combination. Consider Figure 6-2.
Figure 6-2. Without STP, Broadcasts Create Feedback Loops
Assume that neither switch is running STP. Host-A begins by sending out a frame to the broadcast MAC address (FF-FF-FF-FF-FF-FF) in Step 1. Because Ethernet is a bus medium, this frame travels to both Cat-1 and Cat-2 (Step 2).
When the frame arrives at Cat-1:Port-1/1, Cat-1 will follow the standard bridging algorithm discussed in Chapter 3, “Bridging Technologies,” and flood the frame out Port 1/2 (Step 3). Again, this frame will travel to all nodes on the lower Ethernet segment, including Cat-2:Port-1/2 (Step 4). Cat-2 will flood the broadcast frame out Port 1/1 (Step 5) and, once again, the frame will show up at Cat-1:Port-1/1 (Step 6). Cat-1, being a good little switch, will follow orders and send the frame out Port 1/2 for the second time (Step 7). By now I think you can see the pattern: there is a pretty good loop going on here.
Additionally, notice that Figure 6-2 quietly ignored the broadcast that arrived at Cat-2:Port-1/1 back in Step 2. This frame would have also been flooded onto the bottom Ethernet segment and created a loop in the reverse direction. In other words, don’t forget that this “feedback” loop would occur in both directions.
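The two-directional loop can be reproduced with a short Python sketch of Figure 6-2. It assumes each switch has Port 1/1 on one shared segment and Port 1/2 on the other (the segment names "top" and "bottom" are hypothetical labels), that every port on a segment hears every frame except the port that sent it, and that a broadcast is flooded out every port except the ingress port. The count of copies in flight never drops to zero.

```python
# Each switch has one port on each shared Ethernet segment (as in Figure 6-2).
SWITCH_PORTS = {
    "Cat-1": {"1/1": "top", "1/2": "bottom"},
    "Cat-2": {"1/1": "top", "1/2": "bottom"},
}


def simulate_broadcast(rounds):
    """Count broadcast copies on the wire after each round, with no STP."""
    # A frame in flight is (segment, sender); sender is None for a host.
    in_flight = [("top", None)]  # Host-A transmits on the top segment
    history = []
    for _ in range(rounds):
        next_round = []
        for segment, sender in in_flight:
            # Bus medium: every switch port on the segment hears the frame,
            # except the port that transmitted it.
            for switch, ports in SWITCH_PORTS.items():
                for port, seg in ports.items():
                    if seg != segment or (switch, port) == sender:
                        continue
                    # Broadcast: flood out every port except the ingress port.
                    for out_port, out_seg in ports.items():
                        if out_port != port:
                            next_round.append((out_seg, (switch, out_port)))
        in_flight = next_round
        history.append(len(in_flight))
    return history


print(simulate_broadcast(5))  # [2, 2, 2, 2, 2]
```

The steady count of two is one copy circulating in each direction: the Cat-1 path traced in Steps 1 through 7, plus the "ignored" copy that entered Cat-2 in Step 2.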
Notice an important conclusion that can be drawn from Figure 6-2—bridging loops are much more dangerous than routing loops. To understand this, refer back to the discussion of Ethernet frame formats in Chapter 1, “Desktop Technologies.” For example, Figure 6-3 illustrates the layout of a DIX V2 Ethernet frame.
Figure 6-3. DIX Version 2 Ethernet Frame Format
Notice that the DIX V2 Ethernet frame contains only two MAC addresses, a Type field, and a CRC (plus the next-layer payload in the Data field). By way of contrast, an IP header contains a time to live (TTL) field that is set by the original host and then decremented by every router. Because routers discard packets whose TTL reaches 0, "run-away" datagrams eventually die. Unlike IP, Ethernet (or, for that matter, any other common data link implementation) doesn’t have a TTL field. Therefore, after a frame starts to loop in the network shown in Figure 6-2, it continues forever until one of the following happens:
- Someone shuts off one of the bridges or breaks a link.
- The sun novas.
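The difference can be sketched in a few lines of Python. This is a hedged model, not a router or bridge implementation: a looping IP packet loses one unit of TTL per router and is dropped at 0, while an Ethernet frame (modeled here as `ttl=None`) passes through each bridge unchanged. The `max_hops` cutoff is a hypothetical stand-in for "forever."

```python
def hops_before_discard(ttl, max_hops=1_000_000):
    """Follow one packet or frame around a forwarding loop.

    ttl: starting IPv4 TTL, or None for an Ethernet frame (DIX V2 has no TTL).
    Returns the hop at which the packet is discarded, or None if it is
    still looping after max_hops (that is, it would loop forever).
    """
    for hop in range(1, max_hops + 1):
        if ttl is not None:
            ttl -= 1          # each router decrements TTL...
            if ttl == 0:
                return hop    # ...and discards the packet at TTL=0
        # A transparent bridge forwards the frame unchanged: nothing expires.
    return None


print(hops_before_discard(64))                      # 64
print(hops_before_discard(None, max_hops=10_000))   # None
```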
As if that were not frightening enough, networks that are more complex than the one illustrated in Figure 6-2 (such as Figure 6-1) can actually cause the feedback loop to grow at an exponential rate! As each frame is flooded out multiple switch ports, the total number of frames multiplies quickly. I have witnessed a single ARP filling two OC-12 ATM links for 45 minutes (for non-ATM wizards, each OC-12 sends 622 Mbps in each direction, so the two links carry a total of almost 2.5 Gbps of traffic)! For those who have a hard time recognizing the obvious, this is bad.
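The growth comes from switches that have more than two looped ports. As a hedged illustration, the Python sketch below uses a hypothetical full mesh of point-to-point links (not necessarily the topology in Figure 6-1). Each copy that arrives on one inter-switch link is flooded out every other inter-switch link, so with n switches every copy becomes n - 2 copies on the next hop. Host ports, bandwidth, and queueing are ignored; the sketch counts copies rather than modeling throughput.

```python
def storm_growth(num_switches, rounds):
    """Count looping broadcast copies per hop in a full mesh with no STP."""
    switches = range(num_switches)
    links = {s: [t for t in switches if t != s] for s in switches}
    # Switch 0 receives the broadcast from a local host and floods it
    # out every inter-switch link. A copy is (from_switch, to_switch).
    in_flight = [(0, peer) for peer in links[0]]
    counts = [len(in_flight)]
    for _ in range(rounds - 1):
        in_flight = [
            (dst, nxt)
            for src, dst in in_flight
            for nxt in links[dst]
            if nxt != src  # flood out every link except the ingress link
        ]
        counts.append(len(in_flight))
    return counts


print(storm_growth(3, 6))  # [2, 2, 2, 2, 2, 2]     constant, like Figure 6-2
print(storm_growth(4, 6))  # [3, 6, 12, 24, 48, 96]  doubles every hop
```

With three switches in a ring, each switch has only one "other" looped port, so the storm stays constant; adding a fourth fully meshed switch is enough to make it double on every hop.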
As a final note, consider the impact of this broadcast storm on the poor users of Host-A and Host-B in Figure 6-2. Not only can these users not play Doom (a popular game on campus networks) with each other, they can’t do anything (other than go home for the day)! Recall from Chapter 2, “Segmenting LANs,” that broadcasts must be processed by the CPU in all devices on the segment. In this case, both PCs lock up trying to process the broadcast storm that has been created.
Even the mouse cursor freezes on most PCs that connect to this network. If you disconnect one of the hosts from the LAN, it generally returns to normal operation. However, as soon as you reconnect it to the LAN, the broadcasts again consume 100 percent of the CPU. If you have never witnessed this, some night when only your worst enemy is still using the network, feel free to create a physical loop in some VLAN (VLAN 2, for example) and then type set spantree disable 2 into your Catalyst 4000s, 5000s, and 6000s to test this theory. Of course, don’t do this if your worst enemy is your boss!
Bridge Table Corruption
Many switch/bridge administrators are aware of the basic problem of broadcast storms as discussed in the previous section. However, fewer people are aware of the fact that even unicast frames can circulate forever in a network that contains loops. Figure 6-4 illustrates this point.
Figure 6-4. Without STP, Even Unicast Frames Can Loop and Corrupt Bridging Tables
For example, suppose that Host-A, possessing a prior ARP entry for Host-B, wants to send a unicast Ping packet to Host-B. However, Host-B has been temporarily removed from the network, and the corresponding bridge-table entries in the switches have been flushed for Host-B. Assume that neither switch is running STP. As with the previous example, the frame travels to Port 1/1 on both switches (Step 2), but this discussion considers things only from Cat-1’s point of view. Because Host-B is down, Cat-1 does not have an entry for the MAC address BB-BB-BB-BB-BB-BB in its bridging table, and it floods the frame (Step 3). In Step 4, Cat-2 receives the frame on Port 1/2. Two things (both bad) happen at this point:
- Cat-2 floods the frame because it has never learned MAC address BB-BB-BB-BB-BB-BB (Step 5). This creates a feedback loop and brings down the network.
- Cat-2 notices that it just received a frame on Port 1/2 with a source MAC of AA-AA-AA-AA-AA-AA. It changes its bridging entry for Host-A’s MAC address to the wrong port!
As frames loop in the reverse direction (recall that the feedback loop exists in both directions), you actually see Host-A’s MAC address flipping between Port 1/1 and Port 1/2.
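The flipping entry can be reproduced with a small learning-bridge sketch in Python. It assumes the same two-segment layout as Figure 6-2 (the segment names "top" and "bottom" are hypothetical labels), that Host-B’s MAC address never appears as a source because Host-B is off the network, and that copies arriving in the same round are processed in a fixed order. Each switch learns the source MAC on the ingress port, then forwards to a known destination port, filters if the destination is on the ingress port, or floods.

```python
SWITCH_PORTS = {
    "Cat-1": {"1/1": "top", "1/2": "bottom"},
    "Cat-2": {"1/1": "top", "1/2": "bottom"},
}
HOST_A, HOST_B = "AA-AA-AA-AA-AA-AA", "BB-BB-BB-BB-BB-BB"


def simulate_unicast_loop(rounds):
    """Return Cat-2's learned port for Host-A after each round (no STP)."""
    tables = {switch: {} for switch in SWITCH_PORTS}  # MAC address -> port
    # Frame in flight: (segment, sending port or None for a host, src, dst)
    in_flight = [("top", None, HOST_A, HOST_B)]  # Host-A pings Host-B
    history = []
    for _ in range(rounds):
        next_round = []
        for segment, sender, src, dst in in_flight:
            for switch, ports in SWITCH_PORTS.items():
                for port, seg in ports.items():
                    if seg != segment or (switch, port) == sender:
                        continue
                    tables[switch][src] = port  # learn (or re-learn) the source
                    out = tables[switch].get(dst)
                    if out == port:
                        continue  # destination is on the ingress port: filter
                    # Known destination: forward; unknown (Host-B): flood.
                    targets = [out] if out else [p for p in ports if p != port]
                    for out_port in targets:
                        next_round.append((ports[out_port], (switch, out_port), src, dst))
        in_flight = next_round
        history.append(tables["Cat-2"].get(HOST_A))
    return history


print(simulate_unicast_loop(4))  # ['1/1', '1/2', '1/1', '1/2']
```

Because Host-B’s address is never learned, every copy is flooded, and each switch keeps re-learning Host-A on whichever port the latest copy arrived.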
In short, not only does this permanently saturate the network with the unicast ping packet, but it corrupts the bridging tables. Remember that it’s not just broadcasts that can ruin your network.