CCDA Notes
Identifying Voice Networking Considerations
Many of today’s enterprise network designs must accommodate the transmission of voice traffic in addition to data traffic. The transmission of voice over a data network is often referred to as Voice over IP (VoIP). The inclusion of VoIP in a network design typically requires integration with existing telephony services and connectivity into the public switched telephone network (PSTN). Therefore, this section reviews existing telephony networks, discusses traffic engineering, and offers design guidance for VoIP networks.
Reviewing Traditional Voice Architectures and Features
Before recommending VoIP network design solutions, a designer should first become familiar with traditional telephony networks. A fundamental concept in traditional telephony networks is the conversion of human speech into a digital signal.
When you speak into an analog phone, your voice is converted into an analog waveform. However, telephony networks cannot maintain voice quality when sending analog waveforms over long distances. Therefore, telephony networks convert analog waveforms into digital signals, which can be transmitted over great distance.
The steps for converting an analog waveform into a digital signal include the following:
- Filtering—Approximately 90 percent of the frequencies required to understand human speech are in the range of 300 Hz to 3400 Hz. Therefore, to filter out extraneous noise, a coder-decoder (codec) filters out frequencies greater than 4000 Hz.
- Sampling—Based on the Nyquist theorem, which says an analog waveform needs to be sampled at a rate that is at least double the highest frequency being sampled, the analog waveform is sampled at a rate of 8000 samples per second (that is, twice the highest frequency of 4000 Hz).
- Digitizing—When the analog waveform is sampled, the amplitude (that is, the volume) of each sample is represented as a number. This process is called quantization. However, because each possible amplitude does not have an associated number, the measure of each amplitude is rounded off to the nearest number on a scale. On a linear scale, for example, each amplitude is rounded off to the nearest evenly spaced step. This rounding off can cause quantization noise.
Instead of using a linear scale, quantization typically uses a logarithmic scale, so that more accurate measurements can be made at lower volumes. Accuracy at lower volumes is more important than accuracy at higher volumes because most samples have lower volumes, and higher volumes tend to mask the noise. The methods for constructing the logarithmic scale are called companding (that is, compressing and expanding) types. The two primary companding types in use today are a-law, which is most popular in Europe, and mu-law (sometimes written as u-law), which is most popular in North America and Japan.
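The sampling, companding, and quantizing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the continuous mu-law curve with mu = 255 and 8-bit samples (real G.711 codecs use a segmented approximation of this curve), and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # twice the 4000 Hz filter cutoff (Nyquist theorem)
BITS_PER_SAMPLE = 8     # assumption: 8 bits per sample, as in G.711
MU = 255                # assumption: mu value used by North American mu-law


def mu_law_compress(x: float) -> float:
    """Map a linear amplitude in [-1.0, 1.0] onto a logarithmic scale in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def quantize(x: float, levels: int = 2 ** BITS_PER_SAMPLE) -> int:
    """Round a companded amplitude in [-1.0, 1.0] to the nearest of `levels` steps."""
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step)


def pcm_bit_rate() -> int:
    """Bits per second produced by sampling at 8000 Hz with 8-bit samples."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


# A quiet sample (1 percent of full scale) is stretched across far more of the
# scale after companding, so it is measured more accurately than on a linear scale.
quiet_linear = 0.01
quiet_companded = mu_law_compress(quiet_linear)
```

Multiplying the sampling rate by the sample size gives the 64-kbps rate of a single uncompressed voice channel.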
Just as a router can make decisions about how packets should be forwarded through a network (for example, based on an IP address), a telephone switch makes call routing decisions (for example, based on a dialed telephone number) for forwarding a voice call through a telephony network. Although the PSTN contains a series of telephone switches (sometimes referred to as central office [or CO] switches), organizations can have their own telephone switches. An example of a privately owned switch is a Private Branch Exchange (PBX). Although a PBX does not scale to the degree a PSTN switch does, a PBX does offer enhanced telephone features to organizations (for example, call hold, conferencing, transferring, music on hold, call forwarding, call park, and voice mail). Also, PBX vendors often use their own proprietary call signaling protocols, whereas PSTN switches use standards-based signaling protocols.
Telephone switches are connected with the following trunk types:
- Tie trunk—Tie trunks interconnect PBXs.
- PBX-to-CO trunk—PBX-to-CO trunks (sometimes just called “CO trunks”) connect an organization’s PBX to the PSTN.
- Interoffice trunk—Interoffice trunks connect the CO switches that make up the PSTN.
- Local loop—A local loop is the connection from a CO switch to a telephony device at the subscriber’s location (for example, an analog phone in a residence).
Telephone switches use various forms of signaling to set up, maintain, monitor, and tear down calls. The three fundamental categories of signaling are as follows:
- Supervisory signaling—Supervisory signaling allows, for example, a telephone switch to determine whether an attached phone is in the on-hook or off-hook condition. Sending ringing voltage to a phone is another example of supervisory (also called “supervision”) signaling.
- Address signaling—Address signaling is used to transmit dialed digits (for example, dual-tone multifrequency [DTMF] tones generated when you press keypad buttons on a phone).
- Information signaling—Information signaling provides feedback about the state of a call to the caller. For example, if you call someone and hear a busy signal, the busy signal is information signaling letting you know that the called party is not available.
Signaling information can be communicated over analog or digital connections. Common types of analog signaling include the following:
- Loop start signaling—A traditional home phone is an example of a phone that uses loop start signaling. When you pick up the handset, loop current begins to flow, telling the telephone switch that the phone is off-hook. However, loop start signaling can suffer from glare, where someone is calling you and you pick up the handset to place a call before you hear the phone ring. You expect to hear dial tone, but instead you hear the calling party. Although this might occur infrequently in a home environment, because a PBX shares lines, the use of loop start signaling could lead to excessive glare in PBX environments.
- Ground start signaling—Ground start signaling prevents glare. Therefore, ground start signaling is preferred for PBXs, as opposed to loop start.
- E&M (ear and mouth)—E&M signaling (sometimes called “recEive and transMit” or “earth and magneto”) is used to connect PBXs. Whereas loop start and ground start each use two wires (that is, tip and ring) to carry both voice and signaling, E&M uses separate wires (that is, the E and M wires) for signaling, while still using the tip and ring wires to carry voice.
Although analog connections might be appropriate for lower port densities, if you need many connections coming into a PBX or between PBXs, digital circuits often offer a more cost-effective alternative. The two main types of digital signaling are as follows:
- Channel-associated signaling (CAS)—Consider a T1 circuit. A T1 is a digital circuit with 24 64-kbps channels. With CAS, all 24 channels can be used to carry voice traffic. The signaling information is transmitted in band by taking over (that is, “robbing”) the least significant bit of each channel’s voice sample in every sixth frame of a T1 superframe (12 frames) or extended superframe (24 frames). Because these bits are borrowed from the voice samples and used for signaling instead, CAS on a T1 is sometimes called “robbed bit signaling.”
- Common channel signaling (CCS)—With CCS, one or more channels in a digital circuit (for example, a T1) are used solely to carry signaling information. Therefore, with most T1 CCS implementations, the T1 circuit can carry 23 voice calls, with the twenty-fourth channel used to carry signaling information. ISDN is an example of CCS. ISDN sends voice, data, and video traffic in bearer channels (that is, B channels), with the signaling being carried in a D channel.
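The channel arithmetic behind CAS and CCS on a T1 can be sketched as follows. The constant and function names are hypothetical, and the 8-kbps framing overhead (one framing bit per frame, 8000 frames per second) is a standard T1 figure that the text above does not state.

```python
CHANNEL_RATE_BPS = 64_000   # each T1 channel carries 64 kbps
T1_CHANNELS = 24
T1_FRAMING_BPS = 8_000      # assumption: 1 framing bit x 8000 frames per second


def t1_voice_channels(signaling: str) -> int:
    """Return how many T1 channels remain available for voice calls."""
    if signaling == "CAS":
        return T1_CHANNELS        # signaling rides inside the voice channels
    if signaling == "CCS":
        return T1_CHANNELS - 1    # one channel is dedicated to signaling
    raise ValueError(f"unknown signaling type: {signaling}")


def t1_line_rate_bps() -> int:
    """Total T1 line rate: 24 channels plus framing overhead."""
    return T1_CHANNELS * CHANNEL_RATE_BPS + T1_FRAMING_BPS
```

For example, `t1_voice_channels("CCS")` returns 23, matching the 23 voice calls described for most T1 CCS implementations.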
Some PBX vendors use their own proprietary signaling protocols. Therefore, connecting PBXs in a mixed-vendor environment can be a challenge. However, many PBX vendors support the Q Signaling protocol (that is, QSIG), which allows PBXs from different vendors to communicate with one another. Similarly, CO switches also have a common signaling protocol, called Signaling System 7 (SS7).
Just as data networks can benefit from hierarchical IP addressing, telephony networks often benefit from a hierarchical numbering plan. A numbering plan is a set of rules that dictate how telephone numbers are assigned and how voice calls are routed. For example, consider the North American Numbering Plan (NANP). NANP numbers use a numbering format of NXX NXX XXXX, where N can be any digit from 2 through 9 and X can be any digit from 0 through 9. Notice the first NXX. In North America, this digit pattern is an area code. The next NXX pattern represents the local office code, and the final XXXX pattern represents the subscriber’s number. Notice that in North America, neither an area code nor an office code can begin with a 0 or a 1.
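The NXX NXX XXXX format translates directly into a regular expression. The following sketch (with a hypothetical `parse_nanp` helper) accepts the three groups separated by optional spaces or hyphens and rejects area codes or office codes that begin with 0 or 1.

```python
import re
from typing import Dict, Optional

# N = 2 through 9, X = 0 through 9; separators between groups are optional.
NANP_PATTERN = re.compile(r"^([2-9][0-9]{2})[ -]?([2-9][0-9]{2})[ -]?([0-9]{4})$")


def parse_nanp(number: str) -> Optional[Dict[str, str]]:
    """Split a NANP number into its parts, or return None if it is not valid."""
    match = NANP_PATTERN.match(number.strip())
    if match is None:
        return None
    area_code, office_code, subscriber = match.groups()
    return {
        "area_code": area_code,
        "office_code": office_code,
        "subscriber": subscriber,
    }
```

A call-routing rule could then match on the area code first and the office code second, which is what makes the plan hierarchical.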
Integrating Voice Architectures
Packet telephony network designers must familiarize themselves with new terms and standards not typically encountered in data network design. Specifically, designers need an understanding of integrated voice architecture concepts, standards, and design challenges.
Traditionally, organizations kept their voice, data, and video networks separate. As a result, a data burst on the data network had no adverse effect on voice traffic. However, with the advent of higher bandwidth, more reliable, quality of service (QoS)-enabled networks, network designers are beginning to see the wisdom of combining voice, data, and video on the same converged network.
The two primary approaches to sending voice over a data network are as follows:
- VoIP—VoIP networks allow traditional telephony devices (for example, analog phones, PBXs, key systems, and the PSTN) to attach to a voice-enabled router. The router packetizes the voice and signaling traffic from the traditional network and transports that traffic over an IP network.
- IP telephony—An IP telephony network, like a VoIP network, transmits voice and signaling traffic in IP packets. However, the distinction between an IP telephony network and a VoIP network is that an IP telephony network includes IP-based voice devices (for example, IP phones that contain an Ethernet port and connect directly to a network).
Both VoIP and IP telephony networks require gateways to convert voice and signaling information between the traditional telephony environment (such as a PBX or the PSTN) and the IP environment. These gateways communicate using gateway control protocols (sometimes called call control protocols).
The most mature of the gateway control protocols is H.323. The H.323 standard not only defines a suite of protocols, but it also includes hardware specifications for physical components in an H.323 network. Among the H.323 protocols used for call setup are the following:
- H.225.0—The H.225.0 protocol (often written as H.225) has two functions. H.225.0 can use TCP to send the initial call setup message between two H.323 endpoints. Also, H.225.0 can use User Datagram Protocol (UDP) for communication with an H.323 gatekeeper (which can be used to resolve phone numbers to IP addresses and to permit or deny a call attempt, based on bandwidth availability).
- H.245—When the H.225.0 protocol initiates the call setup process between two H.323 endpoints, the H.245 protocol negotiates the parameters of the call (for example, how the voice will be encoded and which UDP ports to use when sending voice traffic).
H.323 hardware specifications include the following:
- Terminal—An H.323 terminal acts as an endpoint in a call (for example, a user’s PC running H.323-enabled software).
- Gateway—An H.323 gateway converts voice and signaling information between different environments (for example, the traditional telephony environment and the IP environment).
- Gatekeeper—Two of the most important jobs of an H.323 gatekeeper are the following:
1. Number resolution—H.323 uses IP addresses to set up calls. However, users typically dial phone numbers rather than specify IP addresses. The gatekeeper can perform phone number to IP address resolution.
2. Admission control—If too many calls are simultaneously placed over an IP WAN, the quality of all calls suffers. Fortunately, an H.323 gatekeeper can be used to reject a call attempt if that call would oversubscribe the IP WAN’s available bandwidth.
- Multipoint control unit (MCU)—H.323 networks support conference calls. However, processing power is required to mix together multiple audio streams. An H.323 MCU can perform that mixing.
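The two gatekeeper jobs described above, number resolution and admission control, can be modeled with a small class. This is a conceptual sketch only: it does not implement any H.323 messages, the class and method names are hypothetical, and the per-call bandwidth and addresses are illustrative values.

```python
from typing import Dict, Optional


class Gatekeeper:
    """Toy model of a gatekeeper: resolves numbers and limits calls on an IP WAN."""

    def __init__(self, wan_capacity_kbps: float, per_call_kbps: float) -> None:
        self.wan_capacity_kbps = wan_capacity_kbps
        self.per_call_kbps = per_call_kbps
        self.active_calls = 0
        self.directory: Dict[str, str] = {}

    def register(self, number: str, ip_address: str) -> None:
        """Record which IP address answers for a phone number."""
        self.directory[number] = ip_address

    def resolve(self, number: str) -> Optional[str]:
        """Number resolution: phone number to IP address, or None if unknown."""
        return self.directory.get(number)

    def admit(self, dialed_number: str) -> Optional[str]:
        """Admission control: return the destination IP only if the call fits."""
        ip_address = self.resolve(dialed_number)
        if ip_address is None:
            return None
        needed_kbps = (self.active_calls + 1) * self.per_call_kbps
        if needed_kbps > self.wan_capacity_kbps:
            return None  # the call would oversubscribe the WAN, so reject it
        self.active_calls += 1
        return ip_address

    def release(self) -> None:
        """Free the bandwidth held by one finished call."""
        if self.active_calls > 0:
            self.active_calls -= 1
```

With a 64-kbps WAN and an assumed 26.4 kbps per call, the third simultaneous `admit()` returns `None`, so the two calls already in progress keep their quality.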
An IP telephony network has the following core components:
- Infrastructure—An IP telephony network runs on an underlying infrastructure composed of network layer switches and voice-enabled routers.
- Call processing—Cisco Unified CallManager software (available for either a Windows 2000 or Linux platform) performs PBX-like functions (for example, call routing) for an IP telephony network.
- Applications—Other than basic call setup, IP telephony networks can offer a wide variety of applications, such as unified messaging, interactive voice response, Cisco Unified Contact Center, and Auto Attendant.
- Client devices—Users interface with an IP telephony network via client devices such as Cisco IP Phones. However, a client device could be a software-based phone, such as Cisco IP Communicator.
Because many organizations have multiple locations, their IP telephony networks might span those locations. When determining how IP telephony components should be deployed, consider the following deployment models:
- Single-site deployment—If an IP telephony network is contained within a single location and has fewer than 30,000 phones, a single-site deployment model is often appropriate.
- Multisite WAN with centralized call processing deployment—Some organizations might have smaller remote sites that do not contain enough IP phones to justify the purchase of Cisco Unified CallManager (UCM) servers for those locations. In those instances, the UCM servers could be located at the headquarters, and IP phones at the remote offices could then register with the centralized UCM servers over the IP WAN. If there is an IP WAN outage, IP phones could register with the local Survivable Remote Site Telephony (SRST) routers located at each remote site, for basic call processing functionality.
- Multisite WAN with distributed call processing deployment—When designing a large IP telephony network with multiple locations, the expense of placing UCM servers at each location might be justified.
Although H.323 is a very popular gateway control protocol for IP telephony and VoIP networks, consider some of the other protocols you might encounter in IP telephony or VoIP networks:
- Real-time Transport Protocol (RTP)—Voice packets are carried inside of RTP segments. RTP is a Layer 4 protocol that is encapsulated inside UDP segments.
- Skinny Client Control Protocol (SCCP)—By default, Cisco IP Phones use SCCP to exchange signaling messages with Cisco Unified CallManager. Unlike H.323 (which is considered a peer-to-peer protocol), SCCP is considered to be a client/server protocol.
- Session Initiation Protocol (SIP)—SIP is a peer-to-peer gateway control protocol that is popular in many mixed-vendor environments. When you are adding Cisco IP telephony components to an existing third-party IP telephony network, SIP might serve as an appropriate gateway control protocol.
- Media Gateway Control Protocol (MGCP)—MGCP is a client/server gateway control protocol. In a Cisco IP telephony environment, a Cisco Unified CallManager server acts as the “server,” and a port on a router (for example, an analog Foreign Exchange Station [FXS] port) acts as the “client.”
Identifying the Requirements of Voice Technologies
When designing a network to accommodate voice traffic, consider what could impact the quality of the voice and which mechanisms might be used to maintain voice quality.
When voice and data traffic are contending for limited bandwidth, the following quality issues might arise:
- Delay—The ITU G.114 recommendation specifies a maximum one-way delay of 150 ms for voice traffic. Some types of delay are considered fixed, in that they do not change during a phone call. Examples of these fixed delay components include propagation delay (the time it takes a packet to traverse a network link), serialization delay (the time it takes to send a frame out of a serial link), and processing delay (the time required by the router to encode/decode, compress/decompress, and packetize voice).
- Jitter—Unlike fixed delay, variable delay can change during a phone call. One example of variable delay is jitter. Specifically, jitter is the uneven arrival of packets at a destination router. Cisco routers use dejitter buffers to help smooth out packet playout, thus concealing the jitter experienced by those packets. Another type of variable delay, which can contribute to jitter, is queuing delay. Queuing delay is the amount of time a packet must spend in a queue as it waits to be forwarded out of an interface.
- Packet drops—If an interface’s output queue fills to capacity, newly arriving packets might be dropped. This occurrence is called tail drop. Although digital signal processors can correct a maximum of approximately 30 ms of lost voice, additional voice packet drops can severely compromise voice quality.
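To make the delay components above concrete, the following sketch computes serialization delay and checks a set of one-way delay components against the 150-ms G.114 budget. The function names, frame sizes, and link speed are illustrative assumptions.

```python
from typing import Iterable


def serialization_delay_ms(frame_bytes: int, link_bps: int) -> float:
    """Time needed to send a frame out of a serial link, in milliseconds."""
    return frame_bytes * 8 * 1000 / link_bps


def within_g114_budget(delays_ms: Iterable[float], budget_ms: float = 150.0) -> bool:
    """Return True if the summed one-way delay components fit within the budget."""
    return sum(delays_ms) <= budget_ms


# Assumption: a 64-kbps WAN link carrying both data and voice frames.
data_frame_delay = serialization_delay_ms(1500, 64_000)   # large data frame
voice_frame_delay = serialization_delay_ms(66, 64_000)    # small voice frame
```

On this assumed link, a single 1500-byte data frame takes 187.5 ms to serialize, which by itself exceeds the 150-ms budget for any voice packet waiting behind it, while a 66-byte voice frame takes only 8.25 ms.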
Although not related to limited bandwidth, echo causes another serious problem for voice quality. You experience the symptom of echo when you speak and hear your own voice reflected back to you, or when you speak and the other party hears your voice twice. The issue of echo typically stems from an impedance mismatch in a two-wire to four-wire circuit, which can be found in an analog phone or in telephony switching equipment.
To combat echo, Cisco voice-enabled routers can use echo cancellation, which allows a voice port to “memorize” waveforms being sent out of the interface for a period of time (typically 8–32 ms). If the voice port sees the same waveform coming back in the interface within that period of time, the voice-enabled router can cancel the echo waveform by superimposing the same waveform, which has been phase-shifted 180 degrees. Silence results from playing the same waveform twice, when those waveforms are 180 degrees out of phase.
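The cancellation principle can be illustrated numerically: adding a waveform to a copy of itself that has been phase-shifted 180 degrees yields silence. The sketch below uses a pure 440-Hz tone sampled at 8000 Hz as a stand-in for the echo. It is a simplification, since a real echo canceller must also estimate the echo's delay and attenuation, which is omitted here.

```python
import math
from typing import List


def sine_wave(freq_hz: float, sample_rate_hz: int, num_samples: int,
              phase_rad: float = 0.0) -> List[float]:
    """Generate samples of a sine wave with an optional phase shift."""
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate_hz + phase_rad)
        for i in range(num_samples)
    ]


# 160 samples at 8000 Hz is 20 ms of audio.
echo = sine_wave(440, 8000, 160)
inverted = sine_wave(440, 8000, 160, phase_rad=math.pi)  # shifted 180 degrees

# Superimpose the two waveforms sample by sample.
residual = [a + b for a, b in zip(echo, inverted)]
peak_residual = max(abs(sample) for sample in residual)
```

The peak of `residual` is zero apart from floating-point rounding, even though the original tone reaches nearly full amplitude.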
However, because most quality issues on IP telephony and VoIP networks result from limited bandwidth, network designers use a variety of approaches to make the best use of this limited bandwidth, such as the following:
- Codec selection—One approach is to use a codec requiring less bandwidth per call. For example, the G.711 codec does not perform any compression, and it requires 64 kbps of bandwidth (not including overhead) for a single voice call. However, over an IP WAN, where bandwidth is at a premium, Cisco networks often use the G.729a codec, which only requires 8 kbps of bandwidth (not including overhead). Because G.729a performs compression, whereas G.711 does not, voice quality is somewhat compromised when using G.729a.
- The mean opinion score (MOS)—The MOS metric is used to measure voice quality, on a five-point scale, with larger numbers representing better quality. The G.711 codec has an MOS score of 4.1; G.729a’s MOS score is 3.9. This slight, and barely perceptible, quality difference is often an acceptable trade-off to reduce bandwidth demand.
- RTP header compression (cRTP)—When using G.729a, voice packets contain 20 bytes of voice payload, while the packet contains 40 bytes of header information. However, because most information in these headers is identical (for example, the same source/destination IP address/UDP port numbers and the same RTP payload type), cRTP does not send this redundant header information in each frame. Therefore, cRTP reduces the 40-byte header down to only 2 or 4 bytes, allowing more calls to be placed over the same link speed.
- Voice activity detection (VAD)—Statistics show that approximately 35 percent of all voice calls are silence. Instead of consuming bandwidth to send “the sound of silence,” VAD can detect the silence and suppress the transmission of silence.
Because network designers are concerned with bandwidth use, they must understand how to calculate required bandwidth. The following formula shows how to calculate a network’s required voice bandwidth:
Bandwidth = ((Layer 2 header) + (IP/UDP/RTP header) + (Voice payload size)) * (Codec bit rate) / (Voice payload size)
When working with this formula, make the following assumptions:
- IP/UDP/RTP header = 40 bytes
- With cRTP, the header = 2 or 4 bytes
- A WAN’s Layer 2 header = 6 bytes
An easier, and more detailed, bandwidth calculation can be performed using the Cisco Voice Codec Bandwidth Calculator, available at http://tools.cisco.com/Support/VBC/do/CodecCalc1.do
NOTE
Your Cisco.com account must have appropriate access permissions to reach the Voice Codec Bandwidth Calculator URL.
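As a worked example of the per-call bandwidth calculation, the sketch below scales the codec bit rate by the ratio of total packet size (Layer 2 header, IP/UDP/RTP header, and voice payload) to voice payload size. The function name is hypothetical. The 20-byte G.729a payload and the header sizes come from the text above, and the 160-byte G.711 payload is an assumption (20 ms of 64-kbps audio).

```python
def voice_bandwidth_bps(codec_bps: int, payload_bytes: int,
                        l2_header_bytes: int = 6,
                        ip_udp_rtp_header_bytes: int = 40) -> float:
    """Per-call bandwidth in bits per second, including header overhead."""
    packet_bytes = l2_header_bytes + ip_udp_rtp_header_bytes + payload_bytes
    return packet_bytes * codec_bps / payload_bytes


# G.729a: 8-kbps codec, 20-byte payload, full 40-byte IP/UDP/RTP header.
g729_bps = voice_bandwidth_bps(8_000, 20)

# G.729a with cRTP: the 40-byte header shrinks to 2 bytes.
g729_crtp_bps = voice_bandwidth_bps(8_000, 20, ip_udp_rtp_header_bytes=2)

# G.711: 64-kbps codec, assumed 160-byte payload.
g711_bps = voice_bandwidth_bps(64_000, 160)
```

Under these assumptions, a G.729a call needs 26.4 kbps, dropping to 11.2 kbps with cRTP, while a G.711 call needs 82.4 kbps.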
To combat the quality issues described earlier, you can implement various QoS mechanisms available on Cisco routers and switches. For example, on wiring closet Catalyst switches, voice and data traffic can be placed in separate queues. Also, these Catalyst switches can be configured not to trust priority markings originating from a PC connected to a Cisco IP Phone.
Router QoS mechanisms include the following:
- Classification and marking—Classifying traffic recognizes characteristics of traffic and categorizes that traffic. As an example, access control lists (ACLs) can be used to classify traffic. Once categorized, the traffic can be marked by, for example, altering bits in a packet’s header to indicate the packet’s relative level of priority.
- Congestion management—Congestion management defines the queuing algorithm used by an interface’s output queue. The queuing algorithm can specify which type of traffic receives priority treatment (that is, forwarded out of the interface ahead of other traffic) and how much bandwidth is available to various traffic types during periods of network congestion. Cisco’s recommended queuing mechanism for voice networks is low-latency queuing (LLQ).
- Congestion avoidance—To prevent an interface’s output queue from filling to capacity, after which newly arriving packets are discarded, routers can use a congestion avoidance mechanism (such as weighted random early detection [WRED]) to increase the probability that lower-priority traffic will be discarded as the queue begins to fill.
- Traffic conditioning—Traffic-conditioning mechanisms (for example, policing and shaping) limit the amount of bandwidth that can be consumed by specific traffic types.
- Link efficiency—Link-efficiency mechanisms, such as link fragmentation and interleaving (which fragments larger packets and interleaves voice packets in among the fragmented data packets, thus reducing the serialization delay experienced by the voice traffic) and RTP header compression, attempt to make the most efficient use of limited WAN bandwidth.
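The congestion avoidance behavior described above can be sketched as a drop-probability function. This is a simplified model of WRED, not router code: the probability is zero below a minimum threshold, rises linearly to 1 / (mark probability denominator) at the maximum threshold, and becomes 1 once the average queue depth exceeds the maximum threshold. The threshold values are hypothetical.

```python
def wred_drop_probability(avg_queue_depth: float, min_threshold: float,
                          max_threshold: float,
                          mark_prob_denominator: int) -> float:
    """Probability that a newly arriving packet is discarded."""
    if avg_queue_depth < min_threshold:
        return 0.0                      # queue is short: never drop
    if avg_queue_depth > max_threshold:
        return 1.0                      # queue is too deep: drop everything
    fraction = (avg_queue_depth - min_threshold) / (max_threshold - min_threshold)
    return fraction / mark_prob_denominator


# Hypothetical profiles: lower-priority traffic starts being dropped earlier.
LOW_PRIORITY = {"min_threshold": 20, "max_threshold": 40, "mark_prob_denominator": 10}
HIGH_PRIORITY = {"min_threshold": 30, "max_threshold": 40, "mark_prob_denominator": 10}

low_drop = wred_drop_probability(30, **LOW_PRIORITY)
high_drop = wred_drop_probability(30, **HIGH_PRIORITY)
```

At an average depth of 30 packets, the low-priority profile already discards about 5 percent of arrivals while the high-priority profile discards none, which is how lower-priority traffic becomes more likely to be dropped as the queue begins to fill.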
As mentioned earlier, if too many simultaneous calls are sent across an IP WAN, and the IP WAN becomes oversubscribed, all calls experience poor voice quality. Therefore, IP telephony and VoIP networks require call admission control (CAC) tools to prevent this oversubscription. One approach to CAC is to use the previously described gatekeeper. Another approach is to use the Resource Reservation Protocol (RSVP). With RSVP, a Cisco voice-enabled router, or a Unified CallManager server (Version 5.0 or later), can reserve network bandwidth for a voice call that no other application can encroach on, thus preventing IP WAN oversubscription.
Because most QoS issues described result from insufficient bandwidth, a network designer needs to provision enough bandwidth to support projected traffic loads during a network’s busiest hour of the day. The process of calculating the required amount of bandwidth is called traffic engineering. The concept of traffic engineering dates back to PBX design, where designers needed to calculate the number of trunks between a PBX and the local CO. With IP telephony and VoIP networks, you take traffic engineering a step further by converting the calculated number of trunks into a bandwidth amount.
Although the mathematics behind traffic engineering can be quite rigorous, the following steps present a simplified approach:
- Determine the grade of service (GoS).
Because designing a voice network with enough trunks to prevent any incoming calls from receiving a busy signal is typically not cost effective, the designer must determine what percentage of calls can be rejected (that is, receive a busy signal) during the busiest hour of the day for an organization’s telephone system. This percentage is called the grade of service, or GoS. Most designs use a GoS of 1 percent, which is written P(.01).
- Determine the busy hour traffic (BHT).
The call volume experienced by an organization’s telephone system (for example, a PBX) is measured in Erlangs, where an Erlang equals one solid hour of phone usage. Statistically, the number of Erlangs a corporate phone system experiences during the busiest hour of the day can be approximated by getting the number of hours of phone use during the previous month from your organization’s telephone bill and using the following formula:
Busy hour Erlangs = [Monthly_call_hours / 22] * .15
- Calculate the number of required trunks.
After you have determined the GoS and the number of Erlangs experienced during an organization’s busiest hour of the day, you can determine the number of required trunks (that is, simultaneous connections) by referring to an Erlang B table or by using a web-based Erlang B calculator, such as the one available at http://erlang.com/calculator/erlb.
- Convert the number of required trunks to the amount of required bandwidth.
Use the Cisco Voice Codec Bandwidth Calculator, as described earlier, to convert the number of required trunks into the amount of required bandwidth.
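The first three steps can also be worked through in code instead of with a table. The sketch below uses the standard Erlang B recursion to find the smallest number of trunks that meets a given GoS. The function names are hypothetical, and reading the formula's 22 as business days per month and its .15 as the share of a day's traffic that falls in the busy hour is an interpretation, not something the text states.

```python
def busy_hour_erlangs(monthly_call_hours: float, business_days: int = 22,
                      busy_hour_fraction: float = 0.15) -> float:
    """Approximate busy hour traffic from a month of call hours."""
    return monthly_call_hours / business_days * busy_hour_fraction


def erlang_b(offered_erlangs: float, trunks: int) -> float:
    """Blocking probability for the offered load on a given number of trunks."""
    blocking = 1.0  # with zero trunks, every call is blocked
    for n in range(1, trunks + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def required_trunks(offered_erlangs: float, gos: float = 0.01) -> int:
    """Smallest trunk count whose blocking probability does not exceed the GoS."""
    trunks = 0
    while erlang_b(offered_erlangs, trunks) > gos:
        trunks += 1
    return trunks


# Example: 10 Erlangs of busy hour traffic at a GoS of P(.01).
trunks_needed = required_trunks(10.0, gos=0.01)
```

For 10 Erlangs at P(.01), `required_trunks` returns 18. Multiplying that trunk count by a per-call bandwidth figure then gives the bandwidth to provision.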
More Resources