Cisco Network Mgmt Protocol FAQ: Management Metrics

Q1. In what ways can network management impact a service provider’s business?

Answer: It affects the cost of ownership of a network, revenues that can be generated from communication services, network availability, and the quality of the communication services that are offered.

Figure: Network Management Business Impact

Q2. Give an example of a network management technology or feature that can affect revenue.

Answer: Service provisioning systems are one example. They enable faster rollout of services and hence allow sooner collection of revenues from customers. A second example concerns service level monitoring, which enables you to offer higher service level guarantees, for which a higher price can be charged. A third example is offering unified billing and service statistics as a premium feature of a service itself, which might attract additional customers.

Q3. Give an example of a network management technology that can affect availability.

Answer: Here are a few examples. Performance trending combined with threshold-crossing alerting brings impending problems to the attention of network managers, who might be able to take preventive action before availability is affected. Well-designed user interfaces reduce the likelihood that operators err during configuration changes; as many as half of all network outages can be attributed to preventable user error, so a reduction in user errors implies an increase in availability. Event correlation enables network managers to spot the root cause of problems more quickly, which leads to faster repairs.
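
As an illustration of performance trending combined with threshold-crossing alerting, the following is a minimal sketch that fits a straight line to equally spaced utilization samples and estimates how many sampling periods remain before the trend reaches a threshold. The function name, the linear-trend assumption, and the sample values are hypothetical; a real system would likely use a more robust forecasting method.

```python
from typing import Optional, Sequence


def periods_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many sampling periods remain until the trend hits the threshold.

    Fits a least-squares line to equally spaced samples (an assumption).
    Returns 0.0 if the latest sample already meets the threshold, and
    None if the trend is flat or falling, so no crossing is predicted.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    if samples[-1] >= threshold:
        return 0.0

    # Ordinary least squares with x = 0, 1, ..., n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    # Express the result relative to the most recent sample.
    return max(0.0, crossing_x - (n - 1))


# Hypothetical link utilization (percent), one sample per day.
remaining = periods_until_threshold([50, 55, 60, 65, 70], threshold=90)
```

With these sample values the trend rises by 5 points per day, so the sketch predicts the 90 percent threshold will be reached 4 days after the latest sample, which leaves time for preventive action.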

Q4. At what levels can the effectiveness of network management be assessed?

Answer: Network management effectiveness can be assessed at the following levels: the managed technology itself, which is subsumed under the term manageability; management applications and operations support infrastructure; and the management organization that makes use of this infrastructure.

Q5. Name two different contexts in which an elaborate GUI of a management application does very little to increase management effectiveness.

Answer: One context is that of a management application that is driven primarily through its northbound interface from other applications in an operations support environment, not by network managers sitting in front of a screen. The second context is that of an application that is used by power users who are more efficient typing command shortcuts than performing point-and-click operations in a GUI, which slows them down.

Q6. Imagine that you have decided to invest in the development of custom rules for your event correlation system. The goal is to improve automated diagnosis of failures in the network. The development can be carried out by one consultant who can incrementally introduce new rules into the system, which will make the system gradually more effective. You decide to give it a try, but you want to make continuation of the project dependent on it indeed fulfilling your expectations as you go along. Can you think of some metrics to use to assess whether the system fulfills expectations?

Answer: The metrics should indicate whether the quality and accuracy of diagnosis is actually improving. Possible metrics might include mean time to repair (which should get shorter), the average number of steps an operator must perform to diagnose a problem (which should get smaller), and the percentage of events that are correctly diagnosed and attributed to a root cause by the system without requiring further operator intervention (which should increase).
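
One way to track these three metrics from incident records is sketched below. The Incident record, its field names, and the sample data are hypothetical; the sketch simply compares two reporting periods and checks that each metric moves in the expected direction.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    minutes_to_repair: float
    operator_steps: int
    auto_diagnosed: bool  # root cause found without operator intervention


def diagnosis_metrics(incidents: Sequence[Incident]) -> Dict[str, float]:
    """Compute the three metrics named in the answer for one period."""
    if not incidents:
        raise ValueError("need at least one incident")
    return {
        "mean_time_to_repair_min": mean(i.minutes_to_repair for i in incidents),
        "mean_operator_steps": mean(i.operator_steps for i in incidents),
        "auto_diagnosed_pct": 100.0 * sum(i.auto_diagnosed for i in incidents) / len(incidents),
    }


def is_improving(before: Sequence[Incident], after: Sequence[Incident]) -> bool:
    """True if repair time and operator steps fell and auto-diagnosis rose."""
    b, a = diagnosis_metrics(before), diagnosis_metrics(after)
    return (
        a["mean_time_to_repair_min"] < b["mean_time_to_repair_min"]
        and a["mean_operator_steps"] < b["mean_operator_steps"]
        and a["auto_diagnosed_pct"] > b["auto_diagnosed_pct"]
    )


# Hypothetical data: the period before and the period after new rules were added.
before = [Incident(120, 8, False), Incident(90, 6, False)]
after = [Incident(60, 4, True), Incident(80, 5, False)]
continue_project = is_improving(before, after)
```

Requiring all three metrics to improve is a deliberately strict choice for this sketch; in practice you might weight them or tolerate one metric staying flat.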

Q7. Imagine that you need to decide whether to invest in a new service provisioning system. That system is expected to carry a significant price tag—in particular, when taking into account the cost of integrating the new system with the existing operations support infrastructure. What metrics might help you decide whether this is a worthwhile investment, and how would you do the math to arrive at a go/no-go decision?

Answer: You might consider the number of truck rolls (per service order), the mean time to fill customer service orders, and the average size of customer order backlog as useful metrics. What needs to be compared is the value that those metrics have today versus the value that those metrics are expected to have in the future with the new system in place. This difference needs to be translated into a monetary value. For example:

The monetary value yielded by the reduction in truck rolls can be calculated as follows:
Let rtrso be the expected reduction in the number of truck rolls per service order, ctr the cost per truck roll, and nso the number of service orders per month. The value per month of the reduction in truck rolls is rtrso × ctr × nso.
The monetary value yielded by the reduction in mean time to fill service orders can be calculated as follows:
Let rmtso be the expected reduction in the mean time to fill a service order, rso the revenue per service order that can be collected during a time period of length rmtso, cso the cost of the resources that are required to provide service during a time period of length rmtso, and nso the number of service orders per month. The value per month of the reduction in mean time to fill service orders is (rso – cso) × nso.
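
The two formulas above can be written out as a short sketch. The function names and all input values are hypothetical and serve only to illustrate the arithmetic; the variable names mirror rtrso, ctr, nso, rso, and cso from the text.

```python
def truck_roll_value_per_month(r_trso: float, c_tr: float, n_so: float) -> float:
    """Monthly value of fewer truck rolls: r_trso x c_tr x n_so."""
    return r_trso * c_tr * n_so


def faster_fill_value_per_month(r_so: float, c_so: float, n_so: float) -> float:
    """Monthly value of filling service orders sooner: (r_so - c_so) x n_so."""
    return (r_so - c_so) * n_so


# Hypothetical inputs: half a truck roll saved per order at 200 per roll,
# and 30 in extra revenue against 10 in extra cost per order,
# with 1000 service orders per month.
truck_value = truck_roll_value_per_month(r_trso=0.5, c_tr=200.0, n_so=1000)
fill_value = faster_fill_value_per_month(r_so=30.0, c_so=10.0, n_so=1000)
total_monthly_benefit = truck_value + fill_value
```

With these assumed inputs, the truck roll reduction is worth 100,000 per month and the faster order fulfillment 20,000 per month, for a combined benefit of 120,000 per month.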

The combined benefits are then weighed against the cost of the system, using standard business metrics to assess the return on investment. One simple measure is the time until the investment pays for itself, which is basically the cost of the system divided by the additional revenue and operational cost savings per month. Of course, these are back-of-the-envelope calculations. To increase their accuracy, other factors (such as interest cost and other costs associated with the additional revenue) need to be factored into the equation.
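
The payback calculation can be sketched as follows. This is a back-of-the-envelope version that ignores interest and other costs, as noted above; the function names, the cost figures, and the acceptable payback window are all hypothetical.

```python
def payback_months(system_cost: float, monthly_benefit: float) -> float:
    """Months until the system pays for itself (ignores interest cost)."""
    if monthly_benefit <= 0:
        raise ValueError("with no monthly benefit the investment never pays for itself")
    return system_cost / monthly_benefit


def is_go(system_cost: float, monthly_benefit: float, max_months: float) -> bool:
    """Go/no-go: accept if payback falls within the acceptable window."""
    return payback_months(system_cost, monthly_benefit) <= max_months


# Hypothetical figures: total cost including integration, combined monthly
# benefit, and a 12-month acceptable payback window.
months = payback_months(system_cost=600_000, monthly_benefit=120_000)
decision = is_go(system_cost=600_000, monthly_benefit=120_000, max_months=12)
```

With these assumed figures the system pays for itself in 5 months, which is inside the 12-month window, so the sketch returns a go decision.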

Q8. What is the difference between MTBF and availability?

Answer: Availability denotes the percentage of time during which the service or system is available (that is, functioning properly and capable of being used) over a time period. Mean time between failures (MTBF) denotes the average time that elapses between failures and therefore reflects how frequently failures occur, without regard to how long the failures last. Hence, it is possible to have a service with high availability but low MTBF: it fails often but recovers quickly each time, so the overall user experience can still be pretty bad despite the high availability.
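
The difference can be made concrete with a small sketch that computes both figures from a list of outage durations. The function names and outage data are hypothetical, and MTBF is computed here as total uptime divided by the number of failures, which is one common convention.

```python
from typing import Sequence


def availability_pct(period_min: int, outages_min: Sequence[int]) -> float:
    """Percentage of the period during which the service was up."""
    return 100 * (period_min - sum(outages_min)) / period_min


def mtbf_min(period_min: int, outages_min: Sequence[int]) -> float:
    """Mean time between failures: uptime divided by number of failures."""
    if not outages_min:
        return float("inf")  # no failures observed in the period
    return (period_min - sum(outages_min)) / len(outages_min)


PERIOD = 30 * 24 * 60  # a 30-day month, in minutes

# Hypothetical services with the same total downtime (432 minutes).
service_a = [432]      # one long outage
service_b = [6] * 72   # seventy-two short outages

avail_a, mtbf_a = availability_pct(PERIOD, service_a), mtbf_min(PERIOD, service_a)
avail_b, mtbf_b = availability_pct(PERIOD, service_b), mtbf_min(PERIOD, service_b)
```

Both hypothetical services come out at 99 percent availability, but service B fails on average every 594 minutes (roughly every 10 hours), whereas service A fails once in the whole month.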

Q9. Which three aspects determine the complexity of operational tasks?

Answer: The three aspects are: execution complexity, which is determined largely by the number of steps of the task; parameter complexity, which is determined by the number of parameters required in those steps and the ease of obtaining them; and memory complexity, which involves how much data a user must retain in memory at any one time during the task.

Q10. A management application vendor boasts about the scale of its management system, claiming that it can support 10 million managed objects. What are some questions that you might want to ask in return?

Answer: What do you consider a managed object, and how does it translate to managed network size? For example, would you require one managed object per managed device, or per configuration parameter, or per port? How long does it take you to initially populate the system with those 10 million objects? How long does it take you to synchronize your 10 million objects with the network, and what is the maximum time lag that a managed object can have until it reflects changes in the managed resource that the managed object represents? How many events can your system process, and is this sufficient to keep up with the managed network size that corresponds to 10 million objects without dropping events?
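
To see why the first question matters, the sketch below translates an object capacity into a managed network size under three modeling granularities. The function name, the granularity labels, and the figures of 48 ports per device and 20 parameters per port are hypothetical assumptions chosen for illustration.

```python
def max_devices(object_capacity: int, ports_per_device: int,
                params_per_port: int, granularity: str) -> int:
    """Number of devices that fit into the object capacity.

    granularity is one of (hypothetical labels):
      "device"    - one managed object per device
      "port"      - one object per device plus one per port
      "parameter" - as "port", plus one object per port configuration parameter
    """
    objects_per_device = {
        "device": 1,
        "port": 1 + ports_per_device,
        "parameter": 1 + ports_per_device + ports_per_device * params_per_port,
    }[granularity]
    return object_capacity // objects_per_device


CAPACITY = 10_000_000  # the vendor's claimed number of managed objects

by_device = max_devices(CAPACITY, 48, 20, "device")
by_port = max_devices(CAPACITY, 48, 20, "port")
by_parameter = max_devices(CAPACITY, 48, 20, "parameter")
```

Under these assumptions, the same 10 million objects correspond to 10,000,000 devices, 204,081 devices, or only 9,910 devices, depending on what the vendor counts as a managed object.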
