3.2.4 Jitter
Jitter is the variation in the delays that the receiver experiences. Jitter is a nuisance that the user does not hear directly, because phones employ a jitter buffer to correct for any delays. Jitter can be defined in a number of ways. One way is to use the standard deviation or maximum deviation around the mean delay per packet. Another way is to use the known arrival interval (such as 20ms): take the gaps between the arrivals of consecutive packets that were not lost, subtract the known interval from each, then take the standard deviation or the maximum deviation. Either way, the jitter, measured as a time or as a percentage of the mean, tells how variable the network is.
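The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are invented here, and the second function assumes its input holds consecutive packets with none lost.

```python
from statistics import mean, pstdev

def jitter_around_mean(delays_ms):
    """Standard deviation and maximum deviation of per-packet delay around the mean."""
    avg = mean(delays_ms)
    deviations = [abs(d - avg) for d in delays_ms]
    return pstdev(delays_ms), max(deviations)

def jitter_from_arrivals(arrival_times_ms, interval_ms=20.0):
    """Deviation of each inter-arrival gap from the known sending interval.

    Assumes arrival_times_ms holds consecutive packets with none lost.
    """
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    errors = [gap - interval_ms for gap in gaps]
    return pstdev(errors), max(abs(e) for e in errors)

# Hypothetical one-way delays and arrival times, in milliseconds.
delay_std, delay_max = jitter_around_mean([40.0, 42.0, 38.0, 55.0, 40.0])
gap_std, gap_max = jitter_from_arrivals([0.0, 20.0, 45.0, 60.0, 80.0])
```

The first form needs true one-way delays, which require synchronized clocks; the second needs only the receiver's own clock, which is why it is often the more practical of the two.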
Jitter is introduced by variable queuing delays within network equipment. Phones and PBXs are well known for having very regular transmission intervals. However, the intervening network may have variable traffic. As the queue depths change and the network loads fluctuate, and as contention-based media such as Wi-Fi links clog with density, packets are forced to wait. Wireless networks are the biggest culprit for introducing delay into an enterprise private network. This is because wireless packets can be lost and retransmitted, and the time it takes to retransmit a packet is usually on the order of a millisecond.
A jitter buffer’s job is to sit on the receiver and prevent the jitter from causing an underrun of the voice decoder. An underrun is an awkward period of silence that happens when the phone has finished playing the previous packet and needs another packet to play, but one has not yet arrived. These underruns count as a form of error or loss, even if every packet does make it to the receiver, and loss concealment will work to disguise them. The problem with jitter is that an underrun must be followed by an increase in delay of the same amount, assuming no packets are lost. This can be seen by realizing that the delayed packet will hold up the line for the packets behind it.
Here, the value of the jitter buffer can be seen. The jitter buffer lets the receiver build up a slight delay in the output. If this delay is greater than the amount of actual jitter on the network, the jitter buffer will be able to smooth things out without underrunning. In this sense, the jitter buffer converts jitter directly into delay. If the jitter becomes too large, the jitter buffer may run out of room and start dropping earlier samples in the buffer to let the call catch up to be closer to real time. In this way, the jitter buffer can convert the jitter directly into loss.
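The trade between delay and loss can be illustrated with a toy model. The sketch below assumes a fixed playout delay, which is only one of several possible jitter buffer designs; the function name, the 20ms interval default, and the sample delays are all invented for illustration.

```python
def simulate_fixed_jitter_buffer(network_delays_ms, buffer_ms, interval_ms=20.0):
    """Count packets that miss their playout deadline under a fixed playout delay.

    Packet i is sent at i * interval_ms and arrives network_delays_ms[i] later.
    Playout of packet 0 is held back by buffer_ms after it arrives, and every
    later packet is played at a fixed interval from that point.
    """
    first_playout = network_delays_ms[0] + buffer_ms
    late = 0
    for i, delay in enumerate(network_delays_ms):
        arrival = i * interval_ms + delay
        deadline = first_playout + i * interval_ms
        if arrival > deadline:
            late += 1  # underrun: the decoder had nothing to play in time
    return late

# Hypothetical per-packet network delays in milliseconds.
delays = [40.0, 45.0, 70.0, 42.0, 95.0, 41.0]
late_by_depth = {depth: simulate_fixed_jitter_buffer(delays, depth)
                 for depth in (0.0, 30.0, 60.0)}
```

With these sample values, a deeper buffer means fewer late packets, at the cost of adding the buffer depth to the end-to-end delay of every packet.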
Because jitter is always converted into delay first, then loss, it does not have a direct impact on the E-model by itself, but instead can be folded into the other measures. However, a complication arises because the user or administrator does not usually know the exact parameters of the jitter buffer. How many samples, or how much delay, will the jitter buffer take before it starts to drop audio? Does the jitter buffer start off with a fixed delay? Does it build up the delay as jitter forces it to? Or does it try to proactively build in some delay, which can grow or shrink as the underruns occur? These all have an impact on the E-model call quality.
www.newnespress.com
As a result, a rule of thumb here is to match the jitter tolerance to the delay tolerance. The network, at least, should not introduce more than 50ms of jitter.
3.2.5 Non-IP Effects that Should Be Kept in Mind
The E-model makes plenty of room for non-IP effects on voice quality, and we would be wise to consider them here, even though the previous few sections chose to focus only on the network effects.
As mentioned earlier, echo is a problem to be tackled whenever calls are being tied together in conference bridges or are traversing multiple media gateways. Analog lines introduce the problem of noise, as well as volume or gain control. Some analog lines may be tuned softer than others. Most of this requires reasonable end-to-end testing, however.
Then there are the intangibles. Is the network provisioned well enough that calls go through or are held predictably and reliably? Is the voice mobility network laid out well enough that users know that every point in the campus is a hot spot, or are some areas weak or dead? Cellular companies build entire marketing campaigns on the premise of the importance of coverage and dropped calls. (The number of bars on the phone or people standing behind the spokesman are both powerful examples of how important the predictability of the call quality is to callers.) This same concern needs to be applied to voice mobility networks produced within the enterprise. No amount of modeling will answer how much tolerance exists, but the general consensus is that voice mobility networks must work better than the cellular networks when the callers are in the office. Mobility within the office does not generally count as a factor that can be used to increase the acceptance of the quality of the calls, and although mobility is a tremendous driving force to achieve higher productivity and less frustration, it is the sort of benefit that is hardly noticed until it is gone.
Keep in mind that, in many cases, the codec chosen can make an immediate ten-point difference in the R-value.
3.3 How to Measure Voice Quality Yourself
The final section in this chapter is concerned with the ways in which administrators of voice mobility networks can directly ascertain the quality of the network.
3.3.1 The Expensive, Accurate Approach: End-to-End Voice Quality Testers
As mentioned in the discussion of PESQ (Section 3.1.2), existing tools can measure the quality of the voice network by directly pumping in prerecorded voice samples and comparing the output. These tools are either expensive or home-grown, and are used to test large networks as part of a planning or predeployment phase.
This sort of testing is more of a tuning exercise. Much as piano tuning is a rare and complicated enough exercise that it is not performed frequently, direct end-to-end testing is not diagnostic. Telephone equipment testing companies do make the sort of equipment to perform this end-to-end inspection, and these tools can be rented. Unfortunately, it is very difficult to know where to invest in this sort of heavily proactive effort.
More likely, the voice quality is measured by having administrators walk around the network with some number of the phones in question, assuring themselves that whatever problems they may face will likely be manageable. The problem with both forms of proactive testing is that they normally occur on only lightly loaded networks, and thus are not able to measure the effect of network load on voice quality. Network load generally has the largest impact on voice quality, in fact, partly because voice mobility network managers do a good job of testing their networks for basic problems before they launch them, and quickly correct those problems, and partly because voice mobility networks are more likely to be robust enough out of the box for basic voice connectivity.
3.3.2 Network Specific: Packet Capture Tests
Most of the major packet capture tools, for wireline and for wireless, offer modules that are able to indirectly infer the MOS values using E-model calculations. Sometimes, these work by tracing the voice setup protocols, such as SIP, and determining what RTP flows map to phone calls and the properties of the phone calls. Other times, these tools will just look directly at the RTP streams, and not try to find out what phone numbers the streams map to. In both cases, the tools then use the sequence number and timestamp fields in the RTP stream to determine values such as loss, delay, and jitter. Using assumed values for the jitter buffer, with the option of having the user override them, the tools then model the expected effect and produce a score.
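As a rough sketch of how loss and jitter might be derived from those RTP fields, the Python below counts gaps in the sequence numbers and applies the running interarrival jitter estimator described in RFC 3550. Whether any particular capture tool uses exactly this estimator is an assumption, as are the function name, the 8000Hz clock rate, and the sample packets. Sequence-number wraparound and reordering are ignored for brevity.

```python
def rtp_stats(packets, clock_rate=8000):
    """Estimate loss and jitter from captured RTP packets.

    packets: list of (sequence_number, rtp_timestamp, arrival_seconds),
    in arrival order. Returns (lost_count, jitter_ms).
    """
    seqs = [seq for seq, _, _ in packets]
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(set(seqs))

    jitter = 0.0  # running estimate, in RTP timestamp units
    for (_, ts_prev, arr_prev), (_, ts_cur, arr_cur) in zip(packets, packets[1:]):
        # Change in transit time between two consecutive received packets.
        d = (arr_cur - arr_prev) * clock_rate - (ts_cur - ts_prev)
        jitter += (abs(d) - jitter) / 16.0
    return lost, jitter * 1000.0 / clock_rate

# Hypothetical capture: 20ms packets (160 timestamp units), sequence number 3 missing.
capture = [(1, 0, 0.000), (2, 160, 0.025), (4, 480, 0.060), (5, 640, 0.080)]
lost, jitter_ms = rtp_stats(capture)
```

A real tool would then feed these numbers, together with its assumed jitter buffer parameters, into the E-model to produce a score.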
The major issue with these tools is that they show quality only up to the point where they are inserted. An easy example of the problem is to look at wireless networks. On a Wi-Fi network, a packet capture tool may be able to directly determine what packets it sees and come up with a score. By looking at the Wi-Fi protocol, the tool may do a good job of inferring whether the mobile phone received the packet from the access point, and at what time, and may produce a reasonably close call quality number. On the other hand, the upstream flow is likely to look quite good from the point of view of the test tool, because there is only one network in between the client and the tool. The entirety of the network upstream from the client goes missing, and the upstream MOS value can be entirely misleading.
Some network infrastructure devices are able to make these inferences themselves, as they pass the data through. This may be a reasonable thing to do, again depending on the point of insertion and how well they are able to capture information as late into the network as possible. It is important, when using any of these tools, to consult with the vendor or maker of the tools to find out where the tools are measuring. For a wireless controller with voice metric capabilities, for example, make sure that the downstream metrics are measured on the access point, based on what happened over the air, and not just on what passed through the controller. For wireless overlay monitoring, make sure that there is an option to do a similar capture using a wired mirror port on one of the switches, for cases in which voice quality might begin to suffer and the network needs direct attention. Overall, do not rely on just one tool, and believe what the users say, no matter what the tool tells you.
3.3.3 The Device Itself
The most accurate and reasonable way to measure voice quality is from the endpoints themselves. Some handsets and PBXs offer the ability for the device to produce the one-way MOS value or R-value for the receive side at the device itself. These numbers are based entirely on E-model calculations, assuming best-case or known-default scenarios for the rest of the system, but are likely to be the most accurate. Of course, it is difficult to ask a user to determine the voice quality of a call while on it, especially given that voice quality is not something a user wants to measure. However, for diagnosing locations that are having troubles, this tool is valuable for the administrator herself, who is able to avoid having to guess as to whether the call sounds reasonable, and may be able to detect variations in the MOS value or R-value.
In the end, keep in mind that the absolute values produced by any of the methods deserve to be taken with a grain of salt. As time goes on, the administrator of a voice mobility network should be able to learn what the real quality means for any given value the tool suggests, even when the tool is placing results half a MOS point too high or too low. However, the variation of the scores, especially when the network has changed, can be a valuable tool for pointing the way towards the solution.
Voice Over Ethernet
4.0 Introduction
This chapter introduces the technologies necessary to carry voice over wireline packet networks. The first half of the chapter is a basic review of the concepts within packet networks, including IP and Ethernet. The second half takes a look directly at voice over these networking technologies.
4.1 The IP-Based Voice Network
The previous chapters explored the basics of how calls are set up and voice is carried over packet-based IP networks. However, the details about what makes the IP network itself work have not yet been addressed.
Voice started out on analog phone lines. Each pair of copper wires was dedicated to one specific phone, and to nothing else. This notion of a dedicated circuit has its advantages. It provides complete isolation of whatever might be going on with that line from the circumstances and problems of other phones in the network. No number of calls being placed on a neighbor’s line can make the original line itself become busy. This isolation and invariance is necessary for voice networks to function when unexpected circumstances occur, and ensures that the voice network is reliable in the face of massive fluctuations in the system. Provisioning is simple, as well, with one line per phone at the edge.
The problem with the concept of the dedicated line is that it is extremely wasteful. When the phone is not in use, the line stays empty. No other calls can be placed on that line. Even when a call is in place, the copper wire is fully occupied with carrying the voice traffic, a small bandwidth application, and a tremendous amount of excess signal capacity exists. Dedicated wires might make sense for short distances between the phone and some next-level aggregation equipment, but these dedicated lines were also used as trunks between the aggregators, causing tremendous waste from both idleness and lost bandwidth. But probably the property that caused the most complications with wireline networking was that the dedicated line is not robust. If network problems occur (the bundle of cables is cut, or some intermediate equipment fails and can’t do its job), all lines that are attached along that path are brought down with it.
Digital telephone networks started to eliminate some of the problems inherent to the one-line dedication of early circuit switching. By having digital processes encode and carry the voice, more voice calls could be multiplexed onto each line, better using the bandwidth available on the copper wire. Furthermore, by allowing for hop-by-hop switching with smarter switches between trunks, failures along one trunk could be accommodated.
However, the network was still circuit-switched. A voice line could be used only for voice. Even where voice circuits were set aside for data links, the link was either fully in use or not in use at all. The granularity of the 64kbps audio line, the DS0, became a burden. Running applications that are not always on, and that have massive peak throughput but equally meek average throughput requirements, meant that provisioning was always an expensive proposition: either dedicate enough lines to cover the peak requirement case, and pay for all of the unused capacity, or cap the capacity offered to the application. Furthermore, these circuits needed to be considered, managed, and monitored rather separately. The hard divisions between two circuits became a hard division between applications.
Voice networks were famous for their reliability, their strict clockwork operation, and their complexity. They were not for easy-to-set-up, easy-to-move operations. The wires are drawn once and carefully, and the switches and intermediate equipment are set up by a team of dedicated and expensive experts who do nothing but voice all day. If you were serious about voice, you operated your own little phone company, complete with dedicated operators. If not, your only option was to have the phone company run your phone network for you.
Along came packet-switched networks. Sending small, self-contained messages between arbitrary endpoints on a network inherently made sense for computers. The idea of sending a message quickly, without tying up lines or going through cumbersome setup and teardown operations, removed the restrictions on wasted lines. Although it was still true that lines could remain idle when not being used, the notion of allowing these packets of information into the line as the fundamental concept, rather than requiring continuous occupation and streaming, meant that lines that carried aggregated traffic from multiple users and multiple messages could be used more efficiently. If the messages were short enough, one line might do. There were no concerns about running out of lines and having the needed, or only, path to the receiver blocked. Instead, these messages could just be queued until space was available.
Along with this whole new way of thinking about occupying the resources came a different way of thinking about addressing and connecting the resources. In the early days, a phone number encoded the exact topological location of the extension. Each exchange, or switch with switchboard operator, had a name and number, and calls were routed from exchange to exchange based on that number first. Changes to the structure or layout of the telephone system would require changes to the numbers. Packet-switching technologies changed that. Lines themselves lost their names and numbers. Instead, those names and numbers were moved to the equipment that glued the lines together. Every device itself now had the address. The binding of the addresses to the topology of the network remained, at some level. Devices could not be given any arbitrary address. Rather, they needed to have addresses that were similar to those of their neighbors. The notion of exchange-to-exchange routing was retained.
This notion, though, proved to be a burden. Changes to the network were quite possible, as either more devices needed addresses, or more new “exchanges” were added to the network. Either way, the problem of figuring out how to route messages through the network remained. The original design had each router know which lines needed to be used to send the messages along their way. The router might not know how the message should get to the final destination, but it always knew the next step, and could direct traffic along the right roads to the next intersection, where the next router took over. As the number of intersections increased, and the number of devices expanded, the complexity of maintaining these routing tables exploded. A way was needed for neighboring routers to find out about each other and, more importantly, to find out about what end devices they knew routes to.
Thus, the routing protocol was born. These protocols spoke from router to router, exchanging information on a regular basis, ensuring that routers always had recent information on what destinations were valid and how to get there from here. But another thing happened. This idea of exchanging the routes had another benefit, in that it allowed the network itself to be restructured, or to fail in spots, and yet still be able to send traffic. Routers did not need to know the entire path to the destination, only the next hop. If a router knew two different next hops for the same message, and one of the routes went down, the router could try the second one. If the router lost all of its paths to a particular set of destinations, the router before it could learn about that, and avoid using that path to get the messages through. If there was a way to get the message there, the network would find it, through the process of convergence, or agreement over time on the consistency of whether and how messages could be sent. The network became resilient, and point failures would not stop traffic from flowing.
This is the story of the Internet, and of all the protocols that make it work. Clearly, the story is simplified (and perhaps romanticized to highlight the point at hand), but the fundamentals are there. Circuit switching is difficult to manage, because it is incredibly wasteful and inflexible. Packet switching is much simpler to manage, and can recover from failures.
The Internet grew up on top of the lines offered by the circuit-switched technologies, but used a better way to dedicate the resources. It wasn’t long before someone realized that voice itself could be put over these packet-switched lines. At first, that might sound wasteful, as using a digital line to carry a packet containing voice can never be more efficient than using that line to carry the same bits of voice directly, because of the packet overhead. But packet networking technologies matured, and the throughputs offered on simple point-to-point links grew much faster than did the corresponding uses of the same copper line for digital voice, at least in the enterprise. And the advantages of using a
multipurpose technology allowed these voice over IP pioneers to use the network’s flexibility and lack of dedication to one purpose to add to the voice over IP offerings quickly, without requiring retooling of physical wires. The ways in which provisioning was thought about changed, and the idea that voice and data networks can perhaps use the same resources became a compelling reason to try to save deployment and management costs.
There are a tremendous number of resources available for understanding the intricacies of how IP networks operate, including details on how to manage routing protocols and large trunk lines. Here, we will explore how voice fits into the packet-based IP network.
4.1.1 Wireline Networking Technologies and Packetization
The wireline networking technologies range from the most basic definition of how electrical signals are encoded over the copper line to the higher-level ways that computer software endpoints ensure that messages do not flood the network.
4.1.1.1 Ethernet
Nearly all wireline voice mobility networks in the enterprise start with Ethernet. Ethernet is a family of related networking technologies that establish how two machines that are physically connected can talk to each other. Ethernet was designed to be as simple to deploy as possible, so that it can be set up as an unmanaged network, where physically connecting two endpoints together, somehow, through the network is enough to allow them to find each other and communicate. (Note that this doesn’t mean that higher-level protocols will work on this network without effort, only Ethernet itself.)
All of the Ethernet protocols belong to the IEEE 802.3 series and are based on the idea of encoding frames. A frame is a well-defined packet message, with a source, a destination, a length, and a type. The logical format of the Ethernet frame is shown in Table 4.1.
Table 4.1: Ethernet Frame Format
Destination Source Ethertype Frame Body FCS
6 bytes 6 bytes 2 bytes n bytes 4 bytes
In Ethernet, links are anonymous. Endpoints, however (the line cards that the Ethernet cables plug into), are given addresses. These addresses are assigned at the time the device is built, and are permanently associated with the device. The Ethernet address is a 48-bit (6-byte) address, as shown in Table 4.2. The first three bytes, or 24 bits, are called the Organizationally Unique Identifier (OUI). Each manufacturer of Ethernet equipment is assigned one or more of these OUIs by the Institute of Electrical and Electronics Engineers (IEEE) Registration Authority. The manufacturer chooses the second 24 bits from a unique pool, often in order starting from 00:00:01. Together, the scheme guarantees that this address will never be accidentally taken by another device.
Ethernet also defines two special flags in the address. The L bit specifies a local address, which is dynamic and invented by a device for temporary usage. This has an application in Wi-Fi (see Chapter 5), but is otherwise not common. The G bit is for group-addressed frames, either broadcast or multicast. A group-addressed frame is meant to go out to multiple devices at once, for all of them to receive. Multicast transmissions use this mechanism. The special group address FF:FF:FF:FF:FF:FF (all 1s) is the broadcast address, and specifically requests to go to every device, whether they are in a multicast group or not.
Table 4.2: The Ethernet Address Format
OUI (includes the L and G flags) Manufacturer-Defined
3 bytes 3 bytes
L flag: bit 6 G flag: bit 7
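The address layout can be made concrete with a short Python sketch. It assumes that the bit numbers in Table 4.2 count from the most significant bit of the first byte, so that G is the lowest-order bit of that byte (mask 0x01) and L is the next one up (mask 0x02). The function name and the sample OUI 00:1a:2b are invented for illustration.

```python
def parse_mac(text):
    """Split a colon-separated Ethernet address into its parts and flags."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError("an Ethernet address has exactly 6 bytes")
    first = octets[0]
    return {
        "oui": octets[:3].hex(":"),         # first 24 bits
        "device": octets[3:].hex(":"),      # manufacturer-defined 24 bits
        "group": bool(first & 0x01),        # G flag (assumed mask)
        "local": bool(first & 0x02),        # L flag (assumed mask)
        "broadcast": octets == b"\xff" * 6, # all 1s
    }

unicast = parse_mac("00:1a:2b:00:00:01")    # hypothetical factory-assigned address
broadcast = parse_mac("ff:ff:ff:ff:ff:ff")
```

Note that the broadcast address, being all 1s, necessarily has the G flag set as well.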
This is one way by which Ethernet guarantees that it does not require management to add or remove devices from the network. When a device wants to transmit over a wire to another device, it has no way of knowing if that second device is there. Ethernet was intentionally designed to be as simple as possible, so senders have to transmit and hope that the other device is there. When the sender creates a frame, it places the destination Ethernet address first in the frame, followed by its own address. Then comes the type of the frame, used to figure out what network protocol is running on top of Ethernet. An arbitrary frame body follows, subject to size restrictions: the body of the frame usually cannot be greater than 1500 bytes, and the frame as a whole cannot be shorter than 64 bytes, which leaves a minimum body of 46 bytes. (Shorter frames must be padded.) Finally, Ethernet provides a way to determine whether noise on the Ethernet line causes any bit errors, by using a frame check sequence (FCS), a mathematical checksum of the bits in the frame that will generally not match the contents of a frame if there are any errors. Ethernet uses a CRC-32 checksum.
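The frame layout and the role of the FCS can be sketched in Python as follows. The function names and addresses are invented. The sketch pads the body to 46 bytes on the assumption that the 64-byte minimum applies to the whole frame (14-byte header plus 4-byte FCS), and it computes the checksum with `zlib.crc32`, stored least-significant byte first. That byte order is an assumption of this sketch, since real hardware works at the bit level.

```python
import struct
import zlib

MIN_BODY = 46    # assumed: 64-byte minimum frame minus 14-byte header and 4-byte FCS
MAX_BODY = 1500

def build_frame(dst, src, ethertype, body):
    """Assemble destination, source, Ethertype, padded body, and FCS."""
    if len(body) > MAX_BODY:
        raise ValueError("frame body too long")
    body = body.ljust(MIN_BODY, b"\x00")                 # pad short bodies
    header = dst + src + struct.pack("!H", ethertype)
    fcs = struct.pack("<I", zlib.crc32(header + body))   # assumed byte order
    return header + body + fcs

def parse_frame(frame):
    """Return (dst, src, ethertype, body, fcs_ok) for a frame built as above."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    body, fcs = frame[14:-4], frame[-4:]
    fcs_ok = struct.pack("<I", zlib.crc32(frame[:-4])) == fcs
    return dst, src, ethertype, body, fcs_ok

# Hypothetical addresses; 0x0800 is the Ethertype for IPv4.
frame = build_frame(b"\xff" * 6, bytes.fromhex("001a2b000001"), 0x0800, b"hello")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]           # flip one bit "on the wire"
```

Parsing `frame` reports a matching checksum, while parsing `damaged` does not, which is how a receiver knows to discard a frame corrupted by line noise.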
Ethernet itself is a serial protocol, much like the serial lines used to connect modems together, but operating with much more sophistication and at a faster rate. Most Ethernet types today fall into two categories: copper and fiber. The commercially available copper Ethernet technologies all use a modified version of a telephone cable, made out of copper wires. Each cable carries eight small, insulated copper wires, twisted into pairs as is done for analog telephone lines. The plastic connectors at each end also look like telephone connectors, but have eight pins, rather than the usual six. These connectors, often referred to as RJ45, a specification in which the connectors figure prominently, snap into the
corresponding sockets on all Ethernet devices. Differing numbers of the pairs within the four-pair cable may be used for different Ethernet technologies.
The first RJ45-based Ethernet is called 10BASE-T, or simply original Ethernet. Devices that support 10BASE-T run at 10Mbps, across just two of the pairs within the cable, one for reception and one for transmission. (The other pairs are not used for data.) These Ethernet lines run a serial protocol, where the voltage on the line is flipped to signal a one or a zero in the bits used to encode the frame. However, these serial lines do not constantly transmit. Instead, the line is usually idle. But when a device wants to transmit on the line, it simply starts transmitting. The transmission itself is the frame, just described. Before the frame itself is sent, a few bits are prepended to it. These bits, known as the preamble, are used to alert the device at the other end that the transmission is going to begin. The preamble is a 64-bit sequence of alternating ones and zeros, except for the last two bits, which are both ones. The receiving device detects that a transmission is coming in by looking for the sharp swings in voltage on the line from idle, representing the preamble’s bits. By the time the preamble is done, the receiver will have figured out the timing of the bit patterns, in case the receiver’s clock is slightly off from the sender’s. The full bits of the frame proper then come in, including the checksum. At the end of the transmission, the sender and receiver have to wait for a few microseconds, and then the line becomes idle and ready to be transmitted on again.
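The preamble pattern is simple enough to write out. The Python sketch below models bits as characters in a string and ignores the voltage encoding entirely; the function names are invented. It shows why ending the preamble with two ones is useful: a receiver that missed the first few bits can still find where the frame proper begins.

```python
def preamble_bits():
    """64 bits on the wire: alternating 1 and 0, with the last two bits both 1."""
    return "10" * 31 + "11"

def find_frame_start(bitstream):
    """Index of the first bit after the '11' that ends the preamble, or -1."""
    marker = bitstream.find("11")
    return -1 if marker < 0 else marker + 2

# Hypothetical transmission: preamble followed by the first few frame bits.
wire = preamble_bits() + "0110"
start_full = find_frame_start(wire)       # receiver saw every preamble bit
start_late = find_frame_start(wire[7:])   # receiver missed the first 7 bits
```

In both cases the bits from the returned index onward are the same frame bits, because the alternating pattern never contains two ones in a row until the preamble ends.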
Given that 10BASE-T is a point-to-point physical system, as there can be only one transmitter on one twisted pair, and the other transmitter on the second, there needed to be some way to interconnect multiple lines and thus multiple devices. The solution to that is the Ethernet hub. The hub works by connecting the twisted pair that is used by a device to transmit to every other link’s twisted pair used to receive. This connection allows the transmission by one device to reach all of the others on the same segment, that is, the other devices attached to the same hub. Hubs are purely electrical, and do not participate in the network itself. When a device transmits on an Ethernet hub, every device on that hub hears the signal. A receiver knows that the frame is for it by looking at the destination Ethernet address. If the address matches, then the frame is kept; otherwise, it is discarded unless the operating system on that device requests to receive all frames on the line. The use of hubs, and the definitions for 10BASE-T, require that the transmissions are all half-duplex, meaning that a device cannot transmit and receive at the same time.
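The repeat-and-filter behavior of a hub segment can be modeled in a few lines of Python. This is a toy model with invented class names and addresses: it ignores signal timing, multicast groups, and the possibility of two stations transmitting at once, and simply shows that every station hears every frame but keeps only the ones addressed to it.

```python
class Hub:
    """Repeats whatever one station transmits to every other attached station."""
    def __init__(self):
        self.stations = []

    def attach(self, station):
        self.stations.append(station)
        station.hub = self

    def repeat(self, sender, dst, payload):
        for station in self.stations:
            if station is not sender:
                station.receive(dst, payload)

class Station:
    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, address, receive_all=False):
        self.address = address
        self.receive_all = receive_all   # the OS asked for every frame on the line
        self.kept = []
        self.hub = None

    def send(self, dst, payload):
        self.hub.repeat(self, dst, payload)

    def receive(self, dst, payload):
        # Keep the frame only if it is addressed to this station (or to everyone).
        if dst in (self.address, self.BROADCAST) or self.receive_all:
            self.kept.append(payload)

hub = Hub()
a = Station("00:1a:2b:00:00:01")
b = Station("00:1a:2b:00:00:02")
c = Station("00:1a:2b:00:00:03")
sniffer = Station("00:1a:2b:00:00:04", receive_all=True)
for station in (a, b, c, sniffer):
    hub.attach(station)
a.send(b.address, "hi")
```

After the send, `b` and `sniffer` have kept the frame, while `c` heard it and discarded it.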
Adding multiple devices together on an Ethernet link introduces a problem. Two or more devices are capable of transmitting at the same time. If two devices do transmit at the same time, their signals will mix on the wire, and all of the receivers will receive the garbage created by the interference. Thankfully, there is a solution to avoid this. The overall concept is known by the unwieldy phrase Carrier Sense Multiple Access with Collision Detection (CSMA/CD). Let’s break that phrase apart, starting from the end. The collision detection