CONTROL AND MANAGEMENT
emission limits for the safety class. The values chosen for r and r' depend on the link propagation delay (see Problem 9.5).
Since the Class I safety standard also specifies that emission limits must be maintained during single fault conditions, the open fiber control circuitry at each node is duplicated for redundancy.
Summary
Network management is essential to operate and maintain any network. Operating costs dominate equipment costs for most telecom networks, making good network management imperative in ensuring the smooth operation of the network. The main functions of network management include configuration (of equipment and connections in the network), performance monitoring, and fault management. In addition, security and accounting are also management functions. Most functions of management are performed through a hierarchy of centralized management systems, but certain functions, such as restoration against failures, or the use of defect indicators to suppress alarms, are done in a distributed fashion. Several management protocols exist, the main ones being TL-1, SNMP, and CMIP.
It is useful to break down the optical layer into three sublayers: the optical channel layer, which deals with individual connections or lightpaths and is end to end across the network; the optical multiplex section layer, which deals with multiplexed wavelengths on a point-to-point link basis; and the optical transmission section layer, which deals with multiplexed wavelengths and the optical supervisory channel between adjacent amplifiers.
Interoperability between equipment from different vendors is a major issue facing the industry today. Initially the focus was on trying to get interoperability between vendors at the WDM level, but that has been recognized now as being very complex. Today the focus is on establishing interoperability by defining standard port-side single-wavelength interfaces at regenerator (or transponder) locations. There is also significant work under way to define optical layer overheads and their functions as well as to establish signaling and control protocol standards for controlling connections in the optical layer.
The level of transparency offered by the optical network affects the amount of management that can be performed. Key performance parameters such as the bit error rate can only be monitored in the electrical domain. Fast signaling methods need to be in place between network elements to perform some key management functions. These include the use of defect indicator signals to prevent the generation of unwanted alarms and protection-switching action, and other signaling bytes to control rapid protection switching. Optical path trace is another indicator that can be used to verify and manage connectivity in the network. Several methods exist for exchanging management information between nodes, including the optical supervisory channel, pilot tones, the use of certain overhead bytes in the SONET/SDH overhead, and the new digital wrapper overhead defined specifically for the optical layer.
Connection management in the optical network is slowly migrating from a centralized management-plane-based approach to a more distributed connection control plane approach using protocols similar to those used in IP and ATM networks.
Eye safety considerations are a unique feature of optical fiber communication systems. These considerations set an upper limit on the power that can be emitted from an open fiber, and these limits make it harder to design WDM systems, since they apply to the total power and not to the power per channel. Safety is maintained by using automated shutdown mechanisms in the network that detect failures and turn off lasers and amplifiers to prevent any laser radiation from exiting the system.
Further Reading
Network management is a vast subject, and several books have been written on the subject; see, for instance, [Sub00, Udu99, Bla95, AP94] for good introductions to the field, including descriptions of the various standards. [McG99, Wil00, Mae98] provide overviews of issues in optical network management.
There is currently a lot of interest in the standards bodies in standardizing many of the items we discussed in this chapter. The standards groups currently engaged in this are the International Telecommunications Union (ITU) study groups 13 and 15 (www.itu.ch), the American National Standards Institute (ANSI) T1X1.5 subcommittee (www.ansi.org), the Optical Internetworking Forum (OIF) (www.oiforum.com), the Internet Engineering Task Force (IETF) (www.ietf.org), Telcordia Technologies (www.telcordia.com), and the Network and Services Interoperability Forum (NSIF) (www.atis.org/atis/sif/sifhom.htm). The ITU defines the standards, including both SDH and the optical layer. ANSI provides the North American input to the ITU. IETF is the standards body for the Internet and is actively involved in defining optical layer control protocols. The OIF serves as a discussion forum for data communications equipment vendors, optical networking vendors, and service providers. Telcordia defined many of the SONET standards. NSIF has defined many of the management interfaces for facilitating interoperability in SONET. We have provided a list of relevant standards documents in Appendix C.
Pilot tones have been used in optical networks for several years now. See [Hil93, HFKV96, HK97] for a sampling of papers describing implementations of pilot tones for signal tracing and monitoring. [Epw95] uses pilot tones to control the gain of optical amplifiers.
ITU G.709 defines the digital wrapper, including the associated maintenance signals such as the path trace and the defect indicators. Telcordia's GR-253 defines an equivalent set of signals for SONET.
Distributed protocols for connection management are commonly used in many types of networks; examples include PNNI in ATM networks [ATM96] and RSVP/CR-LDP [BZB+97, Abo01] in IP/MPLS networks. See [CGS93] for some early work and [RS97, Wei98] for related work on optical networks. Significant activity is under way currently toward defining extensions to IP control protocols to provide optical layer connection management. Many of these are contributions to the ITU, ANSI, IETF, and OIF and may be accessed from their Web sites. See also [GR00, AR01] for a discussion of the various types of control plane models.
Laser safety is covered by several standards, including ANSI, the International Electrotechnical Commission (IEC), the U.S. Food and Drug Administration (FDA), and the ITU [Ame88, Int93, Int00, US86, ITU99, ITU96].
Problems
9.1 Which sublayer within the optical layer would be responsible for handling the following functions?
(a) Setting up and taking down lightpaths in the network
(b) Monitoring and changing the digital wrapper overhead in a lightpath
(c) Rerouting all wavelengths (except the optical supervisory channel) from a failed fiber link onto another fiber link
(d) Detecting a fiber cable cut in a WDM line system
(e) Detecting failure of an individual lightpath
(f) Detecting bit errors in a lightpath
9.2 Consider the SONET network operating over the optical layer shown in Figure 9.13. Trace the path of the connection through the network, and show the termination of different layers at each network element.
Figure 9.13 A combined SONET/WDM optical network for Problem 9.2.
9.3 Consider the network shown in Figure 9.14. Suppose the link segment between OLT A and amplifier B fails.
(a) Assume that each node detects loss of light in 2 ms and waits 5 ms before it sends an FDI signal downstream. Also, each node waits for 2 s after the loss of light is detected before it triggers an alarm. Assume that the propagation delay on each link segment (segment defined as the part of the link between adjacent amplifiers or between an OLT and adjacent amplifier) is 3 ms. Draw a time line indicating the behavior of each node in the network after the failure, including the transmission of OCh-FDI and OMS-FDI signals.
(b) Now assume that each node detects loss of light in 2 ms, immediately sends an FDI signal downstream, and waits an additional 2 s after the loss of light is detected before it triggers an alarm. Assume the same propagation delay values as before. Redraw the time line indicating the behavior of each node in the network after the failure, including the transmission of OCh-FDI and OMS-FDI signals.
What do you observe as the difference between the two methods proposed above?
Figure 9.14 Example for Problem 9.3.
9.4 Consider an OXC connected to multiple OLTs.
(a) If the OXC has an electronic switch core with optical-to-electrical conversions at its ports, what overhead techniques can it use? How would it communicate with other such OXCs in the network? What performance parameters could it monitor?
(b) If the OXC is all optical, with no optical-to-electrical conversions, what overhead techniques can it use? How would it communicate with other such OXCs in the network? What performance parameters could it monitor?
9.5 Consider the open fiber control protocol in the Fibre Channel standard.
(a) How would you choose the parameters r and r' as a function of the maximum link propagation delay d_prop?
(b) What is the time taken for a node to go from the DISCONNECT state to the ACTIVE state, assuming a successful reconnection attempt, that is, it never has to go back to the DISCONNECT state?
References
[Abo01] O. Aboul-Magd et al. Constraint-Based LSP Setup Using LDP. Internet Engineering Task Force, 2001. draft-ietf-mpls-cr-ldp-05.txt.
[Ame88] American National Standards Institute. Z136.2: Safe Use of Optical Fiber Communication Systems Utilizing Laser Diodes and LED Sources, 1988.
[AP94] S. Aidarus and T. Plevyak, editors. Telecommunications Network Management into the 21st Century. IEEE Press, Los Alamitos, CA, 1994.
[AR01] D. Awduche and Y. Rekhter. Multiprotocol lambda switching: Combining MPLS traffic engineering control with optical crossconnects. IEEE Communications Magazine, 39(4):111-116, Mar. 2001.
[ATM96] ATM Forum. Private Network-Network Interface Specification: Version 1.0, 1996.
[Bla95] U. Black. Network Management Standards. McGraw-Hill, New York, 1995.
[BZB+97] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource Reservation Protocol - Version 1 Functional Specification. Internet Engineering Task Force, Sept. 1997.
[CGS93] I. Cidon, I. S. Gopal, and A. Segall. Connection establishment in high-speed networks. IEEE/ACM Transactions on Networking, 1(4):469-482, Aug. 1993.
[Epw95] R. E. Epworth. Optical transmission system. U.S. Patent 5463487, 1995.
[GR00] J. Gruber and R. Ramaswami. Towards agile all-optical networks. Lightwave, Dec. 2000.
[HFKV96] E. Heismann, M. T. Fatehi, S. K. Korotky, and J. J. Veselka. Signal tracking and performance monitoring in multi-wavelength optical networks. In Proceedings of European Conference on Optical Communication, pages 3.47-3.50, 1996.
[Hil93] G. R. Hill et al. A transport network layer based on optical network elements. IEEE/OSA Journal on Lightwave Technology, 11:667-679, May/June 1993.
[HK97] Y. Hamazumi and M. Koga. Transmission capacity of optical path overhead transfer scheme using pilot tone for optical path networks. IEEE/OSA Journal on Lightwave Technology, 15(12):2197-2205, Dec. 1997.
[Int93] International Electrotechnical Commission. 60825-1: Safety of Laser Products - Part 1: Equipment Classification, Requirements and User's Guide, 1993.
[Int00] International Electrotechnical Commission. 60825-2: Safety of Laser Products - Part 2: Safety of Optical Fiber Communication Systems, 2000.
[ITU96] ITU-T SG15/WP 4. Rec. G.681: Functional Characteristics of Interoffice and Long-Haul Line Systems Using Optical Amplifiers, Including Optical Multiplexing, 1996.
[ITU99] ITU-T. Rec. G.664: Optical Safety Procedures and Requirements for Optical Transport Systems, 1999.
[Mae98] M. Maeda. Management and control of optical networks. IEEE Journal of Selected Areas in Communications, 16(6):1008-1023, Sept. 1998.
[McG99] A. McGuire. Management of optical transport networks. IEE Electronics and Communication Engineering Journal, 11(3):155-163, June 1999.
[RS97] R. Ramaswami and A. Segall. Distributed network control for optical networks. IEEE/ACM Transactions on Networking, Dec. 1997.
[Sub00] M. Subramanian. Network Management: Principles and Practice. Addison-Wesley, Reading, MA, 2000.
[Udu99] D. K. Udupa. TMN Telecommunications Management Network. McGraw-Hill, New York, 1999.
[US86] U.S. Food and Drug Administration, Department of Radiological Health. Requirements of 21 CFR Chapter J for Class I Laser Products, Jan. 1986.
[Wei98] Y. Wei et al. Connection management for multiwavelength optical networking. IEEE Journal of Selected Areas in Communications, 16(6):1097-1108, Sept. 1998.
[Wil00] B. J. Wilson et al. Multiwavelength optical networking management and control. IEEE/OSA Journal on Lightwave Technology, 18(12):2038-2057, 2000.
Network Survivability
Providing resilience against failures is an important requirement for many high-speed networks. As these networks carry more and more data, the amount of disruption caused by a network-related outage becomes more and more significant. A single outage can disrupt millions of users and result in millions of dollars of lost revenue to users and operators of the network.
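As part of the service-level agreement between a carrier and its customer leasing a connection, the carrier commits to providing a certain availability for the connection. A common requirement is that the connection be available 99.999% (five 9s) of the time. This requirement corresponds to a connection downtime of about 5 minutes per year (0.001% of the 525,600 minutes in a year is roughly 5.26 minutes).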
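The downtime budget implied by an availability figure is simple arithmetic, and a short sketch makes it easy to compare different numbers of 9s. The function name `downtime_minutes_per_year` and the choice of a 365-day year are assumptions made for illustration; they do not come from the text.

```python
# Minutes in a non-leap year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget in minutes per year.

    `availability` is a fraction between 0 and 1, e.g. 0.99999 for five 9s.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare a few common availability targets.
    for label, availability in [("three 9s", 0.999),
                                ("four 9s", 0.9999),
                                ("five 9s", 0.99999)]:
        minutes = downtime_minutes_per_year(availability)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

For five 9s the budget comes to roughly 5.26 minutes per year, which is the "about 5 minutes" figure usually quoted.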
A connection is routed through many nodes in the network between its source and its destination, and there are many elements along its path that can fail. The only practical way of obtaining such availability is to make the network survivable, that is, able to continue providing service in the presence of failures. Protection switching is the key technique used to ensure survivability. These protection techniques involve providing some redundant capacity within the network and automatically rerouting traffic around the failure using this redundant capacity. A related term is restoration. Some people apply the term protection when the traffic is restored on a fast time scale, and the term restoration when traffic is restored on a slower time scale. However, we do not distinguish between protection and restoration in this chapter.
Protection is usually implemented in a distributed manner without requiring centralized control in the network. This is necessary to ensure fast restoration of service after a failure.
We will be concerned with failures of network links, nodes, and individual channels (in the case of a WDM network). In addition, the software residing in today's network elements is immensely complex, and reliability problems arising from software bugs have become a serious issue. This is something that is usually dealt with by using proper software design and is hard to protect against in the network.
In most cases failures are triggered by human error, such as a backhoe cutting through a fiber cable, or an operator pulling out the wrong connection or turning off the wrong switch. Links fail mostly because of fiber cuts. This is the most likely failure event. There were 136 such failures reported by U.S. carriers to the Federal Communications Commission in 1997. Fiber that is deployed inside of oil and gas pipelines is less likely to be cut than fiber that is buried directly in the ground or strung on poles. For instance, Williams Communications, which runs fiber beside oil pipelines, has experienced only a single fiber cut since 1986.
The next most likely failure event is the failure of active components inside network equipment, such as transmitters, receivers, or controllers. In general, network equipment is designed with redundant controllers. Moreover, failure of controllers doesn't affect traffic but only impacts management visibility into the network.
Node failures are another possibility to be reckoned with. Entire central offices can fail, usually because of catastrophic events such as fires or flooding. These events are rare, but they cause widespread disruption when they occur. Examples include the fire at the Hinsdale central office of Illinois Bell in 1988 and the flooding of several central offices due to Hurricane Floyd in 1999.
Protection schemes are also used extensively to allow maintenance actions in the network. For example, in order to service a link, typically the traffic on the link is switched over to an alternate route using the protection scheme before it is serviced. The same technique is used when nodes or links are upgraded in the network.
In most cases, the protection schemes are engineered to protect against a single failure event or maintenance action. If the network is large, we may need to provide the capability to deal with more than one concurrent failure or maintenance action. One way to handle this is to break up the network into smaller subnetworks and restrict the operation of the protection scheme to within a subnetwork. This allows one failure per subnetwork at any given time. Another way to deal with this issue is to ensure that the mean time to repair a failure is much smaller than the mean time between failures. This ensures that, in most cases, the failed link will be repaired before another failure happens. Some of the protection schemes that we will study do, however, protect the network against some types of simultaneous multiple failures.
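To see why a small ratio of mean time to repair (MTTR) to mean time between failures (MTBF) makes overlapping failures unlikely, consider the following rough sketch. It rests on a modeling assumption that is not in the text: links fail independently with exponentially distributed times between failures, and a repair takes exactly the MTTR. The function name and the example numbers are hypothetical.

```python
import math


def prob_second_failure_during_repair(mttr_hours: float,
                                      mtbf_hours: float,
                                      other_links: int = 1) -> float:
    """Probability that at least one other link fails during a repair.

    Assumption (not from the text): each of the `other_links` links fails
    independently, with exponentially distributed time between failures of
    mean `mtbf_hours`, and the repair lasts exactly `mttr_hours`.
    """
    if mttr_hours < 0 or mtbf_hours <= 0 or other_links < 0:
        raise ValueError("need mttr_hours >= 0, mtbf_hours > 0, other_links >= 0")
    # Expected number of failures among the other links during the repair.
    expected_failures = other_links * mttr_hours / mtbf_hours
    # P(at least one failure) = 1 - exp(-expected_failures).
    return -math.expm1(-expected_failures)


if __name__ == "__main__":
    # Hypothetical numbers: 12-hour repair, one failure per link per 4 years.
    p = prob_second_failure_during_repair(mttr_hours=12.0,
                                          mtbf_hours=4 * 8760.0,
                                          other_links=20)
    print(f"Probability of an overlapping failure: {p:.4f}")
```

Under this model the probability scales roughly with the number of links times MTTR/MTBF when that product is small, which is also why partitioning a large network into subnetworks (fewer links per protection domain) helps.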
The restoration times required depend on the application/type of data being carried. For SONET/SDH networks, the maximum allowed restoration time is 60 ms. This restoration time requirement came from the fact that some equipment in the network drops voice calls if the connection is disrupted for a period significantly longer than 60 ms. Over time, operators have gotten used to being able to achieve restoration on these time scales. However, in a world dominated by data, rather than voice traffic, the 60 ms number may not be a hard requirement, and operators may be willing to tolerate somewhat larger restoration times, particularly if they see other benefits as a result, such as higher bandwidth efficiency, which in turn would lead to lower operating costs. On the other hand, another point of view is that the restoration time requirements could get more stringent as data rates in the network increase. A downtime of 1 second at 10 Gb/s corresponds to losing over a gigabyte of data.
Most IP networks today provide services on a best-effort basis and do not guarantee availability; that is, they try to route traffic in the network as best as they can, but packets can have random delays through the network and can be dropped if there is congestion.
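The figure quoted above for a 1-second outage at 10 Gb/s can be checked with a one-line calculation, and the same calculation shows what a 60 ms restoration time costs at the same rate. The helper name `data_lost_bytes` is an assumption made for this sketch, and it assumes the link is fully loaded for the whole outage.

```python
def data_lost_bytes(line_rate_bps: float, outage_seconds: float) -> float:
    """Bytes that would have been carried during an outage.

    Assumes the link runs at its full line rate for the entire outage.
    """
    if line_rate_bps < 0 or outage_seconds < 0:
        raise ValueError("line rate and outage duration must be non-negative")
    return line_rate_bps * outage_seconds / 8.0  # 8 bits per byte


if __name__ == "__main__":
    rate = 10e9  # 10 Gb/s
    for outage in (1.0, 0.060):  # 1 second, and the 60 ms SONET/SDH limit
        lost_mb = data_lost_bytes(rate, outage) / 1e6
        print(f"{outage * 1000:.0f} ms outage at 10 Gb/s: {lost_mb:.0f} MB lost")
```

A 1-second outage corresponds to 1.25 gigabytes, consistent with "over a gigabyte", while a 60 ms outage at the same rate corresponds to about 75 megabytes.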
Survivability can be addressed within many layers in the network. Protection can be performed at the physical layer, or layer 1, which includes the SONET/SDH and the optical layers. Protection can also be performed at the link layer, or layer 2, which includes the ATM layer and the MPLS layer that is part of IP networks. Finally, protection can also be performed at the network layer, or layer 3, such as the IP layer. There are several reasons why this is the case. For instance, each layer can protect against certain types of failures but probably not protect against all types of failures effectively. We will focus primarily on layer 1 restoration in this chapter, but also briefly discuss the protection techniques applicable to layers 2 and 3.
The rest of this chapter is organized as follows. We start by outlining the basic concepts behind protection schemes. Many of the protection techniques used in today's telecommunication networks were developed for use in SONET and SDH networks, and we will explore these techniques in detail. We will also look at how protection is implemented in today's IP networks. Following this, we will look at protection functions in the optical layer in detail, and then discuss how protection functions in the different layers of the network can work together.
10.1 Basic Concepts
A great variety of protection schemes are used in today's networks. We will talk about working paths and protect paths. Working paths carry traffic under normal operation; protect paths provide an alternate path to carry the traffic in case of failures. Working and protection paths are usually diversely routed so that both paths aren't lost in case of a single failure.
Protection schemes are designed to operate over a range of network topologies. Some work on point-to-point links. Ring topologies are particularly popular in