Chapter 7
ATM Switching with Non-Blocking Single-Queueing Networks
A large class of ATM switches is represented by those architectures using a non-blocking interconnection network. In principle a non-blocking interconnection network is a crossbar structure that guarantees absence of switching conflicts (internal conflicts) between cells addressing different switch outlets. Non-blocking multistage interconnection networks based on the self-routing principle, such as sorting–routing networks, are very promising structures capable of running at the speed required by an ATM switch, owing to their self-routing property and their VLSI implementation suitability. It has been shown in Section 6.1.1.2 that a non-blocking interconnection network (e.g., a crossbar network) has a maximum throughput ρmax = 0.63 per switch outlet due to external conflicts, that is multiple cells addressing the same outlet in the same slot. Even more serious than such a low utilization factor is the very small load level that keeps the cell loss probability below significant limits.
Queueing in non-blocking multistage networks is adopted for improving the loss performance and, whenever possible, also for increasing the maximum throughput of the switch. Conceptually three kinds of queueing strategies are possible:
• input queueing (IQ), in which cells addressing different switch outlets are stored at the switch input interfaces as long as their conflict-free switching through the interconnection network is not possible;
• output queueing (OQ), where multiple cells addressing the same switch outlet are first switched through the interconnection network and then stored in the switch output interface while waiting to be transmitted downstream;
• shared queueing (SQ), in which a queueing capability shared by all switch input and output interfaces is available for all the cells that cannot be switched immediately to the desired switch outlet.
Figure 7.1 shows a general model for an ATM switch: it is composed of N input port controllers (IPC), a non-blocking N × M interconnection network and M output port controllers (OPC). Usually, unless required by other considerations, the IPC and OPC with the same index are
implemented as a single port controller (PC) interfacing an input and an output channel. In this case the switch becomes squared and, unless stated otherwise, N and M are assumed to be powers of 2.
Each IPC is provided with a queue of B_i cells, whereas a queue of B_o cells is available in each OPC. Moreover, a shared queue of B_s cells per switch inlet (a total capacity of N·B_s cells is available) is associated with the overall interconnection network. The buffer capacity B takes 1 as the minimum value and, based on the analytical models to be developed later in this section and in the following one for input and output queueing, it is assumed that the packet is held in the queue as long as its service has not been completed. With a single queueing strategy, we assume that each IPC is able to transmit at most 1 cell per slot to the interconnection network, whereas each OPC can concurrently receive up to K cells per slot addressing that interface, K being referred to as the (output) speed-up. The interconnection network is implemented, unless stated otherwise, as a multistage network that includes as basic building blocks a sorting network and, if required, also a routing network. As far as the former network is concerned, we choose to adopt a Batcher network to perform the sorting function, whereas the n-cube or the Omega topology of a banyan network is selected as routing network; in fact, as shown in Section 3.2.2, such a network configuration is internally non-blocking (that is, free from internal conflicts). The specific models of ATM switches with non-blocking interconnection network that we are going to describe will always be mapped onto the general model of Figure 7.1 by specifying the values of the queue capacities and speed-up factor of the switch. Unless stated otherwise, a squared switch N = M is considered and each queue operates on a FIFO basis.
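As a point of reference for the discussion that follows, the sketch below captures the general switch model of Figure 7.1 as a plain data structure; the field names are only illustrative labels for the parameters N, M, B_i, B_o, B_s and K introduced above, not part of any original specification.

```python
from dataclasses import dataclass

@dataclass
class SwitchModel:
    """General model of a non-blocking ATM switch (Figure 7.1)."""
    N: int          # number of input port controllers (IPC)
    M: int          # number of output port controllers (OPC)
    B_i: int = 0    # cells of input queueing per IPC
    B_o: int = 0    # cells of output queueing per OPC
    B_s: int = 0    # cells of shared queueing per switch inlet (total N*B_s)
    K: int = 1      # output speed-up: cells an OPC can receive per slot

# Pure input queueing (IQ), squared switch, no speed-up:
iq_switch = SwitchModel(N=16, M=16, B_i=8, B_o=0, B_s=0, K=1)
```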
This chapter is devoted to the study of the switching architectures adopting only one of the three different queueing strategies just mentioned. Adoption of multiple queueing strategies within the same switching fabric will be discussed in the next chapter. ATM switching architectures and technologies based on input, output and shared queueing are presented in Sections 7.1, 7.2 and 7.3, respectively. A performance comparison of ATM switches with single queueing concludes the chapter.
Figure 7.1 Model of non-blocking ATM switch
7.1 Input Queueing

7.1.1 Basic architectures

In an ATM switch with pure input queueing, in each slot only cells addressing different outlets are switched by the multistage interconnection network. Thus a contention resolution mechanism is needed slot by slot to identify a set of cells in different input queues addressing different network outlets. Two basic architectures will be described which differ in the algorithm they adopt to resolve the output contentions. It will be shown how both these structures suffer from a severe throughput limitation inherent in the type of queueing adopted. Enhanced architectures will be described as well that aim at overcoming the mentioned throughput limit by means of a more efficient handling of the input queues.

Figure 7.2 Model of non-blocking ATM switch with input queueing
7.1.1.1 The Three-Phase switch
The block structure of the Three-Phase switch [Hui87] is represented in Figure 7.3: it includes N port controllers PC_i (i = 0, …, N−1), each interfacing an input channel I_i and an output channel O_i, a Batcher sorting network (SN), a banyan routing network (RN) and a channel allocation network (AN). The purpose of network AN is to identify winners and losers in the contention for the switch outlets by means of the three-phase algorithm [Hui87]. This scheme has been conceived to exploit the sorting and routing capability of the multistage network in order to resolve the contentions for the switch outlets. The algorithm, which is run every slot, evolves according to three phases:
I. Probe phase: port controllers request the permission to transmit a cell stored in their queue to a switch outlet; the requests are processed in order to grant at most one request per addressed switch outlet.
II. Acknowledgment (ack) phase: based on the processing carried out in Phase I, acknowledgment signals are sent back to each requesting port controller.
III. Data phase: the port controllers whose request is granted transmit their cell.
The algorithm uses three types of control packets (for simplicity, we do not consider other control fields required by hardware operations, e.g., an activity bit that must precede each packet to distinguish an idle line from a line transporting a packet with all fields set to “0”)1:
• Packet REQ(j,i) is composed of the destination address j of the switch outlet requested by the HOL cell in the input queue of PC_i and the source address i of the transmitting port controller. Both addresses are n = log2N bits long.
• Packet ACK(i,a) includes the source address i, which is n bits long, to whom the acknowledgment packet is addressed and the grant bit a carrying the contention result.
• Packet DATA(j,cell) contains the n-bit destination address j of the HOL cell and the cell itself.
1 All the fields of the control packets used in the three-phase algorithm are transmitted with the most significant bit first.
Figure 7.3 Architecture of the Three-Phase switch
In the probe phase (see Figure 7.4) each port controller with a non-empty input queue sends a request packet REQ(j,i) through the interconnection network. The packets REQ(j,i) are sorted in non-decreasing order by network SN using the destination and source fields as primary and secondary sorting key, respectively, so that the requests for the same switch outlets are adjacent at the outputs of network SN. The sorted packets REQ(j,i) enter network AN, which grants only one request per addressed switch outlet, that is the one received on the lowest-index AN inlet. Thus network AN generates for each packet REQ(j,i) a grant field a indicating the contention outcome (a = 0 winner, a = 1 loser).
In the acknowledgment phase (see Figure 7.4) the port controllers generate packets ACK(i,a) including the field source just received from the network SN within the packet REQ(j,i) and the grant field computed by the network AN. The packet ACK(i,a) is delivered through the sorting and routing networks to its due “destination” i in order to signal to PC_i the contention outcome for its request. Packets ACK(i,a) cannot collide with each other because all the destination addresses i are different by definition (each port controller cannot issue more than one request per slot).
In the data phase (see Figure 7.4) the port controller PC_i receiving the packet ACK(i,0) transmits a data packet DATA(j,cell) carrying its HOL cell to the switch outlet j, whereas upon receiving packet ACK(i,1) the HOL cell is kept in the queue and the same request REQ(j,i) will be issued again in the next slot.
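A minimal behavioral sketch of the contention resolution performed by the three phases is given below. It only mimics the outcome of the algorithm in software (sort requests by destination and source, grant the lowest-index request per outlet); the hardware pipeline of Batcher, banyan and allocation networks described above is not modelled, and the function and field names are illustrative only.

```python
def three_phase_contention(requests):
    """requests: list of (j, i) pairs, one REQ(j, i) per non-empty input queue.
    Returns the set of source PCs whose HOL cell is granted (ACK(i, 0))."""
    # Phase I (probe): sort by (destination, source), as the Batcher network does,
    # then grant only the first request seen for each switch outlet.
    granted = set()
    seen_outlets = set()
    for j, i in sorted(requests):
        if j not in seen_outlets:          # lowest-index AN inlet for outlet j wins
            seen_outlets.add(j)
            granted.add(i)                 # Phase II: ACK(i, 0) sent back to PC i
    # Phase III (data): the winners transmit DATA(j, cell); losers retry next slot.
    return granted

# Example: PCs 0, 1, 2 request outlets 5, 5, 3 -> PC 1 loses the contention for outlet 5.
print(three_phase_contention([(5, 0), (5, 1), (3, 2)]))   # {0, 2}
```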
Figure 7.4 Packet flow in the Three-Phase switch
An example of packet switching according to the three-phase algorithm for an 8 × 8 switch is shown in Figure 7.5. Only four out of the seven requests are granted, since two network outlets are addressed by more than one request.
The structure of networks SN and RN is described in Section 3.2.2. The sorting Batcher network includes n(n+1)/2 stages of 2 × 2 sorting elements, whereas n stages of 2 × 2 switching elements compose the banyan network, with n = log2N.
The hardware implementation of the channel allocation network is very simple, owing to the sorting operation of the packets REQ(j,i) already performed by the network SN. In fact, since all the requests for the same outlet appear on adjacent inlets of the network AN, we can simply compare the destination addresses on adjacent AN inlets and grant the request received on the AN inlet k if the AN inlet k−1 carries a request for a different outlet. The logic associated with port k of network AN is given in Figure 7.6.
Figure 7.5 Example of switching with the three-phase algorithm
Figure 7.6 Logic associated with each port of AN
Trang 7destination ađress of packets REQ(.,.) received on inputs and are compared bit bybit by an EX-OR gate, whose output sets the trigger by the first mismatching bit in and
The trigger keeps its state for a time sufficient for packet ACK(i,a) to be generated by
port controller PCi The trigger is reset by the rising edge of signal Φdest at the start of theađress comparison
Port controllers generate packets ACK(.,.) by transmitting field source of packet REQ(.,.) being received from network SN, immediately followed by field a received from network AN The AND gate in network AN synchronizes the transmission of field a with the end of receipt
of field source by the port controller The signal on outlet is always low ( ), dent of the input signals on , as the request packet received on inlet (ifany) is always granted (it is the request received on the lowest-index AN inlet for the requestedswitch outlet) The structure of network AN is so simple that it can be partitioned and its func-tion can be performed by each single port controller, as suggested in the original proposal ofthree-phase algorithm [Hui87]: the hardware associated to outlet isimplemented within port controller PCk, which thus receives signals both from andfrom
Since the networks SN and RN are used to transfer both the user information (the cells within packets DATA(.,.)) and the control packets, the internal rate of the switch must be higher than the external rate C, so that the time to complete the three-phase algorithm equals the cell transmission time. The transfer of user information takes place only in Phase III of the three-phase algorithm, as Phases I and II represent switching overhead (we disregard here the additional overhead needed to transmit the destination field of packet DATA(j,cell)). Let η denote the switching overhead, defined as the ratio between the total duration, T_I+II, of Phases I and II and the transmission time, T_III, of packet DATA(.,.) in Phase III (T_I+II and T_III will be expressed in bit times, where the time unit is the time it takes to transmit a bit on the external channels). Then, C(1 + η) bit/s is the bit rate of each digital pipe inside the switch that is required to allow a flow of C bit/s on the input and output channels.
The number of bit times it takes for a signal to cross a network will be referred to as signal
latency in the network and each sorting/switching stage is accounted for with a latency of 1 bit.
The duration of Phase I is given by the latency n(n+1)/2 in the Batcher network and the transmission time n of the field destination in packet REQ(.,.) (the field source in packet REQ(.,.) becomes the field source in packet ACK(.,.) and its transmission time is summed up in Phase II). The duration of Phase II includes the latency n(n+1)/2 in the Batcher network, the latency n in the banyan network, and the transmission time n + 1 of packet ACK(i,a). Hence, the switching overhead is given by

η = T_I+II / T_III = (n² + 4n + 1) / T_III

For N = 1024 (n = 10) and T_III given by the standard cell length (53 bytes, i.e., 424 bit times), we obtain η ≈ 0.33 (the n bit times needed to transmit the destination field of packet DATA(.,.) have been disregarded). Thus, if C ≈ 150 Mbit/s is the external link rate, the switch internal rate C(1 + η) must be about 200 Mbit/s.
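The arithmetic behind these figures can be checked with the short fragment below; the switch size N = 1024 and the external rate of roughly 150 Mbit/s are assumptions used only to reproduce the order of magnitude quoted above.

```python
import math

def three_phase_overhead(N, cell_bits=53 * 8):
    """Switching overhead of the unichannel three-phase algorithm.
    Phase I:  Batcher latency n(n+1)/2 + n bits of destination field.
    Phase II: Batcher latency n(n+1)/2 + banyan latency n + (n+1) bits of ACK."""
    n = int(math.log2(N))
    t_phase_1_2 = n * (n + 1) // 2 + n + n * (n + 1) // 2 + n + (n + 1)
    return t_phase_1_2 / cell_bits

eta = three_phase_overhead(1024)               # ~0.33 for n = 10
print(round(eta, 2), round(150 * (1 + eta)))   # internal rate ~200 Mbit/s for C = 150
```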
The reservation algorithm has been described assuming, for the sake of simplicity, that all the requests for a given switch outlet are equally important. Actually, in order to avoid unfairness in the selection of the contention winners (the requests issued by lower-index PCs are always given implicit priority by the combined operations of the sorting and allocation networks), a priority must be associated with each request. This type of operation, in which each request packet now includes three fields, will be described in Section 7.1.3.1, where the three-phase algorithm is applied to an enhanced switch architecture with input queueing. Other solutions for providing fairness could be devised as well that do not necessarily require an additional field in the request packet (see, e.g., [Pat91]).
7.1.1.2 The Ring-Reservation switch
The Ring-Reservation switch [Bin88a, Bin88b] includes a non-blocking self-routing interconnection network, typically a Batcher-banyan network, and a ring structure that serially connects all the port controllers of the switching fabric (see Figure 7.7). Contentions among port controllers for seizing the same switch outlets are resolved by exchanging control information on the ring: the port controller PC_i (i = 0, …, N−1) receives control information from PC_(i−1) mod N and transmits control information to PC_(i+1) mod N.
Port controller PC_0 generates a frame containing N fields, each 1 bit long and initially set to 0, that crosses all the downstream port controllers along the ring and is finally received back by PC_0. The field i of the frame (i = 0, …, N−1) carries the idle/reserved status (0/1) of the switch outlet O_i. A port controller holding a HOL packet in its buffer with destination address j sets to 1 the j-th bit of the frame if that field is received set to 0. If the switch outlet has already been seized by an upstream reservation, the port controller will repeat the reservation process in the next slot. The port controllers that have successfully reserved a switch outlet can transmit their HOL packet, preceded by the self-routing label, through the interconnection network.
Figure 7.7 Architecture of the Ring-Reservation switch
Such a reservation procedure, together with the non-blocking property of the Batcher-banyan interconnection network, guarantees absence of internal and external conflicts between data packets, as each outlet is reserved by at most one port controller.
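The single-round reservation performed on the ring can be sketched as follows; the frame is modelled as a simple bit list and the function name is illustrative, not taken from the original proposal.

```python
def ring_reservation(hol_destinations, N):
    """hol_destinations[i] is the outlet requested by PC i's HOL cell (None if idle).
    Returns the set of PCs that successfully reserved their outlet in this slot."""
    frame = [0] * N                    # one idle/reserved bit per switch outlet
    winners = set()
    for pc in range(N):                # the frame visits PC 0, PC 1, ..., PC N-1 in order
        j = hol_destinations[pc]
        if j is not None and frame[j] == 0:
            frame[j] = 1               # outlet j is now reserved
            winners.add(pc)            # PC pc will transmit its HOL cell this slot
    return winners

# PCs 0 and 2 both want outlet 5: the upstream PC 0 wins, PC 2 retries next slot.
print(ring_reservation([5, 3, 5, None], N=8))   # {0, 1}
```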
Note that the interconnection network bit rate in the Ring-Reservation architecture need not be higher than the external bit rate, as is instead the case for the Three-Phase switch (we disregard again the transmission time of the packet self-routing label). However, the price to pay here is a contention resolution algorithm that is run serially on additional hardware. In order to guarantee that the interconnection network is not underutilized, the reservation cycle must be completed in the time needed to transmit a data packet by the port controllers, whose length is 424 bit times. Apparently, the reservation phase and the user packet transmission phase can be pipelined, so that the packets transmitted through the interconnection network in slot n are the winners of the contention process taking place in the ring in slot n − 1. Thus the minimum bit rate on the ring is NC/424, if C is the bit rate in the interconnection network. Therefore, a Ring-Reservation switch with N > 424 requires a bit rate on the ring for the contention resolution algorithm larger than the bit rate in the interconnection network.
The availability of a ring structure for resolving the output contentions suggests a different implementation for the interconnection network [Bus89]. In fact, using the control information exchanged through the ring it is possible in each reservation cycle not only to allocate the addressed switch outlets to the requesting PCs winning the contention (busy PCs), but also to associate each of the non-reserved switch outlets with a non-busy port controller (idle PC). A port controller is idle if its queue is empty or if it did not succeed in reserving the switch outlet addressed by its HOL cell. In such a way N packets with the N different outlet addresses can be transmitted, each by a different port controller, which are either the HOL packets of the busy PCs or empty packets issued by idle PCs. Based on the operation of a sorting network described in Section 2.3.2.2, such an arrangement makes it possible to use only a sorting Batcher network as the interconnection network, since all the switch outlets are addressed by one and only one packet. Apparently, only the non-empty packets received at the switch outlets will be transmitted downstream by the switching fabric.
The allocation of the non-reserved switch outlets to idle PCs can be carried out by making the reservation frame round twice across the port controllers. In the first round the switch outlet reservation is carried out as already described, whereas in the second round each port controller sets to 1 the first idle field of the reservation frame it receives. Since the number of non-reserved outlets at the end of the first round equals the number of idle port controllers, this procedure guarantees a one-to-one mapping between port controllers and switch outlets in each slot.
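A two-round version of the previous sketch, under the same illustrative assumptions, shows how the second pass of the frame pairs every idle PC with one of the outlets left unreserved by the first round, so that every switch outlet ends up addressed by exactly one (possibly empty) packet.

```python
def two_round_ring_reservation(hol_destinations, N):
    """Returns a dict mapping every PC to the outlet it will address this slot."""
    frame = [0] * N
    assignment = {}
    # First round: usual outlet reservation for the HOL cells.
    for pc in range(N):
        j = hol_destinations[pc]
        if j is not None and frame[j] == 0:
            frame[j] = 1
            assignment[pc] = j
    # Second round: each idle PC grabs the first outlet still marked idle
    # and will send an empty packet towards it.
    for pc in range(N):
        if pc not in assignment:
            j = frame.index(0)         # first non-reserved outlet
            frame[j] = 1
            assignment[pc] = j
    return assignment                  # a permutation of 0..N-1 over the PCs

print(two_round_ring_reservation([3, 1, 3, None], N=4))   # {0: 3, 1: 1, 2: 0, 3: 2}
```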
Compared to the basic Ring-Reservation switch architecture, saving the banyan network in this implementation has the drawback of doubling the minimum bit rate on the ring, which is now equal to 2NC/424 (two rounds of the frame on the ring must be completed in a slot). Thus a switching fabric with N > 212 requires a bit rate on the ring for the two-round contention resolution algorithm larger than the bit rate in the Batcher interconnection network.
7.1.2 Performance analysis
The performance of the basic architecture of an ATM switch with input queueing (IQ) is now analyzed. The concept of virtual queue is now introduced: the virtual queue VQ_i is defined as the set of the HOL positions in the different input queues holding a cell addressed to outlet i. The server of the virtual queue VQ_i is the transmission line terminating the switch outlet i. A cell with outlet address i entering the HOL position also enters the virtual queue VQ_i. So, the capacity of each virtual queue is N (cells) and the total content of all the M virtual queues never exceeds N. A graphical representation of the virtual queue VQ_i is given in Figure 7.8. The analysis assumes a first-in-first-out (FIFO) service in the input queues and a FIFO or random order (RO) service in the virtual queue. Under the hypothesis of random traffic offered to the switch we first evaluate the asymptotic throughput and then the average delay. Cell loss probability is also evaluated for finite values of the input queue size.
In the analysis it is assumed that N, M → ∞, while keeping a constant ratio M/N. Thus the number of cells entering the virtual queues in a slot approaches infinity and the queue joined by each such cell is independently and randomly selected. Furthermore, since the arrival process from individual inlets to a virtual queue is asymptotically negligible, the interarrival time from an input queue to a virtual queue becomes sufficiently long. Therefore virtual queues, as well as input queues, form a mutually-independent discrete-time system. Owing to the random traffic and complete fairness assumption, the analysis will refer to the behavior of a generic “tagged” input (or virtual) queue, as representative of any other input (or virtual) queue. This operation is an abstraction of the real behavior of the Three-Phase switch, since the hardware sorting operation determines implicitly a biased selection of the HOL packets to transmit, thus making the behavior of the different queues different. Also the Ring-Reservation switch is affected by a similar unfairness, since the first port controllers to perform the reservation have a higher probability of successfully booking the addressed network outlet.
Figure 7.8 Representation of the virtual queue
7.1.2.1 Asymptotic throughput
The asymptotic throughput analysis is carried out following the procedure defined in [Bas76] for a completely different environment (a multiprocessor system with multiple memory modules). Such an approach is based on the analysis of a synchronous M/D/1 queue with internal server, as defined in the Appendix. A more recent and analogous analysis of the asymptotic throughput of ATM switches with input queueing is described in [Kar87], in which a synchronous queue with external server is used.
In order to evaluate the limiting throughput conditions we assume that each switch inlet receives a cell in a generic slot with probability p = 1 and that N, M → ∞ with a constant ratio M/N, referred to as the expansion ratio. For a generic “tagged” virtual queue we define
Q_n: number of cells stored in the tagged virtual queue at the end of slot n;
A_n: number of cells entering the tagged virtual queue at the beginning of slot n.
The evolution of the tagged virtual queue is expressed by

Q_n = max(0, Q_{n−1} + A_n − 1)     (7.1)

This analysis assumes steady-state conditions, thus the index n is omitted. When N and M are finite, the distribution function of the new cells entering the virtual queue can be found considering that the total number of new cells in all virtual queues in a slot equals the number of busy virtual queue servers in the preceding slot. Since 1/M is the probability that a new cell enters the tagged virtual queue, the number of new cells in the tagged virtual queue in a generic slot is binomially distributed over the cells served in the preceding slot. When N, M → ∞, the number of busy virtual servers becomes a constant fraction of the total number of servers, denoted as ρ_M, that represents the maximum utilization factor of each server (we are assuming p = 1). Thus the cell arrival distribution becomes Poisson, that is

Pr[A = i] = (ρ_M)^i e^(−ρ_M) / i!     (7.2)
As shown in the Appendix, the virtual queue under study, described by Equations 7.1 and 7.2, is a synchronous M/D/1 queue with internal server whose average queue length is E[Q] = ρ_M + ρ_M²/(2(1 − ρ_M)). Assuming p = 1 means that all input queues are never empty. Since each HOL cell is queued in only one of the M virtual queues, the average number of cells in each virtual queue is also expressed by N/M. Thus

N/M = ρ_M + ρ_M² / (2(1 − ρ_M))

whose solution is ρ_M = 1 + N/M − √(1 + (N/M)²). The switch capacity, that is the maximum throughput per switch inlet, is then ρ_max = (M/N) ρ_M. Table 7.1 gives the switch capacity for different expansion ratios M/N; for a squared switch (M = N) it is ρ_max = 2 − √2 ≈ 0.586, a value even lower than the maximum throughput 0.63 of
a memoryless crossbar switch. The term head-of-line (HOL) blocking is used to denote such a low capacity of an IQ switch compared to the ideal value of 1. When the expansion ratio M/N is larger (smaller) than 1, ρ_max correspondingly increases (decreases) quite fast.
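The closed-form solution derived above is easy to evaluate numerically; the fragment below reproduces the classical 0.586 capacity of the squared switch and shows how the capacity per inlet varies with the expansion ratio (a plain restatement of the formula, not an independent result).

```python
import math

def iq_capacity(expansion_ratio):
    """Maximum throughput per switch inlet of a pure IQ non-blocking switch.
    expansion_ratio = M/N; z = N/M is used in the HOL saturation equation."""
    z = 1.0 / expansion_ratio
    rho_M = 1 + z - math.sqrt(1 + z * z)     # utilization of each outlet (server)
    return expansion_ratio * rho_M           # throughput carried per inlet

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, round(iq_capacity(r), 3))       # 1.0 -> 0.586 (= 2 - sqrt(2))
```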
When M = N and N is finite, the maximum throughput can be obtained through a straightforward Markovian analysis, as reported in [Bha75]. Nevertheless, since the state space grows exponentially, this approach is feasible only for small N. The throughput values so obtained are contained in Table 7.2, together with values obtained through computer simulation. It is interesting to note that the asymptotic throughput value is approached quite fast as N increases. We could say that a switch with a few tens of ports is large enough to approximate quite well the behavior of an infinitely large switch.
An interesting observation arises from Table 7.2: the switch capacity for N = 2 is the same as for the basic 2 × 2 crossbar network, ρ_max = 0.75 (see Section 6.1.1.2), which by
Table 7.1 Switch capacity for different expansion ratios
definition has no buffers. With any other switch size, input queueing degrades the performance of the pure crossbar network. The explanation is that at least one packet is always switched in each slot in a 2 × 2 switch, so that at most one packet is blocked in one of the two input queues, while the other HOL packet is new. Therefore the output addresses requested by the two HOL packets are mutually independent (the system behaves as if both addresses were newly assigned slot by slot).
7.1.2.2 Packet delay
The packet delay of an IQ switch can be evaluated according to the procedure described in [Kar87] (a different approach is described in [Hui87]). Owing to the Bernoulli arrival process of a packet to each switch inlet with probability p, the tagged input queue behaves as a Geom/G/1 queue; its service time is the time it takes for the HOL packet to win the output channel contention and thus to be transmitted, or, equivalently, the time spent inside the tagged virtual queue. As shown in [Hui87], [Li90], the steady-state number of packets entering the tagged virtual queue becomes Poisson with rate p when N → ∞ (each switch outlet is addressed with the same probability 1/N). Thus, the tagged virtual queue behaves as a synchronous M/D/1 queue with internal server (see Appendix) with average arrival rate p, in which the waiting time η_v is the time it takes for a packet to win the output channel contention and the service time equals 1 slot. Consequently, the queueing delay η_v + 1 in the virtual queue represents the service time of the input queue. Using the results found in the Appendix for this queue, the average time spent in the input queue, which equals the average total delay T, is

T = E[η_v] + 1 + p (E[η_v²] + E[η_v]) / (2 [1 − p (E[η_v] + 1)])     (7.3)

where the moments of the waiting time η_v depend on the service discipline adopted in the virtual queue, either FIFO or random order (RO).
The former case corresponds to a switch that gives priority in the channel contention to older cells, that is cells that spent more time in the HOL positions. The latter case means assuming that the cell winning a channel contention is always selected at random in the virtual queue. An implementation example of the FIFO service in the virtual queue is provided while describing the enhanced architecture with channel grouping in Section 7.1.3.1. Note that the value of p that makes the denominator of Equation 7.3 vanish also gives the maximum throughput already found in the preceding section with a different approach.
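Given the first two moments of the virtual-queue waiting time η_v (which the chapter takes from the Appendix), Equation 7.3 is a one-liner to evaluate; the sketch below simply encodes it, with the moment values left as inputs rather than computed here.

```python
def iq_mean_delay(p, m1, m2):
    """Average total delay T of Equation 7.3.
    p  : offered load per inlet (Bernoulli arrival probability)
    m1 : E[eta_v],   mean waiting time in the virtual queue
    m2 : E[eta_v^2], second moment of the waiting time"""
    denominator = 2 * (1 - p * (m1 + 1))
    if denominator <= 0:
        raise ValueError("offered load at or above the switch capacity")
    return m1 + 1 + p * (m2 + m1) / denominator

# Example with illustrative (not computed) moment values:
print(round(iq_mean_delay(p=0.4, m1=0.5, m2=1.0), 3))
```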
The accuracy of the analytical model is assessed in Figure 7.9 for both service disciplines, FIFO and RO, in the virtual queue (a continuous line represents analytical results, whereas simulation data are shown by “×” and “+” signs for FIFO and RO, respectively). The model assumes an infinite size for the squared switch, whereas the simulation refers to a switch size N = 256 in which complete fairness is accomplished. The model provides very accurate results since it does not introduce any approximation of the real switch behavior and, as discussed in the previous section, the asymptotic throughputs of the two switches are almost the same. As one might expect, Figure 7.9 shows that the average packet delay grows unbounded as the offered load approaches the switch capacity.
7.1.2.3 Packet loss probability
It has been shown that a switch with pure input queueing has a maximum throughput lower than a crossbar switch. Then a question could arise: what is the advantage of adding input queueing to a non-blocking unbuffered structure (the crossbar switch), since the result is a decrease of the switch capacity? The answer is rather simple: input queueing enables control of the packet loss performance for carried loads ρ < ρ_max by suitably sizing the input queues.
Figure 7.9 Delay performance of an ATM switch with input queueing
The procedure to evaluate the cell loss probability when the input queues have a finite size of B_i cells is a subcase of the iterative analysis described in Section 8.1.2.2 for an infinite-size non-blocking switch with combined input–output queueing, in which B_o = 1 (the only packet being served can sit in the output queue) and K = 1 (at most one packet per switch outlet can be transmitted in a slot). The results are plotted in Figure 7.10 for a buffer size B_i ranging from 1 to 32 (simulation results are shown by dots), together with the loss probability of a crossbar switch. If our target is to guarantee a given packet loss probability, we simply limit the offered load according to the selected input queue size B_i, whereas the packet loss probability of the crossbar switch remains above such a target even for much smaller loads. So, input queueing does control the packet loss performance.
A simple upper bound on the packet loss probability has also been evaluated [Hui87] by relying on the distribution of the buffer content in a switch with infinitely large input queues. The loss probability with a finite buffer B_i is then bounded by the probability that the content Q of the corresponding infinite queue exceeds B_i, that is

π ≤ Pr[Q > B_i]     (7.4)
7.1.3 Enhanced architectures
The throughput limitations of input queueing architectures shown in Section 7.1.2.1, due to the HOL blocking phenomenon, can be partially overcome in different ways. Two techniques are described here, called channel grouping and windowing.
Figure 7.10 Loss performance of a non-blocking ATM switch with input queueing
7.1.3.1 Architecture with channel grouping
With the traditional unichannel bandwidth allocation, an amount of bandwidth is reserved on an output channel of the switching node for each source at call establishment time. A multichannel bandwidth allocation scheme is proposed in [Pat88] in which input and output channels of a switch are organized in channel groups, one group representing the physical support of virtual connections between two routing entities, each residing either in a switching node or in a user–network interface. More than one channel (i.e., a digital pipe on the order of 150/600 Mbit/s) and more than one channel group can be available between any two routing entities. Consider for example Figure 7.11, where four channels are available between the network interface NI1 (NI2) and its access node SN1 (SN2) and six channels connect the two nodes SN1 and SN2 (Figure 7.11a). The same network in which multichannel bandwidth allocation is available includes for example two groups of three channels between the two network nodes, two groups of two channels between NI1 and SN1 and one group of four channels between NI2 and SN2 (Figure 7.11b). With the multichannel bandwidth allocation the virtual connections are allocated to a channel group, not to a single channel, and the cells of the connections can be transmitted on any channel in the group. Then we could say that the switch bandwidth is allocated according to a two-step procedure: at connection set-up time connections are allocated to a channel group, whereas at transmission time cells are allocated to single channels within a group.
The bandwidth to be allocated at connection set-up time is determined as a function of the channel group capacity, the traffic characteristics of the source, and the expected delay performance. The criterion for choosing such bandwidth, as well as the selection strategy of the specific link in the group, is an important engineering problem not addressed here. Our interest is focused here on the second step of the bandwidth allocation procedure.
At transmission time, before packets are switched, specific channels within a group are assigned to the packets addressed to that group, so that the channels in a group behave as a set of servers with a shared waiting list.
Figure 7.11 Arrangement of broadband channels into groups
The corresponding statistical advantage over a packet switch with a waiting list per output channel is well known. This “optimal” bandwidth assignment at transmission time requires coordination among the port controllers, which may be achieved by designing a fast hardware “channel allocator”. This allocator, in each slot, collects the channel group requests of all port controllers and optimally allocates them to specific channels in the requested channel groups. The number of channels within group j assigned in a slot equals the minimum between the number of packets requesting group j in the slot and the number of channels in group j. Packets that are denied the channel allocation remain stored in the buffer of the input port.
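Functionally, the channel allocator just described can be mimicked by the short sketch below: for each group it grants min(requests, group size) packets and hands each winner a distinct channel of the group. The data structures and names are illustrative; the hardware realization discussed later (subnetworks A and B) computes the same result with sorted requests and running sums.

```python
from collections import defaultdict

def allocate_channels(requests, group_size):
    """requests: list of (pc, group) pairs; group_size[g]: channels in group g.
    Returns {pc: channel_offset_within_group} for the winning requests."""
    per_group = defaultdict(list)
    for pc, g in requests:
        per_group[g].append(pc)
    winners = {}
    for g, pcs in per_group.items():
        for offset, pc in enumerate(sorted(pcs)[:group_size[g]]):
            winners[pc] = offset           # losers simply keep their HOL cell queued
    return winners

# Group 0 has 2 channels, group 1 has 1: three requests for group 0 -> one loser.
print(allocate_channels([(0, 0), (3, 0), (5, 0), (2, 1)], {0: 2, 1: 1}))
# {0: 0, 3: 1, 2: 0}
```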
This multichannel bandwidth allocation has two noteworthy implications for the kind of service provided by the switching node. On the one hand, it enables the network to perform a “super-rate switching”: virtual connections requiring a bandwidth greater than the channel capacity are naturally supported, as sources are assigned to a channel group, not to a single channel. On the other hand, packets could be delivered out of sequence to the receiving user, since there is no guarantee that the packets of a virtual connection will be transmitted on the same channel. It will be shown how the implementation of the multichannel bandwidth allocation scheme is feasible in an ATM switch with input queueing and what performance improvements are associated with the scheme.
Switch architecture. An ATM switch with input queueing adopting the multichannel bandwidth allocation, called the MULTIPAC switch, is described in detail in [Pat90]. Figure 7.3, showing the basic architecture of the Three-Phase switch, also represents the structure of the MULTIPAC switch if the functionality of the allocation network is properly enriched. Now the contention resolution algorithm requires the addresses of the output ports terminating the channels of the same group to be consecutive. This requirement could seriously constrain a change of the configuration of the interswitch communication facilities, e.g., following a link failure or a change in the expected traffic patterns. For this reason, a logical addressing scheme of the output channels is defined, which decouples the channel address from the physical address of the output port terminating the channel.
Each channel is assigned a logical address, so that a channel group is composed of channels with consecutive logical addresses, and a one-to-one mapping is defined between the channel logical address and the physical address of the port terminating the channel. The channel with the lowest logical address in a group is the group leader. The group leader's logical address also represents the group address. A specific channel in a group is identified by a channel offset, given by the difference between the channel's logical address and the group leader's logical address. Each port controller is provided with two tables: the former maps the logical address to the physical address (i.e., the port address) of each channel, the latter specifies the maximum value, maxoff(i), allowed for the channel offset in group i. These tables are changed only when the output channel group configuration is modified.
If the N switch outlets are organized into G groups, where R_i is the number of channels, or capacity, of group i, then R_1 + … + R_G = N. Let R be the maximum capacity allowed for a channel group and, for simplicity, let N be a power of 2. Let n = log2N and d denote the number of bits needed to code the logical address of a channel (or the physical address of a port) and the channel offset, respectively.
The procedure to allocate bandwidth at transmission time, which is referred to as the multichannel three-phase algorithm, is derived from the three-phase algorithm described in Section 7.1.1, whose principles are now assumed to be known. In addition to solving the output channel contention, the algorithm includes the means to assign optimally the channels within a group to the requesting packets slot by slot. Compared to the control packet formats described in Section 7.1.1.1, the control packets are now slightly different:
• Packet REQ(j,v,i) is composed of the identifier j of the destination channel group (i.e., the logical address of the group leader), the request packet priority v and the physical address i of the source port controller. Field priority is used to give priority in the channel allocation process to the older user packets.
• Packet ACK(i,actoff(j)) includes the PC source address i and the actual offset field actoff(j). The actual offset field identifies the output channel within the requested group assigned to the corresponding request.
• Packet DATA(m,cell) includes the switch outlet address m allocated to the PC and the cell to be switched.
In Phase I, port controller PC_i with a cell to be transmitted to channel group j sends a request packet REQ(j,v,i). The value of v is given by the highest priority value decreased by the number of slots spent in the HOL position by the user packet. When the priority range is saturated, that is the user packet has spent in the switch at least as many slots as there are priority levels, the corresponding request packet will always maintain the highest priority value. The “channel allocator” assigns an actual offset actoff(j) to each request for group j, within its capacity limit, so as to spread the requests over all the channels of group j. Note that there is no guarantee that the number of requests for group j does not exceed the number of channels in the group. Each offset belonging to the interval [0, maxoff(j)] is assigned only once to the requests for channel group j, while other requests for the same group are given an offset larger than maxoff(j). Since maxoff(j) = R_j − 1, each channel of group j is allocated to only one request for group j. So, the number of bits needed to code the actual offset is larger than d, the number of bits of the channel offset.
In Phase II, the field source of packet REQ(j,v,i) and the actoff(j) assigned to this request are received by a port controller, say PC_k, that transmits them in packet ACK(i,actoff(j)) to PC_i through the interconnection network. It is easy to verify that the interconnection network structure of the MULTIPAC switch is such that any two packets ACK(i,actoff(j)) are generated by different port controllers and all packets ACK(i,actoff(j)) are delivered to the addressed port controllers without path contention, i.e., by disjoint paths within the interconnection network.
In Phase III, if actoff(j) ≤ maxoff(j), PC_i sends its data packet DATA(m,cell) to the specific output channel of group j with physical address m. The logical-to-physical table maps the allocated logical channel j + actoff(j) to the corresponding physical address m. Packets DATA(m,cell) cross the Batcher-banyan section of the interconnection network without collisions, since the winning requests have been assigned different output logical addresses and, hence, different physical addresses of output channels. If actoff(j) > maxoff(j), the port controller will issue a new request REQ(j,v,i) in the next slot for its head-of-line cell, which remains stored in the input port queue.
The packet flow through the interconnection network is the same shown in Figure 7.4 for the basic IQ Three-Phase switch, in which the signal a now represents the offset actoff(.) of the acknowledgment packet associated with the request packet emerging on the same sorting network outlet. The channel allocation network is now more complex than in the unichannel three-phase algorithm. It is composed of subnetworks A and B (Figure 7.12). Subnetwork A receives a set of adjacent packets with non-decreasing destination addresses and identifies the requests for the same channel group. Subnetwork B, which includes s stages of adders, assigns an actual offset actoff(j) to each packet addressed to channel group j, so that the offsets corresponding to each member of the group are assigned only once.
An example of packet switching in a multichannel Three-Phase switch with N = 8 is shown in Figure 7.13. In particular, the operation of the allocation network is detailed in the process of generating the acknowledgment packets starting from the request packets. In the example three requests are not accepted, that is one addressing group 0 that is given actoff(0) = 1 (the group maximum offset stored in the table is 0) and two addressing group 4, which are given actoff(4) = 3 and actoff(4) = 4 (the group maximum offset is 2). The following data phase is also shown, in which the five PCs that are winners of the channel contention transmit a data packet containing their HOL cell to the switch outlets whose physical addresses are given by the logical-to-physical table.
As already mentioned, the adoption of the channel grouping technique has some implications on cell sequencing within each virtual call. In fact, since the ATM cells of a virtual call can be switched onto different outgoing links of the switch (all belonging to the same group), calls can become out of sequence owing to the independent queueing delays of the cells belonging to the virtual call in the downstream switch terminating the channel group. It follows that some additional queueing must be performed at the node interfacing the end-user, so that the correct cell sequence is restored edge-to-edge in the overall ATM network.
Implementation guidelines. The hardware structure of subnetwork A is the same as presented in Figure 7.6, which now represents the hardware associated with a generic outlet of subnetwork A. The EX-OR gate performs the same function. Now, port controllers generate packet ACK(.,.) by transmitting the field source of the packet REQ(.,.,.) being received, immediately followed by the computed actoff(.). When the first bit of packet REQ(.,.,.) is transmitted on an outlet of the sorting network, it takes n + s bit times to generate the first bit of actoff(.) by the channel allocation network (n bits in subnetwork A and s bits in subnetwork B), while it takes 2n bit times to complete the transmission of the first field of packet ACK(.,.) by a port controller.
alloca-Figure 7.12 Channel allocation network
Figure 7.13 Example of packet switching in the MULTIPAC switch
The AND gate in subnetwork A suitably delays the start of the computation of the channel offset in subnetwork B, so as to avoid the storage of actoff(.) in the port controller. Also in this case the signal on outlet 0 is always low, as required by subnetwork B, which always gives the value 0 to the actoff(.) associated with the packet REQ(.,.,.) transmitted on outlet 0.
Subnetwork B is a running sum adder of s stages computing the digits of the actoff(.) assigned to each packet REQ(.,.,.). Subnetwork B is represented in Figure 7.14. The AND gate at each inlet is enabled by the signal coming from subnetwork A for one bit time, so that the adder of the first stage at each port only receives the binary number 1 (if that signal is high) or the binary number 0 (if that signal is low). Based on the structure of subnetwork B, the output of each adder of stage z is a binary stream, smaller than or equal to 2^z, transmitted with the least significant bit first. Hence, s + 1 bits are needed to code the actoff(j) that emerges from stage s of subnetwork B. The AND gates allow an independent computation of the running sums for each requested channel group, by resetting the running sum on each inlet with a low signal. This is made possible by subnetwork A, which guarantees that at least one inlet with a low signal separates any two sets of adjacent inlets with a high signal. In fact, the complete set of requests for channel group j, transmitted on a set of adjacent outlets of the sorting network, determines a low signal on the first inlet of the set and a high signal on all the following ones.
The binary stream on a given output of subnetwork B represents the offset actoff(j) assigned to the packet REQ(j,v,i) transmitted on the corresponding outlet. The offset for the first packet REQ(.,.,.) of each group is always 0, because any other requests for the same channel group will be given an offset greater than 0. Thus, the offset allocated by subnetwork B to a request is the number of requests for the same channel group received on lower-index inlets.
Note that with this implementation the maximum group capacity R is constrained to be a power of 2. An example of the operation of the channel allocation network is also shown in Figure 7.14. According to the offset allocation procedure, in this case two out of the ten requests for channel group 7 are given the same actual offset. Since the number of requests exceeds the capacity of channel group 7, four requests for this group, i.e., those receiving an offset larger than maxoff(7), lose the contention. The running sum operations in subnetwork B for this example can also be traced stage by stage in Figure 7.14.
With the same assumptions and procedure adopted for the (unichannel) Three-Phase switch described in Section 7.1.1.1, the switching overhead η required to perform the multichannel three-phase algorithm is now computed. The duration of Phase I is given by the latency n(n+1)/2 in the Batcher network and the transmission time of the first two fields in packet REQ(.,.,.) (the field source in packet REQ(.,.,.) becomes the field source in packet ACK(.,.) and its transmission time is summed up in Phase II). The duration of Phase II includes the latency n(n+1)/2 in the Batcher network, the latency n in the banyan network, the transmission time of packet ACK(i,actoff(j)) and the additional time needed by the port controller to check whether actoff(j) ≤ maxoff(j) and to sum j and actoff(j).
Figure 7.14 Hardware structure of subnetwork B
It can be shown [Pat88] that these last tasks can be completed in 1 bit time after the complete reception of packets ACK(.,.) by the port controllers. Note that subnetwork A of AN has no latency (the result of the address comparison is available at the EX-OR gate output when the receipt of the destination field in packet REQ(.,.,.) is complete). Furthermore, the latency of subnetwork B of AN must not be summed up, as the running sum lasts s bit times and this interval overlaps the transmission time of the fields priority and source in packet ACK(.,.) (this condition holds as long as s does not exceed the overall length of those two fields). We further assume that the channel logical address is mapped onto the corresponding channel physical address in a negligible time. Hence, neglecting the priority field, the total duration of Phases I and II for a multichannel switch is given by the sum of the above contributions, that is n² + 4n + s + 2 bit times.
Thus providing the multichannel capability to a Three-Phase switch implies only a small additional overhead, which is a logarithmic function of the maximum channel group capacity. For a reasonable group capacity R = 16 (s = 4), a switch size N = 1024, no priority and the standard cell length of 53 bytes (424 bit times), we have η ≈ 0.33 in the unichannel switch and η ≈ 0.34 in the multichannel switch.
In order to reduce the switching overhead, the multichannel three-phase algorithm can be run more efficiently by pipelining the signal transmission through the different networks so as to minimize their idle time [Pat91]. In this pipelined algorithm it takes at least two slots to successfully complete a reservation cycle, from the generation of the request packet to the transmission of the corresponding cell. By doing so, the minimum cell switching delay becomes two slots, but the switching overhead is reduced. Thus, with an external channel rate C ≈ 150 Mbit/s, the switch internal rate is correspondingly reduced from about 200 Mbit/s to 179 Mbit/s.
Performance evaluation. The performance of the multichannel MULTIPAC switch will be evaluated using the same queueing model adopted for the basic input queueing architecture (see Section 7.1.2). The analysis assumes N → ∞, while keeping a constant expansion ratio. The input queue can be modelled as a Geom/G/1 queue with offered load p, where the service time is given by the queueing time spent in the virtual queue, which is modelled by a synchronous queue with R servers and average arrival rate pR (each virtual queue includes R output channels and has a probability R/N of being addressed by a HOL packet). A FIFO service is assumed both in the input queue and in the virtual queue. Thus, Equation 7.3 provides the maximum switch throughput (the switch capacity) ρ_max, equal to the p value that makes the denominator vanish, and the average cell delay T. The moments of the waiting time in this multi-server virtual queue are provided according to the procedure described in [Pat90].
Table 7.3 gives the maximum throughput ρ_max for different values of the channel group capacity R and of the expansion ratio. Channel grouping is effective especially for small expansion ratios: with a group capacity R = 16, the maximum throughput increases by 50% for an expansion ratio of 1 (from 0.586 to 0.878), and by 30% for an expansion ratio of 2 (it becomes very close to 1).
This group capacity value is particularly interesting considering that each optical fiber running at a rate of 2.4 Gbit/s, which is likely to be adopted for interconnecting ATM switches, is able to support a group of 16 channels at a rate of about 150 Mbit/s each. Less significant throughput increases are obtained for higher expansion ratios.
The average packet delay T given by the model for channel group capacities R ranging from 1 to 32 is plotted in Figure 7.15 and compared to results obtained by computer simulation of a switch of large size with very large input queues. The analytical results are quite accurate, in particular close to saturation conditions. The mismatch between theoretical and simulation delay for small group capacities in a non-saturated switch is due to the approximate computation of the second moment of the waiting time in the virtual queue.
Table 7.3 Switch capacity for different channel group sizes and expansion ratios
The priority mechanism described for the multichannel architecture (local priority), which gives priority to older cells in the HOL position of their queue, has been correctly modelled by a FIFO service in the virtual queue. Nevertheless, a better performance is expected to be provided by a priority scheme (global priority) in which the cell age is the whole time spent in the input queue, rather than just the HOL time. The latter scheme, in fact, aims at smoothing out the cell delay variations by taking into account the total queueing time. These two priority schemes have been compared by computer simulation and the result is given in Figure 7.16. As one might expect, local and global priority schemes provide similar delays for very low traffic levels and for asymptotic throughput values. For intermediate throughput values the global scheme performs considerably better than the local scheme.
Figure 7.17 shows the effect of channel grouping on the loss performance of a switch with input queueing, when a given input queue size is selected (the results have been obtained through computer simulation). For an input queue size B_i = 4 and a given loss performance target, the acceptable load is rather small without channel grouping (R = 1), whereas it grows remarkably with the channel group size: this load level becomes p = 0.5 for R = 4 and p = 0.8 for R = 8. This improvement is basically due to the higher maximum throughput characterizing a switch with input queueing and channel grouping. Analogous improvements in the loss performance are given by other input queue sizes.
7.1.3.2 Architecture with windowing
A different approach for relieving the throughput degradation due to the HOL blocking in switches with input queueing is windowing [Hlu88]. It consists in allowing a cell other than the HOL cell of an input queue to be transmitted if the HOL cell is blocked because of a contention for the addressed switch outlet. Such a technique assumes that a non-FIFO queueing
Figure 7.16 Delay performance of an IQ switch with local and global priority
Trang 26capability is available in the input queue management The switch outlets are allocated using W
reservation cycles per slot rather than the single cycle used in the basic input queueing tecture In the first cycle the port controllers will request a switch outlet for their HOL cell:the contention winners stop the reservation process, while the losers will contend for the non-reserved outlets in the second cycle The request will be for the switch outlet addressed by apacket younger than the HOL packet Such a younger packet is identified based on either oftwo simple algorithms [Pat89]:
archi-(a) it is the oldest packet in the buffer for which a reservation request was not yet issued
in the current slot;
(b) it is the oldest packet in the buffer for which, in the current slot, a reservation requestwas not yet issued and whose outlet address has not yet been requested by the portcontroller
Then in the generic cycle i (i = 2, …, W), each port controller that has been a loser in the previous cycle will issue a new request for another, younger packet in its queue. Since up to W adjacent cells with algorithm (a), or W cells addressing different switch outlets with algorithm (b), contend in a slot for the N switch outlets, the technique is said to use a window of size W. Apparently, the number of reserved outlets in a slot is a non-decreasing function of the window size.
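Under the same illustrative assumptions as the earlier contention sketches, the windowing mechanism with algorithm (b) can be outlined as follows: in each of the W cycles every still-losing port controller offers the oldest not-yet-tried cell whose outlet it has not already requested in this slot. The function and variable names are hypothetical.

```python
def windowed_reservation(queues, W):
    """queues[i]: list of outlet addresses in PC i's buffer, oldest first.
    Returns {pc: (position_in_queue, outlet)} for the cells selected for transmission."""
    reserved = set()                       # outlets already won in this slot
    selected = {}
    tried = {pc: set() for pc in range(len(queues))}   # outlets requested by each PC
    for _ in range(W):                     # W reservation cycles per slot
        for pc, q in enumerate(queues):
            if pc in selected:
                continue                   # winners stop taking part in the reservation
            for pos, outlet in enumerate(q):           # oldest packet first (algorithm b)
                if outlet in tried[pc]:
                    continue
                tried[pc].add(outlet)
                if outlet not in reserved:
                    reserved.add(outlet)
                    selected[pc] = (pos, outlet)
                break                      # one request per PC per cycle
    return selected

# With W = 2, PC 1 loses outlet 3 in the first cycle but wins outlet 6 in the second.
print(windowed_reservation([[3, 5], [3, 6], [7]], W=2))
# {0: (0, 3), 2: (0, 7), 1: (1, 6)}
```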
The adoption of a non-FIFO queueing discipline implies that cell out-of-sequence on a virtual call can take place if algorithm (a) is adopted with a priority scheme different from the one based on the cell age in the switch adopted in Section 7.1.3.1. For example, cell sequencing is not guaranteed if a service priority rather than a time priority is used: priority is given to cells belonging to certain virtual circuits independently of their queueing delay. As with channel grouping, a
Figure 7.17 Loss performance with different channel group sizes