NaradaBrokering: an event-based infrastructure for building scalable durable peer-to-peer Grids
Geoffrey Fox and Shrideep Pallickara
Indiana University, Bloomington, Indiana, United States
22.1 INTRODUCTION
The peer-to-peer (P2P) style interaction [1] model facilitates sophisticated resource sharing environments between ‘consenting’ peers over the ‘edges’ of the Internet; the ‘disruptive’ [2] impact of which has resulted in a slew of powerful applications built around this model. Resources shared could be anything – from CPU cycles, exemplified by SETI@home (extraterrestrial life) [3] and Folding@home (protein folding) [4] to files (Napster and Gnutella [5]). Resources in the form of direct human presence include collaborative systems (Groove [6]) and Instant Messengers (Jabber [7]). Peer ‘interactions’ involve advertising resources, search and subsequent discovery of resources, request for access to these resources, responses to these requests and exchange of messages between peers. An overview of P2P systems and their deployments in distributed computing and collaboration can be found in Reference [8]. Systems tuned towards large-scale P2P systems include Pastry [9] from Microsoft, which provides an efficient location and routing substrate for wide-area P2P applications. Pastry provides a self-stabilizing infrastructure that adapts to the arrival, departure and failure of nodes. FLAPPS [10], a Forwarding Layer for Application-level Peer-to-Peer Services, is based on the general ‘peer internetworking’ model in which routing protocols propagate availability of shared resources exposed by remote peers. File replications and hoarding services are examples in which FLAPPS could be used to relay a source peer’s request to the closest replica of the shared resource. The JXTA [11] (from juxtaposition) project at Sun Microsystems is another research effort that seeks to provide such large-scale P2P infrastructures. Discussions pertaining to the adoption of event services as a key building block supporting P2P systems can be found in References [8, 12].
Grid Computing – Making the Global Infrastructure a Reality. Edited by F Berman, A Hey and G Fox
2003 John Wiley & Sons, Ltd ISBN: 0-470-85319-0
We propose an architecture for building a scalable, durable P2P Grid comprising resources such as relatively static clients, high-end resources and a dynamic collection of multiple P2P subsystems. Such an infrastructure should draw upon the evolving ideas
of computational Grids, distributed objects, Web services, peer-to-peer networks and message-oriented middleware while seamlessly integrating users to themselves and to resources, which are also linked to each other. We can abstract such environments as a distributed system of ‘clients’, which consist either of ‘users’ or ‘resources’ or proxies thereto. These clients must be linked together in a flexible, fault-tolerant, efficient, high-performance fashion. We investigate the architecture, comprising a distributed brokering system that will support such a hybrid environment. In this chapter, we study the event brokering system – NaradaBrokering – that is appropriate to link the clients (both users and resources of course) together. For our purposes (registering, transporting and discovering information), events are just messages – typically with time stamps. The event brokering system NaradaBrokering must scale over a wide variety of devices – from handheld computers at one end to high-performance computers and sensors at the other extreme. We have analyzed the requirements of several Grid services that could be built with this model, including computing and education and incorporated constraints of collaboration with
a shared event model. We suggest that generalizing the well-known publish–subscribe model is an attractive approach and this is the model that is used in NaradaBrokering. Services can be hosted on such a P2P Grid with peer groups managed locally and arranged into a global system supported by core servers. Access to services can then
be mediated either by the ‘broker middleware’ or alternatively by direct P2P interactions between machines ‘on the edge’. The relative performance of each approach (which could reflect computer/network cycles as well as the existence of firewalls) would be used in deciding on the implementation to use. P2P approaches best support local dynamic interactions; the distributed broker approach scales best globally but cannot easily manage the rich structure of transient services, which would characterize complex tasks. We use our research system NaradaBrokering as the distributed brokering core to support such a hybrid environment. NaradaBrokering is designed to encompass both P2P and the traditional centralized middle-tier style of interactions. This is needed for robustness (since P2P interactions are unreliable and there are no guarantees associated with them) and dynamic resources (middle-tier style interactions are not natural for very dynamic clients and resources). This chapter describes the support for these interactions in NaradaBrokering.
There are several attractive features in the P2P model, which motivate the development of such hybrid systems. Deployment of P2P systems is entirely user-driven, obviating the need for any dedicated management of these systems. Peers expose the resources that they are willing to share and can also specify the security strategy to do so. Driven entirely on demand, a resource may be replicated several times; a process that is decentralized and one over which the original peer that advertised the resource sometimes has little control. Peers can form groups with fluid group memberships. In addition, P2P systems tend to be very dynamic with peers maintaining an intermittent digital presence. P2P systems incorporate schemes for searching and subsequent discovery of resources. Communication between a requesting peer and responding peers is facilitated by peers en route to these destinations. These intermediate peers are thus made aware of capabilities that exist at other peers, constituting dynamic real-time knowledge propagation. Furthermore, since peer interactions, in most P2P systems, are XML-based, peers can be written in any language and can be compiled for any platform.
There are also some issues that need to
be addressed while incorporating support for P2P interactions. P2P interactions are self-attenuating, with interactions dying out after a certain number of hops. These attenuations, in tandem with traces of the peers that the interactions have passed through, eliminate the continuous echoing problem that results from loops in peer connectivity. However, attenuation of interactions sometimes prevents peers from discovering certain services that are being offered. This results in P2P interactions being very localized. These attenuations thus mean that the P2P world is inevitably fragmented into many small subnets that are not connected. Peers in P2P systems interact directly with each other and sometimes use other peers as intermediaries in interactions. Specialized peers are sometimes deployed to enhance routing characteristics. Nevertheless, sophisticated routing schemes are seldom in place and interactions are primarily through simple forwarding of requests, with the propagation range being determined by the attenuation indicated in the message.
NaradaBrokering must support many different frameworks including P2P and centralized models. Though native NaradaBrokering supports this flexibility, we must also expect that realistic scenarios will require the integration of multiple brokering schemes. NaradaBrokering supports this hybrid case through gateways to the other event worlds.
In this chapter we look at the NaradaBrokering system and its standards-based extensions to support the middle-tier style and P2P style interactions. This chapter is organized as follows: in Section 22.2 we provide an overview of the NaradaBrokering system. In Section 22.3 we outline NaradaBrokering’s support for the Java Message Service (JMS) specification. This section also outlines NaradaBrokering’s strategy for replacing single-server JMS systems with a distributed broker network. In Section 22.4 we discuss NaradaBrokering’s support for P2P interactions, and in Section 22.5 we discuss NaradaBrokering’s integration with JXTA.
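
The self-attenuation and peer traces described in this introduction can be illustrated with a small sketch. It is not taken from any particular P2P system: the function name `propagate`, the peer graph `PEERS` and the hop-count form of the attenuation are all assumptions made for illustration.

```python
from collections import deque


def propagate(peers, origin, ttl):
    """Forward a request hop by hop; return the set of peers that saw it.

    peers: dict mapping a peer name to the list of its direct neighbours.
    ttl:   number of hops the request may travel before it dies out.
    """
    reached = {origin}
    # Each queued item carries the remaining TTL and the trace of peers visited.
    queue = deque([(origin, ttl, (origin,))])
    while queue:
        peer, remaining, trace = queue.popleft()
        if remaining == 0:
            continue  # attenuation: the request dies out here
        for neighbour in peers[peer]:
            if neighbour in trace:
                continue  # trace check: never echo back around a loop
            reached.add(neighbour)
            queue.append((neighbour, remaining - 1, trace + (neighbour,)))
    return reached


# Hypothetical peer graph: a loop a-b-c-a with a chain c-d-e hanging off it.
PEERS = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
```

With a TTL of 2, a request from `a` reaches `b`, `c` and `d` but never `e`, which mirrors the localization problem noted above. The trace check alone is enough to stop echoing around the a-b-c loop, even for a very large TTL.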
22.2 NARADABROKERING
NaradaBrokering [13–18] is an event brokering system designed to run a large network of cooperating broker nodes while incorporating capabilities of content-based routing and publish/subscribe messaging. NaradaBrokering incorporates protocols for organizing broker nodes into a cluster-based topology. The topology is then used for incorporating efficient calculation of destinations, efficient routing even in the presence of failures, provisioning of resources to clients, supporting application defined communications scope and incorporating fault-tolerance strategies. Strategies for adaptive communication scheduling based on QoS requirements, content type, networking constraints (such as presence of firewalls, MBONE [19] support or the lack thereof) and client-processing capabilities (from desktop clients to Personal Digital Assistant (PDA) devices) are currently being incorporated into the system core. Communication within NaradaBrokering is asynchronous, and the system can be used to support different interactions by encapsulating them in specialized events. Events are central in NaradaBrokering and encapsulate information at various levels as depicted in Figure 22.1. Clients can create and publish events, specify interests in certain types of events and receive events that conform to specified templates. Client interests are managed and used
by the system to compute destinations associated with published events. Clients, once they specify their interests, can disconnect and the system guarantees the delivery of matched events during subsequent reconnects. Clients reconnecting after prolonged disconnects connect to the local broker instead of the remote broker that they were last attached to. This eliminates bandwidth degradations caused by heavy concentration of clients from disparate geographic locations accessing a certain known remote broker over and over again. The delivery guarantees associated with individual events and clients are met even in the presence of failures. The approach adopted by the Object Management Group (OMG) is one of establishing event channels and registering suppliers and consumers to those channels. The channel approach in the CORBA Event Service [20] could however require clients (consumers) to be aware of a large number of event channels.
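
As a minimal sketch of how client interests might be used to compute the destinations of a published event, the following toy broker matches events against templates. The class `Broker`, the dictionary form of events and templates, and the equality-based matching rule are assumptions for illustration and are not NaradaBrokering's actual interfaces.

```python
def matches(template, event):
    """An event conforms to a template if every templated field agrees."""
    return all(event.get(key) == value for key, value in template.items())


class Broker:
    """Toy single broker: tracks client interests and computes destinations."""

    def __init__(self):
        self.interests = {}  # client id -> list of templates

    def subscribe(self, client, template):
        self.interests.setdefault(client, []).append(template)

    def destinations(self, event):
        """Return the clients whose interests match the published event."""
        return sorted(
            client
            for client, templates in self.interests.items()
            if any(matches(t, event) for t in templates)
        )
```

Unlike a channel-based design, the client here never names a channel. It only states a template, and the destinations are derived per event.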
22.2.1 Broker organization and small worlds behavior
Figure 22.1 Event in NaradaBrokering. The event comprises the source (event origins), destinations (explicit destinations), event descriptors (used to compute destinations), content descriptors and content payload (used to handle content), and event distribution traces / Time To Live (TTL) (used for eliminating continuous echoing and for attenuation of the event).
Uncontrolled broker and connection additions result in a broker network that is susceptible to network partitions and that is devoid of any logical structure, making the creation of efficient broker network maps (BNM) an arduous if not impossible task. The lack of this knowledge hampers development of efficient routing strategies that exploit the broker topology. Such systems then resort to ‘flooding’ the entire broker network, forcing clients
to discard events they are not interested in. To circumvent this, NaradaBrokering incorporates a broker organization protocol, which manages the addition of new brokers and also oversees the initiation of connections between these brokers. The node organization protocol incorporates Internet protocol (IP) discriminators, geographical location, cluster size and concurrent connection thresholds at individual brokers in its decision-making process.
In NaradaBrokering, we impose a hierarchical structure on the broker network, in which a broker is part of a cluster that is part of a super-cluster, which in turn is part
of a super-super-cluster and so on. Clusters comprise strongly connected brokers with multiple links to brokers in other clusters, ensuring alternate communication routes during failures. This organization scheme results in ‘small world networks’ [21, 22] in which the average communication ‘pathlengths’ between brokers increase logarithmically with geometric increases in network size, as opposed to exponential increases in uncontrolled settings. This distributed cluster architecture allows NaradaBrokering to support large heterogeneous client configurations that scale to arbitrary size. Creation of BNMs and the detection of network partitions are easily achieved in this topology. We augment the BNM hosted at individual brokers to reflect the cost associated with traversal over connections, for example, intracluster communications are faster than intercluster communications. The BNM can now not only be used to compute valid paths but also to compute shortest paths. Changes to the network fabric are propagated only to those brokers that have their broker network view altered. Not all changes alter the BNM at a broker and those that do result in updates to the routing caches, containing shortest paths, maintained at individual brokers.
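
To make the cost-augmented BNM concrete, here is a hedged sketch that computes shortest paths with Dijkstra's algorithm. The chapter does not say which algorithm NaradaBrokering uses, so the choice of Dijkstra, the map `BNM`, the broker names and the costs (1 within a cluster, 5 between clusters) are all assumptions.

```python
import heapq


def shortest_path(bnm, source, target):
    """Dijkstra over a broker network map.

    bnm: dict mapping a broker to {neighbour: traversal cost}.
    Returns (total cost, list of brokers on the path), or (None, []) if the
    target is unreachable, which would indicate a network partition.
    """
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, broker = heapq.heappop(heap)
        if broker == target:
            path = [target]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        if cost > best.get(broker, float("inf")):
            continue  # stale heap entry
        for neighbour, link_cost in bnm[broker].items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = broker
                heapq.heappush(heap, (new_cost, neighbour))
    return None, []


# Hypothetical costs: 1 for an intracluster link, 5 for an intercluster link.
# Brokers a1, a2, a3 form cluster A; b1, b2 form cluster B; c1 is isolated.
BNM = {
    "a1": {"a2": 1, "a3": 1, "b1": 5},
    "a2": {"a1": 1, "a3": 1},
    "a3": {"a1": 1, "a2": 1, "b2": 5},
    "b1": {"a1": 5, "b2": 1},
    "b2": {"a3": 5, "b1": 1},
    "c1": {},
}
```

For this map, `shortest_path(BNM, "a2", "b2")` returns a cost of 6 along `a2`, `a3`, `b2`, preferring the route with a single intercluster hop. An unreachable broker such as `c1` yields `(None, [])`, which is one simple way a partition could be detected.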
22.2.2 Dissemination of events
Every event has an implicit or explicit destination list, comprising clients, associated with it. The brokering system as a whole is responsible for computing broker destinations (targets) and ensuring efficient delivery to these targeted brokers en route to the intended client(s). Events, as they pass through the broker network, are updated to snapshot their dissemination within the network. The event dissemination traces eliminate continuous echoing and, in tandem with the BNM (used for computing shortest paths) at each broker, are used to deploy a near optimal routing solution. The routing is near optimal since for every event the associated targeted set of brokers are usually the only ones involved in disseminations. Furthermore, every broker, either targeted or en route to one, computes the shortest path to reach target destinations while employing only those links and brokers that have not failed or have not been failure-suspected. In the coming years, increases in communication bandwidths will not be matched by commensurately reduced communication latencies [23]. Topology-aware routing and communication algorithms are needed for efficient solutions. Furthermore, certain communication services [24] are feasible only when built on top of a topology-aware solution. NaradaBrokering’s routing solution thus provides a good base for developing efficient solutions.
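
The idea that only targeted brokers, and those en route to them, take part in a dissemination while failed brokers are avoided can be sketched as follows. This is an illustrative simplification that counts hops rather than link costs; the names `route`, `brokers_involved` and `LINKS`, and the ring-shaped network, are assumptions.

```python
from collections import deque


def route(links, source, target, failed=frozenset()):
    """Breadth-first shortest path (in hops) that avoids failed brokers."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        broker = queue.popleft()
        if broker == target:
            path = []
            while broker is not None:
                path.append(broker)
                broker = previous[broker]
            return path[::-1]
        for neighbour in links[broker]:
            if neighbour not in previous and neighbour not in failed:
                previous[neighbour] = broker
                queue.append(neighbour)
    return None  # no surviving route to the target


def brokers_involved(links, source, targets, failed=frozenset()):
    """Union of the brokers on the routes from the source to each target."""
    involved = set()
    for target in targets:
        path = route(links, source, target, failed)
        if path is not None:
            involved.update(path)
    return involved


# Hypothetical broker network: a ring of six brokers.
LINKS = {
    1: [2, 6], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 6], 6: [5, 1],
}
```

An event published at broker 1 for a client at broker 3 involves only brokers 1, 2 and 3, whereas flooding would touch all six. If broker 2 is failure-suspected, the route falls back to 1, 6, 5, 4, 3.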
22.2.3 Failures and recovery
In NaradaBrokering, stable storages existing in parts of the system are responsible for introducing state into the events. The arrival of events at clients advances the state associated with the corresponding clients. Brokers do not keep track of this state and are responsible for ensuring the most efficient routing. Since the brokers are stateless, they can fail and remain failed forever. The guaranteed delivery scheme within NaradaBrokering does not require every broker to have access to a stable store or database management system (DBMS). The replication scheme is flexible and easily extensible. Stable storages can be added/removed and the replication scheme can be updated. Stable stores can fail but they do need to recover within a finite amount of time. During these failures, the clients that are affected are those that were being serviced by the failed storage.
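
One way to picture state being introduced by stable storage rather than by brokers is the sketch below. The chapter does not describe the form of that state, so the use of sequence numbers, the class `StableStore` and its methods are assumptions, and subscription matching is left out for brevity.

```python
class StableStore:
    """Toy stable storage: assigns sequence numbers and replays missed events."""

    def __init__(self):
        self.log = []        # persisted events, in arrival order
        self.delivered = {}  # client id -> count of events already delivered

    def persist(self, event):
        """Record an event; its position in the log acts as its state."""
        self.log.append(event)
        return len(self.log)  # sequence number, starting at 1

    def acknowledge(self, client, sequence):
        """Advance the state of a client once an event has arrived."""
        self.delivered[client] = max(self.delivered.get(client, 0), sequence)

    def replay(self, client):
        """Events a reconnecting client has not yet received."""
        return self.log[self.delivered.get(client, 0):]
```

Because all delivery state lives in the store, a broker in this sketch holds nothing that must survive its own failure, which is the property that lets brokers fail and remain failed.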
22.2.4 Support for dynamic topologies
Support for local broker accesses, client roams and stateless brokers provides an environment extremely conducive to dynamic topologies. Brokers and connections could be instantiated dynamically to ensure efficient bandwidth utilizations. These brokers and connections are added to the network fabric in accordance with rules that are dictated by the agents responsible for broker organization. Brokers and connections between brokers can be dynamically instantiated on the basis of the concentration of clients at a geographic location and also on the basis of the content that these clients are interested in. Similarly, average pathlengths for communication could be reduced by instantiating connections to optimize clustering coefficients within the broker network. Brokers can be continuously added or can fail and the broker network can undulate with these additions and failures of brokers. Clients could then be induced to roam to such dynamically created brokers for optimizing bandwidth utilization. A strategy for incorporation of dynamic self-organizing overlays similar to MBONE [19] and X-Bone [25] is an area for future research.
Figure 22.2 Test topology. The figure shows the numbered broker nodes and marks the publisher and the measuring subscriber.
22.2.5 Results from the prototype
Figure 22.3 illustrates some results [14, 17] from our initial research in which we studied the message delivery time as a function of load. The results are from a system comprising 22 broker processes and 102 clients in the topology outlined in Figure 22.2. Each broker node process is hosted on one physical Sun SPARC Ultra-5 machine (128 MB RAM, 333 MHz), with no SPARC Ultra-5 machine hosting two or more broker node processes. The publisher and the measuring subscriber reside on the same SPARC Ultra-5 machine. In addition, there are 100 subscribing client processes with 5 client processes attached to every other broker node (broker nodes 22 and 21 do not have any other clients besides the publisher and the measuring subscriber, respectively) within the system. The 100 client node processes all reside on a SPARC Ultra-60 (512 MB RAM, 360 MHz) machine. The run-time environment for all the broker node and client processes is Solaris JVM (JDK 1.2.1, native threads, JIT). The three matching values correspond to the percentages of messages that are delivered to any given subscriber. The 100% case corresponds to systems that would flood the broker network. The system performance improves significantly with increasing selectivity from subscribers. We found that the distributed network scaled well with adequate latency (2 ms per broker hop) unless the system became saturated at very high publish rates. We do understand how a production version of the NaradaBrokering system could give significantly higher performance – about a factor of 3 lower in latency than the prototype. By improving the thread scheduling algorithms and incorporating flow control (needed at high publish rates), significant gains in performance can be achieved. Currently, we do not intend to incorporate any non-Java modules.
Figure 22.3 Transit delay under different matching rates: 22 brokers, 102 clients.
22.3 JMS COMPLIANCE IN NARADABROKERING
Industrial strength solutions in the publish/subscribe domain include products like TIB/Rendezvous [26] from TIBCO and SmartSockets [27] from Talarian. Other related efforts in the research community include Gryphon [28], Elvin [29] and Sienna [30]. The push by Java to include publish–subscribe features into its messaging middleware includes efforts such as Jini and JMS. One of the goals of JMS is to offer a unified Application Programming Interface (API) across publish–subscribe implementations. The JMS specification [31] results in JMS clients being vendor agnostic and interoperating with any service provider; a process that requires clients to incorporate a few vendor specific initialization sequences. JMS does not provide for interoperability between JMS providers, though interactions between clients of different providers can be achieved through a client that is connected to the different JMS providers. Various JMS implementations include solutions such as SonicMQ [32] from Progress, JMQ from iPlanet and FioranoMQ from Fiorano. Clients need to be able to invoke operations as specified in the specification, and to expect and rely on the logic and the guarantees that go along with these invocations. These guarantees range from receiving only those events that match the specified subscription to receiving events that were published to a given topic irrespective of the failures that took place or the duration of client disconnect. Clients are built around these calls and the guarantees (implicit and explicit) that are associated with them. Failure to conform to the specification would result in clients expecting certain sequences/types of events and not receiving those sequences, which in turn leads to deviations that could result in run-time exceptions.
22.3.1 Rationale for JMS compliance in NaradaBrokering
There are two objectives that we meet while providing JMS compliance within NaradaBrokering:
Providing support for JMS clients within the system: This objective provides for JMS-based systems to be replaced transparently by NaradaBrokering and also for NaradaBrokering clients (including those from other frameworks supported by NaradaBrokering such as P2P via JXTA) to interact with JMS clients. This also provides NaradaBrokering access to a plethora of applications developed around JMS.
To bring NaradaBrokering functionality to JMS clients/systems developed around it: This approach (discussed in Section 22.3.3) will transparently replace single-server or limited-server JMS systems with a very large scale distributed solution, with failure resiliency, dynamic real-time load balancing and scaling benefits.
22.3.2 Supporting JMS interactions
NaradaBrokering provides clients with connections that are then used for communications, interactions and any guarantees associated with these interactions. Clients specify their interest, accept events, retrieve lost events and publish events over this connection. JMS includes a similar notion of connections. To provide JMS compliance we write a bridge that performs all the operations that are required by NaradaBrokering connections in addition to supporting operations that would be performed by JMS clients. Some of the JMS interactions and invocations are either supported locally or are mapped to corresponding NaradaBrokering interactions initiated by the connections. Each connection leads to a separate instance of the bridge. In the distributed JMS strategy it is conceivable that a client, with multiple connections and associated sessions, would not have all of its connections initiated to the same broker. The bridge instance per connection helps every connection to be treated independent of the others.
In addition to connections, JMS also provides the notion of sessions that are registered to specific connections. There can be multiple sessions on a given connection, but any given session can be registered to only one connection. Publishers and subscribers are registered to individual sessions. Support for sessions is provided locally by the bridge instance associated with the connection. For each connection, the bridge maintains the list of registered sessions, and the sessions in turn maintain a list of subscribers. Upon the receipt of an event over the connection, the corresponding bridge instance is responsible for forwarding the event to the appropriate sessions, which then proceed to deliver the event to the listeners associated with subscribers having subscriptions matching the event. In NaradaBrokering, each connection has a unique ID and guarantees are associated with individual connections. This ID is contained within the bridge instance and is used to deal with recovery and retrieval of events after prolonged disconnects or after induced roam due to failures.
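
The relationship between a connection's bridge, its sessions and their subscribers can be sketched as below. The class and method names (`Bridge`, `Session`, `Subscriber`, `on_event`) are hypothetical and are neither the JMS API nor the NaradaBrokering bridge. For simplicity this sketch forwards every event to all sessions and lets each session filter by topic.

```python
class Subscriber:
    def __init__(self, topic, listener):
        self.topic = topic
        self.listener = listener  # callable invoked with each matching event


class Session:
    def __init__(self):
        self.subscribers = []

    def deliver(self, event):
        for subscriber in self.subscribers:
            if subscriber.topic == event["topic"]:
                subscriber.listener(event)


class Bridge:
    """One instance per connection; holds the connection's unique ID."""

    def __init__(self, connection_id):
        self.connection_id = connection_id
        self.sessions = []

    def create_session(self):
        session = Session()
        self.sessions.append(session)
        return session

    def on_event(self, event):
        """Forward an event received over the connection to every session."""
        for session in self.sessions:
            session.deliver(event)
```

Keeping the connection ID inside the bridge instance means that recovery after a disconnect only needs the bridge, not the individual sessions, to identify which events are owed to the client.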
We also need to provide support for the creation of different message types and assorted operations on these messages as dictated by the JMS specification, along with serialization and deserialization routines to facilitate transmission and reconstruction. In NaradaBrokering, events are routed as streams of bytes, and as long as we provide marshalling–unmarshalling operations associated with these types there are no issues regarding support for these message types. We also make use of the JMS selector mechanism implemented in OpenJMS [33]. The JMS subscription request is mapped to the corresponding NaradaBrokering profile propagation request and propagated through the system. The bridge maps persistent/transient subscriptions to the corresponding NaradaBrokering subscription types. JMS messages that are published are routed through the NaradaBrokering broker as a NaradaBrokering event. The anatomy of a Narada/JMS event, encapsulating the JMS messages, is shown in Figure 22.4. Events are routed on the basis of the mapped JMS. The topic name is contained in the event. Storage to databases is done on the basis of the delivery mode indicator in the event. Existing JMS applications in which we successfully replaced the JMS provider with NaradaBrokering include the multimedia-intensive distance education audio/video/text/application conferencing system [34] by Anabas Inc and the Online Knowledge Center (OKC) [35] developed at IU Grid Labs. Both these applications were based on SonicMQ.
Figure 22.4 Narada-JMS event. The event comprises the topic name, the delivery mode (persistent/transient), the priority and the JMS message (headers and payload).
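
The fields listed for the Narada-JMS event, together with the marshalling and unmarshalling step, can be illustrated with a short sketch. The class `NaradaJmsEvent`, the helper `needs_storage` and the use of JSON as the byte encoding are assumptions for illustration; the chapter does not specify the wire format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NaradaJmsEvent:
    topic: str
    persistent: bool          # delivery mode: persistent or transient
    priority: int
    headers: dict = field(default_factory=dict)  # JMS message headers
    payload: str = ""                            # JMS message payload

    def marshal(self) -> bytes:
        """Serialize the event to a byte stream for routing."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def unmarshal(cls, data: bytes) -> "NaradaJmsEvent":
        """Reconstruct the event from the bytes received."""
        return cls(**json.loads(data.decode("utf-8")))


def needs_storage(event: NaradaJmsEvent) -> bool:
    """Storage is decided by the delivery mode indicator in the event."""
    return event.persistent
```

A round trip through `marshal` and `unmarshal` should return an equal event, which is the only property the routing layer relies on when it treats events as opaque bytes.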
22.3.3 The distributed JMS solution
By having individual brokers interact with JMS clients, we have made it possible to replace the JMS provider’s broker instance with a NaradaBrokering broker instance. The features in NaradaBrokering are best exploited in distributed settings. However, the distributed network should be transparent to the JMS clients and these clients should not be expected to keep track of broker states, failures and associated broker network partitions and so on. Existing systems built around JMS should be easily replaced with the distributed model with minimal changes to the client. In general, setups on the client side are to be performed in a transparent manner. A transparent distributed JMS solution would allow any JMS-based system to benefit from the distributed solution. Applications would be based on source codes conforming to the JMS specification, while the scaling benefits, routing efficiencies and failure resiliency accompanying the distributed solution are all automatically inherited by the integrated solution.
To circumvent the problem of discovering valid brokers, we introduce the notion of broker locators. The broker locators’ primary function is the discovery of brokers that a client can connect to. Clients thus do not need to keep track of the brokers and their states within the broker network. The broker locator has certain properties and constraints based on which it arrives at the decision regarding the broker that a client would connect to, as follows:
Load balancing: Connection requests are always forked off to the best available broker based on broker metrics (Section 22.3.3.1). This enables us to achieve distributed dynamic real-time load balancing.
Incorporation of new brokers: A newly added broker is among the best available brokers to handle new connection requests. Clients thus incorporate these brokers faster into the routing fabric.
Availability: The broker locator itself should not constitute a single point of failure nor should it be a bottleneck for clients trying to utilize network services. The NaradaBrokering topology allows brokers to be part of domains. There could be more than one broker locator for a given administrative domain.
Failures: The broker locator does not maintain active connections to any element within the NaradaBrokering system and its loss does not affect processing pertaining to any other node.
22.3.3.1 Metrics for decision making
To determine the best available broker to handle the connection request, the metrics that play a role in the broker locator’s decision include the IP-address of the requesting client, the number of connections still available at the brokers that are best suited to handle the connection, the number of connections that currently exist, the computing capabilities and finally, the availability of the broker (a simple ping test). Once a valid broker has been identified, the broker locator also verifies if the broker process is currently up and running. If the broker process is not active, the computed broker is removed from the list of available brokers and the broker locator computes the best broker from the current list of available brokers. If the computed broker is active, the broker locator proceeds to route broker information to the client. The broker information propagated to the client includes the hostname/IP-address of the machine hosting the broker, the port number on which it listens for connections/communications and the transport protocol that is used for communication. The client then uses this information to establish a communication channel with the broker. Once it is connected to a NaradaBrokering broker, the JMS client can proceed with interactions identical to those in the single broker case.
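
The selection loop described above can be sketched as follows. This is a simplified illustration, not the broker locator's actual implementation: the names `BrokerInfo` and `locate` are hypothetical, only the count of available connections is used as the metric (the client IP-address and computing capabilities are omitted), and the `alive` flag stands in for the ping test and process check.

```python
from dataclasses import dataclass


@dataclass
class BrokerInfo:
    host: str
    port: int
    protocol: str
    max_connections: int
    current_connections: int
    alive: bool = True  # stands in for the result of a ping / process check

    @property
    def available(self):
        return self.max_connections - self.current_connections


def locate(brokers):
    """Return (host, port, protocol) of the best active broker, or None.

    Brokers found to be inactive are removed from the candidate list and the
    best broker is recomputed from the remaining candidates.
    """
    candidates = [b for b in brokers if b.available > 0]
    while candidates:
        best = max(candidates, key=lambda b: b.available)
        if best.alive:
            return best.host, best.port, best.protocol
        candidates.remove(best)
    return None
```

The returned triple corresponds to the broker information propagated to the client: the host, the port and the transport protocol. A client would then open its connection using those three values.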
22.3.4 JMS performance data
To gather performance data, we run an instance of the SonicMQ (Version 3.0) broker and NaradaBrokering broker on the same dual CPU (Pentium-3, 1 GHz, 256 MB) machine. We then set up 100 subscribers over 10 different JMS TopicConnections on another dual CPU (Pentium-3, 866 MHz, 256 MB) machine. A measuring subscriber and a publisher are then set up on a third dual CPU (Pentium-3, 866 MHz, 256 MB RAM) machine. Setting up the measuring subscriber and publisher on the same machine enables us to obviate the need for clock synchronizations and differing clock drifts while computing delays. The three machines involved in the benchmarking process have Linux (Version 2.2.16) as their operating system. The run-time environment for the broker, publisher and subscriber processes is Java 2 JRE (Java-1.3.1, Blackdown-FCS, mixed mode). Subscribers subscribe to a certain topic and the publisher publishes to the same topic. Once the publisher starts issuing messages, the factor that we are most interested in is the transit delay in the receipt of these messages at the subscribers. We measure this delay at the measuring subscriber while varying the publish rates and payload sizes of the messages being published. For a sample