

INTEGRATED COMPUTATIONAL AND NETWORK QOS IN

GRID COMPUTING

GOKUL PODUVAL (B.Eng. (Computer Engineering), NUS)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF ENGINEERING

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

NATIONAL UNIVERSITY OF SINGAPORE

2005


I am indebted to Anthony Sulistio and Dr Rajkumar Buyya from the GRIDS Lab, University of Melbourne, for helping me to integrate my work with GridSim.

I am thankful to NUS for providing financial support for my research. This dissertation is dedicated to my parents and my sister, for their encouragement and support at all times.

Contents

1.1 Services Provided by Grid Computing 2

1.2 Need for Job Classes 4

1.3 Quality of Service 4

1.3.1 QoS for Processing Nodes 5

1.3.2 QoS for Network Elements 6

1.3.3 QoS Levels 7

1.4 Provisioning and Reservation 8

1.5 Service Level Agreements 8

1.6 Integrated Network and Processing QoS 9

1.7 Related Work 10

1.8 Aims of this Thesis 13

1.9 Organization of this Thesis 14


2 Reinforcement Learning 16

2.1 Introduction to Reinforcement Learning 16

2.1.1 Markov Decision Process 16

2.1.2 The Markov Property 17

2.1.3 Reinforcement Learning 17

2.1.4 State 19

2.1.5 Action 19

2.1.6 Rewards 20

2.1.7 Policy 21

2.1.8 Function Approximation 21

2.2 Solutions to Reinforcement Learning 23

2.2.1 Watkins’ Q-Lambda 23

2.2.2 SMART 26

2.3 Advantages of Reinforcement Learning 28

3 Design of Network Elements in GridSim 29

3.1 Introduction to GridSim 29

3.2 The Need for Network Simulation in GridSim 30

3.3 Design and Implementation of Network in GridSim 31

3.3.1 Network Components 33

3.3.2 Support for Network Quality of Service & Runtime Information 37

3.3.3 Interaction among GridSim Network Components 38

3.4 Related Work 39

3.5 Conclusion to GridSim 42

4 Reinforcement Learning based Resource Allocation 43

4.1 Life-cycle of a Grid Job 43

4.2 Network QoS 45


4.2.1 Bandwidth Provisioning via Weighted Fair Queuing 46

4.2.2 Bandwidth Reservation via Rate-Jitter Scheduling 48

4.3 QoS at Grid Resources 49

4.3.1 CPU Provisioning 49

4.3.2 CPU Reservation 50

4.4 Using RL for Resource Allocation 51

4.5 Simulation 51

4.5.1 State Space 52

4.5.2 Action Space 53

4.5.3 Reward Structure 54

4.5.4 Configuration of Reinforcement Learning Agents 55

4.5.5 Update Policy 57

4.6 Implementation on Testbed 58

5 Performance Evaluation 60

5.1 Simulation 60

5.2 Simulation Scenarios 60

5.3 Benchmarking 61

5.3.1 Configurations of Agents at UBs 61

5.3.2 Configuration of Agents at Routers and GRs 62

5.4 Simulation Setup 64

5.4.1 Topology and Characteristics 64

5.5 Scenario I - User Level Job Scheduler 65

5.5.1 Using Reservation on GRs 65

5.5.2 Using Provisioning on GRs 68

5.6 Scenario II - Resource-Level RL Management 71

5.6.1 Using Resource Reservation on Routers and GRs 72

5.6.2 Using Resource Provisioning on Routers and GRs 73


5.7.1 Using Resource Reservation on Routers and GRs 76

5.7.2 Using Resource Provisioning on Routers and GRs 78

5.8 Discussion 81

5.8.1 Reservation vs Provisioning 81

5.8.2 Q-Learning vs SMART 82

5.8.3 Policy Learnt by User Brokers 82

5.9 Implementation 84

5.10 Hardware Details 85

5.11 Configuration of the Experiment 86

5.11.1 Network Agent 86

5.11.2 CPU Agent 87

5.11.3 Resource Allocation Policies 88

5.12 Results 89

5.13 Issues with Reinforcement Learning 91

5.14 Conclusions from Simulations and Implementation 92

6 Conclusions and Future Work 93

6.1 Conclusions 93

6.2 Contributions 95

6.3 Recommendations for Future Work 95

6.3.1 Co-ordination among Agents 95

6.3.2 Better Network support in GridSim 96


List of Tables

3.1 Listing of Network Functionalities for Each Grid Simulator 40

5.1 Characteristics of Jobs in Simulation Setup 64

5.2 Average Processing Time for Jobs in Scenario I with Reservation 67

5.3 Average Processing Time for Jobs in Scenario I with Provisioning 70

5.4 Average Response Time for Jobs in Scenario II with Reservation 73

5.5 Average Processing Time for Jobs in Scenario II with Reservation 73

5.6 Average Response Time for Jobs in Scenario II with Provisioning 74

5.7 Average Processing Time for Jobs in Scenario II with Provisioning 75

5.8 Average Response Time for Jobs in Scenario III with Reservation 78

5.9 Average Processing Time for Jobs in Scenario III with Reservation 78

5.10 Average Response Time for Jobs in Scenario III with Provisioning 79

5.11 Average Processing Time for Jobs in Scenario III with Provisioning 80

5.12 Characteristics of Jobs in Implementation Setup 86

5.13 Number of Successful Jobs 89

5.14 Average Response Time for Successful Jobs 89

5.15 Average Response Time for Successful Jobs 90


List of Figures

1.1 A Virtual Organization Aggregates Resources in Various Domains to Appear as a Single Resource to the End-user 3

2.1 Reinforcement Learning Model 18

3.1 A Class Diagram Showing the Relationship between GridSim and SimJava Entities 31

3.2 A Class Diagram Showing the Relationship between GridSim and SimJava Entities 32

3.3 Interaction among GridSim Network Components 33

3.4 Generalization and Realization Relationship in UML for GridSim Network Classes 34

3.5 Association Relationship in UML for GridSim Network Classes 35

4.1 Flow of a Grid Job 43

4.2 Sample Time-line Showing Generation and Completion of Jobs 52

4.3 Effect of Darken-Chang-Moody Decay Algorithm on Learning Rate 56

5.1 Simulation Setup 63

5.2 Average Processing Time (s) in Scenario I with Reservation (Class 1) 66

5.3 Average Processing Time (s) in Scenario I with Reservation (Class 2) 66

5.4 Distribution of Jobs in Scenario I using Reservation (QL) 68

5.5 Distribution of Jobs in Scenario I using Reservation (ExpAvg) 69


5.6 Average Processing Time (s) in Scenario I with Provisioning (Class 1) 69

5.7 Average Processing Time (s) in Scenario I with Provisioning (Class 2) 70

5.8 Distribution of Jobs in Scenario I with Provisioning (SMART) 71

5.9 Distribution of Jobs in Scenario I with Provisioning (ExpAvg) 72

5.10 Average Response Time (s) in Scenario II with Reservation 74

5.11 Average Response Time (s) in Scenario II with Provisioning 75

5.12 Average Response Time (s) in Scenario III with Reservation (Class 1) 77

5.13 Average Response Time (s) in Scenario III with Reservation (Class 2) 77

5.14 Distribution of Jobs in Scenario III using Reservation (QL) 79

5.15 Average Response Time (s) in Scenario III with Provisioning 80

5.16 Distribution of Jobs in Scenario III using Provisioning (SMART) 81

5.17 Value of Choosing GR1 or GR2 for User1 and User2 83

5.18 Implementation Setup 84

5.19 Average Response Time of Successful Jobs 89

5.20 Number of Jobs that Finished within their Deadline 90

List of Symbols

φ Weight Assigned to a Class or Flow


List of Abbreviations

CMAC Cerebellar Model Articulation Controller

GRACE Grid Architecture for Computational Economy


I/O Input/Output


UML Unified Modeling Language

Abstract

Grid computing is a technology that allows organizations to lower their computing costs by allowing them to share computing resources, software licenses and storage media. As more services are pushed to grid networks in the future, Quality of Service (QoS) will become an important aspect of this service.

In this thesis, we look into a method for providing QoS through learning and autonomic methods. The learning methodology we use is known as Reinforcement Learning (RL), a stochastic optimization method used in areas like robotics. An autonomous method is one in which no manual intervention is required, and a major aim in this thesis is to provide QoS in such a manner. RL based systems will help achieve this, since they are model free and require no supervision to learn. An autonomous system will not require constant monitoring and, if well designed, will be able to maximize the utility of the grid.

We explore two RL methods, known as Watkins' Q(λ) and the Semi-Markovian Average Reward Technique (SMART), to perform resource allocation on computing and network resources. We also explore two alternatives for resource allocation: provisioning and reservation. We performed simulations by selectively enabling our proposed solution on users' grid brokers and on agents located at networking and computing resources. We evaluated the performance of our learning methodology in simulation using a grid simulation package known as GridSim. Since earlier versions of GridSim did not support networking resources like routers and network links, we extended GridSim to provide this functionality. The design of this network functionality is covered in brief. We also performed experiments on a testbed in order to support the observations made from the analysis of our simulations. This thesis describes in detail the design of our proposed solution for providing QoS, the experiments performed to verify our solution, and the conditions under which they were carried out.


From the simulation and implementation results, we conclude that reinforcement learning methods are able to adapt successfully to a given scenario. Resource allocation techniques relying on RL methods are able to modify their allocation levels to support the workload presented to the system. These methods work better than static methods of resource allocation. We also conclude that reservation of resources can provide a better quality of service than provisioning methods.

Keywords: Grid Computing, Reinforcement Learning, Watkins' Q(λ), SMART, GridSim


List of Publications

• A. Sulistio, G. Poduval, R. Buyya and C.-K. Tham, "On Incorporating Differentiated Network Service into GridSim", submitted for approval to Grid Technology and Application, a special section with Future Generation Computer Systems: The International Journal of Grid Computing: Theory, Methods, and Applications, Elsevier Publications, 2006.

• C.-K. Tham and Gokul Poduval, "Adaptive Self-Optimizing Resource Management for the Grid", to appear in The 3rd International Workshop on Grid Economics and Business Models (GECON '06), Singapore, 2006.

• A. Sulistio, G. Poduval, R. Buyya and C.-K. Tham, "Constructing a Grid Simulation with Differentiated Network Service using GridSim", in Proceedings of The 2005 International MultiConference in Computer Science & Computer Engineering (ICOMP '05), Las Vegas, Nevada, USA, 2005.

• G. Poduval and C.-K. Tham, "A Neuro-Dynamic Approach to Resource Provisioning in Computational Networks", in The 2004 Australian Telecommunication Networks and Applications Conference (ATNAC '04), Sydney, Australia, 2004.


Chapter 1

Introduction

Grid Computing has emerged as a powerful way to maximize the value of computing resources. Grid computing allows organizations and people to unite their computing, storage and network systems into a virtual system which appears as one point of service to a user. A grid can be anything from a collection of similar machines at one location to a mix of diverse systems spread all over the world. The machines may not even be owned by a single entity. Organizations can link to other organizations' computing resources for collaborative problem solving in various fields of science, engineering, medicine, etc. These resources are shared with strict rules and are highly controlled, with resource providers and consumers defining clearly what is being shared, the conditions under which it is shared, cost, time of use, specific use, etc. A set of individuals and/or institutions sharing resources under such conditions is known as a Virtual Organization (VO) [1].

Some of the most primitive forms of Grid computing were projects like SETI@Home [2] and Distributed.net [3]. These projects harnessed the spare computing cycles of PCs distributed all around the world for a single purpose. These applications are also known as Peer-to-Peer (P2P) networks.


1.1 Services Provided by Grid Computing

Grid computing can provide organizations with the following features [4]:

• Exploit underutilized resources - Many computers in organizations are only active at certain times of the day. For example, desktop machines used by people are only used during office hours. The computing cycles of these machines can be harnessed with the use of grid computing to run compute intensive jobs. This is especially easy if the process to be executed is easily parallelized. This process is also known as scavenging.

• Enable Virtual Resources and Virtual Organizations - Grid computing simplifies collaboration between different organizations. Different users can be divided into different virtual organizations, with each VO having a different policy for sharing of resources. Storage space, data, application licenses, computing power, etc. can be shared by people who are in different physical organizations but in the same VO. They can have common security policies, and can implement payment systems for the usage of resources. To a single user, all resources within a VO can be consolidated to feel like a single resource.

• Provide Resource Balancing - In a grid, a large number of machines are federated into a single entity. If there is a sudden spike in the demand for a computing resource, a good scheduling policy can split the request among the various resources that form the grid.

• Establish A Grid Computing Market - The fact that users from different organizations can use resources from other organizations transparently is giving rise to services like computing and storage farms. Entities known as Grid Service Providers can provide users access to large computing resources at competitive rates.



Figure 1.1: A Virtual Organization Aggregates Resources in Various Domains toAppear as a Single Resource to the End-user


This is convenient for the users too, since it lowers the cost for occasional use of such resources. Organizations with large computing resources can recoup their investments by lending out their idle computing prowess. This model is also referred to as Utility Computing.

• Provide Quality of Service - In various organizations, some projects could be of higher importance than others due to their monetary value or because of time constraints. Users could implement brokers, who can negotiate on their behalf with grid service providers to lease more computing power, storage space, network bandwidth, etc. In a utility computing model, service providers can charge premium rates for providing expedient service.

1.2 Need for Job Classes

Jobs that run on Grid Networks may be of several types. Some jobs may require large amounts of computational power, while others may require quick response times. From the point of view of service providers, it is not possible to specify what kind of service each job should receive. Better service can be provided if jobs are profiled and split into separate classes. Each job in the same class receives the same service from the grid and network providers.

In this thesis, jobs are divided into classes depending on the amount of processing power they require, the amount of data to be transferred, and their deadline. Jobs belonging to the same class have similar values for each of these parameters.
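As a concrete illustration of such a class descriptor, the sketch below groups the three parameters named above. It is only a sketch: the class name, fields and the tolerance-based matching rule are assumptions for illustration, not types from GridSim or from the thesis implementation.

```java
/**
 * Illustrative sketch of a job class descriptor (hypothetical, not a GridSim type):
 * a class is characterised by required processing power, data volume and deadline.
 */
public final class JobClass {
    private final double processingPowerMips; // compute requirement, e.g. in MIPS
    private final double dataSizeMB;          // data to be transferred over the network
    private final double deadlineSeconds;     // time by which the job must finish

    public JobClass(double processingPowerMips, double dataSizeMB, double deadlineSeconds) {
        this.processingPowerMips = processingPowerMips;
        this.dataSizeMB = dataSizeMB;
        this.deadlineSeconds = deadlineSeconds;
    }

    /** Jobs whose parameters are all within the given relative tolerance belong to this class. */
    public boolean matches(double mips, double sizeMB, double deadline, double tolerance) {
        return Math.abs(mips - processingPowerMips) <= tolerance * processingPowerMips
            && Math.abs(sizeMB - dataSizeMB) <= tolerance * dataSizeMB
            && Math.abs(deadline - deadlineSeconds) <= tolerance * deadlineSeconds;
    }
}
```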


1.3 Quality of Service

Resources can employ mechanisms to provide a varying share of their capability, depending on a specified policy. In this thesis, we have analyzed QoS requirements for two kinds of resources: processing nodes and network elements.

1.3.1 QoS for Processing Nodes

[5] discusses some of the metrics relevant to Quality of Service in Grid Computing. The QoS of a grid job at a processing node can be defined in terms of the following parameters:

• Latency of a task is the amount of time taken by the grid node to execute it. It can also be said to be the response time of the node. Latency is the most important parameter for jobs that need immediate attention, for example a system that keeps tabs on the stock market to decide which stocks to invest in.

• Throughput is the number of units of work accomplished by the grid service in unit time. The throughput for a certain job can be measured as the amount of work provided by the job divided by the amount of time taken to complete it, i.e. its latency. Throughput is an important factor for jobs that need a lot of processing power, for example weather simulations, nuclear reaction simulations or processing large financial worksheets.

• Availability - Another important QoS metric is availability. Availability is defined as the fraction of time that the resource is available for use. For example, desktop PCs may only spare their CPU cycles for grid services at night. However, some jobs may require guarantees of 100% availability of service, for example a fire or defense alarm system. Systems that require higher availability may need more dedicated hardware.


1.3.2 QoS for Network Elements

In networks, QoS refers to the capability of a network to provide differentiated service to different types of network traffic. Certain flows should be able to obtain better service than other flows. Common methods to do this include providing higher bandwidth to high priority flows and/or dropping more packets of low priority flows in the presence of congestion. In computer networks, some of the important QoS parameters are bandwidth, packet loss rate, delay and jitter.

• Bandwidth - Bandwidth refers to the amount of data that can be sent in unit time by a flow. Applications which require a lot of I/O to and from network resources can be sped up by providing them higher bandwidth.

• Packet Loss Rate - In IP networks, packets are not guaranteed to reach their destination. If the queues at any intermediate router in a flow are full, an incoming packet is dropped. Flows that are guaranteed low packet loss rates can be accommodated by dropping packets of lower priority when high priority packets arrive at a router that has full queues. Flow-based Random Early Detect (FRED) ([6]) is a common technique that can be used to achieve preferential treatment with regard to loss rate for packets of certain classes.

• Delay - In computer networks, delay is composed of transmission delay, propagation delay and queuing delay. Transmission delay depends on the bandwidth provided to the flow, and propagation delay depends on the speed of electrons in metallic mediums, or the speed of light in optic fibers. Queuing delays occur at a router because routers have an overhead for processing each packet, and packets already in a router's queues delay the time at which a specific packet will be processed. Routers which guarantee low delays to certain flows can use buffer management techniques like Priority Queuing (PQ), Weighted Fair Queuing (WFQ) or Class Based Queuing (CBQ).


• Jitter - Many applications are sensitive to the variation in delay, rather than the actual delay itself. This is usually the case with applications that rely on a regular arrival of data, for example a video client. Variation in the delay of packets is known as jitter. Jitter can be kept low by using small queue sizes in routers.

[7] provides a detailed view of QoS in networks and the various mechanisms used to implement and provide QoS in networks.
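To make the buffer management idea concrete, the sketch below shows a heavily simplified WFQ-style scheduler: each arriving packet is stamped with a virtual finish time proportional to its size divided by its flow's weight φ, and the packet with the smallest finish time is transmitted first. The class and field names are illustrative assumptions, and the sketch omits the details of a real WFQ implementation (per-flow backlog handling, the exact virtual clock, and so on).

```java
import java.util.PriorityQueue;

/** Simplified WFQ sketch: packets are served in order of their virtual finish time. */
public class WfqScheduler {
    /** A packet tagged with its flow and its computed virtual finish time. */
    static final class Packet implements Comparable<Packet> {
        final int flowId;
        final int sizeBytes;
        double finishTime;

        Packet(int flowId, int sizeBytes) {
            this.flowId = flowId;
            this.sizeBytes = sizeBytes;
        }

        public int compareTo(Packet other) {
            return Double.compare(finishTime, other.finishTime);
        }
    }

    private final double[] flowWeight;   // phi: weight assigned to each flow
    private final double[] lastFinish;   // finish time of the previous packet of each flow
    private final PriorityQueue<Packet> queue = new PriorityQueue<>();
    private double virtualTime = 0.0;    // simplified virtual clock

    public WfqScheduler(double[] flowWeight) {
        this.flowWeight = flowWeight;
        this.lastFinish = new double[flowWeight.length];
    }

    /** Stamp the packet with a finish time proportional to size / weight, then enqueue it. */
    public void enqueue(Packet p) {
        double start = Math.max(virtualTime, lastFinish[p.flowId]);
        p.finishTime = start + p.sizeBytes / flowWeight[p.flowId];
        lastFinish[p.flowId] = p.finishTime;
        queue.add(p);
    }

    /** The next packet to transmit is the one with the smallest finish time. */
    public Packet dequeue() {
        Packet next = queue.poll();
        if (next != null) {
            virtualTime = next.finishTime; // advance the (simplified) virtual clock
        }
        return next;
    }
}
```

A flow with a larger weight φ accumulates finish times more slowly, so its packets are scheduled ahead of packets from lighter-weighted flows, which is how the higher-bandwidth, lower-delay treatment described above comes about.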

1.3.3 QoS Levels

Quality of Service (QoS) can be divided into two categories:

• Soft QoS - Soft QoS consists of providing better service to some jobs on resources like networks and processing nodes. Service providers can make some assurances about bandwidth, delay, processing power, etc., by treating some jobs better than the rest. However, no guarantees are made, and the assurances are given on a statistical basis. Soft QoS is mainly achieved by having multiple classes of service with different priorities. One example of a soft QoS mechanism is the Assured Forwarding Per-Hop-Behavior (PHB) ([8]) of DiffServ ([9]).

• Hard QoS - In some cases, soft QoS may not be enough to satisfy all the requirements of a user. For jobs with tight deadlines, the customer may require the service provider to guarantee that the job will get completed within a certain time. Hard QoS includes mechanisms for guaranteeing the bandwidth or other QoS metrics at resources. Hard QoS allows the resource provider to decouple the requirement of one customer from the load introduced by another, since he needs to make sure there are enough resources available to satisfy both. Though this service requires more stringent resource management techniques as compared to soft QoS, service providers may be interested in providing such services since they can charge a premium rate for it. An example of hard QoS is the Resource ReSerVation Protocol (RSVP) ([10]). RSVP can be used by an application to make resource reservations at each node that the application stream will be traversing. Another example is the Guaranteed Service (GS) ([11]) provided by the IntServ ([12]) framework.

1.4 Provisioning and Reservation

We discussed in Section 1.3.3 the difference between soft and hard QoS. Resources that support soft QoS are said to support provisioning. In provisioning, the service levels supported are only statistical. Resources provide differentiated service to multiple classes of jobs. Load from other sources can affect the performance of a job in provisioning schemes. Reservation, on the other hand, refers to hard QoS guarantees provided by resources. Here, the service parameters are absolute, and do not change depending on the load at a resource.

1.5 Service Level Agreements

Users and service providers can agree on the service parameters that the user will receive from a service provider through a Service Level Agreement (SLA). An SLA contains the contract between the subscriber and the service provider regarding the characteristics of traffic allowable at the service provider's ingress node. For example, an SLA could state that a subscriber will always receive a lower delay for his packets as compared to other flows, as long as the number of packets is kept below a certain threshold. This is an example of soft QoS. An SLA could also state that a job of a certain size will always get processed within an hour at a processing node. This is an example of hard (guaranteed) QoS. [13] provides examples of SLAs that use mechanisms like IntServ and DiffServ to provide QoS to users.

The major problem with SLAs is that they lead to wastage of resources at the service provider's side. If the mechanisms ensuring QoS are static, the service provider needs to make sure that he has enough capacity to cater to every user consuming their maximum allowable capacity. This leads to underutilization of his resources when the subscribers are sending below their upper limits. Thus, static QoS mechanisms do not provide maximum value for money.

1.6 Integrated Network and Processing QoS

In grid computing models, users submit their jobs via brokers to one or many grid processing nodes. These processing nodes may be within the same network domain, or could be a few network Administrative Domains (AD) away. If a grid user wants his jobs to meet certain QoS criteria, he needs the following:

i. The grid service provider must have mechanisms to ensure that he can provide QoS to the jobs, for example a higher or fixed share of processing capacity, or ensuring 24/7 availability.

ii. The network domains through which the jobs pass also need to provide QoS support. This can be through means of providing priority queuing, a higher share of bandwidth, or allowing packets to queue at the head of the router queues.

Jobs can only fulfill their deadline and other QoS requirements if QoS support is provided by both of the above. For example, if only grid nodes provide QoS support, data packets for that job may get held up in the network, or even lost, leading to QoS violations overall for the job. This will be especially true if the job is I/O intensive.

One way in which the user can make sure of his jobs meeting deadlines is to sign an SLA (Section 1.5) with the grid service provider and all the network domains that his data will be passing through. However, this would mean the location of the grid processing nodes will need to be fixed and known in advance. This takes away a lot of the flexibility of having a grid system that is supposed to be transparent to the user. Also, in services like utility computing, the user sends his jobs to the utility provider, from where they may be farmed out to any location the provider thinks is appropriate. Thus, signing SLAs in advance is a cumbersome process which requires manual intervention, and does not always fit in well with the grid and utility computing model.

In this thesis, we have used a model where network and processing resources always provide QoS support. The service a job receives depends on its class. Class 1 jobs receive preference over Class 2 jobs. The resource allocations among classes are decided by policies at each resource. The policies are learnt and continually adjusted by agents residing at each resource.

1.7 Related Work

Providing Quality of Service in Grids is a challenging problem, and many researchers have taken a look at it in recent years. [14] presents a resource management architecture called GARA that addresses the problem of achieving end-to-end QoS guarantees across heterogeneous collections of shared resources. It works in conjunction with the Globus Toolkit ([15]). It allows the construction of reusable co-reservation and co-allocation agents that can combine domain and resource specific knowledge to discover, reserve and allocate resources to try and meet application QoS requirements. [16] discusses the principles behind the GARA architecture. However, GARA requires that the application use an API to create calls for reservation and allocation. Also, the project seems to be out-of-date and doesn't work with newer versions of Globus, with the last update to their website in June 2000.


[17] proposes a Grid Architecture for Computational Economy (GRACE). GRACE is an economic framework for consumers and service providers to operate in a grid computing market. The consumers interact with brokers to express their budget and deadline requirements. The resource broker is responsible for discovering grid resources, negotiating with GSPs, and controlling and scheduling jobs. GRACE proposes setting up a market directory, where resource owners can publish their services along with service parameters like pricing policies. It also proposes setting up a Grid Bank which records resource usage, bills consumers, and transfers funds to service providers. The pricing strategies can be based on flat rates, demand-and-supply, calendar based, bargaining based, etc. [17] also discusses Nimrod-G, which is a tool for automated modelling and execution of applications, and is based on the GRACE framework. The GRACE framework enables QoS by allowing users and service providers to negotiate the price and deadline of a job. However, it does not propose any mechanisms by which the service providers can ensure that they deliver the QoS they promised when publishing their services in the market directory. It also does not explore how resource providers can maximize the usage of their grids.

[18] details another approach to provide QoS to grid services, using a framework called QoSINUS. This approach aims to provide end-to-end Best Effort QoS. Only network level QoS is provided in this approach. Programs can specify their QoS parameters through an API provided by QoSINUS, and the QoSINUS service tries to map their requests to a class of IP service on a network that supports some form of QoS like DiffServ ([9]). The QoSINUS approach contains adaptive control components that are responsible for class mapping and adapting the packet marking policy according to the performance experienced by the packets of each particular flow. The adaptive policies can be flexible, ranging from static mapping to policies that give higher priorities to jobs with earlier deadlines. There is no provision for CPU reservation in this framework, therefore no guarantees can be provided. The QoS provided is very basic, limited to Best Effort Service. Though this framework has adaptive algorithms, they are simple in nature.

[19] proposes a framework called Grid QoS Management (G-QoSM), which is compatible with the Open Grid Services Architecture (OGSA) specification. OGSA is a framework to build Virtual Organizations, and defines grid services in the form of Web services. G-QoSM provides three levels of service, called Guaranteed, Controlled Load and Best Effort QoS. The behavior of these levels is similar to those defined by the IntServ Architecture ([12]). The G-QoSM framework is composed of (i) the QoS Grid Server, (ii) an extended version of the Universal Description Discovery and Integration (UDDIe), (iii) a resource reservation manager, (iv) a resource allocation manager and (v) a policy Grid Service. It uses DSRT ([20]) for reservation of CPU cycles and DiffServ ([9]) for network management. It has facilities for reservation, advanced reservation and admission control. However, the reservation levels are fixed and cannot adjust depending on the load and type of jobs coming to the grid service.

[21] discusses a resource allocation algorithm using reinforcement learning. The algorithm used for reinforcement learning is one-step Q-Learning. The reward is based on the amount of time taken by a grid node to complete the job. [?] presents a general methodology for scheduling jobs in soft real-time systems, where the utility of each job decreases as a function of time. It demonstrates an RL based architecture for solving an NP-hard optimal control problem, namely scheduling jobs on multiple CPUs on single or multiple machines.

R = sign(⟨ρ_i⟩ − ρ_i)     (1.1)

where R is the reward, ⟨ρ_i⟩ is the average response time of the jobs completed to date, and ρ_i is the response time of the last completed job. The reward is therefore positive when the last completed job finished faster than the average response time.

The update equation is

Q_{i,t+1} ← Q_{i,t} + α(R − Q_{i,t})     (1.2)

Q_{i,t} is the value of grid node i at time t, and α is the learning rate. Therefore, grid nodes that take a longer time to complete a job have a lower value than grid nodes with fast response times. The agent follows an ε-greedy strategy, which means that the greedy action is taken with probability 1 − ε, and an exploratory action is taken otherwise. The paper concludes that this Q-learning approach does better than an algorithm that simply chooses the least loaded resource. The problem with this simplistic approach is that the arrival of jobs at a resource would be cyclic. A fast grid node would see more jobs coming to it, eventually slowing it down more than the slow resource, which will hardly see any load. As a result, its Q value will decrease over time, after which the slow resources will tend to get swamped.
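The node-selection scheme described above can be summarised in a few lines of code. The sketch below is hypothetical (the names and structure are ours, not taken from the cited work): it keeps one value per grid node, applies the update of Equation 1.2 with the sign-based reward of Equation 1.1, and chooses nodes ε-greedily.

```java
import java.util.Random;

/** Hypothetical sketch of the node-selection scheme above: one Q value per grid node. */
public class NodeSelector {
    private final double[] q;          // Q_i: value of grid node i
    private final double alpha;        // learning rate
    private final double epsilon;      // exploration probability
    private final Random rng = new Random();
    private double meanResponse = 0.0; // <rho_i>: running average response time
    private long completedJobs = 0;

    public NodeSelector(int numNodes, double alpha, double epsilon) {
        this.q = new double[numNodes];
        this.alpha = alpha;
        this.epsilon = epsilon;
    }

    /** Epsilon-greedy: usually pick the highest-valued node, explore with probability epsilon. */
    public int chooseNode() {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q.length);
        }
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Apply Equation 1.1 (sign-based reward) and Equation 1.2 (one-step value update). */
    public void jobCompleted(int node, double responseTime) {
        double reward = Math.signum(meanResponse - responseTime); // +1 if faster than average
        q[node] += alpha * (reward - q[node]);                    // Q <- Q + alpha * (R - Q)
        completedJobs++;
        meanResponse += (responseTime - meanResponse) / completedJobs;
    }
}
```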

1.8 Aims of this Thesis

This research focuses on the need to provide Quality of Service to grid users in an autonomous manner. Providing QoS in an autonomous manner implies that it should require minimum interference from a system or network administrator in order to meet the QoS requirements laid out in SLAs.

As we stated in Section 1.5, static mechanisms to provide QoS cause resources to be underutilized at a service provider's site. In our thesis, we focus on dynamic QoS mechanisms that learn policies to help maximize the utilization of a network or grid service provider's resources.

We focus on Reinforcement Learning based mechanisms that can observe the load and other conditions at a resource and learn resource allocation policies without the need for external supervision. RL agents reside at each resource node. These agents are responsible for the resource allocation policies. We intend to implement a system where successful completion of jobs within their deadlines gives positive feedback to the agents, and the agents are penalized if a job fails to finish within its deadline. Once the agents learn an optimal policy, they should be able to handle any situation in the network and grid. We aim to design such a system in simulation, as well as have a working implementation on a testbed to assess its viability.

1.9 Organization of this Thesis

This thesis describes how QoS can be achieved using RL based algorithms. The proposal made is evaluated using a grid simulation package known as GridSim, and a testbed running a Globus based grid. Before the actual RL based system is described, we need to briefly introduce these various components. Accordingly, the rest of the thesis is organized as follows. Chapter 2 gives a brief introduction to Reinforcement Learning, the framework which was used for the design of our resource allocation system. It includes descriptions of the Markov Decision Process, states, actions, rewards, policies, etc. It also shows how a Cerebellar Model Articulation Controller is used for function approximation. It goes on to describe the RL algorithms used in this thesis, namely Q(λ) and SMART.

Chapter 3 gives a description of the work that was done on GridSim. GridSim did not support any network simulations prior to version 3.1. This chapter details the motivations behind implementing an elementary network stack for GridSim and how this feature was designed and added to GridSim. We have shown the design of the network elements, and also discussed their implementation details. We have provided class diagrams for the various elements like Packets, Links, Routers, etc., that are essential for the network infrastructure in GridSim. We have also discussed some of the other grid simulation packages, the features they provide, and why they do not fit our needs completely. It is hoped that these will be useful for other researchers wanting to use and extend GridSim.

Chapter 4 discusses our strategies for Reinforcement Learning based resource allocation in detail. Resources can be allocated by provisioning or reservation. The chapter explores resource provisioning on routers and GRs with the use of the WFQ and GPS algorithms respectively. It also details how resource reservation can be provided on routers and GRs using rate-jitter schedulers and CPU cycle reservation. We also discuss the design and configuration of the RL system, such as the state and action space, reward policies, etc.

In Chapter 5, we describe the simulations of the solutions we discussed in Chapter 4. We describe the various simulation experiments we designed and ran on GridSim to benchmark the performance of our solution against currently used static allocation policies. We simulated three distinct scenarios: (i) when only the User broker runs an agent to decide which resource to send the next job to; (ii) when the agents are only running on network routers and grid nodes, adjusting the resource allocation levels according to the policies they learn in real time; (iii) when the agents are enabled on the User brokers, the network routers and grid nodes. We also compare the performance of our algorithm in each case. We continue the chapter with a description of the implementation of our system on a testbed. We detail the design of the agents running on routers and grid resources. These agents also use Reinforcement Learning to determine resource allocation. We show how we used nice levels on Linux to provide CPU provisioning, and a Linux program called rshaper to provide bandwidth reservation.


Chapter 2

Reinforcement Learning

2.1 Introduction to Reinforcement Learning

2.1.1 Markov Decision Process

In general, reinforcement learning problems can be modelled as Markov Decision Processes (MDP). An MDP consists of:

• a set of States S

• a set of Actions A

• a reward function R : S × A → ℝ

• a state transition function T : S × A → π(S)

The state transition function specifies the probability distribution of state transitions, i.e. the probability of moving from state s to state s′ given action a, specified as T(s, a, s′). The reward function specifies the instantaneous reward when such a transition takes place. A reinforcement learning process that satisfies the Markov property can be modelled as an MDP.

2.1.2 The Markov Property

In a certain process, if the probability distribution of all the future states of the environment depends only on the current state, the process is said to fulfill the Markovian Property, and the process itself is known as a Markovian Process. This implies that the probability of reaching a state s′ from s is always known. Mathematically, the property can be stated as follows.
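As a sketch of the standard statement, written with the transition function T from Section 2.1.1 and with s_t and a_t denoting the state and action at time t:

```latex
% Markov property: the distribution of the next state depends only on the
% current state and action, not on the earlier history.
\Pr\{ s_{t+1} = s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0 \}
    = \Pr\{ s_{t+1} = s' \mid s_t, a_t \}
    = T(s_t, a_t, s')
```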

2.1.3 Reinforcement Learning

Bellman [22] proposed the framework of Dynamic Programming (DP) to solve problems which are amenable to Markovian analysis. Algorithms like value or policy iteration can be used to solve such problems. However, these algorithms require the complete state-transition probabilities for every decision made by an agent. In other words, the model of the problem must be known.

In problems with large state and action spaces, developing the transition probabilities model can be a hindrance to solving the problem. This is known as the curse of dimensionality. To solve such problems, we can use algorithms like the method of temporal differences (TD(λ)) [23] or Watkins' Q(λ) [24]. These methods fall in the category of reinforcement learning algorithms.

Reinforcement Learning (RL), also known as Neuro-Dynamic Programming (NDP), is a popular technique used to learn an action policy, i.e. given a certain situation, it is able to decide the ideal action to be taken. An ideal action is one which maximizes a numerical reward signal over a period of time. Agents using an RL algorithm discover by trial and error which actions are the most valuable in the states that the agents will encounter. For a comprehensive discussion on RL, please refer to [25] and [26].

Figure 2.1: Reinforcement Learning Model

In the standard model shown in Figure 2.1, an agent communicates with an environment. At each iteration of the algorithm, the agent gets to know the state s_t from the environment at time t. The agent refers to its current policy (π) and decides on an action a_t to be taken. This changes the state of the environment from s_t to s_{t+1}. It also causes a reward R_t to be generated, which is used as feedback to the agent. The agent learns about the desirability of an action from a particular state in this way. The agent formulates a policy that maximizes the discounted sum of expected future rewards:


value = E[R_t + γ R_{t+1} + γ^2 R_{t+2} + ...]     (2.1)

2.1.4 State

In every reinforcement learning problem, the environment is always in a certain state. A state can consist of any information available to the agent from the environment. For example, if the environment is a chess board, the state of the environment could be the positions of the pieces on the board. States can be directly observed from the environment or constructed using inputs received from the environment. Constructed states can be useful when the agent is interested in the gradient of a feature. In this thesis, the state of the environment at time t is represented as s_t.

2.1.5 Action

At every state, the RL agent has a list of possible actions that it can take from that state. Actions can be either greedy or exploratory. Greedy actions are ones that the agent knows should lead to the future rewards being maximized. Exploratory actions [27] are those that are used to learn more about the environment, in the hope of finding a better policy than the current one. In this thesis, the action taken at time t is represented by a_t.

One way to decide between taking greedy and exploratory actions at each step is known as the ε-greedy method. In this method, the greedy action is taken most of the time, but once in a while an exploratory action is taken with a small probability ε.
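A minimal sketch of ε-greedy selection over a table of action values (the class and method names are illustrative, not code from the thesis):

```java
import java.util.Random;

/** Minimal epsilon-greedy selection over a table of action values. */
public class EpsilonGreedyPolicy {
    private final Random rng = new Random();
    private final double epsilon; // small exploration probability

    public EpsilonGreedyPolicy(double epsilon) {
        this.epsilon = epsilon;
    }

    /** With probability epsilon take a random (exploratory) action, otherwise the greedy one. */
    public int selectAction(double[] actionValues) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(actionValues.length);
        }
        int greedy = 0;
        for (int a = 1; a < actionValues.length; a++) {
            if (actionValues[a] > actionValues[greedy]) {
                greedy = a;
            }
        }
        return greedy;
    }
}
```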

Another way to explore the state and action space is by using softmax action selection. This method is an improvement over ε-greedy methods, because the ε-greedy exploratory selection is as likely to be a bad action as a good one. The softmax algorithm, on the other hand, chooses actions based on their value estimates. The probability of an action being chosen is directly proportional to its value estimate, therefore the greedy action has the highest probability of being chosen. A common softmax method uses the Boltzmann distribution.

The probability of choosing action a at time t is

p(a) = e^{Q_t(a)/θ} / Σ_{b=1}^{n} e^{Q_t(b)/θ}     (2.2)

θ is a positive parameter called the temperature. The higher the temperature, the more even the probabilities of all actions being chosen. Therefore, an experiment can begin with a high temperature, and the temperature can be lowered as the experiment goes on. This means that more exploration will take place initially, and as the agent learns more about the optimal policy π∗, fewer exploratory actions are taken, leading to higher returns and system stability.
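The selection rule of Equation 2.2 can be sketched as follows. Subtracting the maximum value before exponentiating is a standard numerical-stability detail and an implementation choice of this sketch, not something prescribed by the thesis.

```java
import java.util.Random;

/** Boltzmann (softmax) action selection: p(a) is proportional to exp(Q_t(a) / theta). */
public class SoftmaxPolicy {
    private final Random rng = new Random();

    /** Sample an action with probability proportional to exp(q[a] / theta). */
    public int selectAction(double[] q, double theta) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }

        double[] weights = new double[q.length];
        double total = 0.0;
        for (int a = 0; a < q.length; a++) {
            // Subtracting max before exponentiating avoids overflow; probabilities are unchanged.
            weights[a] = Math.exp((q[a] - max) / theta);
            total += weights[a];
        }

        double r = rng.nextDouble() * total;
        for (int a = 0; a < q.length; a++) {
            r -= weights[a];
            if (r <= 0.0) {
                return a;
            }
        }
        return q.length - 1; // fallback against floating-point rounding
    }
}
```

With a large θ the sampled actions are nearly uniform; lowering θ over the course of the experiment makes the selection increasingly greedy, matching the temperature schedule described above.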

2.1.6 Rewards

Agents receive rewards from the environment for taking actions at each state. The sole purpose of the agent is to maximize the expected value of the rewards in the future, as stated in Equation 2.1. Rewards are generated by the environment, rather than being calculated by the agent itself. By placing the reward assignment outside the agent, the agent is encouraged to formulate a policy in an environment in which it has imperfect control of the reward outcome.

The rewards generated by environments due to an agent's actions may not be immediate. For example, in a grid environment, an agent could change the reservation levels for certain classes of jobs. This affects the jobs running currently and the jobs arriving in the future. Since jobs on grids can be processor and I/O intensive, they could take a long time to complete, and the effect of the agent's action will not be known until some time into the future. Such rewards are known as Delayed Rewards, and agents must know how to assign delayed rewards to the action or series of actions that caused the reward to be generated.

The most important part of assigning reward is to determine which action at which state the reward should be assigned to. This is known as the temporal credit assignment problem. If the credit is assigned to an action that did not contribute to the credit being generated, the agent might formulate a sub-optimal policy. A sub-optimal policy is one that does not lead to the maximum expected return. One method to overcome this problem is to wait till the end of the experiment, observe whether the reward is positive or negative, and reward the actions accordingly. We will need to iterate this process a few times for proper learning to take place. However, this increases the learning time of the agent, and does not work in problems that do not have a specific ending. There are techniques that adjust the estimated value of a state depending on the current estimated value, the immediate reward and the estimated value of the state reached due to the action. This class of techniques is known as temporal difference methods ([23]).
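The simplest member of this class is the TD(0) value update, shown here as a sketch in the usual notation of [23] rather than as an equation from this thesis:

```latex
% TD(0): move V(s_t) toward the immediate reward plus the discounted
% value of the state that was actually reached.
V(s_t) \leftarrow V(s_t) + \alpha \left[ R_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
```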

2.1.7 Policy

Every agent follows a policy which dictates what actions are to be taken in each state. The policy that the agent follows determines the reward and return that the agent enjoys. By observing the returns from its current policy, the agent tries to continually improve its policy. In our thesis, a policy is represented by π, the action to be taken at a state s is represented as π(s), and the optimal policy is represented by π∗. The optimal policy is the one that achieves maximum return. A policy can be improved through Policy Iteration or Value Iteration [26].

2.1.8 Function Approximation

The basic method of storing the Q-values of state-action pairs is a lookup table. Lookup tables are simple to implement, but they can only be used when the state and action space is small. However, in problems where the state or action space is very large or continuous (for example [28]), lookup tables become impractical to implement. Not only do lookup tables require large amounts of memory, they are also unable to generalize the learnt values. Generalization refers to the generation of similar outputs for similar inputs. Generalization also allows agents to learn faster, since they do not need to explore the entire state and action space before they can produce a meaningful action for a given state-action pair. Structures that provide such generalization are known as Function Approximators.

There are several techniques which can provide function approximation. We can use neural network methods like Multi-Layer Perceptrons (MLP) or Radial Basis Functions (RBF). The method we have chosen for function approximation in our experiments is a coarse coding technique known as the Cerebellar Model Articulation Controller (CMAC). The main advantage of CMACs over RBFs and MLPs is that they have lower computation requirements ([29]).

CMACs work by using quantizing functions and resolution elements. Each dimension of the Q-value is mapped to K different quantizers, each of which has N resolution elements. Each quantizing function has a value associated with it, and each resolution element represents a fraction of the value of the quantizer. When two inputs to the CMAC are close, they map to quantizers and resolution elements that are close, and therefore have values that are similar. A detailed description of CMACs can be found in Chapter 3 of [29].

For a CMAC with K quantizing functions, the number of storage elements required is

Σ_{i=1}^{K} N_i     (2.3)

where N_i is the number of resolution elements in quantizer i. Choosing a larger number of quantizers and resolution elements increases the accuracy of the CMAC, but also increases its memory requirements.

Looking up a value for a given input only requires K lookups, where K is the number of quantizers. This is significantly less than the case of MLPs, where all weights have to be computed to look up a single value.
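A simplified, one-dimensional sketch of the CMAC described above: each of the K quantizers maps an input to one of its N resolution elements, the output is the sum of the K selected weights (so a lookup touches exactly K storage elements), and training spreads the error over those K weights. The offsets, indexing and names are illustrative assumptions rather than the thesis implementation.

```java
/**
 * Simplified one-dimensional CMAC: K overlapping quantizers, each with N
 * resolution elements. A lookup reads exactly K weights; training spreads
 * the error equally across those K weights.
 */
public class Cmac {
    private final int numQuantizers;         // K
    private final int elementsPerQuantizer;  // N
    private final double resolution;         // input width covered by one element
    private final double[][] weights;        // weights[k][n]

    public Cmac(int numQuantizers, int elementsPerQuantizer, double resolution) {
        this.numQuantizers = numQuantizers;
        this.elementsPerQuantizer = elementsPerQuantizer;
        this.resolution = resolution;
        this.weights = new double[numQuantizers][elementsPerQuantizer];
    }

    /** Each quantizer is offset slightly, so nearby inputs share most of their elements. */
    private int elementIndex(int k, double input) {
        double offset = k * resolution / numQuantizers;
        int idx = (int) Math.floor((input + offset) / resolution);
        return Math.floorMod(idx, elementsPerQuantizer);
    }

    /** Lookup: the sum of one weight per quantizer, i.e. K array reads. */
    public double value(double input) {
        double sum = 0.0;
        for (int k = 0; k < numQuantizers; k++) {
            sum += weights[k][elementIndex(k, input)];
        }
        return sum;
    }

    /** Train toward a target by distributing the error over the K active weights. */
    public void update(double input, double target, double learningRate) {
        double error = target - value(input);
        for (int k = 0; k < numQuantizers; k++) {
            weights[k][elementIndex(k, input)] += learningRate * error / numQuantizers;
        }
    }
}
```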

2.2 Solutions to Reinforcement Learning

In our problem of providing Quality of Service in a grid environment, we experimented with two different methods that can be used to solve reinforcement learning problems. The first of these is called Watkins' Q-Learning algorithm [30], and the second one is known as the Semi-Markovian Average Reward Technique (SMART) [31].

2.2.1 Watkins' Q-Lambda

Watkins' Q(λ) is an off-policy Temporal Difference (TD) learning method [24]. It combines one-step Q-Learning with eligibility traces. Both these terms are explained below.

One-step Q-Learning

In one step Q-Learning, the agent uses the following update rule

Q(s_t, a_t) ← Q(s_t, a_t) + α[R_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]     (2.4)

The Q^π(s, a) function is the perceived value of taking action a at state s under policy π. The higher the value of Q^π(s, a), the more desirable that action is in that particular state. Q^π(s, a) is basically the expected return when starting from state s, taking action a and following policy π thereafter. The Q-value of a state-action pair is learnt from experience. If the agent follows policy π and maintains a separate average for the actions from each state, the average will converge to the Q-value for that state. In a reinforcement learning problem, the objective of the

s, taking action a and following policy π thereafter The Q-value of a state-actionpair is learnt from experience If the agent follows policy π and maintains anseparate average for actions from each state, the average will converge to the Q-value for that state In a reinforcement learning problem, the objective of the

Ngày đăng: 08/11/2015, 16:21