• What a scheduling system is about and how it works
• Scheduling paradigms
• Condor, SGE, PBS and LSF
• Grid scheduling with quality-of-service (QoS) support, e.g. AppLeS, Nimrod/G, Grid rescheduling
• Grid scheduling optimization with heuristics

CHAPTER OUTLINE
6.1 Introduction
6.2 Scheduling Paradigms
6.3 How Scheduling Works
6.4 A Review of Condor, SGE, PBS and LSF
6.5 Grid Scheduling with QoS
6.1 INTRODUCTION

A Grid environment is largely dependent on the effectiveness and efficiency of its schedulers, which act as localized resource brokers. Figure 6.1 shows that user tasks, for example, can be submitted via Globus to a range of resource management and job scheduling systems, such as Condor [1], the Sun Grid Engine (SGE) [2], the Portable Batch System (PBS) [3] and the Load Sharing Facility (LSF) [4].

Grid scheduling is defined as the process of mapping Grid jobs to resources over multiple administrative domains. A Grid job can be split into many small tasks. The scheduler has the responsibility of selecting resources and scheduling jobs in such a way that the user and application requirements are met, in terms of overall execution time (throughput) and cost of the resources utilized.

Figure 6.1 Jobs, via Globus, can be submitted to systems managed by Condor, SGE, PBS and LSF

This chapter is organized as follows. In Section 6.2, we present three scheduling paradigms: centralized, hierarchical and decentralized. In Section 6.3, we describe the steps involved in the scheduling process. In Section 6.4, we review currently widely used resource management and job scheduling systems such as Condor and SGE. In Section 6.5, we discuss some issues related to scheduling with QoS. In Section 6.6, we conclude the chapter and, in Section 6.7, provide references for further reading and testing.
6.2 SCHEDULING PARADIGMS
Hamscher et al. [5] present three scheduling paradigms: centralized, hierarchical and distributed. In this section, we give a brief review of these paradigms. A performance evaluation of the three scheduling paradigms can also be found in Hamscher et al. [5].
6.2.1 Centralized scheduling
In a centralized scheduling environment, a central machine (node) acts as a resource manager to schedule jobs to all the surrounding nodes that are part of the environment. This scheduling paradigm is often used in situations such as a computing centre, where resources have similar characteristics and usage policies. Figure 6.2 shows the architecture of centralized scheduling.

Figure 6.2 Centralized scheduling

In this scenario, jobs are first submitted to the central scheduler, which then dispatches them to the appropriate nodes. Jobs that cannot be started on a node immediately are normally stored in a central job queue for a later start.

One advantage of a centralized scheduling system is that the scheduler may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources. However, centralized scheduling obviously does not scale well with the increasing size of the environment that it manages. The scheduler itself may well become a bottleneck and, if there is a problem with the hardware or software of the scheduler's server, i.e. a failure, it presents a single point of failure in the environment.
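To make the dispatch-or-queue behaviour concrete, the following minimal Python sketch models a central scheduler with a single job queue. The node and job representations are invented for illustration and do not correspond to any particular scheduler's data model.

```python
from collections import deque

class CentralScheduler:
    """Toy centralized scheduler: one queue, one global view of the nodes."""

    def __init__(self, free_nodes):
        self.free_nodes = set(free_nodes)   # nodes currently idle
        self.queue = deque()                # jobs waiting for a node

    def submit(self, job):
        """Dispatch the job immediately if a node is free, else queue it."""
        if self.free_nodes:
            node = self.free_nodes.pop()
            print(f"dispatch {job} -> {node}")
        else:
            self.queue.append(job)
            print(f"queue {job} (no free node)")

    def node_finished(self, node):
        """A node became idle; start the next queued job, if any."""
        if self.queue:
            job = self.queue.popleft()
            print(f"dispatch {job} -> {node}")
        else:
            self.free_nodes.add(node)

sched = CentralScheduler(["node1", "node2"])
for j in ["job1", "job2", "job3"]:
    sched.submit(j)          # job3 is queued, no node is free
sched.node_finished("node1") # job3 now starts
```

Because every submission and every state change goes through this one object, the sketch also makes the scalability and single-point-of-failure concerns above easy to see.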
6.2.2 Distributed scheduling
In this paradigm, there is no central scheduler responsible for managing all the jobs. Instead, distributed scheduling involves multiple localized schedulers, which interact with each other in order to dispatch jobs to the participating nodes. There are two mechanisms for a scheduler to communicate with other schedulers: direct or indirect communication.

Distributed scheduling overcomes the scalability problems incurred in the centralized paradigm; in addition, it can offer better fault tolerance and reliability. However, the lack of a global scheduler, which has all the necessary information on the available resources, usually leads to sub-optimal scheduling decisions.
6.2.2.1 Direct communication
In this scenario, each local scheduler can communicate directly with other schedulers for job dispatching. Each scheduler has a list of remote schedulers that it can interact with, or there may exist a central directory that maintains all the information related to each scheduler. Figure 6.3 shows the architecture of direct communication in the distributed scheduling paradigm.

If a job cannot be dispatched to its local resources, its scheduler will communicate with other remote schedulers to find resources appropriate and available for executing the job. Each scheduler may maintain one or more local job queues for job management.

Figure 6.3 Direct communications in distributed scheduling
6.2.2.2 Communication via a central job pool
In this scenario, jobs that cannot be executed immediately are sent to a central job pool. Compared with direct communication, the local schedulers can potentially choose suitable jobs from the pool to schedule on their resources. Policies are required to ensure that all the jobs in the pool are eventually executed. Figure 6.4 shows the architecture of using a job pool for distributed scheduling.
Figure 6.4 Distributed scheduling with a job pool
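As an illustration of the job-pool mechanism, the sketch below lets each local scheduler pull from a shared pool only those jobs it can currently run. The job attributes are invented for the example, and a real system would add the starvation-avoidance policies mentioned above.

```python
# Shared pool of jobs that could not be started locally.
# Each job is (name, cpus_needed): an invented, minimal description.
job_pool = [("jobA", 4), ("jobB", 1), ("jobC", 2)]

class LocalScheduler:
    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus

    def pull_suitable_jobs(self, pool):
        """Take jobs from the central pool that fit the local free capacity."""
        taken = []
        for job in list(pool):
            name, cpus = job
            if cpus <= self.free_cpus:
                pool.remove(job)
                self.free_cpus -= cpus
                taken.append(name)
        return taken

s1 = LocalScheduler("site1", free_cpus=3)
s2 = LocalScheduler("site2", free_cpus=4)
print("site1 runs:", s1.pull_suitable_jobs(job_pool))  # ['jobB', 'jobC']
print("site2 runs:", s2.pull_suitable_jobs(job_pool))  # ['jobA']
print("left in pool:", job_pool)                       # []
```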
6.2.3 Hierarchical scheduling

In hierarchical scheduling, a centralized scheduler interacts with local schedulers for job submission. The centralized scheduler is a kind of meta-scheduler that dispatches submitted jobs to local schedulers. Figure 6.5 shows the architecture of this paradigm.

Figure 6.5 Hierarchical scheduling

Similar to the centralized scheduling paradigm, hierarchical scheduling can suffer from scalability and communication bottlenecks. However, compared with centralized scheduling, one advantage of hierarchical scheduling is that the global scheduler and the local schedulers can apply different policies when scheduling jobs.
6.3 HOW SCHEDULING WORKS
Grid scheduling involves four main stages: resource discovery, resource selection, schedule generation and job execution.
6.3.1 Resource discovery
The goal of resource discovery is to identify a list of authenticated resources that are available for job submission. In order to cope with the dynamic nature of the Grid, a scheduler needs to have some way of incorporating dynamic state information about the available resources into its decision-making process.

This decision-making process is somewhat analogous to that of an ordinary compiler for a single-processor machine. The compiler needs to know how many registers and functional units exist and whether or not they are available or "busy". It should also be aware of how much memory it has to work with, what kind of cache configuration has been implemented and the various communication latencies involved in accessing these resources. It is through this information that a compiler can effectively schedule instructions to minimize resource idle time. Similarly, a scheduler should always know what resources it can access, how busy they are, how long it takes to communicate with them and how long it takes for them to communicate with each other. With this information, the scheduler can optimize the scheduling of jobs to make more efficient and effective use of the available resources.

A Grid environment typically uses a pull model, a push model or a push-pull model for resource discovery. The outcome of the resource discovery process is the identity of the available resources (R_available) in a Grid environment for job submission and execution.
6.3.1.1 The pull model
In this model, a single daemon associated with the scheduler queries Grid resources and collects state information such as CPU load or available memory. The pull model incurs relatively small communication overhead for gathering resource information but, unless it requests resource information frequently, it tends to provide fairly stale information, which is likely to be constantly out of date and potentially misleading. In centralized scheduling, the resource discovery/query process can also become rather intrusive and take a significant amount of time as the environment being monitored grows larger. Figure 6.6 shows the architecture of the pull model.

Figure 6.6 The pull model for resource discovery
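A minimal sketch of the pull model is given below. The query_resource function is a hypothetical stand-in for whatever network protocol or information-service query a real scheduler would use; here it just returns random values so the example runs on its own.

```python
import random
import time

def query_resource(host):
    """Hypothetical stand-in for querying one resource's state over the network."""
    return {"host": host,
            "cpu_load": round(random.random(), 2),
            "free_mem_mb": random.choice([256, 512, 1024])}

def pull_cycle(hosts, interval_s=1.0, cycles=2):
    """A single scheduler-side daemon polls every resource on each cycle."""
    snapshot = {}
    for _ in range(cycles):
        for host in hosts:
            # The snapshot is only as fresh as the last poll of each host.
            snapshot[host] = query_resource(host)
        print(snapshot)
        time.sleep(interval_s)
    return snapshot

pull_cycle(["node1", "node2", "node3"], interval_s=0.1)
```

The trade-off described above is visible in the two parameters: a short interval keeps the snapshot fresh but multiplies the number of queries, a long interval does the opposite.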
6.3.1.2 The push model
In this model, each resource in the environment has a daemon for gathering local state information, which is sent to a centralized scheduler that maintains a database to record each resource's activity. If the updates are frequent, an accurate view of the system state can be maintained over time; obviously, frequent updates to the database are intrusive and consume network bandwidth. Figure 6.7 shows the architecture of the push model.

Figure 6.7 The push model for resource discovery
6.3.1.3 The push–pull model
The push-pull model lies somewhere between the pull model and the push model. Each resource in the environment runs a daemon that collects state information. Instead of sending this information directly to a central scheduler, there exist intermediate nodes running daemons that aggregate state information from different sub-resources and respond to queries from the scheduler. A challenge of this model is to determine what information is most useful, how often it should be collected and how long it should be kept. Figure 6.8 shows the architecture of the push-pull model.

Figure 6.8 The push–pull model for resource discovery
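The following sketch illustrates the push-pull idea under simplified, invented assumptions: resource daemons push their state to an intermediate aggregator, and the scheduler pulls only the aggregated summary rather than every raw record.

```python
class Aggregator:
    """Intermediate node: receives pushed updates, answers scheduler queries."""

    def __init__(self, name):
        self.name = name
        self.state = {}          # host -> last pushed record

    def push(self, host, record):
        """Called by a resource daemon when it pushes fresh state."""
        self.state[host] = record

    def summary(self):
        """Answer a scheduler query with an aggregate, not every raw record."""
        loads = [r["cpu_load"] for r in self.state.values()]
        return {"aggregator": self.name,
                "hosts": len(self.state),
                "avg_cpu_load": sum(loads) / len(loads) if loads else None}

site_a = Aggregator("site_a")
site_a.push("node1", {"cpu_load": 0.2})
site_a.push("node2", {"cpu_load": 0.8})

# The scheduler pulls from aggregators only, not from every node.
print(site_a.summary())   # {'aggregator': 'site_a', 'hosts': 2, 'avg_cpu_load': 0.5}
```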
6.3.2 Resource selection
Once the list of possible target resources is known, the second phase of the scheduling process is to select those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, available RAM or disk storage. The result of resource selection is a resource list R_selected in which all resources meet the minimum requirements for a submitted job or job list. The relationship between the available resources R_available and the selected resources R_selected is:

R_selected ⊆ R_available
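A sketch of this filtering step is shown below. The attribute names and the job's minimum requirements are illustrative assumptions, not a prescribed schema.

```python
# R_available: resources found during discovery (illustrative attributes).
r_available = [
    {"name": "r1", "cpu_ghz": 1.8, "ram_mb": 256, "disk_gb": 10},
    {"name": "r2", "cpu_ghz": 0.8, "ram_mb": 512, "disk_gb": 40},
    {"name": "r3", "cpu_ghz": 2.6, "ram_mb": 512, "disk_gb": 80},
]

# Minimum requirements imposed by the user/job.
job_requirements = {"cpu_ghz": 1.0, "ram_mb": 256, "disk_gb": 20}

def meets_minimum(resource, requirements):
    """True when the resource satisfies every minimum requirement."""
    return all(resource[key] >= value for key, value in requirements.items())

# R_selected is, by construction, a subset of R_available.
r_selected = [r for r in r_available if meets_minimum(r, job_requirements)]
print([r["name"] for r in r_selected])   # ['r3']  (r1 lacks disk, r2 lacks CPU)
```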
6.3.3 Schedule generation
The generation of schedules involves two steps: producing resource selection strategies and selecting jobs.
6.3.3.1 Resource selection
The resource selection process is used to choose resource(s) from the resource list R_selected for a given job. Since all resources in the list R_selected meet the minimum requirements imposed by the job, an algorithm is needed to choose the best resource(s) to execute the job. Although random selection is a choice, it is not an ideal resource selection policy. The resource selection algorithm should take into account the current state of the resources and choose the best one based on a quantitative evaluation. A resource selection algorithm that takes only CPU and RAM into account could evaluate each resource as:

Evaluation = [W_CPU × (1 − CPU_load) × CPU_speed / CPU_min
              + W_RAM × (1 − RAM_usage) × RAM_size / RAM_min] / (W_CPU + W_RAM)    (6.3)

where W_CPU is the weight allocated to CPU speed; CPU_load is the current CPU load (as a fraction); CPU_speed is the real CPU speed; CPU_min is the minimum required CPU speed; W_RAM is the weight allocated to RAM; RAM_usage is the current RAM usage (as a fraction); RAM_size is the physical RAM size; and RAM_min is the minimum required RAM size.
Now we give an example to explain the algorithm used to choose one resource from three possible candidates. The assumed parameters associated with each resource are given in Table 6.1.

Let us suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB.

Table 6.1 The resource information matrix

             CPU speed (GHz)   CPU load (%)   RAM size (MB)   RAM usage (%)
Resource 1        1.8               50             256              50
Resource 2        2.6               70             512              60
Resource 3        1.2               40             512              30

Then the evaluation values for the three resources can be calculated using the formula:

Evaluation_resource1 = (5.4 + 2.0) / 10 = 0.74
Evaluation_resource2 = (4.68 + 3.2) / 10 = 0.788
Evaluation_resource3 = (4.32 + 5.6) / 10 = 0.992

From the results we know that Resource 3 is the best choice for the submitted job.
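The same calculation can be expressed as a short Python sketch. It simply restates the formula and the Table 6.1 values above, so the printed scores should reproduce the worked example (0.74, 0.788 and 0.992).

```python
W_CPU, W_RAM = 6, 4          # weights (total 10)
CPU_MIN, RAM_MIN = 1.0, 256  # minimum CPU speed (GHz) and RAM size (MB)

def evaluate(cpu_speed, cpu_load, ram_size, ram_usage):
    """Weighted evaluation of a resource, normalized by the total weight."""
    cpu_term = W_CPU * (1 - cpu_load) * cpu_speed / CPU_MIN
    ram_term = W_RAM * (1 - ram_usage) * ram_size / RAM_MIN
    return (cpu_term + ram_term) / (W_CPU + W_RAM)

resources = {                      # values from Table 6.1
    "Resource 1": (1.8, 0.50, 256, 0.50),
    "Resource 2": (2.6, 0.70, 512, 0.60),
    "Resource 3": (1.2, 0.40, 512, 0.30),
}

scores = {name: evaluate(*params) for name, params in resources.items()}
for name, score in scores.items():
    print(name, round(score, 3))               # 0.74, 0.788, 0.992
print("best:", max(scores, key=scores.get))    # Resource 3
```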
6.3.3.2 Job selection

The goal of job selection is to select a job from a job queue for execution. Four strategies that can be used to select a job are given below.
• First come first served: The scheduler selects jobs for execution in the order of their submission. If there is no resource available for the selected job, the scheduler waits until the job can be started, and the other jobs in the job queue have to wait. There are two main drawbacks with this type of job selection. It may waste resources when, for example, the selected job needs more resources to become available before it can start, which results in a long waiting time. And jobs with high priorities cannot be dispatched immediately if a job with a low priority needs more time to complete.

• Random selection: The next job to be scheduled is randomly selected from the job queue. Apart from the two drawbacks of the first-come-first-served strategy, job selection is not fair and a job submitted earlier may not be scheduled until much later.

• Priority-based selection: Jobs submitted to the scheduler have different priorities. The next job to be scheduled is the job with the highest priority in the job queue. A job's priority can be set when the job is submitted. One drawback of this strategy is that it is hard to set an optimal criterion for a job priority. A job with the highest priority may need more resources than are available, which can also result in a long waiting time and an inability to make good use of the available resources.

• Backfilling selection [6]: The backfilling strategy requires knowledge of the expected execution time of a job to be scheduled. If the next job in the job queue cannot be started due to a lack of available resources, backfilling tries to find another job in the queue that can use the idle resources; a simple sketch of this idea follows the list.
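The sketch below illustrates backfilling under simple, invented assumptions: each job declares the CPUs it needs and its expected run time, and if the job at the head of the queue cannot start, a later, smaller job is allowed to jump ahead and use the idle CPUs.

```python
def backfill_pick(queue, free_cpus):
    """Return the job to start now: the head of the queue if it fits,
    otherwise the first later job that can use the idle CPUs."""
    if not queue:
        return None
    head = queue[0]
    if head["cpus"] <= free_cpus:
        return queue.pop(0)
    for i, job in enumerate(queue[1:], start=1):
        if job["cpus"] <= free_cpus:
            return queue.pop(i)        # backfill with a smaller job
    return None                        # nothing fits; resources stay idle

# Jobs carry an expected run time, which real backfilling uses to avoid
# delaying the head job; this sketch omits that reservation check.
queue = [
    {"name": "big",   "cpus": 8, "runtime_h": 10},
    {"name": "small", "cpus": 2, "runtime_h": 1},
]
print(backfill_pick(queue, free_cpus=4))   # the 'small' job is backfilled
```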
6.3.4 Job execution
Once a job and a resource are selected, the next step is to submit the job to the resource for execution. Job execution may be as easy as running a single command or as complicated as running a series of scripts that may, or may not, include set-up or staging.
6.4 A REVIEW OF CONDOR, SGE, PBS AND LSF
In this section, we review Condor/Condor-G, SGE, PBS and LSF. These four systems have been widely used for Grid-based resource management and job scheduling.
6.4.1 Condor
Condor is a resource management and job scheduling system developed as a research project at the University of Wisconsin-Madison. In this section we study Condor based on its latest version at the time of writing, Condor 6.6.3.
6.4.1.1 Condor platforms
Condor 6.6.3 supports a variety of systems as follows:
• HP systems running HPUX 10.20
• Sun SPARC systems running Solaris 2.6/2.7/8/9
• SGI systems running IRIX 6.5 (not fully supported)
• Intel x86 systems running Red Hat Linux 7.1/7.2/7.3/8.0/9.0, and Windows NT 4.0, XP and 2003 Server (the Windows systems are not fully supported)
• ALPHA systems running Digital UNIX 4.0, Red Hat Linux 7.1/7.2/7.3 and Tru64 5.1 (not fully supported)
• PowerPC systems running Macintosh OS X and AIX 5.2L (not fully supported)
• Itanium systems running Red Hat Linux 7.1/7.2/7.3 (not fully supported)
• Windows systems (not fully supported)
UNIX machines and Windows machines running Condor can co-exist in the same Condor pool without any problems; for example, a job submitted from a Windows machine can run on a Windows machine or a UNIX machine, and a job submitted from a UNIX machine can run on a UNIX or a Windows machine. There is absolutely no need to run more than one Condor central manager, even if you have both UNIX and Windows machines. The Condor central manager itself can run on either a UNIX or a Windows machine.
6.4.1.2 The architecture of a Condor pool
Resources in Condor are normally organized in the form of Condor pools. A pool is an administered domain of hosts, not necessarily dedicated to a Condor environment. A Condor system can have multiple pools, each of which follows a flat machine organization.

As shown in Figure 6.9, a Condor pool normally has one Central Manager (master) host and an arbitrary number of Execution (worker) hosts. A Condor Execution host can be configured as a job Execution host, a job Submission host or both. The Central Manager host is used to manage resources and jobs in a Condor pool. Host machines in a Condor pool need not be dedicated to Condor.

Figure 6.9 The architecture of a Condor pool
If the Central Manager host in a Condor pool crashes, jobs that are already running will continue to run unaffected. Queued jobs will remain in the queue unharmed, but they cannot begin running until the Central Manager host is restarted.
6.4.1.3 Daemons in a Condor pool
A daemon is a program that runs in the background once started. To configure a Condor pool, the following Condor daemons need to be started. Figure 6.10 shows the interactions between the Condor daemons.
condor_master
The condor_master daemon runs on each host in a Condor pool to keep all the other daemons in the pool running. It spawns daemons such as condor_startd and condor_schedd, and periodically checks whether there are new binaries installed for any of these daemons. If so, the condor_master restarts the affected daemons. In addition, if any daemon crashes, the condor_master sends an email to the administrator of the Condor pool and restarts the daemon. The condor_master also supports various administrative commands, such as starting, stopping or reconfiguring daemons remotely.

Figure 6.10 Daemons in a Condor pool
condor_startd
The condor_startd daemon runs on each host in a Condor pool. It advertises information related to the node's resources to the condor_collector daemon running on the Central Manager host for matching pending resource requests. This daemon is also responsible for enforcing the policies that resource owners require, which determine under what conditions remote jobs will be started, suspended, resumed, vacated or killed. When the condor_startd is ready to execute a Condor job on an Execution host, it spawns the condor_starter.
condor_starter
The condor_starter daemon only runs on Execution hosts. It is the condor_starter that actually spawns a remote Condor job on a given host in a Condor pool. The condor_starter daemon sets up the execution environment and monitors the job once it is running. When a job completes, the condor_starter sends job status information back to the job Submission host and exits.
condor_schedd
The condor_schedd daemon runs on each host in a Condor pool and deals with resource requests. User jobs submitted to a node are stored in a local job queue managed by the condor_schedd daemon. Condor command-line tools such as condor_submit, condor_q and condor_rm interact with the condor_schedd daemon to allow users to submit a job into a job queue, and to view and manipulate the job queue. If the condor_schedd is down on a given machine, none of these commands will work.

The condor_schedd advertises the job requests, with their resource requirements, in its local job queue to the condor_collector daemon running on the Central Manager host. Once a job request from a condor_schedd on a Submission host has been matched with a given resource on an Execution host, the condor_schedd on the Submission host spawns a condor_shadow daemon to serve that particular job request.
condor_shadow
The condor_shadow daemon only runs on Submission hosts in a Condor pool and acts as the resource manager for user job submission requests. The condor_shadow daemon performs remote system calls, allowing jobs submitted to Condor to be checkpointed. Any system call performed on a remote Execution host is sent over the network back to the condor_shadow daemon on the Submission host, and the results are also sent back to the Submission host. In addition, the condor_shadow daemon is responsible for making decisions about a user job submission request, such as where checkpoint files should be stored or how certain files should be accessed.
condor_collector
The condor_collector daemon only runs on the Central Manager host. This daemon interacts with the condor_startd and condor_schedd daemons running on other hosts to collect all the information about the status of a Condor pool, such as job requests and available resources. The condor_status command can be used to query the condor_collector daemon for specific status information about a Condor pool.
condor_negotiator
The condor_negotiator daemon only runs on the Central Manager host and is responsible for matching a resource with a specific job request within a Condor pool. Periodically, the condor_negotiator daemon starts a negotiation cycle, in which it queries the condor_collector daemon for the current state of all the resources available in the pool. It interacts, in priority order, with each condor_schedd daemon running on a Submission host that has resource requests, and tries to match available resources with those requests. If a user with a higher priority has jobs that are waiting to run while another user with a lower priority has claimed resources, the condor_negotiator daemon can preempt a resource and match it with the job request of the higher-priority user.
condor_kbdd
The condor_kbdd daemon only runs on Execution hosts running Digital Unix or IRIX. On these platforms, the condor_startd daemon cannot determine console (keyboard or mouse) activity directly from the operating system. The condor_kbdd daemon connects to an X server and periodically checks whether there is any user activity. If so, the condor_kbdd daemon sends a command to the condor_startd daemon running on the same host. In this way, the condor_startd daemon knows that the machine owner is using the machine again, and it can perform whatever actions are necessary, given the policy it has been configured to enforce. Therefore, Condor can be used in a non-dedicated computing environment to scavenge idle computing resources.
condor_ckpt_server
The condor_ckpt_server daemon runs on a checkpoint server, which is an Execution host, to store and retrieve checkpointed files. If a checkpoint server in a Condor pool is down, Condor reverts to sending the checkpointed files for a given job back to the job Submission host.
6.4.1.4 Job life cycle in Condor
A job submitted to a Condor pool goes through the following steps, as shown in Figure 6.11.
1. Job submission: A job is submitted from a Submission host with the condor_submit command (Step 1).

2. Job request advertising: Once it receives a job request, the condor_schedd daemon on the Submission host advertises the request to the condor_collector daemon running on the Central Manager host (Step 2).

3. Resource advertising: Each condor_startd daemon running on an Execution host advertises the resources available on that host to the condor_collector daemon running on the Central Manager host (Step 3).

4. Resource matching: The condor_negotiator daemon running on the Central Manager host periodically queries the condor_collector daemon (Step 4) to match a resource to a user job request. It then informs the condor_schedd daemon running on the Submission host of the matched Execution host (Step 5).

5. Job execution: The condor_schedd daemon running on the job Submission host interacts with the condor_startd daemon running on the matched Execution host (Step 6), which spawns a condor_starter daemon (Step 7). The condor_schedd daemon on the Submission host spawns a condor_shadow daemon (Step 8) to interact with the condor_starter daemon for job execution (Step 9). The condor_starter daemon running on the matched Execution host receives the user job to execute (Step 10).

6. Return output: When the job is completed, the results are sent back to the Submission host through the interaction between the condor_shadow daemon running on the Submission host and the condor_starter daemon running on the matched Execution host (Step 11).

Figure 6.11 Job life cycle in Condor
6.4.1.5 Security management in Condor
Condor provides strong support for authentication, encryption, integrity assurance and authorization. A Condor system administrator enables most of these security features using configuration macros.

When Condor is installed, there are no authentication, encryption, integrity or authorization checks in the default configuration settings. This allows newer versions of Condor with security features to work or interact with previous versions without security support. An administrator must modify the configuration settings to enable the security features.

Access to a Condor pool is controlled through access levels; for example, viewing the status of a pool requires READ permission, while submitting a job requires WRITE permission.
Authentication
Authentication provides an assurance of identity. Through configuration macros, both a client and a daemon can specify whether authentication is required. For example, if the macro defined in the configuration file for a daemon is

SEC_WRITE_AUTHENTICATION = REQUIRED

then the daemon must authenticate the client for any communication that requires the WRITE access level. If the daemon's configuration contains

SEC_DEFAULT_AUTHENTICATION = REQUIRED

and does not contain any other security configuration for AUTHENTICATION, then this default configuration defines the daemon's needs for authentication over all access levels.

If no authentication methods are specified in the configuration, Condor uses a default authentication method such as Globus GSI authentication with X.509 certificates, Kerberos authentication or file system authentication, as discussed in Chapter 4.
Encryption
Encryption provides privacy support between two communicating parties. Through configuration macros, both a client and a daemon can specify whether encryption is required for further communication.

Integrity checks

An integrity check assures that the messages between communicating parties have not been tampered with. Any change, such as an addition, modification or deletion, can be detected. Through configuration macros, both a client and a daemon can specify whether an integrity check is required for further communication.
6.4.1.6 Job management in Condor
Condor manages jobs in the following aspects.

Job
A Condor job is a work unit submitted to a Condor pool for execution.

Job types
Jobs that can be managed by Condor are executable sequential or parallel codes, using, for example, PVM or MPI. A job submission may involve a job that runs over a long period, a job that needs to run many times or a job that needs many machines to run in parallel.

A job can be in one of the following states:
• Idle: There is no job activity.
• Busy: A job is busy running.
• Suspended: A job is currently suspended.
• Vacating: A job is currently checkpointing.
• Killing: A job is currently being killed.
• Benchmarking: The condor_startd is running benchmarks.
Job run-time environments
The Condor universe specifies a Condor execution environment. There are seven universes in Condor 6.6.3, as described below.
• The default universe is the Standard Universe (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise). It tells Condor that the job has been re-linked via condor_compile with the Condor libraries and therefore supports checkpointing and remote system calls.

• The Vanilla Universe is an execution environment for jobs that have not been linked with the Condor libraries; it is also used to submit shell scripts to Condor.

• The PVM Universe is used for parallel jobs written with PVM 3.4.

• The Globus Universe is intended to provide the standard Condor interface to users who wish to start Globus jobs from Condor. Each job queued in the job submission file is translated into the Globus Resource Specification Language (RSL) and subsequently submitted to Globus via the Globus GRAM protocol.

• The MPI Universe is used for MPI jobs written with the MPICH package.

• The Java Universe is used for programs written in Java.

• The Scheduler Universe allows a Condor job to be executed on the host where the job is submitted. The job does not need matchmaking for a host and it will never be preempted.
Job submission with a shared file system
If Vanilla, Java or MPI jobs are submitted without using the file transfer mechanism, Condor must use a shared file system to access input and output files. In this case, the job must be able to access the data files from any machine on which it could potentially run.

Job submission without a shared file system

Condor also works well without a shared file system. A user can use Condor's file transfer mechanism when submitting jobs. Condor transfers any files needed by a job from the host machine where the job was submitted into a temporary working directory on the machine where the job is to be executed. Condor executes the job and transfers the output back to the Submission machine.

The user specifies which files to transfer and at what point the output files should be copied back to the Submission host. This specification is done within the job's submission description file. The default behavior of the file transfer mechanism varies across the different Condor universes discussed above, and it differs between UNIX and Windows systems.
Job priority
Job priorities allow the assignment of a priority level to each submitted Condor job in order to control the order of execution. The priority of a Condor job can be changed.

Chirp I/O

The Chirp I/O facility in Condor provides sophisticated I/O functionality. It has two advantages over simple whole-file transfers:

• First, input files are accessed at run time rather than at submission time.
• Second, a part of a file can be transferred instead of transferring the whole file.
Job flow management

A Condor job can have many tasks, each of which is an executable code. Condor uses a Directed Acyclic Graph (DAG) to represent a set of tasks in a job submission, where the input, output or execution of one or more tasks is dependent on one or more other tasks. The tasks are nodes (vertices) in the graph, and the edges (arcs) identify the dependencies of the tasks. Condor finds the Execution hosts for the execution of the tasks involved, but it does not schedule the tasks in terms of dependencies.

The Directed Acyclic Graph Manager (DAGMan) [7] is a meta-scheduler for Condor jobs. DAGMan submits jobs to Condor in an order represented by a DAG and processes the results. An input file is used to describe the dependencies of the tasks involved in the DAG, and each task in the DAG also has its own description file.

Job monitoring

Once submitted, the status of a Condor job can be monitored using the condor_q command. In the case of a DAG, the progress of the DAG can also be monitored by looking at the log file(s), or by using condor_q -dag.
Job recovery: The rescue DAG
DAGMan can help with the resubmission of uncompleted portions of a DAG when one or more nodes fail. If any node in the DAG fails, the remainder of the DAG is continued until no more forward progress can be made based on the DAG's dependencies. When a node in the DAG fails, DAGMan automatically produces a file called a Rescue DAG, which is a DAG input file whose functionality is the same as that of the original DAG file. The Rescue DAG file additionally marks successfully completed nodes using the DONE option. If the DAG is re-submitted using this Rescue DAG input file, the nodes marked as completed will not be re-executed.
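The effect of the DONE markers can be illustrated with a small sketch: given the task dependencies and the set of nodes already marked DONE, only the remaining tasks are (re)submitted, in dependency order. The DAG below is invented for illustration and is not written in DAGMan's input format.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on (illustrative DAG, not DAGMan syntax).
dag = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

done = {"A", "B"}          # nodes marked DONE in the Rescue DAG

# Resubmit only the nodes that are not DONE, respecting the dependencies.
for task in TopologicalSorter(dag).static_order():
    if task in done:
        print(f"skip {task} (DONE)")
    else:
        print(f"submit {task}")
```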
Job checkpointing mechanism
Checkpointing is normally used for a Condor job that needs a long time to complete. It takes a snapshot of the current state of a job in such a way that the job can be restarted from that checkpointed state at a later time.

Checkpointing gives the Condor scheduler the freedom to reconsider scheduling decisions through preemptive-resume scheduling. If the scheduler decides to no longer allocate a host to a job, e.g. when the owner of that host starts using the host, it can checkpoint the job and preempt it without losing the work the job has already accomplished. The job can be resumed later when the scheduler allocates it a new host. Additionally, periodic checkpointing provides fault tolerance in Condor.
Computing On Demand
Computing On Demand (COD) extends Condor's high-throughput computing abilities to include a method for running short-term jobs on available resources immediately.

COD extends Condor's job management to include interactive, computation-intensive jobs, giving these jobs immediate access to the computing power they need over a relatively short period of time. COD provides computing power on demand, switching predefined resources from working on Condor jobs to working on the COD jobs. These COD jobs cannot use the batch scheduling functionality of Condor since they require interactive response times.
6.4.1.7 Resource management in Condor
Condor manages resources in a Condor pool in the following aspects.
Tracking resource usage
The condor_startd daemon on each host reports to the condor_collector daemon on the Central Manager host about the resources available on that host.
User priority
Condor hosts are allocated to users based upon a user's priority. A lower numerical value for user priority means a higher priority, so a user with priority 5 will get more resources than a user with priority 50.
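Assuming, for illustration, that machines are handed out in inverse proportion to the numerical priority value (the intuition behind this fair-share behaviour, not Condor's exact algorithm), the 5-versus-50 example works out as follows:

```python
def split_machines(total_machines, user_priorities):
    """Split machines between users in inverse proportion to their priority value.
    This is an illustrative model, not Condor's actual negotiation algorithm."""
    weights = {user: 1.0 / prio for user, prio in user_priorities.items()}
    total = sum(weights.values())
    return {user: round(total_machines * w / total) for user, w in weights.items()}

# A user with priority 5 gets roughly ten times the share of a user with priority 50.
print(split_machines(110, {"alice": 5, "bob": 50}))   # {'alice': 100, 'bob': 10}
```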
6.4.1.8 Job scheduling policies in Condor
Job scheduling in a Condor pool is not strictly based on a first-come-first-served selection policy. Rather, to keep large jobs from draining the pool of resources, Condor uses a unique up-down algorithm [8] that prioritizes jobs inversely to the number of cycles required to run the job. Condor supports the following policies in scheduling jobs:
• First come first served: This is the default scheduling policy.

• Preemptive scheduling: The preemptive policy lets a pending high-priority job take resources away from a running job of lower priority.

• Dedicated scheduling: Dedicated scheduling means that jobs scheduled to dedicated resources cannot be preempted.
6.4.1.9 Resource matching in Condor
Resource matching [9] is used to match an Execution host to a selected job or jobs. The condor_collector daemon running on the Central Manager host receives job request advertisements from the condor_schedd daemon running on a Submission host and resource availability advertisements from the condor_startd daemon running on an Execution host. A resource match is performed by the condor_negotiator daemon on the Central Manager host by selecting a resource based on the job requirements. Both job request advertisements and resource advertisements are described in the Condor Classified Advertisement (ClassAd) language, a mechanism for representing the characteristics and constraints of hosts and jobs in the Condor system.

A ClassAd is a set of uniquely named expressions. Each named expression is called an attribute. ClassAds use a semi-structured data model for resource descriptions; thus, no specific schema is required by the matchmaker, allowing it to work naturally in a heterogeneous environment.

The ClassAd language includes a query language, allowing advertising agents such as the condor_startd and condor_schedd daemons to specify the constraints on matching resource offers and user job requests. Figure 6.12 shows an example of a ClassAd job request advertisement and a ClassAd resource advertisement.
Job ClassAd:
  TargetType = "Machine"
  Requirements = ((other.Arch == "INTEL" &&
                   other.OpSys == "LINUX") && ...)

Host ClassAd:
  TargetType = "Job"
  Machine = "s140n209.brunel.ac.uk"
  Arch = "INTEL"
  OpSys = "LINUX"

Figure 6.12 Two ClassAd samples
These two ClassAds are used by the condor_negotiator daemon running on the Central Manager host to check whether the host can be matched with the job requirements.
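To show what matchmaking does with these two advertisements, here is a small Python sketch that mimics the check of the Requirements expression from Figure 6.12. It is not the ClassAd language itself, just the same constraint written as an ordinary predicate over the machine's attributes.

```python
# Attributes taken from the host ClassAd in Figure 6.12.
host_ad = {
    "Machine": "s140n209.brunel.ac.uk",
    "Arch": "INTEL",
    "OpSys": "LINUX",
}

# The job's Requirements expression rewritten as a Python predicate.
# (The original expression contains further clauses that are not shown here.)
def job_requirements(other):
    return other["Arch"] == "INTEL" and other["OpSys"] == "LINUX"

def match(job_req, machine_ad):
    """A match succeeds when the job's Requirements evaluate to True
    against the machine's advertised attributes."""
    return job_req(machine_ad)

print(match(job_requirements, host_ad))   # True -> the negotiator can pair them
```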
6.4.1.10 Condor support in Globus
Jobs can be submitted directly to a Condor pool from a Condor host, or via Globus (GT2 or earlier versions of Globus), as shown in Figure 6.13. The Globus host is configured with the Condor jobmanager provided by Globus. When using a Condor jobmanager, jobs are submitted to the Globus resource, e.g. using globus_job_run. However, instead of forking the jobs on the local machine, the jobs are re-submitted by Globus to Condor using the condor_submit tool.

Figure 6.13 Submitting jobs to a Condor pool via Condor or Globus

To use Condor-G, we do not need to install a Condor pool. Condor-G is only the job management part of Condor. Condor-G can be installed on just one machine within an organization, and access to remote Grid resources using a Globus interface can be done through it.

Figure 6.14 Submitting jobs to Globus via Condor-G
Submitting Globus jobs using Condor-G provides a much higher level of service than simply using the globus_job_run command provided by Globus.

• First, jobs submitted to Globus with Condor-G enter a local Condor queue that can be effectively managed by Condor.

• Secondly, jobs remain in the Condor queue until they are completed. Therefore, should a job crash while running remotely, Condor-G can re-submit it again without user intervention.

In a word, Condor-G provides a level of service guarantee that is not available with globus_job_run and other Globus commands. Note: Condor-G does not have a GUI (the "G" is for Grid). However, the following graphical tools can be used with both Condor and Condor-G:

• CondorView: Shows a graphical history of the resources in a pool.

• Condor UserLogViewer: Shows a graphical history of a large set of jobs submitted to Condor or Condor-G.
6.4.2 Sun Grid Engine
The SGE is a distributed resource management and scheduling system from Sun Microsystems that can be used to optimize the utilization of software and hardware resources in a UNIX-based computing environment. The SGE can be used to find a pool of idle resources and harness these resources; it can also be used for normal activities, such as managing and scheduling jobs onto the available resources. The latest version of SGE is Sun N1 Grid Engine (N1GE) version 6 (see Table 6.2). In this section, we focus on SGE 5.3 Standard Edition because it is freely downloadable.
6.4.2.1 The SGE architecture
Hosts (machines or nodes) in SGE are classified into five categories: master, submit, execution, administration and shadow master. Figure 6.15 shows the SGE architecture.
• Master host: A single host is selected to be the SGE master host. This host handles all requests from users, makes job-scheduling decisions and dispatches jobs to execution hosts.

• Submit host: Submit hosts are machines configured to submit, monitor and administer jobs, and to manage the entire cluster.

• Execution host: Execution hosts have permission to run SGE jobs.
Trang 28Table 6.2 A note of the differences between N1 Grid Engine and Sun Grid
• Grid Engine Management Model for 1-click deployment of execution hosts on
an arbitrary number of hosts (to be delivered in the second quarter of 2005) The basic software components underneath N1GE and SGE are identical In fact, the open-source project is the development platform for those components Proprietary Sun code only exists for the differentiators listed above (where applicable) Note that some of those differentiators use other Sun products or technologies, which are not open source themselves.
Figure 6.15 The architecture of the SGE
• Administration host: SGE administrators use administration hosts to make changes to the cluster's configuration, such as changing distributed resource management parameters, configuring new nodes or adding or changing users.

• Shadow master host: While there is only one master host, other machines in the cluster can be designated as shadow master hosts to provide greater availability. A shadow master host continually monitors the master host, and automatically and transparently assumes control in the event that the master host fails. Jobs already in the cluster are not affected by a master host failure.
As shown in Figure 6.16, to configure an SGE cluster, the followingdaemons need to be started
sge_qmaster – The Master daemon
Thesge_qmaster daemon is the centre of the cluster’s management
and scheduling activities; it maintains tables about hosts, queues,jobs, system load and user permissions It receives scheduling deci-sions fromsge_schedd daemon and requests actions from sge_execd
daemon on the appropriate execution host(s) Thesge_qmaster
dae-mon runs on the Master host
sge_schedd – The Scheduler daemon
The sge_schedd is a scheduling daemon that maintains an
up-to-date view of the cluster’s status with the help of sge_qmaster
daemon It makes the scheduling decision about which job(s) aredispatched to which queue(s) It then forwards these decisions to
Figure 6.16 Daemons in SGE