• What a scheduling system is about and how it works
• Scheduling paradigms
• Condor, SGE, PBS and LSF
• Grid scheduling with quality-of-service (QoS) support, e.g. AppLeS, Nimrod/G, Grid rescheduling
• Grid scheduling optimization with heuristics

CHAPTER OUTLINE
6.1 Introduction
6.2 Scheduling Paradigms
6.3 How Scheduling Works
6.4 A Review of Condor, SGE, PBS and LSF
6.5 Grid Scheduling with QoS
6.1 INTRODUCTION

A Grid environment is largely dependent on the effectiveness and efficiency of its schedulers, which act as localized resource brokers. Figure 6.1 shows that user tasks, for example, can be submitted via Globus to a range of resource management and job scheduling systems, such as Condor [1], the Sun Grid Engine (SGE) [2], the Portable Batch System (PBS) [3] and the Load Sharing Facility (LSF) [4].

Grid scheduling is defined as the process of mapping Grid jobs to resources over multiple administrative domains. A Grid job can be split into many small tasks. The scheduler has the responsibility of selecting resources and scheduling jobs in such a way that the user and application requirements are met, in terms of overall execution time (throughput) and cost of the resources utilized.

Figure 6.1 Jobs, via Globus, can be submitted to systems managed by Condor, SGE, PBS and LSF

This chapter is organized as follows. In Section 6.2, we present three scheduling paradigms: centralized, hierarchical and decentralized. In Section 6.3, we describe the steps involved in the scheduling process. In Section 6.4, we review currently widely used resource management and job scheduling systems such as Condor and SGE. In Section 6.5, we discuss some issues related to scheduling with QoS. In Section 6.6, we conclude the chapter and, in Section 6.7, provide references for further reading and testing.
6.2 SCHEDULING PARADIGMS
Hamscher et al. [5] present three scheduling paradigms: centralized, hierarchical and distributed. In this section, we give a brief review of these paradigms. A performance evaluation of the three scheduling paradigms can also be found in Hamscher et al. [5].
6.2.1 Centralized scheduling
In a centralized scheduling environment, a central machine (node) acts as a resource manager to schedule jobs to all the surrounding nodes that are part of the environment. This scheduling paradigm is often used in situations such as a computing centre, where resources have similar characteristics and usage policies. Figure 6.2 shows the architecture of centralized scheduling.

Figure 6.2 Centralized scheduling

In this scenario, jobs are first submitted to the central scheduler, which then dispatches them to the appropriate nodes. Jobs that cannot be started on a node immediately are normally stored in a central job queue for a later start.

One advantage of a centralized scheduling system is that the scheduler may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources. However, centralized scheduling obviously does not scale well with the increasing size of the environment that it manages. The scheduler itself may well become a bottleneck and, if there is a problem with the hardware or software of the scheduler's server, i.e. a failure, it presents a single point of failure in the environment.
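To make the dispatch-or-queue behaviour concrete, the following minimal Python sketch models a central scheduler with a single job queue. The node and job representations are invented for illustration and do not correspond to any particular scheduler's data model.

```python
from collections import deque

class CentralScheduler:
    """Toy centralized scheduler: one queue, one global view of the nodes."""

    def __init__(self, free_nodes):
        self.free_nodes = set(free_nodes)   # nodes currently idle
        self.queue = deque()                # jobs waiting for a node

    def submit(self, job):
        """Dispatch the job immediately if a node is free, else queue it."""
        if self.free_nodes:
            node = self.free_nodes.pop()
            print(f"dispatch {job} -> {node}")
        else:
            self.queue.append(job)
            print(f"queue {job} (no free node)")

    def node_finished(self, node):
        """A node became idle; start the next queued job, if any."""
        if self.queue:
            job = self.queue.popleft()
            print(f"dispatch {job} -> {node}")
        else:
            self.free_nodes.add(node)

sched = CentralScheduler(["node1", "node2"])
for j in ["job1", "job2", "job3"]:
    sched.submit(j)          # job3 is queued, no node is free
sched.node_finished("node1") # job3 now starts
```

Because every submission and every state change goes through this one object, the sketch also makes the scalability and single-point-of-failure concerns above easy to see.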
6.2.2 Distributed scheduling
In this paradigm, there is no central scheduler responsible for managing all the jobs. Instead, distributed scheduling involves multiple localized schedulers, which interact with each other in order to dispatch jobs to the participating nodes. There are two mechanisms for a scheduler to communicate with other schedulers: direct or indirect communication.

Distributed scheduling overcomes the scalability problems incurred in the centralized paradigm; in addition, it can offer better fault tolerance and reliability. However, the lack of a global scheduler, which has all the necessary information on the available resources, usually leads to sub-optimal scheduling decisions.
6.2.2.1 Direct communication
In this scenario, each local scheduler can communicate directly with other schedulers for job dispatching. Each scheduler has a list of remote schedulers that it can interact with, or there may exist a central directory that maintains all the information related to each scheduler. Figure 6.3 shows the architecture of direct communication in the distributed scheduling paradigm.

If a job cannot be dispatched to its local resources, its scheduler will communicate with other remote schedulers to find resources appropriate and available for executing the job. Each scheduler may maintain one or more local job queues for job management.

Figure 6.3 Direct communications in distributed scheduling
6.2.2.2 Communication via a central job pool
In this scenario, jobs that cannot be executed immediately are sent to a central job pool. Compared with direct communication, the local schedulers can potentially choose suitable jobs from the pool to schedule on their resources. Policies are required to ensure that all the jobs in the pool are eventually executed. Figure 6.4 shows the architecture of using a job pool for distributed scheduling.
Figure 6.4 Distributed scheduling with a job pool
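As an illustration of the job-pool mechanism, the sketch below lets each local scheduler pull from a shared pool only those jobs it can currently run. The job attributes are invented for the example, and a real system would add the starvation-avoidance policies mentioned above.

```python
# Shared pool of jobs that could not be started locally.
# Each job is (name, cpus_needed): an invented, minimal description.
job_pool = [("jobA", 4), ("jobB", 1), ("jobC", 2)]

class LocalScheduler:
    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus

    def pull_suitable_jobs(self, pool):
        """Take jobs from the central pool that fit the local free capacity."""
        taken = []
        for job in list(pool):
            name, cpus = job
            if cpus <= self.free_cpus:
                pool.remove(job)
                self.free_cpus -= cpus
                taken.append(name)
        return taken

s1 = LocalScheduler("site1", free_cpus=3)
s2 = LocalScheduler("site2", free_cpus=4)
print("site1 runs:", s1.pull_suitable_jobs(job_pool))  # ['jobB', 'jobC']
print("site2 runs:", s2.pull_suitable_jobs(job_pool))  # ['jobA']
print("left in pool:", job_pool)                       # []
```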
6.2.3 Hierarchical scheduling

In hierarchical scheduling, a centralized scheduler interacts with local schedulers for job submission. The centralized scheduler is a kind of meta-scheduler that dispatches submitted jobs to local schedulers. Figure 6.5 shows the architecture of this paradigm.

Figure 6.5 Hierarchical scheduling

Similar to the centralized scheduling paradigm, hierarchical scheduling can suffer from scalability and communication bottlenecks. However, compared with centralized scheduling, one advantage of hierarchical scheduling is that the global scheduler and the local schedulers can apply different policies when scheduling jobs.
6.3 HOW SCHEDULING WORKS
Grid scheduling involves four main stages: resource discovery, resource selection, schedule generation and job execution.
6.3.1 Resource discovery
The goal of resource discovery is to identify a list of authenticated resources that are available for job submission. In order to cope with the dynamic nature of the Grid, a scheduler needs to have some way of incorporating dynamic state information about the available resources into its decision-making process.

This decision-making process is somewhat analogous to that of an ordinary compiler for a single-processor machine. The compiler needs to know how many registers and functional units exist and whether or not they are available or "busy". It should also be aware of how much memory it has to work with, what kind of cache configuration has been implemented and the various communication latencies involved in accessing these resources. It is through this information that a compiler can effectively schedule instructions to minimize resource idle time. Similarly, a scheduler should always know what resources it can access, how busy they are, how long it takes to communicate with them and how long it takes for them to communicate with each other. With this information, the scheduler can optimize the scheduling of jobs to make more efficient and effective use of the available resources.

A Grid environment typically uses a pull model, a push model or a push-pull model for resource discovery. The outcome of the resource discovery process is the identity of the available resources (R_available) in a Grid environment for job submission and execution.
6.3.1.1 The pull model
In this model, a single daemon associated with the scheduler queries Grid resources and collects state information such as CPU load or available memory. The pull model incurs relatively small communication overhead for gathering resource information but, unless it requests resource information frequently, it tends to provide fairly stale information, which is likely to be constantly out of date and potentially misleading. In centralized scheduling, the resource discovery/query process can also become rather intrusive and take a significant amount of time as the environment being monitored grows larger. Figure 6.6 shows the architecture of the pull model.

Figure 6.6 The pull model for resource discovery
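A minimal sketch of the pull model is given below. The query_resource function is a hypothetical stand-in for whatever network protocol or information-service query a real scheduler would use; here it just returns random values so the example runs on its own.

```python
import random
import time

def query_resource(host):
    """Hypothetical stand-in for querying one resource's state over the network."""
    return {"host": host,
            "cpu_load": round(random.random(), 2),
            "free_mem_mb": random.choice([256, 512, 1024])}

def pull_cycle(hosts, interval_s=1.0, cycles=2):
    """A single scheduler-side daemon polls every resource on each cycle."""
    snapshot = {}
    for _ in range(cycles):
        for host in hosts:
            # The snapshot is only as fresh as the last poll of each host.
            snapshot[host] = query_resource(host)
        print(snapshot)
        time.sleep(interval_s)
    return snapshot

pull_cycle(["node1", "node2", "node3"], interval_s=0.1)
```

The trade-off described above is visible in the two parameters: a short interval keeps the snapshot fresh but multiplies the number of queries, a long interval does the opposite.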
6.3.1.2 The push model
In this model, each resource in the environment has a daemon for gathering local state information, which is sent to a centralized scheduler that maintains a database to record each resource's activity. If the updates are frequent, an accurate view of the system state can be maintained over time; obviously, frequent updates to the database are intrusive and consume network bandwidth. Figure 6.7 shows the architecture of the push model.

Figure 6.7 The push model for resource discovery
6.3.1.3 The push–pull model
The push-pull model lies somewhere between the pull model and the push model. Each resource in the environment runs a daemon that collects state information. Instead of sending this information directly to a central scheduler, there exist intermediate nodes running daemons that aggregate state information from different sub-resources and respond to queries from the scheduler. A challenge of this model is to determine what information is most useful, how often it should be collected and how long it should be kept. Figure 6.8 shows the architecture of the push-pull model.

Figure 6.8 The push–pull model for resource discovery
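The following sketch illustrates the push-pull idea under simplified, invented assumptions: resource daemons push their state to an intermediate aggregator, and the scheduler pulls only the aggregated summary rather than every raw record.

```python
class Aggregator:
    """Intermediate node: receives pushed updates, answers scheduler queries."""

    def __init__(self, name):
        self.name = name
        self.state = {}          # host -> last pushed record

    def push(self, host, record):
        """Called by a resource daemon when it pushes fresh state."""
        self.state[host] = record

    def summary(self):
        """Answer a scheduler query with an aggregate, not every raw record."""
        loads = [r["cpu_load"] for r in self.state.values()]
        return {"aggregator": self.name,
                "hosts": len(self.state),
                "avg_cpu_load": sum(loads) / len(loads) if loads else None}

site_a = Aggregator("site_a")
site_a.push("node1", {"cpu_load": 0.2})
site_a.push("node2", {"cpu_load": 0.8})

# The scheduler pulls from aggregators only, not from every node.
print(site_a.summary())   # {'aggregator': 'site_a', 'hosts': 2, 'avg_cpu_load': 0.5}
```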
6.3.2 Resource selection
Once the list of possible target resources is known, the second phase of the scheduling process is to select those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, available RAM or disk storage. The result of resource selection is a resource list R_selected in which all resources meet the minimum requirements for a submitted job or job list. The relationship between the available resources R_available and the selected resources R_selected is:

R_selected ⊆ R_available
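A sketch of this filtering step is shown below. The attribute names and the job's minimum requirements are illustrative assumptions, not a prescribed schema.

```python
# R_available: resources found during discovery (illustrative attributes).
r_available = [
    {"name": "r1", "cpu_ghz": 1.8, "ram_mb": 256, "disk_gb": 10},
    {"name": "r2", "cpu_ghz": 0.8, "ram_mb": 512, "disk_gb": 40},
    {"name": "r3", "cpu_ghz": 2.6, "ram_mb": 512, "disk_gb": 80},
]

# Minimum requirements imposed by the user/job.
job_requirements = {"cpu_ghz": 1.0, "ram_mb": 256, "disk_gb": 20}

def meets_minimum(resource, requirements):
    """True when the resource satisfies every minimum requirement."""
    return all(resource[key] >= value for key, value in requirements.items())

# R_selected is, by construction, a subset of R_available.
r_selected = [r for r in r_available if meets_minimum(r, job_requirements)]
print([r["name"] for r in r_selected])   # ['r3']  (r1 lacks disk, r2 lacks CPU)
```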
6.3.3 Schedule generation
The generation of schedules involves two steps: producing resource selection strategies and selecting jobs.
6.3.3.1 Resource selection
The resource selection process is used to choose resource(s) from the resource list R_selected for a given job. Since all resources in the list R_selected meet the minimum requirements imposed by the job, an algorithm is needed to choose the best resource(s) to execute the job. Although random selection is a choice, it is not an ideal resource selection policy. The resource selection algorithm should take into account the current state of the resources and choose the best one based on a quantitative evaluation. A resource selection algorithm that takes only CPU and RAM into account could evaluate each resource as:

Evaluation = [W_CPU × (1 − CPU_load) × CPU_speed / CPU_min
              + W_RAM × (1 − RAM_usage) × RAM_size / RAM_min] / (W_CPU + W_RAM)    (6.3)

where W_CPU is the weight allocated to CPU speed; CPU_load is the current CPU load (as a fraction); CPU_speed is the real CPU speed; CPU_min is the minimum required CPU speed; W_RAM is the weight allocated to RAM; RAM_usage is the current RAM usage (as a fraction); RAM_size is the physical RAM size; and RAM_min is the minimum required RAM size.
Now we give an example to explain the algorithm used to choose one resource from three possible candidates. The assumed parameters associated with each resource are given in Table 6.1.

Let us suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB.

Table 6.1 The resource information matrix

             CPU speed (GHz)   CPU load (%)   RAM size (MB)   RAM usage (%)
Resource 1        1.8               50             256              50
Resource 2        2.6               70             512              60
Resource 3        1.2               40             512              30

Then the evaluation values for the three resources can be calculated using the formula:

Evaluation_resource1 = (5.4 + 2.0) / 10 = 0.74
Evaluation_resource2 = (4.68 + 3.2) / 10 = 0.788
Evaluation_resource3 = (4.32 + 5.6) / 10 = 0.992

From the results we know that Resource 3 is the best choice for the submitted job.
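The same calculation can be expressed as a short Python sketch. It simply restates the formula and the Table 6.1 values above, so the printed scores should reproduce the worked example (0.74, 0.788 and 0.992).

```python
W_CPU, W_RAM = 6, 4          # weights (total 10)
CPU_MIN, RAM_MIN = 1.0, 256  # minimum CPU speed (GHz) and RAM size (MB)

def evaluate(cpu_speed, cpu_load, ram_size, ram_usage):
    """Weighted evaluation of a resource, normalized by the total weight."""
    cpu_term = W_CPU * (1 - cpu_load) * cpu_speed / CPU_MIN
    ram_term = W_RAM * (1 - ram_usage) * ram_size / RAM_MIN
    return (cpu_term + ram_term) / (W_CPU + W_RAM)

resources = {                      # values from Table 6.1
    "Resource 1": (1.8, 0.50, 256, 0.50),
    "Resource 2": (2.6, 0.70, 512, 0.60),
    "Resource 3": (1.2, 0.40, 512, 0.30),
}

scores = {name: evaluate(*params) for name, params in resources.items()}
for name, score in scores.items():
    print(name, round(score, 3))               # 0.74, 0.788, 0.992
print("best:", max(scores, key=scores.get))    # Resource 3
```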
6.3.3.2 Job selection

The goal of job selection is to select a job from a job queue for execution. Four strategies that can be used to select a job are given below.
• First come first served: The scheduler selects jobs for execution in the order of their submission. If there is no resource available for the selected job, the scheduler waits until the job can be started, and the other jobs in the job queue have to wait. There are two main drawbacks with this type of job selection. It may waste resources when, for example, the selected job needs more resources to become available before it can start, which results in a long waiting time. And jobs with high priorities cannot be dispatched immediately if a job with a low priority needs more time to complete.

• Random selection: The next job to be scheduled is randomly selected from the job queue. Apart from the two drawbacks of the first-come-first-served strategy, job selection is not fair and a job submitted earlier may not be scheduled until much later.

• Priority-based selection: Jobs submitted to the scheduler have different priorities. The next job to be scheduled is the job with the highest priority in the job queue. A job's priority can be set when the job is submitted. One drawback of this strategy is that it is hard to set an optimal criterion for a job priority. A job with the highest priority may need more resources than are available, which can also result in a long waiting time and an inability to make good use of the available resources.

• Backfilling selection [6]: The backfilling strategy requires knowledge of the expected execution time of a job to be scheduled. If the next job in the job queue cannot be started due to a lack of available resources, backfilling tries to find another job in the queue that can use the idle resources; a simple sketch of this idea follows the list.
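The sketch below illustrates backfilling under simple, invented assumptions: each job declares the CPUs it needs and its expected run time, and if the job at the head of the queue cannot start, a later, smaller job is allowed to jump ahead and use the idle CPUs.

```python
def backfill_pick(queue, free_cpus):
    """Return the job to start now: the head of the queue if it fits,
    otherwise the first later job that can use the idle CPUs."""
    if not queue:
        return None
    head = queue[0]
    if head["cpus"] <= free_cpus:
        return queue.pop(0)
    for i, job in enumerate(queue[1:], start=1):
        if job["cpus"] <= free_cpus:
            return queue.pop(i)        # backfill with a smaller job
    return None                        # nothing fits; resources stay idle

# Jobs carry an expected run time, which real backfilling uses to avoid
# delaying the head job; this sketch omits that reservation check.
queue = [
    {"name": "big",   "cpus": 8, "runtime_h": 10},
    {"name": "small", "cpus": 2, "runtime_h": 1},
]
print(backfill_pick(queue, free_cpus=4))   # the 'small' job is backfilled
```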
6.3.4 Job execution
Once a job and a resource are selected, the next step is to submit the job to the resource for execution. Job execution may be as easy as running a single command or as complicated as running a series of scripts that may, or may not, include set-up or staging.
6.4 A REVIEW OF CONDOR, SGE, PBS AND LSF
In this section, we review Condor/Condor-G, SGE, PBS and LSF. These four systems have been widely used for Grid-based resource management and job scheduling.
6.4.1 Condor
Condor is a resource management and job scheduling system developed as a research project at the University of Wisconsin-Madison. In this section we study Condor based on its latest version at the time of writing, Condor 6.6.3.
6.4.1.1 Condor platforms
Condor 6.6.3 supports a variety of systems as follows:
• HP systems running HPUX 10.20
• Sun SPARC systems running Solaris 2.6/2.7/8/9
• SGI systems running IRIX 6.5 (not fully supported)
• Intel x86 systems running Red Hat Linux 7.1/7.2/7.3/8.0/9.0, and Windows NT 4.0, XP and 2003 Server (the Windows systems are not fully supported)
• ALPHA systems running Digital UNIX 4.0, Red Hat Linux 7.1/7.2/7.3 and Tru64 5.1 (not fully supported)
• PowerPC systems running Macintosh OS X and AIX 5.2L (not fully supported)
• Itanium systems running Red Hat Linux 7.1/7.2/7.3 (not fully supported)
• Windows systems (not fully supported)
UNIX machines and Windows machines running Condor can co-exist in the same Condor pool without any problems; for example, a job submitted from a Windows machine can run on a Windows machine or a UNIX machine, and a job submitted from a UNIX machine can run on a UNIX or a Windows machine. There is absolutely no need to run more than one Condor central manager, even if you have both UNIX and Windows machines. The Condor central manager itself can run on either a UNIX or a Windows machine.
6.4.1.2 The architecture of a Condor pool
Resources in Condor are normally organized in the form of Condor pools. A pool is an administered domain of hosts, not necessarily dedicated to a Condor environment. A Condor system can have multiple pools, each of which follows a flat machine organization.

As shown in Figure 6.9, a Condor pool normally has one Central Manager (master) host and an arbitrary number of Execution (worker) hosts. A Condor Execution host can be configured as a job Execution host, a job Submission host or both. The Central Manager host is used to manage resources and jobs in a Condor pool. Host machines in a Condor pool need not be dedicated to Condor.

Figure 6.9 The architecture of a Condor pool
If the Central Manager host in a Condor pool crashes, jobs that are already running will continue to run unaffected. Queued jobs will remain in the queue unharmed, but they cannot begin running until the Central Manager host is restarted.
6.4.1.3 Daemons in a Condor pool
A daemon is a program that runs in the background once started. To configure a Condor pool, the following Condor daemons need to be started. Figure 6.10 shows the interactions between the Condor daemons.
condor_master
The condor_master daemon runs on each host in a Condor pool to keep all the other daemons in the pool running. It spawns daemons such as condor_startd and condor_schedd, and periodically checks whether there are new binaries installed for any of these daemons. If so, the condor_master restarts the affected daemons. In addition, if any daemon crashes, the condor_master sends an email to the administrator of the Condor pool and restarts the daemon. The condor_master also supports various administrative commands, such as starting, stopping or reconfiguring daemons remotely.

Figure 6.10 Daemons in a Condor pool
condor_startd
The condor_startd daemon runs on each host in a Condor pool. It advertises information related to the node's resources to the condor_collector daemon running on the Central Manager host for matching pending resource requests. This daemon is also responsible for enforcing the policies that resource owners require, which determine under what conditions remote jobs will be started, suspended, resumed, vacated or killed. When the condor_startd is ready to execute a Condor job on an Execution host, it spawns the condor_starter.
condor_starter
The condor_starter daemon only runs on Execution hosts. It is the condor_starter that actually spawns a remote Condor job on a given host in a Condor pool. The condor_starter daemon sets up the execution environment and monitors the job once it is running. When a job completes, the condor_starter sends job status information back to the job Submission host and exits.
condor_schedd
The condor_schedd daemon runs on each host in a Condor pool and deals with resource requests. User jobs submitted to a node are stored in a local job queue managed by the condor_schedd daemon. Condor command-line tools such as condor_submit, condor_q and condor_rm interact with the condor_schedd daemon to allow users to submit a job into a job queue, and to view and manipulate the job queue. If the condor_schedd is down on a given machine, none of these commands will work.

The condor_schedd advertises the job requests, with their resource requirements, in its local job queue to the condor_collector daemon running on the Central Manager host. Once a job request from a condor_schedd on a Submission host has been matched with a given resource on an Execution host, the condor_schedd on the Submission host spawns a condor_shadow daemon to serve that particular job request.
condor_shadow
The condor_shadow daemon only runs on Submission hosts in a Condor pool and acts as the resource manager for user job submission requests. The condor_shadow daemon performs remote system calls, allowing jobs submitted to Condor to be checkpointed. Any system call performed on a remote Execution host is sent over the network back to the condor_shadow daemon on the Submission host, and the results are also sent back to the Submission host. In addition, the condor_shadow daemon is responsible for making decisions about a user job submission request, such as where checkpoint files should be stored or how certain files should be accessed.
condor_collector
The condor_collector daemon only runs on the Central Manager host. This daemon interacts with the condor_startd and condor_schedd daemons running on other hosts to collect all the information about the status of a Condor pool, such as job requests and available resources. The condor_status command can be used to query the condor_collector daemon for specific status information about a Condor pool.
condor_negotiator
The condor_negotiator daemon only runs on the Central Manager host and is responsible for matching a resource with a specific job request within a Condor pool. Periodically, the condor_negotiator daemon starts a negotiation cycle, in which it queries the condor_collector daemon for the current state of all the resources available in the pool. It interacts, in priority order, with each condor_schedd daemon running on a Submission host that has resource requests, and tries to match available resources with those requests. If a user with a higher priority has jobs that are waiting to run while another user with a lower priority has claimed resources, the condor_negotiator daemon can preempt a resource and match it with the job request of the higher-priority user.
condor_kbdd
The condor_kbdd daemon only runs on Execution hosts running Digital Unix or IRIX. On these platforms, the condor_startd daemon cannot determine console (keyboard or mouse) activity directly from the operating system. The condor_kbdd daemon connects to an X server and periodically checks whether there is any user activity. If so, the condor_kbdd daemon sends a command to the condor_startd daemon running on the same host. In this way, the condor_startd daemon knows that the machine owner is using the machine again, and it can perform whatever actions are necessary, given the policy it has been configured to enforce. Therefore, Condor can be used in a non-dedicated computing environment to scavenge idle computing resources.
condor_ckpt_server
The condor_ckpt_server daemon runs on a checkpoint server, which is an Execution host, to store and retrieve checkpointed files. If a checkpoint server in a Condor pool is down, Condor reverts to sending the checkpointed files for a given job back to the job Submission host.
6.4.1.4 Job life cycle in Condor
A job submitted to a Condor pool goes through the following steps, as shown in Figure 6.11.
1. Job submission: A job is submitted from a Submission host with the condor_submit command (Step 1).

2. Job request advertising: Once it receives a job request, the condor_schedd daemon on the Submission host advertises the request to the condor_collector daemon running on the Central Manager host (Step 2).

3. Resource advertising: Each condor_startd daemon running on an Execution host advertises the resources available on that host to the condor_collector daemon running on the Central Manager host (Step 3).

4. Resource matching: The condor_negotiator daemon running on the Central Manager host periodically queries the condor_collector daemon (Step 4) to match a resource to a user job request. It then informs the condor_schedd daemon running on the Submission host of the matched Execution host (Step 5).

5. Job execution: The condor_schedd daemon running on the job Submission host interacts with the condor_startd daemon running on the matched Execution host (Step 6), which spawns a condor_starter daemon (Step 7). The condor_schedd daemon on the Submission host spawns a condor_shadow daemon (Step 8) to interact with the condor_starter daemon for job execution (Step 9). The condor_starter daemon running on the matched Execution host receives the user job to execute (Step 10).

6. Return output: When the job is completed, the results are sent back to the Submission host through the interaction between the condor_shadow daemon running on the Submission host and the condor_starter daemon running on the matched Execution host (Step 11).

Figure 6.11 Job life cycle in Condor
6.4.1.5 Security management in Condor
Condor provides strong support for authentication, encryption, integrity assurance and authorization. A Condor system administrator enables most of these security features using configuration macros.

When Condor is installed, there are no authentication, encryption, integrity or authorization checks in the default configuration settings. This allows newer versions of Condor with security features to work or interact with previous versions without security support. An administrator must modify the configuration settings to enable the security features.

Access to a Condor pool is controlled through access levels; for example, viewing the status of a pool requires READ permission, while submitting a job requires WRITE permission.
Authentication
Authentication provides an assurance of identity. Through configuration macros, both a client and a daemon can specify whether authentication is required. For example, if the macro defined in the configuration file for a daemon is

SEC_WRITE_AUTHENTICATION = REQUIRED

then the daemon must authenticate the client for any communication that requires the WRITE access level. If the daemon's configuration contains

SEC_DEFAULT_AUTHENTICATION = REQUIRED

and does not contain any other security configuration for AUTHENTICATION, then this default configuration defines the daemon's needs for authentication over all access levels.

If no authentication methods are specified in the configuration, Condor uses a default authentication method such as Globus GSI authentication with X.509 certificates, Kerberos authentication or file system authentication, as discussed in Chapter 4.
Encryption
Encryption provides privacy support between two communicating parties. Through configuration macros, both a client and a daemon can specify whether encryption is required for further communication.

Integrity checks

An integrity check assures that the messages between communicating parties have not been tampered with. Any change, such as an addition, modification or deletion, can be detected. Through configuration macros, both a client and a daemon can specify whether an integrity check is required for further communication.
6.4.1.6 Job management in Condor
Condor manages jobs in the following aspects.

Job
A Condor job is a work unit submitted to a Condor pool for execution.

Job types
Jobs that can be managed by Condor are executable sequential or parallel codes, using, for example, PVM or MPI. A job submission may involve a job that runs over a long period, a job that needs to run many times or a job that needs many machines to run in parallel.

A job can be in one of the following states:
• Idle: There is no job activity.
• Busy: A job is busy running.
• Suspended: A job is currently suspended.
• Vacating: A job is currently checkpointing.
• Killing: A job is currently being killed.
• Benchmarking: The condor_startd is running benchmarks.
Job run-time environments
The Condor universe specifies a Condor execution environment. There are seven universes in Condor 6.6.3, as described below.
• The default universe is the Standard Universe (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise). It tells Condor that the job has been re-linked via condor_compile with the Condor libraries and therefore supports checkpointing and remote system calls.

• The Vanilla Universe is an execution environment for jobs that have not been linked with the Condor libraries; it is also used to submit shell scripts to Condor.

• The PVM Universe is used for parallel jobs written with PVM 3.4.

• The Globus Universe is intended to provide the standard Condor interface to users who wish to start Globus jobs from Condor. Each job queued in the job submission file is translated into the Globus Resource Specification Language (RSL) and subsequently submitted to Globus via the Globus GRAM protocol.

• The MPI Universe is used for MPI jobs written with the MPICH package.

• The Java Universe is used for programs written in Java.

• The Scheduler Universe allows a Condor job to be executed on the host where the job is submitted. The job does not need matchmaking for a host and it will never be preempted.
Job submission with a shared file system
If Vanilla, Java or MPI jobs are submitted without using the file transfer mechanism, Condor must use a shared file system to access input and output files. In this case, the job must be able to access the data files from any machine on which it could potentially run.

Job submission without a shared file system

Condor also works well without a shared file system. A user can use Condor's file transfer mechanism when submitting jobs. Condor transfers any files needed by a job from the host machine where the job was submitted into a temporary working directory on the machine where the job is to be executed. Condor executes the job and transfers the output back to the Submission machine.

The user specifies which files to transfer and at what point the output files should be copied back to the Submission host. This specification is done within the job's submission description file. The default behavior of the file transfer mechanism varies across the different Condor universes discussed above, and it differs between UNIX and Windows systems.
Job priority
Job priorities allow the assignment of a priority level to each submitted Condor job in order to control the order of execution. The priority of a Condor job can be changed.

Chirp I/O

The Chirp I/O facility in Condor provides sophisticated I/O functionality. It has two advantages over simple whole-file transfers:

• First, input files are accessed at run time rather than at submission time.
• Second, a part of a file can be transferred instead of transferring the whole file.
Job flow management

A Condor job can have many tasks, each of which is an executable code. Condor uses a Directed Acyclic Graph (DAG) to represent a set of tasks in a job submission, where the input, output or execution of one or more tasks is dependent on one or more other tasks. The tasks are nodes (vertices) in the graph, and the edges (arcs) identify the dependencies of the tasks. Condor finds the Execution hosts for the execution of the tasks involved, but it does not schedule the tasks in terms of dependencies.

The Directed Acyclic Graph Manager (DAGMan) [7] is a meta-scheduler for Condor jobs. DAGMan submits jobs to Condor in an order represented by a DAG and processes the results. An input file is used to describe the dependencies of the tasks involved in the DAG, and each task in the DAG also has its own description file.

Job monitoring

Once submitted, the status of a Condor job can be monitored using the condor_q command. In the case of a DAG, the progress of the DAG can also be monitored by looking at the log file(s), or by using condor_q -dag.
Job recovery: The rescue DAG
DAGMan can help with the resubmission of uncompleted portions of a DAG when one or more nodes fail. If any node in the DAG fails, the remainder of the DAG is continued until no more forward progress can be made based on the DAG's dependencies. When a node in the DAG fails, DAGMan automatically produces a file called a Rescue DAG, which is a DAG input file whose functionality is the same as that of the original DAG file. The Rescue DAG file additionally marks successfully completed nodes using the DONE option. If the DAG is re-submitted using this Rescue DAG input file, the nodes marked as completed will not be re-executed.
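The effect of the DONE markers can be illustrated with a small sketch: given the task dependencies and the set of nodes already marked DONE, only the remaining tasks are (re)submitted, in dependency order. The DAG below is invented for illustration and is not written in DAGMan's input format.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on (illustrative DAG, not DAGMan syntax).
dag = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

done = {"A", "B"}          # nodes marked DONE in the Rescue DAG

# Resubmit only the nodes that are not DONE, respecting the dependencies.
for task in TopologicalSorter(dag).static_order():
    if task in done:
        print(f"skip {task} (DONE)")
    else:
        print(f"submit {task}")
```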
Job checkpointing mechanism
Checkpointing is normally used for a Condor job that needs a long time to complete. It takes a snapshot of the current state of a job in such a way that the job can be restarted from that checkpointed state at a later time.

Checkpointing gives the Condor scheduler the freedom to reconsider scheduling decisions through preemptive-resume scheduling. If the scheduler decides to no longer allocate a host to a job, e.g. when the owner of that host starts using the host, it can checkpoint the job and preempt it without losing the work the job has already accomplished. The job can be resumed later when the scheduler allocates it a new host. Additionally, periodic checkpointing provides fault tolerance in Condor.
Computing On Demand
Computing On Demand (COD) extends Condor's high-throughput computing abilities to include a method for running short-term jobs on available resources immediately.

COD extends Condor's job management to include interactive, computation-intensive jobs, giving these jobs immediate access to the computing power they need over a relatively short period of time. COD provides computing power on demand, switching predefined resources from working on Condor jobs to working on the COD jobs. These COD jobs cannot use the batch scheduling functionality of Condor since they require interactive response times.
6.4.1.7 Resource management in Condor
Condor manages resources in a Condor pool in the following aspects.
Tracking resource usage
The condor_startd daemon on each host reports to the condor_collector daemon on the Central Manager host about the resources available on that host.
User priority
Condor hosts are allocated to users based upon a user's priority. A lower numerical value for user priority means a higher priority, so a user with priority 5 will get more resources than a user with priority 50.
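Assuming, for illustration, that machines are handed out in inverse proportion to the numerical priority value (the intuition behind this fair-share behaviour, not Condor's exact algorithm), the 5-versus-50 example works out as follows:

```python
def split_machines(total_machines, user_priorities):
    """Split machines between users in inverse proportion to their priority value.
    This is an illustrative model, not Condor's actual negotiation algorithm."""
    weights = {user: 1.0 / prio for user, prio in user_priorities.items()}
    total = sum(weights.values())
    return {user: round(total_machines * w / total) for user, w in weights.items()}

# A user with priority 5 gets roughly ten times the share of a user with priority 50.
print(split_machines(110, {"alice": 5, "bob": 50}))   # {'alice': 100, 'bob': 10}
```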
6.4.1.8 Job scheduling policies in Condor
Job scheduling in a Condor pool is not strictly based on a first-come-first-served selection policy. Rather, to keep large jobs from draining the pool of resources, Condor uses a unique up-down algorithm [8] that prioritizes jobs inversely to the number of cycles required to run the job. Condor supports the following policies in scheduling jobs:
• First come first served: This is the default scheduling policy.

• Preemptive scheduling: The preemptive policy lets a pending high-priority job take resources away from a running job of lower priority.

• Dedicated scheduling: Dedicated scheduling means that jobs scheduled to dedicated resources cannot be preempted.
6.4.1.9 Resource matching in Condor
Resource matching [9] is used to match an Execution host to a selected job or jobs. The condor_collector daemon running on the Central Manager host receives job request advertisements from the condor_schedd daemon running on a Submission host and resource availability advertisements from the condor_startd daemon running on an Execution host. A resource match is performed by the condor_negotiator daemon on the Central Manager host by selecting a resource based on the job requirements. Both job request advertisements and resource advertisements are described in the Condor Classified Advertisement (ClassAd) language, a mechanism for representing the characteristics and constraints of hosts and jobs in the Condor system.

A ClassAd is a set of uniquely named expressions. Each named expression is called an attribute. ClassAds use a semi-structured data model for resource descriptions; thus, no specific schema is required by the matchmaker, allowing it to work naturally in a heterogeneous environment.

The ClassAd language includes a query language, allowing advertising agents such as the condor_startd and condor_schedd daemons to specify the constraints on matching resource offers and user job requests. Figure 6.12 shows an example of a ClassAd job request advertisement and a ClassAd resource advertisement.
Job ClassAd:
  TargetType = "Machine"
  Requirements = ((other.Arch == "INTEL" &&
                   other.OpSys == "LINUX") && ...)

Host ClassAd:
  TargetType = "Job"
  Machine = "s140n209.brunel.ac.uk"
  Arch = "INTEL"
  OpSys = "LINUX"

Figure 6.12 Two ClassAd samples
These two ClassAds are used by the condor_negotiator daemon running on the Central Manager host to check whether the host can be matched with the job requirements.
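To show what matchmaking does with these two advertisements, here is a small Python sketch that mimics the check of the Requirements expression from Figure 6.12. It is not the ClassAd language itself, just the same constraint written as an ordinary predicate over the machine's attributes.

```python
# Attributes taken from the host ClassAd in Figure 6.12.
host_ad = {
    "Machine": "s140n209.brunel.ac.uk",
    "Arch": "INTEL",
    "OpSys": "LINUX",
}

# The job's Requirements expression rewritten as a Python predicate.
# (The original expression contains further clauses that are not shown here.)
def job_requirements(other):
    return other["Arch"] == "INTEL" and other["OpSys"] == "LINUX"

def match(job_req, machine_ad):
    """A match succeeds when the job's Requirements evaluate to True
    against the machine's advertised attributes."""
    return job_req(machine_ad)

print(match(job_requirements, host_ad))   # True -> the negotiator can pair them
```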
6.4.1.10 Condor support in Globus
Jobs can be submitted directly to a Condor pool from a Condor host, or via Globus (GT2 or earlier versions of Globus), as shown in Figure 6.13. The Globus host is configured with the Condor jobmanager provided by Globus. When using a Condor jobmanager, jobs are submitted to the Globus resource, e.g. using globus_job_run. However, instead of forking the jobs on the local machine, the jobs are re-submitted by Globus to Condor using the condor_submit tool.

Figure 6.13 Submitting jobs to a Condor pool via Condor or Globus

To use Condor-G, we do not need to install a Condor pool. Condor-G is only the job management part of Condor. Condor-G can be installed on just one machine within an organization, and access to remote Grid resources using a Globus interface can be done through it.

Figure 6.14 Submitting jobs to Globus via Condor-G
Submitting Globus jobs using Condor-G provides a much higher level of service than simply using the globus_job_run command provided by Globus.

• First, jobs submitted to Globus with Condor-G enter a local Condor queue that can be effectively managed by Condor.

• Secondly, jobs remain in the Condor queue until they are completed. Therefore, should a job crash while running remotely, Condor-G can re-submit it again without user intervention.

In a word, Condor-G provides a level of service guarantee that is not available with globus_job_run and other Globus commands. Note: Condor-G does not have a GUI (the "G" is for Grid). However, the following graphical tools can be used with both Condor and Condor-G:

• CondorView: Shows a graphical history of the resources in a pool.

• Condor UserLogViewer: Shows a graphical history of a large set of jobs submitted to Condor or Condor-G.
6.4.2 Sun Grid Engine
The SGE is a distributed resource management and scheduling system from Sun Microsystems that can be used to optimize the utilization of software and hardware resources in a UNIX-based computing environment. The SGE can be used to find a pool of idle resources and harness these resources; it can also be used for normal activities, such as managing and scheduling jobs onto the available resources. The latest version of SGE is Sun N1 Grid Engine (N1GE) version 6 (see Table 6.2). In this section, we focus on SGE 5.3 Standard Edition because it is freely downloadable.
6.4.2.1 The SGE architecture
Hosts (machines or nodes) in SGE are classified into five categories: master, submit, execution, administration and shadow master. Figure 6.15 shows the SGE architecture.
• Master host: A single host is selected to be the SGE master host. This host handles all requests from users, makes job-scheduling decisions and dispatches jobs to execution hosts.

• Submit host: Submit hosts are machines configured to submit, monitor and administer jobs, and to manage the entire cluster.

• Execution host: Execution hosts have permission to run SGE jobs.
Trang 28Table 6.2 A note of the differences between N1 Grid Engine and Sun Grid
• Grid Engine Management Model for 1-click deployment of execution hosts on
an arbitrary number of hosts (to be delivered in the second quarter of 2005) The basic software components underneath N1GE and SGE are identical In fact, the open-source project is the development platform for those components Proprietary Sun code only exists for the differentiators listed above (where applicable) Note that some of those differentiators use other Sun products or technologies, which are not open source themselves.
Figure 6.15 The architecture of the SGE
• Administration host: SGE administrators use administration hosts to make changes to the cluster's configuration, such as changing distributed resource management parameters, configuring new nodes or adding or changing users.

• Shadow master host: While there is only one master host, other machines in the cluster can be designated as shadow master hosts to provide greater availability. A shadow master host continually monitors the master host, and automatically and transparently assumes control in the event that the master host fails. Jobs already in the cluster are not affected by a master host failure.
As shown in Figure 6.16, to configure an SGE cluster, the followingdaemons need to be started
sge_qmaster – The Master daemon
Thesge_qmaster daemon is the centre of the cluster’s management
and scheduling activities; it maintains tables about hosts, queues,jobs, system load and user permissions It receives scheduling deci-sions fromsge_schedd daemon and requests actions from sge_execd
daemon on the appropriate execution host(s) Thesge_qmaster
dae-mon runs on the Master host
sge_schedd – The Scheduler daemon
The sge_schedd is a scheduling daemon that maintains an
up-to-date view of the cluster’s status with the help of sge_qmaster
daemon It makes the scheduling decision about which job(s) aredispatched to which queue(s) It then forwards these decisions to
Figure 6.16 Daemons in SGE