Distributed object-based Grid computing environments
Tomasz Haupt1 and Marlon E Pierce2
1Mississippi State University, Starkville, Mississippi, United States
2Indiana University, Bloomington, Indiana, United States
30.1 INTRODUCTION
Computational Grid technologies hold the promise of providing global-scale distributed computing for scientific applications. The goal of projects such as Globus [1], Legion [2], Condor [3], and others is to provide some portion of the infrastructure needed to support ubiquitous, geographically distributed computing [4, 5]. These metacomputing tools provide such services as high-throughput computing, single login to resources distributed across multiple organizations, and common Application Programming Interfaces (APIs) and protocols for information, job submission, and security services across multiple organizations. This collection of services forms the backbone of what is popularly known as the computational Grid, or just the Grid.
The service-oriented architecture of the Grid, with its complex client tools and programming interfaces, is difficult for application developers and end users to use. The perception of complexity arises because Grid services often address issues at levels that are too low for application developers (in terms of API and protocol stacks). Consequently, there are few Grid-enabled applications, and in general, the Grid adoption rate among end users is low.

Grid Computing – Making the Global Infrastructure a Reality. Edited by F Berman, A Hey and G Fox
2003 John Wiley & Sons, Ltd ISBN: 0-470-85319-0
By way of contrast, industry has undertaken enormous efforts to develop easy user interfaces that hide the complexity of underlying systems. Through Web portals the user has access to a wide variety of services such as weather forecasts, stock market quotes and online trading, calendars, e-mail, auctions, air travel reservations and ticket purchasing, and many others yet to be imagined. It is the simplicity of the user interface, which hides all implementation details from the user, that has contributed to the unprecedented success of the idea of a Web browser.
Grid computing environments (GCEs) such as computational Web portals are an extension of this idea. GCEs are used for aggregating, managing, and delivering Grid services to end users, hiding these complexities behind user-friendly interfaces. Computational Web portals take advantage of the technologies and standards developed for Internet computing, such as HTTP, HTML, XML, CGI, Java, CORBA [6, 7], and Enterprise JavaBeans (EJB) [8], using them to provide browser-based access to High Performance Computing (HPC) systems (both on the Grid and off). A further potential advantage of these environments is that they may be merged with more mainstream Internet technologies, such as information delivery, archiving, and collaboration.
Besides simply providing a good user interface, computing portals designed around distributed object technologies bring the concept of persistent state to the Grid. The Grid infrastructure is implemented as a bag of services. Each service performs a particular transaction following a client-server model, and each transaction is either stateless or supports only a conversational state. This model closely resembles the HTTP-based Web transaction model: the user makes a request by pointing the Web browser to a particular URL, and a Web server responds with the corresponding, possibly dynamically generated, HTML page. However, early Web developers found this model too restrictive. Nowadays, most Web servers use object- or component-oriented technologies, such as EJB or CORBA, for session management, multistep transaction processing, persistence, user profiles, enterprise-wide access to resources including databases, and incorporation of third-party services. There is a remarkable similarity between the current capabilities of Web servers (the Web technologies), augmented with application servers (the object and component technologies), and the functionality required of a Grid computing environment.

This paper provides an overview of Gateway and the Mississippi Computational Web Portal (MCWP). These projects are being developed separately at Indiana University and Mississippi State University, respectively, but they share a common design heritage. The key features of both MCWP and Gateway are the use of XML for describing portal metadata and the use of distributed object technologies in the control tier.
30.2 DEPLOYMENT AND USE OF COMPUTING PORTALS
To make the discussion in the introduction concrete, we describe our deployed portals below. These provide short case studies of the types of portal users and the services they require.
30.2.1 DMEFS: an application of the Mississippi Computational Web Portal
The Distributed Marine Environment Forecast System (DMEFS) [9] is a project of the Mississippi State team funded by the Office of Naval Research. DMEFS's goal is to provide an open framework for simulating littoral environments across many temporal and spatial scales in order to accelerate the evolution of timely and accurate forecasting. DMEFS is expected to substantially reduce the time needed to develop, prototype, test, validate, and transition simulation models to operation, as well as to support a genuine, synergistic collaboration among scientists, software engineers, and operational users. In other words, the resulting system must provide an environment for model development, including model coupling, model validation and data analysis, routine runs of a suite of forecasts, and decision support.
Such a system has several classes of users. Model developers are expected to be computer-savvy domain specialists. Operational users, who routinely run the simulations to produce daily forecasts, have only limited knowledge of how the simulations actually work, while decision-support users are typically interested only in accessing the end results. The first type of user benefits from services such as archiving and data pedigree, as well as support for testing and validation. The second type benefits from an environment that simplifies the complicated task of setting up and running the simulations, while the third type needs ways of obtaining and organizing results.
DMEFS is in its initial deployment phase at the Naval Oceanographic Office Major Shared Resource Center (MSRC). In the next phase, DMEFS will develop and integrate metadata-driven access to heterogeneous, distributed data sources (databases, data servers, scientific instruments). It will also provide support for data quality assessment, data assimilation, and model validation.
30.2.2 Gateway support for commodity codes
The Gateway computational Web portal is deployed at the Army Research Laboratory MSRC, with additional deployment approved for the Aeronautical Systems Center MSRC. Gateway's initial focus has been on simplifying access to commercial codes for novice HPC users. These users are assumed to understand the preprocessing and postprocessing tools of their codes on their desktop PC or workstation but not to be familiar with common HPC tasks such as queue script writing and job submission and management. Problems using HPC systems are often aggravated by the use of different queuing systems between centers and even within the same center, poor access for remote users caused by slow network speeds at peak hours, changing locations for executables, and licensing issues for commercial codes. Gateway attempts to hide or manage as many of these details as possible, while providing a browser front end that encapsulates sets of commands into relatively few portal actions. Currently, Gateway supports job creation, submission, monitoring, and archiving for ANSYS, ZNS, and Fluent, with additional support planned for CTH. Gateway interfaces to these codes are currently being tested by early users. Because Gateway must deal with applications whose source code is restricted, we wrap these codes in generic Java proxy objects that are described in XML. The interfaces for invoking these services are likewise expressed in XML, and we are in the process of converting our legacy service descriptions to the Web service standard Web Services Description Language (WSDL) [10].
Gateway also provides secure file transfer, job monitoring, and job management through a Web browser interface. These are currently integrated with the application interfaces but have proven popular on their own, so they will be provided as stand-alone services in the future. Future plans for Gateway include integration with the Interdisciplinary Computing Environment (ICE) [11], which provides visualization tools and support for light code coupling through a common data format. Gateway will support secure remote job creation and management for ICE-enabled codes, as well as secure, remote, sharable visualization services.
30.3 COMPUTING PORTAL SERVICES
One may build computational environments such as those described above out of a common set of core services. We list the following as the base set of abstract service definitions, which may be (but are not necessarily) implemented more or less directly with typical Grid technologies in the portal middle tier:
1. Security: Allow access only to authenticated users, give them access only to authorized areas, and keep all communications private.
2. Information resources: Inform the user about available codes and machines.
3. Queue script generation: On the basis of the user's choice of code and host, create a script to run the job for the appropriate queuing system.
4. Job submission: Through a proxy process, submit the job with the selected resources for the user.
5. Job monitoring: Inform the user of the status of his submitted jobs, and more generally provide events that allow loosely coupled applications to be staged.
6. File transfer and management: Allow the user to transfer files between his desktop computer and a remote system and to transfer files between remote systems.
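The six abstract services above can be expressed as plain Java interfaces that the middle tier binds to concrete implementations. The following is a minimal sketch, not the Gateway or MCWP API: every interface, method, and `JobStatus` value is hypothetical, and the in-memory class merely stands in for a proxy that would forward work to a real queuing system.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interfaces for the six abstract core services (names are illustrative only).
interface SecurityService {
    boolean authenticate(String user, char[] secret);
    boolean isAuthorized(String user, String area);
}

interface InformationService {
    List<String> availableCodes();
    List<String> hostsFor(String code);
}

interface QueueScriptService {
    String generateScript(String code, String host);
}

interface JobSubmissionService {
    String submit(String user, String host, String script);
}

interface JobMonitoringService {
    JobStatus status(String jobId);
}

interface FileTransferService {
    void copy(String sourceUri, String targetUri);
}

enum JobStatus { QUEUED, RUNNING, DONE, UNKNOWN }

// Toy stand-in for a submission/monitoring proxy; it records jobs instead of contacting a queue.
class InMemoryJobService implements JobSubmissionService, JobMonitoringService {
    private final AtomicInteger counter = new AtomicInteger();
    private final Map<String, JobStatus> jobs = new HashMap<>();

    @Override
    public String submit(String user, String host, String script) {
        String jobId = host + "-" + counter.incrementAndGet();
        jobs.put(jobId, JobStatus.QUEUED);
        return jobId;
    }

    @Override
    public JobStatus status(String jobId) {
        return jobs.getOrDefault(jobId, JobStatus.UNKNOWN);
    }

    // Simulates a status event arriving from the backend.
    void onStatusEvent(String jobId, JobStatus newStatus) {
        jobs.replace(jobId, newStatus);
    }
}
```

A real job submission proxy would replace the map with calls to the host's queuing system and would raise the status events of service 5 when the backend reports a change.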
Going beyond the initial core services above, both MCWP and Gateway have identified the following GCE-specific services, which they have implemented or are in the process of implementing:
1. Metadata-driven resource allocation and monitoring: While indispensable for acquiring adequate resources for an application, allocation of remote resources adds to the complexity of all user tasks. To simplify this chore, one requires a persistent and platform-independent way to express computational tasks, which can be achieved by introducing application metadata. This user service combines the standard authentication, information, resource allocation, and file transfer Grid services with GCE services: metadata discovery, retrieval, and processing; metadata-driven Resource Specification Language (RSL) (or batch script) generation; resource brokerage; access to remote file systems and data servers; logging; and persistence.
2. Task composition or workflow specification and management: This user service automates mundane user tasks such as data preprocessing and postprocessing, file transfers, format conversions, scheduling, and so on. It replaces the nonportable ‘spaghetti’ shell scripts currently in wide use. It requires task composition tools capable of describing the workflow in a platform-independent way, since some parts of the workflow may be performed on remote systems. The workflow is built hierarchically from reusable modules (applications), and it supports different mechanisms for triggering execution of modules: from static sequences with branches, to data flow, to event-driven systems. The workflow manager combines information, resource brokers, events, resource allocation and monitoring, file transfer, and logging services.
3. Metadata-driven, real-time data access service: Certain simulation types assimilate observational data or analyze experimental data in real time. These data are available from many different sources in a variety of formats. Built on top of the metadata, file transfer, and persistence services, this user service interacts closely with the resource allocation and monitoring or workflow management services.
4. User space, persistency, and pedigree service: This user service provides support for reuse and sharing of applications and their configurations, as well as for preserving the pedigree of all jobs submitted by the user. The pedigree information allows the user to reproduce any previous result on the one hand and to locate the products of any completed job on the other. It collects data generated by other services, in particular, by the resource allocation service and the workflow manager.
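The static-sequence case of workflow management reduces to ordering reusable modules so that each runs after the modules it depends on. A minimal sketch, assuming a workflow is a directed acyclic graph of named modules, uses Kahn's topological sort; the `Workflow` class and its methods are hypothetical, and data-flow or event-driven triggering would need more than a precomputed order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical static workflow: modules plus "runs after" dependencies, ordered with Kahn's algorithm.
class Workflow {
    // module name -> modules it depends on; LinkedHashMap keeps the order deterministic
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    void addModule(String name, String... prerequisites) {
        dependsOn.put(name, List.of(prerequisites));
        for (String p : prerequisites) {
            dependsOn.putIfAbsent(p, List.of());   // register prerequisites not yet declared
        }
    }

    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>();       // unmet prerequisites per module
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            dependents.putIfAbsent(e.getKey(), new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            for (String p : e.getValue()) {
                dependents.get(p).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((module, count) -> {
            if (count == 0) ready.add(module);
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.poll();
            order.add(module);
            for (String d : dependents.get(module)) {
                if (remaining.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);   // all prerequisites of d have now been scheduled
                }
            }
        }
        if (order.size() != dependsOn.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }
}
```

For a preprocess, simulate, convert, plot chain with an archive step that also follows simulate, the order is preprocess, simulate, convert, archive, plot; a cyclic definition is rejected rather than silently truncated.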
30.4 GRID PORTAL ARCHITECTURE
A computational Web portal is implemented as a multitier system composed of clients running on users' desktops or laptops, portal servers providing user-level services (i.e., portal middleware), and backend servers providing access to the computing resources.
30.4.1 The user interface
The user interacts with the portal through a Web browser, a client application, or both. The central idea of both the Gateway and the MCWP user interfaces is to allow users to organize their work into problem contexts, which are then subdivided into session contexts in Gateway terminology, or into projects and tasks in MCWP terms. Problems (or projects) are identified by a descriptive name handle provided by the user, while sessions are automatically created and time-stamped to give them unique names. Within a particular session (or task), the user chooses applications to run and selects computing resources to use. This interface organization is mapped to components in the portal middleware (user space, persistency, and pedigree services) described below. In both cases, the Web browser–based user interface is developed using JavaServer Pages (JSP), which allow us to dynamically generate Web content and to interface easily with our Java-based middleware.
The Gateway user interface provides three tracks: code selection, problem archive, and administration. The code selection track allows the user to start a new problem, make an initial request for resources, and submit the job request to the selected host's queuing system. The problem archive allows the user to revisit and edit old problem sessions so that he/she can submit his/her job to a different machine, use a different input file, and so forth. Changes to a particular session are stored under a newly generated session name. The administration track allows privileged users to add applications and host computers to the portal, modify the properties of these entities, and verify their installation. This information is stored in an XML data record, described below.
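The Gateway naming scheme for problems and sessions, together with the rule that edits to an archived session produce a new session, can be sketched as follows. The `ProblemContext` class, the timestamp format, and the `-2` suffix used to disambiguate sessions created within the same second are all assumptions for illustration; an injected `java.time.Clock` keeps the example deterministic.

```java
import java.time.Clock;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Gateway-style context: a user-named problem holding automatically named sessions.
class ProblemContext {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss").withZone(ZoneOffset.UTC);

    private final String name;   // descriptive handle chosen by the user
    private final Clock clock;   // injected so tests can fix the time
    private final Map<String, Map<String, String>> sessions = new LinkedHashMap<>();

    ProblemContext(String name, Clock clock) {
        this.name = name;
        this.clock = clock;
    }

    // Creates a session named after the problem and the current UTC time.
    String newSession(Map<String, String> settings) {
        String base = name + "-" + STAMP.format(clock.instant());
        String sessionName = base;
        for (int i = 2; sessions.containsKey(sessionName); i++) {
            sessionName = base + "-" + i;   // disambiguates sessions created within the same second
        }
        sessions.put(sessionName, new LinkedHashMap<>(settings));
        return sessionName;
    }

    // Editing never overwrites an archived session: the modified copy is stored under a new name.
    String editSession(String oldName, String key, String value) {
        Map<String, String> copy = new LinkedHashMap<>(sessions.get(oldName));
        copy.put(key, value);
        return newSession(copy);
    }

    Map<String, String> settings(String sessionName) {
        return Collections.unmodifiableMap(sessions.get(sessionName));
    }

    List<String> sessionNames() {
        return new ArrayList<>(sessions.keySet());
    }
}
```

Keeping the original session untouched is what lets the archive track double as a simple pedigree record: every variation a user tried remains addressable by name.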
The MCWP user interface provides five distinct views of the system, depending on the user role: developer, analyst, operator, customer, and administrator. The developer view combines the selection and archive tracks. The analyst view provides tools for data selection and visualization. The operator view allows tasks for routine runs to be scheduled in advance (similar to creating a cron table). The customer view allows access to routinely generated and postprocessed results (plots, maps, and so forth). Finally, the administrator view allows configuration and control of all operations performed by the portal.
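One way to enforce the five MCWP views is a fixed table from role to permitted portal functions. This is a hypothetical sketch: the `Role` and `Feature` names are invented, and giving the administrator every feature is our reading of "configuration and control of all operations", not a documented policy.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical roles and portal functions, mirroring the five views described above.
enum Role { DEVELOPER, ANALYST, OPERATOR, CUSTOMER, ADMINISTRATOR }

enum Feature {
    CODE_SELECTION, PROBLEM_ARCHIVE, DATA_SELECTION, VISUALIZATION,
    ADVANCE_SCHEDULING, RESULTS, CONFIGURATION
}

class RoleViews {
    private static final Map<Role, Set<Feature>> VIEWS = new EnumMap<>(Role.class);

    static {
        VIEWS.put(Role.DEVELOPER, EnumSet.of(Feature.CODE_SELECTION, Feature.PROBLEM_ARCHIVE));
        VIEWS.put(Role.ANALYST, EnumSet.of(Feature.DATA_SELECTION, Feature.VISUALIZATION));
        VIEWS.put(Role.OPERATOR, EnumSet.of(Feature.ADVANCE_SCHEDULING));
        VIEWS.put(Role.CUSTOMER, EnumSet.of(Feature.RESULTS));
        VIEWS.put(Role.ADMINISTRATOR, EnumSet.allOf(Feature.class));   // assumption: full control
    }

    // True if a user in the given role may reach the given portal function.
    static boolean allows(Role role, Feature feature) {
        return VIEWS.get(role).contains(feature);
    }
}
```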
30.4.2 Component-based middleware
The portal middleware naturally splits into two layers: the actual implementation of the user services, and the presentation layer responsible for providing mechanisms for user interaction with the services. The presentation layer accepts user requests and returns the service responses. Depending on the implementation strategy for the client, the services' responses are either displayed directly in the Web browser or consumed by the client-side application.
A key feature of both Gateway and MCWP is that they provide a container-based middle tier that holds and manages the (distributed) proxy wrappers for basic services like those listed above. This allows us to build user interfaces to services without worrying about how those services are implemented. For example, we may implement the portal using standard service implementations from the Globus toolkit, we may implement some core services ourselves for stand-alone resources, or we may implement the portal as a mixture of these different service implementation styles.
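The container-based middle tier described above can be sketched as a small registry that maps a service interface to whichever implementation a portal installation binds, so presentation code depends only on the interface. Everything here is hypothetical: `FileListing`, the two implementations (which return labeled strings rather than contacting Globus or a local host), and `ServiceContainer` are illustrations, not WebFlow or EJB APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical service interface seen by the presentation layer.
interface FileListing {
    String list(String host, String directory);
}

// Placeholder implementations; real ones would call Grid or local system services.
class GridFileListing implements FileListing {
    @Override
    public String list(String host, String directory) {
        return "[grid] " + host + ":" + directory;
    }
}

class LocalFileListing implements FileListing {
    @Override
    public String list(String host, String directory) {
        return "[local] " + host + ":" + directory;
    }
}

// Minimal container: binds interfaces to factories and manages one instance per interface.
class ServiceContainer {
    private final Map<Class<?>, Supplier<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> instances = new HashMap<>();

    <T> void bind(Class<T> type, Supplier<? extends T> factory) {
        bindings.put(type, factory);
        instances.remove(type);   // rebinding discards any instance created by the old factory
    }

    <T> T lookup(Class<T> type) {
        Object instance = instances.get(type);
        if (instance == null) {
            Supplier<?> factory = bindings.get(type);
            if (factory == null) {
                throw new IllegalArgumentException("no binding for " + type.getName());
            }
            instance = factory.get();   // created lazily on first lookup, then reused
            instances.put(type, instance);
        }
        return type.cast(instance);
    }
}
```

Swapping `LocalFileListing` for `GridFileListing` is a one-line rebinding; the code that calls `lookup(FileListing.class)` does not change, which is the property the mixed implementation styles rely on.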
The Gateway middle tier consists of two basic sections: a Web server running a servlet engine and a distributed CORBA-based middle tier (WebFlow). This is illustrated in Figure 30.1. The Web server typically runs a single Java Virtual Machine (JVM) on a single server host that contains local JavaBean components. These components may implement specific local services, or they may act as proxies for distributed WebFlow components running in different JVMs on a nest of host computers. WebFlow servers consist of a top-level master server and any number of child servers. The master server acts as a gatekeeper and manages the life cycle of the children. These child servers can in turn provide access to remote backend services such as HPC systems running Portable Batch System (PBS) or Load Sharing Facility (LSF) queuing systems, a Condor flock, a Globus grid, and data storage devices. By running different WebFlow child servers on different hosts, we may easily span organizational barriers in a lightweight fashion. For more information on the WebFlow middleware, see References [12, 13, 14]. For a general overview of the role of commodity technologies in computational Grids, see Reference [15].
The MCWP application server is implemented using EJB. The user space is a hierarchy of entities: users, projects, tasks, and applications. The abstract application metadata tree is implemented as entity beans as well, with the host-independent information in one database table and the host-dependent information in another. Finally, there are two entities related to job status: a job entity (with the unique jobId as the key in the job table) and a host entity that describes the target machine's properties (metadata). It is important to note that all metadata beans (i.e., applications, hosts, and data sets) are implemented using a hybrid of EJB and XML technologies; that is, a database is used to store many short XML files.

Figure 30.1 The Gateway computational portal is implemented in a multitiered architecture. (Diagram labels: Web browser and client applications; Web server and servlet engine in a JVM, hosting JavaBean local services and JavaBean service proxies; SECIOP; WebFlow master server and WebFlow child servers; data storage; Condor flock; Globus grid.)
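The hybrid EJB/XML approach, in which database rows hold short XML documents, can be illustrated without an application server. In this sketch a `HashMap` stands in for the database table (a deployment would use JDBC or entity beans), and the `<host>` element layout is a hypothetical example of host metadata; parsing uses the standard JAXP DOM API.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Stand-in for a database table whose rows are short XML documents keyed by name or jobId.
class XmlMetadataTable {
    private final Map<String, String> rows = new HashMap<>();

    void store(String key, String xml) {
        rows.put(key, xml);
    }

    // Loads one row and parses it into a DOM element, as a metadata bean might when activated.
    Element load(String key) {
        String xml = rows.get(key);
        if (xml == null) {
            throw new IllegalArgumentException("no metadata row for " + key);
        }
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("malformed metadata for " + key, e);
        }
    }
}
```

Storing each record as a small XML document keeps the table schema fixed while the metadata itself can grow new elements without database migrations.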
The MCWP services are implemented as EJB session beans, and their relationships are depicted in Figure 30.2. The bottom-layer services are clients of the low-level Grid services, the upper-layer services are user-level services, and the middle-layer services provide a mapping between the other two. The task composition service provides a high-level interface for metadata-driven resource allocation and monitoring. The knowledge about the configuration of each component of the computational task is encompassed in the application metadata and presented to the user in the form of a GUI. The user does not need to know anything about the low-level Globus interfaces, the syntax of RSL, or the batch schedulers on the remote systems. In addition, the user is given either the default values of parameters for all constituent applications that comprise the task or the values of parameters used in any of the previous runs. The application metadata are accessible through the metadata service. A configured task, that is, the application parameters for all components and the relationships between the components (e.g., the workflow specification), is transparently saved in the user space (or application context) for later reuse. Optionally, the user may choose to publish the configured task for use by others through the task repository service.

Figure 30.2 MCWP services are implemented as EJB session beans. (Diagram labels, within an EJB container: metadata, task composition, scripting tools, user space, task repository, advance scheduling; security, logging, cron, RSL and script generator, job table, workflow manager, resource broker; status, resource allocation, file transfer, access to remote file systems, access to data servers and databases.)
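The parameter defaults described above, combined with the RSL generation that the user never has to see, might look like the sketch below. The precedence order (explicit user choice, then previous run, then application default) is our assumption, as are the class and method names; the output follows the GRAM RSL form `&(attribute=value)...`, and the quoting rule is deliberately conservative and assumes values contain no double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical metadata-driven RSL builder (names and precedence rule are assumptions).
class RslBuilder {
    // Later sources override earlier ones: application default < previous run < explicit user choice.
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, String> previousRun,
                                       Map<String, String> userChoices) {
        Map<String, String> resolved = new LinkedHashMap<>(defaults);
        resolved.putAll(previousRun);
        resolved.putAll(userChoices);
        return resolved;
    }

    // Renders GRAM-style RSL: &(name=value)(name=value)...
    // Values outside a conservative character set are double-quoted; values are assumed
    // not to contain double quotes themselves.
    static String toRsl(Map<String, String> attributes) {
        StringBuilder rsl = new StringBuilder("&");
        attributes.forEach((name, value) -> {
            String rendered = value.matches("[A-Za-z0-9_./:-]+") ? value : "\"" + value + "\"";
            rsl.append('(').append(name).append('=').append(rendered).append(')');
        });
        return rsl.toString();
    }
}
```

For example, defaults of `executable=/usr/local/bin/solver`, `count=1`, and `queue=batch`, a previous run with `count=8`, and a user choice of `queue=debug` yield `&(executable=/usr/local/bin/solver)(count=8)(queue=debug)`.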
The scripting tool is similar to the task composition service. If several steps are to be executed in batch mode on the same machine, it is much more efficient to generate a shell script that orchestrates these steps in a single run than to submit several batch jobs under control of the workflow manager. The advance scheduling service allows an operator to schedule a selected application to run routinely at specified times, say, every day at 2 p.m. The services in the middle and bottom layers have self-describing names. The job table is an EJB entity that keeps track of all jobs submitted through MCWP and is used for reconnection, monitoring, and preserving the task pedigree. The cron service reproduces the functionality of the familiar Unix service for running commands at predefined times, and it is closely related to the advance scheduling user service. The security service is responsible for delegation of the user credentials. For Globus-based implementations, it is a client of the myProxy server [16], which stores the user's temporary certificates. For Kerberos-based systems, it serializes the user tickets.
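The core calculation behind the advance scheduling and cron services, finding when a routine run such as "every day at 2 p.m." fires next, is sketched below with `java.time`. The `DailySchedule` class is hypothetical and covers only the daily case; a full cron replacement would also handle weekly or custom calendars and would hand the result to a timer or to the job submission service.

```java
import java.time.LocalTime;
import java.time.ZonedDateTime;

// Hypothetical daily schedule for a routine run (the "every day at 2 p.m." case).
class DailySchedule {
    private final LocalTime runAt;

    DailySchedule(LocalTime runAt) {
        this.runAt = runAt;
    }

    // Returns the first scheduled time strictly after 'now', in the same time zone as 'now'.
    ZonedDateTime nextRun(ZonedDateTime now) {
        ZonedDateTime todayAtRunTime = now.with(runAt);
        return todayAtRunTime.isAfter(now) ? todayAtRunTime : todayAtRunTime.plusDays(1);
    }
}
```

Asked at 9:30 a.m., a 2 p.m. schedule returns 2 p.m. the same day; asked at 2 p.m. or later, it returns 2 p.m. the next day.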
For both MCWP and Gateway, it is natural to implement clients as stand-alone Java applications built as a collection of (Enterprise) JavaBean clients. However, this approach has several drawbacks when applied to the ‘World Wide Grid’. CORBA and EJB technologies are well suited for distributed, enterprise-wide applications but not for the Internet. First, going beyond enterprise boundaries, there is always the problem of client software distribution, and in particular of upgrading the service interfaces. Second, the protocols employed for client-server communication are not associated with standard ports and are often filtered by firewalls, making it impossible for external users to access the services. Finally, in the case of EJB, currently available containers do not implement robust security mechanisms for extra-enterprise method invocation.
An alternative solution is to restrict the scope of the client application to the application server and to provide access to it through the World Wide Web, as shown in Figure 30.1 for Gateway and Figure 30.3 for MCWP. Here, the clients are implemented as server-side JavaBeans, and these beans are accessed by JSP to dynamically generate the user interface as HTML forms. This approach solves the problem of client software distribution as well as the problem of secure access to the Grid resources. With Web browser–server communications secured using the HTTPS protocol, and with the myProxy server storing the user's Globus (proxy) certificate, the MCWP services are capable of securely allocating resources, transferring files, and accessing data using the Globus grid services. Finally, the server-side JavaBeans acting as EJB clients can easily be converted into Web services (Simple Object Access Protocol/Web Services Description Language (SOAP/WSDL)) [17]. Therefore, the MCWP client can also be implemented as a stand-alone application, deployed using Java Web Start technology, acting as a WSDL client as opposed to an EJB client.

Figure 30.3 EJB clients may be implemented as JavaBeans and accessed through JavaServer Pages. (Diagram labels: Web browser: user graphical interface; Web server integrated with EJB container, containing Java server pages and JavaBeans (EJB clients) and an EJB container with the metadata bean, task composition, scripting tools, user space, task repository, advance scheduling, resource broker, workflow manager, job table, RSL and script generator, cron, logging, security, status, resource allocation, file transfer, access to remote file systems, and access to data servers and databases; the Grid; Kerberized universe.)
30.4.3 Resource tier
Computing resources constitute the final tier of the portal. These again are accessed through standard protocols, such as the Globus protocols for Grid-enabled computing resources, as well as protocols such as Java Database Connectivity (JDBC) for database connections.
There is always the possibility that a computing resource does not provide a Grid service, so the transport mechanism for delivering commands from the middle tier to the backend must be pluggable. We implement this in the job submission proxy service in the middle tier, which constructs and invokes commands on the backend either through secure remote shell invocations or through something such as a globusrun command. The actual command to use is configured for each portal installation.
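A pluggable transport of this kind can be sketched as a strategy interface chosen from the installation's configuration. This is an illustration under assumptions: the class names and the setting values `ssh` and `globusrun` are hypothetical, the remote `qsub` call assumes a PBS host, and the `globusrun -r <contact> <rsl>` form reflects our understanding of the Globus Toolkit 2 client. The sketch only builds argument lists; a real proxy would launch them (for example with `ProcessBuilder`) and capture the output.

```java
import java.util.List;

// Hypothetical transport strategy used by the job submission proxy.
interface Transport {
    List<String> command(String host, String jobScriptPath);
}

// Secure remote shell: run the host's batch submission command remotely (assumes PBS qsub).
class SecureShellTransport implements Transport {
    @Override
    public List<String> command(String host, String jobScriptPath) {
        return List.of("ssh", host, "qsub", jobScriptPath);
    }
}

// Globus client: pass a minimal RSL string to globusrun for the given resource contact.
class GlobusrunTransport implements Transport {
    @Override
    public List<String> command(String host, String jobScriptPath) {
        return List.of("globusrun", "-r", host, "&(executable=" + jobScriptPath + ")");
    }
}

// Chooses the transport from a per-installation setting (the setting name is hypothetical).
class TransportConfig {
    static Transport fromSetting(String setting) {
        switch (setting) {
            case "ssh":
                return new SecureShellTransport();
            case "globusrun":
                return new GlobusrunTransport();
            default:
                throw new IllegalArgumentException("unknown transport: " + setting);
        }
    }
}
```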
30.5 APPLICATION DESCRIPTORS
One may view the middle-tier core services as generic building blocks for assembling portals. A specific portal, on the other hand, includes a collection of metadata about the services it provides. We refer to this metadata as descriptors, which we define in XML. Both MCWP and Gateway define these metadata as a container hierarchy of XML schemas: applications contain host computing resources, which contain specific entities like queuing systems. Descriptors are divided into two types: abstract and instance descriptors. Abstract application descriptors contain the ‘static’ information about how to use a particular application. Instance descriptors are used to collect information about a particular run by a particular user, which can be reused later by an archiving service and for pedigree. XML descriptors are used to describe data records that should remain long-lived or static.

As an example, an application descriptor contains the information needed to run a particular code: the number of input and output files that must be specified on the command line, the method that the application uses for input and output, the machines that the code is installed on, and so on. Machine, or host, descriptors describe specific computing resources, including the queuing systems used, the locations of application executables, and the location of the host's workspace. Taken together, these descriptors provide a general framework for building requests for specific resources that can be used to generate batch queue scripts. Applications may be further chained together into a workflow.
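To show how abstract and instance descriptors combine into a batch queue script, the sketch below reads a hypothetical application descriptor (one `<application>` element containing `<host>` elements with `queueType`, `queue`, `executable`, and `workspace` attributes) and merges it with per-run values to emit a PBS script. The XML layout and class name are invented for illustration and are not the Gateway or MCWP schemas; the `#PBS -N`, `-q`, and `-l` directives are standard PBS.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical generator: abstract descriptor (XML) + instance values -> PBS batch script.
class PbsScriptGenerator {
    static String generate(String descriptorXml, String hostName,
                           int nodes, String walltime, String inputFile) {
        Element app = parse(descriptorXml);
        Element host = findHost(app, hostName);
        if (!"PBS".equals(host.getAttribute("queueType"))) {
            throw new IllegalArgumentException(hostName + " is not described as a PBS host");
        }
        return String.join("\n",
                "#!/bin/sh",
                "#PBS -N " + app.getAttribute("name"),
                "#PBS -q " + host.getAttribute("queue"),
                "#PBS -l nodes=" + nodes,
                "#PBS -l walltime=" + walltime,
                "cd " + host.getAttribute("workspace"),
                host.getAttribute("executable") + " " + inputFile) + "\n";
    }

    // Finds the <host> entry for the requested machine; the code may not be installed there.
    private static Element findHost(Element app, String hostName) {
        NodeList hosts = app.getElementsByTagName("host");
        for (int i = 0; i < hosts.getLength(); i++) {
            Element h = (Element) hosts.item(i);
            if (hostName.equals(h.getAttribute("name"))) {
                return h;
            }
        }
        throw new IllegalArgumentException("application is not installed on " + hostName);
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("malformed descriptor", e);
        }
    }
}
```

For a host `hpc1` with queue `standard`, a run on 4 nodes with a one-hour wall time produces a script whose directives read `#PBS -q standard`, `#PBS -l nodes=4`, and `#PBS -l walltime=01:00:00`; supporting LSF would mean a second generator keyed on `queueType`.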
The GCE Application Metadata Working Group has been proposed as a forum for different groups to exchange ideas and examples of using application metadata, which may potentially lead to a standardization of some of the central concepts.