Data-intensive Grids for
high-energy physics
Julian J Bunn and Harvey B Newman
California Institute of Technology, Pasadena, California, United States
39.1 INTRODUCTION: SCIENTIFIC EXPLORATION AT THE HIGH-ENERGY FRONTIER
The major high-energy physics (HEP) experiments of the next twenty years will break new ground in our understanding of the fundamental interactions, structures and symmetries that govern the nature of matter and space-time. Among the principal goals are to find the mechanism responsible for mass in the universe, and the ‘Higgs’ particles associated with mass generation, as well as the fundamental mechanism that led to the predominance of matter over antimatter in the observable cosmos.
The largest collaborations today, such as CMS [1] and ATLAS [2], which are building experiments for CERN’s Large Hadron Collider (LHC) program [3], each encompass 2000 physicists from 150 institutions in more than 30 countries. Each of these collaborations includes 300 to 400 physicists in the US, from more than 30 universities, as well as the major US HEP laboratories. The current generation of operational experiments at SLAC (BaBar [4]), Fermilab (D0 [5] and CDF [6]), as well as the experiments at the Relativistic
Grid Computing – Making the Global Infrastructure a Reality. Edited by F Berman, A Hey and G Fox
2003 John Wiley & Sons, Ltd ISBN: 0-470-85319-0
Heavy Ion Collider (RHIC) program at Brookhaven National Laboratory (BNL) [7], face similar challenges. BaBar in particular has already accumulated datasets approaching a petabyte (1 PB = 10¹⁵ Bytes).
Collaborations on this global scale would not have been attempted if the physicists could not plan on excellent networks: to interconnect the physics groups throughout the life cycle of the experiment, and to make possible the construction of Data Grids capable of providing access, processing and analysis of massive datasets. These datasets will increase in size from petabytes to exabytes (1 EB = 10¹⁸ Bytes) within the next decade.

An impression of the complexity of the LHC data can be gained from Figure 39.1, which shows simulated particle trajectories in the inner ‘tracking’ detectors of CMS. The particles are produced in proton–proton collisions that result from the crossing of two proton bunches. A rare proton–proton interaction (approximately 1 in 10¹³) resulting in the production of a Higgs particle that decays into the distinctive signature of four muons is buried in 30 other ‘background’ interactions produced in the same crossing, as shown in the upper half of the figure. The CMS software has to filter out the background interactions by isolating the point of origin of the high momentum tracks in the interaction containing the Higgs. This filtering produces the clean configuration shown in the bottom half of the figure. At this point, the (invariant) mass of the Higgs can be measured from the shapes of the four muon tracks (labelled), which are its decay products.
Figure 39.1 A simulated decay of the Higgs boson into four muons: (a) the high momentum charged particles in the Higgs event and (b) how the event would actually appear in the detector, submerged beneath many other ‘background’ interactions.
39.2 HEP CHALLENGES: AT THE FRONTIERS OF INFORMATION TECHNOLOGY
Realizing the scientific wealth of these experiments presents new problems in data access, processing and distribution, and collaboration across national and international networks, on a scale unprecedented in the history of science. The information technology challenges include the following:
• Providing rapid access to data subsets drawn from massive data stores, rising from petabytes in 2002 to ∼100 petabytes by 2007, and exabytes (10¹⁸ Bytes) by approximately 2012 to 2015.
• Providing secure, efficient, and transparent managed access to heterogeneous worldwide-distributed computing and data-handling resources, across an ensemble of networks of varying capability and reliability.
• Tracking the state and usage patterns of computing and data resources in order to make possible rapid turnaround as well as efficient utilization of global resources.
• Matching resource usage to policies set by the management of the experiments’ collaborations over the long term; ensuring that the application of the decisions made to support resource usage among multiple collaborations that share common (network and other) resources is internally consistent.
• Providing the collaborative infrastructure that will make it possible for physicists in all world regions to contribute effectively to the analysis and the physics results, particularly while they are at their home institutions.
• Building regional, national, continental, and transoceanic networks, with bandwidths rising from the gigabit per second to the terabit per second range over the next decade.¹

All these challenges need to be met so as to provide the first integrated, managed, distributed system infrastructure that can serve ‘virtual organizations’ on the global scale.
39.3 MEETING THE CHALLENGES: DATA GRIDS AS MANAGED DISTRIBUTED SYSTEMS FOR
GLOBAL VIRTUAL ORGANIZATIONS
The LHC experiments have thus adopted the ‘Data Grid Hierarchy’ model (developed
by the MONARC project) shown schematically in Figure 39.2. This five-tiered model shows data at the experiment being stored at the rate of 100 to 1500 MB s⁻¹ throughout the year, resulting in many petabytes per year of stored and processed binary data, which are accessed and processed repeatedly by the worldwide collaborations searching for new physics processes. Following initial processing and storage at the ‘Tier0’ facility at the CERN laboratory site, the processed data is distributed over high-speed networks
¹ Continuing the trend of the last decade, where the affordable bandwidth increased by a factor of order 1000.
[Figure 39.2 annotations: CERN/outside resource ratio ∼1:2; Tier0/(ΣTier1)/(ΣTier2) ∼1:1:1; data rates range from ∼1 PByte/sec at the detector down to ∼100 MBytes/sec network links; physics data cache; physicists work on analysis ‘channels’, with each institute having ∼10 physicists working on one or more channels.]
Figure 39.2 The LHC data Grid hierarchy model This was first proposed by the MONARC collaboration in 1999.
to ∼10 to 20 national ‘Tier1’ centers in the United States, leading European countries, Japan, and elsewhere.³ The data is there further processed and analyzed, and then stored at approximately 60 ‘Tier2’ regional centers, each serving a small to medium-sized country, or one region of a larger country (as in the US, UK and Italy). Data subsets are accessed and further analyzed by physics groups using one of hundreds of ‘Tier3’ workgroup servers and/or thousands of ‘Tier4’ desktops.⁴
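As a back-of-the-envelope illustration, the tier counts and data rates quoted above can be put into a short script. The class names, the three-Tier2-per-Tier1 fan-out, and the use of the 100 MB s⁻¹ floor are illustrative assumptions for this sketch, not part of the MONARC model itself:

```python
# Toy model of the five-tiered Data Grid hierarchy (sketch only).
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int
    children: list = field(default_factory=list)

def build_hierarchy(n_tier1=20, tier2_per_tier1=3):
    """Build a Tier0 -> Tier1 -> Tier2 tree with ~60 Tier2 centers."""
    tier0 = Site("CERN-Tier0", 0)
    for i in range(n_tier1):
        t1 = Site(f"Tier1-{i}", 1)
        tier0.children.append(t1)
        for j in range(tier2_per_tier1):
            t1.children.append(Site(f"Tier2-{i}-{j}", 2))
    return tier0

def count_tier(site, tier):
    """Count the sites at a given tier by walking the tree."""
    total = 1 if site.tier == tier else 0
    return total + sum(count_tier(c, tier) for c in site.children)

tier0 = build_hierarchy()
print(count_tier(tier0, 2))   # → 60

# Even at the 100 MB/s floor, a year of running accumulates petabytes:
SECONDS_PER_YEAR = 3.15e7
petabytes_per_year = 100e6 * SECONDS_PER_YEAR / 1e15  # ≈3 PB per year
```

Raising the rate toward the 1500 MB s⁻¹ upper figure scales the yearly volume proportionally, to tens of petabytes, which is why the chapter speaks of "many petabytes per year".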
The successful use of this global ensemble of systems to meet the experiments’ scientific goals depends on the development of Data Grids capable of managing and marshaling the ‘Tier-N’ resources, and supporting collaborative software development by groups of varying sizes spread around the globe. The modes of usage and prioritization of tasks need to ensure that the physicists’ requests for data and processed results are handled within a reasonable turnaround time, while at the same time the collaborations’ resources are used efficiently.

The GriPhyN [8], Particle Physics Data Grid (PPDG) [9], iVDGL [10], EU DataGrid [11], DataTAG [12], the LHC Computing Grid (LCG) [13] and national Grid projects in Europe and Asia are working together, in multiyear R&D programs, to develop the
³ At the time of this writing, a major Tier1 center in Rio de Janeiro, Brazil is being planned.
⁴ Tier4 also includes laptops, and the large number of handheld devices with broadband connections that are expected to come
necessary Grid systems. The DataTAG project is also working to address some of the network R&D issues and to establish a transatlantic test bed to help ensure that the US and European Grid systems interoperate smoothly.
The data rates and network bandwidths shown in Figure 39.2 are per LHC experiment for the first year of LHC operation. The numbers shown correspond to a conservative ‘baseline’, formulated using a 1999–2000 evolutionary view of the advance of network technologies over the next five years [14]. The reason for this is that the underlying ‘Computing Model’ used for the LHC program assumes a very well-ordered, group-oriented and carefully scheduled approach to data transfers supporting the production processing and analysis of data samples. More general models supporting more extensive access to data samples on demand [15] would clearly lead to substantially larger bandwidth requirements.
39.4 EMERGENCE OF HEP GRIDS: REGIONAL
CENTERS AND GLOBAL DATABASES
It was widely recognized from the outset of planning for the LHC experiments that the computing systems required to collect, analyze and store the physics data would need to be distributed and global in scope. In the mid-1990s, when planning for the LHC computing systems began, calculations of the expected data rates, the accumulated yearly volumes and the required processing power led many to believe that HEP would need a system whose features would not have looked out of place in a science fiction novel. However, careful extrapolations of technology trend lines, and detailed studies of the computing industry and its expected development [16], encouraged the experiments that a suitable system could be designed and built in time for the first operation of the LHC collider in 2005.⁵ In particular, the studies showed that utilizing computing resources external to CERN, at the collaborating institutes (as had been done on a limited scale for the Large Electron Positron Collider (LEP) experiments) would continue to be an essential strategy, and that a global computing system architecture would need to be developed. (It is worthwhile noting that, at that time, the Grid was at an embryonic stage of development, and certainly not a concept the experiments were aware of.) Accordingly, work began in each of the LHC experiments on formulating plans and models for how the computing could be done. The CMS experiment’s ‘Computing Technical Proposal’, written in 1996, is a good example of the thinking that prevailed at that time. Because the computing challenges were considered so severe, several projects were instigated by the experiments to explore various aspects of the field. These projects included RD45 [17], GIOD, MONARC and ALDAP, as discussed in the following sections.
39.4.1 The CMS computing model circa 1996
The CMS computing model, as documented in the CMS ‘Computing Technical Proposal’, was designed to present the user with a simple logical view of all objects needed to
⁵ At that time, the LHC planning specified a start of machine operations in 2005. The machine is now expected to come on-line
perform physics analysis or detector studies. The word ‘objects’ was used loosely to refer to data items in files (in a traditional context) and to transient or persistent objects (in an object-oriented (OO) programming context). The proposal explicitly noted that, often, choices of particular technologies had been avoided, since they depended too much on guesswork as to what would make sense or be available in 2005. On the other hand, the model explicitly assumed the use of OO analysis, design and programming. With these restrictions, the model’s fundamental requirements were simply summed up as:
1. objects that cannot be recomputed must be stored somewhere;
2. physicists at any CMS institute should be able to query any objects (recomputable or not recomputable) and retrieve those results of the query which can be interpreted by a human;
3. the resources devoted to achieving 1 and 2 should be used as efficiently as possible.

Probably the most interesting aspect of the model was its treatment of how to make
the CMS physics data (objects) persistent. The Proposal states: ‘at least one currently available ODBMS appears quite capable of handling the data volumes of typical current experiments and requires no technological breakthroughs to scale to the data volumes expected during CMS operation. Read performance and efficiency of the use of storage are very similar to Fortran/Zebra systems in use today. Large databases can be created as a federation of moderate (few GB) sized databases, many of which may be on tape, to be recalled automatically in the event of an ‘access fault’. The current product supports a geographically distributed federation of databases and heterogeneous computing platforms. Automatic replication of key parts of the database at several sites is already available, and features for computing (computable) objects on demand are recognized as strategic developments.’

It is thus evident that the concept of a globally distributed computing and data-serving system for CMS was already firmly on the table in 1996. The proponents of the model had already begun to address the questions of computing ‘on demand’, replication of data in the global system, and the implications of distributing computation on behalf of end-user physicists.
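The federation behaviour described in the quoted passage — many moderate-sized member databases, with tape-resident members staged back transparently when an ‘access fault’ occurs — can be caricatured in a few lines. Every name and structure here is invented for illustration; this is not the API of the ODBMS product the proposal refers to:

```python
# Hypothetical sketch of a database federation with automatic tape recall.
class MemberDatabase:
    """One moderate-sized (few-GB) database in the federation."""
    def __init__(self, name, on_tape=False):
        self.name = name
        self.on_tape = on_tape
        self.objects = {}

class Federation:
    """Routes object lookups to member databases; touching a
    tape-resident member triggers a recall (the 'access fault')."""
    def __init__(self):
        self.members = {}
        self.recalls = 0

    def attach(self, db):
        self.members[db.name] = db

    def read(self, db_name, key):
        db = self.members[db_name]
        if db.on_tape:          # access fault: stage back to disk first
            db.on_tape = False
            self.recalls += 1
        return db.objects[key]

fed = Federation()
cold = MemberDatabase("run1-events", on_tape=True)
cold.objects["evt42"] = {"muons": 4}
fed.attach(cold)

print(fed.read("run1-events", "evt42"))  # recall happens transparently
print(fed.recalls)                       # → 1
```

The point of the sketch is the transparency: the caller simply reads an object, and the federation decides whether that requires a tape recall, exactly the property that made the 1996 model attractive.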
Some years later, CMS undertook a major requirements and consensus-building effort to modernize this vision of a distributed computing model into a Grid-based computing infrastructure. Accordingly, the current vision sees CMS computing as an activity that is performed on the ‘CMS Data Grid System’, whose properties have been described in considerable detail [18]. The CMS Data Grid System specifies a division of labor between the Grid projects (described in this chapter) and the CMS core computing project. Indeed, the CMS Data Grid System is recognized as being one of the most detailed and complete visions of the use of Grid technology among the LHC experiments.
39.4.2 GIOD
In late 1996, Caltech’s HEP department, its Center for Advanced Computing Research (CACR), CERN’s Information Technology Division, and Hewlett Packard Corporation initiated a joint project called ‘Globally Interconnected Object Databases’. The GIOD Project [19] was designed to address the key issues of wide-area network-distributed data access and analysis for the LHC experiments. It was spurred by the advent of network-distributed Object database management systems (ODBMSs), whose architecture held the promise of being scalable up to the multipetabyte range required by the LHC experiments. GIOD was set up to leverage the availability of a large (200 000 MIP) HP Exemplar supercomputer, and other computing and data-handling systems at CACR, as of mid-1997. It addressed the fundamental need in the HEP community at that time to prototype object-oriented software, databases and mass storage systems, which were at the heart of the LHC and other (e.g. BaBar) major experiments’ data analysis plans. The project plan specified the use of high-speed networks, including ESnet and the transatlantic link managed by the Caltech HEP group, as well as next-generation networks (CalREN2 in California and Internet2 nationwide), which subsequently came into operation with speeds approaching those to be used by HEP in the LHC era.
The GIOD plan (formulated by Bunn and Newman in late 1996) was to develop an understanding of the characteristics, limitations, and strategies for efficient data access using the new technologies. A central element was the development of a prototype ‘Regional Center’. This reflected the fact that both the CMS and ATLAS Computing Technical Proposals foresaw the use of a handful of such centers, in addition to the main center at CERN, with distributed database ‘federations’ linked across national and international networks. Particular attention was to be paid to how the system software would manage the caching, clustering and movement of collections of physics objects between storage media and across networks. In order to ensure that the project would immediately benefit the physics goals of CMS and US CMS while carrying out its technical R&D, it also called for the use of the CACR computing and data storage systems to produce terabyte samples of fully simulated LHC signal and background events that were to be stored in the Object database.
The GIOD project produced prototype database, reconstruction, analysis and visualization systems. This allowed the testing, validation and development of strategies and mechanisms that showed how the implementation of massive distributed systems for data access and analysis in support of the LHC physics program would be possible. Deployment and tests of the terabyte-scale GIOD database were made at a few US universities and laboratories participating in the LHC program. In addition to providing a source of simulated events for evaluation of the design and discovery potential of the CMS experiment, the database system was used to explore and develop effective strategies for distributed data access and analysis at the LHC. These tests used local, regional, national and international backbones, and made initial explorations of how the distributed system worked, and which strategies were most effective. The GIOD Project terminated in 2000, its findings documented [19], and was followed by several related projects described below.

39.4.3 MONARC
The MONARC project was set up in 1998 to model and study the worldwide-distributed Computing Models for the LHC experiments. This project studied and attempted to optimize the site architectures and the distribution of jobs across a number of regional computing centers of different sizes and capacities, in particular, larger Tier-1 centers providing a full range of services, and smaller Tier-2 centers. The architecture developed by MONARC is described in the final report [20] of the project.
MONARC provided key information on the design and operation of the Computing Models for the experiments, which had envisaged systems involving many hundreds of physicists engaged in analysis at laboratories and universities around the world. The models encompassed a complex set of wide-area, regional and local-area networks, a heterogeneous set of compute- and data-servers, and an undetermined set of priorities for group-oriented and individuals’ demands for remote data and compute resources. Distributed systems of the size and complexity envisaged did not yet exist, although systems of a similar size were predicted by MONARC to come into operation and be increasingly prevalent by around 2005.
The project met its major milestones, and fulfilled its basic goals, including:

• identifying first-round baseline computing models that could provide viable (and cost-effective) solutions to meet the basic simulation, reconstruction, and analysis needs of the LHC experiments,
• providing a powerful (CPU- and time-efficient) simulation toolset [21] that enabled further studies and optimization of the models,
• providing guidelines for the configuration and services of Regional Centers,
• providing an effective forum in which representatives of actual and candidate Regional Centers could meet and develop common strategies for LHC computing.
In particular, the MONARC work led to the concept of a Regional Center hierarchy, as shown in Figure 39.2, as the best candidate for a cost-effective and efficient means of facilitating access to the data and processing resources. The hierarchical layout was also believed to be well adapted to meet local needs for support in developing and running the software, and carrying out the data analysis with an emphasis on the responsibilities and physics interests of the groups in each world region. In the later phases of the MONARC project, it was realized that computational Grids, extended to the data-intensive tasks and worldwide scale appropriate to the LHC, could be used and extended (as discussed in Section 39.9) to develop the workflow and resource management tools needed to effectively manage a worldwide-distributed ‘Data Grid’ system for HEP.
39.4.4 ALDAP
The NSF-funded three-year ALDAP project (which terminated in 2002) concentrated on the data organization and architecture issues for efficient data processing and access for major experiments in HEP and astrophysics. ALDAP was a collaboration between Caltech and the Sloan Digital Sky Survey (SDSS) teams at Johns Hopkins University and Fermilab. The goal was to find fast, space- and time-efficient structures for storing
large scientific data sets. The structures needed to efficiently use memory, disk, tape, and local- and wide-area networks, being economical on storage capacity and network bandwidth. The SDSS is digitally mapping about half the northern sky in five filter bands from the UV to the near IR. SDSS is one of the first large physics experiments to design an archival system to simplify the process of ‘data mining’ and shield researchers from the need to interact directly with any underlying complex architecture.
The need to access these data in a variety of ways requires them to be organized in a hierarchy and analyzed in multiple dimensions, tuned to the details of a given discipline, but the general principles are applicable to all fields. To optimize for speed and flexibility, there needs to be a compromise between a fully ordered (sequential) organization and a totally ‘anarchic’, random arrangement. To quickly access information from each of many ‘pages’ of data, the pages must be arranged in a multidimensional mode in a neighborly fashion, with the information on each page stored judiciously in local clusters. These clusters themselves form a hierarchy of further clusters. These were the ideas that underpinned the ALDAP research work.
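One well-known way to arrange data ‘pages’ in such a neighborly, multidimensional fashion is a space-filling curve. The chapter does not prescribe a particular scheme, so the Morton (Z-order) bit-interleaving below is purely an illustration of the clustering idea:

```python
def morton_key(x, y, bits=8):
    """Interleave the bits of two integer coordinates so that points
    close together in 2-D tend to be close in the 1-D key, and hence
    land on nearby storage pages."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return key

# Sort sky 'pages' by their Morton key before laying them out on disk:
pages = [(3, 1), (2, 2), (0, 0), (3, 3), (1, 0)]
clustered = sorted(pages, key=lambda p: morton_key(*p))
print(clustered)   # → [(0, 0), (1, 0), (3, 1), (2, 2), (3, 3)]
```

A sequential scan of pages in this order visits spatial neighbors together, which is the compromise between fully ordered and fully random layouts that the ALDAP work sought.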
Most of the ALDAP project goals were achieved. Beyond these, the collaboration yielded several other indirect benefits. It led to further large collaborations, most notably when the ALDAP groups teamed up in three major successful ITR projects: GriPhyN, iVDGL and NVO. In addition, one of the ALDAP tasks undertaken won a prize in the Microsoft-sponsored student Web Services contest. The ‘SkyServer’ [22], built in collaboration with Microsoft as an experiment in presenting complex data to the wide public, continues to be highly successful, with over 4 million Web hits in its first 10 months.
39.5 HEP GRID PROJECTS
In this section we introduce the major HEP Grid projects. Each of them has a different emphasis: PPDG is investigating short-term infrastructure solutions to meet the mission-critical needs of both running particle physics experiments and those in active development (such as CMS and ATLAS). GriPhyN is concerned with longer-term R&D on Grid-based solutions for, collectively, Astronomy, Particle Physics and Gravity Wave Detectors. The international Virtual Data Grid Laboratory (iVDGL) will provide global test beds and computing resources for those experiments. The EU DataGrid has similar goals to GriPhyN and iVDGL, and is funded by the European Union. LCG is a CERN-based collaboration focusing on Grid infrastructure and applications for the LHC experiments. Finally, CrossGrid is another EU-funded initiative that extends Grid work to eleven countries not included in the EU DataGrid. There are several other, smaller Grid projects for HEP, which we do not cover here because of space limitations.
39.5.1 PPDG
The Particle Physics Data Grid (www.ppdg.net) collaboration was formed in 1999 to address the need for Data Grid services to enable the worldwide-distributed computing model of current and future high-energy and nuclear physics (HENP) experiments. Initially funded from the Department of Energy’s NGI program and later from the MICS and HENP programs, it has provided an opportunity for early development of the Data Grid architecture as well as for the evaluation of some prototype Grid middleware.
PPDG’s second round of funding is termed the Particle Physics Data Grid Collaboratory Pilot. This phase is concerned with developing, acquiring and delivering vitally needed Grid-enabled tools to satisfy the data-intensive requirements of particle and nuclear physics. Novel mechanisms and policies are being vertically integrated with Grid middleware and experiment-specific applications and computing resources to form effective end-to-end capabilities. As indicated in Figure 39.3, PPDG is a collaboration of computer
[Figure 39.3 elements: the PPDG collaboration, linking the data management efforts of JLab, CMS, ATLAS, BaBar and D0 with the Globus, SRB, Condor and HENP GC user communities.]
scientists with a strong record in distributed computing and Grid technology, and physicists with leading roles in the software and network infrastructures for major high-energy and nuclear experiments. A three-year program has been outlined for the project that takes full advantage of the strong driving force provided by currently operating physics experiments, ongoing Computer Science (CS) projects and recent advances in Grid technology. The PPDG goals and plans are ultimately guided by the immediate, medium-term and longer-term needs and perspectives of the physics experiments, and by the research and development agenda of the CS projects involved in PPDG and other Grid-oriented efforts.

39.5.2 GriPhyN
The GriPhyN (Grid Physics Network – http://www.griphyn.org) project is a collaboration of CS and other IT researchers and physicists from the ATLAS, CMS, Laser Interferometer Gravitational-wave Observatory (LIGO) and SDSS experiments. The project is focused on the creation of Petascale Virtual Data Grids that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe. The concept of Virtual Data encompasses the definition and delivery to a large community of a (potentially unlimited) virtual space of data products derived from experimental data, as shown in Figure 39.4. In this virtual data space, requests can be satisfied via direct access and/or computation, with local and global resource management, policy, and security constraints determining the strategy used. Overcoming this challenge and realizing the Virtual Data concept requires advances in three major areas:
[Figure 39.4 elements: a raw data source and transforms feeding distributed resources (code, storage, computers, and network); resource management services, security and policy services, and other Grid services; interactive user tools, scheduling tools, and request execution management tools; serving a production team, individual investigators, and other users.]
Figure 39.4 A production Grid, as envisaged by GriPhyN, showing the strong integration of data generation, storage, computing, and network facilities, together with tools for scheduling, management and security.
• Virtual data technologies: Advances are required in information models and in new methods of cataloging, characterizing, validating, and archiving software components to implement virtual data manipulations.
• Policy-driven request planning and scheduling of networked data and computational resources: Mechanisms are required for representing and enforcing both local and global policy constraints, and new policy-aware resource discovery techniques.
• Management of transactions and task execution across national-scale and worldwide virtual organizations: New mechanisms are needed to meet user requirements for performance, reliability, and cost. Agent computing will be important to permit the Grid to balance user requirements and Grid throughput, with fault tolerance.
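The idea that a request may be satisfied ‘via direct access and/or computation’ can be sketched as a small catalog that either returns a materialized replica or replays a recorded derivation. All names and structure below are invented for illustration, and are far simpler than GriPhyN’s actual virtual data machinery:

```python
# Hedged sketch of the Virtual Data idea: a product is either stored
# (a replica) or described by a recipe that can recompute it on demand.
class VirtualDataCatalog:
    def __init__(self):
        self.replicas = {}   # product name -> materialized value
        self.recipes = {}    # product name -> (transform, input names)

    def add_replica(self, name, value):
        self.replicas[name] = value

    def add_recipe(self, name, transform, inputs):
        self.recipes[name] = (transform, inputs)

    def request(self, name):
        """Satisfy a request by direct access if a replica exists,
        otherwise by computation from the recorded derivation."""
        if name in self.replicas:
            return self.replicas[name]
        transform, inputs = self.recipes[name]
        value = transform(*(self.request(i) for i in inputs))
        self.replicas[name] = value   # cache the derived product
        return value

cat = VirtualDataCatalog()
cat.add_replica("raw", [4, 1, 3])
cat.add_recipe("sorted", sorted, ["raw"])
cat.add_recipe("max", max, ["sorted"])
print(cat.request("max"))   # → 4 (computed on demand, then cached)
```

In a real Virtual Data Grid, the choice between access and recomputation would be made by the planning and scheduling layer, under the policy and resource constraints listed above; here it is reduced to a simple cache check.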
The GriPhyN project is primarily focused on achieving the fundamental IT advances required to create Petascale Virtual Data Grids, but is also working on creating software systems for community use, and applying the technology to enable distributed, collaborative analysis of data. A multifaceted, domain-independent Virtual Data Toolkit is being created and used to prototype the Virtual Data Grids, and to support the CMS, ATLAS, LIGO and SDSS analysis tasks.
39.5.3 iVDGL
The international Virtual Data Grid Laboratory (iVDGL) (http://www.ivdgl.org) has been funded to provide a global computing resource for several leading international experiments in physics and astronomy. These experiments include LIGO, the ATLAS and CMS experiments, the SDSS, and the National Virtual Observatory (NVO). For these projects, the powerful global computing resources available through the iVDGL should enable new classes of data-intensive algorithms that will lead to new scientific results. Other application groups affiliated with the NSF supercomputer centers and EU projects are also taking advantage of the iVDGL resources. Sites in Europe and the United States are, or soon will be, linked together by a multi-gigabit per second transatlantic link funded by a companion project in Europe. Management of iVDGL is integrated with that of the GriPhyN project. Indeed, the GriPhyN and PPDG projects are providing the basic R&D and software toolkits needed for iVDGL. The European Union DataGrid (see the next section) is also a major participant and is contributing some basic technologies and tools. The iVDGL is based on the open Grid infrastructure provided by the Globus Toolkit, and builds on other technologies such as the Condor resource management tools.
As part of the iVDGL project, a Grid Operations Center (GOC) has been created. Global services and centralized monitoring, management, and support functions are being coordinated by the GOC, which is located at Indiana University, with technical effort provided by GOC staff, iVDGL site staff, and the CS support teams. The GOC operates iVDGL just as a Network Operations Center (NOC) manages a network, providing a single, dedicated point of contact for iVDGL status, configuration, and management, and addressing overall robustness issues.
39.5.4 DataGrid
The European DataGrid (eu-datagrid.web.cern.ch) is a project funded by the European Union with the aim of setting up a computational and data-intensive Grid of resources for the analysis of data coming from scientific exploration. Next-generation science will require coordinated resource sharing, and collaborative processing and analysis of huge amounts of data produced and stored by many scientific laboratories belonging to several institutions.
The main goal of the DataGrid initiative is to develop and test the technological infrastructure that will enable the implementation of scientific ‘collaboratories’, where researchers and scientists will perform their activities regardless of geographical location. It will also allow interaction with colleagues from sites all over the world, as well as the sharing of data and instruments on a scale previously unattempted. The project is devising and developing scalable software solutions and test beds in order to handle many petabytes of distributed data, tens of thousands of computing resources (processors, disks, etc.), and thousands of simultaneous users from multiple research institutions.
The DataGrid initiative is led by CERN, together with five other main partners and fifteen associated partners. The project brings together the following leading European research agencies: the European Space Agency (ESA), France’s Centre National de la Recherche Scientifique (CNRS), Italy’s Istituto Nazionale di Fisica Nucleare (INFN), the Dutch National Institute for Nuclear Physics and High-Energy Physics (NIKHEF) and the UK’s Particle Physics and Astronomy Research Council (PPARC). The fifteen associated partners come from the Czech Republic, Finland, France, Germany, Hungary, Italy, the Netherlands, Spain, Sweden and the United Kingdom.
DataGrid is an ambitious project. Its development benefits from many different kinds of technology and expertise. The project spans three years, from 2001 to 2003, with over 200 scientists and researchers involved.
The DataGrid project is divided into twelve Work Packages distributed over four Working Groups: Test bed and Infrastructure; Applications; Computational & DataGrid Middleware; and Management and Dissemination. Figure 39.5 illustrates the structure of the project and the interactions between the work packages.
39.5.5 LCG
The job of CERN's LHC Computing Grid Project (LCG – http://lhcgrid.web.cern.ch) is to prepare the computing infrastructure for the simulation, processing and analysis of LHC data for all four of the LHC collaborations. This includes both the common infrastructure of libraries, tools and frameworks required to support the physics application software, and the development and deployment of the computing services needed to store and process the data, providing batch and interactive facilities for the worldwide community of physicists involved in the LHC (see Figure 39.6).
The first phase of the project, from 2002 through 2005, is concerned with the development of the application support environment and of common application elements, the development and prototyping of the computing services and the operation of a series of computing data challenges of increasing size and complexity to demonstrate the effectiveness of the software and computing models selected by the experiments. During this period, there will be two series of important but different types of data challenge under way: computing data challenges that test out the application, system software, hardware
Figure 39.5 The structure of the DataGrid project: applications (Work Packages 8–10: Earth observation, high-energy physics, biology) use DataGrid middleware (Work Packages 1–5: workload management, data management, monitoring services, mass storage management, fabric management) to access distributed and heterogeneous resources, supported by the test bed and infrastructure (Work Packages 6–7).
Figure 39.6 The organizational structure of the LHC computing Grid, showing links to external projects and industry.
and computing model, and physics data challenges aimed at generating data and analyzing it to study the behavior of the different elements of the detector and triggers. During this R&D phase, the priority of the project is to support the computing data challenges, and to identify and resolve problems that may be encountered when the first LHC data arrives. The physics data challenges require a stable computing environment, and this requirement may conflict with the needs of the computing tests, but it is an important goal of the project to arrive rapidly at the point where stability of the Grid prototype service is sufficiently good to absorb the resources that are available in Regional Centers and CERN for physics data challenges.

This first phase will conclude with the production of a Computing System Technical Design Report, providing a blueprint for the computing services that will be required when the LHC accelerator begins production. This will include capacity and performance requirements, technical guidelines, costing models, and a construction schedule taking account of the anticipated luminosity and efficiency profile of the accelerator.
A second phase of the project is envisaged, from 2006 through 2008, to oversee the construction and operation of the initial LHC computing system.
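The capacity requirements mentioned above follow from simple scaling arithmetic: raw data volume is the product of event rate, event size and accelerator live time. The sketch below is illustrative only; the trigger rate, event size and live time are round assumed numbers, not figures from any Technical Design Report.

```python
def raw_data_volume_pb(trigger_rate_hz, event_size_mb, live_seconds):
    """Raw data volume in petabytes: rate x size x live time.

    1 PB = 1e9 MB in the decimal units used for storage planning.
    """
    total_mb = trigger_rate_hz * event_size_mb * live_seconds
    return total_mb / 1e9

# Illustrative round numbers (assumptions, not experiment figures):
# ~100 Hz of stored events, ~1 MB per raw event, ~1e7 live seconds/year.
volume = raw_data_volume_pb(100, 1.0, 1e7)
print(f"{volume:.1f} PB/year")  # -> 1.0 PB/year
```

Scaling any one factor, for example doubling the stored-event rate, scales the yearly volume linearly, which is why the anticipated luminosity profile enters the costing model directly.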
39.5.6 CrossGrid
CrossGrid (http://www.crossgrid.org) is a European project developing, implementing and exploiting new Grid components for interactive compute- and data-intensive applications such as simulation and visualization for surgical procedures, flooding crisis team
Figure 39.7 The CrossGrid architecture: interactive applications (BioMed, flooding crisis support, distributed data mining, meteorological pollution modelling) are built on CrossGrid services (interactive session and user-interaction services, a Grid visualization kernel, a portal and migrating desktop, roaming access, scheduling agents, Grid monitoring, and optimization of Grid data access), which in turn use Globus and DataGrid middleware (replica managers and catalogs, GSI, GIS/MDS, GridFTP, GRAM, job submission services) over resource managers for computing elements, storage elements and instruments (satellites, radars).
decision-support systems, distributed data analysis in high-energy physics, and air pollution combined with weather forecasting. The elaborated methodology, generic application architecture, programming environment, and new Grid services are being validated and tested on the CrossGrid test bed, with an emphasis on a user-friendly environment. CrossGrid collaborates closely with the Global Grid Forum (GGF) and the DataGrid project in order to profit from their results and experience, and to ensure full interoperability (see Figure 39.7). The primary objective of CrossGrid is to further extend the Grid environment to a new category of applications of great practical importance. Eleven European countries are involved.

The essential novelty of the CrossGrid project consists in extending the Grid to a completely new and socially important category of applications. The characteristic feature of these applications is the presence of a person in a processing loop, with a requirement for real-time response from the computer system. The chosen interactive applications are both compute- and data-intensive.
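The person-in-the-loop requirement can be sketched in miniature: a long computation is cut into slices, and control returns to the user between slices so the run can be steered or aborted within a bounded response time. This is only an illustration of the pattern, not CrossGrid's actual middleware, and every name below is invented for the example.

```python
def interactive_run(work_items, process, respond, budget_items=1000):
    """Process work in slices, returning control to the user between
    slices so a person in the loop can steer or abort the computation."""
    results = []
    for start in range(0, len(work_items), budget_items):
        chunk = work_items[start:start + budget_items]
        results.extend(process(w) for w in chunk)
        if not respond(results):      # user feedback: False aborts the run
            break
    return results

# Toy usage: square 5000 numbers; the steering callback stops the run
# once 2000 partial results have been delivered.
out = interactive_run(list(range(5000)), lambda x: x * x,
                      respond=lambda partial: len(partial) < 2000)
print(len(out))  # -> 2000
```

The slice size fixes the worst-case latency between user interactions, which is the essential trade-off in making a compute-intensive Grid task feel interactive.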
39.6 EXAMPLE ARCHITECTURES AND APPLICATIONS
TeraGrid is strongly supported by the physics community participating in the LHC, through the PPDG, GriPhyN and iVDGL projects, due to its massive computing capacity, leading-edge network facilities, and planned partnerships with distributed systems in Europe.
As part of the planning work for the TeraGrid proposal, a successful 'preview' of its potential use was made, in which a highly compute- and data-intensive Grid task for the CMS experiment was distributed between facilities at Caltech, Wisconsin and NCSA (see Figure 39.8). The TeraGrid test runs were initiated at Caltech, by a simple script
Figure 39.8 Showing the Grid-based production of Monte Carlo data for the CMS experiment. The setup, distributed between Caltech, Wisconsin, and NCSA, was an early demonstrator of the success of Grid infrastructure for HEP computing.
invocation. The necessary input files were automatically generated and, using Condor-G (see http://www.cs.wisc.edu/condor/condorg/), a significant number of Monte Carlo simulation jobs were started on the Wisconsin Condor flock. Each of the jobs produced a data file that was then automatically transferred to a UniTree mass storage facility at NCSA. After all the jobs had finished at Wisconsin, a job at NCSA was automatically started to begin a further phase of processing. This being completed, the output was automatically transferred to UniTree and the run was completed.

39.6.2 MOP for Grid-enabled simulation production
The MOP (short for 'CMS Monte Carlo Production') system was designed to provide the CMS experiment with a means for distributing large numbers of simulation tasks between many of the collaborating institutes. The MOP system, as shown in Figure 39.9, comprises task description, task distribution and file collection software layers. The Grid Data Management Pilot (GDMP) system (a Grid-based file copy and replica management scheme using the Globus Toolkit) is an integral component of MOP, as is the Globus Replica Catalogue. Globus software is also used for task distribution. The task scheduler is the 'gridified' version of the Condor scheduler, Condor-G. In addition, MOP includes a set of powerful task control scripts developed at Fermilab.
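The three MOP software layers can be caricatured in a few lines of Python. This is a sketch of the data flow only: in the real system Condor-G and DAGMan perform the distribution and GDMP performs the file replication, and every name below is illustrative rather than taken from MOP itself.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    site: str   # collaborating institute that will run the job

def describe_tasks(n_jobs, sites):
    """Task-description layer: one simulation task per job, spread
    round-robin over the participating sites."""
    return [Task(f"cmsim_{i:04d}", sites[i % len(sites)]) for i in range(n_jobs)]

def distribute(tasks):
    """Task-distribution layer: group tasks by destination site
    (Condor-G and DAGMan do the real scheduling and submission)."""
    queues = {}
    for t in tasks:
        queues.setdefault(t.site, []).append(t)
    return queues

def collect(queues):
    """File-collection layer: one output file name per finished task
    (GDMP replicates the real files and registers them in a replica
    catalog)."""
    return [f"{t.name}.out" for q in queues.values() for t in q]

queues = distribute(describe_tasks(6, ["Caltech", "Wisconsin", "NCSA"]))
files = collect(queues)
print(len(queues), len(files))  # -> 3 6
```

The point of the layering is that each stage can be swapped independently, which is what let MOP evaluate different Grid tools for scheduling and replication without redesigning the production system.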
Figure 39.9 The MOP system, as demonstrated at SuperComputing 2001. In this schematic are shown the software components, and the locations at which they execute. Of particular note is the use of the GDMP Grid tool.
The MOP development goal was to demonstrate that coordination of geographically distributed system resources for production was possible using Grid software. Along the way, the development and refinement of MOP aided the experiment in evaluating the suitability, advantages and shortcomings of various Grid tools. MOP developments to support future productions of simulated events at US institutions in CMS are currently underway.
Figure 39.10 Showing the architecture of the ATLAS ‘GRAPPA’ system.
The GRAPPA (see http://iuatlas.physics.indiana.edu/grappa/) user authenticates to the portal using a GSI credential; a proxy credential is then stored so that the portal can perform actions on behalf of the user (such as authenticating jobs to a remote compute resource). The user can access any number of active notebooks within their notebook database. An active notebook encapsulates a session and consists of HTML pages describing the application, forms specifying the job's configuration, and Java Python scripts for controlling and managing the execution of the application. These scripts interface with Globus services in the GriPhyN Virtual Data Toolkit and have interfaces following the Common Component Architecture (CCA) Forum's specifications. This allows them to interact with and be used in high-performance computation and communications frameworks such as Athena.

Using the XCAT Science Portal tools, GRAPPA is able to use Globus credentials to perform remote task execution, store a user's parameters for reuse or later modification, and run the ATLAS Monte Carlo simulation and reconstruction programs. Input file staging and collection of output files from remote sites is handled by GRAPPA. Produced files are registered in a replica catalog provided by the PPDG product MAGDA (see http://atlassw1.phy.bnl.gov/magda/info), developed at BNL. Job monitoring features include summary reports obtained from requests to the Globus Resource Allocation Manager (GRAM). Metadata from job sessions are captured to describe dataset attributes using the MAGDA catalog.
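The credential-delegation pattern that GRAPPA relies on (the portal holds a short-lived proxy and acts on the user's behalf) can be sketched generically. The classes and method names below are invented for illustration; they are not the GRAPPA, XCAT or GSI APIs.

```python
import time

class ProxyCredential:
    """Stand-in for a short-lived GSI proxy delegated to the portal."""
    def __init__(self, user, lifetime_s=3600):
        self.user = user
        self.expires = time.time() + lifetime_s

    def valid(self):
        return time.time() < self.expires

class Portal:
    """Holds one delegated proxy per user and acts on their behalf."""
    def __init__(self):
        self.proxies = {}

    def login(self, user, lifetime_s=3600):
        self.proxies[user] = ProxyCredential(user, lifetime_s)

    def submit(self, user, job):
        """Submit a job on the user's behalf, but only while the
        delegated proxy is still valid."""
        proxy = self.proxies.get(user)
        if proxy is None or not proxy.valid():
            raise PermissionError("no valid delegated credential")
        return f"{job} submitted as {proxy.user}"

portal = Portal()
portal.login("alice")
print(portal.submit("alice", "atlsim"))  # -> atlsim submitted as alice
```

The short lifetime is the key design choice: a compromised portal can only misuse the proxy until it expires, rather than holding the user's long-term credential.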
39.6.4 SAM
The D0 experiment's data and job management system software, sequential data access via metadata (SAM), is an operational prototype of many of the concepts being developed for Grid computing (see Figure 39.11).
The D0 data-handling system, SAM, was built for the 'virtual organization', D0, consisting of 500 physicists from 72 institutions in 18 countries. Its purpose is to provide a worldwide system of shareable computing and storage resources that can be brought to bear on the common problem of extracting physics results from about a petabyte of measured and simulated data. The goal of the system is to provide a large degree of transparency to the user, who makes requests for datasets (collections) of relevant data and submits jobs that execute Monte Carlo simulation, reconstruction or analysis programs on available computing resources. Transparency in storage and delivery of data is currently in a more advanced state than transparency in the submission of jobs. Programs executed, in the context of SAM, transform data by consuming data file(s) and producing resultant data file(s) of different content, that is, in a different 'data tier'. Data files are read-only and are never modified, or versioned.
Figure 39.11 The structure of the D0 experiment’s SAM system.
The data-handling and job control services, typical of a data Grid, are provided by a collection of servers using CORBA communication. The software components are D0-specific prototypical implementations of some of those identified in Data Grid Architecture documents. Some of these components will be replaced by 'standard' Data Grid components emanating from the various Grid research projects, including PPDG. Others will be modified to conform to Grid protocols and APIs. Additional functional components and services will be integrated into the SAM system. (This work forms the D0/SAM component of the PPDG project.)
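Two of the ideas described above can be sketched concretely: datasets defined as metadata queries rather than fixed file lists, and transformations that consume files in one 'data tier' and produce new, never-modified files in another. This is an illustrative model only, not SAM's actual CORBA interfaces; all function names are invented.

```python
catalog = []   # metadata catalog: one dict per immutable file

def declare(name, tier, **meta):
    """Register a file. Files are write-once: declaring the same
    name twice is an error, mirroring SAM's never-modified rule."""
    if any(f["name"] == name for f in catalog):
        raise ValueError(f"{name} already declared")
    catalog.append({"name": name, "tier": tier, **meta})

def dataset(**criteria):
    """A dataset is a metadata query, not a fixed list of files."""
    return [f["name"] for f in catalog
            if all(f.get(k) == v for k, v in criteria.items())]

def reconstruct(raw_file):
    """Transform raw -> reconstructed: a new file in a new data tier,
    carrying over the run number; the input file is left untouched."""
    run = next(f["run"] for f in catalog if f["name"] == raw_file)
    out = raw_file.replace("raw", "reco")
    declare(out, "reconstructed", run=run)
    return out

declare("raw_0001.dat", "raw", run=1)
declare("raw_0002.dat", "raw", run=2)
for f in dataset(tier="raw"):
    reconstruct(f)
print(dataset(tier="reconstructed"))  # -> ['reco_0001.dat', 'reco_0002.dat']
```

Because files are immutable and datasets are queries, a dataset request made later can transparently pick up newly produced files without any bookkeeping on the user's side.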
39.7 INTER-GRID COORDINATION
The widespread adoption by the HEP community of Grid technology is a measure of its applicability and suitability for the computing models adopted and/or planned by HEP experiments. With this adoption there arose a pressing need for some sort of coordination between all the parties concerned with developing Grid infrastructure and applications. Without coordination, there was a real danger that a Grid deployed in one country, or by one experiment, might not interoperate with its counterpart elsewhere. Hints of this danger were initially most visible in the area of conflicting authentication and security certificate granting methods and the emergence of several incompatible certificate granting authorities. To address and resolve such issues, to avoid future problems, and to proceed toward a mutual knowledge of the various Grid efforts underway in the HEP community, several inter-Grid coordination bodies have been created. These organizations are now fostering multidisciplinary and global collaboration on Grid research and development. A few of the coordinating organizations are described below.
39.7.1 HICB
The DataGrid, GriPhyN, iVDGL and PPDG projects, as well as the national European Grid projects in the UK, Italy, the Netherlands and France, agreed to coordinate their efforts to design, develop and deploy a consistent, open source, standards-based global Grid infrastructure. The coordination body is the HICB.
The consortia developing Grid systems for current and next generation High-Energy and Nuclear Physics (HENP) experiments, as well as applications in the earth sciences and biology, recognized that close collaboration and joint development is necessary in order to meet their mutual scientific and technical goals. A framework of joint technical development and coordinated management is therefore required to ensure that the systems developed will interoperate seamlessly to meet the needs of the experiments, and that no significant divergences preventing this interoperation will arise in their architecture or implementation.
To that effect, it was agreed that their common efforts would be organized in three major areas:
• An HENP Inter-Grid Coordination Board (HICB) for high-level coordination,
• A Joint Technical Board (JTB), and
• Common Projects and Task Forces to address needs in specific technical areas.

The HICB is thus concerned with ensuring compatibility and interoperability of Grid tools, interfaces and APIs, and organizing task forces, reviews and reporting on specific issues such as networking, architecture, security, and common projects.
39.7.2 GLUE
The Grid Laboratory Uniform Environment (GLUE) collaboration is sponsored by the HICB, and focuses on interoperability between the US physics Grid projects (iVDGL, GriPhyN and PPDG) and the European physics Grid development projects (EDG, DataTAG etc.). The GLUE management and effort is provided by the iVDGL and DataTAG projects. The GLUE effort reports to and obtains guidance and oversight from the HICB and JTB described in Section 39.7.1. The GLUE collaboration includes a range of subprojects to address various aspects of interoperability:
• Tasks to define, construct, test, and deliver interoperable middleware to and with the Grid projects.
• Tasks to help experiments with their intercontinental Grid deployment and operational issues; establishment of policies and procedures related to interoperability and so on.

Since the initial proposal for the GLUE project, the LCG Project Execution Board and SC2 have endorsed the effort as bringing benefit to the project goals of deploying and supporting global production Grids for the LHC experiments.
The GLUE project’s work includes the following:
1. Definition, assembly and testing of core common software components of Grid middleware drawn from EU DataGrid, GriPhyN, PPDG, and others, designed to be part of the base middleware of the Grids that will be run by each project. GLUE will not necessarily assemble a complete system of middleware, but will choose components to work on that raise particular issues of interoperability. (Other projects may address some of these issues in parallel before the GLUE effort does work on them.)
2. Ensuring that the EU DataGrid and GriPhyN/PPDG Grid infrastructure will be able to be configured as a single interoperable Grid for demonstrations and ultimately application use.
3. Experiments will be invited to join the collaboration to build and test their applications with the GLUE suite. GLUE will work with Grid projects to encourage experiments to build their Grids using the common Grid software components.
39.7.3 DataTAG

in Europe, and related Grid projects in the United States. This will allow the exploration of advanced networking technologies and interoperability issues between different Grid domains.

DataTAG aims to enhance the EU programme of development of Grid-enabled technologies through research and development in the sectors relevant to interoperation of Grid domains on a global scale. In fact, a main goal is the implementation of an experimental network infrastructure for a truly high-speed interconnection between individual Grid domains in Europe and in the US, to be shared with a number of EU projects. However, the availability of a high-speed infrastructure is not sufficient, so DataTAG is proposing to explore some forefront research topics such as the design and implementation of advanced network services for guaranteed traffic delivery, transport protocol optimization, efficiency and reliability of network resource utilization, user-perceived application performance, middleware interoperability in multidomain scenarios, and so on.

The DataTAG project is thus creating a large-scale intercontinental Grid test bed that will link the Grid domains. This test bed is allowing the project to address and solve the problems encountered in the high-performance networking sector, and the interoperation of middleware services in the context of large-scale data-intensive applications.
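The transport-protocol optimization mentioned above turns on a well-known constraint: to keep a long, fast pipe full, TCP must keep a bandwidth-delay product's worth of data in flight, so the sender's window must be at least that large. A small illustrative calculation follows; the link figures are assumptions chosen to be typical of a transatlantic research link of the era, not measured DataTAG parameters.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the number of bytes that must be in
    flight to fill the link, hence the minimum TCP window size for
    full utilization."""
    return bandwidth_bps * rtt_s / 8

# Assumed figures: 2.5 Gb/s capacity, 120 ms round-trip time.
window = bdp_bytes(2.5e9, 0.120)
print(f"{window / 1e6:.1f} MB window needed")  # -> 37.5 MB window needed
```

By contrast, a default 64 KB TCP window over the same 120 ms path can sustain only a few megabits per second, which is why window scaling and transport-protocol tuning were central research topics for high-speed intercontinental Grids.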
39.7.4 Global Grid Forum
The Global Grid Forum (GGF – http://www.gridforum.org) is a group of individuals engaged in research, development, deployment, and support activities related to Grids in general (see Figure 39.12). The GGF is divided into working groups tasked with investigating a range of research topics related to distributed systems, best practices for the design and interoperation of distributed systems, and recommendations regarding the implementation of Grid software. Some GGF working groups have evolved to function as sets of related subgroups, each addressing a particular topic within the scope of the working group. Other GGF working groups have operated with a wider scope, surveying a broad range of related topics and focusing on long-term research issues.
Figure 39.12 Global Grid Forum working groups, as defined in 2001.