WLAP: THE WEB LECTURE ARCHIVE PROJECT
The Development of a Web-Based Archive of Lectures, Tutorials,
Contents
1 CERN - the European Organization for Nuclear Research, commonly known as the European Laboratory for Particle Physics.
1 Introduction
2 Project Motivation
2.1 Communication in modern high-energy physics experiments
2.2 Enhancing learning capability and dissemination of education and training
3 Project Implementation
3.1 The pilot project
3.2 The archive application
3.3 The archive process
3.4 The WLAP archive
3.4.1 The CERN WLAP archive
3.4.2 The ATLAS GEANT4 workshop
4 Details of the Implementation
4.1 Audio and video capture
4.2 Scenarios for handling the visual support material
5 The Lecture Object
5.1 Lecture Object Architecture
5.2 Draft Specification
5.3 Distributed Architecture
5.4 Prototype developed
5.5 Advantages of standardization
6 Other Web Lecture Archives and Technologies
6.1 Other archives
6.2 Other technologies
7 Planned Future Applications and R/D
7.1 ATLAS
7.2 CERN Particle Physics Distance Education Program
7.3 Web accessible Basic Safety Training
7.4 Planned Future Technology Development
8 Conclusions
9 Acknowledgements
10 References
Bibliography
Trademark Notice
All trademarks appearing in this document are acknowledged as such.
1 Introduction
The primary motivation for the creation of the World-Wide Web was the facilitation of collaboration between scientists [1]. There was a need for a better way for scientists to rapidly exchange large amounts of information, ranging from experimental data and results of analyses to organizational and strategic details related to ongoing experiments. The rapid proliferation of the web and web-related applications, as well as the ever-increasing size and international scope of scientific collaborations, has by now clearly demonstrated the value of the web as a common and necessary tool for research. In addition, it has enhanced the dissemination of scientific knowledge to the general public through the publication of online documents and other web-based media.
This document reports on an effort to explore the usage of the web for archiving of “content-rich” material, which we define to be lectures, seminars, or other events which include audio, video and visual support materials. The work targets a segment of the tasks that must be completed for scientists and others to optimally draw upon the web for transmitting information for training and archival purposes, as well as for keeping colleagues informed of strategic, technical and administrative decisions. CERN was chosen as a focal point for this research because of its historical participation in web development, its continuing role as a center for scientific research and information exchange, its rich education and training programs, and the new challenges it faces during the current construction and future running of the next generation of experiments for the Large Hadron Collider (LHC). These experiments will be run by teams of sizes heretofore unseen in most sectors of the scientific community, with thousands of members literally spread around the globe.
The involvement of the University of Michigan in the CERN ATLAS experiment, as one of that experiment’s largest groups, is one of the reasons for its interest in this project. Augmenting this reason are the roles played by the University of Michigan in the inauguration of U.S. participation in the CERN Summer Student program, along with its affiliation with Internet2, and its work in bringing CERN into Internet2. These reasons, together with the presence of pioneers at the University in the development of multi-media educational tools, all provided a shared rationale and stimulus for the University of Michigan and CERN to examine the possible future role of web-based archiving in the general area of highly collaborative large-scale research involving universities and international laboratories.
With this background, a major focus of our efforts has been to investigate how to best facilitate the work of large, globally dispersed scientific collaborations. Another has been to study how to best reinforce CERN’s education and training programs and make them accessible to as wide a community as possible.
This paper seeks to examine the relevance of web-based archiving to this set of challenges. We approach the topic by examining how we have used web-based archiving to record a series of content-rich presentations at CERN over the past two years. The issues covered range from the technical details of how such recordings are made, to questions of how the technology can be improved, and how such material could be confederated to address certain larger goals.
2 Project Motivation
In this section, we describe the principal motivations for our present study. Though we cite specific applications, the results presented herein have a clear relevance for a variety of scientific fields and educational venues.
2.1 Communication in modern high-energy physics experiments
A prime motivation of the WLAP project was the hope that web-based archiving technology could address some of the key challenges that face the high-energy physics community. To understand these challenges, it is instructive to consider the anatomy of a modern high-energy physics experiment. Once a set of physics goals has been established for an experiment, the achievement of these goals requires the massive generation and refinement of novel ideas for how to solve the myriad of attendant technical problems. A talented set of individuals must be assembled to assimilate these ideas, to design and build the detector components, and to integrate the components into the overall detector, a process that can span a decade and involve extensive communication among thousands of experts in numerous countries. The running detector must be maintained and monitored and the resulting data must be analyzed, an activity that may involve yet another decade. The required funding must be repeatedly applied for, disbursed, expended, accounted for and reported on. At every stage extensive communication is required among dispersed participants. In such scientific enterprises the communications aspect can rapidly become as daunting as the scientific and technical challenges themselves.
LHC experimental collaborations will have hundreds of participating university groups. Depending upon the precise responsibilities accepted by a group, it may have the need to interact daily with colleagues at a half-dozen other universities. Many practical questions quickly arise. For example, how does one convene frequent meetings with colleagues in real-time on modest budgets, when time differences of as much as 12 hours are involved for the participants, many of whom have major responsibilities in addition to those that form the focus of the meetings? How does one offer a tutorial to two thousand colleagues on some paradigm one has developed, when initially there are only one or two true “expert peers” on a given topic in the entire collaboration and it is one's job to train all of the others?
Since many of the large experiments may run for as long as twenty years, involving numerous generations of Ph.D. students, how is information recorded and passed on to subsequent generations? When major talks are given on results from a running experiment by the author of a particular analysis, how does that information get captured and made available to members of the collaboration who may only be able to access the talk hours (or years) later? How do findings and major strides emerging from these experiments get captured and interwoven into classroom materials for later presentation?
These are but a few of the questions that arise in the conduct of large high-energy physics experiments. Given the nature of the World-Wide Web and the original function seen for it, one would naturally be led to inquire if the Web itself might provide possible solutions to facilitate the communication requirements of the very large experiments it had helped make scientifically possible.
2.2 Enhancing learning capability and dissemination of education and training
There is another motivation for our work in the area of web lecture archiving: its potential use in education and training. Traditional lectures and seminars follow a sequential pattern in which the lecturer prepares a presentation and delivers it, often accompanied by visual support material. The delivery mechanism can vary in style, with the lecturer using different techniques for displaying the visual support material, for example, an overhead projector, a computer slide projection or a blackboard. Questions may be taken during the presentation, at the end, or not at all. In each case, students must rely on their notes and/or a copy of the support material to recall the key points of the lecture at a later date.
People who are unable to attend or miss a session have to make do with a copy of the visual support material when it is available and even in the best of cases find it very difficult to reconstruct in detail what has been presented verbally.
Having access to some form of audio/video reproduction of the original lecture, however, can greatly facilitate the learning process and allow many more people, in addition to those who physically attend, to benefit. Such a reproduction can exist in a variety of media, including audio or video recordings. Unfortunately, the dissemination of the material on audio/video tapes is cumbersome, thereby limiting access.
Again, recent technological developments based on the accessibility of the Internet and the widespread utilization of the World-Wide Web lead us to conclude that these difficulties can now be overcome.
3 Project Implementation
3.1 The pilot project
The Web Lecture Archiving Project (WLAP) activity started in 1999 as a pilot project [2] funded by the U.S. National Science Foundation and the University of Michigan (UM). The primary aim was to examine the feasibility of using a software tool, called Sync-O-Matic [3], to record and archive slide-based lectures in a variety of situations. Following the success of the pilot project [4], a collaboration was formed between the CERN HR Division Training and Development group [5], the UM ATLAS Group [6] and the UM Media Union [7], supported by CERN IT Division. The objective was to demonstrate the feasibility and usefulness of archiving lectures, seminars, tutorials, training sessions and plenary sessions of ATLAS experiment meetings by focusing first on the archiving of the prestigious CERN Summer Student Lectures.
3.2 The archive application
The Sync-O-Matic application, successfully tested at CERN during the pilot project described above, was adopted for the implementation of the joint project. Sync-O-Matic was written by Charles Severance, then Associate Director of the University of Michigan Media Union. Documentation describing the software can be found on its web site [3]. It is freely available and there is a mailing list of users and developers who can be contacted for help and support.
Sync-O-Matic produces slide-based web-lectures for viewing with a standard web browser and the freely available RealPlayer plug-in. Its output is a multi-media lecture that combines the audio and video playback of the lecturer with digital images of the visual support material, synchronized to the video, and displayed in a browser window. Figure 1 illustrates an archived lecture. Note that the video and slide indexes can be used to rapidly locate sections of the lecture or to review the slides in order to select specific topics.
Figure 1: A typical archived lecture as viewed from a web browser. The video image of the speaker appears in the upper left-hand corner of the page. The visual support material (in this case, scanned transparencies) appears in the large window on the right. The changing of the transparencies is synchronized to the timing of the video.
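The synchronization and slide index described above amount to a lookup between playback time and slide image. The following is a minimal sketch in Python, assuming a hypothetical timing index of (start time in seconds, image file) pairs. The names `TIMING_INDEX`, `slide_at` and `seek_time_for` are illustrative only and do not reflect the actual Sync-O-Matic internal file format.

```python
from bisect import bisect_right

# Hypothetical timing index: (seconds from start of video, slide image file).
# The real Sync-O-Matic file layout is not described in the text.
TIMING_INDEX = [
    (0.0, "slide01.gif"),
    (95.0, "slide02.gif"),
    (310.5, "slide03.gif"),
    (402.0, "slide02.gif"),  # the speaker returned to an earlier slide
]


def slide_at(index, t):
    """Return the slide image that should be on screen at playback time t."""
    starts = [start for start, _ in index]
    # Position of the last entry whose start time is <= t.
    pos = bisect_right(starts, t) - 1
    if pos < 0:
        pos = 0  # before the first recorded change, show the first slide
    return index[pos][1]


def seek_time_for(index, slide):
    """Return the first time at which a slide is shown, or None if never."""
    for start, name in index:
        if name == slide:
            return start
    return None
```

With such an index, clicking an entry in the slide list corresponds to `seek_time_for`, while the player keeps the large window up to date by calling `slide_at` as the video advances.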
These important features distinguish Sync-O-Matic from the historically common approach to lecture archiving based on a video recording of the event which combines views of the speaker and visual support material. Such a video is necessarily a compromise, as a choice has to be made between focussing on the lecturer or support material, or more precisely part of this material to facilitate readability. The camera operator therefore makes the choice for the viewer as to what is of primary interest at any time. This is in marked contrast to a Sync-O-Matic lecture in which the speaker and support material are always in view.
Another drawback of a standard video is that although good video resolution can indeed reproduce the slides in a readable format, it does so only at a significant cost of network bandwidth and archive size. For example, we find that using MPEG-1, a bandwidth of 1500 Kb/s is necessary to ensure readability using historical approaches, to be compared to a typical Sync-O-Matic archive that can be readable at 50 Kb/s. Regardless of advances to the technology, there will always be some inefficiency introduced by the transmission of a video stream rather than a fixed image. In addition, the video stream lacks the slide preview and rapid search/location functionality provided by the indexed Sync-O-Matic archive. As we will discuss below, such indexing could also be exploited for the development of web-based lecture databases and search engines.
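To make the bandwidth comparison concrete, the short calculation below converts the two stream rates quoted above into archive size per hour. It is a rough sketch that assumes "Kb/s" means 1000 bits per second and a constant bit rate, and it ignores slide images and container overhead. The function name is illustrative.

```python
def archive_megabytes_per_hour(kilobits_per_second):
    """Size in megabytes (10**6 bytes) of one hour streamed at a constant rate."""
    bits = kilobits_per_second * 1000 * 3600  # bits in one hour
    return bits / 8 / 1_000_000


mpeg1_mb = archive_megabytes_per_hour(1500)  # video-only approach
sync_mb = archive_megabytes_per_hour(50)     # slide-based archive
ratio = mpeg1_mb / sync_mb

print(mpeg1_mb, sync_mb, ratio)  # 675.0 22.5 30.0
```

Under these assumptions the video-only approach needs about thirty times the bandwidth and storage of the slide-based archive for the same hour of lecture.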
3.3 The archive process
Sync-O-Matic was originally designed for use by an individual teacher, operating alone or in a staffed distance-learning studio, using Microsoft PowerPoint materials [12]. In this mode, Sync-O-Matic imports a PowerPoint file and converts the slides to GIF/JPEG images as shown in Figure 2.
While the teacher gives the lecture, Sync-O-Matic records the audio and video from a microphone and camera using the RealProducer [8] ActiveX control. As the lecturer changes slides, Sync-O-Matic records those actions and the timing of each action in internal text files.
Figure 2: Sync-O-Matic standard operational procedure. The speaker uses Sync-O-Matic on a PC connected to a camera and a microphone. The speaker loads the PowerPoint file into Sync-O-Matic, and then starts recording the audio and video, and changes the slides as he/she talks. The capturing component of Sync-O-Matic creates internal files with the audio, video, slides, and timing information. Once the recording is over, the speaker can publish the web lecture in a format suitable for a CD-ROM or for the Web. The “Style files” can be edited to change the structure and the “look and feel” of the web lectures.
At the end of the presentation, two archived lectures are produced, one suitable for viewing with a standard web browser from CD-ROM, and another suitable for viewing directly from a web server. The look and feel of the resulting lectures is controlled using Sync-O-Matic specific “style files”. These style files are written in HTML with some Sync-O-Matic specific markup included. The result of the publishing process is a directory with several HTML files, text files used to allow random navigation of the lecture, and the media files.
The media files make up the bulk of the disk usage. An average quality video and high quality audio (160x120 pixels, 24 kbps video + 16 kbps audio) lecture requires about 36 Mbytes per hour of lecture. This relatively small amount of disk storage allows 15-20 hours of lectures to be stored on a single CD, and a large number on a web server without resorting to exotic storage technology. To watch a web lecture the user can use any popular web browser. The first time the user views a lecture, it may be necessary to download and install the RealPlayer software (the free version is sufficient), although this software is now commonly bundled with most web browsers. JavaScript is not required but some advanced navigation features are available if it is enabled.
Although Sync-O-Matic is adequate to meet the requirements of a lecturer in a teaching environment as described above, one of the major challenges of the WLAP project was to extend its functionality beyond its original design in order to handle the demands of live lectures and to cope with a number of challenging recording scenarios, such as hand-written transparencies and blackboard material. The various scenarios encountered and the solutions developed are detailed in section 4 below.
3.4 The WLAP archive
During the pilot project, a significant portion of the 1999 CERN Summer Student Lecture Series was recorded and made available to the participants and to the general public. As a result, significant demand was generated at CERN and throughout the physics community to continue the recordings on a regular basis.
The CERN HR Division Technical Training group, supported by the CERN Amphitheatre technical support team, took up the challenge of demonstrating the feasibility of recording a wide variety of lectures, while simultaneously investigating and applying technical advancements to simplify the process and to reduce the manpower needs. Over the next 15 months many important CERN colloquia, Technical Training seminars, Academic and Summer Student Program lectures, and software training tutorials were recorded, either by request, or for the purposes of testing the technology under a variety of recording conditions.
So successful was this effort that since January 2001 this activity has been taken over by a team from the CERN ETT Division who have not only recorded all Academic and Summer Student Program lectures since that date, but have also developed and improved the operational procedure. Several steps of the file manipulation process have been automated and the lecture 'events' are now integrated into the ETT-developed CERN event calendar system.
The ATLAS Collaboration served as a test-bed for much of the material recorded, placing the focus on the challenges of a large-scale, globally dispersed scientific collaboration. The events archived for ATLAS include collaboration meetings, plenary sessions, subsystem workshops and tutorials. In addition, the collaboration profited from its members accessing and providing feedback on all of the archived lectures. The feedback was, in general, very positive, and often resulted in requests for a greatly extended service. Nearly all suggestions regarding technical improvements were eventually incorporated into the recording and archiving procedures.
3.4.1 The CERN WLAP archive
One of the reasons mentioned for choosing CERN as the target of the project is the richness of its physics program. Indeed, the project team quickly found the number and frequency of interesting recording opportunities to exceed its ability to record, regardless of the facility of the process. Despite the modest resources, however, a significant number of lectures were archived and published on the WLAP site, covering a large spectrum of laboratory activities.
The current archive, which is growing literally every day, comprises more than 400 lectures. Among the more notable archives are colloquia by Prof. Martinus Veltman on the history of the Standard Model [9], presented in the CERN auditorium shortly following his reception of the 1999 Nobel Prize in Physics, and by Dr. Paul Kunz on the birth of the World-Wide Web, its early stages of development at CERN, and his involvement setting up the first web server in America [10].
In Table 1, we briefly summarize the current content of the WLAP archive at CERN (see footnote 2). The full catalogue can be viewed at the WLAP archive web site: http://www.cern.ch/WLAP/
2 For completeness the table includes lectures from the Academic Training Program 2000/01 and Summer Student Program 2001 recorded by ETT Division since January 2001, which are not strictly part of the WLAP archive.
Table 1: Summary of the current contents of the WLAP lecture archive

Academic Training Program Lectures 99/00, 00/01      109
Summer Student Lectures 1999, 2000 & 2001            36+95+66
Technical Training & Safety Seminars                 7
3.4.2 The ATLAS GEANT4 workshop
In addition to the events recorded at CERN, the University of Michigan members of the team archived a series of lectures given by Andrea dell’Acqua at an ATLAS-sponsored workshop [11] on GEANT4 held on the Ann Arbor campus.
GEANT4, a recently completed re-write in C++ of the well-tested GEANT3 application, is a software package for simulating the passage of particles through material. It is currently being tested for use by the LHC experiments.
Usage of the software requires a significant effort to bring the physics community, typically well versed in the usage of FORTRAN, up to speed in both the syntax of C++ and the concepts of object-oriented analysis and design.
Andrea dell’Acqua is one of the key developers of the ATLAS simulation software. It was clearly advantageous at this time for him to make the effort to train new users and developers in its usage. In the future, new collaboration members, or those just beginning to contribute to the software after having completed contributions to other aspects of the detector construction, will be able to access the archived lectures at a web-based training site [11] which will include documentation, problems and solutions, and a listing of frequently asked questions with answers.
4 Details of the Implementation
Contrary to the original design concept of Sync-O-Matic, the bulk of the lectures recorded for the WLAP archive were not prepared in advance using Microsoft PowerPoint, nor were they presented in a controlled environment with the main focus being the production of a quality lecture archive. Rather, the lectures were presented to a live audience using a variety of visual media and the lecturer was insulated as much as possible from the recording process. Given these constraints, it was necessary for the archive team to develop a number of new operational procedures for the production of a quality archive, using a reasonable amount of resources and manpower. In this section, we present a summary of these procedures.
4.1 Audio and video capture
Because high quality audio and video are essential for producing good streaming media, we separately taped the lectures with a high quality wireless microphone and allowed the camera operator to concentrate on the video. Initially the video was encoded from tape but it was soon realized that live encoding, with tape as backup, was not only feasible but saved a significant amount of archive production time.
Our target audience was identified from the outset as being world wide, with access to the Internet provided via research network backbones, as well as via the currently standard analog modems from home. Based on this, we chose to deliver a total bandwidth of 40 Kb/s, divided into 24 Kb/s for video and 16 Kb/s for audio streaming, in order to ensure high-quality audio. In order to optimize usage of the client bandwidth, we encoded using the RealProducer Surestream technology, which delivers reduced quality video if necessary to maintain sound quality.
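The priority rule stated above (protect the audio, let the video degrade) can be illustrated with a small sketch. This is not how RealProducer implements Surestream, whose internals are not described here. It is only a hypothetical allocation function, using the 16 Kb/s and 24 Kb/s targets from the text, that shows the intended behaviour when a client has less than the full 40 Kb/s available.

```python
AUDIO_KBPS = 16  # audio target quoted in the text
VIDEO_KBPS = 24  # video target quoted in the text


def allocate(available_kbps):
    """Split the client bandwidth, serving audio first and video second.

    Hypothetical illustration of the "audio before video" rule only.
    """
    audio = min(AUDIO_KBPS, available_kbps)
    video = min(VIDEO_KBPS, max(0, available_kbps - audio))
    return audio, video


print(allocate(40))  # (16, 24): full quality
print(allocate(28))  # (16, 12): video is reduced, audio is untouched
```

A client on a congested modem link would thus still receive the full audio stream, which carries most of the lecture content when the slides are delivered as separate images.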
One slightly negative impact of using Surestream for encoding is that the media files become somewhat larger (approximately 40 MB per lecture hour), decreasing the average number of lectures that can fit on a CD-ROM to about 15. This is not deemed to be a crucial problem, however, as the vast majority of access to the lectures is direct, via the web-server. Rather than handing out a CD-ROM to each student at the end of the summer, as was the practice for the pilot project, CD-ROMs of specific lectures were provided to any user, upon request.
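The CD-ROM capacities quoted in this paper follow from simple division. The sketch below assumes a 650 MB disc, which is not stated in the text, and uses the two per-hour figures given (36 MB without Surestream, about 40 MB with it). It ignores the HTML and slide image files that share the disc, which is one plausible reason the practical figure is somewhat lower.

```python
CD_CAPACITY_MB = 650  # assumption: capacity of a standard CD-ROM


def lecture_hours_per_cd(mb_per_hour, capacity_mb=CD_CAPACITY_MB):
    """Upper bound on lecture hours per disc, counting media files only."""
    return capacity_mb / mb_per_hour


without_surestream = lecture_hours_per_cd(36)
with_surestream = lecture_hours_per_cd(40)

print(int(without_surestream))  # 18
print(with_surestream)          # 16.25
```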
4.2 Scenarios for handling the visual support material
Presenters differ in the way that they choose to display their visual support material and the techniques encountered ranged from electronic presentations with PowerPoint, through transparencies and overhead projector, to chalk and a blackboard. For each scenario the challenge was to record the order in which the information was presented and the time of display in order to synchronize with the audio and video. Furthermore, we chose not to burden the speaker with the technical aspects of starting Sync-O-Matic at the right moment and learning to use its interface in order to give their lecture.
For the pilot project, the camera operator recorded the timing of the transparency changes during the lecture in a notebook. During the production of the Sync-O-Matic archive following the event, this information was entered into the formatted text files internal to the software. For presentations given on other media than PowerPoint, the slide images were converted to GIF format (scanning overhead transparencies, if necessary) and then placed in the appropriate subdirectories for production of the archive. This process was relatively time-consuming, requiring between two and five hours of post-production work for each hour of presentation.
During the Summer Student Lecture Series of 2000, efforts were focused on reducing the time and manpower required to produce the archived lecture. The simple text format of the Sync-O-Matic internal files made it possible for us to create several software macros, loaded as background processes on the presentation PC, to capture the timing of the slide changes during the presentation.
In the simplest case, shown in Figure 3, when the speaker used Microsoft PowerPoint for the presentation, we developed a VBA macro called CarpePpt [13] to capture the timing when the speaker changed the slide in the PowerPoint viewer. The macro is started well before the presentation and is completely transparent to the speaker who uses PowerPoint as usual to show the slides. At the end of the presentation, a file is generated containing the timing information in the Sync-O-Matic format. This file is then used by Sync-O-Matic to publish the resulting lecture.
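The timing capture described above can be pictured as a small event logger. CarpePpt itself is a VBA macro and its output follows the Sync-O-Matic internal format, neither of which is reproduced here. The Python sketch below is only an analogy: the class `SlideTimer`, its method names and the tab-separated output layout are all hypothetical.

```python
import time


class SlideTimer:
    """Record the elapsed time of each slide change during a talk."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the logic can be tested
        self._start = None
        self.events = []      # list of (elapsed seconds, slide number)

    def start(self):
        """Mark the start of the recording."""
        self._start = self._clock()

    def slide_changed(self, slide_number):
        """Call whenever the viewer shows a new slide."""
        if self._start is None:
            raise RuntimeError("start() must be called before slide_changed()")
        self.events.append((self._clock() - self._start, slide_number))

    def to_text(self):
        """Serialize the events as one 'seconds<TAB>slide' line per change."""
        return "".join(f"{t:.1f}\t{n}\n" for t, n in self.events)
```

In use, `start()` would be tied to the beginning of the video encoding and `slide_changed()` to the presentation software's slide-change event, so that the speaker never interacts with the recorder directly.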
Figure 3: Operational procedure for the PowerPoint scenario. The speaker is not involved in the recording operations. The audio/video signal coming from a camera is encoded by a computer using RealProducer, which generates a RealVideo file. CarpePpt captures the timing transparently for the speaker, who uses PowerPoint normally to show the slides, and generates a timing file in the Sync-O-Matic internal format. In post-production, the video file and the timing file are copied and named as if they were produced by Sync-O-Matic. The publishing component generates, in the same way as in Figure 2, the Web lecture in the formats suitable for the Web and for the CD.
When speakers used Postscript [14] or Adobe Acrobat [15] for their prepared slides, we developed another tool called CarpePdf [16] to capture the timing information during the lecture. This method was also used when the speaker was able to make the transparencies available in advance so that they could be scanned and converted into PDF for the presentation. In this case, the scanned images were also used to generate GIF files for the Sync-O-Matic archive.
The real challenge is when the transparencies are not made available in advance or when the presenter writes on a blackboard. In these cases, we used a second video camera to record images from the display screen or the blackboard and we entered the timing values by hand during the post-talk production using a Sync-O-Matic feature called TimeIT. The images were eventually replaced with higher quality scanned images of the transparencies or still pictures of the blackboard, when they were available after the presentation.
In 2000, Charles Severance developed a new product called ClipBoard-2000 [17]. ClipBoard produces files that are compatible with Sync-O-Matic’s publishing process and introduces many new features, including two-camera support. For the challenging case discussed above, one camera could now be used to capture the video of the speaker with the second camera dedicated to the overhead screen or blackboard. A technician then pressed a button on the PC to capture a high quality image each time the speaker changed the slide, showed an object, or drew on the blackboard.
The introduction of ClipBoard not only aided in automating the recording process, but also in the post-talk production, when the high quality scanned images were substituted for the preliminary screen snapshots. Because many speakers give their presentation in a non-sequential fashion, skipping from slide to slide or moving from the slide to the blackboard and back, the image substitution can be slightly complicated. To aid in this process, Giosue Vitaglione developed a tool called Snaps-O-Matic [18]. Snaps-O-Matic reads in the Sync-O-Matic internal files, the scanned transparencies, and the snapshot images and presents them to the publisher in a graphical interface. From the interface, the publisher can drag and drop the scanned transparencies to the snapshots. Upon completion, Snaps-O-Matic writes a Sync-O-Matic intermediate file that describes the new lecture. The whole process for this scenario is shown in detail in Figure 4 and Table 2 summarizes the complete set of scenarios.
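The substitution step can be thought of as rewriting a timeline through a mapping from snapshots to scans. The sketch below is an illustration under assumed data structures, not the Snaps-O-Matic implementation: the timeline layout, the file names and the function `substitute_scans` are all hypothetical.

```python
def substitute_scans(timeline, replacements):
    """Replace snapshot images with scanned images where a match was chosen.

    timeline:     list of (seconds, snapshot file) in the order shown
    replacements: dict mapping a snapshot file to the chosen scan file
    Snapshots with no chosen scan (e.g. blackboard shots) are kept as-is.
    """
    return [(t, replacements.get(image, image)) for t, image in timeline]


# The speaker shows a transparency, moves to the blackboard, then returns
# to the same transparency, so two snapshots map to one scan.
timeline = [(0.0, "snap1.jpg"), (80.0, "snap2.jpg"), (200.0, "snap3.jpg")]
replacements = {"snap1.jpg": "scan_a.gif", "snap3.jpg": "scan_a.gif"}

new_timeline = substitute_scans(timeline, replacements)
```

Keeping the timing values untouched and changing only the image names is what allows the drag-and-drop matching to be done in any order, independently of the sequence in which the speaker showed the material.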
While the tools described above do significantly aid in the recording and production processes, it remains the case that lectures presented on non-digital media require some amount of post-talk production work. Typically, this can be estimated as 1 FTE-hour for each hour of lecture. One could imagine using new devices, such as automatic scanning projectors and timing mechanisms, but they do not (yet) exist in the CERN Auditorium.
For important events, such as the presentation of a Nobel Laureate at CERN, we believe the value of the final product certainly justifies the effort. For lower profile events, conference conveners may consider requesting either digital presentations or the submission of the non-digital media before the event, with adequate time for scanning and preparation. This will guarantee immediate publication of the material, following the presentations. This is a significant asset for audiences who would like to participate in or follow an event, but who are either unable to attend locally or via videoconference, or who are separated from the event location by several time-zones.
Table 2: Summary of the various scenarios encountered for visual support material and the techniques employed to record the data.