RELATIONAL MANAGEMENT AND DISPLAY OF SITE ENVIRONMENTAL DATA
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
© 2002 by CRC Press LLC
Lewis Publishers is an imprint of CRC Press LLC
No claim to original U.S. Government works
International Standard Book Number 1-56670-591-6
Library of Congress Card Number 2002019441
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Library of Congress Cataloging-in-Publication Data
Rich, David William, 1952-
Relational management and display of site environmental data / David W. Rich.
p. cm.
Includes bibliographical references and index.
ISBN 1-56670-591-6 (alk. paper)
1. Pollution--Measurement--Data processing. 2. Environmental monitoring--Data processing. 3. Database management. I. Title.
TD193 .R53 2002
The environmental industry is changing, along with the way it manages data. Many projects are making a transition from investigation through remediation to ongoing monitoring. Data management is evolving from individual custom systems for each project to standardized, centralized databases, and many organizations are starting to realize the cost savings of this approach. The objective of Relational Management and Display of Site Environmental Data is to bring together in one place the information necessary to manage the data well, so everyone, from students to project managers, can learn how to benefit from better data management.
This book has come from many sources. It started out as a set of course notes to help transfer knowledge about earth science computing and especially environmental data management to our clients as part of our software and consulting practice. While it is still used for that purpose, it has evolved into a synthesis of theory and a relation of experience in working with site environmental data. It is not intended to be the last word on the way things are or should be done, but rather to help people learn from the experience of others, and avoid mistakes whenever possible.
The book has six main sections plus appendices. Part One provides an overview of the subject and some general concepts, including a discussion of system data content. Part Two covers system design and implementation, including database elements, user interface issues, and implementation and operation of the system. Part Three addresses gathering the data, starting with an overview of site investigation and remediation, progressing through gathering samples in the field, and ending with laboratory analysis. Part Four covers the data management process, including importing, editing, maintaining data quality, and managing multiple projects. Part Five is about using the data once it is in the database. It starts with selecting data, and then covers various aspects of data output and analysis including reporting and display; graphs; cross sections and similar displays; a large chapter on mapping and GIS; statistical analysis; and integration with other programs. Section Six discusses problems, benefits, and successes with implementing a site environmental data management system, along with an attempt to look into the future of data management and environmental projects. Appendices include examples of a needs assessment, a data model, a data transfer standard, typical constituent parameters, some exercises, a glossary, and a bibliography.
A number of people have contributed directly and indirectly to this book, including my parents, Dr. Robert and Audrey Rich; Dr. William Fairley, my uncle and professor of geology at the University of Notre Dame; and Dr. Albert Carozzi, my advisor and friend at the University of Illinois. Numerous coworkers and friends at Texaco, Inc., Shell Oil Company, Sabine Corporation, Grant Environmental, and Geotech Computer Systems, Inc. helped bring me to the point professionally where I could write this book. These include Larry Ratliff, Jim Thomson, Dr. James L. Grant, Neil Geitner, Steve Wampler, Jim Quin, Cathryn Stewart, Bill Thoen, Judy Mitchell, Dr. Mike Wiley, and other Geotech staff members who helped with the book in various ways. Friends in other organizations have also helped me greatly in this process, including Jim Reed of RockWare, Tom Bresnahan of Golden Software, and other early members of the Computer Oriented Geological Society. Thanks also go to Dr. William Ganus, Roy Widmann, Sherron Hendricks, and Frank Schultz of Kerr-McGee for their guidance.
I would also like to specifically thank those who reviewed all or part of the book, including Cathryn Stewart (AquAeTer), Bill Thoen (GISNet), Mike Keester (Oklahoma State University), Bill Ganus and Roy Widmann (Kerr-McGee), Mike Wiley (The Consulting Operation), and Sue Stefanosky and Steve Clough (Roy F. Weston). The improvements are theirs. The errors are still mine.
Finally, my wife, business partner, and best friend, Toni Rich, has supported me throughout my career, hanging in there through the good times and bad, and has always done what she could to make our enterprise successful. She’s also a great proofreader.
Throughout this book a number of trademarks and registered trademarks are used. The registered trademarks are registered in the United States, and may be registered in other countries. Any omissions are unintentional and will be remedied in later editions. Enviro Data and Spase are registered trademarks of Geotech Computer Systems, Incorporated. Microsoft, Office, Windows, NT, Access, SQL Server, Visual Basic, Excel, and FoxPro are trademarks or registered trademarks of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. Paradox and dBase are registered trademarks of Borland International, Incorporated. IBM and DB2 are registered trademarks of International Business Machines Corporation. AutoCAD and AutoCAD Map are registered trademarks of Autodesk, Incorporated. ArcView is a registered trademark of Environmental Systems Research Institute, Incorporated. Norton Ghost is a trademark of Symantec Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Incorporated. Sun is a registered trademark and Sparcstation is a trademark of Sun Microsystems. Capability Maturity Model and CMM are registered trademarks of The Software Engineering Institute of Carnegie Mellon University. Adobe and Acrobat are registered trademarks of Adobe Systems. Grapher is a trademark and Surfer is a registered trademark of Golden Software, Inc. RockWare is a registered trademark and RockWorks and Gridzo are trademarks of RockWare, Inc. Intergraph and GeoMedia are trademarks of Intergraph Corporation. Corel is a trademark and Corel Draw is a registered trademark of Corel Corporation. UNIX is a registered trademark of The Open Group. Linux is a trademark of Linus Torvalds. Use of these products is for illustration only, and does not signify endorsement by the author.
A Web site has been established for updates, exercises, and other information related to this book. It is located at www.geotech.com/relman.
I welcome your comments and questions. I can be reached by email at drdave@geotech.com.
David W. Rich
David W. Rich is founder and president of Geotech Computer Systems, Inc. in Englewood, CO. Geotech provides off-the-shelf and custom software and consulting services for environmental data management, GIS, and other technical computing projects. Dr. Rich received his B.S. in Geology from the University of Notre Dame in 1974, and his M.S. and Ph.D. in Geology from the University of Illinois in 1977 and 1979, with his dissertation on “Porosity in Oolitic Limestones.” He worked for Texaco, Inc. in Tulsa, OK and Shell Oil Company in Houston, TX, exploring for oil and gas in Illinois and Oklahoma. He then moved to Sabine Corporation in Denver, CO as part of a team that successfully explored for oil in the Minnelusa Formation in the Powder River Basin of Wyoming. He directed the data management and graphics groups at Grant Environmental in Englewood, CO where he worked on several projects involving soil and groundwater contaminated with metals, organics, and radiologic constituents. His team created automated systems for mapping and cross section generation directly from a database. In 1986 he founded Geotech Computer Systems, Inc., where he has developed and supervised the development of custom and commercial software for data management, GIS, statistics, and Web data access.
Environmental projects with which Dr. Rich has been directly involved include two Superfund wood treating sites, three radioactive material processing facilities, two hazardous waste disposal facilities, many municipal solid waste landfills, two petroleum refineries, and several mining and petroleum production and transportation projects. He has been the lead developer on three public health projects involving blood lead and related data, including detailed residential environmental measurements. In addition he has been involved in many projects outside of the environmental field, including a real-time Web-based weather mapping system, an agricultural GIS analysis tool, and database systems for petroleum exploration and production data, paleontological data, land ownership, health care tracking, parts inventory and invoice printing, and GPS data capture.
Dr. Rich has been using computers since 1970, and has been applying them to earth science problems since 1975. He was a co-founder and president of the Computer Oriented Geological Society in the early 1980s, and has authored or co-authored more than a dozen technical papers, book chapters, and journal articles on environmental and petroleum data management, geology, and computer applications. He has taught many short courses on geological and environmental computing in several countries, and has given dozens of talks at various industry conventions and other events.
When he is not working, Dr. Rich enjoys spending time with his family and riding his motorcycle in the mountains, and often both at the same time.
PART ONE - OVERVIEW AND CONCEPTS
CHAPTER 1 - OVERVIEW OF ENVIRONMENTAL DATA MANAGEMENT
Concern for the environment
The computer revolution
Convergence - Environmental data management
Concept of data vs information
EMS vs EMIS vs EDMS
CHAPTER 2 - SITE DATA MANAGEMENT CONCEPTS
Purpose of data management
Types of data storage
Responsibility for data management
Understanding the data
CHAPTER 3 - RELATIONAL DATA MANAGEMENT THEORY
What is relational data management?
History of relational data management
Data normalization
Structured Query Language
Benefits of normalization
Automated normalization
CHAPTER 4 - DATA CONTENT
Data content overview
Project technical data
Project administrative data
Project document data
Reference data
Document management
PART TWO - SYSTEM DESIGN AND IMPLEMENTATION
CHAPTER 5 - GENERAL DESIGN ISSUES
Database management software
Database location options
Distributed vs centralized databases
The data model
Data access requirements
Government EDMS systems
Other issues
CHAPTER 6 - DATABASE ELEMENTS
Hardware and software components
Units of data storage
Databases and files
Tables (“databases”)
Fields (columns)
Records (rows)
Queries (views)
Other database objects
CHAPTER 7 - THE USER INTERFACE
General user interface issues
Conceptual guidelines
Guidelines for specific elements
Documentation
CHAPTER 8 - IMPLEMENTING THE DATABASE SYSTEM
Designing the system
Buy or build?
Implementing the system
Managing the system
CHAPTER 9 - ONGOING DATA MANAGEMENT ACTIVITIES
Managing the workflow
Managing the data
Administering the system
PART THREE - GATHERING ENVIRONMENTAL DATA
CHAPTER 10 - SITE INVESTIGATION AND REMEDIATION
Overview of environmental regulations
The investigation and remediation process
Environmental Assessments and Environmental Impact Statements
CHAPTER 11 - GATHERING SAMPLES AND DATA IN THE FIELD
General sampling issues
Other analysis issues
PART FOUR - MAINTAINING THE DATA
CHAPTER 13 - IMPORTING DATA
QC samples and analyses
Data quality procedures
Database support for data quality and usability
Precision vs accuracy
Protection from loss
CHAPTER 16 - DATA VERIFICATION AND VALIDATION
Types of data review
Meaning of verification
Meaning of validation
The verification and validation process
Verification and validation checks
Software assistance with verification and validation
CHAPTER 17 - MANAGING MULTIPLE PROJECTS AND DATABASES
One file or many?
Sharing data elements
Moving between databases
Limiting site access
PART FIVE - USING THE DATA
CHAPTER 18 - DATA SELECTION
Fence diagrams and stick displays
Block Diagrams and 3-D displays
CHAPTER 22 - MAPPING AND GIS
Types of statistical analyses
Outliers and comparison with limits
Toxicology and risk assessment
CHAPTER 24 - INTEGRATION WITH OTHER PROGRAMS
PART SIX - PROBLEMS, BENEFITS, AND SUCCESSES
CHAPTER 25 - AVOIDING PROBLEMS
Manage expectations
Use the right tool
Prepare for problems with the data
Plan project administration
Increasing the chance of a positive outcome
CHAPTER 26 - SUCCESS STORIES
APPENDIX A - NEEDS ASSESSMENT EXAMPLE
APPENDIX B - DATA MODEL EXAMPLE
Database redesign exercise
Data normalization exercise
Group discussion - data management and your organization
Database redesign exercise solution
Data normalization exercise solution
Database software exercises
APPENDIX F - GLOSSARY
APPENDIX G - BIBLIOGRAPHY
PART ONE - OVERVIEW AND CONCEPTS
CONCERN FOR THE ENVIRONMENT
The United States federal government has been regulating human impact on the environment for over a century. Section 13 of the River and Harbor Act of 1899 made it unlawful (with some exceptions) to put any refuse matter into navigable waters (Mackenthun, 1998, p. 20). Since then hundreds of additional laws have been enacted to protect the environment. This regulation occurs at all levels of government from international treaties, through federal and state governments, to individual municipalities. Often this situation of multiple regulatory oversight results in a maze of regulations that makes even legitimate efforts to improve the situation difficult, but it has definitely increased the effort to clean up the environment and keep it clean.
Through the 1950s the general public had very little awareness of or concern about environmental issues. In the 1960s concern for the environment began to grow, helped at least in part by the book Silent Spring by Rachel Carson (Carson, 1962). The ongoing significance of this book is highlighted by the fact that a 1994 edition of the book has a foreword by then Vice President Al Gore. In this book Ms. Carson brought attention to the widespread and sometimes indiscriminate use of DDT and other chlorinated hydrocarbons, organic phosphates, arsenic, and other materials, and the impact of this use on ground and surface water, soil, plants, and animals. She cites examples of workers overcome by exposure to large doses of chemicals, and changes in animal populations after use of these chemicals, to build the case that widespread use of these materials is harmful. She also discusses the link between these chemicals and cancer.
Rachel Carson’s message about concern for the environment came at a time, the 1960s, when America was ready for a “back-to-the-earth” message. With the youth of America and others organizing to oppose the war in Vietnam, the two causes fit well together and encouraged each other’s growth. This was reflected in the music of the time, with many songs in the sixties and seventies discussing environmental issues, often combined with sentiments against the war and nuclear power. The war in Vietnam ended, but the environmental movement lives on.
There are many examples of rock songs of the sixties and seventies discussing environmental issues. In 1968 the rock musical Hair warned about the health effects of sulfur dioxide and carbon monoxide. Zager and Evans in their 1969 song In The Year 2525 talked about taking from the earth and not giving back, and in 1970 the Temptations discussed air pollution and many other social issues in Ball of Confusion. Three Dog Night also warned about air pollution in their 1970 songs Cowboy and Out in the Country. Perhaps the best example of a song about the environment is Marvin Gaye’s 1971 song Mercy Mercy Me (The Ecology), in which he talked about oil polluting the ocean, mercury in fish, and radiation in the air and underground. In 1975 Joni Mitchell told farmers not to use DDT in her song Big Yellow Taxi, and the incomparable songwriter Bob Dylan got into the act with his 1976 song A Hard Rain’s A-gonna Fall, warning about poison in water and global hunger. It’s not a coincidence that this time frame overlaps all of the significant early environmental regulations.
A good example of an organized environmental effort that started in those days and continues today is Earth Day. Organized by Senator Gaylord Nelson and patterned after teach-ins against the war in Vietnam, the first Earth Day was held on April 22, 1970, and an estimated 20 million people around the country attended, according to television anchor Walter Cronkite. In the 10 years after the first Earth Day, 28 significant pieces of federal environmental legislation were passed, along with the establishment of the U.S. Environmental Protection Agency (EPA) in December of 1970. The first major environmental act, the National Environmental Policy Act of 1969 (NEPA), predated Earth Day, and had the stated purposes (Yost, 1997) of establishing harmony between man and the environment; preventing or eliminating damage to the environment; stimulating the health and welfare of man; enriching the understanding of ecological systems; and establishing the Council on Environmental Quality. Since that act, many laws protecting the environment have been passed at the national, state, and local levels.
Evidence that public interest in environmental issues is still high can be found in the public reaction to the book A Civil Action (Harr, 1995). This book describes the experience of people in the town of Woburn, Massachusetts. A number of people in the town became ill and some died due to contamination of groundwater with TCE, an industrial solvent. This book made the New York Times bestseller list, and was later made into a movie starring John Travolta. More recently, the movie Erin Brockovich starring Julia Roberts covered a similar issue in California with Pacific Gas and Electric and problems with hexavalent chrome in groundwater causing serious health issues.
Public interest in the environment is exemplified by the various watchdog organizations that track environmental issues in great detail. A good example of this is Scorecard.org (Environmental Defense, 2001), a Web site that provides a very large amount of information on environmental current events, releases of toxic substances, environmental justice, and similar topics. For example, on this site you can find the largest releasers of pollutants near your residence. Sites like this definitely raise public awareness of environmental issues.
It’s also important to point out that the environmental industry is big business. According to reports by the U.S. Department of Commerce and Environmental Business International (as quoted in Diener, Terkla, and Cooke, 2000), the environmental industry in the U.S. in 1998 had $188.7 billion in sales, up 1.6% from the previous year. It employed 1,354,100 people in 115,850 companies. The worldwide market for environmental goods and services for the same period was estimated to be $484 billion.
Figure 1 - The author (front row center) examining state-of-the-art punch card technology in 1959
THE COMPUTER REVOLUTION
In parallel with growing public concern for the environment has been growth of technology to support a better understanding of environmental conditions. While people have been using computing devices of some sort for over a thousand years and mainframe computers since the 1950s (see Environmental Computing History Timeline sidebar), the advent of personal computers in the 1980s made it possible to use them effectively on environmental projects. For more information on the history of computers, see Augarten (1984) and Evans (1981). Discussions of the history of geological use of computers are contained in Merriam (1983, 1985).
With the advent of Windows-based, consumer-oriented database management programs in the
1990s, the tools were in place to create an environmental data management system (EDMS) to
store data for one or more facilities and use it to improve project management
Computers have assumed an increasingly important role in our lives, both at work and at home. The average American home contains more computers than bathtubs. From electronic watches to microwave ovens, we are using computers of one type or another a significant percentage of our waking hours. In the workplace, computers have changed from big number crunchers cloistered somewhere in a climate-controlled environment to something that sits on our desk (or our lap). No longer are computers used only for massive computing jobs which could not be done by hand, but they are now replacing the manual way of doing our daily work. This is as true in the earth science disciplines as anywhere else. Consequently, industry sages have suggested that those who do not have computer skills will be left behind in the next wave of automation of the workplace. At the least, those who are computer aware will be in a better position to evaluate how computers can help them in their work.
Environmental Computing History Timeline
1000 BC – The Abacus was invented (still in use)
1623 – The first mechanical calculator was invented by German professor Wilhelm Schickard
1834 – Charles Babbage began work on the Analytical Engine, which was never completed
1850 – Charles Lyell was the first person to use statistics in geology
1876 – Alexander Graham Bell patented the telephone
1890 – Herman Hollerith built the Tabulating Machine, which was the first successful mechanical calculating machine
1899 – The River and Harbor Act was the first environmental law passed in the United States
1943 – The Mark 1, an electromechanical calculator, was developed
1946 – ENIAC (Electronic Numerical Integrator and Computer) was completed. (Dick Tracy’s wrist radio also debuted in the comic strip.)
1947 – The transistor was invented by Bardeen, Brattain, and Shockley at Bell Labs
1951 – UNIVAC, the first commercial computer, became available
1952 – Digital plotters were introduced
1958 – The integrated circuit was invented by Jack Kilby at Texas Instruments
1962 – Rachel Carson’s Silent Spring was published, starting the environmental movement
1965 – IBM white paper on computerized contouring appeared
1969 – National Environmental Policy Act (NEPA) was enacted
1970 – The first Earth Day was held
1970 – Relational data management was described by Edgar Codd
1971 – The first microprocessor, the Intel 4004, was introduced
1973 – SQL was introduced by Boyce and Chamberlin
1977 – The Apple II, the first widely accepted personal computer, was introduced
1981 – IBM released its Personal Computer. This was the computer that legitimized small computers for business use
1984 – The Macintosh computer was introduced, the first significant use of a graphical user interface on a personal computer
1985 – Windows 1.0 was released
1990 – Microsoft shipped Windows 3.0, the first widely accepted version
1994 – Netscape Navigator was released by Mosaic Communications, leading to widespread use of the World Wide Web
The growth that we have seen in computer processing power is related to Moore’s law (Moore, 1965; see also Schaller, 1996), which states that the capacity of semiconductor memory doubles every 18 months. The price-performance ratio of most computer components meets or exceeds this law over time. For example, I bought a 10 megabyte hard drive in 1984 for $800. In 2001 I could buy a 20 gigabyte hard drive for $200, a price-performance increase of 8000 times in 17 years. This averages to a doubling about every 16 months. Over the same time, PC processing speed has increased from 4 megahertz for $5000 to 1000 megahertz for $1000, an increase of 1250, a doubling every 20 months. These average to 18 months. So computers become twice as powerful every year and a half, obeying Moore’s law.
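The doubling times quoted above can be checked with a short calculation. The sketch below is an illustration only, not part of the original analysis: the function name doubling_time_months is hypothetical, and 1 gigabyte is taken as 1000 megabytes, which is the assumption needed to reproduce the factor of 8000.

```python
import math


def doubling_time_months(ratio, years):
    """Months per doubling for a price-performance gain of `ratio` over `years`."""
    return years * 12 / math.log2(ratio)


# Disk: 10 MB for $800 (1984) versus 20 GB for $200 (2001), in MB per dollar.
disk_ratio = (20_000 / 200) / (10 / 800)

# Processor: 4 MHz for $5000 versus 1000 MHz for $1000, in MHz per dollar.
cpu_ratio = (1000 / 1000) / (4 / 5000)

disk_months = doubling_time_months(disk_ratio, 17)
cpu_months = doubling_time_months(cpu_ratio, 17)
average_months = (disk_months + cpu_months) / 2

print(f"disk: gain of {disk_ratio:.0f}, doubling every {disk_months:.1f} months")
print(f"cpu:  gain of {cpu_ratio:.0f}, doubling every {cpu_months:.1f} months")
print(f"average doubling time: {average_months:.1f} months")
```

Rounded to whole months, the two results come out near 16 and 20, and their mean near 18, consistent with the figures in the text.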
Unlike 10 or especially 20 years ago, it is now usual in industrial and engineering companies for each employee to have a suitable computer on his or her desk, and for that computer to be networked to other people’s computers and often a server. This computing environment is a good base on which to build a data management system.
As the hardware has developed, so has the data management software. It is now possible to outfit an organization with the software for a client-server data management system starting at $1,000 or $2,000 a seat. Users probably already have the hardware. Adding software customization, training, support, and other costs still allows a powerful data management system to be put in place for a cost which is acceptable for many projects.
In general, computers perform best when problem solving calls for either deductive or inductive reasoning, and poorly when using subjective reasoning. For example, calculating a series of stratigraphic horizon elevations where the ground level elevation and the depth to the formation are known is an example of deductive reasoning. Computers perform optimally on problems requiring deductive reasoning because the answers are precise, requiring explicit computations. Estimating the volume of contamination or contouring a surface is an example of inductive reasoning. Inductive reasoning is less precise, and requires a skilled geoscientist to critique and modify the interpretation. Lastly, the feeling that carbonate aquifers may be more complex than clastic aquifers is an example of subjective reasoning. Subjective reasoning uses qualitative data and is the most difficult of all for computer analysis. In such instances, the analytical potential of the computer is secondary to its ability to store and graphically portray large amounts of information. Graphic capabilities are requisite to earth scientists in order to make qualitative data usable for interpretation.
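The deductive case described above, computing a horizon elevation from a known ground elevation and depth, reduces to a single subtraction per boring. The following is a minimal sketch of that calculation; the function name, boring names, and all numeric values are hypothetical and chosen only for illustration.

```python
def horizon_elevation(ground_elevation, depth_to_top):
    """Elevation of a horizon top: ground elevation minus depth below ground."""
    return ground_elevation - depth_to_top


# Hypothetical borings: (name, ground elevation, depth to formation top), in feet.
borings = [
    ("B-1", 5280.0, 42.5),
    ("B-2", 5275.5, 38.0),
    ("B-3", 5291.0, 55.0),
]

# One explicit computation per boring; no interpretation is involved.
elevations = {
    name: horizon_elevation(ground, depth) for name, ground, depth in borings
}

for name, elevation in elevations.items():
    print(f"{name}: formation top at {elevation:.1f} ft")
```

Each answer follows directly from the inputs, which is why this kind of problem suits a computer so well. Contouring the resulting elevations would already move into the inductive category.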
Another example of appropriate use of computers relative to types of reasoning is the distinction between verification and validation, which is discussed in detail in Chapter 16. Verification, which checks compliance of data with project requirements, is an example of deductive logic. Either a continuing calibration verification sample was run every ten samples or it wasn’t. Validation, on the other hand, which determines the suitability of the data for use, is very subjective, requiring an understanding of sampling conditions, analytical procedures, and the expected use of the data. Verification is easily done with computer software. How far software can go toward complete automation of the validation process remains to be seen.
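The calibration verification check mentioned above is a good candidate for automation because it has a yes-or-no answer. The sketch below shows one way such a check might look. It assumes the analytical run is available as an ordered list of sample type codes, that the code "CCV" marks a calibration verification sample, and that the rule is "no more than ten other samples in a row without a CCV"; the function name and codes are hypothetical, and real project requirements may be worded differently.

```python
def ccv_frequency_ok(run_sequence, max_gap=10):
    """Return True if no more than `max_gap` consecutive non-CCV samples occur."""
    since_ccv = 0
    for sample_type in run_sequence:
        if sample_type == "CCV":
            since_ccv = 0  # a verification sample resets the count
        else:
            since_ccv += 1
            if since_ccv > max_gap:
                return False  # an eleventh sample ran without a CCV
    return True


# Hypothetical runs: "S" is a field sample, "CCV" a calibration verification.
compliant_run = ["S"] * 10 + ["CCV"] + ["S"] * 10
noncompliant_run = ["S"] * 11 + ["CCV"]

print(ccv_frequency_ok(compliant_run))
print(ccv_frequency_ok(noncompliant_run))
```

A check like this only reports whether the requirement was met. Deciding whether data from a noncompliant run is still usable is a validation question and remains a matter of judgment.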
CONVERGENCE - ENVIRONMENTAL DATA MANAGEMENT
Efficient data management is taking on increased importance in many organizations, and yours
is probably no exception. In the words of one author (Diamondstone, 1990, p. 3):
Automated measuring equipment has provided rapidly increasing amounts of
data. Now, the challenge before us is to assure sufficient data uniformity and
compatibility and to implement data quality measures so that these data will be
useful for integrative environmental problem solving.
This is particularly true in organizations where many different types of site investigation and monitoring data are coming from a variety of different sources. Fortunately, software tools are now available which allow off-the-shelf programs to be used by people who are not computer experts to retrieve this data in a format that is meaningful to them. According to Finkelstein (1989, p. 3):
Management is on the threshold of an explosive boom in the use of computers. A
boom initiated by simplicity and ease of use Managers and staff at all levels of
an organization will be able to design and implement their own systems, thereby
dramatically reducing their dependence on the data processing (DP)
department, while still ensuring that DP maintains central control, so that
application systems and their data can be used by others in the business.
With the advent of relatively easy to use software tools such as Microsoft Windows and Microsoft Access, it is even more true now that individuals can have a much greater role in satisfying their own data management needs. It is important to develop a data management approach that makes efficient use of these tools to solve the business problem of managing data within the organization. The environmental data management system that will result from implementation of a plan based on this approach will provide users with access to the organization’s environmental data to satisfy their business needs. It will allow them to expand their data retrievals as their needs change and as their skills develop.
As with most business decisions, the decision to implement a data management system should be based on an analysis of the expected return on the time and money invested. In the case of an office automation project, some of the return is tangible and can be expressed in dollar savings, and some is intangible savings in efficiency in everyday operations. In general, the best approach for system implementation is to look for leverage points in the system where a great return can be had for a small cost. The question becomes: How do you improve the process to get the greatest savings?
Often some examples of tangible returns can be identified within the organization. The benefits can best be seen from analyzing the impact of the data management system on the whole site investigation and remediation process. For example, during remediation you might be able, by more careful tracking and modeling of the contamination, to decrease the amount of waste to be removed or water to be processed. You may also be able to decrease the time required to complete the project and save many person-years of cost by making quality data available in a standardized format and in a timely fashion. For smaller sites, automating the data management process can provide savings by repetition. Once the system has been set up for one site and people trained to use it, that effort can be re-used on the next site.
The intangible benefits of a data management system are difficult to quantify, but subjectively can include increased job satisfaction of project workers, a higher quality work product, and better decision making. The cumulative financial and intangible return on investment of these various benefits can easily justify reasonable expenditures for a data management system.
CONCEPT OF DATA VS INFORMATION
It is important to recognize that there is a difference between numbers and letters stored in a computer and useful information. Numbers stored in a computer, or printed out onto a sheet of paper, may not themselves be of any value. It is only when those numbers are presented in a form that is useful to the intended audience that they become useful information. The keys to making the transition from data to information are organization and access. It doesn't matter if you have a file of all the monitoring wells ever drilled; if you can't get the information you want out of the file, it is useless. Before any database is created, careful attention should be paid to how the data is going to be used, to ensure that the maximum use can be received from the effort.
Statistics and graphics can be tremendously helpful in perceiving relationships among different variables contained in the data. As the power and ease of use of both general business programs and technical programs for statistics and graphics improve, it will become common to take a good look at the data as a set before working with individual members of the set.
The next step is to move from information to knowledge. The difference between the two is understanding. Once you have processed the information and understand it, it becomes knowledge. This transition is a human activity, not a computer activity, but the computer can help by presenting the information in an understandable manner.
EMS VS EMIS VS EDMS
A final overview issue to discuss is the relationship between EMS (environmental management systems), EMIS (environmental management information systems), and site EDMS (environmental data management systems). An EMS is a set of policies and procedures for managing environmental issues for an organization or a facility. An EMIS is a software system implemented to support the administration of the EMS (see Gilbert, 1999). EMIS usually has a focus on record keeping and reporting, and is implemented with the hope of improving business processes and practices. A site environmental data management system (EDMS) is a software system for managing data regarding the environmental impact of current or former operations. EDMS overlaps partially with EMIS. For an operating facility, the EDMS is a part of the EMIS. For a facility no longer in operation, there may be no formal EMS or EMIS, but the EDMS is necessary to facilitate monitoring and cleanup.

Data is or Data are?
Is “data” singular or plural? In this book the word data is used as a singular noun. Depending on your background, you may not like this. Many engineers and scientists think of data as the plural of “datum,” so they consider the word plural. Computer people view data as a chunk of stuff, and, like “chunk,” consider it singular. In one dictionary I consulted (Webster, 1984), data as the plural of datum was the third definition, with the first two being synonyms for “information,” which is always treated as singular. It also states that common usage at this time is singular rather than plural, and that “data can now be used as a singular form in English.” In Strunk and White (1935), a style manual that I use, the discussion of singular vs. plural nouns uses an example of the contents of a jar. If the jar contains marbles, its contents are plural. If it contains jam, its content is singular. You decide: Is data jam or marbles?
CHAPTER 2
SITE DATA MANAGEMENT CONCEPTS
The size and complexity of environmental investigation and monitoring programs at industrial facilities continue to increase. Consequently the amount of environmental data, both at operating facilities and orphan sites, is growing as well. The volume of data often exceeds the capacity of simple tools like paper reports and spreadsheets. When that happens it is appropriate to implement a more powerful data management system, and often the system of choice is a relational database manager. This section provides a top-down discussion of management of environmental data. It focuses on the purpose and goals of environmental data management, and on the types and locations of data storage. These issues should always be resolved before an electronic (or in fact any) data management system is implemented.
PURPOSE OF DATA MANAGEMENT
Why manage data electronically? Or why even manage it at all? Clear answers to these questions are critical before a successful system can be implemented. This section addresses some of the issues related to the purpose of data management. It all comes down to planning. If you understand the goal to be accomplished, you have a better chance of accomplishing it.
There is only one real purpose of data management: to support the goals of the organization. These goals are discussed in detail in Chapter 8. No data management system should be built unless it satisfies one or more significant business or technical goals. Identification of these goals should be done prior to designing and implementing the system for two reasons. One reason is that the achievement of these goals provides the economic justification for the effort of building the system. The other reason is that the system is more likely to generate satisfactory results if those results are understood, at least to a first approximation, before the system is implemented and functionality is frozen.
Different organizations have different things that make them tick. For some organizations, internal considerations such as cost and efficiency are most important. For others, outside appearances are equally or more important. The goals of the organization must be taken into consideration in the design of the system so that the greatest benefit can be achieved. Typical goals include:
Improve efficiency – Environmental site investigation and remediation projects can involve an enormous amount of data. Computerized methods, if they are well designed and implemented, can be a great help in improving the flow of data through the project. They can also be a great sink of time and effort if poorly managed.
Maximize quality – Because of the great importance of the results derived from environmental investigation and remediation, it is critical that the quality of the output be maximized relative to the cost. This is not trivial, and careful data storage, and annotation of data with quality information, can be a great help in achieving data quality objectives.
Minimize cost – No organization has an unlimited amount of money, and even those with a high level of commitment to environmental quality must spend their money wisely to receive the greatest return on their investment. This means that unnecessary costs, whether in time or money, must be minimized. Electronic data management can help contain costs by saving time and minimizing lost data.
People tend to start working on a database without giving a lot of thought to what a database really is. It is more than an accumulation of numbers and letters. It is a special way to help us understand information. Here are some general thoughts about databases:
A database is a model of reality – In many cases, the data that we have for a facility is the only representation that we have for conditions at that facility. This is especially true in the subsurface, and for chemical constituents that are not visible, either because of their physical condition or their location.
The model helps us understand the reality – In general, conditions at sites are nearly infinitely complex. The total combination of geological, hydrological, and engineering factors usually exceeds our ability to understand it without some simplification. Our model of the site, based on the data that we have, helps us to perform this simplification in a meaningful way.
This understanding helps us make decisions – Our simplified understanding of the site allows us to make decisions about actions to be taken to improve the situation at the site. Our model lets us propose and test solutions based on the data that we have, identify additional data that we need, and then choose from the alternative solutions.
The clearer the model, the better the decisions – Since our decisions are based on our data-based model, it follows that we will make better decisions if we have a clear, accurate, up-to-date model. The purpose of a database management system for environmental data is to provide us the information to build accurate models and keep them current.
Clearly information technology, including data management, is important to organizations. Linderholm (2001) reports the results of a study that asked business executives about the importance of information technology (IT) to their business. Of those surveyed, 70% reported that it was absolutely essential, and 20% said it was extremely valuable. The net increase in revenue attributable to IT, after accounting for IT costs, was estimated to be 20%, clearly a good return. In addition, 70% said that the role of IT in business strategy is increasing. In the environmental business the story must be similar, but perhaps not as strong. If you were to survey project managers today about the importance of data management on their projects, the percentage that said it was essential or extremely valuable would probably be less than the 90% quoted above, and maybe less than 50%. But as the amount of data for sites continues to grow, this number will surely increase.
TYPES OF DATA STORAGE
Once the purpose of the system has been determined, the next step is to identify the data to be contained in the system and how it is to be stored. Some data must be stored electronically, while other data might not need to be stored this way. Implementers should first develop a thorough understanding of their existing data and storage methods, and then make decisions about how electronic storage can provide an improvement. This section will cover ways of storing site environmental data. The content of an EDMS will be discussed in Chapter 4.

Environmental problems are complex problems. Complex problems have simple, easy-to-understand wrong answers.
From Environmental Humor by Gerald Rich (1996), reprinted with permission
Hard copy
Since the inception of the environmental industry, hard copy data storage has been its lifeblood. Many organizations have thousands of boxes of paper related to their projects. The importance of this data varies greatly, but in many organizations, it is not well understood.
A data management system for hard copy data is different from a data management system for digital data such as laboratory analytical results. The former is really a document management system, and many vendors offer software and other tools to build this type of system. The latter is more of a technical database issue, and can be addressed by in-house generated solutions or off-the-shelf or semi-custom solutions from environmental software vendors.
LAB REPORTS
Laboratory analyses can generate a large volume of paper. Programs like the U.S. EPA Contract Lab Program (CLP) specify deliverables that can be hundreds of pages for one sampling event. This paper is important as backup for the data, but these hundreds of pages can cause a storage and retrieval problem for many organizations. Often the usable data from the lab event, that is, the data actually used to make site decisions, may be only a small fraction of the paper, with the rest being quality assurance and other backup information.
DERIVED REPORTS
Evaluation of the results of laboratory analysis and other investigation efforts usually results in a printed report. These reports contain a large amount of useful information, but over time can also become a storage and retrieval problem.
Electronic
There are many different ways of organizing data for digital storage. There is no “right” or “wrong” way, but there are approaches that provide greater benefits than others in specific situations. People store environmental data in many different ways, both in database systems and in other file types. Here we will discuss two non-database ways of storing data, and several different database system designs for storing data.
TEXT FILES AND WORD PROCESSOR FILES
The simplest way to manage environmental data is in text files. These files contain just the information of interest, with no formatting or information about the data structure or relationships between different data elements. Usually these files are encoded in ASCII, which stands for American Standard Code for Information Interchange and is pronounced as′-kee. For this reason they are sometimes called ASCII files. Text files can be effectively used for storing and transferring small amounts of data. Because they lack “intelligence” they are not usually practical for large data sets. For example, in order to search for one piece of data in a text file you must look at every word until you find the one you are looking for, rather than using a more efficient method such as the indexed searching used by data management programs.
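The difference between scanning a text file and using an index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field layout, and function names are hypothetical, and a real database manager uses more sophisticated index structures than the simple lookup table shown here.

```python
# Hypothetical records: (well ID, sample date, measured value).
# These values are illustrative only.
records = [
    ("B-1", "1996-02-03", 6.8),
    ("B-2", "1995-11-04", 5.2),
    ("B-3", "1996-02-03", 8.1),
]


def scan_for_well(rows, well_id):
    """Sequential search: examine rows one by one, as with a plain text file."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row[0] == well_id:
            return row, comparisons
    return None, comparisons


def build_index(rows):
    """Build a lookup table from well ID to row, loosely analogous to an index."""
    return {row[0]: row for row in rows}


# The scan has to pass over every earlier row before reaching B-3.
row, comparisons = scan_for_well(records, "B-3")

# The index is built once, and each later lookup goes straight to the row.
index = build_index(records)
indexed_row = index["B-3"]
```

With three records the difference is negligible, but the number of comparisons in the scan grows with the size of the file, while the indexed lookup does not.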
A variation on text files is word processor files, which contain some formatting and structure resulting from the word processing program that created them. An example of this would be the data in a table in a report. Again this works well only for small amounts of data.
SPREADSHEETS
Over the years a large amount of environmental data has been managed in spreadsheets. This approach works for data sets that are small to medium in size, and where the display and retrieval requirements are relatively simple. For large data sets, a database manager program is usually required because spreadsheets have a limit to the number of rows and columns that they can contain, and these limits can easily be exceeded by a large data set. For example, Lotus 1-2-3 has a limit of about 16,000 rows of data, and Excel 97 has a limit of 65,536 rows.
Spreadsheets do have their place in working with environmental data. They are particularly useful for statistical analysis of data and for graphing in a variety of ways. Spreadsheets are for doing calculations. Database managers are for managing data. As long as both are used appropriately, the two together can be very powerful.
The problem with spreadsheets occurs when they are used in situations where real data management is required. For example, it’s not unusual for organizations to manage quarterly groundwater monitoring data using spreadsheets. They can do statistics on the data and print reports. Where the problem becomes evident is when it becomes necessary to do a historical analysis of the data. It can be very difficult to tie the data together. The format of the spreadsheets may have evolved over time. The file for one quarter may be missing or corrupted. Suddenly it becomes a chore to pull all of the data together to answer a question such as “What is the trend of the sulfate values over the last five years?”
DATABASE MANAGERS
For storing large amounts of data, and where immediate calculations are not as important, database managers usually do a better job than spreadsheets, although the capabilities of spreadsheets and databases certainly overlap somewhat. The better database managers allow you to store related data in several different tables and to link them together based on the contents of the data. Many database manager programs have a reputation for not being very easy to use, partly because of the sheer number of options available. This has been improved with the menu-driven interfaces that are now available. These interfaces help with the learning curve, but data management software, especially database server software, can still be very difficult to master.
Many database manager programs provide a programming language, which allows you to automate tasks that you perform often or repeatedly. It also allows you to configure the system for other users. This language provides the tools to develop sophisticated application programs for nearly any data handling need, and provides the basis for some commercial EDMS software.
Database managers are usually classified by how they store and relate data. The most common types are flat file, hierarchical, network, object-oriented, and relational. Most use the terminology of “record” for each object in the database (such as a well or sample location) and “field” for each type of information on each object (such as county or collection date). For information on database management concepts see Date (1981) and Rumble and Hampel (1984).
Sullivan (2001) cites a study by the University of California at Berkeley estimating that humans have generated 12 exabytes (an exabyte is over 1 million terabytes, or a million trillion bytes) of data since the start of time, and will double this in the next two and a half years. Currently, about 20% of the world’s data is contained in relational databases, while the rest is in flat files, audio, video, pre-relational, and unstructured formats.
Flat file
A flat file is a two-dimensional array of data organized in rows and columns similar to a spreadsheet. This is the simplest type of database manager. All of the data for a particular type of object is stored in a single file or table, and each record can have one instance of data for each field. A good analogy is a 3"×5" card file, where there is one card (record) for each item being tracked in the database, and one line (field) for each type of information stored.
Flat file database managers are usually the cheapest to buy, and often the easiest to use, but the complexity of real-world data often requires more power than they can provide.
In a flat file storage system, each row represents one observation, such as a boring or a sample. Each column contains the same kind of data. An example of a flat file of environmental data is shown in the following table:
B-1  725  1050   681  2/3/96   JLG  05 not det 6.8
B-1  725  1050   681  5/8/96   DWR  05 not det 05 not det 6.7
B-2  706   342   880  11/4/95  JAM  3.7 detected 9.1 detected 5.2
B-2  706   342   880  2/3/96   JLG  2.1 detected 8.4 detected 5.3
B-2  706   342   880  5/8/96   DWR  1.4 detected 7.2 detected 5.8
B-3  714   785  1101  2/3/96   JLG  05 not det 8.1
B-3  714   785  1101  5/8/96   CRS  05 not det 05 not det 7.9
Figure 2 - Flat file of environmental data
In this table, each line is the result of one sampling event for an observation well. Since the wells were sampled more than once, and analyzed for multiple parameters, information specific to the well, such as the elevation and location (X and Y), is repeated. This wastes space and increases the chance for error since the same data element must be entered more than once. The same is true for sampling events, represented here by the date and the initials of the person doing the sampling. Also, since the format for the analysis results requires space for each value, if the value is missing, as it is for some of the chloride measurements, the space for that data is wasted.
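The repetition described above can be counted directly. The following Python sketch uses the first six fields of each line in Figure 2 (well, elevation, X, Y, date, and sampler initials); the variable names are assumptions made for this illustration, and the analytical result columns are left out.

```python
# Well and sampling-event fields transcribed from the flat file in Figure 2.
# The tuple layout (well, elevation, x, y, date, sampler) is an assumption
# based on the description in the text.
flat_rows = [
    ("B-1", 725, 1050, 681, "2/3/96", "JLG"),
    ("B-1", 725, 1050, 681, "5/8/96", "DWR"),
    ("B-2", 706, 342, 880, "11/4/95", "JAM"),
    ("B-2", 706, 342, 880, "2/3/96", "JLG"),
    ("B-2", 706, 342, 880, "5/8/96", "DWR"),
    ("B-3", 714, 785, 1101, "2/3/96", "JLG"),
    ("B-3", 714, 785, 1101, "5/8/96", "CRS"),
]

# In the flat file, elevation, X, and Y are stored once per row.
well_values_stored = 3 * len(flat_rows)

# Storing each well only once removes the repetition.
wells = {}
for well_id, elevation, x, y, _date, _sampler in flat_rows:
    wells[well_id] = (elevation, x, y)
well_values_needed = 3 * len(wells)

redundant_values = well_values_stored - well_values_needed
print(f"{redundant_values} of {well_values_stored} well values are repeats")
```

For this small table, 12 of the 21 stored well values are repeats of values already entered, and each repeat is another opportunity for a typing error.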
In general, flat files work acceptably for managing small amounts of data such as individual sampling events. They become less efficient as the size of the database grows. Examples of flat file data management programs are FileMaker Pro (www.filemaker.com) and Web-based database programs such as QuickBase (www.quickbase.com).
Hierarchical
In the hierarchical design, the one-to-many relationship common to many data sets is formalized into the database design. This design works well for situations such as multiple samples for each boring, but has difficulty with other situations such as many-to-many relationships. This type of program is less common than flat files or relational database managers, but is appropriate for some types of data. In a hierarchical database, data elements can be viewed as branches of an inverted tree.
A good example of a hierarchical database might be a database of organisms. At the top would be the kingdom, and underneath that would be the phyla for each kingdom. Each phylum belongs to only one kingdom, but each kingdom can have several phyla. The same continues down the line for class, order, and so on. The most important factor in fitting data into this scheme is that there must be no data element at one level that needs to be under more than one element at a higher level. If a crinoid could be both a plant and an animal at the same time, it could not be classified in a hierarchical database by phylogeny (which biological kingdom it evolved from).
Environmental site data is for the most part hierarchical in nature. Each site can have many monitoring wells. Each well can have many samples, either over time or by depth. Then each sample can be analyzed for multiple constituents. Each constituent analysis comes from one specific sample, which comes from one well, which comes from one site.
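The site, well, sample, and analysis hierarchy just described can be sketched as nested Python structures. All names and values here are hypothetical and serve only to show the shape of the tree; this is not how a hierarchical database manager stores data internally.

```python
# Hypothetical site data; names and values are illustrative only.
site = {
    "name": "Example Site",
    "wells": [
        {
            "well_id": "MW-1",
            "samples": [
                {
                    "date": "1996-02-03",
                    "analyses": [
                        {"constituent": "sulfate", "value": 3.7},
                        {"constituent": "chloride", "value": 9.1},
                    ],
                },
                {
                    "date": "1996-05-08",
                    "analyses": [{"constituent": "sulfate", "value": 2.1}],
                },
            ],
        },
        {"well_id": "MW-2", "samples": []},
    ],
}


def walk_analyses(site):
    """Yield one (site, well, date, constituent, value) tuple per analysis."""
    for well in site["wells"]:
        for sample in well["samples"]:
            for analysis in sample["analyses"]:
                yield (
                    site["name"],
                    well["well_id"],
                    sample["date"],
                    analysis["constituent"],
                    analysis["value"],
                )


rows = list(walk_analyses(site))
```

Each analysis is reached by exactly one path from the top of the tree, which is the defining property of a hierarchical design.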
A data set which is inherently hierarchical can be stored in a relational database manager, and relational database managers are somewhat more flexible, so pure hierarchical database managers are now rare.
Network
In the network data model, multiple relationships between different elements at the same level are easy to manage. Hypertext systems (such as the World Wide Web) are examples of managing data this way. Network database managers are not common, but are appropriate in some cases, especially those in which the interrelationships among data are complex.
An example of a network database would be a database of authors and articles. Each author may have written many articles, and each article may have one or more authors. This is called a “many-to-many” relationship. This is a good project for a network database manager. Each author is entered, as is each article. Then the links between authors and articles are established. Once the network is in place, an article can be called up, and the information on its authors can be retrieved. Likewise, an author can be named, and his or her articles listed.
A network data topology (geometric configuration) can be stored in a relational database manager. A “join table” is needed to handle the many-to-many relationships. Storing the above article database in a relational system would require three tables, one for authors, one for articles, and a join table with the connections between them.
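The three-table arrangement for authors and articles can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the table names, column names, and sample rows are all hypothetical, and SQLite is used only because it is readily available, not because the text prescribes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors  (author_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
    -- Join table: one row per link between an author and an article.
    CREATE TABLE author_article (
        author_id  INTEGER REFERENCES authors(author_id),
        article_id INTEGER REFERENCES articles(article_id),
        PRIMARY KEY (author_id, article_id)
    );
""")

# Hypothetical data: Article X has two authors, Author A has two articles.
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Author A"), (2, "Author B")])
conn.executemany("INSERT INTO articles VALUES (?, ?)",
                 [(10, "Article X"), (20, "Article Y")])
conn.executemany("INSERT INTO author_article VALUES (?, ?)",
                 [(1, 10), (2, 10), (1, 20)])


def authors_of(title):
    """Call up an article and retrieve its authors through the join table."""
    rows = conn.execute("""
        SELECT a.name
        FROM authors a
        JOIN author_article aa ON aa.author_id = a.author_id
        JOIN articles r ON r.article_id = aa.article_id
        WHERE r.title = ?
        ORDER BY a.name
    """, (title,))
    return [name for (name,) in rows]


def articles_by(name):
    """Name an author and list his or her articles through the join table."""
    rows = conn.execute("""
        SELECT r.title
        FROM articles r
        JOIN author_article aa ON aa.article_id = r.article_id
        JOIN authors a ON a.author_id = aa.author_id
        WHERE a.name = ?
        ORDER BY r.title
    """, (name,))
    return [title for (title,) in rows]
```

The same join table serves both directions of the relationship, so neither the authors table nor the articles table needs a repeating field.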
Object oriented
This relatively recent invention stores each data element as an object with properties and methods encapsulated (wrapped up) into each object. This is a deviation from the usual separation of code and data, but is being used successfully in many settings. Current object-oriented systems do not provide the data retrieval speed on large data sets provided by relational systems. Using this type of software involves a complete re-education of the user, since different terminology and concepts are used. It is a very powerful way to manipulate data for many purposes, and is likely to see more widespread use. Some of the features of object-oriented databases are described in the next few paragraphs.
Encapsulation – Traditional programming languages focus on what is to be done. This is referred to as “procedural programming.” Object-oriented programming focuses on objects, which are a blend of data and program code (Watterson, 1989). In a procedural paradigm (a paradigm is an approach or model), the data and the programs are separate. In an object-oriented paradigm, the objects consist of data that knows what to do with itself, that is, objects contain methods for performing actions. This is called encapsulation. Thus, instead of applying procedures to passive data, in object-oriented programming systems (OOPS), methods are part of the objects.
Some examples of the difference between procedural systems and OOPS might be helpful. In a procedural system, the data for a well could contain a field for well type, such as monitoring well or soil boring. The program operating on the data would know what symbol to draw on the map based on the contents of that field. In an OOPS the object called “soil boring” would include a method to draw its symbol, based on the data content (properties) of the object. Properties of objects in OOPS are usually loosely typed, which means that the distinction between data types such as integers and characters is not rigorously defined. This can be useful when, as is often the case, a numeric property such as depth to a particular formation needs to be filled with character values such as NP (not present) or NDE (not deep enough).
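The contrast between the two styles can be sketched in Python, which is itself loosely typed in the sense described above. The class names, the text symbols, and the function name are hypothetical choices for this illustration.

```python
# Procedural style: passive data, plus a separate function that inspects
# the well type field to decide which symbol to use.
def symbol_for(record):
    if record["well_type"] == "monitoring well":
        return "(o)"
    if record["well_type"] == "soil boring":
        return "[x]"
    raise ValueError("unknown well type")


# Object-oriented style: each object carries its data and its own method.
class MonitoringWell:
    def __init__(self, name, depth_to_formation):
        self.name = name
        # Loosely typed property: a number, or a code such as "NP" or "NDE".
        self.depth_to_formation = depth_to_formation

    def symbol(self):
        return "(o)"


class SoilBoring:
    def __init__(self, name, depth_to_formation):
        self.name = name
        self.depth_to_formation = depth_to_formation

    def symbol(self):
        return "[x]"


procedural = symbol_for({"name": "B-1", "well_type": "soil boring"})
boring = SoilBoring("B-1", "NDE")     # formation not reached: character code
well = MonitoringWell("MW-1", 42.5)   # formation reached: numeric depth
```

In the procedural version, adding a new well type means editing `symbol_for`; in the object version, a new class brings its own `symbol` method with it.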
For another illustration, imagine modeling a rock or soil body subject to chemical and physical processes such as leaching or neutralization using an OOPS. Each mineral phase would be an object of class “mineral,” while each fluid phase would be an object of class “fluid.” Methods known to the objects would include precipitation, dissolution, compaction, and so on. The model is given an initial condition, and then the objects interact via messages triggering methods until some final state is reached.
Inheritance – Objects in an OOPS belong to classes, and members of a particular class share the same methods. Also, similar classes of objects can inherit properties and methods from an existing class. This feature, called inheritance, allows a building-block approach to designing a database system by first creating simple objects and then building on and combining them into more complex objects. In this way, an object called “site” made up of “well” objects would know how to display itself with no additional programming.
Message Passing – An object-oriented program communicates with objects via messages, and objects can exchange messages as well. For example, an object of class “excavated material” could send a message to an object of class “remediation pit” which would update the property “remaining material” within object “remediation pit.”
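In most current object-oriented languages, sending a message amounts to calling a method on another object. The excavation example might be sketched in Python as follows; the method names, the quantities, and the implied unit are assumptions made for illustration.

```python
class RemediationPit:
    def __init__(self, remaining_material):
        # Quantity of material still in the pit (unit is an assumption).
        self.remaining_material = remaining_material

    def material_removed(self, amount):
        """Handle the 'material removed' message by updating the property."""
        self.remaining_material -= amount


class ExcavatedMaterial:
    def __init__(self, amount):
        self.amount = amount

    def report_to(self, pit):
        """Send a message to the pit; the pit decides how to update itself."""
        pit.material_removed(self.amount)


pit = RemediationPit(remaining_material=500)
ExcavatedMaterial(120).report_to(pit)
ExcavatedMaterial(80).report_to(pit)
```

The excavated material object never touches the pit's property directly. It only sends the message, and the pit's own method performs the update.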
Polymorphism – A method is part of an object, and is distinct from messages between objects. The objects “well” and “boring” could both contain the method “draw yourself,” and sending the “draw yourself” message to one or the other object will cause a similar but different result. This is referred to as polymorphism.
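Inheritance and polymorphism can be seen together in a short Python sketch. The base class, the shape names returned by each method, and the “site” container are hypothetical; they are meant only to show the same message producing different results.

```python
class SiteObject:
    """Base class: subclasses inherit the name property and describe()."""

    def __init__(self, name):
        self.name = name

    def draw_yourself(self):
        raise NotImplementedError("each subclass supplies its own drawing")

    def describe(self):
        return f"{self.name}: {self.draw_yourself()}"


class Well(SiteObject):
    def draw_yourself(self):
        return "circle"


class Boring(SiteObject):
    def draw_yourself(self):
        return "cross"


class Site:
    """A site built from member objects; it draws itself by asking each one."""

    def __init__(self, name, members):
        self.name = name
        self.members = members

    def draw_yourself(self):
        return [member.draw_yourself() for member in self.members]


site = Site("Example Site", [Well("MW-1"), Boring("B-1")])
drawn = site.draw_yourself()
```

The site sends the same “draw yourself” message to every member without checking its type, and `describe` is written once in the base class yet works for both wells and borings.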
Object-oriented programming directly models the application, with messages being passed between objects being the analog of real-world processes (Thomas, 1989). Software written in this way is easier to maintain because programmers, other than the author, can easily comprehend the program code. Since program code is easily reusable, development of complex applications can be done more quickly and smoothly. Encapsulation, message passing, inheritance, and polymorphism give OOPS developers very different tools from those provided by traditional programming languages. Also, OOPS often use a graphical user interface and large amounts of memory, making them more suitable to high-end computer systems. For these reasons, OOPS have been slow in gaining acceptance, but they are gaining momentum and are considered by many to be the programming system of the future.
Examples of object-oriented programming languages include Smalltalk, developed by Xerox at the Palo Alto Research Center in the 1970s (Goldberg and Robson, 1983); C++, which is a superset of the C programming language (Stroustrup, 1986); and HyperCard for the Macintosh. NeXTSTEP, the programming environment for the NeXT computer, also uses the object-oriented paradigm.
There are several database management programs that are designed to be object oriented, which means that their primary data storage design is to store objects. Also, a number of relational database management systems have recently added object data types to allow object-oriented applications to use them as data repositories, and are referred to as object-relational systems.
Relational
Relational database managers and SQL are discussed in much greater detail in Chapter 3, and are described here briefly for comparison with other database manager types. In the relational model, data is stored in one or more tables, and these tables are related, that is, they can be joined together, based on data elements within the tables. This allows storage of data where there may be many pieces of one type of information related to one object (one-to-many relationship), as well as other relationships such as hierarchical and many-to-many. In many cases, this has been found to be the most efficient form of data storage for large, complicated databases, because it provides efficient data storage combined with flexible data retrieval. Currently the most popular type of database manager program is the relational type.
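A one-to-many relationship joined on a shared data element can be sketched with Python's built-in sqlite3 module. The well and sampling values below are taken from Figure 2, but the table names, column names, ISO date format, and month/day/year reading of the dates are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One record per well: the well-level data is stored only once.
conn.execute(
    "CREATE TABLE wells ("
    "well_id TEXT PRIMARY KEY, elevation REAL, x REAL, y REAL)"
)
# Many sampling events per well, related to the wells table by well_id.
conn.execute(
    "CREATE TABLE samples ("
    "well_id TEXT REFERENCES wells(well_id), sample_date TEXT, sampler TEXT)"
)

conn.executemany(
    "INSERT INTO wells VALUES (?, ?, ?, ?)",
    [("B-1", 725, 1050, 681), ("B-2", 706, 342, 880)],
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("B-1", "1996-02-03", "JLG"),
        ("B-1", "1996-05-08", "DWR"),
        ("B-2", "1995-11-04", "JAM"),
    ],
)

# Join the two tables on the shared well_id to rebuild the combined view.
joined = conn.execute("""
    SELECT w.well_id, w.elevation, s.sample_date
    FROM wells w
    JOIN samples s ON s.well_id = w.well_id
    ORDER BY w.well_id, s.sample_date
""").fetchall()
```

The join reproduces the repeated well values only at retrieval time, so a correction to an elevation needs to be made in just one record.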
A file of monitoring well data provides a good example of how real-world data can be stored in a relational database manager. One table is created which contains the header data for the well including location, date drilled, elevation, and so on, with one record for each well. For each well, the driller or logger will report a different number of formation tops, so a table of formation tops is created, with one record for each top. A unique identifier such as well ID number relates the two tables to each other. Each well can also have one or more screened intervals, and a table is created for each of those data types, and related by the same ID number. Each screened interval can have multiple sampling events, with a description for each, so another table can be created for these sample events, which can be related by well ID number and sample event number. Very complex