
RELATIONAL MANAGEMENT AND DISPLAY OF SITE ENVIRONMENTAL DATA - PART 1


DOCUMENT INFORMATION

Basic information

Title: Relational Management and Display of Site Environmental Data
Author: David W. Rich
Publisher: Lewis Publishers, CRC Press LLC
Subject: Environmental Data Management
Type: Textbook
Year of publication: 2002
City: Boca Raton


This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by CRC Press LLC
Lewis Publishers is an imprint of CRC Press LLC

No claim to original U.S. Government works
International Standard Book Number 1-56670-591-6
Library of Congress Card Number 2002019441
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0

Printed on acid-free paper

Library of Congress Cataloging-in-Publication Data

Rich, David William, 1952-
Relational management and display of site environmental data / David W. Rich.
p. cm.
Includes bibliographical references and index.
ISBN 1-56670-591-6 (alk. paper)
1. Pollution--Measurement--Data processing. 2. Environmental monitoring--Data processing. 3. Database management. I. Title.
TD193 .R53 2002

The environmental industry is changing, along with the way it manages data. Many projects are making a transition from investigation through remediation to ongoing monitoring. Data management is evolving from individual custom systems for each project to standardized, centralized databases, and many organizations are starting to realize the cost savings of this approach. The objective of Relational Management and Display of Site Environmental Data is to bring together in one place the information necessary to manage the data well, so everyone, from students to project managers, can learn how to benefit from better data management.

This book has come from many sources. It started out as a set of course notes to help transfer knowledge about earth science computing and especially environmental data management to our clients as part of our software and consulting practice. While it is still used for that purpose, it has evolved into a synthesis of theory and a relation of experience in working with site environmental data. It is not intended to be the last word on the way things are or should be done, but rather to help people learn from the experience of others, and avoid mistakes whenever possible.

The book has six main sections plus appendices. Part One provides an overview of the subject and some general concepts, including a discussion of system data content. Part Two covers system design and implementation, including database elements, user interface issues, and implementation and operation of the system. Part Three addresses gathering the data, starting with an overview of site investigation and remediation, progressing through gathering samples in the field, and ending with laboratory analysis. Part Four covers the data management process, including importing, editing, maintaining data quality, and managing multiple projects. Part Five is about using the data once it is in the database. It starts with selecting data, and then covers various aspects of data output and analysis including reporting and display; graphs; cross sections and similar displays; a large chapter on mapping and GIS; statistical analysis; and integration with other programs. Part Six discusses problems, benefits, and successes with implementing a site environmental data management system, along with an attempt to look into the future of data management and environmental projects. Appendices include examples of a needs assessment, a data model, a data transfer standard, typical constituent parameters, some exercises, a glossary, and a bibliography.

A number of people have contributed directly and indirectly to this book, including my parents, Dr. Robert and Audrey Rich; Dr. William Fairley, my uncle and professor of geology at the University of Notre Dame; and Dr. Albert Carozzi, my advisor and friend at the University of Illinois. Numerous coworkers and friends at Texaco, Inc., Shell Oil Company, Sabine Corporation, Grant Environmental, and Geotech Computer Systems, Inc. helped bring me to the point professionally where I could write this book. These include Larry Ratliff, Jim Thomson, Dr. James L. Grant, Neil Geitner, Steve Wampler, Jim Quin, Cathryn Stewart, Bill Thoen, Judy Mitchell, Dr. Mike Wiley, and other Geotech staff members who helped with the book in various ways. Friends in other organizations have also helped me greatly in this process, including Jim Reed of RockWare, Tom Bresnahan of Golden Software, and other early members of the Computer Oriented Geological Society. Thanks also go to Dr. William Ganus, Roy Widmann, Sherron Hendricks, and Frank Schultz of Kerr-McGee for their guidance.

I would also like to specifically thank those who reviewed all or part of the book, including Cathryn Stewart (AquAeTer), Bill Thoen (GISNet), Mike Keester (Oklahoma State University), Bill Ganus and Roy Widmann (Kerr-McGee), Mike Wiley (The Consulting Operation), and Sue Stefanosky and Steve Clough (Roy F. Weston). The improvements are theirs. The errors are still mine.

Finally, my wife, business partner, and best friend, Toni Rich, has supported me throughout my career, hanging in there through the good times and bad, and has always done what she could to make our enterprise successful. She’s also a great proofreader.

Throughout this book a number of trademarks and registered trademarks are used. The registered trademarks are registered in the United States, and may be registered in other countries. Any omissions are unintentional and will be remedied in later editions. Enviro Data and Spase are registered trademarks of Geotech Computer Systems, Incorporated. Microsoft, Office, Windows, NT, Access, SQL Server, Visual Basic, Excel, and FoxPro are trademarks or registered trademarks of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. Paradox and dBase are registered trademarks of Borland International, Incorporated. IBM and DB2 are registered trademarks of International Business Machines Corporation. AutoCAD and AutoCAD Map are registered trademarks of Autodesk, Incorporated. ArcView is a registered trademark of Environmental Systems Research Institute, Incorporated. Norton Ghost is a trademark of Symantec Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Incorporated. Sun is a registered trademark and Sparcstation is a trademark of Sun Microsystems. Capability Maturity Model and CMM are registered trademarks of The Software Engineering Institute of Carnegie Mellon University. Adobe and Acrobat are registered trademarks of Adobe Systems. Grapher is a trademark and Surfer is a registered trademark of Golden Software, Inc. RockWare is a registered trademark and RockWorks and Gridzo are trademarks of RockWare, Inc. Intergraph and GeoMedia are trademarks of Intergraph Corporation. Corel is a trademark and Corel Draw is a registered trademark of Corel Corporation. UNIX is a registered trademark of The Open Group. Linux is a trademark of Linus Torvalds. Use of these products is for illustration only, and does not signify endorsement by the author.

A Web site has been established for updates, exercises, and other information related to this book. It is located at www.geotech.com/relman.

I welcome your comments and questions. I can be reached by email at drdave@geotech.com.

David W. Rich


David W. Rich is founder and president of Geotech Computer Systems, Inc. in Englewood, CO. Geotech provides off-the-shelf and custom software and consulting services for environmental data management, GIS, and other technical computing projects. Dr. Rich received his B.S. in Geology from the University of Notre Dame in 1974, and his M.S. and Ph.D. in Geology from the University of Illinois in 1977 and 1979, with his dissertation on “Porosity in Oolitic Limestones.” He worked for Texaco, Inc. in Tulsa, OK and Shell Oil Company in Houston, TX, exploring for oil and gas in Illinois and Oklahoma. He then moved to Sabine Corporation in Denver, CO as part of a team that successfully explored for oil in the Minnelusa Formation in the Powder River Basin of Wyoming. He directed the data management and graphics groups at Grant Environmental in Englewood, CO where he worked on several projects involving soil and groundwater contaminated with metals, organics, and radiologic constituents. His team created automated systems for mapping and cross section generation directly from a database. In 1986 he founded Geotech Computer Systems, Inc., where he has developed and supervised the development of custom and commercial software for data management, GIS, statistics, and Web data access.

Environmental projects with which Dr. Rich has been directly involved include two Superfund wood treating sites, three radioactive material processing facilities, two hazardous waste disposal facilities, many municipal solid waste landfills, two petroleum refineries, and several mining and petroleum production and transportation projects. He has been the lead developer on three public health projects involving blood lead and related data, including detailed residential environmental measurements. In addition he has been involved in many projects outside of the environmental field, including a real-time Web-based weather mapping system, an agricultural GIS analysis tool, and database systems for petroleum exploration and production data, paleontological data, land ownership, health care tracking, parts inventory and invoice printing, and GPS data capture.

Dr. Rich has been using computers since 1970, and has been applying them to earth science problems since 1975. He was a co-founder and president of the Computer Oriented Geological Society in the early 1980s, and has authored or co-authored more than a dozen technical papers, book chapters, and journal articles on environmental and petroleum data management, geology, and computer applications. He has taught many short courses on geological and environmental computing in several countries, and has given dozens of talks at various industry conventions and other events.

When he is not working, Dr. Rich enjoys spending time with his family and riding his motorcycle in the mountains, and often both at the same time.


PART ONE - OVERVIEW AND CONCEPTS

CHAPTER 1 - OVERVIEW OF ENVIRONMENTAL DATA MANAGEMENT

Concern for the environment

The computer revolution

Convergence - Environmental data management

Concept of data vs information

EMS vs EMIS vs EDMS

CHAPTER 2 - SITE DATA MANAGEMENT CONCEPTS

Purpose of data management

Types of data storage

Responsibility for data management

Understanding the data

CHAPTER 3 - RELATIONAL DATA MANAGEMENT THEORY

What is relational data management?

History of relational data management

Data normalization

Structured Query Language

Benefits of normalization

Automated normalization

CHAPTER 4 - DATA CONTENT

Data content overview

Project technical data

Project administrative data

Project document data

Reference data

Document management

PART TWO - SYSTEM DESIGN AND IMPLEMENTATION

CHAPTER 5 - GENERAL DESIGN ISSUES

Database management software


Database location options

Distributed vs centralized databases

The data model

Data access requirements

Government EDMS systems

Other issues

CHAPTER 6 - DATABASE ELEMENTS

Hardware and software components

Units of data storage

Databases and files

Tables (“databases”)

Fields (columns)

Records (rows)

Queries (views)

Other database objects

CHAPTER 7 - THE USER INTERFACE

General user interface issues

Conceptual guidelines

Guidelines for specific elements

Documentation

CHAPTER 8 - IMPLEMENTING THE DATABASE SYSTEM

Designing the system

Buy or build?

Implementing the system

Managing the system

CHAPTER 9 - ONGOING DATA MANAGEMENT ACTIVITIES

Managing the workflow

Managing the data

Administering the system

PART THREE - GATHERING ENVIRONMENTAL DATA

CHAPTER 10 - SITE INVESTIGATION AND REMEDIATION

Overview of environmental regulations

The investigation and remediation process

Environmental Assessments and Environmental Impact Statements

CHAPTER 11 - GATHERING SAMPLES AND DATA IN THE FIELD

General sampling issues

Other analysis issues

PART FOUR - MAINTAINING THE DATA

CHAPTER 13 - IMPORTING DATA

QC samples and analyses

Data quality procedures

Database support for data quality and usability

Precision vs accuracy

Protection from loss

CHAPTER 16 - DATA VERIFICATION AND VALIDATION

Types of data review

Meaning of verification

Meaning of validation

The verification and validation process

Verification and validation checks

Software assistance with verification and validation

CHAPTER 17 - MANAGING MULTIPLE PROJECTS AND DATABASES

One file or many?

Sharing data elements

Moving between databases

Limiting site access

PART FIVE - USING THE DATA

CHAPTER 18 - DATA SELECTION


Fence diagrams and stick displays

Block Diagrams and 3-D displays

CHAPTER 22 - MAPPING AND GIS

Types of statistical analyses

Outliers and comparison with limits

Toxicology and risk assessment

CHAPTER 24 - INTEGRATION WITH OTHER PROGRAMS

PART SIX - PROBLEMS, BENEFITS, AND SUCCESSES

CHAPTER 25 - AVOIDING PROBLEMS

Manage expectations

Use the right tool

Prepare for problems with the data

Plan project administration

Increasing the chance of a positive outcome


CHAPTER 26 - SUCCESS STORIES

APPENDIX A - NEEDS ASSESSMENT EXAMPLE

APPENDIX B - DATA MODEL EXAMPLE

Database redesign exercise

Data normalization exercise

Group discussion - data management and your organization

Database redesign exercise solution

Data normalization exercise solution

Database software exercises

APPENDIX F - GLOSSARY

APPENDIX G - BIBLIOGRAPHY

PART ONE - OVERVIEW AND CONCEPTS

CONCERN FOR THE ENVIRONMENT

The United States federal government has been regulating human impact on the environment for over a century. Section 13 of the River and Harbor Act of 1899 made it unlawful (with some exceptions) to put any refuse matter into navigable waters (Mackenthun, 1998, p. 20). Since then hundreds of additional laws have been enacted to protect the environment. This regulation occurs at all levels of government from international treaties, through federal and state governments, to individual municipalities. Often this situation of multiple regulatory oversight results in a maze of regulations that makes even legitimate efforts to improve the situation difficult, but it has definitely increased the effort to clean up the environment and keep it clean.

Through the 1950s the general public had very little awareness of or concern about environmental issues. In the 1960s concern for the environment began to grow, helped at least in part by the book Silent Spring by Rachel Carson (Carson, 1962). The ongoing significance of this book is highlighted by the fact that a 1994 edition of the book has a foreword by then Vice President Al Gore. In this book Ms. Carson brought attention to the widespread and sometimes indiscriminate use of DDT and other chlorinated hydrocarbons, organic phosphates, arsenic, and other materials, and the impact of this use on ground and surface water, soil, plants, and animals. She cites examples of workers overcome by exposure to large doses of chemicals, and changes in animal populations after use of these chemicals, to build the case that widespread use of these materials is harmful. She also discusses the link between these chemicals and cancer.


Rachel Carson’s message about concern for the environment came at a time, the 1960s, when America was ready for a “back-to-the-earth” message. With the youth of America and others organizing to oppose the war in Vietnam, the two causes fit well together and encouraged each other’s growth. This was reflected in the music of the time, with many songs in the sixties and seventies discussing environmental issues, often combined with sentiments against the war and nuclear power. The war in Vietnam ended, but the environmental movement lives on.

There are many examples of rock songs of the sixties and seventies discussing environmental issues. In 1968 the rock musical Hair warned about the health effects of sulfur dioxide and carbon monoxide. Zager and Evans in their 1969 song In The Year 2525 talked about taking from the earth and not giving back, and in 1970 the Temptations discussed air pollution and many other social issues in Ball of Confusion. Three Dog Night also warned about air pollution in their 1970 songs Cowboy and Out in the Country. Perhaps the best example of a song about the environment is Marvin Gaye’s 1971 song Mercy Mercy Me (The Ecology), in which he talked about oil polluting the ocean, mercury in fish, and radiation in the air and underground. In 1975 Joni Mitchell told farmers not to use DDT in her song Big Yellow Taxi, and the incomparable songwriter Bob Dylan got into the act with his 1976 song A Hard Rain’s A-gonna Fall, warning about poison in water and global hunger. It’s not a coincidence that this time frame overlaps all of the significant early environmental regulations.

A good example of an organized environmental effort that started in those days and continues today is Earth Day. Organized by Senator Gaylord Nelson and patterned after teach-ins against the war in Vietnam, the first Earth Day was held on April 22, 1970, and an estimated 20 million people around the country attended, according to television anchor Walter Cronkite. In the 10 years after the first Earth Day, 28 significant pieces of federal environmental legislation were passed, along with the establishment of the U.S. Environmental Protection Agency (EPA) in December of 1970. The first major environmental act, the National Environmental Policy Act of 1969 (NEPA), predated Earth Day, and had the stated purposes (Yost, 1997) of establishing harmony between man and the environment; preventing or eliminating damage to the environment; stimulating the health and welfare of man; enriching the understanding of ecological systems; and establishing the Council on Environmental Quality. Since that act, many laws protecting the environment have been passed at the national, state, and local levels.

Evidence that public interest in environmental issues is still high can be found in the public reaction to the book A Civil Action (Harr, 1995). This book describes the experience of people in the town of Woburn, Massachusetts. A number of people in the town became ill and some died due to contamination of groundwater with TCE, an industrial solvent. This book made the New York Times bestseller list, and was later made into a movie starring John Travolta. More recently, the movie Erin Brockovich starring Julia Roberts covered a similar issue in California with Pacific Gas and Electric and problems with hexavalent chrome in groundwater causing serious health issues.

Public interest in the environment is exemplified by the various watchdog organizations that track environmental issues in great detail. A good example of this is Scorecard.org (Environmental Defense, 2001), a Web site that provides a very large amount of information on environmental current events, releases of toxic substances, environmental justice, and similar topics. For example, on this site you can find the largest releasers of pollutants near your residence. Sites like this definitely raise public awareness of environmental issues.

It’s also important to point out that the environmental industry is big business. According to reports by the U.S. Department of Commerce and Environmental Business International (as quoted in Diener, Terkla, and Cooke, 2000), the environmental industry in the U.S. in 1998 had $188.7 billion in sales, up 1.6% from the previous year. It employed 1,354,100 people in 115,850 companies. The worldwide market for environmental goods and services for the same period was estimated to be $484 billion.


Figure 1 - The author (front row center) examining state-of-the-art punch card technology in 1959

THE COMPUTER REVOLUTION

In parallel with growing public concern for the environment has been growth of technology to support a better understanding of environmental conditions. While people have been using computing devices of some sort for over a thousand years and mainframe computers since the 1950s (see Environmental Computing History Timeline sidebar), the advent of personal computers in the 1980s made it possible to use them effectively on environmental projects. For more information on the history of computers, see Augarten (1984) and Evans (1981). Discussions of the history of geological use of computers are contained in Merriam (1983, 1985).

With the advent of Windows-based, consumer-oriented database management programs in the 1990s, the tools were in place to create an environmental data management system (EDMS) to store data for one or more facilities and use it to improve project management.

Computers have assumed an increasingly important role in our lives, both at work and at home. The average American home contains more computers than bathtubs. From electronic watches to microwave ovens, we are using computers of one type or another a significant percentage of our waking hours. In the workplace, computers have changed from big number crunchers cloistered somewhere in a climate-controlled environment to something that sits on our desk (or our lap). No longer are computers used only for massive computing jobs which could not be done by hand, but they are now replacing the manual way of doing our daily work. This is as true in the earth science disciplines as anywhere else. Consequently, industry sages have suggested that those who do not have computer skills will be left behind in the next wave of automation of the workplace. At the least, those who are computer aware will be in a better position to evaluate how computers can help them in their work.

Environmental Computing History Timeline

1000 BC – The Abacus was invented (still in use).

1623 – The first mechanical calculator was invented by German professor Wilhelm Schickard.

1834 – Charles Babbage began work on the Analytical Engine, which was never completed.

1850 – Charles Lyell was the first person to use statistics in geology.

1876 – Alexander Graham Bell patented the telephone.

1890 – Herman Hollerith built the Tabulating Machine, which was the first successful mechanical calculating machine.

1899 – The River and Harbor Act was the first environmental law passed in the United States.

1943 – The Mark 1, an electromechanical calculator, was developed.

1946 – ENIAC (Electronic Numerical Integrator and Computer) was completed. (Dick Tracy’s wrist radio also debuted in the comic strip.)

1947 – The transistor was invented by Bardeen, Brattain, and Shockley at Bell Labs.

1951 – UNIVAC, the first commercial computer, became available.

1952 – Digital plotters were introduced.

1958 – The integrated circuit was invented by Jack Kilby at Texas Instruments.

1962 – Rachel Carson’s Silent Spring was published, starting the environmental movement.

1965 – IBM white paper on computerized contouring appeared.

1969 – National Environmental Policy Act (NEPA) was enacted.

1970 – The first Earth Day was held.

1970 – Relational data management was described by Edwin Codd.

1971 – The first microprocessor, the Intel 4004, was introduced.

1973 – SQL was introduced by Boyce and Chamberlin.

1977 – The Apple II, the first widely accepted personal computer, was introduced.

1981 – IBM released its Personal Computer. This was the computer that legitimized small computers for business use.

1984 – The Macintosh computer was introduced, the first significant use of a graphical user interface on a personal computer.

1985 – Windows 1.0 was released.

1990 – Microsoft shipped Windows 3.0, the first widely accepted version.

1994 – Netscape Navigator was released by Mosaic Communications, leading to widespread use of the World Wide Web.

The growth that we have seen in computer processing power is related to Moore’s law (Moore, 1965; see also Schaller, 1996), which states that the capacity of semiconductor memory doubles every 18 months. The price-performance ratio of most computer components meets or exceeds this law over time. For example, I bought a 10 megabyte hard drive in 1984 for $800. In 2001 I could buy a 20 gigabyte hard drive for $200, a price-performance increase of 8000 times in 17 years. This averages to a doubling about every 16 months. Over the same time, PC processing speed has increased from 4 megahertz for $5000 to 1000 megahertz for $1000, a price-performance increase of 1250 times, or a doubling about every 20 months. These average to 18 months. So computers become twice as powerful every year and a half, obeying Moore’s law.
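The arithmetic in this example can be checked with a short calculation. The sketch below is only an illustration: it assumes steady exponential growth over the 17 years, takes 1 gigabyte as 1000 megabytes, and uses a helper function name (`doubling_time_months`) that is my own invention. Under those assumptions the results come out close to the 16 and 20 months quoted above.

```python
import math


def doubling_time_months(improvement_factor: float, elapsed_years: float) -> float:
    """Months per doubling, assuming steady exponential growth (hypothetical helper)."""
    doublings = math.log2(improvement_factor)
    return elapsed_years * 12 / doublings


years = 2001 - 1984  # 17 years between the two purchases

# Hard drives: 10 MB for $800 versus 20 GB for $200 (assuming 1 GB = 1000 MB).
disk_factor = (20_000 / 200) / (10 / 800)  # megabytes per dollar, new / old

# Processors: 4 MHz for $5000 versus 1000 MHz for $1000.
cpu_factor = (1000 / 1000) / (4 / 5000)  # megahertz per dollar, new / old

disk_months = doubling_time_months(disk_factor, years)
cpu_months = doubling_time_months(cpu_factor, years)

print(f"disk: factor {disk_factor:.0f}, doubling every {disk_months:.1f} months")
print(f"cpu:  factor {cpu_factor:.0f}, doubling every {cpu_months:.1f} months")
print(f"average doubling time: {(disk_months + cpu_months) / 2:.1f} months")
```

The doubling time is simply the elapsed time divided by the base-2 logarithm of the improvement factor, which is why an 8000-fold gain (about 13 doublings) and a 1250-fold gain (about 10 doublings) over the same period give different rates.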

Unlike 10 or especially 20 years ago, it is now usual in industrial and engineering companies for each employee to have a suitable computer on his or her desk, and for that computer to be networked to other people’s computers and often a server. This computing environment is a good base on which to build a data management system.


As the hardware has developed, so has the data management software. It is now possible to outfit an organization with the software for a client-server data management system starting at $1,000 or $2,000 a seat. Users probably already have the hardware. Adding software customization, training, support, and other costs still allows a powerful data management system to be put in place for a cost which is acceptable for many projects.

In general, computers perform best when problem solving calls for either deductive or inductive reasoning, and poorly when using subjective reasoning. For example, calculating a series of stratigraphic horizon elevations where the ground level elevation and the depth to the formation are known is an example of deductive reasoning. Computers perform optimally on problems requiring deductive reasoning because the answers are precise, requiring explicit computations. Estimating the volume of contamination or contouring a surface is an example of inductive reasoning. Inductive reasoning is less precise, and requires a skilled geoscientist to critique and modify the interpretation. Lastly, the feeling that carbonate aquifers may be more complex than clastic aquifers is an example of subjective reasoning. Subjective reasoning uses qualitative data and is the most difficult of all for computer analysis. In such instances, the analytical potential of the computer is secondary to its ability to store and graphically portray large amounts of information. Graphic capabilities are requisite to earth scientists in order to make qualitative data usable for interpretation.
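The deductive case is simple enough to sketch in a few lines. In the example below the well names, horizon name, and all numbers are made up for illustration, and the `Pick` record is a hypothetical structure rather than part of any particular system. The only real content is the rule itself: horizon elevation equals ground level elevation minus depth to the horizon.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    """One depth pick for a horizon in a well (hypothetical record layout)."""
    well: str
    horizon: str
    ground_elevation_ft: float  # surveyed ground level elevation
    depth_ft: float             # depth from ground level to the top of the horizon


def horizon_elevation(pick: Pick) -> float:
    """Elevation of the horizon top: ground elevation minus depth."""
    return pick.ground_elevation_ft - pick.depth_ft


# Illustrative values only.
picks = [
    Pick("MW-1", "Clay A", 5280.0, 42.5),
    Pick("MW-2", "Clay A", 5275.5, 39.0),
    Pick("MW-3", "Clay A", 5291.2, 55.7),
]

elevations = {p.well: horizon_elevation(p) for p in picks}
for well, elevation in elevations.items():
    print(f"{well}: top of Clay A at {elevation:.1f} ft")
```

There is nothing to interpret here, which is the point: given the same inputs, every run gives the same answer. Contouring those three elevations into a surface, by contrast, already involves choices that a geoscientist would want to review.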

Another example of appropriate use of computers relative to types of reasoning is the distinction between verification and validation, which is discussed in detail in Chapter 16. Verification, which checks compliance of data with project requirements, is an example of deductive logic. Either a continuing calibration verification sample was run every ten samples or it wasn’t. Validation, on the other hand, which determines the suitability of the data for use, is very subjective, requiring an understanding of sampling conditions, analytical procedures, and the expected use of the data. Verification is easily done with computer software. How far software can go toward complete automation of the validation process remains to be seen.
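To show how mechanical a verification check can be, here is a minimal sketch of the calibration example. It assumes the analytical run is available as an ordered list of labels, with "CCV" marking a calibration verification sample and anything else counted as an ordinary sample, and it reads "every ten samples" as "no more than ten samples in a row without a CCV". The function name, the labels, and that reading of the rule are all my assumptions, and a real project plan may define the requirement differently.

```python
def ccv_gaps(run_sequence, max_between=10):
    """Return 1-based run positions where more than max_between samples
    have been analyzed since the last CCV (hypothetical check)."""
    violations = []
    since_ccv = 0
    for position, sample_type in enumerate(run_sequence, start=1):
        if sample_type == "CCV":
            since_ccv = 0  # calibration verified, restart the count
        else:
            since_ccv += 1
            if since_ccv > max_between:
                violations.append(position)
    return violations


# A compliant run: a CCV, ten samples, another CCV, five more samples.
good_run = ["CCV"] + ["SAMPLE"] * 10 + ["CCV"] + ["SAMPLE"] * 5

# A non-compliant run: twelve samples after the only CCV.
bad_run = ["CCV"] + ["SAMPLE"] * 12

print(ccv_gaps(good_run))  # no positions flagged
print(ccv_gaps(bad_run))   # the eleventh and twelfth samples are flagged
```

The check either passes or it does not, with no judgment involved. Deciding whether the flagged results are still usable for the project is the validation question, and nothing in this sketch attempts to answer it.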

CONVERGENCE - ENVIRONMENTAL DATA MANAGEMENT

Efficient data management is taking on increased importance in many organizations, and yours is probably no exception. In the words of one author (Diamondstone, 1990, p. 3):

“Automated measuring equipment has provided rapidly increasing amounts of data. Now, the challenge before us is to assure sufficient data uniformity and compatibility and to implement data quality measures so that these data will be useful for integrative environmental problem solving.”

This is particularly true in organizations where many different types of site investigation and monitoring data are coming from a variety of different sources. Fortunately, software tools are now available which allow off-the-shelf programs to be used by people who are not computer experts to retrieve this data in a format that is meaningful to them. According to Finkelstein (1989, p. 3):

“Management is on the threshold of an explosive boom in the use of computers. A boom initiated by simplicity and ease of use. Managers and staff at all levels of an organization will be able to design and implement their own systems, thereby dramatically reducing their dependence on the data processing (DP) department, while still ensuring that DP maintains central control, so that application systems and their data can be used by others in the business.”

With the advent of relatively easy to use software tools such as Microsoft Windows and Microsoft Access, it is even more true now that individuals can have a much greater role in satisfying their own data management needs. It is important to develop a data management approach that makes efficient use of these tools to solve the business problem of managing data within the organization. The environmental data management system that will result from implementation of a plan based on this approach will provide users with access to the organization’s environmental data to satisfy their business needs. It will allow them to expand their data retrievals as their needs change and as their skills develop.

As with most business decisions, the decision to implement a data management system should be based on an analysis of the expected return on the time and money invested. In the case of an office automation project, some of the return is tangible and can be expressed in dollar savings, and some is intangible savings in efficiency in everyday operations. In general, the best approach for system implementation is to look for leverage points in the system where a great return can be had for a small cost. The question becomes: How do you improve the process to get the greatest savings?

Often some examples of tangible returns can be identified within the organization. The benefits can best be seen from analyzing the impact of the data management system on the whole site investigation and remediation process. For example, during remediation you might be able, by more careful tracking and modeling of the contamination, to decrease the amount of waste to be removed or water to be processed. You may also be able to decrease the time required to complete the project and save many person-years of cost by making quality data available in a standardized format and in a timely fashion. For smaller sites, automating the data management process can provide savings by repetition. Once the system has been set up for one site and people trained to use it, that effort can be re-used on the next site.

The intangible benefits of a data management system are difficult to quantify, but subjectively can include increased job satisfaction of project workers, a higher quality work product, and better decision making. The cumulative financial and intangible return on investment of these various benefits can easily justify reasonable expenditures for a data management system.

CONCEPT OF DATA VS INFORMATION

It is important to recognize that there is a difference between numbers and letters stored in a computer and useful information. Numbers stored in a computer, or printed out onto a sheet of paper, may not themselves be of any value. It is only when those numbers are presented in a form that is useful to the intended audience that they become useful information. The keys to making the transition from data to information are organization and access. It doesn't matter if you have a file of all the monitoring wells ever drilled; if you can't get the information you want out of the file, it is useless. Before any database is created, careful attention should be paid to how the data is going to be used, to ensure that the maximum use can be obtained from the effort.

Statistics and graphics can be tremendously helpful in perceiving relationships among different variables contained in the data. As the power and ease of use of both general business programs and technical programs for statistics and graphics improve, it will become common to take a good look at the data as a set before working with individual members of the set.

The next step is to move from information to knowledge. The difference between the two is understanding. Once you have processed the information and understand it, it becomes knowledge. This transition is a human activity, not a computer activity, but the computer can help by presenting the information in an understandable manner.

EMS VS EMIS VS EDMS

A final overview issue to discuss is the relationship between EMS (environmental management systems), EMIS (environmental management information systems), and site EDMS (environmental data management systems). An EMS is a set of policies and procedures for managing environmental issues for an organization or a facility. An EMIS is a software system implemented to support the administration of the EMS (see Gilbert, 1999). EMIS usually has a focus on record keeping and reporting, and is implemented with the hope of improving business processes and practices. A site environmental data management system (EDMS) is a software system for managing data regarding the environmental impact of current or former operations. EDMS overlaps partially with EMIS. For an operating facility, the EDMS is a part of the EMIS. For a facility no longer in operation, there may be no formal EMS or EMIS, but the EDMS is necessary to facilitate monitoring and cleanup.

Data is or Data are?

Is “data” singular or plural? In this book the word data is used as a singular noun. Depending on your background, you may not like this. Many engineers and scientists think of data as the plural of “datum,” so they consider the word plural. Computer people view data as a chunk of stuff, and, like “chunk,” consider it singular. In one dictionary I consulted (Webster, 1984), data as the plural of datum was the third definition, with the first two being synonyms for “information,” which is always treated as singular. It also states that common usage at this time is singular rather than plural, and that “data can now be used as a singular form in English.” In Strunk and White (1935), a style manual that I use, the discussion of singular vs. plural nouns uses an example of the contents of a jar. If the jar contains marbles, its contents are plural. If it contains jam, its content is singular. You decide: Is data jam or marbles?


CHAPTER 2

SITE DATA MANAGEMENT CONCEPTS

The size and complexity of environmental investigation and monitoring programs at industrial facilities continue to increase. Consequently the amount of environmental data, both at operating facilities and orphan sites, is growing as well. The volume of data often exceeds the capacity of simple tools like paper reports and spreadsheets. When that happens it is appropriate to implement a more powerful data management system, and often the system of choice is a relational database manager. This section provides a top-down discussion of management of environmental data. It focuses on the purpose and goals of environmental data management, and on the types and locations of data storage. These issues should always be resolved before an electronic (or in fact any) data management system is implemented.

PURPOSE OF DATA MANAGEMENT

Why manage data electronically? Or why even manage it at all? Clear answers to these questions are critical before a successful system can be implemented. This section addresses some of the issues related to the purpose of data management. It all comes down to planning. If you understand the goal to be accomplished, you have a better chance of accomplishing it.

There is only one real purpose of data management: to support the goals of the organization. These goals are discussed in detail in Chapter 8. No data management system should be built unless it satisfies one or more significant business or technical goals. Identification of these goals should be done prior to designing and implementing the system for two reasons. One reason is that the achievement of these goals provides the economic justification for the effort of building the system. The other reason is that the system is more likely to generate satisfactory results if those results are understood, at least to a first approximation, before the system is implemented and functionality is frozen.

Different organizations have different things that make them tick. For some organizations, internal considerations such as cost and efficiency are most important. For others, outside appearances are equally or more important. The goals of the organization must be taken into consideration in the design of the system so that the greatest benefit can be achieved. Typical goals include:

Improve efficiency – Environmental site investigation and remediation projects can involve an enormous amount of data. Computerized methods, if they are well designed and implemented, can be a great help in improving the flow of data through the project. They can also be a great sink of time and effort if poorly managed.

Maximize quality – Because of the great importance of the results derived from environmental investigation and remediation, it is critical that the quality of the output be maximized relative to the cost. This is not trivial, and careful data storage, and annotation of data with quality information, can be a great help in achieving data quality objectives.

Minimize cost – No organization has an unlimited amount of money, and even those with a high level of commitment to environmental quality must spend their money wisely to receive the greatest return on their investment. This means that unnecessary costs, whether in time or money, must be minimized. Electronic data management can help contain costs by saving time and minimizing lost data.

People tend to start working on a database without giving a lot of thought to what a database really is. It is more than an accumulation of numbers and letters. It is a special way to help us understand information. Here are some general thoughts about databases:

A database is a model of reality – In many cases, the data that we have for a facility is the only representation that we have for conditions at that facility. This is especially true in the subsurface, and for chemical constituents that are not visible, either because of their physical condition or their location.

The model helps us understand the reality – In general, conditions at sites are nearly infinitely complex. The total combination of geological, hydrological and engineering factors usually exceeds our ability to understand it without some simplification. Our model of the site, based on the data that we have, helps us to perform this simplification in a meaningful way.

This understanding helps us make decisions – Our simplified understanding of the site allows us to make decisions about actions to be taken to improve the situation at the site. Our model lets us propose and test solutions based on the data that we have, identify additional data that we need, and then choose from the alternative solutions.

The clearer the model, the better the decisions – Since our decisions are based on our data-based model, it follows that we will make better decisions if we have a clear, accurate, up-to-date model. The purpose of a database management system for environmental data is to provide us the information to build accurate models and keep them current.

Clearly information technology, including data management, is important to organizations. Linderholm (2001) reports the results of a study that asked business executives about the importance of information technology (IT) to their business. 70% reported that it was absolutely essential, and 20% said it was extremely valuable. The net increase in revenue attributable to IT, after accounting for IT costs, was estimated to be 20%, clearly a good return. 70% said that the role of IT in business strategy is increasing. In the environmental business the story must be similar, but perhaps not as strong. If you were to survey project managers today about the importance of data management on their projects, the percentage that said it was essential or extremely valuable would probably be less than the 90% quoted above, and maybe less than 50%. But as the amount of data for sites continues to grow, this number will surely increase.

TYPES OF DATA STORAGE

Once the purpose of the system has been determined, the next step is to identify the data to be contained in the system and how it is to be stored. Some data must be stored electronically, while other data might not need to be stored this way. Implementers should first develop a thorough understanding of their existing data and storage methods, and then make decisions about how electronic storage can provide an improvement. This section will cover ways of storing site environmental data. The content of an EDMS will be discussed in Chapter 4.

Environmental problems are complex problems. Complex problems have simple, easy-to-understand wrong answers.

From Environmental Humor by Gerald Rich (1996), reprinted with permission

Hard copy

Since its inception, hard copy data storage has been the lifeblood of the environmental industry. Many organizations have thousands of boxes of paper related to their projects. The importance of this data varies greatly, but in many organizations, it is not well understood.

A data management system for hard copy data is different from a data management system for digital data such as laboratory analytical results. The former is really a document management system, and many vendors offer software and other tools to build this type of system. The latter is more of a technical database issue, and can be addressed by in-house generated solutions or off-the-shelf or semi-custom solutions from environmental software vendors.

LAB REPORTS

Laboratory analyses can generate a large volume of paper. Programs like the U.S. EPA Contract Lab Program (CLP) specify deliverables that can be hundreds of pages for one sampling event. This paper is important as backup for the data, but these hundreds of pages can cause a storage and retrieval problem for many organizations. Often the usable data from the lab event, that is, the data actually used to make site decisions, may be only a small fraction of the paper, with the rest being quality assurance and other backup information.

DERIVED REPORTS

Evaluation of the results of laboratory analysis and other investigation efforts usually results in a printed report. These reports contain a large amount of useful information, but over time can also become a storage and retrieval problem.

Electronic

There are many different ways of organizing data for digital storage. There is no “right” or “wrong” way, but there are approaches that provide greater benefits than others in specific situations. People store environmental data in a lot of different ways, both in database systems and in other file types. Here we will discuss two non-database ways of storing data, and several different database system designs for storing data.

TEXT FILES AND WORD PROCESSOR FILES

The simplest way to manage environmental data is in text files. These files contain just the information of interest, with no formatting or information about the data structure or relationships between different data elements. Usually these files are encoded in ASCII, which stands for American Standard Code for Information Interchange and is pronounced like as′-kee. For this reason they are sometimes called ASCII files. Text files can be effectively used for storing and transferring small amounts of data. Because they lack “intelligence” they are not usually practical for large data sets. For example, in order to search for one piece of data in a text file you must look at every word until you find the one you are looking for, rather than using a more efficient method such as the indexed searching used by data management programs.
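The difference between scanning a text file and using an index can be sketched in a few lines of Python. This is only an illustration: the record layout (well ID, date, sulfate value) and all of the values are hypothetical, and a real data management program would build and maintain its indexes internally.

```python
# Hypothetical records, one per line of a text file: well ID, date, sulfate (mg/L).
lines = [
    "B-1,1996-02-03,410",
    "B-2,1996-02-03,385",
    "B-3,1996-02-03,452",
]

def linear_search(lines, well_id):
    """Scan every line until the well ID is found, as with a plain text file."""
    for line in lines:
        fields = line.split(",")
        if fields[0] == well_id:
            return fields
    return None

# Build an index once: well ID -> parsed record. Later lookups skip the scan.
index = {line.split(",")[0]: line.split(",") for line in lines}

def indexed_search(index, well_id):
    """Look up the record directly by its key."""
    return index.get(well_id)
```

With three records the two approaches are indistinguishable in practice; the cost of the scan grows with the size of the file, while the indexed lookup stays roughly constant.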

A variation on text files is word processor files, which contain some formatting and structure resulting from the word processing program that created them. An example of this would be the data in a table in a report. Again this works well only for small amounts of data.

SPREADSHEETS

Over the years a large amount of environmental data has been managed in spreadsheets. This approach works for data sets that are small to medium in size, and where the display and retrieval requirements are relatively simple. For large data sets, a database manager program is usually required because spreadsheets have a limit to the number of rows and columns that they can contain, and these limits can easily be exceeded by a large data set. For example, Lotus 1-2-3 has a limit of about 16,000 rows of data, and Excel 97 has a limit of 65,536 rows.

Spreadsheets do have their place in working with environmental data. They are particularly useful for statistical analysis of data and for graphing in a variety of ways. Spreadsheets are for doing calculations. Database managers are for managing data. As long as both are used appropriately, the two together can be very powerful.

The problem with spreadsheets occurs when they are used in situations where real data management is required. For example, it’s not unusual for organizations to manage quarterly groundwater monitoring data using spreadsheets. They can do statistics on the data and print reports. Where the problem becomes evident is when it becomes necessary to do a historical analysis of the data. It can be very difficult to tie the data together. The format of the spreadsheets may have evolved over time. The file for one quarter may be missing or corrupted. Suddenly it becomes a chore to pull all of the data together to answer a question such as “What is the trend of the sulfate values over the last five years?”

DATABASE MANAGERS

For storing large amounts of data, and where immediate calculations are not as important, database managers usually do a better job than spreadsheets, although the capabilities of spreadsheets and databases certainly overlap somewhat. The better database managers allow you to store related data in several different tables and to link them together based on the contents of the data. Many database manager programs have a reputation for not being very easy to use, partly because of the sheer number of options available. This has been improved with the menu-driven interfaces that are now available. These interfaces help with the learning curve, but data management software, especially database server software, can still be very difficult to master.

Many database manager programs provide a programming language, which allows you to automate tasks that you perform often or repeatedly. It also allows you to configure the system for other users. This language provides the tools to develop sophisticated applications programs for nearly any data handling need, and provides the basis for some commercial EDMS software.

Database managers are usually classified by how they store and relate data. The most common types are flat files, hierarchical, network, object-oriented, and relational. Most use the terminology of “record” for each object in the database (such as a well or sample location) and “field” for each type of information on each object (such as county or collection date). For information on database management concepts see Date (1981) and Rumble and Hampel (1984).

Sullivan (2001) quotes a study by the University of California at Berkeley that humans have generated 12 exabytes (an exabyte is over 1 million terabytes, or a million trillion bytes) of data since the start of time, and will double this in the next two and a half years. Currently, about 20% of the world’s data is contained in relational databases, while the rest is in flat files, audio, video, pre-relational, and unstructured formats.

Flat file

A flat file is a two-dimensional array of data organized in rows and columns similar to a spreadsheet. This is the simplest type of database manager. All of the data for a particular type of object is stored in a single file or table, and each record can have one instance of data for each field. A good analogy is a 3"×5" card file, where there is one card (record) for each item being tracked in the database, and one line (field) for each type of information stored.

Flat file database managers are usually the cheapest to buy, and often the easiest to use, but the complexity of real-world data often requires more power than they can provide.

In a flat file storage system, each row represents one observation, such as a boring or a sample. Each column contains the same kind of data. An example of a flat file of environmental data is shown in the following table:

B-1   725   1050    681   2/3/96    JLG   05    not det                     6.8
B-1   725   1050    681   5/8/96    DWR   05    not det    05    not det    6.7
B-2   706    342    880   11/4/95   JAM   3.7   detected   9.1   detected   5.2
B-2   706    342    880   2/3/96    JLG   2.1   detected   8.4   detected   5.3
B-2   706    342    880   5/8/96    DWR   1.4   detected   7.2   detected   5.8
B-3   714    785   1101   2/3/96    JLG   05    not det                     8.1
B-3   714    785   1101   5/8/96    CRS   05    not det    05    not det    7.9

Figure 2 - Flat file of environmental data

In this table, each line is the result of one sampling event for an observation well. Since the wells were sampled more than once, and analyzed for multiple parameters, information specific to the well, such as the elevation and location (X and Y), is repeated. This wastes space and increases the chance for error since the same data element must be entered more than once. The same is true for sampling events, represented here by the date and the initials of the person doing the sampling. Also, since the format for the analysis results requires space for each value, if the value is missing, as it is for some of the chloride measurements, the space for that data is wasted.
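The repetition described here can be made concrete with a short Python sketch. It uses the well and sampling-event columns from Figure 2 (the analysis columns are omitted for brevity) and counts how many well-specific values are stored in the flat layout versus a layout that stores each well once. The variable names are illustrative, not part of any particular product.

```python
# Rows from Figure 2, trimmed to the well and sampling-event columns:
# (well, elevation, x, y, sample date, sampler)
flat_rows = [
    ("B-1", 725, 1050, 681, "2/3/96", "JLG"),
    ("B-1", 725, 1050, 681, "5/8/96", "DWR"),
    ("B-2", 706, 342, 880, "11/4/95", "JAM"),
    ("B-2", 706, 342, 880, "2/3/96", "JLG"),
    ("B-2", 706, 342, 880, "5/8/96", "DWR"),
    ("B-3", 714, 785, 1101, "2/3/96", "JLG"),
    ("B-3", 714, 785, 1101, "5/8/96", "CRS"),
]

# Store each well's elevation and location once, keyed by well ID.
wells = {row[0]: row[1:4] for row in flat_rows}

# Keep only the well ID with each sampling event.
events = [(row[0], row[4], row[5]) for row in flat_rows]

well_values_flat = len(flat_rows) * 3   # elevation, X, Y repeated on every row
well_values_split = len(wells) * 3      # elevation, X, Y stored once per well
print(well_values_flat, well_values_split)
```

Splitting the data this way is the first step toward the relational design discussed later in this chapter: each well value is entered once, so there is only one place for it to be wrong.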

In general, flat files work acceptably for managing small amounts of data such as individual sampling events. They become less efficient as the size of the database grows. Examples of flat file data management programs are FileMaker Pro (www.filemaker.com) and Web-based database programs such as QuickBase (www.quickbase.com).

Hierarchical

In the hierarchical design, the one-to-many relationship common to many data sets is formalized into the database design. This design works well for situations such as multiple samples for each boring, but has difficulty with other situations such as many-to-many relationships. This type of program is less common than flat files or relational database managers, but is appropriate for some types of data. In a hierarchical database, data elements can be viewed as branches of an inverted tree.

A good example of a hierarchical database might be a database of organisms. At the top would be the kingdom, and underneath that would be the phyla for each kingdom. Each phylum belongs to only one kingdom, but each kingdom can have several phyla. The same continues down the line for class, order, and so on. The most important factor in fitting data into this scheme is that there must be no data element at one level that needs to be under more than one element at a higher level. If a crinoid could be both a plant and an animal at the same time, it could not be classified in a hierarchical database by phylogeny (which biological kingdom it evolved from).

Environmental site data is for the most part hierarchical in nature. Each site can have many monitoring wells. Each well can have many samples, either over time or by depth. Then each sample can be analyzed for multiple constituents. Each constituent analysis comes from one specific sample, which comes from one well, which comes from one site.
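The site, well, sample, and analysis hierarchy can be sketched as nested Python structures. This is a minimal illustration under assumed names and values (the site name, well IDs, and concentrations are all hypothetical); it shows that every analysis is reached through exactly one sample, one well, and one site.

```python
# Hypothetical names and values, for illustration only.
site = {
    "name": "Example Site",
    "wells": [
        {
            "id": "MW-1",
            "samples": [
                {"date": "1996-02-03",
                 "analyses": [{"constituent": "sulfate", "value": 410.0},
                              {"constituent": "chloride", "value": 35.0}]},
                {"date": "1996-05-08",
                 "analyses": [{"constituent": "sulfate", "value": 395.0}]},
            ],
        },
        {"id": "MW-2", "samples": []},
    ],
}

def walk(site):
    """Yield one (site, well, sample date, constituent, value) row per analysis."""
    for well in site["wells"]:
        for sample in well["samples"]:
            for analysis in sample["analyses"]:
                yield (site["name"], well["id"], sample["date"],
                       analysis["constituent"], analysis["value"])

rows = list(walk(site))
```

Walking the tree from the top down flattens it into rows, which is essentially what a query against a hierarchical data set has to do.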

A data set which is inherently hierarchical can be stored in a relational database manager, and relational database managers are somewhat more flexible, so pure hierarchical database managers are now rare.

Network

In the network data model, multiple relationships between different elements at the same level are easy to manage. Hypertext systems (such as the World Wide Web) are examples of managing data this way. Network database managers are not common, but are appropriate in some cases, especially those in which the interrelationships among data are complex.

An example of a network database would be a database of authors and articles. Each author may have written many articles, and each article may have one or more authors. This is called a “many-to-many” relationship. This is a good project for a network database manager. Each author is entered, as is each article. Then the links between authors and articles are established. The data elements are entered, and then the network established. Then an article can be called up, and the information on its authors can be retrieved. Likewise, an author can be named, and his or her articles listed.

A network data topology (geometric configuration) can be stored in a relational database manager. A “join table” is needed to handle the many-to-many relationships. Storing the above article database in a relational system would require three tables, one for authors, one for articles, and a join table with the connections between them.
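The three-table arrangement can be sketched with Python's built-in sqlite3 module. The table and column names and the sample authors and articles below are assumptions made for illustration; any relational database manager could hold the same structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors  (author_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
    -- Join table: one row per (author, article) link.
    CREATE TABLE author_article (
        author_id  INTEGER REFERENCES authors(author_id),
        article_id INTEGER REFERENCES articles(article_id),
        PRIMARY KEY (author_id, article_id)
    );
    INSERT INTO authors  VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO articles VALUES (10, 'Article X'), (11, 'Article Y');
    INSERT INTO author_article VALUES (1, 10), (2, 10), (1, 11);
""")

def authors_of(title):
    """Call up an article and retrieve its authors."""
    rows = conn.execute(
        """SELECT a.name FROM authors a
           JOIN author_article aa ON aa.author_id = a.author_id
           JOIN articles r ON r.article_id = aa.article_id
           WHERE r.title = ? ORDER BY a.name""", (title,))
    return [name for (name,) in rows]

def articles_by(name):
    """Name an author and list his or her articles."""
    rows = conn.execute(
        """SELECT r.title FROM articles r
           JOIN author_article aa ON aa.article_id = r.article_id
           JOIN authors a ON a.author_id = aa.author_id
           WHERE a.name = ? ORDER BY r.title""", (name,))
    return [title for (title,) in rows]
```

The join table carries nothing but the links, so the same pair of queries works in either direction, from article to authors or from author to articles.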

Object oriented

This relatively recent invention stores each data element as an object with properties and methods encapsulated (wrapped up) into each object. This is a deviation from the usual separation of code and data, but is being used successfully in many settings. Current object-oriented systems do not provide the data retrieval speed on large data sets provided by relational systems. Using this type of software involves a complete re-education of the user, since different terminology and concepts are used. It is a very powerful way to manipulate data for many purposes, and is likely to see more widespread use. Some of the features of object-oriented databases are described in the next few paragraphs.

Encapsulation – Traditional programming languages focus on what is to be done. This is referred to as “procedural programming.” Object-oriented programming focuses on objects, which are a blend of data and program code (Watterson, 1989). In a procedural paradigm (a paradigm is an approach or model), the data and the programs are separate. In an object-oriented paradigm, the objects consist of data that knows what to do with itself, that is, objects contain methods for performing actions. This is called encapsulation. Thus, instead of applying procedures to passive data, in object-oriented programming systems (OOPS), methods are part of the objects.

Some examples of the difference between procedural systems and OOPS might be helpful. In a procedural system, the data for a well could contain a field for well type, such as monitoring well or soil boring. The program operating on the data would know what symbol to draw on the map based on the contents of that field. In an OOPS the object called “soil boring” would include a method to draw its symbol, based on the data content (properties) of the object. Properties of objects in OOPS are usually loosely typed, which means that the distinction between data types such as integers and characters is not rigorously defined. This can be useful when, as is often the case, a numeric property such as depth to a particular formation needs to be filled with character values such as NP (not present) or NDE (not deep enough).
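The contrast can be sketched in Python, which is itself loosely typed in the sense used here. The symbol names ("circle", "triangle"), the class name, and the sample values are hypothetical and chosen only to make the example concrete.

```python
# Procedural style: the data is passive and the program decides what to draw.
def symbol_for(record):
    if record["well_type"] == "monitoring well":
        return "circle"
    if record["well_type"] == "soil boring":
        return "triangle"
    raise ValueError("unknown well type")

# Object-oriented style: each object carries its data and its own draw method.
class SoilBoring:
    def __init__(self, name, depth_to_formation):
        self.name = name
        # Loosely typed property: a number, or a code such as "NP" or "NDE".
        self.depth_to_formation = depth_to_formation

    def draw(self):
        """Return the symbol this object draws for itself."""
        return f"triangle {self.name}"

    def depth_label(self):
        """Format the depth, passing character codes through unchanged."""
        d = self.depth_to_formation
        return d if isinstance(d, str) else f"{d:.1f}"
```

In the procedural version, adding a new well type means editing `symbol_for`; in the object version, the new type brings its own `draw` method with it.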

For another illustration, imagine modeling a rock or soil body subject to chemical and physical processes such as leaching or neutralization using an OOPS. Each mineral phase would be an object of class “mineral,” while each fluid phase would be an object of class “fluid.” Methods known to the objects would include precipitation, dissolution, compaction, and so on. The model is given an initial condition, and then the objects interact via messages triggering methods until some final state is reached.

Inheritance – Objects in an OOPS belong to classes, and members of a particular class share the same methods. Also, similar classes of objects can inherit properties and methods from an existing class. This feature, called inheritance, allows a building-block approach to designing a database system by first creating simple objects and then building on and combining them into more complex objects. In this way, an object called “site” made up of “well” objects would know how to display itself with no additional programming.
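A minimal Python sketch of this building-block approach follows. The `Location` base class and the `display` method are assumptions introduced for illustration; the text names only the “site” and “well” objects.

```python
class Location:
    """Base class: anything with a name and map coordinates."""
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y

    def display(self):
        return f"{self.name} at ({self.x}, {self.y})"

class Well(Location):
    """Inherits name, coordinates, and display() from Location; adds elevation."""
    def __init__(self, name, x, y, elevation):
        super().__init__(name, x, y)
        self.elevation = elevation

class Site:
    """Built from Well objects; displays itself by asking each well to display."""
    def __init__(self, name, wells):
        self.name, self.wells = name, wells

    def display(self):
        return [well.display() for well in self.wells]

site = Site("Example Site", [Well("B-1", 1050, 681, 725), Well("B-2", 342, 880, 706)])
```

`Well` never defines `display`, yet `site.display()` works because each well inherits the method from `Location`.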

Message Passing – An object-oriented program communicates with objects via messages, and objects can exchange messages as well. For example, an object of class “excavated material” could send a message to an object of class “remediation pit” which would update the property “remaining material” within object “remediation pit.”
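In most current languages a message is simply a method call on another object. The sketch below follows the excavation example; the method names and the quantities are hypothetical.

```python
class RemediationPit:
    def __init__(self, remaining_material):
        self.remaining_material = remaining_material

    def receive_removed(self, amount):
        """Handle the message by reducing the remaining material."""
        self.remaining_material -= amount

class ExcavatedMaterial:
    def __init__(self, amount):
        self.amount = amount

    def notify(self, pit):
        """Send the 'removed' message to the pit."""
        pit.receive_removed(self.amount)

pit = RemediationPit(remaining_material=100)
ExcavatedMaterial(amount=30).notify(pit)
```

After the message is sent, the pit has updated its own property; the sender never touches `remaining_material` directly.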

Polymorphism – A method is part of an object, and is distinct from messages between objects. The objects “well” and “boring” could both contain the method “draw yourself,” and sending the “draw yourself” message to one or the other object will cause a similar but different result. This is referred to as polymorphism.
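The “draw yourself” example translates almost directly into Python. The returned strings stand in for real drawing code and are purely illustrative.

```python
class Well:
    def draw_yourself(self):
        return "well symbol"

class Boring:
    def draw_yourself(self):
        return "boring symbol"

# The same message goes to each object; each responds in its own way.
drawn = [obj.draw_yourself() for obj in (Well(), Boring())]
```

The loop does not need to know which class it is dealing with, which is what makes it easy to add further object types later.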

Object-oriented programming directly models the application, with messages being passed between objects being the analog of real-world processes (Thomas, 1989). Software written in this way is easier to maintain because programmers, other than the author, can easily comprehend the program code. Since program code is easily reusable, development of complex applications can be done more quickly and smoothly. Encapsulation, message passing, inheritance, and polymorphism give OOPS developers very different tools from those provided by traditional programming languages. Also, OOPS often use a graphical user interface and large amounts of memory, making them more suitable to high-end computer systems. For these reasons, OOPS have been slow in gaining acceptance, but they are gaining momentum and are considered by many to be the programming system of the future.

Examples of object-oriented programming languages include Smalltalk, developed by Xerox at the Palo Alto Research Center in the 1970s (Goldberg and Robson, 1983); C++, which is a superset of the C programming language (Stroustrup, 1986); and Hypercard for the Macintosh. NextStep, the programming environment for the Next computer, also uses the object-oriented paradigm.

There are several database management programs that are designed to be object oriented, which means that their primary data storage design is to store objects. Also, a number of relational database management systems have recently added object data types to allow object-oriented applications to use them as data repositories, and are referred to as Object-Relational systems.

Relational

Relational database managers and SQL are discussed in much greater detail in Chapter 3, and are described here briefly for comparison with other database manager types. In the relational model, data is stored in one or more tables, and these tables are related, that is, they can be joined together, based on data elements within the tables. This allows storage of data where there may be many pieces of one type of information related to one object (one-to-many relationship), as well as other relationships such as hierarchical and many-to-many. In many cases, this has been found to be the most efficient form of data storage for large, complicated databases, because it provides efficient data storage combined with flexible data retrieval. Currently the most popular type of database manager program is the relational type.
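A one-to-many relationship joined on a shared data element can be sketched with Python's built-in sqlite3 module. The table layout, well IDs, and sulfate values below are hypothetical and are not the schema developed later in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One record per well.
    CREATE TABLE wells (well_id TEXT PRIMARY KEY, elevation REAL);
    -- Many records per well, related by well_id.
    CREATE TABLE samples (
        well_id     TEXT REFERENCES wells(well_id),
        sample_date TEXT,
        sulfate     REAL
    );
    INSERT INTO wells VALUES ('MW-1', 725.0), ('MW-2', 706.0);
    INSERT INTO samples VALUES
        ('MW-1', '1996-02-03', 410.0),
        ('MW-1', '1996-05-08', 395.0),
        ('MW-2', '1996-02-03', 120.0);
""")

# Join the two tables on the shared well_id to retrieve one well's history.
rows = conn.execute("""
    SELECT w.well_id, w.elevation, s.sample_date, s.sulfate
    FROM wells w JOIN samples s ON s.well_id = w.well_id
    WHERE w.well_id = 'MW-1'
    ORDER BY s.sample_date
""").fetchall()
```

The elevation is stored once in `wells` but appears on every joined row, so the storage stays compact while the retrieval remains as convenient as a flat file.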

A file of monitoring well data provides a good example of how real-world data can be stored in a relational database manager. One table is created which contains the header data for the well including location, date drilled, elevation, and so on, with one record for each well. For each well, the driller or logger will report a different number of formation tops, so a table of formation tops is created, with one record for each top. A unique identifier such as well ID number relates the two tables to each other. Each well can also have one or more screened intervals, and a table is created for each of those data types, and related by the same ID number. Each screened interval can have multiple sampling events, with a description for each, so another table can be created for these sample events, which can be related by well ID number and sample event number. Very complex
