
RELATIONAL MANAGEMENT and DISPLAY of SITE ENVIRONMENTAL DATA - PART 4


PART FOUR - MAINTAINING THE DATA


MANUAL ENTRY

Sometimes there’s no other way to get data into the system than transcribing it from hard copy, usually by typing it in. This process is slow and error-prone, but if it’s the only way, and if the data is important enough to justify it, then it must be done. The challenge is to do the entry cost-effectively while maintaining a sufficient level of data quality.

Historical entry

Often the bulk of manual entry is for historical data. Usually this is data in hard-copy files. It can be found in old laboratory reports, reports which have been submitted to regulators, and many other places.

DATA SELECTION - WHAT’S REALLY IMPORTANT?

Before embarking on a manual entry project, it is important to place a value on the data to be entered. The importance of the data and the cost to enter it must be balanced. It is not unusual for a data entry project for a large site, where an effort is made to locate and input a comprehensive set of data for the life of the facility, to cost tens or hundreds of thousands of dollars. The decision to proceed should not be taken lightly.

LOCATING AND ORGANIZING DATA

The next step, and often the most difficult, is to find the data. This is often complicated by the fact that over time many different people or even different organizations may have worked on the project, and the data may be scattered across many different locations. It may even be difficult to locate people who know or can find out what happened in the past. It is important to locate as much of this historical data as possible, and then the portion selected as described in the previous section can be inventoried and input.

Once the data has been found, it should be inventoried. On small projects this can be done in word processor or spreadsheet files. For larger projects it is appropriate to build a database just to track documents and other items containing the data, or include this information in the EDMS. Either way, a list should be made of all of the data that might be entered. This list should be updated as decisions are made about what data is to be entered, and then updated again as the data is entered and checked. If the data inventory is stored in the EDMS, it should be set up so that after the data is imported it can be tracked back to the original source documents to help answer questions about the origin of the data.

TOOLS TO HELP WITH CORRECT ENTRY

There are a number of ways to enter the data, and these options provide various levels of assistance in getting clean data into the system.

Entry and review process – Probably the most common approach used in the environmental industry is manual entry followed by visual review. In this process, someone types in the data, then it is printed out in a format similar to the one that was used for import. Then a second person compares every piece of data between the two pieces of paper, and marks any inconsistencies. These are then remedied in the database, and the corrections checked. The end result, if done conscientiously, is reliable data. The process is tedious for those involved, and care should be taken that those doing it keep up their attention to detail, or quality goes down. Often it is best to mix this work with other work, since it is hard to do this accurately for days on end. Some people are better at it than others, and some like it more than others. (Most don’t like it very much.)

Double entry – Another approach is to have the data entered twice, by two different people, and then have special software compare the two copies. Data that does not match is then entered again. This technique is not as widely used as the previous one in the environmental industry, perhaps because existing EDMS software does not make this easy to do, and maybe also because the human checking in the previous approach sounds more reliable.
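A minimal sketch of the comparison step in the double-entry approach, assuming both operators saved their work as CSV files with the same layout; the file and column names here are hypothetical:

```python
import csv

KEY_FIELDS = ("station", "sample_date", "parameter")   # hypothetical column names

def load(path):
    """Read one operator's file, keyed by station/date/parameter."""
    with open(path, newline="") as f:
        return {tuple(row[k] for k in KEY_FIELDS): row for row in csv.DictReader(f)}

def compare(path_a, path_b):
    """Report rows missing from one copy or typed differently in the two copies."""
    a, b = load(path_a), load(path_b)
    problems = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            problems.append((key, "present in only one copy"))
        elif a[key] != b[key]:
            diffs = [k for k in a[key] if a[key][k] != b[key].get(k)]
            problems.append((key, "fields differ: " + ", ".join(diffs)))
    return problems

if __name__ == "__main__":
    for key, msg in compare("entry_copy1.csv", "entry_copy2.csv"):
        print(key, "->", msg)   # rows flagged here are re-entered and compared again
```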

Scanning and OCR – Hardware and software are widely available to scan hard copy documents into digital format, and then convert them into editable text using optical character recognition (OCR). The tools to do this have improved immensely over the last few years, such that error rates are down to just a few errors per page. Unfortunately, the highest error rates are with older documents and with numbers, both of which are important in historical entry of environmental data. Also, because the formats of old documents are widely variable, it is difficult to fit the data into a database structure after it has been scanned. These problems are most likely to be overcome, from the point of view of environmental data entry, when there is a large amount of data in a consistent format, with the pages in good condition. Unless you have this situation, scanning probably won’t work. However, this approach has been known to work on some projects. After scanning, a checking step is required to maintain quality.

Voice entry – As with scanning, voice recognition has taken great strides in recent years. Systems are available that do a reasonable job of converting a continuous stream of spoken words into a word processing document. Voice recognition is also starting to be used for on-screen navigation, especially for the handicapped. It is probably too soon to tell whether this technology will have a large impact on data entry.

Offshore entry – There are a number of organizations in countries outside the United States, especially Mexico and India, that specialize in high-volume data entry. They have been very successful in some industries, such as processing loan applications. Again, the availability of a large number of documents in the same format seems to be the key to success in this approach, and a post-entry checking step is required.


Figure 55 - Form entry of analysis data

Form entry vs spreadsheet entry – EDMS programs usually provide a form-based system for entering data, and the form usually has fields for all the data at each level, such as site, station, sample, and analysis. Figure 55 shows an example of this type of form. This is usually best for entering a small amount of data. For larger data entry projects, it may be useful to make a customized form that matches the source documents to simplify input. Another common approach is to enter the data into a spreadsheet, and then use the import tool of the EDMS to check and import the data. Figure 56 shows this approach. This has two benefits. The EDMS may have better data checking and cleanup tools as part of the import than it does for form entry. Also, the person entering the data into the spreadsheet doesn’t necessarily need a license for the EDMS software, which can save the project money. Sometimes it is helpful to create spreadsheet templates with things like station names, dates, and parameter lists using cut and paste in one step, and then have the results entered in a second step.

Ongoing entry

There may be situations where data needs to be manually entered on an ongoing basis. This is becoming less common as most sources of data involve a computerized step, so there is usually a way to import the data electronically. If not, approaches as described above can be used.

ELECTRONIC IMPORT

The majority of data placed into the EDMS is usually in digital format in some form or other before it is brought into the system. The implementers of the system should provide a data transfer standard (DTS) so that the electronic data deliverables (EDDs) created by the laboratory for the EDMS contain the appropriate data elements in a format suitable for easy import. An example DTS is shown in Appendix C.


Figure 56 - Spreadsheet entry of analysis data

Automated import routines should be provided in the EDMS so that data in the specified format (or formats if the system supports more than one) can be easily brought into the system and checked for consistency. Data review tracking options and procedures must be provided. In addition, if it is found that a significant amount of digital data exists in other formats, then imports for those formats should be provided. In some cases, importing those files may require operator involvement if, for example, the file is a spreadsheet file of sample and analytical data but does not contain site or station information. These situations usually must be addressed on a case-by-case basis.

Historical entry

Electronic entry of historical data involves several issues including selecting, locating, and organizing data, and format and content issues.

Data selection, location, and organization – The same issues exist here as in manual input in terms of prioritizing what data will be brought into the EDMS. Then it is necessary to locate and catalog the data, whatever format it is in, such as on a hard drive or on diskettes.

Format issues – Importing historical data in digital format involves figuring out what is in the files and how it is formatted, and then finding a way to import it, either interactively using queries or automatically with a menu-driven system. Most modern data management programs can read a variety of file formats including text files, spreadsheets, word processing documents, and so on. Usually the data needs to be organized and reformatted before it can be merged with other data already in the EDMS. This can be done either in its native format, such as in a spreadsheet, or imported into the database program and organized there. If each file is in a different format, then there can be a big manual component to this. If there are a lot of data files in the same format, it may be possible to automate the process to a large degree.


Content issues – It is very important that the people responsible for importing the data have a detailed understanding of the content of the data being imported. This includes knowing where the data was acquired and when, how it is organized, and other details like detection limits, flags, and units, if they are not in the data files. Great care must be exercised here, because often details like these change over time, often with little or no documentation, and are important in interpreting the data.

Ongoing entry

The EDMS should provide the capability to import analytical data in the format(s) specified in the data transfer standard. This import capability must be robust and complete, and the software and import procedures must address data selection, format, and content issues, and special issues such as field data, along with consistency checking as described in a later section.

Data selection – For current data in a standard format, importing may not be very time-consuming, but it may still be necessary to prioritize data import for various projects. The return on the time invested is the key factor.

Format and content issues – It may be necessary to provide other import formats in addition to those in the data transfer standard. The identification of the need to implement other data formats will be made by project staff members. The content issues for ongoing entry may be less than for historical data, since the people involved in creating the files are more likely to be available to provide guidance, but care must still be taken to understand the data in order to get it in right.

Field data – In the sampling process for environmental data there is often a field component and a laboratory component. More and more the data is being gathered in the field electronically. It is sometimes possible to move this data digitally into the EDMS. Some hard copy information is usually still required, such as a chain of custody to accompany the samples, but this can be generated in the field and printed there. The EDMS needs to be able to associate the field data arriving from one route with the laboratory data from another route so both types of data are assigned to the correct sample.

Understanding duplicated and superseded data

Environmental projects generate duplicated data in a variety of ways. Particular care should be taken with duplicated data at the Samples and Analyses levels. Duplicate samples are usually the result of the quality assurance process, where a certain number of duplicates of various types are taken and analyzed to check the quality of the sampling and analysis processes. QC samples are described in more detail in Chapter 15. A sample can also be reanalyzed, resulting in duplicated results at the Analyses level. These results can be represented in two ways: either as the original result plus the reanalysis, or as a superseded (replaced) original result plus the new, unsuperseded result. The latter is more useful for selection purposes, because the user can easily choose to see just the most current (unsuperseded) data, whereas selecting reanalyzed data is not as helpful because not all samples will have been reanalyzed. Examples of data at these two levels and the various fields that can be involved in the duplications at the levels are shown in Figure 57.
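To illustrate why the superseded representation is easier to select against, here is a small sketch using SQLite. The table and field names echo those used later in this chapter, and it assumes the convention that Superseded = 0 marks the current, unsuperseded result:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Analyses (SampleID INTEGER, Parameter TEXT, Value REAL, Superseded INTEGER)")
con.executemany("INSERT INTO Analyses VALUES (?, ?, ?, ?)", [
    (1, "Naphthalene", 120.0, 2),   # original result, superseded twice
    (1, "Naphthalene", 95.0, 1),    # first reanalysis, superseded once
    (1, "Naphthalene", 98.0, 0),    # current (unsuperseded) result
])

# Selecting only current results is a simple filter on Superseded = 0 ...
current = con.execute("SELECT Parameter, Value FROM Analyses WHERE Superseded = 0").fetchall()
print(current)   # [('Naphthalene', 98.0)]

# ... whereas "give me the reanalyses" is ambiguous, since most samples were never reanalyzed.
```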

Obtaining clean data from laboratories

Having an accurate, comprehensive, historical database for a facility provides a variety of benefits, but requires that consistency be enforced when data is being added to the database. Matching analytical data coming from laboratories with previous data in a database can be a time-consuming process.


Figure 57 - Duplicate and superseded data (example records for station MW-1 showing a duplicate at the Samples level and superseded results at the Analyses level, with the unique index fields for water samples and for water analyses marked)

Variation in station names, spelling of constituent names, abbreviation of units, and problems with other data elements can result in data that does not tie in with historical data, or, even worse, does not get imported at all because of referential integrity constraints. An alternative is a time-consuming data checking and cleanup process with each data deliverable, which is standard operating procedure for many projects.

WORKING WITH LABS - STANDARDIZING DELIVERABLES

The process of getting the data from the laboratory in a consistent, usable format is a key element of a successful data management system. Appendix C contains a data transfer standard (DTS) that can be used to inform the lab how to deliver data. EDDs should be in the same format every time, with all of the information necessary to successfully import the data into the database and tie it with field samples, if they are already there. Problems with EDDs fall into two general areas: 1) data format problems and 2) data content problems. In addition, if data is gathered in the field (pH, turbidity, water level, etc.) then that data must be tied to the laboratory data once the data administrator has received both data sets. Data format problems fall into two areas: 1) file format and 2) data organization. The DTS can help with both of these by defining the formats (text file, Excel spreadsheet, etc.) acceptable to the data management system, and the columns of data in the file (data elements, order, width, etc.). Data content problems are more difficult, because they involve consistency between what the lab is generating and what is already in the database. Variation in station names (is it “MW1” or “MW-1”?), spelling of constituent names, abbreviation of units, and problems with other data elements can result in data that does not tie in with historical data. Even worse, the data may not get imported at all because of referential integrity constraints defined in the data management system.


Figure 58 - Export laboratory reference file

USING REFERENCE FILES AND A CLOSED-LOOP SYSTEM

While project managers expect their laboratories to provide them with “clean” data, on most projects it is difficult for the laboratory to deliver data that is consistent with data already in the database. What is needed is a way for the project personnel to keep the laboratory updated with information on the various data elements that must be matched in order for the data to import properly. Then the laboratory needs a way to efficiently check its electronic data deliverable (EDD) against this information prior to delivering it to the user. When this is done, then project personnel can import the data cleanly, with minimal impact on the data generation process at the laboratory.

It is possible to implement a system that cuts the time to import a laboratory deliverable by a factor of five to ten over traditional methods. The process involves a DTS as described in Appendix C to define how the data is to be delivered, and a closed-loop reference file system where the laboratory compares the data it is about to deliver to a reference file provided by the database user. Users employ their database software to create the reference file. This reference file is then sent to the laboratory. The laboratory prepares the electronic data deliverable (EDD) in the usual way, following the DTS, and then uses the database software to do a test import against the reference file. If the EDD imports successfully, the laboratory sends it to the user. If it does not, the laboratory can make changes to the file, test it again, and once successful, send it to the user. Users can then import this file with a minimum of effort because consistency problems have been eliminated before they receive it. This results in significant time-savings over the life of a project.
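A sketch of the closed-loop idea: the data administrator exports the values an EDD must match to a reference file, and the laboratory runs a test check of its EDD against that file before sending it. The file layouts and field names below are assumptions for illustration only:

```python
import csv
import json

def export_reference(stations, parameters, units, path):
    """Database side: write the values the EDD must match to a reference file."""
    with open(path, "w") as f:
        json.dump({"stations": sorted(stations),
                   "parameters": sorted(parameters),
                   "units": sorted(units)}, f, indent=2)

def test_edd(edd_path, reference_path):
    """Laboratory side: list every EDD value that will not match the user's database."""
    with open(reference_path) as f:
        ref = json.load(f)
    errors = []
    with open(edd_path, newline="") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            for field, valid in (("station", "stations"), ("parameter", "parameters"), ("units", "units")):
                if row[field] not in ref[valid]:
                    errors.append(f"line {line_no}: unknown {field} {row[field]!r}")
    return errors   # the lab fixes anything reported here, re-tests, and only then sends the EDD
```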

If the database tracks which laboratories are associated with which sites, then the creation of the reference file can start with selection of the laboratory. An example screen to start the process is shown in Figure 58.

In this example, the software knows which sites are associated with the laboratory, and also knows the name to be used for the reference file. The user selects the laboratory, confirms the file name, and clicks on Create File. The file can then be sent to the laboratory via email or on a disk. This process is done any time there are significant changes to the database that might affect the laboratory, such as installation of new stations (for that laboratory’s sites) or changes to the lookup tables.

There are many benefits to having a centralized, open database available to project personnel. In order to have this work effectively the data in the database must be accurate and consistent. Achieving this consistency can be a time-consuming process. By using a comprehensive data transfer standard, and the closed-loop system described above, this time can be minimized. In one organization the average time to import a laboratory deliverable was reduced from 30 minutes down to 5 minutes using this process. Another major benefit of this process is higher data quality. This increase in quality comes from two sources. The first is that there will be fewer errors in the data deliverable, and consequently fewer errors in the database, because a whole class of errors related to data mismatches has been completely eliminated. A second increase in quality is a consequence of the increased efficiency of the import process. The data administrator has more time to scrutinize the data during and after import, making it easier to eliminate many other errors that would have been missed without this scrutiny.

Automated checking

Effective importing of laboratory and other data should include data checking prior to import to identify errors and to assist with the resolution of those errors prior to placing the data in the system. Data checking spans a range of activities from consistency checking through verification and validation. Performing all of the checks won’t ensure that no bad data ever gets into the database, but it will cut down significantly on the number of errors. The verification and validation components are discussed in more detail in Chapter 16. The consistency checks should include evaluation of key data elements, including referential integrity (existence of parents); valid site (project) and station (well); valid parameters, units, and flags; handling of duplicate results (same station, sample date and depth, and parameter); reasonable values for each parameter; comparison with like data; and comparison with previous data.

The software importing the data should perform all of the data checks and report on the results before importing the data. It’s not helpful to have it give up after finding one error, since there may well be more, and it might as well find and flag all of them so you can fix them all at once. Unfortunately, this is not always possible. For example, valid station names are associated with a specific site, so if the site in the import file is wrong, or hasn’t been entered in the sites table, then the program can’t check the station names. Once the program has a valid site, though, it should be able to perform the rest of the checks before stopping.
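The report-everything-at-once behavior might look like the following sketch, where the lookup sets and column names are hypothetical stand-ins for the EDMS tables:

```python
def check_import_file(rows, sites, stations_by_site, parameters, units, flags):
    """Run all consistency checks and return the full list of problems before importing anything."""
    errors = []
    site = rows[0]["site"] if rows else None
    if site not in sites:
        # Without a valid site the station names cannot be resolved, so only this is reported.
        return [f"unknown site {site!r}; remaining checks skipped"]
    for i, row in enumerate(rows, start=1):
        if row["station"] not in stations_by_site.get(site, set()):
            errors.append(f"row {i}: unknown station {row['station']!r}")
        if row["parameter"] not in parameters:
            errors.append(f"row {i}: unknown parameter {row['parameter']!r}")
        if row["units"] not in units:
            errors.append(f"row {i}: unknown units {row['units']!r}")
        if row.get("flag") and row["flag"] not in flags:
            errors.append(f"row {i}: unknown flag {row['flag']!r}")
    return errors   # the file is imported only if this list comes back empty
```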

Of course, all of this assumes that the file being imported is in a format that matches what the software is looking for. If site name is in the column where the result values should be, the import should fail, unless the software is smart enough to straighten it out for you.

Figure 59 shows an example of a screen where the user is being asked what software-assisted data checking they want performed, and how to handle specific situations resulting from the checking.

Figure 59 - Screen for software-assisted data checking


Figure 60 - Screen for editing data prior to import

You might want to look at the data prior to importing it. Figure 60 shows an example of a screen to help you do this. If edits are made to the laboratory deliverable, it is important that a record be kept of these changes for future reference.

REFERENTIAL INTEGRITY CHECKING

A properly designed EDMS program based on the relational model should require that a parent entry exist before related child entries can be imported. (Surprisingly, not all do.) This means that a site must exist before stations for that site can be entered, and so on through stations, samples, and analyses. Relationships with lookups should also be enforced, meaning that values related to a lookup, such as sample matrix, must be present and match entries in the lookup table. This helps ensure that “orphan” data does not exist in the tables. Unfortunately, the database system itself, such as Access, usually doesn’t give you much help when referential integrity problems occur. It fails to import the record(s), and provides an error message that may, or may not, give you some useful information about what happened. Usually it is the job of the application software running within the database system to check the data and provide more detailed information about problems.
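For a system that does not enforce these relationships itself, the parent-must-exist rule and an orphan check can be expressed directly against the database engine. This SQLite sketch uses invented Sites and Stations tables purely for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")        # ask the engine to enforce parent/child links
con.executescript("""
    CREATE TABLE Sites    (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE Stations (StationID INTEGER PRIMARY KEY,
                           SiteID INTEGER REFERENCES Sites(SiteID),
                           StationName TEXT);
""")
con.execute("INSERT INTO Sites VALUES (1, 'Example Facility')")
con.execute("INSERT INTO Stations VALUES (10, 1, 'MW-1')")        # parent exists: accepted
try:
    con.execute("INSERT INTO Stations VALUES (11, 99, 'MW-2')")   # no such site: rejected
except sqlite3.IntegrityError as err:
    # The raw engine message is terse; the application layer should translate it for the user.
    print("referential integrity problem:", err)

# A query like this finds orphan stations that slipped in while enforcement was off:
orphans = con.execute(
    "SELECT StationName FROM Stations WHERE SiteID NOT IN (SELECT SiteID FROM Sites)"
).fetchall()
print("orphans:", orphans)
```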

CHECKING SITES AND STATIONS

When data is obtained from the lab it must contain information about the sites and samples associated with the data. It is usually not a good idea to add this data to the main data tables automatically based on the lab data file. This is because it is too easy to get bad records in these two tables and then have the data being imported associated with those bad records. In our experience, it is more likely that the lab has misspelled the station name than that you really drilled a new well, although obviously this is not always the case. It is better to enter the sites and stations first, and then associate the samples and analyses with that data during import. Then the import should check to make sure the sites and stations are there, and tell you if they aren’t, so you can do something about it.

On many projects the sample information follows two paths. The samples and field data are gathered in the field. The samples go to the laboratory for analysis, and that data arrives in the electronic data deliverable (EDD) from the laboratory. The field data may arrive directly from the field, or may be input by the laboratory.


Figure 61 - Helper screen for checking station names

If the field data arrives separately from the laboratory data, it can be entered into the EDMS prior to arrival of the EDD from the laboratory. This entry can be done in the field in a portable computer or PDA, in a field office at the site, or in the main office. Then the EDMS needs to be able to associate the field information with the laboratory information when the EDD is imported. Another approach is to enter the sample information prior to the sampling event. Then the EDMS can check the field data and laboratory data as it arrives for completeness. The process needs to be flexible enough to accommodate legitimate changes resulting from field activities (well MW-1 was dry), but also notify the data administrator of data that should be there but is missing. This checking can be performed on data at both the sample and analyses levels.

The screen shown in Figure 61 shows the software helping with the data checking process. The user has imported a laboratory data file that has some problems with station names. The program is showing the names of the stations that don’t match entries already in the database, and providing a list of valid stations to choose from. The user can step through the problem stations, choosing the correct names. If they are able to correctly match all of the stations, the import can proceed. If not, they will need to put this import aside while they research the station names that have problems.

The import routine may provide an option to convert the data to consistent units, and this is useful for some projects. For other projects (perhaps most), data is imported as it was reported by the laboratory, and conversion to consistent units is done, if at all, at retrieval time. This is discussed in Chapter 19. The decision about whether to convert to consistent units during import should be made on a project-by-project basis, based on the needs of the data users. In general, if the data will be used entirely for site analysis, it probably makes sense to convert to consistent units so retrieval errors due to mixed units are eliminated. If the data will be used for regulatory and litigation purposes, it is better to import the data as-is, and do conversion on output.
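A small sketch of the convert-on-output choice; the conversion table and unit names are illustrative only, and a real system would carry per-parameter conversions and check that source and target units are compatible:

```python
# Multiplicative factors to a common reporting unit (here ug/L); illustrative values only.
TO_UG_L = {"ug/l": 1.0, "mg/l": 1000.0, "ng/l": 0.001}

def as_reported(value, units):
    """Import path: store the value exactly as the laboratory reported it."""
    return value, units

def for_report(value, units, target="ug/l"):
    """Retrieval path: convert to consistent units only when building output."""
    factor = TO_UG_L[units.lower()] / TO_UG_L[target]
    return value * factor, target

print(for_report(0.05, "mg/L"))   # (50.0, 'ug/l')
```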

CHECKING PARAMETERS, UNITS, AND FLAGS

After the import routine is happy with the sites and stations in the file, it should check the other data, as much as possible, to try to eliminate inconsistent data. Data in the import file should be compared to lookup tables in the database to weed out errors. Parameter names in particular provide a great opportunity for error, as do reporting units, flags, and other data.


Figure 62 - Screen for entering defaults for required values

The system should provide screens similar to Figure 61 to help fix bad values, and flag records that have issues that can’t be resolved so that they can be researched and fixed.

Note that comparing values against existing data like sites and stations, or against lookups, only makes sure that the data makes sense, not that it is really right. A value can pass a comparison test against a lookup and still be wrong. After a successful test of the import file, it is critical that the actual data values be checked to an appropriate level before the data is used.

Sometimes the data being imported may not contain all of the data necessary to satisfy referential integrity constraints. For example, historical data being imported may not have information on sample filtration or measurement basis, or even the sample matrix, if all of the data in the file has the same matrix. The records going into the tables need to have values in these fields because of their relationships to the lookup tables, and also so that the data is useful. It is helpful if the software provides a way to set reasonable defaults for these values, as shown in Figure 62, so the data can be imported without a lot of manual editing. Obviously, this feature should be used with care, based on good knowledge of the data being imported, to avoid assigning incorrect values.
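A sketch of the default-filling step, with hypothetical field names; the point is that defaults are applied only where the deliverable left a required field blank:

```python
def apply_defaults(rows, defaults):
    """Fill required lookup-related fields (matrix, filtered, basis, ...) only where they are empty."""
    filled = []
    for row in rows:
        row = dict(row)
        for field, default in defaults.items():
            if not row.get(field):          # leave values supplied by the laboratory untouched
                row[field] = default
        filled.append(row)
    return filled

rows = [{"station": "MW-1", "parameter": "Arsenic", "matrix": "", "filtered": ""}]
print(apply_defaults(rows, {"matrix": "Water", "filtered": "Total", "basis": "None"}))
```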

OTHER CHECKS

There are a number of other checks that the software can perform to improve the quality of the data being imported.

Checking for repeated import – In the confusion of importing data, it is easy to accidentally import, or at least try to import, the same data more than once. The software should look for this, tell you about it, and give you the opportunity to stop the import. It is also helpful if the software gives you a way to undo an import later if a file shouldn’t have been imported for one reason or another.
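One simple way to catch a repeated import is to keep a fingerprint of every file already loaded and compare against it before importing; this sketch hashes the file contents and assumes the fingerprints are recorded somewhere such as an import log:

```python
import hashlib

def file_fingerprint(path):
    """Hash the EDD contents so the same data is recognized even if the file was renamed."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def check_not_already_imported(path, previously_imported):
    """previously_imported: the set of fingerprints recorded for earlier imports."""
    digest = file_fingerprint(path)
    if digest in previously_imported:
        raise RuntimeError(f"{path} appears to have been imported already; stopping for confirmation")
    return digest   # record this once the import succeeds, so it can be checked next time
```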

Parameter-specific reasonableness – Going beyond checking names, codes, etc., the software should check the data for reasonableness of values on a parameter-by-parameter basis. For example, if a pH value comes in outside the range of 0 to 14, then the software could notice and complain. Setting up and managing a process like this takes a considerable amount of effort, but results in better data quality.
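A sketch of parameter-specific range checking; the pH limits come from the example above, while the other range and the field names are placeholders each project would set for itself:

```python
# Acceptable value ranges per parameter; maintaining this table is the hard part.
REASONABLE_RANGE = {
    "pH": (0.0, 14.0),                 # from the example in the text
    "Temperature (C)": (-5.0, 60.0),   # placeholder limits
}

def reasonableness_errors(rows):
    """Flag results that fall outside the configured range for their parameter."""
    errors = []
    for i, row in enumerate(rows, start=1):
        limits = REASONABLE_RANGE.get(row["parameter"])
        if limits and not (limits[0] <= row["value"] <= limits[1]):
            errors.append(f"row {i}: {row['parameter']} = {row['value']} outside {limits}")
    return errors

print(reasonableness_errors([{"parameter": "pH", "value": 17.2}]))
```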

Comparison with like data – Sometimes there are comparisons that can be made within the data set to help identify incorrect values. One example is comparing total dissolved solids reported by the lab with the sum of all of the individual constituents, and flagging the data if the difference exceeds a certain amount. Another is to do a charge balance comparison. Again, this is not easy to set up and operate, but results in better data quality.
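A sketch of the TDS comparison with an invented tolerance; a charge balance check would follow the same pattern using equivalents instead of concentrations:

```python
def tds_check(reported_tds, constituent_sum, tolerance=0.2):
    """Flag the sample if reported TDS and the summed constituents disagree by more than the tolerance."""
    if reported_tds <= 0:
        return "cannot check: non-positive TDS"
    relative_difference = abs(reported_tds - constituent_sum) / reported_tds
    return "flag for review" if relative_difference > tolerance else "ok"

# Example: lab reports 480 mg/L TDS but the individual constituents add up to 610 mg/L.
print(tds_check(480.0, 610.0))   # flag for review
```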

Comparison with previous data – In situations where data is being gathered on a regular basis, new data can be compared to historical data, and data that differs from previous data by more than a certain amount (usually some number of standard deviations from the mean) is suspect. These data points are often referred to as outliers. The data point can then be researched for error, re-sampled, or excluded, depending on the regulations for that specific project. The field of statistical quality control has various tools for performing this analysis, including Shewhart and Cumulative Sum control charts and other graphical and non-graphical techniques. See Chapters 20 and 23 for more information.
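A minimal sketch of the standard-deviation screen described above; the three-sigma threshold is only an example, and a production system would use the control chart methods referenced in Chapters 20 and 23:

```python
from statistics import mean, stdev

def is_outlier(new_value, historical_values, sigmas=3.0):
    """Flag a new result that lies more than `sigmas` standard deviations from the historical mean."""
    if len(historical_values) < 2:
        return False                      # not enough history to judge
    m, s = mean(historical_values), stdev(historical_values)
    if s == 0:
        return new_value != m
    return abs(new_value - m) > sigmas * s

history = [12.1, 11.8, 12.4, 12.0, 11.9]   # e.g. chloride results from earlier sampling events
print(is_outlier(25.0, history))           # True: research, re-sample, or exclude per project rules
```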

CONTENT-SPECIFIC FILTERING

At times there will be specific data content that needs to be handled in a special way during import. Some data will require specific attention when it is present in the import. For example, one project that we worked on had various problems over time with phenols. At different times the laboratory reported phenols in different ways. For this project, any file that contained any variety of phenol required specific attention. In another case, the procedure for a project specified that tentatively identified compounds (TICs) should not be imported at all. The database software should be able to handle these two situations, allowing records with specific data content to be either flagged or not imported. Figure 63 shows an example of a screen to help with this.
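The two behaviors described above, flagging some content and refusing other content, can be captured in a small rule table. The phenol and TIC rules below mirror the examples in the text; the field names and rule structure are assumptions:

```python
# Each rule: a test on the row, plus the action to take when it matches.
CONTENT_RULES = [
    (lambda row: "phenol" in row["parameter"].lower(), "flag"),   # any variety of phenol gets flagged
    (lambda row: row.get("tic_flag") == "Y", "skip"),             # tentatively identified compounds not imported
]

def apply_content_rules(rows):
    """Split incoming rows into those to import (possibly flagged) and those to set aside."""
    to_import, skipped = [], []
    for row in rows:
        action = next((a for test, a in CONTENT_RULES if test(row)), None)
        if action == "skip":
            skipped.append(row)            # save these to a file for later reference
        else:
            if action == "flag":
                row = dict(row, review_flag="content rule")
            to_import.append(row)
    return to_import, skipped
```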

Some projects allow the data administrator to manually select which data will be imported. This sounds strange to many people, but we have worked on projects where each line in the EDD is inspected to make sure that it should be imported. If a particular constituent is not required by the project plan, and the laboratory delivered it anyway, that line is deleted prior to import. In Figure 60 the checkbox near the top of the screen is used for this purpose. The software should allow deleted records to be saved to a file for later reference if necessary.

Figure 63 - Screen to configure content-specific filtering


Figure 64 - Screens showing results of a successful and an unsuccessful import


Figure 66 - Report showing a successful import

The report from the unsuccessful import can be used to resolve problems prior to trying the import again. At this stage it is helpful for the software to be able to summarize the errors so an error that occurs many times is shown only once. Then each type of error can be fixed generically and the report re-run to make sure all of the errors have been remedied so you can proceed with the import.

The report from the successful import provides a permanent record of what was imported. This report can be used for another purpose as well. In the upper left corner is a panel (shown larger in Figure 67) showing the data review steps that may apply to this data. This report can be circulated among the project team members and used to track which review steps have been performed. After all of the appropriate steps have been performed, the report can be returned to the data administrator to enter the upgraded review status for the analyses.

UNDOING AN IMPORT

Despite your best efforts, sometimes data is imported that either should not have been imported or is incorrect. If the software provides an Undo Import feature, it can back the data out automatically for you. The database software should track the data that you import so you can undo an import if necessary. You might need to do this if you find out that a particular file that you imported has errors and is being replaced, or if you accidentally imported a file twice. An undo import feature should be easy to use but sophisticated, leaving samples in the database that have analyses from a different import, and undoing superseded values that were incremented by the import of the file being undone. Figure 68 shows a form to help you select an import for deletion.
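A sketch of the undo logic, assuming each imported record carries the identifier of the import that created it (an ImportID column) and that a SupersededByImportID column records which import incremented an earlier result; both columns are invented here for illustration, and con is an open database connection:

```python
def undo_import(con, import_id):
    """Remove everything a single import added, then repair superseded counters and empty samples."""
    # 1) Delete the analyses that came in with this import.
    con.execute("DELETE FROM Analyses WHERE ImportID = ?", (import_id,))
    # 2) Decrement superseded values that this import incremented (simplified: assumes the
    #    undone import superseded at most one earlier result per analysis).
    con.execute("""UPDATE Analyses SET Superseded = Superseded - 1
                   WHERE Superseded > 0 AND SupersededByImportID = ?""", (import_id,))
    # 3) Delete samples from this import only if no analyses from other imports still use them.
    con.execute("""DELETE FROM Samples
                   WHERE ImportID = ?
                     AND SampleID NOT IN (SELECT SampleID FROM Analyses)""", (import_id,))
    con.commit()
```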


Figure 67 - Section of successful import report used for data review

Figure 68 - Form to select an import for deletion

TRACKING QUALITY

A constant focus on quality should be maintained during the import process. Each result in the database should be marked with flags regarding lab and other problems, and should also be marked with the level of data review that has been applied to that result. An example of a screen to assist with maintaining data review status is shown in Figure 75 in Chapter 15.

If the import process is managed properly using software with a sufficiently sophisticated import tool, and if the data is checked properly after import, then the resulting data will be of a quality that makes it useful to the project. The old axiom of “garbage in, garbage out” holds true with environmental data. Another old axiom says “a job worth doing is worth doing well,” or in other words, “If you don’t have time to do it right the first time, how are you ever going to find time to do it again?” These old saws reinforce the point that the time invested in implementing a robust checking system and using it properly will be rewarded by producing data that people can trust.


CHAPTER 14

EDITING DATA

Once the data is in the database it is sometimes necessary to modify it. This can be done manually or using automated tools, depending on the task to be accomplished. These two processes are described here. Due to the focus on data integrity, a log of all changes to the data should be maintained, either by the software or manually in a logbook.

MANUAL EDITING

Sometimes it is necessary to go into the database and change specific pieces of data content. Actually, modification of data in an EDMS is not as common as an outsider might expect. For the most part, the data comes from elsewhere, such as the field or the laboratory, and once it is in it stays the way it is. Data editing is mostly limited to correcting errors (which, if the process is working correctly, should be minimal) and modifying data qualifiers such as review status and validation flags.

The data management system will usually provide at least one way to manually edit data. Sometimes the user interface will provide more than one way to view and edit data. Two examples include form view (Figure 69) and datasheet view (Figure 70).

Figure 69 - Site data editing screen in form view


Figure 70 - Site data editing screen in datasheet view

AUTOMATED EDITING

If the changes involve more than one record at a time, then it probably makes sense to use an automated approach. For specific types of changes that are a standard part of data maintenance, this should be programmed into the system. Other changes might be a one-time action, but involve multiple records with the same change, so a bulk update approach using ad hoc queries is better.

Standardized tasks

Some data editing activities are relatively common. For these activities, especially if they involve a lot of records to be changed or a complicated change process, the software should provide an automated or semi-automated process to assist the data administrator with making the changes. The examples given here include both a simple process and a complicated one to show how the system can provide this type of capability.

UPDATING REVIEW STATUS

It’s important to track the review status of the data, that is, what review steps have been performed on the data. An automated editing step can help update the data as review steps are completed. Automated queries should allow the data administrators to update the review status flags after appropriate data checks have been made. An example of a screen to assist with maintaining data review status is shown in Figure 75 in Chapter 15.

REMOVAL OF DUPLICATED ENTRIES

Repeated records can enter the database in several ways. The laboratory may deliver data that has already been delivered, either a whole EDD or part of one. Data administrators may import the same file twice without noticing. (The EDMS should notify them if they try to do this.) Data that has been imported from the lab may also be imported from a data validator with partial or complete overlap. The lab may include field data, which has already been imported, along with its data. However it gets in, this repeated data provides no value and should be removed, and records kept of the changes that were made to the database. However, duplicated data resulting from the quality control process usually is of value to the project, and should not be removed.

Repeated information can be present in the database at the samples level, the analyses level, or both. The removal of duplicated records should address both levels, starting at the samples level, and then moving down to the analyses level. This order is important because removing repeated samples can result in more repeated analyses, which will then need to be removed. The samples component of the duplicated record removal process is complicated by the fact that samples have analyses underneath them, and when a duplicate sample is removed, the analyses should probably not be lost, but rather moved to the remaining sample. The software should help you do this by letting you pick the sample to which you want to move the analyses. Then the software should modify the superseded value of the affected analyses, if necessary, and assign them to the other sample.


Figure 71 - Form for moving analyses from a duplicate sample

The analyses being moved may in fact represent duplicated data themselves, and the duplicated record removal at the analyses level can be used to remove these results. The analyses component of the duplicated record removal process must deal with the situation that, in some cases, redundant data is desirable. The best example is groundwater samples, where four independent observations of pH are often taken, and should all be saved. The database should allow you to specify for each parameter and each site and matrix how many observations should be allowed before the data is considered redundant.

The first step in the duplicated record removal process is to select the data to be checked. Normally you will want to work with all of the data for a sampling event. Once you have selected the set of data to work on, the program should look for samples that might be repeated information. It should do this by determining samples that have the same site, station, matrix, sample date, top and base, and lab sample ID. Once the software has made its recommendations for samples you might want to remove, the data should be displayed for you to confirm the action. Before removing any samples, you should print a report showing the samples that are candidates for removal. You should then make notes on this report about any actions taken regarding removal of duplicated sample records, and save the printed report in the project file.

If a sample to be removed has related analyses, then the analyses must be moved to another sample before the candidate sample can be deleted. This might be the case if somehow some analyses were associated with one sample in the database and other analyses with another, and in fact only one sample was taken. In that case, the analyses should be moved to the sample with a duplicate value of zero from the one with a higher duplicate value, and then the sample with a higher duplicate value should be deleted. The software should display the sample with the higher duplicate value first, as this is the one most likely to be removed, and display a sample that is a likely target to move the analyses to. A screen for a sample with analyses to be moved might look like Figure 71. The screen has a notation that the sample has analyses, and provides a combo box, in gray, for you to select a sample to move the analyses to. If the sample being displayed does not have analyses, or once they have been moved to another sample, then it can be deleted. In this case, the screen might look like Figure 72.

Once you have moved analyses as necessary and deleted improper duplicates, the program should look for analyses that might contain repeated information. It can do this using the following process: 1) Determine all of the parameters in the selection set. 2) Determine the number of desired observations for each parameter. Use site-specific information if it is present. If it is not, use global information. If observation data is not available, either site-specific or global, for one or more parameters, the software should notify you, and provide the option of stopping or proceeding. 3) Determine which analyses for each parameter exceed the observations count.


Figure 72 - Form for deleting duplicate samples for a sample without analyses

Next, the software should recommend analyses for removal. The goal of this process is to remove duplicated information, while, for each sample, retaining the records with the most data. The program can use the following process: 1) Group all analyses where the sample and parameter are the same. 2) If all of the data is exactly the same in all of the fields (except for AnalysisNumber and Superseded), mark all but one for deletion. 3) If all of the data is not exactly the same, look at the Value, AnalyticMethod, AnalDate_D, Lab, DilutionFactor, QCAnalysisCode, and AnalysisLabID fields. If the records are different in any of these fields, keep them. For records that are the same in all of these fields, mark all but one for deletion. (The user should be able to modify the program’s selections prior to deletion.) If the data in all of these fields is the same, then keep the record with the greatest number of other data fields populated, and mark the others for removal. Once the software has made its recommendations for analyses to be removed, the data should be displayed in a form such as that shown in Figure 73.
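The recommendation logic reads naturally as a grouping operation. This sketch uses the field names listed above, treats each analysis as a dictionary, and simply collects candidates for the user to review rather than deleting anything:

```python
from collections import defaultdict

COMPARE_FIELDS = ("Value", "AnalyticMethod", "AnalDate_D", "Lab",
                  "DilutionFactor", "QCAnalysisCode", "AnalysisLabID")

def recommend_removals(analyses):
    """Group by sample and parameter, then mark redundant records, keeping the fullest one."""
    groups = defaultdict(list)
    for a in analyses:
        groups[(a["SampleID"], a["Parameter"])].append(a)

    marked = []
    for records in groups.values():
        if len(records) < 2:
            continue
        # Records that differ in any comparison field are all kept; records that agree in
        # every comparison field are redundant, so keep the one with the most fields populated.
        subgroups = defaultdict(list)
        for a in records:
            subgroups[tuple(a.get(f) for f in COMPARE_FIELDS)].append(a)
        for same in subgroups.values():
            if len(same) > 1:
                keep = max(same, key=lambda a: sum(v is not None for v in a.values()))
                marked.extend(a for a in same if a is not keep)
    return marked   # shown to the user, who can change the selections before anything is deleted
```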

In this example, the software has selected several analyses for removal. Visible on the screen are two Arsenic and two Chloride analyses, and one of each has been selected for removal. In this case, this appears appropriate, since the data is exactly duplicated. The information on this screen should be reviewed carefully by someone very familiar with the site. You should look at each analysis and the recommendation to confirm that the software has selected the correct action. After selecting analyses for removal, but before removing any analyses, you should print a report showing the analyses that have been selected for removal. You should save the printed report in the project file.

There are two parts to the Duplicated Record Removal process for analyses. The first part is the actual removal of the analytical records. This can be done with a simple delete query, after users are asked to confirm that they really want to delete the records. The second part is to modify the superseded values as necessary to remove any gaps caused by the removal process. This should be done automatically after the removal has been performed.

PARAMETER PRINT REORDERING

This task is an example of a relatively simple process that the software can automate. It has to do with the order that results appear on reports. A query or report may display the results in alphabetical order by parameter name. The data user may not want to see it this way. A more useful order may be to see the data grouped by category, such as all of the metals followed by all of the organics. Or perhaps the user wants to enter some specific order, and have the system remember it and use it.


Figure 73 - Form for deleting duplicated analyses

A good way to implement a specific order is to have a field somewhere in the database, such as in the Parameters table, that can be used in queries to display the data in the desired order. For the case where users want the results in a specific order, they can manually edit this field until the order is the way they want it.

For the case of putting the parameters in order by category, the software can also help. A tool can be provided to do the reordering automatically. The program needs to open a query of the parameters in order by category and name, and then assign print orders in increasing numbers from the first to the last. If the software is set up to skip some increment between each, then the user can slip a new one in the middle without needing to redo the reordering process. The software can also be set up to allow you to specify an order for the categories themselves that is different from alphabetical, in case you want the organics first instead of the metals.
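A sketch of such a reordering tool: sort the parameters by category (in a user-specified category order) and then by name, and assign print orders in steps of ten so a new parameter can be slotted in later without renumbering. The data structures are hypothetical:

```python
def assign_print_order(parameters, category_order, step=10):
    """parameters: list of dicts with 'name' and 'category'; returns them with a PrintOrder field set."""
    rank = {cat: i for i, cat in enumerate(category_order)}
    ordered = sorted(parameters,
                     key=lambda p: (rank.get(p["category"], len(rank)), p["name"]))
    for i, p in enumerate(ordered):
        p["PrintOrder"] = (i + 1) * step      # gaps of 10 leave room to slip new parameters in
    return ordered

params = [{"name": "Benzene", "category": "Organics"},
          {"name": "Arsenic", "category": "Metals"},
          {"name": "Lead",    "category": "Metals"}]
for p in assign_print_order(params, ["Metals", "Organics"]):
    print(p["PrintOrder"], p["category"], p["name"])
```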

Ad hoc queries

Where the change to be made affects multiple records, but will be performed only once, or a small number of times over the life of the database, it doesn’t make sense to provide an automated tool, but manual entry is too tedious. An example of this is shown in Figure 74. The problem is that when the stations were entered, their current status was set to “z” for “Unknown,” even though only active wells were entered at that time. Now that some inactive wells are to be entered, the status needs to be set to “s” for “In service.”

Figure 74 shows an example of an update query to do this. The left panel shows the query in design view, and the right panel in SQL view. The data administrator has told the software to update the Stations table, setting the CurrentStatusCode field to “s” where it is currently “z.” The query will then make this change for all of the appropriate records in one step, instead of the data administrator having to make the change to each record individually.
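The SQL behind this kind of update query is short. The statement below matches the change described in the text, using the table and field names from the example, and is shown against a SQLite connection purely for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Stations (StationName TEXT, CurrentStatusCode TEXT)")
con.executemany("INSERT INTO Stations VALUES (?, ?)",
                [("MW-1", "z"), ("MW-2", "z"), ("MW-3", "s")])

# One statement updates every matching record; document the change in the activity log
# and keep a backup copy of the database before running it.
con.execute("UPDATE Stations SET CurrentStatusCode = 's' WHERE CurrentStatusCode = 'z'")
con.commit()
print(con.execute("SELECT StationName, CurrentStatusCode FROM Stations").fetchall())
```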

This type of ad hoc query can be a great time saver in the hands of a knowledgeable user. It should be used with great care, though, because of the potential to cause great damage to the database. Changes made in this way should be fully documented in the activity log, and backup copies of the database maintained in case it is done wrong.


Figure 74 - Ad hoc query showing a change to the CurrentStatusCode field


QA VS QC

Quality assurance (QA) is an integrated system of activities involving planning, quality control, quality assessment, reporting, and quality improvement to ensure that a product or service meets defined standards of quality with a stated level of confidence. Quality control (QC) is the overall system of technical activities whose purpose is to measure and control the quality of a product or service so that it meets the needs of users. The aim is to provide quality that is satisfactory, adequate, dependable, and economical (EPA, 1997a). In an over-generalization, QA talks about it and QC does it. Since the EDMS involves primarily the technical data and activities that surround it, including quantification of the quality of the data, it comes under QC more than QA. An EMS and the related EMIS (see Chapter 1), on the other hand, cover the QA component.

THE QAPP

The quality assurance project plan (QAPP) provides guidance to the project to maintain the quality of the data gathered for the project. The following are typical minimum requirements for a QAPP for EPA projects:

Project management

• Title and approval sheet

• Table of Contents – Document control format

• Distribution List – Distribution list for the QAPP revisions and final guidance

• Project/Task Organization – Identify individuals or organizations participating in the project and discuss their roles, responsibilities, and organization


• Problem Definition/Background – 1) State the specific problem to be solved or the decision to be made. 2) Identify the decision maker and the principal customer for the results

• Project/Task Description – 1) Hypothesis test, 2) expected measurements, 3) ARARs or other appropriate standards, 4) assessment tools (technical audits), 5) work schedule and required reports

• Data Quality Objectives for Measurement – Data decision(s), population parameter of interest, action level, summary statistics, and acceptable limits on decision errors. Also, scope of the project (domain or geographical locale)

• Special Training Requirements/Certification – Identify special training that personnel will need

• Documentation and Record – Itemize the information and records that must be included in a data report package, including report format and requirements for storage, etc.

Measurement/data acquisition

• Sampling Process Designs (Experimental Design) – Outline the experimental design, including sampling design and rationale, sampling frequencies, matrices, and measurement parameter of interest

• Sampling Methods Requirements – Sample collection method and approach

• Sample Handling and Custody Requirements – Describe the provisions for sample labeling, shipment, chain of custody forms, procedures for transferring and maintaining custody of samples

• Analytical Methods Requirements – Identify analytical method(s) and equipment for the study, including method performance requirements

• Quality Control Requirements – Describe routine (real-time) QC procedures that should be associated with each sampling and measurement technique. List required QC checks and corrective action procedures

• Instrument/Equipment Testing Inspection and Maintenance Requirements – Discuss how inspection and acceptance testing, including the use of QC samples, must be performed to ensure their intended use as specified by the design

• Instrument Calibration and Frequency – Identify tools, gauges and instruments, and other sampling or measurement devices that need calibration. Describe how the calibration should be performed

Assessment/oversight

• Assessments and Response Actions – Describe the assessment activities for this project

• Reports to Management – Identify the frequency, content, and distribution of reports issued to keep management informed

Data validation and usability

• Data Review, Validation, and Verification Requirements – State the criteria used to accept or reject the data based on quality


What Is Quality?

Take a few minutes, put this book down, get a paper and pencil, and write a concise answer to: “What is quality in data management?” It’s harder than it sounds.

“Quality … you know what it is, yet you don’t know what it is. But that’s self-contradictory. But some things are better than others, that is, they have more quality. But when you try to say what the quality is, apart from the things that have it, it all goes poof! There’s nothing to talk about. But if you can’t say what Quality is, how do you know what it is, or how do you know that it even exists? If no one knows what it is, then for all practical purposes it doesn’t exist at all. But for all practical purposes, it really does exist … So round and round you go, spinning mental wheels and nowhere finding anyplace to get traction. What the h _ is Quality? What is it?”

Robert M. Pirsig, 1974 - Zen and the Art of Motorcycle Maintenance

Quality is a little like beauty. We know it when we see it, but it’s hard to say how we know. When you are talking about data quality, be sure the person you are talking to has the same meaning for quality that you do. As an aside, a whole discipline has grown up around Pirsig’s work, called the Metaphysics of Quality. For more information, visit www.moq.org.

• Validation and Verification Methods – Describe the process to be used for validating and verifying data, including the chain of custody for data throughout the lifetime of the project

• Reconciliation with Data Quality Objectives – Describe how results will be evaluated to determine if DQOs have been satisfied

There are many sources of information on how to write a QAPP. The EPA Web site (www.epa.gov) is a good place to start. It is usually not necessary to create the QAPP from scratch. Templates for QAPPs are available from a number of sources, and one of these templates can be modified for the needs of each specific project.

QC SAMPLES AND ANALYSES

Over time, project personnel, laboratories, and regulators have developed a set of procedures to help maintain data quality through the sampling, transportation, analysis, and reporting process. This section describes these procedures and their impact on environmental data management. An attempt has been made to keep the discussion general, but some of the issues discussed here apply to some types of samples more than others. Material in this section is based on information in EPA (1997a); DOE/HWP (1990a, 1990b); and Core Laboratories (1996). This section covers four parts of the process: field samples, field QC samples, lab sample analysis, and lab calibration.

There are several aspects of handling QC data that impact the way it should be handled in the EDMS. The basic purpose of QC samples and analyses is to confirm that the sampling and analysis process is generating results that accurately represent conditions at the site. If a QC sample produces an improper result, it calls into question a suite of results associated with that QC sample. The scope of the questionable suite of results depends on the samples associated with that QC sample. The scope might be a shipping cooler of samples, a sampling event, a laboratory batch, and so on. The questionable results must then be investigated further to determine whether they are still usable.

Another issue is the amount and type of QC data to store. The right answer is to store the data necessary to support the use of the data, and no more or less. The problem is that different projects and uses have different requirements, and different parts of the data handling process can be done either inside or outside the database system. Once the range of data handling processes has been defined for the anticipated project tasks that will be using the system, a decision must be
