PART FIVE - USING THE DATA
CHAPTER 18
DATA SELECTION
An important key to successful use of an EDMS is to allow users to easily find the data they need. There are two ways for the software to assist the user with data selection: text-based and graphical. With text-based queries, the user describes the data to be retrieved using words, generally in the query language of the software. Graphical queries involve selecting data from a graphical display such as a graph or a map. Query-by-form is a hybrid technique that uses a graphical interface to make text-based selections.
TEXT-BASED QUERIES
There are two types of text-based queries: canned and ad hoc. The trade-off is ease of use vs. flexibility.
Canned queries
Canned queries are procedures where the query is prepared ahead of time, and the retrieval is done the same way each time. An example would be a specific report for management or regulators, which is routinely generated from a menu selection screen. The advantage of canned selections is that they can be made very easy to use, since they involve a minimum of choices for the user. The goal of this process is to make it easy to quickly generate the output that will be required most of the time by most of the users. The EDMS should make it easy to add new canned queries, and to connect to external data selection tools if required. Figure 85 shows an example of a screen from Access from which users can select pre-made queries. The different icons next to the queries represent the different query types, including select, insert, update, and delete. The user can execute a query by double-clicking on it. Queries that modify data (action queries), such as insert, update, and delete, display a warning dialog box before performing the action. Other than with the icons, this screen does not separate selection queries from action queries, which results in some risk in the hands of inexperienced or careless users.
Figure 85 - Access database window showing the Queries tab
Ad hoc queries
Sometimes it is necessary to generate output with a format or data content that was not anticipated in the system design. Text selections of this type are called ad hoc queries (“ad hoc” is a Latin term meaning “for this”). These are queries that are created when they are needed for a particular use. This type of selection is more difficult to provide to users, especially casual users, in a way that they can comfortably use. Performing ad hoc text-based queries usually requires that users have a good understanding of the structure and content of the database, as well as a medium to high level of expertise in using the software. The data model should be included with the system documentation to assist them in doing this.
Unfortunately, ad hoc queries also carry a high level of risk that the data retrieved may not be valid. For example, the user may not include the units for analyses, and the database may contain different units for a single parameter sampled at different times. The data retrieved will be invalid if the units are assumed to be the same, and there is no visible indication of the problem. This is particularly dangerous when the user is not seeing the result of the query directly, but using the data indirectly to generate some other result such as statistics or a contour map. In general, it is desirable to formalize and add to the menu as wide a variety of correctly formatted retrievals as possible. Then casual users are likely to get valid results, and “power users” can use the ad hoc queries only as necessary.
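The mixed-units risk can be checked for before retrieved data is passed on to statistics or mapping. The following is a minimal sketch, assuming results arrive as (parameter, value, units) tuples; the function name and the sample values are hypothetical and are not part of any particular EDMS.

```python
from collections import defaultdict

def find_mixed_units(results):
    """Return {parameter: sorted units} for parameters reported in more than one unit."""
    units_seen = defaultdict(set)
    for parameter, _value, units in results:
        units_seen[parameter].add(units)
    return {p: sorted(u) for p, u in units_seen.items() if len(u) > 1}

# Hypothetical retrieval: sulfate was reported in mg/L for one event and ug/L for another.
results = [
    ("Sulfate", 1250.0, "mg/L"),
    ("Sulfate", 980000.0, "ug/L"),
    ("Iron", 0.3, "mg/L"),
]
mixed = find_mixed_units(results)
print(mixed)
```

A non-empty result is a signal that the values should not be averaged, graphed, or contoured until they have been converted to a single unit.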
Figure 86 shows an example of creation of an ad hoc text-based query. The user has created a new query, selected the tables for display, dragged the fields from the tables to the grid, and entered selection criteria. In this case, the user has asked for all “Sulfate” results for the site “Rad Industries” where the value is > 1000. Access has translated this into SQL, which is shown in the second panel, and the user can toggle between the two. The third panel shows the query in datasheet view, which displays the selected data. The design and SQL views contain the same information, although in Access it is possible to write a query, such as a union query, that can’t be displayed in design view and must be shown in SQL. Some advanced users prefer to type in the SQL rather than use design view, but even for them the drag and drop can save typing and minimize errors.
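A query of this kind can be sketched outside Access as well. The example below uses Python's built-in sqlite3 module with a small in-memory database; the table and column names are assumptions made for illustration and are not the data model or the SQL shown in Figure 86, and the data rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical four-level data model: sites, stations, samples, analyses.
conn.executescript("""
    CREATE TABLE Sites    (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE Stations (StationID INTEGER PRIMARY KEY, SiteID INTEGER, StationName TEXT);
    CREATE TABLE Samples  (SampleID INTEGER PRIMARY KEY, StationID INTEGER, SampleDate TEXT);
    CREATE TABLE Analyses (AnalysisID INTEGER PRIMARY KEY, SampleID INTEGER,
                           Parameter TEXT, Value REAL, Units TEXT);
    INSERT INTO Sites VALUES (1, 'Rad Industries');
    INSERT INTO Stations VALUES (1, 1, 'MW-1');
    INSERT INTO Samples VALUES (1, 1, '1985-03-01'), (2, 1, '1985-06-01');
    INSERT INTO Analyses VALUES
        (1, 1, 'Sulfate', 1250.0, 'mg/L'),
        (2, 2, 'Sulfate',  800.0, 'mg/L'),
        (3, 1, 'Iron',    2000.0, 'ug/L');
""")

# All sulfate results for the site where the value is greater than 1000.
query = """
    SELECT st.StationName, sa.SampleDate, an.Parameter, an.Value, an.Units
    FROM Sites AS si
    JOIN Stations AS st ON st.SiteID = si.SiteID
    JOIN Samples  AS sa ON sa.StationID = st.StationID
    JOIN Analyses AS an ON an.SampleID = sa.SampleID
    WHERE si.SiteName = ? AND an.Parameter = ? AND an.Value > ?
    ORDER BY sa.SampleDate
"""
rows = conn.execute(query, ("Rad Industries", "Sulfate", 1000)).fetchall()
for row in rows:
    print(row)
```

The units column is retrieved along with the value so that the result can be checked for mixed units before it is used.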
Figure 86 - A text-based query in design, SQL, and datasheet views
GRAPHICAL SELECTION
A second selection type is graphical selection. In this case, the user generates a graphical display, such as a map, of a given site, selects the stations (monitoring wells, borings, etc.), then retrieves associated analytical data from the database.
Figure 87 - Interactive graphical data selection
Figure 88 - Editing a well selected graphically
Figure 89 - Batch-mode graphical data selection
Geographic Information System (GIS) programs such as ArcView, MapInfo, and Enviro Spase provide various types of graphical selection capability. Some map add-ins that can be integrated with database management and other programs, such as MapObjects and GeoObjects, also offer this feature.
There are two ways of graphically selecting data, interactive and batch. In Figure 87 the user has opened a map window and a list window showing a site and some monitoring wells. The user then double-clicked on one of the wells on the map, and the list window scrolled to show some additional information on the well.
In Figure 88 a well was selected graphically, then the user called up an editing screen to view and possibly change data for that well. The capability of working with data in its spatial context can be a valuable addition to an EDMS.
In Figure 89 the user wanted to work with wells in or near two ponds. The user dragged a rectangle to select a group of wells, and then individually selected another. Then the user asked the software to create a list of information about those wells, which is shown on the bottom part of the screen. In this case the spatial component was a critical part of the selection process.
Selection based on distance from a point can also be valuable. The point can be a specific object, such as a well, or any other location on the ground, such as a proposed construction location. The GIS can help you perform these selections.
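Rectangle and distance selections both reduce to simple coordinate tests. The sketch below assumes each station has map coordinates in a projected system with consistent units (for example feet or meters); the function names, station names, and coordinates are hypothetical.

```python
import math

def select_in_rectangle(stations, xmin, ymin, xmax, ymax):
    """Return names of stations inside the dragged rectangle (edges included)."""
    return [name for name, x, y in stations
            if xmin <= x <= xmax and ymin <= y <= ymax]

def select_within_distance(stations, x0, y0, radius):
    """Return names of stations within radius of the point (x0, y0)."""
    return [name for name, x, y in stations
            if math.hypot(x - x0, y - y0) <= radius]

# Hypothetical station list: (name, easting, northing).
stations = [("MW-1", 100.0, 100.0), ("MW-2", 150.0, 100.0), ("MW-3", 400.0, 300.0)]

print(select_in_rectangle(stations, 120.0, 50.0, 450.0, 350.0))
print(select_within_distance(stations, 100.0, 100.0, 60.0))
```

A GIS does the same work with spatial indexes and handles coordinate systems for you; this straight-line test is only appropriate for small sites in projected coordinates.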
Other types of graphical selection include selection from graphs and selections from cross sections. Some graphics and statistics programs allow you to create a graph, and then click on a point on the graph and bring up information about that point, which may represent a station, sample, or analysis. GIS programs that support cross section displays can provide a similar feature where a user can click on a soil boring in a cross section, and then call up data from that boring, or a specific sample for that boring.
Figure 90 - Example of query-by-form
QUERY-BY-FORM
A technique that works well for systems with a variety of different user skill levels is query-by-form, or QBF. In this technique, a form is presented to the user with fields for some of the data elements that are most likely to be used for selection. The user can fill out as many of the fields as needed to select the subset of interest. The software then creates a query based on the selection criteria. This query can then be used as the basis for a variety of different lists, reports, graphs, maps, or file exports. Figure 90 shows an example of this method.
Figure 91 - Query-by-form screen showing selection criteria for different data levels
In this example, the user has selected Analyses in the upper right corner. Along the left side the user selected “Rad Industries” as the site, and “MW-1” as the station name. In the center of the screen, the user has selected a sample date range of greater than 1/1/1985, and “Sulfate” as the parameter. The lower left of the screen indicates that there are 16 records that match these criteria, meaning that there are 16 sulfate measurements for this well for this time period. When the user selected List, the form at the bottom of the screen was displayed showing the results.
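The step where the software turns the filled-in form into a query can be sketched as follows. This is one possible approach rather than the method used by the program in the figures; the view name AnalysisView, the form field names, and the column names are all assumptions.

```python
def build_query(criteria):
    """Build a parameterized SELECT from the form fields the user filled in.

    criteria maps a form field name to its value; empty fields are skipped.
    """
    # Hypothetical mapping from form fields to SQL conditions.
    conditions = {
        "site": "SiteName = ?",
        "station": "StationName = ?",
        "date_after": "SampleDate > ?",
        "parameter": "Parameter = ?",
    }
    where, params = [], []
    for field, sql in conditions.items():
        value = criteria.get(field)
        if value not in (None, ""):
            where.append(sql)
            params.append(value)
    query = "SELECT * FROM AnalysisView"
    if where:
        query += " WHERE " + " AND ".join(where)
    return query, params

# The selections described for the example screen.
query, params = build_query({
    "site": "Rad Industries",
    "station": "MW-1",
    "date_after": "1985-01-01",
    "parameter": "Sulfate",
})
print(query)
print(params)
```

Passing the user's entries as parameters, rather than pasting them into the SQL text, keeps quotation marks and other special characters in the entries from breaking the query.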
To be effective, the form for querying should represent the data model, but in a way that feels comfortable to the user. Also, the screen should allow the user to see the selection options available. Figure 91 shows four different versions of a screen allowing users to make selections at four different levels of the data hierarchy.
The more defined the data model, the easier it is to provide advanced user-friendly selection. The Access query editor is very flexible, and will work with any tables and fields that might be in the database. However, the user has to know the values to enter into the selection criteria. If the fields are well defined and won’t change, then a screen like that shown in Figures 90 and 91 can provide selection lists to select values from. Figure 92 shows an example of a screen showing the user a list of parameter names to choose from.
Figure 92 - Query-by-form screen showing data choices
One final point to be emphasized is the reliance of data quality on good selection practices. This was discussed above and in Chapter 15. Improper selection and display can result in data that is easy to misinterpret. Great care must be taken in system design, implementation, and user training so that the data retrieved accurately represents the answer to the question the user intended to ask.
CHAPTER 19
REPORTING AND DISPLAY
It takes a lot of work to build a good database. Because of this, it makes sense to get as much benefit from the data as possible. This means providing data in formats that are useful to as many aspects of the project as possible, and printed reports and other displays are one of the primary output goals of most data management projects. This chapter covers a variety of issues for reports and other displays. Graph displays are described in Chapter 20. Cross sections are discussed in Chapter 21, and maps and GIS displays in Chapter 22. Chapter 23 covers statistical analysis and display, and using the EDMS as a data source for other programs is described in Chapter 24.
TEXT OUTPUT
Whether the user has performed a canned or ad hoc query, the desired result might be a tabular display. This display can be viewed on the screen, printed, saved to a file, or copied to the clipboard for use in other applications. Figure 93 is an example of this type of display. This is the most basic type of retrieval. It is considered unformatted output, meaning that the data is there, but there is no particular presentation associated with it.
Figure 93 - Tabular display of output from the selection screen
Figure 94 - Banded report for printing
FORMATTED REPORTS
Once a selection has been made, another option is formatted output. The data can be sent to a formatted report for printing or electronic distribution. A formatted report is a template designed for a specific purpose and saved in the program. The report is based on a query or table that provides the data, and the report form provides the formatting.
Standard (banded) reports
Figure 94 is an example of a report formatted for printing. This example shows a standard banded report, where the data at different parent-child levels is displayed in horizontal bands across the page. This is the easiest type of report to create in many database systems, and is most useful when there is a large amount of information to present for each data element, because one or more lines can be dedicated to each result.
Cross-tab reports
The next figure, Figure 95, shows a different organization called a cross-tab or pivot table report. In this layout, one element of the data is used to create the headers for columns. In this example, the sample event information is used as column headers.
Figure 95 - Cross-tab report with samples across and parameters down
Figure 96 - Cross-tab report with parameters across and samples down
Figure 96 is a cross-tab pivoted the other way, with parameters across and sample events down. In general, cross-tab reports are more compact than banded reports because multiple results can be shown on one line.
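The pivot behind a cross-tab can be sketched in a few lines. The example assumes the query returns one record per result as a dictionary; the function name, field names, and values are hypothetical.

```python
def cross_tab(records, row_key, col_key, value_key):
    """Pivot records into (sorted column headers, {row: {column: value}})."""
    table = {}
    columns = set()
    for rec in records:
        row, col = rec[row_key], rec[col_key]
        columns.add(col)
        # A later record for the same row and column overwrites the earlier one.
        table.setdefault(row, {})[col] = rec[value_key]
    return sorted(columns), table

# Hypothetical results for one well and two sample events.
records = [
    {"sample": "MW-1 1981-02-26", "parameter": "Sulfate", "value": 1400.0},
    {"sample": "MW-1 1981-04-20", "parameter": "Sulfate", "value": 1600.0},
    {"sample": "MW-1 1981-04-20", "parameter": "Iron", "value": 0.3},
]

# Samples across and parameters down.
columns, table = cross_tab(records, "parameter", "sample", "value")
for row in sorted(table):
    print(row, [table[row].get(col, "") for col in columns])

# Pivoted the other way: parameters across and samples down.
columns2, table2 = cross_tab(records, "sample", "parameter", "value")
```

Because each cell holds a single value, a second result for the same sample and parameter silently replaces the first in this sketch, so repeated observations need their own rows or columns.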
Figure 97 - Data display options
Cross-tab reports provide a challenge regarding the display of field data when multiple field observations must be displayed with the analytical data. Typically there will be one result for each analyte (ignoring dilutions and reanalyses), but several observations of pH for each sample. In a cross-tab, the additional pH values can be displayed either as additional columns or additional rows. Adding rows usually takes less space than additional columns, so this may be preferred, but either way the software needs to address this issue.
FORMATTING THE RESULT
There are a number of options that can affect how the user sees the data. Figure 97 shows a panel with some of these options for how the data might be displayed.
The user can select which regulatory limit or regulatory limit group to use for comparison, how to handle non-detected values, how to display graphs and handle field data, whether to include calculated parameters, how to display the values and flags, how to format the date and time, and whether to convert to consistent units and display regulatory limits.
Regulatory limit comparison
For investigation and remediation projects, an important issue is comparison of analytical results to regulatory limits or target levels. These limits might be based on national regulations such as federal drinking water standards, state or local government regulations, or site-specific goals based on an operating permit or Record of Decision (ROD). Project requirements might be to display all data with exceedances highlighted, or to create a report with only the exceedances. For most constituents, the comparison is against a maximum value. For others, such as pH, both an upper and a lower limit must be met.
The first step in using regulatory limits is to define the limit types that will be used. Figure 98 shows a software screen for doing this. The user enters the regulatory limit types to be used, along with a code for each type.
The next step is to enter the limits themselves. Figure 99 shows a form for doing this. Limits can be entered as either site-specific or for all sites. For each limit, the matrix, parameter, and limit type are entered, along with the upper and lower limits and units. The regulatory limit units are particularly important: they must be considered in later comparison, and should be taken into account in conversion to consistent units as described below.
There is one complication that must be addressed for limit comparison to be useful for many project requirements. Often the requirement is for different parameters, or groups of parameters, to be compared to different limit types on the same report. For example, the major ions might be compared to federal drinking water standards, but the organics may be compared to more stringent local or site-specific criteria. This requires that the software provide a feature to allow the use of different limits for different parameters. Figure 100 shows a screen for doing this. The user enters a name for the group, and then selects limits from the various limit types to use in that group.
Figure 98 - Form for defining regulatory limit types
Figure 99 - Form for entering regulatory limits
Figure 100 - Form for defining regulatory limit groups
Figure 101 - Selection of regulatory limit or group for reporting
After the limits and groups have been defined, they can be used in reporting. Figure 101 shows a panel from the selection screen where the user is selecting the limit type or group for comparison. The list contains both the regulatory limit types and the regulatory limit groups, so either one can be used at report time. The software code should be set up to determine which type of limit has been selected, and then retrieve the proper data for comparison.
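The lookup logic described here, deciding whether the selection is a limit type or a limit group and then retrieving the matching limit, might be sketched as follows. The codes, group name, and numeric limits are invented for illustration and are not actual regulatory values.

```python
# Hypothetical limits keyed by (limit type code, parameter): (lower, upper, units).
limits = {
    ("FED", "Sulfate"): (None, 250.0, "mg/L"),
    ("FED", "pH"): (6.5, 8.5, "s.u."),
    ("SITE", "Benzene"): (None, 0.001, "mg/L"),
}

# A limit group names the limit type to use for each parameter.
groups = {"Permit": {"Sulfate": "FED", "pH": "FED", "Benzene": "SITE"}}

def lookup_limit(selection, parameter):
    """Resolve a selection that may be a limit group name or a limit type code."""
    if selection in groups:
        limit_type = groups[selection].get(parameter)
    else:
        limit_type = selection
    return limits.get((limit_type, parameter))

def exceeds(value, lower=None, upper=None):
    """True if value is above the upper limit or below the lower limit."""
    if upper is not None and value > upper:
        return True
    if lower is not None and value < lower:
        return True
    return False

lower, upper, units = lookup_limit("Permit", "pH")
print(exceeds(9.1, lower, upper))
```

The units are stored with each limit so that the report can confirm that a result and its limit are in the same units before comparing them.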
Value and flag
Analytical results contain much more information than just the measured value. A laboratory deliverable file may contain 30 or more fields of data for each analysis. In a banded report there is room to display all of this data. When the result is displayed in a cross-tab report, there is only one field for each result, but it is still useful to display some of this additional information. The items most commonly involved in this are the value, the analytical flag, and the detection limit. Different EDMS programs handle this in different ways, but one way to do it is using fields for reporting factor and reporting basis that are based on the analytical flag. Another way to do it is to have a text field for each analysis containing exactly the formatting desired. Examples of reporting factor and reporting basis values, and how each result might look, are shown in the following table:
Reporting basis  Description                                      Reporting factor  Value  Flag  Detection limit  Displayed result
b                Both value and flag                              1                 3.7    v     0.1              3.7 v
l                Less than sign (<) and detection limit or value  1                 3.7    u     0.1              < 0.1
g                Greater than sign (>) and
d                Detection limit (times factor)

Flag code  Flag  Reporting factor  Reporting basis
Non-detects
When laboratories analyze for a constituent, it may or may not be found. If it is not found, it is referred to as not detected, or a non-detect. The various different detection limits used by laboratories are discussed in Chapter 12. If the result is not detected at the appropriate limit, the lab should flag (qualify) the data with a flag such as “u” for “undetected.” It should also report the detection limit and the limit type. It may or may not place the detection limit in the value field.
Non-detects require special handling in reporting and other uses of the data. In a full, banded report, the value, flag, detection limit, and detection limit type can all be reported. In a cross-tab report, or an export such as an XYZ file for contouring, there is no room for that. There are several ways to handle non-detects in these cases, and often a combination of them is used.
Ignore them – Analyses for which the constituent was not detected can be excluded. This is generally not a good idea, since the fact that the constituent wasn’t detected is useful information.
Display the value – The software can display the value provided by the laboratory, but this is risky, because the laboratory may or may not place the detection limit in the value field. It has the advantage of being easy to implement, because the report can be based on only one field.
Figure 102 - Form for defining calculated parameters
Display the detection limit – It makes sense to display the detection limit for non-detected values and the value if there was a detection. This is more complicated to program than just basing the report on the value field, because the software has to look at the analysis record and determine which field to display, either using an IF statement (or more likely the slightly different immediate IIF) or using program code.
Display the limit and qualify it – If the limit is displayed, it is helpful to qualify it in the report, either by displaying a less than sign (<) or the flag. To do this only for the non-detects requires special handling in the software.
Apply a factor to the limit – Sometimes a numerical factor is applied to the detection limit before it is displayed. A common factor is one half, although others are sometimes used. The thinking is that the true value is somewhere between the detection limit and zero, so one half is a good guess. This can be useful for estimating volumes of a material, or for other statistical calculations.
Display a zero – A variation on using a factor is to use a zero for non-detects. This is usually not correct technically, but can be useful in some applications like contour mapping. If you do use a zero value in contouring, be sure to do so with care. The value is not really a zero, but is less than a specific value (the detection limit), and setting it to zero could be misleading, especially if the detection limit is highly elevated, and the real value could be different enough from zero to affect the surface. Another option for contouring is to set the value to the indeterminate value, which is the value (such as -99999) that the contouring program ignores in calculating the surface, but then you are throwing away the useful information that the value is low. Some, but not many, contouring programs allow you to specify that the value is less than a certain amount, and then the software constrains the surface based on that information. That is the best solution if it is available.
Which approach is best for displaying non-detects depends on the use of the data. It is important that data users be aware of how the result is being displayed.
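The options above can be gathered into a single formatting routine, which plays the role that an IIF expression or program code would play in a report. This is a minimal sketch; the mode names are invented, and the flag “u” is assumed to mark a non-detect as in the example earlier in this section.

```python
def display_result(value, flag, detection_limit, mode="qualified", factor=0.5):
    """Format one result for a single report cell; flag 'u' marks a non-detect."""
    if flag != "u":
        return str(value)                      # detected: show the measured value
    if mode == "ignore":
        return ""                              # leave the cell empty
    if mode == "value":
        return str(value)                      # whatever the laboratory put in the value field
    if mode == "limit":
        return str(detection_limit)            # detection limit, unqualified
    if mode == "qualified":
        return f"< {detection_limit}"          # detection limit with a less than sign
    if mode == "factor":
        return str(detection_limit * factor)   # for example one half of the limit
    if mode == "zero":
        return "0"
    raise ValueError(f"unknown mode: {mode}")

print(display_result(3.7, "v", 0.1))
print(display_result(3.7, "u", 0.1))
print(display_result(3.7, "u", 0.1, mode="factor"))
```

Keeping the choice in one place makes it easier to state on the report which treatment was used.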
Calculated fields
Sometimes it is helpful to display data that is based on calculations using data that is in the database. These are referred to as calculated fields or derived values. These are results that are not contained in the database, but are generated “on the fly” at retrieval time. The software can provide a system for defining and calculating these results. Figure 102 shows an example of how this might be presented.
In this screen, the user has specified that the software is to calculate the mass of the total dissolved solids for a sample. The input parameters have been selected as the total dissolved solids concentration times the effluent volume. The result must then be scaled to the output units of kilograms by dividing by one million. The screen is also asking for a nesting order, which determines the order in which multiple calculations are to be performed, allowing complicated multi-step calculations with many parameters if necessary. There is also a checkbox to enable and disable the calculated field, so that a particular calculation can be turned off and on without deleting it.
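The calculation and the nesting order can be illustrated with a short sketch. It assumes the concentration is in milligrams per liter and the volume in liters, which is what makes the divisor one million (milligrams to kilograms); the second calculation, the parameter names, and the input values are hypothetical.

```python
def tds_mass_kg(tds_mg_per_l, effluent_volume_l):
    """Mass of total dissolved solids in kg: mg/L times L gives mg, and 1 kg = 1,000,000 mg."""
    return tds_mg_per_l * effluent_volume_l / 1_000_000

# Each entry: (nesting order, enabled, output name, function of the values so far).
calculations = [
    (2, True, "TDS load (kg/day)", lambda v: v["TDS mass (kg)"] / v["Days"]),
    (1, True, "TDS mass (kg)", lambda v: tds_mass_kg(v["TDS (mg/L)"], v["Volume (L)"])),
]

def run_calculations(values, calculations):
    """Apply the enabled calculations in nesting order, so later ones can use earlier results."""
    out = dict(values)
    for _order, enabled, name, func in sorted(calculations, key=lambda c: c[0]):
        if enabled:
            out[name] = func(out)
    return out

out = run_calculations({"TDS (mg/L)": 500.0, "Volume (L)": 20000.0, "Days": 4.0}, calculations)
print(out["TDS mass (kg)"], out["TDS load (kg/day)"])
```

Here the second calculation depends on the first, which is the situation the nesting order is meant to handle.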
Consistent units
It is possible that different results for the same parameter in the database might be in different units. This can be avoided at import time, as described in Chapter 13, but that is not always desirable. When the data is displayed in a banded report with one or more lines per result, and the units displayed, then multiple units may not be a problem, since a unit is shown with each value. In a cross-tab report, or if only the numbers (and not the units) are being retrieved for use in statistics, graphing, or mapping, then it is mandatory to convert to consistent units. A good approach is to define in the software the target units for each parameter and matrix. Matrix is important because the units for different matrices usually should be different. For example, in water the concentration of a constituent like a metal is reported as mass per unit volume, such as milligrams per liter, while for a solid such as soil, it is in mass per mass, such as milligrams per kilogram or parts per million. A screen for defining target units for each parameter is shown in Figure 103.
The next step is to define all of the conversion factors necessary to do the conversions. This is also shown in Figure 103. Conversion of different units of the same scale, such as from milligrams per liter to micrograms per liter, is pretty straightforward. Not all conversions are this simple, however, and great care must be taken in converting between different types of measure. For example, the laboratory may express measurements of radioactive materials like radium-226 in activity, such as picocuries per gram. In order to determine how much material is there, it is useful to have the data in mass units, such as milligrams per kilogram. This conversion, however, depends on a number of factors, such as the isotopic mix, physical properties of the sample, and so on, and consequently is at best site-specific, and at worst involves complicated statistical calculations. Be sure you know what you are doing before you go too far with unit conversions.
Once the target units and conversion factors have been defined, the software can perform the conversion. It is obvious that the value should be converted, but usually you will also want to convert other related information, such as the detection limit, regulatory limits used for comparison, and so on.
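A table-driven conversion along these lines might look like the following sketch. The target units, the factor table, and the function name are assumptions; only simple same-scale factors are included, and an undefined conversion raises an error rather than guessing.

```python
# Hypothetical target units per (parameter, matrix) and factors per (from units, to units).
target_units = {("Sulfate", "Water"): "mg/L", ("Lead", "Soil"): "mg/kg"}
factors = {("ug/L", "mg/L"): 0.001, ("mg/L", "ug/L"): 1000.0, ("ug/kg", "mg/kg"): 0.001}

def to_target(parameter, matrix, value, detection_limit, units):
    """Convert a value and its detection limit to the target units for the parameter and matrix."""
    target = target_units[(parameter, matrix)]
    if units == target:
        return value, detection_limit, target
    try:
        factor = factors[(units, target)]
    except KeyError:
        raise ValueError(f"no conversion defined from {units} to {target}") from None
    # The detection limit is converted with the same factor as the value.
    return value * factor, detection_limit * factor, target

value, limit, units = to_target("Sulfate", "Water", 1250000.0, 500.0, "ug/L")
print(value, limit, units)
```

Any regulatory limit used for comparison would need the same treatment so that the value and the limit stay in matching units.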
Other issues
There are a number of other issues that arise in formatting the data to satisfy project needs. These include handling of decimal places and date and time formatting.
Handling of decimal places, or significant figures, is an issue that is not handled well in many software programs. Try this experiment. Open a new worksheet in Excel. In one of the cells, type in 3.00, and press Enter. The zeros go away. Access and other programs lose trailing zeros the same way. This results in lost information. If the analysis was to two decimal places, then those zeros should be displayed. There are two ways to handle this in an Access-based database. One is to store the value as a text string, rather than as a number. The other is to store the number of decimal places in a separate field, and combine the two if necessary at retrieval time using a user-defined function.
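Both approaches can be shown in a few lines. The sketch below is written in Python rather than as an Access user-defined function, and the function name is hypothetical.

```python
from decimal import Decimal

def format_value(value, decimal_places):
    """Rebuild the reported text from a numeric value and its stored number of decimal places."""
    return f"{value:.{decimal_places}f}"

# Second approach: number plus a separate decimal-places field.
print(format_value(3.0, 2))

# First approach: keep the reported text; Decimal preserves the trailing zeros on conversion.
reported = Decimal("3.00")
print(str(reported))
```

The text form keeps exactly what the laboratory reported, while the number-plus-places form keeps the value numeric for sorting and calculations.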
Figure 103 - Forms for defining units by parameter and matrix, and conversion between units
The issue of date and time formatting is related to the way that the data management software stores dates and times, and how you want them displayed. For example, Access combines dates and times into one field. This field is a numeric field, with the whole number (left of the decimal point) representing the date. Internally this is stored as the number of days since Dec. 30, 1899, so a value of 1 is Dec. 31, 1899, a value of 2 is Jan. 1, 1900, and Jan. 1, 2002 is 37257. The decimal portion of the date number (right of the decimal point) represents the time, starting at midnight. For example, a value of 0.5 is 12:00 PM (noon) and 8:30 AM is 0.3541666667. This combination of date and time storage is different from some other systems, such as dBase and FoxPro, where the date and time are stored in separate fields. For environmental projects, the date is nearly always important, but the time may or may not be. For example, for soil samples taken once, the time during the day that they were taken may not be important, but for air samples taken every hour, it certainly would be. For systems like Access that combine the date and time, it is useful to have a feature to turn the display of the time on and off as appropriate for the data being displayed. Reports can be formatted to display the date and time in separate fields if desired.
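The serial date arithmetic can be checked with a short sketch. It assumes the Access convention in which day zero is December 30, 1899, and it handles only non-negative serial numbers; the function names are hypothetical.

```python
from datetime import datetime, timedelta

ACCESS_DAY_ZERO = datetime(1899, 12, 30)  # serial 0 in the Access date scheme

def from_serial(serial):
    """Convert a date/time serial number to a datetime (non-negative serials only)."""
    return ACCESS_DAY_ZERO + timedelta(days=serial)

def format_date(serial, show_time=True):
    """Format a serial with or without the time portion."""
    moment = from_serial(serial)
    return moment.strftime("%m/%d/%Y %H:%M" if show_time else "%m/%d/%Y")

print(from_serial(37257))                       # whole number: the date only
print(format_date(37257.5))                     # .5 is noon
print(format_date(37257.3541666667))            # 8:30 AM
print(format_date(37257.5, show_time=False))    # time display turned off
```

The show_time switch corresponds to the feature for turning the time display on and off.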
Figure 104 - Reports with different levels of formatting for performance comparison
Formatting and performance
Keep in mind that asking the software to perform sophisticated formatting comes at a cost. In Figure 104, the panel on the top has formatted values and comparison to regulatory limits. Notice that a regulatory limit is displayed for sulfate, and both sulfate values are bolded and underlined because they exceed this limit. Also, for 4/20/1981 the value for iron shows the value and analytical flags, and the value for nitrate shows “<” and the detection limit. This retrieval for 315 records takes 17 seconds. The panel on the bottom displays only the numbers, with no comparison to limits, and takes 1.5 seconds. In data management (as in most everything else) nothing is free.
INTERACTIVE OUTPUT
In the past, nearly all of the focus of data management has been on generating printed reports. As data management software evolves, it is now becoming possible to work interactively with the data in ways that before were either not possible or not time-effective.
Figure 105 shows an example of this type of interactive display. The software is showing the environmental data in a TreeView display. This display, which is similar to the Windows Explorer display, shows sites at the highest level, then stations, samples, and analyses. At each level, the most pertinent data is displayed. This type of display lets the user “drill down” to find a particular result quickly, even in a large database.
Figure 105 - TreeView display of site data
ELECTRONIC DISTRIBUTION OF DATA
Often the person managing the data is not the person using it. The best approach is for everyone who needs the data to have direct access to it through the EDMS. For various reasons, such as cost and location, this is not always possible. There are several ways to overcome this. One is to make the data available more generally, such as through Web access. Another way is through electronic distribution of reports. The Adobe Portable Document Format (PDF) and the free PDF reader are a convenient way to distribute reports. Users create the report that they want in the EDMS, and then print it to the PDF format using Acrobat for distribution. Recipients of the report can use the free Acrobat reader to see it, formatted the way the database user intended.
CHAPTER 20
GRAPHS
There’s an old saying that a picture is worth a thousand words. In many situations, presenting data in a graphical display makes the information much more understandable. A well-designed graph of the data in a table can be many times more informative than the table alone. This chapter and the next two describe and show a variety of graphic displays that can be used to present environmental data. This chapter discusses traditional graphs. Other graphic displays, such as maps and cross sections, are discussed in the following two chapters.
GRAPH OVERVIEW
There’s a good and a bad side to graphs. They can be used to display data in a format conducive to greater understanding. They can also be confusing, misleading, or even dishonest. An excellent book by Tufte (1983) provides a wealth of information on various aspects of graphical data display, including graphs and maps. According to Tufte, graphical displays should:
Show the data
Induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
Avoid distorting what the data has to say
Present many numbers in a small space
Make large data sets coherent
Encourage the eye to compare different pieces of data
Reveal the data at several levels of detail, from a broad overview to fine structure
Serve a reasonably clear purpose: description, exploration, tabulation, or decoration
Be closely integrated with the statistical and verbal description of a data set
In addition, Tufte provides the following six principles of graphical integrity:
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities expressed
Clear, detailed, and thorough labeling should be used to defeat graphical distortions and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data
Show data variation, not design variation
In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data
Graphics must not quote data out of context
Following these two sets of guidelines will greatly increase your chance of creating good graphical displays. Additional general information on graphs can be found in Milne (1992), and information specific to environmental graphing in Sara (1994, pp. 11-19 to 11-28).
GENERAL CONCEPTS
Because graphing software is so accessible and easy to use, there is a tendency to throw together a graph of a bunch of data and be done with it. If you try to follow Tufte’s guidelines above, then clearly there is more to it than that, from making sure the data is amenable to the graphing technique you will be using to confirming at the end that the graph communicates the correct message. If you keep in mind the key concepts of creating a graph, rather than take them for granted, your graphs will be much more effective.
Generally graphs present data with one data element graphed as a function of another. Commonly the independent variable, which is often presented against the X (horizontal) axis, is time, and the dependent variable, presented against the Y (vertical) axis, is the measured value. It is also possible to plot one observed value against another. Sometimes the X-axis is called the abscissa and the Y-axis is called the ordinate.
Data issues
Back in the day when graphs were created by hand, the person creating the graph was forced to look at each data point, because he or she scaled it off and drew it on the graph. With automated programs like Microsoft Excel and Golden Software’s Grapher, it is easy to create a graph without giving it much thought. This can result in a graph that looks great, but, in the worst case, is totally meaningless. For example, if you take a data set like the one graphed in Figure 106, and set the scale to logarithmic as discussed below, Grapher will complain if some of the data has a zero value and can’t be graphed, but Excel won’t. Those values may be important, and won’t be displayed in either case, but with Excel you might not even know they are gone.
There are a number of other data issues that can trip you up in creating graphs. Chapter 19 discussed the importance of checking units during data retrieval. Non-detects and flagged data must be handled carefully. Duplicate data can also be a problem.
A good policy is to take a hard look at the data after it has been retrieved from the EDMS, but before it is graphed. Look at every number, or if there is too much data to do that, sort in various ways to understand the data ranges, relationships between different values, and so on. Time spent doing this will be rewarded by better graphs, ones that you are more likely to be able to trust.
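One way to take that hard look is a short script run on the retrieved records before anything is graphed. The following is a minimal Python sketch, not a feature of any particular EDMS. The record layout (well, date, parameter, value, units) and the function name audit_for_graphing are assumptions made for illustration. It reports the data range, the values a logarithmic scale cannot display, duplicate records, and parameters reported in more than one unit.

```python
from collections import Counter, defaultdict


def audit_for_graphing(records):
    """Summarize issues to review before graphing.

    Each record is a dict with the hypothetical keys
    'well', 'date', 'parameter', 'value', and 'units'.
    """
    # Zero or negative values cannot be shown on a logarithmic scale.
    non_positive = [r for r in records if r["value"] <= 0]

    # The same well, date, and parameter appearing more than once.
    keys = Counter((r["well"], r["date"], r["parameter"]) for r in records)
    duplicates = sorted(k for k, n in keys.items() if n > 1)

    # Parameters reported in more than one unit.
    units = defaultdict(set)
    for r in records:
        units[r["parameter"]].add(r["units"])
    mixed_units = {p: sorted(u) for p, u in units.items() if len(u) > 1}

    values = [r["value"] for r in records]
    return {
        "count": len(records),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "non_positive": non_positive,
        "duplicates": duplicates,
        "mixed_units": mixed_units,
    }


# Made-up records for illustration only.
sample = [
    {"well": "MW-1", "date": "1992-05-12", "parameter": "Sulfate", "value": 250.0, "units": "mg/L"},
    {"well": "MW-1", "date": "1992-05-12", "parameter": "Sulfate", "value": 250.0, "units": "mg/L"},
    {"well": "MW-2", "date": "1992-05-12", "parameter": "Sulfate", "value": 0.0, "units": "mg/L"},
    {"well": "MW-2", "date": "1993-05-19", "parameter": "Sulfate", "value": 310000.0, "units": "ug/L"},
]
report = audit_for_graphing(sample)
print(report["duplicates"], report["mixed_units"], len(report["non_positive"]))
```

A report like this does not decide what to do with a zero value or a duplicate; it only makes sure you know they are there before the graphing program silently drops or double-plots them.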
Coordinate systems
Graphing involves taking values and plotting them relative to some coordinate system. For most graphs this is a Cartesian XY system, but other systems, such as polar and radial plots, are possible. Think about which system will work best with your data and the message you are trying to get across, rather than just using the default provided by the software.
Graph scales
The scales of the graph determine the spacing of the points relative to each axis. In the simple case of an X-Y graph of two constituents against each other, the value range for each constituent will be used as the scale for each axis. In the case of a time-sequence graph, one of the axes (usually the horizontal one) is the time or date range, and the other is the value or values.
Figure 106 - Comparison of linear vs logarithmic scales
For the case where the data has a large dynamic range, or where the data is lognormally distributed, a logarithmic scale on one or both axes may be appropriate. A graph with a logarithmic scale on one axis and a linear scale on the other is called a semi-log plot, and one with both axes logarithmic is called a log-log plot. The graph on the right side of Figure 106 shows a log-log plot. The goal is to see the relationship between the two constituents in each sample. The left graph shows the data graphed on a linear scale. Most of the data is clustered in the lower left, and it is difficult to say what the relationship is. The right graph shows a logarithmic scale for both constituents, and it is possible to see that there is a rough correlation between the two, and a sample with a high value in one is likely to have a high value in the other. In fact, it appears that there may be several populations with different linear relationships between the constituents, perhaps representing different sources of the material. This was not at all apparent from the linear graph.
Labels and annotations
There are two basic types of labels and annotations, those associated directly with graph elements, and those not. Examples of the first type are the scale labels and scale titles. Scale labels identify positions along a scale axis. Usually there will be one set of labels per axis, such as the numbers annotating the tic marks and the text label for the axis. Labels not associated with graph elements include the graph title, legends, comments, and so on.
TYPES OF GRAPHS
Because graphics are so useful, people have developed many different types of graphs to best represent their data. This section describes some of the most popular types of graphs, and the following one shows some examples.
Line graphs – Line graphs are often used to represent data in a series. A grid is drawn, and then one or more series of data are drawn on the grid. Lines are used to connect the points to highlight trends and patterns. Often the horizontal axis (abscissa) is time, and the vertical axis (ordinate) is the value being compared, but this is not required. Line graphs are probably the most common type of technical graph.
Whenever presenting a forecast, give a number and a date, but never both
Rich (1996)
Bar graphs – Bar graphs, also called column graphs, are good for displaying increases and decreases in quantity over a period of time. They work best when the amount of data to be displayed is not large. As with line graphs, the horizontal axis is often time.
Area graphs – Area graphs are similar to line graphs, except the areas under the curve(s) are filled.
Stacked graphs – A stacked graph is a variety of bar or (more commonly) area graph where the values are stacked cumulatively rather than each starting at zero.
Scatter plots – A scatter plot is used for displaying two variables for each point against each other. Scatter plots are very popular for technical data.
Box plots – Box plots are special bar graphs that show the minimum, maximum, mean, and lower and upper quartiles for each data group.
Picture graphs – In picture graphs, the data is displayed with symbols rather than lines or bars. These are sometimes used for business presentations, but are not commonly used for displays of technical data.
Pie charts – A pie chart is a type of graph used to display the fractional parts of a whole like slices of a pie, where the size, or more accurately the angular displacement, of each slice is based on the percentage of the whole contributed by each value.
Surface plots – Surface plots are used to show one variable as a function of two others. They are similar to contour displays used on maps, but the two independent variables can be something other than map coordinates.
Rose diagram – A rose diagram is a circular graph of angular data. Angular measurements, such as joint or cross-bed directions, are grouped by an angle range, such as 10° or 30°, and the number of observations in each range is shown as a distance from the center. Before designing a rose diagram, you should examine the variability in the data and set the increments (angle range) to be graphed appropriately. If the increment is too small for the data, then only “noise” is displayed. If it is too coarse, the real variability is lost. An alternative way of drawing the rose diagram is to start at the outer edge and increase the values toward the center. This often helps to define trends in multi-modal data sets better than the more conventional approach (Mike Wiley, pers. comm., 2002).
Polar plot – A polar plot is also a circular graph of angular data. Values as a function of angle are shown as distances from the center, creating a line graph within a circle.
Maps – It’s important to remember that maps are a type of graph. Because maps have so many special issues to discuss, they will be covered separately in Chapter 22. There are also many opportunities for combining maps with traditional graphs to create visually rich and informative displays.
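The grouping step behind the rose diagram described above can be sketched in Python. This is an illustration only: the function name rose_bins and the sample directions are made up, and the sketch assumes an integer increment that divides 360 evenly. It counts observations per angle range so the effect of a fine versus a coarse increment can be compared before drawing anything.

```python
def rose_bins(azimuths_deg, increment_deg=30):
    """Count angular observations in equal angle ranges around the circle.

    Returns a list of counts; element 0 covers 0 up to (but not including)
    increment_deg, element 1 the next range, and so on.
    """
    if increment_deg <= 0 or 360 % increment_deg != 0:
        raise ValueError("increment_deg must be a positive divisor of 360")
    n_bins = 360 // increment_deg
    counts = [0] * n_bins
    for az in azimuths_deg:
        # Wrap the angle into 0-360, then find its range. The final modulo
        # guards against floating-point wraparound landing exactly on 360.
        index = int((az % 360) // increment_deg) % n_bins
        counts[index] += 1
    return counts


# Made-up joint directions in degrees, for illustration only.
directions = [5, 12, 18, 25, 95, 100, 185, 190, 200, 355]

coarse = rose_bins(directions, 30)
fine = rose_bins(directions, 10)
print(coarse)
print(fine)
```

Comparing the two lists of counts is a quick way to judge whether a given increment shows a pattern or only scatter for a particular data set.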
GRAPH EXAMPLES
The following examples show graphs created by several different programs. Figure 107 shows a number of graphs created with Microsoft Excel. Figure 108 shows some more technical graph types created with Grapher from Golden Software.
The previous examples have used programs outside the EDMS. Figure 110 shows a fairly typical graph of one parameter (sulfate) from two wells plotted as a function of time within an EDMS program. Figure 111 shows a variation on the time sequence graph where data from several years is folded onto one 12-month graph. This was done to help identify seasonality in the data.
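Folding several years of data onto a single 12-month axis, as described for Figure 111, amounts to grouping samples by calendar month and ignoring the year. The Python sketch below shows one way to do that. The function names and the sample values are hypothetical, and how the EDMS in the figure does it is not stated in the text.

```python
from collections import defaultdict
from datetime import date


def fold_by_month(samples):
    """Group (date, value) pairs by calendar month, ignoring the year."""
    by_month = defaultdict(list)
    for sample_date, value in samples:
        by_month[sample_date.month].append(value)
    # Return months in calendar order for plotting.
    return {month: by_month[month] for month in sorted(by_month)}


def monthly_means(samples):
    """Mean value for each calendar month that has data."""
    folded = fold_by_month(samples)
    return {month: sum(vals) / len(vals) for month, vals in folded.items()}


# Made-up observations for illustration only.
observations = [
    (date(1990, 1, 15), 40.0),
    (date(1990, 7, 15), 80.0),
    (date(1991, 1, 20), 50.0),
    (date(1991, 7, 10), 90.0),
    (date(1992, 1, 12), 60.0),
]

folded = fold_by_month(observations)
means = monthly_means(observations)
print(folded)
print(means)
```

Plotting the folded values against month number (1 to 12) puts every January above the same X position, which is what makes a seasonal pattern visible.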
Figure 107 - Examples of several graph types created with Microsoft Excel (panels: line (time sequence) graph, 3-D bar graph, 3-D bar graph with too much data, 3-D area graph, and stacked area graph; series: sodium and sulfate)
Figure 108 - Examples of several graph types created with Grapher from Golden Software (including a trilinear plot)
Figure 109 - Additional graph examples (including a pie chart created with Excel)
Figure 110 - Formatted graph of selected parameter
Figure 111 - A graph of a constituent (blood lead) by month created by an EDMS
Sometimes it is useful to view graph data in its spatial context. Figure 112 shows an example of this type of display. A map with an airphoto backdrop is displayed, along with symbols for the well locations. Time-sequence graphs are shown for five of the monitoring wells, with leader lines to the wells from which the samples were taken. This type of display shows the time sequence data, along with the spatial context of the wells, so inferences can be made about the progression of values over time for different parts of the facility. Graphing in spatial context is of greatest value where the variation is expected to relate to geographic position. For example, in addition to the water quality parameters shown, parameters such as water level elevation and temperature often benefit from being displayed in this manner.
Figure 113 shows an enlarged view of part of Figure 112. The graphs show the value of the constituents of interest, along with the vertical scales of the graph. They also show horizontal lines for the mean value for bicarbonate, along with lines located three standard deviations above and below the mean, and a line for the regulatory limit. Points that are outside the limit lines are displayed in a different color. These points deserve additional scrutiny to determine if they are erroneous or real.
Figure 112 - Graphs displayed with leader lines to their map locations
Figure 113 - Enlarged graph showing control chart limits and outliers
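The limit lines described for Figure 113 (the mean, three standard deviations above and below it, and a regulatory limit) are simple to compute. The Python sketch below is a minimal illustration, not the method used to make the figure. It assumes the limits are computed from a baseline set of results and then applied to later results, uses the sample standard deviation, and all names and values are made up.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits from baseline values."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation (an assumption)
    return center - n_sigma * spread, center, center + n_sigma * spread


def flag_points(values, lower, upper, regulatory_limit=None):
    """List (index, value, reasons) for points that deserve scrutiny."""
    flagged = []
    for i, v in enumerate(values):
        reasons = []
        if v < lower or v > upper:
            reasons.append("outside control limits")
        if regulatory_limit is not None and v > regulatory_limit:
            reasons.append("above regulatory limit")
        if reasons:
            flagged.append((i, v, reasons))
    return flagged


# Made-up concentrations for illustration only.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
recent = [101.0, 95.0, 110.0, 99.0, 93.0]

lower, center, upper = control_limits(baseline)
flagged = flag_points(recent, lower, upper, regulatory_limit=105.0)
for index, value, reasons in flagged:
    print(index, value, reasons)
```

Computing the limits from a separate baseline is a deliberate choice in this sketch: with only a handful of points, a single extreme value inflates the standard deviation enough that it can never fall outside limits computed from the same points.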
CURVE FITTING
Often a graph, especially a time-sequence graph, will expose a trend in the data. Many graphing programs provide a way to fit a curve to the data. The fitted curve can help you understand the trend by smoothing out irregularities and variations in the data.
Figure 114 - Graphs showing trend lines (two panels titled Concentration Over Time, plotted by month)
Curve fitting must be used with caution, however. Figure 114 shows an example of two graphs of the same data set, with trend lines suggesting two very different conclusions. The data set consists of six monthly observations: 10, 25, 30, 20, 25, and 25. The question is whether the data is trending up or down. Fitting a second-order polynomial suggests that the data is trending down. Changing to a third-order polynomial suggests an upward trend. Which is correct? A scarier question is: Which will you use to prove your point?
Because graphing software makes it so easy to use high-order polynomial fitting, it is tempting to use high orders to improve the fit. However, a third-order polynomial is the lowest order that can produce both concave and convex curves on the same plot. This may be the highest order appropriate for many data sets.
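The effect of polynomial order on the apparent trend can be checked directly. The Python sketch below fits second- and third-order polynomials to the six values quoted above by ordinary least squares and reports the slope of each fitted curve at the last month. It is a minimal illustration that assumes the observations fall at months 1 through 6; the function names are made up, and the normal-equations solver is adequate only for small, low-order problems like this one.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial coefficients, lowest power first."""
    n = order + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def slope(coeffs, x):
    """First derivative of the polynomial at x."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)


def sum_sq_residuals(coeffs, xs, ys):
    """Sum of squared differences between data and fitted curve."""
    return sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))


months = [1, 2, 3, 4, 5, 6]          # assumed sampling months
values = [10, 25, 30, 20, 25, 25]    # observations quoted in the text

for order in (2, 3):
    c = polyfit(months, values, order)
    print(order, round(slope(c, months[-1]), 2),
          round(sum_sq_residuals(c, months, values), 2))
```

For these six values the fitted second-order curve has a negative slope at month 6 and the third-order curve a positive one, which is the contradiction the figure illustrates. The higher order always fits at least as closely, so a smaller residual alone says nothing about which trend is real.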
GRAPH THEORY
Graph theory might be confused with the theory of creating graphs, but it actually covers a different topic. It is discussed here to make the point that graph theory and the theoretical basis for graphing data or functions are different issues. Some of the theoretical issues related to creating graphs, such as data issues, scales, etc., are discussed above. The basic material of graph theory is spatial connectivity or topology. In graph theory, “graph” is used to denote a set of vertices possibly connected by edges, as opposed to graphing data or values. Graphs in graph theory consist of points connected by lines (vertices connected by edges), and then various kinds of studies are performed on these graphs. Unlike geometry, topology ignores spatial issues, and addresses only issues that don’t change when objects are deformed. An example of the type of problem studied by graph theory is the Four-Color Problem (Figure 115), which is the proposition that any map can be colored using no more than four colors in such a way that adjacent regions (those sharing a common boundary segment, not just a point) receive different colors.
Graph theory may have application to environmental projects in analyzing the relationships between different areas of interest, or in similar area-based studies.
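To make the graph-theory sense of “graph” concrete, the Python sketch below represents a handful of adjacent areas as vertices and shared boundaries as edges, then searches for a coloring in which no two adjacent areas share a color. The area names and the adjacency are hypothetical, and the simple backtracking search is only an illustration of the problem statement; it is not how the four-color result was established.

```python
def color_map(adjacency, colors=("red", "green", "blue", "yellow")):
    """Assign a color to each region so adjacent regions differ.

    adjacency maps each region name to the set of regions that share a
    boundary segment with it. Returns a dict of region -> color, or None
    if no valid coloring exists with the colors supplied.
    """
    # Color the most-connected regions first to fail early.
    regions = sorted(adjacency, key=lambda r: -len(adjacency[r]))
    assignment = {}

    def place(index):
        if index == len(regions):
            return True
        region = regions[index]
        for color in colors:
            if all(assignment.get(n) != color for n in adjacency[region]):
                assignment[region] = color
                if place(index + 1):
                    return True
                del assignment[region]  # backtrack
        return False

    return dict(assignment) if place(0) else None


# Hypothetical site: a central area bordered by a ring of five areas.
site = {
    "Process area": {"Area A", "Area B", "Area C", "Area D", "Area E"},
    "Area A": {"Process area", "Area B", "Area E"},
    "Area B": {"Process area", "Area A", "Area C"},
    "Area C": {"Process area", "Area B", "Area D"},
    "Area D": {"Process area", "Area C", "Area E"},
    "Area E": {"Process area", "Area D", "Area A"},
}

coloring = color_map(site)
print(coloring)
```

This particular layout needs all four colors: the ring of five areas cannot be colored with two, and the central area touches every one of them.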
Figure 115 - Example of the Four-Color Problem in graph theory
CHAPTER 21
CROSS SECTIONS, FENCE DIAGRAMS, AND 3-D DISPLAYS
Environmental data, and geologic data in general, is inherently three dimensional. A number of graphical tools have been developed to assist with visualizing the 3-D configuration and relationships contained in the data. These range from logs through cross sections and fence diagrams to block diagrams.
LITHOLOGIC AND WIRELINE LOGS
Rock or soil samples and geophysical measurements from boreholes and from outcrops make up the basic data for many geologic projects. Displaying this data as a function of depth is the first step in interpretation. Before the advent of personal computers, lithologic logs of samples were prepared by manual drafting onto strip-log paper. Wireline geophysical logs were drawn with analog recorders on special chart paper. Digital wireline logs arrived long before personal computers. They were recorded on tape and plotted with pen plotters attached to mainframe or minicomputers. Now both lithologic and wireline logs, including combinations of both in one display, can be easily created using a computer program on a personal computer.
For drill cuttings and outcrop samples, the plot usually consists of patterns for lithology types along with a text description of the rock, both plotted against depth on the vertical axis. Curves for other factors, either measured or interpreted, may also be included. Measured factors that can be plotted might include grain size, porosity, or oil saturation, while interpretive factors might include depositional energy or diagenetic alteration. Figure 116 shows an example of a typical lithologic log for an environmental project.
Geophysical measurements from boreholes (or less commonly from outcrops) are widely used for determining rock properties, and are also very valuable for stratigraphic correlation. Displays of two or more geophysical curves, such as spontaneous potential (SP) along with resistivity, or gamma ray plotted with neutron density or sonic travel time, are widely used for stratigraphic and structural interpretation of subsurface rocks. Figure 117 shows an example of a small portable geophysical logging device.
Figure 116 - Lithologic log for an environmental project (Courtesy of RockWare)
Figure 117 - Gamma ray logger system (Courtesy of Geotech Environmental Equipment)
Figure 118 - Cross section created from relational data
CROSS SECTIONS
Several lithologic or geophysical logs can be displayed side by side to form a cross section. The use of cross sections on environmental projects is discussed in Sara (1994, pp. 7-17 to 7-21). The manual approach is to tape several logs onto a big sheet of graph paper (cross section paper), with the vertical position based on elevation (structural cross section) or on a stratigraphic horizon (stratigraphic cross section). This type of display is used to interpret the spatial position of rock units or the lateral variation in lithologies. This is particularly useful to assist with correlation of lithologic and stratigraphic units. Contamination values can be added to increase the information content of the cross sections.
The vertical scale can be changed relative to the horizontal scale to adjust the vertical exaggeration for cross sections, block diagrams, and other displays. This is important because many geological features are tabular in shape, and vertical exaggeration is necessary to be able to see the features.
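Vertical exaggeration is conventionally expressed as the ratio of the vertical scale to the horizontal scale. The short Python sketch below works through an example with made-up numbers to show why a thin, tabular unit needs exaggeration to be visible. The function names are hypothetical, and scales are passed as the denominators of representative fractions (1:12,000 is passed as 12000).

```python
def vertical_exaggeration(horizontal_scale, vertical_scale):
    """Vertical exaggeration from scale denominators.

    A horizontal scale of 1:12,000 with a vertical scale of 1:1,200
    gives an exaggeration of 10.
    """
    if horizontal_scale <= 0 or vertical_scale <= 0:
        raise ValueError("scale denominators must be positive")
    return horizontal_scale / vertical_scale


def plotted_size_mm(ground_size_m, scale):
    """Size on paper, in millimeters, of a ground distance in meters."""
    return ground_size_m * 1000.0 / scale


# Made-up example: a sand layer 3 m thick and 600 m across.
ve = vertical_exaggeration(12000, 1200)
width_mm = plotted_size_mm(600, 12000)
thickness_no_ve_mm = plotted_size_mm(3, 12000)
thickness_ve_mm = plotted_size_mm(3, 1200)
print(ve, width_mm, thickness_no_ve_mm, thickness_ve_mm)
```

In this example the layer would be a quarter of a millimeter thick at true scale, which is thinner than most plotted lines, and 2.5 mm thick with a vertical exaggeration of 10.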
Computers can be used to create cross section displays once the basic data on lithology, chemistry, or log values has been entered. The user specifies which logs are to be used, how each log is to be displayed, and other information such as how the cross section is to be hung (structural or stratigraphic datum) and how the logs are to be labeled. Most cross section programs allow correlation lines to be drawn from log to log to display stratigraphic and structural relationships, and some allow the user to interactively pick formation tops from the logs for entry into a database.
Figure 118 shows a cross section display of the concentration of uranium and radium in soil. It was generated to show the part of the site that will need to be excavated. It includes a combination of laboratory data from soil samples along with downhole data from gamma logs. Uranium values are to the left of each log, and radium values to the right. The shaded rectangles represent soil samples, and the continuous lines show the downhole gamma surveys. Both the boxes and the lines are truncated at the excavation cutoff. The elevations of geologic units, as well as the ground surface and water table, have been added to aid in interpretation. A series of parallel cross sections of this type can be used to calculate the volume to be excavated.
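The text does not say how the volume is calculated from the parallel sections. One common approximation is the average end area method, in which the volume between two adjacent sections is the mean of their areas multiplied by the distance between them. The Python sketch below assumes that method, equally spaced sections, and made-up areas; the function name is hypothetical.

```python
def volume_average_end_area(areas, spacing):
    """Volume from parallel cross sections by the average end area method.

    areas:   excavation area on each section, in order along the site
    spacing: constant distance between adjacent sections
    """
    if len(areas) < 2:
        raise ValueError("at least two sections are needed")
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    return sum((a0 + a1) / 2.0 * spacing for a0, a1 in zip(areas, areas[1:]))


# Made-up areas in square meters on sections 25 m apart.
section_areas = [0.0, 120.0, 150.0, 80.0, 0.0]
volume = volume_average_end_area(section_areas, 25.0)
print(volume)  # cubic meters
```

The result is only as good as the assumption that the area changes linearly between sections, so closer section spacing is warranted where the excavation outline changes quickly.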
Figure 119 - Electric log cross section
Figure 120 - Map and cross section views of model results
Figure 119 shows a cross section of electric log data. This type of display is very useful for performing subsurface correlations.
Figure 120 shows two graphic displays from the same project, one a map and one a cross section. The values from the borings were used to create a 3-D geostatistical block model, which was then displayed along with the map and cross section.
Cross sections can also be used to demonstrate changes in water chemistry or contaminant distribution over time. If the same wells were resampled over time, then cross sections of each sampling period will show the changes. This adds a fourth dimension to the information obtained.
PROFILES
A type of display that is similar to a cross section is a profile, which is like a slice through a surface. The surface is usually a grid created by a contouring program. The profile represents the values of that surface along the line of the profile. Sometimes profiles and log cross sections are combined to show what the surface does between control points.
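Extracting a profile from a gridded surface can be sketched as sampling the grid at evenly spaced points along the profile line. The Python example below is an illustration under stated assumptions: a regular rectangular grid stored as rows of values, bilinear interpolation between grid nodes, and hypothetical function names. Contouring programs may use other interpolation schemes.

```python
def grid_value(grid, x0, y0, dx, dy, x, y):
    """Bilinear interpolation in a regular grid.

    grid[row][col] holds the surface value at (x0 + col*dx, y0 + row*dy).
    """
    nrows, ncols = len(grid), len(grid[0])
    col = (x - x0) / dx
    row = (y - y0) / dy
    if not (0 <= col <= ncols - 1 and 0 <= row <= nrows - 1):
        raise ValueError("point is outside the grid")
    # Lower-left node of the cell containing the point.
    c0 = min(int(col), ncols - 2)
    r0 = min(int(row), nrows - 2)
    fx, fy = col - c0, row - r0
    bottom = grid[r0][c0] * (1 - fx) + grid[r0][c0 + 1] * fx
    top = grid[r0 + 1][c0] * (1 - fx) + grid[r0 + 1][c0 + 1] * fx
    return bottom * (1 - fy) + top * fy


def profile(grid, x0, y0, dx, dy, start, end, n_points):
    """Sample the surface along a straight line.

    Returns (distance along line, surface value) pairs; n_points >= 2.
    """
    (xa, ya), (xb, yb) = start, end
    length = ((xb - xa) ** 2 + (yb - ya) ** 2) ** 0.5
    samples = []
    for i in range(n_points):
        t = i / (n_points - 1)
        x, y = xa + t * (xb - xa), ya + t * (yb - ya)
        samples.append((t * length, grid_value(grid, x0, y0, dx, dy, x, y)))
    return samples


# Made-up 3 x 3 grid of elevations on a 10 m spacing (a tilted plane).
surface = [
    [0.0, 10.0, 20.0],
    [20.0, 30.0, 40.0],
    [40.0, 50.0, 60.0],
]
line = profile(surface, 0.0, 0.0, 10.0, 10.0, (0.0, 0.0), (20.0, 20.0), 5)
for distance, value in line:
    print(round(distance, 2), value)
```

The resulting distance and value pairs can be plotted as an ordinary line graph, and borehole logs positioned by their distance along the same line can be overlaid on it.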
Figure 121 - Fence diagram (Courtesy of RockWare, Inc.)
FENCE DIAGRAMS AND STICK DISPLAYS
Extending from a one-dimensional lithologic or geophysical log or a two-dimensional cross section to a three-dimensional display is very difficult with hand drafting techniques (see Tearpock and Bischke, 1991, pp. 182-194), but can be easily done with a computer. The user first specifies which wells are to be used, what data elements are to be displayed, and how they are to be shown. The software then uses the X-Y coordinates of the well locations to project them onto a three-dimensional perspective view. The logs can be shown with no connections between them (stick diagram) or the formations can be connected from well to well (fence diagram). This type of display can show three-dimensional relationships that are difficult to discern using other methods.
Figure 121 shows an example of a fence diagram. In this example, the geology from the borings, which are shown as curved lines, has been interpolated across the site, and then profiles drawn at regularly spaced intervals. Unless the wells are regularly spaced, which they usually aren’t, either the lines of the fence diagram must be crooked, or the lines drawn straight and the data interpolated at the intersections. This requires considerable confidence in your understanding of the data and the spatial relationships at the site.
Figure 122 - Block diagram created automatically from relational data
BLOCK DIAGRAMS AND 3-D DISPLAYS
Another type of three-dimensional display is the block diagram. Block diagrams can be made from two- or three-dimensional grid models of a particular volume of rock. Some block diagram software allows certain stratigraphic or lithologic units to be made transparent so the user can see into the block diagram. Generating block diagrams for large grid models is computationally intensive and requires a powerful computer to produce results in a reasonable amount of time. Fortunately, current high-end personal computers have the power to do this for all but the largest projects. Figure 122 shows an example of a block diagram created from data extracted from an EDMS. In Figure 123 the low concentration material has been removed (made transparent) to show only the higher concentration material. Also, logs for the boreholes have been added. Figures 124 and 125 are more complicated figures with 3-D surface features and a contaminant plume. Figure 125 adds the depth to bedrock.
Although block diagrams such as those shown in Figures 122 to 125 are quite computationally intensive and can take several minutes or more to create, the benefits can far outweigh the inconvenience. Block diagrams with this kind of detail are very difficult to produce manually. Accurate rendering of 3-D objects that are faithful to the data, as shown in these figures, is virtually impossible without using a computer.
Because many people, especially non-technical people, find it difficult or impossible to visualize objects in three dimensions, block and 3-D diagrams can be a powerful tool in illustrating and proving your case. These displays can greatly aid regulators, attorneys, and environmental activists in understanding spatial relationships.
Figure 123 - Deviated boreholes and plume display (Courtesy of RockWare, Inc.)
Figure 124 - 3-D facility display created with a mapping program (Courtesy of RockWare, Inc.)
Figure 125 - 3-D display of contamination under a refinery created with a GIS (Courtesy of Dan Heidenreich, HSI Geotrans)
CHAPTER 22
MAPPING AND GIS
Most environmental data is inherently spatial. That means that the observation was taken at a specific location in map coordinates (X and Y) and depth (or elevation). Often seeing the data in its spatial context imparts more information content than seeing it as a text-only presentation. Using computerized mapping to help understand this spatial context makes sense for many projects. This chapter covers issues related to computerized mapping, including software for creating maps, displaying your data, contouring and modeling, and specialized map displays.
MAPPING CONCEPTS
Since earth scientists often spend a large amount of their time working with maps, it is logical to consider computerization of the map generation and manipulation process. Computerized mapping covers a wide variety of activities, and programs are available to help with most of them. There are some advantages and disadvantages to consider, however.
Advantages and disadvantages of computerized mapping
Before making a commitment to computerized mapping, a thorough appraisal should be made of the timesaving and other benefits that will be provided. The problem is similar to the one encountered with computer-aided design (CAD) software. Making the first map with the computer will take as much or more time than doing it by hand. This is especially true if the learning curve for the mapping software is taken into consideration. The time savings will come later, when the map needs to be redone or changes need to be made. Then the computer eliminates re-drafting, which can improve accuracy as well as improve speed in generating the second map.
Another advantage of the computerized mapping process is that it allows the earth scientist to make maps and diagrams that either could not or would not have been made by hand. Good examples are trend surface and residual maps, other derived maps, block diagrams, and maps of data that may have previously been considered unimportant. Other examples of maps more likely to be made are multiple maps of different time periods. Having the computer generate the maps makes it more likely that these maps will be made. Some of these experimental maps will not be useful and will be thrown away. Others may provide surprising insight into the data and the geology behind it, and could prove tremendously valuable. Computer-generated maps are, for the most part, unbiased, which can be of value in many situations. Finally, computerized mapping can greatly improve the ease and accuracy of volumetric calculations.
Often the decision on whether computerized mapping is appropriate for a project depends on the number of maps to be made and the amount of data to be mapped. For small projects, hand mapping is often better. For large projects with thousands (or millions) of data points, the computer may be the only way to do it.
Types of maps
Since maps are so widely used, there are hundreds of different kinds of maps. A few of those types of maps will be discussed here. In some cases, the final map display is a combination of several map types.
Base maps – The most fundamental type of map is the base map. Whether derived from a topographic map or commercial or proprietary data, a base map usually must be constructed before any other type of map can be made. Geographic, cultural, and sample location data must all be collected and related together in the right spatial positions. The importance of this step must not be underestimated, and this subject is discussed in more detail below.
Posted data – The next step after creating a base map is often to post data, creating a posted data map. In many cases, one of the primary goals of organizing a database is to create this type of map. Retrieving data and posting it on maps is described in a later section.
Bubble maps – A bubble map, also called a dot map or pin map, expands on data posting by using the symbol on the map to represent the value being displayed. The symbol’s size, color, or shape can reflect the value being featured. Figures 129 and 130 later in the chapter show examples of this type of map.
Thematic maps – Thematic maps use the display of various map elements to communicate data, usually numeric values. For example, each county on a regional map could be color coded to represent the value of some variable, such as economic or environmental parameters. Sometimes this is done in perspective view with the polygon of each county extruded to a height that represents the value.
Contour maps – Contour maps use contour lines, color fill, and other graphical displays to communicate numeric information, usually of a continuous or nearly continuous surface (such as one broken by faults). The many issues related to creating and displaying contour maps are discussed below.
Surface geology – It is often useful to make geologic maps of surface or subsurface geology and/or other features. As the use of computers for image processing and analysis increases, more surface geology projects are being done on the computer using airphotos and satellite photos. Software exists that allows the user to move interpreted information from images onto line drawings (maps) for output to plotters and for integration with other types of maps.
Airphotos and satellite images – Aerial imagery, whether taken from an airplane or satellite, can be of great value in environmental mapping. Often airphotos are available for various time periods, which can assist with documenting the history of the site. In order to be used for mapping, images must be ortho-rectified to remove any spatial distortion caused by the imaging process, and to allow them to be registered to a particular map geometry. Once this has been done, both types of images can be used for map backdrops to illustrate a variety of points about site data.
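For the bubble maps described above, the text says only that a symbol’s size can reflect the value. One common choice, assumed here rather than taken from the text, is to make the symbol’s area proportional to the value, so that a value four times larger looks four times larger. The Python sketch below uses a hypothetical function name and made-up values.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Symbol radius such that symbol AREA is proportional to value.

    max_value is drawn at max_radius; smaller values scale with the
    square root of their ratio to max_value.
    """
    if value < 0 or max_value <= 0 or max_radius <= 0:
        raise ValueError("value must be >= 0; max_value and max_radius > 0")
    return max_radius * math.sqrt(value / max_value)


# Made-up concentrations posted at three sample locations.
concentrations = [25.0, 100.0, 400.0]
radii = [bubble_radius(c, 400.0, 10.0) for c in concentrations]
print(radii)
```

Scaling the radius directly with the value would make the largest symbol in this example cover 256 times the area of the smallest for a 16-fold difference in value, overstating the contrast.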
Base map creation
If the EDMS will include a map component, then base map information must be loaded into either the database or the geographic information system (GIS) before a map can be displayed. Then the analytical and other information can be overlaid on the base map. Loading the base map data involves two steps. The first is to create a base map. Often this is done using a computer-aided drafting program such as AutoCAD or the digitizing capabilities of the GIS. This base map should provide sufficient locational information as a reference for the data being displayed while keeping