RELATIONAL MANAGEMENT and DISPLAY of SITE ENVIRONMENTAL DATA - PART 2



PART TWO - SYSTEM DESIGN AND IMPLEMENTATION


CHAPTER 5

GENERAL DESIGN ISSUES

The success of a data management task usually depends on the tool used for that task. A familiar saying, often attributed to the psychologist Abraham Maslow, holds: “When all you have is a hammer, everything looks like a nail.” This is as true in data management as in anything else. People who like to use a word processor or a spreadsheet program are likely to use the tool they are familiar with to manage their data. But just as a hammer is not the right tool to tighten a screw, a spreadsheet is not the right tool to manage a large and complicated database. A database management program should be used instead. This section discusses the design of the database management tool, and how the design can influence the success of the project.

DATABASE MANAGEMENT SOFTWARE

Database management programs fall into two categories, desktop and client-server. The use of the two different types and decisions about where the data will be located are discussed in the next section. This section will discuss database applications themselves and briefly discuss the features and benefits of the programs. The major database software vendors put a large amount of effort into expanding and improving their products, so these descriptions are a snapshot in time. For an overview of desktop and Web-based database software, see Ross et al. (2001).

Older database systems were, for the most part, based on dBase, or at least the dBase file format. dBase started in the early days of DOS, and was originally released as dBase II because that sounded more mature than calling it 1.0. If anyone tells you that they have been doing databases since dBase 1, you know they are bluffing. dBase was an interpreted application, meaning that the code was translated into machine language each time it was run, which was slow on those early computers. This created a market for dBase compilers, of which FoxPro was the most popular. Both used a similar data format in which each data table was called a database file, or dbf. Relationships were defined in code, rather than as part of the data model. Much data has been, and in some cases still is, managed in this format. These files were designed for single-user desktop use, although locking capabilities were added in later versions of the software to allow shared use.

Nowadays Microsoft Access dominates the desktop database market. This program provides a good combination of ease of use for beginners and power for experts. It is widely available, either as a stand-alone product or as part of the Microsoft Office desktop suite. Additional information on Access can be found in books by Dunn (1994), Jennings (1995), and others, and especially in journals such as PC Magazine and Access/Visual Basic Advisor. Access has a feature that is common to almost all successful database programs: a programming language that allows users to automate tasks, or even build complete programs using the database software. In the case of Access, there are actually two programming models, a macro language that is falling out of favor, and a programming language. The programming language is called Visual Basic for Applications (VBA), and is a fairly complete development environment.

Since Access is a desktop database, it has limitations relative to larger systems. Experience has shown that for practical use, the software starts to have performance problems when the largest table in a database reaches a half million to a million records. Access allows multiple users to share a database, and no programming is required to implement this, but a dozen or so concurrent users is an upper limit for heavy-usage database scenarios.

An alternative to Access is Paradox from Corel (www.corel.com). This is a programmable, relational database system, and is available as part of the Corel Office Suite. Paradox is a capable tool suitable for a complex database project, but the greater acceptance of Access makes Paradox an unlikely choice in the environmental business, where file sharing is common and Access is widespread.

The next step up from Access for many organizations is Microsoft SQL Server. This is a full-scale client-server system with robust security and a larger capacity than Access. It is moderately priced and relatively easy to use (for enterprise software), and increases the capacity to several million records. It is easy to attach an Access front end (user interface) to a SQL Server back end (data storage), so the transition to SQL Server is relatively easy when the data outgrows Access. This connection can be made using ODBC (Open Database Connectivity) or other connection methods.

For even larger installations, Oracle or IBM’s DB2 offer industrial-strength data storage, but with a price and learning curve to match. These can also be connected to the desktop using connection methods like ODBC, and one front-end application can be set up to talk to data in these databases, as well as to data in Access. Using this approach, it is possible to create one user interface that can work with data in all of the different database systems.
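The same idea, one front end talking to several back ends, can be sketched in Python, where the DB-API 2.0 specification (PEP 249) plays a role loosely similar to ODBC: every compliant driver offers the same `cursor()`, `execute()`, and `fetchone()` calls. This is an illustrative sketch under that assumption, not how Access itself works. Two SQLite databases stand in for two different back ends, and the table names are hypothetical; a connection from another DB-API driver (for example pyodbc, for SQL Server) should work with the same function.

```python
import sqlite3

# Table names are checked against a fixed list because they are placed in the
# SQL text directly (query placeholders cannot be used for identifiers).
KNOWN_TABLES = {"stations", "samples"}

def record_count(conn, table: str) -> int:
    """Count rows using only DB-API 2.0 calls, so any compliant driver works."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

# Two separate databases standing in for two different back ends.
back_ends = {
    "project A": sqlite3.connect(":memory:"),
    "project B": sqlite3.connect(":memory:"),
}
for i, conn in enumerate(back_ends.values(), start=1):
    conn.execute("CREATE TABLE stations (station_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO stations VALUES (?)", [(n,) for n in range(i * 2)])

# The same front-end code runs unchanged against each back end.
for name, conn in back_ends.items():
    print(name, record_count(conn, "stations"))
```

The front-end function never mentions which database it is talking to; only the code that opens the connections does.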

A new category of database software that is beginning to appear is open-source software. Open-source software refers to programs where the source code for the software is freely available, and the software itself is often free as well. This type of software is popular for Internet applications, and includes such popular programs as the Linux operating system and the Apache Web server. Two open-source database programs are PostgreSQL and MySQL (Jepson, 2001). These programs are not yet as robust as commercial database systems, but are improving rapidly. They are available in commercial, supported versions as well as the free open-source versions, so they are starting to become options for enterprise database storage. And you can’t beat the price.

Another new category of database software is Web-based programs. These programs run in a browser rather than on the desktop, and are paid for with a monthly fee. Current versions of these programs are limited to a flat-file design, which makes them unsuitable for the complex, relational nature of most environmental data, but they might have application in some aspects of Web data delivery. Examples of this type of software include QuickBase from the authors of the popular Quicken and QuickBooks financial software (www.quickbase.com), and Caspio Bridge (www.caspio.com).

DATABASE LOCATION OPTIONS

A key decision in designing a data management system is where the data will reside. Related to this are a variety of issues, including what hardware and software will provide the necessary functionality, who will be responsible for data entry and editing, and who will be responsible for backup of the database.


Stand-alone

The simplest design for a database location is stand-alone. In this design, the data and the software to manage it reside on the computer of one user. That computer may or may not be on a network, but all of the functionality of the database system is local to that machine. The hardware and software requirements of a system like this are modest, requiring only one computer and one license for the database management software. The software does not need to provide access for more than one user at a time. One person is in control of the computer, software, and data.

For small projects, especially one-person projects, this type of design is often adequate. For larger projects where many people need access to the data, the single individual keeping the data can become a bottleneck. This is particularly true when the retrievals required are large or complicated. The person responsible for the data can end up spending most or all of his or her time responding to data requests from users. When the data management requirements grow beyond that point, the stand-alone system no longer meets the needs of the project team, and a better design is required.

Shared file

Generally the next step beyond a stand-alone system is a shared file system. In a shared file system, the server (or any computer) stores the database on its hard drive like any other file. Clients access the file using database software on their computers the same way they would open any other file on the server. The operating system on the server makes the file available; the database software on the client computer is responsible for handling access to the database by multiple users. An example of this design would be a system in which multiple users have Microsoft Access on their computers, and the database file, which has an extension of .mdb, resides on a server, which could be running Windows 95/98/ME or NT/2000/XP. When one or more users are working in the database file, their copy of Access maintains a second file on the server called a lock file. This file, which has an extension of .ldb, keeps track of which users are using the database and what objects in the database may be locked at any particular time. This design works well for a modest number of users in the database at once, providing adequate performance for a dozen or so users at any given time, and for databases up to a few hundred thousand records.
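The rules of thumb given so far can be collected into a small decision helper. This is a hypothetical sketch: the thresholds come from the approximate figures in the text (a dozen or so concurrent users, a few hundred thousand records, with 300,000 chosen here as an assumed value), and real limits depend on hardware, the network, and how heavily the data is queried.

```python
from dataclasses import dataclass

# Approximate limits from the text; guidelines, not hard limits.
SHARED_FILE_MAX_USERS = 12          # "a dozen or so users at any given time"
SHARED_FILE_MAX_RECORDS = 300_000   # "a few hundred thousand records" (assumed value)

@dataclass
class Requirements:
    concurrent_users: int
    largest_table_records: int

def suggest_location(req: Requirements) -> str:
    """Suggest stand-alone, shared file, or client-server for a given load."""
    if req.largest_table_records > SHARED_FILE_MAX_RECORDS:
        # Conservatively apply the shared-file record limit to any desktop design.
        return "client-server"
    if req.concurrent_users <= 1:
        # One person controls the computer, software, and data.
        return "stand-alone"
    if req.concurrent_users <= SHARED_FILE_MAX_USERS:
        return "shared file"
    return "client-server"

print(suggest_location(Requirements(1, 50_000)))    # stand-alone
print(suggest_location(Requirements(8, 200_000)))   # shared file
print(suggest_location(Requirements(25, 200_000)))  # client-server
```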

Client-server

When the load on the database increases to the point where a shared file system no longer provides adequate performance, the next step is a client-server system. In this design, a data manager program runs on the server, providing access to the data through a system process. One computer is designated the server, and it holds the data management software and the data itself. This system may also be used as the workstation of an individual user, but in high-volume situations this is not recommended. More commonly, the server computer is placed on a network with the computers of the users, which are referred to as clients. The software on the server works with software on the client computers to provide access to the data.

The following diagram (Figure 17) covers the internal workings of a client-server EDMS. It contains two parts, the Access component at the top and the SQL Server part at the bottom. In discussing the EDMS, the Access component is sometimes called the user interface, since that is the part of the system that users see, but in fact both Access and SQL Server have user interfaces. The Access user interface has been customized to make it easy for the EDMS users to manage the data in ways useful to them. The SQL Server user interface is used without modifications (as provided by Microsoft) for data administration tasks. Between these user interfaces are a number of pieces that work together to provide the data management capabilities of the system.


[Diagram labels. Client side (Access user interface): selection screen, manual entry, electronic import, review data, formatted reports, maps, graphs, file export, subset database creation, lookup table maintenance, table view, record counts, backup/restore, and volume maintenance, built on Access queries/modules, Access attachments, and a read/write or read-only security system. Server side (SQL Server/Oracle user interface): server tables within the server volume.]

Figure 17 - Client-server EDMS data flow diagram

Discussion of this diagram will start at the bottom and work toward the top, since this order starts with the least complicated parts (from the user’s perspective, not in terms of the complexity of the software code) and moves to the more complicated parts. That means starting on the SQL Server side and working toward the client side. This sequence provides the most orderly view of the system. In this diagram, the part with a gray background represents SQL Server, and the rest of the box is Access.

The basic foundation of the server side is the SQL Server volume, which is actually a file on the server hard drive that contains the data. The size of this volume is set when the database is set up, and can be changed by the administrator as necessary. Unlike many computer files, depending on the version and how the database is configured, it may not grow automatically as data is added, so someone needs to monitor the volume size and the amount of data and keep them in sync. The software works this way because the location and structure of the file on the hard drive are carefully managed by the SQL Server software to provide maximum performance (speed of query execution).
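The monitoring described above amounts to comparing the space the data occupies with the space allocated to the volume. A minimal sketch follows; the function name and the 80 percent warning threshold are assumptions for illustration, and in practice the two sizes would be read from the database server's own administrative tools.

```python
def volume_status(volume_mb: float, data_mb: float, warn_fraction: float = 0.8) -> str:
    """Classify how full a fixed-size database volume is.

    volume_mb: space allocated to the volume; data_mb: space used by the data.
    warn_fraction: assumed threshold at which the administrator should act.
    """
    if volume_mb <= 0:
        raise ValueError("volume size must be positive")
    used = data_mb / volume_mb
    if used >= 1.0:
        return "full: enlarge the volume before adding data"
    if used >= warn_fraction:
        return f"warning: {used:.0%} used, plan to enlarge the volume"
    return f"ok: {used:.0%} used"

print(volume_status(2000, 900))   # ok: 45% used
print(volume_status(2000, 1700))  # warning: 85% used, plan to enlarge the volume
```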

The database tables are within the SQL Server volume. These tables are similar in function and appearance to the tables in Access. They contain all of the data in the system, usually in normalized database form. The data in the tables is manipulated through SQL queries passed to the SQL Server software via the ODBC link from the clients. Triggers and stored procedures can also be stored in the SQL Server volume to increase performance in the client-server system and to enforce referential integrity. If they wish, users can see the tables in the SQL Server volume through the database window in Access, but their ability to manipulate the data depends on the privileges that they have through the security system. A system administrator should back up the data in the SQL Server tables on a regular basis (at least daily).
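As an illustration of a trigger enforcing referential integrity, the sketch below uses SQLite, through Python's built-in sqlite3 module, as a stand-in for SQL Server; SQL Server's own trigger syntax (Transact-SQL) differs. The table and column names are hypothetical. The trigger refuses any sample whose station is not already in the stations table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (
    station_id   INTEGER PRIMARY KEY,
    station_name TEXT NOT NULL
);
CREATE TABLE samples (
    sample_id   INTEGER PRIMARY KEY,
    station_id  INTEGER NOT NULL,
    sample_date TEXT NOT NULL
);
-- Refuse a sample that points at a station that does not exist.
CREATE TRIGGER samples_require_station
BEFORE INSERT ON samples
WHEN NOT EXISTS (SELECT 1 FROM stations WHERE station_id = NEW.station_id)
BEGIN
    SELECT RAISE(ABORT, 'unknown station_id');
END;
""")

conn.execute("INSERT INTO stations VALUES (1, 'MW-1')")
conn.execute("INSERT INTO samples VALUES (10, 1, '2002-03-15')")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO samples VALUES (11, 99, '2002-03-15')")
except sqlite3.IntegrityError as exc:
    # SQLite reports RAISE(ABORT, ...) as a constraint violation.
    rejected = True
    print("rejected:", exc)
```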

The interface between the EDMS and the SQL Server tables is through a security system that is partly implemented in SQL Server and partly in Access. Most users should have read-only permission for working with the data; that is, they will be able to view data but not change it. A small group of users, called data administrators, should be able to modify data, which includes importing and entering data, changing data, and deleting data.
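In SQL Server itself, this split is normally made with database roles and GRANT statements. The underlying logic can be sketched independently of any database product as a mapping from roles to permitted operations; the role and operation names below are hypothetical.

```python
from enum import Enum

class Role(Enum):
    READ_ONLY = "read-only user"
    DATA_ADMIN = "data administrator"

# Operations each role may perform (names are illustrative only).
PERMISSIONS = {
    Role.READ_ONLY: frozenset({"view"}),
    Role.DATA_ADMIN: frozenset({"view", "import", "enter", "change", "delete"}),
}

def is_allowed(role: Role, operation: str) -> bool:
    """Return True if the role may perform the operation."""
    return operation in PERMISSIONS[role]

def require(role: Role, operation: str) -> None:
    """Raise PermissionError instead of letting a forbidden change through."""
    if not is_allowed(role, operation):
        raise PermissionError(f"{role.value} may not {operation} data")

print(is_allowed(Role.READ_ONLY, "view"))    # True
print(is_allowed(Role.READ_ONLY, "delete"))  # False
```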


The actual connection between Access and SQL Server is made through attachments. Attachments in Access allow Access to see data that is located somewhere other than the current Access .mdb file as if it were in the current database. This is the usual way of connecting to SQL Server, and it also provides the flexibility of attaching to other data sources.
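SQLite has a comparable mechanism, ATTACH DATABASE, which makes the tables of another database queryable from the current connection under a schema name. The sketch below uses it through Python's sqlite3 module as an analogy for Access attachments, not as a description of how Access links to SQL Server. The schema name `lab` and the table layouts are hypothetical, and in-memory databases keep the example self-contained where a file path would normally be used.

```python
import sqlite3

# The "front end" connection; its main database holds only local tables.
front_end = sqlite3.connect(":memory:")

# Attach a second database under the schema name "lab" (a file path in practice).
front_end.execute("ATTACH DATABASE ':memory:' AS lab")
front_end.execute("CREATE TABLE lab.results (sample_id INTEGER, parameter TEXT, value REAL)")
front_end.execute("INSERT INTO lab.results VALUES (10, 'Benzene', 0.005)")

# A local table in the main database.
front_end.execute("CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, station TEXT)")
front_end.execute("INSERT INTO samples VALUES (10, 'MW-1')")

# One query joins local and attached tables as if both were local.
rows = front_end.execute("""
    SELECT s.station, r.parameter, r.value
    FROM samples AS s JOIN lab.results AS r ON r.sample_id = s.sample_id
""").fetchall()
print(rows)  # [('MW-1', 'Benzene', 0.005)]
```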

Once the attachments are in place, the client interaction with the database is through Access queries, either alone or in combination with modules, which are programs written in VBA. Various queries provide access to different subsets of the data for different purposes. Modules are used when it is necessary to work with the data procedurally, that is, line by line, instead of processing the whole query as a set.
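The difference between set-based queries and procedural modules shows up clearly when one question is answered both ways. In the sketch below, Python's sqlite3 module stands in for Access queries and VBA modules, the results are made up, and 0.005 mg/L is used only as an illustrative limit. The SQL statement counts exceedances per station as a set, while the loop steps through the rows one at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (station TEXT, parameter TEXT, value REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("MW-1", "Benzene", 0.004),
    ("MW-1", "Benzene", 0.007),
    ("MW-2", "Benzene", 0.002),
])

LIMIT = 0.005  # illustrative limit, mg/L

# Set-based: one query processes all rows as a set.
set_based = conn.execute(
    "SELECT station, COUNT(*) FROM results WHERE value > ? GROUP BY station",
    (LIMIT,),
).fetchall()

# Procedural: examine each row in turn, as a module would.
procedural = {}
for station, value in conn.execute("SELECT station, value FROM results"):
    if value > LIMIT:
        procedural[station] = procedural.get(station, 0) + 1

print(set_based)   # [('MW-1', 1)]
print(procedural)  # {'MW-1': 1}
```

For a simple count the set-based query is shorter and lets the database do the work; the procedural form earns its keep when each row's handling depends on logic that is awkward to express in a single query.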

Distributed

The database designs described above were geared toward an organization where the data management is centralized, that is, where the data management activities are performed in one central location, usually on a local area network (LAN). With environmental data management, this is not always the case. Often the data for a particular facility must be available both at the facility and at the central office. The situation becomes even more complicated when the central office must manage and share data with multiple remote sites. This requires that some or all of the data be made available at multiple locations. The following sections describe three ways to do this: wide-area networks, distributed databases with replication, and remote communication with subsets. The factors that determine which solution is best for an organization include the amount of data to be managed, how fresh the data needs to be at the remote locations, whether full-time communication between the facilities is available, and the speed of that communication.

Wide-area networks – In situations where a full-time, high-speed communication link is or can be made available (at a cost which is reasonable for the project), a wide-area network (WAN) is often the best choice. From the point of view of the people using it, the WAN hardware and software connect the computers just as with a LAN. The difference is that instead of all of the computers being connected directly through a local Ethernet or Token Ring network, some are connected through long-distance data lines of some sort. Often there are LANs at the different locations connected together over the WAN.

The connection between the LANs is usually made with routers on either end of the long-distance line. The router looks at the data traffic passing over the network, and data packets which have a destination on the other LAN are routed across the WAN to that LAN, where they continue on to their destination.
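The router's basic decision, deliver locally or forward across the WAN, can be sketched with Python's standard ipaddress module. This is a simplified illustration rather than router software: the address ranges are hypothetical private networks, and a real router would also have a default route and many other rules.

```python
import ipaddress

# Hypothetical addressing: the central office LAN and one facility LAN.
LOCAL_LAN = ipaddress.ip_network("192.168.1.0/24")
WAN_ROUTES = {
    ipaddress.ip_network("192.168.2.0/24"): "facility LAN",
}

def forward(destination: str) -> str:
    """Decide what the central-office router does with a packet."""
    address = ipaddress.ip_address(destination)
    if address in LOCAL_LAN:
        return "stay on local LAN"
    for network, name in WAN_ROUTES.items():
        if address in network:
            return f"route across WAN to {name}"
    return "no route"

print(forward("192.168.1.20"))  # stay on local LAN
print(forward("192.168.2.7"))   # route across WAN to facility LAN
```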

There are several options for the long-distance lines forming the WAN between the LANs. This discussion will cover some popular existing and emerging technologies. This is a rapidly changing industry, and new technologies are appearing regularly.

At the high end of connectivity options are dedicated leased-line services such as T1 (or, in cases of very high data volume, T3) or frame relay. These services are connected full-time, and provide high to moderate speeds ranging from 56 kilobits per second (kbps) to about 1.5 megabits per second (Mbps) for a T1, and about 45 Mbps for a T3. These services can cost $1000 per month or more. This is proven technology, and is available, at a cost, to nearly any facility.

Recently, newer and less expensive digital services have become available in some areas. Integrated Services Digital Network (ISDN) provides 128 kbps for around $100 per month. Digital Subscriber Line (DSL) provides connectivity ranging in speed from 256 kbps to 1.5 Mbps or more. Prices can be as low as $40 per month, so this service can be a real bargain, but service is limited to a fairly short distance from the telephone company central office, so it is not available to many locations. Cable modems promise to give DSL a run for its money. They are not widely available right now, especially for business locations, and where they are available, the focus is more on residential than business customers, since that is where cable is currently connected.


Another option is standard telephone lines and analog modems. This is sometimes called POTS (plain old telephone service). It provides up to 56 kbps, or more with modem pooling, and the connection is made on demand. The cost is relatively low ($15 to $30 per month), and the service is available everywhere.

In order to have WAN-level connectivity, you should have a full-time connection of about 1 Mbps or faster. If the connection speed available is less than this, another approach should be used.
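A quick calculation shows why the speed threshold matters. The sketch below estimates the time to move a file over each type of line, using the nominal speeds quoted above (with T1 at its nominal 1.544 Mbps) and a hypothetical 10-megabyte transfer. It ignores protocol overhead and line contention, so real transfers will be slower.

```python
def transfer_seconds(megabytes: float, kbps: float) -> float:
    """Seconds to send a file at a given line speed, ignoring overhead."""
    bits = megabytes * 8 * 1_000_000   # decimal megabytes to bits
    return bits / (kbps * 1_000)       # kilobits per second to bits per second

LINES_KBPS = {"POTS modem": 56, "ISDN": 128, "DSL": 1_500, "T1": 1_544}

for name, kbps in LINES_KBPS.items():
    minutes = transfer_seconds(10, kbps) / 60
    print(f"{name:<10} {minutes:5.1f} minutes")
```

At these rates, 10 MB takes about 24 minutes over a 56 kbps modem and about 10 minutes over ISDN, but under a minute over DSL or a T1, which is the difference between waiting for every retrieval and working interactively.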

Distributed databases with replication – There are situations where a client-server connection over a WAN is not the best solution. One is where the connection speed is too low for real-time access. Another is where the data volume is extremely high, and local copies make more sense. In these situations, distributed databases can make sense. In this design, copies of the database are placed on local servers at each facility, and users work with the local copies of the data. This is an efficient use of computer resources, but raises the very important issue of currency of the data. When data is entered or changed in one copy, the other copy is no longer the most current, and, at some point, the changes must be transferred between the copies. Most high-end database programs, and now some low-end ones, can do this automatically at specified intervals. This is called replication. Generally, the database manager software is smart enough to move only the changed data (sometimes called “dirty” records) rather than the whole database. Problems can occur when users make simultaneous changes to the same records at different locations, so this approach rapidly becomes complicated.
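The core idea of replication, moving only the dirty records and flagging records changed at both sites, can be sketched in a few lines. This is a hypothetical, in-memory illustration and not how any particular database product implements replication; it handles changed records only (no deletions) and simply reports conflicts for a person to resolve.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of the database: records keyed by ID, plus a dirty-record set."""
    name: str
    rows: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)

    def write(self, key, value):
        self.rows[key] = value
        self.dirty.add(key)  # mark the record as changed since the last sync

def synchronize(a: Replica, b: Replica) -> list:
    """Copy dirty records between replicas; return keys changed at both sites."""
    conflicts = a.dirty & b.dirty
    for key in a.dirty - conflicts:
        b.rows[key] = a.rows[key]
    for key in b.dirty - conflicts:
        a.rows[key] = b.rows[key]
    # Conflicting records stay dirty at both sites until someone resolves them.
    a.dirty = set(conflicts)
    b.dirty = set(conflicts)
    return sorted(conflicts)

central = Replica("central", {"S1": 0.004})
facility = Replica("facility", {"S1": 0.004})
central.write("S2", 0.010)
facility.write("S3", 0.002)
central.write("S1", 0.005)   # the same record is changed at both sites
facility.write("S1", 0.006)
print(synchronize(central, facility))  # ['S1']
```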

Remote communication with subsets – Often it is valuable for users to be able to work with part of the database remotely. This is particularly useful when the communication line is slow. In this scenario, users call in with their remote computers and attach to the main database, and either work with it that way or download subsets for local use. In some software this is as easy as selecting data using the standard selection screen, then instructing the EDMS to create a subset. This subset can be created on the user’s computer; then the user can hang up, attach to the local subset, and use the EDMS in the usual way, working with the subset. This works for retrieving data, but not as well for data entry or editing, unless a way is provided for data entered into the subset to be uploaded to the main database.
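Creating a subset amounts to copying the selected rows into a separate database that the user can take offline. A minimal sketch with Python's sqlite3 module follows; the single results table and its columns are hypothetical, and an in-memory database stands in for the file that would be written to the user's computer.

```python
import sqlite3

SCHEMA = "CREATE TABLE results (station TEXT, sample_date TEXT, parameter TEXT, value REAL)"

def create_subset(main: sqlite3.Connection, station: str) -> sqlite3.Connection:
    """Copy the results for one station into a new, separate database."""
    subset = sqlite3.connect(":memory:")   # a file on the laptop in practice
    subset.execute(SCHEMA)
    rows = main.execute(
        "SELECT station, sample_date, parameter, value FROM results WHERE station = ?",
        (station,),
    ).fetchall()
    subset.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    subset.commit()
    return subset

main = sqlite3.connect(":memory:")
main.execute(SCHEMA)
main.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("MW-1", "2002-01-10", "Benzene", 0.004),
    ("MW-1", "2002-04-12", "Benzene", 0.006),
    ("MW-2", "2002-01-10", "Benzene", 0.002),
])

subset = create_subset(main, "MW-1")
print(subset.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 2
```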

Internet/intranet

Tools are now available to provide access to data stored in relational database managers using a Web browser interface. At the time of this writing, in the opinion of the author, these tools are not yet ready for use as the primary interface in a sophisticated EDMS package. Specifically, the technology to provide an intuitive user interface with real-time feedback is too costly or impossible to build with current Web development tools. Vendors are working on implementing that capability, but the technology is not currently ready for prime time, at least for everyday users, and the current technology of choice is client-server.

It is now feasible, however, to provide a Web interface with limited functionality for specific applications. For example, it is not difficult to provide a public page with summaries of environmental data for plants. The more “canned” the retrieval is, the easier it is to implement in a browser interface, although allowing some user selection is not difficult.
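A canned retrieval of this kind can be as simple as one fixed query whose results are written out as an HTML table. The sketch below uses Python's sqlite3 and html modules; the table, columns, and plant names are hypothetical, and a real site would add page styling and a Web server to deliver the page.

```python
import html
import sqlite3

def summary_table(conn: sqlite3.Connection) -> str:
    """Build an HTML summary from one fixed ("canned") query."""
    rows = conn.execute(
        "SELECT plant, COUNT(*), MAX(value) FROM results GROUP BY plant ORDER BY plant"
    ).fetchall()
    lines = ["<table>", "<tr><th>Plant</th><th>Samples</th><th>Maximum</th></tr>"]
    for plant, count, maximum in rows:
        # Escape text values so they cannot break the page markup.
        lines.append(f"<tr><td>{html.escape(plant)}</td><td>{count}</td><td>{maximum}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (plant TEXT, value REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [("North Plant", 0.004), ("North Plant", 0.007), ("South Plant", 0.002)])
print(summary_table(conn))
```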

In the near future, tools like Dynamic HTML, Active Server Pages, and better Java applets, combined with universal high-speed connections, will make it much easier to provide an interactive user interface hosted in a Web browser. At that time, EDMS vendors will certainly provide this type of interface to their clients’ databases.

The following figure shows a view of three different spectra provided by the Internet and related technologies. There are probably other ways of looking at it, but this view provides a framework for discussing products and services and their presentation to users. In these days of multi-tiered applications, this diagram is somewhat of an oversimplification, but it serves the purpose here.


[Diagram: three spectra, each running from Local (left) to Global (right).
Applications: Stand-Alone, Shared Files, Client-Server, Web-Enabled, Web-Based
Data: Proprietary, Commercial, Public Domain
Users: Desktops, Laptops, PDAs, etc., Public Portals]

Figure 18 - The Internet spectrum

The overall range of the diagram in Figure 18 is from Local on the left to Global on the right. This range is divided into three spectra for this discussion. The three spectra, which are separate but not unrelated, are applications, data, and users.

Applications – Desktop computer usage started with stand-alone applications. A program and the data that it used were installed on one computer, which was not attached to others. With the advent of local area networks (LANs), and in some organizations wide-area networks (WANs), it became possible to share files, with the application running on the local desktop and with the data residing on a file server. As software evolved and data volumes grew, software was installed on both the local machine (client) and the server, with the user interface operating locally, and data storage, retrieval, and sometimes business logic operating on the server. With the advent of the Internet and the World Wide Web, sharing on a much broader scale is possible. The application can reside either on the client computer and communicate with the Web, or it can run on a Web server. The first type of application can be called Web-enabled. An example of this is an email program that resides locally, but talks through the Web. Another example would be a virus-scanning program that operates locally but goes to the Web to update its virus signature files. The second type of application can be called Web-based. An example of this would be a browser-based stock trading application.

Many commercial applications still operate in the range between stand-alone and client-server. There is now a movement of commercial software to the right in this spectrum, to Web-enabled or Web-based applications, probably starting with Web-enabling current programs, and then perhaps evolving to a thin-client, browser-based model over time. This migration can be done with various parts of the applications at different rates depending on the costs and benefits at each stage. New technologies like Microsoft’s .NET initiative are helping accelerate this trend.

Data – Most environmental database users currently work mostly with data that they generate themselves. Their base map data is usually based on CAD drawings that they create, and the rest of their project data comes from field measurements and laboratory analyses, which the data manager (or their client) pays for and owns. This puts them to the left of the spectrum in the above figure. Many vendors now offer both base map and other data, either on the Web or on CD-ROM, which might be of value to many users. Likewise, government agencies are making more and more data, spatial and non-spatial, available, often for free. As vendors evolve their software and Web presence, they can work toward integrating this data into their offerings. For example, software could be used to load a USGS or Census Bureau base map, and then display sites of environmental concern obtained from the EPA. Several software companies provide tools to make it possible to serve up this type of data from a modified Web server. Revenue can be obtained from purchase or rental of the application, as well as from access to the data.

Users – The World Wide Web has opened up a whole new world of options for computing platforms. These range from the traditional desktop computers and laptops through personal digital assistants (PDAs), which may be connected via wireless modem, to Web portals and other public access devices. Desktops and laptops can run present and future software, and, as most are becoming connected to the Internet, will be able to support any of the computing models discussed above. PDAs and other portable devices promise to provide a high level of portability and connectivity, which may require rethinking data delivery and display. Already there are companies that integrate global positioning systems (GPS) with PDAs and map data to show you where you are. Other possible applications include field data gathering and delivery, and a number of organizations provide this capability. Web portals include public Internet access (such as in libraries and coffee shops) as well as other Internet-enabled devices like public phones. This brings up the possibility that applications (and data) may run on a device not owned or controlled by the client, and argues for a thin-client approach.

This is all food for thought as we try to envision the evolution of environmental software products and services (see Chapter 27). What is clear is that the options for delivery of applications and data have broadened significantly, and must be considered in planning for future needs.

Multi-tiered

The evolution of the Internet and distributed computing has led to a new deployment model called “multi-tiered.” The three most common tiers are the presentation level, the business logic level, and the data storage level. Each level might run on a different computer. For example, the presentation level displayed to the user might run on a client computer, using either client-server software or a Web browser. The business logic level might enforce the data integrity and other rules of the database, and could reside on a server or Web server computer. Finally, the data itself could reside on a database server computer. Separating the tiers can provide benefits for both the design and operation of the system.
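A minimal sketch of the three tiers as separate Python units follows. In a real deployment each tier could run on a different computer; here they run in one process so the example is self-contained. The class names, the table layout, and the validation rules in the business tier are hypothetical.

```python
import sqlite3

class DataTier:
    """Data storage level: the only layer that talks to the database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")  # a database server in practice
        self.conn.execute("CREATE TABLE results (station TEXT, value REAL)")

    def insert(self, station: str, value: float) -> None:
        self.conn.execute("INSERT INTO results VALUES (?, ?)", (station, value))

    def all_rows(self) -> list:
        return self.conn.execute(
            "SELECT station, value FROM results ORDER BY station").fetchall()

class BusinessTier:
    """Business logic level: enforces data rules before anything is stored."""

    def __init__(self, data: DataTier) -> None:
        self.data = data

    def add_result(self, station: str, value: float) -> None:
        if not station:
            raise ValueError("a station is required")
        if value < 0:
            raise ValueError("a concentration cannot be negative")
        self.data.insert(station, value)

    def results(self) -> list:
        return self.data.all_rows()

def present(logic: BusinessTier) -> str:
    """Presentation level: formats results for display to the user."""
    return "\n".join(f"{station}: {value}" for station, value in logic.results())

logic = BusinessTier(DataTier())
logic.add_result("MW-1", 0.004)
print(present(logic))  # MW-1: 0.004
```

Because the presentation code only calls the business tier, it could be replaced by a Web page without touching the rules or the storage.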

DISTRIBUTED VS CENTRALIZED DATABASES

An important decision in implementing a data management system for an organization performing environmental projects for multiple sites is whether the databases should be distributed or centralized. This is particularly true when the requirements for various uses of the data are taken into consideration. This issue will be discussed here from two perspectives. The first perspective to be discussed will be that of the data, and the second will be that of the organization.

From the perspective of the data and the applications, the options of distributed vs. centralized databases are illustrated in Figures 19 and 20. Clearly it is easier for an application to be connected to a centralized, open database than to a diverse assortment of data sources. The downside is the effort required to set up and maintain a centralized data repository.
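A small sketch of what centralizing involves: data arriving in different shapes is mapped into one table in a central database, after which a single query covers all of it. The two sources, their column names, and the central table layout are hypothetical; Python's csv and sqlite3 modules stand in for the import tools of a real EDMS.

```python
import csv
import io
import sqlite3

# Source 1: a spreadsheet saved as CSV, with its own column names.
spreadsheet_export = io.StringIO("Well,Analyte,Result\nMW-1,Benzene,0.004\n")
# Source 2: rows from a legacy system, with upper-case names and text values.
legacy_rows = [("MW-2", "BENZENE", "0.002")]

central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE results (station TEXT, parameter TEXT, value REAL)")

# Map each source onto the one central layout, normalizing names and types.
for record in csv.DictReader(spreadsheet_export):
    central.execute("INSERT INTO results VALUES (?, ?, ?)",
                    (record["Well"], record["Analyte"].title(), float(record["Result"])))
for station, parameter, value in legacy_rows:
    central.execute("INSERT INTO results VALUES (?, ?, ?)",
                    (station, parameter.title(), float(value)))

# One query now answers a question that used to require visiting both sources.
rows = central.execute(
    "SELECT station, parameter, value FROM results ORDER BY station").fetchall()
print(rows)
```

Most of the work is in the mapping step, which is the setup and maintenance effort the text describes.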


[Diagram: diverse data sources, including GIS coverages, lab deliverables, CAD files, spreadsheets, legacy systems, ASCII files, word processing files, chain of custody records, field notebooks, hard copy files, and regulatory reports.]

Figure 19 - Connection to diverse, distributed data sources

Figure 20 - Connection to a centralized open database


[Diagram labels: Client; Site 1; Consultant 1, Consultant 2A, Consultant 1 + Geotech, Consultant 4, Do It Yourself; Spreadsheet, Access.]

Figure 21 - Distributed vs centralized databases

The choice of distributed vs. centralized databases can also be viewed from the perspective of the organization. This is illustrated in Figure 21. The left side of the diagram shows the way the data for environmental projects has traditionally been managed. The client, such as an industrial company, owns several sites with environmental issues. One or more consultants, labeled C1, C2, etc., manage each site, and each consultant may manage the project from various offices, such as C2A, C2B, etc. Each consultant office might use a different tool to manage the data. For example, for Site 1, consultant C1 may use an Excel spreadsheet. Consultant C2, working on a different part of the project, or on the same issues at a different time, may use a home-built database. Other consultants working on different sites use a wide variety of different tools. If people in the client organization, either in the head office or at one of the sites, want some data from one monitoring event, it is very difficult for them to know where to look.

Contrast this with the right side of the diagram. In this scenario, all of the client’s data is managed in a centralized, open database. The data may be managed by the client, or by a consultant, but the data can be accessed by anyone given permission to do so. There are huge savings in efficiency, because everyone knows where the data is and how to get at it. The difficult challenge is getting the data into the centralized database before the benefits can be realized.


Figure 22 - Example of a simplified logical data model

THE DATA MODEL

The data model for a data management system is the structure of the tables and fields that contain the data. Creating a robust data model is one of the most important steps in building a successful data management system (Walls, 1999). If you are building a data management system from scratch, you need to get this part right first, as best you can, before you proceed with the user interface design and system construction.

Many software designers work with data models at two levels. The logical data model (Figure 22) describes, at a conceptual level, the data content for the system. The lines between the boxes represent the relationships in the model. The physical data model (Figure 23) describes in detail exactly how the data will be stored, with names, data types, and sizes for all of the fields in each table, along with the relationships (key fields which join the tables) between the tables.
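A physical data model can be written down directly as table definitions. The sketch below shows two hypothetical tables in SQLite syntax, with names, data types, sizes, and the key field that joins them, and then reads the structure back from the database. Note that SQLite records but does not enforce declared sizes such as VARCHAR(20); server databases such as SQL Server do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (
    station_id   INTEGER PRIMARY KEY,
    station_name VARCHAR(20) NOT NULL
);
CREATE TABLE samples (
    sample_id   INTEGER PRIMARY KEY,
    station_id  INTEGER NOT NULL REFERENCES stations(station_id),  -- key field joining the tables
    sample_date DATE NOT NULL,
    depth_ft    REAL
);
""")

# Each row from PRAGMA table_info: (cid, name, type, notnull, default, pk)
for _, name, col_type, notnull, _, pk in conn.execute("PRAGMA table_info(samples)"):
    flags = ("PK " if pk else "") + ("NOT NULL" if notnull else "")
    print(f"{name:<12} {col_type:<8} {flags}")

# Each row from PRAGMA foreign_key_list: (id, seq, table, from, to, ...)
for fk in conn.execute("PRAGMA foreign_key_list(samples)"):
    print(f"samples.{fk[3]} -> {fk[2]}.{fk[4]}")
```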

The overall scope of the logical data model should be identified as early in the design process as possible. This is particularly true when the project is to be implemented in stages. Defining the scope early allows identification of the interactions between the different parts of the system, so that dependencies can be planned for as part of the detailed design for each subset of the data as that subset is implemented. Then the physical data model for the subset can be designed along with the user interface for that subset.

The following sections describe the data structure and content for a relational EDMS. This structure and content is based on a commercial system called Enviro Data, developed and marketed by Geotech Computer Systems, Inc. Because this is a working system that has managed hundreds of databases and millions of records of site environmental investigation and monitoring data, it seems like a good starting point for discussing the issues related to a data model for storing this type of data.


Figure 23 - Table and field display from a physical data model

Data structure

The structure of a relational EDMS, or of any database for that matter, should reflect as closely as possible the physical realities of the data being stored. For environmental site data, samples are taken at specific locations, at certain times, depths, and/or heights, and then analyzed for certain physical and chemical parameters. This section describes the tables and relationships used to model this usage pattern. The section after this describes in some detail the data elements and exactly how they are used, so that the data accurately reflects what happened.

Tables – The data model for storing site environmental data consists of three types of tables: primary tables, lookup tables, and utility tables. The primary tables contain the data of interest. The lookup tables contain codes and their expanded values that are used in the primary tables to save space and encourage consistency. Sometimes the lookups contain other useful information about the data elements that are represented by the coded values. The utility tables provide a place to store various data items, often related to the operation and maintenance of the system. Often these tables are not related directly to the primary tables.

For the most part, the primary data being stored in the EDMS has a series of one-to-many (also known as parent-child or hierarchical) relationships. It is particularly fortunate that these relationships are one-to-many rather than many-to-many, since one-to-many relationships are handled well by the relational data model, and many-to-many are not. (Many-to-many relationships can be handled in the relational data model, but they require adding another table to track the links between the two tables. This table is sometimes called a join table. We don’t have to worry about that here.)
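The join table mentioned in the parenthetical can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for Access or SQL Server. The Programs table and the idea that one station belongs to several monitoring programs are hypothetical, chosen purely to illustrate a many-to-many link; they are not part of the EDMS data model described in this chapter.

```python
import sqlite3

# Hypothetical example: a station may belong to several monitoring programs,
# and a program covers many stations, so the link is many-to-many.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Stations (StationID INTEGER PRIMARY KEY, StationName TEXT NOT NULL);
CREATE TABLE Programs (ProgramID INTEGER PRIMARY KEY, ProgramName TEXT NOT NULL);
-- The join table holds one row per station/program link.
CREATE TABLE StationPrograms (
    StationID INTEGER NOT NULL REFERENCES Stations(StationID),
    ProgramID INTEGER NOT NULL REFERENCES Programs(ProgramID),
    PRIMARY KEY (StationID, ProgramID)
);
""")
conn.executemany("INSERT INTO Stations VALUES (?, ?)", [(1, "MW-1"), (2, "MW-2")])
conn.executemany("INSERT INTO Programs VALUES (?, ?)", [(10, "Quarterly"), (20, "Annual")])
conn.executemany("INSERT INTO StationPrograms VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

def stations_in_program(name):
    """Return the station names linked to one program through the join table."""
    rows = conn.execute("""
        SELECT s.StationName
        FROM Stations s
        JOIN StationPrograms sp ON sp.StationID = s.StationID
        JOIN Programs p ON p.ProgramID = sp.ProgramID
        WHERE p.ProgramName = ?
        ORDER BY s.StationName""", (name,))
    return [r[0] for r in rows]

print(stations_in_program("Quarterly"))  # ['MW-1', 'MW-2']
print(stations_in_program("Annual"))     # ['MW-1']
```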


The primary tables in this system are called Sites, Stations, Samples, and Analyses. The detailed content of these tables is described below. Sites contains information about each facility being managed in the system. Stations stores data for each location where samples are taken, such as monitoring wells and soil borings. (Note that what is called a station in this discussion is called a site in some system designs.) Samples represents each physical sample or monitoring event at specific stations, and Analyses contains specific observed values or analytical results from the samples.
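As a minimal sketch, the four primary tables and their parent-child links might be declared as follows, again using Python's sqlite3 module as a stand-in database. All field names, types, and constraints are assumptions for illustration, not the actual Enviro Data schema. Each child table carries the key of its parent, which is how the one-to-many chain is stored.

```python
import sqlite3

# Illustrative schema only; field names and types are assumptions.
SCHEMA = """
CREATE TABLE Sites (
    SiteID    INTEGER PRIMARY KEY,
    SiteName  TEXT NOT NULL UNIQUE
);
CREATE TABLE Stations (
    StationID   INTEGER PRIMARY KEY,
    SiteID      INTEGER NOT NULL REFERENCES Sites(SiteID),
    StationName TEXT NOT NULL,
    UNIQUE (SiteID, StationName)
);
CREATE TABLE Samples (
    SampleID   INTEGER PRIMARY KEY,
    StationID  INTEGER NOT NULL REFERENCES Stations(StationID),
    SampleDate TEXT NOT NULL,   -- ISO 8601 date/time
    TopDepth   REAL             -- NULL when depth does not apply
);
CREATE TABLE Analyses (
    AnalysisID INTEGER PRIMARY KEY,
    SampleID   INTEGER NOT NULL REFERENCES Samples(SampleID),
    Parameter  TEXT NOT NULL,
    Value      REAL,
    Units      TEXT
);
"""

def create_edms(path=":memory:"):
    """Open a database and create the four primary tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn

conn = create_edms()
conn.execute("INSERT INTO Sites VALUES (1, 'Plant A')")
conn.execute("INSERT INTO Stations VALUES (1, 1, 'MW-1')")
try:
    # A station that points at a site which does not exist is rejected.
    conn.execute("INSERT INTO Stations VALUES (2, 99, 'MW-2')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

SQLite enforces REFERENCES clauses only when foreign key support has been switched on for the connection, which is why create_edms issues that PRAGMA before anything else.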

Relationships – The hierarchical relationships between the tables are obvious. Each site can have one or more stations, each station has one or more samples, and each sample is analyzed for one or more, often many, constituents. But each sulfate measurement corresponds to one specific sampling event for one specific location at one specific site.
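The chain of relationships is what lets any single result be traced back to where it came from. A hedged sketch with a few made-up rows (the site, stations, dates, and values are hypothetical) joins each sulfate result up through Samples and Stations to Sites:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites    (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
CREATE TABLE Stations (StationID INTEGER PRIMARY KEY, SiteID INTEGER REFERENCES Sites, StationName TEXT);
CREATE TABLE Samples  (SampleID INTEGER PRIMARY KEY, StationID INTEGER REFERENCES Stations, SampleDate TEXT);
CREATE TABLE Analyses (AnalysisID INTEGER PRIMARY KEY, SampleID INTEGER REFERENCES Samples,
                       Parameter TEXT, Value REAL, Units TEXT);
INSERT INTO Sites    VALUES (1, 'Plant A');
INSERT INTO Stations VALUES (1, 1, 'MW-1'), (2, 1, 'MW-2');
INSERT INTO Samples  VALUES (1, 1, '2001-03-15'), (2, 2, '2001-03-15');
INSERT INTO Analyses VALUES (1, 1, 'Sulfate', 120.0, 'mg/L'),
                            (2, 1, 'pH', 7.1, NULL),
                            (3, 2, 'Sulfate', 85.0, 'mg/L');
""")

# Walk each sulfate result up the one-to-many chain to its sample, station, and site.
rows = conn.execute("""
    SELECT si.SiteName, st.StationName, sa.SampleDate, an.Value, an.Units
    FROM Analyses an
    JOIN Samples  sa ON sa.SampleID  = an.SampleID
    JOIN Stations st ON st.StationID = sa.StationID
    JOIN Sites    si ON si.SiteID    = st.SiteID
    WHERE an.Parameter = 'Sulfate'
    ORDER BY st.StationName
""").fetchall()
for row in rows:
    print(row)
# ('Plant A', 'MW-1', '2001-03-15', 120.0, 'mg/L')
# ('Plant A', 'MW-2', '2001-03-15', 85.0, 'mg/L')
```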

The lookup relationships are one-to-many also, with the “one” side being the lookup table and the “many” side being the primary table. For example, there is one entry in the StationTypes table for monitoring wells, with a code of “mw,” but there can be (and usually are) many monitoring wells in the Stations table.
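A sketch of the StationTypes lookup, assuming a two-column layout (code plus description) and a second, hypothetical "sb" code for soil borings. The primary table stores only the short code, the join expands it for display, and the foreign key keeps anyone from entering a code that is not in the lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
conn.executescript("""
-- Lookup table: the "one" side, one row per code.
CREATE TABLE StationTypes (Code TEXT PRIMARY KEY, Description TEXT NOT NULL);
-- Primary table: the "many" side stores only the short code.
CREATE TABLE Stations (
    StationID   INTEGER PRIMARY KEY,
    StationName TEXT NOT NULL,
    StationType TEXT NOT NULL REFERENCES StationTypes(Code)
);
INSERT INTO StationTypes VALUES ('mw', 'Monitoring well'), ('sb', 'Soil boring');
INSERT INTO Stations VALUES (1, 'MW-1', 'mw'), (2, 'MW-2', 'mw'), (3, 'SB-1', 'sb');
""")

# Expand the stored codes into readable values for selection and display.
display = conn.execute("""
    SELECT st.StationName, t.Description
    FROM Stations st JOIN StationTypes t ON t.Code = st.StationType
    ORDER BY st.StationName
""").fetchall()
print(display)
# [('MW-1', 'Monitoring well'), ('MW-2', 'Monitoring well'), ('SB-1', 'Soil boring')]

# A code that is not in the lookup table is refused.
try:
    conn.execute("INSERT INTO Stations VALUES (4, 'XX-1', 'xx')")
    bad_code_rejected = False
except sqlite3.IntegrityError:
    bad_code_rejected = True
print(bad_code_rejected)  # True
```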

Data content

This section briefly discusses the data content of the example EDMS. This material is covered in greater detail in Appendix B.

Sites – A Site is a facility or project that will be treated as a unit. Some projects may be treated as more than one site, and sometimes a site can be more than one facility, but the use of the site terminology should be consistent within the database, or at least for each project. Some people refer to a sampling location as a site, but in this discussion we will call that a station.

Stations – A Station is a location of observation. Examples of stations include soil borings, monitoring wells, surface water monitoring stations, soil and stream sediment sample locations, air monitoring stations, and weather stations. A station can be a location that is persistent, such as a monitoring well which is sampled regularly, or can be the location of a single sampling event. For stations that are sampled at different elevations (such as a soil boring), the location of the station is the surface location for the boring, and the elevation or depth component is part of the sampling event.

Samples – A Sample is a unique sampling event or observation for a station. Each station can be sampled at various depths (such as with a soil boring), at various dates and times (such as with a monitoring well), or, less commonly, both. Observations, which may or may not accompany a physical sample, can be taken at a station at a particular time, and in this model would be considered part of a sample event.

Analyses – An Analysis is the observed value of a parameter related to a sample. This term is intended to be interpreted broadly, and not to be limited to chemical analyses. For example, field parameters such as pH, temperature, and turbidity also are considered analyses. This would also include operating parameters of environmental concern such as flow, volume, and so on.

Lookups – A lookup table is a table that contains codes that are used in the main data tables, and the expanded values of those codes that are used for selection and display.

Utilities – The system may contain tables for tracking internal information not directly related to the primary tables. These utility tables are important to the software developers and perhaps the system and data administrators, but can usually be ignored by the users.

DATA ACCESS REQUIREMENTS

The user interface provides a number of data manipulation functions, some of which are read/write and the rest read-only.


Read/write

The functions that require read/write access to the database are:

Electronic import – This function allows data administrators to import analytical and other data. Initially the data formats supported will be the three formats defined in the Data Transfer Standard. Other import formats may be added as needed. This is shown in Figure 17 as a single-headed arrow going into the database, but in reality there is also a small flow of data the other way as the module checks for valid data.
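The small reverse flow of data during import can be pictured as the importer reading the lookup tables and checking each incoming row against them before anything is written. This is a minimal sketch under assumed names: the Parameters and Units lookup tables, the (parameter, value, units) row layout, and the check_rows helper are all hypothetical, not part of the Data Transfer Standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parameters (Code TEXT PRIMARY KEY);
CREATE TABLE Units      (Code TEXT PRIMARY KEY);
INSERT INTO Parameters VALUES ('Sulfate'), ('pH'), ('Benzene');
INSERT INTO Units      VALUES ('mg/L'), ('ug/L'), ('none');
""")

def check_rows(rows):
    """Split incoming (parameter, value, units) rows into valid rows and error messages.

    The lookup tables are read once (the reverse flow of data during import),
    and every row whose codes are not found there is reported.
    """
    params = {r[0] for r in conn.execute("SELECT Code FROM Parameters")}
    units = {r[0] for r in conn.execute("SELECT Code FROM Units")}
    good, errors = [], []
    for n, (param, value, unit) in enumerate(rows, start=1):
        problems = []
        if param not in params:
            problems.append(f"unknown parameter {param!r}")
        if unit not in units:
            problems.append(f"unknown units {unit!r}")
        if problems:
            errors.append(f"row {n}: " + "; ".join(problems))
        else:
            good.append((param, value, unit))
    return good, errors

incoming = [("Sulfate", 120.0, "mg/L"), ("Sulphate", 95.0, "mg/L"), ("pH", 7.2, "pH units")]
good, errors = check_rows(incoming)
print(len(good))  # 1
for e in errors:
    print(e)
# row 2: unknown parameter 'Sulphate'
# row 3: unknown units 'pH units'
```

Reporting every problem in a file, rather than stopping at the first, lets the data administrator fix the lookup tables or the source file in one pass.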

Manual entry – The hope is that the majority of the data put into the system will be in digital format that can be imported without retyping. However, there will probably be some data which will need to be manually entered and edited, and this function will allow data administrators to make those entries and changes.

Editing – Data administrators will sometimes need to change data in the database. Such changes must be made with great care and be fully documented.

Lookup table maintenance – One of the purposes of the lookup tables is to standardize the entries to a limited number of choices, but there will certainly be a need for those tables to evolve over time. This feature allows the data administrators to edit those tables. A procedure will be developed for reviewing and approving those changes before entry.

Verification and validation – Either as part of the import process or separately, data validators will need to add or change validation flags based on their work.

Data review – Data review should accompany data import and entry, and can be done independently as well. This function allows data administrators to look at data and modify its data review flag as appropriate, such as after validation.
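One way to support the validation and review functions is a flag column whose allowed values are constrained by the database. In this sketch the ReviewFlag column, its four status codes, and the set_review_flag helper are hypothetical; a production system would more likely keep the codes in a lookup table. The update runs in one transaction so that a batch is flagged completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Analyses (
    AnalysisID INTEGER PRIMARY KEY,
    Parameter  TEXT NOT NULL,
    Value      REAL,
    -- Hypothetical review-status codes.
    ReviewFlag TEXT NOT NULL DEFAULT 'imported'
        CHECK (ReviewFlag IN ('imported', 'checked', 'validated', 'rejected'))
);
INSERT INTO Analyses (AnalysisID, Parameter, Value)
VALUES (1, 'Sulfate', 120.0), (2, 'pH', 7.1), (3, 'Benzene', 0.004);
""")

def set_review_flag(ids, flag):
    """Set the review flag on the given analyses; return the number of rows changed."""
    with conn:  # commits on success, rolls back if the CHECK constraint fails
        cur = conn.executemany(
            "UPDATE Analyses SET ReviewFlag = ? WHERE AnalysisID = ?",
            [(flag, i) for i in ids])
    return cur.rowcount  # executemany sums the rows changed by each statement

changed = set_review_flag([1, 2], "validated")
flags = dict(conn.execute("SELECT AnalysisID, ReviewFlag FROM Analyses ORDER BY AnalysisID"))
print(changed, flags)  # 2 {1: 'validated', 2: 'validated', 3: 'imported'}

try:
    set_review_flag([3], "approved")  # not an allowed code
    bad_flag_rejected = False
except sqlite3.IntegrityError:
    bad_flag_rejected = True
print(bad_flag_rejected)  # True
```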

Read-only

The functions that require read-only access to the database are:

Record counts – This function is a useful guide in making selections. It should provide the number of selected items whenever a selection criterion is changed.
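A record count is just a COUNT(*) query rebuilt from whatever criteria are currently set. The sketch below assumes a single flattened Results table for brevity (a real EDMS would join Sites, Stations, Samples, and Analyses), and the record_count helper is hypothetical. Column names are checked against an allow-list, while the values are passed as bound parameters rather than pasted into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (Station TEXT, Parameter TEXT, SampleDate TEXT, Value REAL);
INSERT INTO Results VALUES
  ('MW-1', 'Sulfate', '2001-01-10', 120), ('MW-1', 'Sulfate', '2001-04-12', 110),
  ('MW-2', 'Sulfate', '2001-01-10', 85),  ('MW-2', 'pH',      '2001-01-10', 7.1);
""")

# Only these columns may be used as selection criteria.
FILTERABLE = {"Station", "Parameter", "SampleDate"}

def record_count(**criteria):
    """Return how many rows match the current selection criteria."""
    clauses, params = [], []
    for column, value in criteria.items():
        if column not in FILTERABLE:
            raise ValueError(f"cannot select on {column!r}")
        clauses.append(f"{column} = ?")  # column name vetted above; value is bound
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return conn.execute("SELECT COUNT(*) FROM Results" + where, params).fetchone()[0]

# Re-run the count each time the user changes a criterion.
print(record_count())                                     # 4
print(record_count(Parameter="Sulfate"))                  # 3
print(record_count(Parameter="Sulfate", Station="MW-1"))  # 2
```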

Table view – This generalized display capability allows users to view the data that they have selected. This might be all of the output they need, or they might wish to proceed to another output option once they have confirmed that they have selected correctly. They can also use this screen to copy the data to the clipboard or save it to a file for use in another application.

Formatted reports – Reports suitable for printing can be generated from the selection screen. Different reports could be displayed depending on the data element selected.

Maps – The results of the selection can be displayed on a map, perhaps with the value of a constituent for each station drawn next to that station and a colored dot representing the value. See Chapter 22 for more information on mapping.

Graphs – The most basic implementation of this feature allows users to draw a graph of constituent values as a function of time for the selected data. They should be able to graph multiple constituents for one station or one constituent for several stations. More advanced graphing is also possible, as described in Chapter 20.

Subset creation – Users should be able to select a subset of the main database and export it to an Access database. This might be useful for providing the data to others, or for working with the subset when a network connection to the database is unavailable or slow.
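The text describes exporting the subset to an Access database; Access cannot be scripted portably, so this sketch uses a separate SQLite file as a stand-in for the same idea. The export_subset helper and the flattened Results table are assumptions. ATTACH makes the new file visible to the main connection, so the subset is copied with a single INSERT ... SELECT.

```python
import os
import sqlite3
import tempfile

main = sqlite3.connect(":memory:")
main.executescript("""
CREATE TABLE Results (Site TEXT, Station TEXT, Parameter TEXT, SampleDate TEXT, Value REAL);
INSERT INTO Results VALUES
  ('Plant A', 'MW-1', 'Sulfate', '2001-01-10', 120),
  ('Plant A', 'MW-2', 'Sulfate', '2001-01-10', 85),
  ('Plant B', 'MW-7', 'Sulfate', '2001-02-02', 40);
""")

def export_subset(conn, path, site):
    """Copy one site's rows into a separate database file for offline use."""
    conn.execute("ATTACH DATABASE ? AS subset", (path,))
    try:
        with conn:  # commit the copy, or roll it back on error
            # Create an empty table with the same columns, then fill it.
            conn.execute("CREATE TABLE subset.Results AS SELECT * FROM main.Results WHERE 0")
            conn.execute("INSERT INTO subset.Results SELECT * FROM main.Results WHERE Site = ?",
                         (site,))
    finally:
        conn.execute("DETACH DATABASE subset")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plant_a_subset.db")
    export_subset(main, path, "Plant A")
    sub = sqlite3.connect(path)
    count = sub.execute("SELECT COUNT(*) FROM Results").fetchone()[0]
    sub.close()
print(count)  # 2
```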

File export – This function allows users to export data in a format suitable for use in other software needing data from the EDMS. Formats need to be provided for the data needs of the other software. Direct connection without export-import is also possible.
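For file-based exchange, comma-separated values are a common lowest-common-denominator format. This hedged sketch writes any query's header and rows with Python's csv module; the export_csv helper and the table layout are hypothetical, and the formats a given downstream program actually needs may differ.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (Station TEXT, Parameter TEXT, SampleDate TEXT, Value REAL, Units TEXT);
INSERT INTO Results VALUES
  ('MW-1', 'Sulfate', '2001-01-10', 120, 'mg/L'),
  ('MW-2', 'Sulfate', '2001-01-10', 85, 'mg/L');
""")

def export_csv(conn, query, params, out):
    """Write a header row plus every row of the query to a file-like object as CSV.

    Returns the number of data rows written.
    """
    cur = conn.execute(query, params)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names from the query
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count

buf = io.StringIO()
n = export_csv(conn, "SELECT * FROM Results WHERE Parameter = ? ORDER BY Station",
               ("Sulfate",), buf)
print(n)  # 2
print(buf.getvalue())
```

When writing to a real file rather than a StringIO buffer, open it with newline="" as the csv module documentation recommends, for example open("sulfate.csv", "w", newline="").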


GOVERNMENT EDMS SYSTEMS

A number of government agencies have developed systems for managing site environmental data. This section describes some of the systems that are most widely used.

STORET (www.epa.gov/storet) – STORET (short for STOrage and RETrieval) is EPA’s repository for water quality, biological, and physical data. It is used by EPA and other federal agencies, state environmental agencies, universities, private citizens, and others. It is one of EPA’s two data management systems containing water quality information for the nation's waters. The other system, the Legacy Data Center, or LDC, contains historical water quality data dating back to the early part of the 20th century and collected up to the end of 1998. The LDC is being phased out in favor of STORET. STORET contains data collected beginning in 1999, along with older data that has been properly documented and migrated from the LDC. Both LDC and STORET contain raw biological, chemical, and physical data for surface and groundwater collected by federal, state, and local agencies, Indian tribes, volunteer groups, academics, and others. All 50 states, territories, and jurisdictions of the U.S., along with portions of Canada and Mexico, are represented in these systems. Each sampling result is accompanied by information on where the sample was taken, when the sample was gathered, the medium sampled, the name of the organization that sponsored the monitoring, why the data was gathered, and much other information. The LDC and STORET are Web-enabled, so users can browse both systems interactively or create files to be downloaded to their computer for further use.

CERCLIS (www.epa.gov/superfund/sites/cursites) – CERCLIS is a database that contains the official inventory of Superfund hazardous waste sites. It contains information on hazardous waste sites, site inspections, preliminary assessments, and remediation of hazardous waste sites. The EPA provides online access to CERCLIS data. Additionally, standard CERCLIS site reports can be downloaded to a personal computer. CERCLIS is a database and not an EDMS, but can be of value in EDMS projects.

IRIS (www.epa.gov/iriswebp/iris/index.html) – The Integrated Risk Information System, prepared and maintained by the EPA, is an electronic database containing information on human health effects that may result from exposure to various chemicals in the environment. The IRIS system is primarily a collection of computer files covering individual chemicals. These chemical files contain descriptive and quantitative information on oral reference doses and inhalation reference concentrations for chronic non-carcinogenic health effects, and hazard identification, oral slope factors, and oral and inhalation unit risks for carcinogenic effects. It is a database and not an EDMS, but can be of value in EDMS projects.

ERPIMS (www.afcee.brooks.af.mil/ms/msc_irp.htm) – The Environmental Resources Program Information Management System (ERPIMS, formerly IRPIMS) is the U.S. Air Force system for validation and management of data from environmental projects at all Air Force bases. The project is managed by the Air Force Center for Environmental Excellence (AFCEE) at Brooks Air Force Base in Texas. ERPIMS contains analytical chemistry samples, tests, and results as well as hydrogeological information, site/location descriptions, and monitoring well characteristics. AFCEE maintains ERPTools/PC, a Windows-based software package that has been developed to help Air Force contractors in collection and entry of their data, validation, and quality control. Many ERPIMS data fields are filled by codes that have been assigned by AFCEE. These codes are compiled into lists, and each list is the set of legal values for a certain field in the database. Air Force contractors use ERPTools/PC to prepare the data, including comparing data to these lists, and then submit it to the main ERPIMS database at Brooks.

IRDMIS (aec.army.mil/prod/usaec/rmd/im/imass.htm) – The Installation Restoration Data Management Information System (IRDMIS) supports the technical and managerial requirements of the Army's Installation Restoration Program (IRP) and other environmental efforts of the U.S. Army Environmental Center (USAEC, formerly the U.S. Toxic and Hazardous Materials Agency). (Don’t confuse this AEC with the Atomic Energy Commission, which is now the Department of Energy.) Since 1975, more than 15 million data records have been collected and stored in IRDMIS, with information collected from over 100 Army installations. IRDMIS users can enter, validate, store, and retrieve the Army’s geographic; geological and hydrological; sampling; chemical; and physical analysis information. The system covers all aspects of the data life cycle, including complete data entry and validation software using USAEC and CLP QA/QC methods; a Web site for data submission and distribution; and an Oracle RDBMS with a menu-driven user interface for standardized reports, geographical plots, and plume modeling. It provides a fully integrated information network of data status and disposition for USAEC project officers, chemists, geologists, contracted laboratories, and other parties, and supports Geographical Information Systems and other third-party software.

USGS Water Resources (http://water.usgs.gov/nwis) – This is a set of Web pages that provide access to water resources data collected at about 1.5 million locations in all 50 states, the District of Columbia, and Puerto Rico. The U.S. Geological Survey investigates the occurrence, quantity, quality, distribution, and movement of surface and groundwater, and provides the data to the public. Online access to data on this site includes real-time data for selected surface water, groundwater, and water quality sites; descriptive site information for all sites, with links to all available water data for individual sites; water flow and levels in streams, lakes, and springs; water levels in wells; and chemical and physical data for streams, lakes, springs, and wells. Site visitors can easily select data and retrieve it for on-screen display or save it to a file for further processing.

OTHER ISSUES

Creating and maintaining an environmental database is a serious undertaking. In addition to the activities directly related to maintaining the data itself, there are a number of issues related to the database system that must be considered.

Scalability

Databases grow with time. You should make sure that the tool you select for managing your environmental data can grow with your needs. If you store your data in a spreadsheet program, when the number of lines of data exceeds the capacity of the spreadsheet, you will need to start another file, and then you can’t easily work with all of your data. If you store your data in a stand-alone database manager program like Access, when your data grows you can relatively easily migrate to a more powerful database manager like SQL Server or Oracle. The ability of software and hardware to handle tasks of different sizes is called scalability, and this requirement should be part of your planning if there is any chance your project will grow over time.

Security

The cost of building a large environmental database can be hundreds of thousands of dollars or more. Protect this investment from loss. Ensure that only authorized individuals can get access to the database. Make adequate backups frequently. Be sure that the people who are working with the database are adequately trained, so that they do a good job of getting clean data into the database, and so that the data stays there and stays clean. Instill an attitude of protecting the database and keeping its quality up so that people can feel comfortable using it.

Access and permissions

Most database manager programs provide a system for limiting who can use a database and what actions they can perform. Some have more than one way of doing this. Be sure to set up and use an access control system that fits the needs of your organization. This may not be easy. You will have to walk a thin line between protecting your data and letting people do what they need to do. Sometimes it’s better to start off more restrictive than you think you need to and then grant more permissions over time, rather than to be lenient and then need to tighten up, since people react better to getting more power than to getting less. Also be aware that security and access limitations are easier to implement and manage in a client-server system than in a stand-alone system, so if you want high security, choose SQL Server or Oracle over Access for the back-end.

Activity tracking

To guarantee the quality of the data in the database, it is important to track what changes are made to the data, when they are made, who made them, and why they were made. A simple activity tracking system would include an ActivityLog table in the database to allow data administrators to track data modifications. On exit from any of the data modification screens, including importing, editing, or reviewing, an activity log screen will appear. The program reports the name of the data administrator and the activity date. The data administrator must enter a description of the activity and the name of the site that was modified. The screen should not close until an entry has been made. Figure 24 shows an example of a screen for this type of simple system.
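A minimal sketch of the ActivityLog table, assuming the four items named above as its columns. The log_activity helper plays the role of the screen's save button and refuses blank entries, echoing the rule that the screen should not close until an entry has been made; show_log anticipates the filtered display described next. All names are hypothetical, and sqlite3 again stands in for the real database.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ActivityLog (
    LogID         INTEGER PRIMARY KEY,
    Administrator TEXT NOT NULL,
    ActivityDate  TEXT NOT NULL,
    SiteName      TEXT NOT NULL CHECK (length(trim(SiteName)) > 0),
    Description   TEXT NOT NULL CHECK (length(trim(Description)) > 0)
)""")

def log_activity(admin, site, description, when=None):
    """Record who changed which site, when, and why; refuse blank entries."""
    if not site.strip() or not description.strip():
        raise ValueError("site and description are required")
    with conn:
        conn.execute(
            "INSERT INTO ActivityLog (Administrator, ActivityDate, SiteName, Description) "
            "VALUES (?, ?, ?, ?)",
            (admin, (when or date.today()).isoformat(), site, description))

def show_log(admin=None, activity_date=None, site=None):
    """Return log rows filtered on any combination of filters; no filters returns all."""
    sql = "SELECT Administrator, ActivityDate, SiteName, Description FROM ActivityLog WHERE 1 = 1"
    params = []
    for column, value in (("Administrator", admin), ("ActivityDate", activity_date),
                          ("SiteName", site)):
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    return conn.execute(sql + " ORDER BY LogID", params).fetchall()

log_activity("jsmith", "Plant A", "Imported Q1 lab results", date(2001, 4, 20))
log_activity("mjones", "Plant B", "Corrected units on MW-7 sulfate", date(2001, 4, 21))
print(len(show_log()))           # 2
print(show_log(site="Plant B"))
# [('mjones', '2001-04-21', 'Plant B', 'Corrected units on MW-7 sulfate')]
```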

The system should also provide a way to select and display the activity log. Figure 25 shows an example of a selection screen and report of activity data. In this example, the log can be filtered on Administrator name, Activity Date, or Site. If no filters are entered, the entire log is displayed.

Another option is a more elaborate system that keeps copies of any data that is changed. This is sometimes called a shadow system or audit log. In this type of system, when someone changes a record in a table, a copy of the unchanged record is stored in a shadow table, and then the change is made in the main table. Since most EDMS activity usually does not involve a lot of changes, this does not increase the storage as much as it might appear, but it does significantly increase the complexity of the software.
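In a database that supports triggers, the shadow copy can be made by the database itself rather than by every form that edits data, which keeps much of the added complexity out of the application code. This sketch uses SQLite triggers; the Analyses_Shadow table and the trigger names are hypothetical, and SQL Server uses its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Analyses (
    AnalysisID INTEGER PRIMARY KEY,
    Parameter  TEXT NOT NULL,
    Value      REAL,
    Units      TEXT
);
-- Shadow table: the same data columns plus how and when the old copy was saved.
CREATE TABLE Analyses_Shadow (
    ShadowID   INTEGER PRIMARY KEY,
    AnalysisID INTEGER,
    Parameter  TEXT,
    Value      REAL,
    Units      TEXT,
    ChangeType TEXT,
    ChangedAt  TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Before a row is changed or deleted, copy the unchanged row into the shadow table.
CREATE TRIGGER Analyses_BU BEFORE UPDATE ON Analyses
BEGIN
    INSERT INTO Analyses_Shadow (AnalysisID, Parameter, Value, Units, ChangeType)
    VALUES (OLD.AnalysisID, OLD.Parameter, OLD.Value, OLD.Units, 'update');
END;
CREATE TRIGGER Analyses_BD BEFORE DELETE ON Analyses
BEGIN
    INSERT INTO Analyses_Shadow (AnalysisID, Parameter, Value, Units, ChangeType)
    VALUES (OLD.AnalysisID, OLD.Parameter, OLD.Value, OLD.Units, 'delete');
END;
INSERT INTO Analyses VALUES (1, 'Sulfate', 1200.0, 'ug/L');
""")

# An ordinary edit; the trigger preserves the old values automatically.
with conn:
    conn.execute("UPDATE Analyses SET Value = 1.2, Units = 'mg/L' WHERE AnalysisID = 1")

current = conn.execute("SELECT Value, Units FROM Analyses WHERE AnalysisID = 1").fetchone()
history = conn.execute("SELECT Value, Units, ChangeType FROM Analyses_Shadow").fetchall()
print(current)  # (1.2, 'mg/L')
print(history)  # [(1200.0, 'ug/L', 'update')]
```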

Figure 24 - Simple screen for tracking database activity


Figure 25 - Output of activity log data

Database maintenance

There are a number of activities that must be performed on an ongoing, or at least occasional, basis to keep an EDMS up and running. These include:

Backup – Backing up data in the database is discussed in Chapter 15, but must be kept in mind as part of ongoing database maintenance.

Upgrades – Both commercial and custom software should be upgraded on a regular basis. These upgrades may be required due to a change in the software platform (operating system, database software) or to add features and fix bugs. A system should be implemented so that all users of the EDMS receive the latest version of the software in a timely fashion. For large enterprises with a large number of users, automated tools are available to assist the system administrator with distributing upgrades to all of the appropriate computers without having to visit each one. Web-based tools are beginning to appear that provide the same functionality for all users of software programs that support this feature. Either of these approaches can be a great time saver for a large enterprise system.

Other maintenance – Other maintenance activities are required, both on the client side and the server side. For example, on the client side, Access databases grow in size with use. You should occasionally compact your database files. You can do this on some set schedule, such as monthly, or when you notice that a file has grown large, such as larger than 5 megabytes (5000 KB). Occasionally problems will occur with Access databases due to power failures, system crashes, etc. When this happens, first exit Access, then shut down Windows, power down the computer, and restart. If you get errors in the database after that, you can have Access repair and compact the database. In the worst case (if repairing does not work), you should obtain a new copy of the database program from the original source and restore your data file from a backup.

System maintenance will be required on the server database as well, and will generally be performed by the system administrator with assistance from the vendor if necessary. These procedures include general maintenance of the server computer, user administration, database maintenance, and system backup.
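SQLite has no command named Compact, but its VACUUM statement rebuilds the database file and returns unused space to the operating system, which is the same idea as compacting an Access file. The sketch below, with a hypothetical compact_if_large helper, applies the 5 megabyte rule of thumb from the text; the demo passes a much smaller threshold so it runs quickly.

```python
import os
import sqlite3
import tempfile

THRESHOLD_BYTES = 5 * 1000 * 1000  # the 5 megabyte rule of thumb from the text

def compact_if_large(path, threshold=THRESHOLD_BYTES):
    """Rebuild the database file with VACUUM if it exceeds the threshold.

    Returns (size_before, size_after) in bytes. VACUUM cannot run inside a
    transaction, so it is issued on a fresh connection.
    """
    before = os.path.getsize(path)
    if before <= threshold:
        return before, before
    conn = sqlite3.connect(path)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edms.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE Scratch (Payload TEXT)")
    with conn:
        conn.executemany("INSERT INTO Scratch VALUES (?)", [("x" * 1000,) for _ in range(2000)])
    with conn:
        conn.execute("DELETE FROM Scratch")  # freed pages stay in the file
    conn.close()
    before, after = compact_if_large(path, threshold=100_000)
print(before > after)  # True
```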

The database is expected to grow as new data is received for sites currently in the database, and as new sites are added. At some point in the future it will be necessary to expand the size of the device and the database to accommodate the anticipated increase in data volume. The system administrator should monitor the system to determine when the database size needs to be increased.


CHAPTER 6

DATABASE ELEMENTS

A number of elements make up an EDMS. These elements include the computer on the user’s desk, the software on that computer, the network hardware and software, and the database server computer. They also include the components of the database management system itself, such as files, tables, fields, and so on. This chapter covers the important elements from these two categories. This presentation focuses on how these objects are implemented in Access (for stand-alone use) and SQL Server (for client-server), two popular database products from Microsoft. A good overview of Access for both new and experienced database users can be found in Jennings (1995). More advanced users might be interested in Dunn (1994). More information on SQL Server can be found in Nath (1995); England and Stanley (1996); and England (1997). More information on database elements can be found in Dragan (2001), Gagnon (1998), Harkins (2001a, 2001b), Jepson (2001), and Ross et al. (2001).

HARDWARE AND SOFTWARE COMPONENTS

A modern networked data management system consists of a number of hardware and software components. These items, which often come from different manufacturers and vendors, all must work together for the system to function properly.

The desktop computer

It is obvious that in order to run a data management system, either client-server or stand-alone, you must have a computer, and the computer resources must be sufficient to run the software. Data management programs can be relatively large applications. In order to run a program like this, you must have a computer capable of running the appropriate operating system, such as Windows. This section describes the desktop hardware and software requirements for either a client-server or stand-alone database management system. Other than the network connection, the hardware requirements are the same.

DESKTOP HARDWARE

The computer should have a large enough hard drive and enough random access memory (RAM) to be able to load the software and run it with adequate performance, and data management software can have relatively high requirements. For example, Microsoft Access has the greatest resource requirements of any of the Microsoft Office programs. At the time of this writing, the minimum and recommended computer specifications for adequate performance using the data management system are as shown in Figure 26.


Item               Minimum                               Recommended
Computer           200 megahertz Pentium processor       500 to 1000 megahertz Pentium processor
Hard drive         Adequate for software and local       Adequate for software and local
                   data storage, at least 1 gigabyte     data storage, at least 1 gigabyte
Removable storage  3.5” floppy, CD-ROM                   3.5” floppy, CD-RW, Zip drive
Network            10 megabits per second                100 megabits per second

Figure 26 - Suggested hardware specifications

Probably the most important requirement is adequate random access memory (RAM), the chips that provide short-term storage of data. The amount of RAM should be increased on machines that are not providing acceptable performance. If increasing the RAM does not bring performance to a level appropriate for that user’s specific needs, then replacing the computer with a faster one may be required.

It is important to note that the hardware requirements to run the latest software, and the processing power of standard computers available at the store, both increase over time. Computers that are more than three years or so old may be inadequate for running the latest version of the database software.

A brand-new, powerful computer including a monitor and printer sells for $1000 or less, so it doesn’t make sense to limp along on an underpowered, flaky computer. Don’t be penny-wise and pound-foolish. Be sure that everyone has adequate computers for the work they do. It will save money in the long run.

An important distinction to keep in mind is the difference between memory and storage. A computer has a certain amount of system memory, or RAM. It also has a storage device such as a hard drive. Often people confuse the two, and say that their computer has 10 gigabytes of memory when they mean disk storage.

DESKTOP SOFTWARE

Several software components are required in order to run a relational database management system. These include the operating system, networking software (where appropriate), database management software, and the application.

Operating system

Most systems used for data management run one of the Microsoft operating systems: Windows 95, 98, ME, or NT/2000/XP. All of these systems can run the same client data management software and perform pretty much the same. Apple Macintosh systems are present in some places, but are used mostly for graphic design and education, and have limited application for data management due to poor software availability. UNIX systems (including the popular open-source version, Linux) are becoming an increasingly viable possibility, with serious database systems like Oracle and DB2 now available for various types of UNIX.

Networking software

If the data is to be managed with a shared-file or client-server system, or if the files containing a single-user database are to be stored on a file server computer, the client computer will need to run networking software to make the network interface card work. In some cases the networking software is part of the operating system. This is the case with a Windows network. In other cases the networking will be done with a separate software package. Examples include Novell NetWare and Banyan VINES. Either way, the networking software will generally be loaded during system startup, and after that can pretty much be ignored, except that network file server resources and network database server resources are available. This networking software is described in more detail in the next section.

Database management software

The next software element in the database system is the database management software itself. Examples of this software are Microsoft Access, FoxPro, and Paradox. This software can be used by itself to manage the data, or with the help of a customized application as described in the next section. The database application provides the user interface (the menus and forms that the user sees) and can, in the case of a stand-alone or single-user system, also provide the data storage. In a client-server system, the database software on the client computer provides the user interface, and some or all of the data is stored on the database server computer somewhere else on the network.

If the data to be managed is relatively simple, the database management software by itself is adequate for managing it. For example, a simple table of names and addresses can be created and data entered into it with a minimum of effort. As the data model becomes more complicated, and as the interaction between the database and external data sources becomes more involved, it can become increasingly difficult to perform the required activities using the tools of the software by itself. At that point a specialized application may be required.

Application

When the complexity of the database or its interactions exceeds the capability of the general-purpose database manager program, it is necessary to move to a specialized vertical market application. This refers to software specialized for a particular industry segment. An EDMS represents software of this type. This type of system is also referred to as COTS (commercial off-the-shelf) software. Usually the vertical market application will provide pre-configured tables and fields to store the data, import routines for data formats common in the industry, forms for editing the data, reports for printing selected data, and export formats for specific needs. Using off-the-shelf EDMS software can give you a great head start in building and managing your database, relative to designing and building your own system.

The network adapters are printed circuit boards that are placed in slots in the client and server computers and provide the electronic connection between the computer and the network. The type of adapter card used depends on the kind of computer in which it is placed and the type of network being used.


Figure 27 - The EDMS network hardware diagram (labels: Clients, Network Adapter, Network Hub)

The wiring also depends on the type of network being used. The two most common types of wiring are twisted pair and coaxial, usually thin Ethernet. Twisted pair is becoming more common over time due to lower cost. Most twisted pair networks use Category 5 (sometimes called Cat5) cable, which is similar to standard telephone wiring, but of higher quality. There is usually a short cable that runs between the computer and a wall plate, wiring in the walls from the client’s or server’s office to a wiring closet, and then another cable from the wall plate or switch block in the wiring closet to the hub.

The hub is a piece of hardware that takes the cables carrying data from the computers on the network and connects them together physically. Depending on the type of network and the number of computers, other hardware may be used in place of or in addition to the hub. This might include network switches or routers.

The network can run at different speeds depending on the capability of the computers, network cards, hubs, wiring, and so on. Until recently, 10 megabits per second was standard for local area networks (LANs), and 56 kilobits per second was common for wide-area networks (WANs). Increasingly, 100 megabits per second is being installed for LANs, and 1 megabit per second or faster is used for WANs.

EDMS NETWORK SOFTWARE

There are a number of software components required on both the client and server computers in order for the EDMS to operate. Included in this category are the operating system, transport protocols, and other software required just to make the computer and network work. The operating system and network software should be up and running before the EDMS is installed.


Figure 28 - The EDMS network software components (Access front-ends with ODBC drivers on the clients exchange SQL queries and query results with the SQL Server process and data storage on the server; data in, data out, and backup and restore are also shown)

The major networked data management software components of the EDMS are discussed in this section from an external perspective. That is, we look at the various pieces and what they do, but not at the detailed internal workings of each. The important parts of the internal view, especially of the data management system, will be provided in later sections.

On the client computers in a client-server system, the important components for data management provide the user interface and communication with the server. On the server, the software completes the communication and provides storage and manipulation of the data. For a stand-alone system, both parts run on the client computer. The diagram in Figure 28 shows the major data management software components for a client-server system, based on Access as a front end and SQL Server as a back end.

On the client computers, the user interface for the EDMS can be provided by a database such as Microsoft Access, or can be written in a programming language like Visual Basic, PowerBuilder, Java, or C++. The advantage of using a database language is ease of development and flexibility. The advantage of a compiled language is code security, and perhaps speed, although speed is less of a distinguishing factor than it used to be.

The main user interface components are forms and menus for soliciting user input, and forms and reports for displaying output. Also provided by Access on the desktop are queries to manipulate data, and macros and modules (both of which are types of programs) to control program operation and perform various tasks. Customized components specific to the EDMS, if any, are contained in an Access mdb file. This file is placed on the client computer during setup and can be updated on a regular basis as modifications are made to the software. Through this interface, the user should be able (with appropriate privileges) to import and check data, select subsets of the data, and generate output, including tables, reports, graphs, and maps.

To communicate data with the server, the Access software works with a driver, which is a specialized piece of software with specific capabilities. In a typical EDMS this driver uses a data transfer protocol called Open DataBase Connectivity (ODBC). The driver for communicating with SQL Server is provided by Microsoft as part of the Access software installation, although it may not be installed as part of the standard installation. Drivers for other server databases are available from various sources, often the vendor of the database software. There are two parts to the ODBC system in Windows. One part is ODBC administration, which can be accessed through the ODBC icon in Control Panel. This part provides central management of ODBC connections for all of the drivers that are installed. The second part consists of individual drivers for specific data sources.

There are two kinds of ODBC drivers: single-tier and multi-tier. The single-tier drivers provide both the communication and data manipulation capabilities, and the data management software for that specific format is not itself required. Examples of single-tier drivers include the drivers for Access, dBase, and FoxPro data files. Multi-tier drivers provide the communication between the client and server, and work with the database management software on the server to provide data access. Examples of multi-tier drivers include the drivers for SQL Server and Oracle.

The server side of the ODBC communication link is provided by software that runs on the server as an NT/2000/XP process. The SQL Server process listens for data requests from clients across the network via the ODBC link, executes queries locally on the server, and sends the results back to the requesting client. This step is very important, because it minimizes the traffic across the network. The requests for data are in the form of SQL queries, which are a few hundred to a few thousand characters long, and the data returned is only what was asked for. In this way the user can query a small amount of data from a database with millions of records, and the network traffic would be just a few thousand characters.
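The traffic argument can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the server. It does not use ODBC or SQL Server, and the Analyses table is hypothetical. A short SQL string goes in, the filtering happens next to the data, and only the matching rows come back.

```python
import sqlite3

# An in-memory SQLite database stands in for the server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Analyses (SampleID INTEGER, Parameter TEXT, Value REAL)")
N_ROWS = 200_000
conn.executemany(
    "INSERT INTO Analyses VALUES (?, ?, ?)",
    ((i, "Benzene" if i % 1000 == 0 else "Sulfate", i / 100) for i in range(N_ROWS)),
)

# The request: only this SQL text would travel from client to server.
query = ("SELECT SampleID, Value FROM Analyses "
         "WHERE Parameter = 'Benzene' AND Value > 1500")

# The engine evaluates the WHERE clause beside the data; only matches are returned.
rows = conn.execute(query).fetchall()

request_chars = len(query)
reply_chars = sum(len(repr(row)) for row in rows)  # rough size of the result set
print(f"{N_ROWS} rows stored; {request_chars}-character request; "
      f"{len(rows)} rows returned (~{reply_chars} characters)")
```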

Some EDMS software packages can work in either stand-alone or client-server mode. In the first case, the EDMS uses a direct link to the Jet database engine when working with an Access database. In the second case, the EDMS uses the SQL Server multi-tier driver to communicate between the user interface in Access and SQL Server on the server. When users are attached to a local Access database, all of the processing and data flow occurs on the client computer. When they are connected to the server database, the data comes from the server.

The server

SERVER HARDWARE

The third hardware component of the EDMS, besides client computers and the network, is the database server. This is a computer, usually a relatively powerful one, which contains the data and runs the server component of the data management software. Usually it runs an enterprise-grade operating system such as Windows NT/2000/XP or UNIX. In large organizations the server will be provided or operated by an Information Technology (IT) or similar group, while in smaller organizations data administrators or power users in the group will run it.

The range of hardware used for servers, especially those running Windows NT/2000/XP, is great. NT/2000/XP can run on a standard PC of the type purchased at discount or office supply stores. This is actually a good solution for small groups, especially when the application is not mission-critical. In that case, if the database becomes unavailable for short periods of time, the company won't be shut down.


Figure 29 - Example administrative screen from Microsoft SQL Server

For an organization where the system is used more heavily, or where full-time availability is very important, a computer designed as a server is a better solution. Such a computer has redundant and hot-swappable components, which can be replaced without turning off the computer. This can increase the cost of the computer by a factor of two to ten or more, but may be justified depending on the cost of loss of availability.

SERVER SOFTWARE

The client-based software components described above are those that users interact with. System administrators also interact with the server database user interface, which is software running on the server computer that allows maintenance of the database. These maintenance activities include regular backup of the data and occasional other activities, including user and volume administration. Software is also available that allows many of these maintenance activities to be performed from computers remote from the server, if this is more convenient. An example screen from SQL Server is shown in Figure 29.
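SQL Server has its own backup tools, which are not shown here. As an illustration of the same idea, the sketch below uses the online backup API of Python's sqlite3 module (`Connection.backup`, available since Python 3.7) to copy a live database to a file. The Sites table, the `backup_database` helper, and the backup file name are hypothetical.

```python
import os
import sqlite3
import tempfile

def backup_database(source: sqlite3.Connection, target_path: str) -> None:
    """Copy the whole `source` database into a new file at `target_path`."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # copies all pages while `source` stays open
    finally:
        target.close()

if __name__ == "__main__":
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE Sites (SiteNumber INTEGER PRIMARY KEY, SiteName TEXT)")
    live.execute("INSERT INTO Sites VALUES (1, 'Rad Industries')")
    live.commit()  # commit so the row is part of the database being copied

    path = os.path.join(tempfile.mkdtemp(), "edms_backup.db")
    backup_database(live, path)
    copy = sqlite3.connect(path)
    print(copy.execute("SELECT SiteName FROM Sites").fetchall())  # [('Rad Industries',)]
    copy.close()
```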

UNITS OF DATA STORAGE

The smallest unit of information used by computers is the binary bit (short for BInary digiT). A bit is one piece of data consisting of either a zero or a one. More precisely, it records whether the electrical charge is on or off at that location in memory. All other types of data are composed of one or more bits.

The next larger common unit of storage is the byte, which contains eight bits. One byte can represent one of 256 different possibilities (two raised to the eighth power). This allows a byte to represent any one of the characters of the alphabet, the numbers and punctuation symbols, or a large number of other characters. For example, the letter A (capital A) can be represented by the byte 01000001. How each character is coded depends on the coding convention used. The two most common are ASCII (American Standard Code for Information Interchange), used on personal computers and workstations, and EBCDIC (Extended Binary Coded Decimal Interchange Code), used on some mainframes.
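The bit patterns above can be checked directly. This sketch uses Python's built-in `ascii` codec and the `cp037` codec to print the byte for capital A under each convention. `cp037` is one of several EBCDIC code pages. The helper name `bits_of` is hypothetical.

```python
def bits_of(ch: str, codec: str = "ascii") -> str:
    """Return the eight-bit pattern of a single character under a single-byte code."""
    (byte,) = ch.encode(codec)  # unpacking fails loudly if the result is not one byte
    return format(byte, "08b")

print(2 ** 8)                 # 256 possible values per byte
print(bits_of("A"))           # 01000001  (ASCII)
print(bits_of("A", "cp037"))  # 11000001  (EBCDIC, code page 037)
```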

The largest single piece of data that can be handled directly by a given processor is called a word. For an 8-bit machine, a word is the same as a byte. For a 16-bit system, a word is 16 bits long, and so on. A 32-bit processor can be faster than a 16-bit processor of the same clock speed because its word size is twice as big, so it can process more data at once.

For larger amounts of data, the amount of storage is generally referred to in terms of the number of bytes, usually in factors of a thousand (actually 1024, or two raised to the tenth power). Thus one thousand bytes would be one kilobyte, one million would be one megabyte, one billion would be one gigabyte, and one trillion would be one terabyte. As memory, mass storage devices, and databases become larger, the last two terms are becoming increasingly important.
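Because each step is a factor of 1024 rather than 1000, the two conventions drift apart as sizes grow. The hypothetical helper below reports a byte count in the 1024-based units described above. For example, one billion bytes comes out as about 953.7 MB rather than 1000 MB.

```python
def human_size(n_bytes: int) -> str:
    """Express a byte count in KB, MB, GB, or TB using steps of 1024."""
    size = float(n_bytes)
    unit = "bytes"
    for next_unit in ("KB", "MB", "GB", "TB"):
        if size < 1024:
            break
        size /= 1024
        unit = next_unit
    return f"{n_bytes} bytes" if unit == "bytes" else f"{size:.1f} {unit}"

print(human_size(1024))   # 1.0 KB
print(human_size(10**9))  # 953.7 MB
```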

DATABASES AND FILES

As discussed in Chapter 5, databases can be described by their logical data model, which focuses on data and relationships, and their physical data model, which is how the data is stored in the computer. All data in a modern computer is stored in files. Files are chunks of related data stored together on a disk drive such as a hard disk or floppy disk. The operating system takes care of managing the details of the files, such as where they are located on the disk. Files have names. In DOS and Windows, a file name usually has a base name and an extension separated by a period, such as Mydata.dbf. The extension usually tells you what type of file it is.

Older database systems often stored their data in the format of dBase, with an extension of .dbf. Access stores its data and programs in files with the extension .mdb (for Microsoft DataBase), and can store many tables and other objects in one file. Most Access developers build their applications with one mdb file for the program information (queries, forms, reports, etc.) and another for the data (tables). Larger database applications have their data in an external database manager such as Oracle or SQL Server. The user does not see this data as files, but rather as a data source available across the network. If the front end is running in Access, users will still have the program mdb either on their local hard drive or available on a network drive. If their user interface is a compiled program written in Visual Basic, C, or a similar language, it will have an extension of .exe.
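The following sketch reads a file name the way the text describes. It splits the name into a base name and an extension, then looks the extension up in a small table. The mapping covers only the extensions mentioned in this section, and the `describe` helper is hypothetical.

```python
from pathlib import PurePath

# Only the extensions mentioned in this section; descriptions are informal.
FILE_TYPES = {
    ".dbf": "dBase-format data table",
    ".mdb": "Microsoft Access database",
    ".exe": "compiled program",
}

def describe(filename: str) -> str:
    """Split a file name into base name and extension and guess its type."""
    path = PurePath(filename)
    kind = FILE_TYPES.get(path.suffix.lower(), "unknown type")  # DOS/Windows ignore case
    return f"{path.stem} + {path.suffix}: {kind}"

print(describe("Mydata.dbf"))  # Mydata + .dbf: dBase-format data table
```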

We will now look at the remaining parts of a database system from the point of view of a stand-alone Access database. The concepts are about the same for other database software packages. Access databases contain six primary objects: tables, queries, forms, reports, macros, and modules. These objects are described in the following sections.

TABLES (“DATABASES”)

The basic element of storage in a relational database system is the table. Each table is a homogeneous set of rows of data describing one type of real-world object. In some older systems like dBase, each table was referred to as a database file. Current usage tends more toward considering the database as the set of related tables, rather than calling one table a database. Tables contain the following parts:

Records – Each line in a table is called a record, row, entity, or tuple. For example, each boring or analysis would be a record in the appropriate table. Records are described in more detail below.

Fields – Each data element within a record is called a field, column, or attribute. A field represents a significant attribute of a real-world object, such as the elevation of a boring or the measured value of a constituent. Fields are also described in more detail below.


Figure 30 - Join Properties form in Microsoft Access

Relationships – Data in different tables can be related to each other. For example, each analysis is related to a specific sample, which in turn is related to a specific boring. Relationships are usually based on key fields. The database manager can help in enforcing relationships using referential integrity, which requires that defined relationships be fulfilled according to the join type. Using this capability, it would be impossible to have an analysis for which there is no sample.
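Referential integrity can be demonstrated with SQLite, which enforces foreign keys once `PRAGMA foreign_keys` is turned on. This sketch covers the boring, sample, and analysis chain described above. The table and column names are assumptions, not the schema of any particular EDMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE Borings  (BoringID INTEGER PRIMARY KEY, BoringName TEXT);
CREATE TABLE Samples  (SampleID INTEGER PRIMARY KEY,
                       BoringID INTEGER NOT NULL REFERENCES Borings(BoringID));
CREATE TABLE Analyses (AnalysisID INTEGER PRIMARY KEY,
                       SampleID INTEGER NOT NULL REFERENCES Samples(SampleID),
                       Parameter TEXT, Value REAL);
""")
conn.execute("INSERT INTO Borings VALUES (1, 'B-1')")
conn.execute("INSERT INTO Samples VALUES (10, 1)")
conn.execute("INSERT INTO Analyses VALUES (100, 10, 'Benzene', 0.005)")  # sample 10 exists

rejected = False
try:
    # Sample 99 does not exist, so the database refuses this orphan analysis.
    conn.execute("INSERT INTO Analyses VALUES (101, 99, 'Benzene', 0.002)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```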

Join types – A relationship between two tables is defined by a join. There are two kinds of joins: inner joins and outer joins. In an inner join, matching records must be present on both sides of the join. That means that if one of the tables has records that have no matching records in the other, they are not displayed. An outer join allows unmatched records to be displayed. It can be a left join or a right join, depending on which table will have unmatched records displayed.

Figure 30 shows an example of defining an outer join in Access. In this example, a query has been created with the Sites and Stations tables. The join based on the SiteNumber field has been defined as an outer join, with all records from the Sites table being displayed, even if there are no corresponding records in the Stations table. This outer join is a left join.

Figure 31 shows the result of this query. There are stations for Rad Industries and Forest Products Co., but none for Refining, Inc. Because of the outer join, there is a record displayed for Refining, Inc. even though there are no stations.

Figure 31 - Result of an outer join query
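The query behind Figures 30 and 31 can be approximated in standard SQL. This sketch uses SQLite, and the station names and site numbers are hypothetical. The inner join drops Refining, Inc., while the LEFT JOIN keeps it with a Null (Python `None`) station.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites    (SiteNumber INTEGER PRIMARY KEY, SiteName TEXT);
CREATE TABLE Stations (StationName TEXT, SiteNumber INTEGER);
INSERT INTO Sites VALUES (1, 'Rad Industries'), (2, 'Forest Products Co.'), (3, 'Refining, Inc.');
INSERT INTO Stations VALUES ('MW-1', 1), ('MW-2', 1), ('MW-3', 2);
""")

inner = conn.execute("""
    SELECT Sites.SiteName, Stations.StationName
    FROM Sites INNER JOIN Stations ON Sites.SiteNumber = Stations.SiteNumber
    ORDER BY Sites.SiteNumber, Stations.StationName""").fetchall()

# All Sites rows are kept; unmatched sites get NULL for the Stations columns.
left = conn.execute("""
    SELECT Sites.SiteName, Stations.StationName
    FROM Sites LEFT JOIN Stations ON Sites.SiteNumber = Stations.SiteNumber
    ORDER BY Sites.SiteNumber, Stations.StationName""").fetchall()

print(inner)
print(left)  # ends with ('Refining, Inc.', None)
```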


FIELDS (COLUMNS)

The fields within each record contain the data of each specific kind within that record. These are analogous to the way columns are often used in a spreadsheet, or the blanks to be filled out on a paper form.

Data types – Each field has a data type, such as numeric (several possible types), character, date/time, yes/no, object, etc. The data type limits the content of the field to just that kind of data, although character fields can contain numbers and dates. You shouldn't store numbers in a character field, though, if you want to treat them as numbers, such as performing arithmetic calculations on them.
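Sorting is one concrete problem with numbers in a character field. Text is compared character by character, so "10" sorts before "9". The sketch stores the same depths both ways in a hypothetical SQLite table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Depths (AsText TEXT, AsNumber REAL)")
conn.executemany("INSERT INTO Depths VALUES (?, ?)", [("9", 9), ("10", 10), ("100", 100)])

by_text = [r[0] for r in conn.execute("SELECT AsText FROM Depths ORDER BY AsText")]
by_number = [r[0] for r in conn.execute("SELECT AsNumber FROM Depths ORDER BY AsNumber")]
print(by_text)    # ['10', '100', '9']  (character-by-character order)
print(by_number)  # [9.0, 10.0, 100.0]  (numeric order)
```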

Character fields are the most common type of field. They may include letters, numbers, punctuation marks, and any other printable characters. Some typical character fields would be SiteName, SampleType, and so on.

Numeric is for numbers on which calculations will be performed. They may be either positive or negative, and may include a decimal point. Numeric fields that might be found in an EDMS are GroundElevation, SampleTop, etc. Some systems break numbers down further into integer and floating point numbers of various degrees of precision. Generally this is only important if you are writing software, and less important if you are using commercial programs.

It is important to note that Microsoft programs such as Excel and Access have an annoying feature (bug) that refuses to save trailing zeros, which are very important in tracking precision. If you open a new spreadsheet in Excel, type in 3.10, and press Enter, the zero will go away. You can change the formatting to get it back, but it's not stored with the number. The best way around this is to store the number of decimals with each result value, and then format the number when it is displayed.
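Here is a minimal sketch of that workaround, assuming the laboratory reports results as text. It records how many decimal places were reported, stores the numeric value, and rebuilds the displayed string from both. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple

def parse_reported(text: str) -> Tuple[float, int]:
    """Split a reported result such as '3.10' into (numeric value, decimal places)."""
    exponent = Decimal(text).as_tuple().exponent  # '3.10' -> -2
    decimals = -exponent if isinstance(exponent, int) and exponent < 0 else 0
    return float(text), decimals

def display(value: float, decimals: int) -> str:
    """Restore the reported precision at display time."""
    return f"{value:.{decimals}f}"

value, decimals = parse_reported("3.10")
print(value, decimals)           # 3.1 2   (the number alone has lost the trailing zero)
print(display(value, decimals))  # 3.10
```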

Date is pretty obvious. Arithmetic calculations can often be performed on dates. For example, the fields SampleDate and AnalysisDate could be included in a table and subtracted from each other to find the holding time. Date fields in older systems are often 8 characters long (MM/DD/YY), while more modern, year 2000 compliant systems use 10 characters (MM/DD/YYYY).
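Subtracting dates to get the holding time takes only a line or two in most languages. This sketch parses dates in the MM/DD/YYYY form mentioned above, and the sample dates are made up. It also shows why two-digit years are risky. Python's `%y` directive has to guess the century, so it maps 69 to 1969 but 68 to 2068.

```python
from datetime import date, datetime

def parse_mdy(text: str) -> date:
    """Parse a 10-character MM/DD/YYYY date."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def holding_time_days(sample_date: date, analysis_date: date) -> int:
    """Days from sampling to analysis (AnalysisDate minus SampleDate)."""
    return (analysis_date - sample_date).days

print(holding_time_days(parse_mdy("06/28/2001"), parse_mdy("07/12/2001")))  # 14

# With two-digit years (MM/DD/YY) the century must be guessed:
print(datetime.strptime("01/01/69", "%m/%d/%y").year)  # 1969
print(datetime.strptime("01/01/68", "%m/%d/%y").year)  # 2068
```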

There is some variability in the way that time is handled in data management systems. In some database files, such as dBase and FoxPro dbf files, date and time are stored in separate fields. In others, such as Access mdb files, both can be stored in one field, with the whole number representing the date and the decimal component containing the time. The dates in Access are stored as the number of days since 12/30/1899, and times as the fraction of the day starting at midnight, so that 0.5 is noon.
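The Access storage scheme can be reproduced with ordinary date arithmetic. This sketch converts between serial numbers and Python `datetime` values for dates on or after 12/30/1899. It deliberately rejects earlier dates, whose negative serials Access interprets differently. The function names are hypothetical.

```python
from datetime import datetime, timedelta

ACCESS_EPOCH = datetime(1899, 12, 30)  # serial 0 in an Access date/time field
SECONDS_PER_DAY = 86_400

def from_access_serial(serial: float) -> datetime:
    """Whole part = days since 12/30/1899; fraction = portion of the day since midnight."""
    if serial < 0:
        raise ValueError("serials before 12/30/1899 are not handled in this sketch")
    return ACCESS_EPOCH + timedelta(days=serial)

def to_access_serial(dt: datetime) -> float:
    """Inverse of from_access_serial for datetimes on or after the epoch."""
    if dt < ACCESS_EPOCH:
        raise ValueError("datetimes before 12/30/1899 are not handled in this sketch")
    delta = dt - ACCESS_EPOCH
    return delta.days + (delta.seconds + delta.microseconds / 1e6) / SECONDS_PER_DAY

print(from_access_serial(36526.5))                # 2000-01-01 12:00:00
print(to_access_serial(datetime(2000, 1, 1, 6)))  # 36526.25
```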

The way dates are displayed traditionally varies from one part of the world to another, so as we go global, be careful. On Windows computers, the date display format is set in the operating system under Start/Settings/Control Panel/Regional Settings.

Logical represents a yes/no (true/false) value. Logical fields are one byte long, although it actually takes only one bit to store a logical value. ConvertedValue could be a logical field that is true or false based on whether or not a value in the database has been converted from its original units.

Data domain – Data within each field can be limited to a certain range. For example, pH could be limited to the range of 0 to 14. Comprehensive domain checking can be difficult to implement effectively. In a normalized data model, pH is not stored in its own field, but in the same Value field that stores sulfate and benzene, which certainly can exceed 14. That means that this type of domain analysis usually requires programming.
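Here is a minimal sketch of the kind of programming involved. Because every result shares one Value field, the check must look up limits by parameter name. Only pH is given a range here. The table of limits and the function name are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, float]  # (parameter name, value) from a normalized results table

# Hypothetical limits keyed by parameter; parameters not listed are not checked.
DOMAINS: Dict[str, Tuple[float, float]] = {
    "pH": (0.0, 14.0),
}

def out_of_domain(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield rows whose value lies outside the limits defined for that parameter."""
    for parameter, value in rows:
        limits = DOMAINS.get(parameter)
        if limits is not None and not (limits[0] <= value <= limits[1]):
            yield parameter, value

results = [("pH", 7.2), ("pH", 15.3), ("Sulfate", 250.0), ("Benzene", 42.0)]
print(list(out_of_domain(results)))  # [('pH', 15.3)]
```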

Value – Each field has a value, which can be some measured amount, some text attribute, etc. It is also possible that the value may be unknown or not exist, in which case the value can be set to Null. Be aware, however, that Null is not the same as zero, and is treated differently by the software.
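SQL's handling of Null shows why it must not be confused with zero. This sketch uses a hypothetical SQLite Results table. COUNT(Value) and AVG skip the Null value. A Null can be found only with IS NULL, never with = NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Results (Parameter TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO Results VALUES (?, ?)",
    [("Benzene", 4.0), ("Benzene", 0.0), ("Benzene", None)],  # Python None is stored as NULL
)

count_all, count_values, avg_value = conn.execute(
    "SELECT COUNT(*), COUNT(Value), AVG(Value) FROM Results").fetchone()
print(count_all, count_values, avg_value)  # 3 2 2.0  (zero is averaged in; Null is skipped)

# Comparing with = NULL yields unknown, never true; IS NULL is the correct test.
eq_null = conn.execute("SELECT COUNT(*) FROM Results WHERE Value = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM Results WHERE Value IS NULL").fetchone()[0]
print(eq_null, is_null)  # 0 1
```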


Figure 32 - Oracle screen for setting field properties

Key fields – Within each table there should be one or more fields that make each record in the table unique. This might be some real-world attribute (such as laboratory sample number) or a synthetic key such as a counter assigned by the data management system. A primary key has a unique value for each record in the table. A field in one table that is a primary key in another table is called a foreign key. A foreign key need not be unique, for example on the "many" side of a one-to-many relationship. Simple keys, which are made up of one field, are usually preferable to compound keys made up of more than one field. Compound keys, and in fact any keys based on real data, are usually poor choices because they depend on the data, which may change.
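The advantage of a synthetic key shows up when real-world data changes. In this hypothetical SQLite schema, SampleID is a counter assigned by the database. LabSampleNumber is kept unique but is not the key. As a result, correcting a lab sample number takes a single update and leaves the related analyses intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Samples (
    SampleID        INTEGER PRIMARY KEY,   -- synthetic key assigned by the database
    LabSampleNumber TEXT UNIQUE NOT NULL   -- real-world identifier, unique but not the key
);
CREATE TABLE Analyses (
    AnalysisID INTEGER PRIMARY KEY,
    SampleID   INTEGER NOT NULL REFERENCES Samples(SampleID),  -- foreign key, repeats freely
    Parameter  TEXT,
    Value      REAL
);
""")
sample_id = conn.execute("INSERT INTO Samples (LabSampleNumber) VALUES ('L-0123')").lastrowid
conn.executemany("INSERT INTO Analyses (SampleID, Parameter, Value) VALUES (?, ?, ?)",
                 [(sample_id, "Benzene", 0.004), (sample_id, "Toluene", 0.010)])

# The lab corrects its sample number: one UPDATE, no child rows touched.
conn.execute("UPDATE Samples SET LabSampleNumber = 'L-0132' WHERE SampleID = ?", (sample_id,))

rows = conn.execute("""SELECT s.LabSampleNumber, a.Parameter
                       FROM Samples s JOIN Analyses a ON a.SampleID = s.SampleID
                       ORDER BY a.Parameter""").fetchall()
print(rows)  # [('L-0132', 'Benzene'), ('L-0132', 'Toluene')]
```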

Figure 32 shows an Oracle screen for setting field properties

RECORDS (ROWS)

Once the tables and fields have been defined, the data is usually entered one record at a time. Each well in the Stations table or groundwater sample in the Samples table is a record. Often the size of a database is described by the number of records in its tables.

QUERIES (VIEWS)

In Access, data manipulation is done using queries. Queries are based on SQL, and are given names and stored as objects, just like tables. The output of a query can be viewed directly in an editable, spreadsheet-like view, or can be used as the basis of a form or a report. Access has six types of queries:

Select – This is the basic data retrieval query.

Cross-tab – This is a specialized query for summarizing data.


Figure 33 - Simple data editing form

Make table – This query is used to retrieve data and place it into a new table.

Update – This query changes data in an existing table.

Append – This query type adds records to an existing table.

Delete – These queries remove records from a table, and should be used with great care!
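Each of the six Access query types has a rough equivalent in SQL. The sketch below runs them against a hypothetical SQLite table. The syntax differs from Access in places. Access writes crosstabs with TRANSFORM ... PIVOT and make-table queries with SELECT ... INTO, and SQLite supports neither, so the sketch uses CASE aggregation and CREATE TABLE ... AS instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Results (Station TEXT, Parameter TEXT, Value REAL)")
conn.executemany("INSERT INTO Results VALUES (?, ?, ?)", [
    ("MW-1", "Benzene", 0.004), ("MW-1", "Toluene", 0.010),
    ("MW-2", "Benzene", 0.002), ("MW-2", "Toluene", 0.030),
])

# Select: basic retrieval.
selected = conn.execute("SELECT Station, Value FROM Results "
                        "WHERE Parameter = 'Benzene' ORDER BY Station").fetchall()

# Cross-tab: pivot parameters into columns using CASE inside an aggregate.
crosstab = conn.execute("""
    SELECT Station,
           MAX(CASE WHEN Parameter = 'Benzene' THEN Value END) AS Benzene,
           MAX(CASE WHEN Parameter = 'Toluene' THEN Value END) AS Toluene
    FROM Results GROUP BY Station ORDER BY Station""").fetchall()

# Make table: create a new table from a query result.
conn.execute("CREATE TABLE WorkTable AS SELECT * FROM Results WHERE Parameter = 'Benzene'")

# Update: change existing data (a hypothetical corrected value).
conn.execute("UPDATE WorkTable SET Value = ? WHERE Station = 'MW-2'", (0.003,))

# Append: add query results to an existing table.
conn.execute("INSERT INTO WorkTable SELECT * FROM Results "
             "WHERE Parameter = 'Toluene' AND Value > ?", (0.02,))

# Delete: remove records; use with great care.
conn.execute("DELETE FROM WorkTable WHERE Station = 'MW-1'")

remaining = conn.execute("SELECT Station, Parameter, Value FROM WorkTable "
                         "ORDER BY Parameter").fetchall()
print(remaining)  # [('MW-2', 'Benzene', 0.003), ('MW-2', 'Toluene', 0.03)]
```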

OTHER DATABASE OBJECTS

The other types of database objects in an Access system are forms, reports, macros, and modules. Forms and reports are for entering and displaying data, while macros and modules are for automating operations.

Forms

Forms in data management programs such as Access are generally used for entering, editing, or selecting data, although they can also be used as menus for selecting an activity. Forms for working with data use a table or a query as a data source.

Figure 34 - Advanced data editing form
