PART TWO - SYSTEM DESIGN AND IMPLEMENTATION
CHAPTER 5
GENERAL DESIGN ISSUES
The success of a data management task usually depends on the tool used for that task. The theoretical physicist Stephen Hawking is quoted as saying, “When all you have is a hammer, everything looks like a nail.” This is as true in data management as in anything else. People who like to use a word processor or a spreadsheet program are likely to use the tool they are familiar with to manage their data. But just as a hammer is not the right tool to tighten a screw, a spreadsheet is not the right tool to manage a large and complicated database. A database management program should be used instead. This section discusses the design of the database management tool, and how the design can influence the success of the project.
DATABASE MANAGEMENT SOFTWARE
Database management programs fall into two categories: desktop and client-server. The use of the two different types and decisions about where the data will be located are discussed in the next section. This section will discuss database applications themselves and briefly discuss the features and benefits of the programs. The major database software vendors put a large amount of effort into expanding and improving their products, so these descriptions are a snapshot in time. For an overview of desktop and Web-based database software, see Ross et al. (2001).
Older database systems were, for the most part, based on dBase, or at least the dBase file format. dBase started in the early days of DOS, and was originally released as dBase II because that sounded more mature than calling it 1.0. If anyone tells you that they have been doing databases since dBase 1, you know they are bluffing. dBase was an interpreted application, meaning that the code was translated into machine instructions each time it was run rather than compiled once in advance, which was slow on those early computers. This created a market for dBase compilers, of which FoxPro was the most popular. Both used a similar data format in which each data table was called a database file, or .dbf. Relationships were defined in code, rather than as part of the data model. Much data has been, and still is in some cases, managed in this format. These files were designed for single-user desktop use, although locking capabilities were added in later versions of the software to allow shared use.
Nowadays Microsoft Access dominates the desktop database market. This program provides a good combination of ease of use for beginners and power for experts. It is widely available, either as a stand-alone product or as part of the Office desktop suite. Additional information on Access can be found in books by Dunn (1994), Jennings (1995), and others, and especially in journals such as PC Magazine and Access/Visual Basic Advisor. Access has a feature that is common to almost all successful database programs, which is a programming language that allows users to automate tasks, or even build complete programs using the database software. In the case of Access, there are actually two programming models: a macro language that is falling out of favor, and a programming language. The programming language is called Visual Basic for Applications (VBA), and is a fairly complete development environment.
Since Access is a desktop database, it has limitations relative to larger systems. Experience has shown that for practical use, the software starts to have performance problems when the largest table in a database starts to reach a half million to a million records. Access allows multiple users to share a database, and no programming is required to implement this, but a dozen or so concurrent users is an upper limit for heavy-usage database scenarios. Access is available either as a stand-alone product or as part of the Microsoft Office Suite.
An alternative to Access is Paradox from Corel (www.corel.com). This is a programmable, relational database system, and is available as part of the Corel Office Suite. Paradox is a capable tool suitable for a complex database project, but the greater acceptance of Access makes Paradox an unlikely choice in the environmental business, where file sharing is common and Access is widespread.
The next step up from Access for many organizations is Microsoft SQL Server. This is a full-scale client-server system with robust security and a larger capacity than Access. It is moderately priced and relatively easy to use (for enterprise software), and increases the capacity to several million records. It is easy to attach an Access front end (user interface) to a SQL Server back end (data storage), so the transition to SQL Server is relatively easy when the data outgrows Access. This connection can be done using ODBC (Open DataBase Connectivity) or other connection methods.
For even larger installations, Oracle or IBM’s DB2 offer industrial-strength data storage, but with a price and learning curve to match. These can also be connected to the desktop using connection methods like ODBC, and one front-end application can be set up to talk to data in these databases, as well as to data in Access. Using this approach it is possible to create one user interface that can work with data in all of the different database systems.
A new category of database software that is beginning to appear is open-source software. Open-source software refers to programs where the source code for the software is freely available, and the software itself is often free as well. This type of software is popular for Internet applications, and includes such popular programs as the Linux operating system and Apache Web server. Two open-source database programs are PostgreSQL and MySQL (Jepson, 2001). These programs are not yet as robust as commercial database systems, but are improving rapidly. They are available in commercial, supported versions as well as the free open-source versions, so they are starting to become options for enterprise database storage. And you can’t beat the price.
Another new category of database software is Web-based programs. These programs run in a browser rather than on the desktop, and are paid for with a monthly fee. Current versions of these programs are limited to a flat-file design, which makes them unsuitable for the complex, relational nature of most environmental data, but they might have application in some aspects of Web data delivery. Examples of this type of software include QuickBase from the authors of the popular Quicken and QuickBooks financial software (www.quickbase.com), and Caspio Bridge (www.caspio.com).
DATABASE LOCATION OPTIONS
A key decision in designing a data management system is where the data will reside. Related to this are a variety of issues, including what hardware and software will provide the necessary functionality, who will be responsible for data entry and editing, and who will be responsible for backup of the database.
The simplest design for a database location is stand-alone. In this design, the data and the software to manage it reside on the computer of one user. That computer may or may not be on a network, but all of the functionality of the database system is local to that machine. The hardware and software requirements of a system like this are modest, requiring only one computer and one license for the database management software. The software does not need to provide access for more than one user at a time. One person is in control of the computer, software, and data.
For small projects, especially one-person projects, this type of design is often adequate. For larger projects where many people need access to the data, the single individual keeping the data can become a bottleneck. This is particularly true when the retrievals required are large or complicated. The person responsible for the data can end up spending most or all of his or her time responding to data requests from users. When the data management requirements grow beyond that point, the stand-alone system no longer meets the needs of the project team, and a better design is required.
Shared file
Generally the next step beyond a stand-alone system is a shared file system. In a shared file system, the server (or any computer) stores the database on its hard drive like any other file. Clients access the file using database software on their computers the same way they would open any other file on the server. The operating system on the server makes the file available. The database software on the client computer is responsible for handling access to the database by multiple users. An example of this design would be a system in which multiple users have Microsoft Access on their computers, and the database file, which has an extension of .mdb, resides on a server, which could be running Windows 95/98/ME or NT/2000/XP. When one or more users are working in the database file, their copy of Access maintains a second file on the server called a lock file. This file, which has an extension of .ldb, keeps track of which users are using the database and what objects in the database may be locked at any particular time. This design works well for a modest number of users in the database at once, providing adequate performance for a dozen or so users at any given time, and for databases up to a few hundred thousand records.
Client-server
When the load on the database increases to the point where a shared file system no longer provides adequate performance, the next step is a client-server system. In this design, a data manager program runs on the server, providing access to the data through a system process. One computer is designated the server, and it holds the data management software and the data itself. This computer may also be used as the workstation of an individual user, but in high-volume situations this is not recommended. More commonly, the server computer is placed on a network with the computers of the users, which are referred to as clients. The software on the server works with software on the client computers to provide access to the data.
The following diagram covers the internal workings of a client-server EDMS. It contains two parts, the Access component at the top and the SQL Server part at the bottom. In discussing the EDMS, the Access component is sometimes called the user interface, since that is the part of the system that users see, but in fact both Access and SQL Server have user interfaces. The Access user interface has been customized to make it easy for the EDMS users to manage the data in ways useful to them. The SQL Server user interface is used without modifications (as provided by Microsoft) for data administration tasks. Between these user interfaces are a number of pieces that work together to provide the data management capabilities of the system.
[Diagram labels recovered from Figure 17: Client; Server; Access User Interface; Selection Screen; Electronic Import (Fmt1, Fmt2); Manual Entry; Data Review; Lookup Table Maintenance; Table View; Record Counts; Subset Creation; Subset Database; File Export (Fmt1, Fmt2); Formatted Reports; Maps; Graphs; Volume Maint; Backup / Restore; Access Queries / Modules; Access Attachments; Security System (Read/Write and Read Only); Server Tables; Server Volume; SQL Server / Oracle User Interface]
Figure 17 - Client-server EDMS data flow diagram
Discussion of this diagram will start at the bottom and work toward the top, since this order starts with the least complicated parts (from the user’s perspective, not the complexity of the software code) and moves to the more complicated parts. That means starting on the SQL Server side and working toward the client side. This sequence provides the most orderly view of the system. In this diagram, the part with a gray background represents SQL Server, and the rest of the box is Access.
The basic foundation of the server side is the SQL Server volume, which is actually a file on the server hard drive that contains the data. The size of this volume is set when the database is set up, and can be changed by the administrator as necessary. Unlike many computer files, it will not grow automatically as data is added. Someone needs to monitor the volume size and the amount of data and keep them in sync. The software works this way because the location and structure of the file on the hard drive is carefully managed by the SQL Server software to provide maximum performance (speed of query execution).
The database tables are within the SQL Server volume. These tables are similar in function and appearance to the tables in Access. They contain all of the data in the system, usually in normalized database form. The data in the tables is manipulated through SQL queries passed to the SQL Server software via the ODBC link from the clients. Triggers and stored procedures can also be stored in the SQL Server volume to increase performance in the client-server system and to enforce referential integrity. If they wish, users can see the tables in the SQL Server volume through the database window in Access, but their ability to manipulate the data depends on the privileges that they have through the security system. A System Administrator should back up data in the SQL Server tables on a regular basis (at least daily).
The interface between the EDMS and the SQL Server tables is through a security system that is partly implemented in SQL Server and partly in Access. Most users should have read-only permission for working with the data; that is, they will be able to view data but not change it. A small group of users, called data administrators, should be able to modify data, which will include importing and entering data, changing data, and deleting data.
The actual connection between Access and SQL Server is done through attachments. Attachments in Access allow Access to see data that is located somewhere other than the current Access .mdb file as if it were in the current database. This is the usual way of connecting to SQL Server, and also allows us to provide the flexibility of attaching to other data sources.
Once the attachments are in place, the client interaction with the database is through Access queries, either alone or in combination with modules, which are programs written in VBA. Various queries provide access to different subsets of the data for different purposes. Modules are used when it is necessary to work with the data procedurally, that is, line by line, instead of the whole query as a set.
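The difference between set-based and procedural work can be sketched briefly. This is only an illustration, assuming Python and its standard sqlite3 module as stand-ins for VBA and Access; the table name, field names, and threshold are hypothetical and not taken from the EDMS described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analyses (analysis_id INTEGER PRIMARY KEY, value REAL, units TEXT)"
)
conn.executemany(
    "INSERT INTO analyses (value, units) VALUES (?, ?)",
    [(1500.0, "ug/L"), (2.0, "mg/L"), (250.0, "ug/L")],
)

# Set-based: a single query converts every ug/L result to mg/L at once.
conn.execute(
    "UPDATE analyses SET value = value / 1000.0, units = 'mg/L' WHERE units = 'ug/L'"
)

# Procedural: walk the rows one at a time, applying per-row logic in code.
flags = []
for analysis_id, value in conn.execute(
    "SELECT analysis_id, value FROM analyses ORDER BY analysis_id"
):
    flags.append((analysis_id, "high" if value > 1.0 else "ok"))
print(flags)
```

The query handles the whole table in one statement, while the loop is the kind of row-by-row processing for which a module would be written.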
Distributed
The database designs described above were geared toward an organization where the data management is centralized, that is, where the data management activities are performed in one central location, usually on a local area network (LAN). With environmental data management, this is not always the case. Often the data for a particular facility must be available both at the facility and at the central office. The situation becomes even more complicated when the central office must manage and share data with multiple remote sites. This requires that some or all of the data be made available at multiple locations. The following sections describe three ways to do this: wide-area networks, distributed databases with replication, and remote communication with subsets. The factors that determine which solution is best for an organization include the amount of data to be managed, how fresh the data needs to be at the remote locations, whether full-time communication between the facilities is available, and the speed of that communication.
Wide-area networks – In situations where a full-time, high-speed communication link is or can be made available (at a cost which is reasonable for the project), a wide-area network (WAN) is often the best choice. From the point of view of the people using it, the WAN hardware and software connect the computers just as with a LAN. The difference is that instead of all of the computers being connected directly through a local Ethernet or Token Ring network, some are connected through long-distance data lines of some sort. Often there are LANs at the different locations connected together over the WAN.
The connection between the LANs is usually done with routers on either end of the long-distance line. The router looks at the data traffic passing over the network, and data packets which have a destination on the other LAN are routed across the WAN to that LAN, where they continue on to their destination.
There are several options for the long-distance lines forming the WAN between the LANs. This discussion will cover some popular existing and emerging technologies. This is a rapidly changing industry, and new technologies are appearing regularly.
At the high end of connectivity options are dedicated services such as T1 (or in cases of very high data volume, T3) or frame relay. These services are connected full-time, and provide high to moderate speeds ranging from 56 kilobits per second (kbps) to 1 megabit per second (Mbps) or more. These services can cost $1000 per month or more. This is proven technology, and is available, at a cost, to nearly any facility.
Recently, newer digital services have become available for some areas. Integrated Services Digital Network (ISDN) provides 128 kbps for around $100 per month. Digital Subscriber Line (DSL) provides connectivity ranging in speed from 256 kbps to 1.5 Mbps or more. Prices can be as low as $40 per month, so this service can be a real bargain, but service is limited to a fairly short distance from the telephone company central office, so it’s not available to many locations. Cable modems promise to give DSL a run for its money. Cable service is not widely available right now, especially for business locations, and when it is, it will focus more on residential than business customers, since that is where cable is currently connected.
Another option is standard telephone lines and analog modems. This is sometimes called POTS (plain old telephone service). This provides 56 kbps, or more with modem pooling, and the connection is made on demand. The cost is relatively low ($15 to $30 per month), and the service is available everywhere.
In order to have WAN-level connectivity, you should have a full-time connection of about 1 Mbps or faster. If the connection speed available is less than this, another approach should be used.
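A rough calculation shows why link speed matters for this decision. The sketch below is an idealized estimate only: it assumes the full rated speed is available, ignores protocol overhead and latency, and uses a hypothetical 5-megabyte retrieval as the example size.

```python
def transfer_seconds(size_megabytes: float, link_kbps: float) -> float:
    """Idealized transfer time; ignores overhead, latency, and line sharing."""
    bits = size_megabytes * 8_000_000  # 1 MB taken as 10^6 bytes of 8 bits
    return bits / (link_kbps * 1000)


# Link speeds quoted in the text, in kilobits per second.
links_kbps = {"analog modem": 56, "ISDN": 128, "1 Mbps line": 1000}
times = {name: transfer_seconds(5, kbps) for name, kbps in links_kbps.items()}
for name, seconds in times.items():
    print(f"{name}: {seconds / 60:.1f} minutes")
```

Under these assumptions the same retrieval takes about 12 minutes over a 56 kbps modem, about 5 minutes over ISDN, and 40 seconds at 1 Mbps. Real transfers will be slower.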
Distributed databases with replication – There are several situations where a client-server connection over a WAN is not the best solution. One is where the connection speed is too low for real-time access. The second is where the data volume is extremely high, and local copies make more sense. In these situations, distributed databases can make sense. In this design, copies of the database are placed on local servers at each facility, and users work with the local copies of the data. This is an efficient use of computer resources, but raises the very important issue of currency of the data. When data is entered or changed in one copy, the other copy is no longer the most current, and, at some point, the changes must be transferred between the copies. Most high-end database programs, and now some low-end ones, can do this automatically at specified intervals. This is called replication. Generally, the database manager software is smart enough to move only the changed data (sometimes called “dirty” records) rather than the whole database. Problems can occur when users make simultaneous changes to the same records at different locations, so this approach rapidly becomes complicated.
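The idea of moving only the dirty records can be sketched as follows. This is a simplified illustration, not how any particular database product implements replication: it assumes Python's standard sqlite3 module, a hypothetical samples table with a dirty flag column, and one-way transfer with no handling of conflicting edits.

```python
import sqlite3


def make_copy() -> sqlite3.Connection:
    """Create one copy of the (hypothetical) database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "sample_id INTEGER PRIMARY KEY, depth REAL, dirty INTEGER DEFAULT 0)"
    )
    return conn


def replicate(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy only rows flagged dirty in source to target, then clear the flags."""
    changed = source.execute(
        "SELECT sample_id, depth FROM samples WHERE dirty = 1"
    ).fetchall()
    target.executemany(
        "INSERT OR REPLACE INTO samples (sample_id, depth, dirty) VALUES (?, ?, 0)",
        changed,
    )
    source.execute("UPDATE samples SET dirty = 0 WHERE dirty = 1")
    return len(changed)


central, facility = make_copy(), make_copy()
for conn in (central, facility):
    conn.execute("INSERT INTO samples VALUES (1, 10.0, 0)")  # copies start in sync

# A user at the facility edits one row and adds another; both are flagged dirty.
facility.execute("UPDATE samples SET depth = 12.5, dirty = 1 WHERE sample_id = 1")
facility.execute("INSERT INTO samples VALUES (2, 30.0, 1)")

moved = replicate(facility, central)
central_rows = central.execute(
    "SELECT sample_id, depth FROM samples ORDER BY sample_id"
).fetchall()
print(moved, central_rows)
```

Because this sketch simply overwrites the target row, a simultaneous change to the same record at the central office would be lost, which is exactly the kind of complication noted above.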
Remote communication with subsets – Often it is valuable for users to be able to work with part of the database remotely. This is particularly useful when the communication line is slow. In this scenario, users call in with their remote computers and attach to the main database, and either work with it that way or download subsets for local use. In some software this is as easy as selecting data using the standard selection screen, then instructing the EDMS to create a subset. This subset can be created on the user’s computer, then the user can hang up, attach to the local subset, then use the EDMS in the usual way, working with the subset. This works for retrieving data, but not as well for data entry or editing, unless a way is provided for data entered into the subset to be uploaded to the main database.
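The subset workflow can be illustrated with a small sketch. It assumes Python's standard sqlite3 module in place of the EDMS, with two local files standing in for the main database and the user's subset; the table, field names, and data values are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "main.db")
subset_path = os.path.join(workdir, "subset.db")

# The main database, with data for more than one site.
main = sqlite3.connect(main_path)
main.execute("CREATE TABLE analyses (site TEXT, parameter TEXT, value REAL)")
main.executemany(
    "INSERT INTO analyses VALUES (?, ?, ?)",
    [("Site 1", "Benzene", 0.005), ("Site 1", "Toluene", 0.02), ("Site 2", "Benzene", 0.001)],
)
main.commit()

# "Create a subset": copy only the selected rows into a separate local file.
main.execute("ATTACH DATABASE ? AS local_subset", (subset_path,))
main.execute("CREATE TABLE local_subset.analyses (site TEXT, parameter TEXT, value REAL)")
main.execute(
    "INSERT INTO local_subset.analyses SELECT * FROM main.analyses WHERE site = ?",
    ("Site 1",),
)
main.commit()
main.close()  # the equivalent of hanging up

# Later, the user works with the local subset only.
local = sqlite3.connect(subset_path)
subset_rows = local.execute(
    "SELECT parameter, value FROM analyses ORDER BY parameter"
).fetchall()
print(subset_rows)
local.close()
```

Nothing in this sketch sends edits made in the subset back to the main database, which mirrors the limitation described above for data entry and editing.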
Internet/intranet
Tools are now available to provide access to data stored in relational database managers using a Web browser interface. At the time of this writing, in the opinion of the author, these tools are not yet ready for use as the primary interface in a sophisticated EDMS package. Specifically, the technology to provide an intuitive user interface with real-time feedback is too costly or impossible to build with current Web development tools. Vendors are working on implementing that capability, but the technology is not currently ready for prime time, at least for everyday users, and the current technology of choice is client-server.
It is now feasible, however, to provide a Web interface with limited functionality for specific applications. For example, it is not difficult to provide a public page with summaries of environmental data for plants. The more “canned” the retrieval is, the easier it is to implement in a browser interface, although allowing some user selection is not difficult.
In the near future, tools like Dynamic HTML, Active Server Pages, and better Java applets, combined with universal high-speed connections, will make it much easier to provide an interactive user interface, hosted in a Web browser. At that time, EDMS vendors will certainly provide this type of interface to their clients’ databases.
The following figure shows a view of three different spectra provided by the Internet and related technologies. There are probably other ways of looking at it, but this view provides a framework for discussing products and services and their presentation to users. In these days of multi-tiered applications, this diagram is somewhat of an over-simplification, but it serves the purpose here.
              Local  <------------------------------------------------>  Global
Applications: Stand-Alone | Shared Files | Client-Server | Web-Enabled | Web-Based
Data:         Proprietary | Commercial | Public Domain
Users:        Desktops | Laptops | PDAs, etc. | Public Portals
Figure 18 - The Internet spectrum
The overall range of the diagram in Figure 18 is from Local on the left to Global on the right. This range is divided into three spectra for this discussion. The three spectra, which are separate but not unrelated, are applications, data, and users.
Applications – Desktop computer usage started with stand-alone applications. A program and the data that it used were installed on one computer, which was not attached to others. With the advent of local area networks (LANs), and in some organizations wide-area networks (WANs), it became possible to share files, with the application running on the local desktop and with the data residing on a file server. As software evolved and data volumes grew, software was installed on both the local machine (client) and the server, with the user interface operating locally, and data storage, retrieval, and sometimes business logic operating on the server. With the advent of the Internet and the World Wide Web, sharing on a much broader scale is possible. The application can reside either on the client computer and communicate with the Web, or it can run on a Web server. The first type of application can be called Web-enabled. An example of this is an email program that resides locally, but talks through the Web. Another example would be a virus-scanning program that operates locally but goes to the Web to update its virus signature files. The second type of application can be called Web-based. An example of this would be a browser-based stock trading application.
Many commercial applications still operate in the range between stand-alone and client-server. There is now a movement of commercial software to the right in this spectrum, to Web-enabled or Web-based applications, probably starting with Web-enabling current programs, and then perhaps evolving to a thin-client, browser-based model over time. This migration can be done with various parts of the applications at different rates depending on the costs and benefits at each stage. New technologies like Microsoft’s .NET initiative are helping accelerate this trend.
Data – Most environmental database users currently work mostly with data that they generate themselves. Their base map data is usually based on CAD drawings that they create, and the rest of their project data comes from field measurements and laboratory analyses, which the data manager (or their client) pays for and owns. This puts them to the left of the spectrum in the above figure. Many vendors now offer both base map and other data, either on the Web or on CD-ROM, which might be of value to many users. Likewise, government agencies are making more and more data, spatial and non-spatial, available, often for free. As vendors evolve their software and Web presence, they can work toward integrating this data into their offerings. For example, software could be used to load a USGS or Census Bureau base map, and then display sites of environmental concern obtained from the EPA. Several software companies provide tools to make it possible to serve up this type of data from a modified Web server. Revenue can be obtained from purchase or rental of the application, as well as from access to the data.
Users – The World Wide Web has opened up a whole new world of options for computing platforms. These range from the traditional desktop computers and laptops through personal digital assistants (PDAs), which may be connected via wireless modem, to Web portals and other public access devices. Desktops and laptops can run present and future software, and, as most are becoming connected to the Internet, will be able to support any of the computing models discussed above. PDAs and other portable devices promise to provide a high level of portability and connectivity, which may require re-thinking data delivery and display. Already there are companies that integrate global positioning systems (GPS) with PDAs and map data to show you where you are. Other possible applications include field data gathering and delivery, and a number of organizations provide this capability. Web portals include public Internet access (such as in libraries and coffee shops) as well as other Internet-enabled devices like public phones. This brings up the possibility that applications (and data) may run on a device not owned by or controlled by the client, and argues for a thin-client approach.
This is all food for thought as we try to envision the evolution of environmental software products and services (see Chapter 27). What is clear is that the options for delivery of applications and data have broadened significantly, and must be considered in planning for future needs.
Multi-tiered
The evolution of the Internet and distributed computing has led to a new deployment model called “multi-tiered.” The three most common tiers are the presentation level, the business logic level, and the data storage level. Each level might run on a different computer. For example, the presentation level displayed to the user might run on a client computer, using either client-server software or a Web browser. The business logic level might enforce the data integrity and other rules of the database, and could reside on a server or Web server computer. Finally, the data itself could reside on a database server computer. Separating the tiers can provide benefits for both the design and operation of the system.
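The separation of tiers can be sketched in a few lines. This is an illustration only, assuming Python with all three tiers in one process for brevity; in a real deployment each tier might run on a different computer. The class names, the samples table, and the rule that depth must not be negative are hypothetical.

```python
import sqlite3


class SampleStore:
    """Data storage tier: owns the connection and the SQL."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE samples (station TEXT, depth REAL)")

    def insert(self, station: str, depth: float) -> None:
        self.conn.execute("INSERT INTO samples VALUES (?, ?)", (station, depth))

    def all_rows(self) -> list:
        return self.conn.execute("SELECT station, depth FROM samples").fetchall()


class SampleService:
    """Business logic tier: enforces rules, contains no SQL and no display code."""

    def __init__(self, store: SampleStore) -> None:
        self.store = store

    def add_sample(self, station: str, depth: float) -> None:
        if depth < 0:
            raise ValueError("depth must not be negative")
        self.store.insert(station, depth)

    def list_samples(self) -> list:
        return self.store.all_rows()


def render(service: SampleService) -> str:
    """Presentation tier: formats results for the user."""
    return "\n".join(
        f"{station}: depth {depth:.1f}" for station, depth in service.list_samples()
    )


service = SampleService(SampleStore())
service.add_sample("MW-1", 12.5)
report = render(service)
print(report)
```

Because the presentation code talks only to the business logic tier, the storage tier could be swapped for a different database without changing how results are displayed.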
DISTRIBUTED VS CENTRALIZED DATABASES
An important decision in implementing a data management system for an organization performing environmental projects for multiple sites is whether the databases should be distributed or centralized. This is particularly true when the requirements for various uses of the data are taken into consideration. This issue will be discussed here from two perspectives. The first perspective to be discussed will be that of the data, and the second will be that of the organization.
From the perspective of the data and the applications, the options of distributed vs. centralized databases are illustrated in Figures 19 and 20. Clearly it is easier for an application to be connected to a centralized, open database than a diverse assortment of data sources. The downside is the effort required to set up and maintain a centralized data repository.
[Diagram labels recovered from Figure 19: GIS Coverages; Lab Deliverables; CAD Files; Spreadsheets; Legacy Systems; ASCII Files; Word Proc Files; Chain of Custody; Field Notebooks; Hard Copy Files; Regulatory Reports]
Figure 19 - Connection to diverse, distributed data sources
Figure 20 - Connection to a centralized open database
[Diagram labels recovered from Figure 21: Client; Site 1; Consultant 1; Consultant 2A; Consultant 1 + Geotech; Consultant 4; Do It Yourself; Spreadsheet; Access]
Figure 21 - Distributed vs centralized databases
The choice of distributed vs. centralized databases can also be viewed from the perspective of the organization. This is illustrated in Figure 21. The left side of the diagram shows the way the data for environmental projects has traditionally been managed. The client, such as an industrial company, owns several sites with environmental issues. One or more consultants, labeled C1, C2, etc., manage each site, and each consultant may manage the project from various offices, such as C2A, C2B, etc. Each consultant office might use a different tool to manage the data. For example, for Site 1, consultant C1 may use an Excel spreadsheet. Consultant C2, working on a different part of the project, or on the same issues at a different time, may use a home-built database. Other consultants working on different sites use a wide variety of different tools. If people in the client organization, either in the head office or at one of the sites, want some data from one monitoring event, it is very difficult for them to know where to look.
Contrast this with the right side of the diagram. In this scenario, all of the client’s data is managed in a centralized, open database. The data may be managed by the client, or by a consultant, but the data can be accessed by anyone given permission to do so. There are huge savings in efficiency, because everyone knows where the data is and how to get at it. The difficult challenge is getting the data into the centralized database before the benefits can be realized.
Figure 22 - Example of a simplified logical data model
THE DATA MODEL
The data model for a data management system is the structure of the tables and fields that contain the data. Creating a robust data model is one of the most important steps in building a successful data management system (Walls, 1999). If you are building a data management system from scratch, you need to get this part right first, as best you can, before you proceed with the user interface design and system construction.
Many software designers work with data models at two levels. The logical data model (Figure 22) describes, at a conceptual level, the data content for the system. The lines between the boxes represent the relationships in the model. The physical data model (Figure 23) describes in detail exactly how the data will be stored, with names, data types, and sizes for all of the fields in each table, along with the relationships (key fields which join the tables) between the tables.
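A fragment of a physical data model might look like the sketch below. It assumes Python's standard sqlite3 module as the database, and the table names, field names, types, and sizes are hypothetical; they are not the model shown in Figure 23. Note that SQLite records declared sizes such as VARCHAR(20) but does not enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces key fields only when asked
conn.executescript("""
CREATE TABLE stations (
    station_id   INTEGER PRIMARY KEY,
    station_name VARCHAR(20) NOT NULL
);
CREATE TABLE samples (
    sample_id    INTEGER PRIMARY KEY,
    station_id   INTEGER NOT NULL REFERENCES stations (station_id),
    sample_date  DATE NOT NULL,
    depth        DECIMAL(8,2)
);
""")

# List each field with its declared type, much as a physical model diagram would.
fields = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(samples)")]
print(fields)
```

The station_id field in samples is the key field that joins the two tables. With foreign keys enabled, an attempt to insert a sample for a station that does not exist is rejected.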
The overall scope of the logical data model should be identified as early in the design process as possible. This is particularly true when the project is to be implemented in stages. This allows identification of the interactions between the different parts of the system so that dependencies can be planned for as part of the detailed design for each subset of the data as that subset is implemented. Then the physical data model for the subset can be designed along with the user interface for that subset.
The following sections describe the data structure and content for a relational EDMS. This structure and content is based on a commercial system developed and marketed by Geotech Computer Systems, Inc. called Enviro Data. Because this is a working system that has managed hundreds of databases and millions of records of site environmental investigation and monitoring data, it seems like a good starting point for discussing the issues related to a data model for storing this type of data.
Trang 13Figure 23 - Table and field display from a physical data model
Data structure
The structure of a relational EDMS, or of any database for that matter, should, as closely as possible, reflect the physical realities of the data being stored. For environmental site data, samples are taken at specific locations, at certain times, depths, and/or heights, and then analyzed for certain physical and chemical parameters. This section describes the tables and relationships used to model this usage pattern. The section after this describes in some detail the data elements and exactly how they are used so that the data accurately reflects what happened.
Tables – The data model for storing site environmental data consists of three types of tables: primary tables, lookup tables, and utility tables. The primary tables contain the data of interest. The lookup tables contain codes and their expanded values that are used in the primary tables to save space and encourage consistency. Sometimes the lookups contain other useful information for the data elements that are represented by the coded values. The utility tables provide a place to store various data items, often related to the operation and maintenance of the system. Often these tables are not related directly to the primary tables.
For the most part, the primary data being stored in the EDMS has a series of one-to-many (also known as parent-child or hierarchical) relationships. It is particularly fortunate that these relationships are one-to-many rather than many-to-many, since one-to-many relationships are handled well by the relational data model, and many-to-many relationships are not handled as directly. (Many-to-many relationships can be handled in the relational data model, but they require adding another table to track the links between the two tables. This table is sometimes called a join table. We don’t have to worry about that here.)
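As a minimal sketch of the join table idea, the following Python example uses the standard sqlite3 module. The Projects and StationProjects tables and all column names are hypothetical and are not part of the data model described in this chapter; they only illustrate how a third table can record the links in a many-to-many relationship.

```python
import sqlite3

# Hypothetical example: a station can belong to several projects, and a
# project can cover several stations. None of these names come from the EDMS.
SCHEMA = """
CREATE TABLE Stations (StationId INTEGER PRIMARY KEY, StationName TEXT NOT NULL);
CREATE TABLE Projects (ProjectId INTEGER PRIMARY KEY, ProjectName TEXT NOT NULL);
-- The join table holds one row per station-project link.
CREATE TABLE StationProjects (
    StationId INTEGER NOT NULL REFERENCES Stations(StationId),
    ProjectId INTEGER NOT NULL REFERENCES Projects(ProjectId),
    PRIMARY KEY (StationId, ProjectId)
);
INSERT INTO Stations VALUES (1, 'MW-1'), (2, 'MW-2');
INSERT INTO Projects VALUES (10, 'Quarterly monitoring'), (20, 'Plume study');
INSERT INTO StationProjects VALUES (1, 10), (1, 20), (2, 10);
"""


def make_db():
    """Create the small in-memory example database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def projects_for_station(conn, station_name):
    """List the projects linked to one station, via the join table."""
    rows = conn.execute(
        """SELECT p.ProjectName
           FROM Stations s
           JOIN StationProjects sp ON sp.StationId = s.StationId
           JOIN Projects p ON p.ProjectId = sp.ProjectId
           WHERE s.StationName = ?
           ORDER BY p.ProjectName""",
        (station_name,),
    )
    return [name for (name,) in rows]


def stations_for_project(conn, project_name):
    """List the stations linked to one project, via the same join table."""
    rows = conn.execute(
        """SELECT s.StationName
           FROM Projects p
           JOIN StationProjects sp ON sp.ProjectId = p.ProjectId
           JOIN Stations s ON s.StationId = sp.StationId
           WHERE p.ProjectName = ?
           ORDER BY s.StationName""",
        (project_name,),
    )
    return [name for (name,) in rows]
```

The same join table answers the question in both directions, which is what distinguishes it from a simple parent-child key.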
The primary tables in this system are called Sites, Stations, Samples, and Analyses. The detailed content of these tables is described below. Sites contains information about each facility being managed in the system. Stations stores data for each location where samples are taken, such as monitoring wells and soil borings. (Note that what is called a station in this discussion is called a site in some system designs.) Samples represents each physical sample or monitoring event at specific stations, and Analyses contains specific observed values or analytical results from the samples.
Relationships – The hierarchical relationships between the tables are straightforward. Each site can have one or more stations, each station has one or more samples, and each sample is analyzed for one or more, often many, constituents. In the other direction, each result, such as a sulfate measurement, corresponds to one specific sampling event for one specific location for one specific site.
The lookup relationships are one-to-many also, with the “one” side being the lookup table and the “many” side being the primary table. For example, there is one entry in the StationTypes table for monitoring wells, with a code of “mw,” but there can be (and usually are) many monitoring wells in the Stations table.
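The hierarchy and lookup relationship described above can be sketched as a small physical data model. The example below uses Python's standard sqlite3 module rather than Access or SQL Server, and only the table names (Sites, Stations, Samples, Analyses, StationTypes) and the “mw” code come from the text. Every column name and data value is an assumption made for illustration and should not be read as the actual Enviro Data structure.

```python
import sqlite3

# Table names follow the text; all column names are hypothetical.
SCHEMA = """
CREATE TABLE StationTypes (
    StationTypeCode TEXT PRIMARY KEY,
    Description     TEXT NOT NULL
);
CREATE TABLE Sites (
    SiteId   INTEGER PRIMARY KEY,
    SiteName TEXT NOT NULL
);
CREATE TABLE Stations (
    StationId       INTEGER PRIMARY KEY,
    SiteId          INTEGER NOT NULL REFERENCES Sites(SiteId),
    StationName     TEXT NOT NULL,
    StationTypeCode TEXT NOT NULL REFERENCES StationTypes(StationTypeCode)
);
CREATE TABLE Samples (
    SampleId   INTEGER PRIMARY KEY,
    StationId  INTEGER NOT NULL REFERENCES Stations(StationId),
    SampleDate TEXT,
    Depth      REAL
);
CREATE TABLE Analyses (
    AnalysisId  INTEGER PRIMARY KEY,
    SampleId    INTEGER NOT NULL REFERENCES Samples(SampleId),
    Parameter   TEXT NOT NULL,
    ResultValue REAL,
    Units       TEXT
);
"""


def build_example_db():
    """Create the tables and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only on request
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO StationTypes VALUES ('mw', 'Monitoring well')")
    conn.execute("INSERT INTO Sites VALUES (1, 'Example Plant')")
    conn.executemany(
        "INSERT INTO Stations VALUES (?, 1, ?, 'mw')",
        [(1, "MW-1"), (2, "MW-2")],  # one lookup entry, many stations
    )
    conn.execute("INSERT INTO Samples VALUES (1, 1, '2001-03-15', NULL)")
    conn.executemany(
        "INSERT INTO Analyses VALUES (?, 1, ?, ?, ?)",
        [(1, "Sulfate", 250.0, "mg/l"), (2, "pH", 7.1, "SU")],
    )
    conn.commit()
    return conn


def trace_analysis(conn, analysis_id):
    """Walk up the hierarchy from one analysis to its sample, station, and site."""
    return conn.execute(
        """SELECT si.SiteName, st.StationName, sa.SampleDate,
                  an.Parameter, an.ResultValue
           FROM Analyses an
           JOIN Samples  sa ON sa.SampleId  = an.SampleId
           JOIN Stations st ON st.StationId = sa.StationId
           JOIN Sites    si ON si.SiteId    = st.SiteId
           WHERE an.AnalysisId = ?""",
        (analysis_id,),
    ).fetchone()
```

Calling trace_analysis(build_example_db(), 1) follows the key fields upward, so a single sulfate result resolves to exactly one sample, one station, and one site.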
Data content
This section will discuss briefly the data content of the example EDMS. This material will be covered in greater detail in Appendix B.
Sites – A Site is a facility or project that will be treated as a unit. Some projects may be treated as more than one site, and sometimes a site can be more than one facility, but the use of the site terminology should be consistent within the database, or at least for each project. Some people refer to a sampling location as a site, but in this discussion we will call that a station.
Stations – A Station is a location of observation. Examples of stations include soil borings, monitoring wells, surface water monitoring stations, soil and stream sediment sample locations, air monitoring stations, and weather stations. A station can be a location that is persistent, such as a monitoring well which is sampled regularly, or can be the location of a single sampling event. For stations that are sampled at different elevations (such as a soil boring), the location of the station is the surface location for the boring, and the elevation or depth component is part of the sampling event.
Samples – A Sample is a unique sampling event or observation for a station. Each station can be sampled at various depths (such as with a soil boring), at various dates and times (such as with a monitoring well), or, less commonly, both. Observations, which may or may not accompany a physical sample, can be taken at a station at a particular time, and in this model would be considered part of a sample event.
Analyses – An Analysis is the observed value of a parameter related to a sample. This term is intended to be interpreted broadly, and not to be limited to chemical analyses. For example, field parameters such as pH, temperature, and turbidity also are considered analyses. This would also include operating parameters of environmental concern such as flow, volume, and so on.
Lookups – A lookup table is a table that contains codes that are used in the main data tables, and the expanded values of those codes that are used for selection and display.
Utilities – The system may contain tables for tracking internal information not directly related to the primary tables. These utility tables are important to the software developers and perhaps to the system and data administrators, but can usually be ignored by the users.
DATA ACCESS REQUIREMENTS
The user interface provides a number of data manipulation functions, some of which are read/write and the rest are read-only.
The functions that require read/write access to the database are:
Electronic import – This function allows data administrators to import analytical and other data. Initially the data formats supported will be the three formats defined in the Data Transfer Standard. Other import formats may be added as needed. This is shown in Figure 17 as a single-headed arrow going into the database, but in reality there is also a small flow of data the other way as the module checks for valid data.
Manual entry – The hope is that the majority of the data that will be put in the system will be in digital format that can be imported without retyping. However, there will probably be some data which will need to be manually entered and edited, and this function will allow data administrators to make those entries and changes.
Editing – Data administrators will sometimes need to change data in the database. Such changes must be done with great care and be fully documented.
Lookup table maintenance – One of the purposes of the lookup tables is to standardize the entries to a limited number of choices, but there will certainly be a need for those data tables to evolve over time. This feature allows the data administrators to edit those tables. A procedure will be developed for reviewing and approving those changes before entry.
Verification and validation – Either as part of the import process or separately, data validators will need to add or change validation flags based on their work.
Data review – Data review should accompany data import and entry, and can be done independently as well. This function allows data administrators to look at data and modify its data review flag as appropriate, such as after validation.
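The validity check mentioned under electronic import can be sketched as follows. This is a simplified illustration in Python with the standard sqlite3 module, not the import module of any actual EDMS: the column names, the “sb” code, and the import_stations function are assumptions. It shows the small reverse flow of data, in which the importer reads the lookup table before it writes anything.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database with one lookup and one primary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StationTypes (
            StationTypeCode TEXT PRIMARY KEY,
            Description     TEXT NOT NULL
        );
        CREATE TABLE Stations (
            StationId       INTEGER PRIMARY KEY,
            StationName     TEXT NOT NULL,
            StationTypeCode TEXT NOT NULL REFERENCES StationTypes(StationTypeCode)
        );
        INSERT INTO StationTypes VALUES
            ('mw', 'Monitoring well'),
            ('sb', 'Soil boring');
    """)
    return conn


def import_stations(conn, rows):
    """Insert (name, type_code) rows with known codes; return the rejected rows."""
    # Reading the lookup table is the flow of data back out of the database.
    valid_codes = {
        code for (code,) in conn.execute("SELECT StationTypeCode FROM StationTypes")
    }
    rejected = []
    for name, type_code in rows:
        if type_code in valid_codes:
            conn.execute(
                "INSERT INTO Stations (StationName, StationTypeCode) VALUES (?, ?)",
                (name, type_code),
            )
        else:
            rejected.append((name, type_code))  # left for the administrator to fix
    conn.commit()
    return rejected
```

Returning the rejected rows instead of silently dropping them keeps the data administrator in control of what happens to records that fail the check.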
Read-only
The functions that require read-only access to the database are:
Record counts – This function is a useful guide in making selections. It should provide the number of selected items whenever a selection criterion is changed.
Table view – This generalized display capability allows users to view the data that they have selected. This might be all of the output they need, or they might wish to proceed to another output option, once they have confirmed that they have selected correctly. They can also use this screen to copy the data to the clipboard or save it to a file for use in another application.
Formatted reports – Reports suitable for printing can be generated from the selection screen. Different reports could be displayed depending on the data element selected.
Maps – The results of the selection can be displayed on a map, perhaps with the value of a constituent for each station drawn next to that station and a colored dot representing the value. See Chapter 22 for more information on mapping.
Graphs – The most basic implementation of this feature allows users to draw a graph of constituent values as a function of time for the selected data. They should be able to graph multiple constituents for one station or one constituent for several stations. More advanced graphing is also possible as described in Chapter 20.
Subset creation – Users should be able to select a subset of the main database and export it to an Access database. This might be useful for providing the data to others, or to work with the subset when a network connection to the database is unavailable or slow.
File export – This function allows users to export data in a format suitable for use in other software needing data from the EDMS. Formats need to be provided for the data needs of the other software. Direct connection without export-import is also possible.
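Two of the read-only functions above, record counts and file export, can be illustrated with a short sketch. It uses Python's standard sqlite3 and csv modules. The flattened Results table, its columns, and the function names are hypothetical, and comma-separated text is only one of many possible export formats.

```python
import csv
import io
import sqlite3


def make_db():
    """Create a small in-memory table of made-up results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Results (
            StationName TEXT, SampleDate TEXT, Parameter TEXT, ResultValue REAL
        );
        INSERT INTO Results VALUES
            ('MW-1', '2001-03-15', 'Sulfate', 250.0),
            ('MW-2', '2001-03-15', 'Sulfate', 310.0),
            ('MW-1', '2001-03-15', 'pH', 7.1);
    """)
    return conn


def count_selected(conn, parameter):
    """Record count: how many rows match the current selection criterion."""
    return conn.execute(
        "SELECT COUNT(*) FROM Results WHERE Parameter = ?", (parameter,)
    ).fetchone()[0]


def export_selected(conn, parameter):
    """File export: return the selected rows as comma-separated text."""
    cursor = conn.execute(
        """SELECT StationName, SampleDate, Parameter, ResultValue
           FROM Results WHERE Parameter = ? ORDER BY StationName""",
        (parameter,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)
    return buffer.getvalue()
```

Both functions only issue SELECT statements, which is why they can run under a read-only connection or account.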
GOVERNMENT EDMS SYSTEMS
A number of government agencies have developed systems for managing site environmental data. This section describes some of the systems that are most widely used.
STORET (www.epa.gov/storet) – STORET (short for STOrage and RETrieval) is EPA’s repository for water quality, biological, and physical data. It is used by EPA and other federal agencies, state environmental agencies, universities, private citizens, and others. It is one of EPA’s two data management systems containing water quality information for the nation’s waters. The other system, the Legacy Data Center, or LDC, contains historical water quality data dating back to the early part of the 20th century and collected up to the end of 1998. It is being phased out in favor of STORET. STORET contains data collected beginning in 1999, along with older data that has been properly documented and migrated from the LDC. Both LDC and STORET contain raw biological, chemical, and physical data for surface and groundwater collected by federal, state, and local agencies, Indian tribes, volunteer groups, academics, and others. All 50 states, territories, and jurisdictions of the U.S., along with portions of Canada and Mexico, are represented in these systems. Each sampling result is accompanied by information on where the sample was taken, when the sample was gathered, the medium sampled, the name of the organization that sponsored the monitoring, why the data was gathered, and much other information. The LDC and STORET are Web-enabled, so users can browse both systems interactively or create files to be downloaded to their computer for further use.
CERCLIS (www.epa.gov/superfund/sites/cursites) – CERCLIS is a database that contains the official inventory of Superfund hazardous waste sites. It contains information on hazardous waste sites, site inspections, preliminary assessments, and remediation of hazardous waste sites. The EPA provides online access to CERCLIS data. Additionally, standard CERCLIS site reports can be downloaded to a personal computer. CERCLIS is a database and not an EDMS, but can be of value in EDMS projects.
IRIS (www.epa.gov/iriswebp/iris/index.html) – The Integrated Risk Information System, prepared and maintained by the EPA, is an electronic database containing information on human health effects that may result from exposure to various chemicals in the environment. The IRIS system is primarily a collection of computer files covering individual chemicals. These chemical files contain descriptive and quantitative information on oral reference doses and inhalation reference concentrations for chronic non-carcinogenic health effects, and hazard identification, oral slope factors, and oral and inhalation unit risks for carcinogenic effects. It is a database and not an EDMS, but can be of value in EDMS projects.
ERPIMS (www.afcee.brooks.af.mil/ms/msc_irp.htm) – The Environmental Resources Program Information Management System (ERPIMS, formerly IRPIMS) is the U.S. Air Force system for validation and management of data from environmental projects at all Air Force bases. The project is managed by the Air Force Center for Environmental Excellence (AFCEE) at Brooks Air Force Base in Texas. ERPIMS contains analytical chemistry samples, tests, and results as well as hydrogeological information, site/location descriptions, and monitoring well characteristics. AFCEE maintains ERPTools/PC, a Windows-based software package that has been developed to help Air Force contractors in collection and entry of their data, validation, and quality control. Many ERPIMS data fields are filled by codes that have been assigned by AFCEE. These codes are compiled into lists, and each list is the set of legal values for a certain field in the database. Air Force contractors use ERPTools/PC to prepare the data, including comparing data to these lists, and then submit it to the main ERPIMS database at Brooks.
IRDMIS (aec.army.mil/prod/usaec/rmd/im/imass.htm) – The Installation Restoration Data Management Information System (IRDMIS) supports the technical and managerial requirements of the Army’s Installation Restoration Program (IRP) and other environmental efforts of the U.S. Army Environmental Center (USAEC, formerly the U.S. Toxic and Hazardous Materials Agency). (Don’t confuse this AEC with the Atomic Energy Commission, which is now the Department of Energy.) Since 1975, more than 15 million data records have been collected and stored in IRDMIS with information collected from over 100 Army installations. IRDMIS users can enter, validate, store, and retrieve the Army’s geographic, geological and hydrological, sampling, chemical, and physical analysis information. The system covers all aspects of the data life cycle, including complete data entry and validation software using USAEC and CLP QA/QC methods; a Web site for data submission and distribution; and an Oracle RDBMS with menu-driven user interface for standardized reports, geographical plots, and plume modeling. It provides a fully integrated information network of data status and disposition for USAEC project officers, chemists, geologists, contracted laboratories, and other parties, and supports Geographical Information Systems and other third-party software.
USGS Water Resources (http://water.usgs.gov/nwis) – This is a set of Web pages that provide access to water resources data collected at about 1.5 million locations in all 50 states, the District of Columbia, and Puerto Rico. The U.S. Geological Survey investigates the occurrence, quantity, quality, distribution, and movement of surface and groundwater, and provides the data to the public. Online access to data on this site includes real-time data for selected surface water, groundwater, and water quality sites; descriptive site information for all sites with links to all available water data for individual sites; water flow and levels in streams, lakes, and springs; water levels in wells; and chemical and physical data for streams, lakes, springs, and wells. Site visitors can easily select data and retrieve it for on-screen display or save it to a file for further processing.
OTHER ISSUES
Creating and maintaining an environmental database is a serious undertaking. In addition to the activities directly related to maintaining the data itself, there are a number of issues related to the database system that must be considered.
Scalability
Databases grow with time. You should make sure that the tool you select for managing your environmental data can grow with your needs. If you store your data in a spreadsheet program, when the number of lines of data exceeds the capacity of the spreadsheet, you will need to start another file, and then you can’t easily work with all of your data. If you store your data in a stand-alone database manager program like Access, when your data grows you can relatively easily migrate to a more powerful database manager like SQL Server or Oracle. The ability of software and hardware to handle tasks of different sizes is called scalability, and this requirement should be part of your planning if there is any chance your project will grow over time.
Security
The cost of building a large environmental database can be hundreds of thousands of dollars or more. Protect this investment from loss. Ensure that only authorized individuals can get access to the database. Make adequate backups frequently. Be sure that the people who are working with the database are adequately trained so that they do a good job of getting clean data into the database, and that the data stays there and stays clean. Instill an attitude of protecting the database and keeping its quality up so that people can feel comfortable using it.
Access and permissions
Most database manager programs provide a system for limiting who can use a database, and what actions they can perform. Some have more than one way of doing this. Be sure to set up and use an access control system that fits the needs of your organization. This may not be easy. You will have to walk a thin line between protecting your data and letting people do what they need to do. Sometimes it’s better to start off more restrictive than you think you need to, and then grant more permissions over time, than to be lenient and then need to tighten up, since people react better to getting more power rather than less. Also be aware that security and access limitations are easier to implement and manage in a client-server system than in a stand-alone system, so if you want high security, choose SQL Server or Oracle over Access for the back-end.
Activity tracking
To guarantee the quality of the data in the database, it is important to track what changes are made to the data, when they are made, who made them, and why they were made. A simple activity tracking system would include an ActivityLog table in the database to allow data administrators to track data modifications. On exit from any of the data modification screens, including importing, editing, or reviewing, an activity log screen will appear. The program reports the name of the data administrator and the activity date. The data administrator must enter a description of the activity, and the name of the site that was modified. The screen should not close until an entry has been made. Figure 24 shows an example of a screen for this type of simple system.
The system should also provide a way to select and display the activity log. Figure 25 shows an example of a selection screen and report of activity data. In this example, the log can be filtered on Administrator name, Activity Date, or Site. If no filters are entered, the entire log is displayed.
Another option is a more elaborate system that keeps copies of any data that is changed. This is sometimes called a shadow system or audit log. In this type of system, when someone changes a record in a table, a copy of the unchanged record is stored in a shadow table, and then the change is made in the main table. Since most EDMS activity usually does not involve a lot of changes, this does not increase the storage as much as it might appear, but it does significantly increase the complexity of the software.
Figure 24 - Simple screen for tracking database activity
Figure 25 - Output of activity log data
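The two tracking approaches described above can be sketched together in a few lines. The example below uses Python's standard sqlite3 module as a stand-in for Access or SQL Server. Only the ActivityLog table name and the three filter fields come from the text; the column names, the AnalysesShadow table, the trigger, and the function names are assumptions made for illustration.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE Analyses (
    AnalysisId INTEGER PRIMARY KEY, Parameter TEXT, ResultValue REAL
);
-- Shadow table: receives a copy of each record as it was before a change.
CREATE TABLE AnalysesShadow (
    AnalysisId INTEGER, Parameter TEXT, ResultValue REAL,
    ChangedAt TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER KeepOldAnalysis BEFORE UPDATE ON Analyses
BEGIN
    INSERT INTO AnalysesShadow (AnalysisId, Parameter, ResultValue)
    VALUES (OLD.AnalysisId, OLD.Parameter, OLD.ResultValue);
END;
CREATE TABLE ActivityLog (
    LogId INTEGER PRIMARY KEY,
    Administrator TEXT NOT NULL,
    ActivityDate  TEXT NOT NULL,
    Site          TEXT NOT NULL,
    Description   TEXT NOT NULL
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def log_activity(conn, administrator, site, description, activity_date=None):
    """Record one activity; refuse blank entries, like a screen that won't close."""
    if not description.strip() or not site.strip():
        raise ValueError("a description and a site name are required")
    activity_date = activity_date or date.today().isoformat()
    conn.execute(
        "INSERT INTO ActivityLog (Administrator, ActivityDate, Site, Description)"
        " VALUES (?, ?, ?, ?)",
        (administrator, activity_date, site, description),
    )
    conn.commit()


def filter_log(conn, administrator=None, activity_date=None, site=None):
    """Return log rows matching the given filters, or the whole log if none."""
    clauses, params = [], []
    for column, value in (("Administrator", administrator),
                          ("ActivityDate", activity_date),
                          ("Site", site)):
        if value is not None:
            clauses.append(column + " = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return conn.execute(
        "SELECT Administrator, ActivityDate, Site, Description FROM ActivityLog"
        + where + " ORDER BY LogId",
        params,
    ).fetchall()
```

Placing the shadow copy in a trigger means every update is captured regardless of which screen or program makes the change, at the cost of the extra complexity noted above.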
Database maintenance
There are a number of activities that must be performed on an ongoing or at least occasional basis to keep an EDMS up and running. These include:
Backup – Backing up data in the database is discussed in Chapter 15, but must be kept in mind as part of ongoing database maintenance.
Upgrades – Both commercial and custom software should be upgraded on a regular basis. These upgrades may be required due to a change in the software platform (operating system, database software) or to add features and fix bugs. A system should be implemented so that all users of the EDMS receive the latest version of the software in a timely fashion. For large enterprises with a large number of users, automated tools are available to assist the system administrator with distributing upgrades to all of the appropriate computers without having to visit each one. Web-based tools are beginning to appear that provide the same functionality for all users of software programs that support this feature. Either of these approaches can be a great time saver for a large enterprise system.
Other maintenance – Other maintenance activities are required, both on the client side and the server side. For example, on the client side, Access databases grow in size with use. You should occasionally compact your database files. You can do this on some set schedule, such as monthly, or when you notice that a file has grown large, such as larger than 5 megabytes (5000 KB). Occasionally problems will occur with Access databases due to power failures, system crashes, etc. When this happens, first exit Access, then shut down Windows, power down the computer, and restart. If you get errors in the database after that, you can have Access repair and compact the database. In the worst case (if repairing does not work), you should obtain a new copy of the database program from the original source, and restore your data file from a backup.
System maintenance will be required on the server database as well, and will generally be performed by the system administrator with assistance from the vendor if necessary. These procedures include general maintenance of the server computer, user administration, database maintenance, and system backup.
The database is expected to grow as new data is received for sites currently in the database, and as new sites are added. At some point in the future it will be necessary to expand the size of the device and the database to accommodate the increased volume of data which is anticipated. The system administrator should monitor the system to determine when the database size needs to be increased.
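Monitoring file growth, whether to decide when to compact a client-side file or when to expand server storage, can be reduced to a simple size check. The following Python sketch is only an illustration: the function name is hypothetical, and the default limit reuses the 5 megabyte rule of thumb mentioned above for Access files, which would be far too small for a server database.

```python
import os


def files_over_limit(paths, limit_bytes=5_000_000):
    """Return (path, size_in_bytes) pairs for files larger than limit_bytes."""
    oversized = []
    for path in paths:
        size = os.path.getsize(path)  # raises OSError if the file is missing
        if size > limit_bytes:
            oversized.append((path, size))
    return oversized
```

A check like this could be run on a schedule and its results reviewed by the administrator; it reports which files have grown but leaves the decision to compact or expand to a person.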
CHAPTER 6
DATABASE ELEMENTS
A number of elements make up an EDMS. These elements include the computer on the user’s desk, the software on that computer, the network hardware and software, and the database server computer. They also include the components of the database management system itself, such as files, tables, fields, and so on. This chapter covers the important elements from these two categories. This presentation focuses on how these objects are implemented in Access (for stand-alone use) and SQL Server (for client-server), two popular database products from Microsoft. A good overview of Access for both new and experienced database users can be found in Jennings (1995). More advanced users might be interested in Dunn (1994). More information on SQL Server can be found in Nath (1995); England and Stanley (1996); and England (1997). More information on database elements can be found in Dragan (2001), Gagnon (1998), Harkins (2001a, 2001b), Jepson (2001), and Ross et al. (2001).
HARDWARE AND SOFTWARE COMPONENTS
A modern networked data management system consists of a number of hardware and software components. These items, which often come from different manufacturers and vendors, all must work together for the system to function properly.
The desktop computer
To run a data management system, either client-server or stand-alone, you must have a computer, and the computer resources must be sufficient to run the software. Data management programs can be relatively large applications. In order to run a program like this you must have a computer capable of running the appropriate operating system, such as Windows. This section describes the desktop hardware and software requirements for either a client-server or stand-alone database management system. Other than the network connection, the hardware requirements are the same.
DESKTOP HARDWARE
The computer should have a large enough hard drive and enough random access memory (RAM) to be able to load the software and run it with adequate performance, and data management software can have relatively high requirements. For example, Microsoft Access has the greatest resource requirements of any of the Microsoft Office programs. At the time of this writing, the minimum and recommended computer specifications for adequate performance using the data management system are as shown in Figure 26.
Item | Minimum | Recommended
Computer | 200 megahertz Pentium processor | 500 to 1000 megahertz Pentium processor
Hard drive | Adequate for software and local data storage, at least 1 gigabyte | Adequate for software and local data storage, at least 1 gigabyte
Removable storage | 3.5” floppy, CD-ROM | 3.5” floppy, CD-RW, Zip drive
Network | 10 megabits per second | 100 megabits per second
Figure 26 - Suggested hardware specifications
Probably the most important requirement is adequate random access memory (RAM), the chips that provide short-term storage of data. The amount of RAM should be increased on machines that are not providing acceptable performance. If increasing the RAM does not increase the performance to a level appropriate for that user’s specific needs, then replacing the computer with a faster one may be required.
It is important to note that the hardware requirements to run the latest software, and the computer processing power of standard systems available at the store, both become greater over time. Computers that are more than three years or so old may be inadequate for running the latest version of the database software.
A brand-new, powerful computer including a monitor and printer sells for $1000 or less, so it doesn’t make sense to limp along on an underpowered, flaky computer. Don’t be penny-wise and pound-foolish. Be sure that everyone has adequate computers for the work they do. It will save money in the long run.
An important distinction to keep in mind is the difference between memory and storage. A computer has a certain amount of system memory or RAM. It also has a storage device such as a hard drive. Often people confuse the two, and say that their computer has 10 gigabytes of memory, when they mean disk storage.
DESKTOP SOFTWARE
Several software components are required in order to run a relational database management system. These include the operating system, networking software (where appropriate), database management software, and the application.
Operating system
Most systems used for data management run one of the Microsoft operating systems: Windows 95, 98, ME, or NT/2000/XP. All of these systems can run the same client data management software and perform pretty much the same. Apple Macintosh systems are present in some places, but are used mostly for graphic design and education, and have limited application for data management due to poor software availability. UNIX systems (including the popular open-source version, Linux) are becoming an increasingly viable possibility, with serious database systems like Oracle and DB2 now available for various types of UNIX.
Networking software
If the data is to be managed with a shared-file or client-server system, or if the files containing a single-user database are to be stored on a file server computer, the client computer will need to run networking software to make the network interface card work. In some cases the networking software is part of the operating system. This is the case with a Windows network. In other cases the networking will be done with a separate software package. Examples include Novell Netware and Banyan Vines. Either way, the networking software will generally be loaded during system startup, and after that can pretty much be ignored, except that network file server resources and network database server resources are available. This networking software is described in more detail in the next section.
Database management software
The next software element in the database system is the database management software itself. Examples of this software are Microsoft Access, FoxPro, and Paradox. This software can be used by itself to manage the data, or with the help of a customized application as described in the next section. The database application provides the user interface (the menus and forms that the user sees) and can, in the case of a stand-alone or single-user system, also provide the data storage. In a client-server system, the database software on the client computer provides the user interface, and some or all of the data is stored on the database server computer somewhere else on the network.
If the data to be managed is relatively simple, the database management software by itself is adequate for managing it. For example, a simple table of names and addresses can be created and data entered into it with a minimum of effort. As the data model becomes more complicated, and as the interaction between the database and external data sources becomes more involved, it can become increasingly difficult to perform the required activities using the tools of the software by itself. At that point a specialized application may be required.
Application
When the complexity of the database or its interactions exceeds the capability of the general-purpose database manager program, it is necessary to move to a specialized vertical market application. This refers to software specialized for a particular industry segment. An EDMS represents software of this type. This type of system is also referred to as COTS (commercial off-the-shelf) software. Usually the vertical market application will provide pre-configured tables and fields to store the data, import routines for data formats common in the industry, forms for editing the data, reports for printing selected data, and export formats for specific needs. Using off-the-shelf EDMS software can give you a great head start in building and managing your database, relative to designing and building your own system.
The network adapters are printed circuit boards that are placed in slots in the client and server computers and provide the electronic connection between the computer and the network. The type of adapter card used depends on the kind of computer in which it is placed, and the type of network being used.
[Diagram labels: Clients, Network Hub, Network Adapter]
Figure 27 - The EDMS network hardware diagram
The wiring also depends on the type of network being used. The two most common types of wiring are twisted pair and coaxial, usually thin Ethernet. Twisted pair is becoming more common over time due to lower cost. Most twisted pair networks use Category 5 (sometimes called Cat5) cable, which is similar to standard telephone wiring, but of higher quality. There is usually a short cable that runs between the computer and a wall plate, wiring in the walls from the client’s or server’s office to a wiring closet, and then another cable from the wall plate or switch block in the wiring closet to the hub.
The hub is a piece of hardware that takes the cables carrying data from the computers on the network and connects them together physically. Depending on the type of network and the number of computers, other hardware may be used in place of or in addition to the hub. This might include network switches or routers.
The network can run at different speeds depending on the capability of the computers, network cards, hubs, wiring, and so on. Until recently 10 megabits per second was standard for local area networks (LANs), and 56 kilobits per second was common for wide-area networks (WANs). Increasingly, 100 megabits per second is being installed for LANs and 1 megabit per second or faster is used for WANs.
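To give a feel for what these speeds mean in practice, the short Python calculation below estimates the time to move a file across each type of link. The 5,000,000 byte file size is a hypothetical example, and the estimate ignores protocol overhead, latency, and competing traffic, so real transfers will take longer.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: total bits divided by the nominal link speed."""
    return size_bytes * 8 / link_bits_per_second


# Nominal speeds taken from the paragraph above, in bits per second.
LINK_SPEEDS = {
    "WAN, 56 kilobits per second": 56_000,
    "WAN, 1 megabit per second": 1_000_000,
    "LAN, 10 megabits per second": 10_000_000,
    "LAN, 100 megabits per second": 100_000_000,
}

if __name__ == "__main__":
    example_size = 5_000_000  # hypothetical 5 megabyte database subset
    for name, speed in LINK_SPEEDS.items():
        print(f"{name}: {transfer_seconds(example_size, speed):.1f} seconds")
```

Under these assumptions the same file that crosses a 10 megabit LAN in about 4 seconds needs roughly 12 minutes on a 56 kilobit WAN link, which is one reason exporting a local subset can be attractive when the connection is slow.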
EDMS NETWORK SOFTWARE
There are a number of software components required on both the client and server computers in order for the EDMS to operate. Included in this category are the operating system, transport protocols, and other software required just to make the computer and network work. The operating system and network software should be up and running before the EDMS is installed.
[Diagram labels: Server, SQL Server Process, SQL Server Data Storage, ODBC Driver, Access Front-end, Data In, SQL Queries, Query Results, Data Out, Backup and Restore]
Figure 28 - The EDMS network software components
The major networked data management software components of the EDMS are discussed in this section from an external perspective, that is, looking at the various pieces and what they do, but not at the detailed internal workings of each. The important parts of the internal view, especially of the data management system, will be provided in later sections.
On the client computers in a client-server system, the important components for data management provide the user interface and communication with the server. On the server, the software completes the communication and provides storage and manipulation of the data. For a stand-alone system, both parts run on the client computer. The diagram in Figure 28 shows the major data management software components for a client-server system, based on Access as a front-end and SQL Server as a back-end.
On the client computers, the user interface for the EDMS can be provided by a database such as Microsoft Access, or can be written in a programming language like Visual Basic, PowerBuilder, Java, or C++. The advantage of using a database language is ease of development and flexibility. The advantage of a compiled language is code security, and perhaps speed, although speed is less of a distinguishing factor than it used to be.
The main user interface components are forms and menus for soliciting user input and forms and reports for displaying output. Also provided by Access on the desktop are queries to manipulate data and macros and modules (both of which are types of programs) to control program operation and perform various tasks. Customized components specific to the EDMS, if any, are contained in an Access mdb file which is placed on the client computer during setup and which can be updated on a regular basis as modifications are made to the software. Through this interface, the user should be able to (with appropriate privileges) import and check data, select subsets of the data, and generate output, including tables, reports, graphs, and maps.
To communicate data with the server, the Access software works with a driver, which is a specialized piece of software with specific capabilities. In a typical EDMS this driver uses a data transfer protocol called Open DataBase Connectivity (ODBC). The driver for communicating with SQL Server is provided by Microsoft as part of the Access software installation, although it may not be installed as part of the standard installation. Drivers for other server databases are available from various sources, often the vendor of the database software.
There are two parts to the ODBC system in Windows. One part is ODBC administration, which can be accessed through the ODBC icon in Control Panel. This part provides central management of ODBC connections for all of the drivers that are installed. The second part consists of individual drivers for specific data sources.
There are two kinds of ODBC drivers, single-tier and multi-tier. The single-tier drivers provide both the communication and data manipulation capabilities, and the data management software for that specific format itself is not required. Examples of single-tier drivers include the drivers for Access, dBase, and FoxPro data files. Multi-tier drivers provide the communication between the client and server, and work with the database management software on the server to provide data access. Examples of multi-tier drivers include the drivers for SQL Server and Oracle.
The server side of the ODBC communication link is provided by software that runs on the server as an NT/2000/XP process. The SQL Server process listens for data requests from clients across the network via the ODBC link, executes queries locally on the server, and sends the results back to the requesting client. This step is very important, because the traffic across the network is minimized. The requests for data are in the form of SQL queries, which are a few hundred to a few thousand characters, and the data returned is whatever was asked for. In this way the user can query a small amount of data from a database with millions of records and the network traffic would be just a few thousand characters.
Some EDMS software packages can work in either stand-alone or client-server mode. In the first case the software uses a direct link to the Jet database engine when working with an Access database. In the second case, the EDMS uses the SQL Server multi-tier driver to communicate between the user interface in Access and SQL Server on the server. When users are attached to a local Access database, all of the processing and data flow occurs on the client computer. When connected to the server database the data comes from the server.
The server
SERVER HARDWARE
The third hardware component of the EDMS, besides client computers and the network, is the database server. This is a computer, usually a relatively powerful one, which contains the data and runs the server component of the data management software. Usually it runs an enterprise-grade operating system such as Windows NT/2000/XP or UNIX. In large organizations the server will be provided or operated by an Information Technology (IT) or similar group, while in smaller organizations data administrators or power users in the group will run it.
The range of hardware used for servers, especially running Windows NT/2000/XP, is great. NT/2000/XP can run on a standard PC of the type purchased at discount or office supply stores. This is actually a good solution for small groups, especially when the application is not mission critical, meaning that if the database becomes unavailable for short periods of time the company won’t be shut down.
Figure 29 - Example administrative screen from Microsoft SQL Server
For an organization where the amount of use of the system is greater, or full-time availability is very important, a computer designed as a server, with redundant and hot-swappable (can be replaced without turning off the computer) components, is a better solution. This can increase the cost of the computer by a factor of two to ten or more, but may be justified depending on the cost of loss of availability.
SERVER SOFTWARE
The client-based software components described above are those that users interact with. System administrators also interact with the server database user interface, which is software running on the server computer that allows maintenance of the database. These maintenance activities include regular backup of the data and occasional other tasks such as user and volume administration. Software is also available which allows many of these maintenance activities to be performed from computers remote from the server, if this is more convenient. An example screen from SQL Server is shown in Figure 29.
UNITS OF DATA STORAGE
The smallest unit of information used by computers is the binary bit (short for BInary digiT). A bit is made up of one piece of data consisting of either a zero or a one, or more precisely, the electrical charge is on or off at that location in memory. All other types of data are composed of one or more bits.
The next larger common unit of storage is the byte, which contains eight bits. One byte can represent one of 256 different possibilities (two raised to the eighth power). This allows a byte to represent any one of the characters of the alphabet, the numbers and punctuation symbols, or a large number of other characters. For example, the letter A (capital A) can be represented by the byte 01000001. How each character is coded depends on the coding convention used. The two most common are ASCII (American Standard Code for Information Interchange) used on personal computers and workstations, and EBCDIC (Extended Binary Coded Decimal Interchange Code) used on some mainframes.
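The byte encoding described above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name byte_to_char is made up for this example, and cp037 is just one of several EBCDIC code pages, chosen here as an assumption to show that the same letter is coded differently under a different convention.

```python
# Illustration only: how one byte encodes the letter A.
letter = "A"
ascii_code = ord(letter)                  # numeric ASCII code for "A"
ascii_bits = format(ascii_code, "08b")    # the same code as eight bits
values_per_byte = 2 ** 8                  # two raised to the eighth power


def byte_to_char(bit_string: str) -> str:
    """Hypothetical helper: turn a string of 8 bits back into a character."""
    return chr(int(bit_string, 2))


# The same letter under one EBCDIC code page (cp037 is an assumed choice).
ebcdic_byte = letter.encode("cp037")

print(ascii_code, ascii_bits, values_per_byte, ebcdic_byte.hex())
```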
The largest single piece of data that can be handled directly by a given processor is called a word. For an 8-bit machine, a word is the same as a byte. For a 16-bit system, a word is 16 bits long, and so on. A 32-bit processor is faster than a 16-bit processor of the same clock speed because it can process more data at once, since the word size is twice as big.
For larger amounts of data, the amount of storage is generally referred to in terms of the number of bytes, usually in factors of a thousand (actually 1024, or 2^10). Thus one thousand bytes would be one kilobyte, one million would be one megabyte, one billion is one gigabyte, and one trillion is one terabyte. As memory, mass storage devices, and databases become larger, the last two terms are becoming increasingly important.
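As a small worked example of these units, the sketch below converts a byte count into the largest convenient unit using the factor of 1024. The function name format_bytes and the unit abbreviations are assumptions made for this illustration.

```python
# Illustration only: express a byte count in kilobytes, megabytes, etc.,
# using the factor of 1024 (2 to the 10th power) between units.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def format_bytes(count: int) -> str:
    """Hypothetical helper: scale a byte count to the largest fitting unit."""
    size = float(count)
    for unit in UNITS:
        # Stop when the number is small enough, or when units run out.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(format_bytes(1024))        # exactly one kilobyte
print(format_bytes(1_000_000))   # a "million bytes" is slightly under 1 MB
print(format_bytes(2 ** 40))     # one terabyte
```

Note that one million bytes comes out a little below one megabyte, which is the practical consequence of the factor being 1024 rather than 1000.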
DATABASES AND FILES
As discussed in Chapter 5, databases can be described by their logical data model, which focuses on data and relationships, and their physical data model, which is how the data is stored in the computer. All data in a modern computer is stored in files. Files are chunks of related data stored together on a disk drive such as a hard disk or floppy disk. The operating system takes care of managing the details of the files such as where they are located on the disk. Files have names, and files in DOS and Windows usually have a base name and an extension separated by a period, such as Mydata.dbf. The extension usually tells you what type of file it is.
Older database systems often stored their data in the format of dBase, with an extension of .dbf. Access stores its data and programs in files with the extension of mdb for Microsoft DataBase, and can store many tables and other objects in one file. Most Access developers build their applications with one mdb file for the program information (queries, forms, reports, etc.) and another for the data (tables). Larger database applications have their data in an external database manager such as Oracle or SQL Server. The user does not see this data as files, but rather as a data source available across the network. If the front end is running in Access, they will still have the program mdb either on their local hard drive or available on a network drive. If their user interface is a compiled program written in Visual Basic, C, or a similar language, it will have an extension of .exe.
We will now look at the remaining parts of a database system from the point of view of a stand-alone Access database. The concepts are about the same for other database software packages. Access databases contain six primary objects. These are tables, queries, forms, reports, macros, and modules. These objects are described in the following sections.
TABLES (“DATABASES”)
The basic element of storage in a relational database system is the table. Each table is a homogeneous set of rows of data describing one type of real-world object. In some older systems like dBase, each table was referred to as a database file. Current usage tends more toward considering the database as the set of related tables, rather than calling one table a database. Tables contain the following parts:
Records – Each line in a table is called a record, row, entity, or tuple. For example, each boring or analysis would be a record in the appropriate table. Records are described in more detail below.
Fields – Each data element within a record is called a field, column, or attribute. This represents a significant attribute of a real-world object, such as the elevation of a boring or the measured value of a constituent. Fields are also described in more detail below.
Figure 30 - Join Properties form in Microsoft Access
Relationships – Data in different tables can be related to each other. For example, each analysis is related to a specific sample, which in turn is related to a specific boring. Relationships are usually based on key fields. The database manager can help in enforcing relationships using referential integrity, which requires that defined relationships be fulfilled according to the join type. Using this capability, it would be impossible to have an analysis for which there is no sample.
Join types – A relationship between two tables is defined by a join. There are two kinds of joins, inner joins and outer joins. In an inner join, matching records must be present on both sides of the join. That means that if one of the tables has records that have no matching records in the other, they are not displayed. An outer join allows unmatched records to be displayed. It can be a left join or a right join, depending on which table will have unmatched records displayed.
Figure 30 shows an example of defining an outer join in Access. In this example, a query has been created with the Sites and Stations tables. The join based on the SiteNumber field has been defined as an outer join, with all records from the Sites table being displayed, even if there are no corresponding records in the Stations table. This outer join is a left join.
Figure 31 shows the result of this query. There are stations for Rad Industries and Forest Products Co., but none for Refining, Inc. Because of the outer join there is a record displayed for Refining, Inc. even though there are no stations.
Figure 31 - Result of an outer join query
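The difference between the two join types can be reproduced outside Access. The sketch below uses Python's built-in sqlite3 module rather than Access or SQL Server, so the SQL dialect is SQLite's. The Sites and Stations tables and the SiteNumber field follow the example above, but the station names (MW-1, MW-2, SP-1) and the site numbers are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Access example (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sites (SiteNumber INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE Stations (StationName TEXT, SiteNumber INTEGER);
    INSERT INTO Sites VALUES (1, 'Rad Industries');
    INSERT INTO Sites VALUES (2, 'Forest Products Co.');
    INSERT INTO Sites VALUES (3, 'Refining, Inc.');
    -- Station names are hypothetical; Refining, Inc. has no stations.
    INSERT INTO Stations VALUES ('MW-1', 1);
    INSERT INTO Stations VALUES ('MW-2', 1);
    INSERT INTO Stations VALUES ('SP-1', 2);
""")

# Inner join: only sites that have at least one matching station appear.
inner_rows = conn.execute("""
    SELECT Sites.SiteName, Stations.StationName
    FROM Sites INNER JOIN Stations
         ON Sites.SiteNumber = Stations.SiteNumber
    ORDER BY Sites.SiteNumber, Stations.StationName
""").fetchall()

# Left outer join: every site appears, with None where no station matches.
left_rows = conn.execute("""
    SELECT Sites.SiteName, Stations.StationName
    FROM Sites LEFT JOIN Stations
         ON Sites.SiteNumber = Stations.SiteNumber
    ORDER BY Sites.SiteNumber, Stations.StationName
""").fetchall()

for row in left_rows:
    print(row)
```

In the left join result, Refining, Inc. appears once with an empty station name, which corresponds to the extra record described for Figure 31.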
FIELDS (COLUMNS)
The fields within each record contain the data of each specific kind within that record. These are analogous to the way columns are often used in a spreadsheet, or the blanks to be filled out on a paper form.
Data types – Each field has a data type, such as numeric (several possible types), character, date/time, yes/no, object, etc. The data type limits the content of the field to just that kind of data, although character fields can contain numbers and dates. You shouldn’t store numbers in a character field, though, if you want to treat them as numbers, such as performing arithmetic calculations on them.
Character fields are the most common type of field. They may include letters, numbers, punctuation marks, and any other printable characters. Some typical character fields would be SiteName, SampleType, and so on.
Numeric is for numbers on which calculations will be performed. They may be either positive or negative, and may include a decimal point. Numeric fields that might be found in an EDMS are GroundElevation, SampleTop, etc. Some systems break numbers down further into integer and floating point numbers of various degrees of precision. Generally this is only important if you are writing software, and less important if you are using commercial programs.
It is important to note that Microsoft programs such as Excel and Access have an annoying feature (bug) that refuses to save trailing zeros, which are very important in tracking precision. If you open a new spreadsheet in Excel, type in 3.10, and press Enter, the zero will go away. You can change the formatting to get it back, but it’s not stored with the number. The best way around this is to store the number of decimals with each result value, and then format the number when it is displayed.
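The workaround of storing the number of decimals alongside each value can be sketched as follows. The function names decimals_in and format_result are hypothetical, and the sketch assumes the laboratory reports each value as plain decimal text (no exponents or thousands separators).

```python
# Illustration only: keep the reported precision next to the numeric value.

def decimals_in(reported_text: str) -> int:
    """Hypothetical helper: count digits after the decimal point as reported."""
    if "." not in reported_text:
        return 0
    return len(reported_text.split(".")[1])


def format_result(value: float, decimals: int) -> str:
    """Hypothetical helper: rebuild the display text from value and decimals."""
    return f"{value:.{decimals}f}"


reported = "3.10"                   # text as delivered by the laboratory
value = float(reported)             # numeric storage drops the trailing zero
decimals = decimals_in(reported)    # stored in a separate field

print(value)                           # the bare number has lost its zero
print(format_result(value, decimals))  # the display text gets it back
```

Storing the count as its own field keeps the value numeric, so it can still be used in calculations, while the reported precision survives for display.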
Date is pretty obvious. Arithmetic calculations can often be performed on dates. For example, the fields SampleDate and AnalysisDate could be included in a table, and could be subtracted from each other to find the holding time. Date fields in older systems are often 8 characters long (MM/DD/YY), while more modern, year 2000 compliant systems are 10 characters (MM/DD/YYYY).
There is some variability in the way that time is handled in data management systems. In some database files, such as dBase and FoxPro dbf files, date and time are stored in separate fields. In others, such as Access mdb files, both can be stored in one field, with the whole number representing the date and the decimal component containing the time. The dates in Access are stored as the number of days since 12/30/1899, and times as the fraction of the day starting at midnight, such that 0.5 is noon.
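The day-count storage scheme described for Access can be illustrated with Python's datetime module. This is a sketch under stated assumptions: the function names are invented, only non-negative serial values are handled, and the sample and analysis dates are made-up examples used to show a holding time calculation.

```python
from datetime import datetime, timedelta

# Day zero of the Access date scheme described above: 12/30/1899.
ACCESS_EPOCH = datetime(1899, 12, 30)


def from_serial(serial: float) -> datetime:
    """Hypothetical helper: convert a day-count value to a date and time."""
    return ACCESS_EPOCH + timedelta(days=serial)


def to_serial(moment: datetime) -> float:
    """Hypothetical helper: convert a date and time to a day-count value."""
    return (moment - ACCESS_EPOCH) / timedelta(days=1)


# Whole number is the date, fraction is the time of day (0.5 is noon).
noon_new_year = from_serial(36526.5)
print(noon_new_year)

# Holding time: subtract the sample date from the analysis date.
sample_date = datetime(2000, 1, 1)       # invented example dates
analysis_date = datetime(2000, 1, 8)
holding_days = (analysis_date - sample_date).days
print(holding_days)
```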
The way dates are displayed traditionally varies from one part of the world to another, so as we go global, be careful. On Windows computers, the date display format is set in the operating system under Start/Settings/Control Panel/Regional Settings.
Logical represents a yes/no (true/false) value. Logical fields are one byte long (although it actually takes only one bit to store a logical value). ConvertedValue could be a logical field that is true or false based on whether or not a value in the database has been converted from its original units.
Data domain – Data within each field can be limited to a certain range. For example, pH could be limited to the range of 0 to 14. Comprehensive domain checking can be difficult to implement effectively, since in a normalized data model, pH is not stored in its own field, but in the same Value field that stores sulfate and benzene, which certainly can exceed 14. That means that this type of domain analysis usually requires programming.
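A minimal sketch of the kind of programming this implies is shown below: because all results share one Value field, the allowed range has to be looked up by parameter name. The LIMITS table, the function name, and the sample rows are all hypothetical; only the pH range of 0 to 14 comes from the text.

```python
# Illustration only: parameter-specific domain checking on a shared Value field.

# Allowed range per parameter; parameters not listed are not range-checked.
LIMITS = {"pH": (0.0, 14.0)}


def out_of_domain(rows):
    """Hypothetical helper: return (parameter, value) pairs outside their range."""
    problems = []
    for parameter, value in rows:
        limits = LIMITS.get(parameter)
        if limits is None:
            continue                      # no domain defined for this parameter
        low, high = limits
        if not (low <= value <= high):
            problems.append((parameter, value))
    return problems


# Invented rows from a normalized results table: (Parameter, Value).
results = [
    ("pH", 7.2),
    ("pH", 15.3),        # impossible pH, should be flagged
    ("Sulfate", 250.0),  # exceeds 14 but is a legitimate sulfate value
    ("Benzene", 0.005),
]

print(out_of_domain(results))
```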
Value – Each field has a value, which can be some measured amount, some text attribute, etc. It is also possible that the value may be unknown or not exist, in which case the value can be set to Null. Be aware, however, that Null is not the same as zero, and is treated differently by the software.
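How differently Null is treated can be seen in a small experiment. The sketch uses Python's built-in sqlite3 module as a stand-in for Access or SQL Server, so the details are SQLite's, although skipping Null in aggregates is standard SQL behavior. The Results table and its three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Results (SampleID INTEGER, Value REAL)")
# One measured value, one true zero, and one unknown (Null) value.
conn.executemany(
    "INSERT INTO Results VALUES (?, ?)",
    [(1, 2.0), (2, 0.0), (3, None)],
)

# COUNT(*) counts rows; COUNT(Value) and AVG(Value) skip the Null.
total_rows, known_values, average = conn.execute(
    "SELECT COUNT(*), COUNT(Value), AVG(Value) FROM Results"
).fetchone()

# Comparing Null with zero is neither true nor false; the result is Null.
null_equals_zero = conn.execute("SELECT NULL = 0").fetchone()[0]

print(total_rows, known_values, average, null_equals_zero)
```

If the unknown value had been entered as zero instead of Null, the average would have been computed over three values rather than two and would have come out lower.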
Figure 32 - Oracle screen for setting field properties
Key fields – Within each table there should be one or more fields that make each record in the table unique. This might be some real-world attribute (such as laboratory sample number) or a synthetic key such as a counter assigned by the data management system. A primary key has a unique value for each record in the table. A field in one table that is a primary key in another table is called a foreign key, and need not be unique, such as on the “many” side of a one-to-many relationship. Simple keys, which are made up of one field, are usually preferable to compound keys made up of more than one field. Compound keys, and in fact any keys based on real data, are usually poor choices because they depend on the data, which may change.
Figure 32 shows an Oracle screen for setting field properties.
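Primary keys, foreign keys, and the referential integrity mentioned under Relationships can be tried out with the sketch below. It uses Python's built-in sqlite3 module, not Oracle or Access, and SQLite only enforces foreign keys after the PRAGMA shown. The table layout, the LabSampleNumber value, and the parameter names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce keys

# SampleID is a synthetic primary key (a counter assigned by the database).
conn.execute("""
    CREATE TABLE Samples (
        SampleID INTEGER PRIMARY KEY,
        LabSampleNumber TEXT
    )
""")
# Analyses.SampleID is a foreign key: it repeats on the "many" side.
conn.execute("""
    CREATE TABLE Analyses (
        AnalysisID INTEGER PRIMARY KEY,
        SampleID INTEGER NOT NULL REFERENCES Samples(SampleID),
        Parameter TEXT
    )
""")

cursor = conn.execute("INSERT INTO Samples (LabSampleNumber) VALUES ('L-1001')")
sample_id = cursor.lastrowid               # counter value assigned by SQLite

# Two analyses for the same sample: the foreign key value is not unique.
conn.execute("INSERT INTO Analyses (SampleID, Parameter) VALUES (?, 'pH')", (sample_id,))
conn.execute("INSERT INTO Analyses (SampleID, Parameter) VALUES (?, 'Sulfate')", (sample_id,))

# An analysis for a sample that does not exist is refused.
try:
    conn.execute("INSERT INTO Analyses (SampleID, Parameter) VALUES (999, 'Benzene')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

analysis_count = conn.execute("SELECT COUNT(*) FROM Analyses").fetchone()[0]
print(sample_id, analysis_count, orphan_rejected)
```

Because the key is a counter rather than the laboratory sample number, the laboratory number can later be corrected without touching any of the related analysis records.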
RECORDS (ROWS)
Once the tables and fields have been defined, the data is usually entered one record at a time. Each well in the Stations table or groundwater sample in the Samples table is a record. Often the size of a database is described by the number of records in its tables.
QUERIES (VIEWS)
In Access, data manipulation is done using queries. Queries are based on SQL, and are given names and stored as objects, just like tables. The output of a query can be viewed directly in an editable, spreadsheet-like view, or can be used as the basis of a form or a report. Access has six types of queries:
Select – This is the basic data retrieval query.
Cross-tab – This is a specialized query for summarizing data.
Figure 33 - Simple data editing form
Make table – This query is used to retrieve data and place it into a new table.
Update – This query changes data in an existing table.
Append – This query type adds records to an existing table.
Delete – These queries remove records from a table, and should be used with great care!
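The six query types listed above each correspond to a kind of SQL statement. The sketch below shows rough equivalents using Python's built-in sqlite3 module, so the syntax is SQLite's and not Access SQL. In particular, the cross-tab is approximated with conditional aggregation and the make-table query with CREATE TABLE ... AS SELECT, both of which are substitutions made for this example. The Results and Archive tables and their contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Results (Station TEXT, Parameter TEXT, Value REAL)")
conn.executemany("INSERT INTO Results VALUES (?, ?, ?)", [
    ("MW-1", "pH", 7.1), ("MW-1", "Sulfate", 250.0),
    ("MW-2", "pH", 6.8), ("MW-2", "Sulfate", 310.0),
])

# Select: basic data retrieval.
ph_rows = conn.execute(
    "SELECT Station, Value FROM Results WHERE Parameter = 'pH' ORDER BY Station"
).fetchall()

# Cross-tab (approximation): one row per station, one column per parameter.
crosstab = conn.execute("""
    SELECT Station,
           MAX(CASE WHEN Parameter = 'pH' THEN Value END) AS pH,
           MAX(CASE WHEN Parameter = 'Sulfate' THEN Value END) AS Sulfate
    FROM Results GROUP BY Station ORDER BY Station
""").fetchall()

# Make table: retrieve data and place it into a new table.
conn.execute("CREATE TABLE Archive AS SELECT * FROM Results WHERE Station = 'MW-1'")

# Update: change data in an existing table.
conn.execute("UPDATE Archive SET Value = 7.2 WHERE Parameter = 'pH'")

# Append: add records to an existing table.
conn.execute("INSERT INTO Archive SELECT * FROM Results WHERE Station = 'MW-2'")

# Delete: remove records, always with a WHERE clause unless all must go.
conn.execute("DELETE FROM Archive WHERE Parameter = 'Sulfate'")

archive_rows = conn.execute(
    "SELECT Station, Parameter, Value FROM Archive ORDER BY Station"
).fetchall()
print(ph_rows, crosstab, archive_rows)
```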
OTHER DATABASE OBJECTS
The other types of database objects in an Access system are forms, reports, macros, and modules. Forms and reports are for entering and displaying data, while macros and modules are for automating operations.
Forms
Forms in data management programs such as Access are generally used for entering, editing, or selecting data, although they can also be used as menus for selecting an activity. Forms for working with data use a table or a query as a data source.
Figure 34 - Advanced data editing form