CHAPTER 5 Environmental Monitoring
5.1 Introduction
The increasing worldwide concern about threats to the natural environment, on both a local and a global scale, has led to the introduction of many monitoring schemes that are intended to provide an early warning of violations of quality control systems, to detect the effects of major events such as accidental oil spills or the illegal disposal of wastes, and to study long-term trends or cycles in key environmental variables.
Examples of some of the national monitoring schemes that are now operating are the United States Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP), based on 12,600 hexagons each with an area of 40 square kilometres; the United Kingdom Environmental Change Network (ECN), based on nine sites; and the Swedish Environmental Monitoring Program, based on 20 sites. In all three of these schemes a large number of variables are recorded on a regular basis to describe physical aspects of the land, water and atmosphere, and the abundance of many species of animals and plants. Around the world numerous smaller scale monitoring schemes are also operated for particular purposes, such as to ensure that the quality of drinking water is adequate.
Monitoring schemes to detect unexpected changes and trends are essentially repeated surveys. The sampling methods described in Chapter 2 are therefore immediately relevant. In particular, if the mean value of a variable for the sample units in a geographical area is of interest, then the population of units should be randomly sampled so that the accuracy of estimates can be assessed in the usual way. Modifications of simple random sampling such as stratified sampling may well be useful to improve efficiency.
The requirements of environmental monitoring schemes have led
to an interest in special types of sampling designs that include aspects
of random sampling, good spatial cover, and the gradual replacement
of sampling sites over time (Skalski, 1990; Stevens and Olsen, 1991;
Overton et al., 1991; Urquhart et al., 1993; Conquest and Ralph,
1998). Designs that are optimum in some sense have also been
developed (Fedorov and Mueller, 1989; Caselton et al., 1992).
Although monitoring schemes sometimes require fairly complicated designs, as a general rule it is a good idea to keep designs as simple as possible so that they are easily understood by administrators and the public. Simple designs also make it easier to use the data for purposes that were not foreseen in the first place, which is something that will often occur. As noted by Overton and Stehman (1995, 1996), complex sample structures create potentially serious difficulties that do not exist with simple random sampling.
5.2 Purposely Chosen Monitoring Sites
For practical reasons the sites for long-term monitoring programs are often not randomly chosen. For example, Cormack (1994) notes that the nine sites for the United Kingdom ECN were chosen on the basis of having:
(a) a good geographical distribution covering a wide range of environmental conditions and the principal natural and managed ecosystems;
(b) some guarantee of long-term physical and financial security;
(c) a known history of consistent management;
(d) reliable and accessible records of past data, preferably for ten or more years; and
(e) sufficient size to allow the opportunity for further experiments and observations.
In this scheme it is assumed that the initial status of sites can be allowed for by only considering time changes. These changes can then be related to differences between the sites in terms of measured meteorological variables and known geographical differences.
5.3 Two Special Monitoring Designs
Skalski (1990) suggested a rotating panel design with augmentation for long-term monitoring. This takes the form shown in Table 5.1 if there are eight sites that are visited every year and four sets of ten sites that are rotated. Site set 7, for example, consists of ten sites that are visited in years 4 to 7 of the study. The number of sites in different sets is arbitrary. Preferably, the sites will be randomly chosen from an appropriate population of sites. This design has some appealing properties: the sites that are always measured can be used to detect long-term trends, but the rotation of blocks of ten sites ensures that the study is not too dependent on an initial choice of sites that may be unusual in some respects.
Table 5.1 Skalski's (1990) rotating panel design with augmentation. Every year 48 sites are visited. Of these, 8 are always the same and the other 40 sites are in four blocks of size ten, such that each block of ten remains in the sample for four years after the initial start-up period. The number of sites in different sets is at choice in a design of this form. Sites should be randomly selected from an appropriate population.
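The visiting pattern described for Table 5.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, set 0 is taken to be the eight permanent sites, and rotating set k is assumed to be visited in years max(1, k - 3) to k, which agrees with the statement that set 7 is visited in years 4 to 7 but guesses at the numbering of the start-up sets.

```python
def rotating_panel_sets(year, n_rotating=4):
    """Site sets visited in a given year (years numbered from 1).

    Set 0 holds the permanent sites. Assumption: year j visits the
    rotating sets j, j + 1, ..., j + n_rotating - 1.
    """
    if year < 1:
        raise ValueError("year must be 1 or greater")
    return [0] + list(range(year, year + n_rotating))


def years_visited(site_set, n_rotating=4):
    """Years in which a rotating site set (numbered from 1) is visited."""
    first = max(1, site_set - n_rotating + 1)
    return list(range(first, site_set + 1))


def sites_per_year(year, n_permanent=8, set_size=10, n_rotating=4):
    """Total number of sites visited in a year."""
    n_rotating_sets = len(rotating_panel_sets(year, n_rotating)) - 1
    return n_permanent + set_size * n_rotating_sets


for year in range(1, 9):
    print(year, rotating_panel_sets(year), sites_per_year(year))
print("Set 7 is visited in years", years_visited(7))
```

Under these assumptions every year has 8 + 4 x 10 = 48 visits, and a new block of ten sites enters the sample each year.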
Urquhart et al. (1993) compared the efficiency of the designs in [...] number visited every year (i.e., in set 0) ranged from 0 to 48. To do this, they assumed the model
$$Y_{ijk} = S_{i(j)k} + T_j + e_{ijk},$$
where $Y_{ijk}$ is a measure of the condition at site i, in year j, within site set k; $S_{i(j)k}$ is an effect specific to site i, in site set k, in year j; $T_j$ is a year effect common to all sites; and $e_{ijk}$ is a random disturbance. They also allowed for autocorrelation between the overall year effects, and between the repeated measurements at one site. They found the design of Table 5.2 to always be better for estimating the current mean and the slope in a trend because more sites are measured in the first few years of the study. However, in a more recent study which compared the two designs in terms of variance and cost, Lesser and Kalsbeek (1997) concluded that the first design tends to be better for detecting short-term change while the second design tends to be better for detecting long-term change.
Table 5.2 A serially alternating design with augmentation. Every year 48 sites are measured. Of these, eight sites are always the same and the other 40 sites are measured every four years.
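A serially alternating schedule of this kind can be sketched as follows. The set numbering is an assumption made for illustration (set 0 for the eight permanent sites, sets 1 to 4 for the four groups of 40 sites), and the function names are hypothetical.

```python
def serially_alternating_sets(year, n_alternating=4):
    """Site sets visited in a given year (years numbered from 1).

    Set 0 is visited every year. Assumption: alternating set k is visited
    in years k, k + n_alternating, k + 2 * n_alternating, and so on.
    """
    if year < 1:
        raise ValueError("year must be 1 or greater")
    return [0, (year - 1) % n_alternating + 1]


def total_distinct_sites(n_permanent=8, set_size=40, n_alternating=4):
    """Number of different sites ever visited under the design."""
    return n_permanent + set_size * n_alternating


for year in range(1, 9):
    print(year, serially_alternating_sets(year))
print("Distinct sites in the design:", total_distinct_sites())
```

With these numbers each year still has 8 + 40 = 48 visits, but every alternating site returns to the sample at four year intervals.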
The EMAP sample design is based on approximately 12,600 points on a grid, each of which is the centre of a hexagon with area 40 km². The grid is itself within a large hexagonal region covering much of North America, as shown in Figure 5.1. The area covered by the 40 km² hexagons centred on the grid points is one sixteenth of the total area of the conterminous United States, with the area used being chosen after a random shift in the grid. Another aspect of the design is that the four sets of sites that are measured in different years are spatially interpenetrating, as indicated in Figure 5.2. This allows the estimation of parameters for the whole area every year.
Figure 5.1 The EMAP baseline grid for North America. The shaded area shown is covered by about 12,600 small hexagons, with a spacing of 27 km between their centres.
Figure 5.2 The use of spatially interpenetrating samples for visits at four year intervals.
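The figures quoted for the EMAP grid can be checked against each other with a short calculation. The sketch below assumes the grid points form a regular triangular lattice, so that each point owns a hexagonal tile of area (sqrt(3)/2) d² for a spacing d; the function name is hypothetical and the 27 km spacing is treated as a rounded value.

```python
import math


def hexagonal_cell_area(spacing_km):
    """Area of the hexagonal tile owned by each point of a triangular lattice."""
    return math.sqrt(3) / 2 * spacing_km ** 2


spacing_km = 27.0        # spacing between hexagon centres (Figure 5.1)
hexagon_area_km2 = 40.0  # area of each sampled hexagon

cell_area = hexagonal_cell_area(spacing_km)
fraction = hexagon_area_km2 / cell_area
print(f"area per grid point: {cell_area:.0f} km^2")
print(f"fraction sampled: {fraction:.4f} (1/16 = {1 / 16:.4f})")
```

Under this assumption the sampled fraction comes out close to the one sixteenth stated in the text, with the small difference attributable to rounding of the spacing.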
5.4 Designs Based on Optimization
One approach to the design of monitoring schemes is by choosing the sites so that the amount of information is in some sense maximized. The main question then is how to measure the information that is to be maximized, particularly if the monitoring scheme has a number of different objectives, some of which will only become known in the future.
One possibility involves choosing a network design, or adding or subtracting stations to minimize entropy, where low entropy corresponds to high 'information' (Caselton et al., 1992). The theory is complex, and needs more prior information than will usually be available, particularly if there is no existing network to provide this.
Another possibility considers the choice of a network design to be a problem of the estimation of a regression function for which a classical theory of optimal design exists (Fedorov and Mueller, 1989).
5.5 Monitoring Designs Typically Used
In practice, sample designs for monitoring often consist of selecting a certain number of sites, preferably (but not necessarily) at random from the potential sites in a region, and then measuring the variable of interest at those sites at a number of points in time. A complication is that for one reason or another some of the sites may not be measured at some of the times. A typical set of data will then look like the data in Table 5.3 for pH values measured on lakes in Norway. With this set of data, which is part of the more extensive data that are shown in Table 1.1 and discussed in Example 1.2, the main question of interest is whether there is any evidence for changes from year to year in the general level of pH and, in particular, whether the pH level was tending to increase or decrease.
5.6 Detection of Changes by Analysis of Variance
A relatively simple analysis for data like the Norwegian lake pH values shown in Table 5.3 involves carrying out a two factor analysis of variance, as discussed in Section 3.5. The two factors are then the site and the time. The model for the observation at site i at time j is
$$y_{ij} = \mu + S_i + T_j + e_{ij}, \tag{5.1}$$
where $\mu$ represents an overall general level for the variable being measured, $S_i$ represents the deviation of site i from the general level, $T_j$ represents a time effect, and $e_{ij}$ represents measurement errors and other random variation that is associated with the observation at the site at the particular time.
The model (5.1) does not include a term for the interaction between sites and times as is included in the general two-factor analysis of variance model as defined in equation (4.31). This is because there is at most one observation for a site in a particular year, which means that it is not possible to separate interactions from measurement errors. Consequently, it must be assumed that any interactions are negligible.
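The additive model (5.1) can be fitted by ordinary least squares even when some site and time combinations are missing. The following is a minimal sketch using NumPy, not the procedure of any particular statistical package: the function name `additive_anova` is hypothetical, the pH values are made up for illustration (they are not taken from Table 5.3), and each F-statistic is assumed to be formed by comparing the full model with the model that omits one factor.

```python
import numpy as np


def additive_anova(y):
    """Fit y_ij = mu + S_i + T_j + e_ij by least squares; NaN marks a missing cell.

    Assumes every site and every time has at least one observation.
    """
    y = np.asarray(y, dtype=float)
    n_sites, n_times = y.shape
    site_idx, time_idx = np.nonzero(~np.isnan(y))
    obs = y[site_idx, time_idx]

    def design(use_sites, use_times):
        cols = [np.ones(obs.size)]
        if use_sites:
            # Dummy columns for all sites except the first (reference) site.
            cols += [(site_idx == i).astype(float) for i in range(1, n_sites)]
        if use_times:
            cols += [(time_idx == j).astype(float) for j in range(1, n_times)]
        return np.column_stack(cols)

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
        resid = obs - X @ beta
        return float(resid @ resid)

    rss_full = rss(design(True, True))
    df_resid = obs.size - (1 + (n_sites - 1) + (n_times - 1))
    ms_resid = rss_full / df_resid
    # Extra sum of squares explained by each factor, given the other.
    f_sites = (rss(design(False, True)) - rss_full) / (n_sites - 1) / ms_resid
    f_times = (rss(design(True, False)) - rss_full) / (n_times - 1) / ms_resid
    return {"rss": rss_full, "df_resid": df_resid,
            "F_sites": f_sites, "F_times": f_times}


# Hypothetical pH values: four lakes (rows) by three years (columns).
ph = [
    [4.6, 4.7, np.nan],
    [5.3, 5.2, 5.4],
    [6.1, 6.3, 6.2],
    [4.9, np.nan, 5.0],
]
result = additive_anova(ph)
print(result)
```

Because there is no interaction term, the residual degrees of freedom are the number of observations minus the number of fitted parameters, here 10 - 6 = 4.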
Example 5.1 Analysis of Variance on the pH Values
The results of an analysis of variance on the pH values for Norwegian lakes are summarised in Table 5.4. The results in this table were obtained using the MINITAB package (Minitab Inc., 1994) using an option that takes into account the missing values, although many other standard statistical packages could have been used just as well. The effects in the model were assumed to be fixed rather than random (as discussed in Section 4.5), although since interactions are assumed to be negligible the same results would be obtained using random effects. It is found that there is a significant difference between the lakes (p = 0.000) and a nearly significant difference between the years (p = 0.061). Therefore there is no very strong evidence from this analysis of differences between years.
To check the assumptions of the analysis, standardized residuals (the differences between the actual observations and those predicted by the model, divided by their standard deviations) can be plotted against the lake, the year, and against their position in space for each of the four years. These plots are shown in Figures 5.3 and 5.4. These residuals show no obvious patterns so that the model seems satisfactory, except that there are one or two residuals that are rather large.
Table 5.3 Values for pH for lakes in southern Norway with the latitudes (Lat) and longitudes (Long) for the lakes.
Table 5.4 Analysis of variance table for the data on pH levels in Norwegian lakes.
Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F    Significance level (p)
Figure 5.3 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the lake number and the year number.
5.7 Detection of Changes Using Control Charts
Control charts are used to monitor industrial processes (Montgomery, 1991) and they can be used equally well with environmental data. The simplest approach involves using an $\bar{x}$ chart to detect changes in a process mean, together with a range chart to detect changes in the amount of variation. These types of charts are often called Shewhart control charts after their originator (Shewhart, 1931).
Typically, the starting point is a moderately large set of data consisting of M random samples of size n, where these are taken at equally spaced intervals of time from the output of the process. This set of data is then used to estimate the process mean and standard deviation, and hence to construct the two charts. The data are then plotted on the charts. It is usually assumed that the observations are normally distributed.
If the process seems to have a constant mean and standard deviation, then the sampling of the process is continued with new points being plotted to monitor whatever is being measured. If the mean or standard deviation does not seem to have been constant for the time when the initial samples were taken, then in the industrial process situation, action is taken to bring the process under control. With environmental monitoring this may not be possible. However, the knowledge that the process being measured is not stable will be of interest anyway.
Figure 5.4 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the locations of the lakes. The standardized residuals are rounded to the nearest integer for clarity.
The method for constructing the $\bar{x}$ chart involves the following stages:
1. The sample mean and the sample range (the maximum value in a sample minus the minimum value in a sample) are calculated for each of the M samples. For the ith sample let these values be denoted by $\bar{x}_i$ and $R_i$.
2. The mean of the variable being measured is assumed to be constant, and is estimated by the overall mean of all the available observations, which is also just the mean of the sample means $\bar{x}_1$ to $\bar{x}_M$. Let the estimated mean be denoted by $\hat{\mu}$.
3. Similarly, the standard deviation of the variable is assumed to have remained constant, and this is estimated on the basis of a known relationship between the mean range for samples of size n and the standard deviation for samples from a normal distribution. This relationship is of the form $\sigma = k(n)\mu_R$, where $\mu_R$ is the mean range for samples of size n, and the constant k(n) is given in Table 5.5. Thus the estimated standard deviation is
$$\hat{\sigma} = k(n)\bar{R},$$
where $\bar{R}$ is the mean of the sample ranges.
4. The standard error of the mean for samples of size n is estimated to be $\widehat{SE}(\bar{x}) = \hat{\sigma}/\sqrt{n}$.
5. Warning limits are set at the mean plus and minus 1.96 standard errors, i.e., at $\hat{\mu} \pm 1.96\,\widehat{SE}(\bar{x})$. If the mean and standard deviation are constant then only about 1 in 20 (5%) sample means should be outside one of these limits. Action limits are set at the mean plus and minus 3.09 standard errors, i.e., at $\hat{\mu} \pm 3.09\,\widehat{SE}(\bar{x})$. Only about 1 in 500 (0.2%) sample means should plot outside these limits.
The rationale behind constructing the $\bar{x}$ chart in this way is that it shows the changes in the sample means with time, and the warning and action limits indicate whether these changes are too large to be due to normal random variation if the mean is in fact constant.
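The five stages can be sketched as a small Python function. The name `xbar_chart_limits` and the layout of the returned dictionary are hypothetical choices for illustration; the factor k(n) has to be supplied from Table 5.5. The example call uses only the first three monthly samples of Table 5.6 with k(5) = 0.43, so its limits are not those of the full data set.

```python
import math


def xbar_chart_limits(samples, k_n):
    """Shewhart x-bar chart limits from M samples of equal size n.

    samples: list of M lists, each holding the n observations of one sample.
    k_n: factor converting the mean range to a standard deviation (Table 5.5).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")
    means = [sum(s) / n for s in samples]          # stage 1: sample means
    ranges = [max(s) - min(s) for s in samples]    # stage 1: sample ranges
    mu_hat = sum(means) / len(means)               # stage 2: overall mean
    r_bar = sum(ranges) / len(ranges)
    sigma_hat = k_n * r_bar                        # stage 3: sigma from mean range
    se = sigma_hat / math.sqrt(n)                  # stage 4: standard error
    return {                                       # stage 5: limits
        "mean": mu_hat,
        "mean_range": r_bar,
        "sigma": sigma_hat,
        "se": se,
        "warning": (mu_hat - 1.96 * se, mu_hat + 1.96 * se),
        "action": (mu_hat - 3.09 * se, mu_hat + 3.09 * se),
    }


# First three monthly samples of Table 5.6 (January to March 1989).
samples = [
    [7.27, 7.10, 7.02, 7.23, 8.08],
    [8.04, 7.74, 7.48, 8.10, 7.21],
    [7.50, 7.40, 8.33, 7.17, 7.95],
]
chart = xbar_chart_limits(samples, k_n=0.43)
for name, value in chart.items():
    print(name, value)
```

A new sample mean would then be plotted and compared with the `warning` and `action` intervals.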
Table 5.5 Control chart limits for sample ranges, assuming samples from normal distributions. To find the limits on the range chart, multiply the mean range by the tabulated value. For example, for samples of size n = 5 the lower action limit is $0.16\mu_R$, where $\mu_R$ is the mean range. With a stable distribution a warning limit is crossed with probability 0.05 (5%) and an action limit with probability 0.002 (0.2%). The last column is the factor that the mean range must be multiplied by to obtain the standard deviation. For example, for samples of size 3 the standard deviation is $0.591\mu_R$. Source: Tables G1 and G2 of Davies and Goldsmith (1972).
Sample size    Lower action    Lower warning    Upper warning    Upper action    Factor (k)
is stable. Similarly, action limits can be placed so that the probability of crossing one of them is 0.002 (0.2%) when the level of variation is stable. The setting of these limits requires the use of tabulated values that are provided and explained in Table 5.5.
Control charts can be produced quite easily in a spreadsheet program. Alternatively, statistical packages such as MINITAB (Minitab Inc., 1994) have options to produce the charts, and often allow a number of other types of control charts to be produced as well.
Example 5.2 Monitoring pH in New Zealand
monitoring of rivers in the South Island of New Zealand. Values are provided for pH, for five randomly chosen rivers, with a different selection for each of the monthly sample times from January 1989 to December 1997. The data are used to construct control charts for monitoring pH over the sampled time. As shown in Figure 5.5, the distribution is reasonably close to normal.
The overall mean of the pH values for all the samples is $\hat{\mu} = 7.640$. This is used as the best estimate of the process mean. The mean of the sample ranges is $\bar{R} = 0.694$. From Table 5.5, the factor to convert this to an estimate of the process standard deviation is k(5) = 0.43. The estimated standard deviation is therefore
$$\hat{\sigma} = 0.43 \times 0.694 \approx 0.298.$$
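Applying stages 4 and 5 of the construction to these estimates gives the limits for the chart of sample means. The arithmetic below is a worked sketch computed from the quoted values of the overall mean, the mean range and k(5), with results rounded to two or three decimal places.

```latex
\begin{align*}
\widehat{SE}(\bar{x}) &= \frac{k(5)\,\bar{R}}{\sqrt{5}}
  = \frac{0.43 \times 0.694}{\sqrt{5}} \approx 0.133, \\
\text{warning limits:}\quad \hat{\mu} \pm 1.96\,\widehat{SE}(\bar{x})
  &\approx 7.640 \pm 0.26, \ \text{i.e., about } 7.38 \text{ and } 7.90, \\
\text{action limits:}\quad \hat{\mu} \pm 3.09\,\widehat{SE}(\bar{x})
  &\approx 7.640 \pm 0.41, \ \text{i.e., about } 7.23 \text{ and } 8.05.
\end{align*}
```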
The range chart is shown in Figure 5.6(b). Using the factors for setting the action and warning limits from Table 5.5, these limits are at 0.16 × 0.694 = 0.11 (lower action limit), 0.37 × 0.694 = 0.26 (lower warning limit), 1.81 × 0.694 = 1.26 (upper warning limit), and 2.36 × 0.694 = 1.64 (upper action limit). Due to the nature of the distribution of sample ranges, these limits are not symmetrically placed about the mean.
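The same multiplications are easy to script. In this sketch the four factors for n = 5 are those quoted in the text from Table 5.5, while the dictionary layout and the names `range_chart_limits` and `classify_range` are hypothetical.

```python
# Range-chart factors for samples of size n = 5, as quoted from Table 5.5.
RANGE_FACTORS_N5 = {
    "lower action": 0.16,
    "lower warning": 0.37,
    "upper warning": 1.81,
    "upper action": 2.36,
}


def range_chart_limits(mean_range, factors):
    """Multiply the mean range by each tabulated factor to get the chart limits."""
    return {name: factor * mean_range for name, factor in factors.items()}


def classify_range(sample_range, limits):
    """Report whether a sample range is in control or beyond a warning or action limit."""
    if sample_range < limits["lower action"] or sample_range > limits["upper action"]:
        return "action"
    if sample_range < limits["lower warning"] or sample_range > limits["upper warning"]:
        return "warning"
    return "in control"


limits = range_chart_limits(0.694, RANGE_FACTORS_N5)
for name, value in limits.items():
    print(f"{name}: {value:.2f}")

# Ranges for February 1990, April 1990 and June 1990 from Table 5.6.
for r in (1.69, 0.17, 0.70):
    print(r, classify_range(r, limits))
```

For instance, the range of 1.69 recorded for February 1990 lies above the upper action limit, whereas the range of 0.17 for April 1990 falls between the lower action and lower warning limits.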
With 108 observations altogether, it is expected that about five points will plot outside the warning limits. In fact, there are ten points outside these limits, and one point outside the lower action limit. It appears, therefore, that the mean pH level in the South Island of New Zealand was changing to some extent during the monitored period, with the overall plot suggesting that the pH level was high for about the first two years, and was lower from then on.
The range chart has seven points outside the warning limits, and one point above the upper action limit. Here again there is evidence that the process variation was not constant, with occasional 'spikes' of high variability, particularly in the early part of the monitoring period.
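The comparison between observed and expected numbers of points outside the limits can be made a little more concrete. The sketch below assumes that the 108 monthly samples are independent and that, for a stable process, each point crosses a warning limit with probability 0.05 and an action limit with probability 0.002; the helper `binomial_tail` is a hypothetical name and the binomial calculation is an added illustration, not part of the original analysis.

```python
import math


def binomial_tail(n, p, k):
    """P(X >= k) for X distributed as Binomial(n, p)."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k, n + 1))


n_samples = 108  # monthly samples, January 1989 to December 1997
expected_warning = 0.05 * n_samples
expected_action = 0.002 * n_samples
# Chance of seeing ten or more points outside the warning limits
# if the process were stable (under the independence assumption).
p_ten_or_more = binomial_tail(n_samples, 0.05, 10)

print(f"expected outside warning limits: {expected_warning:.1f}")
print(f"expected outside action limits: {expected_action:.2f}")
print(f"P(10 or more outside warning limits): {p_ten_or_more:.3f}")
```

A small tail probability here supports the reading that ten points outside the warning limits is more than random variation alone would usually produce.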
Table 5.6 Values for pH for five randomly selected rivers in the South Island of New Zealand, for each month from January 1989 to December 1997. This is part of a larger data set provided by Graham McBride, National Institute of Water and Atmospheric Research, Hamilton, New Zealand.
Year  Month  pH values (five rivers)         Mean  Range
1989  Jan    7.27  7.10  7.02  7.23  8.08    7.34  1.06
      Feb    8.04  7.74  7.48  8.10  7.21    7.71  0.89
      Mar    7.50  7.40  8.33  7.17  7.95    7.67  1.16
      Apr    7.87  8.10  8.13  7.72  7.61    7.89  0.52
      May    7.60  8.46  7.80  7.71  7.48    7.81  0.98
      Jun    7.41  7.32  7.42  7.82  7.80    7.55  0.50
      Jul    7.88  7.50  7.45  8.29  7.45    7.71  0.84
      Aug    7.88  7.79  7.40  7.62  7.47    7.63  0.48
      Sep    7.78  7.73  7.53  7.88  8.03    7.79  0.50
      Oct    7.14  7.96  7.51  8.19  7.70    7.70  1.05
      Nov    8.07  7.99  7.32  7.32  7.63    7.67  0.75
      Dec    7.21  7.72  7.73  7.91  7.79    7.67  0.70
1990  Jan    7.66  8.08  7.94  7.51  7.71    7.78  0.57
      Feb    7.71  8.73  8.18  7.04  7.28    7.79  1.69
      Mar    7.72  7.49  7.62  8.13  7.78    7.75  0.64
      Apr    7.84  7.67  7.81  7.81  7.80    7.79  0.17
      May    8.17  7.23  7.09  7.75  7.40    7.53  1.08
      Jun    7.79  7.46  7.13  7.83  7.77    7.60  0.70
      Jul    7.16  8.44  7.94  8.05  7.70    7.86  1.28
      Aug    7.74  8.13  7.82  7.75  7.80    7.85  0.39
      Sep    8.09  8.09  7.51  7.97  7.94    7.92  0.58
      Oct    7.20  7.65  7.13  7.60  7.68    7.45  0.55
      Nov    7.81  7.25  7.80  7.62  7.75    7.65  0.56
      Dec    7.73  7.58  7.30  7.78  7.11    7.50  0.67
1991  Jan    8.52  7.22  7.91  7.16  7.87    7.74  1.36
      Feb    7.13  7.97  7.63  7.68  7.90    7.66  0.84
      Mar    7.22  7.80  7.69  7.26  7.94    7.58  0.72
      Apr    7.62  7.80  7.59  7.37  7.97    7.67  0.60
      May    7.70  7.07  7.26  7.82  7.51    7.47  0.75
      Jun    7.66  7.83  7.74  7.29  7.30    7.56  0.54
      Jul    7.97  7.55  7.68  8.11  8.01    7.86  0.56
      Aug    7.86  7.13  7.32  7.75  7.08    7.43  0.78
      Sep    7.43  7.61  7.85  7.77  7.14    7.56  0.71
      Oct    7.77  7.83  7.77  7.54  7.74    7.73  0.29
      Nov    7.84  7.23  7.64  7.42  7.73    7.57  0.61
      Dec    8.23  8.08  7.89  7.71  7.95    7.97  0.52
1992  Jan    8.28  7.96  7.86  7.65  7.49    7.85  0.79
      Feb    7.23  7.11  8.53  7.53  7.78    7.64  1.42
      Mar    7.68  7.68  7.15  7.68  7.85    7.61  0.70
      Apr    7.87  7.20  7.42  7.45  7.96    7.58  0.76
      May    7.94  7.35  7.68  7.50  7.12    7.52  0.82
      Jun    7.80  6.96  7.56  7.22  7.76    7.46  0.84
      Jul    7.39  7.12  7.70  7.47  7.74    7.48  0.62
      Aug    7.42  7.41  7.47  7.80  7.12    7.44  0.68
      Sep    7.91  7.77  6.96  8.03  7.24    7.58  1.07
      Oct    7.59  7.41  7.41  7.02  7.60    7.41  0.58
      Nov    7.94  7.32  7.65  7.84  7.86    7.72  0.62
      Dec    7.64  7.74  7.95  7.83  7.96    7.82  0.32
1993  Jan    7.55  8.01  7.37  7.83  7.51    7.65  0.64
      Feb    7.30  7.39  7.03  8.05  7.59    7.47  1.02
      Mar    7.80  7.17  7.97  7.58  7.13    7.53  0.84
      Apr    7.92  8.22  7.64  7.97  7.18    7.79  1.04
      May    7.70  7.80  7.28  7.61  8.12    7.70  0.84
      Jun    7.76  7.41  7.79  7.89  7.36    7.64  0.53
      Jul    8.28  7.75  7.76  7.89  7.82    7.90  0.53
      Aug    7.58  7.84  7.71  7.27  7.95    7.67  0.68
      Sep    7.56  7.92  7.43  7.72  7.21    7.57  0.71
      Oct    7.19  7.73  7.21  7.49  7.33    7.39  0.54
      Nov    7.60  7.49  7.86  7.86  7.80    7.72  0.37
      Dec    7.50  7.86  7.83  7.58  7.45    7.64  0.41
1994  Jan    8.13  8.09  8.01  7.76  7.24    7.85  0.89
      Feb    7.23  7.89  7.81  8.12  7.83    7.78  0.89
      Mar    7.08  7.92  7.68  7.70  7.40    7.56  0.84
      Apr    7.55  7.50  7.52  7.64  7.14    7.47  0.50
      May    7.75  7.57  7.44  7.61  8.01    7.68  0.57
      Jun    6.94  7.37  6.93  7.03  6.96    7.05  0.44
      Jul    7.46  7.14  7.26  6.99  7.47    7.26  0.48
      Aug    7.62  7.58  7.09  6.99  7.06    7.27  0.63
      Sep    7.45  7.65  7.78  7.73  7.31    7.58  0.47
      Oct    7.65  7.63  7.98  8.06  7.51    7.77  0.55
      Nov    7.85  7.70  7.62  7.96  7.13    7.65  0.83
      Dec    7.56  7.74  7.80  7.41  7.59    7.62  0.39
1995  Jan    8.18  7.80  7.22  7.95  7.79    7.79  0.96
      Feb    7.63  7.88  7.90  7.45  7.97    7.77  0.52
      Mar    7.59  8.06  8.22  7.57  7.73    7.83  0.65
      Apr    7.47  7.82  7.58  8.03  8.19    7.82  0.72
      May    7.52  7.42  7.76  7.66  7.76    7.62  0.34
      Jun    7.61  7.72  7.56  7.49  6.87    7.45  0.85
      Jul    7.30  7.90  7.57  7.76  7.72    7.65  0.60
      Aug    7.75  7.75  7.52  8.12  7.75    7.78  0.60
      Sep    7.77  7.78  7.75  7.49  7.14    7.59  0.64
      Oct    7.79  7.30  7.83  7.09  7.09    7.42  0.74
      Nov    7.87  7.89  7.35  7.56  7.99    7.73  0.64
      Dec    8.01  7.56  7.67  7.82  7.44    7.70  0.57