Statistics for Environmental Science and Management - Chapter 5 pps

DOCUMENT INFORMATION

Basic information

Title: Statistics for Environmental Science and Management - Chapter 5 pps
Institution: Chapman & Hall/CRC
Subject: Environmental Science and Management
Type: chapter
Year of publication: 2001
Pages: 30
File size: 1.66 MB

CHAPTER 5 Environmental Monitoring

5.1 Introduction

The increasing worldwide concern about threats to the natural environment, both on a local and a global scale, has led to the introduction of many monitoring schemes that are intended to provide an early warning of violations of quality control systems, to detect the effects of major events such as accidental oil spills or the illegal disposal of wastes, and to study long-term trends or cycles in key environmental variables.

Examples of some of the national monitoring schemes that are now operating are the United States Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP) based on 12,600 hexagons each with an area of 40 square kilometres, the United Kingdom Environmental Change Network (ECN) based on nine sites, and the Swedish Environmental Monitoring Program based on 20 sites. In all three of these schemes a large number of variables are recorded on a regular basis to describe physical aspects of the land, water and atmosphere, and the abundance of many species of animals and plants. Around the world numerous smaller scale monitoring schemes are also operated for particular purposes, such as to ensure that the quality of drinking water is adequate.

Monitoring schemes to detect unexpected changes and trends are essentially repeated surveys. The sampling methods described in Chapter 2 are therefore immediately relevant. In particular, if the mean value of a variable for the sample units in a geographical area is of interest, then the population of units should be randomly sampled so that the accuracy of estimates can be assessed in the usual way. Modifications of simple random sampling such as stratified sampling may well be useful to improve efficiency.
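As a minimal sketch of what assessing accuracy "in the usual way" can look like for a simple random sample, the Python snippet below computes a sample mean and its standard error with a finite population correction. The function name, the population size of 200 units, and the pH readings are hypothetical and are not taken from the chapter.

```python
import math

def srs_mean_and_se(sample, population_size):
    """Mean and standard error for a simple random sample drawn
    without replacement from a finite population."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance with the usual n - 1 divisor.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Finite population correction: the error shrinks as n approaches N.
    fpc = 1.0 - n / population_size
    return mean, math.sqrt(variance / n * fpc)

# Hypothetical pH readings from 6 units sampled out of 200.
readings = [7.2, 7.5, 7.1, 7.8, 7.4, 7.6]
mean, se = srs_mean_and_se(readings, population_size=200)
print(f"mean = {mean:.3f}, standard error = {se:.3f}")
```

Repeating the same calculation at each survey time gives a series of estimates with known precision, which is what makes a randomly sampled monitoring scheme easy to interpret.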

The requirements of environmental monitoring schemes have led to an interest in special types of sampling designs that include aspects of random sampling, good spatial cover, and the gradual replacement of sampling sites over time (Skalski, 1990; Stevens and Olsen, 1991; Overton et al., 1991; Urquhart et al., 1993; Conquest and Ralph, 1998). Designs that are optimum in some sense have also been developed (Fedorov and Mueller, 1989; Caselton et al., 1992).

Although monitoring schemes sometimes require fairly complicated designs, as a general rule it is a good idea to keep designs as simple as possible so that they are easily understood by administrators and the public. Simple designs also make it easier to use the data for purposes that were not foreseen in the first place, which is something that will often occur. As noted by Overton and Stehman (1995, 1996), complex sample structures create potentially serious difficulties that do not exist with simple random sampling.

5.2 Purposely Chosen Monitoring Sites

For practical reasons the sites for long-term monitoring programs are often not randomly chosen. For example, Cormack (1994) notes that the nine sites for the United Kingdom ECN were chosen on the basis of having:

(a) a good geographical distribution covering a wide range of environmental conditions and the principal natural and managed ecosystems;

(b) some guarantee of long-term physical and financial security;

(c) a known history of consistent management;

(d) reliable and accessible records of past data, preferably for ten or more years; and

(e) sufficient size to allow the opportunity for further experiments and observations.

In this scheme it is assumed that the initial status of sites can be allowed for by only considering time changes. These changes can then be related to differences between the sites in terms of measured meteorological variables and known geographical differences.

5.3 Two Special Monitoring Designs

Skalski (1990) suggested a rotating panel design with augmentation for long-term monitoring. This takes the form shown in Table 5.1 if there are eight sites that are visited every year and four sets of ten sites that are rotated. Site set 7, for example, consists of ten sites that are visited in years 4 to 7 of the study. The number of sites in different sets is arbitrary. Preferably, the sites will be randomly chosen from an appropriate population of sites. This design has some appealing properties: the sites that are always measured can be used to detect long-term trends, but the rotation of blocks of ten sites ensures that the study is not too dependent on an initial choice of sites that may be unusual in some respects.

Table 5.1 Skalski's (1990) rotating panel design with augmentation. Every year 48 sites are visited. Of these, 8 are always the same and the other 40 sites are in four blocks of size ten, such that each block of ten remains in the sample for four years after the initial start up period. The number of sites in different sets is at choice in a design of this form. Sites should be randomly selected from an appropriate population.
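The rotation described for Table 5.1 can be sketched in a few lines of Python. The set numbering used here is an assumption inferred from the statement that site set 7 is visited in years 4 to 7 (so year j visits sets j to j + 3, plus the permanent set 0); the function names are hypothetical.

```python
def rotating_panel_schedule(n_years, sets_in_rotation=4):
    """Map each year to the site sets visited in that year.

    Set 0 is the panel visited every year. In year j the rotating
    sets j, j + 1, ..., j + 3 are visited (assumed numbering).
    """
    return {
        year: [0] + list(range(year, year + sets_in_rotation))
        for year in range(1, n_years + 1)
    }

def years_visited(site_set, schedule):
    """Years in which a given site set is in the sample."""
    return [year for year, sets in schedule.items() if site_set in sets]

SET_SIZES = {0: 8}  # set 0 has eight sites; every other set has ten

schedule = rotating_panel_schedule(n_years=12)
sites_per_year = {
    year: sum(SET_SIZES.get(s, 10) for s in sets)
    for year, sets in schedule.items()
}
print(years_visited(7, schedule))
print(sites_per_year[1])
```

Under this numbering, sets 1 to 3 are the start-up sets that stay in the sample for fewer than four years, and every year has 8 + 4 × 10 = 48 visited sites.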

Urquhart et al. (1993) compared the efficiency of the designs in Tables 5.1 and 5.2 as the number of sites visited every year (i.e., in set 0) ranged from 0 to 48. To do this, they assumed the model

$$Y_{ijk} = S_{i(j)k} + T_j + e_{ijk},$$

where $Y_{ijk}$ is a measure of the condition at site i, in year j, within site set k; $S_{i(j)k}$ is an effect specific to site i, in site set k, in year j; $T_j$ is a year effect common to all sites; and $e_{ijk}$ is a random disturbance. They also allowed for autocorrelation between the overall year effects, and between the repeated measurements at one site. They found the design of Table 5.2 to always be better for estimating the current mean and the slope in a trend because more sites are measured in the first few years of the study. However, in a more recent study which compared the two designs in terms of variance and cost, Lesser and Kalsbeek (1997) concluded that the first design tends to be better for detecting short-term change while the second design tends to be better for detecting long-term change.

Table 5.2 A serially alternating design with augmentation. Every year 48 sites are measured. Of these, eight sites are always the same and the other 40 sites are measured every four years.
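A serially alternating schedule can be sketched in the same style. This assumes the rotating sites fall into four groups of 40, labelled 1 to 4 here, with each group returning every fourth year; the labels and the function name are hypothetical and only the counts (8 permanent sites, 40 rotating sites per year) come from the caption.

```python
def serially_alternating_schedule(n_years, cycle_length=4):
    """Map each year to [permanent set 0, rotating group for that year].

    The rotating group cycles 1, 2, ..., cycle_length, 1, 2, ...
    so each group is measured once every cycle_length years.
    """
    return {
        year: [0, (year - 1) % cycle_length + 1]
        for year in range(1, n_years + 1)
    }

GROUP_SIZES = {0: 8}  # permanent set; each rotating group has 40 sites

schedule_sa = serially_alternating_schedule(n_years=12)
sites_per_year_sa = {
    year: sum(GROUP_SIZES.get(g, 40) for g in groups)
    for year, groups in schedule_sa.items()
}
print(schedule_sa[1], schedule_sa[5])
```

Because every rotating group is revisited after four years, each of its sites contributes repeated measurements over the whole study, whereas a site in a rotating panel is only seen in one run of consecutive years.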

The EMAP sample design is based on approximately 12,600 points on a grid, each of which is the centre of a hexagon with area 40 km². The grid is itself within a large hexagonal region covering much of North America, as shown in Figure 5.1. The area covered by the 40 km² hexagons centred on the grid points is one sixteenth of the total area of the conterminous United States, with the area used being chosen after a random shift in the grid. Another aspect of the design is that the four sets of sites that are measured in different years are spatially interpenetrating, as indicated in Figure 5.2. This allows the estimation of parameters for the whole area every year.

Figure 5.1 The EMAP baseline grid for North America. The shaded area shown is covered by about 12,600 small hexagons, with a spacing of 27 km between their centres.

Figure 5.2 The use of spatially interpenetrating samples for visits at four year intervals.
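The "one sixteenth" coverage figure can be checked with a short calculation. The sketch below assumes the grid points form a regular triangular lattice, so that each point is associated with a hexagonal cell of area (√3/2)d² for spacing d; that lattice assumption and the variable names are not stated in the text.

```python
import math

GRID_SPACING_KM = 27.0      # spacing between hexagon centres (Figure 5.1)
SAMPLE_HEXAGON_KM2 = 40.0   # area of each sampled hexagon

# On a triangular lattice with spacing d, each grid point "owns"
# a hexagonal cell of area (sqrt(3) / 2) * d**2.
cell_area_km2 = math.sqrt(3) / 2 * GRID_SPACING_KM ** 2
coverage = SAMPLE_HEXAGON_KM2 / cell_area_km2

print(f"cell area = {cell_area_km2:.0f} km^2")
print(f"sampled fraction = 1/{1 / coverage:.1f}")
```

With these inputs the sampled fraction comes out close to one sixteenth, which agrees with the description of the design.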

5.4 Designs Based on Optimization

One approach to the design of monitoring schemes is to choose the sites so that the amount of information is in some sense maximized. The main question then is how to measure the information that is to be maximized, particularly if the monitoring scheme has a number of different objectives, some of which will only become known in the future.

One possibility involves choosing a network design, or adding or subtracting stations, to minimize entropy, where low entropy corresponds to high 'information' (Caselton et al., 1992). The theory is complex, and needs more prior information than will usually be available, particularly if there is no existing network to provide this.

Another possibility considers the choice of a network design to be a problem of the estimation of a regression function for which a classical theory of optimal design exists (Fedorov and Mueller, 1989).

5.5 Monitoring Designs Typically Used

In practice, sample designs for monitoring often consist of selecting a certain number of sites, preferably (but not necessarily) at random, from the potential sites in a region, and then measuring the variable of interest at those sites at a number of points in time. A complication is that for one reason or another some of the sites may not be measured at some of the times. A typical set of data will then look like the data in Table 5.3 for pH values measured on lakes in Norway. With this set of data, which is part of the more extensive data that are shown in Table 1.1 and discussed in Example 1.2, the main question of interest is whether there is any evidence for changes from year to year in the general level of pH and, in particular, whether the pH level was tending to increase or decrease.

5.6 Detection of Changes by Analysis of Variance

A relatively simple analysis for data like the Norwegian lake pH values shown in Table 5.3 involves carrying out a two factor analysis of variance, as discussed in Section 3.5. The two factors are then the site and the time. The model for the observation at site i at time j is

$$y_{ij} = \mu + S_i + T_j + e_{ij}, \tag{5.1}$$

where $\mu$ represents an overall general level for the variable being measured, $S_i$ represents the deviation of site i from the general level, $T_j$ represents a time effect, and $e_{ij}$ represents measurement errors and other random variation that is associated with the observation at the site at the particular time.

The model (5.1) does not include a term for the interaction between sites and times as is included in the general two-factor analysis of variance model as defined in equation (4.31). This is because there is at most one observation for a site in a particular year, which means that it is not possible to separate interactions from measurement errors. Consequently, it must be assumed that any interactions are negligible.
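One way to fit the additive model (5.1) when some site and time combinations are missing is ordinary least squares on indicator variables, comparing the full model with models that drop one factor. The sketch below uses NumPy only; the function name and the small pH array are made up for illustration, and this is not claimed to reproduce the algorithm of any particular statistical package.

```python
import numpy as np

def additive_two_way_anova(y):
    """Fit y_ij = mu + S_i + T_j + e_ij by least squares.

    y is a sites-by-times array with np.nan marking missing visits.
    Returns F statistics for sites and times, each adjusted for the
    other factor. Assumes every site and every time has at least
    one observation.
    """
    y = np.asarray(y, dtype=float)
    site_idx, time_idx = np.nonzero(~np.isnan(y))
    obs = y[site_idx, time_idx]
    n_sites, n_times = y.shape

    def indicators(codes, n_levels):
        # One 0/1 column per level, dropping level 0 as the reference.
        return (codes[:, None] == np.arange(1, n_levels)).astype(float)

    def residual_ss(*blocks):
        design = np.hstack([np.ones((obs.size, 1)), *blocks])
        coef, *_ = np.linalg.lstsq(design, obs, rcond=None)
        resid = obs - design @ coef
        return float(resid @ resid)

    sites = indicators(site_idx, n_sites)
    times = indicators(time_idx, n_times)
    rss_full = residual_ss(sites, times)
    df_resid = obs.size - (1 + sites.shape[1] + times.shape[1])
    ms_resid = rss_full / df_resid

    def f_stat(rss_reduced, df_effect):
        # Extra sum of squares from adding the factor, over residual MS.
        return ((rss_reduced - rss_full) / df_effect) / ms_resid

    return {
        "df_resid": df_resid,
        "F_sites": f_stat(residual_ss(times), n_sites - 1),
        "F_times": f_stat(residual_ss(sites), n_times - 1),
    }

# Made-up pH values for 4 lakes (rows) in 3 years (columns); one visit missing.
ph = [[4.6, 4.7, 4.9],
      [5.8, np.nan, 6.0],
      [5.1, 5.0, 5.3],
      [6.4, 6.6, 6.5]]
result = additive_two_way_anova(ph)
print(result)
```

Because there is no interaction term, the residual mean square absorbs any site-by-time interaction, which is why the text stresses that interactions must be assumed negligible.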

Example 5.1 Analysis of Variance on the pH Values

The results of an analysis of variance on the pH values for Norwegian lakes are summarised in Table 5.4. The results in this table were obtained using the MINITAB package (Minitab Inc., 1994) using an option that takes into account the missing values, although many other standard statistical packages could have been used just as well. The effects in the model were assumed to be fixed rather than random (as discussed in Section 4.5), although since interactions are assumed to be negligible the same results would be obtained using random effects. It is found that there is a significant difference between the lakes (p = 0.000) and a nearly significant difference between the years (p = 0.061). Therefore there is no very strong evidence from this analysis of differences between years.

To check the assumptions of the analysis, standardized residuals (the differences between the actual observations and those predicted by the model, divided by their standard deviations) can be plotted against the lake, the year, and against their position in space for each of the four years. These plots are shown in Figures 5.3 and 5.4. These residuals show no obvious patterns so that the model seems satisfactory, except that there are one or two residuals that are rather large.

Table 5.3 Values for pH for lakes in southern Norway with the latitudes (Lat) and longitudes (Long) for the lakes.

Table 5.4 Analysis of variance table for the data on pH levels in Norwegian lakes.

Source of Variation    Sum of Squares¹    Degrees of Freedom    Mean Square    F    Significance level (p)

Figure 5.3 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the lake number and the year number.

5.7 Detection of Changes Using Control Charts

Control charts are used to monitor industrial processes (Montgomery, 1991) and they can be used equally well with environmental data. The simplest approach involves using an x̄ chart to detect changes in a process mean, together with a range chart to detect changes in the amount of variation. These types of charts are often called Shewhart control charts after their originator (Shewhart, 1931).

Typically, the starting point is a moderately large set of data consisting of M random samples of size n, where these are taken at equally spaced intervals of time from the output of the process. This set of data is then used to estimate the process mean and standard deviation, and hence to construct the two charts. The data are then plotted on the charts. It is usually assumed that the observations are normally distributed.

If the process seems to have a constant mean and standard deviation, then the sampling of the process is continued with new points being plotted to monitor whatever is being measured. If the mean or standard deviation does not seem to have been constant for the time when the initial samples were taken, then in the industrial process situation, action is taken to bring the process under control. With environmental monitoring this may not be possible. However, the knowledge that the process being measured is not stable will be of interest anyway.

Figure 5.4 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the locations of the lakes. The standardized residuals are rounded to the nearest integer for clarity.

The method for constructing the x̄ chart involves the following stages:

1. The sample mean and the sample range (the maximum value in a sample minus the minimum value in a sample) are calculated for each of the M samples. For the ith sample let these values be denoted by $\bar{x}_i$ and $R_i$.

2. The mean of the variable being measured is assumed to be constant, and is estimated by the overall mean of all the available observations, which is also just the mean of the sample means $\bar{x}_1$ to $\bar{x}_M$. Let the estimated mean be denoted by $\hat{\mu}$.

3. Similarly, the standard deviation of the variable is assumed to have remained constant and this is estimated on the basis of a known relationship between the mean range for samples of size n and the standard deviation for samples from a normal distribution. This relationship is of the form $\sigma = k(n)\mu_R$, where $\mu_R$ is the mean range for samples of size n, and the constant $k(n)$ is given in Table 5.5. Thus the estimated standard deviation is

$$\hat{\sigma} = k(n)\bar{R},$$

where $\bar{R}$ is the mean of the sample ranges.

4. The standard error of the mean for samples of size n is estimated to be $\widehat{SE}(\bar{x}) = \hat{\sigma}/\sqrt{n}$.

5. Warning limits are set at the mean plus and minus 1.96 standard errors, i.e., at $\hat{\mu} \pm 1.96\,\widehat{SE}(\bar{x})$. If the mean and standard deviation are constant then only about 1 in 20 (5%) sample means should be outside one of these limits. Action limits are set at the mean plus and minus 3.09 standard errors, i.e., at $\hat{\mu} \pm 3.09\,\widehat{SE}(\bar{x})$. Only about 1 in 500 (0.2%) sample means should plot outside these limits.

The rationale behind constructing the x̄ chart in this way is that it shows the changes in the sample means with time, and the warning and action limits indicate whether these changes are too large to be due to normal random variation if the mean is in fact constant.
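The five stages can be sketched as a small Python function. The function name and the returned dictionary keys are hypothetical; the factor k(n) has to be supplied by the caller from Table 5.5. The three samples used in the call are the January to March 1989 rows of Table 5.6 and serve only to show the mechanics, since the chapter's own limits are based on all of the monthly samples.

```python
import math

def xbar_chart_limits(samples, k_n):
    """Warning and action limits for an x-bar chart.

    samples: list of M samples, each a list of n observations.
    k_n: the factor k(n) from Table 5.5 for samples of size n.
    """
    n = len(samples[0])
    means = [sum(s) / n for s in samples]           # stage 1: sample means
    ranges = [max(s) - min(s) for s in samples]     # stage 1: sample ranges
    mu_hat = sum(means) / len(means)                # stage 2: overall mean
    r_bar = sum(ranges) / len(ranges)
    sigma_hat = k_n * r_bar                         # stage 3: SD from mean range
    se = sigma_hat / math.sqrt(n)                   # stage 4: SE of a sample mean
    return {                                        # stage 5: limits
        "mu_hat": mu_hat,
        "r_bar": r_bar,
        "sigma_hat": sigma_hat,
        "warning": (mu_hat - 1.96 * se, mu_hat + 1.96 * se),
        "action": (mu_hat - 3.09 * se, mu_hat + 3.09 * se),
    }

samples = [
    [7.27, 7.10, 7.02, 7.23, 8.08],  # January 1989
    [8.04, 7.74, 7.48, 8.10, 7.21],  # February 1989
    [7.50, 7.40, 8.33, 7.17, 7.95],  # March 1989
]
limits = xbar_chart_limits(samples, k_n=0.43)  # k(5) = 0.43
print(limits)
```

Each new sample mean is then plotted against the fixed warning and action limits, so the limits are computed once from the initial data rather than being updated with every new point.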

Table 5.5 Control chart limits for sample ranges, assuming samples from normal distributions. To find the limits on the range chart, multiply the mean range by the tabulated value. For example, for samples of size n = 5 the lower action limit is $0.16\mu_R$, where $\mu_R$ is the mean range. With a stable distribution a warning limit is crossed with probability 0.05 (5%) and an action limit with probability 0.002 (0.2%). The last column is the factor that the mean range must be multiplied by to obtain the standard deviation. For example, for samples of size 3 the standard deviation is $0.591\mu_R$. Source: Tables G1 and G2 of Davies and Goldsmith (1972).

Sample size    Lower action    Lower warning    Upper warning    Upper action    Factor (k)

... is stable. Similarly, action limits can be placed so that the probability of crossing one of them is 0.002 (0.2%) when the level of variation is stable. The setting of these limits requires the use of tabulated values that are provided and explained in Table 5.5.

Control charts can be produced quite easily in a spreadsheet program. Alternatively, statistical packages such as MINITAB (Minitab Inc., 1994) have options to produce the charts, and often allow a number of other types of control charts to be produced as well.

Example 5.2 Monitoring pH in New Zealand

Table 5.6 shows data from the monitoring of rivers in the South Island of New Zealand. Values are provided for pH, for five randomly chosen rivers, with a different selection for each of the monthly sample times from January 1989 to December 1997. The data are used to construct control charts for monitoring pH over the sampled time. As shown in Figure 5.5, the distribution is reasonably close to normal.

The overall mean of the pH values for all the samples is $\hat{\mu} = 7.640$. This is used as the best estimate of the process mean. The mean of the sample ranges is $\bar{R} = 0.694$. From Table 5.5, the factor to convert this to an estimate of the process standard deviation is k(5) = 0.43. The estimated standard deviation is therefore $\hat{\sigma} = 0.43 \times 0.694 = 0.298$.

The range chart is shown in Figure 5.6(b). Using the factors for setting the action and warning limits from Table 5.5, these limits are at 0.16 × 0.694 = 0.11 (lower action limit), 0.37 × 0.694 = 0.26 (lower warning limit), 1.81 × 0.694 = 1.26 (upper warning limit), and 2.36 × 0.694 = 1.64 (upper action limit). Due to the nature of the distribution of sample ranges, these limits are not symmetrically placed about the mean.
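The arithmetic of this example can be reproduced from the quoted summary values. In the sketch below the range chart limits repeat the products given in the text, while the x̄ chart limits are recomputed from stages 4 and 5 of the construction method rather than quoted from the chapter; the variable names are hypothetical.

```python
import math

MU_HAT = 7.640   # overall mean pH
R_BAR = 0.694    # mean of the sample ranges
N = 5            # rivers per monthly sample
K_5 = 0.43       # Table 5.5 factor k(5)

sigma_hat = K_5 * R_BAR               # estimated process standard deviation
se_mean = sigma_hat / math.sqrt(N)    # standard error of a sample mean

# x-bar chart limits recomputed from mu_hat +/- 1.96 SE and +/- 3.09 SE.
xbar_limits = {
    "lower action": MU_HAT - 3.09 * se_mean,
    "lower warning": MU_HAT - 1.96 * se_mean,
    "upper warning": MU_HAT + 1.96 * se_mean,
    "upper action": MU_HAT + 3.09 * se_mean,
}

# Range chart limits: Table 5.5 factors for n = 5 times the mean range.
range_factors = {
    "lower action": 0.16,
    "lower warning": 0.37,
    "upper warning": 1.81,
    "upper action": 2.36,
}
range_limits = {name: f * R_BAR for name, f in range_factors.items()}

for name in range_factors:
    print(f"{name:14s} x-bar: {xbar_limits[name]:.2f}   range: {range_limits[name]:.2f}")
```

The x̄ limits are symmetric about 7.640 because they are built from a normal approximation, whereas the range limits are asymmetric because they come from tabulated percentage points of the range distribution.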

With 108 sample means altogether, it is expected that about five points (5% of 108) will plot outside the warning limits of the x̄ chart. In fact, there are ten points outside these limits, and one point outside the lower action limit. It appears, therefore, that the mean pH level in the South Island of New Zealand was changing to some extent during the monitored period, with the overall plot suggesting that the pH level was high for about the first two years, and was lower from then on.
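To put "ten points where about five are expected" in perspective, the count of points outside the warning limits can be treated as a binomial variable. This is a rough sketch that assumes the 108 sample means are independent and that each falls outside the warning limits with probability 0.05 when the process is stable; the chapter does not carry out this calculation, and serial correlation between months would change the answer.

```python
from math import comb

def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

N_SAMPLES = 108     # monthly samples, January 1989 to December 1997
P_WARNING = 0.05    # chance of crossing a warning limit when stable

expected = N_SAMPLES * P_WARNING
tail = binomial_upper_tail(N_SAMPLES, P_WARNING, 10)
print(f"expected count = {expected:.1f}")
print(f"P(10 or more outside warning limits) = {tail:.3f}")
```

A small tail probability supports the reading in the text that the mean was not constant, although the pattern of the points over time is more informative than the count alone.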

The range chart has seven points outside the warning limits, and one point above the upper action limit. Here again there is evidence that the process variation was not constant, with occasional 'spikes' of high variability, particularly in the early part of the monitoring period.

Table 5.6 Values for pH for five randomly selected rivers in the South Island of New Zealand, for each month from January 1989 to December 1997. This is part of a larger data set provided by Graham McBride, National Institute of Water and Atmospheric Research, Hamilton, New Zealand.

Year  Month  pH values for five rivers      Mean  Range
1989  Jan    7.27  7.10  7.02  7.23  8.08   7.34  1.06
      Feb    8.04  7.74  7.48  8.10  7.21   7.71  0.89
      Mar    7.50  7.40  8.33  7.17  7.95   7.67  1.16
      Apr    7.87  8.10  8.13  7.72  7.61   7.89  0.52
      May    7.60  8.46  7.80  7.71  7.48   7.81  0.98
      Jun    7.41  7.32  7.42  7.82  7.80   7.55  0.50
      Jul    7.88  7.50  7.45  8.29  7.45   7.71  0.84
      Aug    7.88  7.79  7.40  7.62  7.47   7.63  0.48
      Sep    7.78  7.73  7.53  7.88  8.03   7.79  0.50
      Oct    7.14  7.96  7.51  8.19  7.70   7.70  1.05
      Nov    8.07  7.99  7.32  7.32  7.63   7.67  0.75
      Dec    7.21  7.72  7.73  7.91  7.79   7.67  0.70
1990  Jan    7.66  8.08  7.94  7.51  7.71   7.78  0.57
      Feb    7.71  8.73  8.18  7.04  7.28   7.79  1.69
      Mar    7.72  7.49  7.62  8.13  7.78   7.75  0.64
      Apr    7.84  7.67  7.81  7.81  7.80   7.79  0.17
      May    8.17  7.23  7.09  7.75  7.40   7.53  1.08
      Jun    7.79  7.46  7.13  7.83  7.77   7.60  0.70
      Jul    7.16  8.44  7.94  8.05  7.70   7.86  1.28
      Aug    7.74  8.13  7.82  7.75  7.80   7.85  0.39
      Sep    8.09  8.09  7.51  7.97  7.94   7.92  0.58
      Oct    7.20  7.65  7.13  7.60  7.68   7.45  0.55
      Nov    7.81  7.25  7.80  7.62  7.75   7.65  0.56
      Dec    7.73  7.58  7.30  7.78  7.11   7.50  0.67
1991  Jan    8.52  7.22  7.91  7.16  7.87   7.74  1.36
      Feb    7.13  7.97  7.63  7.68  7.90   7.66  0.84
      Mar    7.22  7.80  7.69  7.26  7.94   7.58  0.72
      Apr    7.62  7.80  7.59  7.37  7.97   7.67  0.60
      May    7.70  7.07  7.26  7.82  7.51   7.47  0.75
      Jun    7.66  7.83  7.74  7.29  7.30   7.56  0.54
      Jul    7.97  7.55  7.68  8.11  8.01   7.86  0.56
      Aug    7.86  7.13  7.32  7.75  7.08   7.43  0.78
      Sep    7.43  7.61  7.85  7.77  7.14   7.56  0.71
      Oct    7.77  7.83  7.77  7.54  7.74   7.73  0.29
      Nov    7.84  7.23  7.64  7.42  7.73   7.57  0.61
      Dec    8.23  8.08  7.89  7.71  7.95   7.97  0.52
1992  Jan    8.28  7.96  7.86  7.65  7.49   7.85  0.79
      Feb    7.23  7.11  8.53  7.53  7.78   7.64  1.42
      Mar    7.68  7.68  7.15  7.68  7.85   7.61  0.70
      Apr    7.87  7.20  7.42  7.45  7.96   7.58  0.76
      May    7.94  7.35  7.68  7.50  7.12   7.52  0.82
      Jun    7.80  6.96  7.56  7.22  7.76   7.46  0.84
      Jul    7.39  7.12  7.70  7.47  7.74   7.48  0.62
      Aug    7.42  7.41  7.47  7.80  7.12   7.44  0.68
      Sep    7.91  7.77  6.96  8.03  7.24   7.58  1.07
      Oct    7.59  7.41  7.41  7.02  7.60   7.41  0.58
      Nov    7.94  7.32  7.65  7.84  7.86   7.72  0.62
      Dec    7.64  7.74  7.95  7.83  7.96   7.82  0.32
1993  Jan    7.55  8.01  7.37  7.83  7.51   7.65  0.64
      Feb    7.30  7.39  7.03  8.05  7.59   7.47  1.02
      Mar    7.80  7.17  7.97  7.58  7.13   7.53  0.84
      Apr    7.92  8.22  7.64  7.97  7.18   7.79  1.04
      May    7.70  7.80  7.28  7.61  8.12   7.70  0.84
      Jun    7.76  7.41  7.79  7.89  7.36   7.64  0.53
      Jul    8.28  7.75  7.76  7.89  7.82   7.90  0.53
      Aug    7.58  7.84  7.71  7.27  7.95   7.67  0.68
      Sep    7.56  7.92  7.43  7.72  7.21   7.57  0.71
      Oct    7.19  7.73  7.21  7.49  7.33   7.39  0.54
      Nov    7.60  7.49  7.86  7.86  7.80   7.72  0.37
      Dec    7.50  7.86  7.83  7.58  7.45   7.64  0.41
1994  Jan    8.13  8.09  8.01  7.76  7.24   7.85  0.89
      Feb    7.23  7.89  7.81  8.12  7.83   7.78  0.89
      Mar    7.08  7.92  7.68  7.70  7.40   7.56  0.84
      Apr    7.55  7.50  7.52  7.64  7.14   7.47  0.50
      May    7.75  7.57  7.44  7.61  8.01   7.68  0.57
      Jun    6.94  7.37  6.93  7.03  6.96   7.05  0.44
      Jul    7.46  7.14  7.26  6.99  7.47   7.26  0.48
      Aug    7.62  7.58  7.09  6.99  7.06   7.27  0.63
      Sep    7.45  7.65  7.78  7.73  7.31   7.58  0.47
      Oct    7.65  7.63  7.98  8.06  7.51   7.77  0.55
      Nov    7.85  7.70  7.62  7.96  7.13   7.65  0.83
      Dec    7.56  7.74  7.80  7.41  7.59   7.62  0.39
1995  Jan    8.18  7.80  7.22  7.95  7.79   7.79  0.96
      Feb    7.63  7.88  7.90  7.45  7.97   7.77  0.52
      Mar    7.59  8.06  8.22  7.57  7.73   7.83  0.65
      Apr    7.47  7.82  7.58  8.03  8.19   7.82  0.72
      May    7.52  7.42  7.76  7.66  7.76   7.62  0.34
      Jun    7.61  7.72  7.56  7.49  6.87   7.45  0.85
      Jul    7.30  7.90  7.57  7.76  7.72   7.65  0.60
      Aug    7.75  7.75  7.52  8.12  7.75   7.78  0.60
      Sep    7.77  7.78  7.75  7.49  7.14   7.59  0.64
      Oct    7.79  7.30  7.83  7.09  7.09   7.42  0.74
      Nov    7.87  7.89  7.35  7.56  7.99   7.73  0.64
      Dec    8.01  7.56  7.67  7.82  7.44   7.70  0.57

Posted: 11/08/2014, 09:21