Statistics for Environmental Science and Management
Bryan F.J. Manly
Statistical Consultant, Western Ecosystem Technology Inc.
Wyoming, USA
CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.
Library of Congress Cataloging-in-Publication Data
Manly, Bryan F.J., 1944-
Statistics for environmental science and management / by Bryan F.J. Manly.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-029-5 (alk. paper)
1. Environmental sciences - Statistical methods. 2. Environmental management - Statistical methods. I. Title.
GE45.S73 M36 2000
CIP
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
© 2001 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-58488-029-5
Library of Congress Card Number 00-055458
Printed in the United States of America 3 4 5 6 7 8 9 0
Printed on acid-free paper
A great deal of intelligence can be invested in ignorance when the need for illusion is deep.
Saul Bellow
1 The Role of Statistics in Environmental Science
2.2 Simple Random Sampling
2.3 Estimation of Population Means
2.4 Estimation of Population Totals
2.5 Estimation of Proportions
2.6 Sampling and Non-Sampling Errors
2.7 Stratified Random Sampling
2.13 Choosing Sample Sizes
2.14 Unequal Probability Sampling
2.15 The Data Quality Objectives Process
2.16 Chapter Summary
3 Models for Data
3.1 Statistical Models
3.2 Discrete Statistical Distributions
3.3 Continuous Statistical Distributions
3.4 The Linear Regression Model
3.5 Factorial Analysis of Variance
3.6 Generalized Linear Models
3.7 Chapter Summary
4 Drawing Conclusions from Data
4.1 Introduction
4.2 Observational and Experimental Studies
4.3 True Experiments and Quasi-Experiments
4.4 Design-Based and Model-Based Inference
4.5 Tests of Significance and Confidence Intervals
5.2 Purposely Chosen Monitoring Sites
5.3 Two Special Monitoring Designs
5.4 Designs Based on Optimization
5.5 Monitoring Designs Typically Used
5.6 Detection of Changes by Analysis of Variance
5.7 Detection of Changes Using Control Charts
5.8 Detection of Changes Using CUSUM Charts
5.9 Chi-Squared Tests for a Change in a Distribution
5.10 Chapter Summary
6 Impact Assessment
6.1 Introduction
6.2 The Simple Difference Analysis with BACI Designs
6.3 Matched Pairs with a BACI Design
7.2 Problems with Tests of Significance
7.3 The Concept of Bioequivalence
7.4 Two-Sided Tests of Bioequivalence
8.4 Tests for Randomness
8.5 Detection of Change Points and Trends
8.6 More Complicated Time Series Models
8.8 Forecasting
8.9 Chapter Summary
9 Spatial Data Analysis
9.1 Introduction
9.2 Types of Spatial Data
9.3 Spatial Patterns in Quadrat Counts
9.4 Correlation Between Quadrat Counts
9.5 Randomness of Point Patterns
9.6 Correlation Between Point Patterns
9.7 Mantel Tests for Autocorrelation
10.4 Comparing the Means of Two or More Samples
10.5 Regression with Censored Data
10.6 Chapter Summary
11 Monte Carlo Risk Assessment
11.1 Introduction
11.2 Principles for Monte Carlo Risk Assessment
11.3 Risk Analysis Using a Spreadsheet Add-On
A2 Distributions for Sample Data
A3 Distributions of Sample Statistics
A4 Tests of Significance
A5 Confidence Intervals
A6 Covariance and Correlation
B1 The Standard Normal Distribution
B2 Critical Values for the t-Distribution
B3 Critical Values for the Chi-Squared Distribution
B4 Critical Values for the F-Distribution
B5 Critical Values for the Durbin-Watson Statistic
References
This book is intended to introduce environmental scientists and managers to the statistical methods that will be useful for them in their work. A secondary aim was to produce a text suitable for a course in statistics for graduate students in the environmental science area. I wrote the book because it seemed to me that these groups should really learn about statistical methods in a special way. It is true that their needs are similar in many respects to those of people working in other areas. However, there are some special topics that are relevant to environmental science to the extent that they should be covered in an introductory text, although they would probably not be mentioned at all in such a text for a more general audience. I refer to environmental monitoring, impact assessment, assessing site reclamation, censored data, and Monte Carlo risk assessment, which all have their own chapters here.
The book is not intended to be a complete introduction to statistics. Rather, it is assumed that readers have already taken a course or read a book on basic methods, covering the ideas of random variation, statistical distributions, tests of significance, and confidence intervals. For those who have done this some time ago, Appendix A is meant to provide a quick refresher course.
A number of people have contributed directly or indirectly to this book. I must first mention Lyman McDonald of West Inc., Cheyenne, Wyoming, who first stimulated my interest in environmental statistics, as distinct from ecological statistics. Much of the contents of the book is influenced by the discussions that we have had on matters statistical. Jennifer Brown from the University of Canterbury in New Zealand has influenced the contents because we have shared the teaching of several short courses on statistics for environmental scientists and managers. Likewise, sharing a course on statistics for MSc students of environmental science with Caryn Thompson and David Fletcher has also had an effect on the book. Other people are too numerous to name, so I would just like to thank generally those who have contributed data sets, helped me check references and equations, etc.
Most of this book was written in the Department of Mathematics and Statistics at the University of Otago. As usual, the university was generous with the resources that are needed for the major effort of writing a book, including periods of sabbatical leave that enabled me to write large parts of the text without interruptions, and an excellent library.
However, the manuscript would definitely have taken longer to finish if I had not been invited to spend part of the year 2000 as a Visiting Researcher at the Max Planck Institute for Limnology at Plön in Germany. This enabled me to write the final chapters and put the whole book together. I am very grateful to Winfried Lampert, the Director of the Institute, for his kind invitation to come to Plön, and for allowing me to use the excellent facilities at the Institute while I was there.
The Saul Bellow quotation above may need some explanation. It results from attending meetings where an environmental matter is argued at length, with everyone being ignorant about the true facts of the case. Furthermore, one suspects that some people there would prefer not to know the true facts because this would be likely to end the arguments.
Bryan F.J. Manly
May 2000
CHAPTER 1 The Role of Statistics in Environmental Science
1.1 Introduction
In this chapter the role of statistics in environmental science is considered by examining some specific examples. First, however, an important point needs to be made. The importance of statistics is obvious because much of what is learned about the environment is based on numerical data. Therefore the appropriate handling of data is crucial. Indeed, the use of incorrect statistical methods may make individuals and organizations vulnerable to being sued for large amounts of money. Certainly in the United States it appears that increasing attention to the use of statistical methods is driven by the fear of litigation.
One thing that it is important to realize in this context is that there is usually not a single correct way to gather and analyse data. At best there may be several alternative approaches that are all about equally good. At worst the alternatives may involve different assumptions, and lead to different conclusions. This will become apparent from some of the examples in this and the following chapters.
1.2 Some Examples
The following examples demonstrate the non-trivial statistical problems that can arise in practice, and show very clearly the importance of the proper use of statistical theory. Some of these examples are revisited in later chapters.
For environmental scientists and resource managers there are three broad types of situation that are often of interest:
(a) baseline studies intended to document the present state of the environment in order to establish future changes resulting, for example, from unforeseen events such as oil spills;
(b) targeted studies designed to assess the impact of planned events such as the construction of a dam, or accidents such as oil spills; and
(c) regular monitoring intended to detect trends and changes in important variables, possibly to ensure that compliance conditions are being met for an industry that is permitted to discharge small amounts of pollutants into the environment.
The examples include all of these types of situations.
Example 1.1 The Exxon Valdez Oil Spill
Oil spills resulting from the transport of crude and refined oils occur from time to time, particularly in coastal regions. Some very large spills (over 100,000 tonnes) have attracted considerable interest around the world. Notable examples are the Torrey Canyon spill in the English Channel in 1967, the Amoco Cadiz off the coast of Brittany, France in 1978, and the grounding of the Braer off the Shetland Islands in 1993. These spills all bring similar challenges for damage control for the physical environment and wildlife. There is intense concern from the public, resulting in political pressures on resource managers. There is the need to assess both short-term and long-term environmental impacts. Often there are lengthy legal cases to establish liability and compensation terms.
One of the most spectacular oil spills was that of the Exxon Valdez, which grounded on Bligh Reef in Prince William Sound, Alaska, on 24 March 1989, spilling more than 41 million litres of Alaska North Slope crude oil. This was the largest spill up to that time in United States coastal waters, although far from the size of the Amoco Cadiz spill. The publicity surrounding it was enormous and the costs for cleanup, damage assessment and compensation have been considerable at nearly $US12,000 per barrel lost, compared with the more typical $US5,000 per barrel, for which the typical sale price is only about $US15 (Wells et al., 1995, p. 5). Figure 1.1 shows the path of the oil through Prince William Sound and the western Gulf of Alaska.
There were many targeted studies of the Exxon Valdez spill related to the persistence and fate of the oil and the impact on fisheries and wildlife. Here only three of these studies, concerned with the shoreline impact of the oil, are considered. The investigators used different study designs and all met with complications that were not foreseen in advance of sampling. The three studies are Exxon's Shoreline Ecology Program (Page et al., 1995; Gilfillan et al., 1995), the Oil Spill Trustees' Coastal Habitat Injury Assessment (Highsmith et al., 1993; McDonald et al., 1995), and the Biological Monitoring Survey (Houghton et al., 1993). The summary here owes much to a paper presented by Harner et al. (1995) at an International Environmetrics Conference in Kuala Lumpur, Malaysia.
Figure 1.1 The path of the oil spill from the Exxon Valdez that occurred on 24 March (day 1) until 18 May 1989 (day 56), through Prince William Sound and the western Gulf of Alaska.
The Exxon Shoreline Ecology Program
The Exxon Shoreline Ecology Program started in 1989 with the purposeful selection of a number of heavily oiled sites along the shoreline that were to be measured over time in order to determine recovery rates. Because these sites are not representative of the shoreline potentially affected by oil they were not intended to assess the overall damage.
In 1990, using a stratified random sampling design of a type that is discussed in Chapter 2, the study was enlarged to include many more sites. Basically, the entire area of interest was divided into a number of short segments of shoreline. Each segment was then allocated to one of 16 strata based on the substrate type (exposed bedrock, sheltered bedrock, boulder/cobble, and pebble/gravel) and the degree of oiling (none, light, moderate, and heavy). For example, the first stratum was exposed bedrock with no oiling. Finally, four sites were chosen from each of the 16 strata for sampling to determine the abundances of more than a thousand species of animals and plants. A number of physical variables were also measured at each site.
The analysis of the data collected from the Exxon Shoreline Ecology Program was based on the use of what are called generalized linear models for species counts. These models are described in Chapter 3, and here it suffices to say that the effects of oiling were estimated on the assumption that the model used for each species was correct, with an allowance being made for differences in physical variables between sites.
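The 1990 stratified design described above (four substrate types crossed with four oiling levels, and four sites per stratum) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the segment identifiers, and the function name `stratified_sample` are all hypothetical, and the sketch takes a simple random sample without replacement inside each stratum.

```python
import itertools
import random

SUBSTRATES = ["exposed bedrock", "sheltered bedrock", "boulder/cobble", "pebble/gravel"]
OILING = ["none", "light", "moderate", "heavy"]


def stratified_sample(segments, sites_per_stratum=4, seed=0):
    """Draw a simple random sample of segments within every stratum.

    `segments` is a list of (segment_id, substrate, oiling) tuples.
    Returns {(substrate, oiling): [segment_id, ...]}.
    Raises an error if a stratum is missing or has too few segments.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for seg_id, substrate, oiling in segments:
        by_stratum.setdefault((substrate, oiling), []).append(seg_id)
    return {
        stratum: rng.sample(by_stratum[stratum], sites_per_stratum)
        for stratum in itertools.product(SUBSTRATES, OILING)
    }


# Hypothetical sampling frame: 10 made-up segments in each of the 16 strata.
frame = [
    (f"seg{i:03d}", substrate, oiling)
    for i, (substrate, oiling, _) in enumerate(
        itertools.product(SUBSTRATES, OILING, range(10))
    )
]
sample = stratified_sample(frame)
```

With 16 strata and four sites each, the sketch returns 64 sites. Note that the validity of such a sample depends on every segment being assigned to its correct stratum before sampling.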
A problem with the sampling design was that the initial allocation of shoreline segments to the 16 strata was based on the information in a geographical information system (GIS). However, this resulted in some sites being misclassified, particularly in terms of oiling levels. Furthermore, sites were not sampled if they were near an active eagle nest or human activity. The net result was that the sampling probabilities used in the study design were not quite what they were supposed to be. The investigators considered that the effect of this was minor. However, the authors of the National Oceanic and Atmospheric Administration's guidance document for assessing the damage from oil spills argue that this could be used in an attempt to discredit the entire study (Bergman et al., 1995, Section F). It is therefore an example of how a minor deviation from the requirements of a standard study design may lead to potentially very serious consequences.
The Oil Spill Trustees' Coastal Habitat Injury Assessment
The Exxon Valdez Oil Spill Trustee Council was set up to oversee the allocation of funds from Exxon for the restoration of Prince William Sound and Alaskan waters. Like the Exxon Shoreline Ecology Program, the 1989 Coastal Habitat Injury Assessment study that was set up by the Council was based on a stratified random sampling design of a type that will be discussed in Chapter 2. There were 15 strata used, with these defined by five habitat types, each with three levels of oiling. Sample units were shoreline segments with varying lengths, and these were selected using a GIS, with probabilities proportional to their lengths.
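Selection with probabilities proportional to segment length can be illustrated as follows. This is a minimal sketch, not the procedure the investigators used: the segment names and lengths are hypothetical, and the sketch assumes draws are made with replacement, which the text does not state.

```python
import random


def draw_probabilities(lengths):
    """Probability that a single draw picks each segment: length / total."""
    total = sum(lengths.values())
    return {seg: length / total for seg, length in lengths.items()}


def pps_sample(lengths, n, seed=0):
    """Select n segments with probability proportional to length.

    `lengths` maps segment id -> shoreline length. Selection is WITH
    replacement (an assumption of this sketch), so a long segment can
    be drawn more than once.
    """
    rng = random.Random(seed)
    ids = list(lengths)
    return rng.choices(ids, weights=[lengths[i] for i in ids], k=n)


# Hypothetical segment lengths in metres.
lengths = {"A": 100.0, "B": 300.0, "C": 600.0}
probs = draw_probabilities(lengths)
chosen = pps_sample(lengths, n=2)
```

Here segment C, which makes up 60% of the total length, has a 0.6 chance of being picked on each draw.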
Unfortunately, so many sites were misclassified by the GIS that the 1989 study design had to be abandoned in 1990. Instead, each of the moderately and heavily oiled sites that were sampled in 1989 was matched up with a comparable unoiled control site based on physical characteristics, to give a paired comparison design. The investigators then considered whether the paired sites were significantly different with regard to species abundance.
There are two aspects of the analysis of the data from this study that are unusual. First, the results of comparing site pairs (oiled and unoiled) were summarised as p-values (probabilities of observing differences as large as those seen on the hypothesis that oiling had no effect). These p-values were then combined using a meta-analysis, which is a method for combining data that is described in Chapter 4. This method for assessing the evidence was used because each site pair was thought to be an independent study of the effects of oiling.
The second unusual aspect of the analysis was the weighting of results that was used for one of the two methods of meta-analysis that was employed. By weighting the results for each site pair by the reciprocal of the probability of the pair being included in the study, it was possible to make inferences with respect to the entire set of possible pairs in the study region. This was not a particularly simple procedure to carry out because inclusion probabilities had to be estimated by simulation. It did, however, overcome the problems introduced by the initial misclassification of sites.
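To make the idea of combining p-values concrete, the sketch below shows two widely used approaches: Fisher's method, and a weighted Z (Stouffer-Liptak) method in which each site pair is weighted by the reciprocal of its inclusion probability. The text above does not say which two methods the investigators used, so these are offered only as plausible illustrations; the p-values and inclusion probabilities are hypothetical, and all p-values are assumed to be one-sided and strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-squared with 2k df under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    # The chi-squared upper tail has a closed form when the df (2k) is even.
    return math.exp(-half_x) * sum(half_x**j / math.factorial(j) for j in range(k))


def weighted_z_combined_p(p_values, weights):
    """Weighted Z method: combine normal scores of one-sided p-values."""
    norm = NormalDist()
    z = [norm.inv_cdf(1.0 - p) for p in p_values]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - norm.cdf(z_comb)


# Hypothetical p-values and inclusion probabilities for three site pairs.
p_values = [0.04, 0.20, 0.10]
inclusion_probs = [0.5, 0.25, 0.1]
weights = [1.0 / pi for pi in inclusion_probs]
p_fisher = fisher_combined_p(p_values)
p_weighted = weighted_z_combined_p(p_values, weights)
```

Weighting by the reciprocal of the inclusion probability gives more influence to site pairs that were unlikely to be selected, so that the combined result refers to the whole population of possible pairs rather than only to those that happened to be sampled.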
The Biological Monitoring Survey
The Biological Monitoring Survey was instigated by the National Oceanic and Atmospheric Administration to study differences in impact between oiling alone and oiling combined with high pressure hot water washing at sheltered rocky sites. Thus there were three categories of sites used. Category 1 sites were unoiled. Category 2 sites were oiled but not washed. Category 3 sites were oiled and washed. Sites were subjectively selected, with unoiled ones being chosen to match those in the other two categories. Oiling levels were also classified as being light or moderate/heavy depending on their state when they were laid out in 1989. Species counts and percentage cover were measured at sampled sites.
Randomization tests were used to assess the significance of the differences between the sites in different categories because of the extreme nature of the distributions found for the recorded data. These types of test are discussed in Chapter 4. Here it is just noted that the hypothesis tested is that an observation was equally likely to have occurred for a site in any one of the three categories. These tests can certainly provide valid evidence of differences between the categories. However, the subjective methods used to select sites allow the argument to be made that any significant differences were due to the selection procedure rather than the oiling or the hot water treatment.
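A randomization test of the hypothesis stated above can be sketched as follows. The observations are repeatedly reallocated at random to the three categories, and the p-value is the proportion of allocations giving a test statistic at least as large as the one observed. The choice of statistic (the between-group sum of squares), the number of randomizations, and the species counts are all assumptions made for illustration, not details taken from the survey.

```python
import random


def between_group_ss(groups):
    """Between-group sum of squares: large when group means differ."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)


def randomization_test(groups, n_random=1999, seed=0):
    """Approximate randomization p-value for a difference between groups."""
    rng = random.Random(seed)
    observed = between_group_ss(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 1  # the observed allocation counts as one of the randomizations
    for _ in range(n_random):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point ties.
        if between_group_ss(shuffled) >= observed - 1e-12:
            count += 1
    return count / (n_random + 1)


# Hypothetical species counts at unoiled, oiled, and oiled-and-washed sites.
unoiled = [12, 15, 30, 9, 41]
oiled = [8, 11, 14, 6, 10]
washed = [1, 0, 3, 2, 25]
p_value = randomization_test([unoiled, oiled, washed])
```

Because the reference distribution is built from the data themselves, no assumption about the shape of the distribution is needed, which is why this kind of test suits highly skewed counts.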
Another potential problem with the analysis of the study is that it may have involved pseudoreplication (treating correlated data as independent data), which is also defined and discussed in Chapter 4. This is because sampling stations along a transect on a beach were treated as if they provided completely independent data, although in fact some of these stations were in close proximity. In reality, observations taken close together in space can be expected to be more similar than observations taken far apart. Ignoring this fact may have led to a general tendency to conclude that sites in the different categories differed when this was not really the case.
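The effect of pseudoreplication can be illustrated with a simple calculation. Suppose, purely as an assumption for this sketch, that n equally spaced stations along a transect have correlation rho ** k between stations k steps apart. The variance of the sample mean is then larger than the value sigma^2 / n that applies to independent data by the factor computed below, so that the n stations carry the information of fewer independent observations.

```python
def variance_inflation(n, rho):
    """Factor by which Var(sample mean) exceeds sigma^2 / n.

    Assumes n equally spaced stations whose correlation at lag k is
    rho ** k (a hypothetical model chosen only for illustration).
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho**k for k in range(1, n))


def effective_sample_size(n, rho):
    """Number of independent observations giving the same Var(mean)."""
    return n / variance_inflation(n, rho)


# Ten stations with correlation 0.5 between neighbours.
n_eff = effective_sample_size(10, 0.5)
```

Under these assumed values, ten stations are worth fewer than four independent observations, so a test that treats them as ten independent values will understate the standard error and tend to declare differences significant too often.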
General Comments on the Three Studies
The three studies on the Exxon Valdez oil spill took different approaches and led to answers to different questions. The Exxon Shoreline Ecology Program was intended to assess the impact of oiling over the entire spill zone by using a stratified random sampling design. A minor problem is that the standard requirements of the sampling design were not quite followed because of site misclassification and some restrictions on sites that could be sampled. The Oil Spill Trustees' Coastal Habitat Injury Assessment was badly upset by site misclassification in 1989, and was therefore converted to a paired comparison design in 1990 to compare moderately or heavily oiled sites with subjectively chosen unoiled sites. This allowed evidence for the effect of oiling to be assessed, but only at the expense of a complicated analysis involving the use of simulation to estimate the probability of a site being used in the study, and a special method to combine the results for different pairs of sites. The Biological Monitoring Survey focussed on assessing the effects of hot water washing, and the design gives no way of making inferences to the entire area affected by the oil spill.
All three studies are open to criticism in terms of the extent to which they can be used to draw conclusions about the overall impact of the oil spill in the entire area of interest. For the Exxon Shoreline Ecology Program and the Trustees' Coastal Habitat Injury Assessment, this was the result of using stratified random sampling designs for which the randomization was upset to some extent. As a case study the Exxon Valdez oil spill should, therefore, be a warning to those involved in oil spill impact assessment in the future about problems that are likely to occur with this type of design. Another aspect of these two studies that should give pause for thought is that the analyses that had to be conducted were rather complicated and