Modeling Hydrologic Change: Statistical Methods - Chapter 7 pot


The runs test can be used to test for nonhomogeneity due to a trend or an episodic event. The Kendall test tests for nonhomogeneity associated with a trend. Correlation analyses can also be applied to a flood series to test for serial independence, with significance tests applied to assess whether an observed dependency is significant; the Pearson test and the Spearman test are commonly used to test for serial correlation. If a nonhomogeneity is thought to be episodic, separate flood frequency analyses can be done to detect differences in characteristics, with standard techniques used to assess the significance of the differences. The Mann–Whitney test is useful for detecting nonhomogeneity associated with an episodic event.

Four of these tests (all but the Pearson test) are classified as nonparametric. These tests can be applied directly to the discharges in the annual maximum series without making a logarithmic transform. With all four tests, exactly the same solution results whether the test is applied to the logarithms or to the untransformed data. This is not true for the Pearson test, which is parametric. Because a logarithmic transform is cited in Bulletin 17B (Interagency Advisory Committee on Water Data, 1982), the transform should also be applied when making the statistical test for the Pearson correlation coefficient.

The tests presented for detecting nonhomogeneity follow the six steps of hypothesis testing: (1) formulate hypotheses; (2) identify theory that specifies the test statistic and its distribution; (3) specify the level of significance; (4) collect the data and compute the sample value of the test statistic; (5) obtain the critical value of the test statistic and define the region of rejection; and (6) make a decision to reject the null hypothesis if the computed value of the test statistic lies in the region of rejection.

7.2 RUNS TEST

Statistical methods generally assume that hydrologic data measure random variables, with independence among measured values. The runs (or run) test is based on the mathematical theory of runs and can test a data sample for lack of randomness or independence (or conversely, serial correlation) (Siegel, 1956; Miller and Freund, 1965). The hypotheses follow:

H0: The data represent a sample of a single independently distributed random variable.

HA: The sample elements are not independent values.

If one rejects the null hypothesis, the acceptance of nonrandomness does not indicate the type of nonhomogeneity; it only indicates that the record is not homogeneous. In this sense, the runs test may detect a systematic trend or an episodic change. The test can be applied as a two-tailed or one-tailed test. It can be applied to the lower or upper tail of a one-tailed test.

The runs test is based on a sample of data for which two outcomes are possible, x1 or x2. These outcomes can be membership in two groups, such as exceedances or nonexceedances of a user-specified criterion such as the median. In the context of flood-record analysis, these two outcomes could be that the annual peak discharges exceed or do not exceed the median value for the flood record. A run is defined as a sequence of one or more of outcome x1 or outcome x2. In a sequence of n values, n1 and n2 indicate the number of outcomes x1 and x2, respectively, where n1 + n2 = n. The outcomes are determined by comparing each value in the data series with a user-specified criterion, such as the median, and indicating whether the data value exceeds (+) or does not exceed (−) the criterion. Values in the sequence that equal the median should be omitted from the sequences of + and − values. The solution procedure depends on sample size. If the values of n1 and n2 are both less than 20, the critical number of runs, nα, can be obtained from a table. If n1 or n2 is greater than 20, a normal approximation is made.
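The counting procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `count_runs` and the ten-value series are assumptions made for the example and do not come from Table 7.1. The median is used as the default criterion, and values equal to the criterion are dropped before the runs are counted.

```python
from statistics import median

def count_runs(values, criterion=None):
    """Return (n1, n2, U): counts above/below the criterion and the number of runs."""
    if criterion is None:
        criterion = median(values)
    # Values equal to the criterion are omitted, as the text recommends.
    signs = [v > criterion for v in values if v != criterion]
    n1 = sum(signs)              # exceedances (+)
    n2 = len(signs) - n1         # nonexceedances (-)
    # A new run starts at the first value and wherever the sign changes.
    runs = 0 if not signs else 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    return n1, n2, runs

# Hypothetical 10-year series of annual peaks (assumed values).
peaks = [120, 95, 130, 80, 70, 150, 160, 175, 90, 200]
n1, n2, U = count_runs(peaks)
print(n1, n2, U)
```

For small samples, the resulting U would then be compared with the tabulated critical number of runs for the computed n1 and n2.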

The theorem that specifies the test statistic for large samples is as follows: If the ordered (in time or space) sample data contain n1 and n2 values for the two possible outcomes, x1 and x2, respectively, in n trials, where both n1 and n2 are not small, the sampling distribution of the number of runs is approximately normal with mean, $\bar{U}$, and variance, $S_u^2$, which are approximated by:

$$\bar{U} = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \tag{7.1a}$$

$$S_u^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \tag{7.1b}$$

in which n1 + n2 = n. For a sample with U runs, the test statistic is (Draper and Smith, 1966):

$$z = \frac{U - \bar{U} - 0.5}{S_u} \tag{7.2}$$

where z is the value of a random variable that has a standard normal distribution. The 0.5 in Equation 7.2 is a continuity correction applied to help compensate for the use of a continuous (normal) distribution to approximate the discrete distribution of U. This theorem is valid for samples in which n1 or n2 exceeds 20.

If both n1 and n2 are less than 20, it is only necessary to compute the number of runs U and obtain critical values of U from appropriate tables (see Appendix Table A.5). A value of U less than or equal to the lower limit or greater than or equal to the upper limit is considered significant. The appropriate section of the table is used for a one-tailed test. The critical value depends on the number of values, n1 and n2. The typically available table of critical values is for a 5% level of significance when applied as a two-tailed test. When it is applied as a one-tailed test, the critical values are for a 2.5% level of significance.

The level of significance should be selected prior to analysis. For consistency and uniformity, the 5% level of significance is commonly used. Other significance levels can be justified on a case-by-case basis. Since the basis for using a 5% level of significance with hydrologic data is not documented, it is important to assess the effect of using the 5% level on the decision.

The runs test can be applied as a one-tailed or two-tailed test. If a direction is specified, that is, the test is one-tailed, then the critical value should be selected according to the specification of the alternative hypothesis. After selecting the characteristic that determines whether an outcome should belong to group 1 (+) or group 2 (−), the runs should be identified and n1, n2, and U computed. Equations 7.1a and 7.1b should be used to compute the mean and variance of U. The computed value of the test statistic z can then be determined with Equation 7.2.

For a two-tailed test, if the absolute value of z is greater than the critical value of z, the null hypothesis of randomness should be rejected; this implies that the values of the random variable are probably not randomly distributed. For a one-tailed test where a small number of runs would be expected, the null hypothesis is rejected if the computed value of z is less (i.e., more negative) than the critical value of z. For a one-tailed test where a large number of runs would be expected, the null hypothesis is rejected if the computed value of z is greater than the critical value of z. For the case where either n1 or n2 is greater than 20, the critical value of z is −zα or +zα depending on whether the test is for the lower or upper tail, respectively.

When applying the runs test to annual maximum flood data for which watershed changes may have introduced a systematic effect into the data, a one-sided test is typically used. Urbanization of a watershed may cause an increase in the central tendency of the peaks and a decrease in the coefficient of variation. Channelization may increase both the central tendency and the coefficient of variation. Where the primary effect of watershed change is to increase the central tendency of the annual maximum floods, it is appropriate to apply the runs test as a one-tailed test with a small number of runs. Thus, the critical z value would be a negative number, and the null hypothesis would be rejected when the computed z is more negative than the critical zα, which would be a negative value. For a small sample test, the null hypothesis would be rejected if the computed number of runs was less than or equal to the critical number of runs.
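As a hedged illustration of the large-sample procedure, the following Python sketch evaluates the mean and variance of the number of runs and the z statistic with the continuity correction of Equation 7.2. The function names and the sample values (n1 = n2 = 25, U = 16) are assumptions for the example; the default critical value of 1.645 is the standard normal deviate for a one-tailed 5% test.

```python
import math

def runs_test_z(n1, n2, U):
    """Large-sample runs-test statistic of Equations 7.1a, 7.1b, and 7.2."""
    n = n1 + n2
    mean_u = 2.0 * n1 * n2 / n + 1.0                                   # Eq. 7.1a
    var_u = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))   # Eq. 7.1b
    z = (U - mean_u - 0.5) / math.sqrt(var_u)                          # Eq. 7.2
    return mean_u, var_u, z

def reject_lower_tail(z, z_alpha=1.645):
    """One-tailed test for too few runs: reject H0 when z < -z_alpha."""
    return z < -z_alpha

# Hypothetical record with n1 = n2 = 25 and U = 16 runs (assumed values).
mean_u, var_u, z = runs_test_z(25, 25, 16)
print(f"mean = {mean_u:.2f}, variance = {var_u:.3f}, z = {z:.3f}")
print("reject H0:", reject_lower_tail(z))
```

The lower-tail decision is used here because a watershed change that raises the central tendency of the peaks is expected to produce fewer runs than a random sequence would.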

Example 7.1

The runs test can be used to determine whether urban development caused an increase in annual peak discharges. It was applied to the annual flood series of the rural Nolin River and the urbanized Pond Creek watersheds to test the following null (H0) and alternative (HA) hypotheses:

H0: The annual peak discharges are randomly distributed from 1945 to 1968, and thus a significant trend is not present.

HA: A significant trend in the annual peak discharges exists since the annual peaks are not randomly distributed.

The flood series is represented in Table 7.1 by a series of + and − symbols. The criterion that designates a + or − event is the median flow (i.e., the flow exceeded or not exceeded as an annual maximum in 50% of the years). For the Pond Creek and North Fork of the Nolin River watersheds, the median values are 2175 ft³/sec and 4845 ft³/sec, respectively (see Table 7.1). If urbanization caused an increase in discharge rates, then the series should have significantly more + symbols in the part of the series corresponding to greater urbanization and significantly more − symbols before urbanization. The computed number of runs would be small, so a one-tailed test should be applied. While rejection of the null hypothesis does not necessarily prove that urbanization caused a trend in the annual flood series, the investigator may infer such a cause.

The Pond Creek series has only two runs (see Table 7.1). All values before 1956 are less than the median and all values after 1956 are greater than the median. Thus, n1 = n2 = 12. The critical value of 7 was obtained from Table A.5. The null hypothesis should be rejected if the number of runs in the sample is less than or equal to 7. Since a one-tailed test was used, the level of significance is 0.025. Because the sequence includes only two runs for Pond Creek, the null hypothesis should be rejected. The rejection indicates that the data are nonrandom. The increase in urbanization after 1956 may be a causal factor for this nonrandomness.

For the North Fork of the Nolin River, the flood series contains 14 runs (see Table 7.1). Because n1 and n2 are the same as for the Pond Creek analysis, the critical value of 7 applies here also. Since the number of runs is greater than 7, the null hypothesis of randomness cannot be rejected. Since the two watersheds are located near each other, the trend in the flood series for Pond Creek is probably not due to an increase in rainfall. (In a real-world application, rainfall data should be examined for trends as well.) Thus, it is probably safe to conclude that the flooding trend for Pond Creek is due to urban development in the mid-1950s.

7.2.1 RATIONAL ANALYSIS OF RUNS TEST

Like every statistical test, the runs test is limited in its ability to detect the influence of a systematic factor such as urbanization. If the variation of the systematic effect is small relative to the variation introduced by the random processes, then the runs test may suggest randomness. In such a case, all of the variation may be attributed to the effects of the random processes.

In addition to the relative magnitudes of the variations due to random processes and the effects of watershed change, the ability of the runs test to detect the effects of watershed change will depend on its temporal variation. Two factors are important. First, change can occur abruptly over a short time or gradually over the duration of a flood record. Second, an abrupt change may occur near the center, beginning, or end of the period of record. These factors must be understood when assessing the results of a runs test of an annual maximum flood series.

Before rationally analyzing the applicability of the runs test for detecting hydrologic change, summarizing the three important factors is worthwhile:

1. Is the variation introduced by watershed change small relative to the variation due to the randomness of rainfall and watershed processes?

2. Has the watershed change occurred abruptly over a short part of the length of record or gradually over most of the record length?

3. If the watershed change occurred over a short period, was it near the center of the record or at one of the ends?

Answers to these questions will help explain the rationality of the results of a runs test and other tests discussed in this chapter.

Responses to the above three questions will include examples to demonstrate the general concepts. Studies of the effects of urbanization have shown that the more frequent events of a flood series may increase by a factor of two for large increases in imperviousness. For example, the peaks in the later part of the flood record for Pond Creek are approximately double those from the preurbanization portion of the flood record. Furthermore, variation due to the random processes of rainfall and watershed conditions appears relatively minimal, so the effects of urbanization are apparent (see Figure 2.4). The annual maximum flood record for the Ramapo River at Pompton Lakes, New Jersey (1922 through 1991) is shown in Figure 7.1. The scatter is very large, and an urbanization trend is not immediately evident. Most urban development occurred before 1968, and the floods of record then appear smaller than floods that occurred in the late 1960s. However, the random scatter largely prevents the identification of effects of urbanization from the graph. When the runs test is applied to the series, the computed test statistic of Equation 7.2 equals zero, so the null hypothesis of randomness cannot be rejected. In contrast to the series for Pond Creek, the large random scatter in the Ramapo River series masks the variation due to urbanization.

The nature of a trend is also an important consideration in assessing the effect of urbanization on the flows of an annual maximum series. Urbanization of the Pond Creek watershed occurred over a short part of the total record length; this is evident in Figure 2.4. In contrast, Figure 7.2 shows the annual flood series for the Elizabeth River, at Elizabeth, New Jersey, for a 65-year period. While the effects of the random processes are evident, the flood magnitudes show a noticeable increase. Many floods at the start of the record are below the median, while the opposite is true for later years. This causes a small number of runs, with the shorter runs near the center of record. The computed z statistic for the runs test is −3.37, which is significant at the 0.0005 level. Thus, a gradual trend, especially with minimal variation due to random processes, produces a significant value for the runs test. More significant random effects may mask the hydrologic effects of gradual urban development.

FIGURE 7.1 Annual maximum peak discharges for Ramapo River at Pompton Lakes, New Jersey.

Watershed change that occurs over a short period, such as that in Pond Creek, can lead to acceptance or rejection of the null hypothesis for the runs test. When the abrupt change is near the middle of the series, the two sections of the record will have similar lengths; thus, the median of the series will fall between the two sections, giving the characteristic appearance of two runs. Random scatter may add a few runs, but the number of runs quite possibly will still be less than the critical number of runs. Thus, the null hypothesis will be rejected. Conversely, if the change due to urbanization occurs near either end of the record length, the record will have short and long sequences. The median of the flows will fall in the longer sequence; thus, if the random effects are even moderate, the flood series will have a moderate number of runs, and the results of a runs test will suggest randomness.
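The effect of the location of an abrupt change can be illustrated with two synthetic 24-year series. The sketch below is an assumption-laden illustration rather than an analysis of measured data: the flow values are invented, the helper `number_of_runs` is a name chosen for this example, and the scatter in the second series is a deterministic high-low pattern standing in for random variation.

```python
from statistics import median

def number_of_runs(values):
    """Count runs of values above (+) or below (-) the series median."""
    m = median(values)
    signs = [v > m for v in values if v != m]   # drop values equal to the median
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))

# Synthetic 24-year series (assumed values, not measured flows).
# Case 1: abrupt upward shift at mid-record, with no scatter.
mid_shift = [10] * 12 + [100] * 12

# Case 2: the same kind of shift confined to the last four years, preceded by
# 20 years that alternate between low (10-19) and high (30-39) flows.
scatter = [v for i in range(10) for v in (10 + i, 30 + i)]
end_shift = scatter + [100] * 4

runs_mid = number_of_runs(mid_shift)   # the median falls between the two sections
runs_end = number_of_runs(end_shift)   # the median falls inside the scattered section
print(runs_mid, runs_end)
```

Both series have n1 = n2 = 12, so the critical value of 7 quoted in Example 7.1 applies to each. The mid-record shift yields far fewer runs than the critical value, while the late shift yields many more, which is consistent with the reasoning in the preceding paragraph.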

It is important to assess the type (gradual or abrupt) of trend and the location (middle or end) of an abrupt trend. This is evident from a comparison of the series for Pond Creek, Kentucky, and Rahway River in New Jersey. Figure 7.3 shows the annual flood series for the Rahway River. The effect of urbanization appears in the later part of the record. The computed z statistic for the runs test is −1.71, which is not significant at the 5% level, thus suggesting that randomness can be assumed.

7.3 KENDALL TEST FOR TREND

Hirsch, Slack, and Smith (1982) and Taylor and Loftis (1989) provide assessments of the Kendall nonparametric test. The test is intended to assess the randomness of a data sequence x_i; specifically, the hypotheses (Hirsch, Slack, and Smith, 1982) are:

FIGURE 7.2 Annual maximum peak discharges for Elizabeth River, New Jersey.

H0: The annual maximum peak discharges (x_i) are a sample of n independent and identically distributed random variables.

HA: The distributions of x_j and x_k are not identical for all k, j ≤ n with k ≤ j.

The test is designed to detect a monotonically increasing or decreasing trend in the data rather than an episodic or abrupt event. The above HA alternative is two-sided, which is appropriate if a trend can be direct or inverse. If a direction is specified, then a one-tailed alternative must be specified. Gradual urbanization would cause a direct trend in the annual flood series. Conversely, afforestation can cause an inverse trend in an annual flood series. For the direct (inverse) trend in a series, the one-sided alternative hypothesis would be:

HA: A direct (inverse) trend exists in the distribution of x_j and x_k.

The theorem defining the test statistic is as follows. If x_j and x_k are independent and identically distributed random values, the statistic S is defined as:

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \Theta(x_j - x_k) \tag{7.3}$$

where

$$\Theta(x_j - x_k) = \begin{cases} +1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases} \tag{7.4}$$

For sample sizes of 30 or larger, tests of the hypothesis can be made using the following test statistic:

$$z = \begin{cases} (S - 1)/V^{0.5} & \text{for } S > 0 \\ 0 & \text{for } S = 0 \\ (S + 1)/V^{0.5} & \text{for } S < 0 \end{cases} \tag{7.5a}$$

in which z is the value of a standard normal deviate, n is the sample size, and V is the variance of S:

$$V = \frac{n(n-1)(2n+5) - \sum_{i=1}^{g} t_i (t_i - 1)(2 t_i + 5)}{18} \tag{7.5b}$$

in which g is the number of groups of tied values and t_i is the number of values in tied group i. The first term of Equation 7.5b applies to a series that did not include ties, and Kendall (1975) provided the adjustment shown as the second term of Equation 7.5b. Kendall points out that the normal approximation of Equation 7.5a should provide accurate decisions for samples as small as 10, but it is usually applied when n ≥ 30. For sample sizes below 30, the following τ statistic can be used when the series does not include ties:

$$\tau = \frac{2S}{n(n-1)} \tag{7.6}$$

Equation 7.6 should not be used when the series includes discharges of the same magnitude; in such cases, a correction for ties can be applied (Gibbons, 1976).

After the sample value of the test statistic z is computed with Equation 7.5 and a level of significance α selected, the null hypothesis can be tested. Critical values of Kendall's τ are given in Table A.6 for small samples. For large samples with a two-tailed test, the null hypothesis H0 is rejected if z is greater than the standard normal deviate zα/2 or less than −zα/2. For a one-sided test, the critical values are zα for a direct trend and −zα for an inverse trend. If the computed value is greater than zα for the direct trend, then the null hypothesis can be rejected; similarly, for an inverse trend, the null hypothesis is rejected when the computed z is less (i.e., more negative) than −zα.
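The Kendall statistics can be sketched in Python as follows. This is a minimal illustration that assumes the usual forms of the statistics: S as the sum of the signs of all pairwise differences (Equation 7.3), the variance with the tie adjustment (Equation 7.5b), the normal approximation (Equation 7.5a), and τ = 2S/[n(n − 1)] (Equation 7.6). The function names and the eight-value series are hypothetical.

```python
import math
from collections import Counter

def kendall_s(x):
    """Eq. 7.3: sum of the signs of (x_j - x_k) over all pairs with j > k."""
    s = 0
    for k in range(len(x) - 1):
        for j in range(k + 1, len(x)):
            s += (x[j] > x[k]) - (x[j] < x[k])   # +1, 0, or -1
    return s

def kendall_variance(x):
    """Eq. 7.5b, including the adjustment for groups of tied values."""
    n = len(x)
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1)
    return (n * (n - 1) * (2 * n + 5) - ties) / 18.0

def kendall_z(s, variance):
    """Eq. 7.5a: normal approximation with the +/-1 adjustment to S."""
    if s > 0:
        return (s - 1) / math.sqrt(variance)
    if s < 0:
        return (s + 1) / math.sqrt(variance)
    return 0.0

def kendall_tau(s, n):
    """Eq. 7.6 (series without ties): tau = 2S / [n(n - 1)]."""
    return 2.0 * s / (n * (n - 1))

# Hypothetical 8-value series (assumed values, not from the text).
flows = [3.1, 2.4, 4.0, 3.6, 5.2, 4.8, 6.1, 5.9]
S = kendall_s(flows)
tau = kendall_tau(S, len(flows))
z = kendall_z(S, kendall_variance(flows))
print(S, round(tau, 3), round(z, 3))
```

For a record this short, τ would be compared with tabulated critical values; the z value is computed only to show how Equations 7.5a and 7.5b fit together.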

Example 7.2

Since there are 33 + and 12 − values, S of Equation 7.3 is 21. Equation 7.6 yields the following sample value of τ:

$$\tau = \frac{2(21)}{10(10 - 1)} = 0.467$$

Since the sample size is ten, critical values are obtained from tables, with the following tabular summary of the decision for a one-tailed test:

Thus, for a 5% level the null hypothesis is rejected, which suggests that the data contain a trend. At smaller levels of significance, the test would not suggest a trend in the sequence.

Example 7.3

The 50-year annual maximum flood record for the Northwest Branch of the Anacostia River watershed (Figure 2.1) was analyzed for trend. Since the record length is greater than 30, the normal approximation of Equation 7.5 is used:

$$z = \frac{459}{119.54} = 3.84 \tag{7.7}$$

Because the Northwest Branch of the Anacostia River has undergone urbanization, the one-sided alternative hypothesis for a direct trend is studied. Critical values of z for 5% and 0.1% levels of significance are 1.645 and 3.09, respectively. Thus, the computed value of 3.84 is significant, and the null hypothesis is rejected. The test suggests that the flood series reflects an increasing trend that we may infer resulted from urban development within the watershed.

Example 7.4

The two 24-year, annual-maximum flood series in Table 7.1 for Pond Creek and the North Fork of the Nolin River were analyzed for trend. The two adjacent watersheds have the same meteorological conditions. Since the sample sizes are below 30, Equation 7.6 will be used for the tests. S is 150 for Pond Creek and 30 for Nolin River. Therefore, the computed τ for Pond Creek is:

$$\tau = \frac{2(150)}{24(24 - 1)} = 0.543$$

For Nolin River, the computed τ is:

$$\tau = \frac{2(30)}{24(24 - 1)} = 0.109$$

For levels of significance of 5, 2.5, 1, and 0.5%, the critical values are 0.239, 0.287, 0.337, and 0.372, respectively. Thus, even at a level of significance of 0.5%, the null hypothesis would be rejected for Pond Creek. For Nolin River, the null hypothesis cannot be rejected at a 5% level of significance. The results show that the Pond Creek series is nonhomogeneous, which may have resulted from the trend in urbanization. Since the computed τ of 0.109 is much less than any of the critical values, the test does not indicate a trend in the series for the North Fork of the Nolin River.
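The arithmetic of this example can be checked with a short Python sketch. It uses only the values of S, the sample size, and the critical values quoted in the example; the function name `kendall_tau` and the dictionary layout are choices made for illustration.

```python
def kendall_tau(s, n):
    """Equation 7.6: tau = 2S / [n(n - 1)] for a series without ties."""
    return 2.0 * s / (n * (n - 1))

# Critical tau values quoted in Example 7.4, keyed by level of significance.
critical = {0.05: 0.239, 0.025: 0.287, 0.01: 0.337, 0.005: 0.372}

n = 24
for name, s in (("Pond Creek", 150), ("North Fork, Nolin River", 30)):
    tau = kendall_tau(s, n)
    # One-sided test for a direct trend: reject H0 when tau exceeds the critical value.
    decisions = {alpha: tau > t_crit for alpha, t_crit in critical.items()}
    print(f"{name}: tau = {tau:.3f}, reject H0 = {decisions}")
```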

7.3.1 RATIONALE OF KENDALL STATISTIC

The random variable S is used for both the Kendall τ of Equation 7.6 and the normal approximation of Equation 7.5. If a sequence consists of alternating high-flow and low-flow values, the summation of Equation 7.3 would be the sum of alternating +1 and −1 values, which would yield a near-zero value for S. Such a sequence is considered random, so the null hypothesis should not be rejected for a near-zero value. Conversely, if the sequence consisted of a series of increasingly larger flows, such as for deforestation, which would indicate a direct trend, then each Θ of Equation 7.3 would be +1, so S would be a large value. If the flows showed an inverse trend, such as for afforestation, then the summation of Equation 7.3 would consist of values of −1, so S would be a large negative value. The maximum possible value of S for a sequence of n flows is n(n − 1)/2, so the ratio of Equation 7.6, 2S/[n(n − 1)], will vary from −1 for an inverse trend to +1 for a direct trend. A value of zero indicates the absence of a trend (i.e., randomness).

For the normal approximation of Equation 7.5, the z statistic has the form of the standard normal transformation equation: $z = (x - \bar{x})/s$, where $\bar{x}$ is the mean and s is the standard deviation. For Equation 7.5, S is the random variable, a mean of zero is inherent in the null hypothesis of randomness, and the denominator is the standard deviation of S. Thus, the null hypothesis of the Kendall test is not rejected for values of z that are not significantly different from zero.
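The limiting behavior of S described above can be illustrated with three short synthetic sequences. The sketch below assumes S is the sum of the signs of all pairwise differences; the sequences are invented for the illustration, and the alternating one contains ties, so only S (not the τ of Equation 7.6) is evaluated for it.

```python
def kendall_s(x):
    """Sum of sign(x_j - x_k) over all pairs with j > k (Equation 7.3)."""
    return sum((x[j] > x[k]) - (x[j] < x[k])
               for k in range(len(x) - 1) for j in range(k + 1, len(x)))

n = 6
max_s = n * (n - 1) // 2             # largest possible |S| for n values

rising = [1, 2, 3, 4, 5, 6]          # direct trend
falling = rising[::-1]               # inverse trend
alternating = [2, 1, 2, 1, 2, 1]     # high-low pattern without a trend

s_rising = kendall_s(rising)         # every pairwise difference is positive
s_falling = kendall_s(falling)       # every pairwise difference is negative
s_alternating = kendall_s(alternating)
print(max_s, s_rising, s_falling, s_alternating)
```

The monotonic sequences reach the bounds of plus or minus n(n − 1)/2, whereas the alternating sequence gives a value of S that is small relative to that bound.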

The Kendall test statistic depends on the difference in magnitude between every pair of values in the series, not just adjacent values. For a series in which an abrupt watershed change occurred, there will be more changes in sign of the (x_j − x_k) value of Equation 7.3, which will lead to a value of S that is relatively close to zero. This is especially true if the abrupt change is near one of the ends of the flood series. For a gradual watershed change, a greater number of positive values of (x_j − x_k) will occur. Thus, the test will suggest a trend. In summary, the Kendall test may detect watershed changes due to either gradual trends or abrupt events. However, it appears to be more sensitive to changes that result from gradually changing trends.

7.4 PEARSON TEST FOR SERIAL INDEPENDENCE

If a watershed change, such as urbanization, introduces a systematic variation into a flood record, the values in the series will exhibit a measure of serial correlation. For example, if the percentage of imperviousness gradually increases over all or a major part of the flood record, then the increase in the peak floods that results from the higher imperviousness will introduce a measure of correlation between adjacent flood peaks. This correlation violates the assumptions of independence and stationarity that are required for frequency analysis.

The serial correlation coefficient is a measure of common variation between adjacent values in a time series. In this sense, serial correlation, or autocorrelation, is a univariate statistic, whereas a correlation coefficient is generally associated with the relationship between two variables. The computational objective of a correlation analysis is to determine the degree of correlation in adjacent values of a time or space series and to test the significance of the correlation. The nonstationarity of an annual flood series as caused by watershed changes is the most likely hydrologic reason for the testing of serial correlation. In this sense, the tests for serial correlation are used to detect nonstationarity and nonhomogeneity. Serial correlation in a data set does not necessarily imply nonhomogeneity.

The Pearson correlation coefficient (McNemar, 1969; Mendenhall and Sincich, 1992) can be used to measure the association between adjacent values in an ordered sequence of data. For example, in assessing the effect of watershed change on an annual flood series, the correlation would be between values for adjacent years in a sequential record. The correlation coefficient could be computed for either the measured flows or their logarithms, but the use of logarithms is recommended when analyzing annual maximum flood records. The two values will differ, but the difference is usually not substantial except when the sample size is small. The hypotheses for the Pearson serial independence test are:

H0: ρ = 0

HA: ρ ≠ 0

in which ρ is the serial correlation coefficient of the population. If appropriate for a particular problem, a one-tailed alternative hypothesis can be used, either ρ > 0 or ρ < 0. As an example in the application of the test to annual maximum flood data, the hypotheses would be:

H0: The logarithms of the annual maximum peak discharges represent a sequence of n independent events.

HA: The logarithms of the annual maximum peak discharges are not serially independent and show a positive association.

The alternative hypothesis is stated as a one-tailed test in that the direction of the serial correlation is specified. The one-tailed alternative is used almost exclusively in serial correlation analysis.

Given a sequence of measurements on the random variable x_i (for i = 1, 2, …, n), the statistic for testing the significance of a Pearson R is:

$$t = \frac{R}{[(1 - R^2)/(n - 3)]^{0.5}} \tag{7.8}$$

in which the serial correlation coefficient R is computed from adjacent values of the sequence:

$$R = \frac{\sum_{i=1}^{n-1} x_i x_{i+1} - \frac{1}{n-1}\left(\sum_{i=1}^{n-1} x_i\right)\left(\sum_{i=2}^{n} x_i\right)}{\left[\sum_{i=1}^{n-1} x_i^2 - \frac{1}{n-1}\left(\sum_{i=1}^{n-1} x_i\right)^2\right]^{0.5} \left[\sum_{i=2}^{n} x_i^2 - \frac{1}{n-1}\left(\sum_{i=2}^{n} x_i\right)^2\right]^{0.5}} \tag{7.9}$$

Note that for a data sequence of n values, only n − 1 pairs are used to compute the value of R. For a given level of significance α and a one-tailed alternative hypothesis, the null hypothesis should be rejected if the computed t is greater than t_{v,α}, where v = n − 3 is the degrees of freedom. Values of t_{v,α} can be obtained from Table A.2. For a two-tailed test, tα/2 is used rather than tα. For serial correlation analysis, the one-tailed positive correlation is generally tested. Rejection of the null hypothesis would imply that the measurements of the random variable are not independent. The serial correlation coefficient will be positive for both an increasing trend and a decreasing trend. When the Pearson correlation coefficient is applied to bivariate data, the slope of the relationship between the two random variables determines the sign of the correlation coefficient. In serial correlation analysis of a single data sequence, only the one-sided upper test is generally meaningful.

Example 7.5

To demonstrate the computation of the Pearson R for data sequences that include dominant trends, consider the annual maximum flows for two adjacent watersheds, one undergoing deforestation (A), which introduces an increasing trend, and one undergoing afforestation (B), which introduces a decreasing trend. The two data sets are given in Table 7.2.

The Pearson R for the increasing trend is:

The Pearson R for the decreasing trend is:

Both are positive values because the sign of a serial correlation coefficient does not reflect the slope of the trend. The serial correlation for sequence A is higher than that for B because it is a continuously increasing trend, whereas the data for B include a rise in the third year of the record.

Using Equation 7.8, the computed values of the test statistic are:

For a sample size of 7, 4 is the number of degrees of freedom for both tests. Therefore, the critical t for 4 degrees of freedom and a level of significance of 5% is 2.132. The trend causes a significant serial correlation in sequence A. The trend in series B is not sufficiently dominant to conclude that the trend is significant.

Example 7.6

The Pearson R was computed using the 24-year annual maximum series for the Pond Creek and North Fork of the Nolin River watersheds (Table 7.1). For Pond Creek, the sample correlation for the logarithms of flow is 0.72 and the computed t is 4.754 according to Equation 7.8. For 21 degrees of freedom and a level of significance of 0.01, the critical t value is 2.518, implying that the computed R value is statistically significantly different from zero. For the North Fork, the sample correlation is 0.065 and the computed t is 0.298 according to Equation 7.8. This t value is not statistically significantly different from zero even at a significance level of 0.60.
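The Pearson serial correlation test of this section can be sketched in Python under two assumptions: that R is the ordinary Pearson coefficient computed from the n − 1 adjacent pairs, and that the test statistic is t = R/[(1 − R²)/(n − 3)]^0.5, which reproduces the t values quoted in Example 7.6. The function names and the eight flows are hypothetical.

```python
import math

def lag_one_pearson(x):
    """Pearson correlation between (x_1..x_{n-1}) and (x_2..x_n)."""
    a, b = x[:-1], x[1:]              # the n - 1 adjacent pairs
    m = len(a)
    sa, sb = sum(a), sum(b)
    num = sum(p * q for p, q in zip(a, b)) - sa * sb / m
    den = math.sqrt((sum(p * p for p in a) - sa * sa / m)
                    * (sum(q * q for q in b) - sb * sb / m))
    return num / den

def serial_t(r, n):
    """t statistic with n - 3 degrees of freedom (Equation 7.8)."""
    return r / math.sqrt((1.0 - r * r) / (n - 3))

# Check against Example 7.6: R = 0.72 and n = 24 for Pond Creek.
t_pond = serial_t(0.72, 24)
print(round(t_pond, 3))

# Hypothetical flows (assumed values); logarithms are taken first,
# as the text recommends for annual maximum series.
flows = [410.0, 380.0, 520.0, 495.0, 610.0, 700.0, 660.0, 820.0]
r = lag_one_pearson([math.log10(q) for q in flows])
print(round(r, 3), round(serial_t(r, len(flows)), 3))
```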

7.5 SPEARMAN TEST FOR TREND

The Spearman correlation coefficient (R_s) (Siegel, 1956) is a nonparametric alternative to the Pearson R, which is a parametric test. Unlike the Pearson R test, it is not necessary to make a log transform of the values in a sequence since the ranks of the logarithms would be the same as the ranks for the untransformed data. The hypotheses for a direct trend (one-sided) are:

H0: The values of the series represent a sequence of n independent events.

HA: The values show a positive correlation.

Neither the two-tailed alternative nor the one-tailed alternative for negative correlation is appropriate for watershed change.

The Spearman test for trend uses two arrays, one for the criterion variable and one for an independent variable. For example, if the problem were to assess the effect of urbanization on flood peaks, the annual flood series would be the criterion variable array and a series that represents a measure of the watershed change would be the independent variable. The latter might include the fraction of forest cover for afforestation or deforestation or the percentage of imperviousness for urbanization of a watershed. Representing the two series as x_i and y_i, the rank of each item within each series separately is determined, with a rank of 1 for the smallest value and a rank of n for the largest value. The ranks are represented by r_xi and r_yi, with the i corresponding to the ith magnitude.

Using the ranks for the paired values r_xi and r_yi, the value of the Spearman coefficient R_s is computed using:

$$R_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{7.10}$$

in which d_i = r_xi − r_yi is the difference between the ranks of the ith pair.

For sample sizes greater than ten, the following statistic can be used to test the above hypotheses:

$$t = \frac{R_s}{[(1 - R_s^2)/(n - 2)]^{0.5}} \tag{7.11}$$

where t follows a Student's t distribution with n − 2 degrees of freedom. For a one-sided test for a direct trend, the null hypothesis is rejected when the computed t is greater than the critical tα for n − 2 degrees of freedom.

To test for trend, the Spearman coefficient is determined by Equation 7.10 and the test applies Equation 7.11. The Spearman coefficient and the test statistic are based on the Pearson coefficient that assumes that the values are from a circular, normal, stationary time series (Haan, 1977). The transformation from measurements on a continuous scale to an ordinal scale (i.e., ranks) eliminates the sensitivity to the normality assumption. The circularity assumption will not be a factor because each flood measurement is transformed to a rank.
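A minimal Python sketch of the Spearman computation follows. It assumes the standard rank-difference form R_s = 1 − 6Σd_i²/(n³ − n) and the t statistic with n − 2 degrees of freedom, and it assumes that neither series contains ties. The function names and the paired values of imperviousness and peak discharge are hypothetical; the series is kept to five pairs only for readability, although the t statistic is intended for samples larger than ten.

```python
import math

def ranks(values):
    """Rank 1 for the smallest value through n for the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    """R_s = 1 - 6 * sum(d_i^2) / (n^3 - n), with d_i = r_xi - r_yi."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)

def spearman_t(rs, n):
    """t statistic with n - 2 degrees of freedom; undefined when |R_s| = 1."""
    return rs / math.sqrt((1.0 - rs * rs) / (n - 2))

# Hypothetical paired series: imperviousness (%) and annual peak discharge.
imperv = [5, 8, 12, 15, 20]
peaks = [900, 850, 1200, 1100, 1500]
rs = spearman_rs(imperv, peaks)
t = spearman_t(rs, len(peaks))
print(round(rs, 3), round(t, 3))
```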

(7.12)

The test statistic of Equation 7.11 is:

7.5.1 RATIONALE FOR SPEARMAN TEST

The Spearman test is more likely to detect a trend in a series that includes gradual variation due to watershed change than in a series that includes an abrupt change. For an abrupt change, the two partial series will likely have small differences (d_i) because they will reflect only the random variation in the series. For a gradual change, both the systematic and random variation are present throughout the series, which results in larger differences (d_i). Thus, it is more appropriate to use the Spearman serial correlation coefficient for hydrologic series where a gradual trend has been introduced by watershed change than where the change occurs over a short part of the flood record.

[Table column headings: Rank of Discharge, r_q; Imperviousness (%); Rank of Imperviousness]

7.6 SPEARMAN–CONLEY TEST

Recommendations have been made to use the Spearman R_s as a bivariate correlation by inserting the ordinal integer as a second variable. Thus, the x_i values would be the sequential values of the random variable and the values of i from 1 to n would be the second variable. This is incorrect because the integer values of i are not truly values of a random variable and the critical values are not appropriate for the test. The Spearman–Conley test (Conley and McCuen, 1997) enables the Spearman statistic to be used where values of the independent variable are not available.

In many cases, the record for the land-use-change variable is incomplete. Typically, records of imperviousness are sporadic, for example, aerial photographs taken on an irregular basis. They may not be available on a year-to-year basis. Where a complete record of the land use change variable is not available and interpolation will not yield accurate estimates of land use, the Spearman test cannot be used.

The Spearman–Conley test is an alternative that can be used to test for serial correlation where the values of the independent variable are incomplete. The Spearman–Conley test is univariate in that only values of the criterion variable are used. The steps for applying it are as follows:

1. State the hypotheses. For this test, the hypotheses are:

H0: The sequential values of the random variable are serially independent.

HA: Adjacent values of the random variable are serially correlated.

As an example for the case of a temporal sequence of annual maximum discharges, the following hypotheses would be appropriate:

H0: The annual flood peaks are serially independent.

HA: Adjacent values of the annual flood series are correlated.

For a flood series suspected of being influenced by urbanization, the alternative hypothesis could be expressed as a one-tailed test with an indication of positive correlation. Significant urbanization would cause the peaks to increase, which would produce a positive correlation coefficient. Similarly, afforestation would likely reduce the flood peaks over time, so a one-sided test for negative serial correlation would be expected.

2. Specify the test statistic. Equation 7.10 can also be used as the test statistic for the Spearman–Conley test. However, it will be denoted as R_sc. In applying it, the value of n is the number of pairs, which is 1 less than the number of annual maximum flood magnitudes in the record. To compute the value of R_sc, a second series X_t is formed, where X_t = Y_{t−1}. The values of the two series are then ranked in the same manner as for the Spearman test, and Equation 7.10 is used to compute the value of R_sc.

3. Set the level of significance. Again, this is usually set by convention,
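The construction of the lagged series in step 2 can be sketched as follows. The sketch assumes the rank-difference form of the Spearman coefficient, R = 1 − 6Σd²/(n³ − n), with n taken as the number of pairs, and it assumes no tied values. The function names and the five-value series are hypothetical.

```python
def ranks(values):
    """Rank 1 for the smallest value through n for the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_conley(series):
    """R_sc: Spearman coefficient of the lagged pairs (Y_{t-1}, Y_t)."""
    x = series[:-1]                  # X_t = Y_{t-1}
    y = series[1:]                   # Y_t
    n = len(x)                       # number of pairs: record length minus 1
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)

# Hypothetical annual peaks (assumed values, not from the text).
peaks = [3, 1, 4, 2, 5]
r_sc = spearman_conley(peaks)
print(round(r_sc, 3))
```

Each of the two lagged series is ranked separately, so a value that appears in both X_t and Y_t can receive different ranks in the two series.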

Subject: Hydrology
Year of publication: 2003