The runs test can be used to test for nonhomogeneity due to a trend or an episodic event. The Kendall test tests for nonhomogeneity associated with a trend. Correlation analyses can also be applied to a flood series to test for serial independence, with significance tests applied to assess whether an observed dependency is significant; the Pearson test and the Spearman test are commonly used to test for serial correlation. If a nonhomogeneity is thought to be episodic, separate flood frequency analyses can be done to detect differences in characteristics, with standard techniques used to assess the significance of the differences. The Mann–Whitney test is useful for detecting nonhomogeneity associated with an episodic event.
Four of these tests (all but the Pearson test) are classified as nonparametric. These tests can be applied directly to the discharges in the annual maximum series without making a logarithmic transform. With all four tests, exactly the same result is obtained whether the test is applied to the logarithms or to the untransformed data, because a logarithmic transform preserves the order of the values. This is not true for the Pearson test, which is parametric. Because a logarithmic transform is cited in Bulletin 17B (Interagency Advisory Committee on Water Data, 1982), the transform should also be applied when making the statistical test for the Pearson correlation coefficient.
The tests presented for detecting nonhomogeneity follow the six steps of hypothesis testing: (1) formulate hypotheses; (2) identify theory that specifies the test statistic and its distribution; (3) specify the level of significance; (4) collect the data and compute the sample value of the test statistic; (5) obtain the critical value of the test statistic and define the region of rejection; and (6) make a decision to reject the null hypothesis if the computed value of the test statistic lies in the region of rejection.
7.2 RUNS TEST
Statistical methods generally assume that hydrologic data measure random variables, with independence among measured values. The runs (or run) test is based on the mathematical theory of runs and can test a data sample for lack of randomness or independence (or conversely, serial correlation) (Siegel, 1956; Miller and Freund, 1965). The hypotheses follow:

H0: The data represent a sample of a single independently distributed random variable.
HA: The sample elements are not independent values.

If one rejects the null hypothesis, the acceptance of nonrandomness does not indicate the type of nonhomogeneity; it only indicates that the record is not homogeneous. In this sense, the runs test may detect a systematic trend or an episodic change. The test can be applied as a two-tailed or one-tailed test. As a one-tailed test, it can be applied to either the lower or the upper tail.
The runs test is based on a sample of data for which two outcomes are possible, x1 or x2. These outcomes can be membership in two groups, such as exceedances or nonexceedances of a user-specified criterion such as the median. In the context of flood-record analysis, these two outcomes could be that the annual peak discharges exceed or do not exceed the median value for the flood record. A run is defined as a sequence of one or more of outcome x1 or outcome x2. In a sequence of n values, n1 and n2 indicate the number of outcomes x1 and x2, respectively, where n1 + n2 = n. The outcomes are determined by comparing each value in the data series with a user-specified criterion, such as the median, and indicating whether the data value exceeds (+) or does not exceed (−) the criterion. Values in the sequence that equal the median should be omitted from the sequences of + and − values. The solution procedure depends on sample size. If the values of n1 and n2 are both less than 20, the critical number of runs, nα, can be obtained from a table. If n1 or n2 is greater than 20, a normal approximation is made.
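The classification into + and − outcomes and the counting of runs can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sign_sequence` and `count_runs` and the discharge values are hypothetical and are not taken from any table in the text.

```python
from statistics import median

def sign_sequence(values, criterion=None):
    """Return '+'/'-' symbols for values above/below the criterion.

    Values equal to the criterion are omitted, as the text recommends.
    If no criterion is supplied, the sample median is used.
    """
    if criterion is None:
        criterion = median(values)
    return ["+" if v > criterion else "-" for v in values if v != criterion]

def count_runs(signs):
    """Return (n1, n2, U): the number of '+' symbols, '-' symbols, and runs."""
    n1 = signs.count("+")
    n2 = signs.count("-")
    # A run starts at the first symbol and at every change of symbol.
    runs = sum(1 for i, s in enumerate(signs) if i == 0 or s != signs[i - 1])
    return n1, n2, runs

# Hypothetical annual peak discharges (illustration only).
peaks = [310, 280, 450, 390, 520, 610, 300, 700, 650, 720]
signs = sign_sequence(peaks)
n1, n2, U = count_runs(signs)
print("".join(signs), n1, n2, U)
```

For these ten hypothetical values the median is 485, the sign sequence is − − − − + + − + + +, and the sequence contains four runs with n1 = n2 = 5.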
The theorem that specifies the test statistic for large samples is as follows: If the ordered (in time or space) sample data contain n1 and n2 values for the two possible outcomes, x1 and x2, respectively, in n trials, where both n1 and n2 are not small, the sampling distribution of the number of runs is approximately normal with mean, $\bar{U}$, and variance, $S_u^2$, which are approximated by:

$$\bar{U} = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \tag{7.1a}$$

$$S_u^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \tag{7.1b}$$

in which n1 + n2 = n. For a sample with U runs, the test statistic is (Draper and Smith, 1966):

$$z = \frac{U - \bar{U} - 0.5}{S_u} \tag{7.2}$$

where z is the value of a random variable that has a standard normal distribution. The 0.5 in Equation 7.2 is a continuity correction applied to help compensate for the use of a continuous (normal) distribution to approximate the discrete distribution of U. This theorem is valid for samples in which n1 or n2 exceeds 20.

If both n1 and n2 are less than 20, it is only necessary to compute the number of runs U and obtain critical values of U from appropriate tables (see Appendix Table A.5). A value of U less than or equal to the lower limit or greater than or equal to the upper limit is considered significant. The appropriate section of the table is used for a one-tailed test. The critical value depends on the number of values, n1 and n2. The typically available table of critical values is for a 5% level of significance when applied as a two-tailed test. When it is applied as a one-tailed test, the critical values are for a 2.5% level of significance.

The level of significance should be selected prior to analysis. For consistency and uniformity, the 5% level of significance is commonly used. Other significance levels can be justified on a case-by-case basis. Since the basis for using a 5% level of significance with hydrologic data is not documented, it is important to assess the effect of using the 5% level on the decision.

The runs test can be applied as a one-tailed or two-tailed test. If a direction is specified, that is, the test is one-tailed, then the critical value should be selected according to the specification of the alternative hypothesis. After selecting the characteristic that determines whether an outcome should belong to group 1 (+) or group 2 (−), the runs should be identified and n1, n2, and U computed. Equations 7.1a and 7.1b should be used to compute the mean and variance of U. The computed value of the test statistic z can then be determined with Equation 7.2.

For a two-tailed test, if the absolute value of z is greater than the critical value of z, the null hypothesis of randomness should be rejected; this implies that the values of the random variable are probably not randomly distributed. For a one-tailed test where a small number of runs would be expected, the null hypothesis is rejected if the computed value of z is less (i.e., more negative) than the critical value of z. For a one-tailed test where a large number of runs would be expected, the null hypothesis is rejected if the computed value of z is greater than the critical value of z. For the case where either n1 or n2 is greater than 20, the critical value of z is −zα or +zα depending on whether the test is for the lower or upper tail, respectively.

When applying the runs test to annual maximum flood data for which watershed changes may have introduced a systematic effect into the data, a one-sided test is typically used. Urbanization of a watershed may cause an increase in the central tendency of the peaks and a decrease in the coefficient of variation. Channelization may increase both the central tendency and the coefficient of variation. Where the primary effect of watershed change is to increase the central tendency of the annual maximum floods, it is appropriate to apply the runs test as a one-tailed test with a small number of runs. Thus, the critical z value would be a negative number, and the null hypothesis would be rejected when the computed z is more negative than the critical zα, which would be a negative value. For a small sample test, the null hypothesis would be rejected if the computed number of runs was smaller than the critical number of runs.
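A minimal Python sketch of the large-sample procedure follows. It assumes the usual expressions for the mean and variance of the number of runs (taken here to correspond to Equations 7.1a and 7.1b) and applies the 0.5 continuity term in the form described for Equation 7.2. The function names and the input counts are hypothetical; the one-tailed 5% critical value of 1.645 is the standard normal deviate.

```python
import math

def runs_test_z(n1, n2, runs):
    """Normal approximation for the number of runs.

    Returns (mean, variance, z). Intended for cases where n1 or n2
    exceeds 20; smaller samples call for tabulated critical values.
    """
    n = n1 + n2
    mean_u = 2.0 * n1 * n2 / n + 1.0
    var_u = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))
    # Continuity term subtracted as in the text's Equation 7.2.
    z = (runs - mean_u - 0.5) / math.sqrt(var_u)
    return mean_u, var_u, z

def reject_lower_tail(z, z_alpha=1.645):
    """One-tailed decision for 'too few runs' (default is the 5% level)."""
    return z < -z_alpha

# Hypothetical record: 25 values above and 25 below the median, 16 runs.
mean_u, var_u, z = runs_test_z(25, 25, 16)
print(round(mean_u, 2), round(var_u, 3), round(z, 2), reject_lower_tail(z))
```

With these hypothetical counts the expected number of runs is 26, and the computed z of about −3.0 lies in the lower-tail rejection region at the 5% level.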
Example 7.1
The runs test can be used to determine whether urban development caused an increase in annual peak discharges. It was applied to the annual flood series of the rural Nolin River and the urbanized Pond Creek watersheds to test the following null (H0) and alternative (HA) hypotheses:

H0: The annual peak discharges are randomly distributed from 1945 to 1968, and thus a significant trend is not present.
HA: A significant trend in the annual peak discharges exists since the annual peaks are not randomly distributed.

The flood series is represented in Table 7.1 by a series of + and − symbols. The criterion that designates a + or − event is the median flow (i.e., the flow exceeded or not exceeded as an annual maximum in 50% of the years). For the Pond Creek and North Fork of the Nolin River watersheds, the median values are 2175 ft³/sec and 4845 ft³/sec, respectively (see Table 7.1). If urbanization caused an increase in discharge rates, then the series should have significantly more + symbols in the part of the series corresponding to greater urbanization and significantly more − symbols before urbanization. The computed number of runs would be small, so a one-tailed test should be applied. While rejection of the null hypothesis does not necessarily prove that urbanization caused a trend in the annual flood series, the investigator may infer such a cause.

The Pond Creek series has only two runs (see Table 7.1). All values before 1956 are less than the median and all values after 1956 are greater than the median. Thus, n1 = n2 = 12. The critical value of 7 was obtained from Table A.5. The null hypothesis should be rejected if the number of runs in the sample is less than or equal to 7. Since a one-tailed test was used, the level of significance is 0.025. Because the sequence includes only two runs for Pond Creek, the null hypothesis should be rejected. The rejection indicates that the data are nonrandom. The increase in urbanization after 1956 may be a causal factor for this nonrandomness.

For the North Fork of the Nolin River, the flood series represents 14 runs (see Table 7.1). Because n1 and n2 are the same as for the Pond Creek analysis, the critical value of 7 applies here also. Since the number of runs is greater than 7, the null hypothesis of randomness cannot be rejected. Since the two watersheds are located near each other, the trend in the flood series for Pond Creek is probably not due to an increase in rainfall. (In a real-world application, rainfall data should be examined for trends as well.) Thus, it is probably safe to conclude that the flooding trend for Pond Creek is due to urban development in the mid-1950s.
7.2.1 RATIONAL ANALYSIS OF RUNS TEST
Like every statistical test, the runs test is limited in its ability to detect the influence of a systematic factor such as urbanization. If the variation of the systematic effect is small relative to the variation introduced by the random processes, then the runs test may suggest randomness. In such a case, all of the variation may be attributed to the effects of the random processes.

In addition to the relative magnitudes of the variations due to random processes and the effects of watershed change, the ability of the runs test to detect the effects of watershed change will depend on its temporal variation. Two factors are important. First, change can occur abruptly over a short time or gradually over the duration of a flood record. Second, an abrupt change may occur near the center, beginning, or end of the period of record. These factors must be understood when assessing the results of a runs test of an annual maximum flood series.

Before rationally analyzing the applicability of the runs test for detecting hydrologic change, summarizing the three important factors is worthwhile.

1. Is the variation introduced by watershed change small relative to the variation due to the randomness of rainfall and watershed processes?
2. Has the watershed change occurred abruptly over a short part of the length of record or gradually over most of the record length?
3. If the watershed change occurred over a short period, was it near the center of the record or at one of the ends?

Answers to these questions will help explain the rationality of the results of a runs test and other tests discussed in this chapter.
Responses to the above three questions will include examples to demonstrate the general concepts. Studies of the effects of urbanization have shown that the more frequent events of a flood series may increase by a factor of two for large increases in imperviousness. For example, the peaks in the later part of the flood record for Pond Creek are approximately double those from the preurbanization portion of the flood record. Furthermore, variation due to the random processes of rainfall and watershed conditions appears relatively minimal, so the effects of urbanization are apparent (see Figure 2.4). The annual maximum flood record for the Ramapo River at Pompton Lakes, New Jersey (1922 through 1991) is shown in Figure 7.1. The scatter is very significant, and an urbanization trend is not immediately evident. Most urban development occurred before 1968, and the floods of record then appear smaller than floods that occurred in the late 1960s. However, the random scatter largely prevents the identification of effects of urbanization from the graph. When the runs test is applied to the series, the computed test statistic of Equation 7.2 equals zero, so the null hypothesis of randomness cannot be rejected. In contrast to the series for Pond Creek, the large random scatter in the Ramapo River series masks the variation due to urbanization.
The nature of a trend is also an important consideration in assessing the effect of urbanization on the flows of an annual maximum series. Urbanization of the Pond Creek watershed occurred over a short period of the total record length; this is evident in Figure 2.4. In contrast, Figure 7.2 shows the annual flood series for the Elizabeth River, at Elizabeth, New Jersey, for a 65-year period. While the effects of the random processes are evident, the flood magnitudes show a noticeable increase. Many floods at the start of the record are below the median, while the opposite is true for later years. This causes a small number of runs, with the shorter runs near the center of record. The computed z statistic for the run test is −3.37, which is significant at the 0.0005 level. Thus, a gradual trend, especially with minimal variation due to random processes, produces a significant value for the runs test. More significant random effects may mask the hydrologic effects of gradual urban development.

FIGURE 7.1 Annual maximum peak discharges for Ramapo River at Pompton Lakes, New Jersey.
Watershed change that occurs over a short period, such as that in Pond Creek, can lead to acceptance or rejection of the null hypothesis for the runs test. When the abrupt change is near the middle of the series, the two sections of the record will have similar lengths; thus, the median of the series will fall in the center of the two sections, with a characteristic appearance of two runs, but it quite possibly will be less than the critical number of runs. Thus, the null hypothesis will be rejected. Conversely, if the change due to urbanization occurs near either end of the record length, the record will have short and long sequences. The median of the flows will fall in the longer sequence; thus, if the random effects are even moderate, the flood series will have a moderate number of runs, and the results of a runs test will suggest randomness.

It is important to assess the type (gradual or abrupt) of trend and the location (middle or end) of an abrupt trend. This is evident from a comparison of the series for Pond Creek, Kentucky, and Rahway River in New Jersey. Figure 7.3 shows the annual flood series for the Rahway River. The effect of urbanization appears in the later part of the record. The computed z statistic for the runs test is −1.71, which is not significant at the 5% level, thus suggesting that randomness can be assumed.
7.3 KENDALL TEST FOR TREND
Hirsch, Slack, and Smith (1982) and Taylor and Loftis (1989) provide assessments of the Kendall nonparametric test. The test is intended to assess the randomness of a data sequence Xi; specifically, the hypotheses (Hirsch, Slack, and Smith, 1982) are:

H0: The annual maximum peak discharges (xi) are a sample of n independent and identically distributed random variables.
HA: The distributions of xj and xk are not identical for all k, j ≤ n with k ≤ j.

The test is designed to detect a monotonically increasing or decreasing trend in the data rather than an episodic or abrupt event. The above HA alternative is two-sided, which is appropriate if a trend can be direct or inverse. If a direction is specified, then a one-tailed alternative must be specified. Gradual urbanization would cause a direct trend in the annual flood series. Conversely, afforestation can cause an inverse trend in an annual flood series. For the direct (inverse) trend in a series, the one-sided alternative hypothesis would be:

HA: A direct (inverse) trend exists in the distribution of xj and xk.

FIGURE 7.2 Annual maximum peak discharges for Elizabeth River, New Jersey.
The theorem defining the test statistic is as follows. If xj and xk are independent and identically distributed random values, the statistic S is defined as:

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \Theta(x_j - x_k) \tag{7.3}$$

where

$$\Theta(y) = \begin{cases} 1 & \text{if } y > 0 \\ 0 & \text{if } y = 0 \\ -1 & \text{if } y < 0 \end{cases} \tag{7.4}$$

For sample sizes of 30 or larger, tests of the hypothesis can be made using the following test statistic:

$$z = \begin{cases} (S - 1)/V^{0.5} & \text{for } S > 0 \\ 0 & \text{for } S = 0 \\ (S + 1)/V^{0.5} & \text{for } S < 0 \end{cases} \tag{7.5a}$$

in which V is the variance of S:

$$V = \frac{n(n-1)(2n+5) - \sum_{i=1}^{g} t_i (t_i - 1)(2 t_i + 5)}{18} \tag{7.5b}$$

where g is the number of groups of tied values and ti is the number of values in the ith group. The first term of Equation 7.5b applies to series that did not include ties, and Kendall (1975) provided the adjustment shown as the second term of Equation 7.5b. Kendall points out that the normal approximation of Equation 7.5a should provide accurate decisions for samples as small as 10, but it is usually applied when n ≥ 30. For sample sizes below 30, the following τ statistic can be used when the series does not include ties:

$$\tau = \frac{2S}{n(n-1)} \tag{7.6}$$

Equation 7.6 should not be used when the series includes discharges of the same magnitude; in such cases, a correction for ties can be applied (Gibbons, 1976).

After the sample value of the test statistic z is computed with Equation 7.5 and a level of significance α selected, the null hypothesis can be tested. Critical values of Kendall's τ are given in Table A.6 for small samples. For large samples with a two-tailed test, the null hypothesis H0 is rejected if z is greater than the standard normal deviate zα/2 or less than −zα/2. For a one-sided test, the critical values are zα for a direct trend and −zα for an inverse trend. If the computed value is greater than zα for the direct trend, then the null hypothesis can be rejected; similarly, for an inverse trend, the null hypothesis is rejected when the computed z is less (i.e., more negative) than −zα.
Since there are 33 + and 12 − values, S of Equation 7.3 is 21. Equation 7.6 yields the following sample value of τ:

$$\tau = \frac{2(21)}{10(10 - 1)} = 0.467$$

Since the sample size is ten, critical values are obtained from tables, with the following tabular summary of the decision for a one-tailed test:

Thus, for a 5% level the null hypothesis is rejected, which suggests that the data contain a trend. At smaller levels of significance, the test would not suggest a trend in the sequence.
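The counting of + and − pairs that leads to S and τ can be sketched in Python as follows. The sketch assumes that S is the sum of the signs of all differences between a later and an earlier value, which is consistent with the pair counts quoted above (33 − 12 = 21 for n = 10). The function names are hypothetical.

```python
def kendall_s(x):
    """Sum of the signs of (x[j] - x[k]) over all pairs with j later than k."""
    s = 0
    n = len(x)
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            # +1 for an increase, -1 for a decrease, 0 for a tie.
            s += (diff > 0) - (diff < 0)
    return s

def kendall_tau(s, n):
    """Kendall's tau for a series of n values without ties."""
    return 2.0 * s / (n * (n - 1))

# A strictly increasing series gives the largest possible S, n(n - 1)/2.
print(kendall_s([1, 2, 3, 4]))          # 6
# S = 21 with n = 10, the values quoted in the text.
print(round(kendall_tau(21, 10), 3))    # 0.467
```

The double loop visits each of the n(n − 1)/2 pairs once, which is adequate for annual flood records of typical length.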
Example 7.3
The 50-year annual maximum flood record for the Northwest Branch of the Anacostia River watershed (Figure 2.1) was analyzed for trend. Since the record length is greater than 30, the normal approximation of Equation 7.5 is used:

$$z = \frac{459}{119.54} = 3.83 \tag{7.7}$$

Because the Northwest Branch of the Anacostia River has undergone urbanization, the one-sided alternative hypothesis for a direct trend is studied. Critical values of z for 5% and 0.1% levels of significance are 1.645 and 3.09, respectively. Thus, the computed value of 3.83 is significant, and the null hypothesis is rejected. The test suggests that the flood series reflects an increasing trend that we may infer resulted from urban development within the watershed.
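The normal approximation can be sketched in Python under the assumption that the variance of S takes the standard form n(n − 1)(2n + 5)/18, reduced by a correction for groups of tied values, and that S is shifted by one unit toward zero before standardizing. The function names and the `tie_counts` argument are hypothetical.

```python
import math

def kendall_variance(n, tie_counts=()):
    """Variance of S; tie_counts lists the size of each group of tied values."""
    v = n * (n - 1) * (2 * n + 5)
    v -= sum(t * (t - 1) * (2 * t + 5) for t in tie_counts)
    return v / 18.0

def kendall_z(s, n, tie_counts=()):
    """Standard normal statistic with a one-unit continuity adjustment."""
    sd = math.sqrt(kendall_variance(n, tie_counts))
    if s > 0:
        return (s - 1) / sd
    if s < 0:
        return (s + 1) / sd
    return 0.0

# For a 50-year record without ties, the standard deviation of S:
print(round(math.sqrt(kendall_variance(50)), 2))
```

For n = 50 and no ties the standard deviation of S is about 119.5, which agrees closely with the denominator of Equation 7.7.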
Example 7.4

The two 24-year, annual-maximum flood series in Table 7.1 for Pond Creek and the North Fork of the Nolin River were analyzed for trend. The two adjacent watersheds have the same meteorological conditions. Since the sample sizes are below 30, Equation 7.6 will be used for the tests. S is 150 for Pond Creek and 30 for Nolin River. Therefore, the computed τ for Pond Creek is:

$$\tau = \frac{2(150)}{24(24 - 1)} = 0.543$$

For Nolin River, the computed τ is:

$$\tau = \frac{2(30)}{24(24 - 1)} = 0.109$$

For levels of significance of 5, 2.5, 1, and 0.5%, the critical values are 0.239, 0.287, 0.337, and 0.372, respectively. Thus, even at a level of significance of 0.5%, the null hypothesis would be rejected for Pond Creek. For Nolin River, the null hypothesis must be accepted at a 5% level of significance. The results show that the Pond Creek series is nonhomogeneous, which may have resulted from the trend in urbanization. Since the computed τ of 0.109 is much less than all of the critical values, the test does not indicate a trend in the series for the North Fork of the Nolin River.
7.3.1 RATIONALE OF KENDALL STATISTIC
The random variable S is used for both the Kendall τ of Equation 7.6 and the normal approximation of Equation 7.5. If a sequence consists of alternating high-flow and low-flow values, the summation of Equation 7.3 would be the sum of alternating +1 and −1 values, which would yield a near-zero value for S. Such a sequence is considered random, so the null hypothesis should be accepted for a near-zero value. Conversely, if the sequence consisted of a series of increasingly larger flows, such as for deforestation, which would indicate a direct trend, then each Θ of Equation 7.3 would be +1, so S would be a large value. If the flows showed an inverse trend, such as for afforestation, then the summation of Equation 7.3 would consist of values of −1, so S would be a large negative value. The largest possible value of S for a sequence of n flows is n(n − 1)/2, the number of pairs, so the ratio 2S/[n(n − 1)] of Equation 7.6 will vary from −1 for an inverse trend to +1 for a direct trend. A value of zero indicates the absence of a trend (i.e., randomness).

For the normal approximation of Equation 7.5, the z statistic has the form of the standard normal transformation equation: $z = (x - \bar{x})/s$, where $\bar{x}$ is the mean and s is the standard deviation. For Equation 7.5, S is the random variable, a mean of zero is inherent in the null hypothesis of randomness, and the denominator is the standard deviation of S. Thus, the null hypothesis of the Kendall test is accepted for values of z that are not significantly different from zero.
The Kendall test statistic depends on the difference in magnitude between every pair of values in the series, not just adjacent values. For a series in which an abrupt watershed change occurred, there will be more changes in sign of the (xj − xk) value of Equation 7.3, which will lead to a value of S that is relatively close to zero. This is especially true if the abrupt change is near one of the ends of the flood series. For a gradual watershed change, a greater number of positive values of (xj − xk) will occur. Thus, the test will suggest a trend. In summary, the Kendall test may detect watershed changes due to either gradual trends or abrupt events. However, it appears to be more sensitive to changes that result from gradually changing trends.
7.4 PEARSON TEST FOR SERIAL INDEPENDENCE
If a watershed change, such as urbanization, introduces a systematic variation into a flood record, the values in the series will exhibit a measure of serial correlation. For example, if the percentage of imperviousness gradually increases over all or a major part of the flood record, then the increase in the peak floods that results from the higher imperviousness will introduce a measure of correlation between adjacent flood peaks. This correlation violates the assumption of independence and stationarity that is required for frequency analysis.

The serial correlation coefficient is a measure of common variation between adjacent values in a time series. In this sense, serial correlation, or autocorrelation, is a univariate statistic, whereas a correlation coefficient is generally associated with the relationship between two variables. The computational objective of a correlation analysis is to determine the degree of correlation in adjacent values of a time or space series and to test the significance of the correlation. The nonstationarity of an annual flood series as caused by watershed changes is the most likely hydrologic reason for the testing of serial correlation. In this sense, the tests for serial correlation are used to detect nonstationarity and nonhomogeneity. However, serial correlation in a data set does not necessarily imply nonhomogeneity.

The Pearson correlation coefficient (McNemar, 1969; Mendenhall and Sincich, 1992) can be used to measure the association between adjacent values in an ordered sequence of data. For example, in assessing the effect of watershed change on an annual flood series, the correlation would be between values for adjacent years in a sequential record. The correlation coefficient could be computed for either the measured flows or their logarithms, but the use of logarithms is recommended when analyzing annual maximum flood records. The two values will differ, but the difference is usually not substantial except when the sample size is small. The hypotheses for the Pearson serial independence test are:

H0: ρ = 0
HA: ρ ≠ 0

in which ρ is the serial correlation coefficient of the population. If appropriate for a particular problem, a one-tailed alternative hypothesis can be used, either ρ > 0 or ρ < 0. As an example in the application of the test to annual maximum flood data, the hypotheses would be:

H0: The logarithms of the annual maximum peak discharges represent a sequence of n independent events.
HA: The logarithms of the annual maximum peak discharges are not serially independent and show a positive association.

The alternative hypothesis is stated as a one-tailed test in that the direction of the serial correlation is specified. The one-tailed alternative is used almost exclusively in serial correlation analysis.
Given a sequence of measurements on the random variable xi (for i = 1, 2, …, n), the statistic for testing the significance of a Pearson R is:

$$t = \frac{R}{\left[(1 - R^2)/(n - 3)\right]^{0.5}} \tag{7.8}$$

in which R is the serial correlation coefficient computed from adjacent values of the sequence:

$$R = \frac{\sum_{i=1}^{n-1} x_i x_{i+1} - \frac{1}{n-1}\left(\sum_{i=1}^{n-1} x_i\right)\left(\sum_{i=2}^{n} x_i\right)}{\left[\sum_{i=1}^{n-1} x_i^2 - \frac{1}{n-1}\left(\sum_{i=1}^{n-1} x_i\right)^2\right]^{0.5}\left[\sum_{i=2}^{n} x_i^2 - \frac{1}{n-1}\left(\sum_{i=2}^{n} x_i\right)^2\right]^{0.5}} \tag{7.9}$$

Note that for a data sequence of n values, only n − 1 pairs are used to compute the value of R. For a given level of significance α and a one-tailed alternative hypothesis, the null hypothesis should be rejected if the computed t is greater than tv,α where v = n − 3, the degrees of freedom. Values of tv,α can be obtained from Table A.2. For a two-tailed test, tα/2 is used rather than tα. For serial correlation analysis, the one-tailed positive correlation is generally tested. Rejection of the null hypothesis would imply that the measurements of the random variable are not independent. The serial correlation coefficient will be positive for both an increasing trend and a decreasing trend. When the Pearson correlation coefficient is applied to bivariate data, the slope of the relationship between the two random variables determines the sign on the correlation coefficient. In serial correlation analysis of a single data sequence, only the one-sided upper test is generally meaningful.

Example 7.5

To demonstrate the computation of the Pearson R for data sequences that include dominant trends, consider the annual maximum flows for two adjacent watersheds, one undergoing deforestation (A), which introduces an increasing trend, and one undergoing afforestation (B), which introduces a decreasing trend. The two data sets are given in Table 7.2.
The Pearson R for the increasing trend is:

The Pearson R for the decreasing trend is:

Both are positive values because the sign of a serial correlation coefficient does not reflect the slope of the trend. The serial correlation for sequence A is higher than that for B because it is a continuously increasing trend, whereas the data for B includes a rise in the third year of the record.

Using Equation 7.8, the computed values of the test statistic are:

For a sample size of 7, 4 is the number of degrees of freedom for both tests. Therefore, the critical t for 4 degrees of freedom and a level of significance of 5% is 2.132. The trend causes a significant serial correlation in sequence A. The trend in series B is not sufficiently dominant to conclude that the trend is significant.
Example 7.6
The Pearson R was computed using the 24-year annual maximum series for the Pond Creek and North Fork of the Nolin River watersheds (Table 7.1). For Pond Creek, the sample correlation for the logarithms of flow is 0.72 and the computed t is 4.754 according to Equation 7.8. For 21 degrees of freedom and a level of significance of 0.01, the critical t value is 2.518, implying that the computed R value is statistically significantly different from zero. For the North Fork, the sample correlation is 0.065 and the computed t is 0.298 according to Equation 7.8. This t value is not statistically significantly different from zero even at a significance level of 0.60.
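The serial correlation computation can be sketched in Python as the ordinary Pearson correlation between the first n − 1 values and the last n − 1 values of the series, followed by a t statistic with n − 3 degrees of freedom. The form of the t statistic is an assumption that reproduces the values quoted in this example (R = 0.72 gives t of about 4.754 for n = 24). The function names are hypothetical.

```python
import math

def lag1_pearson_r(x):
    """Pearson correlation between adjacent values (n - 1 pairs)."""
    a, b = x[:-1], x[1:]
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def pearson_t(r, n):
    """t statistic for a serial correlation r from a series of n values."""
    return r / math.sqrt((1.0 - r * r) / (n - 3))

# Correlations and record length quoted in the text (n = 24).
print(round(pearson_t(0.72, 24), 3))    # Pond Creek
print(round(pearson_t(0.065, 24), 3))   # North Fork of the Nolin River
```

For flood records, the function would typically be applied to the logarithms of the peaks, for example `lag1_pearson_r([math.log10(q) for q in peaks])`, where `peaks` is a hypothetical list of annual maximum discharges.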
7.5 SPEARMAN TEST FOR TREND
The Spearman correlation coefficient (Rs) (Siegel, 1956) is a nonparametric alternative to the Pearson R, which is a parametric test. Unlike the Pearson R test, it is not necessary to make a log transform of the values in a sequence since the ranks of the logarithms would be the same as the ranks for the untransformed data. The hypotheses for a direct trend (one-sided) are:

H0: The values of the series represent a sequence of n independent events.
HA: The values show a positive correlation.

Neither the two-tailed alternative nor the one-tailed alternative for negative correlation is appropriate for watershed change.

The Spearman test for trend uses two arrays, one for the criterion variable and one for an independent variable. For example, if the problem were to assess the effect of urbanization on flood peaks, the annual flood series would be the criterion variable array and a series that represents a measure of the watershed change would be the independent variable. The latter might include the fraction of forest cover for afforestation or deforestation or the percentage of imperviousness for urbanization of a watershed. Representing the two series as xi and yi, the rank of each item within each series separately is determined, with a rank of 1 for the smallest value and a rank of n for the largest value. The ranks are represented by rxi and ryi, with the i corresponding to the ith magnitude.

Using the ranks for the paired values rxi and ryi, the value of the Spearman coefficient Rs is computed using:

$$R_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{7.10}$$

in which di = rxi − ryi is the difference between the paired ranks. For sample sizes greater than ten, the following statistic can be used to test the above hypotheses:

$$t = \frac{R_s}{\left[(1 - R_s^2)/(n - 2)\right]^{0.5}} \tag{7.11}$$

where t follows a Student's t distribution with n − 2 degrees of freedom. For a one-sided test for a direct trend, the null hypothesis is rejected when the computed t is greater than the critical tα for n − 2 degrees of freedom.
To test for trend, the Spearman coefficient is determined by Equation 7.10 and the test applies Equation 7.11. The Spearman coefficient and the test statistic are based on the Pearson coefficient that assumes that the values are from a circular, normal, stationary time series (Haan, 1977). The transformation from measurements on a continuous scale to ordinal scale (i.e., ranks) eliminates the sensitivity to the normality assumption. The circularity assumption will not be a factor because each flood measurement is transformed to a rank.
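A short Python sketch of the ranking and test computation follows. It assumes the usual rank-difference form of the Spearman coefficient, Rs = 1 − 6Σd²/(n³ − n), as the content of Equation 7.10, a t statistic with n − 2 degrees of freedom for Equation 7.11, and series without tied values. The function names and both data series are hypothetical.

```python
import math

def ranks(values):
    """Rank 1 for the smallest value up to n for the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    """Spearman coefficient from the squared differences of paired ranks."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)

def spearman_t(rs, n):
    """t statistic with n - 2 degrees of freedom."""
    return rs / math.sqrt((1.0 - rs * rs) / (n - 2))

# Hypothetical 12-year record: peak discharge and imperviousness (%).
discharge = [20, 10, 30, 50, 40, 60, 80, 70, 90, 100, 120, 110]
imperv = [5, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 33]
rs = spearman_rs(discharge, imperv)
t = spearman_t(rs, len(discharge))
print(round(rs, 3), round(t, 2))
```

For these hypothetical series the sum of squared rank differences is 8, so Rs is about 0.972, and the computed t far exceeds the one-tailed 5% critical value of 1.812 for 10 degrees of freedom.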
(7.12)

The test statistic of Equation 7.11 is:
7.5.1 RATIONALE FOR SPEARMAN TEST
The Spearman test is more likely to detect a trend in a series that includes gradual variation due to watershed change than in a series that includes an abrupt change. For an abrupt change, the two partial series will likely have small differences (di) because they will reflect only the random variation in the series. For a gradual change, both the systematic and random variation are present throughout the series, which results in larger differences (di). Thus, it is more appropriate to use the Spearman serial correlation coefficient for hydrologic series where a gradual trend has been introduced by watershed change than where the change occurs over a short part of the flood record.
7.6 SPEARMAN–CONLEY TEST
Recommendations have been made to use the Spearman Rs as bivariate correlation by inserting the ordinal integer as a second variable. Thus, the xi values would be the sequential values of the random variable and the values of i from 1 to n would be the second variable. This is incorrect because the integer values of i are not truly values of a random variable and the critical values are not appropriate for the test. The Spearman–Conley test (Conley and McCuen, 1997) enables the Spearman statistic to be used where values of the independent variable are not available.

In many cases, the record for the land-use-change variable is incomplete. Typically, records of imperviousness are sporadic, for example, aerial photographs taken on an irregular basis. They may not be available on a year-to-year basis. Where a complete record of the land use change variable is not available and interpolation will not yield accurate estimates of land use, the Spearman test cannot be used.
The Spearman–Conley test is an alternative that can be used to test for serial correlation where the values of the independent variable are incomplete. The Spearman–Conley test is univariate in that only values of the criterion variable are used. The steps for applying it are as follows:

1. State the hypotheses. For this test, the hypotheses are:
H0: The sequential values of the random variable are serially independent.
HA: Adjacent values of the random variable are serially correlated.
As an example for the case of a temporal sequence of annual maximum discharges, the following hypotheses would be appropriate:
H0: The annual flood peaks are serially independent.
HA: Adjacent values of the annual flood series are correlated.
For a flood series suspected of being influenced by urbanization, the alternative hypothesis could be expressed as a one-tailed test with an indication of positive correlation. Significant urbanization would cause the peaks to increase, which would produce a positive correlation coefficient. Similarly, afforestation would likely reduce the flood peaks over time, so a one-sided test for negative serial correlation would be expected.
2. Specify the test statistic. Equation 7.10 can also be used as the test statistic for the Spearman–Conley test. However, it will be denoted as Rsc. In applying it, the value of n is the number of pairs, which is 1 less than the number of annual maximum flood magnitudes in the record. To compute the value of Rsc, a second series Xt is formed, where Xt = Yt−1. Rank the values of the two series in the same manner as for the Spearman test and use Equation 7.10 to compute the value of Rsc.
3 Set the level of significance Again, this is usually set by convention,