Slide 8 - Statistical Methods
INTRODUCTION
When determining how to appropriately analyze any collection
of data, the first consideration must be the characteristics of
the data themselves.
Characteristics often described include:
• a measure of the center of the data,
• a measure of spread or variability,
• a measure of the symmetry of the data distribution, and
• estimates of extremes, such as some large or small percentile.
INTRODUCTION, Cont.
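• The sample mean of n values is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. If the population mean μ were known, the variance could be computed as $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$.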
Measure of Spread
• However, the population mean μ is not known precisely, and therefore it is necessary to compute the sample variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
• The square root of the variance is called the standard deviation. It is measured in the same units as the variate and is therefore easier to interpret and compare.
• The coefficient of variation, $C_v = \sigma/\mu$ (estimated by $s/\bar{x}$), is dimensionless and is useful for comparing relative variability.
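The following minimal Python sketch computes the measures of centre and spread above. It assumes the sample is held in a plain list. The helper name `spread_measures` is illustrative only. The data are the River Sezibwa annual maximum flows used in the example later in this section.

```python
import math


def spread_measures(x):
    """Return the sample mean, variance s^2 (n - 1 divisor), standard deviation s and Cv = s / mean."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(x) / n
    # Sample variance: sum of squared deviations from the sample mean, divided by n - 1
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    s = math.sqrt(var)
    # Coefficient of variation: dimensionless ratio of spread to centre
    cv = s / mean
    return mean, var, s, cv


# River Sezibwa annual maximum flows (m3/s), 1970-1983
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]

mean, var, s, cv = spread_measures(flows)
print(f"mean = {mean:.5f}, s^2 = {var:.4f}, s = {s:.4f}, Cv = {cv:.3f}")
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and could replace the hand-written sums.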
Measures of Symmetry
• If the data are distributed exactly symmetrically about the mean, then the measure of symmetry should be zero.
• Furthermore, such data would exhibit the property that all odd moments about the mean equal zero.
• A skewed distribution, however, has an excess of data spread out to one side of the centre.
Measures of Symmetry
• If data to the right of the mean are more spread out from the mean than those on the left, by convention the skewness is positive, and vice versa for negative asymmetry.
• The third moment α about the mean is
$\alpha = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^3$
• The best estimate of the third moment is computed by
$a = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}(x_i-\bar{x})^3$
Measures of Symmetry
• The coefficient of skewness is the ratio $\alpha/\sigma^3$, and its best estimate is given by
$C_s = \frac{a}{s^3}$
• For symmetrical distributions, the third moment is
zero and Cs = 0; for right skewness, Cs > 0 and for left
skewness, Cs < 0.
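The skewness estimate above can be sketched in Python as follows. The sketch assumes the same list-of-floats input, and `skewness_coefficient` is an illustrative name. It uses the n/((n − 1)(n − 2)) factor for the third moment and the n − 1 standard deviation, matching the formulas for a and Cs.

```python
import math


def skewness_coefficient(x):
    """Cs = a / s^3, with a = n / ((n - 1)(n - 2)) * sum((x_i - mean)^3)."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    a = n / ((n - 1) * (n - 2)) * sum((xi - mean) ** 3 for xi in x)
    return a / s ** 3


flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]

print(round(skewness_coefficient([1.0, 2.0, 3.0, 4.0, 5.0]), 6))  # symmetric sample
print(round(skewness_coefficient(flows), 3))  # positive: longer tail to the right
```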
Kurtosis
• Another criterion for describing the shape of a unimodal frequency curve is its 'peakedness'.
• Kurtosis comes from a Greek word meaning bulginess.
• If the frequency curve is highly peaked, a large number of observations have the same or nearly the same value.
• If, on the other hand, the curve is flat, the observations are spread over the middle of the range, with no value occurring with high frequency.
Kurtosis
• In both these situations, the curve is said to be a
kurtic curve.
• If a frequency curve is more peaked than normal, it is
called a Leptokurtic curve.
• If it is less peaked (flat) than normal, it is called a
Platykurtic curve.
• If a curve is as peaked as the normal curve, it is called a Mesokurtic curve.
Kurtosis
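• The formula for the measure of kurtosis is
$\alpha_4 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}$   (4.6a)
where $\mu_2 = \sigma^2$ and $\mu_4$ are the second and fourth moments about the mean.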
• It is apparent from (4.6a) that α4 is always positive,
as it is the ratio of two positive quantities.
• α4 has no physical unit.
• For a leptokurtic curve α4 > 3, and for a platykurtic curve α4 < 3.
Kurtosis
• If α4 = 3, the curve is mesokurtic.
• The quantity (α4 − 3) is called the excess kurtosis.
• For a sample of n values the sample (excess) kurtosis is
$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^2} - 3$
• where m4 is the fourth sample moment about the mean, m2 is the second sample moment about the mean (that is, the sample variance computed with divisor n), xi is the ith value, and $\bar{x}$ is the sample mean.
Example
• The values in the table below are the annual maximum flows (m³/s) of the River Sezibwa, measured near Lugazi, for the years 1970 to 1983.
6.37 10.72 9.56 10.23 10.30 7.36 16.53
17.36 8.50 6.56 10.08 9.68 12.24 9.26
Example
• Arithmetic mean: $\bar{x} = 10.33929$ m³/s
• Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = 135.9579/13 = 10.458$ (m³/s)²
Example
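Kurtosis
$g_2 = \frac{\frac{1}{14}\times 4455.803}{\left(\frac{1}{14}\times 135.9579\right)^2} - 3 = 0.374778$
where $\sum_{i=1}^{14}(x_i-\bar{x})^4 = 4455.803$ and $\sum_{i=1}^{14}(x_i-\bar{x})^2 = 135.9579$.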
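The kurtosis calculation in the example can be reproduced with a short Python sketch. It assumes the moments m2 and m4 use divisor n, as in the formula for g2. The function names are illustrative.

```python
def central_moment_sums(x):
    """Return the mean and the sums of squared and fourth-power deviations from it."""
    n = len(x)
    mean = sum(x) / n
    sum_sq = sum((xi - mean) ** 2 for xi in x)
    sum_4 = sum((xi - mean) ** 4 for xi in x)
    return mean, sum_sq, sum_4


def sample_excess_kurtosis(x):
    """g2 = m4 / m2^2 - 3, with m2 and m4 the sample moments about the mean (divisor n)."""
    n = len(x)
    _, sum_sq, sum_4 = central_moment_sums(x)
    m2 = sum_sq / n
    m4 = sum_4 / n
    return m4 / m2 ** 2 - 3


# River Sezibwa annual maximum flows (m3/s), 1970-1983
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]

mean, sum_sq, sum_4 = central_moment_sums(flows)
print(f"sum (x - mean)^2 = {sum_sq:.4f}, sum (x - mean)^4 = {sum_4:.3f}")
print(f"g2 = {sample_excess_kurtosis(flows):.6f}")
```

A positive g2 of about 0.37 indicates a slightly leptokurtic sample.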
Hydrological Data Series
i. Hydrological Data Series
• Each question about the frequency of occurrence of a particular hydrological quantity (e.g. a minimum or maximum of stated severity) is answered by looking at a record of flows in a particular way.
• To each question and answer, there corresponds
some type of model of the hydrological
process, which gives an idealized picture of
the process.
Hydrological Data Series
• This picture, although simplified, is frequently
adequate for the purposes of answering the question
being asked.
• The model filters out those portions of the entire hydrograph record that are not of immediate use. What remains is a simpler data series (Cunnane, 1989).
Hydrological Data Series
ii. Continuous Records
• A well maintained hydrometric gauging station
provides a continuous record of water level or stage,
H, as a function of time.
• From this continuous record, a hydrograph of instantaneous discharge can be obtained with the help of a stage-discharge relation or rating curve.
• If the catchment area contributing to the gauging
station is small, then the river responds rapidly
to rainfall and the resulting hydrograph is
spiky with high frequency oscillations.
Frequency Diagrams
• If the catchment area is large, or the river drains through a lake, the hydrograph tends to be smooth.
Fig 4.1a A frequency histogram ($f_i = n_i/N$ plotted against volume V); Fig 4.1b The cumulative distribution function ($F_j$ plotted against V)
Frequency Diagrams
• The data of a continuous variable, such as flow volume V, can be divided into equal class intervals of width ∆V (m³). Starting with the first class interval, i = 1, count the number of volumes n1 that fall in it.
• The relative frequency of occurrence for the first class interval is f1 = n1/N, where N is the total number of volumes in the record.
• This is repeated for each class interval, and a plot of fi versus i gives the frequency histogram for the series.
Frequency Diagrams
• The cumulative distribution function (CDF) in Fig 4.1b is built from the frequency histogram in Fig 4.1a.
• In the CDF graph the class interval j has the same length as in the histogram, but the ordinate Fj is the sum of the histogram frequencies from the first class up to the current class j:
$F_j = \sum_{i=1}^{j} f_i$
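The following minimal sketch shows how the relative frequencies fi = ni/N and the cumulative values Fj could be tabulated. It assumes equal class intervals of width `width` starting at `x_min`. The function name, the class layout (2 m³/s classes starting at 6 m³/s) and the use of the Sezibwa flows in place of volumes are illustrative choices, not part of the original method.

```python
from itertools import accumulate


def frequency_tables(values, x_min, width, n_classes):
    """Return class counts n_i, relative frequencies f_i = n_i / N and cumulative F_j."""
    x_max = x_min + width * n_classes
    counts = [0] * n_classes
    for v in values:
        if not x_min <= v <= x_max:
            raise ValueError(f"value {v} lies outside [{x_min}, {x_max}]")
        # 0-based class index; a value exactly on the upper edge goes in the last class
        i = min(int((v - x_min) // width), n_classes - 1)
        counts[i] += 1
    N = len(values)
    f = [c / N for c in counts]
    F = list(accumulate(f))  # F_j = f_1 + ... + f_j
    return counts, f, F


flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]

counts, f, F = frequency_tables(flows, x_min=6.0, width=2.0, n_classes=6)
for j, (c, fj, Fj) in enumerate(zip(counts, f, F), start=1):
    lower = 6.0 + 2.0 * (j - 1)
    print(f"class {j} [{lower:.0f}, {lower + 2:.0f}): n = {c}, f = {fj:.3f}, F = {Fj:.3f}")
```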
Frequency Diagrams
• The cumulative value is useful in calculating the probabilities of exceedance or non-exceedance of a particular volume.
• The frequency histogram and cumulative distribution function are discrete representations of an underlying continuous function called a probability density function (pdf), which describes the uncertain behaviour of hydrological events.
Frequency Diagrams
• Using statistical methods, the sample return periods or the sample frequency histogram are used to fit a theoretical continuous probability function.
• With reference to Fig 4.2a, the continuous theoretical function describing the frequency of the hydrological event in an area is the probability density function.
• The continuous distribution function is a smooth, monotonically increasing curve varying from 0 to 1, as in Fig 4.2b.
Frequency Diagrams
Fig 4.2a Probability density function, P(V); Fig 4.2b Continuous distribution function, CDF = P(v < V)
Return Period
iii) Return Period
• As stated above, the distribution function F(V) is the cumulative form of the probability density function. It expresses the probability of non-exceedance, F(V) = Pr(v < V), that the value v of an element drawn randomly would be less than a particular value V. Its largest value is unity.
• The complement of F(V), 1 − F(V), is called the exceedance probability of V. The reciprocal of the exceedance probability is the return period.
Return Period
• That is, $T = \frac{1}{1 - F(V)}$ is the return period,
• or $F(V_T) = 1 - \frac{1}{T}$, where $V_T$ is the value with return period T.
• In repeated random trials from the population, the value $V_T$ is exceeded on average once in every T trials, but this exceedance does not occur in a regular cyclic manner.
• Once the analytical form of the curve is known, it can
be used to calculate the design hydrological event
associated with a given return period.
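The return-period relations above can be checked numerically. Purely for illustration, this sketch assumes an exponential distribution F(V) = 1 − exp(−(V − x0)/β), one of the forms listed later in Table 4.1, with hypothetical parameters x0 = 0 and β = 10. Inverting F(V_T) = 1 − 1/T then gives the design value V_T = x0 + β ln T. The function names are illustrative.

```python
import math


def return_period(F):
    """T = 1 / (1 - F), where F is the non-exceedance probability of the event."""
    if not 0.0 <= F < 1.0:
        raise ValueError("F must lie in [0, 1)")
    return 1.0 / (1.0 - F)


def non_exceedance(T):
    """F(V_T) = 1 - 1/T for an event with return period T (T >= 1)."""
    if T < 1.0:
        raise ValueError("T must be at least 1")
    return 1.0 - 1.0 / T


# Hypothetical exponential distribution parameters (illustration only)
X0, BETA = 0.0, 10.0


def exponential_cdf(v):
    return 1.0 - math.exp(-(v - X0) / BETA)


def exponential_design_value(T):
    # Solve F(V_T) = 1 - 1/T analytically: V_T = x0 + beta * ln(T)
    return X0 + BETA * math.log(T)


for T in (10, 50, 100):
    v_t = exponential_design_value(T)
    print(T, round(v_t, 2), round(return_period(exponential_cdf(v_t)), 6))
```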
Return Period
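• Return period is defined as the average time elapsing between successive occurrences of some hydrological event. If $t_1, t_2, t_3, \ldots$ are the intervals between successive exceedances of a level Q′, then
$T(Q') = \text{Average}(t_1, t_2, \ldots) = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} t_i$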
Return Period
• In other words, it is the average interval in which a
specified event is equalled or exceeded.
• For example, if 25 cm rainfall in 24 hours on an
average is equalled or exceeded once in 20 years, the
recurrence interval of 25 cm rainfall is 20 years.
• Data collected should be adequate, accurate and
consistent. Normally a minimum of 20 years
data should be analyzed for reliable results.
Return Period
• As the magnitude Q is varied, so is the value of the return period T. In this example, Q increases with T: larger values of Q have larger return periods, and the larger the value of T, the rarer the value with which it is associated. Note that this definition of return period does not entail any direct reference to probability.
Return Period
• Probability of exceedance (occurrence) at least once, or risk (J): the probability that an event with annual exceedance probability p = 1/T occurs at least once in N successive years is
$J = 1 - (1 - p)^N$
• Probability of non-exceedance (non-occurrence) in N successive years (K): the probability that the event will not occur in any of N successive years is
$K = (1 - p)^N$
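The following short Python sketch evaluates J and K. It assumes independent years with a constant annual exceedance probability p = 1/T. The function names and the choice of a 50-year event over 20 years are illustrative.

```python
def risk(T, n_years):
    """J = 1 - (1 - p)^N with p = 1/T: chance of at least one exceedance in N years."""
    p = 1.0 / T
    return 1.0 - (1.0 - p) ** n_years


def no_occurrence(T, n_years):
    """K = (1 - p)^N: chance of no exceedance in N successive years."""
    return (1.0 - 1.0 / T) ** n_years


J = risk(50, 20)
K = no_occurrence(50, 20)
print(f"J = {J:.3f}, K = {K:.3f}, J + K = {J + K:.1f}")
```

For these values J is about 0.33. A 50-year flood therefore has roughly a one-in-three chance of being equalled or exceeded at least once in 20 years.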
Frequency Distributions
iv) Frequency Distributions
• A distribution is an attribute of a statistical
population.
• If each element of a population has a value x of some variable X, then the distribution describes the constitution of the population as observed through its x values.
• It indicates whether the values are in general very large or very small, that is, their location on the axis.
• It shows whether they are bunched together or spread out, and whether they are symmetrically disposed on the x-axis or not.
Frequency Distributions
• These three are described by the mean, standard
deviation and skewness of the population x values.
• The distribution also gives the relative frequency or
proportion of various x values in the population in
the same way that a histogram gives that information
about a sample.
• The frequencies are probabilities, and thus the distribution gives the probability Pr(X ≤ x) that the x value of an element drawn randomly from the population would be less than or equal to a particular value x.
Frequency Distributions
• The distributions encountered in hydrology for
continuous variates are;
1. Normal (Gaussian)
2. Exponential
3. Gamma
4. Pearson Type III
5. Log-normal
6. Log Pearson Type III
Frequency Distributions
7. Extreme Value Type I (Double Exponential also
called Gumbel)
8. Extreme Value Type II (Frechet)
9. Extreme Value Type III (related to Weibull
distribution)
10. General Extreme Value (Jenkinson)
• Table 4.1 below gives parameters of some of the
continuous distributions used in hydrology.
Frequency Distributions
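Table 4.1 Parameters of some continuous distributions used in hydrology
• Normal: pdf $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$; mean $\mu$; standard deviation $\sigma$; variance $\sigma^2$; third moment 0; fourth moment $3\sigma^4$; skewness 0; kurtosis $\alpha_4 = 3$.
• Exponential: cdf $F(x) = 1 - e^{-(x-x_0)/\beta}$; mean $x_0 + \beta$; standard deviation $\beta$; variance $\beta^2$; third moment $2\beta^3$; fourth moment $9\beta^4$; skewness 2; kurtosis $\alpha_4 = 9$.
• EV1: cdf $F(x) = e^{-e^{-(x-u)/\alpha}}$; mean $u + 0.5772\alpha$; standard deviation $\pi\alpha/\sqrt{6} \approx 1.28\alpha$; variance $\pi^2\alpha^2/6$; third moment $2.40\alpha^3$; fourth moment $14.61\alpha^4$; skewness 1.14; kurtosis $\alpha_4 = 5.40$.
• For all three, the T-year quantile satisfies $F(x_T) = 1 - 1/T$. It has the form $x_T$ = [location parameter] + [scale parameter] × [reduced variate], namely:
  - $x_T = \mu + \sigma K_T$ (Normal);
  - $x_T = x_0 + \beta\ln T$ (Exponential);
  - $x_T = u + \alpha y_T$, with $y_T = -\ln[-\ln(1 - 1/T)]$ (EV1).
• Risk of at least one exceedance of $Q_T$ in L years: $1 - (1 - 1/T)^L$.
• Risk of at least one exceedance of $Q_T$ in T years: approximately $1 - e^{-1} \approx 0.63$ (for large T).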
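The following minimal sketch fits the EV1 row of Table 4.1 by the method of moments. It assumes the sample mean and the n − 1 sample standard deviation are matched to u + 0.5772α and πα/√6. The function names are illustrative. The data are the River Aswa annual maxima listed in the example that follows. Other estimators, such as maximum likelihood, would give somewhat different parameters.

```python
import math
import statistics


def ev1_fit_moments(x):
    """Method-of-moments EV1 (Gumbel) fit using the Table 4.1 relations:
    standard deviation = pi * alpha / sqrt(6), mean = u + 0.5772 * alpha."""
    mean = statistics.mean(x)
    s = statistics.stdev(x)  # sample standard deviation, n - 1 divisor
    alpha = s * math.sqrt(6) / math.pi
    u = mean - 0.5772 * alpha
    return u, alpha


def ev1_quantile(u, alpha, T):
    """x_T = u + alpha * y_T, with reduced variate y_T = -ln(-ln(1 - 1/T))."""
    y_T = -math.log(-math.log(1.0 - 1.0 / T))
    return u + alpha * y_T


# River Aswa annual maximum discharges (m3/s), 1936-1965
aswa = [5140, 5640, 1050, 6020, 3740, 4580, 5140, 10560, 12840, 7870,
        1180, 2520, 1730, 12400, 3400, 3700, 9540, 4810, 4550, 7043,
        8667, 4550, 7460, 3360, 8450, 3420, 4890, 5730, 9020, 3240]

u, alpha = ev1_fit_moments(aswa)
for T in (10, 100, 200):
    print(f"T = {T:3d} years: Q_T = {ev1_quantile(u, alpha, T):,.0f} m3/s")
```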
Discrete Data Series
v. Discrete Data Series
• The most commonly encountered ones are (all in
m3/s)
1. Mean daily flow series,
2. Mean monthly flow series,
3. Mean annual flow series,
4. Daily flow duration series,
5. Annual maximum flood series,
Discrete Data Series
6. Annual minimum flow series
7. Peaks over a threshold series (also known as partial
duration series)
• Each mean daily flow value is expressed as the
average of all the flows occurring during that day.
• A flow equal to a mean daily flow sustained
throughout 24 hours would amount to the same
volume as actually flowed in the river.
• The monthly and annual series have a similar
interpretation.
Discrete Data Series
• Thus, if in 1990 the mean annual flow was 240 m³/s, the total flow volume in that year was
• 240 m³/s × (365 × 24 × 60 × 60) s/year ≈ 7.57 × 10⁹ m³ in the year.
• The daily flow duration series is the mean daily flow
series arranged in order of magnitude from
smallest to largest (or vice versa).
Discrete Data Series
• It is frequently plotted to give a flow duration curve
(Section 9.4.4) from which it can be seen at a
glance what flow has been exceeded at a
stated proportion of the time.
• Consider flood peaks as an example. These occur in
no set physical pattern, either in time or in
magnitude.
• There is no means of forecasting the exact sequence
(i.e. times and magnitudes) of flood events which will
occur over the next twenty years at any site.
Discrete Data Series
• However, if it is assumed that the sequence which
will occur will have the same statistical
characteristics as sequences which occurred in the
past then it is possible to estimate the probability of
any magnitude being exceeded during the design life
(say twenty years) of some scheme.
• This estimation depends on the use of a statistical
model which gives an idealized picture of the entire
hydrograph.
Discrete Data Series
• The simplest models filter out all aspects of the
hydrograph except the flood peaks.
• Then only two questions remain
i. How to describe the varying times that elapse
between peaks, and
ii. How to describe the varying magnitudes of the
flood peaks themselves.
Frequency Models
• Two models which handle these questions in slightly
different ways are the partial duration series model
and the annual maximum series model.
• These will be dealt with below in the section on flood frequency analysis.
• Other hydrological variables may also be subjected to
frequency analysis. These include;
(a) flood volume, V,
Frequency Models
(b) duration D of flooding above a certain discharge
(c) volumes S of flow deficiency below some
demand flow and
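(d) minimum flow values, q.
These quantities are shown in Figs. 4.3 and 4.4, a sketch of discharge (m³/s) marking the peak flow Q, the deficiency volume S below the demand flow QD, and the minimum flow q.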
Frequency Models
• Q = Instantaneous Peak flow rate.
• V = Volume of flow in 1 day, 2 days or k days
• q = Minimum flow rate
• S = Volume of deficiency relative to some
demand flow, QD.
Frequency Models
• If there are no flow records available at the site,
obtaining the Q-T relation is dealt with by ungauged
catchment methods discussed in Section 4.10.
• In these methods one or two key parameters of the
Q-T relation must be estimated from numerically
expressed characteristics of the catchment (size,
slope, climate, soil) using a relationship which has
been derived from the flow and physical data of
neighbouring catchments.
Frequency Models
• The objective is to determine a Q-T relationship at
any required site on a river.
• This section deals with the gauged situation, that is, where a continuous record of flows is available at the site.
Flood Frequency Models
• Fig 4.5 a) A flood magnitude-return period relationship
• Fig 4.5 b) A minimum flow magnitude-return period
relationship
(Both panels plot the magnitude, Q or q in m³/s, against the return period T in years.)
Frequency Analysis
• There are three main methods of frequency analysis
used in practice namely:
i. The straight-forward plotting technique, which is
used to obtain the cumulative distribution.
ii. The other method utilizes Frequency Factors.
iii. The cumulative distribution function provides a
quick means of determining the probability of an
event equal to or less than a specified quantity.
Frequency Analysis
4.3 Graphical Frequency Analysis
• The frequency of an event can be obtained by use of "plotting position" formulae (Viessman and Lewis, 1996).
• A plotting position is the probability value assigned to each piece of data to be plotted.
• There are several methods proposed and most of
them are empirical.
Frequency Analysis
• In the analysis of annual maximum values, the
recurrence interval is approximated as the mean
time in years, with N future trials, for the mth largest
value to be exceeded once on the average.
• The mean number of exceedances for this condition can be shown to be:
$\bar{\nu} = \frac{N\,m}{n+1}$
• Where;
• $\bar{\nu}$ = the mean number of exceedances
• N = the number of future trials
• n = the number of values
• m = the rank of descending values, with the largest equal to 1.
Frequency Analysis
• Several plotting position formulae are available and
they give different results for the same value of n and
m.
• Most plotting position formulae do not account for sample size or length of record.
• Some of these formulae are shown in Table 4.2. The
probability values for a sample of 20 and a ranking of
5 are also shown.
Frequency Analysis
• Table 4.2: Plotting Position Formulae (for n = 20, m = 5)
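Setting the mean number of exceedances Nm/(n + 1) equal to one gives N = (n + 1)/m. This is the Weibull return period T = (n + 1)/m, with plotting position P = m/(n + 1). Table 4.2's own entries are not reproduced here. The five formulae below are commonly cited plotting positions and are assumed, not confirmed, to overlap with that table. The names are illustrative.

```python
# Commonly cited plotting-position formulae giving the exceedance probability P
# of the m-th largest of n values (m = 1 for the largest).
PLOTTING_POSITIONS = {
    "California": lambda m, n: m / n,
    "Hazen": lambda m, n: (m - 0.5) / n,
    "Weibull": lambda m, n: m / (n + 1),
    "Gringorten": lambda m, n: (m - 0.44) / (n + 0.12),
    "Cunnane": lambda m, n: (m - 0.4) / (n + 0.2),
}


def plotting_table(values, formula="Weibull"):
    """Rank values in descending order and return (m, value, P, T = 1/P) rows."""
    position = PLOTTING_POSITIONS[formula]
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    rows = []
    for m, q in enumerate(ranked, start=1):
        p = position(m, n)
        rows.append((m, q, p, 1.0 / p))
    return rows


# Probability values for a sample of n = 20 and rank m = 5
for name, position in PLOTTING_POSITIONS.items():
    p = position(5, 20)
    print(f"{name:<11} P = {p:.4f}  T = {1 / p:.2f} years")
```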
Flood Frequency Models
• Fig 4.7 Cumulative distribution function of EV Type 1, plotted against the reduced variate y as $G(y) = e^{-e^{-y}}$ and against x as $F(x) = e^{-e^{-(x-u)/\alpha}}$, where $y = (x-u)/\alpha$.
Example
• The annual maximum discharges (m³/s) of the River Aswa for the years 1936-1965 are as follows.
5140 5640 1050 6020 3740 4580 5140 10560 12840 7870
1180 2520 1730 12400 3400 3700 9540 4810 4550 7043
8667 4550 7460 3360 8450 3420 4890 5730 9020 3240
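• The Pearson Type III probability density function is
$f(Q) = \frac{(Q - Q_0)^{\gamma-1}\, e^{-(Q - Q_0)/\beta}}{\beta^{\gamma}\,\Gamma(\gamma)}$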
• Where;
• Qo= location parameter, in this case a lower bound
• β= scale parameter
• γ = shape parameter
• Г(γ) = Complete Gamma Function
Flood Frequency Models
• The mean, standard deviation and skewness can be shown to be:
• µ = Q0 + βγ
• σ = β√γ
• g = 2/√γ
• The T-year flood is then estimated as $Q_T = \mu + K_T\sigma$,
• where KT is a frequency factor, which is given as a function of g, the skewness. Such tables are attached for −3.0 < g < 3.0.
Flood Frequency Models
• Procedure when using method of moments
estimation
• Assemble the N values of annual maximum flood
data Q1, Q2…
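• Calculate sample estimates of the mean, standard deviation, third moment and skewness (unbiased estimates):
$\bar{Q} = \frac{1}{N}\sum_{i=1}^{N} Q_i$
$s = \sqrt{\frac{\sum_{i=1}^{N}(Q_i-\bar{Q})^2}{N-1}}$
$M_3 = \frac{N}{(N-1)(N-2)}\sum_{i=1}^{N}(Q_i-\bar{Q})^3$
$g = \frac{M_3}{s^3}$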
Flood Frequency Models
• Enter Harter’s table of KT and read off values of KT for
each required T for the calculated value of g.
• Evaluate $Q_T = \bar{Q} + K_T s$ for each desired value of T.
Example 4.3
• For the time series data of peak discharges as given
below, estimate the peak discharge for return
periods of 10 and 200 years by using the Pearson
Type III method.
Flood Frequency Models
Year 1 2 3 4 5 6 7 8 9 10
Flood peak (m³/s) 11125 7656 11259 8863 9973 11035 11499 7908 7947 8894
• Solution
Flood Frequency Models
Year  Qi (m³/s)  Q̄ (m³/s)  Qi − Q̄  (Qi − Q̄)²  (Qi − Q̄)³
1 11125 9615.9 1509.1 2277382.81 3436798398.571
2 7656 9615.9 -1959.9 3841208.01 -7528383578.799
3 11259 9615.9 1643.1 2699777.61 4436004590.991
4 8863 9615.9 -752.9 566858.41 -426787696.889
5 9973 9615.9 357.1 127520.41 45537538.411
6 11035 9615.9 1419.1 2013844.81 2857847169.871
7 11499 9615.9 1883.1 3546065.61 6677596150.191
8 7908 9615.9 -1707.9 2916922.41 -4981811784.039
9 7947 9615.9 -1668.9 2785227.21 -4648265690.769
10 8894 9615.9 -721.9 521139.61 -376210684.459
Sum 96159 21295946.90 -507675586.920
Flood Frequency Models
• Standard deviation $s = \sqrt{\frac{\sum(Q_i-\bar{Q})^2}{N-1}} = 1538.251$ m³/s
Example 4.4
• Consider the maximum discharges of River Aswa
given in example 4.1 and estimate the peak
discharge for return periods of 100 and 200 years by
using the Pearson Type III method
Flood Frequency Models
Solution
• Standard deviation $s = \sqrt{\frac{\sum(Q_i-\bar{Q})^2}{N-1}} = 3080$ m³/s
• And $\bar{Q} = 5741$ m³/s
• M3 = 20,054,093,900.25
• Coefficient of skewness of the variate, g = M3/s³ = 0.69
• For KT, read Harter's table for g = 0.69.
Flood Frequency Models
• Calculation of QT
T (years)   KT      KT s       QT = Q̄ + KT s (m³/s)
100         2.817   8675.432   14,416.8
200         3.214   9898.062   15,639.4
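The following Python sketch applies the method-of-moments procedure and reproduces Example 4.4 with the KT values quoted there. The Wilson-Hilferty approximation to KT is a swapped-in technique, not part of the original notes. It is shown only as a cross-check on the table values. The function names are illustrative.

```python
import math
from statistics import NormalDist


def pearson3_moments(q):
    """Sample mean, standard deviation (n - 1), third moment M3 and skewness g."""
    n = len(q)
    mean = sum(q) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in q) / (n - 1))
    m3 = n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in q)
    return mean, s, m3, m3 / s ** 3


def kt_wilson_hilferty(g, T):
    """Approximate Pearson III frequency factor (Wilson-Hilferty transformation).
    An alternative to reading Harter's table; it loses accuracy as |g| grows."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)
    if abs(g) < 1e-9:
        return z  # zero skew: Pearson III reduces to the normal distribution
    k = g / 6.0
    return (2.0 / g) * ((1.0 + k * z - k * k) ** 3 - 1.0)


# River Aswa annual maximum discharges (m3/s), 1936-1965
aswa = [5140, 5640, 1050, 6020, 3740, 4580, 5140, 10560, 12840, 7870,
        1180, 2520, 1730, 12400, 3400, 3700, 9540, 4810, 4550, 7043,
        8667, 4550, 7460, 3360, 8450, 3420, 4890, 5730, 9020, 3240]

mean, s, m3, g = pearson3_moments(aswa)
print(f"mean = {mean:.0f}, s = {s:.0f}, M3 = {m3:.4g}, g = {g:.2f}")

# K_T values read from Harter's table for g = 0.69, as in Example 4.4
harter_kt = {100: 2.817, 200: 3.214}
for T, kt in harter_kt.items():
    approx = kt_wilson_hilferty(0.69, T)
    print(f"T = {T}: Q_T = {mean + kt * s:,.1f} m3/s (Wilson-Hilferty K_T ~ {approx:.3f})")
```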
Flood Frequency Models
iii) Log Pearson Type III
• The Log Pearson Type III probability distribution is
used for approximation of frequency characteristics
of measured annual flood peak data.
• This distribution has been widely adopted as one of
the standard methods for flood frequency
analysis. In this distribution the transform y =
log x is used to reduce skewness.
Flood Frequency Models
• Although all three moments are required to fit the
distribution, it is extremely flexible in that a zero
skew will reduce the Log-Pearson III distribution to a
Log-Normal and the Pearson Type III to a Normal.
• A very important property of gamma variates and normal variates is that the sum of two such independent variables retains the same distribution. For gamma variates this holds provided they share the same scale parameter.
• This is an important feature in the synthesis of hydrologic sequences.
Flood Frequency Models
• It has been noted (Alexander, 2002) that in the analysis of floods no direct method can be used with confidence for return periods exceeding 50 years.
• The use of the Pearson III distribution for simulating daily stream flows in reservoir studies has been advocated, and it is recommended when simulating daily flows for critical flood months.
Flood Frequency Models
Procedure
• First convert the series of annual maximum flows (Q1, Q2 … QN) into logarithms (Z1, Z2 … ZN), where Zi = log Qi. Then fit the Pearson Type 3 distribution to the Z series by the method of moments.
• This results in values of ZT, which are converted to QT values by taking antilogarithms (exponentiation).
• The probability density function of Q, namely f(Q), can be obtained from that of Z using the relation
$f(Q) = p(Z)\left|\frac{dZ}{dQ}\right|$
evaluated with $Z = \log Q$, where p(Z) is the pdf of the Z distribution and, for natural logarithms, $\frac{dZ}{dQ} = \frac{1}{Q}$. This gives
$f(Q) = \frac{(\log Q - Q_0)^{\gamma-1}\, e^{-(\log Q - Q_0)/\beta}}{\beta^{\gamma}\,\Gamma(\gamma)\, Q}$
• When the distribution is expressed in this way, its moments cannot be conveniently expressed in terms of Qo, β and γ. Therefore, the practice has developed of dealing with this distribution entirely in the log domain.
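The following minimal Python sketch carries out the log-domain procedure. It assumes base-10 logarithms, as in the worked example, so the antilog is 10**Z_T. It also assumes the user supplies K_T for the log skewness, for example from Harter's table. The function names are illustrative.

```python
import math


def log_moments(q):
    """Mean, standard deviation (n - 1) and skewness of Z = log10(Q)."""
    z = [math.log10(x) for x in q]
    n = len(z)
    mean = sum(z) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in z) / (n - 1))
    m3 = n / ((n - 1) * (n - 2)) * sum((v - mean) ** 3 for v in z)
    return mean, s, m3 / s ** 3


def lp3_quantile(mean_z, s_z, kt):
    """Z_T = mean_z + K_T * s_z, converted back to discharge with Q_T = 10 ** Z_T."""
    return 10.0 ** (mean_z + kt * s_z)


# Check on a tiny sample whose logs are exactly 1, 2 and 3
print(log_moments([10.0, 100.0, 1000.0]))

# Log statistics quoted later in this section (mu_z = 3.689, s_z = 0.269),
# with K_T = 2.326 (T = 100 years, g = 0)
print(round(lp3_quantile(3.689, 0.269, 2.326), 1))
```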
Flood Frequency Models
Example 4.5
• For the time series data of peak discharges given in
Example 4.2, estimate the peak discharge for return
periods of 10 and 200 years by using the Log-Pearson
Type III method.
• Solution
Flood Frequency Models
Year  Peak discharge Qi (m³/s)  Zi = log Qi  Zi − Z̄  (Zi − Z̄)²  (Zi − Z̄)³
d) Parameter Estimation
• The expression for QT contains three unknowns, Qo, β and λ, and these must be estimated from observed data. This can be done in either of two distinct ways from a record of N years.
Flood Frequency Models
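i. Fix Qo a priori and abstract from the record of flows every peak value exceeding Qo. Let there be M of them (Q1, Q2 … QM).
• Then λ = M/N and β = µ − Qo
• where µ = Q̄ = ΣQi/M.
ii. Alternatively, fix the number of peaks M (so that λ = M/N) and estimate both β and Qo from the moments of the M peaks:
• β = σ
• Qo = µ − σ
• where µ = Q̄ as above and σ is the standard deviation of the M peaks.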
• Both of these methods use the methods of moments
for parameter estimation. Other estimators could
be used also.
• These use different criteria for matching the
population to the sample and hence their algebraic
expression and resulting numerical results may differ
from the above.
Flood Frequency Models
e) Standard Error of Estimate
• An approximate expression for se(QT) for use with
the above method is
• $se(Q_T) = \beta\left\{\frac{1 + (\ln \lambda T)^2}{M}\right\}^{1/2}$
f) Notation
• The series of peaks exceeding the threshold Qo is known as a partial duration series.
• It is the series remaining after truncating the entire parent series at Qo. When Qo is chosen so that λ = 1, the series is known as the annual exceedance series.
• The term ‘Peaks over a threshold’ (POT) series is
used synonymously with partial duration series.
Flood Frequency Models
Example 4.7
• The highest 77 peaks recorded on the River Mayanja between 29 March 1973 and 2 May 1991 are given in the table below. Using hydrometric years extending from 1 May to 30 April, this record has N = 19 years.
• Estimation Method (i): Qo fixed at 60 m³/s
• The choice of Qo = 60 m³/s is arbitrary. Having chosen it, count the number of peaks exceeding it and calculate their mean. This gives M = 40 peaks with the following values:
Flood Frequency Models
125.0 116.0 79.9 133.3 196.0 324.9 99.4 65.9
108.6 126.7 172.8 166.3 118.9 63.4 113.6 80.5
125.5 112.6 140.9 122.8 186.3 158.3 127.7 94.8
115.6 66.9 115.4 181.3 137.4 190.3 146.3 151.5
127.8 116.0 111.9 65.2 200.1 192.9 210.5 219.4
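The following Python sketch applies the two moment-based estimation methods for the partial duration series to the 40 peaks above. The quantile expression for QT did not survive in these notes. The sketch ASSUMES the common exponential peaks-over-threshold form QT = Qo + β ln(λT), with T in years, and this should be checked against the original before use. The function names are illustrative.

```python
import math

# The M = 40 River Mayanja peaks exceeding Qo = 60 m3/s (N = 19 hydrometric years)
peaks = [125.0, 116.0, 79.9, 133.3, 196.0, 324.9, 99.4, 65.9,
         108.6, 126.7, 172.8, 166.3, 118.9, 63.4, 113.6, 80.5,
         125.5, 112.6, 140.9, 122.8, 186.3, 158.3, 127.7, 94.8,
         115.6, 66.9, 115.4, 181.3, 137.4, 190.3, 146.3, 151.5,
         127.8, 116.0, 111.9, 65.2, 200.1, 192.9, 210.5, 219.4]
N_YEARS = 19
Q0 = 60.0


def pot_method_i(peaks, q0, n_years):
    """Method (i): Qo fixed; lambda = M/N, beta = mean of the exceeding peaks - Qo."""
    exceed = [q for q in peaks if q > q0]
    m = len(exceed)
    mu = sum(exceed) / m
    return q0, mu - q0, m / n_years


def pot_method_ii(peaks, n_years):
    """Method (ii): M fixed; beta = sigma and Qo = mu - sigma from the M peaks."""
    m = len(peaks)
    mu = sum(peaks) / m
    sigma = math.sqrt(sum((q - mu) ** 2 for q in peaks) / (m - 1))
    return mu - sigma, sigma, m / n_years


def pot_quantile(q0, beta, lam, T):
    """ASSUMED exponential POT quantile: Q_T = Qo + beta * ln(lambda * T)."""
    return q0 + beta * math.log(lam * T)


for label, (q0, beta, lam) in (("i", pot_method_i(peaks, Q0, N_YEARS)),
                               ("ii", pot_method_ii(peaks, N_YEARS))):
    print(f"method {label}: Qo = {q0:.1f}, beta = {beta:.1f}, lambda = {lam:.3f}, "
          f"Q100 = {pot_quantile(q0, beta, lam, 100):.0f} m3/s")
```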
• Standard deviation $s_z = \sqrt{\frac{\sum(Z_i-\bar{Z})^2}{N-1}} = 0.269$
• And $\bar{Z} = \mu_z = 110.677/30 = 3.689$
Flood Frequency Models
• M3 = −0.0148
• Coefficient of skewness of the log variate
• g = −0.0148/0.269³ = −0.76
• For KT, read Harter's table for g = −0.76. The KT values used in the calculation below (2.326 and 2.576) are those for g = 0, i.e. the log-normal case.
• Calculation of QT
T (years)   KT      KT σ    ZT = μz + KT σ   QT = antilog ZT (m³/s)
100         2.326   0.626   4.315            20,639.3
200         2.576   0.693   4.382            24,095.9
Flood Frequency Models
4.5.4 Gamma Distribution
• The time taken for a number of events to occur in a
Poisson process is given by the gamma
distribution, which is a distribution of a sum of
independent and identical exponentially
distributed random variables.
• It has been used to describe the distribution of depth
of precipitation in storms.
Flood Frequency Models
• But unlike log-normal distribution it has not been
possible to transform the coordinate scales in such a
manner that all cumulative gamma distributions
could be plotted as a straight line in order to judge
visually (approximately) whether an empirical
frequency distribution could be fitted by gamma
distribution.
• This has distinct disadvantages vis-à-vis log-normal
distribution and makes it less popular among the
users.
Flood Frequency Models
4.6 Errors in Frequency Estimation
• Errors in estimating QT, the flood of return period T, may arise under two categories (Cunnane, 1983):
a. Model Errors
b. Sampling Errors
• 4.6.1 Model Error
• A model error is one made in the analysis through the assumptions built into the statistical model.
Flood Frequency Models
• In analyzing annual maximum or minimum flow
series for instance, it is assumed that the available
AM series is a simple random sample from a single
population with distribution function F(Q). This
assumption implies that the:
a) Series is one of many possible such series which
could have occurred, each series having an equal
chance of occurring (random sample).
b) Population did not change with time during the
period of observation (stationarity).
Flood Frequency Models
c) Value occurring in year t, Qt, is independent of the
value which occurred in previous years, Qt-1, Qt-2
…………… this is referred to as lack of persistence.
d) Algebraic form F(Q) of the distribution is known,
and,
e) Relation between Q and T is the same in the model
as it is in nature.
• The assumption causing the biggest concern is d),
that the correct form of distribution F(Q) is known.
Flood Frequency Models
• Within the range of the observed data, two quite
different distributions might appear to describe the
distribution of the sample data quite well even
though the two distributions might be very different
in their tails.
• In one of these, QT may increase almost linearly with log T, while in the other it may increase much more rapidly at large T, causing a rapid divergence of the estimates of QT as T increases.
Flood Frequency Models
• Within the range of observed data, as viewed on a
probability plot say, both types of distribution
may seem to be supported to some extent.
• In such a case guidance ought to be available from
the studies which have been made of flow records
world-wide but no absolutely firm knowledge is yet
available.
• It may require many more years of data to become
available before firm statements about the form
of distribution can be made.
Flood Frequency Models
• It is important to note that in this analysis one
assumes that the volumes are statistically
independent of one another.
• This means that the occurrence of a hydrological event in one year has no bearing on whether or not the event occurs the next year or the year after that.
• This serial independence is a reasonable assumption in small watersheds. In large watersheds there might be some inherent persistence.
Flood Frequency Models
4.6.2 Sampling Errors
• A sampling error arises because the series of flows
being analyzed is but a sample (assumed
random) from an unknown population.
• Any quantity calculated from such a sample is a
statistic with its own theoretical sampling
distribution, the standard deviation of which is called
the standard error of the statistic.
• In 2-parameter distributions a quantile can be
written as:
Flood Frequency Models
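• $Q_T = \mu + \sigma K_T$
• where KT is a frequency factor, as mentioned in Section 4.4, dependent on T and on the form of the distribution being assumed for Q.
• A sample provides estimates of µ and σ, namely $\bar{Q}$ and $\hat{\sigma}$, and then an estimate of QT is
• $\hat{Q}_T = \bar{Q} + \hat{\sigma} K_T$
• Its sampling variance can always be expressed as
• $\operatorname{var}(\hat{Q}_T) = \operatorname{var}(\bar{Q}) + 2K_T\operatorname{cov}(\bar{Q},\hat{\sigma}) + K_T^2\operatorname{var}(\hat{\sigma})$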
Flood Frequency Models
• where the variances and covariances on the RHS are those of the sampling distributions of $\bar{Q}$ and $\hat{\sigma}$.
• These sampling distributions and their variances depend on the method of estimation as well as on the distribution.
• As an example, in the EV1 distribution with estimation by the method of moments, $se(\hat{Q}_T) = \sqrt{\operatorname{var}(\hat{Q}_T)}$ is
• $se(\hat{Q}_T) = \frac{\sigma}{\sqrt{N}}\left\{1 + 1.14K_T + 1.10K_T^2\right\}^{1/2}$
Flood Frequency Models
• while in the Normal distribution it is
• $se(\hat{Q}_T) = \frac{\sigma}{\sqrt{N}}\left\{1 + \frac{K_T^2}{2}\right\}^{1/2}$
• where in this case $K_T = y_T$, y being the standardised normal N(0,1) variate, whose distribution function is widely tabulated.
• Formulae for se(QT) are less readily derived for three
parameters distributions and values for se(QT) have
been obtained in some such cases by simulation
methods.
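The two standard-error formulae can be evaluated with a short Python sketch. Here the EV1 frequency factor is derived from the Table 4.1 moments, K_T = (√6/π)(y_T − 0.5772). This is an assumption about how K_T is defined for the moment method. The function names are illustrative.

```python
import math
from statistics import NormalDist


def ev1_kt(T):
    """EV1 frequency factor for moment estimation, from the Table 4.1 mean and standard deviation:
    K_T = -(sqrt(6)/pi) * (0.5772 + ln(-ln(1 - 1/T)))."""
    return -(math.sqrt(6) / math.pi) * (0.5772 + math.log(-math.log(1.0 - 1.0 / T)))


def se_ev1(sigma, n, T):
    """se(Q_T) = sigma/sqrt(N) * (1 + 1.14 K_T + 1.10 K_T^2)^(1/2)."""
    k = ev1_kt(T)
    return sigma / math.sqrt(n) * math.sqrt(1.0 + 1.14 * k + 1.10 * k * k)


def se_normal(sigma, n, T):
    """se(Q_T) = sigma/sqrt(N) * (1 + K_T^2 / 2)^(1/2), with K_T the standard normal quantile."""
    k = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return sigma / math.sqrt(n) * math.sqrt(1.0 + k * k / 2.0)


# Standard errors in units of sigma for a 20-year record
for T in (10, 50, 100):
    print(f"T = {T:3d}: EV1 se = {se_ev1(1.0, 20, T):.3f} sigma, "
          f"Normal se = {se_normal(1.0, 20, T):.3f} sigma")
```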
Flood Frequency Models
• The magnitude of se(QT) is about 10% - 15% of QT for
two parameter distributions when the record length
is about 20 years.
• This is not very large and it is very much less
damaging than the error which could occur by
choosing the wrong form of distribution.
• The main conclusion is that an error in the model assumptions, particularly in the choice of distribution, is far more damaging than sampling error.
Flood Frequency Models
• The annual maximum model with the EV1 distribution was used in a study of flooding on Lake Albert. The results showed that the highest effective inflow into the lake (8971 cumecs) has a recurrence interval of 28 years, whereas the highest outflow (3029 cumecs) has a recurrence interval of 59 years. This suggests that flooding may arise because the maximum inflow occurs more than twice as frequently (59/28 ≈ 2.1) and is about three times greater in magnitude (8971/3029 ≈ 3.0) than the maximum outflow (Rugumayo, Kayondo, 2006).
Flood Frequency Models
4.7 Flood Models Compared
• 4.7.1 Aim of Each Model
• The ultimate aim of both models is the same. Each
tries to represent the flood peak aspects of the
entire flow hydrograph by a simple series of flood
peak values (Cunnane, 1983).
• 4.7.2 Number of Values in Each series
• The series used in the annual maximum model
consists of one value, the maximum peak flow, from
each year of record. Thus N years of record give N
items in the series.
Flood Frequency Models
• The series used in Partial Duration series model
consists of either the M highest peaks in the entire
record regardless of year of occurrence, here usually
M ≥ N, or alternatively it consists of all peaks which
exceed some threshold flow value Qo.
• The latter form of the series is also called peaks over
a threshold series. The algebraic probability
treatment differs slightly between the two forms of
series definition.
Flood Frequency Models
• If the Partial Duration series is made to consist of the
N highest peaks in the record, where N is the number
of years in the record, then this special case is
traditionally called the Annual Exceedance Series.
4.7.3 Independence
• In each series the model assumes that successive
items in the series are statistically independent and
come from the same probability distribution (i.e.
identically, independently, distributed).
Flood Frequency Models
• This assumption causes no problems for the Annual
maximum series, but it does for the Partial
Duration series.
• In the latter an arbitrary rule has to be adopted
about whether to include adjacent peaks or reject
one of them.
Flood Frequency Models
• The adoption of an arbitrary rule is always
unsatisfactory and this has lessened the popularity of
the Partial Duration Series model and has
correspondingly increased the popularity of the
Annual Maximum Series model.
• When the two series of data are extracted from the
same record of river flows many flood peak values
occur in both series.
Flood Frequency Models
4.7.4 Common Values in Each Series
• Those years having low floods, while contributing to
the Annual Maximum (AM) series as a result of its
definition, do not contribute to the Partial Duration
(P.D.) Series.
• On the other hand, years which have 2 or more large
(independent) flood peaks contribute twice or thrice
to the P.D. Series, but only once to the A.M. Series.
• The frequency distribution of flood magnitudes in the P.D. series tends to be abruptly truncated at some threshold, while that of the A.M. series always has values to the left of the mode.
• These latter values reflect the presence of annual
maximum values from the years with small floods.
4.7.5 Statistical Modeling
• The modelling problem in the A.M. series is one of choice of distribution. Many different distributions have been suggested, including Extreme Value Type 1, General Extreme Value, Log-Normal, Pearson Type 3, Gamma and Log-Pearson Type 3.
• The modelling problem in the P.D. series is also one of choice of distribution, coupled with choice of the number of peaks, M, to include in the series.
• While increasing M increases the amount of information in the series, it sometimes makes the problem of choice of distribution more difficult.
Design Return Period
4.8 Selection of Design Return Period
• Return Period: As defined in Section 4.2, the return period (Tr) is the average interval between the occurrences of floods equal to or greater than a given magnitude.
• It must be noted that a 50-year flood will not necessarily occur once in every 50-year period, as no periodicity is implied.
• It only means that over a very long period, say 1000 years, there would on average be 1000/50 = 20 floods of this magnitude or greater.
• Such floods can occur even two or three times in a year, and at the other extreme they may not occur for 60-70 years at a stretch. But the expected number of such floods in 1000 years is 20.
• Exceedance probability: the ratio 1/Tr is the probability (p) with which the Tr-year flood may be equalled or exceeded in any one year.
• To select a design flood which is not likely to occur during the life of the hydraulic structure, the design return period should be much greater than the estimated useful life of the structure.
• However, there is still no guarantee that such a flood will not occur during the useful life, as there is always some risk.
• To reduce the risk, a very long return period is adopted for very important structures such as spillways of very high dams.
• The probability that the hydraulic structure does not fail due to excess flood in any year is equal to (1 - 1/Tr).
• If it is assumed that the annual flood peaks are independent events, the probability that the structure does not fail in the next N years is equal to $(1 - 1/T_r)^N$; therefore, the risk that the structure may fail in at least one of the next N years is given by:
• $R = 1 - \left(1 - \dfrac{1}{T_r}\right)^{N}$   (4.44)
• Where R is called the risk.
• Equation 4.44 may be used to determine the risk R involved in adopting the Tr-year flood for a structure with a useful life of N years.
• For example, if the useful life (N) of a structure is 100 years, the risk of adopting the Tr = 100 year flood is given by
• $R = 1 - \left(1 - \dfrac{1}{100}\right)^{100} = 0.634$
• Thus there is a 63.4% chance that this flood will be equalled or exceeded during the useful life, which is a rather great risk. To reduce the risk, we can adopt a flood of Tr = 1000 years:
• $R = 1 - \left(1 - \dfrac{1}{1000}\right)^{100} = 0.095$
• To reduce the risk further, the return period should be increased further.
• A judicious selection of the design return period is made considering economy and other factors. Of course, it is impossible to eliminate the risk entirely.
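Eq. (4.44) and the two worked values above can be checked with a few lines of Python. This is a minimal sketch; the function name `risk_of_exceedance` is illustrative rather than from the text, and, as in the derivation, annual flood peaks are assumed independent.

```python
def risk_of_exceedance(return_period: float, life_years: int) -> float:
    """Eq. (4.44): chance that the Tr-year flood is equalled or exceeded at
    least once during a design life of N years (independent annual peaks)."""
    p = 1.0 / return_period                # annual exceedance probability
    return 1.0 - (1.0 - p) ** life_years   # 1 - P(no exceedance in N years)


for tr in (100, 1000):
    print(f"Tr = {tr} yr, N = 100 yr: R = {risk_of_exceedance(tr, 100):.3f}")
```

The loop reproduces the values 0.634 and 0.095 quoted above; only the annual probability and the design life enter the calculation.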
Example 4.10
• A dam has an expected working life of 25 years, and
is designed for a peak flood of 100 years return
period. Estimate the risk of failure of this dam. If a
risk of 12.5 % is acceptable, what should be the
return period for it?
• Solution
• The risk of failure (R) of the dam is obtained from $P_1 = 1 - q^n = 1 - (1 - p)^n$, which can be expressed as
• $R = 1 - (1 - p)^n = 1 - (1 - 1/T)^n$
• Where n = 25 years and T = 100 years.
• Therefore, $R = 1 - (1 - 1/100)^{25} = 0.222$
• If R = 12.5% = 0.125, the return period is calculated from
• $0.125 = 1 - (1 - 1/T)^{25}$
• Or T ≈ 188 years
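The second part of Example 4.10 inverts Eq. (4.44) for T. Rearranging gives T = 1/[1 - (1 - R)^(1/n)], which the sketch below evaluates; the helper name `return_period_for_risk` is hypothetical.

```python
def return_period_for_risk(risk: float, life_years: int) -> float:
    """Invert R = 1 - (1 - 1/T)**n for the return period T."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")
    # (1 - 1/T) = (1 - R)**(1/n)  =>  T = 1 / (1 - (1 - R)**(1/n))
    annual_non_exceedance = (1.0 - risk) ** (1.0 / life_years)
    return 1.0 / (1.0 - annual_non_exceedance)


life = 25                                    # working life n, years
risk_100 = 1.0 - (1.0 - 1.0 / 100) ** life   # risk with T = 100 years
t_required = return_period_for_risk(0.125, life)
print(f"R(T=100) = {risk_100:.3f}; T for 12.5 % risk = {t_required:.0f} years")
```

The unrounded result is about 187.7 years, which the text rounds to 188 years.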
Example 4.11
• At a station in Apac, Uganda, it was found that 250 mm of rainfall has a return period of 25 years. Determine the probability that a one-day rainfall equal to or greater than 250 mm occurs once in 15 successive years.
• Solution
• From T = 1/P = (N+1)/M,
• P= 1/T, therefore
• Probability of 250 mm of rainfall, P = 1/25 = 0.04
• For once in 15 successive years, n =15, r=1
• From the formula for probability of occurrence of
an event r times in n successive years that is:
• $P_{r,n} = {}^{n}C_{r}\, p^{r} q^{n-r} = \dfrac{n!}{(n-r)!\, r!}\, p^{r} q^{n-r}$
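The worked solution breaks off before the numbers are substituted. A minimal sketch evaluating the binomial formula above with p = 0.04, r = 1 and n = 15 (exactly one occurrence in the 15 years) follows; the function name is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def prob_r_in_n(p: float, r: int, n: int) -> float:
    """P(r,n) = nCr * p**r * q**(n - r): probability of exactly r
    occurrences in n independent years with annual probability p."""
    q = 1.0 - p
    return comb(n, r) * p ** r * q ** (n - r)


p = 1 / 25          # annual probability of a one-day fall of >= 250 mm
print(f"P(1, 15) = {prob_r_in_n(p, r=1, n=15):.3f}")
```

This gives about 0.339, i.e. 15 × 0.04 × 0.96^14.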
• $r = \dfrac{\sum xy - n\bar{x}\bar{y}}{(n-1)\,\sigma_x \sigma_y}$   (4.47a)
• Where $\Delta x = x - \bar{x}$, $\Delta y = y - \bar{y}$
• σ x, σ y = standard deviations of x and y,
respectively
• x, y = middle of each class interval respectively
• If r = ±1, the correlation is perfect, giving a straight-line plot (regression line).
• If r = 0, no linear relation exists between x and y (scatter plot).
• |r| → 1 indicates a close linear relationship.
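Eq. (4.47a) translates directly into code. This sketch assumes σx and σy are sample standard deviations with an n - 1 divisor, which is what makes Eq. (4.47a) equal to the deviation form ΣΔxΔy / √(Σ(Δx)²Σ(Δy)²); the function name is illustrative.

```python
from math import sqrt


def correlation_coefficient(x, y):
    """Eq. (4.47a): r = (sum(xy) - n*xbar*ybar) / ((n - 1) * sx * sy),
    where sx, sy are standard deviations with an n - 1 divisor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    xbar, ybar = sum(x) / n, sum(y) / n
    sx = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - n * xbar * ybar) / ((n - 1) * sx * sy)


print(correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfect positive
print(correlation_coefficient([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfect negative
```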
• If a straight line does not fit the data adequately, a quadratic parabola can be used as the fitting curve, given by
• $y = a + bx + cx^2$
• From the principle of least squares, a, b and c can be obtained by solving the three normal equations
• $\sum y = na + b\sum x + c\sum x^2$
• $\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
• $\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$   (4.49)
• Where n = number of pairs of observed values of x
and y.
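Solving the three normal equations (4.49) is a small linear system. The sketch below builds the sums and solves the system by Gaussian elimination; the helper names `solve_3x3` and `fit_parabola` are illustrative, and plain Python is used so that no external library is assumed.

```python
def solve_3x3(A, rhs):
    """Solve the 3x3 system A u = rhs by Gaussian elimination with pivoting."""
    M = [[float(v) for v in row] + [float(b)] for row, b in zip(A, rhs)]
    for col in range(3):
        # Move the row with the largest pivot into place for stability.
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    u = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                      # back substitution
        u[r] = (M[r][3] - sum(M[r][c] * u[c] for c in range(r + 1, 3))) / M[r][r]
    return u


def fit_parabola(x, y):
    """Least-squares y = a + b x + c x**2 via the normal equations (4.49)."""
    def sx(k):                               # sum of x**k
        return sum(xi ** k for xi in x)

    def sxy(k):                              # sum of x**k * y
        return sum(xi ** k * yi for xi, yi in zip(x, y))

    A = [[len(x), sx(1), sx(2)],
         [sx(1), sx(2), sx(3)],
         [sx(2), sx(3), sx(4)]]
    a, b, c = solve_3x3(A, [sxy(0), sxy(1), sxy(2)])
    return a, b, c
```

For data lying exactly on a parabola, for example y = 1 + 2x + 3x², the fit returns a ≈ 1, b ≈ 2, c ≈ 3.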
• Regardless of the type of curve fitted, the correlation coefficient r is given by Eq. (4.47). The variables x and y may be, for instance, precipitation and the corresponding runoff, or gauge height and the corresponding streamflow.
• The power function $y = c x^m$
• can be transformed to a straight line by taking logarithms of the variables:
• $\log y = \log c + m \log x$
• By putting log x = X, log y = Y, log c = a and m = b, the function becomes a straight line similar to Eq. (4.45), which can be solved for a and b from Eq. (4.53), and the power function can then be determined.
• Whichever fit gives r closest to 1 by Eq. (4.46) is adopted. Statistical methods can be applied to many kinds of hydrometeorological data, such as precipitation, temperature, floods, droughts and water quality.
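The logarithmic transformation described above can be sketched as a least-squares straight-line fit on (log x, log y), with c recovered as 10^a. Base-10 logarithms are assumed (any base gives the same m), all data must be positive, and the function name is illustrative.

```python
from math import log10


def fit_power_law(x, y):
    """Fit y = c * x**m by least squares on log y = log c + m log x.
    All x and y values must be positive."""
    X = [log10(v) for v in x]
    Y = [log10(v) for v in y]
    n = len(X)
    Xbar, Ybar = sum(X) / n, sum(Y) / n
    b = (sum((Xi - Xbar) * (Yi - Ybar) for Xi, Yi in zip(X, Y))
         / sum((Xi - Xbar) ** 2 for Xi in X))       # b = m
    a = Ybar - b * Xbar                              # a = log c
    return 10 ** a, b                                # (c, m)
```

Note that this minimises the squared errors in log y, not in y itself, so the fitted curve gives relatively more weight to small values than a direct non-linear fit would.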
4.9.2 Standard Error of Estimate
• A measure of the scatter about the regression line of y on x in Eq. (4.47) is given by
• $S_{y.x} = \sqrt{\dfrac{\sum (y - y_{est})^2}{n - 2}}$   (4.52)
• which is called the standard error of estimate of y with respect to x, where $y_{est}$ is the value of y for the given value of x in Eq. (4.45). $S_{y.x}$ can also be determined by the expressions
• $S_{y.x} = \sigma_y \sqrt{1 - r^2}$   (4.53)
• $S_{y.x} = \sqrt{\dfrac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$   (4.54)
• $S_{y.x} = \sqrt{\dfrac{n - 1}{n - 2}\left(\sigma_y^2 - b^2 \sigma_x^2\right)}$   (4.55)
• Eq. (4.54) can be extended to non-linear regression equations.
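For a least-squares straight line, Eqs. (4.52), (4.54) and (4.55) give the same value, which a short check on the rainfall-runoff data of Example 4.12 below confirms. This is a sketch with illustrative function names; the σ values in Eq. (4.55) are taken with an n - 1 divisor, which is what makes the three agree.

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares straight line y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def sample_var(v):                            # variance with n - 1 divisor
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)


def se_from_residuals(x, y, a, b):            # Eq. (4.52)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(x) - 2))


def se_from_sums(x, y, a, b):                 # Eq. (4.54)
    s = (sum(yi * yi for yi in y) - a * sum(y)
         - b * sum(xi * yi for xi, yi in zip(x, y)))
    return sqrt(s / (len(x) - 2))


def se_from_variances(x, y, b):               # Eq. (4.55)
    n = len(x)
    return sqrt((n - 1) / (n - 2) * (sample_var(y) - b ** 2 * sample_var(x)))


rain = [51, 94, 65, 42, 73, 112, 106, 86, 59, 84]      # x, mm
runoff = [10, 22, 15, 12, 17, 19, 20, 18, 13, 16]      # y, mm
a, b = linear_fit(rain, runoff)
print([round(s, 3) for s in (se_from_residuals(rain, runoff, a, b),
                             se_from_sums(rain, runoff, a, b),
                             se_from_variances(rain, runoff, b))])
```

All three come out at about 1.850 mm. Eq. (4.53) omits the (n - 1)/(n - 2) factor and so gives a somewhat smaller figure for small samples (about 1.74 mm for these data).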
Example 4.12
• Annual rainfall and runoff data for Sezibwa River for
10 years (1950-1959) are given below. Determine the
linear regression line between rainfall and runoff, the
correlation coefficient and the standard error of
estimate.
Year  Rainfall (mm)  Runoff (mm)
1950 51 10
1951 94 22
1952 65 15
1953 42 12
1954 73 17
1955 112 19
1956 106 20
1957 86 18
1958 59 13
1959 84 16
• Solution
• The regression line computations are given below:
Rainfall x (mm)  Runoff y (mm)  x²  xy  ∆x = x - x̄  ∆y = y - ȳ  (∆x)²  (∆y)²  ∆x·∆y
51 10 2601 510 -26.2 -6.2 686.44 38.44 162.44
94 22 8836 2068 16.8 5.8 282.24 33.64 97.44
65 15 4225 975 -12.2 -1.2 148.84 1.44 14.64
42 12 1764 504 -35.2 -4.2 1239.04 17.64 147.84
73 17 5329 1241 -4.2 0.8 17.64 0.64 -3.36
112 19 12544 2128 34.8 2.8 1211.04 7.84 97.44
106 20 11236 2120 28.8 3.8 829.44 14.44 109.44
86 18 7396 1548 8.8 1.8 77.44 3.24 15.84
59 13 3481 767 -18.2 -3.2 331.24 10.24 58.24
84 16 7056 1344 6.8 -0.2 46.24 0.04 -1.36
∑ = 772  162  64468  13205  0  0  4869.6  127.6  698.6
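• $\bar{x} = \dfrac{\sum x}{n} = \dfrac{772}{10} = 77.2$ mm
• $\bar{y} = \dfrac{\sum y}{n} = \dfrac{162}{10} = 16.2$ mm
• From the normal equations for a straight line, $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$: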
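• 162 = 10a + 772b
• 13205 = 772a + 64468b
• Solving the above two simultaneous equations gives
• a = 5.125, b = 0.1435
• Therefore the regression line is y = 0.1435x + 5.125
• Or R = 0.1435P + 5.125, where R (runoff) and P (rainfall) are in mm.
• Correlation coefficient: $r = \dfrac{\sum \Delta x \Delta y}{\sqrt{\sum (\Delta x)^2 \sum (\Delta y)^2}} = \dfrac{698.6}{\sqrt{4869.6 \times 127.6}} = 0.886$
• Or, since $b = r\,\sigma_y/\sigma_x$: $r = b\sqrt{\dfrac{\sum (\Delta x)^2}{\sum (\Delta y)^2}} = 0.1435\sqrt{\dfrac{4869.6}{127.6}} = 0.886$
• $\sigma_y = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\dfrac{127.6}{10 - 1}} = 3.765$ mm
• $S_{y.x} = \sigma_y\sqrt{1 - r^2} = 3.765\sqrt{1 - 0.886^2} = 1.746$ mm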
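The hand computation of Example 4.12 can be reproduced in a few lines. This sketch follows the deviation form used in the table (Σ(∆x)², Σ(∆y)², Σ∆x∆y) and Eq. (4.53) for the standard error; variable names are illustrative.

```python
from math import sqrt

rainfall = [51, 94, 65, 42, 73, 112, 106, 86, 59, 84]   # x, mm
runoff = [10, 22, 15, 12, 17, 19, 20, 18, 13, 16]       # y, mm

n = len(rainfall)
xbar = sum(rainfall) / n
ybar = sum(runoff) / n
sxx = sum((x - xbar) ** 2 for x in rainfall)                          # Σ(∆x)²
syy = sum((y - ybar) ** 2 for y in runoff)                            # Σ(∆y)²
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(rainfall, runoff))  # Σ∆x∆y

b = sxy / sxx                      # slope
a = ybar - b * xbar                # intercept
r = sxy / sqrt(sxx * syy)          # correlation coefficient
sigma_y = sqrt(syy / (n - 1))
s_yx = sigma_y * sqrt(1 - r ** 2)  # Eq. (4.53)

print(f"y = {b:.4f} x + {a:.3f}")
print(f"r = {r:.3f}, sigma_y = {sigma_y:.3f} mm, S_y.x = {s_yx:.3f} mm")
```

It gives a ≈ 5.125, b ≈ 0.1435, r ≈ 0.886 and σy ≈ 3.765 mm. With the unrounded r, S_y.x comes out at about 1.744 mm, marginally below the 1.746 mm obtained with r rounded to 0.886.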
4.9.3 Linear Multiple Regression
• A regression equation for estimating a dependent variable, say x1, from independent variables x2, x3, … is called a regression equation of x1 on x2, x3, …; for three variables, it is given by
• $x_1 = a + bx_2 + cx_3$
• The constants a, b and c can be determined by the method of least squares. The least-squares regression plane of x1 on x2 and x3 can be determined by solving simultaneously the three normal equations
• $\sum x_1 = an + b\sum x_2 + c\sum x_3$
• $\sum x_1 x_2 = a\sum x_2 + b\sum x_2^2 + c\sum x_2 x_3$
• $\sum x_1 x_3 = a\sum x_3 + b\sum x_2 x_3 + c\sum x_3^2$
• where n is the number of data points (x1, x2, x3).
• The standard error of estimate of x1 with respect to x2 and x3 is given by
• $S_{1.23} = \sqrt{\dfrac{\sum (x_1 - x_{1est})^2}{n}}$
• where x1est = value of x1 for the given values of x2 and x3 in Eq. (4.56).
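• The coefficient of multiple correlation is given by
• $r_{1.23} = \sqrt{1 - \dfrac{S_{1.23}^2}{\sigma_1^2}}$
• or, in terms of the simple correlation coefficients,
• $r_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2\,r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$
• where $r_{12} = \dfrac{\sum x_1 x_2 - n\bar{x}_1\bar{x}_2}{(n-1)\,\sigma_1\sigma_2}$
• r12 is the linear correlation coefficient between the variables x1 and x2, ignoring the variable x3; r13 and r23 are defined similarly.
• From Eq. (4.66), $S_{1.23} = \sigma_1\sqrt{1 - r_{1.23}^2}$, which is very similar to Eq. (4.53).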
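A sketch of the least-squares plane and both expressions for r1.23 follows. The data are hypothetical, made up for illustration only; σ1 is computed with an n divisor to match the definition of S1.23 above, so that the two expressions for r1.23 agree. All function names are illustrative.

```python
from math import sqrt


def solve_3x3(A, rhs):
    """Solve the 3x3 system A u = rhs by Gaussian elimination with pivoting."""
    M = [[float(v) for v in row] + [float(b)] for row, b in zip(A, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    u = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        u[r] = (M[r][3] - sum(M[r][c] * u[c] for c in range(r + 1, 3))) / M[r][r]
    return u


def pearson(u, v):
    """Simple linear correlation coefficient between two variables."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    suv = sum((p - ub) * (q - vb) for p, q in zip(u, v))
    return suv / sqrt(sum((p - ub) ** 2 for p in u) * sum((q - vb) ** 2 for q in v))


def fit_plane(x1, x2, x3):
    """Least-squares x1 = a + b x2 + c x3 from the three normal equations."""
    def s(u, v=None):
        return sum(u) if v is None else sum(p * q for p, q in zip(u, v))

    A = [[len(x1), s(x2), s(x3)],
         [s(x2), s(x2, x2), s(x2, x3)],
         [s(x3), s(x2, x3), s(x3, x3)]]
    return solve_3x3(A, [s(x1), s(x1, x2), s(x1, x3)])


def multiple_r(x1, x2, x3):
    """r_1.23 from the simple correlation coefficients r12, r13, r23."""
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


# Hypothetical data: x1 = 1 + 2 x2 - x3 plus small made-up perturbations.
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [2, 1, 4, 3, 6, 5, 8, 9]
noise = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 0.05, -0.05]
x1 = [1 + 2 * p - q + e for p, q, e in zip(x2, x3, noise)]

a, b, c = fit_plane(x1, x2, x3)
n = len(x1)
resid = [v - (a + b * p + c * q) for v, p, q in zip(x1, x2, x3)]
s_123 = sqrt(sum(e * e for e in resid) / n)              # S_1.23
mean1 = sum(x1) / n
sigma1 = sqrt(sum((v - mean1) ** 2 for v in x1) / n)     # n divisor, as S_1.23
print(a, b, c)
print(multiple_r(x1, x2, x3), sqrt(1 - s_123 ** 2 / sigma1 ** 2))
```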
Recurrence interval: 31  15.50  10.33  7.75  6.20  5.17  4.43  3.88  3.44  3.1  2.81  2.58
• Solution
No.  Recurrence interval  O  log(x)  Kt  Z  E  (O - E)  (O - E)²/E
1 31.00 168.00 2.23 1.824 2.22 167.56 0.44 0.00
2 15.50 163.58 2.21 1.454 2.21 161.04 2.54 0.04
3 10.33 153.83 2.19 1.292 2.20 158.27 -4.44 0.12
4 7.75 143.08 2.16 1.084 2.19 154.78 -11.70 0.88
5 6.20 137.42 2.14 0.948 2.18 152.54 -15.12 1.50
6 5.17 134.17 2.13 0.857 2.18 151.06 -16.89 1.89
7 4.43 132.33 2.12 0.682 2.17 148.25 -15.92 1.71
8 3.88 129.50 2.11 0.528 2.16 145.83 -16.33 1.83
9 3.44 127.08 2.10 0.404 2.16 143.90 -16.82 1.97
10 3.10 126.90 2.10 0.309 2.15 142.44 -15.54 1.70
11 2.81 124.05 2.09 0.227 2.15 141.20 -17.15 2.08
12 2.58 122.67 2.09 0.163 2.15 140.23 -17.56 2.20
Σ log(x) = 25.67    χ² = Σ(O - E)²/E = 15.92
• Mean of log(x) = 25.67/12 = 2.139
• Standard deviation of log(x) = 0.047
• From the table of chi-square at α = 0.05 and (12 - 2 - 1) = 9 degrees of freedom, $\chi^2_{0.05,9} = 16.92$.
• The calculated value of chi-square (15.92) is less than the table value, which means that $\chi^2_{cal}$ lies in the acceptance region. Hence the hypothesis that the data follow a lognormal distribution cannot be rejected at the 5% significance level.
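The chi-square statistic in the table can be re-summed from the O and E columns. The critical value 16.92 is the table value quoted in the text (α = 0.05, 9 degrees of freedom); no statistics library is assumed, and the E values are the rounded ones printed in the table.

```python
observed = [168.00, 163.58, 153.83, 143.08, 137.42, 134.17,
            132.33, 129.50, 127.08, 126.90, 124.05, 122.67]
expected = [167.56, 161.04, 158.27, 154.78, 152.54, 151.06,
            148.25, 145.83, 143.90, 142.44, 141.20, 140.23]

# Sum of (O - E)^2 / E over the 12 classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 16.92    # chi-square table value, alpha = 0.05, 9 d.f. (from the text)

verdict = "not rejected" if chi_sq < critical else "rejected"
print(f"chi-square = {chi_sq:.2f}; lognormal hypothesis {verdict} at 5 %")
```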
4.10 Frequency Analysis and Ungauged Catchments
• Frequency analysis techniques cannot be applied directly to ungauged catchments, because they depend on the availability of data.
• One very useful technique designed to tackle this problem, developed in the Flood Studies Report (UK), is the use of regional curves.
• This allows the magnitude of the flood peak of any return period to be estimated for ungauged catchments.
• A regional curve is a dimensionless plot of the ratio of the flood peak $Q_{T_r}$ of return period Tr to the mean annual flood $\bar{Q}$, against return period Tr.
• By combining the records of gauged catchments in a particular region, a single regional curve may be plotted. For ungauged catchments, $\bar{Q}$ may be estimated from catchment characteristics and $Q_{T_r}/\bar{Q}$ read from the regional curve.
• Furthermore, for short records frequency analysis is unreliable; in this case $\bar{Q}$ may be estimated from the record and $Q_{T_r}$ found using the regional curve.
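Once Q̄ is known (from catchment characteristics, or from a short record), reading Q_Tr off a regional curve is a single multiplication. In the sketch below the curve values and the Q̄ of 120 m³/s are invented placeholders, and linear interpolation in ln Tr is a simplifying assumption of this sketch, not the Flood Studies Report procedure.

```python
from math import log

# Hypothetical regional growth curve (Tr in years, Q_Tr / Q_bar); not from the text.
REGIONAL_CURVE = [(2, 0.9), (10, 1.5), (50, 2.1), (100, 2.4)]


def growth_factor(tr):
    """Interpolate Q_Tr/Q_bar linearly in ln(Tr) between tabulated points."""
    for (t0, g0), (t1, g1) in zip(REGIONAL_CURVE, REGIONAL_CURVE[1:]):
        if t0 <= tr <= t1:
            w = (log(tr) - log(t0)) / (log(t1) - log(t0))
            return g0 + w * (g1 - g0)
    raise ValueError("Tr lies outside the tabulated regional curve")


def flood_from_regional_curve(q_bar, tr):
    """Q_Tr = (Q_Tr / Q_bar) * Q_bar, with Q_bar estimated separately."""
    return growth_factor(tr) * q_bar


print(flood_from_regional_curve(120.0, 50))   # hypothetical Q_bar = 120 m3/s
```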
4.10.1 Catchment Characteristics
• The techniques presented in the Flood Studies Report provided a basis for flood prediction for ungauged catchments.
• This involved developing quantitative relationships between catchment characteristics and flood magnitudes for large numbers of gauged catchments, and applying these results to ungauged catchments by means of multiple regression techniques.
• The physical processes occurring in the hydrological
cycle provide us with a basis for the assessment of
runoff within a catchment (Chadwick and Morfet,
1989).
• Circulation of water takes place from the ocean to
the atmosphere via evaporation, and this water is
deposited on the catchment mainly as rainfall. From
there, it may follow several routes, but eventually
the water returns to the sea via the rivers.
• Within the catchment, several circulation routes are
possible. Rainfall is initially intercepted by vegetation
and may be re-evaporated.
• Secondly, infiltration into the soil or overland flow to
a stream channel or river may occur.
• Water entering the soil layer may remain in storage (in the unsaturated zone) or may percolate to the groundwater table (the saturated zone).
• All subsurface water may move laterally and
eventually enter a stream channel.
• The main characteristics, which determine the
response of the catchment to rainfall, are:
a. Catchment area
b. Soil type(s) and depth(s)
c. Vegetation cover
d. Stream slopes and surface slopes
e. Rock type(s) and areas(s)
f. Drainage network (natural and man-made)
g. Lakes and reservoirs
h. Impermeable area (e.g. roads, buildings, etc)
• Furthermore, different catchments lie in different climates; hence the response of a catchment to rainfall depends on the prevailing climate.
• This may be represented by:
a) Rainfall (depth, duration and intensity)
b) Potential evaporation (derived from temperature, humidity, wind speed and solar radiation measurements, or from evaporation pan records).
• However, from an engineering viewpoint, qualitative descriptions of catchment characteristics are inadequate in themselves, and quantitative measures are necessary to predict flood magnitudes.
• In a study of low flows in eastern catchments in Uganda (Rugumayo, Ojeo, 2006), the following equation for rural catchments was derived, with a multiple correlation coefficient R of 0.961:
• $Q_{75(10)} = 232.631 + 2.038\times10^{-2}\,MAR - 1.469\times10^{-5}\,AREA - 314\,S1085 - 22.507\,STRFQ + 3.195\times10^{-3}\,MSL - 121\,PE$
• Where Q75(10) = the flow available 75% of the time (m³/s)
• AREA = the catchment area (km²)
• STRFQ = the stream frequency (no. of stream junctions/AREA)
• S1085 = the slope of the main stream (m/km)
• MAR = mean annual rainfall (mm)
• PE = potential evaporation (mm)
• MSL = mean stream length (m)
U1 West Nile drainage basin. Rivers mainly flow into the Albert Nile 21,780 0.348 0.381 0.371
U2 Aswa River drainage area 36,816 0.358 0.253 0.162
U3 Lake Albert drainage area 39,577 0.325 0.219 0.188
U4 Lake Kyoga drainage area. Mainly swampy 38,043 0.509 0.378 0.231
U5 Mt. Elgon drainage area 4,100 0.322 0.136 0.105
U6 South western drainage area 33,134 0.217 0.13 0.169
U7 Lake Victoria drainage area 54,610 0.305 0.203 0.129
U8 Karamoja area. There was no reliable data for this region to facilitate analysis 22,089 N/A N/A N/A
Fig 4.8 Delineated Hydrologically Homogeneous Regions in Uganda
• Table 4.5: Selected Regression Models
Model  Equation  R²  SEE  F  Significance of F at 95% level
Simple linear regression model  $Q = 3.921A^{0.310}$  0.224  0.506  7.64  0.0113
Multiple linear regression model  $Q = 1.239A^{0.410}R^{3.224}$  0.377  0.485  6.05  0.0088
[Figure: flow duration curve (flow as % of ADF against percentage of time a flow is exceeded) and low flow frequency plot against plotting position, Wi]
• Low flow analysis methods were applied to eight catchments in eastern Uganda and six catchments in northern Uganda (Rugumayo, Ojeo, 2006), which are relatively climatically homogeneous and have sufficient streamflow data, in order to estimate low flow indices.
• The low flow indices were then correlated with the catchment characteristics, using statistical software, to develop relationships for estimating low flow indices at ungauged sites.
• It was observed that the multiple regression models developed were linear, showed a very high degree of correlation and can be used for preliminary design at ungauged catchments. These models are presented in Table 4.6.
• Additional areas of research in low flow hydrology include:
i. understanding specific low flow generating mechanisms and the relevance of gain and loss processes to the wide variety of climatic, topographic and geologic conditions;
ii. the impact of direct or indirect anthropogenic effects on low flows, such as deforestation, groundwater pumping and conservation farming;
iii. the development of methods which quantify the individual and combined effects of various anthropogenic activities on low flow characteristics;
iv. with increasing pressure on water resources, greater emphasis on finer temporal resolution of hydrological data and on the use of small catchments;
v. the time series of flows or the application of general measures of catchment flow response;
vi. the use of larger regional databases of flow characteristics; and
vii. the impact of climate change on low flows (Smakhtin, 2004).
• Table 4.6 Models Generated for Ungauged Catchments in Eastern Uganda for Low Flow Indices
Independent variables
Dependent variable  Constant  MAR (×10⁻⁵)  AREA (×10⁻⁶)  S1085 (×10⁻³)  STRFQ (×10⁻²)  MSL (×10⁻³)  PE (×10⁻⁴)  KREC  R
Q75 (10) 232.631 2038 146.9 314000 -2250.7 3.195 -1210000 0 0.961
Q95(10) 103.586 1130 193.6 -191000 -1042.2 -82.32 -526.8 0 0.989
KREC 1.660 -4.465 5.206 -1.202 -9.819 .-722 -4.719 0 0.978
MAM(10) 73.807 -1139 170.6 .-7332 435.6 -22.16 -225.5 0 0.974
ADF 2.485 387 -456.8 -50.99 -743.1 152000 -24.48 0 0.992
BFI -179 4.917 -2.545 3.667 10700 2.821 1.844 354 1
4.13 Time Series
• The measurements or numerical values of any
variable that changes with time constitute a time
series.
• A time series has also been defined to be any
collection of observations made sequentially in time
(Chatfield, 1996).
• From these two definitions, it is clear that the two
most important variables that are to be
monitored are the values and the times at which
they occur.
4.13.1 Hydrologic Time Series
• In general, hydrologic processes such as precipitation
and runoff evolve on a continuous time scale. For
example, a recording gauging station in a stream
provides a continuous record of stage and discharge
y(t) through time.
• A plot of the streamflow hydrograph y(t) against time t constitutes a streamflow time series in continuous time, or simply a continuous time series. In practice, however, most hydrological series of interest are defined on a discrete time scale.
• Because of convenience or preference, the time series
are usually plotted in the form of a continuous line.
• This can be derived from the discrete time plot by
successively joining the tops of the sticks or bars to
form the desired continuous line. Most hydrologic
series are defined on hourly, daily, weekly, monthly,
bimonthly, quarterly and annual time intervals.
• The term seasonal time series is often adopted and it
refers to time series defined in intervals of time less
than a year (weekly, monthly etc.). Several categories
of time series exist depending on a number of factors
and they are defined below.
a) Single Time Series
• A single time series or a univariate series is a time
series of one hydrologic variable at a given site. The
natural flow of a river constitutes a single time series.
b) Multiple Time Series
• A set of two or more time series constitutes a multiple time series or a multivariate time series. Multiple time series may be a set of time series of different variables.
• A multiple time series may also arise at the same
gauging station if different hydrological variables are
being measured at it such as discharge, water depth,
temperature, sediment transport, etc.
c) Correlated and Uncorrelated Time Series
• For a given single time series, if the values, say x's, at time t depend linearly on the x's at time t-k, for k = 1, 2, …, then the time series is said to be autocorrelated, serially correlated or correlated in time.
• Otherwise, the time series is said to be uncorrelated
i.e. it is independent. Autocorrelation in some
hydrological time series such as streamflow time
series usually arises as a result of storage like surface,
soil and groundwater storage, which causes the
water to remain in the system through subsequent
time periods.
d) Cross-correlated Time Series
• If there are two time series, with one series as the x variable and the other as the y variable changing with time t, and the y's at time t are linearly dependent on the x's at time t-k, for k = 0, 1, 2, …, then the series are said to be cross-correlated. It is important to note that both time series can be uncorrelated in time, yet be cross-correlated with one another.
• Also, each time series can be autocorrelated in time without there being cross-correlation between the two. Just as there are physical reasons why time series are autocorrelated, there are reasons why time series are cross-correlated.
• One would expect the streamflow series at two nearby gauging stations to be cross-correlated.
• This is because the stations are relatively close to each other and are therefore exposed to similar climatic and hydrological conditions.
e) Stationary and Nonstationary Time Series
• A time series is said to be stationary if it is free from
trends, shifts and periodicities (cyclicities). The
implication here is that the statistical parameters of
time series such as the mean and variance remain
constant through time. Otherwise the time series is
said to be nonstationary.
• The general assumption is that hydrologic time series defined on an annual scale are stationary, although this may not be true as a result of large-scale climatic variability, natural disruptions (e.g. landslides) or human-induced changes such as the construction of a reservoir upstream of the gauging station.
• Hydrologic time series defined on a smaller time
scale like monthly or weekly flows are typically
nonstationary mainly because of the annual cycle.
4.13.2 Partitioning the Time Series Structure
• Hydrological time series exhibit, to varying degrees, trends, shifts, seasonality, autocorrelation and non-normality.
• These attributes of time series are referred to as the
components of the time series.
• The series can be decomposed or partitioned into its
components. Figure 4.11 shows a composite time
series together with other plots where an attempt
has been made to isolate the components.
[Fig 4.11 Components in the decomposition of time series: raw data (left) decomposed into components of variation (right), each plotted against time]
iv) Seasonality (Periodicity)
• Hydrologic time series defined on a time scale
smaller than a year generally exhibit distinct seasonal
or periodic patterns.
• This is as a result of the annual revolution of the
earth around the sun which creates the annual cycle.
• The existence of periodic components can
be investigated quantitatively by constructing
a correlogram of the data.
• For a series of data $y_t$, the serial correlation coefficients $r_k$ between $y_t$ and $y_{t+k}$ are calculated for all pairs of data k time units apart in the series, and plotted against values of k (known as the lag).
• The identification of specific periodicities can be assisted by the more complex procedure of spectral analysis, which is explained fully in standard texts (e.g. Chatfield, 1980).
v) Removing Seasonality in the Mean and
Variance
• The procedure used in removing the trends in the
mean and variance is applied here too.
• The operation here is referred to as seasonal
standardisation and in most literature is called
deseasonalising the original series.
• This term is misleading since it may imply that the
new series is free from other seasonalities,
which may not necessarily be the case.
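Seasonal standardisation can be sketched as follows: for each season τ (e.g. each month), subtract the seasonal mean and divide by the seasonal standard deviation, z_t = (y_t - ȳ_τ)/s_τ. The helper name, the `period` argument and the requirement that the series covers a whole number of cycles (at least two) are assumptions of this sketch.

```python
from statistics import mean, stdev


def seasonal_standardise(y, period=12):
    """Return z_t = (y_t - mean_tau) / sd_tau, with season tau = t mod period."""
    if len(y) % period or len(y) < 2 * period:
        raise ValueError("need a whole number of cycles, at least two")
    seasons = [y[tau::period] for tau in range(period)]   # values per season
    mu = [mean(s) for s in seasons]                        # seasonal means
    sd = [stdev(s) for s in seasons]                       # seasonal std devs
    return [(v - mu[t % period]) / sd[t % period] for t, v in enumerate(y)]
```

Each season of the resulting series has mean 0 and standard deviation 1. As noted above, this does not remove any periodicity in the correlation structure.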
vi) Removing Seasonality in the Correlation
• The correlogram plots shown in Fig.4.13 are a means
of analyzing the dependence structure of the Zt
series.
• Regardless of whether the autocorrelation is
constant or periodic, removing the correlation
structure requires the use of a mathematical model.
• A simple model that can be used is the lag-one autoregressive process, written as $z_t = r_1 z_{t-1} + \varepsilon_t$.
• The residual series $\varepsilon_t = z_t - r_1 z_{t-1}$ may be checked for independence by plotting its correlogram. If there are still signs of autoregression, a higher order model is used.
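The residual step above can be sketched as estimating r1 as the lag-one serial correlation coefficient of the standardised series and then forming ε_t = z_t - r1 z_{t-1}. The function names are illustrative, and the input is assumed to be already standardised (mean near zero).

```python
def lag1_autocorrelation(z):
    """Lag-one serial correlation coefficient r1 of the series z."""
    n = len(z)
    zbar = sum(z) / n
    num = sum((z[t] - zbar) * (z[t + 1] - zbar) for t in range(n - 1))
    den = sum((v - zbar) ** 2 for v in z)
    return num / den


def ar1_residuals(z):
    """Fit z_t = r1 z_{t-1} + eps_t to a standardised series and return
    r1 together with the residual series eps_t = z_t - r1 z_{t-1}."""
    r1 = lag1_autocorrelation(z)
    eps = [z[t] - r1 * z[t - 1] for t in range(1, len(z))]
    return r1, eps
```

The residual series is one value shorter than z. Its correlogram can then be inspected, and if it still shows significant serial correlation, a higher order model is tried.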
4.13.3 Correlograms
• The existence of periodic components may be
investigated quantitatively by means of constructing
correlograms of the data.
• For a series of data $y_t$, the serial correlation coefficients $r_L$ between $y_t$ and $y_{t+L}$ are calculated for all pairs of data L time units apart in the series, and plotted against values of L (known as the lag):
• $r_L = \dfrac{\sum_{t=1}^{n-L} (y_t - \bar{y})(y_{t+L} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$
• where $\bar{y}$ is the mean of the sample of n values of y.
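A direct implementation of the serial correlation coefficient r_L above, for lags 0 up to a chosen maximum, gives the ordinates of the correlogram; the function name is illustrative.

```python
def correlogram(y, max_lag):
    """Serial correlation coefficients r_L for L = 0 .. max_lag."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)          # sum over all n values
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
            for lag in range(max_lag + 1)]


# A series with period 4: the correlogram dips at lag 2 and peaks again at lag 4.
print(correlogram([0, 1, 0, -1] * 10, 4))
```

For a periodic series the correlogram itself oscillates with the same period, which is how periodic components are recognised in practice.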
where $y_t^c$ is the original series, is used. Monthly flows frequently do not follow a normal distribution, as opposed to annual flows.
• Monthly flows tend to be log-normal or Pearson Type III distributed. A recursive equation for log-normal simulation can be used to describe them (Matalas, 1967).
iv) Persistence
• Persistence refers to the phenomenon whereby a low flow on one day tends to be followed by a low flow on the next day, and perhaps the day after.
• The number of steps over which a measurable persistence is deemed to exist is termed the lag. The Markov model is particularly suitable for reproducing the persistence characteristic.
4.13.5 Time Series Modelling
• The concepts discussed in previous sections are used
in the representation of hydrologic time series by
mathematical models. These models are a class of
stochastic models and they are grouped here in two
broad classes:
(i) Autoregressive models (AR models)
(ii) Autoregressive Moving Average models (ARMA models)
• These models can be used to reproduce some of the
most important statistical properties of the time
series under consideration.
• The model $y_t = \mu + \sum_{j=1}^{p} \phi_j (y_{t-j} - \mu) + \varepsilon_t$
• is called an autoregressive model of order p, in which $\varepsilon_t$ is an uncorrelated normal random variable, also referred to as noise, innovation, error term or series of shocks. It has mean zero and variance $\sigma_\varepsilon^2$.
• The parameters of the model are μ, φ1, …, φp. The model is often denoted as the AR(p) model. The AR(1) model takes the form
• $y_t = \mu + \phi_1 (y_{t-1} - \mu) + \varepsilon_t$
• These models have been widely used for modelling
annual hydrologic time series and seasonal time
series after seasonal standardisation.
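• The mean, variance and autocorrelation coefficients of the AR(p) process are
• $E(y) = \mu$
• $\operatorname{Var}(y) = \sigma^2 = \dfrac{\sigma_\varepsilon^2}{1 - \sum_{j=1}^{p} \phi_j \rho_j}$
• $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \dots + \phi_p \rho_{k-p}$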
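For AR(1), ρ1 = φ1, so the variance formula above reduces to σ² = σ_ε²/(1 - φ1²). The sketch below generates a synthetic AR(1) series with Python's standard `random` module, which can be used to check that property empirically. The function name and the parameter values in the usage lines are arbitrary choices for illustration, and the series is started from its stationary distribution so that no warm-up period is needed.

```python
import random


def simulate_ar1(mu, phi, sigma_e, n, seed=None):
    """Generate n values of y_t = mu + phi (y_{t-1} - mu) + eps_t,
    eps_t ~ N(0, sigma_e**2), starting from the stationary distribution."""
    if not -1.0 < phi < 1.0:
        raise ValueError("|phi| must be < 1 for a stationary AR(1) process")
    rng = random.Random(seed)
    sigma_y = sigma_e / (1.0 - phi ** 2) ** 0.5   # Var(y) = sigma_e^2 / (1 - phi^2)
    y = [rng.gauss(mu, sigma_y)]
    for _ in range(n - 1):
        y.append(mu + phi * (y[-1] - mu) + rng.gauss(0.0, sigma_e))
    return y


y = simulate_ar1(mu=100.0, phi=0.6, sigma_e=10.0, n=50_000, seed=42)
m = sum(y) / len(y)
print(m, sum((v - m) ** 2 for v in y) / len(y))   # compare with 156.25
```

With φ1 = 0.6 and σ_ε = 10, the sample variance of a long run should be close to 100/(1 - 0.36) = 156.25, and the lag-one serial correlation close to 0.6.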
ii) Markov models
• This class of models depends on the principle of
autoregression, which defines the AR models
described above. The Lag One Markov model is
expressed as:
• $Q_{i+1} = \bar{Q}_{j+1} + b_j\,(Q_i - \bar{Q}_j) + t_{i+1}\, s_{j+1} \sqrt{1 - r_j^2}$
• Where:
• $Q_{i+1}$ = generated streamflow in period i+1
• $b_j = r_j\, s_{j+1} / s_j$
• i and j are indices with i running from 1,……