Slide 8 - Statistical Methods

Water resources Engineering 1

Uploaded by Namugenyi Betty

Slide 7: Stochastic Hydrology

INTRODUCTION
• When deciding how to analyze any collection of data appropriately, the first consideration must be the characteristics of the data themselves.
• Characteristics often described include:
i. a measure of the center of the data,
ii. a measure of spread or variability,
iii. a measure of the symmetry of the data distribution, and
iv. estimates of extremes, such as some large or small percentile.
INTRODUCTION, Cont.
• The data about which a statement or summary is to be made are called the population, or sometimes the target population.
• Rarely are all such data available to the scientist.
• It may be physically impossible to collect all data of interest (all the water in a stream over the study period), or it may simply be too expensive to collect them.
• Instead, a subset of the data, called the sample, is selected and measured in such a way that conclusions about the sample may be extended to the entire population.
STATISTICAL METHODS
• Methods of statistical analysis provide means of:
i. Reducing and summarizing observed data,
ii. Presenting information in a precise and meaningful
format,
iii. Determining underlying characteristics of the
observed phenomena, and
iv. Making predictions concerning the future
behaviour of hydrologic events and variables.
Statistical methods
• The frequency of a hydrologic event is the probability
that some value of a discrete variable will occur or some
value of a continuous variable will be equalled or
exceeded in any given year.
• The magnitude of an extreme event is inversely related
to the frequency of occurrence; very severe events
occur less frequently than more moderate events.
Statistical Parameters
• In a sample of annual flows from River Sezibwa, for example, we would need to summarise the data by obtaining four main characteristics of the data series. These are:
i) a measure of location or central tendency,
ii) a measure of spread or variability,
iii) a measure of skewness or symmetry, and
iv) kurtosis (Haan, 1982; Chow et al., 1988).
Measure of Location
• The arithmetic mean is the most commonly used measure of central tendency or location. It is also the first moment about the origin:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• The statistic $\bar{x}$ is the best estimate of the population mean μ. Another measure of central location is the median, which is the middle value of the ordered observations.
Measure of Spread
• The spread or variability can be represented by the total range of values or by the average deviation about the mean.
• Statistically, this can be expressed as the mean squared deviation, or the second moment about the mean.
• This parameter is termed the variance and is designated as:
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
Measure of Spread
• However, the population mean μ is not known precisely, and therefore it is necessary to compute the sample variance about the sample mean:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
• The square root of the variance is called the standard deviation; it is measured in the same units as the variate and is therefore easier to compare with the data.
• The coefficient of variation, Cv, defined as σ/μ or s/$\bar{x}$, is useful for comparing relative variability.
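As a quick check of these definitions, the sketch below computes $\bar{x}$, s², s and Cv for the River Sezibwa annual maximum flows used in the worked example of these notes. It is a minimal plain-Python illustration; the function name `spread_statistics` is hypothetical.

```python
import math

# Annual maximum flows (m3/s), River Sezibwa near Lugazi, 1970-1983,
# as listed in the worked example of these notes.
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]


def spread_statistics(x):
    """Return (mean, s^2, s, Cv) for a sample x."""
    n = len(x)
    mean = sum(x) / n
    # Divide by n - 1 because the sample mean replaces the unknown population mean.
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    s = math.sqrt(s2)
    return mean, s2, s, s / mean


mean, s2, s, cv = spread_statistics(flows)
print(f"mean = {mean:.5f}, s^2 = {s2:.4f}, s = {s:.4f}, Cv = {cv:.3f}")
```

The mean and variance agree with the hand calculation in the example (10.33929 and 10.4583).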
Measures of Symmetry
• If the data are distributed exactly symmetrically about the mean, the measure of symmetry should be zero.
• Furthermore, such data would exhibit the property that all odd moments about the mean equal zero.
• A skewed distribution, however, has more of its data spread out to one side of the centre.
Measures of Symmetry
• If data to the right of the mean are more spread out from the mean than those on the left, by convention the skewness is positive, and vice versa for negative asymmetry.
• The third moment α is:
$\alpha = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^3$
• The best estimate of the third moment is computed by:
$a = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}(x_i - \bar{x})^3$
Measures of Symmetry
• The coefficient of skewness is the ratio α/σ³, and its best estimate is given by:
$C_s = \frac{a}{s^3}$
• For symmetrical distributions, the third moment is zero and Cs = 0; for right (positive) skewness, Cs > 0, and for left (negative) skewness, Cs < 0.
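The two skewness formulas can be checked numerically. This minimal sketch applies the bias-corrected third moment a and Cs = a/s³ to the same Sezibwa flows; `skewness_estimates` is an illustrative name, not from the notes.

```python
import math

# Sezibwa annual maximum flows (m3/s) from the worked example.
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]


def skewness_estimates(x):
    """Return the bias-corrected third moment a and Cs = a / s^3."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    a = n / ((n - 1) * (n - 2)) * sum((xi - mean) ** 3 for xi in x)
    return a, a / s ** 3


a, cs = skewness_estimates(flows)
print(f"a = {a:.5f}, Cs = {cs:.4f}")  # positive Cs: longer tail to the right
```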
Kurtosis
• Another criterion for determining the shape of a unimodal frequency curve is its ‘peakedness’.
• Kurtosis is derived from a Greek word meaning bulginess.
• If the frequency curve is highly peaked, a large number of observations have nearly the same value.
• If the curve is flat, the observations are spread more evenly over the middle of the range, and no single value occurs with high frequency.
Kurtosis
• In both situations, the curve is said to be kurtic.
• If a frequency curve is more peaked than the normal curve, it is called a leptokurtic curve.
• If it is less peaked (flatter) than the normal curve, it is called a platykurtic curve.
• If a curve is as peaked as the normal curve, it is called a mesokurtic curve.
Kurtosis
• The formula for the measure of kurtosis is:
$\alpha_4 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}$  (4.6a)
where μ4 is the fourth moment and μ2 = σ² the second moment about the mean.
• It is apparent from (4.6a) that α4 is always positive, as it is the ratio of two positive quantities.
• α4 is dimensionless.
• For a leptokurtic curve, α4 > 3, and for a platykurtic curve, α4 < 3.
Kurtosis
• If α4 = 3, the curve is mesokurtic.
• The quantity (α4 − 3) is called the excess kurtosis.
• For a sample of n values, the sample excess kurtosis is:
$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^2} - 3$
• where m4 is the fourth sample moment about the mean, m2 is the second sample moment about the mean (that is, the sample variance computed with divisor n), xi is the ith value, and $\bar{x}$ is the sample mean.
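A minimal sketch of the sample excess kurtosis g2, using divisor n for both moments exactly as in the formula above and applied to the Sezibwa flows; the function name and the shape labels derived from the sign of g2 are illustrative.

```python
# Sezibwa annual maximum flows (m3/s) from the worked example.
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]


def excess_kurtosis(x):
    """g2 = m4 / m2^2 - 3, with moments about the mean computed using divisor n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return m4 / m2 ** 2 - 3


g2 = excess_kurtosis(flows)
# g2 > 0 corresponds to a sample alpha4 above 3 (more peaked than normal).
shape = "leptokurtic" if g2 > 0 else "platykurtic" if g2 < 0 else "mesokurtic"
print(f"g2 = {g2:.6f} ({shape})")
```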
Example
• The values in the table below are the annual maximum flows (in m³/s) of River Sezibwa measured near Lugazi, from 1970 to 1983.
6.37 10.72 9.56 10.23 10.30 7.36 16.53
17.36 8.50 6.56 10.08 9.68 12.24 9.26
Example
• Arithmetic mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6.37 + 10.72 + 9.56 + 10.23 + 10.30 + 7.36 + 16.53 + 17.36 + 8.50 + 6.56 + 10.08 + 9.68 + 12.24 + 9.26}{14} = 10.33929$
• Variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Example

x_i | x_i − x̄ | (x_i − x̄)² | (x_i − x̄)³ | (x_i − x̄)⁴
6.37 | -3.96929 | 15.75523 | -62.537 | 248.2283
10.72 | 0.380714 | 0.144943 | 0.055182 | 0.021009
9.56 | -0.77929 | 0.607286 | -0.47325 | 0.368805
10.23 | -0.10929 | 0.011943 | -0.00131 | 0.000143
10.30 | -0.03929 | 0.001543 | -6.1E-05 | 2.38E-06
7.36 | -2.97929 | 8.876143 | -26.4446 | 78.78637
16.53 | 6.190714 | 38.32494 | 237.2588 | 1468.801
17.36 | 7.020714 | 49.29043 | 346.054 | 2429.546
8.50 | -1.83929 | 3.382972 | -6.22225 | 11.44461
6.56 | -3.77929 | 14.283 | -53.9795 | 204.005
10.08 | -0.25929 | 0.067229 | -0.01743 | 0.00452
9.68 | -0.65929 | 0.434658 | -0.28656 | 0.188932
12.24 | 1.900714 | 3.612715 | 6.866739 | 13.0517
9.26 | -1.07929 | 1.164858 | -1.25721 | 1.356915
Σ | | 135.9579 | 439.0155 | 4455.803
Example
• s² = 135.9579 / 13 = 10.4583
• s = 3.233929 (standard deviation)
• Third moment
$a = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}(x_i - \bar{x})^3 = \frac{14}{13 \times 12} \times 439.0155 = 39.39883$
Example
• Coefficient of skewness
$C_s = \frac{a}{s^3} = \frac{39.39883}{3.233929^3} = 1.164908$
• Kurtosis
$g_2 = \frac{\frac{1}{14} \times 4455.803}{\left(\frac{1}{14} \times 135.9579\right)^2} - 3 = 0.374778$
Hydrological Data Series
i. Hydrological Data Series
• Each question about the frequency of occurrence of a particular hydrological quantity (e.g. a minimum or maximum of stated severity) is answered by looking at a record of flows in a particular way.
• To each question and answer, there corresponds
some type of model of the hydrological
process, which gives an idealized picture of
the process.
Hydrological Data Series
• This picture, although simplified, is frequently
adequate for the purposes of answering the question
being asked.
• The model filters out those portions of the entire hydrograph record which are not of immediate use. What remains is a simpler data series (Cunnane, 1989).
Hydrological Data Series
ii. Continuous Records
• A well-maintained hydrometric gauging station provides a continuous record of water level, or stage, H, as a function of time.
• From this continuous stage record, a hydrograph of instantaneous discharge can be obtained with the help of a stage-discharge relation or rating curve.
• If the catchment area contributing to the gauging station is small, the river responds rapidly to rainfall and the resulting hydrograph is spiky, with high-frequency oscillations.
Frequency Diagrams
• If the catchment area is large, or the river drains through a lake, the hydrograph tends to be smooth.
Fig 4.1a A frequency histogram (fi = ni/N against V)
Fig 4.1b The cumulative distribution function (Fj against V)
Frequency Diagrams
• The data of a continuous variable can be divided into equal class intervals of width ∆V (m³). Starting with the first class interval, i = 1, count the number of volumes n1 in it.
• The relative frequency of occurrence for the first class interval is f1 = n1/N, where N is the number of volumes in the record.
• This is repeated for each class interval, and a plot of fi versus i gives the frequency histogram for the series.
Frequency Diagrams
• The cumulative distribution function (CDF) in Fig 4.1b is built from the frequency histogram in Fig 4.1a.
• In the CDF graph, the class interval j has the same length as in the histogram, but the ordinate Fj is the sum of the frequency values in the histogram from the first class up to the current class j:
$F_j = \sum_{i=1}^{j} f_i$
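The histogram and cumulative sums described above can be sketched in a few lines. For illustration only, the sketch applies them to the Sezibwa annual maximum flows rather than to volumes, with an assumed lower bound of 6 m³/s and class width of 2 m³/s; the function name `frequency_table` is hypothetical.

```python
import math
from itertools import accumulate


def frequency_table(values, lower, width):
    """Relative frequencies f_i = n_i / N and cumulative F_j over equal classes.

    Class i (counted from 0 here) covers [lower + i*width, lower + (i+1)*width).
    Assumes lower <= min(values).
    """
    n_total = len(values)
    n_classes = max(1, math.ceil((max(values) - lower) / width))
    counts = [0] * n_classes
    for v in values:
        # A value lying exactly on the top edge goes into the last class.
        i = min(int((v - lower) // width), n_classes - 1)
        counts[i] += 1
    f = [c / n_total for c in counts]
    F = list(accumulate(f))  # F_j = f_1 + ... + f_j
    return f, F


flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]
f, F = frequency_table(flows, lower=6.0, width=2.0)
for j, (fj, Fj) in enumerate(zip(f, F)):
    lo = 6.0 + 2.0 * j
    print(f"[{lo:4.1f}, {lo + 2.0:4.1f})  f = {fj:.3f}  F = {Fj:.3f}")
```

The last cumulative value is 1, as expected for a distribution function.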
Frequency Diagrams
• The cumulative value is useful in calculating probabilities associated with the exceedance or non-exceedance of a particular volume.
• The frequency histogram and cumulative distribution function are discrete representations of an underlying continuous function, called a probability density function (pdf), which describes the uncertain behaviour of hydrological events.
Frequency Diagrams
• Using statistical methods, the sample return periods or the sample frequency histogram are used to fit a theoretical continuous probability function.
• With reference to Fig 4.2a, the continuous theoretical function describing the frequency of the hydrological event in an area is the probability density function.
• The continuous distribution function is a smooth, monotonically increasing curve varying from 0 to 1, as in Fig 4.2b.
Frequency Diagrams
Fig 4.2a Probability density function, P(V) against V
Fig 4.2b Continuous distribution function (CDF), P(v < V) against V
Return Period
iii) Return Period
• As stated above, the distribution function is the cumulative form of the probability density function and expresses the probability of non-exceedance, F(V) = Pr(v < V), that the value v of an element drawn at random would be less than a particular value V. Its largest value is unity.
• The complement of F(V), 1 − F(V), is called the exceedance probability of V. The reciprocal of the exceedance probability is the return period.
Return Period
• i.e. $T = \frac{1}{1 - F(V)}$ is the return period,
• or $F(V_T) = 1 - \frac{1}{T}$, where $V_T$ is the value with return period T.
• In repeated random trials from the population, the value $V_T$ is exceeded on average once in every T trials, but these exceedances do not occur in a regular cyclic manner.
• Once the analytical form of the curve is known, it can be used to calculate the design hydrological event associated with a given return period.
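The two forms of the return-period relation are inverses of each other. A minimal sketch, with hypothetical function names:

```python
def return_period(F):
    """T = 1 / (1 - F), where F is the non-exceedance probability F(V_T)."""
    return 1.0 / (1.0 - F)


def non_exceedance(T):
    """F(V_T) = 1 - 1/T."""
    return 1.0 - 1.0 / T


# A value with a 1% chance of being exceeded in any year has T = 100 years.
print(f"T = {return_period(0.99):.1f} years")
print(f"F = {non_exceedance(50):.2f} for T = 50 years")
```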
Return Period
• Return period is defined as the average time elapsing between successive occurrences of some hydrological event:
$T(Q') = \text{Average}(t_1, t_2, \ldots) = \lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} t_i$
• In Fig 4.3 the flow value Q’ is exceeded five times. The average inter-event time is the return period T of Q’. This average must be understood in the long-term sense, being the average of all ti values occurring over a long period of time.
Return Period
• Fig 4.4 Return period of Q’: T = Average (t1, t2, t3, …), where t1, t2, t3, t4 are the intervals between successive exceedances of Q’
Return Period
• In other words, it is the average interval in which a
specified event is equalled or exceeded.
• For example, if a rainfall of 25 cm in 24 hours is equalled or exceeded on average once in 20 years, the recurrence interval of a 25 cm rainfall is 20 years.
• The data collected should be adequate, accurate and consistent. Normally, at least 20 years of data should be analyzed for reliable results.
Return Period
• Accordingly, as the magnitude Q varies, so does the return period T. In this example, Q increases with T: larger values of Q have larger return periods, and the larger the value of T, the rarer the value with which it is associated. Note that this definition of return period does not entail any direct reference to probability.
Return Period
• Probability of exceedance (occurrence) at least once, J, or risk: the probability J that an event with annual exceedance probability p = 1/T occurs at least once in N successive years is given by:
$J = 1 - (1 - p)^N$
• Probability of non-exceedance (non-occurrence) in N successive years, K: the probability that the event will not occur in any of N successive years is given by:
$K = (1 - p)^N$
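A minimal sketch of the risk J and its complement K, with p = 1/T; the 100-year event and 50-year design life in the usage lines are illustrative choices, and the function names are hypothetical.

```python
import math


def risk(T, N):
    """J = 1 - (1 - p)^N, with p = 1/T: chance of at least one exceedance in N years."""
    p = 1.0 / T
    return 1.0 - (1.0 - p) ** N


def no_exceedance(T, N):
    """K = (1 - p)^N: chance of no exceedance in N successive years."""
    return (1.0 - 1.0 / T) ** N


# A 100-year flood over a 50-year design life:
print(f"J = {risk(100, 50):.3f}, K = {no_exceedance(100, 50):.3f}")
# For large T, the risk over T years approaches 1 - e^-1:
print(f"{risk(1000, 1000):.4f} vs {1 - math.exp(-1):.4f}")
```

So a structure designed for the 100-year flood still has roughly a 40% chance of seeing that flood exceeded at least once in 50 years.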
Frequency Distributions
iv) Frequency Distributions
• A distribution is an attribute of a statistical
population.
• If each element of a population has a value x of some variable X, the distribution describes the constitution of the population as observed through its X values.
• It indicates whether the values are in general very large or very small, that is, their location on the axis.
• It shows whether they are bunched together or spread out, and whether or not they are symmetrically disposed on the x-axis.
Frequency Distributions
• These three properties are described by the mean, standard deviation and skewness of the population x values.
• The distribution also gives the relative frequency, or proportion, of various x values in the population, in the same way that a histogram gives that information about a sample.
• The frequencies are probabilities, and thus the distribution gives the probability Pr(X ≤ x) that the x value of an element drawn randomly from the population would not exceed a particular value x.
Frequency Distributions
• The distributions encountered in hydrology for
continuous variates are;
1. Normal (Gaussian)
2. Exponential
3. Gamma
4. Pearson Type III
5. Log-normal
6. Log Pearson Type III
Frequency Distributions
7. Extreme Value Type I (also called Double Exponential or Gumbel)
8. Extreme Value Type II (Fréchet)
9. Extreme Value Type III (related to the Weibull distribution)
10. General Extreme Value (Jenkinson)
• Table 4.1 below gives parameters of some of the
continuous distributions used in hydrology.
Frequency Distributions
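Table 4.1: Properties of some continuous distributions used in hydrology
Property | Normal | Exponential | EVI
pdf f(x) or cdf F(x) | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ | $f(x) = \frac{1}{\beta} e^{-(x - x_0)/\beta}$ | $F(x) = e^{-e^{-(x-u)/\alpha}}$
Mean | μ | x0 + β | u + 0.5772α
Standard deviation | σ | β | πα/√6 = 1.28α
Variance | σ² | β² | π²α²/6
Third moment | 0 | 2β³ | 1.14σ³ = 2.40α³
Fourth moment | 3σ⁴ | 9β⁴ | 5.40σ⁴ = 14.61α⁴
Skewness | 0 | 2 | 1.14

Frequency factors
Return period T | Normal yT | Normal KT | Exponential yT | Exponential KT | EVI yT | EVI KT
2 | 0.00 | 0.00 | 0.69 | -0.31 | 0.37 | -0.16
5 | 0.84 | 0.84 | 1.61 | 0.61 | 1.50 | 0.72
10 | 1.28 | 1.28 | 2.30 | 1.30 | 2.25 | 1.30
20 | 1.64 | 1.64 | 3.00 | 2.00 | 2.97 | 1.87
25 | 1.75 | 1.75 | 3.22 | 2.22 | 3.20 | 2.04
50 | 2.05 | 2.05 | 3.91 | 2.91 | 3.90 | 2.59
100 | 2.33 | 2.33 | 4.61 | 3.61 | 4.60 | 3.14
1000 | 3.09 | 3.09 | 6.91 | 5.91 | 6.91 | 4.94

$F(x_T) = 1 - \frac{1}{T}$
Quantile $x_T = \mu + \sigma K_T$ = [location parameter] + [scale parameter] × $y_T$
Risk of at least one exceedance of QT in L years: $1 - (1 - 1/T)^L$
Risk of at least one exceedance of QT in T years (large T): $\approx 1 - e^{-1}$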
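The yT and KT columns above can be regenerated from the distribution functions. This sketch assumes the standard forms implied by Table 4.1: for the normal, yT = KT = z; for the exponential with x0 = 0 and β = 1, yT = ln T and KT = yT − 1; for EVI, yT = −ln(−ln(1 − 1/T)) and KT = (√6/π)(yT − 0.5772). It uses only the Python standard library (`statistics.NormalDist` needs Python 3.8 or later); function names are illustrative.

```python
import math
from statistics import NormalDist

EULER = 0.5772  # Euler's constant, rounded as in these notes


def normal_factor(T):
    """Normal: y_T = K_T = z with non-exceedance probability 1 - 1/T."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return z, z


def exponential_factor(T):
    """Standard exponential (x0 = 0, beta = 1): y_T = ln T, K_T = y_T - 1."""
    y = math.log(T)
    return y, y - 1.0


def ev1_factor(T):
    """EVI: y_T = -ln(-ln(1 - 1/T)), K_T = (sqrt(6)/pi) * (y_T - 0.5772)."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    return y, math.sqrt(6) / math.pi * (y - EULER)


print("    T    yN    KN    yE    KE    yG    KG")
for T in (2, 5, 10, 20, 25, 50, 100, 1000):
    row = (*normal_factor(T), *exponential_factor(T), *ev1_factor(T))
    print(f"{T:5d} " + " ".join(f"{v:5.2f}" for v in row))
```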
Discrete Data Series
v. Discrete Data Series
• The most commonly encountered discrete series are (all in m³/s):
1. Mean daily flow series,
2. Mean monthly flow series,
3. Mean annual flow series,
4. Daily flow duration series,
5. Annual maximum flood series,
Discrete Data Series
6. Annual minimum flow series
7. Peaks over a threshold series (also known as partial
duration series)
• Each mean daily flow value is expressed as the
average of all the flows occurring during that day.
• A flow equal to the mean daily flow, sustained throughout the 24 hours, would deliver the same volume as actually flowed in the river that day.
• The monthly and annual series have a similar
interpretation.
Discrete Data Series
• Thus, if in 1990 the mean annual flow was 240 m³/s, the total flow volume in that year was
• 240 m³/s × (365 × 24 × 60 × 60) s/year = 7.57 × 10⁹ m³ in the year.
• The daily flow duration series is the mean daily flow
series arranged in order of magnitude from
smallest to largest (or vice versa).
Discrete Data Series
• It is frequently plotted as a flow duration curve (Section 9.4.4), from which one can see at a glance what flow has been equalled or exceeded for a stated proportion of the time.
• Consider flood peaks as an example. These occur in
no set physical pattern, either in time or in
magnitude.
• There is no means of forecasting the exact sequence
(i.e. times and magnitudes) of flood events which will
occur over the next twenty years at any site.
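The flow duration series described above (mean daily flows ranked by magnitude) can be turned into flow duration curve coordinates as sketched below. Two assumptions are made for illustration: the daily flows are hypothetical, and each ranked flow is assigned an exceedance percentage of 100·m/(n + 1), a Weibull-type plotting position that the notes do not prescribe here.

```python
def flow_duration(daily_flows):
    """Return (flow, % of time equalled or exceeded) pairs, largest flow first.

    The m-th largest of n flows is given an exceedance percentage of
    100 * m / (n + 1); this plotting position is an assumption of the sketch.
    """
    ranked = sorted(daily_flows, reverse=True)
    n = len(ranked)
    return [(q, 100.0 * m / (n + 1)) for m, q in enumerate(ranked, start=1)]


# Hypothetical mean daily flows (m3/s), invented purely for illustration.
sample = [12.0, 8.5, 30.2, 5.1, 9.9, 14.3, 7.7, 6.0, 21.4]
curve = flow_duration(sample)
for q, pct in curve:
    print(f"{q:5.1f} m3/s is equalled or exceeded {pct:4.1f}% of the time")
```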
Discrete Data Series
• However, if it is assumed that the sequence which will occur will have the same statistical characteristics as sequences which occurred in the past, then it is possible to estimate the probability of any magnitude being exceeded during the design life (say twenty years) of some scheme.
• This estimation depends on the use of a statistical
model which gives an idealized picture of the entire
hydrograph.
Discrete Data Series
• The simplest models filter out all aspects of the
hydrograph except the flood peaks.
• Then only two questions remain
i. How to describe the varying times that elapse
between peaks, and
ii. How to describe the varying magnitudes of the
flood peaks themselves.
Frequency Models
• Two models which handle these questions in slightly
different ways are the partial duration series model
and the annual maximum series model.
• These will be dealt with below in the section on flood frequency analysis.
• Other hydrological variables may also be subjected to frequency analysis. These include:
(a) flood volume, V,
Frequency Models
(b) the duration D of flooding above a certain discharge,
(c) volumes S of flow deficiency below some demand flow, and
(d) minimum flow values, q.
These are shown in Figs. 4.3 and 4.4.
• Of the variables mentioned above, V, D and S are increasing functions of T, while q is a decreasing function, as shown in Fig 4.5.
Frequency Models
• In order to determine the Q-T relation (or those for
V, D, S or q) for any particular river site for which flow
records are available it is necessary to link the Q-T
relation with some quantities which are observable
or measurable from the flow records.
• This is done by recognising a relationship between return period and probability of exceedance in a well-defined population of flow values, as indicated below.
Frequency Models
• Fig 4.3 Some variables used in frequency analysis (discharge in m³/s against time, showing the peak Q, the demand flow QD, the deficiency volume S and the minimum flow q)
Frequency Models
• Q = Instantaneous Peak flow rate.
• V = Volume of flow in 1 day, 2 days or k days
• q = Minimum flow rate
• S = Volume of deficiency relative to some
demand flow, QD.
Frequency Models
• If there are no flow records available at the site,
obtaining the Q-T relation is dealt with by ungauged
catchment methods discussed in Section 4.10.
• In these methods one or two key parameters of the
Q-T relation must be estimated from numerically
expressed characteristics of the catchment (size,
slope, climate, soil) using a relationship which has
been derived from the flow and physical data of
neighbouring catchments.
Frequency Models
• The objective is to determine a Q-T relationship at
any required site on a river.
• This section deals with the gauged situation, that is, where a continuous record of flows is available at the site.
Flood Frequency Models
• Fig 4.5 a) A flood magnitude-return period relationship (Q in m³/s against T in years)
• Fig 4.5 b) A minimum flow magnitude-return period relationship (q in m³/s against T in years)
Frequency Analysis
• There are three main methods of frequency analysis used in practice, namely:
i. The straightforward plotting technique, which is used to obtain the cumulative distribution.
ii. The frequency factor method.
iii. Direct use of the cumulative distribution function, which provides a quick means of determining the probability of an event equal to or less than a specified quantity.
Frequency Analysis
4.3 Graphical Frequency Analysis
• The frequency of an event can be obtained by use of “plotting position” formulae (Viessman and Lewis, 1996).
• The plotting position is the probability value assigned to each piece of data to be plotted.
• Several methods have been proposed, and most of them are empirical.
Frequency Analysis
• In the analysis of annual maximum values, the recurrence interval is approximated as the mean time in years, with N future trials, for the mth largest value to be exceeded once on average.
• The mean number of exceedances for this condition can be shown to be:
$\bar{E} = \frac{Nm}{n + 1}$
Frequency Analysis
• Where:
• Ē = the mean number of exceedances
• N = the number of future trials
• n = the number of values
• m = the rank of the values in descending order, with the largest equal to 1.
• Setting Ē = 1 and N = T gives T = (n + 1)/m, the Weibull formula in Table 4.2.
Frequency Analysis
• Several plotting position formulae are available and
they give different results for the same value of n and
m.
• Most plotting position formulae do not account for sample size or length of record.
• Some of these formulae are shown in Table 4.2. The
probability values for a sample of 20 and a ranking of
5 are also shown.
Frequency Analysis
• Table 4.2: Plotting Position Formulae (for n = 20, m = 5)
Method | Formula | Probability | Return Period
California | $\frac{m}{n}$ | 0.2500 | 4.00
Hazen | $\frac{2m - 1}{2n}$ | 0.2250 | 4.44
Weibull | $\frac{m}{n + 1}$ | 0.2381 | 4.20
Chegodayev | $\frac{m - 0.3}{n + 0.4}$ | 0.2304 | 4.34
Blom | $\frac{m - 3/8}{n + 1/4}$ | 0.2284 | 4.38
Tukey | $\frac{3m - 1}{3n + 1}$ | 0.2295 | 4.36
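Table 4.2 can be reproduced directly from the formulae. The sketch below evaluates each plotting position for n = 20 and m = 5 and takes the return period as 1/P; the dictionary name is illustrative.

```python
# Plotting position formulae from Table 4.2: probability P for rank m of n values.
PLOTTING_POSITIONS = {
    "California": lambda m, n: m / n,
    "Hazen": lambda m, n: (2 * m - 1) / (2 * n),
    "Weibull": lambda m, n: m / (n + 1),
    "Chegodayev": lambda m, n: (m - 0.3) / (n + 0.4),
    "Blom": lambda m, n: (m - 3 / 8) / (n + 1 / 4),
    "Tukey": lambda m, n: (3 * m - 1) / (3 * n + 1),
}

n, m = 20, 5
for name, formula in PLOTTING_POSITIONS.items():
    p = formula(m, n)
    print(f"{name:<11} P = {p:.4f}  T = {1 / p:.2f}")
```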
Frequency Analysis
4.4 Frequency Analysis Using Frequency Factors
• Frequency analysis begins with calculation of the
statistical parameters required for a particular
probability distribution from the given data.
• Then for a given distribution, a K – T relationship can
be determined between the frequency factor and
the corresponding return period (Chow, 1951).
• This relationship can then be expressed in
mathematical terms or in tabular form.
Frequency Analysis
• The value of X, the hydrological variable, can then be determined.
• Chow proposed the following equation:
$X = \bar{X} + K\sigma$
• as the general equation for hydrologic frequency analysis, where:
K = Chow's frequency factor
σ = standard deviation
$\bar{X}$ = mean of the hydrological variable
X = variable hydrological parameter
Frequency Analysis
• K is a function of T and varies with the coefficient of
skewness in skewed distributions and is affected
greatly by the number of years of record.
• T is the recurrence interval or return period.
• The reciprocal of the recurrence interval is the
exceedance frequency.
• The K – T relationships for some of the common
distributions are as follows:
Frequency Analysis
Normal Distribution
• The value of K is equal to the value of z in the standard normal distribution; therefore, for a normal distribution, the value of the variable Q corresponding to a given recurrence interval T is:
$Q_T = \bar{Q} + z\sigma$
• where z is the standard normal variate with non-exceedance probability 1 − 1/T, and σ the standard deviation
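A minimal sketch of the normal frequency-factor formula, applied purely to show the arithmetic to the Sezibwa flows (their skewness of about 1.16 suggests a normal fit is questionable for these data). It uses `statistics.NormalDist`, `mean` and `stdev` (sample standard deviation with divisor n − 1) from the standard library; `normal_quantile` is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev

# Sezibwa annual maximum flows (m3/s) from the worked example.
flows = [6.37, 10.72, 9.56, 10.23, 10.30, 7.36, 16.53,
         17.36, 8.50, 6.56, 10.08, 9.68, 12.24, 9.26]


def normal_quantile(data, T):
    """Q_T = Qbar + z * s, where z has non-exceedance probability 1 - 1/T."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return mean(data) + z * stdev(data)


for T in (10, 100):
    print(f"T = {T:>3} years: Q_T = {normal_quantile(flows, T):.2f} m3/s")
```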
Frequency Analysis
Extreme Value Type 1 Distribution
4.5 Flood Frequency Models
• At any river site it is usually assumed that nature provides a unique Q-T relationship, and that Q is a monotonically increasing function of T.
Flood Frequency Models
• In order to estimate this natural Q-T relation from a
good quality continuous hydrometric record of N
years duration, it is necessary to resort to a statistical
or stochastic model of the continuous hydrograph
which retains information in the hydrograph relevant
to the Q-T relation and discards the rest. (Cunnane,
1983)
Flood Frequency Models
• Two such models are
1. Annual Maximum Series Model, AM.
2. Partial Duration Series (or peak over a threshold)
model, PD
• Even if the statistical parameters of these models were known perfectly for a given river site, it must be assumed that a particular value of Q would be assigned a different value of T by each of these models, and further that these values of T would differ from the true value in nature.
Flood Frequency Models
• Thus, if for some Q, T is the true value of the return period in nature and TAM and TPD are the values attributed to Q by the two models, it must be assumed that T ≠ TAM ≠ TPD.
4.5.1 Annual Maximum Series Model
• The annual maximum series model replaces each year’s hydrograph by its largest flood, and the series thus formed is called the annual maximum series.
Flood Frequency Models
• The series Q1, Q2, …, Qn, where Qi is the maximum flow occurring in the ith year, is assumed to be a random sample from some underlying population.
• The distribution of values in the population cannot be obtained theoretically but, because each Q is the extreme of what occurs in a year, the theory of extreme value statistics is used, and the annual maximum data are assumed to come from the family of extreme value distributions.
Flood Frequency Models
• While the suitability of the theory cannot be proven
absolutely, it has been found that Annual Maximum
(AM) data from some rivers can be described
adequately by the Extreme Value Type 1 (EVI)
distribution or by the General Extreme Value (GEV)
distribution.
• Other distributions will also be considered in later
sections.
Flood Frequency Models
i) Extreme Value Type I
• It is sometimes called Double Exponential or Gumbel.
This is a two-parameter distribution, whose form
arises from consideration of the statistical properties
of the sample extreme values and was first
introduced into hydrology by Gumbel.
• The theory of extreme values considers the distribution of the largest (or smallest) observations occurring in each group of repeated samples.
Flood Frequency Models
• For example, the study of peak flows uses just the largest flow recorded each year at a gauging station out of the many thousands of values recorded.
• The probability density function and the cumulative distribution function of the EV1 distribution, with their equations, are given in Figs 4.6 and 4.7 respectively, for both the standardized variate Y and the X-variate.
Flood Frequency Models
Fig 4.6 Probability density function of EV Type 1
(a) Standardized Y-variate: $g(y) = \exp\left(-y - e^{-y}\right)$
(b) X-variate: $f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-u}{\alpha} - \exp\left(-\frac{x-u}{\alpha}\right)\right]$
Flood Frequency Models
• Fig 4.7 Cumulative distribution function of EV Type 1
(a) Y, standardized distribution: $G(y) = e^{-e^{-y}}$
(b) X-variate distribution: $F(x) = e^{-e^{-(x-u)/\alpha}}$
Flood Frequency Models
• Although y may vary from −∞ to +∞, its practical range is −2 to +6, the distribution being skewed to the right; that is, its skewness is positive.
• The same applies to x, its practical range being u − 2α to u + 6α, where α is a scale parameter.
• From Fig 4.6, the Extreme Value Type 1 (EVI) probability distribution function is:
$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\alpha}\right)\right]$
Flood Frequency Models
• The parameters α and u are given by:
$\alpha = \frac{\sqrt{6}}{\pi}s$
$u = \bar{x} - 0.5772\alpha$
• The parameter u represents the mode of the distribution (the point of maximum probability density).
• To link the concept of the Q-T relationship with the information contained in the Annual Maximum (AM) model, the probability distribution of annual maximum values is defined by its distribution function.
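The moment estimates of α and u can be sketched with the River Aswa annual maxima from the example later in these notes. The fitted parameters are then used to evaluate F(x); the 10,000 m³/s threshold in the usage line is an arbitrary illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Annual maximum discharges (m3/s) of River Aswa, 1936-1965, from the example in these notes.
aswa = [5140, 5640, 1050, 6020, 3740, 4580, 5140, 10560, 12840, 7870,
        1180, 2520, 1730, 12400, 3400, 3700, 9540, 4810, 4550, 7043,
        8667, 4550, 7460, 3360, 8450, 3420, 4890, 5730, 9020, 3240]


def fit_ev1_moments(data):
    """Method-of-moments EVI parameters: alpha = sqrt(6) s / pi, u = mean - 0.5772 alpha."""
    alpha = math.sqrt(6) * stdev(data) / math.pi
    u = mean(data) - 0.5772 * alpha
    return u, alpha


def ev1_cdf(x, u, alpha):
    """F(x) = exp[-exp(-(x - u) / alpha)]."""
    return math.exp(-math.exp(-(x - u) / alpha))


u, alpha = fit_ev1_moments(aswa)
print(f"u = {u:.1f} m3/s, alpha = {alpha:.1f} m3/s")
print(f"Pr(annual maximum < 10000 m3/s) = {ev1_cdf(10000, u, alpha):.3f}")
```

At the mode x = u, F(u) = e⁻¹ ≈ 0.368, a handy check on any EVI implementation.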
Flood Frequency Models
• $F(q) = \Pr(Q < q)$
• Then the variate value having return period T, namely QT, is defined implicitly by the equation:
$1 - F(Q_T) = \frac{1}{T}$
• which states that the exceedance probability of QT is 1/T.
• If F is known, solving this equation for QT, after some algebraic manipulation, provides the Q-T relation.
Flood Frequency Models
• For instance, if F is the Extreme Value Type 1 or Gumbel distribution, the equation gives:
$F(Q_T) = \exp\left[-\exp\left(-\frac{Q_T - a}{b}\right)\right] = 1 - \frac{1}{T}$
• which leads to
$Q_T = a - b\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right] = a + b\,y_T$
• where
$y_T = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]$
Flood Frequency Models
• is called the standardised or reduced EVI variate, i.e. the EVI variate with parameter values a = 0, b = 1.
• An alternative form is
• $Q_T = \mu + \sigma K_T$  (4.23)
• where µ and σ are the population mean and standard deviation, and
$K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]\right\}$  (4.24)
is a frequency factor depending only on T. In equations 4.19 to 4.24, T is synonymous with TAM.
Flood Frequency Models
• Note that KT is a linear function of yT and that $y_T \approx \ln\left(T - \frac{1}{2}\right)$ for T > 5.
Example
• The annual maximum discharges of River Aswa for the years 1936-1965 are as follows (in m³/s):
5140 5640 1050 6020 3740 4580 5140 10560 12840 7870
1180 2520 1730 12400 3400 3700 9540 4810 4550 7043
8667 4550 7460 3360 8450 3420 4890 5730 9020 3240

• Calculate the mean and variance.


Flood Frequency Models
• µ = 5741 m³/s, σ² = 9,484,372 (m³/s)², σ = 3080 m³/s
• If it is assumed that the data constitute a random sample from a population of Q values which have the EVI distribution, then these values may be used to estimate μ and σ of equation 4.23, and QT calculated as follows:
T (years) | 2 | 5 | 10 | 25 | 50 | 100 | 200
KT (Eq. 4.24) | -0.16 | 0.72 | 1.30 | 2.04 | 2.59 | 3.14 | 3.68
QT (Eq. 4.23) (m³/s) | 5235 | 7957 | 9759 | 12035 | 13724 | 15401 | 17071
se(QT) (Eq. 4.25) (m³/s) | 535.719 | 902.26 | 1218.7 | 1643.1 | 1966.2 | 2290.4 | 2615.7
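The KT and QT rows of the table can be recomputed from Eq. 4.24 and Eq. 4.23. This sketch takes µ = 5741 m³/s and σ = 3080 m³/s as given in the example; the function name is illustrative.

```python
import math

MU, SIGMA = 5741.0, 3080.0  # mean and standard deviation from the example (m3/s)


def ev1_frequency_factor(T):
    """Eq. 4.24: K_T = -(sqrt(6)/pi) * {0.5772 + ln[-ln(1 - 1/T)]}."""
    return -math.sqrt(6) / math.pi * (0.5772 + math.log(-math.log(1.0 - 1.0 / T)))


quantiles = {}
for T in (2, 5, 10, 25, 50, 100, 200):
    K = ev1_frequency_factor(T)
    quantiles[T] = MU + SIGMA * K  # Eq. 4.23
    print(f"T = {T:>3}: K_T = {K:5.2f}, Q_T = {quantiles[T]:6.0f} m3/s")
```

The results fall within about 2 m³/s of the tabulated QT values.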
Flood Frequency Models
Standard Error
• The standard error of estimate is the standard deviation of the event magnitudes estimated from samples about the true event magnitude.
• The standard error, se(QT), expresses the uncertainty in QT due to the fact that μ and σ are estimated from a sample of finite size N rather than measured from the entire population of values.
Flood Frequency Models
• If repeated random samples of size N were available each
would yield a different numerical value of QT for each T.
• The entire ensemble of possible values of QT for any T,
obtained from such repeated samples can be considered
to have a distribution known as its sampling distribution.
• The standard deviation of such a sampling distribution is
known as a standard error. It is proportional to the
square of KT and is inversely proportional to √N. Its
algebraic form depends on the distribution function of
the population.
Flood Frequency Models
• In the EVI case, when estimation is by the method of moments, as above,
$se(Q_T) = \sigma\left\{\frac{1 + 1.14K_T + 1.10K_T^2}{N}\right\}^{1/2}$  (4.25)
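A minimal sketch that evaluates the EVI standard-error expression with σ as a multiplier (so that se(QT) has units of discharge), using σ = 3080 m³/s and N = 30 from the River Aswa example. Values computed this way come out a few per cent below the se(QT) row of the Aswa table; the notes do not say exactly how that row was produced, so treat the comparison with care. Function names are hypothetical.

```python
import math


def ev1_frequency_factor(T):
    """K_T for the EVI distribution (Eq. 4.24)."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    return math.sqrt(6) / math.pi * (y - 0.5772)


def ev1_standard_error(sigma, K_T, N):
    """se(Q_T) = sigma * sqrt((1 + 1.14 K_T + 1.10 K_T^2) / N)."""
    return sigma * math.sqrt((1 + 1.14 * K_T + 1.10 * K_T ** 2) / N)


# River Aswa example: sigma = 3080 m3/s, N = 30 years.
for T in (2, 10, 100):
    K = ev1_frequency_factor(T)
    print(f"T = {T:>3}: K_T = {K:5.2f}, se(Q_T) = {ev1_standard_error(3080.0, K, 30):6.0f} m3/s")
```

Because 1.14² < 4 × 1.10, the quadratic under the square root is always positive, so the expression is defined for any KT.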
ii) Pearson Type III
• The Pearson Type 3 distribution is a three-parameter distribution. The three parameters represent measures of location, scale and shape respectively.
Flood Frequency Models
• The probability density function is:
$f(Q) = \frac{1}{\beta\,\Gamma(\gamma)}\left(\frac{Q - Q_0}{\beta}\right)^{\gamma - 1}\exp\left(-\frac{Q - Q_0}{\beta}\right), \quad Q \ge Q_0$
• Where:
• Q0 = location parameter, in this case a lower bound
• β = scale parameter
• γ = shape parameter
• Γ(γ) = complete gamma function
Flood Frequency Models
• The mean, standard deviation and skewness can be shown to be:
• µ = Q0 + βγ
• σ = β√γ
• g = 2/√γ
• When γ = 1, the distribution reduces to the exponential. As γ tends to infinity, g tends to 0 and the distribution tends to the normal distribution.
Flood Frequency Models
• The distribution function is
$F(Q) = \int_{Q_0}^{Q} f(t)\,dt$  (4.28)
• where f(t) is the pdf.
• It can be shown that $Q_T = Q_0 + \beta y_T$
• where yT is the value of the standardized variate y = (Q − Q0)/β having exceedance probability 1/T.
• The integral in equation 4.28 is known as the incomplete gamma function and has no closed-form solution, but it has been evaluated for different values of γ and y.
• For hydrological use, tables have been prepared by Harter. These are prepared for use in the expression

$$Q_T = \mu + \sigma K_T$$

• where KT is a frequency factor, which is given as a function of g, the skewness. Such tables are attached for −3.0 ≤ g ≤ 3.0.
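Where Harter's tables are not at hand, KT is often approximated in software. The sketch below uses the Wilson–Hilferty transformation, a well-known approximation that is not the method used to build Harter's tables. For moderate skews it reproduces the tabulated values closely (about 2.83 against the tabulated 2.824 for g = 0.7, T = 100), but it degrades as |g| grows, so the tables remain the reference. The function name is illustrative.

```python
from statistics import NormalDist


def kt_wilson_hilferty(g: float, T: float) -> float:
    """Approximate Pearson Type III frequency factor K_T (Wilson-Hilferty transformation)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)  # standard normal variate with exceedance prob. 1/T
    if abs(g) < 1e-9:
        return z  # zero skew: the Pearson Type III reduces to the normal
    return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)


for g, T in [(0.0, 100), (0.7, 100), (-0.76, 100)]:
    print(g, T, round(kt_wilson_hilferty(g, T), 3))
```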
• Procedure when using method of moments estimation
• Assemble the N values of annual maximum flood data Q1, Q2, …, QN.
• Calculate sample estimates of the mean, standard deviation, third moment and skewness (unbiased estimates):

$$\bar{Q} = \frac{1}{N}\sum_{i=1}^{N} Q_i, \qquad \hat{\sigma} = \left[\frac{\sum_{i=1}^{N} (Q_i - \bar{Q})^2}{N-1}\right]^{1/2}$$

$$M_3 = \frac{N\sum_{i=1}^{N} (Q_i - \bar{Q})^3}{(N-1)(N-2)}, \qquad g = \frac{M_3}{\hat{\sigma}^3}$$
• Enter Harter's table of KT and read off values of KT for each required T for the calculated value of g.
• Evaluate QT = Q̄ + σ̂ KT for each desired value of T.
Example 4.3
• For the time series data of peak discharges as given
below, estimate the peak discharge for return
periods of 10 and 200 years by using the Pearson
Type III method.
Year: 1 2 3 4 5 6 7 8 9 10
Flood peak (m3/s): 11125 7656 11259 8863 9973 11035 11499 7908 7947 8894

• Solution

Year  Qi (m3/s)  Q̄ (m3/s)  Qi − Q̄  (Qi − Q̄)²  (Qi − Q̄)³
1 11125 9615.9 1509.1 2277382.81 3436798398.571
2 7656 9615.9 -1959.9 3841208.01 -7528383578.799
3 11259 9615.9 1643.1 2699777.61 4436004590.991
4 8863 9615.9 -752.9 566858.41 -426787696.889
5 9973 9615.9 357.1 127520.41 45537538.411
6 11035 9615.9 1419.1 2013844.81 2857847169.871
7 11499 9615.9 1883.1 3546065.61 6677596150.191
8 7908 9615.9 -1707.9 2916922.41 -4981811784.039
9 7947 9615.9 -1668.9 2785227.21 -4648265690.769
10 8894 9615.9 -721.9 521139.61 -376210684.459
Sum 96159 21295946.90 -507675586.920
• Standard deviation

$$\hat{\sigma} = \left[\frac{\sum (Q_i - \bar{Q})^2}{N-1}\right]^{1/2} = \left[\frac{21295946.90}{9}\right]^{1/2} = 1538.251\ \text{m}^3/\text{s}$$

• And Q̄ = 96159/10 = 9615.9 m3/s

$$M_3 = \frac{10 \times (-507675586.920)}{9 \times 8} = -70{,}510{,}498.18$$

• Coefficient of skewness of variate
• g = -70510498.180/1538.251³ = -0.02
• For KT read Harter's Table for g = -0.02
• Calculation of QT
T (years)  KT  KT σ  QT = μ + KT σ (m3/s)
10 1.2796 1968.346 11,584.3
200 2.5572 3933.616 13,549.5
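The arithmetic of Example 4.3 can be reproduced with a short sketch of the moment formulas in the procedure above. The KT values are the ones interpolated from Harter's table in the example; the function name is illustrative.

```python
import math


def sample_moments(q: list[float]) -> tuple[float, float, float, float]:
    """Mean, standard deviation, third moment M3 and skewness g (unbiased forms)."""
    n = len(q)
    mean = sum(q) / n
    dev = [x - mean for x in q]
    sd = math.sqrt(sum(d * d for d in dev) / (n - 1))
    m3 = n * sum(d ** 3 for d in dev) / ((n - 1) * (n - 2))
    g = m3 / sd ** 3
    return mean, sd, m3, g


# Annual peaks of Example 4.3 (m3/s)
peaks = [11125, 7656, 11259, 8863, 9973, 11035, 11499, 7908, 7947, 8894]
mean, sd, m3, g = sample_moments(peaks)
print(round(mean, 1), round(sd, 3), round(m3, 2), round(g, 4))

# K_T read from Harter's table for g = -0.02, as in the example
for T, kt in [(10, 1.2796), (200, 2.5572)]:
    print(T, round(mean + kt * sd, 1))  # Q_T = mean + K_T * sd
```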

Example 4.4
• Consider the maximum discharges of River Aswa given in Example 4.1 and estimate the peak discharge for return periods of 100 and 200 years by using the Pearson Type III method.
Solution
• Standard deviation

$$\hat{\sigma} = \left[\frac{\sum (Q_i - \bar{Q})^2}{N-1}\right]^{1/2} = 3080\ \text{m}^3/\text{s}$$

• And Q̄ = 5741 m3/s
• M3 = 20,054,093,900.25
• Coefficient of skewness of variate, g = 0.69
• For KT read Harter's Table for g = 0.69
• Calculation of QT
T (years)  KT  KT σ  QT = μ + KT σ (m3/s)
100 2.817 8675.432 14,416.8
200 3.214 9898.062 15,639.4
iii) Log Pearson Type III
• The Log Pearson Type III probability distribution is
used for approximation of frequency characteristics
of measured annual flood peak data.
• This distribution has been widely adopted as one of
the standard methods for flood frequency
analysis. In this distribution the transform y =
log x is used to reduce skewness.
• Although all three moments are required to fit the distribution, it is extremely flexible in that a zero skew reduces the Log-Pearson III distribution to a log-normal and the Pearson Type III to a normal.
• A very important property of gamma variates and normal variates is that the sum of two independent such variables retains the same type of distribution (for gamma variates, provided they share the same scale parameter).
• This is an important feature in the synthesis of hydrologic sequences.
• It has been noted (Alexander, 2002) that in the analysis of floods no direct method can be used with confidence for return periods exceeding 50 years.
• The Pearson III distribution has been advocated for simulating daily stream flows in reservoir studies, and is recommended when simulating daily flows for critical flood months.
Procedure
• First convert the series of annual maximum flows (Q1, Q2, …, QN) into logarithms (Z1, Z2, …, ZN), where Zi = log Qi, and then fit the Pearson Type III distribution to the Z series by the method of moments.
• This results in values of ZT, which are converted to QT values by taking antilogarithms (QT = 10^ZT when base-10 logarithms are used).
• The probability density function of Q, namely f(Q), can be obtained from that of Z using the relation:

$$f(Q) = \varphi(Z)\left|\frac{dZ}{dQ}\right|, \qquad \text{with } \varphi \text{ evaluated at } Z = \log Q$$

• where φ(Z) is the pdf of the Pearson Type III distribution of Z (Q0, β and γ now being the parameters of the distribution of Z). With natural logarithms, dZ/dQ = 1/Q, so that

$$f(Q) = \frac{(\ln Q - Q_0)^{\gamma - 1}\, e^{-(\ln Q - Q_0)/\beta}}{Q\,\beta^{\gamma}\,\Gamma(\gamma)}$$

• When the distribution is expressed in this way, its moments cannot be conveniently expressed in terms of Q0, β and γ; therefore, the practice has developed of dealing with this distribution entirely in the log domain.
Example 4.5
• For the time series data of peak discharges given in Example 4.2, estimate the peak discharge for return periods of 10 and 200 years by using the Log-Pearson Type III method.

• Solution
Year  Qi (m3/s)  Zi = log Qi  Zi − Z̄  (Zi − Z̄)²  (Zi − Z̄)³
1 11125 4.046 0.0684 0.0047 0.00032


2 7656 3.884 -0.0939 0.0088 -0.00083
3 11259 4.051 0.0736 0.0054 0.00040
4 8863 3.948 -0.0303 0.0009 -0.00003
5 9973 3.999 0.0209 0.0004 0.00001
6 11035 4.043 0.0649 0.0042 0.00027
7 11499 4.061 0.0828 0.0068 0.00057
8 7908 3.898 -0.0798 0.0064 -0.00051
9 7947 3.900 -0.0777 0.0060 -0.00047
10 8894 3.949 -0.0288 0.0008 -0.00002
Sum 39.779 0.045 -0.000290
• Standard deviation in the log domain

$$\sigma_z = \left[\frac{\sum (Z_i - \bar{Z})^2}{N-1}\right]^{1/2} = 0.070$$

• And Z̄ = μz = 39.779/10 = 3.978

$$M_3 = \frac{10 \times (-0.000290)}{9 \times 8} = -0.0000403$$

• Coefficient of skewness of log variate
• g = -0.0000403/0.070³ = -0.12
• For KT read Harter's Table for g = -0.12
• Calculation of QT
T (years)  KT  KT σz  ZT = μz + KT σz  QT = antilog ZT (m3/s)

10 1.2676 0.0887 4.0667 11,660.0


200 2.4628 0.1724 4.1504 14,138.4
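The log-domain procedure of Example 4.5 can be sketched end to end: take base-10 logarithms, compute the moments of Z, obtain KT, and take antilogarithms. As an assumption to keep the sketch self-contained, KT comes from the Wilson–Hilferty approximation rather than Harter's table, so the results differ slightly (well under 1%) from the tabulated-KT values in the example. The function names are illustrative.

```python
import math
from statistics import NormalDist


def kt_wilson_hilferty(g: float, T: float) -> float:
    """Approximate Pearson Type III K_T (Wilson-Hilferty); stands in for Harter's table."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)
    if abs(g) < 1e-9:
        return z
    return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)


def log_pearson3_quantiles(peaks, return_periods):
    """Fit Pearson Type III to Z = log10(Q) by moments; return (mu_z, sigma_z, g, {T: Q_T})."""
    z = [math.log10(q) for q in peaks]
    n = len(z)
    mu = sum(z) / n
    dev = [v - mu for v in z]
    sd = math.sqrt(sum(d * d for d in dev) / (n - 1))
    m3 = n * sum(d ** 3 for d in dev) / ((n - 1) * (n - 2))
    g = m3 / sd ** 3
    q_t = {T: 10 ** (mu + kt_wilson_hilferty(g, T) * sd) for T in return_periods}
    return mu, sd, g, q_t


peaks = [11125, 7656, 11259, 8863, 9973, 11035, 11499, 7908, 7947, 8894]
mu, sd, g, q_t = log_pearson3_quantiles(peaks, (10, 200))
print(round(mu, 3), round(sd, 4), round(g, 3), {T: round(q) for T, q in q_t.items()})
```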
Example 4.6
• Consider the maximum discharges of River Aswa given in Example 4.1 and estimate the peak discharge for return periods of 100 and 200 years by using the Log-Pearson Type III method.
• Solution
• Standard deviation in the log domain

$$\sigma_z = \left[\frac{\sum (Z_i - \bar{Z})^2}{N-1}\right]^{1/2} = 0.269$$

• And Z̄ = μz = 110.677/30 = 3.689
• M3 = -0.0148
• Coefficient of skewness of log variate
• g = -0.0148/0.269³ = -0.76
• For KT read Harter's Table for g = -0.76
• Calculation of QT
T (years)  KT  KT σz  ZT = μz + KT σz  QT = antilog ZT (m3/s)
100 1.7622 0.474 4.163 14,555.7
200 1.8726 0.504 4.193 15,585.8
• Below is a table comparing the above methods for the River Aswa data. Note that the values for Pearson Type III and Log-Pearson Type III are closer to each other than to the EV1 values.
T (yrs)  EV1 (m3/s)  Pearson Type III (m3/s)  Log-Pearson Type III (m3/s)
100 15401 14416.8 14555.7
200 17071 15639.4 15585.8
4.5.2 Partial Duration Series or Peaks over a Threshold Model
i) Exponential Distribution
• Some sequences of hydrologic events, such as the
occurrence of precipitation, may be considered
Poisson processes, in which events occur
instantaneously and independently on a time
horizon.
• The time between such events, or inter-arrival time,
is described by the exponential distribution whose
parameter is the mean rate of occurrence of the
events.
• The exponential distribution is used to describe
arrival times of random shocks to hydrologic systems,
such as slugs of polluted runoff entering streams as
rainfall washes the pollutants off the land surface.
• It represents the Partial duration model.
• This model replaces the continuous hydrograph of flows by a series of randomly spaced spikes on the time axis. The spikes themselves are of random size.
a) Inter-event times
• The inter-event times t2 − t1, t3 − t2, … are random variables, and it is possible to treat them as having various statistical properties.
• However, in application only the average time between events, t̄, needs to be known.
• Its reciprocal, λ = 1/t̄, is the rate, or number of peaks which occur in unit time.
• In our context we measure time in years, and λ = 1/t̄ is then in units of number of events per year.
• Table 4.3: Harter's Tables
KT values for Pearson Type III distribution (positive skew)
Skew coefficient g, followed by KT for return periods T = 2, 5, 10, 25, 50, 100 and 200 years (exceedance probabilities 0.50, 0.20, 0.10, 0.04, 0.02, 0.01 and 0.005)
g 2 5 10 25 50 100 200
3.0 -0.396 0.420 1.180 2.278 3.152 4.051 4.970
2.9 -0.390 0.440 1.195 2.277 3.134 4.013 4.909
2.8 -0.384 0.460 1.210 2.275 3.114 3.973 4.847
2.7 -0.376 0.479 1.224 2.272 3.093 3.932 4.783
2.6 -0.368 0.499 1.238 2.267 3.071 3.889 4.718
2.5 -0.360 0.518 1.250 2.262 3.048 3.845 4.652
2.4 -0.351 0.537 1.262 2.256 3.023 3.800 4.584
2.3 -0.341 0.555 1.274 2.248 2.997 3.753 4.515
2.2 -0.330 0.574 1.284 2.240 2.970 3.705 4.444
2.1 -0.319 0.592 1.294 2.230 2.942 3.656 4.372
2.0 -0.307 0.609 1.302 2.219 2.912 3.605 4.298
1.9 -0.294 0.627 1.310 2.207 2.881 3.553 4.223
1.8 -0.282 0.643 1.318 2.193 2.848 3.499 4.147
1.7 -0.268 0.660 1.324 2.179 2.815 3.444 4.069
1.6 -0.254 0.675 1.329 2.163 2.780 3.388 3.990
1.5 -0.240 0.690 1.333 2.146 2.743 3.330 3.910
1.4 -0.225 0.705 1.337 2.128 2.706 3.271 3.828
1.3 -0.210 0.719 1.339 2.108 2.666 3.211 3.745
1.2 -0.195 0.732 1.340 2.087 2.626 3.149 3.661
1.1 -0.180 0.745 1.341 2.066 2.585 3.087 3.575
1.0 -0.164 0.758 1.340 2.043 2.542 3.022 3.489
0.9 -0.148 0.769 1.339 2.018 2.498 2.957 3.401
0.8 -0.132 0.780 1.336 1.993 2.453 2.891 3.312
0.7 -0.116 0.790 1.333 1.967 2.407 2.824 3.223
0.6 -0.099 0.800 1.328 1.939 2.359 2.755 3.132
0.5 -0.083 0.808 1.323 1.910 2.311 2.686 3.041
0.4 -0.066 0.816 1.317 1.880 2.261 2.615 2.949
0.3 -0.050 0.824 1.309 1.849 2.211 2.544 2.856
0.2 -0.033 0.830 1.301 1.818 2.159 2.472 2.763
0.1 -0.017 0.836 1.292 1.785 2.107 2.400 2.670
0.0 0.000 0.842 1.282 1.751 2.054 2.326 2.576
• KT values for Pearson Type III distribution (negative skew)
Skew coefficient g, followed by KT for return periods T = 2, 5, 10, 25, 50, 100 and 200 years (exceedance probabilities 0.50, 0.20, 0.10, 0.04, 0.02, 0.01 and 0.005)
g 2 5 10 25 50 100 200
-0.1 0.017 0.846 1.270 1.716 2.000 2.252 2.482
-0.2 0.033 0.850 1.258 1.680 1.945 2.178 2.388
-0.3 0.050 0.853 1.245 1.643 1.890 2.104 2.294
-0.4 0.066 0.855 1.231 1.606 1.834 2.029 2.201
-0.5 0.083 0.856 1.216 1.567 1.777 1.955 2.108
-0.6 0.099 0.857 1.200 1.528 1.720 1.880 2.016
-0.7 0.116 0.857 1.183 1.488 1.663 1.806 1.926
-0.8 0.132 0.856 1.166 1.448 1.606 1.733 1.837
-0.9 0.148 0.854 1.147 1.407 1.549 1.660 1.749
-1.0 0.164 0.852 1.128 1.366 1.492 1.588 1.664
-1.1 0.180 0.848 1.107 1.324 1.435 1.518 1.581
-1.2 0.195 0.844 1.086 1.282 1.379 1.449 1.501
-1.3 0.210 0.838 1.064 1.240 1.324 1.383 1.424
-1.4 0.225 0.832 1.041 1.198 1.270 1.318 1.351
-1.5 0.240 0.825 1.018 1.157 1.217 1.256 1.282
-1.6 0.254 0.817 0.994 1.116 1.166 1.197 1.216
-1.7 0.268 0.808 0.970 1.075 1.116 1.140 1.155
-1.8 0.282 0.799 0.945 1.035 1.069 1.087 1.097
-1.9 0.294 0.788 0.920 0.996 1.023 1.037 1.044
-2.0 0.307 0.777 0.895 0.959 0.980 0.990 0.995
-2.1 0.319 0.765 0.869 0.923 0.939 0.946 0.949
-2.2 0.330 0.752 0.844 0.888 0.900 0.905 0.907
-2.3 0.341 0.739 0.819 0.855 0.864 0.867 0.869
-2.4 0.351 0.725 0.795 0.823 0.830 0.832 0.833
-2.5 0.360 0.711 0.771 0.793 0.798 0.799 0.800
-2.6 0.368 0.696 0.747 0.764 0.768 0.769 0.769
-2.7 0.376 0.681 0.724 0.738 0.740 0.740 0.741
-2.8 0.384 0.666 0.702 0.712 0.714 0.714 0.714
-2.9 0.390 0.651 0.681 0.683 0.689 0.690 0.690
-3.0 0.396 0.636 0.660 0.666 0.666 0.667 0.667
• Kf values for log-Pearson Type III distribution
Coefficient of skew g, followed by Kf for recurrence intervals T = 2, 5, 10, 25, 50, 100, 200 and 1000 years
g 2 5 10 25 50 100 200 1000
3.0 -0.396 0.420 1.180 2.278 3.152 4.051 4.970 7.250
2.5 -0.360 0.518 1.250 2.262 3.048 3.845 4.652 6.600
2.2 -0.330 0.574 1.284 2.240 2.970 3.705 4.444 6.200
2.0 -0.307 0.609 1.302 2.219 2.912 3.605 4.298 5.910
1.8 -0.282 0.643 1.318 2.193 2.848 3.499 4.147 5.660
1.6 -0.254 0.675 1.329 2.163 2.780 3.388 3.990 5.390
1.4 -0.225 0.705 1.337 2.128 2.706 3.271 3.828 5.110
1.2 -0.195 0.732 1.340 2.087 2.626 3.149 3.661 4.820
1.0 -0.164 0.758 1.340 2.043 2.542 3.022 3.489 4.540
0.9 -0.148 0.769 1.339 2.018 2.498 2.957 3.401 4.395
0.8 -0.132 0.780 1.336 1.993 2.453 2.891 3.312 4.250
0.7 -0.116 0.790 1.333 1.967 2.407 2.824 3.223 4.105
0.6 -0.099 0.800 1.328 1.939 2.359 2.755 3.132 3.960
0.5 -0.083 0.808 1.323 1.910 2.311 2.686 3.041 3.815
0.4 -0.066 0.816 1.317 1.880 2.261 2.615 2.949 3.670
0.3 -0.050 0.824 1.309 1.849 2.211 2.544 2.856 3.525
0.2 -0.033 0.830 1.301 1.818 2.159 2.472 2.763 3.380
0.1 -0.017 0.836 1.292 1.785 2.107 2.400 2.670 3.235
0.0 0.000 0.842 1.282 1.751 2.054 2.326 2.576 3.090
-0.1 0.017 0.846 1.270 1.716 2.000 2.252 2.482 2.950
-0.2 0.033 0.850 1.258 1.680 1.945 2.178 2.388 2.810
-0.3 0.050 0.853 1.245 1.643 1.890 2.104 2.294 2.675
-0.4 0.066 0.855 1.231 1.606 1.834 2.029 2.201 2.540
-0.5 0.083 0.856 1.216 1.567 1.777 1.955 2.108 2.400
-0.6 0.099 0.857 1.200 1.528 1.720 1.880 2.016 2.275
-0.7 0.116 0.857 1.183 1.488 1.663 1.806 1.926 2.150
-0.8 0.132 0.856 1.166 1.448 1.606 1.733 1.837 2.035
-0.9 0.148 0.854 1.147 1.407 1.549 1.660 1.749 0.910
-1.0 0.164 0.852 1.128 1.366 1.492 1.588 1.664 1.880
-1.4 0.225 0.832 1.041 1.198 1.270 1.318 1.351 1.465
-1.8 0.282 0.799 0.945 1.035 1.069 1.087 1.097 1.130
-2.2 0.330 0.752 0.844 0.888 0.900 0.905 0.907 0.910
-3.0 0.396 0.636 0.660 0.666 0.666 0.667 0.667 0.668
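Values of KT between tabulated skews are obtained by linear interpolation, as in Examples 4.3 to 4.6 (for instance, g = −0.02 lies between the rows for 0.0 and −0.1). The sketch below carries only the Pearson Type III rows for −1.0 ≤ g ≤ 1.0 (the g = 0.0, T = 200 entry is taken as 2.576, the standard normal value also given in the log-Pearson table); the names HARTER_KT and harter_kt are illustrative.

```python
from bisect import bisect_left

RETURN_PERIODS = (2, 5, 10, 25, 50, 100, 200)

# Pearson Type III K_T rows for -1.0 <= g <= 1.0 (columns: T = 2 ... 200 years)
HARTER_KT = {
    1.0: (-0.164, 0.758, 1.340, 2.043, 2.542, 3.022, 3.489),
    0.9: (-0.148, 0.769, 1.339, 2.018, 2.498, 2.957, 3.401),
    0.8: (-0.132, 0.780, 1.336, 1.993, 2.453, 2.891, 3.312),
    0.7: (-0.116, 0.790, 1.333, 1.967, 2.407, 2.824, 3.223),
    0.6: (-0.099, 0.800, 1.328, 1.939, 2.359, 2.755, 3.132),
    0.5: (-0.083, 0.808, 1.323, 1.910, 2.311, 2.686, 3.041),
    0.4: (-0.066, 0.816, 1.317, 1.880, 2.261, 2.615, 2.949),
    0.3: (-0.050, 0.824, 1.309, 1.849, 2.211, 2.544, 2.856),
    0.2: (-0.033, 0.830, 1.301, 1.818, 2.159, 2.472, 2.763),
    0.1: (-0.017, 0.836, 1.292, 1.785, 2.107, 2.400, 2.670),
    0.0: (0.000, 0.842, 1.282, 1.751, 2.054, 2.326, 2.576),
    -0.1: (0.017, 0.846, 1.270, 1.716, 2.000, 2.252, 2.482),
    -0.2: (0.033, 0.850, 1.258, 1.680, 1.945, 2.178, 2.388),
    -0.3: (0.050, 0.853, 1.245, 1.643, 1.890, 2.104, 2.294),
    -0.4: (0.066, 0.855, 1.231, 1.606, 1.834, 2.029, 2.201),
    -0.5: (0.083, 0.856, 1.216, 1.567, 1.777, 1.955, 2.108),
    -0.6: (0.099, 0.857, 1.200, 1.528, 1.720, 1.880, 2.016),
    -0.7: (0.116, 0.857, 1.183, 1.488, 1.663, 1.806, 1.926),
    -0.8: (0.132, 0.856, 1.166, 1.448, 1.606, 1.733, 1.837),
    -0.9: (0.148, 0.854, 1.147, 1.407, 1.549, 1.660, 1.749),
    -1.0: (0.164, 0.852, 1.128, 1.366, 1.492, 1.588, 1.664),
}
_SKEWS = sorted(HARTER_KT)  # ascending skew values


def harter_kt(g: float, T: int) -> float:
    """K_T by linear interpolation between the two tabulated skews that bracket g."""
    if T not in RETURN_PERIODS:
        raise ValueError(f"T must be one of {RETURN_PERIODS}")
    if not _SKEWS[0] <= g <= _SKEWS[-1]:
        raise ValueError("g outside the rows carried in this sketch")
    col = RETURN_PERIODS.index(T)
    i = bisect_left(_SKEWS, g)  # first tabulated skew >= g
    if _SKEWS[i] == g:
        return HARTER_KT[g][col]
    lo, hi = _SKEWS[i - 1], _SKEWS[i]
    w = (g - lo) / (hi - lo)
    return HARTER_KT[lo][col] + w * (HARTER_KT[hi][col] - HARTER_KT[lo][col])


print(round(harter_kt(-0.02, 10), 4), round(harter_kt(-0.76, 100), 4))
```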
b) Peak Magnitudes
• The peak magnitudes Q1, Q2, … have to be described statistically by their joint probability function h(Q1, Q2, Q3, …), but fortunately the assumption that the Qi are mutually independent in the statistical sense usually holds true, so the joint pdf is obtained from the product of the individual pdfs.
• It can also be assumed that the Qi are identically distributed, although it would be quite reasonable to question this assumption, for instance by asserting that, on average, floods in the long rains are larger than floods in the short rains.
• The evidence for this assertion is not striking, however, and should not be confused with the fact that in tropical climates more peaks occur in winter than in summer.
• Hence, flood peaks Q1, Q2, … are considered to be identically and independently distributed, i.e. randomly drawn from a single population with probability density function f(Q).
• This allows us to use the sample to estimate the parameters of f(Q) and thus to express in concise mathematical form the statistical properties of the flood population.
c) Relationship between Magnitude and Return Period
• The T-year flood, QT, occurs on average once every T years. If there are λ floods per year in the series, then QT occurs on average once among λT floods.
• In the population of floods which exceed q0, this magnitude therefore has exceedance probability 1/(λT). That is,

$$1 - F(Q_T) = \frac{1}{\lambda T} \qquad \text{or} \qquad F(Q_T) = 1 - \frac{1}{\lambda T}$$

• where F is the distribution function of the flood peaks exceeding q0.
• In particular, if F(Q) is assumed, as is the case here, to be exponential, the exponential distribution function applies:

$$F(Q) = 1 - e^{-(Q - q_0)/\beta}, \qquad Q \ge q_0 \qquad (4.31)$$

• Then (4.31) gives QT = q0 + β ln(λT).
• Note that the mean, standard deviation and skewness of this distribution, as defined by equations 4.1, 4.2 and 4.5, are:

$$\mu = q_0 + \beta, \qquad \sigma = \beta, \qquad g = 2$$

d) Parameter Estimation
• The expression for QT contains three unknowns, q0, λ and β, and these must be estimated from observed data. This can be done in either of two distinct ways from a record of N years.
i. Fix q0 a priori and abstract from the record of flows every peak value exceeding q0. Let there be M of them (Q1, Q2, …, QM).
• Then λ = M/N and β = µ − q0,
• where µ = Q̄ = ΣQi/M.
ii. Fix λ a priori; this determines M = λN, and the M highest peaks in the record are abstracted. Then:
• β = σ
• q0 = µ − σ
• where µ = Q̄ as above and σ is the standard deviation of the M peaks.
• Both of these methods use the method of moments for parameter estimation. Other estimators could also be used.
• These use different criteria for matching the population to the sample, and hence their algebraic expressions and resulting numerical values may differ from the above.
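Both estimation routes can be sketched in a few lines, followed by the quantile QT = q0 + β ln(λT). The sample standard deviation in method (ii) uses an (M − 1) divisor, which is an assumption since the text does not state the divisor; the function names are illustrative. The usage applies method (i) to the 40 peaks of Example 4.7; because λ and β are not rounded, results differ slightly from line (a) of that example.

```python
import math


def pot_fix_threshold(peaks, q0, n_years):
    """Method (i): fix q0, keep every peak above it; lambda = M/N, beta = mean - q0."""
    exceed = [q for q in peaks if q > q0]
    m = len(exceed)
    lam = m / n_years
    beta = sum(exceed) / m - q0
    return q0, beta, lam


def pot_fix_rate(peaks, lam, n_years):
    """Method (ii): fix lambda, keep the M = lambda*N highest peaks; beta = sigma, q0 = mean - sigma."""
    m = round(lam * n_years)
    top = sorted(peaks, reverse=True)[:m]
    mu = sum(top) / m
    sigma = math.sqrt(sum((q - mu) ** 2 for q in top) / (m - 1))  # assumed (M - 1) divisor
    return mu - sigma, sigma, lam


def pot_quantile(q0, beta, lam, T):
    """Q_T = q0 + beta * ln(lambda * T) for exponentially distributed exceedances."""
    return q0 + beta * math.log(lam * T)


# Peaks above 60 m3/s from Example 4.7 (River Mayanja, N = 19 years)
peaks_4_7 = [125.0, 116.0, 79.9, 133.3, 196.0, 324.9, 99.4, 65.9,
             108.6, 126.7, 172.8, 166.3, 118.9, 63.4, 113.6, 80.5,
             125.5, 112.6, 140.9, 122.8, 186.3, 158.3, 127.7, 94.8,
             115.6, 66.9, 115.4, 181.3, 137.4, 190.3, 146.3, 151.5,
             127.8, 116.0, 111.9, 65.2, 200.1, 192.9, 210.5, 219.4]
q0, beta, lam = pot_fix_threshold(peaks_4_7, 60.0, 19)
for T in (2, 10, 50):
    print(T, round(pot_quantile(q0, beta, lam, T), 1))
```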
e) Standard Error of Estimate
• An approximate expression for se(QT) for use with
the above method is
• se (QT) =  { (1)  InT + (In λT)2 λ }1/2
2

M M 1

f) Notation
• The series of peaks exceeding the threshold qo is
known as a partial duration series.
• It is the series remaining after truncating the entire
parent series at qo. When qo is chosen so that λ =1
the series is known as the annual exceedance series.
• The term ‘Peaks over a threshold’ (POT) series is
used synonymously with partial duration series.
Example 4.7
• The highest 77 peaks recorded on River Mayanja between 29th March 1973 and 2nd May 1991 are given in the table below. Using hydrometric years extending from 1st May to 30th April, this record has N = 19 years.
• Estimation Method (i): q0 fixed at 60 m3/s
• The choice of q0 = 60 m3/s is arbitrary. Having chosen this, count the number of peaks exceeding it and calculate their mean. This gives M = 40 peaks with the following values:
125.0 116.0 79.9 133.3 196.0 324.9 99.4 65.9
108.6 126.7 172.8 166.3 118.9 63.4 113.6 80.5
125.5 112.6 140.9 122.8 186.3 158.3 127.7 94.8
115.6 66.9 115.4 181.3 137.4 190.3 146.3 151.5
127.8 116.0 111.9 65.2 200.1 192.9 210.5 219.4

• These 40 peaks have mean 137.72m3/s.


• Then λ = M/N = 40/19 = 2.11 peaks per year
• β = µ − q0
• β = 137.72 − 60.0 = 77.72 m3/s
• and

$$Q_T = q_0 + \beta\ln(\lambda T) = 60 + 77.72\ln(2.11T) = 118.03 + 77.72\ln T \qquad (a)$$

Estimation Method (ii): λ fixed at 2 peaks per year
• This choice of λ is arbitrary. Having chosen λ, this determines M = λN = 2 × 19 = 38, the number of peaks which have to be included in the series.
• The highest 38 peaks are those listed above, with the two smallest ones (63.4 and 65.2) omitted.
• The mean and standard deviation of the 38 values are:
• Q̄ = ΣQi/38 = 134.50 m3/s
• σ = √2501 = 50.01 m3/s
• Hence β = σ = 50.01 m3/s
• q0 = Q̄ − β = 134.50 − 50.01 = 84.49 m3/s
• And QT = q0 + β ln(λT)
• = 84.49 + 50.01 ln(2T)
• = 119.15 + 50.01 ln T  (b)
T (years)  1  2  5  10  25  50
QT (a) (m3/s)  118.0  171.9  243.1  297.0  368.2  422.1
QT (b) (m3/s)  119.2  153.8  199.6  234.3  280.1  314.8
se(QT) (Eq. 4.36)  1.2  2.5  4.1  5.4  7.0  8.3
4.5.3 Lognormal Distribution
• If the random variable Y = log X is normally distributed, then X is said to be lognormally distributed.
• The lognormal distribution has the advantage over the normal distribution that it is bounded below (X > 0) and that the log transformation tends to reduce the positive skewness commonly found in hydrologic data, because taking logarithms reduces large numbers proportionately more than it does small numbers.
• Some limitations of the lognormal distribution are
that it has only two parameters and that it requires
the logarithms of the data to be symmetric about
their mean.
• When the coefficient of skewness g = 0, then Log-
Pearson Type III frequency distribution is reduced to
log-normal distribution. A log-normal distribution
plots as a straight line on logarithmic probability
paper.
Example 4.8
• For the time series data of peak discharges given in Example 4.1, estimate the peak discharge for return periods of 100 and 200 years by using the Lognormal distribution method.
• Solution
• From the log-domain statistics of the River Aswa data computed in Example 4.6:
• Standard deviation

$$\sigma_z = \left[\frac{\sum (Z_i - \bar{Z})^2}{N-1}\right]^{1/2} = 0.269$$

• And Z̄ = μz = 110.677/30 = 3.689
• M3 = -0.0148
• Coefficient of skewness of log variate
• g = -0.0148/0.269³ = -0.76
• The lognormal distribution, however, corresponds to zero skew of the logarithms, so KT is read from Harter's Table for g = 0 (equivalently, KT is the standard normal variate).
• Calculation of QT
T (years)  KT  KT σz  ZT = μz + KT σz  QT = antilog ZT (m3/s)
100 2.326 0.626 4.315 20,639.3
200 2.576 0.693 4.382 24,095.9
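Since the lognormal case uses g = 0, KT is simply the standard normal variate with exceedance probability 1/T, which Python's standard library provides. A minimal sketch using the log-domain mean and standard deviation of Example 4.8 (base-10 logarithms assumed, as in the example; the function name is illustrative):

```python
from statistics import NormalDist


def lognormal_quantile(mu_z: float, sigma_z: float, T: float) -> float:
    """Q_T = 10 ** (mu_z + K_T * sigma_z), with K_T the standard normal variate (g = 0)."""
    kt = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return 10 ** (mu_z + kt * sigma_z)


for T in (100, 200):
    print(T, round(lognormal_quantile(3.689, 0.269, T), 1))
```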
4.5.4 Gamma Distribution
• The time taken for a number of events to occur in a
Poisson process is given by the gamma
distribution, which is a distribution of a sum of
independent and identical exponentially
distributed random variables.
• It has been used to describe the distribution of depth
of precipitation in storms.
• But unlike the log-normal distribution, it has not been possible to transform the coordinate scales in such a manner that all cumulative gamma distributions plot as straight lines, which would allow one to judge visually (approximately) whether an empirical frequency distribution could be fitted by a gamma distribution.
• This is a distinct disadvantage relative to the log-normal distribution and makes the gamma distribution less popular among users.
4.6 Errors in Frequency Estimation
• Errors in estimating QT, the flood of return period T, may arise under two categories (Cunnane, 1983):
a. Model errors
b. Sampling errors
• 4.6.1 Model Error
• A model error arises when the assumptions of the model adopted in the analysis do not hold.
• In analyzing annual maximum or minimum flow
series for instance, it is assumed that the available
AM series is a simple random sample from a single
population with distribution function F(Q). This
assumption implies that the:
a) Series is one of many possible such series which
could have occurred, each series having an equal
chance of occurring (random sample).
b) Population did not change with time during the
period of observation (stationarity).
c) Value occurring in year t, Qt, is independent of the
value which occurred in previous years, Qt-1, Qt-2
…………… this is referred to as lack of persistence.
d) Algebraic form F(Q) of the distribution is known,
and,
e) Relation between Q and T is the same in the model
as it is in nature.
• The assumption causing the biggest concern is d),
that the correct form of distribution F(Q) is known.
• Within the range of the observed data, two quite
different distributions might appear to describe the
distribution of the sample data quite well even
though the two distributions might be very different
in their tails.
• In one of these, QT may increase almost linearly with
log T while in another it may increase much more
rapidly at large T causing a rapid divergence of
estimates QT, as T increases.
• Within the range of observed data, as viewed on a
probability plot say, both types of distribution
may seem to be supported to some extent.
• In such a case guidance ought to be available from
the studies which have been made of flow records
world-wide but no absolutely firm knowledge is yet
available.
• It may require many more years of data to become
available before firm statements about the form
of distribution can be made.
• It is important to note that in this analysis one assumes that the volumes are statistically independent of one another.
• This means that whether a hydrological event occurs in one year has no bearing on whether it occurs in the next year or the year after that.
• This serial independence is a reasonable assumption in small watersheds. In large watersheds there might be some inherent persistence.
4.6.2 Sampling Errors
• A sampling error arises because the series of flows being analyzed is but a sample (assumed random) from an unknown population.
• Any quantity calculated from such a sample is a statistic with its own theoretical sampling distribution, the standard deviation of which is called the standard error of the statistic.
• In 2-parameter distributions a quantile can be written as:

$$Q_T = \mu + \sigma K_T$$

• where KT is a frequency factor, as mentioned in Section 4.4, dependent on T and on the form of the distribution being assumed for Q.
• A sample provides estimates of µ and σ, namely Q̄ and σ̂, and an estimate of QT is then

$$\hat{Q}_T = \bar{Q} + \hat{\sigma}K_T$$

• Its sampling variance can always be expressed as

$$\operatorname{var}(\hat{Q}_T) = \operatorname{var}(\bar{Q}) + 2K_T\operatorname{cov}(\bar{Q}, \hat{\sigma}) + K_T^2\operatorname{var}(\hat{\sigma})$$
• where the variances and covariances on the right-hand side are those of the sampling distributions of Q̄ and σ̂.
• These sampling distributions and their variances depend on the method of estimation as well as on the distribution.
• As an example, se(Q̂T) = √var(Q̂T) in the EV1 distribution, when estimation is by the method of moments, is

$$se(\hat{Q}_T) = \frac{\sigma}{\sqrt{N}}\left\{1 + 1.14K_T + 1.10K_T^2\right\}^{1/2}$$
• while in the normal distribution it is

$$se(\hat{Q}_T) = \frac{\sigma}{\sqrt{N}}\left\{1 + \frac{K_T^2}{2}\right\}^{1/2}$$

• where in this case KT = yT, the value of the standardised normal N(0,1) variate with exceedance probability 1/T, whose distribution function is widely tabulated.
• Formulae for se(QT) are less readily derived for three-parameter distributions, and values of se(QT) have been obtained in some such cases by simulation methods.
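The simulation approach can be sketched for a case where the answer is known: synthetic samples are drawn from an assumed normal population, Q̂T = Q̄ + KT σ̂ is computed for each, and the spread of the estimates is compared with the normal-distribution formula. The population values, seed and number of replicates are arbitrary illustrative choices. For small N the simulated value runs slightly above the large-sample formula, because the formula approximates var(σ̂) by σ²/(2N).

```python
import math
import random
from statistics import NormalDist


def normal_se_formula(sigma: float, n: int, T: float) -> float:
    """Large-sample se(Q_T) = sigma/sqrt(N) * {1 + K_T^2 / 2}^(1/2) for the normal distribution."""
    kt = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return sigma / math.sqrt(n) * math.sqrt(1.0 + kt * kt / 2.0)


def simulated_se(mu: float, sigma: float, n: int, T: float, reps: int = 5000, seed: int = 1) -> float:
    """Standard deviation of Q_T estimates over `reps` synthetic normal samples of size n."""
    rng = random.Random(seed)
    kt = NormalDist().inv_cdf(1.0 - 1.0 / T)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        estimates.append(m + kt * s)  # method-of-moments estimate of Q_T
    e_mean = sum(estimates) / reps
    return math.sqrt(sum((e - e_mean) ** 2 for e in estimates) / (reps - 1))


# Illustrative population: mu = 5000, sigma = 1000 m3/s; N = 20 years; T = 100 years
print(round(normal_se_formula(1000.0, 20, 100), 1), round(simulated_se(5000.0, 1000.0, 20, 100), 1))
```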
• The magnitude of se(QT) is about 10% to 15% of QT for two-parameter distributions when the record length is about 20 years.
• This is not very large, and it is much less damaging than the error which could occur by choosing the wrong form of distribution.
• The main conclusion is that an error in the model assumptions (the choice of distribution) is far more damaging than sampling error.
• The annual maximum model with the EV1 distribution was used in a study of flooding on Lake Albert, and the results showed that the highest effective inflow (8971 cumecs) into the lake has a recurrence interval of 28 years, whereas the highest outflow (3029 cumecs) has a recurrence interval of 59 years. This suggests that flooding may arise because the maximum inflow occurs more frequently (more than twice as often) and is of greater magnitude (about three times as large) than the maximum outflow (Rugumayo and Kayondo, 2006).
4.7 Flood Models Compared
• 4.7.1 Aim of Each Model
• The ultimate aim of both models is the same. Each
tries to represent the flood peak aspects of the
entire flow hydrograph by a simple series of flood
peak values (Cunnane, 1983).
• 4.7.2 Number of Values in Each series
• The series used in the annual maximum model
consists of one value, the maximum peak flow, from
each year of record. Thus N years of record give N
items in the series.
• The series used in Partial Duration series model
consists of either the M highest peaks in the entire
record regardless of year of occurrence, here usually
M ≥ N, or alternatively it consists of all peaks which
exceed some threshold flow value Qo.
• The latter form of the series is also called peaks over
a threshold series. The algebraic probability
treatment differs slightly between the two forms of
series definition.
• If the Partial Duration series is made to consist of the
N highest peaks in the record, where N is the number
of years in the record, then this special case is
traditionally called the Annual Exceedance Series.
4.7.3 Independence
• In each series the model assumes that successive
items in the series are statistically independent and
come from the same probability distribution (i.e.
identically, independently, distributed).
• This assumption causes no problems for the Annual
maximum series, but it does for the Partial
Duration series.
• In the latter an arbitrary rule has to be adopted
about whether to include adjacent peaks or reject
one of them.
• The adoption of an arbitrary rule is always
unsatisfactory and this has lessened the popularity of
the Partial Duration Series model and has
correspondingly increased the popularity of the
Annual Maximum Series model.
• When the two series of data are extracted from the
same record of river flows many flood peak values
occur in both series.
4.7.4 Common Values in Each Series
• Those years having low floods, while contributing to the Annual Maximum (A.M.) series as a result of its definition, do not contribute to the Partial Duration (P.D.) series.
• On the other hand, years which have two or more large (independent) flood peaks contribute two or three times to the P.D. series, but only once to the A.M. series.
• The frequency distribution of flood magnitudes in the P.D. series tends to be abruptly truncated at some threshold, while that of the A.M. series always has values to the left of the mode.
• These latter values reflect the presence of annual maximum values from the years with small floods.
4.7.5 Statistical Modeling
• The modeling problem in the A.M. series is one of choice of distribution. Many different distributions have been suggested, including Extreme Value Type 1, General Extreme Value, Log-Normal, Pearson Type 3, Gamma and Log-Pearson Type 3.
• The modelling problem in the P.D. series is also one of choice of distribution, coupled with choice of the number of peaks, M, to include in the series.
• While increasing M increases the amount of information in the series, it sometimes makes the problem of choice of distribution more difficult.
Design Return Period
4.8 Selection of Design Return Period
• Return Period: as defined in Section 4.2, the return period (Tr) indicates the average interval between the occurrence of floods equal to or greater than a given magnitude.
• It must be noted that a 50-year flood need not occur exactly once in every 50-year period, as no periodicity is implied.
• It only means that if we consider a very long period, say 1000 years, there would be on average 1000/50 = 20 floods of this or higher magnitude.
• Such floods can occur even twice or thrice in a year, and at the other extreme may not occur for 60 to 70 years at a stretch. But the total number of such floods in 1000 years would, on average, be about 20.
• Annual risk: the ratio (1/Tr) is the probability (p) with which the Tr-year flood may be equalled or exceeded in any one year.
• To select a design flood which is not likely to occur
during the life of the hydraulic structure, the design
return period should be much greater than the
estimated useful life of the structure.
• However, there is still no guarantee that such a flood
would not occur during the useful life as there is
always some risk.
• To reduce the risk, the return period of very
important structures such as spillways for very high
dams is taken very long.
• The probability that the hydraulic structure does not fail due to excess flood in any year is equal to (1 − 1/Tr).
• If it is assumed that the annual flood peaks are independent events, the probability that the structure does not fail in the next N years is equal to (1 − 1/Tr)^N; therefore, the risk that the structure may fail in any one of the next N years is given by:

$$R = 1 - \left(1 - \frac{1}{T_r}\right)^{N} \qquad (4.44)$$
• Where R is called the risk.
• Equation 4.44 may be used to determine the risk R involved in adopting a Tr-year flood for a structure with a useful life of N years.
• For example, if the useful life (N) of a structure is 100 years, the risk for adopting the Tr = 100 years flood is given by

$$R = 1 - \left(1 - \frac{1}{100}\right)^{100} = 0.634$$
• Thus there is a 63.4% chance of its exceedance in its useful life, which is a rather great risk. To reduce the risk, we can adopt a flood of Tr = 1000 years:

$$R = 1 - \left(1 - \frac{1}{1000}\right)^{100} = 0.095$$

• To reduce the risk further, the return period should be increased further.
• A judicious selection of the design return period is made considering economy and other factors. Of course, it is impossible to entirely eliminate the risk.
Example 4.10
• A dam has an expected working life of 25 years and is designed for a peak flood of 100 years return period. Estimate the risk of failure of this dam. If a risk of 12.5% is acceptable, what should be the return period for it?
• Solution
• The risk of failure (R) of the dam, using P1 = 1 − q^n = 1 − (1 − p)^n, can be expressed as

$$R = 1 - (1 - P)^{n} = 1 - \left(1 - \frac{1}{T}\right)^{n}$$

• where n = 25 years and T = 100 years.
• Therefore, R = 1 − (1 − 1/100)^25 = 0.222
• If R = 12.5% = 0.125, then the return period is calculated from
• 0.125 = 1 − (1 − 1/T)^25
• Or T = 1/[1 − (1 − 0.125)^(1/25)] ≈ 188 years
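Equation 4.44 and its inverse are easily evaluated directly. A minimal sketch reproducing the figures above and in Example 4.10 (the function names are illustrative):

```python
def design_risk(T: float, n: int) -> float:
    """Risk R = 1 - (1 - 1/T)^n that the T-year flood is equalled or exceeded within n years."""
    return 1.0 - (1.0 - 1.0 / T) ** n


def return_period_for_risk(R: float, n: int) -> float:
    """Invert R = 1 - (1 - 1/T)^n for the design return period T."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))


print(round(design_risk(100, 100), 3))    # useful life 100 yr, Tr = 100 yr
print(round(design_risk(1000, 100), 3))   # useful life 100 yr, Tr = 1000 yr
print(round(design_risk(100, 25), 4))     # Example 4.10: working life 25 yr, T = 100 yr
print(round(return_period_for_risk(0.125, 25)))  # Example 4.10: acceptable risk 12.5%
```

Note that the inverse grows quickly as the acceptable risk falls: the design return period is roughly n/R when R is small.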
Example 4.11
• At a station in Apac, Uganda, it was found that 250 mm of rainfall has a return period of 25 years. Determine the probability that a one-day rainfall equal to or greater than 250 mm occurs once in 15 successive years.
• Solution
• From T = 1/P = (N + 1)/M,
• P = 1/T; therefore
Design Return Period
• Probability of 250 mm of rainfall, P = 1/25 = 0.04
• For once in 15 successive years, n =15, r=1
• From the formula for probability of occurrence of
an event r times in n successive years that is:
• P r,n = nC Prqn-r = n!/((n-r)!r!)
r

• P 1,15 = 15!/ ((15-1)! X 1!) x 0.041 X (0.96) 15-1


• = 0.338
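The binomial expression used in Example 4.11 can be checked with the Python standard library (`math.comb`, Python 3.8+). A minimal sketch; `prob_r_in_n` is an illustrative name:

```python
from math import comb


def prob_r_in_n(p, r, n):
    """Binomial probability that an event with annual probability p occurs
    exactly r times in n successive years: nCr * p^r * q^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


p = 1 / 25                          # 250 mm one-day rainfall, Tr = 25 years
print(round(prob_r_in_n(p, 1, 15), 3))   # 0.339
```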
• Flood Design Standards: The design flood for a particular hydraulic structure is selected considering various factors such as the type of structure, its importance, economy and the development of the areas surrounding the structure.
• Small structures such as culverts and aqueducts in remote areas can be designed for less severe floods, as the consequences of their failure are less serious.
• Their failure may cause temporary inconvenience but no property damage or loss of life.
• On the other hand, spillways for high dams located upstream of large towns will have to be designed for very severe floods. The failure of such spillways may cause havoc and great loss of property and life.
• Depending upon their severity, floods are classified into the following types:
i. Probable Maximum Flood (PMF)
ii. Standard Project Flood (SPF)
iii. Design flood
a) Probable Maximum Flood (PMF): the probable maximum flood is the flood that may occur from the most severe combination of meteorological and hydrological conditions that is reasonably possible in the region.
• The estimation of the PMF involves a detailed study of storm patterns, storm transposition and various other meteorological phenomena.
• From the critical combinations of storms and
moisture adjustments, the probable maximum
precipitation (PMP) is estimated.
• The minimum water losses are assumed, and the
PMP is applied to the unit hydrograph of the
catchment to estimate the PMF.
• The PMF is an extremely severe flood in the basin.
Spillways of high dams are designed for PMF.
b) Standard Project Flood (SPF): the standard project flood is the flood that is likely to occur from a severe combination of meteorological and hydrological conditions which are reasonably characteristic of the drainage basin, but excluding rare combinations of these conditions.
• It is determined by applying the standard project storm (SPS) to the unit hydrograph.
• The standard project flood (SPF) is used in the design
of hydraulic structures where the failure of the
structure would cause damage less severe than that
in the case of PMF. Therefore some risk can be taken.
The SPF is generally 40 to 60% of the PMF.
c) Design Flood: this is the flood adopted for the design of a hydraulic structure after careful consideration of economic and other factors.
• As the magnitude of the adopted design flood
increases, the capital and maintenance cost of the
structure increases but the probable magnitude of
the expected damage decreases.
• The most economical design flood is found after
studying the various magnitudes of the flood and
the corresponding expected damages.
• The design flood may be the PMF, the SPF or a smaller flood, depending on the degree of protection desired and the cost of the hydraulic structure.
• A standard design flood methodology was developed (Alexander, 2002) that can be used for the design of most structures in South Africa.
• Design flood applications can be divided into three broad categories:
i. important structures where public safety is at risk (for example, road bridges);
ii. structures where the risk to life is minimal (for example, minor urban drainage works); and
iii. applications where the purpose is administrative only (for example, designated floodlines in urban areas).
• Where public safety is at risk, the regional maximum flood method should be applied.
• Where public safety is not at risk, cost optimization procedures based on the standard design flood can be applied, as is also the case for floodlines.
• A brief summary of the guidelines adopted by the Central Water Commission, India (1973), for selecting the design flood for various hydraulic structures is given below.
a) Spillways for major and medium projects with storages of more than 60 Mm³ (= 6000 ha-m): the recommended design flood is
i. the PMF determined from the unit hydrograph and the probable maximum precipitation (PMP);
ii. if (i) is not possible, the flood with a recurrence interval of 1000 years.
b) Permanent barrages and minor dams with storages of less than 60 Mm³:
i. The design flood is taken as the standard project flood (SPF), determined from the unit hydrograph and the standard project storm (SPS). The SPS is usually equal to the largest recorded storm in the region.
ii. The flood with a return period of 100 years.
iii. Either (i) or (ii), whichever gives the higher value.
c) Pickup weirs: the design flood is usually taken as the flood with a return period of 100 or 50 years, depending upon the magnitude and importance of the project.
d) Aqueducts: for the waterway, the flood with Tr = 50 years is taken as the design flood, but for the foundations and free board, the flood with Tr = 100 years is taken.
e) Projects with very scanty or inadequate data: the design flood can be found from empirical formulae.
4.9 Linear Regression
• 4.9.1 Fitting a Regression Equation
• The fitting of a straight line may be done objectively
by one of the following statistical methods:
i. The method of least squares
ii. The method of moments
iii. The method of maximum likelihood
• In this section, the method of least squares is
discussed.
• Two variables, y (dependent) and x (independent), can be correlated by plotting them on the y- and x-axes.
• If the plotted points lie close to a straight line, there is a close linear relationship; on the other hand, if the points depart appreciably from a line (without a definite trend), the graph is called a scatter diagram or scatter plot (Das, 2002).
• If the trend is a straight line, the relationship is linear and has the equation
• y = a + bx
• Different lines are obtained for different values of a and b.
• The method of least squares is used to select the line
that fits the data best.
• The principle of least squares states that the best line
for fitting a series of observations is the one for
which the sum of the squares of the departures is
minimum.
• A departure is the difference between the observed
value and the line.
• Since x is the independent variable, the departures
of y are used. The least squares line for the equation
above may be obtained by solving for a and b, the
two normal equations;
• $\sum y = na + b\sum x$
• $\sum xy = a\sum x + b\sum x^2$
• Where n = number of pairs of observed values of x
and y.
• The most commonly used statistical parameter for measuring the degree of association of two linearly dependent variables x and y is the correlation coefficient:
• $r = \dfrac{\sum \Delta x\,\Delta y}{\sqrt{\sum (\Delta x)^2 \sum (\Delta y)^2}}$ (4.47)
• $r = \dfrac{\sum xy - n\bar{x}\bar{y}}{(n-1)\,\sigma_x \sigma_y}$ (4.47a)
• Where $\Delta x = x - \bar{x}$, $\Delta y = y - \bar{y}$
• σx, σy = standard deviations of x and y, respectively
• $\bar{x}$, $\bar{y}$ = means of x and y, respectively
• x, y = individual values (the middle of each class interval when the data are grouped)
• If |r| = 1, the correlation is perfect, giving a straight-line plot (the regression line).
• If r = 0, no linear relation exists between x and y (scatter plot).
• |r| → 1 indicates a close linear relationship.
• If a straight line does not fit the data adequately, a quadratic parabola can be used as the fitting curve, given by
• $y = a + bx + cx^2$
• From the principle of least squares, a, b and c can be obtained by solving the three normal equations
• $\sum y = na + b\sum x + c\sum x^2$
• $\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
• $\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$ (4.49)
• Where n = number of pairs of observed values of x and y.
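The three normal equations (4.49) form a 3 × 3 linear system. The sketch below builds that system from the sums and solves it by Cramer's rule in plain Python; the helper names `det3`, `solve3` and `fit_parabola` are hypothetical, and `numpy.linalg.solve` would be an equally valid solver if NumPy is available.

```python
def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("normal equations are singular")
    solution = []
    for k in range(3):
        ak = [row[:] for row in a]          # copy, then replace column k by rhs
        for i in range(3):
            ak[i][k] = rhs[i]
        solution.append(det3(ak) / d)
    return solution


def fit_parabola(x, y):
    """Least-squares y = a + b*x + c*x^2 from the normal equations (4.49)."""
    n = len(x)

    def sx(p):                              # sum of x^p
        return sum(xi ** p for xi in x)

    def sxy(p):                             # sum of x^p * y
        return sum(xi ** p * yi for xi, yi in zip(x, y))

    a = [[n, sx(1), sx(2)],
         [sx(1), sx(2), sx(3)],
         [sx(2), sx(3), sx(4)]]
    return solve3(a, [sxy(0), sxy(1), sxy(2)])


x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]  # exact parabola, for checking only
print(fit_parabola(x, y))                   # approximately [1.0, 2.0, 3.0]
```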
• Regardless of the type of curve fitted, the correlation coefficient r is given by Eq. (4.47). The variables x and y may, for instance, be precipitation and the corresponding runoff, or gauge height and the corresponding streamflow.
• A power function of the form $y = cx^m$ can be transformed to a straight line by taking logarithms of the variables:
• $\log y = \log c + m \log x$
• By putting log x = X, log y = Y, log c = a and m = b, the function takes the linear form Y = a + bX, similar to Eq. (4.45); a and b can then be found from the two normal equations, and the power function is determined.
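The log transformation above turns the power-function fit into an ordinary straight-line fit. A minimal sketch using base-10 logarithms; `fit_power` is an illustrative name:

```python
import math


def fit_power(x, y):
    """Fit y = c * x**m by least squares on X = log x, Y = log y,
    i.e. the straight line Y = a + bX with a = log c and b = m."""
    X = [math.log10(v) for v in x]
    Y = [math.log10(v) for v in y]
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    sxy = sum(u * v for u, v in zip(X, Y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # from the two normal equations
    a = (sy - b * sx) / n
    return 10 ** a, b                                # c = 10^a, m = b


x = [1, 2, 4, 8]
y = [2 * v ** 1.5 for v in x]      # exact power law, for checking only
print(fit_power(x, y))             # approximately (2.0, 1.5)
```

All x and y values must be positive for the logarithms to exist.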
• The curve fitting that gives r closest to 1 is adopted. Statistical methods can be applied to many kinds of hydrological and meteorological data, such as precipitation, temperature, floods, droughts and water quality.
4.9.2 Standard Error of Estimate
• A measure of the scatter about the regression line of y on x is given by
• $S_{y.x} = \sqrt{\dfrac{\sum (y - y_{est})^2}{n}}$ (4.52)
• which is called the standard error of estimate of y with respect to x; $y_{est}$ is the value of y for the given value of x in Eq. (4.45). $S_{y.x}$ can also be determined from the expressions
• $S_{y.x} = \sigma_y \sqrt{1 - r^2}$ (4.53)
• $S_{y.x} = \sqrt{\dfrac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$ (4.54)
• $S_{y.x} = \sqrt{\dfrac{n-1}{n-2}\left(\sigma_y^2 - b^2\sigma_x^2\right)}$ (4.55)
• Eq. (4.54) can be extended to non-linear regression equations.
Example 4.12
• Annual rainfall and runoff data for the Sezibwa River for 10 years (1950-1959) are given below. Determine the linear regression line between rainfall and runoff, the correlation coefficient and the standard error of estimate.
Year | Rainfall (mm) | Runoff (mm)
1950 | 51 | 10
1951 | 94 | 22
1952 | 65 | 15
1953 | 42 | 12
1954 | 73 | 17
1955 | 112 | 19
1956 | 106 | 20
1957 | 86 | 18
1958 | 59 | 13
1959 | 84 | 16
• Solution
• The regression line computations are given below:
Rainfall x (mm) | Runoff y (mm) | x² | xy | Δx = x − x̄ | Δy = y − ȳ | (Δx)² | (Δy)² | Δx·Δy
51 | 10 | 2601 | 510 | -26.2 | -6.2 | 686.44 | 38.44 | 162.44
94 | 22 | 8836 | 2068 | 16.8 | 5.8 | 282.24 | 33.64 | 97.44
65 | 15 | 4225 | 975 | -12.2 | -1.2 | 148.84 | 1.44 | 14.64
42 | 12 | 1764 | 504 | -35.2 | -4.2 | 1239.04 | 17.64 | 147.84
73 | 17 | 5329 | 1241 | -4.2 | 0.8 | 17.64 | 0.64 | -3.36
112 | 19 | 12544 | 2128 | 34.8 | 2.8 | 1211.04 | 7.84 | 97.44
106 | 20 | 11236 | 2120 | 28.8 | 3.8 | 829.44 | 14.44 | 109.44
86 | 18 | 7396 | 1548 | 8.8 | 1.8 | 77.44 | 3.24 | 15.84
59 | 13 | 3481 | 767 | -18.2 | -3.2 | 331.24 | 10.24 | 58.24
84 | 16 | 7056 | 1344 | 6.8 | -0.2 | 46.24 | 0.04 | -1.36
Σ = 772 | 162 | 64468 | 13205 | 0 | 0 | 4869.6 | 127.6 | 698.6
• x n x 772

77.2 mm
  10
• y  y 162  16.2 mm

 n
• From10
Equations 4.44
• 162 = 10a +772b
• 13205 =772a + 64468b
• Solving the above two simultaneous equations gives
• a = 5.125 b = 0.144
• Therefore the regression line is; y= 0.144x+5.125
• Or; R = 0.144P + 5.125 where R and P are in
mm r  x.y
698.6 coefficient 
• rCorrelation =0.886
2 2
  x . y
 4869.6 x 127.6

• Or; b  r  x   y 22
  y 
r x
127.6
0.144 
r 4869.6

• Standard error of estimate


1 r
S y.x  2

y
 

2
y 127.6

  y  y 
  
n
2
 10 1 = 3.765 mm
y
n 1

1
S y.x  3.7651 
0.886 2 = 1.746 mm
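The hand calculation of Example 4.12 can be reproduced with the normal equations, Eq. (4.47) and Eq. (4.53). A minimal plain-Python sketch follows. Small differences from the hand figures (for example S_y.x = 1.744 mm instead of 1.746 mm) arise only because the code does not round r to 0.886 before squaring it.

```python
import math

rain = [51, 94, 65, 42, 73, 112, 106, 86, 59, 84]    # x, mm
runoff = [10, 22, 15, 12, 17, 19, 20, 18, 13, 16]    # y, mm
n = len(rain)

sx, sy = sum(rain), sum(runoff)
sxx = sum(x * x for x in rain)
sxy = sum(x * y for x, y in zip(rain, runoff))

# Solve the normal equations: sy = n*a + b*sx and sxy = a*sx + b*sxx
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

# Correlation coefficient from deviations about the means (Eq. 4.47)
xbar, ybar = sx / n, sy / n
dxx = sum((x - xbar) ** 2 for x in rain)
dyy = sum((y - ybar) ** 2 for y in runoff)
dxy = sum((x - xbar) * (y - ybar) for x, y in zip(rain, runoff))
r = dxy / math.sqrt(dxx * dyy)

# Standard error of estimate, S_y.x = sigma_y * sqrt(1 - r^2) (Eq. 4.53)
sigma_y = math.sqrt(dyy / (n - 1))
s_yx = sigma_y * math.sqrt(1 - r ** 2)

print(f"a = {a:.3f}, b = {b:.4f}, r = {r:.3f}, "
      f"sigma_y = {sigma_y:.3f}, S_y.x = {s_yx:.3f}")
# a = 5.125, b = 0.1435, r = 0.886, sigma_y = 3.765, S_y.x = 1.744
```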
4.9.3 Linear Multiple Regression
• A regression equation for estimating a dependent variable, say x1, from independent variables x2, x3, … is called a regression equation of x1 on x2, x3, …; for three variables, it is given by
• $x_1 = a + bx_2 + cx_3$
• The constants a, b and c can be determined by the method of least squares. The least-squares regression plane of x1 on x2 and x3 can be determined by solving simultaneously the three normal equations
• $\sum x_1 = an + b\sum x_2 + c\sum x_3$
• $\sum x_1 x_2 = a\sum x_2 + b\sum x_2^2 + c\sum x_2 x_3$
• $\sum x_1 x_3 = a\sum x_3 + b\sum x_2 x_3 + c\sum x_3^2$
• where n is the number of data points (x1, x2, x3).
• The standard error of estimate of x1 with respect to x2 and x3 is given by
• $S_{1.23} = \sqrt{\dfrac{\sum (x_1 - x_{1,est})^2}{n}}$
• where x1est = value of x1 for the given values of x2 and x3 in Eq. (4.56).
• The coefficient of multiple correlation is given by
• $r_{1.23} = \sqrt{1 - \dfrac{S_{1.23}^2}{\sigma_1^2}}$
• where σ1 = standard deviation of x1, and $r_{1.23}^2$ is called the coefficient of multiple determination. The value of $r_{1.23}$ lies between 0 and 1. Also
• $r_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}}$
• $r_{1.23} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)}$
• $r_{12} = \dfrac{\sum x_1 x_2 - n\bar{x}_1\bar{x}_2}{(n-1)\,\sigma_1\sigma_2}$
• r12 = the linear correlation coefficient between the variables x1 and x2, ignoring the variable x3; r13 and r23 are defined similarly. r13.2 is the partial correlation coefficient between x1 and x3 with the effect of x2 removed.
• From Eq. (4.66), $S_{1.23} = \sigma_1\sqrt{1 - r_{1.23}^2}$, which is very similar to Eq. (4.53).
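The three normal equations for the regression plane can be assembled and solved in the same way as for the parabola. The sketch below also computes S1.23 and r1.23. It assumes that σ1 is computed with divisor n, to match the divisor used for S1.23, so that r²1.23 equals the usual coefficient of determination. The helper names and the test data are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(a)
    cols = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = rhs[i]
        cols.append(det3(ak) / d)
    return cols


def fit_plane(x1, x2, x3):
    """Fit x1 = a + b*x2 + c*x3; return (a, b, c, S_1.23, r_1.23)."""
    n = len(x1)

    def s(u, v):                        # sum of products
        return sum(p * q for p, q in zip(u, v))

    a_mat = [[n, sum(x2), sum(x3)],
             [sum(x2), s(x2, x2), s(x2, x3)],
             [sum(x3), s(x2, x3), s(x3, x3)]]
    rhs = [sum(x1), s(x1, x2), s(x1, x3)]
    a, b, c = solve3(a_mat, rhs)

    est = [a + b * u + c * v for u, v in zip(x2, x3)]
    s123 = math.sqrt(sum((o - e) ** 2 for o, e in zip(x1, est)) / n)
    mean1 = sum(x1) / n
    sigma1 = math.sqrt(sum((o - mean1) ** 2 for o in x1) / n)  # divisor n (assumption)
    r123 = math.sqrt(max(0.0, 1 - (s123 / sigma1) ** 2))
    return a, b, c, s123, r123


x2 = [1, 2, 3, 4, 5, 6]
x3 = [2, 1, 4, 3, 6, 5]
x1 = [1 + 2 * u + 3 * v for u, v in zip(x2, x3)]   # hypothetical exact plane
print(fit_plane(x1, x2, x3))   # a, b, c close to 1, 2, 3; S close to 0; r close to 1
```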

4.9.4 Chi-Square Goodness of Fit Test
• The chi-square test is one of the most commonly used tests of significance. The chi-square distribution provides the critical values of the χ² variate.
• For convenience, the table of critical values of χ² at various levels of significance and for different degrees of freedom is provided.
• The chi-square test is applicable to testing hypotheses about the variance of a normal population, and about the goodness of fit of a theoretical distribution to an observed frequency distribution in a one-way classification having k categories.
• It is also applied to test the independence of attributes when the frequencies are presented in a two-way classification called a contingency table.
• A plot is made of the actual observations and the expected observations on the same axes against recurrence interval.
• From this it can be seen visually how close the expected values are to the actual values. The correlation coefficient is obtained, and this is a measure of how closely these values are correlated with each other.
• In the chi-square test, a comparison is made between actual observations and expected observations.
• The chi-square goodness of fit test is expressed by the relationship:
• $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$
• Where Oi is the actual (observed) value and Ei is the expected value.
• The number of degrees of freedom is (k − p − 1), where k is the number of class intervals and p is the number of parameters of the distribution estimated.
• For the lognormal distribution, two parameters, μ and σ, are estimated. Hence, in this case, chi-square has (k − 3) degrees of freedom (Aggarwal, 2007).
Example 4.13
• Chi-square test for the lognormal distribution
• The table below shows the monthly rainfall data for Kitgum. Subject the data to a goodness of fit test using the lognormal distribution.
O (m³/s) | 168 | 163.58 | 153.83 | 143.08 | 137.42 | 134.17 | 132.33 | 129.50 | 127.08 | 126.9 | 124.05 | 122.67
Recurrence interval | 31 | 15.50 | 10.33 | 7.75 | 6.20 | 5.17 | 4.43 | 3.88 | 3.44 | 3.1 | 2.81 | 2.58
• Solution
No. | Recurrence interval | O | log(x) | Kt | Z | E | (O − E) | (O − E)²/E
1 | 31.00 | 168.00 | 2.23 | 1.824 | 2.22 | 167.56 | 0.44 | 0.00
2 | 15.50 | 163.58 | 2.21 | 1.454 | 2.21 | 161.04 | 2.54 | 0.04
3 | 10.33 | 153.83 | 2.19 | 1.292 | 2.20 | 158.27 | -4.44 | 0.12
4 | 7.75 | 143.08 | 2.16 | 1.084 | 2.19 | 154.78 | -11.70 | 0.88
5 | 6.20 | 137.42 | 2.14 | 0.948 | 2.18 | 152.54 | -15.12 | 1.50
6 | 5.17 | 134.17 | 2.13 | 0.857 | 2.18 | 151.06 | -16.89 | 1.89
7 | 4.43 | 132.33 | 2.12 | 0.682 | 2.17 | 148.25 | -15.92 | 1.71
8 | 3.88 | 129.50 | 2.11 | 0.528 | 2.16 | 145.83 | -16.33 | 1.83
9 | 3.44 | 127.08 | 2.10 | 0.404 | 2.16 | 143.90 | -16.82 | 1.97
10 | 3.10 | 126.90 | 2.10 | 0.309 | 2.15 | 142.44 | -15.54 | 1.70
11 | 2.81 | 124.05 | 2.09 | 0.227 | 2.15 | 141.20 | -17.15 | 2.08
12 | 2.58 | 122.67 | 2.09 | 0.163 | 2.15 | 140.23 | -17.56 | 2.20
Σ | | | 25.67 | | | | | χ² = 15.92
• Mean of log(x) = 25.67/12 = 2.139
• Standard deviation of log(x) = 0.047
• From the table of chi-square at α = 0.05 and (12 − 2 − 1) = 9 d.f., $\chi^2_{0.05,9} = 16.92$
• Since the calculated value of chi-square (15.92) is less than the table value (16.92), $\chi^2_{cal}$ lies in the acceptance region. Hence the hypothesis that the data follow a lognormal distribution is accepted at the 5% significance level.
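The test statistic and decision in Example 4.13 can be reproduced from the O and E columns of the table. In this minimal sketch the critical value 16.92 is copied from the chi-square table rather than computed. If SciPy is available, `scipy.stats.chi2.ppf(0.95, 9)` returns the same value.

```python
observed = [168.00, 163.58, 153.83, 143.08, 137.42, 134.17,
            132.33, 129.50, 127.08, 126.90, 124.05, 122.67]
expected = [167.56, 161.04, 158.27, 154.78, 152.54, 151.06,
            148.25, 145.83, 143.90, 142.44, 141.20, 140.23]

# Chi-square statistic: sum of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k, p = len(observed), 2        # 12 values; 2 fitted parameters (mean, sd of log x)
dof = k - p - 1                # 9 degrees of freedom
chi2_crit = 16.92              # tabulated chi-square at alpha = 0.05, 9 d.f.

print(f"chi2 = {chi2:.2f}, d.f. = {dof}, lognormal accepted: {chi2 < chi2_crit}")
# chi2 = 15.92, d.f. = 9, lognormal accepted: True
```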
4.10 Frequency Analysis and Ungauged Catchments
• Frequency analysis techniques cannot be applied directly to ungauged catchments, because they depend on the availability of data.
• One very useful technique designed to tackle this problem, developed in the Flood Studies Report (UK), is the use of regional curves.
• This allows the magnitude of the flood peak of any return period to be estimated for ungauged catchments.
• A regional curve is a dimensionless plot of the ratio of the flood peak (QTr) of return period Tr to the mean annual flood ($\bar{Q}$) against return period (Tr).
• By combining the records of gauged catchments in a particular region, a single regional curve may be plotted. For ungauged catchments, $\bar{Q}$ may be estimated from catchment characteristics, and QTr then obtained using QTr/$\bar{Q}$ from the regional curve.
• Furthermore, frequency analysis is unreliable for short records; in this case $\bar{Q}$ may be estimated from the record and QTr found using the regional curves.
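Using a regional curve amounts to multiplying $\bar{Q}$ by the growth factor QTr/$\bar{Q}$ read off the curve. The sketch below interpolates linearly in log(Tr) between tabulated points. The growth-curve ordinates are hypothetical illustrative values and are not taken from the Flood Studies Report or any Ugandan study.

```python
import math

# Hypothetical regional growth curve: (Tr in years, Q_Tr / Qbar).
# Illustrative numbers only; a real study would derive these from regional data.
GROWTH_CURVE = [(2, 0.94), (5, 1.25), (10, 1.45), (25, 1.70), (50, 1.90), (100, 2.10)]


def growth_factor(tr):
    """Q_Tr / Qbar, interpolated linearly in log(Tr) between tabulated points."""
    if not GROWTH_CURVE[0][0] <= tr <= GROWTH_CURVE[-1][0]:
        raise ValueError("return period outside the tabulated curve")
    for (t0, g0), (t1, g1) in zip(GROWTH_CURVE, GROWTH_CURVE[1:]):
        if t0 <= tr <= t1:
            w = (math.log(tr) - math.log(t0)) / (math.log(t1) - math.log(t0))
            return g0 + w * (g1 - g0)


def design_flood(qbar, tr):
    """Q_Tr = Qbar * (Q_Tr / Qbar) read from the regional curve."""
    return qbar * growth_factor(tr)


print(design_flood(120.0, 50))   # hypothetical Qbar = 120 m3/s gives about 228 m3/s
```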
4.10.1 Catchment Characteristics
• The techniques presented in the Flood Studies Report provided a basis for flood prediction for ungauged catchments.
• This meant the development of quantitative
relationships between catchment characteristics and
flood magnitudes for large numbers of gauged
catchments and the application of these results to
ungauged catchments by use of multiple regression
techniques.
• The physical processes occurring in the hydrological
cycle provide us with a basis for the assessment of
runoff within a catchment (Chadwick and Morfet,
1989).
• Circulation of water takes place from the ocean to
the atmosphere via evaporation, and this water is
deposited on the catchment mainly as rainfall. From
there, it may follow several routes, but eventually
the water returns to the sea via the rivers.
• Within the catchment, several circulation routes are
possible. Rainfall is initially intercepted by vegetation
and may be re-evaporated.
• Secondly, infiltration into the soil or overland flow to
a stream channel or river may occur.
• Water entering the soil layer may remain in storage (in the unsaturated zone) or may percolate to the groundwater table (the saturated zone).
• All subsurface water may move laterally and
eventually enter a stream channel.
• The main characteristics, which determine the
response of the catchment to rainfall, are:
a. Catchment area
b. Soil type(s) and depth(s)
c. Vegetation cover
d. Stream slopes and surface slopes
e. Rock type(s) and areas(s)
f. Drainage network (natural and man-made)
g. Lakes and reservoirs
h. Impermeable area (e.g. roads, buildings, etc)
• Furthermore, different catchments are in different climates; hence the response of a catchment to rainfall depends upon the prevailing climate.
• This may be represented by:
a) Rainfall (depth, duration and intensity)
b) Evaporation potential (derived from temperature, humidity, wind speed and solar radiation measurements, or from evaporation pan records).
• However, from an engineering viewpoint, qualitative descriptions of catchment characteristics are inadequate in themselves, and quantitative measures are necessary to predict flood magnitudes.
• In a study on low flows of eastern catchments in Uganda (Rugumayo, Ojeo, 2006), the following equation for rural catchments was derived with a multiple regression coefficient R of 0.961:
• $Q_{75}(10) = 232.631 + 2.038\times10^{-2}\,MAR - 1.469\times10^{-5}\,AREA - 314\,S1085 - 22.507\,STRFQ + 3.195\times10^{-3}\,MSL - 121\,PE$
• Where Q75(10) = the 10-day average flow exceeded 75% of the time (m³/s)
• AREA = the catchment area (km²)
• STRFQ = the stream frequency (no. of stream junctions/AREA)
• S1085 = the slope of the main stream (m/km)
• MAR = mean annual rainfall (mm)
• PE = potential evaporation (mm)
• MSL = mean stream length (m)
• More details are given in the reference.


• The above equation expresses all the catchment characteristics that were found to be statistically significant, and it may be applied to ungauged catchments in eastern Uganda.
• The equation will only give an approximate value of Q75(10), and this reflects the difficulty of predicting natural events with any certainty.
4.11 L-Moments and Their Advantages
• L-moments are an alternative system for describing the shapes of probability distributions; they are modifications of the "probability weighted moments" (Greenwood et al., 1979).
• The purpose of L-moments (like ordinary moments)
is to summarise theoretical probability
distributions and observed samples.
• They can also be used for parameter estimation,
interval estimation and hypothesis testing.
• L-moments have a theoretical advantage over
conventional moments of being able to characterise
a wider range of distributions and, when estimated
from a sample, of being more robust in the presence
of outliers in the data (Muhara, 2001).
• Since sample estimators of L-moments are always linear combinations of the ranked observations, they are less subject to bias than ordinary moments.
• This is because computing ordinary moment estimators such as skewness and kurtosis requires cubing observations and raising them to the fourth power, which gives greater weight to observations that are far from the mean.
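To show that sample L-moments are linear combinations of the ranked observations, the sketch below computes them from the usual unbiased probability-weighted-moment estimators b0 to b3. L-CV, L-skewness and L-kurtosis are the ratios l2/l1, l3/l2 and l4/l2, which correspond to the three ratios listed in Table 4.4. The function name and its dictionary keys are illustrative.

```python
def sample_l_moments(data):
    """Sample L-moments l1..l2 and L-moment ratios from the unbiased
    probability-weighted moments b0..b3 of the ascending-sorted sample."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [0.0] * 4
    for j, xj in enumerate(x, start=1):    # j = rank in ascending order
        w = 1.0
        b[0] += xj
        for r in range(1, 4):
            w *= (j - r) / (n - r)         # (j-1)...(j-r) / ((n-1)...(n-r))
            b[r] += w * xj
    b = [v / n for v in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return {"l1": l1, "l2": l2, "L-CV": l2 / l1,
            "L-skew": l3 / l2, "L-kurt": l4 / l2}


print(sample_l_moments([1, 2, 3, 4]))   # symmetric sample, so L-skew is about 0
```

Each observation enters only through a rank-dependent weight, so no observation is squared or cubed. This is the source of the robustness described above.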
• In a Ugandan study (Kizza et al., 2006), seven homogeneous regions were identified for Uganda using the L-moment approach; an eighth region (the Karamoja area) was left out of the analysis because it had no reliable data.
• These are shown in Table 4.4. The number of sites used in the analysis per region ranged from 3 to 10.
• The lognormal distribution performed best for
drainage areas of West Nile, Aswa River, Lake Albert,
and Mt. Elgon, while the Generalised Logistic
distribution performed best for Lake Kyoga drainage
area and the Generalised Extreme Value for the
South Western region.
• The generated frequency curves showed variations in
flood generation mechanisms for the different
regions.
• For a given catchment size Mt. Elgon region showed
the highest flood peaking tendency while Lake Kyoga
and Lake Victoria areas showed the lowest peaking
tendencies.
• This may be due to the mountainous location of the
Elgon area while L. Kyoga and Victoria drainage areas
have a number of swamps which attenuate the
floods.
• Regression models were generated for the country as
a whole, rather than individual regions due to lack
of sufficient and reliable data for each region.
• A simple linear regression model relating mean
annual flood (MAF) to catchment area showed better
prediction efficiency than the multiple linear
regression model relating MAF to catchment area
and mean annual rainfall and was therefore
recommended for estimation of flood flows in
Uganda as shown in Table 4.5.
• Table 4.4: Seven Hydrologically Homogeneous Regions in Uganda
Region | Description | Area (sq. km) | Regional average L-moment ratios: CV (τ) | Skew (τ3) | Kurt (τ4)
U1 | West Nile drainage basin. Rivers mainly flow into the Albert Nile | 21,780 | 0.348 | 0.381 | 0.371
U2 | Aswa River drainage area | 36,816 | 0.358 | 0.253 | 0.162
U3 | Lake Albert drainage area | 39,577 | 0.325 | 0.219 | 0.188
U4 | Lake Kyoga drainage area. Mainly swampy | 38,043 | 0.509 | 0.378 | 0.231
U5 | Mt. Elgon drainage area | 4,100 | 0.322 | 0.136 | 0.105
U6 | South western drainage area | 33,134 | 0.217 | 0.13 | 0.169
U7 | Lake Victoria drainage area | 54,610 | 0.305 | 0.203 | 0.129
U8 | Karamoja area. There were no reliable data for this region to facilitate analysis | 22,089 | N/A | N/A | N/A
Fig 4.8 Delineated Hydrologically Homogeneous Regions in Uganda
• Table 4.5: Selected Regression Models
Model | Equation | R² | SEE | F | Significance of F at 95% level
Simple linear regression model | $Q = 3.921A^{0.310}$ | 0.224 | 0.506 | 7.64 | 0.0113
Multiple linear regression model | $Q = 1.239A^{0.410}R^{3.224}$ | 0.377 | 0.485 | 6.05 | 0.0088

4.12 Low Flow Studies
• Low flow studies provide one way of deriving important parameters that need to be considered in hydrological and water resources planning.
• The study of low flows of rivers is aimed at obtaining methods (Drayton, 1980) that can be used to estimate the variability of a river's flow when it is exploited as a resource.
• Other applications of low flow studies include;
i. Control of discharge to ensure availability of water
throughout the year;
ii. The design of sewage treatment works,
hydropower and reservoir design and operation;
iii. The design of water supply systems, the protection
of marine life, determining the return periods of
severe droughts, river flow forecasts,
hydrogeological studies and
iv. The licensing of water abstractions.
• There are several definitions for a low flow.
• One such definition states that a low flow is the
average flow over ten days that is exceeded 75% of
the time, while a low flow study refers to the various
ways in which a river regime may be summarized
usually in diagrammatic form.
• The Average Daily Flow (ADF) is defined as the long-term average rate of runoff from a catchment and is expressed in cubic meters per second (cumecs).
• ADFs are used to standardize the flow indices so as to facilitate comparisons between catchments.
• This standardization may also reduce the bias in estimated percentiles caused by above- or below-average flows during the period of record.
• A Flow Duration Curve shows the relationship
between any given discharge and the percentage of
time during which this discharge is exceeded as
shown in Fig 4.9.
• The index related to the Flow Duration Curve is the average flow over ten days that is exceeded 75% of the time, Q75(10), which is also expressed as a percentage of ADF.
• Q75(10) is taken to be more suitable as an index than Q75(1) because it is less sensitive to data error than the 1-day duration series. This is discussed further in Section 9.4.4.
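Q75(10) can be read from a flow duration curve built from 10-day average flows. The sketch below makes two assumptions that the text does not fix. First, the 10-day averages are taken over consecutive, non-overlapping blocks. Second, the duration curve is interpolated linearly between ranked values. The daily flows are hypothetical.

```python
def ten_day_means(daily):
    """Means of consecutive, non-overlapping 10-day blocks; an incomplete
    final block is dropped (assumption)."""
    return [sum(daily[i:i + 10]) / 10 for i in range(0, len(daily) - 9, 10)]


def flow_exceeded(values, percent):
    """Flow exceeded `percent` % of the time, interpolating linearly
    between ranked values (largest first)."""
    v = sorted(values, reverse=True)
    pos = percent / 100 * (len(v) - 1)     # fractional 0-based rank
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


daily = [float(q) for q in range(1, 101)]  # hypothetical daily flows, m3/s
adf = sum(daily) / len(daily)              # average daily flow
q75_10 = flow_exceeded(ten_day_means(daily), 75)
print(q75_10, 100 * q75_10 / adf)          # Q75(10) in m3/s and as % of ADF
```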
• A Low Flow Frequency Curve shows the proportion
of years, or equivalently the average interval
between years (return period), in which the river falls
below a given discharge, as shown in Fig 4.10.
• The index related to the low flow frequency curve is the Mean Annual 10-day Minimum, MAM(10), expressed as a percentage of ADF.
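MAM(10) is the average, over the years of record, of each year's lowest 10-day running-mean flow. This minimal sketch assumes that the daily flows have already been split into years (calendar or water years) and that the running mean does not cross year boundaries. The function names and data are hypothetical.

```python
def annual_min_10day(daily):
    """Minimum of the 10-day running mean within one year's daily flows."""
    if len(daily) < 10:
        raise ValueError("need at least 10 daily values")
    window = sum(daily[:10])
    lowest = window
    for i in range(10, len(daily)):
        window += daily[i] - daily[i - 10]   # slide the 10-day window by one day
        lowest = min(lowest, window)
    return lowest / 10


def mam10(years):
    """Mean annual 10-day minimum, MAM(10), over a list of yearly daily-flow lists."""
    minima = [annual_min_10day(y) for y in years]
    return sum(minima) / len(minima)


year1 = [5.0] * 365                              # hypothetical constant year
year2 = [10.0] * 100 + [2.0] * 10 + [10.0] * 255 # hypothetical 10-day dry spell
print(mam10([year1, year2]))                     # (5.0 + 2.0) / 2
```

Dividing the result by ADF and multiplying by 100 gives MAM(10) as a percentage of ADF.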
• A Storage Yield Curve is used to describe the
frequency of requirement for a given volume of
storage to supply a given yield.
• The Base Flow Index (BFI) gives a measure of the proportion of the river's runoff that derives from stored sources, and gives an indication of the catchment geology and soils.
• The Recession Constant KREC allows the user,
knowing the present flow, to predict the flow at any
time in the future for any given catchment.
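The text does not give the functional form behind KREC. A common assumption is an exponential recession, Q_t = Q_0·K^t with t in days, and this sketch uses that assumed form. It estimates K from two flows on the same recession limb and projects a future flow, assuming no further rainfall. The flows and function names are hypothetical.

```python
def recession_constant(q0, qt, days):
    """K from two flows on the same recession limb, assuming Q_t = Q_0 * K**t."""
    return (qt / q0) ** (1.0 / days)


def predict_flow(q_now, k, days_ahead):
    """Flow after `days_ahead` days of uninterrupted recession (no new rain)."""
    return q_now * k ** days_ahead


k = recession_constant(10.0, 8.1, 2)   # hypothetical flows two days apart, m3/s
print(k, predict_flow(8.1, k, 5))      # K is about 0.9
```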
• Fig 4.9 A 10-Day Flow Duration Curve for R. Namatala: discharge (as a percentage of ADF) against percentage of time a flow is exceeded.
• Fig 4.10 A Low Flow Frequency Curve for R. Namalu: flow (% of ADF) against plotting position, Wi.
• Low flow analysis methods were applied to eight catchments in eastern Uganda and six catchments in northern Uganda (Rugumayo, Ojeo, 2006), which are relatively climatically homogeneous and have sufficient streamflow data, in order to estimate low flow indices.
• The low flow indices were then correlated with the catchment characteristics, using statistical software, to develop relationships for estimating low flow indices at ungauged sites.
• It was observed that the multiple regression models developed were linear, showed a very high degree of correlation and can be used for preliminary design at ungauged catchments. These models are illustrated in Table 4.6.
• The additional areas of research in low flow hydrology include:
i. The understanding of specific low flow generating mechanisms and the relevance of gain and loss processes to the wide variety of climatic, topographic and geologic conditions;
ii. The impact of direct or indirect anthropogenic effects on low flows, such as deforestation, groundwater pumping and conservation farming;
iii. The development of methods which quantify the individual and combined effects of various anthropogenic effects on low flow characteristics;
iv. With increasing pressure on water resources, emphasis should be placed on finer temporal resolution of hydrological data and the use of small catchments;
v. The time series of flows or the application of general measures of catchment flow response;
vi. The use of larger regional databases of flow characteristics; and
vii. The impact of climate change on low flows (Smakhtin, 2004).
• Table 4.6: Models Generated for Ungauged Catchments in Eastern Uganda for Low Flow Indices
Dependent variable | Constant | Independent variables: MAR (×10⁻⁵) | AREA (×10⁻⁶) | S1085 (×10⁻³) | STRFQ (×10⁻²) | MSL (×10⁻³) | PE (×10⁻⁴) | KREC | R
Q75(10) | 232.631 | 2038 | 146.9 | 314000 | -2250.7 | 3.195 | -1210000 | 0 | 0.961
Q95(10) | 103.586 | 1130 | 193.6 | -191000 | -1042.2 | -82.32 | -526.8 | 0 | 0.989
KREC | 1.660 | -4.465 | 5.206 | -1.202 | -9.819 | .-722 | -4.719 | 0 | 0.978
MAM(10) | 73.807 | -1139 | 170.6 | .-7332 | 435.6 | -22.16 | -225.5 | 0 | 0.974
ADF | 2.485 | 387 | -456.8 | -50.99 | -743.1 | 152000 | -24.48 | 0 | 0.992
BFI | -179 | 4.917 | -2.545 | 3.667 | 10700 | 2.821 | 1.844 | 354 | 1
4.13 Time Series
• The measurements or numerical values of any
variable that changes with time constitute a time
series.
• A time series has also been defined to be any
collection of observations made sequentially in time
(Chatfield, 1996).
• From these two definitions, it is clear that the two
most important variables that are to be
monitored are the values and the times at which
they occur.
4.13.1 Hydrologic Time Series
• In general, hydrologic processes such as precipitation and runoff evolve on a continuous time scale. For example, a recording gauging station in a stream provides a continuous record of stage and discharge y(t) through time.
• A plot of the streamflow hydrograph y(t) against time t constitutes a streamflow time series in continuous time, or simply a continuous time series. In practice, however, most hydrological time series of interest are defined on a discrete time scale.
• Because of convenience or preference, the time series
are usually plotted in the form of a continuous line.
• This can be derived from the discrete time plot by
successively joining the tops of the sticks or bars to
form the desired continuous line. Most hydrologic
series are defined on hourly, daily, weekly, monthly,
bimonthly, quarterly and annual time intervals.
• The term seasonal time series is often adopted and it
refers to time series defined in intervals of time less
than a year (weekly, monthly etc.). Several categories
of time series exist depending on a number of factors
and they are defined below.
a) Single Time Series
• A single time series or a univariate series is a time
series of one hydrologic variable at a given site. The
natural flow of a river constitutes a single time series.
b) Multiple Time Series
• A set of two or more time series constitutes a multiple time series or a multivariate time series. A multiple time series may be a set of time series of different variables.
• A multiple time series may also arise at the same
gauging station if different hydrological variables are
being measured at it such as discharge, water depth,
temperature, sediment transport, etc.
c) Correlated and Uncorrelated Time Series
• For a given single time series, if the values x at time t depend linearly on the values x at time t − k, for k = 1, 2, ..., then the time series is said to be autocorrelated, serially correlated or correlated in time.
• Otherwise, the time series is said to be uncorrelated
i.e. it is independent. Autocorrelation in some
hydrological time series such as streamflow time
series usually arises as a result of storage like surface,
soil and groundwater storage, which causes the
water to remain in the system through subsequent
time periods.
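Autocorrelation is usually quantified by the lag-k serial correlation coefficient r_k. The sketch below uses deviations from the overall mean in the numerator and the lag-0 sum of squares as the denominator. This is one common estimator and is an assumption here, since the text does not define r_k.

```python
def autocorrelation(x, k):
    """Lag-k serial correlation coefficient r_k of a single time series."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < len(x)")
    mean = sum(x) / n
    d = [v - mean for v in x]
    num = sum(d[t] * d[t + k] for t in range(n - k))   # lagged cross-products
    den = sum(v * v for v in d)                         # lag-0 sum of squares
    return num / den


flows = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical series
print(autocorrelation(flows, 1))    # 0.4
```

Values of r_1 well above zero are what storage effects typically produce in streamflow series.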
d) Cross-correlated Time Series
• If there are two time series with one series as the x
variable and the other as the y variable changing
with time t, then if the y's at time t are linearly
dependent on the x's at time t-k, for k =0,1,2,…, then
the series are said to be cross-correlated. It is
important to note that both time series can be
uncorrelated in time, yet are cross-correlated with
one another.
• Also each time series can be auto correlated in time
without there being cross-correlation between the
two. Just as there are physical reasons why time
series are autocorrelated, there are reasons why
time series are cross-correlated.
• One would expect the streamflow series at two nearby gauging stations to be cross-correlated.
• This is because they are relatively close to each other and are therefore exposed to similar climatic and hydrological activities.
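Following the definition above, where y at time t depends on x at time t − k, the lag-k cross-correlation pairs x_t with y_{t+k}. The sketch below normalises by the full sums of squared deviations of both series. This is one common estimator, assumed here for illustration.

```python
import math


def cross_correlation(x, y, k):
    """Lag-k cross-correlation between x(t) and y(t + k)."""
    n = len(x)
    if len(y) != n or not 0 <= k < n:
        raise ValueError("series must be equal length and 0 <= k < n")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(dx[t] * dy[t + k] for t in range(n - k))
    den = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return num / den


upstream = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical station records
downstream = [2.0, 4.0, 6.0, 8.0, 10.0]
print(cross_correlation(upstream, downstream, 0))   # 1.0
```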
e) Stationary and Nonstationary Time Series
• A time series is said to be stationary if it is free from trends, shifts and periodicities (cyclicities). The implication here is that the statistical parameters of the time series, such as the mean and variance, remain constant through time. Otherwise the time series is said to be nonstationary.
• As a general principle, hydrologic time series defined on an annual scale are taken to be stationary, although this may not be true as a result of large-scale climatic variability, natural disruptions (e.g. landslides) or human-induced changes such as the construction of a reservoir upstream of the gauging station.
• Hydrologic time series defined on a smaller time
scale like monthly or weekly flows are typically
nonstationary mainly because of the annual cycle.
4.13.2 Partitioning the Time Series Structure
• Hydrological time series exhibit, to varying degrees, trends, shifts, seasonality, autocorrelation and non-normality.
• These attributes of time series are referred to as the
components of the time series.
• The series can be decomposed or partitioned into its
components. Figure 4.11 shows a composite time
series together with other plots where an attempt
has been made to isolate the components.
• Fig 4.11 Components in the decomposition of time series: the raw data (historical record) decomposed into components of variation, namely the trend (T), periodicity (P) and stochastic component (Z), each plotted against time.
i. Trends and Shifts
• Natural and human induced factors may produce
gradual or instantaneous trends and shifts in
the time series.
• Trend is loosely defined as the long-term change in the mean level.
• This raises the difficulty of deciding what is meant by long term.
• The number of observations available must be
considered and a subjective assessment made of
what is meant by long term.
• A large forest fire in a river basin can cause sudden changes or shifts in the runoff time series, while the gradual killing of a forest, for example by insects, may result in gradual changes or trends in the runoff hydrograph.
• An important source of trends and shifts in
streamflow time series is from changes in land use
and the development of reservoirs and diversion
structures.
• Any trend that is easily discernible may be quantified
and removed from the time series.
• The trend, $T_t$, may take the form:
$$T_t = a + bt \quad \text{(linear trend)}$$
• Or
$$T_t = a + bt + ct^2 + dt^3 + \dots \quad \text{(non-linear trend)}$$
• The coefficients a, b, c, d, … are usually evaluated by least squares fitting. In many sample hydrologic series, there is no obvious shift in the sample mean over the period of record and the series is assumed to be trend-free (Shaw, 1994).
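The least squares fit of the linear trend $T_t = a + bt$ can be sketched in a few lines of Python. This is a minimal illustration assuming the time index t = 1, 2, …, n; the function names `fit_linear_trend` and `detrend` are hypothetical, and higher-order trends would need a polynomial least squares fit instead.

```python
def fit_linear_trend(y):
    """Least squares estimates of a and b in T_t = a + b*t, with t = 1..n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    stt = sum((ti - t_mean) ** 2 for ti in t)
    sty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sty / stt
    a = y_mean - b * t_mean
    return a, b


def detrend(y):
    """Subtract the fitted linear trend T_t from each observation y_t."""
    a, b = fit_linear_trend(y)
    return [yi - (a + b * ti) for ti, yi in enumerate(y, start=1)]


flows = [12.0, 13.1, 13.9, 15.2, 15.8, 17.1]  # hypothetical annual flows
a, b = fit_linear_trend(flows)
print(f"a = {a:.3f}, b = {b:.3f} per year")
print([round(r, 3) for r in detrend(flows)])
```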
ii. Removing Trends
• The most common trends are those in the mean and
in the variance.
• The partitioning of a time series with a simple trend is shown schematically in Fig 4.12a.
• A linear trend in the mean is shown in Fig. 4.12b. The trend can be removed by taking the difference $y_t - \bar{y}_t$, where $\bar{y}_t$ is the mean at time t, and dividing by the constant standard deviation s, i.e. $(y_t - \bar{y}_t)/s$, as shown in Fig. 4.12c.
• A trend in the variance can be removed by $(y_t - \bar{y}_t)/s_t$, where $s_t$ is the standard deviation at time t, as shown in Fig 4.12d. The process of subtracting the mean and dividing by the standard deviation is known as standardization.
• Fig 4.12 Removing trends (panels a to d) and removing shifts (panels a′ to d′) in a time series $y_t$ (schematic).
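Standardization as described above can be sketched as follows. The text does not say how the time-varying mean $\bar{y}_t$ and standard deviation $s_t$ are estimated, so this sketch assumes a centred moving window of hypothetical half-width `half_width` (truncated at the ends of the record); the function names are also hypothetical. Passing a constant standard deviation instead reduces it to the case of a trend in the mean only.

```python
import statistics


def moving_mean_sd(y, half_width):
    """Local mean and standard deviation over a centred window (assumed estimator)."""
    means, sds = [], []
    for t in range(len(y)):
        window = y[max(0, t - half_width): t + half_width + 1]
        means.append(statistics.mean(window))
        sds.append(statistics.stdev(window))  # needs at least 2 values in the window
    return means, sds


def standardize(y, mean_t, sd_t):
    """Return (y_t - mean_t) / sd_t for every time step."""
    return [(yi - m) / s for yi, m, s in zip(y, mean_t, sd_t)]


y = [10.0, 14.0, 11.0, 17.0, 13.0, 20.0, 15.0, 23.0]  # hypothetical series
m, s = moving_mean_sd(y, half_width=2)
print([round(z, 2) for z in standardize(y, m, s)])
```

The window width is a judgement call: a narrow window follows the data too closely, while a wide one may not track the trend.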
iv) Seasonality (Periodicity)
• Hydrologic time series defined on a time scale smaller than a year generally exhibit distinct seasonal or periodic patterns.
• This is a result of the earth's annual revolution around the sun, which creates the annual cycle.
• The existence of periodic components can be investigated quantitatively by constructing a correlogram of the data.
• For a series of data $y_t$, the serial correlation coefficients $r_k$ between $y_t$ and $y_{t+k}$ are calculated for all pairs of data k time units apart in the series and plotted against k (which is known as the lag).
• The identification of specific periodicities can be assisted by spectral analysis, a more involved procedure explained fully in standard texts (e.g. Chatfield, 1980).
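Spectral analysis itself is beyond the scope of these notes, but its basic idea can be illustrated with a raw periodogram: the squared magnitude of the discrete Fourier transform at each harmonic k/n, whose largest value points to the dominant period. This is a minimal, unsmoothed sketch in plain Python (O(n²), adequate for short records); the function names and the synthetic monthly series are hypothetical.

```python
import math


def periodogram(y):
    """Raw periodogram: (period, power) for harmonics k = 1..n//2 of a series y."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    result = []
    for k in range(1, n // 2 + 1):
        c = sum(d[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        s = sum(d[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        result.append((n / k, (c * c + s * s) / n))
    return result


def dominant_period(y):
    """Period (in time steps) with the largest periodogram ordinate."""
    return max(periodogram(y), key=lambda pair: pair[1])[0]


# Hypothetical 10 years of monthly values with an annual cycle.
monthly = [50 + 20 * math.sin(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(monthly))  # 12.0
```

With real, noisy flow records the raw periodogram is erratic, which is why the standard texts smooth it before interpreting peaks.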
v) Removing Seasonality in the Mean and Variance
• The procedure used to remove trends in the mean and variance is applied here too, but with a separate mean and standard deviation for each season.
• The operation here is referred to as seasonal standardisation and in most literature is called deseasonalising the original series.
• This term is misleading, since it may imply that the new series is free of all seasonality, which is not necessarily the case; seasonality in the correlation structure, for example, may remain.
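Seasonal standardisation can be sketched as follows, assuming the record is held as a list of years, each a list of w seasonal values, so that `y[v][tau]` is the value for year v and season τ. The function name is hypothetical; each season's values are standardised with that season's own mean and sample standard deviation.

```python
import statistics


def seasonal_standardize(y):
    """Seasonal standardisation z[v][tau] = (y[v][tau] - mean_tau) / s_tau."""
    w = len(y[0])
    columns = [[row[tau] for row in y] for tau in range(w)]
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    z = [[(row[tau] - means[tau]) / sds[tau] for tau in range(w)] for row in y]
    return z, means, sds


# Hypothetical flows: 4 years (rows) x 3 seasons (columns).
flows = [
    [12.0, 40.0, 7.0],
    [15.0, 55.0, 5.0],
    [10.0, 38.0, 9.0],
    [14.0, 47.0, 6.0],
]
z, means, sds = seasonal_standardize(flows)
for row in z:
    print([round(v, 2) for v in row])
```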
vi) Removing Seasonality in the Correlation
• The correlogram plots shown in Fig. 4.13 are a means of analyzing the dependence structure of the $Z_t$ series.
• Regardless of whether the autocorrelation is constant or periodic, removing the correlation structure requires the use of a mathematical model.
• A simple model that can be used is the lag one autoregressive process, written as
$$z_t = r_1 z_{t-1} + \varepsilon_t$$
• The residual series $\varepsilon_t = z_t - r_1 z_{t-1}$ may be checked for independence by plotting its correlogram. If there are still signs of autoregression, a higher order model is used.
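The lag one residual check can be sketched as below. The sketch assumes a constant (non-periodic) $r_1$ and estimates it with a common simplified estimator, which differs slightly from the correlogram formula in its divisors; the function names and the sample $Z_t$ values are hypothetical.

```python
def lag_one_r(z):
    """Lag-one serial correlation r_1 (simplified estimator with a single divisor)."""
    n = len(z)
    m = sum(z) / n
    num = sum((z[t] - m) * (z[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in z)
    return num / den


def ar1_residuals(z, r1):
    """Residual series epsilon_t = z_t - r1 * z_(t-1), for t = 2..n."""
    return [z[t] - r1 * z[t - 1] for t in range(1, len(z))]


z = [0.8, 1.1, 0.4, -0.3, -0.9, -0.2, 0.5, 1.0, 0.2, -0.6]  # hypothetical standardised Z_t
r1 = lag_one_r(z)
eps = ar1_residuals(z, r1)
# A residual r_1 close to zero suggests the lag one model has removed the dependence.
print(round(r1, 3), round(lag_one_r(eps), 3))
```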

4.13.3 Correlograms
• The existence of periodic components may be investigated quantitatively by constructing correlograms of the data.
• For a series of data $y_t$, the serial correlation coefficients $r_L$ between $y_t$ and $y_{t+L}$ are calculated for all pairs of data L time units apart in the series and plotted against values of L (known as the lag):
$$r_L = \frac{\dfrac{1}{n-L}\displaystyle\sum_{t=1}^{n-L}(y_t-\bar{y})(y_{t+L}-\bar{y})}{\dfrac{1}{n-1}\displaystyle\sum_{t=1}^{n}(y_t-\bar{y})^2}$$
• where $\bar{y}$ is the mean of the sample of n values of y. L is usually taken for values up to n/4. The plot of $r_L$ against lag L forms the correlogram.
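The correlogram formula above translates directly into Python. This is a minimal sketch with a hypothetical function name; it sets $r_0 = 1$ by convention and, by default, computes lags up to n/4 as suggested in the text.

```python
import math


def correlogram(y, max_lag=None):
    """Serial correlation coefficients r_L for L = 0..max_lag (default n // 4)."""
    n = len(y)
    if max_lag is None:
        max_lag = n // 4
    y_bar = sum(y) / n
    d = [v - y_bar for v in y]
    var = sum(v * v for v in d) / (n - 1)  # denominator with divisor n - 1
    r = [1.0]  # r_0 taken as 1 by convention
    for lag in range(1, max_lag + 1):
        # numerator with divisor n - L, over the n - L available pairs
        cov = sum(d[t] * d[t + lag] for t in range(n - lag)) / (n - lag)
        r.append(cov / var)
    return r


# Hypothetical 4 years of monthly data with an annual cycle: r_L oscillates with period 12.
series = [math.sin(2 * math.pi * t / 12) for t in range(48)]
for lag, r_l in enumerate(correlogram(series)):
    print(lag, round(r_l, 2))
```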
• Examples of correlograms are shown in Fig 4.13. In a completely random series with a very large number of data, the correlogram fluctuates slightly about 0 for all lags greater than 0 (Fig. 4.13a).
• This is the correlogram for random, independent noise. If there is some positive short-term correlation, the correlation coefficients decrease gradually from 1 and finally vary very closely about 0 (Fig. 4.13b).
• This type of correlogram is typical of an autoregressive process; such a correlogram is obtained from a series that can be fitted with a Markov model. Fig. 4.13c represents the correlogram of a data series described by a pure sine wave; the correlogram then itself oscillates with the same period.
• Fig 4.13 Correlograms ($r_L$ against lag L): (a) random, independent noise; (b) autoregressive, Markov process; (c) pure sine wave.
• It is important to note that the correlation coefficient calculations described above are only applicable when looking for correlation within the same time series.
• If the correlation sought is between the flows in one season and those in another season, then product moment correlation coefficients have to be calculated.
• If one series is represented as $x_t$ and the other as $y_t$, then the correlation coefficient between them is given by:
$$r = \frac{1}{N}\sum_{t=1}^{N}\frac{(x_t-\bar{x})(y_t-\bar{y})}{s_x s_y}$$
• where N is the number of paired values, $s_x$ is the standard deviation of the $x_t$ series and $s_y$ is the standard deviation of the $y_t$ series (Mutreja, 1995).
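A minimal sketch of the product moment correlation between the flows of two seasons, for example January and February flows in the same years. The function name and the flow values are hypothetical. The standard deviations use the divisor N (`statistics.pstdev`) so that they match the 1/N in the formula; with this choice r is the ordinary Pearson coefficient, which Python 3.10+ also provides as `statistics.correlation`.

```python
import statistics


def product_moment_r(x, y):
    """r = (1/N) * sum((x_t - x_bar)(y_t - y_bar)) / (s_x * s_y), with divisor-N standard deviations."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.pstdev(x), statistics.pstdev(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    return cov / (s_x * s_y)


# Hypothetical January and February mean flows for the same six years.
jan = [12.1, 9.8, 15.3, 11.0, 13.7, 10.4]
feb = [10.5, 8.9, 13.8, 9.7, 12.9, 9.1]
print(round(product_moment_r(jan, feb), 3))
```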
4.13.4 Time Series Analysis Principles
• The identification and mathematical description of the several components forming the structure of a time series constitute time series analysis.
i) Stationarity
• If the statistics of the sample (mean, variance, skewness) are not functions of the timing or length of the sample, then the series is said to be stationary.
• Trends, shifts and periodicity are indicators of non-stationarity.
• Modelling a time series is much easier if it is stationary, so an attempt is made to remove the non-stationary components before the series is modelled.
ii) Seasonal Sample Statistics
• Seasonal hydrological time series such as monthly
flows may be better described by considering
statistics on a seasonal basis.
• Let the seasonal time series be $y_{v,\tau}$, in which v denotes the year and τ the season; v = 1, …, N and τ = 1, …, w, where N is the number of years and w is the number of seasons in the year (e.g. 12 when dealing with months).
• Generally, for seasonal streamflow series, the seasonal mean flow is greater than the seasonal standard deviation.
• However, the mean may be smaller than the standard deviation in low-flow seasons and, for intermittent streams, it is generally smaller than the standard deviation throughout the year.
• Values of the skewness coefficient for the dry season are generally larger than those for the wet season, indicating that data in the dry season depart more from normality than those in the wet season.
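Seasonal sample statistics can be computed season by season, as in this sketch. It assumes the same `y[v][tau]` layout (years by seasons) and uses one common small-sample skewness estimator, $g = n\sum d^3 / ((n-1)(n-2)s^3)$; other texts use slightly different skewness formulas. The function names and flows are hypothetical.

```python
import statistics


def skewness(values):
    """Sample skewness g = n * sum(d^3) / ((n - 1)(n - 2) s^3); needs n >= 3."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def seasonal_statistics(y):
    """(mean, standard deviation, skewness) for each season tau of y[v][tau]."""
    stats = []
    for tau in range(len(y[0])):
        col = [row[tau] for row in y]
        stats.append((statistics.mean(col), statistics.stdev(col), skewness(col)))
    return stats


# Hypothetical flows: 5 years x (wet season, dry season).
flows = [[120.0, 8.0], [95.0, 3.0], [140.0, 15.0], [110.0, 4.0], [130.0, 30.0]]
for tau, (m, s, g) in enumerate(seasonal_statistics(flows)):
    print(tau, round(m, 1), round(s, 1), round(g, 2))
```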
iii) Normality
• Several of the models used in time series analysis assume that the variable under consideration is normally distributed. It is therefore usual practice to test the data for normality before further analysis.
• A common method to check normality is to plot the empirical frequency distribution of the data on normal probability paper. A straight-line plot indicates that the data are normally distributed.
• If this is not the case, a transformation such as $y_t = \log(x_t - c)$, where $x_t$ is the original series and c is a constant, is used. Unlike annual flows, monthly flows frequently do not follow a normal distribution.
• Monthly flows tend to be log-normal or Pearson type III distributed. A recursive equation for log-normal simulation can be used to describe them (Matalas, 1967).
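Instead of reading a plot on normal probability paper, one numerical stand-in is the correlation between the sorted data and standard normal quantiles at plotting positions: the closer it is to 1, the straighter the plot. This sketch assumes Blom plotting positions $(i - 0.375)/(n + 0.25)$, which the text does not specify; the function names and flows are hypothetical.

```python
import math
import statistics


def normal_ppcc(data):
    """Correlation between sorted data and standard normal quantiles (Blom positions)."""
    x = sorted(data)
    n = len(x)
    nd = statistics.NormalDist()
    q = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = statistics.mean(x), statistics.mean(q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxq / math.sqrt(sxx * sqq)


def log_transform(x, c=0.0):
    """y_t = log(x_t - c); every x_t must exceed c."""
    return [math.log(v - c) for v in x]


# Hypothetical, positively skewed monthly flows.
flows = [3.1, 4.5, 2.2, 8.9, 15.3, 30.7, 5.6, 3.9, 12.4, 6.8, 2.9, 21.5]
print(round(normal_ppcc(flows), 3), round(normal_ppcc(log_transform(flows)), 3))
```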
iv) Persistence
• Persistence refers to the phenomenon whereby a low flow on one day tends to be followed by a low flow on the next day, and perhaps the day after.
• The number of steps over which a measurable persistence is deemed to exist is termed the lag. The Markov model is particularly suitable for reproducing the persistence characteristic.
4.13.5 Time Series Modelling
• The concepts discussed in previous sections are used
in the representation of hydrologic time series by
mathematical models. These models are a class of
stochastic models and they are grouped here in two
broad classes:
(i) Autoregressive models (AR models)
(ii) Autoregressive with Moving Average models
(ARMA models)
• These models can be used to reproduce some of the
most important statistical properties of the time
series under consideration.

i) Autoregressive Models
• A time series model defined as
$$y_t = \mu + \sum_{j=1}^{p} \phi_j (y_{t-j} - \mu) + \varepsilon_t$$
• is called an autoregressive model of order p, in which $\varepsilon_t$ is an uncorrelated normal random variable, also referred to as noise, innovation, error term or series of shocks. It has mean zero and variance $\sigma_\varepsilon^2$.
• The parameters of the model are μ, $\phi_1$, …, $\phi_p$. The model is often denoted AR(p). The AR(1) model takes the form:
$$y_t = \mu + \phi_1 (y_{t-1} - \mu) + \varepsilon_t$$
• These models have been widely used for modelling
annual hydrologic time series and seasonal time
series after seasonal standardisation.
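• The mean, variance and autocorrelation coefficients of the AR(p) process are
$$E(y_t) = \mu$$
$$\operatorname{Var}(y_t) = \sigma^2 = \frac{\sigma_\varepsilon^2}{1 - \sum_{j=1}^{p} \phi_j \rho_j}$$
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \dots + \phi_p \rho_{k-p}, \quad k > 0$$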


ii) Markov models
• This class of models depends on the principle of autoregression, which defines the AR models described above. The lag one Markov model is expressed as:
$$Q_{i+1} = \bar{Q}_{j+1} + r_j (Q_i - \bar{Q}_j) + e_t$$
• Where:
• $Q_{i+1}$ = generated streamflow in period i+1 (day, month, year)
• $\bar{Q}_{j+1}$ = mean of the observed flows in period j+1
• $Q_i$ = generated flow in period i
• $\bar{Q}_j$ = mean of the observed flows in period j
• $r_j$ = correlation coefficient for the relation of flows for period j+1 to those for period j
• $e_t$ = simulation error due to unexplained variance
• Since $e_t$ represents the error, the equation can be rewritten as:
$$Q_{i+1} = \bar{Q}_{j+1} + r_j (Q_i - \bar{Q}_j) + t_i \sigma_{j+1} (1 - r_j^2)^{1/2}$$
• Where:
• $t_i$ = a random number selected from a normal distribution having zero mean and unit variance
• $\sigma_{j+1}$ = the standard deviation of observed flows for the period j+1.


• The means, variances and correlations must therefore be calculated and an initial value $Q_i$, with i = 1, selected. The mean of the observed flows is usually selected as the initial flow in the random synthetic runoff sequence.
• To eliminate starting-condition bias, the first 50 flows in the sequence are discarded (Viessman and Lewis, 1996), although some texts suggest discarding the first 10 or so flows (McMahon and Mein, 1978).
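A minimal sketch of the single period (lag one) Markov generator, assuming a single mean, standard deviation and lag one correlation for the whole record, as in the annual-flow case. It starts from the observed mean and discards the first 50 values as described above; the function name and the statistics are hypothetical.

```python
import math
import random
import statistics


def markov_single_period(q_mean, q_sd, r, n, seed=None, discard=50):
    """Lag-one Markov generator: Q_(i+1) = Q_bar + r (Q_i - Q_bar) + t_i s (1 - r^2)^0.5."""
    rng = random.Random(seed)
    q = q_mean  # initial flow: the mean of the observed flows
    flows = []
    for _ in range(n + discard):
        t = rng.gauss(0.0, 1.0)  # standard normal deviate t_i
        q = q_mean + r * (q - q_mean) + t * q_sd * math.sqrt(1.0 - r * r)
        flows.append(q)
    return flows[discard:]  # drop the first values to remove starting-condition bias


# Hypothetical annual flow statistics (same units as the observed record).
synthetic = markov_single_period(q_mean=100.0, q_sd=20.0, r=0.6, n=5000, seed=1)
print(round(statistics.mean(synthetic), 1), round(statistics.stdev(synthetic), 1))
```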
• Equation 4.75 is known as the single period Markov generator. It is only suitable for annual flows since, for periods shorter than a year, the means of flows in consecutive periods are seldom equal.
• For example, in a gauged record the mean January flow rarely equals the mean February flow. For this reason, the multi-period Markov model is used.
• This model is also referred to as the Thomas-Fiering model, after its originators (Thomas and Fiering, 1962). It is expressed as:
$$Q_{i+1,j+1} = \bar{Q}_{j+1} + b_j (Q_{i,j} - \bar{Q}_j) + t_i \sigma_{j+1} (1 - r_j^2)^{1/2}$$
• Where
$$b_j = r_j \frac{\sigma_{j+1}}{\sigma_j}$$
• i and j are indices, with i running from 1, …, m (m is the number of times new flows are generated) and j the seasonal index, e.g. 1 to 12 for monthly flows. Other terms are as discussed previously.
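The Thomas-Fiering recursion can be sketched as below. Assumptions: seasons are indexed from 0, `r[j]` is the correlation between flows in season j and season j + 1 (the last season wrapping to the first), the sequence starts from the observed mean of the first season, and whole warm-up years are discarded. Negative values are kept in the recursion, as the text requires. The function name and the quarterly statistics are hypothetical; monthly data would simply use 12 seasons.

```python
import math
import random


def thomas_fiering(means, sds, r, n_years, seed=None, warmup_years=0):
    """Generate n_years x w seasonal flows with the Thomas-Fiering model."""
    w = len(means)
    rng = random.Random(seed)
    j = 0
    q = means[0]  # initial flow: the observed mean of the first season
    flows = [q]
    total = (n_years + warmup_years) * w
    while len(flows) < total:
        k = (j + 1) % w  # next season
        b = r[j] * sds[k] / sds[j]  # b_j = r_j * s_(j+1) / s_j
        t = rng.gauss(0.0, 1.0)
        # Negative values are kept here so that they feed the next step.
        q = means[k] + b * (q - means[j]) + t * sds[k] * math.sqrt(1.0 - r[j] ** 2)
        flows.append(q)
        j = k
    flows = flows[warmup_years * w:]
    return [flows[y * w:(y + 1) * w] for y in range(n_years)]


# Hypothetical quarterly statistics (w = 4 seasons).
means = [30.0, 80.0, 45.0, 20.0]
sds = [8.0, 25.0, 12.0, 6.0]
r = [0.5, 0.6, 0.7, 0.4]
years = thomas_fiering(means, sds, r, n_years=3, seed=2, warmup_years=5)
for year in years:
    print([round(q, 1) for q in year])
```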
• Markov generation techniques sometimes produce negative flows. These are undesirable, but each negative flow must be retained for generating the next value before it is discarded (set to zero).
• This is acceptable as long as the proportion of negative flows is not too high, say not more than 5% (McMahon and Mein, 1978).
• In addition, one should check that the mean flow of the generated sequence with the negative values included differs by no more than 1% from the mean obtained when the negative values are set to zero.
• If either condition is not met, there is a strong indication that the model being applied is not satisfactory for that stream.
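The two acceptance checks on negative flows can be sketched as follows. The thresholds (5% of values, 1% change in mean) come from the text; measuring the change relative to the mean with the negative values included is an assumption, as is the hypothetical function name.

```python
def check_negative_flows(flows, max_fraction=0.05, max_mean_change=0.01):
    """Return (ok, fraction negative, relative change in mean, flows with negatives set to zero)."""
    n = len(flows)
    fraction = sum(1 for q in flows if q < 0) / n
    cleaned = [max(q, 0.0) for q in flows]
    mean_with = sum(flows) / n
    mean_zeroed = sum(cleaned) / n
    change = abs(mean_zeroed - mean_with) / abs(mean_with)
    ok = fraction <= max_fraction and change <= max_mean_change
    return ok, fraction, change, cleaned


# Hypothetical generated sequence with one small negative value.
ok, fraction, change, cleaned = check_negative_flows([10.0] * 99 + [-1.0])
print(ok, fraction, round(change, 4))
```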
• In the application of the model, a column is included in the calculation table for k, a random number taken from a random number table such as Table 1 in the appendix.
• The alternative is to generate uniformly distributed random numbers using a standard computer package such as Microsoft Excel.
• The $t_i$ values corresponding to the k values are then the standard normal random deviates (these can also be obtained in Microsoft Excel by utilising the NORMINV function).
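For readers working outside Excel, the same uniform-to-normal step can be sketched in Python with the standard library: `statistics.NormalDist(mean, sd).inv_cdf(u)` plays the role of Excel's `NORMINV(u, mean, sd)`. The function names are hypothetical; a zero uniform draw is skipped because the inverse normal is undefined there.

```python
import random
import statistics


def uniform_to_normal(u, mean=0.0, sd=1.0):
    """Inverse normal CDF, the same mapping as Excel's NORMINV(u, mean, sd)."""
    return statistics.NormalDist(mean, sd).inv_cdf(u)


def normal_deviates(n, seed=None):
    """n standard normal deviates t_i obtained from uniform random numbers k_i."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:  # inv_cdf is undefined at 0
            out.append(uniform_to_normal(u))
    return out


print([round(t, 3) for t in normal_deviates(5, seed=1)])
```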
iii) Auto Regressive Moving Average model
• The Autoregressive Moving Average model (ARMA
model) is more versatile than the AR model.
• It can also be denoted as the ARMA(p,q) model, with p autoregressive parameters and q moving average parameters.
• The ARMA(p,0) model is the same as the AR(p) model, and the ARMA(0,q) model is simply a moving average model, MA(q). ARMA models must fulfil the stationarity and invertibility requirements, which imply certain constraints on the parameters.
• In relation to modelling hydrologic processes, AR models are basically short-memory processes, while ARMA models can reproduce longer-term persistence.
• The low order ARMA(1,1) model takes the form:
$$y_t = \mu + \phi_1 (y_{t-1} - \mu) + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$
• where $\phi_1$ is the autoregressive parameter and $\theta_1$ the moving average parameter.
• ARMA models provide a better means of modelling the Hurst effect, i.e. long-term persistence, than the original Markov model, and require fewer components than the traditional time series models.
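A minimal simulation of the ARMA(1,1) model, assuming hypothetical parameters that satisfy the stationarity and invertibility conditions ($|\phi_1| < 1$, $|\theta_1| < 1$). It compares the sample lag one correlation with the standard theoretical value $\rho_1 = (1 - \phi_1\theta_1)(\phi_1 - \theta_1)/(1 + \theta_1^2 - 2\phi_1\theta_1)$. For this model $\rho_k = \phi_1 \rho_{k-1}$ for k ≥ 2, so with $\phi_1$ near 1 the correlogram decays slowly, which is how ARMA(1,1) mimics longer persistence. Function names are hypothetical.

```python
import random


def simulate_arma11(mu, phi1, theta1, sigma_e, n, seed=None, warmup=200):
    """Simulate y_t = mu + phi1 (y_(t-1) - mu) + e_t - theta1 e_(t-1)."""
    rng = random.Random(seed)
    y_prev, e_prev = mu, 0.0
    out = []
    for t in range(n + warmup):
        e = rng.gauss(0.0, sigma_e)
        y = mu + phi1 * (y_prev - mu) + e - theta1 * e_prev
        if t >= warmup:
            out.append(y)
        y_prev, e_prev = y, e
    return out


def arma11_rho1(phi1, theta1):
    """Theoretical lag-one autocorrelation of the ARMA(1,1) model above."""
    return (1 - phi1 * theta1) * (phi1 - theta1) / (1 + theta1 ** 2 - 2 * phi1 * theta1)


def sample_r1(y):
    """Sample lag-one serial correlation."""
    m = sum(y) / len(y)
    num = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    return num / sum((v - m) ** 2 for v in y)


phi1, theta1 = 0.9, 0.6  # hypothetical parameters
sim = simulate_arma11(mu=0.0, phi1=phi1, theta1=theta1, sigma_e=1.0, n=50000, seed=11)
print(round(arma11_rho1(phi1, theta1), 3), round(sample_r1(sim), 3))
```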
• Fig 4.14 Correlograms for River Namatala, Eastern Uganda
• Their current popularity reflects the present emphasis on climatic persistence in synthetic hydrology.
• The ease with which these models can generate a wide range of possible sequences, each with properties similar to the historical record yet each a unique alternative record, is extremely valuable in many planning problems.
• This is because they provide a broad spectrum of possible scenarios, so that the planner is not caught unawares.
• They can also be adapted for real-time forecasting if the most recent records are entered into the equation and regularly updated.
iv) Model Testing and Selection
• The adequacy of a time series model is often
examined by comparing the historical statistics of the
flow record with those derived from the model. The
statistics considered are the mean, variance,
skewness, covariance or correlation coefficients.
• If transformations are made, for example to normalise the flows, the statistics of the original series may not be readily reproduced; Matalas (1967) showed that significant biases can result. Sometimes the historical and model correlograms are also compared.
• The ultimate aim of anyone using time series analysis to generate data is for the historical and model correlograms to resemble each other as closely as possible, in order to justify the suitability of the model.
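The comparison of historical and generated statistics can be sketched as a small report. This is an illustration under assumptions: the function names are hypothetical, the skewness uses the same common small-sample estimator as elsewhere in these sketches, the lag one correlation uses a simplified estimator, and differences in mean and standard deviation are expressed relative to the historical values while skewness and correlation differences are absolute (skewness can be close to zero). The flow values are made up.

```python
import statistics


def summary(y):
    """Mean, standard deviation, skewness and lag-one serial correlation of a series."""
    n = len(y)
    m = statistics.mean(y)
    s = statistics.stdev(y)
    g = n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)
    r1 = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, n)) / sum((v - m) ** 2 for v in y)
    return {"mean": m, "sd": s, "skew": g, "r1": r1}


def compare(historical, generated):
    """Generated minus historical statistics (relative for mean and sd)."""
    h, g = summary(historical), summary(generated)
    return {
        "mean": (g["mean"] - h["mean"]) / h["mean"],
        "sd": (g["sd"] - h["sd"]) / h["sd"],
        "skew": g["skew"] - h["skew"],  # absolute difference
        "r1": g["r1"] - h["r1"],
    }


historical = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 18.0, 13.0]  # hypothetical
generated = [13.0, 16.5, 10.0, 19.0, 15.5, 12.0, 17.0, 14.5]  # hypothetical
for name, diff in compare(historical, generated).items():
    print(name, round(diff, 3))
```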
• In a study on rivers in Eastern Uganda (Rugumayo et al., 2002), it was noted that the generated means and standard deviations were higher than the historical ones.
• Furthermore, the ARMA model generated values closer to the historical means and standard deviations than the Markov model did.
• The difference between these values could be because monthly flows usually follow a log-normal distribution, yet in this case a normal distribution was assumed.
v) Limitations of Models
• One of the criticisms of the models described in previous sections is that they do not take into account the effects of climate change or variability.
• They are therefore of limited use in answering questions about the design of water management schemes in a world affected by global warming.
• Furthermore, estimates based on shorter records can be seriously influenced by both random and cyclical oscillations, as demonstrated by the Hurst effect (Jones, 1996).
• The Hurst effect arises when a short period of data is considered whose flow values are not representative of the rest of the river flows.
• The problem arises when periods with similar flows are clustered together rather than well distributed over the whole gauging period.
• Nevertheless, these models continue to be used in practice, and stochastic hydrology depends upon them.
