9/26/2020 Jupyter Notebook Viewer
Course-work-and-data-analysis (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master)
/ Hydrology-Course (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course)
/
GEO4310_2015 (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course/GEO4310_2015)
/
EX6 (/github/bikasbhattarai/Course-work-and-data-analysis/tree/master/Hydrology-Course/GEO4310_2015/EX6)
Exercise 6: Time series analysis and stochastic
modelling
Name: Bikas Chandra Bhattarai
Date: 2015/10/12
Question 1: Time series analysis
The annual mean runoff data in a station are given in the file Qyearlymean.txt.
1. Investigate if there is a trend using linear regression method for α =10%.
2. If the trend is significant, remove the trend from the data series.
3. Calculate and plot the autocorrelation ρ k and 95% confidence limits for time shifts, k, up to 5 years. Check up
to which time lag is the ρ k significant.
Test for significance of b
Xi = a + bTi
Ho : b = 0; Ha : b ≠ 0
b−o
Test statistic t =
sb
Ho is rejected if |t| ≥ t 1−α/2,n−2 i.e. b is significantally different from zero.
Where
−−−−−−−−− n 2
2 ∑ (χi −χ
^i )
S
sb = √ and S 2 =
i=1
(1)
n ¯ 2 n−2
∑ (T t −T )
t=1
The equation for autocorrelation coefficient, ρ k is given as
1 n−k ¯ ¯
C ov(X t ,X t+k ) ∑ (X t −X )(X t+k −X )
t=1
(2)
n−1
ρk = =
1 n ¯ 2
V ar(X t ) ∑ (X t −X )
n−1 t=1
Test the significance of ρ k
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 1/18
9/26/2020 Jupyter Notebook Viewer
(−1−z1−α/2 √n−2)
l =
n−1
and
(−1+z1−α/2 √n−2)
u =
n−1
Ho : ρ k = 0 ; Ha : ρ k ≠ 0
If the calculated ρ k falls outside these confidence limits, then the hypothesis is rejected, i.e., ρ k is significant for
significant level α and time lag k.
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from scipy.stats.stats import pearsonr
from pandas import Series
import statsmodels.tsa.stattools
In [3]:
q=pd.read_table('Qyearlymean.txt')
q.columns = ([u'YEAR', u'T', u'discharge'])
discharge = q['discharge']
print q.head(3)
print len(q)
YEAR T discharge
0 1940 1 398.0
1 1941 2 429.1
2 1942 3 690.2
51
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 2/18
9/26/2020 Jupyter Notebook Viewer
In [4]:
q= pd.read_table('Qyearlymean.txt')
# create a fitted model between discharge as dependent variable and time (T) as indip
fd = smf.ols(formula='discharge ~ T', data= q).fit()
#print summary statistics
print(fd.summary())
OLS Regression Results
==========================================================================
Dep. Variable: discharge R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 2
Date: Tue, 13 Oct 2015 Prob (F-statistic): 0
Time: 09:41:14 Log-Likelihood: -3
No. Observations: 51 AIC: 6
Df Residuals: 49 BIC: 6
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept 491.2333 36.817 13.343 0.000 417.247 565
T 2.1136 1.232 1.715 0.093 -0.363 4
==========================================================================
Omnibus: 2.044 Durbin-Watson: 0
Prob(Omnibus): 0.360 Jarque-Bera (JB):
Skew: 0.348 Prob(JB): 0
Kurtosis: 2.390 Cond. No.
==========================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co
So the regression model becomes:
Xt = a + bTt = 491.23 + 2.11 ∗ Tt
Now formulating the test hypothesis for the coefficients to test wheather the coefficients are significantly different or
not and the test hypothesis can be formulated as given below:
H0 : b = 0
Ha : b ≠ 0
If the |t| > t critical , then H0 is rejected
From above table |t| = 1.715
In [5]:
# Obtaining the value of 't' from t-table using the python function at alpha = 10, de
round(scipy.stats.t.ppf(0.90,len(q)-2),3)
Out[5]:
1.299
From the t-table calculation tcritical = t 1− α ;n − 2 = 1.299 (for α = 10%)
2
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 3/18
9/26/2020 Jupyter Notebook Viewer
Conclusion: From the analysis, |t| value is greater than tcritical so H0 is rejected and concluded that, there is a
trend.
2. If the trend is significant, remove the trend from the data series.
To remove the trend, the difference between observed values and estimated values, the error, is calculated. The
error is always without trend. To get the time series of discharge without trend, interception and each error value is
added.
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 4/18
9/26/2020 Jupyter Notebook Viewer
In [6]:
# Defining x-axis
year = q['T']
#creating plot
fig, ax = plt.subplots(figsize=(15,7))
#creating linear fit
fit = np.polyfit(year, discharge, deg=1)
#plotting trend line to the figure
ax.plot(year, fit[0] * year + fit[1], color='red')
ax.plot(year, discharge,color='red', label = 'with trend')
plt.legend()
#setting the font size for x and y-axis
plt.rc('xtick', labelsize=16)
plt.rc('ytick', labelsize=16)
#priting the linear equation of fit
print 'y = %.2fx + %.3f'%(fit[0],fit[1])
#plot for without trend
#creating x-axis for the data with out trend
x1 = year
#date = range(1940, 1991,1)
#removing the trend from data ( compendium 139, 2013)
y1 = 2.114*x1
without_trend = discharge-y1
ax.plot(x1,without_trend,label='without trend')
plt.legend()
#creating linear fit for non trended data series
fit = np.polyfit(x1, without_trend, deg=1)
ax.plot(x1, fit[0] * x1 + fit[1], color='blue')
ax.set_xlim([0,52])
# display the equation for data without trend
print 'y_without_trend = %.2fx + %.3f'%(fit[0],fit[1])
plt.ylabel('Discharge ($m^3/s$)', fontsize =18)
plt.xlabel('Year',fontsize=18)
plt.show()
y = 2.11x + 491.233
y_without_trend = -0.00x + 491.233
3) Calculate and plot the autocorrelation ρ k and 95% confidence limits for time shifts, k, up to 5 years.
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 5/18
9/26/2020 Jupyter Notebook Viewer
Check up to which time lag is the ρ k significant.
Solution:The equation for autocorrelation coefficient, ρ k is given as
1 n−k ¯ ¯
C ov(X t ,X t+k ) ∑ (X t −X )(X t+k −X )
t=1
Which is given by the python function :
n−1
ρk = =
1 n ¯ 2
V ar(X t ) ∑ (X t −X )
n−1 t=1
statsmodels.tsa.stattools.acf(Series, nlags= Timelag) is used in this calculation. Conficence intervals are
calculated by using the formula given below:
(−1−z1−α/2 √n−2)
l =
n−1
and
(−1+z1−α/2 √n−2)
u =
n−1
Ho : ρ k = 0 ; Ha : ρ k ≠ 0
If the calculated ρ k falls outside these confidence limits, then the hypothesis is rejected, i.e., ρ k is significant for
significant level α and time lag k.
In [7]:
# generating series of total number of data which is equivalent to n in the confidenc
v = pd.Series(range(51 ,45, -1))
# Defining the time lag as index in the dataframe
index = range(0 ,6)
# creating dataframe
auto = pd.DataFrame( index = index)
# assigning the title to index as time lag
auto.columns.name = 'Time lag'
#calculating autocorrelation by using the function from pytohn where nlags is the all
auto['autocorr (pk)'] = Series(statsmodels.tsa.stattools.acf(without_trend, nlags=5))
# calculating lower confidence interval where 1.96 is the value from normal distribut
auto['CI-lower'] = (-1-1.96*np.sqrt(v-2))/(v-1)
# calculating upper confidence interval
auto['CI-upper'] = (-1+1.96*np.sqrt(v-2))/(v-1)
# displaying the data
auto
Out[7]:
Time lag autocorr (pk) CI-lower CI-upper
0 1.000000 -0.294400 0.254400
1 0.527878 -0.297536 0.256720
2 0.453294 -0.300773 0.259106
3 0.386475 -0.304114 0.261561
4 0.194384 -0.307567 0.264089
5 0.159846 -0.311137 0.266693
Conclustion: From above calculation H0 : pk = 0 is rejected up to time lag of 3 years. This means the
autocorrelation coefficient is significant for α =5% up to 3 years of time lag. The autocorrelation for time lag up to 5
years together with limits for confidence interval with 5% leel of significance is given below:
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 6/18
9/26/2020 Jupyter Notebook Viewer
In [8]:
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(auto, marker='o')
plt.grid()
plt.legend(['Auto correlation', 'Lower CI' , 'Uppper CI'])
plt.ylabel('rL', fontsize =18)
plt.xlabel('Time lag (Years)',fontsize=18)
plt.show()
In python there is a inbuild function instide statsmodels.graphics.tsaplots called plot_acf to calculate and plot the
auto-correlation with 95% significance level. The plot given below is plotted by using the plot_acf function where
area under the shaded region is the confidence intervals.
In [9]:
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(without_trend)
plt.show()
Question 2: Stochastic models for stationary time series
Fit the first order Markov model for annual data (after the removal of trend if there is any)
Xt+1 = μ + β(1)(Xt − μ) + ϵ t+1 (3)
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 7/18
9/26/2020 Jupyter Notebook Viewer
Parameter estimation:
^
β 1 = ρ1 where ρ 1 is calculated using equation (2).
n 2
2 ∑ (χ −χ̄ )
2 ^
2
^ = σ x (1 − β )
σ ϵ 1
where σ x2 =
i=1 i i
n−1
and
1 n
^ = x̄ =
μ ∑ xi
n i=1
equation (3) becomes
−−−−−
2
^ ^
xt = μ
^
x
+ β (x t−1 − μ
1
^
x
+ z t σ x )√1 − β
1
(4)
Procedure of using equation (4):
1. Estimate μx , σ x , and β1 and substitute them in (4)
2. Generate random z t series which is from a N(0,1) distribution. N_yrs = 1991-1960+1
random.standard_normal((N_yrs,))
3. Use the 1959’s value as initial value and simulate for the period 1960-1991
4. Compare the simulated time series with the observed time series. Plot two time series in the same graph,
calculate and compare the statistics of the two series, i.e. mean, standard deviation, ρ 1 and ρ 2 , etc.,
5. Use AR(1) model to simulate 100 samples, each having 33 years, calculate the mean values of the statistics
(average, standard deviation, r1) from the 100 samples and compare them with the statistics calculated from
observed 33 years data series
Note that in step (4), one simulation sample is compared with the observed data, while in step (5) the mean values
of 100 samples are compared with the observed values.
Solution:
In [10]:
#Calculating the different statistics for the data:
#Defining the dataframe
frame= pd.DataFrame()
#counting the number of data
n = len(without_trend)
#calculating mean
mean = 491.22
frame['no_obs'] = Series(len(without_trend))
frame['mean']= np.mean(without_trend).round(2)
frame['std'] = (np.sqrt(sum((without_trend-mean)**2)/(n-1))).round(2)
#Displaying the basic statistics of data with out trend
frame
Out[10]:
no_obs mean std
0 51 491.22 128.23
Now Fitting of first order Markov model for annual data after trend is removed. Equation (4) to simulate
discharge.Autocorrelation for time lag of 1 is calculated by using the python function:
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 8/18
9/26/2020 Jupyter Notebook Viewer
In [11]:
#calculating the autocorrelation
auto_1 = statsmodels.tsa.stattools.acf(without_trend, nlags=1)
#converting the array data in to series
auto_1 = Series(auto_1[1,]).round(3)
print auto_1
0 0.528
dtype: float64
Here, the autocorrelation with time lag of 1 year is found to be 0.528
β1 = ρ 1 = auto1 = 0.528
mean = x̄ = 491.22
2
std = σ x = 128.23
and z t is series generated from random normal distribution: N(0,1). The initial value for Xt−1 is selected as the
3
1959 value i.e 552.63 m /s .
In [12]:
mean = 491.22
std = 128.23
#Selecting the data series 1959 to 1990
no_trend = without_trend[19:]
#generating normally distributed random data series
z= np.random.normal(0, 1, 32)
# Simulating the discharge by using equation (4)
simulated = pd.DataFrame(mean+(0.557*(no_trend -mean))+(z*std*(np.sqrt(1-0.557**2))))
calculating the auto-correlation upto 2 year time lag for observed and simulated datasets.
In [13]:
auto_2_obs = pd.DataFrame(statsmodels.tsa.stattools.acf(no_trend, nlags=2))[1:]
print auto_2_obs.T
1 2
0 0.595307 0.614706
In [14]:
auto_2_sim = pd.DataFrame(statsmodels.tsa.stattools.acf(simulated, nlags=2))[1:]
print auto_2_sim.T
1 2
0 0.252234 0.267889
Calculating other statistics and making the dataframe
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 9/18
9/26/2020 Jupyter Notebook Viewer
In [15]:
index = ['Observed','Simulated']
statistic = pd.DataFrame(index = index)
statistics = pd.DataFrame({'Auto_1': [0.595, 0.128], 'Auto_2': [0.6147, 0.0735]},inde
statistics['Mean'] = round(no_trend.mean(),2),round(simulated.mean(),2)
statistics['std'] = round(no_trend.std(),2), round(simulated.std(),2)
statistics['median'] = round(no_trend.median(),2), round(simulated.median(),2)
statistics
Out[15]:
Auto_1 Auto_2 Mean std median
Observed 0.595 0.6147 494.06 126.30 471.80
Simulated 0.128 0.0735 511.54 110.66 511.32
In [16]:
fig, ax = plt.subplots(figsize=(13,5))
ax.plot(no_trend, label='observed')
ax.plot(simulated, label='simulated')
plt.grid()
ax.set_xlim([1 ,31])
plt.legend()
plt.xlabel('Years',fontsize =18)
plt.ylabel('Discharge ($m^3/s$)',fontsize =18)
plt.show()
Conclusion: The statistics for observed and modelled time series (1960 -1990) is in above table. And aslo it is
clear from the plot that the variation between the observed and simulated data series is large. Comparision shows
that the Markov medeled time series is a more ramdom series as is expected when including the random z values
from a normal distribution in the Markov model. Simulated times series has lower mean, and lower standard
deviation then that of the observed series. And the value of mean and standard deviation changes everytimes after
the model because of random number.
5 Use AR(1) model to simulate 100 samples, each having 33 years, calculate the mean values of the statistics
(average, standard deviation, r1) from the 100 samples and compare them with the statistics calculated from
observed 33 years data series
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 10/18
9/26/2020 Jupyter Notebook Viewer
In [395]:
#Defining the dataframe
df = pd.DataFrame()
#defining the function for n times model run
def repeat_fun(times, f, args):
for i in range(times): f(i,args)
def f(iteration,df):
g = np.random.normal(0, 1, 32)
mm = Series(no_trend)
x = 491.22+(0.557*(mm -491.22))+(g*128.23*(np.sqrt(1-0.557**2)))
df[iteration] = x
repeat_fun(100,f,df)
#Dataframe with 100 columns and 31 rows will be obtained by printing df.
#calculating the mean for 100 times run datasets
Simulated_100 =df.mean(axis=1)
Calculating auto-correlation up to 2 year time lag for mean which is calculated from 100 years run datasets.
In [397]:
auto_2_100_sim = pd.DataFrame(statsmodels.tsa.stattools.acf(Simulated_100, nlags=2))[
print auto_2_sim.T
1 2
0 0.219762 0.432154
Calculating and comparing the different statistics for the mean of 100 times run and observed data without trend.
In [398]:
ind = ['Observed','Simulated']
simulated_100 = pd.DataFrame(index = ind)
simulated_100 = pd.DataFrame({'Auto_1': [0.595, 0.128], 'Auto_2': [0.219, 0.432]},ind
simulated_100['Mean'] = round(no_trend.mean(),2),round(Simulated_100.mean(),2)
simulated_100['std'] = round(no_trend.std(),2), round(Simulated_100.std(),2)
simulated_100['median'] = round(no_trend.median(),2), round(Simulated_100.median(),2)
simulated_100
Out[398]:
Auto_1 Auto_2 Mean std median
Observed 0.595 0.219 494.06 126.30 471.80
Simulated 0.128 0.432 491.35 70.28 476.09
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 11/18
9/26/2020 Jupyter Notebook Viewer
In [401]:
fig, ax = plt.subplots(figsize=(13,5))
ax.plot(Simulated_100,label='Simulated_100')
ax.plot(no_trend, label='observed')
plt.grid()
ax.set_xlim([1 ,31])
plt.ylabel('Discharge($m^3/s$)', fontsize =18)
plt.xlabel('Years',fontsize=18)
plt.legend()
plt.show()
Conclusion mean and standard deviation for mean simulated discharge is lower then that of observed one. On the
basis of standard deviation we can say that the variation in simulated data sets is comperatively less then that of
observed datasets.
Question 3: Stochastic model for non-stationary time series
Adapt a stochastic model to monthly discharge COLUMBIA_M1.txt.
Thomas-Fiering is a generalization of the first order Markov model. In general for the stochastic models (MA, AR,
ARMA) stationarity is assumed. For example, the first order Markov model assumes stationarity in the first three
moments. However, there are generalizations of these models that account for non-stationarity. For data sets which
are seasonally non-stationary, you can use Thomas-Fiering.
1. Fit the Thomas-Fiering Model to the data and simulate the monthly discharge from January 1949 to December
1975 using the value of December 1948 as the initial value. Plot the simulated and observed data on the same
graph and compare the statistics of the two data series
2. Simulate the 100-year discharge series from 1976 to 2075 using the value of December 1975 as initial value
and compare the statistics of the simulated data series with the observed data series.
The Thomas-Fiering Model is given as:
−−−−−
^j+1,i = q
q ¯ ) + z j+1,i ⋅ s j+1 √1 − r2
¯j+1 + bj (qj,i − qj j
(1)
Where i = year, j = month, and z is a random series with N(0, 1).
Calculation procedure:
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 12/18
9/26/2020 Jupyter Notebook Viewer
1
(a) The mean flow: q¯j =
n
∑ qj,i ,
i
where i = {j, 12 + j, 24 + j, ⋯}
−−−−−−−−
2
∑ (q j,i −q
¯ )
(b) The standard deviation: Sj = √
i j
n−1
(c) the correlation coefficient with flow in the next month,
∑ (q j,i −q
¯ j)(q j+1,i −q
¯ j+1 )
i
rj =
2 2
√∑ (q −q
¯ ) ∑ (q j+1,i −q
¯ )
i j,i j i j+1
(d) the slope of the regression equation relating the month’s flow to flow in the preceding month:
Sj+1
bj = rj
Sj
Solution
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 13/18
9/26/2020 Jupyter Notebook Viewer
In [18]:
def thomas_fiering(q_hist, q_init, j_init, period):
""" Thomas Fiering Model """
q = q_hist
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
months1 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1]
# create a random generator, note we have to call this '()' and take the
# first element returned [0]... so in use it is Z()[0]
Z = np.random.randn
# get the mean monthly discharges for the j months
q_mean_j = [q[q.index.month == i].discharge.mean() for i in months]
# get the mean monthly discharges for the j+1 months
q_mean_j1 = [q[q.index.month == i].discharge.mean() for i in months1]
# do the same for standard deviation
sj = [q[q.index.month == i].discharge.std() for i in months]
sj1 = [q[q.index.month == i].discharge.std() for i in months1]
# calculate the pearson r using built-ins
rj = [pearsonr(q[q.index.month == i].discharge,
q[q.index.month == j].discharge)[0] for i, j in zip(months, months
# the model, not vectorized. assumes each of the variables above are
# of length 12 ... and note that Python is zero-indexed (e.g. first element is '0
def tf1(qi, month):
i = month-1 #account for zero offset
return q_mean_j1[i] + (rj[i] * (sj1[i] / sj[i])) * (qi - q_mean_j[i]) + Z(1)[
# initiate simulated q with q_init and j_init (month)
# simq is a list, and we just call the model function to get the first element
simq = [tf1(q_init, j_init)]
# enumerate (e.g. makes 'i' a counter) and get each month in the simulation perio
for i, m in enumerate(period.month[1:]):
#append to the list by calling the model with the prior value and the month
simq.append(tf1(simq[i-1], m))
# convert it to a dataframe for convenience
return pd.DataFrame(np.array(simq), columns=['discharge'], index=period)
if __name__ == "__main__":
# use pandas to read the data
q = pd.read_table('mon.csv', sep=',',
names=['year', 'month', 'discharge'])[1:].convert_objects(convert_n
#q = q['discharge'].convert_objects(convert_numeric=True)
# create a pandas date range to index the data frame
rng = pd.date_range('1/1/1949', periods=len(q), freq='M')
# set the index (now we can easily get monthly values)
q.index = rng
# create a simulation period
period = pd.date_range('Jan-1949', periods=324, freq='M')
q_init = q.discharge['Dec-1949'] #december 1949
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 14/18
9/26/2020 Jupyter Notebook Viewer
simq = thomas_fiering(q, q_init, 12, period)
# plot the data
plt.figure(figsize=(15,5)); plt.hold(True)
plt.plot(q.index, q.discharge, c='b', label='observed')
plt.plot(simq.index, simq.discharge, c='r', label='simulated')
plt.legend()
plt.grid(True)
Comparing the statistics:
In [19]:
index = ['Observed','Simulated']
stat = pd.DataFrame(index = index)
stat['Mean'] = round(q.discharge.mean(),2),round(simq.mean(),2)
stat['std'] = round(q.discharge.std(),2), round(simq.std(),2)
stat['median'] = round(q.discharge.median(),2), round(simq.median(),2)
stat
Out[19]:
Mean std median
Observed 5551.51 3366.80 4207.00
Simulated 5697.04 2480.31 5043.47
Conclusion: From the plot above it is clear that the there is a variation between observed and simulated discharge
from thomas fiering model. Simulated mean is higher than that of observed while the simulated standard deviation
is lower then that of observed datasets. And also the mean is greater than the median in both case. This is common
for a distribution that is skewed to the right (that is, bunched up toward the left and with a "tail" stretching toward the
right)
Simulating discharge for monthly 100 years by using the Thomas-Fiering model
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 15/18
9/26/2020 Jupyter Notebook Viewer
In [20]:
# Repeat Thomas-Fiering 1200 time steps:
#Define an array of 1200 elements
#sim_100 = np.arange(100)
# Define the period (100 years = 1200 months time steps)
period = pd.date_range('Jan-1975', periods=1200, freq='M')
sim_100 = thomas_fiering(q, q_init, 12, period)
#print(sim_100)
# plot the data
plt.figure(figsize=(15,6)); plt.hold(True)
plt.plot(sim_100.index, sim_100, c='b')
#plt.plot(simq.index, simq.discharge, c='r', label='simulated')
plt.xlabel('Years',fontsize =18)
plt.ylabel('Discharge ($m^3/s$)',fontsize =18)
plt.grid(True)
Calculating the different statistics for 100 years simulated data
In [21]:
sim=pd.DataFrame(sim_100['discharge'])
mean_100 = pd.DataFrame(sim.mean().round(2))
std_100 = pd.DataFrame(sim.std().round(2))
median_100 =pd.DataFrame(sim.median().round(2))
stats_100= (pd.concat([mean_100, std_100,median_100], axis =1))
my_columns = ["mean", "std", "median"]
stats_100.columns = my_columns
stats_100
Out[21]:
mean std median
discharge 5685.43 2622.74 5085.56
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 16/18
9/26/2020 Jupyter Notebook Viewer
In [30]:
plot_acf(sim_yearly)
Out[30]:
Calculating the yearly sum for 100 years simulated data
In [29]:
sim_yearly = sim_100.groupby(sim_100.index.year).sum()
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 17/18
9/26/2020 Jupyter Notebook Viewer
In [40]:
# plotting 100 years simulated data
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(sim_yearly.index,sim_yearly)
plt.grid()
plt.ylabel('Discharge($m^3/s$)', fontsize =18)
ax.set_xlim([1975 ,2075])
plt.xlabel('Years',fontsize=18)
plt.show()
In [ ]:
https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_2… 18/18