Slide1 : Péter Elek and László Márkus
Dept. Probability Theory and Statistics
Eötvös Loránd University
Budapest, Hungary
River Tisza and its aquifer : River Tisza and its aquifer
Tisza produces devastating floods : Tisza produces devastating floods
Water discharge at Vásárosnamény(We have 5 more monitoring sites)from1901-2000 : Water discharge at Vásárosnamény (We have 5 more monitoring sites) from1901-2000
Basic properties of the series : Basic properties of the series Series exhibit:
slight upward trend in mean
strong seasonal component both in mean and variance
Seasonal components can be removed by a local polinomial fitting (LOESS) procedure
The distribution of the standardized series are still periodic.
The distribution is skewed.
Empirical and smoothed seasonal components : Empirical and smoothed seasonal components
Bi-monthly distributions : Bi-monthly distributions
Autocorrelation function is slowly decaying : Autocorrelation function is slowly decaying
How long does the river remember...? : How long does the river remember...? Long memory - known ever since Hurst’s original work on the Aswan dam (1952) of the Nile River
Short and long memory - autocorrelations die out at exponential versus hyperbolic rate
Equivalent for long memory - spectral density has pole at zero, meaning it tends to infinity at zero at polynomial speed.
Long memory processes : Long memory processes Short and long memory - autocorrelations die out at exponential versus hyperbolic rate
Equivalent for long memory - spectral density has pole at zero, meaning it tends to infinity at zero at polynomial speed.
Indicators of long memory : Indicators of long memory Nonparametric statistics
Rescaled adjusted range or R/S
Classical
Lo’s (test)
Taqqu’s graphical (robust)
Variance plot
Log-periodogram (Geweke-Porter Hudak)
Slide13 : Classic R/S: biased, lack of distribution theory, no test or confidence intervals, sensitive for short mem component if present.
Lo’s modification - convergence to Brownian Bridge. Test of long memory; often accepts short memory when long is present, but reliably rejects long mem when only short is present.
Taqqu’s graphical R/S; get out the most of the data: drop blocks from the beginnig, reestimate R/S from the shorter data, log-log plot all together and fit a straight line (regression)
Variance plot : Variance plot The variance of the mean tends to zero as n2H-2
Estimate the variance of means from the aggregated processes
Fit a straight line on the log-log scale
Determine the slope from a linear regression
The log - spectrum : The log - spectrum The order of the pole of the spectral density is 1-2H
On the log-log scaleplot of the periodogram this means a straight line of steepness 1-2H
The celebrated Geweke - Porter Hudak statistics uses log(sin2(/2)) instead of the log of the frequencies.
Linear long-memory model : fractional ARIMA-process (Montanari et al., Lago Maggiore, 1997) : Linear long-memory model : fractional ARIMA-process (Montanari et al., Lago Maggiore, 1997) Fractional ARIMA-model:
Fitting is done by Whittle-estimator:
based on the empirical and theoretical periodogram
quite robust: consistent and asymptotically normal for linear processes driven by innovatons with finite forth moments (Giraitis and Surgailis, 1990)
Results of fractional ARIMA fit : Results of fractional ARIMA fit
H=0.846 (standard error: 0.014)
p-value: 0.558 (indicates goodness of fit)
Innovations can be reconstructed using a linear filter (the inverse of the filter above)
Reconstruct the innovation from the fitted model : Reconstruct the innovation from the fitted model
The density of the innovations : The density of the innovations
Reconstructed innovations are uncorrelated... : Reconstructed innovations are uncorrelated... But not independent
Simulations using i.i.d. innovations : Simulations using i.i.d. innovations If we assume that innovations are i.i.d, we can generate synthetic series:
Use resampling to generate synthetic innovations
Apply then the linear filter
Add the sesonal components to get a synthetic streamflow series
But: these series do not approximate well the high quantiles of the original series
But: they fail to catch the densities and underestimate the high quantiles of the original series : But: they fail to catch the densities and underestimate the high quantiles of the original series
Logarithmic linear model : Logarithmic linear model It is quite common to take logarithm in order to get closer to the normal distribution
It is indeed the case here as well
Even the simulated quantiles from a fitted linear model seem to be “almost” acceptable
Slide26 : But the backtransformed quantiles are clearly unrealistic.
Let’s have a closer look at innovations : Let’s have a closer look at innovations Innovations can be regarded as shocks to the linear system
Few properties:
Squared and absolute values are autocorrelated
Skewed and peaked marginal distribution
There are periods of high and low variance
All these point to a GARCH-type model
The classical GARCH is far too heavy tailed to our purposes
Simulation from the GARCH-process : Simulation from the GARCH-process Simulations:
Generate i.i.d. series from the estimated GARCH-residuals
Then simulate the GARCH(1,1) process using these residuals
Apply the linear filter and add the seasonalities
The simulated series are much heavier-tailed than the original series
General form of GARCH-models : Need a model „between”
i.i.d.-driven FARIMA-series and
GARCH(1,1)-driven FARIMA-series
General form of GARCH-models:
General form of GARCH-models
A smooth transition GARCH-model : A smooth transition GARCH-model
ACF of GARCH-residuals : ACF of GARCH-residuals
Results of simulationsat Vásárosnamény : Results of simulations at Vásárosnamény
Back to the original GARCH philosophy : Back to the original GARCH philosophy The above described GARCH model is somewhat artificial, and hard to find heuristic explanations for it:
why does the conditional variance depend on the innovations of the linear filter?
in the original GARCH-context the variance is dependent on the lagged values of the process itself.
A possible solution: condition the variance on the lagged discharge process instead !
The fractional integration does not seem to be necessary
almost the same innovations as from an ARMA(3,1)
In extreme value theory long memory in linear models does not make a difference
Estimated variance of innovations plotted against the lagged discharge : Estimated variance of innovations plotted against the lagged discharge Spectacularly linear relationship
This approves the new modelling attempt
Distorted at sites with damming along Tisza River
Slide35 : The variance is not conditional on the lagged innovation
but it is conditional on the lagged water discharge.
Slide36 : Theoretical problems arise in the new model
Existence of stationary solution
Finiteness of all moments
Consistence and asymptotic normality of quasi max-likelihood estimators
Heuristically clearer explanation can be given
The discharge is indicative of the saturation of the watershed
A saturated watershed results in more straightforward reach for precipitation to the river, hence an increase in the water supply.
A saturated watershed gives away water quicker.
The possible changes are greater and so is the uncertainty for the next discharge value.
An example: Zt~N(0,1) c=20, a1=0.95, 0=1, 1=2, m=1 : An example: Zt~N(0,1) c=20, a1=0.95, 0=1, 1=2, m=1
Existence and moments of the stationary solution : Existence and moments of the stationary solution We assume that ct = constant
The model has a unique stationary solution if the corresponding ARMA-model is stationary
i.e. all roots of the characteristic equation lie within the unit circle
Moreover, if the m-th moment of Zt is finite then the same holds for the stationary distribution of Xt, too.
These are in contrast to the usual, quadratic ARCH-type innovations. There the condition for stationarity is more complicated and not all moments of the stationary distribution are finite.
Sketch of the proof I. : Sketch of the proof I. The process can be embedded into a (p+q)-dimensional Markov-chain: Yt=AYt-1+Et
where Yt=(Xt-c, Xt-1-c,...Xt-p+1-c, εt, εt-1,..., εt-q+1) and Et=(εt, 0,...).
Yt is aperiodic and irreducible (under some technical conditions).
General condition for geometric ergodicity and hence for existence of a unique stationary distribution (Meyn and Tweedie, 1993): there exists a V1 function with 0<<1, b< and C compact set
E( V(Y1) | Y0=y ) (1-) V(y) + b IC(y)
In other words: V is bounded on a compact set and is a contraction outside it.
Moreover: E(V(Yt)) is finite ( is the stationary distribution).
Sketch of the proof II. : Sketch of the proof II. In the given case:
if E(|Zt|m) is finite,
V(y) = 1 + ||QPy||mm will suffice
where:
B=PAP-1 is the real valued block Jordan-decomposition of A
and Q is an appropriately chosen diagonal matrix with positive elements.
This also implies the finiteness of the mth moment of Xt.
Estimation : Estimation Estimation of the ARMA-filter can be carried out by least squares.
Essentially only the uncorrelatedness of innovations is needed for consistency.
Additional moment condition is needed for asymptotic normality (e.g. Francq and Zakoian, 1998).
The ARCH-equation is estimated by quasi maximum likelihood (assuming that Zt is Gaussian), using the t innovations calculated from the ARMA-filter.
The QML estimator of the ARCH-parameters is consistent and asymptotically normal under mild conditions (Zt does not need to be Gaussian).
Estimation of the ARCH-equation in the case of known t innovations(along the lines of Kristensen and Rahbek, 2005) : Estimation of the ARCH-equation in the case of known t innovations (along the lines of Kristensen and Rahbek, 2005)
Maximising the Gaussian log-likelihood
we obtain the QML-estimator of .
For simplicity we assume that 0 min>0 in the whole parameter space.
Consistency of the estimator : Consistency of the estimator By the ergodic theorem
It is easy to check that L(α)
Asymptotic normality I. : Asymptotic normality I. Using the notations
for the derivatives of the log-likelihood:
for the information matrix
and for the expected Hessian:
Asymptotic normality II. : Asymptotic normality II. A standard Taylor-expansion implies:
Finiteness of the fourth moment with a martingale central limit theorem yields:
Moreover, the asymptotic covariance matrix can be consistently estimated by the empirical counterparts of H and V.
Estimation of the ARCH-equation when t is not known : Estimation of the ARCH-equation when t is not known In this case the innovations of the ARMA-model are calculated using the estimated ARMA-parameters:
If the ARMA-parameter vector is estimated consistently, the mean difference of squared innovations tends to zero:
If the ARMA parameter estimate is asymptotically normal, a stronger statement holds:
Consistency in the case of estimated innovations : Consistency in the case of estimated innovations Now the following expression is maximised:
But the difference tends to zero (uniformly on the parameter space):
which then yields consistency of the estimate of .
Asymptotic normality in the case of estimated innovations : Asymptotic normality in the case of estimated innovations Under some moment conditions the least squares estimate of the ARMA-parameters is asymptotically normal, hence:
This way the differences between the first and, respectively, the second derivatives both converge to zero:
As a result, all the arguments for asymptotic normality given above remain valid.
Estimation results : Estimation results
How to simulate the residuals of the new GARCH-type model : How to simulate the residuals of the new GARCH-type model Residuals are highly skewed and peaked.
Simulation:
Use resampling to simulate from the central quantiles of the distribution
Use Generalized Pareto distribution to simulate from upper and lower quantiles
Use periodic monthly densities
The simulation process : The simulation process t Zt Xt resampling and GPD FARIMA filter smoothed GARCH Seasonal filter
Evaluating the model fit : Evaluating the model fit Independence of residual series
ACF, extremal clustering
Fit of probability density and high quantiles
Variance – lagged discharge relationship
Extremal index
Level exceedance times
Flood volume distribution
ACF of original and squared innovation series – residual series : ACF of original and squared innovation series – residual series
Results of new simulationsat Vásárosnamény : Results of new simulations at Vásárosnamény
Densities and quantiles at all 6 locations : Densities and quantiles at all 6 locations
Reestimated (from the fitted model) discharge-variance relationship : Reestimated (from the fitted model) discharge-variance relationship
Seasonalities of extremes : Seasonalities of extremes The seasonal appearance of the highest values (upper 1%) of the simulated processes follows closely the same for the observed one.
Extremal index to measure the clustering of high values : Extremal index to measure the clustering of high values Estimated for the observed and simulated processes containing all seasonal components
Estimation by block method:
Value of block length changes from 0.1% to 1%.
Value of threshold ranges from 95% to 99.9%.
Estimated extremal indices displayed : Estimated extremal indices displayed
Extremal indices in the threshold GARCH model : Extremal indices in the threshold GARCH model
Extremal indices in the discharge dependent GARCH model : Extremal indices in the discharge dependent GARCH model
Level exceedance times : Level exceedance times The distribution of the level exceedance times may serve as a further goodness of fit measure.
It has an enormous importance as it represents the time until the dams should stand high water pressure.
Flood volume distribution : Flood volume distribution The match of the empirical and simulated flood volume distributions also approve the good fit.
Multivariate modelling : Multivariate modelling Final aim: to model the runoff processes simultaneously
Nonlinear interdependence and non-Gaussianity should be addressed here, too
First, the joint behaviour of the discharges inflowing into Hungary should be modelled
Differential equation-oriented models of conventional hydrology may be used to describe downstream evolution of runoffs
Now we concentrate on joint modelling of two rivers: Tisza (at Tivadar) and Szamos (at Csenger)
Issues of joint modelling : Issues of joint modelling Measures of linear interdependences (the cross-correlations) are likely to be insufficient.
High runoffs appear to be more synchronized on the two rivers than small ones
The reason may be the common generating weather patterns for high flows
This requires a non-conventional analysis of the dependence structure of the observed series
Basic statistics of Tivadar (Tisza) and Csenger (Szamos) : Basic statistics of Tivadar (Tisza) and Csenger (Szamos) The model described previously was applied to both rivers
Correlations between the series of raw values, innovations and residuals are highest when either series at Tivadar are lagged by one day
Correlations:
Raw discharges: 0.79
Deseasonalized data: 0.77
Innovations: 0.40
Residuals: 0.48
Conditional variances: 0.84
Displaying the nature of interdependence : Displaying the nature of interdependence The joint plot may not be informative because of the highly non-Gaussian distributions
Transform the marginals into uniform distributions (produce the so-called copula),
then the scatterplot is more informative on the joint behaviour
The strange behaviour of the copula of the innovations is characterized by the concentration of points
1. at the main diagonal, and especially at the upper right corner (tail dependence)
2. at the upper left (and the lower right) corner(s)
Linearly dependent variables do not display this type of copula
The GARCH-residuals lack the second type of irregularity 1 2
A possible explanation of this type of interdependence : A possible explanation of this type of interdependence The variance process is essentially common for the two rivers
This is justified by the high correlation (0.84)
Generate two linearly interdependent residual series (correlation=0.48)
Multiply by the common standard deviation distributed as Gamma
Observe the obtained copula
This justifies the hypothesis that the common variance causes the nonlinear interdependence of the given type
Tail dependence : Tail dependence
Conclusion : Conclusion Long range dependency does not have much impact on extremes of river discharges
Nonlinearities are influental and have to be accounted for
The proposed model has a stationary solution
Its estimation is consistent and asymptotically normal
The suggested model catches densities and quantiles well
The model fitting procedure does not optimise on quantiles and maxima clustering, therefore their fit really shows model adequacy
Other fit-independent measures include level exceedence and flood volume distributions
Something different has to be done with dammed water
Possible refinements of the model : Possible refinements of the model The simulated process is still slightly heavier tailed than the original one.
The conditional distribution of residuals depends on the lagged water discharge
very high residual values occur at low discharge
Possible solutions:
innovation as a mixture of a GARCH-part and of a discharge-independent part
Markov-switching model with discharge-dependent transition probabilities
Thank you for your attention! : Thank you for your attention!