Rainfall Data

Presentation Transcript

Let It Rain: Modeling Multivariate Rain Time Series Using Hidden Markov Models : 

Let It Rain: Modeling Multivariate Rain Time Series Using Hidden Markov Models
Sergey Kirshner
Donald Bren School of Information and Computer Sciences, UC Irvine
March 2, 2006

Acknowledgements : 

- Padhraic Smyth (UCI)
- Andy Robertson (IRI)
- DOE (DE-FG02-02ER63413)

Slide 3: 

http://iri.columbia.edu/climate/forecast/net_asmt/2006/feb2006/MAM06_World_pcp.html

What to Do with Rainfall Data? : 

Diagram labels: historical rainfall data; general circulation model (GCM) outputs; model. Task: description.

What to Do with Rainfall Data? : 

Diagram labels: historical rainfall data; general circulation model (GCM) outputs; model; predicted data. Task: downscaling.

What to Do with Rainfall Data? : 

Diagram labels: historical rainfall data; general circulation model (GCM) outputs; model; predicted data; crop modeling; water management. Task: simulation.

Snapshot of the Data : 


Modeling Precipitation Occurrence… : 

- Northeast Brazil, 1975-2002 (except 1976, 1978, 1984, and 1986)
- 24 seasons (N)
- 90 days (T)
- 10 stations (M)

… and Amounts : 

Annual Precipitation Probability : 

Spatial Correlation : 

Spell Run Length Distributions : 

Dry spells are in blue; wet spells are in red.

Important Data Characteristics : 

- Correlation
  - Spatial dependence
- Temporal structure
  - Run-length distributions
  - Persistence
  - First-order dependence
- Variability of individual series
  - Interannual variability: important for climate studies

Missing Data : 

Missing data mask (black) for 41 stations (y-axis) in India for May 1 - Oct 31, 1973. 29% of the data is missing, with stations 13, 14, 16, 24, 26, 30, 36, 38, and 40 each missing more than 45% of their data.

A Bit of Notation : 

- Vector time series: $R = (R_1, \dots, R_T)$
- Vector observation of $R$ at time $t$: $R_t = (R_t^1, \dots, R_t^M)$

Weather Generator : 

Does not take spatial correlation into account.

Rain Generating Process : 


Hidden Markov Model (HMM) : 

- Discrete weather states $S$ ($K$ states)
- Evolution of the weather state: transition probability $P(S_t \mid S_{t-1})$
- Rainfall generation in weather state $i$: emission probability $P(R_t \mid S_t = i)$

Hidden Markov Model (HMM) : 

[Graphical model: hidden state chain $S_1, S_2, \dots, S_t, \dots, S_{T-1}, S_T$, with each state $S_t$ emitting the observation $R_t$.]
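The chain structure of the model corresponds to the standard HMM factorization of the joint distribution. The block below is a sketch in the notation of the slides; the initial-state term $P(S_1)$ is an assumption, since the transcript does not show how the first state is parameterized.

```latex
\[
P(R_{1:T}, S_{1:T}) \;=\; P(S_1)\,\prod_{t=2}^{T} P(S_t \mid S_{t-1})\,\prod_{t=1}^{T} P(R_t \mid S_t)
\]
```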

Basic Operations with HMMs : 

- Probability of weather states given observed data (inference): Forward-Backward
- Model parameter estimation given the data: Baum-Welch (EM)
- Most likely sequence of weather states given the data: Viterbi
[Rabiner 89]
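Two of these operations can be sketched in a few lines of plain Python. This is a minimal illustration and not the toolbox implementation: the function names and the input layout (`emis_lik[t][k]` holding $P(r_t \mid S_t = k)$, already evaluated) are assumptions made for the example, and all probabilities are assumed strictly positive so that logarithms are defined.

```python
import math

def forward_loglik(init, trans, emis_lik):
    """Scaled forward pass; returns log P(r_1, ..., r_T).

    init[k]        = P(S_1 = k)
    trans[i][k]    = P(S_t = k | S_{t-1} = i)
    emis_lik[t][k] = P(r_t | S_t = k), precomputed for the observed data
    """
    K = len(init)
    alpha = [init[k] * emis_lik[0][k] for k in range(K)]
    loglik = 0.0
    for t in range(len(emis_lik)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(alpha[i] * trans[i][k] for i in range(K)) * emis_lik[t][k]
                     for k in range(K)]
        scale = sum(alpha)              # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def viterbi(init, trans, emis_lik):
    """Most likely state sequence, computed in the log domain."""
    K, T = len(init), len(emis_lik)
    delta = [math.log(init[k]) + math.log(emis_lik[0][k]) for k in range(K)]
    backpointers = []
    for t in range(1, T):
        prev = delta
        # Best predecessor for each state k at time t.
        ptr = [max(range(K), key=lambda i: prev[i] + math.log(trans[i][k]))
               for k in range(K)]
        delta = [prev[ptr[k]] + math.log(trans[ptr[k]][k]) + math.log(emis_lik[t][k])
                 for k in range(K)]
        backpointers.append(ptr)
    state = max(range(K), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backpointers):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Both routines cost $O(TK^2)$ given the emission likelihoods; the backward pass needed for full Forward-Backward smoothing follows the same pattern in reverse.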

States for 4-state HMM : 

[Robertson, Kirshner, and Smyth 04]

Weather State Evolution : 

[Robertson, Kirshner, and Smyth 04]

Generalizations to HMMs: Auto-regressive HMM (AR-HMM) : 

Explicitly models temporal first-order dependence of rainfall.

Generalizations to HMMs: Non-homogeneous HMM (NHMM) : 

- Incorporates atmospheric variables
- Allows non-stationary and oscillatory behavior
[Hughes and Guttorp 94; Bengio and Frasconi 95]

Parameter Estimation : 

- Find $\Theta$ maximizing $P(r \mid \Theta)$ (ML) or $P(\Theta \mid r)$ (MAP)
- Cannot be done in closed form
- EM (Baum-Welch for HMMs)
  - E-step: compute … (Forward-Backward); calculate …
  - M-step: maximize …
    - Can be split into maximization of emission and transition parameters
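The formulas on this slide did not survive in the transcript. As a hedged reconstruction, the standard Baum-Welch objective for $N$ sequences of length $T$ is given below; the indexing ($r_n$ for the $n$-th sequence, $\Theta'$ for the current parameter estimate) is an assumption and may differ from the notation on the original slide.

```latex
\[
\begin{aligned}
Q(\Theta, \Theta') ={}& \sum_{n=1}^{N}\sum_{i=1}^{K} P(S_{n1}=i \mid r_n, \Theta')\,\log P(S_{n1}=i \mid \Theta) \\
&+ \sum_{n=1}^{N}\sum_{t=2}^{T}\sum_{i=1}^{K}\sum_{j=1}^{K} P(S_{n,t-1}=i, S_{nt}=j \mid r_n, \Theta')\,\log P(S_{nt}=j \mid S_{n,t-1}=i, \Theta) \\
&+ \sum_{n=1}^{N}\sum_{t=1}^{T}\sum_{i=1}^{K} P(S_{nt}=i \mid r_n, \Theta')\,\log P(r_{nt} \mid S_{nt}=i, \Theta)
\end{aligned}
\]
```

The posterior probabilities are the E-step outputs of Forward-Backward. The first two sums involve only initial-state and transition parameters and the third only emission parameters, which is why the M-step can be carried out separately for each group.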

Modeling Approaches : 

- Use HMMs
  - Transition probabilities for temporal dependence
  - Emissions (hidden state distributions) for spatial or multivariate dependence (and additional temporal dependence)
- Emphasis on categorical-valued data
- Transitions and emissions can be specified separately
  - Covers cross-product of models

Modeling Approaches (cont’d) : 

- Use HMMs
- Possible emission distributions
  - Conditional independence
  - Chow-Liu trees [Chow and Liu 68], conditional Chow-Liu forests [Kirshner et al 04]
  - Markov random fields: maximum entropy models [e.g., Jelinek 98], Boltzmann machines [e.g., Hinton and Sejnowski 86], thin junction trees [Bach and Jordan 02]
  - Belief networks: sigmoidal belief networks [Neal 92]
- Possible transition distributions
  - Non-homogeneous mixture (mixture of experts [Jordan and Jacobs 94])
  - Stationary transition matrix
  - Non-homogeneous transition matrix [Hughes and Guttorp 94; Meila and Jordan 96; Bengio and Frasconi 95]

HMM-CI : 

[e.g., Zucchini and Guttorp 91; Hughes and Guttorp 94]

Why Use HMM-CI? : 

- Simple and efficient: $O(TKM)$ for inference and for parameter estimation
- Small number of free parameters
- Can handle missing data
- Can be used to model amounts
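The missing-data property follows directly from conditional independence: a station with no observation contributes a factor that sums to one and can simply be skipped. A minimal sketch for binary occurrence data, where the function name and the use of `None` for a missing entry are assumptions made for illustration:

```python
import math

def ci_log_emission(obs, p_wet):
    """log P(r_t | S_t = k) under conditional independence across stations.

    obs[m]   : 1 (wet), 0 (dry), or None (missing) for station m
    p_wet[m] : P(rain at station m | S_t = k), assumed strictly between 0 and 1
    """
    total = 0.0
    for r, p in zip(obs, p_wet):
        if r is None:
            continue  # missing entry is marginalized out; its factor sums to 1
        total += math.log(p if r == 1 else 1.0 - p)
    return total
```

Evaluating this for every day and every state costs $O(TKM)$, matching the complexity quoted on the slide.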

HMM-CI for Amounts : 

Types of mixture components:
- Gamma [Bellone 01]
- Exponentials [Robertson et al 06]

Why not HMM-CI : 

- Does not match spatial correlations or persistence well
- Models spatial correlation implicitly through hidden states
- May require large K to model regions with a moderate number of stations

HMM-Autologistic : 

[Hughes, Guttorp, and Charles 99]

What about HMM-Autologistic? : 

- Sure!
  - Models spatial correlations very well
  - Can use sampling or approximate schemes to compute the normalization constant and to update parameters
- Not so sure
  - Complexity of exact computation is exponential in M
  - What about temporal dependence?
  - May have too many free parameters if not constrained
  - Does not handle missing values (or is very slow)

Neither Here nor There : 

- HMM-CI: efficient but too simplistic
- HMM-Autologistic: more capable but computationally more cumbersome
- Want something in between
  - Computationally tractable
  - Emission spatial dependence
  - Additional temporal dependence
  - Missing values

Bayesian Networks and Trees : 

- Tree-structured distributions
  - Chow-Liu trees (spatial dependence) [Chow and Liu 68]
    - With HMMs [Kirshner et al 04]
  - Conditional Chow-Liu forests (spatial and temporal dependence) [Kirshner et al 04]
- Markov (undirected) and Bayesian (directed) networks
  - MaxEnt (logistic)
  - Conditional MaxEnt
  - Sigmoidal belief networks [Neal 92]
  - Would need to estimate both the parameters and the structure

Chow-Liu Trees : 

- Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
- Maximizing the log-likelihood is equivalent to solving a maximum spanning tree (MST) problem
- Can find both the tree structure and the parameters in one swoop!
- Finding the MST is quadratic in the number of nodes [Kruskal 59]
- Edge weights are pairwise mutual information values: a measure of dependence between two variables
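The procedure can be sketched in plain Python: estimate the mutual information for every pair of variables from empirical counts, then run Kruskal's algorithm on the edges in order of decreasing weight. The function names and the column-wise data layout are assumptions made for this example, and ties between equal weights are broken arbitrarily.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # (c/n) * log( (c/n) / ((px/n) * (py/n)) ), summed over observed value pairs
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def chow_liu_edges(columns):
    """columns[m] is the list of observations of variable m.

    Returns the edges (u, v) of a maximum spanning tree under MI weights.
    """
    M = len(columns)
    weighted = sorted(((mutual_information(columns[u], columns[v]), u, v)
                       for u, v in combinations(range(M), 2)), reverse=True)
    parent = list(range(M))             # union-find forest for Kruskal

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = []
    for _, u, v in weighted:
        ru, rv = find(u), find(v)
        if ru != rv:                    # adding the edge creates no cycle
            parent[ru] = rv
            edges.append((u, v))
    return edges
```

Once the edges are chosen, the tree parameters are the empirical univariate and pairwise marginals along those edges, so structure and parameters come out of the same pass over the data.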

Learning Chow-Liu Trees : 

[Figure: example graph with pairwise values 0.3126, 0.0229, 0.0172, 0.0230, 0.0183, 0.2603.]

Chow-Liu Trees : 

- Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
- Properties
  - Efficient: $O(TM^2B^2)$
  - Optimal
  - Can handle missing data
- Mixture of trees [Meila and Jordan 00]
  - More expressive than trees yet with a simple estimation procedure
- HMMs with trees [Kirshner et al 04]

HMM-Chow-Liu : 

[Kirshner et al 04]

Tree-structured Emissions for Amounts : 

[Figure: tree-structured emission for state $S_t = 1$ over the variables $O_t^1, \dots, O_t^4$ and $R_t^1, \dots, R_t^4$.]

Improving on Chow-Liu Trees : 

- Tree edges with low MI add little to the approximation.
- Observations from the previous time point can be more relevant than those from the current one.
- Idea: build a Chow-Liu tree that is allowed to include variables from both the current and the previous time point.

Conditional Chow-Liu Forests : 

- Extension of Chow-Liu trees to conditional distributions
  - Approximation of a conditional multivariate distribution with a tree-structured distribution
  - Uses MI to build maximum spanning (directed) trees (forest)
    - Variables of two consecutive time points as nodes
    - All nodes corresponding to the earlier time point are considered connected before the tree construction
  - Same asymptotic complexity as Chow-Liu trees
  - Optimal (within the class of structures)
[Kirshner et al 04]
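The construction can be illustrated with a small variation on Kruskal's algorithm in which all earlier-time nodes are merged into a single pre-connected super-node. This is only a sketch of the idea described above, not the published algorithm verbatim: the function name, the input matrices, and the choice to drop edges with zero mutual information are assumptions made for the example.

```python
def ccl_forest_edges(mi_cur, mi_prev):
    """Select edges for a conditional Chow-Liu forest.

    mi_cur[u][v]  : MI between current-time variables u and v
    mi_prev[u][v] : MI between current-time variable u and previous-time variable v
    Returns edges as ("cur", u, v) or ("prev", u, v), strongest first.
    """
    M = len(mi_cur)
    candidates = []
    for u in range(M):
        for v in range(u + 1, M):
            candidates.append((mi_cur[u][v], "cur", u, v))
        for v in range(len(mi_prev[u])):
            candidates.append((mi_prev[u][v], "prev", u, v))
    candidates.sort(reverse=True)

    # Union-find over the M current-time nodes plus one super-node (index M)
    # standing for all previous-time nodes, which start out connected.
    parent = list(range(M + 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for weight, kind, u, v in candidates:
        if weight <= 0.0:
            break  # assumption: zero-MI edges are omitted, leaving a forest
        a = find(u)
        b = find(v) if kind == "cur" else find(M)
        if a != b:
            parent[a] = b
            edges.append((kind, u, v))
    return edges
```

Because the previous-time nodes count as one component, each connected group of current-time variables receives at most one edge from the previous time point, which keeps the resulting conditional distribution tree-structured.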

Example of CCL-Forest Learning : 

[Figure: example graph with pairwise values 0.3126, 0.0229, 0.0230, 0.1207, 0.1253, 0.0623, 0.1392, 0.1700, 0.0559, 0.0033, 0.0030, 0.0625.]

HMM-Conditional-Chow-Liu : 

[Figure: emission structures shown for hidden states $S_t = 1$, $S_t = 2$, and $S_t = 3$.] [Kirshner et al 04]

Beyond Trees : 

- Can learn more complex structure
  - Optimality not guaranteed [Chickering 96; Srebro 03]
  - Structure and parameters may have to be learned in separate computations
  - Computationally expensive
- Independence model matches all univariate marginals
- Chow-Liu trees match all univariate and some bivariate marginals
- Unconstrained Bayesian or Markov networks
  - May have too few data points for the number of parameters
  - Even 3rd-order cliques may have zero probability mass

Log-linear or Logistic : 

[Figure: graph over the variables a, b, c, d.]

Maximum Entropy Method : 

- Given
  - Target distribution (empirical)
  - Set of features and corresponding constraints
- Example: a feature that is 1 when it rains at both station 1 and station 2
  - Interpretation of the corresponding constraint: the proportion of time it rains simultaneously at stations 1 and 2 is the same in the historical data and under the learned distribution
- Want to satisfy all of the constraints
[e.g., Jelinek 98]

MaxEnt Method (cont’d) : 

- Maximize the entropy of the model distribution subject to the constraints corresponding to the features
- The exponential-form distribution satisfying all of the feature constraints maximizes the log-likelihood of the data [e.g., Della Pietra et al 97]
- Such a solution is unique (the likelihood is concave)
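The symbols on these two slides were lost in the transcript. As a hedged reconstruction in standard notation, with $\tilde{P}$ for the empirical distribution, $F$ for the feature set, and $\lambda_f$ for the weight of feature $f$ (all three names are assumptions), the problem and the form of its solution can be written as:

```latex
\[
\max_{P}\; -\sum_{r} P(r)\,\log P(r)
\quad \text{subject to} \quad
\sum_{r} P(r)\, f(r) = \sum_{r} \tilde{P}(r)\, f(r) \quad \text{for all } f \in F
\]
\[
P(r) = \frac{1}{Z(\lambda)} \exp\!\Big( \sum_{f \in F} \lambda_f\, f(r) \Big),
\qquad
Z(\lambda) = \sum_{r'} \exp\!\Big( \sum_{f \in F} \lambda_f\, f(r') \Big)
\]
```

The sum over all configurations $r'$ in $Z(\lambda)$ is what makes exact computation expensive when the number of stations is large.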

HMM-Autologistic : 

[Hughes, Guttorp, and Charles 99]

Conditional Log-linear Distribution : 

[Figure: graph over the variables a, b, c, d, e.]

Conditional MaxEnt Method : 

- Extension of MaxEnt distribution to conditional distributions
- Target distribution
- Set of features and corresponding constraints
- Maximize conditional entropy subject to constraints
[e.g., Lafferty et al 01]

Learning parameters of MaxEnt models : 

- Assume the set of features is given
  - Need only … free parameters to learn
- Cannot be done in closed form
- Iterative algorithms: IS, GIS, IIS, conjugate gradients [Brown 59, Darroch and Ratcliff 72, Berger et al 96, Della Pietra et al 97, Goodman 02]
  - Require computation of … (or similar) per iteration
    - Exact computation is exponential in the size of the largest clique in the Markov network and proportional to the size of the data
    - Needs computation of the junction tree and requires message passing [e.g., Bach and Jordan 02]
  - May need a large number of iterations
- Want to reduce computation

Sigmoidal Belief Network : 

[Figure: network over the variables a, b, c, d.] [Neal 92]

Product of Univariate Conditional Maximum Entropy Models : 

- Approximate the target distribution as a product of univariate conditional MaxEnt distributions (PUC-MaxEnt)
- Parameters for each factor can be learned separately
- Requires summation over only a single modeled variable at a time, not the largest clique
- No message passing required
- Intuition: Bayesian network with factors modeled as conditional univariate MaxEnt distributions
  - Sigmoidal belief networks [Neal 92]
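For binary rainfall occurrence with pairwise features, each univariate conditional factor reduces to a logistic function of the variables it depends on, as in a sigmoidal belief network. The sketch below assumes a fixed station ordering in which station m may depend only on stations 0..m-1; the function name and the `bias`/`weights` parameterization are illustrative assumptions, not the notation of the talk.

```python
import math

def puc_log_prob(r, bias, weights):
    """log P(r) = sum over m of log P(r[m] | r[0..m-1]), with logistic factors.

    r          : list of 0/1 occurrences, one per station
    bias[m]    : offset of the factor for station m
    weights[m] : weights[m][j] couples station m to an earlier station j < m
    """
    total = 0.0
    for m, value in enumerate(r):
        activation = bias[m] + sum(weights[m][j] * r[j] for j in range(m))
        p_wet = 1.0 / (1.0 + math.exp(-activation))
        # Each factor is normalized over the two values of r[m] only,
        # so no global normalization constant is required.
        total += math.log(p_wet if value == 1 else 1.0 - p_wet)
    return total
```

Since every factor is normalized over a single variable, the product is a valid joint distribution by construction, and each factor's parameters can be fit independently of the others.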

Structure Learning : 

- Number of possible structures is super-exponential in the number of variables
- Finding the optimal solution is NP-hard [Chickering 96]
- Need to search over possible structures
- Search
  - Structure modification in the outer loop
  - Parameter estimation in the inner loop
- Restricting to bivariate interactions
  - Edge induction

HMM-PUC-MaxEnt : 

[Figure: for each hidden state ($S_t = 1$, $S_t = 2$, $S_t = 3$), a separate dependence structure over $R_t^1, \dots, R_t^4$.]

AR-HMM-PUC-MaxEnt : 

[Figure: for each hidden state ($S_t = 1$, $S_t = 2$, $S_t = 3$), a separate dependence structure over $R_t^1, \dots, R_t^4$ and $R_{t-1}^1, \dots, R_{t-1}^4$.]

Experimental Setup : 

- Data
  - Australia: 15 seasons, 184 days each, 30 stations
  - Queensland: 40 seasons, 197 days each, 11 stations
- Measuring predictive performance
  - Choose K (number of states)
  - Leave-n-out cross-validation
- Evaluation metrics
  - Log-likelihood
  - Error for prediction of a single entry given the rest
  - Difference in spatial correlation
  - Difference in persistence
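The last two metrics compare summary statistics of held-out data with the same statistics of data simulated from the model. The transcript does not define them, so the sketch below rests on two assumptions: persistence is taken to be the probability of a wet day following a wet day at a station, and spatial correlation is the Pearson correlation between the binary occurrence series of two stations. The function names are hypothetical.

```python
import math

def persistence(series):
    """Assumed definition: P(wet today | wet yesterday) for one 0/1 series."""
    wet_prev = sum(1 for a in series[:-1] if a == 1)
    wet_both = sum(1 for a, b in zip(series[:-1], series[1:]) if a == 1 and b == 1)
    return wet_both / wet_prev if wet_prev else float("nan")

def correlation(x, y):
    """Pearson correlation between two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mean_abs_difference(observed, simulated):
    """Average absolute gap between two lists of per-station (or per-pair) statistics."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)
```

For example, the persistence difference for one model could be computed by applying `persistence` to each station in the held-out seasons and in simulated seasons, then passing the two lists to `mean_abs_difference`.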

Southwestern Australia : 

- 1978-1992, May-October
- 15 seasons
- 184 days
- 30 stations

Scaled out-of-sample log-likelihood (SW Australia) : 

Out-of-sample predictive error (SW Australia) : 

Examples of Weather States (HMM-CI) : 

Examples of Weather States (HMM-CL) : 

Examples of Weather States (HMM-PUC-MaxEnt) : 

Queensland (Northeastern Australia) : 

- 1958-1998, October-April
- 40 seasons
- 197 days
- 11 stations

Correlation and Persistence of Queensland Data : 

Scaled out-of-sample log-likelihood (Queensland) : 

Out-of-sample correlation difference (Queensland) : 

Out-of-sample persistence difference (Queensland) : 

Summary : 

- Important and interesting application
  - Lots of data available, lots of problems to be solved
- Use tree-structured distributions
  - Can find parameters and structure at the same time
- If trees are not sufficient, prepare cycle servers
  - Learning complexity jumps once loops are introduced

Contributions : 

- New models for multi-site rainfall occurrence and amounts
- Conditional Chow-Liu forest model for multivariate data [Kirshner, Smyth and Robertson, UAI-2004]
- HMM with Chow-Liu and conditional Chow-Liu trees for modeling multivariate time series [Kirshner, Smyth and Robertson, UAI-2004]
- HMM with Product-of-Univariate-Conditional MaxEnt distributions (PUC-MaxEnt) [Kirshner 2005]
- HMM with mixtures of exponentials [Robertson et al, in press]
- HMM with tree-structured mixtures

Software : 

- (M)ulti(V)ariate (N)onhomogeneous (H)idden (M)arkov (M)odels Toolbox
- Free software for multivariate time series modeling with HMM as a backbone
- Large selection of implemented emission distributions
- http://www.datalab.uci.edu/software/mvnhmm

Future Work : 

- Rainfall
  - Filling in missing data
  - Modeling large regions
  - Factorized state space
  - Using satellite data
  - OLR fields
  - Subseasonal predictions
  - Selecting good input variables
  - Other models for amounts
- Machine Learning
  - Learning structure of the distribution from data
  - Modeling in the presence of missing data
  - Loops in HMM-Conditional-Chow-Liu and log-linear models
  - Factorized state-space models
  - Continuous hidden-state models
  - Modeling of multivariate real-valued non-Gaussian distributions

Correlation for 4-state HMM-CI : 

[Robertson et al 04]

Persistence for 4-state HMM-CI : 

[Robertson et al 04]

Inference for NHMMs : 

- Inference (calculating …)
- Forward-Backward: recursively compute …

Forecasting Precipitation : 

- Can we use this model for forecasting?
  - Same predicted expected values, no variability
- Need additional information about the seasons to be forecast

HMM-CI: Is It Sufficient? : 

- Simple yet effective
- Few parameters
- Implicit marginal spatial dependency through the hidden states
- Requires a large number of hidden states
- Points to exploration of dependency models

Limitations of Chow-Liu Structures : 
