UE ANC2006

Added: September 30, 2007 This presentation is Public

Regression under Covariate Shift : Masashi Sugiyama, Department of Computer Science, Tokyo Institute of Technology. http://sugiyama-www.cs.titech.ac.jp/~sugi sugi@cs.titech.ac.jp


Abstract : A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points used for testing. This assumption is violated, for example, when the learned function is used to interpolate or extrapolate outside the training region. The situation where the training and test input points follow different distributions is called covariate shift. Under covariate shift, standard supervised learning techniques such as maximum likelihood estimation and cross-validation do not work well because their unbiasedness no longer holds. In this talk, I present (non-Bayesian) supervised learning techniques that retain desirable theoretical properties even under covariate shift.


Research Interests : Statistical machine learning: methodologies and applications.


Methodologies : Small sample model selection for kernel machines.
- Sugiyama, M. & Müller, K.-R. The subspace information criterion for infinite dimensional hypothesis spaces. Journal of Machine Learning Research, vol.3 (Nov), pp.323-359, 2002.
- Sugiyama, M., Kawanabe, M., & Müller, K.-R. Trading variance reduction with unbiasedness: The regularized subspace information criterion for robust model selection in kernel regression. Neural Computation, vol.16, no.5, pp.1077-1104, 2004.


Methodologies (cont.) : Bias-considered active learning.
- Sugiyama, M. & Ogawa, H. Incremental active learning for optimal generalization. Neural Computation, vol.12, no.12, pp.2909-2940, 2000.
- Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research, vol.7 (Jan), pp.141-166, 2006.


Methodologies (cont.) : Dimensionality reduction for data visualization.
- Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V., & Müller, K.-R. In search of non-Gaussian components of a high-dimensional distribution. Journal of Machine Learning Research, vol.7 (Feb), pp.247-282, 2006.
- Sugiyama, M., submitted.


Methodologies (cont.) : Online learning in kernel methods (several papers in Neural Networks, etc.). RKHS-based model selection (several papers in Neural Computation, Neural Networks, IEEE Transactions on Neural Networks, Machine Learning, etc.).


Methodologies (cont.) : Covariate shift adaptation (today’s topic!).
- Sugiyama, M. & Müller, K.-R. Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions, to appear.
- Sugiyama, M., Blankertz, B., Krauledat, M., Dornhege, G., & Müller, K.-R., submitted.


Applications : Optical surface profiling: 4 patents applied for, already commercialized (Toray Engineering). http://www.scn.tv/corp/torayins/index-eng.html


Applications (cont.) : Brain-computer interface (with Fraunhofer FIRST.IDA, Berlin). http://www.bbci.de/


Applications (cont.) : Rainfall prediction; image restoration; robotics (with Sethu Vijayakumar, ongoing); etc.


Today’s Topic : Regression under covariate shift. A brief introduction to the regression problem. What is covariate shift?


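Linear Regression Problem : We are given training samples (input-output pairs) generated from an unknown learning target function. The linear regression model is a linear combination of fixed basis functions, weighted by parameters that are to be learned.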
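
The slide's formulas were not preserved, so the following is only a minimal sketch of what a linear-in-parameters model looks like. The three basis functions (constant, linear, quadratic) and the names `BASIS` and `predict` are illustrative assumptions, not taken from the talk.

```python
from typing import Callable, Sequence

# Hypothetical basis functions (an assumption for illustration):
# constant, linear, and quadratic terms.
BASIS: Sequence[Callable[[float], float]] = (
    lambda x: 1.0,
    lambda x: x,
    lambda x: x * x,
)


def predict(alpha: Sequence[float], x: float, basis=BASIS) -> float:
    """Evaluate the linear regression model sum_i alpha_i * phi_i(x)."""
    if len(alpha) != len(basis):
        raise ValueError("need exactly one parameter per basis function")
    return sum(a * phi(x) for a, phi in zip(alpha, basis))


# 1*1 + 2*2 + 3*(2*2)
value = predict([1.0, 2.0, 3.0], 2.0)
```

The model is linear in the parameters even though it can be nonlinear in the input, which is what later allows least-squares solutions in closed form.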


Goal of Regression Problem : A test input point is an input point not included in the training set. Test error: the prediction error at a test input point. Generalization error: the expected test error over all test input points. Goal: learn the parameters so that the generalization error is minimized.
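
The definitions above can be written out as follows. The notation is an assumption, since the slide's formulas were lost: f is the learning target, \hat{f} the learned function, t a test input point, and p_test the test input density. The talk's own definition may additionally average over the training noise.

```latex
% Assumed notation: f (target), \hat{f} (learned function),
% t (test input point), p_{\mathrm{test}} (test input density).
\text{Test error at } t:\quad \bigl(\hat{f}(t) - f(t)\bigr)^2
\qquad\qquad
\text{Generalization error:}\quad
G = \int \bigl(\hat{f}(t) - f(t)\bigr)^2 \, p_{\mathrm{test}}(t)\, dt
```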


Common Assumption : Almost all supervised learning methods proposed so far assume that the test input points follow the same distribution as the training input points, so the generalization error is an expectation over the training input distribution; e.g., Wahba (1990), Bishop (1995), Vapnik (1998), Schölkopf & Smola (2002).


Covariate Shift : Test and training input points follow different distributions, so the generalization error is an expectation over the test input distribution rather than the training one. Is covariate shift important to explore? YES!
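
To see why the choice of input distribution matters, the sketch below measures the error of one fixed predictor over two different input regions. Everything in it is a hypothetical choice for illustration: the target x^2, the straight-line predictor, and the regions [0, 1] and [1, 2].

```python
def target(x):
    """Hypothetical learning target (an assumption for illustration)."""
    return x * x


def learned(x):
    """Hypothetical fixed predictor: a line that fits the target well on [0, 1]."""
    return x - 1.0 / 6.0


def midpoints(lo, hi, n):
    """n evenly spaced points covering [lo, hi], used as a stand-in for uniform inputs."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]


def mean_squared_error(inputs):
    """Average squared prediction error over the given input points."""
    return sum((learned(x) - target(x)) ** 2 for x in inputs) / len(inputs)


# Same predictor, two input distributions: the error differs greatly.
err_train_region = mean_squared_error(midpoints(0.0, 1.0, 1000))
err_test_region = mean_squared_error(midpoints(1.0, 2.0, 1000))
```

The predictor is accurate where the training inputs lie and poor where the test inputs lie, so an error measured over the training input distribution says little about the error under the test input distribution.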


Examples : (Weak) extrapolation: predict output values outside the training region. [Figure: training samples and test samples]


Examples (cont.) : Covariate shift is conceivable in many real applications, such as:
- Imbalanced classification, e.g., in bioinformatics
- Nonstationarity compensation, e.g., in brain-computer interface
- Online system adaptation, e.g., in reinforcement learning
- Sample selection bias, e.g., in economics


Examples (cont.) : Active learning (experimental design): the user designs the training input distribution so that the generalization capability is maximized, while the test input points come from the environment, which we cannot control. So covariate shift naturally occurs!
- Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research, vol.7 (Jan), pp.141-166, 2006.


Plan of My Talk :
- Introduce covariate shift
- Illustrate how standard supervised learning methods are affected by covariate shift
- Explain alternative methods
- Numerical examples


Linear Regression (revisited) : The setting is as before: an unknown learning target, training samples, and a linear regression model. This time, however, our model may not be correct.


Covariate Shift : Test and training input points follow different distributions, and the generalization error is measured over the test input distribution.


Ordinary Least-Squares under Covariate Shift : For correct models, bias is asymptotically minimized. For misspecified models, bias is not minimized even asymptotically. We want to reduce bias.


Law of Large Numbers : The sample average converges to the population mean. We want to estimate the expectation over test input points using only training input points.


Importance-Weighted Average : Importance: the ratio of test to training input densities. Importance-weighted average: the average over training input points with each term multiplied by its importance (cf. importance sampling).
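
A minimal sketch of the importance-weighted average, under assumed densities that are not from the talk: training inputs are uniform on [0, 1] (density 1) and test inputs have density 2x on [0, 1], so the importance is 2x. The quantity being averaged, h(x) = x, is also just an illustrative choice; its expectation is 1/2 under the training density and 2/3 under the test density.

```python
import random


def importance(x):
    """Assumed density ratio p_test(x) / p_train(x) = 2x / 1 on [0, 1]."""
    return 2.0 * x


def h(x):
    """Illustrative quantity whose expectation over test inputs we want."""
    return x


rng = random.Random(0)
train_inputs = [rng.random() for _ in range(10000)]  # drawn from p_train only

# Plain sample average: converges to the expectation under p_train (1/2).
plain_average = sum(h(x) for x in train_inputs) / len(train_inputs)

# Importance-weighted average: converges to the expectation under p_test (2/3),
# even though no test input point was ever sampled.
weighted_average = sum(importance(x) * h(x) for x in train_inputs) / len(train_inputs)
```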


Importance-Weighted LS for Covariate Shift : Each training sample's squared error is weighted by its importance, which is assumed known and strictly positive. Even for misspecified models, bias is minimized asymptotically.


Importance-Weighted LS (cont.) : However, the variance of IWLS is larger than that of OLS (cf. BLUE). We want to reduce variance. We reduce variance by adding a small bias to IWLS (e.g., changing the weights, regularization).


Adaptive IWLS : (Shimodaira, 2000) A tuning parameter interpolates between OLS (large bias, small variance) and IWLS (small bias, large variance); intermediate values trade bias against variance.
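
In Shimodaira's formulation the interpolation is obtained by raising the importance to a power between 0 and 1. The notation below is an assumption, since the slide's formula was lost: (x_i, y_i) are the n training samples, \hat{f}(x; \alpha) is the model with parameter \alpha, and \lambda is the tuning parameter.

```latex
% Assumed notation; lambda = 0 gives OLS, lambda = 1 gives IWLS.
\hat{\alpha}_{\lambda}
  = \operatorname*{arg\,min}_{\alpha}
    \sum_{i=1}^{n}
    \left( \frac{p_{\mathrm{test}}(x_i)}{p_{\mathrm{train}}(x_i)} \right)^{\lambda}
    \bigl( \hat{f}(x_i; \alpha) - y_i \bigr)^2,
  \qquad 0 \le \lambda \le 1
```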


Model Selection : We want to determine the tuning parameter so that the generalization error is minimized. However, the generalization error is inaccessible. We derive an estimator of the generalization error and choose the tuning parameter that minimizes the estimate.


Cross-Validation : A standard method for gen. error estimation.
- Divide the training samples into k groups.
- Train a learning machine with k-1 groups.
- Validate the trained machine using the remaining group.
- Repeat this for all combinations and output the mean validation error as the gen. error estimate.
[Figure: groups 1, 2, ..., k-1 used for training, group k used for validation]
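
The procedure can be sketched in a few lines. The function and learner names are hypothetical, the fold assignment (sample i goes to group i mod k) is one arbitrary choice, and the "learning machine" used in the example is a trivial one that predicts the mean training output.

```python
from statistics import mean


def k_fold_cv(xs, ys, fit, predict, k):
    """Mean validation squared error over k folds; sample i belongs to fold i % k."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        valid_idx = [i for i in range(n) if i % k == fold]
        # Train on the k-1 other groups.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Validate on the held-out group.
        fold_errors.append(
            mean((predict(model, xs[i]) - ys[i]) ** 2 for i in valid_idx)
        )
    return mean(fold_errors)


def fit_mean(xs, ys):
    """Hypothetical learner: ignore the inputs and remember the mean output."""
    return mean(ys)


def predict_mean(model, x):
    """Predict the remembered mean for any input."""
    return model


cv_error = k_fold_cv([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], fit_mean, predict_mean, k=2)
```

Every validation point here is drawn from the training input distribution, which is why the resulting estimate targets the error under that distribution.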


CV under Covariate Shift : CV is almost unbiased without covariate shift. However, it is heavily biased under covariate shift. We want a better generalization error estimator! [Figure: cross-validation estimate, true gen. error, and our new method]
- Sugiyama, M. & Müller, K.-R. Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions (to appear).


Decomposition of Generalization Error : The generalization error decomposes into three terms: one that is accessible, one that is constant (and can be ignored), and one that must be estimated. We estimate the last.
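
One way to obtain such a three-term decomposition is to expand the square in the generalization error. This is a reconstruction under assumed notation (f the learning target, \hat{f} the learned function, p_test the test input density, taken as known), not the slide's original formula, which may also include an expectation over the noise.

```latex
% Reconstruction under assumed notation.
G = \int \bigl(\hat{f}(t) - f(t)\bigr)^2 p_{\mathrm{test}}(t)\, dt
  = \underbrace{\int \hat{f}(t)^2 \, p_{\mathrm{test}}(t)\, dt}_{\text{accessible}}
  \; - \; 2 \underbrace{\int \hat{f}(t)\, f(t)\, p_{\mathrm{test}}(t)\, dt}_{\text{to be estimated}}
  \; + \; \underbrace{\int f(t)^2 \, p_{\mathrm{test}}(t)\, dt}_{\text{constant (ignored)}}
```

The first term involves only the learned function and the test density, the last does not depend on the learned function at all, and only the cross term involves the unknown target together with the learned function.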


Decomposition of Learning Target Function : The learning target function is decomposed into the part that the model can express, given by the optimal parameter, and a residual.


Estimating (cont.) : IWLS for linear models has an analytic solution.
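
The slide's formula was lost; the standard closed form of weighted least squares is given here as a stand-in. The notation is an assumption: X is the n-by-b design matrix of basis function values, W the diagonal matrix of importance weights, y the vector of training outputs, and X^T W X is assumed invertible.

```latex
% Standard weighted least-squares solution, assumed notation.
\hat{\alpha} = \bigl( X^{\top} W X \bigr)^{-1} X^{\top} W y,
\qquad
X_{ij} = \varphi_j(x_i),
\qquad
W = \operatorname{diag}\bigl( w(x_1), \ldots, w(x_n) \bigr),
\qquad
w(x) = \frac{p_{\mathrm{test}}(x)}{p_{\mathrm{train}}(x)}
```

The learned parameter is therefore a linear function of the training outputs y.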


Estimating [...] : [...] is unknown. If an unbiased estimator of [...] is available, simply replacing [...] with [...] gives an unbiased estimator of [...]. However, if the same training samples are used for obtaining [...], correlation exists.


Unbiased Estimation of [...] : Suppose we have [...], which gives a linear unbiased estimator of [...], and an unbiased estimator of the noise variance. Then we can construct an unbiased estimator of [...]. However, [...] are not always available, so we use approximations instead.


Approximations of [...] : If the model is correct, [...]. If the model is misspecified, [...].


New Gen. Error Estimator : (Sugiyama & Müller, 2006)


Unbiasedness : The estimator is exactly unbiased if the model is correct, almost unbiased if the model is almost correct, and asymptotically unbiased in general.


Model Comparison : The purpose of estimating the gen. error is to select a good value of the tuning parameter. To this end, we just need to accurately estimate the difference of generalization errors for different values of the tuning parameter.


Model Comparison (cont.) : If the following holds, we can select a better model on average: [...]


Model Comparison (cont.) : We simplify the criterion since sgn is hard to deal with. Effective in model comparison: [...]. Asymptotically effective in model comparison: [...].


Model Comparison (cont.) : The proposed gen. error estimator is effective in model comparison if the model is correct, and asymptotically effective in model comparison in general.


Simulation (Toy) : [Figure]


Results : [Figure: 10-fold cross-validation, true gen. error, and proposed estimator]


Simulation (Abalone from DELVE) : Estimate the age of abalones from 7 physical measurements. We add bias to the 4th attribute (weight of abalones). Training and test input densities are estimated by a standard kernel density estimator.
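
The talk does not say which kernel or bandwidth was used, so the following is only a sketch of how an importance estimate could be built from two kernel density estimates. The Gaussian kernel, the normal-reference bandwidth rule, the one-dimensional synthetic inputs, and all names are assumptions.

```python
import math
import random
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a 1-D Gaussian kernel density estimate as a function of x."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed choice of bandwidth).
        bandwidth = 1.06 * stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Synthetic stand-ins for training and test inputs (not the abalone data):
# the test inputs are shifted to the right of the training inputs.
rng = random.Random(0)
train_inputs = [rng.gauss(0.0, 1.0) for _ in range(500)]
test_inputs = [rng.gauss(1.0, 1.0) for _ in range(500)]

p_train = gaussian_kde(train_inputs)
p_test = gaussian_kde(test_inputs)


def importance(x):
    """Estimated ratio of test to training input densities at x."""
    return p_test(x) / p_train(x)
```

Points that are typical of the test inputs receive an importance above 1 and points typical only of the training inputs receive an importance below 1. The ratio becomes unreliable where the estimated training density is close to zero.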


Gen. Error Estimation : Mean over 300 trials. [Figure: 10-fold CV, true gen. error, and proposed estimator]


Test Error After Model Selection : [Table: test error after model selection for extrapolation in the 4th attribute and in the 6th attribute, compared by t-test (95%)]


Conclusions :
- Covariate shift: training and test input distributions are different.
- Ordinary least-squares: biased.
- Importance-weighted LS: unbiased but large variance.
- Adaptive IWLS: model selection needed.
- Cross-validation: biased.
- Proposed estimator: possesses unbiasedness and effectiveness in model comparison.


Ongoing Works :
- Unbiased gen. error estimators can have large variance and thus be unstable. Regularize gen. error estimators following, e.g., Sugiyama, M., Kawanabe, M., & Müller, K.-R. Trading variance reduction with unbiasedness: The regularized subspace information criterion for robust model selection in kernel regression. Neural Computation, vol.16, no.5, pp.1077-1104, 2004.
- Gen. error estimation under covariate shift for classification (Sugiyama et al., submitted).
- Application, e.g., to brain-computer interface (with Fraunhofer FIRST, Berlin).


Bayesian? : Is it possible to deal with covariate shift in a Bayesian framework?