UE ANC2006 Gourangi
Added: September 30, 2007. License: All Rights Reserved.

Presentation Transcript

Regression under Covariate Shift:
Masashi Sugiyama
Department of Computer Science, Tokyo Institute of Technology
http://sugiyama-www.cs.titech.ac.jp/~sugi
sugi@cs.titech.ac.jp

Abstract:
One of the common assumptions in supervised learning is that the input points in the training set follow the same probability distribution as the input points used for testing. However, this assumption is not satisfied, for example, when outputs must be interpolated or extrapolated outside the training region. The situation where the training input points and test input points follow different distributions is called covariate shift. Under covariate shift, standard supervised learning techniques such as maximum likelihood estimation or cross-validation do not work well since their unbiasedness is no longer maintained. In this talk, I present (non-Bayesian) supervised learning techniques which possess desirable theoretical properties even under covariate shift.

Research Interests:
Statistical machine learning
Methodologies
Applications

Methodologies:
Small sample model selection for kernel machines
Sugiyama, M. & Müller, K.-R. The subspace information criterion for infinite dimensional hypothesis spaces.
Journal of Machine Learning Research, vol.3 (Nov), pp.323-359, 2002.
Sugiyama, M., Kawanabe, M., & Müller, K.-R. Trading variance reduction with unbiasedness: The regularized subspace information criterion for robust model selection in kernel regression. Neural Computation, vol.16, no.5, pp.1077-1104, 2004.

Methodologies (cont.):
Bias-considered active learning
Sugiyama, M. & Ogawa, H. Incremental active learning for optimal generalization. Neural Computation, vol.12, no.12, pp.2909-2940, 2000.
Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research, vol.7 (Jan), pp.141-166, 2006.

Methodologies (cont.):
Dimensionality reduction for data visualization
Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V., & Müller, K.-R. In search of non-Gaussian components of a high-dimensional distribution. Journal of Machine Learning Research, vol.7 (Feb), pp.247-282, 2006.
Sugiyama, M., submitted.

Methodologies (cont.):
Online learning in kernel methods: several papers in Neural Networks, etc.
RKHS-based model selection: several papers in Neural Computation, Neural Networks, IEEE Transactions on Neural Networks, Machine Learning, etc.

Methodologies (cont.):
Covariate shift adaptation (today’s topic!)
Sugiyama, M. & Müller, K.-R. Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions, to appear.
Sugiyama, M., Blankertz, B., Krauledat, M., Dornhege, G., & Müller, K.-R., submitted.

Applications:
Optical surface profiling: 4 patents applied, already commercialized (Toray Engineering)
http://www.scn.tv/corp/torayins/index-eng.html

Applications (cont.):
Brain-computer interface (with Fraunhofer FIRST.IDA, Berlin)
http://www.bbci.de/

Applications (cont.):
Rainfall prediction
Image restoration
Robotics (with Sethu Vijayakumar, ongoing!)
Etc.

Today’s Topic:
Regression under covariate shift
Brief introduction of regression problem
What is covariate shift?

Linear Regression Problem:
Unknown learning target: [...]
Training samples: [...]
Linear regression model: [...] (a parameter vector and basis functions)

Goal of Regression Problem:
Test input point: [...] (not included in the training set)
Test error: prediction error at the test input point
Generalization error: expected test error over all test input points
Goal: learn the parameter so that the generalization error is minimized

Common Assumption:
Almost all supervised learning methods proposed so far assume that test input points follow the same distribution as the training input points.
Gen. error: [...]
E.g., Wahba (1990), Bishop (1995), Vapnik (1998), Schölkopf & Smola (2002)

Covariate Shift:
Test and training input points follow different distributions.
Gen. error: [...]
Is covariate shift important to explore? YES!

Examples:
(Weak) extrapolation: predict output values outside the training region
[Figure: training samples and test samples]

Examples (cont.):
Covariate shift is conceivable in many real applications, such as:
Imbalanced classification, e.g., in bioinformatics
Nonstationarity compensation, e.g., in brain-computer interface
Online system adaptation, e.g., in reinforcement learning
Sample selection bias, e.g., in economics

Examples (cont.):
Active learning (experimental design): the user designs the training input distribution so that generalization capability is maximized.
Test input points come from the environment (which we cannot control).
So covariate shift naturally occurs!
Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research, vol.7 (Jan), pp.141-166, 2006.
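
The formulas on the regression and covariate-shift slides above did not survive in this transcript. The following is a sketch of how those quantities are commonly written; every symbol here ($f$, $\hat{f}$, $\alpha_i$, $\varphi_i$, $b$, $n$, $\epsilon_i$, $p_{\mathrm{train}}$, $p_{\mathrm{test}}$, $J$) is an assumption chosen for illustration and may differ from the speaker's notation.

```latex
\begin{align*}
  % training samples: noisy observations of the unknown target f
  y_i &= f(x_i) + \epsilon_i, \qquad x_i \sim p_{\mathrm{train}}(x), \quad i = 1, \dots, n, \\
  % linear regression model: linear in the parameters, basis functions fixed
  \hat{f}(x) &= \sum_{i=1}^{b} \alpha_i \, \varphi_i(x), \\
  % generalization error: expected squared test error over test inputs
  J &= \mathbb{E}_{x \sim p_{\mathrm{test}}}\!\left[ \bigl( \hat{f}(x) - f(x) \bigr)^2 \right]
     = \int \bigl( \hat{f}(x) - f(x) \bigr)^2 \, p_{\mathrm{test}}(x) \, dx .
\end{align*}
```

In this notation the common assumption reads $p_{\mathrm{test}} = p_{\mathrm{train}}$, while covariate shift means $p_{\mathrm{test}} \neq p_{\mathrm{train}}$ with the target $f$ unchanged.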
Plan of My Talk:
Introduce covariate shift
Illustrate how standard supervised learning methods are affected by covariate shift
Explain alternative methods
Numerical examples

Linear Regression (revisited):
Unknown learning target: [...]
Training samples: [...]
Linear regression model: [...]
Our model may not be correct.

Covariate Shift:
Test and training input points follow different distributions.
Gen. error: [...]

Ordinary Least-Squares under Covariate Shift:
For correct models, bias is asymptotically minimized.
For misspecified models, bias is not minimized even asymptotically.
We want to reduce bias.

Law of Large Numbers:
Sample average converges to the population mean: [...]
We want to estimate the expectation over test input points using only training input points.

Importance-Weighted Average:
Importance: ratio of test and training input densities
Importance-weighted average: [...] (cf. importance sampling)

Importance-Weighted LS for Covariate Shift:
[...]
Even for misspecified models, bias is minimized asymptotically.
The importance is assumed known and strictly positive.

Importance-Weighted LS (cont.):
However, the variance of IWLS is larger than that of OLS (cf. BLUE).
We want to reduce variance.
We reduce variance by adding a small bias to IWLS (e.g., changing the weight, regularization).

Adaptive IWLS:
(Shimodaira, 2000)
Large bias, small variance
(Intermediate)
Small bias, large variance

Model Selection:
We want to determine the tuning parameter so that the generalization error is minimized.
However, the generalization error is inaccessible.
We derive an estimator of the generalization error and determine the tuning parameter so that the estimator is minimized.

Cross-Validation:
A standard method for gen. error estimation
Divide training samples into k groups.
Train a learning machine with k-1 groups.
Validate the trained machine using the rest.
Repeat this for all combinations and output the mean validation error as the generalization error estimate.
[Figure: Group 1, Group 2, ..., Group k-1, Group k; training and validation]

CV under Covariate Shift:
CV is almost unbiased without covariate shift.
However, it is heavily biased under covariate shift.
We want to have a better generalization error estimator!
[Figure: cross-validation, true gen. error, our new method]
Sugiyama, M. & Müller, K.-R. Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions (to appear).

Decomposition of Generalization Error:
We estimate [...]
Term labels: accessible; constant (ignored); estimated

Decomposition of Learning Target Function:
[...]
Optimal parameter: [...]

Estimating [...] (cont.):
IWLS for linear models has an analytic solution: [...]

Estimating [...]:
[...] is unknown. If an unbiased estimator of [...] is available, simply replacing [...] with [...] gives an unbiased estimator of [...]: [...]
However, if the same training samples are used for obtaining [...], correlation exists.

Unbiased Estimation of [...]:
Suppose we have [...], which gives a linear unbiased estimator of [...].
[...]: unbiased estimator of the noise variance
Then we can construct an unbiased estimator of [...] by [...]
However, [...] are not always available. Use approximations instead.

Approximations of [...]:
If the model is correct, [...]
If the model is misspecified, [...]

New Gen. Error Estimator:
[...] (Sugiyama & Müller, 2006)

Unbiasedness:
Exactly unbiased if the model is correct: [...]
Almost unbiased if the model is almost correct: [...]
Asymptotically unbiased in general: [...]
Bias: [...]

Model Comparison:
The purpose of estimating the gen. error is to select a good value of the tuning parameter.
To this end, we just need to accurately estimate the difference of generalization errors for different values of the tuning parameter.

Model Comparison (cont.):
If the following holds, we can select a better model on average.
[...]
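
The importance-weighted and adaptive least-squares slides above can be illustrated with a small sketch. It assumes a toy setting that is not taken from the slides: Gaussian training and test input densities that are known exactly, a sinc target, a straight-line model, and the hypothetical names `gaussian_pdf`, `adaptive_iwls` and `lam`. The weights are the importance raised to a power `lam`, in the spirit of the flattening used by Shimodaira (2000): `lam = 0` gives ordinary least squares and `lam = 1` gives plain importance-weighted least squares.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def adaptive_iwls(X, y, importance, lam):
    """Weighted least squares with per-sample weights importance**lam.

    Solves (X^T W X) alpha = X^T W y, where W is the diagonal weight matrix.
    """
    weights = importance ** lam
    XtW = X.T * weights          # scales the column of X.T for sample i by weights[i]
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(0)

# Assumed toy problem: training inputs around 1, test inputs around 2.
n = 200
x_tr = rng.normal(1.0, 0.5, n)
y_tr = np.sinc(x_tr) + rng.normal(0.0, 0.1, n)
X_tr = np.column_stack([np.ones(n), x_tr])       # misspecified straight-line model

# Importance = test density / training density, assumed known here.
importance = gaussian_pdf(x_tr, 2.0, 0.25) / gaussian_pdf(x_tr, 1.0, 0.5)

# Large test set used only to measure the error against the noise-free target.
x_te = rng.normal(2.0, 0.25, 10000)
X_te = np.column_stack([np.ones_like(x_te), x_te])
f_te = np.sinc(x_te)

test_err = {}
for lam in (0.0, 0.5, 1.0):
    alpha = adaptive_iwls(X_tr, y_tr, importance, lam)
    test_err[lam] = float(np.mean((X_te @ alpha - f_te) ** 2))
    print(f"lam = {lam:.1f}: test error = {test_err[lam]:.4f}")
```

In this particular toy setting the model is badly misspecified, so the weighted fit should generalize much better to the test region than the unweighted one. Which intermediate `lam` is best depends on the sample size and noise level, which is the model selection problem the talk addresses.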
Model Comparison (cont.):
We simplify the criterion since sgn is hard to deal with.
Effective in model comparison: [...]
Asymptotically effective in model comparison: [...]

Model Comparison (cont.):
The proposed gen. error estimator is
effective in model comparison if the model is correct;
asymptotically effective in model comparison in general.

Simulation (Toy):
[...]

Results:
[Figure: 10-fold cross-validation, true gen. error, proposed estimator]

Simulation (Abalone from DELVE):
Estimate the age of abalones from 7 physical measurements.
We add bias to the 4th attribute (weight of abalones).
Training and test input densities are estimated by a standard kernel density estimator.

Gen. Error Estimation:
Mean over 300 trials
[Figure: 10CV, true gen. error, proposed]

Test Error After Model Selection:
T-test (95%)
[Results: extrapolation in 4th attribute; extrapolation in 6th attribute]

Conclusions:
Covariate shift: training and test input distributions are different.
Ordinary least-squares: biased.
Importance-weighted LS: unbiased but large variance.
Adaptive IWLS: model selection needed.
Cross-validation: biased.
Proposed estimator: possesses unbiasedness and effectiveness in model comparison.

Ongoing Works:
Unbiased gen. error estimators can have large variance and thus be unstable.
Regularize gen. error estimators following, e.g., Sugiyama, M., Kawanabe, M., & Müller, K.-R. Trading variance reduction with unbiasedness: The regularized subspace information criterion for robust model selection in kernel regression. Neural Computation, vol.16, no.5, pp.1077-1104, 2004.
Gen. error estimation under covariate shift for classification (Sugiyama et al., submitted).
Application, e.g., to brain-computer interface (with Fraunhofer FIRST, Berlin)

Bayesian?:
Is it possible to deal with covariate shift in a Bayesian framework?
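
The conclusion that ordinary cross-validation is biased under covariate shift can be checked on a toy problem. The sketch below is not the speaker's simulation: the Gaussian input densities, the sinc target, the straight-line model and the helper names `design`, `ols_fit` and `kfold_cv_error` are all assumptions for illustration. It compares an unweighted 10-fold CV estimate, which only sees the training region, with the error measured on samples from a shifted test distribution.

```python
import numpy as np

def design(x):
    """Design matrix for the assumed straight-line model a0 + a1 * x."""
    return np.column_stack([np.ones_like(x), x])

def ols_fit(X, y):
    """Ordinary least-squares parameter estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def kfold_cv_error(X, y, k, rng):
    """Mean squared validation error over k folds (ordinary, unweighted CV)."""
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, fold in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        alpha = ols_fit(X[train], y[train])
        errors.append(np.mean((X[fold] @ alpha - y[fold]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
noise_std = 0.1

# Assumed toy problem: training inputs around 1, test inputs around 2.
x_tr = rng.normal(1.0, 0.5, 200)
y_tr = np.sinc(x_tr) + rng.normal(0.0, noise_std, x_tr.size)
x_te = rng.normal(2.0, 0.25, 10000)
y_te = np.sinc(x_te) + rng.normal(0.0, noise_std, x_te.size)

cv_error = kfold_cv_error(design(x_tr), y_tr, 10, rng)

# Error of the model trained on all training samples, measured on test samples.
alpha = ols_fit(design(x_tr), y_tr)
test_error = float(np.mean((design(x_te) @ alpha - y_te) ** 2))

print(f"10-fold CV estimate: {cv_error:.4f}")
print(f"test error:          {test_error:.4f}")
```

Because the validation folds are drawn from the training input distribution, the CV estimate here reflects the fit around the training region and should come out well below the error on the shifted test inputs.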