Presentation Transcript
INTRODUCTION TO Machine Learning: Lecture Slides for
Slide 2:Local Models
Introduction :3 Introduction Divide the input space into local regions and learn simple (constant/linear) models in each patch
Unsupervised: Competitive, online clustering
Supervised: Radial-basis functions, mixture of experts
Competitive Learning :4 Competitive Learning
Slide 5:5 Winner-take-all network
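The update equations for these two slides did not survive in the transcript. As a rough illustration, competitive learning with a winner-take-all rule can be sketched as online k-means: for each input, only the closest center moves toward it. The function name `online_kmeans`, the learning rate `eta`, the initialization from random training points, and the toy data are assumptions made for this sketch, not taken from the slides.

```python
import random

def online_kmeans(data, k, eta=0.1, epochs=20, seed=0):
    """Online competitive learning: only the winning center moves toward each input."""
    rng = random.Random(seed)
    # Assumption of this sketch: start from k distinct training points.
    centers = [list(x) for x in rng.sample(data, k)]
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            # Winner-take-all: find the center closest to x.
            winner = min(
                range(k),
                key=lambda j: sum((a - m) ** 2 for a, m in zip(x, centers[j])),
            )
            # Move only the winner a fraction eta of the way toward x.
            centers[winner] = [m + eta * (a - m) for a, m in zip(x, centers[winner])]
    return centers

# Two well-separated toy clusters (made-up data).
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = sorted(online_kmeans(data, k=2))
```

On this toy set one center should settle in each group. With less cleanly separated data, a poorly initialized unit may never win and so never move.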
Adaptive Resonance Theory :6 Adaptive Resonance Theory Incremental; add a new cluster if not covered; defined by vigilance, ρ (Carpenter and Grossberg, 1988)
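The incremental idea can be sketched as follows: if the nearest existing center is farther from the input than the vigilance threshold, the input starts a new cluster. Otherwise the nearest center is updated. This is a simplified illustration of the vigilance test only, not the full ART architecture. The name `vigilance_clustering`, the Euclidean distance, and the update rate `eta` are assumptions of the sketch.

```python
def vigilance_clustering(data, rho, eta=0.5):
    """Incremental clustering: add a cluster when no center is within distance rho."""
    centers = []
    for x in data:
        if centers:
            dists = [
                sum((a - m) ** 2 for a, m in zip(x, c)) ** 0.5 for c in centers
            ]
            nearest = min(range(len(centers)), key=lambda j: dists[j])
        if not centers or dists[nearest] > rho:
            # Input is not covered by any existing cluster: create a new one at x.
            centers.append(list(x))
        else:
            # Input is covered: move the nearest center toward it.
            centers[nearest] = [
                m + eta * (a - m) for a, m in zip(x, centers[nearest])
            ]
    return centers

data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.1, 0.3)]
coarse = vigilance_clustering(data, rho=1.0)  # large vigilance: few clusters
fine = vigilance_clustering(data, rho=0.1)    # small vigilance: many clusters
```

The number of clusters is not fixed in advance. It is determined by the vigilance: a smaller `rho` covers less space per cluster and so creates more of them.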
Self-Organizing Maps :7 Self-Organizing Maps Units have a neighborhood defined; m_i is "between" m_{i-1} and m_{i+1}, and all are updated together
One-dimensional map: (Kohonen, 1990)
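The update rule for the one-dimensional map is missing from the transcript. A minimal sketch of one step, assuming a Gaussian neighborhood over map positions: the winner moves most, and its neighbors on the map move less the farther they are from it. The name `som_1d_step` and the parameters `eta` and `sigma` are hypothetical choices for illustration.

```python
import math

def som_1d_step(units, x, eta=0.5, sigma=1.0):
    """One update of a one-dimensional map; units is a list of weight vectors."""
    # The winner is the unit closest to x in input space.
    winner = min(
        range(len(units)),
        key=lambda l: sum((a - m) ** 2 for a, m in zip(x, units[l])),
    )
    for l in range(len(units)):
        # Neighborhood strength depends on distance along the map, not in input space.
        e = math.exp(-((l - winner) ** 2) / (2.0 * sigma ** 2))
        units[l] = [m + eta * e * (a - m) for a, m in zip(x, units[l])]
    return winner

units = [[0.0], [1.0], [2.0], [3.0]]
winner = som_1d_step(units, [1.2])
```

Because neighboring units are pulled along with the winner, units that are adjacent on the map tend to stay adjacent in input space.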
Radial-Basis Functions :8 Radial-Basis Functions Locally-tuned units:
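The formula for the locally tuned units was lost in extraction. A common choice, assumed here, is a Gaussian of the distance between the input and the unit's center, scaled by a spread. The names `gaussian_unit`, `center`, and `spread` are illustrative.

```python
import math

def gaussian_unit(x, center, spread):
    """Activation of one locally tuned unit: 1 at its center, decaying with distance."""
    sq_dist = sum((a - m) ** 2 for a, m in zip(x, center))
    return math.exp(-sq_dist / (2.0 * spread ** 2))

at_center = gaussian_unit([1.0, 2.0], [1.0, 2.0], spread=0.5)
far_away = gaussian_unit([4.0, 6.0], [1.0, 2.0], spread=0.5)
```

The unit responds only to inputs in its own region: `at_center` is 1 and `far_away` is effectively 0.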
Local vs Distributed Representation :9 Local vs Distributed Representation
Training RBF :10 Training RBF Hybrid learning:
First layer centers and spreads:
Unsupervised k-means
Second layer weights: Supervised gradient-descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
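The hybrid scheme can be sketched in two stages for a one-dimensional regression problem. First, k-means places the centers without using the targets. Second, gradient descent fits the second-layer weights with the centers held fixed. The spread heuristic (mean gap between adjacent centers), the quantile initialization, the learning rate, and the sine toy data are all assumptions of this sketch rather than details from the slides.

```python
import math

def kmeans_1d(xs, k, iters=20):
    """Batch k-means on scalars; centers start at evenly spaced quantiles (an assumption)."""
    ordered = sorted(xs)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda h: (x - centers[h]) ** 2)
            groups[nearest].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def rbf_features(x, centers, spread):
    """First layer: Gaussian activation of each unit for a scalar input."""
    return [math.exp(-((x - m) ** 2) / (2.0 * spread ** 2)) for m in centers]

def predict(x, centers, spread, w, w0):
    """Second layer: weighted sum of the unit activations plus a bias."""
    return w0 + sum(wh * ph for wh, ph in zip(w, rbf_features(x, centers, spread)))

def train_rbf(xs, rs, k=6, eta=0.1, epochs=500):
    # Stage 1 (unsupervised): centers from k-means, one shared spread.
    centers = kmeans_1d(xs, k)
    spread = (max(centers) - min(centers)) / (k - 1)
    # Stage 2 (supervised): online gradient descent on squared error.
    w, w0 = [0.0] * k, 0.0
    for _ in range(epochs):
        for x, r in zip(xs, rs):
            p = rbf_features(x, centers, spread)
            err = r - (w0 + sum(wh * ph for wh, ph in zip(w, p)))
            w = [wh + eta * err * ph for wh, ph in zip(w, p)]
            w0 += eta * err
    return centers, spread, w, w0

xs = [2 * math.pi * t / 39 for t in range(40)]
rs = [math.sin(x) for x in xs]
model = train_rbf(xs, rs)
mse = sum((r - predict(x, *model)) ** 2 for x, r in zip(xs, rs)) / len(xs)
```

The fully supervised alternative would also update the centers and spreads by gradient descent on the same error. This sketch leaves them fixed after the first stage.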
Regression :11 Regression
Classification :12 Classification
Rules and Exceptions :13 Rules and Exceptions Default rule; Exceptions
Rule-Based Knowledge :14 Rule-Based Knowledge Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules
Normalized Basis Functions :15 Normalized Basis Functions
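The slide's formula is missing from the transcript. Normalization is usually understood as dividing each unit's activation by the sum over all units, so the activations are non-negative and sum to 1. The sketch below assumes Gaussian units with one shared spread. The name `normalized_basis` is illustrative.

```python
import math

def normalized_basis(x, centers, spread):
    """Gaussian activations divided by their sum, so they add up to 1."""
    p = [
        math.exp(-sum((a - m) ** 2 for a, m in zip(x, c)) / (2.0 * spread ** 2))
        for c in centers
    ]
    total = sum(p)  # may underflow to 0 for inputs very far from every center
    return [ph / total for ph in p]

g = normalized_basis([0.9], centers=[[0.0], [1.0], [2.0]], spread=0.5)
```

After normalization, some unit is responsible for every input, even one that lies between or outside the original bumps.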
Competitive Basis Functions :16 Competitive Basis Functions Mixture model:
Regression :17 Regression
Classification :18 Classification
EM for RBF (Supervised EM) :19 EM for RBF (Supervised EM) E-step:
M-step:
Learning Vector Quantization :20 Learning Vector Quantization H units per class prelabeled (Kohonen, 1990)
Given x, m_i is the closest: [Figure: x, m_i, m_j]
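The update equation that followed this line is missing from the transcript. A minimal sketch of the basic LVQ step, assuming the usual rule: the closest prototype moves toward the input if their labels agree and away from it if they disagree. The name `lvq_update` and the rate `eta` are illustrative.

```python
def lvq_update(x, label, prototypes, proto_labels, eta=0.1):
    """One LVQ step: attract the closest prototype if labels match, else repel it."""
    closest = min(
        range(len(prototypes)),
        key=lambda j: sum((a - m) ** 2 for a, m in zip(x, prototypes[j])),
    )
    sign = 1.0 if proto_labels[closest] == label else -1.0
    prototypes[closest] = [
        m + sign * eta * (a - m) for a, m in zip(x, prototypes[closest])
    ]
    return closest

prototypes = [[0.0, 0.0], [1.0, 1.0]]
proto_labels = ["A", "B"]
first = lvq_update([0.2, 0.0], "A", prototypes, proto_labels, eta=0.5)   # same label: attract
second = lvq_update([0.8, 1.0], "A", prototypes, proto_labels, eta=0.5)  # wrong label: repel
```

Only the closest prototype changes at each step. The other prototypes stay where they are.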
Mixture of Experts :21 Mixture of Experts In RBF, each local fit is a constant, w_ih, the second-layer weight
In MoE, each local fit is a linear function of x, a local expert: (Jacobs et al., 1991)
MoE as Models Combined :22 MoE as Models Combined Radial gating:
Softmax gating:
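The gating formulas are missing from the transcript. The sketch below assumes the softmax form: each gate is a softmax over linear scores of the input, each expert is a linear function of the input, and the output is the gate-weighted sum of the expert outputs. The names `moe_predict`, `gate_weights`, and `expert_weights`, and the toy weights, are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def moe_predict(x, gate_weights, expert_weights):
    """Return (output, gates, expert outputs) for one input vector x."""
    xb = list(x) + [1.0]  # append a constant 1 so the last weight acts as a bias
    gates = softmax([dot(m, xb) for m in gate_weights])
    experts = [dot(v, xb) for v in expert_weights]
    y = sum(g * w for g, w in zip(gates, experts))
    return y, gates, experts

# Toy setup: expert 0 handles negative x, expert 1 handles positive x.
gate_weights = [[-4.0, 0.0], [4.0, 0.0]]
expert_weights = [[1.0, 0.0], [-1.0, 2.0]]
y_left, gates_left, experts_left = moe_predict([-2.0], gate_weights, expert_weights)
y_right, gates_right, experts_right = moe_predict([2.0], gate_weights, expert_weights)
```

Because the gates are non-negative and sum to 1, the output always lies between the smallest and largest expert outputs. With radial gating, the linear scores would be replaced by normalized Gaussian activations.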
Cooperative MoE :23 Cooperative MoE Regression
Competitive MoE: Regression :24 Competitive MoE: Regression
Competitive MoE: Classification :25 Competitive MoE: Classification
Hierarchical Mixture of Experts :26 Hierarchical Mixture of Experts Tree of MoE where each MoE is an expert in a higher-level MoE
Soft decision tree: Takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
Can be trained using EM (Jordan and Jacobs, 1994)
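The soft decision tree can be sketched as a recursive prediction: a leaf returns its linear expert's output, and an internal node returns the gate-weighted average of its children. The nested-dictionary tree format, the name `hme_predict`, and the toy weights are assumptions made for illustration. Training, by EM or otherwise, is not shown.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hme_predict(node, xb):
    """node is a leaf {"expert": weights} or an internal {"gate": [...], "children": [...]}."""
    if "expert" in node:
        return dot(node["expert"], xb)
    # Soft split: every child contributes, weighted by its gate value.
    gates = softmax([dot(m, xb) for m in node["gate"]])
    return sum(g * hme_predict(child, xb) for g, child in zip(gates, node["children"]))

# Two-level tree; all gate weights are zero, so every split is an even 50/50 average.
tree = {
    "gate": [[0.0, 0.0], [0.0, 0.0]],
    "children": [
        {"expert": [0.0, 4.0]},
        {
            "gate": [[0.0, 0.0], [0.0, 0.0]],
            "children": [{"expert": [0.0, 0.0]}, {"expert": [0.0, 8.0]}],
        },
    ],
}
y = hme_predict(tree, [1.5, 1.0])  # input x = 1.5 with a constant 1 appended
```

A hard decision tree would follow only the child with the largest gate at each node. Here all leaves contribute to the output.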