mrdm talk

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

An Introduction to Multi-Relational Data Mining: 

An Introduction to Multi-Relational Data Mining Xiaoxin Yin Department of Computer Science University of Illinois at Urbana-Champaign

Traditional Data Mining: 

Traditional Data Mining Work on single “flat” relations flatten Contact Lose information of linkages and relationships Cannot utilize information of database structures or schemas

Multi-Relational Data Mining (MRDM): 

Multi-Relational Data Mining (MRDM) Motivations Most structured data are stored in relational databases MRDM can utilize linkage and structural information Knowledge discovery in multi-relational environments Multi-relational rules Multi-relational clustering Multi-relational classification Multi-relational linkage analysis …

Applications of MRDM: 

Applications of MRDM e-Commerce: discovering patterns involving customers, products, manufacturers, … Bioinformatics/Medical databases: discovering patterns involving genes, patients, diseases, … Networking security: discovering patterns involving hosts, connections, services, … Many other relational data sources

MRDM Approaches: 

MRDM Approaches Inductive Logic Programming (ILP) Find models that are coherent with background knowledge Multi-relational Clustering Analysis Clustering objects with multi-relational information Probabilistic Relational Models Model cross-relational probabilistic distributions Efficient Multi-relational Classification Our work that will be presented later today

Inductive Logic Programming (ILP): 

Inductive Logic Programming (ILP) Find a hypothesis that is consistent with background knowledge (training data) FOIL, Golem, Progol, TILDE, … Background knowledge Relations (predicates), Tuples (ground facts) Training examples Background knowledge

Inductive Logic Programming (ILP): 

Inductive Logic Programming (ILP) Hypothesis The hypothesis is usually a set of rules, which can predict certain attributes in certain relations Daughter(X,Y) ← female(X), parent(Y,X)

FOIL: First-Order Inductive Learner : 

FOIL: First-Order Inductive Learner Find a set of rules consistent with training data E.g. female(X), parent(Y,X) → daughter(X,Y) A top-down, sequential covering learner Build each rule by heuristics Foil gain – a special type of information gain Examples covered by Rule 1 All examples Examples covered by Rule 2 Examples covered by Rule 3

ILP Approaches: 

ILP Approaches Top-down Approaches (e.g. FOIL) while(enough examples left) generate a rule remove examples satisfying this rule Bottom-up Approaches (e.g. Golem) Use each example as a rule Generalize rules by merging rules Decision Tree Approaches (e.g. TILDE)

ILP – Summary: 

ILP – Summary Advantages Expressive and powerful Rules are understandable Disadvantages Inefficient for databases with complex schemas Not appropriate for continuous attributes

Multi-relational Clustering: 

Multi-relational Clustering RDBC (Kirsten & Wrobel 98) Distance-based agglomerative clustering First-order K-Means clustering (Kirsten & Wrobel 00) Distance-based K-Means clustering Relational distance measure RIBL – Relational Instance-based Learning (Emde & Wettschereck 96) Measure distance between two objects by their attributes and their neighbor objects in relational databases

Relational Distance Measure: 

Relational Distance Measure RIBL (Relational Instance-Based Learning) To measure distance between objects O1 and O2, neighbor objects of O1 and O2 are also considered. Relational data member(person1 ; 45 ; male; 20 ; gold) member(person2 ; 30 ; female; 10 ; platinum) car(person1 ; wagon; 200 ; volkswagen) car(person1 ; sedan; 220 ; mercedesbenz ) car(person2 ; roadster; 240 ; audi) car(person2 ; coupe; 260 ; bmw) house(person1 ; murgle; 1987 ; 560 ) house(person1 ; montecarlo; 1990 ; 210 ) house(person2 ; murgle; 1999 ; 430 ) district(montecarlo; famous; large; monaco) district(murgle; famous; small ; slovenia) Neighbor data of level 2

Relational Distance Measure (cont.): 

Relational Distance Measure (cont.) Distance between two objects O1 and O2 is defined by Attributes of O1 and O2: Discrete attribute: distance = 1 if equal; 0 otherwise. Numerical attribute: distance = diff / range Neighbor objects of O1 and O2: Defined recursively Comments Advantage: considering related objects in distance measure Disadvantage: very expensive to compute, because of the huge number of related objects

RDBC: Relational Distance-Based Clustering: 

RDBC: Relational Distance-Based Clustering Use distance measure of RIBL Agglomerative clustering approach Every object is used as a cluster at beginning Keep merging clusters that are most similar

First-order K-Means Clustering: 

First-order K-Means Clustering K-Means algorithm Select k initial objects as cluster centers Assign objects to nearest clusters Repeat step 2 until stable K-Means is very expensive Computing distance between an object and a cluster is very expensive K-Means can be replaced by K-Medoids For each cluster, use an object that is nearest to all objects in this cluster as the center

Multi-relational Clustering: Summary: 

Multi-relational Clustering: Summary Extend clustering algorithms to multi-relational environments Use distance measures that consider related objects Very expensive because the numbers of related objects are usually very large

Probabilistic Relational Model (PRM): 

Probabilistic Relational Model (PRM) Model the probabilistic distribution between directly joinable relations Model the conditional probabilistic distribution between any two attributes in two joinable relations Applications Model data distribution Selectivity estimation for query optimization Classification More on http://robotics.stanford.edu/~koller/acl-talk-web_files/frame.htm

Bayesian Networks: 

nodes = variables edges = direct influence Letter Grade SAT Intelligence Difficulty Bayesian Networks

Probabilistic Relational Models: 

Probabilistic Relational Models

The Web of Influence: 

The Web of Influence weak / smart easy / hard

Slide21: 

Welcome to CS101 weak / smart The Web of Influence easy / hard

Learning PRMs: 

Learning PRMs Parameter estimation: Learning probabilistic models by parameter estimation E.g. Grades for all students share same model Use standard techniques for max-likelihood or Bayesian parameter estimation Structure learning: Define scoring function over structures Use combinatorial search to find high-scoring structure

References: 

References H. Blockeel, L. De Raedt and J. Ramon. Top-down induction of logical decision trees. In Proc. of the Fifteenth Int. Conf. of Machine Learning, Madison, WI, 1998. S. Dzeroski. Multi-relational data mining: an introduction. KDD Explorations, July 2003. W. Emde, D. Wettschereck. Relational instance-based learning. ICML, 1996. L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of relational structure. In Proc. 18th International Conf. on Machine Learning, Williamtown, MA, 2001. M. Kirsten, S. Wrobel. Relational distance-based clustering. ILP, 1998. M. Kirsten, S. Wrobel. Extending k-means clustering to first-order representations. ILP, 2000. T. Mitchell. Machine Learning. McGraw Hill, 1996. S. Muggleton. Inverse Entailment and Progol. New Generation Computing, Special issue on Inductive Logic Programming, 1995. S. Muggleton and C. Feng. Efficient induction of logic programs. In Proc. of First Conf. on Algorithmic Learning Theory, Tokyo, Japan, 1990. J. R. Quinlan. FOIL: A midterm report. In Proc. of the sixth European Conf. on Machine Learning, Springer-Verlag, 1993. J. R. Quilan. C4.5: Programs for Machine Learning. In Morgan Kaufmann series in machine learning, Morgan Kaufmann, 1993. B. Taskar, E. Segal, and D. Koller. Probabilistic Classification and Clustering in Relational Data. in Proc. of 17th Int. Joint Conf. on Artificial Intelligence, Seattle, WA, 2001.