logging in or signing up domingos pedro smord Jacqueline Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 76 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: December 23, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Statistical ModelingOf Relational Data: Statistical Modeling Of Relational Data Pedro Domingos Dept. of Computer Science & Eng. University of WashingtonOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsMotivation: MotivationExamples: Examples Web search Information extraction Natural language processing Perception Medical diagnosis Computational biology Social networks Ubiquitous computing Etc.Costs and Benefits ofMulti-Relational Data Mining: Costs and Benefits of Multi-Relational Data Mining Benefits Better predictive accuracy Better understanding of domains Growth path for KDD Costs Learning is much harder Inference becomes a crucial issue Greater complexity for userGoal and Progress: Goal and Progress Goal: Learn from multiple relations as easily as from a single one Progress to date Burgeoning research area We’re “close enough” to goal Easy-to-use open-source software available Lots of research questions (old and new) Plan: Plan We have the elements: Probability for handling uncertainty Logic for representing types, relations, and complex dependencies between them Learning and inference algorithms for each Figure out how to put them together Tremendous leverage on a wide range of applicationsDisclaimers: Disclaimers Not a complete survey of multi-relational data mining Or of foundational areas Focus is practical, not theoretical Assumes basic background in logic, probability and statistics, etc. Please ask questions Tutorial and examples available at alchemy.cs.washington.eduOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsMarkov Networks: Markov Networks Undirected graphical models Cancer Cough Asthma Smoking Potential functions defined over cliquesMarkov Networks: Markov Networks Undirected graphical models Log-linear model: Weight of Feature i Feature i Cancer Cough Asthma SmokingMarkov Nets vs. Bayes Nets: Markov Nets vs. Bayes NetsInference in Markov Networks: Inference in Markov Networks Goal: Compute marginals & conditionals of Exact inference is #P-complete Conditioning on Markov blanket is easy: Gibbs sampling exploits thisMCMC: Gibbs Sampling: MCMC: Gibbs Sampling state ← random truth assignment for i ← 1 to num-samples do for each variable x sample x according to P(x|neighbors(x)) state ← state with new value of x P(F) ← fraction of states in which F is trueOther Inference Methods: Other Inference Methods Many variations of MCMC Belief propagation (sum-product) Variational approximation Exact methodsMAP/MPE Inference: MAP/MPE Inference Goal: Find most likely state of world given evidence Query EvidenceMAP Inference Algorithms: MAP Inference Algorithms Iterated conditional modes Simulated annealing Graph cuts Belief propagation (max-product)Overview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsLearning Markov Networks: Learning Markov Networks Learning parameters (weights) Generatively Discriminatively Learning structure (features) In this tutorial: Assume complete data (If not: EM versions of algorithms)Generative Weight Learning: Generative Weight Learning Maximize likelihood or posterior probability Numerical optimization (gradient or 2nd order) No local maxima Requires inference at each step (slow!)Pseudo-Likelihood: Pseudo-Likelihood Likelihood of each variable given its neighbors in the data Does not require inference at each step Consistent estimator Widely used in vision, spatial statistics, etc. But PL parameters may not work well for long inference chainsDiscriminative Weight Learning: Discriminative Weight Learning Maximize conditional likelihood of query (y) given evidence (x) Approximate expected counts by counts in MAP state of y given x No. of true groundings of clause i in data Expected no. true groundings according to modelOther Weight Learning Approaches: Other Weight Learning Approaches Generative: Iterative scaling Discriminative: Max marginStructure Learning: Structure Learning Start with atomic features Greedily conjoin features to improve score Problem: Need to reestimate weights for each new candidate Approximation: Keep weights of previous features constant Overview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsFirst-Order Logic: First-Order Logic Constants, variables, functions, predicates E.g.: Anna, x, MotherOf(x), Friends(x, y) Literal: Predicate or its negation Clause: Disjunction of literals Grounding: Replace all variables by constants E.g.: Friends (Anna, Bob) World (model, interpretation): Assignment of truth values to all ground predicatesInference in First-Order Logic: Inference in First-Order Logic Traditionally done by theorem proving (e.g.: Prolog) Propositionalization followed by model checking turns out to be faster (often a lot) Propositionalization: Create all ground atoms and clauses Model checking: Satisfiability testing Two main approaches: Backtracking (e.g.: DPLL; not covered here) Stochastic local search (e.g.: WalkSAT)Satisfiability: Satisfiability Input: Set of clauses (Convert KB to conjunctive normal form (CNF)) Output: Truth assignment that satisfies all clauses, or failure The paradigmatic NP-complete problem Solution: Search Key point: Most SAT problems are actually easy Hard region: Narrow range of #Clauses / #VariablesStochastic Local Search: Stochastic Local Search Uses complete assignments instead of partial Start with random state Flip variables in unsatisfied clauses Hill-climbing: Minimize # unsatisfied clauses Avoid local minima: Random flips Multiple restartsThe WalkSAT Algorithm: The WalkSAT Algorithm for i ← 1 to max-tries do solution = random truth assignment for j ← 1 to max-flips do if all clauses satisfied then return solution c ← random unsatisfied clause with probability p flip a random variable in c else flip variable in c that maximizes number of satisfied clauses return failureOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsRule Induction: Rule Induction Given: Set of positive and negative examples of some concept Example: (x1, x2, … , xn, y) y: concept (Boolean) x1, x2, … , xn: attributes (assume Boolean) Goal: Induce a set of rules that cover all positive examples and no negative ones Rule: xa ^ xb ^ … y (xa: Literal, i.e., xi or its negation) Same as Horn clause: Body Head Rule r covers example x iff x satisfies body of r Eval(r): Accuracy, info. gain, coverage, support, etc.Learning a Single Rule: Learning a Single Rule head ← y body ← Ø repeat for each literal x rx ← r with x added to body Eval(rx) body ← body ^ best x until no x improves Eval(r) return rLearning a Set of Rules: Learning a Set of Rules R ← Ø S ← examples repeat learn a single rule r R ← R U { r } S ← S − positive examples covered by r until S = Ø return RFirst-Order Rule Induction: First-Order Rule Induction y and xi are now predicates with arguments E.g.: y is Ancestor(x,y), xi is Parent(x,y) Literals to add are predicates or their negations Literal to add must include at least one variable already appearing in rule Adding a literal changes # groundings of rule E.g.: Ancestor(x,z) ^ Parent(z,y) Ancestor(x,y) Eval(r) must take this into account E.g.: Multiply by # positive groundings of rule still covered after adding literalOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsPlethora of Approaches: Plethora of Approaches Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Bayesian logic [Milch et al., 2005] Markov logic [Richardson & Domingos, 2006] And many others! Key Dimensions: Key Dimensions Logical language First-order logic, Horn clauses, frame systems Probabilistic language Bayes nets, Markov nets, PCFGs Type of learning Generative / Discriminative Structure / Parameters Knowledge-rich / Knowledge-poor Type of inference MAP / Marginal Full grounding / Partial grounding / LiftedKnowledge-BasedModel Construction: Knowledge-Based Model Construction Logical language: Horn clauses Probabilistic language: Bayes nets Ground atom → Node Head of clause → Child node Body of clause → Parent nodes >1 clause w/ same head → Combining function Learning: ILP + EM Inference: Partial grounding + Belief prop.Stochastic Logic Programs: Stochastic Logic Programs Logical language: Horn clauses Probabilistic language: Probabilistic context-free grammars Attach probabilities to clauses .Σ Probs. of clauses w/ same head = 1 Learning: ILP + “Failure-adjusted” EM Inference: Do all proofs, add probs.Probabilistic Relational Models: Probabilistic Relational Models Logical language: Frame systems Probabilistic language: Bayes nets Bayes net template for each class of objects Object’s attrs. can depend on attrs. of related objs. Only binary relations No dependencies of relations on relations Learning: Parameters: Closed form (EM if missing data) Structure: “Tiered” Bayes net structure search Inference: Full grounding + Belief propagationRelational Markov Networks: Relational Markov Networks Logical language: SQL queries Probabilistic language: Markov nets SQL queries define cliques Potential function for each query No uncertainty over relations Learning: Discriminative weight learning No structure learning Inference: Full grounding + Belief prop.Bayesian Logic: Bayesian Logic Logical language: First-order semantics Probabilistic language: Bayes nets BLOG program specifies how to generate relational world Parameters defined separately in Java functions Allows unknown objects May create Bayes nets with directed cycles Learning: None to date Inference: MCMC with user-supplied proposal distribution Partial groundingMarkov Logic: Markov Logic Logical language: First-order logic Probabilistic language: Markov networks Syntax: First-order formulas with weights Semantics: Templates for Markov net features Learning: Parameters: Generative or discriminative Structure: ILP with arbitrary clauses and MAP score Inference: MAP: Weighted satisfiability Marginal: MCMC with moves proposed by SAT solver Partial grounding + Lazy inference Markov Logic: Markov Logic Most developed approach to date Many other approaches can be viewed as special cases Main focus of rest of this tutorial Markov Logic: Intuition: Markov Logic: Intuition A logical KB is a set of hard constraints on the set of possible worlds Let’s make them soft constraints: When a world violates a formula, It becomes less probable, not impossible Give each formula a weight (Higher weight Stronger constraint) Markov Logic: Definition: Markov Logic: Definition A Markov Logic Network (MLN) is a set of pairs (F, w) where F is a formula in first-order logic w is a real number Together with a set of constants, it defines a Markov network with One node for each grounding of each predicate in the MLN One feature for each grounding of each formula F in the MLN, with the corresponding weight wExample: Friends & Smokers: Example: Friends & SmokersExample: Friends & Smokers: Example: Friends & SmokersExample: Friends & Smokers: Example: Friends & Smokers Example: Friends & Smokers: Example: Friends & Smokers Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Smokes(B) Cancer(B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Markov Logic Networks: Markov Logic Networks MLN is template for ground Markov nets Probability of a world x: Typed variables and constants greatly reduce size of ground Markov net Functions, existential quantifiers, etc. Infinite and continuous domains Weight of formula i No. of true groundings of formula i in xRelation to Statistical Models: Relation to Statistical Models Special cases: Markov networks Markov random fields Bayesian networks Log-linear models Exponential models Max. entropy models Gibbs distributions Boltzmann machines Logistic regression Hidden Markov models Conditional random fields Obtained by making all predicates zero-arity Markov logic allows objects to be interdependent (non-i.i.d.) Relation to First-Order Logic: Relation to First-Order Logic Infinite weights First-order logic Satisfiable KB, positive weights Satisfying assignments = Modes of distribution Markov logic allows contradictions between formulasMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence Query EvidenceMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidenceMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence MAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence This is just the weighted MaxSAT problem Use weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997] ) Potentially faster than logical inference (!)The MaxWalkSAT Algorithm: The MaxWalkSAT Algorithm for i ← 1 to max-tries do solution = random truth assignment for j ← 1 to max-flips do if ∑ weights(sat. clauses) > threshold then return solution c ← random unsatisfied clause with probability p flip a random variable in c else flip variable in c that maximizes ∑ weights(sat. clauses) return failure, best solution foundBut … Memory Explosion: But … Memory Explosion Problem: If there are n constants and the highest clause arity is c, the ground network requires O(n ) memory Solution: Exploit sparseness; ground clauses lazily → LazySAT algorithm [Singla & Domingos, 2006] cComputing Probabilities: Computing Probabilities P(Formula|MLN,C) = ? MCMC: Sample worlds, check formula holds P(Formula1|Formula2,MLN,C) = ? If Formula2 = Conjunction of ground atoms First construct min subset of network necessary to answer query (generalization of KBMC) Then apply MCMC (or other) Can also do lifted inference [Braz et al, 2005]Ground Network Construction: Ground Network Construction network ← Ø queue ← query nodes repeat node ← front(queue) remove node from queue add node to network if node not in evidence then add neighbors(node) to queue until queue = ØBut … Insufficient for Logic: But … Insufficient for Logic Problem: Deterministic dependencies break MCMC Near-deterministic ones make it very slow Solution: Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]Learning: Learning Data is a relational database Closed world assumption (if not: EM) Learning parameters (weights) Learning structure (formulas)Weight Learning: Parameter tying: Groundings of same clause Generative learning: Pseudo-likelihood Discriminative learning: Cond. likelihood, use MC-SAT or MaxWalkSAT for inference Weight Learning No. of times clause i is true in data Expected no. times clause i is true according to MLNStructure Learning: Structure Learning Generalizes feature induction in Markov nets Any inductive logic programming approach can be used, but . . . Goal is to induce any clauses, not just Horn Evaluation function should be likelihood Requires learning weights for each candidate Turns out not to be bottleneck Bottleneck is counting clause groundings Solution: Subsampling Structure Learning: Structure Learning Initial state: Unit clauses or hand-coded KB Operators: Add/remove literal, flip sign Evaluation function: Pseudo-likelihood + Structure prior Search: Beam, shortest-first, bottom-up [Kok & Domingos, 2005; Mihalkova & Mooney, 2007] Alchemy: Alchemy Open-source software including: Full first-order logic syntax Generative & discriminative weight learning Structure learning Weighted satisfiability and MCMC Programming language features alchemy.cs.washington.eduOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsApplications: Applications Basics Logistic regression Hypertext classification Information retrieval Entity resolution Hidden Markov models Information extraction Statistical parsing Semantic processing Bayesian networks Relational models Practical tipsRunning Alchemy: Running Alchemy Programs Infer Learnwts Learnstruct Options MLN file Types (optional) Predicates Formulas Database filesUniform Distribn.: Empty MLN: Uniform Distribn.: Empty MLN Example: Unbiased coin flips Type: flip = { 1, … , 20 } Predicate: Heads(flip) Binomial Distribn.: Unit Clause: Binomial Distribn.: Unit Clause Example: Biased coin flips Type: flip = { 1, … , 20 } Predicate: Heads(flip) Formula: Heads(f) Weight: Log odds of heads: By default, MLN includes unit clauses for all predicates (captures marginal distributions, etc.)Multinomial Distribution: Multinomial Distribution Example: Throwing die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face) Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’). Exist f Outcome(t,f). Too cumbersome!Multinomial Distrib.: ! Notation: Multinomial Distrib.: ! Notation Example: Throwing die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face!) Formulas: Semantics: Arguments without “!” determine arguments with “!”. Also makes inference more efficient (triggers blocking).Multinomial Distrib.: + Notation: Multinomial Distrib.: + Notation Example: Throwing biased die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face!) Formulas: Outcome(t,+f) Semantics: Learn weight for each grounding of args with “+”. Logistic Regression: Logistic regression: Type: obj = { 1, ... , n } Query predicate: C(obj) Evidence predicates: Fi(obj) Formulas: a C(x) bi Fi(x) ^ C(x) Resulting distribution: Therefore: Alternative form: Fi(x) => C(x) Logistic RegressionText Classification: Text Classification page = { 1, … , n } word = { … } topic = { … } Topic(page,topic!) HasWord(page,word) !Topic(p,t) HasWord(p,+w) => Topic(p,+t)Text Classification: Text Classification Topic(page,topic!) HasWord(page,word) HasWord(p,+w) => Topic(p,+t)Hypertext Classification: Hypertext Classification Topic(page,topic!) HasWord(page,word) Links(page,page) HasWord(p,+w) => Topic(p,+t) Topic(p,t) ^ Links(p,p') => Topic(p',t) Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification Using Hyperlinks,” in Proc. SIGMOD-1998.Information Retrieval: Information Retrieval InQuery(word) HasWord(page,word) Relevant(page) InQuery(w+) ^ HasWord(p,+w) => Relevant(p) Relevant(p) ^ Links(p,p’) => Relevant(p’) Cf. L. Page, S. Brin, R. Motwani & T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Tech. Rept., Stanford University, 1998.Entity Resolution: Problem: Given database, find duplicate records HasToken(token,field,record) SameField(field,record,record) SameRecord(record,record) HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’) SameField(f,r,r’) => SameRecord(r,r’) SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”) Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty with Application to Noun Coreference,” in Adv. NIPS 17, 2005. Entity ResolutionEntity Resolution: Can also resolve fields: HasToken(token,field,record) SameField(field,record,record) SameRecord(record,record) HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’) SameField(f,r,r’) <=> SameRecord(r,r’) SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”) SameField(f,r,r’) ^ SameField(f,r’,r”) => SameField(f,r,r”) More: P. Singla & P. Domingos, “Entity Resolution with Markov Logic”, in Proc. ICDM-2006. Entity ResolutionHidden Markov Models: Hidden Markov Models obs = { Obs1, … , ObsN } state = { St1, … , StM } time = { 0, … , T } State(state!,time) Obs(obs!,time) State(+s,0) State(+s,t) => State(+s',t+1) Obs(+o,t) => State(+s,t)Information Extraction: Information Extraction Problem: Extract database from text or semi-structured sources Example: Extract database of publications from citation list(s) (the “CiteSeer problem”) Two steps: Segmentation: Use HMM to assign tokens to fields Entity resolution: Use logistic regression and transitivityInformation Extraction: Token(token, position, citation) InField(position, field, citation) SameField(field, citation, citation) SameCit(citation, citation) Token(+t,i,c) => InField(i,+f,c) InField(i,+f,c) <=> InField(i+1,+f,c) f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c)) Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’) SameField(+f,c,c’) <=> SameCit(c,c’) SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”) SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”) Information ExtractionInformation Extraction: Token(token, position, citation) InField(position, field, citation) SameField(field, citation, citation) SameCit(citation, citation) Token(+t,i,c) => InField(i,+f,c) InField(i,+f,c) ^ !Token(“.”,i,c) <=> InField(i+1,+f,c) f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c)) Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’) SameField(+f,c,c’) <=> SameCit(c,c’) SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”) SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”) More: H. Poon & P. Domingos, “Joint Inference in Information Extraction”, in Proc. AAAI-2007. Information ExtractionStatistical Parsing: Statistical Parsing Input: Sentence Output: Most probable parse PCFG: Production rules with probabilities E.g.: 0.7 NP → N 0.3 NP → Det N WCFG: Production rules with weights (equivalent) Chomsky normal form: A → B C or A → aStatistical Parsing: Statistical Parsing Evidence predicate: Token(token,position) E.g.: Token(“pizza”, 3) Query predicates: Constituent(position,position) E.g.: NP(2,4) For each rule of the form A → B C: Clause of the form B(i,j) ^ C(j,k) => A(i,k) E.g.: NP(i,j) ^ VP(j,k) => S(i,k) For each rule of the form A → a: Clause of the form Token(a,i) => A(i,i+1) E.g.: Token(“pizza”, i) => N(i,i+1) For each nonterminal: Hard formula stating that exactly one production holds MAP inference yields most probable parseSemantic Processing: Semantic Processing Example: John ate pizza. Grammar: S → NP VP VP → V NP V → ate NP → John NP → pizza Token(“John”,0) => Participant(John,E,0,1) Token(“ate”,1) => Event(Eating,E,1,2) Token(“pizza”,2) => Participant(pizza,E,2,3) Event(Eating,e,i,j) ^ Participant(p,e,j,k) ^ VP(i,k) ^ V(i,j) ^ NP(j,k) => Eaten(p,e) Event(Eating,e,j,k) ^ Participant(p,e,i,j) ^ S(i,k) ^ NP(i,j) ^ VP(j,k) => Eater(p,e) Event(t,e,i,k) => Isa(e,t) Result: Isa(E,Eating), Eater(John,E), Eaten(pizza,E) Bayesian Networks: Bayesian Networks Use all binary predicates with same first argument (the object x). One predicate for each variable A: A(x,v!) One clause for each line in the CPT and value of the variable Context-specific independence: One Horn clause for each path in the decision tree Logistic regression: As before Noisy OR: Deterministic OR + Pairwise clauses Relational Models: Relational Models Knowledge-based model construction Allow only Horn clauses Same as Bayes nets, except arbitrary relations Combin. function: Logistic regression, noisy-OR or external Stochastic logic programs Allow only Horn clauses Weight of clause = log(p) Add formulas: Head holds => Exactly one body holds Probabilistic relational models Allow only binary relations Same as Bayes nets, except first argument can varyRelational Models: Relational Models Relational Markov networks SQL → Datalog → First-order logic One clause for each state of a clique * syntax in Alchemy facilitates this Bayesian logic Object = Cluster of similar/related observations Observation constants + Object constants Predicate InstanceOf(Obs,Obj) and clauses using it Unknown relations: Second-order Markov logic S. Kok & P. Domingos, “Statistical Predicate Invention”, in Proc. ICML-2007. Practical Tips: Practical Tips Add all unit clauses (the default) Implications vs. conjunctions Open/closed world assumptions How to handle uncertain data: R(x,y) => R’(x,y) (the “HMM trick”) Controlling complexity Low clause arities Low numbers of constants Short inference chains Use the simplest MLN that works Cycle: Add/delete formulas, learn and testSummary: Summary Most domains have multiple relations and dependencies between objects Much progress in recent years Multi-relational data mining mature enough to be practical tool Many old and new research issues Check out the Alchemy Web site: alchemy.cs.washington.edu You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
domingos pedro smord Jacqueline Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 76 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: December 23, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Statistical ModelingOf Relational Data: Statistical Modeling Of Relational Data Pedro Domingos Dept. of Computer Science & Eng. University of WashingtonOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsMotivation: MotivationExamples: Examples Web search Information extraction Natural language processing Perception Medical diagnosis Computational biology Social networks Ubiquitous computing Etc.Costs and Benefits ofMulti-Relational Data Mining: Costs and Benefits of Multi-Relational Data Mining Benefits Better predictive accuracy Better understanding of domains Growth path for KDD Costs Learning is much harder Inference becomes a crucial issue Greater complexity for userGoal and Progress: Goal and Progress Goal: Learn from multiple relations as easily as from a single one Progress to date Burgeoning research area We’re “close enough” to goal Easy-to-use open-source software available Lots of research questions (old and new) Plan: Plan We have the elements: Probability for handling uncertainty Logic for representing types, relations, and complex dependencies between them Learning and inference algorithms for each Figure out how to put them together Tremendous leverage on a wide range of applicationsDisclaimers: Disclaimers Not a complete survey of multi-relational data mining Or of foundational areas Focus is practical, not theoretical Assumes basic background in logic, probability and statistics, etc. Please ask questions Tutorial and examples available at alchemy.cs.washington.eduOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsMarkov Networks: Markov Networks Undirected graphical models Cancer Cough Asthma Smoking Potential functions defined over cliquesMarkov Networks: Markov Networks Undirected graphical models Log-linear model: Weight of Feature i Feature i Cancer Cough Asthma SmokingMarkov Nets vs. Bayes Nets: Markov Nets vs. Bayes NetsInference in Markov Networks: Inference in Markov Networks Goal: Compute marginals & conditionals of Exact inference is #P-complete Conditioning on Markov blanket is easy: Gibbs sampling exploits thisMCMC: Gibbs Sampling: MCMC: Gibbs Sampling state ← random truth assignment for i ← 1 to num-samples do for each variable x sample x according to P(x|neighbors(x)) state ← state with new value of x P(F) ← fraction of states in which F is trueOther Inference Methods: Other Inference Methods Many variations of MCMC Belief propagation (sum-product) Variational approximation Exact methodsMAP/MPE Inference: MAP/MPE Inference Goal: Find most likely state of world given evidence Query EvidenceMAP Inference Algorithms: MAP Inference Algorithms Iterated conditional modes Simulated annealing Graph cuts Belief propagation (max-product)Overview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsLearning Markov Networks: Learning Markov Networks Learning parameters (weights) Generatively Discriminatively Learning structure (features) In this tutorial: Assume complete data (If not: EM versions of algorithms)Generative Weight Learning: Generative Weight Learning Maximize likelihood or posterior probability Numerical optimization (gradient or 2nd order) No local maxima Requires inference at each step (slow!)Pseudo-Likelihood: Pseudo-Likelihood Likelihood of each variable given its neighbors in the data Does not require inference at each step Consistent estimator Widely used in vision, spatial statistics, etc. But PL parameters may not work well for long inference chainsDiscriminative Weight Learning: Discriminative Weight Learning Maximize conditional likelihood of query (y) given evidence (x) Approximate expected counts by counts in MAP state of y given x No. of true groundings of clause i in data Expected no. true groundings according to modelOther Weight Learning Approaches: Other Weight Learning Approaches Generative: Iterative scaling Discriminative: Max marginStructure Learning: Structure Learning Start with atomic features Greedily conjoin features to improve score Problem: Need to reestimate weights for each new candidate Approximation: Keep weights of previous features constant Overview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsFirst-Order Logic: First-Order Logic Constants, variables, functions, predicates E.g.: Anna, x, MotherOf(x), Friends(x, y) Literal: Predicate or its negation Clause: Disjunction of literals Grounding: Replace all variables by constants E.g.: Friends (Anna, Bob) World (model, interpretation): Assignment of truth values to all ground predicatesInference in First-Order Logic: Inference in First-Order Logic Traditionally done by theorem proving (e.g.: Prolog) Propositionalization followed by model checking turns out to be faster (often a lot) Propositionalization: Create all ground atoms and clauses Model checking: Satisfiability testing Two main approaches: Backtracking (e.g.: DPLL; not covered here) Stochastic local search (e.g.: WalkSAT)Satisfiability: Satisfiability Input: Set of clauses (Convert KB to conjunctive normal form (CNF)) Output: Truth assignment that satisfies all clauses, or failure The paradigmatic NP-complete problem Solution: Search Key point: Most SAT problems are actually easy Hard region: Narrow range of #Clauses / #VariablesStochastic Local Search: Stochastic Local Search Uses complete assignments instead of partial Start with random state Flip variables in unsatisfied clauses Hill-climbing: Minimize # unsatisfied clauses Avoid local minima: Random flips Multiple restartsThe WalkSAT Algorithm: The WalkSAT Algorithm for i ← 1 to max-tries do solution = random truth assignment for j ← 1 to max-flips do if all clauses satisfied then return solution c ← random unsatisfied clause with probability p flip a random variable in c else flip variable in c that maximizes number of satisfied clauses return failureOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsRule Induction: Rule Induction Given: Set of positive and negative examples of some concept Example: (x1, x2, … , xn, y) y: concept (Boolean) x1, x2, … , xn: attributes (assume Boolean) Goal: Induce a set of rules that cover all positive examples and no negative ones Rule: xa ^ xb ^ … y (xa: Literal, i.e., xi or its negation) Same as Horn clause: Body Head Rule r covers example x iff x satisfies body of r Eval(r): Accuracy, info. gain, coverage, support, etc.Learning a Single Rule: Learning a Single Rule head ← y body ← Ø repeat for each literal x rx ← r with x added to body Eval(rx) body ← body ^ best x until no x improves Eval(r) return rLearning a Set of Rules: Learning a Set of Rules R ← Ø S ← examples repeat learn a single rule r R ← R U { r } S ← S − positive examples covered by r until S = Ø return RFirst-Order Rule Induction: First-Order Rule Induction y and xi are now predicates with arguments E.g.: y is Ancestor(x,y), xi is Parent(x,y) Literals to add are predicates or their negations Literal to add must include at least one variable already appearing in rule Adding a literal changes # groundings of rule E.g.: Ancestor(x,z) ^ Parent(z,y) Ancestor(x,y) Eval(r) must take this into account E.g.: Multiply by # positive groundings of rule still covered after adding literalOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsPlethora of Approaches: Plethora of Approaches Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Bayesian logic [Milch et al., 2005] Markov logic [Richardson & Domingos, 2006] And many others! Key Dimensions: Key Dimensions Logical language First-order logic, Horn clauses, frame systems Probabilistic language Bayes nets, Markov nets, PCFGs Type of learning Generative / Discriminative Structure / Parameters Knowledge-rich / Knowledge-poor Type of inference MAP / Marginal Full grounding / Partial grounding / LiftedKnowledge-BasedModel Construction: Knowledge-Based Model Construction Logical language: Horn clauses Probabilistic language: Bayes nets Ground atom → Node Head of clause → Child node Body of clause → Parent nodes >1 clause w/ same head → Combining function Learning: ILP + EM Inference: Partial grounding + Belief prop.Stochastic Logic Programs: Stochastic Logic Programs Logical language: Horn clauses Probabilistic language: Probabilistic context-free grammars Attach probabilities to clauses .Σ Probs. of clauses w/ same head = 1 Learning: ILP + “Failure-adjusted” EM Inference: Do all proofs, add probs.Probabilistic Relational Models: Probabilistic Relational Models Logical language: Frame systems Probabilistic language: Bayes nets Bayes net template for each class of objects Object’s attrs. can depend on attrs. of related objs. Only binary relations No dependencies of relations on relations Learning: Parameters: Closed form (EM if missing data) Structure: “Tiered” Bayes net structure search Inference: Full grounding + Belief propagationRelational Markov Networks: Relational Markov Networks Logical language: SQL queries Probabilistic language: Markov nets SQL queries define cliques Potential function for each query No uncertainty over relations Learning: Discriminative weight learning No structure learning Inference: Full grounding + Belief prop.Bayesian Logic: Bayesian Logic Logical language: First-order semantics Probabilistic language: Bayes nets BLOG program specifies how to generate relational world Parameters defined separately in Java functions Allows unknown objects May create Bayes nets with directed cycles Learning: None to date Inference: MCMC with user-supplied proposal distribution Partial groundingMarkov Logic: Markov Logic Logical language: First-order logic Probabilistic language: Markov networks Syntax: First-order formulas with weights Semantics: Templates for Markov net features Learning: Parameters: Generative or discriminative Structure: ILP with arbitrary clauses and MAP score Inference: MAP: Weighted satisfiability Marginal: MCMC with moves proposed by SAT solver Partial grounding + Lazy inference Markov Logic: Markov Logic Most developed approach to date Many other approaches can be viewed as special cases Main focus of rest of this tutorial Markov Logic: Intuition: Markov Logic: Intuition A logical KB is a set of hard constraints on the set of possible worlds Let’s make them soft constraints: When a world violates a formula, It becomes less probable, not impossible Give each formula a weight (Higher weight Stronger constraint) Markov Logic: Definition: Markov Logic: Definition A Markov Logic Network (MLN) is a set of pairs (F, w) where F is a formula in first-order logic w is a real number Together with a set of constants, it defines a Markov network with One node for each grounding of each predicate in the MLN One feature for each grounding of each formula F in the MLN, with the corresponding weight wExample: Friends & Smokers: Example: Friends & SmokersExample: Friends & Smokers: Example: Friends & SmokersExample: Friends & Smokers: Example: Friends & Smokers Example: Friends & Smokers: Example: Friends & Smokers Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Smokes(B) Cancer(B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Example: Friends & Smokers: Example: Friends & Smokers Cancer(A) Smokes(A) Friends(A,A) Friends(B,A) Smokes(B) Friends(A,B) Cancer(B) Friends(B,B) Two constants: Anna (A) and Bob (B)Markov Logic Networks: Markov Logic Networks MLN is template for ground Markov nets Probability of a world x: Typed variables and constants greatly reduce size of ground Markov net Functions, existential quantifiers, etc. Infinite and continuous domains Weight of formula i No. of true groundings of formula i in xRelation to Statistical Models: Relation to Statistical Models Special cases: Markov networks Markov random fields Bayesian networks Log-linear models Exponential models Max. entropy models Gibbs distributions Boltzmann machines Logistic regression Hidden Markov models Conditional random fields Obtained by making all predicates zero-arity Markov logic allows objects to be interdependent (non-i.i.d.) Relation to First-Order Logic: Relation to First-Order Logic Infinite weights First-order logic Satisfiable KB, positive weights Satisfying assignments = Modes of distribution Markov logic allows contradictions between formulasMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence Query EvidenceMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidenceMAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence MAP/MPE Inference: MAP/MPE Inference Problem: Find most likely state of world given evidence This is just the weighted MaxSAT problem Use weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997] ) Potentially faster than logical inference (!)The MaxWalkSAT Algorithm: The MaxWalkSAT Algorithm for i ← 1 to max-tries do solution = random truth assignment for j ← 1 to max-flips do if ∑ weights(sat. clauses) > threshold then return solution c ← random unsatisfied clause with probability p flip a random variable in c else flip variable in c that maximizes ∑ weights(sat. clauses) return failure, best solution foundBut … Memory Explosion: But … Memory Explosion Problem: If there are n constants and the highest clause arity is c, the ground network requires O(n ) memory Solution: Exploit sparseness; ground clauses lazily → LazySAT algorithm [Singla & Domingos, 2006] cComputing Probabilities: Computing Probabilities P(Formula|MLN,C) = ? MCMC: Sample worlds, check formula holds P(Formula1|Formula2,MLN,C) = ? If Formula2 = Conjunction of ground atoms First construct min subset of network necessary to answer query (generalization of KBMC) Then apply MCMC (or other) Can also do lifted inference [Braz et al, 2005]Ground Network Construction: Ground Network Construction network ← Ø queue ← query nodes repeat node ← front(queue) remove node from queue add node to network if node not in evidence then add neighbors(node) to queue until queue = ØBut … Insufficient for Logic: But … Insufficient for Logic Problem: Deterministic dependencies break MCMC Near-deterministic ones make it very slow Solution: Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]Learning: Learning Data is a relational database Closed world assumption (if not: EM) Learning parameters (weights) Learning structure (formulas)Weight Learning: Parameter tying: Groundings of same clause Generative learning: Pseudo-likelihood Discriminative learning: Cond. likelihood, use MC-SAT or MaxWalkSAT for inference Weight Learning No. of times clause i is true in data Expected no. times clause i is true according to MLNStructure Learning: Structure Learning Generalizes feature induction in Markov nets Any inductive logic programming approach can be used, but . . . Goal is to induce any clauses, not just Horn Evaluation function should be likelihood Requires learning weights for each candidate Turns out not to be bottleneck Bottleneck is counting clause groundings Solution: Subsampling Structure Learning: Structure Learning Initial state: Unit clauses or hand-coded KB Operators: Add/remove literal, flip sign Evaluation function: Pseudo-likelihood + Structure prior Search: Beam, shortest-first, bottom-up [Kok & Domingos, 2005; Mihalkova & Mooney, 2007] Alchemy: Alchemy Open-source software including: Full first-order logic syntax Generative & discriminative weight learning Structure learning Weighted satisfiability and MCMC Programming language features alchemy.cs.washington.eduOverview: Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Putting the pieces together ApplicationsApplications: Applications Basics Logistic regression Hypertext classification Information retrieval Entity resolution Hidden Markov models Information extraction Statistical parsing Semantic processing Bayesian networks Relational models Practical tipsRunning Alchemy: Running Alchemy Programs Infer Learnwts Learnstruct Options MLN file Types (optional) Predicates Formulas Database filesUniform Distribn.: Empty MLN: Uniform Distribn.: Empty MLN Example: Unbiased coin flips Type: flip = { 1, … , 20 } Predicate: Heads(flip) Binomial Distribn.: Unit Clause: Binomial Distribn.: Unit Clause Example: Biased coin flips Type: flip = { 1, … , 20 } Predicate: Heads(flip) Formula: Heads(f) Weight: Log odds of heads: By default, MLN includes unit clauses for all predicates (captures marginal distributions, etc.)Multinomial Distribution: Multinomial Distribution Example: Throwing die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face) Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’). Exist f Outcome(t,f). Too cumbersome!Multinomial Distrib.: ! Notation: Multinomial Distrib.: ! Notation Example: Throwing die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face!) Formulas: Semantics: Arguments without “!” determine arguments with “!”. Also makes inference more efficient (triggers blocking).Multinomial Distrib.: + Notation: Multinomial Distrib.: + Notation Example: Throwing biased die Types: throw = { 1, … , 20 } face = { 1, … , 6 } Predicate: Outcome(throw,face!) Formulas: Outcome(t,+f) Semantics: Learn weight for each grounding of args with “+”. Logistic Regression: Logistic regression: Type: obj = { 1, ... , n } Query predicate: C(obj) Evidence predicates: Fi(obj) Formulas: a C(x) bi Fi(x) ^ C(x) Resulting distribution: Therefore: Alternative form: Fi(x) => C(x) Logistic RegressionText Classification: Text Classification page = { 1, … , n } word = { … } topic = { … } Topic(page,topic!) HasWord(page,word) !Topic(p,t) HasWord(p,+w) => Topic(p,+t)Text Classification: Text Classification Topic(page,topic!) HasWord(page,word) HasWord(p,+w) => Topic(p,+t)Hypertext Classification: Hypertext Classification Topic(page,topic!) HasWord(page,word) Links(page,page) HasWord(p,+w) => Topic(p,+t) Topic(p,t) ^ Links(p,p') => Topic(p',t) Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification Using Hyperlinks,” in Proc. SIGMOD-1998.Information Retrieval: Information Retrieval InQuery(word) HasWord(page,word) Relevant(page) InQuery(w+) ^ HasWord(p,+w) => Relevant(p) Relevant(p) ^ Links(p,p’) => Relevant(p’) Cf. L. Page, S. Brin, R. Motwani & T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Tech. Rept., Stanford University, 1998.Entity Resolution: Problem: Given database, find duplicate records HasToken(token,field,record) SameField(field,record,record) SameRecord(record,record) HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’) SameField(f,r,r’) => SameRecord(r,r’) SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”) Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty with Application to Noun Coreference,” in Adv. NIPS 17, 2005. Entity ResolutionEntity Resolution: Can also resolve fields: HasToken(token,field,record) SameField(field,record,record) SameRecord(record,record) HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’) SameField(f,r,r’) <=> SameRecord(r,r’) SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”) SameField(f,r,r’) ^ SameField(f,r’,r”) => SameField(f,r,r”) More: P. Singla & P. Domingos, “Entity Resolution with Markov Logic”, in Proc. ICDM-2006. Entity ResolutionHidden Markov Models: Hidden Markov Models obs = { Obs1, … , ObsN } state = { St1, … , StM } time = { 0, … , T } State(state!,time) Obs(obs!,time) State(+s,0) State(+s,t) => State(+s',t+1) Obs(+o,t) => State(+s,t)Information Extraction: Information Extraction Problem: Extract database from text or semi-structured sources Example: Extract database of publications from citation list(s) (the “CiteSeer problem”) Two steps: Segmentation: Use HMM to assign tokens to fields Entity resolution: Use logistic regression and transitivityInformation Extraction: Token(token, position, citation) InField(position, field, citation) SameField(field, citation, citation) SameCit(citation, citation) Token(+t,i,c) => InField(i,+f,c) InField(i,+f,c) <=> InField(i+1,+f,c) f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c)) Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’) SameField(+f,c,c’) <=> SameCit(c,c’) SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”) SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”) Information ExtractionInformation Extraction: Token(token, position, citation) InField(position, field, citation) SameField(field, citation, citation) SameCit(citation, citation) Token(+t,i,c) => InField(i,+f,c) InField(i,+f,c) ^ !Token(“.”,i,c) <=> InField(i+1,+f,c) f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c)) Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’) SameField(+f,c,c’) <=> SameCit(c,c’) SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”) SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”) More: H. Poon & P. Domingos, “Joint Inference in Information Extraction”, in Proc. AAAI-2007. Information ExtractionStatistical Parsing: Statistical Parsing Input: Sentence Output: Most probable parse PCFG: Production rules with probabilities E.g.: 0.7 NP → N 0.3 NP → Det N WCFG: Production rules with weights (equivalent) Chomsky normal form: A → B C or A → aStatistical Parsing: Statistical Parsing Evidence predicate: Token(token,position) E.g.: Token(“pizza”, 3) Query predicates: Constituent(position,position) E.g.: NP(2,4) For each rule of the form A → B C: Clause of the form B(i,j) ^ C(j,k) => A(i,k) E.g.: NP(i,j) ^ VP(j,k) => S(i,k) For each rule of the form A → a: Clause of the form Token(a,i) => A(i,i+1) E.g.: Token(“pizza”, i) => N(i,i+1) For each nonterminal: Hard formula stating that exactly one production holds MAP inference yields most probable parseSemantic Processing: Semantic Processing Example: John ate pizza. Grammar: S → NP VP VP → V NP V → ate NP → John NP → pizza Token(“John”,0) => Participant(John,E,0,1) Token(“ate”,1) => Event(Eating,E,1,2) Token(“pizza”,2) => Participant(pizza,E,2,3) Event(Eating,e,i,j) ^ Participant(p,e,j,k) ^ VP(i,k) ^ V(i,j) ^ NP(j,k) => Eaten(p,e) Event(Eating,e,j,k) ^ Participant(p,e,i,j) ^ S(i,k) ^ NP(i,j) ^ VP(j,k) => Eater(p,e) Event(t,e,i,k) => Isa(e,t) Result: Isa(E,Eating), Eater(John,E), Eaten(pizza,E) Bayesian Networks: Bayesian Networks Use all binary predicates with same first argument (the object x). One predicate for each variable A: A(x,v!) One clause for each line in the CPT and value of the variable Context-specific independence: One Horn clause for each path in the decision tree Logistic regression: As before Noisy OR: Deterministic OR + Pairwise clauses Relational Models: Relational Models Knowledge-based model construction Allow only Horn clauses Same as Bayes nets, except arbitrary relations Combin. function: Logistic regression, noisy-OR or external Stochastic logic programs Allow only Horn clauses Weight of clause = log(p) Add formulas: Head holds => Exactly one body holds Probabilistic relational models Allow only binary relations Same as Bayes nets, except first argument can varyRelational Models: Relational Models Relational Markov networks SQL → Datalog → First-order logic One clause for each state of a clique * syntax in Alchemy facilitates this Bayesian logic Object = Cluster of similar/related observations Observation constants + Object constants Predicate InstanceOf(Obs,Obj) and clauses using it Unknown relations: Second-order Markov logic S. Kok & P. Domingos, “Statistical Predicate Invention”, in Proc. ICML-2007. Practical Tips: Practical Tips Add all unit clauses (the default) Implications vs. conjunctions Open/closed world assumptions How to handle uncertain data: R(x,y) => R’(x,y) (the “HMM trick”) Controlling complexity Low clause arities Low numbers of constants Short inference chains Use the simplest MLN that works Cycle: Add/delete formulas, learn and testSummary: Summary Most domains have multiple relations and dependencies between objects Much progress in recent years Multi-relational data mining mature enough to be practical tool Many old and new research issues Check out the Alchemy Web site: alchemy.cs.washington.edu