Identification in Inductive MetaLogic Programming
Akihiro Yamamoto
Hokkaido University, JAPAN
yamamoto@meme.hokudai.ac.jp
Inverse of Deduction, Anti-Unification, and Inductive MetaLogic Programming
Akihiro Yamamoto
Motivation: 'How can mathematical (deductive) logic be used in formalizing inductive logic?'
Why?
Justification of inference (generation of Hypotheses) with logic
Inference from non-numerical data
Use of background theories represented in logical formulae
Inverse of Deduction for CNFs
Using CNFs: CNFs are useful.
In propositional logic, the set of all models of a formula can be represented as a DNF or a CNF.
In the first-order case, Herbrand's Theorem helps us lift up the theory of CNFs (DNFs).
CNFs are very popular in CS and AI, and some other mechanisms proposed in AI (e.g. abduction) can be merged into our theory [Yamamoto].
Definition and Example: B : a CNF as a background theory
E : a CNF as a set of examples
s.t. B ⊬ E
A correct hypothesis of HFP(B, E) is a CNF H s.t.
B ∧ H ⊢ E.
B = (Pet ← Cat)
∧ (Pet ← Dog)
∧ (Cuddly ← Small, Fluffy, Pet)
E = Cuddly ← Cat, Fluffy
H = Small ← Pet
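As a sanity check, the pet example can be verified by brute force over truth assignments. The sketch below (Python, with atom names chosen for this note; it is an illustration, not part of the original slides) confirms that B alone does not entail E, while B ∧ H does:

```python
from itertools import product

ATOMS = ["Cat", "Dog", "Pet", "Small", "Fluffy", "Cuddly"]

def clause(head, *body):
    # definite clause "head <- body", evaluated in a model m
    return lambda m: m[head] or not all(m[a] for a in body)

B = [clause("Pet", "Cat"),
     clause("Pet", "Dog"),
     clause("Cuddly", "Small", "Fluffy", "Pet")]
E = [clause("Cuddly", "Cat", "Fluffy")]
H = [clause("Small", "Pet")]

def entails(theory, goal):
    # theory |= goal iff no model satisfies the theory but falsifies the goal
    for vals in product([False, True], repeat=len(ATOMS)):
        m = dict(zip(ATOMS, vals))
        if all(c(m) for c in theory) and not all(c(m) for c in goal):
            return False
    return True

print(entails(B, E))      # False: B alone is too weak
print(entails(B + H, E))  # True: H = Small <- Pet closes the gap
```
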
Basic strategy for solving HFP: B ∧ H ⊨ E ⇔ H ⊨ ¬(B ∧ ¬E)
Note: The negation of a CNF is not always a CNF, but it is easily transformed into a DNF.
Residue Hypothesis [Yamamoto et al]: Res(T) : The residue of a CNF T is the CNF obtained by transforming ¬T into a CNF and then deleting all tautological clauses from it.
B : a clausal theory, E : a clausal theory
The residue hypothesis for HFP(B, E) is
Res(B ∧ Res(E)).
Prop. For any correct hypothesis H,
H ⊢ Res(B ∧ Res(E)).
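Concretely, Res can be computed by negating the CNF (which gives a DNF whose terms are the negated clauses), redistributing into a CNF, and dropping tautologies. A minimal propositional sketch, assuming a set-of-frozensets clause encoding with '~' marking negative literals (my encoding, not from the slides):

```python
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def residue(cnf):
    """Res(T): transform the negation of CNF T into a CNF, drop tautologies.

    cnf is a list of clauses; a clause is a frozenset of literals, with
    negative literals prefixed by '~'.  Negating the CNF yields a DNF whose
    terms are the negated clauses; choosing one literal per term in every
    possible way redistributes that DNF back into a CNF.
    """
    dnf_terms = [[neg(l) for l in c] for c in cnf]
    result = set()
    for choice in product(*dnf_terms):
        clause = frozenset(choice)
        if not any(neg(l) in clause for l in clause):  # delete tautologies
            result.add(clause)
    return result

# E = Cuddly <- Cat, Fluffy  gives  Res(E) = ~Cuddly AND Cat AND Fluffy
E = [frozenset({"Cuddly", "~Cat", "~Fluffy"})]
print(residue(E))
```

Feeding the clauses of B from Example 1 together with Res(E) back into `residue` reproduces the two clauses of Res(B ∧ Res(E)) shown on the next slide.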
Example 1: B = (Pet ← Cat)
∧ (Pet ← Dog)
∧ (Cuddly ← Small, Fluffy)
E = Cuddly ← Cat, Fluffy
H = Small ← Pet
Res(B ∧ Res(E)) =
(¬Pet ∨ Dog ∨ Small ∨ ¬Fluffy ∨ ¬Cat ∨ Cuddly)
∧ (¬Pet ∨ Small ∨ ¬Fluffy ∨ ¬Cat ∨ Cuddly)
= (Small ∨ Cuddly ∨ Dog ← Pet ∧ Fluffy ∧ Cat)
∧ (Small ∨ Cuddly ← Pet ∧ Fluffy ∧ Cat)
= (Small ∨ Cuddly ← Pet ∧ Fluffy ∧ Cat)
Example 2: B = (Pet ← Cat)
∧ (Black ← Dog)
∧ (Cuddly ← Small ∧ Fluffy)
E = Cuddly ← Cat ∧ Fluffy
H = Small ← Pet
Res(B ∧ Res(E)) =
(¬Pet ∨ Dog ∨ Small ∨ ¬Fluffy ∨ ¬Cat ∨ Cuddly)
∧ (¬Pet ∨ ¬Black ∨ Small ∨ ¬Fluffy ∨ ¬Cat ∨ Cuddly)
= (Small ∨ Cuddly ∨ Dog ← Pet ∧ Fluffy ∧ Cat)
∧ (Small ∨ Cuddly ← Pet ∧ Black ∧ Fluffy ∧ Cat)
Inference rule for CNF: S : a CNF, C : a clause, L : a literal
S ∧ (C ∨ L) ∧ (C ∨ ¬L) ⊢ S ∧ C ∧ (C ∨ L) ∧ (C ∨ ¬L)
S ∧ C ⊢ S ∧ C ∧ (C ∨ L)
S ∧ C ⊢ S
Inverse of deduction: Res(S) :
(Small ∨ Cuddly ∨ Dog ← Pet, Fluffy, Cat)
∧ (Small ∨ Cuddly ← Pet, Black, Fluffy, Cat)
H1 : (Cuddly ← Pet, Fluffy, Cat)
∧ (Small ∨ Cuddly ← Pet, Black, Fluffy, Cat)
H2 : (Cuddly ← Pet, Fluffy, Cat)
∧ (Cuddly ← Pet, Fluffy, Cat)
= Cuddly ← Pet, Fluffy, Cat
Problems in Practice: High complexity of computing Residue Hypotheses (#SAT)
recovering hidden literals
adding any clauses
Rule Generation: E1 = fly(swallow), E2 = fly(pigeon)
E3 = ¬fly(ostrich), E4 = ¬fly(emu)
B = ¬sym-feather(swallow) ∧ ¬sym-feather(pigeon)
∧ sym-feather(ostrich) ∧ sym-feather(emu)
H = (¬sym-feather(x) → fly(x))
∧ (sym-feather(x) → ¬fly(x))
fly(archaeopteryx)
Sinosauropteryx, Archaeopteryx, Confuciusornis, …
Example: B = (△xyz ≅ △uvw → ∠y = ∠v)
∧ AP = AP
∧ ∠P = 90
E = (AB = AC → ∠B = ∠C)
H =
xy = uv ∧ xz = uw
∧ ∠z = 90 ∧ ∠w = 90
→ △xyz ≅ △uvw
(Figure: triangle ABC with point P.)
Residue in Predicate Logic: Herbrand's Theorem
A conjunction S of first-order clauses is unsatisfiable
⇔ there is an unsatisfiable conjunction T consisting of clauses in ground(S).
Since Res(B ∧ Res(E)) is satisfiable,
the residue hypothesis of any conjunction T
of clauses from ground(Res(B ∧ Res(E))) is a
correct hypothesis.
Example (cont’d): B’ = (△ABP ≅ △ACP → ∠B = ∠C)
∧ AP = AP
∧ ∠P = 90
Res(E) = ¬(∠B = ∠C) ∧ AB = AC
Res(B’ ∧ Res(E)) =
(AP = AP ∧ ∠P = 90 ∧ AB = AC → △ABP ≅ △ACP)
(xy = uv ∧ xz = uw ∧ ∠z = 90 ∧ ∠w = 90 → △xyz ≅ △uvw)
Residue Hypothesis: B : a clausal theory, E : a clausal theory
For any clausal theory T consisting of clauses in ground(B ∧ Res(E)), Res(T) is called a residue hypothesis for HFP(B, E).
Inference rule for CNF: S : a CNF, C : a clause, L : a literal
θ : a substitution
S ∧ (C ∨ L) ∧ (C ∨ ¬L) ⊢ S ∧ C ∧ (C ∨ L) ∧ (C ∨ ¬L)
S ∧ C ⊢ S ∧ C ∧ (C ∨ L)
S ∧ C ⊢ S ∧ Cθ
S ∧ C ⊢ S
Problems in Practice: High complexity of computing Residue Hypotheses (#SAT)
recovering hidden literals
adding any clauses
selecting a set T consisting of clauses in ground(B ∧ Res(E))
ILP Techniques: Some ILP techniques can be considered as the inverse of resolution
B ∧ H ⊢ E ⇔ B ∧ ¬E ⊢ ¬H
1. Derive B ∧ Res(E) ⊢ K
2. Then H = Res(K)
Saturation [Sammut, Rouveirol, Angluin, Cohen, Arimura]
The bottom method [Yamamoto 97], Inverse entailment [Muggleton 95]
Abduction [Poole, Inoue]
Inverse Resolution and subsumption: Th [Yamamoto and Fronhoefer].
S, T : CNFs
T ∈ r(S) ⇒ Res(S) is subsumed by Res(T)
Subsumption of CNFs:
S ∧ C ⊢ S ∧ C ∧ (C ∨ L)
S ∧ C ⊢ S
Subsumption BY Resolution: B ∧ Res(E) = S1 ⊢ S2 ⊢ ··· ⊢ Sn-1 ⊢ Sn
Res(S1) = H1 ⊣ H2 ⊣ ··· ⊣ Hn-1 ⊣ Hn
Sk = L1 ∧ L2 ∧ ... ∧ Lm
or
Sk = L1 ∨ L2 ∨ ... ∨ Lm
Definite Induction: B = (Pet ← Cat)
∧ (Black ← Dog)
∧ (Cuddly ← Small ∧ Fluffy)
E = Cuddly ← Cat ∧ Fluffy
H = Small ← Pet
¬E = Cat ∧ Fluffy ∧ (¬Cuddly)
For positive literals:
B ∧ Cat ∧ Fluffy ⊢ Pet, Cat, Fluffy
Skipping and consequence: SOLD = SLD + Skip (SOL [Inoue 92])
Resolution
⟨¬A1, ..., Ai, ..., Am | ¬H1, ..., Hn⟩
B0 ← B1, ..., Bk
⟨(¬A1, ..., B1, ..., Bk, ..., Am)θ | ¬(H1, ..., Hn)θ⟩ (where Aiθ = B0θ)
Skip
⟨¬A1, ..., Ai, ..., Am | ¬H1, ..., Hn⟩
⟨¬A1, ..., Ai-1, Ai+1, ..., Am | ¬H1, ..., Hn, Ai⟩
Example: ⟨¬cuddly | □⟩
cuddly ← small, fluffy
⟨¬small, ¬fluffy | □⟩
⟨¬fluffy | ¬small⟩
fluffy ←
⟨□ | ¬small⟩
B ∧ (¬Cuddly) ⊢ ¬small
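A propositional sketch of SLD + Skip (my own minimal rendering; real SOLD/SOL resolution handles first-order clauses, substitutions, and ordering restrictions). Each derivation that empties the goal yields the set of skipped atoms, whose negation is a consequence of the program together with the negated example:

```python
def sold(program, goal, skipped=()):
    """Enumerate consequences by SLD resolution plus the Skip rule.

    program maps an atom to the list of clause bodies defining it; goal is
    a list of atoms.  At each step the first goal atom is either resolved
    with a program clause or skipped (moved to the consequence side).
    """
    if not goal:
        yield skipped  # empty goal: the skipped atoms form a consequence
        return
    first, rest = goal[0], goal[1:]
    for body in program.get(first, []):                 # Resolution step
        yield from sold(program, body + rest, skipped)
    yield from sold(program, rest, skipped + (first,))  # Skip step

prog = {"cuddly": [["small", "fluffy"]], "fluffy": [[]]}
print(list(sold(prog, ["cuddly"])))
# [('small',), ('small', 'fluffy'), ('cuddly',)]
```

The derivation on the slide corresponds to the output `('small',)`: with fluffy given as a fact, the program together with ¬cuddly yields the consequence ¬small.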
Analysis of Anti-Unification
From HFP to Learning: Given : a set of clauses as examples
E1, E2, …, En, …
In HFP the set is treated as the CNF
E1 ∧ E2 ∧ … ∧ En
and a complete set of hypotheses is generated.
In learning, one appropriate hypothesis must be chosen according to some criteria.
Least Generalization Approach: Generate a hypothesis Hn for each En and then compute the least generalization of
H1, H2, …, Hn
if it exists.
Anti-Unification: R.J. Popplestone said to G. Plotkin:
'Since unification is useful in automatic deduction, its dual might prove helpful for induction.'
The dual of unification is now called anti-unification or generalization.
Ex. The (most specific) anti-unification of
p(a, f(a), f(f(a))) and p(b, a, f(a))
is p(x, y, f(y)).
Anti-unification Algorithm: ⟨p(a, f(a), f(f(a))), p(b, a, f(a))⟩
⟨p(x_{a,b}, f(a), f(f(a))), p(x_{a,b}, a, f(a))⟩
⟨p(x_{a,b}, x_{f(a),a}, f(f(a))), p(x_{a,b}, x_{f(a),a}, f(a))⟩
⟨p(x_{a,b}, x_{f(a),a}, f(x_{f(a),a})), p(x_{a,b}, x_{f(a),a}, f(x_{f(a),a}))⟩
p(x, y, f(y))
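The trace above can be implemented directly. In the sketch below (Python; terms encoded as nested tuples, an encoding chosen for this note), each disagreeing pair of subterms is mapped to one shared fresh variable, which is exactly what makes the second and third arguments share a variable in p(x, y, f(y)):

```python
def anti_unify(s, t, table=None):
    """Most specific generalization (Plotkin/Reynolds) of two terms.

    Terms are nested tuples: ('f', arg1, ...) with constants as ('a',).
    Each distinct pair of disagreeing subterms is replaced by the same
    fresh variable, recorded in `table`.
    """
    if table is None:
        table = {}
    if s[0] == t[0] and len(s) == len(t):
        return (s[0],) + tuple(anti_unify(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:               # reuse the variable x_{s,t}
        table[(s, t)] = ("x%d" % len(table),)
    return table[(s, t)]

a, b = ("a",), ("b",)
fa, ffa = ("f", a), ("f", ("f", a))
lhs = ("p", a, fa, ffa)   # p(a, f(a), f(f(a)))
rhs = ("p", b, a, fa)     # p(b, a, f(a))
print(anti_unify(lhs, rhs))
# ('p', ('x0',), ('x1',), ('f', ('x1',)))  i.e.  p(x, y, f(y))
```
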
The AU case (cont’d): Induction algorithm:
the anti-unification algorithm
(Generalization lattice, general at the top, concrete at the bottom:)
p(x, y, z)
p(f(x), y, z)   p(x, y, f(z))   p(x, y, y)   …
p(x, y, f(y))
p(x, f(z), f(f(z)))   p(b, y, f(y))
p(a, f(a), f(f(a)))   p(b, a, f(a))
Problems: The least generalization does not always exist if a background theory is given.
The least generalization for clauses exists.
A finite number of minimal generalizations exists in some cases.
Why least generalization?
Logical explanation, if possible.
Induction Algorithm: (Diagram: an algorithm M reads examples w1, w2, w3, … and, by deductive logic, outputs hypotheses h1, h2, h3, …)
How is each hypothesis generated from the examples?
How does the sequence of hypotheses act?
Identification in the limit [Gold]: An algorithm M identifies concepts in a class U in the limit iff ∃N ∀n > N : hn = i, for some fixed correct hypothesis i of the target concept.
A class U of concepts is identifiable iff ∃M s.t. M identifies the concepts in U in the limit.
The AU case (cont’d): Global criterion for successful inference: identification in the limit
For any input sequence for every concept L(A), the inference algorithm outputs a correct axiom A after a finite number of changes of hypotheses, and never changes it afterwards.
[Angluin]
[Lassez, Maher, Marriott]
Comparison 1: The subsumption relation could be interpreted as likelihood.
Support set of a Hypothesis: The anti-unification algorithm shows that an atomic formula can be represented by a finite set of examples.
⟨p(a, f(a), f(f(a))), p(b, a, f(a))⟩ generalizes to p(x, y, f(y)).
Comparison 2: Buchberger’s algorithm can be regarded as learning.
Computational Algebra: Dickson’s Lemma (Hilbert’s Basis Theorem)
Every ideal of monomials is finitely generated.
x^m y^n z^k ⇒ (m, n, k)
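Under the exponent-vector encoding x^m y^n z^k ⇒ (m, n, k), membership in a monomial ideal is componentwise divisibility, and Dickson's Lemma guarantees a finite minimal basis. A small sketch (illustrative, not from the slides) of a learner that converges to that basis from a stream of monomials, in the spirit of identification in the limit:

```python
def divides(g, m):
    # exponent vector g divides m iff componentwise <=
    return all(a <= b for a, b in zip(g, m))

def update_basis(basis, m):
    """Incrementally learn a minimal generating set of a monomial ideal.

    By Dickson's Lemma the minimal set of exponent vectors is finite, so
    this learner converges on any enumeration of the ideal's monomials.
    """
    if any(divides(g, m) for g in basis):
        return basis                      # m is already generated
    return [g for g in basis if not divides(m, g)] + [m]

basis = []
for m in [(2, 1, 0), (1, 3, 0), (1, 1, 0), (0, 0, 2)]:
    basis = update_basis(basis, m)
print(basis)  # [(1, 1, 0), (0, 0, 2)]
```

Here (1, 1, 0) and (0, 0, 2), i.e. the monomials xy and z^2, generate everything seen so far, and further consistent examples can only refine, never grow without bound.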
Computational Algebra: U : the class of ideals of a commutative ring R with a unit.
Prop [Stephan and Ventsov 98]. T is a finite tell-tale of an ideal I iff T is a basis of I.
Cor. U is identifiable from texts
⇔ every ideal of R is finitely generated
⇔ R is Noetherian.
Inductive MetaLogic Programming
Simple representation: Metalogic Programming [Bowen and Kowalski]
P ⊢ G ⇔ Pdemo ⊢ demo(P, G)
Inductive Metalogic Programming
[Hamfelt and Nilsson 95]
B ∧ H ⊢ E
⇔ Pdemo ⊢ demo(B ∧ H, E)
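A toy illustration of this formulation (propositional only; the clause encoding and depth bound are my simplifications, not the Bowen-Kowalski demo predicate itself): a vanilla meta-interpreter `demo` plus generate-and-test over candidate clauses recovers Small ← Pet among the hypotheses x with demo(B ∧ x, E):

```python
def demo(program, goal, depth=10):
    """Propositional vanilla meta-interpreter: demo(P, G) iff P derives G.

    program: set of (head, body) pairs, body a tuple of atoms; the depth
    bound keeps generate-and-test safe on cyclic candidate clauses.
    """
    if depth == 0:
        return False
    return all(any(h == g and demo(program, body, depth - 1)
                   for h, body in program)
               for g in goal)

B = {("pet", ("cat",)), ("pet", ("dog",)),
     ("cuddly", ("small", "fluffy", "pet"))}
facts = {("cat", ()), ("fluffy", ())}   # the body of the example clause E

# hyp(x): candidate single clauses; keep each x with demo(B /\ x, E)
heads = ["pet", "small", "cuddly"]
bodies = [("pet",), ("cat",), ("fluffy",)]
found = [(h, b) for h in heads for b in bodies
         if demo(B | facts | {(h, b)}, ("cuddly",))]
print(found)
```

The desired hypothesis ("small", ("pet",)) appears among the solutions, alongside other clauses that also make the example derivable; choosing among them is the learning problem of the previous sections.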
More precise representation: Metalogic Programming [Bowen and Kowalski]
P ⊢ G ⇔ Pdemo ⊢ demo(P, G)
G = ∃x1…xn (A1 ∧ … ∧ Am)
HFP
Phyp ∧ Pdemo ⊢ ∃x (hyp(x) ∧ demo(B ∧ x, E))
Identification
Phyp ∧ Pdemo ∧ Ps ⊨
∃x ∀y (hyp(x) ∧ (s(y) → demo(B ∧ x, y)))
Learning as semantics: PP ∧ Pq ∧ Pr ⊨
∃x ∀y (p(x) ∧ (q(y) → r(x, y)))
P ⊨ ∃x ∀y F(x, y)
PP ∧ Pq ∧ Pr ⊢
∃x ∀y (p(x) ∧ (q(y) → r(x, y)))
P ⊢ ∃x ∀y F(x, y)
Procedural Semantics
Examples: Anti-Unification
Pterm ∧ Pq ∧ P= ⊨
∃x ∀y (term(x) ∧ (q(y) → x = y))
The Least Element
Pnat ∧ Pq ∧ Pless-equal ⊨
∃x ∀y (nat(x) ∧ (q(y) → less-equal(x, y)))
Replacing proving with learning: Robust Logic [Valiant]
Rich Prolog (2) [Martin et al]
LCM [Nakata and Hayashi]
A simpler and more understandable semantics would be useful for extensions of Logic Programming.
Conclusion: Hypothesis Finding Problem
combines logical aspects of induction and abduction
shows their incompleteness
is difficult because of the DNF → CNF transformation
Anti-unification
has good properties in logic, statistics, algebra, and learning
Conclusion: Inductive metalogic programming
would be an extension of logic programming
with the support of identification in the limit.