logging in or signing up corrector japan craig Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 24 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: October 09, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript The correction of XML data: The correction of XML data Université Paris II & LRI Michel de Rougemont mdr@lri.fr http://www.lri.fr/~mdr Approximation and Edit Distance Testers and Correctors Correcting regular binary trees Applications to XML Practical corrector Relative value of documents Slide2: Relations Dist (R,S) = # x : if Dist(R,S) < Edit-distance Trees: Tree-Edit-Distance Min # Deletions, Insertions Approximation Left-deletion Left-insertionClassical Tree-Edit-Distance: Binary trees : p-Distance allows permutation Classical Tree-Edit-Distance Dist(T1,T2) =2 p-Dist (T1,T2) =1 Dist (T, L) = Min Dist (T,T’) a e b c d a e b c a e b c d f e Deletion InsertionSlide4: Satisfiability : Tree |= F Approximate satisfiability Tree |= F Image on a class K of trees Approximate satisfiability Logic, testers, correctors: Logic, testers, correctors A Tester decides |= for a formula F. A Corrector takes a tree T close to a language L and find T’ in L close to T. This is possible if F follows a simple logic. Theorem. there is linear time corrector for regular binary trees and a constant distance. Given a tree T, k- close to a regular language L, we find in linear time T’ in L, c.k -close to T. General problem: given a language L defined in some Logic, find a corrector. Theorem. (implicit in Alon and al. FOCS2000) There is a linear time corrector for regular words and distance Application to Model-Checking (LICS2002)Simple example: Simple example Tester for 0+ 1* 0+ Types of segments: 000000011111110000010000 probably accepted 011110000000110111 rejected with high probability 0 0 1 00000 11111 00011 11100 01100 0 0 Corrector for 0+ 1* 0+ 00000001111110000100000 *Regular Trees: Tree-automata Logical definability on trees Tree grammar Regular expression Regular Trees r(a,b(a,b(a,b(a,b(a,b(a,b)....) r(a(a,b(a,b(a(a,b),b)....),b)Tree automata: (q0, q0) q1 (q0,q1) q1 Tree automata q0 q0 q0 q0 q0 q0 q1 q1 q1 q1 q1 q0 q0 q0 q1 q2 (q1,q1)q2 (q1,q0)q2 (q2,-) q2 (-,q2) q2Feasible and infeasible subtrees: Definition : a subtree t is feasible for L if there are subtrees (for its leaves) which reach states (q1...ql) such that the state of the root q=t(q1...ql) can reach an accepting state (in the automaton for L). A subtree is infeasible if it is not feasible Feasible and infeasible subtrees feasible infeasibleInfeasible subtrees: Fact . If then the number of unfeasible subtrees of length a is O(n). Fact. If the distance is small, there are few infeasibles trees. Intuition : make local corrections at the root of the infeasible trees Infeasible subtrees Structure of the corrector: Phase 1 : (Bottom-up) Marking of * nodes, roots of infeasible subtrees. Phase 2 : (Top-down) Recursive analysis of the * subtrees to make root accept. Phase 3 : Local corrections Structure of the corrector q0 q1Phase 1 : bottom-up marking: Phase 1 : bottom-up marking Definitions: A terminal *-node is the first sink node of a run A * subtree of a node v is the subtree whose root is v reaching leaves or *-node A node v is a *-node if its state is a sink node when all possible reachable states replace the *-nodes of its *-subtree. 4. Compute the size of the subtrees * * Runs with all possible reachable states (q,q’) reach a sink. * O(n) procedure. Lemma 1: If Dist(T,L)<k, there are at most k *-nodes.Phase 2 : top-down possible states: Phase 2 : top-down possible states * * Let (q,q’) a possible choice at the top *-subtree. Let q’’ a possible state for the *-node of the left *-subtree * q1 q2 q’’ instead of * Correction needed.Case analysis of the automaton: Case 1: One essentially-connected component. Case 2: General case Many components Case analysis of the automaton Case 1: one component: Lemma: if (q1,q2,q’’) are in the same connected component, there is a finite subtree t which can correct. Case a : there is a transition (q,q’) to q’’ with both q,q’ in C: there is a finite tree t1 from q1 to q, a finite tree t2 from q2 to q’ and the correction is: Case 1: one component q1 q2 q’’ q q’ q’’ q1 q2 t2 t1Case 1: b and c: Case b : there is a transition (q,q’) to q’’ with one of q or q’ being q0: suppose q=q0. The correction uses t2 and cut the left branch. Case c: there is a transition (q0,q0) to q’’. The correction cuts both branches. Case 1: b and c q1 q2 q’’ q0 q’ q’’ q2 t2 q1 q2 q’’ q0 q0 q’’Correction rules: Correction rules q1 q2 q’’ instead of *Case 2 : many components : Hypothesis : q1 in Ci q2 in Cj q’’ in Ck Case a: P such that Ci < Ck and Cj < Ck Find t1 and t2 as in case 1.a Case 2 : many components q1 q2 q’’ q q’ q’’ q1 q2 t2 t1Case 2: b and c: Case b,c : P such that Ci >Ck and Cj < Ck Find t2 and let Cp=inf(Ci,Ck). Cut the left branch until Cp. Case d: P such that Ci >Ck and Cj > Ck Let Cp=inf(Ci,Ck). Cut the left branch until Cp. Let Cq=inf(Cj,Ck). Cut the right branch until Cq. Case 2: b and c q1 q2 q’’ q’ q’’ q2 t2 q1 q2 q’’ q’’Correction rules: Correction rules q1 q2 q’’ instead of *Analysis of the corrector: Fact 1: finitely many insertions Fact 2: deletions less predictable Lemma: If the cut is large, than the distance must be large. Analysis of the corrector General Corrector: 1. Do the inductive Marking bottom-up. 2. Apply the recursive analysis of compatible states top-down. 3. For each transition (q,q’) -> q’’ apply the correction, compute the distance and select the rule with smallest distance 4. Select the * states with Minimum Dist.. Procedure is O(n), exponential in k and size(Q)General result: Theorem: If Dist(T,L) <k, the general corrector finds T’ such that Dist(T,T’) <c.k. Proof : # *-nodes < k Case 1: 0 *-node: no correction Case 2: at least 1 *-node. Looking at all possible k-variations will correct the errors in the *-subtree and diminish the *-nodes. General resultUnranked trees: XML: Labelled trees of large degree. Structure given by a « grammar », or DTD. Generalization of automata: 1. Unranked tree automaton 2. Tree-walking automaton Method: Code an unranked labelled tree with a binary labelled tree. Advantage: the correction table is FINITE. Theorem: If Dist(T,L) <k, the general corrector finds T’ such that Dist(T,T’) <c.k. Unranked trees: XMLApplications to XML: Applications to XML DTD <?xml version='1.0' ?> <!ELEMENT book (chapter*,title,author)> <!ELEMENT chapter (title,para*)> <!ELEMENT title (#PCDATA)> <!ELEMENT para (#PCDATA)> <!ELEMENT author (#PCDATA)> Binary Normal Form l -> l1, a l1 -> c1, t c1 -> c, c1 c1 -> - c -> t, p1 p1 -> p, p1 p1 -> - a -> data t -> data p -> data XML tree decomposition: XML tree decomposition XML file transformed into a binary labelled tree. XML file with errors: XML file with errors Corrected XML file: Corrected XML file No ambiguities on the possible states of q’’ Immediate correction! XML Correction rules: XML Correction rules q1 q2 q’’ instead of *Java Implementation: Parser: Xerces, Tree structure : DOM Phase 1: look at the parent node of *-node. Propose tags for * (c or f) Phase 2: for each proposal, compute the distance. *=c, distance=1, replacing c with b. *=f, distance=2, replacing c with b and adding an a leaf. Choose the 1st solution. Java Implementation a b c * b a d a DTD: d (c,b,a) or (f,b,a) c (a,b,b) f (a,b,b,a) Relative value of documents: Relative value of documents Given a DTD, mark the Web documents as follows: Infinity if there are far Dist(Document,DTD)=i Provides a relative valued landscape. Works for boolean combinations Generalize to Min{ Dist(D,DTD’) : }Distance on words and trees: Distance on words and trees On words, how can one compute Dist(w,w’), a P-problem Is is possible in less than O(n) ? Yes, STOC 2003 Dist(w,L) and Dist(L,L’) Given two trees, how can one compute: Dist(T,T’) P on ordered trees and NP-complete on unordered trees p-Dist(T,T’) NP-complete. Conclusion: Conclusion Testers and Correctors Testers for approximate verification Correctors Trees Regular trees are testable If T is at distance less than k,then we can correct it. Theoretical algorithms Practical algorithmsTesters, Correctors and formal verification : Testers, Correctors and formal verification Two different views of logical verification: Formal verification. How can we check if a program satisfies a specification? Logical proof: theorem proving, model checking Design a tester for the specification (closer to practice: Windows 95 to XP !) (Blum & Kanan) Combine the two approaches to approximately verify a specification (LICS 2002, Sylvain’s thesis)Testers : Testers Self-testers and correctors for Linear Algebra Blum & Kanan 1985s Testers for graph properties : k-colorability Goldreich and al. 1995s graph properties have testers Alon and al. 1999 Regular languages have testers Alon and al. 2000s Testers for Regular tree languages (Mdr and Magniez) Corrector for regular trees! Blum’s Checker and Tester: Blum’s Checker and Tester Checker for f (Blum, Kannan, ~1990) P C x y A checker is a probabilistic program with an oracle P such that for all x,k : if P=f, C(x,k) = Correct If P(x)!=f(x), Prob[ C(x,k) =Buggy] >1- ½^k Correct BuggySelf-testing: Distance d(f,g) = | {x : f(x) != g(x)}| / | D| A self-tester for f is a probabilistic program T(P, ) such that : If d(P,f)=0, then T(P, )=Correct If d(P,f) > then T(P, )=Buggy Corrector. Division (x,y) : Majority { x.r /y.r : r random.} Self-testing Property testing on graphs : Property testing on graphs H random subgraph G Bipartite 2-colorable H 2-Colorability G bipartite Prob [ H is bipartite] =1 G is -far from bipartite Prob [ H is non-bipartite] > 2/3Property testing on graphs : Property testing on graphs 3-Colorability G 3-colorable Prob [ H is 3-colorable] =1 G is -far from 3-colorable Prob [ H is non 3-colorable] > 2/3 Generalization to k-colorability G H random subgraphProperty testing and descriptive complexity: Which graphs (and matrices) properties have testers? Alon and al., STOC 99: Sigma 2 testers Compression. Property testing and descriptive complexity -equivalent Property testing on words : Property testing on words F : 0*1* W |= F Prob [ H |= F’ ] =1 W is -far from F Prob [ H |= not F’] >2/3 H random subword Word WA testable regular property: A testable regular property W |= F Prob [ H |= F’ ] =1 W is -far from F Prob [ H |= not F’] >2/3 Many 10 appear in W. Repeating the test will detect it with high probability H random subword 0000 1111 0111 ..... F’ Word W How can we verify F : 0*1* ? distance(w,w’) = Hamming distanceRegular properties are testable : Regular properties are testable Theorem. Regular languages are testable. N. Alon, M. Krivelevich, I. Newman, M. Szegedy FOCS 99. General idea : if a word is far from a regular language, it contains many subwords which are infeasible and can be detected. Theorem. Dyck languages are not testable You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
corrector japan craig Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 24 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: October 09, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript The correction of XML data: The correction of XML data Université Paris II & LRI Michel de Rougemont mdr@lri.fr http://www.lri.fr/~mdr Approximation and Edit Distance Testers and Correctors Correcting regular binary trees Applications to XML Practical corrector Relative value of documents Slide2: Relations Dist (R,S) = # x : if Dist(R,S) < Edit-distance Trees: Tree-Edit-Distance Min # Deletions, Insertions Approximation Left-deletion Left-insertionClassical Tree-Edit-Distance: Binary trees : p-Distance allows permutation Classical Tree-Edit-Distance Dist(T1,T2) =2 p-Dist (T1,T2) =1 Dist (T, L) = Min Dist (T,T’) a e b c d a e b c a e b c d f e Deletion InsertionSlide4: Satisfiability : Tree |= F Approximate satisfiability Tree |= F Image on a class K of trees Approximate satisfiability Logic, testers, correctors: Logic, testers, correctors A Tester decides |= for a formula F. A Corrector takes a tree T close to a language L and find T’ in L close to T. This is possible if F follows a simple logic. Theorem. there is linear time corrector for regular binary trees and a constant distance. Given a tree T, k- close to a regular language L, we find in linear time T’ in L, c.k -close to T. General problem: given a language L defined in some Logic, find a corrector. Theorem. (implicit in Alon and al. FOCS2000) There is a linear time corrector for regular words and distance Application to Model-Checking (LICS2002)Simple example: Simple example Tester for 0+ 1* 0+ Types of segments: 000000011111110000010000 probably accepted 011110000000110111 rejected with high probability 0 0 1 00000 11111 00011 11100 01100 0 0 Corrector for 0+ 1* 0+ 00000001111110000100000 *Regular Trees: Tree-automata Logical definability on trees Tree grammar Regular expression Regular Trees r(a,b(a,b(a,b(a,b(a,b(a,b)....) r(a(a,b(a,b(a(a,b),b)....),b)Tree automata: (q0, q0) q1 (q0,q1) q1 Tree automata q0 q0 q0 q0 q0 q0 q1 q1 q1 q1 q1 q0 q0 q0 q1 q2 (q1,q1)q2 (q1,q0)q2 (q2,-) q2 (-,q2) q2Feasible and infeasible subtrees: Definition : a subtree t is feasible for L if there are subtrees (for its leaves) which reach states (q1...ql) such that the state of the root q=t(q1...ql) can reach an accepting state (in the automaton for L). A subtree is infeasible if it is not feasible Feasible and infeasible subtrees feasible infeasibleInfeasible subtrees: Fact . If then the number of unfeasible subtrees of length a is O(n). Fact. If the distance is small, there are few infeasibles trees. Intuition : make local corrections at the root of the infeasible trees Infeasible subtrees Structure of the corrector: Phase 1 : (Bottom-up) Marking of * nodes, roots of infeasible subtrees. Phase 2 : (Top-down) Recursive analysis of the * subtrees to make root accept. Phase 3 : Local corrections Structure of the corrector q0 q1Phase 1 : bottom-up marking: Phase 1 : bottom-up marking Definitions: A terminal *-node is the first sink node of a run A * subtree of a node v is the subtree whose root is v reaching leaves or *-node A node v is a *-node if its state is a sink node when all possible reachable states replace the *-nodes of its *-subtree. 4. Compute the size of the subtrees * * Runs with all possible reachable states (q,q’) reach a sink. * O(n) procedure. Lemma 1: If Dist(T,L)<k, there are at most k *-nodes.Phase 2 : top-down possible states: Phase 2 : top-down possible states * * Let (q,q’) a possible choice at the top *-subtree. Let q’’ a possible state for the *-node of the left *-subtree * q1 q2 q’’ instead of * Correction needed.Case analysis of the automaton: Case 1: One essentially-connected component. Case 2: General case Many components Case analysis of the automaton Case 1: one component: Lemma: if (q1,q2,q’’) are in the same connected component, there is a finite subtree t which can correct. Case a : there is a transition (q,q’) to q’’ with both q,q’ in C: there is a finite tree t1 from q1 to q, a finite tree t2 from q2 to q’ and the correction is: Case 1: one component q1 q2 q’’ q q’ q’’ q1 q2 t2 t1Case 1: b and c: Case b : there is a transition (q,q’) to q’’ with one of q or q’ being q0: suppose q=q0. The correction uses t2 and cut the left branch. Case c: there is a transition (q0,q0) to q’’. The correction cuts both branches. Case 1: b and c q1 q2 q’’ q0 q’ q’’ q2 t2 q1 q2 q’’ q0 q0 q’’Correction rules: Correction rules q1 q2 q’’ instead of *Case 2 : many components : Hypothesis : q1 in Ci q2 in Cj q’’ in Ck Case a: P such that Ci < Ck and Cj < Ck Find t1 and t2 as in case 1.a Case 2 : many components q1 q2 q’’ q q’ q’’ q1 q2 t2 t1Case 2: b and c: Case b,c : P such that Ci >Ck and Cj < Ck Find t2 and let Cp=inf(Ci,Ck). Cut the left branch until Cp. Case d: P such that Ci >Ck and Cj > Ck Let Cp=inf(Ci,Ck). Cut the left branch until Cp. Let Cq=inf(Cj,Ck). Cut the right branch until Cq. Case 2: b and c q1 q2 q’’ q’ q’’ q2 t2 q1 q2 q’’ q’’Correction rules: Correction rules q1 q2 q’’ instead of *Analysis of the corrector: Fact 1: finitely many insertions Fact 2: deletions less predictable Lemma: If the cut is large, than the distance must be large. Analysis of the corrector General Corrector: 1. Do the inductive Marking bottom-up. 2. Apply the recursive analysis of compatible states top-down. 3. For each transition (q,q’) -> q’’ apply the correction, compute the distance and select the rule with smallest distance 4. Select the * states with Minimum Dist.. Procedure is O(n), exponential in k and size(Q)General result: Theorem: If Dist(T,L) <k, the general corrector finds T’ such that Dist(T,T’) <c.k. Proof : # *-nodes < k Case 1: 0 *-node: no correction Case 2: at least 1 *-node. Looking at all possible k-variations will correct the errors in the *-subtree and diminish the *-nodes. General resultUnranked trees: XML: Labelled trees of large degree. Structure given by a « grammar », or DTD. Generalization of automata: 1. Unranked tree automaton 2. Tree-walking automaton Method: Code an unranked labelled tree with a binary labelled tree. Advantage: the correction table is FINITE. Theorem: If Dist(T,L) <k, the general corrector finds T’ such that Dist(T,T’) <c.k. Unranked trees: XMLApplications to XML: Applications to XML DTD <?xml version='1.0' ?> <!ELEMENT book (chapter*,title,author)> <!ELEMENT chapter (title,para*)> <!ELEMENT title (#PCDATA)> <!ELEMENT para (#PCDATA)> <!ELEMENT author (#PCDATA)> Binary Normal Form l -> l1, a l1 -> c1, t c1 -> c, c1 c1 -> - c -> t, p1 p1 -> p, p1 p1 -> - a -> data t -> data p -> data XML tree decomposition: XML tree decomposition XML file transformed into a binary labelled tree. XML file with errors: XML file with errors Corrected XML file: Corrected XML file No ambiguities on the possible states of q’’ Immediate correction! XML Correction rules: XML Correction rules q1 q2 q’’ instead of *Java Implementation: Parser: Xerces, Tree structure : DOM Phase 1: look at the parent node of *-node. Propose tags for * (c or f) Phase 2: for each proposal, compute the distance. *=c, distance=1, replacing c with b. *=f, distance=2, replacing c with b and adding an a leaf. Choose the 1st solution. Java Implementation a b c * b a d a DTD: d (c,b,a) or (f,b,a) c (a,b,b) f (a,b,b,a) Relative value of documents: Relative value of documents Given a DTD, mark the Web documents as follows: Infinity if there are far Dist(Document,DTD)=i Provides a relative valued landscape. Works for boolean combinations Generalize to Min{ Dist(D,DTD’) : }Distance on words and trees: Distance on words and trees On words, how can one compute Dist(w,w’), a P-problem Is is possible in less than O(n) ? Yes, STOC 2003 Dist(w,L) and Dist(L,L’) Given two trees, how can one compute: Dist(T,T’) P on ordered trees and NP-complete on unordered trees p-Dist(T,T’) NP-complete. Conclusion: Conclusion Testers and Correctors Testers for approximate verification Correctors Trees Regular trees are testable If T is at distance less than k,then we can correct it. Theoretical algorithms Practical algorithmsTesters, Correctors and formal verification : Testers, Correctors and formal verification Two different views of logical verification: Formal verification. How can we check if a program satisfies a specification? Logical proof: theorem proving, model checking Design a tester for the specification (closer to practice: Windows 95 to XP !) (Blum & Kanan) Combine the two approaches to approximately verify a specification (LICS 2002, Sylvain’s thesis)Testers : Testers Self-testers and correctors for Linear Algebra Blum & Kanan 1985s Testers for graph properties : k-colorability Goldreich and al. 1995s graph properties have testers Alon and al. 1999 Regular languages have testers Alon and al. 2000s Testers for Regular tree languages (Mdr and Magniez) Corrector for regular trees! Blum’s Checker and Tester: Blum’s Checker and Tester Checker for f (Blum, Kannan, ~1990) P C x y A checker is a probabilistic program with an oracle P such that for all x,k : if P=f, C(x,k) = Correct If P(x)!=f(x), Prob[ C(x,k) =Buggy] >1- ½^k Correct BuggySelf-testing: Distance d(f,g) = | {x : f(x) != g(x)}| / | D| A self-tester for f is a probabilistic program T(P, ) such that : If d(P,f)=0, then T(P, )=Correct If d(P,f) > then T(P, )=Buggy Corrector. Division (x,y) : Majority { x.r /y.r : r random.} Self-testing Property testing on graphs : Property testing on graphs H random subgraph G Bipartite 2-colorable H 2-Colorability G bipartite Prob [ H is bipartite] =1 G is -far from bipartite Prob [ H is non-bipartite] > 2/3Property testing on graphs : Property testing on graphs 3-Colorability G 3-colorable Prob [ H is 3-colorable] =1 G is -far from 3-colorable Prob [ H is non 3-colorable] > 2/3 Generalization to k-colorability G H random subgraphProperty testing and descriptive complexity: Which graphs (and matrices) properties have testers? Alon and al., STOC 99: Sigma 2 testers Compression. Property testing and descriptive complexity -equivalent Property testing on words : Property testing on words F : 0*1* W |= F Prob [ H |= F’ ] =1 W is -far from F Prob [ H |= not F’] >2/3 H random subword Word WA testable regular property: A testable regular property W |= F Prob [ H |= F’ ] =1 W is -far from F Prob [ H |= not F’] >2/3 Many 10 appear in W. Repeating the test will detect it with high probability H random subword 0000 1111 0111 ..... F’ Word W How can we verify F : 0*1* ? distance(w,w’) = Hamming distanceRegular properties are testable : Regular properties are testable Theorem. Regular languages are testable. N. Alon, M. Krivelevich, I. Newman, M. Szegedy FOCS 99. General idea : if a word is far from a regular language, it contains many subwords which are infeasible and can be detected. Theorem. Dyck languages are not testable