Klein

Uploaded from authorPOINTLite
Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Boyer Moore Searches on Binary Texts: 

Boyer Moore Searches on Binary Texts Shmuel Tomi Klein Miri Kopel Ben-Nissan Bar Ilan University, ISRAEL Accelerating

Slide2: 

Outline Background and motivation Boyer Moore algorithm New binary variant Analysis Experiments Summary

Slide3: 

Important application of Automata: PATTERN MATCHING KMP BDM BM Boyer & Moore this-is-a-sample-text--- pattern Match Backwards ! !

Slide4: 

Mismatch – case 1: delta1 u b u a b does not occur in x x y Boyer – Moore Algorithm

Slide5: 

u b u a x y b occurs in x Mismatch – case 2: delta1 Boyer – Moore Algorithm

Slide6: 

u b u a x y Mismatch – case 3: delta2 u reoccurs in x preceded by c ≠ a Boyer – Moore Algorithm

Slide7: 

u b u a x y v Mismatch – case 4: delta2 Only a suffix v of u reoccurs in x Boyer – Moore Algorithm

Slide8: 

Boyer – Moore Example

Slide9: 

Problems of Binary Boyer & Moore delta1 useless most work by delta1

Slide10: 

Need for Binary Boyer & Moore Compressed Matching Given E(T) and P look for E(P) in E(T) rather than P in D(E(T)) Suggested Solution: BBBMM Blocked Binary Boyer Moore Matching

Slide11: 

BBBMM

Slide12: 

ffghabdgttiocb sbgghj 01100010 01101010 BBBMM More information in binary case ASCII BINARY

Slide13: 

BBBMM extended delta1

Slide14: 

BBBMM Total size of delta1 tables: If too large, use limit value Size of delta1 tables reduced to

Slide15: 

BBBMM Original delta1 : increase of text pointer BBBMM delta1 : shift size Mismatch not in last block Correct[sh,j]

Slide16: 

BBBMM delta2

Slide17: 

Analysis Assumption : random input Reasonable for compressed text Expected # comparisons till mismatch: Bit-wise: Blocked:

Slide18: 

Analysis Expected # bits shifted after mismatch: Bit-wise: M Blocked: M’

Slide19: 

Experiments English Bible (2.5MB) World Factbook (1.5MB) Text: Huffman encoded Patterns: Random substrings of lengths 10 to 500 k = 8

Slide20: 

Experiments: Average # comparisons between shifts

Slide21: 

Experiments: Average size of shifts Bit-wise

Slide22: 

Experiments: Average # comparisons for 1000 bits

Slide23: 

Experiments: Time to locate first occurrence (ms)

Slide24: 

Summary Blocked variant of BM Faster than alternatives, Overhead 1-10 K Extensions: ASCII, words instead of characters

Slide25: 

Thank you !