qut STDEval


Dynamic Match Lattice Spotting: 

Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof. Sridha Sridharan
Presented by Roy Wallace

Overview: 

- Phonetic-based index, and therefore open-vocabulary
- Based on the lattice-spotting technique
- Two-tier database
- Dynamic-match rules
- Algorithmic optimisations
NOTE: Patented technology

Concept: 

Phone decomposition: a search term such as "greasy" is first decomposed into its sequence of phones.
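Phone decomposition can be sketched as a lookup in a pronunciation lexicon, assuming a lexicon in the format of the Carnegie Mellon Pronouncing Dictionary (listed in the references) with stress digits stripped. The two `GREASY` entries are illustrative stand-ins rather than a copy of the real dictionary, and `load_lexicon` and `decompose` are hypothetical helpers; out-of-vocabulary terms would need a letter-to-sound model, which is not shown.

```python
import re

# Hypothetical mini-lexicon in CMU Pronouncing Dictionary format: a word
# (optionally with an "(n)" alternate-pronunciation marker), then ARPAbet
# phones carrying stress digits. A real system would load the full dictionary.
LEXICON_TEXT = """\
GREASY  G R IY1 S IY0
GREASY(1)  G R IY1 Z IY0
"""


def load_lexicon(text):
    """Map each lowercase word to a list of pronunciations (lists of phones)."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # GREASY(1) -> GREASY
        # Drop stress digits and lowercase: IY1 -> iy.
        phones = [re.sub(r"\d", "", p).lower() for p in phones]
        lexicon.setdefault(word.lower(), []).append(phones)
    return lexicon


def decompose(term, lexicon):
    """Return every phone sequence for a (possibly multi-word) term."""
    sequences = [[]]
    for word in term.lower().split():
        prons = lexicon.get(word)
        if prons is None:
            raise KeyError(f"out-of-vocabulary word: {word}")
        # Cross every partial sequence with every pronunciation of this word.
        sequences = [seq + p for seq in sequences for p in prons]
    return sequences


lexicon = load_lexicon(LEXICON_TEXT)
print(decompose("greasy", lexicon))
# [['g', 'r', 'iy', 's', 'iy'], ['g', 'r', 'iy', 'z', 'iy']]
```

Keeping every pronunciation variant means each variant can be searched, at the cost of more queries per term.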

Concept: 

Dynamic matching: a target sequence is compared with the observed sequences, and each difference between them incurs a cost (e.g. for a substitution between ax and ih).
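As a simplified illustration of the concept, the sketch below scores observed sequences against a target using substitutions only. It assumes equal-length sequences and a hypothetical cost table: `SUB_COST`, the `ax`/`ih` cost of 1.0 and the threshold are invented for illustration. Insertions and deletions are covered by the MED formulation of dynamic matching.

```python
# Hypothetical substitution costs between phone pairs. Identical phones cost 0;
# pairs not listed are treated as too dissimilar to match (infinite cost).
SUB_COST = {
    ("ax", "ih"): 1.0,
    ("ih", "ax"): 1.0,
}
MAX_COST = float("inf")


def substitution_cost(target_phone, observed_phone):
    if target_phone == observed_phone:
        return 0.0
    return SUB_COST.get((target_phone, observed_phone), MAX_COST)


def match_cost(target, observed):
    """Cost of turning target into observed using substitutions only."""
    if len(target) != len(observed):
        return MAX_COST
    return sum(substitution_cost(t, o) for t, o in zip(target, observed))


def dynamic_match(target, observed_sequences, threshold):
    """Return (cost, sequence) pairs within the threshold, cheapest first."""
    hits = []
    for obs in observed_sequences:
        cost = match_cost(target, obs)
        if cost <= threshold:
            hits.append((cost, obs))
    return sorted(hits)


target = ("s", "ax", "t")
observed = [("s", "ax", "t"), ("s", "ih", "t"), ("s", "uw", "t")]
print(dynamic_match(target, observed, threshold=1.0))
# [(0.0, ('s', 'ax', 't')), (1.0, ('s', 'ih', 't'))]
```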

Indexing: 

Audio → Feature extraction → Segmentation → Speech recognition → Lattices → Sequence generation → Sequence DB → Hyper-sequence generation → Hyper-sequence DB
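Sequence generation can be sketched as enumerating every fixed-length phone path through a lattice and recording where each sequence starts. The lattice representation (`LATTICE`, `NODE_TIME`), the sequence length and the absence of any score-based pruning are all assumptions made for illustration; the slide does not specify how the system stores lattices or chooses the sequence length.

```python
from collections import defaultdict

# Hypothetical phone lattice: node -> list of (phone, next_node) edges,
# with hypothetical node start times in seconds.
LATTICE = {
    0: [("s", 1)],
    1: [("ax", 2), ("ih", 2)],
    2: [("t", 3)],
    3: [],
}
NODE_TIME = {0: 0.00, 1: 0.08, 2: 0.15, 3: 0.22}


def paths_from(lattice, node, length):
    """Yield every phone sequence of exactly `length` phones starting at node."""
    if length == 0:
        yield ()
        return
    for phone, nxt in lattice[node]:
        for rest in paths_from(lattice, nxt, length - 1):
            yield (phone,) + rest


def build_sequence_db(lattice, node_time, length):
    """Map each observed phone sequence to the set of times where it starts."""
    db = defaultdict(set)
    for node in lattice:
        for seq in paths_from(lattice, node, length):
            db[seq].add(node_time[node])
    return db


db = build_sequence_db(LATTICE, NODE_TIME, 2)
for seq, times in sorted(db.items()):
    print(seq, sorted(times))
```

Because competing lattice paths (here `ax` and `ih`) are both kept, recognition errors on the best path do not necessarily remove the correct sequence from the index.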

Hyper-sequence Mapping: 

- Map individual phones to “parent” classes
- We use five classes: Vowels, Fricatives, Glides, Stops and Nasals
- Simple example with two parent classes, Vowels and Consonants: map each phone to its parent class to create a hyper-sequence
- Sequence DB → Hyper-sequence DB
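The two-class example can be sketched as follows, assuming ARPAbet phone labels. The vowel set is an assumption, and the five-class scheme (Vowels, Fricatives, Glides, Stops, Nasals) would replace `parent_class` with a larger lookup table.

```python
# Simple two-class mapping: every phone becomes Vowel ("V") or Consonant ("C").
# The vowel set assumes ARPAbet labels.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def parent_class(phone):
    return "V" if phone in VOWELS else "C"


def hyper_sequence(phones):
    """Map a phone sequence to its hyper-sequence of parent classes."""
    return tuple(parent_class(p) for p in phones)


print(hyper_sequence(["g", "r", "iy", "s", "iy"]))  # ('C', 'C', 'V', 'C', 'V')
```

Many phone sequences share one hyper-sequence, which is what makes the hyper-sequence tier much smaller than the sequence tier.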

Hyper-sequence Mapping: 

The search term is mapped to its hyper-sequence, which is looked up in the Hyper-sequence DB to select the entries of the Sequence DB that need to be searched.
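A two-tier lookup can be sketched by grouping Sequence DB entries under their hyper-sequence, so that a search only examines sequences whose hyper-sequence matches the term's. This sketch assumes exact hyper-sequence matching and the simple Vowel/Consonant classes, so it only tolerates substitutions within a class; how the system handles edits that change the hyper-sequence is not stated on the slide. `build_hyper_db` and `candidate_sequences` are hypothetical names.

```python
from collections import defaultdict

VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def hyper_sequence(phones):
    return tuple("V" if p in VOWELS else "C" for p in phones)


def build_hyper_db(sequence_db):
    """Group the keys of a sequence DB under their hyper-sequence."""
    hyper_db = defaultdict(list)
    for seq in sequence_db:
        hyper_db[hyper_sequence(seq)].append(seq)
    return hyper_db


def candidate_sequences(term_phones, hyper_db):
    """Only sequences sharing the term's hyper-sequence are examined further."""
    return hyper_db.get(hyper_sequence(term_phones), [])


# Hypothetical Sequence DB: phone sequence -> start times (seconds).
sequence_db = {
    ("s", "ax", "t"): [0.0],
    ("s", "ih", "t"): [1.2],
    ("s", "t", "ax"): [2.4],
}
hyper_db = build_hyper_db(sequence_db)
print(candidate_sequences(["s", "ax", "t"], hyper_db))
# [('s', 'ax', 't'), ('s', 'ih', 't')]
```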

Searching: 

Term → Phone decomposition → Split long terms → Hyper-mapping → Hyper-sequence DB → Sequence DB → Dynamic matching → Merge long terms → Keyword verification → Results
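Splitting long terms can be sketched as cutting the term's phone sequence into near-equal pieces no longer than the longest sequence stored in the Sequence DB. The maximum length, the balanced split and the approximate phone string for "olympic sites" are illustrative assumptions; the slide does not say where the system places the split points.

```python
import math


def split_term(phones, max_len):
    """Split phones into the fewest near-equal pieces of at most max_len phones."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if not phones:
        return []
    n_pieces = math.ceil(len(phones) / max_len)
    base, extra = divmod(len(phones), n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first pieces
        pieces.append(phones[start:start + size])
        start += size
    return pieces


# Approximate ARPAbet transcription of "olympic sites" (illustrative only).
phones = ["ow", "l", "ih", "m", "p", "ih", "k", "s", "ay", "t", "s"]
print(split_term(phones, max_len=6))
# [['ow', 'l', 'ih', 'm', 'p', 'ih'], ['k', 's', 'ay', 't', 's']]
```

Balancing the pieces avoids a very short final piece, which would match almost anywhere in the index.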

Dynamic Matching: 

- Minimum Edit Distance (MED), i.e. Levenshtein distance
- Allows insertions, deletions and substitutions
- Finds the minimum cost of transforming one phone sequence into another
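The recurrence behind MED can be sketched with the standard two-row dynamic programme. Unit costs are used by default; the `sub_cost`, `ins_cost` and `del_cost` arguments are hypothetical parameters that let weighted costs be plugged in.

```python
def unit_sub_cost(a, b):
    return 0.0 if a == b else 1.0


def med(target, observed, sub_cost=unit_sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum cost of transforming target into observed (Levenshtein distance)."""
    n, m = len(target), len(observed)
    # prev[j] = cost of turning target[:i-1] into observed[:j].
    prev = [j * ins_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * del_cost] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + del_cost,        # delete target[i-1]
                cur[j - 1] + ins_cost,     # insert observed[j-1]
                prev[j - 1] + sub_cost(target[i - 1], observed[j - 1]),  # substitute or match
            )
        prev = cur
    return prev[m]


print(med(list("kitten"), list("sitting")))            # 3.0
print(med(["s", "ax", "t"], ["s", "ih", "t", "s"]))    # 2.0 (one substitution, one insertion)
```

Only two rows are kept, so memory grows with the observed-sequence length rather than with the product of both lengths.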

Dynamic Matching: 

- Substitution costs are derived from phone confusion statistics
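One common way to turn confusion statistics into costs is to take the negative log of the confusion probability, so that frequent confusions are cheap. This is an assumption about the form of the mapping rather than the system's published formula, and the `CONFUSIONS` counts are invented; exact matches are given zero cost so that they fit the MED recurrence.

```python
import math

# Hypothetical confusion counts: CONFUSIONS[target][observed] = how often the
# recogniser output `observed` when the reference phone was `target`.
CONFUSIONS = {
    "ax": {"ax": 80, "ih": 15, "ah": 5},
    "ih": {"ih": 85, "ax": 10, "iy": 5},
}


def substitution_costs(confusions):
    """Cost = -log P(observed | target); exact matches cost 0."""
    costs = {}
    for target, row in confusions.items():
        total = sum(row.values())
        for observed, count in row.items():
            if observed == target:
                costs[(target, observed)] = 0.0
            elif count > 0:
                costs[(target, observed)] = -math.log(count / total)
    return costs


costs = substitution_costs(CONFUSIONS)
print(round(costs[("ax", "ih")], 3))  # 1.897
```

Under this scheme `ax` → `ih` (seen 15 times) is cheaper than `ax` → `ah` (seen 5 times), so likely recognition errors are penalised less during matching.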

Optimisations: 

- Prefix sequence optimisation
- Early stopping optimisation
- Linearised MED search approximation
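The first two optimisations can be sketched together, assuming sequences are visited in sorted order (so neighbours share prefixes) and all edit costs are non-negative. This illustrates the general idea only, not the system's implementation, and the linearised MED approximation is not shown. With non-negative costs, the minimum of each DP row can never decrease, so once it exceeds the threshold no extension of that prefix can match.

```python
def unit_sub_cost(a, b):
    return 0.0 if a == b else 1.0


def next_row(prev, observed_phone, target, sub_cost, ins_cost, del_cost):
    """Extend the DP by one observed phone; row[i] covers target[:i]."""
    row = [prev[0] + ins_cost]
    for i in range(1, len(target) + 1):
        row.append(min(prev[i] + ins_cost,        # insert observed phone
                       row[i - 1] + del_cost,     # delete target[i-1]
                       prev[i - 1] + sub_cost(target[i - 1], observed_phone)))
    return row


def search(target, sequences, threshold, sub_cost=unit_sub_cost,
           ins_cost=1.0, del_cost=1.0):
    """Return (cost, sequence) pairs whose MED to target is within threshold."""
    rows = [[i * del_cost for i in range(len(target) + 1)]]  # rows[j] covers seq[:j]
    prev_seq = ()
    hits = []
    for seq in sorted(set(sequences)):
        # Prefix optimisation: reuse rows for the prefix shared with prev_seq.
        common = 0
        while (common < min(len(seq), len(prev_seq), len(rows) - 1)
               and seq[common] == prev_seq[common]):
            common += 1
        del rows[common + 1:]
        prev_seq = seq
        while len(rows) <= len(seq):
            # Early stopping: row minima never decrease, so abandon this prefix.
            if min(rows[-1]) > threshold:
                break
            rows.append(next_row(rows[-1], seq[len(rows) - 1], target,
                                 sub_cost, ins_cost, del_cost))
        if len(rows) == len(seq) + 1 and rows[-1][-1] <= threshold:
            hits.append((rows[-1][-1], seq))
    return hits


target = ("s", "ax", "t")
sequences = [("s", "ax", "t"), ("s", "ih", "t"), ("s", "ih", "t", "s"), ("z", "uw", "k")]
print(search(target, sequences, threshold=1.0))
# [(0.0, ('s', 'ax', 't')), (1.0, ('s', 'ih', 't'))]
```

Here `('s', 'ih', 't')` reuses the row computed for the shared prefix `('s',)`, and `('z', 'uw', 'k')` is abandoned after two phones without completing the table.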

Long Term Merging: 

Example: the long term "olympic sites" is split, each part is searched separately, and the results are merged.
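Merging can be sketched as chaining hits for consecutive pieces of a split term when they are adjacent in time. The hit format `(start, end, cost)`, the `max_gap` tolerance and the rule of summing piece costs are hypothetical choices for illustration, as are the example times.

```python
def merge_hits(piece_hits, max_gap=0.1):
    """Chain hits for consecutive pieces of a split term into full-term hits.

    piece_hits[k] is a list of (start, end, cost) hits for piece k. A hit for
    piece k+1 extends a partial match if it starts within max_gap seconds of
    where that partial match ends. Costs of the chained pieces are summed.
    """
    if not piece_hits:
        return []
    partial = list(piece_hits[0])
    for hits in piece_hits[1:]:
        extended = []
        for p_start, p_end, p_cost in partial:
            for h_start, h_end, h_cost in hits:
                if abs(h_start - p_end) <= max_gap:
                    extended.append((p_start, h_end, p_cost + h_cost))
        partial = extended
    return sorted(partial, key=lambda hit: hit[2])  # cheapest merged hits first


olympic = [(10.0, 10.6, 0.5), (42.0, 42.5, 1.0)]
sites = [(10.62, 11.0, 0.0), (80.0, 80.4, 0.0)]
print(merge_hits([olympic, sites]))  # [(10.0, 11.0, 0.5)]
```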

Keyword Verification: 

- Acoustic: use the acoustic score from the lattice to boost occurrences with high confidence
- Neural network: produce a confidence score by fusing the MED score, the acoustic score, the term's phone length and the term's phone classes
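The fusion step can be sketched as a small feed-forward network that maps the inputs named on the slide to a confidence in (0, 1). The architecture (one tanh hidden layer, sigmoid output), the use of the vowel fraction as a stand-in for "term phone classes" and all weights are hypothetical; in practice the weights would be trained on labelled true and false occurrences.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def verification_features(med_score, acoustic_score, term_phones, vowels):
    """Hypothetical feature vector for one putative occurrence.

    The vowel fraction stands in for the "term phone classes" input.
    """
    n = len(term_phones)
    vowel_fraction = sum(p in vowels for p in term_phones) / n if n else 0.0
    return [med_score, acoustic_score, float(n), vowel_fraction]


def mlp_confidence(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a sigmoid output in (0, 1)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Illustrative, untrained weights (hypothetical). Negative weights on the MED
# input make a higher edit cost lower the confidence; a higher acoustic score
# is assumed to mean a more confident lattice hit.
W_HIDDEN = [[-1.5, 0.8, 0.1, 0.0],
            [-0.5, 0.2, 0.05, 0.5]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [2.0, 1.0]
B_OUT = -0.5

VOWELS = {"ax", "ih", "iy"}  # hypothetical, truncated vowel set
features = verification_features(0.0, 0.7, ["s", "ax", "t"], VOWELS)
print(mlp_confidence(features, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```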

Results: 

Maximum Term-Weighted Value (MTWV) on EvalSet terms
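Term-Weighted Value can be sketched as follows, assuming the definitions of the NIST 2006 Spoken Term Detection evaluation: TWV(θ) = 1 - mean over terms of [P_miss(term, θ) + β · P_FA(term, θ)], where P_FA divides the false alarms by the number of seconds of speech minus the term's true occurrences, β = 999.9, and MTWV is the maximum of TWV over decision thresholds θ. The counts below are toy numbers, not results from this system.

```python
def twv(term_stats, speech_seconds, beta=999.9):
    """Term-weighted value at one threshold.

    term_stats: list of (n_true, n_correct, n_false_alarm) per term, counted
    from the hits kept at that threshold.
    """
    losses = []
    for n_true, n_correct, n_false_alarm in term_stats:
        if n_true == 0:
            continue  # P_miss is undefined for terms that never occur
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_false_alarm / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no terms with true occurrences")
    return 1.0 - sum(losses) / len(losses)


def mtwv(stats_by_threshold, speech_seconds, beta=999.9):
    """Return (maximum TWV, threshold that achieves it)."""
    return max((twv(stats, speech_seconds, beta), threshold)
               for threshold, stats in stats_by_threshold.items())


# Toy counts (n_true, n_correct, n_false_alarm) for two terms in one hour of speech.
stats = {
    0.5: [(10, 8, 2), (5, 5, 0)],
    0.8: [(10, 6, 0), (5, 4, 0)],
}
best, threshold = mtwv(stats, speech_seconds=3600)
print(round(best, 3), threshold)  # 0.7 0.8
```

The large β means each false alarm costs far more than a single miss, so the maximum is often reached at a strict threshold, as in this toy case.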

Conclusion: 

- Open-vocabulary and phone-based
- Patented technology
- Utilises sequence and hyper-sequence databases
- Optimisations for rapid searches
- Advantages: other languages, economy of scale

Conclusion: 

Limitations:
- Indexing speed and size
- Need to split long sequences
Future work:
- Keyword verification: word-level information (e.g. LVCSR), acoustic features (e.g. prosody)
- Indexing/searching frameworks
- Spoken Document Retrieval and other semantic applications

References: 

A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
CMU Speech Group, “The Carnegie Mellon Pronouncing Dictionary”, 1998. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
S. J. Young, P. C. Woodland and W. J. Byrne, “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc., 2002.
V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, vol. 10, no. 8, pp. 707-710, 1966.