Presentation Transcript (added April 22, 2008)

Dynamic Match Lattice Spotting:
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace

Overview:
- Phonetic-based index: open-vocabulary
- Based on lattice-spotting technique
- Two-tier database
- Dynamic-match rules
- Algorithmic optimisations
- NOTE: Patented technology

Concept:
[Diagram: phone decomposition of the example search term "greasy"]

Concept:
[Diagram: a target sequence is compared with observed sequences by dynamic matching, each with a cost; the phone labels "ax" and "ih" survive from the figure]

Indexing:
[Diagram boxes: Audio, Feature Extraction, Segmentation, Speech Recognition, Lattices, Sequence Generation, Sequence DB, Hyper-Sequence Generation, Hyper-Sequence DB]

Hyper-sequence Mapping:
- Map individual phones to “parent” classes
- We use Vowels, Fricatives, Glides, Stops and Nasals
- Simple example
  - Parent classes: Vowels, Consonants
  - Map each phone to parent class to create hyper-sequence
[Diagram labels: Sequence DB, Hyper-Sequence DB]

Hyper-sequence Mapping:
[Diagram labels: Search term, Hyper-sequence, Hyper-sequence DB, Sequence DB]

Searching:
[Diagram boxes: Term, Phone decomp., Split long terms, Hyper-mapping, Hyper-Sequence DB, Sequence DB, Dynamic Matching, Merge long terms, Keyword Verification, Results]

Dynamic Matching:
- Minimum Edit Distance (MED), i.e. Levenshtein Distance
- Insertions, deletions, substitutions
- Finds minimum cost of transformation

Dynamic Matching:
- Substitution costs
- Derived from phone confusion statistics

Optimisations:
- Prefix sequence optimisation
- Early stopping optimisation
- Linearised MED search approximation

Long Term Merging:
[Diagram labels: "olympic sites", Search, Search, Merge, Results]

Keyword Verification:
- Acoustic
  - Use acoustic score from lattice to boost occurrences with high confidence
- Neural Network
  - Produce a confidence score by fusing:
    - MED score
    - Acoustic score
    - Term phone length
    - Term phone classes

Results:
Maximum Term-Weighted Value on EvalSet terms
[Results figure or table not recoverable from the transcript]

Conclusion:
- Open-vocabulary and phone-based
- Patented technology utilises
  - sequence and hyper-sequence databases
  - optimisations for rapid searches
- Advantages
  - Other languages
  - Economy of scale

Conclusion:
- Limitations
  - Indexing speed and size
  - Need to split long sequences
- Future work
  - Keyword Verification
    - Word-level information (e.g. LVCSR)
    - Acoustic features (e.g. prosody)
  - Indexing/searching frameworks
  - Spoken Document Retrieval and other semantic applications

References:
- A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
- K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
- CMU Speech group, The Carnegie Mellon Pronouncing Dictionary, 1998. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
- S. J. Young, P. C. Woodland, W. J. Byrne, “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc., 2002.
- V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.
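The hyper-sequence mapping slides describe mapping each phone to a "parent" class (Vowels, Fricatives, Glides, Stops, Nasals) and keeping a hyper-sequence database alongside the sequence database. The following is only a rough sketch of that idea, not the patented implementation. The slides do not say which phones belong to which class, so the `PHONE_CLASS` table, the function names, and the phone spellings for "greasy" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical phone-to-class table. The slides name the five parent classes
# but not their members, so these assignments are illustrative only.
PHONE_CLASS = {
    "iy": "V", "ih": "V", "ax": "V", "ey": "V", "aa": "V",   # Vowels
    "s": "F", "z": "F", "f": "F", "sh": "F",                 # Fricatives
    "r": "G", "l": "G", "w": "G", "y": "G",                  # Glides
    "g": "S", "k": "S", "t": "S", "d": "S", "p": "S",        # Stops
    "m": "N", "n": "N", "ng": "N",                           # Nasals
}


def to_hyper_sequence(phones):
    """Map every phone in a sequence to its parent class."""
    return tuple(PHONE_CLASS[p] for p in phones)


def build_hyper_index(sequences):
    """Group phone sequences by hyper-sequence (a toy two-tier database)."""
    index = defaultdict(set)
    for seq in sequences:
        index[to_hyper_sequence(seq)].add(tuple(seq))
    return index


# Assumed pronunciation variants of "greasy"; both share one hyper-sequence.
observed = [
    ["g", "r", "iy", "s", "iy"],
    ["g", "r", "iy", "z", "iy"],
    ["k", "l", "iy", "n"],
]
index = build_hyper_index(observed)
candidates = index[to_hyper_sequence(["g", "r", "iy", "s", "iy"])]
```

In this sketch a search term is mapped to its hyper-sequence first, and only the sequences stored under that hyper-sequence are passed on to the more expensive dynamic matching step. Whether the real system requires an exact hyper-sequence match or also tolerates near matches is not stated in the slides.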
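The dynamic matching slides describe a Minimum Edit Distance (Levenshtein distance) over phone sequences, with substitution costs derived from phone confusion statistics. A minimal sketch of such a weighted edit distance follows. The cost values, the default cost of 1.0 for unlisted phone pairs, the `max_cost` cut-off, and the phone spellings are all assumptions for illustration; the cut-off is a generic early-abandon check and is not claimed to be the "early stopping optimisation" named in the slides.

```python
def min_edit_distance(target, observed, sub_cost,
                      ins_cost=1.0, del_cost=1.0, max_cost=None):
    """Weighted Levenshtein distance between two phone sequences.

    sub_cost maps (target_phone, observed_phone) to a substitution cost;
    pairs not listed fall back to 1.0 (an assumption, not from the slides).
    Returns None if the distance is certain to exceed max_cost.
    """
    # Row 0: turning an empty target into observed[:j] needs j insertions.
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * del_cost]  # observed is empty: delete i target phones
        for j, o in enumerate(observed, start=1):
            step = 0.0 if t == o else sub_cost.get((t, o), 1.0)
            curr.append(min(prev[j - 1] + step,      # match or substitute
                            prev[j] + del_cost,      # target phone missing
                            curr[j - 1] + ins_cost)) # extra observed phone
        # With non-negative costs the row minimum never decreases, so the
        # final distance cannot fall below it: abandon this candidate.
        if max_cost is not None and min(curr) > max_cost:
            return None
        prev = curr
    total = prev[-1]
    if max_cost is not None and total > max_cost:
        return None
    return total


# Hypothetical confusion-derived costs: easily confused phones are cheap.
COSTS = {("s", "z"): 0.3, ("iy", "ih"): 0.4}

target = ["g", "r", "iy", "s", "iy"]        # assumed phones for "greasy"
close = min_edit_distance(target, ["g", "r", "iy", "z", "iy"], COSTS)
pruned = min_edit_distance(target, ["g", "r", "iy", "z", "iy"], COSTS,
                           max_cost=0.2)
```

Here `close` is the cost of the single "s" to "z" substitution, while `pruned` is `None` because the candidate is abandoned once no alignment can stay under the threshold. In a real system the substitution table would be estimated from recogniser confusion data rather than written by hand.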