
Dynamic Match Lattice Spotting

Dynamic Match Lattice Spotting. Spoken Term Detection Evaluation. Queensland University of Technology. Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan. Presented by Roy Wallace.


Presentation Transcript


  1. Dynamic Match Lattice Spotting: Spoken Term Detection Evaluation. Queensland University of Technology. Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan. Presented by Roy Wallace.

  2. Overview • Phonetic-based index → open-vocabulary • Based on lattice-spotting technique • Two-tier database • Dynamic-match rules • Algorithmic optimisations • NOTE: Patented technology

  3. Concept [Figure: the term “greasy” and its phone decomposition, shown against a grid of phone labels: g r ax s iy th ay n r nx ow d m nx ae …]

  4. Concept [Figure labels: Target sequence; Dynamic matching; Observed sequences (ax, ih); Costs]

  5. Indexing [Diagram blocks: Audio; Segmentation; Feature Extraction; Speech Recognition; Lattices; Sequence Generation; Sequence DB; Hyper-Sequence Generation; Hyper-Sequence DB]

  6. Hyper-sequence Mapping • Map individual phones to “parent” classes • We use Vowels, Fricatives, Glides, Stops and Nasals • Simple example: with parent classes Vowels and Consonants, map each phone to its parent class to create the hyper-sequence [Diagram labels: Sequence DB; Hyper-Sequence DB]
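The simple vowel/consonant example on this slide can be sketched in a few lines of Python. This is only an illustration: the vowel set, the `V`/`C` labels, the function names and the in-memory dictionary layout are assumptions made here, not the presenters' actual tables or database format.

```python
from collections import defaultdict

# Illustrative vowel set in CMU-style phone symbols (an assumption,
# not the presenters' actual phone-to-class table).
VOWELS = {"aa", "ae", "ah", "ao", "ax", "ay", "eh", "ey", "ih", "iy", "ow", "uw"}


def to_hyper_sequence(phones):
    """Map each phone to its parent class: 'V' for vowels, 'C' otherwise."""
    return tuple("V" if p in VOWELS else "C" for p in phones)


def build_hyper_db(sequences):
    """Group phone sequences by hyper-sequence (hypothetical in-memory layout)."""
    hyper_db = defaultdict(list)
    for seq in sequences:
        hyper_db[to_hyper_sequence(seq)].append(tuple(seq))
    return hyper_db


# A toy "sequence DB" of observed phone sequences.
sequence_db = [
    ["g", "r", "iy", "s", "iy"],
    ["g", "r", "ax", "s", "iy"],
    ["m", "ae", "n"],
]
hyper_db = build_hyper_db(sequence_db)
```

Because many phone sequences share one hyper-sequence, a lookup on the coarse key narrows the set of candidates before any detailed matching is attempted.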

  7. Hyper-sequence Mapping [Figure labels: Search term (matched against the Sequence DB); Hyper-sequence (matched against the Hyper-sequence DB)]

  8. Searching [Diagram blocks: Term; Phone decomp.; Split long terms; Hyper-mapping; Dynamic Matching; Merge long terms; Keyword Verification; Results; with lookups into the Hyper-Sequence DB and Sequence DB]

  9. Dynamic Matching • Minimum Edit Distance (MED), i.e. Levenshtein distance • Allows insertions, deletions and substitutions • Finds the minimum cost of transforming one sequence into the other
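For reference, the textbook unit-cost Levenshtein distance over phone sequences can be written as follows. This is the standard dynamic-programming formulation, not the presenters' implementation; the function name and the example phones are chosen here for illustration.

```python
def levenshtein(target, observed):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `target` into `observed` (unit costs)."""
    # prev[j] holds the distance between the first i-1 target items
    # and the first j observed items.
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, 1):
        curr = [i]  # matching i target items against an empty sequence
        for j, o in enumerate(observed, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of t
                curr[j - 1] + 1,         # insertion of o
                prev[j - 1] + (t != o),  # substitution, free if equal
            ))
        prev = curr
    return prev[-1]


# One substitution (iy -> ax) separates these two phone sequences.
distance = levenshtein("g r iy s iy".split(), "g r ax s iy".split())
```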

  10. Dynamic Matching • Substitution costs derived from phone confusion statistics
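A minimal sketch of how variable substitution costs change the edit distance is shown below. The cost values, the `SUB_COST` table and the default cost of 1.0 are invented for illustration; the slide does not give the actual costs or the formula used to derive them from confusion statistics.

```python
# Hypothetical substitution costs: acoustically confusable phone pairs
# are cheap to substitute. These numbers are made up for the example.
SUB_COST = {
    ("iy", "ih"): 0.25, ("ih", "iy"): 0.25,
    ("ax", "ih"): 0.5, ("ih", "ax"): 0.5,
}


def substitution_cost(a, b, default=1.0):
    """Cost of observing phone b where phone a was expected."""
    if a == b:
        return 0.0
    return SUB_COST.get((a, b), default)


def weighted_med(target, observed, ins_cost=1.0, del_cost=1.0):
    """Minimum edit distance with per-pair substitution costs."""
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, 1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, 1):
            curr.append(min(
                prev[j] + del_cost,
                curr[j - 1] + ins_cost,
                prev[j - 1] + substitution_cost(t, o),
            ))
        prev = curr
    return prev[-1]


target = ["g", "r", "iy", "s", "iy"]
close_match = weighted_med(target, ["g", "r", "ih", "s", "iy"])  # confusable vowel
poor_match = weighted_med(target, ["g", "r", "ow", "s", "iy"])   # unlisted pair
```

With such a table, an observed sequence that differs only by a commonly confused phone scores closer to the target than one that differs by an unrelated phone.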

  11. Optimisations • Prefix sequence optimisation • Early stopping optimisation • Linearised MED search approximation
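One plausible reading of the early stopping optimisation is sketched below: abandon a candidate as soon as it can no longer score within the allowed cost. With non-negative costs, the smallest value in any row of the edit-distance table is a lower bound on the final distance, so the comparison can stop early. The function name, the `max_cost` parameter and the use of unit costs are assumptions; the slide does not describe the actual rule.

```python
def med_with_early_stop(target, observed, max_cost):
    """Unit-cost edit distance, or None once it must exceed max_cost."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, 1):
        curr = [i]
        for j, o in enumerate(observed, 1):
            curr.append(min(
                prev[j] + 1,
                curr[j - 1] + 1,
                prev[j - 1] + (t != o),
            ))
        # Every alignment passes through this row, so its minimum
        # bounds the final distance from below.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


accepted = med_with_early_stop(list("abc"), list("abd"), max_cost=1)
rejected = med_with_early_stop(list("aaaa"), list("bbbb"), max_cost=1)
```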

  12. Long Term Merging [Figure: the long term “olympic sites” is handled as two searches whose results are then merged; labels: Search, Search, Merge, Results]
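The merge step might look like the sketch below, which joins a hit for the first part with a hit for the second part when both come from the same file and the second starts shortly after the first ends. The `Hit` record, the `max_gap` value, the adjacency rule and summing the costs are all assumptions for illustration; the slide does not specify how results are merged.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical search result for one part of a split term."""
    file: str
    start: float  # seconds
    end: float    # seconds
    cost: float   # match cost, lower is better


def merge_hits(first, second, max_gap=0.5):
    """Pair hits of the first part with adjacent hits of the second part."""
    merged = []
    for a in first:
        for b in second:
            same_file = a.file == b.file
            adjacent = 0.0 <= b.start - a.end <= max_gap
            if same_file and adjacent:
                merged.append(Hit(a.file, a.start, b.end, a.cost + b.cost))
    return merged


first_part = [Hit("a.wav", 1.0, 1.5, 0.5)]
second_part = [
    Hit("a.wav", 1.5, 2.0, 0.25),  # directly follows the first part
    Hit("b.wav", 1.5, 2.0, 0.0),   # different file
    Hit("a.wav", 9.0, 9.5, 0.0),   # too far away in time
]
merged = merge_hits(first_part, second_part)
```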

  13. Keyword Verification • Acoustic: use the acoustic score from the lattice to boost occurrences with high confidence • Neural network: produce a confidence score by fusing the MED score and acoustic score, the term's phone length, and the term's phone classes

  14. Results Maximum Term-Weighted Value on EvalSet terms
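For readers unfamiliar with the metric, the sketch below computes Term-Weighted Value at a single decision threshold, following the definition used in the NIST 2006 Spoken Term Detection evaluation as commonly cited: one minus the average over terms of the miss probability plus β times the false-alarm probability, with β = 999.9 and one non-target trial per second of speech. The function name and input layout are assumptions made here, and the numbers are toy values, not results from this presentation. Maximum TWV is the highest value obtained over all thresholds.

```python
def term_weighted_value(counts, speech_seconds, beta=999.9):
    """TWV at one threshold.

    counts maps each term to (n_true, n_correct, n_spurious):
    reference occurrences, correct detections and false alarms.
    Terms with no reference occurrences are skipped.
    """
    penalties = []
    for n_true, n_correct, n_spurious in counts.values():
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        p_false_alarm = n_spurious / (speech_seconds - n_true)
        penalties.append(p_miss + beta * p_false_alarm)
    if not penalties:
        raise ValueError("no terms with reference occurrences")
    return 1.0 - sum(penalties) / len(penalties)


# Toy inputs: a perfect system and a system that returns nothing.
perfect = term_weighted_value({"greasy": (4, 4, 0)}, speech_seconds=3600.0)
silent = term_weighted_value({"greasy": (4, 0, 0)}, speech_seconds=3600.0)
```

A system that finds every occurrence with no false alarms scores 1, and a system that outputs nothing scores 0; false alarms are penalised heavily, so the value can be negative.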

  15. Conclusion • Open-vocabulary and phone-based • Patented technology utilises sequence and hyper-sequence databases, and optimisations for rapid searches • Advantages: other languages; economy of scale

  16. Conclusion • Limitations: indexing speed and size; need to split long sequences • Future work: keyword verification; word-level information (e.g. LVCSR); acoustic features (e.g. prosody); indexing/searching frameworks; Spoken Document Retrieval and other semantic applications

  17. References • A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005. • K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication. • CMU Speech Group, “The Carnegie Mellon Pronouncing Dictionary”, 1998. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict • S. J. Young, P. C. Woodland and W. J. Byrne, “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc., 2002. • V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.
