FSG Implementation in Sphinx2

Mosur Ravishankar, Jul 15, 2004

Presentation Transcript


  1. FSG Implementation in Sphinx2 Mosur Ravishankar Jul 15, 2004 FSG Implementation in Sphinx2 (rkm@cs.cmu.edu)

  2. Outline
     • Input specification
     • FSG related API
     • Application examples
     • Implementation issues

  3. FSG Specification
     • "Assembly language" for specifying FSGs
        • Low-level
        • Most standards should compile down to this level
     • Set of N states, numbered 0 .. N-1
     • Transitions:
        • Emitting or non-emitting (aka null or epsilon)
        • Each emitting transition emits one word
        • Fixed probability 0 < p <= 1
     • One start state and one final state
        • Null transitions can effectively give you as many start and final states as needed
     • Goal: find the highest-likelihood path from the start state to the final state, given some input speech
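
Null transitions are free to add, but a decoder still has to follow them. One common approach (assumed here; the slide does not say how Sphinx2 handles it) is to precompute, for every pair of states, the best probability of reaching one from the other through null transitions alone. The minimal C sketch below does this with a max-product variant of Floyd-Warshall over a small dense table; `null_closure_t`, `MAX_STATES` and the function names are hypothetical.

```c
#include <assert.h>

#define MAX_STATES 16

/* best[i][j]: probability of the best path from state i to state j that
 * uses only null (non-emitting) transitions; 0.0 if there is none. */
typedef struct {
    int n_state;
    double best[MAX_STATES][MAX_STATES];
} null_closure_t;

void closure_init(null_closure_t *c, int n_state)
{
    assert(n_state > 0 && n_state <= MAX_STATES);
    c->n_state = n_state;
    for (int i = 0; i < n_state; i++)
        for (int j = 0; j < n_state; j++)
            c->best[i][j] = (i == j) ? 1.0 : 0.0;  /* staying put costs nothing */
}

/* Record a direct null transition i -> j with probability p, 0 < p <= 1. */
void closure_add_null(null_closure_t *c, int i, int j, double p)
{
    assert(i >= 0 && i < c->n_state && j >= 0 && j < c->n_state);
    assert(p > 0.0 && p <= 1.0);
    if (p > c->best[i][j])
        c->best[i][j] = p;
}

/* Max-product Floyd-Warshall. Since every p <= 1, going around a cycle can
 * never increase a path's probability, so the result is exact. */
void closure_compute(null_closure_t *c)
{
    int n = c->n_state;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double via_k = c->best[i][k] * c->best[k][j];
                if (via_k > c->best[i][j])
                    c->best[i][j] = via_k;
            }
}
```

For example, null arcs 0 -> 1 and 1 -> 2 with probability 0.5 each beat a direct null arc 0 -> 2 with probability 0.2, so the closure records 0.25 for the pair (0, 2).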

  4. An FSG Example
     [Figure: state diagram of the "leg" FSG below. From start state 0, one branch reads "to city from city" through states 1-4, the other reads "from city to city" through states 5-8, and both reach final state 9 through null (e) transitions.]
     FSG_BEGIN leg
     NUM_STATES 10
     START_STATE 0
     FINAL_STATE 9
     # Transitions
     T 0 1 0.5 to
     T 1 2 0.1 city1
     …
     T 1 2 0.1 cityN
     T 2 3 1.0 from
     T 3 4 0.1 city1
     …
     T 3 4 0.1 cityN
     T 4 9 1.0
     T 0 5 0.5 from
     T 5 6 0.1 city1
     …
     T 5 6 0.1 cityN
     T 6 7 1.0 to
     T 7 8 0.1 city1
     …
     T 7 8 0.1 cityN
     T 8 9 1.0
     FSG_END
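
To make the format concrete, here is a minimal C sketch of a reader for exactly the directives that appear in this example: FSG_BEGIN/FSG_END, NUM_STATES, START_STATE, FINAL_STATE, `#` comments, and T lines whose optional fourth field is the emitted word (no word means a null transition, as in `T 4 9 1.0`). It is not the Sphinx2 parser; the struct layout, the fixed limits and `fsg_parse` are our own assumptions, and the real reader may accept other keywords.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRANS 64
#define MAX_NAME 32

typedef struct {
    int from, to;
    double prob;
    char word[MAX_NAME];   /* empty string for a null transition */
} fsg_trans_t;

typedef struct {
    char name[MAX_NAME];
    int n_state, start_state, final_state, n_trans;
    fsg_trans_t trans[MAX_TRANS];
} fsg_t;

static int in_range(const fsg_t *g, int s) { return s >= 0 && s < g->n_state; }

/* Parse one FSG in the text format above. Returns 0 on success, -1 on error. */
int fsg_parse(const char *text, fsg_t *g)
{
    char buf[4096];
    int seen_begin = 0, seen_end = 0;

    assert(text != NULL && g != NULL);
    memset(g, 0, sizeof(*g));
    g->n_state = g->start_state = g->final_state = -1;
    if (strlen(text) >= sizeof(buf))
        return -1;
    strcpy(buf, text);

    for (char *line = strtok(buf, "\n"); line && !seen_end; line = strtok(NULL, "\n")) {
        char kw[32];
        if (sscanf(line, "%31s", kw) != 1 || kw[0] == '#')
            continue;                              /* blank line or comment */
        if (!seen_begin) {
            /* The FSG name is read from the file itself. */
            if (strcmp(kw, "FSG_BEGIN") != 0 || sscanf(line, "%*s %31s", g->name) != 1)
                return -1;
            seen_begin = 1;
        } else if (strcmp(kw, "NUM_STATES") == 0) {
            if (sscanf(line, "%*s %d", &g->n_state) != 1) return -1;
        } else if (strcmp(kw, "START_STATE") == 0) {
            if (sscanf(line, "%*s %d", &g->start_state) != 1) return -1;
        } else if (strcmp(kw, "FINAL_STATE") == 0) {
            if (sscanf(line, "%*s %d", &g->final_state) != 1) return -1;
        } else if (strcmp(kw, "T") == 0) {
            if (g->n_trans >= MAX_TRANS) return -1;
            fsg_trans_t *t = &g->trans[g->n_trans++];
            /* A 4th field is the emitted word; without it the arc is null. */
            int n = sscanf(line, "%*s %d %d %lf %31s", &t->from, &t->to, &t->prob, t->word);
            if (n < 3 || t->prob <= 0.0 || t->prob > 1.0) return -1;
        } else if (strcmp(kw, "FSG_END") == 0) {
            seen_end = 1;
        } else {
            return -1;                             /* unknown keyword */
        }
    }
    if (!seen_end || g->n_state <= 0 ||
        !in_range(g, g->start_state) || !in_range(g, g->final_state))
        return -1;
    for (int i = 0; i < g->n_trans; i++)
        if (!in_range(g, g->trans[i].from) || !in_range(g, g->trans[i].to))
            return -1;
    return 0;
}
```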

  5. A Better Representation
     [Figure: the FSG from the previous slide with each bundle of city1 … cityN arcs replaced by a single [city] arc; [city] is itself a small FSG (states 0 and 1) with arcs boston, chicago, pittsburgh, buffalo, seattle.]
     • Composition of FSGs
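
One straightforward way to realize composition for a word class such as [city] is to flatten it: replace each class arc with one arc per member, multiplying the class-arc probability by the member's probability within the class. This minimal C sketch does that; the slide does not say how Sphinx2 composes FSGs or assigns member probabilities, so `word_class_t`, `expand_class` and the member probabilities are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_OUT 64

typedef struct { int from, to; double prob; const char *word; } arc_t;  /* word NULL = null arc */

typedef struct {
    const char *name;          /* class label used on arcs, e.g. "[city]" */
    int n_member;
    const char **members;      /* member words */
    const double *member_prob; /* probability of each member within the class */
} word_class_t;

/* Copy arcs to out[], expanding every arc labeled cls->name into one arc per
 * class member. Returns the number of output arcs, or -1 if out[] is too small. */
int expand_class(const arc_t *in, int n_in, const word_class_t *cls,
                 arc_t *out, int max_out)
{
    assert(cls->n_member > 0);
    int n_out = 0;
    for (int i = 0; i < n_in; i++) {
        int is_class = in[i].word != NULL && strcmp(in[i].word, cls->name) == 0;
        if (n_out + (is_class ? cls->n_member : 1) > max_out)
            return -1;
        if (!is_class) {
            out[n_out++] = in[i];              /* ordinary word or null arc */
            continue;
        }
        for (int m = 0; m < cls->n_member; m++) {
            out[n_out] = in[i];
            out[n_out].word = cls->members[m];
            out[n_out].prob = in[i].prob * cls->member_prob[m];
            n_out++;
        }
    }
    return n_out;
}
```

For the five cities in the figure, a [city] arc with probability 1.0 and uniform member probabilities of 0.2 would become five arcs of probability 0.2 each. Flattening is the simplest option, but it gives up the memory savings that keeping [city] as a shared sub-FSG would offer.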

  6. Multiple Pronunciations and Filler Words
     [Figure: the [city] FSG with a [filler] self-loop at every state.]
     • Alternative pronunciations added automatically
     • Filler word transitions (silence and noise) added automatically
        • A filler self-transition at every state
        • Noise words added only if noise penalty (probability) > 0
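
A minimal C sketch of the automatic filler step: append a silence self-loop at every state, and add noise-word self-loops only when the noise probability is positive. The filler name `<sil>`, the noise words passed in, the probabilities and `add_filler_loops` are hypothetical. The sketch also treats loop probabilities as penalties and does not renormalize each state's outgoing probabilities, which is an assumption rather than documented Sphinx2 behavior. Alternative pronunciations are not shown.

```c
#include <assert.h>

#define MAX_ARCS 128

typedef struct { int from, to; double prob; const char *word; } arc_t;

/* Append filler self-loops to arcs[0..n_arc-1]: one silence loop per state,
 * plus one loop per noise word if noise_prob > 0. Returns the new arc count,
 * or -1 if arcs[] (capacity max_arc) would overflow. */
int add_filler_loops(arc_t *arcs, int n_arc, int max_arc, int n_state,
                     double sil_prob, const char **noise_words, int n_noise,
                     double noise_prob)
{
    assert(sil_prob > 0.0 && sil_prob <= 1.0 && n_state > 0);
    int per_state = 1 + (noise_prob > 0.0 ? n_noise : 0);
    if (n_arc + n_state * per_state > max_arc)
        return -1;
    for (int s = 0; s < n_state; s++) {
        arcs[n_arc++] = (arc_t){ s, s, sil_prob, "<sil>" };
        if (noise_prob > 0.0)              /* noise words only with a positive penalty */
            for (int k = 0; k < n_noise; k++)
                arcs[n_arc++] = (arc_t){ s, s, noise_prob, noise_words[k] };
    }
    return n_arc;
}
```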

  7. FSG Related API
     • Loading during initialization (i.e., fbs_init()):
        • -fsgfn flag specifying an FSG file to load (similar to the -lmfn flag)
        • Difference: the FSG name is contained in the file
     • Dynamic loading:
        • char *uttproc_load_fsgfile(char *fsgfile); returns the FSG name contained in the file
     • Switching to an FSG:
        • uttproc_set_fsg(char *fsgname);
     • Deleting a previously loaded FSG:
        • uttproc_del_fsg(char *fsgname);
     • Old demos can be run with FSGs simply by recompiling them with the new libraries

  8. Mixed LM/FSG Decoding Example
     • (See lm_fsg_test.c)

  9. Another Example: Garbage Models
     [Figure: the [city] FSG with [allphone] garbage-model arcs added at several states.]
     • Extraneous speech could be absorbed using an allphone "garbage model"

  10. B/W Training and Forced Alignment
     • Consolidate code for FSGs, Baum-Welch training, and forced alignment?
        • Sentence HMMs for training and alignment are essentially linear FSGs
        • Alternative pronunciations and filler words handled automatically
     • Differences:
        • B/W uses the forward (and backward) algorithm instead of Viterbi
        • Alignment has to produce phone and state segmentation as well
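
The algorithmic difference named here comes down to sum versus max. The minimal C sketch below scores a toy linear HMM, standing in for a sentence HMM, both ways with the same recursion. It works directly in the probability domain without scaling or logs (a real trainer needs one or the other to avoid underflow), and `linear_hmm_score` and its parameters are hypothetical.

```c
#include <assert.h>

#define MAX_S 8
#define MAX_T 32

/* Score a linear (left-to-right) HMM. stay[s] is the self-loop probability of
 * state s and 1 - stay[s] the probability of moving to s + 1; emit[t][s] is
 * the likelihood of frame t in state s. Paths start in state 0 at frame 0 and
 * must end in the last state at the last frame.
 *   use_max = 0: forward algorithm (sum over all paths), as in Baum-Welch.
 *   use_max = 1: Viterbi (best single path), as in decoding and alignment. */
double linear_hmm_score(int n_state, int n_frame, const double stay[],
                        double emit[][MAX_S], int use_max)
{
    assert(n_state > 0 && n_state <= MAX_S && n_frame > 0 && n_frame <= MAX_T);
    double score[MAX_S] = {0};
    score[0] = emit[0][0];
    for (int t = 1; t < n_frame; t++) {
        /* Walk states backwards so score[s - 1] still holds frame t - 1. */
        for (int s = n_state - 1; s >= 0; s--) {
            double from_self = score[s] * stay[s];
            double from_prev = (s > 0) ? score[s - 1] * (1.0 - stay[s - 1]) : 0.0;
            double combined = use_max
                ? (from_self > from_prev ? from_self : from_prev)
                : from_self + from_prev;
            score[s] = combined * emit[t][s];
        }
    }
    return score[n_state - 1];
}
```

With two states, three frames, self-loop probability 0.5 and every emission likelihood 1.0, the two possible paths each score 0.25, so the forward score is 0.5 and the Viterbi score is 0.25.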

  11. Implementation
     • Straightforward expansion of the word-level FSG into a triphone HMM network
     • Viterbi beam search over this HMM network
     • No major optimizations attempted (so far)
        • No lextree implementation (What?)
        • Static allocation of all HMMs; not allocated "on demand" (Oh, no!)
        • FSG transitions represented by an NxN matrix (You can't be serious!!)
     • Speed/memory usage profile needs to be evaluated
     • Mostly new set of data structures, separate from existing ones
        • Should be easily ported to Sphinx3
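
A minimal C sketch of the pruning step at the heart of a Viterbi beam search: after each frame, only HMMs whose log score lies within a fixed beam of the best score stay active. It assumes log scores in a flat array and ignores how Sphinx2 actually stores its HMM network; `beam_prune` and the beam values are hypothetical.

```c
#include <assert.h>

/* score[i]: best log score of HMM i in the current frame.
 * beam: log-domain beam width (<= 0); HMMs below best + beam are pruned.
 * Sets active[i] to 1 (kept) or 0 (pruned) and returns the number kept. */
int beam_prune(const double score[], int active[], int n_hmm, double beam)
{
    assert(n_hmm > 0 && beam <= 0.0);
    double best = score[0];
    for (int i = 1; i < n_hmm; i++)
        if (score[i] > best)
            best = score[i];
    double threshold = best + beam;
    int n_active = 0;
    for (int i = 0; i < n_hmm; i++) {
        active[i] = score[i] >= threshold;
        n_active += active[i];
    }
    return n_active;
}
```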

  12. Implementation: FSG Expansion to HMMs
     [Figure: an FSG with transitions word1 and word2 leaving state 0 for states 1 and 2, and its expansion, in which word1 becomes a chain of phone HMMs p1 p2 p3 p4 and word2 a chain q1 q2 q3.]
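
A minimal C sketch of the expansion pictured here, using context-independent phones for brevity (triphone contexts are ignored): each word transition becomes a linear chain of phone HMM nodes, and the last node records the FSG state that the word leads to. `hmm_node_t`, `expand_word_arc` and the integer phone ids are hypothetical, not Sphinx2 data structures.

```c
#include <assert.h>

#define MAX_NODES 64

typedef struct {
    int phone;       /* phone (HMM) id */
    int next;        /* index of the next node in the chain, or -1 for the last phone */
    int exit_state;  /* FSG state reached after the last phone, else -1 */
} hmm_node_t;

/* Append a chain of n_phone nodes for one word transition ending in FSG
 * state to_state. Returns the index of the chain's first node (entered from
 * the transition's source state), or -1 if nodes[] is full. */
int expand_word_arc(hmm_node_t nodes[], int *n_node, int to_state,
                    const int phones[], int n_phone)
{
    assert(n_phone > 0);
    if (*n_node + n_phone > MAX_NODES)
        return -1;
    int first = *n_node;
    for (int k = 0; k < n_phone; k++) {
        hmm_node_t *nd = &nodes[(*n_node)++];
        int last = (k + 1 == n_phone);
        nd->phone = phones[k];
        nd->next = last ? -1 : first + k + 1;
        nd->exit_state = last ? to_state : -1;
    }
    return first;
}
```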

  13. Implementation: Triphone HMMs
     [Figure: a 4-phone word1 (p1 p2 p3 p4) expanded with extra root HMMs p1', p1'' and extra leaf HMMs p4', p4''; a 2-phone word (p1 p2) with variants p1', p1'' and p2', p2''.]
     • Multiple leaf HMMs for different right contexts
     • Multiple root HMMs for different left contexts
     • 1-phone words use SIL as right context
     • Special case for 2-phone words
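
A minimal C sketch of the context rules listed above, producing one `left-base+right` label per HMM: one root per left context, one leaf per right context, shared interior phones, and SIL as the right context of one-phone words. The label format, the made-up phone strings and `word_triphones` are hypothetical, and the slide's special case for 2-phone words is not reproduced because the slide does not describe it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRI 64
#define TRI_LEN 32

/* Write the triphone labels needed for one word into out[].
 * lctx: distinct last phones of words that can precede it (root contexts).
 * rctx: distinct first phones of words that can follow it (leaf contexts).
 * Returns the number of labels written, or -1 if out[] is too small. */
int word_triphones(const char *ph[], int n_ph,
                   const char *lctx[], int n_l, const char *rctx[], int n_r,
                   char out[][TRI_LEN], int max_out)
{
    assert(n_ph > 0 && n_l > 0 && (n_ph == 1 || n_r > 0));
    int need = (n_ph == 1) ? n_l : n_l + (n_ph - 2) + n_r;
    if (need > max_out)
        return -1;
    int k = 0;
    if (n_ph == 1) {                                   /* right context fixed to SIL */
        for (int i = 0; i < n_l; i++)
            snprintf(out[k++], TRI_LEN, "%s-%s+SIL", lctx[i], ph[0]);
        return k;
    }
    for (int i = 0; i < n_l; i++)                      /* root HMMs */
        snprintf(out[k++], TRI_LEN, "%s-%s+%s", lctx[i], ph[0], ph[1]);
    for (int j = 1; j < n_ph - 1; j++)                 /* interior HMMs */
        snprintf(out[k++], TRI_LEN, "%s-%s+%s", ph[j - 1], ph[j], ph[j + 1]);
    for (int i = 0; i < n_r; i++)                      /* leaf HMMs */
        snprintf(out[k++], TRI_LEN, "%s-%s+%s", ph[n_ph - 2], ph[n_ph - 1], rctx[i]);
    return k;
}
```

A 4-phone word with two possible left contexts and three possible right contexts needs 2 roots, 2 interior HMMs and 3 leaves, 7 HMMs in all.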

  14. Possible Optimization: Lextrees
     [Figure: the words word1 … wordN leaving one FSG state, with pronunciations such as p1 p2 p3 p4 and q1 q2 q3, merged into a single lextree associated with the source state.]
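
A minimal C sketch of the lextree idea: the pronunciations of all words leaving one FSG state are inserted into a prefix tree, so words that share leading phones also share HMM nodes. It ignores cross-word triphone contexts and homophones, and `lextree_t`, `lex_add_word` and the phone strings are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEX 128

/* One lextree node: a phone HMM shared by every word with this phone prefix. */
typedef struct {
    const char *phone;
    int parent;   /* -1 for a root-level node */
    int word;     /* id of the word ending here, or -1 */
} lex_node_t;

typedef struct { lex_node_t node[MAX_LEX]; int n; } lextree_t;

/* Find the child of `parent` labeled `phone`, creating it if missing. */
static int lex_child(lextree_t *t, int parent, const char *phone)
{
    for (int i = 0; i < t->n; i++)
        if (t->node[i].parent == parent && strcmp(t->node[i].phone, phone) == 0)
            return i;
    assert(t->n < MAX_LEX);
    t->node[t->n] = (lex_node_t){ phone, parent, -1 };
    return t->n++;
}

/* Insert one pronunciation; shared prefixes reuse existing nodes. */
void lex_add_word(lextree_t *t, int word_id, const char *ph[], int n_ph)
{
    assert(n_ph > 0);
    int cur = -1;
    for (int k = 0; k < n_ph; k++)
        cur = lex_child(t, cur, ph[k]);
    t->node[cur].word = word_id;
}
```

Inserting B AA S T AX N, B AH F AX L OW and B AA S yields 11 nodes instead of the 15 that separate linear chains would need.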

  15. Possible Optimization: Path Pruning
     [Figure: two transitions labeled w entering the same state.]
     • If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned
     • But reconciling this with lextrees is tricky, since labels are now blurred
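
A minimal C sketch of the path-pruning rule, assuming the competing transitions are entered in the same frame: from then on both score the remaining speech with identical word HMMs, so the one that starts out worse can never overtake the better one. `word_arc_t`, `prune_duplicate_arcs` and the log scores are hypothetical, and lextrees are ignored.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *word;  /* transition label */
    int dest;          /* destination FSG state */
    double score;      /* log score of the path entering this transition */
    int alive;         /* set to 0 when pruned */
} word_arc_t;

/* Among arcs with the same label and destination, keep only the best-scoring
 * one (ties keep the earlier arc). Returns the number of arcs kept. */
int prune_duplicate_arcs(word_arc_t arcs[], int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        arcs[i].alive = 1;
    for (int i = 0; i < n; i++) {
        if (!arcs[i].alive)
            continue;
        for (int j = i + 1; j < n; j++) {
            if (!arcs[j].alive || arcs[j].dest != arcs[i].dest ||
                strcmp(arcs[j].word, arcs[i].word) != 0)
                continue;
            if (arcs[j].score > arcs[i].score) {
                arcs[i].alive = 0;   /* j is better; j will prune the rest later */
                break;
            }
            arcs[j].alive = 0;
        }
    }
    int kept = 0;
    for (int i = 0; i < n; i++)
        kept += arcs[i].alive;
    return kept;
}
```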

  16. Other Issues Pending
     • Dynamic allocation and management of HMMs
     • Implementation of absolute pruning
     • Lattice generation
     • N-best list generation
     • …
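
The slide does not define absolute pruning; a common reading, assumed here, is capping the number of HMMs active in a frame regardless of the beam. This minimal C sketch keeps the `max_active` best-scoring HMMs by sorting a copy of the scores; decoders often use a score histogram instead of a full sort to find the cutoff. `absolute_prune` and `MAX_HMM` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_HMM 1024

/* qsort comparator for doubles in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Keep at most max_active HMMs (ties at the cutoff score may keep a few more).
 * score[] is left unchanged. Returns the number of HMMs left active. */
int absolute_prune(const double score[], int active[], int n_hmm, int max_active)
{
    assert(n_hmm >= 0 && n_hmm <= MAX_HMM && max_active > 0);
    if (n_hmm <= max_active) {
        for (int i = 0; i < n_hmm; i++)
            active[i] = 1;
        return n_hmm;
    }
    double sorted[MAX_HMM];
    for (int i = 0; i < n_hmm; i++)
        sorted[i] = score[i];
    qsort(sorted, (size_t)n_hmm, sizeof(double), cmp_desc);
    double cutoff = sorted[max_active - 1];   /* score of the max_active-th best */
    int n_active = 0;
    for (int i = 0; i < n_hmm; i++) {
        active[i] = score[i] >= cutoff;
        n_active += active[i];
    }
    return n_active;
}
```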

  17. Where Is It?
     • My copy of the open source version of Sphinx2
     • Someone needs to update the SourceForge copy
     • HTML documentation has been updated