
Sausage



Presentation Transcript


  1. Sausage Lidia Mangu, Eric Brill, Andreas Stolcke Presenter: Jen-Wei Kuo, 2004/9/24

  2. References • CSL '00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks • Eurospeech '99: Finding Consensus among Words: Lattice-Based Word Error Minimization • Eurospeech '97: Explicit Word Error Minimization in N-Best List Rescoring

  3. Motivation • There is a mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER): maximizing the sentence posterior probability minimizes sentence-level error, not word-level error.
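For contrast, the standard decision rule can be written out (A denotes the acoustic observations; notation follows the CSL '00 paper):

```latex
\hat{W}_{\mathrm{MAP}} \;=\; \arg\max_{W} P(W \mid A)
```

Picking the single most probable word sequence minimizes the expected sentence error rate, yet systems are scored on word-level substitutions, insertions, and deletions.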

  4. An Example • Correct answer: I'M DOING FINE

  5. Word Error Minimization • Minimize the expected word error, under the posterior distribution, over all potential hypotheses.
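In symbols, with WE(W, W') the word (Levenshtein) error between two word strings, the decision rule from the CSL '00 paper is:

```latex
\hat{W} \;=\; \arg\min_{W} \sum_{W'} P(W' \mid A)\, \mathrm{WE}(W, W')
```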

  6. N-best Approximation
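The body of this slide is not preserved; a plausible formalization restricts both the candidate set and the expectation to the N-best list \(\mathcal{N}\), with posteriors renormalized over it:

```latex
\hat{W} \;=\; \arg\min_{W \in \mathcal{N}} \;\sum_{W' \in \mathcal{N}} P(W' \mid A)\, \mathrm{WE}(W, W')
```

With N-best lists the double sum is tractable, but the hypothesis space it covers is tiny compared to a lattice, which motivates the lattice-based method on the next slide.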

  7. Lattice-Based Word Error Minimization • Computational problem • The hypothesis set of a lattice is several orders of magnitude larger than N-best lists of practical size. • No efficient algorithm is known for exact minimization over it. • Fundamental difficulty • The objective function is based on pairwise string distance, a nonlocal measure. • Solution • Replace pairwise string alignment with a modified multiple string alignment: WE (word error) → MWE (modified word error).

  8. Multiple Alignment: Lattice to Confusion Network
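The slide's figure is not preserved. As a stand-in: a confusion network ("sausage") is a linear sequence of word equivalence classes whose posteriors each sum to one. Reusing the slide-4 example with hypothetical competitors and posteriors:

```
[ I'M 0.7 | I 0.2 | - 0.1 ] -> [ DOING 0.5 | DUE 0.3 | - 0.2 ] -> [ FINE 0.9 | - 0.1 ]
```

Reading off the highest-posterior word in each class yields the consensus hypothesis I'M DOING FINE; "-" marks the null (deletion) hypothesis.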

  9. Multiple Alignment • Finding the optimal multiple alignment is a problem for which no efficient solution is known (Gusfield, 1992). • We resort to a heuristic approach based on lattice topology.

  10. Algorithms (a simplified sketch follows this list) • Step 1. Arc Pruning • Step 2. Same-Arc Clustering • Step 3. Intra-Word Clustering • Step 4*. Same-Phones Clustering • Step 5. Inter-Word Clustering • Step 6. Adding the null hypothesis • Step 7. Consensus-based Lattice Pruning
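A minimal, runnable sketch of the pipeline (hypothetical data types and thresholds; the real algorithm works on HTK lattices, respects the lattice partial order, and ranks candidate merges by time overlap and phonetic similarity rather than this single greedy pass):

```python
from dataclasses import dataclass

@dataclass
class Arc:
    word: str
    start: int        # start frame
    end: int          # end frame
    posterior: float

def build_confusion_network(arcs, prune_thresh=1e-4, null_thresh=0.6):
    # Step 1: arc pruning -- drop arcs with negligible posterior.
    arcs = [a for a in arcs if a.posterior >= prune_thresh]
    # Step 2: same-arc clustering -- merge arcs that share
    # (word, start, end); their posteriors add up.
    by_key = {}
    for a in arcs:
        k = (a.word, a.start, a.end)
        if k in by_key:
            by_key[k].posterior += a.posterior
        else:
            by_key[k] = Arc(a.word, a.start, a.end, a.posterior)
    arcs = sorted(by_key.values(), key=lambda a: a.start)
    # Steps 3-5 (intra-word, same-phone, inter-word clustering) are
    # collapsed here into one greedy pass that buckets time-overlapping
    # arcs into equivalence classes.
    classes = []
    for a in arcs:
        if classes and a.start < max(b.end for b in classes[-1]):
            classes[-1].append(a)
        else:
            classes.append([a])
    # Step 6: add a null hypothesis where a class's total posterior
    # falls below the threshold (0.6 on the slide).
    for cls in classes:
        mass = sum(b.posterior for b in cls)
        if mass < null_thresh:
            cls.append(Arc("-", cls[0].start, cls[0].end, 1.0 - mass))
    return classes

# Consensus hypothesis: pick the top word of each class.
arcs = [Arc("I'M", 0, 10, 0.7), Arc("I", 0, 8, 0.2),
        Arc("DOING", 10, 30, 0.5), Arc("DUE", 10, 22, 0.3),
        Arc("FINE", 30, 50, 0.9)]
cn = build_confusion_network(arcs)
print([max(c, key=lambda a: a.posterior).word for c in cn])
# -> ["I'M", 'DOING', 'FINE']
```

Step 7 (consensus-based lattice pruning) is covered separately on slide 16.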

  11. Arc Pruning

  12. Intra-Word Clustering • Same-Arc Clustering • Arcs with the same word_id, start frame, and end frame are merged first. • Intra-Word Clustering • Arcs with the same word_id are then merged (see the similarity sketch below).
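The transcript does not preserve the merge criterion; in the CSL '00 paper, intra-word merges are ranked by a posterior-weighted time overlap, roughly:

```latex
\mathrm{SIM}(E_1, E_2) \;=\; \max_{e_1 \in E_1,\; e_2 \in E_2}
  \mathrm{overlap}(e_1, e_2)\; p(e_1)\; p(e_2)
```

where overlap(e1, e2) is the normalized overlap of the two arcs' time spans.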

  13. Same-Phones Clustering • Arcs with the same phone sequence are clustered at this stage.

  14. Inter-Word Clustering • The remaining arcs are clustered at this final stage.

  15. Adding the null hypothesis • For each equivalence class, if the sum of the posterior probabilities is less than a threshold (0.6), then the null hypothesis is added to the class.
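Since every path through the lattice crosses each equivalence class exactly once, the null arc carries the class's residual probability mass; a hedged restatement of the slide's rule:

```latex
\text{if } \textstyle\sum_{e \in E} p(e) < 0.6 \text{, add a null arc with }
p(\text{-} \mid E) \;=\; 1 - \textstyle\sum_{e \in E} p(e)
```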

  16. Consensus-based Lattice Pruning • Standard method → likelihood-based: paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph. • Proposed method → consensus-based: first construct a pruned confusion network, then intersect the original lattice with the pruned confusion network.
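A hedged sketch of the intersection step, under the simplifying assumption that each lattice path aligns exactly one word to each confusion-network class (in the real system this alignment is a by-product of network construction):

```python
# Keep a lattice path only if every one of its words survives in the
# corresponding class of the pruned confusion network.
def survives(path_words, pruned_classes):
    return len(path_words) == len(pruned_classes) and all(
        w in cls for w, cls in zip(path_words, pruned_classes))

pruned = [{"I'M", "I"}, {"DOING"}, {"FINE"}]
print(survives(["I'M", "DOING", "FINE"], pruned))  # True
print(survives(["I", "DUE", "FINE"], pruned))      # False: DUE was pruned away
```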

  17. Algorithm

  18. An Example • How to merge? [Lattice figure not preserved: the word sequences 我 是 我 是 我 ("I am I am I") and 是 我 是 我 是 誰 ("am I am I am who") show two competing paths of different lengths that must be aligned into one sequence of equivalence classes.]

  19. Computational Issues • Maintaining the partial order naively is too expensive. • History-based look-ahead: apply a first-pass search to find the history arcs of each arc → this generates the initial partial ordering. • As clusters are merged, a lot of computation is needed for the (recursive) updates. • With thousands of arcs, this requires a lot of memory (see the sketch below).
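A sketch of the history-based look-ahead on a toy arc graph (the dictionary below is hypothetical; the real relation is derived from the lattice topology):

```python
# For each arc, precompute the set of arcs reachable from it; arc X
# must precede arc Y in the partial order iff Y is reachable from X.
from functools import lru_cache

follows = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

@lru_cache(maxsize=None)
def descendants(arc):
    out = set()
    for nxt in follows[arc]:
        out.add(nxt)
        out |= descendants(nxt)
    return frozenset(out)

print(sorted(descendants("A")))  # ['B', 'C', 'D']
```

The slide's point: whenever two clusters are merged, these reachability sets must be (recursively) recomputed for many arcs, which is slow and, with thousands of arcs, memory-hungry.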

  20. Computational Issues – An Example • If we merge B and C, what happens? [Lattice figure with arcs A through N not preserved: the merge forces the precedence sets of the surrounding arcs to be updated.]

  21. Experimental Set-up • Lattices were built using HTK. • Training corpus • Trained on about 60 hours of Switchboard speech. • The LM is a backoff trigram model trained on 2.2 million words of Switchboard transcripts. • Testing corpus • The test set of the 1997 JHU workshop.

  22. Experimental Results

  23. Experimental Results

  24. Confusion Network Analyses

  25. Other Approaches • ROVER (Recognizer Output Voting Error Reduction): aligns the 1-best outputs of multiple recognizers into a single word transition network and selects each word by voting (Fiscus, 1997).
