
Sausage



Presentation Transcript


  1. Sausage Lidia Mangu, Eric Brill, Andreas Stolcke Presenter: Jen-Wei Kuo, 2004/9/24

  2. References • CSL '00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks • Eurospeech '99: Finding Consensus among Words: Lattice-Based Word Error Minimization • Eurospeech '97: Explicit Word Error Minimization in N-Best List Rescoring

  3. Motivation • There is a mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER): maximizing the sentence posterior probability minimizes sentence-level error, not word-level error.
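For contrast, the standard decision rule can be written out (A denotes the acoustic observations; notation follows the CSL '00 paper):

```latex
\hat{W}_{\mathrm{MAP}} \;=\; \arg\max_{W} P(W \mid A)
```

Picking the single most probable word sequence minimizes the expected sentence error rate, yet systems are scored on word-level substitutions, insertions, and deletions.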

  4. An Example • Correct answer: I'M DOING FINE

  5. Word Error Minimization • Minimize the expected word error, under the posterior distribution, over all potential hypotheses.
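In symbols, with WE(W, W') the word (Levenshtein) error between two word strings, the decision rule from the CSL '00 paper is:

```latex
\hat{W} \;=\; \arg\min_{W} \sum_{W'} P(W' \mid A)\, \mathrm{WE}(W, W')
```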

  6. N-best Approximation
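The body of this slide is not preserved; a plausible formalization restricts both the candidate set and the expectation to the N-best list \(\mathcal{N}\), with posteriors renormalized over it:

```latex
\hat{W} \;=\; \arg\min_{W \in \mathcal{N}} \;\sum_{W' \in \mathcal{N}} P(W' \mid A)\, \mathrm{WE}(W, W')
```

With N-best lists the double sum is tractable, but the hypothesis space it covers is tiny compared to a lattice, which motivates the lattice-based method on the next slide.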

  7. Lattice-Based Word Error Minimization • Computational problem • The hypothesis set of a lattice is several orders of magnitude larger than N-best lists of practical size. • No efficient algorithm is known for exact minimization over it. • Fundamental difficulty • The objective function is based on pairwise string distance, a nonlocal measure. • Solution • Replace pairwise string alignment with a modified multiple string alignment: WE (word error) → MWE (modified word error).

  8. Multiple Alignment: Lattice to Confusion Network
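The slide's figure is not preserved. As a stand-in: a confusion network ("sausage") is a linear sequence of word equivalence classes whose posteriors each sum to one. Reusing the slide-4 example with hypothetical competitors and posteriors:

```
[ I'M 0.7 | I 0.2 | - 0.1 ] -> [ DOING 0.5 | DUE 0.3 | - 0.2 ] -> [ FINE 0.9 | - 0.1 ]
```

Reading off the highest-posterior word in each class yields the consensus hypothesis I'M DOING FINE; "-" marks the null (deletion) hypothesis.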

  9. Multiple Alignment • Finding the optimal multiple alignment is a problem for which no efficient solution is known (Gusfield, 1992). • We resort to a heuristic approach based on lattice topology.

  10. Algorithms (a simplified sketch follows this list) • Step 1. Arc Pruning • Step 2. Same-Arc Clustering • Step 3. Intra-Word Clustering • Step 4*. Same-Phones Clustering • Step 5. Inter-Word Clustering • Step 6. Adding the null hypothesis • Step 7. Consensus-based Lattice Pruning
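A minimal, runnable sketch of the pipeline (hypothetical data types and thresholds; the real algorithm works on HTK lattices, respects the lattice partial order, and ranks candidate merges by time overlap and phonetic similarity rather than this single greedy pass):

```python
from dataclasses import dataclass

@dataclass
class Arc:
    word: str
    start: int        # start frame
    end: int          # end frame
    posterior: float

def build_confusion_network(arcs, prune_thresh=1e-4, null_thresh=0.6):
    # Step 1: arc pruning -- drop arcs with negligible posterior.
    arcs = [a for a in arcs if a.posterior >= prune_thresh]
    # Step 2: same-arc clustering -- merge arcs that share
    # (word, start, end); their posteriors add up.
    by_key = {}
    for a in arcs:
        k = (a.word, a.start, a.end)
        if k in by_key:
            by_key[k].posterior += a.posterior
        else:
            by_key[k] = Arc(a.word, a.start, a.end, a.posterior)
    arcs = sorted(by_key.values(), key=lambda a: a.start)
    # Steps 3-5 (intra-word, same-phone, inter-word clustering) are
    # collapsed here into one greedy pass that buckets time-overlapping
    # arcs into equivalence classes.
    classes = []
    for a in arcs:
        if classes and a.start < max(b.end for b in classes[-1]):
            classes[-1].append(a)
        else:
            classes.append([a])
    # Step 6: add a null hypothesis where a class's total posterior
    # falls below the threshold (0.6 on the slide).
    for cls in classes:
        mass = sum(b.posterior for b in cls)
        if mass < null_thresh:
            cls.append(Arc("-", cls[0].start, cls[0].end, 1.0 - mass))
    return classes

# Consensus hypothesis: pick the top word of each class.
arcs = [Arc("I'M", 0, 10, 0.7), Arc("I", 0, 8, 0.2),
        Arc("DOING", 10, 30, 0.5), Arc("DUE", 10, 22, 0.3),
        Arc("FINE", 30, 50, 0.9)]
cn = build_confusion_network(arcs)
print([max(c, key=lambda a: a.posterior).word for c in cn])
# -> ["I'M", 'DOING', 'FINE']
```

Step 7 (consensus-based lattice pruning) is covered separately on slide 16.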

  11. Arc Pruning

  12. Intra-Word Clustering • Same-Arc Clustering • Arcs with the same word_id, start frame, and end frame are merged first. • Intra-Word Clustering • Arcs with the same word_id are then merged (see the similarity sketch below).
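The transcript does not preserve the merge criterion; in the CSL '00 paper, intra-word merges are ranked by a posterior-weighted time overlap, roughly:

```latex
\mathrm{SIM}(E_1, E_2) \;=\; \max_{e_1 \in E_1,\; e_2 \in E_2}
  \mathrm{overlap}(e_1, e_2)\; p(e_1)\; p(e_2)
```

where overlap(e1, e2) is the normalized overlap of the two arcs' time spans.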

  13. Same-Phones Clustering • Arcs with the same phone sequence are clustered at this stage.

  14. Inter-Word Clustering • The remaining arcs are clustered at this final stage.

  15. Adding the null hypothesis • For each equivalence class, if the sum of the posterior probabilities is less than a threshold (0.6), then the null hypothesis is added to the class.
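Since every path through the lattice crosses each equivalence class exactly once, the null arc carries the class's residual probability mass; a hedged restatement of the slide's rule:

```latex
\text{if } \textstyle\sum_{e \in E} p(e) < 0.6 \text{, add a null arc with }
p(\text{-} \mid E) \;=\; 1 - \textstyle\sum_{e \in E} p(e)
```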

  16. Consensus-based Lattice Pruning • Standard method → likelihood-based: paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph. • Proposed method → consensus-based: first construct a pruned confusion network, then intersect the original lattice with the pruned confusion network.
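A hedged sketch of the intersection step, under the simplifying assumption that each lattice path aligns exactly one word to each confusion-network class (in the real system this alignment is a by-product of network construction):

```python
# Keep a lattice path only if every one of its words survives in the
# corresponding class of the pruned confusion network.
def survives(path_words, pruned_classes):
    return len(path_words) == len(pruned_classes) and all(
        w in cls for w, cls in zip(path_words, pruned_classes))

pruned = [{"I'M", "I"}, {"DOING"}, {"FINE"}]
print(survives(["I'M", "DOING", "FINE"], pruned))  # True
print(survives(["I", "DUE", "FINE"], pruned))      # False: DUE was pruned away
```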

  17. Algorithm

  18. An Example • How to merge? [Lattice figure not preserved: the word sequences 我 是 我 是 我 ("I am I am I") and 是 我 是 我 是 誰 ("am I am I am who") show two competing paths of different lengths that must be aligned into one sequence of equivalence classes.]

  19. Computational Issues • Maintaining the partial order naively is too expensive. • History-based look-ahead: apply a first-pass search to find the history arcs of each arc → this generates the initial partial ordering. • As clusters are merged, a lot of computation is needed for the (recursive) updates. • With thousands of arcs, this requires a lot of memory (see the sketch below).
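A sketch of the history-based look-ahead on a toy arc graph (the dictionary below is hypothetical; the real relation is derived from the lattice topology):

```python
# For each arc, precompute the set of arcs reachable from it; arc X
# must precede arc Y in the partial order iff Y is reachable from X.
from functools import lru_cache

follows = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

@lru_cache(maxsize=None)
def descendants(arc):
    out = set()
    for nxt in follows[arc]:
        out.add(nxt)
        out |= descendants(nxt)
    return frozenset(out)

print(sorted(descendants("A")))  # ['B', 'C', 'D']
```

The slide's point: whenever two clusters are merged, these reachability sets must be (recursively) recomputed for many arcs, which is slow and, with thousands of arcs, memory-hungry.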

  20. Computational Issues – An Example • If we merge B and C, what happens? [Lattice figure with arcs A through N not preserved: the merge forces the precedence sets of the surrounding arcs to be updated.]

  21. Experimental Set-up • Lattices were built using HTK. • Training corpus • Trained on about 60 hours of Switchboard speech. • The LM is a backoff trigram model trained on 2.2 million words of Switchboard transcripts. • Testing corpus • The test set of the 1997 JHU workshop.

  22. Experimental Results

  23. Experimental Results

  24. Confusion Network Analyses

  25. Other Approaches • ROVER (Recognizer Output Voting Error Reduction): aligns the 1-best outputs of multiple recognizers into a single word transition network and selects each word by voting (Fiscus, 1997).
