

  1. An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training Mr. Yik-Cheung Tam Dr. Brian Mak

  2. Outline • Motivation • Overview of MCE training • Problem using N-best hypotheses • Alternative: 1-nearest hypothesis • What? • Why? • How? • Evaluation • Conclusion

  3. MCE Overview • The MCE loss function l(d(X)) and its distance (misclassification) measure d(X). • G(X), the score of the competing hypotheses, may be computed using the N-best hypotheses. • l(.) is a 0-1 soft error-counting function (Sigmoid). • Gradient descent is applied to obtain a better estimate.
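The equations on this slide were images and did not survive the transcript. As a sketch, the standard MCE formulation (Juang & Katagiri) that the deck appears to follow, with γ the Sigmoid slope and θ the offset referred to on slide 13, is:

\[
d(X) \;=\; -\,g(X; S_0, \Lambda) \;+\; G(X), \qquad
G(X) \;=\; \frac{1}{\eta}\log\!\Bigg[\frac{1}{N}\sum_{j=1}^{N} e^{\eta\, g(X; S_j, \Lambda)}\Bigg],
\]

where g(X; S_j, Λ) is the log-likelihood of hypothesis S_j, S_0 is the correct transcription, and η weights the N-best competitors. The soft error count and the parameter update are

\[
\ell\big(d(X)\big) \;=\; \frac{1}{1 + e^{-\gamma\, d(X) + \theta}}, \qquad
\Lambda \;\leftarrow\; \Lambda \;-\; \epsilon\,\nabla_{\Lambda}\,\ell\big(d(X)\big).
\]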

  4. Problem Using N-best Hypotheses • When d(X) gets large enough, it falls out of the steep trainable region of the Sigmoid.
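The slide's point follows directly from the Sigmoid's derivative: since

\[
\frac{\partial \ell}{\partial d} \;=\; \gamma\,\ell(d)\,\big(1-\ell(d)\big) \;\longrightarrow\; 0
\quad\text{as } |d(X)| \to \infty,
\]

an utterance whose distance measure is large contributes almost no gradient, and is effectively lost to training.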

  5. What is the 1-nearest Hypothesis? • The competing hypothesis whose score is closest to that of the correct transcription, so that d(1-nearest) <= d(1-best). • The idea can be generalized to N-nearest hypotheses.
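A minimal sketch of the two competitor choices, assuming "nearest" means closest in score to the correct transcription; the helper below is illustrative, not the authors' code:

    def pick_competitors(correct_score, competitor_scores):
        """Choose the 1-best and 1-nearest competitors from an N-best list.

        correct_score:     log-score g(X; S_0) of the correct transcription.
        competitor_scores: log-scores g(X; S_j) of the incorrect hypotheses.
        """
        one_best = max(competitor_scores)   # highest-scoring competitor
        one_nearest = min(competitor_scores,
                          key=lambda s: abs(s - correct_score))
        return one_best, one_nearest

    # d(X) = -g(X; S_0) + g(X; S_j), so d(1-nearest) <= d(1-best) always holds:
    g0 = -120.0
    best, nearest = pick_competitors(g0, [-118.0, -119.5, -150.0])
    assert (nearest - g0) <= (best - g0)    # 0.5 <= 2.0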

  6. Using the 1-nearest Hypothesis • Keeps the training data inside the steep trainable region of the Sigmoid.

  7. How to Find the 1-nearest Hypothesis? • Method 1 (exact approach): stack-based N-best decoder. • Drawback: N may be very large => memory problem; the size of N must be limited. • Method 2 (approximate approach): modify the Viterbi algorithm with a special pruning scheme (sketched after the next slide).

  8. Approximated 1-nearest Hypothesis • Notation: • V(t+1, j): accumulated score at time t+1 and state j • a_ij: transition probability from state i to state j • b_j(o_{t+1}): observation probability at time t+1 and state j • V_c(t+1): accumulated score of the Viterbi path of the correct string at time t+1 • Beam(t+1): beam width applied at time t+1
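A minimal sketch of the modified Viterbi pass in this notation. This is one reading of the pruning scheme, assuming the beam is applied symmetrically around the correct-string score V_c(t+1); it is illustrative, not the authors' implementation:

    import numpy as np

    def nearest_viterbi(log_a, log_b, v_c, beam):
        """Viterbi search pruned around the correct-string path score.

        log_a: (S, S) log transition probabilities log a_ij
        log_b: (T, S) log observation probabilities log b_j(o_t)
        v_c:   (T,)   V_c(t), Viterbi score of the correct string at time t
        beam:  (T,)   Beam(t), beam width applied at time t
        """
        T, S = log_b.shape
        V = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        V[0] = log_b[0]                       # assume uniform initial state probs
        for t in range(T - 1):
            for j in range(S):
                # V(t+1, j) = max_i [ V(t, i) + log a_ij ] + log b_j(o_{t+1})
                scores = V[t] + log_a[:, j]
                back[t + 1, j] = int(np.argmax(scores))
                V[t + 1, j] = scores[back[t + 1, j]] + log_b[t + 1, j]
            # Pruning: keep only states whose score stays within Beam(t+1) of
            # the correct string's score V_c(t+1); surviving paths stay "near"
            # the correct path, and the best one at the final frame
            # approximates the 1-nearest hypothesis.
            outside = np.abs(V[t + 1] - v_c[t + 1]) > beam[t + 1]
            V[t + 1, outside] = -np.inf
        return V, back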

  9. Approximated 1-nearest Hypothesis (cont.) • There exists some "nearest" path in the pruned search space (the shaded area of the slide's figure).

  10. System Evaluation

  11. Corpus: Aurora • Noisy connected digits derived from TIDIGITS. • Multi-condition training (train on noisy conditions): {subway, babble, car, exhibition} × {clean, 20, 15, 10, 5 dB SNR} (5 noise levels); 8,440 training utterances. • Testing (matched noisy conditions): same as above plus 0 and −5 dB SNR (7 noise levels); 28,028 testing utterances.

  12. System Configuration • Standard 39-dimensional MFCCs (cepstra + Δ + ΔΔ). • 11 whole-word digit HMMs (0-9, "oh"): 16 states, 3 Gaussians per state. • 3-state silence HMM, 6 Gaussians per state. • 1-state short-pause HMM tied to the 2nd state of the silence model. • Baum-Welch training to obtain the initial HMMs. • Corrective MCE training on the HMM parameters.
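A quick arithmetic check of the model size these choices imply (assuming the tied short-pause state adds no new Gaussians):

    digit_gaussians = 11 * 16 * 3    # 11 whole-word HMMs x 16 states x 3 Gaussians = 528
    silence_gaussians = 3 * 6        # 3 states x 6 Gaussians = 18
    total = digit_gaussians + silence_gaussians
    print(total)                     # 546 Gaussians in all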

  13. System Configuration (cont.) • Compare 3 kinds of competing hypotheses: 1-best hypothesis; exact 1-nearest hypothesis; approx. 1-nearest hypothesis. • Sigmoid parameters: various γ (controls the slope of the Sigmoid); offset θ = 0.

  14. Experiment I: Effect of the Sigmoid Slope • Learning rate = 0.05, with different γ: 0.1 (best test performance), 0.5 (steeper), 0.02 and 0.004 (flatter). • WER: baseline 12.71%; 1-best 11.01%; approx. 1-nearest 10.71%; exact 1-nearest 10.45%.

  15. Effective Amount of Training Data • A soft error < 0.95 is defined to be "effective". • The 1-nearest approach has more effective training data when the Sigmoid slope is relatively steep: exact 1-nearest 67%; approx. 1-nearest 51%; 1-best 40%.
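How the "effective" percentages could be computed from per-utterance soft errors; a hypothetical sketch (effective_fraction is not from the paper):

    def effective_fraction(soft_errors, threshold=0.95):
        """Fraction of utterances whose soft error l(d(X)) is below the
        threshold, i.e. still inside the Sigmoid's trainable region."""
        return sum(1 for e in soft_errors if e < threshold) / len(soft_errors)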

  16. Experiment II: Compensation With More Training Iterations • With 100% effective training data, apply more training iterations: γ = 0.004, learning rate = 0.05. • Result: slow improvement compared to the best case (exact 1-nearest with γ = 0.1).

  17. Experiment II: Compensation Using a Larger Learning Rate • Use a larger learning rate (0.05 → 1.25). • Fix γ = 0.004 (100% effective training data). • Result: the 1-nearest approach is better than the 1-best approach after compensation.

  18. Using a Larger Learning Rate (cont.) • Training performance: MCE loss versus the number of training iterations, plotted for the 1-best, approx. 1-nearest, and exact 1-nearest methods.

  19. Using a Larger Learning Rate (cont.) • Test performance: WER versus the number of training iterations: 1-best 11.55%; approx. 1-nearest 10.70%; exact 1-nearest 10.79%.

  20. Conclusion • The 1-best and 1-nearest methods were compared in MCE training: the effect of the Sigmoid slope, and compensation when using a flat Sigmoid. • The 1-nearest method is better than the 1-best approach. • More trainable data are available in the 1-nearest approach. • The approximate and exact 1-nearest methods yield comparable performance.

  21. Questions and Answers
