
Presentation Transcript


  1. Phonotactic using SVM for LRE2009 Tomáš Mikolov, 2009

  2. Task - numbers • TRAIN set: 9 810 utterances • DEV set (30s condition): 13 331 utterances • 23 classes (languages)

  3. Features • Feature vectors from the HU phoneme recognizer • 35 000 trigrams • 80 000 four-grams • The large number of features results in huge data files: 80 000 * 13 331 ~= 1 billion numbers (a few GB on disk)
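As an illustration of this feature extraction step, the sketch below (not the original code) builds normalized phoneme n-gram count vectors from 1-best phone strings produced by a phoneme recognizer; the function and variable names are hypothetical.

```python
# Illustrative sketch: building normalized phoneme n-gram count vectors
# from 1-best phone strings of a phoneme recognizer.
from collections import Counter

def ngram_counts(phones, n):
    """Count all n-grams of phone symbols in one utterance."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def build_vector(phones, vocab, n):
    """Map an utterance to a fixed-length, count-normalized feature vector.

    `vocab` maps an n-gram tuple to a feature index; it would be built from the
    training data (e.g. ~35 000 trigrams or ~80 000 four-grams).
    """
    counts = ngram_counts(phones, n)
    total = sum(counts.values()) or 1
    vec = [0.0] * len(vocab)
    for gram, c in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = c / total   # relative frequency of the n-gram
    return vec

# Toy example: one phone string and a tiny trigram vocabulary
phones = ["a", "b", "a", "b", "c"]
vocab = {("a", "b", "a"): 0, ("b", "a", "b"): 1, ("a", "b", "c"): 2}
print(build_vector(phones, vocab, n=3))   # each trigram occurs once -> 1/3 each
```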

  4. SVMTorch vs SVMLib • Both work almost the same • In my work, SVMTorch was used • When using all features, the training & testing phases are terribly slow (about half a day on 10 machines)
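The original system used SVMTorch; the sketch below uses scikit-learn's LinearSVC on random toy data purely as a stand-in, to show the shape of training and scoring 23 one-vs-rest language classifiers.

```python
# Stand-in sketch (LinearSVC instead of SVMTorch, toy data): training and
# scoring 23 one-vs-rest language classifiers on fixed-length feature vectors.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.random((230, 500))        # e.g. PCA-reduced feature vectors
y_train = np.arange(230) % 23           # 23 language labels

clf = LinearSVC(C=1.0)                  # one-vs-rest is the default multiclass scheme
clf.fit(X_train, y_train)

X_dev = rng.random((50, 500))
scores = clf.decision_function(X_dev)   # per-language scores, shape (50, 23)
print(scores.shape)
```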

  5. PCA for dimensionality reduction • Idea: many features tend to co-occur • Experiments: reducing the features to ~500 dimensions works almost as well as using the original features • The speed-up of the training phase depends on the original number of features: reduction from 80 000 to 500 may give a speed-up of ~1000x or more
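A minimal sketch of the reduction step, assuming a truncated SVD (a PCA-style projection that works directly on sparse count data); the toy sizes below are smaller than the real 80 000 -> 500 case for the sake of runtime.

```python
# Sketch of PCA-style dimensionality reduction of sparse n-gram count vectors.
# Toy sizes; the real case reduced ~80 000 four-gram features to ~500 dimensions.
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

X = sparse_random(1000, 10_000, density=0.001, random_state=0, format="csr")

svd = TruncatedSVD(n_components=100, random_state=0)
X_reduced = svd.fit_transform(X)        # dense (1000, 100) matrix
print(X.shape, "->", X_reduced.shape)
```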

  6. PCA for dimensionality reduction

  7. Tuning the C parameter of the SVM • When tuning C for each language, Cavg goes from 2.62 to 2.44 (with four-gram features) • Olda: with trigrams, from 2.47 to 2.33
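One possible way to do the per-language tuning, sketched with one-vs-rest binary SVMs and a small grid of C values; plain dev-set accuracy is used as the selection criterion here, whereas the numbers above are Cavg, so the grid, names and criterion are assumptions for illustration only.

```python
# Sketch: pick a separate C per language by training one-vs-rest binary SVMs
# and keeping the C that does best on a held-out dev split (toy data, toy grid).
import numpy as np
from sklearn.svm import LinearSVC

def tune_c_per_language(X_train, y_train, X_dev, y_dev, languages,
                        c_grid=(0.01, 0.1, 1.0, 10.0)):
    best_c = {}
    for lang in languages:
        t_train = (y_train == lang).astype(int)   # one-vs-rest targets
        t_dev = (y_dev == lang).astype(int)
        accs = []
        for c in c_grid:
            clf = LinearSVC(C=c).fit(X_train, t_train)
            accs.append((clf.score(X_dev, t_dev), c))
        best_c[lang] = max(accs)[1]               # C with the best dev accuracy
    return best_c

rng = np.random.default_rng(0)
X_tr, y_tr = rng.random((230, 50)), np.arange(230) % 23
X_dv, y_dv = rng.random((115, 50)), np.arange(115) % 23
print(tune_c_per_language(X_tr, y_tr, X_dv, y_dv, languages=range(23)))
```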

  8. Improvements in accuracy • Linear interpolation of trigram + four-gram scores: 2.33 + 2.44 => 2.22 • Using multiple recognizers: • HU four-gram (80 000 => 500): 2.44 • RU trigram (110 000 => 500): 1.96 • HU+RU features (1000): 1.64
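The two fusion steps above can be sketched as follows; the interpolation weight and array names are assumptions for the example, not values from the original system.

```python
# Sketch of the two fusion strategies: score-level interpolation and
# feature-level concatenation of two recognizers (toy data).
import numpy as np

# (a) Linear interpolation of per-language scores from two systems,
#     e.g. the trigram and four-gram SVMs.
def interpolate(scores_a, scores_b, w=0.5):
    return w * scores_a + (1.0 - w) * scores_b

# (b) Feature-level fusion: concatenate the PCA-reduced HU and RU vectors
#     (500 + 500 = 1000 dimensions) and train a single SVM on the result.
def concat_features(hu_feats, ru_feats):
    return np.hstack([hu_feats, ru_feats])

rng = np.random.default_rng(0)
tri, four = rng.random((10, 23)), rng.random((10, 23))
hu, ru = rng.random((10, 500)), rng.random((10, 500))
print(interpolate(tri, four).shape)       # (10, 23) fused scores
print(concat_features(hu, ru).shape)      # (10, 1000) fused features
```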

  9. Current work • Using features from the EN recognizer • Scoring the eval set

  10. Additional results • Using 5 SVMs: RU3, RU4, HU3, HU4, EN3 for the 30s condition: • DEV set Cavg (grand average): 1.06 (~0.65 for the primary system) • EVAL set Cavg: 2.42 (~2.29 for the primary system) • For the 10s and 3s conditions, results are not as good, probably because no unigram/bigram features are used (not tested yet)
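For reference, the sketch below shows one common closed-set formulation of the Cavg metric (average detection cost with C_miss = C_fa = 1 and P_target = 0.5); the numbers above come from the official scoring setup, and this is only meant to show what the metric measures.

```python
# Sketch of a Cavg-style average detection cost, assuming the standard
# closed-set formulation with C_miss = C_fa = 1 and P_target = 0.5.
import numpy as np

def cavg(p_miss, p_fa, p_target=0.5):
    """p_miss[l]: miss rate when language l is the target.
    p_fa[l, m]: false-alarm rate when l is the target and m != l is tested."""
    n = len(p_miss)
    costs = []
    for l in range(n):
        fa = sum(p_fa[l, m] for m in range(n) if m != l) / (n - 1)
        costs.append(p_target * p_miss[l] + (1 - p_target) * fa)
    return float(np.mean(costs))

# Toy example with 3 languages
p_miss = np.array([0.02, 0.03, 0.01])
p_fa = np.array([[0.00, 0.02, 0.01],
                 [0.03, 0.00, 0.02],
                 [0.01, 0.01, 0.00]])
print(cavg(p_miss, p_fa))   # average detection cost over target languages
```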

  11. Conclusion • PCA-based feature extraction provides a great speed-up (it seems to work much better than feature selection) • Most of the further improvement can come from different phoneme recognizers (different features) • Tuning C separately for each language gives ~8% relative improvement
