Phonotactic using SVM for LRE2009 Tomáš Mikolov, 2009
Task - numbers • TRAIN set: 9 810 utterances • DEV set (30s condition): 13 331 utterances • 23 classes (languages)
Features • Feature vectors for HU recognizer • 35 000 trigrams • 80 000 four-grams • The large number of features results in huge data files – 80 000 * 13 331 ~= 1 billion numbers (a few GB on disk)
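The size estimate on this slide can be checked with a few lines of arithmetic. This is a rough sketch that assumes dense storage at 4 bytes per value (single-precision floats); the slides do not state the actual file format, and `feature_matrix_size` is a hypothetical helper name.

```python
def feature_matrix_size(num_features, num_utterances, bytes_per_value=4):
    """Return (number of values, size in bytes) of a dense feature matrix.

    bytes_per_value=4 is an assumption (32-bit floats), not taken from the slides.
    """
    values = num_features * num_utterances
    return values, values * bytes_per_value


# Four-gram features on the DEV set, using the counts from the slides.
values, size_bytes = feature_matrix_size(80_000, 13_331)
print(f"{values:,} values, about {size_bytes / 1e9:.1f} GB")
```

Under that assumption the four-gram DEV matrix alone holds about 1.07 billion values, which matches the "a few GB on disk" figure.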
SVMTorch vs SVMLib • Both work almost the same • In my work, SVMTorch was used • When using all features, the training and testing phases are very slow (half a day on 10 machines)
PCA for dimensionality reduction • Idea: many features tend to co-occur • Experiments: reducing the features to ~500 dimensions works almost as well as using the original features • Speed-up of the training phase depends on the original number of features – reduction from 80 000 to 500 may result in a speed-up of ~1000x or more
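The idea that co-occurring features collapse onto a few directions can be illustrated at toy scale. The sketch below is not the implementation used for these experiments (the slides do not say how PCA was computed); it is a dependency-free version based on power iteration with deflation, and all function names are hypothetical. At the scale on the slides (80 000 features) a linear algebra library would be the practical choice.

```python
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def top_component(rows, iters=200, seed=0):
    """Power iteration on X^T X without forming the d x d matrix."""
    d = len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in rows]  # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v


def pca_fit(rows, k):
    """Return (means, components) with k unit-length principal directions."""
    residual, means = center(rows)
    components = []
    for _ in range(k):
        v = top_component(residual)
        components.append(v)
        # Deflation: remove the part of each row that lies along v.
        for r in residual:
            p = sum(a * b for a, b in zip(r, v))
            for j in range(len(r)):
                r[j] -= p * v[j]
    return means, components


def pca_project(row, means, components):
    """Map one feature vector to its reduced representation."""
    return [sum((x - m) * c for x, m, c in zip(row, means, comp))
            for comp in components]


# Toy data: feature 1 is always twice feature 0, feature 2 never fires.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
means, comps = pca_fit(data, k=1)
print(pca_project([5.0, 10.0, 0.0], means, comps))
```

In the toy data a single component captures all the variance, because the first two features always move together. The same effect, at much larger scale, is what lets ~500 dimensions stand in for tens of thousands of n-gram counts.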
Tuning C parameter of SVM • When tuning C for each language, Cavg* goes from 2.62 to 2.44 (with fourgram features) • Olda: with trigrams, from 2.47 to 2.33
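Per-language tuning amounts to picking, for each language's detector, the C value with the lowest cost on the DEV set, instead of one C shared by all languages. A minimal sketch follows, assuming the per-language DEV costs for each candidate C have already been measured. The language names and cost values are made up for illustration, and the helper names are hypothetical.

```python
def best_c_per_language(dev_cost):
    """dev_cost maps language -> {C value: DEV cost}; pick the cheapest C per language."""
    return {lang: min(sorted(by_c), key=by_c.get) for lang, by_c in dev_cost.items()}


def best_global_c(dev_cost):
    """Pick the single C that minimizes the summed DEV cost over all languages."""
    shared = set.intersection(*(set(by_c) for by_c in dev_cost.values()))
    return min(sorted(shared), key=lambda c: sum(by_c[c] for by_c in dev_cost.values()))


# Made-up costs, for illustration only (not results from the slides).
dev_cost = {
    "lang_a": {0.1: 3.0, 1.0: 2.5, 10.0: 2.8},
    "lang_b": {0.1: 2.0, 1.0: 2.4, 10.0: 2.6},
}
print(best_global_c(dev_cost))
print(best_c_per_language(dev_cost))
```

In this made-up example the best shared value is C = 1.0, but `lang_b` does better with C = 0.1, which is the kind of gap that per-language tuning exploits.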
Improvements in accuracy • Linear interpolation of trigram + fourgram scores: 2.33 (trigram) + 2.44 (fourgram) => 2.22 • Using multiple recognizers: • HU fourgram (80 000 => 500): 2.44 • RU trigram (110 000 => 500): 1.96 • HU+RU features (1000): 1.64
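Linear interpolation of two systems' scores can be sketched as a weighted sum per trial. The slides do not give the interpolation weight, so `weight` below is an assumed free parameter, and the function name and example scores are hypothetical.

```python
def interpolate_scores(scores_a, scores_b, weight):
    """Return weight * a + (1 - weight) * b for each pair of scores.

    weight is an assumed tunable parameter in [0, 1]; the slides do not state it.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


# Made-up detector scores for two trials, e.g. trigram SVM vs fourgram SVM.
trigram_scores = [1.0, -1.0]
fourgram_scores = [3.0, 1.0]
print(interpolate_scores(trigram_scores, fourgram_scores, weight=0.5))
```

The weight would normally be chosen on the DEV set, in the same way as the C parameter.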
Current work • Using features from EN recognizer • Scoring eval set
Additional results • Using 5 SVMs (RU3, RU4, HU3, HU4, EN3) for the 30s condition: • DEV set C* grand average: 1.06 (~0.65 primary system) • EVAL set Cavg: 2.42 (~2.29 primary system) • For the 10s and 3s conditions, results are not as good – probably because no unigram/bigram features are used (not tested yet)
Conclusion • PCA-based feature extraction provides a great speed-up (seems to work much better than feature selection) • Most further improvement is likely to come from different phoneme recognizers (different features) • Tuning C separately for each language provides ~8% relative improvement