
iVector approach to Phonotactic LRE



Presentation Transcript


  1. iVector approach to Phonotactic LRE Mehdi Soufifar 2nd May 2011

  2. Phonotactic LRE
  • Train: utterance → recognizer (HVite, BUT PR, ...) with a language-dependent acoustic model (AM) → phoneme sequence → extract n-gram statistics → n-gram counts → train L classifiers (LR, SVM, LM, GLC, ...)
  • Test: utterance → recognizer (HVite, BUT PR, ...) with a language-dependent AM → phoneme sequence → extract n-gram statistics → n-gram counts → classifier → L language-dependent scores
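As a concrete illustration of the "extract n-gram statistics" step above, here is a minimal sketch that turns a decoded phoneme sequence into a fixed-length vector of 3-gram counts; the phoneme symbols and vocabulary are hypothetical placeholders, not the actual recognizer output.

```python
from collections import Counter
from itertools import product

def ngram_count_vector(phonemes, vocab, n=3):
    """Count phoneme n-grams and return a fixed-length vector with one
    dimension per possible n-gram (the dimensionality grows as |vocab|^n)."""
    counts = Counter(tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1))
    return [counts.get(g, 0) for g in product(vocab, repeat=n)]

# Hypothetical phoneme sequence; a real system would take it from the recognizer output.
vocab = ["a", "b", "k", "t"]
utterance = ["a", "b", "a", "k", "t", "a", "b", "a"]
vec = ngram_count_vector(utterance, vocab)
print(len(vec), sum(vec))   # 64 dimensions, 6 observed 3-grams
```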

  3. N-gram counts
  • Problem: huge vector of n-gram counts (N^3 = 226,981 for the RU phoneme set)
  • Solutions:
    • Choose the most frequent n-grams
      • Choose the top N n-grams discriminatively (LL)
    • Compress the n-gram counts
      • Singular Value Decomposition (SVD): decompose the document matrix D, then use the transformation matrix U to reduce the n-gram vector dimensionality (see the sketch below)
      • PCA-based dimensionality reduction
    • iVector feature selection
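A minimal sketch of the SVD-based compression mentioned in this slide, assuming a hypothetical document matrix D with one column of n-gram counts per training utterance; the 600-dimensional target matches the sub-space dimension quoted later, but the rest of the setup is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical document matrix D: one column of n-gram counts per training utterance.
D = rng.poisson(1.0, size=(5000, 1000)).astype(float)   # 5000 n-grams x 1000 utterances

# Thin SVD: D = U S Vt; the leading columns of U span the reduced n-gram space.
U, S, Vt = np.linalg.svd(D, full_matrices=False)
k = 600
U_k = U[:, :k]                      # 5000 x 600 transformation matrix

# Project a new utterance's n-gram count vector into the low-dimensional space.
x = rng.poisson(1.0, size=5000).astype(float)
x_reduced = U_k.T @ x               # 600-dimensional representation
print(x_reduced.shape)
```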

  4. Sub-space multinomial modeling
  • Every vector of n-gram counts consists of E events (#n-grams)
  • Log-probability of the n-th utterance under the multinomial (MN) distribution: log P(X_n) = sum_e c_ne * log(phi_ne)
  • phi_ne can be defined through the sub-space softmax: phi_ne = exp(t_e . w_n) / sum_e' exp(t_e' . w_n)
  • Model parameters to be estimated by ML are T and w
  • No analytical solution!
  • We use Newton-Raphson updates as a numerical solution
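The sketch below evaluates the multinomial log-probability under the sub-space softmax parameterisation written above; the toy dimensions and variable names (T, w, counts) are assumptions for illustration only.

```python
import numpy as np

def smm_log_prob(counts, T, w):
    """Log-probability of one utterance's n-gram counts under the sub-space
    multinomial model: log P = sum_e c_e * log(phi_e), with
    phi_e = exp(t_e . w) / sum_e' exp(t_e' . w)."""
    logits = T @ w                                   # one logit per n-gram event e
    log_phi = logits - np.logaddexp.reduce(logits)   # log-softmax over all E events
    return float(counts @ log_phi)

rng = np.random.default_rng(0)
E, R = 1000, 600                          # number of n-gram events, sub-space dimension
T = 0.01 * rng.standard_normal((E, R))    # sub-space matrix
w = rng.standard_normal(R)                # iVector of this utterance
c = rng.poisson(0.5, size=E).astype(float)
print(smm_log_prob(c, T, w))
```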

  5. Sub-space multinomial modeling
  • 1st solution:
    • Consider all 3-grams to be components of a Bernoulli trial
    • Model the entire vector of 3-gram counts with one multinomial distribution
    • N-gram events are not independent (not consistent with the Bernoulli-trial assumption!)
  • 2nd solution:
    • Cluster 3-grams based on their histories
    • Model each history with a separate MN distribution
    • Data sparsity problem!
    • Cluster 3-grams using a binary decision tree
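To make the 2nd solution concrete, this sketch evaluates the log-probability when the 3-grams are grouped by history and each group is normalised by its own softmax (one MN distribution per history); the grouping and sizes here are hypothetical.

```python
import numpy as np

def per_history_log_prob(counts, T, w, history_slices):
    """2nd solution: counts are ordered so that each history's 3-grams are
    contiguous; each slice gets its own multinomial, i.e. its own softmax
    normalisation, instead of one softmax over all events."""
    logits = T @ w
    log_p = 0.0
    for sl in history_slices:
        log_phi = logits[sl] - np.logaddexp.reduce(logits[sl])   # softmax within one history
        log_p += counts[sl] @ log_phi
    return float(log_p)

rng = np.random.default_rng(0)
E, R = 12, 4
T = 0.01 * rng.standard_normal((E, R))
w = rng.standard_normal(R)
c = rng.poisson(1.0, size=E).astype(float)
# Three hypothetical histories covering 5, 4 and 3 of the 12 3-gram events.
slices = [slice(0, 5), slice(5, 9), slice(9, 12)]
print(per_history_log_prob(c, T, w, slices))
```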

  6. Training of the iVector extractor
  • Number of iterations: 5-7 (depends on the sub-space dimension)
  • Sub-space dimension: 600
  [Plots for the 3-second, 10-second and 30-second conditions]
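The slides do not show the update formulas, so the sketch below only outlines the training structure: alternating updates of the iVectors w_n and the sub-space matrix T over a few iterations, with plain gradient-ascent steps standing in for the Newton-Raphson updates mentioned on slide 4. It is a structural illustration under those assumptions, not the actual extractor.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_smm(C, R=600, n_iter=7, lr=1e-3):
    """Alternately update the iVectors w_n and the sub-space matrix T by
    gradient ascent on the multinomial log-likelihood (a simple stand-in
    for the Newton-Raphson updates of the real extractor)."""
    N, E = C.shape                            # utterances x n-gram events
    rng = np.random.default_rng(0)
    T = 1e-3 * rng.standard_normal((E, R))
    W = np.zeros((N, R))
    for _ in range(n_iter):
        # Update each utterance's iVector with T fixed.
        for n in range(N):
            phi = softmax(T @ W[n])
            W[n] += lr * T.T @ (C[n] - C[n].sum() * phi)
        # Update the sub-space matrix with all iVectors fixed.
        grad_T = np.zeros_like(T)
        for n in range(N):
            phi = softmax(T @ W[n])
            grad_T += np.outer(C[n] - C[n].sum() * phi, W[n])
        T += lr * grad_T
    return T, W

# Toy run: 50 utterances, 200 n-gram events, 10-dimensional sub-space.
C = np.random.default_rng(1).poisson(0.5, size=(50, 200)).astype(float)
T, W = train_smm(C, R=10, n_iter=5)
print(T.shape, W.shape)
```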

  7. Classifiers
  • Configuration: L one-vs-all linear classifiers
  • L: number of target languages
  • Classifiers:
    • SVM
    • LR
    • Linear generative classifier
    • MLR (to be done!)
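A minimal sketch of the "L one-vs-all linear classifiers" configuration, using scikit-learn logistic regression (LR is one of the options listed above) on random placeholder iVectors; the choice of 23 target languages follows NIST LRE 2009, everything else is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 600))     # 500 training iVectors, sub-space dimension 600
y = rng.integers(0, 23, size=500)       # 23 target languages (NIST LRE 2009)

# One binary (one-vs-all) linear classifier per target language.
classifiers = []
for lang in range(23):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, (y == lang).astype(int))
    classifiers.append(clf)

# Language-dependent scores for a test iVector: one score per language.
x_test = rng.standard_normal((1, 600))
scores = np.array([clf.decision_function(x_test)[0] for clf in classifiers])
print(scores.argmax())
```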

  8. Results on different classifiers
  • Task: NIST LRE 2009

  9. Results of different systems LRE09

  10. N-gram clustering
  • Remove all 3-grams occurring fewer than 10 times over all training utterances
  • Model each history with a separate MN distribution
  • 1084 histories, up to 33 3-grams each
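A sketch of the two pre-processing steps on this slide under toy assumptions: drop 3-grams seen fewer than 10 times over the whole training set, then bucket the survivors by their two-phoneme history so that each history can be modelled by its own MN distribution.

```python
from collections import Counter, defaultdict

def prune_and_group(utterance_trigram_counts, min_count=10):
    """utterance_trigram_counts: list of Counter{(p_i, p_j, p_k): count}, one per utterance."""
    total = Counter()
    for c in utterance_trigram_counts:
        total.update(c)
    # Keep only 3-grams occurring at least min_count times over the whole training set.
    kept = {g for g, n in total.items() if n >= min_count}
    # Group the kept 3-grams by their history (p_i, p_j); each group will be
    # modelled by a separate multinomial distribution.
    histories = defaultdict(list)
    for g in kept:
        histories[g[:2]].append(g)
    return histories

# Toy example with two utterances.
utts = [Counter({("a", "b", "a"): 12, ("a", "b", "k"): 3}),
        Counter({("a", "b", "a"): 7, ("k", "t", "a"): 15})]
print(dict(prune_and_group(utts)))   # {('a','b'): [('a','b','a')], ('k','t'): [('k','t','a')]}
```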

  11. Merging histories using a BDT
  • For a 3-gram P_i P_j P_k, merge histories that do not increase the entropy by more than a certain value
  • Model 1 keeps the histories separate (P_i P_22 P_k and P_i P_33 P_k); Model 2 merges them (P_i P_33+22 P_k)
  • E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 - E2
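A sketch of the merge test, assuming the entropies are computed from count-weighted maximum-likelihood multinomials over the 3-grams sharing each history; the slide writes D = E1 - E2, while this sketch computes the increase E2 - E1 and compares it to the threshold, which is my own sign convention for the same decision.

```python
import numpy as np

def weighted_entropy(counts):
    """Entropy of the ML multinomial estimated from counts, weighted by the total count."""
    total = counts.sum()
    p = counts / total
    p = p[p > 0]
    return -total * np.sum(p * np.log(p))

def entropy_increase(counts_a, counts_b):
    """Model 1 keeps histories a and b separate; Model 2 merges them.
    Returns the entropy increase caused by merging."""
    e1 = weighted_entropy(counts_a) + weighted_entropy(counts_b)   # separate histories
    e2 = weighted_entropy(counts_a + counts_b)                     # merged history
    return e2 - e1

# Hypothetical counts over the final phonemes P_k under two candidate histories.
a = np.array([40., 5., 1.])
b = np.array([35., 6., 2.])
threshold = 10.0
if entropy_increase(a, b) < threshold:
    print("merge the two histories")
```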

  12. Results on DT history merging
  • 1089 → 60 histories after merging
  • With more training iterations on T, the T matrix moves toward the zero matrix!

  13. Deeper insight into the iVector extractor

  14. Strange results
  • 3-grams with no occurrences throughout the whole training set should not affect system performance!
  • Remove all 3-grams with no occurrences throughout the whole training set
  • 35973 -> 35406 3-grams (567 removed)
  • Even worse results if we prune more!
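As a quick check of the expectation stated on this slide, the sketch below removes count-matrix columns (3-gram dimensions) that are zero over the whole training set; the matrix here is random toy data, not the 35973-dimensional system.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy count matrix: utterances x 3-grams, sparse enough that some columns are all zero.
C = rng.poisson(0.005, size=(200, 5000)).astype(float)

nonzero = C.sum(axis=0) > 0          # 3-grams seen at least once over the training set
C_pruned = C[:, nonzero]
print(C.shape[1], "->", C_pruned.shape[1])
```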

  15. DT clustering of n-gram histories
  • The overall likelihood is an order of magnitude higher than with the 1st solution
  • The change of the model likelihood is quite notable in each iteration!
  • The T matrix is mainly zero after some iterations!

  16. 1st iteration

  17. 2nd iteration

  18. 3rd iteration

  19. 4th iteration

  20. 5th iteration

  21. 6th iteration

  22. Closer look at TRAIN set

  23. iVector inspection
  [Plots of iVectors for English and Cantonese]

  24. iVector inspection
  • Multiple data sources cause bimodality
  • We also see this effect in some single-source languages (e.g. Amharic)
