Speaker Identification using Gaussian Mixture Model


Presentation Transcript


  1. Speaker Identification using Gaussian Mixture Model Presented by CWJ 2000/05/03

  2. Reference: D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.

  3. Outline 1. Introduction to Speaker Recognition 2. Gaussian Mixture Speaker Model (GMM) 3. Experimental Evaluation

  4. Introduction to Speaker Recognition A. Some definitions of S.R. 1. Two tasks of Speaker Recognition -- Speaker Identification (this paper), e.g. voice-mail labeling -- Speaker Verification, e.g. financial transactions

  5. 2. Two forms of spoken input -- Text-dependent -- Text-independent (this paper) 3. System Range -- Closed Set (this paper) -- Open Set

  6. B. Several Methods used in Speaker Recognition [timeline figure: VQ, NN, and HMM approaches in use around 1985; by 1995 GMM has joined HMM, VQ, and NN]

  7. 1. Use long-term averages of acoustic features (spectrum, pitch, ...) -- the first and earliest idea. Idea: average out the factors causing intra-speaker variation, leaving only the speaker-dependent component. Drawback: requires long speech utterances (>20 s).
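A minimal NumPy sketch of this long-term-averaging approach; the nearest-average matching rule is an illustrative assumption, not something specified in the paper:

    import numpy as np

    def long_term_average(X):
        # X: (T, D) frame-level acoustic features (e.g. spectra) for one utterance.
        # Averaging over a long utterance (>20 s) smooths intra-speaker variation,
        # leaving a mostly speaker-dependent component.
        return X.mean(axis=0)

    def nearest_average_speaker(x_avg, reference_averages):
        # Hypothetical matching rule: pick the enrolled speaker whose stored
        # long-term average is closest in Euclidean distance.
        d = np.linalg.norm(reference_averages - x_avg, axis=1)
        return int(np.argmin(d))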

  8. 2. Train a speaker-dependent (SD) model for each speaker -- Explicit segmentation: HMM -- Implicit segmentation: VQ, GMM

  9. HMM -- Advantage: text-independent. Drawback: a significant increase in computational complexity. VQ -- Advantage: unsupervised clustering. Drawback: text-dependent.

  10. 3. Use a discriminative Neural Network (NN) ※ models the decision function that best discriminates between speakers. Advantage: fewer parameters and higher performance compared to the VQ model. Drawback: the network must be retrained whenever a new speaker is added to the system.

  11. GMM -- Advantages: text-independent; probabilistic framework (robust); computationally efficient; easy to implement.

  12. The Gaussian Mixture Model (GMM) A. Model Interpretations [diagram: a GMM interpreted at the state level of a speech-recognition HMM]

  13. 1. Each Gaussian component models an acoustic class. [diagram: in speaker recognition, speaker k is represented by a set of acoustic classes, one per mixture component]

  14. 2. A GMM can closely approximate arbitrarily shaped densities.

  15. [figure]

  16. B. Signal Analysis [block diagram of the front-end producing mel-cepstral feature vectors]

  17. C. Model Description The Gaussian mixture density is $p(\vec{x}\mid\lambda)=\sum_{i=1}^{M} p_i\,b_i(\vec{x})$, where $\vec{x}$ is a D-dimensional random vector, the mixture weights satisfy $\sum_{i=1}^{M} p_i=1$, and each component density is $b_i(\vec{x})=\frac{1}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}}\exp\!\left\{-\tfrac{1}{2}(\vec{x}-\vec{\mu}_i)'\,\Sigma_i^{-1}\,(\vec{x}-\vec{\mu}_i)\right\}$. The covariance matrices may be nodal (one per component), grand (one per model), or global (one overall); this paper uses nodal, diagonal covariances.
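A minimal NumPy sketch of evaluating this mixture density with nodal diagonal covariances (variable names are illustrative; the log domain is used for numerical stability):

    import numpy as np

    def gmm_log_likelihood(x, weights, means, variances):
        # x: (D,) feature vector; weights: (M,) mixture weights p_i summing to 1;
        # means: (M, D) component means; variances: (M, D) diagonal covariances.
        D = x.shape[0]
        # log b_i(x) for every component i
        log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
        log_expo = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
        log_b = log_norm + log_expo
        # log p(x|lambda) = log sum_i p_i b_i(x), via log-sum-exp
        a = np.log(weights) + log_b
        m = np.max(a)
        return m + np.log(np.sum(np.exp(a - m)))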

  18. D. ML Parameter Estimation Steps: 1. Begin with an initial model $\lambda$. 2. Estimate a new model $\bar{\lambda}$ such that $p(X\mid\bar{\lambda})\ge p(X\mid\lambda)$ (one EM iteration). 3. Repeat step 2 until convergence is reached.

  19. Re-estimation formulas: mixture weights $\bar{p}_i=\frac{1}{T}\sum_{t=1}^{T}p(i\mid\vec{x}_t,\lambda)$; means $\bar{\vec{\mu}}_i=\frac{\sum_{t=1}^{T}p(i\mid\vec{x}_t,\lambda)\,\vec{x}_t}{\sum_{t=1}^{T}p(i\mid\vec{x}_t,\lambda)}$; variances $\bar{\sigma}_i^2=\frac{\sum_{t=1}^{T}p(i\mid\vec{x}_t,\lambda)\,x_t^2}{\sum_{t=1}^{T}p(i\mid\vec{x}_t,\lambda)}-\bar{\mu}_i^2$, where $p(i\mid\vec{x}_t,\lambda)=\frac{p_i\,b_i(\vec{x}_t)}{\sum_{k=1}^{M}p_k\,b_k(\vec{x}_t)}$ is the a posteriori probability of class $i$.
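A sketch of one EM iteration implementing these re-estimation formulas for a diagonal-covariance GMM, assuming NumPy (in practice the variance limiting of slide 24 would be applied to the updated variances):

    import numpy as np

    def em_step(X, weights, means, variances):
        # X: (T, D) training vectors; weights: (M,); means, variances: (M, D).
        T = X.shape[0]
        # a posteriori probabilities p(i | x_t, lambda), shape (T, M)
        log_b = (-0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                 - 0.5 * np.sum((X[:, None, :] - means) ** 2 / variances, axis=2))
        log_post = np.log(weights) + log_b
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        n_i = post.sum(axis=0)                          # soft counts per component
        new_weights = n_i / T                           # mixture weights
        new_means = (post.T @ X) / n_i[:, None]         # means
        new_vars = (post.T @ X**2) / n_i[:, None] - new_means**2  # variances
        return new_weights, new_means, new_vars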

  20. E. Speaker Identification A group of $S$ speakers $\{1,2,\ldots,S\}$ is represented by GMMs $\lambda_1,\lambda_2,\ldots,\lambda_S$. Given a test sequence $X=\{\vec{x}_1,\ldots,\vec{x}_T\}$ and assuming equal priors and independent frames, the identified speaker is $\hat{S}=\arg\max_{1\le k\le S}\sum_{t=1}^{T}\log p(\vec{x}_t\mid\lambda_k)$.
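A sketch of this maximum-log-likelihood decision rule over a set of trained speaker GMMs (assuming NumPy; each model is a (weights, means, variances) tuple with diagonal covariances):

    import numpy as np

    def identify_speaker(X, speaker_models):
        # X: (T, D) test vectors; speaker_models: list of (weights, means, variances).
        scores = []
        for weights, means, variances in speaker_models:
            log_b = (-0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                     - 0.5 * np.sum((X[:, None, :] - means) ** 2 / variances, axis=2))
            a = np.log(weights) + log_b              # (T, M)
            m = a.max(axis=1, keepdims=True)
            frame_ll = m[:, 0] + np.log(np.sum(np.exp(a - m), axis=1))
            scores.append(frame_ll.sum())            # sum_t log p(x_t | lambda_k)
        return int(np.argmax(scores))                # index of the identified speaker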

  21. Experimental Evaluation A. Performance Evaluation E.g., at a frame rate of one feature vector per 10 ms, a test utterance of length 5 seconds gives T = 500 vectors.

  22. % correct identification = (# of correctly identified segments / total # of segments) × 100
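For instance, with hypothetical numbers: if 45 of 50 test segments are assigned to the correct speaker, the score is (45 / 50) × 100 = 90%.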

  23. C. Algorithmic Issues 1. Model Initialization: -- Use the means of speaker-independent (SI), context-dependent subword HMMs and their global variance. -- Or randomly choose 50 vectors for the initial model means and an identity matrix for the starting covariance matrix (sketched below).
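A sketch of the second (random) initialization option, assuming NumPy; reading the slide's 50 vectors as the number of mixture components is an assumption:

    import numpy as np

    def random_init(X, M=50, seed=0):
        # Pick M training vectors at random as initial means, with uniform
        # weights and unit variances (the identity starting covariance,
        # stored as its diagonal).
        rng = np.random.default_rng(seed)
        idx = rng.choice(X.shape[0], size=M, replace=False)
        weights = np.full(M, 1.0 / M)
        means = X[idx].copy()
        variances = np.ones_like(means)
        return weights, means, variances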

  24. 2. Variance Limiting: When training a nodal-variance GMM, the magnitude of some variances can become very small, so a constraint is imposed: $\sigma_i^2=\max(\sigma_i^2,\sigma_{\min}^2)$. The minimum variance $\sigma_{\min}^2$ is determined empirically.
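This floor is a one-line operation on the diagonal variances, applied after each EM variance update (NumPy sketch):

    import numpy as np

    def apply_variance_floor(variances, var_min):
        # Variance limiting: any sigma_i^2 that fell below the empirically
        # chosen floor sigma_min^2 is replaced by the floor value.
        return np.maximum(variances, var_min)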

  25. 3. Model Order: I. Performance versus model order (M = 1, 2, 4, 8, 16, 32, 64).

  26. II. Performance for different amounts of training data and model orders. III. Performance versus model order for models trained with 30, 60, and 90 s of speech.

  27. 4. Spectral Variability Compensation: 1) Frequency Warping: [equation: the frequency axis is warped relative to the original Nyquist frequency $f_N$]

  28. 2) Spectral Shape Compensation: Assumption: [block diagram: speaker → channel → signal processing] the speech passes through a channel with some frequency response before feature extraction; for a time-invariant linear channel, this adds a constant bias to each mel-cepstral feature vector $\vec{f}$.

  29. ‧ Mean normalization removes the time-invariant channel filter (cepstral mean subtraction, CMS). ‧ Use "channel-invariant" features (delta-cepstra).
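A sketch of both compensations, assuming NumPy; the exact delta-regression window is not given on the slide, so the +/-K window here is an assumption:

    import numpy as np

    def cms(C):
        # Cepstral mean subtraction: removing the per-utterance mean cancels
        # the additive cepstral bias of a time-invariant linear channel.
        # C: (T, D) mel-cepstral vectors.
        return C - C.mean(axis=0, keepdims=True)

    def delta(C, K=2):
        # Delta-cepstra via linear regression over +/-K neighboring frames:
        # delta_t = sum_k k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2).
        T = C.shape[0]
        pad = np.pad(C, ((K, K), (0, 0)), mode="edge")
        num = sum(k * (pad[K + k:K + k + T] - pad[K - k:K - k + T])
                  for k in range(1, K + 1))
        den = 2 * sum(k * k for k in range(1, K + 1))
        return num / den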

  30. 5. Large Population Performance: [results figure]
