Speaker Identification using Gaussian Mixture Model Presented by CWJ 2000/05/03
Reference
D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.
Outline
1. Introduction to Speaker Recognition
2. Gaussian Mixture Speaker Model (GMM)
3. Experimental Evaluation
Introduction to Speaker Recognition
A. Some definitions of speaker recognition
1. Two tasks of speaker recognition
-- Speaker Identification (this paper), e.g. voice mail labeling
-- Speaker Verification, e.g. financial transactions
2. Two forms of spoken input
-- Text-dependent
-- Text-independent (this paper)
3. System range
-- Closed set (this paper)
-- Open set
B. Several methods used in speaker recognition (timeline figure: VQ, NN, and HMM approaches around 1985; VQ, NN, HMM, and GMM approaches by 1995)
1. Long-term averages of acoustic features (spectrum, pitch, ...): the first and earliest approach.
Idea: average out the factors behind intra-speaker variation, leaving only the speaker-dependent component.
Drawback: requires long speech utterances (>20 s).
2. Training a speaker-dependent (SD) model for each speaker
-- Explicit segmentation: HMM
-- Implicit segmentation: VQ, GMM
HMM
Advantage: text-independent
Drawback: a significant increase in computational complexity
VQ
Advantage: unsupervised clustering
Drawback: text-dependent
3. Discriminative neural networks (NN)
※ model the decision function that best discriminates speakers
Advantage: fewer parameters and higher performance than the VQ model
Drawback: the network must be retrained whenever a new speaker is added to the system.
GMM
Advantages: text-independent; probabilistic framework (robust); computationally efficient; easy to implement
The Gaussian Mixture Model (GMM)
A. Model interpretations (figure: the GMM viewed at the state level, as in speech recognition)
1. Each Gaussian component models an acoustic class (figure: speaker k represented as a set of acoustic classes).
2. A GMM gives a better approximation to arbitrarily shaped densities.
B. Signal Analysis
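The signal-analysis figure from this slide did not survive; as a rough, hedged sketch of the kind of mel-cepstral front end the paper uses, the snippet below computes mel-cepstral features with librosa. The library choice, the 8 kHz sampling rate, the 20 coefficients, and the 20 ms / 10 ms framing are assumptions for illustration, not values taken from the slide.

```python
# A minimal front-end sketch (librosa is used only for illustration; the
# sampling rate, coefficient count, and framing below are assumptions).
import librosa

def mel_cepstra(wav_path, n_coeffs=20):
    y, sr = librosa.load(wav_path, sr=8000)      # telephone-band speech assumed
    feats = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_coeffs,
        n_fft=int(0.020 * sr),                   # ~20 ms analysis window (assumed)
        hop_length=int(0.010 * sr),              # ~10 ms frame rate (matches slide A. later)
    )
    return feats.T                               # shape (T, n_coeffs): one vector per frame
```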
C. Model Description
For a D-dimensional random vector x, the Gaussian mixture density is
p(x | λ) = Σ_{i=1}^{M} p_i b_i(x),
where each component density is
b_i(x) = (2π)^{-D/2} |Σ_i|^{-1/2} exp{ -(1/2) (x - μ_i)' Σ_i^{-1} (x - μ_i) },
the mixture weights satisfy Σ_{i=1}^{M} p_i = 1, and the model is denoted λ = {p_i, μ_i, Σ_i}, i = 1, …, M.
Covariance matrix options: nodal, grand, or global; nodal diagonal covariances are used here.
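As a minimal numpy sketch of evaluating this density with nodal diagonal covariances (function and variable names are mine; written for clarity, not speed):

```python
import numpy as np

def log_gmm_density(x, weights, means, variances):
    """log p(x | lambda) for a diagonal-covariance GMM.

    x:         (D,) feature vector
    weights:   (M,) mixture weights p_i, summing to 1
    means:     (M, D) component means mu_i
    variances: (M, D) diagonals of Sigma_i
    """
    D = x.shape[0]
    # log b_i(x) for each component (Gaussian with diagonal covariance)
    log_b = (-0.5 * D * np.log(2 * np.pi)
             - 0.5 * np.sum(np.log(variances), axis=1)
             - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    # log sum_i p_i b_i(x), computed stably in the log domain
    a = np.log(weights) + log_b
    amax = a.max()
    return amax + np.log(np.exp(a - amax).sum())
```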
D. ML Parameter Estimation (EM)
1. Begin with an initial model λ.
2. Estimate a new model λ̄ such that p(X | λ̄) ≥ p(X | λ).
3. Repeat step 2 until convergence is reached.
On each iteration, with the a posteriori probability of component i given by
p(i | x_t, λ) = p_i b_i(x_t) / Σ_{k=1}^{M} p_k b_k(x_t),
the re-estimation formulas are
Mixture weights: p̄_i = (1/T) Σ_{t=1}^{T} p(i | x_t, λ)
Means: μ̄_i = Σ_{t=1}^{T} p(i | x_t, λ) x_t / Σ_{t=1}^{T} p(i | x_t, λ)
Variances (diagonal): σ̄_i² = Σ_{t=1}^{T} p(i | x_t, λ) x_t² / Σ_{t=1}^{T} p(i | x_t, λ) − μ̄_i²
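These updates translate directly into one EM pass; the sketch below implements them for diagonal covariances (names are my own):

```python
import numpy as np

def em_step(X, weights, means, variances):
    """One EM re-estimation pass for a diagonal-covariance GMM.
    X: (T, D) training vectors; weights: (M,); means, variances: (M, D)."""
    T, D = X.shape
    # log p_i b_i(x_t) for all t, i  ->  shape (T, M)
    log_b = (-0.5 * D * np.log(2 * np.pi)
             - 0.5 * np.sum(np.log(variances), axis=1)
             - 0.5 * np.sum((X[:, None, :] - means) ** 2 / variances, axis=2))
    log_pb = np.log(weights) + log_b
    # a posteriori probabilities p(i | x_t, lambda), normalized over components
    log_pb -= log_pb.max(axis=1, keepdims=True)
    post = np.exp(log_pb)
    post /= post.sum(axis=1, keepdims=True)

    n_i = post.sum(axis=0)                                          # sum_t p(i | x_t, lambda)
    new_weights = n_i / T                                           # mixture weights
    new_means = (post.T @ X) / n_i[:, None]                         # means
    new_vars = (post.T @ (X ** 2)) / n_i[:, None] - new_means ** 2  # variances
    return new_weights, new_means, new_vars
```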
E. Speaker Identification
A group of S speakers S = {1, 2, …, S} is represented by the GMMs λ_1, λ_2, …, λ_S. For a test sequence X = {x_1, …, x_T}, the identified speaker is the one whose model gives the maximum a posteriori probability; assuming equally likely speakers, this reduces to
Ŝ = arg max_{1≤k≤S} Σ_{t=1}^{T} log p(x_t | λ_k).
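A short sketch of this decision rule, reusing the log_gmm_density helper from the earlier sketch and summing the frame log-likelihoods under each speaker model:

```python
def identify_speaker(X, speaker_models):
    """X: (T, D) test vectors; speaker_models: dict id -> (weights, means, variances).
    Returns the speaker whose GMM gives the largest total log-likelihood."""
    scores = {}
    for spk, (w, mu, var) in speaker_models.items():
        scores[spk] = sum(log_gmm_density(x, w, mu, var) for x in X)
    return max(scores, key=scores.get)
```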
Experimental Evaluation
A. Performance Evaluation
e.g., with a 10 ms frame rate, a 5-second test utterance yields T = 500 feature vectors.
% correct identification = (# of correctly identified segments / total # of segments) × 100
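In code form (a sketch under the assumption that each test utterance has already been cut into fixed-length segments of feature vectors):

```python
def percent_correct(test_segments, true_speakers, speaker_models):
    """test_segments: list of (T, D) arrays; true_speakers: matching list of speaker ids."""
    correct = sum(
        identify_speaker(seg, speaker_models) == spk
        for seg, spk in zip(test_segments, true_speakers)
    )
    return 100.0 * correct / len(test_segments)
```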
C. Algorithmic Issues
1. Model Initialization
-- Use the means of speaker-independent (SI), context-dependent subword HMMs and their global variance.
-- Or randomly choose 50 training vectors as the initial model means, with an identity matrix as the starting covariance matrix (see the sketch below).
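A sketch of the random option only (the HMM-based option depends on an external recognizer): M training vectors are drawn as initial means, with unit variances standing in for the identity covariance matrix and uniform weights.

```python
import numpy as np

def random_init(X, M, seed=0):
    """Random initialization sketch: M training vectors as means,
    identity covariance, uniform mixture weights."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    means = X[rng.choice(T, size=M, replace=False)]
    variances = np.ones((M, D))        # diagonal of an identity covariance matrix
    weights = np.full(M, 1.0 / M)
    return weights, means, variances
```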
2. Variance Limiting
When training a GMM with nodal variances on limited data, some variance estimates can become very small, so the constraint σ_i² = max(σ_i², σ_min²) is applied element-wise after each re-estimation. The minimum variance σ_min² is determined empirically.
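The constraint is a single element-wise operation applied after each variance re-estimation; the floor value is left to the caller, since it is determined empirically and the slide gives no number.

```python
import numpy as np

def apply_variance_floor(variances, var_min):
    # sigma_i^2 := sigma_min^2 wherever sigma_i^2 <= sigma_min^2 (element-wise)
    return np.maximum(variances, var_min)
```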
3. Model Order
I. Performance versus model order (M = 1, 2, 4, 8, 16, 32, 64).
II. Performance for different amounts of training data and model orders.
III. Performance versus model order for models trained with 30, 60, and 90 s of speech.
4. Spectral Variability Compensation
1) Frequency warping, defined with respect to the original Nyquist frequency.
2) Spectral Shape Compensation
Assumption (block diagram): speaker → channel (frequency response) → signal processing → mel-cepstral feature vector.
-- Mean normalization to remove a time-invariant channel filter (cepstral mean subtraction, CMS).
-- Use "channel-invariant" features (delta-cepstra).
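Both compensations are simple per-utterance operations on the (T, D) cepstral matrix; a minimal sketch (the delta regression window is an assumption, not taken from the slide):

```python
import numpy as np

def cepstral_mean_subtraction(C):
    """CMS: remove the utterance mean, cancelling a time-invariant channel filter."""
    return C - C.mean(axis=0, keepdims=True)

def delta_cepstra(C, width=2):
    """Simple difference-based delta cepstra as a 'channel-invariant' feature.
    width is an assumed regression window."""
    T = C.shape[0]
    idx = np.arange(T)
    forward = C[np.minimum(idx + width, T - 1)]
    backward = C[np.maximum(idx - width, 0)]
    return (forward - backward) / (2.0 * width)
```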
5. Large Population Performance