
Feature Selection, Acoustic Modeling and Adaptation SDSG REVIEW of recent WORK



  1. Feature Selection, Acoustic Modeling and Adaptation SDSG REVIEW of recent WORK Technical University of Crete Speech Processing and Dialog Systems Group Presenter: Alex Potamianos

  2. Outline • Prior Work • Adaptation • Acoustic Modeling • Robust Feature Selection • Bridge over to HIWIRE work-plan • Robust Features, Acoustic Modeling, Adaptation • New areas: audio-visual, microphone arrays TUC - SDSG

  3. Adaptation • Transformation-based adaptation • MAP Adaptation (Bayesian learning approximation) • Speaker Clustering / Speaker-space models • Robust Feature Selection • Combinations TUC - SDSG

  4. Acoustic Model Adaptation: SDSG Selected Work • Constrained Estimation Adaptation • Maximum Likelihood Stochastic Transformations • Combined Transformation-MAP adaptation • MLST Basis Vectors • Incremental Adaptation • Dependency modeling of biases • Vocal Tract Norm. with Linear Transformation TUC - SDSG

  5. Constrained Estimation Adaptation (Digalakis 1995) • Hypothesize a sequence of feature-space linear transformations: x̂_t = A x_t + b • Adapted models are then: μ̂ = A μ + b, Σ̂ = A Σ Aᵀ, with A diagonal • Adaptation is equivalent to estimating the state-dependent transformation parameters {A, b} TUC - SDSG
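
To make the constrained-estimation step concrete, here is a minimal Python sketch (not code from the original work) of how a feature-space transform x̂ = A x + b with diagonal A induces the adapted Gaussian parameters; the `adapt_gaussian` helper and all numeric values are hypothetical.

```python
import numpy as np

def adapt_gaussian(mu, sigma2, A_diag, b):
    """Apply a constrained (feature-space) linear transform x' = A x + b
    to a diagonal-covariance Gaussian. With A diagonal, the adapted
    mean is A mu + b and the adapted variances are A^2 sigma^2."""
    mu_adapted = A_diag * mu + b
    sigma2_adapted = (A_diag ** 2) * sigma2
    return mu_adapted, sigma2_adapted

# Hypothetical 3-dimensional Gaussian and transform parameters.
mu = np.array([0.5, -1.2, 2.0])
sigma2 = np.array([1.0, 0.5, 2.0])
A_diag = np.array([1.1, 0.9, 1.0])   # diagonal of A
b = np.array([0.1, 0.0, -0.2])

print(adapt_gaussian(mu, sigma2, A_diag, b))
```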

  6. Compared to MLLR (Leggetter 1996) • Both were published at about the same time. • MLLR is model-space adaptation only. • MLLR transforms only the model means. • The transformation matrix A in MLLR is block diagonal. • Constrained estimation is more generic. TUC - SDSG

  7. Limitations of the Linear Assumption • The linear assumption may be too restrictive in modeling the dependency between training and testing conditions. • Goal: Try a more complex transformation. • All Gaussians in a class are restricted to be transformed identically, with the same transformation. • Goal: Let each Gaussian in a class decide on its own transformation. • Which transformation transforms each Gaussian is predefined. • Goal: Let the system automatically choose the transformation-Gaussian pairs. TUC - SDSG

  8. ML Stochastic Transformations (MLST) (Diakoloukas Digalakis 1997) • Hypothesize a sequence of feature-space stochastic transformations of the form: x̂_t = A_j x_t + b_j, with transformation j selected with probability λ_j TUC - SDSG

  9. MLST: model-space • Use a set of MLSTs instead of linear transformations. • Adapted observation densities: p(x|s) = Σ_i w_si Σ_j λ_sj N(x; A_sj μ_si + b_sj, A_sj Σ_si A_sjᵀ) • MLST-Method I: A_sj is diagonal • MLST-Method II: A_sj is block diagonal TUC - SDSG
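
A minimal sketch of the model-space expansion above, assuming the transforms (A_j, b_j) and weights λ_j have already been estimated; `mlst_adapt_mixture` is a hypothetical helper, not code from the MLST papers.

```python
import numpy as np

def mlst_adapt_mixture(weights, means, covs, transforms, lambdas):
    """Expand a speaker-independent (SI) Gaussian mixture with a set
    of stochastic transforms (A_j, b_j) applied with probabilities
    lambda_j. Each SI component i and transform j yields an adapted
    component with weight w_i * lambda_j, mean A_j mu_i + b_j and
    covariance A_j Sigma_i A_j^T."""
    out_w, out_mu, out_cov = [], [], []
    for w, mu, cov in zip(weights, means, covs):
        for (A, b), lam in zip(transforms, lambdas):
            out_w.append(w * lam)
            out_mu.append(A @ mu + b)
            out_cov.append(A @ cov @ A.T)
    return np.array(out_w), np.array(out_mu), np.array(out_cov)
```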

  10. MLST: Reduce the number of mixture components • The adapted mixture densities consist of N × J Gaussians (N SI components times J component transforms). • Reduce the Gaussians back to their SI number: • HPT: Apply the component transformation with the highest probability to each Gaussian (sketched below). • LCT: Linear combination of all component transforms. • MTG: Merge the transformed Gaussians. TUC - SDSG
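
A sketch of the HPT reduction only, assuming per-Gaussian transformation weights are available; names and shapes are illustrative, not from the original implementation.

```python
import numpy as np

def reduce_hpt(means, covs, transforms, lambdas):
    """HPT reduction: apply to each Gaussian only the component
    transform with the highest estimated weight for that Gaussian,
    restoring the original (SI) number of components.
    `lambdas[i, j]` is the weight of transform j for Gaussian i."""
    means_r, covs_r = [], []
    for i, (mu, cov) in enumerate(zip(means, covs)):
        A, b = transforms[int(np.argmax(lambdas[i]))]
        means_r.append(A @ mu + b)
        covs_r.append(A @ cov @ A.T)
    return np.array(means_r), np.array(covs_r)
```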

  11. Schematic representation of MLST adaptation TUC - SDSG

  12. MLST properties • A_sj, b_sj are shared at a state or state-cluster level • Transformation weights λ_j are estimated at a Gaussian level • MLST combines transformed Gaussians • MLST is flexible in how it selects a transformation for each Gaussian • MLST chooses an arbitrary number of transformations per class TUC - SDSG

  13. MLST compared to ML Linear Transforms • Hard versus Soft decision: • Choose the linear component based on the training samples. • Adaptation Resolution: • Linear components are common to a transformation class • Choose the transformation at a Gaussian level • Increased adaptation resolution while retaining robust estimation TUC - SDSG

  14. MLST basis transforms (Boulis Diakoloukas Digalakis 2000) • Algorithm steps: • Cluster the training-speaker space into classes • Train MLST component transforms using data from each training-speaker class • Adaptation data is used to estimate only the transformation weights (see the sketch below) • This is equivalent to bringing a priori knowledge into the estimation process • Results in rapid speaker adaptation • Significant gains for medium and small data sets TUC - SDSG
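
A rough sketch of the weight-estimation idea, assuming the pre-trained basis transforms have already been applied to produce one adapted model per speaker class; weighting each basis in proportion to the likelihood it assigns to the adaptation data is a deliberate simplification of the actual reestimation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_basis_weights(adapt_data, adapted_models):
    """Rapid adaptation with pre-trained basis transforms: the
    per-cluster transforms are fixed, and the (small) adaptation set
    only estimates how to weight them. Here each basis weight is
    proportional to the likelihood its adapted model assigns to the
    data, akin to one step of an EM-style reestimation."""
    loglik = np.array([
        multivariate_normal(mean=mu, cov=cov).logpdf(adapt_data).sum()
        for mu, cov in adapted_models   # one adapted model per basis
    ])
    w = np.exp(loglik - loglik.max())   # numerically stable softmax
    return w / w.sum()
```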

  15. Combined Transformation-Bayesian Adaptation (Digalakis Neumeyer 1996) • MAP estimation, with the transformation-adapted parameters as priors, can be expressed (schematically, for a Gaussian mean) as: μ̂ = (τ μ̃ + Σ_t γ_t x_t) / (τ + Σ_t γ_t), where μ̃ is the transformation-adapted prior mean • Retain the asymptotic properties of MAP • Retain the fast adaptation rates of transformations. TUC - SDSG
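
A minimal sketch of the combined estimate for a single Gaussian mean, assuming frame-level occupation probabilities `gamma` from a forward-backward pass; `tau` is a hypothetical prior weight, and the transformation-adapted mean serves as the prior.

```python
import numpy as np

def combined_map_mean(x, gamma, mu_prior_transformed, tau=10.0):
    """MAP reestimation of a Gaussian mean with the transformation-
    adapted mean as the prior: a count-weighted interpolation between
    prior and data. With little data the estimate stays close to the
    (rapidly adapted) prior; as data grows it converges to the ML
    estimate, retaining MAP's asymptotic properties."""
    n = gamma.sum()                           # soft occupation count
    x_bar = (gamma[:, None] * x).sum(0) / n   # weighted sample mean
    return (tau * mu_prior_transformed + n * x_bar) / (tau + n)
```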

  16. Rapid Speech Recognizer Adaptation (Digalakis et al. 2000) • Dependence models of the bias components of cascaded transforms. Techniques: • Gaussian multiscale process • Hierarchical tree-structured prior • Explicit correlation models • Markov Random Fields TUC - SDSG

  17. VTN with Linear Transformation (Potamianos and Rose 1997, Potamianos and Narayanan 1998) • Vocal Tract Normalization: select the optimal warping factor a according to â = arg max_a P(X^a | a, λ, H), where H is the transcription, λ the model, and X^a the observation sequence frequency-warped by factor a • VTN with linear transformation: {â, θ̂} = arg max_{a,θ} P(h_θ(X^a) | a, λ, θ, H), where h_θ(·) is a parametric linear transformation with parameter θ TUC - SDSG
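
A sketch of the warping-factor search, assuming a hypothetical scorer `loglik_fn` that warps the observations by factor a and returns their transcription-constrained log-likelihood; the grid values are illustrative.

```python
import numpy as np

def select_warp_factor(frames, warp_factors, loglik_fn):
    """Grid search for the VTN warping factor: score the frequency-
    warped observations against the model for each candidate a and
    keep the arg max, i.e. a* = arg max_a P(X^a | a, model, H)."""
    scores = [loglik_fn(frames, a) for a in warp_factors]
    return warp_factors[int(np.argmax(scores))]

# An illustrative candidate grid around the unwarped case a = 1.0.
warp_grid = np.linspace(0.88, 1.12, 13)
```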

  18. Acoustic Modeling: SDSG Selected Work • Genones: Generalized Gaussian mixture-tying scheme • Stochastic Segment Models (SSMs) TUC - SDSG

  19. Genones: Generalized Mixture Tying (Digalakis Monaco Murveit 1996) • Algorithm steps (clustering sketched below): • Clustering of HMM states based on the similarity of their distributions • Splitting: construct seed codebooks for each state cluster • Either identify the most likely mixture-component subset • Or cluster down the original codebook • Reestimation of the parameters using Baum-Welch • Better trade-off between modeling resolution and robustness • Genones are used in Decipher and Nuance TUC - SDSG
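
A much-reduced sketch of the state-clustering step, assuming a Euclidean distance between state mean vectors as a stand-in for the likelihood-based similarity used in the paper; everything here is illustrative.

```python
import numpy as np

def cluster_states(state_means, n_clusters):
    """Greedy bottom-up clustering of HMM states by the similarity of
    their output distributions (Euclidean distance between mixture
    mean vectors here). Each resulting cluster would then get its own
    seed codebook ('genone') before Baum-Welch reestimation."""
    clusters = [[i] for i in range(len(state_means))]
    cents = [m.copy() for m in state_means]
    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        d = np.full((len(cents), len(cents)), np.inf)
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d[i, j] = np.linalg.norm(cents[i] - cents[j])
        i, j = np.unravel_index(np.argmin(d), d.shape)
        clusters[i] += clusters.pop(j)
        cents[i] = np.mean([state_means[k] for k in clusters[i]], axis=0)
        cents.pop(j)
    return clusters
```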

  20. Segment Models • HMM limitations: • Weak duration modeling • Conditional-independence assumption on observations • Restrictions on feature extraction imposed by frame-based observations • Segment model motivation: • Larger number of degrees of freedom in the model • Use segmental features • Model correlation of frame-based features • Powerful modeling of transitions and longer-range speech dynamics • Less distortion for segmental coding → segmental recognition is more efficient TUC - SDSG

  21. General Stochastic Segment Models • A segment s in an utterance of N frames is s = {(τa, τb): 1 ≤ τa ≤ τb ≤ N} • Segment model density (schematically): p(y_τa, …, y_τb | a) = p(y_1, …, y_L | L, a) p(L | a), with segment length L = τb − τa + 1 • Segment models generate a variable-length sequence of frames TUC - SDSG
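
A sketch of scoring one segment under assumed model families: a frame-independent Gaussian output density and a Poisson duration model, both illustrative stand-ins for the richer dependency models of the actual SSM (next slides).

```python
import numpy as np
from scipy.stats import multivariate_normal, poisson

def segment_loglik(frames, mu, cov, mean_dur):
    """Score one variable-length segment under a segment model
    factored as p(y_1..y_L | L, a) * p(L | a): an output term over
    the frames plus a duration term over the segment length."""
    L = len(frames)
    output_term = multivariate_normal(mean=mu, cov=cov).logpdf(frames).sum()
    duration_term = poisson(mean_dur).logpmf(L)
    return output_term + duration_term
```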

  22. Stochastic Segment Model (Ostendorf Digalakis 1992) • Problem: Model time correlation within a segment • Solution: Gaussian model variations based on assumptions about the form of statistical dependency • Gauss-Markov model • Dynamical System model • Target State model. TUC - SDSG

  23. SSM Viterbi Decoding (Ostendorf Digalakis Kimball 1996) • HMM Viterbi recognition: find the most likely state sequence q̂ = arg max_q P(X, q | λ) • State-to-word sequence mapping: read the word sequence off q̂ • SSM analogous solution: jointly find the most likely segmentation and segment-label sequence, {ŝ, â} = arg max_{s,a} P(X, s, a) • Map the segment-label sequence â to the appropriate word sequence TUC - SDSG
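
A dynamic-programming sketch of the joint search over boundaries and labels, assuming a hypothetical segment scorer `seg_score(t0, t1, a)` that returns the log-likelihood of frames [t0, t1) under label a; the label-to-word mapping is left out.

```python
import numpy as np

def ssm_decode(n_frames, labels, seg_score, max_len=30):
    """Viterbi-style decoding for segment models: best[t] is the best
    score of any segmentation of frames [0, t); back[t] records the
    segment (start, label) that achieves it."""
    best = np.full(n_frames + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (n_frames + 1)
    for t1 in range(1, n_frames + 1):
        for t0 in range(max(0, t1 - max_len), t1):
            for a in labels:
                s = best[t0] + seg_score(t0, t1, a)
                if s > best[t1]:
                    best[t1], back[t1] = s, (t0, a)
    # Backtrace the best segmentation / label sequence.
    seq, t = [], n_frames
    while t > 0:
        t0, a = back[t]
        seq.append((t0, t, a))
        t = t0
    return seq[::-1]
```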

  24. From HMMs to Segment Models(Ostendorf Digalakis 1996) • Unified view of stochastic modeling • General stochastic model that encompasses most SM type models • Similarities in terms of correlation and parameter tying assumptions • Analogies between segment models and HMMs TUC - SDSG

  25. Robust Feature Selection • Time-Frequency Representation for ASR (Potamianos and Maragos 1999) • Confidence Measure Estimation for ASR Features sent over wireless channels (“missing features”) (Potamianos and Weerackody 2001) • AM-FM Model Based Features (Dimitriadis et al 2002) TUC - SDSG

  26. Other Work • Multiple source separation using microphone arrays (Sidiropoulos et al. 2001) TUC - SDSG

  27. Prior Work Overview • Constrained Estimation Adaptation • MLST • Combinations • MAP (Bayesian) Adaptation • VTLN • Genones • Segment Models • Robust Features TUC - SDSG

  28. HIWIRE Work Proposal • Adaptation: Bayes optimal classification • Acoustic Modeling: Segment Models • Feature Selection: AM-FM Features • Microphone Arrays: Speech/Noise Separation • Audio-Visual ASR: Baseline experiments TUC - SDSG

  29. Bayes optimal classification (HIWIRE proposal) • Classifier decision for a test data vector x_test: • Choose the class ω_i that results in the highest value of P(ω_i | x_test, D) ∝ ∫ p(x_test | θ, ω_i) p(θ | D, ω_i) dθ, i.e., the predictive likelihood averaged over the parameter posterior TUC - SDSG
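
A Monte Carlo sketch of the decision rule, assuming samples from each class's parameter posterior are available; the sample average stands in for the analytic integral over parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_optimal_class(x_test, class_param_samples):
    """Bayes-optimal classification, approximately: average the
    predictive likelihood of x_test over posterior parameter samples
    for each class (combining all parameter hypotheses rather than a
    single point estimate), then pick the highest-scoring class."""
    scores = []
    for samples in class_param_samples:  # list of (mu, cov) per class
        lik = np.mean([multivariate_normal(mean=mu, cov=cov).pdf(x_test)
                       for mu, cov in samples])
        scores.append(lik)
    return int(np.argmax(scores))
```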

  30. Bayes optimal versus MAP • Assumption: the posterior is sufficiently peaked around the most probable point • MAP approximation: p(x_test | D) ≈ p(x_test | θ_MAP) • θ_MAP is the set of parameters that maximizes the posterior: θ_MAP = arg max_θ p(θ | D) = arg max_θ p(D | θ) p(θ) TUC - SDSG
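
A sketch of the MAP point estimate for a Gaussian mean under a conjugate Gaussian prior, showing the plug-in estimate that replaces the full posterior average; `mu0` and `tau` are hypothetical prior parameters.

```python
import numpy as np

def map_mean(data, mu0, tau):
    """MAP (posterior-mode) estimate of a Gaussian mean under a
    conjugate prior N(mu0, sigma^2 / tau): a count-weighted
    interpolation of prior mean and sample mean. As the sample
    count n grows, it converges to the ML (sample) mean."""
    n = len(data)
    return (tau * mu0 + n * np.mean(data, axis=0)) / (tau + n)

# Hypothetical usage: five 2-dimensional adaptation vectors.
data = np.array([[0.9, 1.1], [1.2, 0.8], [1.0, 1.0],
                 [0.7, 1.3], [1.1, 0.9]])
print(map_mean(data, mu0=np.zeros(2), tau=10.0))
```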

  31. Why Bayes optimal classification • Optimal classification criterion • The predictions of all parameter hypotheses are combined • Better discrimination • Less training data needed • Faster asymptotic convergence to the ML estimate • However: • Computationally more expensive • Difficult to find analytical solutions • …hence approximations must still be considered TUC - SDSG

  32. Segment Models • Phone Transition modeling • New features • Combine with HMMs • Parametric modeling of feature trajectories TUC - SDSG

  33. AM-FM Features • See NTUA presentation TUC - SDSG

  34. Audio-Visual ASR • Baseline TUC - SDSG

  35. Microphone Array • Speech – Noise source separation algorithms TUC - SDSG
