
Regularized Adaptation for Discriminative Classifiers


Presentation Transcript


  1. Regularized Adaptation for Discriminative Classifiers Xiao Li and Jeff Bilmes University of Washington, Seattle

  2. This work … • Investigates links between a number of discriminative classifiers • Presents a general adaptation strategy – “regularized adaptation”

  3. Adaptation for generative models • Target sample distribution differs from the training distribution • Long studied in speech recognition for generative models: • Maximum likelihood linear regression • Maximum a posteriori • Eigenvoice

  4. Discriminative classifiers • Directly model the conditional distribution of a label given features • Often yield more robust classification performance than generative models • Popular examples: • Support vector machines (SVM) • Multi-layer perceptrons (MLP) • Conditional maximum entropy models

  5. Existing Discriminative Adaptation Strategies • SVMs: • Combine SVs with selected adaptation data (Matic 93) • Combine selected SVs with adaptation data (Li 05) • MLPs: • Linear input network (Neto 95, Abrash 97) • Retrain both layers from unadapted model (Neto 95) • Retrain part of last layer (Stadermann 05) • Retrain first layer • Conditional MaxEnt: • Gaussian prior (Chelba 04)

  6. SVMs and MLPs – Links • Binary classification: samples (x_t, y_t) • Discriminant function built on a nonlinear transform of the features • Accuracy-regularization objective: empirical risk + regularizer • The regularizer corresponds to the maximum margin in SVMs, weight decay in MLPs, and Gaussian smoothing in conditional MaxEnt
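The shared accuracy-regularization objective on slide 6 can be sketched in a few lines. This is an illustrative hinge-loss instance; the function names and the trade-off weight `lam` are my own, not from the slides:

```python
import numpy as np

def hinge_loss(w, X, y):
    # Empirical risk: hinge loss over samples (x_t, y_t), with y_t in {-1, +1}.
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins).sum()

def objective(w, X, y, lam):
    # Accuracy-regularization trade-off: empirical risk + L2 regularizer.
    # lam is an illustrative trade-off weight (not from the slides).
    return hinge_loss(w, X, y) + lam * np.dot(w, w)
```

With the hinge loss and an L2 penalty this is the familiar linear SVM objective; swapping in the log loss with the same penalty resembles MLP training with weight decay, which is the link the slide draws.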

  7. SVMs and MLPs – Differences

  8. Adaptation • Adaptation data • May be limited in amount • May be unbalanced across classes • We intend to utilize • The unadapted model w0 • Adaptation data (x_t, y_t), t = 1:T

  9. Regularized Adaptation • Generalized objective w.r.t. the adaptation data: margin error plus a regularizer toward the unadapted model • Relations with existing SVM adaptation algorithms: • Hinge loss – retraining the SVM • Hard boosting (Matic 93)
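One plausible instantiation of the generalized adaptation objective, sketched with a hinge loss and a quadratic penalty toward the unadapted model `w0`; the exact penalty form in the paper may differ, so treat this as an assumption:

```python
import numpy as np

def adapt_objective(w, w0, X_adapt, y_adapt, lam):
    # Margin error (hinge loss) computed on the adaptation data only.
    margins = y_adapt * (X_adapt @ w)
    risk = np.maximum(0.0, 1.0 - margins).sum()
    # Regularizer pulling w back toward the unadapted model w0.
    return risk + lam * np.sum((w - w0) ** 2)
```

The limiting cases recover the existing strategies: `lam = 0` ignores `w0` and simply retrains on the adaptation data, while `lam → ∞` freezes the model at `w0`.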

  10. New Regularized Adaptation for SVMs • Soft boosting – combines margin errors on the adaptation data • d0: decision function using adaptation data only

  11. Regularized Adaptation for SVMs (Cont.) • Theorem, for linear SVMs • In practice, we use α = 1

  12. Reg. Adaptation for MLPs • Extend this to a two-layer MLP • Relations with existing MLP adaptation algorithms: • Linear input network: μ → ∞ • Retrain from SI model: μ = 0, ν = 0 • Retrain last layer: μ = 0, ν → ∞ • Retrain first layer: μ → ∞, ν = 0 • Regularized: choose μ, ν on a dev set • This also relates to MaxEnt adaptation using Gaussian priors
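A sketch of the per-layer penalty implied by the limiting cases above. Which coefficient governs which layer is my reading of the slide (μ on the output layer, ν on the input layer), so the assignment here is illustrative, not the authors' definition:

```python
import numpy as np

def mlp_adapt_penalty(W_in, W_out, W_in0, W_out0, mu, nu):
    # Quadratic penalties on each layer's deviation from the unadapted model.
    # Limiting cases (per the slide): mu = nu = 0 retrains from the SI model;
    # mu -> inf, nu = 0 retrains the first layer only; mu = 0, nu -> inf
    # retrains the last layer only. The mu/output-layer, nu/input-layer
    # assignment is an assumption made for illustration.
    return mu * np.sum((W_out - W_out0) ** 2) + nu * np.sum((W_in - W_in0) ** 2)
```

Finite μ, ν interpolate between these extremes, which is why the slide proposes tuning them on a dev set rather than committing to one of the hard choices.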

  13. Experiments – Vowel Classification • Application: the Vocal Joystick • A voice-based computer interface for individuals with motor impairments • Vowel quality → angle • Data set (extended) • Train/dev/eval: 21/4/10 speakers • 6-fold cross-validation • MLP configuration • 7 frames of MFCC + deltas • 50 hidden nodes • Metric: frame-level classification error rate

  14. Varying Adaptation Time

  15. Varying # vowels in adaptation (3s each) • SI baseline: 32%

  16. Summary • Drew links between discriminative classifiers • Presented a general notion of “regularized adaptation” for discriminative classifiers • Natural adaptation strategies for SVMs and MLPs, justified using a maximum-margin argument • A unified view of different adaptation algorithms • MLP experiments show superior performance, especially for class-skewed data
