
Phoneme Alignment based on Discriminative Learning

Phoneme Alignment based on Discriminative Learning. Shai Shalev-Shwartz, The Hebrew University, Jerusalem. Joint work with Joseph Keshet (Hebrew University), Yoram Singer (Google) and Dan Chazan (IBM). The Alignment Problem. Text: "Have a test". Phonetic transcription: /hh ae v ey tcl t eh s tcl t/.

Presentation Transcript


  1. Phoneme Alignment based on Discriminative Learning. Shai Shalev-Shwartz, The Hebrew University, Jerusalem. Joint work with Joseph Keshet (Hebrew University), Yoram Singer (Google) and Dan Chazan (IBM).

  2. The Alignment Problem. Text: "Have a test". Phonetic transcription: /hh ae v ey tcl t eh s tcl t/. Waveform: [figure].

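  3. The Alignment Problem: Setting. Acoustic representation: $\bar{x} = (x_1, \ldots, x_T)$, a sequence of acoustic feature vectors. Phonetic representation: $\bar{p} = (p_1, \ldots, p_k)$, e.g. /hh ae v ey tcl t eh s tcl t/. Alignment: $\bar{s} = (s_1, \ldots, s_k)$, where $s_i$ is the start time of phoneme $p_i$ in $\bar{x}$. Alignment function: $f(\bar{x}, \bar{p}) = \bar{s}$.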

  4. Acoustic Representation: Short-Time Fourier Transform.
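
A minimal NumPy sketch of a magnitude short-time Fourier transform, turning a waveform into one spectral vector $x_t$ per frame. The Hamming window, the 400-sample frame and the 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative choices, not the settings used in this work, and `stft_magnitude` is a hypothetical name.

```python
import numpy as np


def stft_magnitude(signal, frame_len=400, hop=160):
    """Magnitude STFT of a 1-D signal: one row per frame, one column per bin.

    Assumes len(signal) >= frame_len. The defaults are hypothetical and
    correspond to 25 ms frames with a 10 ms hop at a 16 kHz sampling rate.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack([signal[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1))
```

For one second of 16 kHz audio this yields 98 frames of 201 frequency bins each.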

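  5. Comparing Alignments: a cost function $\gamma(\bar{s}, \bar{s}')$ measures how far a predicted alignment $\bar{s}'$ is from the true alignment $\bar{s}$.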

  6. -insensitive Cost -insensitivity region

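  7. A Discriminative Learning Approach: training set $S = \{(\bar{x}_i, \bar{p}_i, \bar{s}_i)\}_{i=1}^{M}$ → learning algorithm (over a hypotheses class) → alignment function $f: (\bar{x}, \bar{p}) \mapsto \bar{s}$.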

  8. Outline of Solution:
     • Define the hypotheses class, which constitutes the template of our alignment function:
       • Map each possible alignment into a vector in an abstract vector space
       • Devise a projection in the vector space that orders alignments according to their quality
     • Derive a simple online learning algorithm
     • Convert the online algorithm to a batch procedure with some formal guarantees

  9. Feature "Primitives" for Alignment: given the acoustic and phonetic representations and a suggested alignment, a feature primitive $\phi(\bar{x}, \bar{p}, \bar{s}) \in \mathbb{R}$ assesses the quality of the suggested alignment.

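  10. Feature Primitive I: Cumulative spectral change across the boundaries, $\phi_j(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=2}^{k} d(x_{s_i - j}, x_{s_i + j})$, where $d$ is a distance between frames and $j$ a small frame offset.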

  11. Feature Primitive I: Cumulative spectral change across the boundaries.
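
A sketch of the spectral-change primitive, assuming `X` is a `(T, d)` NumPy array of acoustic frames, `s` holds the phoneme start frames, and the frame distance is Euclidean. Clipping frame indices at the utterance edges is an assumption made here so that boundaries near either end stay well defined; `spectral_change` is a hypothetical name.

```python
import numpy as np


def spectral_change(X, s, j):
    """Cumulative spectral change across phoneme boundaries.

    X: (T, d) array of acoustic frames; s: phoneme start frames, s[0] being
    the utterance onset (not a boundary); j: frame offset on each side.
    """
    T = len(X)
    total = 0.0
    for b in s[1:]:  # internal boundaries s_2, ..., s_k
        # Clip at the utterance edges (an assumption, see lead-in).
        left, right = max(b - j, 0), min(b + j, T - 1)
        total += float(np.linalg.norm(X[right] - X[left]))
    return total
```

With a spectral jump at frame 10, the boundary set `[0, 10]` scores 5.0 for a jump of `[3, 4]`, while a misplaced `[0, 5]` inside the stationary region scores 0.0.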

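  12. Feature Primitive II: Cumulative confidence in the phoneme sequence. Learn a static frame-based phoneme classifier $g$, where $g(x_t, p)$ is the confidence that phoneme $p$ was uttered at frame $t$ (Dekel, Keshet & Singer, 2004). Feature: $\phi(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=1}^{k} \sum_{t=s_i}^{s_{i+1}-1} g(x_t, p_i)$, with $s_{k+1}$ the end of the utterance.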
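
A sketch of the cumulative-confidence primitive, assuming the frame classifier's outputs are precomputed into a table `G` with `G[t][p]` $= g(x_t, p)$, and that the last phoneme runs to the final frame `T` (playing the role of $s_{k+1}$). The classifier itself is not reproduced here; `cumulative_confidence` is a hypothetical name.

```python
def cumulative_confidence(G, phonemes, s, T):
    """Sum of g(x_t, p_i) over the frames that alignment s assigns to p_i.

    G[t][p]: frame-classifier confidence for phoneme index p at frame t.
    phonemes: phoneme indices p_1..p_k; s: their start frames; T: frame count.
    """
    bounds = list(s) + [T]  # T plays the role of s_{k+1}
    total = 0.0
    for i, p in enumerate(phonemes):
        for t in range(bounds[i], bounds[i + 1]):
            total += G[t][p]
    return total
```

If frames 0 to 2 favour phoneme 0 and frames 3 to 5 favour phoneme 1, the alignment `[0, 3]` collects all six confidences (6.0) while `[0, 2]` collects only 5.0.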

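  13. Feature Primitive III: Phoneme duration model. With $\hat{\mu}_p$ the average length of phoneme $p$ and $\hat{\sigma}_p$ the standard deviation of its length: $\phi(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=1}^{k} \log \mathcal{N}(s_{i+1} - s_i;\, \hat{\mu}_{p_i}, \hat{\sigma}_{p_i})$.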
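
A sketch of the duration primitive as a sum of Gaussian log-densities, assuming durations are counted in frames, `mu[p]` and `sigma[p]` have already been estimated on training data, and the last phoneme ends at frame `T`. The function name is hypothetical.

```python
import math


def duration_log_likelihood(phonemes, s, T, mu, sigma):
    """Sum over phonemes of log N(duration; mu[p], sigma[p]), durations in frames."""
    bounds = list(s) + [T]  # the last phoneme ends at frame T
    total = 0.0
    for i, p in enumerate(phonemes):
        duration = bounds[i + 1] - bounds[i]
        z = (duration - mu[p]) / sigma[p]
        # Gaussian log-density: -z^2/2 - log(sigma * sqrt(2*pi))
        total += -0.5 * z * z - math.log(sigma[p] * math.sqrt(2 * math.pi))
    return total
```

A phoneme whose duration equals its average scores the peak value $-\tfrac{1}{2}\log(2\pi\hat{\sigma}_p^2)$; any deviation lowers the score quadratically in $z$.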

  14. Feature Primitive IV: Speaking rate ("dynamics"). Spectrogram at different rates of articulation (Pickett, 1980).
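
The slide does not spell out the speaking-rate feature. One plausible formalisation, assumed here purely for illustration, measures each phoneme's relative rate $r_i = (s_{i+1} - s_i)/\hat{\mu}_{p_i}$ and accumulates abrupt changes between neighbours as $\sum_{i=2}^{k} (r_i - r_{i-1})^2$. The function name is hypothetical.

```python
def speaking_rate_change(phonemes, s, T, mu):
    """Sum of squared changes in relative speaking rate between neighbours.

    Relative rate of phoneme i (assumed definition): its duration divided by
    the average duration mu[p_i] of that phoneme. The last phoneme ends at T.
    """
    bounds = list(s) + [T]
    rates = [(bounds[i + 1] - bounds[i]) / mu[p] for i, p in enumerate(phonemes)]
    return sum((b - a) ** 2 for a, b in zip(rates, rates[1:]))
```

A constant speaking rate gives 0; the learned weight on this feature decides how strongly rate changes count against an alignment.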

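  15. Feature Functions for Alignment: map all possible alignments into a vector space, $\phi(\bar{x}, \bar{p}, \bar{s}) \in \mathbb{R}^n$. [Figure: the correct alignment, a slightly incorrect alignment and a grossly incorrect alignment as points in the vector space.]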

  16. Main Solution Principle: find a linear projection $\mathbf{w}$ that ranks alignments according to their quality, so that $\mathbf{w} \cdot \phi$ is highest for the correct alignment, lower for a slightly incorrect alignment and lowest for a grossly incorrect alignment.
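
With the primitives stacked into a vector $\phi$, ranking reduces to sorting by the dot product $\mathbf{w} \cdot \phi$. A minimal sketch over an explicitly enumerated set of candidate alignments; this enumeration is an illustrative simplification (the real search space is far too large to list), and both function names are hypothetical.

```python
def score(w, phi):
    """Linear projection w . phi."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def rank_alignments(w, candidates):
    """Order candidate alignments from highest to lowest projected score.

    candidates: dict mapping an alignment (tuple of start frames) to its
    feature vector phi(x, p, s).
    """
    return sorted(candidates, key=lambda s: score(w, candidates[s]), reverse=True)
```

A good $\mathbf{w}$ puts the correct alignment first; a low-confidence projection separates the candidates only barely, and an incorrect one puts a worse alignment ahead of the correct one.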

  17. Main Solution Principle (cont.): example of a low-confidence projection. [Figure: correct, slightly incorrect and grossly incorrect alignments.]

  18. Main Solution Principle (cont.): example of an incorrect projection. [Figure: correct, slightly incorrect and grossly incorrect alignments.]

  19. Online Learning: [Diagram: an online learning algorithm working within the hypotheses class, measured by its cumulative cost $\sum_{i=1}^{M} \gamma(\bar{s}_i, \bar{s}'_i)$.]

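  20. Online Learning:
     • For $i = 1, \ldots, M$:
       • Receive an instance $(\bar{x}_i, \bar{p}_i)$
       • Predict $\bar{s}'_i = \arg\max_{\bar{s}} \mathbf{w}_i \cdot \phi(\bar{x}_i, \bar{p}_i, \bar{s})$
       • Receive the true alignment $\bar{s}_i$ and pay cost $\gamma(\bar{s}_i, \bar{s}'_i)$
       • If $\mathbf{w}_i \cdot \phi(\bar{x}_i, \bar{p}_i, \bar{s}_i) - \mathbf{w}_i \cdot \phi(\bar{x}_i, \bar{p}_i, \bar{s}'_i)$ falls short of the margin required by the cost $\gamma(\bar{s}_i, \bar{s}'_i)$:
         • Set $\Delta\phi_i = \phi(\bar{x}_i, \bar{p}_i, \bar{s}_i) - \phi(\bar{x}_i, \bar{p}_i, \bar{s}'_i)$
         • Set the step size $\alpha_i$
         • Update $\mathbf{w}_{i+1} = \mathbf{w}_i + \alpha_i \Delta\phi_i$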
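
The update can be sketched in the passive-aggressive style. Two details are assumptions, since the slide's formulas did not survive: the margin required of the true alignment over the prediction is taken as $\sqrt{\gamma(\bar{s}_i, \bar{s}'_i)}$, and the step size as $\alpha_i = \min\{C, (\sqrt{\gamma} - \mathbf{w}_i \cdot \Delta\phi_i)/\|\Delta\phi_i\|^2\}$. Candidate alignments are again enumerated explicitly, and `online_train`, `dot` and `C` are hypothetical names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def online_train(examples, cost, dim, C=1.0):
    """Passive-aggressive-style online learning over enumerated candidates.

    examples: list of (features, s_true), where features maps each candidate
    alignment (a tuple) to its feature vector and includes s_true.
    cost(s_true, s_pred): alignment cost, e.g. epsilon-insensitive.
    Returns every weight vector produced: [w_1, ..., w_{M+1}].
    """
    w = [0.0] * dim
    history = [list(w)]
    for features, s_true in examples:
        # Predict: the candidate with the highest projected score.
        s_pred = max(features, key=lambda s: dot(w, features[s]))
        target = math.sqrt(cost(s_true, s_pred))  # assumed margin requirement
        delta = [a - b for a, b in zip(features[s_true], features[s_pred])]
        margin = dot(w, delta)
        norm2 = dot(delta, delta)
        if margin < target and norm2 > 0:
            alpha = min(C, (target - margin) / norm2)  # clipped step size
            w = [wi + alpha * di for wi, di in zip(w, delta)]
        history.append(list(w))
    return history
```

The full list of weight vectors is returned because the online-to-batch conversion picks one of them afterwards.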

  21. Converting from Online to Batch:
     • Run the online algorithm on the training set and generate $\mathbf{w}_1, \ldots, \mathbf{w}_M$
     • Small online error ⇒ there exists $\mathbf{w} \in \{\mathbf{w}_1, \ldots, \mathbf{w}_M\}$ whose generalization error is low (Cesa-Bianchi et al.)
     • Choose the $\mathbf{w} \in \{\mathbf{w}_1, \ldots, \mathbf{w}_M\}$ that minimizes the error on a fresh validation set
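
Choosing among the online hypotheses is a plain minimisation of validation error. A sketch with hypothetical names, where `predict(w, example)` stands for any inference routine and `cost` for any alignment cost, such as the $\epsilon$-insensitive one.

```python
def select_on_validation(ws, validation, predict, cost):
    """Return the weight vector with the lowest average validation cost.

    validation: list of (example, s_true); predict(w, example) -> alignment.
    """
    def avg_cost(w):
        total = sum(cost(s_true, predict(w, example)) for example, s_true in validation)
        return total / len(validation)

    return min(ws, key=avg_cost)
```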

  22. Algorithmic aspects:
     • Running time: if the "inference" $\arg\max_{\bar{s}} \mathbf{w} \cdot \phi(\bar{x}, \bar{p}, \bar{s})$ can be performed in polynomial time (e.g. by dynamic programming), then the entire algorithm runs in polynomial time as well.
     • Worst-case analysis for online learning: for any competitor $\mathbf{u}$, the cumulative cost of the online algorithm is bounded in terms of the performance of $\mathbf{u}$.
     • Generalization error: the online-to-batch conversion guarantees that low online error ⇒ low generalization error.
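
A dynamic-programming sketch of the inference step, under an assumption worth stating: the score $\mathbf{w} \cdot \phi$ is taken to decompose into a sum of per-phoneme segment scores `seg_score(i, start, end)`. Primitives I to III can be written that way (attributing each boundary's spectral change to the segment it opens); the speaking-rate primitive couples neighbouring segments and would need a larger DP state. The first phoneme is assumed to start at frame 0 and the last to end at frame `T`; `best_alignment`, `min_dur` and `max_dur` are hypothetical names.

```python
def best_alignment(k, T, seg_score, min_dur=1, max_dur=None):
    """Maximise sum_i seg_score(i, s_i, s_{i+1}) over start times
    0 = s_1 < s_2 < ... < s_k < s_{k+1} = T (phonemes indexed 0..k-1).

    Makes O(k * T * max_dur) calls to seg_score, i.e. polynomial time.
    """
    NEG = float("-inf")
    max_dur = max_dur or T
    # best[i][t]: best score with phonemes 0..i-1 covering frames [0, t).
    best = [[NEG] * (T + 1) for _ in range(k + 1)]
    back = [[-1] * (T + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for i in range(k):
        for start in range(T + 1):
            if best[i][start] == NEG:
                continue
            for end in range(start + min_dur, min(start + max_dur, T) + 1):
                cand = best[i][start] + seg_score(i, start, end)
                if cand > best[i + 1][end]:
                    best[i + 1][end] = cand
                    back[i + 1][end] = start  # remember where phoneme i began
    if best[k][T] == NEG:
        raise ValueError("no feasible alignment")
    # Backtrack from the final frame to recover the start times.
    starts, t = [], T
    for i in range(k, 0, -1):
        t = back[i][t]
        starts.append(t)
    return starts[::-1], best[k][T]
```

Bounding `max_dur` (for instance by a generous multiple of the longest phoneme seen in training) keeps the inner loop short without changing the polynomial-time argument.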

  23. Experiments:
     • TIMIT corpus
     • Phoneme representation: 48 phonemes (Lee & Hon, 1989)
     • Acoustic representation: MFCC+∆+∆∆ (ETSI standard)
     • TIMIT training set:
       • 500 utterances for training a frame classifier
       • 3096 utterances for learning the alignment function
       • 100 utterances used for validation

  24. Alternative Approaches:
     • Brugnara, Falavigna & Omologo, Automatic segmentation and labeling of speech based on HMM, 1993.
     • Hosom, Automatic phoneme alignment based on acoustic-phonetic modeling, 2002.
     • Toledano, Gomez & Grande, Automatic Phoneme Alignment, 2003.

  25. Results: Brugnara, Falavigna and Omologo, "Automatic segmentation and labeling of speech based on Hidden Markov Models", Speech Comm., 12 (1993) 357-370.

  26. Current and Future Work: discriminative learning methods for:
     • Whole phoneme sequence classification
       • 64% (ours) vs. 59% (HMM, IDIAP Torch3)
       • Results without normalization of silences etc.
     • Small-vocabulary continuous speech recognition
     • Segmentation of utterances by speaker
     • Full online learning setting: real-time adaptation to speaker/environment changes
