Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition

Nengheng Zheng. Supervised by Professor P.C. Ching. Nov. 26, 2004.

Presentation Transcript


  1. Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition. Nengheng Zheng. Supervised by Professor P.C. Ching. Nov. 26, 2004

  2. Outline • Speech production and glottal pulse excitation in detail • Linear prediction: short-term and long-term • Glottal spectrum estimated with long-term prediction, and acoustic features • Implementation for speaker recognition

  3. Speech Production • Signal path: glottal pulses → vocal tract → speech signal • Discrete-time model for speech production • A combined transfer function
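The equations on this slide did not survive in the transcript. As a hedged reminder only, the textbook source-filter model (an assumption here, not necessarily the slide's own notation) writes the speech spectrum S(z) as an excitation E(z) shaped by a glottal pulse filter G(z), a vocal tract filter V(z), and a lip radiation term R(z). The three filters are often lumped into one combined all-pole transfer function H(z) with gain A, order p, and coefficients a_k:

```latex
S(z) = E(z)\,G(z)\,V(z)\,R(z), \qquad
H(z) = G(z)\,V(z)\,R(z) \approx \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```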

  4. Acoustic Features of Glottal Pulse • Time domain: pitch period; pitch period perturbation (jitter); pulse amplitude perturbation (shimmer); glottal pulse width; abruptness of closure of the glottal flow; aspiration noise • Frequency domain: fundamental frequency (F0); spectral tilt (slope); harmonic richness
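Jitter and shimmer from the time-domain list can be illustrated with a minimal sketch. It assumes one common "local" definition (mean absolute difference between consecutive cycles, divided by the mean value); the slides do not say which definition was used, and the function names and sample values below are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def jitter(periods):
    """Relative perturbation of consecutive pitch periods."""
    return relative_perturbation(periods)


def shimmer(amplitudes):
    """Relative perturbation of consecutive pulse peak amplitudes."""
    return relative_perturbation(amplitudes)


# Hypothetical measurements for four consecutive glottal cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1]
amplitudes = [1.00, 0.95, 1.02, 0.98]

# F0 follows from the mean pitch period (periods are in milliseconds).
f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))

print(f"F0 = {f0_hz:.1f} Hz")
print(f"jitter = {100 * jitter(periods_ms):.2f} %")
print(f"shimmer = {100 * shimmer(amplitudes):.2f} %")
```

In practice the periods and amplitudes would come from a pitch-marking step, which is outside the scope of this sketch.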

  5. Glottal Pulse and Voice Quality • Glottal pulse shape plays an important role in the quality of natural or synthesized vowels [Rosenberg 1971] • The shape and periodicity of vocal cord excitation are subject to large variation • Such variations are significant for preserving speech naturalness • A typical glottal pulse: asymmetric, with a shorter falling phase; spectrum with -12 dB/octave decay • More variation among different speakers than among different utterances of the same speaker [Mathews 1963] • Such variations have little significance for speech intelligibility but affect the perceived vocal quality [Childers 1991]

  6. Various Glottal Pulses • Some other vocal types: breathy, falsetto, vocal fry • Temporal and spectral characteristics

  7. Some Comments • Generally, studying glottal pulse characteristics requires reconstructing the glottal pulse waveform by inverse filtering • Automatic and exact reconstruction of the glottal waveform from real speech is almost impossible, especially during transient phases of articulation or for high-pitched speakers • Fortunately, it is possible to estimate the glottal spectrum from the residual signal with pitch prediction

  8. Linear Prediction • Speech waveform: current samples are correlated with past samples and are therefore predictable • Short-term correlation: occurs within one pitch period; due to formant modulation; modeled by classical linear prediction analysis (short-term prediction) • Long-term correlation: occurs across consecutive pitch periods; due to vocal cord vibration; modeled by long-term (pitch) prediction

  9. Linear Prediction • Short-term predictor (classical linear prediction): removes the short-term correlation, leaving a glottal excitation signal • Long-term predictor (pitch prediction): removes the correlation across consecutive pitch periods
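The two-stage idea on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the autocorrelation method with a Levinson-Durbin recursion for the short-term predictor and a single-tap long-term predictor, and it runs on a synthetic signal (an impulse train through a two-pole resonator). All function names and parameter values are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Short-term predictor s_hat[n] = sum_k a[k] * s[n-1-k] from autocorrelation r.

    Returns (a, final prediction error energy)."""
    a = [0.0] * (order + 1)  # a[0] unused; a[j] holds the lag-j coefficient
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, a):
    """Inverse-filter x with A(z) = 1 - sum_k a[k] z^-(k+1), zero initial state."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(x))]


def one_tap_pitch_predictor(e, min_lag, max_lag):
    """Pick the lag with the largest normalized correlation and its tap b."""
    best_lag, best_score, best_b = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        den = sum(e[n - lag] ** 2 for n in range(lag, len(e)))
        if den > 0.0 and num > 0.0 and num * num / den > best_score:
            best_lag, best_score, best_b = lag, num * num / den, num / den
    return best_lag, best_b


# Synthetic "speech": impulse train (period 40 samples) through a 2-pole resonator.
period = 40
excitation = [1.0 if n % period == 0 else 0.0 for n in range(400)]
speech = []
for n, u in enumerate(excitation):
    y1 = speech[n - 1] if n >= 1 else 0.0
    y2 = speech[n - 2] if n >= 2 else 0.0
    speech.append(u + 1.3 * y1 - 0.8 * y2)

a, _ = levinson_durbin(autocorr(speech, 2), 2)    # short-term predictor
res = residual(speech, a)                         # approximate excitation
lag, b = one_tap_pitch_predictor(res, 20, 100)    # long-term predictor
print("short-term coefficients:", a)
print("pitch lag:", lag, "tap:", b)
```

On real speech the analysis would run frame by frame with a window and a higher predictor order; those details are omitted here.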

  10. Linear Prediction: An example

  11. Examples of glottal spectra estimated by pitch prediction

  12. Harmonic Structure of Glottal Spectrum • Two parameters describe the harmonic structure • Harmonic richness factor (HRF) • Noise-to-harmonic ratio (NHR)
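The defining formulas for HRF and NHR are missing from the transcript. The sketch below therefore assumes one definition found in the voice-quality literature: HRF as the energy of the harmonics above the fundamental relative to the fundamental, and NHR as the energy outside the harmonic bins relative to the energy inside them, both in dB. It may differ from the definition used in the talk. The function names, the bin half-width, and the test signal are hypothetical; a non-harmonic sinusoid stands in for noise so the result is deterministic.

```python
import cmath
import math


def power_spectrum(x):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hrf_nhr_db(power, f0_bin, half_width=1):
    """Return (HRF, NHR) in dB under the assumed definitions.

    Each harmonic is the band of bins within half_width of a multiple of f0_bin;
    every other bin except DC counts as noise."""
    harmonic_energy = []
    in_harmonic = [False] * len(power)
    centre = f0_bin
    while centre + half_width < len(power):
        lo, hi = centre - half_width, centre + half_width
        harmonic_energy.append(sum(power[lo:hi + 1]))
        for i in range(lo, hi + 1):
            in_harmonic[i] = True
        centre += f0_bin
    noise = sum(p for p, h in zip(power[1:], in_harmonic[1:]) if not h)
    tiny = 1e-300  # guards log10 against an exactly zero numerator
    hrf = 10.0 * math.log10(max(sum(harmonic_energy[1:]), tiny) / harmonic_energy[0])
    nhr = 10.0 * math.log10(max(noise, tiny) / sum(harmonic_energy))
    return hrf, nhr


# Test frame: fundamental at bin 8 with two weaker harmonics, plus a
# non-harmonic component at bin 13 standing in for noise.
N = 256
frame = [math.cos(2 * math.pi * 8 * t / N)
         + 0.5 * math.cos(2 * math.pi * 16 * t / N)
         + 0.25 * math.cos(2 * math.pi * 24 * t / N)
         + 0.1 * math.cos(2 * math.pi * 13 * t / N)
         for t in range(N)]

hrf_db, nhr_db = hrf_nhr_db(power_spectrum(frame), f0_bin=8)
print(f"HRF = {hrf_db:.2f} dB, NHR = {nhr_db:.2f} dB")
```

A band-wise variant, such as one value per mel band, would restrict the same sums to the bins of each band.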

  13. Feature Generation • Acoustic features include the following: • Fundamental frequency F0 • Pitch prediction gain g • Pitch prediction coefficients b_{-1}, b_0, b_1 • HRF_n and NHR_n (n = 1, ..., 10) • 10 mel-scale frequency bands • Feature generation process
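The three pitch prediction coefficients and the prediction gain listed above can be sketched as follows. This assumes a covariance-style least-squares fit of e_hat[n] = b_{-1} e[n-T+1] + b_0 e[n-T] + b_1 e[n-T-1] for a known pitch lag T, with the gain taken as the ratio of residual energy before and after long-term prediction, in dB. The slides do not spell out either definition, so both are assumptions, and the function names and synthetic residual are hypothetical.

```python
import math
import random


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def three_tap_pitch_predictor(e, lag):
    """Return ([b_-1, b_0, b_1], prediction gain in dB) for residual e and pitch lag."""
    taps = (-1, 0, 1)
    start = lag + 1  # first sample for which e[n - lag - 1] exists

    def phi(i, j):
        return sum(e[n - i] * e[n - j] for n in range(start, len(e)))

    m = [[phi(lag + i, lag + j) for j in taps] for i in taps]
    v = [phi(0, lag + i) for i in taps]
    b = solve3(m, v)
    out = [e[n] - sum(bk * e[n - lag - k] for bk, k in zip(b, taps))
           for n in range(start, len(e))]
    energy_in = sum(e[n] ** 2 for n in range(start, len(e)))
    energy_out = sum(r * r for r in out)
    gain_db = 10.0 * math.log10(energy_in / max(energy_out, 1e-12))
    return b, gain_db


# Synthetic residual: a short pulse repeated every 40 samples, decaying by 0.95
# per period, plus a little seeded Gaussian noise.
rng = random.Random(0)
shape = [1.0, 0.5, 0.25] + [0.0] * 37
e = [0.95 ** (n // 40) * shape[n % 40] + rng.gauss(0.0, 0.01) for n in range(400)]

b, gain_db = three_tap_pitch_predictor(e, lag=40)
print("b_-1, b_0, b_1 =", [round(x, 3) for x in b])
print(f"prediction gain = {gain_db:.1f} dB")
```

With F0 taken from the lag, these values and the HRF and NHR measures would be collected into one feature vector per frame.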

  14. Experimental Conditions • Speech quality: telephone speech • Subjects: 49 male speakers • Training condition: 3 training sessions, about 90 s of speech in total, recorded over 3 to 6 weeks; GMM with 128 mixture components • Testing condition: 12 testing sessions, recorded over 4 to 6 months

  15. Speaker Recognition Experiments • Identification results with features related to long-term prediction • Comparison of glottal source features with classical features

  16. Summary • Glottal source excitation is important for the perceived naturalness of voice quality and helps distinguish one speaker from another. • Linear prediction is a powerful tool for speech analysis. The spectral properties of the supraglottal vocal tract system can be estimated by short-term prediction, while long-term prediction estimates the spectrum of the glottal excitation system. • Recognition results show that the acoustic features related to the glottal source (F0, prediction gain, HRF, NHR, etc.) provide a certain degree of speaker-discriminative power.

  17. Other Applications • Speech coding • Speech recognition? • Emotion recognition from speech!

  18. Thank You!
