
Automatic Burst Location



Presentation Transcript


  1. Automatic Burst Location Alina Khasanova Prosody-ASR Group February 12, 2009

  2. Objective: Develop a procedure for accurate automatic location of the point of release of stops in order to study the temporal and spectral properties of plosives {p,t,k,b,d,g} in #CV and VC# contexts in spontaneous speech. Database: the Buckeye corpus • hand-segmented for phone boundaries • the release point is not labeled

  3. Nature of the problem Acoustically, a plosive is characterized by two stages: (1) Acoustic closure: an interval of very low, relatively constant energy (the duration of full constriction) (2) Release: a sudden increase in energy across the full frequency range, i.e., a transient, followed by a period of frication (or, in the case of aspirated stops, frication + aspiration) Hence, a natural way to find the point of release is to look for a large jump in energy after a continuous period of “silence”.
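To make the “large jump in energy after silence” idea concrete, here is a minimal Python/NumPy sketch. The frame and step sizes and both thresholds (low_rel, jump_rel) are illustrative assumptions, not values taken from the slides, and this is not the procedure described later in the talk.

import numpy as np

def naive_release_frame(x, fs, frame_ms=4.0, step_ms=2.0,
                        low_rel=0.02, jump_rel=0.2):
    """Return the index of the first frame whose energy jumps sharply
    after a stretch of near-silence, or None if no such frame is found.
    low_rel and jump_rel are fractions of the peak frame energy and are
    illustrative guesses."""
    x = np.asarray(x, dtype=float)
    frame = int(fs * frame_ms / 1000)
    step = int(fs * step_ms / 1000)
    # Short-time energy per frame
    energy = np.array([np.sum(x[i:i + frame] ** 2)
                       for i in range(0, len(x) - frame, step)])
    peak = energy.max()
    in_closure = False
    for t in range(1, len(energy)):
        if energy[t] < low_rel * peak:
            in_closure = True                              # inside the low-energy closure
        elif in_closure and (energy[t] - energy[t - 1]) > jump_rel * peak:
            return t                                       # sudden rise right after the closure
    return None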

  4. Previous work on the problem • Engineers (Niyogi et al. 1999, Hosom & Cole 2000) Task: Detecting plosives in continuous (non-segmented) speech Reasons: Automatic phonetic alignment, measurement of acoustic features (e.g. VOT) for the purposes of ASR Methods: (1) Identify candidates based on measures such as total energy, relative change in energy, high-pass energy, spectral flatness, etc.; (2) classify candidates into plosives and non-plosives (via support vector machines, neural networks, etc.). Measure of success: Correct if within 20 msec of the hand-labeled closure-release boundary • Linguists (Yao 2007) Task: Accurately locate the closure-burst boundary Reason: Study the duration of closure and VOT, and how a number of linguistic and extra-linguistic factors affect them Method: Mel spectral templates and similarity scoring (Johnson 2006) Measure of success: As accurate as possible
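As a rough illustration of the candidate measures the engineering approaches rely on (total energy, relative change in energy, high-pass energy, spectral flatness), a per-frame sketch follows. The frame length, step, and cutoff frequency are assumptions, and the second stage (SVM or neural-network classification of candidates) is omitted.

import numpy as np

def frame_measures(x, fs, frame=256, step=128, cutoff_hz=3000):
    """Per-frame measures of the kind used to propose burst candidates:
    (total energy, relative energy change, high-pass energy, spectral
    flatness). Frame/step sizes and the cutoff are illustrative choices."""
    x = np.asarray(x, dtype=float)
    win = np.hamming(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    feats, prev_e = [], None
    for i in range(0, len(x) - frame, step):
        spec = np.abs(np.fft.rfft(x[i:i + frame] * win)) ** 2 + 1e-12
        total = spec.sum()
        hp = spec[freqs > cutoff_hz].sum()                     # high-pass energy
        flat = np.exp(np.mean(np.log(spec))) / spec.mean()     # spectral flatness
        delta = 0.0 if prev_e is None else (total - prev_e) / (prev_e + 1e-12)
        prev_e = total
        feats.append((total, delta, hp, flat))
    return np.array(feats)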

  5. Present Method (flowchart): input = stop + adjacent vowel
  • Preprocessing: amplitude normalization, pre-emphasis
  • Spectrogram: 64-point DFT, 4 ms Hamming window, 2 ms step
  • From the vowel: average energy (AvgV) and maximum energy (MaxV)
  • From the stop: average energy above 3 kHz (HPE) over overlapping frames; HPE deltas (Del) between successive 4 ms frames
  • Decision: a frame counts as burst when HPE > 0.05*MaxV and Del > AvgV (alternative branch: HPE > 0.05*MaxV and Del > 0.75*DelMax); otherwise it counts as closure. In each branch, the last frame is taken as the closure-release boundary.
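A sketch of how this pipeline might be implemented. The parameters come from the slide (64-point DFT, 4 ms Hamming window, 2 ms step, 3 kHz cutoff, the 0.05*MaxV, AvgV, and 0.75*DelMax thresholds); how AvgV and MaxV are computed from the vowel, how the two branches are selected, and how the boundary frame is converted to a time are assumptions.

import numpy as np

def locate_burst(stop, vowel, fs, n_fft=64, frame_ms=4, step_ms=2,
                 use_delmax_criterion=False):
    """Sketch of the slide-5 pipeline: pre-emphasis, 4 ms Hamming-window
    spectrogram (64-point DFT, 2 ms step), high-pass energy above 3 kHz
    (HPE), frame-to-frame HPE deltas (Del), and a threshold test against
    the vowel's energy. Branch selection and boundary placement are
    assumptions, not the authors' exact logic."""
    frame = int(fs * frame_ms / 1000)
    step = int(fs * step_ms / 1000)

    def preprocess(x):
        x = np.asarray(x, dtype=float)
        x = x / (np.max(np.abs(x)) + 1e-12)                    # amplitude normalization
        return np.append(x[0], x[1:] - 0.97 * x[:-1])          # pre-emphasis

    def spectrogram(x):
        win = np.hamming(frame)
        frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, step)]
        return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

    stop_spec = spectrogram(preprocess(stop))
    vowel_spec = spectrogram(preprocess(vowel))

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    hpe = stop_spec[:, freqs > 3000].mean(axis=1)              # HPE per frame
    delta = np.diff(hpe, prepend=hpe[0])                       # Del between successive frames

    vowel_frame_energy = vowel_spec.mean(axis=1)
    avg_v = vowel_frame_energy.mean()                          # AvgV (assumed definition)
    max_v = vowel_frame_energy.max()                           # MaxV (assumed definition)
    thresh = 0.75 * delta.max() if use_delmax_criterion else avg_v

    is_burst = (hpe > 0.05 * max_v) & (delta > thresh)
    if not is_burst.any():
        return None
    first_burst = int(np.argmax(is_burst))
    return first_burst * step / fs                             # boundary time (s) from start of the stop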

  6. Results • Control Dataset: Hand-labeled closure-release boundaries in 607 tokens from the Buckeye corpus (300 word-initial, 307 word-final); 5 tokens for each place of articulation; for /p,t,k,d,g/ from 10 subjects (5 male, 5 female), for /b/ from more than 10 subjects, with a variety of speech rates • Error Analysis:

  7. How to improve the performance? • Poor performance on ambiguous/non-canonical tokens (e.g. tokens with a “noisy” closure) • Would like to identify and exclude such tokens • Exclusion metric: standard deviation of the deltas: Voiceless: exclude if std < 3 (Init 14%, Final 42%); Voiced: exclude if std < 2 (Init 5%, Final 33%) • Other metrics?
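A sketch of the exclusion metric as stated on the slide; the thresholds (std < 3 for voiceless, std < 2 for voiced) come from the slide, everything else is assumed.

import numpy as np

def should_exclude(hpe_deltas, voiced, std_voiceless=3.0, std_voiced=2.0):
    """Flag a token for exclusion when the standard deviation of its HPE
    deltas is small, i.e. a 'noisy' or non-canonical closure with no clear
    energy jump. Thresholds follow slide 7; the rest is a sketch."""
    threshold = std_voiced if voiced else std_voiceless
    return np.std(np.asarray(hpe_deltas, dtype=float)) < threshold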

  8. Results after exclusion • Best results (rms): Initial voiceless 0.0038 msec, Initial voiced 0.0133, Final voiceless 0.013, Final voiced 0.014
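For reference, an rms figure of this kind is the root-mean-square deviation between automatically located and hand-labeled boundaries; this is a generic sketch of that computation, not the authors' evaluation script.

import numpy as np

def rms_error(auto_boundaries, hand_boundaries):
    """RMS deviation between automatic and hand-labeled closure-release
    boundaries, in the same unit as the inputs."""
    auto = np.asarray(auto_boundaries, dtype=float)
    hand = np.asarray(hand_boundaries, dtype=float)
    return np.sqrt(np.mean((auto - hand) ** 2))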

  9. How to improve further?
  References:
  Hosom, J.-P. & Cole, R. (2000). Burst detection based on measurements of intensity discrimination. In Proc. Sixth Int'l Conf. on Spoken Language Processing.
  Niyogi, P., Burges, C., & Ramesh, P. (1999). Distinctive feature detection using support vector machines. In Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing.
  Yao, Y. (2007). Closure Duration and VOT of Word-initial Voiceless Plosives in English in Spontaneous Connected Speech. UC Berkeley Phonology Lab 2007 Annual Report.
