
ProZed: an Editor for the Automatic Processing of Prosodic Variation

C. AURAN, C. BOUZON & D.J. HIRST, Laboratoire Parole et Langage, CNRS UMR6057, Université de Provence.


Presentation Transcript


  1. ProZed: an Editor for the Automatic Processing of Prosodic Variation. C. AURAN, C. BOUZON & D.J. HIRST, Laboratoire Parole et Langage, CNRS UMR6057, Université de Provence

  2. Summary
     1. Prosodic systems
        • Prosody as a multidimensional macro-system
        • Levels of representation
     2. ProZEd
        • General conceptions
        • Demonstrations (a few modules)
          • Long sound file fragmentation, speaker separation
          • Duration manipulation
          • Silence detection and fragmentation
          • MOMEL-INTSINT coding
          • Phonological resynthesis
     3. Perspectives

  3. Prosodic systems

  4. Prosody as a macro-system
     « Prosody » does not mean « intonation »
     • Prosody seen as consisting of 3 systems (Di Cristo 2001):
       • Tonal system
       • Temporal system
       • Metrical system
     • Intimate interactions between elements from these 3 systems
     • Complex relations between the acoustic, the phonetic and the phonological levels

  5. Orthogonal dimensions
     • Tonal and temporal systems make use of 2 orthogonal dimensions (Ladd 1996, Di Cristo et al. 2003 and forthcoming):
       • Linear dimension (tonal sequences, syllable length distribution, …)
       • Frame dimension (register level and span, downtrends, tempo, …)
     Both dimensions play a major part in the organisation of discourse and the linguistic characterisation of dialects (ref.)

  6. Levels of representation (1)
     • 4 levels of representation (cf. Hirst et al. 2000):
       • 0. Physical level (acoustic data)
       • 1. Phonetic level (continuous quantitative variables)
       • 2. Surface phonological level (abstract qualitative characteristics)
       • 3. Underlying phonological level
     • Interpretability constraint → each level is interpreted locally, in relation to its adjacent levels
     • Mapping:
       • between level 0 and level 1: phonetic representation
       • between level 1 and level 2: surface phonological representation

  7. Levels of representation (2)
     • Phonetic representation:
       • Temporal system: unit alignment with the speech signal
       • Tonal system: quadratic spline modelling of fundamental frequency (MOMEL algorithm)
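To make the quadratic spline idea concrete, here is a minimal sketch of how a fundamental frequency curve could be rebuilt from a handful of target points. It assumes the usual description of such a spline: two parabolic segments per interval, meeting halfway between consecutive targets, with zero slope at each target. The function name `quadratic_spline` and the `(time, hz)` tuple layout are illustrative choices, not part of ProZEd, and the sketch covers only interpolation, not MOMEL's target detection.

```python
def quadratic_spline(targets, times):
    """Interpolate f0 values at `times` from (time, hz) target points.

    Assumption: each interval between two targets is modelled by two
    parabolas joined at the interval midpoint, with zero slope at the
    targets. Target times must be distinct.
    """
    targets = sorted(targets)
    curve = []
    for t in times:
        # Outside the target range, hold the nearest target value.
        if t <= targets[0][0]:
            curve.append(targets[0][1])
            continue
        if t >= targets[-1][0]:
            curve.append(targets[-1][1])
            continue
        for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
            if t1 <= t <= t2:
                tm = (t1 + t2) / 2.0  # time of the junction point
                hm = (h1 + h2) / 2.0  # f0 at the junction point
                if t <= tm:
                    x = (t - t1) / (tm - t1)
                    curve.append(h1 + (hm - h1) * x * x)
                else:
                    x = (t2 - t) / (t2 - tm)
                    curve.append(h2 + (hm - h2) * x * x)
                break
    return curve
```

With targets at (0.0 s, 100 Hz) and (1.0 s, 200 Hz), the curve leaves the first target with zero slope, passes through 150 Hz at 0.5 s, and flattens again on arrival at the second target.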

  8. Levels of representation (3)
     • Surface phonological representation:
       • Temporal system: categorical coding (--, -, , +, ++)
         • Base dimension: raw segment duration
         • Frame dimension: tempo factor on raw segment duration
       • Tonal system: INTSINT coding of MOMEL targets (M, T, B, L, H, U, D)
         • Purely formal coding (≠ ToBI but cf. narrow IPA transcription)
         • Base dimension + frame dimensions (register level, register span, declination effect)

  9. INTSINT: base dimension
     • Absolute tones
       • T (Top)
       • M (Mid)
       • B (Bottom)
     • Relative tones
       • non-iterative
         • H (Higher)
         • L (Lower)
         • S (Same)
       • iterative
         • U (Up)
         • D (Down)
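The slides do not say how these symbols map back to frequency values, so the following is only a sketch of one possible decoding. It assumes a formulation in which the speaker's register is given by a key (in Hz) and a span (in octaves), absolute tones sit at the middle and the edges of that register, and relative tones are geometric means between the previous target and the top or bottom. The names `intsint_to_hz`, `key` and `span`, the default values, and the choice to start from the key when the sequence opens with a relative tone are all assumptions made for illustration.

```python
import math


def intsint_to_hz(symbols, key=200.0, span=1.0):
    """Convert a sequence of INTSINT symbols to f0 targets in Hz.

    Assumed mapping (not taken from the slides):
      T, M, B   absolute: top, middle and bottom of the register
      S         same as the previous target
      H, L      halfway (geometric mean) towards top / bottom
      U, D      smaller steps towards top / bottom
    """
    top = key * math.sqrt(2.0 ** span)
    bottom = key / math.sqrt(2.0 ** span)
    previous = key  # assumption: start from the key if no target yet
    values = []
    for symbol in symbols:
        if symbol == "M":
            value = key
        elif symbol == "T":
            value = top
        elif symbol == "B":
            value = bottom
        elif symbol == "S":
            value = previous
        elif symbol == "H":
            value = math.sqrt(previous * top)
        elif symbol == "L":
            value = math.sqrt(previous * bottom)
        elif symbol == "U":
            value = math.sqrt(previous * math.sqrt(previous * top))
        elif symbol == "D":
            value = math.sqrt(previous * math.sqrt(previous * bottom))
        else:
            raise ValueError("unknown INTSINT symbol: %r" % symbol)
        values.append(value)
        previous = value
    return values
```

Under these assumptions, the absolute tones depend only on the key and span, while each relative tone depends on the target that precedes it, which is what makes the coding sensitive to register changes.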

  10. INTSINT: Frame dimension
     • Downdrift
     • Register level and register span codings (cf. Portes & Di Cristo 2003)

  11. ProZEd

  12. General conceptions (1)
     • ProZEd: « Prosodic Editor »
     • Multi-functional
       • Preliminary processing (file segmentation, speaker separation, …)
       • Specific processing (duration processing, silence detection, intonation processing, resynthesis, …)
     • « Theory independent » (cf. Mixdorff’s work)
     • Multi-platform (Praat, Perl), freeware and open source (GPL)

  13. General conceptions (2)
     ProZEd: Representation levels
     Reversible mapping (for intonation) between:
       • 0. Physical level
       • 1. Phonetic level
       • 2. Surface phonological level
     [Diagram: components labelled MBROLA, MOMEL, QSP, INTSINT, INT2PHO]

  14. Demonstrations
     • Long sound file fragmentation
     • Duration manipulation
     • Silence detection and fragmentation
     • MOMEL-INTSINT coding
     • Phonological resynthesis
     [ Launch ProZEd ]

  15. Perspectives

  16. Perspectives
     • Improved modelling of duration (z-score method)
     • Automatic generation of data sheets both in XML and in a more easily human-readable form (polymetrical expressions for instance)
       • Ex.: _<M>(nV, <H>)(TIN, <BU>)_
     • New modules for:
       • automatic pseudo-segment detection and processing (IRIT’s Vocalis software)
       • automatic complementary information extraction
       • automatic alignment using iterative DTW (Di Cristo & Hirst 1997)
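As an illustration of the z-score idea for duration, the sketch below normalises each segment's duration against the mean and standard deviation observed for segments with the same label. How ProZEd would group segments or estimate the statistics is not stated in the slides: the per-label grouping, the use of the population standard deviation, and the function name `duration_z_scores` are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def duration_z_scores(segments):
    """Return one z-score per (label, duration_in_seconds) segment.

    Assumption: each duration is compared with the mean and population
    standard deviation of all segments sharing its label. A label with
    no variation (or a single token) gets a z-score of 0.0.
    """
    by_label = defaultdict(list)
    for label, duration in segments:
        by_label[label].append(duration)

    stats = {
        label: (mean(values), pstdev(values))
        for label, values in by_label.items()
    }

    scores = []
    for label, duration in segments:
        mu, sigma = stats[label]
        scores.append(0.0 if sigma == 0 else (duration - mu) / sigma)
    return scores
```

A positive score marks a segment that is longer than usual for its label and a negative score a shorter one, which gives a label-independent scale on which lengthening and shortening can be compared.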

  17. Thank you for your attention
     Presentation available from www.lpl.univ-aix.fr/~EPGA/ (ProZEd modules also available shortly…)
