
Data collection and experimentation





  1. Data collection and experimentation

  2. Why should we talk about data collection? • It is a central part of most, if not all, aspects of current speech technology • The higher grades (A, B; as tested in the home exam assignments and the project) require a measure of data collection

  3. What is data collection? • In speech technology, the gathering of human communicative behaviours that can be used to implement, e.g., spoken dialogue systems • What do we gather? • Speech • Text • Voices • Gestures • Patterns!

  4. All vs one? • Recognition: we want to have seen all possibilities • Synthesis: we want one consistent behaviour

  5. Group exercise • Same groups as before • Design one or more data collection(s) that will become the basis for a spoken dialogue system intended to inform users of the television program • Take note of why you make your design choices • We’ll talk about it here in 30 minutes

  6. Application • Remote control • Select programme • Menu options - tree • TV guide • More free speech • But connected to GUI options (e.g. for lists) • Data • Room environment • Age recognition data • Recognize age • Recognize identity of a specific mother • Usage probabilities • Asking people - ratings • Language? Programmes are English, Swedish • Read TV guide • But people speak differently (“trean”) • Monitor corpus (updated) • “Beta” version – iterative process (h/h, WoZ, beta) • Demography: adults, elderly, kids? • Keywords • Cloud • Times • Some commands

  7. What is a corpus? • Wikipedia: A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc. Multimodal corpus work: manual annotation, validation and computer driven analysis. Jens Edlund, 2012-09-01-05

  8. Why collect a corpus? • “[...] for the purpose of studying linguistic structures, frequencies, etc.” • Sample - cannot analyze all • Training data for duplicating behaviours • Analysis of how humans do things • Generalisability, representativeness • Same results in different corpora • Use constraints, standards, theories to form the corpus • If findings are expected - corroborate theory - we're better off
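
As a rough illustration of "studying frequencies" on a sample and checking for "same results in different corpora", the sketch below computes relative word frequencies in two tiny corpora and compares the words they share. The utterances are made up for this example, and `relative_frequencies` is a hypothetical helper name, not something taken from the slides.

```python
from collections import Counter


def relative_frequencies(utterances):
    """Return each word's share of all word tokens in a list of utterances."""
    counts = Counter(word for u in utterances for word in u.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Two tiny, invented corpora of TV-guide requests (illustrative data only).
corpus_a = ["what is on tonight", "what is on channel three"]
corpus_b = ["what is on now", "show me channel three"]

freq_a = relative_frequencies(corpus_a)
freq_b = relative_frequencies(corpus_b)

# Compare only the words that occur in both samples.
shared = sorted(set(freq_a) & set(freq_b))
for word in shared:
    print(f"{word}: {freq_a[word]:.2f} vs {freq_b[word]:.2f}")
```

With samples this small the numbers mean little; the point is only that a machine-readable corpus makes this kind of comparison mechanical.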

  9. How is a corpus collected? • Often high formal demands: • Structure • Balance • Audio, visual, audiovisual - choice of modalities • Requires equipment • Silent lab
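
One way to read the "balance" demand is that recordings should be spread roughly evenly over the speaker groups of interest. The sketch below checks this for an invented list of recordings; the field names, the group labels, the equal-share target and the 10 percentage point tolerance are all assumptions made for illustration.

```python
from collections import Counter


def balance_report(recordings, key, tolerance=0.1):
    """Map each observed group to (share, is_within_tolerance_of_equal_share).

    Groups with no recordings at all do not appear in the report.
    """
    counts = Counter(r[key] for r in recordings)
    total = sum(counts.values())
    target = 1 / len(counts)  # assumed goal: equal share per observed group
    return {
        group: (n / total, abs(n / total - target) <= tolerance)
        for group, n in counts.items()
    }


# Invented recording metadata (illustrative only).
recordings = [
    {"speaker": "s1", "age_group": "adult"},
    {"speaker": "s2", "age_group": "adult"},
    {"speaker": "s3", "age_group": "adult"},
    {"speaker": "s4", "age_group": "elderly"},
    {"speaker": "s5", "age_group": "child"},
    {"speaker": "s6", "age_group": "child"},
]

report = balance_report(recordings, "age_group")
for group, (share, ok) in report.items():
    print(f"{group}: {share:.0%} {'ok' if ok else 'unbalanced'}")
```

A real collection would balance on more than one dimension at once (age, gender, dialect, recording environment), but the bookkeeping is the same.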

  10. Where are corpora collected?

  11. When are corpora collected? • Often collected once, then static • But monitor corpora exist • And the web is, as always, changing things

  12. Examples of corpora?

  13. Thank you! Questions?
