
Language documentation and Infrastructures

CLARIN-NL Kick-off meeting, Utrecht, May 27th 2009. Dagmar Jung (Universität zu Köln) & Peter Wittenburg (MPI Nijmegen). A brief step back in time: setting the scene of language documentation.

Presentation Transcript


  1. CLARIN-NL Kick-off meeting, Utrecht, May 27th 2009. Language documentation and Infrastructures. Dagmar Jung (Universität zu Köln) & Peter Wittenburg (MPI Nijmegen)

  2. A brief step back in time: setting the scene of language documentation
     • The linguist/anthropologist
     • The native speaker
     • The transcription
     • The translation
     • The analysis
     • The ideal outcome: texts, grammar, dictionary
     CLARIN-NL Kick-off

  3. Pliny Earl Goddard 1914: The present condition of our knowledge of North American Languages
     “There remains a great amount of linguistic work to be done. With so little known of the origin of languages, and the conditions controlling their development and their dispersion, it is important that a record should be preserved of every language spoken. In order that that record be adequate, great care must be taken in phonetic representation. The sounds which correspond to the characters employed in writing should be so carefully described as to their manner of articulation and their acoustic effects as to make them thoroughly intelligible for all time. Sufficient material from each dialect should be recorded in the connected form of texts to furnish a fairly complete lexicon of the words it contains and a representation of the grammatical forms in use.”
     (1914:592, American Anthropologist Vol. 16)

  4. The Dobes-Program (Dokumentation bedrohter Sprachen)
     • Funded by the Volkswagen Foundation
     • Started in 2000 – ca. 45 projects worldwide
     • Technical team and archive development: MPI
     • Two main goals:
       • Documentation of endangered languages (gathering audio and video data in the field and annotating them)
       • Creation of a web-accessible digital archive that will persist over a longer period of time

  5. The Dobes projects (2008)

  6. Challenges today
     • Once the fieldwork situation is set up, a myriad of language data can be recorded
     • Hardware no longer limits the quantity of recordings
     • Potentially a flood of audio and video data is collected: how can it be processed to be useful?

  7. Flexible Annotation Tools
     • ELAN (time-aligned video/audio annotation)
     • Toolbox (parsing tool and lexical database)
     • Interoperable with other representational and analytic tools (e.g. by providing XML interfaces)

  8. ELAN

  9. ELAN: multiple tiers

  10. Toolbox

  11. Tools: LEXUS
     • Web-based lexical database: allows for customized lexicon creation
     • Can also import from Toolbox
     • Multimedia links allowed
     • Its online nature is ideal for collaborative efforts

  12. The Archive
     Is not a place to merely ‘dump’ data and forget about them, but serves for:
     • Data preservation
     • Data presentation
     • Data analysis (e.g. by making use of metadata or intelligent searches)
     And last but not least:
     • Data accountability

  13. The Archive

  14. The Archive: flexible corpus structures

  15. Metadata
     • Necessary for archival organization
       • Identity of resources: language name, etc.
       • Also physical characteristics: quality, quantity
     • Desirable for scientific use of resources
       • Sociolinguistic data of participants
       • Characteristics of genre
       • Key words (free)
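
The metadata categories on this slide can be pictured as a simple record. The sketch below is only an illustration: the classes `Participant` and `SessionMetadata`, the helper `find_sessions`, and all field names are hypothetical and do not reproduce the archive's actual metadata schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    # Sociolinguistic data of a participant (hypothetical fields).
    role: str
    age: int
    first_language: str


@dataclass
class SessionMetadata:
    # Necessary for archival organization: identity and physical characteristics.
    language_name: str
    duration_minutes: float  # quantity
    audio_quality: str       # quality
    # Desirable for scientific use of the resource.
    genre: str = ""
    participants: List[Participant] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)  # free key words


def find_sessions(
    sessions: List[SessionMetadata],
    genre: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[SessionMetadata]:
    """Return the sessions that match the optional genre and keyword filters."""
    result = []
    for session in sessions:
        if genre is not None and session.genre != genre:
            continue
        if keyword is not None and keyword not in session.keywords:
            continue
        result.append(session)
    return result
```

With records of this kind, a call such as `find_sessions(sessions, genre="narrative")` narrows a corpus by genre, which is the sort of metadata-driven analysis the archive slides refer to.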

  16. ANNEX searches in the archive
     Allows for simple searches or advanced multi-tier searches
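
The difference between a simple search and a multi-tier search can be sketched as follows. This is a minimal illustration, not ANNEX's implementation: the `Annotation` class, the tier names, and the functions `simple_search` and `multi_tier_search` are all assumptions, and "multi-tier" is interpreted here as matching annotations on two tiers that overlap in time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    # A time-aligned annotation: a time span in milliseconds plus a text value.
    start_ms: int
    end_ms: int
    value: str


# A transcript is modelled as a mapping from tier name to its annotations.
Tiers = Dict[str, List[Annotation]]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two annotations share some stretch of time."""
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms


def simple_search(tiers: Tiers, tier: str, text: str) -> List[Annotation]:
    """Simple search: substring match on a single tier."""
    return [a for a in tiers.get(tier, []) if text in a.value]


def multi_tier_search(
    tiers: Tiers, tier_a: str, text_a: str, tier_b: str, text_b: str
) -> List[Tuple[Annotation, Annotation]]:
    """Multi-tier search: matches on two tiers that overlap in time."""
    hits = []
    for a in simple_search(tiers, tier_a, text_a):
        for b in simple_search(tiers, tier_b, text_b):
            if overlaps(a, b):
                hits.append((a, b))
    return hits
```

A query such as `multi_tier_search(tiers, "transcription", "ccc", "translation", "dog")` would return only those pairs where the transcription match and the translation match cover the same stretch of the recording.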

  17. ANNEX: multiple views

  18. Ways of Access and Visualization: Google Earth layer

  19. Ways of Access and Visualization: Google Earth layer

  20. Ways of Access and Visualization

  21. Ways of access: web-accessible stories (derived from ELAN)

  22. Ways of access: Community Portal

  23. Changes in Language Resources: Data and Tools
     • Data are not the same (audio, video, quantity and quality)
     • Archive is inherently work-in-progress, NOT published end-product
     • Tools are certainly not the same (annotation, presentation, search engines)
     • Linguistic work has become more cooperative: with communities, with international colleagues, with other disciplines
     • New foundation for linguistics as an empirical science

  24. PS: Pliny Earl Goddard, Documentation of Beaver Athabaskan (1917). Rousselot-Apparatus
