
E-MELD 2004: Linguistic Databases & Best Practice Recommendations

A comprehensive guide with recommendations on stylesheets, case studies, software, hardware, and new features for linguistic databases.



Presentation Transcript


  1. Quotable Quotes :-)
     • “If you're running Windows and using a scripting language, it’s just all difficult” - Ed Garrett
     • “In this case, WALS covers too many languages” - Terry Langendoen
     E-MELD 2004 Linguistic Databases & Best Practice

  2. EMELD 2004 Working Group 6 Report
     Stylesheets / Case Studies / Software / Hardware
     Baden Hughes, Ljuba Veselinova, Terry Langendoen, Manuela Noske, Mike Maxwell, Ed Garrett, Lori Levin, Zhenwei Chen, Prashant Nagara, Neil Salmon

  3. Stylesheet Recommendations #1
     • Clarify audience
     • Programmers? Refer to authoritative guides
     • Linguists? Possibly revise the approach
     • Define & Refine
     • Remove unhelpful references, e.g. SGML
     • Define core concepts, e.g. XML
     • Annotated Examples
     • Just use inline commentary in the sources!
     • Consistency in Exemplars
     • Example file, with 6 different renderings based on stylesheets
     • Natural precursors: “to get your data into this format …”
     • Missing stylesheets
     • Interlinear text; Paradigms; Trees; Bibliography
     • Output Formats: PDF, SVG
     • Anything for non-Roman script?

  4. Stylesheet Recommendations #2
     • Access to real data (not CF engine rendered output, but raw DTDs, schemas and XML data)
     • Check the validity of instructions
     • Potential for a service provider model (online validation and stylesheet library)
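
The "online validation" idea on this slide could start from something as small as a well-formedness check. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `check_well_formed` and the sample entries are hypothetical and not taken from any E-MELD resource. It tests well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical sample data for illustration only.
good = "<entry><form>kitab</form><gloss>book</gloss></entry>"
bad = "<entry><form>kitab</gloss></entry>"  # mismatched closing tag

print(check_well_formed(good))
print(check_well_formed(bad)[0])
```

A real validation service would additionally check the document against its DTD or schema, which the standard library module used here does not do.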

  5. Case Studies Recommendations
     • Missing Case Studies
     • Multimodal, particularly video-centric
     • Systematising random archival collections of legacy data
     • “Meet the Author/Linguist/X?”
     • Guided tours based on features of case studies which are pertinent to the user (e.g. source data format, desired outcomes, project type, software)
     • Quantification of effort for activities of specific types
     • Support commentary on case studies

  6. Software Recommendations #1
     • New functional categories in software catalogue
     • Require contributor information for review comments
     • Low on content - many bulk listings are available in structured formats - leverage these to create a larger catalogue
     • Motivating reviews: contrast the book review model with the incentive for software reviews of substance
     • Ranking systems are problematic if arbitrary or non-transparent
     • Contextualisation of
     • Location: field, office, community use
     • Audience: linguist, technical support, others

  7. Software Recommendations #2
     • Disambiguate the open source/open format/proprietary/closed format dichotomy
     • Consider working format vs archival format distinction in making recommendations
     • Linking to other thematic, functionally-grounded software surveys
     • New proposal for software “smaller than an application” - later discussion

  8. Hardware Recommendations
     • Other general sites list and review hardware; linking is a more efficient option
     • Sites which provide specifically linguistic insight should also be included
     • Addressing common misconceptions, e.g. the MiniDisc debacle, would be a valuable contribution from EMELD
     • Including complementary technologies which enable the use of hardware in field linguistics: solar panels, batteries, ziplock bags :-)
     • Important inclusions: handheld devices, scanners

  9. Possible New Features #1
     • “Small Tools”
     • Smaller than an application
     • Primary concern for data manipulation
     • Not GUI point and shoot solutions, but scripts, libraries etc
     • Project Guidance: “So you want to collect language data …”
     • Last speaker scenario
     • Non-documentary linguist
     • Incidental acts by non-linguists
     • Service Provider Model
     • Stylesheet library
     • Data conversion and inclusion
     • Navigational Enhancements
     • Where am I?
     • Guided tours
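
To make the “Small Tools” category concrete, here is a minimal sketch of a script that is "smaller than an application" and concerned only with data manipulation: it converts tab-separated form/gloss rows into XML. The function name `tsv_to_xml`, the element names `lexicon`, `entry`, `form` and `gloss`, and the sample rows are all assumptions made for illustration, not an E-MELD format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_xml(tsv_text):
    """Convert tab-separated 'form<TAB>gloss' rows into a <lexicon> element."""
    root = ET.Element("lexicon")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "form").text = row[0]
        ET.SubElement(entry, "gloss").text = row[1]
    return root


# Hypothetical two-row word list.
sample = "kitab\tbook\nqalam\tpen\n"
xml_out = ET.tostring(tsv_to_xml(sample), encoding="unicode")
print(xml_out)
```

A script of this size could also sit behind the "data conversion and inclusion" service mentioned on the same slide.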

  10. Possible New Features #2
      • Media
      • Incidental discussion of media needs to be formalised
      • Interactive Forums
      • While directories and reviews are a good starting point, active communities may help to engage new users with the site
      • “How to Systematise Language Data”
      • Draw on the experience of EMELD team in building the case studies
      • Workflow Approach
      • Logical pathways within the School
      • Decision Tree model
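
The "Decision Tree model" for guiding users along logical pathways might be represented as nested question nodes. This is only a sketch under stated assumptions: the questions, the advice strings and the names `TREE` and `walk` are hypothetical placeholders, not content from the School.

```python
# Each node is either a question with "yes"/"no" branches or a leaf with "advice".
# All questions and advice strings are hypothetical placeholders.
TREE = {
    "question": "Is your data already in a structured format?",
    "yes": {
        "question": "Do you need a presentation format?",
        "yes": {"advice": "See the stylesheet section"},
        "no": {"advice": "See the archiving section"},
    },
    "no": {"advice": "See 'How to Systematise Language Data'"},
}


def walk(tree, answers):
    """Follow a list of True/False answers; return advice, or None if unfinished."""
    node = tree
    for answer in answers:
        if "advice" in node:
            break  # reached a leaf; ignore any extra answers
        node = node["yes" if answer else "no"]
    return node.get("advice")


print(walk(TREE, [True, False]))
print(walk(TREE, [False]))
```

Returning `None` when the answers stop at a question node lets a caller detect that the user still has a question to answer.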

  11. General Issues #1
      • Which conceptual model best suits the resource model within the School?
      • download.com - a data provider?
      • dmoz.org - a directory service?
      • sourceforge.net - a collaborative repository?
      • Leadership ambitions
      • The “all things to all people” model is inherently inefficient - what is EMELD’s competitive advantage?
      • While electronic language documentation is a “niche market”, a complementary approach may be mutually beneficial with other projects
      • Long-term sustainability
      • Operationally sustainable in terms of resources, currency and applicability
      • How enduring are the methods, data formats and advice anyway (what would EMELD look like if we did it again in 5 years' time?)

  12. General Issues #2
      • Integration vs Dependency
      • While LL provides much of the manpower for EMELD efforts at present, the goal of enabling documentary linguists directly needs to return to focus
      • Perspectives on Best Practice
      • Top-down best practice: tendency to focus on “best”
      • Bottom-up best practice: grounded in “practice”, and its improvement
      • Standardization vs Community Building
