
Next slide is the Neutral Avenue System Diagram with a Morphology Learning box added


Presentation Transcript


  1. Next slide is the Neutral Avenue System Diagram with a Morphology Learning box added

  2. Avenue Overview: Elicitation; Morphology; Rule Learning; Run-Time System; Rule Refinement; Translation Correction Tool; Word-Aligned Parallel Corpus; Learning Module; Do NOT Use; Handcrafted rules; Run Time Transfer System; Learning Module; Transfer Rules; Rule Refinement Module; Elicitation Corpus; Morphology Analyzer; Lexical Resources; Lattice; Elicitation Tool

  3. The next slide is for Ari. It has her sections highlighted but also has the extra box that I added for Morphology Learning

  4. Rule Refinement: Elicitation; Morphology; Rule Learning; Run-Time System; Rule Refinement; Translation Correction Tool; Word-Aligned Parallel Corpus; Learning Module; Do NOT Use; Handcrafted rules; Run Time Transfer System; Learning Module; Transfer Rules; Rule Refinement Module; Elicitation Corpus; Morphology Analyzer; Lexical Resources; Lattice; Elicitation Tool

  5. Here is where Christian’s presentation begins

  6. Avenue Overview: Elicitation; Morphology; Rule Learning; Run-Time System; Rule Refinement; Translation Correction Tool; Word-Aligned Parallel Corpus; Learning Module; Do NOT Use; Handcrafted rules; Run Time Transfer System; Learning Module; Transfer Rules; Rule Refinement Module; Elicitation Corpus; Morphology Analyzer; Lexical Resources; Lattice; Elicitation Tool

  7. The Challenge of Morphology Mapudungun (Indigenous Language of Chile and Argentina, ~1 Million Speakers) Allkütulekefun

  8. The Challenge of Morphology Mapudungun Allkütu -le -ke -fu -n

  9. The Challenge of Morphology Mapudungun Allkütu -le -ke -fu -n Listen -prog. -habitual -past -indic.1sg

  10. The Challenge of Morphology Mapudungun Allkütu -le -ke -fu -n Listen -prog. -habitual -past -indic.1sg I

  11. The Challenge of Morphology Mapudungun Allkütu -le -ke -fu -n Listen -prog. -habitual -past -indic.1sg I used to

  12. The Challenge of Morphology Mapudungun Allkütu -le -ke -fu -n Listen -prog. -habitual -past -indic.1sg I used to listen

  13. The Challenge of Morphology. Mapudungun: Allkütu -le -ke -fu -n = Listen -prog. -habitual -past -indic.1sg = "I used to listen". Tasks for Morphology • Segment Words • Map Morphemes onto Features

  14. The Challenge of Morphology. Tasks for Morphology • Segment Words • Map Morphemes onto Features. Learn these tasks • unsupervised • from data • for any language

  15. Our Approach Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems

  16. Our Approach: Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s: blame, solve

  17. Our Approach: Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; Ø.s: blame, solve

  18. Our Approach: Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; Ø.s: blame, solve

  19. Our Approach: Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; Ø.s: blame, solve; s: blame, roam, solve

  20. Our Approach: Leverage the Natural Structure of Morphology • Paradigm • Set of affixes that interchangeably attach to a set of stems. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; Ø.s: blame, solve; s: blame, roam, solve

  21. Our Approach. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; e.es: blam, solv; Ø.s: blame, solve; s: blame, roam, solve

  22. Our Approach. Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Ø.s.d: blame; e.es: blam, solv; Ø.s: blame, solve; s: blame, roam, solve

  23. me.mes.med: bla; e.es.ed: blam; Ø.s.d: blame; e.es: blam, solv; Ø.s: blame, solve; me.mes: bla; me.med: bla; e.ed: blam; Ø.d: blame; s.d: blame; mes.med: bla; es.ed: blam; e: blam, solv; Ø: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving; me: bla; s: blame, roam, solve; es: blam, solv; mes: bla; med: bla, roa; ed: blam, roam; d: blame, roame

  24. Spanish Newswire Corpus • 40,011 Tokens • 6,975 Types. Suffix sets, with stem count in parentheses and example stems: a.as.o.os.tro (1): cas; a.as.o.os (43): african, cas, jurídic, l, ...; a.as.o (59): cas, citad, jurídic, l, ...; a.as.os (50): afectad, cas, jurídic, l, ...; a.o.os (105): impuest, indonesi, italian, jurídic, ...; as.o.os (54): cas, implicad, jurídic, l, ...; a.as (199): huelg, incluid, industri, inundad, ...; as.o (85): intern, jurídic, just, l, ...; o.os (268): human, implicad, indici, indocumentad, ...; a.tro (2): cas, cen; a.o (214): id, indi, indonesi, inmediat, ...; a.os (134): impedid, impuest, indonesi, inundad, ...; as.os (68): cas, implicad, inundad, jurídic, ...; tro (16): catas, ce, cen, cua, ...; a (1237): huelg, ib, id, iglesi, ...; as (404): huelg, huelguist, incluid, industri, ...; o (1139): hub, hug, human, huyend, ...; os (534): humorístic, human, hígad, impedid, ...

  25. Level 5 = 5 suffixes. Columns: Suffixes (Stem Type Count): Stems. a.as.o.os.tro (1): cas; a.as.o.os (43): african, cas, jurídic, l, ...; a.as.o (59): cas, citad, jurídic, l, ...; a.as.os (50): afectad, cas, jurídic, l, ...; a.o.os (105): impuest, indonesi, italian, jurídic, ...; as.o.os (54): cas, implicad, jurídic, l, ...; a.as (199): huelg, incluid, industri, inundad, ...; as.o (85): intern, jurídic, just, l, ...; o.os (268): human, implicad, indici, indocumentad, ...; a.tro (2): cas, cen; a.o (214): id, indi, indonesi, inmediat, ...; a.os (134): impedid, impuest, indonesi, inundad, ...; as.os (68): cas, implicad, inundad, jurídic, ...; tro (16): catas, ce, cen, cua, ...; a (1237): huelg, ib, id, iglesi, ...; as (404): huelg, huelguist, incluid, industri, ...; o (1139): hub, hug, human, huyend, ...; os (534): humorístic, human, hígad, impedid, ...

  26. From the spurious suffix “tro”: a.as.o.os.tro (1): cas; a.tro (2): cas, cen; tro (16): catas, ce, cen, cua, ... Adjective Inflection Class: a.as.o.os (43): african, cas, jurídic, l, ...; a.as.o (59): cas, citad, jurídic, l, ...; a.as.os (50): afectad, cas, jurídic, l, ...; a.o.os (105): impuest, indonesi, italian, jurídic, ...; as.o.os (54): cas, implicad, jurídic, l, ...; a.as (199): huelg, incluid, industri, inundad, ...; as.o (85): intern, jurídic, just, l, ...; o.os (268): human, implicad, indici, indocumentad, ...; a.o (214): id, indi, indonesi, inmediat, ...; a.os (134): impedid, impuest, indonesi, inundad, ...; as.os (68): cas, implicad, inundad, jurídic, ...; a (1237): huelg, ib, id, iglesi, ...; as (404): huelg, huelguist, incluid, industri, ...; o (1139): hub, hug, human, huyend, ...; os (534): humorístic, human, hígad, impedid, ...

  27. Basic Search Procedure. Labels: Decreasing Stem Count; Increasing Suffix Count. a.as.o.os.tro (1): cas; a.tro (2): cas, cen; tro (16): catas, ce, cen, cua, ...; a.as.o.os (43): african, cas, jurídic, l, ...; a.as.o (59): cas, citad, jurídic, l, ...; a.as.os (50): afectad, cas, jurídic, l, ...; a.o.os (105): impuest, indonesi, italian, jurídic, ...; as.o.os (54): cas, implicad, jurídic, l, ...; a.as (199): huelg, incluid, industri, inundad, ...; as.o (85): intern, jurídic, just, l, ...; o.os (268): human, implicad, indici, indocumentad, ...; a.o (214): id, indi, indonesi, inmediat, ...; a.os (134): impedid, impuest, indonesi, inundad, ...; as.os (68): cas, implicad, inundad, jurídic, ...; a (1237): huelg, ib, id, iglesi, ...; as (404): huelg, huelguist, incluid, industri, ...; o (1139): hub, hug, human, huyend, ...; os (534): humorístic, human, hígad, impedid, ...

  28. Examples and Evaluation of Automatically Selected Suffix Sets. Global Suffix Evaluation: Precision: 0.506; Recall: 0.517; F1: 0.511

  29. Next Steps for Morphology Induction • Improve the Quality of Induced Paradigms (Current Work) • Convert Paradigms into a Segmenter (Soon) • Learn Mappings from Morphemes to Features (Future Goal)

  30. Avenue Overview: Elicitation; Morphology; Rule Learning; Run-Time System; Rule Refinement; Translation Correction Tool; Word-Aligned Parallel Corpus; Learning Module; Do NOT Use; Handcrafted rules; Run Time Transfer System; Learning Module; Transfer Rules; Rule Refinement Module; Elicitation Corpus; Morphology Analyzer; Lexical Resources; Lattice; Elicitation Tool

  31. Mapudungun • Indigenous Language of Chile and Argentina • ~ 1 Million Mapuche Speakers

  32. Collaboration • Mapuche Language Experts • Universidad de la Frontera (UFRO) • Instituto de Estudios Indígenas (IEI) • Institute for Indigenous Studies • Chilean Funding • Chilean Ministry of Education (Mineduc) • Bilingual and Multicultural Education Program. Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef, Carolina Huenchullan Arrúe, Claudio Millacura Salas

  33. Accomplishments • Corpora Collection • Spoken Corpus • Collected: Luis Caniupil Huaiquiñir • Medical Domain • 3 of 4 Mapudungun Dialects • 120 hours of Nguluche • 30 hours of Lafkenche • 20 hours of Pwenche • Transcribed in Mapudungun • Translated into Spanish • Written Corpus • ~ 200,000 words • Bilingual Mapudungun – Spanish • Historical and newspaper text. Transcript excerpt: nmlch-nmjm1_x_0405_nmjm_00: M: <SPA>no pütokovilu kay ko / C: no, si me lo tomaba con agua / M: chumgechi pütokoki femuechi pütokon pu <Noise> / C: como se debe tomar, me lo tomé pués. nmlch-nmjm1_x_0406_nmlch_00: M: Chengewerkelafuymiürke / C: Ya no estabas como gente entonces!

  34. Accomplishments • Developed At UFRO • Bilingual Dictionary with Examples • 1,926 entries • Spelling Corrected Mapudungun Word List • 117,003 fully-inflected word forms • Segmented Word List • 15,120 forms • Stems translated into Spanish

  35. Accomplishments • Developed at LTI using Mapudungun language resources from UFRO • Spelling Checker • Integrated into OpenOffice • Hand-built Morphological Analyzer • Prototype Machine Translation Systems • Rule-Based • Example-Based • LenguasAmerindias.org
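
The suffix-set entries on slides 16 to 23 (for example "Ø.s: blame, solve") can be reproduced with a short sketch. This is not the presenters' implementation, only one plausible reading of the slides: every word is split at every position into a stem and a suffix, and a suffix set is paired with the stems that occur in the vocabulary with all of its suffixes. The names `stems_by_suffix`, `scheme_stems`, and `NULL` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

NULL = "Ø"  # marker for the empty suffix, as written on the slides

def stems_by_suffix(vocabulary):
    """Map each candidate suffix to the set of stems it attaches to.

    Every word is cut at every position (assumption: the stem must be
    non-empty, the suffix may be empty).
    """
    table = defaultdict(set)
    for word in vocabulary:
        for cut in range(1, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            table[suffix or NULL].add(stem)
    return table

def scheme_stems(table, suffixes):
    """Return the stems that occur with every suffix in `suffixes`."""
    stem_sets = [table.get(suffix, set()) for suffix in suffixes]
    return set.intersection(*stem_sets) if stem_sets else set()

# Example vocabulary from slides 16 to 23.
VOCAB = "blame blamed blames roamed roaming roams solve solves solving".split()
TABLE = stems_by_suffix(VOCAB)

print(sorted(scheme_stems(TABLE, [NULL, "s"])))       # ['blame', 'solve']
print(sorted(scheme_stems(TABLE, [NULL, "s", "d"])))  # ['blame']
print(sorted(scheme_stems(TABLE, ["e", "es"])))       # ['blam', 'solv']
```

Adding a suffix to a set can only keep or shrink its stem set, which matches the "Increasing Suffix Count" and "Decreasing Stem Count" labels on slide 27.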
