
Taaltheorie & Taalverwerking (Language Theory & Language Processing)

Week 12: Machine Translation (Automatisch Vertalen). Jurafsky & Martin (1st ed.), Chapter 21: Machine Translation. Taaltheorie & Taalverwerking. Machine translation: fully automatic translation of a text from one language into another.


Presentation Transcript


  1. Week 12: Machine Translation (Automatisch Vertalen). Jurafsky & Martin (1st ed.), Chapter 21: Machine Translation. Taaltheorie & Taalverwerking

  2. Machine Translation (Automatisch Vertalen): fully automatic translation of a text from one language into another.

  3. Machine Translation • The problem: differences between languages • Four approaches to MT • Direct • Transfer • Interlingua • Statistical • Applications

  4. Machine Translation: fully automatic translation of a text from one language into another. Why is that hard?

  5. Example from Jurafsky & Martin: translating 18th-century Chinese literature into present-day English.

  6. Example: Chinese-English. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come

  7. Example: Chinese-English. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

  8. A somewhat more literal translation. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of the rain on the tips of the bamboos and the leaves of the plantains outside her window. The clear cold penetrated the curtains. Without noticing it, she started to cry again.

  9. Problem: articles. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her/the bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of (the) rain on (the) tips of (the) bamboos and (the) leaves of (the) plantains outside her/the window. The clear cold penetrated the curtains. Without noticing it, she started to cry again.

  10. Problem: "zero anaphora". Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of the rain on the tips of the bamboos and the leaves of the plantains outside her window. The clear cold penetrated the curtains. Without noticing (it), she started to cry again.

  11. Problem: verb tenses. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of the rain on the tips of the bamboos and the leaves of the plantains outside her window. The clear cold penetrated the curtains. Without noticing it, she started to cry again.

  12. Problem: lexicon. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of the rain on the tips of the bamboos and the leaves of the plantains outside her window. A cold draught penetrated the curtains of her bed. Without noticing it, she started to cry again.

  13. Problem: word order. Gloss: Dai-yu alone on bed top think-with-gratitude-about Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come. Translation: Alone on her bed, Dai-yu thought again with gratitude about Bao-chai. She listened to the rustle of the rain on the tips of the bamboos and the leaves of the plantains outside her window. A cold draught penetrated the curtains of her bed. Without noticing it, she started to cry again.

  14. Differences between languages

  15. Differences between languages. Theoretical linguistics: language typology, a taxonomy of the differences and similarities between languages.

  16. Morphology vs. Syntax • Isolating languages • Chinese, Vietnamese: 1 word = 1 morpheme • Polysynthetic languages • Eskimo languages, Korean: 1 word = many morphemes • In between (English, Dutch)

  17. Morphology vs. Syntax. Richer morphology correlates with freer word order. E.g. Classical Latin has “case marking”: the role of an NP in the sentence is indicated by the inflection of the noun (“rosa, rosae, rosam...”), so the position of the NP in the sentence is free.
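A toy sketch of the case-marking point, assuming a hypothetical mini-lexicon (NOUN_STEMS, CASE_ENDINGS, VERBS) that knows only two first-declension nouns, two case endings and one verb. Roles are read off the endings alone, so every ordering of "puella rosam amat" ('the girl loves the rose') gets the same analysis.

```python
from itertools import permutations

# Hypothetical mini-lexicon: two first-declension noun stems, two endings, one verb.
NOUN_STEMS = {"puell": "girl", "ros": "rose"}
CASE_ENDINGS = {"am": "ACC", "a": "NOM"}  # "-am" listed first so it wins over "-a"
VERBS = {"amat": "loves"}


def analyse(word: str) -> tuple[str, str]:
    """Return (case or VERB, English gloss) from the word form alone."""
    if word in VERBS:
        return ("VERB", VERBS[word])
    for ending, case in CASE_ENDINGS.items():
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in NOUN_STEMS:
            return (case, NOUN_STEMS[stem])
    raise ValueError(f"unknown word: {word}")


def roles(sentence: str) -> dict[str, str]:
    """Assign subject and object by case ending, ignoring word position."""
    analysed = dict(analyse(w) for w in sentence.split())
    return {"subject": analysed["NOM"], "verb": analysed["VERB"], "object": analysed["ACC"]}


for order in permutations(["puella", "rosam", "amat"]):
    sentence = " ".join(order)
    print(f"{sentence:20} -> {roles(sentence)}")
```

A language without case marking (such as English) would have to rely on position instead, which is exactly what this sketch never looks at.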

  18. Syntax: different “basic word orders” • SVO (Subject-Verb-Object) languages • English, Mandarin • SOV languages • Japanese, Hindi, Dutch • VSO languages • Irish, Classical Arabic

  19. Syntax • SVO languages: prepositions: English: "to Yuriko" • "true" SOV languages: postpositions: Japanese: "Yuriko ni"
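To make slides 18 and 19 concrete, here is a minimal sketch with hypothetical helpers linearize and adpositional_phrase: the same three constituents are placed according to a basic word order, and the adposition goes before or after its noun phrase. The rule "postposition for SOV, preposition otherwise" is a simplification of the tendency on the slide, not a universal.

```python
def linearize(subject: str, verb: str, obj: str, order: str) -> str:
    """order is a basic word order string such as 'SVO', 'SOV' or 'VSO'."""
    slots = {"S": subject, "V": verb, "O": obj}
    return " ".join(slots[c] for c in order)


def adpositional_phrase(adposition: str, np: str, order: str) -> str:
    """Simplification: SOV languages get postpositions, others prepositions."""
    return f"{np} {adposition}" if order == "SOV" else f"{adposition} {np}"


for order in ("SVO", "SOV", "VSO"):
    print(order, "|", linearize("John", "reads", "a book", order))

print(adpositional_phrase("to", "Yuriko", "SVO"))  # to Yuriko
print(adpositional_phrase("ni", "Yuriko", "SOV"))  # Yuriko ni
```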

  20. Segmentation • Word boundaries are not marked in every language! • Chinese, Japanese, Thai, Vietnamese
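Why missing word boundaries are a real problem can be shown with greedy maximum matching, one simple baseline segmenter (not something these slides prescribe). The sketch uses English written without spaces as a stand-in, with a small hypothetical dictionary; taking the longest dictionary word at each position yields a wrong segmentation.

```python
def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy maximum matching: repeatedly take the longest dictionary word."""
    words, i = [], 0
    longest = max(map(len, dictionary))
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + n] in dictionary:
                break
        else:
            n = 1  # no dictionary word starts here: emit a single character
        words.append(text[i:i + n])
        i += n
    return words


# Hypothetical toy dictionary.
DICTIONARY = {"the", "theta", "table", "bled", "down", "own", "there"}
print(max_match("thetabledownthere", DICTIONARY))
# ['theta', 'bled', 'own', 'there'] instead of 'the table down there'
```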

  21. Syntax vs. Discourse • Very long sentences (to be translated as paragraphs) • Modern Standard Arabic, Chinese • Very short sentences (to be combined into complex sentences) • Papuan languages, Aboriginal languages

  22. Lexical Divergence

  23. Lexical Divergence: Gaps • Japanese: no word for "privacy" • English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)

  24. Translation programs

  25. Translation programs: methods

  26. 3 methods for MT • Direct • Transfer • Interlingua

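  27. 3 methods for MT [Figure: the Vauquois triangle. Analysis side, bottom to top: Source Text → Morphological Analysis → Morpheme Sequence → Syntactic Analysis → Syntactic Structure → Semantic Analysis → Interlingua expression. The generation side descends through Syntactic Structure and Morpheme Sequence to Target Text (last step: Morphological Generation). Crossing points: Direct at the morpheme-sequence level, (Syntactic) Transfer at the syntactic-structure level, Interlingua at the top.]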

  28. Direct Translation • Morphological analysis of the source-language sentence → morpheme sequence. • Transformations on this morpheme sequence → target-language sentence.

  29. Direct MT: Japanese → English. Source: Wa ta shi ha tsu kue no ue no pen wo jon ni a ge ta 1) Morphological analysis: Wa ta shi ha tsu kue no ue no pen wo jon ni a ge ta → Watashi ha tsukue no ue no pen wo jon ni ageru PAST 2) Dictionary: translation of content words: Watashi ha tsukue no ue no pen wo jon ni ageru PAST → I ha desk no ue no pen wo John ni give PAST 3) PP transformations: I ha desk no ue no pen wo John ni give PAST → I ha pen on desk wo to John give PAST

  30. Direct MT: Japanese → English 4) Verb movement: I ha pen on desk wo to John give PAST → I give PAST pen on desk to John 5) Article insertion: I give PAST pen on desk to John → I give PAST the pen on the desk to John 6) Morphological generation: I give PAST the pen on the desk to John → I gave the pen on the desk to John.
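The six steps of slides 29 and 30 can be sketched as a chain of token-list transformations. This is a toy reconstruction, not a real direct-MT system: every table (MORPHEMES, DICTIONARY, COMMON_NOUNS, PAST_FORMS) and every function name is hypothetical and covers only this one sentence. The point is the architecture, where each step rewrites the output of the previous one without building a syntactic tree.

```python
# Toy direct-MT pipeline for the one Japanese -> English sentence on the slides.
# All tables are hypothetical and cover only this sentence.
MORPHEMES = {  # syllable sequence -> morpheme(s); other syllables stand alone
    ("wa", "ta", "shi"): ["watashi"],
    ("tsu", "kue"): ["tsukue"],
    ("a", "ge", "ta"): ["ageru", "PAST"],
}
DICTIONARY = {"watashi": "I", "tsukue": "desk", "jon": "John", "ageru": "give"}
COMMON_NOUNS = {"pen", "desk"}
PAST_FORMS = {"give": "gave"}


def morphological_analysis(syllables: list[str]) -> list[str]:
    """Step 1: group syllables into morphemes by greedy longest match."""
    out, i = [], 0
    longest = max(len(k) for k in MORPHEMES)
    while i < len(syllables):
        for n in range(min(longest, len(syllables) - i), 0, -1):
            key = tuple(s.lower() for s in syllables[i:i + n])
            if key in MORPHEMES:
                out.extend(MORPHEMES[key])
                i += n
                break
        else:  # no entry: the syllable is a morpheme on its own
            out.append(syllables[i].lower())
            i += 1
    return out


def translate_content_words(tokens: list[str]) -> list[str]:
    """Step 2: dictionary lookup for content words; particles are kept."""
    return [DICTIONARY.get(t, t) for t in tokens]


def pp_transformations(tokens: list[str]) -> list[str]:
    """Step 3: 'X no ue no Y' -> 'Y on X', and postposition 'X ni' -> 'to X'."""
    tokens = list(tokens)
    for i in range(len(tokens) - 4):
        if tokens[i + 1:i + 4] == ["no", "ue", "no"]:
            tokens[i:i + 5] = [tokens[i + 4], "on", tokens[i]]
            break
    if "ni" in tokens:
        j = tokens.index("ni")
        tokens[j - 1:j + 1] = ["to", tokens[j - 1]]
    return tokens


def move_verb(tokens: list[str]) -> list[str]:
    """Step 4: 'X ha ... V PAST' -> 'X V PAST ...', dropping ha and wo."""
    topic = tokens.index("ha")
    rest = [t for t in tokens[topic + 1:-2] if t != "wo"]
    return tokens[:topic] + tokens[-2:] + rest


def insert_articles(tokens: list[str]) -> list[str]:
    """Step 5: put 'the' before every common noun (a crude, fixed choice)."""
    out = []
    for t in tokens:
        if t in COMMON_NOUNS:
            out.append("the")
        out.append(t)
    return out


def morphological_generation(tokens: list[str]) -> list[str]:
    """Step 6: fold 'V PAST' into the past-tense form of V."""
    out = []
    for t in tokens:
        if t == "PAST" and out:
            out[-1] = PAST_FORMS[out[-1]]
        else:
            out.append(t)
    return out


def translate(source: str) -> str:
    tokens = morphological_analysis(source.split())
    for step in (translate_content_words, pp_transformations, move_verb,
                 insert_articles, morphological_generation):
        tokens = step(tokens)
    return " ".join(tokens) + "."


print(translate("Wa ta shi ha tsu kue no ue no pen wo jon ni a ge ta"))
# I gave the pen on the desk to John.
```

Note how step 5 always picks "the": choosing between "the" and "a" needs information the Japanese sentence does not contain, which is the article problem from slide 9 again.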

  31. Direct MT: pros & cons • Pros • Computationally straightforward • Fast • Cons • Conceptually messy • Linguistically unreliable

  32. The Transfer Model

  33. E.g.: English → French • English: Adjective Noun • French: Noun Adjective. N.B. There are exceptions, e.g.: • "route mauvaise": ‘bad road’ • "mauvaise route": ‘wrong road’

  34. English: Adjective Noun • French: Noun Adjective. Rule: noun phrase → adjective noun ⇒ noun phrase → noun adjective
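A minimal sketch of this transfer rule as a tree rewrite, assuming trees are nested tuples (label, child, ...) with leaves (part_of_speech, word), and a hypothetical three-word English → French lexicon that ignores gender agreement. Unlike direct MT, the rule fires on a constituent (NP), not on a flat word string.

```python
# A constituent is (label, child, ...); a leaf is (part_of_speech, word).
# Hypothetical English -> French lexicon; gender agreement is ignored.
LEXICON = {"the": "le", "black": "noir", "cat": "chat"}


def transfer(tree: tuple) -> tuple:
    """Translate the leaves and apply NP -> Adj N  =>  NP -> N Adj."""
    label, *children = tree
    if isinstance(children[0], str):  # leaf
        return (label, LEXICON.get(children[0], children[0]))
    children = [transfer(c) for c in children]
    if label == "NP":
        for i in range(len(children) - 1):
            if children[i][0] == "ADJ" and children[i + 1][0] == "N":
                children[i], children[i + 1] = children[i + 1], children[i]
    return (label, *children)


def words(tree: tuple) -> list[str]:
    """Read the leaves of a tree from left to right."""
    _label, *children = tree
    if isinstance(children[0], str):
        return [children[0]]
    return [w for c in children for w in words(c)]


english = ("NP", ("DET", "the"), ("ADJ", "black"), ("N", "cat"))
print(" ".join(words(transfer(english))))  # le chat noir
```

The exceptions on slide 33 ("mauvaise route" vs. "route mauvaise") are not handled: a fuller system would need adjective-specific exceptions in the lexicon, which is one reason transfer rules multiply per language pair.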

  35. Transfer example: English → Japanese, existential-there sentence. There is [a black swan] [swimming in the pond]. (there BE NP VP-ing) Rule: There₁ BE₂ NP₃ VP-ing₄ ⇒ [S [NP NP₃ VP-ing₄] BE₂], giving [[A black swan] [swimming in the pond]] is. (NP VP-ing BE) Rule for existential-there: delete constituent 1, make constituent 4 a right modifier of constituent 3, and move constituent 2 to the end.
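The existential-there rule can be sketched as one pattern-matching function. Assumptions: constituents are (label, content) tuples, where content is either an unanalysed string chunk or further constituents; the labels follow the slide, but the function names existential_there and leaves are hypothetical.

```python
def existential_there(tree: tuple) -> tuple:
    """There(1) BE(2) NP(3) VP-ing(4)  =>  [S [NP NP(3) VP-ing(4)] BE(2)]."""
    label, *children = tree
    if label == "S" and [c[0] for c in children] == ["THERE", "BE", "NP", "VP-ing"]:
        _there, be, np, vp_ing = children       # constituent 1 is deleted
        return ("S", ("NP", np, vp_ing), be)     # 4 modifies 3; 2 moves to the end
    return tree


def leaves(tree: tuple) -> str:
    """Concatenate the string chunks of a tree from left to right."""
    _label, *children = tree
    if isinstance(children[0], str):
        return children[0]
    return " ".join(leaves(c) for c in children)


source = ("S", ("THERE", "there"), ("BE", "is"),
          ("NP", "a black swan"), ("VP-ing", "swimming in the pond"))
target = existential_there(source)
print(target)
print(leaves(target))  # a black swan swimming in the pond is
```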

  36. Transfer example: English → Japanese, NP. [a black swan] [swimming in the pond] (NP VP-ing) ⇒ [swimming in the pond] [a black swan] (VP-ing NP). Rule: NP → NP₁ VP-ing₂ ⇒ NP → VP-ing₂ NP₁. Rule for relative clauses: reverse the order of the constituents.
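A sketch of the relative-clause rule, under the same hypothetical tuple representation (label, content) with string chunks at the bottom. The rule is applied bottom-up, so it also fires on an NP embedded in a sentence; the example sentence tree below is written out by hand in the shape that the existential-there rule of slide 35 produces.

```python
def relative_clause(tree: tuple) -> tuple:
    """NP -> NP(1) VP-ing(2)  =>  NP -> VP-ing(2) NP(1), applied bottom-up."""
    label, *children = tree
    if isinstance(children[0], str):  # unanalysed chunk: nothing to reorder
        return tree
    children = [relative_clause(c) for c in children]
    if label == "NP" and [c[0] for c in children] == ["NP", "VP-ing"]:
        children.reverse()
    return (label, *children)


def leaves(tree: tuple) -> str:
    _label, *children = tree
    if isinstance(children[0], str):
        return children[0]
    return " ".join(leaves(c) for c in children)


np = ("NP", ("NP", "a black swan"), ("VP-ing", "swimming in the pond"))
print(leaves(relative_clause(np)))         # swimming in the pond a black swan

sentence = ("S", np, ("BE", "is"))         # shape produced by the slide-35 rule
print(leaves(relative_clause(sentence)))   # swimming in the pond a black swan is
```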

  37. Transfer example: English → Japanese, existential-there sentence. there was (an old man) gardening • Rule for existential-there • Rule for relative clauses • Lexicon

  38. English to Japanese Transfer. Niwa no teire o suru ojiisan ita • Insert “ga” after the subject • Agreement between verb and subject • Verb conjugation → Niwa no teire o shite ita ojiisan ga ita. Gloss: "Niwa no teire o shite ita" = garden GEN upkeep OBJ do PASTPROG (gardening); "ojiisan ga ita" = old man SUBJ was.
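The last generation steps can be sketched as follows. This covers only "ga" insertion and conjugation; the agreement bullet is not modelled. The conjugation table CONJUGATION and the function generate are hypothetical and hold just the two forms this example needs (suru "do" in the past progressive, iru "exist, for animate subjects" in the past).

```python
# Hypothetical conjugation table: (dictionary form, tense/aspect) -> surface form.
CONJUGATION = {
    ("suru", "PASTPROG"): "shite ita",  # do, past progressive
    ("iru", "PAST"): "ita",             # exist (animate), past
}


def generate(modifier: list[str], modifier_verb: tuple[str, str],
             subject: str, main_verb: tuple[str, str]) -> str:
    words = modifier + [CONJUGATION[modifier_verb]]  # conjugate the relative-clause verb
    words += [subject, "ga"]                         # insert "ga" after the subject
    words.append(CONJUGATION[main_verb])             # conjugate the main verb
    return " ".join(words)


print(generate(["niwa", "no", "teire", "o"], ("suru", "PASTPROG"),
               "ojiisan", ("iru", "PAST")))
# niwa no teire o shite ita ojiisan ga ita
```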

  39. Transfer: some limitations • Specific rules for each language pair • Does not take semantics into account • Does not take statistics into account

  40. MT method 3: Interlingua 1) Translate the source-language sentence into a meaning representation 2) Generate the target-language sentence from the meaning representation.

  41. Interlingua for "There was an old man gardening": [EVENT: GARDENING, AGENT: [MAN, NUMBER: SG, DEFINITENESS: INDEF], ASPECT: PROGRESSIVE, TENSE: PAST]
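A sketch of step 2 of the interlingua method: the frame of slide 41 as a small data structure, plus a generator for English. The class names Entity and Event and the lexicon tables NOUNS, ING_FORMS and BE are hypothetical, and the generator covers only the existential-there pattern with progressive aspect.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    concept: str
    number: str = "SG"           # SG or PL
    definiteness: str = "INDEF"  # DEF or INDEF


@dataclass
class Event:
    event: str
    agent: Entity
    aspect: str
    tense: str


# Hypothetical English generation lexicon.
NOUNS = {"MAN": ("man", "men")}
ING_FORMS = {"GARDENING": "gardening"}
BE = {("PAST", "SG"): "was", ("PAST", "PL"): "were",
      ("PRESENT", "SG"): "is", ("PRESENT", "PL"): "are"}


def noun_phrase(e: Entity) -> str:
    noun = NOUNS[e.concept][0 if e.number == "SG" else 1]
    if e.definiteness == "DEF":
        return f"the {noun}"
    return f"a {noun}" if e.number == "SG" else noun


def generate_english(frame: Event) -> str:
    """Existential-there sentence; only PROGRESSIVE aspect is covered."""
    if frame.aspect != "PROGRESSIVE":
        raise NotImplementedError(frame.aspect)
    be = BE[(frame.tense, frame.agent.number)]
    return f"There {be} {noun_phrase(frame.agent)} {ING_FORMS[frame.event]}"


frame = Event("GARDENING", Entity("MAN", "SG", "INDEF"), "PROGRESSIVE", "PAST")
print(generate_english(frame))  # There was a man gardening
```

The output says "a man", not "an old man": the frame as given on the slide has no slot for "old", so whatever the representation leaves out cannot be generated in any target language.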

  42. Interlingua MT: pros & cons • Pros • One set of rules for each language (instead of for each pair of languages). • Cons • Semantics is hard • Syntactic information is lost!
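The first pro is a matter of counting. A small sketch (hypothetical function names), assuming one transfer rule set per unordered language pair and one interlingua rule set (analysis plus generation) per language. If each translation direction needs its own transfer rules, the transfer figure doubles to n(n-1).

```python
def transfer_rule_sets(n_languages: int) -> int:
    """Transfer: one rule set per unordered pair of languages."""
    return n_languages * (n_languages - 1) // 2


def interlingua_rule_sets(n_languages: int) -> int:
    """Interlingua: one rule set (analysis + generation) per language."""
    return n_languages


for n in (2, 5, 10, 20):
    print(f"{n:2} languages: transfer {transfer_rule_sets(n):3}, "
          f"interlingua {interlingua_rule_sets(n):2}")
```

The transfer count grows quadratically and the interlingua count linearly; for two languages the advantage disappears.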

  43. "Alternative": the statistical approach

  44. What makes a good translation Translators often talk about two factors we want to maximize: • Faithfulness or fidelity • How close is the meaning of the translation to the meaning of the original • Fluency or naturalness • How natural the translation is, just considering its fluency in the target language

  45. Statistical MT: Formalizing Faithfulness and Fluency

  46. By analogy with speech recognition: Bayes' rule. Probability of target sentence T given source sentence S: $P(T \mid S) = \frac{P(S \mid T)\, P(T)}{P(S)}$

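  47. Derivation of Bayes' rule: $P(S, T) = P(S \mid T)\, P(T)$ and $P(S, T) = P(T \mid S)\, P(S)$. So: $P(T \mid S)\, P(S) = P(S \mid T)\, P(T)$. If S is given, $P(S)$ is the same for every candidate T, so $P(T \mid S) \propto P(S \mid T)\, P(T)$ and the best translation is $\hat{T} = \arg\max_T P(S \mid T)\, P(T)$.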

  48. Probability of target sentence T given source sentence S: $P(T \mid S) \propto P(S \mid T)\, P(T)$, i.e. $P(T \mid S) \propto \mathrm{Faithfulness}(S, T) \times \mathrm{Fluency}(T)$
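A toy illustration of choosing the T that maximizes Faithfulness(S, T) × Fluency(T). All probabilities below are invented for illustration, not estimated from any corpus, and the candidate list is hypothetical; a real system would get P(S|T) from a translation model and P(T) from a language model. Scores are summed in log space, which leaves the argmax unchanged.

```python
import math

# Invented toy numbers for one source sentence S (not estimated from data).
FAITHFULNESS = {  # P(S | T)
    "the clear cold penetrated the curtains": 0.20,
    "clear cold penetrate curtain": 0.30,
    "the curtains penetrated the clear cold": 0.05,
}
FLUENCY = {  # P(T)
    "the clear cold penetrated the curtains": 0.010,
    "clear cold penetrate curtain": 0.0001,
    "the curtains penetrated the clear cold": 0.012,
}


def score(t: str) -> float:
    """log P(S|T) + log P(T); log is monotonic, so the argmax is unchanged."""
    return math.log(FAITHFULNESS[t]) + math.log(FLUENCY[t])


candidates = list(FAITHFULNESS)
print("faithfulness only:", max(candidates, key=FAITHFULNESS.get))
print("fluency only:     ", max(candidates, key=FLUENCY.get))
print("both combined:    ", max(candidates, key=score))
```

Faithfulness alone prefers the word-by-word gloss, fluency alone prefers a fluent sentence with the wrong meaning, and only the product picks "the clear cold penetrated the curtains".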
