
Semantic Representation and Formal Transformation


Presentation Transcript


  1. Semantic Representation and Formal Transformation. Kevin Knight, USC/ISI. MURI meeting, CMU, Nov 3-4, 2011.

  2. Machine Translation
  • Phrase-based MT: source string → target string
  • Syntax-based MT: source string → source tree → target tree → target string
  • Meaning-based MT: source string → source tree → meaning representation → target tree → target string
  (Chart: NIST 2009 c2e results.)

  3-5. Meaning-based MT
  Too big for this MURI:
  • What content goes into the meaning representation? (linguistics, annotation)
  • How are meaning representations probabilistically generated, transformed, scored, ranked? (automata theory, efficient algorithms)
  • How can a full MT system be built and tested? (engineering, language modeling, features, training)
  Language-independent theory, but driven by practical desires.

  6. Automata Frameworks
  • How to represent and manipulate linguistic representations?
  • Linguistics, NLP, and automata theory used to be together (1960s, 70s): context-free grammars were invented to model human language; tree transducers were invented to model transformational grammar.
  • They drifted apart.
  • Renewed connections around MT (this century).
  • Role: greatly simplify systems!

  7-13. Finite-State (String) Transducer (FST)
  Worked example: the transducer reads the input string k n i g h t one symbol at a time and writes the output N AY T. Rules (*e* is the empty output):
    q  k → q2 *e*
    q2 n → q  N
    q  i → q  AY
    q  g → q3 *e*
    q3 h → q4 *e*
    q4 t → qfinal T
  The slide sequence steps through the run: k is consumed silently, n emits N, i emits AY, g and h are consumed silently, and t emits T, leaving the machine in the final state qfinal.
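  To make the run concrete, here is a minimal Python sketch of this transducer (the dictionary encoding and function name are illustrative choices, not from the talk's software):

```python
# Minimal sketch of the slides' FST: rules map (state, input symbol)
# to (next state, output symbol), with "" standing for *e* (empty output).
RULES = {
    ("q",  "k"): ("q2", ""),       # q  k -> q2 *e*
    ("q2", "n"): ("q",  "N"),      # q2 n -> q  N
    ("q",  "i"): ("q",  "AY"),     # q  i -> q  AY
    ("q",  "g"): ("q3", ""),       # q  g -> q3 *e*
    ("q3", "h"): ("q4", ""),       # q3 h -> q4 *e*
    ("q4", "t"): ("qfinal", "T"),  # q4 t -> qfinal T
}

def transduce(symbols, start="q", final="qfinal"):
    """Run the transducer; this one is deterministic, real FSTs may branch."""
    state, output = start, []
    for sym in symbols:
        state, out = RULES[(state, sym)]
        if out:
            output.append(out)
    assert state == final, "input not accepted"
    return output

print(transduce("knight"))  # ['N', 'AY', 'T']
```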

  14. Transliteration
  Angela Knight → a n ji ra na i to
  A frequently occurring translation problem for languages with different sound systems and character sets (Japanese, Chinese, Arabic, Russian, English, …). Can't be solved by dictionary lookup.

  15. Transliteration WFST
  Angela Knight: 7 input symbols, 13 output symbols.

  16. Transliteration cascade
  WFSA A: Angela Knight
  WFST B: AE N J EH L UH N AY T
  WFST C: a n j i r a n a i t o
  WFST D

  17. DECODE
  Given the observed string, decoding searches backwards through the cascade, considering millions of alternatives at each level:
  WFSA A: candidate English strings (+ millions more)
  WFST B: candidate phoneme strings, e.g. AE N J IH R UH N AY T, AH N J IH L UH N AY T OH (+ millions more)
  WFST C: a n j i r a n a i t o (+ millions more)
  WFST D
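  A hedged sketch of the cascade in the generative direction; the tables below are toy, unweighted stand-ins for the real WFSA A and WFSTs B, C, D (which score millions of alternatives), and decoding would run the same cascade in reverse:

```python
# Toy stand-ins for the cascade's stages (illustrative data only).
word_to_phonemes = {"angela": ["AE", "N", "J", "EH", "L", "UH"],
                    "knight": ["N", "AY", "T"]}
phoneme_to_sounds = {"AE": ["a"], "N": ["n"], "J": ["j"], "EH": ["i"],
                     "L": ["r"], "UH": ["a"], "AY": ["a", "i"], "T": ["t", "o"]}

def cascade(words):
    """WFSA A output -> WFST B -> WFST C (WFST D, to katakana, omitted)."""
    phonemes = [p for w in words for p in word_to_phonemes[w]]
    return [s for p in phonemes for s in phoneme_to_sounds[p]]

print(cascade(["angela", "knight"]))
# ['a', 'n', 'j', 'i', 'r', 'a', 'n', 'a', 'i', 't', 'o']
# DECODE inverts this: compose the observed string with the inverted
# transducers and search the composed machine for the best-scoring source.
```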

  18. General-Purpose Algorithms for String Automata

  19-24. Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
  Worked example: the English parse tree
    S(NP(PRO he), VP(VBZ enjoys, NP(SBAR(VBG listening, VP(P to, NP music)))))
  is rewritten, top-down, into a Japanese string. Sample weighted rules:
    0.2  q S(x0:NP, VP(x1:VBZ, x2:NP)) → s x0, wa, r x2, ga, q x1
    0.7  s NP(PRO(he)) → kare
  The states q, r, s walk down the tree, translating and reordering subtrees. Final output: kare wa ongaku o kiku no ga daisuki desu
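  Here is a minimal Python sketch of the same top-down transduction; trees are nested tuples, the state functions q, r, s are hand-coded matchers, and the two rules not shown on the slides (for the VBZ and for the SBAR noun phrase) are illustrative guesses that reproduce the slide's final output:

```python
# Top-down tree-to-string transducer sketch. Trees are nested tuples;
# each state is a Python function that matches a subtree and emits tokens.
SBAR_NP = ("NP", ("SBAR", ("VBG", "listening"),
                  ("VP", ("P", "to"), ("NP", "music"))))
TREE = ("S", ("NP", ("PRO", "he")),
        ("VP", ("VBZ", "enjoys"), SBAR_NP))

def q(t):
    if t[0] == "S":                    # q S(x0:NP VP(x1:VBZ x2:NP)) ->
        _, np, (_, vbz, np2) = t       #   s x0, "wa", r x2, "ga", q x1
        return s(np) + ["wa"] + r(np2) + ["ga"] + q(vbz)
    if t == ("VBZ", "enjoys"):
        return ["daisuki", "desu"]     # illustrative lexical rule
    raise ValueError(t)

def s(t):
    if t == ("NP", ("PRO", "he")):
        return ["kare"]                # s NP(PRO(he)) -> kare
    raise ValueError(t)

def r(t):                              # illustrative rule for the SBAR NP
    return ["ongaku", "o", "kiku", "no"]

print(" ".join(q(TREE)))
# kare wa ongaku o kiku no ga daisuki desu
```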

  25. Top-Down Tree Transducer
  Introduced by Rounds (1970) and Thatcher (1970): “Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting …” (Rounds 1970, “Mappings on Grammars and Trees”, Math. Systems Theory 4(3)).
  Large theory literature, e.g., Gécseg & Steinby (1984), Comon et al. (1997).
  Once again re-connecting with NLP practice, e.g., Knight & Graehl (2005), Galley et al. (2004, 2006), May & Knight (2006, 2010), Maletti et al. (2009).

  26. Tree Transducers Can Be Extracted from Bilingual Data (Galley et al., 04)
  Aligned sentence pair: “i felt obliged to do my part” ↔ 我 有 责任 尽 一份 力 (the English side is parsed; its tree uses the labels S, NP-C, VP, VP-C, VBD, SG-C, TO, VB, NPB, PRP, PRP$, NN).
  Extracted tree transducer rules:
    VBD(felt) → 有
    VBN(obliged) → 责任
    VB(do) → 尽
    NN(part) → 一份
    NN(part) → 一份 力
    VP-C(x0:VBN, x1:SG-C) → x0 x1
    VP(TO(to), x0:VP-C) → x0
    …
    S(x0:NP-C, x1:VP) → x0 x1

  27-34. Syntax-Based Decoding
  Input (Chinese): 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
  The decoder parses the foreign string bottom-up with tree-to-string rules:
    RULE 1: DT(these) → 这
    RULE 2: VBP(include) → 中包括
    RULE 4: NNP(France) → 法国
    RULE 5: CC(and) → 和
    RULE 6: NNP(Russia) → 俄罗斯
    RULE 8: NP(NNS(astronauts)) → 宇航 , 员
    RULE 9: PUNC(.) → .
    RULE 10: NP(x0:DT, CD(7), NNS(people)) → x0 , 7人
    RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) → 来自 , x0
    RULE 13: NP(x0:NNP, x1:CC, x2:NNP) → x0 , x1 , x2
    RULE 14: VP(x0:VBP, x1:NP) → x0 , x1
    RULE 15: S(x0:NP, x1:VP, x2:PUNC) → x0 , x1 , x2
    RULE 16: NP(x0:NP, x1:VP) → x1 , 的 , x0
  Derivation, smallest pieces first: the lexical rules yield “these”, “include”, “France”, “and”, “Russia”, “astronauts”, “.”; RULE 13 builds “France and Russia”; RULE 11 builds “coming from France and Russia”; RULE 16 builds “astronauts coming from France and Russia”; RULE 14 builds “include astronauts coming from France and Russia”; RULE 10 builds “these 7 people”; finally RULE 15 assembles “These 7 people include astronauts coming from France and Russia .”
  Derived English tree: S(NP(DT These, CD 7, NNS people), VP(VBP include, NP(NP(NNS astronauts), VP(VBG coming, PP(IN from, NP(NNP France, CC and, NNP Russia))))), PUNC .)
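  As a sanity check, the following Python sketch runs these rules in the generative direction, turning the derived English tree back into the Chinese input string (the pattern-matching dispatch is an illustrative encoding, not the decoder's actual chart-parsing algorithm):

```python
# Tree-to-string rule application, English tree in -> Chinese tokens out.
TREE = ("S",
        ("NP", ("DT", "these"), ("CD", "7"), ("NNS", "people")),
        ("VP", ("VBP", "include"),
               ("NP", ("NP", ("NNS", "astronauts")),
                      ("VP", ("VBG", "coming"),
                             ("PP", ("IN", "from"),
                                    ("NP", ("NNP", "France"),
                                           ("CC", "and"),
                                           ("NNP", "Russia")))))),
        ("PUNC", "."))

LEX = {("DT", "these"): ["这"], ("VBP", "include"): ["中包括"],
       ("NNP", "France"): ["法国"], ("CC", "and"): ["和"],
       ("NNP", "Russia"): ["俄罗斯"], ("PUNC", "."): ["."]}

def emit(t):
    if t in LEX:                                    # RULES 1, 2, 4, 5, 6, 9
        return LEX[t]
    label, kids = t[0], t[1:]
    if t == ("NP", ("NNS", "astronauts")):
        return ["宇航", "员"]                        # RULE 8
    if label == "NP" and kids[0][0] == "DT":
        return emit(kids[0]) + ["7人"]               # RULE 10
    if label == "VP" and kids[0][0] == "VBG":
        return ["来自"] + emit(kids[1][2])           # RULE 11
    if label == "NP" and len(kids) == 3:
        return emit(kids[0]) + emit(kids[1]) + emit(kids[2])  # RULE 13
    if label == "NP" and len(kids) == 2:
        return emit(kids[1]) + ["的"] + emit(kids[0])         # RULE 16
    if label in ("VP", "S"):
        return [tok for k in kids for tok in emit(k)]         # RULES 14, 15
    raise ValueError(t)

print(" ".join(emit(TREE)))
# 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
```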

  35. General-Purpose Algorithms for Tree Automata

  36. Machine Translation
  • Phrase-based MT: source string → target string
  • Syntax-based MT: source string → source tree → target tree → target string
  • Meaning-based MT: source string → source tree → meaning representation → target tree → target string

  37. Five Equivalent Meaning Representation Formats
  “The boy wants to go.”
  LOGICAL FORM:
    ∃ w, b, g : instance(w, WANT) ∧ instance(b, BOY) ∧ instance(g, GO) ∧ agent(w, b) ∧ patient(w, g) ∧ agent(g, b)
  PATH EQUATIONS:
    (x0 instance) = WANT
    (x1 instance) = BOY
    (x2 instance) = GO
    (x0 agent) = x1
    (x0 patient) = x2
    (x2 agent) = x1
  PENMAN:
    (w / WANT :agent (b / BOY) :patient (g / GO :agent b))
  DIRECTED ACYCLIC GRAPH:
    nodes with instance labels WANT, BOY, GO; agent and patient edges, with the BOY node shared as the agent of both WANT and GO.
  FEATURE STRUCTURE:
    [instance: WANT
     agent:    [1] [instance: BOY]
     patient:  [instance: GO
                agent: [1]]]
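  The formats are mechanically interconvertible. As an illustration, this Python sketch stores the graph as relation triples (essentially the logical form) and prints the PENMAN form from them; the recursive traversal is an illustrative choice:

```python
# Meaning graph as triples; PENMAN printing with reentrancy handling.
triples = [("w", "instance", "WANT"),
           ("b", "instance", "BOY"),
           ("g", "instance", "GO"),
           ("w", "agent",    "b"),
           ("w", "patient",  "g"),
           ("g", "agent",    "b")]

def penman(var, seen=None):
    seen = seen or set()
    if var in seen:                 # reentrant node: print the variable only
        return var
    seen.add(var)
    concept = next(o for s, r, o in triples if s == var and r == "instance")
    args = "".join(f" :{r} {penman(o, seen)}"
                   for s, r, o in triples if s == var and r != "instance")
    return f"({var} / {concept}{args})"

print(penman("w"))
# (w / WANT :agent (b / BOY) :patient (g / GO :agent b))
```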

  38. Example
  “Government forces closed on rebel outposts on Thursday, showering the western mountain city of Zintan with missiles and attacking insurgents holed up near the Tunisian border, according to rebel sources.”
  (s / say
    :agent (s2 / source :mod (r / rebel))
    :patient (a / and
      :op1 (c / close-on
        :agent (f / force :mod (g / government))
        :patient (o / outpost :mod (r2 / rebel))
        :temporal-locating (t / thursday))
      :op2 (s3 / shower
        :agent f
        :patient (c2 / city :mod (m / mountain) :mod (w / west) :name "Zintan")
        :instrument (m2 / missile))
      :op3 (a2 / attack
        :agent f
        :patient (i / insurgent
          :agent-of (h / hole-up
            :pp-near (b / border
              :gpi (c3 / country :name "Tunisia")))))))
  Slogan: “more logical than a parse tree”

  39. General-Purpose Algorithms for Feature Structures (Graphs)

  40. Automata Frameworks
  • Hyperedge-replacement graph grammars (Drewes et al.)
  • DAG acceptors (Hart 75)
  • DAG-to-tree transducers (Kamimura & Slutzki 82)

  41-45. Mapping Between Meaning and Text
  Each meaning graph (paired with foreign text) maps to a different English realization, depending on which roles the SEE node carries and where its edges point (a repeated BOY indicates a shared, reentrant node):
  • WANT(:agent BOY, :patient SEE(:agent BOY)) → “the boy wants to see”
  • WANT(:agent BOY, :patient SEE(:patient BOY)) → “the boy wants to be seen”
  • WANT(:agent BOY, :patient SEE(:patient GIRL)) → “the boy wants the girl to be seen”
  • WANT(:agent BOY, :patient SEE(:agent BOY, :patient GIRL)) → “the boy wants to see the girl”
  • WANT(:agent BOY, :patient SEE(:agent BOY, :patient BOY)) → “the boy wants to see himself”
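  The pattern behind these five graphs can be stated as a toy rule set: the English construction is chosen by which roles SEE carries and whether they are reentrant with the WANT agent. A small illustrative Python realizer (all names hypothetical, not from the talk):

```python
# Toy realizer for the WANT/SEE graphs: role structure selects the construction.
def realize(wanter, see_agent=None, see_patient=None):
    if see_agent == wanter and see_patient == wanter:
        return f"the {wanter} wants to see himself"      # reentrant patient
    if see_agent == wanter:
        obj = f" the {see_patient}" if see_patient else ""
        return f"the {wanter} wants to see{obj}"         # control
    if see_patient == wanter:
        return f"the {wanter} wants to be seen"          # passive
    return f"the {wanter} wants the {see_patient} to be seen"

print(realize("boy", see_agent="boy"))                      # wants to see
print(realize("boy", see_patient="boy"))                    # wants to be seen
print(realize("boy", see_patient="girl"))                   # the girl to be seen
print(realize("boy", see_agent="boy", see_patient="girl"))  # to see the girl
print(realize("boy", see_agent="boy", see_patient="boy"))   # to see himself
```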

  46-50. Bottom-Up DAG-to-Tree Transduction (Kamimura & Slutzki 82)
  Worked example on the meaning graph
    PROMISE(:agent BOY, :recipient GIRL, :patient GO(:agent BOY))
  The transducer works leaves-first: GIRL is rewritten to the state/output pair q.NN | girl, BOY to q.NN | boy, and GO to q.VBZ | goes; states and partial output trees then propagate up toward the PROMISE root.
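  A rough Python sketch of this bottom-up pass, assuming a memoized traversal so that the shared BOY node is transduced only once; the lexicon entry for PROMISE and the tuple encoding of state-plus-partial-tree are illustrative guesses, not the paper's construction:

```python
# Bottom-up DAG-to-tree transduction sketch on the slides' PROMISE graph
# ("the boy promises the girl to go").
from functools import lru_cache

graph = {"PROMISE": {"agent": "BOY", "recipient": "GIRL", "patient": "GO"},
         "GO":      {"agent": "BOY"},
         "GIRL":    {},
         "BOY":     {}}

lexicon = {"GIRL": ("q.NN", "girl"), "BOY": ("q.NN", "boy"),
           "GO":   ("q.VBZ", "goes"), "PROMISE": ("q.VBZ", "promises")}

@lru_cache(maxsize=None)        # the shared BOY node is transduced only once
def transduce(node):
    state, word = lexicon[node]
    kids = tuple((role, transduce(tgt))
                 for role, tgt in sorted(graph[node].items()))
    return (state, word, kids)  # a state paired with a partial output tree

print(transduce("PROMISE"))
```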
