
Learning to Transform Natural to Formal Language




  1. Learning to Transform Natural to Formal Language Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney Presented by Ping Zhang

  2. Overview • Background • SILT • CLANG and GEOQUERY • Semantic Parsing using Transformation rules • String-based learning • Tree-based learning • Experiments • Future work • Conclusion

  3. Natural Language Processing (NLP) • Natural Language: human language, e.g., English. • Why process NL: • To provide a much more user-friendly interface. • Problems: • NL is highly complex. • NL contains many ambiguities. • As a result, NL still cannot be used to program a computer.

  4. Classification of Languages • Traditional classification (Chomsky hierarchy): • Regular grammar • Context-free grammar (formal languages) • Context-sensitive grammar • Unrestricted grammar (natural languages) • Currently, all programming languages are at most context-sensitive in expressive power. • For example, C++ is a restricted context-sensitive language.

  5. An Approach to Processing NL • Map a natural language to a formal query or command language: English → (map) → Formal Language → compiler/interpreter. • NL interfaces to complex computing and AI systems can then be developed much more easily.

  6. Grammar Terms • Grammar: G = (N, T, S, P) • N: finite set of Non-terminal symbols • T: finite set of Terminal symbols • S: Starting non-terminal symbol, S∈N • P: finite set of productions • Production: x->y • For example, • Noun -> “computer” • AssignmentStatement -> i := 10; • Statements -> Statement; Statements
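To make the grammar 4-tuple concrete, here is a minimal sketch in Python representing G = (N, T, S, P) as plain data; the names and productions are illustrative, mirroring the slide's examples rather than any actual implementation:

    # A minimal sketch of a grammar G = (N, T, S, P) as plain data.
    grammar = {
        "N": {"Statements", "Statement", "AssignmentStatement", "Noun"},  # non-terminals
        "T": {"computer", "i", ":=", "10", ";"},                          # terminals
        "S": "Statements",                                                # start symbol, S in N
        "P": [                                                            # productions x -> y
            ("Noun", ["computer"]),
            ("AssignmentStatement", ["i", ":=", "10", ";"]),
            ("Statements", ["Statement", ";", "Statements"]),
        ],
    }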

  7. SILT • SILT: Semantic Interpretation by Learning Transformations. • Transformation rules map substrings in NL sentences, or subtrees in their corresponding syntactic parse trees, to subtrees of the formal-language parse tree. • SILT learns transformation rules from training data: pairs of NL sentences and manually translated formal-language statements. • Two target formal languages: • CLANG • GEOQUERY

  8. CLANG • A formal language for coaching robotic soccer in the RoboCup Coach Competition. • The CLANG grammar consists of 37 non-terminals and 133 productions. • All tactics and behaviors are expressed as if-then rules. • An example: • ( (bpos (penalty-area our) ) (do (player-except our {4} ) (pos (half our) ) ) ) • "If the ball is in our penalty area, all our players except player 4 should stay in our half."
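Because CLANG terms are fully parenthesized, they can be read like Lisp s-expressions. Below is a hedged Python sketch (not the paper's code) that parses the example above into a nested tree:

    import re

    def parse_sexp(text):
        """Parse a parenthesized CLANG-style term into nested Python lists."""
        tokens = re.findall(r"[(){}]|[^\s(){}]+", text)
        def read(pos):
            tok = tokens[pos]
            if tok in "({":                     # open bracket: read children
                node, pos = [], pos + 1
                while tokens[pos] not in ")}":
                    child, pos = read(pos)
                    node.append(child)
                return node, pos + 1            # skip the closing bracket
            return tok, pos + 1                 # atom
        tree, _ = read(0)
        return tree

    rule = "((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))"
    print(parse_sexp(rule))
    # [['bpos', ['penalty-area', 'our']],
    #  ['do', ['player-except', 'our', ['4']], ['pos', ['half', 'our']]]]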

  9. GEOQUERY • A database query language for a small database of U.S. geography. • The database contains about 800 facts. • Based on Prolog, augmented with meta-predicates. • An example: • answer(A, count(B, (city(B), loc(B, C), const(C, countryid(usa) ) ), A) ) • "How many cities are there in the US?"

  10. Two Methods • String-based transformation learning: • Directly maps strings of NL sentences to parse trees of the formal language. • Tree-based transformation learning: • Maps subtrees of the NL syntactic parse tree to subtrees of the formal-language parse tree. • Assumes a syntactic parser for the NL sentences is provided.

  11. Semantic Parsing • Pattern matching: patterns found in NL ↔ templates based on formal-grammar productions; NL phrases ↔ formal expressions. • Rule representation for the two methods: the pattern "TEAM UNUM has the ball" maps to the template CONDITION → (bowner TEAM {UNUM}). • [The slide shows the syntactic parse tree of "TEAM UNUM has the ball": S → NP (TEAM UNUM) VP; VP → VBZ (has) NP; NP → DT (the) NN (ball).]

  12. Examples of Parsing • "If our player 4 has the ball, our player 4 should shoot." • "If TEAM UNUM has the ball, TEAM UNUM should ACTION." with TEAM = our, UNUM = 4, ACTION = (shoot) • "If CONDITION, TEAM UNUM should ACTION." with CONDITION = (bowner our {4}), TEAM = our, UNUM = 4, ACTION = (shoot) • "If CONDITION, DIRECTIVE." with CONDITION = (bowner our {4}), DIRECTIVE = (do our {4} (shoot)) • Result: RULE ( (bowner our {4}) (do our {4} (shoot)) ) • (A runnable sketch of this bottom-up process follows below.)
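The derivation above is essentially repeated bottom-up rewriting: each rule replaces a matched NL pattern with a non-terminal carrying the formal expression built so far. A hedged Python sketch of that process follows; the hand-written rules and token handling are illustrative only, since real SILT rules are learned from data:

    # Each rule maps a pattern (sequence of words / non-terminals) to a
    # non-terminal plus a formal-language template; {0}, {1}, ... splice in
    # the formal expressions bound to the pattern's non-terminals.
    RULES = [
        (["our"], ("TEAM", "our")),
        (["player", "4"], ("UNUM", "4")),
        (["TEAM", "UNUM", "has", "the", "ball"], ("CONDITION", "(bowner {0} {{{1}}})")),
        (["TEAM", "UNUM", "should", "shoot"], ("DIRECTIVE", "(do {0} {{{1}}} (shoot))")),
        (["if", "CONDITION", ",", "DIRECTIVE", "."], ("RULE", "({0} {1})")),
    ]

    def parse(tokens):
        """tokens: list of (symbol, formal-expression-or-None) pairs."""
        changed = True
        while changed:
            changed = False
            for pattern, (lhs, template) in RULES:
                for i in range(len(tokens) - len(pattern) + 1):
                    window = tokens[i:i + len(pattern)]
                    if [s for s, _ in window] == pattern:
                        # splice the matched non-terminals' formal expressions
                        args = [f for _, f in window if f is not None]
                        tokens[i:i + len(pattern)] = [(lhs, template.format(*args))]
                        changed = True
                        break       # rescan from the next rule

    sent = "if our player 4 has the ball , our player 4 should shoot .".split()
    state = [(w, None) for w in sent]   # lowercased, pre-tokenized sentence
    parse(state)
    print(state)
    # [('RULE', '((bowner our {4}) (do our {4} (shoot)))')]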

  13. Variations of Rule Representation • SILT allows patterns to skip some words or nodes: • "if CONDITION, <1> DIRECTIVE." where <1> can match, e.g., "then". • To deal with non-compositionality, SILT allows constraints on when rules apply: • "in REGION" matches "CONDITION → (bpos REGION)" only if "in REGION" follows "the ball <1>". • SILT allows templates that use multiple productions: • "TEAM player UNUM has the ball in REGION" → CONDITION → (and (bowner TEAM {UNUM}) (bpos REGION))

  14. Learning Transformation Rules • Input: a training set T of NL sentences paired with formal representations; a set Π of productions in the formal grammar. • Output: a learned rule base L. • Algorithm: • Parse all formal representations in T using Π. • Collect positive examples P and negative examples N for every π ∈ Π: given an NL sentence S, S is positive for π if π is used in the formal expression of S, and negative otherwise. • L = ∅. • Until all positive examples are covered, or no more good rules can be found for any π ∈ Π, do: • R′ = FindBestRule(Π, P, N) • L = L ∪ R′ • Apply the rules in L to the sentences in T.
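A hedged Python sketch of this covering loop; uses_production() and find_best_rule() are assumed helpers passed in as parameters (the latter is detailed on slide 16), and the rule object's production and matches() members are likewise assumptions, not the paper's code:

    def learn_rule_base(train, productions, uses_production, find_best_rule):
        """Sketch of SILT's covering loop. train: (sentence, formal) pairs."""
        # A sentence is a positive example for production pi iff pi appears
        # in the parse of its formal representation, else a negative example.
        pos = {pi: [s for s, f in train if uses_production(f, pi)]
               for pi in productions}
        neg = {pi: [s for s, f in train if not uses_production(f, pi)]
               for pi in productions}
        learned = []
        while any(pos.values()):                  # positives left to cover
            rule = find_best_rule(productions, pos, neg)
            if rule is None:                      # no more good rules
                break
            learned.append(rule)
            # remove the positive examples the new rule now covers
            pos[rule.production] = [s for s in pos[rule.production]
                                    if not rule.matches(s)]
        return learned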

  15. Issues of SILT Learning • Non-compositionality. • Rule cooperation: • Rules are learned in order. • Therefore an over-general ancestor leads to a group of over-general child rules, and no other rule can cooperate with such rules. • Two approaches can address this: • Find the single best rule across all competing productions in each iteration. • Over-generate rules, then find a subset that cooperates.

  16. FindBestRule() for String-based Learning • Input: a set Π of productions in the formal grammar; sets of positive examples P and negative examples N for each π ∈ Π. • Output: the best rule BR. • Algorithm: • R = ∅. • For each production π ∈ Π: • Let Rπ be the maximally specific rules derived from the positive examples of π. • Repeat k = 1000 times: choose r1, r2 ∈ Rπ at random; g = GENERALIZE(r1, r2, π); add g to Rπ. • R = R ∪ Rπ. • BR = argmax r ∈ R goodness(r). • Remove the positive examples covered by BR from P.
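A hedged Python sketch of this randomized search; make_specific(), generalize(), and goodness() are assumed helpers corresponding to the operations named on this and the next slide, not the paper's implementation, and here the caller removes the covered positives:

    import random

    def find_best_rule(productions, pos, neg,
                       make_specific, generalize, goodness, k=1000):
        """Sketch of the randomized generalization search on this slide."""
        candidates = []
        for pi in productions:
            # start from maximally specific rules, one per positive example
            rules_pi = [make_specific(s, pi) for s in pos[pi]]
            for _ in range(k):
                if len(rules_pi) < 2:
                    break
                r1, r2 = random.sample(rules_pi, 2)       # two rules at random
                rules_pi.append(generalize(r1, r2, pi))   # add g to R_pi
            candidates.extend(rules_pi)                   # R = R ∪ R_pi
        if not candidates:
            return None
        return max(candidates, key=lambda r: goodness(r, pos, neg))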

  17. FindBestRule() Cont. • goodness(r): scores a candidate rule by trading off the positive examples it covers against the negatives it matches (formula shown on slide). • GENERALIZE(r1, r2, π): r1 and r2 are two transformation rules based on the same production. • For example: • π: REGION → (penalty-area TEAM) • pattern 1: "TEAM 's penalty box" • pattern 2: "TEAM penalty area" • Generalization: "TEAM <1> penalty"
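One plausible implementation of this string generalization is a longest-common-word-subsequence computation in which each skipped stretch becomes a word-gap token. A minimal Python sketch, illustrative only:

    from difflib import SequenceMatcher

    def generalize(p1, p2):
        """Keep the longest common word subsequence of two patterns and
        turn each skipped stretch into a word-gap token like <1>."""
        a, b = p1.split(), p2.split()
        out, ai, bi = [], 0, 0
        for blk in SequenceMatcher(None, a, b).get_matching_blocks():
            if blk.size == 0:           # terminator block: drop trailing gap
                break
            gap = max(blk.a - ai, blk.b - bi)
            if gap and out:             # gap between two matched stretches
                out.append(f"<{gap}>")
            out.extend(a[blk.a:blk.a + blk.size])
            ai, bi = blk.a + blk.size, blk.b + blk.size
        return " ".join(out)

    print(generalize("TEAM 's penalty box", "TEAM penalty area"))
    # TEAM <1> penalty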

  18. Tree-based Learning • Uses a similar FindBestRule() algorithm. • GENERALIZE: find the largest common subgraph of the two rules' pattern trees. • For example, with π: REGION → (penalty-area TEAM): • Pattern 1: the syntactic parse tree of "TEAM 's penalty box". • Pattern 2: the syntactic parse tree of "TEAM penalty area". • Generalization: their largest common subtree, which keeps TEAM and "penalty". • [The slide shows the three parse trees.] (A simplified sketch follows below.)
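For intuition, here is a greatly simplified Python sketch of tree generalization that keeps only the positions where two parse trees agree top-down; the paper's largest-common-subgraph search is more powerful (it would also align TEAM across the two differently shaped trees):

    def common_subtree(t1, t2):
        """Trees are (label, [children]) tuples; returns the shared
        top-down part, or None if the roots already differ."""
        if t1[0] != t2[0]:
            return None
        kids = [common_subtree(c1, c2) for c1, c2 in zip(t1[1], t2[1])]
        return (t1[0], [k for k in kids if k is not None])

    p1 = ("NP", [("NP", [("TEAM", []), ("POS", [("'s", [])])]),
                 ("NN", [("penalty", [])]), ("NN", [("box", [])])])
    p2 = ("NP", [("PRP$", [("TEAM", [])]),
                 ("NN", [("penalty", [])]), ("NN", [("area", [])])])
    print(common_subtree(p1, p2))
    # ('NP', [('NN', [('penalty', [])]), ('NN', [])]) -- positional match only;
    # a full subgraph search would also recover the shared TEAM node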

  19. Experiments • For CLANG: • 300 instructions selected randomly from the log files of the 2003 RoboCup Coach Competition. • Each formal instruction was translated into English by humans. • The average NL sentence length is 22.52 words. • For GEOQUERY: • 250 questions collected from undergraduate students. • All English queries were translated into the formal language manually. • The average NL sentence length is 6.87 words.

  20. Results for CLANG [results shown on slide]

  21. Results for CLANG (Cont.) [results shown on slide]

  22. Results for GEOQUERY [results shown on slide]

  23. Results for GEOQUERY (Cont.) [results shown on slide]

  24. Time Consumption • Time consumption, in minutes. [table shown on slide]

  25. Future Work • Though improved, SILT still lacks the robustness of statistical parsing. • The hard-matching symbolic rules of SILT are sometimes too brittle. • Build a more unified implementation of tree-based SILT that allows directly comparing and evaluating the benefit of using initial syntactic parses.

  26. Conclusion • A novel approach, SILT, learns transformation rules that map NL sentences into a formal language. • It shows better overall performance than previous approaches. • NLP still has a long way to go.

  27. Thank you! Questions or comments?
