Generation
Aims of this talk
• Discuss MRS and LKB generation
• Describe larger research programme: modular generation
• Mention some interactions with other work in progress:
  • RMRS
  • SEM-I
Outline of talk
• Towards modular generation
• Why MRS?
• MRS and chart generation
• Data-driven techniques
• SEM-I and documentation
Modular architecture
Language-independent component → meaning representation → language-dependent realization → string or speech output
Desiderata for a portable realization module
• Application independent
• Any well-formed input should be accepted
• No grammar-specific/conventional information should be essential in the input
• Output should be idiomatic
Architecture (preview)
External LF → (SEM-I) → Internal LF → specialization modules → chart generator → control modules → String
Why MRS?
• Flat structures
  • independence of syntax: conventional LFs partially mirror tree structure
  • manipulation of individual components: can ignore scope structure etc.
  • lexicalised generation
  • composition by accumulation of EPs: robust composition
• Underspecification
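A minimal sketch of the flat-structure point, with illustrative names rather than the LKB's actual data structures: an MRS is just a bag of elementary predications plus handle constraints, so composition is accumulation of EPs and scope can be ignored when it is not needed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EP:
    label: str        # e.g. "lb3"
    pred: str         # e.g. "_chase_v"
    args: tuple       # e.g. ("e", "x", "y")

@dataclass
class MRS:
    eps: list         # flat bag of EPs; no tree structure mirroring syntax
    hcons: list       # qeq constraints, e.g. ("h9", "qeq", "lb2")

def compose(mrs1, mrs2):
    # composition by accumulation of EPs: robust and order-insensitive
    return MRS(eps=mrs1.eps + mrs2.eps, hcons=mrs1.hcons + mrs2.hcons)
```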
An excursion: Robust MRS
• Deep Thought: integration of deep and shallow processing via compatible semantics
• All components construct RMRSs
• Principled way of building robustness into deep processing
• Requirements for consistency etc. help human users too
Extreme flattening of deep output
[Figure: nested logical forms for the same content, flattened to the RMRS below]
lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6),
lb2:cat_n(x),
lb5:dog_n_1(y),
lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7),
lb3:chase_v(e), ARG1(lb3,x), ARG2(lb3,y),
h9 qeq lb2, h8 qeq lb5
Extreme Underspecification
• Factorize the deep representation into minimal units
• Only represent what you know
• Robust MRS:
  • separate relations
  • separate arguments
  • explicit equalities
  • conventions for predicate names and sense distinctions
  • hierarchy of sorts on variables
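A minimal sketch of the factorization idea, not the DELPH-IN data structures: an RMRS is a bag of independent facts (relations, arguments, equalities), so a shallow component can emit only the pieces it actually knows.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rel:            # lb3:chase_v(e)
    label: str
    pred: str
    arg0: str

@dataclass(frozen=True)
class Arg:            # ARG1(lb3, x)
    role: str
    label: str
    value: str

@dataclass(frozen=True)
class Eq:             # explicit variable equality, stated only when known
    var1: str
    var2: str

@dataclass
class RMRS:
    rels: list = field(default_factory=list)
    args: list = field(default_factory=list)
    eqs: list = field(default_factory=list)

# deep parser output: the relation plus both of its arguments
deep = RMRS(
    rels=[Rel("lb3", "_chase_v", "e")],
    args=[Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y")],
)

# a shallow component may only know the relation; the arguments are simply absent
shallow = RMRS(rels=[Rel("lb3", "_chase_v", "e")])
```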
Chart generation with the LKB
• Determine lexical signs from MRS
• Determine possible rules contributing EPs (`construction semantics’: compound rule etc.)
• Instantiate signs (lexical and rule) according to variable equivalences
• Apply lexical rules
• Instantiate chart
• Generate by parsing without string position
• Check output against input
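A hypothetical sketch of this pipeline as a whole; the helper names (lookup_lexical_signs, skolemize, apply_lexical_rules, Chart) are illustrative, not the LKB's actual API.

```python
def generate(input_mrs, grammar):
    # 1. Lexical lookup: map the input EPs to candidate lexical signs.
    signs = lookup_lexical_signs(input_mrs, grammar.lexicon)

    # 2. Rules that themselves contribute EPs (construction semantics) are
    #    only considered if their EP occurs in the input.
    rules = [r for r in grammar.rules
             if not r.eps or all(ep in input_mrs.eps for ep in r.eps)]

    # 3. Instantiate signs and rules according to the input's variable
    #    equivalences ("Skolemization").
    signs = [skolemize(s, input_mrs) for s in signs]

    # 4. Apply lexical rules, including morphological generation.
    signs = apply_lexical_rules(signs, grammar.lexical_rules, input_mrs)

    # 5. Seed the chart and generate by parsing without string positions.
    chart = Chart(signs)
    edges = chart.combine(rules)

    # 6. Keep only complete edges whose semantics matches the input.
    return [e.string for e in edges if covers_all_eps(e, input_mrs)]
```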
Lexical lookup for generation
• _like_v_1(e,x,y) – returns the lexical entry for sense 1 of the verb like
• temp_loc_rel(e,x,y) – returns multiple lexical entries
• multiple relations in one lexical entry: e.g., who, where
• entries with null semantics: heuristics
Instantiation of entries
• _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy")
  • find the locations corresponding to `x’ in all FSs
  • replace all `x’s with a constant
  • repeat for `y’ etc.
• Also applies to rules contributing construction semantics
• `Skolemization’ (a misleading name ...)
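A minimal sketch of the `Skolemization’ step (illustrative names, not the LKB implementation): every occurrence of an input variable inside a sign's feature structure is replaced by a unique constant, so that later unification can only equate positions the input semantics says are equal.

```python
def skolemize(sign, input_mrs):
    # one fresh constant per input variable, e.g. x -> "skolem_x"
    constants = {var: f"skolem_{var}" for var in input_mrs.variables}
    for path, value in sign.feature_structure.items():
        if value in constants:
            sign.feature_structure[path] = constants[value]
    return sign
```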
Lexical rule application
• Lexical rules that contribute EPs are only used if the EP is in the input
• Inflectional rules only apply if the variable has the correct sort
• Lexical rule application does morphological generation (e.g., liked, bought)
Chart generation proper
• Possible lexical signs are added to a chart structure
• Currently no indexing of chart edges
  • chart generation can use semantic indices, but current results suggest this doesn’t help
• Rules applied as for chart parsing: edges checked for compatibility with the input semantics (a bag of EPs)
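A minimal sketch of the compatibility check under the assumption that each edge records which input EPs it covers: two edges may only combine if their EP sets are disjoint and the result stays inside the input bag, and a complete realisation must cover every input EP (the root condition on the next slide).

```python
def compatible(edge1, edge2, input_eps):
    covered = edge1.eps | edge2.eps
    return (not (edge1.eps & edge2.eps)     # no EP realised twice
            and covered <= input_eps)       # nothing outside the input bag

def is_complete(edge, input_eps):
    # root condition: a complete structure consumes all input EPs
    return edge.eps == input_eps
```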
Root conditions
• Complete structures must consume all the EPs in the input MRS
• Should check for compatibility of scopes:
  • precise qeq matching is (probably) too strict
  • requiring exactly the same scopes is (probably) unrealistic and too slow
Generation failures due to MRS issues
• Well-formedness check prior to input to the generator (optional)
• Lexical lookup failure: predicate doesn’t match an entry, wrong arity, wrong variable types
• Unwanted instantiations of variables
• Missing EPs in the input: syntax (e.g., no noun), lexical selection
• Too many EPs in the input: e.g., two verbs and no coordination
Improving generation via corpus-based techniques
• CONTROL: e.g., intersective modifier order
  • logical representation does not determine order
  • wet(x) & weather(x) & cold(x)
• UNDERSPECIFIED INPUT: e.g.,
  • determiners: none / a / the
  • prepositions: in / on / at
Constraining generation for idiomatic output
• Intersective modifier order: e.g., adjectives, prepositional phrases
• Logical representation does not determine order
• wet(x) & weather(x) & cold(x)
Adjective ordering
• Constraints / preferences:
  • big red car
  • * red big car
  • cold wet weather
  • wet cold weather (OK, but dispreferred)
• Difficult to encode in a symbolic grammar
Corpus-derived adjective ordering
• n-grams perform poorly
• Thater: direct evidence plus clustering
• positional probability
• Malouf (2000): memory-based learning plus positional probability: 92% on the BNC
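A minimal sketch of the positional-probability idea: estimate from corpus counts how often an adjective occurs first in a prenominal adjective sequence, and order candidates by that estimate. The counts and smoothing here are illustrative, not Malouf's exact model.

```python
from collections import Counter

first_counts, total_counts = Counter(), Counter()

def observe(adj_sequence):
    # update counts from one attested prenominal adjective sequence
    for i, adj in enumerate(adj_sequence):
        total_counts[adj] += 1
        if i == 0:
            first_counts[adj] += 1

def positional_prob(adj):
    # P(adj occurs first), with add-one smoothing for unseen adjectives
    return (first_counts[adj] + 1) / (total_counts[adj] + 2)

def order(adjectives):
    # adjectives with a higher "first" probability are realised earlier
    return sorted(adjectives, key=positional_prob, reverse=True)

# e.g. after observing corpus data, order(["wet", "cold"]) -> ["cold", "wet"]
```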
Underspecified input to generation
We bought a car on Friday
Accept:
  pron(x) & a_quant(y,h1,h2) & car(y) & buy(epast,x,y) & on(e,z) & named(z,Friday)
and:
  pron(x) & general_q(y,h1,h2) & car(y) & buy(epast,x,y) & temp_loc(e,z) & named(z,Friday)
and maybe:
  pron(x1pl) & car(y) & buy(epast,x,y) & temp_loc(e,z) & named(z,Friday)
Guess the determiner
• We went climbing in _ Andes
• _ president of _ United States
• I tore _ pyjamas
• I tore _ duvet
• George doesn’t like _ vegetables
• We bought _ new car yesterday
Determining determiners
• Determiners are partly conventionalized, often predictable from local context
• Applications: translation from Japanese etc., speech prosthesis
• More `meaning-rich’ determiners are assumed to be specified in the input
• Minnen et al.: 85% on the WSJ (using TiMBL)
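A hypothetical sketch in the spirit of the memory-based (TiMBL-style) approach: store feature vectors for noun phrases seen in a corpus and predict the determiner of a new NP from its nearest stored neighbours. The feature set and overlap metric are invented for illustration.

```python
from collections import Counter

memory = []   # list of (features, determiner) pairs

def train(features, determiner):
    memory.append((features, determiner))

def predict(features, k=5):
    # overlap metric: count shared feature values, take the k closest instances
    scored = sorted(memory,
                    key=lambda ex: sum(ex[0].get(f) == v for f, v in features.items()),
                    reverse=True)
    votes = Counter(det for _, det in scored[:k])
    return votes.most_common(1)[0][0] if votes else "the"

# e.g. train({"head": "Andes", "number": "pl", "proper": True}, "the")
#      predict({"head": "Alps", "number": "pl", "proper": True})  ->  "the"
```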
Preposition guessing
• Choice between temporal in/on/at:
  • in the morning
  • in July
  • on Wednesday
  • on Wednesday morning
  • at three o’clock
  • at New Year
• The ERG uses hand-coded rules and lexical categories
• A machine-learning approach gives very high precision and recall on the WSJ and good results on a balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
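A minimal sketch of the hand-coded-rule style of solution: map the lexical class of the temporal noun to a preposition. The tiny class lexicon and the weekday override below are illustrative, not the ERG's actual rules.

```python
NOUN_CLASS = {
    "morning": "part_of_day", "July": "month",
    "Wednesday": "weekday", "three o'clock": "clock_time", "New Year": "holiday",
}

CLASS_TO_PREP = {
    "part_of_day": "in", "month": "in",
    "weekday": "on",
    "clock_time": "at", "holiday": "at",
}

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def temporal_preposition(noun_phrase, head_noun):
    # "on Wednesday morning": a weekday modifier overrides the part-of-day head
    if any(day in noun_phrase for day in WEEKDAYS):
        return "on"
    return CLASS_TO_PREP.get(NOUN_CLASS.get(head_noun, ""), "at")
```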
SEM-I: semantic interface
• Meta-level: manually specified `grammar’ relations (constructions and closed-class items)
• Object-level: linked to the lexical database for deep grammars
• Definitional: e.g., lemma + POS + sense
• Linked test suites, examples, documentation
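A hypothetical sketch of what an object-level SEM-I record might carry for one predicate, and how it could be used to vet generator input; the field names are illustrative, not the actual SEM-I format.

```python
from dataclasses import dataclass

@dataclass
class SemIEntry:
    predicate: str       # lemma + POS + sense, e.g. "_like_v_1"
    arity: int
    arg_types: tuple     # variable sorts expected for each argument
    example: str         # linked test-suite example
    documentation: str

like_entry = SemIEntry(
    predicate="_like_v_1",
    arity=3,
    arg_types=("e", "x", "x"),
    example="Kim likes Sandy.",
    documentation="Sense 1 of the verb 'like'.",
)

def check_ep(pred, args, semi):
    # reject generator input whose predicate or arity is not in the SEM-I
    entry = semi.get(pred)
    return entry is not None and len(args) == entry.arity
```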
SEM-I development
• SEM-I eventually forms the `API’: stable, changes negotiated
• SEM-I vs the Verbmobil SEMDB:
  • technical limitations of SEMDB
  • too painful!
• `Munging’ rules: external vs internal
• SEM-I development must be incremental
Role of SEM-I in architecture
• Offline:
  • definition of `correct’ (R)MRS for developers
  • documentation
  • checking of test suites
• Online:
  • in unifier/selector: reject invalid RMRSs
  • patching up input to generation
Goal: semi-automated documentation
[Diagram: components include the [incr tsdb()] semantic test suite, the Lex DB, the ERG and documentation strings; the object-level SEM-I, with semi-automatically generated examples; documentation examples autogenerated on demand; and the meta-level SEM-I, with an autogenerated appendix]
Robust generation
• SEM-I is an important preliminary:
  • check whether generator input is semantically compatible with the grammars
• Eventually: a hierarchy of relations outside the grammars, allowing underspecification
  • `fill-in’ of underspecified RMRS
  • exploit work on determiner guessing etc.
Architecture (again)
External LF → (SEM-I) → Internal LF → specialization modules → chart generator → control modules → String
Interface
• External representation:
  • public, documented
  • reasonably stable
• Internal representation:
  • syntax/semantics interface
  • convenient for analysis
• External/internal conversion via the SEM-I
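A hypothetical sketch of external-to-internal conversion via the SEM-I: public, documented predicate names are mapped to the grammar-internal ones the chart generator expects, and anything outside the SEM-I is rejected. The mapping table is invented for illustration.

```python
EXTERNAL_TO_INTERNAL = {
    "like_v_1": "_like_v_1_rel",
    "temp_loc": "temp_loc_rel",
}

def to_internal(external_eps, semi_map=EXTERNAL_TO_INTERNAL):
    internal = []
    for pred, args in external_eps:
        if pred not in semi_map:
            raise ValueError(f"predicate {pred!r} not in the SEM-I")
        internal.append((semi_map[pred], args))
    return internal
```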
Guaranteed generation?
• Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies):
  • can we generate a string? with input fix-up? negotiation?
• Semantically bleached lexical items: which, one, piece, do, make
• Defective paradigms, negative polarity, anti-collocations etc.?
Next stages
• SEM-I development
• Documentation and test-suite integration
• Generation from RMRSs produced by a shallower parser (or a deep/shallow combination)
• Partially fixed text in generation (cogeneration)
• Further statistical modules: e.g., locational prepositions, other modifiers
• More underspecification
• Gradually increase the flexibility of the interface to generation