Models of Linguistic Choice Christopher Manning
Explaining more: How do people choose to express things? • What people do say has two parts: • Contingent facts about the world • People in the Bay Area have talked a lot about electricity, housing prices, and stocks lately • The way speakers choose to express ideas within a situation, using the resources of their language • People don’t often put that-clauses pre-verbally: • That we will have to revise this program is almost certain • We’re focusing on linguistic models of the latter choice
How do people choose to express things? • Simply delimiting a set of grammatical sentences provides only a very weak description of a language, and of the ways people choose to express ideas in it • Probability densities over sentences and sentence structures can give a much richer view of language structure and use • In particular, we find that the same soft generalizations and tendencies of one language often appear as (apparently) categorical constraints in other languages • Linguistic theory should be able to uniformly capture these constraints, rather than only recognizing them when they are categorical
Probabilistic Models of Choice • P(form|meaning, context) • Looks difficult to define. We’re going to define it via features • A feature is anything we can measure/check • P(form|f1, f2, f3, f4, f5) • A feature might be “3rd singular subject”, “object is old information”, “addressee is a friend”, “want to express solidarity”
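The feature-based definition of P(form|f1, …, f5) can be sketched as a function from a (form, context) pair to measurable properties. This is only an illustrative sketch: the context keys and checks below are hypothetical stand-ins for the features named on the slide.

```python
# A minimal sketch of features as arbitrary measurable properties of a
# (form, context) pair. The context keys ("subject_person", etc.) are
# hypothetical; real features could inspect syntax, discourse status,
# or the social setting.
def extract_features(form, context):
    return {
        "3sg_subject":      context.get("subject_person") == "3sg",
        "object_old":       context.get("object_info") == "old",
        "addressee_friend": context.get("addressee") == "friend",
    }

feats = extract_features("We approved the plan",
                         {"subject_person": "1pl", "object_info": "old"})
```

P(form | f1, f2, f3, f4, f5) is then conditioned on these measured feature values rather than on the full meaning and context directly.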
Constraints = Features = Properties • Input: approve<1pl [new], plan [old]>

| Candidate | f1 *Su/Newer (Discourse) | f2 *3>1/2 (Person) | f3 *Ag/Non-subj (Linking) |
| --- | --- | --- | --- |
| We approved the plan last week | 1 | 0 | 0 |
| The plan was approved by us last week | 0 | 1 | 1 |
Explaining language via (probabilistic) constraints • Categorical/constraint-based grammar [GB, LFG, HPSG, …] • All constraints must be satisfied; to capture elsewhere conditions / emergence of the unmarked, complex negated conditions have to be added. • Optimality Theory • The highest-ranked differentiating constraint always determines the winner. Emergence of the unmarked. Single winner: no variable outputs. No ganging up. • Stochastic OT • Probabilistic noise at evaluation time allows variable rankings and hence a distribution over multiple outputs. No ganging up. • Generalized linear models (e.g., Varbrul)
A theory with categorical feature combination • In a given situation you can predict a single output or no well-formed output • No model of gradient grammaticality • No way to model variation • Or you can predict a set of outputs • Can’t model their relative frequency • Categorical models of constraint combination allow no room for soft preferences, or for constraints combining together to make an output dispreferred or impossible (“ganging up” or “cumulativity”)
Optimality Theory • Prince and Smolensky (1993/2005!): • Provide a ranking of constraints (ordinal model) • Highest differentiating constraint determines the winner • “When the scalar and the gradient are recognized and brought within the purview of theory, Universal Grammar can supply the very substance from which grammars are built: a set of highly general constraints, which, through ranking, interact to produce the elaborate particularity of individual languages.” • No variation in output (except in case of ties) • No cumulativity of constraints
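Categorical OT evaluation, where the highest-ranked differentiating constraint decides, amounts to a lexicographic comparison of violation profiles. A minimal sketch, with candidates and violation counts loosely following the active/passive tableau (the constraint names are from the slides, the implementation is illustrative):

```python
# Hypothetical candidates and their constraint-violation counts,
# modeled on the active/passive tableau.
candidates = {
    "We approved the plan last week":        {"Su/Newer": 1, "3>1/2": 0, "Ag/Non-subj": 0},
    "The plan was approved by us last week": {"Su/Newer": 0, "3>1/2": 1, "Ag/Non-subj": 1},
}

def ot_winner(candidates, ranking):
    # The highest-ranked constraint that differentiates the candidates
    # decides; lower-ranked constraints never matter (no ganging up).
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

print(ot_winner(candidates, ["Su/Newer", "3>1/2", "Ag/Non-subj"]))
# With *Su/Newer ranked highest, the passive wins.
```

Note that however many lower-ranked constraints the winner violates, they can never overturn the decision made higher up, which is exactly the "no cumulativity" property.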
Creating more ties • One way to get more variation is to create more ties, by allowing various forms of floating constraint rankings or unordering of constraints • If you have lots of ways of deriving a form from underlying meanings, then you can predict relative frequencies by counting the number of derivations that yield each form • Anttila (1997) • (I confess I’m sceptical of such models; inter alia they inherit the problems of ties in OT: they’re extremely unstable.)
Stochastic OT (Boersma 1997) • Basically follows Optimality Theory, but • Don’t simply have a constraint ranking • Constraints have a numeric value on a scale • A random perturbation is added to a constraint’s ranking at evaluation time • The randomness represents incompleteness of our model • Variation results if constraints have similar values – our grammar constrains but underdetermines the output • One gets a probability distribution over optimal candidates for an input (over different evaluations) • [Figure: constraints f1–f4 positioned along a continuous ranking scale]
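The evaluation procedure can be sketched as follows, assuming Gaussian noise and invented constraint values and violation profiles: constraints with similar values occasionally swap ranks between evaluations, so repeated evaluation yields a distribution over winners.

```python
import random

# Illustrative constraint values on Boersma's continuous scale.
# Su/Newer and 3>1/2 are close, so their ranking is variable.
values = {"Su/Newer": 100.0, "3>1/2": 99.0, "Ag/Non-subj": 90.0}
candidates = {
    "active":  {"Su/Newer": 1, "3>1/2": 0, "Ag/Non-subj": 0},
    "passive": {"Su/Newer": 0, "3>1/2": 1, "Ag/Non-subj": 1},
}

def evaluate(noise=2.0):
    # Perturb each constraint's value at evaluation time, then apply
    # ordinary categorical OT with the resulting (noisy) ranking.
    ranking = sorted(values, key=lambda k: values[k] + random.gauss(0, noise),
                     reverse=True)
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

random.seed(0)
wins = {"active": 0, "passive": 0}
for _ in range(10000):
    wins[evaluate()] += 1
# Both outputs occur, with the passive favoured, because Su/Newer
# usually (but not always) outranks 3>1/2.
print(wins)
```

Within any single evaluation the decision is still made by the highest-ranked differentiating constraint, which is why the model allows variable outputs but still no ganging up.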
Stochastic OT (Boersma 1997) • Stochastic OT can model variable outputs • It does have a model of cumulativity, but constraints in the model can only be very weakly cumulative • We’ll look soon at some papers that discuss how well this works as a model of linguistic feature combination
Generalized linear models • The grammar provides representations • We define arbitrary properties over those representations (e.g. Subj=Pro, Subj=Topic) • We learn weights w_i for how important the properties are • These are put into a generalized linear model • Model: score(c) = Σ_i w_i f_i(c), or, as a probability distribution, P(c_j) = exp(Σ_i w_i f_i(c_j)) / Σ_k exp(Σ_i w_i f_i(c_k))
Generalized linear models • Can get categorical or variable outputs • As a probability distribution: • All outputs have some probability of occurrence, with the distribution based on the weights of the features. Ganging up. Emergence of the unmarked. • Optimizing over generalized linear models: we choose the output for which the probability is highest: • arg max_j P(c_j) • Output for an input is categorical. Features gang up. (However, by setting weights far enough apart, ganging up will never have an effect – giving conventional OT.) Emergence of the unmarked.
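The contrast with OT can be made concrete in a small log-linear sketch, with invented penalty weights and violation vectors: two weaker constraints together outweigh one stronger constraint, which no strict constraint ranking can reproduce.

```python
import math

# Illustrative penalty weights: f1 is the strongest single constraint,
# but f2 and f3 together (1.2 + 1.2 = 2.4) outweigh it.
weights = {"f1": 2.0, "f2": 1.2, "f3": 1.2}
candidates = {
    "active":  {"f1": 1, "f2": 0, "f3": 0},   # violates the strong constraint
    "passive": {"f1": 0, "f2": 1, "f3": 1},   # violates two weaker ones
}

def p(cands, w):
    # P(c) = exp(-sum_i w_i f_i(c)) / Z : every candidate gets some
    # probability mass, shaped by the summed feature weights.
    scores = {c: math.exp(-sum(w[f] * v for f, v in feats.items()))
              for c, feats in cands.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

probs = p(candidates, weights)
best = max(probs, key=probs.get)
print(best)  # active: the two weaker constraints gang up on the passive
```

Choosing arg max gives the categorical reading of the model; returning `probs` itself gives the variable-output reading, and spacing the weights far enough apart (e.g. exponentially) recovers strict-ranking OT behaviour.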