Information Density and Word Order
Why are some word orders more common than others? • In the majority of languages (with a dominant word order), subjects precede objects • (SOV, SVO) > VSO > (VOS, OVS) > OSV
Why are some word orders more common than others? • Genetically encoded bias? • Single common ancestor (SOV)? • General linguistic principles • Theme-first • Verb-object bonding • Animate-first • Great, but why do these principles work?
Uniform information density (UID) hypothesis • Speakers aim for a constant rate of information transmission • Slower for unexpected, high-entropy content • Faster for predictable, low-entropy content • The basic word order of a language influences the average transmission rate • Thus languages closer to the UID ideal should be more common than languages further away from it
Word-order model • Simple world with • 13 objects (O) • 5 people • 8 food/drink items • 2 relations (R) • eat/drink • Events in this world consist of one relation and two objects • (o1, r, o2) • Each event occurs with a certain probability P
Word-order model • Base entropy H0: the observer's uncertainty about the event before any word is spoken • After each word, observers update their expectations about the remaining words, reaching an entropy of zero after the third word of the event
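The toy world and its entropy trajectory can be sketched in Python. This is a minimal illustration under stated assumptions, not the model's actual implementation: the specific people, foods and drinks, and the uniform probabilities over plausible events (a person eats a food or drinks a drink; every other triple has probability 0) are hypothetical. An event (o1, r, o2) is stored as a (subject, verb, object) triple, and Hn is the entropy over all events still consistent with the first n words.

```python
import math

# Hypothetical toy world: 5 people and 8 food/drink items (13 objects), 2 relations.
PEOPLE = ["alice", "bob", "carol", "dave", "erin"]
FOODS = ["apple", "bread", "cake", "rice", "soup"]
DRINKS = ["water", "milk", "juice"]


def toy_world():
    """Return a distribution over (subject, verb, object) events.

    Assumption: only a person eating a food or drinking a drink is possible,
    and all possible events are equally likely.
    """
    events = [(p, "eat", f) for p in PEOPLE for f in FOODS]
    events += [(p, "drink", d) for p in PEOPLE for d in DRINKS]
    return {e: 1 / len(events) for e in events}


def entropy(weights):
    """Shannon entropy (bits) of non-negative weights, renormalized to sum to 1."""
    total = sum(weights)
    return sum(-(w / total) * math.log2(w / total) for w in weights if w > 0)


def entropy_trajectory(event, order, dist):
    """[H0, H1, H2, H3]: the observer's entropy over events after 0..3 words."""
    slot = {"S": 0, "V": 1, "O": 2}
    trajectory = []
    for n in range(4):
        heard = [slot[c] for c in order[:n]]  # slots already spoken
        consistent = [p for e, p in dist.items()
                      if all(e[i] == event[i] for i in heard)]
        trajectory.append(entropy(consistent))
    return trajectory


if __name__ == "__main__":
    world = toy_world()
    event = ("alice", "eat", "apple")
    for order in ["SVO", "SOV", "VSO", "VOS", "OVS", "OSV"]:
        print(order, [round(h, 2) for h in entropy_trajectory(event, order, world)])
```

In this hypothetical world the object fully determines the relation (apples are eaten, milk is drunk), so any order that names S and O first reaches zero entropy after two words; that is a property of the assumed probabilities, not of the original model.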
Word-order model • Each event has an information profile I1 = H0 − H1, I2 = H1 − H2, I3 = H2 − H3 = H2 • Where Hn is the entropy remaining after the nth word (the entropy trajectory), with H3 = 0 • UID predicts a straight line from base entropy to zero entropy, such that each word conveys 1/3 of the total information
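The information profile follows directly from an entropy trajectory: each In is the drop in entropy caused by the nth word, so the three values telescope back to H0. A minimal sketch; the example trajectory is hypothetical (it matches a uniform distribution over 40 events, with 8 events per subject and 5 per subject-verb pair), and the function names are illustrative.

```python
import math


def information_profile(trajectory):
    """I_n = H_(n-1) - H_n for each word, given [H0, H1, ..., Hk] with Hk = 0."""
    return [trajectory[n - 1] - trajectory[n] for n in range(1, len(trajectory))]


def uid_ideal_profile(h0, n_words=3):
    """Under UID, each of the n words carries an equal share of H0."""
    return [h0 / n_words] * n_words


# Hypothetical entropy trajectory (bits) for one three-word utterance.
trajectory = [math.log2(40), math.log2(8), math.log2(5), 0.0]
profile = information_profile(trajectory)
print([round(i, 2) for i in profile])  # per-word information (bits)
print([round(i, 2) for i in uid_ideal_profile(trajectory[0])])
```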
Word-order model • UID deviation score • How far the information profiles of toy-world events deviate from the “ideal information profile” predicted by UID • Ranked from most to least uniform: VSO > VOS > SVO > OVS > SOV > OSV
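The exact formula for the deviation score is not given here; the version below is an assumed one. It compares each word's share of the total information, In/H0, with the ideal share 1/3, sums the absolute differences, and rescales so that 0 means perfectly uniform and 1 means that a single word carries all the information. The function name, the normalization, and the example profiles are hypothetical.

```python
import math


def uid_deviation(profile):
    """Normalized deviation of an information profile from the UID ideal.

    Assumption: D = sum_n |I_n / H0 - 1/k| divided by its maximum 2(k-1)/k,
    which is reached when one word carries all H0 bits. D lies in [0, 1] when
    every I_n >= 0 (a word can in principle raise entropy, giving I_n < 0).
    """
    k = len(profile)
    h0 = sum(profile)  # the profile telescopes back to the base entropy H0
    raw = sum(abs(i / h0 - 1 / k) for i in profile)
    return raw / (2 * (k - 1) / k)


# Hypothetical profiles (bits) for one event under two word orders.
svo = [math.log2(5), 3 - math.log2(5), math.log2(5)]
sov = [math.log2(5), 3.0, 0.0]
print(round(uid_deviation(svo), 3), round(uid_deviation(sov), 3))
```

A lower score means a flatter profile; with these made-up profiles SVO scores lower (more uniform) than SOV, whose last word carries no information.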
Corpus study • Child-directed speech (English and Japanese corpora) • Utterances involving singly transitive verbs • Adjectives, plurality, tense, etc. were ignored • English: VSO (0.38), SVO (0.41), VOS (0.48), SOV (0.64), OSV (0.78), OVS (0.79) • Japanese: SVO (0.66), VSO (0.71), SOV (0.72), VOS (0.72), OSV (0.82), OVS (0.83)
Experiment • Hypothesis: word order should be optimal with respect to the frequencies of events in the real world • Judgement tasks on pairs of sentences (which one is more probable?) • VSO (0.17), SVO (0.18), VOS (0.20), SOV (0.23), OVS (0.23), OVS (0.24)
Discussion • Object-first word orders are rare • Object-first word orders have the least uniform information density (the first word carries too much information) • SOV is less compatible with UID than its frequency in real languages would suggest, perhaps because factors other than UID also matter • The theme-first (TFP) and animate-first (AFP) principles favor SOV, SVO (highest ranked in the results) and VSO, so UID may provide some justification for at least some of the word-order rankings
Conclusion • Findings are consistent with a weaker hypothesis: word order is optimal with respect to how often speakers choose to talk about events (not how often these events actually occur) • UID may not explain all of the word-order rankings, but it does explain several aspects of the empirical distribution of word orders
A Noisy Channel Account of Crosslinguistic Word Order Variation • In 96.3% of studied languages, S precedes O • SVO (English) and SOV (Japanese) are more prevalent than VSO • People construct sentences from an agent perspective; why SVO/SOV, then? • One explanation: innate universal grammar, independent of communicative or performance factors
Why SOV/SVO? • A communication-based explanation • SOV as the default for human language • Preference for S to precede O • Preference for V to appear at the end of the clause • SVO arises from SOV as a result of communication/memory pressures that sometimes outweigh the second preference
Shannon’s communication theory • Comprehension and production operate via a noisy channel • Speakers are under pressure to choose utterances that maximize the listener’s ability to recover the intended meaning • When does word order affect how easily meaning can be recovered? • The girl kicks the ball. (non-reversible: only the girl can be the agent, so people should adhere to SOV) • The girl kicks the boy. (reversible: potential confusion may be resolved by the positions of the nouns with respect to the verb)
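The intuition behind the two example sentences can be illustrated with a tiny simulation. This is a hypothetical sketch, not the authors' model: the three-noun lexicon, the uniform prior over plausible events (only animate nouns can be agents), and the noise model (exactly one noun is lost, each with probability 1/2) are all assumptions. A Bayesian listener hears the surviving noun and the verb, and the score is the posterior probability it places on that noun's true role.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical mini-lexicon: only animate nouns can be agents of "kicks".
NOUNS = ["girl", "boy", "ball"]
ANIMATE = {"girl", "boy"}


def plausible_events():
    """All (agent, patient) pairs with an animate agent; uniform prior."""
    return [(a, p) for a, p in permutations(NOUNS, 2) if a in ANIMATE]


def utterance(event, order):
    """Linearize (agent, 'kicks', patient) in a word order such as 'SOV'."""
    agent, patient = event
    slots = {"S": agent, "V": "kicks", "O": patient}
    return tuple(slots[c] for c in order)


def noisy_versions(words):
    """Assumed noise model: exactly one noun is deleted, each with prob. 1/2."""
    noun_positions = [i for i, w in enumerate(words) if w != "kicks"]
    return [tuple(w for j, w in enumerate(words) if j != i) for i in noun_positions]


def role_recovery(event, order):
    """Expected posterior mass the listener puts on the surviving noun's true role."""
    candidates = plausible_events()
    expected = Fraction(0)
    for received in noisy_versions(utterance(event, order)):
        # P(received | candidate): each deletion has probability 1/2.
        likelihood = {e: Fraction(noisy_versions(utterance(e, order)).count(received), 2)
                      for e in candidates}
        noun = next(w for w in received if w != "kicks")
        role = 0 if noun == event[0] else 1  # 0 = agent, 1 = patient
        correct = sum(l for e, l in likelihood.items() if e[role] == noun)
        expected += Fraction(1, 2) * correct / sum(likelihood.values())
    return expected


if __name__ == "__main__":
    for event in [("girl", "boy"), ("girl", "ball")]:  # reversible, non-reversible
        for order in ["SOV", "SVO"]:
            print(event, order, role_recovery(event, order))
```

Under these assumptions SVO gives perfect role recovery for both events, while SOV drops to 1/2 for the reversible event but only to 5/6 for the non-reversible one, because a surviving inanimate noun can only be the patient.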
Method • The study investigates whether gestured word order across languages (English: SVO; Japanese, Korean: SOV) depends on the semantic reversibility of the event • Possible factors: • Initial bias toward SOV • Initial bias toward the native-language order • Communicative or memory pressures • English speakers • A shift to SVO could be due to the second or third factor • Japanese & Korean speakers • A shift to SVO could only be due to the third factor
Method • Brief silent animations of intransitive/transitive events • Participants first described the animations verbally • Then gestured the meanings of the events with their hands • Verbal and gesture responses were coded for the relative positions of the agent, action, and patient
Experiment 1 • Animate/inanimate patients (reversible or non-reversible events) • More SVO word orders should be produced for reversible events • Results: uniformly SVO for verbal responses • Gestured S before O for animate patients • Gestured V before O for human patients (as expected) • Overwhelmingly gestured SOV for non-reversible events
Experiment 1&2 – Japanese/Korean • The English participants’ results can be explained without resorting to the noisy-channel hypothesis • Participants may shift from SOV to their native order (SVO) because of the increased ambiguity in reversible events • Thus, participants with an SOV native language were tested • Expected shift to SVO in reversible events • Experiment 2 used more complex, embedded structures, e.g., The old woman says that the fireman kicks the girl
Experiment 1&2 – Japanese/Korean • If participants use their native word order (SOV) • They should gesture both levels of the embedded event with the same order: S1 [S2 O2 V2] V1 • For reversible events, this SOV structure creates maximal potential confusion (all three nouns precede both verbs) • The noisy-channel account then predicts SVO gestures: S1 V1 [S2 V2 O2]
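The two bracketed gesture orders can be generated mechanically by treating the embedded clause as the object of the top-level verb and applying the same word order at both levels. A hypothetical sketch; the tuple encoding of clauses and the one-word noun labels are assumptions for illustration.

```python
def linearize(clause, order):
    """Linearize (subject, verb, object); the object may itself be a clause.

    The same word order is applied at every level of embedding, and
    embedded clauses are shown in square brackets.
    """
    subject, verb, obj = clause
    obj_text = f"[{linearize(obj, order)}]" if isinstance(obj, tuple) else obj
    slots = {"S": subject, "V": verb, "O": obj_text}
    return " ".join(slots[c] for c in order)


# "The old woman says that the fireman kicks the girl"
sentence = ("woman", "says", ("fireman", "kicks", "girl"))
print(linearize(sentence, "SOV"))  # pattern S1 [S2 O2 V2] V1
print(linearize(sentence, "SVO"))  # pattern S1 V1 [S2 V2 O2]
```

In the SOV string all three nouns come before either verb, which is the configuration the noisy-channel account treats as most confusable for reversible events.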
Experiment 1&2 – Japanese/Korean • Exp. 1 results: native-language word order • J&K speakers verbalized the patient before the action (100%) • Gestured the patient before the action for both animate and inanimate patients • Exp. 2 results: shift to SVO • J speakers never verbalized SVO; K speakers rarely did • Both J&K speakers almost always gestured the top-level verb in 2nd position, between the top-level subject and the embedded subject • In the embedded clause, patients were almost always gestured before the action, but more often in non-reversible events (for both J&K speakers) • Results predicted by the noisy-channel account but not by the combination of an SOV default and native-language order
Experiment 3 • Alternative explanation of the previous results: minimizing syntactic dependency distances • Dependency distance: the number of words between a syntactic head (the verb) and its dependents (subject and object) • Shorter dependencies are easier to process • SVO allows shorter dependency distances than SOV, which could also drive the shift from SOV to SVO
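A minimal sketch of the dependency-distance prediction, restricted to the agent-verb dependency: the patient is treated as a phrase of one noun plus any number of feature words, and distance is counted as the number of words strictly between agent and verb. The phrase encoding and the feature words are hypothetical, and other dependents (e.g., a recipient in giving events) are ignored.

```python
def agent_verb_distance(order, patient_phrase):
    """Number of words strictly between the agent and the verb.

    order: e.g. "SOV" or "SVO"; the agent and the verb are one word each,
    while the patient slot holds the whole (possibly modified) phrase.
    """
    position = 0
    agent_pos = verb_pos = None
    for slot in order:
        if slot == "S":
            agent_pos = position
            position += 1
        elif slot == "V":
            verb_pos = position
            position += 1
        else:  # "O": the patient phrase occupies len(patient_phrase) positions
            position += len(patient_phrase)
    return abs(agent_pos - verb_pos) - 1


# Hypothetical patient descriptions with 0 to 3 gestured features.
phrases = [["star"], ["star", "spotted"], ["star", "spotted", "box"],
           ["star", "spotted", "box", "hat"]]
for phrase in phrases:
    print(len(phrase), agent_verb_distance("SOV", phrase), agent_verb_distance("SVO", phrase))
```

Under this measure only SOV places the growing patient phrase between agent and verb, so a speaker minimizing dependency distance should switch to SVO more often as descriptions get longer; the noisy-channel account predicts no such effect.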
Experiment 3 - method • Animations of a boy and a girl interacting with one of a set of objects: • Circle/star/heart, which was either • Spotted/striped (surface); in a box/pail (container); wearing a top/witch’s hat (headwear) • Giving/putting/intransitive events • Participants gestured each event and the features of the object • If speakers are sensitive to the distance between agent and verb, longer patient descriptions should yield more SVO gestures • The noisy-channel account predicts no such shift: the patient is not a possible agent of the verb, so adding modifiers does not affect the recoverability of who is doing what to whom
Experiment 3 - results • Gestured the patient before the action for most events • Verbalized the action before the patient for most events • Even with long productions, participants still gestured the patient before the action, consistent with the noisy-channel hypothesis and not with the dependency-distance hypothesis
Discussion • English speakers have a strong SOV preference for non-reversible events even when the inanimate patient has up to 3 features to be gestured • SOV seems to be the preferred word order in human communication • For reversible events, the preference for SOV disappears in favor of SVO • Although native SOV speakers gesture SOV for simple events, they shift to SVO for more complex ones • This shift to SVO occurs in order to maximize meaning recoverability
Discussion • Case marking is often used in SOV languages • It mitigates the confusability of subject and object, helping to retain the default SOV • If no case marking is used, a shift to SVO • The large majority of SOV languages are case marked, whereas few SVO languages are • In the experiments, location in space was used as a possible case marking • Of the case-marked gestures, most had SOV order • Animacy-dependent case marking • Many languages mark only animate direct objects • Non-SVO languages have more word-order flexibility than SVO languages • They contain other mechanisms for disambiguation • So languages with fixed word order are mostly SVO
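The case-marking argument can be summarized as a decision rule for a listener who receives only one noun and the verb (for example, after noise removes the other noun): the noun's role is recoverable if it carries a case marker, or if subject and object sit on opposite sides of the verb. A hypothetical sketch of that reasoning, not a model from the study; animacy-dependent marking is not modeled.

```python
def role_recoverable(order, case_marked):
    """Can a lone noun's role (subject or object) be read off the utterance?

    Assumption: roles are signalled only by case marking or by which side of
    the verb the noun appears on.
    """
    if case_marked:
        return True
    s, v, o = order.index("S"), order.index("V"), order.index("O")
    return (s < v) != (o < v)  # True when S and O are on opposite sides of V


for order in ["SOV", "SVO", "VSO", "VOS", "OVS", "OSV"]:
    print(order, role_recoverable(order, False), role_recoverable(order, True))
```

Without case marking, only the verb-medial orders (SVO and the rare OVS) keep roles recoverable from position alone, which fits the observation that languages with fixed word order are mostly SVO.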
Conclusion • No need for sophisticated innate machinery to explain word-order variation • Many aspects of crosslinguistic word-order variation are easily explained by communicative or memory pressures