From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data. Chu-Ren Huang, Academia Sinica http://cwn.ling.sinica.edu.tw/huang/huang.htm
Outline • A generative lexicalist approach to grammar • From distributional data to the basic contrasts in a semantic field (or conceptual motivation for corpus distribution) • Lexical distribution as cognitive model • Radical as ontology • Language as a knowledge system ISLCC Chu-Ren Huang
Introduction: A generative lexicalist approach to grammar. Back to Aristotle (through Pustejovsky) • How do we know, and what do we know: through what we experience • Qualia Structure: what we experience • Formal • Constitutive • Agentive • Telic
Linguistics: What do we know about language • Qualia Structure of Theory of Language • Formal: from Sign to Structure, Structuralism • Constitutive: from IA to IP, rule and transformation based theories • Agentive: UG approaches • Telic: Function and Use based Theories • We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects
Towards Language as Knowledge System • Atoms of knowledge: lexicalized concepts • ‘Frames’ of knowledge: lexical semantic relations • Instantiation of knowledge: corpus • A lexicon-driven, corpus-based approach to infer the knowledge structure underlying linguistic structure
Three Studies • The semantic field of emotion (elaborated from Chang et al. 2000) • Lexicalized Model of Cognition (Huang and Hong 2005) • Conventionalized Ontology in Writing (Chou and Huang 2005)
Semantic Field of Verbs of Emotion • Issues: Methodological • Interpretation of Distributional Data • Measuring and Interpreting lexical choices • Issues: Linguistic • Archetype Via Contrast • Why Change-of-State: • Saliency and relevance to human cognition
Distributional Contrast of Verbs of Emotion: 高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B) • Category: intrans. vs. trans. state verb • Function: more predicative vs. more nominalized • Collocation: CAUSE complement vs. no CAUSE • Collocation: Perfect aspect vs. no -le • Collocation (modified nouns): Eventive vs. no selection • Interpretation (Imperative): Command vs. Wish
A Natural Dichotomy of Verbs of Emotion (frequency in parentheses)
Happiness
  Type A: gao1xing4 高興 (669), kai1xin1 開心 (152), tong4kuai4 痛快 (40)
  Type B: kuai4le4 快樂 (942), yu2kuai4 愉快 (271), xi3yue4 喜悅 (156), huan1le4 歡樂 (141), huan1xi3 歡喜 (107), kuai4huo2 快活 (48)
Depression
  Type A: nan2guo4 難過 (232), tong4xin1 痛心 (48)
  Type B: tong4ku3 痛苦 (443), chen2zhong4 沈重 (83), ju3sang4 沮喪 (62)
A Natural Dichotomy of Verbs of Emotion (continued; frequency in parentheses)
Sadness
  Type A: shang1xin1 傷心 (134)
  Type B: bei1shang1 悲傷 (52)
Regret
  Type A: hou4hui3 後悔 (102)
  Type B: yi2han4 遺憾 (198)
Anger
  Type A: sheng1qi4 生氣 (307)
  Type B: fen4nu4 憤怒 (112), qi4fen4 氣憤 (49)
Fear
  Type A: hai4pa4 害怕 (261)
  Type B: kong3ju4 恐懼 (149), wei4ju4 畏懼 (40)
Worry
  Type A: dan1xin1 擔心 (609), dan1you1 擔憂 (64), you1xin1 憂心 (46)
  Type B: fan2nao3 煩惱 (199), ku3nao3 苦惱 (45)
Some Observations • Each of the seven kinds of emotion verbs shows the same dichotomy: • change-of-state vs. homogeneous state • Each side of the dichotomy is led by a dominant verb • in terms of frequency and prototypicality of meaning
Semantic Field and Contrast Set (paraphrase of Grandy 1992) • A semantic field consists of a unique covering term and a number of contrast sets. • The unique covering term may or may not occur in a contrast set. • All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.
Observation: Chinese Defines a Property by Contrast • qing1zhong4 light+heavy = weight • da4xiao3 big+small = size • gao1ai3 tall+short = height • shi4fei1/dui4cuo4 right+wrong = affair • xiong1di4 elder+younger = brothers • zang1pi3 praise+attack = criticize • hu1xi1 exhale+inhale = breathe
Our Proposal • The unique covering term T is either a single term or a privileged contrast set, called a contrast pair. • When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair. • The fundamental contrast relation defining a contrast pair may be shared by a super-set of semantic fields.
Our Proposal (continued) • T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast. • The set of fundamental contrast relations is shared by all semantic fields. [cf. Semantic relations]
Patterns of Distribution as Representational Clues • Numbers Don’t Lie • The pattern itself is proof that generalizations based on a single lexical item are replicable. • The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.
Functional Distribution of Type A Verbs of Emotion
Type A        Pred.     Nom.     N.M.
gao1xing4     85.05%    0.30%    1.35%
nan2guo4      86.64%    2.16%    2.59%
shang1xin1    76.12%    2.99%    11.19%
hou4hui3      94.12%    0.00%    2.94%
sheng1qi4     87.82%    0.00%    4.06%
hai4pa4       93.10%    3.07%    2.68%
dan1xin1      96.72%    1.97%    1.31%
Average       88.51%    1.50%    3.73%
Functional Distribution of Type B Verbs of Emotion
Type B        Pred.     Nom.     N.M.
kuai4le4      37.79%    26.43%   24.84%
tong4ku3      25.73%    45.60%   20.54%
bei1shang1    40.38%    28.85%   19.23%
yi2han4       34.85%    33.84%   3.54%
fen4nu4       28.57%    37.50%   17.86%
kong3ju4      23.49%    68.46%   7.38%
fan2nao3      24.12%    69.85%   6.03%
Average       30.70%    44.36%   14.21%
Preference of A Verbs over B Verbs in Predicative Uses
Verbs (A/B)          Pred. freq. (A/B)    A/B ratio
gaoxing/kuaile       569/356              1.59
nanguo/tongku        201/114              1.76
shangxin/beishang    102/21               4.86
houhui/yihan         96/69                1.39
shengqi/fennu        238/32               7.44
haipa/kongju         243/35               6.94
danxin/fannao        589/48               12.27
Average ratio                             5.62
Preference of B Verbs over A Verbs in Nominal Uses
Verbs (A/B)          Nom. freq. (A/B)     B/A ratio
gaoxing/kuaile       11/483               43.91
nanguo/tongku        11/293               26.64
shangxin/beishang    19/25                1.32
houhui/yihan         3/74                 24.67
shengqi/fennu        11/62                5.64
haipa/kongju         15/113               7.53
danxin/fannao        20/151               7.55
Average ratio                             16.75
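The ratios in the two preference tables can be recomputed directly from the paired counts. The following is a minimal sketch, assuming each "x/y" cell lists the Type A count first and the Type B count second; the dictionary and variable names are illustrative and do not come from the original study.

```python
from statistics import mean

# Counts copied from the two preference slides, as (Type A count, Type B count).
PRED_COUNTS = {
    "gaoxing/kuaile": (569, 356),
    "nanguo/tongku": (201, 114),
    "shangxin/beishang": (102, 21),
    "houhui/yihan": (96, 69),
    "shengqi/fennu": (238, 32),
    "haipa/kongju": (243, 35),
    "danxin/fannao": (589, 48),
}
NOM_COUNTS = {
    "gaoxing/kuaile": (11, 483),
    "nanguo/tongku": (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan": (3, 74),
    "shengqi/fennu": (11, 62),
    "haipa/kongju": (15, 113),
    "danxin/fannao": (20, 151),
}

# Predicative preference: how many times more often A is used than B.
pred_ratios = {pair: a / b for pair, (a, b) in PRED_COUNTS.items()}
# Nominal preference: how many times more often B is used than A.
nom_ratios = {pair: b / a for pair, (a, b) in NOM_COUNTS.items()}

pred_mean = mean(pred_ratios.values())
nom_mean = mean(nom_ratios.values())

for pair in PRED_COUNTS:
    print(f"{pair:20s} A/B pred = {pred_ratios[pair]:6.2f}   B/A nom = {nom_ratios[pair]:6.2f}")
print(f"mean A/B pred = {pred_mean:.2f}, mean B/A nom = {nom_mean:.2f}")
```

The unweighted mean of the nominal ratios reproduces the 16.75 reported on the slide. The unweighted mean of the predicative ratios comes out near 5.18 rather than the reported 5.62, so the slide's predicative average may have been computed on a different basis.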
Summary of the Likelihood Ratio Data • A clear lexical preference between near-synonyms is established. • Predicative preference and deverbal preference tend to compensate for each other to establish contrast. • Overall, the deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]
Deverbal Use Frequency of Type A Verbs
tong4kuai4 痛快     0.00%
gao1xing4 高興      1.65%
hou4hui3 後悔       2.94%
dan1xin1 擔心       3.28%
sheng1qi4 生氣      3.58%
tong4xin1 痛心      4.17%
nan2guo4 難過       4.75%
hai4pa4 害怕        5.75%
you1xin1 憂心       6.52%
kai1xin1 開心       7.89%
dan1you1 擔憂       9.38%
shang1xin1 傷心     14.18%
Deverbal Use Frequency of Type B Verbs
qi4fen4 氣憤        24.49%
wei4ju4 畏懼        25.00%
yu2kuai4 愉快       29.89%
huan1xi3 歡喜       30.84%
kuai4huo2 快活      33.33%
ju3sang4 沮喪       33.87%
yi2han4 遺憾        37.38%
ku3nao3 苦惱        46.67%
bei1shang1 悲傷     48.08%
chen2zhong4 沈重    48.19%
kuai4le4 快樂       51.27%
fen4nu4 憤怒        55.36%
tong4ku3 痛苦       66.14%
kong3ju4 恐懼       75.84%
fan2nao3 煩惱       75.88%
xi3yue4 喜悅        92.20%
huan1le4 歡樂       92.91%
Deverbal Use Frequency as a Benchmark for Type A/B Verbs • More than 10 percentage points separate the lowest Type B verb (qi4fen4 氣憤, 24.49%) from the highest Type A verb (shang1xin1 傷心, 14.18%). • The smallest gap between a competing pair is almost 34 percentage points (shang1xin1 傷心, 14.18% vs. bei1shang1 悲傷, 48.08%).
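The benchmark can be checked mechanically from the two deverbal-frequency lists. The sketch below is only illustrative: the slides report the gap between the two types but do not state a cut-off, so the midpoint threshold and the `classify` helper are assumptions introduced here.

```python
# Deverbal use frequency (%) per verb, copied from the Type A and Type B slides.
TYPE_A = {
    "tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94, "dan1xin1": 3.28,
    "sheng1qi4": 3.58, "tong4xin1": 4.17, "nan2guo4": 4.75, "hai4pa4": 5.75,
    "you1xin1": 6.52, "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18,
}
TYPE_B = {
    "qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89, "huan1xi3": 30.84,
    "kuai4huo2": 33.33, "ju3sang4": 33.87, "yi2han4": 37.38, "ku3nao3": 46.67,
    "bei1shang1": 48.08, "chen2zhong4": 48.19, "kuai4le4": 51.27, "fen4nu4": 55.36,
    "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88, "xi3yue4": 92.20,
    "huan1le4": 92.91,
}

highest_a = max(TYPE_A.values())   # shang1xin1
lowest_b = min(TYPE_B.values())    # qi4fen4
type_gap = lowest_b - highest_a    # separation between the two types

# Hypothetical cut-off: the midpoint of the gap (not stated in the source).
THRESHOLD = (highest_a + lowest_b) / 2


def classify(deverbal_pct: float) -> str:
    """Label a verb Type A or Type B from its deverbal use frequency."""
    return "A" if deverbal_pct < THRESHOLD else "B"


# Gap within the closest competing pair named on the slide.
pair_gap = TYPE_B["bei1shang1"] - TYPE_A["shang1xin1"]

print(f"gap between types: {type_gap:.2f} points, threshold: {THRESHOLD:.3f}")
print(f"shang1xin1 vs. bei1shang1: {pair_gap:.2f} points")
```

Any cut-off between 14.18% and 24.49% separates the two lists equally well; the midpoint is used only because it is the simplest choice.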
The Noisy-Channel Model of the Theory of Communication • Our Proposal • Language is an information-based communication system. • An optimized communication system is one in which all redundant signs (for one piece of information) also minimally differentiate another piece of information.
Re-Interpretation of the Data • Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs to express information pertaining to the field. • A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.
Re-Interpretation of the Data (continued) • This preference for expressing certain information can be lexicalized to establish logical implicature. • Once that lexical preference is established, linguists can use the preferential ratio to infer the lexical information being carried.
Lexical distribution as cognitive model: Senses • A further step based on property defined by contrast, with focus on how senses are represented • Study the sense of hearing and the basic property term sheng-yin ‘sound/voice’ • We (Huang and Hong 2005) look at the distribution of the two lexical elements sheng and yin in all derived words
聲 Sheng vs. 音 Yin
聲樂 vs. 音樂: vocal music vs. music
發聲 vs. 發音: make a sound vs. articulate
高聲 vs. 高音: loudly vs. high pitch
*噪聲 vs. 噪音: noise
大聲 vs. *大音: loudly
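NN Compound: N + 聲/音
聲 Sheng, N = source: 歌 (song) 掌 (palm) 人 (person) 腳步 (footstep) 風 (wind) 鐘 (bell) 水 (water) …
音 Yin, N = quality: 嗓 (voice) 鄉 (hometown) 喉 (throat) 裝飾 (ornament) 尾 (tail) 哨 (whistle) …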
The Semantic Contrast
聲: Production of sounds. Often refers to the manner or source of how a sound was made.
音: Perception of a sound. Often refers to the sound quality or how a sound is perceived by an intelligent agent.
A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005)
Process of Hearing
聲 sheng: starting point, source (起點、來源); active production (主動完成); instigator (發動者)
音 yin: end point, result, goal (終點、結果); passive reception (被動接收); experiencer (經驗者)
A Lexicalized Schema for Sense in Chinese • Process of Sensation: word1, word0 • Experiencer (經驗者) • Goal/perception: experience of sense • Sensation, i.e. perceptual reception (感知接收)
Lexical Semantic Analysis (7): Diagram of the Relation among Sight, Touch, and Hearing
Contrast of cognitive features across lexical items
Sense             Instigator of action (marked)    Experiencer of sensation (shared and unmarked)
Hearing (聽覺)    聲 (production)                  音 (perception)
Sight (視覺)      看 (inchoative)                  見 (bounded result)
Touch (觸覺)      觸 (activity)                    摸 (incremental theme)
Radical as ontology • The Chinese writing system has been conventionalized and shared for over three thousand years • It has also been adopted by typologically very different languages • If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology
Example: the horse radical 馬 (from Chou 2005) • 馬 is a semantic symbol of horse • Examples: • 驩: 馬名 a kind of horse • 驫: 眾馬 horses • 騎: 騎馬 riding a horse • 驍: 良馬 a good horse • 驚: 馬驚 a scared horse
Research Tool and Issue • Formal Description • IEEE SUMO (Suggested Upper Merged Ontology) http://www.ontologyportal.org http://BOW.sinica.edu.tw • Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?
Knowledge System of the Radical 艸/艹 (Grass, for Plants)
Plants (IS-A): 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草
Usage (telic): 蕃藥蔬菜薪苑藩藉茭
Description (descriptive/formal): 茲蒼芳落茸茂荒薄芬蒸莊
Parts (constitutive): 萌莖芽茄苗蓮葉
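One way to make the radical's knowledge system queryable is to store each character group under the relation it instantiates. The sketch below is an assumption-laden illustration, not the representation used by Chou and Huang: the relation keys, the pairing of each character group with a relation (read off the flattened slide), and the `relation_of` helper are all hypothetical.

```python
# Characters with the grass radical 艸/艹, grouped by their relation to the
# radical's basic concept "plant". The grouping is an assumed reading of the slide.
GRASS_RADICAL = {
    "is_a": "蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草",   # kinds of plants
    "telic": "蕃藥蔬菜薪苑藩藉茭",                 # usage of plants
    "descriptive": "茲蒼芳落茸茂荒薄芬蒸莊",       # description of plants
    "constitutive": "萌莖芽茄苗蓮葉",              # parts of plants
}

# Invert the grouping so each character maps to its relation.
CHAR_TO_RELATION = {
    char: relation
    for relation, chars in GRASS_RADICAL.items()
    for char in chars
}


def relation_of(char: str) -> str:
    """Return the relation linking a character to the radical's concept,
    or 'unknown' if the character is not listed on the slide."""
    return CHAR_TO_RELATION.get(char, "unknown")


for example in ("草", "藥", "芳", "葉", "馬"):
    print(example, relation_of(example))
```

Under this layout the radical is not a pure IS-A taxonomy: only the first group names kinds of plants, while the other groups relate to plants through use, description, or part-whole structure.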
Conclusion I: Corpus as Evidence • Core issue of a scientific explanation of language and cognition • Language as a living organism allows variations and adaptations (the evolutionary view) • The coherence of language is the shared tendency of all users • Distributional data in a corpus lead to the discovery of these shared tendencies • This should be more valuable than incidental examples
Conclusion II: Language as a Knowledge System • The generative lexicalist approach to grammar: language as a knowledge system • All aspects of language are projected from a unified knowledge system • Lexical semantics based on distributional data offers the best window to the underlying knowledge system of language