Semantic frequency and the creation of pedagogical word lists: What can we learn from SemCor? Dee Gardner, Brigham Young University


Presentation Transcript


  1. Semantic frequency and the creation of pedagogical word lists: What can we learn from SemCor? Dee Gardner, Brigham Young University

  2. The Need for Semantic Frequency Studies?

  3. From First Language Education (Biemiller & Slonim, 2001, p. 510) • Why does [word] frequency have so little effect [on root word knowledge]? Part of the answer lies in the varying meanings of words. Some of the meanings that we used based on random sampling from LWV [Living Word Vocabulary] were uncommon uses of common words [e.g., beat (wings), bit (computer information), or tree (rack for shoes, hats)]. These words have high frequencies in print, but not in the meanings used. • It is also possible that some words have different frequencies in oral use than in print.

  4. Frequencies of word meanings rather than word forms might lead to better predictions (but would be very hard to produce).

  5. It is clear that factors other than print frequency account for most variation in word knowledge….”

  6. From Second Language Education • Read (2000) includes the following suggestions (among several others) regarding a new and much-needed high-frequency word list to be used in English language instruction (p. 228):

  7. “A decision has to be made as to whether the list will consist just of individual words or whether certain fixed phrases occur with sufficient frequency and range to be included as items in their own right.”

  8. “There should be a thorough semantic analysis of the lemmas chosen, so that they can be grouped into word families.”

  9. “Thus although there are certainly ways in which computer analysis can make the development of a truly representative, high frequency word list much less time-consuming than it was for the pioneering scholars in the early years of this century, it would require a substantial amount of skilled analysis and judgement to produce a good quality list, and so far no one has taken up the challenge.”

  10. From Corpus Linguistics (Knowles & Mohd Don, 2004, pp. 70-71) • “…as linguists have begun to investigate increasingly large corpora, it has become apparent that individual members of the lemma can behave independently and develop their own meanings and collocations.”

  11. “One of the major insights gained from the work of the ‘Birmingham’ school … is that as linguists examine more and more data in greater and greater detail, generalizations about whole lemmas become less and less convincing, and we have to consider individual words (and actually even individual word meaning).”

  12. Problems with Computer-Generated Word Lists • What is counted as a word? • Morphological Relationships (inflections, derivations, etc.) • Grammatical Relationships Between Words (same or different parts of speech) • Collocational Relationships (individual forms vs. multiword items vs. larger context) • Meaning (multiple meanings for the same word forms and phrase forms) • Psychological Reality of a Word (“in the mind of the linguist” vs. “in the mind of the learner”)
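The "What is counted as a word?" question changes the counts even within a single sentence. The sketch below is a minimal illustration, not the author's procedure: it takes the word tokens of the Brown sentence shown on slides 19 and 20 (punctuation omitted), where SemCor joins multiword items with underscores, and counts them once with multiword units kept whole and once split into their component words. The function name count_items and the lowercasing step are assumptions added for illustration.

```python
from collections import Counter

# Word tokens of the SemCor sentence on slides 19-20 (punctuation omitted).
# SemCor joins multiword items with underscores, e.g. "took_place".
TOKENS = [
    "The", "Fulton_County_Grand_Jury", "said", "Friday", "an",
    "investigation", "of", "Atlanta", "'s", "recent", "primary_election",
    "produced", "no", "evidence", "that", "any", "irregularities",
    "took_place",
]

def count_items(tokens, keep_multiwords=True):
    """Count lowercased items (lowercasing is an assumption), either
    keeping multiword units whole or splitting them at underscores."""
    counts = Counter()
    for tok in tokens:
        parts = [tok] if keep_multiwords else tok.split("_")
        counts.update(p.lower() for p in parts)
    return counts

whole = count_items(TOKENS, keep_multiwords=True)
split = count_items(TOKENS, keep_multiwords=False)
print(sum(whole.values()), sum(split.values()))  # prints: 18 23
```

Keeping multiword units yields 18 items; splitting them yields 23, and an item such as took_place disappears in favor of the separate forms took and place.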

  13. What English is used as the source for data? • General Registers (e.g., spoken vs. written; fiction versus nonfiction) • Specific Registers (e.g., law versus business; math versus history)

  14. Reality of a Word Within a Corpus

  15. Psychological Reality of a Word in the Mind of a Language Learner

  16. Knowledge Required to Know a Word = • Written and Spoken Forms • Morphological Relationships • Grammatical Characteristics • Collocational Relationships • Multiple Meanings • Register Characteristics • Relative Frequency

  17. Why SEMCOR?

  18. Grammatically Tagged Corpus (POS) • Semantically Tagged Corpus • Multiword Collocations Maintained • Grammatical Parts of Speech Maintained • Both Types and Lemmas Available • Multiple Registers Available

  19. Raw SEMCOR Data
      <contextfile concordance=brown>
      <context filename=br-a01 paras=yes>
      <p pnum=1>
      <s snum=1>
      <wf cmd=ignore pos=DT>The</wf>
      <wf cmd=done rdf=group pos=NNP lemma=group wnsn=1 lexsn=1:03:00:: pn=group>Fulton_County_Grand_Jury</wf>
      <wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>
      <wf cmd=done pos=NN lemma=friday wnsn=1 lexsn=1:28:00::>Friday</wf>
      <wf cmd=ignore pos=DT>an</wf>
      <wf cmd=done pos=NN lemma=investigation wnsn=1 lexsn=1:09:00::>investigation</wf>
      <wf cmd=ignore pos=IN>of</wf>
      <wf cmd=done pos=NN lemma=atlanta wnsn=1 lexsn=1:15:00::>Atlanta</wf>
      <wf cmd=ignore pos=POS>'s</wf>
      <wf cmd=done pos=JJ lemma=recent wnsn=2 lexsn=5:00:00:past:00>recent</wf>
      <wf cmd=done pos=NN lemma=primary_election wnsn=1 lexsn=1:04:00::>primary_election</wf>
      <wf cmd=done pos=VB lemma=produce wnsn=4 lexsn=2:39:01::>produced</wf>
      <punc>``</punc>
      <wf cmd=ignore pos=DT>no</wf>
      <wf cmd=done pos=NN lemma=evidence wnsn=1 lexsn=1:09:00::>evidence</wf>
      <punc>''</punc>
      <wf cmd=ignore pos=IN>that</wf>
      <wf cmd=ignore pos=DT>any</wf>
      <wf cmd=done pos=NN lemma=irregularity wnsn=1 lexsn=1:04:00::>irregularities</wf>
      <wf cmd=done pos=VB lemma=take_place wnsn=1 lexsn=2:30:00::>took_place</wf>
      <punc>.</punc>
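A raw stream like this can be turned into rows of the kind shown on slide 20 with a small amount of parsing. The following is a minimal sketch under stated assumptions, not the author's actual conversion. The names repurpose, WF_RE, and ATTR_RE are hypothetical. It reads only wf elements, so punc elements are dropped. It uses "99" for words without a wnsn value, mirroring the 99 that appears for untagged function words on slide 20. As a guess chosen to match slide 20, it also keeps the surface form as the lemma for named-entity groups marked with pn= (the raw data gives lemma=group for Fulton_County_Grand_Jury).

```python
import re

# <wf ATTRS>FORM</wf>; SemCor attribute values are unquoted.
WF_RE = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")
ATTR_RE = re.compile(r"(\w+)=(\S+)")
NO_SENSE = "99"  # placeholder sense for untagged words, as on slide 20

def repurpose(sgml, reg):
    """Return (Reg, Type, POS, WNSN, Lemma, Lemma>POS>Sense) rows.

    <punc> elements are not matched by WF_RE, so punctuation is dropped.
    """
    rows = []
    for attr_text, form in WF_RE.findall(sgml):
        attrs = dict(ATTR_RE.findall(attr_text))
        pos = attrs.get("pos", "")
        wnsn = attrs.get("wnsn", NO_SENSE)
        # Assumption: named entities (pn=...) keep their surface form as the
        # lemma to match slide 20; words with no lemma fall back to the form.
        lemma = form if "pn" in attrs else attrs.get("lemma", form)
        rows.append((reg, form, pos, wnsn, lemma, f"{lemma}>{pos}>{wnsn}"))
    return rows

# A fragment of the slide 19 excerpt, including one <punc> element.
SAMPLE = (
    "<wf cmd=ignore pos=DT>The</wf>"
    "<wf cmd=done rdf=group pos=NNP lemma=group wnsn=1 "
    "lexsn=1:03:00:: pn=group>Fulton_County_Grand_Jury</wf>"
    "<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>"
    "<punc>``</punc>"
    "<wf cmd=done pos=JJ lemma=recent wnsn=2 lexsn=5:00:00:past:00>recent</wf>"
)

for row in repurpose(SAMPLE, "br-a01"):
    print(*row, sep="\t")
```

On this fragment the four printed rows match the corresponding rows of slide 20. Unlike slide 20, which shows a single row for "Atlanta's", this sketch would emit separate rows for "Atlanta" and "'s", because the raw data tags them as separate words.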

  20. Repurposed SEMCOR Data
      Reg     Type                      POS   WNSN  Lemma                     Lemma>POS>Sense
      br-a01  The                       DT    99    The                       The>DT>99
      br-a01  Fulton_County_Grand_Jury  NNP   1     Fulton_County_Grand_Jury  Fulton_County_Grand_Jury>NNP>1
      br-a01  said                      VB    1     say                       say>VB>1
      br-a01  Friday                    NN    1     friday                    friday>NN>1
      br-a01  an                        DT    99    an                        an>DT>99
      br-a01  investigation             NN    1     investigation             investigation>NN>1
      br-a01  of                        IN    99    of                        of>IN>99
      br-a01  Atlanta's                 NN    1     atlanta                   atlanta>NN>1
      br-a01  recent                    JJ    2     recent                    recent>JJ>2
      br-a01  primary_election          NN    1     primary_election          primary_election>NN>1
      br-a01  produced                  VB    4     produce                   produce>VB>4
      br-a01  no                        DT    99    no                        no>DT>99
      br-a01  evidence                  NN    1     evidence                  evidence>NN>1
      br-a01  that                      IN    99    that                      that>IN>99
      br-a01  any                       DT    99    any                       any>DT>99
      br-a01  irregularities            NN    1     irregularity              irregularity>NN>1
      br-a01  took_place                VB    1     take_place                take_place>VB>1
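The Lemma>POS>Sense column makes the unit of counting a choice rather than a given. The sketch below counts the same rows under four keys (word form, lemma, lemma plus part of speech, and lemma plus part of speech plus sense) to show how frequencies split apart as the key becomes more specific. The rows are HYPOTHETICAL: the file IDs and the sense numbers for "beat" are placeholders, not actual SemCor or WordNet values. The KEYS table and the count_by function are names introduced here for illustration, and lowercasing the form is an added assumption.

```python
from collections import Counter

# HYPOTHETICAL rows in slide 20's layout: (Reg, Type, POS, WNSN, Lemma).
# File IDs and sense numbers are placeholders, not real SemCor/WordNet data.
ROWS = [
    ("x01", "beat", "VB", "1", "beat"),
    ("x01", "beats", "VB", "1", "beat"),
    ("x02", "beat", "VB", "2", "beat"),
    ("x02", "beat", "NN", "1", "beat"),
]

# Each key function defines what counts as "the same word".
KEYS = {
    "type": lambda r: r[1].lower(),             # word form (lowercased)
    "lemma": lambda r: r[4],                    # lemma only
    "lemma_pos": lambda r: f"{r[4]}>{r[2]}",    # lemma + part of speech
    "sense": lambda r: f"{r[4]}>{r[2]}>{r[3]}", # Lemma>POS>Sense
}

def count_by(rows, key):
    """Frequency of each item when items are identified by KEYS[key]."""
    return Counter(KEYS[key](r) for r in rows)

for key in KEYS:
    print(key, dict(count_by(ROWS, key)))
# type {'beat': 3, 'beats': 1}
# lemma {'beat': 4}
# lemma_pos {'beat>VB': 3, 'beat>NN': 1}
# sense {'beat>VB>1': 2, 'beat>VB>2': 1, 'beat>NN>1': 1}
```

The form and the lemma both look frequent, but under the sense key the four occurrences spread over three entries, the kind of gap Biemiller and Slonim point to with beat (wings).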

  21. Questions: What is the impact on computer-generated word lists when the following conditions are assumed for counting purposes?

  22. Semantic frequency and the creation of pedagogical word lists: What can we learn from SemCor? Dee Gardner, Brigham Young University
