
Science Mapping -Richard Shiffrin-






Presentation Transcript


  1. Science Mapping -Richard Shiffrin- • New computational approaches allow extraction of implicit knowledge • Computational approaches • generate new knowledge • transform the access to, and use of, knowledge

  2. Some are surprised that the major developments in this field were initiated long ago by behavioral and social scientists, scientists who continue to produce major advances. • There should be no surprise: the text databases are generated by humans using their cognition—cognition and text mining cannot be separated. • In recent years, behavioral and social scientists have worked closely with statisticians, computer scientists, physicists, and so on.

  3. Flood of Information • Information doubles every 18 months • Human capabilities remain constant • are subjective • are biased • are based on limited data • Data are of highly variable validity • Computational approaches extract the ‘wheat’ from the ‘chaff’

  4. Colleagues: • Katy Borner--edited ‘Mapping Knowledge Domains’ special PNAS issue (with me) • Mark Steyvers--contributed article to issue

  5. Mark Steyvers: • Developing state-of-the-art methods • Katy Borner: • Developing ways to make extracted knowledge useful and interpretable

  6. Examples of uses: • Find and group science topics • Locate similar articles • Place topics/articles/authors in proper relation • Track growth and decay of science fields • Classify documents and authors by content • Project importance of new articles • Assess importance and results of grants • Answer queries

  7. Setting the stage

  8. Text Mining and Data Mining: Psychological/Computational Techniques • A text corpus contains explicit knowledge: e.g. -meaning of the sentences • A text corpus also contains implicit knowledge: e.g. -rules of syntax and grammar -meanings of words -relations among word meanings -topics assumed by the sender -relations among topics

  9. Computational techniques extract implicit knowledge • Rapidly evolving research area • Approaches: • Heuristic/Ad Hoc • Descriptive Modeling • Generative Modeling

  10. Heuristic/Ad Hoc • Whatever works • Example: • Text parsers (~98% accuracy)— • E.g. TEMIS, INXSIGHT, LEXIQUEST, GATE, AUTONOMY

  11. Disadvantage of ad-hoc methods: • might not generalize to new situations. • Principled approaches allow development of new and better techniques.

  12. Descriptive models • Posit little about text generation • Use dimensional compression to allow inference and induction of implicit structure

  13. A Descriptive Model: LSA--Latent Semantic Analysis (e.g. Landauer and Dumais, 1997) • Idea: -- Semantically similar words occur together, or co-occur with the same other words -- ‘Meaning’ of a text segment = sum of meanings of its words • LSA extracts the relations of word meanings from the database

  14. LSA: • Matrix of documents by words, with entries giving word counts per document --Matrix decomposed by singular value decomposition (SVD) into a product of three matrices ---First matrix gives word meanings, based on the most important dimensions of the original matrix ---Small distances between word vectors represent similar meanings.
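The SVD step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' actual pipeline: the tiny vocabulary and term-document counts are invented, and only two dimensions are kept (real LSA applications use on the order of 300).

```python
import numpy as np

# Hypothetical vocabulary and word-by-document count matrix.
vocab = ["gene", "dna", "protein", "neuron", "memory", "recall"]
X = np.array([
    [3, 2, 0, 0],   # gene
    [2, 3, 0, 0],   # dna
    [1, 2, 0, 1],   # protein
    [0, 0, 3, 2],   # neuron
    [0, 0, 2, 3],   # memory
    [0, 1, 1, 2],   # recall
], dtype=float)

# Singular value decomposition: X = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the k most important dimensions; each row of word_vectors
# is then a word's low-dimensional "meaning" vector.
k = 2
word_vectors = U[:, :k] * s[:k]
```

Words that co-occur in the same documents (e.g. the first two rows) end up with nearby vectors in the reduced space.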

  15. Even large databases can have the meanings of their words represented by 300 or so dimensions (i.e. vector values) • Similarity between these word vectors is measured by their inner product: semantically close words have nearly parallel vectors
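The inner-product similarity measure can be sketched as a normalized dot product (cosine). The three word vectors below are invented 2-dimensional stand-ins for the ~300-dimensional vectors the slide describes.

```python
import numpy as np

def cosine_similarity(a, b):
    """Normalized inner product of two word vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up reduced word vectors for illustration.
gene = np.array([2.9, 0.3])
dna = np.array([2.8, 0.4])
neuron = np.array([0.2, 3.1])

# Related words ('gene', 'dna') score much higher than unrelated ones.
assert cosine_similarity(gene, dna) > cosine_similarity(gene, neuron)
```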

  16. LSA representation very useful. • E.g. LSA has been used for essay grading • Grade based on distance from essay meaning to meaning of some essays previously judged good -- Grades as well as or better than humans • E.g. LSA used to locate coherent scientific topics and estimate their similarity • Mark Steyvers and Katy Borner have used LSA in their work

  17. Generative Modeling: • Posits a model by which the text words are generated -- Uses induction to estimate the parameters of the model from the text

  18. Example: -- Mark Steyvers’s ‘Topics Model’ • Assumes: • each text segment has words from a few topics • each topic is mainly comprised of a few words • each word in a segment is generated by sampling a topic and then a word from that topic • Example: Used to map topics of articles in PNAS over a ten-year period
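The generative process the Topics Model assumes can be sketched directly: for each word slot, sample a topic from the segment's topic mixture, then sample a word from that topic. The topics, words, and probabilities below are invented for illustration; the real model runs this process in reverse, inferring topic and mixture probabilities from the observed text.

```python
import random

random.seed(0)

# Each topic is mainly comprised of a few words (made-up distributions).
topics = {
    "genetics":     {"gene": 0.5, "dna": 0.4, "protein": 0.1},
    "neuroscience": {"neuron": 0.5, "memory": 0.3, "recall": 0.2},
}

# Each text segment draws its words from a few topics.
segment_topic_mixture = {"genetics": 0.7, "neuroscience": 0.3}

def generate_segment(n_words):
    """Generate a segment: sample a topic, then a word from that topic."""
    words = []
    for _ in range(n_words):
        topic = random.choices(list(segment_topic_mixture),
                               weights=list(segment_topic_mixture.values()))[0]
        word = random.choices(list(topics[topic]),
                              weights=list(topics[topic].values()))[0]
        words.append(word)
    return words

print(generate_segment(8))
```

A segment generated this way is dominated by words from its high-weight topics, which is exactly the regularity that inference exploits when recovering topics from real articles.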

  19. Science Mapping • Using ‘meaning’ and ‘topics’, one can map the structure of scientific knowledge in a database • E.g. organize subtopics and show how these evolve over time.

  20. In sum, these approaches provide an immensely useful way to find and map implicit knowledge • Eliminates personal bias • Bypasses the need to rely on judgments by individuals who have access to only a tiny slice of the relevant knowledge. • I now turn things over to Mark and Katy
