Citation Indexing Nitish Mathew Thanks to Dr. C. Lee Giles Dr. Paul Cohen
Outline • Introduction to Citation Indexing • What is Citation Indexing • Concept • Web of Science • Bias • Autonomous Citation Indexing • Future Application • Technology Forecasting • Summary
Why do literature search? • Avoid unwitting duplication of research • Wasted time, effort & funds • Plagiarism issues
Concept of Citations • Citations symbolize the conceptual association of scientific ideas as recognized by publishing research authors. • By the references they cite in their papers, authors make explicit linkages between their current research and prior work in the archive of scientific literature.
Distinction between "citation" and "reference" • "If Paper R contains a bibliographic footnote using and describing Paper C, then • R contains a reference to C, • C has a citation from R. • The number of references a paper has is measured by the number of items in its bibliography as endnotes, footnotes, etc. • The number of citations a paper has is found by looking it up [in a] citation index and seeing how many other papers mention it." Source: Price D. J. D. Little science, big science...and beyond. New York: Columbia University Press, 1986.
Example (slide figure): Paper R is an excerpt that quotes Price's definition of "citation" and "reference" and cites its source as reference [6]: "The concept of citation indexing: A unique and innovative tool for navigating the research literature." Current Contents, January 3, 1994. Paper C is that cited essay, which introduces the "Citation Comments" series on applications of ISI's databases and quotes Price's convention. Thus R contains a reference to C, and C has a citation from R.
Citation Index: the entry for Paper C lists the papers that cite it • Paper X • Paper Y • Paper R • Paper Q
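A citation index is essentially every paper's reference list turned inside out. The minimal Python sketch below, using hypothetical paper IDs chosen to mirror the slide, builds such an index and reports both counts in Price's sense. It is an illustration of the idea, not ISI's actual indexing procedure.

```python
from collections import defaultdict

# Hypothetical toy data: each paper maps to the papers in its bibliography
# (its references, in Price's terminology).
references = {
    "R": ["C", "A"],
    "X": ["C"],
    "Y": ["C", "R"],
    "Q": ["C"],
    "C": [],
    "A": [],
}


def build_citation_index(refs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert reference lists: map each cited paper to the papers citing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for citing, cited_list in refs.items():
        for cited in cited_list:
            index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


index = build_citation_index(references)

# Reference count: length of a paper's own bibliography.
# Citation count: how many citing papers the index lists for it.
for paper in sorted(references):
    n_refs = len(references[paper])
    n_cites = len(index.get(paper, []))
    print(f"{paper}: {n_refs} references, {n_cites} citations")
```

Here the entry for C lists Q, R, X and Y, matching the slide; R has two references but only one citation (from Y).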
Citation Indexing • A citation index indexes the references that articles make, linking each article with the works it cites, so that any cited work can be traced to the articles citing it. • Originally designed mainly for literature search: it lets researchers find later articles that cite a given article. • Invented by Dr. Eugene Garfield • Example of a citation indexing firm: Institute for Scientific Information® (ISI)
Institute for Scientific Information® (ISI) • Indexes the linkages by listing both the cited and the citing works. • The ISI® databases: • Science Citation Index® (SCI®) • Social Sciences Citation Index® (SSCI®) • Arts & Humanities Citation Index® (A&HCI®) • Multidisciplinary: together they cover virtually all disciplines, whereas traditional indexing and abstracting services are limited to a single field.
Web of Knowledge • ISI Web of Knowledge®, a dynamic, integrated, Web-based environment • ISI Web of Science® provides access to • Science Citation Index (over 3,200 journals) • Social Sciences Citation Index (1,400 journals) • Arts & Humanities Citation Index • Updated weekly. • Journals from 1986 onward are available to Penn State users. • Previous years of each index are available in print at the Libraries.
Web of Science • Lets users search current and retrospective multidisciplinary information from nearly 8,500 research journals worldwide. • Users can navigate forward, backward, and through the literature, searching all disciplines and time spans to uncover information relevant to their research.
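Navigating "backward" follows a paper's references to earlier work, while navigating "forward" follows its citations to later work. The Python sketch below illustrates both as a breadth-first walk over a hypothetical toy citation graph; the depth limit and paper IDs are assumptions for illustration, not how Web of Science is implemented.

```python
from collections import deque

# Hypothetical toy data: paper -> papers it references (backward links).
references = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "E": ["D"],
    "A": [],
}

# Derive forward links (paper -> papers that cite it) by inverting references.
cited_by: dict[str, list[str]] = {p: [] for p in references}
for citing, cited_list in references.items():
    for cited in cited_list:
        cited_by.setdefault(cited, []).append(citing)


def navigate(start: str, links: dict[str, list[str]], max_depth: int) -> dict[str, int]:
    """Breadth-first walk over `links`; returns each reachable paper's hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if seen[paper] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in links.get(paper, []):
            if nxt not in seen:
                seen[nxt] = seen[paper] + 1
                queue.append(nxt)
    del seen[start]
    return seen


backward = navigate("D", references, max_depth=2)  # earlier work D builds on
forward = navigate("A", cited_by, max_depth=2)     # later work that builds on A
print("backward from D:", backward)
print("forward from A:", forward)
```

With a depth of 2, the forward walk from A reaches D but stops before E, which cites A only indirectly through three hops.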
Advantages • Compared to traditional indexing: • no subjective judgments need to be made about relevant descriptors • faster • no limit on index terms: all cited references are indexed.
Problems with ISI Databases • Require manual effort during indexing • Expensive • Bias issues • One possible solution – Autonomous Citation Indexing Adapted from ‘Citation Indexing - Its Theory and Application in Science, Technology, and Humanities’ by Eugene Garfield
Bias in Citation Databases • Bibliometric indicators do not represent all publishing: although these databases have international coverage, they contain a certain amount of bias. • They contain more minor US journals than minor European journals. • Non-English-language journals are not as comprehensively indexed. • From the perspective of the non-English-speaking world, bibliometric indicators represent only international-level, predominantly English-language, higher-impact, peer-reviewed, publicly available research output. Source: Bibliometric Indicators and the Social Sciences, prepared for ESRC, J. Sylvan Katz, SPRU, University of Sussex, UK, December 1999
Bias in Citation Databases • A recurrent criticism: journal selection is biased by ISI's internal management decisions. • Only journals are indexed; monographs are left out. • A lack of correlation between the most highly cited authors in the journal sample and those in the monograph sample suggests that there may be two distinct populations of highly cited authors. Source: Blaise Cronin and Herbert W. Snyder. Comparative citation rankings of authors in monographic and journal literature: a study of sociology. Journal of Documentation, 53(3):263–273, 1997.
ResearchIndex/CiteSeer • ResearchIndex: A scientific literature digital library that incorporates • Autonomous citation indexing • Citation context • Full-text indexing • Related document identification • Query sensitive summaries • Awareness and tracking • Citation graph analysis • http://citeseer.nj.nec.com/cs Source: Presentation on “Searching the World Wide Web General and Scientific Information Access”, Steve Lawrence
CiteSeer – How does it work? • Download papers from the Web • Convert them to text and parse them • Extract citations and build a full-text index • Store them in a database • Query by citations or keywords Source: CiteSeer: An Automatic Citation Indexing System (1998), C. Lee Giles, Kurt D. Bollacker, Steve Lawrence, Digital Libraries 98 - The Third ACM Conference on Digital Libraries
CiteSeer - Document Acquisition • Web search engines are used for crawling. • Heuristics are used to locate papers, e.g. pages containing words such as “publications”, “papers”, or “postscript”. • Locates and downloads Postscript files identified by “.ps”, “.ps.Z”, or “.ps.gz” extensions. • Duplicate URLs and Postscript files that have already been found are detected and skipped.
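The acquisition heuristics above can be sketched as two filters plus duplicate detection. This is a minimal Python illustration under stated assumptions: the keyword list comes from the slide, but the exact-match URL check and the SHA-256 content hash for duplicate files are hypothetical choices, not CiteSeer's documented method. The example URLs are placeholders.

```python
import hashlib
from urllib.parse import urlsplit

# Keywords and extensions taken from the slide; matching rules are assumptions.
PAGE_KEYWORDS = ("publications", "papers", "postscript")
PS_EXTENSIONS = (".ps", ".ps.z", ".ps.gz")  # compared against a lowercased path


def looks_like_publication_page(page_text: str) -> bool:
    """Heuristic: the page mentions at least one publication-related keyword."""
    text = page_text.lower()
    return any(word in text for word in PAGE_KEYWORDS)


def postscript_links(links: list[str], already_seen: set[str]) -> list[str]:
    """Keep links whose path ends in a Postscript extension, skipping known URLs."""
    selected = []
    for url in links:
        path = urlsplit(url).path.lower()
        if not path.endswith(PS_EXTENSIONS):
            continue
        if url in already_seen:
            continue  # duplicate URL already found
        already_seen.add(url)
        selected.append(url)
    return selected


def is_duplicate_file(data: bytes, seen_hashes: set[str]) -> bool:
    """Hypothetical check: flags byte-identical files via a SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


seen_urls: set[str] = set()
page = "My publications: papers in Postscript"
links = [
    "http://www.example.com/~a/paper1.ps.gz",
    "http://www.example.com/~a/index.html",
    "http://www.example.com/~a/paper1.ps.gz",
    "http://www.example.com/~a/tr2.PS.Z",
]
found = postscript_links(links, seen_urls) if looks_like_publication_page(page) else []
print(found)
```

Hashing only catches byte-identical copies; the same paper posted in two different encodings would need a comparison after text conversion.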
Document Parsing • The downloaded Postscript files are first converted into text. • Information extracted includes: URL, header, abstract, introduction, citations, citation context, and full text. • Issues in citation parsing include: • natural-language citations • variant citations to the same article (these affect citation statistics)
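Two of the extracted items, the citations themselves and their context, can be illustrated with regular expressions. The Python sketch below assumes a simple, hypothetical layout in which references are numbered "[1] … [2] …" and cited in the body with the same bracketed markers; real papers vary far more, which is exactly the parsing issue noted above.

```python
import re


def split_references(ref_section: str) -> dict[str, str]:
    """Split a references section of the form '[1] ... [2] ...' into numbered entries."""
    parts = re.split(r"\[(\d+)\]", ref_section)
    # With a capturing group, re.split yields [prefix, num1, text1, num2, text2, ...].
    return {num: text.strip() for num, text in zip(parts[1::2], parts[2::2])}


def citation_contexts(body: str, marker: str, window: int = 40) -> list[str]:
    """Return up to `window` characters on each side of every occurrence of `marker`."""
    contexts = []
    for match in re.finditer(re.escape(marker), body):
        start = max(0, match.start() - window)
        end = min(len(body), match.end() + window)
        contexts.append(body[start:end].strip())
    return contexts


ref_section = (
    "[1] Price D. J. D. Little science, big science...and beyond. 1986. "
    "[6] The concept of citation indexing. Current Contents, January 3, 1994."
)
body = (
    "To start, it is important to clarify the terminological distinction "
    "between citation [6] and reference."
)

refs = split_references(ref_section)
ctx = citation_contexts(body, "[6]")
print(refs["6"])
print(ctx)
```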
Querying and Browsing • Initial query: a keyword search returns a list of citations matching the query, or a list of articles. • Related documents are found using a combination of weighted similarity measures. • http://citeseer.nj.nec.com/cs
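To make "a combination of weighted similarity measures" concrete, the Python sketch below mixes one text-based measure (cosine similarity of word counts) with one citation-based measure (Jaccard overlap of reference sets). Both measures, the equal weights, and the toy documents are hypothetical choices for illustration; they are not CiteSeer's actual measures or weights.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Fraction of shared items among all items in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_score(doc_a: dict, doc_b: dict, w_text: float = 0.5, w_cites: float = 0.5) -> float:
    """Weighted sum of text similarity and shared-reference similarity (hypothetical weights)."""
    text_sim = cosine(Counter(doc_a["text"].lower().split()), Counter(doc_b["text"].lower().split()))
    cite_sim = jaccard(set(doc_a["refs"]), set(doc_b["refs"]))
    return w_text * text_sim + w_cites * cite_sim


# Hypothetical toy collection.
docs = {
    "p1": {"text": "citation indexing of scientific literature", "refs": {"garfield", "price"}},
    "p2": {"text": "autonomous citation indexing on the web", "refs": {"garfield", "giles"}},
    "p3": {"text": "protein folding simulation", "refs": {"levinthal"}},
}

ranking = sorted(
    (d for d in docs if d != "p1"),
    key=lambda d: related_score(docs["p1"], docs[d]),
    reverse=True,
)
print(ranking)
```

Raising `w_cites` favours papers that share references even when their wording differs, which is one reason to combine the two kinds of evidence.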
Advantages of CiteSeer • Completely autonomous: cheaper and more widely available • More up-to-date databases: not limited to a preselected set of journals or subject to publication delays • Literature search based on the context of citations • Ability to recognize variant forms of citations • No selection bias, since journals are not subjectively selected • Not restricted to journal papers: preprints, technical reports and conference proceedings are also indexed. • User feedback on each article Source: Autonomous Citation Matching (1999) Steve Lawrence, C. Lee Giles, Kurt Bollacker Proceedings of the Third International Conference on Autonomous Agents
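Recognizing variant forms of citations means grouping differently formatted strings that refer to the same work. The Python sketch below is a deliberately simple stand-in: it normalizes each citation to a set of lowercase tokens and greedily groups citations whose token Jaccard similarity reaches a threshold. The tokenization rule and the 0.4 threshold are assumptions, and this is not the algorithm evaluated in the paper cited above.

```python
import re


def normalize(citation: str) -> set[str]:
    """Lowercase, strip punctuation, and keep tokens of two or more characters."""
    return {tok for tok in re.findall(r"[a-z0-9]+", citation.lower()) if len(tok) > 1}


def group_citations(citations: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Greedily add each citation to the first group whose representative is similar enough."""
    groups: list[tuple[set[str], list[str]]] = []
    for cit in citations:
        tokens = normalize(cit)
        for rep, members in groups:
            union = rep | tokens
            if union and len(rep & tokens) / len(union) >= threshold:
                members.append(cit)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append((tokens, [cit]))
    return [members for _, members in groups]


# Illustrative citation strings.
cits = [
    "S. Lawrence, C. L. Giles, K. Bollacker. Autonomous citation matching. Agents 99.",
    "Lawrence S, Giles CL, Bollacker K (1999) Autonomous Citation Matching, "
    "Proc. Third Int. Conf. on Autonomous Agents",
    "E. Garfield. Citation Indexing. 1979.",
]
groups = group_citations(cits)
for g in groups:
    print(len(g), g[0])
```

The first two strings share 7 of 15 distinct tokens (about 0.47), so they are grouped; a stricter threshold of 0.5 would split them, which shows how sensitive such matching is to its parameters.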
Areas of Improvement 1. Does not cover the significant journals comprehensively (this may become less of a disadvantage over time as more journals become available online). 2. Cannot distinguish subfields as accurately (e.g. CiteSeer will not disambiguate two authors with the same name). 3. The similar-document retrieval system could be improved. 4. The heuristics used to locate articles could be improved.
Future prospects – Technology Forecasting • DIVA (Database Information Visualization and Analysis system): bibliometric analysis of collections of scientific literature and patents for technology forecasting. • Documents drawn from the technological field of interest are visualized as clusters on a two-dimensional map, permitting exploration of the relationships among the documents and document clusters. • Can yield insight into trends in the technological field of interest. Source: DIVA: A Visualization System for Exploring Document Databases For Technology Forecasting by Steven Morris, Zheng Wu, Camille DeYong, Sinan Salman, Dagmawi Yemenu, Computers and Industrial Engineering, Vol. 43, No. 4
Document timelines: ‘Polymers’ cluster report showing a plot of links to all other clusters by year
Document timelines: ‘Polymers’ cluster report showing a plot of links to each other cluster by year
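The two timeline plots can be read as two tallies of the same data: links from one cluster to all other clusters per year, and the same links broken down by target cluster. The Python sketch below computes both from hypothetical toy data. It assumes that "links" are citations from a document in the chosen cluster to a document in another cluster, dated by the citing document's year; DIVA's actual link definition may differ.

```python
from collections import Counter

# Hypothetical toy data: document -> (cluster label, publication year).
docs = {
    "d1": ("Polymers", 1998),
    "d2": ("Polymers", 1999),
    "d3": ("Coatings", 1997),
    "d4": ("Adhesives", 1996),
    "d5": ("Coatings", 1995),
}
# Hypothetical citation links: (citing document, cited document).
links = [("d1", "d3"), ("d1", "d4"), ("d2", "d3"), ("d2", "d5"), ("d3", "d5")]


def cluster_timeline(cluster: str) -> tuple[Counter, Counter]:
    """Count links leaving `cluster` by year, in total and per target cluster."""
    total_by_year: Counter = Counter()
    per_cluster_by_year: Counter = Counter()
    for citing, cited in links:
        src_cluster, year = docs[citing]
        dst_cluster, _ = docs[cited]
        if src_cluster != cluster or dst_cluster == cluster:
            continue  # keep only links from the chosen cluster to a different cluster
        total_by_year[year] += 1
        per_cluster_by_year[(dst_cluster, year)] += 1
    return total_by_year, per_cluster_by_year


total, per_cluster = cluster_timeline("Polymers")
print(sorted(total.items()))        # all other clusters combined, by year
print(sorted(per_cluster.items()))  # each other cluster separately, by year
```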
A comment on bibliometric analysis Bibliometric analysis has been compared to “a drunk who is looking for his keys under a street lamp.” When a passer-by asks why he is looking there, he replies, “This is where the lamp is.”
A comment on bibliometric analysis Critics say that publications (and citations) just provide “easy data,” and that assessing “real quality” requires more “qualitative considerations.”
Summary • Citation indexing: more than 40 years old. • A simple concept with far-reaching influence and applications • Many possibilities for • improving existing systems • developing new uses in the networked world