
CiteGraph : A Citation Network System for MEDLINE Articles and Analysis

Qing Zhang 1,2, Hong Yu 1,3
1 University of Massachusetts Medical School, Worcester, MA, USA
2 University of Wisconsin Milwaukee, Milwaukee, WI, USA
3 VA Central Massachusetts, Leeds, MA, USA


Presentation Transcript


  1. CiteGraph: A Citation Network System for MEDLINE Articles and Analysis
     Qing Zhang 1,2, Hong Yu 1,3
     1 University of Massachusetts Medical School, Worcester, MA, USA
     2 University of Wisconsin Milwaukee, Milwaukee, WI, USA
     3 VA Central Massachusetts, Leeds, MA, USA

  2. Outline
     • Introduction
     • Background
     • Method
     • Evaluation
     • Analysis
     CiteGraph, MedInfo 2013

  3. Introduction
     • Citation networks are important for
       • Information retrieval
       • Journal Impact Factor, H-index
     • Co-authorship networks are important
     • Few citation networks are available for research
     • We built CiteGraph

  4. Background
     • Citation network analysis
       • Power law distribution in citation networks
       • Article ranking, HITS and PageRank
       • Community structure of physics fields
       • Citation network tool for a given legal issue, using a legal document citation network
     • Co-authorship network analysis
       • Research collaboration patterns
       • Author authority: Erdős number
     • Literature search
       • CiteSeerX, Google Scholar

  5. The CiteGraph Data

  6. Citation Network Example

  7. Challenges
     The same article can appear in different citation formats:
     • (1) Yu, H and Lee M. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics. Vol 22 No. 14, pages e547–e556.
     • (2) Hong Yu and Minsuk Lee. Accessing Bioscience Images from Abstract Sentences. Bioinformatics. Vol 22 No. 14, pages e547–e556. 2006.
     • (3) Yu H, Lee H. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics: 22 (14), e547–e556.

  8. Methods
     • Mapping between articles
     • Mapping articles to PubMed IDs
     • Author name disambiguation

  9. Methods
     • Two entities (for example, a citation and an article) are considered matched if two of the following three tests succeed
     • Title matching
       • the set of tokens in one title is a subset of the tokens in the other, or
       • the number of tokens common to both titles is more than 80% of the number of tokens in the larger title
     • Author list matching
       • the two lists of surnames have a one-to-one mapping, or
       • the surnames in one entity (the citation) are fully contained in the surname set of the second (the article)
     • Journal name matching
       • stop words such as “of” are removed
       • if the number of common initials in the journal titles is greater than 80% of the number of tokens in the longer journal name, the names are considered equivalent
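The matching rules on slide 9 can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the tokenizer, the stop-word list beyond "of", the dictionary layout of an entity, and reading "two of the following" as "at least two" are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed stop-word list; the slide only names "of" as an example.
STOP_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on"}


def tokens(text):
    """Lowercase alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def titles_match(a, b, threshold=0.8):
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta or not tb:
        return False
    # Rule 1: one token set is a subset of the other.
    if ta <= tb or tb <= ta:
        return True
    # Rule 2: shared tokens exceed 80% of the larger title.
    return len(ta & tb) > threshold * max(len(ta), len(tb))


def authors_match(citation_surnames, article_surnames):
    ca = [s.lower() for s in citation_surnames]
    aa = [s.lower() for s in article_surnames]
    if not ca or not aa:
        return False
    # One-to-one mapping, or citation surnames contained in the article's.
    return sorted(ca) == sorted(aa) or set(ca) <= set(aa)


def journals_match(a, b, threshold=0.8):
    # Initials of the tokens that remain after stop-word removal.
    ia = [t[0] for t in tokens(a) if t not in STOP_WORDS]
    ib = [t[0] for t in tokens(b) if t not in STOP_WORDS]
    if not ia or not ib:
        return False
    common = sum((Counter(ia) & Counter(ib)).values())
    return common > threshold * max(len(ia), len(ib))


def entities_match(citation, article):
    """Assumed reading: matched when at least two of the three tests pass."""
    votes = [
        titles_match(citation["title"], article["title"]),
        authors_match(citation["surnames"], article["surnames"]),
        journals_match(citation["journal"], article["journal"]),
    ]
    return sum(votes) >= 2


citation = {"title": "Accessing Bioscience Images from Abstract Sentences",
            "surnames": ["Yu", "Lee"], "journal": "Bioinformatics"}
article = {"title": "Accessing bioscience images from abstract sentences",
           "surnames": ["Yu", "Lee"], "journal": "Bioinformatics"}
print(entities_match(citation, article))
```

Requiring two of three tests means a single noisy field, such as an abbreviated journal name or a mistyped author initial, does not by itself block a match.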

  10. Evaluation Results
      • Seven annotators were invited to annotate the citation mapping and PMID mapping results
      • Each annotator was presented with 20 matching results for each task

  11. The CiteGraph Statistics
      • 1.65 M articles
      • 1.37 M authors
      • 6.35 M citations

  12. The CiteGraph Statistics
      • Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001 Dec;25(4):402-8.
      • log y = 1.06 - 2.45 * log x (p < 0.05, t-test)
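The fitted line on slide 12 can be rewritten as a power law. The transcript does not say what x and y denote or which logarithm base was used, so the base b below is left general; the numeric prefactor assumes b = 10.

```latex
\begin{align*}
\log_b y &= 1.06 - 2.45\,\log_b x \\
y &= b^{1.06}\, x^{-2.45} \qquad (\text{for } b = 10:\ y \approx 11.5\, x^{-2.45}) \\
\frac{y(2x)}{y(x)} &= 2^{-2.45} \approx 0.18
\end{align*}
```

Whatever the base, the exponent is -2.45, so doubling x reduces y to roughly 18% of its previous value.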

  13. The CiteGraph Statistics
      • Largest connected component: 1.27 million authors (92.7%)
      • Second largest connected component: 35 authors
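Component sizes like those on slide 13 can be computed from a list of co-author pairs with a union-find structure. The sketch below is illustrative only: the function name and the toy pairs are hypothetical, and the transcript does not describe how the authors computed their components.

```python
from collections import Counter


def component_sizes(coauthor_pairs):
    """Return connected-component sizes, largest first (union-find sketch)."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        root = a
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[a] != root:
            parent[a], a = root, parent[a]
        return root

    for u, v in coauthor_pairs:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = Counter(find(a) for a in list(parent))
    return sorted(sizes.values(), reverse=True)


# Toy data (hypothetical author labels, not from CiteGraph).
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
sizes = component_sizes(pairs)
print(sizes, sizes[0] / sum(sizes))
```

The ratio of the first size to the total is the share of authors in the largest component, the quantity reported as 92.7% on the slide.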

  14. The CiteGraph Statistics
      • Co-authorship spans range from 1 to 35 years, but 83.7% of author pairs appear together only once.
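A rough sketch of how the statistics on slide 14 could be derived from a list of papers follows. It assumes a span is counted inclusively (last year minus first year plus one, so a pair with a single joint paper has a span of 1) and that each paper is a hypothetical `(year, authors)` tuple; neither detail is stated in the transcript.

```python
from collections import defaultdict
from itertools import combinations


def pair_years(papers):
    """Map each unordered author pair to the years of their joint papers."""
    years = defaultdict(list)
    for year, authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            years[(a, b)].append(year)
    return years


def span_and_once_fraction(papers):
    years = pair_years(papers)
    if not years:
        return {}, 0.0
    # Assumed inclusive span: a single joint paper gives a span of 1 year.
    spans = {pair: max(ys) - min(ys) + 1 for pair, ys in years.items()}
    once = sum(1 for ys in years.values() if len(ys) == 1)
    return spans, once / len(years)


# Toy data (hypothetical papers).
papers = [
    (2001, ["Yu", "Lee"]),
    (2006, ["Yu", "Lee"]),
    (2006, ["Yu", "Zhang"]),
]
spans, once_fraction = span_and_once_fraction(papers)
print(spans, once_fraction)
```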

  15. The CiteGraph Statistics
      * The largest component is excluded when calculating the statistics in the table. Its size is 1.27 million authors (92.7% of all authors).

  16. Trends

  17. Conclusion
      • We created citation and co-authorship networks from full-text biomedical literature
      • Our networks are large and accurate, and they can benefit the biomedical text mining community
        • Article ranking
        • Research collaboration recommendation
        • Social network analysis
      • The network database can be downloaded upon request

  18. Acknowledgement
      • National Institutes of Health grant 1R01GM095476 to Hong Yu
      • A start-up fund from the University of Massachusetts Medical School to Hong Yu
      • National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR000161
