This study explores the use of relational structure to understand publication patterns in high-energy physics, including data cleaning, extraction, and analysis. The authors identify research communities and predict journal publications using KDL's PROXIMITY software. They also analyze data dependencies and author influence.
Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics
Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, David Jensen
Knowledge Discovery Laboratory, University of Massachusetts Amherst
Knowledge Discovery Process
• Data cleaning
• Data extraction
• Data analysis
• Citation analysis
• Identifying research communities
• Predicting journal publication
• Data dependencies
• Understanding author influence
Implemented using KDL’s PROXIMITY software
Data cleaning and extraction
• Extracted abstracts
• Consolidated authors
• Same name assumed
• 13,185 authors to 9,200
• Co-authored with similar names
• Authors of referenced papers with similar names
• Authors with similar email domains and the same username
[Figure: Relational schema]
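The consolidation criteria above can be illustrated with a small sketch. It is not the authors' PROXIMITY implementation: the record layout, the `name_key` similarity rule (same last name and first initial), and the reading of "similar email domains" as "same last two domain labels" are all assumptions made for illustration, and the records are invented.

```python
from itertools import combinations

def name_key(name):
    """Assumed notion of 'similar names': same last name and first initial."""
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())

def same_mailbox(e1, e2):
    """Same username and (assumed) similar domain: last two labels match."""
    if not e1 or not e2:
        return False
    u1, d1 = e1.lower().split("@")
    u2, d2 = e2.lower().split("@")
    return u1 == u2 and d1.split(".")[-2:] == d2.split(".")[-2:]

def consolidate(records):
    """Merge author records with union-find; returns one cluster id per record."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if name_key(a["name"]) != name_key(b["name"]):
            continue  # names are not even similar
        shared = ({name_key(c) for c in a["coauthors"]}
                  & {name_key(c) for c in b["coauthors"]})
        if a["name"] == b["name"] or same_mailbox(a["email"], b["email"]) or shared:
            ri, rj = find(i), find(j)
            parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(records))]

# Hypothetical records, invented for this example.
records = [
    {"name": "Jane Q. Doe", "email": "jdoe@physics.example.edu", "coauthors": {"R. Roe"}},
    {"name": "J. Doe", "email": "jdoe@cs.example.edu", "coauthors": set()},
    {"name": "J. Doe", "email": None, "coauthors": {"Richard Roe"}},
    {"name": "John Doe", "email": "john@other.example.org", "coauthors": set()},
]
clusters = consolidate(records)
print(clusters)
```

In this toy input the first three records collapse into one author and the fourth stays separate, because it shares neither a mailbox nor a co-author with the others. The "authors of referenced papers" criterion is omitted for brevity.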
Data dependencies
• Examples of high correlations:
• Number of downloads in first 60 days and number of citations
• Is paper published and number of citations (binned)
• Examples of high autocorrelation:
• Journal name (through author)
• Topic cluster of paper (through author)
• Author’s total co-authors (through paper)
• Number of downloads in first 60 days (through journal)
[Figure: example graphs contrasting low autocorrelation and high autocorrelation]
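As a rough illustration of what "journal name (through author)" means, the sketch below compares how often two papers share a journal when they are linked by a common author against the rate over all pairs of papers. The slides do not say which autocorrelation statistic was used, so this match-rate comparison is an assumed stand-in, and the four papers are invented.

```python
from itertools import combinations

def linked_pairs(paper_authors):
    """Pairs of papers that share at least one author."""
    pairs = set()
    for p, q in combinations(sorted(paper_authors), 2):
        if paper_authors[p] & paper_authors[q]:
            pairs.add((p, q))
    return pairs

def match_rate(pairs, attr):
    """Fraction of pairs whose attribute values are equal."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(attr[p] == attr[q] for p, q in pairs) / len(pairs)

# Hypothetical toy data: paper -> authors, paper -> journal.
paper_authors = {"p1": {"a"}, "p2": {"a", "b"}, "p3": {"c"}, "p4": {"c"}}
journal = {"p1": "J1", "p2": "J1", "p3": "J2", "p4": "J2"}

linked = linked_pairs(paper_authors)
all_pairs = set(combinations(sorted(paper_authors), 2))

print("linked through author:", match_rate(linked, journal))
print("all pairs (baseline): ", match_rate(all_pairs, journal))
```

A linked-pair rate well above the baseline suggests that the attribute is autocorrelated through that link type. The same functions apply to the other examples by swapping the linking relation and the attribute.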
Will a paper be accepted by Physics Letters B?
• Papers from 1995-2000
• 68% accuracy, 0.75 AUC
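For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how accuracy and AUC are computed from a classifier's scores. The labels and scores are invented toy values, not the KDD Cup data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [s >= threshold for s in scores]
    return sum(p == bool(y) for p, y in zip(preds, labels)) / len(labels)

def auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half), i.e. the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = accepted by the journal, 0 = not accepted.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2]

print("accuracy:", accuracy(labels, scores))
print("AUC:     ", auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC measures only how well the scores rank accepted papers above the rest, which is why the two are usually reported together.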
Identifying Research Communities
• Spectral clustering on citation graph and abstracts
• Papers from 1995 to 2000
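The idea behind spectral clustering can be sketched on a toy citation graph. This is a deliberate simplification rather than the method used in the study: it ignores abstracts, treats citations as undirected edges, and only splits the graph in two, using the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). The function name and the six-paper graph are hypothetical, and the graph is assumed to be connected.

```python
def fiedler_partition(n, edges, iters=500):
    """Spectral bisection of an undirected, connected graph on nodes 0..n-1.

    Runs power iteration on (shift*I - L), projecting out the constant
    vector, which converges to the Fiedler vector of the Laplacian L.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    shift = 2 * max(len(a) for a in adj)  # upper bound on L's largest eigenvalue
    x = [i - (n - 1) / 2 for i in range(n)]  # deterministic zero-mean start
    for _ in range(iters):
        # (shift*I - L) x, where (L x)_i = deg_i * x_i - sum of neighbours' x
        y = [(shift - len(adj[i])) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]  # stay orthogonal to the constant vector
        norm = sum(val * val for val in y) ** 0.5
        x = [val / norm for val in y]
    return [0 if val < 0 else 1 for val in x]

# Hypothetical graph: two tightly linked groups of papers joined by one citation.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
groups = fiedler_partition(6, edges)
print(groups)
```

On this graph the split separates papers 0 to 2 from papers 3 to 5, cutting only the single edge between the groups. A full spectral clustering would use several eigenvectors and a clustering step such as k-means to obtain more than two clusters.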
Example topic clusters
Cluster 2: Black hole approach to string theory: Sumit R. Das (251), Physical Review D
• Absorption of Fixed scalars and the D-brane Approach to Black Holes
• Universal Low-Energy Dynamics for Rotating Black Holes
• Interactions involving D-branes
• Black Hole Greybody Factors and D-Brane Spectroscopy
Cluster 10: Tachyon Condensation: Juan M. Maldacena (1924), Journal of High Energy Physics
• Field theory models for tachyon and gauge field string dynamics
• Super-Poincare Invariant Superstring Field Theory
• Level Four Approximation to the Tachyon Potential in Superstring Field Theory
• SO(32) Spinors of Type I and Other Solitons on Brane-Antibrane Pair
KDD Cup 2003
Paper: kdl.cs.umass.edu/papers/kddcup2003.html
Proximity: kdl.cs.umass.edu/proximity/
Email: amy@cs.umass.edu