Proximity-based Ranking of Biomedical Texts
Rey-Long Liu* and Yi-Chih Huang
*Dept. of Medical Informatics, Tzu Chi University, Taiwan
Outline
• Research background
• Problem definition
• The proposed approach: PRE
• Empirical evaluation
• Conclusion
A Proximity-based Ranker Enhancer
Research Background
Biomedical Information Need
• Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature
• Retrieving that evidence requires a system that
  • Accepts a natural language query expressing a biomedical information need, and
  • Ranks relevant texts higher for access or processing
An Example Information Need
• Query: "urinary tract infection, criteria for treatment and admission" (from OHSUMED)
• A disease as the target concept (i.e., urinary tract infection)
• Two concepts describing the scenario of the information need (i.e., treatment and admission)
  • Neither is specific to, or related to, any particular disease
Problem Definition
Goals
• Explore how text rankers may be improved by considering the completeness of query concepts appearing in a nearby area of the text being ranked
• Develop a technique, PRE (Proximity-based Ranker Enhancer), that
  • Measures the contextual completeness of query concepts appearing in a nearby area of the text
  • Serves as a supplement to improve existing rankers
Related Work
• Biomedical text ranking
  • Uses synonyms and considers the diversity of passages, but does not consider term proximity
• Text ranking
  • Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM) do not consider term proximity
• Improving ranking by term proximity
  • Term proximity is employed, but contextual completeness is not considered
The Proposed Approach: PRE
System Overview
[Figure: system overview. Labels in the diagram: User, Query (q), Text (d), Training Data, Ranked Texts; PRE with TF (Term Frequency) Assessment and TF in d; Underlying Ranker with Text Ranker Development and Text Ranking; phases Training and Testing]
TF Assessment
• Three types of term proximity
  • Overall proximity (QTermTF)
  • Individual proximity (IndiP)
  • Collective proximity (CollP)
• A term t may get a large TF increment in d if
  • Many query terms appear frequently in d,
  • Query terms are individually near t at some places, and
  • Query terms collectively appear at a place near t
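• $\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$
• $\mathrm{TFincrement}(t,d,q) = \mathrm{QTermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$
• $\mathrm{QTermTF}(d,q)$ = total TF of query terms in $d$
• $\mathrm{IndiP}(t,d,q) = \dfrac{\sum_{m \in M-\{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m))}{\mathrm{MaxIndiP}}$
• $\mathrm{Mindist}(x,y)$ = shortest distance between $x$ and $y$ in $d$
• $\mathrm{SigmoidWeight}(\mathit{dt}) = \dfrac{1}{1 + e^{-((|q|-1)-\mathit{dt})}}$
• $\mathrm{CollP}(t,d,q) = \dfrac{\max_{k \in K} \sum_{m \in M-\{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m))}{\mathrm{MaxCollP}}$, where $K$ is the set of positions at which $t$ appears in $d$
• $\mathrm{dist}(t,k,m)$ = distance between $t$ (at position $k$) and $m$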
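The revised TF defined above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The slide does not define M, MaxIndiP, or MaxCollP, so the sketch assumes that M is the set of query terms occurring in d and that both normalizers equal the number of distinct query terms other than t. It also assumes that distance is the absolute difference of token positions, that dist(t,k,m) is measured to the nearest occurrence of m, and that the collective term is a sum over m. The function names `sigmoid_weight` and `revised_tf` are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid_weight(dist, query_len):
    """SigmoidWeight(dt) = 1 / (1 + e^-((|q| - 1) - dt)), computed stably."""
    x = (query_len - 1) - dist
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def revised_tf(t, doc_tokens, query_terms):
    """RTF(t, d, q) = TF(t, d) + QTermTF * IndiP * CollP (illustrative only)."""
    # Record every position at which each token occurs in d.
    pos = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        pos[tok].append(i)

    q = list(dict.fromkeys(query_terms))  # distinct query terms, order kept
    tf = len(pos[t]) if t in pos else 0

    # ASSUMPTION: M is the set of query terms that occur in d.
    others = [m for m in q if m != t and m in pos]
    if tf == 0 or not others:
        return float(tf)  # no increment without t or without nearby query terms

    # Overall proximity: total TF of query terms in d.
    qterm_tf = sum(len(pos[m]) for m in q if m in pos)

    # ASSUMPTION: MaxIndiP = MaxCollP = number of distinct query terms besides t.
    norm = max(1, sum(1 for m in q if m != t))

    def dist(k, m):
        # ASSUMPTION: distance from position k to the nearest occurrence of m.
        return min(abs(k - p) for p in pos[m])

    # Individual proximity: each query term at its closest approach to t.
    indi_p = sum(
        sigmoid_weight(min(dist(k, m) for k in pos[t]), len(q)) for m in others
    ) / norm

    # Collective proximity: the single occurrence of t with the best neighborhood.
    coll_p = max(
        sum(sigmoid_weight(dist(k, m), len(q)) for m in others) for k in pos[t]
    ) / norm

    return tf + qterm_tf * indi_p * coll_p
```

For the token sequence "urinary tract infection treatment and admission criteria" and the query terms infection, treatment, and admission, this sketch gives the term "infection" a revised TF of about 1.75 (raw TF of 1 plus an increment of 0.75). When the same three terms are spread far apart in a text, the increment shrinks toward zero.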
Empirical Evaluation
Experimental Data
• OHSUMED
  • A popular database of biomedical queries and references
  • 106 queries
  • 348,566 references
  • 16,140 query-reference pairs, each judged as
    • Definitely relevant
    • Possibly relevant
    • Not relevant
Underlying Rankers
Baseline Ranker Enhancers
• Three state-of-the-art techniques that enhance text rankers by term proximity
  • The t-function
    • t() by [Tao & Zhai, 2007]
  • The p-function
    • p() by [Cummins & O’Riordan, 2009]
  • The proximity language model
    • PLM by [Zhao & Yun, 2009]
Evaluation Criteria
• Evaluating how well relevant references are ranked higher for users to access
  • Mean average precision (MAP)
  • Normalized discounted cumulative gain at X (NDCG@X)
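Both criteria can be sketched in a few lines of Python. This is a hedged illustration, not the evaluation code used in the study. It assumes the common NDCG formulation with gain 2^rel - 1 and a log2 position discount, graded labels of 2, 1, and 0 for the three relevance levels, and an average precision that counts only the relevant references present in the ranked list. The slides do not state which variants were used, and all function names here are hypothetical.

```python
import math


def average_precision(ranked_rels):
    """Average precision for one query; ranked_rels holds 0/1 flags in rank order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    # ASSUMPTION: every relevant reference appears somewhere in the list.
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: mean of the per-query average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg_at(gains, x):
    """Discounted cumulative gain over the top x positions."""
    # ASSUMPTION: gain 2^rel - 1 with a log2(rank + 1) discount.
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(gains[:x], start=1)
    )


def ndcg_at(gains, x):
    """NDCG@X: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at(sorted(gains, reverse=True), x)
    return dcg_at(gains, x) / ideal if ideal > 0 else 0.0
```

A ranking that places a definitely relevant reference (2) before a possibly relevant one (1) and a non-relevant one (0) reaches an NDCG@3 of 1.0 under these assumptions, and reversing that order lowers the score.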
Results
Conclusion
• Term proximity may be comprehensively applied to improve various kinds of text rankers
• It is helpful to integrate three types of term proximity
  • Overall proximity
  • Individual proximity
  • Collective proximity
• Term proximity information may be encoded to re-assess the TF of each term