Proximity-based Ranking of Biomedical Texts

Rey-Long Liu* and Yi-Chih Huang
*Dept. of Medical Informatics, Tzu Chi University, Taiwan

Outline
• Research background
• Problem definition
• The proposed approach: PRE
• Empirical evaluation
• Conclusion
A Proximity-based Ranker Enhancer

Research Background

Biomedical Information Need
• Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature
• Retrieving that evidence requires a system that
  • Accepts a natural language query for a biomedical information need, and
  • Ranks relevant texts higher for access or processing

An Example Information Need
• Query: urinary tract infection, criteria for treatment and admission (from OHSUMED)
• A disease as the target concept (i.e., urinary tract infection)
• Two concepts about the scenario of the information need (i.e., treatment and admission)
  • Neither is specific to, or related to, any particular disease

Problem Definition

Goals
• Explore how text rankers may be improved by considering the completeness of query concepts appearing in a nearby area of the text being ranked
• Develop a technique, PRE (Proximity-based Ranker Enhancer), that
  • Measures the contextual completeness of query concepts appearing in a nearby area of the text
  • Serves as a supplement to improve existing rankers

Related Work
• Biomedical text ranking
  • Uses synonyms and considers the diversity of passages, without considering term proximity
• Text ranking
  • Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), without considering term proximity
• Improving ranking by term proximity
  • Term proximity is employed, but contextual completeness is not considered

The Proposed Approach: PRE

System Overview
[Figure: system diagram. Labels: Training Data, Query (q), Text (d), PRE (TF (Term Frequency) Assessment), TF in d, Underlying Ranker (Text Ranker Development in training, Text Ranking in testing), Ranked Texts, User.]
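The overview shows PRE passing a reassessed TF for each term in d to the underlying ranker. As a minimal sketch of that hand-off, the code below assumes BM25 as the underlying ranker (the slides mention BM25 only as an example of a text scoring technique) and makes the TF source pluggable. The names `bm25_score`, `raw_tf`, `tf_fn`, `doc_freq`, `n_docs` and `avg_len` are illustrative assumptions, as are the defaults `k1=1.2` and `b=0.75` and the smoothed IDF form.

```python
import math

def raw_tf(t, tokens, query):
    """Plain term frequency: number of occurrences of t in the text."""
    return float(tokens.count(t))

def bm25_score(query, tokens, doc_freq, n_docs, avg_len,
               tf_fn=raw_tf, k1=1.2, b=0.75):
    """BM25 score of one text for one query.

    tf_fn(t, tokens, query) supplies the TF value; an enhancer such as PRE
    would plug in its reassessed TF here instead of raw_tf (assumption).
    doc_freq maps a term to the number of texts containing it.
    """
    score = 0.0
    # Length normalization component shared by all terms of this text.
    len_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
    for t in set(query):
        tf = tf_fn(t, tokens, query)
        if tf == 0:
            continue
        n_t = doc_freq.get(t, 0)
        # Smoothed IDF variant that never goes negative.
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        score += idf * tf * (k1 + 1.0) / (tf + len_norm)
    return score

# Usage: the same ranker, once with raw TF and once with a hypothetical
# stand-in for a reassessed TF (here simply doubled, for illustration only).
tokens = "urinary tract infection treatment".split()
query = ["infection", "treatment"]
doc_freq = {"infection": 10, "treatment": 40}
base = bm25_score(query, tokens, doc_freq, n_docs=100, avg_len=4.0)
boosted = bm25_score(query, tokens, doc_freq, n_docs=100, avg_len=4.0,
                     tf_fn=lambda t, d, q: 2.0 * raw_tf(t, d, q))
```

Because the BM25 term weight grows with TF, any enhancer that raises the TF of a term raises that term's contribution to the score, which is why a TF-level supplement can work without changing the ranker itself.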

TF Assessment
• Three types of term proximity
  • Overall proximity (QtermTF)
  • Individual proximity (IndiP)
  • Collective proximity (CollP)
• A term t may get a large TF increment in d if
  • Many query terms appear frequently in d,
  • Query terms are individually near t at some places, and
  • Query terms collectively appear at a place near t

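$\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$
• $\mathrm{TFincrement}(t,d,q) = \mathrm{QtermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$
• $\mathrm{QtermTF}(d,q)$ = total TF of query terms in $d$
• $\mathrm{IndiP}(t,d,q) = \sum_{m \in M \setminus \{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m)) \,/\, \mathrm{MaxIndiP}$
  • $\mathrm{Mindist}(x,y)$ = shortest distance between $x$ and $y$ in $d$
  • $\mathrm{SigmoidWeight}(dt) = \dfrac{1}{1 + e^{-((|q|-1) - dt)}}$
• $\mathrm{CollP}(t,d,q) = \max_{k \in K} \Big\{ \sum_{m \in M \setminus \{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m)) \Big\} \,/\, \mathrm{MaxCollP}$, where $K$ is the set of positions at which $t$ appears in $d$
  • $\mathrm{dist}(t,k,m)$ = distance between $t$ (at position $k$) and $m$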
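The formulas above can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the slide does not define M, MaxIndiP or MaxCollP, so the sketch takes M to be the distinct query terms that occur in d, measures distance in token positions, takes dist(t,k,m) to be the distance from position k to the nearest occurrence of m, and normalizes both sums by the number of distinct query terms other than t. The function names are illustrative.

```python
import math
from collections import defaultdict

def sigmoid_weight(dist, q_size):
    """SigmoidWeight(dt) = 1 / (1 + e^{-((|q|-1) - dt)}), computed stably."""
    x = (q_size - 1) - dist
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # underflows to 0.0 for very large distances
    return e / (1.0 + e)

def positions(tokens):
    """Map each term to the list of token positions where it occurs."""
    pos = defaultdict(list)
    for i, tok in enumerate(tokens):
        pos[tok].append(i)
    return pos

def rtf(t, tokens, query):
    """RTF(t,d,q) = TF(t,d) + QtermTF * IndiP * CollP (sketch)."""
    q_terms = set(query)
    q_size = len(q_terms)
    pos = positions(tokens)
    tf = len(pos.get(t, []))
    if tf == 0:
        return 0.0
    # M \ {t}: query terms other than t that occur in d (assumption).
    others = [m for m in q_terms if m != t and m in pos]
    # Assumed normalizer standing in for MaxIndiP and MaxCollP.
    norm = len(q_terms - {t})
    if not others or norm == 0:
        return float(tf)
    # Overall proximity: total TF of query terms in d.
    qterm_tf = sum(len(pos[m]) for m in q_terms if m in pos)
    # Individual proximity: each m is scored at its own closest approach to t.
    indi_p = sum(
        sigmoid_weight(min(abs(k - j) for k in pos[t] for j in pos[m]), q_size)
        for m in others
    ) / norm
    # Collective proximity: best single occurrence k of t, scored against all m.
    coll_p = max(
        sum(sigmoid_weight(min(abs(k - j) for j in pos[m]), q_size)
            for m in others)
        for k in pos[t]
    ) / norm
    return tf + qterm_tf * indi_p * coll_p

# Usage: the same term gets a larger RTF when the other query terms are close.
query = ["urinary", "infection", "treatment"]
near = "urinary infection treatment was given".split()
far = ["urinary"] + ["x"] * 20 + ["infection"] + ["x"] * 20 + ["treatment"]
rtf_near = rtf("infection", near, query)
rtf_far = rtf("infection", far, query)
```

The difference between the two proximity terms is visible in the code: IndiP lets each query term pick its own best occurrence of t, while CollP requires one occurrence of t to be close to all of them at once.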

Empirical Evaluation

Experimental Data
• OHSUMED
  • A popular database of biomedical queries and references
  • 106 queries
  • 348,566 references
  • 16,140 query-reference pairs, each judged as
    • Definitely relevant
    • Possibly relevant
    • Not relevant

Underlying Rankers

Baseline Ranker Enhancers
• Three state-of-the-art techniques that enhance text rankers by term proximity
  • The t-function: t() by [Tao & Zhai, 2007]
  • The p-function: p() by [Cummins & O’Riordan, 2009]
  • The proximity language model: PLM by [Zhao & Yun, 2009]

Evaluation Criteria
• Evaluate how highly relevant references are ranked for users to access
  • Mean average precision (MAP)
  • Normalized discounted cumulative gain at X (NDCG@X)
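Both criteria can be computed from the relevance labels of a ranked list. The sketch below is one common formulation, not necessarily the exact variant used in the evaluation: it assumes graded labels 2, 1, 0 for definitely, possibly and not relevant, treats any label above 0 as relevant for MAP, assumes each ranked list contains all judged references for its query, and uses the gain 2^rel − 1 with a log2 discount for NDCG.

```python
import math

def average_precision(ranked_rels):
    """AP over one ranked list of relevance labels (label > 0 is relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: mean of AP over the ranked lists of all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top k items."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Usage: labels of the references in ranked order, one list per query.
runs = [[2, 0, 1], [0, 1, 2]]
map_score = mean_average_precision(runs)
ndcg_scores = [ndcg_at_k(r, 3) for r in runs]
```

MAP ignores the difference between definitely and possibly relevant references, while NDCG rewards placing the definitely relevant ones first, so the two criteria complement each other on three-level judgments.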

Results

Conclusion

• Term proximity may be comprehensively applied to improve various kinds of text rankers
• It is helpful to integrate three types of term proximity
  • Overall proximity
  • Individual proximity
  • Collective proximity
• Term proximity information may be encoded to reassess the TF of each term
