Question Answering in Biomedicine Student: Andreea Tutos Id: 41064739 Supervisor: Diego Molla
Project outline • Why medical question answering? • Current research • Project methodology • Project outcomes
Why Question Answering? • Thousands of new biological and medical research articles are published worldwide every day • 66% of physicians report that the volume of medical information is unmanageable (Craig et al., 2001) • The main impediment to maximizing the utility of research data: insufficient time
Project outline • Why question answering? • Current research • Project methodology • Project outcomes
What is Question Answering? • The task of automatically finding an answer to a question • Relies on analyzing large collections of documents • Aims to provide short and concise answers rather than a list of relevant documents
Key steps to follow • Select the domain knowledge source • Construct the corpus of questions • Analyze the input question • Classify the question • Construct the search query • Extract the answer
Domain knowledge sources • Reliability of medical information is critical (NetScoring) • MEDLINE - medical repository maintained by the US National Library of Medicine (controlled vocabulary thesaurus MeSH)
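MEDLINE can be queried programmatically through NCBI's E-utilities interface. A minimal sketch of building an ESearch request URL for PubMed (the `retmax` value is an arbitrary illustrative choice, and network access is deliberately left out):

```python
from urllib.parse import urlencode

# Base endpoint of NCBI's ESearch service (part of the E-utilities)
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(query, retmax=10):
    """Build an ESearch URL that returns up to `retmax` PubMed IDs for `query`."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return EUTILS + "?" + urlencode(params)

print(pubmed_search_url("acetaminophen AND fever"))
```

Fetching this URL would return an XML list of PubMed IDs, which a QA system could then resolve to abstracts.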
Key steps to follow • Select the domain knowledge source • Construct the corpus of questions • Analyze the input question • Classify the question • Construct the search query • Extract the answer
Question corpus sources • Question sources we have reviewed in our research: • The Parkhurst Exchange website • The Clinical Questions Collection website • The Journal of Family Practice website
Questions format • Natural language question: "In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?" • PICO format question: "Problem/Population: acute febrile illness / in children Intervention: acetaminophen Comparison: ibuprofen Outcome: reducing fever" (Demner-Fushman and Lin, 2007)
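The PICO decomposition above maps naturally onto a simple record type. This is an illustrative sketch, not part of the project's code:

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """A clinical question decomposed into its PICO elements."""
    population: str    # Problem/Population
    intervention: str  # Intervention
    comparison: str    # Comparison
    outcome: str       # Outcome

# The example question from the slide, in PICO form:
q = PICOQuestion(
    population="children with an acute febrile illness",
    intervention="acetaminophen",
    comparison="ibuprofen",
    outcome="reducing fever",
)
print(q.intervention)  # acetaminophen
```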
Key steps to follow • Select the domain knowledge source • Construct the corpus of questions • Analyze the input question • Classify the question • Construct the search query • Extract the answer
Question classification • The Evidence taxonomy (Ely et al, 2002)
Query analysis • Processes included: • Keyword selection: • extract keywords using parsers such as LTCHUNK • identify named entities with the support of UMLS • Answer pattern generation (different combinations of query terms) (Molla and Vicedo, 2009)
Key steps to follow • Select the domain knowledge source • Construct the corpus of questions • Analyze the input question • Classify the question • Construct the search query • Extract the answer
Answer extraction • Identify relevant sentences that answer the question • Rank the answer candidates (popularity, similarity with the question, answer patterns, answer validation) (Molla and Vicedo, 2009) • Could use the IMRAD (Introduction, Methods, Results and Discussion) structure of biomedical articles (MedQA)
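One of the ranking signals listed above, similarity with the question, can be sketched as a naive word-overlap scorer. This is a deliberately simple stand-in for the techniques surveyed by Molla and Vicedo (2009), not their method:

```python
def overlap_score(question, sentence):
    """Fraction of the question's words that also appear in the sentence."""
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().split())
    return len(q_words & s_words) / len(q_words)

def rank_candidates(question, sentences):
    """Return candidate answer sentences sorted by similarity, best first."""
    return sorted(sentences, key=lambda s: overlap_score(question, s), reverse=True)

question = "what reduces fever in children"
candidates = [
    "The study enrolled 90 patients",
    "Ibuprofen reduces fever faster than acetaminophen in children",
]
print(rank_candidates(question, candidates)[0])
```

A real system would add the other signals from the slide (popularity, answer patterns, answer validation) and could weight sentences by their IMRAD section.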
Search engines and question answering systems • Generic: • Google • Answers.com • OneLook • Medical: • PubMed • MedQA • Google restricted to PubMed
Project outline • Why question answering? • Current research • Project methodology • Project outcomes
Project methodology – Question corpus • We sourced 50 clinical questions and their answers from the Parkhurst Exchange website
Project methodology – Question processing • We defined five levels of processing to be applied to improve search outcomes
Project methodology – Scoring system • We used a scoring system first introduced at the Text REtrieval Conference (TREC), called Mean Reciprocal Rank (MRR) (Voorhees, 2001) • A relevant link returned at position n (n ≤ 10) receives a score of 1/n
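The scoring rule above can be written out directly. A minimal sketch, assuming a rank of `None` when no relevant link appeared in the top 10:

```python
def reciprocal_rank(position, cutoff=10):
    """Score one question: 1/n for a relevant link at rank n <= cutoff, else 0."""
    if position is None or position > cutoff:
        return 0.0
    return 1.0 / position

def mean_reciprocal_rank(positions, cutoff=10):
    """Average the reciprocal ranks over all questions (Voorhees, 2001)."""
    return sum(reciprocal_rank(p, cutoff) for p in positions) / len(positions)

# A relevant link at rank 1, one at rank 4, and one question with no
# relevant link in the top 10:
print(mean_reciprocal_rank([1, 4, None]))  # (1 + 0.25 + 0) / 3
```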
Results – No Intervention questions • No Intervention questions average scores
Results – Intervention questions • Intervention questions average scores
Results – Answer location • Answer location in scientific articles
Project outline • Why question answering? • Current research • Project methodology • Project outcomes
Medical search engines and QA systems conclusions • PubMed obtained similar scores for both categories (0.27 for No Intervention and 0.24 for Intervention questions) • Medical search engines perform roughly equally well on Intervention and No Intervention questions
Generic search engines and QA systems conclusions • Google recorded the best performance for both categories of questions • Both Google and Answers.com scored better on No Intervention questions than on Intervention questions • Non-medical search engines have more difficulty producing answers for scenario-based, complex medical questions
Conclusions • All selected questions are answerable with the current technology • 50% of answers are located in the Abstract section of scientific articles; 25% in the Conclusions section • No Intervention questions are easier to answer than Intervention questions when it comes to generic search technology