UKPMC Supplementary Data

Explore how to efficiently manage and utilize the growing volume of unstructured supplementary data within UKPMC using a semantically-driven search approach. Enhance search capabilities, organize data, and enable contextual mapping with parent articles for better research insights.


Presentation Transcript


  1. UKPMC Supplementary Data Vic Lyte 28th April 2010

  2. Background • UKPMC currently holds 277 GB of supplementary data (SD), and the volume is growing; • Of 1.7M documents, 88,652 have one or more items of SD; • SD consists of additional files that the author has uploaded during the article deposition process and feels add contextual richness to their article; • Individual documents are systematically marked up and tagged, whereas supplementary data is not: it exists in an unstructured form within a directory location attached to a given article; • Text and data-mining initiatives offer cross-aggregation and semantic views on the document corpus but do not extend to supplementary data, owing to its unstructured and granular nature.

  3. Background • There are no plans to manually mark up this additional file resource, owing to the wide range of file formats (n = 290) and the idiosyncratic nature of the data artefacts, which is part of a wider provenance issue; • This presents a challenge in exposing these rich assets and managing their aggregation beyond a direct one-to-one relationship with their parent article; • As this sub-corpus continues to grow, there is benefit in exploring techniques that offer a way to bring this potentially hidden material into an overall semantic search strategy.

  4. Scenario • A researcher conducting a meta-analysis of RCTs related to pain management may want to identify: • what studies have been conducted in this area (✓) • which semantic groupings occur in the document corpus in relation to 'perception from a psychological perspective' (✓) • what questionnaires and associated data have been made available in the corresponding area of inquiry (X) • It is currently not possible to cluster and group across the sub-corpus to answer the last question, because these items sit in the supplementary data layer.

  5. R&D Activity: Complementary approach • Unstructured search approach; • Similar discovery paradigm in other knowledge sectors; • Use Autonomy IDOL to investigate how it can organise and expose SD within context (semantically-driven search); • Proven data-agnostic search and mining capability; • Contextual mapping with parent article(s) and associated data; • Machine-driven taxonomies and clustering; • Automatic metadata generation; • Development of a 'Proof-of-Concept' demonstrator.
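
As a rough illustration of the "contextual mapping with parent article(s)" idea, the sketch below inventories supplementary files by parent article and by file format. It is not the Autonomy IDOL approach described in the slides, only a minimal standard-library sketch. It assumes a hypothetical layout in which each article has its own directory (named by an article identifier) under a common root; the function names `inventory_supplementary` and `format_totals` are invented for this example.

```python
from collections import Counter
from pathlib import Path


def inventory_supplementary(root):
    """Map each article directory under `root` to a count of its file extensions.

    Assumption: one sub-directory per parent article, named by its identifier.
    """
    inventory = {}
    for article_dir in sorted(Path(root).iterdir()):
        if not article_dir.is_dir():
            continue
        # Count every file below the article directory by lowercase extension.
        formats = Counter(
            path.suffix.lower() or "(none)"
            for path in article_dir.rglob("*")
            if path.is_file()
        )
        if formats:  # skip articles that have no supplementary files
            inventory[article_dir.name] = formats
    return inventory


def format_totals(inventory):
    """Aggregate extension counts across all articles in the inventory."""
    totals = Counter()
    for formats in inventory.values():
        totals.update(formats)
    return totals
```

Calling `inventory_supplementary("/path/to/sd_root")` on such a layout would give a per-article view of what has been deposited, and `format_totals` would give a corpus-wide count of file formats, which is the kind of overview needed before deciding how to index the files.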
