
Publications

Challenges and Solutions for Enabling Facebook-like Graph-search on Small and Macro-molecular Structural Data. Talapady N Bhat*, J T Elliott, C E Campbell, U R Kattner, S R Boger, A Plant, Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg MD 20899, USA.


Presentation Transcript


Historic Perspective of Developing Languages
• Word-based approach
  • Words with specific meanings are established for every concept by a top-down method.
  • One word, one meaning; if a word is not adequate to specify a concept, a sentence or paragraph is used.
    • Light; light of color red.
    • We used a laser light source in a microscope. Its color was red.
  • Evaluating the meaning of a concept can be complicated because it involves evaluating several words.
• Root-based approach: Indo-European languages (Sanskrit and Latin)
  • A root-, word-, and then block-based approach (multi-layered, on-demand, self-evolving, sliding-scale) is used to construct a word that defines a concept more accurately using discriminating, high-value terms.
  • Short, highly re-used roots are established by a top-down method.
    • Y(uj) (join), O (creator, God, brain), Ga (motion, initiation).
  • Re-usable, best-practice words are generated by combining roots on demand to form new high-value concepts.
    • Yoga (coordinating brain and muscle motion)
    • Geno (genos)-cide (Latin, English): race-based killing
    • Bio-technology.
  • Words are combined into re-usable, best-practice, sliding-scale blocks by an on-demand method, so that succinct sentences can be constructed for new concepts using discriminating features to build custom use-cases.
    • Yogesh, Yogendra.
• In a root-based approach
  • The roots needed to evaluate the meaning of a concept in a sentence are bundled together to form a single term.
  • This method also allows the formation of on-demand concepts with succinct meaning from roots.
    • Red-light, red-laser-light

CONCLUSIONS
• A user-friendly automated method called Chem-BLAST (Chemical Block Layered Alignment of Sub-structure Technique) has been developed and implemented to organize, compare, search, and share information on ligands in the PDB and PubChem.
• http://www.rcsb.org/pdb/explore/externalReferences.do?structureId=3GGT
• Recently, the method has been extended to construct graphs of text-based data.

OBJECTIVE
In technological applications, for instance in drug discovery, information from many documents needs to be collated. Such a query is invariably recursive or graph-based across related sentences and documents. The most commonly used search technique, a Google index on individual documents made up of ad hoc terminology, does not provide such recursive capability. To enable accurate answers across multiple documents on its social network, Facebook CEO Mark Zuckerberg announced in January 2013 plans to develop a graph-search. A critical step in enabling a graph-search is the creation of recursively linked indices on documents using terms that are relevant to use-cases. The huge volume of data and the ad hoc terminology used in documents make it difficult to create technologically relevant recursive indices across related documents using manual annotation or conventional feature-extraction methods. We describe a method that proposes to develop a rule-based, re-usable, use-case-relevant, best-practice, scalable, on-demand vocabulary to describe, integrate, intersect, and share structural information using graph-search. We illustrate its use on data from the PDB and PubChem. We also explain how this concept may be extended to text-based data.

[Figure labels: Re-used terms; Web Service, XML, OWL, URI; Debacle; Database 1. Caption: Success depends on carefully chosen shared vocabulary.]

METHOD
• Identify high-value, re-used, best-practice, short ‘roots’ by searching through documents and ontologies, and publish them.
• Create re-usable, short, context-relevant terms from the ‘roots’ and publish them.
• Establish re-usable, permutable, use-case-relevant, discriminating, short blocks of information from the terms and publish them.
• Generate graphs (RDF, OWL, tree) from terms and blocks (but not ‘roots’) and publish them for others to re-use when documenting data on a discriminating, sliding scale.
  • Cell-cycle-protein (207K)
  • Cell-cycle-process (7K)
  • Cell-in-vivo (91K)
  • Antigen-presenting-cell : Phagocytic-cell : Epithelioid-cell
  • Antigen-presenting-cell : Phagocytic-cell : Macrophage

[Figure label: Database 2]

Results
• http://xpdb.nist.gov/chemblast/pdb.pl (PDB, PubChem)
• http://xpdb.nist.gov/bioroot/bioroot.pl (Roots for Bio-ontologies)
• http://xpdb.nist.gov/bioroot/bioblocks.pl (Blocks for Bio-ontologies)
• http://xpdb.nist.gov/nike/term.pl (Terms from NIST publications)
• http://xpdb.nist.gov/image/cell_image.htm (Cell image)

Publications
• Prasanna, M., Vondrasek, J., Wlodawer, A., & Bhat, T. N. (2005). Application of InChI to curate, index and query 3-D structures. Proteins: Structure, Function, and Bioinformatics, 60, 1-4.
• Prasanna, M. D., Vondrasek, J., Wlodawer, A., Rodriguez, H., & Bhat, T. N. (2006). Chemical compound navigator: a web-based chem-BLAST, chemical taxonomy-based search engine for browsing compounds. Proteins, 63(4), 907-917.
• Bhat, T. N., & Barkley, J. (2008). Development of use case for chemical RDF for AIDS drug discovery. Open Bioinf. J., 2, 20-27.
• Bhat, T. N. (2010). Building chemical ontology for Semantic Web using substructures created by Chem-BLAST. JSWIS, 6(3), 22-37.
• Plant, A. L., Elliott, J. T., & Bhat, T. N. (2011). BMC Bioinformatics, 12, 487.

Contact: Talapady N Bhat, bhat@nist.gov, Ph: 301-975-5448
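
Illustrative sketch (not part of the poster): the METHOD steps of building blocks from terms and generating a tree graph from terms and blocks can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the Chem-BLAST implementation. The function names are hypothetical, the two blocks are copied from the poster's examples, and splitting a term on hyphens is only a stand-in for the poster's 'roots', which may be shorter sub-word units.

```python
from collections import defaultdict

# Blocks copied from the poster: colon-separated chains of hyphenated terms.
BLOCKS = [
    "Antigen-presenting-cell : Phagocytic-cell : Epithelioid-cell",
    "Antigen-presenting-cell : Phagocytic-cell : Macrophage",
]


def split_block(block):
    """Split a block into its ordered terms."""
    return [term.strip() for term in block.split(":")]


def split_term(term):
    """Split a hyphenated term into lowercase parts (assumed stand-in for roots)."""
    return [part.lower() for part in term.split("-")]


def build_tree(blocks):
    """Build parent -> children edges from terms and blocks (roots are not nodes)."""
    children = defaultdict(set)
    for block in blocks:
        terms = split_block(block)
        for parent, child in zip(terms, terms[1:]):
            children[parent].add(child)
    return children


def build_root_index(blocks):
    """Map each hyphen-separated part to the set of terms that contain it."""
    index = defaultdict(set)
    for block in blocks:
        for term in split_block(block):
            for root in split_term(term):
                index[root].add(term)
    return index


def descendants(children, term):
    """Collect every term reachable below `term` by following edges repeatedly."""
    found = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


tree = build_tree(BLOCKS)
root_index = build_root_index(BLOCKS)

print(sorted(descendants(tree, "Antigen-presenting-cell")))
# ['Epithelioid-cell', 'Macrophage', 'Phagocytic-cell']
print(sorted(root_index["cell"]))
# ['Antigen-presenting-cell', 'Epithelioid-cell', 'Phagocytic-cell']
```

Because both blocks share the prefix Antigen-presenting-cell : Phagocytic-cell, the tree merges them under one parent, so a single traversal answers a query that spans both blocks. That is the kind of recursive lookup the OBJECTIVE section says a per-document index cannot provide.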
