
Building Grid-enabled Virtual Screening Service for Drug Discovery

Ying-Ta Wu (Academia Sinica Genomic Research Center) and Hurng-Chun Lee (Academia Sinica Grid Computing Center, ASGC)


Presentation Transcript


  1. Building Grid-enabled Virtual Screening Service for Drug Discovery
     Ying-Ta Wu (1) and Hurng-Chun Lee (2)
     (1) Academia Sinica Genomic Research Center
     (2) Academia Sinica Grid Computing Center (ASGC)

  2. Outline
     • Avian flu drug analysis on the Grid
     • Developing grid-enabled virtual screening service
     • The next large-scale virtual screening on avian flu

  3. The virtual screening
     • Millions of chemical compounds available in laboratories; high-throughput screening at $2/compound is nearly impossible
     • 300,000 chemical compounds: ZINC, chemical combinatorial library
     • Target (PDB): neuraminidase (8 structures)
     • Molecular docking (AutoDock): ~137 CPU years, 600 GB of data
     • Data challenge on EGEE, AuverGrid, TWGrid: ~6 weeks on ~2000 computers
     • Hits sorting and refining
     • In vitro screening of 100 hits

  4. 1st large-scale avian flu virtual screening on the Grid
     • In 2006, a grid-enabled high-throughput screening against the H5N1 virus was performed
     • 300,000 ligands were docked against 8 targets using AutoDock
     • The computing requirement of 137 CPU years was met by a 6-week high-throughput screening (HTS) activity on EGEE, AuverGrid and TWGrid
     • Two computing models were adopted for submitting docking jobs, each addressing a different user need: WISDOM for scalability and DIANE for interactivity
     • The goal was to analyze the efficacy of known drugs against possible neuraminidase mutations
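A quick sanity check of the workload figures quoted above, as a minimal sketch. It assumes one docking per ligand-target pair and a 365.25-day year, neither of which is stated on the slides. Under those assumptions the numbers imply roughly 30 CPU-minutes per docking, and an ideal wall time of about 25 days on 2000 nodes against the 6 weeks actually taken.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
# Assumptions (not from the slides): one docking per ligand-target pair,
# and a year of 365.25 days.

LIGANDS = 300_000
TARGETS = 8
CPU_YEARS = 137
WORKER_NODES = 2000
CAMPAIGN_DAYS = 6 * 7  # "6 weeks"

dockings = LIGANDS * TARGETS
cpu_days = CPU_YEARS * 365.25

# Average CPU time spent on a single docking.
minutes_per_docking = cpu_days * 24 * 60 / dockings

# Wall time if all nodes were busy all the time.
ideal_wall_days = cpu_days / WORKER_NODES

# Share of the campaign during which the nodes were effectively busy.
effective_utilisation = ideal_wall_days / CAMPAIGN_DAYS

print(f"dockings:              {dockings:,}")
print(f"CPU minutes / docking: {minutes_per_docking:.1f}")
print(f"ideal wall time:       {ideal_wall_days:.1f} days")
print(f"effective utilisation: {effective_utilisation:.0%}")
```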

  5. High-throughput screening using WISDOM
     • WISDOM: Wide In-Silico Docking On Malaria
     • The platform was successfully tested in a previous challenge
     • A workflow for Grid job handling: automatic job submission, status check and report, error recovery
     • Push-model job scheduling + batch-mode job handling
     [Diagram labels: User interface; Compounds database; Compounds list; Compounds sublists; Parameter settings; Target structures; Software; Site1 and Site2, each with a Computing Element and a Storage Element; Results; Statistics]
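The diagram's "compounds list" to "compounds sublists" step can be sketched as below. This is an illustration of batch-style job preparation only, not WISDOM's actual code: the function names, the job dictionary layout and the sublist size are all hypothetical.

```python
from itertools import product


def split_into_sublists(compounds, sublist_size):
    """Cut the full compounds list into consecutive sublists."""
    if sublist_size < 1:
        raise ValueError("sublist_size must be at least 1")
    return [
        compounds[i:i + sublist_size]
        for i in range(0, len(compounds), sublist_size)
    ]


def build_jobs(compounds, targets, sublist_size):
    """Create one batch job per (target structure, compounds sublist) pair.

    The dictionary layout is hypothetical; a real submission would also
    carry parameter settings and software references.
    """
    sublists = split_into_sublists(compounds, sublist_size)
    return [
        {"job_id": n, "target": target, "compounds": sublist}
        for n, (target, sublist) in enumerate(product(targets, sublists))
    ]


if __name__ == "__main__":
    ligands = [f"ligand_{i}" for i in range(10)]
    for job in build_jobs(ligands, ["T01", "T06"], sublist_size=4):
        print(job["job_id"], job["target"], len(job["compounds"]))
```

In a push model, every job in this list would be handed to the Grid scheduler up front; the sublist size then fixes the job granularity before anything runs.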

  6. Interactive screening using DIANE + GANGA
     • DIANE: Distributed Analysis Environment
     • An overlay system on top of a variety of distributed computing environments, taking care of all synchronization, communication and workflow management details on behalf of the application
     • A lightweight framework for parallel scientific applications in the master-worker model
     • Pull-model job scheduling + interactive-mode job handling with a flexible failure recovery mechanism
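The pull model with failure recovery can be illustrated with a small in-process sketch. It uses Python threads and a shared queue in place of Grid worker agents, and does not use DIANE's API; `run_pull_model`, `work_fn` and the retry policy are assumptions made for the example.

```python
import queue
import threading


def run_pull_model(tasks, work_fn, n_workers=4, max_retries=2):
    """Run tasks with workers that pull from a shared queue.

    Returns (results, failed): results maps task -> output, failed maps
    task -> last error message once max_retries re-queues are used up.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 0))  # (task, attempts so far)

    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # The worker asks for work; the master never pushes it.
                task, attempts = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull
            try:
                output = work_fn(task)
            except Exception as exc:
                if attempts < max_retries:
                    # Failure recovery: hand the task back to the queue.
                    pending.put((task, attempts + 1))
                else:
                    with lock:
                        failed[task] = str(exc)
            else:
                with lock:
                    results[task] = output

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, failed
```

Because each worker fetches its next task only when it is free, fast workers naturally process more tasks than slow ones, and a failed task costs a single re-queue rather than a whole batch job.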

  7. The grid statistics
     • ~600 GBytes of docking results were produced and archived on the Grid
     • ~83% of jobs were reported as successfully completed by the Grid Logging and Bookkeeping service, but only ~70% of the results were actually found on the Grid storage elements
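The gap between the two percentages suggests cross-checking job status against what is actually stored. The sketch below is hypothetical: the job identifiers and the idea of keying stored results by job ID are assumptions, and so is reading both percentages as fractions of the same set of submitted jobs.

```python
def find_missing_results(completed_job_ids, stored_result_ids):
    """Return jobs reported as completed that left no result in storage."""
    return sorted(set(completed_job_ids) - set(stored_result_ids))


# If both slide figures refer to the same set of submitted jobs (an
# assumption), this is the share of "successful" jobs that left output.
fraction_with_output = 0.70 / 0.83

if __name__ == "__main__":
    reported_done = ["job-001", "job-002", "job-003"]
    found_on_storage = ["job-001", "job-003"]
    print("resubmit:", find_missing_results(reported_done, found_on_storage))
    print(f"completed jobs with output: {fraction_with_output:.0%}")
```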

  8. Enrichment of primary in silico HTS
     • Original type: T06
     • 2qwe: zanamivir (known drug)
     • Five out of six known effective compounds can be identified in the first 15% of the ranking
     • E = (5/6)/15% = 5.5 (< 1 in most cases)
     [Figure labels: 15% cutoff; ranking positions GNA 2.4%, 4AM 13%, DAN 35%; pKd=5.3, pKd=7.3, pKd=7.5; Ki=4uM, Ki=150nM, Ki=1nM; GNA = zanamivir]
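The enrichment formula on this slide is the fraction of known actives recovered divided by the fraction of the ranked library inspected. A minimal sketch follows; the function and parameter names are illustrative. With the slide's inputs the expression evaluates to roughly 5.56, which the slide quotes as 5.5.

```python
def enrichment_factor(actives_found, actives_total, fraction_screened):
    """Fraction of known actives recovered, relative to the fraction of
    the ranked library that was inspected. A value of 1 is what a random
    ranking would give on average."""
    if actives_total <= 0:
        raise ValueError("actives_total must be positive")
    if not 0 < fraction_screened <= 1:
        raise ValueError("fraction_screened must be in (0, 1]")
    return (actives_found / actives_total) / fraction_screened


if __name__ == "__main__":
    # Slide values: 5 of 6 known compounds within the top 15%.
    print(f"E = {enrichment_factor(5, 6, 0.15):.2f}")
```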

  9. Mutation effects
     • Top 15% by HTS: 300,000 x 15% = 45,000
     • Top 5% by clustering: 45,000 x 5% = 2,250
     • AutoDock re-rank
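The funnel arithmetic can be reproduced as below. `top_fraction` shows a score-based cut for the first stage only, assuming lower docking energy is better. The clustering stage is counted but not modelled, since the slides do not describe the clustering method.

```python
def funnel_counts(n_compounds, stage_percents):
    """Number of compounds remaining after each successive percentage cut."""
    counts = [n_compounds]
    for percent in stage_percents:
        counts.append(counts[-1] * percent // 100)
    return counts


def top_fraction(scored, percent):
    """Keep the best-scoring `percent` of (compound, energy) pairs.

    Assumes a lower (more negative) docking energy is better.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    keep = len(scored) * percent // 100
    return sorted(scored, key=lambda pair: pair[1])[:keep]


if __name__ == "__main__":
    # 300,000 compounds, top 15% by HTS, then top 5% by clustering.
    print(funnel_counts(300_000, [15, 5]))
```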

  10. Effects of point mutation
      • Most known effective inhibitors lose binding affinity for a mutated target
      • 2qwe: 2.4% → 11.5%
      • 1f8c: 13% → 55%
      [Figure labels: T01; E119A; DNA; 4AM 55%; 11.5%]

  11. Q: How can we deliver a user-friendly service that integrates high-throughput virtual screening and data analysis?

  12. Lessons learnt
      • The grid-enabled virtual screening application does benefit drug analysis in terms of money and time
        - 137 CPU years in 6 weeks using about 2000 grid worker nodes
        - Primary HTS helps filter out 85% of compounds
        - Global enrichment rate: 5.5
      • Mutations do affect the efficiency of the known drugs and potential hits
      • Gaps between the current system and a real end-user application
        - Lack of a well-annotated ligand database
        - Lack of a friendly user interface to run the virtual screening process on the Grid
        - Lack of an easy-to-use interface to access the produced docking results for further analysis
        - Lack of an automatic refinement pipeline

  13. GUI - first step to real end-user application
      • Interactive analysis
      • Job history
      • Progress monitoring

  14. Refinement pipeline

  15. Common database
      • Chemical properties to better annotate the compounds
      • Results essential for further analysis are extracted and stored in a result database
      • Database access through AMGA
        - for access control
        - for data replication

  16. Proposal of the 2nd data challenge
      • Proposed plan:
        - Testing phase: May, 2007
        - Official launch: June, 2007
      • Biology goals
        - Further analysis on the effect of the mutations
        - Further analysis on the open conformation of NA
      • Grid goals
        - Improving the service usability
        - Enabling the refinement pipeline
        - Reducing researchers' effort in data analysis

  17. Credit
      • Docking workflow preparation
        - Contact point: Y.T. Wu
        - E. Rovida, P. D'Ursi, N. Jacq
      • Grid resource management
        - Contact point: J. Salzemann
        - TWGrid: H.C. Lee, H. Y. Chen
        - AuverGrid: E. Medernach
        - EGEE: Y. Legré
      • Platform deployment on the Grid
        - Contact point: H.C. Lee, J. Salzemann
        - M. Reichstadt, N. Jacq
      • Users (deputy)
        - J. Salzemann (N. Jacq)
        - M. Reichstadt (E. Medernach)
        - L. Y. Ho (H. C. Lee)
        - I. Merelli, C. Arlandini (L. Milanesi)
        - J. Montagnat (T. Glatard)
        - R. Mollon (C. Blanchet)
        - I. Blanque (D. Segrelles)
        - D. Garcia

  18. Mini Workshop
      • Tomorrow afternoon from 2 pm in Conference Room 4
      • Discussions on the collaboration issues of the 2nd avian flu data challenge
      Your participation is welcome!

  19. DIANE Directory Service
      • Improves the scalability of the DIANE framework
      • The Directory Service is a server containing a list of all the Masters
      • Each Master registers itself with the Directory Service
      • The Workers obtain a Master through the Directory Service
      • The Directory Service has an algorithm for load balancing of the Workers and prioritization of the Masters
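The register-and-lookup behaviour described above can be sketched as a small in-memory registry. The slides do not specify the balancing algorithm, so the policy used here (send each new Worker to the Master with the lowest workers-to-priority ratio) is a hypothetical stand-in, as are all class and method names.

```python
from dataclasses import dataclass


@dataclass
class MasterEntry:
    address: str
    priority: int = 1   # higher priority attracts more workers
    workers: int = 0    # workers assigned so far


class DirectoryService:
    """In-memory registry of masters (illustrative, not DIANE's API)."""

    def __init__(self):
        self._masters = {}

    def register_master(self, address, priority=1):
        """Called by a master to announce itself."""
        if priority < 1:
            raise ValueError("priority must be at least 1")
        self._masters[address] = MasterEntry(address, priority)

    def assign_worker(self):
        """Called by a worker; returns the address of the master to join.

        Hypothetical policy: pick the master with the lowest
        workers/priority ratio, breaking ties by address.
        """
        if not self._masters:
            raise LookupError("no master registered")
        best = min(
            self._masters.values(),
            key=lambda m: (m.workers / m.priority, m.address),
        )
        best.workers += 1
        return best.address

    def worker_counts(self):
        return {m.address: m.workers for m in self._masters.values()}
```

With two Masters registered at priorities 1 and 3, eight Workers end up split 2 to 6 under this policy.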
