1 / 57

yucca: an efficient algorithm for small molecule docking

structure of a molecule, independent of the coordinate system ... Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) ...

PamelaLan
Download Presentation

yucca: an efficient algorithm for small molecule docking

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Vicky Choi Assistant Professor Department of Computer Science Virginia Tech

    Yucca: An Efficient Algorithm for Small Molecule Docking

    2. Outline

    Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

    3. Molecular Docking

    Computational prediction of the structure of receptor-ligand complexes Receptor: Protein Ligand: Protein or Small Molecule Protein-Protein Docking Protein-Small molecule Docking

    4. Protein-Protein Docking

    Barnase Barstar 1BRS : Barnase + Barstar In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the

    5. Protein-Small Molecule Docking

    Receptor: Adipocyte lipid-binding protein PDB code: 1LIC Ligand: Hexadecanesulfonic acid

    6. Why is Docking Important?

    Molecular interactions are central to most of biological processes The number of known molecular structures continues to grow, computational analysis of molecular interactions is increasingly important Computational prediction of molecular interactions is an invaluable tool for structure-based drug design

    7. Example: HIV-protease

    Image adopted from Nature Rev. Drug Discov. 2, 369-378 (2003)

    8. Formulation of Docking Problem

    A search algorithm that finds the docking complex structure measured by the scoring function. A scoring function that can discriminate correct (experimentally observed) docking complex structure from incorrect ones.

    9. Terminology: conformation, configuration, pose

    Conformation: the relative positions of atoms in the 3D structure of a molecule, independent of the coordinate system 2 different conformations of a ligand

    Configuration/placement: the positions of atoms of a molecule after undergoing a rigid transformation (rotation and translation) in a coordinate system 2 different configurations (of same conformation) Terminology: conformation, configuration, pose Pose: a configuration of a conformation of a molecule in a coordinate system 2 different poses of a ligand Terminology: conformation, configuration, pose

    12. Why is Docking Difficult ?

    Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical

    13. Hydrogen bond

    hydrogen bond: a hydrogen is sandwiched between two electron-attracting atoms From : http://www.accessexcellence.org/RC/VL/GG

    14. Why is Docking Difficult ?

    Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical

    15. Types of Docking Problems

    Protein-Protein Docking Bound docking (rigid redocking problem): 6 degrees of freedom: 3 for rotation, 3 for translation Unbound docking : side chain flexibility Protein-Small Molecule Docking Rigid receptor, rigid ligand Rigid receptor, flexible ligand Flexible receptor, flexible ligand

    16. Rigid-Receptor Flexible-Ligand Docking

    Rigid Receptor: (hold fixed) Flexible Ligand: Find a pose of the ligand which is close to its X-ray pose (bound conformation). RMSD: Root-Mean-Square-Distance

    17. Outline

    Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

    18. Available Docking Software

    DOCK (Kuntz et al, 1982, Ewing & Kuntz 2001) FlexX (Rarey et al 1996) Hammerhead (Welch et al 1996) Surflex (Jain 2003) SLIDE (Kuhn et al 2002) AutoDock (Olson et al 1990, Morris et al 1998) ICM (Abagyan et al 1994) MCDock (Liu & Wang 1999) GOLD (Jones et al 1997) GemDock (Yang & Chen 2004) FRED (McGann et al 2002) Glide (Friesner et al 2004) Yucca (Choi 2005)

    19. Docking Algorithms

    Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca

    Incremental construction (FlexX & DOCK) 0. Fragmentation: 2. Anchor fragment placement 3. Incremental addition of other fragments a set of preferred torsion angles (<13) branch-and-bound heuristic 1. Base (Anchor) fragment selection: - specificity - placeability

    21. Docking Algorithms

    Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca

    22. Types of Scoring Functions

    Force Field-Based: use non-bonded energies of force fields (e.g AMBER and CHARMM) Empirical-Based: derive from a set of protein-ligand complexes with measured binding affinity Knowledge-Based: statistical atom pair potentials derived from structural databases (use Boltzmann law)

    23. Grid: precompute scoring function

    Most of docking algorithms are capable of dealing with different (additive) scoring functions

    24. Scoring Functions Comparison

    Comparative Evaluation of 11 Scoring Functions for molecular Docking by R. Wang, Y. Lu & S. Wang, J. Med Chem, 2003

    25. Outline

    Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

    26. Comparative Study

    Benchmark: 100 protein-ligand complexes Diversity: molecular weight, number of rotatable bonds, volume of binding site cavity, polar surface area of the ligands 8 docking algorithms: Dock, FlexX, FRED, Glide, GOLD, Slide, Surflex, QXP Comparative Evaluation of Eight Docking Tools for Docking and Virtual Screening Accuracy by E. Kellenberger, J. Rodrigo, P. Muller,D. Rognan Proteins: Structure, Function, and Bioinformatics (2004)

    27. Docking & ranking accuracy

    Docking accuracy: Among the 30 top-scored poses, the smallest RMSD < 2A Ranking accuracy: The top-scored poses RMSD<2A

    28. Docking Accuracy

    Glide, GOLD, Surflex, QXP : > 80% FlexX: 66% FRED: 62% DOCK, SLIDE: ~50% Yucca: 76% Remark: QXP used some information of the bound-conformation.

    29. Ranking Accuracy

    Glide, GOLD, Surflex, FlexX: 50-55% DOCK, FRED, Slide, QXP : <40% Yucca: 45%

    30. Speed

    Average CPU time (seconds) on a 270MHz SGI R12K processor Running IRIX6.5: FRED 18 DOCK 46 FlexX 67 QXP 108 SLIDE 118 Surflex 135 GOLD 137 GLIDE 234 Yucca: average 4 seconds on a Pentium IV (3.0GHz) computer

    31. Outline

    Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm Multiconformer docking Our Scoring Function Local Improvement

    32. Yucca: Multiconformer docker

    Generate a set of comformers Use OMEGA (OpenEyes Scientific Co.) to generate a set of low-energy conformers Divide-and-Conquer: fragmentation + a set of preferred torsion angles Allow up to maximum 500 conformers for each molecule. Total: 5967 conformers (100-complex benchmark) Average 1.4 seconds per ligand Rigid docking each conformer Coarse sampling ! a set of initial configurations Locally improve each configuration to a local minimum configuration

    33. Rigid docking

    Move to quasi-centroid; Rotate about quasi-centroid by an angle; 3. Locally improve each configuration.

    34. Our Scoring Function

    2 components: Energy Bump Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance

    35. Piecewise Linear Potentials (PLP)

    Atom Types: hydrogen bond donor hydrogen bond acceptor hydrogen bond donor/acceptor nonpolar Interaction Types: H-bond: donor and acceptor replusion : donor-donor or acceptor-acceptor dispersion : other contacts Molecular recognition of the inbibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolution programming. D. K. Gehlhaar, et al. Chemistry & Biology, 1995.

    36. PLP cont.

    A=2.3, B=2.6, C=3.1, D=3.4 E=-2, F=20 Etotal = EH-bond + Erepulsion + Edispersion

    37. Our PLP-based scoring function

    Energy(a,b) = PLP energy (a,b) Example: dist(a,b) = 3.1, energy = -2 Bump(a,b) = 1 if PLP energy(a,b)>0 Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance

    38. Yucca: The Algorithm

    0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configurations; 2. Locally improve each configuration.

    39. Preprocessing: Compute grids

    For each atom type: Energy grid (0.2 A) Bump grid (0.2 A) Energy(a) = ?b 2 Receptor Energy(a,b)

    40. Attractor grid

    Attractor grid (0.8 A): - According to the distance to the protein atoms, find the lowest energy grid point within the local neighborhood.

    41. Bump-free grid

    Bump-free grid (0.2 A): -The nearest bump-free grid point within the neighborhood.

    42. Yucca: The Algorithm

    0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration.

    43. Yucca: Coarse Sampling Step

    Translation : centroid ! quasi-centroid Rotate about qausi-centroid ! initial configuration Quasi-centroid = centroid of the grid points with energy<-2, bump=0 Distance (quasi-centroid, centroid of bound ligand) < 2.5 A Sample around the quasi-centroid (a cube with distance 2 A)

    44. Rotation

    A rotation in R3 can be specified by a rotation angle ? about a rotation axis u represented by unit quaternion. Rotation axes: 20 uniformly distributed points on unit sphere Rotation angle = max{5?/radius(ligand), ?/6} Total initial configurations: 9*20*6=1080

    45. Yucca: Local Improvement Step

    Step 1: Outer Loop lower energy Step 2: Inner Loop resolve collision

    46. Tool: Weighted Least-Squares Superposition

    = WLSS(w, B, C) : ?iwi||?(bi) ci||2 is minimized

    47. Outer Loop: decreasing energy

    For each ligand atom, match it with the lowest energy grid point within its neighborhood by looking up from attractor grid; Apply Least-Squares Superposition;

    48. Collision resolution

    49. Inner Loop: collision resolution

    If an atom is bump free, match it to its original position; If an atom causes bump, match it with the nearest bump-free grid point using bump-free grid; Set a larger weight (proportional to its inverse square distance); Apply weighted least square superposition

    50. Example 1

    Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [2575, 38, 2.31] Outer iteration 1: [-2402, 41, 1.64] Inner loop: [-6706, 27, 1.50] ! [-8468,23,1.27] ! [-10279, 14, 0.97] ! [-11158, 6, 0.82] Outer iteration 2: [-10376, 14, 1.01] Inner loop: [-10956, 11, 0.80] ! [-10951, 8, 0.63] ! [-10482, 5, 0.57] Outer iteration 3: [-9586, 15, 0.83] Inner loop: [-11140, 3, 0.65]

    51. Example 2

    Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [22133, 82, 5.20] Outer iteration 1: [25597, 101, 4.97] Inner loop: [20871, 87, 5.20] ! [14601, 71, 5.22] ! [7508, 51, 5.18] ! [3638, 38, 5.26] ! [1810, 29, 5.32] Outer iteration 2: [6644, 59, 5.05] Inner loop: [2408, 37, 4.98] ! [-27, 30, 4.88] ! [-1521, 27, 5.03] ! [-2249, 20, 4.97] ! [-2238, 15, 4.89] Outer iteration 3: [-461, 31, 4.80] Inner loop: [-1924, 27, 4.97] ! [-2051, 25, 4.82] ! [-3153, 21, 4.76] ! [-3627, 17, 4.61] ! [-3889, 13, 4.58]

    52. Yucca: The Algorithm

    0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration: Outer Loop - Lower energy Inner Loop Resolving collisions

    53. Our algorithm Yuccas performance

    Average 4 seconds on Pentium IV (3GHz) Docking accuracy : 76% Ranking accuracy: 45%

    54. 14 Difficult Docking Cases (among 100-complex)

    No more than 2 programs (among the 8 programs) manage to successfully dock the ligand 1LIC:

    55. 14 Difficult Docking Cases (among 100-complex)

    No more than 2 programs (among the 8 programs) manage to successfully dock the ligand Yuccas results: Without including bound conformation in OMEGA : 6 successes, 8 fails With including bound conformation in OMEGA: 12 sucesses, 2 fails

    56. Work in Progress

    Conformer generator Use the available databases to mine the correlated torsion angles Directed tweak to resolve the collisions Better scoring function Flexible receptor docking Virtual Screening

    57. Acknowledgement

    David Bevan (Biochemistry, VT) Gavin Tsai (NCI/NIH) Joel Gillespie (VBI, VT) Bradley Feuston (Merck Research Laboratory)

More Related