590 likes | 1.29k Views
structure of a molecule, independent of the coordinate system ... Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) ...
E N D
1. Vicky Choi Assistant Professor Department of Computer Science Virginia Tech
Yucca: An Efficient Algorithm for Small Molecule Docking
2. Outline
Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm
3. Molecular Docking
Computational prediction of the structure of receptor-ligand complexes Receptor: Protein Ligand: Protein or Small Molecule Protein-Protein Docking Protein-Small molecule Docking
4. Protein-Protein Docking
Barnase Barstar 1BRS : Barnase + Barstar In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the
5. Protein-Small Molecule Docking
Receptor: Adipocyte lipid-binding protein PDB code: 1LIC Ligand: Hexadecanesulfonic acid
6. Why is Docking Important?
Molecular interactions are central to most of biological processes The number of known molecular structures continues to grow, computational analysis of molecular interactions is increasingly important Computational prediction of molecular interactions is an invaluable tool for structure-based drug design
7. Example: HIV-protease
Image adopted from Nature Rev. Drug Discov. 2, 369-378 (2003)
8. Formulation of Docking Problem
A search algorithm that finds the docking complex structure measured by the scoring function. A scoring function that can discriminate correct (experimentally observed) docking complex structure from incorrect ones.
9. Terminology: conformation, configuration, pose
Conformation: the relative positions of atoms in the 3D structure of a molecule, independent of the coordinate system 2 different conformations of a ligand
Configuration/placement: the positions of atoms of a molecule after undergoing a rigid transformation (rotation and translation) in a coordinate system 2 different configurations (of same conformation) Terminology: conformation, configuration, pose Pose: a configuration of a conformation of a molecule in a coordinate system 2 different poses of a ligand Terminology: conformation, configuration, pose12. Why is Docking Difficult ?
Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical
13. Hydrogen bond
hydrogen bond: a hydrogen is sandwiched between two electron-attracting atoms From : http://www.accessexcellence.org/RC/VL/GG
14. Why is Docking Difficult ?
Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical
15. Types of Docking Problems
Protein-Protein Docking Bound docking (rigid redocking problem): 6 degrees of freedom: 3 for rotation, 3 for translation Unbound docking : side chain flexibility Protein-Small Molecule Docking Rigid receptor, rigid ligand Rigid receptor, flexible ligand Flexible receptor, flexible ligand
16. Rigid-Receptor Flexible-Ligand Docking
Rigid Receptor: (hold fixed) Flexible Ligand: Find a pose of the ligand which is close to its X-ray pose (bound conformation). RMSD: Root-Mean-Square-Distance
17. Outline
Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm
18. Available Docking Software
DOCK (Kuntz et al, 1982, Ewing & Kuntz 2001) FlexX (Rarey et al 1996) Hammerhead (Welch et al 1996) Surflex (Jain 2003) SLIDE (Kuhn et al 2002) AutoDock (Olson et al 1990, Morris et al 1998) ICM (Abagyan et al 1994) MCDock (Liu & Wang 1999) GOLD (Jones et al 1997) GemDock (Yang & Chen 2004) FRED (McGann et al 2002) Glide (Friesner et al 2004) Yucca (Choi 2005)
19. Docking Algorithms
Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca
Incremental construction (FlexX & DOCK) 0. Fragmentation: 2. Anchor fragment placement 3. Incremental addition of other fragments a set of preferred torsion angles (<13) branch-and-bound heuristic 1. Base (Anchor) fragment selection: - specificity - placeability
21. Docking Algorithms
Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca
22. Types of Scoring Functions
Force Field-Based: use non-bonded energies of force fields (e.g AMBER and CHARMM) Empirical-Based: derive from a set of protein-ligand complexes with measured binding affinity Knowledge-Based: statistical atom pair potentials derived from structural databases (use Boltzmann law)
23. Grid: precompute scoring function
Most of docking algorithms are capable of dealing with different (additive) scoring functions
24. Scoring Functions Comparison
Comparative Evaluation of 11 Scoring Functions for molecular Docking by R. Wang, Y. Lu & S. Wang, J. Med Chem, 2003
25. Outline
Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm
26. Comparative Study
Benchmark: 100 protein-ligand complexes Diversity: molecular weight, number of rotatable bonds, volume of binding site cavity, polar surface area of the ligands 8 docking algorithms: Dock, FlexX, FRED, Glide, GOLD, Slide, Surflex, QXP Comparative Evaluation of Eight Docking Tools for Docking and Virtual Screening Accuracy by E. Kellenberger, J. Rodrigo, P. Muller,D. Rognan Proteins: Structure, Function, and Bioinformatics (2004)
27. Docking & ranking accuracy
Docking accuracy: Among the 30 top-scored poses, the smallest RMSD < 2A Ranking accuracy: The top-scored poses RMSD<2A
28. Docking Accuracy
Glide, GOLD, Surflex, QXP : > 80% FlexX: 66% FRED: 62% DOCK, SLIDE: ~50% Yucca: 76% Remark: QXP used some information of the bound-conformation.
29. Ranking Accuracy
Glide, GOLD, Surflex, FlexX: 50-55% DOCK, FRED, Slide, QXP : <40% Yucca: 45%
30. Speed
Average CPU time (seconds) on a 270MHz SGI R12K processor Running IRIX6.5: FRED 18 DOCK 46 FlexX 67 QXP 108 SLIDE 118 Surflex 135 GOLD 137 GLIDE 234 Yucca: average 4 seconds on a Pentium IV (3.0GHz) computer
31. Outline
Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm Multiconformer docking Our Scoring Function Local Improvement
32. Yucca: Multiconformer docker
Generate a set of comformers Use OMEGA (OpenEyes Scientific Co.) to generate a set of low-energy conformers Divide-and-Conquer: fragmentation + a set of preferred torsion angles Allow up to maximum 500 conformers for each molecule. Total: 5967 conformers (100-complex benchmark) Average 1.4 seconds per ligand Rigid docking each conformer Coarse sampling ! a set of initial configurations Locally improve each configuration to a local minimum configuration
33. Rigid docking
Move to quasi-centroid; Rotate about quasi-centroid by an angle; 3. Locally improve each configuration.
34. Our Scoring Function
2 components: Energy Bump Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance
35. Piecewise Linear Potentials (PLP)
Atom Types: hydrogen bond donor hydrogen bond acceptor hydrogen bond donor/acceptor nonpolar Interaction Types: H-bond: donor and acceptor replusion : donor-donor or acceptor-acceptor dispersion : other contacts Molecular recognition of the inbibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolution programming. D. K. Gehlhaar, et al. Chemistry & Biology, 1995.
36. PLP cont.
A=2.3, B=2.6, C=3.1, D=3.4 E=-2, F=20 Etotal = EH-bond + Erepulsion + Edispersion
37. Our PLP-based scoring function
Energy(a,b) = PLP energy (a,b) Example: dist(a,b) = 3.1, energy = -2 Bump(a,b) = 1 if PLP energy(a,b)>0 Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance
38. Yucca: The Algorithm
0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configurations; 2. Locally improve each configuration.
39. Preprocessing: Compute grids
For each atom type: Energy grid (0.2 A) Bump grid (0.2 A) Energy(a) = ?b 2 Receptor Energy(a,b)
40. Attractor grid
Attractor grid (0.8 A): - According to the distance to the protein atoms, find the lowest energy grid point within the local neighborhood.
41. Bump-free grid
Bump-free grid (0.2 A): -The nearest bump-free grid point within the neighborhood.
42. Yucca: The Algorithm
0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration.
43. Yucca: Coarse Sampling Step
Translation : centroid ! quasi-centroid Rotate about qausi-centroid ! initial configuration Quasi-centroid = centroid of the grid points with energy<-2, bump=0 Distance (quasi-centroid, centroid of bound ligand) < 2.5 A Sample around the quasi-centroid (a cube with distance 2 A)
44. Rotation
A rotation in R3 can be specified by a rotation angle ? about a rotation axis u represented by unit quaternion. Rotation axes: 20 uniformly distributed points on unit sphere Rotation angle = max{5?/radius(ligand), ?/6} Total initial configurations: 9*20*6=1080
45. Yucca: Local Improvement Step
Step 1: Outer Loop lower energy Step 2: Inner Loop resolve collision
46. Tool: Weighted Least-Squares Superposition
= WLSS(w, B, C) : ?iwi||?(bi) ci||2 is minimized
47. Outer Loop: decreasing energy
For each ligand atom, match it with the lowest energy grid point within its neighborhood by looking up from attractor grid; Apply Least-Squares Superposition;
48. Collision resolution
49. Inner Loop: collision resolution
If an atom is bump free, match it to its original position; If an atom causes bump, match it with the nearest bump-free grid point using bump-free grid; Set a larger weight (proportional to its inverse square distance); Apply weighted least square superposition
50. Example 1
Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [2575, 38, 2.31] Outer iteration 1: [-2402, 41, 1.64] Inner loop: [-6706, 27, 1.50] ! [-8468,23,1.27] ! [-10279, 14, 0.97] ! [-11158, 6, 0.82] Outer iteration 2: [-10376, 14, 1.01] Inner loop: [-10956, 11, 0.80] ! [-10951, 8, 0.63] ! [-10482, 5, 0.57] Outer iteration 3: [-9586, 15, 0.83] Inner loop: [-11140, 3, 0.65]
51. Example 2
Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [22133, 82, 5.20] Outer iteration 1: [25597, 101, 4.97] Inner loop: [20871, 87, 5.20] ! [14601, 71, 5.22] ! [7508, 51, 5.18] ! [3638, 38, 5.26] ! [1810, 29, 5.32] Outer iteration 2: [6644, 59, 5.05] Inner loop: [2408, 37, 4.98] ! [-27, 30, 4.88] ! [-1521, 27, 5.03] ! [-2249, 20, 4.97] ! [-2238, 15, 4.89] Outer iteration 3: [-461, 31, 4.80] Inner loop: [-1924, 27, 4.97] ! [-2051, 25, 4.82] ! [-3153, 21, 4.76] ! [-3627, 17, 4.61] ! [-3889, 13, 4.58]
52. Yucca: The Algorithm
0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration: Outer Loop - Lower energy Inner Loop Resolving collisions
53. Our algorithm Yuccas performance
Average 4 seconds on Pentium IV (3GHz) Docking accuracy : 76% Ranking accuracy: 45%
54. 14 Difficult Docking Cases (among 100-complex)
No more than 2 programs (among the 8 programs) manage to successfully dock the ligand 1LIC:
55. 14 Difficult Docking Cases (among 100-complex)
No more than 2 programs (among the 8 programs) manage to successfully dock the ligand Yuccas results: Without including bound conformation in OMEGA : 6 successes, 8 fails With including bound conformation in OMEGA: 12 sucesses, 2 fails
56. Work in Progress
Conformer generator Use the available databases to mine the correlated torsion angles Directed tweak to resolve the collisions Better scoring function Flexible receptor docking Virtual Screening
57. Acknowledgement
David Bevan (Biochemistry, VT) Gavin Tsai (NCI/NIH) Joel Gillespie (VBI, VT) Bradley Feuston (Merck Research Laboratory)