Advanced protein-ligand scoring function based on semiempirical quantum chemical method
This paper describes the excellent performance of a newly developed scoring function (SF) based on the semiempirical QM (SQM) PM6-D3H4X method combined with the conductor-like screening model (COSMO) of implicit solvation. The SQM/COSMO SF, Amber/GB, and nine widely used SFs were evaluated in terms of ranking power on the HSP90 protein with 72 biologically active compounds and 4469 structurally similar decoys. Among conventional SFs, the highest early enrichment (measured by EF1) and the highest overall enrichment (measured by AUC%) obtained using single-scoring-function ranking were found for Glide SP and Gold-ASP, respectively (EF1 and AUC% of 7 and 75 % for Glide SP, and of 3 and 76 % for Gold-ASP). The performance of the other standard SFs was not satisfactory, in most cases even falling below random values. The SQM/COSMO SF, with protein-ligand (P-L) structures optimised at the advanced Amber level, produced a dramatic increase in enrichment (EF1 of 47 and AUC% of 98 %), almost reaching the best possible receiver operating characteristic (ROC) curve. The best SQM setup thus places about seven times more active compounds into the selected dataset than the best standard SF.
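The two enrichment measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that a lower (more negative) score means a better-ranked compound and that EF1 is the enrichment factor in the top 1 % of the ranked list; the function names `enrichment_factor` and `roc_auc` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor in the top `fraction` of the ranked list.

    Assumption: lower score = better rank. EF is the hit rate among the
    selected compounds divided by the hit rate in the whole set.
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n_total * fraction))
    # Rank compounds from best (lowest) to worst (highest) score.
    order = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(1 for i in order[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n_total)


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the fraction of active/decoy
    pairs in which the active has the better (lower) score; ties count 0.5.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy data (made up): ten compounds, two of them active.
toy_scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
toy_active = [True, False, True, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(toy_scores, toy_active, fraction=0.2)
auc = roc_auc(toy_scores, toy_active)
```

Under these definitions an EF of 1 and an AUC of 50 % correspond to random ranking. For a set of 72 actives among 4541 compounds, the largest EF1 attainable would be roughly 4541/72, or about 63, which gives a sense of scale for the reported values.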
- Keywords
- docking, enrichment, non-covalent interactions, semiempirical quantum mechanics-based scoring function, virtual screening
- MeSH
- quantum theory * MeSH
- ligands MeSH
- models, molecular MeSH
- HSP90 heat-shock proteins, antagonists & inhibitors, metabolism MeSH
- ROC curve MeSH
- thermodynamics MeSH
- Publication type
- journal article MeSH
- research support (grant-funded work) MeSH
- Substance names
- ligands MeSH
- HSP90 heat-shock proteins MeSH
Although covalent interactions determine the primary structure of a molecule, noncovalent interactions are responsible for its tertiary and quaternary structure and create the fascinating world of the 3D architectures of biomacromolecules. For example, the double-helical structure of DNA is of fundamental importance for its function: it allows DNA to store and transfer genetic information. To fulfill this role, the structure must be rigid enough to maintain the double helix with the complementary bases properly positioned, yet floppy enough to allow the helix to open. Very strong covalent interactions cannot fulfill both of these criteria, but noncovalent interactions, which are about 2 orders of magnitude weaker, can. This Account highlights recent advances in the design of novel wave function theory (WFT) methods applicable to noncovalent complexes ranging in size from fewer than 100 atoms, for which highly accurate ab initio methods are available, up to extended ones (several thousand atoms), which are the domain of semiempirical QM (SQM) methods. Accurate interaction energies for noncovalent complexes are generated by the coupled-cluster technique, which treats single- and double-electron excitations iteratively and triple-electron excitations perturbatively, with a complete basis set description (CCSD(T)/CBS). The procedure provides interaction energies with high accuracy (error less than 1 kcal/mol). Because the method is computationally demanding, its application is limited to complexes smaller than 30 atoms. But researchers would also like to use computational methods to determine these interaction energies accurately for larger biological and nanoscale structures. Standard QM methods such as MP2, MP3, CCSD, or DFT fail to describe the various types of noncovalent systems (H-bonded, stacked, dispersion-controlled, etc.) with comparable accuracy.
Therefore, novel methods are needed that have been parametrized toward noncovalent interactions, and existing benchmark data sets represent an important tool for the development of new methods providing reliable characteristics of noncovalent clusters. Our laboratory developed the first suitable data set of CCSD(T)/CBS interaction energies and geometries of various noncovalent complexes, called S22. Since its publication in 2006, it has frequently been applied in the parametrization and/or verification of various wave function and density functional techniques. During the intense use of this data set, several inconsistencies emerged, such as the insufficient accuracy of the CCSD(T) correction term and the unbalanced character of the set, which triggered the introduction of a new, broader, and more accurate data set called S66. It contains not only 66 CCSD(T)/CBS interaction energies determined in the equilibrium geometries but also 1056 interaction energies calculated at the same level for nonequilibrium geometries. The S22 and S66 data sets have been used for the verification of various WFT methods, and the lowest RMSE values (S66, in kcal/mol) were found for the recently introduced SCS-MI-CCSD/CBS (0.08), MP2.5/CBS (0.16), MP2.X/6-31G* (0.27), and SCS-MI-MP2/CBS (0.38) methods. Because of their computational economy, the MP2.5 and MP2.X/6-31G* methods can be recommended for highly accurate calculations of large complexes with up to 100 atoms. The evaluation of SQM methods was based only on the S22 data set, and because some of these methods have been parametrized toward the same data set, the respective results should be taken with caution. For really extended complexes such as protein-ligand systems, only the SQM methods are applicable. After corrections for the dispersion energy and H-bonding are added, several of these methods exhibit surprisingly low RMSE values (even below 0.5 kcal/mol).
Among the various SQM methods, PM6-DH2 can be recommended because of its computational efficiency and because it can be used for geometry optimization (which is not the case for other SQM methods). PM6-DH2 is the basis of our novel scoring function used in in silico drug design.
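The RMSE figures quoted above compare a method's interaction energies with reference values over a benchmark set. The following is a minimal sketch of that comparison, assuming both sets of energies are given in kcal/mol and listed in the same order of complexes; the function name `rmse` and the numbers are made up for illustration and are not entries from S22 or S66.

```python
import math
from typing import Sequence


def rmse(computed: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between computed and reference energies.

    Both sequences must list the same complexes in the same order and
    use the same unit (kcal/mol is assumed here).
    """
    if len(computed) != len(reference) or len(computed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(c - r) ** 2 for c, r in zip(computed, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical interaction energies (kcal/mol) for two complexes.
reference_energies = [-5.0, -2.5]   # stand-in for benchmark values
method_energies = [-4.7, -2.9]      # stand-in for a tested method
error = rmse(method_energies, reference_energies)
```

Because RMSE squares each deviation before averaging, a single badly described complex weighs more heavily than it would in a mean absolute error, which is one reason a balanced benchmark set matters.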