Sachem: a chemical cartridge for high-performance substructure search

J Cheminform. 2018 May 23;10(1):27. [Epub 2018 May 23]

Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Medium: electronic

Document type: journal articles

Persistent link   https://www.medvik.cz/link/pmid29797000

Grant support
LM2015047 Ministerstvo Školství, Mládeže a Tělovýchovy
61388963 Institute of Organic Chemistry and Biochemistry of the CAS (RVO)

Links

PubMed 29797000
PubMed Central PMC5966370
DOI 10.1186/s13321-018-0282-y
PII: 10.1186/s13321-018-0282-y
Knihovny.cz E-resources

BACKGROUND: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size.

RESULTS: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: the first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency.

CONCLUSIONS: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
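
The screen-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy molecule encoding, the feature set (element symbols plus bonded element pairs), the function names, and the brute-force verification are not Sachem's or OrChem's actual fingerprints or algorithms. It only shows how an inverted index over fingerprint features can discard most molecules before the expensive verification step runs.

```python
from collections import defaultdict

# Toy molecule encoding (an assumption of this sketch): (atoms, bonds), where
# atoms is a list of element symbols and bonds is a list of (i, j) index pairs.
# Hydrogens and bond orders are ignored.

def features(mol):
    """Toy fingerprint: element symbols plus sorted bonded element pairs."""
    atoms, bonds = mol
    feats = set(atoms)
    for i, j in bonds:
        feats.add("-".join(sorted((atoms[i], atoms[j]))))
    return feats

def build_index(db):
    """Inverted index: feature -> set of ids of molecules containing it."""
    index = defaultdict(set)
    for mol_id, mol in db.items():
        for feat in features(mol):
            index[feat].add(mol_id)
    return index

def screen(index, query):
    """Keep only molecules that contain every feature of the query."""
    postings = [index.get(feat, set()) for feat in features(query)]
    return set.intersection(*postings) if postings else set()

def is_substructure(query, target):
    """Verification: brute-force backtracking subgraph matching."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    t_adj = {frozenset(b) for b in t_bonds}

    def extend(mapping):
        k = len(mapping)  # index of the next query atom to place
        if k == len(q_atoms):
            return True
        for cand in range(len(t_atoms)):
            if cand in mapping or t_atoms[cand] != q_atoms[k]:
                continue
            ok = True
            for i, j in q_bonds:
                other = j if i == k else i if j == k else None
                # Bonds to already-mapped query atoms must exist in the target.
                if other is not None and other < k:
                    if frozenset((mapping[other], cand)) not in t_adj:
                        ok = False
                        break
            if ok and extend(mapping + [cand]):
                return True
        return False

    return extend([])

def substructure_search(db, index, query):
    """Return (sorted hits, number of candidates that passed screening)."""
    candidates = screen(index, query)
    hits = sorted(m for m in candidates if is_substructure(query, db[m]))
    return hits, len(candidates)

# Heavy-atom skeletons of a few small molecules.
db = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
    "ethylamine": (["C", "C", "N"], [(0, 1), (1, 2)]),
    "ethylaminomethanol": (["C", "C", "N", "C", "O"],
                           [(0, 1), (1, 2), (2, 3), (3, 4)]),
}
index = build_index(db)
query = (["C", "C", "O"], [(0, 1), (1, 2)])  # C-C-O fragment
result = substructure_search(db, index, query)
print(result)
```

In this toy database, two of the four molecules pass screening, so verification runs twice instead of four times; one of the two is a false positive that verification rejects. Screening can only produce false positives, never false negatives, because every feature of a substructure is also a feature of any molecule that contains it.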
