Advanced SPARQL querying in small molecule databases
Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Medium: electronic-ecollection
Document type: journal article
PubMed: 27275187
PubMed Central: PMC4893829
DOI: 10.1186/s13321-016-0144-4
PII: 144
- Keywords
- Database of small molecules, Resource Description Framework, SPARQL query language
- Publication type
- journal article (MeSH)
BACKGROUND: In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and provide powerful search facilities. However, we identified several deficiencies that make such RDF databases restrictive or difficult to use for common users.

RESULTS: We extended a SPARQL engine so that special procedures can be called inside SPARQL queries. This allows the user to work with data that cannot simply be precomputed and thus cannot be stored directly in the database. We designed an algorithm that checks a query against the data ontology to identify possible user errors, which greatly improves query debugging. We also introduced an approach to visualizing retrieved data in a user-friendly way, based on templates that describe visualizations of resource classes. To integrate all of our approaches, we developed a simple web application.

CONCLUSIONS: Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure calls, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
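The idea of checking a query against the data ontology can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the toy ontology, the `ex:` property and class names, and the `check_patterns` helper are all hypothetical, and the check ignores class hierarchies and everything else a real ontology would carry. It only shows the general principle of flagging predicates the ontology does not know and variables that a query uses as two different classes.

```python
from dataclasses import dataclass

# Hypothetical toy ontology (assumption for illustration only):
# property -> (domain class, range class).
ONTOLOGY = {
    "ex:hasFormula": ("ex:Compound", "xsd:string"),
    "ex:hasRole": ("ex:Compound", "ex:Role"),
    "ex:roleLabel": ("ex:Role", "xsd:string"),
}


@dataclass(frozen=True)
class TriplePattern:
    """One triple pattern from a query's WHERE clause."""
    subject: str
    predicate: str
    obj: str


def check_patterns(patterns, ontology=ONTOLOGY):
    """Return warnings for unknown predicates and for variables
    whose classes, inferred from domain/range, conflict."""
    warnings = []
    inferred = {}  # variable name -> first class inferred for it
    for p in patterns:
        if p.predicate not in ontology:
            warnings.append(f"unknown predicate {p.predicate}")
            continue
        domain, range_ = ontology[p.predicate]
        for term, cls in ((p.subject, domain), (p.obj, range_)):
            if not term.startswith("?"):
                continue  # only variables are checked in this sketch
            previous = inferred.setdefault(term, cls)
            if previous != cls:
                warnings.append(f"{term} used as both {previous} and {cls}")
    return warnings


# A consistent query: ?c is a compound, ?r is its role.
good = [
    TriplePattern("?c", "ex:hasRole", "?r"),
    TriplePattern("?r", "ex:roleLabel", "?l"),
]

# Two likely user errors: ?c is treated as a role, and a predicate is misspelled.
bad = [
    TriplePattern("?c", "ex:hasRole", "?r"),
    TriplePattern("?c", "ex:roleLabel", "?l"),
    TriplePattern("?c", "ex:hasFormla", "?f"),
]

for warning in check_patterns(bad):
    print(warning)
```

A query that is syntactically valid but semantically wrong would otherwise just return an empty result; a check of this kind gives the user a concrete hint about which pattern is suspect.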
IDSM ChemWebRDF: SPARQLing small-molecule datasets
Interoperable chemical structure search service