A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL

2023 Jun 20; 15(1): 61. [epub] 20230620

Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic

Document type: journal articles, reviews

Persistent link https://www.medvik.cz/link/pmid37340506

Grant support
LM2018131 Ministerstvo Školství, Mládeže a Tělovýchovy
RVO:61388963 Institute of Organic Chemistry and Biochemistry, Czech Republic

Links

PubMed 37340506
PubMed Central PMC10280967
DOI 10.1186/s13321-023-00729-5
PII: 10.1186/s13321-023-00729-5
Knihovny.cz E-resources

Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the Resource Description Framework (RDF) to express data and on the SPARQL query language to retrieve the data. Many existing biological and chemical databases are stored in the form of a relational database (RDB). Converting a relational database into the RDF form and storing it in a native RDF database system may not be desirable in many cases. It may be necessary to preserve the original database form, and having two versions of the same data may not be convenient. A solution may be to use a system mapping the relational database to the RDF form. Such a system keeps data in their original relational form and translates incoming SPARQL queries to equivalent SQL queries, which are evaluated by a relational-database system. This review compares different RDB-to-RDF mapping systems with a primary focus on those that can be used free of charge. In addition, it compares different approaches to expressing RDB-to-RDF mappings. The review shows that these systems represent a viable method providing sufficient performance. Their real-life performance is demonstrated on data and queries coming from the neXtProt project.
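The translation step described in the abstract can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems and does not use R2RML. The table `protein`, the `http://example.org/` IRIs, the `MAPPING` dictionary, and the functions `pattern_to_sql` and `evaluate` are all hypothetical names chosen for illustration. The sketch handles only a single triple pattern of the form `?s <predicate> ?o`, whereas real mapping systems translate full SPARQL queries.

```python
import sqlite3

# Hypothetical mapping: one table, one subject IRI template,
# and a predicate IRI -> column lookup. All names are assumptions.
MAPPING = {
    "table": "protein",
    "key_column": "id",
    "subject_template": "http://example.org/protein/{}",
    "predicates": {
        "http://example.org/name": "name",
        "http://example.org/mass": "mass",
    },
}


def pattern_to_sql(predicate_iri, mapping=MAPPING):
    """Translate the triple pattern `?s <predicate_iri> ?o` into a SQL query."""
    column = mapping["predicates"][predicate_iri]
    key = mapping["key_column"]
    # NULL cells produce no triple, so they are filtered out in SQL.
    return (
        f'SELECT {key}, {column} FROM {mapping["table"]} '
        f"WHERE {column} IS NOT NULL ORDER BY {key}"
    )


def evaluate(conn, predicate_iri, mapping=MAPPING):
    """Run the generated SQL and rebuild (subject IRI, object value) pairs."""
    rows = conn.execute(pattern_to_sql(predicate_iri, mapping)).fetchall()
    return [(mapping["subject_template"].format(k), v) for k, v in rows]


# The data stay in their relational form; only the query is translated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, mass REAL)")
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [(1, "insulin", 5808.0), (2, "unnamed", None)],
)

for subject, value in evaluate(conn, "http://example.org/name"):
    print(subject, value)
```

The point of the design is that no RDF copy of the table is ever materialized: the mapping is consulted at query time, and the relational engine does the actual evaluation.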

