A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL
Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic
Document type: journal articles, reviews
Grant support
LM2018131
Ministerstvo Školství, Mládeže a Tělovýchovy
RVO:61388963
Institute of Organic Chemistry and Biochemistry, Czech Republic
PubMed
37340506
PubMed Central
PMC10280967
DOI
10.1186/s13321-023-00729-5
PII: 10.1186/s13321-023-00729-5
- Keywords
- RDB-to-RDF mapping, Relational database, Resource Description Framework, SPARQL
- Publication type
- journal articles (MeSH)
- reviews (MeSH)
Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the Resource Description Framework (RDF) to express data and on the SPARQL query language to retrieve the data. Many existing biological and chemical databases are stored in the form of a relational database (RDB). Converting a relational database into the RDF form and storing it in a native RDF database system may not be desirable in many cases. It may be necessary to preserve the original database form, and having two versions of the same data may not be convenient. A solution may be to use a system mapping the relational database to the RDF form. Such a system keeps data in their original relational form and translates incoming SPARQL queries to equivalent SQL queries, which are evaluated by a relational-database system. This review compares different RDB-to-RDF mapping systems with a primary focus on those that can be used free of charge. In addition, it compares different approaches to expressing RDB-to-RDF mappings. The review shows that these systems represent a viable method providing sufficient performance. Their real-life performance is demonstrated on data and queries coming from the neXtProt project.
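The translation step described in the abstract can be illustrated with a minimal sketch. It is not the method of any system compared in the review: the table `protein`, the `http://example.org/` IRIs, the `MAPPING` dictionary, and the helper functions are all hypothetical, and SQLite stands in for the relational database. The sketch handles only a single triple pattern of the form `?s <predicate> ?o`, whereas real mapping systems translate full SPARQL queries.

```python
import sqlite3

# Hypothetical mapping (an assumption for illustration): one table, a subject
# IRI template built from the primary key, and one column per predicate.
MAPPING = {
    "table": "protein",
    "key": "id",
    "subject_template": "http://example.org/protein/{}",
    "predicates": {
        "http://example.org/vocab/name": "name",
        "http://example.org/vocab/gene": "gene",
    },
}


def translate(predicate_iri, mapping=MAPPING):
    """Translate the triple pattern `?s <predicate_iri> ?o` into an SQL query."""
    column = mapping["predicates"][predicate_iri]
    # NULL cells yield no triple, so they are filtered out in SQL.
    return (
        f"SELECT {mapping['key']}, {column} FROM {mapping['table']} "
        f"WHERE {column} IS NOT NULL ORDER BY {mapping['key']}"
    )


def evaluate(conn, predicate_iri, mapping=MAPPING):
    """Run the translated SQL and return (subject IRI, object literal) pairs."""
    rows = conn.execute(translate(predicate_iri, mapping)).fetchall()
    return [(mapping["subject_template"].format(key), value) for key, value in rows]


def demo_connection():
    """Build a small in-memory relational database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, gene TEXT)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [(1, "Insulin", "INS"), (2, "Hemoglobin subunit beta", "HBB"), (3, "Unnamed", None)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(translate("http://example.org/vocab/gene"))
    for subject, obj in evaluate(conn, "http://example.org/vocab/gene"):
        print(subject, obj)
```

The data never leave the relational table: the mapping is consulted only at query time, which is the property that lets such a system avoid keeping a second, RDF copy of the database.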