Lightweight Distributed Provenance Model for Complex Real-world Environments
Status PubMed-not-MEDLINE Language English Country Great Britain, England Medium electronic
Document type journal articles
Grant support
824087
EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
825575
EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
DIFRA project
Regione Autonoma della Sardegna (Sardinia Region)
PubMed
35977957
PubMed Central
PMC9383664
DOI
10.1038/s41597-022-01537-6
PII: 10.1038/s41597-022-01537-6
- Publication type
- journal articles MeSH
Provenance is information describing the lineage of an object, such as a dataset or biological material. Since these objects can be passed between organizations, each organization can document only parts of the object's life cycle. As a result, the interconnected parts of distributed provenance form distributed provenance chains. Depending on the actual provenance content, complete provenance chains can provide traceability and contribute to the reproducibility and FAIRness of research objects. In this paper, we define a lightweight provenance model based on W3C PROV that enables the generation of distributed provenance chains in complex, multi-organizational environments. The application of the model is demonstrated with a use case spanning several steps of a real-world research pipeline: starting with the acquisition of a specimen, continuing through its processing and storage, histological examination, and the generation/collection of associated data (images, annotations, clinical data), and ending with the training of an AI model for tumor detection in the images. The proposed model has become an open conceptual foundation of the ISO 23494 standard on provenance for the biotechnology domain, which is currently under development.
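The idea of a distributed provenance chain can be illustrated with a minimal sketch. It is not the model defined in the paper: the `Bundle` class, the `trace_lineage` function, the organization names, and the object identifiers are all hypothetical, chosen only to mirror the pipeline steps named in the abstract. Each organization records only its own derivations, and the chain emerges when the parts are followed across organizations through shared object identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bundle:
    """One organization's part of the provenance (hypothetical structure)."""
    organization: str
    # Maps a derived object id to the ids of the objects it was derived from.
    derived_from: dict[str, list[str]] = field(default_factory=dict)


def trace_lineage(object_id: str, bundles: list[Bundle]) -> list[tuple[str, str, str]]:
    """Walk derivations backwards across all bundles.

    Returns (organization, derived, source) triples in visiting order.
    """
    chain = []
    seen = set()
    stack = [object_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting an object
        seen.add(current)
        for bundle in bundles:
            for source in bundle.derived_from.get(current, []):
                chain.append((bundle.organization, current, source))
                stack.append(source)
    return chain


# Hypothetical example: three organizations, each documenting one step.
biobank = Bundle("biobank", {"sample:1": ["specimen:1"]})
pathology = Bundle("pathology-lab", {"image:1": ["sample:1"]})
ai_group = Bundle("ai-group", {"model:1": ["image:1"]})

lineage = trace_lineage("model:1", [biobank, pathology, ai_group])
for organization, derived, source in lineage:
    print(f"{organization}: {derived} was derived from {source}")
```

In this sketch no single bundle describes the full path from the trained model back to the specimen; the lineage is only recoverable when all three parts are available, which is the situation the abstract describes.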
BBMRI-ERIC, Neue Stiftingtalstrasse 2, 8010 Graz, Austria
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
Institute of Computer Science, Masaryk University, Šumavská 416/15, 602 00 Brno, Czech Republic