Lightweight Distributed Provenance Model for Complex Real-world Environments

2022 Aug 17; 9(1): 503. [epub] 20220817

Status: PubMed-not-MEDLINE | Language: English | Country: Great Britain, England | Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid35977957

Grant support
824087 EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
825575 EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
DIFRA project Regione Autonoma della Sardegna (Sardinia Region)

Links

PubMed 35977957
PubMed Central PMC9383664
DOI 10.1038/s41597-022-01537-6
PII: 10.1038/s41597-022-01537-6

Provenance is information describing the lineage of an object, such as a dataset or biological material. Since these objects can be passed between organizations, each organization can document only parts of the object's life cycle. As a result, the interconnection of distributed provenance parts forms distributed provenance chains. Depending on the actual provenance content, complete provenance chains can provide traceability and contribute to the reproducibility and FAIRness of research objects. In this paper, we define a lightweight provenance model based on W3C PROV that enables the generation of distributed provenance chains in complex, multi-organizational environments. The application of the model is demonstrated with a use case spanning several steps of a real-world research pipeline: starting with the acquisition of a specimen, its processing and storage, histological examination, and the generation/collection of associated data (images, annotations, clinical data), and ending with the training of an AI model for the detection of tumors in the images. The proposed model has become an open conceptual foundation of the ISO 23494 standard on provenance for the biotechnology domain, which is currently under development.
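
The idea of a distributed provenance chain can be illustrated with a minimal sketch. It is not the model defined in the paper: the `Bundle` class, its fields, the `previous_bundle` link, and all identifiers and organization names below are hypothetical, chosen only to show how separately documented parts might be connected and traversed. The term "bundle" is borrowed from W3C PROV, where it denotes a named set of provenance descriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Bundle:
    """One organization's part of the provenance (hypothetical structure)."""
    bundle_id: str
    organization: str
    # PROV-style statements: (generated entity, activity, used entity).
    derivations: List[Tuple[str, str, str]] = field(default_factory=list)
    # Identifier of the bundle documenting the preceding step, if any.
    previous_bundle: Optional[str] = None


def trace_chain(bundles: Dict[str, Bundle], start_id: str) -> List[Bundle]:
    """Follow backward links from the given bundle toward the origin."""
    chain: List[Bundle] = []
    seen = set()
    current: Optional[str] = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at bundle {current!r}")
        if current not in bundles:
            # A part held by another organization may be unavailable;
            # the chain is then only partially reconstructed.
            break
        seen.add(current)
        bundle = bundles[current]
        chain.append(bundle)
        current = bundle.previous_bundle
    return chain


# Assumed example loosely following the pipeline named in the abstract.
store = {
    b.bundle_id: b
    for b in [
        Bundle(
            "bundle:biobank",
            "Biobank",
            [("specimen", "acquisition", "donor-material"),
             ("stored-specimen", "processing-and-storage", "specimen")],
        ),
        Bundle(
            "bundle:pathology",
            "Pathology lab",
            [("slide-image", "histological-examination", "stored-specimen")],
            previous_bundle="bundle:biobank",
        ),
        Bundle(
            "bundle:ai-training",
            "AI group",
            [("tumor-detection-model", "training", "slide-image")],
            previous_bundle="bundle:pathology",
        ),
    ]
}

chain = trace_chain(store, "bundle:ai-training")
for bundle in reversed(chain):  # print from origin to final result
    for generated, activity, used in bundle.derivations:
        print(f"{bundle.organization}: {used} -> [{activity}] -> {generated}")
```

Each organization keeps only its own bundle, and the backward link is the only information shared between them. If one bundle cannot be retrieved, the traversal stops and returns the part of the chain that is still reachable.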

Latest 20 citations...

Recording provenance of workflow runs with RO-Crate

2024; 19(9): e0309210. [epub] 20240910

Toward a common standard for data and specimen provenance in life sciences

2024 Jan; 8(1): e10365. [epub] 20230418
