Recording provenance of workflow runs with RO-Crate
Language: English. Country: United States. Medium: electronic-ecollection
Document type: journal article
PubMed: 39255315
PubMed Central: PMC11386446
DOI: 10.1371/journal.pone.0309210
PII: PONE-D-24-08706
- MeSH terms
- Workflow *
- Reproducibility of Results
- Software
- Machine Learning
- Publication type
- Journal Article
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
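As a rough illustration of the idea in the abstract, the sketch below builds RO-Crate-style JSON-LD metadata in which a single tool execution is recorded as a Schema.org CreateAction that links the software used (instrument), the inputs (object) and the outputs (result). It is a minimal sketch, not the full Workflow Run RO-Crate model: the function name, the identifiers `#run-1` and `#hypothetical-tool`, and the file names are assumptions made for illustration, and the result is not claimed to conform to any of the published profiles.

```python
import json


def build_run_metadata(tool_id, input_ids, output_ids):
    """Return a JSON-LD dict describing one tool run as a Schema.org CreateAction.

    All identifiers passed in are hypothetical placeholders.
    """
    files = input_ids + output_ids
    # The action ties together the tool, its inputs and its outputs.
    action = {
        "@id": "#run-1",  # hypothetical local identifier for the run
        "@type": "CreateAction",
        "instrument": {"@id": tool_id},
        "object": [{"@id": i} for i in input_ids],
        "result": [{"@id": o} for o in output_ids],
    }
    graph = [
        # Metadata descriptor and root dataset, as used by RO-Crate 1.1.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": f} for f in files],
            "mentions": [{"@id": action["@id"]}],
        },
        {"@id": tool_id, "@type": "SoftwareApplication"},
        action,
    ]
    # One File entity per input and output bundled in the crate.
    graph += [{"@id": f, "@type": "File"} for f in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


metadata = build_run_metadata("#hypothetical-tool", ["input.txt"], ["output.txt"])
print(json.dumps(metadata, indent=2))
```

Because the run is described with plain Schema.org terms rather than engine-specific fields, the same structure could in principle be emitted by different workflow systems, which is what makes runs from heterogeneous systems comparable.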
Barcelona Supercomputing Center Barcelona Spain
Biozentrum University of Basel Basel Switzerland
Center for Advanced Studies Research and Development in Sardinia Italy
Computer Science Department Università degli Studi di Torino Torino Italy
Department of Computer Science The University of Manchester Manchester United Kingdom
DTL Projects Utrecht The Netherlands
Faculty of Informatics Masaryk University Brno Czech Republic
Forschungszentrum Jülich Jülich Germany
Informatics Institute University of Amsterdam Amsterdam The Netherlands
Institute for Advanced Academic Research Chiba University Chiba Japan
Institute of Computer Science Masaryk University Brno Czech Republic
Ontology Engineering Group Universidad Politécnica de Madrid Madrid Spain
Sator Incorporated Tokyo Japan