This record comes from PubMed.

DOME Registry: implementing community-wide recommendations for reporting supervised machine learning in biology

2024 Jan 02; 13.

Language: English; Country: United States; Media: print

Document type: Journal Article

Grant support
CA21160, European Cooperation in Science and Technology

Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data Optimization Model Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. By providing a structured set of questions, the recommendations help ensure that key details are reported transparently. Here, we introduce the DOME Registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry draws on external resources such as ORCID, APICURON, and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include growing the registry through community curation, refining the definition of the DOME score, and encouraging publishers to adopt DOME standards, with the aim of promoting transparency and reproducibility of ML in the life sciences.
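To make the idea of a per-publication annotation and score more concrete, here is a minimal sketch, assuming that each study is annotated against a fixed set of questions grouped into the four DOME sections (Data, Optimization, Model, Evaluation) and that a score can be read as a simple completeness ratio, the share of fields with a non-blank answer. This record does not define the actual DOME score, so that ratio is an illustrative assumption only. The question keys, `DomeAnnotation`, `completeness_score`, and the example identifier are hypothetical names, not part of the registry or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four sections come from the DOME acronym (Data, Optimization, Model,
# Evaluation). The question keys are illustrative placeholders loosely modeled
# on the DOME checklist; they are NOT the registry's actual field names.
DOME_FIELDS: Dict[str, List[str]] = {
    "data": ["provenance", "splits", "redundancy", "availability"],
    "optimization": ["algorithm", "encoding", "parameters", "regularization"],
    "model": ["interpretability", "output", "availability"],
    "evaluation": ["method", "metrics", "comparison", "availability"],
}


@dataclass
class DomeAnnotation:
    """Hypothetical annotation record for one published ML study."""

    publication_id: str  # placeholder for the registry's unique identifier
    answers: Dict[str, Dict[str, Optional[str]]] = field(default_factory=dict)

    def answer(self, section: str, question: str, text: str) -> None:
        """Record a free-text answer, rejecting fields outside DOME_FIELDS."""
        if question not in DOME_FIELDS.get(section, []):
            raise KeyError(f"unknown DOME field: {section}/{question}")
        self.answers.setdefault(section, {})[question] = text


def completeness_score(annotation: DomeAnnotation) -> float:
    """Share of DOME fields that carry a non-blank answer.

    ASSUMPTION: this completeness ratio only illustrates what a per-publication
    score could measure; it is not the registry's DOME score formula.
    """
    total = sum(len(questions) for questions in DOME_FIELDS.values())
    filled = sum(
        1
        for section, questions in DOME_FIELDS.items()
        for question in questions
        if (annotation.answers.get(section, {}).get(question) or "").strip()
    )
    return filled / total


if __name__ == "__main__":
    record = DomeAnnotation("example-publication-001")  # hypothetical ID
    record.answer("data", "provenance", "Public benchmark dataset")
    record.answer("evaluation", "metrics", "AUROC and MCC")
    print(f"{completeness_score(record):.2f}")  # 2 of 15 fields -> 0.13
```

Validating keys on write keeps every annotation aligned with one fixed question set, which is what would make scores comparable across publications; a weighted or per-section score would be an equally plausible design.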

Affiliations

Artificial Intelligence Laboratory, Vrije Universiteit Brussels, Brussels 1050, Belgium

Department of Biology, National and Kapodistrian University of Athens, Athens 157 72, Greece

Department of Biomedical Sciences, University of Padova, Padova 35131, Italy

Department of Biosciences, University of Milan, Milan 20133, Italy

Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland

Department of Information Engineering, University of Padova, Padova 35131, Italy

Department of Oncology, Geneva University Hospitals, Geneva 1205, Switzerland

Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy

ELIXIR Hub, Hinxton, Cambridge CB10 1SD, UK

HES-SO, HEG Geneva, Geneva 1227, Switzerland

Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 570 01, Greece

Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari 70126, Italy

Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Vrije Universiteit Brussels, Brussels 1050, Belgium

Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Brno 62500, Czech Republic

Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium

Masaryk University, Czech Republic; International Clinical Research Centre, St Anne's Hospital, Brno 65690, Czech Republic

SIB Swiss Institute of Bioinformatics, Geneva 1206, Switzerland

Swiss Cancer Center Léman, Lausanne 1015, Switzerland

Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland

ZB Med, Information Centre for Life Sciences, Cologne 50931, Germany


