DOME Registry: implementing community-wide recommendations for reporting supervised machine learning in biology
Language: English; Country: United States; Media: print
Document type: Journal Article
Grant support
CA21160
European Cooperation in Science and Technology
PubMed
39661723
PubMed Central
PMC11633452
DOI
10.1093/gigascience/giae094
PII: 7921169
- Keywords
- machine learning, reproducibility, standards, transparency
- MeSH
- Databases, Factual
- Humans
- Registries *
- Reproducibility of Results
- Supervised Machine Learning *
- Check Tag
- Humans
- Publication type
- Journal Article
Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data Optimization Model Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. The recommendations provide a structured set of questions that helps ensure key details are reported transparently. Here, we introduce the DOME registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources like ORCID, APICURON, and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include continuing to grow the registry through community curation, improving the DOME score definition, encouraging publishers to adopt DOME standards, and promoting transparency and reproducibility of ML in the life sciences.
Artificial Intelligence Laboratory, Vrije Universiteit Brussels, Brussels 1050, Belgium
Department of Biology, National and Kapodistrian University of Athens, Athens 157 72, Greece
Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
Department of Biosciences, University of Milan, Milan 20133, Italy
Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
Department of Information Engineering, University of Padova, Padova 35131, Italy
Department of Oncology, Geneva University Hospitals, Geneva 1205, Switzerland
Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
ELIXIR Hub, Hinxton, Cambridge CB10 1SD, UK
HES-SO, HEG Geneva, Geneva 1227, Switzerland
Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari 70126, Italy
Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium
SIB Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
Swiss Cancer Center Léman, Lausanne 1015, Switzerland
Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
ZB Med Information Centre for Life Sciences, Cologne 50931, Germany