MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis

2025 Sep 15 [Epub 2025-09-15]

Status: Publisher; Language: English; Country: United States; Medium: print-electronic

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid40954297

Grant support
CC1076 RCUK | Medical Research Council (MRC)
CC1076 Wellcome Trust (Wellcome)
75N91019D00024 NCI NIH HHS - United States

Links

PubMed 40954297
DOI 10.1038/s41592-025-02835-8
PII: 10.1038/s41592-025-02835-8
Knihovny.cz E-resources

Artificial intelligence (AI) methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new algorithms, but access to such data is often hindered by the lack of standards for sharing datasets. We discuss the barriers to sharing annotated image datasets and suggest specific guidelines to improve the reuse of bioimages and annotations for AI applications. These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We are sure that the Metadata, Incentives, Formats and Accessibility (MIFA) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high-quality training and benchmarking data.

BioVisionCenter Universität Zürich Zürich Switzerland

Cancer Early Detection Advanced Research Knight Cancer Institute Oregon Health and Science University Portland OR USA

Cancer Research Technology Program Frederick National Laboratory for Cancer Research Frederick MD USA

CEITEC Central European Institute of Technology Masaryk University Brno Czech Republic

Cell Biology and Biophysics Unit European Molecular Biology Laboratory Heidelberg Germany

Center for Molecular Microscopy Center for Cancer Research National Cancer Institute NIH Bethesda MD USA

Collaboration for joint PhD degree between EMBL and Heidelberg University Faculty of Biosciences Heidelberg Germany

Department of Experimental Medical Science Lund University Bioimaging Centre and Nanolund Lund University Lund Sweden

Division of Computational Genomics and System Genetics German Cancer Research Center Heidelberg Germany

Edinburgh Pathology Centre for Genomic and Experimental Medicine and CRUK Scotland Centre Institute of Genetics and Cancer University of Edinburgh Edinburgh UK

Electron Microscopy Science Technology Platform Francis Crick Institute London UK

Euro BioImaging ERIC Bio Hub European Molecular Biology Laboratory Heidelberg Heidelberg Germany

European Molecular Biology Laboratory Data Science Heidelberg Germany

European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK

European Molecular Biology Laboratory Genome Biology Unit Heidelberg Germany

Fondazione Human Technopole Viale Rita Levi Montalcini Milan Italy

German BioImaging Gesellschaft für Mikroskopie und Bildanalyse e.V. Konstanz Germany

Imaging Platform Broad Institute Cambridge MA USA

Instituto de Investigación Sanitaria Gregorio Marañón Madrid Spain

IT4Innovations VSB Technical University of Ostrava Ostrava Poruba Czech Republic

Nantes Université CHU Nantes CNRS Inserm BioCore US16 SFR Bonamy Nantes France

Scalable Minds GmbH Potsdam Germany

Universidad Carlos III de Madrid Madrid Spain


