MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis
Status: Publisher; Language: English; Country: United States; Medium: print-electronic
Document type: journal articles, reviews
Grant support
CC1076
RCUK | Medical Research Council (MRC)
CC1076
Wellcome Trust (Wellcome)
75N91019D00024
NCI NIH HHS - United States
PubMed
40954297
DOI
10.1038/s41592-025-02835-8
PII: 10.1038/s41592-025-02835-8
- Publication type
- journal articles MeSH
- reviews MeSH
Artificial intelligence (AI) methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new algorithms, but access to such data is often hindered by the lack of standards for sharing datasets. We discuss the barriers to sharing annotated image datasets and suggest specific guidelines to improve the reuse of bioimages and annotations for AI applications. These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We anticipate that the Metadata, Incentives, Formats and Accessibility (MIFA) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high-quality training and benchmarking data.
BioVisionCenter Universität Zürich Zürich Switzerland
CEITEC Central European Institute of Technology Masaryk University Brno Czech Republic
Cell Biology and Biophysics Unit European Molecular Biology Laboratory Heidelberg Germany
Electron Microscopy Science Technology Platform Francis Crick Institute London UK
Euro BioImaging ERIC Bio Hub European Molecular Biology Laboratory Heidelberg Heidelberg Germany
European Molecular Biology Laboratory Data Science Heidelberg Germany
European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
European Molecular Biology Laboratory Genome Biology Unit Heidelberg Germany
Fondazione Human Technopole V.le Rita Levi Montalcini Milan Italy
German BioImaging Gesellschaft für Mikroskopie und Bildanalyse e.V. Konstanz Germany
Imaging Platform Broad Institute Cambridge MA USA
Instituto de Investigación Sanitaria Gregorio Marañón Madrid Spain
IT4Innovations VSB Technical University of Ostrava Ostrava Poruba Czech Republic
Nantes Université CHU Nantes CNRS Inserm BioCore US16 SFR Bonamy Nantes France