International multicenter validation of AI-driven ultrasound detection of ovarian cancer

2025 Jan;31(1):189-196. Epub 2025 Jan 2.

Language: English. Country: United States. Medium: print-electronic.

Document type: journal article, multicenter study, validation study

Permanent link: https://www.medvik.cz/link/pmid39747679

Grant support
231143 Radiumhemmets Forskningsfonder (Cancer Research Foundations of Radiumhemmet)
211657 Pi 01 H Cancerfonden (Swedish Cancer Society)
2020-01702 Vetenskapsrådet (Swedish Research Council)

Links

PubMed 39747679
PubMed Central PMC11750711
DOI 10.1038/s41591-024-03329-4
PII: 10.1038/s41591-024-03329-4

Ovarian lesions are common and often incidentally detected. A critical shortage of expert ultrasound examiners has raised concerns about unnecessary interventions and delayed cancer diagnoses. Deep learning has shown promising results in the detection of ovarian cancer in ultrasound images; however, external validation is lacking. In this international multicenter retrospective study, we developed and validated transformer-based neural network models using a comprehensive dataset of 17,119 ultrasound images from 3,652 patients across 20 centers in eight countries. Using a leave-one-center-out cross-validation scheme, for each center in turn, we trained a model using data from the remaining centers. The models demonstrated robust performance across centers, ultrasound systems, histological diagnoses and patient age groups, significantly outperforming both expert and non-expert examiners on all evaluated metrics, namely F1 score, sensitivity, specificity, accuracy, Cohen's kappa, Matthews correlation coefficient, diagnostic odds ratio and Youden's J statistic. Furthermore, in a retrospective triage simulation, artificial intelligence (AI)-driven diagnostic support reduced referrals to experts by 63% while significantly surpassing the diagnostic performance of the current practice. These results show that transformer-based models exhibit strong generalization and diagnostic accuracy above that of human experts, with the potential to alleviate the shortage of expert ultrasound examiners and improve patient outcomes.
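The abstract names eight evaluation metrics without defining them. As a reading aid, the sketch below computes each one from the four counts of a binary confusion matrix using standard textbook definitions, treating malignancy as the positive class. It is not the authors' evaluation code: the `ConfusionCounts` and `diagnostic_metrics` names and the example counts are hypothetical, and the sketch assumes no row or column of the confusion matrix is empty (no zero-division guards).

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts (malignant = positive)."""
    tp: int  # malignant lesions classified as malignant
    fp: int  # benign lesions classified as malignant
    fn: int  # malignant lesions classified as benign
    tn: int  # benign lesions classified as benign


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions of the metrics listed in the abstract (sketch only)."""
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    accuracy = (c.tp + c.tn) / n
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)  # harmonic mean of precision and sensitivity

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance from the marginal totals of predictions and reference labels.
    p_chance = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom

    # Diagnostic odds ratio; unbounded when fp or fn is zero, returned as
    # infinity here (a continuity correction is often used instead).
    dor = (c.tp * c.tn) / (c.fp * c.fn) if c.fp and c.fn else math.inf

    youden_j = sensitivity + specificity - 1
    return {
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "cohen_kappa": kappa,
        "mcc": mcc,
        "diagnostic_odds_ratio": dor,
        "youden_j": youden_j,
    }


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fp=20, fn=10, tn=180)  # hypothetical counts
    for name, value in diagnostic_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both 0.900 while kappa and the Matthews coefficient are about 0.78, which shows why chance-corrected metrics are reported alongside accuracy when the benign and malignant classes are unbalanced.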

1st Department of Obstetrics and Gynecology, Alexandra Hospital, Medical School, National and Kapodistrian University of Athens, Athens, Greece

3rd Faculty of Medicine, Charles University, Prague, Czech Republic

Centro Integrato di Procreazione Medicalmente Assistita e Diagnostica Ostetrico Ginecologica, Azienda Ospedaliero Universitaria Policlinico Duilio Casula, Monserrato, University of Cagliari, Cagliari, Italy

Department of Clinical Science and Education, Södersjukhuset, Karolinska Institutet, Stockholm, Sweden

Department of Gynecological Oncology and Gynecology, Medical University of Lublin, Lublin, Poland

Department of Medicine and Surgery, University of Milan Bicocca, Milan, Italy

Department of Obstetrics and Gynaecology, Lithuanian University of Health Sciences, Kaunas, Lithuania

Department of Obstetrics and Gynecology, Clínica Universidad de Navarra, Pamplona, Spain

Department of Obstetrics and Gynecology, Rizal Medical Center, Manila, Philippines

Department of Obstetrics and Gynecology, Skåne University Hospital, Lund, Sweden

Department of Obstetrics and Gynecology, Södersjukhuset, Stockholm, Sweden

Department of Obstetrics, Gynecology and Reproduction, Dexeus University Hospital, Barcelona, Spain

Department of Perinatology and Oncological Gynecology, Faculty of Medical Sciences, Medical University of Silesia, Katowice, Poland

Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden

Fondazione Poliambulanza Istituto Ospedaliero, Brescia, Italy

Gynecologic and Obstetric Unit, Women's and Children's Department, Forlì Hospital, Forlì, Italy

Gynecologic Oncology Centre, Department of Gynecology, Obstetrics and Neonatology, 1st Faculty of Medicine, Charles University and General University Hospital Prague, Prague, Czech Republic

Gynecology and Breast Care Center, Mater Olbia Hospital, Olbia, Italy

Institute for Maternal and Child Health, IRCCS 'Burlo Garofolo', Trieste, Italy

Institute for the Care of Mother and Child, Prague, Czech Republic

Obstetrics and Gynecology Unit, Forlì and Faenza Hospitals, AUSL Romagna, Forlì, Italy

School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden

Science for Life Laboratory, Stockholm, Sweden

Section of Obstetrics and Gynecology, Department of Clinical Sciences, Università Politecnica delle Marche, Azienda Ospedaliero Universitaria delle Marche, Ancona, Italy

Unit of Obstetrics and Gynecology, Department of Biomedical and Clinical Sciences, Luigi Sacco University Hospital, University of Milan, Milan, Italy

Unit of Preventive Gynecology, European Institute of Oncology IRCCS, Milan, Italy

UO Gynecology, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy

