International multicenter validation of AI-driven ultrasound detection of ovarian cancer
Language English Country United States Medium print-electronic
Document type journal articles, multicenter study, validation study
Grant support
231143
Radiumhemmets Forskningsfonder (Cancer Research Foundations of Radiumhemmet)
211657 Pi 01 H
Cancerfonden (Swedish Cancer Society)
2020-01702
Vetenskapsrådet (Swedish Research Council)
PubMed
39747679
PubMed Central
PMC11750711
DOI
10.1038/s41591-024-03329-4
PII: 10.1038/s41591-024-03329-4
- MeSH
- Deep Learning MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Ovarian Neoplasms * diagnostic imaging, diagnosis, pathology MeSH
- Neural Networks, Computer MeSH
- Retrospective Studies MeSH
- Aged MeSH
- Sensitivity and Specificity MeSH
- Ultrasonography, methods MeSH
- Artificial Intelligence * MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Multicenter Study MeSH
- Validation Study MeSH
Ovarian lesions are common and often incidentally detected. A critical shortage of expert ultrasound examiners has raised concerns about unnecessary interventions and delayed cancer diagnoses. Deep learning has shown promising results in the detection of ovarian cancer in ultrasound images; however, external validation is lacking. In this international multicenter retrospective study, we developed and validated transformer-based neural network models using a comprehensive dataset of 17,119 ultrasound images from 3,652 patients across 20 centers in eight countries. Using a leave-one-center-out cross-validation scheme, for each center in turn, we trained a model using data from the remaining centers. The models demonstrated robust performance across centers, ultrasound systems, histological diagnoses and patient age groups, significantly outperforming both expert and non-expert examiners on all evaluated metrics, namely F1 score, sensitivity, specificity, accuracy, Cohen's kappa, Matthews correlation coefficient, diagnostic odds ratio and Youden's J statistic. Furthermore, in a retrospective triage simulation, artificial intelligence (AI)-driven diagnostic support reduced referrals to experts by 63% while significantly surpassing the diagnostic performance of the current practice. These results show that transformer-based models exhibit strong generalization and above human expert-level diagnostic accuracy, with the potential to alleviate the shortage of expert ultrasound examiners and improve patient outcomes.
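The leave-one-center-out scheme and the eight reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `leave_one_center_out` and `binary_metrics`, the toy center labels and the confusion-matrix counts are all hypothetical. The metric formulas are the standard textbook definitions for a binary benign/malignant decision, and the sketch assumes every cell of the confusion matrix is non-zero.

```python
from collections import defaultdict
from math import sqrt


def leave_one_center_out(centers):
    """Yield (held_out_center, train_indices, test_indices), one fold per center.

    `centers` holds one center label per sample (hypothetical input format).
    """
    by_center = defaultdict(list)
    for index, center in enumerate(centers):
        by_center[center].append(index)
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        # Train on every sample that does not come from the held-out center.
        train_idx = sorted(
            i for center, idx in by_center.items() if center != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (all assumed > 0)."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "cohen_kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
        "youden_j": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Toy example: six samples from three hypothetical centers.
    toy_centers = ["A", "A", "B", "C", "C", "C"]
    for held_out, train_idx, test_idx in leave_one_center_out(toy_centers):
        print(held_out, train_idx, test_idx)
    # Hypothetical counts, not taken from the study.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Splitting by center rather than by image means that each model is tested only on data from a center it did not see during training, which is what makes each fold a form of external validation.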
3rd Faculty of Medicine Charles University Prague Czech Republic
Department of Clinical Science and Education Södersjukhuset Karolinska Institutet Stockholm Sweden
Department of Gynecological Oncology and Gynecology Medical University of Lublin Lublin Poland
Department of Medicine and Surgery University of Milan Bicocca Milan Italy
Department of Obstetrics and Gynaecology Lithuanian University of Health Sciences Kaunas Lithuania
Department of Obstetrics and Gynecology Clínica Universidad de Navarra Pamplona Spain
Department of Obstetrics and Gynecology Rizal Medical Center Manila Philippines
Department of Obstetrics and Gynecology Skåne University Hospital Lund Sweden
Department of Obstetrics and Gynecology Södersjukhuset Stockholm Sweden
Department of Obstetrics Gynecology and Reproduction Dexeus University Hospital Barcelona Spain
Digital Futures KTH Royal Institute of Technology Stockholm Sweden
Fondazione Poliambulanza Istituto Ospedaliero Brescia Italy
Gynecologic and Obstetric Unit Women's and Children's Department Forlì Hospital Forlì Italy
Gynecology and Breast Care Center Mater Olbia Hospital Olbia Italy
Institute for Maternal and Child Health IRCCS 'Burlo Garofolo' Trieste Italy
Institute for the Care of Mother and Child Prague Czech Republic
Obstetrics and Gynecology Unit Forlì and Faenza Hospitals AUSL Romagna Forlì Italy
Science for Life Laboratory Stockholm Sweden
Unit of Preventive Gynecology European Institute of Oncology IRCCS Milan Italy
UO Gynecology Fondazione IRCCS San Gerardo dei Tintori Monza Italy