Data processing pipeline for cardiogenic shock prediction using machine learning
Status: PubMed-not-MEDLINE
Language: English
Country: Switzerland
Medium: electronic-ecollection
Document type: journal articles
PubMed: 37034352
PubMed Central: PMC10077147
DOI: 10.3389/fcvm.2023.1132680
Knihovny.cz E-resources
- Keywords
- cardiogenic shock, classification, machine learning, missing data imputation, prediction model, processing pipeline
- Publication type
- journal articles (MeSH)
INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data in order to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC-III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS.

METHODS: We focus mainly on the imputation of missing data. We build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and demonstrate the performance of a gradient-boosting framework with a tree-based classifier for cardiogenic shock prediction.

RESULTS: Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve, 0.805) without hyperparameter optimization.

CONCLUSION: We believe our pre-processing pipeline could also prove helpful for other classification and regression experiments.
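The abstract names k-nearest-neighbour imputation as one of the compared algorithms but does not give its settings or the comparison protocol. The following is a minimal standard-library sketch under stated assumptions. Missing values are marked with `None`. The distance uses the "nan-Euclidean" convention (only features observed in both rows count, rescaled to the full feature count), which is also the default in scikit-learn's `KNNImputer`. Methods are compared by hiding known values and measuring the RMSE of their reconstruction. The masking protocol, `k=3`, the column-mean fallback and all function names (`nan_euclidean`, `knn_impute`, `mean_impute`, `masked_rmse`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# A row is a list of floats; None marks a missing value (assumed encoding).


def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows, scaled up by
    (total features / shared features). Returns None if nothing is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def column_mean(rows, j):
    observed = [r[j] for r in rows if r[j] is not None]
    if not observed:
        raise ValueError(f"feature {j} has no observed values")
    return sum(observed) / len(observed)


def mean_impute(rows):
    """Univariate baseline: replace each missing value with its column mean."""
    means = [column_mean(rows, j) for j in range(len(rows[0]))]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Replace each missing value with the mean of that feature over the k
    nearest rows that observe it. Falls back to the column mean when no row
    shares an observed feature with the incomplete row."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for n, other in enumerate(rows):
                if n == i or other[j] is None:
                    continue
                d = nan_euclidean(row, other)
                if d is not None:
                    donors.append((d, other[j]))
            donors.sort()
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else column_mean(rows, j)
    return filled


def masked_rmse(complete_rows, impute, fraction=0.1, seed=0):
    """Hide a random fraction of known values, impute them, and return the
    RMSE between imputed and true values (lower is better)."""
    rng = random.Random(seed)
    cells = [(i, j) for i, r in enumerate(complete_rows) for j in range(len(r))]
    hidden = rng.sample(cells, max(1, int(len(cells) * fraction)))
    masked = [list(r) for r in complete_rows]
    for i, j in hidden:
        masked[i][j] = None
    imputed = impute(masked)
    errors = [(imputed[i][j] - complete_rows[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(errors) / len(errors))
```

To rank methods on a fully observed subset of the data, you would call `masked_rmse(subset, knn_impute)` and `masked_rmse(subset, mean_impute)` with the same seed. SVD-based imputers or MICE would plug into the same `impute` slot.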
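The reported result is a cross-validated mean AUC. The following is a minimal standard-library sketch of how such a figure is typically obtained: compute the AUC on each held-out fold and average the results. The fold count (`k=5`), the stratified round-robin split and the seed are assumptions, because the abstract does not state them. `centroid_fit` is a hypothetical placeholder scorer that keeps the example self-contained. It stands in for the gradient-boosted tree classifier and is not the paper's model.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the overall class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for n, i in enumerate(members):
            folds[n % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Mean AUC over held-out folds. `fit(train_rows, train_labels)` must return
    a function mapping one feature row to a risk score (higher = riskier)."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        scores = [model(features[i]) for i in test_idx]
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


def centroid_fit(rows, labels):
    """Hypothetical stand-in for the gradient-boosted tree classifier: score a
    row by how much closer it lies to the shock (1) centroid than to the
    no-shock (0) centroid."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(col) / len(members) for col in zip(*members)]

    c_shock, c_none = centroid(1), centroid(0)

    def sq_dist(row, c):
        return sum((a - b) ** 2 for a, b in zip(row, c))

    def score(row):
        return sq_dist(row, c_none) - sq_dist(row, c_shock)

    return score
```

Stratifying the folds matters when the positive class (here, patients who develop CS) is rare. Without it, a fold could contain no positives, and its AUC would be undefined.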
3rd Department of Cardiology National and Kapodistrian University of Athens Athens Greece
3rd Medical Department Cardiology and Intensive Care Medicine Wilhelminen Hospital Vienna Austria
Affiliated to the Sackler Faculty of Medicine Tel Aviv University Tel Aviv Israel
Anesthesia and Intensive Care Fondazione Policlinico San Matteo Hospital IRCCS Pavia Italy
Berlin Institute of Health Charité Universitätsmedizin Berlin Berlin Germany
Cardiology Department Emergency County Clinical Hospital of Oradea Oradea Romania
Clinic of Cardiac Surgery National Institute of Cardiovascular Diseases Bratislava Slovakia
Department of Acute Cardiology National Institute of Cardiovascular Diseases Bratislava Slovakia
Department of Cardiology Faculty of Human Medicine Zagazig University Zagazig Egypt
Department of Cardiology Ibrahim Cardiac Hospital and Research Institute Dhaka Bangladesh
Department of Clinical Surgical Diagnostic and Paediatric Sciences University of Pavia Pavia Italy
Department of Internal Medicine 2 Division of Cardiology Medical University of Vienna Vienna Austria
Deutsches Zentrum für Herz-Kreislauf-Forschung e.V. Berlin Germany
Duke Clinical Research Institute Durham NC United States
Faculty of Medicine Comenius University in Bratislava Bratislava Slovakia
Faculty of Medicine University of Novi Sad Novi Sad Serbia
Global Clinical Scholars Research Training Program Harvard Medical School Boston MA United States
Institute of Medical Informatics Charité Universitätsmedizin Berlin Berlin Germany
Premedix Academy Bratislava Slovakia
The Leviev Cardiothoracic and Vascular Center Chaim Sheba Medical Center Ramat Gan Israel