Data processing pipeline for cardiogenic shock prediction using machine learning

2023; 10: 1132680. Epub 2023-03-23.

Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic-ecollection

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid37034352

INTRODUCTION: Recent advances in machine learning provide new possibilities for processing and analysing observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for predicting cardiogenic shock (CS) in intensive cardiac care unit patients with acute coronary syndrome, using data from the MIMIC-III database. Identifying high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS.

METHODS: We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and demonstrate the performance of a gradient-boosting framework with a tree-based classifier for CS prediction.

RESULTS: Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve of 0.805) without hyperparameter optimization.

CONCLUSION: We believe our pre-processing pipeline could also prove helpful for other classification and regression experiments.
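The imputation comparison described in the abstract can be illustrated with a small sketch. The record does not say how the algorithms were scored, so the code assumes a common criterion: hide a random 20% of the observed values, impute them, and compute the root-mean-square error (RMSE) against the hidden truth. It uses synthetic low-rank data rather than MIMIC-III, and plain NumPy implementations written only for illustration; the helper names (`mask_known_values`, `mean_impute`, `knn_impute`, `svd_impute`, `chained_impute`, `rmse_on_hidden`) are hypothetical and are not the authors' code. `svd_impute` is a hard-rank iterative SVD; soft-impute differs by shrinking the singular values instead of truncating them. `chained_impute` runs deterministic chained linear regressions and yields a single imputed dataset, so it shows the chained-equations idea but is not full MICE, which draws random imputations and pools several completed datasets.

```python
import numpy as np


def mask_known_values(X, frac, rng):
    """Hide a random fraction of the observed entries; return the masked copy and hidden indices."""
    observed = np.argwhere(~np.isnan(X))
    n_hide = int(frac * len(observed))
    hide = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    X_masked = X.copy()
    X_masked[hide[:, 0], hide[:, 1]] = np.nan
    return X_masked, hide


def mean_impute(X):
    """Baseline: replace each missing entry with its column mean."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def knn_impute(X, k=5):
    """Fill a missing entry with the mean of that column over the k nearest donor rows.
    Distance = root-mean-square difference over the columns observed in both rows."""
    obs = ~np.isnan(X)
    X0 = np.where(obs, X, 0.0)
    X_out = X.copy()
    for i in np.where(~obs.all(axis=1))[0]:
        shared = obs & obs[i]
        n_shared = shared.sum(axis=1)
        sq = (((X0 - X0[i]) * shared) ** 2).sum(axis=1)
        dist = np.sqrt(sq / np.maximum(n_shared, 1))
        dist[n_shared == 0] = np.inf  # rows with no overlapping columns cannot be compared
        for j in np.where(~obs[i])[0]:
            donors = np.where(obs[:, j])[0]  # only rows that actually observed column j
            nearest = donors[np.argsort(dist[donors])[:k]]
            X_out[i, j] = X[nearest, j].mean()
    return X_out


def svd_impute(X, rank=2, n_iter=100):
    """Iterative (hard-rank) SVD: start from column means, then repeatedly overwrite
    the missing entries with a rank-`rank` reconstruction of the filled matrix."""
    miss = np.isnan(X)
    X_fill = mean_impute(X)
    for _ in range(n_iter):
        mu = X_fill.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_fill - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        X_fill[miss] = low_rank[miss]
    return X_fill


def chained_impute(X, n_iter=10):
    """Chained equations, deterministic variant: regress each incomplete column on all
    other columns (ordinary least squares) and refill its missing entries, n_iter cycles."""
    miss = np.isnan(X)
    X_fill = mean_impute(X)
    for _ in range(n_iter):
        for j in np.where(miss.any(axis=0))[0]:
            A = np.column_stack([np.ones(len(X_fill)), np.delete(X_fill, j, axis=1)])
            rows = ~miss[:, j]
            coef, *_ = np.linalg.lstsq(A[rows], X_fill[rows, j], rcond=None)
            X_fill[miss[:, j], j] = A[miss[:, j]] @ coef
    return X_fill


def rmse_on_hidden(X_true, X_imp, hide):
    """RMSE between imputed and true values at the deliberately hidden positions."""
    r, c = hide[:, 0], hide[:, 1]
    return float(np.sqrt(np.mean((X_true[r, c] - X_imp[r, c]) ** 2)))


# Synthetic stand-in for a cleaned clinical table: 200 patients, 6 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X_true = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
X_masked, hide = mask_known_values(X_true, frac=0.2, rng=rng)

imputers = {"column mean": mean_impute, "kNN": knn_impute,
            "iterative SVD": svd_impute, "chained equations": chained_impute}
scores = {name: rmse_on_hidden(X_true, f(X_masked), hide) for name, f in imputers.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>17}: RMSE = {score:.3f}")
```

In practice the masking protocol would be repeated over several random masks before ranking the methods, and the chosen imputer would then feed the downstream classifier. Production code would more likely rely on library implementations such as scikit-learn's `KNNImputer` and `IterativeImputer` than on hand-written loops like these.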

3rd Department of Cardiology, National and Kapodistrian University of Athens, Athens, Greece

3rd Medical Department, Cardiology and Intensive Care Medicine, Wilhelminen Hospital, Vienna, Austria

Affiliated to the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

Anesthesia and Intensive Care, Fondazione Policlinico San Matteo Hospital IRCCS, Pavia, Italy

Berlin Institute of Health, Charité Universitätsmedizin Berlin, Berlin, Germany

Cardiac Intensive Care Unit, Institute for Cardiovascular Diseases of Vojvodina, Sremska Kamenica, Serbia

Cardiology and Arrhythmology Clinic, Marche Polytechnic University, University Hospital Umberto I Lancisi Salesi, Ancona, Italy

Cardiology Department, Emergency County Clinical Hospital of Oradea, Oradea, Romania

Clinic of Cardiac Surgery, National Institute of Cardiovascular Diseases, Bratislava, Slovakia

Department for Cardiac Surgery, Cardiac Regeneration Research, Medical University of Innsbruck, Innsbruck, Austria

Department of Acute Cardiology, National Institute of Cardiovascular Diseases, Bratislava, Slovakia

Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité, Campus Benjamin Franklin, Charité Universitätsmedizin Berlin, Berlin, Germany

Department of Cardiology, Faculty of Human Medicine, Zagazig University, Zagazig, Egypt

Department of Cardiology, Ibrahim Cardiac Hospital and Research Institute, Dhaka, Bangladesh

Department of Clinical, Surgical, Diagnostic and Paediatric Sciences, University of Pavia, Pavia, Italy

Department of Complex Systems, Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic

Department of Internal Medicine 2, Division of Cardiology, Medical University of Vienna, Vienna, Austria

Deutsches Zentrum für Herz-Kreislauf-Forschung e.V., Berlin, Germany

Duke Clinical Research Institute, Durham, NC, United States

Faculty of Medicine, Comenius University in Bratislava, Bratislava, Slovakia

Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia

Global Clinical Scholars Research Training Program, Harvard Medical School, Boston, MA, United States

Institute of Cardiovascular Sciences, University of Birmingham Medical School, Birmingham, United Kingdom

Institute of Medical Informatics, Charité Universitätsmedizin Berlin, Berlin, Germany

Premedix Academy, Bratislava, Slovakia

The Leviev Cardiothoracic and Vascular Center, Chaim Sheba Medical Center, Ramat Gan, Israel

Latest citing articles:

Machine learning-based scoring system to predict cardiogenic shock in acute coronary syndrome

2025 Mar; 6(2): 240-251. Epub 2025-01-06.
