Deep Learning Analysis of Polish Electronic Health Records for Diagnosis Prediction in Patients with Cardiovascular Diseases
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic
Document type: Journal article
Grant support:
- LM2018101: Ministry of Education of CR within the LINDAT-CLARIAH-CZ project
- project MUNI/IGA/1326/2021: Grant Agency of Masaryk University
- PCN-1-005/N/0/K and PCN-1-073/N/1/K: Medical University of Silesia in Poland
- Mieczysław Koćwin Foundation Scholarship
PubMed: 35743653
PubMed Central: PMC9225281
DOI: 10.3390/jpm12060869
PII: jpm12060869
- Keywords: Polish language, deep learning, diagnosis prediction, electronic health records, text analysis
- Publication type: Journal article (MeSH)
Electronic health records naturally hold most of their medical information as doctors' notes in unstructured or semi-structured text. Current deep learning approaches to text analysis allow researchers to uncover the underlying semantics of such text and even to identify hidden implications that can give doctors additional decision support. In this article, we present a new automated analysis of Polish hospitalization summary texts. The presented models predict the final diagnosis with almost 70% accuracy based solely on the patient's medical history (only 132 words on average). Accuracy increases when further sentences from the hospitalization results are added: even one sentence improves the results by 4%, and the best accuracy of 78% is achieved with five extra sentences. In addition to detailed descriptions of the data and methodology, we present an evaluation on more than 50,000 texts of Polish cardiology patients and a detailed error analysis of the approach. The results indicate that deep analysis of the medical history summary alone can suggest the direction of the diagnosis with high probability, and that this probability can be increased further simply by supplementing the records with additional examination results.
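The evaluation protocol described above (medical history alone, then history plus one to five sentences from the hospitalization results) can be pictured with a minimal sketch. This is not the authors' pipeline: the record layout, the naive regex sentence splitter, and the `predict` callable (a hypothetical stand-in for a trained diagnosis classifier) are all assumptions made for illustration, and the toy records and labels are invented test data, not study data.

```python
import re
from typing import Callable, Sequence

# Hypothetical record layout (an assumption): (medical_history, hospitalization_results, gold_diagnosis)
Record = tuple[str, str, str]


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    Real Polish clinical text with abbreviations would need a proper sentence tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def build_input(history: str, results: str, k: int) -> str:
    """Medical history followed by the first k sentences of the hospitalization results."""
    extra = split_sentences(results)[:k]
    return " ".join([history.strip(), *extra])


def accuracy_by_k(records: Sequence[Record],
                  predict: Callable[[str], str],
                  max_k: int = 5) -> dict[int, float]:
    """Accuracy of `predict` for k = 0 (history only) up to max_k appended sentences."""
    if not records:
        raise ValueError("records must not be empty")
    scores = {}
    for k in range(max_k + 1):
        correct = sum(predict(build_input(h, r, k)) == gold for h, r, gold in records)
        scores[k] = correct / len(records)
    return scores


if __name__ == "__main__":
    # Toy, invented records and a keyword rule standing in for a real model (hypothetical).
    toy = [
        ("Palpitations for two weeks.", "ECG showed atrial fibrillation. Echo normal.", "AF"),
        ("Chest pain on exertion.", "Coronary angiography performed. Stenosis of LAD.", "CAD"),
    ]
    rule = lambda text: "AF" if "fibrillation" in text else "CAD"
    print(accuracy_by_k(toy, rule, max_k=2))  # {0: 0.5, 1: 1.0, 2: 1.0}
```

Keeping `predict` as a plain callable separates the input construction (which sentences are appended) from the model itself, so the same loop could wrap any classifier that maps a text to a diagnosis label.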