Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques
Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic
Document type: journal article
Grant support
801553 / InfAct
European Commission
PubMed
34983651
PubMed Central
PMC8725299
DOI
10.1186/s13690-021-00770-6
PII: 10.1186/s13690-021-00770-6
- Keywords
- Artificial intelligence, Data linkage, Guidelines, Health indicators, Linked data, Machine learning techniques, Methodological guidelines, Population health research, Statistical techniques
- Publication type
- Journal Article (MeSH)
BACKGROUND: The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. Estimating health indicators from linked administrative data is challenging for several reasons: variability in data sources and data collection methods, which reduces interoperability at various levels and timeliness; the large number of variables available; and a lack of skills and capacity to link and analyze big data. The main objective of this study is to develop methodological guidelines for calculating population-based health indicators, to guide European countries in using linked data and/or machine learning (ML) techniques with new methods.
METHOD: We followed a systematic, step-wise approach to develop the methodological guidelines: (i) a review of the scientific literature, (ii) identification of inspiring examples from European countries, and (iii) development of a checklist of the guidelines' contents.
RESULTS: We have developed methodological guidelines that provide a systematic approach for studies using linked data and/or ML techniques to produce population-based health indicators. The guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations.
CONCLUSIONS: This is the first study to develop methodological guidelines for population health studies that use linked data and/or machine learning techniques. These guidelines would support researchers in adopting and developing a systematic approach for high-quality research methods. There is a need for high-quality research methodologies that make greater use of linked data and ML techniques, in order to develop a structured cross-disciplinary approach for improving population health information and thereby population health.
Bordeaux University Bordeaux School of Public Health Bordeaux France
Department of Non Communicable Diseases and Injuries Santé Publique France Saint Maurice France
Finnish Institute for Health and Welfare Helsinki Finland
INSERM INRIA SISTM team Bordeaux Population Health Bordeaux France
Institute of Biostatistics and Analyses Faculty of Medicine Masaryk University Brno Czech Republic
Institute of Health Information and Statistics of the Czech Republic Prague Czech Republic
Medical Information Department Bordeaux University Hospital Bordeaux France
National Institute for Public Health and the Environment Bilthoven The Netherlands
National Institute of Public Health Division of Health Informatics and Biostatistics Zagreb Croatia