Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques

2022 Jan 04; 80 (1): 9. [epub] 20220104

Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid34983651

Grant support
801553 / InfAct European Commission

Links

PubMed 34983651
PubMed Central PMC8725299
DOI 10.1186/s13690-021-00770-6
PII: 10.1186/s13690-021-00770-6
Knihovny.cz E-resources

BACKGROUND: The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. Estimating health indicators from linked administrative data is challenging for several reasons: variability in data sources and data collection methods, which reduces interoperability at various levels and timeliness; the availability of a large number of variables; and a lack of skills and capacity to link and analyze big data. The main objective of this study is to develop methodological guidelines for calculating population-based health indicators, to guide European countries in using linked data and/or machine learning (ML) techniques with new methods.

METHOD: We followed a systematic step-wise approach to develop the methodological guidelines: (i) a scientific literature review, (ii) identification of inspiring examples from European countries, and (iii) development of the checklist of guideline contents.

RESULTS: We have developed methodological guidelines that provide a systematic approach for studies using linked data and/or ML techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations.

CONCLUSIONS: This is the first study to develop methodological guidelines for population health studies that use linked data and/or machine learning techniques. These guidelines would support researchers in adopting and developing a systematic approach to high-quality research methods. High-quality research methodologies that make greater use of linked data and ML techniques are needed to develop a structured cross-disciplinary approach for improving population health information and, thereby, population health.

Erratum in

PubMed


