Enhanced metabolomic predictions using concept drift analysis: identification and correction of confounding factors
Status: PubMed-not-MEDLINE. Language: English. Country: United Kingdom, England. Medium: electronic-eCollection
Document type: journal article
PubMed: 40297776
PubMed Central: PMC12037104
DOI: 10.1093/bioadv/vbaf073
PII: vbaf073
MOTIVATION: The growing use of big data and optimized prediction methods in metabolomics calls for techniques that are consistent with biological assumptions in order to improve early symptom diagnosis. A major challenge in predictive data analysis is handling confounding factors, that is, variables that influence predictions but are not directly included in the analysis.
RESULTS: Detecting and correcting confounding factors improves prediction accuracy and reduces the false negatives that contribute to diagnostic errors. This study reviews concept drift detection methods for metabolomic predictions and selects the most appropriate ones. We introduce a new implementation of concept drift analysis in predictive classifiers using metabolomics data. Known confounding factors were confirmed, which validates our approach and aligns it with conventional methods. Additionally, we identified potential confounding factors that may influence biomarker analysis and could introduce bias and affect model performance.
AVAILABILITY AND IMPLEMENTATION: Based on biological assumptions supported by the detected concept drift, these confounding factors were incorporated into the correction of prediction algorithms to enhance their accuracy. The proposed methodology has been implemented in the Semi-Automated Pipeline using Concept Drift Analysis for improving Metabolomic Predictions (SAPCDAMP), an open-source workflow available at https://github.com/JanaSchwarzerova/SAPCDAMP.
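To make the idea of concept drift detection on a classifier concrete, the following is a minimal sketch of an error-rate monitor in the style of the classic Drift Detection Method (DDM). It is an illustration only and is not taken from SAPCDAMP: the function name `ddm_detect`, its parameters, and the synthetic error streams are all assumptions. The input is a stream of 0/1 values (1 means the classifier misclassified that sample), and drift is signalled once the running error rate rises well above its best observed level.

```python
import math


def ddm_detect(errors, min_samples=30, drift_level=3.0):
    """Return the index at which drift is first signalled, or None.

    `errors` is an iterable of 0/1 prediction errors in sample order.
    Hypothetical helper for illustration; not part of SAPCDAMP.
    """
    n = 0
    p = 0.0                      # running error rate
    p_min = math.inf             # error rate at the best point seen so far
    s_min = math.inf             # its standard deviation
    for i, err in enumerate(errors):
        n += 1
        p += (err - p) / n       # incremental mean of the error indicator
        s = math.sqrt(max(0.0, p * (1.0 - p)) / n)
        if n < min_samples:
            continue             # too few samples for a stable estimate
        if p + s < p_min + s_min:
            p_min, s_min = p, s  # remember the lowest p + s
        if p + s > p_min + drift_level * s_min:
            return i             # error rate is significantly worse
    return None


# Synthetic streams (assumed data): 10% errors, then 50% errors.
stable = [1 if i % 10 == 0 else 0 for i in range(300)]
drifted = [1 if j % 2 == 0 else 0 for j in range(300)]

no_drift = ddm_detect(stable)
drift_at = ddm_detect(stable + drifted)
```

In this sketch, a drift signal that coincides with a change in a sample attribute (for example, a switch between cohorts or batches in the sample ordering) would be a hint that the attribute acts as a confounding factor; how such a signal is then used for correction is left to the pipeline itself.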
Department of Physiology, Faculty of Medicine, Masaryk University, Brno 625 00, Czech Republic
Faculty of Informatics, Masaryk University, Brno 602 00, Czech Republic
Faculty of Medicine and Dentistry, Palacký University Olomouc, Olomouc 779 00, Czech Republic
Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava 845 05, Slovak Republic
Vienna Metabolomics Center, University of Vienna, Vienna 1010, Austria