Enhanced metabolomic predictions using concept drift analysis: identification and correction of confounding factors

2025;5(1):vbaf073. [epub] 20250404

Status PubMed-not-MEDLINE Language English Country Great Britain, England Media electronic-ecollection

Document type journal articles

Persistent link https://www.medvik.cz/link/pmid40297776

MOTIVATION: The increasing use of big data and optimized prediction methods in metabolomics requires techniques aligned with biological assumptions to improve early symptom diagnosis. One major challenge in predictive data analysis is handling confounding factors, that is, variables that influence predictions but are not directly included in the analysis. RESULTS: Detecting and correcting confounding factors enhances prediction accuracy, reducing false negatives that contribute to diagnostic errors. This study reviews concept drift detection methods in metabolomic predictions and selects the most appropriate ones. We introduce a new implementation of concept drift analysis in predictive classifiers using metabolomics data. Known confounding factors were confirmed, validating our approach and aligning it with conventional methods. Additionally, we identified potential confounding factors that may influence biomarker analysis, which could introduce bias and impact model performance. AVAILABILITY AND IMPLEMENTATION: Based on biological assumptions supported by detected concept drift, these confounding factors were incorporated into the correction of prediction algorithms to enhance their accuracy. The proposed methodology has been implemented in Semi-Automated Pipeline using Concept Drift Analysis for improving Metabolomic Predictions (SAPCDAMP), an open-source workflow available at https://github.com/JanaSchwarzerova/SAPCDAMP.
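The idea of detecting concept drift in a classifier's predictions can be illustrated with a minimal sketch. It is not the SAPCDAMP implementation; it is a simplified detector in the style of the Drift Detection Method (DDM), applied to a stream of 0/1 prediction errors. The function name `ddm_drift_index`, its parameters, and the synthetic error streams are assumptions made for illustration. The sketch assumes the samples were ordered by a suspected confounding factor before the errors were recorded, so that a sustained rise in error rate along that ordering hints that the factor may influence the predictions.

```python
import math


def ddm_drift_index(errors, min_samples=30, drift_sigma=3.0):
    """Return the index where drift is first flagged, or None.

    `errors` is an iterable of 0/1 values (1 = misclassified sample).
    Hypothetical helper: a simplified DDM-style rule, not SAPCDAMP code.
    """
    n = 0
    p = 0.0                 # running error rate
    p_min = float("inf")    # error rate at the best point seen so far
    s_min = float("inf")    # its standard deviation
    for i, err in enumerate(errors):
        n += 1
        p += (err - p) / n                  # incremental mean
        s = math.sqrt(p * (1.0 - p) / n)    # binomial standard deviation
        if n < min_samples:
            continue                        # wait for a stable estimate
        if p + s < p_min + s_min:
            p_min, s_min = p, s             # remember the best state
        if p + s > p_min + drift_sigma * s_min:
            return i                        # error rate rose significantly
    return None


# Synthetic, illustrative error streams (not real metabolomics results):
# a 10% error rate, then a 50% error rate after position 200.
stable = [1 if i % 10 == 0 else 0 for i in range(200)]
drifted = [1 if i % 2 == 0 else 0 for i in range(200)]

if __name__ == "__main__":
    print("stable stream:", ddm_drift_index(stable))
    print("stream with change:", ddm_drift_index(stable + drifted))
```

In this toy setting the detector stays silent on the stable stream and flags a position shortly after the error rate changes. With real data, a flagged position would only suggest where along the ordering variable the classifier starts to fail; whether that variable is a genuine confounding factor still has to be judged against biological assumptions, as the abstract describes.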


