This record comes from PubMed

Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data

2025 Jan; 20(1): 92-162. [epub] 20240920

Language English
Country Great Britain, England
Media print-electronic

Document type Journal Article

Grant support
EXC 2124 Deutsche Forschungsgemeinschaft (German Research Foundation)

Links

PubMed 39304763
DOI 10.1038/s41596-024-01046-3
PII: 10.1038/s41596-024-01046-3

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based non-targeted metabolomics data. While processing LC-MS/MS data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to handle and analyze complex data matrices effectively. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and the principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking (GNPS) and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
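As a rough illustration of the cleanup and normalization steps named in the abstract, the following Python sketch applies blank removal, zero imputation, total ion current (TIC) normalization and autoscaling to a toy feature table. It is not taken from the FBMN-STATS notebooks: the function name `clean_feature_table`, the `blank_cutoff` value of 0.3, the choice of the smallest non-zero intensity as the imputation value, and all intensities are assumptions made for this example only.

```python
import numpy as np


def clean_feature_table(samples, blanks, blank_cutoff=0.3):
    """Blank removal, zero imputation, TIC normalization and autoscaling.

    samples, blanks: 2D arrays (rows = injections, columns = features).
    blank_cutoff is an illustrative threshold, not a recommendation.
    """
    samples = np.asarray(samples, dtype=float)
    blanks = np.asarray(blanks, dtype=float)

    # 1. Blank removal: keep a feature only if its mean blank intensity
    #    is below blank_cutoff times its mean sample intensity.
    keep = blanks.mean(axis=0) < blank_cutoff * samples.mean(axis=0)
    kept = samples[:, keep]

    # 2. Imputation: replace zeros with the smallest non-zero intensity
    #    of the retained table (a simple stand-in for a detection limit).
    lod = kept[kept > 0].min()
    imputed = np.where(kept == 0, lod, kept)

    # 3. TIC normalization: scale each sample so its intensities sum to 1.
    normalized = imputed / imputed.sum(axis=1, keepdims=True)

    # 4. Autoscaling: center each feature and divide by its sample
    #    standard deviation (assumes no feature is constant).
    centered = normalized - normalized.mean(axis=0)
    scaled = centered / normalized.std(axis=0, ddof=1)
    return keep, normalized, scaled


# Toy data (hypothetical intensities): 3 samples and 2 blanks, 4 features.
samples = [[100, 0, 50, 10],
           [200, 20, 40, 12],
           [150, 10, 60, 11]]
blanks = [[1, 0, 45, 0],
          [2, 0, 55, 1]]

keep, normalized, scaled = clean_feature_table(samples, blanks)
print("features kept:", keep.tolist())
print("row sums after TIC normalization:", normalized.sum(axis=1))
```

In this toy table the third feature is as intense in the blanks as in the samples, so it is the one removed. The scaled matrix would then be the usual input for multivariate methods such as principal component analysis, while univariate tests are often run on the normalized, unscaled values.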

Applied Bioinformatics Department of Computer Science University of Tübingen Tübingen Germany

Bigelow Laboratory for Ocean Sciences East Boothbay ME USA

Bioinformatics Group Wageningen University and Research Wageningen the Netherlands

Boyce Thompson Institute and Department of Chemistry and Chemical Biology Cornell University Ithaca NY USA

Collaborative Mass Spectrometry Innovation Center Skaggs School of Pharmacy and Pharmaceutical Sciences University of California San Diego San Diego CA USA

Department of Analytical Chemistry University of Vienna Vienna Austria

Department of Biochemistry and Microbiology Rhodes University Makhanda South Africa

Department of Biochemistry University of California Riverside Riverside CA USA

Department of Biochemistry University of Johannesburg Johannesburg South Africa

Department of Bioinformatics University of Jena Jena Germany

Department of Biotechnology Engineering School of Lorena University of São Paulo Lorena São Paulo Brazil

Department of Chemistry and Biochemistry University of Denver Denver CO USA

Department of Computer Science University of California Riverside Riverside CA USA

Department of Ecology Behavior and Evolution University of California San Diego San Diego CA USA

Department of Environmental Science Aarhus University Roskilde Denmark

Department of Environmental Systems Analysis University of Tübingen Tübingen Germany

Department of Food Chemistry and Toxicology University of Vienna Vienna Austria

Department of Nutrition Exercise and Sports University of Copenhagen Frederiksberg C Denmark

German Center for Infection Research Partner Site Braunschweig Hannover Braunschweig Germany

Helmholtz Institute for Pharmaceutical Research Saarland Helmholtz Centre for Infection Research Saarbrücken Germany

Institute of Inorganic and Analytical Chemistry University of Münster Münster Germany

Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Prague Czech Republic

Laboratorio de Microbiología Molecular y Biotecnología Ambiental Centro de Biotecnología DAL Universidad Técnica Federico Santa María Valparaíso Chile

Leibniz Institute DSMZ German Collection of Microorganisms and Cell Cultures Braunschweig Germany

Leibniz Institute of Freshwater Ecology and Inland Fisheries Berlin Germany

National Center for Genetic Engineering and Biotechnology National Science and Technology Development Agency Thailand Science Park Pathum Thani Thailand

Saarland University Saarbrücken Germany

School of Marine Sciences Darling Marine Center University of Maine Walpole ME USA

Section for Clinical Mass Spectrometry Danish Center for Neonatal Screening Department of Congenital Disorders Statens Serum Institut Copenhagen S Denmark

Skaggs School of Pharmacy and Pharmaceutical Sciences University of California San Diego San Diego CA USA

The Novo Nordisk Foundation for Biosustainability Technical University of Denmark Kongens Lyngby Denmark

Universidad EAFIT Medellín Antioquia Colombia

University of Tübingen Interfaculty Institute of Microbiology and Infection Medicine Tübingen Germany

Virtual Multi Omics Laboratory The Internet Riverside CA USA

See more in PubMed

Vailati-Riboni, M., Palombo, V. & Loor, J. J. What are omics sciences? in Periparturient Diseases of Dairy Cows (ed. Ametaj, B.) Ch. 1 (Springer, 2017); https://doi.org/10.1007/978-3-319-43033-1_1 .

Patti, G. J., Yanes, O. & Siuzdak, G. Metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13, 263–269 (2012). PubMed DOI PMC

Dayalan, S., Xia, J., Spicer, R. A., Salek, R. & Roessner, U. Metabolome analysis. in Encyclopedia of Bioinformatics and Computational Biology (eds. Ranganathan, S., Gribskov, M., Nakai, K. & Schönbach, C.) 396–409 (Academic Press, 2019); https://doi.org/10.1016/B978-0-12-809633-8.20251-3 .

Tolstikov, V., Moser, A. J., Sarangarajan, R., Narain, N. R. & Kiebish, M. A. Current status of metabolomic biomarker discovery: impact of study design and demographic characteristics. Metabolites 10, 224 (2020). PubMed DOI PMC

de Jonge, N. F. et al. Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools. Metabolomics 18, 103 (2022). PubMed DOI PMC

Nothias, L.-F. et al. Feature-based molecular networking in the GNPS analysis environment. Nat. Methods 17, 905–908 (2020). PubMed DOI PMC

Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016). PubMed DOI PMC

Ottosson, F. et al. Effects of long-term storage on the biobanked neonatal dried blood spot metabolome. J. Am. Soc. Mass Spectrom. 34, 685–694 (2023). PubMed DOI PMC

Dantas Machado, A. C. et al. Portosystemic shunt placement reveals blood signatures for the development of hepatic encephalopathy through mass spectrometry. Nat. Commun. 14, 5303 (2023). PubMed DOI PMC

Xie, H.-F. et al. Feature-based molecular networking analysis of the metabolites produced by in vitro solid-state fermentation reveals pathways for the bioconversion of epigallocatechin gallate. J. Agric. Food Chem. 68, 7995–8007 (2020). PubMed DOI

Berlanga-Clavero, M. V. et al. Bacillus subtilis biofilm matrix components target seed oil bodies to promote growth and anti-fungal resistance in melon. Nat. Microbiol. 7, 1001–1015 (2022). PubMed DOI PMC

Raheem, D. J., Tawfike, A. F., Abdelmohsen, U. R., Edrada-Ebel, R. & Fitzsimmons-Thoss, V. Application of metabolomics and molecular networking in investigating the chemical profile and antitrypanosomal activity of British bluebells (Hyacinthoides non-scripta). Sci. Rep. 9, 2547 (2019). PubMed DOI PMC

Pendergraft, M. A. et al. Bacterial and chemical evidence of coastal water pollution from the Tijuana River in sea spray aerosol. Environ. Sci. Technol. 57, 4071–4081 (2023). PubMed DOI PMC

Petras, D. et al. Non-targeted tandem mass spectrometry enables the visualization of organic matter chemotype shifts in coastal seawater. Chemosphere 271, 129450 (2021). PubMed DOI PMC

Stincone, P. et al. Evaluation of data-dependent MS/MS acquisition parameters for non-targeted metabolomics and molecular networking of environmental samples: focus on the Q exactive platform. Anal. Chem. 95, 12673–12682 (2023). PubMed DOI PMC

Wegley Kelly, L. et al. Distinguishing the molecular diversity, nutrient content, and energetic potential of exometabolomes produced by macroalgae and reef-building corals. Proc. Natl Acad. Sci. Usa. 119, e2110283119 (2022). PubMed DOI PMC

Mannochio-Russo, H. et al. Microbiomes and metabolomes of dominant coral reef primary producers illustrate a potential role for immunolipids in marine symbioses. Commun. Biol. 6, 896 (2023). PubMed DOI PMC

Shaffer, J. P. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat. Microbiol. 7, 2128–2150 (2022). PubMed DOI PMC

Molina-Santiago, C. et al. Chemical interplay and complementary adaptative strategies toggle bacterial antagonism and co-existence. Cell Rep. 36, 109449 (2021). PubMed DOI PMC

Reher, R. et al. Native metabolomics identifies the rivulariapeptolide family of protease inhibitors. Nat. Commun. 13, 4619 (2022). PubMed DOI PMC

Aron, A. T. et al. Native mass spectrometry-based metabolomics identifies metal-binding compounds. Nat. Chem. 14, 100–109 (2022). PubMed DOI

Behnsen, J. et al. Siderophore-mediated zinc acquisition enhances enterobacterial colonization of the inflamed gut. Nat. Commun. 12, 7016 (2021). PubMed DOI PMC

Pang, Z. et al. MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Res. 49, W388–W396 (2021). PubMed DOI PMC

Pang, Z. et al. Using MetaboAnalyst 5.0 for LC–HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data. Nat. Protoc. 17, 1735–1761 (2022). PubMed DOI

Cajka, T. & Fiehn, O. Toward merging untargeted and targeted methods in mass spectrometry-based metabolomics and lipidomics. Anal. Chem. 88, 524–545 (2016). PubMed DOI

Alder, L., Greulich, K., Kempe, G. & Vieth, B. Residue analysis of 500 high priority pesticides: better by GC–MS or LC–MS/MS? Mass Spectrom. Rev. 25, 838–865 (2006). PubMed DOI

Díaz-Cruz, M. S., López de Alda, M. J., López, R. & Barceló, D. Determination of estrogens and progestogens by mass spectrometric techniques (GC/MS, LC/MS and LC/MS/MS). J. Mass Spectrom. 38, 917–923 (2003). PubMed DOI

Michely, J. A., Helfer, A. G., Brandt, S. D., Meyer, M. R. & Maurer, H. H. Metabolism of the new psychoactive substances N,N-diallyltryptamine (DALT) and 5-methoxy-DALT and their detectability in urine by GC–MS, LC–MSn, and LC–HR–MS–MS. Anal. Bioanal. Chem. 407, 7831–7842 (2015). PubMed DOI

Di Masi, S. et al. HPLC–MS/MS method applied to an untargeted metabolomics approach for the diagnosis of “olive quick decline syndrome”. Anal. Bioanal. Chem. 414, 465–473 (2022). PubMed DOI

Reveglia, P. et al. Untargeted and targeted LC–MS/MS based metabolomics study on in vitro culture of phaeoacremonium species. J. Fungi 8, 55 (2022). DOI

Baig, F., Pechlaner, R. & Mayr, M. Caveats of untargeted metabolomics for biomarker discovery∗. J. Am. Coll. Cardiol. 68, 1294–1296 (2016). PubMed DOI

Xiao, J. F., Zhou, B. & Ressom, H. W. Metabolite identification and quantitation in LC–MS/MS-based metabolomics. TrAC Trends Anal. Chem. 32, 1–14 (2012). DOI

Blaženović, I. et al. Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy. J. Cheminformatics 9, 32 (2017). DOI

Blaženović, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC–MS/MS data in metabolomics. Metabolites 8, 31 (2018). PubMed DOI PMC

Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015). PubMed DOI PMC

Böcker, S., Letzel, M. C., Lipták, Z. & Pervukhin, A. SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25, 218–224 (2009). PubMed DOI

Stravs, M. A., Dührkop, K., Böcker, S. & Zamboni, N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods 19, 865–870 (2022). PubMed DOI PMC

Aron, A. T. et al. Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat. Protoc. 15, 1954–1991 (2020). PubMed DOI

Schmid, R. et al. Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nat. Commun. 12, 3832 (2021). PubMed DOI PMC

Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 (2008). PubMed DOI PMC

Hulstaert, N. et al. ThermoRawFileParser: modular, scalable, and cross-platform RAW file conversion. J. Proteome Res. 19, 537–542 (2020). PubMed DOI

Adusumilli, R. & Mallick, P. Data conversion with ProteoWizard msConvert. Methods Mol. Biol. 1550, 339–368 (2017). PubMed DOI

Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 78, 779–787 (2006). PubMed DOI

Kuhl, C., Tautenhahn, R., Böttcher, C., Larson, T. R. & Neumann, S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 84, 283–289 (2012). PubMed DOI

Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 41, 447–449 (2023). PubMed DOI PMC

Tsugawa, H. et al. A lipidome atlas in MS-DIAL 4. Nat. Biotechnol. 38, 1159–1163 (2020). PubMed DOI

Pfeuffer, J. et al. OpenMS—a platform for reproducible analysis of mass spectrometry data. J. Biotechnol. 261, 142–148 (2017). PubMed DOI

Gloaguen, Y., Kirwan, J. A. & Beule, D. Deep learning-assisted peak curation for large-scale LC–MS metabolomics. Anal. Chem. 94, 4930–4937 (2022).

Chetnik, K., Petrick, L. & Pandey, G. MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC–MS metabolomics data. Metabolomics 16, 117 (2020). PubMed DOI PMC

El Abiead, Y., Milford, M., Salek, R. M. & Koellensperger, G. mzRAPP: a tool for reliability assessment of data pre-processing in non-targeted metabolomics. Bioinformatics 37, 3678–3680 (2021). PubMed DOI PMC

Heuckeroth, S., Damiani, T., Smirnov, A. et al. Reproducible mass spectrometry data processing and compound annotation in MZmine 3. Nat. Protoc. https://doi.org/10.1038/s41596-024-00996-y (2024).

Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis. Metabolomics 3, 211–221 (2007). PubMed DOI PMC

Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019). PubMed DOI

Dührkop, K. et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 39, 462–471 (2021). PubMed DOI

Liu, L.-L. et al. Molecular networking-based for the target discovery of potent antiproliferative polycyclic macrolactam ansamycins from Streptomyces cacaoi subsp. asoensis. Org. Chem. Front. 7, 4008–4018 (2020). DOI

Sedio, B. E., Boya P, C. A. & Rojas Echeverri, J. C. A protocol for high-throughput, untargeted forest community metabolomics using mass spectrometry molecular networks. Appl. Plant Sci. 6, e1033 (2018). PubMed DOI PMC

Quinn, R. A. et al. Molecular networking as a drug discovery, drug metabolism, and precision medicine strategy. Trends Pharmacol. Sci. 38, 143–154 (2017). PubMed DOI

Pluskal, T., Castillo, S., Villar-Briones, A. & Orešič, M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinforma. 11, 395 (2010). DOI

Nguyen, L. H. & Holmes, S. Ten quick tips for effective dimensionality reduction. PLOS Comput. Biol. 15, e1006907 (2019). PubMed DOI PMC

GOWER, J. C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966). DOI

Xu, Y. et al. Application of dissimilarity indices, principal coordinates analysis, and rank tests to peak tables in metabolomics of the gas chromatography/mass spectrometry of human sweat. Anal. Chem. 79, 5633–5641 (2007). PubMed DOI

Tian, M. et al. Pure ion chromatograms combined with advanced machine learning methods improve accuracy of discriminant models in LC–MS-based untargeted metabolomics. Molecules 26, 2715 (2021). PubMed DOI PMC

Cacciatore, S., Tenori, L., Luchinat, C., Bennett, P. R. & MacIntyre, D. A. KODAMA: an R package for knowledge discovery and data mining. Bioinformatics 33, 621–623 (2017). PubMed DOI

Paliy, O. & Shankar, V. Application of multivariate statistical techniques in microbial ecology. Mol. Ecol. 25, 1032–1057 (2016). PubMed DOI PMC

Efron, B. Bootstrap methods: another look at the jackknife. in Breakthroughs in Statistics: Methodology and Distribution (eds. Kotz, S. & Johnson, N. L.) 569–593 (Springer, 1992); https://doi.org/10.1007/978-1-4612-4380-9_41 .

Desu, M. M. & Raghavarao, D. Nonparametric Statistical Methods For Complete and Censored Data. (CRC Press, 2003).

Xia, Y. & Sun, J. Hypothesis testing and statistical analysis of microbiome. Genes Dis. 4, 138–148 (2017). PubMed DOI PMC

Anderson, M. J. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32–46 (2001).

Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminformatics 8, 61 (2016). DOI

Kim, H. W. et al. NPClassifier: a deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021). PubMed DOI PMC

Tibshirani, R., Walther, G. & Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 63, 411–423 (2001). DOI

Benton, P. H. et al. An interactive cluster heat map to visualize and explore multidimensional metabolomic data. Metabolomics. J. Metabolomic Soc. 11, 1029–1034 (2015).

Ren, S., Hinzman, A. A., Kang, E. L., Szczesniak, R. D. & Lu, L. J. Computational and statistical analysis of metabolomics data. Metabolomics 11, 1492–1513 (2015). DOI

Liebal, U. W., Phan, A. N. T., Sudhakar, M., Raman, K. & Blank, L. M. Machine learning applications for mass spectrometry-based metabolomics. Metabolites 10, 243 (2020). PubMed DOI PMC

Gromski, P. S. et al. A tutorial review: metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Anal. Chim. Acta 879, 10–23 (2015). PubMed DOI

Mendez, K. M., Reinke, S. N. & Broadhurst, D. I. A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics 15, 150 (2019). PubMed DOI PMC

Jafari, M. & Ansari-Pour, N. Why, when and how to adjust your P values? Cell J. Yakhteh 20, 604–607 (2019).

Korthauer, K. et al. A practical guide to methods controlling false discoveries in computational biology. Genome Biol. 20, 118 (2019). PubMed DOI PMC

Mishra, P. et al. Descriptive statistics and normality tests for statistical data. Ann. Card. Anaesth. 22, 67–72 (2019). PubMed DOI PMC

Neuhaus, G. F. et al. Environmental metabolomics characterization of modern stromatolites and annotation of ibhayipeptolides. PLoS ONE 19, e0303273 (2024). PubMed DOI PMC

Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019). PubMed DOI PMC

Moseley, H. N. B. Error analysis and propagation in metabolomics data analysis. Comput. Struct. Biotechnol. J. 4, e201301006 (2013). PubMed DOI PMC

Di Guida, R. et al. Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling. Metabolomics 12, 93 (2016). PubMed DOI PMC

Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016). PubMed DOI PMC

Hoffmann, M. A. et al. High-confidence structural annotation of metabolites absent from spectral libraries. Nat. Biotechnol. 40, 411–421 (2022). PubMed DOI

Rinker, T. & Kurkiewicz, D. pacman: package management for R, version 0.5.0. https://github.com/trinker/pacman (2018).

Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019). DOI

Kluyver, T., Angerer, P. & Schulz, J. IRdisplay: ‘Jupyter’ display machinery. (2022).

Cacciatore, S., Luchinat, C. & Tenori, L. Knowledge discovery by accuracy maximization. Proc. Natl Acad. Sci. USA 111, 5117–5122 (2014). PubMed DOI PMC

Kassambara, A. & Mundt, F. Factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.7. https://CRAN.R-project.org/package=factoextra (2020).

Oksanen, J. et al. vegan: community ecology package. R package version 2.6-4. https://doi.org/10.32614/CRAN.package.vegan (2024).

Gu, Z. Complex heatmap visualization. iMeta 1, e43 (2022). PubMed DOI PMC

Galili, T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinforma. Oxf. Engl. 31, 3718–3720 (2015). DOI

Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014). DOI

Archer, E. rfPermute: estimate permutation P values for random forest importance metrics. R package version 2.5.1. CRAN https://doi.org/10.32614/CRAN.package.rfPermute (2023).

Ogle, D. H., Doll, J. C., Wheeler, A. P. & Dinno, A. FSA: simple fisheries stock assessment methods. R package version 0.9.4. CRAN https://fishr-core-team.github.io/FSA/ ; https://doi.org/10.32614/CRAN.package.FSA (2023).

Bengtsson, H. et al. matrixStats: functions that apply to rows and columns of matrices (and to vectors). R package version 0.63.0. CRAN https://doi.org/10.32614/CRAN.package.matrixStats (2023).

Xiao, N., Cook, J., Jégousse, C., Chen, H. & Li, M. ggsci: scientific journal and sci-fi themed color palettes for ‘ggplot2’. R package version 3.0. CRAN https://doi.org/10.32614/CRAN.package.ggsci (2023).

Wilke, C. O. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. R package version 1.1.1. CRAN https://doi.org/10.32614/CRAN.package.cowplot (2020).

Wickham, H. et al. svglite: an ‘SVG’ graphics device. R package version 2.1.1. CRAN https://doi.org/10.32614/CRAN.package.svglite (2023).

Reese, S. E. et al. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics 29, 2877–2883 (2013). PubMed DOI PMC

Burton, L. et al. Instrumental and experimental effects in LC–MS-based metabolomics. J. Chromatogr. B 871, 227–235 (2008). DOI

Gregori, J. et al. Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. J. Proteom. 75, 3938–3951 (2012). DOI

Thonusin, C. et al. Evaluation of intensity drift correction strategies using MetaboDrift, a normalization tool for multi-batch metabolomics data. J. Chromatogr. A 1523, 265–274 (2017). PubMed DOI PMC

Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007). PubMed DOI

Deng, K. et al. WaveICA: a novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis. Anal. Chim. Acta 1061, 60–69 (2019). PubMed DOI

Wehrens, R. et al. Improved batch correction in untargeted MS-based metabolomics. Metabolomics 12, 88 (2016). PubMed DOI PMC

Dunn, W. B. et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 6, 1060–1083 (2011). PubMed DOI

Kuligowski, J., Sánchez-Illana, Á., Sanjuán-Herráez, D., Vento, M. & Quintás, G. Intra-batch effect correction in liquid chromatography-mass spectrometry using quality control samples and support vector regression (QC-SVRC). Analyst 140, 7810–7817 (2015). PubMed DOI

Luan, H., Ji, F., Chen, Y. & Cai, Z. statTarget: a streamlined tool for signal drift correction and interpretations of quantitative mass spectrometry-based omics data. Anal. Chim. Acta 1036, 66–72 (2018). PubMed DOI

Rong, Z. et al. NormAE: deep adversarial learning model to remove batch effects in liquid chromatography mass spectrometry-based metabolomics data. Anal. Chem. 92, 5082–5090 (2020). PubMed DOI

Dmitrenko, A., Reid, M. & Zamboni, N. Regularized adversarial learning for normalization of multi-batch untargeted metabolomics data. Bioinformatics 39, btad096 (2023). PubMed DOI PMC

Tokareva, A. O. et al. Normalization methods for reducing interbatch effect without quality control samples in liquid chromatography-mass spectrometry-based studies. Anal. Bioanal. Chem. 413, 3479–3486 (2021). PubMed DOI

Liu, Q. et al. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Sci. Rep. 10, 13856 (2020). PubMed DOI PMC

Cleary, J. L., Luu, G. T., Pierce, E. C., Dutton, R. J. & Sanchez, L. M. BLANKA: an algorithm for blank subtraction in mass spectrometry of complex biological samples. J. Am. Soc. Mass Spectrom. 30, 1426–1434 (2019). PubMed DOI PMC

Gorrochategui, E., Jaumot, J., Lacorte, S. & Tauler, R. Data analysis strategies for targeted and untargeted LC–MS metabolomic studies: overview and workflow. TrAC Trends Anal. Chem. 82, 425–442 (2016). DOI

Wulff, J. E. & Mitchell, M. W. A comparison of various normalization methods for LC/MS metabolomics data. Adv. Biosci. Biotechnol. 9, 339–351 (2018). DOI

Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic Quotient normalization as robust method to account for dilution of complex biological mixtures. application in 1H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006). PubMed DOI

van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K. & van der Werf, M. J. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 7, 142 (2006). PubMed DOI PMC

Morgan, M. & Ramos, M. BiocManager: access the bioconductor project package repository. (2023).

Anderson, M. J. & Walsh, D. C. I. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: what null hypothesis are you testing? Ecol. Monogr. 83, 557–574 (2013). DOI

Wilkinson, L. & Friendly, M. The history of the cluster heat map. Am. Stat. 63, 179–184 (2009). DOI

Wu, W. & Noble, W. S. Genomic data visualization on the Web. Bioinformatics 20, 1804–1805 (2004). PubMed DOI

Griffiths, E. T. et al. Detection and classification of narrow-band high frequency echolocation clicks from drifting recorders. J. Acoust. Soc. Am. 147, 3511–3522 (2020). PubMed DOI

Liu, S. et al. Comammox biogeography subject to anthropogenic interferences along a high-altitude river. Water Res. 226, 119225 (2022). PubMed DOI

Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001). DOI

Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002); https://journal.r-project.org/articles/RN-2002-022/RN-2002-022.pdf .

Robinson, D. et al. broom: convert statistical objects into tidy tibbles. CRAN https://doi.org/10.32614/CRAN.package.broom (2023).

Vinaixa, M. et al. A Guideline to univariate statistical analysis for LC/MS-based untargeted metabolomics-derived data. Metabolites 2, 775–795 (2012). PubMed DOI PMC

Ostertagová, E., Ostertag, O. & Kováč, J. Methodology and application of the Kruskal–Wallis test. Appl. Mech. Mater. 611, 115–120 (2014). DOI

Davidson, R. L., Weber, R. J. M., Liu, H., Sharma-Oates, A. & Viant, M. R. Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data. GigaScience 5, 10 (2016). PubMed DOI PMC

Giacomoni, F. et al. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics. Bioinformatics 31, 1493–1495 (2015). PubMed DOI

Kontou, E. E. et al. UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis. J. Cheminformatics 15, 52 (2023). DOI

Rohart, F., Gautier, B., Singh, A. & Lê Cao, K.-A. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017). PubMed DOI PMC

Chong, J. & Xia, J. MetaboAnalystR: an R package for flexible and reproducible analysis of metabolomics data. Bioinformatics 34, 4313–4314 (2018). PubMed DOI PMC

Pang, Z. & Xia, J. LC–MS/MS raw spectral data processing. https://www.metaboanalyst.ca/resources/vignettes/LCMSMS_Raw_Spectral_Processing.html (2024).

Tiffany, C. R. & Bäumler, A. J. omu, a metabolomics count data analysis tool for intuitive figures and convenient metadata collection. Microbiol. Resour. Announc. 8, e00129-19 (2019). PubMed DOI PMC

Han, X. & Liang, L. metabolomicsR: a streamlined workflow to analyze metabolomic data in R. Bioinforma. Adv. 2, vbac067 (2022). DOI

Fernández-Albert, F., Llorach, R., Andrés-Lacueva, C. & Perera, A. An R package to analyse LC/MS metabolomic data: MAIT (metabolite automatic identification toolkit). Bioinformatics 30, 1937–1939 (2014). PubMed DOI PMC

Thévenot, E. A., Roux, A., Xu, Y., Ezan, E. & Junot, C. Analysis of the human adult urinary metabolome variations with age, body mass index, and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. J. Proteome Res. 14, 3322–3335 (2015). PubMed DOI

Kohler, D. et al. MSstats version 4.0: statistical analyses of quantitative mass spectrometry-based proteomic experiments with chromatography-based quantification at scale. J. Proteome Res. 22, 1466–1482 (2023). PubMed DOI PMC

Riquelme, G., Zabalegui, N., Marchi, P., Jones, C. M. & Monge, M. E. A python-based pipeline for preprocessing LC–MS data for untargeted metabolomics workflows. Metabolites 10, 416 (2020). PubMed DOI PMC

Ivanisevic, J. & Want, E. J. From samples to insights into metabolism: uncovering biologically relevant information in LC–HRMS metabolomics data. Metabolites 9, 308 (2019). PubMed DOI PMC

Silva, A. M., Cordeiro-da-Silva, A. & Coombs, G. H. Metabolic variation during development in culture of Leishmania donovani promastigotes. PLoS Negl. Trop. Dis. 5, e1451 (2011). PubMed DOI PMC

Martínez-Sena, T. et al. Monitoring of system conditioning after blank injections in untargeted UPLC–MS metabolomic analysis. Sci. Rep. 9, 9822 (2019). PubMed DOI PMC

Raynie, D. The vital role of blanks in sample preparation. LCGC N. Am. 36, 494–497 (2018).

Yue, Y., Bao, X., Jiang, J. & Li, J. Evaluation and correction of injection order effects in LC–MS/MS based targeted metabolomics. J. Chromatogr. B 1212, 123513 (2022). DOI

Livera, A. M. D. et al. Statistical methods for handling unwanted variation in metabolomics data. Anal. Chem. 87, 3606–3615 (2015). PubMed DOI PMC

Broadhurst, D. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). PubMed DOI PMC

Lawson, T. N. et al. msPurity: automated evaluation of precursor ion purity for mass spectrometry-based fragmentation in metabolomics. Anal. Chem. 89, 2432–2439 (2017). PubMed DOI

Schiffman, C. et al. Filtering procedures for untargeted LC–MS metabolomics data. BMC Bioinforma. 20, 334 (2019). DOI

Carobene, A., Braga, F., Roraas, T., Sandberg, S. & Bartlett, W. A. A systematic review of data on biological variation for alanine aminotransferase, aspartate aminotransferase and γ-glutamyl transferase. Clin. Chem. Lab. Med. CCLM 51, 1997–2007 (2013). PubMed DOI

Wei, R. et al. Missing value imputation approach for mass spectrometry-based metabolomics data. Sci. Rep. 8, 663 (2018). PubMed DOI PMC

Do, K. T. et al. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics 14, 128 (2018). PubMed DOI PMC

Li, B. et al. Performance evaluation and online realization of data-driven normalization methods used in LC/MS based untargeted metabolomics analysis. Sci. Rep. 6, 38881 (2016). PubMed DOI PMC

Scholz, M., Gatzek, S., Sterling, A., Fiehn, O. & Selbig, J. Metabolite fingerprinting: detecting biological features by independent component analysis. Bioinformatics 20, 2447–2454 (2004). PubMed DOI

Deininger, S.-O. et al. Normalization in MALDI-TOF imaging datasets of proteins: practical considerations. Anal. Bioanal. Chem. 401, 167–181 (2011). PubMed DOI PMC

Qannari, E. M., Wakeling, I., Courcoux, P. & MacFie, H. J. H. Defining the underlying sensory dimensions. Food Qual. Prefer. 11, 151–154 (2000). DOI

Kvalheim, O. M. Scaling of analytical data. Anal. Chim. Acta 177, 71–79 (1985). DOI

Kasprzak, E. M. & Lewis, K. E. Pareto analysis in multiobjective optimization using the collinearity theorem and scaling method. Struct. Multidiscip. Optim. 22, 208–218 (2001). DOI

Keenan, M. R. & Kotula, P. G. Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images. Surf. Interface Anal. 36, 203–212 (2004). DOI

Jäggi, C., Wirth, T. & Baur, B. Genetic variability in subpopulations of the asp viper (Vipera aspis) in the Swiss Jura mountains: implications for a conservation strategy. Biol. Conserv. 94, 69–77 (2000). DOI

Pinheiro, H. P., de Souza Pinheiro, A. & Sen, P. K. Comparison of genomic sequences using the Hamming distance. J. Stat. Plan. Inference 130, 325–339 (2005). DOI

Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71, 8228–8235 (2005). PubMed DOI PMC

Brejnrod, A. et al. Implementations of the chemical structural and compositional similarity metric in R and Python. Preprint at bioRxiv https://doi.org/10.1101/546150 (2019).

Tripathi, A. et al. Chemically informed analyses of metabolomics mass spectrometry data with Qemistree. Nat. Chem. Biol. 17, 146–151 (2021). PubMed DOI

Ramette, A. Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62, 142–160 (2007). PubMed DOI

Koenig, J. E. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl Acad. Sci. 108, 4578–4585 (2011). PubMed DOI

Archer, F. I., Martien, K. K. & Taylor, B. L. Diagnosability of mtDNA with random forests: using sequence data to delimit subspecies. Mar. Mammal. Sci. 33, 101–131 (2017). DOI

Breiman, L. Out-of-bag estimation. Technical report 1-13 (Statistics Department, University of California Berkeley, 1996); https://www.stat.berkeley.edu/pub/users/breiman/OOBestimation.pdf .

Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T. & Zeileis, A. Conditional variable importance for random forests. BMC Bioinforma. 9, 307 (2008). DOI

Archer, K. J. & Kimes, R. V. Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 52, 2249–2260 (2008). DOI

Riffenburgh, R. H. & Gillen, D. L. Statistics in Medicine (Academic Press, 2020).

Sato, T. Type I and type II error in multiple comparisons. J. Psychol. 130, 293–302 (1996). DOI

Bathke, A. The ANOVA F test can still be used in some balanced designs with unequal variances and nonnormal data. J. Stat. Plan. Inference 126, 413–422 (2004). DOI

Abdi, H. & Williams, L. Newman–Keuls test and Tukey test. Encycl. Res. Des. (2010).

Hecke, T. V. Power study of anova versus Kruskal–Wallis test. J. Stat. Manag. Syst. 15, 241–247 (2012).

Dinno, A. Nonparametric pairwise multiple comparisons in independent groups using Dunn’s test. Stata J. Promot. Commun. Stat. Stata 15, 292–300 (2015). DOI
