Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data

2025 Sep 30; 16(1): 8714. [epub] 20250930

Language: English; Country: United Kingdom, England; Medium: electronic

Document type: journal articles, reviews

Persistent link   https://www.medvik.cz/link/pmid41027880
Links

PubMed 41027880
PubMed Central PMC12485147
DOI 10.1038/s41467-025-63751-1
PII: 10.1038/s41467-025-63751-1
Knihovny.cz E-resources

Mass spectrometry-based lipidomics and metabolomics generate extensive data sets that, along with metadata such as clinical parameters, require specific data exploration skills to identify and visualize statistically significant trends and biologically relevant differences. Besides tailored methods developed by individual labs, a solid core of freely accessible tools exists for exploratory data analysis and visualization. We have compiled these tools here, covering the preparation of descriptive statistics, annotated box plots, hypothesis testing, volcano plots, lipid maps and fatty acyl chain plots, unsupervised and supervised dimensionality reduction, dendrograms, and heat maps. This review is intended for those who would like to develop their skills in data analysis and visualization using freely available R or Python solutions. Beginners are guided through a selection of R and Python libraries for producing publication-ready graphics without being overwhelmed by code complexity. This manuscript, along with the associated GitBook code repository containing step-by-step instructions, offers readers a comprehensive guide, encouraging the application of R and Python for robust and reproducible chemometric analysis of omics data.
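As a rough illustration of what the hypothesis testing and volcano plot steps named in the abstract involve, the following minimal Python sketch computes the two axes of a volcano plot: the log2 fold change between two groups and the negative log10 of a Benjamini-Hochberg adjusted p-value. It is not taken from the review or its GitBook repository. The function names, the lipid names, the concentrations, the raw p-values, and the cutoffs (`fc_cutoff`, `alpha`) are all hypothetical, and only the Python standard library is used, so the raw p-values are assumed to come from a separate test.

```python
import math
from statistics import mean


def log2_fold_change(case, control):
    """log2 of the ratio of group means (positive = higher in cases)."""
    return math.log2(mean(case) / mean(control))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_table(features, fc_cutoff=1.0, alpha=0.05):
    """features: dict of name -> (case values, control values, raw p-value).

    Returns rows of (name, log2 fold change, -log10 adjusted p, is_hit).
    The cutoffs are illustrative assumptions, not recommendations.
    """
    names = list(features)
    q_values = benjamini_hochberg([features[n][2] for n in names])
    rows = []
    for name, q_value in zip(names, q_values):
        case, control, _ = features[name]
        lfc = log2_fold_change(case, control)
        is_hit = abs(lfc) >= fc_cutoff and q_value <= alpha
        rows.append((name, lfc, -math.log10(q_value), is_hit))
    return rows


# Hypothetical concentrations and raw p-values, for illustration only.
example = {
    "lipid_A": ([8.0, 8.4, 7.6], [2.0, 2.1, 1.9], 0.001),
    "lipid_B": ([5.0, 5.2, 4.8], [5.1, 4.9, 5.0], 0.80),
    "lipid_C": ([1.0, 1.1, 0.9], [4.0, 4.2, 3.8], 0.004),
}
table = volcano_table(example)
```

Each row of `table` gives one point of the plot: the fold change goes on the x-axis and the transformed adjusted p-value on the y-axis, with the final flag marking features that pass both cutoffs. In practice the raw p-values would come from a statistical test chosen to suit the data, and plotting would be handled by a graphics library.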

Department of Analytical Chemistry Faculty of Chemical Technology University of Pardubice Pardubice Czechia

Department of Biomedical Engineering Faculty of Electrical Engineering and Communication Brno University of Technology Brno Czechia

Department of Mathematical Analysis and Applications of Mathematics Faculty of Science Palacký University Olomouc Olomouc Czech Republic

Department of Medical Biochemistry Oslo University Hospital Oslo Norway

Department of Molecular and Clinical Pathology and Medical Genetics University Hospital Ostrava Ostrava Czechia

Department of Oncology KU Leuven Leuven Flanders Belgium

Institute of Experimental Endocrinology Biomedical Research Center Slovak Academy of Sciences Bratislava Slovakia

Institute of Neuroimmunology Slovak Academy of Sciences Bratislava Slovakia

Laboratory for Inherited Metabolic Disorders Department of Clinical Biochemistry University Hospital Olomouc and Faculty of Medicine and Dentistry Palacký University Olomouc Olomouc Czechia

Laboratory of Applied Mass Spectrometry Department of Cellular and Molecular Medicine KU Leuven Leuven Flanders Belgium

Laboratory of Integrative Cancer Genomics VIB KU Leuven Center for Cancer Biology Leuven Flanders Belgium

Laboratory of Lipid Metabolism and Cancer Department of Oncology Leuven Cancer Institute KU Leuven Leuven Flanders Belgium

Laboratory of Multi Omic Integrative Bioinformatics Department of Human Genetics KU Leuven Leuven Flanders Belgium

Metabolomics Core Facility VIB KU Leuven Center for Cancer Biology Leuven Flanders Belgium

Molecular Systems Biology Department of Functional and Evolutionary Ecology Faculty of Life Sciences University of Vienna Vienna Austria

South Australian Health and Medical Research Institute North Terrace Adelaide SA Australia

South Australian immunoGENomics Cancer Institute and Freemasons Centre for Male Health and Well Being The University of Adelaide Medical School North Terrace Adelaide SA Australia

VIB Center for AI and Computational Biology Leuven Flanders Belgium

Vienna Metabolomics Center University of Vienna Vienna Austria
