Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data
Language: English; Country: England, United Kingdom; Medium: electronic
Document type: journal articles, reviews
PubMed: 41027880
PubMed Central: PMC12485147
DOI: 10.1038/s41467-025-63751-1
PII: 10.1038/s41467-025-63751-1
- MeSH
- mass spectrometry MeSH
- humans MeSH
- lipidomics * methods MeSH
- metabolomics * methods MeSH
- programming languages MeSH
- software * MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- review MeSH
Mass spectrometry-based lipidomics and metabolomics generate extensive data sets that, along with metadata such as clinical parameters, require specific data exploration skills to identify and visualize statistically significant trends and biologically relevant differences. Besides tailored methods developed by individual labs, a solid core of freely accessible tools exists for exploratory data analysis and visualization. We have compiled these tools here, covering descriptive statistics, annotated box plots, hypothesis testing, volcano plots, lipid maps and fatty acyl chain plots, unsupervised and supervised dimensionality reduction, dendrograms, and heat maps. This review is intended for those who would like to develop their skills in data analysis and visualization using freely available R or Python solutions. Beginners are guided through a selection of R and Python libraries for producing publication-ready graphics without being overwhelmed by code complexity. This manuscript, along with the associated GitBook code repository containing step-by-step instructions, offers readers a comprehensive guide, encouraging the application of R and Python for robust and reproducible chemometric analysis of omics data.
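As a small illustration of the first items named in the abstract (descriptive statistics, and the log2 fold change that typically forms the x-axis of a volcano plot), the following Python sketch uses only the standard library. It is not taken from the article or its GitBook repository: the lipid names, group labels, values, and the helper functions `describe` and `log2_fold_change` are all invented for illustration.

```python
import math
import statistics

# Hypothetical toy data (arbitrary units); names and values are invented
# for illustration and do not come from the article.
data = {
    "PC 34:1": {"control": [10.0, 12.0, 11.0, 13.0],
                "case": [20.0, 24.0, 22.0, 26.0]},
    "TG 52:2": {"control": [8.0, 8.0, 8.0, 8.0],
                "case": [4.0, 4.0, 4.0, 4.0]},
}


def describe(values):
    """Return mean, median, sample SD, and coefficient of variation (%)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }


def log2_fold_change(case, control):
    """log2 of the ratio of group means (case over control)."""
    return math.log2(statistics.mean(case) / statistics.mean(control))


# Per-lipid summary: descriptive statistics for each group plus the fold change.
summary = {}
for lipid, groups in data.items():
    summary[lipid] = {
        "control": describe(groups["control"]),
        "case": describe(groups["case"]),
        "log2_fc": log2_fold_change(groups["case"], groups["control"]),
    }

for lipid, stats in summary.items():
    print(f"{lipid}: log2 FC = {stats['log2_fc']:+.2f}, "
          f"control CV = {stats['control']['cv_percent']:.1f}%")
```

In a real analysis, the fold change would be paired with a p-value from a suitable hypothesis test (adjusted for multiple comparisons) before drawing a volcano plot; that step is omitted here to keep the sketch dependency-free.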
Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway
Department of Oncology, KU Leuven, Leuven, Flanders, Belgium
Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
Metabolomics Core Facility, VIB KU Leuven Center for Cancer Biology, Leuven, Flanders, Belgium
South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA, Australia
VIB Center for AI and Computational Biology, Leuven, Flanders, Belgium
Vienna Metabolomics Center, University of Vienna, Vienna, Austria