Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data
Language: English; Country: United Kingdom, England; Medium: electronic
Document type: journal articles, reviews
PubMed: 41027880
PubMed Central: PMC12485147
DOI: 10.1038/s41467-025-63751-1
PII: 10.1038/s41467-025-63751-1
- MeSH
- mass spectrometry MeSH
- humans MeSH
- lipidomics * methods MeSH
- metabolomics * methods MeSH
- programming languages MeSH
- software * MeSH
- Check Tag
- humans MeSH
- Publication type
- journal articles MeSH
- reviews MeSH
Mass spectrometry-based lipidomics and metabolomics generate extensive data sets that, along with metadata such as clinical parameters, require specific data exploration skills to identify and visualize statistically significant trends and biologically relevant differences. Besides tailored methods developed by individual labs, a solid core of freely accessible tools exists for exploratory data analysis and visualization, which we have compiled here, including preparation of descriptive statistics, annotated box plots, hypothesis testing, volcano plots, lipid maps and fatty acyl chain plots, unsupervised and supervised dimensionality reduction, dendrograms, and heat maps. This review is intended for those who would like to develop their skills in data analysis and visualization using freely available R or Python solutions. Beginners are guided through a selection of R and Python libraries for producing publication-ready graphics without being overwhelmed by code complexity. This manuscript, along with the associated GitBook code repository containing step-by-step instructions, offers readers a comprehensive guide, encouraging the application of R and Python for robust and reproducible chemometric analysis of omics data.
Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway
Department of Oncology, KU Leuven, Leuven, Flanders, Belgium
Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
Metabolomics Core Facility, VIB KU Leuven Center for Cancer Biology, Leuven, Flanders, Belgium
South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA, Australia
VIB Center for AI and Computational Biology, Leuven, Flanders, Belgium
Vienna Metabolomics Center, University of Vienna, Vienna, Austria