Orthonormal pairwise logratio selection (OPALS) algorithm for compositional data analysis in high dimensions
Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic-ecollection
Document type: journal articles
PubMed
41293320
PubMed Central
PMC12641611
DOI
10.1093/bioadv/vbaf229
PII: vbaf229
- Publication type
- journal articles (MeSH)
SUMMARY: In compositional data analysis, the most fundamental information is conveyed by the pairwise logratios between components. Logratio coordinate representations such as balances and pivot coordinates are widely used to aggregate this information into higher-level relationships, but in some cases a fine-grained representation using all pairwise logratios is advantageous. Achieving this within an orthonormal (or orthogonal) logratio coordinate framework is particularly challenging for high-dimensional compositions, since a composition with D parts yields D(D - 1)/2 pairwise logratios (excluding reciprocals). This work presents an efficient algorithm (OPALS), based on Latin squares theory, that obtains all orthonormal pairwise logratios from just D - 1 logratio coordinate systems. This notably alleviates the computational burden of using such a representation for data analysis and modelling in high dimensions and, in some cases, makes the analysis feasible at all. Moreover, the relationship between estimates from orthonormal pairwise logratios and ordinary pivot coordinates is discussed in the context of regression and classification analysis.
AVAILABILITY AND IMPLEMENTATION: The OPALS algorithm is described in detail in the article and can be implemented directly from the methodology provided. The performance and properties of the method are illustrated through two examples using contemporary molecular biology data.
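The counting argument in the summary can be illustrated with a small sketch. This is not the OPALS algorithm, which this record does not reproduce. It assumes an even number of parts D and uses the standard round-robin (circle) scheduling, a construction related to symmetric Latin squares, to split all D(D - 1)/2 pairs into D - 1 rounds of D/2 disjoint pairs. Logratios on disjoint pairs of parts are mutually orthogonal, so each round could in principle belong to a single orthonormal coordinate system. The function names and the scaling helper are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def round_robin_pairings(D):
    """Split all pairs of D parts into D - 1 rounds of D/2 disjoint pairs.

    Assumption of this sketch: D is even. Uses the classical circle method,
    not necessarily the construction used by OPALS.
    """
    if D < 2 or D % 2 != 0:
        raise ValueError("this sketch assumes an even number of parts D >= 2")
    order = list(range(D))
    rounds = []
    for _ in range(D - 1):
        # Pair position i with position D - 1 - i; all pairs in a round are disjoint.
        rounds.append(
            [tuple(sorted((order[i], order[D - 1 - i]))) for i in range(D // 2)]
        )
        # Keep the first part fixed and rotate the remaining parts by one position.
        order = [order[0], order[-1]] + order[1:-1]
    return rounds


def orthonormal_pairwise_logratio(x, i, j):
    """Pairwise logratio of parts i and j, scaled by 1/sqrt(2) to unit norm."""
    return math.log(x[i] / x[j]) / math.sqrt(2)


D = 6
rounds = round_robin_pairings(D)
covered = {pair for rnd in rounds for pair in rnd}
all_pairs = set(combinations(range(D), 2))

# Number of rounds, number of distinct pairs covered, and D(D - 1)/2.
print(len(rounds), len(covered), D * (D - 1) // 2)
# Every pair of parts appears in exactly one round.
print(covered == all_pairs)
```

For an odd number of parts the circle method needs an extra dummy part, which this sketch deliberately leaves out.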