Orthonormal pairwise logratio selection (OPALS) algorithm for compositional data analysis in high dimensions

2025;5(1):vbaf229. [Epub] 2025-10-01

Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic-ecollection

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid41293320

SUMMARY: In the analysis of compositional data, the most fundamental information is conveyed by the pairwise logratios between components. While logratio coordinate representations, such as balances and pivot coordinates, are widely used to aggregate this information into higher-level relationships, in some cases a fine-grained representation using all pairwise logratios is advantageous. Doing so within an orthonormal (or orthogonal) logratio coordinate framework is particularly challenging for high-dimensional compositions, since a composition with D parts yields D(D - 1)/2 pairwise logratios (excluding reciprocals). This work presents an efficient algorithm (OPALS), based on the theory of Latin squares, that obtains all orthonormal pairwise logratios from just D - 1 logratio coordinate systems. The computational burden of using such a representation for data analysis and modelling in high dimensions is thereby notably alleviated, and analyses that would otherwise be impractical may become feasible. Moreover, the relationship between estimates from orthonormal pairwise logratios and ordinary pivot coordinates is discussed in the context of regression and classification analysis.

AVAILABILITY AND IMPLEMENTATION: The OPALS algorithm is described in detail in the article and can be implemented directly from the provided methodology. The performance and properties of the method are illustrated through two examples using contemporary molecular biology data.
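The counting argument in the summary can be illustrated with a small sketch. It is not the OPALS algorithm itself, whose details are given only in the full article. Instead it uses the classical round-robin "circle method", a construction closely related to Latin squares, and assumes an even number of parts D. The function names `round_robin_rounds` and `pairwise_logratios` are hypothetical. The sketch partitions all D(D - 1)/2 pairs into D - 1 rounds of D/2 disjoint pairs. Logratios of disjoint pairs, scaled by 1/sqrt(2), are mutually orthonormal, so each round could in principle be embedded in a single orthonormal coordinate system.

```python
import math
from itertools import combinations


def round_robin_rounds(D):
    """Split all pairs of D parts (D even) into D-1 rounds of D/2 disjoint pairs.

    Circle method: part D-1 stays fixed, the other D-1 parts rotate.
    Illustrative only; not taken from the OPALS article.
    """
    if D < 2 or D % 2 != 0:
        raise ValueError("this sketch assumes an even number of parts D >= 2")
    n = D - 1
    rounds = []
    for r in range(n):
        pairs = [(r, n)]  # the fixed part meets part r in round r
        for k in range(1, D // 2):
            i, j = (r + k) % n, (r - k) % n
            pairs.append((min(i, j), max(i, j)))
        rounds.append(pairs)
    return rounds


def pairwise_logratios(x, pairs):
    """Scaled pairwise logratios (1/sqrt(2)) * ln(x_i / x_j) for the given pairs."""
    return {(i, j): math.log(x[i] / x[j]) / math.sqrt(2) for i, j in pairs}


D = 6
rounds = round_robin_rounds(D)

# Every one of the D(D-1)/2 pairs appears exactly once across the D-1 rounds.
all_pairs = sorted(p for rnd in rounds for p in rnd)
assert all_pairs == list(combinations(range(D), 2))

# Example composition (made-up values) evaluated on the first round.
x = [0.10, 0.25, 0.05, 0.30, 0.20, 0.10]
first_round = pairwise_logratios(x, rounds[0])
```

For D = 6 this gives 5 rounds of 3 pairs each, covering all 15 pairwise logratios. Odd D is not handled here and would need a different arrangement.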


