• This record comes from PubMed

Identifying Important Pairwise Logratios in Compositional Data with Sparse Principal Component Analysis

2025;57(2):333-358. [epub] 2024-10-10

Status: PubMed-not-MEDLINE; Language: English; Country: Germany; Media: print-electronic

Document type: Journal Article

Compositional data are characterized by the fact that their elemental information is contained in simple pairwise logratios of the parts that constitute the composition. While pairwise logratios are typically easy to interpret, the number of possible pairs to consider quickly becomes too large even for medium-sized compositions, which may hinder interpretability in further multivariate analysis. Sparse methods can therefore be useful for identifying a few important pairwise logratios (and parts contained in them) from the total candidate set. To this end, we propose a procedure based on the construction of all possible pairwise logratios and employ sparse principal component analysis to identify important pairwise logratios. The performance of the procedure is demonstrated with both simulated and real-world data. In our empirical analysis, we propose three visual tools showing (i) the balance between sparsity and explained variability, (ii) the stability of the pairwise logratios, and (iii) the importance of the original compositional parts to aid practitioners in their model interpretation.
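The two steps named in the abstract (build every pairwise logratio, then look for a sparse leading component) can be illustrated with a minimal sketch. It is not the authors' implementation: the sparse step here is a simple soft-thresholded power iteration used as a stand-in for a full sparse PCA solver, the tuning parameter `frac` is hypothetical, and the five-part data set is simulated so that only parts A and B vary strongly against each other.

```python
import math
import random
from itertools import combinations


def pairwise_logratios(X, names):
    """All D(D-1)/2 pairwise logratios log(x_j / x_l), j < l, for each row of X."""
    pairs = list(combinations(range(len(names)), 2))
    labels = [f"log({names[j]}/{names[l]})" for j, l in pairs]
    Z = [[math.log(row[j] / row[l]) for j, l in pairs] for row in X]
    return labels, Z


def center_columns(Z):
    """Subtract the column mean from every column."""
    n = len(Z)
    means = [sum(col) / n for col in zip(*Z)]
    return [[v - m for v, m in zip(row, means)] for row in Z]


def sparse_pc1(Z, frac, n_iter=100):
    """First sparse loading vector by soft-thresholded power iteration.

    Stand-in technique, not the solver used in the article.
    frac in [0, 1) is a hypothetical tuning parameter: after each
    multiplication by the sample covariance, every entry is shrunk toward
    zero by frac * (largest absolute entry), which zeroes small loadings.
    """
    n, p = len(Z), len(Z[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(z * w for z, w in zip(row, v)) for row in Z]  # Z v
        u = [sum(Z[i][k] * scores[i] for i in range(n)) / n for k in range(p)]
        lam = frac * max(abs(a) for a in u)
        u = [math.copysign(max(abs(a) - lam, 0.0), a) for a in u]
        norm = math.sqrt(sum(a * a for a in u))
        if norm == 0.0:
            break
        v = [a / norm for a in u]
    return v


# Simulated (assumed) data: A and B move in opposite directions, C-E barely move.
random.seed(0)
names = ["A", "B", "C", "D", "E"]
X = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    raw = [math.exp(t), math.exp(-t)]
    raw += [math.exp(random.gauss(0.0, 0.05)) for _ in range(3)]
    total = sum(raw)
    X.append([r / total for r in raw])  # closure to unit sum

labels, Z = pairwise_logratios(X, names)
loadings = sparse_pc1(center_columns(Z), frac=0.6)
selected = [lab for lab, w in zip(labels, loadings) if w != 0.0]
print(len(labels), "candidate logratios; selected:", selected)
```

Even with five parts there are already 10 candidate logratios, and the count grows as D(D-1)/2, which is why a sparsity penalty is attractive. Raising `frac` toward 1 keeps fewer logratios, while lowering it toward 0 approaches ordinary power iteration for the first principal component.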

