Identifying Important Pairwise Logratios in Compositional Data with Sparse Principal Component Analysis
Status: PubMed-not-MEDLINE
Language: English
Country: Germany
Media: print-electronic
Document type: Journal Article
PubMed: 39925888
PubMed Central: PMC11805788
DOI: 10.1007/s11004-024-10159-0
PII: 10159
Keywords: Compositional data, Geochemical data, Pairwise logratios, Sparse PCA
Publication type: Journal Article
Compositional data are characterized by the fact that their elemental information is contained in simple pairwise logratios of the parts that constitute the composition. While pairwise logratios are typically easy to interpret, the number of possible pairs quickly becomes too large even for medium-sized compositions (a composition with D parts yields D(D-1)/2 pairs), which may hinder interpretability in further multivariate analysis. Sparse methods can therefore be useful for identifying a few important pairwise logratios, and the parts contained in them, from the total candidate set. To this end, we propose a procedure that constructs all possible pairwise logratios and applies sparse principal component analysis to identify the important ones. The performance of the procedure is demonstrated with both simulated and real-world data. In our empirical analysis, we propose three visual tools that show (i) the balance between sparsity and explained variability, (ii) the stability of the pairwise logratios, and (iii) the importance of the original compositional parts, to aid practitioners in interpreting the model.
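The two steps named in the abstract (building every pairwise logratio, then looking for a sparse leading component) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sparse step here is a simple soft-thresholded power iteration used as a stand-in for sparse PCA, and the function names, the relative threshold parameter, and the four-part data set are all assumptions made for illustration.

```python
import itertools
import math


def pairwise_logratios(rows, names):
    """Return labels and values of log(x_i / x_j) for all pairs i < j."""
    pairs = list(itertools.combinations(range(len(names)), 2))
    labels = [f"{names[i]}/{names[j]}" for i, j in pairs]
    matrix = [[math.log(row[i] / row[j]) for i, j in pairs] for row in rows]
    return labels, matrix


def center_columns(matrix):
    """Subtract the column mean from every entry."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def soft_threshold(values, lam):
    """Shrink each value towards zero by lam, setting small ones to zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in values]


def sparse_first_loading(matrix, rel_threshold, n_iter=100):
    """Leading loading vector via soft-thresholded power iteration.

    Illustrative stand-in for sparse PCA (an assumption of this sketch).
    rel_threshold in [0, 1) is the fraction of the largest entry that is
    shrunk away in each step; 0 gives the ordinary first PCA loading.
    """
    z = center_columns(matrix)
    n, p = len(z), len(z[0])
    # Start from the logratio with the largest sum of squares.
    col_ss = [sum(z[r][c] ** 2 for r in range(n)) for c in range(p)]
    start = col_ss.index(max(col_ss))
    v = [1.0 if c == start else 0.0 for c in range(p)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in z]
        w = [sum(z[r][c] * scores[r] for r in range(n)) for c in range(p)]
        w = soft_threshold(w, rel_threshold * max(abs(x) for x in w))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variability at all in the data
            break
        v = [x / norm for x in w]
    return v


# Hypothetical four-part composition: A and B move in opposite directions,
# C and D stay constant, so A/B carries the most variability.
names = ["A", "B", "C", "D"]
rows = [(math.exp(t), math.exp(-t), 1.0, 1.0) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

labels, logratios = pairwise_logratios(rows, names)   # 4 * 3 / 2 = 6 logratios
dense = sparse_first_loading(logratios, rel_threshold=0.0)
sparse = sparse_first_loading(logratios, rel_threshold=0.6)
selected = [lab for lab, w in zip(labels, sparse) if w != 0.0]
```

In this toy example the unpenalized loading spreads over five of the six logratios, while the thresholded version keeps only the logratio of A to B. A real analysis would use a dedicated sparse PCA implementation and tune the sparsity level against explained variability, as the abstract describes.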