Robust principal component analysis for compositional tables
Status PubMed-not-MEDLINE Language English Country Great Britain, England Medium electronic-ecollection
Document type journal articles
PubMed: 35707689
PubMed Central: PMC9041953
DOI: 10.1080/02664763.2020.1722078
PII: 1722078
- Keywords
- Compositional data, compositional table, independence table, interaction table, pivot coordinates, robust principal component analysis
- Publication type
- journal articles (MeSH)
A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. When such a table is analyzed as a composition, the relevant information consists of ratios between its cells. This is particularly useful when several compositional tables are analyzed jointly and their absolute numbers lie in very different ranges, e.g. when unemployment data from different countries are considered. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, which allows relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
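As an illustration of the decomposition mentioned in the abstract, the following minimal Python sketch computes clr coefficients of a single two-factor table and splits them into an independent and an interactive part by double-centering the log-transformed cells. It is not the authors' implementation: the function name `clr_decomposition` and the example counts are hypothetical, and the coordinate construction and the rPCA step proposed in the paper are not reproduced here.

```python
import math


def clr_decomposition(table):
    """Return clr coefficients of a strictly positive I x J table, together
    with the clr coefficients of its independent and interactive parts.

    Hypothetical helper for illustration only (not from the paper).
    """
    n_rows, n_cols = len(table), len(table[0])
    logs = [[math.log(v) for v in row] for row in table]

    # Mean of the logs over all cells, over each row, and over each column.
    grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_mean = [sum(row) / n_cols for row in logs]
    col_mean = [sum(logs[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]

    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    clr = [[0.0] * n_cols for _ in range(n_rows)]
    clr_ind = [[0.0] * n_cols for _ in range(n_rows)]
    clr_int = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j in cells:
        # clr coefficient: log of the cell minus the mean log of all cells.
        clr[i][j] = logs[i][j] - grand
        # Independent part: additive row effect plus column effect.
        clr_ind[i][j] = (row_mean[i] - grand) + (col_mean[j] - grand)
        # Interactive part: what remains after removing both effects.
        clr_int[i][j] = logs[i][j] - row_mean[i] - col_mean[j] + grand
    return clr, clr_ind, clr_int


# Hypothetical counts: rows = gender, columns = three age classes.
table = [[120.0, 300.0, 80.0],
         [100.0, 260.0, 90.0]]
clr, clr_ind, clr_int = clr_decomposition(table)

# The clr coefficients of a table always sum to zero.
print(sum(v for row in clr for v in row))
```

Because the clr coefficients of every table sum to zero, a covariance matrix estimated from them is singular, which is the obstacle to robust PCA that the abstract refers to. Multiplying all cells by a constant leaves the output unchanged, reflecting that only ratios between cells carry information.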