Robust principal component analysis for compositional tables

2021; 48 (2): 214-233. [epub] 20200204

Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Medium: electronic-ecollection

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid35707689

A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age class. When such a table is analyzed as a composition, the relevant information is contained in the ratios between its cells. This is particularly useful when several compositional tables are analyzed jointly and their absolute numbers lie in very different ranges, e.g. when unemployment data from different countries are considered. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle when exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, which allows the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
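
As a rough illustration of the ideas in the abstract, the following minimal sketch computes clr coefficients of a two-factor table and splits them into an independent and an interactive part by double-centering, which is the standard logratio decomposition as far as we understand it. It does not reproduce the coordinates proposed in the paper and does not perform rPCA. The 2 x 3 table of counts, the simulated sample of tables, and the function names `clr` and `decompose` are assumptions made up for this example.

```python
import numpy as np


def clr(table):
    """Centered logratio coefficients: log cells minus the mean log cell."""
    logs = np.log(np.asarray(table, dtype=float))
    return logs - logs.mean()


def decompose(table):
    """Split clr coefficients into independent and interactive parts.

    The independent part is built from row and column means of the clr
    coefficients; the interactive part is the remainder.
    """
    c = clr(table)
    row_effect = c.mean(axis=1, keepdims=True)
    col_effect = c.mean(axis=0, keepdims=True)
    independent = row_effect + col_effect
    interaction = c - independent
    return c, independent, interaction


# Hypothetical counts: rows = gender, columns = age classes (not real data).
counts = np.array([[120.0, 340.0, 210.0],
                   [95.0, 410.0, 260.0]])

c, independent, interaction = decompose(counts)

# Only ratios matter: rescaling the whole table leaves clr unchanged.
c_scaled, _, _ = decompose(1000.0 * counts)

# Singularity: clr coefficients of every table sum to zero, so the
# covariance matrix of vectorized clr coefficients maps the vector of
# ones to zero and cannot be inverted.
rng = np.random.default_rng(0)
tables = rng.lognormal(size=(50, 2, 3))  # simulated sample of 2 x 3 tables
logs = np.log(tables)
clr_sample = (logs - logs.mean(axis=(1, 2), keepdims=True)).reshape(50, -1)
cov = np.cov(clr_sample, rowvar=False)
null_direction = cov @ np.ones(cov.shape[0])

print(np.round(c, 3))
print(np.round(interaction, 3))
```

The zero-sum constraint visible in `null_direction` is the singularity mentioned above: robust covariance estimators typically need a full-rank data matrix, which is why orthonormal coordinates linked to clr coefficients are used instead of the clr coefficients themselves.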


