Compositional data analysis
The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
- Keywords
- Compositional data analysis, multicollinearity, physical activity, sedentary behaviour, sleep
- MeSH
- Exercise * MeSH
- Child MeSH
- Data Interpretation, Statistical * MeSH
- Humans MeSH
- Pediatric Obesity * MeSH
- Sedentary Behavior * MeSH
- Sleep * MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
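As a rough illustration of the isometric log-ratio coordinates mentioned in the abstract above, the following Python sketch maps a 24-h time-use composition to pivot coordinates, one common choice of ilr coordinates. The function name, the part order and the minute values are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def pivot_coordinates(x):
    """Map D-part compositions (last axis) to D-1 pivot (ilr) coordinates."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[-1]
    z = np.empty(logx.shape[:-1] + (D - 1,))
    for j in range(D - 1):
        r = D - j - 1  # number of parts that follow part j
        # Scaled log-ratio of part j to the geometric mean of the later parts.
        z[..., j] = np.sqrt(r / (r + 1)) * (
            logx[..., j] - logx[..., j + 1:].mean(axis=-1)
        )
    return z

# Hypothetical day in minutes: sleep, sedentary, light PA, MVPA (sums to 1440).
day = np.array([540.0, 480.0, 360.0, 60.0])
z = pivot_coordinates(day)
print(z)  # three coordinates for four parts
```

Because the four parts always sum to 24 h, they cannot all enter an ordinary regression together; the three coordinates carry the same relative information without that exact linear dependence, so they can serve as regressors.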
Although there is an increasing awareness of the suitability of compositional data methodology in public health research, classical methods of statistical analysis have been primarily used so far. The present study aims to illustrate the potential of robust statistics to model movement behaviour using Czech adolescent data. We investigated: (1) the inter-relationship between various physical activity (PA) intensities, extended to model relationships by age; and (2) the associations between adolescents' PA and sedentary behavior (SB) structure and obesity. These research questions were addressed using three different types of compositional regression analysis: compositional covariates, compositional response, and regression between compositional parts. Robust counterparts of classical regression methods were used to lessen the influence of possible outliers. We outlined the differences between classical and robust methods of compositional data analysis. There was a pattern in Czech adolescents' movement/non-movement behavior: extensive SB was related to higher amounts of light-intensity PA, and vigorous PA ratios formed the main source of potentially aberrant observations; increasing age was associated with more SB and vigorous PA at the expense of light-intensity PA and moderate-intensity PA. The results indicated that the robust counterparts might provide more stable estimates in the presence of outlying observations. The findings suggested that replacing time spent in SB with vigorous PA may be a powerful tool against adolescents' obesity.
- Keywords
- compositional data, compositional linear regression, log-ratio methodology, physical activity, pivot coordinates
- MeSH
- Adolescent Behavior * MeSH
- Exercise * MeSH
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Pediatric Obesity etiology MeSH
- Regression Analysis * MeSH
- Sedentary Behavior * MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
In recent years, the focus of activity behavior research has shifted away from univariate paradigms (e.g., physical activity, sedentary behavior and sleep) to a 24-h time-use paradigm that integrates all daily activity behaviors. Behaviors are analyzed relative to each other, rather than as individual entities. Compositional data analysis (CoDA) is increasingly used for the analysis of time-use data because it is intended for data that convey relative information. While CoDA has brought new understanding of how time use is associated with health, it has also raised challenges in how this methodology is applied, and how the findings are interpreted. In this paper we provide a brief overview of CoDA for time-use data, summarize current CoDA research in time-use epidemiology and discuss challenges and future directions. We use 24-h time-use diary data from Wave 6 of the Longitudinal Study of Australian Children (birth cohort, n = 3228, aged 10.9 ± 0.3 years) to demonstrate descriptive analyses of time-use compositions and how to explore the relationship between daily time use (sleep, sedentary behavior and physical activity) and a health outcome (in this example, adiposity). We illustrate how to comprehensively interpret the CoDA findings in a meaningful way.
- Keywords
- compositional data, physical activity, sedentary behavior, sleep
- MeSH
- Adiposity MeSH
- Data Analysis * MeSH
- Activities of Daily Living MeSH
- Exercise * MeSH
- Child MeSH
- Cohort Studies MeSH
- Humans MeSH
- Longitudinal Studies MeSH
- Sedentary Behavior * MeSH
- Sleep MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Australia MeSH
Most data in environmental sciences and geochemistry are compositional. The very units used to report the data (e.g., μg/l, mg/kg, wt%) imply that the analytical results for each element are not free to vary independently of the other measured variables. This is often neglected in statistical analysis, where a simple log-transformation of the single variables is insufficient to put the data into an acceptable geometry. This is also important for bivariate data analysis and for correlation analysis, for which the data need to be appropriately log-ratio transformed. A new approach based on the isometric log-ratio (ilr) transformation, leading to so-called symmetric coordinates, is presented here. Summarizing the correlations in a heat-map gives a powerful tool for bivariate data analysis. An application of the new method is demonstrated using a data set from a regional geochemical mapping project based on soil O and C horizon samples. Differences from 'classical' correlation analysis based on log-transformed data are highlighted. The fact that some expected strong positive correlations appear and remain unchanged even following a log-ratio transformation has probably led to the misconception that the special nature of compositional data can be ignored when working with trace elements. The example dataset is employed to demonstrate that using 'classical' correlation analysis and plotting XY diagrams (scatterplots) based on the original or simply log-transformed data can easily lead to severe misinterpretations of the relationships between elements.
- Keywords
- CoDa, Compositional data analysis, Correlation, Log-ratio methodology, Scatterplot
- Publication type
- Journal Article MeSH
Compositional data are characterized by the fact that their elemental information is contained in simple pairwise logratios of the parts that constitute the composition. While pairwise logratios are typically easy to interpret, the number of possible pairs to consider quickly becomes too large even for medium-sized compositions, which may hinder interpretability in further multivariate analysis. Sparse methods can therefore be useful for identifying a few important pairwise logratios (and parts contained in them) from the total candidate set. To this end, we propose a procedure based on the construction of all possible pairwise logratios and employ sparse principal component analysis to identify important pairwise logratios. The performance of the procedure is demonstrated with both simulated and real-world data. In our empirical analysis, we propose three visual tools showing (i) the balance between sparsity and explained variability, (ii) the stability of the pairwise logratios, and (iii) the importance of the original compositional parts to aid practitioners in their model interpretation.
- Keywords
- Compositional data, Geochemical data, Pairwise logratios, Sparse PCA
- Publication type
- Journal Article MeSH
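To make the size of the candidate set in the abstract above concrete, the sketch below builds all pairwise logratios of a small composition. It covers only this construction step, not the sparse PCA; the function name, part labels and data values are hypothetical.

```python
from itertools import combinations

import numpy as np

def pairwise_logratios(X, names):
    """Return labels and values of all D(D-1)/2 logratios ln(x_i / x_j), i < j."""
    logX = np.log(np.asarray(X, dtype=float))
    pairs = list(combinations(range(logX.shape[1]), 2))
    labels = [f"ln({names[i]}/{names[j]})" for i, j in pairs]
    values = np.column_stack([logX[:, i] - logX[:, j] for i, j in pairs])
    return labels, values

# Hypothetical 4-part compositions (rows); only ratios between parts matter.
X = np.array([[10.0, 20.0, 30.0, 40.0],
              [5.0, 25.0, 50.0, 20.0]])
labels, L = pairwise_logratios(X, ["A", "B", "C", "D"])
print(len(labels))  # 6 logratios for 4 parts
```

The number of columns grows as D(D-1)/2, so a 50-part composition already yields 1225 candidate logratios, which is why a sparse selection step becomes attractive.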
Clinical metabolomics aims at finding statistically significant differences in the metabolic statuses of patient and control groups with the intention of understanding pathobiochemical processes and identifying clinically useful biomarkers of particular diseases. After the raw measurements are integrated and pre-processed as intensities of chromatographic peaks, the differences between controls and patients are evaluated by both univariate and multivariate statistical methods. The traditional univariate approach relies on t-tests (or their nonparametric alternatives) and the results from multiple testing are misleadingly compared merely by p-values using the so-called volcano plot. This paper proposes a Bayesian counterpart to the widespread univariate analysis, taking into account the compositional character of a metabolome. Since each metabolome is a collection of some small-molecule metabolites in a biological material, the relative structure of metabolomic data, which is inherently contained in ratios between metabolites, is of main interest. Therefore, a proper choice of logratio coordinates is an essential step for any statistical analysis of such data. In addition, a concept of b-values is introduced together with a Bayesian version of the volcano plot incorporating distance levels of the posterior highest density intervals from zero. The theoretical background of the contribution is illustrated using two data sets containing samples of patients suffering from 3-hydroxy-3-methylglutaryl-CoA lyase deficiency and medium-chain acyl-CoA dehydrogenase deficiency. To evaluate the stability of the proposed method as well as the benefits of the compositional approach, two simulations are added, designed to mimic a loss of samples and a systematic measurement error, respectively.
- Keywords
- Bayesian inference, Compositional data, High-dimensional data, Multiple hypotheses testing, Untargeted metabolomics, Volcano plot
- MeSH
- Acetyl-CoA C-Acetyltransferase deficiency metabolism MeSH
- Acyl-CoA Dehydrogenase deficiency metabolism MeSH
- Bayes Theorem * MeSH
- Datasets as Topic MeSH
- Humans MeSH
- Metabolomics * MeSH
- Amino Acid Metabolism, Inborn Errors metabolism MeSH
- Lipid Metabolism, Inborn Errors metabolism MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Acetyl-CoA C-Acetyltransferase MeSH
- Acyl-CoA Dehydrogenase MeSH
A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information consists of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, which allows the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
- Keywords
- Compositional data, compositional table, independence table, interaction table, pivot coordinates, robust principal component analysis
- Publication type
- Journal Article MeSH
INTRODUCTION: It is unclear whether adiposity leads to changes in movement behaviors, and there is a lack of compositional analyses of longitudinal data which focus on these associations. Using a compositional approach, this study aimed to examine the associations between baseline adiposity and 7-year changes in physical activity (PA) and sedentary behavior (SB) among elderly women. We also explored the longitudinal associations between change in adiposity and change in movement-behavior composition.
METHODS: This longitudinal study included 176 older women (mean baseline age 62.8 (4.1) years) from Central Europe. Movement behavior was assessed by accelerometers and adiposity was measured by bioelectrical impedance analysis at baseline and follow-up. A set of multivariate least-squares regression analyses was used to examine the associations of baseline adiposity and longitudinal changes in adiposity as explanatory variables with longitudinal changes in a 3-part movement-behavior composition consisting of SB, light PA (LPA) and moderate-to-vigorous PA (MVPA) as outcome variables.
RESULTS: No significant associations were found between baseline adiposity and longitudinal changes in the movement-behavior composition (p > 0.05). We found significant associations of changes in body mass index (BMI) and fat mass percentage (FM%) with changes in the movement-behavior composition. An increase in BMI was associated with an increase of SB at the expense of LPA and MVPA (β = 0.042, p = 0.009) and with a decrease of MVPA in favor of SB and LPA (β = -0.059, p = 0.037). An increase in FM% was significantly associated only with an increase of SB at the expense of LPA and MVPA (β = 0.019, p = 0.031).
CONCLUSIONS: This study did not support the assumption that baseline adiposity is associated with longitudinal changes in movement behaviors among elderly women, but we found evidence for change-to-change associations, suggesting that a 7-year increase in adiposity is associated with a concurrent increase of SB at the expense of LPA and MVPA and with a concurrent decrease of MVPA in favor of LPA and SB. Public health interventions are needed to simultaneously prevent weight gain and promote physically active lifestyle among elderly women.
- Keywords
- Compositional data analysis, Exercise, Fatness, Obesity, Sitting, Time-use epidemiology
- MeSH
- Adiposity * MeSH
- Accelerometry MeSH
- Data Analysis * MeSH
- Body Mass Index MeSH
- Humans MeSH
- Longitudinal Studies MeSH
- Prospective Studies MeSH
- Cross-Sectional Studies MeSH
- Aged MeSH
- Check Tag
- Humans MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Europe MeSH
Solid-phase microextraction in headspace mode coupled with gas chromatography-mass spectrometry was applied to the determination of volatile compounds in 30 commercially available coffee samples. In order to differentiate and characterize Arabica and Robusta coffee, six major volatile compounds (acetic acid, 2-methylpyrazine, furfural, 2-furfuryl alcohol, 2,6-dimethylpyrazine, 5-methylfurfural) were chosen as the most relevant markers. Cluster analysis and principal component analysis (PCA) were applied to the raw chromatographic data and data processed by centred logratio transformation.
- MeSH
- Furaldehyde analogs & derivatives analysis isolation & purification MeSH
- Principal Component Analysis MeSH
- Coffee chemistry classification MeSH
- Acetic Acid analysis isolation & purification MeSH
- Solid Phase Microextraction methods MeSH
- Gas Chromatography-Mass Spectrometry methods MeSH
- Pyrazines analysis isolation & purification MeSH
- Cluster Analysis MeSH
- Volatile Organic Compounds analysis isolation & purification MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Furaldehyde MeSH
- 2,5-dimethylpyrazine MeSH Browser
- 5-methyl-2-furfural MeSH Browser
- Coffee MeSH
- Acetic Acid MeSH
- Pyrazines MeSH
- Volatile Organic Compounds MeSH
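The centred logratio transformation followed by PCA, as mentioned in the abstract above, might look roughly like the following Python sketch. The data are randomly generated stand-ins for peak areas (30 samples, 6 compounds), not the study's measurements, and plain SVD-based PCA is assumed here rather than whatever software the authors used.

```python
import numpy as np

def clr(X):
    """Centred logratio: log of each part minus the row mean of the logs."""
    logX = np.log(np.asarray(X, dtype=float))
    return logX - logX.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical positive peak areas: 30 samples x 6 compounds.
X = rng.lognormal(mean=0.0, sigma=0.5, size=(30, 6))

Z = clr(X)
Zc = Z - Z.mean(axis=0)            # column-centre before PCA
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
scores = U * s                     # principal component scores
explained = s**2 / np.sum(s**2)    # share of variance per component
print(explained.round(3))
```

Each clr row sums to zero, so the last singular value is numerically zero: six parts give at most five informative components. This is the singularity of clr coefficients that standard PCA tolerates but that some robust estimators do not.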
BACKGROUND: Between-person differences in sedentary patterns should be considered to understand the role of sedentary behavior (SB) in the development of childhood obesity. This study took a novel approach based on compositional data analysis to examine associations between SB patterns and adiposity and investigate differences in adiposity associated with time reallocation between time spent in sedentary bouts of different duration and physical activity.
METHODS: An analysis of cross-sectional data was performed in 425 children aged 7-12 years (58% girls). Waking behaviors were assessed using an ActiGraph GT3X accelerometer for seven consecutive days. Multi-frequency bioimpedance measurement was used to determine adiposity. Compositional regression models with robust estimators were used to analyze associations between sedentary patterns and adiposity markers. To examine differences in adiposity associated with time reallocation, we used the compositional isotemporal substitution model.
RESULTS: Significantly higher fat mass percentage (FM%; βilr1 = 0.18; 95% CI: 0.01, 0.34; p = 0.040) and visceral adipose tissue (VAT; βilr1 = 0.37; 95% CI: 0.03, 0.71; p = 0.034) were associated with time spent in middle sedentary bouts lasting 10-29 min (relative to remaining behaviors). No significant associations were found for short (< 10 min) and long sedentary bouts (≥ 30 min). Substituting the time spent in total SB with moderate-to-vigorous physical activity (MVPA) was associated with a decrease in VAT. Substituting 1 h/week of the time spent in middle sedentary bouts with MVPA was associated with 2.9% (95% CI: 1.2, 4.6), 3.4% (95% CI: 1.2, 5.5), and 6.1% (95% CI: 2.9, 9.2) lower FM%, fat mass index, and VAT, respectively. Moreover, substituting 2 h/week of time spent in middle sedentary bouts with short sedentary bouts was associated with 3.5% (95% CI: 0.02, 6.9) lower FM%.
CONCLUSIONS: Our findings suggest that adiposity status could be improved by increasing MVPA at the expense of time spent in middle sedentary bouts. Some benefits to adiposity may also be expected from replacing middle sedentary bouts with short sedentary bouts, that is, by taking standing or activity breaks more often. These findings may help design more effective interventions to prevent and control childhood obesity.
- Keywords
- Accelerometry, Body mass index, Child behavior, Pediatric obesity, Sedentary behavior
- MeSH
- Adiposity * MeSH
- Accelerometry MeSH
- Data Analysis MeSH
- Child MeSH
- Body Mass Index MeSH
- Humans MeSH
- Cross-Sectional Studies MeSH
- Sedentary Behavior * MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
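The idea behind the compositional isotemporal substitution model named in the abstract above can be sketched as follows: move a fixed amount of time from one part to another, re-express both compositions in ilr (pivot) coordinates, and apply the fitted regression coefficients to the difference. Everything numeric here is hypothetical, including the part names, the baseline composition and the coefficients; the sketch also assumes a plain linear model on an untransformed outcome and omits the robust estimation and confidence intervals used in the study.

```python
import numpy as np

def pivot_coordinates(x):
    """Map one D-part composition to D-1 pivot (ilr) coordinates."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[0]
    z = np.empty(D - 1)
    for j in range(D - 1):
        r = D - j - 1  # number of parts that follow part j
        z[j] = np.sqrt(r / (r + 1)) * (logx[j] - logx[j + 1:].mean())
    return z

def reallocate(comp, src, dst, minutes):
    """Move `minutes` from part `src` to part `dst`, keeping the total fixed."""
    new = np.array(comp, dtype=float)
    if not 0 <= minutes < new[src]:
        raise ValueError("cannot move that much time out of the source part")
    new[src] -= minutes
    new[dst] += minutes
    return new

def predicted_difference(base, src, dst, minutes, beta):
    """Predicted outcome change for a linear model fitted on pivot coordinates."""
    new = reallocate(base, src, dst, minutes)
    return float(beta @ (pivot_coordinates(new) - pivot_coordinates(base)))

# Hypothetical waking-day composition in minutes:
# middle sedentary bouts, other sedentary time, light PA, MVPA.
base = np.array([120.0, 300.0, 330.0, 50.0])
beta = np.array([0.18, -0.05, 0.02])  # hypothetical fitted ilr coefficients

diff = predicted_difference(base, src=0, dst=3, minutes=10.0, beta=beta)
print(diff)
```

The intercept and any covariates cancel in the difference, which is why only the coefficients on the coordinates are needed. The result depends on the baseline composition, so the same 10 minutes gives a different predicted change for a child with little MVPA than for one with a lot.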