Data analysis
The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
- Keywords
- Compositional data analysis, multicollinearity, physical activity, sedentary behaviour, sleep
- MeSH
- Exercise * MeSH
- Child MeSH
- Data Interpretation, Statistical * MeSH
- Humans MeSH
- Pediatric Obesity * MeSH
- Sedentary Behavior * MeSH
- Sleep * MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
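To make the isometric log-ratio (ilr) coordinates mentioned in the compositional regression abstract above more concrete, here is a minimal Python sketch. The abstract does not say which ilr basis the authors used, so the pivot-coordinate basis below is an assumption, as are the function names and the example time budget of 540 min sleep, 480 min sedentary time, 360 min light activity and 60 min moderate-to-vigorous activity.

```python
import numpy as np


def closure(x):
    """Rescale the parts so that they sum to 1 (a composition carries only relative information)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def ilr_pivot(x):
    """Map a D-part composition to D-1 ilr pivot coordinates.

    Coordinate i contrasts part i with the geometric mean of all later parts:
        z_i = sqrt(r / (r + 1)) * ln(x_i / gm(x_{i+1}, ..., x_D)),
    where r is the number of later parts.
    """
    x = closure(x)
    n_parts = x.shape[-1]
    log_x = np.log(x)
    z = np.empty(x.shape[:-1] + (n_parts - 1,))
    for i in range(n_parts - 1):
        r = n_parts - 1 - i  # number of parts after part i
        z[..., i] = np.sqrt(r / (r + 1)) * (log_x[..., i] - log_x[..., i + 1:].mean(axis=-1))
    return z


# Hypothetical 24-h time budget in minutes: sleep, sedentary, light activity, MVPA.
day = np.array([540.0, 480.0, 360.0, 60.0])
coords = ilr_pivot(day)
print(coords)
```

Because the four parts always sum to 1440 minutes, they cannot all enter an ordinary regression together. The three ilr coordinates carry the same relative information without that exact linear dependence, so they can be used jointly as predictors.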
We use random matrix theory to demonstrate the existence of generic and subject-independent features of the ensemble of correlation matrices extracted from human EEG data. In particular, the spectral density and the level spacings were analyzed and shown to be generic and subject independent. We also investigate number variance distributions. In this case we show that when the measured subject is visually stimulated, the number variance displays deviations from the random matrix prediction.
- MeSH
- Electroencephalography methods MeSH
- Data Interpretation, Statistical * MeSH
- Humans MeSH
- Models, Statistical * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
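The level-spacing analysis described in the EEG abstract above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the signals are synthetic Gaussian noise rather than EEG recordings, the channel and sample counts are hypothetical, and the spacings are normalized by their mean, which is a crude stand-in for the proper unfolding with a smoothed spectral density. The Wigner surmise for the Gaussian orthogonal ensemble is the reference curve that such spacings are usually compared with.

```python
import numpy as np


def level_spacings(corr):
    """Nearest-neighbour eigenvalue spacings of a symmetric matrix, scaled to unit mean."""
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))
    spacings = np.diff(eigenvalues)
    return spacings / spacings.mean()


def wigner_surmise(s):
    """GOE Wigner surmise: P(s) = (pi / 2) * s * exp(-pi * s**2 / 4)."""
    s = np.asarray(s, dtype=float)
    return (np.pi / 2.0) * s * np.exp(-np.pi * s**2 / 4.0)


# Synthetic stand-in for a multichannel recording: 19 channels, 500 samples (hypothetical sizes).
rng = np.random.default_rng(0)
signals = rng.standard_normal((19, 500))

corr = np.corrcoef(signals)      # 19 x 19 channel correlation matrix
spacings = level_spacings(corr)  # 18 normalized spacings

# Compare an empirical histogram with the surmise evaluated at the bin centres.
hist, edges = np.histogram(spacings, bins=6, range=(0.0, 3.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
for c, h, w in zip(centres, hist, wigner_surmise(centres)):
    print(f"s={c:.2f}  empirical={h:.3f}  wigner={w:.3f}")
```

A single 19-channel matrix gives only 18 spacings, so a meaningful comparison needs spacings pooled over an ensemble of correlation matrices, as the abstract describes.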
Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to handle and analyze complex data matrices effectively. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
Statistical theory indicates that hierarchical clustering by interviewers or raters needs to be considered to avoid incorrect inferences when performing any analyses including regression, factor analysis (FA) or item response theory (IRT) modelling of binary or ordinal data. We use simulated Positive and Negative Syndrome Scale (PANSS) data to show the consequences (in terms of bias, variance and mean square error) of using an analysis ignoring clustering on confirmatory factor analysis (CFA) estimates. Our investigation includes the performance of different estimators, such as maximum likelihood, weighted least squares and Markov Chain Monte Carlo (MCMC). Our simulation results suggest that ignoring clustering may lead to serious bias of the estimated factor loadings, item thresholds, and corresponding standard errors in CFAs for ordinal item response data typical of that commonly encountered in psychiatric research. In addition, fit indices tend to show a poor fit for the hypothesized structural model. MCMC estimation may be more robust against clustering than maximum likelihood and weighted least squares approaches but further investigation of these issues is warranted in future simulation studies of other datasets. Copyright © 2015 John Wiley & Sons, Ltd.
- Keywords
- PANSS, factor analysis, hierarchical modelling, simulation
- MeSH
- Factor Analysis, Statistical * MeSH
- Data Interpretation, Statistical * MeSH
- Humans MeSH
- Computer Simulation MeSH
- Psychiatric Status Rating Scales statistics & numerical data MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
- Keywords
- benchmarking, data analysis, label-free proteomics, quality metrics, workflow
- MeSH
- Data Analysis MeSH
- Proteins MeSH
- Proteomics * MeSH
- Workflow MeSH
- Software MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins MeSH
The occurrence of acne in women with hyperandrogenemia is well known; a question remains, however, as to whether a further positive relationship can be detected between the intensity of acne and the levels of testosterone, androgen precursors and sex hormone binding globulin (SHBG). A procedure of interactive data analysis that extracts relevant information from the original data was applied. Exploratory data analysis (EDA) identifies basic statistical features and patterns of data using a variety of diagnostic displays. The need for this step is particularly acute in biochemical and clinical data, the distribution of which is mostly non-Gaussian and often corrupted by outliers. The omission of EDA can lead to incorrect results and false conclusions. In the EDA, (i) several graphical tools for summarizing data are applied, (ii) the peculiarities of the sample distribution are investigated, (iii) the distribution is constructed, and (iv) the sample distribution is compared graphically with selected theoretical distributions. The proposed procedure is illustrated by a typical case study evaluating differences between mean serum levels of testosterone, androgen precursors and SHBG in a group of patients with mild and severe forms of acne. Knowledge of the interval estimate of the mean value in both groups enables their comparison at the chosen probability level. As will be apparent from the evaluation of inter-group SHBG differences, an incorrect approach to the determination of group mean values could result in a complete misinterpretation of the data. The results indicate that androgens are not significantly related to the intensity of acne, and that SHBG is higher in patients with more severe forms of acne.
- MeSH
- Acne Vulgaris complications metabolism MeSH
- Dehydroepiandrosterone blood metabolism MeSH
- Diagnostic Tests, Routine standards MeSH
- Sex Hormone-Binding Globulin analysis metabolism MeSH
- Hyperandrogenism complications diagnosis metabolism MeSH
- Data Interpretation, Statistical MeSH
- Humans MeSH
- Computer Graphics MeSH
- Probability MeSH
- Models, Statistical * MeSH
- Testosterone blood metabolism MeSH
- Sampling Studies MeSH
- Data Display * MeSH
- Check Tag
- Humans MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Substance names
- Dehydroepiandrosterone MeSH
- Sex Hormone-Binding Globulin MeSH
- Testosterone MeSH
Exploratory data analysis based on multivariate statistical techniques was introduced as a new approach to expressing the toxicity of chemical substances across several cell models simultaneously. Using principal component analysis and cluster analysis, the toxicity of chlorinated phenol derivatives was characterized in several cell models (chlorococcal algae, cyanobacteria, bacteria, micromycetes, plant and animal cells). Previous empirical experience was confirmed: the toxicity of chlorinated phenol derivatives increases with the degree of chlorination, and the presence of a methoxy group lowers the toxic effect. The relationship between the groups of tests used was presented.
- MeSH
- Allium drug effects MeSH
- Bacteria drug effects MeSH
- Models, Biological MeSH
- Chlorophenols toxicity MeSH
- Eukaryota drug effects MeSH
- Fungi drug effects MeSH
- Data Interpretation, Statistical MeSH
- Environmental Pollutants toxicity MeSH
- Oligochaeta drug effects MeSH
- Cluster Analysis MeSH
- Cyanobacteria drug effects MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Chlorophenols MeSH
- Environmental Pollutants MeSH
The pedometer is a widely used research tool for measuring the level and extent of physical activity (PA) within population subgroups. The sample used in this study was drawn from a population of university students to examine the influence of the monitoring interval and alternate starting days on step-count activity patterns. The study was part of a national project during 2008-2010. Eligible subjects (641) were selected from a sample of 906 university students. The students wore pedometers continuously for 7 days, excluding time for sleep and personal hygiene. Steps per day were logged on record sheets by each student. Data gathering spanned an entire week, and the results were sorted by alternate starting days, by activity for an entire week, by activity for only the weekdays of the one-week monitoring interval and for the two-day weekend. The statistical analysis included ANOVA, intra-class correlation (ICC) analysis, and regression analysis. The ICC analysis suggested that monitoring starting on Monday (ICC = 0.71; 95%CI (0.61-0.79)), Tuesday (ICC = 0.67; 95%CI (0.59-0.75)) or Thursday (ICC = 0.68; 95%CI (0.55-0.79)) improved reliability. The results of regression analysis also indicated that any starting day except Sunday is satisfactory as long as a minimum of four days of monitoring is used.
- MeSH
- Monitoring, Ambulatory instrumentation MeSH
- Walking statistics & numerical data MeSH
- Adult MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Regression Analysis MeSH
- Data Collection instrumentation methods MeSH
- Students MeSH
- Universities MeSH
- Research Design * MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
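The intra-class correlation reported in the pedometer abstract above can be sketched in Python as follows. The abstract does not state which ICC form was used, so the one-way random-effects, single-measurement form ICC(1,1) below is an assumption. The function name and the step counts are hypothetical and are not data from the study.

```python
import numpy as np


def icc_oneway(data):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_measurements) array.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares of a one-way ANOVA.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    msb = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    msw = ((data - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical daily step counts: 5 students (rows) monitored on 4 days (columns).
steps = np.array([
    [9800, 10200, 9500, 10100],
    [6200, 5900, 6600, 6100],
    [12500, 11800, 12900, 12200],
    [7800, 8400, 7600, 8100],
    [10400, 9900, 10800, 10300],
])
print(round(icc_oneway(steps), 3))
```

In this layout a value close to 1 means that most of the variance lies between students rather than between days within a student, which is how the abstract's reliability figures for different starting days can be read.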
The author discusses the beginning of the systematic collection of data on nationalities and linguistic groups in Czechoslovakia during the eighteenth century. The collection of this type of data was a result of two factors: the need for topographic data by secular and ecclesiastic authorities and the general growth of scientific research. The different types of data collected are also examined. Topographic materials generally dealt with the ethnicity of individual places, while scientific studies focused on global data on nationalities.
- Keywords
- Communication, Cultural Background, Czechoslovakia, Data Collection *, Demographic Factors, Developed Countries, Eastern Europe, Ethnic Groups *, Europe, Geographic Factors, Historical Survey *, Language *, Population, Population Characteristics, Research Methodology
- MeSH
- Demography MeSH
- Ethnicity * MeSH
- Language * MeSH
- Communication MeSH
- Culture MeSH
- Population MeSH
- Population Characteristics MeSH
- Data Collection * MeSH
- Developed Countries MeSH
- Research MeSH
- Geography MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Czechoslovakia MeSH
- Europe MeSH
- Europe, Eastern MeSH
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.