Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data
Language English Country Netherlands Media print-electronic
Document type journal articles
PubMed
31910969
DOI
10.1016/j.aca.2019.11.006
PII: S0003-2670(19)31349-2
Knihovny.cz E-resources
- Keywords
- Bayesian inference, Compositional data, High-dimensional data, Multiple hypotheses testing, Untargeted metabolomics, Volcano plot
- MeSH
- Acetyl-CoA C-Acetyltransferase deficiency metabolism MeSH
- Acyl-CoA Dehydrogenase deficiency metabolism MeSH
- Bayes Theorem * MeSH
- Datasets as Topic MeSH
- Humans MeSH
- Metabolomics * MeSH
- Amino Acid Metabolism, Inborn Errors metabolism MeSH
- Lipid Metabolism, Inborn Errors metabolism MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Acetyl-CoA C-Acetyltransferase MeSH
- Acyl-CoA Dehydrogenase MeSH
Clinical metabolomics aims to find statistically significant differences between the metabolic statuses of patient and control groups in order to understand pathobiochemical processes and to identify clinically useful biomarkers of particular diseases. After the raw measurements are integrated and pre-processed as intensities of chromatographic peaks, the differences between controls and patients are evaluated by both univariate and multivariate statistical methods. The traditional univariate approach relies on t-tests (or their nonparametric alternatives), and the results of multiple testing are misleadingly compared merely by p-values using the so-called volcano plot. This paper proposes a Bayesian counterpart to this widespread univariate analysis that takes into account the compositional character of a metabolome. Since each metabolome is a collection of small-molecule metabolites in a biological material, the main interest lies in the relative structure of metabolomic data, which is inherently contained in the ratios between metabolites. A proper choice of logratio coordinates is therefore an essential step in any statistical analysis of such data. In addition, the concept of b-values is introduced together with a Bayesian version of the volcano plot that incorporates distance levels of the posterior highest density intervals from zero. The theoretical background of the contribution is illustrated using two data sets containing samples from patients suffering from 3-hydroxy-3-methylglutaryl-CoA lyase deficiency and medium-chain acyl-CoA dehydrogenase deficiency. To evaluate the stability of the proposed method as well as the benefits of the compositional approach, two simulations are added, designed to mimic a loss of samples and a systematic measurement error, respectively.
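To make the idea of logratio coordinates concrete, the following minimal sketch computes centered logratio (clr) coordinates for a single sample. It is only an illustration: the abstract does not say which logratio coordinates the paper uses, so clr is an assumed choice here, and the function name `clr` and the peak intensities are hypothetical. The sketch shows why ratios carry the relative structure: multiplying every intensity in a sample by the same factor leaves the coordinates unchanged.

```python
import math


def clr(composition):
    """Centered logratio coordinates of a strictly positive composition.

    Assumed choice of coordinates for illustration only; not necessarily
    the coordinates used in the paper.
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    # Each coordinate is the log of a part divided by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical peak intensities for one sample (illustrative numbers only).
sample = [120.0, 30.0, 450.0, 15.0]

# The same sample with every intensity multiplied by a common factor,
# a simple stand-in for a sample-wide multiplicative measurement error.
scaled = [1.8 * x for x in sample]

clr_sample = clr(sample)
clr_scaled = clr(scaled)

# The common factor cancels in every ratio, so both results agree
# up to floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(clr_sample, clr_scaled))
print(max_difference < 1e-9)
```

A univariate comparison of patients and controls would then be carried out on such coordinates rather than on the raw intensities.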
Citations provided by Crossref.org