Article

Record sourced from PubMed

Cellwise outlier detection and biomarker identification in metabolomics based on pairwise log ratios

Walach, Jan (Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria)
Filzmoser, Peter (Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria)
Kouřil, Štěpán (Laboratory of Metabolomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University Olomouc, Olomouc, Czech Republic; Department of Clinical Biochemistry, University Hospital Olomouc, Olomouc, Czech Republic)
Friedecký, David (Laboratory of Metabolomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University Olomouc, Olomouc, Czech Republic; Department of Clinical Biochemistry, University Hospital Olomouc, Olomouc, Czech Republic)
Adam, Tomáš (Laboratory of Metabolomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University Olomouc, Olomouc, Czech Republic; Department of Clinical Biochemistry, University Hospital Olomouc, Olomouc, Czech Republic)

Journal of Chemometrics. 2020 Jan; 34(1): e3182. Epub 2019 Dec 2

Source: J Chemom
ISSN 0886-9383

Status: PubMed-not-MEDLINE; Language: English; Country: England, United Kingdom; Medium: print-electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid32189829

Online full text

PubMed 32189829
PubMed Central PMC7063692
DOI 10.1002/cem.3182
PII: CEM3182
Knihovny.cz e-resources

Keywords
biomarker, cellwise outliers, cell-rPLR, log ratio, metabolomics, robust method
Publication type
journal article (MeSH)

Data outliers can carry very valuable information and may be the most informative observations for interpretation. Nevertheless, they are often neglected. An algorithm called cellwise outlier diagnostics using robust pairwise log ratios (cell-rPLR) is proposed for identifying outliers in single cells of a data matrix. The algorithm is designed for metabolomic data, where, due to the size effect, the measured values are not directly comparable. Pairwise log ratios between the variable values form the elemental information for the algorithm, and the aggregation of appropriate outlyingness values results in cellwise outlyingness information. A further feature of cell-rPLR is that it is useful for biomarker identification, particularly in the presence of cellwise outliers. Real data examples and simulation studies underline the good performance of this algorithm in comparison with alternative methods.
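
The idea of building cellwise outlyingness from pairwise log ratios can be illustrated with a small Python sketch. This is not the published cell-rPLR algorithm: the function name `pairwise_log_ratio_outlyingness`, the median/MAD standardization, and the median aggregation over partner variables are assumptions chosen only to show why log ratios remove the size effect and how per-ratio scores might be combined into one score per cell.

```python
import numpy as np

def pairwise_log_ratio_outlyingness(X):
    """Toy cellwise outlyingness from pairwise log ratios.

    Illustrative assumption only, not the cell-rPLR method itself.
    X: (n_samples, n_variables) array of strictly positive values.
    Returns an array of the same shape with one signed score per cell.
    """
    log_x = np.log(np.asarray(X, dtype=float))

    # lr[i, j, k] = log(x_ij / x_ik); a per-sample size factor cancels here.
    lr = log_x[:, :, None] - log_x[:, None, :]

    # Robustly centre and scale each log ratio across samples (median / MAD).
    med = np.median(lr, axis=0, keepdims=True)
    mad = 1.4826 * np.median(np.abs(lr - med), axis=0, keepdims=True)
    mad[mad == 0] = np.nan  # the j == k ratios are identically zero

    z = (lr - med) / mad

    # Aggregate over partner variables k, ignoring the undefined j == k entry.
    return np.nanmedian(z, axis=2)


# Usage: one inflated cell in otherwise well-behaved data.
rng = np.random.default_rng(0)
X = np.exp(rng.normal(0.0, 0.2, size=(50, 6)))
X[3, 2] *= 20.0  # contaminate a single cell
scores = pairwise_log_ratio_outlyingness(X)
flagged = np.unravel_index(np.argmax(np.abs(scores)), scores.shape)
print(flagged)
```

Because a contaminated cell shifts every log ratio that involves its own variable, its aggregated score is large, while the other cells in the same row are each affected through only one partner and keep a small median score. Multiplying any row of `X` by a constant leaves the scores unchanged, which mirrors the size-effect argument in the abstract.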

Department of Clinical Biochemistry, University Hospital Olomouc, Olomouc, Czech Republic

Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria

Laboratory of Metabolomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University Olomouc, Olomouc, Czech Republic
