Article

Record sourced from PubMed

Interpoint distance tests for high-dimensional comparison studies

Authors
Marozzi, Marco (Ca' Foscari University of Venice, Venice, Italy)
Mukherjee, Amitava (XLRI-Xavier School of Management, Jamshedpur, India)
Kalina, Jan (The Czech Academy of Sciences, Institute of Computer Science, Prague, Czech Republic)

Journal of applied statistics. 2020 ; 47 (4) : 653-665. [epub] 20190731

J Appl Stat
ISSN 0266-4763
Source

Status PubMed-not-MEDLINE; Language English; Country Great Britain, England; Medium electronic-ecollection

Document type journal articles

Persistent link https://www.medvik.cz/link/pmid35707487

Online full text

PubMed 35707487
PubMed Central PMC9042018
DOI 10.1080/02664763.2019.1649374
PII: 1649374
Knihovny.cz E-resources

Keywords
Multivariate data, biomedicine, genomics, nonparametric combination, nonparametric tests
Publication type
journal articles (MeSH)

Modern data collection techniques make it possible to analyze a very large number of endpoints. In biomedical research, for example, expressions of thousands of genes are commonly measured on only a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of a normal distribution is often questionable for high-dimensional data, and some variables may at the same time be highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means, because they do not assume that the data come from normally distributed populations and include tests that are distribution-free, unbiased, consistent, and computationally feasible, even if the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving the simultaneous comparison of means and variability, or of whole distribution shapes. The tests are shown to perform well in terms of power when the endpoints have complex dependence relations, such as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.
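
To make the idea of an interpoint-distance test concrete, here is a minimal Python sketch. It is not the tests proposed in the article: as a stand-in it uses the well-known energy statistic (twice the mean between-group distance minus the two mean within-group distances) with a permutation p-value. The function names, the Euclidean metric, the number of permutations, and the simulated data are all assumptions made for illustration. It shows why such tests remain feasible when endpoints far outnumber subjects: the data enter only through an n x n distance matrix, which is computed once and reused for every permutation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def distance_matrix(points):
    """Symmetric n x n matrix of pairwise distances between subjects."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(points[i], points[j])
    return dist


def energy_statistic(dist, idx_x, idx_y):
    """2 * mean(between) - mean(within X) - mean(within Y).

    Within-group means include the zero diagonal (V-statistic form).
    """
    def mean_block(rows, cols):
        total = sum(dist[i][j] for i in rows for j in cols)
        return total / (len(rows) * len(cols))

    return (2.0 * mean_block(idx_x, idx_y)
            - mean_block(idx_x, idx_x)
            - mean_block(idx_y, idx_y))


def permutation_test(x, y, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value) for samples x and y."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    dist = distance_matrix(pooled)  # computed once, reused below
    n_x = len(x)
    labels = list(range(len(pooled)))
    observed = energy_statistic(dist, labels[:n_x], labels[n_x:])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of subjects to groups
        if energy_statistic(dist, labels[:n_x], labels[n_x:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (1 + exceed) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    p = 200  # hypothetical setting: many endpoints, few subjects
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(8)]
    y = [[rng.gauss(1.0, 1.0) for _ in range(p)] for _ in range(8)]
    stat, p_value = permutation_test(x, y, n_perm=199)
    print(f"statistic={stat:.3f}, p-value={p_value:.3f}")
```

Because the permutation loop only re-indexes the stored distance matrix, its cost depends on the number of subjects and not on the number of endpoints; the dimension matters only in the single pass that builds the matrix.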




Latest 20 citations...

Testing exchangeability of multivariate distributions

Journal of applied statistics. 2023 ; 50 (15) : 3142-3156. [epub] 20220726

J Appl Stat
ISSN 0266-4763
Source
