Interpoint distance tests for high-dimensional comparison studies

. 2020 ; 47 (4) : 653-665. [epub] 20190731

Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Medium: electronic-ecollection

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid35707487

Modern data collection techniques make it possible to analyze a very large number of endpoints. In biomedical research, for example, the expression of thousands of genes is commonly measured on only a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of a normal distribution is often questionable for high-dimensional data, and some variables may at the same time be highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means because they do not assume that the data come from normally distributed populations, and they include tests that are distribution-free, unbiased, consistent, and computationally feasible even when the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving the simultaneous comparison of means and variability, or of whole distribution shapes. The tests are shown to perform well in terms of power when the endpoints have complex dependence relations, as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.
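
To make the general idea concrete, the following is a minimal sketch of a two-sample permutation test built on interpoint distances. It is not the new test proposed in the article, whose construction the abstract does not give; it swaps in a generic energy-type statistic (twice the mean between-group distance minus the two mean within-group distances) purely as an illustration. The function names, the Euclidean metric, and the default of 999 permutations are all assumptions made for this example.

```python
import numpy as np


def pairwise_distances(z):
    """Euclidean distances between all rows of z; returns an (n, n) matrix."""
    sq = np.sum(z * z, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    dist = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding
    np.fill_diagonal(dist, 0.0)
    return dist


def energy_statistic(dist, idx_x, idx_y):
    """Illustrative statistic: 2*mean(between) - mean(within x) - mean(within y)."""
    between = dist[np.ix_(idx_x, idx_y)].mean()
    within_x = dist[np.ix_(idx_x, idx_x)].mean()
    within_y = dist[np.ix_(idx_y, idx_y)].mean()
    return 2.0 * between - within_x - within_y


def interpoint_permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for H0: both samples share one distribution.

    x, y: arrays of shape (n_x, p) and (n_y, p); p may far exceed n_x + n_y.
    Hypothetical helper, assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n_x, n = x.shape[0], z.shape[0]
    # The n-by-n distance matrix is computed once, so the cost of each
    # permutation depends on the number of subjects, not on p.
    dist = pairwise_distances(z)
    observed = energy_statistic(dist, np.arange(n_x), np.arange(n_x, n))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)  # random relabelling of the subjects
        if energy_statistic(dist, perm[:n_x], perm[n_x:]) >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Usage: 10 subjects per group, 500 endpoints, mean shift in every endpoint.
rng = np.random.default_rng(1)
cases = rng.normal(loc=1.0, size=(10, 500))
controls = rng.normal(loc=0.0, size=(10, 500))
stat, p = interpoint_permutation_test(cases, controls)
print(f"statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```

Because the data enter only through the matrix of distances between subjects, the number of endpoints affects the one-off distance computation but not the permutation loop, which is one reason such tests stay feasible when endpoints greatly outnumber subjects.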



