Interpoint distance tests for high-dimensional comparison studies
Status: PubMed-not-MEDLINE
Language: English
Country: Great Britain, England
Medium: electronic-ecollection
Document type: journal articles
PubMed: 35707487
PubMed Central: PMC9042018
DOI: 10.1080/02664763.2019.1649374
PII: 1649374
- Keywords: multivariate data, biomedicine, genomics, nonparametric combination, nonparametric tests
- Publication type: journal articles (MeSH)
Modern data collection techniques make it possible to analyze a very large number of endpoints. In biomedical research, for example, the expression levels of thousands of genes are commonly measured on only a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of normality is often questionable for high-dimensional data, and some variables may be highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means. They do not assume that the data come from normally distributed populations, and they include tests that are distribution-free, unbiased, consistent, and computationally feasible even when the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving the simultaneous comparison of means and variability, or of whole distribution shapes. The tests are shown to perform well in terms of power when the endpoints have complex dependence relations, as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.
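The abstract does not spell out the statistics the authors propose. The following sketch only illustrates the general idea of an interpoint-distance test. It is a two-sample permutation test built on the energy statistic of Székely and Rizzo, a well-known interpoint-distance statistic that responds to differences anywhere in the distribution. It is not the paper's proposed test. The function names, the Euclidean metric and the default `n_perm=999` are assumptions made for illustration. The design choice to note is that the (n+m) x (n+m) distance matrix is computed once. Every permutation then only re-indexes it, so the cost of the permutation loop depends on the number of subjects and not on the number of endpoints p. This is one reason such tests stay feasible when p is much larger than n.

```python
import numpy as np


def distance_matrix(z):
    """Euclidean interpoint distances between all rows of z (shape n x p).

    Uses |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so memory is O(n^2), not O(n^2 p).
    """
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    d = np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative rounding errors
    np.fill_diagonal(d, 0.0)
    return d


def energy_from_dist(d, idx_x, idx_y):
    """Energy statistic (V-statistic form): 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    dxy = d[np.ix_(idx_x, idx_y)].mean()
    dxx = d[np.ix_(idx_x, idx_x)].mean()
    dyy = d[np.ix_(idx_y, idx_y)].mean()
    return 2.0 * dxy - dxx - dyy


def interpoint_permutation_test(x, y, n_perm=999, seed=0):
    """Hypothetical helper: permutation p-value for H0 'x and y share a distribution'."""
    pooled = np.vstack([x, y])
    d = distance_matrix(pooled)  # computed once, reused by every permutation
    n, total = x.shape[0], pooled.shape[0]
    labels = np.arange(total)
    observed = energy_from_dist(d, labels[:n], labels[n:])

    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)  # random relabelling of the pooled sample
        if energy_from_dist(d, perm[:n], perm[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive and valid
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, `interpoint_permutation_test(x, y, n_perm=199)` on two groups of 10 subjects with 500 endpoints each returns a small p-value when `y` is shifted relative to `x`. The energy statistic reacts to differences in location, scale or shape together. Tests aimed specifically at means, or at means and variability jointly, would need a different statistic or a combination of statistics.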
Ca' Foscari University of Venice, Venice, Italy
The Czech Academy of Sciences, Institute of Computer Science, Prague, Czech Republic