Model-averaged Bayesian t tests

2025 Jun; 32(3): 1007-1031. [Epub 20241107]

Language: English. Country: United States. Medium: print-electronic

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid39511109

Grant support
EP-C-17-017 EPA - United States
16.Vici.170.083, 451-17-017 Nederlandse Organisatie voor Wetenschappelijk Onderzoek
743086 UNIFY H2020 European Research Council

Links

PubMed 39511109
PubMed Central PMC12092555
DOI 10.3758/s13423-024-02590-5
PII: 10.3758/s13423-024-02590-5
Knihovny.cz e-resources

One of the most common statistical analyses in experimental psychology concerns the comparison of two means using the frequentist t test. However, frequentist t tests do not quantify evidence and require various assumption tests. Recently popularized Bayesian t tests do quantify evidence, but these were developed for scenarios where the two populations are assumed to have the same variance. As an alternative to both methods, we outline a comprehensive t test framework based on Bayesian model averaging. This new t test framework simultaneously takes into account models that assume equal and unequal variances, and models that use t-likelihoods to improve robustness to outliers. The resulting inference is based on a weighted average across the entire model ensemble, with higher weights assigned to models that predicted the observed data well. This new t test framework provides an integrated approach to assumption checks and inference by applying a series of pertinent models to the data simultaneously rather than sequentially. The integrated Bayesian model-averaged t tests achieve robustness without having to commit to a single model following a series of assumption checks. To facilitate practical applications, we provide user-friendly implementations in JASP and via the RoBTT package in R. A tutorial video is available at https://www.youtube.com/watch?v=EcuzGTIcorQ.
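The weighting idea in the abstract can be illustrated with a minimal sketch in Python. This is not the RoBTT or JASP implementation. The function names, the eight-model ensemble (effect absent or present, equal or unequal variances, normal or t likelihood), the equal prior model probabilities, and all numeric values are assumptions made up for illustration. Each model's posterior probability is proportional to its prior probability times its marginal likelihood, and the evidence for a component (here, the effect) is summarized by an inclusion Bayes factor.

```python
import math
from itertools import product


def posterior_model_probs(log_marglik, prior_probs):
    """Posterior model probabilities from log marginal likelihoods and priors."""
    log_unnorm = [l + math.log(p) for l, p in zip(log_marglik, prior_probs)]
    shift = max(log_unnorm)  # subtract the maximum for numerical stability
    weights = [math.exp(x - shift) for x in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_bayes_factor(post_probs, prior_probs, included):
    """Posterior inclusion odds divided by prior inclusion odds."""
    post_in = sum(p for p, inc in zip(post_probs, included) if inc)
    prior_in = sum(p for p, inc in zip(prior_probs, included) if inc)
    return (post_in / (1 - post_in)) / (prior_in / (1 - prior_in))


def model_averaged_estimate(estimates, post_probs):
    """Average of per-model estimates, weighted by posterior model probability."""
    return sum(e * p for e, p in zip(estimates, post_probs))


# Hypothetical ensemble: (effect present?, unequal variances?, t likelihood?)
models = list(product([False, True], repeat=3))
prior = [1 / len(models)] * len(models)

# Made-up log marginal likelihoods, one per model (illustration only).
log_ml = [-52.1, -51.8, -50.9, -50.7, -49.6, -49.9, -48.2, -48.5]
# Made-up per-model effect size estimates; fixed at 0 where the effect is absent.
delta = [0.45 if effect else 0.0 for effect, _, _ in models]

post = posterior_model_probs(log_ml, prior)
bf_effect = inclusion_bayes_factor(post, prior, [m[0] for m in models])
print("posterior model probabilities:", [round(p, 3) for p in post])
print("inclusion Bayes factor for the effect:", round(bf_effect, 2))
print("model-averaged effect size:", round(model_averaged_estimate(delta, post), 3))
```

Because the weights come from how well each model predicted the data, no single variance or likelihood assumption has to be chosen in advance. Models that predicted the data poorly contribute little to the averaged result.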

