Model-averaged Bayesian t tests
Language English Country United States Medium print-electronic
Document type journal articles, reviews
Grant support
EP-C-17-017
EPA - United States
CEP - Central Register of Projects
16.Vici.170.083, 451-17-017
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
743086 UNIFY
H2020 European Research Council
PubMed
39511109
PubMed Central
PMC12092555
DOI
10.3758/s13423-024-02590-5
PII: 10.3758/s13423-024-02590-5
- Keywords
- t-likelihood, t test, Bayes factor, Bayesian model-averaging, Robust inference, Unequal variances
- MeSH
- Bayes Theorem MeSH
- Psychology, Experimental * methods MeSH
- Data Interpretation, Statistical MeSH
- Humans MeSH
- Models, Statistical * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
One of the most common statistical analyses in experimental psychology concerns the comparison of two means using the frequentist t test. However, frequentist t tests do not quantify evidence and require various assumption tests. Recently popularized Bayesian t tests do quantify evidence, but these were developed for scenarios where the two populations are assumed to have the same variance. As an alternative to both methods, we outline a comprehensive t test framework based on Bayesian model averaging. This new t test framework simultaneously takes into account models that assume equal and unequal variances, and models that use t-likelihoods to improve robustness to outliers. The resulting inference is based on a weighted average across the entire model ensemble, with higher weights assigned to models that predicted the observed data well. This new t test framework provides an integrated approach to assumption checks and inference by applying a series of pertinent models to the data simultaneously rather than sequentially. The integrated Bayesian model-averaged t tests achieve robustness without having to commit to a single model following a series of assumption checks. To facilitate practical applications, we provide user-friendly implementations in JASP and via the RoBTT package in R. A tutorial video is available at https://www.youtube.com/watch?v=EcuzGTIcorQ.
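The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the RoBTT or JASP implementation. The 2 x 2 x 2 model grid, the function names, and the log marginal likelihood values are all hypothetical and serve only to show the arithmetic: posterior model probabilities are proportional to prior probability times marginal likelihood, and an inclusion Bayes factor compares posterior to prior odds for the subset of models that include an effect.

```python
from itertools import product
from math import exp, log


def posterior_model_probs(log_marglik, prior_probs):
    """Posterior model probabilities via Bayes' rule, computed in log space."""
    log_unnorm = [lm + log(p) for lm, p in zip(log_marglik, prior_probs)]
    m = max(log_unnorm)  # subtract the maximum to avoid underflow
    weights = [exp(v - m) for v in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_bayes_factor(post, prior, in_set):
    """Posterior inclusion odds divided by prior inclusion odds."""
    post_in = sum(p for p, flag in zip(post, in_set) if flag)
    prior_in = sum(p for p, flag in zip(prior, in_set) if flag)
    return (post_in / (1 - post_in)) / (prior_in / (1 - prior_in))


# Hypothetical ensemble: effect (no/yes) x variances (equal/unequal)
# x likelihood (normal/t). This grid is an assumption for illustration.
models = list(product([False, True], ["equal", "unequal"], ["normal", "t"]))

# Made-up log marginal likelihoods, one per model (NOT real results).
log_marglik = [-52.1, -51.8, -51.0, -50.9, -49.7, -49.5, -48.2, -48.0]

# Equal prior probability for every model in the ensemble.
prior = [1 / len(models)] * len(models)

post = posterior_model_probs(log_marglik, prior)
has_effect = [effect for effect, _, _ in models]
bf_effect = inclusion_bayes_factor(post, prior, has_effect)

for (effect, variances, likelihood), p in zip(models, post):
    print(f"effect={effect!s:5} variances={variances:7} likelihood={likelihood:6} P(M|data)={p:.3f}")
print(f"Inclusion Bayes factor for an effect: {bf_effect:.2f}")
```

Models that predicted the data better (higher marginal likelihood) receive higher posterior weight, so no single model has to be selected after a sequence of assumption checks.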
Department of Psychology University of Amsterdam Amsterdam The Netherlands
Department of Psychology University of Oslo Oslo Norway
Institute for Advanced Study University of Amsterdam Amsterdam Netherlands
Institute for Biodiversity and Ecosystem Dynamics University of Amsterdam Amsterdam Netherlands
Institute of Computer Science Czech Academy of Sciences Prague Czechia
KG Jebsen Centre for Neurodevelopmental Disorders University of Oslo Oslo Norway
Machine Learning Group CWI Amsterdam Amsterdam The Netherlands
NevSom Department of Rare Disorders Oslo University Hospital Oslo Norway