Large language models are proficient in solving and creating emotional intelligence tests
Status: PubMed-not-MEDLINE. Language: English. Country: Great Britain, England. Medium: electronic
Document type: journal article
PubMed: 40399566
PubMed Central: PMC12095572
DOI: 10.1038/s44271-025-00258-x
PII: 10.1038/s44271-025-00258-x
Large Language Models (LLMs) demonstrate expertise across diverse domains, yet their capacity for emotional intelligence remains uncertain. This research examined whether LLMs can solve and generate performance-based emotional intelligence tests. Results showed that ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3 outperformed humans on five standard emotional intelligence tests, achieving an average accuracy of 81%, compared to the 56% human average reported in the original validation studies. In a second step, ChatGPT-4 generated new test items for each emotional intelligence test. These new versions and the original tests were administered to human participants across five studies (total N = 467). Overall, original and ChatGPT-generated tests demonstrated statistically equivalent test difficulty. Perceived item clarity and realism, item content diversity, internal consistency, correlations with a vocabulary test, and correlations with an external ability emotional intelligence test were not statistically equivalent between original and ChatGPT-generated tests. However, all differences were smaller than Cohen's d ± 0.25, and none of the 95% confidence interval boundaries exceeded a medium effect size (d ± 0.50). Additionally, original and ChatGPT-generated tests were strongly correlated (r = 0.46). These findings suggest that LLMs can generate responses that are consistent with accurate knowledge about human emotions and their regulation.
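The equivalence claims in the abstract rest on two one-sided tests (TOST) with bounds expressed in Cohen's d units (the references cite the TOSTER R package for this). The Python sketch below is an illustrative re-implementation of that logic, not the authors' code: the data are simulated, and the sample sizes and mean accuracy are placeholder values loosely inspired by the abstract, not the study's actual data.

```python
import numpy as np
from scipy import stats

def tost_equivalence(x, y, d_bound=0.25, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    The equivalence bound is given in Cohen's d units and converted to the
    raw scale via the pooled standard deviation. Equivalence is concluded
    when both one-sided tests reject, i.e. when max(p_lower, p_upper) < alpha.
    """
    nx, ny = len(x), len(y)
    # pooled standard deviation across the two groups
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                 / (nx + ny - 2))
    bound = d_bound * sp                    # raw-scale equivalence margin
    se = sp * np.sqrt(1 / nx + 1 / ny)      # standard error of the mean difference
    df = nx + ny - 2
    diff = np.mean(x) - np.mean(y)
    # H0 (lower): diff <= -bound  vs  H1: diff > -bound
    p_lower = 1 - stats.t.cdf((diff + bound) / se, df)
    # H0 (upper): diff >= +bound  vs  H1: diff < +bound
    p_upper = stats.t.cdf((diff - bound) / se, df)
    p = max(p_lower, p_upper)               # overall TOST p-value
    return p, p < alpha

rng = np.random.default_rng(1)
# simulated proportion-correct scores for an original vs. a generated test form
original = rng.normal(0.56, 0.15, 240)
generated = rng.normal(0.56, 0.15, 227)
p, equivalent = tost_equivalence(original, generated, d_bound=0.25)
print(f"TOST p = {p:.4f}, equivalent within d = ±0.25: {equivalent}")
```

With a tighter bound such as d = ±0.10, the same samples would typically fail to establish equivalence at these sample sizes, which is why the abstract's choice of bound (±0.25, with CIs checked against ±0.50) matters for interpreting the "statistically equivalent" claims.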
Institute of Psychology Czech Academy of Sciences Brno Czech Republic
Institute of Psychology University of Bern Bern Switzerland
Swiss Center for Affective Sciences University of Geneva Geneva Switzerland
References
Niedenthal, P. & Brauer, M. Social functionality of human emotion. Annu. Rev. Psychol.63, 259–285 (2012). PubMed
Mayer, J. D., Caruso, D. R. & Salovey, P. The ability model of emotional intelligence: Principles and updates. Emot. Rev.8, 290–300 (2016).
Schlegel, K., Jong, M. de & Boros, S. Conflict management 101: How emotional intelligence can make or break a manager. Int. J. Confl. Manag.36, 145–165 (2025).
Picard, R. W. Affective Computing. (The MIT Press, Cambridge, 1997).
Schuller, D. & Schuller, B. W. The age of artificial emotional intelligence. Computer51, 38–46 (2018).
Marcos-Pablos, S. & García-Peñalvo, F. J. Emotional intelligence in robotics: A scoping review. in New Trends in Disruptive Technologies, Tech Ethics and Artificial Intelligence (eds. de Paz Santana, J. F., de la Iglesia, D. H. & López Rivero, A. J.) 66–75 (Springer, Cham, 2022).
Abdollahi, H., Mahoor, M. H., Zandie, R., Siewierski, J. & Qualls, S. H. Artificial emotional intelligence in socially assistive robots for older adults: A pilot study. IEEE Trans. Affect. Comput.14, 2020–2032 (2023). PubMed PMC
Mejbri, N., Essalmi, F., Jemni, M. & Alyoubi, B. A. Trends in the use of affective computing in e-learning environments. Educ. Inf. Technol.27, 3867–3889 (2022).
Quaquebeke, N. V. & Gerpott, F. H. The now, new, and next of digital leadership: how artificial intelligence (AI) will take over and change leadership as we know it. J. Leadersh. Organ. Stud.30, 265–275 (2023).
Bubeck, S. et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Preprint at 10.48550/arXiv.2303.12712 (2023).
Nature Human Behaviour. Living in a brave new AI era. Nat. Hum. Behav.7, 1799 (2023). PubMed
Hagendorff, T. Deception abilities emerged in large language models. Proc. Natl. Acad. Sci.121, e2317967121 (2024). PubMed PMC
Nakadai, R., Nakawake, Y. & Shibasaki, S. AI language tools risk scientific diversity and innovation. Nat. Hum. Behav.7, 1804–1805 (2023). PubMed
Suzuki, S. We need a culturally aware approach to AI. Nat. Hum. Behav.7, 1816–1817 (2023). PubMed
Cao, X. & Kosinski, M. Large language models know how the personality of public figures is perceived by the general public. Sci. Rep.14, 6735 (2024). PubMed PMC
Kosinski, M. Evaluating large language models in theory of mind tasks. Proc. Natl. Acad. Sci.121, e2405460121 (2024). PubMed PMC
Strachan, J. W. A. et al. Testing theory of mind in large language models and humans. Nat. Hum. Behav.8, 1285–1295 (2024). PubMed PMC
Ayers, J. W. et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med.183, 589–596 (2023). PubMed PMC
Inzlicht, M., Cameron, C. D., D’Cruz, J. & Bloom, P. In praise of empathic AI. Trends Cogn. Sci.28, 89–91 (2024). PubMed
Perry, A. AI will never convey the essence of human empathy. Nat. Hum. Behav.7, 1808–1809 (2023). PubMed
Mortillaro, M. & Schlegel, K. Embracing the emotion in emotional intelligence measurement: Insights from emotion theory and research. J. Intell.11, 210 (2023). PubMed PMC
Lane, R. D., Quinlan, D. M., Schwartz, G. E., Walker, P. A. & Zeitlin, S. B. The Levels of Emotional Awareness Scale: A cognitive-developmental measure of emotion. J. Pers. Assess.55, 124–134 (1990). PubMed
Elyoseph, Z., Hadar-Shoval, D., Asraf, K. & Lvovsky, M. ChatGPT outperforms humans in emotional awareness evaluations. Front. Psychol.14, 1199058 (2023). PubMed PMC
MacCann, C. & Roberts, R. D. New paradigms for assessing emotional intelligence: Theory and data. Emotion8, 540–551 (2008). PubMed
Schlegel, K. & Scherer, K. R. The nomological network of emotion knowledge and emotion understanding in adults: evidence from two new performance-based tests. Cogn. Emot.32, 1514–1530 (2018). PubMed
Schlegel, K. & Mortillaro, M. The Geneva Emotional Competence Test (GECo): An ability measure of workplace emotional intelligence. J. Appl. Psychol.104, 559–580 (2019). PubMed
Gandhi, K., Fränken, J.-P., Gerstenberg, T. & Goodman, N. D. Understanding Social Reasoning in Language Models with Language Models. Preprint at 10.48550/arXiv.2306.15448 (2023).
Jiang, H. et al. PersonaLLM: Investigating the ability of large language models to express personality traits. Preprint at 10.48550/arXiv.2305.02547 (2024).
Milička, J. et al. Large language models are able to downplay their cognitive abilities to fit the persona they simulate. PLOS ONE19, e0298522 (2024). PubMed PMC
Roseman, I. J. A model of appraisal in the emotion system: Integrating theory, research, and applications. in Appraisal Processes in Emotion: Theory, Methods, Research (eds. Scherer, K. R., Schorr, A. & Johnstone, T.) 68–91 (Oxford University Press, New York, 2001).
Scherer, K. R. Component models of emotion can inform the quest for emotional competence. in The Science of Emotional Intelligence: Knowns and Unknowns (eds. Matthews, G., Zeidner, M. & Roberts, R. D.) 101–126 (Oxford University Press, New York, 2007).
Fontaine, J. J. R., Scherer, K. R. & Soriano, C. (eds.) Components of Emotional Meaning: A Sourcebook. (Oxford University Press, New York, 2013).
Garnefski, N., Kraaij, V. & Spinhoven, P. Negative life events, cognitive emotion regulation and emotional problems. Personal. Individ. Differ.30, 1311–1327 (2001).
Thomas, K. W. Conflict and conflict management: Reflections and update. J. Organ. Behav.13, 265–274 (1992).
Allen, V. D., Weissman, A., Hellwig, S., MacCann, C. & Roberts, R. D. Development of the situational test of emotional understanding – brief (STEU-B) using item response theory. Personal. Individ. Differ.65, 3–7 (2014).
Allen, V. et al. The Situational Test of Emotional Management – Brief (STEM-B): Development and validation using item response theory and latent class analysis. Personal. Individ. Differ.81, 195–200 (2015).
Vermeiren, H., Vandendaele, A. & Brysbaert, M. Validated tests for language research with university students whose native language is English: Tests of vocabulary, general knowledge, author recognition, and reading comprehension. Behav. Res. Methods55, 1036–1068 (2023). PubMed
Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods39, 175–191 (2007). PubMed
Cohen, J. Statistical Power Analysis for the Behavioral Sciences. 2nd edn (Routledge, New York, 1988).
Caldwell, A. R. Exploring equivalence testing with the updated TOSTER R package. Preprint at 10.31234/osf.io/ty8de (2022).
Olderbak, S., Semmler, M. & Doebler, P. Four-branch model of ability emotional intelligence with fluid and crystallized intelligence: a meta-analysis of relations. Emot. Rev.11, 166–183 (2019).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol.57, 289–300 (1995).
Viechtbauer, W. Conducting Meta-Analyses in R with the metafor Package. J. Stat. Softw.36, 1–48 (2010).
Montemayor, C., Halpern, J. & Fairweather, A. In principle obstacles for empathic AI: why we can’t replace human empathy in healthcare. AI Soc.37, 1353–1359 (2022). PubMed PMC
Joseph, D. L. & Newman, D. A. Emotional intelligence: An integrative meta-analysis and cascading model. J. Appl. Psychol.95, 54–78 (2010). PubMed
Simpson, J. A. et al. Attachment and the management of empathic accuracy in relationship-threatening situations. Pers. Soc. Psychol. Bull.37, 242–254 (2011). PubMed PMC
Drollinger, T., Comer, L. B. & Warrington, P. T. Development and validation of the active empathetic listening scale. Psychol. Mark.23, 161–180 (2006).
Ullman, T. Large language models fail on trivial alterations to Theory-of-Mind tasks. Preprint at 10.48550/arXiv.2302.08399 (2023).
Mesquita, B. & Schouten, A. Culture and emotion regulation. in Handbook of Emotion Regulation (eds. Gross, J. J. & Ford, B. Q.) 218–224 (3rd edn, The Guilford Press, New York, 2024).
Scherer, K. R., Clark-Polner, E. & Mortillaro, M. In the eye of the beholder? Universality and cultural specificity in the expression and perception of emotion. Int. J. Psychol.46, 401–435 (2011). PubMed
Shao, B., Doucet, L. & Caruso, D. R. Universality versus cultural specificity of three emotion domains: some evidence based on the cascading model of emotional intelligence. J. Cross-Cult. Psychol.46, 229–251 (2015).
Choudhury, M. Generative AI has a language problem. Nat. Hum. Behav.7, 1802–1803 (2023). PubMed
Yuan, H. et al. The high dimensional psychological profile and cultural bias of ChatGPT. Preprint at 10.48550/arXiv.2405.03387 (2024).
Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I. & Atkinson, P. M. Explainable artificial intelligence: an analytical review. WIREs Data Min. Knowl. Discov.11, e1424 (2021).
Yiu, E., Kosoy, E. & Gopnik, A. Transmission versus truth, imitation versus innovation: what children can do that large language and language-and-vision models cannot (yet). Perspect. Psychol. Sci.19, 874–883 (2023). PubMed PMC
Sommer, N., Schlegel, K. & Mortillaro, M. The use of generative artificial intelligence in the creation of emotional intelligence tests. https://osf.io/mgqre/ (2023).
Diedenhofen, B. & Musch, J. cocron: A web interface and R package for the statistical comparison of Cronbach’s alpha coefficients. Int. J. Internet Sci.11, 51–60 (2016).
Goh, J. X., Hall, J. A. & Rosenthal, R. Mini meta-analysis of your own studies: Some arguments on why and a primer on how. Soc. Personal. Psychol. Compass10, 535–549 (2016).
Lee, I. A. & Preacher, K. J. Calculation for the test of the difference between two dependent correlations with one variable in common. Computer Software at https://quantpsy.org/corrtest/corrtest2.htm (2013).