Large language models are able to downplay their cognitive abilities to fit the persona they simulate

2024;19(3):e0298522. [epub] 20240313

Language: English · Country: United States · Medium: electronic-ecollection

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid38478522

This study explores the capabilities of large language models to replicate the behavior of individuals with underdeveloped cognitive and language skills. Specifically, we investigate whether these models can simulate child-like language and cognitive development while solving false-belief tasks, namely change-of-location and unexpected-content tasks. OpenAI's GPT-3.5-turbo and GPT-4 models were prompted to simulate children (N = 1296) aged one to six years. The simulation was instantiated through three types of prompts: plain zero-shot, chain-of-thought, and primed-by-corpus. We evaluated the correctness of responses to assess the models' capacity to mimic the cognitive skills of the simulated children. Both models displayed correctness and language complexity that increased with the simulated age, corresponding to the gradual enhancement of linguistic and cognitive abilities described in the extensive research literature on child development. GPT-4 generally aligned more closely with the developmental curve observed in 'real' children, but it displayed hyper-accuracy under certain conditions, notably in the primed-by-corpus prompt type. Task type, prompt type, and the choice of language model influenced developmental patterns, while temperature and the gender of the simulated parent and child did not consistently affect results. Analyses of linguistic complexity, examining utterance length and Kolmogorov complexity, revealed a gradual increase in complexity with the age of the simulated children, regardless of other variables. These findings show that language models can downplay their abilities to achieve a faithful simulation of prompted personas.
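The abstract describes the pipeline only at a high level: a model is prompted to role-play a child of a given age on a false-belief task, and each reply is then scored for linguistic complexity via utterance length and Kolmogorov complexity. The following is a minimal sketch of those two mechanical steps, using the OpenAI Chat API and a DEFLATE-compression proxy for Kolmogorov complexity; the prompt wording, the model identifier, the example task, and the helper names are illustrative assumptions, not the authors' published materials.

```python
# Hedged sketch (not the study's released code): simulate a child persona with the
# OpenAI Chat API and score the reply's linguistic complexity.
import zlib

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def simulate_child(age_years: int, task_text: str, temperature: float = 0.7) -> str:
    """Ask the model to answer a false-belief task while role-playing a child of a given age."""
    system_prompt = (
        f"You are a {age_years}-year-old child. Answer the way a typical child of that age "
        f"would speak, with age-appropriate vocabulary and reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # assumed deployment name; GPT-3.5-turbo would be swapped in analogously
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task_text},
        ],
    )
    return response.choices[0].message.content


def utterance_length(utterance: str) -> int:
    """Crude utterance length in word tokens (a stand-in for MLU-style measures)."""
    return len(utterance.split())


def compression_complexity(utterance: str) -> int:
    """Approximate Kolmogorov complexity by the DEFLATE-compressed size in bytes."""
    return len(zlib.compress(utterance.encode("utf-8")))


if __name__ == "__main__":
    # Illustrative change-of-location (Sally-Anne-style) task, not the study's exact wording.
    task = (
        "Sally puts her marble in the basket and leaves. Anne moves the marble to the box. "
        "Where will Sally look for her marble when she comes back?"
    )
    for age in range(1, 7):
        reply = simulate_child(age, task)
        print(age, utterance_length(reply), compression_complexity(reply), repr(reply))
```

Exact Kolmogorov complexity is uncomputable, so compressed size under a standard algorithm such as DEFLATE (LZ77 plus Huffman coding) is a common practical stand-in; longer and more varied utterances compress to more bytes, which is the gradient the age-related complexity analysis relies on.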

