Beyond English: Considering Language and Culture in Psychological Text Analysis
Status: PubMed-not-MEDLINE
Language: English
Country: Switzerland
Media: electronic-ecollection
Document type: Journal Article
PubMed: 35310262
PubMed Central: PMC8931497
DOI: 10.3389/fpsyg.2022.819543
Keywords: LIWC, closed-vocabulary approaches, cross-language, culture, natural language processing
Publication type: Journal Article
This paper discusses the role of language and culture in quantitative text analysis in psychological research. It reviews current automatic text analysis methods and approaches in light of the unique challenges that can arise when research moves beyond the default English language. Special attention is paid to closed-vocabulary approaches and related methods, Linguistic Inquiry and Word Count (LIWC) in particular, from two perspectives: that of cross-cultural research, where the analytic process inherently consists of comparing phenomena across cultures and languages, and that of generalizability beyond the language and cultural focus of the original investigation. We highlight the need for a more universal and flexible theoretical and methodological grounding of current research, one that takes into account the linguistic, cultural, and situational specifics of communication, and we provide suggestions for procedures that future studies can implement to facilitate psychological text analysis across languages and cultures.