The Indo-European Cognate Relationships dataset
Language: English
Country: United Kingdom, England
Medium: electronic
Document type: journal article, dataset
PubMed: 40897732
PubMed Central: PMC12405575
DOI: 10.1038/s41597-025-05445-3
PII: 10.1038/s41597-025-05445-3
Knihovny.cz E-resources
- MeSH
  - Language * MeSH
  - Humans MeSH
  - Linguistics * MeSH
- Check Tag
  - Humans MeSH
- Publication type
  - Journal Article MeSH
  - Dataset MeSH
- Geographic names
  - Europe MeSH
The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words ('cognates') pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in the core lexicon and contains 25,731 lexeme entries, analysed into 4,981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer (borrowing between languages). All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time-calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.
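The abstract describes IE-CoR as a relational dataset in which lexemes are analysed into cognate sets and published in CLDF. One common way to prepare such data for phylogenetic work is to recode it as a binary matrix of languages against cognate sets. The Python sketch below shows that step under stated assumptions. It uses the default column names of the CLDF Wordlist FormTable and CognateTable (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Form_ID`, `Cognateset_ID`). It reads tiny inline toy tables rather than IE-CoR's actual files. It also ignores any extra columns IE-CoR may use, for example for its horizontal-transfer coding. The helper names `read_table` and `cognate_matrix` are hypothetical. With the real files, a dedicated reader such as the pycldf package would be the more likely choice.

```python
import csv
import io
from collections import defaultdict

# Toy CLDF-style tables (assumption: default CLDF Wordlist column names).
# The rows are a simplified illustration, NOT data taken from IE-CoR.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,english,hand,hand
f2,german,hand,Hand
f3,french,hand,main
f4,english,water,water
f5,german,water,Wasser
f6,french,water,eau
f7,dutch,hand,hand
"""

COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,hand-1
c2,f2,hand-1
c3,f3,hand-2
c4,f4,water-1
c5,f5,water-1
c6,f6,water-2
c7,f7,hand-1
"""


def read_table(text):
    """Hypothetical helper: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def cognate_matrix(forms, cognates):
    """Hypothetical helper: build a binary language x cognate-set matrix.

    A cell is 1 if the language has a form in that cognate set, 0 if it has
    data for the set's meaning but no form in the set, and None if it has no
    data for that meaning at all (missing rather than absent).
    """
    form_by_id = {row["ID"]: row for row in forms}

    # Meanings for which each language has at least one lexeme.
    meanings_by_lang = defaultdict(set)
    for row in forms:
        meanings_by_lang[row["Language_ID"]].add(row["Parameter_ID"])

    # Each cognate set belongs to one meaning; record it and its languages.
    set_meaning = {}
    set_langs = defaultdict(set)
    for row in cognates:
        form = form_by_id[row["Form_ID"]]
        set_meaning[row["Cognateset_ID"]] = form["Parameter_ID"]
        set_langs[row["Cognateset_ID"]].add(form["Language_ID"])

    sets = sorted(set_meaning)
    matrix = {}
    for lang in sorted(meanings_by_lang):
        matrix[lang] = [
            (1 if lang in set_langs[s] else 0)
            if set_meaning[s] in meanings_by_lang[lang]
            else None
            for s in sets
        ]
    return sets, matrix


if __name__ == "__main__":
    sets, matrix = cognate_matrix(read_table(FORMS_CSV), read_table(COGNATES_CSV))
    print(sets)  # ['hand-1', 'hand-2', 'water-1', 'water-2']
    for lang, row in matrix.items():
        print(lang, row)
    # dutch [1, 0, None, None]
    # english [1, 0, 1, 0]
    # french [0, 1, 0, 1]
    # german [1, 0, 1, 0]
```

Coding a cell as `None` when a language has no entry for a meaning keeps missing data distinct from a genuine absence (0). This matters because the two carry different information when trees are inferred from the matrix.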
ATILF, 44 avenue de la Libération, B.P. 30687, 54063 Nancy, France
Consultant, Language Development, FLI, 2 Block 19, G-8 Markaz, Islamabad 44000, Pakistan
Department of English Language and Literature, Persian Gulf University, Bushehr 7516913817, Iran
Department of English, Shahrekord University, Shahrekord, Iran
Department of General Linguistics, University of Bamberg, Schillerplatz 17, 96047 Bamberg, Germany
Department of German, Nordic, and Slavic, University of Wisconsin-Madison, Madison, Wisconsin, USA
Department of Language Science and Technology, Saarland University, 66123 Saarbrücken, Germany
Department of Linguistics, Alzahra University, Vanak, Tehran, Iran
Department of Linguistics and Philology, Uppsala University, Box 635, 751 26 Uppsala, Sweden
Department of Linguistics, Ghent University, Blandijnberg 2, 9000 Ghent, Belgium
Department of Nordic Studies and Linguistics, University of Copenhagen, 2300 København S, Denmark
Directorate of Higher Education Colleges, Muzaffarabad, Azad Jammu and Kashmir, Pakistan
Faclair na Gàidhlig, Sabhal Mòr Ostaig, Sleat, Isle of Skye IV44 8RQ, UK
Faculty of Asian and Middle Eastern Studies, University of Oxford, Pusey Lane, Oxford OX1 2LE, UK
Faculty of Education, Free University of Bozen-Bolzano, Regensburger Allee 16, 39042 Brixen, Italy
Forum for Language Initiatives, P.O. Box No. 763, Islamabad, Pakistan
Heidelberg Academy of Sciences and Humanities, Karlstraße 4, 69117 Heidelberg, Germany
Independent scholar, Berkeley, USA
Independent scholar, Berlin, Germany
Independent scholar, Bratislava, Slovakia
Independent scholar, Brussels, Belgium
Independent scholar, Cottonwood, Arizona, USA
Independent scholar, Grand Forks, USA
Independent scholar, Leipzig, Germany
Independent scholar, Lund, Sweden
Independent scholar, Östra Ämtervik, Sweden
Independent scholar, Skopje, North Macedonia
Institut für Linguistik, Philologische Fakultät, Universität Leipzig, 04081 Leipzig, Germany
Institute of Modern Greek Studies, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Leiden University Centre for Linguistics, Postbus 9515, 2300 RA Leiden, Netherlands
Max Planck Institute of Geoanthropology, Kahlaische Strasse 10, 07745 Jena, Germany
Middle Eastern Languages and Cultures, UC Berkeley, 250 Social Sciences Building, Berkeley, CA 94720, USA
NCCR Evolving Language, Affolternstrasse 56, 8050 Zürich, Switzerland
Radboud University, Houtlaan 4, 6525 XZ Nijmegen, Netherlands
Ruhr-Universität Bochum, CERES Center of Religious Studies, Universitätsstr. 90a, 44789 Bochum, Germany
Saxon Academy of Sciences and Humanities, Karl-Tauchnitz-Straße 1, 04107 Leipzig, Germany
School of English, University of Nottingham, University Park, Nottingham NG7 2RD, UK
School of Humanities, University of Westminster, 309 Regent Street, London W1B 2HW, UK
School of Psychology, University of Auckland, 23 Symonds St, Auckland 1010, New Zealand
SOAS University of London, Thornhaugh Street, Russell Square, London WC1H 0XG, UK
Surrey Morphology Group, University of Surrey, Guildford, Surrey GU2 7XH, UK
UiT The Arctic University of Norway, Postboks 6050 Langnes, 9037 Tromsø, Norway
Université Rennes 2, Place du recteur Henri Le Moal, CS 24307, 35043 Rennes, France
University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus
University of Ilam, Ilam Province, Ilam, Pazhohesh Blvd, Iran
University of New South Wales, Sydney, NSW 2052, Australia