Grambank shows diversity of the world’s languages

20 April 2023
 Grammatical similarity in the Grambank sample of languages. The color coding represents the distribution of languages according to the first three principal components (PCs) mapped onto RGB color space (PC1, red; PC2, green; PC3, blue). Similarity in color indicates similarity in grammatical structure on the first three dimensions. See fig. S15 for loading of Grambank features on the first two components and fig. S16 for correlation with theoretical metrics.
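The colour coding described in the caption can be illustrated with a small sketch. It assumes each language already has scores on the first three principal components, and it rescales each component to the 0-255 range of one colour channel. The min-max scaling, the function name `pcs_to_rgb` and the example scores are assumptions made for illustration; the study may have scaled the components differently.

```python
def pcs_to_rgb(scores):
    """Map rows of (PC1, PC2, PC3) scores to (R, G, B) integers in 0..255.

    Min-max scaling per component is an assumption made for illustration,
    not necessarily the scaling used for the published map.
    """
    columns = list(zip(*scores))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]

    colors = []
    for row in scores:
        rgb = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A component that never varies carries no information,
            # so it is placed at the middle of the channel.
            fraction = 0.5 if span == 0 else (value - low) / span
            rgb.append(round(fraction * 255))
        colors.append(tuple(rgb))
    return colors


# Made-up scores for three hypothetical languages (not Grambank data).
example_scores = [
    (-2.0, 0.5, 1.0),
    (0.0, -0.5, 1.0),
    (2.0, 0.0, 1.0),
]
example_colors = pcs_to_rgb(example_scores)
```

Under this scheme, two languages with nearby scores on all three components receive nearby colours, which is what lets similar grammars appear as similar hues on the map.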
What shapes the structure of languages? The Grambank database, initiated by the Max Planck Institute (MPI) for Psycholinguistics in Nijmegen, the Netherlands, and the MPI for Evolutionary Anthropology in Leipzig, Germany, documents the grammatical structure of a third of the world’s 7000 languages. In a study in Science Advances, an international team of researchers reports that grammatical structure is highly flexible across languages, shaped by common ancestry, constraints on cognition and usage, and language contact.

Linguists have long been interested in language variation. What are common or universal patterns across languages? What limits the possible variation between them? Grambank, the largest and most comprehensive database of language structure in the world, enables researchers to answer some of these questions. Grambank was constructed in an international collaboration between the Max Planck institutes, the Australian National University in Canberra, Australia, and over a hundred scholars from around the world.

“The design of the database required much recoding as we went along, in order to encompass the many different solutions that languages have evolved to code essential properties”, says Stephen Levinson, director emeritus of the Max Planck Institute for Psycholinguistics in Nijmegen and one of the founders of the Grambank project.

Limits on variation

The team settled on 195 grammatical properties, ranging from word order to whether or not a language has gendered pronouns. For instance, many languages have separate pronouns for ‘he’ and ‘she’, but some also have male and female versions of ‘I’ or ‘you’. If grammatical properties were to vary freely, the possible ‘design space’ would be enormous.
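To give a rough sense of that scale, the sketch below counts the combinations that would exist if each of the 195 properties could independently take one of two values. Treating every property as binary is a simplifying assumption made here, not a description of how Grambank codes its features, and the function name is illustrative.

```python
def design_space_size(num_properties, values_per_property=2):
    """Count the combinations possible when every property varies freely.

    Assumes the properties are independent and that each one has the same
    number of possible values (two by default); both are simplifications.
    """
    return values_per_property ** num_properties


NUM_PROPERTIES = 195  # number of grammatical properties given in the article

size = design_space_size(NUM_PROPERTIES)

# The number of digits minus one gives the power of ten.
order_of_magnitude = len(str(size)) - 1

print(f"Roughly 10^{order_of_magnitude} possible combinations")
```

Even under this conservative assumption, the number of conceivable grammars dwarfs the roughly 7000 languages spoken today, so any clustering observed in the data reflects real limits on variation rather than a shortage of options.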

Limits on variation could be related to cognitive principles, rooted in memory or learning, rendering some grammatical structures more likely than others. Limits could also be related to historical ‘accidents’, such as descent from a common language or contact with other languages.

The researchers discovered that there is much greater flexibility in the combination of grammatical features than many theorists have assumed. “Languages are free to vary considerably in quantifiable ways, but not without limits”, explains Levinson. “A sign of the extraordinary diversity of the 2400 languages in our sample is that only five of them occupy the same location in design space (share the same grammatical properties).”

Languages show much stronger similarity to those with a common ancestor than to those they happen to be in contact with. “Nevertheless, if processes of linguistic evolution and diversification were run again from the beginning, there would still be some resemblance to what we now have”, Levinson says.

Diversity under threat

“The extraordinary diversity of languages is one of humanity’s greatest cultural endowments”, concludes Levinson. “This diversity is under threat, especially in some areas like Northern Australia, part of South America or Northern America. With that impending loss will come loss of our scientific understanding of the role of language in mind and culture.”

The database is a comprehensive, open-access resource maintained by the Max Planck Society and will encourage future explorations of linguistic diversity. According to Levinson, “It puts linguistics on an even footing with genetics, archaeology and anthropology, allowing for explorations that connect such large databases.”


