
This content is archived and may be outdated.

Projects on language documentation and data mining - Home

This is not an official institute project with a single major theme, but rather a selection of smaller projects, mostly financed by third-party funds, which focus either on (i) the documentation of endangered or under-described languages, or on (ii) the exploitation of this kind of data for linguistic theory and analysis. These projects thus contribute to the infrastructure for the language sciences, and illustrate some of the uses to which it can be put.
Stephen Levinson
Peter Wittenburg

Documentation teams

1. Yurakaré (Bolivian isolate). Rik van Gijn, Sonja Gipper (PhD student), Vincent Hirtzel (Collège de France). Funded by Volkswagen.
2. ǂAkhoe Haiǁom (Khoisan, Namibia). Thomas Widlok (U. Durham & MPI), Gertie Hoymann (PhD student), Christian Rapold (Zürich). Funded by Volkswagen.
3. Semang (six Aslian languages: Kensiw, Kintaq, Jahai, Menriq, Batek, Lanoh). Niclas Burenhult, Claudia Wegener and Neele Becker. Funded by Volkswagen.
4. Five Paman languages of Cape York, Australia (Kugu Muminh, Kuku Thaypan, Umbuygamu, Umpila, and Wik Ngathan). Clair Hill (PhD student), Jean-Christophe Verstraete (U. Leuven), Peter Sutton (Adelaide), Alice Gaby (Berkeley). Funded by the Hans Rausing Endangered Languages Project.

Other MPI Nijmegen documentation projects:

5. Yélî Dnye (Papuan, Papua New Guinea): Steve Levinson
6. Karii (Vietic sub-branch of Austroasiatic, Laos) & Lao: Nick Enfield
7. Savosavo (Papuan, Solomons): Claudia Wegener (PhD student)
8. Rotokas (Papuan, Bougainville, PNG): Stuart Robinson (PhD student)
9. Tzeltal (Mayan, Mexico): Penelope Brown
10. Siwu (Ghana-Togo Mountain, Ghana): Mark Dingemanse (PhD student)
11. Kata Kolok (Balinese sign language): Connie de Vos (PhD student)
12. Kilivila (Austronesian, PNG): Gunter Senft
13. Zapotec (Otomanguean, Mexico): Mark Sicoli
14. Semai (Aslian, Malaysia): Sylvia Tufvesson (PhD student)
15. Warlpiri (Pama-Nyungan, Australia): Carmel O'Shannessy (PhD student)
16. Lowland Chontal (isolate, Mexico): Loretta O’Connor
17. Arrernte (Pama-Nyungan, Australia): Jenny Green (PhD student, Melbourne & MPI)

Data-mining teams

18. Sahul Project (typology and cladistics of New Guinea and Australia): Michael Dunn, Ger Reesink, Ruth Singer, Miriam van Staden, Pieter Muysken & Steve Levinson. Funded by NWO.
19. Marquesan Lexicon project: Gaby Cablitz, Jacquelijn Ringersma. Funded by Volkswagen.
20. DOBES exploitation tools: Peter Wittenburg, Claus Zinn, Paul Trilsbeek, et al. Funded by EC and Volkswagen.





Last checked 2015-12-02 by Jacquelijn Ringersma
About MPI

This is the MPI

The Max Planck Institute for Psycholinguistics is an institute of the German Max Planck Society. Our mission is to undertake basic research into the psychological, social and biological foundations of language. The goal is to understand how our minds and brains process language, how language interacts with other aspects of mind, and how we can learn languages of quite different types.

The institute is situated on the campus of Radboud University. We participate in the Donders Institute for Brain, Cognition and Behaviour, and have particularly close ties to that institute's Centre for Cognitive Neuroimaging. We also participate in the Centre for Language Studies (CLS). A joint graduate school, the IMPRS in Language Sciences, links the Donders Institute, the CLS and the MPI.

