The MPI archives & databases

Below you will find an overview of the data archives and language corpora which can be accessed through the MPI.

The DoBeS archive

The MPI for Psycholinguistics hosts the central digital archive for the DoBeS endangered languages documentation programme. There are currently more than 6,000 languages in the world, but many of them are in danger of becoming extinct in the near future. The DoBeS (Dokumentation Bedrohter Sprachen) programme was launched in 2000 by the German Volkswagen Foundation to document some of these languages before it is too late. To date, approximately 50 projects have been funded to document one or more endangered languages in various parts of the world. Each project documents a language in its cultural setting, producing annotated video and audio recordings, photographs, and texts. All of this material is available in the online archive, although permission is required to view many of the resources.

The NGT corpus

The NGT corpus is a collection of data from deaf signers using Sign Language of the Netherlands (NGT; Nederlandse Gebarentaal). The corpus consists of recordings made with multiple synchronised video cameras, accompanied by gloss and translation annotations. All data are freely accessible to researchers and the general public. The project is led by Onno Crasborn, Inge Zwitserlood, and Johan Ros at Radboud University. The data are stored in the MPI archive for linguistic resources.

Database of Dutch diphone perception

The database is described in detail in Smits, R., Warner, N., McQueen, J. M., & Cutler, A. (2003). Unfolding of phonetic information over time: A database of Dutch diphone perception. Journal of the Acoustical Society of America, 113, 563-574.

The Fromkin Speech Error Database

The data in the Fromkin Speech Error Database were collected over many years. They were converted to a machine-readable format at UCLA, with support from a National Science Foundation grant awarded to Professor Victoria A. Fromkin.

When Vicki Fromkin died in January 2000, the wider availability of the database was in doubt because the software format used to encode it was no longer supported. A plan to rescue the database was developed by Anne Cutler, Caroline Henton, Peter Ladefoged, Sieb Nooteboom, Carson Schutze, and Stefanie Shattuck-Hufnagel. The database was then converted to XML, the format of the current version, by Hansje Braam under the supervision of Sieb Nooteboom, with financial support from the Max Planck Society.

UCLA is currently developing the Speech Error Database further. The new system will include a web-based search with a menu-driven interface for entering new slips, verification and correction of the UCLASEC codings, improved documentation, and an online bibliography. Once the new version is available, information on how to access it will be posted here. In the meantime, please contact Carson Schutze (cschutze [at] …) if you have any questions.

The database is available here. To download a selection of errors, click the Download button. Downloading more than 500 entries at once is not recommended.

The Stern diaries

Clara and William Stern kept a diary (Tagebuch) about the psychological development of their three children, Hilde, Günther, and Eva, born in 1900, 1902, and 1904 respectively. Their first major publication based on these diaries was "Die Kindersprache" (1907), which became a classic text in the field of language acquisition. The diaries themselves, however, were long difficult to access. Werner Deutsch of the Max Planck Institute for Psycholinguistics in Nijmegen, with Eva's unremitting support from Jerusalem, had all the diaries transcribed, and they have since been available digitally through the MPI.

Please read the full introduction by Pim Levelt.

The diaries are available via the Language Archive.
