You are here: Home Publications The Nijmegen corpus of casual Czech

The Nijmegen corpus of casual Czech

Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available
About MPI

This is the MPI

The Max Planck Institute for Psycholinguistics is an institute of the German Max Planck Society. Our mission is to undertake basic research into the psychological,social and biological foundations of language. The goal is to understand how our minds and brains process language, how language interacts with other aspects of mind, and how we can learn languages of quite different types.

The institute is situated on the campus of the Radboud University. We participate in the Donders Institute for Brain, Cognition and Behaviour, and have particularly close ties to that institute's Centre for Cognitive Neuroimaging. We also participate in the Centre for Language Studies. A joint graduate school, the IMPRS in Language Sciences, links the Donders Institute, the CLS and the MPI.


Street address
Wundtlaan 1
6525 XD Nijmegen
The Netherlands

Mailing address
P.O. Box 310
6500 AH Nijmegen
The Netherlands

Phone:   +31-24-3521911
Fax:        +31-24-3521213

Public Outreach Officer
Charlotte Horn