Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech
Mizera, P., Pollak, P., Kolman, A., & Ernestus, M.
Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.
), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings
(pp. 499-506). Heidelberg: Springer.
This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes