Displaying 1 - 15 of 15
-
Ernestus, M. (2014). Acoustic reduction and the roles of abstractions and exemplars in speech processing. Lingua, 142, 27-41. doi:10.1016/j.lingua.2012.12.006.
Abstract
Acoustic reduction refers to the frequent phenomenon in conversational speech that words are produced with fewer or lenited segments compared to their citation forms. The few published studies on the production and comprehension of acoustic reduction have important implications for the debate on the relevance of abstractions and exemplars in speech processing. This article discusses these implications. It first briefly introduces the key assumptions of simple abstractionist and simple exemplar-based models. It then discusses the literature on acoustic reduction and draws the conclusion that both types of models need to be extended to explain all findings. The ultimate model should allow for the storage of different pronunciation variants, but also reserve an important role for phonetic implementation. Furthermore, the recognition of a highly reduced pronunciation variant requires top down information and leads to activation of the corresponding unreduced variant, the variant that reaches listeners’ consciousness. These findings are best accounted for in hybrids models, assuming both abstract representations and exemplars. None of the hybrid models formulated so far can account for all data on reduced speech and we need further research for obtaining detailed insight into how speakers produce and listeners comprehend reduced speech. -
Ernestus, M., & Giezenaar, G. (2014). Een goed verstaander heeft maar een half woord nodig. In B. Bossers (
Ed. ), Vakwerk 9: Achtergronden van de NT2-lespraktijk: Lezingen conferentie Hoeven 2014 (pp. 81-92). Amsterdam: BV NT2. -
Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (
Eds. ), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).Abstract
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available -
Lahey, M., & Ernestus, M. (2014). Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development, 10, 308-327. doi:10.1080/15475441.2013.860813.
Abstract
In spontaneous conversations between adults, words are often pronounced with fewer segments or syllables than their citation forms. The question arises whether infant-directed speech also contains phonetic reduction. If so, infants would be presented with speech input that enables them to acquire reduced variants from an early age. This study compared speech directed at 11- and 12-month-old infants with adult-directed conversational speech and adult-directed read speech. In an acoustic study, 216 tokens of the Dutch words allemaal and helemaal from speech corpora were analyzed for duration, number of syllables, and vowel quality. In a perception study, adult participants rated these same materials for reduction and provided phonetic transcriptions. The results show that these two words are frequently reduced in infant-directed speech, and that their degree of reduction is comparable with conversational adult-directed speech. These findings suggest that lexical representations for reduced pronunciation variants can be acquired early in linguistic developmentFiles private
Request files -
Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (
Eds. ), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.Abstract
This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes -
Schertz, J., & Ernestus, M. (2014). Variability in the pronunciation of non-native English the: Effects of frequency and disfluencies. Corpus Linguistics and Linguistic Theory, 10, 329-345. doi:10.1515/cllt-2014-0024.
Abstract
This study examines how lexical frequency and planning problems can predict phonetic variability in the function word ‘the’ in conversational speech produced by non-native speakers of English. We examined 3180 tokens of ‘the’ drawn from English conversations between native speakers of Czech or Norwegian. Using regression models, we investigated the effect of following word frequency and disfluencies on three phonetic parameters: vowel duration, vowel quality, and consonant quality. Overall, the non-native speakers showed variation that is very similar to the variation displayed by native speakers of English. Like native speakers, Czech speakers showed an effect of frequency on vowel durations, which were shorter in more frequent word sequences. Both groups of speakers showed an effect of frequency on consonant quality: the substitution of another consonant for /ð/ occurred more often in the context of more frequent words. The speakers in this study also showed a native-like allophonic distinction in vowel quality, in which /ði/ occurs more often before vowels and /ðə/ before consonants. Vowel durations were longer in the presence of following disfluencies, again mirroring patterns in native speakers, and the consonant quality was more likely to be the target /ð/ before disfluencies, as opposed to a different consonant. The fact that non-native speakers show native-like sensitivity to lexical frequency and disfluencies suggests that these effects are consequences of a general, non-language-specific production mechanism governing language planning. On the other hand, the non-native speakers in this study did not show native-like patterns of vowel quality in the presence of disfluencies, suggesting that the pattern attested in native speakers of English may result from language-specific processes separate from the general production mechanisms -
Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).
Abstract
This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences -
Ernestus, M., & Neijt, A. (2008). Word length and the location of primary word stress in Dutch, German, and English. Linguistics, 46(3), 507-540. doi:10.1515/LING.2008.017.
Abstract
This study addresses the extent to which the location of primary stress in Dutch, German, and English monomorphemic words is affected by the syllables preceding the three final syllables. We present analyses of the monomorphemic words in the CELEX lexical database, which showed that penultimate primary stress is less frequent in Dutch and English trisyllabic than quadrisyllabic words. In addition, we discuss paper-and-pencil experiments in which native speakers assigned primary stress to pseudowords. These experiments provided evidence that in all three languages penultimate stress is more likely in quadrisyllabic than in trisyllabic words. We explain this length effect with the preferences in these languages for word-initial stress and for alternating patterns of stressed and unstressed syllables. The experimental data also showed important intra- and interspeaker variation, and they thus form a challenging test case for theories of language variation. -
Kuperman, V., Ernestus, M., & Baayen, R. H. (2008). Frequency distributions of uniphones, diphones, and triphones in spontaneous speech. Journal of the Acoustical Society of America, 124(6), 3897-3908. doi:10.1121/1.3006378.
Abstract
This paper explores the relationship between the acoustic duration of phonemic sequences and their frequencies of occurrence. The data were obtained from large (sub)corpora of spontaneous speech in Dutch, English, German, and Italian. Acoustic duration of an n-phone is shown to codetermine the n-phone's frequency of use, such that languages preferentially use diphones and triphones that are neither very long nor very short. The observed distributions are well approximated by a theoretical function that quantifies the concurrent action of the self-regulatory processes of minimization of articulatory effort and minimization of perception effort -
Mitterer, H., & Ernestus, M. (2008). The link between speech perception and production is phonological and abstract: Evidence from the shadowing task. Cognition, 109(1), 168-173. doi:10.1016/j.cognition.2008.08.002.
Abstract
This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second, phonetic detail is more likely to be imitated in a shadowing task if it is phonologically relevant. This is consistent with the idea that speech perception and speech production are only loosely coupled, on an abstract phonological level. -
Mitterer, H., Yoneyama, K., & Ernestus, M. (2008). How we hear what is hardly there: Mechanisms underlying compensation for /t/-reduction in speech comprehension. Journal of Memory and Language, 59, 133-152. doi:10.1016/j.jml.2008.02.004.
Abstract
In four experiments, we investigated how listeners compensate for reduced /t/ in Dutch. Mitterer and Ernestus [Mitterer,H., & Ernestus, M. (2006). Listeners recover /t/s that speakers lenite: evidence from /t/-lenition in Dutch. Journal of Phonetics, 34, 73–103] showed that listeners are biased to perceive a /t/ more easily after /s/ than after /n/, compensating for the tendency of speakers to reduce word-final /t/ after /s/ in spontaneous conversations. We tested the robustness of this phonological context effect in perception with three very different experimental tasks: an identification task, a discrimination task with native listeners and with non-native listeners who do not have any experience with /t/-reduction,and a passive listening task (using electrophysiological dependent measures). The context effect was generally robust against these experimental manipulations, although we also observed some deviations from the overall pattern. Our combined results show that the context effect in compensation for reduced /t/ results from a complex process involving auditory constraints, phonological learning, and lexical constraints. -
De Schryver, J., Neijt, A., Ghesquière, P., & Ernestus, M. (2008). Analogy, frequency, and sound change: The case of Dutch devoicing. Journal of Germanic Linguistics, 20(2), 159-195. doi:10.1017/S1470542708000056.
Abstract
This study investigates the roles of phonetic analogy and lexical frequency in an ongoing sound change, the devoicing of fricatives in Dutch, which occurs mainly in the Netherlands and to a lesser degree in Flanders. In the experiment, Dutch and Flemish students read two variants of 98 words: the standard and a nonstandard form with the incorrect voice value of the fricative. Dutch students chose the non-standard forms with devoiced fricatives more often than Flemish students. Moreover, devoicing, though a gradual process, appeared lexically diffused, affecting first the words that are low in frequency and phonetically similar to words with voiceless fricatives. -
Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.
Abstract
This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature. -
Wagner, A., & Ernestus, M. (2008). Identification of phonemes: Differences between phoneme classes and the effect of class size. Phonetica, 65(1-2), 106-127. doi:10.1159/000132389.
Abstract
This study reports general and language-specific patterns in phoneme identification. In a series of phoneme monitoring experiments, Castilian Spanish, Catalan, Dutch, English, and Polish listeners identified vowel, fricative, and stop consonant targets that are phonemic in all these languages, embedded in nonsense words. Fricatives were generally identified more slowly than vowels, while the speed of identification for stop consonants was highly dependent on the onset of the measurements. Moreover, listeners' response latencies and accuracy in detecting a phoneme correlated with the number of categories within that phoneme's class in the listener's native phoneme repertoire: more native categories slowed listeners down and decreased their accuracy. We excluded the possibility that this effect stems from differences in the frequencies of occurrence of the phonemes in the different languages. Rather, the effect of the number of categories can be explained by general properties of the perception system, which cause language-specific patterns in speech processing. -
Ernestus, M., Baayen, R. H., & Schreuder, R. (2002). The recognition of reduced word forms. Brain and Language, 81(1-3), 162-173. doi:10.1006/brln.2001.2514.
Abstract
This article addresses the recognition of reduced word forms, which are frequent in casual speech. We describe two experiments on Dutch showing that listeners only recognize highly reduced forms well when these forms are presented in their full context and that the probability that a listener recognizes a word form in limited context is strongly correlated with the degree of reduction of the form. Moreover, we show that the effect of degree of reduction can only partly be interpreted as the effect of the intelligibility of the acoustic signal, which is negatively correlated with degree of reduction. We discuss the consequences of our findings for models of spoken word recognition and especially for the role that storage plays in these models.
Share this page