Publications

Displaying 1 - 17 of 17
  • Brand, S., & Ernestus, M. (2021). Reduction of word-final obstruent-liquid-schwa clusters in Parisian French. Corpus Linguistics and Linguistic Theory, 17(1), 249-285. doi:10.1515/cllt-2017-0067.

    Abstract

    This corpus study investigated pronunciation variants of word-final obstruent-liquid-schwa (OLS) clusters in nouns in casual Parisian French. Results showed that at least one phoneme was absent in 80.7% of the 291 noun tokens in the dataset, and that the whole cluster was absent (e.g., [mis] for ministre) in no less than 15.5% of the tokens. We demonstrate that phonemes are not always completely absent, but that they may leave traces on neighbouring phonemes. Further, the clusters display undocumented voice assimilation patterns. Statistical modelling showed that a phoneme is most likely to be absent if the following phoneme is also absent. The durations of the phonemes are conditioned particularly by the position of the word in the prosodic phrase. We argue, on the basis of three different types of evidence, that in French word-final OLS clusters, the absence of obstruents is mainly due to gradient reduction processes, whereas the absence of schwa and liquids may also be due to categorical deletion processes.
  • Felker, E. R., Broersma, M., & Ernestus, M. (2021). The role of corrective feedback and lexical guidance in perceptual learning of a novel L2 accent in dialogue. Applied Psycholinguistics, 42, 1029-1055. doi:10.1017/S0142716421000205.

    Abstract

    Perceptual learning of novel accents is a critical skill for second-language speech perception, but little is known about the mechanisms that facilitate perceptual learning in communicative contexts. To study perceptual learning in an interactive dialogue setting while maintaining experimental control of the phonetic input, we employed an innovative experimental method incorporating prerecorded speech into a naturalistic conversation. Using both computer-based and face-to-face dialogue settings, we investigated the effect of two types of learning mechanisms in interaction: explicit corrective feedback and implicit lexical guidance. Dutch participants played an information-gap game featuring minimal pairs with an accented English speaker whose /ε/ pronunciations were shifted to /ɪ/. Evidence for the vowel shift came either from corrective feedback about participants’ perceptual mistakes or from onscreen lexical information that constrained their interpretation of the interlocutor’s words. Corrective feedback explicitly contrasting the minimal pairs was more effective than generic feedback. Additionally, both receiving lexical guidance and exhibiting more uptake for the vowel shift improved listeners’ subsequent online processing of accented words. Comparable learning effects were found in both the computer-based and face-to-face interactions, showing that our results can be generalized to a more naturalistic learning context than traditional computer-based perception training programs.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can just as important.
  • Ernestus, M. (2014). Acoustic reduction and the roles of abstractions and exemplars in speech processing. Lingua, 142, 27-41. doi:10.1016/j.lingua.2012.12.006.

    Abstract

    Acoustic reduction refers to the frequent phenomenon in conversational speech that words are produced with fewer or lenited segments compared to their citation forms. The few published studies on the production and comprehension of acoustic reduction have important implications for the debate on the relevance of abstractions and exemplars in speech processing. This article discusses these implications. It first briefly introduces the key assumptions of simple abstractionist and simple exemplar-based models. It then discusses the literature on acoustic reduction and draws the conclusion that both types of models need to be extended to explain all findings. The ultimate model should allow for the storage of different pronunciation variants, but also reserve an important role for phonetic implementation. Furthermore, the recognition of a highly reduced pronunciation variant requires top down information and leads to activation of the corresponding unreduced variant, the variant that reaches listeners’ consciousness. These findings are best accounted for in hybrids models, assuming both abstract representations and exemplars. None of the hybrid models formulated so far can account for all data on reduced speech and we need further research for obtaining detailed insight into how speakers produce and listeners comprehend reduced speech.
  • Ernestus, M., & Giezenaar, G. (2014). Een goed verstaander heeft maar een half woord nodig. In B. Bossers (Ed.), Vakwerk 9: Achtergronden van de NT2-lespraktijk: Lezingen conferentie Hoeven 2014 (pp. 81-92). Amsterdam: BV NT2.
  • Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).

    Abstract

    This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available
  • Lahey, M., & Ernestus, M. (2014). Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development, 10, 308-327. doi:10.1080/15475441.2013.860813.

    Abstract

    In spontaneous conversations between adults, words are often pronounced with fewer segments or syllables than their citation forms. The question arises whether infant-directed speech also contains phonetic reduction. If so, infants would be presented with speech input that enables them to acquire reduced variants from an early age. This study compared speech directed at 11- and 12-month-old infants with adult-directed conversational speech and adult-directed read speech. In an acoustic study, 216 tokens of the Dutch words allemaal and helemaal from speech corpora were analyzed for duration, number of syllables, and vowel quality. In a perception study, adult participants rated these same materials for reduction and provided phonetic transcriptions. The results show that these two words are frequently reduced in infant-directed speech, and that their degree of reduction is comparable with conversational adult-directed speech. These findings suggest that lexical representations for reduced pronunciation variants can be acquired early in linguistic development

    Files private

    Request files
  • Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.

    Abstract

    This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes
  • Schertz, J., & Ernestus, M. (2014). Variability in the pronunciation of non-native English the: Effects of frequency and disfluencies. Corpus Linguistics and Linguistic Theory, 10, 329-345. doi:10.1515/cllt-2014-0024.

    Abstract

    This study examines how lexical frequency and planning problems can predict phonetic variability in the function word ‘the’ in conversational speech produced by non-native speakers of English. We examined 3180 tokens of ‘the’ drawn from English conversations between native speakers of Czech or Norwegian. Using regression models, we investigated the effect of following word frequency and disfluencies on three phonetic parameters: vowel duration, vowel quality, and consonant quality. Overall, the non-native speakers showed variation that is very similar to the variation displayed by native speakers of English. Like native speakers, Czech speakers showed an effect of frequency on vowel durations, which were shorter in more frequent word sequences. Both groups of speakers showed an effect of frequency on consonant quality: the substitution of another consonant for /ð/ occurred more often in the context of more frequent words. The speakers in this study also showed a native-like allophonic distinction in vowel quality, in which /ði/ occurs more often before vowels and /ðə/ before consonants. Vowel durations were longer in the presence of following disfluencies, again mirroring patterns in native speakers, and the consonant quality was more likely to be the target /ð/ before disfluencies, as opposed to a different consonant. The fact that non-native speakers show native-like sensitivity to lexical frequency and disfluencies suggests that these effects are consequences of a general, non-language-specific production mechanism governing language planning. On the other hand, the non-native speakers in this study did not show native-like patterns of vowel quality in the presence of disfluencies, suggesting that the pattern attested in native speakers of English may result from language-specific processes separate from the general production mechanisms
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).

    Abstract

    This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences
  • Ernestus, M., & Neijt, A. (2008). Word length and the location of primary word stress in Dutch, German, and English. Linguistics, 46(3), 507-540. doi:10.1515/LING.2008.017.

    Abstract

    This study addresses the extent to which the location of primary stress in Dutch, German, and English monomorphemic words is affected by the syllables preceding the three final syllables. We present analyses of the monomorphemic words in the CELEX lexical database, which showed that penultimate primary stress is less frequent in Dutch and English trisyllabic than quadrisyllabic words. In addition, we discuss paper-and-pencil experiments in which native speakers assigned primary stress to pseudowords. These experiments provided evidence that in all three languages penultimate stress is more likely in quadrisyllabic than in trisyllabic words. We explain this length effect with the preferences in these languages for word-initial stress and for alternating patterns of stressed and unstressed syllables. The experimental data also showed important intra- and interspeaker variation, and they thus form a challenging test case for theories of language variation.
  • Kuperman, V., Ernestus, M., & Baayen, R. H. (2008). Frequency distributions of uniphones, diphones, and triphones in spontaneous speech. Journal of the Acoustical Society of America, 124(6), 3897-3908. doi:10.1121/1.3006378.

    Abstract

    This paper explores the relationship between the acoustic duration of phonemic sequences and their frequencies of occurrence. The data were obtained from large (sub)corpora of spontaneous speech in Dutch, English, German, and Italian. Acoustic duration of an n-phone is shown to codetermine the n-phone's frequency of use, such that languages preferentially use diphones and triphones that are neither very long nor very short. The observed distributions are well approximated by a theoretical function that quantifies the concurrent action of the self-regulatory processes of minimization of articulatory effort and minimization of perception effort
  • Mitterer, H., & Ernestus, M. (2008). The link between speech perception and production is phonological and abstract: Evidence from the shadowing task. Cognition, 109(1), 168-173. doi:10.1016/j.cognition.2008.08.002.

    Abstract

    This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second, phonetic detail is more likely to be imitated in a shadowing task if it is phonologically relevant. This is consistent with the idea that speech perception and speech production are only loosely coupled, on an abstract phonological level.
  • Mitterer, H., Yoneyama, K., & Ernestus, M. (2008). How we hear what is hardly there: Mechanisms underlying compensation for /t/-reduction in speech comprehension. Journal of Memory and Language, 59, 133-152. doi:10.1016/j.jml.2008.02.004.

    Abstract

    In four experiments, we investigated how listeners compensate for reduced /t/ in Dutch. Mitterer and Ernestus [Mitterer,H., & Ernestus, M. (2006). Listeners recover /t/s that speakers lenite: evidence from /t/-lenition in Dutch. Journal of Phonetics, 34, 73–103] showed that listeners are biased to perceive a /t/ more easily after /s/ than after /n/, compensating for the tendency of speakers to reduce word-final /t/ after /s/ in spontaneous conversations. We tested the robustness of this phonological context effect in perception with three very different experimental tasks: an identification task, a discrimination task with native listeners and with non-native listeners who do not have any experience with /t/-reduction,and a passive listening task (using electrophysiological dependent measures). The context effect was generally robust against these experimental manipulations, although we also observed some deviations from the overall pattern. Our combined results show that the context effect in compensation for reduced /t/ results from a complex process involving auditory constraints, phonological learning, and lexical constraints.
  • De Schryver, J., Neijt, A., Ghesquière, P., & Ernestus, M. (2008). Analogy, frequency, and sound change: The case of Dutch devoicing. Journal of Germanic Linguistics, 20(2), 159-195. doi:10.1017/S1470542708000056.

    Abstract

    This study investigates the roles of phonetic analogy and lexical frequency in an ongoing sound change, the devoicing of fricatives in Dutch, which occurs mainly in the Netherlands and to a lesser degree in Flanders. In the experiment, Dutch and Flemish students read two variants of 98 words: the standard and a nonstandard form with the incorrect voice value of the fricative. Dutch students chose the non-standard forms with devoiced fricatives more often than Flemish students. Moreover, devoicing, though a gradual process, appeared lexically diffused, affecting first the words that are low in frequency and phonetically similar to words with voiceless fricatives.
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Wagner, A., & Ernestus, M. (2008). Identification of phonemes: Differences between phoneme classes and the effect of class size. Phonetica, 65(1-2), 106-127. doi:10.1159/000132389.

    Abstract

    This study reports general and language-specific patterns in phoneme identification. In a series of phoneme monitoring experiments, Castilian Spanish, Catalan, Dutch, English, and Polish listeners identified vowel, fricative, and stop consonant targets that are phonemic in all these languages, embedded in nonsense words. Fricatives were generally identified more slowly than vowels, while the speed of identification for stop consonants was highly dependent on the onset of the measurements. Moreover, listeners' response latencies and accuracy in detecting a phoneme correlated with the number of categories within that phoneme's class in the listener's native phoneme repertoire: more native categories slowed listeners down and decreased their accuracy. We excluded the possibility that this effect stems from differences in the frequencies of occurrence of the phonemes in the different languages. Rather, the effect of the number of categories can be explained by general properties of the perception system, which cause language-specific patterns in speech processing.

Share this page