Publications

Displaying 1 - 14 of 14
  • Rodd, J., Bosker, H. R., Ernestus, M., Alday, P. M., Meyer, A. S., & Ten Bosch, L. (2020). Control of speaking rate is achieved by switching between qualitatively distinct cognitive ‘gaits’: Evidence from simulation. Psychological Review, 127(2), 281-304. doi:10.1037/rev0000172.

    Abstract

    That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: When walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech “gaits” that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, that borrows from Dell, Burger, and Svec’s (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a 1-gait system, the regimes used to achieve fast and slow speech are qualitatively similar, but quantitatively different. In a multiple gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits in all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles.

    Additional information

    Supplemental material
  • Drijvers, L., Mulder, K., & Ernestus, M. (2016). Alpha and gamma band oscillations index differential processing of acoustically reduced and full forms. Brain and Language, 153-154, 27-37. doi:10.1016/j.bandl.2016.01.003.

    Abstract

    Reduced forms like yeshay for yesterday often occur in conversations. Previous behavioral research reported a processing advantage for full over reduced forms. The present study investigated whether this processing advantage is reflected in a modulation of alpha (8–12 Hz) and gamma (30+ Hz) band activity. In three electrophysiological experiments, participants listened to full and reduced forms in isolation (Experiment 1), sentence-final position (Experiment 2), or mid-sentence position (Experiment 3). Alpha power was larger in response to reduced forms than to full forms, but only in Experiments 1 and 2. We interpret these increases in alpha power as reflections of higher auditory cognitive load. In all experiments, gamma power only increased in response to full forms, which we interpret as showing that lexical activation spreads more quickly through the semantic network for full than for reduced forms. These results confirm a processing advantage for full forms, especially in non-medial sentence position.
  • Ernestus, M., Giezenaar, G., & Dikmans, M. (2016). Ikfstajezotuuknie: Half uitgesproken woorden in alledaagse gesprekken. Les, 199, 7-9.

    Abstract

    Amsterdam klinkt in informele gesprekken vaak als Amsdam en Rotterdam als Rodam, zonder dat de meeste moedertaalsprekers zich daar bewust van zijn. In alledaagse situaties valt een aanzienlijk deel van de klanken weg. Daarnaast worden veel klanken zwakker gearticuleerd (bijvoorbeeld een d als een j, als de mond niet helemaal afgesloten wordt). Het lijkt waarschijnlijk dat deze half uitgesproken woorden een probleem vormen voor tweedetaalleerders. Gereduceerde vormen kunnen immers sterk afwijken van de vormen die deze leerders geleerd hebben. Of dit werkelijk zo is, hebben de auteurs onderzocht in twee studies. Voordat ze deze twee studies bespreken, vertellen ze eerst kort iets over de verschillende typen reducties die voorkomen.
  • Ernestus, M. (2016). L'utilisation des corpus oraux pour la recherche en (psycho)linguistique. In M. Kilani-Schoch, C. Surcouf, & A. Xanthos (Eds.), Nouvelles technologies et standards méthodologiques en linguistique (pp. 65-93). Lausanne: Université de Lausanne.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2016). Combining data-oriented and process-oriented approaches to modeling reaction time data. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2801-2805). doi:10.21437/Interspeech.2016-1072.

    Abstract

    This paper combines two different approaches to modeling reaction time data from lexical decision experiments, viz. a dataoriented statistical analysis by means of a linear mixed effects model, and a process-oriented computational model of human speech comprehension. The linear mixed effect model is implemented by lmer in R. As computational model we apply DIANA, an end-to-end computational model which aims at modeling the cognitive processes underlying speech comprehension. DIANA takes as input the speech signal, and provides as output the orthographic transcription of the stimulus, a word/non-word judgment and the associated reaction time. Previous studies have shown that DIANA shows good results for large-scale lexical decision experiments in Dutch and North-American English. We investigate whether predictors that appear significant in an lmer analysis and processes implemented in DIANA can be related and inform both approaches. Predictors such as ‘previous reaction time’ can be related to a process description; other predictors, such as ‘lexical neighborhood’ are hard-coded in lmer and emergent in DIANA. The analysis focuses on the interaction between subject variables and task variables in lmer, and the ways in which these interactions can be implemented in DIANA.
  • Ten Bosch, L., Giezenaar, G., Boves, L., & Ernestus, M. (2016). Modeling language-learners' errors in understanding casual speech. In G. Adda, V. Barbu Mititelu, J. Mariani, D. Tufiş, & I. Vasilescu (Eds.), Errors by humans and machines in multimedia, multimodal, multilingual data processing. Proceedings of Errare 2015 (pp. 107-121). Bucharest: Editura Academiei Române.

    Abstract

    In spontaneous conversations, words are often produced in reduced form compared to formal careful speech. In English, for instance, ’probably’ may be pronounced as ’poly’ and ’police’ as ’plice’. Reduced forms are very common, and native listeners usually do not have any problems with interpreting these reduced forms in context. Non-native listeners, however, have great difficulties in comprehending reduced forms. In order to investigate the problems in comprehension that non-native listeners experience, a dictation experiment was conducted in which sentences were presented auditorily to non-natives either in full (unreduced) or reduced form. The types of errors made by the L2 listeners reveal aspects of the cognitive processes underlying this dictation task. In addition, we compare the errors made by these human participants with the type of word errors made by DIANA, a recently developed computational model of word comprehension.
  • Tucker, B., & Ernestus, M. (2016). Why we need to investigate casual speech to truly understand language production, processing and the mental lexicon. The Mental Lexicon, 11(3), 375-400. doi:10.1075/ml.11.3.03tuc.

    Abstract

    The majority of studies addressing psycholinguistic questions focus on speech produced and processed in a careful, laboratory speech style. This ‘careful’ speech is very different from the speech that listeners encounter in casual conversations. This article argues that research on casual speech is necessary to show the validity of conclusions based on careful speech. Moreover, research on casual speech produces new insights and questions on the processes underlying communication and on the mental lexicon that cannot be revealed by research using careful speech. This article first places research on casual speech in its historic perspective. It then provides many examples of how casual speech differs from careful speech and shows that these differences may have important implications for psycholinguistic theories. Subsequently, the article discusses the challenges that research on casual speech faces, which stem from the high variability of this speech style, its necessary casual context, and that casual speech is connected speech. We also present opportunities for research on casual speech, mostly in the form of new experimental methods that facilitate research on connected speech. However, real progress can only be made if these new methods are combined with advanced (still to be developed) statistical techniques.
  • Ernestus, M. (2014). Acoustic reduction and the roles of abstractions and exemplars in speech processing. Lingua, 142, 27-41. doi:10.1016/j.lingua.2012.12.006.

    Abstract

    Acoustic reduction refers to the frequent phenomenon in conversational speech that words are produced with fewer or lenited segments compared to their citation forms. The few published studies on the production and comprehension of acoustic reduction have important implications for the debate on the relevance of abstractions and exemplars in speech processing. This article discusses these implications. It first briefly introduces the key assumptions of simple abstractionist and simple exemplar-based models. It then discusses the literature on acoustic reduction and draws the conclusion that both types of models need to be extended to explain all findings. The ultimate model should allow for the storage of different pronunciation variants, but also reserve an important role for phonetic implementation. Furthermore, the recognition of a highly reduced pronunciation variant requires top down information and leads to activation of the corresponding unreduced variant, the variant that reaches listeners’ consciousness. These findings are best accounted for in hybrids models, assuming both abstract representations and exemplars. None of the hybrid models formulated so far can account for all data on reduced speech and we need further research for obtaining detailed insight into how speakers produce and listeners comprehend reduced speech.
  • Ernestus, M., & Giezenaar, G. (2014). Een goed verstaander heeft maar een half woord nodig. In B. Bossers (Ed.), Vakwerk 9: Achtergronden van de NT2-lespraktijk: Lezingen conferentie Hoeven 2014 (pp. 81-92). Amsterdam: BV NT2.
  • Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).

    Abstract

    This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available
  • Lahey, M., & Ernestus, M. (2014). Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development, 10, 308-327. doi:10.1080/15475441.2013.860813.

    Abstract

    In spontaneous conversations between adults, words are often pronounced with fewer segments or syllables than their citation forms. The question arises whether infant-directed speech also contains phonetic reduction. If so, infants would be presented with speech input that enables them to acquire reduced variants from an early age. This study compared speech directed at 11- and 12-month-old infants with adult-directed conversational speech and adult-directed read speech. In an acoustic study, 216 tokens of the Dutch words allemaal and helemaal from speech corpora were analyzed for duration, number of syllables, and vowel quality. In a perception study, adult participants rated these same materials for reduction and provided phonetic transcriptions. The results show that these two words are frequently reduced in infant-directed speech, and that their degree of reduction is comparable with conversational adult-directed speech. These findings suggest that lexical representations for reduced pronunciation variants can be acquired early in linguistic development

    Files private

    Request files
  • Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.

    Abstract

    This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes
  • Schertz, J., & Ernestus, M. (2014). Variability in the pronunciation of non-native English the: Effects of frequency and disfluencies. Corpus Linguistics and Linguistic Theory, 10, 329-345. doi:10.1515/cllt-2014-0024.

    Abstract

    This study examines how lexical frequency and planning problems can predict phonetic variability in the function word ‘the’ in conversational speech produced by non-native speakers of English. We examined 3180 tokens of ‘the’ drawn from English conversations between native speakers of Czech or Norwegian. Using regression models, we investigated the effect of following word frequency and disfluencies on three phonetic parameters: vowel duration, vowel quality, and consonant quality. Overall, the non-native speakers showed variation that is very similar to the variation displayed by native speakers of English. Like native speakers, Czech speakers showed an effect of frequency on vowel durations, which were shorter in more frequent word sequences. Both groups of speakers showed an effect of frequency on consonant quality: the substitution of another consonant for /ð/ occurred more often in the context of more frequent words. The speakers in this study also showed a native-like allophonic distinction in vowel quality, in which /ði/ occurs more often before vowels and /ðə/ before consonants. Vowel durations were longer in the presence of following disfluencies, again mirroring patterns in native speakers, and the consonant quality was more likely to be the target /ð/ before disfluencies, as opposed to a different consonant. The fact that non-native speakers show native-like sensitivity to lexical frequency and disfluencies suggests that these effects are consequences of a general, non-language-specific production mechanism governing language planning. On the other hand, the non-native speakers in this study did not show native-like patterns of vowel quality in the presence of disfluencies, suggesting that the pattern attested in native speakers of English may result from language-specific processes separate from the general production mechanisms
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).

    Abstract

    This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences

Share this page