Publications

  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2022). Speech register influences listeners’ word expectations. Brain and Language, 235: 105197. doi:10.1016/j.bandl.2022.105197.

    Abstract

    We utilized the N400 effect to investigate the influence of speech register on predictive language processing. Participants listened to long stretches (4 – 15 min) of naturalistic speech from different registers (dialogues, news broadcasts, and read-aloud books), totalling approximately 50,000 words, while the EEG signal was recorded. We estimated the surprisal of words in the speech materials with the aid of a statistical language model in such a manner that it reflected different predictive processing strategies; generic, register-specific, or recency-based. The N400 amplitude was best predicted with register-specific word surprisal, indicating that the statistics of the wider context (i.e., register) influences predictive language processing. Furthermore, adaptation to speech register cannot merely be explained by recency effects; instead, listeners adapt their word anticipations to the presented speech register.
  • Cutler, A., Ernestus, M., Warner, N., & Weber, A. (2022). Managing speech perception data sets. In B. McDonnell, E. Koller, & L. B. Collister (Eds.), The Open Handbook of Linguistic Data Management (pp. 565-573). Cambridge, MA, USA: MIT Press. doi:10.7551/mitpress/12200.003.0055.
  • Eijk, L., Rasenberg, M., Arnese, F., Blokpoel, M., Dingemanse, M., Doeller, C. F., Ernestus, M., Holler, J., Milivojevic, B., Özyürek, A., Pouw, W., Van Rooij, I., Schriefers, H., Toni, I., Trujillo, J. P., & Bögels, S. (2022). The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage, 264: 119734. doi:10.1016/j.neuroimage.2022.119734.

    Abstract

    We present a dataset of behavioural and fMRI observations acquired in the context of humans involved in multimodal referential communication. The dataset contains audio/video and motion-tracking recordings of face-to-face, task-based communicative interactions in Dutch, as well as behavioural and neural correlates of participants’ representations of dialogue referents. Seventy-one pairs of unacquainted participants performed two interleaved interactional tasks in which they described and located 16 novel geometrical objects (i.e., Fribbles) yielding spontaneous interactions of about one hour. We share high-quality video (from three cameras), audio (from head-mounted microphones), and motion-tracking (Kinect) data, as well as speech transcripts of the interactions. Before and after engaging in the face-to-face communicative interactions, participants’ individual representations of the 16 Fribbles were estimated. Behaviourally, participants provided a written description (one to three words) for each Fribble and positioned them along 29 independent conceptual dimensions (e.g., rounded, human, audible). Neurally, fMRI signal evoked by each Fribble was measured during a one-back working-memory task. To enable functional hyperalignment across participants, the dataset also includes fMRI measurements obtained during visual presentation of eight animated movies (35 minutes total). We present analyses for the various types of data demonstrating their quality and consistency with earlier research. Besides high-resolution multimodal interactional data, this dataset includes different correlates of communicative referents, obtained before and after face-to-face dialogue, allowing for novel investigations into the relation between communicative behaviours and the representational space shared by communicators. This unique combination of data can be used for research in neuroscience, psychology, linguistics, and beyond.
  • Marcoux, K., Cooke, M., Tucker, B. V., & Ernestus, M. (2022). The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners. Speech Communication, 136, 53-62. doi:10.1016/j.specom.2021.11.007.

    Abstract

    Speech produced in noise (Lombard speech) is more intelligible than speech produced in quiet (plain speech). Previous research on the Lombard intelligibility benefit focused almost entirely on how native speakers produce and perceive Lombard speech. In this study, we investigate the size of the Lombard intelligibility benefit of both native (American-English) and non-native (native Dutch) English for native and non-native listeners (Dutch and Spanish). We used a glimpsing metric to measure the energetic masking potential of speech, which predicted that both native and non-native Lombard speech could withstand greater amounts of masking to a similar extent, compared to plain speech. In an intelligibility experiment, native English, Spanish, and Dutch listeners listened to the same words, mixed with noise. While the non-native listeners appeared to benefit more from Lombard speech than the native listeners did, each listener group experienced a similar benefit for native and non-native Lombard speech. Energetic masking, as captured by the glimpsing metric, only accounted for part of the Lombard benefit, indicating that the Lombard intelligibility benefit does not only result from a shift in spectral distribution. Despite subtle native language influences on non-native Lombard speech, both native and non-native speech provides a Lombard benefit.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2022). The use of exemplars differs between native and non-native listening. Bilingualism: Language and Cognition, 25(5), 841-855. doi:10.1017/S1366728922000116.

    Abstract

    This study compares the role of exemplars in native and non-native listening. Two English identity priming experiments were conducted with native English, Dutch non-native, and Spanish non-native listeners. In Experiment 1, primes and targets were spoken in the same or a different voice. Only the native listeners showed exemplar effects. In Experiment 2, primes and targets had the same or a different degree of vowel reduction. The Dutch, but not the Spanish, listeners were familiar with this reduction pattern from their L1 phonology. In this experiment, exemplar effects only arose for the Spanish listeners. We propose that in these lexical decision experiments the use of exemplars is co-determined by listeners’ available processing resources, which is modulated by the familiarity with the variation type from their L1 phonology. The use of exemplars differs between native and non-native listening, suggesting qualitative differences between native and non-native speech comprehension processes.
  • Rodd, J., Bosker, H. R., Ernestus, M., Alday, P. M., Meyer, A. S., & Ten Bosch, L. (2020). Control of speaking rate is achieved by switching between qualitatively distinct cognitive ‘gaits’: Evidence from simulation. Psychological Review, 127(2), 281-304. doi:10.1037/rev0000172.

    Abstract

    That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: When walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech “gaits” that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, that borrows from Dell, Burger, and Svec’s (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a 1-gait system, the regimes used to achieve fast and slow speech are qualitatively similar, but quantitatively different. In a multiple gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits in all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles.

  • Ernestus, M., & Neijt, A. (2008). Word length and the location of primary word stress in Dutch, German, and English. Linguistics, 46(3), 507-540. doi:10.1515/LING.2008.017.

    Abstract

    This study addresses the extent to which the location of primary stress in Dutch, German, and English monomorphemic words is affected by the syllables preceding the three final syllables. We present analyses of the monomorphemic words in the CELEX lexical database, which showed that penultimate primary stress is less frequent in Dutch and English trisyllabic than quadrisyllabic words. In addition, we discuss paper-and-pencil experiments in which native speakers assigned primary stress to pseudowords. These experiments provided evidence that in all three languages penultimate stress is more likely in quadrisyllabic than in trisyllabic words. We explain this length effect with the preferences in these languages for word-initial stress and for alternating patterns of stressed and unstressed syllables. The experimental data also showed important intra- and interspeaker variation, and they thus form a challenging test case for theories of language variation.
  • Kuperman, V., Ernestus, M., & Baayen, R. H. (2008). Frequency distributions of uniphones, diphones, and triphones in spontaneous speech. Journal of the Acoustical Society of America, 124(6), 3897-3908. doi:10.1121/1.3006378.

    Abstract

    This paper explores the relationship between the acoustic duration of phonemic sequences and their frequencies of occurrence. The data were obtained from large (sub)corpora of spontaneous speech in Dutch, English, German, and Italian. Acoustic duration of an n-phone is shown to codetermine the n-phone's frequency of use, such that languages preferentially use diphones and triphones that are neither very long nor very short. The observed distributions are well approximated by a theoretical function that quantifies the concurrent action of the self-regulatory processes of minimization of articulatory effort and minimization of perception effort.
  • Mitterer, H., & Ernestus, M. (2008). The link between speech perception and production is phonological and abstract: Evidence from the shadowing task. Cognition, 109(1), 168-173. doi:10.1016/j.cognition.2008.08.002.

    Abstract

    This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second, phonetic detail is more likely to be imitated in a shadowing task if it is phonologically relevant. This is consistent with the idea that speech perception and speech production are only loosely coupled, on an abstract phonological level.
  • Mitterer, H., Yoneyama, K., & Ernestus, M. (2008). How we hear what is hardly there: Mechanisms underlying compensation for /t/-reduction in speech comprehension. Journal of Memory and Language, 59, 133-152. doi:10.1016/j.jml.2008.02.004.

    Abstract

    In four experiments, we investigated how listeners compensate for reduced /t/ in Dutch. Mitterer and Ernestus [Mitterer, H., & Ernestus, M. (2006). Listeners recover /t/s that speakers lenite: evidence from /t/-lenition in Dutch. Journal of Phonetics, 34, 73–103] showed that listeners are biased to perceive a /t/ more easily after /s/ than after /n/, compensating for the tendency of speakers to reduce word-final /t/ after /s/ in spontaneous conversations. We tested the robustness of this phonological context effect in perception with three very different experimental tasks: an identification task, a discrimination task with native listeners and with non-native listeners who do not have any experience with /t/-reduction, and a passive listening task (using electrophysiological dependent measures). The context effect was generally robust against these experimental manipulations, although we also observed some deviations from the overall pattern. Our combined results show that the context effect in compensation for reduced /t/ results from a complex process involving auditory constraints, phonological learning, and lexical constraints.
  • De Schryver, J., Neijt, A., Ghesquière, P., & Ernestus, M. (2008). Analogy, frequency, and sound change: The case of Dutch devoicing. Journal of Germanic Linguistics, 20(2), 159-195. doi:10.1017/S1470542708000056.

    Abstract

    This study investigates the roles of phonetic analogy and lexical frequency in an ongoing sound change, the devoicing of fricatives in Dutch, which occurs mainly in the Netherlands and to a lesser degree in Flanders. In the experiment, Dutch and Flemish students read two variants of 98 words: the standard and a nonstandard form with the incorrect voice value of the fricative. Dutch students chose the non-standard forms with devoiced fricatives more often than Flemish students. Moreover, devoicing, though a gradual process, appeared lexically diffused, affecting first the words that are low in frequency and phonetically similar to words with voiceless fricatives.
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Wagner, A., & Ernestus, M. (2008). Identification of phonemes: Differences between phoneme classes and the effect of class size. Phonetica, 65(1-2), 106-127. doi:10.1159/000132389.

    Abstract

    This study reports general and language-specific patterns in phoneme identification. In a series of phoneme monitoring experiments, Castilian Spanish, Catalan, Dutch, English, and Polish listeners identified vowel, fricative, and stop consonant targets that are phonemic in all these languages, embedded in nonsense words. Fricatives were generally identified more slowly than vowels, while the speed of identification for stop consonants was highly dependent on the onset of the measurements. Moreover, listeners' response latencies and accuracy in detecting a phoneme correlated with the number of categories within that phoneme's class in the listener's native phoneme repertoire: more native categories slowed listeners down and decreased their accuracy. We excluded the possibility that this effect stems from differences in the frequencies of occurrence of the phonemes in the different languages. Rather, the effect of the number of categories can be explained by general properties of the perception system, which cause language-specific patterns in speech processing.
