Publications

Displaying 1 - 8 of 8
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2022). Speech register influences listeners’ word expectations. Brain and Language, 235: 105197. doi:10.1016/j.bandl.2022.105197.

    Abstract

    We utilized the N400 effect to investigate the influence of speech register on predictive language processing. Participants listened to long stretches (4 – 15 min) of naturalistic speech from different registers (dialogues, news broadcasts, and read-aloud books), totalling approximately 50,000 words, while the EEG signal was recorded. We estimated the surprisal of words in the speech materials with the aid of a statistical language model in such a manner that it reflected different predictive processing strategies; generic, register-specific, or recency-based. The N400 amplitude was best predicted with register-specific word surprisal, indicating that the statistics of the wider context (i.e., register) influences predictive language processing. Furthermore, adaptation to speech register cannot merely be explained by recency effects; instead, listeners adapt their word anticipations to the presented speech register.
  • Cutler, A., Ernestus, M., Warner, N., & Weber, A. (2022). Managing speech perception data sets. In B. McDonnell, E. Koller, & L. B. Collister (Eds.), The Open Handbook of Linguistic Data Management (pp. 565-573). Cambrdige, MA, USA: MIT Press. doi:10.7551/mitpress/12200.003.0055.
  • Eijk, L., Rasenberg, M., Arnese, F., Blokpoel, M., Dingemanse, M., Doeller, C. F., Ernestus, M., Holler, J., Milivojevic, B., Özyürek, A., Pouw, W., Van Rooij, I., Schriefers, H., Toni, I., Trujillo, J. P., & Bögels, S. (2022). The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage, 264: 119734. doi:10.1016/j.neuroimage.2022.119734.

    Abstract

    We present a dataset of behavioural and fMRI observations acquired in the context of humans involved in multimodal referential communication. The dataset contains audio/video and motion-tracking recordings of face-to-face, task-based communicative interactions in Dutch, as well as behavioural and neural correlates of participants’ representations of dialogue referents. Seventy-one pairs of unacquainted participants performed two interleaved interactional tasks in which they described and located 16 novel geometrical objects (i.e., Fribbles) yielding spontaneous interactions of about one hour. We share high-quality video (from three cameras), audio (from head-mounted microphones), and motion-tracking (Kinect) data, as well as speech transcripts of the interactions. Before and after engaging in the face-to-face communicative interactions, participants’ individual representations of the 16 Fribbles were estimated. Behaviourally, participants provided a written description (one to three words) for each Fribble and positioned them along 29 independent conceptual dimensions (e.g., rounded, human, audible). Neurally, fMRI signal evoked by each Fribble was measured during a one-back working-memory task. To enable functional hyperalignment across participants, the dataset also includes fMRI measurements obtained during visual presentation of eight animated movies (35 minutes total). We present analyses for the various types of data demonstrating their quality and consistency with earlier research. Besides high-resolution multimodal interactional data, this dataset includes different correlates of communicative referents, obtained before and after face-to-face dialogue, allowing for novel investigations into the relation between communicative behaviours and the representational space shared by communicators. This unique combination of data can be used for research in neuroscience, psychology, linguistics, and beyond.
  • Marcoux, K., Cooke, M., Tucker, B. V., & Ernestus, M. (2022). The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners. Speech Communication, 136, 53-62. doi:10.1016/j.specom.2021.11.007.

    Abstract

    Speech produced in noise (Lombard speech) is more intelligible than speech produced in quiet (plain speech). Previous research on the Lombard intelligibility benefit focused almost entirely on how native speakers produce and perceive Lombard speech. In this study, we investigate the size of the Lombard intelligibility benefit of both native (American-English) and non-native (native Dutch) English for native and non-native listeners (Dutch and Spanish). We used a glimpsing metric to measure the energetic masking potential of speech, which predicted that both native and non-native Lombard speech could withstand greater amounts of masking to a similar extent, compared to plain speech. In an intelligibility experiment, native English, Spanish, and Dutch listeners listened to the same words, mixed with noise. While the non-native listeners appeared to benefit more from Lombard speech than the native listeners did, each listener group experienced a similar benefit for native and non-native Lombard speech. Energetic masking, as captured by the glimpsing metric, only accounted for part of the Lombard benefit, indicating that the Lombard intelligibility benefit does not only result from a shift in spectral distribution. Despite subtle native language influences on non-native Lombard speech, both native and non-native speech provides a Lombard benefit.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings.Importantly, in both experiments we show that he grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2022). The use of exemplars differs between native and non-native listening. Bilingualism: Language and Cognition, 25(5), 841-855. doi:10.1017/S1366728922000116.

    Abstract

    This study compares the role of exemplars in native and non-native listening. Two English identity priming experiments were conducted with native English, Dutch non-native, and Spanish non-native listeners. In Experiment 1, primes and targets were spoken in the same or a different voice. Only the native listeners showed exemplar effects. In Experiment 2, primes and targets had the same or a different degree of vowel reduction. The Dutch, but not the Spanish, listeners were familiar with this reduction pattern from their L1 phonology. In this experiment, exemplar effects only arose for the Spanish listeners. We propose that in these lexical decision experiments the use of exemplars is co-determined by listeners’ available processing resources, which is modulated by the familiarity with the variation type from their L1 phonology. The use of exemplars differs between native and non-native listening, suggesting qualitative differences between native and non-native speech comprehension processes.
  • Rodd, J., Bosker, H. R., Ernestus, M., Alday, P. M., Meyer, A. S., & Ten Bosch, L. (2020). Control of speaking rate is achieved by switching between qualitatively distinct cognitive ‘gaits’: Evidence from simulation. Psychological Review, 127(2), 281-304. doi:10.1037/rev0000172.

    Abstract

    That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: When walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech “gaits” that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, that borrows from Dell, Burger, and Svec’s (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a 1-gait system, the regimes used to achieve fast and slow speech are qualitatively similar, but quantitatively different. In a multiple gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits in all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles.

    Additional information

    Supplemental material
  • Ernestus, M., Baayen, R. H., & Schreuder, R. (2002). The recognition of reduced word forms. Brain and Language, 81(1-3), 162-173. doi:10.1006/brln.2001.2514.

    Abstract

    This article addresses the recognition of reduced word forms, which are frequent in casual speech. We describe two experiments on Dutch showing that listeners only recognize highly reduced forms well when these forms are presented in their full context and that the probability that a listener recognizes a word form in limited context is strongly correlated with the degree of reduction of the form. Moreover, we show that the effect of degree of reduction can only partly be interpreted as the effect of the intelligibility of the acoustic signal, which is negatively correlated with degree of reduction. We discuss the consequences of our findings for models of spoken word recognition and especially for the role that storage plays in these models.

Share this page