Displaying 1 - 18 of 18
-
Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2022). Speech register influences listeners’ word expectations. Brain and Language, 235: 105197. doi:10.1016/j.bandl.2022.105197.
Abstract
We utilized the N400 effect to investigate the influence of speech register on predictive language processing. Participants listened to long stretches (4 – 15 min) of naturalistic speech from different registers (dialogues, news broadcasts, and read-aloud books), totalling approximately 50,000 words, while the EEG signal was recorded. We estimated the surprisal of words in the speech materials with the aid of a statistical language model in such a manner that it reflected different predictive processing strategies; generic, register-specific, or recency-based. The N400 amplitude was best predicted with register-specific word surprisal, indicating that the statistics of the wider context (i.e., register) influences predictive language processing. Furthermore, adaptation to speech register cannot merely be explained by recency effects; instead, listeners adapt their word anticipations to the presented speech register. -
Cutler, A., Ernestus, M., Warner, N., & Weber, A. (2022). Managing speech perception data sets. In B. McDonnell, E. Koller, & L. B. Collister (
Eds. ), The Open Handbook of Linguistic Data Management (pp. 565-573). Cambrdige, MA, USA: MIT Press. doi:10.7551/mitpress/12200.003.0055. -
Eijk, L., Rasenberg, M., Arnese, F., Blokpoel, M., Dingemanse, M., Doeller, C. F., Ernestus, M., Holler, J., Milivojevic, B., Özyürek, A., Pouw, W., Van Rooij, I., Schriefers, H., Toni, I., Trujillo, J. P., & Bögels, S. (2022). The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage, 264: 119734. doi:10.1016/j.neuroimage.2022.119734.
Abstract
We present a dataset of behavioural and fMRI observations acquired in the context of humans involved in multimodal referential communication. The dataset contains audio/video and motion-tracking recordings of face-to-face, task-based communicative interactions in Dutch, as well as behavioural and neural correlates of participants’ representations of dialogue referents. Seventy-one pairs of unacquainted participants performed two interleaved interactional tasks in which they described and located 16 novel geometrical objects (i.e., Fribbles) yielding spontaneous interactions of about one hour. We share high-quality video (from three cameras), audio (from head-mounted microphones), and motion-tracking (Kinect) data, as well as speech transcripts of the interactions. Before and after engaging in the face-to-face communicative interactions, participants’ individual representations of the 16 Fribbles were estimated. Behaviourally, participants provided a written description (one to three words) for each Fribble and positioned them along 29 independent conceptual dimensions (e.g., rounded, human, audible). Neurally, fMRI signal evoked by each Fribble was measured during a one-back working-memory task. To enable functional hyperalignment across participants, the dataset also includes fMRI measurements obtained during visual presentation of eight animated movies (35 minutes total). We present analyses for the various types of data demonstrating their quality and consistency with earlier research. Besides high-resolution multimodal interactional data, this dataset includes different correlates of communicative referents, obtained before and after face-to-face dialogue, allowing for novel investigations into the relation between communicative behaviours and the representational space shared by communicators. This unique combination of data can be used for research in neuroscience, psychology, linguistics, and beyond. -
Marcoux, K., Cooke, M., Tucker, B. V., & Ernestus, M. (2022). The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners. Speech Communication, 136, 53-62. doi:10.1016/j.specom.2021.11.007.
Abstract
Speech produced in noise (Lombard speech) is more intelligible than speech produced in quiet (plain speech). Previous research on the Lombard intelligibility benefit focused almost entirely on how native speakers produce and perceive Lombard speech. In this study, we investigate the size of the Lombard intelligibility benefit of both native (American-English) and non-native (native Dutch) English for native and non-native listeners (Dutch and Spanish). We used a glimpsing metric to measure the energetic masking potential of speech, which predicted that both native and non-native Lombard speech could withstand greater amounts of masking to a similar extent, compared to plain speech. In an intelligibility experiment, native English, Spanish, and Dutch listeners listened to the same words, mixed with noise. While the non-native listeners appeared to benefit more from Lombard speech than the native listeners did, each listener group experienced a similar benefit for native and non-native Lombard speech. Energetic masking, as captured by the glimpsing metric, only accounted for part of the Lombard benefit, indicating that the Lombard intelligibility benefit does not only result from a shift in spectral distribution. Despite subtle native language influences on non-native Lombard speech, both native and non-native speech provides a Lombard benefit. -
Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (
Eds. ), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).Abstract
Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings.Importantly, in both experiments we show that he grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information. -
Nijveld, A., Ten Bosch, L., & Ernestus, M. (2022). The use of exemplars differs between native and non-native listening. Bilingualism: Language and Cognition, 25(5), 841-855. doi:10.1017/S1366728922000116.
Abstract
This study compares the role of exemplars in native and non-native listening. Two English identity priming experiments were conducted with native English, Dutch non-native, and Spanish non-native listeners. In Experiment 1, primes and targets were spoken in the same or a different voice. Only the native listeners showed exemplar effects. In Experiment 2, primes and targets had the same or a different degree of vowel reduction. The Dutch, but not the Spanish, listeners were familiar with this reduction pattern from their L1 phonology. In this experiment, exemplar effects only arose for the Spanish listeners. We propose that in these lexical decision experiments the use of exemplars is co-determined by listeners’ available processing resources, which is modulated by the familiarity with the variation type from their L1 phonology. The use of exemplars differs between native and non-native listening, suggesting qualitative differences between native and non-native speech comprehension processes. -
Ernestus, M., Dikmans, M., & Giezenaar, G. (2017). Advanced second language learners experience difficulties processing reduced word pronunciation variants. Dutch Journal of Applied Linguistics, 6(1), 1-20. doi:10.1075/dujal.6.1.01ern.
Abstract
Words are often pronounced with fewer segments in casual conversations than in formal speech. Previous research has shown that foreign language learners and beginning second language learners experience problems processing reduced speech. We examined whether this also holds for advanced second language learners. We designed a dictation task in Dutch consisting of sentences spliced from casual conversations and an unreduced counterpart of this task, with the same sentences carefully articulated by the same speaker. Advanced second language learners of Dutch produced substantially more transcription errors for the reduced than for the unreduced sentences. These errors made the sentences incomprehensible or led to non-intended meanings. The learners often did not rely on the semantic and syntactic information in the sentence or on the subsegmental cues to overcome the reductions. Hence, advanced second language learners also appear to suffer from the reduced pronunciation variants of words that are abundant in everyday conversations -
Ernestus, M., Kouwenhoven, H., & Van Mulken, M. (2017). The direct and indirect effects of the phonotactic constraints in the listener's native language on the comprehension of reduced and unreduced word pronunciation variants in a foreign language. Journal of Phonetics, 62, 50-64. doi:10.1016/j.wocn.2017.02.003.
Abstract
This study investigates how the comprehension of casual speech in foreign languages is affected by the phonotactic constraints in the listener’s native language. Non-native listeners of English with different native languages heard short English phrases produced by native speakers of English or Spanish and they indicated whether these phrases included can or can’t. Native Mandarin listeners especially tended to interpret can’t as can. We interpret this result as a direct effect of the ban on word-final /nt/ in Mandarin. Both the native Mandarin and the native Spanish listeners did not take full advantage of the subsegmental information in the speech signal cueing reduced can’t. This finding is probably an indirect effect of the phonotactic constraints in their native languages: these listeners have difficulties interpreting the subsegmental cues because these cues do not occur or have different functions in their native languages. Dutch resembles English in the phonotactic constraints relevant to the comprehension of can’t, and native Dutch listeners showed similar patterns in their comprehension of native and non-native English to native English listeners. This result supports our conclusion that the major patterns in the comprehension results are driven by the phonotactic constraints in the listeners’ native languages. -
Ten Bosch, L., Boves, L., & Ernestus, M. (2017). The recognition of compounds: A computational account. In Proceedings of Interspeech 2017 (pp. 1158-1162). doi:10.21437/Interspeech.2017-1048.
Abstract
This paper investigates the processes in comprehending spoken noun-noun compounds, using data from the BALDEY database. BALDEY contains lexicality judgments and reaction times (RTs) for Dutch stimuli for which also linguistic information is included. Two different approaches are combined. The first is based on regression by Dynamic Survival Analysis, which models decisions and RTs as a consequence of the fact that a cumulative density function exceeds some threshold. The parameters of that function are estimated from the observed RT data. The second approach is based on DIANA, a process-oriented computational model of human word comprehension, which simulates the comprehension process with the acoustic stimulus as input. DIANA gives the identity and the number of the word candidates that are activated at each 10 ms time step.
Both approaches show how the processes involved in comprehending compounds change during a stimulus. Survival Analysis shows that the impact of word duration varies during the course of a stimulus. The density of word and non-word hypotheses in DIANA shows a corresponding pattern with different regimes. We show how the approaches complement each other, and discuss additional ways in which data and process models can be combined. -
De Vaan, L., Van Krieken, K., Van den Bosch, W., Schreuder, R., & Ernestus, M. (2017). The traces that novel morphologically complex words leave in memory are abstract in nature. The Mental Lexicon, 12(2), 181-218. doi:10.1075/ml.16006.vaa.
Abstract
Previous work has shown that novel morphologically complex words (henceforth neologisms) leave traces in memory after just one encounter. This study addressed the question whether these traces are abstract in nature or exemplars. In three experiments, neologisms were either primed by themselves or by their stems. The primes occurred in the visual modality whereas the targets were presented in the auditory modality (Experiment 1) or vice versa (Experiments 2 and 3). The primes were presented in sentences in a selfpaced reading task (Experiment 1) or in stories in a listening comprehension task (Experiments 2 and 3). The targets were incorporated in lexical decision tasks, auditory or visual (Experiment 1 and Experiment 2, respectively), or in stories in a self-paced reading task (Experiment 3). The experimental part containing the targets immediately followed the familiarization phase with the primes (Experiment 1), or after a one week delay (Experiments 2 and 3). In all experiments, participants recognized neologisms faster if they had encountered them before (identity priming) than if the familiarization phase only contained the neologisms’ stems (stem priming). These results show that the priming effects are robust despite substantial differences between the primes and the targets. This suggests that the traces novel morphologically complex words leave in memory after just one encounter are abstract in nature. -
Viebahn, M., Ernestus, M., & McQueen, J. M. (2017). Speaking style influences the brain’s electrophysiological response to grammatical errors in speech comprehension. Journal of Cognitive Neuroscience, 29(7), 1132-1146. doi:10.1162/jocn_a_01095.
Abstract
This electrophysiological study asked whether the brain processes grammatical gender
violations in casual speech differently than in careful speech. Native speakers of Dutch were
presented with utterances that contained adjective-noun pairs in which the adjective was either
correctly inflected with a word-final schwa (e.g. een spannende roman “a suspenseful novel”) or
incorrectly uninflected without that schwa (een spannend roman). Consistent with previous
findings, the uninflected adjectives elicited an electrical brain response sensitive to syntactic
violations when the talker was speaking in a careful manner. When the talker was speaking in a
casual manner, this response was absent. A control condition showed electrophysiological responses
for carefully as well as casually produced utterances with semantic anomalies, showing that
listeners were able to understand the content of both types of utterance. The results suggest that
listeners take information about the speaking style of a talker into account when processing the
acoustic-phonetic information provided by the speech signal. Absent schwas in casual speech are
effectively not grammatical gender violations. These changes in syntactic processing are evidence
of contextually-driven neural flexibility.Files private
Request files -
Ernestus, M. (2014). Acoustic reduction and the roles of abstractions and exemplars in speech processing. Lingua, 142, 27-41. doi:10.1016/j.lingua.2012.12.006.
Abstract
Acoustic reduction refers to the frequent phenomenon in conversational speech that words are produced with fewer or lenited segments compared to their citation forms. The few published studies on the production and comprehension of acoustic reduction have important implications for the debate on the relevance of abstractions and exemplars in speech processing. This article discusses these implications. It first briefly introduces the key assumptions of simple abstractionist and simple exemplar-based models. It then discusses the literature on acoustic reduction and draws the conclusion that both types of models need to be extended to explain all findings. The ultimate model should allow for the storage of different pronunciation variants, but also reserve an important role for phonetic implementation. Furthermore, the recognition of a highly reduced pronunciation variant requires top down information and leads to activation of the corresponding unreduced variant, the variant that reaches listeners’ consciousness. These findings are best accounted for in hybrids models, assuming both abstract representations and exemplars. None of the hybrid models formulated so far can account for all data on reduced speech and we need further research for obtaining detailed insight into how speakers produce and listeners comprehend reduced speech. -
Ernestus, M., & Giezenaar, G. (2014). Een goed verstaander heeft maar een half woord nodig. In B. Bossers (
Ed. ), Vakwerk 9: Achtergronden van de NT2-lespraktijk: Lezingen conferentie Hoeven 2014 (pp. 81-92). Amsterdam: BV NT2. -
Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (
Eds. ), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).Abstract
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available -
Lahey, M., & Ernestus, M. (2014). Pronunciation variation in infant-directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development, 10, 308-327. doi:10.1080/15475441.2013.860813.
Abstract
In spontaneous conversations between adults, words are often pronounced with fewer segments or syllables than their citation forms. The question arises whether infant-directed speech also contains phonetic reduction. If so, infants would be presented with speech input that enables them to acquire reduced variants from an early age. This study compared speech directed at 11- and 12-month-old infants with adult-directed conversational speech and adult-directed read speech. In an acoustic study, 216 tokens of the Dutch words allemaal and helemaal from speech corpora were analyzed for duration, number of syllables, and vowel quality. In a perception study, adult participants rated these same materials for reduction and provided phonetic transcriptions. The results show that these two words are frequently reduced in infant-directed speech, and that their degree of reduction is comparable with conversational adult-directed speech. These findings suggest that lexical representations for reduced pronunciation variants can be acquired early in linguistic developmentFiles private
Request files -
Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (
Eds. ), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.Abstract
This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes -
Schertz, J., & Ernestus, M. (2014). Variability in the pronunciation of non-native English the: Effects of frequency and disfluencies. Corpus Linguistics and Linguistic Theory, 10, 329-345. doi:10.1515/cllt-2014-0024.
Abstract
This study examines how lexical frequency and planning problems can predict phonetic variability in the function word ‘the’ in conversational speech produced by non-native speakers of English. We examined 3180 tokens of ‘the’ drawn from English conversations between native speakers of Czech or Norwegian. Using regression models, we investigated the effect of following word frequency and disfluencies on three phonetic parameters: vowel duration, vowel quality, and consonant quality. Overall, the non-native speakers showed variation that is very similar to the variation displayed by native speakers of English. Like native speakers, Czech speakers showed an effect of frequency on vowel durations, which were shorter in more frequent word sequences. Both groups of speakers showed an effect of frequency on consonant quality: the substitution of another consonant for /ð/ occurred more often in the context of more frequent words. The speakers in this study also showed a native-like allophonic distinction in vowel quality, in which /ði/ occurs more often before vowels and /ðə/ before consonants. Vowel durations were longer in the presence of following disfluencies, again mirroring patterns in native speakers, and the consonant quality was more likely to be the target /ð/ before disfluencies, as opposed to a different consonant. The fact that non-native speakers show native-like sensitivity to lexical frequency and disfluencies suggests that these effects are consequences of a general, non-language-specific production mechanism governing language planning. On the other hand, the non-native speakers in this study did not show native-like patterns of vowel quality in the presence of disfluencies, suggesting that the pattern attested in native speakers of English may result from language-specific processes separate from the general production mechanisms -
Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).
Abstract
This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences
Share this page