Publications

Displaying 1 - 46 of 46
  • Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2024). Your “VOORnaam” is not my “VOORnaam”: An acoustic analysis of individual talker differences in word stress in Dutch. Journal of Phonetics, 103: 101296. doi:10.1016/j.wocn.2024.101296.

    Abstract

    Different talkers speak differently, even within the same homogeneous group. These differences lead to acoustic variability in speech, causing challenges for correct perception of the intended message. Because previous descriptions of this acoustic variability have focused mostly on segments, talker variability in prosodic structures is not yet well documented. The present study therefore examined acoustic between-talker variability in word stress in Dutch. We recorded 40 native Dutch talkers from a participant sample with minimal dialectal variation and balanced gender, producing segmentally overlapping words (e.g., VOORnaam vs. voorNAAM; ‘first name’ vs. ‘respectable’, capitalization indicates lexical stress), and measured different acoustic cues to stress. Each individual participant’s acoustic measurements were analyzed using Linear Discriminant Analyses, which provide coefficients for each cue, reflecting the strength of each cue in a talker’s productions. On average, talkers primarily used mean F0, intensity, and duration. Moreover, each participant also employed a unique combination of cues, illustrating large prosodic variability between talkers. In fact, classes of cue-weighting tendencies emerged, differing in which cue was used as the main cue. These results offer the most comprehensive acoustic description, to date, of word stress in Dutch, and illustrate that large prosodic variability is present between individual talkers.
  • Wu, M., Bosker, H. R., & Riecke, L. (2023). Sentential contextual facilitation of auditory word processing builds up during sentence tracking. Journal of Cognitive Neuroscience, 35(8), 1262 -1278. doi:10.1162/jocn_a_02007.

    Abstract

    While listening to meaningful speech, auditory input is processed more rapidly near the end (vs. beginning) of sentences. Although several studies have shown such word-to-word changes in auditory input processing, it is still unclear from which processing level these word-to-word dynamics originate. We investigated whether predictions derived from sentential context can result in auditory word-processing dynamics during sentence tracking. We presented healthy human participants with auditory stimuli consisting of word sequences, arranged into either predictable (coherent sentences) or less predictable (unstructured, random word sequences) 42-Hz amplitude-modulated speech, and a continuous 25-Hz amplitude-modulated distractor tone. We recorded RTs and frequency-tagged neuroelectric responses 1(auditory steady-state responses) to individual words at multiple temporal positions within the sentences, and quantified sentential context effects at each position while controlling for individual word characteristics (i.e., phonetics, frequency, and familiarity). We found that sentential context increasingly facilitates auditory word processing as evidenced by accelerated RTs and increased auditory steady-state responses to later-occurring words within sentences. These purely top–down contextually driven auditory word-processing dynamics occurred only when listeners focused their attention on the speech and did not transfer to the auditory processing of the concurrent distractor tone. These findings indicate that auditory word-processing dynamics during sentence tracking can originate from sentential predictions. The predictions depend on the listeners' attention to the speech, and affect only the processing of the parsed speech, not that of concurrently presented auditory streams.
  • Papoutsi*, C., Zimianiti*, E., Bosker, H. R., & Frost, R. L. A. (2023). Statistical learning at a virtual cocktail party. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-023-02384-1.

    Abstract

    * These two authors contributed equally to this study
    Statistical learning – the ability to extract distributional regularities from input – is suggested to be key to language acquisition. Yet, evidence for the human capacity for statistical learning comes mainly from studies conducted in carefully controlled settings without auditory distraction. While such conditions permit careful examination of learning, they do not reflect the naturalistic language learning experience, which is replete with auditory distraction – including competing talkers. Here, we examine how statistical language learning proceeds in a virtual cocktail party environment, where the to-be-learned input is presented alongside a competing speech stream with its own distributional regularities. During exposure, participants in the Dual Talker group concurrently heard two novel languages, one produced by a female talker and one by a male talker, with each talker virtually positioned at opposite sides of the listener (left/right) using binaural acoustic manipulations. Selective attention was manipulated by instructing participants to attend to only one of the two talkers. At test, participants were asked to distinguish words from part-words for both the attended and the unattended languages. Results indicated that participants’ accuracy was significantly higher for trials from the attended vs. unattended
    language. Further, the performance of this Dual Talker group was no different compared to a control group who heard only one language from a single talker (Single Talker group). We thus conclude that statistical learning is modulated by selective attention, being relatively robust against the additional cognitive load provided by competing speech, emphasizing its efficiency in naturalistic language learning situations.

    Additional information

    supplementary file
  • Severijnen, G. G. A., Di Dona, G., Bosker, H. R., & McQueen, J. M. (2023). Tracking talker-specific cues to lexical stress: Evidence from perceptual learning. Journal of Experimental Psychology: Human Perception and Performance, 49(4), 549-565. doi:10.1037/xhp0001105.

    Abstract

    When recognizing spoken words, listeners are confronted by variability in the speech signal caused by talker differences. Previous research has focused on segmental talker variability; less is known about how suprasegmental variability is handled. Here we investigated the use of perceptual learning to deal with between-talker differences in lexical stress. Two groups of participants heard Dutch minimal stress pairs (e.g., VOORnaam vs. voorNAAM, “first name” vs. “respectable”) spoken by two male talkers. Group 1 heard Talker 1 use only F0 to signal stress (intensity and duration values were ambiguous), while Talker 2 used only intensity (F0 and duration were ambiguous). Group 2 heard the reverse talker-cue mappings. After training, participants were tested on words from both talkers containing conflicting stress cues (“mixed items”; e.g., one spoken by Talker 1 with F0 signaling initial stress and intensity signaling final stress). We found that listeners used previously learned information about which talker used which cue to interpret the mixed items. For example, the mixed item described above tended to be interpreted as having initial stress by Group 1 but as having final stress by Group 2. This demonstrates that listeners learn how individual talkers signal stress and use that knowledge in spoken-word recognition.
  • Bosker, H. R. (2022). Evidence for selective adaptation and recalibration in the perception of lexical stress. Language and Speech, 65(2), 472-490. doi:10.1177/00238309211030307.

    Abstract

    Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words, not encountered during exposure. Together, the experiments demonstrate that listeners also flexibly adapt to variability in the suprasegmental properties of speech, thus expanding our understanding of the utility of listener adaptation in speech perception. Moreover, the combined outcomes speak for an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.
  • Reinisch, E., & Bosker, H. R. (2022). Encoding speech rate in challenging listening conditions: White noise and reverberation. Attention, Perception & Psychophysics, 84, 2303 -2318. doi:10.3758/s13414-022-02554-8.

    Abstract

    Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context
    sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often
    referred to as “rate-dependent speech perception,” has been suggested to be the result of a robust, low-level perceptual process,
    typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening condi-
    tions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to
    a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting
    temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the
    context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in
    Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which
    compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was
    found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations,
    eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study
    contributes towards understanding the consequences of different types of listening environments on the functioning of low-
    level perceptual processes that listeners use during speech perception.

    Additional information

    Data availability
  • Bosker, H. R. (2021). Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behavior Research Methods, 53(5), 1945-1953. doi:10.3758/s13428-021-01542-4.

    Abstract

    Many studies of speech perception assess the intelligibility of spoken sentence stimuli by means
    of transcription tasks (‘type out what you hear’). The intelligibility of a given stimulus is then often
    expressed in terms of percentage of words correctly reported from the target sentence. Yet scoring
    the participants’ raw responses for words correctly identified from the target sentence is a time-
    consuming task, and hence resource-intensive. Moreover, there is no consensus among speech
    scientists about what specific protocol to use for the human scoring, limiting the reliability of
    human scores. The present paper evaluates various forms of fuzzy string matching between
    participants’ responses and target sentences, as automated metrics of listener transcript accuracy.
    We demonstrate that one particular metric, the Token Sort Ratio, is a consistent, highly efficient,
    and accurate metric for automated assessment of listener transcripts, as evidenced by high
    correlations with human-generated scores (best correlation: r = 0.940) and a strong relationship to
    acoustic markers of speech intelligibility. Thus, fuzzy string matching provides a practical tool for
    assessment of listener transcript accuracy in large-scale speech intelligibility studies. See
    https://tokensortratio.netlify.app for an online implementation.
  • Bosker, H. R., Badaya, E., & Corley, M. (2021). Discourse markers activate their, like, cohort competitors. Discourse Processes, 58(9), 837-851. doi:10.1080/0163853X.2021.1924000.

    Abstract

    Speech in everyday conversations is riddled with discourse markers (DMs), such as well, you know, and like. However, in many lab-based studies of speech comprehension, such DMs are typically absent from the carefully articulated and highly controlled speech stimuli. As such, little is known about how these DMs influence online word recognition. The present study specifically investigated the online processing of DM like and how it influences the activation of words in the mental lexicon. We specifically targeted the cohort competitor (CC) effect in the Visual World Paradigm: Upon hearing spoken instructions to “pick up the beaker,” human listeners also typically fixate—next to the target object—referents that overlap phonologically with the target word (cohort competitors such as beetle; CCs). However, several studies have argued that CC effects are constrained by syntactic, semantic, pragmatic, and discourse constraints. Therefore, the present study investigated whether DM like influences online word recognition by activating its cohort competitors (e.g., lightbulb). In an eye-tracking experiment using the Visual World Paradigm, we demonstrate that when participants heard spoken instructions such as “Now press the button for the, like … unicycle,” they showed anticipatory looks to the CC referent (lightbulb)well before hearing the target. This CC effect was sustained for a relatively long period of time, even despite hearing disambiguating information (i.e., the /k/ in like). Analysis of the reaction times also showed that participants were significantly faster to select CC targets (lightbulb) when preceded by DM like. These findings suggest that seemingly trivial DMs, such as like, activate their CCs, impacting online word recognition. Thus, we advocate a more holistic perspective on spoken language comprehension in naturalistic communication, including the processing of DMs.
  • Bosker, H. R., & Peeters, D. (2021). Beat gestures influence which speech sounds you hear. Proceedings of the Royal Society B: Biological Sciences, 288: 20202419. doi:10.1098/rspb.2020.2419.

    Abstract

    Beat gestures—spontaneously produced biphasic movements of the hand—
    are among the most frequently encountered co-speech gestures in human
    communication. They are closely temporally aligned to the prosodic charac-
    teristics of the speech signal, typically occurring on lexically stressed
    syllables. Despite their prevalence across speakers of the world’s languages,
    how beat gestures impact spoken word recognition is unclear. Can these
    simple ‘flicks of the hand’ influence speech perception? Across a range
    of experiments, we demonstrate that beat gestures influence the explicit
    and implicit perception of lexical stress (e.g. distinguishing OBject from
    obJECT), and in turn can influence what vowels listeners hear. Thus, we pro-
    vide converging evidence for a manual McGurk effect: relatively simple and
    widely occurring hand movements influence which speech sounds we hear

    Additional information

    example stimuli and experimental data
  • Bosker, H. R. (2021). The contribution of amplitude modulations in speech to perceived charisma. In B. Weiss, J. Trouvain, M. Barkat-Defradas, & J. J. Ohala (Eds.), Voice attractiveness: Prosody, phonology and phonetics (pp. 165-181). Singapore: Springer. doi:10.1007/978-981-15-6627-1_10.

    Abstract

    Speech contains pronounced amplitude modulations in the 1–9 Hz range, correlating with the syllabic rate of speech. Recent models of speech perception propose that this rhythmic nature of speech is central to speech recognition and has beneficial effects on language processing. Here, we investigated the contribution of amplitude modulations to the subjective impression listeners have of public speakers. The speech from US presidential candidates Hillary Clinton and Donald Trump in the three TV debates of 2016 was acoustically analyzed by means of modulation spectra. These indicated that Clinton’s speech had more pronounced amplitude modulations than Trump’s speech, particularly in the 1–9 Hz range. A subsequent perception experiment, with listeners rating the perceived charisma of (low-pass filtered versions of) Clinton’s and Trump’s speech, showed that more pronounced amplitude modulations (i.e., more ‘rhythmic’ speech) increased perceived charisma ratings. These outcomes highlight the important contribution of speech rhythm to charisma perception.
  • Rodd, J., Decuyper, C., Bosker, H. R., & Ten Bosch, L. (2021). A tool for efficient and accurate segmentation of speech data: Announcing POnSS. Behavior Research Methods, 53, 744-756. doi:10.3758/s13428-020-01449-6.

    Abstract

    Despite advances in automatic speech recognition (ASR), human input is still essential to produce research-grade segmentations of speech data. Con- ventional approaches to manual segmentation are very labour-intensive. We introduce POnSS, a browser-based system that is specialized for the task of segmenting the onsets and offsets of words, that combines aspects of ASR with limited human input. In developing POnSS, we identified several sub- tasks of segmentation, and implemented each of these as separate interfaces for the annotators to interact with, to streamline their task as much as possible. We evaluated segmentations made with POnSS against a base- line of segmentations of the same data made conventionally in Praat. We observed that POnSS achieved comparable reliability to segmentation us- ing Praat, but required 23% less annotator time investment. Because of its greater efficiency without sacrificing reliability, POnSS represents a distinct methodological advance for the segmentation of speech data.
  • Severijnen, G. G. A., Bosker, H. R., Piai, V., & McQueen, J. M. (2021). Listeners track talker-specific prosody to deal with talker-variability. Brain Research, 1769: 147605. doi:10.1016/j.brainres.2021.147605.

    Abstract

    One of the challenges in speech perception is that listeners must deal with considerable
    segmental and suprasegmental variability in the acoustic signal due to differences between talkers. Most previous studies have focused on how listeners deal with segmental variability.
    In this EEG experiment, we investigated whether listeners track talker-specific usage of suprasegmental cues to lexical stress to recognize spoken words correctly. In a three-day training phase, Dutch participants learned to map non-word minimal stress pairs onto different object referents (e.g., USklot meant “lamp”; usKLOT meant “train”). These non-words were
    produced by two male talkers. Critically, each talker used only one suprasegmental cue to signal stress (e.g., Talker A used only F0 and Talker B only intensity). We expected participants to learn which talker used which cue to signal stress. In the test phase, participants indicated whether spoken sentences including these non-words were correct (“The word for lamp is…”).
    We found that participants were slower to indicate that a stimulus was correct if the non-word was produced with the unexpected cue (e.g., Talker A using intensity). That is, if in training Talker A used F0 to signal stress, participants experienced a mismatch between predicted and perceived phonological word-forms if, at test, Talker A unexpectedly used intensity to cue
    stress. In contrast, the N200 amplitude, an event-related potential related to phonological
    prediction, was not modulated by the cue mismatch. Theoretical implications of these
    contrasting results are discussed. The behavioral findings illustrate talker-specific prediction of prosodic cues, picked up through perceptual learning during training.
  • Bosker, H. R., & Cooke, M. (2020). Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech. The Journal of the Acoustical Society of America, 147: 721. doi:10.1121/10.0000646.

    Abstract

    Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech (NiCLS) produce more pronounced amplitude modulations in noise vs. in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech.
  • Bosker, H. R., Peeters, D., & Holler, J. (2020). How visual cues to speech rate influence speech perception. Quarterly Journal of Experimental Psychology, 73(10), 1523-1536. doi:10.1177/1747021820914564.

    Abstract

    Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two ‘Go Fish’-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (mute videos of talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of following visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants’ target categorization responses. These findings contribute to a better understanding of how what we see influences what we hear.
  • Bosker, H. R., Sjerps, M. J., & Reinisch, E. (2020). Temporal contrast effects in human speech perception are immune to selective attention. Scientific Reports, 10: 5607. doi:10.1038/s41598-020-62613-8.

    Abstract

    Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli? Dutch listeners categorized target sounds with a reduced prefix “ge-” marking tense (e.g., ambiguous between gegaan-gaan “gone-go”). In ‘single talker’ Experiments 1–2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In ‘selective attention’ Experiments 3–5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to equally influence target perception – even when participants could watch the attended talker speak. In fact, participants’ target perception in ‘selective attention’ Experiments 3–5 did not differ from participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.

    Additional information

    Supplementary information
  • Bosker, H. R., Sjerps, M. J., & Reinisch, E. (2020). Spectral contrast effects are modulated by selective attention in ‘cocktail party’ settings. Attention, Perception & Psychophysics, 82, 1318-1332. doi:10.3758/s13414-019-01824-2.

    Abstract

    Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as “bet” after a ‘low F1’ sentence, but as “bit” after a ‘high F1’ sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham [(2018b). J.Exp.Psychol.-Hum.Percept.Perform. 44(9), 1447–1457] reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic ‘cocktail party’ settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influences target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than those in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.

    Additional information

    13414_2019_1824_MOESM1_ESM.docx
  • Kaufeld, G., Naumann, W., Meyer, A. S., Bosker, H. R., & Martin, A. E. (2020). Contextual speech rate influences morphosyntactic prediction and integration. Language, Cognition and Neuroscience, 35(7), 933-948. doi:10.1080/23273798.2019.1701691.

    Abstract

    Understanding spoken language requires the integration and weighting of multiple cues, and may call on cue integration mechanisms that have been studied in other areas of perception. In the current study, we used eye-tracking (visual-world paradigm) to examine how contextual speech rate (a lower-level, perceptual cue) and morphosyntactic knowledge (a higher-level, linguistic cue) are iteratively combined and integrated. Results indicate that participants used contextual rate information immediately, which we interpret as evidence of perceptual inference and the generation of predictions about upcoming morphosyntactic information. Additionally, we observed that early rate effects remained active in the presence of later conflicting lexical information. This result demonstrates that (1) contextual speech rate functions as a cue to morphosyntactic inferences, even in the presence of subsequent disambiguating information; and (2) listeners iteratively use multiple sources of information to draw inferences and generate predictions during speech comprehension. We discuss the implication of these demonstrations for theories of language processing
  • Kaufeld, G., Ravenschlag, A., Meyer, A. S., Martin, A. E., & Bosker, H. R. (2020). Knowledge-based and signal-based cues are weighted flexibly during spoken language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(3), 549-562. doi:10.1037/xlm0000744.

    Abstract

    During spoken language comprehension, listeners make use of both knowledge-based and signal-based sources of information, but little is known about how cues from these distinct levels of representational hierarchy are weighted and integrated online. In an eye-tracking experiment using the visual world paradigm, we investigated the flexible weighting and integration of morphosyntactic gender marking (a knowledge-based cue) and contextual speech rate (a signal-based cue). We observed that participants used the morphosyntactic cue immediately to make predictions about upcoming referents, even in the presence of uncertainty about the cue’s reliability. Moreover, we found speech rate normalization effects in participants’ gaze patterns even in the presence of preceding morphosyntactic information. These results demonstrate that cues are weighted and integrated flexibly online, rather than adhering to a strict hierarchy. We further found rate normalization effects in the looking behavior of participants who showed a strong behavioral preference for the morphosyntactic gender cue. This indicates that rate normalization effects are robust and potentially automatic. We discuss these results in light of theories of cue integration and the two-stage model of acoustic context effects
  • Kaufeld, G., Bosker, H. R., Ten Oever, S., Alday, P. M., Meyer, A. S., & Martin, A. E. (2020). Linguistic structure and meaning organize neural oscillations into a content-specific hierarchy. The Journal of Neuroscience, 49(2), 9467-9475. doi:10.1523/JNEUROSCI.0302-20.2020.

    Abstract

    Neural oscillations track linguistic information during speech comprehension (e.g., Ding et al., 2016; Keitel et al., 2018), and are known to be modulated by acoustic landmarks and speech intelligibility (e.g., Doelling et al., 2014; Zoefel & VanRullen, 2015). However, studies investigating linguistic tracking have either relied on non-naturalistic isochronous stimuli or failed to fully control for prosody. Therefore, it is still unclear whether low frequency activity tracks linguistic structure during natural speech, where linguistic structure does not follow such a palpable temporal pattern. Here, we measured electroencephalography (EEG) and manipulated the presence of semantic and syntactic information apart from the timescale of their occurrence, while carefully controlling for the acoustic-prosodic and lexical-semantic information in the signal. EEG was recorded while 29 adult native speakers (22 women, 7 men) listened to naturally-spoken Dutch sentences, jabberwocky controls with morphemes and sentential prosody, word lists with lexical content but no phrase structure, and backwards acoustically-matched controls. Mutual information (MI) analysis revealed sensitivity to linguistic content: MI was highest for sentences at the phrasal (0.8-1.1 Hz) and lexical timescale (1.9-2.8 Hz), suggesting that the delta-band is modulated by lexically-driven combinatorial processing beyond prosody, and that linguistic content (i.e., structure and meaning) organizes neural oscillations beyond the timescale and rhythmicity of the stimulus. This pattern is consistent with neurophysiologically inspired models of language comprehension (Martin, 2016, 2020; Martin & Doumas, 2017) where oscillations encode endogenously generated linguistic content over and above exogenous or stimulus-driven timing and rhythm information.
  • Kösem, A., Bosker, H. R., Jensen, O., Hagoort, P., & Riecke, L. (2020). Biasing the perception of spoken words with transcranial alternating current stimulation. Journal of Cognitive Neuroscience, 32(8), 1428-1437. doi:10.1162/jocn_a_01579.

    Abstract

    Recent neuroimaging evidence suggests that the frequency of entrained oscillations in auditory cortices influences the perceived duration of speech segments, impacting word perception (Kösem et al. 2018). We further tested the causal influence of neural entrainment frequency during speech processing, by manipulating entrainment with continuous transcranial alternating
    current stimulation (tACS) at distinct oscillatory frequencies (3 Hz and 5.5 Hz) above the auditory cortices. Dutch participants listened to speech and were asked to report their percept of a target Dutch word, which contained a vowel with an ambiguous duration. Target words
    were presented either in isolation (first experiment) or at the end of spoken sentences (second experiment). We predicted that the tACS frequency would influence neural entrainment and
    therewith how speech is perceptually sampled, leading to a perceptual over- or underestimation of the vowel’s duration. Whereas results from Experiment 1 did not confirm this prediction, results from experiment 2 suggested a small effect of tACS frequency on target word
    perception: Faster tACS lead to more long-vowel word percepts, in line with the previous neuroimaging findings. Importantly, the difference in word perception induced by the different tACS frequencies was significantly larger in experiment 1 vs. experiment 2, suggesting that the
    impact of tACS is dependent on the sensory context. tACS may have a stronger effect on spoken word perception when the words are presented in continuous speech as compared to when they are isolated, potentially because prior (stimulus-induced) entrainment of brain oscillations
    might be a prerequisite for tACS to be effective.

    Additional information

    Data availability
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2020). Eye-tracking the time course of distal and global speech rate effects. Journal of Experimental Psychology: Human Perception and Performance, 46(10), 1148-1163. doi:10.1037/xhp0000838.

    Abstract

    To comprehend speech sounds, listeners tune in to speech rate information in the proximal (immediately adjacent), distal (non-adjacent), and global context (further removed preceding and following sentences). Effects of global contextual speech rate cues on speech perception have been shown to follow constraints not found for proximal and distal speech rate. Therefore, listeners may process such global cues at distinct time points during word recognition. We conducted a printed-word eye-tracking experiment to compare the time courses of distal and global rate effects. Results indicated that the distal rate effect emerged immediately after target sound presentation, in line with a general-auditory account. The global rate effect, however, arose more than 200 ms later than the distal rate effect, indicating that distal and global context effects involve distinct processing mechanisms. Results are interpreted in a two-stage model of acoustic context effects. This model posits that distal context effects involve very early perceptual processes, while global context effects arise at a later stage, involving cognitive adjustments conditioned by higher-level information.
  • Rodd, J., Bosker, H. R., Ernestus, M., Alday, P. M., Meyer, A. S., & Ten Bosch, L. (2020). Control of speaking rate is achieved by switching between qualitatively distinct cognitive ‘gaits’: Evidence from simulation. Psychological Review, 127(2), 281-304. doi:10.1037/rev0000172.

    Abstract

    That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: When walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech “gaits” that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, that borrows from Dell, Burger, and Svec’s (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a 1-gait system, the regimes used to achieve fast and slow speech are qualitatively similar, but quantitatively different. In a multiple gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits in all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles.

    Additional information

    Supplemental material
  • Van Os, M., De Jong, N. H., & Bosker, H. R. (2020). Fluency in dialogue: Turn‐taking behavior shapes perceived fluency in native and nonnative speech. Language Learning, 70(4), 1183-1217. doi:10.1111/lang.12416.

    Abstract

    Fluency is an important part of research on second language learning, but most research on language proficiency typically has not included oral fluency as part of interaction, even though natural communication usually occurs in conversations. The present study considered aspects of turn-taking behavior as part of the construct of fluency and investigated whether these aspects differentially influence perceived fluency ratings of native and non-native speech. Results from two experiments using acoustically manipulated speech showed that, in native speech, too ‘eager’ (interrupting a question with a fast answer) and too ‘reluctant’ answers (answering slowly after a long turn gap) negatively affected fluency ratings. However, in non-native speech, only too ‘reluctant’ answers led to lower fluency ratings. Thus, we demonstrate that acoustic properties of dialogue are perceived as part of fluency. By adding to our current understanding of dialogue fluency, these lab-based findings carry implications for language teaching and assessment

    Additional information

    data + R analysis script via osf
  • Bosker, H. R., Van Os, M., Does, R., & Van Bergen, G. (2019). Counting 'uhm's: how tracking the distribution of native and non-native disfluencies influences online language comprehension. Journal of Memory and Language, 106, 189-202. doi:10.1016/j.jml.2019.02.006.

    Abstract

    Disfluencies, like 'uh', have been shown to help listeners anticipate reference to low-frequency words. The associative account of this 'disfluency bias' proposes that listeners learn to associate disfluency with low-frequency referents based on prior exposure to non-arbitrary disfluency distributions (i.e., greater probability of low-frequency words after disfluencies). However, there is limited evidence for listeners actually tracking disfluency distributions online. The present experiments are the first to show that adult listeners, exposed to a typical or more atypical disfluency distribution (i.e., hearing a talker unexpectedly say uh before high-frequency words), flexibly adjust their predictive strategies to the disfluency distribution at hand (e.g., learn to predict high-frequency referents after disfluency). However, when listeners were presented with the same atypical disfluency distribution but produced by a non-native speaker, no adjustment was observed. This suggests pragmatic inferences can modulate distributional learning, revealing the flexibility of, and constraints on, distributional learning in incremental language comprehension.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2019). How the tracking of habitual rate influences speech perception. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(1), 128-138. doi:10.1037/xlm0000579.

    Abstract

    Listeners are known to track statistical regularities in speech. Yet, which temporal cues
    are encoded is unclear. This study tested effects of talker-specific habitual speech rate
    and talker-independent average speech rate (heard over a longer period of time) on
    the perception of the temporal Dutch vowel contrast /A/-/a:/. First, Experiment 1
    replicated that slow local (surrounding) speech contexts induce fewer long /a:/
    responses than faster contexts. Experiment 2 tested effects of long-term habitual
    speech rate. One high-rate group listened to ambiguous vowels embedded in `neutral'
    speech from talker A, intermixed with speech from fast talker B. Another low-rate group
    listened to the same `neutral' speech from talker A, but to talker B being slow.
    Between-group comparison of the `neutral' trials showed that the high-rate group
    demonstrated a lower proportion of /a:/ responses, indicating that talker A's habitual
    speech rate sounded slower when B was faster. In Experiment 3, both talkers
    produced speech at both rates, removing the different habitual speech rates of talker A
    and B, while maintaining the average rate differing between groups. This time no
    global rate effect was observed. Taken together, the present experiments show that a
    talker's habitual rate is encoded relative to the habitual rate of another talker, carrying
    implications for episodic and constraint-based models of speech perception.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2019). Listeners normalize speech for contextual speech rate even without an explicit recognition task. The Journal of the Acoustical Society of America, 146(1), 179-188. doi:10.1121/1.5116004.

    Abstract

    Speech can be produced at different rates. Listeners take this rate variation into account by normalizing vowel duration for contextual speech rate: An ambiguous Dutch word /m?t/ is perceived as short /mAt/ when embedded in a slow context, but long /ma:t/ in a fast context. Whilst some have argued that this rate normalization involves low-level automatic perceptual processing, there is also evidence that it arises at higher-level cognitive processing stages, such as decision making. Prior research on rate-dependent speech perception has only used explicit recognition tasks to investigate the phenomenon, involving both perceptual processing and decision making. This study tested whether speech rate normalization can be observed without explicit decision making, using a cross-modal repetition priming paradigm. Results show that a fast precursor sentence makes an embedded ambiguous prime (/m?t/) sound (implicitly) more /a:/-like, facilitating lexical access to the long target word "maat" in a (explicit) lexical decision task. This result suggests that rate normalization is automatic, taking place even in the absence of an explicit recognition task. Thus, rate normalization is placed within the realm of everyday spoken conversation, where explicit categorization of ambiguous sounds is rare.
  • Rodd, J., Bosker, H. R., Ten Bosch, L., & Ernestus, M. (2019). Deriving the onset and offset times of planning units from acoustic and articulatory measurements. The Journal of the Acoustical Society of America, 145(2), EL161-EL167. doi:10.1121/1.5089456.

    Abstract

    Many psycholinguistic models of speech sequence planning make claims about the onset and offset times of planning units, such as words, syllables, and phonemes. These predictions typically go untested, however, since psycholinguists have assumed that the temporal dynamics of the speech signal is a poor index of the temporal dynamics of the underlying speech planning process. This article argues that this problem is tractable, and presents and validates two simple metrics that derive planning unit onset and offset times from the acoustic signal and articulatographic data.
  • Bosker, H. R., & Ghitza, O. (2018). Entrained theta oscillations guide perception of subsequent speech: Behavioral evidence from rate normalization. Language, Cognition and Neuroscience, 33(8), 955-967. doi:10.1080/23273798.2018.1439179.

    Abstract

    This psychoacoustic study provides behavioral evidence that neural entrainment in the theta range (3-9 Hz) causally shapes speech perception. Adopting the ‘rate normalization’ paradigm (presenting compressed carrier sentences followed by uncompressed target words), we show that uniform compression of a speech carrier to syllable rates inside the theta range influences perception of subsequent uncompressed targets, but compression outside theta range does not. However, the influence of carriers – compressed outside theta range – on target perception is salvaged when carriers are ‘repackaged’ to have a packet rate inside theta. This suggests that the brain can only successfully entrain to syllable/packet rates within theta range, with a causal influence on the perception of subsequent speech, in line with recent neuroimaging data. Thus, this study points to a central role for sustained theta entrainment in rate normalization and contributes to our understanding of the functional role of brain oscillations in speech perception.
  • Bosker, H. R. (2018). Putting Laurel and Yanny in context. The Journal of the Acoustical Society of America, 144(6), EL503-EL508. doi:10.1121/1.5070144.

    Abstract

    Recently, the world’s attention was caught by an audio clip that was perceived as “Laurel” or “Yanny”. Opinions were sharply split: many could not believe others heard something different from their perception. However, a crowd-source experiment with >500 participants shows that it is possible to make people hear Laurel, where they previously heard Yanny, by manipulating preceding acoustic context. This study is not only the first to reveal within-listener variation in Laurel/Yanny percepts, but also to demonstrate contrast effects for global spectral information in larger frequency regions. Thus, it highlights the intricacies of human perception underlying these social media phenomena.
  • Bosker, H. R., & Cooke, M. (2018). Talkers produce more pronounced amplitude modulations when speaking in noise. The Journal of the Acoustical Society of America, 143(2), EL121-EL126. doi:10.1121/1.5024404.

    Abstract

    Speakers adjust their voice when talking in noise (known as Lombard speech), facilitating speech comprehension. Recent neurobiological models of speech perception emphasize the role of amplitude modulations in speech-in-noise comprehension, helping neural oscillators to ‘track’ the attended speech. This study tested whether talkers produce more pronounced amplitude modulations in noise. Across four different corpora, modulation spectra showed greater power in amplitude modulations below 4 Hz in Lombard speech compared to matching plain speech. This suggests that noise-induced speech contains more pronounced amplitude modulations, potentially helping the listening brain to entrain to the attended talker, aiding comprehension.
  • Kösem, A., Bosker, H. R., Takashima, A., Meyer, A. S., Jensen, O., & Hagoort, P. (2018). Neural entrainment determines the words we hear. Current Biology, 28, 2867-2875. doi:10.1016/j.cub.2018.07.023.

    Abstract

    Low-frequency neural entrainment to rhythmic input
    has been hypothesized as a canonical mechanism
    that shapes sensory perception in time. Neural
    entrainment is deemed particularly relevant for
    speech analysis, as it would contribute to the extraction
    of discrete linguistic elements from continuous
    acoustic signals. However, its causal influence in
    speech perception has been difficult to establish.
    Here, we provide evidence that oscillations build temporal
    predictions about the duration of speech tokens
    that affect perception. Using magnetoencephalography
    (MEG), we studied neural dynamics during
    listening to sentences that changed in speech rate.
    Weobserved neural entrainment to preceding speech
    rhythms persisting for several cycles after the change
    in rate. The sustained entrainment was associated
    with changes in the perceived duration of the last
    word’s vowel, resulting in the perception of words
    with different meanings. These findings support oscillatory
    models of speech processing, suggesting that
    neural oscillations actively shape speech perception.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2018). Listening to yourself is special: Evidence from global speech rate tracking. PLoS One, 13(9): e0203571. doi:10.1371/journal.pone.0203571.

    Abstract

    Listeners are known to use adjacent contextual speech rate in processing temporally ambiguous speech sounds. For instance, an ambiguous vowel between short /A/ and long /a:/ in Dutch sounds relatively long (i.e., as /a:/) embedded in a fast precursor sentence, but short in a slow sentence. Besides the local speech rate, listeners also track talker-specific global speech rates. However, it is yet unclear whether other talkers' global rates are encoded with reference to a listener's self-produced rate. Three experiments addressed this question. In Experiment 1, one group of participants was instructed to speak fast, whereas another group had to speak slowly. The groups were compared on their perception of ambiguous /A/-/a:/ vowels embedded in neutral rate speech from another talker. In Experiment 2, the same participants listened to playback of their own speech and again evaluated target vowels in neutral rate speech. Neither of these experiments provided support for the involvement of self-produced speech in perception of another talker's speech rate. Experiment 3 repeated Experiment 2 but with a new participant sample that was unfamiliar with the participants from Experiment 2. This experiment revealed fewer /a:/ responses in neutral speech in the group also listening to a fast rate, suggesting that neutral speech sounds slow in the presence of a fast talker and vice versa. Taken together, the findings show that self-produced speech is processed differently from speech produced by others. They carry implications for our understanding of the perceptual and cognitive mechanisms involved in rate-dependent speech perception in dialogue settings.
  • Van Bergen, G., & Bosker, H. R. (2018). Linguistic expectation management in online discourse processing: An investigation of Dutch inderdaad 'indeed' and eigenlijk 'actually'. Journal of Memory and Language, 103, 191-209. doi:10.1016/j.jml.2018.08.004.

    Abstract

    Interpersonal discourse particles (DPs), such as Dutch inderdaad (≈‘indeed’) and eigenlijk (≈‘actually’) are highly frequent in everyday conversational interaction. Despite extensive theoretical descriptions of their polyfunctionality, little is known about how they are used by language comprehenders. In two visual world eye-tracking experiments involving an online dialogue completion task, we asked to what extent inderdaad, confirming an inferred expectation, and eigenlijk, contrasting with an inferred expectation, influence real-time understanding of dialogues. Answers in the dialogues contained a DP or a control adverb, and a critical discourse referent was replaced by a beep; participants chose the most likely dialogue completion by clicking on one of four referents in a display. Results show that listeners make rapid and fine-grained situation-specific inferences about the use of DPs, modulating their expectations about how the dialogue will unfold. Findings further specify and constrain theories about the conversation-managing function and polyfunctionality of DPs.
  • Bosker, H. R. (2017). Accounting for rate-dependent category boundary shifts in speech perception. Attention, Perception & Psychophysics, 79, 333-343. doi:10.3758/s13414-016-1206-4.

    Abstract

    The perception of temporal contrasts in speech is known to be influenced by the speech rate in the surrounding context. This rate-dependent perception is suggested to involve general auditory processes since it is also elicited by non-speech contexts, such as pure tone sequences. Two general auditory mechanisms have been proposed to underlie rate-dependent perception: durational contrast and neural entrainment. The present study compares the predictions of these two accounts of rate-dependent speech perception by means of four experiments in which participants heard tone sequences followed by Dutch target words ambiguous between /ɑs/ “ash” and /a:s/ “bait”. Tone sequences varied in the duration of tones (short vs. long) and in the presentation rate of the tones (fast vs. slow). Results show that the duration of preceding tones did not influence target perception in any of the experiments, thus challenging durational contrast as explanatory mechanism behind rate-dependent perception. Instead, the presentation rate consistently elicited a category boundary shift, with faster presentation rates inducing more /a:s/ responses, but only if the tone sequence was isochronous. Therefore, this study proposes an alternative, neurobiologically plausible, account of rate-dependent perception involving neural entrainment of endogenous oscillations to the rate of a rhythmic stimulus.
  • Bosker, H. R., Reinisch, E., & Sjerps, M. J. (2017). Cognitive load makes speech sound fast, but does not modulate acoustic context effects. Journal of Memory and Language, 94, 166-176. doi:10.1016/j.jml.2016.12.002.

    Abstract

    In natural situations, speech perception often takes place during the concurrent execution of other cognitive tasks, such as listening while viewing a visual scene. The execution of a dual task typically has detrimental effects on concurrent speech perception, but how exactly cognitive load disrupts speech encoding is still unclear. The detrimental effect on speech representations may consist of either a general reduction in the robustness of processing of the speech signal (‘noisy encoding’), or, alternatively it may specifically influence the temporal sampling of the sensory input, with listeners missing temporal pulses, thus underestimating segmental durations (‘shrinking of time’). The present study investigated whether and how spectral and temporal cues in a precursor sentence that has been processed under high vs. low cognitive load influence the perception of a subsequent target word. If cognitive load effects are implemented through ‘noisy encoding’, increasing cognitive load during the precursor should attenuate the encoding of both its temporal and spectral cues, and hence reduce the contextual effect that these cues can have on subsequent target sound perception. However, if cognitive load effects are expressed as ‘shrinking of time’, context effects should not be modulated by load, but a main effect would be expected on the perceived duration of the speech signal. Results from two experiments indicate that increasing cognitive load (manipulated through a secondary visual search task) did not modulate temporal (Experiment 1) or spectral context effects (Experiment 2). However, a consistent main effect of cognitive load was found: increasing cognitive load during the precursor induced a perceptual increase in its perceived speech rate, biasing the perception of a following target word towards longer durations. This finding suggests that cognitive load effects in speech perception are implemented via ‘shrinking of time’, in line with a temporal sampling framework. In addition, we argue that our results align with a model in which early (spectral and temporal) normalization is unaffected by attention but later adjustments may be attention-dependent.
  • Bosker, H. R., & Reinisch, E. (2017). Foreign languages sound fast: evidence from implicit rate normalization. Frontiers in Psychology, 8: 1063. doi:10.3389/fpsyg.2017.01063.

    Abstract

    Anecdotal evidence suggests that unfamiliar languages sound faster than one’s native language. Empirical evidence for this impression has, so far, come from explicit rate judgments. The aim of the present study was to test whether such perceived rate differences between native and foreign languages have effects on implicit speech processing. Our measure of implicit rate perception was “normalization for speaking rate”: an ambiguous vowel between short /a/ and long /a:/ is interpreted as /a:/ following a fast but as /a/ following a slow carrier sentence. That is, listeners did not judge speech rate itself; instead, they categorized ambiguous vowels whose perception was implicitly affected by the rate of the context. We asked whether a bias towards long /a:/ might be observed when the context is not actually faster but simply spoken in a foreign language. A fully symmetrical experimental design was used: Dutch and German participants listened to rate matched (fast and slow) sentences in both languages spoken by the same bilingual speaker. Sentences were followed by nonwords that contained vowels from an /a-a:/ duration continuum. Results from Experiments 1 and 2 showed a consistent effect of rate normalization for both listener groups. Moreover, for German listeners, across the two experiments, foreign sentences triggered more /a:/ responses than (rate matched) native sentences, suggesting that foreign sentences were indeed perceived as faster. Moreover, this Foreign Language effect was modulated by participants’ ability to understand the foreign language: those participants that scored higher on a foreign language translation task showed less of a Foreign Language effect. However, opposite effects were found for the Dutch listeners. For them, their native rather than the foreign language induced more /a:/ responses. Nevertheless, this reversed effect could be reduced when additional spectral properties of the context were controlled for. Experiment 3, using explicit rate judgments, replicated the effect for German but not Dutch listeners. We therefore conclude that the subjective impression that foreign languages sound fast may have an effect on implicit speech processing, with implications for how language learners perceive spoken segments in a foreign language.

    Additional information

    data sheet 1.docx
  • Bosker, H. R. (2017). How our own speech rate influences our perception of others. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(8), 1225-1238. doi:10.1037/xlm0000381.

    Abstract

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through six experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing pre-recorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings.
  • Gibson, M., & Bosker, H. R. (2016). Over vloeiendheid in spraak. Tijdschrift Taal, 7(10), 40-45.
  • Bosker, H. R., Quené, H., Sanders, T. J. M., & de Jong, N. H. (2014). Native 'um's elicit prediction of low-frequency referents, but non-native 'um's do not. Journal of Memory and Language, 75, 104-116. doi:10.1016/j.jml.2014.05.004.

    Abstract

    Speech comprehension involves extensive use of prediction. Linguistic prediction may be guided by the semantics or syntax, but also by the performance characteristics of the speech signal, such as disfluency. Previous studies have shown that listeners, when presented with the filler uh, exhibit a disfluency bias for discourse-new or unknown referents, drawing inferences about the source of the disfluency. The goal of the present study is to study the contrast between native and non-native disfluencies in speech comprehension. Experiment 1 presented listeners with pictures of high-frequency (e.g., a hand) and low-frequency objects (e.g., a sewing machine) and with fluent and disfluent instructions. Listeners were found to anticipate reference to low-frequency objects when encountering disfluency, thus attributing disfluency to speaker trouble in lexical retrieval. Experiment 2 showed that, when participants listened to disfluent non-native speech, no anticipation of low-frequency referents was observed. We conclude that listeners can adapt their predictive strategies to the (non-native) speaker at hand, extending our understanding of the role of speaker identity in speech comprehension.
  • Bosker, H. R., Quené, H., Sanders, T. J. M., & de Jong, N. H. (2014). The perception of fluency in native and non-native speech. Language Learning, 64, 579-614. doi:10.1111/lang.12067.

    Abstract

    Where native speakers supposedly are fluent by default, non-native speakers often have to strive hard to achieve a native-like fluency level. However, disfluencies (such as pauses, fillers, repairs, etc.) occur in both native and non-native speech and it is as yet unclear ow luency raters weigh the fluency characteristics of native and non-native speech. Two rating experiments compared the way raters assess the luency of native and non-native speech. The fluency characteristics of native and non- native speech were controlled by using phonetic anipulations in pause (Experiment 1) and speed characteristics (Experiment 2). The results show that the ratings on manipulated native and on-native speech were affected in a similar fashion. This suggests that there is no difference in the way listeners weigh the fluency haracteristics of native and non-native speakers.
  • Pinget, A.-F., Bosker, H. R., Quené, H., & de Jong, N. H. (2014). Native speakers' perceptions of fluency and accent in L2 speech. Language Testing, 31, 349-365. doi:10.1177/0265532214526177.

    Abstract

    Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2. Our aim is to explore the relationship between objectively measured temporal, segmental and suprasegmental properties of speech on the one hand, and fluency and accent as rated by native raters on the other hand. For 90 speech fragments from Turkish and English L2 learners of Dutch, several acoustic measures of fluency and accent were calculated. In Experiment 1, 20 native speakers of Dutch rated the L2 Dutch samples on fluency. In Experiment 2, 20 different untrained native speakers of Dutch judged the L2 Dutch samples on accentedness. Regression analyses revealed that acoustic measures of fluency were good predictors of fluency ratings. Secondly, segmental and suprasegmental measures of accent could predict some variance of accent ratings. Thirdly, perceived fluency and perceived accent were only weakly related. In conclusion, this study shows that fluency and perceived foreign accent can be judged as separate constructs.
  • Poellmann, K., Bosker, H. R., McQueen, J. M., & Mitterer, H. (2014). Perceptual adaptation to segmental and syllabic reductions in continuous spoken Dutch. Journal of Phonetics, 46, 101-127. doi:10.1016/j.wocn.2014.06.004.

    Abstract

    This study investigates if and how listeners adapt to reductions in casual continuous speech. In a perceptual-learning variant of the visual-world paradigm, two groups of Dutch participants were exposed to either segmental (/b/ → [ʋ]) or syllabic (ver- → [fː]) reductions in spoken Dutch sentences. In the test phase, both groups heard both kinds of reductions, but now applied to different words. In one of two experiments, the segmental reduction exposure group was better than the syllabic reduction exposure group in recognizing new reduced /b/-words. In both experiments, the syllabic reduction group showed a greater target preference for new reduced ver-words. Learning about reductions was thus applied to previously unheard words. This lexical generalization suggests that mechanisms compensating for segmental and syllabic reductions take place at a prelexical level, and hence that lexical access involves an abstractionist mode of processing. Existing abstractionist models need to be revised, however, as they do not include representations of sequences of segments (corresponding e.g. to ver-) at the prelexical level.
  • Bosker, H. R. (2013). Juncture (prosodic). In G. Khan (Ed.), Encyclopedia of Hebrew Language and Linguistics (pp. 432-434). Leiden: Brill.

    Abstract

    Prosodic juncture concerns the compartmentalization and partitioning of syntactic entities in spoken discourse by means of prosody. It has been argued that the Intonation Unit, defined by internal criteria and prosodic boundary phenomena (e.g., final lengthening, pitch reset, pauses), encapsulates the basic structural unit of spoken Modern Hebrew.
  • Bosker, H. R. (2013). Sibilant consonants. In G. Khan (Ed.), Encyclopedia of Hebrew Language and Linguistics (pp. 557-561). Leiden: Brill.

    Abstract

    Fricative consonants in Hebrew can be divided into bgdkpt and sibilants (ז, ס, צ, שׁ, שׂ). Hebrew sibilants have been argued to stem from Proto-Semitic affricates, laterals, interdentals and /s/. In standard Israeli Hebrew the sibilants are pronounced as [s] (ס and שׂ), [ʃ] (שׁ), [z] (ז), [ʦ] (צ).
  • Bosker, H. R., Pinget, A.-F., Quené, H., Sanders, T., & De Jong, N. H. (2013). What makes speech sound fluent? The contributions of pauses, speed and repairs. Language testing, 30(2), 159-175. doi:10.1177/0265532212455394.

    Abstract

    The oral fluency level of an L2 speaker is often used as a measure in assessing language proficiency. The present study reports on four experiments investigating the contributions of three fluency aspects (pauses, speed and repairs) to perceived fluency. In Experiment 1 untrained raters evaluated the oral fluency of L2 Dutch speakers. Using specific acoustic measures of pause, speed and repair phenomena, linear regression analyses revealed that pause and speed measures best predicted the subjective fluency ratings, and that repair measures contributed only very little. A second research question sought to account for these results by investigating perceptual sensitivity to acoustic pause, speed and repair phenomena, possibly accounting for the results from Experiment 1. In Experiments 2–4 three new groups of untrained raters rated the same L2 speech materials from Experiment 1 on the use of pauses, speed and repairs. A comparison of the results from perceptual sensitivity (Experiments 2–4) with fluency perception (Experiment 1) showed that perceptual sensitivity alone could not account for the contributions of the three aspects to perceived fluency. We conclude that listeners weigh the importance of the perceived aspects of fluency to come to an overall judgment.
  • Bosker, H. R., Briaire, J., Heeren, W., van Heuven, V. J., & Jongman, S. R. (2010). Whispered speech as input for cochlear implants. In J. Van Kampen, & R. Nouwen (Eds.), Linguistics in the Netherlands 2010 (pp. 1-14).

Share this page