Publications

Displaying 1 - 10 of 10
  • Bosker, H. R. (2022). Evidence for selective adaptation and recalibration in the perception of lexical stress. Language and Speech, 65(2), 472-490. doi:10.1177/00238309211030307.

    Abstract

    Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words, not encountered during exposure. Together, the experiments demonstrate that listeners also flexibly adapt to variability in the suprasegmental properties of speech, thus expanding our understanding of the utility of listener adaptation in speech perception. Moreover, the combined outcomes speak for an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.
  • Bujok, R., Meyer, A. S., & Bosker, H. R. (2022). Visible lexical stress cues on the face do not influence audiovisual speech perception. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 259-263). doi:10.21437/SpeechProsody.2022-53.

    Abstract

    Producing lexical stress leads to visible changes on the face, such as longer duration and greater size of the opening of the mouth. Research suggests that these visual cues alone can inform participants about which syllable carries stress (i.e., lip-reading silent videos). This study aims to determine the influence of visual articulatory cues on lexical stress perception in more naturalistic audiovisual settings. Participants were presented with seven disyllabic, Dutch minimal stress pairs (e.g., VOORnaam [first name] & voorNAAM [respectable]) in audio-only (phonetic lexical stress continua without video), video-only (lip-reading silent videos), and audiovisual trials (e.g., phonetic lexical stress continua with video of talker saying VOORnaam or voorNAAM). Categorization data from video-only trials revealed that participants could distinguish the minimal pairs above chance from seeing the silent videos alone. However, responses in the audiovisual condition did not differ from the audio-only condition. We thus conclude that visual lexical stress information on the face, while clearly perceivable, does not play a major role in audiovisual speech perception. This study demonstrates that clear unimodal effects do not always generalize to more naturalistic multimodal communication, advocating that speech prosody is best considered in multimodal settings.
  • Reinisch, E., & Bosker, H. R. (2022). Encoding speech rate in challenging listening conditions: White noise and reverberation. Attention, Perception & Psychophysics, 84, 2303 -2318. doi:10.3758/s13414-022-02554-8.

    Abstract

    Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context
    sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often
    referred to as “rate-dependent speech perception,” has been suggested to be the result of a robust, low-level perceptual process,
    typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening condi-
    tions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to
    a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting
    temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the
    context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in
    Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which
    compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was
    found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations,
    eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study
    contributes towards understanding the consequences of different types of listening environments on the functioning of low-
    level perceptual processes that listeners use during speech perception.

    Additional information

    Data availability
  • Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Bosker, H. R., & Ghitza, O. (2018). Entrained theta oscillations guide perception of subsequent speech: Behavioral evidence from rate normalization. Language, Cognition and Neuroscience, 33(8), 955-967. doi:10.1080/23273798.2018.1439179.

    Abstract

    This psychoacoustic study provides behavioral evidence that neural entrainment in the theta range (3-9 Hz) causally shapes speech perception. Adopting the ‘rate normalization’ paradigm (presenting compressed carrier sentences followed by uncompressed target words), we show that uniform compression of a speech carrier to syllable rates inside the theta range influences perception of subsequent uncompressed targets, but compression outside theta range does not. However, the influence of carriers – compressed outside theta range – on target perception is salvaged when carriers are ‘repackaged’ to have a packet rate inside theta. This suggests that the brain can only successfully entrain to syllable/packet rates within theta range, with a causal influence on the perception of subsequent speech, in line with recent neuroimaging data. Thus, this study points to a central role for sustained theta entrainment in rate normalization and contributes to our understanding of the functional role of brain oscillations in speech perception.
  • Bosker, H. R. (2018). Putting Laurel and Yanny in context. The Journal of the Acoustical Society of America, 144(6), EL503-EL508. doi:10.1121/1.5070144.

    Abstract

    Recently, the world’s attention was caught by an audio clip that was perceived as “Laurel” or “Yanny”. Opinions were sharply split: many could not believe others heard something different from their perception. However, a crowd-source experiment with >500 participants shows that it is possible to make people hear Laurel, where they previously heard Yanny, by manipulating preceding acoustic context. This study is not only the first to reveal within-listener variation in Laurel/Yanny percepts, but also to demonstrate contrast effects for global spectral information in larger frequency regions. Thus, it highlights the intricacies of human perception underlying these social media phenomena.
  • Bosker, H. R., & Cooke, M. (2018). Talkers produce more pronounced amplitude modulations when speaking in noise. The Journal of the Acoustical Society of America, 143(2), EL121-EL126. doi:10.1121/1.5024404.

    Abstract

    Speakers adjust their voice when talking in noise (known as Lombard speech), facilitating speech comprehension. Recent neurobiological models of speech perception emphasize the role of amplitude modulations in speech-in-noise comprehension, helping neural oscillators to ‘track’ the attended speech. This study tested whether talkers produce more pronounced amplitude modulations in noise. Across four different corpora, modulation spectra showed greater power in amplitude modulations below 4 Hz in Lombard speech compared to matching plain speech. This suggests that noise-induced speech contains more pronounced amplitude modulations, potentially helping the listening brain to entrain to the attended talker, aiding comprehension.
  • Kösem, A., Bosker, H. R., Takashima, A., Meyer, A. S., Jensen, O., & Hagoort, P. (2018). Neural entrainment determines the words we hear. Current Biology, 28, 2867-2875. doi:10.1016/j.cub.2018.07.023.

    Abstract

    Low-frequency neural entrainment to rhythmic input
    has been hypothesized as a canonical mechanism
    that shapes sensory perception in time. Neural
    entrainment is deemed particularly relevant for
    speech analysis, as it would contribute to the extraction
    of discrete linguistic elements from continuous
    acoustic signals. However, its causal influence in
    speech perception has been difficult to establish.
    Here, we provide evidence that oscillations build temporal
    predictions about the duration of speech tokens
    that affect perception. Using magnetoencephalography
    (MEG), we studied neural dynamics during
    listening to sentences that changed in speech rate.
    Weobserved neural entrainment to preceding speech
    rhythms persisting for several cycles after the change
    in rate. The sustained entrainment was associated
    with changes in the perceived duration of the last
    word’s vowel, resulting in the perception of words
    with different meanings. These findings support oscillatory
    models of speech processing, suggesting that
    neural oscillations actively shape speech perception.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2018). Listening to yourself is special: Evidence from global speech rate tracking. PLoS One, 13(9): e0203571. doi:10.1371/journal.pone.0203571.

    Abstract

    Listeners are known to use adjacent contextual speech rate in processing temporally ambiguous speech sounds. For instance, an ambiguous vowel between short /A/ and long /a:/ in Dutch sounds relatively long (i.e., as /a:/) embedded in a fast precursor sentence, but short in a slow sentence. Besides the local speech rate, listeners also track talker-specific global speech rates. However, it is yet unclear whether other talkers' global rates are encoded with reference to a listener's self-produced rate. Three experiments addressed this question. In Experiment 1, one group of participants was instructed to speak fast, whereas another group had to speak slowly. The groups were compared on their perception of ambiguous /A/-/a:/ vowels embedded in neutral rate speech from another talker. In Experiment 2, the same participants listened to playback of their own speech and again evaluated target vowels in neutral rate speech. Neither of these experiments provided support for the involvement of self-produced speech in perception of another talker's speech rate. Experiment 3 repeated Experiment 2 but with a new participant sample that was unfamiliar with the participants from Experiment 2. This experiment revealed fewer /a:/ responses in neutral speech in the group also listening to a fast rate, suggesting that neutral speech sounds slow in the presence of a fast talker and vice versa. Taken together, the findings show that self-produced speech is processed differently from speech produced by others. They carry implications for our understanding of the perceptual and cognitive mechanisms involved in rate-dependent speech perception in dialogue settings.
  • Van Bergen, G., & Bosker, H. R. (2018). Linguistic expectation management in online discourse processing: An investigation of Dutch inderdaad 'indeed' and eigenlijk 'actually'. Journal of Memory and Language, 103, 191-209. doi:10.1016/j.jml.2018.08.004.

    Abstract

    Interpersonal discourse particles (DPs), such as Dutch inderdaad (≈‘indeed’) and eigenlijk (≈‘actually’) are highly frequent in everyday conversational interaction. Despite extensive theoretical descriptions of their polyfunctionality, little is known about how they are used by language comprehenders. In two visual world eye-tracking experiments involving an online dialogue completion task, we asked to what extent inderdaad, confirming an inferred expectation, and eigenlijk, contrasting with an inferred expectation, influence real-time understanding of dialogues. Answers in the dialogues contained a DP or a control adverb, and a critical discourse referent was replaced by a beep; participants chose the most likely dialogue completion by clicking on one of four referents in a display. Results show that listeners make rapid and fine-grained situation-specific inferences about the use of DPs, modulating their expectations about how the dialogue will unfold. Findings further specify and constrain theories about the conversation-managing function and polyfunctionality of DPs.

Share this page