Publications

Displaying 1 - 7 of 7

Bujok, R., Meyer, A. S., & Bosker, H. R. (2025). Audiovisual perception of lexical stress: Beat gestures and articulatory cues. Language and Speech, 68(1), 181-203. doi:10.1177/00238309241258162.

DOI

Full Text

Abstract
Human communication is inherently multimodal. Auditory speech, but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech like lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.

Permanent link to publication record
Ye, C., McQueen, J. M., & Bosker, H. R. (2025). Effect of auditory cues to lexical stress on the visual perception of gestural timing. Attention, Perception & Psychophysics. Advance online publication. doi:10.3758/s13414-025-03072-z.

DOI

Full Text

Abstract
Speech is often accompanied by gestures. Since beat gestures—simple nonreferential up-and-down hand movements—frequently co-occur with prosodic prominence, they can indicate stress in a word and hence influence spoken-word recognition. However, little is known about the reverse influence of auditory speech on visual perception. The current study investigated whether lexical stress has an effect on the perceived timing of hand beats. We used videos in which a disyllabic word, embedded in a carrier sentence (Experiment 1) or in isolation (Experiment 2), was coupled with an up-and-down hand beat, while varying their degrees of asynchrony. Results from Experiment 1, a novel beat timing estimation task, revealed that gestures were estimated to occur closer in time to the pitch peak in a stressed syllable than their actual timing, hence reducing the perceived temporal distance between gestures and stress by around 60%. Using a forced-choice task, Experiment 2 further demonstrated that listeners tended to perceive a gesture, falling midway between two syllables, on the syllable receiving stronger cues to stress than the other, and this auditory effect was greater when gestural timing was most ambiguous. Our findings suggest that f0 and intensity are the driving force behind the temporal attraction effect of stress on perceived gestural timing. This study provides new evidence for auditory influences on visual perception, supporting bidirectionality in audiovisual interaction between speech-related signals that occur in everyday face-to-face communication.

Permanent link to publication record
Rohrer, P. L., Bujok, R., Van Maastricht, L., & Bosker, H. R. (2025). From “I dance” to “she danced” with a flick of the hands: Audiovisual stress perception in Spanish. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-025-02683-9.

DOI

Full Text

Abstract
When talking, speakers naturally produce hand movements (co-speech gestures) that contribute to communication. Evidence in Dutch suggests that the timing of simple up-and-down, non-referential “beat” gestures influences spoken word recognition: the same auditory stimulus was perceived as CONtent (noun, capitalized letters indicate stressed syllables) when a beat gesture occurred on the first syllable, but as conTENT (adjective) when the gesture occurred on the second syllable. However, these findings were based on a small number of minimal pairs in Dutch, limiting the generalizability of the findings. We therefore tested this effect in Spanish, where lexical stress is highly relevant in the verb conjugation system, distinguishing bailo, “I dance” with word-initial stress from bailó, “she danced” with word-final stress. Testing a larger sample (N = 100), we also assessed whether individual differences in working memory capacity modulated how much individuals relied on the gestures in spoken word recognition. The results showed that, similar to Dutch, Spanish participants were biased to perceive lexical stress on the syllable that visually co-occurred with a beat gesture, with the effect being strongest when the acoustic stress cues were most ambiguous. No evidence was found for by-participant effect sizes to be influenced by individual differences in phonological or visuospatial working memory. These findings reveal gestural-speech coordination impacts lexical stress perception in a language where listeners are regularly confronted with such lexical stress contrasts, highlighting the impact of gestures’ timing on prominence perception and spoken word recognition.

Permanent link to publication record
Bosker, H. R. (2022). Evidence for selective adaptation and recalibration in the perception of lexical stress. Language and Speech, 65(2), 472-490. doi:10.1177/00238309211030307.

DOI

Full Text

Abstract
Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words, not encountered during exposure. Together, the experiments demonstrate that listeners also flexibly adapt to variability in the suprasegmental properties of speech, thus expanding our understanding of the utility of listener adaptation in speech perception. Moreover, the combined outcomes speak for an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.

Additional information
supplementary information experimental data

Permanent link to publication record
Bujok, R., Meyer, A. S., & Bosker, H. R. (2022). Visible lexical stress cues on the face do not influence audiovisual speech perception. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 259-263). doi:10.21437/SpeechProsody.2022-53.

DOI

Full Text

Abstract
Producing lexical stress leads to visible changes on the face, such as longer duration and greater size of the opening of the mouth. Research suggests that these visual cues alone can inform participants about which syllable carries stress (i.e., lip-reading silent videos). This study aims to determine the influence of visual articulatory cues on lexical stress perception in more naturalistic audiovisual settings. Participants were presented with seven disyllabic, Dutch minimal stress pairs (e.g., VOORnaam [first name] & voorNAAM [respectable]) in audio-only (phonetic lexical stress continua without video), video-only (lip-reading silent videos), and audiovisual trials (e.g., phonetic lexical stress continua with video of talker saying VOORnaam or voorNAAM). Categorization data from video-only trials revealed that participants could distinguish the minimal pairs above chance from seeing the silent videos alone. However, responses in the audiovisual condition did not differ from the audio-only condition. We thus conclude that visual lexical stress information on the face, while clearly perceivable, does not play a major role in audiovisual speech perception. This study demonstrates that clear unimodal effects do not always generalize to more naturalistic multimodal communication, advocating that speech prosody is best considered in multimodal settings.

Permanent link to publication record
Reinisch, E., & Bosker, H. R. (2022). Encoding speech rate in challenging listening conditions: White noise and reverberation. Attention, Perception & Psychophysics, 84, 2303 -2318. doi:10.3758/s13414-022-02554-8.

DOI

Full Text

Abstract
Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context
sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often
referred to as “rate-dependent speech perception,” has been suggested to be the result of a robust, low-level perceptual process,
typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening condi-
tions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to
a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting
temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the
context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in
Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which
compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was
found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations,
eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study
contributes towards understanding the consequences of different types of listening environments on the functioning of low-
level perceptual processes that listeners use during speech perception.

Additional information
Data availability

Permanent link to publication record
Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

DOI

Full Text

Abstract
The present study examined two acoustic cues in the production
of lexical stress in Dutch: spectral tilt and overall intensity.
Sluijter and Van Heuven (1996) reported that spectral tilt is a
more reliable cue to stress than intensity. However, that study
included only a small number of talkers (10) and only syllables
with the vowels /aː/ and /ɔ/.
The present study re-examined this issue in a larger and
more variable dataset. We recorded 38 native speakers of Dutch
(20 females) producing 744 tokens of Dutch segmentally
overlapping words (e.g., VOORnaam vs. voorNAAM, “first
name” vs. “respectable”), targeting 10 different vowels, in
variable sentence contexts. For each syllable, we measured
overall intensity and spectral tilt following Sluijter and Van
Heuven (1996).
Results from Linear Discriminant Analyses showed that,
for the vowel /aː/ alone, spectral tilt showed an advantage over
intensity, as evidenced by higher stressed/unstressed syllable
classification accuracy scores for spectral tilt. However, when
all vowels were included in the analysis, the advantage
disappeared.
These findings confirm that spectral tilt plays a larger role
in signaling stress in Dutch /aː/ but show that, for a larger
sample of Dutch vowels, overall intensity and spectral tilt are
equally important.

Permanent link to publication record

Publications

Abstract

Abstract

Abstract

Abstract

Additional information

Abstract

Abstract

Additional information

Abstract

Contact

Follow us

Breadcrumb

Publications

Abstract

Abstract

Abstract

Abstract

Additional information

Abstract

Abstract

Additional information

Abstract

Share this page