Visible speech enhanced: What do iconic gestures and lip movements contribute to degraded speech comprehension?

Drijvers, L., & Özyürek, A. (2016). Talk presented at the 7th Conference of the International Society for Gesture Studies (ISGS7), Paris, France, 18-22 July 2016.
Natural, face-to-face communication consists of an audiovisual binding that integrates speech and visual
information, such as iconic co-speech gestures and lip movements. Especially in adverse listening conditions
such as in noise, this visual information can enhance speech comprehension. However, the contribution of lip
movements and iconic gestures to understanding speech in noise has been mostly studied separately. Here, we
investigated the contribution of iconic gestures and lip movements to degraded speech comprehension in a joint
context. In a free-recall task, participants watched short videos of an actress uttering an action verb. The verb
was presented in clear speech, severely degraded speech (2-band noise-vocoding), or moderately degraded
speech (6-band noise-vocoding), and participants viewed the actress with her lips blocked, with her lips visible, or with
her lips visible while making an iconic co-speech gesture. Additionally, we presented these clips without audio
and with just the lip movements present, or with just lip movements and gestures present, to investigate how
much information listeners could get from visual input alone. Our results reveal that when listeners perceive
degraded speech in a visual context, they benefit more from gestural information than from lip
movements alone. This benefit is larger at moderate noise levels, where auditory cues are still moderately
reliable, than at severe noise levels, where auditory cues are no longer reliable. As a result, listeners
are only able to benefit from this additive, ‘double’ multimodal enhancement of iconic gestures and lip
movements when enough auditory cues are present to map the lip movements to the phonological
information in the speech signal.