What we see shapes what we hear

27 January 2021
People often move their hands up and down to ‘highlight’ what they are saying. Are such ‘beat gestures’ important for communication? Hans Rutger Bosker from the Max Planck Institute for Psycholinguistics and David Peeters from Tilburg University created words with an ambiguous stress pattern and asked listeners what they heard (DIScount or disCOUNT?). The beat gestures people saw influenced what they heard, showing that listeners quickly integrate verbal and visual information during speech recognition.

When politicians address an audience, they typically highlight important words with beat gestures, for example by moving their hands up and down. In fact, we all seem to do it: Such ‘flicks of the hands’ are among the most common gestures in everyday conversations. People align these gestures very precisely to the prominent words in speech. But do beat gestures help listeners to understand the speaker? Hans Rutger Bosker (Max Planck Institute for Psycholinguistics and Radboud University) and David Peeters (Tilburg University) tested whether what we see shapes what we hear.

“In face-to-face communication, language entails much more than just speech”, explains senior investigator Hans Rutger Bosker. “Speakers make use of different channels (mouth, hands, and face) to get a message across. We want to understand how listeners make use of these different streams of information when they are listening to someone.” In a well-known illusion called the ‘McGurk effect’, people hear a sound (like the ‘b’ in ‘ba’) as a different sound (for instance ‘pa’ or ‘fa’), depending on the lip movements they see. But is there also a manual McGurk effect? Does what we hear depend on the gestures we see?

Plato or plateau?

To investigate this question, the researchers chose a set of Dutch words that differed only in stress pattern. For instance, the word “PLAto”, with stress on the first syllable, refers to the philosopher from ancient Greece. However, “plaTEAU”, pronounced with stress on the second syllable, refers to a plateau. Participants watched a video of Bosker producing the words (with ambiguous stress) while making beat gestures (“Now I say the word ... plato”). Participants then had to decide which word they heard (PLAto or plaTEAU?). Would it matter whether beat gestures occurred at the first or second syllable?

Listeners were more likely to hear stress on a syllable if there was a beat gesture on that syllable. This ‘manual McGurk effect’ occurred for both words and non-words (“BAAGpif” or “baagPIF”?). Even more surprisingly, beat gestures influenced what vowel people heard (long or short ‘a’ in ‘baagpif / bagpif’), as vowel length is typically associated with the stress pattern of a word.

“Listeners listen not only with their ears, but also with their eyes”, says Bosker. “These findings are the first to show that beat gestures influence which speech sounds you hear”. Bosker and Peeters think that the effect of beat gestures may be even bigger in real life, when speech is less clear than in the lab. In noisy listening conditions, visual beat gestures might be even more important for successful communication. “So wash your hands, and use them”, Bosker adds jokingly.

“Our findings also have the potential to enrich human-computer interaction and improve multimodal speech recognition systems. It seems clear that such systems should take into account more than just speech”, Bosker concludes. “We will follow up on the study by using virtual reality to test how specific these effects are: are they induced by beat gestures only or also by other types of communicative cues, such as head nods and eyebrow movements?”