Judith Holler

Publications

  • Drijvers, L., & Holler, J. (2023). The multimodal facilitation effect in human communication. Psychonomic Bulletin & Review, 30(2), 792-801. doi:10.3758/s13423-022-02178-x.

    Abstract

    During face-to-face communication, recipients need to rapidly integrate a plethora of auditory and visual signals. This integration of signals from many different bodily articulators, all offset in time, with the information in the speech stream may either tax the cognitive system, thus slowing down language processing, or may result in multimodal facilitation. Using the classical shadowing paradigm, participants shadowed speech from face-to-face, naturalistic dyadic conversations in an audiovisual context, an audiovisual context without visual speech (e.g., lips), and an audio-only context. Our results provide evidence of a multimodal facilitation effect in human communication: participants were faster in shadowing words when seeing multimodal messages compared with when hearing only audio. Also, the more visual context was present, the fewer shadowing errors were made, and the earlier in time participants shadowed predicted lexical items. We propose that the multimodal facilitation effect may contribute to the ease of fast face-to-face conversational interaction.
  • Hamilton, A., & Holler, J. (Eds.). (2023). Face2face: Advancing the science of social interaction [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences. Retrieved from https://royalsocietypublishing.org/toc/rstb/2023/378/1875.

    Abstract

    Face-to-face interaction is fundamental to human sociality but is very complex to study in a scientific fashion. This theme issue brings together cutting-edge approaches to the study of face-to-face interaction and showcases how we can make progress in this area. Researchers are now studying interaction in adult conversation, parent-child relationships, neurodiverse groups, interactions with virtual agents and various animal species. The theme issue reveals how new paradigms are leading to more ecologically grounded and comprehensive insights into what social interaction is. Scientific advances in this area can lead to improvements in education and therapy, better understanding of neurodiversity and more engaging artificial agents.
  • Hamilton, A., & Holler, J. (2023). Face2face: Advancing the science of social interaction. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 378(1875): 20210470. doi:10.1098/rstb.2021.0470.

    Abstract

    Face-to-face interaction is core to human sociality and its evolution, and provides the environment in which most of human communication occurs. Research into the full complexities that define face-to-face interaction requires a multi-disciplinary, multi-level approach, illuminating from different perspectives how we and other species interact. This special issue showcases a wide range of approaches, bringing together detailed studies of naturalistic social-interactional behaviour with larger scale analyses for generalization, and investigations of socially contextualized cognitive and neural processes that underpin the behaviour we observe. We suggest that this integrative approach will allow us to propel forwards the science of face-to-face interaction by leading us to new paradigms and novel, more ecologically grounded and comprehensive insights into how we interact with one another and with artificial agents, how differences in psychological profiles might affect interaction, and how the capacity to socially interact develops and has evolved in the human and other species. This theme issue makes a first step in this direction, with the aim of breaking down disciplinary boundaries and emphasizing the value of illuminating the many facets of face-to-face interaction.
  • Hintz, F., Khoe, Y. H., Strauß, A., Psomakas, A. J. A., & Holler, J. (2023). Electrophysiological evidence for the enhancement of gesture-speech integration by linguistic predictability during multimodal discourse comprehension. Cognitive, Affective and Behavioral Neuroscience, 23, 340-353. doi:10.3758/s13415-023-01074-8.

    Abstract

    In face-to-face discourse, listeners exploit cues in the input to generate predictions about upcoming words. Moreover, in addition to speech, speakers produce a multitude of visual signals, such as iconic gestures, which listeners readily integrate with incoming words. Previous studies have shown that processing of target words is facilitated when these are embedded in predictable compared to non-predictable discourses and when accompanied by iconic compared to meaningless gestures. In the present study, we investigated the interaction of both factors. We recorded electroencephalogram from 60 Dutch adults while they were watching videos of an actress producing short discourses. The stimuli consisted of an introductory and a target sentence; the latter contained a target noun. Depending on the preceding discourse, the target noun was either predictable or not. Each target noun was paired with an iconic gesture and a gesture that did not convey meaning. In both conditions, gesture presentation in the video was timed such that the gesture stroke slightly preceded the onset of the spoken target by 130 ms. Our ERP analyses revealed independent facilitatory effects for predictable discourses and iconic gestures. However, the interactive effect of both factors demonstrated that target processing (i.e., gesture-speech integration) was facilitated most when targets were part of predictable discourses and accompanied by an iconic gesture. Our results thus suggest a strong intertwinement of linguistic predictability and non-verbal gesture processing where listeners exploit predictive discourse cues to pre-activate verbal and non-verbal representations of upcoming target words.
  • Kendrick, K. H., Holler, J., & Levinson, S. C. (2023). Turn-taking in human face-to-face interaction is multimodal: Gaze direction and manual gestures aid the coordination of turn transitions. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 378(1875): 20210473. doi:10.1098/rstb.2021.0473.

    Abstract

    Human communicative interaction is characterized by rapid and precise turn-taking. This is achieved by an intricate system that has been elucidated in the field of conversation analysis, based largely on the study of the auditory signal. This model suggests that transitions occur at points of possible completion identified in terms of linguistic units. Despite this, considerable evidence exists that visible bodily actions including gaze and gestures also play a role. To reconcile disparate models and observations in the literature, we combine qualitative and quantitative methods to analyse turn-taking in a corpus of multimodal interaction using eye-trackers and multiple cameras. We show that transitions seem to be inhibited when a speaker averts their gaze at a point of possible turn completion, or when a speaker produces gestures which are beginning or unfinished at such points. We further show that while the direction of a speaker's gaze does not affect the speed of transitions, the production of manual gestures does: turns with gestures have faster transitions. Our findings suggest that the coordination of transitions involves not only linguistic resources but also visual gestural ones and that the transition-relevance places in turns are multimodal in nature.

  • Mazzini, S., Holler, J., & Drijvers, L. (2023). Studying naturalistic human communication using dual-EEG and audio-visual recordings. STAR Protocols, 4(3): 102370. doi:10.1016/j.xpro.2023.102370.

    Abstract

    We present a protocol to study naturalistic human communication using dual-EEG and audio-visual recordings. We describe preparatory steps for data collection including setup preparation, experiment design, and piloting. We then describe the data collection process in detail which consists of participant recruitment, experiment room preparation, and data collection. We also outline the kinds of research questions that can be addressed with the current protocol, including several analysis possibilities, from conversational to advanced time-frequency analyses.
    For complete details on the use and execution of this protocol, please refer to Drijvers and Holler (2022).
  • Nota, N., Trujillo, J. P., & Holler, J. (2023). Specific facial signals associate with categories of social actions conveyed through questions. PLoS One, 18(7): e0288104. doi:10.1371/journal.pone.0288104.

    Abstract

    The early recognition of fundamental social actions, like questions, is crucial for understanding the speaker’s intended message and planning a timely response in conversation. Questions themselves may express more than one social action category (e.g., an information request “What time is it?”, an invitation “Will you come to my party?” or a criticism “Are you crazy?”). Although human language use occurs predominantly in a multimodal context, prior research on social actions has mainly focused on the verbal modality. This study breaks new ground by investigating how conversational facial signals may map onto the expression of different types of social actions conveyed through questions. The distribution, timing, and temporal organization of facial signals across social actions was analysed in a rich corpus of naturalistic, dyadic face-to-face Dutch conversations. These social actions were: Information Requests, Understanding Checks, Self-Directed questions, Stance or Sentiment questions, Other-Initiated Repairs, Active Participation questions, questions for Structuring, Initiating or Maintaining Conversation, and Plans and Actions questions. This is the first study to reveal differences in distribution and timing of facial signals across different types of social actions. The findings raise the possibility that facial signals may facilitate social action recognition during language processing in multimodal face-to-face interaction.

  • Nota, N., Trujillo, J. P., Jacobs, V., & Holler, J. (2023). Facilitating question identification through natural intensity eyebrow movements in virtual avatars. Scientific Reports, 13: 21295. doi:10.1038/s41598-023-48586-4.

    Abstract

    In conversation, recognizing social actions (similar to ‘speech acts’) early is important to quickly understand the speaker’s intended message and to provide a fast response. Fast turns are typical for fundamental social actions like questions, since a long gap can indicate a dispreferred response. In multimodal face-to-face interaction, visual signals may contribute to this fast dynamic. The face is an important source of visual signalling, and previous research found that prevalent facial signals such as eyebrow movements facilitate the rapid recognition of questions. We aimed to investigate whether early eyebrow movements with natural movement intensities facilitate question identification, and whether specific intensities are more helpful in detecting questions. Participants were instructed to view videos of avatars where the presence of eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) was manipulated, and to indicate whether the utterance in the video was a question or statement. Results showed higher accuracies for questions with eyebrow frowns, and faster response times for questions with eyebrow frowns and eyebrow raises. No additional effect was observed for the specific movement intensity. This suggests that eyebrow movements that are representative of naturalistic multimodal behaviour facilitate question recognition.
  • Nota, N., Trujillo, J. P., & Holler, J. (2023). Conversational eyebrow frowns facilitate question identification: An online study using virtual avatars. Cognitive Science, 47(12): e13392. doi:10.1111/cogs.13392.

    Abstract

    Conversation is a time-pressured environment. Recognizing a social action (the “speech act,” such as a question requesting information) early is crucial in conversation to quickly understand the intended message and plan a timely response. Fast turns between interlocutors are especially relevant for responses to questions since a long gap may be meaningful by itself. Human language is multimodal, involving speech as well as visual signals from the body, including the face. But little is known about how conversational facial signals contribute to the communication of social actions. Some of the most prominent facial signals in conversation are eyebrow movements. Previous studies found links between eyebrow movements and questions, suggesting that these facial signals could contribute to the rapid recognition of questions. Therefore, we aimed to investigate whether early eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) facilitate question identification. Participants were instructed to view videos of avatars where the presence of eyebrow movements accompanying questions was manipulated. Their task was to indicate whether the utterance was a question or a statement as accurately and quickly as possible. Data were collected using the online testing platform Gorilla. Results showed higher accuracies and faster response times for questions with eyebrow frowns, suggesting a facilitative role of eyebrow frowns for question identification. This means that facial signals can critically contribute to the communication of social actions in conversation by signaling social action-specific visual information and providing visual cues to speakers’ intentions.

  • Trujillo, J. P., & Holler, J. (2023). Interactionally embedded gestalt principles of multimodal human communication. Perspectives on Psychological Science, 18(5), 1136-1159. doi:10.1177/17456916221141422.

    Abstract

    Natural human interaction requires us to produce and process many different signals, including speech, hand and head gestures, and facial expressions. These communicative signals, which occur in a variety of temporal relations with each other (e.g., parallel or temporally misaligned), must be rapidly processed as a coherent message by the receiver. In this contribution, we introduce the notion of interactionally embedded, affordance-driven gestalt perception as a framework that can explain how this rapid processing of multimodal signals is achieved as efficiently as it is. We discuss empirical evidence showing how basic principles of gestalt perception can explain some aspects of unimodal phenomena such as verbal language processing and visual scene perception but require additional features to explain multimodal human communication. We propose a framework in which high-level gestalt predictions are continuously updated by incoming sensory input, such as unfolding speech and visual signals. We outline the constituent processes that shape high-level gestalt perception and their role in perceiving relevance and prägnanz. Finally, we provide testable predictions that arise from this multimodal interactionally embedded gestalt-perception framework. This review and framework therefore provide a theoretically motivated account of how we may understand the highly complex, multimodal behaviors inherent in natural social interaction.
  • Connell, L., Cai, Z. G., & Holler, J. (2012). Do you see what I'm singing? Visuospatial movement biases pitch perception. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 252-257). Austin, TX: Cognitive Science Society.

    Abstract

    The nature of the connection between musical and spatial processing is controversial. While pitch may be described in spatial terms such as “high” or “low”, it is unclear whether pitch and space are associated but separate dimensions or whether they share representational and processing resources. In the present study, we asked participants to judge whether a target vocal note was the same as (or different from) a preceding cue note. Importantly, target trials were presented as video clips where a singer sometimes gestured upward or downward while singing that target note, thus providing an alternative, concurrent source of spatial information. Our results show that pitch discrimination was significantly biased by the spatial movement in gesture. These effects were eliminated by spatial memory load but preserved under verbal memory load conditions. Together, our findings suggest that pitch and space have a shared representation such that the mental representation of pitch is audiospatial in nature.
  • Holler, J., Kelly, S., Hagoort, P., & Ozyurek, A. (2012). When gestures catch the eye: The influence of gaze direction on co-speech gesture comprehension in triadic communication. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 467-472). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2012/papers/0092/index.html.

    Abstract

    Co-speech gestures are an integral part of human face-to-face communication, but little is known about how pragmatic factors influence our comprehension of those gestures. The present study investigates how different types of recipients process iconic gestures in a triadic communicative situation. Participants (N = 32) took on the role of one of two recipients in a triad and were presented with 160 video clips of an actor speaking, or speaking and gesturing. Crucially, the actor’s eye gaze was manipulated in that she alternated her gaze between the two recipients. Participants thus perceived some messages in the role of addressed recipient and some in the role of unaddressed recipient. In these roles, participants were asked to make judgements concerning the speaker’s messages. Their reaction times showed that unaddressed recipients did comprehend speaker’s gestures differently to addressees. The findings are discussed with respect to automatic and controlled processes involved in gesture comprehension.
  • Kelly, S., Healey, M., Ozyurek, A., & Holler, J. (2012). The communicative influence of gesture and action during speech comprehension: Gestures have the upper hand [Abstract]. Abstracts of the Acoustics 2012 Hong Kong conference published in The Journal of the Acoustical Society of America, 131, 3311. doi:10.1121/1.4708385.

    Abstract

    Hand gestures combine with speech to form a single integrated system of meaning during language comprehension (Kelly et al., 2010). However, it is unknown whether gesture is uniquely integrated with speech or is processed like any other manual action. Thirty-one participants watched videos presenting speech with gestures or manual actions on objects. The relationship between the speech and gesture/action was either complementary (e.g., “He found the answer,” while producing a calculating gesture vs. actually using a calculator) or incongruent (e.g., the same sentence paired with the incongruent gesture/action of stirring with a spoon). Participants watched the video (prime) and then responded to a written word (target) that was or was not spoken in the video prime (e.g., “found” or “cut”). ERPs were taken to the primes (time-locked to the spoken verb, e.g., “found”) and the written targets. For primes, there was a larger frontal N400 (semantic processing) to incongruent vs. congruent items for the gesture, but not action, condition. For targets, the P2 (phonemic processing) was smaller for target words following congruent vs. incongruent gesture, but not action, primes. These findings suggest that hand gestures are integrated with speech in a privileged fashion compared to manual actions on objects.
  • Rowbotham, S., Holler, J., Lloyd, D., & Wearden, A. (2012). How do we communicate about pain? A systematic analysis of the semantic contribution of co-speech gestures in pain-focused conversations. Journal of Nonverbal Behavior, 36, 1-21. doi:10.1007/s10919-011-0122-5.

    Abstract

    The purpose of the present study was to investigate co-speech gesture use during communication about pain. Speakers described a recent pain experience and the data were analyzed using a ‘semantic feature approach’ to determine the distribution of information across gesture and speech. This analysis revealed that a considerable proportion of pain-focused talk was accompanied by gestures, and that these gestures often contained more information about pain than speech itself. Further, some gestures represented information that was hardly represented in speech at all. Overall, these results suggest that gestures are integral to the communication of pain and need to be attended to if recipients are to obtain a fuller understanding of the pain experience and provide help and support to pain sufferers.
  • Holler, J., Shovelton, H., & Beattie, G. (2009). Do iconic gestures really contribute to the semantic information communicated in face-to-face interaction? Journal of Nonverbal Behavior, 33, 73-88.
  • Holler, J., & Wilkin, K. (2009). Communicating common ground: how mutually shared knowledge influences the representation of semantic information in speech and gesture in a narrative task. Language and Cognitive Processes, 24, 267-289.
  • Kidd, E., & Holler, J. (2009). Children’s use of gesture to resolve lexical ambiguity. Developmental Science, 12, 903-913.
