Displaying 1 - 15 of 15
-
Drijvers, L., & Holler, J. (2023). The multimodal facilitation effect in human communication. Psychonomic Bulletin & Review, 30(2), 792-801. doi:10.3758/s13423-022-02178-x.
Abstract
During face-to-face communication, recipients need to rapidly integrate a plethora of auditory and visual signals. This integration of signals from many different bodily articulators, all offset in time, with the information in the speech stream may either tax the cognitive system, thus slowing down language processing, or may result in multimodal facilitation. Using the classical shadowing paradigm, participants shadowed speech from face-to-face, naturalistic dyadic conversations in an audiovisual context, an audiovisual context without visual speech (e.g., lips), and an audio-only context. Our results provide evidence of a multimodal facilitation effect in human communication: participants were faster in shadowing words when seeing multimodal messages compared with when hearing only audio. Also, the more visual context was present, the fewer shadowing errors were made, and the earlier in time participants shadowed predicted lexical items. We propose that the multimodal facilitation effect may contribute to the ease of fast face-to-face conversational interaction. -
Hamilton, A., & Holler, J. (
Eds. ). (2023). Face2face: Advancing the science of social interaction [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences. Retrieved from https://royalsocietypublishing.org/toc/rstb/2023/378/1875.Abstract
Face to face interaction is fundamental to human sociality but is very complex to study in a scientific fashion. This theme issue brings together cutting-edge approaches to the study of face-to-face interaction and showcases how we can make progress in this area. Researchers are now studying interaction in adult conversation, parent-child relationships, neurodiverse groups, interactions with virtual agents and various animal species. The theme issue reveals how new paradigms are leading to more ecologically grounded and comprehensive insights into what social interaction is. Scientific advances in this area can lead to improvements in education and therapy, better understanding of neurodiversity and more engaging artificial agents -
Hamilton, A., & Holler, J. (2023). Face2face: Advancing the science of social interaction. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 378(1875): 20210470. doi:10.1098/rstb.2021.0470.
Abstract
Face-to-face interaction is core to human sociality and its evolution, and provides the environment in which most of human communication occurs. Research into the full complexities that define face-to-face interaction requires a multi-disciplinary, multi-level approach, illuminating from different perspectives how we and other species interact. This special issue showcases a wide range of approaches, bringing together detailed studies of naturalistic social-interactional behaviour with larger scale analyses for generalization, and investigations of socially contextualized cognitive and neural processes that underpin the behaviour we observe. We suggest that this integrative approach will allow us to propel forwards the science of face-to-face interaction by leading us to new paradigms and novel, more ecologically grounded and comprehensive insights into how we interact with one another and with artificial agents, how differences in psychological profiles might affect interaction, and how the capacity to socially interact develops and has evolved in the human and other species. This theme issue makes a first step into this direction, with the aim to break down disciplinary boundaries and emphasizing the value of illuminating the many facets of face-to-face interaction. -
Hintz, F., Khoe, Y. H., Strauß, A., Psomakas, A. J. A., & Holler, J. (2023). Electrophysiological evidence for the enhancement of gesture-speech integration by linguistic predictability during multimodal discourse comprehension. Cognitive, Affective and Behavioral Neuroscience, 23, 340-353. doi:10.3758/s13415-023-01074-8.
Abstract
In face-to-face discourse, listeners exploit cues in the input to generate predictions about upcoming words. Moreover, in addition to speech, speakers produce a multitude of visual signals, such as iconic gestures, which listeners readily integrate with incoming words. Previous studies have shown that processing of target words is facilitated when these are embedded in predictable compared to non-predictable discourses and when accompanied by iconic compared to meaningless gestures. In the present study, we investigated the interaction of both factors. We recorded electroencephalogram from 60 Dutch adults while they were watching videos of an actress producing short discourses. The stimuli consisted of an introductory and a target sentence; the latter contained a target noun. Depending on the preceding discourse, the target noun was either predictable or not. Each target noun was paired with an iconic gesture and a gesture that did not convey meaning. In both conditions, gesture presentation in the video was timed such that the gesture stroke slightly preceded the onset of the spoken target by 130 ms. Our ERP analyses revealed independent facilitatory effects for predictable discourses and iconic gestures. However, the interactive effect of both factors demonstrated that target processing (i.e., gesture-speech integration) was facilitated most when targets were part of predictable discourses and accompanied by an iconic gesture. Our results thus suggest a strong intertwinement of linguistic predictability and non-verbal gesture processing where listeners exploit predictive discourse cues to pre-activate verbal and non-verbal representations of upcoming target words. -
Kendrick, K. H., Holler, J., & Levinson, S. C. (2023). Turn-taking in human face-to-face interaction is multimodal: Gaze direction and manual gestures aid the coordination of turn transitions. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 378(1875): 20210473. doi:10.1098/rstb.2021.0473.
Abstract
Human communicative interaction is characterized by rapid and precise turn-taking. This is achieved by an intricate system that has been elucidated in the field of conversation analysis, based largely on the study of the auditory signal. This model suggests that transitions occur at points of possible completion identified in terms of linguistic units. Despite this, considerable evidence exists that visible bodily actions including gaze and gestures also play a role. To reconcile disparate models and observations in the literature, we combine qualitative and quantitative methods to analyse turn-taking in a corpus of multimodal interaction using eye-trackers and multiple cameras. We show that transitions seem to be inhibited when a speaker averts their gaze at a point of possible turn completion, or when a speaker produces gestures which are beginning or unfinished at such points. We further show that while the direction of a speaker's gaze does not affect the speed of transitions, the production of manual gestures does: turns with gestures have faster transitions. Our findings suggest that the coordination of transitions involves not only linguistic resources but also visual gestural ones and that the transition-relevance places in turns are multimodal in nature.Additional information
supplemental material -
Mazzini, S., Holler, J., & Drijvers, L. (2023). Studying naturalistic human communication using dual-EEG and audio-visual recordings. STAR Protocols, 4(3): 102370. doi:10.1016/j.xpro.2023.102370.
Abstract
We present a protocol to study naturalistic human communication using dual-EEG and audio-visual recordings. We describe preparatory steps for data collection including setup preparation, experiment design, and piloting. We then describe the data collection process in detail which consists of participant recruitment, experiment room preparation, and data collection. We also outline the kinds of research questions that can be addressed with the current protocol, including several analysis possibilities, from conversational to advanced time-frequency analyses.
For complete details on the use and execution of this protocol, please refer to Drijvers and Holler (2022).Additional information
additionally, exemplary data and scripts -
Nota, N., Trujillo, J. P., & Holler, J. (2023). Specific facial signals associate with categories of social actions conveyed through questions. PLoS One, 18(7): e0288104. doi:10.1371/journal.pone.0288104.
Abstract
The early recognition of fundamental social actions, like questions, is crucial for understanding the speaker’s intended message and planning a timely response in conversation. Questions themselves may express more than one social action category (e.g., an information request “What time is it?”, an invitation “Will you come to my party?” or a criticism “Are you crazy?”). Although human language use occurs predominantly in a multimodal context, prior research on social actions has mainly focused on the verbal modality. This study breaks new ground by investigating how conversational facial signals may map onto the expression of different types of social actions conveyed through questions. The distribution, timing, and temporal organization of facial signals across social actions was analysed in a rich corpus of naturalistic, dyadic face-to-face Dutch conversations. These social actions were: Information Requests, Understanding Checks, Self-Directed questions, Stance or Sentiment questions, Other-Initiated Repairs, Active Participation questions, questions for Structuring, Initiating or Maintaining Conversation, and Plans and Actions questions. This is the first study to reveal differences in distribution and timing of facial signals across different types of social actions. The findings raise the possibility that facial signals may facilitate social action recognition during language processing in multimodal face-to-face interaction.Additional information
supporting information -
Nota, N., Trujillo, J. P., Jacobs, V., & Holler, J. (2023). Facilitating question identification through natural intensity eyebrow movements in virtual avatars. Scientific Reports, 13: 21295. doi:10.1038/s41598-023-48586-4.
Abstract
In conversation, recognizing social actions (similar to ‘speech acts’) early is important to quickly understand the speaker’s intended message and to provide a fast response. Fast turns are typical for fundamental social actions like questions, since a long gap can indicate a dispreferred response. In multimodal face-to-face interaction, visual signals may contribute to this fast dynamic. The face is an important source of visual signalling, and previous research found that prevalent facial signals such as eyebrow movements facilitate the rapid recognition of questions. We aimed to investigate whether early eyebrow movements with natural movement intensities facilitate question identification, and whether specific intensities are more helpful in detecting questions. Participants were instructed to view videos of avatars where the presence of eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) was manipulated, and to indicate whether the utterance in the video was a question or statement. Results showed higher accuracies for questions with eyebrow frowns, and faster response times for questions with eyebrow frowns and eyebrow raises. No additional effect was observed for the specific movement intensity. This suggests that eyebrow movements that are representative of naturalistic multimodal behaviour facilitate question recognition. -
Nota, N., Trujillo, J. P., & Holler, J. (2023). Conversational eyebrow frowns facilitate question identification: An online study using virtual avatars. Cognitive Science, 47(12): e13392. doi:10.1111/cogs.13392.
Abstract
Conversation is a time-pressured environment. Recognizing a social action (the ‘‘speech act,’’ such as a question requesting information) early is crucial in conversation to quickly understand the intended message and plan a timely response. Fast turns between interlocutors are especially relevant for responses to questions since a long gap may be meaningful by itself. Human language is multimodal, involving speech as well as visual signals from the body, including the face. But little is known about how conversational facial signals contribute to the communication of social actions. Some of the most prominent facial signals in conversation are eyebrow movements. Previous studies found links between eyebrow movements and questions, suggesting that these facial signals could contribute to the rapid recognition of questions. Therefore, we aimed to investigate whether early eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) facilitate question identification. Participants were instructed to view videos of avatars where the presence of eyebrow movements accompanying questions was manipulated. Their task was to indicate whether the utterance was a question or a statement as accurately and quickly as possible. Data were collected using the online testing platform Gorilla. Results showed higher accuracies and faster response times for questions with eyebrow frowns, suggesting a facilitative role of eyebrow frowns for question identification. This means that facial signals can critically contribute to the communication of social actions in conversation by signaling social action-specific visual information and providing visual cues to speakers’ intentions.Additional information
link to preprint -
Trujillo, J. P., & Holler, J. (2023). Interactionally embedded gestalt principles of multimodal human communication. Perspectives on Psychological Science, 18(5), 1136-1159. doi:10.1177/17456916221141422.
Abstract
Natural human interaction requires us to produce and process many different signals, including speech, hand and head gestures, and facial expressions. These communicative signals, which occur in a variety of temporal relations with each other (e.g., parallel or temporally misaligned), must be rapidly processed as a coherent message by the receiver. In this contribution, we introduce the notion of interactionally embedded, affordance-driven gestalt perception as a framework that can explain how this rapid processing of multimodal signals is achieved as efficiently as it is. We discuss empirical evidence showing how basic principles of gestalt perception can explain some aspects of unimodal phenomena such as verbal language processing and visual scene perception but require additional features to explain multimodal human communication. We propose a framework in which high-level gestalt predictions are continuously updated by incoming sensory input, such as unfolding speech and visual signals. We outline the constituent processes that shape high-level gestalt perception and their role in perceiving relevance and prägnanz. Finally, we provide testable predictions that arise from this multimodal interactionally embedded gestalt-perception framework. This review and framework therefore provide a theoretically motivated account of how we may understand the highly complex, multimodal behaviors inherent in natural social interaction. -
Holler, J., Kendrick, K. H., Casillas, M., & Levinson, S. C. (
Eds. ). (2016). Turn-Taking in Human Communicative Interaction. Lausanne: Frontiers Media. doi:10.3389/978-2-88919-825-2.Abstract
The core use of language is in face-to-face conversation. This is characterized by rapid turn-taking. This turn-taking poses a number central puzzles for the psychology of language.
Consider, for example, that in large corpora the gap between turns is on the order of 100 to 300 ms, but the latencies involved in language production require minimally between 600ms (for a single word) or 1500 ms (for as simple sentence). This implies that participants in conversation are predicting the ends of the incoming turn and preparing in advance. But how is this done? What aspects of this prediction are done when? What happens when the prediction is wrong? What stops participants coming in too early? If the system is running on prediction, why is there consistently a mode of 100 to 300 ms in response time?
The timing puzzle raises further puzzles: it seems that comprehension must run parallel with the preparation for production, but it has been presumed that there are strict cognitive limitations on more than one central process running at a time. How is this bottleneck overcome? Far from being 'easy' as some psychologists have suggested, conversation may be one of the most demanding cognitive tasks in our everyday lives. Further questions naturally arise: how do children learn to master this demanding task, and what is the developmental trajectory in this domain?
Research shows that aspects of turn-taking such as its timing are remarkably stable across languages and cultures, but the word order of languages varies enormously. How then does prediction of the incoming turn work when the verb (often the informational nugget in a clause) is at the end? Conversely, how can production work fast enough in languages that have the verb at the beginning, thereby requiring early planning of the whole clause? What happens when one changes modality, as in sign languages -- with the loss of channel constraints is turn-taking much freer? And what about face-to-face communication amongst hearing individuals -- do gestures, gaze, and other body behaviors facilitate turn-taking? One can also ask the phylogenetic question: how did such a system evolve? There seem to be parallels (analogies) in duetting bird species, and in a variety of monkey species, but there is little evidence of anything like this among the great apes.
All this constitutes a neglected set of problems at the heart of the psychology of language and of the language sciences. This research topic welcomes contributions from right across the board, for example from psycholinguists, developmental psychologists, students of dialogue and conversation analysis, linguists interested in the use of language, phoneticians, corpus analysts and comparative ethologists or psychologists. We welcome contributions of all sorts, for example original research papers, opinion pieces, and reviews of work in subfields that may not be fully understood in other subfields. -
Humphries, S., Holler, J., Crawford, T. J., Herrera, E., & Poliakoff, E. (2016). A third-person perspective on co-speech action gestures in Parkinson’s disease. Cortex, 78, 44-54. doi:10.1016/j.cortex.2016.02.009.
Abstract
A combination of impaired motor and cognitive function in Parkinson’s disease (PD) can impact on language and communication, with patients exhibiting a particular difficulty processing action verbs. Co-speech gestures embody a link between action and language and contribute significantly to communication in healthy people. Here, we investigated how co-speech gestures depicting actions are affected in PD, in particular with respect to the visual perspective—or the viewpoint – they depict. Gestures are closely related to mental imagery and motor simulations, but people with PD may be impaired in the way they simulate actions from a first-person perspective and may compensate for this by relying more on third-person visual features. We analysed the action-depicting gestures produced by mild-moderate PD patients and age-matched controls on an action description task and examined the relationship between gesture viewpoint, action naming, and performance on an action observation task (weight judgement). Healthy controls produced the majority of their action gestures from a first-person perspective, whereas PD patients produced a greater proportion of gestures produced from a third-person perspective. We propose that this reflects a compensatory reliance on third-person visual features in the simulation of actions in PD. Performance was also impaired in action naming and weight judgement, although this was unrelated to gesture viewpoint. Our findings provide a more comprehensive understanding of how action-language impairments in PD impact on action communication, on the cognitive underpinnings of this impairment, as well as elucidating the role of action simulation in gesture production -
Rowbotham, S. J., Holler, J., Wearden, A., & Lloyd, D. M. (2016). I see how you feel: Recipients obtain additional information from speakers’ gestures about pain. Patient Education and Counseling, 99(8), 1333-1342. doi:10.1016/j.pec.2016.03.007.
Abstract
Objective
Despite the need for effective pain communication, pain is difficult to verbalise. Co-speech gestures frequently add information about pain that is not contained in the accompanying speech. We explored whether recipients can obtain additional information from gestures about the pain that is being described.
Methods
Participants (n = 135) viewed clips of pain descriptions under one of four conditions: 1) Speech Only; 2) Speech and Gesture; 3) Speech, Gesture and Face; and 4) Speech, Gesture and Face plus Instruction (short presentation explaining the pain information that gestures can depict). Participants provided free-text descriptions of the pain that had been described. Responses were scored for the amount of information obtained from the original clips.
Findings
Participants in the Instruction condition obtained the most information, while those in the Speech Only condition obtained the least (all comparisons p<.001).
Conclusions
Gestures produced during pain descriptions provide additional information about pain that recipients are able to pick up without detriment to their uptake of spoken information.
Practice implications
Healthcare professionals may benefit from instruction in gestures to enhance uptake of information about patients’ pain experiences. -
Holler, J., & Beattie, G. (2003). How iconic gestures and speech interact in the representation of meaning: are both aspects really integral to the process? Semiotica, 146, 81-116.
-
Holler, J., & Beattie, G. (2003). Pragmatic aspects of representational gestures: Do speakers use them to clarify verbal ambiguity for the listener? Gesture, 3, 127-154.
Share this page