Judith Holler

Publications

  • Emmendorfer, A. K., & Holler, J. (2025). Facial signals shape predictions about the nature of upcoming conversational responses. Scientific Reports, 15: 1381. doi:10.1038/s41598-025-85192-y.

    Abstract

    Increasing evidence suggests that interlocutors use visual communicative signals to form predictions about unfolding utterances, but there is little data on the predictive potential of facial signals in conversation. In an online experiment with virtual agents, we examine whether facial signals produced by an addressee may allow speakers to anticipate the response to a question before it is given. Participants (n = 80) viewed videos of short conversation fragments between two virtual humans. Each fragment ended with the Questioner asking a question, followed by a pause during which the Responder looked either straight at the Questioner (baseline), or averted their gaze, or accompanied the straight gaze with one of the following facial signals: brow raise, brow frown, nose wrinkle, smile, squint, mouth corner pulled back (dimpler). Participants then indicated on a 6-point scale whether they expected a “yes” or “no” response. Analyses revealed that all signals received different ratings relative to the baseline: brow raises, dimplers, and smiles were associated with more positive responses, gaze aversions, brow frowns, nose wrinkles, and squints with more negative responses. Our findings show that interlocutors may form strong associations between facial signals and upcoming responses to questions, highlighting their predictive potential in face-to-face conversation.

    Additional information

    supplementary materials
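
    The analysis described in the abstract above compares expectation ratings for each facial-signal condition against the straight-gaze baseline. As a rough, hypothetical sketch of that kind of analysis (not the authors' code, and the published model may well differ, e.g. by treating the ratings as ordinal), a linear mixed-effects model with treatment coding and per-participant random intercepts could look as follows; all variable names and the simulated data are invented for illustration.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical condition means on the 1-6 "no"-"yes" expectation scale.
    cond_means = {"baseline": 3.5, "smile": 5.0, "brow_raise": 4.5,
                  "frown": 2.0, "gaze_aversion": 2.5}

    rng = np.random.default_rng(0)
    rows = []
    for p in range(20):                        # simulated participants
        for cond, mean in cond_means.items():
            for _ in range(6):                 # simulated trials per condition
                rating = int(np.clip(round(rng.normal(mean, 1.0)), 1, 6))
                rows.append({"participant": f"p{p:02d}",
                             "condition": cond, "rating": rating})
    df = pd.DataFrame(rows)

    # Treatment-code the condition factor so every facial signal is contrasted
    # against the straight-gaze baseline; random intercepts per participant
    # account for the repeated-measures design. (Ratings are treated as
    # continuous here purely for simplicity.)
    model = smf.mixedlm("rating ~ C(condition, Treatment(reference='baseline'))",
                        data=df, groups=df["participant"])
    print(model.fit().summary())
    ```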
  • Hömke, P., Levinson, S. C., Emmendorfer, A. K., & Holler, J. (2025). Eyebrow movements as signals of communicative problems in human face-to-face interaction. Royal Society Open Science, 12(3): 241632. doi:10.1098/rsos.241632.

    Abstract

    Repair is a core building block of human communication, allowing us to address problems of understanding in conversation. Past research has uncovered the basic mechanisms by which interactants signal and solve such problems. However, the focus has been on verbal interaction, neglecting the fact that human communication is inherently multimodal. Here, we focus on a visual signal particularly prevalent in signalling problems of understanding: eyebrow furrows and raises. We present, first, a corpus study showing that differences in eyebrow actions (furrows versus raises) were systematically associated with differences in the format of verbal repair initiations. Second, we present a follow-up study using an avatar that allowed us to test the causal consequences of addressee eyebrow movements, zooming into the effect of eyebrow furrows as signals of trouble in understanding in particular. The results revealed that addressees’ eyebrow furrows have a striking effect on speakers’ speech, leading speakers to produce answers to questions several seconds longer than when not perceiving addressee eyebrow furrows while speaking. Together, the findings demonstrate that eyebrow movements play a communicative role in initiating repair during conversation rather than being merely epiphenomenal and that their occurrence can critically influence linguistic behaviour. Thus, eyebrow movements should be considered core coordination devices in human conversational interaction.

    Additional information

    link to preprint
  • Ter Bekke, M., Drijvers, L., & Holler, J. (2025). Co-speech hand gestures are used to predict upcoming meaning. Psychological Science. Advance online publication. doi:10.1177/09567976251331041.

    Abstract

    In face-to-face conversation, people use speech and gesture to convey meaning. Seeing gestures alongside speech facilitates comprehenders’ language processing, but crucially, the mechanisms underlying this facilitation remain unclear. We investigated whether comprehenders use the semantic information in gestures, typically preceding related speech, to predict upcoming meaning. Dutch adults listened to questions asked by a virtual avatar. Questions were accompanied by an iconic gesture (e.g., typing) or meaningless control movement (e.g., arm scratch) followed by a short pause and target word (e.g., “type”). A Cloze experiment showed that gestures improved explicit predictions of upcoming target words. Moreover, an EEG experiment showed that gestures reduced alpha and beta power during the pause, indicating anticipation, and reduced N400 amplitudes, demonstrating facilitated semantic processing. Thus, comprehenders use iconic gestures to predict upcoming meaning. Theories of linguistic prediction should incorporate communicative bodily signals as predictive cues to capture how language is processed in face-to-face interaction.

    Additional information

    supplementary material
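
    The Cloze experiment mentioned above estimates how predictable the target word is from what precedes it. The sketch below is a minimal, hypothetical illustration of computing cloze probabilities per condition (iconic gesture vs. control movement); the response tuples and condition labels are invented, not taken from the study.

    ```python
    from collections import defaultdict

    # Hypothetical cloze responses: (condition, typed_completion, target_word).
    # In the real experiment there would be one row per participant x item.
    responses = [
        ("iconic_gesture", "type", "type"),
        ("iconic_gesture", "write", "type"),
        ("iconic_gesture", "type", "type"),
        ("control_movement", "say", "type"),
        ("control_movement", "type", "type"),
        ("control_movement", "open", "type"),
    ]

    # Cloze probability = proportion of completions matching the target word,
    # computed separately for gesture and control trials.
    hits, totals = defaultdict(int), defaultdict(int)
    for condition, completion, target in responses:
        totals[condition] += 1
        hits[condition] += int(completion.strip().lower() == target.lower())

    for condition in totals:
        print(f"{condition}: cloze probability = "
              f"{hits[condition] / totals[condition]:.2f}")
    ```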
  • Tilston, O., Holler, J., & Bangerter, A. (2025). Opening social interactions: The coordination of approach, gaze, speech and handshakes during greetings. Cognitive Science, 49(2): e70049. doi:10.1111/cogs.70049.

    Abstract

    Despite the importance of greetings for opening social interactions, their multimodal coordination processes remain poorly understood. We used a naturalistic, lab-based setup where pairs of unacquainted participants approached and greeted each other while unaware their greeting behavior was studied. We measured the prevalence and time course of multimodal behaviors potentially culminating in a handshake, including motor behaviors (e.g., walking, standing up, hand movements like raise, grasp, and retraction), gaze patterns (using eye tracking glasses), and speech (close and distant verbal salutations). We further manipulated the visibility of partners’ eyes to test its effect on gaze. Our findings reveal that gaze to a partner's face increases over the course of a greeting, but is partly averted during approach and is influenced by the visibility of partners’ eyes. Gaze helps coordinate handshakes, by signaling intent and guiding the grasp. The timing of adjacency pairs in verbal salutations is comparable to the precision of floor transitions in the main body of conversations, and varies according to greeting phase, with distant salutation pair parts featuring more gaps and close salutation pair parts featuring more overlap. Gender composition and a range of multimodal behaviors affect whether pairs choose to shake hands or not. These findings fill several gaps in our understanding of greetings and provide avenues for future research, including advancements in social robotics and human-robot interaction.
  • Trujillo, J. P., Dyer, R. M. K., & Holler, J. (2025). Dyadic differences in empathy scores are associated with kinematic similarity during conversational question-answer pairs. Discourse Processes, 62(3), 195-213. doi:10.1080/0163853X.2025.2467605.

    Abstract

    During conversation, speakers coordinate and synergize their behaviors at multiple levels, and in different ways. The extent to which individuals converge or diverge in their behaviors during interaction may relate to interpersonal differences relevant to social interaction, such as empathy as measured by the empathy quotient (EQ). An association between interpersonal difference in empathy and interpersonal entrainment could help to throw light on how interlocutor characteristics influence interpersonal entrainment. We investigated this possibility in a corpus of unconstrained conversation between dyads. We used dynamic time warping to quantify entrainment between interlocutors of head motion, hand motion, and maximum speech f0 during question–response sequences. We additionally calculated interlocutor differences in EQ scores. We found that, for both head and hand motion, greater difference in EQ was associated with higher entrainment. Thus, we consider that people who are dissimilar in EQ may need to “ground” their interaction with low-level movement entrainment. There was no significant relationship between f0 entrainment and EQ score differences.
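
    Dynamic time warping (DTW), the entrainment measure named in the abstract above, aligns two time series that may unfold at different rates and returns a cumulative mismatch cost: the lower the (length-normalised) cost, the more similar the two motion traces. The sketch below is a minimal textbook-style DTW applied to hypothetical head-motion speed traces, not the authors' pipeline.

    ```python
    import numpy as np

    def dtw_distance(x, y):
        """Dynamic time warping distance between two 1-D time series."""
        n, m = len(x), len(y)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(x[i - 1] - y[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    # Hypothetical per-frame head-motion speed traces for the two interlocutors
    # during one question-response sequence (different lengths are fine).
    rng = np.random.default_rng(1)
    speaker_a = np.abs(rng.normal(0.0, 1.0, size=200))
    speaker_b = np.abs(rng.normal(0.0, 1.0, size=180))

    # Lower cost = more similar movement; normalising by the combined length
    # makes values comparable across sequences of different durations.
    dist = dtw_distance(speaker_a, speaker_b) / (len(speaker_a) + len(speaker_b))
    print(f"normalised DTW distance: {dist:.3f}")
    ```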
  • Trujillo, J. P., & Holler, J. (2025). Multimodal information density is highest in question beginnings, and early entropy is associated with fewer but longer visual signals. Discourse Processes, 62(2), 69-88. doi:10.1080/0163853X.2024.2413314.

    Abstract

    When engaged in spoken conversation, speakers convey meaning using both speech and visual signals, such as facial expressions and manual gestures. An important question is how information is distributed in utterances during face-to-face interaction when information from visual signals is also present. In a corpus of casual Dutch face-to-face conversations, we focus on spoken questions in particular because they occur frequently, thus constituting core building blocks of conversation. We quantified information density (i.e. lexical entropy and surprisal) and the number and relative duration of facial and manual signals. We tested whether lexical information density or the number of visual signals differed between the first and last halves of questions, as well as whether the number of visual signals occurring in the less-predictable portion of a question was associated with the lexical information density of the same portion of the question in a systematic manner. We found that information density, as well as number of visual signals, were higher in the first half of questions, and specifically lexical entropy was associated with fewer, but longer visual signals. The multimodal front-loading of questions and the complementary distribution of visual signals and high entropy words in Dutch casual face-to-face conversations may have implications for the parallel processes of utterance comprehension and response planning during turn-taking.

    Additional information

    supplemental material
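
    Lexical surprisal and entropy, the information-density measures used in the abstract above, can be illustrated with a toy bigram model: surprisal is -log2 of a word's conditional probability given its context, and entropy summarises the uncertainty of the upcoming-word distribution. The corpus and unsmoothed bigram estimates below are purely illustrative and far simpler than whatever estimates underlie the published analysis.

    ```python
    import math
    from collections import Counter, defaultdict

    # Toy corpus of tokenised Dutch questions; real estimates would come from
    # a far larger corpus or language model.
    corpus = [
        "wat heb je gisteren gedaan".split(),
        "wat heb je vandaag gedaan".split(),
        "heb je dat al gezien".split(),
    ]

    bigrams = defaultdict(Counter)
    for sent in corpus:
        for prev, word in zip(["<s>"] + sent, sent + ["</s>"]):
            bigrams[prev][word] += 1

    def surprisal(prev, word):
        """-log2 P(word | prev) under the unsmoothed bigram model."""
        counts = bigrams[prev]
        return -math.log2(counts[word] / sum(counts.values()))

    def entropy(prev):
        """Entropy (bits) of the next-word distribution following `prev`."""
        counts = bigrams[prev]
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    question = "wat heb je vandaag gedaan".split()
    for prev, word in zip(["<s>"] + question, question):
        print(f"{word:10s} surprisal={surprisal(prev, word):.2f} "
              f"entropy_after_context={entropy(prev):.2f}")
    ```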
  • Drijvers, L., & Holler, J. (2022). Face-to-face spatial orientation fine-tunes the brain for neurocognitive processing in conversation. iScience, 25(11): 105413. doi:10.1016/j.isci.2022.105413.

    Abstract

    We here demonstrate that face-to-face spatial orientation induces a special ‘social mode’ for neurocognitive processing during conversation, even in the absence of visibility. Participants conversed face-to-face, face-to-face but visually occluded, and back-to-back to tease apart effects caused by seeing visual communicative signals and by spatial orientation. Using dual-EEG, we found that 1) listeners’ brains engaged more strongly while conversing in face-to-face than back-to-back, irrespective of the visibility of communicative signals, 2) listeners attended to speech more strongly in a back-to-back compared to a face-to-face spatial orientation without visibility; visual signals further reduced the attention needed; 3) the brains of interlocutors were more in sync in a face-to-face compared to a back-to-back spatial orientation, even when they could not see each other; visual signals further enhanced this pattern. Communicating in face-to-face spatial orientation is thus sufficient to induce a special ‘social mode’ which fine-tunes the brain for neurocognitive processing in conversation.
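
    The dual-EEG analysis above reports that interlocutors' brains were more "in sync". The exact synchrony measure is not specified in the abstract, so the sketch below shows one common inter-brain coupling metric, the phase-locking value (PLV) in the alpha band, computed on simulated single-channel signals; treat it as an illustration of the general idea rather than the study's method.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def phase_locking_value(sig_a, sig_b, fs, band=(8.0, 12.0)):
        """Phase-locking value (0-1) between two signals in a frequency band."""
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        phase_a = np.angle(hilbert(filtfilt(b, a, sig_a)))
        phase_b = np.angle(hilbert(filtfilt(b, a, sig_b)))
        return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

    # Simulated single-channel recordings from the two interlocutors: a shared
    # 10 Hz component plus independent noise.
    fs = 250  # Hz
    t = np.arange(0, 10, 1 / fs)
    rng = np.random.default_rng(2)
    listener = np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
    speaker = np.sin(2 * np.pi * 10 * t + 0.4) + rng.normal(0, 1, t.size)

    print(f"alpha-band PLV: {phase_locking_value(listener, speaker, fs):.2f}")
    ```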
  • Eijk, L., Rasenberg, M., Arnese, F., Blokpoel, M., Dingemanse, M., Doeller, C. F., Ernestus, M., Holler, J., Milivojevic, B., Özyürek, A., Pouw, W., Van Rooij, I., Schriefers, H., Toni, I., Trujillo, J. P., & Bögels, S. (2022). The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage, 264: 119734. doi:10.1016/j.neuroimage.2022.119734.

    Abstract

    We present a dataset of behavioural and fMRI observations acquired in the context of humans involved in multimodal referential communication. The dataset contains audio/video and motion-tracking recordings of face-to-face, task-based communicative interactions in Dutch, as well as behavioural and neural correlates of participants’ representations of dialogue referents. Seventy-one pairs of unacquainted participants performed two interleaved interactional tasks in which they described and located 16 novel geometrical objects (i.e., Fribbles) yielding spontaneous interactions of about one hour. We share high-quality video (from three cameras), audio (from head-mounted microphones), and motion-tracking (Kinect) data, as well as speech transcripts of the interactions. Before and after engaging in the face-to-face communicative interactions, participants’ individual representations of the 16 Fribbles were estimated. Behaviourally, participants provided a written description (one to three words) for each Fribble and positioned them along 29 independent conceptual dimensions (e.g., rounded, human, audible). Neurally, fMRI signal evoked by each Fribble was measured during a one-back working-memory task. To enable functional hyperalignment across participants, the dataset also includes fMRI measurements obtained during visual presentation of eight animated movies (35 minutes total). We present analyses for the various types of data demonstrating their quality and consistency with earlier research. Besides high-resolution multimodal interactional data, this dataset includes different correlates of communicative referents, obtained before and after face-to-face dialogue, allowing for novel investigations into the relation between communicative behaviours and the representational space shared by communicators. This unique combination of data can be used for research in neuroscience, psychology, linguistics, and beyond.
  • Holler, J., Drijvers, L., Rafiee, A., & Majid, A. (2022). Embodied space-pitch associations are shaped by language. Cognitive Science, 46(2): e13083. doi:10.1111/cogs.13083.

    Abstract

    Height-pitch associations are claimed to be universal and independent of language, but this claim remains controversial. The present study sheds new light on this debate with a multimodal analysis of individual sound and melody descriptions obtained in an interactive communication paradigm with speakers of Dutch and Farsi. The findings reveal that, in contrast to Dutch speakers, Farsi speakers do not use a height-pitch metaphor consistently in speech. Both Dutch and Farsi speakers’ co-speech gestures did reveal a mapping of higher pitches to higher space and lower pitches to lower space, and this gesture space-pitch mapping tended to co-occur with corresponding spatial words (high-low). However, this mapping was much weaker in Farsi speakers than Dutch speakers. This suggests that cross-linguistic differences shape the conceptualization of pitch and further calls into question the universality of height-pitch associations.

    Additional information

    supporting information
  • Holler, J. (2022). Visual bodily signals as core devices for coordinating minds in interaction. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 377(1859): 20210094. doi:10.1098/rstb.2021.0094.

    Abstract

    The view put forward here is that visual bodily signals play a core role in human communication and the coordination of minds. Critically, this role goes far beyond referential and propositional meaning. The human communication system that we consider to be the explanandum in the evolution of language thus is not spoken language. It is, instead, a deeply multimodal, multilayered, multifunctional system that developed—and survived—owing to the extraordinary flexibility and adaptability that it endows us with. Beyond their undisputed iconic power, visual bodily signals (manual and head gestures, facial expressions, gaze, torso movements) fundamentally contribute to key pragmatic processes in modern human communication. This contribution becomes particularly evident with a focus that includes non-iconic manual signals, non-manual signals and signal combinations. Such a focus also needs to consider meaning encoded not just via iconic mappings, since kinematic modulations and interaction-bound meaning are additional properties equipping the body with striking pragmatic capacities. Some of these capacities, or its precursors, may have already been present in the last common ancestor we share with the great apes and may qualify as early versions of the components constituting the hypothesized interaction engine.
  • Holler, J., Bavelas, J., Woods, J., Geiger, M., & Simons, L. (2022). Given-new effects on the duration of gestures and of words in face-to-face dialogue. Discourse Processes, 59(8), 619-645. doi:10.1080/0163853X.2022.2107859.

    Abstract

    The given-new contract entails that speakers must distinguish for their addressee whether references are new or already part of their dialogue. Past research had found that, in a monologue to a listener, speakers shortened repeated words. However, the notion of the given-new contract is inherently dialogic, with an addressee and the availability of co-speech gestures. Here, two face-to-face dialogue experiments tested whether gesture duration also follows the given-new contract. In Experiment 1, four experimental sequences confirmed that when speakers repeated their gestures, they shortened the duration significantly. Experiment 2 replicated the effect with spontaneous gestures in a different task. This experiment also extended earlier results with words, confirming that speakers shortened their repeated words significantly in a multimodal dialogue setting, the basic form of language use. Because words and gestures were not necessarily redundant, these results offer another instance in which gestures and words independently serve pragmatic requirements of dialogue.
  • Pouw, W., & Holler, J. (2022). Timing in conversation is dynamically adjusted turn by turn in dyadic telephone conversations. Cognition, 222: 105015. doi:10.1016/j.cognition.2022.105015.

    Abstract

    Conversational turn taking in humans involves incredibly rapid responding. The timing mechanisms underpinning such responses have been heavily debated, including questions such as who is doing the timing. Similar to findings on rhythmic tapping to a metronome, we show that floor transfer offsets (FTOs) in telephone conversations are serially dependent, such that FTOs are lag-1 negatively autocorrelated. Finding this serial dependence on a turn-by-turn basis (lag-1) rather than on the basis of two or more turns, suggests a counter-adjustment mechanism operating at the level of the dyad in FTOs during telephone conversations, rather than a more individualistic self-adjustment within speakers. This finding, if replicated, has major implications for models describing turn taking, and confirms the joint, dyadic nature of human conversational dynamics. Future research is needed to see how pervasive serial dependencies in FTOs are, such as for example in richer communicative face-to-face contexts where visual signals affect conversational timing.
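
    Floor transfer offsets (FTOs) and their lag-1 autocorrelation, the key quantities in the abstract above, are straightforward to compute once turn start and end times are available. The sketch below uses invented turn timings; negative FTOs are overlaps, positive FTOs are gaps, and a negative lag-1 autocorrelation indicates turn-by-turn counter-adjustment.

    ```python
    import numpy as np

    # Hypothetical turn timings in seconds, (start, end), ordered in time and
    # alternating between the two speakers.
    turns = [(0.0, 2.1), (2.3, 4.0), (3.9, 6.2),
             (6.5, 8.0), (8.1, 9.5), (9.4, 11.0)]

    # Floor transfer offset (FTO) = next turn's start minus current turn's end;
    # negative values are overlaps, positive values are gaps.
    ftos = np.array([nxt[0] - cur[1] for cur, nxt in zip(turns, turns[1:])])

    # Lag-1 autocorrelation of the FTO series: a negative value means that a
    # longer-than-usual gap tends to be followed by a shorter one, and vice versa.
    lag1 = np.corrcoef(ftos[:-1], ftos[1:])[0, 1]
    print(f"FTOs: {np.round(ftos, 2)}  lag-1 autocorrelation: {lag1:.2f}")
    ```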
  • Schubotz, L., Özyürek, A., & Holler, J. (2022). Individual differences in working memory and semantic fluency predict younger and older adults' multimodal recipient design in an interactive spatial task. Acta Psychologica, 229: 103690. doi:10.1016/j.actpsy.2022.103690.

    Abstract

    Aging appears to impair the ability to adapt speech and gestures based on knowledge shared with an addressee (common ground-based recipient design) in narrative settings. Here, we test whether this extends to spatial settings and is modulated by cognitive abilities. Younger and older adults gave instructions on how to assemble 3D models from building blocks on six consecutive trials. We induced mutually shared knowledge by either showing speaker and addressee the model beforehand, or not. Additionally, shared knowledge accumulated across the trials. Younger and crucially also older adults provided recipient-designed utterances, indicated by a significant reduction in the number of words and of gestures when common ground was present. Additionally, we observed a reduction in semantic content and a shift in cross-modal distribution of information across trials. Rather than age, individual differences in verbal and visual working memory and semantic fluency predicted the extent of addressee-based adaptations. Thus, in spatial tasks, individual cognitive abilities modulate the interactive language use of both younger and older adults.

    Additional information

    1-s2.0-S0001691822002050-mmc1.docx
  • Trujillo, J. P., Levinson, S. C., & Holler, J. (2022). A multi-scale investigation of the human communication system's response to visual disruption. Royal Society Open Science, 9(4): 211489. doi:10.1098/rsos.211489.

    Abstract

    In human communication, when the speech is disrupted, the visual channel (e.g. manual gestures) can compensate to ensure successful communication. Whether speech also compensates when the visual channel is disrupted is an open question, and one that significantly bears on the status of the gestural modality. We test whether gesture and speech are dynamically co-adapted to meet communicative needs. To this end, we parametrically reduce visibility during casual conversational interaction and measure the effects on speakers' communicative behaviour using motion tracking and manual annotation for kinematic and acoustic analyses. We found that visual signalling effort was flexibly adapted in response to a decrease in visual quality (especially motion energy, gesture rate, size, velocity and hold-time). Interestingly, speech was also affected: speech intensity increased in response to reduced visual quality (particularly in speech-gesture utterances, but independently of kinematics). Our findings highlight that multi-modal communicative behaviours are flexibly adapted at multiple scales of measurement and question the notion that gesture plays an inferior role to speech.

    Additional information

    supplemental material
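
    The kinematic measures named in the abstract above (e.g. motion energy, velocity) can be derived from motion-tracking keypoints. The sketch below computes frame-to-frame displacement, speed, and a simple total-path-length stand-in for "motion energy" on a simulated wrist trajectory; the exact operationalisations in the paper may differ.

    ```python
    import numpy as np

    # Simulated motion-tracking output: (n_frames, 2) wrist coordinates in
    # metres, sampled at 30 fps for one speaker during a conversation segment.
    fps = 30
    rng = np.random.default_rng(3)
    wrist_xy = np.cumsum(rng.normal(0, 0.005, size=(300, 2)), axis=0)

    # Frame-to-frame displacement, speed, and a simple total-path-length
    # summary used here as a stand-in for "motion energy".
    displacement = np.linalg.norm(np.diff(wrist_xy, axis=0), axis=1)  # m per frame
    speed = displacement * fps                                        # m/s
    motion_energy = displacement.sum()                                # metres travelled
    peak_velocity = speed.max()

    print(f"motion energy: {motion_energy:.2f} m, "
          f"peak velocity: {peak_velocity:.2f} m/s")
    ```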
  • Holler, J., Kendrick, K. H., Casillas, M., & Levinson, S. C. (2015). Editorial: Turn-taking in human communicative interaction. Frontiers in Psychology, 6: 1919. doi:10.3389/fpsyg.2015.01919.
  • Holler, J., Kokal, I., Toni, I., Hagoort, P., Kelly, S. D., & Ozyurek, A. (2015). Eye’m talking to you: Speakers’ gaze direction modulates co-speech gesture processing in the right MTG. Social Cognitive & Affective Neuroscience, 10, 255-261. doi:10.1093/scan/nsu047.

    Abstract

    Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts.
  • Holler, J., & Kendrick, K. H. (2015). Unaddressed participants’ gaze in multi-person interaction: Optimizing recipiency. Frontiers in Psychology, 6: 98. doi:10.3389/fpsyg.2015.00098.

    Abstract

    One of the most intriguing aspects of human communication is its turn-taking system. It requires the ability to process on-going turns at talk while planning the next, and to launch this next turn without considerable overlap or delay. Recent research has investigated the eye movements of observers of dialogues to gain insight into how we process turns at talk. More specifically, this research has focused on the extent to which we are able to anticipate the end of current and the beginning of next turns. At the same time, there has been a call for shifting experimental paradigms exploring social-cognitive processes away from passive observation towards online processing. Here, we present research that responds to this call by situating state-of-the-art technology for tracking interlocutors’ eye movements within spontaneous, face-to-face conversation. Each conversation involved three native speakers of English. The analysis focused on question-response sequences involving just two of those participants, thus rendering the third momentarily unaddressed. Temporal analyses of the unaddressed participants’ gaze shifts from current to next speaker revealed that unaddressed participants are able to anticipate next turns, and moreover, that they often shift their gaze towards the next speaker before the current turn ends. However, an analysis of the complex structure of turns at talk revealed that the planning of these gaze shifts virtually coincides with the points at which the turns first become recognizable as possibly complete. We argue that the timing of these eye movements is governed by an organizational principle whereby unaddressed participants shift their gaze at a point that appears interactionally most optimal: It provides unaddressed participants with access to much of the visual, bodily behavior that accompanies both the current speaker’s and the next speaker’s turn, and it allows them to display recipiency with regard to both speakers’ turns.
  • Kelly, S., Healey, M., Ozyurek, A., & Holler, J. (2015). The processing of speech, gesture and action during language comprehension. Psychonomic Bulletin & Review, 22, 517-523. doi:10.3758/s13423-014-0681-7.

    Abstract

    Hand gestures and speech form a single integrated system of meaning during language comprehension, but is gesture processed with speech in a unique fashion? We had subjects watch multimodal videos that presented auditory (words) and visual (gestures and actions on objects) information. Half of the subjects related the audio information to a written prime presented before the video, and the other half related the visual information to the written prime. For half of the multimodal video stimuli, the audio and visual information contents were congruent, and for the other half, they were incongruent. For all subjects, stimuli in which the gestures and actions were incongruent with the speech produced more errors and longer response times than did stimuli that were congruent, but this effect was less prominent for speech-action stimuli than for speech-gesture stimuli. However, subjects focusing on visual targets were more accurate when processing actions than gestures. These results suggest that although actions may be easier to process than gestures, gestures may be more tightly tied to the processing of accompanying speech.
  • Peeters, D., Chu, M., Holler, J., Hagoort, P., & Ozyurek, A. (2015). Electrophysiological and kinematic correlates of communicative intent in the planning and production of pointing gestures and speech. Journal of Cognitive Neuroscience, 27(12), 2352-2368. doi:10.1162/jocn_a_00865.

    Abstract

    In everyday human communication, we often express our communicative intentions by manually pointing out referents in the material world around us to an addressee, often in tight synchronization with referential speech. This study investigated whether and how the kinematic form of index finger pointing gestures is shaped by the gesturer's communicative intentions and how this is modulated by the presence of concurrently produced speech. Furthermore, we explored the neural mechanisms underpinning the planning of communicative pointing gestures and speech. Two experiments were carried out in which participants pointed at referents for an addressee while the informativeness of their gestures and speech was varied. Kinematic and electrophysiological data were recorded online. It was found that participants prolonged the duration of the stroke and poststroke hold phase of their gesture to be more communicative, in particular when the gesture was carrying the main informational burden in their multimodal utterance. Frontal and P300 effects in the ERPs suggested the importance of intentional and modality-independent attentional mechanisms during the planning phase of informative pointing gestures. These findings contribute to a better understanding of the complex interplay between action, attention, intention, and language in the production of pointing gestures, a communicative act core to human interaction.
  • Rowbotham, S., Lloyd, D. M., Holler, J., & Wearden, A. (2015). Externalizing the private experience of pain: A role for co-speech gestures in pain communication? Health Communication, 30(1), 70-80. doi:10.1080/10410236.2013.836070.

    Abstract

    Despite the importance of effective pain communication, talking about pain represents a major challenge for patients and clinicians because pain is a private and subjective experience. Focusing primarily on acute pain, this article considers the limitations of current methods of obtaining information about the sensory characteristics of pain and suggests that spontaneously produced “co-speech hand gestures” may constitute an important source of information here. Although this is a relatively new area of research, we present recent empirical evidence that reveals that co-speech gestures contain important information about pain that can both add to and clarify speech. Following this, we discuss how these findings might eventually lead to a greater understanding of the sensory characteristics of pain, and to improvements in treatment and support for pain sufferers. We hope that this article will stimulate further research and discussion of this previously overlooked dimension of pain communication.
  • Schubotz, L., Holler, J., & Ozyurek, A. (2015). Age-related differences in multi-modal audience design: Young, but not old speakers, adapt speech and gestures to their addressee's knowledge. In G. Ferré, & M. Tutton (Eds.), Proceedings of the 4th GESPIN - Gesture & Speech in Interaction Conference (pp. 211-216). Nantes: Université of Nantes.

    Abstract

    Speakers can adapt their speech and co-speech gestures for addressees. Here, we investigate whether this ability is modulated by age. Younger and older adults participated in a comic narration task in which one participant (the speaker) narrated six short comic stories to another participant (the addressee). One half of each story was known to both participants, the other half only to the speaker. Younger but not older speakers used more words and gestures when narrating novel story content as opposed to known content. We discuss cognitive and pragmatic explanations of these findings and relate them to theories of gesture production.
