Displaying 1 - 19 of 19
-
Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (
Eds. ), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).Abstract
Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.Additional information
link to eScholarship -
Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).
Abstract
Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and re-
traction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis. -
Ghaleb, E., Khaertdinov, B., Pouw, W., Rasenberg, M., Holler, J., Ozyurek, A., & Fernandez, R. (2024). Learning co-speech gesture representations in dialogue through contrastive learning: An intrinsic evaluation. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024) (pp. 274-283).
Abstract
In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gestures representations considering gestures’ variability and relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with human-annotated pairwise gesture similarity. Moreover, we perform a diagnostic probing analysis to assess the possibility of recovering interpretable gesture features from the learned representations. Our results show a significant positive correlation with human-annotated gesture similarity and reveal that the similarity between the learned representations is consistent with well-motivated patterns related to the dynamics of dialogue interaction. Moreover, our findings demonstrate that several features concerning the form of gestures can be recovered from the latent representations. Overall, this study shows that multimodal contrastive learning is a promising approach for learning gesture representations, which opens the door to using such representations in larger-scale gesture analysis studies. -
Kendrick, K. H., & Holler, J. (2024). Conversation. In M. C. Frank, & A. Majid (
Eds. ), Open Encyclopedia of Cognitive Science. Cambridge: MIT Press. doi:10.21428/e2759450.3c00b537. -
Rasing, N. B., Van de Geest-Buit, W., Chan, O. Y. A., Mul, K., Lanser, A., Erasmus, C. E., Groothuis, J. T., Holler, J., Ingels, K. J. A. O., Post, B., Siemann, I., & Voermans, N. C. (2024). Psychosocial functioning in patients with altered facial expression: A scoping review in five neurological diseases. Disability and Rehabilitation, 46(17), 3772-3791. doi:10.1080/09638288.2023.2259310.
Abstract
Purpose
To perform a scoping review to investigate the psychosocial impact of having an altered facial expression in five neurological diseases.
Methods
A systematic literature search was performed. Studies were on Bell’s palsy, facioscapulohumeral muscular dystrophy (FSHD), Moebius syndrome, myotonic dystrophy type 1, or Parkinson’s disease patients; had a focus on altered facial expression; and had any form of psychosocial outcome measure. Data extraction focused on psychosocial outcomes.
Results
Bell’s palsy, myotonic dystrophy type 1, and Parkinson’s disease patients more often experienced some degree of psychosocial distress than healthy controls. In FSHD, facial weakness negatively influenced communication and was experienced as a burden. The psychosocial distress applied especially to women (Bell’s palsy and Parkinson’s disease), and patients with more severely altered facial expression (Bell’s palsy), but not for Moebius syndrome patients. Furthermore, Parkinson’s disease patients with more pronounced hypomimia were perceived more negatively by observers. Various strategies were reported to compensate for altered facial expression.
Conclusions
This review showed that patients with altered facial expression in four of five included neurological diseases had reduced psychosocial functioning. Future research recommendations include studies on observers’ judgements of patients during social interactions and on the effectiveness of compensation strategies in enhancing psychosocial functioning.
Implications for rehabilitation
Negative effects of altered facial expression on psychosocial functioning are common and more abundant in women and in more severely affected patients with various neurological disorders.
Health care professionals should be alert to psychosocial distress in patients with altered facial expression.
Learning of compensatory strategies could be a beneficial therapy for patients with psychosocial distress due to an altered facial expression. -
Ter Bekke, M., Drijvers, L., & Holler, J. (2024). Hand gestures have predictive potential during conversation: An investigation of the timing of gestures in relation to speech. Cognitive Science, 48(1): e13407. doi:10.1111/cogs.13407.
Abstract
During face-to-face conversation, transitions between speaker turns are incredibly fast. These fast turn exchanges seem to involve next speakers predicting upcoming semantic information, such that next turn planning can begin before a current turn is complete. Given that face-to-face conversation also involves the use of communicative bodily signals, an important question is how bodily signals such as co-speech hand gestures play into these processes of prediction and fast responding. In this corpus study, we found that hand gestures that depict or refer to semantic information started before the corresponding information in speech, which held both for the onset of the gesture as a whole, as well as the onset of the stroke (the most meaningful part of the gesture). This early timing potentially allows listeners to use the gestural information to predict the corresponding semantic information to be conveyed in speech. Moreover, we provided further evidence that questions with gestures got faster responses than questions without gestures. However, we found no evidence for the idea that how much a gesture precedes its lexical affiliate (i.e., its predictive potential) relates to how fast responses were given. The findings presented here highlight the importance of the temporal relation between speech and gesture and help to illuminate the potential mechanisms underpinning multimodal language processing during face-to-face conversation. -
Ter Bekke, M., Drijvers, L., & Holler, J. (2024). Gestures speed up responses to questions. Language, Cognition and Neuroscience, 39(4), 423-430. doi:10.1080/23273798.2024.2314021.
Abstract
Most language use occurs in face-to-face conversation, which involves rapid turn-taking. Seeing communicative bodily signals in addition to hearing speech may facilitate such fast responding. We tested whether this holds for co-speech hand gestures by investigating whether these gestures speed up button press responses to questions. Sixty native speakers of Dutch viewed videos in which an actress asked yes/no-questions, either with or without a corresponding iconic hand gesture. Participants answered the questions as quickly and accurately as possible via button press. Gestures did not impact response accuracy, but crucially, gestures sped up responses, suggesting that response planning may be finished earlier when gestures are seen. How much gestures sped up responses was not related to their timing in the question or their timing with respect to the corresponding information in speech. Overall, these results are in line with the idea that multimodality may facilitate fast responding during face-to-face conversation. -
Ter Bekke, M., Levinson, S. C., Van Otterdijk, L., Kühn, M., & Holler, J. (2024). Visual bodily signals and conversational context benefit the anticipation of turn ends. Cognition, 248: 105806. doi:10.1016/j.cognition.2024.105806.
Abstract
The typical pattern of alternating turns in conversation seems trivial at first sight. But a closer look quickly reveals the cognitive challenges involved, with much of it resulting from the fast-paced nature of conversation. One core ingredient to turn coordination is the anticipation of upcoming turn ends so as to be able to ready oneself for providing the next contribution. Across two experiments, we investigated two variables inherent to face-to-face conversation, the presence of visual bodily signals and preceding discourse context, in terms of their contribution to turn end anticipation. In a reaction time paradigm, participants anticipated conversational turn ends better when seeing the speaker and their visual bodily signals than when they did not, especially so for longer turns. Likewise, participants were better able to anticipate turn ends when they had access to the preceding discourse context than when they did not, and especially so for longer turns. Critically, the two variables did not interact, showing that visual bodily signals retain their influence even in the context of preceding discourse. In a pre-registered follow-up experiment, we manipulated the visibility of the speaker's head, eyes and upper body (i.e. torso + arms). Participants were better able to anticipate turn ends when the speaker's upper body was visible, suggesting a role for manual gestures in turn end anticipation. Together, these findings show that seeing the speaker during conversation may critically facilitate turn coordination in interaction. -
Trujillo, J. P., & Holler, J. (2024). Conversational facial signals combine into compositional meanings that change the interpretation of speaker intentions. Scientific Reports, 14: 2286. doi:10.1038/s41598-024-52589-0.
Abstract
Human language is extremely versatile, combining a limited set of signals in an unlimited number of ways. However, it is unknown whether conversational visual signals feed into the composite utterances with which speakers communicate their intentions. We assessed whether different combinations of visual signals lead to different intent interpretations of the same spoken utterance. Participants viewed a virtual avatar uttering spoken questions while producing single visual signals (i.e., head turn, head tilt, eyebrow raise) or combinations of these signals. After each video, participants classified the communicative intention behind the question. We found that composite utterances combining several visual signals conveyed different meaning compared to utterances accompanied by the single visual signals. However, responses to combinations of signals were more similar to the responses to related, rather than unrelated, individual signals, indicating a consistent influence of the individual visual signals on the whole. This study therefore provides first evidence for compositional, non-additive (i.e., Gestalt-like) perception of multimodal language.Additional information
41598_2024_52589_MOESM1_ESM.docx -
Trujillo, J. P., & Holler, J. (2024). Information distribution patterns in naturalistic dialogue differ across languages. Psychonomic Bulletin & Review, 31, 1723-1734. doi:10.3758/s13423-024-02452-0.
Abstract
The natural ecology of language is conversation, with individuals taking turns speaking to communicate in a back-and-forth fashion. Language in this context involves strings of words that a listener must process while simultaneously planning their own next utterance. It would thus be highly advantageous if language users distributed information within an utterance in a way that may facilitate this processing–planning dynamic. While some studies have investigated how information is distributed at the level of single words or clauses, or in written language, little is known about how information is distributed within spoken utterances produced during naturalistic conversation. It also is not known how information distribution patterns of spoken utterances may differ across languages. We used a set of matched corpora (CallHome) containing 898 telephone conversations conducted in six different languages (Arabic, English, German, Japanese, Mandarin, and Spanish), analyzing more than 58,000 utterances, to assess whether there is evidence of distinct patterns of information distributions at the utterance level, and whether these patterns are similar or differed across the languages. We found that English, Spanish, and Mandarin typically show a back-loaded distribution, with higher information (i.e., surprisal) in the last half of utterances compared with the first half, while Arabic, German, and Japanese showed front-loaded distributions, with higher information in the first half compared with the last half. Additional analyses suggest that these patterns may be related to word order and rate of noun and verb usage. We additionally found that back-loaded languages have longer turn transition times (i.e.,time between speaker turns)Additional information
Data availability -
Cai, Z. G., Conell, L., & Holler, J. (2013). Time does not flow without language: Spatial distance affects temporal duration regardless of movement or direction. Psychonomic Bulletin & Review, 20(5), 973-980. doi:10.3758/s13423-013-0414-3.
Abstract
Much evidence has suggested that people conceive of time as flowing directionally in transverse space (e.g., from left to right for English speakers). However, this phenomenon has never been tested in a fully nonlinguistic paradigm where neither stimuli nor task use linguistic labels, which raises the possibility that time is directional only when reading/writing direction has been evoked. In the present study, English-speaking participants viewed a video where an actor sang a note while gesturing and reproduced the duration of the sung note by pressing a button. Results showed that the perceived duration of the note was increased by a long-distance gesture, relative to a short-distance gesture. This effect was equally strong for gestures moving from left to right and from right to left and was not dependent on gestures depicting movement through space; a weaker version of the effect emerged with static gestures depicting spatial distance. Since both our gesture stimuli and temporal reproduction task were nonlinguistic, we conclude that the spatial representation of time is nondirectional: Movement contributes, but is not necessary, to the representation of temporal information in a transverse timeline. -
Connell, L., Cai, Z. G., & Holler, J. (2013). Do you see what I'm singing? Visuospatial movement biases pitch perception. Brain and Cognition, 81, 124-130. doi:10.1016/j.bandc.2012.09.005.
Abstract
The nature of the connection between musical and spatial processing is controversial. While pitch may be described in spatial terms such as “high” or “low”, it is unclear whether pitch and space are associated but separate dimensions or whether they share representational and processing resources. In the present study, we asked participants to judge whether a target vocal note was the same as (or different from) a preceding cue note. Importantly, target trials were presented as video clips where a singer sometimes gestured upward or downward while singing that target note, thus providing an alternative, concurrent source of spatial information. Our results show that pitch discrimination was significantly biased by the spatial movement in gesture, such that downward gestures made notes seem lower in pitch than they really were, and upward gestures made notes seem higher in pitch. These effects were eliminated by spatial memory load but preserved under verbal memory load conditions. Together, our findings suggest that pitch and space have a shared representation such that the mental representation of pitch is audiospatial in nature. -
Hall, S., Rumney, L., Holler, J., & Kidd, E. (2013). Associations among play, gesture and early spoken language acquisition. First Language, 33, 294-312. doi:10.1177/0142723713487618.
Abstract
The present study investigated the developmental interrelationships between play, gesture use and spoken language development in children aged 18–31 months. The children completed two tasks: (i) a structured measure of pretend (or ‘symbolic’) play and (ii) a measure of vocabulary knowledge in which children have been shown to gesture. Additionally, their productive spoken language knowledge was measured via parental report. The results indicated that symbolic play is positively associated with children’s gesture use, which in turn is positively associated with spoken language knowledge over and above the influence of age. The tripartite relationship between gesture, play and language development is discussed with reference to current developmental theory. -
Holler, J., Schubotz, L., Kelly, S., Schuetze, M., Hagoort, P., & Ozyurek, A. (2013). Here's not looking at you, kid! Unaddressed recipients benefit from co-speech gestures when speech processing suffers. In M. Knauff, M. Pauen, I. Sebanz, & I. Wachsmuth (
Eds. ), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2560-2565). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0463/index.html.Abstract
In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from these different modalities, and how perceived communicative intentions, often signaled through visual signals, such as eye
gaze, may influence this processing. We address this question by simulating a triadic communication context in which a
speaker alternated her gaze between two different recipients. Participants thus viewed speech-only or speech+gesture
object-related utterances when being addressed (direct gaze) or unaddressed (averted gaze). Two object images followed
each message and participants’ task was to choose the object that matched the message. Unaddressed recipients responded significantly slower than addressees for speech-only
utterances. However, perceiving the same speech accompanied by gestures sped them up to a level identical to
that of addressees. That is, when speech processing suffers due to not being addressed, gesture processing remains intact and enhances the comprehension of a speaker’s message -
Holler, J., Turner, K., & Varcianna, T. (2013). It's on the tip of my fingers: Co-speech gestures during lexical retrieval in different social contexts. Language and Cognitive Processes, 28(10), 1509-1518. doi:10.1080/01690965.2012.698289.
Abstract
The Lexical Retrieval Hypothesis proposes that gestures function at the level of speech production, aiding in the retrieval of lexical items from the mental lexicon. However, empirical evidence for this account is mixed, and some critics argue that a more likely function of gestures during lexical retrieval is a communicative one. The present study was designed to test these predictions against each other by keeping lexical retrieval difficulty constant while varying social context. Participants' gestures were analysed during tip of the tongue experiences when communicating with a partner face-to-face (FTF), while being separated by a screen, or on their own by speaking into a voice recorder. The results show that participants in the FTF context produced significantly more representational gestures than participants in the solitary condition. This suggests that, even in the specific context of lexical retrieval difficulties, representational gestures appear to play predominantly a communicative role.Files private
Request files -
Lynott, D., Connell, L., & Holler, J. (
Eds. ). (2013). The role of body and environment in cognition. Frontiers in Psychology, 4: 465. doi:10.3389/fpsyg.2013.00465. -
Peeters, D., Chu, M., Holler, J., Ozyurek, A., & Hagoort, P. (2013). Getting to the point: The influence of communicative intent on the kinematics of pointing gestures. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (
Eds. ), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1127-1132). Austin, TX: Cognitive Science Society.Abstract
In everyday communication, people not only use speech but
also hand gestures to convey information. One intriguing
question in gesture research has been why gestures take the
specific form they do. Previous research has identified the
speaker-gesturer’s communicative intent as one factor
shaping the form of iconic gestures. Here we investigate
whether communicative intent also shapes the form of
pointing gestures. In an experimental setting, twenty-four
participants produced pointing gestures identifying a referent
for an addressee. The communicative intent of the speakergesturer
was manipulated by varying the informativeness of
the pointing gesture. A second independent variable was the
presence or absence of concurrent speech. As a function of their communicative intent and irrespective of the presence of speech, participants varied the durations of the stroke and the post-stroke hold-phase of their gesture. These findings add to our understanding of how the communicative context influences the form that a gesture takes.Additional information
http://mindmodeling.org/cogsci2013/papers/0219/index.html -
Holler, J., & Beattie, G. (2003). How iconic gestures and speech interact in the representation of meaning: are both aspects really integral to the process? Semiotica, 146, 81-116.
-
Holler, J., & Beattie, G. (2003). Pragmatic aspects of representational gestures: Do speakers use them to clarify verbal ambiguity for the listener? Gesture, 3, 127-154.
Share this page