Publications

  • Burchardt, L., Van de Sande, Y., Kehy, M., Gamba, M., Ravignani, A., & Pouw, W. (2024). A toolkit for the dynamic study of air sacs in siamang and other elastic circular structures. PLOS Computational Biology, 20(6): e1012222. doi:10.1371/journal.pcbi.1012222.

    Abstract

    Biological structures are defined by rigid elements, such as bones, and elastic elements, like muscles and membranes. Computer vision advances have enabled automatic tracking of moving animal skeletal poses. Such developments provide insights into complex time-varying dynamics of biological motion. Conversely, the elastic soft-tissues of organisms, like the nose of elephant seals, or the buccal sac of frogs, are poorly studied and no computer vision methods have been proposed. This leaves major gaps in different areas of biology. In primatology, most critically, the function of air sacs is widely debated; many open questions on the role of air sacs in the evolution of animal communication, including human speech, remain unanswered. To support the dynamic study of soft-tissue structures, we present a toolkit for the automated tracking of semi-circular elastic structures in biological video data. The toolkit contains unsupervised computer vision tools (using Hough transform) and supervised deep learning (by adapting DeepLabCut) methodology to track inflation of laryngeal air sacs or other biological spherical objects (e.g., gular cavities). Confirming the value of elastic kinematic analysis, we show that air sac inflation correlates with acoustic markers that likely inform about body size. Finally, we present a pre-processed audiovisual-kinematic dataset of 7+ hours of closeup audiovisual recordings of siamang (Symphalangus syndactylus) singing. This toolkit (https://github.com/WimPouw/AirSacTracker) aims to revitalize the study of non-skeletal morphological structures across multiple species.
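
    Illustrative code sketch

    The abstract above mentions unsupervised Hough-transform tracking of roughly circular elastic structures. The sketch below is only a minimal illustration of that general idea in Python/OpenCV, not the AirSacTracker implementation (see the linked repository for the actual toolkit); the video path and all detection parameters are placeholder assumptions.

    import cv2

    def track_circle_radius(video_path):
        """Return (frame_index, radius_in_pixels) for the strongest circle detected per frame."""
        cap = cv2.VideoCapture(video_path)
        radii, frame_idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            gray = cv2.medianBlur(gray, 5)  # suppress noise before circle detection
            circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=100,
                                       param1=100, param2=30, minRadius=10, maxRadius=200)
            if circles is not None:
                x, y, r = circles[0][0]                  # strongest detection in this frame
                radii.append((frame_idx, float(r)))      # radius tracks inflation over time
            frame_idx += 1
        cap.release()
        return radii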
  • Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).

    Abstract

    Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions (expressions with a common lexical core used by both speakers within a dialogue) and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the number of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.

    Additional information

    link to eScholarship
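
    Illustrative code sketch

    As a rough illustration of the idea of detecting shared lemmatised constructions (not the authors' pipeline), the sketch below collects lemma n-grams per speaker with spaCy and intersects them; the English model, the n-gram size, and the toy utterances are placeholder assumptions.

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumed model; the actual corpus may require another language

    def lemma_ngrams(utterances, n=2):
        grams = set()
        for utt in utterances:
            lemmas = [tok.lemma_.lower() for tok in nlp(utt) if tok.is_alpha]
            grams.update(tuple(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1))
        return grams

    def shared_constructions(speaker_a_utts, speaker_b_utts, n=2):
        """Lemma n-grams produced by both speakers within the same dialogue."""
        return lemma_ngrams(speaker_a_utts, n) & lemma_ngrams(speaker_b_utts, n)

    # hypothetical usage:
    # shared_constructions(["the spiky one on top"], ["yes that spiky one"], n=2)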
  • Ghaleb, E., Khaertdinov, B., Pouw, W., Rasenberg, M., Holler, J., Ozyurek, A., & Fernandez, R. (2024). Learning co-speech gesture representations in dialogue through contrastive learning: An intrinsic evaluation. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024) (pp. 274-283).

    Abstract

    In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gesture representations considering gestures’ variability and relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with human-annotated pairwise gesture similarity. Moreover, we perform a diagnostic probing analysis to assess the possibility of recovering interpretable gesture features from the learned representations. Our results show a significant positive correlation with human-annotated gesture similarity and reveal that the similarity between the learned representations is consistent with well-motivated patterns related to the dynamics of dialogue interaction. Moreover, our findings demonstrate that several features concerning the form of gestures can be recovered from the latent representations. Overall, this study shows that multimodal contrastive learning is a promising approach for learning gesture representations, which opens the door to using such representations in larger-scale gesture analysis studies.
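
    Illustrative code sketch

    The abstract above describes multimodal contrastive pre-training that grounds gesture representations in co-occurring speech. The PyTorch sketch below shows a generic symmetric InfoNCE-style loss over paired gesture and speech embeddings; it illustrates the technique only, not the paper's model, and the encoder outputs, batch pairing, and temperature are placeholder assumptions.

    import torch
    import torch.nn.functional as F

    def multimodal_contrastive_loss(gesture_emb, speech_emb, temperature=0.07):
        """gesture_emb, speech_emb: (batch, dim) embeddings of co-occurring gesture/speech pairs."""
        g = F.normalize(gesture_emb, dim=-1)
        s = F.normalize(speech_emb, dim=-1)
        logits = g @ s.t() / temperature                     # (batch, batch) similarity matrix
        targets = torch.arange(g.size(0), device=g.device)   # i-th gesture matches i-th speech clip
        # symmetric cross-entropy: gesture-to-speech and speech-to-gesture
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))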
  • Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Co-speech gesture detection through multi-phase sequence labeling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).

    Abstract

    Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
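
    Illustrative code sketch

    To make the architecture described above concrete, the sketch below shows a minimal Transformer encoder that emits per-frame gesture-phase scores from a window of skeletal features. It omits the Conditional Random Field decoding used in the paper, and the feature dimension, window length, and number of phases are placeholder assumptions.

    import torch
    import torch.nn as nn

    class PhaseLabeler(nn.Module):
        def __init__(self, n_skel_feats=50, d_model=128, n_phases=4):
            super().__init__()
            self.proj = nn.Linear(n_skel_feats, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_phases)  # e.g., neutral/preparation/stroke/retraction

        def forward(self, skeleton_window):
            # skeleton_window: (batch, frames, n_skel_feats)
            h = self.encoder(self.proj(skeleton_window))
            return self.head(h)                       # (batch, frames, n_phases) emission scores

    # hypothetical usage: scores = PhaseLabeler()(torch.randn(8, 60, 50))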
  • Leonetti, S., Ravignani, A., & Pouw, W. (2024). A cross-species framework for classifying sound-movement couplings. Neuroscience and Biobehavioral Reviews, 167: 105911. doi:10.1016/j.neubiorev.2024.105911.

    Abstract

    Sound and movement are entangled in animal communication. This is obviously true in the case of sound-constituting vibratory movements of biological structures which generate acoustic waves. A little less obvious is that other moving structures produce the energy required to sustain these vibrations. In many species, the respiratory system moves to generate the expiratory flow which powers the sound-constituting movements (sound-powering movements). The sound may acquire additional structure via upper tract movements, such as articulatory movements or head raising (sound-filtering movements). Some movements are not necessary for sound production, but when produced, impinge on the sound-producing process due to weak biomechanical coupling with body parts (e.g., respiratory system) that are necessary for sound production (sound-impinging movements). Animals also produce sounds contingent with movement, requiring neuro-physiological control regimes that allow movements to be flexibly coupled either to a produced sound or to a perceived external sound (sound-contingent movement). Here, we compare and classify the variety of ways sound and movements are coupled in animal communication; our proposed framework should help structure previous and future studies on this topic.
  • Brown, A. R., Pouw, W., Brentari, D., & Goldin-Meadow, S. (2021). People are less susceptible to illusion when they use their hands to communicate rather than estimate. Psychological Science, 32, 1227-1237. doi:10.1177/0956797621991552.

    Abstract

    When we use our hands to estimate the length of a stick in the Müller-Lyer illusion, we are highly susceptible to the illusion. But when we prepare to act on sticks under the same conditions, we are significantly less susceptible. Here, we asked whether people are susceptible to illusion when they use their hands not to act on objects but to describe them in spontaneous co-speech gestures or conventional sign languages of the deaf. Thirty-two English speakers and 13 American Sign Language signers used their hands to act on, estimate the length of, and describe sticks eliciting the Müller-Lyer illusion. For both gesture and sign, the magnitude of illusion in the description task was smaller than the magnitude of illusion in the estimation task and not different from the magnitude of illusion in the action task. The mechanisms responsible for producing gesture in speech and sign thus appear to operate not on percepts involved in estimation but on percepts derived from the way we act on objects.

    Additional information

    supplementary material
    data via OSF
  • Pouw, W., Dingemanse, M., Motamedi, Y., & Ozyurek, A. (2021). A systematic investigation of gesture kinematics in evolving manual languages in the lab. Cognitive Science, 45(7): e13014. doi:10.1111/cogs.13014.

    Abstract

    Silent gestures consist of complex multi-articulatory movements but are now primarily studied through categorical coding of the referential gesture content. The relation of categorical linguistic content with continuous kinematics is therefore poorly understood. Here, we reanalyzed the video data from a gestural evolution experiment (Motamedi, Schouwstra, Smith, Culbertson, & Kirby, 2019), which showed increases in the systematicity of gesture content over time. We applied computer vision techniques to quantify the kinematics of the original data. Our kinematic analyses demonstrated that gestures become more efficient and less complex in their kinematics over generations of learners. We further detect the systematicity of gesture form on the level of the gesture kinematic interrelations, which directly scales with the systematicity obtained on semantic coding of the gestures. Thus, from continuous kinematics alone, we can tap into linguistic aspects that were previously only approachable through categorical coding of meaning. Finally, going beyond issues of systematicity, we show how unique gesture kinematic dialects emerged over generations as isolated chains of participants gradually diverged over iterations from other chains. We thereby conclude that gestures can come to embody the linguistic system at the level of interrelationships between communicative tokens, which should calibrate our theories about form and linguistic content.
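
    Illustrative code sketch

    The abstract above relies on quantifying continuous gesture kinematics from video. The sketch below shows one generic way to derive speed and acceleration profiles from a tracked keypoint trajectory (e.g., a wrist); it is not the study's pipeline, and the frame rate and smoothing settings are placeholder assumptions.

    import numpy as np
    from scipy.signal import savgol_filter

    def kinematic_profile(xy, fps=25.0):
        """xy: (n_frames, 2) pixel coordinates of one tracked keypoint."""
        xy = savgol_filter(xy, window_length=9, polyorder=2, axis=0)  # smooth tracking jitter
        vel = np.gradient(xy, 1.0 / fps, axis=0)     # velocity per dimension (px/s)
        speed = np.linalg.norm(vel, axis=1)          # scalar speed per frame
        accel = np.gradient(speed, 1.0 / fps)        # rate of change of speed (px/s^2)
        return speed, accel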
  • Pouw, W., Wit, J., Bögels, S., Rasenberg, M., Milivojevic, B., & Ozyurek, A. (2021). Semantically related gestures move alike: Towards a distributional semantics of gesture kinematics. In V. G. Duffy (Ed.), Digital human modeling and applications in health, safety, ergonomics and risk management. Human body, motion and behavior: 12th International Conference, DHM 2021, Held as Part of the 23rd HCI International Conference, HCII 2021 (pp. 269-287). Berlin: Springer. doi:10.1007/978-3-030-77817-0_20.
  • Pouw, W., Proksch, S., Drijvers, L., Gamba, M., Holler, J., Kello, C., Schaefer, R. S., & Wiggins, G. A. (2021). Multilevel rhythms in multimodal communication. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 376: 20200334. doi:10.1098/rstb.2020.0334.

    Abstract

    It is now widely accepted that the brunt of animal communication is conducted via several modalities, e.g. acoustic and visual, either simultaneously or sequentially. This is a laudable multimodal turn relative to traditional accounts of temporal aspects of animal communication which have focused on a single modality at a time. However, the fields that are currently contributing to the study of multimodal communication are highly varied, and still largely disconnected given their sole focus on a particular level of description or their particular concern with human or non-human animals. Here, we provide an integrative overview of converging findings that show how multimodal processes occurring at neural, bodily, as well as social interactional levels each contribute uniquely to the complex rhythms that characterize communication in human and non-human animals. Though we address findings for each of these levels independently, we conclude that the most important challenge in this field is to identify how processes at these different levels connect.
  • Pouw, W., De Jonge-Hoekstra, L., Harrison, S. J., Paxton, A., & Dixon, J. A. (2021). Gesture-speech physics in fluent speech and rhythmic upper limb movements. Annals of the New York Academy of Sciences, 1491(1), 89-105. doi:10.1111/nyas.14532.

    Abstract

    Communicative hand gestures are often coordinated with prosodic aspects of speech, and salient moments of gestural movement (e.g., quick changes in speed) often co-occur with salient moments in speech (e.g., near peaks in fundamental frequency and intensity). A common understanding is that such gesture and speech coordination is culturally and cognitively acquired, rather than having a biological basis. Recently, however, the biomechanical physical coupling of arm movements to speech movements has been identified as a potentially important factor in understanding the emergence of gesture-speech coordination. Specifically, in the case of steady-state vocalization and mono-syllable utterances, forces produced during gesturing are transferred onto the tensioned body, leading to changes in respiratory-related activity and thereby affecting vocalization F0 and intensity. In the current experiment (N = 37), we extend this previous line of work to show that gesture-speech physics impacts fluent speech, too. Compared with non-movement, participants who are producing fluent self-formulated speech, while rhythmically moving their limbs, demonstrate heightened F0 and amplitude envelope, and such effects are more pronounced for higher-impulse arm versus lower-impulse wrist movement. We replicate that acoustic peaks arise especially during moments of peak impulse (i.e., the beat) of the movement, namely around deceleration phases of the movement. Finally, higher deceleration rates of higher-mass arm movements were related to higher peaks in acoustics. These results confirm a role for physical impulses of gesture affecting the speech system. We discuss the implications of gesture-speech physics for understanding the emergence of communicative gesture, both ontogenetically and phylogenetically.

    Additional information

    data and analyses
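
    Illustrative code sketch

    The abstract above relates acoustic peaks to deceleration phases of limb movement. The sketch below shows a generic way to extract a smoothed speech amplitude envelope and the times of peak deceleration so the two can be aligned; it is not the paper's analysis pipeline, and the sampling rates, filter cutoff, and peak criterion are placeholder assumptions.

    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, find_peaks

    def amplitude_envelope(audio, sr, cutoff_hz=8.0):
        env = np.abs(hilbert(audio))                          # analytic amplitude of the waveform
        b, a = butter(2, cutoff_hz / (sr / 2), btype="low")   # low-pass to smooth the envelope
        return filtfilt(b, a, env)

    def deceleration_peak_times(speed, fps=100.0):
        accel = np.gradient(speed, 1.0 / fps)
        peaks, _ = find_peaks(-accel)                         # local maxima of deceleration
        return peaks / fps                                    # times in seconds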
