Publications

  • Burchardt, L., Van de Sande, Y., Kehy, M., Gamba, M., Ravignani, A., & Pouw, W. (2024). A toolkit for the dynamic study of air sacs in siamang and other elastic circular structures. PLOS Computational Biology, 20(6): e1012222. doi:10.1371/journal.pcbi.1012222.

    Abstract

    Biological structures are defined by rigid elements, such as bones, and elastic elements, like muscles and membranes. Computer vision advances have enabled automatic tracking of moving animal skeletal poses. Such developments provide insights into complex time-varying dynamics of biological motion. Conversely, the elastic soft-tissues of organisms, like the nose of elephant seals, or the buccal sac of frogs, are poorly studied and no computer vision methods have been proposed. This leaves major gaps in different areas of biology. In primatology, most critically, the function of air sacs is widely debated; many open questions on the role of air sacs in the evolution of animal communication, including human speech, remain unanswered. To support the dynamic study of soft-tissue structures, we present a toolkit for the automated tracking of semi-circular elastic structures in biological video data. The toolkit contains unsupervised computer vision tools (using Hough transform) and supervised deep learning (by adapting DeepLabCut) methodology to track inflation of laryngeal air sacs or other biological spherical objects (e.g., gular cavities). Confirming the value of elastic kinematic analysis, we show that air sac inflation correlates with acoustic markers that likely inform about body size. Finally, we present a pre-processed audiovisual-kinematic dataset of 7+ hours of closeup audiovisual recordings of siamang (Symphalangus syndactylus) singing. This toolkit (https://github.com/WimPouw/AirSacTracker) aims to revitalize the study of non-skeletal morphological structures across multiple species.
  • Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).

    Abstract

    Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.

  • Ghaleb, E., Khaertdinov, B., Pouw, W., Rasenberg, M., Holler, J., Özyürek, A., & Fernandez, R. (2024). Learning co-speech gesture representations in dialogue through contrastive learning: An intrinsic evaluation. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024) (pp. 274-283).

    Abstract

    In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gestures representations considering gestures’ variability and relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with human-annotated pairwise gesture similarity. Moreover, we perform a diagnostic probing analysis to assess the possibility of recovering interpretable gesture features from the learned representations. Our results show a significant positive correlation with human-annotated gesture similarity and reveal that the similarity between the learned representations is consistent with well-motivated patterns related to the dynamics of dialogue interaction. Moreover, our findings demonstrate that several features concerning the form of gestures can be recovered from the latent representations. Overall, this study shows that multimodal contrastive learning is a promising approach for learning gesture representations, which opens the door to using such representations in larger-scale gesture analysis studies.
  • Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Özyürek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).

    Abstract

    Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
  • Leonetti, S., Ravignani, A., & Pouw, W. (2024). A cross-species framework for classifying sound-movement couplings. Neuroscience and Biobehavioral Reviews, 167: 105911. doi:10.1016/j.neubiorev.2024.105911.

    Abstract

    Sound and movement are entangled in animal communication. This is obviously true in the case of sound-constituting vibratory movements of biological structures which generate acoustic waves. A little less obvious is that other moving structures produce the energy required to sustain these vibrations. In many species, the respiratory system moves to generate the expiratory flow which powers the sound-constituting movements (sound-powering movements). The sound may acquire additional structure via upper tract movements, such as articulatory movements or head raising (sound-filtering movements). Some movements are not necessary for sound production, but when produced, impinge on the sound-producing process due to weak biomechanical coupling with body parts (e.g., respiratory system) that are necessary for sound production (sound-impinging movements). Animals also produce sounds contingent with movement, requiring neuro-physiological control regimes allowing to flexibly couple movements to a produced sound, or coupling movements to a perceived external sound (sound-contingent movement). Here, we compare and classify the variety of ways sound and movements are coupled in animal communication; our proposed framework should help structure previous and future studies on this topic.
  • Pouw, W., Van Gog, T., Zwaan, R. A., & Paas, F. (2016). Augmenting instructional animations with a body analogy to help children learn about physical systems. Frontiers in Psychology, 7: 860. doi:10.3389/fpsyg.2016.00860.

    Abstract

    We investigated whether augmenting instructional animations with a body analogy (BA) would improve 10- to 13-year-old children’s learning about class-1 levers. Children with a lower level of general math skill who learned with an instructional animation that provided a BA of the physical system, showed higher accuracy on a lever problem-solving reaction time task than children studying the instructional animation without this BA. Additionally, learning with a BA led to a higher speed–accuracy trade-off during the transfer task for children with a lower math skill, which provided additional evidence that especially this group is likely to be affected by learning with a BA. However, overall accuracy and solving speed on the transfer task was not affected by learning with or without this BA. These results suggest that providing children with a BA during animation study provides a stepping-stone for understanding mechanical principles of a physical system, which may prove useful for instructional designers. Yet, because the BA does not seem effective for all children, nor for all tasks, the degree of effectiveness of body analogies should be studied further. Future research, we conclude, should be more sensitive to the necessary degree of analogous mapping between the body and physical systems, and whether this mapping is effective for reasoning about more complex instantiations of such physical systems.
  • Pouw, W., Eielts, C., Van Gog, T., Zwaan, R. A., & Paas, F. (2016). Does (non‐)meaningful sensori‐motor engagement promote learning with animated physical systems? Mind, Brain and Education, 10(2), 91-104. doi:10.1111/mbe.12105.

    Abstract

    Previous research indicates that sensori‐motor experience with physical systems can have a positive effect on learning. However, it is not clear whether this effect is caused by mere bodily engagement or the intrinsically meaningful information that such interaction affords in performing the learning task. We investigated (N = 74), through the use of a Wii Balance Board, whether different forms of physical engagement that was either meaningfully, non‐meaningfully, or minimally related to the learning content would be beneficial (or detrimental) to learning about the workings of seesaws from instructional animations. The results were inconclusive, indicating that motoric competency on lever problem solving did not significantly differ between conditions, nor were response speed and transfer performance affected. These findings suggest that adult's implicit and explicit knowledge about physical systems is stable and not easily affected by (contradictory) sensori‐motor experiences. Implications for embodied learning are discussed.
  • Pouw, W., & Hostetter, A. B. (2016). Gesture as predictive action. Reti, Saperi, Linguaggi: Italian Journal of Cognitive Sciences, 3, 57-80. doi:10.12832/83918.

    Abstract

    Two broad approaches have dominated the literature on the production of speech-accompanying gestures. On the one hand, there are approaches that aim to explain the origin of gestures by specifying the mental processes that give rise to them. On the other, there are approaches that aim to explain the cognitive function that gestures have for the gesturer or the listener. In the present paper we aim to reconcile both approaches in one single perspective that is informed by a recent sea change in cognitive science, namely, Predictive Processing Perspectives (PPP; Clark 2013b; 2015). We start with the idea put forth by the Gesture as Simulated Action (GSA) framework (Hostetter, Alibali 2008). Under this view, the mental processes that give rise to gesture are re-enactments of sensori-motor experiences (i.e., simulated actions). We show that such anticipatory sensori-motor states and the constraints put forth by the GSA framework can be understood as top-down kinesthetic predictions that function in a broader predictive machinery as proposed by PPP. By establishing this alignment, we aim to show how gestures come to fulfill a genuine cognitive function above and beyond the mental processes that give rise to gesture.
  • Pouw, W., Mavilidi, M.-F., Van Gog, T., & Paas, F. (2016). Gesturing during mental problem solving reduces eye movements, especially for individuals with lower visual working memory capacity. Cognitive Processing, 17, 269-277. doi:10.1007/s10339-016-0757-6.

    Abstract

    Non-communicative hand gestures have been found to benefit problem-solving performance. These gestures seem to compensate for limited internal cognitive capacities, such as visual working memory capacity. Yet, it is not clear how gestures might perform this cognitive function. One hypothesis is that gesturing is a means to spatially index mental simulations, thereby reducing the need for visually projecting the mental simulation onto the visual presentation of the task. If that hypothesis is correct, less eye movements should be made when participants gesture during problem solving than when they do not gesture. We therefore used mobile eye tracking to investigate the effect of co-thought gesturing and visual working memory capacity on eye movements during mental solving of the Tower of Hanoi problem. Results revealed that gesturing indeed reduced the number of eye movements (lower saccade counts), especially for participants with a relatively lower visual working memory capacity. Subsequent problem-solving performance was not affected by having (not) gestured during the mental solving phase. The current findings suggest that our understanding of gestures in problem solving could be improved by taking into account eye movements during gesturing.
  • Van Wermeskerken, M., Fijan, N., Eielts, C., & Pouw, W. (2016). Observation of depictive versus tracing gestures selectively aids verbal versus visual–spatial learning in primary school children. Applied Cognitive Psychology, 30, 806-814. doi:10.1002/acp.3256.

    Abstract

    Previous research has established that gesture observation aids learning in children. The current study examined whether observation of gestures (i.e. depictive and tracing gestures) differentially affected verbal and visual–spatial retention when learning a route and its street names. Specifically, we explored whether children (n = 97) with lower visual and verbal working-memory capacity benefited more from observing gestures as compared with children who score higher on these traits. To this end, 11- to 13-year-old children were presented with an instructional video of a route containing no gestures, depictive gestures, tracing gestures or both depictive and tracing gestures. Results indicated that the type of observed gesture affected performance: Observing tracing gestures or both tracing and depictive gestures increased performance on route retention, while observing depictive gestures or both depictive and tracing gestures increased performance on street name retention. These effects were not differentially affected by working-memory capacity.
