Projects at the Multimodal Language department

 

Research lines
Multimodal Language Grammars and Typology (core)

What are the possible variations versus universals in language structures (e.g. in demonstrative systems, negation, or prosody) when multimodal diversity across signed and spoken languages, as well as (bimodal) bilingual data sets, is taken into account?

Multimodal Language Processing

What are the (neuro)cognitive and processing mechanisms of multimodal language use within and across spoken and signed languages?

Multimodal Interaction

How are language(s) in multiple modalities used to manage social and interactive coordination in dialogue or groups? How do interaction patterns and group dynamics shape multimodal language structures?

Multimodal Language Transmission

How does the multimodal nature of language allow or facilitate its transmission to new generations (L1 acquisition/L2 learning) and allow language to emerge anew when it is not accessible?

Multimodal Language Technology Innovation

How can we adapt new technologies, such as machine learning, virtual reality, avatars, and motion capture, to study visual features of language in situated face-to-face contexts and to process large multimodal language corpora?

Cluster Groups 

Collaborative groups focused on shared research topics and theoretical questions.

Prosody Cluster

Leader: Hatice Zora

Purpose: The group seeks to bring together researchers working on, or interested in, prosody across modalities, languages, and functions. Its objectives are to establish a shared foundation across research traditions, to identify and address central research challenges, and to explore strategies for advancing the field. The group will provide a forum for discussing experimental ideas, for applying a range of methodologies (from behavioral and computational approaches to brain imaging techniques), and for developing theoretical frameworks.

Main topics: In particular, the group will focus on:

  • Architecture of prosody: How prosodic cues interact structurally and operationally, with an emphasis on their multimodality and multifunctionality.
  • Neurobiology of prosody: Neural mechanisms underlying prosody, with an emphasis on developing neural network models of interaction.
Reference Cluster

Leader: Paula Rubio-Fernández

Purpose: The Multimodal Reference Cluster investigates the interdependence between language and social cognition by studying multimodal referential communication from four complementary approaches:

Reference production: How do speakers of different languages synchronize gaze, pointing, and speech in face-to-face referential communication? We address this question by comparing the use of demonstratives and other referential expressions in Turkish, Japanese, and Spanish. Our participants wear eye-tracking glasses to monitor their gaze coordination, while external cameras record their speech and pointing gestures.

Reference comprehension: How do listeners integrate the speaker’s gaze, pointing, and speech when they interpret a referential expression? To accurately measure listeners’ responses (including their looking behavior via eye-tracking), we immerse our participants in Virtual Reality, where they perform a referential communication task with a human-animated avatar. We are conducting the first experiment in Dutch at the MPI, but using mobile technology that will allow us to conduct the experiment in other languages during fieldwork.

Reference development: Infants break into language through pointing, and soon after start using demonstratives to establish joint attention with their caregivers. Despite the universality of these milestones, little is known about how children acquire the meaning of demonstratives across languages. To address this question, we investigate mother-infant interaction during naturalistic toy play, focusing on the mother’s use of demonstratives and definite articles (our baseline). We use head-mounted eye-tracking to monitor gaze coordination between mother and infant, and external cameras to track their object manipulation during reference.

Reference modelling: Recent advances in multimodal language models (MLMs) have enabled systems to use text and images so naturally that users often perceive them as real conversational partners. However, existing evaluations of MLMs have largely focused on their use of vocabulary and syntax, while overlooking a fundamental class of grammatical words: indexicals. We have recently completed the first study of humans’ and MLMs’ use of indexicals in simulated face-to-face referential communication. The results confirmed the predicted difficulty hierarchy (vocabulary < possessives < demonstratives) in both groups. However, the difference between content words and indexicals was larger in MLMs, suggesting limitations in perspective-taking and spatial reasoning.

The Multimodal Reference Cluster has monthly meetings, alternating between Update Meetings (where all members give updates on their ongoing work and get feedback from others) and a Journal Club (where one of the junior members leads the discussion of a published paper on multimodal reference).

Focus Groups

Hands-on discussion groups focused on methods, tools, and analysis pipelines.

Concepts

Leaders: Ezgi Mamus & Marius Peelen

This focus group is a new initiative of the MPI and Donders Institute, aimed at bringing people together across centers and themes around a topic of shared interest: concepts. Concepts play an important role in multiple fields of cognitive neuroscience, including language (e.g., linguistic concepts), perception (e.g., object categories), action (e.g., embodied cognition), memory, neuropsychology (e.g., apraxia), and lifespan development (e.g., acquisition of concepts, semantic dementia). Many people at the MPI and Donders Institute share an interest in these concepts, and we believe it would be fruitful to bring them together to discuss core questions, new results, and competing theories, potentially leading to new interdisciplinary collaborations. 
 

Bayesian/Statistics 

Leader: Sho Akamine

Purpose: The aim of the statistics focus group is to enhance our understanding of statistics (e.g., mixed-effects regression models) and to cultivate the critical thinking necessary for accurately interpreting statistical analysis outcomes. Our primary focus is on Bayesian inference because: (i) it is gaining popularity, (ii) the knowledge gained can be applied to the traditional frequentist approach, and (iii) Bayesian regressions demand a solid understanding of statistics, which can be overlooked in the frequentist approach, leading to potential misuse and misinterpretation of statistical models.

Main topics: (Generalized) linear mixed-effects regression, Bayesian statistics, causal inference

Multimodal Modeling

Leaders: Esam Ghaleb, Lois Dona

Purpose: 

Gesture/Sign-Kinematics 

Leaders: Sharice Clough and Mounika Kanakanti

Purpose: To train researchers in the skills and knowledge needed to conduct kinematic analyses of gesture and sign language behavior using video-based motion tracking.

Main Topics: The MLD Kinematics Focus Group gives researchers hands-on experience with coding modules covering topics such as:

  • how to extract video-based motion tracking data (e.g., MediaPipe data),
  • how to preprocess the data (e.g., smoothing and normalization),
  • how to merge motion-tracking data with other time series data (e.g., ELAN annotations, eye-tracking data), and
  • how to perform calculations to quantify the spatiotemporal dynamics of a movement signal (e.g., velocity, submovements, holds, vertical amplitude, size, volume). 

The focus group also provides opportunities for researchers to present their own kinematic analyses for code review and to share new research findings. The group is highly interactive, with lots of discussion about novel applications of kinematic analyses for gesture and sign language research, as well as limitations and practical considerations for applying video-based motion tracking methods to existing and new datasets. 

 

MPI GitHub 

This GitHub organization hosts the code supporting our research on how visible bodily signals (hands, face, body posture) coordinate with speech to form multimodal language. Our work spans corpora, experimental design, computational modeling, and kinematic/gesture analysis. On this GitHub landing page, we host repositories containing scripts for various multimodal analyses. Portions of many of these scripts are sourced and adapted from EnvisionBox (https://envisionbox.org), which maintains a library of coding modules for open exchange among researchers.

https://github.com/Multimodal-Language-Department-MPI-NL

Extracting MediaPipe keypoints 

This module shows how to generate motion tracking data from videos. It uses Google’s MediaPipe library to extract human pose landmarks across all video frames. 

https://github.com/Multimodal-Language-Department-MPI-NL/MediaPipe_keypoints_extraction
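A minimal sketch of this kind of extraction pipeline is shown below. It assumes the `mediapipe` and `opencv-python` packages; the function names and CSV layout are illustrative, not the repository's own.

```python
# Sketch of video-based keypoint extraction with MediaPipe Pose.
# Assumes: pip install mediapipe opencv-python. Names are illustrative.
import csv

def landmarks_to_row(frame_idx, landmarks):
    """Flatten one frame's pose landmarks into a flat CSV row:
    frame index followed by x, y, z, visibility per landmark."""
    row = [frame_idx]
    for lm in landmarks:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row

def extract_keypoints(video_path, csv_path):
    """Run MediaPipe Pose over every frame of a video and save a CSV."""
    import cv2                # imported here so landmarks_to_row stays
    import mediapipe as mp    # usable without these libraries installed
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose() as pose, open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads BGR
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                writer.writerow(
                    landmarks_to_row(frame_idx, result.pose_landmarks.landmark))
            frame_idx += 1
    cap.release()
```

The resulting CSV (one row per tracked frame) is the typical input for the smoothing and normalization steps described below.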

 

 

Smoothing

This module shows how to smooth motion tracking data to handle noise due to tracking inaccuracy and how to interpolate missing data.  

https://github.com/Multimodal-Language-Department-MPI-NL/Smoothing
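A common recipe for this step, sketched below under the assumption that keypoints live in a pandas DataFrame, is to interpolate missing samples and then apply a Savitzky-Golay filter; the column names and window settings here are assumptions, not the module's own.

```python
# Sketch: interpolate tracking gaps, then smooth with a Savitzky-Golay
# filter, which reduces jitter while preserving the shape of speed peaks.
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

def smooth_keypoints(df, cols, window=11, poly=3):
    """Fill NaN gaps by linear interpolation, then smooth each column."""
    out = df.copy()
    for col in cols:
        filled = out[col].interpolate(limit_direction="both")  # fill gaps
        out[col] = savgol_filter(filled.to_numpy(), window, poly)
    return out

# Example: a noisy x-coordinate with one dropped frame (NaN)
df = pd.DataFrame({"x": [0.0, 0.1, np.nan, 0.3, 0.42, 0.5,
                         0.61, 0.7, 0.81, 0.9, 1.0]})
smoothed = smooth_keypoints(df, ["x"], window=5, poly=2)
```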

 

Normalization 

This module shows how to normalize the size and position of motion tracking data across files. Normalization ensures that the data for all files are on the same scale, so that movement trajectories can be compared across files with different resolutions, camera setups, and participant sizes.

https://github.com/Multimodal-Language-Department-MPI-NL/Normalization
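One common normalization scheme, sketched below, centers each frame on the shoulder midpoint and rescales by shoulder distance; the reference points and array layout are assumptions, and the module's actual scheme may differ.

```python
# Sketch: translate each frame so the shoulder midpoint is the origin,
# then divide by mean shoulder distance so scale is comparable across
# recordings. frames: array of shape (n_frames, n_keypoints, 2).
import numpy as np

def normalize_pose(frames, left_sh=0, right_sh=1):
    mid = (frames[:, left_sh] + frames[:, right_sh]) / 2.0
    centered = frames - mid[:, None, :]          # position normalization
    scale = np.linalg.norm(frames[:, left_sh] - frames[:, right_sh],
                           axis=1).mean()
    return centered / scale                      # size normalization

# One frame: shoulders at (100, 200) and (140, 200), wrist at (120, 300)
frames = np.array([[[100.0, 200.0], [140.0, 200.0], [120.0, 300.0]]])
norm = normalize_pose(frames)
```

After this step the shoulder midpoint sits at the origin in every frame, so trajectories from different cameras and participants occupy a comparable coordinate space.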

 

Merging Elan and MediaPipe

This module shows how to merge motion tracking data with annotations from ELAN (or other time series data). This allows you to perform kinematic analyses of gesture strokes or other manually-annotated units from ELAN.

https://github.com/Multimodal-Language-Department-MPI-NL/Merging_Motion_ELAN
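The core of this step can be sketched as an interval join in pandas: each tracking frame is labeled with the ELAN annotation whose time interval contains it. The file layouts and column names below are assumptions for illustration.

```python
# Sketch: label motion-tracking frames with ELAN annotation intervals.
import pandas as pd

# Motion-tracking samples: one row per video frame, timestamp in ms
tracking = pd.DataFrame({
    "time_ms": [0, 40, 80, 120, 160, 200],
    "wrist_x": [0.10, 0.12, 0.30, 0.55, 0.58, 0.20],
})

# An ELAN tier exported as begin/end times plus an annotation label
annotations = pd.DataFrame({
    "begin_ms": [40, 160],
    "end_ms": [130, 210],
    "label": ["stroke", "retraction"],
})

def merge_elan(tracking, annotations):
    """Attach to each frame the annotation whose interval contains it."""
    out = tracking.copy()
    out["label"] = None
    for _, ann in annotations.iterrows():
        mask = ((out["time_ms"] >= ann["begin_ms"])
                & (out["time_ms"] <= ann["end_ms"]))
        out.loc[mask, "label"] = ann["label"]
    return out

merged = merge_elan(tracking, annotations)
```

Once merged, kinematic measures can be computed per annotation label (e.g., only over frames tagged as gesture strokes).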

 

Speed, Acceleration, and Jerk

This module shows how to calculate movement speed. It also calculates acceleration (the rate of change of speed over time) and jerk (the rate of change of acceleration, which captures sudden movements), the first and second derivatives of speed.

https://github.com/Multimodal-Language-Department-MPI-NL/Speed_Acceleration_Jerk
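These derivatives reduce to successive numerical differentiation of the position trace; a minimal NumPy sketch follows, with the sampling rate as an assumed parameter.

```python
# Sketch: per-frame speed, acceleration, and jerk from 2D positions.
import numpy as np

def kinematics(xy, fps=25.0):
    """xy: array of shape (n_frames, 2). Speed is the magnitude of the
    frame-to-frame velocity; acceleration and jerk are its successive
    time derivatives."""
    dt = 1.0 / fps
    velocity = np.gradient(xy, dt, axis=0)    # (n, 2) velocity vectors
    speed = np.linalg.norm(velocity, axis=1)  # scalar speed per frame
    acceleration = np.gradient(speed, dt)
    jerk = np.gradient(acceleration, dt)
    return speed, acceleration, jerk

# Constant rightward motion of 1 unit/frame at 25 fps -> speed 25 units/s
xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
speed, acc, jerk = kinematics(xy)
```

In practice the position trace is smoothed first (see the Smoothing module), since differentiation amplifies tracking noise.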

 

Submovements and Holds

This module shows how to calculate the number of submovements of a movement signal based on peak speed and detect movement holds (i.e., pauses) below a certain speed threshold. These measures relate to how complex and/or segmented a movement signal is.

https://github.com/Multimodal-Language-Department-MPI-NL/Submovements_Holds
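The two measures can be sketched as peak detection on the speed profile plus run-length detection below a threshold; the thresholds and minimum hold duration below are illustrative, not the module's defaults.

```python
# Sketch: count submovements (speed peaks) and detect holds (sustained
# runs of near-zero speed) in a speed profile.
import numpy as np
from scipy.signal import find_peaks

def submovements_and_holds(speed, peak_height=0.5,
                           hold_threshold=0.1, min_hold_frames=2):
    """Return the number of speed peaks and a list of (start, end)
    frame indices of runs below hold_threshold lasting at least
    min_hold_frames."""
    peaks, _ = find_peaks(speed, height=peak_height)
    below = speed < hold_threshold
    holds, start = [], None
    for i, low in enumerate(list(below) + [False]):  # sentinel ends a run
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_hold_frames:
                holds.append((start, i - 1))
            start = None
    return len(peaks), holds

# Three speed peaks with one sustained hold in the middle
speed = np.array([0.0, 0.8, 0.2, 0.9, 0.05, 0.05, 0.05, 0.7, 0.0])
n_sub, holds = submovements_and_holds(speed)
```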

Gesture Space, Size, and Volume 

This module shows how to calculate maximum vertical amplitude (i.e., gesture height), characterize the location of gestures based on McNeillian space (McNeill, 1992), and calculate the 2D size or 3D volume of gesture (or sign) space. 

https://github.com/Multimodal-Language-Department-MPI-NL/Gesture_Space_Size_and_Volume
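The amplitude and size/volume measures reduce to range and bounding-box computations over the trajectory; McNeillian space coding additionally needs body-relative sectors, so only the simpler measures are sketched below (coordinates are assumed to have y increasing upward).

```python
# Sketch: simple spatial measures over a wrist trajectory.
import numpy as np

def vertical_amplitude(xy):
    """Maximum height reached relative to the trajectory's lowest point."""
    return xy[:, 1].max() - xy[:, 1].min()

def gesture_size_2d(xy):
    """Area of the 2D bounding box enclosing the whole trajectory."""
    ranges = xy.max(axis=0) - xy.min(axis=0)
    return ranges[0] * ranges[1]

def gesture_volume_3d(xyz):
    """Volume of the 3D bounding box enclosing the trajectory."""
    ranges = xyz.max(axis=0) - xyz.min(axis=0)
    return float(np.prod(ranges))

xy = np.array([[0.0, 0.0], [2.0, 1.0], [1.0, 3.0]])
```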

 

Heatmap Visualization 

This module shows how to generate a scatterplot density heatmap depicting the location of participants’ wrist keypoints during a given movement signal. This visualization shows how participants use space during sign or gesture production.

https://github.com/Multimodal-Language-Department-MPI-NL/Heatmap
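The underlying density computation can be sketched with a 2D histogram; the grid below is computed with NumPy, with rendering (e.g., matplotlib's `imshow`) left as a comment, and the bin count and coordinate range are assumptions.

```python
# Sketch: bin wrist keypoint coordinates into a 2D occupancy grid whose
# cell counts can be rendered as a heatmap.
import numpy as np

def wrist_density(x, y, bins=20):
    """Return a (bins x bins) grid of keypoint counts over a unit square,
    assuming coordinates are already normalized to [0, 1]."""
    grid, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
    return grid

# A small cluster near (0.25, 0.5) and one outlying point
x = np.array([0.25, 0.26, 0.24, 0.75])
y = np.array([0.50, 0.51, 0.49, 0.50])
grid = wrist_density(x, y, bins=4)
# To visualize: import matplotlib.pyplot as plt
#               plt.imshow(grid.T, origin="lower", cmap="hot")
```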

 

 

Reading Groups

Groups for collective exploration of current literature and theoretical developments.

Theory of Multimodal Language 

Leaders: Ercenur Unal & Neil Cohn

For over a century, language has been considered an amodal capacity, flowing into different modalities while maintaining speech as its primary modality. However, research over the past half-century has revealed problems with this speech-centric, amodal conception of language, particularly given the pervasiveness of multimodal communication. In these meetings, we discuss readings that challenge the predominant amodal paradigm of language and propose alternative, multimodal theoretical frameworks of language.

We discuss issues like: 

  • What is language, particularly in relation to other behaviors like gesture, drawing, or music? What is a modality?
  • Where do iconicity, indexicality, and symbolicity fit within the language architecture?
  • How do we characterize the varying complexity of grammars and their interactions both within modalities (like in bilingual codeswitching) and across modalities (like in multimodality)?
  • How does multimodality change conceptions of linguistic universals, evolution, or relativity? 

Altogether, this discussion aims to provide a theoretical foundation for reconfiguring the grounding principles of the language sciences for a Multimodal Paradigm.

 
