  • Drijvers, L., Small, S. L., & Skipper, J. I. (2025). Language is widely distributed throughout the brain. Nature Reviews Neuroscience, 26: 189. doi:10.1038/s41583-024-00903-0.
  • Emmendorfer, A. K., & Holler, J. (2025). Facial signals shape predictions about the nature of upcoming conversational responses. Scientific Reports, 15: 1381. doi:10.1038/s41598-025-85192-y.

    Abstract

    Increasing evidence suggests that interlocutors use visual communicative signals to form predictions about unfolding utterances, but there is little data on the predictive potential of facial signals in conversation. In an online experiment with virtual agents, we examine whether facial signals produced by an addressee may allow speakers to anticipate the response to a question before it is given. Participants (n = 80) viewed videos of short conversation fragments between two virtual humans. Each fragment ended with the Questioner asking a question, followed by a pause during which the Responder looked either straight at the Questioner (baseline), or averted their gaze, or accompanied the straight gaze with one of the following facial signals: brow raise, brow frown, nose wrinkle, smile, squint, mouth corner pulled back (dimpler). Participants then indicated on a 6-point scale whether they expected a “yes” or “no” response. Analyses revealed that all signals received different ratings relative to the baseline: brow raises, dimplers, and smiles were associated with more positive responses, gaze aversions, brow frowns, nose wrinkles, and squints with more negative responses. Our findings show that interlocutors may form strong associations between facial signals and upcoming responses to questions, highlighting their predictive potential in face-to-face conversation.

    Additional information

    supplementary materials
  • Esmer, Ş. C., Turan, E., Karadöller, D. Z., & Göksun, T. (2025). Sources of variation in preschoolers’ relational reasoning: The interaction between language use and working memory. Journal of Experimental Child Psychology, 252: 106149. doi:10.1016/j.jecp.2024.106149.

    Abstract

    Previous research has suggested the importance of relational language and working memory in children’s relational reasoning. The tendency to use language (e.g., using more relational than object-focused language, prioritizing focal objects over background in linguistic descriptions) could reflect children’s biases toward the relational versus object-based solutions in a relational match-to-sample (RMTS) task. In the lack of any apparent object match as a foil option, object-focused children might rely on other cognitive mechanisms (i.e., working memory) to choose a relational match in the RMTS task. The current study examined the interactive roles of language- and working memory-related sources of variation in Turkish-learning preschoolers’ relational reasoning. We collected data from 4- and 5-year-olds (N = 41) via Zoom in the RMTS task, a scene description task, and a backward word span task. Generalized binomial mixed effects models revealed that children who used more relational language and background-focused scene descriptions performed better in the relational reasoning task. Furthermore, children with less frequent relational language use and focal object descriptions of the scenes benefited more from working memory to succeed in the relational reasoning task. These results suggest additional working memory demands for object-focused children to choose relational matches in the RMTS task, highlighting the importance of examining the interactive effects of different cognitive mechanisms on relational reasoning.

    Additional information

    supplementary material
  • Göksun, T., Aktan-Erciyes, A., Karadöller, D. Z., & Demir-Lira, Ö. E. (2025). Multifaceted nature of early vocabulary development: Connecting child characteristics with parental input types. Child Development Perspectives, 19(1), 30-37. doi:10.1111/cdep.12524.

    Abstract

    Children need to learn the demands of their native language in the early vocabulary development phase. In this dynamic process, parental multimodal input may shape neurodevelopmental trajectories while also being tailored by child-related factors. Moving beyond typically characterized group profiles, in this article, we synthesize growing evidence on the effects of parental multimodal input (amount, quality, or absence), domain-specific input (space and math), and language-specific input (causal verbs and sound symbols) on preterm, full-term, and deaf children's early vocabulary development, focusing primarily on research with children learning Turkish and Turkish Sign Language. We advocate for a theoretical perspective, integrating neonatal characteristics and parental input, and acknowledging the unique constraints of languages.
  • Karadöller, D. Z., Demir-Lira, Ö. E., & Göksun, T. (2025). Full-term children with lower vocabulary scores receive more multimodal math input than preterm children. Journal of Cognition and Development. Advance online publication. doi:10.1080/15248372.2025.2470245.

    Abstract

    One of the earliest sources of mathematical input arises in dyadic parent–child interactions. However, the emphasis has been on parental input only in speech and how input varies across different environmental and child-specific factors remains largely unexplored. Here, we investigated the relationship among parental math input modality and type, children’s gestational status (being preterm vs. full-term born), and vocabulary development. Using book-reading as a medium for parental math input in dyadic interaction, we coded specific math input elicited by Turkish-speaking parents and their 26-month-old children (N = 58, 24 preterms) for speech-only and multimodal (speech and gestures combined) input. Results showed that multimodal math input, as opposed to speech-only math input, was uniquely associated with gestational status, expressive vocabulary, and the interaction between the two. Full-term children with lower expressive vocabulary scores received more multimodal input compared to their preterm peers. However, there was no association between expressive vocabulary and multimodal math input for preterm children. Moreover, cardinality was the most frequent type for both speech-only and multimodal input. These findings suggest that the specific type of multimodal math input can be produced as a function of children’s gestational status and vocabulary development.
  • Lokhesh, N. N., Swaminathan, K., Shravan, G., Menon, D., Mishra, S., Nandanwar, A., & Mishra, C. (2025). Welcome to the library: Integrating social robots in Indian libraries. In O. Palinko, L. Bodenhagen, J.-J. Cabibihan, K. Fischer, S. Šabanović, K. Winkle, L. Behera, S. S. Ge, D. Chrysostomou, W. Jiang, & H. He (Eds.), Social Robotics: 16th International Conference, ICSR + AI 2024, Odense, Denmark, October 23–26, 2024, Proceedings (pp. 239-246). Singapore: Springer. doi:10.1007/978-981-96-3525-2_20.

    Abstract

    Libraries are very often considered the hallway to developing knowledge. However, the lack of adequate staff within Indian libraries makes catering to the visitors’ needs difficult. Previous systems that have sought to address libraries’ needs through automation have mostly been limited to storage and fetching aspects while lacking in their interaction aspect. We propose to address this issue by incorporating social robots within Indian libraries that can communicate and address the visitors’ queries in a multi-modal fashion attempting to make the experience more natural and appealing while helping reduce the burden on the librarians. In this paper, we propose and deploy a Furhat robot as a robot librarian by programming it on certain core librarian functionalities. We evaluate our system with a physical robot librarian (N = 26). The results show that the robot librarian was found to be very informative and overall left with a positive impression and preference.
  • Mishra, C., Skantze, G., Hagoort, P., & Verdonschot, R. G. (2025). Perception of emotions in human and robot faces: Is the eye region enough? In O. Palinko, L. Bodenhagen, J.-J. Cabibihan, K. Fischer, S. Šabanović, K. Winkle, L. Behera, S. S. Ge, D. Chrysostomou, W. Jiang, & H. He (Eds.), Social Robotics: 16th International Conference, ICSR + AI 2024, Odense, Denmark, October 23–26, 2024, Proceedings (pp. 290-303). Singapore: Springer.

    Abstract

    The increased interest in developing next-gen social robots has raised questions about the factors affecting the perception of robot emotions. This study investigates the impact of robot appearances (human-like, mechanical) and face regions (full-face, eye-region) on human perception of robot emotions. A between-subjects user study (N = 305) was conducted where participants were asked to identify the emotions being displayed in videos of robot faces, as well as a human baseline. Our findings reveal three important insights for effective social robot face design in Human-Robot Interaction (HRI): Firstly, robots equipped with a back-projected, fully animated face – regardless of whether they are more human-like or more mechanical-looking – demonstrate a capacity for emotional expression comparable to that of humans. Secondly, the recognition accuracy of emotional expressions in both humans and robots declines when only the eye region is visible. Lastly, within the constraint of only the eye region being visible, robots with more human-like features significantly enhance emotion recognition.
  • Özer, D., Özyürek, A., & Göksun, T. (2025). Spatial working memory is critical for gesture processing: Evidence from gestures with varying semantic links to speech. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-025-02642-4.

    Abstract

    Gestures express redundant or complementary information to speech they accompany by depicting visual and spatial features of referents. In doing so, they recruit both spatial and verbal cognitive resources that underpin the processing of visual semantic information and its integration with speech. The relation between spatial and verbal skills and gesture comprehension, where gestures may serve different roles in relation to speech, is yet to be explored. This study examined the role of spatial and verbal skills in processing gestures that expressed redundant or complementary information to speech during the comprehension of spatial relations between objects. Turkish-speaking adults (N=74) watched videos describing the spatial location of objects that involved perspective-taking (left-right) or not (on-under) with speech and gesture. Gestures either conveyed redundant information to speech (e.g., saying and gesturing “left”) or complemented the accompanying demonstrative in speech (e.g., saying “here,” gesturing “left”). We also measured participants’ spatial (the Corsi block span and the mental rotation tasks) and verbal skills (the digit span task). Our results revealed nuanced interactions between these skills and spatial language comprehension, depending on the modality in which the information was expressed. One insight emerged prominently. Spatial skills, particularly spatial working memory capacity, were related to enhanced comprehension of visual semantic information conveyed through gestures especially when this information was not present in the accompanying speech. This study highlights the critical role of spatial working memory in gesture processing and underscores the importance of examining the interplay among cognitive and contextual factors to understand the complex dynamics of multimodal language.

    Additional information

    supplementary file data via OSF
  • Rubio-Fernandez, P. (2025). First acquiring articles in a second language: A new approach to the study of language and social cognition. Lingua, 313: 103851. doi:10.1016/j.lingua.2024.103851.

    Abstract

    Pragmatic phenomena are characterized by extreme variability, which makes it difficult to draw sound generalizations about the role of social cognition in pragmatic language by and large. I introduce cultural evolutionary pragmatics as a new framework for the study of the interdependence between language and social cognition, and point at the study of common-ground management across languages and ages as a way to test the reliance of pragmatic language on social cognition. I illustrate this new research line with three experiments on article use by second language speakers, whose mother tongue lacks articles. These L2 speakers are known to find article use challenging and it is often argued that their difficulties stem from articles being pragmatically redundant. Contrary to this view, the results of this exploratory study support the view that proficient article use requires automatizing basic socio-cognitive processes, offering a window into the interdependence between language and social cognition.
  • Rubio-Fernandez, P., Berke, M. D., & Jara-Ettinger, J. (2025). Tracking minds in communication. Trends in Cognitive Sciences, 29(3), 269-281. doi:10.1016/j.tics.2024.11.005.

    Abstract

    How might social cognition help us communicate through language? At what levels does this interaction occur? In classical views, social cognition is independent of language, and integrating the two can be slow, effortful, and error-prone. But new research into word level processes reveals that communication is brimming with social micro-processes that happen in real time, guiding even the simplest choices like how we use adjectives, articles, and demonstratives. We interpret these findings in the context of advances in theoretical models of social cognition and propose a Communicative Mind-Tracking framework, where social micro-processes aren’t a secondary process in how we use language—they are fundamental to how communication works.
  • Soberanes, M., Pérez-Ramírez, C. A., & Assaneo, M. F. (2025). Insights into the effect of general attentional state, coarticulation, and primed speech rate in phoneme production time. Journal of Speech, Language, and Hearing Research, 68(4), 1773-1783. doi:10.1044/2025_JSLHR-24-00595.

    Abstract

    Purpose:
    This study aimed to identify how a set of predefined factors modulates phoneme articulation time within a speaker.
    Method:
    We used a custom in-lab system that records lip muscle activity through electromyography signals, aligned with the produced speech, to measure phoneme articulation time. Twenty Spanish-speaking participants (12 females) were evaluated while producing sequences of a consonant–vowel syllable, with each sequence consisting of repeated articulations of either /pa/ or /pu/. Before starting the sequences, participants underwent a priming step with either a fast or slow speech rate. Additionally, the general attentional state level was assessed at the beginning, middle, and end of the protocol. To analyze the variability in the duration of /p/ and vowel articulation, we fitted individual linear mixed-models considering three factors: general attentional state level, priming rate, and coarticulation effects (for /p/, i.e., followed by /a/ or /u/) or phoneme identity (for vowels, i.e., being /a/ or /u/).
    Results:
    We found that the level of general attentional state positively correlated with production time for both the consonant /p/ and the vowels. Additionally, /p/ production was influenced by the nature of the following vowel (i.e., coarticulation effects), while vowel production time was affected by the primed speech rate.
    Conclusions:
    Phoneme duration appears to be influenced by both stable, speaker-specific characteristics (idiosyncratic traits) and internal, state-dependent factors related to the speaker's condition at the time of speech production. While some factors affect both consonants and vowels, others specifically modify only one of these types.

    Additional information

    supplemental material
  • Ter Bekke, M., Drijvers, L., & Holler, J. (2025). Co-speech hand gestures are used to predict upcoming meaning. Psychological Science, 36(4), 237-248. doi:10.1177/09567976251331041.

    Abstract

    In face-to-face conversation, people use speech and gesture to convey meaning. Seeing gestures alongside speech facilitates comprehenders’ language processing, but crucially, the mechanisms underlying this facilitation remain unclear. We investigated whether comprehenders use the semantic information in gestures, typically preceding related speech, to predict upcoming meaning. Dutch adults listened to questions asked by a virtual avatar. Questions were accompanied by an iconic gesture (e.g., typing) or meaningless control movement (e.g., arm scratch) followed by a short pause and target word (e.g., “type”). A Cloze experiment showed that gestures improved explicit predictions of upcoming target words. Moreover, an EEG experiment showed that gestures reduced alpha and beta power during the pause, indicating anticipation, and reduced N400 amplitudes, demonstrating facilitated semantic processing. Thus, comprehenders use iconic gestures to predict upcoming meaning. Theories of linguistic prediction should incorporate communicative bodily signals as predictive cues to capture how language is processed in face-to-face interaction.

    Additional information

    supplementary material
  • Tilston, O., Holler, J., & Bangerter, A. (2025). Opening social interactions: The coordination of approach, gaze, speech and handshakes during greetings. Cognitive Science, 49(2): e70049. doi:10.1111/cogs.70049.

    Abstract

    Despite the importance of greetings for opening social interactions, their multimodal coordination processes remain poorly understood. We used a naturalistic, lab-based setup where pairs of unacquainted participants approached and greeted each other while unaware their greeting behavior was studied. We measured the prevalence and time course of multimodal behaviors potentially culminating in a handshake, including motor behaviors (e.g., walking, standing up, hand movements like raise, grasp, and retraction), gaze patterns (using eye tracking glasses), and speech (close and distant verbal salutations). We further manipulated the visibility of partners’ eyes to test its effect on gaze. Our findings reveal that gaze to a partner's face increases over the course of a greeting, but is partly averted during approach and is influenced by the visibility of partners’ eyes. Gaze helps coordinate handshakes, by signaling intent and guiding the grasp. The timing of adjacency pairs in verbal salutations is comparable to the precision of floor transitions in the main body of conversations, and varies according to greeting phase, with distant salutation pair parts featuring more gaps and close salutation pair parts featuring more overlap. Gender composition and a range of multimodal behaviors affect whether pairs chose to shake hands or not. These findings fill several gaps in our understanding of greetings and provide avenues for future research, including advancements in social robotics and human−robot interaction.
  • Trujillo, J. P., Dyer, R. M. K., & Holler, J. (2025). Dyadic differences in empathy scores are associated with kinematic similarity during conversational question-answer pairs. Discourse Processes, 62(3), 195-213. doi:10.1080/0163853X.2025.2467605.

    Abstract

    During conversation, speakers coordinate and synergize their behaviors at multiple levels, and in different ways. The extent to which individuals converge or diverge in their behaviors during interaction may relate to interpersonal differences relevant to social interaction, such as empathy as measured by the empathy quotient (EQ). An association between interpersonal difference in empathy and interpersonal entrainment could help to throw light on how interlocutor characteristics influence interpersonal entrainment. We investigated this possibility in a corpus of unconstrained conversation between dyads. We used dynamic time warping to quantify entrainment between interlocutors of head motion, hand motion, and maximum speech f0 during question–response sequences. We additionally calculated interlocutor differences in EQ scores. We found that, for both head and hand motion, greater difference in EQ was associated with higher entrainment. Thus, we consider that people who are dissimilar in EQ may need to “ground” their interaction with low-level movement entrainment. There was no significant relationship between f0 entrainment and EQ score differences.
  • Trujillo, J. P., & Holler, J. (2025). Multimodal information density is highest in question beginnings, and early entropy is associated with fewer but longer visual signals. Discourse Processes, 62(2), 69-88. doi:10.1080/0163853X.2024.2413314.

    Abstract

    When engaged in spoken conversation, speakers convey meaning using both speech and visual signals, such as facial expressions and manual gestures. An important question is how information is distributed in utterances during face-to-face interaction when information from visual signals is also present. In a corpus of casual Dutch face-to-face conversations, we focus on spoken questions in particular because they occur frequently, thus constituting core building blocks of conversation. We quantified information density (i.e. lexical entropy and surprisal) and the number and relative duration of facial and manual signals. We tested whether lexical information density or the number of visual signals differed between the first and last halves of questions, as well as whether the number of visual signals occurring in the less-predictable portion of a question was associated with the lexical information density of the same portion of the question in a systematic manner. We found that information density, as well as number of visual signals, were higher in the first half of questions, and specifically lexical entropy was associated with fewer, but longer visual signals. The multimodal front-loading of questions and the complementary distribution of visual signals and high entropy words in Dutch casual face-to-face conversations may have implications for the parallel processes of utterance comprehension and response planning during turn-taking.

    Additional information

    supplemental material
  • Ünal, E., Kırbaşoğlu, K., Karadöller, D. Z., Sumer, B., & Özyürek, A. (2025). Gesture reduces mapping difficulties in the development of spatial language depending on the complexity of spatial relations. Cognitive Science, 49(2): e70046. doi:10.1111/cogs.70046.

    Abstract

    In spoken languages, children acquire locative terms in a cross-linguistically stable order. Terms similar in meaning to in and on emerge earlier than those similar to front and behind, followed by left and right. This order has been attributed to the complexity of the relations expressed by different locative terms. An additional possibility is that children may be delayed in expressing certain spatial meanings partly due to difficulties in discovering the mappings between locative terms in speech and spatial relation they express. We investigate cognitive and mapping difficulties in the domain of spatial language by comparing how children map spatial meanings onto speech versus visually motivated forms in co-speech gesture across different spatial relations. Twenty-four 8-year-old and 23 adult native Turkish-speakers described four-picture displays where the target picture depicted in-on, front-behind, or left-right relations between objects. As the complexity of spatial relations increased, children were more likely to rely on gestures as opposed to speech to informatively express the spatial relation. Adults overwhelmingly relied on speech to informatively express the spatial relation, and this did not change across the complexity of spatial relations. Nevertheless, even when spatial expressions in both speech and co-speech gesture were considered, children lagged behind adults when expressing the most complex left-right relations. These findings suggest that cognitive development and mapping difficulties introduced by the modality of expressions interact in shaping the development of spatial language.

    Additional information

    list of stimuli and descriptions
  • Yılmaz, B., Doğan, I., Karadöller, D. Z., Demir-Lira, Ö. E., & Göksun, T. (2025). Parental attitudes and beliefs about mathematics and the use of gestures in children’s math development. Cognitive Development, 73: 101531. doi:10.1016/j.cogdev.2024.101531.

    Abstract

    Children vary in mathematical skills even before formal schooling. The current study investigated how parental math beliefs, parents’ math anxiety, and children's spontaneous gestures contribute to preschool-aged children’s math performance. Sixty-three Turkish-reared children (33 girls, Mage = 49.9 months, SD = 3.68) were assessed on verbal counting, cardinality, and arithmetic tasks (nonverbal and verbal). Results showed that parental math beliefs were related to children’s verbal counting, cardinality and arithmetic scores. Children whose parents have higher math beliefs along with low math anxiety scored highest in the cardinality task. Children’s gesture use was also related to lower cardinality performance and the relation between parental math beliefs and children’s performance became stronger when child gestures were absent. These findings highlight the importance of parent and child-related contributors in explaining the variability in preschool-aged children’s math skills.

    Additional information

    supplementary material
  • Zora, H., Kabak, B., & Hagoort, P. (2025). Relevance of prosodic focus and lexical stress for discourse comprehension in Turkish: Evidence from psychometric and electrophysiological data. Journal of Cognitive Neuroscience, 37(3), 693-736. doi:10.1162/jocn_a_02262.

    Abstract

    Prosody underpins various linguistic domains ranging from semantics and syntax to discourse. For instance, prosodic information in the form of lexical stress modifies meanings and, as such, syntactic contexts of words as in Turkish kaz-má "pickaxe" (noun) versus káz-ma "do not dig" (imperative). Likewise, prosody indicates the focused constituent of an utterance as the noun phrase filling the wh-spot in a dialogue like What did you eat? I ate----. In the present study, we investigated the relevance of such prosodic variations for discourse comprehension in Turkish. We aimed at answering how lexical stress and prosodic focus mismatches on critical noun phrases-resulting in grammatical anomalies involving both semantics and syntax and discourse-level anomalies, respectively-affect the perceived correctness of an answer to a question in a given context. To that end, 80 native speakers of Turkish, 40 participating in a psychometric experiment and 40 participating in an EEG experiment, were asked to judge the acceptability of prosodic mismatches that occur either separately or concurrently. Psychometric results indicated that lexical stress mismatch led to a lower correctness score than prosodic focus mismatch, and combined mismatch received the lowest score. Consistent with the psychometric data, EEG results revealed an N400 effect to combined mismatch, and this effect was followed by a P600 response to lexical stress mismatch. Conjointly, these results suggest that every source of prosodic information is immediately available and codetermines the interpretation of an utterance; however, semantically and syntactically relevant lexical stress information is assigned more significance by the language comprehension system compared with prosodic focus information.
  • Coventry, K. R., Gudde, H. B., Diessel, H., Collier, J., Guijarro-Fuentes, P., Vulchanova, M., Vulchanov, V., Todisco, E., Reile, M., Breunesse, M., Plado, H., Bohnemeyer, J., Bsili, R., Caldano, M., Dekova, R., Donelson, K., Forker, D., Park, Y., Pathak, L. S., Peeters, D., Pizzuto, G., Serhan, B., Apse, L., Hesse, F., Hoang, L., Hoang, P., Igari, Y., Kapiley, K., Haupt-Khutsishvili, T., Kolding, S., Priiki, K., Mačiukaitytė, I., Mohite, V., Nahkola, T., Tsoi, S. Y., Williams, S., Yasuda, S., Cangelosi, A., Duñabeitia, J. A., Mishra, R. K., Rocca, R., Šķilters, J., Wallentin, M., Žilinskaitė-Šinkūnienė, E., & Incel, O. D. (2023). Spatial communication systems across languages reflect universal action constraints. Nature Human Behaviour, 7, 2099-2110. doi:10.1038/s41562-023-01697-4.

    Abstract

    The extent to which languages share properties reflecting the non-linguistic constraints of the speakers who speak them is key to the debate regarding the relationship between language and cognition. A critical case is spatial communication, where it has been argued that semantic universals should exist, if anywhere. Here, using an experimental paradigm able to separate variation within a language from variation between languages, we tested the use of spatial demonstratives—the most fundamental and frequent spatial terms across languages. In n = 874 speakers across 29 languages, we show that speakers of all tested languages use spatial demonstratives as a function of being able to reach or act on an object being referred to. In some languages, the position of the addressee is also relevant in selecting between demonstrative forms. Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition.
  • Dingemanse, M., Liesenfeld, A., Rasenberg, M., Albert, S., Ameka, F. K., Birhane, A., Bolis, D., Cassell, J., Clift, R., Cuffari, E., De Jaegher, H., Dutilh Novaes, C., Enfield, N. J., Fusaroli, R., Gregoromichelaki, E., Hutchins, E., Konvalinka, I., Milton, D., Rączaszek-Leonardi, J., Reddy, V., Rossano, F., Schlangen, D., Seibt, J., Stokoe, E., Suchman, L. A., Vesper, C., Wheatley, T., & Wiltschko, M. (2023). Beyond single-mindedness: A figure-ground reversal for the cognitive sciences. Cognitive Science, 47(1): e13230. doi:10.1111/cogs.13230.

    Abstract

    A fundamental fact about human minds is that they are never truly alone: all minds are steeped in situated interaction. That social interaction matters is recognised by any experimentalist who seeks to exclude its influence by studying individuals in isolation. On this view, interaction complicates cognition. Here we explore the more radical stance that interaction co-constitutes cognition: that we benefit from looking beyond single minds towards cognition as a process involving interacting minds. All around the cognitive sciences, there are approaches that put interaction centre stage. Their diverse and pluralistic origins may obscure the fact that collectively, they harbour insights and methods that can respecify foundational assumptions and fuel novel interdisciplinary work. What might the cognitive sciences gain from stronger interactional foundations? This represents, we believe, one of the key questions for the future. Writing as a multidisciplinary collective assembled from across the classic cognitive science hexagon and beyond, we highlight the opportunity for a figure-ground reversal that puts interaction at the heart of cognition. The interactive stance is a way of seeing that deserves to be a key part of the conceptual toolkit of cognitive scientists.
  • Dong, T., & Toneva, M. (2023). Modeling brain responses to video stimuli using multimodal video transformers. In Proceedings of the Conference on Cognitive Computational Neuroscience (CCN 2023) (pp. 194-197).

    Abstract

    Prior work has shown that internal representations of artificial neural networks can significantly predict brain responses elicited by unimodal stimuli (i.e. reading a book chapter or viewing static images). However, the computational modeling of brain representations of naturalistic video stimuli, such as movies or TV shows, still remains underexplored. In this work, we present a promising approach for modeling vision-language brain representations of video stimuli by a transformer-based model that represents videos jointly through audio, text, and vision. We show that the joint representations of vision and text information are better aligned with brain representations of subjects watching a popular TV show. We further show that the incorporation of visual information improves brain alignment across several regions that support language processing.
  • Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, & A. Vinciarelli (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

    Abstract

    Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research.
  • Karadöller, D. Z., Sumer, B., Ünal, E., & Özyürek, A. (2023). Late sign language exposure does not modulate the relation between spatial language and spatial memory in deaf children and adults. Memory & Cognition, 51, 582-600. doi:10.3758/s13421-022-01281-7.

    Abstract

    Prior work with hearing children acquiring a spoken language as their first language shows that spatial language and cognition are related systems and spatial language use predicts spatial memory. Here, we further investigate the extent of this relationship in signing deaf children and adults and ask if late sign language exposure, as well as the frequency and the type of spatial language use that might be affected by late exposure, modulate subsequent memory for spatial relations. To do so, we compared spatial language and memory of 8-year-old late-signing children (after 2 years of exposure to a sign language at the school for the deaf) and late-signing adults to their native-signing counterparts. We elicited picture descriptions of Left-Right relations in Turkish Sign Language (Türk İşaret Dili) and measured the subsequent recognition memory accuracy of the described pictures. Results showed that late-signing adults and children were similar to their native-signing counterparts in how often they encoded the spatial relation. However, late-signing adults but not children differed from their native-signing counterparts in the type of spatial language they used. However, neither late sign language exposure nor the frequency and type of spatial language use modulated spatial memory accuracy. Therefore, even though late language exposure seems to influence the type of spatial language use, this does not predict subsequent memory for spatial relations. We discuss the implications of these findings based on the theories concerning the correspondence between spatial language and cognition as related or rather independent systems.
  • Mamus, E., Speed, L. J., Rissman, L., Majid, A., & Özyürek, A. (2023). Lack of visual experience affects multimodal language production: Evidence from congenitally blind and sighted people. Cognitive Science, 47(1): e13228. doi:10.1111/cogs.13228.

    Abstract

    The human experience is shaped by information from different perceptual channels, but it is still debated whether and how differential experience influences language use. To address this, we compared congenitally blind, blindfolded, and sighted people's descriptions of the same motion events experienced auditorily by all participants (i.e., via sound alone) and conveyed in speech and gesture. Comparison of blind and sighted participants to blindfolded participants helped us disentangle the effects of a lifetime experience of being blind versus the task-specific effects of experiencing a motion event by sound alone. Compared to sighted people, blind people's speech focused more on path and less on manner of motion, and encoded paths in a more segmented fashion using more landmarks and path verbs. Gestures followed the speech, such that blind people pointed to landmarks more and depicted manner less than sighted people. This suggests that visual experience affects how people express spatial events in the multimodal language and that blindness may enhance sensitivity to paths of motion due to changes in event construal. These findings have implications for the claims that language processes are deeply rooted in our sensory experiences.
  • Mamus, E., Speed, L., Özyürek, A., & Majid, A. (2023). The effect of input sensory modality on the multimodal encoding of motion events. Language, Cognition and Neuroscience, 38(5), 711-723. doi:10.1080/23273798.2022.2141282.

    Abstract

    Each sensory modality has different affordances: vision has higher spatial acuity than audition, whereas audition has better temporal acuity. This may have consequences for the encoding of events and its subsequent multimodal language production—an issue that has received relatively little attention to date. In this study, we compared motion events presented as audio-only, visual-only, or multimodal (visual + audio) input and measured speech and co-speech gesture depicting path and manner of motion in Turkish. Input modality affected speech production. Speakers with audio-only input produced more path descriptions and fewer manner descriptions in speech compared to speakers who received visual input. In contrast, the type and frequency of gestures did not change across conditions. Path-only gestures dominated throughout. Our results suggest that while speech is more susceptible to auditory vs. visual input in encoding aspects of motion events, gesture is less sensitive to such differences.

    Additional information

    Supplemental material
  • Manhardt, F., Brouwer, S., Van Wijk, E., & Özyürek, A. (2023). Word order preference in sign influences speech in hearing bimodal bilinguals but not vice versa: Evidence from behavior and eye-gaze. Bilingualism: Language and Cognition, 26(1), 48-61. doi:10.1017/S1366728922000311.

    Abstract

    We investigated cross-modal influences between speech and sign in hearing bimodal bilinguals, proficient in a spoken and a sign language, and its consequences on visual attention during message preparation using eye-tracking. We focused on spatial expressions in which sign languages, unlike spoken languages, have a modality-driven preference to mention grounds (big objects) prior to figures (smaller objects). We compared hearing bimodal bilinguals’ spatial expressions and visual attention in Dutch and Dutch Sign Language (N = 18) to those of their hearing non-signing (N = 20) and deaf signing peers (N = 18). In speech, hearing bimodal bilinguals expressed more ground-first descriptions and fixated grounds more than hearing non-signers, showing influence from sign. In sign, they used as many ground-first descriptions as deaf signers and fixated grounds equally often, demonstrating no influence from speech. Cross-linguistic influence of word order preference and visual attention in hearing bimodal bilinguals appears to be one-directional modulated by modality-driven differences.
  • Özer, D., Karadöller, D. Z., Özyürek, A., & Göksun, T. (2023). Gestures cued by demonstratives in speech guide listeners' visual attention during spatial language comprehension. Journal of Experimental Psychology: General, 152(9), 2623-2635. doi:10.1037/xge0001402.

    Abstract

    Gestures help speakers and listeners during communication and thinking, particularly for visual-spatial information. Speakers tend to use gestures to complement the accompanying spoken deictic constructions, such as demonstratives, when communicating spatial information (e.g., saying “The candle is here” and gesturing to the right side to express that the candle is on the speaker's right). Visual information conveyed by gestures enhances listeners’ comprehension. Whether and how listeners allocate overt visual attention to gestures in different speech contexts is mostly unknown. We asked if (a) listeners gazed at gestures more when they complement demonstratives in speech (“here”) compared to when they express redundant information to speech (e.g., “right”) and (b) gazing at gestures related to listeners’ information uptake from those gestures. We demonstrated that listeners fixated gestures more when they expressed complementary than redundant information in the accompanying speech. Moreover, overt visual attention to gestures did not predict listeners’ comprehension. These results suggest that the heightened communicative value of gestures as signaled by external cues, such as demonstratives, guides listeners’ visual attention to gestures. However, overt visual attention does not seem to be necessary to extract the cued information from the multimodal message.
  • Rasenberg, M. (2023). Mutual understanding from a multimodal and interactional perspective. PhD Thesis, Radboud University Nijmegen, Nijmegen.
