  • Giles, M., Rubio-Fernández, P., & Mollica, F. (in press). Search efficiency drives reference production across modalities, but colour is special. Open Mind.
  • Holler, J., & Kuhlen, A. K. (in press). Psycholinguistic perspectives on face-to-face conversation. Nature Reviews Psychology.
  • Liu, L., Ghaleb, E., Özyürek, A., & Yumak, Z. (in press). SemGes: Semantics-aware co-speech gesture generation using semantic coherence and relevance learning. In Proceedings of the International Conference on Computer Vision (ICCV 2025).
  • Ning, M., Li, M., Su, J., Jia, H., Liu, L., Beneš, M., Salah, A. A., & Ertugrul, I. O. (in press). DCTdiff: Intriguing properties of image generative modeling in the DCT space. In Proceedings of the Forty-Second International Conference on Machine Learning (ICML 2025).
  • Randone*, F., Mellana*, M., Toscano, S., & Muò, R. (in press). Risorse per Supportare la Conversazione delle Persone con Afasia: Indagine sull’Applicabilità nella Pratica Clinica [Resources to support conversation in people with aphasia: A survey of applicability in clinical practice]. Logopedia e Comunicazione.

    Abstract

    * = Joint first authorship
  • Rubio-Fernandez, P., & Harris, D. W. (in press). Common ground: Between formal pragmatics and psycholinguistics. Annual Review of Linguistics.
  • Rubio-Fernandez, P. (in press). Cultural evolutionary pragmatics: An empirical approach to the relation between language and social cognition. In B. Geurts, & R. Moore (Eds.), The Oxford Handbook of Evolutionary Pragmatics. Oxford: Oxford University Press.
  • Slonimska, A., Campisi, E., & Ozyurek, A. (in press). Adults mark the communicative relevance of their gestures more for children than for other adults. Discourse Processes.
  • Dona, L., & Schouwstra, M. (2026). Iconicity in the evolution of language: Computational models and laboratory experiments. In O. Fischer, K. Akita, & P. Perniss (Eds.), The Oxford Handbook of Iconicity in Language (pp. 773-787). New York: Oxford University Press. doi:10.1093/oxfordhb/9780192849489.013.0049.

    Abstract

    The emergence of human language is a complex process, and to investigate the role of iconicity in this, researchers have combined insights from computational models with empirical observations from laboratory experiments. This chapter provides an overview of the most important insights on the interaction between iconicity and other linguistic properties such as combinatoriality and systematicity. In the experimental and computational work reported, it is shown how iconicity can affect the way in which emerging languages are learned and used. The chapter also discusses how computational methods can help to better understand the gradient and subjective nature of iconicity.
  • Rubio-Fernandez, P., Long, M., & Ozyurek, A. (2026). Spatial and social cognition jointly determine multimodal demonstrative reference: Experimental evidence from Turkish and Spanish. Cognition, 266: 106289. doi:10.1016/j.cognition.2025.106289.

    Abstract

    All languages in the world have demonstrative terms such as ‘this’ and ‘that’ in English, which have traditionally been treated as spatial words. Here we aim to provide experimental evidence that demonstrative choice is jointly determined by spatial considerations (e.g., whether the referent is near or far) and socio-cognitive factors (e.g., the listener’s attention focus). We also test whether demonstrative choice varies depending on the speaker’s use of pointing, to provide evidence for a multimodal account of demonstrative systems. We focus on the Turkish system and compare it with the Spanish one to better understand the cross-linguistic variability of 3-term demonstrative systems. Corpus studies have suggested that the Turkish proximal ‘bu’ and distal ‘o’ mark a spatial contrast between near and far space, whereas the medial ‘şu’ is used to direct the listener’s attention to a new referent. Supporting this analysis, an online experiment using a picture-based demonstrative-choice task revealed that the medial form ‘şu’ was preferred when the listener was looking at the wrong object. The results of a second experiment using video stimuli further showed that the medial ‘şu’ was preferred when the speaker pointed to the referent to direct the listener’s attention, whereas the proximal demonstrative was used in near space and the distal in far space, mostly in joint attention and without pointing. The results of a third experiment in Spanish showed radically different patterns of demonstrative-pointing use. The medial ‘ese’ was preferred in joint attention, whereas the proximal ‘este’ and distal ‘aquel’ were selected to direct the listener’s attention towards the intended referent but without an effect of pointing. Our results confirm that demonstrative choice within a given system is determined by both spatial and socio-cognitive factors, interacting with pointing patterns and varying across languages. Leveraging recent experimental work in several languages, we interpret these findings as further evidence for the weighted parameters framework (e.g., referent position and listener attention), which explains demonstrative choice beyond previous categorical analyses.
  • Silva, E. S., Drijvers, L., & Trujillo, J. P. (2026). Exploring auditory perception experiences in daily situations in autistic adults. Autism, 30(2), 439-451. doi:10.1177/13623613251391492.

    Abstract

    Autistic individuals often show differential sensory perception, including hypo- or hypersensitivities to sound. Previous research also suggests that autistic individuals often have difficulty processing intentional and affective cues in speech acoustics. However, general speech processing difficulties remain underexplored. We investigated self-reported auditory perception using the Speech, Spatial, and Qualities of Hearing Questionnaire among autistic (self-identifying (n = 18) and clinically diagnosed (n = 45)) and non-autistic adults (N = 66). The study was conducted in the Netherlands, but the questionnaire and call for participation were in English and open to anyone regardless of country of residence. Both clinically diagnosed and self-identifying individuals with autism reported significantly lower overall scores on the Speech, Spatial, and Qualities of Hearing Questionnaire and on the Speech subscale compared with non-autistic individuals, indicating challenges in the overall quality of auditory perception and in speech comprehension. Clinically diagnosed individuals also showed lower scores on the Qualities and Spatial subscales compared with non-autistic individuals. Post hoc analysis further suggested that speech hearing is particularly challenging for many autistic individuals. In addition, our finding that self-identifying and clinically diagnosed autistic individuals show similar patterns of hearing difficulties emphasizes the need for more inclusive research practices that collect the experiences of all the individuals in the autistic community in the study of sensory perception in autism.
  • Slonimska, A. (2026). Iconicity in simultaneous constructions in sign languages. In O. Fischer, K. Akita, & P. Perniss (Eds.), The Oxford Handbook of Iconicity in Language (pp. 452-466). New York: Oxford University Press. doi:10.1093/oxfordhb/9780192849489.013.0028.

    Abstract

    Simultaneous constructions present a unique linguistic property in sign languages where multiple body articulators (hands, torso, head, facial expression, and eye gaze) are used to structure linguistic information not only linearly, as is done in speech, but also simultaneously. Simultaneous constructions allow a more direct encoding of scenes and events by using various iconic strategies to depict multiple event elements (e.g. referents and their actions), their spatial relations and temporal overlap, mirroring how they are perceived in the world; that is, simultaneously. The chapter provides an overview of how iconicity is operationalized in simultaneous constructions and the methods used to study their function, and discusses the history and future directions of the study of iconicity as a structuring principle in sign languages.
  • Ünal, E., Karadöller, D. Z., & Özyürek, A. (2026). Children sustain their attention on spatial scenes when planning to describe spatial relations multimodally in speech and gesture. Developmental Science, 29(2): e70128. doi:10.1111/desc.70128.

    Abstract

    How do children allocate visual attention to scenes as they prepare to describe them multimodally in speech and co-speech gesture? In an eye-tracking study, Turkish-speaking 8-year-old children viewed four-picture displays depicting the same two objects in different spatial relations as they prepared to describe target pictures depicting left-right relations. Children's visual attention was sustained on the target picture when they were planning descriptions that expressed the spatial relation multimodally in speech and gesture, but not unimodally in speech only. This pattern persisted regardless of the semantic relation between speech and gesture (i.e., for both complementary gestures that disambiguated speech and redundant gestures that supplemented already unambiguous speech). Importantly, visual attention patterns did not differ across description types while children were previewing the displays before message preparation. These results indicate that multimodal message preparation might place different demands on visual attention than unimodal message preparation, possibly due to the affordances of gestures for expressing spatial relations.
  • Atik, M. A., & Karadöller, D. Z. (2025). The effects of late sign language acquisition on emotion recall and expression in deaf children. In D. Barner, N. R. Bramley, A. Ruggeri, & C. M. Walker (Eds.), Proceedings of the 47th Annual Meeting of the Cognitive Science Society (CogSci 2025) (pp. 2225-2231).

    Abstract

    Children's emotional development is linked to language development for typically developing children and deaf children with native sign language exposure. However, approximately 90% of deaf children are born to hearing parents who are not familiar with sign language. These deaf children begin learning a sign language when they attend a school for the deaf. Late sign language exposure has negative consequences on several aspects of language development. We investigate whether acquiring sign language late affects children's emotion recall and channel of emotion expression. After watching a silent video depicting emotions, late- and native-signing deaf children retold the story in Turkish Sign Language. Results showed that late signers recalled fewer emotions and used fewer signs and facial expressions compared to native signers. Manual gestures (non-sign hand movements) and head and body movements did not differ across groups. The findings suggest that late sign language acquisition negatively impacts deaf children's ability to recall and express emotions, highlighting the importance of early language exposure for the development of emotion recall.

    Additional information

    Link to eScholarship
  • Bariş, C., & Ünal, E. (2025). Agent preference in children: The role of animacy and event coherence. In D. Barner, N. R. Bramley, A. Ruggeri, & C. M. Walker (Eds.), Proceedings of the 47th Annual Meeting of the Cognitive Science Society (CogSci 2025) (pp. 409-415).

    Abstract

    Thematic roles in language (Agents, Patients) are considered to be hierarchically organized in terms of their salience, and this hierarchy is rooted in their counterparts as event participants in cognition. Here, we examine the relative salience of Agents over Patients in two-participant causative events in Turkish-speaking 3- to 5-year-old children. We also test if this asymmetry is modulated by the animacy of the Patient (human vs. inanimate object) and specific to the presence of a coherent event. In an eye-tracked change detection task, changes to Agents were detected more accurately (and after fewer fixations) than changes to inanimate Patients when there was a coherent event. This asymmetry disappeared when the Patient was animate (for accuracy) and when event coherence was disrupted (for both accuracy and fixations). These findings suggest an interplay of event roles and animacy in Agent preference.

    Additional information

    Link to eScholarship
  • Bavaresco, A., Bernardi, R., Bertolazzi, L., Elliott, D., Fernández, R., Gatt, A., Ghaleb, E., Giulianelli, M., Hanna, M., Koller, A., Martins, A. F. T., Mondorf, P., Neplenbroek, V., Pezzelle, S., Plank, B., Schlangen, D., Suglia, A., Surikuchi, A. K., Takmaz, E., & Testoni, A. (2025). LLMs instead of human judges? A large scale empirical study across 20 NLP evaluation tasks. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (pp. 238-255). Vienna, Austria: Association for Computational Linguistics. Retrieved from https://aclanthology.org/2025.acl-short.20/.

    Abstract

    There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case of proprietary models. We provide JUDGE-BENCH, an extensible collection of 20 NLP datasets with human annotations covering a broad range of evaluated properties and types of data, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show substantial variance across models and datasets. Models are reliable evaluators on some tasks, but overall display substantial variability depending on the property being evaluated, the expertise level of the human judges, and whether the language is human or model-generated. We conclude that LLMs should be carefully validated against human judgments before being used as evaluators.
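    Illustrative sketch

    Evaluating whether an LLM judge can "replicate the annotations" reduces to an agreement statistic between model judgments and human labels. A minimal Python sketch of that idea (the label vectors and metric choices below are illustrative assumptions, not the paper's evaluation code):

    # Hedged sketch: agreement between an LLM judge and human annotators.
    # The labels are invented; the appropriate metric varies by task.
    from sklearn.metrics import cohen_kappa_score
    from scipy.stats import spearmanr

    human = [1, 0, 1, 1, 0, 1, 0, 0]   # human annotations (binary task)
    llm   = [1, 0, 1, 0, 0, 1, 0, 1]   # LLM judgments on the same items

    print("Cohen's kappa:", cohen_kappa_score(human, llm))     # categorical labels
    print("Spearman rho:", spearmanr(human, llm).correlation)  # graded judgments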
  • Bujok, R., Peeters, D., Meyer, A. S., & Bosker, H. R. (2025). Beating stress: Evidence for recalibration of word stress perception. Attention, Perception & Psychophysics, 87, 1729-1749. doi:10.3758/s13414-025-03088-5.

    Abstract

    Speech is inherently variable, requiring listeners to apply adaptation mechanisms to deal with the variability. A proposed perceptual adaptation mechanism is recalibration, whereby listeners learn to adjust cognitive representations of speech sounds based on disambiguating contextual information. Most studies on the role of recalibration in speech perception have focused on variability in particular speech segments (e.g., consonants/vowels), and speech has mostly been studied with a focus on talking heads. However, speech is often accompanied by visual bodily signals like hand gestures, and is thus multimodal. Moreover, variability in speech extends beyond segmental aspects alone and also affects prosodic aspects, like lexical stress. We currently do not understand well how listeners adjust their representations of lexical stress patterns to different speakers. In four experiments, we investigated recalibration of lexical stress perception, driven by lexico-orthographical information (Experiment 1) and by manual beat gestures (Experiments 2–4). Across experiments, we observed that these two types of disambiguating information (presented in an audiovisual exposure phase) led listeners to adjust their representations of lexical stress, with lasting consequences for subsequent spoken word recognition (in an audio-only test phase). However, evidence for generalization of this recalibration to new words was only found in the third experiment, suggesting that generalization may be limited. These results highlight that recalibration is a plausible mechanism for suprasegmental speech adaptation in everyday communication and show that even the timing of simple hand gestures can have a lasting effect on auditory speech perception.
  • Cho, S.-J., Brown-Schmidt, S., Clough, S., & Duff, M. C. (2025). Comparing functional trend and learning among groups in intensive binary longitudinal eye-tracking data using by-variable smooth functions of GAMM. Psychometrika, 90(2), 628-657. doi:10.1007/s11336-024-09986-1.

    Abstract

    This paper presents a model specification for group comparisons regarding a functional trend over time within a trial and learning across a series of trials in intensive binary longitudinal eye-tracking data. The functional trend and learning effects are modeled using by-variable smooth functions. This model specification is formulated as a generalized additive mixed model, which allowed for the use of the freely available mgcv package in R (Wood, 2023; https://cran.r-project.org/web/packages/mgcv/mgcv.pdf). The model specification was applied to intensive binary longitudinal eye-tracking data, where the questions of interest concern differences between individuals with and without brain injury in their real-time language comprehension and how this affects their learning over time. The results of the simulation study show that the model parameters are recovered well and the by-variable smooth functions are adequately predicted under conditions similar to those found in the application.
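    Model sketch

    To make the specification concrete: writing y_{ijt} for the binary fixation outcome of participant i on trial j at time t within the trial, and g(i) for that participant's group, a GAMM with by-variable smooths could take the form below. This is a hedged reconstruction from the abstract in our own notation, not the paper's exact specification.

    \operatorname{logit}\,\Pr(y_{ijt} = 1) = \beta_{g(i)} + f_{g(i)}(t) + h_{g(i)}(j) + b_i, \qquad b_i \sim \mathcal{N}(0, \sigma_b^2)

    Here f_g is the group-specific functional trend over time within a trial, h_g is the group-specific learning effect across trials (both fitted as by-variable smooth functions), and b_i is a participant random intercept.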
  • Clough, S., Evans, M. J., Duff, M. C., & Brown-Schmidt, S. (2025). Reduced temporal organization of narrative recall in adults with moderate-severe traumatic brain injury. Cortex, 190, 86-109. doi:10.1016/j.cortex.2025.06.007.

    Abstract

    Narrative discourse impairments are well documented in individuals with moderate-severe traumatic brain injury (TBI). Studies of narrative discourse (i.e., story generation, story retelling) in this population have frequently focused on impairment of semantic relations across utterances and the larger discourse context (e.g., cohesion, coherence, story grammar). Less attention has been given to the temporal organization of narrative retelling in TBI. We applied temporal contiguity analyses, a technique traditionally used to characterize temporal organization of free recall of wordlists, to quantify the temporal organization of participants' story retellings with respect to the order in which the narrator originally presented the story details. We also conducted a parallel analysis of temporal contiguity of wordlist recall using data from the Rey Auditory Verbal Learning test. Participants with moderate-severe TBI and non-injured peers demonstrated above chance temporal organization and a tendency to make short transitions in the forward direction when recalling items in both the narrative recall and wordlist recall task. However, these effects were significantly reduced in the TBI group. Overall, their free recall performance was less temporally clustered, and they were more likely to make larger jumps between story details (or words in the wordlist recall task) than their non-injured peers when recalling stories. Examining free recall at multiple timepoints revealed that while repetition (i.e., multiple presentations of the wordlist) increased temporal organization of recall, long delays (i.e., one week) decreased temporal organization for both the TBI and non-injured groups. We propose that reduced temporal organization of narrative recall in individuals with moderate-severe TBI is linked to impairments in the declarative relational memory system. In line with retrieved-context models of free recall, memory disruption not only impacts the total number of story details recalled, but also the ability to use temporal context to encode and retrieve items in a sequentially organized way.
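    Illustrative sketch

    The temporal contiguity analysis described above is usually operationalized as a lag-CRP: the conditional probability of a recall transition at serial-position lag k, given the transitions still available. A minimal Python sketch of the general technique (our simplified implementation, not the authors' analysis code):

    # Lag-CRP: for each recall transition, count the observed lag and every
    # lag that was still possible (items not yet recalled), then divide.
    from collections import defaultdict

    def lag_crp(recall_orders, list_length):
        """recall_orders: recall sequences of studied serial positions
        (1-based, no repetitions within a sequence)."""
        actual = defaultdict(int)
        possible = defaultdict(int)
        for recall in recall_orders:
            recalled = set()
            for cur, nxt in zip(recall, recall[1:]):
                recalled.add(cur)
                actual[nxt - cur] += 1
                for cand in range(1, list_length + 1):
                    if cand != cur and cand not in recalled:
                        possible[cand - cur] += 1
        return {lag: actual[lag] / possible[lag]
                for lag in sorted(possible) if possible[lag] > 0}

    # Mostly short forward transitions -> probability mass at lag +1
    print(lag_crp([[1, 2, 3, 5, 4], [2, 3, 4, 1]], list_length=6))

    Reduced temporal organization, as reported for the TBI group, would surface as a flatter lag-CRP curve, with less probability mass at short forward lags.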
  • Drijvers, L., Small, S. L., & Skipper, J. I. (2025). Language is widely distributed throughout the brain. Nature Reviews Neuroscience, 26: 189. doi:10.1038/s41583-024-00903-0.
  • Emmendorfer, A. K., & Holler, J. (2025). Facial signals shape predictions about the nature of upcoming conversational responses. Scientific Reports, 15: 1381. doi:10.1038/s41598-025-85192-y.

    Abstract

    Increasing evidence suggests that interlocutors use visual communicative signals to form predictions about unfolding utterances, but there is little data on the predictive potential of facial signals in conversation. In an online experiment with virtual agents, we examine whether facial signals produced by an addressee may allow speakers to anticipate the response to a question before it is given. Participants (n = 80) viewed videos of short conversation fragments between two virtual humans. Each fragment ended with the Questioner asking a question, followed by a pause during which the Responder looked either straight at the Questioner (baseline), or averted their gaze, or accompanied the straight gaze with one of the following facial signals: brow raise, brow frown, nose wrinkle, smile, squint, mouth corner pulled back (dimpler). Participants then indicated on a 6-point scale whether they expected a “yes” or “no” response. Analyses revealed that all signals received different ratings relative to the baseline: brow raises, dimplers, and smiles were associated with more positive responses; gaze aversions, brow frowns, nose wrinkles, and squints with more negative responses. Our findings show that interlocutors may form strong associations between facial signals and upcoming responses to questions, highlighting their predictive potential in face-to-face conversation.

    Additional information

    supplementary materials
  • Esmer, Ş. C., Turan, E., Karadöller, D. Z., & Göksun, T. (2025). Sources of variation in preschoolers’ relational reasoning: The interaction between language use and working memory. Journal of Experimental Child Psychology, 252: 106149. doi:10.1016/j.jecp.2024.106149.

    Abstract

    Previous research has suggested the importance of relational language and working memory in children’s relational reasoning. The tendency to use language (e.g., using more relational than object-focused language, prioritizing focal objects over background in linguistic descriptions) could reflect children’s biases toward the relational versus object-based solutions in a relational match-to-sample (RMTS) task. In the absence of any apparent object match as a foil option, object-focused children might rely on other cognitive mechanisms (i.e., working memory) to choose a relational match in the RMTS task. The current study examined the interactive roles of language- and working memory-related sources of variation in Turkish-learning preschoolers’ relational reasoning. We collected data from 4- and 5-year-olds (N = 41) via Zoom in the RMTS task, a scene description task, and a backward word span task. Generalized binomial mixed effects models revealed that children who used more relational language and background-focused scene descriptions performed worse in the relational reasoning task. Furthermore, children with less frequent relational language use and focal object descriptions of the scenes benefited more from working memory to succeed in the relational reasoning task. These results suggest additional working memory demands for object-focused children to choose relational matches in the RMTS task, highlighting the importance of examining the interactive effects of different cognitive mechanisms on relational reasoning.

    Additional information

    supplementary material
  • Ghaleb, E., Khaertdinov, B., Özyürek, A., & Fernández, R. (2025). I see what you mean: Co-speech gestures for reference resolution in multimodal dialogue. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Findings of the Association for Computational Linguistics: ACL 2025 (pp. 13191-13206). Vienna, Austria: Association for Computational Linguistics. Retrieved from https://aclanthology.org/2025.findings-acl.682/.

    Abstract

    In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, while simultaneously tackling the challenge of learning robust gesture embeddings. We propose a self-supervised pre-training approach to gesture representation learning that grounds body movements in spoken language. Our experiments show that the learned embeddings align with expert annotations and have significant predictive power. Moreover, reference resolution accuracy further improves when (1) using multimodal gesture representations, even when speech is unavailable at inference time, and (2) leveraging dialogue history. Overall, our findings highlight the complementary roles of gesture and speech in reference resolution, offering a step towards more naturalistic models of human-machine interaction.
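    Illustrative sketch

    One standard way to "ground body movements in spoken language" during self-supervised pre-training is a symmetric contrastive (InfoNCE) objective over paired gesture and speech embeddings. The Python sketch below illustrates that general technique under our own assumptions; it does not reproduce the paper's architecture or loss.

    # Symmetric InfoNCE over paired gesture/speech clip embeddings: each
    # gesture should be most similar to the speech it co-occurred with.
    import numpy as np
    from scipy.special import logsumexp

    def info_nce(gesture_emb, speech_emb, temperature=0.07):
        """gesture_emb, speech_emb: (batch, dim) arrays of paired clips."""
        g = gesture_emb / np.linalg.norm(gesture_emb, axis=1, keepdims=True)
        s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
        logits = g @ s.T / temperature      # pairwise cosine similarities
        idx = np.arange(len(g))             # the i-th pair is the positive
        g2s = logits - logsumexp(logits, axis=1, keepdims=True)
        s2g = logits - logsumexp(logits, axis=0, keepdims=True)
        return -(g2s[idx, idx].mean() + s2g[idx, idx].mean()) / 2

    rng = np.random.default_rng(0)
    speech = rng.normal(size=(8, 64))
    gesture = speech + 0.1 * rng.normal(size=(8, 64))  # well-aligned pairs
    print(info_nce(gesture, speech))                   # low loss when aligned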
  • Giles, M., Rubio-Fernández, P., & Mollica, F. (2025). Perceptual discriminability drives overinformative reference, but colour information is special. In D. Barner, N. R. Bramley, A. Ruggeri, & C. M. Walker (Eds.), Proceedings of the 47th Annual Meeting of the Cognitive Science Society (CogSci 2025) (pp. 551-558).

    Abstract

    When speakers refer to objects in the world, they often overinform: provide their listener with redundant adjectival information. Contrary to classical theories in linguistics, recent theories have framed overinformativeness as an efficient means of grounding reference in perceptual information of high discriminability to facilitate listener comprehension. However, the generalisability of such theories is constrained by the methodological challenge associated with reliably manipulating the perceptual discriminability of naturalistic stimuli. Here, we overcome these methodological challenges, using methods from psychophysics to manipulate the perceptual discriminability of colour and material attributes in a reference-production experiment. We provide a robust validation of the view that overinformative reference is driven by speakers grounding expressions in attributes of high discriminability. However, we also find that colour information is privileged above and beyond such factors of discriminability.

    Additional information

    Link to eScholarship
  • Göksun, T., Aktan-Erciyes, A., Karadöller, D. Z., & Demir-Lira, Ö. E. (2025). Multifaceted nature of early vocabulary development: Connecting child characteristics with parental input types. Child Development Perspectives, 19(1), 30-37. doi:10.1111/cdep.12524.

    Abstract

    Children need to learn the demands of their native language in the early vocabulary development phase. In this dynamic process, parental multimodal input may shape neurodevelopmental trajectories while also being tailored by child-related factors. Moving beyond typically characterized group profiles, in this article, we synthesize growing evidence on the effects of parental multimodal input (amount, quality, or absence), domain-specific input (space and math), and language-specific input (causal verbs and sound symbols) on preterm, full-term, and deaf children's early vocabulary development, focusing primarily on research with children learning Turkish and Turkish Sign Language. We advocate for a theoretical perspective, integrating neonatal characteristics and parental input, and acknowledging the unique constraints of languages.
  • Hagoort, P., & Özyürek, A. (2025). Extending the architecture of language from a multimodal perspective. Topics in Cognitive Science, 17(4), 877-887. doi:10.1111/tops.12728.

    Abstract

    Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation.
  • Holler, J. (2025). Facial clues to conversational intentions. Trends in Cognitive Sciences, 29(8), 750-762. doi:10.1016/j.tics.2025.03.006.

    Abstract

    It has long been known that we use words to perform speech acts foundational to everyday conversation, such as requesting, informing, proposing, or complaining. However, the natural environment of human language is face-to-face interaction where we use words and an abundance of visual signals to communicate. The multimodal nature of human language is increasingly recognised in the language and cognitive sciences. In line with this turn of the tide, findings demonstrate that facial signals significantly contribute to communicating intentions and that they may facilitate pragmatically appropriate responding in the fast-paced environment of conversation. In light of this, the notion of speech acts no longer seems appropriate, highlighting the need for a modality-neutral conception, such as social action.
  • Hustá, C., Meyer, A. S., & Drijvers, L. (2025). Using Rapid Invisible Frequency Tagging (RIFT) to probe the neural interaction between representations of speech planning and comprehension. Neurobiology of Language, 6: nol_a_00171. doi:10.1162/nol_a_00171.

    Abstract

    Interlocutors often use the semantics of comprehended speech to inform the semantics of planned speech. Do representations of the comprehension and planning stimuli interact? In this EEG study, we used rapid invisible frequency tagging (RIFT) to better understand the attentional distribution to representations of comprehension and speech planning stimuli, and how they interact in the neural signal. To do this, we leveraged the picture-word interference (PWI) paradigm with delayed naming, where participants simultaneously comprehend auditory distractors (auditory [f1]; tagged at 54 Hz) while preparing to name related or unrelated target pictures (visual [f2]; tagged at 68 Hz). RIFT elicits steady-state evoked potentials, which reflect allocation of attention to the tagged stimuli. When representations of the tagged stimuli interact, increased power has been observed at the intermodulation frequency resulting from an interaction of the base frequencies (f2 ± f1; Drijvers et al., 2021). Our results showed clear power increases at 54 Hz and 68 Hz during the tagging window, but no power difference between the related and unrelated condition. Interestingly, we observed a larger power difference in the intermodulation frequency (compared to baseline) in the unrelated compared to the related condition (68 Hz − 54 Hz: 14 Hz), indicating stronger interaction between unrelated auditory and visual representations. Our results go beyond standard PWI results by showing that participants’ difficulties in the related condition do not arise from allocating attention to the pictures or distractors. Instead, processing difficulties arise during interaction of the concepts or lemmas invoked by the two stimuli; thus, we conclude that interaction might be downregulated in the related condition.

    Additional information

    data and analysis scripts
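    Illustrative sketch

    The intermodulation logic is easy to demonstrate: when two signals tagged at f1 and f2 interact nonlinearly (e.g., multiplicatively), spectral power appears at f2 − f1. A toy Python simulation of this principle (illustrative only, not the study's analysis pipeline):

    # Multiplying signals tagged at 54 Hz and 68 Hz creates power at
    # 68 - 54 = 14 Hz (and at the sum, 122 Hz).
    import numpy as np

    fs, dur = 1000, 10.0                  # sampling rate (Hz), duration (s)
    t = np.arange(0, dur, 1 / fs)
    f1, f2 = 54, 68                       # auditory and visual tag frequencies
    interaction = np.sin(2 * np.pi * f1 * t) * np.sin(2 * np.pi * f2 * t)

    spectrum = np.abs(np.fft.rfft(interaction)) / len(t)
    freqs = np.fft.rfftfreq(len(t), 1 / fs)
    print(freqs[spectrum > 0.1])          # -> [ 14. 122.]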
  • Kabak, B., & Zora, H. (2025). Psycholinguistics and Turkish: Prosodic representations and processing. In L. Johanson (Ed.), Encyclopedia of Turkic Languages and Linguistics. Leiden: Brill. doi:10.1163/2667-3029_ETLO_COM_038976.

    Abstract

    Psycholinguistic investigations provide invaluable empirical utility in theorizing and typologizing phonological phenomena. Instrumental approaches to the sound structure of Turkish have proven to be no exception here, contributing independent and multi-faceted evidence towards theory building and testing. Two areas of Turkish phonology in relation to suprasegmental structure and prominence patterns, namely word-level prosody (Section 2) and prominence and rhythmic phenomena at the level of the sentence and beyond (Section 3) have particularly fueled psycholinguistically motivated empirical studies. This chapter will approach representational and processing-related issues in each of these and provide a review of pertinent perception and production studies, touching upon phonetic and developmental investigations insofar as they have implications for mental representations or processing.
  • Karadöller, D. Z., Demir-Lira, Ö. E., & Göksun, T. (2025). Full-term children with lower vocabulary scores receive more multimodal math input than preterm children. Journal of Cognition and Development, 26(4), 630-650. doi:10.1080/15248372.2025.2470245.

    Abstract

    One of the earliest sources of mathematical input arises in dyadic parent–child interactions. However, the emphasis has been on parental input only in speech and how input varies across different environmental and child-specific factors remains largely unexplored. Here, we investigated the relationship among parental math input modality and type, children’s gestational status (being preterm vs. full-term born), and vocabulary development. Using book-reading as a medium for parental math input in dyadic interaction, we coded specific math input elicited by Turkish-speaking parents and their 26-month-old children (N = 58, 24 preterms) for speech-only and multimodal (speech and gestures combined) input. Results showed that multimodal math input, as opposed to speech-only math input, was uniquely associated with gestational status, expressive vocabulary, and the interaction between the two. Full-term children with lower expressive vocabulary scores received more multimodal input compared to their preterm peers. However, there was no association between expressive vocabulary and multimodal math input for preterm children. Moreover, cardinality was the most frequent type for both speech-only and multimodal input. These findings suggest that the specific type of multimodal math input can be produced as a function of children’s gestational status and vocabulary development.
  • Karadöller*, D. Z., Sümer*, B., & Özyürek, A. (2025). Advancing the multimodal language acquisition framework through collaborative dialogue. First Language, 45(6), 797-809. doi:10.1177/01427237251379276.

    Abstract

    *Joint first authorship.
    Language acquisition unfolds within inherently multimodal contexts, where communication is expressed and perceived through diverse channels embedded in social interactions. For hearing children, this involves integrating speech with gesture; for deaf children, language develops through fully visual modalities. Such observations necessitate a paradigm shift from speech-centric models to a holistic framework that equally values all modalities, whether in spoken or signed languages. This framework must account not only for the multimodal scaffolding of input and interaction but also for individual and contextual diversity, including the cultural and cognitive variabilities children bring to language learning contexts. Responding to commentaries on our target article, this paper refines and expands the multimodal language framework, emphasizing its capacity to integrate the interactive richness of input and the heterogeneous contexts and individual variations shaping language acquisition.
  • Karadöller*, D. Z., Sümer*, B., & Özyürek, A. (2025). First-language acquisition in a multimodal language framework: Insights from speech, gesture, and sign. First Language, 45(6), 673-710. doi:10.1177/01427237241290678.

    Abstract

    *=shared first authorship
    Children across the world acquire their first language(s) naturally, regardless of typology or modality (e.g. sign or spoken). Various attempts have been made to explain the puzzle of language acquisition using several approaches, trying to understand to what extent it can be explained by what children bring to language-learning situations as well as what they learn from the input and the interactive context. However, most of these approaches consider only speech development, thus ignoring the inherently multimodal nature of human language. As a multimodal view of language is becoming more widely adopted for the study of adult language, a multimodal approach to language acquisition is inevitable. Not only do children have the capacity to learn spoken and sign language equally easily, but spoken language acquisition consists of learning to coordinate linguistic expressions in both modalities, that is, in both speech and gesture. To provide a step forward in this direction, this article aims to synthesize findings from research studies that take a multimodal perspective on language acquisition in different sign and spoken languages, including the development of speech and accompanying gestures. Our review shows that while some aspects of language acquisition seem to be modality-independent, others might differ according to the affordances of each modality when used separately as well as together (either in sign, speech, and/or gesture). We argue that these findings need to be integrated into our understanding of language acquisition. We also identify which areas need future research for both spoken and sign language acquisition, taking into account not only multimodal but also cross-linguistic variation.
  • Kekes-Szabo, S., Clough, S., Brown-Schmidt, S., & Duff, M. C. (2025). Multiparty communication: A new direction in characterizing the impact of traumatic brain injury on social communication. American Journal of Speech-Language Pathology, 34(S3), 1896-1909. doi:10.1044/2025_AJSLP-24-00151.

    Abstract

    Purpose: The purpose of this viewpoint is to advocate for increased study of common ground and audience design processes in multiparty communication in traumatic brain injury (TBI).
    Method: Building on discussions at the 2024 International Cognitive-Communication Disorders Conference, we review common ground and audience design processes in dyadic and multiparty communication. We discuss how the diffuse profiles of neural and cognitive deficits place individuals with TBI at increased risk of difficulty keeping track of who knows what in group settings and using that knowledge to flexibly adapt their communication behaviors.
    Results: We routinely engage in social communication in groups of three or more people at work, school, and social functions. While academic, vocational, and interpersonal domains are all areas where individuals with TBI are at risk for negative outcomes, we know very little about the impact of TBI on group, or multiparty, communication.
    Conclusions: The empirical study of common ground and audience design in multiparty communication in TBI presents a promising new direction in characterizing the impact of TBI on social communication, uncovering the underlying mechanisms of cognitive-communication disorders, and may lead to new interventions aimed at improving success in navigating group communication at work and school, and in interpersonal relationships.
  • Lokhesh, N. N., Swaminathan, K., Shravan, G., Menon, D., Mishra, S., Nandanwar, A., & Mishra, C. (2025). Welcome to the library: Integrating social robots in Indian libraries. In O. Palinko, L. Bodenhagen, J.-J. Cabibihan, K. Fischer, S. Šabanović, K. Winkle, L. Behera, S. S. Ge, D. Chrysostomou, W. Jiang, & H. He (Eds.), Social Robotics: 16th International Conference, ICSR + AI 2024, Odense, Denmark, October 23–26, 2024, Proceedings (pp. 239-246). Singapore: Springer. doi:10.1007/978-981-96-3525-2_20.

    Abstract

    Libraries are very often considered the hallway to developing knowledge. However, the lack of adequate staff within Indian libraries makes catering to the visitors’ needs difficult. Previous systems that have sought to address libraries’ needs through automation have mostly been limited to storage and fetching aspects while lacking in their interaction aspect. We propose to address this issue by incorporating social robots within Indian libraries that can communicate and address the visitors’ queries in a multi-modal fashion, attempting to make the experience more natural and appealing while helping reduce the burden on the librarians. In this paper, we propose and deploy a Furhat robot as a robot librarian by programming it with certain core librarian functionalities. We evaluate our system with a physical robot librarian (N = 26). The results show that the robot librarian was found to be very informative and overall left participants with a positive impression and preference.
  • Mamus, E., Speed, L. J., Ortega, G., Majid, A., & Özyürek, A. (2025). Gestural and verbal evidence of conceptual representation differences in blind and sighted individuals. Cognitive Science, 49: 10. doi:10.1111/cogs.70125.

    Abstract

    This preregistered study examined whether visual experience influences conceptual representations by examining both gestural expression and feature listing. Gestures—mostly driven by analog mappings of visuospatial and motoric experiences onto the body—offer a unique window into conceptual representations and provide complementary information not offered by language-based features, which have been the focus of previous work. Thirty congenitally or early blind and 30 sighted Turkish speakers produced silent gestures and features for concepts from semantic categories that differentially rely on experience in visual (non-manipulable objects and animals) and motor (manipulable objects) information. Blind individuals were less likely than sighted individuals to produce gestures for non-manipulable objects and animals, but not for manipulable objects. Overall, the tendency to use a particular gesture strategy for specific semantic categories was similar across groups. However, blind participants relied less on drawing and personification strategies depicting visuospatial aspects of concepts than sighted participants. Feature-listing revealed that blind participants share considerable conceptual knowledge with sighted participants, but their understanding differs in fine-grained details, particularly for animals. Thus, while concepts appear broadly similar in blind and sighted individuals, this study reveals nuanced differences, too, highlighting the intricate role of visual experience in conceptual representations.
  • Mishra, C., Skantze, G., Hagoort, P., & Verdonschot, R. G. (2025). Perception of emotions in human and robot faces: Is the eye region enough? In O. Palinko, L. Bodenhagen, J.-J. Cabibihan, K. Fischer, S. Šabanović, K. Winkle, L. Behera, S. S. Ge, D. Chrysostomou, W. Jiang, & H. He (Eds.), Social Robotics: 16th International Conference, ICSR + AI 2024, Odense, Denmark, October 23–26, 2024, Proceedings (pp. 290-303). Singapore: Springer.

    Abstract

    The increased interest in developing next-gen social robots has raised questions about the factors affecting the perception of robot emotions. This study investigates the impact of robot appearances (human-like, mechanical) and face regions (full-face, eye-region) on human perception of robot emotions. A between-subjects user study (N = 305) was conducted where participants were asked to identify the emotions being displayed in videos of robot faces, as well as a human baseline. Our findings reveal three important insights for effective social robot face design in Human-Robot Interaction (HRI): Firstly, robots equipped with a back-projected, fully animated face – regardless of whether they are more human-like or more mechanical-looking – demonstrate a capacity for emotional expression comparable to that of humans. Secondly, the recognition accuracy of emotional expressions in both humans and robots declines when only the eye region is visible. Lastly, within the constraint of only the eye region being visible, robots with more human-like features significantly enhance emotion recognition.
  • Orakçı-Beyaztaş, E., & Karadöller, D. Z. (2025). Exploring the relation between gesture presentation perspective and children’s spatial performance. Gesture. Advance online publication. doi:10.1075/gest.25016.ora.

    Abstract

    The study investigated whether the perspective of multimodal input in visuospatial maps predicts children’s spatial performance, particularly verbal recall and direction-following behavior. Five-year-old monolingual Turkish children engaged in the Directions Task, which included visuospatial maps and videos of a speaker describing routes on maps in three conditions: a Speech-Gesture combination with a front-facing view, a Speech-Gesture combination with an upper-back angle, and a Speech-only condition with a front-facing view as a control. Children were asked to verbally recall and draw the route described in the videos. They also engaged in perspective-taking, mental rotation, and relational reasoning tasks. Results showed that children’s verbal recall, but not necessarily behavioral recall, was enhanced by receiving multimodal directions. Moreover, children’s relational reasoning and perspective-taking abilities modulated their verbal recall performance. The results of this study underline the importance of multimodal input and presentation perspective in enhancing children’s spatial performance.
  • Özer, D., Özyürek, A., & Göksun, T. (2025). Spatial working memory is critical for gesture processing: Evidence from gestures with varying semantic links to speech. Psychonomic Bulletin & Review, 32, 1639-1653. doi:10.3758/s13423-025-02642-4.

    Abstract

    Gestures express redundant or complementary information to speech they accompany by depicting visual and spatial features of referents. In doing so, they recruit both spatial and verbal cognitive resources that underpin the processing of visual semantic information and its integration with speech. The relation between spatial and verbal skills and gesture comprehension, where gestures may serve different roles in relation to speech is yet to be explored. This study examined the role of spatial and verbal skills in processing gestures that expressed redundant or complementary information to speech during the comprehension of spatial relations between objects. Turkish-speaking adults (N=74) watched videos describing the spatial location of objects that involved perspective-taking (left-right) or not (on-under) with speech and gesture. Gestures either conveyed redundant information to speech (e.g., saying and gesturing “left”) or complemented the accompanying demonstrative in speech (e.g., saying “here,” gesturing “left”). We also measured participants’ spatial (the Corsi block span and the mental rotation tasks) and verbal skills (the digit span task). Our results revealed nuanced interactions between these skills and spatial language comprehension, depending on the modality in which the information was expressed. One insight emerged prominently. Spatial skills, particularly spatial working memory capacity, were related to enhanced comprehension of visual semantic information conveyed through gestures especially when this information was not present in the accompanying speech. This study highlights the critical role of spatial working memory in gesture processing and underscores the importance of examining the interplay among cognitive and contextual factors to understand the complex dynamics of multimodal language.

    Additional information

    supplementary file data via OSF
  • Özyürek, A. (2025). Multimodal language, diversity and neuro-cognition. In D. Bradley, K. Dziubalska-Kołaczyk, C. Hamans, I.-H. Lee, & F. Steurs (Eds.), Contemporary Linguistics: Integrating Languages, Communities, and Technologies (pp. 275-284). Leiden: Brill. doi:10.1163/9789004715608_023.
  • Rubianes, M., Jiménez-Ortega, L., Muñoz, F., Drijvers, L., Almeida-Rivera, T., Sánchez-García, J., Fondevila, S., Casado, P., & Martín-Loeches, M. (2025). Effects of subliminal emotional facial expressions on language comprehension as revealed by event-related brain potentials. Scientific Reports, 15: 20449. doi:10.1038/s41598-025-06037-2.

    Abstract

    Emotional facial expressions often take place during communicative face-to-face interactions. Yet little is known as to whether natural speech processing can be modulated by emotional expressions during online processing. Furthermore, the functional independence of syntactic processing from other cognitive and affective processes remains a long-standing debate in the literature. To address these issues, this study investigated the influence of masked emotional facial expressions on syntactic speech processing. Participants listened to sentences that could contain morphosyntactic anomalies while a masked emotional expression was presented for 16 ms (i.e., subliminally) just preceding the critical word. A larger Left Anterior Negativity (LAN) amplitude was observed for both emotional faces (i.e., happy and angry) compared to neutral ones. Moreover, a larger LAN amplitude was found for angry faces than for happy faces. Finally, a reduced P600 amplitude was observed only for angry faces when compared to neutral faces. Collectively, the results presented here indicate that first-pass syntactic parsing is influenced by emotional visual stimuli even under masked conditions and that this effect extends also to later linguistic processes. These findings constitute evidence in favor of an interactive view of language processing as integrated within a complex and integrated system for human communication.

    Additional information

    supplementary information
  • Rubianes, M., Muñoz, F., Drijvers, L., & Martín-Loeches, M. (2025). Brain signal variability is reduced during self-face processing irrespective of emotional facial expressions: Evidence from multiscale entropy analysis. Cortex, 192, 1-17. doi:10.1016/j.cortex.2025.08.007.

    Abstract

    Prior research shows that self-referential information (e.g., seeing one's own face) is prioritized in human cognition. However, the brain signal variability underlying self-processing remains scarcely treated in the literature. Additionally, less is known about whether the processing of self-referential visual content can be modulated by facial expressions of emotion, as these resemble more natural situations than neutral expressions. This study therefore investigated the brain signal variability underlying self-referential visual processing and its possible interaction with emotional facial expressions, as indexed by multiscale entropy analysis (MSE). This metric captures the temporal complexity or variability contained in neural patterns at varying timescales. Thirty-two participants were presented with distinctive facial identities (self, friend, and unknown) displaying different facial expressions (happy, neutral, and angry) and performed an identity recognition task. Our results showed that brain signal variability decreases in response to self-faces compared to other identities. Similarly, brain signal variability also decreases for friend faces relative to unknown faces. This reduction in complexity could be indicative of greater efficiency during the preferential processing of personally relevant stimuli. Furthermore, the data observed here show that self-processing is unaffected by facial expressions of emotion, suggesting an independent processing of identity from more dynamic facial information, particularly when the task demands are focused on identity recognition. These results provide novel evidence of the moment-to-moment brain signal variability involved in the identity of the self and others. The evidence presented here adds to a growing literature highlighting the relevance of neural variability for understanding brain-behavior relationships.
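    Illustrative sketch

    Multiscale entropy is defined as sample entropy computed on coarse-grained copies of a signal at increasing timescales. A minimal Python sketch of the general technique (the parameter defaults m = 2, r = 0.15 are common conventions, not necessarily the paper's):

    # MSE: coarse-grain the signal at scales 1..max_scale, then compute
    # sample entropy at each scale. Lower values = more regular dynamics.
    import numpy as np

    def sample_entropy(x, m, tol):
        """SampEn = -ln(A/B): B = template matches of length m, A = of
        length m + 1, Chebyshev distance <= tol, self-matches excluded."""
        def matches(mm):
            t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
            d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)
            return (np.sum(d <= tol) - len(t)) / 2
        B, A = matches(m), matches(m + 1)
        return -np.log(A / B) if A > 0 and B > 0 else np.inf

    def multiscale_entropy(x, max_scale=5, m=2, r=0.15):
        x = np.asarray(x, dtype=float)
        tol = r * x.std()            # tolerance fixed on the original series
        curve = []
        for tau in range(1, max_scale + 1):
            n = len(x) // tau        # average non-overlapping windows of tau
            coarse = x[:n * tau].reshape(n, tau).mean(axis=1)
            curve.append(sample_entropy(coarse, m, tol))
        return curve

    rng = np.random.default_rng(1)
    print(multiscale_entropy(rng.normal(size=1000)))  # white noise: falls with scale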
  • Rubio-Fernandez, P. (2025). First acquiring articles in a second language: A new approach to the study of language and social cognition. Lingua, 313: 103851. doi:10.1016/j.lingua.2024.103851.

    Abstract

    Pragmatic phenomena are characterized by extreme variability, which makes it difficult to draw sound generalizations about the role of social cognition in pragmatic language by and large. I introduce cultural evolutionary pragmatics as a new framework for the study of the interdependence between language and social cognition, and point to the study of common-ground management across languages and ages as a way to test the reliance of pragmatic language on social cognition. I illustrate this new research line with three experiments on article use by second language speakers, whose mother tongue lacks articles. These L2 speakers are known to find article use challenging, and it is often argued that their difficulties stem from articles being pragmatically redundant. Contrary to this view, the results of this exploratory study support the view that proficient article use requires automatizing basic socio-cognitive processes, offering a window into the interdependence between language and social cognition.
  • Rubio-Fernandez, P., Berke, M. D., & Jara-Ettinger, J. (2025). Tracking minds in communication. Trends in Cognitive Sciences, 29(3), 269-281. doi:10.1016/j.tics.2024.11.005.

    Abstract

    How might social cognition help us communicate through language? At what levels does this interaction occur? In classical views, social cognition is independent of language, and integrating the two can be slow, effortful, and error-prone. But new research into word-level processes reveals that communication is brimming with social micro-processes that happen in real time, guiding even the simplest choices like how we use adjectives, articles, and demonstratives. We interpret these findings in the context of advances in theoretical models of social cognition and propose a Communicative Mind-Tracking framework, where social micro-processes aren’t a secondary process in how we use language—they are fundamental to how communication works.
  • Slonimska, A., & Özyürek, A. (2025). Methods to study evolution of iconicity in sign languages. In L. Raviv, & C. Boeckx (Eds.), The Oxford Handbook of Approaches to Language Evolution (pp. 177-194). Oxford: Oxford University Press.

    Abstract

    Sign languages—the conventional languages of deaf communities—have been considered to provide a window into answering some questions regarding language emergence and evolution. In particular, iconicity, defined as the ‘existence of a structure-preserving mapping between mental models of linguistic form and meaning’, is generally regarded as a precursor to the arbitrary and segmental categorical structures found in spoken languages. However, iconic structures are omnipresent in sign languages at all levels of linguistic organization. Thus, there is a necessity for a more nuanced understanding of iconicity and its trajectory in language evolution. In this chapter, we outline different quantitative and qualitative methods to study iconicity and how one can operationalize them at lexical and discourse levels to investigate the role of iconicity in the evolution of sign languages.
  • Soberanes, M., Pérez-Ramírez, C. A., & Assaneo, M. F. (2025). Insights into the effect of general attentional state, coarticulation, and primed speech rate in phoneme production time. Journal of Speech, Language, and Hearing Research, 68(4), 1773-1783. doi:10.1044/2025_JSLHR-24-00595.

    Abstract

    Purpose:
    This study aimed to identify how a set of predefined factors modulates phoneme articulation time within a speaker.
    Method:
    We used a custom in-lab system that records lip muscle activity through electromyography signals, aligned with the produced speech, to measure phoneme articulation time. Twenty Spanish-speaking participants (12 females) were evaluated while producing sequences of a consonant–vowel syllable, with each sequence consisting of repeated articulations of either /pa/ or /pu/. Before starting the sequences, participants underwent a priming step with either a fast or slow speech rate. Additionally, the general attentional state level was assessed at the beginning, middle, and end of the protocol. To analyze the variability in the duration of /p/ and vowel articulation, we fitted individual linear mixed models considering three factors: general attentional state level, priming rate, and coarticulation effects (for /p/, i.e., followed by /a/ or /u/) or phoneme identity (for vowels, i.e., being /a/ or /u/).
    Results:
    We found that the level of general attentional state positively correlated with production time for both the consonant /p/ and the vowels. Additionally, /p/ production was influenced by the nature of the following vowel (i.e., coarticulation effects), while vowel production time was affected by the primed speech rate.
    Conclusions:
    Phoneme duration appears to be influenced by both stable, speaker-specific characteristics (idiosyncratic traits) and internal, state-dependent factors related to the speaker's condition at the time of speech production. While some factors affect both consonants and vowels, others specifically modify only one of these types.

    Additional information

    supplemental material
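
    A minimal sketch of this modeling step, for illustration only (not the authors' code): fitting one speaker's linear mixed model of /p/ articulation time with statsmodels. All column names and the grouping structure are hypothetical.

    ```python
    # Hypothetical illustration of the per-speaker linear mixed model
    # described in the abstract; column names are invented.
    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per articulation: duration_p (ms), attention (general
    # attentional state score), priming ("fast"/"slow"), next_vowel
    # ("a"/"u", the coarticulation factor), sequence_id (repetition sequence).
    df = pd.read_csv("speaker_01_articulations.csv")

    model = smf.mixedlm(
        "duration_p ~ attention + priming + next_vowel",
        data=df,
        groups=df["sequence_id"],  # random intercept per sequence
    )
    print(model.fit().summary())
    ```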
  • Sümer, B., & Özyürek, A. (2025). Action bias in describing object locations by signing children. Sign Language and Linguistics. Advance online publication. doi:10.1075/sll.24008.sum.

    Abstract

    This study investigates the role of action bias in the acquisition of classifier constructions by deaf children acquiring Turkish Sign Language (TİD). While classifier handshapes are morphologically complex and iconic, deaf children (aged 7–9) were found to prefer handling classifiers (reflecting the actions performed by agents) more than signing adults, even in contexts requiring entity classifiers (reflecting the visual properties of their referents). The findings reveal that children’s frequent use of action-based lexical signs for nouns influenced their classifier preferences, suggesting a cognitive bias toward motoric representations. Furthermore, our results reveal the use of handling classifiers in intransitive contexts, even by adult signers, indicating a new type of variability in classifier use that has not previously been reported for other sign languages. These results provide new insights into how iconicity and lexical context shape the developmental trajectory of classifier constructions in sign language acquisition.
  • Tanguay, A. F. N., Clough, S., McCurdy, R. A., Padilla, V.-G., Lord, K. M., Brown-Schmidt, S., & Duff, M. C. (2025). A scoping review on conversational memory and characteristics of conversations in Alzheimer's disease. Journal of Speech, Language, and Hearing Research, 68(12), 5870-5909. doi:10.1044/2025_JSLHR-24-00780.

    Abstract

    Purpose:
    Typical late-onset Alzheimer's disease (AD) compromises episodic memory, the ability to encode new events and recollect past events. Much of the research on episodic memory in AD has relied on lab-based memory tasks (e.g., word lists, short stories). It is unclear how well these tasks characterize the impact of episodic memory impairments across different domains of everyday life, including conversational memory. The goal of the review was to establish what is known about conversational memory in AD, that is, memory for the content of conversations one overhears or in which one participates, such as utterances said and corresponding referents (e.g., “I remember discussing medical decisions with my children and I said […] and they responded […]”).
    Method:
    In this scoping review, we followed the JBI Manual for Evidence Synthesis and the PRISMA Extension for Scoping Reviews guidelines. We retained 121 reports on conversation and three reports on conversational memory out of the 8,351 unique records found on PubMed, CINAHL, PsycINFO, Web of Science, Cochrane, and Embase and in relevant reviews. Included reports had to involve conversation in any format and on any topic, include an AD population, be peer-reviewed, be in English, and be published between 1990 and 2022.
    Results:
    None of the studies investigated memory for spontaneous conversation. Although most studies on conversation did report on key characteristics of the interactional context (e.g., level of structure, number and category of conversational partners), studies also left several important details unspecified, such as hearing/vision (omitted in 67% of studies) and the diagnostic process (omitted in 33% of studies). Studies described a broad range of behaviors during conversation, most concerning verbal behaviors (e.g., repetitions, disfluency, ambiguity) and only 29% concerning nonverbal behaviors (e.g., facial expression, head and hand gestures, eye gaze). Given the rarity of studies on conversational memory, we primarily summarized existing knowledge about conversation and methodological considerations to inspire hypotheses for future research on conversational memory in AD and to illuminate decisions regarding study design.
    Conclusions:
    This review revealed a wide gap in knowledge on conversational memory in AD and offers a path to accelerating research on the topic. Conversational memory may be an important factor in promoting independence, participation in health care, and social well-being.

    Additional information

    supplemental material
  • Ter Bekke, M., Drijvers, L., & Holler, J. (2025). Co-speech hand gestures are used to predict upcoming meaning. Psychological Science, 36(4), 237-248. doi:10.1177/09567976251331041.

    Abstract

    In face-to-face conversation, people use speech and gesture to convey meaning. Seeing gestures alongside speech facilitates comprehenders’ language processing, but crucially, the mechanisms underlying this facilitation remain unclear. We investigated whether comprehenders use the semantic information in gestures, which typically precede related speech, to predict upcoming meaning. Dutch adults listened to questions asked by a virtual avatar. Questions were accompanied by an iconic gesture (e.g., typing) or a meaningless control movement (e.g., an arm scratch), followed by a short pause and a target word (e.g., “type”). A cloze experiment showed that gestures improved explicit predictions of upcoming target words. Moreover, an EEG experiment showed that gestures reduced alpha and beta power during the pause, indicating anticipation, and reduced N400 amplitudes, demonstrating facilitated semantic processing. Thus, comprehenders use iconic gestures to predict upcoming meaning. Theories of linguistic prediction should incorporate communicative bodily signals as predictive cues to capture how language is processed in face-to-face interaction.

    Additional information

    supplementary material
  • Tilston, O., Holler, J., & Bangerter, A. (2025). Opening social interactions: The coordination of approach, gaze, speech and handshakes during greetings. Cognitive Science, 49(2): e70049. doi:10.1111/cogs.70049.

    Abstract

    Despite the importance of greetings for opening social interactions, their multimodal coordination processes remain poorly understood. We used a naturalistic, lab-based setup where pairs of unacquainted participants approached and greeted each other while unaware their greeting behavior was studied. We measured the prevalence and time course of multimodal behaviors potentially culminating in a handshake, including motor behaviors (e.g., walking, standing up, hand movements like raise, grasp, and retraction), gaze patterns (using eye-tracking glasses), and speech (close and distant verbal salutations). We further manipulated the visibility of partners’ eyes to test its effect on gaze. Our findings reveal that gaze to a partner's face increases over the course of a greeting, but is partly averted during approach and is influenced by the visibility of partners’ eyes. Gaze helps coordinate handshakes, by signaling intent and guiding the grasp. The timing of adjacency pairs in verbal salutations is comparable to the precision of floor transitions in the main body of conversations, and varies according to greeting phase, with distant salutation pair parts featuring more gaps and close salutation pair parts featuring more overlap. Gender composition and a range of multimodal behaviors affected whether pairs chose to shake hands or not. These findings fill several gaps in our understanding of greetings and provide avenues for future research, including advancements in social robotics and human−robot interaction.
  • Trujillo, J. P., & Holler, J. (2025). Multimodal information density is highest in question beginnings, and early entropy is associated with fewer but longer visual signals. Discourse Processes, 62(2), 69-88. doi:10.1080/0163853X.2024.2413314.

    Abstract

    When engaged in spoken conversation, speakers convey meaning using both speech and visual signals, such as facial expressions and manual gestures. An important question is how information is distributed in utterances during face-to-face interaction when information from visual signals is also present. In a corpus of casual Dutch face-to-face conversations, we focus on spoken questions in particular because they occur frequently, thus constituting core building blocks of conversation. We quantified information density (i.e., lexical entropy and surprisal) and the number and relative duration of facial and manual signals. We tested whether lexical information density or the number of visual signals differed between the first and last halves of questions, as well as whether the number of visual signals occurring in the less-predictable portion of a question was associated with the lexical information density of the same portion of the question in a systematic manner. We found that both information density and the number of visual signals were higher in the first half of questions, and that lexical entropy specifically was associated with fewer, but longer, visual signals. The multimodal front-loading of questions and the complementary distribution of visual signals and high-entropy words in Dutch casual face-to-face conversations may have implications for the parallel processes of utterance comprehension and response planning during turn-taking.

    Additional information

    supplemental material
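
    The information-density measures named above lend themselves to a toy illustration. The sketch below (not the authors' pipeline) computes unigram surprisal and lexical entropy and compares the two halves of a question; the tiny "corpus" and the unigram model are deliberately simplistic stand-ins for a proper language model.

    ```python
    # Toy unigram surprisal and entropy; a real analysis would estimate
    # probabilities from a large corpus with a proper language model.
    import math
    from collections import Counter

    corpus = "wat doe je morgen wat eet je vanavond lekker".split()
    counts = Counter(corpus)
    total = sum(counts.values())

    def surprisal(word):
        # Surprisal in bits: -log2 p(word)
        return -math.log2(counts[word] / total)

    # Lexical entropy of the unigram distribution: H = -sum p * log2 p
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    question = "wat eet je vanavond".split()
    half = len(question) // 2
    first, last = question[:half], question[half:]
    print("mean surprisal, first half:", sum(map(surprisal, first)) / len(first))
    print("mean surprisal, last half:", sum(map(surprisal, last)) / len(last))
    print("lexical entropy (bits):", entropy)
    ```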
  • Trujillo, J. P., Dyer, R. M. K., & Holler, J. (2025). Dyadic differences in empathy scores are associated with kinematic similarity during conversational question-answer pairs. Discourse Processes, 62(3), 195-213. doi:10.1080/0163853X.2025.2467605.

    Abstract

    During conversation, speakers coordinate and synergize their behaviors at multiple levels, and in different ways. The extent to which individuals converge or diverge in their behaviors during interaction may relate to interpersonal differences relevant to social interaction, such as empathy as measured by the empathy quotient (EQ). An association between interpersonal differences in empathy and interpersonal entrainment could help shed light on how interlocutor characteristics influence interpersonal entrainment. We investigated this possibility in a corpus of unconstrained conversation between dyads. We used dynamic time warping to quantify entrainment between interlocutors of head motion, hand motion, and maximum speech f0 during question–response sequences. We additionally calculated interlocutor differences in EQ scores. We found that, for both head and hand motion, a greater difference in EQ was associated with higher entrainment. Thus, we consider that people who are dissimilar in EQ may need to “ground” their interaction with low-level movement entrainment. There was no significant relationship between f0 entrainment and EQ score differences.
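
    Dynamic time warping, the entrainment measure used above, fits in a few lines. This is a generic textbook implementation for illustration, not the authors' code; the motion signals are synthetic placeholders.

    ```python
    # Classic O(n*m) dynamic time warping over two 1-D motion signals;
    # a lower distance is read as higher kinematic entrainment.
    import numpy as np

    def dtw_distance(a, b):
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    # Hypothetical per-frame head speed for a question-response pair
    head_asker = np.sin(np.linspace(0, 3 * np.pi, 120))
    head_answerer = np.sin(np.linspace(0, 3 * np.pi, 100) + 0.2)
    print("DTW distance (head motion):", dtw_distance(head_asker, head_answerer))
    ```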
  • Ünal, E., Kırbaşoğlu, K., Karadöller, D. Z., Sumer, B., & Özyürek, A. (2025). Gesture reduces mapping difficulties in the development of spatial language depending on the complexity of spatial relations. Cognitive Science, 49(2): e70046. doi:10.1111/cogs.70046.

    Abstract

    In spoken languages, children acquire locative terms in a cross-linguistically stable order. Terms similar in meaning to in and on emerge earlier than those similar to front and behind, followed by left and right. This order has been attributed to the complexity of the relations expressed by different locative terms. An additional possibility is that children may be delayed in expressing certain spatial meanings partly due to difficulties in discovering the mappings between locative terms in speech and the spatial relations they express. We investigate cognitive and mapping difficulties in the domain of spatial language by comparing how children map spatial meanings onto speech versus visually motivated forms in co-speech gesture across different spatial relations. Twenty-four 8-year-old and 23 adult native Turkish speakers described four-picture displays where the target picture depicted in-on, front-behind, or left-right relations between objects. As the complexity of spatial relations increased, children were more likely to rely on gestures as opposed to speech to informatively express the spatial relation. Adults overwhelmingly relied on speech to informatively express the spatial relation, and this did not change across the complexity of spatial relations. Nevertheless, even when spatial expressions in both speech and co-speech gesture were considered, children lagged behind adults when expressing the most complex left-right relations. These findings suggest that cognitive development and mapping difficulties introduced by the modality of expression interact in shaping the development of spatial language.

    Additional information

    list of stimuli and descriptions
  • Yılmaz, B., Doğan, I., Karadöller, D. Z., Demir-Lira, Ö. E., & Göksun, T. (2025). Parental attitudes and beliefs about mathematics and the use of gestures in children’s math development. Cognitive Development, 73: 101531. doi:10.1016/j.cogdev.2024.101531.

    Abstract

    Children vary in mathematical skills even before formal schooling. The current study investigated how parental math beliefs, parents’ math anxiety, and children's spontaneous gestures contribute to preschool-aged children’s math performance. Sixty-three Turkish-reared children (33 girls, Mage = 49.9 months, SD = 3.68) were assessed on verbal counting, cardinality, and arithmetic tasks (nonverbal and verbal). Results showed that parental math beliefs were related to children’s verbal counting, cardinality, and arithmetic scores. Children whose parents had more positive math beliefs along with low math anxiety scored highest on the cardinality task. Children’s gesture use was also related to lower cardinality performance, and the relation between parental math beliefs and children’s performance became stronger when child gestures were absent. These findings highlight the importance of parent- and child-related contributors in explaining the variability in preschool-aged children’s math skills.
  • Zora, H., Kabak, B., & Hagoort, P. (2025). Relevance of prosodic focus and lexical stress for discourse comprehension in Turkish: Evidence from psychometric and electrophysiological data. Journal of Cognitive Neuroscience, 37(3), 693-736. doi:10.1162/jocn_a_02262.

    Abstract

    Prosody underpins various linguistic domains ranging from semantics and syntax to discourse. For instance, prosodic information in the form of lexical stress modifies meanings and, with them, the syntactic contexts of words, as in Turkish kaz-má "pickaxe" (noun) versus káz-ma "do not dig" (imperative). Likewise, prosody indicates the focused constituent of an utterance, such as the noun phrase filling the wh-spot in a dialogue like "What did you eat? I ate ____." In the present study, we investigated the relevance of such prosodic variations for discourse comprehension in Turkish. We aimed to answer how lexical stress and prosodic focus mismatches on critical noun phrases (resulting in grammatical anomalies involving semantics and syntax, and in discourse-level anomalies, respectively) affect the perceived correctness of an answer to a question in a given context. To that end, 80 native speakers of Turkish, 40 participating in a psychometric experiment and 40 participating in an EEG experiment, were asked to judge the acceptability of prosodic mismatches that occurred either separately or concurrently. Psychometric results indicated that lexical stress mismatch led to a lower correctness score than prosodic focus mismatch, and combined mismatch received the lowest score. Consistent with the psychometric data, EEG results revealed an N400 effect to combined mismatch, and this effect was followed by a P600 response to lexical stress mismatch. Conjointly, these results suggest that every source of prosodic information is immediately available and codetermines the interpretation of an utterance; however, semantically and syntactically relevant lexical stress information is assigned more significance by the language comprehension system than prosodic focus information.
  • Zora, H., Bowin, H., Heldner, M., Riad, T., & Hagoort, P. (2025). Lexical and information structure functions of prosody and their relevance for spoken communication: Evidence from psychometric and electroencephalographic data. Journal of Cognitive Neuroscience, 37(10), 1633-1665. doi:10.1162/jocn_a_02334.

    Abstract

    Prosody not only distinguishes “lexical” meaning but also plays a key role in information packaging by highlighting the most relevant constituent of the discourse, namely, “focus” information. The present study investigated the role of lexical and focus functions of prosody in the coherent interpretation of linguistic input. To this end, we manipulated the correctness of prosodic markers in the context and scrutinized how listeners evaluate these violations—whether they result in lexical or focus anomalies—using psychometric and EEG measures. Psychometric data from 40 participants indicated that prosodic violations were judged as incorrect by the listeners both at the lexical and focus levels, with focus level violations leading to lower correctness scores than lexical level violations, and combined violations receiving the lowest scores. EEG data from 20 participants documented a strong N400 effect (350–550 msec) in response to combined violations, and a late posterior negativity (600–900 msec) present only for combined violations and focus-level violations. Consistent with the psychometric data, the EEG data suggest that prosodic violations at the focus level result in higher costs for comprehension than prosodic violations at the lexical level, whereas combined prosodic violations most significantly disrupt the interpretation. Taken together, these findings suggest that the language comprehension system is sensitive to accurate representations of both lexical and information structure prosody, and benefits from the interaction between them; however, they are weighted differently based on their relevance for a functioning spoken communication.
  • Zora, H., Bowin, H., Heldner, M., Riad, T., & Hagoort, P. (2025). Functional roles of Swedish pitch accents and their phonological and cognitive markedness. Neuropsychologia, 219: 109273. doi:10.1016/j.neuropsychologia.2025.109273.

    Abstract

    In Swedish, words are associated with either of two pitch contours, labelled Accent 1 and Accent 2. At least one of them is taken to be phonologically and cognitively marked. Besides encoding lexical tonal distinctions, these accents reflect intonational prominence. Drawing on data from psychometric and electroencephalographic (EEG) measures, we scrutinized the functional load of the accents for the processing of linguistic input, and explored any potential processing differences between Accent 1 and Accent 2. Experimental stimuli consisted of one hundred sets of auditory dialogues, where test words were accented either appropriately or inappropriately within their respective contexts. Native speakers of Central Swedish were tasked with judging the correctness of sentences containing the test words, actively in the psychometric paradigm and passively in the EEG paradigm. Psychometric data from forty participants revealed that accent violations exerted a statistically significant negative impact on correctness judgements. Both Accent 1 and Accent 2 violations were deemed incorrect by the listeners, indicating that listeners use both of them to arrive at the correct interpretation of the linguistic input. Moreover, there was a statistically significant difference in the perceived correctness of violations depending on the accent pattern. Accent 2 violations received a lower correctness rating than Accent 1 violations, suggesting that listeners are more sensitive to accent violations in Accent 2 words than in Accent 1 words. EEG data from twenty participants were in accordance with the psychometric data, and documented larger negative ERP responses, at both early and later latencies, to Accent 2 violations compared to Accent 1 violations, reflecting the neurocognitive difficulty associated with processing the linguistic input. Put differently, applying the wrong accent pattern to Accent 2 words resulted in higher costs for spoken communication than applying it to Accent 1 words, in line with the notion that Accent 2 is marked both phonologically and cognitively in Central Swedish. This pattern of results provides evidence that the brain not only extracts and utilizes pitch accents for a coherent interpretation of the linguistic input but also treats them differently depending on their phonological and cognitive markedness.
  • Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (2024). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2435-2442).

    Abstract

    When we communicate with others, we often repeat aspects of each other's communicative behavior, such as sentence structures and words. Such behavioral alignment has mostly been studied for speech or text. Yet, language use is mostly multimodal, flexibly using speech and gestures to convey messages. Here, we explore the use of alignment in speech (words) and co-speech gestures (iconic gestures) in a referential communication task aimed at finding labels for novel objects in interaction. In particular, we investigate how people flexibly use lexical and gestural alignment to create shared labels for novel objects and whether alignment in speech and gesture are related over time. The present study shows that interlocutors establish shared labels multimodally and that alignment in both words and iconic gestures is used throughout the interaction. We also show that the amount of lexical alignment is positively associated with the amount of gestural alignment over time, suggesting a close relationship between alignment in the vocal and manual modalities.

    Additional information

    link to eScholarship
  • Ben-Ami, S., Shukla, V., Gupta, P., Shah, P., Ralekar, C., Ganesh, S., Gilad-Gutnick, S., Rubio-Fernández, P., & Sinha, P. (2024). Form perception as a bridge to real-world functional proficiency. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 6094-6102).

    Abstract

    Recognizing the limitations of standard vision assessments in capturing the real-world capabilities of individuals with low vision, we investigated the potential of the Seguin Form Board Test (SFBT), a widely-used intelligence assessment employing a visuo-haptic shape-fitting task, as an estimator of vision's practical utility. We present findings from 23 children from India, who underwent treatment for congenital bilateral dense cataracts, and 21 control participants. To assess the development of functional visual ability, we conducted the SFBT and the standard measure of visual acuity, before and longitudinally after treatment. We observed a dissociation in the development of shape-fitting and visual acuity. Improvements of patients' shape-fitting preceded enhancements in their visual acuity after surgery and emerged even with acuity worse than that of control participants. Our findings highlight the importance of incorporating multi-modal and cognitive aspects into evaluations of visual proficiency in low-vision conditions, to better reflect vision's impact on daily activities.

    Additional information

    link to eScholarship
  • Clough, S., Brown-Schmidt, S., Cho, S.-J., & Duff, M. C. (2024). Reduced on-line speech gesture integration during multimodal language processing in adults with moderate-severe traumatic brain injury: Evidence from eye-tracking. Cortex, 181, 26-46. doi:10.1016/j.cortex.2024.08.008.

    Abstract

    Background
    Language is multimodal and situated in rich visual contexts. Language is also incremental, unfolding moment-to-moment in real time, yet few studies have examined how spoken language interacts with gesture and visual context during multimodal language processing. Gesture is a rich communication cue that is integrally related to speech and often depicts concrete referents from the visual world. Using eye-tracking in an adapted visual world paradigm, we examined how participants with and without moderate-severe traumatic brain injury (TBI) use gesture to resolve temporary referential ambiguity.

    Methods
    Participants viewed a screen with four objects and one video. The speaker in the video produced sentences (e.g., “The girl will eat the very good sandwich”), paired with either a meaningful gesture (e.g., sandwich-holding gesture) or a meaningless grooming movement (e.g., arm scratch) at the verb “will eat.” We measured participants’ gaze to the target object (e.g., sandwich), a semantic competitor (e.g., apple), and two unrelated distractors (e.g., piano, guitar) during the critical window between movement onset in the gesture modality and onset of the spoken referent in speech.

    Results
    Participants both with and without TBI were more likely to fixate the target when the speaker produced a gesture compared to a grooming movement; however, relative to non-injured participants, the effect was significantly attenuated in the TBI group.

    Discussion
    We demonstrated evidence of reduced speech-gesture integration in participants with TBI relative to non-injured peers. This study advances our understanding of the communicative abilities of adults with TBI and could lead to a more mechanistic account of the communication difficulties adults with TBI experience in rich communication contexts that require the processing and integration of multiple co-occurring cues. This work has the potential to increase the ecological validity of language assessment and provide insights into the cognitive and neural mechanisms that support multimodal language processing.

    Additional information

    supplementary data
  • Dikshit, A. P., Das, D., Samal, R. R., Parashar, K., Mishra, C., & Parashar, S. (2024). Optimization of (Ba1-xCax)(Ti0.9Sn0.1)O3 ceramics in X-band using Machine Learning. Journal of Alloys and Compounds, 982: 173797. doi:10.1016/j.jallcom.2024.173797.

    Abstract

    Developing efficient electromagnetic interference (EMI) shielding materials has become increasingly important. This paper reports a series of (Ba1-xCax)(Ti0.9Sn0.1)O3 (BCTS) ceramics (x = 0, 0.01, 0.05, & 0.1) synthesized by the conventional method and studied for EMI shielding applications in the X-band (8-12.4 GHz). The EMI shielding properties and S parameters (S11 & S12) of BCTS ceramic pellets were measured in the 8-12.4 GHz frequency range using a Vector Network Analyser (VNA). The BCTS ceramic pellets for x = 0.05 showed a maximum total effective shielding of 46 dB, indicating good shielding behaviour for high-frequency applications. However, developing lead-free ceramics at different concentrations usually requires iterative experiments, resulting in longer development cycles and higher costs. To address this, we used a machine learning (ML) strategy to predict the EMI shielding at different concentrations and experimentally verified the concentration predicted to give the best EMI shielding. The ML model predicted BCTS ceramics with concentrations x = 0.06, 0.07, 0.08, and 0.09 to have higher shielding values. On experimental verification, a shielding value of 58 dB was obtained for x = 0.08, significantly higher than the best value obtained experimentally before applying the ML approach. Our results show the potential of ML to accelerate optimal material development, significantly reducing the need for repeated experimental measurements.
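
    The workflow described above (fit a model to measured compositions, predict untested ones, verify the best candidate) can be sketched as follows. This is a hypothetical illustration, not the authors' model: apart from the 46 dB value for x = 0.05, the training targets below are invented placeholders.

    ```python
    # Hypothetical sketch: regress shielding effectiveness on composition x
    # and screen unsynthesized concentrations.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    x_measured = np.array([[0.0], [0.01], [0.05], [0.1]])
    shielding_db = np.array([30.0, 35.0, 46.0, 40.0])  # placeholders except 46 dB

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(x_measured, shielding_db)

    # Screen the candidate concentrations named in the abstract
    x_candidates = np.array([[0.06], [0.07], [0.08], [0.09]])
    for x, pred in zip(x_candidates.ravel(), model.predict(x_candidates)):
        print(f"x = {x:.2f}: predicted shielding {pred:.1f} dB")
    ```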
  • Dona, L., & Schouwstra, M. (2024). Balancing regularization and variation: The roles of priming and motivatedness. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 130-133). Nijmegen: The Evolution of Language Conferences.
  • Evans, M. J., Clough, S., Duff, M. C., & Brown‐Schmidt, S. (2024). Temporal organization of narrative recall is present but attenuated in adults with hippocampal amnesia. Hippocampus, 34(8), 438-451. doi:10.1002/hipo.23620.

    Abstract

    Studies of the impact of brain injury on memory processes often focus on the quantity and episodic richness of those recollections. Here, we argue that the organization of one's recollections offers critical insights into the impact of brain injury on functional memory. It is well-established in studies of word list memory that free recall of unrelated words exhibits a clear temporal organization. This temporal contiguity effect refers to the fact that the order in which word lists are recalled reflects the original presentation order. Little is known, however, about the organization of recall for semantically rich materials, nor how recall organization is impacted by hippocampal damage and memory impairment. The present research is the first study, to our knowledge, of temporal organization in semantically rich narratives in three groups: (1) Adults with bilateral hippocampal damage and severe declarative memory impairment, (2) adults with bilateral ventromedial prefrontal cortex (vmPFC) damage and no memory impairment, and (3) demographically matched non-brain-injured comparison participants. We find that although the narrative recall of adults with bilateral hippocampal damage reflected the temporal order in which those narratives were experienced above chance levels, their temporal contiguity effect was significantly attenuated relative to comparison groups. In contrast, individuals with vmPFC damage did not differ from non-brain-injured comparison participants in temporal contiguity. This pattern of group differences yields insights into the cognitive and neural systems that support the use of temporal organization in recall. These data provide evidence that the retrieval of temporal context in narrative recall is hippocampal-dependent, whereas damage to the vmPFC does not impair the temporal organization of narrative recall. This evidence of limited but demonstrable organization of memory in participants with hippocampal damage and amnesia speaks to the power of narrative structures in supporting meaningfully organized recall despite memory impairment.

    Additional information

    supporting information
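
    The temporal contiguity effect examined above can be quantified in several ways; the sketch below is one simple variant, for illustration only (not the authors' analysis). It scores how closely a recall order follows presentation order and compares the observed score against a shuffled baseline, using a hypothetical recall sequence.

    ```python
    # Simple temporal-contiguity score: mean absolute lag between
    # successively recalled story units; smaller = recall follows
    # presentation order more closely. Permutation gives a chance baseline.
    import random

    def mean_abs_lag(recall_order):
        lags = [abs(b - a) for a, b in zip(recall_order, recall_order[1:])]
        return sum(lags) / len(lags)

    recalled = [0, 1, 2, 5, 3, 4, 8]   # hypothetical presentation indices
    observed = mean_abs_lag(recalled)

    random.seed(0)
    baseline = []
    for _ in range(10_000):
        shuffled = recalled[:]
        random.shuffle(shuffled)
        baseline.append(mean_abs_lag(shuffled))
    p = sum(b <= observed for b in baseline) / len(baseline)
    print(f"observed mean |lag| = {observed:.2f}, permutation p = {p:.3f}")
    ```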
  • Feller, J. J., Duff, M. C., Clough, S., Jacobson, G. P., Roberts, R. A., & Romero, D. J. (2024). Evidence of peripheral vestibular impairment among adults with chronic moderate–severe traumatic brain injury. American Journal of Audiology, 33, 1118-1134. doi:10.1044/2024_AJA-24-00058.

    Abstract

    Purpose:
    Traumatic brain injury (TBI) is a leading cause of death and disability among adults in the United States. There is evidence to suggest the peripheral vestibular system is vulnerable to damage in individuals with TBI. However, there are limited prospective studies that describe the type and frequency of vestibular impairment in individuals with chronic moderate–severe TBI (> 6 months postinjury).

    Method:
    Cervical and ocular vestibular evoked myogenic potentials (VEMPs) and video head impulse test (vHIT) were used to assess the function of otolith organ and horizontal semicircular canal (hSCC) pathways in adults with chronic moderate–severe TBI and in noninjured comparison (NC) participants. Self-report questionnaires were administered to participants with TBI to determine prevalence of vestibular symptoms and quality of life associated with those symptoms.

    Results:
    Chronic moderate–severe TBI was associated with a greater degree of impairment in otolith organ, rather than hSCC, pathways. About 63% of participants with TBI had abnormal VEMP responses, compared to only ~10% with abnormal vHIT responses. The NC group had significantly fewer abnormal VEMP responses (~7%), while none of the NC participants had abnormal vHIT responses. As many as 80% of participants with TBI reported vestibular symptoms, and up to 36% reported that these symptoms negatively affected their quality of life.

    Conclusions:
    Adults with TBI reported vestibular symptoms and decreased quality of life related to those symptoms and had objective evidence of peripheral vestibular impairment. Vestibular testing for adults with chronic TBI who report persistent dizziness and imbalance may serve as a guide for treatment and rehabilitation in these individuals.
  • Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).

    Abstract

    Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions (expressions with a common lexical core used by both speakers within a dialogue) and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the number of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.

    Additional information

    link to eScholarship
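
    The detection step lends itself to a toy illustration. The sketch below is a strong simplification of the paper's method (not the authors' code): lemmatisation is assumed to have happened upstream, and shared constructions are approximated as n-grams produced by both speakers.

    ```python
    # Toy detection of shared lemmatised constructions: n-grams with a
    # common lexical core that occur in both speakers' utterances.
    from collections import Counter

    def ngrams(lemmas, n_max=3):
        for n in range(1, n_max + 1):
            for i in range(len(lemmas) - n + 1):
                yield tuple(lemmas[i : i + n])

    # Hypothetical pre-lemmatised utterances per speaker
    speaker_a = [["the", "spiky", "one"], ["the", "spiky", "ball", "thing"]]
    speaker_b = [["yes", "the", "spiky", "one"], ["spiky", "ball"]]

    counts_a = Counter(g for utt in speaker_a for g in ngrams(utt))
    counts_b = Counter(g for utt in speaker_b for g in ngrams(utt))

    shared = {g: counts_a[g] + counts_b[g]
              for g in counts_a.keys() & counts_b.keys()}
    for construction, freq in sorted(shared.items(), key=lambda kv: -kv[1]):
        print(" ".join(construction), freq)
    ```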
  • Ghaleb, E., Khaertdinov, B., Pouw, W., Rasenberg, M., Holler, J., Ozyurek, A., & Fernandez, R. (2024). Learning co-speech gesture representations in dialogue through contrastive learning: An intrinsic evaluation. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024) (pp. 274-283).

    Abstract

    In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gesture representations given gestures’ variability and their relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with human-annotated pairwise gesture similarity. Moreover, we perform a diagnostic probing analysis to assess the possibility of recovering interpretable gesture features from the learned representations. Our results show a significant positive correlation with human-annotated gesture similarity and reveal that the similarity between the learned representations is consistent with well-motivated patterns related to the dynamics of dialogue interaction. Moreover, our findings demonstrate that several features concerning the form of gestures can be recovered from the latent representations. Overall, this study shows that multimodal contrastive learning is a promising approach for learning gesture representations, which opens the door to using such representations in larger-scale gesture analysis studies.
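
    The core training objective can be illustrated with a minimal multimodal contrastive sketch in PyTorch (not the authors' model): a gesture's skeletal embedding is pulled toward its co-occurring speech embedding with an InfoNCE loss over in-batch negatives. All feature dimensions and encoder shapes are illustrative.

    ```python
    # Minimal multimodal contrastive objective (InfoNCE); matched
    # gesture-speech pairs sit on the diagonal of the similarity matrix.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        def __init__(self, in_dim, out_dim=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, out_dim))

        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    skeleton_enc = Encoder(in_dim=75)   # e.g. flattened joint features
    speech_enc = Encoder(in_dim=40)     # e.g. pooled acoustic features

    def info_nce(z_a, z_b, temperature=0.07):
        logits = z_a @ z_b.T / temperature      # pairwise similarities
        targets = torch.arange(z_a.size(0))     # matched pairs on diagonal
        return F.cross_entropy(logits, targets)

    # One hypothetical step on a batch of 32 gesture-speech pairs
    loss = info_nce(skeleton_enc(torch.randn(32, 75)),
                    speech_enc(torch.randn(32, 40)))
    loss.backward()
    print("contrastive loss:", float(loss))
    ```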
  • Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 3995-4003). doi:10.1109/WACV57701.2024.00396.

    Abstract

    Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
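
    The reframing from binary classification to multi-phase sequence labeling can be sketched as follows. This minimal PyTorch tagger (not the authors' system) assigns one movement-phase label per frame; the Conditional Random Field decoding layer used in the paper is omitted for brevity.

    ```python
    # Per-frame gesture-phase tagging with a Transformer encoder.
    import torch
    import torch.nn as nn

    PHASES = ["outside", "preparation", "stroke", "retraction"]

    class GesturePhaseTagger(nn.Module):
        def __init__(self, feat_dim=75, d_model=128, n_phases=len(PHASES)):
            super().__init__()
            self.proj = nn.Linear(feat_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.classifier = nn.Linear(d_model, n_phases)

        def forward(self, skeleton_seq):               # (batch, time, feat)
            h = self.encoder(self.proj(skeleton_seq))  # contextual embeddings
            return self.classifier(h)                  # per-frame phase logits

    tagger = GesturePhaseTagger()
    frames = torch.randn(2, 60, 75)    # 2 windows of 60 frames, 75 features
    labels = tagger(frames).argmax(-1)
    print(labels.shape)                # torch.Size([2, 60])
    ```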
  • Gordon, J. K., & Clough, S. (2024). The Flu-ID: A new evidence-based method of assessing fluency in aphasia. American Journal of Speech-Language Pathology, 33, 2972-2990. doi:10.1044/2024_AJSLP-23-00424.

    Abstract

    Purpose:
    Assessing fluency in aphasia is diagnostically important for determining aphasia type and severity and therapeutically important for determining appropriate treatment targets. However, wide variability in the measures and criteria used to assess fluency, as revealed by a recent survey of clinicians (Gordon & Clough, 2022), results in poor reliability. Furthermore, poor specificity in many fluency measures makes it difficult to identify the underlying impairments. Here, we introduce the Flu-ID Aphasia, an evidence-based tool that provides a more informative method of assessing fluency by capturing the range of behaviors that can affect the flow of speech in aphasia.

    Method:
    The development of the Flu-ID was based on prior evidence about factors underlying fluency (Clough & Gordon, 2020; Gordon & Clough, 2020) and clinical perceptions about the measurement of fluency (Gordon & Clough, 2022). Clinical utility is maximized by automated counting of fluency behaviors in an Excel template. Reliability is maximized by outlining thorough guidelines for transcription and coding. Eighteen narrative samples representing a range of fluency were coded independently by the authors to examine the Flu-ID's utility, reliability, and validity.

    Results:
    Overall reliability was very good, with point-to-point agreement of 86% between coders. Ten of the 12 dimensions showed good to excellent reliability. Validity analyses indicated that Flu-ID scores were similar to clinician ratings on some dimensions, but differed on others. Possible reasons and implications of the discrepancies are discussed, along with opportunities for improvement.

    Conclusions:
    The Flu-ID assesses fluency in aphasia using a consistent and comprehensive set of measures and semi-automated procedures to generate individual fluency profiles. The profiles generated in the current study illustrate how similar ratings of fluency can arise from different underlying impairments. Supplemental materials include an analysis template, extensive guidelines for transcription and coding, a completed sample, and a quick reference guide.

    Additional information

    supplemental material
  • Jara-Ettinger, J., & Rubio-Fernandez, P. (2024). Demonstratives as attention tools: Evidence of mentalistic representations in language. Proceedings of the National Academy of Sciences of the United States of America, 121(32): e2402068121. doi:10.1073/pnas.2402068121.

    Abstract

    Linguistic communication is an intrinsically social activity that enables us to share thoughts across minds. Many complex social uses of language can be captured by domain-general representations of other minds (i.e., mentalistic representations) that externally modulate linguistic meaning through Gricean reasoning. However, here we show that representations of others’ attention are embedded within language itself. Across ten languages, we show that demonstratives—basic grammatical words (e.g., “this”/“that”) which are evolutionarily ancient, learned early in life, and documented in all known languages—are intrinsic attention tools. Beyond their spatial meanings, demonstratives encode both joint attention and the direction in which the listener must turn to establish it. Crucially, the frequency of the spatial and attentional uses of demonstratives varies across languages, suggesting that both spatial and mentalistic representations are part of their conventional meaning. Using computational modeling, we show that mentalistic representations of others’ attention are internally encoded in demonstratives, with their effect further boosted by Gricean reasoning. Yet, speakers are largely unaware of this, incorrectly reporting that they primarily capture spatial representations. Our findings show that representations of other people’s cognitive states (namely, their attention) are embedded in language and suggest that the most basic building blocks of the linguistic system crucially rely on social cognition.

    Additional information

    pnas.2402068121.sapp.pdf
  • Joshi, A., Mohanty, R., Kanakanti, M., Mangla, A., Choudhary, S., Barbate, M., & Modi, A. (2024). iSign: A benchmark for Indian Sign Language processing. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Findings of the Association for Computational Linguistics ACL 2024 (pp. 10827-10844). Bangkok, Thailand: Association for Computational Linguistics.

    Abstract

    Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have attracted colossal research interest and seen tremendous improvements in the last few years, sign languages still need to catch up due to the lack of comparable resources. To bridge this gap, in this work we propose iSign: a benchmark for Indian Sign Language (ISL) processing. We make three primary contributions in this work. First, we release one of the largest ISL-English datasets with more than video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with baseline models for easier access by the research community. Third, we provide detailed insights into the proposed benchmarks along with a few linguistic insights into the workings of ISL. We streamline the evaluation of sign language processing, addressing the gaps in the NLP research community for sign languages. We release the dataset, tasks and models via the following website: https://exploration-lab.github.io/iSign/

    Additional information

    dataset, tasks, models
  • Karadöller, D. Z., Peeters, D., Manhardt, F., Özyürek, A., & Ortega, G. (2024). Iconicity and gesture jointly facilitate learning of second language signs at first exposure in hearing non-signers. Language Learning, 74(4), 781-813. doi:10.1111/lang.12636.

    Abstract

    When learning a spoken second language (L2), words overlapping in form and meaning with one’s native language (L1) help break into the new language. When non-signing speakers learn a sign language as L2, such forms are absent because of the modality differences (L1: speech, L2: sign). In such cases, non-signing speakers might use iconic form-meaning mappings in signs or their own gestural experience as gateways into the to-be-acquired sign language. Here, we investigated how both these factors may contribute jointly to the acquisition of sign language vocabulary by hearing non-signers. Participants were presented with three types of sign in NGT (Sign Language of the Netherlands): arbitrary signs, and iconic signs with either high or low gesture overlap. Signs that were both iconic and highly overlapping with gestures boosted learning most at first exposure, and this effect remained the day after. Findings highlight the influence of modality-specific factors supporting the acquisition of a signed lexicon.
  • Karadöller, D. Z., Sümer, B., Ünal, E., & Özyürek, A. (2024). Sign advantage: Both children and adults’ spatial expressions in sign are more informative than those in speech and gestures combined. Journal of Child Language, 51(4), 876-902. doi:10.1017/S0305000922000642.

    Abstract

    Expressing Left-Right relations is challenging for speaking children. Yet, this challenge was absent for signing children, possibly due to iconicity in the visual-spatial modality of expression. We investigate whether there is also a modality advantage when speaking children’s co-speech gestures are considered. Eight-year-old child and adult hearing monolingual Turkish speakers and deaf signers of Turkish Sign Language described pictures of objects in various spatial relations. Descriptions were coded for informativeness in speech, sign, and speech-gesture combinations for encoding Left-Right relations. The use of co-speech gestures increased the informativeness of speakers’ spatial expressions compared to speech only. This pattern was more prominent for children than adults. However, signing adults and children were more informative than child and adult speakers even when co-speech gestures were considered. Thus, both speaking and signing children benefit from iconic expressions in the visual modality. Finally, in each modality, children were less informative than adults, pointing to the challenge of this spatial domain in development.
  • Kejriwal, J., Mishra, C., Skantze, G., Offrede, T., & Beňuš, Š. (2024). Does a robot’s gaze behavior affect entrainment in HRI? Computing and Informatics, 43(5), 1256-1284. doi:10.31577/cai_2024_5_1256.

    Abstract

    Speakers tend to engage in adaptive behavior, known as entrainment, when they reuse their partner's linguistic representations, including lexical, acoustic-prosodic, semantic, or syntactic structures, during a conversation. Studies have explored the relationship between entrainment and social factors such as likeability, task success, and rapport. Still, limited research has investigated the relationship between entrainment and gaze. To address this gap, we conducted a within-subjects user study (N = 33) to test whether the gaze behavior of a robotic head affects subjects' entrainment toward the robot on four linguistic dimensions: lexical, syntactic, semantic, and acoustic-prosodic. Our results show that participants entrain more on lexical and acoustic-prosodic features when the robot exhibits well-timed gaze aversions similar to the ones observed in human gaze behavior, as compared to when the robot keeps staring at participants constantly. Our results support the predictions of the computers as social actors (CASA) model and suggest that implementing well-timed gaze aversion behavior in a robot can lead to speech entrainment in human-robot interactions.
  • Kendrick, K. H., & Holler, J. (2024). Conversation. In M. C. Frank, & A. Majid (Eds.), Open Encyclopedia of Cognitive Science. Cambridge: MIT Press. doi:10.21428/e2759450.3c00b537.
  • Kimmel, M., Schneider, S. M., & Fisher, V. J. (2024). "Introjecting" imagery: A process model of how minds and bodies are co-enacted. Language Sciences, 102: 101602. doi:10.1016/j.langsci.2023.101602.

    Abstract

    Somatic practices frequently use imagery, typically via verbal instructions, to scaffold sensorimotor organization and experience, a phenomenon we term “introjection”. We argue that introjection is an imagery practice in which sensorimotor and conceptual aspects are co-orchestrated, suggesting the necessity of crosstalk between somatics, phenomenology, psychology, embodied-enactive cognition, and linguistic research on embodied simulation. We presently focus on the scarcely addressed details of the process necessary to enact instructions of a literal or metaphoric nature through the body. Based on vignettes from dance, Feldenkrais, and Taichi practice, we describe introjection as a complex form of processual sense-making, in which context-interpretive, mental, attentional and physical sub-processes recursively braid. Our analysis focuses on how mental and body-related processes progressively align, inform and augment each other. This dialectic requires emphasis on the active body, which implies that uni-directional models (concept ⇒ body) are inadequate and should be replaced by interactionist alternatives (concept ⇔ body). Furthermore, we emphasize that both the source image itself and the body are specifically conceptualized for the context through constructive operations, and both evolve through their interplay. At this level introjection employs representational operations that are embedded in enactive dynamics of a fully situated person.
  • Long, M., Rohde, H., Oraa Ali, M., & Rubio-Fernandez, P. (2024). The role of cognitive control and referential complexity on adults’ choice of referring expressions: Testing and expanding the referential complexity scale. Journal of Experimental Psychology: Learning, Memory, and Cognition, 50(1), 109-136. doi:10.1037/xlm0001273.

    Abstract

    This study aims to advance our understanding of the nature and source(s) of individual differences in pragmatic language behavior over the adult lifespan. Across four story continuation experiments, we probed adults’ (N = 496 participants, ages 18–82) choice of referential forms (i.e., names vs. pronouns to refer to the main character). Our manipulations were based on Fossard et al.’s (2018) scale of referential complexity which varies according to the visual properties of the scene: low complexity (one character), intermediate complexity (two characters of different genders), and high complexity (two characters of the same gender). Since pronouns signal topic continuity (i.e., that the discourse will continue to be about the same referent), the use of pronouns is expected to decrease as referential complexity increases. The choice of names versus pronouns, therefore, provides insight into participants’ perception of the topicality of a referent, and whether that varies by age and cognitive capacity. In Experiment 1, we used the scale to test the association between referential choice, aging, and cognition, identifying a link between older adults’ switching skills and optimal referential choice. In Experiments 2–4, we tested novel manipulations that could impact the scale and found both the timing of a competitor referent’s presence and emphasis placed on competitors modulated referential choice, leading us to refine the scale for future use. Collectively, Experiments 1–4 highlight what type of contextual information is prioritized at different ages, revealing older adults’ preserved sensitivity to (visual) scene complexity but reduced sensitivity to linguistic prominence cues, compared to younger adults.
  • Long, M., MacPherson, S. E., & Rubio-Fernandez, P. (2024). Prosocial speech acts: Links to pragmatics and aging. Developmental Psychology, 60(3), 491-504. doi:10.1037/dev0001725.

    Abstract

    This study investigated how adults over the lifespan flexibly adapt their use of prosocial speech acts when conveying bad news to communicative partners. Experiment 1a (N = 100 Scottish adults aged 18–72 years) assessed whether participants’ use of prosocial speech acts varied according to audience design considerations (i.e., whether or not the recipient of the news was directly affected). Experiment 1b (N = 100 Scottish adults aged 19–70 years) assessed whether participants adjusted for whether the bad news was more or less severe (an index of general knowledge). Younger adults displayed more flexible adaptation to the recipient manipulation, while no age differences were found for severity. These findings are consistent with prior work showing age-related decline in audience design but not in the use of general knowledge during language production. Experiment 2 further probed younger adults (N = 40, Scottish, aged 18–37 years) and older adults’ (N = 40, Scottish, aged 70–89 years) prosocial linguistic behavior by investigating whether health (vs. nonhealth-related) matters would affect responses. While older adults used prosocial speech acts to a greater extent than younger adults, they did not distinguish between conditions. Our results suggest that prosocial linguistic behavior is likely influenced by a combination of differences in audience design and communicative styles at different ages. Collectively, these findings highlight the importance of situating prosocial speech acts within the pragmatics and aging literature, allowing us to uncover the factors modulating prosocial linguistic behavior at different developmental stages.

    Additional information

    figures
  • Long, M., & Rubio-Fernandez, P. (2024). Beyond typicality: Lexical category affects the use and processing of color words. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 4925-4930).

    Abstract

    Speakers and listeners show an informativity bias in the use and interpretation of color modifiers. For example, speakers use color more often when referring to objects that vary in color than to objects with a prototypical color. Likewise, listeners look away from objects with prototypical colors upon hearing that color mentioned. Here we test whether speakers and listeners account for another factor related to informativity: the strength of the association between lexical categories and color. Our results demonstrate that speakers’ and listeners’ choices are indeed influenced by this factor; as such, it should be integrated into current pragmatic theories of informativity and computational models of color reference.

    Additional information

    link to eScholarship
  • Mamus, E. (2024). Perceptual experience shapes how blind and sighted people express concepts in multimodal language. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Mishra, C., Nandanwar, A., & Mishra, S. (2024). HRI in Indian education: Challenges and opportunities. In H. Admoni, D. Szafir, W. Johal, & A. Sandygulova (Eds.), Designing an introductory HRI course (workshop at HRI 2024). ArXiv. doi:10.48550/arXiv.2403.12223.

    Abstract

    With the recent advancements in the field of robotics and the increased focus on making general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into human-robot interaction (HRI). While many works have discussed frameworks for teaching HRI in educational institutions, and a few institutions already offer courses to students, a consensus on course content still eludes the field. In this work, we highlight a few challenges and opportunities in designing an HRI course from an Indian perspective. These topics warrant further deliberation, as they have a direct impact on the design of HRI courses and wider implications for the entire field.
  • Motiekaitytė, K., Grosseck, O., Wolf, L., Bosker, H. R., Peeters, D., Perlman, M., Ortega, G., & Raviv, L. (2024). Iconicity and compositionality in emerging vocal communication systems: a Virtual Reality approach. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 387-389). Nijmegen: The Evolution of Language Conferences.
  • Nölle, J., Raviv, L., Graham, K. E., Hartmann, S., Jadoul, Y., Josserand, M., Matzinger, T., Mudd, K., Pleyer, M., Slonimska, A., Wacewicz, S., & Watson, S. (Eds.). (2024). The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV). Nijmegen: The Evolution of Language Conferences. doi:10.17617/2.3587960.
  • Plate, L., Fisher, V. J., Nabibaks, F., & Feenstra, M. (2024). Feeling the traces of the Dutch colonial past: Dance as an affective methodology in Farida Nabibaks’s Radiant Shadow. In E. Van Bijnen, P. Brandon, K. Fatah-Black, I. Limon, W. Modest, & M. Schavemaker (Eds.), The future of the Dutch colonial past: From dialogues to new narratives (pp. 126-139). Amsterdam: Amsterdam University Press.
  • Rasing, N. B., Van de Geest-Buit, W., Chan, O. Y. A., Mul, K., Lanser, A., Erasmus, C. E., Groothuis, J. T., Holler, J., Ingels, K. J. A. O., Post, B., Siemann, I., & Voermans, N. C. (2024). Psychosocial functioning in patients with altered facial expression: A scoping review in five neurological diseases. Disability and Rehabilitation, 46(17), 3772-3791. doi:10.1080/09638288.2023.2259310.

    Abstract

    Purpose

    To perform a scoping review to investigate the psychosocial impact of having an altered facial expression in five neurological diseases.
    Methods

    A systematic literature search was performed. Studies were on Bell’s palsy, facioscapulohumeral muscular dystrophy (FSHD), Moebius syndrome, myotonic dystrophy type 1, or Parkinson’s disease patients; had a focus on altered facial expression; and had any form of psychosocial outcome measure. Data extraction focused on psychosocial outcomes.
    Results

    Bell’s palsy, myotonic dystrophy type 1, and Parkinson’s disease patients more often experienced some degree of psychosocial distress than healthy controls. In FSHD, facial weakness negatively influenced communication and was experienced as a burden. The psychosocial distress applied especially to women (Bell’s palsy and Parkinson’s disease) and to patients with more severely altered facial expression (Bell’s palsy), but not to patients with Moebius syndrome. Furthermore, Parkinson’s disease patients with more pronounced hypomimia were perceived more negatively by observers. Various strategies were reported to compensate for altered facial expression.
    Conclusions

    This review showed that patients with altered facial expression in four of five included neurological diseases had reduced psychosocial functioning. Future research recommendations include studies on observers’ judgements of patients during social interactions and on the effectiveness of compensation strategies in enhancing psychosocial functioning.
    Implications for rehabilitation

    Negative effects of altered facial expression on psychosocial functioning are common and more abundant in women and in more severely affected patients with various neurological disorders.

    Health care professionals should be alert to psychosocial distress in patients with altered facial expression.

    Learning of compensatory strategies could be a beneficial therapy for patients with psychosocial distress due to an altered facial expression.
  • Ronderos, C. R., Zhang, Y., & Rubio-Fernandez, P. (2024). Weighted parameters in demonstrative use: The case of Spanish teens and adults. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 3279-3286).
  • Ronderos, C. R., Aparicio, H., Long, M., Shukla, V., Jara-Ettinger, J., & Rubio-Fernandez, P. (2024). Perceptual, semantic, and pragmatic factors affect the derivation of contrastive inferences. Open Mind: Discoveries in Cognitive Science, 8, 1213-1227. doi:10.1162/opmi_a_00165.

    Abstract

    People derive contrastive inferences when interpreting adjectives (e.g., inferring that ‘the short pencil’ is being contrasted with a longer one). However, classic eye-tracking studies revealed contrastive inferences with scalar and material adjectives, but not with color adjectives. This was explained as a difference in listeners’ informativity expectations, since color adjectives are often used descriptively (hence not warranting a contrastive interpretation). Here we hypothesized that, beyond these pragmatic factors, perceptual factors (i.e., the relative perceptibility of color, material and scalar contrast) and semantic factors (i.e., the difference between gradable and non-gradable properties) also affect the real-time derivation of contrastive inferences. We tested these predictions in three languages with prenominal modification (English, Hindi, and Hungarian) and found that people derive contrastive inferences for color and scalar adjectives, but not for material adjectives. In addition, the processing of scalar adjectives was more context dependent than that of color and material adjectives, confirming that pragmatic, perceptual and semantic factors affect the derivation of contrastive inferences.
  • Rubianes, M., Drijvers, L., Muñoz, F., Jiménez-Ortega, L., Almeida-Rivera, T., Sánchez-García, J., Fondevila, S., Casado, P., & Martín-Loeches, M. (2024). The self-reference effect can modulate language syntactic processing even without explicit awareness: An electroencephalography study. Journal of Cognitive Neuroscience, 36(3), 460-474. doi:10.1162/jocn_a_02104.

    Abstract

    Although it is well established that self-related information can rapidly capture our attention and bias cognitive functioning, whether this self-bias can affect language processing remains largely unknown. In addition, there is an ongoing debate as to the functional independence of language processes, notably regarding the syntactic domain. Hence, this study investigated the influence of self-related content on syntactic speech processing. Participants listened to sentences that could contain morphosyntactic anomalies while the masked face identity (self, friend, or unknown faces) was presented for 16 msec preceding the critical word. The language-related ERP components (left anterior negativity [LAN] and P600) appeared for all identity conditions. However, the largest LAN effect followed by a reduced P600 effect was observed for self-faces, whereas a larger LAN with no reduction of the P600 was found for friend faces compared with unknown faces. These data suggest that both early and late syntactic processes can be modulated by self-related content. In addition, alpha power was more suppressed over the left inferior frontal gyrus only when self-faces appeared before the critical word. This may reflect higher semantic demands concomitant to early syntactic operations (around 150–550 msec). Our data also provide further evidence of a self-specific response, as reflected by the N250 component. Collectively, our results suggest that identity-related information is rapidly decoded from facial stimuli and may impact core linguistic processes, supporting an interactive view of syntactic processing. This study provides evidence that the self-reference effect can be extended to syntactic processing.
  • Rubio-Fernandez, P., Long, M., Shukla, V., Bhatia, V., Mahapatra, A., Ralekar, C., Ben-Ami, S., & Sinha, P. (2024). Multimodal communication in newly sighted children: An investigation of the relation between visual experience and pragmatic development. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2560-2567).

    Abstract

    We investigated the relationship between visual experience and pragmatic development by testing the socio-communicative skills of a unique population: the Prakash children of India, who received treatment for congenital cataracts after years of visual deprivation. Using two different referential communication tasks, our study investigated Prakash children’s ability to produce sufficiently informative referential expressions (e.g., ‘the green pear’ or ‘the small plate’) and pay attention to their interlocutor’s face during the task (Experiment 1), as well as their ability to recognize a speaker’s referential intent through non-verbal cues such as head turning and pointing (Experiment 2). Our results show that Prakash children have strong pragmatic skills, but do not look at their interlocutor’s face as often as neurotypical children do. However, longitudinal analyses revealed an increase in face fixations, suggesting that over time, Prakash children come to utilize their improved visual skills for efficient referential communication.

  • Sekine, K., & Özyürek, A. (2024). Children benefit from gestures to understand degraded speech but to a lesser extent than adults. Frontiers in Psychology, 14: 1305562. doi:10.3389/fpsyg.2023.1305562.

    Abstract

    The present study investigated to what extent children, compared to adults, benefit from gestures to disambiguate degraded speech by manipulating speech signals and manual modality. Dutch-speaking adults (N = 20) and 6- and 7-year-old children (N = 15) were presented with a series of video clips in which an actor produced a Dutch action verb with or without an accompanying iconic gesture. Participants were then asked to repeat what they had heard. The speech signal was either clear or altered into 4- or 8-band noise-vocoded speech. Children had more difficulty than adults in disambiguating degraded speech in the speech-only condition. However, when presented with both speech and gestures, children reached a level of accuracy comparable to that of adults in the degraded-speech-only condition. Furthermore, for adults, the enhancement provided by gestures was greater in the 4-band condition than in the 8-band condition, whereas children showed the opposite pattern. Gestures help children to disambiguate degraded speech, but children need more phonological information than adults to benefit from them. Children’s multimodal language integration needs to develop further to adapt flexibly to challenging situations such as degraded speech, as tested in our study, or instances where speech is heard with environmental noise or through a face mask.

  • Slonimska, A. (2024). The role of iconicity and simultaneity in efficient communication in the visual modality: Evidence from LIS (Italian Sign Language) [Dissertation Abstract]. Sign Language & Linguistics, 27(1), 116-124. doi:10.1075/sll.00084.slo.
  • Ter Bekke, M., Drijvers, L., & Holler, J. (2024). Gestures speed up responses to questions. Language, Cognition and Neuroscience, 39(4), 423-430. doi:10.1080/23273798.2024.2314021.

    Abstract

    Most language use occurs in face-to-face conversation, which involves rapid turn-taking. Seeing communicative bodily signals in addition to hearing speech may facilitate such fast responding. We tested whether this holds for co-speech hand gestures by investigating whether these gestures speed up button press responses to questions. Sixty native speakers of Dutch viewed videos in which an actress asked yes/no-questions, either with or without a corresponding iconic hand gesture. Participants answered the questions as quickly and accurately as possible via button press. Gestures did not impact response accuracy, but crucially, gestures sped up responses, suggesting that response planning may be finished earlier when gestures are seen. How much gestures sped up responses was not related to their timing in the question or their timing with respect to the corresponding information in speech. Overall, these results are in line with the idea that multimodality may facilitate fast responding during face-to-face conversation.
  • Ter Bekke, M., Levinson, S. C., Van Otterdijk, L., Kühn, M., & Holler, J. (2024). Visual bodily signals and conversational context benefit the anticipation of turn ends. Cognition, 248: 105806. doi:10.1016/j.cognition.2024.105806.

    Abstract

    The typical pattern of alternating turns in conversation seems trivial at first sight. But a closer look quickly reveals the cognitive challenges involved, with much of it resulting from the fast-paced nature of conversation. One core ingredient to turn coordination is the anticipation of upcoming turn ends so as to be able to ready oneself for providing the next contribution. Across two experiments, we investigated two variables inherent to face-to-face conversation, the presence of visual bodily signals and preceding discourse context, in terms of their contribution to turn end anticipation. In a reaction time paradigm, participants anticipated conversational turn ends better when seeing the speaker and their visual bodily signals than when they did not, especially so for longer turns. Likewise, participants were better able to anticipate turn ends when they had access to the preceding discourse context than when they did not, and especially so for longer turns. Critically, the two variables did not interact, showing that visual bodily signals retain their influence even in the context of preceding discourse. In a pre-registered follow-up experiment, we manipulated the visibility of the speaker's head, eyes and upper body (i.e. torso + arms). Participants were better able to anticipate turn ends when the speaker's upper body was visible, suggesting a role for manual gestures in turn end anticipation. Together, these findings show that seeing the speaker during conversation may critically facilitate turn coordination in interaction.
  • Ter Bekke, M., Drijvers, L., & Holler, J. (2024). Hand gestures have predictive potential during conversation: An investigation of the timing of gestures in relation to speech. Cognitive Science, 48(1): e13407. doi:10.1111/cogs.13407.

    Abstract

    During face-to-face conversation, transitions between speaker turns are incredibly fast. These fast turn exchanges seem to involve next speakers predicting upcoming semantic information, such that next turn planning can begin before a current turn is complete. Given that face-to-face conversation also involves the use of communicative bodily signals, an important question is how bodily signals such as co-speech hand gestures play into these processes of prediction and fast responding. In this corpus study, we found that hand gestures that depict or refer to semantic information started before the corresponding information in speech, which held both for the onset of the gesture as a whole, as well as the onset of the stroke (the most meaningful part of the gesture). This early timing potentially allows listeners to use the gestural information to predict the corresponding semantic information to be conveyed in speech. Moreover, we provided further evidence that questions with gestures got faster responses than questions without gestures. However, we found no evidence for the idea that how much a gesture precedes its lexical affiliate (i.e., its predictive potential) relates to how fast responses were given. The findings presented here highlight the importance of the temporal relation between speech and gesture and help to illuminate the potential mechanisms underpinning multimodal language processing during face-to-face conversation.
  • Titus, A., Dijkstra, T., Willems, R. M., & Peeters, D. (2024). Beyond the tried and true: How virtual reality, dialog setups, and a focus on multimodality can take bilingual language production research forward. Neuropsychologia, 193: 108764. doi:10.1016/j.neuropsychologia.2023.108764.

    Abstract

    Bilinguals possess the ability of expressing themselves in more than one language, and typically do so in contextually rich and dynamic settings. Theories and models have indeed long considered context factors to affect bilingual language production in many ways. However, most experimental studies in this domain have failed to fully incorporate linguistic, social, or physical context aspects, let alone combine them in the same study. Indeed, most experimental psycholinguistic research has taken place in isolated and constrained lab settings with carefully selected words or sentences, rather than under rich and naturalistic conditions. We argue that the most influential experimental paradigms in the psycholinguistic study of bilingual language production fall short of capturing the effects of context on language processing and control presupposed by prominent models. This paper therefore aims to enrich the methodological basis for investigating context aspects in current experimental paradigms and thereby move the field of bilingual language production research forward theoretically. After considering extensions of existing paradigms proposed to address context effects, we present three far-ranging innovative proposals, focusing on virtual reality, dialog situations, and multimodality in the context of bilingual language production.
  • Titus, A., & Peeters, D. (2024). Multilingualism at the market: A pre-registered immersive virtual reality study of bilingual language switching. Journal of Cognition, 7(1), 24-35. doi:10.5334/joc.359.

    Abstract

    Bilinguals, by definition, are capable of expressing themselves in more than one language. But which cognitive mechanisms allow them to switch from one language to another? Previous experimental research using the cued language-switching paradigm supports theoretical models that assume that both transient, reactive and sustained, proactive inhibitory mechanisms underlie bilinguals’ capacity to flexibly and efficiently control which language they use. Here we used immersive virtual reality to test the extent to which these inhibitory mechanisms may be active when unbalanced Dutch-English bilinguals i) produce full sentences rather than individual words, ii) to a life-size addressee rather than only into a microphone, iii) using a message that is relevant to that addressee rather than communicatively irrelevant, iv) in a rich visual environment rather than in front of a computer screen. We observed a reversed language dominance paired with switch costs for the L2 but not for the L1 when participants were stand owners in a virtual marketplace and informed their monolingual customers in full sentences about the price of their fruits and vegetables. These findings strongly suggest that the subtle balance between the application of reactive and proactive inhibitory mechanisms that support bilingual language control may be different in the everyday life of a bilingual compared to in the (traditional) psycholinguistic laboratory.
  • Trujillo, J. P., & Holler, J. (2024). Conversational facial signals combine into compositional meanings that change the interpretation of speaker intentions. Scientific Reports, 14: 2286. doi:10.1038/s41598-024-52589-0.

    Abstract

    Human language is extremely versatile, combining a limited set of signals in an unlimited number of ways. However, it is unknown whether conversational visual signals feed into the composite utterances with which speakers communicate their intentions. We assessed whether different combinations of visual signals lead to different intent interpretations of the same spoken utterance. Participants viewed a virtual avatar uttering spoken questions while producing single visual signals (i.e., head turn, head tilt, eyebrow raise) or combinations of these signals. After each video, participants classified the communicative intention behind the question. We found that composite utterances combining several visual signals conveyed different meanings compared to utterances accompanied by the single visual signals. However, responses to combinations of signals were more similar to the responses to related, rather than unrelated, individual signals, indicating a consistent influence of the individual visual signals on the whole. This study therefore provides the first evidence for compositional, non-additive (i.e., Gestalt-like) perception of multimodal language.

  • Trujillo, J. P., & Holler, J. (2024). Information distribution patterns in naturalistic dialogue differ across languages. Psychonomic Bulletin & Review, 31, 1723-1734. doi:10.3758/s13423-024-02452-0.

    Abstract

    The natural ecology of language is conversation, with individuals taking turns speaking to communicate in a back-and-forth fashion. Language in this context involves strings of words that a listener must process while simultaneously planning their own next utterance. It would thus be highly advantageous if language users distributed information within an utterance in a way that may facilitate this processing–planning dynamic. While some studies have investigated how information is distributed at the level of single words or clauses, or in written language, little is known about how information is distributed within spoken utterances produced during naturalistic conversation. It is also not known how information distribution patterns of spoken utterances may differ across languages. We used a set of matched corpora (CallHome) containing 898 telephone conversations conducted in six different languages (Arabic, English, German, Japanese, Mandarin, and Spanish), analyzing more than 58,000 utterances, to assess whether there is evidence of distinct patterns of information distribution at the utterance level, and whether these patterns are similar or different across the languages. We found that English, Spanish, and Mandarin typically show a back-loaded distribution, with higher information (i.e., surprisal) in the last half of utterances compared with the first half, while Arabic, German, and Japanese showed front-loaded distributions, with higher information in the first half compared with the last half. Additional analyses suggest that these patterns may be related to word order and rate of noun and verb usage. We additionally found that back-loaded languages have longer turn transition times (i.e., the time between speaker turns) than front-loaded languages.

  • Trujillo, J. P. (2024). Motion-tracking technology for the study of gesture. In A. Cienki (Ed.), The Cambridge Handbook of Gesture Studies. Cambridge: Cambridge University Press.
  • Ünal, E., Mamus, E., & Özyürek, A. (2024). Multimodal encoding of motion events in speech, gesture, and cognition. Language and Cognition, 16(4), 785-804. doi:10.1017/langcog.2023.61.

    Abstract

    How people communicate about motion events and how this is shaped by language typology are mostly studied with a focus on linguistic encoding in speech. Yet, human communication typically involves an interactional exchange of multimodal signals, such as hand gestures that have different affordances for representing event components. Here, we review recent empirical evidence on multimodal encoding of motion in speech and gesture to gain a deeper understanding of whether and how language typology shapes linguistic expressions in different modalities, and how this changes across different sensory modalities of input and interacts with other aspects of cognition. Empirical evidence strongly suggests that Talmy’s typology of event integration predicts multimodal event descriptions in speech and gesture and visual attention to event components prior to producing these descriptions. Furthermore, variability within the event itself, such as type and modality of stimuli, may override the influence of language typology, especially for expression of manner.
  • Coventry, K. R., Gudde, H. B., Diessel, H., Collier, J., Guijarro-Fuentes, P., Vulchanova, M., Vulchanov, V., Todisco, E., Reile, M., Breunesse, M., Plado, H., Bohnemeyer, J., Bsili, R., Caldano, M., Dekova, R., Donelson, K., Forker, D., Park, Y., Pathak, L. S., Peeters, D., Pizzuto, G., Serhan, B., Apse, L., Hesse, F., Hoang, L., Hoang, P., Igari, Y., Kapiley, K., Haupt-Khutsishvili, T., Kolding, S., Priiki, K., Mačiukaitytė, I., Mohite, V., Nahkola, T., Tsoi, S. Y., Williams, S., Yasuda, S., Cangelosi, A., Duñabeitia, J. A., Mishra, R. K., Rocca, R., Šķilters, J., Wallentin, M., Žilinskaitė-Šinkūnienė, E., & Incel, O. D. (2023). Spatial communication systems across languages reflect universal action constraints. Nature Human Behaviour, 7, 2099-2110. doi:10.1038/s41562-023-01697-4.

    Abstract

    The extent to which languages share properties reflecting the non-linguistic constraints of the speakers who speak them is key to the debate regarding the relationship between language and cognition. A critical case is spatial communication, where it has been argued that semantic universals should exist, if anywhere. Here, using an experimental paradigm able to separate variation within a language from variation between languages, we tested the use of spatial demonstratives—the most fundamental and frequent spatial terms across languages. In n = 874 speakers across 29 languages, we show that speakers of all tested languages use spatial demonstratives as a function of being able to reach or act on an object being referred to. In some languages, the position of the addressee is also relevant in selecting between demonstrative forms. Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition.
  • Dingemanse, M., Liesenfeld, A., Rasenberg, M., Albert, S., Ameka, F. K., Birhane, A., Bolis, D., Cassell, J., Clift, R., Cuffari, E., De Jaegher, H., Dutilh Novaes, C., Enfield, N. J., Fusaroli, R., Gregoromichelaki, E., Hutchins, E., Konvalinka, I., Milton, D., Rączaszek-Leonardi, J., Reddy, V., Rossano, F., Schlangen, D., Seibt, J., Stokoe, E., Suchman, L. A., Vesper, C., Wheatley, T., & Wiltschko, M. (2023). Beyond single-mindedness: A figure-ground reversal for the cognitive sciences. Cognitive Science, 47(1): e13230. doi:10.1111/cogs.13230.

    Abstract

    A fundamental fact about human minds is that they are never truly alone: all minds are steeped in situated interaction. That social interaction matters is recognised by any experimentalist who seeks to exclude its influence by studying individuals in isolation. On this view, interaction complicates cognition. Here we explore the more radical stance that interaction co-constitutes cognition: that we benefit from looking beyond single minds towards cognition as a process involving interacting minds. All around the cognitive sciences, there are approaches that put interaction centre stage. Their diverse and pluralistic origins may obscure the fact that collectively, they harbour insights and methods that can respecify foundational assumptions and fuel novel interdisciplinary work. What might the cognitive sciences gain from stronger interactional foundations? This represents, we believe, one of the key questions for the future. Writing as a multidisciplinary collective assembled from across the classic cognitive science hexagon and beyond, we highlight the opportunity for a figure-ground reversal that puts interaction at the heart of cognition. The interactive stance is a way of seeing that deserves to be a key part of the conceptual toolkit of cognitive scientists.
