Publications

Displaying 1 - 100 of 2969
  • Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (in press). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024).
  • Araújo, S., Reis, A., Faísca, L., & Petersson, K. M. (in press). Brain sensitivity to words and the “word recognition potential”. In D. Marques, & J. H. Toscano (Eds.), De las neurociencias a la neuropsicologia: el estúdio del cerebro humano. Barranquilla, Colombia: Corporación Universitaria Reformada.
  • Bauer, B. L. M. (in press). Evolution of counting systems. In E. Aldridge, A. Breitbarth, K. É. Kiss, A. Ledgeway, J. Salmon, & A. Simonenko (Eds.), Wiley Blackwell companion to diachronic linguistics. Oxford: Wiley Blackwell.
  • Cos, F., Bujok, R., & Bosker, H. R. (in press). Test-retest reliability of audiovisual lexical stress perception after >1.5 years. In Proceedings of the 12th International Conference on Speech Prosody.

    Abstract

    In natural communication, we typically both see and hear our conversation partner. Speech comprehension thus requires the integration of auditory and visual information from the speech signal. This is for instance evidenced by the Manual McGurk effect, where the perception of lexical stress is biased towards the syllable that has a beat gesture aligned to it. However, there is considerable individual variation in how heavily gestural timing is weighed as a cue to stress. To assess within-individualconsistency, this study investigated the test-retest reliability of the Manual McGurk effect. We reran an earlier Manual McGurk experiment with the same participants, over 1.5 years later. At the group level, we successfully replicated the Manual McGurk effect with a similar effect size. However, a correlation of the by-participant effect sizes in the two identical experiments indicated that there was only a weak correlation between both tests, suggesting that the weighing of gestural information in the perception of lexical stress is stable at the group level, but less so in individuals. Findings are discussed in comparison to other measures of audiovisual integration in speech perception. Index Terms: Audiovisual integration, beat gestures, lexical stress, test-retest reliability
  • Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (in press). Cospeech gesture detection through multi-phase sequence labeling. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  • Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (in press). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024).
  • Liesenfeld, A., & Dingemanse, M. (in press). Rethinking open source generative AI: open-washing and the EU AI Act. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). ACM.

    Abstract

    The past year has seen a steep rise in generative AI systems that claim to be open. But how open are they really? The question of what counts as open source in generative AI is poised to take on particular importance in light of the upcoming EU AI Act that regulates open source systems differently, creating an urgent need for practical openness assessment. Here we use an evidence-based framework that distinguishes 14 dimensions of openness, from training datasets to scientific and technical documentation and from licensing to access methods. Surveying over 45 generative AI systems (both text and text-to-image), we find that while the term open source is widely used, many models are `open weight' at best and many providers seek to evade scientific, legal and regulatory scrutiny by withholding information on training and fine-tuning data. We argue that openness in generative AI is necessarily composite (consisting of multiple elements) and gradient (coming in degrees), and point out the risk of relying on single features like access or licensing to declare models open or not. Evidence-based openness assessment can help foster a generative AI landscape in which models can be effectively regulated, model providers can be held accountable, scientists can scrutinise generative AI, and end users can make informed decisions.
  • Matteo, M., & Bosker, H. R. (in press). How to test gesture-speech integration in ten minutes. In Proceedings of the 12th International Conference on Speech Prosody.
  • Rohrer, P. L., Hong, Y., & Bosker, H. R. (in press). Gestures time to vowel onset and change the acoustics of the word in Mandarin. In Proceedings of the 12th International Conference on Speech Prosody.
  • Rohrer, P. L., Bujok, R., Van Maastricht, L., & Bosker, H. R. (in press). The timing of beat gestures affects lexical stress perception in Spanish. In Proceedings of the 12th International Conference on Speech Prosody.
  • Slonimska, A., & Özyürek, A. (in press). Methods to study evolution of iconicity in sign languages. In L. Raviv, & C. Boeckx (Eds.), The Oxford Handbook of Approaches to Language Evolution. Oxford: Oxford University Press.
  • Uluşahin, O., Bosker, H. R., McQueen, J. M., & Meyer, A. S. (in press). Knowledge of a talker’s f0 affects subsequent perception of voiceless fricatives. In Proceedings of the 12th International Conference on Speech Prosody.

    Abstract

    The human brain deals with the infinite variability of speech through multiple mechanisms. Some of them rely solely on information in the speech input (i.e., signal-driven) whereas some rely on linguistic or real-world knowledge (i.e., knowledge-driven). Many signal-driven perceptual processes rely on the enhancement of acoustic differences between incoming speech sounds, producing contrastive adjustments. For instance, when an ambiguous voiceless fricative is preceded by a high fundamental frequency (f0) sentence, the fricative is perceived as having lower a spectral center of gravity (CoG). However, it is not clear whether knowledge of a talker’s typical f0 can lead to similar contrastive effects. This study investigated a possible talker f0 effect on fricative CoG perception. In the exposure phase, two groups of participants (N=16 each) heard the same talker at high or low f0 for 20 minutes. Later, in the test phase, participants rated fixed-f0 /?ɔk/ tokens as being /sɔk/ (i.e., high CoG) or /ʃɔk/ (i.e., low CoG), where /?/ represents a fricative from a 5-step /s/-/ʃ/ continuum. Surprisingly, the data revealed the opposite of our contrastive hypothesis, whereby hearing high f0 instead biased perception towards high CoG. Thus, we demonstrated that talker f0 information affects fricative CoG perception.
  • De Vos, C. (in press). Language of perception in Kata Kolok. In A. Majid, & S. C. Levinson (Eds.), The Oxford Handbook of the Language of Perception. Oxford: Oxford University Press.

    Abstract

    This study describes the sensory lexicon on the domains of colour, taste, shape, smell and touch of a rural sign language called Kata Kolok (KK). Taste was highly codable for Kata Kolok signers, who used a dedicated set of signs and facial expressions to indicate each of the taste stimuli. The second most codable perceptual domain was shape, for which signers often used classifiers and tracing gestures that reflected the shape of the object directly. Smell had a comparatively intermediate level of codability, but this was due, for the most part, to the use of evaluative terms. Although Kata Kolok has a dedicated set of colour signs, these leave large parts of the colour spectrum unnamed, resulting in low degrees of codability in this sensory domain. Unnamed colours were frequently described by iconic-indexical forms such as object labelling and pointing strategies. Touch was the least codable domain for Kata Kolok, which resulted in a wide range of iconically motivated constructions including a restricted set of domain-specific lexical signs, classifiers, tracing gestures, object labelling, and general evaluative terms.
  • Hintz, F., & Meyer, A. S. (Eds.). (2024). Individual differences in language skills [Special Issue]. Journal of Cognition, 7(1).
  • Mishra, C., Nandanwar, A., & Mishra, S. (2024). HRI in Indian education: Challenges opportunities. In H. Admoni, D. Szafir, W. Johal, & A. Sandygulova (Eds.), Designing an introductory HRI course (workshop at HRI 2024). ArXiv. doi:10.48550/arXiv.2403.12223.

    Abstract

    With the recent advancements in the field of robotics and the increased focus on having general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into Human-robot interaction (HRI). While there have been a lot of works discussing frameworks for teaching HRI in educational institutions with a few institutions already offering courses to students, a consensus on the course content still eludes the field. In this work, we highlight a few challenges and opportunities while designing an HRI course from an Indian perspective. These topics warrant further deliberations as they have a direct impact on the design of HRI courses and wider implications for the entire field.
  • Plate, L., Fisher, V. J., Nabibaks, F., & Feenstra, M. (2024). Feeling the traces of the Dutch colonial past: Dance as an affective methodology in Farida Nabibaks’s radiant shadow. In E. Van Bijnen, P. Brandon, K. Fatah-Black, I. Limon, W. Modest, & M. Schavemaker (Eds.), The future of the Dutch colonial past: From dialogues to new narratives (pp. 126-139). Amsterdam: Amsterdam University Press.
  • Silverstein, P., Bergmann, C., & Syed, M. (Eds.). (2024). Open science and metascience in developmental psychology [Special Issue]. Infant and Child Development, 33(1).
  • Trujillo, J. P. (2024). Motion-tracking technology for the study of gesture. In A. Cienki (Ed.), The Cambridge Handbook of Gesture Studies. Cambridge: Cambridge University Press.
  • Agirrezabal, M., Paggio, P., Navarretta, C., & Jongejan, B. (2023). Multimodal detection and classification of head movements in face-to-face conversations: Exploring models, features and their interaction. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527200.

    Abstract

    In this work we perform multimodal detection and classification
    of head movements from face to face video conversation data.
    We have experimented with different models and feature sets
    and provided some insight on the effect of independent features,
    but also how their interaction can enhance a head movement
    classifier. Used features include nose, neck and mid hip position
    coordinates and their derivatives together with acoustic features,
    namely, intensity and pitch of the speaker on focus. Results
    show that when input features are sufficiently processed by in-
    teracting with each other, a linear classifier can reach a similar
    performance to a more complex non-linear neural model with
    several hidden layers. Our best models achieve state-of-the-art
    performance in the detection task, measured by macro-averaged
    F1 score.
  • Cabrelli, J., Chaouch-Orozco, A., González Alonso, J., Pereira Soares, S. M., Puig-Mayenco, E., & Rothman, J. (2023). Introduction - Multilingualism: Language, brain, and cognition. In J. Cabrelli, A. Chaouch-Orozco, J. González Alonso, S. M. Pereira Soares, E. Puig-Mayenco, & J. Rothman (Eds.), The Cambridge handbook of third language acquisition (pp. 1-20). Cambridge: Cambridge University Press. doi:10.1017/9781108957823.001.

    Abstract

    This chapter provides an introduction to the handbook. It succintly overviews the key questions in the field of L3/Ln acquisition and summarizes the scope of all the chapters included. The chapter ends by raising some outstanding questions that the field needs to address.
  • Caplan, S., Peng, M. Z., Zhang, Y., & Yu, C. (2023). Using an Egocentric Human Simulation Paradigm to quantify referential and semantic ambiguity in early word learning. In M. Goldwater, F. K. Anggoro, B. K. Hayes, & D. C. Ong (Eds.), Proceedings of the 45th Annual Meeting of the Cognitive Science Society (CogSci 2023) (pp. 1043-1049).

    Abstract

    In order to understand early word learning we need to better understand and quantify properties of the input that young children receive. We extended the human simulation paradigm (HSP) using egocentric videos taken from infant head-mounted cameras. The videos were further annotated with gaze information indicating in-the-moment visual attention from the infant. Our new HSP prompted participants for two types of responses, thus differentiating referential from semantic ambiguity in the learning input. Consistent with findings on visual attention in word learning, we find a strongly bimodal distribution over HSP accuracy. Even in this open-ended task, most videos only lead to a small handful of common responses. What's more, referential ambiguity was the key bottleneck to performance: participants can nearly always recover the exact word that was said if they identify the correct referent. Finally, analysis shows that adult learners relied on particular, multimodal behavioral cues to infer those target referents.
  • Chevrefils, L., Morgenstern, A., Beaupoil-Hourdel, P., Bedoin, D., Caët, S., Danet, C., Danino, C., De Pontonx, S., & Parisse, C. (2023). Coordinating eating and languaging: The choreography of speech, sign, gesture and action in family dinners. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527183.

    Abstract

    In this study, we analyze one French signing and one French speaking family’s interaction during dinner. The families composed of two parents and two children aged 3 to 11 were filmed with three cameras to capture all family members’ behaviors. The three videos per dinner were synchronized and coded on ELAN. We annotated all participants’ acting, and languaging.
    Our quantitative analyses show how family members collaboratively manage multiple streams of activity through the embodied performances of dining and interacting. We uncover different profiles according to participants’ modality of expression and status (focusing on the mother and the younger child). The hearing participants’ co-activity management illustrates their monitoring of dining and conversing and how they progressively master the affordances of the visual and vocal channels to maintain the simultaneity of the two activities. The deaf mother skillfully manages to alternate smoothly between dining and interacting. The deaf younger child manifests how she is in the process of developing her skills to manage multi-activity. Our qualitative analyses focus on the ecology of visual-gestural and audio-vocal languaging in the context of co-activity according to language and participant. We open new perspectives on the management of gaze and body parts in multimodal languaging.
  • Corps, R. E. (2023). What do we know about the mechanisms of response planning in dialog? In Psychology of Learning and Motivation (pp. 41-81). doi:10.1016/bs.plm.2023.02.002.

    Abstract

    During dialog, interlocutors take turns at speaking with little gap or overlap between their contributions. But language production in monolog is comparatively slow. Theories of dialog tend to agree that interlocutors manage these timing demands by planning a response early, before the current speaker reaches the end of their turn. In the first half of this chapter, I review experimental research supporting these theories. But this research also suggests that planning a response early, while simultaneously comprehending, is difficult. Does response planning need to be this difficult during dialog? In other words, is early-planning always necessary? In the second half of this chapter, I discuss research that suggests the answer to this question is no. In particular, corpora of natural conversation demonstrate that speakers do not directly respond to the immediately preceding utterance of their partner—instead, they continue an utterance they produced earlier. This parallel talk likely occurs because speakers are highly incremental and plan only part of their utterance before speaking, leading to pauses, hesitations, and disfluencies. As a result, speakers do not need to engage in extensive advance planning. Thus, laboratory studies do not provide a full picture of language production in dialog, and further research using naturalistic tasks is needed.
  • Creemers, A. (2023). Morphological processing in spoken-word recognition. In D. Crepaldi (Ed.), Linguistic morphology in the mind and brain (pp. 50-64). New York: Routledge.

    Abstract

    Most psycholinguistic studies on morphological processing have examined the role of morphological structure in the visual modality. This chapter discusses morphological processing in the auditory modality, which is an area of research that has only recently received more attention. It first discusses why results in the visual modality cannot straightforwardly be applied to the processing of spoken words, stressing the importance of acknowledging potential modality effects. It then gives a brief overview of the existing research on the role of morphology in the auditory modality, for which an increasing number of studies report that listeners show sensitivity to morphological structure. Finally, the chapter highlights insights gained by looking at morphological processing not only in reading, but also in listening, and it discusses directions for future research
  • Dingemanse, M. (2023). Ideophones. In E. Van Lier (Ed.), The Oxford handbook of word classes (pp. 466-476). Oxford: Oxford University Press.

    Abstract

    Many of the world’s languages feature an open lexical class of ideophones, words whose marked forms and sensory meanings invite iconic associations. Ideophones (also known as mimetics or expressives) are well-known from languages in Asia, Africa and the Americas, where they often form a class on the same order of magnitude as other major word classes and take up a considerable functional load as modifying expressions or predicates. Across languages, commonalities in the morphosyntactic behaviour of ideophones can be related to their nature and origin as vocal depictions. At the same time there is ample room for linguistic diversity, raising the need for fine-grained grammatical description of ideophone systems. As vocal depictions, ideophones often form a distinct lexical stratum seemingly conjured out of thin air; but as conventionalized words, they inevitably grow roots in local linguistic systems, showing relations to adverbs, adjectives, verbs and other linguistic resources devoted to modification and predication
  • Dingemanse, M. (2023). Interjections. In E. Van Lier (Ed.), The Oxford handbook of word classes (pp. 477-491). Oxford: Oxford University Press.

    Abstract

    No class of words has better claims to universality than interjections. At the same time, no category has more variable content than this one, traditionally the catch-all basket for linguistic items that bear a complicated relation to sentential syntax. Interjections are a mirror reflecting methodological and theoretical assumptions more than a coherent linguistic category that affords unitary treatment. This chapter focuses on linguistic items that typically function as free-standing utterances, and on some of the conceptual, methodological, and theoretical questions generated by such items. A key move is to study these items in the setting of conversational sequences, rather than from the “flatland” of sequential syntax. This makes visible how some of the most frequent interjections streamline everyday language use and scaffold complex language. Approaching interjections in terms of their sequential positions and interactional functions has the potential to reveal and explain patterns of universality and diversity in interjections.
  • Drijvers, L., & Mazzini, S. (2023). Neural oscillations in audiovisual language and communication. In Oxford Research Encyclopedia of Neuroscience. Oxford: Oxford University Press. doi:10.1093/acrefore/9780190264086.013.455.

    Abstract

    How do neural oscillations support human audiovisual language and communication? Considering the rhythmic nature of audiovisual language, in which stimuli from different sensory modalities unfold over time, neural oscillations represent an ideal candidate to investigate how audiovisual language is processed in the brain. Modulations of oscillatory phase and power are thought to support audiovisual language and communication in multiple ways. Neural oscillations synchronize by tracking external rhythmic stimuli or by re-setting their phase to presentation of relevant stimuli, resulting in perceptual benefits. In particular, synchronized neural oscillations have been shown to subserve the processing and the integration of auditory speech, visual speech, and hand gestures. Furthermore, synchronized oscillatory modulations have been studied and reported between brains during social interaction, suggesting that their contribution to audiovisual communication goes beyond the processing of single stimuli and applies to natural, face-to-face communication.

    There are still some outstanding questions that need to be answered to reach a better understanding of the neural processes supporting audiovisual language and communication. In particular, it is not entirely clear yet how the multitude of signals encountered during audiovisual communication are combined into a coherent percept and how this is affected during real-world dyadic interactions. In order to address these outstanding questions, it is fundamental to consider language as a multimodal phenomenon, involving the processing of multiple stimuli unfolding at different rhythms over time, and to study language in its natural context: social interaction. Other outstanding questions could be addressed by implementing novel techniques (such as rapid invisible frequency tagging, dual-electroencephalography, or multi-brain stimulation) and analysis methods (e.g., using temporal response functions) to better understand the relationship between oscillatory dynamics and efficient audiovisual communication.
  • Düngen, D., Sarfati, M., & Ravignani, A. (2023). Cross-species research in biomusicality: Methods, pitfalls, and prospects. In E. H. Margulis, P. Loui, & D. Loughridge (Eds.), The science-music borderlands: Reckoning with the past and imagining the future (pp. 57-95). Cambridge, MA, USA: The MIT Press. doi:10.7551/mitpress/14186.003.0008.
  • Ekerdt, C., Takashima, A., & McQueen, J. M. (2023). Memory consolidation in second language neurocognition. In K. Morgan-Short, & J. G. Van Hell (Eds.), The Routledge handbook of second language acquisition and neurolinguistics. Oxfordshire: Routledge.

    Abstract

    Acquiring a second language (L2) requires newly learned information to be integrated with existing knowledge. It has been proposed that several memory systems work together to enable this process of rapidly encoding new information and then slowly incorporating it with existing knowledge, such that it is consolidated and integrated into the language network without catastrophic interference. This chapter focuses on consolidation of L2 vocabulary. First, the complementary learning systems model is outlined, along with the model’s predictions regarding lexical consolidation. Next, word learning studies in first language (L1) that investigate the factors playing a role in consolidation, and the neural mechanisms underlying this, are reviewed. Using the L1 memory consolidation literature as background, the chapter then presents what is currently known about memory consolidation in L2 word learning. Finally, considering what is already known about L1 but not about L2, future research investigating memory consolidation in L2 neurocognition is proposed.
  • Ferré, G. (2023). Pragmatic gestures and prosody. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527215.

    Abstract

    The study presented here focuses on two pragmatic gestures:
    the hand flip (Ferré, 2011), a gesture of the Palm Up Open
    Hand/PUOH family (Müller, 2004) and the closed hand which
    can be considered as the opposite kind of movement to the open-
    ing of the hands present in the PUOH gesture. Whereas one of
    the functions of the hand flip has been described as presenting
    a new point in speech (Cienki, 2021), the closed hand gesture
    has not yet been described in the literature to the best of our
    knowledge. It can however be conceived of as having the oppo-
    site function of announcing the end of a point in discourse. The
    object of the present study is therefore to determine, with the
    study of prosodic features, if the two gestures are found in the
    same type of speech units and what their respective scope is.
    Drawing from a corpus of three TED Talks in French the
    prosodic characteristics of the speech that accompanies the two
    gestures will be examined. The hypothesis developed in the
    present paper is that their scope should be reflected in the
    prosody of accompanying speech, especially pitch key, tone,
    and relative pitch range. The prediction is that hand flips and
    closing hand gestures are expected to be located at the periph-
    ery of Intonation Phrases (IPs), Inter-Pausal Units (IPUs) or
    more conversational Turn Constructional Units (TCUs), and are
    likely to be co-occurrent with pauses in speech. But because of
    the natural slope of intonation in speech, the speech that accom-
    pany early gestures in Intonation Phrases should reveal different
    features from the speech at the end of intonational units. Tones
    should be different as well, considering the prosodic structure
    of spoken French.
  • Gamba, M., Raimondi, T., De Gregorio, C., Valente, D., Carugati, F., Cristiano, W., Ferrario, V., Torti, V., Favaro, L., Friard, O., Giacoma, C., & Ravignani, A. (2023). Rhythmic categories across primate vocal displays. In A. Astolfi, F. Asdrubali, & L. Shtrepi (Eds.), Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 3971-3974). Torino: European Acoustics Association.

    Abstract

    The last few years have revealed that several species may share the building blocks of Musicality with humans. The recognition of these building blocks (e.g., rhythm, frequency variation) was a necessary impetus for a new round of studies investigating rhythmic variation in animal vocal displays. Singing primates are a small group of primate species that produce modulated songs ranging from tens to thousands of vocal units. Previous studies showed that the indri, the only singing lemur, is currently the only known species that perform duet and choruses showing multiple rhythmic categories, as seen in human music. Rhythmic categories occur when temporal intervals between note onsets are not uniformly distributed, and rhythms with a small integer ratio between these intervals are typical of human music. Besides indris, white-handed gibbons and three crested gibbon species showed a prominent rhythmic category corresponding to a single small integer ratio, isochrony. This study reviews previous evidence on the co-occurrence of rhythmic categories in primates and focuses on the prospects for a comparative, multimodal study of rhythmicity in this clade.
  • Green, K., Osei-Cobbina, C., Perlman, M., & Kita, S. (2023). Infants can create different types of iconic gestures, with and without parental scaffolding. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527188.

    Abstract

    Despite the early emergence of pointing, children are generally not documented to produce iconic gestures until later in development. Although research has described this developmental trajectory and the types of iconic gestures that emerge first, there has been limited focus on iconic gestures within interactional contexts. This study identified the first 10 iconic gestures produced by five monolingual English-speaking children in a naturalistic longitudinal video corpus and analysed the interactional contexts. We found children produced their first iconic gesture between 12 and 20 months and that gestural types varied. Although 34% of gestures could have been imitated or derived from adult or child actions in the preceding context, the majority were produced independently of any observed model. In these cases, adults often led the interaction in a direction where iconic gesture was an appropriate response. Overall, we find infants can represent a referent symbolically and possess a greater capacity for innovation than previously assumed. In order to develop our understanding of how children learn to produce iconic gestures, it is important to consider the immediate interactional context. Conducting naturalistic corpus analyses could be a more ecologically valid approach to understanding how children learn to produce iconic gestures in real life contexts.
  • Hamilton, A., & Holler, J. (Eds.). (2023). Face2face: Advancing the science of social interaction [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences. Retrieved from https://royalsocietypublishing.org/toc/rstb/2023/378/1875.

    Abstract

    Face to face interaction is fundamental to human sociality but is very complex to study in a scientific fashion. This theme issue brings together cutting-edge approaches to the study of face-to-face interaction and showcases how we can make progress in this area. Researchers are now studying interaction in adult conversation, parent-child relationships, neurodiverse groups, interactions with virtual agents and various animal species. The theme issue reveals how new paradigms are leading to more ecologically grounded and comprehensive insights into what social interaction is. Scientific advances in this area can lead to improvements in education and therapy, better understanding of neurodiversity and more engaging artificial agents
  • Hellwig, B., Allen, S. E. M., Davidson, L., Defina, R., Kelly, B. F., & Kidd, E. (Eds.). (2023). The acquisition sketch project [Special Issue]. Language Documentation and Conservation Special Publication, 28.

    Abstract

    This special publication aims to build a renewed enthusiasm for collecting acquisition data across many languages, including those facing endangerment and loss. It presents a guide for documenting and describing child language and child-directed language in diverse languages and cultures, as well as a collection of acquisition sketches based on this guide. The guide is intended for anyone interested in working across child language and language documentation, including, for example, field linguists and language documenters, community language workers, child language researchers or graduate students.
  • Jadoul, Y., Düngen, D., & Ravignani, A. (2023). Live-tracking acoustic parameters in animal behavioural experiments: Interactive bioacoustics with parselmouth. In A. Astolfi, F. Asdrubali, & L. Shtrepi (Eds.), Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 4675-4678). Torino: European Acoustics Association.

    Abstract

    Most bioacoustics software is used to analyse the already collected acoustics data in batch, i.e., after the data-collecting phase of a scientific study. However, experiments based on animal training require immediate and precise reactions from the experimenter, and thus do not easily dovetail with a typical bioacoustics workflow. Bridging this methodological gap, we have developed a custom application to live-monitor the vocal development of harbour seals in a behavioural experiment. In each trial, the application records and automatically detects an animal's call, and immediately measures duration and acoustic measures such as intensity, fundamental frequency, or formant frequencies. It then displays a spectrogram of the recording and the acoustic measurements, allowing the experimenter to instantly evaluate whether or not to reinforce the animal's vocalisation. From a technical perspective, the rapid and easy development of this custom software was made possible by combining multiple open-source software projects. Here, we integrated the acoustic analyses from Parselmouth, a Python library for Praat, together with PyAudio and Matplotlib's recording and plotting functionality, into a custom graphical user interface created with PyQt. This flexible recombination of different open-source Python libraries allows the whole program to be written in a mere couple of hundred lines of code
  • Jordanoska, I. (2023). Focus marking and size in some Mande and Atlantic languages. In N. Sumbatova, I. Kapitonov, M. Khachaturyan, S. Oskolskaya, & V. Verhees (Eds.), Songs and Trees: Papers in Memory of Sasha Vydrina (pp. 311-343). St. Petersburg: Institute for Linguistic Studies and Russian Academy of Sciences.

    Abstract

    This paper compares the focus marking systems and the focus size that can be expressed by the different focus markings in four Mande and three Atlantic languages and varieties, namely: Bambara, Dyula, Kakabe, Soninke (Mande), Wolof, Jóola Foñy and Jóola Karon (Atlantic). All of these languages are known to mark focus morphosyntactically, rather than prosodically, as the more well-studied Germanic languages do. However, the Mande languages under discussion use only morphology, in the form of a particle that follows the focus, while the Atlantic ones use a more complex morphosyntactic system in which focus is marked by morphology in the verbal complex and movement of the focused term. It is shown that while there are some syntactic restrictions to how many different focus sizes can be marked in a distinct way, there is also a certain degree of arbitrariness as to which focus sizes are marked in the same way as each other.
  • Jordanoska, I., Kocher, A., & Bendezú-Araujo, R. (Eds.). (2023). Marking the truth: A cross-linguistic approach to verum [Special Issue]. Zeitschrift für Sprachwissenschaft, 42(3).
  • Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, & A. Vinciarelli (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

    Abstract

    Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research1.
  • Laparle, S. (2023). Moving past the lexical affiliate with a frame-based analysis of gesture meaning. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527218.

    Abstract

    Interpreting the meaning of co-speech gesture often involves
    identifying a gesture’s ‘lexical affiliate’, the word or phrase to
    which it most closely relates (Schegloff 1984). Though there is
    work within gesture studies that resists this simplex mapping of
    meaning from speech to gesture (e.g. de Ruiter 2000; Kendon
    2014; Parrill 2008), including an evolving body of literature on
    recurrent gesture and gesture families (e.g. Fricke et al. 2014; Müller 2017), it is still the lexical affiliate model that is most ap-
    parent in formal linguistic models of multimodal meaning(e.g.
    Alahverdzhieva et al. 2017; Lascarides and Stone 2009; Puste-
    jovsky and Krishnaswamy 2021; Schlenker 2020). In this work,
    I argue that the lexical affiliate should be carefully reconsidered
    in the further development of such models.
    In place of the lexical affiliate, I suggest a further shift
    toward a frame-based, action schematic approach to gestural
    meaning in line with that proposed in, for example, Parrill and
    Sweetser (2004) and Müller (2017). To demonstrate the utility
    of this approach I present three types of compositional gesture
    sequences which I call spatial contrast, spatial embedding, and
    cooperative abstract deixis. All three rely on gestural context,
    rather than gesture-speech alignment, to convey interactive (i.e.
    pragmatic) meaning. The centrality of gestural context to ges-
    ture meaning in these examples demonstrates the necessity of
    developing a model of gestural meaning independent of its in-
    tegration with speech.
  • Levinson, S. C. (2023). On cognitive artifacts. In R. Feldhay (Ed.), The evolution of knowledge: A scientific meeting in honor of Jürgen Renn (pp. 59-78). Berlin: Max Planck Institute for the History of Science.

    Abstract

    Wearing the hat of a cognitive anthropologist rather than an historian, I will try to amplify the ideas of Renn’s cited above. I argue that a particular subclass of material objects, namely “cognitive artifacts,” involves a close coupling of mind and artifact that acts like a brain prosthesis. Simple cognitive artifacts are external objects that act as aids to internal
    computation, and not all cultures have extended inventories of these. Cognitive artifacts in this sense (e.g., calculating or measuring devices) have clearly played a central role in the history of science. But the notion can be widened to take in less material externalizations of cognition, like writing and language itself. A critical question here is how and why this close coupling of internal computation and external device actually works, a rather neglected question to which I’ll suggest some answers.

    Additional information

    link to book
  • Levshina, N. (2023). Testing communicative and learning biases in a causal model of language evolution:A study of cues to Subject and Object. In M. Degano, T. Roberts, G. Sbardolini, & M. Schouwstra (Eds.), The Proceedings of the 23rd Amsterdam Colloquium (pp. 383-387). Amsterdam: University of Amsterdam.
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators. In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.

    Abstract

    Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. In Proceedings of the 24rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial 2023). doi:10.18653/v1/2023.sigdial-1.45.

    Abstract

    Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This impacts especially the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
  • Nabrotzky, J., Ambrazaitis, G., Zellers, M., & House, D. (2023). Temporal alignment of manual gestures’ phase transitions with lexical and post-lexical accentual F0 peaks in spontaneous Swedish interaction. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527194.

    Abstract

    Many studies investigating the temporal alignment of co-speech
    gestures to acoustic units in the speech signal find a close
    coupling of the gestural landmarks and pitch accents or the
    stressed syllable of pitch-accented words. In English, a pitch
    accent is anchored in the lexically stressed syllable. Hence, it is
    unclear whether it is the lexical phonological dimension of
    stress, or the phrase-level prominence that determines the
    details of speech-gesture synchronization. This paper explores
    the relation between gestural phase transitions and accentual F0
    peaks in Stockholm Swedish, which exhibits a lexical pitch
    accent distinction. When produced with phrase-level
    prominence, there are three different configurations of
    lexicality of F0 peaks and the status of the syllable it is aligned
    with. Through analyzing the alignment of the different F0 peaks
    with gestural onsets in spontaneous dyadic conversations, we
    aim to contribute to our understanding of the role of lexical
    prosodic phonology in the co-production of speech and gesture.
    The results, though limited by a small dataset, still suggest
    differences between the three types of peaks concerning which
    types of gesture phase onsets they tend to align with, and how
    well these landmarks align with each other, although these
    differences did not reach significance.
  • Offrede, T., Mishra, C., Skantze, G., Fuchs, S., & Mooshammer, C. (2023). Do Humans Converge Phonetically When Talking to a Robot? In R. Skarnitzl, & J. Volin (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 3507-3511). Prague: GUARANT International.

    Abstract

    Phonetic convergence—i.e., adapting one’s speech
    towards that of an interlocutor—has been shown
    to occur in human-human conversations as well as
    human-machine interactions. Here, we investigate
    the hypothesis that human-to-robot convergence is
    influenced by the human’s perception of the robot
    and by the conversation’s topic. We conducted a
    within-subjects experiment in which 33 participants
    interacted with two robots differing in their eye gaze
    behavior—one looked constantly at the participant;
    the other produced gaze aversions, similarly to a
    human’s behavior. Additionally, the robot asked
    questions with increasing intimacy levels.
    We observed that the speakers tended to converge
    on F0 to the robots. However, this convergence
    to the robots was not modulated by how the
    speakers perceived them or by the topic’s intimacy.
    Interestingly, speakers produced lower F0 means
    when talking about more intimate topics. We
    discuss these findings in terms of current theories of
    conversational convergence.
  • Pereira Soares, S. M., Chaouch-Orozco, A., & González Alonso, J. (2023). Innovations and challenges in acquisition and processing methodologies for L3/Ln. In J. Cabrelli, A. Chaouch-Orozco, J. González Alonso, S. M. Pereira Soares, E. Puig-Mayenco, & J. Rothman (Eds.), The Cambridge handbook of third language acquisition (pp. 661-682). Cambridge: Cambridge University Press. doi:10.1017/9781108957823.026.

    Abstract

    The advent of psycholinguistic and neurolinguistic methodologies has provided new insights into theories of language acquisition. Sequential multilingualism is no exception, and some of the most recent work on the subject has incorporated a particular focus on language processing. This chapter surveys some of the work on the processing of lexical and morphosyntactic aspects of third or further languages, with different offline and online methodologies. We also discuss how, while increasingly sophisticated techniques and experimental designs have improved our understanding of third language acquisition and processing, simpler but clever designs can answer pressing questions in our theoretical debate. We provide examples of both sophistication and clever simplicity in experimental design, and argue that the field would benefit from incorporating a combination of both concepts into future work.
  • Raviv, L., & Kirby, S. (2023). Self domestication and the cultural evolution of language. In J. J. Tehrani, J. Kendal, & R. Kendal (Eds.), The Oxford Handbook of Cultural Evolution. Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780198869252.013.60.

    Abstract

    The structural design features of human language emerge in the process of cultural evolution, shaping languages over the course of communication, learning, and transmission. What role does this leave biological evolution? This chapter highlights the biological bases and preconditions that underlie the particular type of prosocial behaviours and cognitive inference abilities that are required for languages to emerge via cultural evolution to begin with.
  • Sander, J., Lieberman, A., & Rowland, C. F. (2023). Exploring joint attention in American Sign Language: The influence of sign familiarity. In M. Goldwater, F. K. Anggoro, B. K. Hayes, & D. C. Ong (Eds.), Proceedings of the 45th Annual Meeting of the Cognitive Science Society (CogSci 2023) (pp. 632-638).

    Abstract

    Children’s ability to share attention with another social partner (i.e., joint attention) has been found to support language development. Despite the large amount of research examining the effects of joint attention on language in hearing population, little is known about how deaf children learning sign languages achieve joint attention with their caregivers during natural social interaction and how caregivers provide and scaffold learning opportunities for their children. The present study investigates the properties and timing of joint attention surrounding familiar and novel naming events and their relationship to children’s vocabulary. Naturalistic play sessions of caretaker-child-dyads using American Sign Language were analyzed in regards to naming events of either familiar or novel object labeling events and the surrounding joint attention events. We observed that most naming events took place in the context of a successful joint attention event and that sign familiarity was related to the timing of naming events within the joint attention events. Our results suggest that caregivers are highly sensitive to their child’s visual attention in interactions and modulate joint attention differently in the context of naming events of familiar vs. novel object labels.
  • Sekine, K., & Kajikawa, T. (2023). Does the spatial distribution of a speaker's gaze and gesture impact on a listener's comprehension of discourse? In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527208.

    Abstract

    This study investigated the impact of a speaker's gaze direction
    on a listener's comprehension of discourse. Previous research
    suggests that hand gestures play a role in referent allocation,
    enabling listeners to better understand the discourse. The
    current study aims to determine whether the speaker's gaze
    direction has a similar effect on reference resolution as co-
    speech gestures. Thirty native Japanese speakers participated in
    the study and were assigned to one of three conditions:
    congruent, incongruent, or speech-only. Participants watched
    36 videos of an actor narrating a story consisting of three
    sentences with two protagonists. The speaker consistently
    used hand gestures to allocate one protagonist to the lower right
    and the other to the lower left space, while directing her gaze to
    either space of the target person (congruent), the other person
    (incongruent), or no particular space (speech-only). Participants
    were required to verbally answer a question about the target
    protagonist involved in an accidental event as quickly as
    possible. Results indicate that participants in the congruent
    condition exhibited faster reaction times than those in the
    incongruent condition, although the difference was not
    significant. These findings suggest that the speaker's gaze
    direction is not enough to facilitate a listener's comprehension
    of discourse.
  • Senft, G. (2023). The system of classifiers in Kilivila - The role of these formatives and their functions. In M. Allassonnière-Tang, & M. Kilarski (Eds.), Nominal Classification in Asia and Oceania. Functional and diachronic perspectives (pp. 10-29). Amsterdam: John Benjamins. doi:10.1075/cilt.362.02sen.

    Abstract

    This paper presents the complex system of classifiers in Kilivila, the language of the Trobriand Islanders of Papua New Guinea. After a brief introduction to the language and its speakers, the classifier system is briefly described with respect to the role of these formatives for the word formation of Kilivila numerals, adjectives, demonstratives and one form of an interrogative pronoun/adverb. Then the functions the classifier system fulfils with respect to concord, temporary classification, the unitizing of nominal expressions, nominalization, indication of plural, anaphoric reference as well as text and discourse coherence are discussed and illustrated. The paper ends with some language specific and cross-linguistic questions for further research.
  • Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2023). Syllable rate drives rate normalization, but is not the only factor. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 56-60). Prague: Guarant International.

    Abstract

    Speech is perceived relative to the speech rate in the context. It is unclear, however, what information listeners use to compute speech rate. The present study examines whether listeners use the number of
    syllables per unit time (i.e., syllable rate) as a measure of speech rate, as indexed by subsequent vowel perception. We ran two rate-normalization experiments in which participants heard duration-matched word lists that contained either monosyllabic
    vs. bisyllabic words (Experiment 1), or monosyllabic vs. trisyllabic pseudowords (Experiment 2). The participants’ task was to categorize an /ɑ-aː/ continuum that followed the word lists. The monosyllabic condition was perceived as slower (i.e., fewer /aː/ responses) than the bisyllabic and
    trisyllabic condition. However, no difference was observed between bisyllabic and trisyllabic contexts. Therefore, while syllable rate is used in perceiving speech rate, other factors, such as fast speech processes, mean F0, and intensity, must also influence rate normalization.
  • Siahaan, P., & Wijaya Rajeg, G. P. (2023). Multimodal language use in Indonesian: Recurrent gestures associated with negation. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527196.

    Abstract

    This paper presents research findings on manual gestures
    associated with negation in Indonesian, utilizing data sourced
    from talk shows available on YouTube. The study reveals that
    Indonesian speakers employ six recurrent negation gestures,
    which have been observed in various languages worldwide.
    This suggests that gestures exhibiting a stable form-meaning
    relationship and recurring frequently in relation to negation are
    prevalent around the globe, although their distribution may
    differ across cultures and languages. Furthermore, the paper
    demonstrates that negation gestures are not strictly tied to
    verbal negation. Overall, the aim of this paper is to contribute
    to a deeper understanding of the conventional usage and cross-
    linguistic distribution of recurrent gestures.
  • Stern, G. (2023). On embodied use of recognitional demonstratives. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527204.

    Abstract

    This study focuses on embodied uses of recognitional
    demonstratives. While multimodal conversation analytic
    studies have shown how gesture and speech interact in the
    elaboration of exophoric references, little attention has been
    given to the multimodal configuration of other types of
    referential actions. Based on a video-recorded corpus of
    professional meetings held in French, this qualitative study
    shows that a subtype of deictic references, namely recognitional
    references, are frequently associated with iconic gestures, thus
    challenging the traditional distinction between exophoric and
    endophoric uses of deixis.
  • Uhrig, P., Payne, E., Pavlova, I., Burenko, I., Dykes, N., Baltazani, M., Burrows, E., Hale, S., Torr, P., & Wilson, A. (2023). Studying time conceptualisation via speech, prosody, and hand gesture: Interweaving manual and computational methods of analysis. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527220.

    Abstract

    This paper presents a new interdisciplinary methodology for the
    analysis of future conceptualisations in big messy media data.
    More specifically, it focuses on the depictions of post-Covid
    futures by RT during the pandemic, i.e. on data which are of
    interest not just from the perspective of academic research but
    also of policy engagement. The methodology has been
    developed to support the scaling up of fine-grained data-driven
    analysis of discourse utterances larger than individual lexical
    units which are centred around ‘will’ + the infinitive. It relies
    on the true integration of manual analytical and computational
    methods and tools in researching three modalities – textual,
    prosodic1, and gestural. The paper describes the process of
    building a computational infrastructure for the collection and
    processing of video data, which aims to empower the manual
    analysis. It also shows how manual analysis can motivate the
    development of computational tools. The paper presents
    individual computational tools to demonstrate how the
    combination of human and machine approaches to analysis can
    reveal new manifestations of cohesion between gesture and
    prosody. To illustrate the latter, the paper shows how the
    boundaries of prosodic units can work to help determine the
    boundaries of gestural units for future conceptualisations.
  • Uluşahin, O., Bosker, H. R., McQueen, J. M., & Meyer, A. S. (2023). No evidence for convergence to sub-phonemic F2 shifts in shadowing. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 96-100). Prague: Guarant International.

    Abstract

    Over the course of a conversation, interlocutors sound more and more like each other in a process called convergence. However, the automaticity and grain size of convergence are not well established. This study therefore examined whether female native Dutch speakers converge to large yet sub-phonemic shifts in the F2 of the vowel /e/. Participants first performed a short reading task to establish baseline F2s for the vowel /e/, then shadowed 120 target words (alongside 360 fillers) which contained one instance of a manipulated vowel /e/ where the F2 had been shifted down to that of the vowel /ø/. Consistent exposure to large (sub-phonemic) downward shifts in F2 did not result in convergence. The results raise issues for theories which view convergence as a product of automatic integration between perception and production.
  • Verga, L., Schwartze, M., & Kotz, S. A. (2023). Neurophysiology of language pathologies. In M. Grimaldi, E. Brattico, & Y. Shtyrov (Eds.), Language Electrified: Neuromethods (pp. 753-776). New York, NY: Springer US. doi:10.1007/978-1-0716-3263-5_24.

    Abstract

    Language- and speech-related disorders are among the most frequent consequences of developmental and acquired pathologies. While classical approaches to the study of these disorders typically employed the lesion method to unveil one-to-one correspondence between locations, the extent of the brain damage, and corresponding symptoms, recent advances advocate the use of online methods of investigation. For example, the use of electrophysiology or magnetoencephalography—especially when combined with anatomical measures—allows for in vivo tracking of real-time language and speech events, and thus represents a particularly promising venue for future research targeting rehabilitative interventions. In this chapter, we provide a comprehensive overview of language and speech pathologies arising from cortical and/or subcortical damage, and their corresponding neurophysiological and pathological symptoms. Building upon the reviewed evidence and literature, we aim at providing a description of how the neurophysiology of the language network changes as a result of brain damage. We will conclude by summarizing the evidence presented in this chapter, while suggesting directions for future research.
  • Vogel, C., Koutsombogera, M., Murat, A. C., Khosrobeigi, Z., & Ma, X. (2023). Gestural linguistic context vectors encode gesture meaning. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527176.

    Abstract

    Linguistic context vectors are adapted for measuring the linguistic contexts that accompany gestures and comparable co-linguistic behaviours. Focusing on gestural semiotic types, it is demonstrated that gestural linguistic context vectors carry information associated with gesture. It is suggested that these may be used to approximate gesture meaning in a similar manner to the approximation of word meaning by context vectors.
  • Witteman, J., Karaseva, E., Schiller, N. O., & McQueen, J. M. (2023). What does successful L2 vowel acquisition depend on? A conceptual replication. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 928-931). Prague: Guarant International.

    Abstract

    It has been suggested that individual variation in vowel compactness of the native language (L1) and the distance between L1 vowels and vowels in the second language (L2) predict successful L2 vowel acquisition. Moreover, general articulatory skills have been proposed to account for variation in vowel compactness. In the present work, we conceptually replicate a previous study to test these hypotheses with a large sample size, a new language pair and a
    new vowel pair. We find evidence that individual variation in L1 vowel compactness has opposing effects for two different vowels. We do not find evidence that individual variation in L1 compactness
    is explained by general articulatory skills. We conclude that the results found previously might be specific to sub-groups of L2 learners and/or specific sub-sets of vowel pairs.
  • Akamine, S., Kohatsu, T., Niikuni, K., Schafer, A. J., & Sato, M. (2022). Emotions in language processing: Affective priming in embodied cognition. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 326-332). Tokyo: Japanese Cognitive Science Society.
  • Bauer, B. L. M. (2022). Counting systems. In A. Ledgeway, & M. Maiden (Eds.), The Cambridge Handbook of Romance Linguistics (pp. 459-488). Cambridge: Cambridge University Press.

    Abstract

    The Romance counting system is numerical – with residues of earlier systems whereby each commodity had its own unit of quantification – and decimal. Numeral formations beyond ‘10’ are compounds, combining two or more numerals that are in an arithmetical relation, typically that of addition and multiplication. Formal variation across the (standard) Romance languages and dialects and across historical stages involves the relative sequence of the composing elements, absence or presence of connectors, their synthetic vs. analytic nature, and the degree of grammatical marking. A number of ‘deviant’ numeral formations raise the question of borrowing vs independent development, such as vigesimals (featuring a base ‘20’ instead ‘10’) in certain Romance varieties and the teen and decad formations in Romanian. The other types of numeral in Romance, which derive from the unmarked and consistent cardinals, feature a significantly higher degree of formal complexity and variation involving Latin formants and tend toward analyticity. While Latin features prominently in the Romance counting system as a source of numeral formations and suffixes, it is only in Romance that the inherited decimal system reached its full potential, illustrating its increasing prominence, reflected not only in numerals, but also in language acquisition, sign language, and post-Revolution measuring systems.
  • Bauer, B. L. M. (2022). Finite verb + infinite + object in later Latin: Early brace constructions? In G. V. M. Haverling (Ed.), Studies on Late and Vulgar Latin in the Early 21st Century: Acts of the 12th International Colloquium "Latin vulgaire – Latin tardif (pp. 166-181). Uppsala: Acta Universitatis Upsaliensis.
  • Bruggeman, L., Yu, J., & Cutler, A. (2022). Listener adjustment of stress cue use to fit language vocabulary structure. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 264-267). doi:10.21437/SpeechProsody.2022-54.

    Abstract

    In lexical stress languages, phonemically identical syllables can differ suprasegmentally (in duration, amplitude, F0). Such stress
    cues allow listeners to speed spoken-word recognition by rejecting mismatching competitors (e.g., unstressed set- in settee
    rules out stressed set- in setting, setter, settle). Such processing effects have indeed been observed in Spanish, Dutch and German, but English listeners are known to largely ignore stress cues. Dutch and German listeners even outdo English listeners in distinguishing stressed versus unstressed English syllables. This has been attributed to the relative frequency across the stress languages of unstressed syllables with full vowels; in English most unstressed syllables contain schwa, instead, and stress cues on full vowels are thus least often informative in this language. If only informativeness matters, would English listeners who encounter situations where such cues would pay off for them (e.g., learning one of those other stress languages) then shift to using stress cues? Likewise, would stress cue users with English as L2, if mainly using English, shift away from
    using the cues in English? Here we report tests of these two questions, with each receiving a yes answer. We propose that
    English listeners’ disregard of stress cues is purely pragmatic.
  • Bujok, R., Meyer, A. S., & Bosker, H. R. (2022). Visible lexical stress cues on the face do not influence audiovisual speech perception. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 259-263). doi:10.21437/SpeechProsody.2022-53.

    Abstract

    Producing lexical stress leads to visible changes on the face, such as longer duration and greater size of the opening of the mouth. Research suggests that these visual cues alone can inform participants about which syllable carries stress (i.e., lip-reading silent videos). This study aims to determine the influence of visual articulatory cues on lexical stress perception in more naturalistic audiovisual settings. Participants were presented with seven disyllabic, Dutch minimal stress pairs (e.g., VOORnaam [first name] & voorNAAM [respectable]) in audio-only (phonetic lexical stress continua without video), video-only (lip-reading silent videos), and audiovisual trials (e.g., phonetic lexical stress continua with video of talker saying VOORnaam or voorNAAM). Categorization data from video-only trials revealed that participants could distinguish the minimal pairs above chance from seeing the silent videos alone. However, responses in the audiovisual condition did not differ from the audio-only condition. We thus conclude that visual lexical stress information on the face, while clearly perceivable, does not play a major role in audiovisual speech perception. This study demonstrates that clear unimodal effects do not always generalize to more naturalistic multimodal communication, advocating that speech prosody is best considered in multimodal settings.
  • Cambier, N., Miletitch, R., Burraco, A. B., & Raviv, L. (2022). Prosociality in swarm robotics: A model to study self-domestication and language evolution. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 98-100). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Cheung, C.-Y., Yakpo, K., & Coupé, C. (2022). A computational simulation of the genesis and spread of lexical items in situations of abrupt language contact. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 115-122). Nijmegen: Joint Conference on Language Evolution (JCoLE).

    Abstract

    The current study presents an agent-based model which simulates the innovation and
    competition among lexical items in cases of language contact. It is inspired by relatively
    recent historical cases in which the linguistic ecology and sociohistorical context are highly complex. Pidgin and creole genesis offers an opportunity to obtain linguistic facts, social dynamics, and historical demography in a highly segregated society. This provides a solid ground for researching the interaction of populations with different pre-existing language systems, and how different factors contribute to the genesis of the lexicon of a newly generated mixed language. We take into consideration the population dynamics and structures, as well as a distribution of word frequencies related to language use, in order to study how social factors may affect the developmental trajectory of languages. Focusing on the case of Sranan in Suriname, our study shows that it is possible to account for the
    composition of its core lexicon in relation to different social groups, contact patterns, and
    large population movements.
  • Cho, T. (2022). The Phonetics-Prosody Interface and Prosodic Strengthening in Korean. In S. Cho, & J. Whitman (Eds.), Cambridge handbook of Korean linguistics (pp. 248-293). Cambridge: Cambridge University Press.
  • Cutler, A., Ernestus, M., Warner, N., & Weber, A. (2022). Managing speech perception data sets. In B. McDonnell, E. Koller, & L. B. Collister (Eds.), The Open Handbook of Linguistic Data Management (pp. 565-573). Cambrdige, MA, USA: MIT Press. doi:10.7551/mitpress/12200.003.0055.
  • Dingemanse, M., Liesenfeld, A., & Woensdregt, M. (2022). Convergent cultural evolution of continuers (mhmm). In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 160-167). Nijmegen: Joint Conference on Language Evolution (JCoLE). doi:10.31234/osf.io/65c79.

    Abstract

    Continuers —words like mm, mmhm, uhum and the like— are among the most frequent types of responses in conversation. They play a key role in joint action coordination by showing positive evidence of understanding and scaffolding narrative delivery. Here we investigate the hypothesis that their functional importance along with their conversational ecology places selective pressures on their form and may lead to cross-linguistic similarities through convergent cultural evolution. We compare continuer tokens in linguistically diverse conversational corpora and find languages make available highly similar forms. We then approach the causal mechanism of convergent cultural evolution using exemplar modelling, simulating the process by which a combination of effort minimization and functional specialization may push continuers to a particular region of phonological possibility space. By combining comparative linguistics and computational modelling we shed new light on the question of how language structure is shaped by and for social interaction.
  • Dingemanse, M., & Liesenfeld, A. (2022). From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022) (pp. 5614 -5633). Dublin, Ireland: Association for Computational Linguistics.

    Abstract

    Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
  • Dona, L., & Schouwstra, M. (2022). The Role of Structural Priming, Semantics and Population Structure in Word Order Conventionalization: A Computational Model. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 171-173). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Embick, D., Creemers, A., & Goodwin Davies, A. J. (2022). Morphology and the mental lexicon: Three questions about decomposition. In A. Papafragou, J. C. Trueswell, & L. R. Gleitman (Eds.), The Oxford Handbook of the Mental Lexicon (pp. 77-97). Oxford: Oxford University Press.

    Abstract

    The most basic question for the study of morphology and the mental lexicon is whether or not words are _decomposed_: informally, this is the question of whether words are represented (and processed) in terms of some kind of smaller units; that is, broken down into constituent parts. Formally, what it means to represent or process a word as decomposed or not turns out to be quite complex. One of the basic lines of division in the field classifies approaches according to whether they decompose all “complex” words (“Full Decomposition”), or none (“Full Listing”), or some but not all, according to some criterion (typical of “Dual-Route” models). However, if we are correct, there are at least three senses in which an approach might be said to be decompositional or not, with the result that ongoing discussions of what appears to be a single large issue might not always be addressing the same distinction. Put slightly differently, there is no single question of decomposition. Instead, there are independent but related questions that define current research. Our goal here is to identify this finer-grained set of questions, as they are the ones that should assume a central place in the study of morphological and lexical representation.
  • Fisher, V. J. (2022). Unpeeling meaning: An analogy and metaphor identification and analysis tool for modern and post-modern dance, and beyond. In C. Fernandes, V. Evola, & C. Ribeiro (Eds.), Dance data, cognition, and multimodal communication (pp. 297-319). Oxford: Routledge. doi:10.4324/9781003106401-24.
  • Fletcher, J., Kidd, E., Stoakes, H., & Nordlinger, R. (2022). Prosodic phrasing, pitch range, and word order variation in Murrinhpatha. In R. Billington (Ed.), Proceedings of the 18th Australasian International Conference on Speech Science and Technology (pp. 201-205). Canberra: Australasian Speech Science and Technology Association.

    Abstract

    Like many Indigenous Australian languages, Murrinhpatha has flexible word order with no apparent configurational syntax. We analyzed an experimental corpus of Murrinhpatha utterances for associations between different thematic role orders, intonational phrasing patterns and pitch downtrends. We found that initial constituents (Agents or Patients) tend to carry the highest pitch targets (HiF0), followed by patterns of downstep and declination. Sentence-final verbs always have lower Hif0 values than either initial or medial Agents or Patients. Thematic role order does not influence intonational
    patterns, with the results suggesting that Murrinhpatha has positional prosody, although final nominals can disrupt global
    pitch downtrends regardless of thematic role.
  • Forkel, S. J. (2022). Lesion-Symptom Mapping: From Single Cases to the Human Disconnectome. In S. Della Salla (Ed.), Encyclopedia of Behavioral Neuroscience (2nd edition, pp. 142-154). Elsevier. doi:10.1016/B978-0-12-819641-0.00056-6.

    Abstract

    Lesion symptom mapping has revolutionized our understanding of the functioning of the human brain. Associating damaged voxels in the brain with loss of function has created a map of the brain that identifies critical areas. While these methods have significantly advanced our understanding, recent improvements have identified the need for multivariate and multimodal methods to map hidden lesions and damage to white matter networks beyond the lesion voxels. This article reviews the evolution of lesion-symptom mapping from single case studies to the human disconnectome.
  • Galke, L., & Scherp, A. (2022). Bag-of-words vs. graph vs. sequence in text classification: Questioning the necessity of text-graphs and the surprising strength of a wide MLP. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (pp. 4038-4051). Dublin: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.279.
  • Galke, L., Cuber, I., Meyer, C., Nölscher, H. F., Sonderecker, A., & Scherp, A. (2022). General cross-architecture distillation of pretrained language models into matrix embedding. In Proceedings of the IEEE Joint Conference on Neural Networks (IJCNN 2022), part of the IEEE World Congress on Computational Intelligence (WCCI 2022). doi:10.1109/IJCNN55064.2022.9892144.

    Abstract

    Large pretrained language models (PreLMs) are rev-olutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive to DistilBERT on question similarity and recognizing textual entailment, but uses only half of the number of parameters and is three times faster in terms of inference speed. We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLM into competitive models and motivates further research in this direction.
  • Gamba, M., De Gregorio, C., Valente, D., Raimondi, T., Torti, V., Miaretsoa, L., Carugati, F., Friard, O., Giacoma, C., & Ravignani, A. (2022). Primate rhythmic categories analyzed on an individual basis. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 229-236). Nijmegen: Joint Conference on Language Evolution (JCoLE).

    Abstract

    Rhythm is a fundamental feature characterizing communicative displays, and recent studies showed that primate songs encompass categorical rhythms falling on small integer ratios observed in humans. We individually assessed the presence and sexual dimorphism of rhythmic categories, analyzing songs emitted by 39 wild indris. Considering the intervals between the units given during each song, we extracted 13556 interval ratios and found three peaks (at around 0.33, 0.47, and 0.70). Two peaks indicated rhythmic categories corresponding to small integer ratios (1:1, 2:1). All individuals showed a peak at 0.70, and
    most showed those at 0.47 and 0.33. In addition, we found sex differences in the peak at 0.47 only, with males showing lower values than females. This work investigates the presence of individual rhythmic categories in a non-human species; further research may highlight the significance of rhythmicity and untie selective pressures that guided its evolution across species, including humans.
  • Hagoort, P. (2022). Reasoning and the brain. In M. Stokhof, & K. Stenning (Eds.), Rules, regularities, randomness. Festschrift for Michiel van Lambalgen (pp. 83-85). Amsterdam: Institute for Logic, Language and Computation.
  • Hintz, F., Voeten, C. C., McQueen, J. M., & Meyer, A. S. (2022). Quantifying the relationships between linguistic experience, general cognitive skills and linguistic processing skills. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 2491-2496). Toronto, Canada: Cognitive Science Society.

    Abstract

    Humans differ greatly in their ability to use language. Contemporary psycholinguistic theories assume that individual differences in language skills arise from variability in linguistic experience and in general cognitive skills. While much previous research has tested the involvement of select verbal and non-verbal variables in select domains of linguistic processing, comprehensive characterizations of the relationships among the skills underlying language use are rare. We contribute to such a research program by re-analyzing a publicly available set of data from 112 young adults tested on 35 behavioral tests. The tests assessed nine key constructs reflecting linguistic processing skills, linguistic experience and general cognitive skills. Correlation and hierarchical clustering analyses of the test scores showed that most of the tests assumed to measure the same construct correlated moderately to strongly and largely clustered together. Furthermore, the results suggest important roles of processing speed in comprehension, and of linguistic experience in production.
  • Hoeksema, N., Hagoort, P., & Vernes, S. C. (2022). Piecing together the building blocks of the vocal learning bat brain. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 294-296). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Kan, U., Gökgöz, K., Sumer, B., Tamyürek, E., & Özyürek, A. (2022). Emergence of negation in a Turkish homesign system: Insights from the family context. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 387-389). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Kohatsu, T., Akamine, S., Sato, M., & Niikuni, K. (2022). Individual differences in empathy affect perspective adoption in language comprehension. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 652-656). Tokyo: Japanese Cognitive Science Society.
  • Levinson, S. C. (2022). Cognitive anthropology. In J. Verschueren, & J.-O. Östman (Eds.), Handbook of Pragmatics. Manual. 2nd edition (pp. 164-170). Amsterdam: Benjamins. doi:10.1075/hop.m2.cog1.
  • Levshina, N. (2022). Comparing Bayesian and frequentist models of language variation: The case of help + (to) Infinitive. In O. Schützler, & J. Schlüter (Eds.), Data and methods in corpus linguistics – Comparative Approaches (pp. 224-258). Cambridge: Cambridge University Press.
  • Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. In Proceedings of Interspeech 2022 (pp. 1126-1130).

    Abstract

    Response tokens (also known as backchannels, continuers, or feedback) are a frequent feature of human interaction, where they serve to display understanding and streamline turn-taking. We propose a bottom-up method to study responsive behaviour across 16 languages (8 language families). We use sequential context and recurrence of turns formats to identify candidate response tokens in a language-agnostic way across diverse conversational corpora. We then use UMAP clustering directly on speech signals to represent structure and variation. We find that (i) written orthographic annotations underrepresent the attested variation, (ii) distinctions between formats can be gradient rather than discrete, (iii) most languages appear to make available a broad distinction between a minimal nasal format `mm' and a fuller `yeah’-like format. Charting this aspect of human interaction contributes to our understanding of interactional infrastructure across languages and can inform the design of speech technologies.
  • Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. In F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, & J. Odijk (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 1178-1192). Marseille, France: European Language Resources Association.

    Abstract

    We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings.Importantly, in both experiments we show that he grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Mishra, C., & Skantze, G. (2022). Knowing where to look: A planning-based architecture to automate the gaze behavior of social robots. In Proceedings of the 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1201-1208). doi:10.1109/RO-MAN53752.2022.9900740.

    Abstract

    Gaze cues play an important role in human communication and are used to coordinate turn-taking and joint attention, as well as to regulate intimacy. In order to have fluent conversations with people, social robots need to exhibit humanlike gaze behavior. Previous Gaze Control Systems (GCS) in HRI have automated robot gaze using data-driven or heuristic approaches. However, these systems tend to be mainly reactive in nature. Planning the robot gaze ahead of time could help in achieving more realistic gaze behavior and better eye-head coordination. In this paper, we propose and implement a novel planning-based GCS. We evaluate our system in a comparative within-subjects user study (N=26) between a reactive system and our proposed system. The results show that the users preferred the proposed system and that it was significantly more interpretable and better at regulating intimacy.
  • Raviv, L., Jacobson, S. L., Plotnik, J. M., Bowman, J., Lynch, V., & Benítez-Burraco, A. (2022). Elephants as a new animal model for studying the evolution of language as a result of self-domestication. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 606-608). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • de Reus, K., Carlson, D., Lowry, A., Gross, S., Garcia, M., Rubio-García, A., Salazar-Casals, A., & Ravignani, A. (2022). Body size predicts vocal tract size in a mammalian vocal learner. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 154-156). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Scholman, M., Tianai, D., Yung, F., & Demberg, V. (2022). DiscoGeM: A crowdsourced corpus of genre-mixed implicit discourse relations. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 3281-3290). Marseille, France: European Language Resources Association.

    Abstract

    We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech,
    literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods
    were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results
    show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses.
    Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text
    genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic
    relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level
    labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to
    function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into
    non-connective signals of discourse relations.
  • Severijnen, G. G., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).

    Abstract

    Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization.
  • Van Leeuwen, T. M., & Dingemanse, M. (2022). Samenwerkende zintuigen. In S. Dekker, & H. Kause (Eds.), Wetenschappelijke doorbraken de klas in!: Geloven, Neustussenschot en Samenwerkende zintuigen (pp. 85-116). Nijmegen: Wetenschapsknooppunt Radboud Universiteit.

    Abstract

    Ook al hebben we het niet altijd door, onze zintuigen werken altijd samen. Als je iemand ziet praten, bijvoorbeeld, verwerken je hersenen automatisch tegelijkertijd het geluid van de woorden en de bewegingen van de lippen. Omdat onze zintuigen altijd samenwerken zijn onze hersenen erg gevoelig voor dingen die ‘samenhoren’ en goed bij elkaar passen. In dit hoofdstuk beschrijven we een project onderzoekend leren met als thema ‘Samenwerkende zintuigen’.
  • Van den Heuvel, H., Oostdijk, N., Rowland, C. F., & Trilsbeek, P. (2022). The CLARIN Knowledge Centre for Atypical Communication Expertise. In D. Fišer, & A. Witt (Eds.), CLARIN: The Infrastructure for Language Resources (pp. 373-388). Berlin, Boston: De Gruyter.

    Abstract

    In this chapter we introduce the CLARIN Knowledge Centre for Atypical Communication Expertise. The mission of ACE is to support researchers engaged in languages which pose particular challenges for analysis; for this, we use the umbrella term “atypical communication”. This includes language use by second-language learners, people with language disorders or those suffering from lan-guage disabilities, and languages that pose unique challenges for analysis, such as sign languages and languages spoken in a multilingual context. The chapter presents details about the collaborations and outreach of the centre, the services offered, and a number of showcases for its activities.
  • Vessel, E. A., Ishizu, T., & Bignardi, G. (2022). Neural correlates of visual aesthetic appeal. In M. Skov, & M. Nadal (Eds.), The Routledge international handbook of neuroaesthetics (pp. 103-133). London: Routledge.
  • Woensdregt, M., Jara-Ettinger, J., & Rubio-Fernandez, P. (2022). Language universals rely on social cognition: Computational models of the use of this and that to redirect the receiver’s attention. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1382-1388). Toronto, Canada: Cognitive Science Society.

    Abstract

    Demonstratives—simple referential devices like this and that—are linguistic universals, but their meaning varies cross-linguistically. In languages like English and Italian, demonstratives are thought to encode the referent’s distance from the producer (e.g., that one means “the one far away from me”),
    while in others, like Portuguese and Spanish, they encode relative distance from both producer and receiver (e.g., aquel means “the one far away from both of us”). Here we propose that demonstratives are also sensitive to the receiver’s focus of attention, hence requiring a deeper form of social cognition
    than previously thought. We provide initial empirical and computational evidence for this idea, suggesting that producers use
    demonstratives to redirect the receiver’s attention towards the intended referent, rather than only to indicate its physical distance.
  • Zhang, Y., & Yu, C. (2022). Examining real-time attention dynamics in parent-infant picture book reading. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1367-1374). Toronto, Canada: Cognitive Science Society.

    Abstract

    Picture book reading is a common word-learning context from which parents repeatedly name objects to their child and it has been found to facilitate early word learning. To learn the correct word-object mappings in a book-reading context, infants need to be able to link what they see with what they hear. However, given multiple objects on every book page, it is not clear how infants direct their attention to objects named by parents. The aim of the current study is to examine how infants mechanistically discover the correct word-object mappings during book reading in real time. We used head-mounted eye-tracking during parent-infant picture book reading and measured the infant's moment-by-moment visual attention to the named referent. We also examined how gesture cues provided by both the child and the parent may influence infants' attention to the named target. We found that although parents provided many object labels during book reading, infants were not able to attend to the named objects easily. However, their abilities to follow and use gestures to direct the other social partner’s attention increase the chance of looking at the named target during parent naming.
  • Amatuni, A., Schroer, S. E., Zhang, Y., Peters, R. E., Reza, M. A., Crandall, D., & Yu, C. (2021). In-the-moment visual information from the infant's egocentric view determines the success of infant word learning: A computational study. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 265-271). Vienna: Cognitive Science Society.

    Abstract

    Infants learn the meaning of words from accumulated experiences of real-time interactions with their caregivers. To study the effects of visual sensory input on word learning, we recorded infant's view of the world using head-mounted eye trackers during free-flowing play with a caregiver. While playing, infants were exposed to novel label-object mappings and later learning outcomes for these items were tested after the play session. In this study we use a classification based approach to link properties of infants' visual scenes during naturalistic labeling moments to their word learning outcomes. We find that a model which integrates both highly informative and ambiguous sensory evidence is a better fit to infants' individual learning outcomes than models where either type of evidence is taken alone, and that raw labeling frequency is unable to account for the word learning differences we observe. Here we demonstrate how a computational model, using only raw pixels taken from the egocentric scene image, can derive insights on human language learning.

Share this page