Publications

Displaying 101 - 200 of 335
  • Galke, L., Saleh, A., & Scherp, A. (2017). Word embeddings for practical information retrieval. In M. Eibl, & M. Gaedke (Eds.), INFORMATIK 2017 (pp. 2155-2167). Bonn: Gesellschaft für Informatik. doi:10.18420/in2017_215.

    Abstract

    We assess the suitability of word embeddings for practical information retrieval scenarios. Thus, we assume that users issue ad-hoc short queries where we return the first twenty retrieved documents after applying a boolean matching operation between the query and the documents. We compare the performance of several techniques that leverage word embeddings in the retrieval models to compute the similarity between the query and the documents, namely word centroid similarity, paragraph vectors, Word Mover’s distance, as well as our novel inverse document frequency (IDF) re-weighted word centroid similarity. We evaluate the performance using the ranking metrics mean average precision, mean reciprocal rank, and normalized discounted cumulative gain. Additionally, we inspect the retrieval models’ sensitivity to document length by using either only the title or the full-text of the documents for the retrieval task. We conclude that word centroid similarity is the best competitor to state-of-the-art retrieval models. It can be further improved by re-weighting the word frequencies with IDF before aggregating the respective word vectors of the embedding. The proposed cosine similarity of IDF re-weighted word vectors is competitive to the TF-IDF baseline and even outperforms it in case of the news domain with a relative percentage of 15%.
  • Galke, L., & Scherp, A. (2022). Bag-of-words vs. graph vs. sequence in text classification: Questioning the necessity of text-graphs and the surprising strength of a wide MLP. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (pp. 4038-4051). Dublin: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.279.
  • Galke, L., Cuber, I., Meyer, C., Nölscher, H. F., Sonderecker, A., & Scherp, A. (2022). General cross-architecture distillation of pretrained language models into matrix embedding. In Proceedings of the IEEE Joint Conference on Neural Networks (IJCNN 2022), part of the IEEE World Congress on Computational Intelligence (WCCI 2022). doi:10.1109/IJCNN55064.2022.9892144.

    Abstract

    Large pretrained language models (PreLMs) are rev-olutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive to DistilBERT on question similarity and recognizing textual entailment, but uses only half of the number of parameters and is three times faster in terms of inference speed. We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLM into competitive models and motivates further research in this direction.
  • Gamba, M., De Gregorio, C., Valente, D., Raimondi, T., Torti, V., Miaretsoa, L., Carugati, F., Friard, O., Giacoma, C., & Ravignani, A. (2022). Primate rhythmic categories analyzed on an individual basis. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 229-236). Nijmegen: Joint Conference on Language Evolution (JCoLE).

    Abstract

    Rhythm is a fundamental feature characterizing communicative displays, and recent studies showed that primate songs encompass categorical rhythms falling on small integer ratios observed in humans. We individually assessed the presence and sexual dimorphism of rhythmic categories, analyzing songs emitted by 39 wild indris. Considering the intervals between the units given during each song, we extracted 13556 interval ratios and found three peaks (at around 0.33, 0.47, and 0.70). Two peaks indicated rhythmic categories corresponding to small integer ratios (1:1, 2:1). All individuals showed a peak at 0.70, and
    most showed those at 0.47 and 0.33. In addition, we found sex differences in the peak at 0.47 only, with males showing lower values than females. This work investigates the presence of individual rhythmic categories in a non-human species; further research may highlight the significance of rhythmicity and untie selective pressures that guided its evolution across species, including humans.
  • Gamba, M., Raimondi, T., De Gregorio, C., Valente, D., Carugati, F., Cristiano, W., Ferrario, V., Torti, V., Favaro, L., Friard, O., Giacoma, C., & Ravignani, A. (2023). Rhythmic categories across primate vocal displays. In A. Astolfi, F. Asdrubali, & L. Shtrepi (Eds.), Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 3971-3974). Torino: European Acoustics Association.

    Abstract

    The last few years have revealed that several species may share the building blocks of Musicality with humans. The recognition of these building blocks (e.g., rhythm, frequency variation) was a necessary impetus for a new round of studies investigating rhythmic variation in animal vocal displays. Singing primates are a small group of primate species that produce modulated songs ranging from tens to thousands of vocal units. Previous studies showed that the indri, the only singing lemur, is currently the only known species that perform duet and choruses showing multiple rhythmic categories, as seen in human music. Rhythmic categories occur when temporal intervals between note onsets are not uniformly distributed, and rhythms with a small integer ratio between these intervals are typical of human music. Besides indris, white-handed gibbons and three crested gibbon species showed a prominent rhythmic category corresponding to a single small integer ratio, isochrony. This study reviews previous evidence on the co-occurrence of rhythmic categories in primates and focuses on the prospects for a comparative, multimodal study of rhythmicity in this clade.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). Automatic sign language identification. In Proceeding of the 20th IEEE International Conference on Image Processing (ICIP) (pp. 2626-2630).

    Abstract

    We propose a Random-Forest based sign language identification system. The system uses low-level visual features and is based on the hypothesis that sign languages have varying distributions of phonemes (hand-shapes, locations and movements). We evaluated the system on two sign languages -- British SL and Greek SL, both taken from a publicly available corpus, called Dicta Sign Corpus. Achieved average F1 scores are about 95% - indicating that sign languages can be identified with high accuracy using only low-level visual features.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). Automatic signer diarization - the mover is the signer approach. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on (pp. 283-287). doi:10.1109/CVPRW.2013.49.

    Abstract

    We present a vision-based method for signer diarization -- the task of automatically determining "who signed when?" in a video. This task has similar motivations and applications as speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more movements than their interlocutors. Experiments on four videos (a total of 1.4 hours and each consisting of two signers) show the applicability of the method. The best diarization error rate (DER) obtained is 0.16.
  • Gebre, B. G., Zampieri, M., Wittenburg, P., & Heskes, T. (2013). Improving Native Language Identification with TF-IDF weighting. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 216-223).

    Abstract

    This paper presents a Native Language Identification (NLI) system based on TF-IDF weighting schemes and using linear classifiers - support vector machines, logistic regressions and perceptrons. The system was one of the participants of the 2013 NLI Shared Task in the closed-training track, achieving 0.814 overall accuracy for a set of 11 native languages. This accuracy was only 2.2 percentage points lower than the winner's performance. Furthermore, with subsequent evaluations using 10-fold cross-validation (as given by the organizers) on the combined training and development data, the best average accuracy obtained is 0.8455 and the features that contributed to this accuracy are the TF-IDF of the combined unigrams and bigrams of words.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). The gesturer is the speaker. In Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013) (pp. 3751-3755).

    Abstract

    We present and solve the speaker diarization problem in a novel way. We hypothesize that the gesturer is the speaker and that identifying the gesturer can be taken as identifying the active speaker. We provide evidence in support of the hypothesis from gesture literature and audio-visual synchrony studies. We also present a vision-only diarization algorithm that relies on gestures (i.e. upper body movements). Experiments carried out on 8.9 hours of a publicly available dataset (the AMI meeting data) show that diarization error rates as low as 15% can be achieved.
  • Gijssels, T., Bottini, R., Rueschemeyer, S.-A., & Casasanto, D. (2013). Space and time in the parietal cortex: fMRI Evidence for a meural asymmetry. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 495-500). Austin,TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0113/index.html.

    Abstract

    How are space and time related in the brain? This study contrasts two proposals that make different predictions about the interaction between spatial and temporal magnitudes. Whereas ATOM implies that space and time are symmetrically related, Metaphor Theory claims they are asymmetrically related. Here we investigated whether space and time activate the same neural structures in the inferior parietal cortex (IPC) and whether the activation is symmetric or asymmetric across domains. We measured participants’ neural activity while they made temporal and spatial judgments on the same visual stimuli. The behavioral results replicated earlier observations of a space-time asymmetry: Temporal judgments were more strongly influenced by irrelevant spatial information than vice versa. The BOLD fMRI data indicated that space and time activated overlapping clusters in the IPC and that, consistent with Metaphor Theory, this activation was asymmetric: The shared region of IPC was activated more strongly during temporal judgments than during spatial judgments. We consider three possible interpretations of this neural asymmetry, based on 3 possible functions of IPC.
  • Goudbeek, M., Smits, R., Cutler, A., & Swingley, D. (2017). Auditory and phonetic category formation. In H. Cohen, & C. Lefebvre (Eds.), Handbook of categorization in cognitive science (2nd revised ed.) (pp. 687-708). Amsterdam: Elsevier.
  • Green, K., Osei-Cobbina, C., Perlman, M., & Kita, S. (2023). Infants can create different types of iconic gestures, with and without parental scaffolding. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527188.

    Abstract

    Despite the early emergence of pointing, children are generally not documented to produce iconic gestures until later in development. Although research has described this developmental trajectory and the types of iconic gestures that emerge first, there has been limited focus on iconic gestures within interactional contexts. This study identified the first 10 iconic gestures produced by five monolingual English-speaking children in a naturalistic longitudinal video corpus and analysed the interactional contexts. We found children produced their first iconic gesture between 12 and 20 months and that gestural types varied. Although 34% of gestures could have been imitated or derived from adult or child actions in the preceding context, the majority were produced independently of any observed model. In these cases, adults often led the interaction in a direction where iconic gesture was an appropriate response. Overall, we find infants can represent a referent symbolically and possess a greater capacity for innovation than previously assumed. In order to develop our understanding of how children learn to produce iconic gestures, it is important to consider the immediate interactional context. Conducting naturalistic corpus analyses could be a more ecologically valid approach to understanding how children learn to produce iconic gestures in real life contexts.
  • Gussenhoven, C., & Zhou, W. (2013). Revisiting pitch slope and height effects on perceived duration. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 1365-1369).

    Abstract

    The shape of pitch contours has been shown to have an effect on the perceived duration of vowels. For instance, vowels with high level pitch and vowels with falling contours sound longer than vowels with low level pitch. Depending on whether the
    comparison is between level pitches or between level and dynamic contours, these findings have been interpreted in two ways. For inter-level comparisons, where the duration results are the reverse of production results, a hypercorrection strategy in production has been proposed [1]. By contrast, for comparisons between level pitches and dynamic contours, the
    longer production data for dynamic contours have been held responsible. We report an experiment with Dutch and Chinese listeners which aimed to show that production data and perception data are each other’s opposites for high, low, falling and rising contours. We explain the results, which are consistent with earlier findings, in terms of the compensatory listening strategy of [2], arguing that the perception effects are due to a perceptual compensation of articulatory strategies and
    constraints, rather than that differences in production compensate for psycho-acoustic perception effects.
  • Hagoort, P. (2022). Reasoning and the brain. In M. Stokhof, & K. Stenning (Eds.), Rules, regularities, randomness. Festschrift for Michiel van Lambalgen (pp. 83-85). Amsterdam: Institute for Logic, Language and Computation.
  • Hagoort, P. (2017). It is the facts, stupid. In J. Brockman, F. Van der Wa, & H. Corver (Eds.), Wetenschappelijke parels: het belangrijkste wetenschappelijke nieuws volgens 193 'briljante geesten'. Amsterdam: Maven Press.
  • Hagoort, P., & Brown, C. M. (1994). Brain responses to lexical ambiguity resolution and parsing. In C. Clifton Jr, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing (pp. 45-81). Hilsdale NY: Lawrence Erlbaum Associates.
  • Hagoort, P. (1998). The shadows of lexical meaning in patients with semantic impairments. In B. Stemmer, & H. Whitaker (Eds.), Handbook of neurolinguistics (pp. 235-248). New York: Academic Press.
  • Hagoort, P., & Poeppel, D. (2013). The infrastructure of the language-ready brain. In M. A. Arbib (Ed.), Language, music, and the brain: A mysterious relationship (pp. 233-255). Cambridge, MA: MIT Press.

    Abstract

    This chapter sketches in very general terms the cognitive architecture of both language comprehension and production, as well as the neurobiological infrastructure that makes the human brain ready for language. Focus is on spoken language, since that compares most directly to processing music. It is worth bearing in mind that humans can also interface with language as a cognitive system using sign and text (visual) as well as Braille (tactile); that is to say, the system can connect with input/output processes in any sensory modality. Language processing consists of a complex and nested set of subroutines to get from sound to meaning (in comprehension) or meaning to sound (in production), with remarkable speed and accuracy. The fi rst section outlines a selection of the major constituent operations, from fractionating the input into manageable units to combining and unifying information in the construction of meaning. The next section addresses the neurobiological infrastructure hypothesized to form the basis for language processing. Principal insights are summarized by building on the notion of “brain networks” for speech–sound processing, syntactic processing, and the construction of meaning, bearing in mind that such a neat three-way subdivision overlooks important overlap and shared mechanisms in the neural architecture subserving language processing. Finally, in keeping with the spirit of the volume, some possible relations are highlighted between language and music that arise from the infrastructure developed here. Our characterization of language and its neurobiological foundations is necessarily selective and brief. Our aim is to identify for the reader critical questions that require an answer to have a plausible cognitive neuroscience of language processing.
  • Hagoort, P. (2017). The neural basis for primary and acquired language skills. In E. Segers, & P. Van den Broek (Eds.), Developmental Perspectives in Written Language and Literacy: In honor of Ludo Verhoeven (pp. 17-28). Amsterdam: Benjamins. doi:10.1075/z.206.02hag.

    Abstract

    Reading is a cultural invention that needs to recruit cortical infrastructure that was not designed for it (cultural recycling of cortical maps). In the case of reading both visual cortex and networks for speech processing are recruited. Here I discuss current views on the neurobiological underpinnings of spoken language that deviate in a number of ways from the classical Wernicke-Lichtheim-Geschwind model. More areas than Broca’s and Wernicke’s region are involved in language. Moreover, a division along the axis of language production and language comprehension does not seem to be warranted. Instead, for central aspects of language processing neural infrastructure is shared between production and comprehension. Arguments are presented in favor of a dynamic network view, in which the functionality of a region is co-determined by the network of regions in which it is embedded at particular moments in time. Finally, core regions of language processing need to interact with other networks (e.g. the attentional networks and the ToM network) to establish full functionality of language and communication. The consequences of this architecture for reading are discussed.
  • Hammarström, H., & O'Connor, L. (2013). Dependency sensitive typological distance. In L. Borin, & A. Saxena (Eds.), Approaches to measuring linguistic differences (pp. 337-360). Berlin: Mouton de Gruyter.
  • Hammarström, H. (2013). Noun class parallels in Kordofanian and Niger-Congo: Evidence of genealogical inheritance? In T. C. Schadeberg, & R. M. Blench (Eds.), Nuba Mountain Language Studies (pp. 549-570). Köln: Köppe.
  • Haun, D. B. M., & Over, H. (2013). Like me: A homophily-based account of human culture. In P. J. Richerson, & M. H. Christiansen (Eds.), Cultural Evolution: Society, technology, language, and religion (pp. 75-85). Cambridge, MA: MIT Press.
  • Hayano, K. (2013). Question design in conversation. In J. Sidnell, & T. Stivers (Eds.), The handbook of conversation analysis (pp. 395-414). Malden, MA: Wiley-Blackwell. doi:10.1002/9781118325001.ch19.

    Abstract

    This chapter contains sections titled: Introduction Questions Questioning and the Epistemic Gradient Presuppositions, Agenda Setting and Preferences Social Actions Implemented by Questions Questions as Building Blocks of Institutional Activities Future Directions
  • Hintz, F., Voeten, C. C., McQueen, J. M., & Meyer, A. S. (2022). Quantifying the relationships between linguistic experience, general cognitive skills and linguistic processing skills. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 2491-2496). Toronto, Canada: Cognitive Science Society.

    Abstract

    Humans differ greatly in their ability to use language. Contemporary psycholinguistic theories assume that individual differences in language skills arise from variability in linguistic experience and in general cognitive skills. While much previous research has tested the involvement of select verbal and non-verbal variables in select domains of linguistic processing, comprehensive characterizations of the relationships among the skills underlying language use are rare. We contribute to such a research program by re-analyzing a publicly available set of data from 112 young adults tested on 35 behavioral tests. The tests assessed nine key constructs reflecting linguistic processing skills, linguistic experience and general cognitive skills. Correlation and hierarchical clustering analyses of the test scores showed that most of the tests assumed to measure the same construct correlated moderately to strongly and largely clustered together. Furthermore, the results suggest important roles of processing speed in comprehension, and of linguistic experience in production.
  • Hoeksema, N., Hagoort, P., & Vernes, S. C. (2022). Piecing together the building blocks of the vocal learning bat brain. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 294-296). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Hofmeister, P., & Norcliffe, E. (2013). Does resumption facilitate sentence comprehension? In P. Hofmeister, & E. Norcliffe (Eds.), The core and the periphery: Data-driven perspectives on syntax inspired by Ivan A. Sag (pp. 225-246). Stanford, CA: CSLI Publications.
  • Holler, J., & Bavelas, J. (2017). Multi-modal communication of common ground: A review of social functions. In R. B. Church, M. W. Alibali, & S. D. Kelly (Eds.), Why gesture? How the hands function in speaking, thinking and communicating (pp. 213-240). Amsterdam: Benjamins.

    Abstract

    Until recently, the literature on common ground depicted its influence as a purely verbal phenomenon. We review current research on how common ground influences gesture. With informative exceptions, most experiments found that speakers used fewer gestures as well as fewer words in common ground contexts; i.e., the gesture/word ratio did not change. Common ground often led to more poorly articulated gestures, which parallels its effect on words. These findings support the principle of recipient design as well as more specific social functions such as grounding, the given-new contract, and Grice’s maxims. However, conceptual pacts or linking old with new information may maintain the original form. All together, these findings implicate gesture-speech ensembles rather than isolated effects on gestures alone.
  • Holler, J., Schubotz, L., Kelly, S., Schuetze, M., Hagoort, P., & Ozyurek, A. (2013). Here's not looking at you, kid! Unaddressed recipients benefit from co-speech gestures when speech processing suffers. In M. Knauff, M. Pauen, I. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2560-2565). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0463/index.html.

    Abstract

    In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from these different modalities, and how perceived communicative intentions, often signaled through visual signals, such as eye
    gaze, may influence this processing. We address this question by simulating a triadic communication context in which a
    speaker alternated her gaze between two different recipients. Participants thus viewed speech-only or speech+gesture
    object-related utterances when being addressed (direct gaze) or unaddressed (averted gaze). Two object images followed
    each message and participants’ task was to choose the object that matched the message. Unaddressed recipients responded significantly slower than addressees for speech-only
    utterances. However, perceiving the same speech accompanied by gestures sped them up to a level identical to
    that of addressees. That is, when speech processing suffers due to not being addressed, gesture processing remains intact and enhances the comprehension of a speaker’s message
  • Huettig, F. (2013). Young children’s use of color information during language-vision mapping. In B. R. Kar (Ed.), Cognition and brain development: Converging evidence from various methodologies (pp. 368-391). Washington, DC: American Psychological Association Press.
  • Irvine, L., Roberts, S. G., & Kirby, S. (2013). A robustness approach to theory building: A case study of language evolution. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2614-2619). Retrieved from http://mindmodeling.org/cogsci2013/papers/0472/index.html.

    Abstract

    Models of cognitive processes often include simplifications, idealisations, and fictionalisations, so how should we learn about cognitive processes from such models? Particularly in cognitive science, when many features of the target system are unknown, it is not always clear which simplifications, idealisations, and so on, are appropriate for a research question, and which are highly misleading. Here we use a case-study from studies of language evolution, and ideas from philosophy of science, to illustrate a robustness approach to learning from models. Robust properties are those that arise across a range of models, simulations and experiments, and can be used to identify key causal structures in the models, and the phenomenon, under investigation. For example, in studies of language evolution, the emergence of compositional structure is a robust property across models, simulations and experiments of cultural transmission, but only under pressures for learnability and expressivity. This arguably illustrates the principles underlying real cases of language evolution. We provide an outline of the robustness approach, including its limitations, and suggest that this methodology can be productively used throughout cognitive science. Perhaps of most importance, it suggests that different modelling frameworks should be used as tools to identify the abstract properties of a system, rather than being definitive expressions of theories.
  • Isbilen, E. S., McCauley, S. M., Kidd, E., & Christiansen, M. H. (2017). Testing statistical learning implicitly: A novel chunk-based measure of statistical learning. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 564-569). Austin, TX: Cognitive Science Society.

    Abstract

    Attempts to connect individual differences in statistical learning with broader aspects of cognition have received considerable attention, but have yielded mixed results. A possible explanation is that statistical learning is typically tested using the two-alternative forced choice (2AFC) task. As a meta-cognitive task relying on explicit familiarity judgments, 2AFC may not accurately capture implicitly formed statistical computations. In this paper, we adapt the classic serial-recall memory paradigm to implicitly test statistical learning in a statistically-induced chunking recall (SICR) task. We hypothesized that artificial language exposure would lead subjects to chunk recurring statistical patterns, facilitating recall of words from the input. Experiment 1 demonstrates that SICR offers more fine-grained insights into individual differences in statistical learning than 2AFC. Experiment 2 shows that SICR has higher test-retest reliability than that reported for 2AFC. Thus, SICR offers a more sensitive measure of individual differences, suggesting that basic chunking abilities may explain statistical learning.
  • Jadoul, Y., Düngen, D., & Ravignani, A. (2023). Live-tracking acoustic parameters in animal behavioural experiments: Interactive bioacoustics with parselmouth. In A. Astolfi, F. Asdrubali, & L. Shtrepi (Eds.), Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 4675-4678). Torino: European Acoustics Association.

    Abstract

    Most bioacoustics software is used to analyse the already collected acoustics data in batch, i.e., after the data-collecting phase of a scientific study. However, experiments based on animal training require immediate and precise reactions from the experimenter, and thus do not easily dovetail with a typical bioacoustics workflow. Bridging this methodological gap, we have developed a custom application to live-monitor the vocal development of harbour seals in a behavioural experiment. In each trial, the application records and automatically detects an animal's call, and immediately measures duration and acoustic measures such as intensity, fundamental frequency, or formant frequencies. It then displays a spectrogram of the recording and the acoustic measurements, allowing the experimenter to instantly evaluate whether or not to reinforce the animal's vocalisation. From a technical perspective, the rapid and easy development of this custom software was made possible by combining multiple open-source software projects. Here, we integrated the acoustic analyses from Parselmouth, a Python library for Praat, together with PyAudio and Matplotlib's recording and plotting functionality, into a custom graphical user interface created with PyQt. This flexible recombination of different open-source Python libraries allows the whole program to be written in a mere couple of hundred lines of code
  • De Jong, N. H., & Bosker, H. R. (2013). Choosing a threshold for silent pauses to measure second language fluency. In R. Eklund (Ed.), Proceedings of the 6th Workshop on Disfluency in Spontaneous Speech (DiSS) (pp. 17-20).

    Abstract

    Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.
  • Jordan, F. M., van Schaik, C. P., Francois, P., Gintis, H., Haun, D. B. M., Hruschka, D. H., Janssen, M. A., Kitts, J. A., Lehmann, L., Mathew, S., Richerson, P. J., Turchin, P., & Wiessner, P. (2013). Cultural evolution of the structure of human groups. In P. J. Richerson, & M. H. Christiansen (Eds.), Cultural Evolution: Society, technology, language, and religion (pp. 87-116). Cambridge, MA: MIT Press.
  • Jordan, F. (2013). Comparative phylogenetic methods and the study of pattern and process in kinship. In P. McConvell, I. Keen, & R. Hendery (Eds.), Kinship systems: Change and reconstruction (pp. 43-58). Salt Lake City, UT: University of Utah Press.

    Abstract

    Anthropology began by comparing aspects of kinship across cultures, while linguists interested in semantic domains such as kinship necessarily compare across languages. In this chapter I show how phylogenetic comparative methods from evolutionary biology can be used to study evolutionary processes relating to kinship and kinship terminologies across language and culture.
  • Jordanoska, I. (2023). Focus marking and size in some Mande and Atlantic languages. In N. Sumbatova, I. Kapitonov, M. Khachaturyan, S. Oskolskaya, & V. Verhees (Eds.), Songs and Trees: Papers in Memory of Sasha Vydrina (pp. 311-343). St. Petersburg: Institute for Linguistic Studies and Russian Academy of Sciences.

    Abstract

    This paper compares the focus marking systems and the focus size that can be expressed by the different focus markings in four Mande and three Atlantic languages and varieties, namely: Bambara, Dyula, Kakabe, Soninke (Mande), Wolof, Jóola Foñy and Jóola Karon (Atlantic). All of these languages are known to mark focus morphosyntactically, rather than prosodically, as the more well-studied Germanic languages do. However, the Mande languages under discussion use only morphology, in the form of a particle that follows the focus, while the Atlantic ones use a more complex morphosyntactic system in which focus is marked by morphology in the verbal complex and movement of the focused term. It is shown that while there are some syntactic restrictions to how many different focus sizes can be marked in a distinct way, there is also a certain degree of arbitrariness as to which focus sizes are marked in the same way as each other.
  • Jordens, P. (1998). Defaultformen des Präteritums. Zum Erwerb der Vergangenheitsmorphologie im Niederlänidischen. In H. Wegener (Ed.), Eine zweite Sprache lernen (pp. 61-88). Tübingen, Germany: Verlag Gunter Narr.
  • Jordens, P. (2013). Dummies and auxiliaries in the acquisition of L1 and L2 Dutch. In E. Blom, I. Van de Craats, & J. Verhagen (Eds.), Dummy Auxiliaries in First and Second Language Acquisition (pp. 341-368). Berlin: Mouton de Gruyter.
  • Kallmeyer, L., Osswald, R., & Van Valin Jr., R. D. (2013). Tree wrapping for Role and Reference Grammar. In G. Morrill, & M.-J. Nederhof (Eds.), Formal grammar: 17th and 18th International Conferences, FG 2012/2013, Opole, Poland, August 2012: revised Selected Papers, Düsseldorf, Germany, August 2013: proceedings (pp. 175-190). Heidelberg: Springer.
  • Kan, U., Gökgöz, K., Sumer, B., Tamyürek, E., & Özyürek, A. (2022). Emergence of negation in a Turkish homesign system: Insights from the family context. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 387-389). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Kanakanti, M., Singh, S., & Shrivastava, M. (2023). MultiFacet: A multi-tasking framework for speech-to-sign language generation. In E. André, M. Chetouani, D. Vaufreydaz, G. Lucas, T. Schultz, L.-P. Morency, & A. Vinciarelli (Eds.), ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 205-213). New York: ACM. doi:10.1145/3610661.3616550.

    Abstract

    Sign language is a rich form of communication, uniquely conveying meaning through a combination of gestures, facial expressions, and body movements. Existing research in sign language generation has predominantly focused on text-to-sign pose generation, while speech-to-sign pose generation remains relatively underexplored. Speech-to-sign language generation models can facilitate effective communication between the deaf and hearing communities. In this paper, we propose an architecture that utilises prosodic information from speech audio and semantic context from text to generate sign pose sequences. In our approach, we adopt a multi-tasking strategy that involves an additional task of predicting Facial Action Units (FAUs). FAUs capture the intricate facial muscle movements that play a crucial role in conveying specific facial expressions during sign language generation. We train our models on an existing Indian Sign language dataset that contains sign language videos with audio and text translations. To evaluate our models, we report Dynamic Time Warping (DTW) and Probability of Correct Keypoints (PCK) scores. We find that combining prosody and text as input, along with incorporating facial action unit prediction as an additional task, outperforms previous models in both DTW and PCK scores. We also discuss the challenges and limitations of speech-to-sign pose generation models to encourage future research in this domain. We release our models, results and code to foster reproducibility and encourage future research1.
  • Kapatsinski, V., & Harmon, Z. (2017). A Hebbian account of entrenchment and (over)-extension in language learning. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Meeting of the Cognitive Science Society (CogSci 2017) (pp. 2366-2371). Austin, TX: Cognitive Science Society.

    Abstract

    In production, frequently used words are preferentially extended to new, though related meanings. In comprehension, frequent exposure to a word instead makes the learner confident that all of the word’s legitimate uses have been experienced, resulting in an entrenched form-meaning mapping between the word and its experienced meaning(s). This results in a perception-production dissociation, where the forms speakers are most likely to map onto a novel meaning are precisely the forms that they believe can never be used that way. At first glance, this result challenges the idea of bidirectional form-meaning mappings, assumed by all current approaches to linguistic theory. In this paper, we show that bidirectional form-meaning mappings are not in fact challenged by this production-perception dissociation. We show that the production-perception dissociation is expected even if learners of the lexicon acquire simple symmetrical form-meaning associations through simple Hebbian learning.
  • Karadöller, D. Z., Sumer, B., & Ozyurek, A. (2017). Effects of delayed language exposure on spatial language acquisition by signing children and adults. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 2372-2376). Austin, TX: Cognitive Science Society.

    Abstract

    Deaf children born to hearing parents are exposed to language input quite late, which has long-lasting effects on language production. Previous studies with deaf individuals mostly focused on linguistic expressions of motion events, which have several event components. We do not know if similar effects emerge in simple events such as descriptions of spatial configurations of objects. Moreover, previous data mainly come from late adult signers. There is not much known about language development of late signing children soon after learning sign language. We compared simple event descriptions of late signers of Turkish Sign Language (adults, children) to age-matched native signers. Our results indicate that while late signers in both age groups are native-like in frequency of expressing a relational encoding, they lag behind native signers in using morphologically complex linguistic forms compared to other simple forms. Late signing children perform similar to adults and thus showed no development over time.
  • Kember, H., Grohe, A.-.-K., Zahner, K., Braun, B., Weber, A., & Cutler, A. (2017). Similar prosodic structure perceived differently in German and English. In Proceedings of Interspeech 2017 (pp. 1388-1392). doi:10.21437/Interspeech.2017-544.

    Abstract

    English and German have similar prosody, but their speakers realize some pitch falls (not rises) in subtly different ways. We here test for asymmetry in perception. An ABX discrimination task requiring F0 slope or duration judgements on isolated vowels revealed no cross-language difference in duration or F0 fall discrimination, but discrimination of rises (realized similarly in each language) was less accurate for English than for German listeners. This unexpected finding may reflect greater sensitivity to rising patterns by German listeners, or reduced sensitivity by English listeners as a result of extensive exposure to phrase-final rises (“uptalk”) in their language
  • Kempen, G., & Harbusch, K. (1998). A 'tree adjoining' grammar without adjoining: The case of scrambling in German. In Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4).
  • Kempen, G., & Harbusch, K. (2017). Frequential test of (S)OV as unmarked word order in Dutch and German clauses: A serendipitous corpus-linguistic experiment. In H. Reckman, L. L. S. Cheng, M. Hijzelendoorn, & R. Sybesma (Eds.), Crossroads semantics: Computation, experiment and grammar (pp. 107-123). Amsterdam: Benjamins.

    Abstract

    In a paper entitled “Against markedness (and what to replace it with)”, Haspelmath argues “that the term ‘markedness’ is superfluous”, and that frequency asymmetries often explain structural (un)markedness asymmetries (Haspelmath 2006). We investigate whether this argument applies to Object and Verb orders in main (VO, marked) and subordinate (OV, unmarked) clauses of spoken and written German and Dutch, using English (without VO/OV alternation) as control. Frequency counts from six treebanks (three languages, two output modalities) do not support Haspelmath’s proposal. However, they reveal an unexpected phenomenon, most prominently in spoken Dutch and German: a small set of extremely high-frequent finite verbs with unspecific meanings populates main clauses much more densely than subordinate clauses. We suggest these verbs accelerate the start-up of grammatical encoding, thus facilitating sentence-initial output fluency
  • Kempen, G. (1994). Innovative language checking software for Dutch. In J. Van Gent, & E. Peeters (Eds.), Proceedings of the 2e Dag van het Document (pp. 99-100). Delft: TNO Technisch Physische Dienst.
  • Kempen, G. (1998). Sentence parsing. In A. D. Friederici (Ed.), Language comprehension: A biological perspective (pp. 213-228). Berlin: Springer.
  • Kempen, G. (1994). The unification space: A hybrid model of human syntactic processing [Abstract]. In Cuny 1994 - The 7th Annual CUNY Conference on Human Sentence Processing. March 17-19, 1994. CUNY Graduate Center, New York.
  • Kempen, G., & Dijkstra, A. (1994). Toward an integrated system for grammar, writing and spelling instruction. In L. Appelo, & F. De Jong (Eds.), Computer-Assisted Language Learning: Proceedings of the Seventh Twente Workshop on Language Technology (pp. 41-46). Enschede: University of Twente.
  • Khetarpal, N., Neveu, G., Majid, A., Michael, L., & Regier, T. (2013). Spatial terms across languages support near-optimal communication: Evidence from Peruvian Amazonia, and computational analyses. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 764-769). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0158/index.html.

    Abstract

    Why do languages have the categories they do? It has been argued that spatial terms in the world’s languages reflect categories that support highly informative communication, and that this accounts for the spatial categories found across languages. However, this proposal has been tested against only nine languages, and in a limited fashion. Here, we consider two new languages: Maijɨki, an under-documented language of Peruvian Amazonia, and English. We analyze spatial data from these two new languages and the original nine, using thorough and theoretically targeted computational tests. The results support the hypothesis that spatial terms across dissimilar languages enable near-optimally informative communication, over an influential competing hypothesis
  • Kidd, E., Bavin, S. L., & Brandt, S. (2013). The role of the lexicon in the development of the language processor. In D. Bittner, & N. Ruhlig (Eds.), Lexical bootstrapping: The role of lexis and semantics in child language development (pp. 217-244). Berlin: De Gruyter Mouton.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and cospeech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaningbearing phase.
  • Klamer, M., Trilsbeek, P., Hoogervorst, T., & Haskett, C. (2017). Creating a Language Archive of Insular South East Asia and West New Guinea. In J. Odijk, & A. Van Hessen (Eds.), CLARIN in the Low Countries (pp. 113-121). London: Ubiquity Press. doi:10.5334/bbi.10.

    Abstract

    The geographical region of Insular South East Asia and New Guinea is well-known as an
    area of mega-biodiversity. Less well-known is the extreme linguistic diversity in this area:
    over a quarter of the world’s 6,000 languages are spoken here. As small minority languages,
    most of them will cease to be spoken in the coming few generations. The project described
    here ensures the preservation of unique records of languages and the cultures encapsulated
    by them in the region. The language resources were gathered by twenty linguists at,
    or in collaboration with, Dutch universities over the last 40 years, and were compiled and
    archived in collaboration with The Language Archive (TLA) at the Max Planck Institute in
    Nijmegen. The resulting archive constitutes a collection ofmultimediamaterials and written
    documents from 48 languages in Insular South East Asia and West New Guinea. At TLA,
    the data was archived according to state-of-the-art standards (TLA holds the Data Seal of
    Approval): the component metadata infrastructure CMDI was used; all metadata categories
    as well as relevant units of annotation were linked to the ISO data category registry ISOcat.
    This guaranteed proper integration of the language resources into the CLARIN framework.
    Through the archive, future speaker communities and researchers will be able to extensively
    search thematerials for answers to their own questions, even if they do not themselves know the language, and even if the language dies.
  • Klein, W. (2013). Basic variety. In P. Robinson (Ed.), The Routledge encyclopedia of second language acquisition (pp. 64-65). New York: Routledge.
  • Klein, W. (1998). Ein Blick zurück auf die Varietätengrammatik. In U. Ammon, K. Mattheier, & P. Nelde (Eds.), Sociolinguistica: Internationales Jahrbuch für europäische Soziolinguistik (pp. 22-38). Tübingen: Niemeyer.
  • Klein, W. (1998). Assertion and finiteness. In N. Dittmar, & Z. Penner (Eds.), Issues in the theory of language acquisition: Essays in honor of Jürgen Weissenborn (pp. 225-245). Bern: Peter Lang.
  • Klein, W. (1994). Für eine rein zeitliche Deutung von Tempus und Aspekt. In R. Baum (Ed.), Lingua et Traditio: Festschrift für Hans Helmut Christmann zum 65. Geburtstag (pp. 409-422). Tübingen: Narr.
  • Klein, W. (1994). Keine Känguruhs zur Linken: Über die Variabilität von Raumvorstellungen und ihren Ausdruck in der Sprache. In H.-J. Kornadt, J. Grabowski, & R. Mangold-Allwinn (Eds.), Sprache und Kognition (pp. 163-182). Heidelberg, Berlin, Oxford: Spektrum.
  • Klein, W. (2013). L'effettivo declino e la crescita potenziale della lessicografia tedesca. In N. Maraschio, D. De Martiono, & G. Stanchina (Eds.), L'italiano dei vocabolari: Atti di La piazza delle lingue 2012 (pp. 11-20). Firenze: Accademia della Crusca.
  • Klein, W. (1994). Learning how to express temporality in a second language. In A. G. Ramat, & M. Vedovelli (Eds.), Società di linguistica Italiana, SLI 34: Italiano - lingua seconda/lingua straniera: Atti del XXVI Congresso (pp. 227-248). Roma: Bulzoni.
  • Klein, W. (2013). European Science Foundation (ESF) Project. In P. Robinson (Ed.), The Routledge encyclopedia of second language acquisition (pp. 220-221). New York: Routledge.
  • Klein, W., & Vater, H. (1998). The perfect in English and German. In L. Kulikov, & H. Vater (Eds.), Typology of verbal categories: Papers presented to Vladimir Nedjalkov on the occasion of his 70th birthday (pp. 215-235). Tübingen: Niemeyer.
  • Klein, W. (2013). Von Reichtum und Armut des deutschen Wortschatzes. In Deutsche Akademie für Sprache und Dichtung, & Union der deutschen Akademien der Wissenschaften (Eds.), Reichtum und Armut der deutschen Sprache (pp. 15-55). Boston: de Gruyter.
  • Kohatsu, T., Akamine, S., Sato, M., & Niikuni, K. (2022). Individual differences in empathy affect perspective adoption in language comprehension. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 652-656). Tokyo: Japanese Cognitive Science Society.
  • Kristoffersen, J. H., Troelsgard, T., & Zwitserlood, I. (2013). Issues in sign language lexicography. In H. Jackson (Ed.), The Bloomsbury companion to lexicography (pp. 259-283). London: Bloomsbury.
  • Kuijpers, C. T., Coolen, R., Houston, D., & Cutler, A. (1998). Using the head-turning technique to explore cross-linguistic performance differences. In C. Rovee-Collier, L. Lipsitt, & H. Hayne (Eds.), Advances in infancy research: Vol. 12 (pp. 205-220). Stamford: Ablex.
  • Ladd, D. R., & Dediu, D. (2013). Genes and linguistic tone. In H. Pashler (Ed.), Encyclopedia of the mind (pp. 372-373). London: Sage Publications.

    Abstract

    It is usually assumed that the language spoken by a human community is independent of the community's genetic makeup, an assumption supported by an overwhelming amount of evidence. However, the possibility that language is influenced by its speakers' genes cannot be ruled out a priori, and a recently discovered correlation between the geographic distribution of tone languages and two human genes seems to point to a genetically influenced bias affecting language. This entry describes this specific correlation and highlights its major implications. Voice pitch has a variety of communicative functions. Some of these are probably universal, such as conveying information about the speaker's sex, age, and emotional state. In many languages, including the European languages, voice pitch also conveys certain sentence-level meanings such as signaling that an utterance is a question or an exclamation; these uses of pitch are known as intonation. Some languages, however, known as tone languages, nian ...
  • Laparle, S. (2023). Moving past the lexical affiliate with a frame-based analysis of gesture meaning. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527218.

    Abstract

    Interpreting the meaning of co-speech gesture often involves
    identifying a gesture’s ‘lexical affiliate’, the word or phrase to
    which it most closely relates (Schegloff 1984). Though there is
    work within gesture studies that resists this simplex mapping of
    meaning from speech to gesture (e.g. de Ruiter 2000; Kendon
    2014; Parrill 2008), including an evolving body of literature on
    recurrent gesture and gesture families (e.g. Fricke et al. 2014; Müller 2017), it is still the lexical affiliate model that is most ap-
    parent in formal linguistic models of multimodal meaning(e.g.
    Alahverdzhieva et al. 2017; Lascarides and Stone 2009; Puste-
    jovsky and Krishnaswamy 2021; Schlenker 2020). In this work,
    I argue that the lexical affiliate should be carefully reconsidered
    in the further development of such models.
    In place of the lexical affiliate, I suggest a further shift
    toward a frame-based, action schematic approach to gestural
    meaning in line with that proposed in, for example, Parrill and
    Sweetser (2004) and Müller (2017). To demonstrate the utility
    of this approach I present three types of compositional gesture
    sequences which I call spatial contrast, spatial embedding, and
    cooperative abstract deixis. All three rely on gestural context,
    rather than gesture-speech alignment, to convey interactive (i.e.
    pragmatic) meaning. The centrality of gestural context to ges-
    ture meaning in these examples demonstrates the necessity of
    developing a model of gestural meaning independent of its in-
    tegration with speech.
  • Lausberg, H., & Sloetjes, H. (2013). NEUROGES in combination with the annotation tool ELAN. In H. Lausberg (Ed.), Understanding body movement: A guide to empirical research on nonverbal behaviour with an introduction to the NEUROGES coding system (pp. 199-200). Frankfurt a/M: Lang.
  • Lee, R., Chambers, C. G., Huettig, F., & Ganea, P. A. (2017). Children’s semantic and world knowledge overrides fictional information during anticipatory linguistic processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Meeting of the Cognitive Science Society (CogSci 2017) (pp. 730-735). Austin, TX: Cognitive Science Society.

    Abstract

    Using real-time eye-movement measures, we asked how a fantastical discourse context competes with stored representations of semantic and world knowledge to influence children's and adults' moment-by-moment interpretation of a story. Seven-year- olds were less effective at bypassing stored semantic and world knowledge during real-time interpretation than adults. Nevertheless, an effect of discourse context on comprehension was still apparent.
  • Lenkiewicz, A., & Drude, S. (2013). Automatic annotation of linguistic 2D and Kinect recordings with the Media Query Language for Elan. In Proceedings of Digital Humanities 2013 (pp. 276-278).

    Abstract

    Research in body language with use of gesture recognition and speech analysis has gained much attention in the recent times, influencing disciplines related to image and speech processing.

    This study aims to design the Media Query Language (MQL) (Lenkiewicz, et al. 2012) combined with the Linguistic Media Query Interface (LMQI) for Elan (Wittenburg, et al. 2006). The system integrated with the new achievements in audio-video recognition will allow querying media files with predefined gesture phases (or motion primitives) and speech characteristics as well as combinations of both. For the purpose of this work the predefined motions and speech characteristics are called patterns for atomic elements and actions for a sequence of patterns. The main assumption is that a user-customized library of patterns and actions and automated media annotation with LMQI will reduce annotation time, hence decreasing costs of creation of annotated corpora. Increase of the number of annotated data should influence the speed and number of possible research in disciplines in which human multimodal interaction is a subject of interest and where annotated corpora are required.
  • Levelt, W. J. M. (1994). Psycholinguistics. In A. M. Colman (Ed.), Companion Encyclopedia of Psychology: Vol. 1 (pp. 319-337). London: Routledge.

    Abstract

    Linguistic skills are primarily tuned to the proper conduct of conversation. The innate ability to converse has provided species with a capacity to share moods, attitudes, and information of almost any kind, to assemble knowledge and skills, to plan coordinated action, to educate its offspring, in short, to create and transmit culture. In conversation the interlocutors are involved in negotiating meaning. Speaking is most complex cognitive-motor skill. It involves the conception of an intention, the selection of information whose expression will make that intention recognizable, the selection of appropriate words, the construction of a syntactic framework, the retrieval of the words’ sound forms, and the computation of an articulatory plan for each word and for the utterance as a whole. The question where communicative intentions come from is a psychodynamic question rather than a psycholinguistic one. Speaking is a form of social action, and it is in the context of action that intentions, goals, and subgoals develop.
  • Levelt, W. J. M. (1962). Motion breaking and the perception of causality. In A. Michotte (Ed.), Causalité, permanence et réalité phénoménales: Etudes de psychologie expérimentale (pp. 244-258). Louvain: Publications Universitaires.
  • Levelt, W. J. M., & Plomp, R. (1962). Musical consonance and critical bandwidth. In Proceedings of the 4th International Congress Acoustics (pp. 55-55).
  • Levelt, W. J. M. (1994). On the skill of speaking: How do we access words? In Proceedings ICSLP 94 (pp. 2253-2258). Yokohama: The Acoustical Society of Japan.
  • Levelt, W. J. M. (1994). Onder woorden brengen: Beschouwingen over het spreekproces. In Haarlemse voordrachten: voordrachten gehouden in de Hollandsche Maatschappij der Wetenschappen te Haarlem. Haarlem: Hollandsche maatschappij der wetenschappen.
  • Levelt, W. J. M. (1994). The skill of speaking. In P. Bertelson, P. Eelen, & G. d'Ydewalle (Eds.), International perspectives on psychological science: Vol. 1. Leading themes (pp. 89-103). Hove: Erlbaum.
  • Levelt, W. J. M. (1994). What can a theory of normal speaking contribute to AAC? In ISAAC '94 Conference Book and Proceedings. Hoensbroek: IRV.
  • Levinson, S. C. (2022). Cognitive anthropology. In J. Verschueren, & J.-O. Östman (Eds.), Handbook of Pragmatics. Manual. 2nd edition (pp. 164-170). Amsterdam: Benjamins. doi:10.1075/hop.m2.cog1.
  • Levinson, S. C. (2013). Action formation and ascription. In T. Stivers, & J. Sidnell (Eds.), The handbook of conversation analysis (pp. 103-130). Malden, MA: Wiley-Blackwell. doi:10.1002/9781118325001.ch6.

    Abstract

    Since the core matrix for language use is interaction, the main job of language
    is not to express propositions or abstract meanings, but to deliver actions.
    For in order to respond in interaction we have to ascribe to the prior turn
    a primary ‘action’ – variously thought of as an ‘illocution’, ‘speech act’, ‘move’,
    etc. – to which we then respond. The analysis of interaction also relies heavily
    on attributing actions to turns, so that, e.g., sequences can be characterized in
    terms of actions and responses. Yet the process of action ascription remains way
    understudied. We don’t know much about how it is done, when it is done, nor even
    what kind of inventory of possible actions might exist, or the degree to which they
    are culturally variable.
    The study of action ascription remains perhaps the primary unfulfilled task in
    the study of language use, and it needs to be tackled from conversationanalytic,
    psycholinguistic, cross-linguistic and anthropological perspectives.
    In this talk I try to take stock of what we know, and derive a set of goals for and
    constraints on an adequate theory. Such a theory is likely to employ, I will suggest,
    a top-down plus bottom-up account of action perception, and a multi-level notion
    of action which may resolve some of the puzzles that have repeatedly arisen.
  • Levinson, S. C. (2013). Cross-cultural universals and communication structures. In M. A. Arbib (Ed.), Language, music, and the brain: A mysterious relationship (pp. 67-80). Cambridge, MA: MIT Press.

    Abstract

    Given the diversity of languages, it is unlikely that the human capacity for language resides in rich universal syntactic machinery. More likely, it resides centrally in the capacity for vocal learning combined with a distinctive ethology for communicative interaction, which together (no doubt with other capacities) make diverse languages learnable. This chapter focuses on face-to-face communication, which is characterized by the mapping of sounds and multimodal signals onto speech acts and which can be deeply recursively embedded in interaction structure, suggesting an interactive origin for complex syntax. These actions are recognized through Gricean intention recognition, which is a kind of “ mirroring” or simulation distinct from the classic mirror neuron system. The multimodality of conversational interaction makes evident the involvement of body, hand, and mouth, where the burden on these can be shifted, as in the use of speech and gesture, or hands and face in sign languages. Such shifts having taken place during the course of human evolution. All this suggests a slightly different approach to the mystery of music, whose origins should also be sought in joint action, albeit with a shift from turn-taking to simultaneous expression, and with an affective quality that may tap ancient sources residual in primate vocalization. The deep connection of language to music can best be seen in the only universal form of music, namely song.
  • Levinson, S. C. (1994). Deixis. In R. E. Asher (Ed.), Encyclopedia of language and linguistics (pp. 853-857). Oxford: Pergamon Press.
  • Levinson, S. C. (1998). Deixis. In J. L. Mey (Ed.), Concise encyclopedia of pragmatics (pp. 200-204). Amsterdam: Elsevier.
  • Levinson, S. C. (1998). Minimization and conversational inference. In A. Kasher (Ed.), Pragmatics: Vol. 4 Presupposition, implicature and indirect speech acts (pp. 545-612). London: Routledge.
  • Levinson, S. C. (2017). Living with Manny's dangerous idea. In G. Raymond, G. H. Lerner, & J. Heritage (Eds.), Enabling human conduct: Studies of talk-in-interaction in honor of Emanuel A. Schegloff (pp. 327-349). Amsterdam: Benjamins.
  • Levinson, S. C. (2017). Speech acts. In Y. Huang (Ed.), Oxford handbook of pragmatics (pp. 199-216). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780199697960.013.22.

    Abstract

    The essential insight of speech act theory was that when we use language, we perform actions—in a more modern parlance, core language use in interaction is a form of joint action. Over the last thirty years, speech acts have been relatively neglected in linguistic pragmatics, although important work has been done especially in conversation analysis. Here we review the core issues—the identifying characteristics, the degree of universality, the problem of multiple functions, and the puzzle of speech act recognition. Special attention is drawn to the role of conversation structure, probabilistic linguistic cues, and plan or sequence inference in speech act recognition, and to the centrality of deep recursive structures in sequences of speech acts in conversation

    Files private

    Request files
  • Levinson, S. C., & Dediu, D. (2013). The interplay of genetic and cultural factors in ongoing language evolution. In P. J. Richerson, & M. H. Christiansen (Eds.), Cultural evolution: Society, technology, language, and religion. Strüngmann Forum Reports, vol. 12 (pp. 219-232). Cambridge, Mass: MIT Press.
  • Levinson, S. C., & Senft, G. (1994). Wie lösen Sprecher von Sprachen mit absoluten und relativen Systemen des räumlichen Verweisens nicht-sprachliche räumliche Aufgaben? In Jahrbuch der Max-Planck-Gesellschaft 1994 (pp. 295-299). München: Generalverwaltung der Max-Planck-Gesellschaft München.
  • Levinson, S. C. (2023). On cognitive artifacts. In R. Feldhay (Ed.), The evolution of knowledge: A scientific meeting in honor of Jürgen Renn (pp. 59-78). Berlin: Max Planck Institute for the History of Science.

    Abstract

    Wearing the hat of a cognitive anthropologist rather than an historian, I will try to amplify the ideas of Renn’s cited above. I argue that a particular subclass of material objects, namely “cognitive artifacts,” involves a close coupling of mind and artifact that acts like a brain prosthesis. Simple cognitive artifacts are external objects that act as aids to internal
    computation, and not all cultures have extended inventories of these. Cognitive artifacts in this sense (e.g., calculating or measuring devices) have clearly played a central role in the history of science. But the notion can be widened to take in less material externalizations of cognition, like writing and language itself. A critical question here is how and why this close coupling of internal computation and external device actually works, a rather neglected question to which I’ll suggest some answers.

    Additional information

    link to book
  • Levshina, N. (2022). Comparing Bayesian and frequentist models of language variation: The case of help + (to) Infinitive. In O. Schützler, & J. Schlüter (Eds.), Data and methods in corpus linguistics – Comparative Approaches (pp. 224-258). Cambridge: Cambridge University Press.
  • Levshina, N. (2023). Testing communicative and learning biases in a causal model of language evolution:A study of cues to Subject and Object. In M. Degano, T. Roberts, G. Sbardolini, & M. Schouwstra (Eds.), The Proceedings of the 23rd Amsterdam Colloquium (pp. 383-387). Amsterdam: University of Amsterdam.
  • Levshina, N. (2023). Word classes in corpus linguistics. In E. Van Lier (Ed.), The Oxford handbook of word classes (pp. 833-850). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780198852889.013.34.

    Abstract

    Word classes play a central role in corpus linguistics under the name of parts of speech (POS). Many popular corpora are provided with POS tags. This chapter gives examples of popular tagsets and discusses the methods of automatic tagging. It also considers bottom-up approaches to POS induction, which are particularly important for the ‘poverty of stimulus’ debate in language acquisition research. The choice of optimal POS tagging involves many difficult decisions, which are related to the level of granularity, redundancy at different levels of corpus annotation, cross-linguistic applicability, language-specific descriptive adequacy, and dealing with fuzzy boundaries between POS. The chapter also discusses the problem of flexible word classes and demonstrates how corpus data with POS tags and syntactic dependencies can be used to quantify the level of flexibility in a language.
  • Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. In Proceedings of Interspeech 2022 (pp. 1126-1130).

    Abstract

    Response tokens (also known as backchannels, continuers, or feedback) are a frequent feature of human interaction, where they serve to display understanding and streamline turn-taking. We propose a bottom-up method to study responsive behaviour across 16 languages (8 language families). We use sequential context and recurrence of turns formats to identify candidate response tokens in a language-agnostic way across diverse conversational corpora. We then use UMAP clustering directly on speech signals to represent structure and variation. We find that (i) written orthographic annotations underrepresent the attested variation, (ii) distinctions between formats can be gradient rather than discrete, (iii) most languages appear to make available a broad distinction between a minimal nasal format `mm' and a fuller `yeah’-like format. Charting this aspect of human interaction contributes to our understanding of interactional infrastructure across languages and can inform the design of speech technologies.
  • Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. In F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, & J. Odijk (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 1178-1192). Marseille, France: European Language Resources Association.

    Abstract

    We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators. In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.

    Abstract

    Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. In Proceedings of the 24rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial 2023). doi:10.18653/v1/2023.sigdial-1.45.

    Abstract

    Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This impacts especially the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
  • Little, H., Perlman, M., & Eryilmaz, K. (2017). Repeated interactions can lead to more iconic signals. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 760-765). Austin, TX: Cognitive Science Society.

    Abstract

    Previous research has shown that repeated interactions can cause iconicity in signals to reduce. However, data from several recent studies has shown the opposite trend: an increase in iconicity as the result of repeated interactions. Here, we discuss whether signals may become less or more iconic as a result of the modality used to produce them. We review several recent experimental results before presenting new data from multi-modal signals, where visual input creates audio feedback. Our results show that the growth in iconicity present in the audio information may come at a cost to iconicity in the visual information. Our results have implications for how we think about and measure iconicity in artificial signalling experiments. Further, we discuss how iconicity in real world speech may stem from auditory, kinetic or visual information, but iconicity in these different modalities may conflict.
  • Majid, A., & Enfield, N. J. (2017). Body. In H. Burkhardt, J. Seibt, G. Imaguire, & S. Gerogiorgakis (Eds.), Handbook of mereology (pp. 100-103). Munich: Philosophia.
  • Majid, A., Manko, P., & De Valk, J. (2017). Language of the senses. In S. Dekker (Ed.), Scientific breakthroughs in the classroom! (pp. 40-76). Nijmegen: Science Education Hub Radboud University.

    Abstract

    The project that we describe in this chapter has the theme ‘Language of the senses’. This theme is
    based on the research of Asifa Majid and her team regarding the influence of language and culture on
    sensory perception. The chapter consists of two sections. Section 2.1 describes how different sensory
    perceptions are spoken of in different languages. Teachers can use this section as substantive preparation
    before they launch this theme in the classroom. Section 2.2 describes how teachers can handle
    this theme in accordance with the seven phases of inquiry-based learning. Chapter 1, in which the
    general guideline of the seven phases is described, forms the basis for this. We therefore recommend
    the use of chapter 1 as the starting point for the execution of a project in the classroom. This chapter
    provides the thematic additions.

    Additional information

    Materials Language of the senses

Share this page