Publications

  • Kanero, J., Franko, I., Oranç, C., Uluşahin, O., Koskulu, S., Adigüzel, Z., Küntay, A. C., & Göksun, T. (2018). Who can benefit from robots? Effects of individual differences in robot-assisted language learning. In Proceedings of the 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 212-217). Piscataway, NJ, USA: IEEE.

    Abstract

    It has been suggested that some individuals may benefit more from social robots than do others. Using second
    language (L2) learning as an example, the present study examined how individual differences in attitudes toward robots and personality
    traits may be related to learning outcomes. Preliminary results with 24 Turkish-speaking adults suggest that negative attitudes
    toward robots, more specifically thoughts and anxiety about the negative social impact that robots may have on society,
    predicted how well adults learned L2 words from a social robot. The possible implications of the findings as well as future directions are also discussed.
  • Kempen, G. (1997). De ontdubbelde taalgebruiker: Maken taalproductie en taalperceptie gebruik van één en dezelfde syntactische processor? [The undoubled language user: Do language production and language perception use one and the same syntactic processor?] [Abstract]. In 6e Winter Congres NvP. Programma and abstracts (pp. 31-32). Nederlandse Vereniging voor Psychonomie.
  • Kempen, G., Kooij, A., & Van Leeuwen, T. (1997). Do skilled readers exploit inflectional spelling cues that do not mirror pronunciation? An eye movement study of morpho-syntactic parsing in Dutch. In Abstracts of the Orthography Workshop "What spelling changes". Nijmegen: Max Planck Institute for Psycholinguistics.
  • Klein, W. (Ed.). (1983). Intonation [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (49).
  • Klein, W. (Ed.). (1997). Technologischer Wandel in den Philologien [Technological change in the philologies] [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (106).
  • Klein, W. (Ed.). (1979). Sprache und Kontext [Language and context] [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (33).
  • Klein, W. (Ed.). (1990). Sprache und Raum [Language and space] [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (78).
  • Klein, W. (Ed.). (1987). Sprache und Ritual [Language and ritual] [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (65).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1990). Zukunft der Sprache [The future of language] [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (79).
  • Koch, X., & Janse, E. (2015). Effects of age and hearing loss on articulatory precision for sibilants. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association.

    Abstract

    This study investigates the effects of adult age and speaker abilities on articulatory precision for sibilant productions. Normal-hearing young adults with
    better sibilant discrimination have been shown to produce greater spectral sibilant contrasts. As reduced auditory feedback may gradually impact on feedforward
    commands, we investigate whether articulatory precision as indexed by spectral mean for [s] and [ʃ] decreases with age, and more particularly with age-related
    hearing loss. Younger, middle-aged and older adults read aloud words starting with the sibilants [s] or [ʃ]. Possible effects of cognitive, perceptual, linguistic and sociolinguistic background variables
    on the sibilants’ acoustics were also investigated. Sibilant contrasts were less pronounced for male than female speakers. Most importantly, for the fricative
    [s], the spectral mean was modulated by individual high-frequency hearing loss, but not by age. These results underscore that even mild hearing loss already affects articulatory precision.
  • Kohatsu, T., Akamine, S., Sato, M., & Niikuni, K. (2022). Individual differences in empathy affect perspective adoption in language comprehension. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 652-656). Tokyo: Japanese Cognitive Science Society.
  • Koster, M., & Cutler, A. (1997). Segmental and suprasegmental contributions to spoken-word recognition in Dutch. In Proceedings of EUROSPEECH 97 (pp. 2167-2170). Grenoble, France: ESCA.

    Abstract

    Words can be distinguished by segmental differences or by suprasegmental differences or both. Studies from English suggest that suprasegmentals play little role in human spoken-word recognition; English stress, however, is nearly always unambiguously coded in segmental structure (vowel quality); this relationship is less close in Dutch. The present study directly compared the effects of segmental and suprasegmental mispronunciation on word recognition in Dutch. There was a strong effect of suprasegmental mispronunciation, suggesting that Dutch listeners do exploit suprasegmental information in word recognition. Previous findings indicating that the effects of mis-stressing in Dutch differ with stress position were replicated only when segmental change was involved, suggesting that this is an effect of segmental rather than suprasegmental processing.
  • Lattenkamp, E. Z., Vernes, S. C., & Wiegrebe, L. (2018). Mammalian models for the study of vocal learning: A new paradigm in bats. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 235-237). Toruń, Poland: NCU Press. doi:10.12775/3991-1.056.
  • Lauscher, A., Eckert, K., Galke, L., Scherp, A., Rizvi, S. T. R., Ahmed, S., Dengel, A., Zumstein, P., & Klein, A. (2018). Linked open citation database: Enabling libraries to contribute to an open and interconnected citation graph. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 109-118). New York: ACM. doi:10.1145/3197026.3197050.

    Abstract

    Citations play a crucial role in scientific discourse, in information retrieval, and in bibliometrics. Many initiatives are currently promoting the idea of having free and open citation data. Creation of citation data, however, is not currently part of the cataloging workflow in libraries.
    In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that efficiently cataloging citations in libraries using a semi-automatic approach is possible. We specifically describe the current state of the workflow and its implementation. We show that we could significantly improve the automatic reference extraction that is crucial for the subsequent data curation. We further give insights on the curation and linking process and provide evaluation results that not only direct the further development of the project, but also allow us to discuss its overall feasibility.
  • Lefever, E., Hendrickx, I., Croijmans, I., Van den Bosch, A., & Majid, A. (2018). Discovering the language of wine reviews: A text mining account. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3297-3302). Paris: LREC.

    Abstract

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
  • Lenkiewicz, P., Auer, E., Schreer, O., Masneri, S., Schneider, D., & Tschöpe, S. (2012). AVATecH ― automated annotation through audio and video analysis. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 209-214). European Language Resources Association.

    Abstract

    In different fields of the humanities, annotations of multimodal resources are a necessary component of the research workflow; examples include linguistics, psychology, and anthropology. However, creating these annotations is a very laborious task, which can take 50 to 100 times the length of the annotated media, or more. This process can be significantly accelerated by applying innovative audio and video processing algorithms, which analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
  • Lenkiewicz, A., Lis, M., & Lenkiewicz, P. (2012). Linguistic concepts described with Media Query Language for automated annotation. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 477-479).

    Abstract

    Human spoken communication is multimodal, i.e. it encompasses both speech and gesture. Acoustic properties of voice, body movements, facial expression, etc. are an inherent and meaningful part of spoken interaction; they can provide attitudinal, grammatical and semantic information. In recent years, interest in audio-visual corpora has been rising rapidly, as they enable investigation of different communicative modalities and provide a more holistic view of communication (Kipp et al. 2009). Moreover, for some languages such corpora are the only available resource, as is the case for endangered languages for which no written resources exist.
  • Lenkiewicz, P., Van Uytvanck, D., Wittenburg, P., & Drude, S. (2012). Towards automated annotation of audio and video recordings by application of advanced web-services. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1880-1883).

    Abstract

    In this paper we describe audio and video processing algorithms that are being developed in the scope of the AVATecH project. The purpose of these algorithms is to shorten the time taken by manual annotation of audio and video recordings by extracting features from media files and creating semi-automated annotations. We show that the use of such supporting algorithms can shorten the annotation time to 30-50% of the time necessary to perform a fully manual annotation of the same kind.
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levelt, W. J. M. (1983). The speaker's organization of discourse. In Proceedings of the XIIIth International Congress of Linguists (pp. 278-290).
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Levinson, S. C. (1979). Pragmatics and social deixis: Reclaiming the notion of conventional implicature. In C. Chiarello (Ed.), Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society (pp. 206-223).
  • Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. In Proceedings of Interspeech 2022 (pp. 1126-1130).

    Abstract

    Response tokens (also known as backchannels, continuers, or feedback) are a frequent feature of human interaction, where they serve to display understanding and streamline turn-taking. We propose a bottom-up method to study responsive behaviour across 16 languages (8 language families). We use sequential context and recurrence of turn formats to identify candidate response tokens in a language-agnostic way across diverse conversational corpora. We then use UMAP clustering directly on speech signals to represent structure and variation. We find that (i) written orthographic annotations underrepresent the attested variation, (ii) distinctions between formats can be gradient rather than discrete, (iii) most languages appear to make available a broad distinction between a minimal nasal format ‘mm’ and a fuller ‘yeah’-like format. Charting this aspect of human interaction contributes to our understanding of interactional infrastructure across languages and can inform the design of speech technologies.
  • Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. In F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, & J. Odijk (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 1178-1192). Marseille, France: European Language Resources Association.

    Abstract

    We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
  • Little, H., Eryılmaz, K., & de Boer, B. (2015). A new artificial sign-space proxy for investigating the emergence of structure and categories in speech. In The Scottish Consortium for ICPhS 2015 (Ed.), The proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015).
  • Little, H., Eryılmaz, K., & de Boer, B. (2015). Linguistic modality affects the creation of structure and iconicity in signals. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. Jennings, & P. Maglio (Eds.), The 37th annual meeting of the Cognitive Science Society (CogSci 2015) (pp. 1392-1398). Austin, TX: Cognitive Science Society.

    Abstract

    Different linguistic modalities (speech or sign) offer different levels at which signals can iconically represent the world. One hypothesis argues that this iconicity has an effect on how linguistic structure emerges. However, exactly how and why these effects might come about is in need of empirical investigation. In this contribution, we present a signal creation experiment in which both the signalling space and the meaning space are manipulated so that different levels and types of iconicity are available between the signals and meanings. Signals are produced using an infrared sensor that detects the hand position of participants to generate auditory feedback. We find evidence that iconicity may be maladaptive for the discrimination of created signals. Further, we implemented Hidden Markov Models to characterise the structure within signals; these models were also used to inform a metric for iconicity.
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources (LiNCR)" (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three
    stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are
    accompanied by stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained
    nature of the data allow the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies.
    We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural
    language processing in the brain and further stimulating data sharing in the field of neuroscience of language.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question of how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Majid, A. (2012). Taste in twenty cultures [Abstract]. Abstracts from the XXIth Congress of European Chemoreception Research Organization, ECRO-2011. Publ. in Chemical Senses, 37(3), A10.

    Abstract

    Scholars disagree about the extent to which language can tell us
    about conceptualisation of the world. Some believe that language
    is a direct window onto concepts: Having a word “bird”, “table” or
    “sour” presupposes the corresponding underlying concept, BIRD,
    TABLE, SOUR. Others disagree. Words are thought to be uninformative,
    or worse, misleading about our underlying conceptual representations;
    after all, our mental worlds are full of ideas that we
    struggle to express in language. How could this be so, argue sceptics,
    if language were a direct window on our inner life? In this presentation,
    I consider what language can tell us about the
    conceptualisation of taste. By considering linguistic data from
    twenty unrelated cultures – varying in subsistence mode (hunter-gatherer
    to industrial), ecological zone (rainforest jungle to desert),
    dwelling type (rural and urban), and so forth – I argue that any single language is, indeed, impoverished in what it can reveal about
    taste. But recurrent lexicalisation patterns across languages can
    provide valuable insights about human taste experience. Moreover,
    language patterning is part of the data that a good theory of taste
    perception has to be answerable for. Taste researchers, therefore,
    cannot ignore the crosslinguistic facts.
  • Majid, A., Jordan, F., & Dunn, M. (Eds.). (2015). Semantic systems in closely related languages [Special Issue]. Language Sciences, 49.
  • Majid, A., Boroditsky, L., & Gaby, A. (Eds.). (2012). Time in terms of space [Research topic] [Special Issue]. Frontiers in cultural psychology. Retrieved from http://www.frontiersin.org/cultural_psychology/researchtopics/Time_in_terms_of_space/755.

    Abstract

    This Research Topic explores the question: what is the relationship between representations of time and space in cultures around the world? This question touches on the broader issue of how humans come to represent and reason about abstract entities – things we cannot see or touch. Time is a particularly opportune domain to investigate this topic. Across cultures, people use spatial representations for time, for example in graphs, time-lines, clocks, sundials, hourglasses, and calendars. In language, time is also heavily related to space, with spatial terms often used to describe the order and duration of events. In English, for example, we might move a meeting forward, push a deadline back, attend a long concert or go on a short break. People also make consistent spatial gestures when talking about time, and appear to spontaneously invoke spatial representations when processing temporal language. A large body of evidence suggests a close correspondence between temporal and spatial language and thought. However, the ways that people spatialize time can differ dramatically across languages and cultures. This Research Topic identifies and explores some of the sources of this variation, including patterns in spatial thinking, patterns in metaphor, gesture and other cultural systems. It examines how speakers of different languages talk about time and space and how they think about these domains outside of language. The Research Topic invites papers exploring the following issues: 1. Do the linguistic representations of space and time share the same lexical and morphosyntactic resources? 2. To what extent does the conceptualization of time follow the conceptualization of space?
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) assessing the usefulness of a deep convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Mishra, C., & Skantze, G. (2022). Knowing where to look: A planning-based architecture to automate the gaze behavior of social robots. In Proceedings of the 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1201-1208). doi:10.1109/RO-MAN53752.2022.9900740.

    Abstract

    Gaze cues play an important role in human communication and are used to coordinate turn-taking and joint attention, as well as to regulate intimacy. In order to have fluent conversations with people, social robots need to exhibit humanlike gaze behavior. Previous Gaze Control Systems (GCS) in HRI have automated robot gaze using data-driven or heuristic approaches. However, these systems tend to be mainly reactive in nature. Planning the robot gaze ahead of time could help in achieving more realistic gaze behavior and better eye-head coordination. In this paper, we propose and implement a novel planning-based GCS. We evaluate our system in a comparative within-subjects user study (N=26) between a reactive system and our proposed system. The results show that the users preferred the proposed system and that it was significantly more interpretable and better at regulating intimacy.
  • Mitterer, H. (Ed.). (2012). Ecological aspects of speech perception [Research topic] [Special Issue]. Frontiers in Cognition.

    Abstract

    Our knowledge of speech perception is largely based on experiments conducted with carefully recorded clear speech presented under good listening conditions to undistracted listeners - a near-ideal situation, in other words. But reality poses a different set of challenges. First of all, listeners may need to divide their attention between speech comprehension and another task (e.g., driving). Outside the laboratory, the speech signal is often slurred by less-than-careful pronunciation, and the listener has to deal with background noise. Moreover, in a globalized world, listeners need to understand speech in more than just their native language. Relatedly, the speakers we listen to often have a different language background, so we have to deal with a foreign or regional accent we are not familiar with. Finally, outside the laboratory, speech perception is not an end in itself, but rather a means of contributing to a conversation. Listeners not only need to understand the speech they are hearing; they also need to use this information to plan and time their own responses. For this special topic, we invite papers that address any of these ecological aspects of speech perception.
  • Moers, C., Janse, E., & Meyer, A. S. (2015). Probabilistic reduction in reading aloud: A comparison of younger and older adults. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association.

    Abstract

    Frequent and predictable words are generally pronounced with less effort and are therefore acoustically more reduced than less frequent or unpredictable words. Local predictability can be operationalised by Transitional Probability (TP), which indicates how likely a word is to occur given its immediate context. We investigated whether and how probabilistic reduction effects on word durations change with adult age when reading aloud content words embedded in sentences. The results showed equally large frequency effects on verb and noun durations for both younger (mean age = 20 years) and older (mean age = 68 years) adults. Backward TP also affected word duration for younger and older adults alike. Forward TP, however, had no significant effect on word duration in either age group. Our results resemble earlier findings of more robust Backward TP effects compared to Forward TP effects. Furthermore, contrary to the often-reported decline in predictive processing with aging, probabilistic reduction effects remain stable across adulthood.
  • Moisik, S. R., & Dediu, D. (2015). Anatomical biasing and clicks: Preliminary biomechanical modelling. In H. Little (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015) Satellite Event: The Evolution of Phonetic Capabilities: Causes, constraints, consequences (pp. 8-13). Glasgow: ICPhS.

    Abstract

    It has been observed by several researchers that the Khoisan palate tends to lack a prominent alveolar ridge. A preliminary biomechanical model of click production was created to examine whether these sounds might be subject to an anatomical bias associated with alveolar ridge size. Results suggest the bias is plausible, taking the form of decreased articulatory effort and improved volume change characteristics; however, further modelling and experimental research is required to solidify the claim.
  • Morano, L., Ernestus, M., & Ten Bosch, L. (2015). Schwa reduction in low-proficiency L2 speakers: Learning and generalization. In Scottish consortium for ICPhS, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    This paper investigated the learnability and generalizability of French schwa alternation by Dutch low-proficiency second language learners. We trained 40 participants on 24 new schwa words by exposing them equally often to the reduced and full forms of these words. We then assessed participants' accuracy and reaction times to these newly learnt words as well as 24 previously encountered schwa words with an auditory lexical decision task. Our results show learning of the new words in both forms. This suggests that lack of exposure is probably the main cause of learners' difficulties with reduced forms. Nevertheless, the full forms were slightly better recognized than the reduced ones, possibly due to phonetic and phonological properties of the reduced forms. We also observed no generalization to previously encountered words, suggesting that our participants stored both of the learnt word forms and did not create a rule that applies to all schwa words.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light condition-specific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Mulder, K., Brekelmans, G., & Ernestus, M. (2015). The processing of schwa reduced cognates and noncognates in non-native listeners of English. In Scottish consortium for ICPhS, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    In speech, words are often reduced rather than fully pronounced (e.g., /ˈsʌmri/ for /ˈsʌməri/, summary). Non-native listeners may have problems in processing these reduced forms, because they have encountered them less often. This paper addresses the question of whether this also holds for highly proficient non-natives and for words with similar forms and meanings in the non-natives' mother tongue (i.e., cognates). In an English auditory lexical decision task, natives and highly proficient Dutch non-natives of English listened to cognates and non-cognates that were presented in full or without their post-stress schwa. The data show that highly proficient learners are affected by reduction as much as native speakers. Nevertheless, the two listener groups appear to process reduced forms differently, because non-natives produce more errors on reduced cognates than on non-cognates. While listening to reduced forms, non-natives appear to be hindered by the co-activated lexical representations of cognate forms in their native language.
  • Namjoshi, J., Tremblay, A., Broersma, M., Kim, S., & Cho, T. (2012). Influence of recent linguistic exposure on the segmentation of an unfamiliar language [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1968.

    Abstract

    Studies have shown that listeners segmenting unfamiliar languages transfer native-language (L1) segmentation cues. These studies, however, conflated L1 and recent linguistic exposure. The present study investigates the relative influences of L1 and recent linguistic exposure on the use of prosodic cues for segmenting an artificial language (AL). Participants were L1-French listeners, high-proficiency L2-French L1-English listeners, and L1-English listeners without functional knowledge of French. The prosodic cue assessed was F0 rise, which is word-final in French but in English tends to be word-initial. Thirty participants heard a 20-minute AL speech stream with word-final boundaries marked by F0 rise, and decided in a subsequent listening task which of two words (without word-final F0 rise) had been heard in the speech stream. The analyses revealed a marginally significant effect of L1 (all listeners) and, importantly, a significant effect of recent linguistic exposure (L1-French and L2-French listeners): accuracy increased with decreasing time in the US since the listeners’ last significant (3+ months) stay in a French-speaking environment. Interestingly, no effect of L2 proficiency was found (L2-French listeners).
  • Neger, T. M., Rietveld, T., & Janse, E. (2015). Adult age effects in auditory statistical learning. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association.

    Abstract

    Statistical learning plays a key role in language processing, e.g., for speech segmentation. Older adults have been reported to show less statistical learning on the basis of visual input than younger adults. Given age-related changes in perception and cognition, we investigated whether statistical learning is also impaired in the auditory modality in older compared to younger adults and whether individual learning ability is associated with measures of perceptual (i.e., hearing sensitivity) and cognitive functioning in both age groups. Thirty younger and thirty older adults performed an auditory artificial-grammar-learning task to assess their statistical learning ability. In younger adults, perceptual effort came at the cost of processing resources required for learning. Inhibitory control (as indexed by Stroop color-naming performance) did not predict auditory learning. Overall, younger and older adults showed the same amount of auditory learning, indicating that statistical learning ability is preserved over the adult life span.
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2015). Exemplar effects arise in a lexical decision task, but only under adverse listening conditions. In Scottish consortium for ICPhS, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    This paper studies the influence of adverse listening conditions on exemplar effects in priming experiments that do not instruct participants to use their episodic memories. We conducted two lexical decision experiments, in which a prime and a target represented the same word type and could be spoken by the same or a different speaker. In Experiment 1, participants listened to clear speech, and showed no exemplar effects: they recognised repetitions by the same speaker as quickly as different speaker repetitions. In Experiment 2, the stimuli contained noise, and exemplar effects did arise. Importantly, Experiment 1 elicited longer average RTs than Experiment 2, a result that contradicts the time-course hypothesis, according to which exemplars only play a role when processing is slow. Instead, our findings support the hypothesis that exemplar effects arise under adverse listening conditions, when participants are stimulated to use their episodic memories in addition to their mental lexicons.
  • Nordhoff, S., & Hammarström, H. (2012). Glottolog/Langdoc: Increasing the visibility of grey literature for low-density languages. In N. Calzolari (Ed.), Proceedings of the 8th International Conference on Language Resources and Evaluation [LREC 2012], May 23-25, 2012 (pp. 3289-3294). [Paris]: ELRA.

    Abstract

    Language resources can be divided into structural resources treating phonology, morphosyntax, semantics, etc., and resources treating the social, demographic, ethnic, or political context. A third type are meta-resources, like bibliographies, which provide access to the resources of the first two kinds. This poster will present the Glottolog/Langdoc project, a comprehensive bibliography providing web access to 180k bibliographical records of (mainly) low-visibility resources from low-density languages. The resources are annotated for macro-area, content language, and document type and are available in XHTML and RDF.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Peeters, D., Snijders, T. M., Hagoort, P., & Ozyurek, A. (2015). The role of left inferior frontal gyrus in the integration of pointing gestures and speech. In G. Ferré, & M. Tutton (Eds.), Proceedings of the 4th GESPIN - Gesture & Speech in Interaction Conference. Nantes: Université de Nantes.

    Abstract

    Comprehension of pointing gestures is fundamental to human communication. However, the neural mechanisms
    that subserve the integration of pointing gestures and speech in visual contexts in comprehension
    are unclear. Here we present the results of an fMRI study in which participants watched images of an
    actor pointing at an object while they listened to her referential speech. The use of a mismatch paradigm
    revealed that the semantic unification of pointing gesture and speech in a triadic context recruits left
    inferior frontal gyrus. Complementing previous findings, this suggests that left inferior frontal gyrus
    semantically integrates information across modalities and semiotic domains.
  • Perlman, M., Paul, J., & Lupyan, G. (2015). Congenitally deaf children generate iconic vocalizations to communicate magnitude. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. R. Maglio (Eds.), Proceedings of the 37th Annual Cognitive Science Society Meeting (CogSci 2015) (pp. 315-320). Austin, TX: Cognitive Science Society.

    Abstract

    From an early age, people exhibit strong links between certain visual (e.g., size) and acoustic (e.g., duration) dimensions. Do people instinctively extend these crossmodal correspondences to vocalization? We examine the ability of congenitally deaf Chinese children and young adults (age M = 12.4 years, SD = 3.7 years) to generate iconic vocalizations to distinguish items with contrasting magnitude (e.g., big vs. small ball). Both deaf and hearing (M = 10.1 years, SD = 0.83 years) participants produced longer, louder vocalizations for greater magnitude items. However, only hearing participants used pitch (higher pitch for greater magnitude), which counters the hypothesized, innate size “frequency code” but fits with Mandarin language and culture. Thus, our results show that the translation of visible magnitude into the duration and intensity of vocalization transcends auditory experience, whereas the use of pitch appears more malleable to linguistic and cultural influence.
  • Perniss, P. M., Ozyurek, A., & Morgan, G. (Eds.). (2015). The influence of the visual modality on language structure and conventionalization: Insights from sign language and gesture [Special Issue]. Topics in Cognitive Science, 7(1). doi:10.1111/tops.12113.
  • Perry, L., Perlman, M., & Lupyan, G. (2015). Iconicity in English vocabulary and its relation to toddlers’ word learning. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. R. Maglio (Eds.), Proceedings of the 37th Annual Cognitive Science Society Meeting (CogSci 2015) (pp. 315-320). Austin, TX: Cognitive Science Society.

    Abstract

    Scholars have documented substantial classes of iconic vocabulary in many non-Indo-European languages. In comparison, Indo-European languages like English are assumed to be arbitrary outside of a small number of onomatopoeic words. In three experiments, we asked English speakers to rate the iconicity of words from the MacArthur-Bates Communicative Development Inventory. We found that, contrary to common belief, English exhibits iconicity that correlates with age of acquisition and differs across lexical classes. Words judged as most iconic are learned earlier, in accord with findings that iconic words are easier to learn. We also find that adjectives and verbs are more iconic than nouns, supporting the idea that iconicity provides an extra cue in learning more difficult abstract meanings. Our results provide new evidence for a relationship between iconicity and word learning and suggest iconicity may be a more pervasive property of spoken languages than previously thought.
  • Poellmann, K., McQueen, J. M., & Mitterer, H. (2012). How talker-adaptation helps listeners recognize reduced word-forms [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 2053.

    Abstract

    Two eye-tracking experiments tested whether native listeners can adapt
    to reductions in casual Dutch speech. Listeners were exposed to segmental
    ([b] > [m]), syllabic (full-vowel-deletion), or no reductions. In a subsequent
    test phase, all three listener groups were tested on how efficiently they could
    recognize both types of reduced words. In the first Experiment’s exposure
    phase, the (un)reduced target words were predictable. The segmental reductions
    were completely consistent (i.e., involved the same input sequences).
    Learning about them was found to be pattern-specific and generalized in the
    test phase to new reduced /b/-words. The syllabic reductions were not consistent
    (i.e., involved variable input sequences). Learning about them was
    weak and not pattern-specific. Experiment 2 examined effects of word repetition
    and predictability. The (un-)reduced test words appeared in the exposure
    phase and were not predictable. There was no evidence of learning for
    the segmental reductions, probably because they were not predictable during
    exposure. But there was word-specific learning for the vowel-deleted words.
    The results suggest that learning about reductions is pattern-specific and
    generalizes to new words if the input is consistent and predictable. With
    variable input, there is more likely to be adaptation to a general speaking
    style and word-specific learning.
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., & Fitch, W. T. (2012). Sonification of experimental parameters as a new method for efficient coding of behavior. In A. Spink, F. Grieco, O. E. Krips, L. W. S. Loijens, L. P. P. J. Noldus, & P. H. Zimmerman (Eds.), Measuring Behavior 2012, 8th International Conference on Methods and Techniques in Behavioral Research (pp. 376-379).

    Abstract

    Cognitive research is often focused on experimental condition-driven reactions. Ethological studies frequently
    rely on the observation of naturally occurring specific behaviors. In both cases, subjects are filmed during the
    study, so that afterwards behaviors can be coded on video. Coding should typically be blind to experimental
    conditions, but often requires more information than that present on video. We introduce a method for blind coding
    of behavioral videos that takes care of both issues via three main innovations. First, of particular
    significance for playback studies, it allows creation of a “soundtrack” of the study, that is, a track composed of
    synthesized sounds representing different aspects of the experimental conditions, or other events, over time.
    Second, it facilitates coding behavior using this audio track, together with the possibly muted original video.
    This enables coding blindly to conditions as required, but not ignoring other relevant events. Third, our method
    makes use of freely available, multi-platform software, including scripts we developed.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Raviv, L., Jacobson, S. L., Plotnik, J. M., Bowman, J., Lynch, V., & Benítez-Burraco, A. (2022). Elephants as a new animal model for studying the evolution of language as a result of self-domestication. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 606-608). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • de Reus, K., Carlson, D., Lowry, A., Gross, S., Garcia, M., Rubio-García, A., Salazar-Casals, A., & Ravignani, A. (2022). Body size predicts vocal tract size in a mammalian vocal learner. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 154-156). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Roberts, L., & Meyer, A. S. (Eds.). (2012). Individual differences in second language acquisition [Special Issue]. Language Learning, 62(Supplement S2).
  • Roberts, S. G., Everett, C., & Blasi, D. (2015). Exploring potential climate effects on the evolution of human sound systems. In H. Little (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences [ICPhS 2015] Satellite Event: The Evolution of Phonetic Capabilities: Causes, constraints, consequences (pp. 14-19). Glasgow: ICPhS.

    Abstract

    We suggest that it is now possible to conduct research on a topic which might be called evolutionary geophonetics. The main question is how the climate influences the evolution of language. This involves biological adaptations to the climate that may affect biases in production and perception; cultural evolutionary adaptations of the sounds of a language to climatic conditions; and influences of the climate on language diversity and contact. We discuss these ideas with special reference to a recent hypothesis that lexical tone is not adaptive in dry climates (Everett, Blasi & Roberts, 2015).
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The average evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.
  • San Roque, L., & Bergqvist, H. (Eds.). (2015). Epistemic marking in typological perspective [Special Issue]. STUF - Language Typology and Universals, 68(2).
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech
    recognition. While previous work has shown that Fine-Tracker
    is successful at modelling aspects of human spoken-word
    recognition, its speech recognition performance is not
    comparable to human performance, possibly due to
    suboptimal intermediate articulatory feature (AF)
    representations. This study investigates the effect of improved
    AF representations, obtained using a state-of-the-art deep
    convolutional network, on Fine-Tracker’s simulation and
    recognition performance: although the improved AF quality
    resulted in improved speech recognition, it, surprisingly, did
    not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., Witteman, M. J., & Weber, A. (2012). Computational modelling of the recognition of foreign-accented speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 882-885).

    Abstract

    In foreign-accented speech, pronunciation typically deviates from the canonical form to some degree. For native listeners, it has been shown that word recognition is more difficult for strongly-accented words than for less strongly-accented words. Furthermore, recognition of strongly-accented words becomes easier with additional exposure to the foreign accent. In this paper, listeners’ behaviour was simulated with Fine-Tracker, a computational model of word recognition that uses real speech as input. The simulations showed that, in line with human listeners, 1) Fine-Tracker’s recognition outcome is modulated by the degree of accentedness and 2) it improves slightly after brief exposure to the accent. On the level of individual words, however, Fine-Tracker failed to correctly simulate listeners’ behaviour, possibly due to differences in overall familiarity with the chosen accent (German-accented Dutch) between human listeners and Fine-Tracker.
  • Scharenborg, O., & Janse, E. (2012). Hearing loss and the use of acoustic cues in phonetic categorisation of fricatives. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1458-1461).

    Abstract

    Aging often affects sensitivity to the higher frequencies, which results in the loss of sensitivity to phonetic detail in speech. Hearing loss may therefore interfere with the categorisation of two consonants that are differentiated mainly by information in those higher frequencies rather than in the lower frequencies, e.g., /f/ and /s/. We investigate two acoustic cues, i.e., formant transitions and fricative intensity, that older listeners might use to differentiate between /f/ and /s/. The results of two phonetic categorisation tasks on 38 older listeners (aged 60+) with varying degrees of hearing loss indicate that older listeners seem to use formant transitions as a cue to distinguish /s/ from /f/. Moreover, this ability is not impacted by hearing loss. On the other hand, listeners with increased hearing loss seem to rely more on intensity for fricative identification. Thus, progressive hearing loss may lead to gradual changes in perceptual cue weighting.
  • Scharenborg, O., Janse, E., & Weber, A. (2012). Perceptual learning of /f/-/s/ by older listeners. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 398-401).

    Abstract

    Young listeners can quickly modify their interpretation of a speech sound when a talker produces the sound ambiguously. Young Dutch listeners rely mainly on the higher frequencies to distinguish between /f/ and /s/, but these higher frequencies are particularly vulnerable to age-related hearing loss. We therefore tested whether older Dutch listeners can show perceptual retuning given an ambiguous pronunciation between /f/ and /s/. Results of a lexically-guided perceptual learning experiment showed that older Dutch listeners are still able to learn non-standard pronunciations of /f/ and /s/. Possibly, the older listeners have learned to rely on other acoustic cues, such as formant transitions, to distinguish between /f/ and /s/. However, the size and duration of the perceptual effect are influenced by hearing loss, with listeners with poorer hearing showing a smaller and shorter-lived learning effect.
  • Schiller, N. O., Van Lieshout, P. H. H. M., Meyer, A. S., & Levelt, W. J. M. (1997). Is the syllable an articulatory unit in speech production? Evidence from an EMMA study. In P. Wille (Ed.), Fortschritte der Akustik: Plenarvorträge und Fachbeiträge der 23. Deutschen Jahrestagung für Akustik (DAGA 97) (pp. 605-606). Oldenburg: DEGA.
  • Schmidt, J., Scharenborg, O., & Janse, E. (2015). Semantic processing of spoken words under cognitive load in older listeners. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association.

    Abstract

    Processing of semantic information in language comprehension has been suggested to be modulated by attentional resources. Consequently, cognitive load would be expected to reduce semantic priming, but studies have yielded inconsistent results. This study investigated whether cognitive load affects semantic activation in speech processing in older adults, and whether this is modulated by individual differences in cognitive and hearing abilities. Older adults participated in an auditory continuous lexical decision task in a low-load and high-load condition. The group analysis showed only a marginally significant reduction of semantic priming in the high-load condition compared to the low-load condition. The individual differences analysis showed that semantic priming was significantly reduced under increased load in participants with poorer attention-switching control. Hence, a resource-demanding secondary task may affect the integration of spoken words into a coherent semantic representation for listeners with poorer attentional skills.
  • Scholman, M., Dong, T., Yung, F., & Demberg, V. (2022). DiscoGeM: A crowdsourced corpus of genre-mixed implicit discourse relations. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 3281-3290). Marseille, France: European Language Resources Association.

    Abstract

    We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech,
    literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods
    were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results
    show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses.
    Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text
    genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic
    relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level
    labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to
    function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into
    non-connective signals of discourse relations.
  • Schubotz, L., Holler, J., & Ozyurek, A. (2015). Age-related differences in multi-modal audience design: Young, but not old speakers, adapt speech and gestures to their addressee's knowledge. In G. Ferré, & M. Tutton (Eds.), Proceedings of the 4th GESPIN - Gesture & Speech in Interaction Conference (pp. 211-216). Nantes: Université de Nantes.

    Abstract

    Speakers can adapt their speech and co-speech gestures for
    addressees. Here, we investigate whether this ability is
    modulated by age. Younger and older adults participated in a
    comic narration task in which one participant (the speaker)
    narrated six short comic stories to another participant (the
    addressee). One half of each story was known to both participants; the other half was known only to the speaker. Younger but
    not older speakers used more words and gestures when narrating novel story content as opposed to known content.
    We discuss cognitive and pragmatic explanations of these findings and relate them to theories of gesture production.
  • Schuerman, W. L., Nagarajan, S., & Houde, J. (2015). Changes in consonant perception driven by adaptation of vowel production to altered auditory feedback. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). London: International Phonetic Association.

    Abstract

    Adaptation to altered auditory feedback has been shown to induce subsequent shifts in perception. However, it is uncertain whether these perceptual changes may generalize to other speech sounds. In this experiment, we tested whether exposing the production of a vowel to altered auditory feedback affects perceptual categorization of a consonant distinction. In two sessions, participants produced CVC words containing the vowel /i/, while intermittently categorizing stimuli drawn from a continuum between "see" and "she." In the first session, feedback was unaltered, while in the second session the formants of the vowel were shifted 20% towards /u/. Adaptation to the altered vowel was found to reduce the proportion of perceived /ʃ/ stimuli. We suggest that this reflects an alteration to the sensorimotor mapping that is shared between vowels and consonants.
  • Seuren, P. A. M., & Mufwene, S. S. (Eds.). (1990). Issues in Creole linguistics [Special Issue]. Linguistics, 28(4).
  • Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Sjerps, M. J., McQueen, J. M., & Mitterer, H. (2012). Extrinsic normalization for vocal tracts depends on the signal, not on attention. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 394-397).

    Abstract

    When perceiving vowels, listeners adjust to speaker-specific vocal-tract characteristics (such as F1) through "extrinsic vowel normalization". This effect is observed as a shift in the location of categorization boundaries of vowel continua. Similar effects have been found with non-speech. Non-speech materials, however, have consistently led to smaller effect sizes, perhaps because of a lack of attention to non-speech. The present study investigated this possibility. Non-speech materials that had previously been shown to elicit reduced normalization effects were tested again, with the addition of an attention manipulation. The results show that increased attention does not lead to increased normalization effects, suggesting that vowel normalization is mainly determined by bottom-up signal characteristics.
  • Sloetjes, H., & Somasundaram, A. (2012). ELAN development, keeping pace with communities' needs. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 219-223). European Language Resources Association (ELRA).

    Abstract

    ELAN is a versatile multimedia annotation tool that is being developed at the Max Planck Institute for Psycholinguistics. About a decade ago it emerged out of a number of corpus tools and utilities and it has been extended ever since. This paper focuses on the efforts made to ensure that the application keeps up with the growing needs of that era in linguistics and multimodality research; growing needs in terms of length and resolution of recordings, the number of recordings made and transcribed and the number of levels of annotation per transcription.
  • Slonimska, A., Ozyurek, A., & Campisi, E. (2015). Ostensive signals: markers of communicative relevance of gesture during demonstration to adults and children. In G. Ferré, & M. Tutton (Eds.), Proceedings of the 4th GESPIN - Gesture & Speech in Interaction Conference (pp. 217-222). Nantes: Université de Nantes.

    Abstract

    Speakers adapt their speech and gestures in various ways for their audience. We investigated further whether they use
    ostensive signals (eye gaze, ostensive speech (e.g., "like this", "this"), or a combination of both) in relation to their gestures
    when talking to different addressees, i.e., to another adult or a child in a multimodal demonstration task. While adults used
    more eye gaze towards their gestures with other adults than with children, they were more likely to use combined
    ostensive signals for children than for adults. Thus speakers mark the communicative relevance of their gestures with different types of ostensive signals and by taking different types of addressees into account.
  • Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • De Smedt, K., & Kempen, G. (1990). Discontinuous constituency in Segment Grammar. In Proceedings of the Symposium on Discontinuous Constituency. Tilburg: University of Brabant.
  • Smorenburg, L., Rodd, J., & Chen, A. (2015). The effect of explicit training on the prosodic production of L2 sarcasm by Dutch learners of English. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow, UK: University of Glasgow.

    Abstract

    Previous research [9] suggests that Dutch learners of (British) English are not able to express sarcasm prosodically in their L2. The present study investigates whether explicit training on the prosodic markers of sarcasm in English can improve learners’ realisation of sarcasm. Sarcastic speech was elicited in short simulated telephone conversations between Dutch advanced learners of English and a native British English-speaking ‘friend’ in two sessions, fourteen days apart. Between the two sessions, participants were trained by means of (1) a presentation, (2) directed independent practice, and (3) evaluation of participants’ production and individual feedback in small groups. L1 British English-speaking raters subsequently rated how sarcastic the participants’ responses sounded on a five-point scale. Learners’ productions obtained after the training received significantly higher sarcasm ratings than those obtained before the training, indicating that explicit training on prosody has a positive effect on learners’ production of sarcasm.
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.
  • Stehouwer, H., Durco, M., Auer, E., & Broeder, D. (2012). Federated search: Towards a common search infrastructure. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3255-3259). European Language Resources Association (ELRA).

    Abstract

    Within scientific institutes there exist many language resources. These resources are often quite specialized and relatively unknown. The current infrastructural initiatives try to tackle this issue by collecting metadata about the resources and establishing centers with stable repositories to ensure the availability of the resources. It would be beneficial if the researcher could, by means of a simple query, determine which resources and which centers contain information useful to his or her research, or even work on a set of distributed resources as a virtual corpus. In this article we propose an architecture for a distributed search environment allowing researchers to perform searches in a set of distributed language resources.
  • Sumer, B., Zwitserlood, I., Perniss, P. M., & Ozyurek, A. (2012). Development of locative expressions by Turkish deaf and hearing children: Are there modality effects? In A. K. Biller, E. Y. Chung, & A. E. Kimball (Eds.), Proceedings of the 36th Annual Boston University Conference on Language Development (BUCLD 36) (pp. 568-580). Boston: Cascadilla Press.
  • Svantesson, J.-O., Burenhult, N., Holmer, A., Karlsson, A., & Lundström, H. (Eds.). (2012). Humanities of the lesser-known: New directions in the description, documentation and typology of endangered languages and musics [Special Issue]. Language Documentation and Description, 10.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are influenced not only by the stimuli but also by many other factors, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in long-term effects (such as a participant’s overall reaction capability) and in short- and medium-term fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2015). DIANA, an end-to-end computational model of human word comprehension. In Scottish consortium for ICPhS, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    This paper presents DIANA, a new computational model of human speech processing. It is the first model that simulates the complete processing chain from the on-line processing of an acoustic signal to the execution of a response, including reaction times. Moreover, it assumes minimal modularity. DIANA consists of three components. The activation component computes a probabilistic match between the input acoustic signal and representations in DIANA’s lexicon, resulting in a list of word hypotheses changing over time as the input unfolds. The decision component operates on this list and selects a word as soon as sufficient evidence is available. Finally, the execution component accounts for the time to execute a behavioral action. We show that DIANA simulates the average participant in a word recognition experiment well.
  • Ten Bosch, L., Boves, L., Tucker, B., & Ernestus, M. (2015). DIANA: Towards computational modeling reaction times in lexical decision in North American English. In Proceedings of Interspeech 2015: The 16th Annual Conference of the International Speech Communication Association (pp. 1576-1580).

    Abstract

    DIANA is an end-to-end computational model of speech processing, which takes as input the speech signal, and provides as output the orthographic transcription of the stimulus, a word/non-word judgment and the associated estimated reaction time. So far, the model has only been tested for Dutch. In this paper, we extend DIANA such that it can also process North American English. The model is tested by having it simulate human participants in a large-scale North American English lexical decision experiment. The simulations show that DIANA can adequately approximate the reaction times of an average participant (r = 0.45). In addition, they indicate that DIANA does not yet adequately model the cognitive processes that take place after stimulus offset.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech technology, and in particular in automatic speech recognition, has yielded substantial performance improvements. This suggests that deep neural networks (DNNs) are able to capture structure in speech data that older methods for acoustic modeling, such as Gaussian Mixture Models and shallow neural networks, fail to uncover. In image recognition it is possible to link representations on the first couple of layers in DNNs to structural properties of images, and to representations on early layers in the visual cortex. This raises the question whether it is possible to accomplish a similar feat with representations on DNN layers when processing speech input. In this paper we present three different experiments in which we attempt to untangle how DNNs encode speech signals, and to relate these representations to phonetic knowledge, with the aim of advancing conventional phonetic concepts and choosing the topology of DNNs more efficiently. Two experiments investigate representations formed by auto-encoders. A third experiment investigates representations on convolutional layers that treat speech spectrograms as if they were images. The results lay the basis for future experiments with recursive networks.
  • Ten Bosch, L., & Scharenborg, O. (2012). Modeling cue trading in human word recognition. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2003-2006).

    Abstract

    Classical phonetic studies have shown that acoustic-articulatory cues can be interchanged without affecting the resulting phoneme percept (‘cue trading’). Cue trading has so far mainly been investigated in the context of phoneme identification. In this study, we investigate cue trading in word recognition, because words are the units of speech through which we communicate. This paper aims to provide a method to quantify cue trading effects by using a computational model of human word recognition. This model takes the acoustic signal as input and represents speech using articulatory feature streams. Importantly, it allows cue trading and underspecification. Its set-up is inspired by the functionality of Fine-Tracker, a recent computational model of human word recognition. This approach makes it possible, for the first time, to quantify cue trading in terms of a trade-off between features and to investigate cue trading in the context of a word recognition task.
  • Terband, H., Rodd, J., & Maas, E. (2015). Simulations of feedforward and feedback control in apraxia of speech (AOS): Effects of noise masking on vowel production in the DIVA model. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015).

    Abstract

    Apraxia of Speech (AOS) is a motor speech disorder whose precise nature is still poorly understood. A recent behavioural experiment featuring a noise masking paradigm suggests that AOS reflects a disruption of feedforward control, whereas feedback control is spared and plays a more prominent role in achieving and maintaining segmental contrasts [10]. In the present study, we set out to validate the interpretation of AOS as a feedforward impairment by means of a series of computational simulations with the DIVA model [6, 7] mimicking the behavioural experiment. Simulation results showed a larger reduction in vowel spacing and a smaller vowel dispersion in the masking condition compared to the no-masking condition for the simulated feedforward deficit, whereas the other groups showed the opposite pattern. These results mimic the patterns observed in the human data, corroborating the notion that AOS can be conceptualized as a deficit in feedforward control.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at ρ = .75 in the vocabulary at large, increasing to ρ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as they reflect it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world.
  • Torreira, F. (2015). Melodic alternations in Spanish. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015) (pp. 946.1-5). Glasgow, UK: The University of Glasgow. Retrieved from http://www.icphs2015.info/pdfs/Papers/ICPHS0946.pdf.

    Abstract

    This article describes how the tonal elements of two common Spanish intonation contours (the falling statement and the low-rising-falling request) align with the segmental string in broad-focus utterances differing in number of prosodic words. Using an imitation-and-completion task, we show (i) that the last stressed syllable of the utterance, traditionally viewed as carrying the ‘nuclear’ accent, associates with either a high or a low tonal element depending on phrase length, (ii) that certain tonal elements can be realized or omitted depending on the availability of specific metrical positions in their intonational phrase, and (iii) that the high tonal element of the request contour associates with either a stressed syllable or an intonational phrase edge depending on phrase length. On the basis of these facts, and in contrast to previous descriptions of Spanish intonation relying on obligatory and constant nuclear contours (e.g., L* L% for all neutral statements), we argue for a less constrained intonational morphology involving tonal units linked to the segmental string via contour-specific principles.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2015). ERP indices of situated reference in visual contexts. In D. Noelle, R. Dale, A. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015) (pp. 2422-2427). Austin: Cognitive Science Society.

    Abstract

    Violations of the maxims of Quantity occur when utterances provide more (over-specified) or less (under-specified) information than strictly required for referent identification. While behavioural data suggest that under-specified expressions lead to comprehension difficulty and communicative failure, there is no consensus as to whether over-specified expressions are also detrimental to comprehension. In this study we shed light on this debate, providing neurophysiological evidence supporting the view that extra information facilitates comprehension. We further present novel evidence that referential failure due to under-specification is qualitatively different from explicit cases of referential failure, when no matching referential candidate is available in the context.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using either an expression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, a long-lasting debate in psycholinguistics concerning whether the latter hinders referential processing. We present evidence from an eye-tracking experiment recording fixations as well as the Index of Cognitive Activity (a novel measure of cognitive workload) supporting the view that over-specifications facilitate processing. We further present original evidence that, above and beyond the effect of specificity, referring expressions that uniformly reduce referential entropy also benefit processing.
  • Trilsbeek, P., Broeder, D., Elbers, W., & Moreira, A. (2015). A sustainable archiving software solution for The Language Archive. In Proceedings of the 4th International Conference on Language Documentation and Conservation (ICLDC).
  • Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).

    Abstract

    Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets typically used to train action classifiers in the computer vision literature, our work provides useful insights into how best to construct datasets for efficient training and better generalization.
  • Turco, G., & Gubian, M. (2012). L1 Prosodic transfer and priming effects: A quantitative study on semi-spontaneous dialogues. In Q. Ma, H. Ding, & D. Hirst (Eds.), Proceedings of the 6th International Conference on Speech Prosody (pp. 386-389). International Speech Communication Association (ISCA).

    Abstract

    This paper presents a pilot investigation of primed accentuation patterns produced by advanced Dutch speakers of Italian as a second language (L2). Contrastive accent patterns within prepositional phrases were elicited in a semi-spontaneous dialogue with a confederate native speaker of Italian. The aim of the analysis was to compare learners’ contrastive accentual configurations induced by the confederate speaker’s prime against those produced by Italian and Dutch natives in the same testing conditions. F0 and speech rate data were analysed by applying powerful data-driven techniques available in the Functional Data Analysis statistical framework. Results reveal different accentual configurations in L1 and L2 Italian in response to the confederate’s prime. We conclude that learners’ accentual patterns mirror those produced by their L1 control group (prosodic-transfer hypothesis), although the hypothesis of a transient priming effect on learners’ choice of contrastive patterns cannot be completely ruled out.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is to generate a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i.e., songs to add to a playlist, a recommender system can increase user engagement by making playlist creation easier, as well as extending listening beyond the end of the current playlist. The ACM Recommender Systems Challenge 2018 focuses on this task. Spotify released a dataset that includes a large number of playlists and their associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is to predict the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.