Publications

Displaying 101 - 120 of 120
  • Severijnen, G. G., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Sidnell, J., & Stivers, T. (Eds.). (2005). Multimodal Interaction [Special Issue]. Semiotica, 156.
  • Slonimska, A., Ozyurek, A., & Campisi, E. (2015). Ostensive signals: markers of communicative relevance of gesture during demonstration to adults and children. In G. Ferré, & M. Tutton (Eds.), Proceedings of the 4th GESPIN - Gesture & Speech in Interaction Conference (pp. 217-222). Nantes: Universite of Nantes.

    Abstract

    Speakers adapt their speech and gestures in various ways for their audience. We investigated further whether they use
    ostensive signals (eye gaze, ostensive speech (e.g. like this, this) or a combination of both) in relation to their gestures
    when talking to different addressees, i.e., to another adult or a child in a multimodal demonstration task. While adults used
    more eye gaze towards their gestures with other adults than with children, they were more likely to use combined
    ostensive signals for children than for adults. Thus speakers mark the communicative relevance of their gestures with different types of ostensive signals and by taking different types of addressees into account.
  • Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Smorenburg, L., Rodd, J., & Chen, A. (2015). The effect of explicit training on the prosodic production of L2 sarcasm by Dutch learners of English. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow, UK: University of Glasgow.

    Abstract

    Previous research [9] suggests that Dutch learners of (British) English are not able to express sarcasm prosodically in their L2. The present study investigates whether explicit training on the prosodic markers of sarcasm in English can improve learners’ realisation of sarcasm. Sarcastic speech was elicited in short simulated telephone conversations between Dutch advanced learners of English and a native British English-speaking ‘friend’ in two sessions, fourteen days apart. Between the two sessions, participants were trained by means of (1) a presentation, (2) directed independent practice, and (3) evaluation of participants’ production and individual feedback in small groups. L1 British English-speaking raters subsequently evaluated the degree of sarcastic sounding in the participants’ responses on a five-point scale. It was found that significantly higher sarcasm ratings were given to L2 learners’ production obtained after the training than that obtained before the training; explicit training on prosody has a positive effect on learners’ production of sarcasm.
  • Sprenger, S. A., & Van Rijn, H. (2005). Clock time naming: Complexities of a simple task. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 2062-2067).
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2015). DIANA, an end-to-end computational model of human word comprehension. In Scottish consortium for ICPhS, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    This paper presents DIANA, a new computational model of human speech processing. It is the first model that simulates the complete processing chain from the on-line processing of an acoustic signal to the execution of a response, including reaction times. Moreover it assumes minimal modularity. DIANA consists of three components. The activation component computes a probabilistic match between the input acoustic signal and representations in DIANA’s lexicon, resulting in a list of word hypotheses changing over time as the input unfolds. The decision component operates on this list and selects a word as soon as sufficient evidence is available. Finally, the execution component accounts for the time to execute a behavioral action. We show that DIANA well simulates the average participant in a word recognition experiment.
  • Ten Bosch, L., Boves, L., Tucker, B., & Ernestus, M. (2015). DIANA: Towards computational modeling reaction times in lexical decision in North American English. In Proceedings of Interspeech 2015: The 16th Annual Conference of the International Speech Communication Association (pp. 1576-1580).

    Abstract

    DIANA is an end-to-end computational model of speech processing, which takes as input the speech signal, and provides as output the orthographic transcription of the stimulus, a word/non-word judgment and the associated estimated reaction time. So far, the model has only been tested for Dutch. In this paper, we extend DIANA such that it can also process North American English. The model is tested by having it simulate human participants in a large scale North American English lexical decision experiment. The simulations show that DIANA can adequately approximate the reaction times of an average participant (r = 0.45). In addition, they indicate that DIANA does not yet adequately model the cognitive processes that take place after stimulus offset.
  • ten Bosch, L., & Scharenborg, O. (2005). ASR decoding in a computational model of human word recognition. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1241-1244). ISCA Archive.

    Abstract

    This paper investigates the interaction between acoustic scores and symbolic mismatch penalties in multi-pass speech decoding techniques that are based on the creation of a segment graph followed by a lexical search. The interaction between acoustic and symbolic mismatches determines to a large extent the structure of the search space of these multipass approaches. The background of this study is a recently developed computational model of human word recognition, called SpeM. SpeM is able to simulate human word recognition data and is built as a multi-pass speech decoder. Here, we focus on unravelling the structure of the search space that is used in SpeM and similar decoding strategies. Finally, we elaborate on the close relation between distances in this search space, and distance measures in search spaces that are based on a combination of acoustic and phonetic features.
  • Terband, H., Rodd, J., & Maas, E. (2015). Simulations of feedforward and feedback control in apraxia of speech (AOS): Effects of noise masking on vowel production in the DIVA model. In M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahan, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015).

    Abstract

    Apraxia of Speech (AOS) is a motor speech disorder whose precise nature is still poorly understood. A recent behavioural experiment featuring a noise masking paradigm suggests that AOS reflects a disruption of feedforward control, whereas feedback control is spared and plays a more prominent role in achieving and maintaining segmental contrasts [10]. In the present study, we set out to validate the interpretation of AOS as a feedforward impairment by means of a series of computational simulations with the DIVA model [6, 7] mimicking the behavioural experiment. Simulation results showed a larger reduction in vowel spacing and a smaller vowel dispersion in the masking condition compared to the no-masking condition for the simulated feedforward deficit, whereas the other groups showed an opposite pattern. These results mimic the patterns observed in the human data, corroborating the notion that AOS can be conceptualized as a deficit in feedforward control
  • Torreira, F. (2015). Melodic alternations in Spanish. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015) (pp. 946.1-5). Glasgow, UK: The University of Glasgow. Retrieved from http://www.icphs2015.info/pdfs/Papers/ICPHS0946.pdf.

    Abstract

    This article describes how the tonal elements of two common Spanish intonation contours –the falling statement and the low-rising-falling request– align with the segmental string in broad-focus utterances differing in number of prosodic words. Using an imitation-and-completion task, we show that (i) the last stressed syllable of the utterance, traditionally viewed as carrying the ‘nuclear’ accent, associates with either a high or a low tonal element depending on phrase length (ii) that certain tonal elements can be realized or omitted depending on the availability of specific metrical positions in their intonational phrase, and (iii) that the high tonal element of the request contour associates with either a stressed syllable or an intonational phrase edge depending on phrase length. On the basis of these facts, and in contrast to previous descriptions of Spanish intonation relying on obligatory and constant nuclear contours (e.g., L* L% for all neutral statements), we argue for a less constrained intonational morphology involving tonal units linked to the segmental string via contour-specific principles.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2015). ERP indices of situated reference in visual contexts. In D. Noelle, R. Dale, A. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015) (pp. 2422-2427). Austin: Cognitive Science Society.

    Abstract

    Violations of the maxims of Quantity occur when utterances provide more (over-specified) or less (under-specified) information than strictly required for referent identification. While behavioural datasuggest that under-specified expressions lead to comprehension difficulty and communicative failure, there is no consensus as to whether over-specified expressions are also detrimental to comprehension. In this study we shed light on this debate, providing neurophysiological evidence supporting the view that extra information facilitates comprehension. We further present novel evidence that referential failure due to under-specification is qualitatively different from explicit cases of referential failure, when no matching referential candidate is available in the context.
  • Trilsbeek, P., Broeder, D., Elbers, W., & Moreira, A. (2015). A sustainable archiving software solution for The Language Archive. In Proceedings of the 4th International Conference on Language Documentation and Conservation (ICLDC).
  • Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).

    Abstract

    Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization.
  • Verdonschot, R. G., & Tamaoka, K. (Eds.). (2015). The production of speech sounds across languages [Special Issue]. Japanese Psychological Research, 57(1).
  • Verhoef, T., Roberts, S. G., & Dingemanse, M. (2015). Emergence of systematic iconicity: Transmission, interaction and analogy. In D. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015) (pp. 2481-2486). Austin, Tx: Cognitive Science Society.

    Abstract

    Languages combine arbitrary and iconic signals. How do iconic signals emerge and when do they persist? We present an experimental study of the role of iconicity in the emergence of structure in an artificial language. Using an iterated communication game in which we control the signalling medium as well as the meaning space, we study the evolution of communicative signals in transmission chains. This sheds light on how affordances of the communication medium shape and constrain the mappability and transmissibility of form-meaning pairs. We find that iconic signals can form the building blocks for wider compositional patterns
  • Wanrooij, K., De Vos, J., & Boersma, P. (2015). Distributional vowel training may not be effective for Dutch adults. In Scottish consortium for ICPhS 2015, M. Wolters, J. Livingstone, B. Beattie, R. Smith, M. MacMahon, J. Stuart-Smith, & J. Scobbie (Eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

    Abstract

    Distributional vowel training for adults has been reported as “effective” for Spanish and Bulgarian learners of Dutch vowels, in studies using a behavioural task. A recent study did not yield a similar clear learning effect for Dutch learners of the English vowel contrast /æ/~/ε/, as measured with event-related potentials (ERPs). The present study aimed to examine the possibility that the latter result was related to the method. As in the ERP study, we tested whether distributional training improved Dutch adult learners’ perception of English /æ/~/ε/. However, we measured behaviour instead of ERPs, in a design identical to that used in the previous studies with Spanish learners. The results do not support an effect of distributional training and thus “replicate” the ERP study. We conclude that it remains unclear whether distributional vowel training is effective for Dutch adults.
  • Woensdregt, M., Jara-Ettinger, J., & Rubio-Fernandez, P. (2022). Language universals rely on social cognition: Computational models of the use of this and that to redirect the receiver’s attention. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1382-1388). Toronto, Canada: Cognitive Science Society.

    Abstract

    Demonstratives—simple referential devices like this and that—are linguistic universals, but their meaning varies cross-linguistically. In languages like English and Italian, demonstratives are thought to encode the referent’s distance from the producer (e.g., that one means “the one far away from me”),
    while in others, like Portuguese and Spanish, they encode relative distance from both producer and receiver (e.g., aquel means “the one far away from both of us”). Here we propose that demonstratives are also sensitive to the receiver’s focus of attention, hence requiring a deeper form of social cognition
    than previously thought. We provide initial empirical and computational evidence for this idea, suggesting that producers use
    demonstratives to redirect the receiver’s attention towards the intended referent, rather than only to indicate its physical distance.
  • Zhang, Y., & Yu, C. (2022). Examining real-time attention dynamics in parent-infant picture book reading. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1367-1374). Toronto, Canada: Cognitive Science Society.

    Abstract

    Picture book reading is a common word-learning context from which parents repeatedly name objects to their child and it has been found to facilitate early word learning. To learn the correct word-object mappings in a book-reading context, infants need to be able to link what they see with what they hear. However, given multiple objects on every book page, it is not clear how infants direct their attention to objects named by parents. The aim of the current study is to examine how infants mechanistically discover the correct word-object mappings during book reading in real time. We used head-mounted eye-tracking during parent-infant picture book reading and measured the infant's moment-by-moment visual attention to the named referent. We also examined how gesture cues provided by both the child and the parent may influence infants' attention to the named target. We found that although parents provided many object labels during book reading, infants were not able to attend to the named objects easily. However, their abilities to follow and use gestures to direct the other social partner’s attention increase the chance of looking at the named target during parent naming.
  • Zhang, Y., Yurovsky, D., & Yu, C. (2015). Statistical word learning is a continuous process: Evidence from the human simulation paradigm. In D. Noelle, R. Dale, A. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015) (pp. 2422-2427). Austin: Cognitive Science Society.

    Abstract

    In the word-learning domain, both adults and young children are able to find the correct referent of a word from highly ambiguous contexts that involve many words and objects by computing distributional statistics across the co-occurrences of words and referents at multiple naming moments (Yu & Smith, 2007; Smith & Yu, 2008). However, there is still debate regarding how learners accumulate distributional information to learn object labels in natural learning environments, and what underlying learning mechanism learners are most likely to adopt. Using the Human Simulation Paradigm (Gillette, Gleitman, Gleitman & Lederer, 1999), we found that participants’ learning performance gradually improved and that their ability to remember and carry over partial knowledge from past learning instances facilitated subsequent learning. These results support the statistical learning model that word learning is a continuous process.

Share this page