Publications

Displaying 101 - 156 of 156
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • Matteo, M., & Bosker, H. R. (2024). How to test gesture-speech integration in ten minutes. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 737-741). doi:10.21437/SpeechProsody.2024-149.

    Abstract

    Human conversations are inherently multimodal, including auditory speech, visual articulatory cues, and hand gestures. Recent studies demonstrated that the timing of a simple up-and-down hand movement, known as a beat gesture, can affect speech perception. A beat gesture falling on the first syllable of a disyllabic word induces a bias to perceive a strong-weak stress pattern (i.e., “CONtent”), while a beat gesture falling on the second syllable combined with the same acoustics biases towards a weak-strong stress pattern (“conTENT”). This effect, termed the “manual McGurk effect”, has been studied in both in-lab and online studies, employing standard experimental sessions lasting approximately forty minutes. The present work tests whether the manual McGurk effect can be observed in an online short version (“mini-test”) of the original paradigm, lasting only ten minutes. Additionally, we employ two different response modalities, namely a two-alternative forced choice and a visual analog scale. A significant manual McGurk effect was observed with both response modalities. Overall, the present study demonstrates the feasibility of employing a ten-minute manual McGurk mini-test to obtain a measure of gesture-speech integration. As such, it may lend itself for inclusion in large-scale test batteries that aim to quantify individual variation in language processing.
  • McDonough, J., Lehnert-LeHouillier, H., & Bardhan, N. P. (2009). The perception of nasalized vowels in American English: An investigation of on-line use of vowel nasalization in lexical access. In Nasal 2009.

    Abstract

    The goal of the presented study was to investigate the use of coarticulatory vowel nasalization in lexical access by native speakers of American English. In particular, we compare the use of coart culatory place of articulation cues to that of coarticulatory vowel nasalization. Previous research on lexical access has shown that listeners use cues to the place of articulation of a postvocalic stop in the preceding vowel. However, vowel nasalization as cue to an upcoming nasal consonant has been argued to be a more complex phenomenon. In order to establish whether coarticulatory vowel nasalization aides in the process of lexical access in the same way as place of articulation cues do, we conducted two perception experiments: an off-line 2AFC discrimination task and an on-line eyetracking study using the visual world paradigm. The results of our study suggest that listeners are indeed able to use vowel nasalization in similar ways to place of articulation information, and that both types of cues aide in lexical access.
  • Mishra, C., Nandanwar, A., & Mishra, S. (2024). HRI in Indian education: Challenges opportunities. In H. Admoni, D. Szafir, W. Johal, & A. Sandygulova (Eds.), Designing an introductory HRI course (workshop at HRI 2024). ArXiv. doi:10.48550/arXiv.2403.12223.

    Abstract

    With the recent advancements in the field of robotics and the increased focus on having general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into Human-robot interaction (HRI). While there have been a lot of works discussing frameworks for teaching HRI in educational institutions with a few institutions already offering courses to students, a consensus on the course content still eludes the field. In this work, we highlight a few challenges and opportunities while designing an HRI course from an Indian perspective. These topics warrant further deliberations as they have a direct impact on the design of HRI courses and wider implications for the entire field.
  • Motiekaitytė, K., Grosseck, O., Wolf, L., Bosker, H. R., Peeters, D., Perlman, M., Ortega, G., & Raviv, L. (2024). Iconicity and compositionality in emerging vocal communication systems: a Virtual Reality approach. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 387-389). Nijmegen: The Evolution of Language Conferences.
  • Musgrave, S., & Cutfield, S. (2009). Language documentation and an Australian National Corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages (pp. 10-18). Somerville: Cascadilla Proceedings Project.

    Abstract

    Corpus linguistics and language documentation are usually considered separate subdisciplines within linguistics, having developed from different traditions and often operating on different scales, but the authors will suggest that there are commonalities to the two: both aim to represent language use in a community, and both are concerned with managing digital data. The authors propose that the development of the Australian National Corpus (AusNC) be guided by the experience of language documentation in the management of multimodal digital data and its annotation, and in ethical issues pertaining to making the data accessible. This would allow an AusNC that is distributed, multimodal, and multilingual, with holdings of text, audio, and video data distributed across multiple institutions; and including Indigenous, sign, and migrant community languages. An audit of language material held by Australian institutions and individuals is necessary to gauge the diversity and volume of possible content, and to inform common technical standards.
  • Nijland, L., & Janse, E. (Eds.). (2009). Auditory processing in speakers with acquired or developmental language disorders [Special Issue]. Clinical Linguistics and Phonetics, 23(3).
  • Otake, T., Davis, S. M., & Cutler, A. (1995). Listeners’ representations of within-word structure: A cross-linguistic and cross-dialectal investigation. In J. Pardo (Ed.), Proceedings of EUROSPEECH 95: Vol. 3 (pp. 1703-1706). Madrid: European Speech Communication Association.

    Abstract

    Japanese, British English and American English listeners were presented with spoken words in their native language, and asked to mark on a written transcript of each word the first natural division point in the word. The results showed clear and strong patterns of consensus, indicating that listeners have available to them conscious representations of within-word structure. Orthography did not play a strongly deciding role in the results. The patterns of response were at variance with results from on-line studies of speech segmentation, suggesting that the present task taps not those representations used in on-line listening, but levels of representation which may involve much richer knowledge of word-internal structure.
  • Ozyurek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn, & S. C. Stoness (Eds.), Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.
  • Pacheco, A., Araújo, S., Faísca, L., Petersson, K. M., & Reis, A. (2009). Profiling dislexic children: Phonology and visual naming skills. In Abstracts presented at the International Neuropsychological Society, Finnish Neuropsychological Society, Joint Mid-Year Meeting July 29-August 1, 2009. Helsinki, Finland & Tallinn, Estonia (pp. 40). Retrieved from http://www.neuropsykologia.fi/ins2009/INS_MY09_Abstract.pdf.
  • Peirolo, M., Meyer, A. S., & Frances, C. (2024). Investigating the causes of prosodic marking in self-repairs: An automatic process? In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 1080-1084). doi:10.21437/SpeechProsody.2024-218.

    Abstract

    Natural speech involves repair. These repairs are often highlighted through prosodic marking (Levelt & Cutler, 1983). Prosodic marking usually entails an increase in pitch, loudness, and/or duration that draws attention to the corrected word. While it is established that natural self-repairs typically elicit prosodic marking, the exact cause of this is unclear. This study investigates whether producing a prosodic marking emerges from an automatic correction process or has a communicative purpose. In the current study, we elicit corrections to test whether all self-corrections elicit prosodic marking. Participants carried out a picture-naming task in which they described two images presented on-screen. To prompt self-correction, the second image was altered in some cases, requiring participants to abandon their initial utterance and correct their description to match the new image. This manipulation was compared to a control condition in which only the orientation of the object would change, eliciting no self-correction while still presenting a visual change. We found that the replacement of the item did not elicit a prosodic marking, regardless of the type of change. Theoretical implications and research directions are discussed, in particular theories of prosodic planning.
  • de Reus, K., Benítez-Burraco, A., Hersh, T. A., Groot, N., Lambert, M. L., Slocombe, K. E., Vernes, S. C., & Raviv, L. (2024). Self-domestication traits in vocal learning mammals. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 105-108). Nijmegen: The Evolution of Language Conferences.
  • Ringersma, J., Zinn, C., & Kemps-Snijders, M. (2009). LEXUS & ViCoS From lexical to conceptual spaces. In 1st International Conference on Language Documentation and Conservation (ICLDC).

    Abstract

    LEXUS and ViCoS: from lexicon to conceptual spaces LEXUS is a web-based lexicon tool and the knowledge space software ViCoS is an extension of LEXUS, allowing users to create relations between objects in and across lexica. LEXUS and ViCoS are part of the Language Archiving Technology software, developed at the MPI for Psycholinguistics to archive and enrich linguistic resources collected in the framework of language documentation projects. LEXUS is of primary interest for language documentation, offering the possibility to not just create a digital dictionary, but additionally it allows the creation of multi-media encyclopedic lexica. ViCoS provides an interface between the lexical space and the ontological space. Its approach permits users to model a world of concepts and their interrelations based on categorization patterns made by the speech community. We describe the LEXUS and ViCoS functionalities using three cases from DoBeS language documentation projects: (1) Marquesan The Marquesan lexicon was initially created in Toolbox and imported into LEXUS using the Toolbox import functionality. The lexicon is enriched with multi-media to illustrate the meaning of the words in its cultural environment. Members of the speech community consider words as keys to access and describe relevant parts of their life and traditions. Their understanding of words is best described by the various associations they evoke rather than in terms of any formal theory of meaning. Using ViCoS a knowledge space of related concepts is being created. (2) Kola-Sámi Two lexica are being created in LEXUS: RuSaDic lexicon is a Russian-Kildin wordlist in which the entries are of relative limited structure and content. SaRuDiC is a more complex structured lexicon with much richer content, including multi-media fragments and derivations. Using ViCoS we have created a connection between the two lexica, so that speakers who are familiair with Russian and wish to revitalize their Kildin can enter the lexicon through the RuSaDic and from there approach the informative SaRuDic. Similary we will create relations from the two lexica to external open databases, like e.g. Álgu. (3) Beaver A speaker database including kinship relations has been created and the database has been imported into LEXUS. In the LEXUS views the relations for individual speakers are being displayed. Using ViCoS the relational information from the database will be extracted to form a kisnhip relation space with specific relation types, like e.g 'mother-of'. The whole set of relations from the database can be displayed in one ViCoS relation window, and zoom functionality is available.
  • Rohrer, P. L., Bujok, R., Van Maastricht, L., & Bosker, H. R. (2024). The timing of beat gestures affects lexical stress perception in Spanish. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings Speech Prosody 2024 (pp. 702-706). doi:10.21437/SpeechProsody.2024-142.

    Abstract

    It has been shown that when speakers produce hand gestures, addressees are attentive towards these gestures, using them to facilitate speech processing. Even relatively simple “beat” gestures are taken into account to help process aspects of speech such as prosodic prominence. In fact, recent evidence suggests that the timing of a beat gesture can influence spoken word recognition. Termed the manual McGurk Effect, Dutch participants, when presented with lexical stress minimal pair continua in Dutch, were biased to hear lexical stress on the syllable that coincided with a beat gesture. However, little is known about how this manual McGurk effect would surface in languages other than Dutch, with different acoustic cues to prominence, and variable gestures. Therefore, this study tests the effect in Spanish where lexical stress is arguably even more important, being a contrastive cue in the regular verb conjugation system. Results from 24 participants corroborate the effect in Spanish, namely that when given the same auditory stimulus, participants were biased to perceive lexical stress on the syllable that visually co-occurred with a beat gesture. These findings extend the manual McGurk effect to a different language, emphasizing the impact of gestures' timing on prosody perception and spoken word recognition.
  • Rohrer, P. L., Hong, Y., & Bosker, H. R. (2024). Gestures time to vowel onset and change the acoustics of the word in Mandarin. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 866-870). doi:10.21437/SpeechProsody.2024-175.

    Abstract

    Recent research on multimodal language production has revealed that prominence in speech and gesture go hand-in-hand. Specifically, peaks in gesture (i.e., the apex) seem to closely coordinate with peaks in fundamental frequency (F0). The nature of this relationship may also be bi-directional, as it has also been shown that the production of gesture directly affects speech acoustics. However, most studies on the topic have largely focused on stress-based languages, where fundamental frequency has a prominence-lending function. Less work has been carried out on lexical tone languages such as Mandarin, where F0 is lexically distinctive. In this study, four native Mandarin speakers were asked to produce single monosyllabic CV words, taken from minimal lexical tone triplets (e.g., /pi1/, /pi2/, /pi3/), either with or without a beat gesture. Our analyses of the timing of the gestures showed that the gesture apex most stably occurred near vowel onset, with consonantal duration being the strongest predictor of apex placement. Acoustic analyses revealed that words produced with gesture showed raised F0 contours, greater intensity, and shorter durations. These findings further our understanding of gesture-speech alignment in typologically diverse languages, and add to the discussion about multimodal prominence.
  • Ronderos, C. R., Zhang, Y., & Rubio-Fernandez, P. (2024). Weighted parameters in demonstrative use: The case of Spanish teens and adults. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 3279-3286).
  • Rubio-Fernandez, P., Long, M., Shukla, V., Bhatia, V., Mahapatra, A., Ralekar, C., Ben-Ami, S., & Sinha, P. (2024). Multimodal communication in newly sighted children: An investigation of the relation between visual experience and pragmatic development. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2560-2567).

    Abstract

    We investigated the relationship between visual experience and pragmatic development by testing the socio-communicative skills of a unique population: the Prakash children of India, who received treatment for congenital cataracts after years of visual deprivation. Using two different referential communication tasks, our study investigated Prakash' children ability to produce sufficiently informative referential expressions (e.g., ‘the green pear' or ‘the small plate') and pay attention to their interlocutor's face during the task (Experiment 1), as well as their ability to recognize a speaker's referential intent through non-verbal cues such as head turning and pointing (Experiment 2). Our results show that Prakash children have strong pragmatic skills, but do not look at their interlocutor's face as often as neurotypical children do. However, longitudinal analyses revealed an increase in face fixations, suggesting that over time, Prakash children come to utilize their improved visual skills for efficient referential communication.

    Additional information

    link to eScholarship
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces.(LREC2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • Sander, J., Çetinçelik, M., Zhang, Y., Rowland, C. F., & Harmon, Z. (2024). Why does joint attention predict vocabulary acquisition? The answer depends on what coding scheme you use. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1607-1613).

    Abstract

    Despite decades of study, we still know less than we would like about the association between joint attention (JA) and language acquisition. This is partly because of disagreements on how to operationalise JA. In this study, we examine the impact of applying two different, influential JA operationalisation schemes to the same dataset of child-caregiver interactions, to determine which yields a better fit to children's later vocabulary size. Two coding schemes— one defining JA in terms of gaze overlap and one in terms of social aspects of shared attention—were applied to video-recordings of dyadic naturalistic toy-play interactions (N=45). We found that JA was predictive of later production vocabulary when operationalised as shared focus (study 1), but also that its operationalisation as shared social awareness increased its predictive power (study 2). Our results emphasise the critical role of methodological choices in understanding how and why JA is associated with vocabulary size.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Sauter, D., Eisner, F., Ekman, P., & Scott, S. K. (2009). Universal vocal signals of emotion. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the 31st Annual Meeting of the Cognitive Science Society (CogSci 2009) (pp. 2251-2255). Cognitive Science Society.

    Abstract

    Emotional signals allow for the sharing of important information with conspecifics, for example to warn them of danger. Humans use a range of different cues to communicate to others how they feel, including facial, vocal, and gestural signals. Although much is known about facial expressions of emotion, less research has focused on affect in the voice. We compare British listeners to individuals from remote Namibian villages who have had no exposure to Western culture, and examine recognition of non-verbal emotional vocalizations, such as screams and laughs. We show that a number of emotions can be universally recognized from non-verbal vocal signals. In addition we demonstrate the specificity of this pattern, with a set of additional emotions only recognized within, but not across these cultural groups. Our findings indicate that a small set of primarily negative emotions have evolved signals across several modalities, while most positive emotions are communicated with culture-specific signals.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O., & Okolowski, S. (2009). Lexical embedding in spoken Dutch. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1879-1882). ISCA Archive.

    Abstract

    A stretch of speech is often consistent with multiple words, e.g., the sequence /hæm/ is consistent with ‘ham’ but also with the first syllable of ‘hamster’, resulting in temporary ambiguity. However, to what degree does this lexical embedding occur? Analyses on two corpora of spoken Dutch showed that 11.9%-19.5% of polysyllabic word tokens have word-initial embedding, while 4.1%-7.5% of monosyllabic word tokens can appear word-initially embedded. This is much lower than suggested by an analysis of a large dictionary of Dutch. Speech processing thus appears to be simpler than one might expect on the basis of statistics on a dictionary.
  • Scharenborg, O. (2009). Using durational cues in a computational model of spoken-word recognition. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1675-1678). ISCA Archive.

    Abstract

    Evidence that listeners use durational cues to help resolve temporarily ambiguous speech input has accumulated over the past few years. In this paper, we investigate whether durational cues are also beneficial for word recognition in a computational model of spoken-word recognition. Two sets of simulations were carried out using the acoustic signal as input. The simulations showed that the computational model, like humans, takes benefit from durational cues during word recognition, and uses these to disambiguate the speech signal. These results thus provide support for the theory that durational cues play a role in spoken-word recognition.
  • Schuppler, B., Van Dommelen, W., Koreman, J., & Ernestus, M. (2009). Word-final [t]-deletion: An analysis on the segmental and sub-segmental level. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 2275-2278). Causal Productions Pty Ltd.

    Abstract

    This paper presents a study on the reduction of word-final [t]s in conversational standard Dutch. Based on a large amount of tokens annotated on the segmental level, we show that the bigram frequency and the segmental context are the main predictors for the absence of [t]s. In a second study, we present an analysis of the detailed acoustic properties of word-final [t]s and we show that bigram frequency and context also play a role on the subsegmental level. This paper extends research on the realization of /t/ in spontaneous speech and shows the importance of incorporating sub-segmental properties in models of speech.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Seuren, P. A. M. (2009). Logical systems and natural logical intuitions. In Current issues in unity and diversity of languages: Collection of the papers selected from the CIL 18, held at Korea University in Seoul on July 21-26, 2008. http://www.cil18.org (pp. 53-60).

    Abstract

    The present paper is part of a large research programme investigating the nature and properties of the predicate logic inherent in natural language. The general hypothesis is that natural speakers start off with a basic-natural logic, based on natural cognitive functions, including the basic-natural way of dealing with plural objects. As culture spreads, functional pressure leads to greater generalization and mathematical correctness, yielding ever more refined systems until the apogee of standard modern predicate logic. Four systems of predicate calculus are considered: Basic-Natural Predicate Calculus (BNPC), Aritsotelian-Abelardian Predicate Calculus (AAPC), Aritsotelian-Boethian Predicate Calculus (ABPC), also known as the classic Square of Opposition, and Standard Modern Predicate Calculus (SMPC). (ABPC is logically faulty owing to its Undue Existential Import (UEI), but that fault is repaired by the addition of a presuppositional component to the logic.) All four systems are checked against seven natural logical intuitions. It appears that BNPC scores best (five out of seven), followed by ABPC (three out of seven). AAPC and SMPC finish ex aequo with two out of seven.
  • Shattuck-Hufnagel, S., & Cutler, A. (1999). The prosody of speech error corrections revisited. In J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. Bailey (Eds.), Proceedings of the Fourteenth International Congress of Phonetic Sciences: Vol. 2 (pp. 1483-1486). Berkely: University of California.

    Abstract

    A corpus of digitized speech errors is used to compare the prosody of correction patterns for word-level vs. sound-level errors. Results for both peak F0 and perceived prosodic markedness confirm that speakers are more likely to mark corrections of word-level errors than corrections of sound-level errors, and that errors ambiguous between word-level and soundlevel (such as boat for moat) show correction patterns like those for sound level errors. This finding increases the plausibility of the claim that word-sound-ambiguous errors arise at the same level of processing as sound errors that do not form words.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Silverstein, P., Bergmann, C., & Syed, M. (Eds.). (2024). Open science and metascience in developmental psychology [Special Issue]. Infant and Child Development, 33(1).
  • Stehouwer, H., & van Zaanen, M. (2009). Language models for contextual error detection and correction. In Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference (pp. 41-48). Association for Computational Linguistics.

    Abstract

    The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusible correction. Instead of using specific classifiers, we use one generic classifier based on a language model. This measures the likelihood of sentences with different possible solutions of a confusible in place. The advantage of this approach is that all confusible sets are handled by a single model. Preliminary results show that the performance of the generic classifier approach is only slightly worse that that of the specific classifier approach
  • Stehouwer, H., & Van Zaanen, M. (2009). Token merging in language model-based confusible disambiguation. In T. Calders, K. Tuyls, & M. Pechenizkiy (Eds.), Proceedings of the 21st Benelux Conference on Artificial Intelligence (pp. 241-248).

    Abstract

    In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when alternatives consist of a different number of tokens, this classification technique cannot be applied directly, because the computation of the probabilities is skewed. Previous work already showed that probabilities based on different order n-grams should not be compared directly. In this article, we propose new probability metrics in which the size of the n is varied according to the number of tokens of the confusible alternative. This requires access to n-grams of variable length. Results show that the synchronous back-off method is extremely robust. We discuss the use of suffix trees as a technique to store variable length n-gram information efficiently.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Turn-taking in social talk dialogues: Temporal, formal and functional aspects. In 9th International Conference Speech and Computer (SPECOM'2004) (pp. 454-461).

    Abstract

    This paper presents a quantitative analysis of the
    turn-taking mechanism evidenced in 93 telephone
    dialogues that were taken from the 9-million-word
    Spoken Dutch Corpus. While the first part of the paper
    focuses on the temporal phenomena of turn taking, such
    as durations of pauses and overlaps of turns in the
    dialogues, the second part explores the discoursefunctional
    aspects of utterances in a subset of 8
    dialogues that were annotated especially for this
    purpose. The results show that speakers adapt their turntaking
    behaviour to the interlocutor’s behaviour.
    Furthermore, the results indicate that male-male dialogs
    show a higher proportion of overlapping turns than
    female-female dialogues.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Durational aspects of turn-taking in spontaneous face-to-face and telephone dialogues. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: Proceedings of the 7th International Conference TSD 2004 (pp. 563-570). Heidelberg: Springer.

    Abstract

    On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and faceto-face dialogues have different statistical properties. Pauses in a face-to-face
    dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face
    dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
  • Torreira, F., & Ernestus, M. (2009). Probabilistic effects on French [t] duration. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 448-451). Causal Productions Pty Ltd.

    Abstract

    The present study shows that [t] consonants are affected by probabilistic factors in a syllable-timed language as French, and in spontaneous as well as in journalistic speech. Study 1 showed a word bigram frequency effect in spontaneous French, but its exact nature depended on the corpus on which the probabilistic measures were based. Study 2 investigated journalistic speech and showed an effect of the joint frequency of the test word and its following word. We discuss the possibility that these probabilistic effects are due to the speaker’s planning of upcoming words, and to the speaker’s adaptation to the listener’s needs.
  • Uddén, J., Araújo, S., Forkstam, C., Ingvar, M., Hagoort, P., & Petersson, K. M. (2009). A matter of time: Implicit acquisition of recursive sequence structures. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society (pp. 2444-2449).

    Abstract

    A dominant hypothesis in empirical research on the evolution of language is the following: the fundamental difference between animal and human communication systems is captured by the distinction between regular and more complex non-regular grammars. Studies reporting successful artificial grammar learning of nested recursive structures and imaging studies of the same have methodological shortcomings since they typically allow explicit problem solving strategies and this has been shown to account for the learning effect in subsequent behavioral studies. The present study overcomes these shortcomings by using subtle violations of agreement structure in a preference classification task. In contrast to the studies conducted so far, we use an implicit learning paradigm, allowing the time needed for both abstraction processes and consolidation to take place. Our results demonstrate robust implicit learning of recursively embedded structures (context-free grammar) and recursive structures with cross-dependencies (context-sensitive grammar) in an artificial grammar learning task spanning 9 days. Keywords: Implicit artificial grammar learning; centre embedded; cross-dependency; implicit learning; context-sensitive grammar; context-free grammar; regular grammar; non-regular grammar
  • Uluşahin, O., Bosker, H. R., McQueen, J. M., & Meyer, A. S. (2024). Knowledge of a talker’s f0 affects subsequent perception of voiceless fricatives. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 432-436).

    Abstract

    The human brain deals with the infinite variability of speech through multiple mechanisms. Some of them rely solely on information in the speech input (i.e., signal-driven) whereas some rely on linguistic or real-world knowledge (i.e., knowledge-driven). Many signal-driven perceptual processes rely on the enhancement of acoustic differences between incoming speech sounds, producing contrastive adjustments. For instance, when an ambiguous voiceless fricative is preceded by a high fundamental frequency (f0) sentence, the fricative is perceived as having lower a spectral center of gravity (CoG). However, it is not clear whether knowledge of a talker’s typical f0 can lead to similar contrastive effects. This study investigated a possible talker f0 effect on fricative CoG perception. In the exposure phase, two groups of participants (N=16 each) heard the same talker at high or low f0 for 20 minutes. Later, in the test phase, participants rated fixed-f0 /?ɔk/ tokens as being /sɔk/ (i.e., high CoG) or /ʃɔk/ (i.e., low CoG), where /?/ represents a fricative from a 5-step /s/-/ʃ/ continuum. Surprisingly, the data revealed the opposite of our contrastive hypothesis, whereby hearing high f0 instead biased perception towards high CoG. Thus, we demonstrated that talker f0 information affects fricative CoG perception.
  • Vainio, M., Suni, A., Raitio, T., Nurminen, J., Järvikivi, J., & Alku, P. (2009). New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 1703-1706).

    Abstract

    This paper describes a new flexible delexicalization method based on glottal excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibility to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
  • Van Berkum, J. J. A. (2009). Does the N400 directly reflect compositional sense-making? Psychophysiology, Special Issue: Society for Psychophysiological Research Abstracts for the Forty-Ninth Annual Meeting, 46(Suppl. 1), s2.

    Abstract

    A not uncommon assumption in psycholinguistics is that the N400 directly indexes high-level semantic integration, the compositional, word-driven construction of sentence- and discourse-level meaning in some language-relevant unification space. The various discourse- and speaker-dependent modulations of the N400 uncovered by us and others are often taken to support this 'compositional integration' position. In my talk, I will argue that these N400 modulations are probably better interpreted as only indirectly reflecting compositional sense-making. The account that I will advance for these N400 effects is a variant of the classic Kutas and Federmeier (2002, TICS) memory retrieval account in which context effects on the word-elicited N400 are taken to reflect contextual priming of LTM access. It differs from the latter in making more explicit that the contextual cues that prime access to a word's meaning in LTM can range from very simple (e.g., a single concept) to very complex ones (e.g., a structured representation of the current discourse). Furthermore, it incorporates the possibility, suggested by recent N400 findings, that semantic retrieval can also be intensified in response to certain ‘relevance signals’, such as strong value-relevance, or a marked delivery (linguistic focus, uncommon choice of words, etc). In all, the perspective I'll draw is that in the context of discourse-level language processing, N400 effects reflect an 'overlay of technologies', with the construction of discourse-level representations riding on top of more ancient sense-making technology.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.
  • Van Geenhoven, V. (1999). A before-&-after picture of when-, before-, and after-clauses. In T. Matthews, & D. Strolovitch (Eds.), Proceedings of the 9th Semantics and Linguistic Theory Conference (pp. 283-315). Ithaca, NY, USA: Cornell University.
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2009). Semantic context effects in the recognition of acoustically unreduced and reduced words. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (pp. 1867-1870). Causal Productions Pty Ltd.

    Abstract

    Listeners require context to understand the casual pronunciation variants of words that are typical of spontaneous speech (Ernestus et al., 2002). The present study reports two auditory lexical decision experiments, investigating listeners' use of semantic contextual information in the comprehension of unreduced and reduced words. We found a strong semantic priming effect for low frequency unreduced words, whereas there was no such effect for reduced words. Word frequency was facilitatory for all words. These results show that semantic context is relevant especially for the comprehension of unreduced words, which is unexpected given the listener driven explanation of reduction in spontaneous speech.
  • Van Valin Jr., R. D. (1987). Pragmatics, island phenomena, and linguistic competence. In A. M. Farley, P. T. Farley, & K.-E. McCullough (Eds.), CLS 22. Papers from the parasession on pragmatics and grammatical theory (pp. 223-233). Chicago Linguistic Society.
  • van der Burght, C. L., & Meyer, A. S. (2024). Interindividual variation in weighting prosodic and semantic cues during sentence comprehension – a partial replication of Van der Burght et al. (2021). In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 792-796). doi:10.21437/SpeechProsody.2024-160.

    Abstract

    Contrastive pitch accents can mark sentence elements occupying parallel roles. In “Mary kissed John, not Peter”, a pitch accent on Mary or John cues the implied syntactic role of Peter. Van der Burght, Friederici, Goucha, and Hartwigsen (2021) showed that listeners can build expectations concerning syntactic and semantic properties of upcoming words, derived from pitch accent information they heard previously. To further explore these expectations, we attempted a partial replication of the original German study in Dutch. In the experimental sentences “Yesterday, the police officer arrested the thief, not the inspector/murderer”, a pitch accent on subject or object cued the subject/object role of the ellipsis clause. Contrasting elements were additionally cued by the thematic role typicality of the nouns. Participants listened to sentences in which the ellipsis clause was omitted and selected the most plausible sentence-final noun (presented visually) via button press. Replicating the original study results, listeners based their sentence-final preference on the pitch accent information available in the sentence. However, as in the original study, individual differences between listeners were found, with some following prosodic information and others relying on a structural bias. The results complement the literature on ellipsis resolution and on interindividual variability in cue weighting.
  • Walsh Dickey, L. (1999). Syllable count and Tzeltal segmental allomorphy. In J. Rennison, & K. Kühnhammer (Eds.), Phonologica 1996. Proceedings of the 8th International Phonology Meeting (pp. 323-334). Holland Academic Graphics.

    Abstract

    Tzeltal, a Mayan language spoken in southern Mexico, exhibits allo-morphy of an unusual type. The vowel quality of the perfective suffix is determined by the number of syllables in the stem to which it is attaching. This paper presents previously unpublished data of this allomorphy and demonstrates that a syllable-count analysis of the phenomenon is the proper one. This finding is put in a more general context of segment-prosody interaction in allomorphy.
  • Weber, A., & Paris, G. (2004). The origin of the linguistic gender effect in spoken-word recognition: Evidence from non-native listening. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Erlbaum.

    Abstract

    Two eye-tracking experiments examined linguistic gender effects in non-native spoken-word recognition. French participants, who knew German well, followed spoken instructions in German to click on pictures on a computer screen (e.g., Wo befindet sich die Perle, “where is the pearl”) while their eye movements were monitored. The name of the target picture was preceded by a gender-marked article in the instructions. When a target and a competitor picture (with phonologically similar names) were of the same gender in both German and French, French participants fixated competitor pictures more than unrelated pictures. However, when target and competitor were of the same gender in German but of different gender in French, early fixations to the competitor picture were reduced. Competitor activation in the non-native language was seemingly constrained by native gender information. German listeners showed no such viewing time difference. The results speak against a form-based account of the linguistic gender effect. They rather support the notion that the effect originates from the grammatical level of language processing.
  • Weber, A. (2009). The role of linguistic experience in lexical recognition [Abstract]. Journal of the Acoustical Society of America, 125, 2759.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Weber, A., & Mueller, K. (2004). Word order variation in German main clauses: A corpus analysis. In Proceedings of the 20th International Conference on Computational Linguistics.

    Abstract

    In this paper, we present empirical data from a corpus study on the linear order of subjects and objects in German main clauses. The aim was to establish the validity of three well-known ordering constraints: given complements tend to occur before new complements, definite before indefinite, and pronoun before full noun phrase complements. Frequencies of occurrences were derived for subject-first and object-first sentences from the German Negra corpus. While all three constraints held on subject-first sentences, results for object-first sentences varied. Our findings suggest an influence of grammatical functions on the ordering of verb complements.
  • Wittenburg, P. (2004). The IMDI metadata concept. In S. F. Ferreira (Ed.), Workingmaterial on Building the LR&E Roadmap: Joint COCOSDA and ICCWLRE Meeting, (LREC2004). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Brugman, H., Broeder, D., & Russel, A. (2004). XML-based language archiving. In Workshop Proceedings on XML-based Richly Annotaded Corpora (LREC2004) (pp. 63-69). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Gulrajani, G., Broeder, D., & Uneson, M. (2004). Cross-disciplinary integration of metadata descriptions. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 113-116). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Johnson, H., Buchhorn, M., Brugman, H., & Broeder, D. (2004). Architecture for distributed language resource management and archiving. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 361-364). Paris: ELRA - European Language Resources Association.
  • Xiao, M., Kong, X., Liu, J., & Ning, J. (2009). TMBF: Bloom filter algorithms of time-dependent multi bit-strings for incremental set. In Proceedings of the 2009 International Conference on Ultra Modern Telecommunications & Workshops.

    Abstract

    Set is widely used as a kind of basic data structure. However, when it is used for large scale data set the cost of storage, search and transport is overhead. The bloom filter uses a fixed size bit string to represent elements in a static set, which can reduce storage space and search cost that is a fixed constant. The time-space efficiency is achieved at the cost of a small probability of false positive in membership query. However, for many applications the space savings and locating time constantly outweigh this drawback. Dynamic bloom filter (DBF) can support concisely representation and approximate membership queries of dynamic set instead of static set. It has been proved that DBF not only possess the advantage of standard bloom filter, but also has better features when dealing with dynamic set. This paper proposes a time-dependent multiple bit-strings bloom filter (TMBF) which roots in the DBF and targets on dynamic incremental set. TMBF uses multiple bit-strings in time order to present a dynamic increasing set and uses backward searching to test whether an element is in a set. Based on the system logs from a real P2P file sharing system, the evaluation shows a 20% reduction in searching cost compared to DBF.
  • Yang, J., Zhang, Y., & Yu, C. (2024). Learning semantic knowledge based on infant real-time. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 741-747).

    Abstract

    Early word learning involves mapping individual words to their meanings and building organized semantic representations among words. Previous corpus-based studies (e.g., using text from websites, newspapers, child-directed speech corpora) demonstrated that linguistic information such as word co-occurrence alone is sufficient to build semantically organized word knowledge. The present study explored two new research directions to advance understanding of how infants acquire semantically organized word knowledge. First, infants in the real world hear words surrounded by contextual information. Going beyond inferring semantic knowledge merely from language input, we examined the role of extra-linguistic contextual information in learning semantic knowledge. Second, previous research relies on large amounts of linguistic data to demonstrate in-principle learning, which is unrealistic compared with the input children receive. Here, we showed that incorporating extra-linguistic information provides an efficient mechanism through which semantic knowledge can be acquired with a small amount of data infants perceive in everyday learning contexts, such as toy play.

    Additional information

    link to eScholarship
  • Zhou, Y., van der Burght, C. L., & Meyer, A. S. (2024). Investigating the role of semantics and perceptual salience in the memory benefit of prosodic prominence. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 1250-1254). doi:10.21437/SpeechProsody.2024-252.

    Abstract

    Prosodic prominence can enhance memory for the prominent words. This mnemonic benefit has been linked to listeners’ allocation of attention and deeper processing, which leads to more robust semantic representations. We investigated whether, in addition to the well-established effect at the semantic level, there was a memory benefit for prominent words at the phonological level. To do so, participants (48 native speakers of Dutch), first performed an accent judgement task, where they had to discriminate accented from unaccented words, and accented from unaccented pseudowords. All stimuli were presented in lists. They then performed an old/new recognition task for the stimuli. Accuracy in the accent judgement task was equally high for words and pseudowords. In the recognition task, performance was, as expected, better for words than pseudowords. More importantly, there was an interaction of accent with word type, with a significant advantage for accented compared to unaccented words, but not for pseudowords. The results confirm the memory benefit for accented compared to unaccented words seen in earlier studies, and they are consistent with the view that prominence primarily affects the semantic encoding of words. There was no evidence for an additional memory benefit arising at the phonological level.
  • Zora, H., Bowin, H., Heldner, M., Riad, T., & Hagoort, P. (2024). The role of pitch accent in discourse comprehension and the markedness of Accent 2 in Central Swedish. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 921-925). doi:10.21437/SpeechProsody.2024-186.

    Abstract

    In Swedish, words are associated with either of two pitch contours known as Accent 1 and Accent 2. Using a psychometric test, we investigated how listeners judge pitch accent violations while interpreting discourse. Forty native speakers of Central Swedish were presented with auditory dialogues, where test words were appropriately or inappropriately accented in a given context, and asked to judge the correctness of sentences containing the test words. Data indicated a statistically significant effect of wrong accent pattern on the correctness judgment. Both Accent 1 and Accent 2 violations interfered with the coherent interpretation of discourse and were judged as incorrect by the listeners. Moreover, there was a statistically significant difference in the perceived correctness between the accent patterns. Accent 2 violations led to a lower correctness score compared to Accent 1 violations, indicating that the listeners were more sensitive to pitch accent violations in Accent 2 words than in Accent 1 words. This result is in line with the notion that Accent 2 is marked and lexically represented in Central Swedish. Taken together, these findings indicate that listeners use both Accent 1 and Accent 2 to arrive at the correct interpretation of the linguistic input, while assigning varying degrees of relevance to them depending on their markedness.

Share this page