Publications

Displaying 101 - 148 of 148
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings.Importantly, in both experiments we show that he grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Meyer, A. S., & Wheeldon, L. (Eds.). (2006). Language production across the life span [Special Issue]. Language and Cognitive Processes, 21(1-3).
  • Mishra, C., & Skantze, G. (2022). Knowing where to look: A planning-based architecture to automate the gaze behavior of social robots. In Proceedings of the 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1201-1208). doi:10.1109/RO-MAN53752.2022.9900740.

    Abstract

    Gaze cues play an important role in human communication and are used to coordinate turn-taking and joint attention, as well as to regulate intimacy. In order to have fluent conversations with people, social robots need to exhibit humanlike gaze behavior. Previous Gaze Control Systems (GCS) in HRI have automated robot gaze using data-driven or heuristic approaches. However, these systems tend to be mainly reactive in nature. Planning the robot gaze ahead of time could help in achieving more realistic gaze behavior and better eye-head coordination. In this paper, we propose and implement a novel planning-based GCS. We evaluate our system in a comparative within-subjects user study (N=26) between a reactive system and our proposed system. The results show that the users preferred the proposed system and that it was significantly more interpretable and better at regulating intimacy.
  • Moore, R. K., & Cutler, A. (2001). Constraints on theories of human vs. machine recognition of speech. In R. Smits, J. Kingston, T. Neary, & R. Zondervan (Eds.), Proceedings of the workshop on Speech Recognition as Pattern Classification (SPRAAC) (pp. 145-150). Nijmegen: Max Planck Institute for Psycholinguistics.

    Abstract

    The central issues in the study of speech recognition by human listeners (HSR) and of automatic speech recognition (ASR) are clearly comparable; nevertheless the research communities that concern themselves with ASR and HSR are largely distinct. This paper compares the research objectives of the two fields, and attempts to draw informative lessons from one to the other.
  • Norris, D., Cutler, A., McQueen, J. M., Butterfield, S., & Kearns, R. K. (2000). Language-universal constraints on the segmentation of English. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 43-46). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) [1] is a language-specific or language-universal strategy for the segmentation of continuous speech. The PWC disfavours parses which leave an impossible residue between the end of a candidate word and a known boundary. The experiments examined cases where the residue was either a CV syllable with a lax vowel, or a CVC syllable with a schwa. Although neither syllable context is a possible word in English, word-spotting in both contexts was easier than with a context consisting of a single consonant. The PWC appears to be language-universal rather than language-specific.
  • Norris, D., Van Ooijen, B., & Cutler, A. (1992). Speeded detection of vowels and steady-state consonants. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing; Vol. 2 (pp. 1055-1058). Alberta: University of Alberta.

    Abstract

    We report two experiments in which vowels and steady-state consonants served as targets in a speeded detection task. In the first experiment, two vowels were compared with one voiced and once unvoiced fricative. Response times (RTs) to the vowels were longer than to the fricatives. The error rate was higher for the consonants. Consonants in word-final position produced the shortest RTs, For the vowels, RT correlated negatively with target duration. In the second experiment, the same two vowel targets were compared with two nasals. This time there was no significant difference in RTs, but the error rate was still significantly higher for the consonants. Error rate and length correlated negatively for the vowels only. We conclude that RT differences between phonemes are independent of vocalic or consonantal status. Instead, we argue that the process of phoneme detection reflects more finely grained differences in acoustic/articulatory structure within the phonemic repertoire.
  • Norris, D., Cutler, A., & McQueen, J. M. (2000). The optimal architecture for simulating spoken-word recognition. In C. Davis, T. Van Gelder, & R. Wales (Eds.), Cognitive Science in Australia, 2000: Proceedings of the Fifth Biennial Conference of the Australasian Cognitive Science Society. Adelaide: Causal Productions.

    Abstract

    Simulations explored the inability of the TRACE model of spoken-word recognition to model the effects on human listening of subcategorical mismatch in word forms. The source of TRACE's failure lay not in interactive connectivity, not in the presence of inter-word competition, and not in the use of phonemic representations, but in the need for continuously optimised interpretation of the input. When an analogue of TRACE was allowed to cycle to asymptote on every slice of input, an acceptable simulation of the subcategorical mismatch data was achieved. Even then, however, the simulation was not as close as that produced by the Merge model, which has inter-word competition, phonemic representations and continuous optimisation (but no interactive connectivity).
  • Offenga, F., Broeder, D., Wittenburg, P., Ducret, J., & Romary, L. (2006). Metadata profile in the ISO data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1866-1869).
  • Otake, T., & Cutler, A. (2000). A set of Japanese word cohorts rated for relative familiarity. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 766-769). Beijing: China Military Friendship Publish.

    Abstract

    A database is presented of relative familiarity ratings for 24 sets of Japanese words, each set comprising words overlapping in the initial portions. These ratings are useful for the generation of material sets for research in the recognition of spoken words.
  • Otake, T., & Cutler, A. (2001). Recognition of (almost) spoken words: Evidence from word play in Japanese. In P. Dalsgaard (Ed.), Proceedings of EUROSPEECH 2001 (pp. 465-468).

    Abstract

    Current models of spoken-word recognition assume automatic activation of multiple candidate words fully or partially compatible with the speech input. We propose that listeners make use of this concurrent activation in word play such as punning. Distortion in punning should ideally involve no more than a minimal contrastive deviation between two words, namely a phoneme. Moreover, we propose that this metric of similarity does not presuppose phonemic awareness on the part of the punster. We support these claims with an analysis of modern and traditional puns in Japanese (in which phonemic awareness in language users is not encouraged by alphabetic orthography). For both data sets, the results support the predictions. Punning draws on basic processes of spokenword recognition, common across languages.
  • Ozyurek, A., & Ozcaliskan, S. (2000). How do children learn to conflate manner and path in their speech and gestures? Differences in English and Turkish. In E. V. Clark (Ed.), The proceedings of the Thirtieth Child Language Research Forum (pp. 77-85). Stanford: CSLI Publications.
  • Ozyurek, A. (2001). What do speech-gesture mismatches reveal about language specific processing? A comparison of Turkish and English. In C. Cavé, I. Guaitella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication: Actes du Colloque ORAGE 2001 (pp. 567-581). Paris: L'Harmattan.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Papafragou, A., & Ozturk, O. (2006). The acquisition of epistemic modality. In A. Botinis (Ed.), Proceedings of ITRW on Experimental Linguistics in ExLing-2006 (pp. 201-204). ISCA Archive.

    Abstract

    In this paper we try to contribute to the body of knowledge about the acquisition of English epistemic modal verbs (e.g. Mary may/has to be at school). Semantically, these verbs encode possibility or necessity with respect to available evidence. Pragmatically, the use of epistemic modals often gives rise to scalar conversational inferences (Mary may be at school -> Mary doesn’t have to be at school). The acquisition of epistemic modals is challenging for children on both these levels. In this paper, we present findings from two studies which were conducted with 5-year-old children and adults. Our findings, unlike previous work, show that 5-yr-olds have mastered epistemic modal semantics, including the notions of necessity and possibility. However, they are still in the process of acquiring epistemic modal pragmatics.
  • Pereiro Estevan, Y., Wan, V., Scharenborg, O., & Gallardo Antolín, A. (2006). Segmentación de fonemas no supervisada basada en métodos kernel de máximo margen. In Proceedings of IV Jornadas en Tecnología del Habla.

    Abstract

    En este artículo se desarrolla un método automático de segmentación de fonemas no supervisado. Este método utiliza el algoritmo de agrupación de máximo margen [1] para realizar segmentación de fonemas sobre habla continua sin necesidad de información a priori para el entrenamiento del sistema.
  • Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2006). The role of morphology in fine phonetic detail: The case of Dutch -igheid. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 53-54).
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2006). Effects of word frequency on the acoustic durations of affixes. In Proceedings of Interspeech 2006 (pp. 953-956). Pittsburgh: ICSLP.

    Abstract

    This study investigates whether the acoustic durations of derivational affixes in Dutch are affected by the frequency of the word they occur in. In a word naming experiment, subjects were presented with a large number of words containing one of the affixes ge-, ver-, ont, or -lijk. Their responses were recorded on DAT tapes, and the durations of the affixes were measured using Automatic Speech Recognition technology. To investigate whether frequency also affected durations when speech rate was high, the presentation rate of the stimuli was varied. The results show that a higher frequency of the word as a whole led to shorter acoustic realizations for all affixes. Furthermore, affixes became shorter as the presentation rate of the stimuli increased. There was no interaction between word frequency and presentation rate, suggesting that the frequency effect also applies in situations in which the speed of articulation is very high.
  • Poletiek, F. H., & Chater, N. (2006). Grammar induction profits from representative stimulus sampling. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 1968-1973). Austin, TX, USA: Cognitive Science Society.
  • Raviv, L., Jacobson, S. L., Plotnik, J. M., Bowman, J., Lynch, V., & Benítez-Burraco, A. (2022). Elephants as a new animal model for studying the evolution of language as a result of self-domestication. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 606-608). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • de Reus, K., Carlson, D., Lowry, A., Gross, S., Garcia, M., Rubio-García, A., Salazar-Casals, A., & Ravignani, A. (2022). Body size predicts vocal tract size in a mammalian vocal learner. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 154-156). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Scharenborg, O., Sturm, J., & Boves, L. (2001). Business listings in automatic directory assistance. In Interspeech - Eurospeech 2001 - 7th European Conference on Speech Communication and Technology (pp. 2381-2384). ISCA Archive.

    Abstract

    So far most attempts to automate Directory Assistance services focused on private listings, because it is not known precisely how callers will refer to a business listings. The research described in this paper, carried out in the SMADA project, tries to fill this gap. The aim of the research is to model the expressions people use when referring to a business listing by means of rules, in order to automatically create a vocabulary, which can be part of an automated DA service. In this paper a rule-base procedure is proposed, which derives rules from the expressions people use. These rules are then used to automatically create expressions from directory listings. Two categories of businesses, viz. hospitals and the hotel and catering industry, are used to explain this procedure. Results for these two categories are used to discuss the problem of the over- and undergeneration of expressions.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, which are recorded in a wide variety of acoustic conditions. The results show, that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Schiller, N. O., Van Lieshout, P. H. H. M., Meyer, A. S., & Levelt, W. J. M. (1997). Is the syllable an articulatory unit in speech production? Evidence from an Emma study. In P. Wille (Ed.), Fortschritte der Akustik: Plenarvorträge und Fachbeiträge der 23. Deutschen Jahrestagung für Akustik (DAGA 97) (pp. 605-606). Oldenburg: DEGA.
  • Scholman, M., Tianai, D., Yung, F., & Demberg, V. (2022). DiscoGeM: A crowdsourced corpus of genre-mixed implicit discourse relations. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 3281-3290). Marseille, France: European Language Resources Association.

    Abstract

    We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech,
    literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods
    were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results
    show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses.
    Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text
    genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic
    relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level
    labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to
    function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into
    non-connective signals of discourse relations.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Seuren, P. A. M. (2001). Lexical meaning and metaphor. In E. N. Enikö (Ed.), Cognition in language use (pp. 422-431). Antwerp, Belgium: International Pragmatics Association (IPrA).
  • Seuren, P. A. M. (1980). Variabele competentie: Linguïstiek en sociolinguïstiek anno 1980. In Handelingen van het 36e Nederlands Filologencongres: Gehouden te Groningen op woensdag 9, donderdag 10 en vrijdag 11 April 1980 (pp. 41-56). Amsterdam: Holland University Press.
  • Severijnen, G. G., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Seyfeddinipur, M., & Kita, S. (2001). Gestures and dysfluencies in speech. In C. Cavé, I. Guaïtella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication. Actes du colloque ORAGE 2001 (pp. 266-270). Paris, France: Éditions L'Harmattan.
  • Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).

    Abstract

    This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals.
  • ten Bosch, L., Hämäläinen, A., Scharenborg, O., & Boves, L. (2006). Acoustic scores and symbolic mismatch penalties in phone lattices. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing [ICASSP 2006]. IEEE.

    Abstract

    This paper builds on previous work that aims at unraveling the structure of the speech signal by means of using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical search in which symbolic mismatches are allowed at certain costs. The focus is on the optimization of the costs of phone insertions, deletions and substitutions that are used in the lexical decoding pass. Two optimization approaches are presented, one related to a multi-pass computational model for human speech recognition, the other based on a decoding in which Bayes’ risks are minimized. In the final section, the advantages of these optimization methods are discussed and compared.
  • Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).

    Abstract

    Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization.
  • Tuinman, A. (2006). Overcompensation of /t/ reduction in Dutch by German/Dutch bilinguals. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 101-102).
  • Van Valin Jr., R. D. (2000). Focus structure or abstract syntax? A role and reference grammar account of some ‘abstract’ syntactic phenomena. In Z. Estrada Fernández, & I. Barreras Aguilar (Eds.), Memorias del V Encuentro Internacional de Lingüística en el Noroeste: (2 v.) Estudios morfosintácticos (pp. 39-62). Hermosillo: Editorial Unison.
  • Van den Bos, E. J., & Poletiek, F. H. (2006). Implicit artificial grammar learning in adults and children. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 2619). Austin, TX, USA: Cognitive Science Society.
  • Van de Weijer, J. (1997). Language input to a prelingual infant. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 conference on language acquisition (pp. 290-293). Edinburgh University Press.

    Abstract

    Pitch, intonation, and speech rate were analyzed in a collection of everyday speech heard by one Dutch infant between the ages of six and nine months. Components of each of these variables were measured in the speech of three adult speakers (mother, father, baby-sitter) when they addressed the infant, and when they addressed another adult. The results are in line with previously reported findings which are usually based on laboratory or prearranged settings: infant-directed speech in a natural setting exhibits more pitch variation, a larger number of simple intonation contours, and slower speech rate than does adult-directed speech.
  • Van Heuven, V. J., Haan, J., Janse, E., & Van der Torre, E. J. (1997). Perceptual identification of sentence type and the time-distribution of prosodic interrogativity markers in Dutch. In Proceedings of the ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications, Athens, Greece, 1997 (pp. 317-320).

    Abstract

    Dutch distinguishes at least four sentence types: statements and questions, the latter type being subdivided into wh-questions (beginning with a question word), yes/no-questions (with inversion of subject and finite), and declarative questions (lexico-syntactically identical to statement). Acoustically, each of these (sub)types was found to have clearly distinct global F0-patterns, as well as a characteristic distribution of final rises [1,2]. The present paper explores the separate contribution of parameters of global downtrend and size of accent-lending pitch movements versus aspects of the terminal rise to the human identification of the four sentence (sub)types, at various positions in the time-course of the utterance. The results show that interrogativity in Dutch can be identified at an early point in the utterance. However, wh-questions are not distinct from statements.
  • Warner, N., Jongman, A., Mucke, D., & Cutler, A. (2001). The phonological status of schwa insertion in Dutch: An EMA study. In B. Maassen, W. Hulstijn, R. Kent, H. Peters, & P. v. Lieshout (Eds.), Speech motor control in normal and disordered speech: 4th International Speech Motor Conference (pp. 86-89). Nijmegen: Vantilt.

    Abstract

    Articulatory data are used to address the question of whether Dutch schwa insertion is a phonological or a phonetic process. By investigating tongue tip raising and dorsal lowering, we show that /l/ when it appears before inserted schwa is a light /l/, just as /l/ before an underlying schwa is, and unlike the dark /l/ before a consonant in non-insertion productions of the same words. The fact that inserted schwa can condition the light/dark /l/ alternation shows that schwa insertion involves the phonological insertion of a segment rather than phonetic adjustments to articulations.
  • Weber, A. (2000). Phonotactic and acoustic cues for word segmentation in English. In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000) (pp. 782-785).

    Abstract

    This study investigates the influence of both phonotactic and acoustic cues on the segmentation of spoken English. Listeners detected embedded English words in nonsense sequences (word spotting). Words aligned with phonotactic boundaries were easier to detect than words without such alignment. Acoustic cues to boundaries could also have signaled word boundaries, especially when word onsets lacked phonotactic alignment. However, only one of several durational boundary cues showed a marginally significant correlation with response times (RTs). The results suggest that word segmentation in English is influenced primarily by phonotactic constraints and only secondarily by acoustic aspects of the speech signal.
  • Weber, A. (2000). The role of phonotactics in the segmentation of native and non-native continuous speech. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP, Workshop on Spoken Word Access Processes. Nijmegen: MPI for Psycholinguistics.

    Abstract

    Previous research has shown that listeners make use of their knowledge of phonotactic constraints to segment speech into individual words. The present study investigates the influence of phonotactics when segmenting a non-native language. German and English listeners detected embedded English words in nonsense sequences. German listeners also had knowledge of English, but English listeners had no knowledge of German. Word onsets were either aligned with a syllable boundary or not, according to the phonotactics of the two languages. Words aligned with either German or English phonotactic boundaries were easier for German listeners to detect than words without such alignment. Responses of English listeners were influenced primarily by English phonotactic alignment. The results suggest that both native and non-native phonotactic constraints influence lexical segmentation of a non-native, but familiar, language.
  • Widlok, T. (2006). Two ways of looking at a Mangetti grove. In A. Takada (Ed.), Proceedings of the workshop: Landscape and society (pp. 11-16). Kyoto: 21st Century Center of Excellence Program.
  • Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1556-1559).

    Abstract

    Utilization of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview will be given of other enhancements in the latest versions of ELAN, that make it a useful tool in multimodality research.
  • Wittenburg, P., Broeder, D., Klein, W., Levinson, S. C., & Romary, L. (2006). Foundations of modern language resource archives. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 625-628).

    Abstract

    A number of serious reasons will convince an increasing amount of researchers to store their relevant material in centers which we will call "language resource archives". They combine the duty of taking care of long-term preservation as well as the task to give access to their material to different user groups. Access here is meant in the sense that an active interaction with the data will be made possible to support the integration of new data, new versions or commentaries of all sort. Modern Language Resource Archives will have to adhere to a number of basic principles to fulfill all requirements and they will have to be involved in federations to create joint language resource domains making it even more simple for the researchers to access the data. This paper makes an attempt to formulate the essential pillars language resource archives have to adhere to.
  • Woensdregt, M., Jara-Ettinger, J., & Rubio-Fernandez, P. (2022). Language universals rely on social cognition: Computational models of the use of this and that to redirect the receiver’s attention. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1382-1388). Toronto, Canada: Cognitive Science Society.

    Abstract

    Demonstratives—simple referential devices like this and that—are linguistic universals, but their meaning varies cross-linguistically. In languages like English and Italian, demonstratives are thought to encode the referent’s distance from the producer (e.g., that one means “the one far away from me”),
    while in others, like Portuguese and Spanish, they encode relative distance from both producer and receiver (e.g., aquel means “the one far away from both of us”). Here we propose that demonstratives are also sensitive to the receiver’s focus of attention, hence requiring a deeper form of social cognition
    than previously thought. We provide initial empirical and computational evidence for this idea, suggesting that producers use
    demonstratives to redirect the receiver’s attention towards the intended referent, rather than only to indicate its physical distance.
  • Zhang, Y., & Yu, C. (2022). Examining real-time attention dynamics in parent-infant picture book reading. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1367-1374). Toronto, Canada: Cognitive Science Society.

    Abstract

    Picture book reading is a common word-learning context from which parents repeatedly name objects to their child and it has been found to facilitate early word learning. To learn the correct word-object mappings in a book-reading context, infants need to be able to link what they see with what they hear. However, given multiple objects on every book page, it is not clear how infants direct their attention to objects named by parents. The aim of the current study is to examine how infants mechanistically discover the correct word-object mappings during book reading in real time. We used head-mounted eye-tracking during parent-infant picture book reading and measured the infant's moment-by-moment visual attention to the named referent. We also examined how gesture cues provided by both the child and the parent may influence infants' attention to the named target. We found that although parents provided many object labels during book reading, infants were not able to attend to the named objects easily. However, their abilities to follow and use gestures to direct the other social partner’s attention increase the chance of looking at the named target during parent naming.

Share this page