Publications

  • Reis, A., Faísca, L., Castro, S.-L., & Petersson, K. M. (2010). Preditores da leitura ao longo da escolaridade: Um estudo com alunos do 1 ciclo do ensino básico. In Actas do VII simpósio nacional de investigação em psicologia (pp. 3117-3132).

    Abstract

    Reading acquisition unfolds over several stages, from the moment the child first comes into contact with the alphabet to the moment he or she becomes a competent reader, able to read accurately and fluently. Understanding how this skill develops, by analysing how the weight of the variables that predict reading changes, makes it possible to theorize about the cognitive mechanisms involved in the different phases of reading development. We conducted a cross-sectional study with 568 students from the second to the fourth year of the first cycle of basic education (Ensino Básico), assessing the impact of phonological processing, rapid naming, letter-sound knowledge and vocabulary, as well as more general cognitive abilities (non-verbal intelligence and working memory), on reading accuracy and speed. Overall, the results showed that although phonological awareness remains the most important predictor of reading accuracy and fluency, its weight decreases as schooling progresses. We also observed that, as the contribution of phonological awareness to explaining reading speed diminished, the contribution of other variables more closely associated with automaticity and lexical recognition, such as rapid naming and vocabulary, increased. In sum, the cognitive processes underlying reading change dynamically over the course of schooling, which suggests that the child moves from a reading strategy anchored in sub-lexical processing, and thus more dependent on phonological processing, to a strategy based on the orthographic recognition of words.
  • Ringersma, J., Zinn, C., & Kemps-Snijders, M. (2009). LEXUS & ViCoS From lexical to conceptual spaces. In 1st International Conference on Language Documentation and Conservation (ICLDC).

    Abstract

    LEXUS and ViCoS: from lexicon to conceptual spaces. LEXUS is a web-based lexicon tool, and the knowledge space software ViCoS is an extension of LEXUS that allows users to create relations between objects in and across lexica. LEXUS and ViCoS are part of the Language Archiving Technology software, developed at the MPI for Psycholinguistics to archive and enrich linguistic resources collected in the framework of language documentation projects. LEXUS is of primary interest for language documentation: it offers the possibility not just to create a digital dictionary but also to create multi-media encyclopedic lexica. ViCoS provides an interface between the lexical space and the ontological space. Its approach permits users to model a world of concepts and their interrelations based on categorization patterns made by the speech community. We describe the LEXUS and ViCoS functionalities using three cases from DoBeS language documentation projects. (1) Marquesan: The Marquesan lexicon was initially created in Toolbox and imported into LEXUS using the Toolbox import functionality. The lexicon is enriched with multi-media to illustrate the meaning of the words in its cultural environment. Members of the speech community consider words as keys to access and describe relevant parts of their life and traditions. Their understanding of words is best described by the various associations they evoke rather than in terms of any formal theory of meaning. Using ViCoS, a knowledge space of related concepts is being created. (2) Kola-Sámi: Two lexica are being created in LEXUS. The RuSaDic lexicon is a Russian-Kildin wordlist in which the entries are of relatively limited structure and content. SaRuDic is a more complexly structured lexicon with much richer content, including multi-media fragments and derivations. Using ViCoS, we have created a connection between the two lexica, so that speakers who are familiar with Russian and wish to revitalize their Kildin can enter the lexicon through the RuSaDic and from there approach the informative SaRuDic. Similarly, we will create relations from the two lexica to external open databases, such as Álgu. (3) Beaver: A speaker database including kinship relations has been created, and the database has been imported into LEXUS. In the LEXUS views, the relations for individual speakers are being displayed. Using ViCoS, the relational information from the database will be extracted to form a kinship relation space with specific relation types, such as 'mother-of'. The whole set of relations from the database can be displayed in one ViCoS relation window, and zoom functionality is available.
  • Rossi, G. (2010). Interactive written discourse: Pragmatic aspects of SMS communication. In G. Garzone, P. Catenaccio, & C. Degano (Eds.), Diachronic perspectives on genres in specialized communication. Conference Proceedings (pp. 135-138). Milano: CUEM.
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • Sadakata, M., Van der Zanden, L., & Sekiyama, K. (2010). Influence of musical training on perception of L2 speech. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 118-121).

    Abstract

    The current study reports specific cases in which a positive transfer of perceptual ability from the music domain to the language domain occurs. We tested whether musical training enhances discrimination and identification performance of L2 speech sounds (timing features, nasal consonants and vowels). Native Dutch and Japanese speakers with different musical training experience, matched for their estimated verbal IQ, participated in the experiments. Results indicated that musical training strongly increases one’s ability to perceive timing information in speech signals. We also found a benefit of musical training on discrimination performance for a subset of the tested vowel contrasts.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The average evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.
  • Sauter, D. (2010). Non-verbal emotional vocalizations across cultures [Abstract]. In E. Zimmermann, & E. Altenmüller (Eds.), Evolution of emotional communication: From sounds in nonhuman mammals to speech and music in man (p. 15). Hannover: University of Veterinary Medicine Hannover.

    Abstract

    Despite differences in language, culture, and ecology, some human characteristics are similar in people all over the world, while other features vary from one group to the next. These similarities and differences can inform arguments about what aspects of the human mind are part of our shared biological heritage and which are predominantly products of culture and language. I will present data from a cross-cultural project investigating the recognition of non-verbal vocalizations of emotions, such as screams and laughs, across two highly different cultural groups. English participants were compared to individuals from remote, culturally isolated Namibian villages. Vocalizations communicating the so-called “basic emotions” (anger, disgust, fear, joy, sadness, and surprise) were bidirectionally recognised. In contrast, a set of additional positive emotions was only recognised within, but not across, cultural boundaries. These results indicate that a number of primarily negative emotions are associated with vocalizations that can be recognised across cultures, while at least some positive emotions are communicated with culture-specific signals. I will discuss these findings in the context of accounts of emotions at differing levels of analysis, with an emphasis on the often-neglected positive emotions.
  • Sauter, D., Crasborn, O., & Haun, D. B. M. (2010). The role of perceptual learning in emotional vocalizations [Abstract]. In C. Douilliez, & C. Humez (Eds.), Third European Conference on Emotion 2010. Proceedings (p. 39). Lille: Université de Lille.

    Abstract

    Many studies suggest that emotional signals can be recognized across cultures and modalities. But to what extent are these signals innate and to what extent are they learned? This study investigated whether auditory learning is necessary for the production of recognizable emotional vocalizations by examining the vocalizations produced by people born deaf. Recordings were made of eight congenitally deaf Dutch individuals, who produced non-verbal vocalizations of a range of negative and positive emotions. Perception was examined in a forced-choice task with hearing Dutch listeners (n = 25). Considerable variability was found across emotions, suggesting that auditory learning is more important for the acquisition of certain types of vocalizations than for others. In particular, achievement and surprise sounds were relatively poorly recognized. In contrast, amusement and disgust vocalizations were well recognized, suggesting that for some emotions, recognizable vocalizations can develop without any auditory learning. The implications of these results for models of emotional communication are discussed, and other routes of social learning available to the deaf individuals are considered.
  • Sauter, D., Crasborn, O., & Haun, D. B. M. (2010). The role of perceptual learning in emotional vocalizations [Abstract]. Journal of the Acoustical Society of America, 128, 2476.

    Abstract

    Vocalizations like screams and laughs are used to communicate affective states, but what acoustic cues in these signals require vocal learning and which ones are innate? This study investigated the role of auditory learning in the production of non-verbal emotional vocalizations by examining the vocalizations produced by people born deaf. Recordings were made of congenitally deaf Dutch individuals and matched hearing controls, who produced non-verbal vocalizations of a range of negative and positive emotions. Perception was examined in a forced-choice task with hearing Dutch listeners (n = 25), and judgments were analyzed together with acoustic cues, including envelope, pitch, and spectral measures. Considerable variability was found across emotions and acoustic cues, and the two types of information were related for a sub-set of the emotion categories. These results suggest that auditory learning is less important for the acquisition of certain types of vocalizations than for others (particularly amusement and relief), and they also point to a less central role for auditory learning of some acoustic features in affective non-verbal vocalizations. The implications of these results for models of vocal emotional communication are discussed.
  • Sauter, D., Eisner, F., Ekman, P., & Scott, S. K. (2009). Universal vocal signals of emotion. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the 31st Annual Meeting of the Cognitive Science Society (CogSci 2009) (pp. 2251-2255). Cognitive Science Society.

    Abstract

    Emotional signals allow for the sharing of important information with conspecifics, for example to warn them of danger. Humans use a range of different cues to communicate to others how they feel, including facial, vocal, and gestural signals. Although much is known about facial expressions of emotion, less research has focused on affect in the voice. We compare British listeners to individuals from remote Namibian villages who have had no exposure to Western culture, and examine recognition of non-verbal emotional vocalizations, such as screams and laughs. We show that a number of emotions can be universally recognized from non-verbal vocal signals. In addition we demonstrate the specificity of this pattern, with a set of additional emotions only recognized within, but not across these cultural groups. Our findings indicate that a small set of primarily negative emotions have evolved signals across several modalities, while most positive emotions are communicated with culture-specific signals.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech recognition. While previous work has shown that Fine-Tracker is successful at modelling aspects of human spoken-word recognition, its speech recognition performance is not comparable to that of human performance, possibly due to suboptimal intermediate articulatory feature (AF) representations. This study investigates the effect of improved AF representations, obtained using a state-of-the-art deep convolutional network, on Fine-Tracker’s simulation and recognition performance. Although the improved AF quality resulted in improved speech recognition, it surprisingly did not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, recorded in a wide variety of acoustic conditions. The results show that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Scharenborg, O., & Okolowski, S. (2009). Lexical embedding in spoken Dutch. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1879-1882). ISCA Archive.

    Abstract

    A stretch of speech is often consistent with multiple words, e.g., the sequence /hæm/ is consistent with ‘ham’ but also with the first syllable of ‘hamster’, resulting in temporary ambiguity. However, to what degree does this lexical embedding occur? Analyses on two corpora of spoken Dutch showed that 11.9%-19.5% of polysyllabic word tokens have word-initial embedding, while 4.1%-7.5% of monosyllabic word tokens can appear word-initially embedded. This is much lower than suggested by an analysis of a large dictionary of Dutch. Speech processing thus appears to be simpler than one might expect on the basis of statistics on a dictionary.
  • Scharenborg, O. (2009). Using durational cues in a computational model of spoken-word recognition. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1675-1678). ISCA Archive.

    Abstract

    Evidence that listeners use durational cues to help resolve temporarily ambiguous speech input has accumulated over the past few years. In this paper, we investigate whether durational cues are also beneficial for word recognition in a computational model of spoken-word recognition. Two sets of simulations were carried out using the acoustic signal as input. The simulations showed that the computational model, like humans, benefits from durational cues during word recognition, and uses these to disambiguate the speech signal. These results thus provide support for the theory that durational cues play a role in spoken-word recognition.
  • Schuppler, B., Ernestus, M., Van Dommelen, W., & Koreman, J. (2010). Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2466-2469).

    Abstract

    This paper presents a study on the acoustic sub-segmental properties of word-final /t/ in conversational standard Dutch and how these properties contribute to whether humans and an ASR system classify the /t/ as acoustically present or absent. In general, humans and the ASR system use the same cues (presence of a constriction, a burst, and alveolar frication), but the ASR system is also less sensitive to fine cues (weak bursts, smoothly starting friction) than human listeners and misled by the presence of glottal vibration. These data inform the further development of models of human and automatic speech processing.
  • Schuppler, B., Van Dommelen, W., Koreman, J., & Ernestus, M. (2009). Word-final [t]-deletion: An analysis on the segmental and sub-segmental level. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 2275-2278). Causal Productions Pty Ltd.

    Abstract

    This paper presents a study on the reduction of word-final [t]s in conversational standard Dutch. Based on a large number of tokens annotated on the segmental level, we show that the bigram frequency and the segmental context are the main predictors for the absence of [t]s. In a second study, we present an analysis of the detailed acoustic properties of word-final [t]s and we show that bigram frequency and context also play a role on the sub-segmental level. This paper extends research on the realization of /t/ in spontaneous speech and shows the importance of incorporating sub-segmental properties in models of speech.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Scott, D. R., & Cutler, A. (1982). Segmental cues to syntactic structure. In Proceedings of the Institute of Acoustics 'Spectral Analysis and its Use in Underwater Acoustics' (pp. E3.1-E3.4). London: Institute of Acoustics.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Senghas, A., Ozyurek, A., & Goldin-Meadow, S. (2010). The evolution of segmentation and sequencing: Evidence from homesign and Nicaraguan Sign Language. In A. D. Smith, M. Schouwstra, B. de Boer, & K. Smith (Eds.), Proceedings of the 8th International conference on the Evolution of Language (EVOLANG 8) (pp. 279-289). Singapore: World Scientific.
  • Seuren, P. A. M. (2009). Logical systems and natural logical intuitions. In Current issues in unity and diversity of languages: Collection of the papers selected from the CIL 18, held at Korea University in Seoul on July 21-26, 2008. http://www.cil18.org (pp. 53-60).

    Abstract

    The present paper is part of a large research programme investigating the nature and properties of the predicate logic inherent in natural language. The general hypothesis is that natural speakers start off with a basic-natural logic, based on natural cognitive functions, including the basic-natural way of dealing with plural objects. As culture spreads, functional pressure leads to greater generalization and mathematical correctness, yielding ever more refined systems until the apogee of standard modern predicate logic. Four systems of predicate calculus are considered: Basic-Natural Predicate Calculus (BNPC), Aristotelian-Abelardian Predicate Calculus (AAPC), Aristotelian-Boethian Predicate Calculus (ABPC), also known as the classic Square of Opposition, and Standard Modern Predicate Calculus (SMPC). (ABPC is logically faulty owing to its Undue Existential Import (UEI), but that fault is repaired by the addition of a presuppositional component to the logic.) All four systems are checked against seven natural logical intuitions. It appears that BNPC scores best (five out of seven), followed by ABPC (three out of seven). AAPC and SMPC finish ex aequo with two out of seven.
  • Seuren, P. A. M. (1982). Riorientamenti metodologici nello studio della variabilità linguistica. In D. Gambarara, & A. D'Atri (Eds.), Ideologia, filosofia e linguistica: Atti del Convegno Internazionale di Studi, Rende (CS) 15-17 Settembre 1978 (pp. 499-515). Roma: Bulzoni.
  • Seuren, P. A. M. (1985). Predicate raising and semantic transparency in Mauritian Creole. In N. Boretzky, W. Enninger, & T. Stolz (Eds.), Akten des 2. Essener Kolloquiums über "Kreolsprachen und Sprachkontakte", 29-30 Nov. 1985 (pp. 203-229). Bochum: Brockmeyer.
  • Sikveland, A., Öttl, A., Amdal, I., Ernestus, M., Svendsen, T., & Edlund, J. (2010). Spontal-N: A Corpus of Interactional Spoken Norwegian. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2986-2991). Paris: European Language Resources Association (ELRA).

    Abstract

    Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On the basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material on the phoneme level, by means of a forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general purpose speech resource that is also suitable for investigating phonetic detail.
  • Simon, E., Escudero, P., & Broersma, M. (2010). Learning minimally different words in a third language: L2 proficiency as a crucial predictor of accuracy in an L3 word learning task. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the Sixth International Symposium on the Acquisition of Second Language Speech (New Sounds 2010).
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

  • Spilková, H., Brenner, D., Öttl, A., Vondřička, P., Van Dommelen, W., & Ernestus, M. (2010). The Kachna L1/L2 picture replication corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2432-2436). Paris: European Language Resources Association (ELRA).

    Abstract

    This paper presents the Kachna corpus of spontaneous speech, in which ten Czech and ten Norwegian speakers were recorded both in their native language and in English. The dialogues are elicited using a picture replication task that requires active cooperation and interaction of speakers by asking them to produce a drawing as close to the original as possible. The corpus is appropriate for the study of interactional features and speech reduction phenomena across native and second languages. The combination of productions in non-native English and in speakers’ native language is advantageous for investigation of L2 issues while providing an L1 behaviour reference from all the speakers. The corpus consists of 20 dialogues comprising 12 hours 53 minutes of recording, and was collected in 2008. Preparation of the transcriptions, including a manual orthographic transcription and an automatically generated phonetic transcription, is currently in progress. The phonetic transcriptions are automatically generated by aligning acoustic models with the speech signal on the basis of the orthographic transcriptions and a dictionary of pronunciation variants compiled for the relevant language. Upon completion the corpus will be made available via the European Language Resources Association (ELRA).
  • Staum Casasanto, L., Jasmin, K., & Casasanto, D. (2010). Virtually accommodating: Speech rate accommodation to a virtual interlocutor. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 127-132). Austin, TX: Cognitive Science Society.

    Abstract

    Why do people accommodate to each other’s linguistic behavior? Studies of natural interactions (Giles, Taylor & Bourhis, 1973) suggest that speakers accommodate to achieve interactional goals, influencing what their interlocutor thinks or feels about them. But is this the only reason speakers accommodate? In real-world conversations, interactional motivations are ubiquitous, making it difficult to assess the extent to which they drive accommodation. Do speakers still accommodate even when interactional goals cannot be achieved, for instance, when their interlocutor cannot interpret their accommodation behavior? To find out, we asked participants to enter an immersive virtual reality (VR) environment and to converse with a virtual interlocutor. Participants accommodated to the speech rate of their virtual interlocutor even though he could not interpret their linguistic behavior, and thus accommodation could not possibly help them to achieve interactional goals. Results show that accommodation does not require explicit interactional goals, and suggest other social motivations for accommodation.
  • Stehouwer, H., & van Zaanen, M. (2010). Enhanced suffix arrays as language models: Virtual k-testable languages. In J. M. Sempere, & P. García (Eds.), Grammatical inference: Theoretical results and applications. 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13-16, 2010. Proceedings (pp. 305-308). Berlin: Springer.

    Abstract

    In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited size n. This approach, which is used with synchronous back-off, allows us to distinguish between alternative sequences using large contexts. We also show that we can build this kind of model with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata. Once built, we can directly access the results of any k-testable automaton generated from the input training data. Synchronous back-off automatically identifies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
  • Stehouwer, H., & Van Zaanen, M. (2010). Finding patterns in strings using suffix arrays. In M. Ganzha, & M. Paprzycki (Eds.), Proceedings of the International Multiconference on Computer Science and Information Technology, October 18–20, 2010. Wisła, Poland (pp. 505-511). IEEE.

    Abstract

    Finding regularities in large data sets requires implementations of systems that are efficient in both time and space requirements. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffix array to find significant patterns in a large collection of sequences. The system searches exhaustively for all significantly compressing patterns where patterns may consist of symbols and skips or wildcards. We demonstrate a possible application of the system by detecting interesting patterns in a Dutch and an English corpus.
  • Stehouwer, H., & van Zaanen, M. (2009). Language models for contextual error detection and correction. In Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference (pp. 41-48). Association for Computational Linguistics.

    Abstract

    The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusible correction. Instead of using specific classifiers, we use one generic classifier based on a language model. This measures the likelihood of sentences with different possible solutions of a confusible in place. The advantage of this approach is that all confusible sets are handled by a single model. Preliminary results show that the performance of the generic classifier approach is only slightly worse than that of the specific classifier approach.
  • Stehouwer, H., & Van Zaanen, M. (2009). Token merging in language model-based confusible disambiguation. In T. Calders, K. Tuyls, & M. Pechenizkiy (Eds.), Proceedings of the 21st Benelux Conference on Artificial Intelligence (pp. 241-248).

    Abstract

    In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when alternatives consist of a different number of tokens, this classification technique cannot be applied directly, because the computation of the probabilities is skewed. Previous work already showed that probabilities based on different order n-grams should not be compared directly. In this article, we propose new probability metrics in which the size of the n is varied according to the number of tokens of the confusible alternative. This requires access to n-grams of variable length. Results show that the synchronous back-off method is extremely robust. We discuss the use of suffix trees as a technique to store variable length n-gram information efficiently.
  • Stehouwer, H., & van Zaanen, M. (2010). Using suffix arrays as language models: Scaling the n-gram. In Proceedings of the 22nd Benelux Conference on Artificial Intelligence (BNAIC 2010), October 25-26, 2010.

    Abstract

    In this article, we propose the use of suffix arrays to implement n-gram language models with practically unlimited size n. These unbounded n-grams are called ∞-grams. This approach allows us to use large contexts efficiently to distinguish between different alternative sequences while applying synchronous back-off. From a practical point of view, the approach has been applied within the context of spelling confusibles, verb and noun agreement and prenominal adjective ordering. These initial experiments show promising results and we relate the performance to the size of the n-grams used for disambiguation.
  • Stivers, T., Enfield, N. J., & Levinson, S. C. (Eds.). (2010). Question-response sequences in conversation across ten languages [Special Issue]. Journal of Pragmatics, 42(10). doi:10.1016/j.pragma.2010.04.001.
  • Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).

    Abstract

    This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals.
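
    As a minimal sketch of how a cosine distance in a feature space can flag candidate events along a speech trajectory: the frame-to-frame comparison, the fixed threshold and all names below are assumptions made for illustration, since the abstract does not specify the procedure.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat an all-zero frame as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def detect_events(frames, threshold):
    """Return frame indices where the distance to the previous frame exceeds the threshold."""
    return [
        t
        for t in range(1, len(frames))
        if cosine_distance(frames[t - 1], frames[t]) > threshold
    ]


# Hypothetical two-dimensional feature frames with one abrupt change.
frames = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.1, 1.0]]
events = detect_events(frames, threshold=0.5)
```

    Here the only large jump lies between the second and third frame, so a single candidate boundary is reported at index 2.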
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in long-term effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., Hämäläinen, A., Scharenborg, O., & Boves, L. (2006). Acoustic scores and symbolic mismatch penalties in phone lattices. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing [ICASSP 2006]. IEEE.

    Abstract

    This paper builds on previous work that aims at unraveling the structure of the speech signal by means of using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical search in which symbolic mismatches are allowed at certain costs. The focus is on the optimization of the costs of phone insertions, deletions and substitutions that are used in the lexical decoding pass. Two optimization approaches are presented, one related to a multi-pass computational model for human speech recognition, the other based on a decoding in which Bayes’ risks are minimized. In the final section, the advantages of these optimization methods are discussed and compared.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech technology and in particular in automatic speech recognition has yielded substantial performance improvements. This suggests that deep neural networks (DNNs) are able to capture structure in speech data that older methods for acoustic modeling, such as Gaussian Mixture Models and shallow neural networks, fail to uncover. In image recognition it is possible to link representations on the first couple of layers in DNNs to structural properties of images, and to representations on early layers in the visual cortex. This raises the question whether it is possible to accomplish a similar feat with representations on DNN layers when processing speech input. In this paper we present three different experiments in which we attempt to untangle how DNNs encode speech signals, and to relate these representations to phonetic knowledge, with the aim to advance conventional phonetic concepts and to choose the topology of a DNN more efficiently. Two experiments investigate representations formed by auto-encoders. A third experiment investigates representations on convolutional layers that treat speech spectrograms as if they were images. The results lay the basis for future experiments with recursive networks.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world.
  • Torreira, F., & Ernestus, M. (2009). Probabilistic effects on French [t] duration. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 448-451). Causal Productions Pty Ltd.

    Abstract

    The present study shows that [t] consonants are affected by probabilistic factors in a syllable-timed language such as French, and in spontaneous as well as in journalistic speech. Study 1 showed a word bigram frequency effect in spontaneous French, but its exact nature depended on the corpus on which the probabilistic measures were based. Study 2 investigated journalistic speech and showed an effect of the joint frequency of the test word and its following word. We discuss the possibility that these probabilistic effects are due to the speaker’s planning of upcoming words, and to the speaker’s adaptation to the listener’s needs.
  • Torreira, F., & Ernestus, M. (2010). Phrase-medial vowel devoicing in spontaneous French. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2006-2009).

    Abstract

    This study investigates phrase-medial vowel devoicing in European French (e.g. /ty po/ [typo] 'you can'). Our spontaneous speech data confirm that French phrase-medial devoicing is a frequent phenomenon affecting high vowels preceded by voiceless consonants. We also found that devoicing is more frequent in temporally reduced and coarticulated vowels. Complete and partial devoicing were conditioned by the same variables (speech rate, consonant type and distance from the end of the AP). Given these results, we propose that phrase-medial vowel devoicing in French arises mainly from the temporal compression of vocalic gestures and the aerodynamic conditions imposed by high vowels.
  • Torreira, F., & Ernestus, M. (2010). The Nijmegen corpus of casual Spanish. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2981-2985). Paris: European Language Resources Association (ELRA).

    Abstract

    This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Information about how to obtain a copy of the corpus can be found online at http://mirjamernestus.ruhosting.nl/Ernestus/NCCSp
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using either an expression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, a long-lasting debate in psycholinguistics concerning whether the latter hinders referential processing. We present evidence from an eyetracking experiment recording fixations as well as the Index of Cognitive Activity, a novel measure of cognitive workload, supporting the view that over-specifications facilitate processing. We further present original evidence that, above and beyond the effect of specificity, referring expressions that uniformly reduce referential entropy also benefit processing.
  • Tuinman, A. (2006). Overcompensation of /t/ reduction in Dutch by German/Dutch bilinguals. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 101-102).
  • Tuinman, A., & Cutler, A. (2010). Casual speech processes: L1 knowledge and L2 speech perception. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 512-517). Poznan: Adam Mickiewicz University.

    Abstract

    Every language manifests casual speech processes, and hence every second language too. This study examined how listeners deal with second-language casual speech processes, as a function of the processes in their native language. We compared a match case, where a second-language process (/t/-reduction) is also operative in native speech, with a mismatch case, where a second-language process (/r/-insertion) is absent from native speech. In each case native and non-native listeners judged stimuli in which a given phoneme (in sentence context) varied along a continuum from absent to present. Second-language listeners in general mimicked native performance in the match case, but deviated significantly from native performance in the mismatch case. Together these results make it clear that the mapping from first to second language is as important in the interpretation of casual speech processes as in other dimensions of speech perception. Unfamiliar casual speech processes are difficult to adapt to in a second language. Casual speech processes that are already familiar from native speech, however, are easy to adapt to; indeed, our results even suggest that it is possible for subtle differences in their occurrence patterns across the two languages to be detected, and to be accommodated to in second-language listening.
  • Uddén, J., Araújo, S., Forkstam, C., Ingvar, M., Hagoort, P., & Petersson, K. M. (2009). A matter of time: Implicit acquisition of recursive sequence structures. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society (pp. 2444-2449).

    Abstract

    A dominant hypothesis in empirical research on the evolution of language is the following: the fundamental difference between animal and human communication systems is captured by the distinction between regular and more complex non-regular grammars. Studies reporting successful artificial grammar learning of nested recursive structures and imaging studies of the same have methodological shortcomings since they typically allow explicit problem solving strategies and this has been shown to account for the learning effect in subsequent behavioral studies. The present study overcomes these shortcomings by using subtle violations of agreement structure in a preference classification task. In contrast to the studies conducted so far, we use an implicit learning paradigm, allowing the time needed for both abstraction processes and consolidation to take place. Our results demonstrate robust implicit learning of recursively embedded structures (context-free grammar) and recursive structures with cross-dependencies (context-sensitive grammar) in an artificial grammar learning task spanning 9 days. Keywords: Implicit artificial grammar learning; centre embedded; cross-dependency; implicit learning; context-sensitive grammar; context-free grammar; regular grammar; non-regular grammar
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i.e., songs to add to a playlist, a recommender system can increase user engagement by making playlist creation easier, as well as extending listening beyond the end of the current playlist. The ACM Recommender Systems Challenge 2018 focuses on this task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Vainio, M., Suni, A., Raitio, T., Nurminen, J., Järvikivi, J., & Alku, P. (2009). New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 1703-1706).

    Abstract

    This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibility to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
  • Van Rees Vellinga, M., Hanulikova, A., Weber, A., & Zwitserlood, P. (2010). A neurophysiological investigation of processing phoneme substitutions in L2. In New Sounds 2010: Sixth International Symposium on the Acquisition of Second Language Speech (pp. 518-523). Poznan, Poland: Adam Mickiewicz University.
  • Van der Meij, L., Isaac, A., & Zinn, C. (2010). A web-based repository service for vocabularies and alignments in the cultural heritage domain. In L. Aroyo, G. Antoniou, E. Hyvönen, A. Ten Teije, H. Stuckenschmidt, L. Cabral, & T. Tudorache (Eds.), The Semantic Web: Research and Applications. 7th Extended Semantic Web Conference, Proceedings, Part I (pp. 394-409). Heidelberg: Springer.

    Abstract

    Controlled vocabularies of various kinds (e.g., thesauri, classification schemes) play an integral part in making Cultural Heritage collections accessible. The various institutions participating in the Dutch CATCH programme maintain and make use of a rich and diverse set of vocabularies. This makes it hard to provide a uniform point of access to all collections at once. Our SKOS-based vocabulary and alignment repository aims at providing technology for managing the various vocabularies, and for exploiting semantic alignments across any two of them. The repository system exposes web services that effectively support the construction of tools for searching and browsing across vocabularies and collections or for collection curation (indexing), as we demonstrate.
  • Van Gerven, M., & Simanova, I. (2010). Concept classification with Bayesian multi-task learning. In Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics (pp. 10-17). Los Angeles: Association for Computational Linguistics.

    Abstract

    Multivariate analysis allows decoding of single trial data in individual subjects. Since different models are obtained for each subject it becomes hard to perform an analysis on the group level. We introduce a new algorithm for Bayesian multi-task learning which imposes a coupling between single-subject models. Using the CMU fMRI dataset it is shown that the algorithm can be used for concept classification based on the average activation of regions in the AAL atlas. Concepts which were most easily classified correspond to the categories shelter, manipulation and eating, which is in accordance with the literature. The multi-task learning algorithm is shown to find regions of interest that are common to all subjects which therefore facilitates interpretation of the obtained models.
  • Van Berkum, J. J. A. (2009). Does the N400 directly reflect compositional sense-making? Psychophysiology, Special Issue: Society for Psychophysiological Research Abstracts for the Forty-Ninth Annual Meeting, 46(Suppl. 1), s2.

    Abstract

    A not uncommon assumption in psycholinguistics is that the N400 directly indexes high-level semantic integration, the compositional, word-driven construction of sentence- and discourse-level meaning in some language-relevant unification space. The various discourse- and speaker-dependent modulations of the N400 uncovered by us and others are often taken to support this 'compositional integration' position. In my talk, I will argue that these N400 modulations are probably better interpreted as only indirectly reflecting compositional sense-making. The account that I will advance for these N400 effects is a variant of the classic Kutas and Federmeier (2002, TICS) memory retrieval account in which context effects on the word-elicited N400 are taken to reflect contextual priming of LTM access. It differs from the latter in making more explicit that the contextual cues that prime access to a word's meaning in LTM can range from very simple (e.g., a single concept) to very complex ones (e.g., a structured representation of the current discourse). Furthermore, it incorporates the possibility, suggested by recent N400 findings, that semantic retrieval can also be intensified in response to certain ‘relevance signals’, such as strong value-relevance, or a marked delivery (linguistic focus, uncommon choice of words, etc). In all, the perspective I'll draw is that in the context of discourse-level language processing, N400 effects reflect an 'overlay of technologies', with the construction of discourse-level representations riding on top of more ancient sense-making technology.
  • Van Valin Jr., R. D. (2000). Focus structure or abstract syntax? A role and reference grammar account of some ‘abstract’ syntactic phenomena. In Z. Estrada Fernández, & I. Barreras Aguilar (Eds.), Memorias del V Encuentro Internacional de Lingüística en el Noroeste: (2 v.) Estudios morfosintácticos (pp. 39-62). Hermosillo: Editorial Unison.
  • Van den Bos, E. J., & Poletiek, F. H. (2006). Implicit artificial grammar learning in adults and children. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 2619). Austin, TX, USA: Cognitive Science Society.
  • Van Hout, A., & Veenstra, A. (2010). Telicity marking in Dutch child language: Event realization or no aspectual coercion? In J. Costa, A. Castro, M. Lobo, & F. Pratas (Eds.), Language Acquisition and Development: Proceedings of GALA 2009 (pp. 216-228). Newcastle upon Tyne: Cambridge Scholars Publishing.
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2009). Semantic context effects in the recognition of acoustically unreduced and reduced words. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (pp. 1867-1870). Causal Productions Pty Ltd.

    Abstract

    Listeners require context to understand the casual pronunciation variants of words that are typical of spontaneous speech (Ernestus et al., 2002). The present study reports two auditory lexical decision experiments, investigating listeners' use of semantic contextual information in the comprehension of unreduced and reduced words. We found a strong semantic priming effect for low frequency unreduced words, whereas there was no such effect for reduced words. Word frequency was facilitatory for all words. These results show that semantic context is relevant especially for the comprehension of unreduced words, which is unexpected given the listener driven explanation of reduction in spontaneous speech.
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2010). Semantic facilitation in bilingual everyday speech comprehension. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1245-1248).

    Abstract

    Previous research suggests that bilinguals presented with low and high predictability sentences benefit from semantics in clear but not in conversational speech [1]. In everyday speech, however, many words are not highly predictable. Previous research has shown that native listeners can also use more subtle semantic contextual information [2]. The present study reports two auditory lexical decision experiments investigating to what extent late Asian-English bilinguals benefit from subtle semantic cues in their processing of English unreduced and reduced speech. Our results indicate that these bilinguals are less sensitive to semantic cues than native listeners for both speech registers.
  • Van Uytvanck, D., Zinn, C., Broeder, D., Wittenburg, P., & Gardelleni, M. (2010). Virtual language observatory: The portal to the language resources and technology universe. In N. Calzolari, B. Maegaard, J. Mariani, J. Odijk, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 900-903). European Language Resources Association (ELRA).

    Abstract

    Over the years, the field of Language Resources and Technology (LRT) has developed a tremendous amount of resources and tools. However, there is no ready-to-use map that researchers could use to gain a good overview and steadfast orientation when searching for, say corpora or software tools to support their studies. It is rather the case that information is scattered across project- or organisation-specific sites, which makes it hard if not impossible for less-experienced researchers to gather all relevant material. Clearly, the provision of metadata is central to resource and software exploration. However, in the LRT field, metadata comes in many forms, tastes and qualities, and therefore substantial harmonization and curation efforts are required to provide researchers with metadata-based guidance. To address this issue a broad alliance of LRT providers (CLARIN, the Linguist List, DOBES, DELAMAN, DFKI, ELRA) have initiated the Virtual Language Observatory portal to provide a low-barrier, easy-to-follow entry point to language resources and tools; it can be accessed via http://www.clarin.eu/vlo
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Versteegh, M., Ten Bosch, L., & Boves, L. (2010). Active word learning under uncertain input conditions. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2930-2933). ISCA.

    Abstract

    This paper presents an analysis of phoneme durations of emotional speech in two languages: Dutch and Korean. The analyzed corpus of emotional speech has been specifically developed for the purpose of cross-linguistic comparison, and is more balanced than any similar corpus available so far: a) it contains expressions by both Dutch and Korean actors and is based on judgments by both Dutch and Korean listeners; b) the same elicitation technique and recording procedure were used for recordings of both languages; and c) the phonetics of the carrier phrase were constructed to be permissible in both languages. The carefully controlled phonetic content of the carrier phrase allows for analysis of the role of specific phonetic features, such as phoneme duration, in emotional expression in Dutch and Korean. In this study the mutual effect of language and emotion on phoneme duration is presented.
  • Versteegh, M., Ten Bosch, L., & Boves, L. (2010). Dealing with uncertain input in word learning. In Proceedings of the IXth IEEE International Conference on Development and Learning (ICDL). Ann Arbor, MI, 18-21 Aug. 2010 (pp. 46-51). IEEE.

    Abstract

    In this paper we investigate a computational model of word learning, that is embedded in a cognitively and ecologically plausible framework. Multi-modal stimuli from four different speakers form a varied source of experience. The model incorporates active learning, attention to a communicative setting and clarity of the visual scene. The model's ability to learn associations between speech utterances and visual concepts is evaluated during training to investigate the influence of active learning under conditions of uncertain input. The results show the importance of shared attention in word learning and the model's robustness against noise.
  • Versteegh, M., Sangati, F., & Zuidema, W. (2010). Simulations of socio-linguistic change: Implications for unidirectionality. In A. Smith, M. Schoustra, B. Boer, & K. Smith (Eds.), Proceedings of the 8th International conference on the Evolution of Language (EVOLANG 8) (pp. 511-512). World Scientific Publishing.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
  • Weber, A., & Poellmann, K. (2010). Identifying foreign speakers with an unfamiliar accent or in an unfamiliar language. In New Sounds 2010: Sixth International Symposium on the Acquisition of Second Language Speech (pp. 536-541). Poznan, Poland: Adam Mickiewicz University.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Weber, A. (2000). Phonotactic and acoustic cues for word segmentation in English. In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000) (pp. 782-785).

    Abstract

    This study investigates the influence of both phonotactic and acoustic cues on the segmentation of spoken English. Listeners detected embedded English words in nonsense sequences (word spotting). Words aligned with phonotactic boundaries were easier to detect than words without such alignment. Acoustic cues to boundaries could also have signaled word boundaries, especially when word onsets lacked phonotactic alignment. However, only one of several durational boundary cues showed a marginally significant correlation with response times (RTs). The results suggest that word segmentation in English is influenced primarily by phonotactic constraints and only secondarily by acoustic aspects of the speech signal.
  • Weber, A. (2009). The role of linguistic experience in lexical recognition [Abstract]. Journal of the Acoustical Society of America, 125, 2759.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Weber, A. (2000). The role of phonotactics in the segmentation of native and non-native continuous speech. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP, Workshop on Spoken Word Access Processes. Nijmegen: MPI for Psycholinguistics.

    Abstract

    Previous research has shown that listeners make use of their knowledge of phonotactic constraints to segment speech into individual words. The present study investigates the influence of phonotactics when segmenting a non-native language. German and English listeners detected embedded English words in nonsense sequences. German listeners also had knowledge of English, but English listeners had no knowledge of German. Word onsets were either aligned with a syllable boundary or not, according to the phonotactics of the two languages. Words aligned with either German or English phonotactic boundaries were easier for German listeners to detect than words without such alignment. Responses of English listeners were influenced primarily by English phonotactic alignment. The results suggest that both native and non-native phonotactic constraints influence lexical segmentation of a non-native, but familiar, language.
  • Widlok, T. (2006). Two ways of looking at a Mangetti grove. In A. Takada (Ed.), Proceedings of the workshop: Landscape and society (pp. 11-16). Kyoto: 21st Century Center of Excellence Program.
  • Willems, R. M., Labruna, L., D'Esposito, M., Ivry, R., & Casasanto, D. (2010). A functional role for the motor system in language understanding: Evidence from rTMS [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 127). York: University of York.
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Witteman, M. J., Weber, A., & McQueen, J. M. (2010). Rapid and long-lasting adaptation to foreign-accented speech [Abstract]. Journal of the Acoustical Society of America, 128, 2486.

    Abstract

    In foreign-accented speech, listeners have to handle noticeable deviations from the standard pronunciation of a target language. Three cross-modal priming experiments investigated how short- and long-term experiences with a foreign accent influence word recognition by native listeners. In experiment 1, German-accented words were presented to Dutch listeners who had either extensive or limited prior experience with German-accented Dutch. Accented words either contained a diphthong substitution that deviated acoustically quite largely from the canonical form (huis [hys], "house", pronounced as [hoys]), or that deviated acoustically to a lesser extent (lijst [lst], "list", pronounced as [lst]). The mispronunciations never created lexical ambiguity in Dutch. While long-term experience facilitated word recognition for both types of substitutions, limited experience facilitated recognition only of words with acoustically smaller deviations. In experiment 2, Dutch listeners with limited experience listened to the German speaker for 4 min before participating in the cross-modal priming experiment. The results showed that speaker-specific learning effects for acoustically large deviations can be obtained already after a brief exposure, as long as the exposure contains evidence of the deviations. Experiment 3 investigates whether these short-term adaptation effects for foreign-accented speech are speaker-independent.
  • Wittenburg, P. (2010). Culture change in data management. In V. Luzar-Stiffler, I. Jarec, & Z. Bekic (Eds.), Proceedings of the ITI 2010, 32nd International Conference on Information Technology Interfaces (pp. 43-48). Zagreb, Croatia: University of Zagreb.

    Abstract

    In the emerging e-Science scenario users should be able to easily combine data resources and tools/services; and machines should automatically be able to trace paths and carry out interpretations. Users who want to participate need to move from a down-load first to a cyberinfrastructure paradigm, thus increasing their dependency on the seamless operation of all components in the Internet. Such a scenario is inherently complex and requires compliance to guidelines and standards to keep it working smoothly. Only a change in our culture of dealing with research data and awareness about the way we do data lifecycle management will lead to success. Since we have so many legacy resources that are not compliant with the required guidelines, since we need to admit obvious problems in particular with standardization in the area of semantics and since it will take much time to establish trust at the side of researchers, the e-Science scenario can only be achieved stepwise which will take much time.
  • Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1556-1559).

    Abstract

    Utilization of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview will be given of other enhancements in the latest versions of ELAN, that make it a useful tool in multimodality research.
  • Wittenburg, P., Broeder, D., Klein, W., Levinson, S. C., & Romary, L. (2006). Foundations of modern language resource archives. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 625-628).

    Abstract

    A number of serious reasons will convince an increasing amount of researchers to store their relevant material in centers which we will call "language resource archives". They combine the duty of taking care of long-term preservation as well as the task to give access to their material to different user groups. Access here is meant in the sense that an active interaction with the data will be made possible to support the integration of new data, new versions or commentaries of all sort. Modern Language Resource Archives will have to adhere to a number of basic principles to fulfill all requirements and they will have to be involved in federations to create joint language resource domains making it even more simple for the researchers to access the data. This paper makes an attempt to formulate the essential pillars language resource archives have to adhere to.
  • Wittenburg, P., Trilsbeek, P., & Lenkiewicz, P. (2010). Large multimedia archive for world languages. In SSCS'10 - Proceedings of the 2010 ACM Workshop on Searching Spontaneous Conversational Speech, Co-located with ACM Multimedia 2010 (pp. 53-56). New York: Association for Computing Machinery, Inc. (ACM). doi:10.1145/1878101.1878113.

    Abstract

    In this paper, we describe the core pillars of a large archive of language material recorded worldwide partly about languages that are highly endangered. The bases for the documentation of these languages are audio/video recordings which are then annotated at several linguistic layers. The digital age completely changed the requirements of long-term preservation and it is discussed how the archive met these new challenges. An extensive solution for data replication has been worked out to guarantee bit-stream preservation. Due to an immediate conversion of the incoming data to standards-based formats and checks at upload time lifecycle management of all 50 Terabyte of data is widely simplified. A suitable metadata framework not only allowing users to describe and discover resources, but also allowing them to organize their resources is enabling the management of this amount of resources very efficiently. Finally, it is the Language Archiving Technology software suite which allows users to create, manipulate, access and enrich all archived resources given that they have access permissions.
  • Wittenburg, P., Bel, N., Borin, L., Budin, G., Calzolari, N., Hajicova, E., Koskenniemi, K., Lemnitzer, L., Maegaard, B., Piasecki, M., Pierrel, J.-M., Piperidis, S., Skadina, I., Tufis, D., Van Veenendaal, R., Váradi, T., & Wynne, M. (2010). Resource and service centres as the backbone for a sustainable service infrastructure. In N. Calzolari, B. Maegaard, J. Mariani, J. Odijk, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 60-63). European Language Resources Association (ELRA).

    Abstract

    Currently, research infrastructures are being designed and established in many disciplines since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects that were carried out during the last years and wants to build stable and robust services that can be used by researchers. Here service centres will play an important role that have the potential of being persistent and that adhere to criteria as they have been established by CLARIN. In the last year of the so-called preparatory phase these centres are currently developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criteria of being cross-national.
  • Xiao, M., Kong, X., Liu, J., & Ning, J. (2009). TMBF: Bloom filter algorithms of time-dependent multi bit-strings for incremental set. In Proceedings of the 2009 International Conference on Ultra Modern Telecommunications & Workshops.

    Abstract

    Set is widely used as a kind of basic data structure. However, when it is used for large scale data set the cost of storage, search and transport is overhead. The bloom filter uses a fixed size bit string to represent elements in a static set, which can reduce storage space and search cost that is a fixed constant. The time-space efficiency is achieved at the cost of a small probability of false positive in membership query. However, for many applications the space savings and locating time constantly outweigh this drawback. Dynamic bloom filter (DBF) can support concisely representation and approximate membership queries of dynamic set instead of static set. It has been proved that DBF not only possess the advantage of standard bloom filter, but also has better features when dealing with dynamic set. This paper proposes a time-dependent multiple bit-strings bloom filter (TMBF) which roots in the DBF and targets on dynamic incremental set. TMBF uses multiple bit-strings in time order to present a dynamic increasing set and uses backward searching to test whether an element is in a set. Based on the system logs from a real P2P file sharing system, the evaluation shows a 20% reduction in searching cost compared to DBF.
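    The idea summarised in this abstract (several fixed-size bit strings kept in time order, searched from newest to oldest) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `BloomFilter` and `TimeOrderedBloomFilters`, the SHA-256 based hashing, and the rule of opening a new bit string once the current one has taken `capacity` elements are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; hash positions come from SHA-256 (an assumption)."""

    def __init__(self, num_bits=1024, num_hashes=3, capacity=100):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.capacity = capacity  # hypothetical rollover threshold
        self.count = 0
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))

    def is_full(self):
        return self.count >= self.capacity


class TimeOrderedBloomFilters:
    """Growing set represented by a time-ordered list of Bloom filters."""

    def __init__(self, **filter_args):
        self.filter_args = filter_args
        self.filters = [BloomFilter(**filter_args)]

    def add(self, item):
        # New elements always go into the newest bit string.
        if self.filters[-1].is_full():
            self.filters.append(BloomFilter(**self.filter_args))
        self.filters[-1].add(item)

    def __contains__(self, item):
        # Backward search: check the most recent bit string first.
        return any(item in f for f in reversed(self.filters))


if __name__ == "__main__":
    logs = TimeOrderedBloomFilters(num_bits=4096, num_hashes=3, capacity=100)
    for i in range(250):
        logs.add(f"log-entry-{i}")
    print(len(logs.filters), "log-entry-249" in logs)
```

    Searching backward helps when recently added elements are queried most often, since they are found in the newest bit string before any older ones are examined.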
  • Zinn, C., Wittenburg, P., & Ringersma, J. (2010). An evolving eScience environment for research data in linguistics. In N. Calzolari, B. Maegaard, J. Mariani, J. Odijk, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 894-899). European Language Resources Association (ELRA).

    Abstract

    The amount of research data in the Humanities is increasing at fast speed. Metadata helps describing and making accessible this data to interested researchers within and across institutions. While metadata interoperability is an issue that is being recognised and addressed, the systematic and user-driven provision of annotations and the linking together of resources into new organisational layers have received much less attention. This paper gives an overview of our evolving technological eScience environment to support such functionality. It describes two tools, ADDIT and ViCoS, which enable researchers, rather than archive managers, to organise and reorganise research data to fit their particular needs. The two tools, which are embedded into our institute's existing software landscape, are an initial step towards an eScience environment that gives our scientists easy access to (multimodal) research data of their interest, and empowers them to structure, enrich, link together, and share such data as they wish.
