Publications

Displaying 101 - 174 of 174
  • Meeuwissen, M., Roelofs, A., & Levelt, W. J. M. (2003). Naming analog clocks conceptually facilitates naming digital clocks. In Proceedings of XIII Conference of the European Society of Cognitive Psychology (ESCOP 2003) (pp. 271-271).
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep Convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Moscoso del Prado Martín, F., & Baayen, R. H. (2003). Using the structure found in time: Building real-scale orthographic and phonetic representations by accumulation of expectations. In H. Bowman, & C. Labiouse (Eds.), Connectionist Models of Cognition, Perception and Emotion: Proceedings of the Eighth Neural Computation and Psychology Workshop (pp. 263-272). Singapore: World Scientific.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light conditionspecific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Munro, R., Bethard, S., Kuperman, V., Lai, V. T., Melnick, R., Potts, C., Schnoebelen, T., & Tily, H. (2010). Crowdsourcing and language studies: The new generation of linguistic data. In Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. Proceedings of the Workshop (pp. 122-130). Stroudsburg, PA: Association for Computational Linguistics.
  • Oostdijk, N., & Broeder, D. (2003). The Spoken Dutch Corpus and its exploitation environment. In A. Abeille, S. Hansen-Schirra, & H. Uszkoreit (Eds.), Proceedings of the 4th International Workshop on linguistically interpreted corpora (LINC-03) (pp. 93-101).
  • Otake, T., McQueen, J. M., & Cutler, A. (2010). Competition in the perception of spoken Japanese words. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 114-117).

    Abstract

    Japanese listeners detected Japanese words embedded at the end of nonsense sequences (e.g., kaba 'hippopotamus' in gyachikaba). When the final portion of the preceding context together with the initial portion of the word (e.g., here, the sequence chika) was compatible with many lexical competitors, recognition of the embedded word was more difficult than when such a sequence was compatible with few competitors. This clear effect of competition, established here for preceding context in Japanese, joins similar demonstrations, in other languages and for following contexts, to underline that the functional architecture of the human spoken-word recognition system is a universal one.
  • Ouni, S., Cohen, M. M., Young, K., & Jesse, A. (2003). Internationalization of a talking head. In M. Sole, D. Recasens, & J. Romero (Eds.), Proceedings of 15th International Congress of Phonetics Sciences (pp. 2569-2572). Barcelona: Casual Productions.

    Abstract

    In this paper we describe a general scheme for internationalization of our talking head, Baldi, to speak other languages. We describe the modular structure of the auditory/visual synthesis software. As an example, we have created a synthetic Arabic talker, which is evaluated using a noisy word recognition task comparing this talker with a natural one.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Ozyurek, A. (2010). The role of iconic gestures in production and comprehension of language: Evidence from brain and behavior. In S. Kopp, & I. Wachsmuth (Eds.), Gesture in embodied communication and human-computer interaction: 8th International Gesture Workshop, GW 2009, Bielefeld, Germany, February 25-27 2009. Revised selected papers (pp. 1-10). Berlin: Springer.
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Reinisch, E., Jesse, A., & Nygaard, L. C. (2010). Tone of voice helps learning the meaning of novel adjectives [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 114). York: University of York.

    Abstract

    To understand spoken words listeners have to cope with seemingly meaningless variability in the speech signal. Speakers vary, for example, their tone of voice (ToV) by changing speaking rate, pitch, vocal effort, and loudness. This variation is independent of "linguistic prosody" such as sentence intonation or speech rhythm. The variation due to ToV, however, is not random. Speakers use, for example, higher pitch when referring to small objects than when referring to large objects and importantly, adult listeners are able to use these non-lexical ToV cues to distinguish between the meanings of antonym pairs (e.g., big-small; Nygaard, Herold, & Namy, 2009). In the present study, we asked whether listeners infer the meaning of novel adjectives from ToV and subsequently interpret these adjectives according to the learned meaning even in the absence of ToV. Moreover, if listeners actually acquire these adjectival meanings, then they should generalize these word meanings to novel referents. ToV would thus be a semantic cue to lexical acquisition. This hypothesis was tested in an exposure-test paradigm with adult listeners. In the experiment listeners' eye movements to picture pairs were monitored. The picture pairs represented the endpoints of the adjectival dimensions big-small, hot-cold, and strong-weak (e.g., an elephant and an ant represented big-small). Four picture pairs per category were used. While viewing the pictures participants listened to lexically unconstraining sentences containing novel adjectives, for example, "Can you find the foppick one?" During exposure, the sentences were spoken in infant-directed speech with the intended adjectival meaning expressed by ToV. Word-meaning pairings were counterbalanced across participants. Each word was repeated eight times. Listeners had no explicit task. To guide listeners' attention to the relation between the words and pictures, three sets of filler trials were included that contained real English adjectives (e.g., full-empty). In the subsequent test phase participants heard the novel adjectives in neutral adult-directed ToV. Test sentences were recorded before the speaker was informed about intended word meanings. Participants had to choose which of two pictures on the screen the speaker referred to. Picture pairs that were presented during the exposure phase and four new picture pairs per category that varied along the critical dimensions were tested. During exposure listeners did not spontaneously direct their gaze to the intended referent at the first presentation. But as indicated by listener's fixation behavior, they quickly learned the relationship between ToV and word meaning over only two exposures. Importantly, during test participants consistently identified the intended referent object even in the absence of informative ToV. Learning was found for all three tested categories and did not depend on whether the picture pairs had been presented during exposure. Listeners thus use ToV not only to distinguish between antonym pairs but they are able to extract word meaning from ToV and assign this meaning to novel words. The newly learned word meanings can then be generalized to novel referents even in the absence of ToV cues. These findings suggest that ToV can be used as a semantic cue to lexical acquisition. References Nygaard, L. C., Herold, D. S., & Namy, L. L. (2009) The semantics of prosody: Acoustic and perceptual evidence of prosodic correlates to word meaning. Cognitive Science, 33. 127-146.
  • Reis, A., Faísca, L., Castro, S.-L., & Petersson, K. M. (2010). Preditores da leitura ao longo da escolaridade: Um estudo com alunos do 1 ciclo do ensino básico. In Actas do VII simpósio nacional de investigação em psicologia (pp. 3117-3132).

    Abstract

    A aquisição da leitura decorre ao longo de diversas etapas, desde o momento em que a criança inicia o contacto com o alfabeto até ao momento em que se torna um leitor competente, apto a ler correcta e fluentemente. Compreender a evolução desta competência através de uma análise da diferenciação do peso de variáveis preditoras da leitura possibilita teorizar sobre os mecanismos cognitivos envolvidos nas diferentes fases de desenvolvimento da leitura. Realizámos um estudo transversal com 568 alunos do segundo ao quarto ano do primeiro ciclo do Ensino Básico, em que se avaliou o impacto de capacidades de processamento fonológico, nomeação rápida, conhecimento letra-som e vocabulário, bem como de capacidades cognitivas mais gerais (inteligência não-verbal e memória de trabalho), na exactidão e velocidade da leitura. De uma forma geral, os resultados mostraram que, apesar da consciência fonológica permanecer como o preditor mais importante da exactidão e fluência da leitura, o seu peso decresce à medida que a escolaridade aumenta. Observou-se também que, à medida que o contributo da consciência fonológica para a explicação da velocidade de leitura diminuía, aumentava o contributo de outras variáveis mais associadas ao automatismo e reconhecimento lexical, tais como a nomeação rápida e o vocabulário. Em suma, podemos dizer que ao longo da escolaridade se observa uma alteração dinâmica dos processos cognitivos subjacentes à leitura, o que sugere que a criança evolui de uma estratégia de leitura ancorada em processamentos sub-lexicais, e como tal mais dependente de processamentos fonológicos, para uma estratégia baseada no reconhecimento ortográfico das palavras.
  • Rossi, G. (2010). Interactive written discourse: Pragmatic aspects of SMS communication. In G. Garzone, P. Catenaccio, & C. Degano (Eds.), Diachronic perspectives on genres in specialized communication. Conference Proceedings (pp. 135-138). Milano: CUEM.
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • Rubio-Fernández, P., Breheny, R., & Lee, M. W. (2003). Context-independent information in concepts: An investigation of the notion of ‘core features’. In Proceedings of the 25th Annual Conference of the Cognitive Science Society (CogSci 2003). Austin, TX: Cognitive Science Society.
  • Sadakata, M., Van der Zanden, L., & Sekiyama, K. (2010). Influence of musical training on perception of L2 speech. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 118-121).

    Abstract

    The current study reports specific cases in which a positive transfer of perceptual ability from the music domain to the language domain occurs. We tested whether musical training enhances discrimination and identification performance of L2 speech sounds (timing features, nasal consonants and vowels). Native Dutch and Japanese speakers with different musical training experience, matched for their estimated verbal IQ, participated in the experiments. Results indicated that musical training strongly increases one’s ability to perceive timing information in speech signals. We also found a benefit of musical training on discrimination performance for a subset of the tested vowel contrasts.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The difference between the average evaluation results of the best title-based retrieval models is only 3% less than those of the best full-text-based retrieval models.
  • Sauter, D. (2010). Non-verbal emotional vocalizations across cultures [Abstract]. In E. Zimmermann, & E. Altenmüller (Eds.), Evolution of emotional communication: From sounds in nonhuman mammals to speech and music in man (pp. 15). Hannover: University of Veterinary Medicine Hannover.

    Abstract

    Despite differences in language, culture, and ecology, some human characteristics are similar in people all over the world, while other features vary from one group to the next. These similarities and differences can inform arguments about what aspects of the human mind are part of our shared biological heritage and which are predominantly products of culture and language. I will present data from a cross-cultural project investigating the recognition of non-verbal vocalizations of emotions, such as screams and laughs, across two highly different cultural groups. English participants were compared to individuals from remote, culturally isolated Namibian villages. Vocalizations communicating the so-called “basic emotions” (anger, disgust, fear, joy, sadness, and surprise) were bidirectionally recognised. In contrast, a set of additional positive emotions was only recognised within, but not across, cultural boundaries. These results indicate that a number of primarily negative emotions are associated with vocalizations that can be recognised across cultures, while at least some positive emotions are communicated with culture-specific signals. I will discuss these findings in the context of accounts of emotions at differing levels of analysis, with an emphasis on the often-neglected positive emotions.
  • Sauter, D., Crasborn, O., & Haun, D. B. M. (2010). The role of perceptual learning in emotional vocalizations [Abstract]. In C. Douilliez, & C. Humez (Eds.), Third European Conference on Emotion 2010. Proceedings (pp. 39-39). Lille: Université de Lille.

    Abstract

    Many studies suggest that emotional signals can be recognized across cultures and modalities. But to what extent are these signals innate and to what extent are they learned? This study investigated whether auditory learning is necessary for the production of recognizable emotional vocalizations by examining the vocalizations produced by people born deaf. Recordings were made of eight congenitally deaf Dutch individuals, who produced non-verbal vocalizations of a range of negative and positive emotions. Perception was examined in a forced-choice task with hearing Dutch listeners (n = 25). Considerable variability was found across emotions, suggesting that auditory learning is more important for the acquisition of certain types of vocalizations than for others. In particular, achievement and surprise sounds were relatively poorly recognized. In contrast, amusement and disgust vocalizations were well recognized, suggesting that for some emotions, recognizable vocalizations can develop without any auditory learning. The implications of these results for models of emotional communication are discussed, and other routes of social learning available to the deaf individuals are considered.
  • Sauter, D., Crasborn, O., & Haun, D. B. M. (2010). The role of perceptual learning in emotional vocalizations [Abstract]. Journal of the Acoustical Society of America, 128, 2476.

    Abstract

    Vocalizations like screams and laughs are used to communicate affective states, but what acoustic cues in these signals require vocal learning and which ones are innate? This study investigated the role of auditory learning in the production of non-verbal emotional vocalizations by examining the vocalizations produced by people born deaf. Recordings were made of congenitally deaf Dutch individuals and matched hearing controls, who produced non-verbal vocalizations of a range of negative and positive emotions. Perception was examined in a forced-choice task with hearing Dutch listeners (n = 25), and judgments were analyzed together with acoustic cues, including envelope, pitch, and spectral measures. Considerable variability was found across emotions and acoustic cues, and the two types of information were related for a sub-set of the emotion categories. These results suggest that auditory learning is less important for the acquisition of certain types of vocalizations than for others (particularly amusement and relief), and they also point to a less central role for auditory learning of some acoustic features in affective non-verbal vocalizations. The implications of these results for models of vocal emotional communication are discussed.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech
    recognition. While previous work has shown that Fine-Tracker
    is successful at modelling aspects of human spoken-word
    recognition, its speech recognition performance is not
    comparable to that of human performance, possibly due to
    suboptimal intermediate articulatory feature (AF)
    representations. This study investigates the effect of improved
    AF representations, obtained using a state-of-the-art deep
    convolutional network, on Fine-Tracker’s simulation and
    recognition performance: Although the improved AF quality
    resulted in improved speech recognition; it, surprisingly, did
    not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., McQueen, J. M., Ten Bosch, L., & Norris, D. (2003). Modelling human speech recognition using automatic speech recognition paradigms in SpeM. In Proceedings of Eurospeech 2003 (pp. 2097-2100). Adelaide: Causal Productions.

    Abstract

    We have recently developed a new model of human speech recognition, based on automatic speech recognition techniques [1]. The present paper has two goals. First, we show that the new model performs well in the recognition of lexically ambiguous input. These demonstrations suggest that the model is able to operate in the same optimal way as human listeners. Second, we discuss how to relate the behaviour of a recogniser, designed to discover the optimum path through a word lattice, to data from human listening experiments. We argue that this requires a metric that combines both path-based and word-based measures of recognition performance. The combined metric varies continuously as the input speech signal unfolds over time.
  • Scharenborg, O., ten Bosch, L., & Boves, L. (2003). Recognising 'real-life' speech with SpeM: A speech-based computational model of human speech recognition. In Eurospeech 2003 (pp. 2285-2288).

    Abstract

    In this paper, we present a novel computational model of human speech recognition – called SpeM – based on the theory underlying Shortlist. We will show that SpeM, in combination with an automatic phone recogniser (APR), is able to simulate the human speech recognition process from the acoustic signal to the ultimate recognition of words. This joint model takes an acoustic speech file as input and calculates the activation flows of candidate words on the basis of the degree of fit of the candidate words with the input. Experiments showed that SpeM outperforms Shortlist on the recognition of ‘real-life’ input. Furthermore, SpeM performs only slightly worse than an off-the-shelf full-blown automatic speech recogniser in which all words are equally probable, while it provides a transparent computationally elegant paradigm for modelling word activations in human word recognition.
  • Schiller, N. O. (2003). Metrical stress in speech production: A time course study. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 451-454). Adelaide: Causal Productions.

    Abstract

    This study investigated the encoding of metrical information during speech production in Dutch. In Experiment 1, participants were asked to judge whether bisyllabic picture names had initial or final stress. Results showed significantly faster decision times for initially stressed targets (e.g., LEpel 'spoon') than for targets with final stress (e.g., liBEL 'dragon fly'; capital letters indicate stressed syllables) and revealed that the monitoring latencies are not a function of the picture naming or object recognition latencies to the same pictures. Experiments 2 and 3 replicated the outcome of the first experiment with bi- and trisyllabic picture names. These results demonstrate that metrical information of words is encoded rightward incrementally during phonological encoding in speech production. The results of these experiments are in line with Levelt's model of phonological encoding.
  • Schuppler, B., Ernestus, M., Van Dommelen, W., & Koreman, J. (2010). Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2466-2469).

    Abstract

    This paper presents a study on the acoustic sub-segmental properties of word-final /t/ in conversational standard Dutch and how these properties contribute to whether humans and an ASR system classify the /t/ as acoustically present or absent. In general, humans and the ASR system use the same cues (presence of a constriction, a burst, and alveolar frication), but the ASR system is also less sensitive to fine cues (weak bursts, smoothly starting friction) than human listeners and misled by the presence of glottal vibration. These data inform the further development of models of human and automatic speech processing.
  • Seidl, A., & Johnson, E. K. (2003). Position and vowel quality effects in infant's segmentation of vowel-initial words. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2233-2236). Adelaide: Causal Productions.
  • Senghas, A., Ozyurek, A., & Goldin-Meadow, S. (2010). The evolution of segmentation and sequencing: Evidence from homesign and Nicaraguan Sign Language. In A. D. Smith, M. Schouwstra, B. de Boer, & K. Smith (Eds.), Proceedings of the 8th International conference on the Evolution of Language (EVOLANG 8) (pp. 279-289). Singapore: World Scientific.
  • Seuren, P. A. M. (1985). Predicate raising and semantic transparency in Mauritian Creole. In N. Boretzky, W. Enninger, & T. Stolz (Eds.), Akten des 2. Essener Kolloquiums über "Kreolsprachen und Sprachkontakte", 29-30 Nov. 1985 (pp. 203-229). Bochum: Brockmeyer.
  • Shi, R., Werker, J., & Cutler, A. (2003). Function words in early speech perception. In Proceedings of the 15th International Congress of Phonetic Sciences (pp. 3009-3012).

    Abstract

    Three experiments examined whether infants recognise functors in phrases, and whether their representations of functors are phonetically well specified. Eight- and 13- month-old English infants heard monosyllabic lexical words preceded by real functors (e.g., the, his) versus nonsense functors (e.g., kuh); the latter were minimally modified segmentally (but not prosodically) from real functors. Lexical words were constant across conditions; thus recognition of functors would appear as longer listening time to sequences with real functors. Eightmonth- olds' listening times to sequences with real versus nonsense functors did not significantly differ, suggesting that they did not recognise real functors, or functor representations lacked phonetic specification. However, 13-month-olds listened significantly longer to sequences with real functors. Thus, somewhere between 8 and 13 months of age infants learn familiar functors and represent them with segmental detail. We propose that accumulated frequency of functors in input in general passes a critical threshold during this time.
  • Sikveland, A., Öttl, A., Amdal, I., Ernestus, M., Svendsen, T., & Edlund, J. (2010). Spontal-N: A Corpus of Interactional Spoken Norwegian. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2986-2991). Paris: European Language Resources Association (ELRA).

    Abstract

    Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material on the phoneme level, by means of a forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general purpose speech resource that is also suitable for investigating phonetic detail.
  • Simon, E., Escudero, P., & Broersma, M. (2010). Learning minimally different words in a third language: L2 proficiency as a crucial predictor of accuracy in an L3 word learning task. In K. Diubalska-Kolaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the Sixth International Symposium on the Acquisition of Second Language Speech (New Sounds 2010).
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

    Additional information

    link to conference website
  • Spilková, H., Brenner, D., Öttl, A., Vondřička, P., Van Dommelen, W., & Ernestus, M. (2010). The Kachna L1/L2 picture replication corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2432-2436). Paris: European Language Resources Association (ELRA).

    Abstract

    This paper presents the Kachna corpus of spontaneous speech, in which ten Czech and ten Norwegian speakers were recorded both in their native language and in English. The dialogues are elicited using a picture replication task that requires active cooperation and interaction of speakers by asking them to produce a drawing as close to the original as possible. The corpus is appropriate for the study of interactional features and speech reduction phenomena across native and second languages. The combination of productions in non-native English and in speakers’ native language is advantageous for investigation of L2 issues while providing a L1 behaviour reference from all the speakers. The corpus consists of 20 dialogues comprising 12 hours 53 minutes of recording, and was collected in 2008. Preparation of the transcriptions, including a manual orthographic transcription and an automatically generated phonetic transcription, is currently in progress. The phonetic transcriptions are automatically generated by aligning acoustic models with the speech signal on the basis of the orthographic transcriptions and a dictionary of pronunciation variants compiled for the relevant language. Upon completion the corpus will be made available via the European Language Resources Association (ELRA).
  • Staum Casasanto, L., Jasmin, K., & Casasanto, D. (2010). Virtually accommodating: Speech rate accommodation to a virtual interlocutor. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 127-132). Austin, TX: Cognitive Science Society.

    Abstract

    Why do people accommodate to each other’s linguistic behavior? Studies of natural interactions (Giles, Taylor & Bourhis, 1973) suggest that speakers accommodate to achieve interactional goals, influencing what their interlocutor thinks or feels about them. But is this the only reason speakers accommodate? In real-world conversations, interactional motivations are ubiquitous, making it difficult to assess the extent to which they drive accommodation. Do speakers still accommodate even when interactional goals cannot be achieved, for instance, when their interlocutor cannot interpret their accommodation behavior? To find out, we asked participants to enter an immersive virtual reality (VR) environment and to converse with a virtual interlocutor. Participants accommodated to the speech rate of their virtual interlocutor even though he could not interpret their linguistic behavior, and thus accommodation could not possibly help them to achieve interactional goals. Results show that accommodation does not require explicit interactional goals, and suggest other social motivations for accommodation.
  • Stehouwer, H., & van Zaanen, M. (2010). Enhanced suffix arrays as language models: Virtual k-testable languages. In J. M. Sempere, & P. García (Eds.), Grammatical inference: Theoretical results and applications 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13-16, 2010. Proceedings (pp. 305-308). Berlin: Springer.

    Abstract

    In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited size n. This approach, which is used with synchronous back-off, allows us to distinguish between alternative sequences using large contexts. We also show that we can build this kind of models with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata. Once built, we can directly access the results of any k-testable automaton generated from the input training data. Synchronous back- off automatically identies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
  • Stehouwer, H., & Van Zaanen, M. (2010). Finding patterns in strings using suffix arrays. In M. Ganzha, & M. Paprzycki (Eds.), Proceedings of the International Multiconference on Computer Science and Information Technology, October 18–20, 2010. Wisła, Poland (pp. 505-511). IEEE.

    Abstract

    Finding regularities in large data sets requires implementations of systems that are efficient in both time and space requirements. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffixarray to find significant patterns in a large collection of sequences. The system searches exhaustively for all significantly compressing patterns where patterns may consist of symbols and skips or wildcards. We demonstrate a possible application of the system by detecting interesting patterns in a Dutch and an English corpus.
  • Stehouwer, H., & van Zaanen, M. (2010). Using suffix arrays as language models: Scaling the n-gram. In Proceedings of the 22st Benelux Conference on Artificial Intelligence (BNAIC 2010), October 25-26, 2010.

    Abstract

    In this article, we propose the use of suffix arrays to implement n-gram language models with practically unlimited size n. These unbounded n-grams are called 1-grams. This approach allows us to use large contexts efficiently to distinguish between different alternative sequences while applying synchronous back-off. From a practical point of view, the approach has been applied within the context of spelling confusibles, verb and noun agreement and prenominal adjective ordering. These initial experiments show promising results and we relate the performance to the size of the n-grams used for disambiguation.
  • Stivers, T., Enfield, N. J., & Levinson, S. C. (Eds.). (2010). Question-response sequences in conversation across ten languages [Special Issue]. Journal of Pragmatics, 42(10). doi:10.1016/j.pragma.2010.04.001.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in longterm effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech tech-nology and in particular in automatic speech recognition hasyielded substantial performance improvements. This suggeststhat deep neural networks (DNNs) are able to capture structurein speech data that older methods for acoustic modeling, suchas Gaussian Mixture Models and shallow neural networks failto uncover. In image recognition it is possible to link repre-sentations on the first couple of layers in DNNs to structuralproperties of images, and to representations on early layers inthe visual cortex. This raises the question whether it is possi-ble to accomplish a similar feat with representations on DNNlayers when processing speech input. In this paper we presentthree different experiments in which we attempt to untanglehow DNNs encode speech signals, and to relate these repre-sentations to phonetic knowledge, with the aim to advance con-ventional phonetic concepts and to choose the topology of aDNNs more efficiently. Two experiments investigate represen-tations formed by auto-encoders. A third experiment investi-gates representations on convolutional layers that treat speechspectrograms as if they were images. The results lay the basisfor future experiments with recursive networks.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world
  • Torreira, F., & Ernestus, M. (2010). Phrase-medial vowel devoicing in spontaneous French. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2006-2009).

    Abstract

    This study investigates phrase-medial vowel devoicing in European French (e.g. /ty po/ [typo] 'you can'). Our spontaneous speech data confirm that French phrase-medial devoicing is a frequent phenomenon affecting high vowels preceded by voiceless consonants. We also found that devoicing is more frequent in temporally reduced and coarticulated vowels. Complete and partial devoicing were conditioned by the same variables (speech rate, consonant type and distance from the end of the AP). Given these results, we propose that phrase-medial vowel devoicing in French arises mainly from the temporal compression of vocalic gestures and the aerodynamic conditions imposed by high vowels.
  • Torreira, F., & Ernestus, M. (2010). The Nijmegen corpus of casual Spanish. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2981-2985). Paris: European Language Resources Association (ELRA).

    Abstract

    This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Information about how to obtain a copy of the corpus can be found online at http://mirjamernestus.ruhosting.nl/Ernestus/NCCSp
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using eitheranexpression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, along-lasting debate in psycholinguistics concerningwhether the latter hinders referential processing. We present evidence from an eyetrackingexperiment recordingfixations as well asthe Index of Cognitive Activity –a novel measure of cognitive workload –supporting the view that over-specifications facilitate processing. We further present originalevidence that, above and beyond the effect of specificity,referring expressions thatuniformly reduce referential entropyalso benefitprocessing
  • Tuinman, A., & Cutler, A. (2010). Casual speech processes: L1 knowledge and L2 speech perception. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 512-517). Poznan: Adama Mickiewicz University.

    Abstract

    Every language manifests casual speech processes, and hence every second language too. This study examined how listeners deal with second-language casual speech processes, as a function of the processes in their native language. We compared a match case, where a second-language process t/-reduction) is also operative in native speech, with a mismatch case, where a second-language process (/r/-insertion) is absent from native speech. In each case native and non-native listeners judged stimuli in which a given phoneme (in sentence context) varied along a continuum from absent to present. Second-language listeners in general mimicked native performance in the match case, but deviated significantly from native performance in the mismatch case. Together these results make it clear that the mapping from first to second language is as important in the interpretation of casual speech processes as in other dimensions of speech perception. Unfamiliar casual speech processes are difficult to adapt to in a second language. Casual speech processes that are already familiar from native speech, however, are easy to adapt to; indeed, our results even suggest that it is possible for subtle difference in their occurrence patterns across the two languages to be detected,and to be accommodated to in second-language listening.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i. e., songs to add to a playlist, a recommender system can increase the user engagement by making playlist creation easier, as well as extending listening beyond the end of current playlist. The ACM Recommender Systems Challenge 2018 focuses on such task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Van Rees Vellinga, M., Hanulikova, A., Weber, A., & Zwitserlood, P. (2010). A neurophysiological investigation of processing phoneme substitutions in L2. In New Sounds 2010: Sixth International Symposium on the Acquisition of Second Language Speech (pp. 518-523). Poznan, Poland: Adam Mickiewicz University.
  • Van der Meij, L., Isaac, A., & Zinn, C. (2010). A web-based repository service for vocabularies and alignments in the cultural heritage domain. In L. Aroyo, G. Antoniou, E. Hyvönen, A. Ten Teije, H. Stuckenschmidt, L. Cabral, & T. Tudorache (Eds.), The Semantic Web: Research and Applications. 7th Extended Semantic Web Conference, Proceedings, Part I (pp. 394-409). Heidelberg: Springer.

    Abstract

    Controlled vocabularies of various kinds (e.g., thesauri, classification schemes) play an integral part in making Cultural Heritage collections accessible. The various institutions participating in the Dutch CATCH programme maintain and make use of a rich and diverse set of vocabularies. This makes it hard to provide a uniform point of access to all collections at once. Our SKOS-based vocabulary and alignment repository aims at providing technology for managing the various vocabularies, and for exploiting semantic alignments across any two of them. The repository system exposes web services that effectively support the construction of tools for searching and browsing across vocabularies and collections or for collection curation (indexing), as we demonstrate.
  • Van Gerven, M., & Simanova, I. (2010). Concept classification with Bayesian multi-task learning. In Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics (pp. 10-17). Los Angeles: Association for Computational Linguistics.

    Abstract

    Multivariate analysis allows decoding of single trial data in individual subjects. Since different models are obtained for each subject it becomes hard to perform an analysis on the group level. We introduce a new algorithm for Bayesian multi-task learning which imposes a coupling between single-subject models. Using
    the CMU fMRI dataset it is shown that the algorithm can be used for concept classification
    based on the average activation of regions in the AAL atlas. Concepts which were most easily classified correspond to the categories shelter,manipulation and eating, which is in accordance with the literature. The multi-task learning algorithm is shown to find regions of interest that are common to all subjects which
    therefore facilitates interpretation of the obtained
    models.
  • Van Hout, A., & Veenstra, A. (2010). Telicity marking in Dutch child language: Event realization or no aspectual coercion? In J. Costa, A. Castro, M. Lobo, & F. Pratas (Eds.), Language Acquisition and Development: Proceedings of GALA 2009 (pp. 216-228). Newcastle upon Tyne: Cambridge Scholars Publishing.
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2010). Semantic facilitation in bilingual everyday speech comprehension. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (Interspeech 2010), Makuhari, Japan (pp. 1245-1248).

    Abstract

    Previous research suggests that bilinguals presented with low and high predictability sentences benefit from semantics in clear but not in conversational speech [1]. In everyday speech, however, many words are not highly predictable. Previous research has shown that native listeners can use also more subtle semantic contextual information [2]. The present study reports two auditory lexical decision experiments investigating to what extent late Asian-English bilinguals benefit from subtle semantic cues in their processing of English unreduced and reduced speech. Our results indicate that these bilinguals are less sensitive to semantic cues than native listeners for both speech registers.
  • Van Uytvanck, D., Zinn, C., Broeder, D., Wittenburg, P., & Gardelleni, M. (2010). Virtual language observatory: The portal to the language resources and technology universe. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 900-903). European Language Resources Association (ELRA).

    Abstract

    Over the years, the field of Language Resources and Technology (LRT) hasdeveloped a tremendous amount of resources and tools. However, there is noready-to-use map that researchers could use to gain a good overview andsteadfast orientation when searching for, say corpora or software tools tosupport their studies. It is rather the case that information is scatteredacross project- or organisation-specific sites, which makes it hard if notimpossible for less-experienced researchers to gather all relevant material.Clearly, the provision of metadata is central to resource and softwareexploration. However, in the LRT field, metadata comes in many forms, tastesand qualities, and therefore substantial harmonization and curation efforts arerequired to provide researchers with metadata-based guidance. To address thisissue a broad alliance of LRT providers (CLARIN, the Linguist List, DOBES,DELAMAN, DFKI, ELRA) have initiated the Virtual Language Observatory portal toprovide a low-barrier, easy-to-follow entry point to language resources andtools; it can be accessed via http://www.clarin.eu/vlo
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Versteegh, M., Ten Bosch, L., & Boves, L. (2010). Active word learning under uncertain input conditions. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2930-2933). ISCA.

    Abstract

    This paper presents an analysis of phoneme durations of emotional speech in two languages: Dutch and Korean. The analyzed corpus of emotional speech has been specifically developed for the purpose of cross-linguistic comparison, and is more balanced than any similar corpus available so far: a) it contains expressions by both Dutch and Korean actors and is based on judgments by both Dutch and Korean listeners; b) the same elicitation technique and recording procedure were used for recordings of both languages; and c) the phonetics of the carrier phrase were constructed to be permissible in both languages. The carefully controlled phonetic content of the carrier phrase allows for analysis of the role of specific phonetic features, such as phoneme duration, in emotional expression in Dutch and Korean. In this study the mutual effect of language and emotion on phoneme duration is presented.
  • Versteegh, M., Ten Bosch, L., & Boves, L. (2010). Dealing with uncertain input in word learning. In Proceedings of the IXth IEEE International Conference on Development and Learning (ICDL). Ann Arbor, MI, 18-21 Aug. 2010 (pp. 46-51). IEEE.

    Abstract

    In this paper we investigate a computational model of word learning, that is embedded in a cognitively and ecologically plausible framework. Multi-modal stimuli from four different speakers form a varied source of experience. The model incorporates active learning, attention to a communicative setting and clarity of the visual scene. The model's ability to learn associations between speech utterances and visual concepts is evaluated during training to investigate the influence of active learning under conditions of uncertain input. The results show the importance of shared attention in word learning and the model's robustness against noise.
  • Versteegh, M., Sangati, F., & Zuidema, W. (2010). Simulations of socio-linguistic change: Implications for unidirectionality. In A. Smith, M. Schoustra, B. Boer, & K. Smith (Eds.), Proceedings of the 8th International conference on the Evolution of Language (EVOLANG 8) (pp. 511-512). World Scientific Publishing.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
  • Wagner, A., & Braun, A. (2003). Is voice quality language-dependent? Acoustic analyses based on speakers of three different languages. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 651-654). Adelaide: Causal Productions.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1437-1440). Adelaide: Causal Productions.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signalto-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A., & Poellmann, K. (2010). Identifying foreign speakers with an unfamiliar accent or in an unfamiliar language. In New Sounds 2010: Sixth International Symposium on the Acquisition of Second Language Speech (pp. 536-541). Poznan, Poland: Adam Mickiewicz University.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Willems, R. M., Labruna, L., D'Esposito, M., Ivry, R., & Casasanto, D. (2010). A functional role for the motor system in language understanding: Evidence from rTMS [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 127). York: University of York.
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Witteman, M. J., Weber, A., & McQueen, J. M. (2010). Rapid and long-lasting adaptation to foreign-accented speech [Abstract]. Journal of the Acoustical Society of America, 128, 2486.

    Abstract

    In foreign-accented speech, listeners have to handle noticeable deviations from the standard pronunciation of a target language. Three cross-modal priming experiments investigated how short- and long-term experiences with a foreign accent influence word recognition by native listeners. In experiment 1, German-accented words were presented to Dutch listeners who had either extensive or limited prior experience with German-accented Dutch. Accented words either contained a diphthong substitution that deviated acoustically quite largely from the canonical form (huis [hys], "house", pronounced as [hoys]), or that deviated acoustically to a lesser extent (lijst [lst], "list", pronounced as [lst]). The mispronunciations never created lexical ambiguity in Dutch. While long-term experience facilitated word recognition for both types of substitutions, limited experience facilitated recognition only of words with acoustically smaller deviations. In experiment 2, Dutch listeners with limited experience listened to the German speaker for 4 min before participating in the cross-modal priming experiment. The results showed that speaker-specific learning effects for acoustically large deviations can be obtained already after a brief exposure, as long as the exposure contains evidence of the deviations. Experiment 3 investigates whether these short-term adaptation effects for foreign-accented speech are speaker-independent.
  • Wittenburg, P. (2010). Culture change in data management. In V. Luzar-Stiffler, I. Jarec, & Z. Bekic (Eds.), Proceedings of the ITI 2010, 32nd International Conference on Information Technology Interfaces (pp. 43 -48). Zagreb, Croatia: University of Zagreb.

    Abstract

    In the emerging e-Science scenario users should be able to easily combine data resources and tools/services; and machines should automatically be able to trace paths and carry out interpretations. Users who want to participate need to move from a down-load first to a cyberinfrastructure paradigm, thus increasing their dependency on the seamless operation of all components in the Internet. Such a scenario is inherently complex and requires compliance to guidelines and standards to keep it working smoothly. Only a change in our culture of dealing with research data and awareness about the way we do data lifecycle management will lead to success. Since we have so many legacy resources that are not compliant with the required guidelines, since we need to admit obvious problems in particular with standardization in the area of semantics and since it will take much time to establish trust at the side of researchers, the e-Science scenario can only be achieved stepwise which will take much time.
  • Wittenburg, P., Trilsbeek, P., & Lenkiewicz, P. (2010). Large multimedia archive for world languages. In SSCS'10 - Proceedings of the 2010 ACM Workshop on Searching Spontaneous Conversational Speech, Co-located with ACM Multimedia 2010 (pp. 53-56). New York: Association for Computing Machinery, Inc. (ACM). doi:10.1145/1878101.1878113.

    Abstract

    In this paper, we describe the core pillars of a large archive oflanguage material recorded worldwide partly about languages that are highly endangered. The bases for the documentation of these languages are audio/video recordings which are then annotated at several linguistic layers. The digital age completely changed the requirements of long-term preservation and it is discussed how the archive met these new challenges. An extensive solution for data replication has been worked out to guarantee bit-stream preservation. Due to an immediate conversion of the incoming data to standards -based formats and checks at upload time lifecycle management of all 50 Terabyte of data is widely simplified. A suitable metadata framework not only allowing users to describe and discover resources, but also allowing them to organize their resources is enabling the management of this amount of resources very efficiently. Finally, it is the Language Archiving Technology software suite which allows users to create, manipulate, access and enrich all archived resources given that they have access permissions.
  • Wittenburg, P., Bel, N., Borin, L., Budin, G., Calzolari, N., Hajicova, E., Koskenniemi, K., Lemnitzer, L., Maegaard, B., Piasecki, M., Pierrel, J.-M., Piperidis, S., Skadina, I., Tufis, D., Van Veenendaal, R., Váradi, T., & Wynne, M. (2010). Resource and service centres as the backbone for a sustainable service infrastructure. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 60-63). European Language Resources Association (ELRA).

    Abstract

    Currently, research infrastructures are being designed and established in manydisciplines since they all suffer from an enormous fragmentation of theirresources and tools. In the domain of language resources and tools the CLARINinitiative has been funded since 2008 to overcome many of the integration andinteroperability hurdles. CLARIN can build on knowledge and work from manyprojects that were carried out during the last years and wants to build stableand robust services that can be used by researchers. Here service centres willplay an important role that have the potential of being persistent and thatadhere to criteria as they have been established by CLARIN. In the last year ofthe so-called preparatory phase these centres are currently developing four usecases that can demonstrate how the various pillars CLARIN has been working oncan be integrated. All four use cases fulfil the criteria of beingcross-national.
  • Zinn, C., Wittenburg, P., & Ringersma, J. (2010). An evolving eScience environment for research data in linguistics. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 894-899). European Language Resources Association (ELRA).

    Abstract

    The amount of research data in the Humanities is increasing at fastspeed. Metadata helps describing and making accessible this data tointerested researchers within and across institutions. While metadatainteroperability is an issue that is being recognised and addressed,the systematic and user-driven provision of annotations and thelinking together of resources into new organisational layers havereceived much less attention. This paper gives an overview of ourevolving technological eScience environment to support suchfunctionality. It describes two tools, ADDIT and ViCoS, which enableresearchers, rather than archive managers, to organise and reorganiseresearch data to fit their particular needs. The two tools, which areembedded into our institute's existing software landscape, are aninitial step towards an eScience environment that gives our scientistseasy access to (multimodal) research data of their interest, andempowers them to structure, enrich, link together, and share such dataas they wish.

Share this page