Publications

Displaying 101 - 114 of 114
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech tech-nology and in particular in automatic speech recognition hasyielded substantial performance improvements. This suggeststhat deep neural networks (DNNs) are able to capture structurein speech data that older methods for acoustic modeling, suchas Gaussian Mixture Models and shallow neural networks failto uncover. In image recognition it is possible to link repre-sentations on the first couple of layers in DNNs to structuralproperties of images, and to representations on early layers inthe visual cortex. This raises the question whether it is possi-ble to accomplish a similar feat with representations on DNNlayers when processing speech input. In this paper we presentthree different experiments in which we attempt to untanglehow DNNs encode speech signals, and to relate these repre-sentations to phonetic knowledge, with the aim to advance con-ventional phonetic concepts and to choose the topology of aDNNs more efficiently. Two experiments investigate represen-tations formed by auto-encoders. A third experiment investi-gates representations on convolutional layers that treat speechspectrograms as if they were images. The results lay the basisfor future experiments with recursive networks.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2013). Towards an end-to-end computational model of speech comprehension: simulating a lexical decision task. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 2822-2826).

    Abstract

    This paper describes a computational model of speech comprehension that takes the acoustic signal as input and predicts reaction times as observed in an auditory lexical decision task. By doing so, we explore a new generation of end-to-end computational models that are able to simulate the behaviour of human subjects participating in a psycholinguistic experiment. So far, nearly all computational models of speech comprehension do not start from the speech signal itself, but from abstract representations of the speech signal, while the few existing models that do start from the acoustic signal cannot directly model reaction times as obtained in comprehension experiments. The main functional components in our model are the perception stage, which is compatible with the psycholinguistic model Shortlist B and is implemented with techniques from automatic speech recognition, and the decision stage, which is based on the linear ballistic accumulation decision model. We successfully tested our model against data from 20 participants performing a largescale auditory lexical decision experiment. Analyses show that the model is a good predictor for the average judgment and reaction time for each word.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world
  • Timmer, K., Ganushchak, L. Y., Mitlina, Y., & Schiller, N. O. (2013). Choosing first or second language phonology in 125 ms [Abstract]. Journal of Cognitive Neuroscience, 25 Suppl., 164.

    Abstract

    We are often in a bilingual situation (e.g., overhearing a conversation in the train). We investigated whether first (L1) and second language (L2) phonologies are automatically activated. A masked priming paradigm was used, with Russian words as targets and either Russian or English words as primes. Event-related potentials (ERPs) were recorded while Russian (L1) – English (L2) bilinguals read aloud L1 target words (e.g. РЕЙС /reis/ ‘fl ight’) primed with either L1 (e.g. РАНА /rana/ ‘wound’) or L2 words (e.g. PACK). Target words were read faster when they were preceded by phonologically related L1 primes but not by orthographically related L2 primes. ERPs showed orthographic priming in the 125-200 ms time window. Thus, both L1 and L2 phonologies are simultaneously activated during L1 reading. The results provide support for non-selective models of bilingual reading, which assume automatic activation of the non-target language phonology even when it is not required by the task.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using eitheranexpression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, along-lasting debate in psycholinguistics concerningwhether the latter hinders referential processing. We present evidence from an eyetrackingexperiment recordingfixations as well asthe Index of Cognitive Activity –a novel measure of cognitive workload –supporting the view that over-specifications facilitate processing. We further present originalevidence that, above and beyond the effect of specificity,referring expressions thatuniformly reduce referential entropyalso benefitprocessing
  • Ünal, E., & Papafragou, A. (2013). Linguistic and conceptual representations of inference as a knowledge source. In S. Baiz, N. Goldman, & R. Hawkes (Eds.), Proceedings of the 37th Annual Boston University Conference on Language Development (BUCLD 37) (pp. 433-443). Boston: Cascadilla Press.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i. e., songs to add to a playlist, a recommender system can increase the user engagement by making playlist creation easier, as well as extending listening beyond the end of current playlist. The ACM Recommender Systems Challenge 2018 focuses on such task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Van de Weijer, J. (1997). Language input to a prelingual infant. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 conference on language acquisition (pp. 290-293). Edinburgh University Press.

    Abstract

    Pitch, intonation, and speech rate were analyzed in a collection of everyday speech heard by one Dutch infant between the ages of six and nine months. Components of each of these variables were measured in the speech of three adult speakers (mother, father, baby-sitter) when they addressed the infant, and when they addressed another adult. The results are in line with previously reported findings which are usually based on laboratory or prearranged settings: infant-directed speech in a natural setting exhibits more pitch variation, a larger number of simple intonation contours, and slower speech rate than does adult-directed speech.
  • Van Heuven, V. J., Haan, J., Janse, E., & Van der Torre, E. J. (1997). Perceptual identification of sentence type and the time-distribution of prosodic interrogativity markers in Dutch. In Proceedings of the ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications, Athens, Greece, 1997 (pp. 317-320).

    Abstract

    Dutch distinguishes at least four sentence types: statements and questions, the latter type being subdivided into wh-questions (beginning with a question word), yes/no-questions (with inversion of subject and finite), and declarative questions (lexico-syntactically identical to statement). Acoustically, each of these (sub)types was found to have clearly distinct global F0-patterns, as well as a characteristic distribution of final rises [1,2]. The present paper explores the separate contribution of parameters of global downtrend and size of accent-lending pitch movements versus aspects of the terminal rise to the human identification of the four sentence (sub)types, at various positions in the time-course of the utterance. The results show that interrogativity in Dutch can be identified at an early point in the utterance. However, wh-questions are not distinct from statements.
  • Van Valin Jr., R. D., & LaPolla, R. J. (1997). Syntax: Structure, meaning and function. Cambridge University Press.
  • Van Putten, S. (2013). The meaning of the Avatime additive particle tsye. In M. Balbach, L. Benz, S. Genzel, M. Grubic, A. Renans, S. Schalowski, M. Stegenwallner, & A. Zeldes (Eds.), Information structure: Empirical perspectives on theory (pp. 55-74). Potsdam: Universitätsverlag Potsdam. Retrieved from http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:kobv:517-opus-64804.
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.

Share this page