Publications

Displaying 101 - 195 of 195
  • Kuperman, V. (2008). Lexical processing of morphologically complex words: An information-theoretical perspective. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Lattenkamp, E. Z., Vernes, S. C., & Wiegrebe, L. (2018). Mammalian models for the study of vocal learning: A new paradigm in bats. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 235-237). Toruń, Poland: NCU Press. doi:10.12775/3991-1.056.
  • Lauscher, A., Eckert, K., Galke, L., Scherp, A., Rizvi, S. T. R., Ahmed, S., Dengel, A., Zumstein, P., & Klein, A. (2018). Linked open citation database: Enabling libraries to contribute to an open and interconnected citation graph. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 109-118). New York: ACM. doi:10.1145/3197026.3197050.

    Abstract

    Citations play a crucial role in the scientific discourse, in information retrieval, and in bibliometrics. Many initiatives are currently promoting the idea of having free and open citation data. Creation of citation data, however, is not part of the cataloging workflow in libraries nowadays.
    In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that efficiently cataloging citations in libraries using a semi-automatic approach is possible. We specifically describe the current state of the workflow and its implementation. We show that we could significantly improve the automatic reference extraction that is crucial for the subsequent data curation. We further give insights on the curation and linking process and provide evaluation results that not only direct the further development of the project, but also allow us to discuss its overall feasibility.
  • Lefever, E., Hendrickx, I., Croijmans, I., Van den Bosch, A., & Majid, A. (2018). Discovering the language of wine reviews: A text mining account. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3297-3302). Paris: LREC.

    Abstract

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
  • Lenkiewicz, A., & Drude, S. (2013). Automatic annotation of linguistic 2D and Kinect recordings with the Media Query Language for Elan. In Proceedings of Digital Humanities 2013 (pp. 276-278).

    Abstract

    Research in body language with use of gesture recognition and speech analysis has gained much attention in the recent times, influencing disciplines related to image and speech processing.

    This study aims to design the Media Query Language (MQL) (Lenkiewicz, et al. 2012) combined with the Linguistic Media Query Interface (LMQI) for Elan (Wittenburg, et al. 2006). The system integrated with the new achievements in audio-video recognition will allow querying media files with predefined gesture phases (or motion primitives) and speech characteristics as well as combinations of both. For the purpose of this work the predefined motions and speech characteristics are called patterns for atomic elements and actions for a sequence of patterns. The main assumption is that a user-customized library of patterns and actions and automated media annotation with LMQI will reduce annotation time, hence decreasing costs of creation of annotated corpora. Increase of the number of annotated data should influence the speed and number of possible research in disciplines in which human multimodal interaction is a subject of interest and where annotated corpora are required.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2008). Accelerating 3D medical image segmentation with high performance computing. In Proceedings of the IEEE International Workshops on Image Processing Theory, Tools and Applications - IPT (pp. 1-8).

    Abstract

    Digital processing of medical images has helped physicians and patients during past years by allowing examination and diagnosis on a very precise level. Nowadays possibly the biggest deal of support it can offer for modern healthcare is the use of high performance computing architectures to treat the huge amounts of data that can be collected by modern acquisition devices. This paper presents a parallel processing implementation of an image segmentation algorithm that operates on a computer cluster equipped with 10 processing units. Thanks to well-organized distribution of the workload we manage to significantly shorten the execution time of the developed algorithm and reach a performance gain very close to linear.
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Long, M. (2018). The lifelong interplay between language and cognition: From language learning to perspective-taking, new insights into the ageing mind. PhD Thesis, University of Edinburgh, Edinburgh.
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources (LiNCR) (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three
    stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are
    accompanied with stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained
    nature of the data allows the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies.
    We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural
    language processing in the brain and to further stimulate data sharing in the field of neuroscience of language.
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Mainz, N. (2018). Vocabulary knowledge and learning: Individual differences in adult native speakers. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Majid, A. (2013). Olfactory language and cognition. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual meeting of the Cognitive Science Society (CogSci 2013) (pp. 68). Austin,TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0025/index.html.

    Abstract

    Since the cognitive revolution, a widely held assumption has been that—whereas content may vary across cultures—cognitive processes would be universal, especially those on the more basic levels. Even if scholars do not fully subscribe to this assumption, they often conceptualize, or tend to investigate, cognition as if it were universal (Henrich, Heine, & Norenzayan, 2010). The insight that universality must not be presupposed but scrutinized is now gaining ground, and cognitive diversity has become one of the hot (and controversial) topics in the field (Norenzayan & Heine, 2005). We argue that, for scrutinizing the cultural dimension of cognition, taking an anthropological perspective is invaluable, not only for the task itself, but for attenuating the home-field disadvantages that are inescapably linked to cross-cultural research (Medin, Bennis, & Chandler, 2010).
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep Convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light conditionspecific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Mulder, K. (2013). Family and neighbourhood relations in the mental lexicon: A cross-language perspective. PhD Thesis, Radboud University Nijmegen, Nijmegen.

    Abstract

    We lezen en horen dagelijks duizenden woorden zonder dat het ons enige moeite lijkt te kosten. Toch speelt zich in ons brein ondertussen een complex mentaal proces af, waarbij tal van andere woorden dan het aangeboden woord, ook actief worden. Dit gebeurt met name wanneer die andere woorden overeenkomen met de feitelijk aangeboden woorden in spelling, uitspraak of betekenis. Deze activatie als gevolg van gelijkenis strekt zich zelfs uit tot andere talen: ook daarin worden gelijkende woorden actief. Waar liggen de grenzen van dit activatieproces? Activeer je bij het verwerken van het Engelse woord 'steam' ook het Nederlandse woord 'stram'(een zogenaamd 'buurwoord)? En activeer je bij 'clock' zowel 'clockwork' als 'klokhuis' ( twee morfolologische familieleden uit verschillende talen)? Kimberley Mulder onderzocht hoe het leesproces van Nederlands-Engelse tweetaligen wordt beïnvloed door zulke relaties. In meerdere experimentele studies vond ze dat tweetaligen niet alleen morfologische familieleden en orthografische buren activeren uit de taal waarin ze op dat moment lezen, maar ook uit de andere taal die ze kennen. Het lezen van een woord beperkt zich dus geenszins tot wat je eigenlijk ziet, maar activeert een heel netwerk van woorden in je brein.

    Additional information

    full text via Radboud Repository
  • Ortega, G. (2013). Acquisition of a signed phonological system by hearing adults: The Role of sign structure and iconcity. PhD Thesis, University College London, London.

    Abstract

    The phonological system of a sign language comprises meaningless sub-lexical units that define the structure of a sign. A number of studies have examined how learners of a sign language as a first language (L1) acquire these components. However, little is understood about the mechanism by which hearing adults develop visual phonological categories when learning a sign language as a second language (L2). Developmental studies have shown that sign complexity and iconicity, the clear mapping between the form of a sign and its referent, shape in different ways the order of emergence of a visual phonology. The aim of the present dissertation was to investigate how these two factors affect the development of a visual phonology in hearing adults learning a sign language as L2. The empirical data gathered in this dissertation confirms that sign structure and iconicity are important factors that determine L2 phonological development. Non-signers perform better at discriminating the contrastive features of phonologically simple signs than signs with multiple elements. Handshape was the parameter most difficult to learn, followed by movement, then orientation and finally location which is the same order of acquisition reported in L1 sign acquisition. In addition, the ability to access the iconic properties of signs had a detrimental effect in phonological development because iconic signs were consistently articulated less accurately than arbitrary signs. Participants tended to retain the iconic elements of signs but disregarded their exact phonetic structure. Further, non-signers appeared to process iconic signs as iconic gestures at least at the early stages of sign language acquisition. The empirical data presented in this dissertation suggest that non-signers exploit their gestural system as scaffolding of the new manual linguistic system and that sign L2 phonological development is strongly influenced by the structural complexity of a sign and its degree of iconicity.
  • Ortega, G., & Ozyurek, A. (2013). Gesture-sign interface in hearing non-signers' first exposure to sign. In Proceedings of the Tilburg Gesture Research Meeting [TiGeR 2013].

    Abstract

    Natural sign languages and gestures are complex communicative systems that allow the incorporation of features of a referent into their structure. They differ, however, in that signs are more conventionalised because they consist of meaningless phonological parameters. There is some evidence that despite non-signers finding iconic signs more memorable they can have more difficulty at articulating their exact phonological components. In the present study, hearing non-signers took part in a sign repetition task in which they had to imitate as accurately as possible a set of iconic and arbitrary signs. Their renditions showed that iconic signs were articulated significantly less accurately than arbitrary signs. Participants were recalled six months later to take part in a sign generation task. In this task, participants were shown the English translation of the iconic signs they imitated six months prior. For each word, participants were asked to generate a sign (i.e., an iconic gesture). The handshapes produced in the sign repetition and sign generation tasks were compared to detect instances in which both renditions presented the same configuration. There was a significant correlation between articulation accuracy in the sign repetition task and handshape overlap. These results suggest some form of gestural interference in the production of iconic signs by hearing non-signers. We also suggest that in some instances non-signers may deploy their own conventionalised gesture when producing some iconic signs. These findings are interpreted as evidence that non-signers process iconic signs as gestures and that in production, only when sign and gesture have overlapping features will they be capable of producing the phonological components of signs accurately.
  • Ostarek, M. (2018). Envisioning language: An exploration of perceptual processes in language comprehension. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Peeters, D., Chu, M., Holler, J., Ozyurek, A., & Hagoort, P. (2013). Getting to the point: The influence of communicative intent on the kinematics of pointing gestures. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1127-1132). Austin, TX: Cognitive Science Society.

    Abstract

    In everyday communication, people not only use speech but
    also hand gestures to convey information. One intriguing
    question in gesture research has been why gestures take the
    specific form they do. Previous research has identified the
    speaker-gesturer’s communicative intent as one factor
    shaping the form of iconic gestures. Here we investigate
    whether communicative intent also shapes the form of
    pointing gestures. In an experimental setting, twenty-four
    participants produced pointing gestures identifying a referent
    for an addressee. The communicative intent of the speakergesturer
    was manipulated by varying the informativeness of
    the pointing gesture. A second independent variable was the
    presence or absence of concurrent speech. As a function of their communicative intent and irrespective of the presence of speech, participants varied the durations of the stroke and the post-stroke hold-phase of their gesture. These findings add to our understanding of how the communicative context influences the form that a gesture takes.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Piai, V., Roelofs, A., Jensen, O., Schoffelen, J.-M., & Bonnefond, M. (2013). Distinct patterns of brain activity characterize lexical activation and competition in speech production [Abstract]. Journal of Cognitive Neuroscience, 25 Suppl., 106.

    Abstract

    A fundamental ability of speakers is to
    quickly retrieve words from long-term memory. According to a prominent theory, concepts activate multiple associated words, which enter into competition for selection. Previous electrophysiological studies have provided evidence for the activation of multiple alternative words, but did not identify brain responses refl ecting competition. We report a magnetoencephalography study examining the timing and neural substrates of lexical activation and competition. The degree of activation of competing words was
    manipulated by presenting pictures (e.g., dog) simultaneously with distractor
    words. The distractors were semantically related to the picture name (cat), unrelated (pin), or identical (dog). Semantic distractors are stronger competitors to the picture name, because they receive additional activation from the picture, whereas unrelated distractors do not. Picture naming times were longer with semantic than with unrelated and identical distractors. The patterns of phase-locked and non-phase-locked activity were distinct
    but temporally overlapping. Phase-locked activity in left middle temporal
    gyrus, peaking at 400 ms, was larger on unrelated than semantic and identical trials, suggesting differential effort in processing the alternative words activated by the picture-word stimuli. Non-phase-locked activity in the 4-10 Hz range between 400-650 ms in left superior frontal gyrus was larger on semantic than unrelated and identical trials, suggesting different
    degrees of effort in resolving the competition among the alternatives
    words, as refl ected in the naming times. These findings characterize distinct
    patterns of brain activity associated with lexical activation and competition
    respectively, and their temporal relation, supporting the theory that words are selected by competition.
  • Poellmann, K. (2013). The many ways listeners adapt to reductions in casual speech. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Puccini, D. (2013). The use of deictic versus representational gestures in infancy. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., Gingras, B., Asano, R., Sonnweber, R., Matellan, V., & Fitch, W. T. (2013). The evolution of rhythmic cognition: New perspectives and technologies in comparative research. In M. Knauff, M. Pauen, I. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1199-1204). Austin,TX: Cognitive Science Society.

    Abstract

    Music is a pervasive phenomenon in human culture, and musical
    rhythm is virtually present in all musical traditions. Research
    on the evolution and cognitive underpinnings of rhythm
    can benefit from a number of approaches. We outline key concepts
    and definitions, allowing fine-grained analysis of rhythmic
    cognition in experimental studies. We advocate comparative
    animal research as a useful approach to answer questions
    about human music cognition and review experimental evidence
    from different species. Finally, we suggest future directions
    for research on the cognitive basis of rhythm. Apart from
    research in semi-natural setups, possibly allowed by “drum set
    for chimpanzees” prototypes presented here for the first time,
    mathematical modeling and systematic use of circular statistics
    may allow promising advances.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Roberts, S. G. (2013). A Bottom-up approach to the cultural evolution of bilingualism. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1229-1234). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0236/index.html.

    Abstract

    The relationship between individual cognition and cultural phenomena at the society level can be transformed by cultural transmission (Kirby, Dowman, & Griffiths, 2007). Top-down models of this process have typically assumed that individuals only adopt a single linguistic trait. Recent extensions include ‘bilingual’ agents, able to adopt multiple linguistic traits (Burkett & Griffiths, 2010). However, bilingualism is more than variation within an individual: it involves the conditional use of variation with different interlocutors. That is, bilingualism is a property of a population that emerges from use. A bottom-up simulation is presented where learners are sensitive to the identity of other speakers. The simulation reveals that dynamic social structures are a key factor for the evolution of bilingualism in a population, a feature that was abstracted away in the top-down models. Top-down and bottom-up approaches may lead to different answers, but can work together to reveal and explore important features of the cultural transmission process.
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • Rommers, J. (2013). Seeing what's next: Processing and anticipating language referring to objects. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The difference between the average evaluation results of the best title-based retrieval models is only 3% less than those of the best full-text-based retrieval models.
  • Sauppe, S., Norcliffe, E., Konopka, A. E., Van Valin Jr., R. D., & Levinson, S. C. (2013). Dependencies first: Eye tracking evidence from sentence production in Tagalog. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1265-1270). Austin, TX: Cognitive Science Society.

    Abstract

    We investigated the time course of sentence formulation in Tagalog, a verb-initial language in which the verb obligatorily agrees with one of its arguments. Eye-tracked participants described pictures of transitive events. Fixations to the two characters in the events were compared across sentences differing in agreement marking and post-verbal word order. Fixation patterns show evidence for two temporally dissociated phases in Tagalog sentence production. The first, driven by verb agreement, involves early linking of concepts to syntactic functions; the second, driven by word order, involves incremental lexical encoding of these concepts. These results suggest that even the earliest stages of sentence formulation may be guided by a language's grammatical structure.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech
    recognition. While previous work has shown that Fine-Tracker
    is successful at modelling aspects of human spoken-word
    recognition, its speech recognition performance is not
    comparable to that of human performance, possibly due to
    suboptimal intermediate articulatory feature (AF)
    representations. This study investigates the effect of improved
    AF representations, obtained using a state-of-the-art deep
    convolutional network, on Fine-Tracker’s simulation and
    recognition performance: Although the improved AF quality
    resulted in improved speech recognition; it, surprisingly, did
    not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., & Janse, E. (2013). Changes in the role of intensity as a cue for fricative categorisation. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 3147-3151).

    Abstract

    Older listeners with high-frequency hearing loss rely more on intensity for categorisation of /s/ than normal-hearing older listeners. This study addresses the question whether this increased reliance comes about immediately when the need
    arises, i.e., in the face of a spectrally-degraded signal. A phonetic categorisation task was carried out using intensitymodulated fricatives in a clean and a low-pass filtered condition with two younger and two older listener groups.
    When high-frequency information was removed from the speech signal, younger listeners started using intensity as a cue. The older adults on the other hand, when presented with the low-pass filtered speech, did not rely on intensity differences for fricative identification. These results suggest that the reliance on intensity shown by the older hearingimpaired adults may have been acquired only gradually with
    longer exposure to a degraded speech signal.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, K., Sakkalou, E., Ellis-Davies, K., Hilbrink, E., Hahn, U., & Gattis, M. (2013). Infant contributions to joint attention predict vocabulary development. In M. Knauff, M. Pauen, I. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 3384-3389). Austin,TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0602/index.html.

    Abstract

    Joint attention has long been accepted as constituting a privileged circumstance in which word learning prospers. Consequently research has investigated the role that maternal responsiveness to infant attention plays in predicting language outcomes. However there has been a recent expansion in research implicating similar predictive effects from individual differences in infant behaviours. Emerging from the foundations of such work comes an interesting question: do the relative contributions of the mother and infant to joint attention episodes impact upon language learning? In an attempt to address this, two joint attention behaviours were assessed as predictors of vocabulary attainment (as measured by OCDI Production Scores). These predictors were: mothers encouraging attention to an object given their infant was already attending to an object (maternal follow-in); and infants looking to an object given their mothers encouragement of attention to an object (infant follow-in). In a sample of 14-month old children (N=36) we compared the predictive power of these maternal and infant follow-in variables on concurrent and later language performance. Results using Growth Curve Analysis provided evidence that while both maternal follow-in and infant follow-in variables contributed to production scores, infant follow-in was a stronger predictor. Consequently it does appear to matter whose final contribution establishes joint attention episodes. Infants who more often follow-in into their mothers’ encouragement of attention have larger, and faster growing vocabularies between 14 and 18-months of age.
  • Seuren, P. A. M. (1971). Qualche osservazione sulla frase durativa e iterativa in italiano. In M. Medici, & R. Simone (Eds.), Grammatica trasformazionale italiana (pp. 209-224). Roma: Bulzoni.
  • Shao, Z. (2013). Contributions of executive control to individual differences in word production. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Shayan, S., Moreira, A., Windhouwer, M., Koenig, A., & Drude, S. (2013). LEXUS 3 - a collaborative environment for multimedia lexica. In Proceedings of the Digital Humanities Conference 2013 (pp. 392-395).
  • Shitova, N. (2018). Electrophysiology of competition and adjustment in word and phrase production. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Sikora, K. (2018). Executive control in language production by adults and children with and without language impairment. PhD Thesis, Radboud University, Nijmegen, The Netherlands.

    Abstract

    The present study examined how the updating, inhibiting, and shifting abilities underlying executive control influence spoken noun-phrase production. Previous studies provided evidence that updating and inhibiting, but not shifting, influence picture naming response time (RT). However, little is known about the role of executive control in more complex forms of language production like generating phrases. We assessed noun-phrase production using picture description and a picture-word interference procedure. We measured picture description RT to assess length, distractor, and switch effects, which were assumed to reflect, respectively, the updating, inhibiting, and shifting abilities of adult participants. Moreover, for each participant we obtained scores on executive control tasks that measured verbal and nonverbal updating, nonverbal inhibiting, and nonverbal shifting. We found that both verbal and nonverbal updating scores correlated with the overall mean picture description RTs. Furthermore, the length effect in the RTs correlated with verbal but not nonverbal updating scores, while the distractor effect correlated with inhibiting scores. We did not find a correlation between the switch effect in the mean RTs and the shifting scores. However, the shifting scores correlated with the switch effect in the normal part of the underlying RT distribution. These results suggest that updating, inhibiting, and shifting each influence the speed of phrase production, thereby demonstrating a contribution of all three executive control abilities to language production.

    Additional information

    full text via Radboud Repository
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN
  • Smith, A. C., Monaghan, P., & Huettig, F. (2013). Modelling the effects of formal literacy training on language mediated visual attention. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 3420-3425). Austin, TX: Cognitive Science Society.

    Abstract

    Recent empirical evidence suggests that language-mediated eye gaze is partly determined by level of formal literacy training. Huettig, Singh and Mishra (2011) showed that high-literate individuals' eye gaze was closely time locked to phonological overlap between a spoken target word and items presented in a visual display. In contrast, low-literate individuals' eye gaze was not related to phonological overlap, but was instead strongly influenced by semantic relationships between items. Our present study tests the hypothesis that this behavior is an emergent property of an increased ability to extract phonological structure from the speech signal, as in the case of high-literates, with low-literates more reliant on more coarse grained structure. This hypothesis was tested using a neural network model, that integrates linguistic information extracted from the speech signal with visual and semantic information within a central resource. We demonstrate that contrasts in fixation behavior similar to those observed between high and low literates emerge when models are trained on speech signals of contrasting granularity.
  • De Sousa, H. (2008). The development of echo-subject markers in Southern Vanuatu. In T. J. Curnow (Ed.), Selected papers from the 2007 Conference of the Australian Linguistic Society. Australian Linguistic Society.

    Abstract

    One of the defining features of the Southern Vanuatu language family is the echo-subject (ES) marker (Lynch 2001: 177-178). Canonically, an ES marker indicates that the subject of the clause is coreferential with the subject of the preceding clause. This paper begins with a survey of the various ES systems found in Southern Vanuatu. Two prominent differences amongst the ES systems are: a) the level of obligatoriness of the ES marker; and b) the level of grammatical integration between an ES clauses and the preceding clause. The variation found amongst the ES systems reveals a clear path of grammaticalisation from the VP coordinator *ma in Proto–Southern Vanuatu to the various types of ES marker in contemporary Southern Vanuatu languages
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

    Additional information

    link to conference website
  • Stehouwer, H., & Van den Bosch, A. (2008). Putting the t where it belongs: Solving a confusion problem in Dutch. In S. Verberne, H. Van Halteren, & P.-A. Coppen (Eds.), Computational Linguistics in the Netherlands 2007: Selected Papers from the 18th CLIN Meeting (pp. 21-36). Utrecht: LOT.

    Abstract

    A common Dutch writing error is to confuse a word ending in -d with a neighbor word ending in -dt. In this paper we describe the development of a machine-learning-based disambiguator that can determine which word ending is appropriate, on the basis of its local context. We develop alternative disambiguators, varying between a single monolithic classifier and having multiple confusable experts disambiguate between confusable pairs. Disambiguation accuracy of the best developed disambiguators exceeds 99%; when we apply these disambiguators to an external test set of collected errors, our detection strategy correctly identifies up to 79% of the errors.
  • Stoehr, A. (2018). Speech production, perception, and input of simultaneous bilingual preschoolers: Evidence from voice onset time. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Sumner, M., Kurumada, C., Gafter, R., & Casillas, M. (2013). Phonetic variation and the recognition of words with pronunciation variants. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 3486-3492). Austin, TX: Cognitive Science Society.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in longterm effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech tech-nology and in particular in automatic speech recognition hasyielded substantial performance improvements. This suggeststhat deep neural networks (DNNs) are able to capture structurein speech data that older methods for acoustic modeling, suchas Gaussian Mixture Models and shallow neural networks failto uncover. In image recognition it is possible to link repre-sentations on the first couple of layers in DNNs to structuralproperties of images, and to representations on early layers inthe visual cortex. This raises the question whether it is possi-ble to accomplish a similar feat with representations on DNNlayers when processing speech input. In this paper we presentthree different experiments in which we attempt to untanglehow DNNs encode speech signals, and to relate these repre-sentations to phonetic knowledge, with the aim to advance con-ventional phonetic concepts and to choose the topology of aDNNs more efficiently. Two experiments investigate represen-tations formed by auto-encoders. A third experiment investi-gates representations on convolutional layers that treat speechspectrograms as if they were images. The results lay the basisfor future experiments with recursive networks.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2013). Towards an end-to-end computational model of speech comprehension: simulating a lexical decision task. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 2822-2826).

    Abstract

    This paper describes a computational model of speech comprehension that takes the acoustic signal as input and predicts reaction times as observed in an auditory lexical decision task. By doing so, we explore a new generation of end-to-end computational models that are able to simulate the behaviour of human subjects participating in a psycholinguistic experiment. So far, nearly all computational models of speech comprehension do not start from the speech signal itself, but from abstract representations of the speech signal, while the few existing models that do start from the acoustic signal cannot directly model reaction times as obtained in comprehension experiments. The main functional components in our model are the perception stage, which is compatible with the psycholinguistic model Shortlist B and is implemented with techniques from automatic speech recognition, and the decision stage, which is based on the linear ballistic accumulation decision model. We successfully tested our model against data from 20 participants performing a largescale auditory lexical decision experiment. Analyses show that the model is a good predictor for the average judgment and reaction time for each word.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world
  • Timmer, K., Ganushchak, L. Y., Mitlina, Y., & Schiller, N. O. (2013). Choosing first or second language phonology in 125 ms [Abstract]. Journal of Cognitive Neuroscience, 25 Suppl., 164.

    Abstract

    We are often in a bilingual situation (e.g., overhearing a conversation in the train). We investigated whether first (L1) and second language (L2) phonologies are automatically activated. A masked priming paradigm was used, with Russian words as targets and either Russian or English words as primes. Event-related potentials (ERPs) were recorded while Russian (L1) – English (L2) bilinguals read aloud L1 target words (e.g. РЕЙС /reis/ ‘fl ight’) primed with either L1 (e.g. РАНА /rana/ ‘wound’) or L2 words (e.g. PACK). Target words were read faster when they were preceded by phonologically related L1 primes but not by orthographically related L2 primes. ERPs showed orthographic priming in the 125-200 ms time window. Thus, both L1 and L2 phonologies are simultaneously activated during L1 reading. The results provide support for non-selective models of bilingual reading, which assume automatic activation of the non-target language phonology even when it is not required by the task.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using eitheranexpression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, along-lasting debate in psycholinguistics concerningwhether the latter hinders referential processing. We present evidence from an eyetrackingexperiment recordingfixations as well asthe Index of Cognitive Activity –a novel measure of cognitive workload –supporting the view that over-specifications facilitate processing. We further present originalevidence that, above and beyond the effect of specificity,referring expressions thatuniformly reduce referential entropyalso benefitprocessing
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In C. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed
    over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning
    behind this initiative and how in practice such an archive is set up.
  • Tromp, J. (2018). Indirect request comprehension in different contexts. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Ünal, E., & Papafragou, A. (2013). Linguistic and conceptual representations of inference as a knowledge source. In S. Baiz, N. Goldman, & R. Hawkes (Eds.), Proceedings of the 37th Annual Boston University Conference on Language Development (BUCLD 37) (pp. 433-443). Boston: Cascadilla Press.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i. e., songs to add to a playlist, a recommender system can increase the user engagement by making playlist creation easier, as well as extending listening beyond the end of current playlist. The ACM Recommender Systems Challenge 2018 focuses on such task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.
  • Van der Zande, P. (2013). Hearing and seeing speech: Perceptual adjustments in auditory-visual speech processing. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and access directly the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs by making use of practical examples. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data
  • Van Valin Jr., R. D. (1987). Pragmatics, island phenomena, and linguistic competence. In A. M. Farley, P. T. Farley, & K.-E. McCullough (Eds.), CLS 22. Papers from the parasession on pragmatics and grammatical theory (pp. 223-233). Chicago Linguistic Society.
  • Van Putten, S. (2013). The meaning of the Avatime additive particle tsye. In M. Balbach, L. Benz, S. Genzel, M. Grubic, A. Renans, S. Schalowski, M. Stegenwallner, & A. Zeldes (Eds.), Information structure: Empirical perspectives on theory (pp. 55-74). Potsdam: Universitätsverlag Potsdam. Retrieved from http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:kobv:517-opus-64804.
  • Váradi, T., Wittenburg, P., Krauwer, S., Wynne, M., & Koskenniemi, K. (2008). CLARIN: Common language resources and technology infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
  • von Stutterheim, C., & Flecken, M. (Eds.). (2013). Principles of information organization in L2 discourse [Special Issue]. International Review of Applied linguistics in Language Teaching (IRAL), 51(2).
  • Vosse, T. G., & Kempen, G. (2008). Parsing verb-final clauses in German: Garden-path and ERP effects modeled by a parallel dynamic parser. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Conference on the Cognitive Science Society (pp. 261-266). Washington: Cognitive Science Society.

    Abstract

    Experimental sentence comprehension studies have shown that superficially similar German clauses with verb-final word order elicit very different garden-path and ERP effects. We show that a computer implementation of the Unification Space parser (Vosse & Kempen, 2000) in the form of a localist-connectionist network can model the observed differences, at least qualitatively. The model embodies a parallel dynamic parser that, in contrast with existing models, does not distinguish between consecutive first-pass and reanalysis stages, and does not use semantic or thematic roles. It does use structural frequency data and animacy information.
  • Wagner, A. (2008). Phoneme inventories and patterns of speech sound perception. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Weber, A., & Melinger, A. (2008). Name dominance in spoken word recognition is (not) modulated by expectations: Evidence from synonyms. In A. Botinis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (ExLing 2008) (pp. 225-228). Athens: University of Athens.

    Abstract

    Two German eye-tracking experiments tested whether top-down expectations interact with acoustically-driven word-recognition processes. Competitor objects with two synonymous names were paired with target objects whose names shared word onsets with either the dominant or the non-dominant name of the competitor. Non-dominant names of competitor objects were either introduced before the test session or not. Eye-movements were monitored while participants heard instructions to click on target objects. Results demonstrate dominant and non-dominant competitor names were considered for recognition, regardless of top-down expectations, though dominant names were always activated more strongly.
  • Weber, A. (2008). What the eyes can tell us about spoken-language comprehension [Abstract]. Journal of the Acoustical Society of America, 124, 2474-2474.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Wegener, C. (2008). A grammar of Savosavo: A Papuan language of the Solomon Islands. PhD Thesis, Radboud University Nijmegen, Njimegen.
  • Witteman, M. J. (2013). Lexical processing of foreign-accented speech: Rapid and flexible adaptation. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Zinn, C., Cablitz, G., Ringersma, J., Kemps-Snijders, M., & Wittenburg, P. (2008). Constructing knowledge spaces from linguistic resources. In Proceedings of the CIL 18 Workshop on Linguistic Studies of Ontology: From lexical semantics to formal ontologies and back.
  • Zinn, C. (2008). Conceptual spaces in ViCoS. In S. Bechhofer, M. Hauswirth, J. Hoffmann, & M. Koubarakis (Eds.), The semantic web: Research and applications (pp. 890-894). Berlin: Springer.

    Abstract

    We describe ViCoS, a tool for constructing and visualising conceptual spaces in the area of language documentation. ViCoS allows users to enrich existing lexical information about the words of a language with conceptual knowledge. Their work towards language-based, informal ontology building must be supported by easy-to-use workflows and supporting software, which we will demonstrate.
  • De Zubicaray, G. I., Acheson, D. J., & Hartsuiker, R. J. (Eds.). (2013). Mind what you say - general and specific mechanisms for monitoring in speech production [Research topic] [Special Issue]. Frontiers in Human Neuroscience. Retrieved from http://www.frontiersin.org/human_neuroscience/researchtopics/mind_what_you_say_-_general_an/1197.

    Abstract

    Psycholinguistic research has typically portrayed speech production as a relatively automatic process. This is because when errors are made, they occur as seldom as one in every thousand words we utter. However, it has long been recognised that we need some form of control over what we are currently saying and what we plan to say. This capacity to both monitor our inner speech and self-correct our speech output has often been assumed to be a property of the language comprehension system. More recently, it has been demonstrated that speech production benefits from interfacing with more general cognitive processes such as selective attention, short-term memory (STM) and online response monitoring to resolve potential conflict and successfully produce the output of a verbal plan. The conditions and levels of representation according to which these more general planning, monitoring and control processes are engaged during speech production remain poorly understood. Moreover, there remains a paucity of information about their neural substrates, despite some of the first evidence of more general monitoring having come from electrophysiological studies of error related negativities (ERNs). While aphasic speech errors continue to be a rich source of information, there has been comparatively little research focus on instances of speech repair. The purpose of this Frontiers Research Topic is to provide a forum for researchers to contribute investigations employing behavioural, neuropsychological, electrophysiological, neuroimaging and virtual lesioning techniques. In addition, while the focus of the research topic is on novel findings, we welcome submission of computational simulations, review articles and methods papers.
  • Zwitserlood, I., Ozyurek, A., & Perniss, P. M. (2008). Annotation of sign and gesture cross-linguistically. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd, & I. Zwitserlood (Eds.), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages (pp. 185-190). Paris: ELDA.

    Abstract

    This paper discusses the construction of a cross-linguistic, bimodal corpus containing three modes of expression: expressions from two sign languages, speech and gestural expressions in two spoken languages and pantomimic expressions by users of two spoken languages who are requested to convey information without speaking. We discuss some problems and tentative solutions for the annotation of utterances expressing spatial information about referents in these three modes, suggesting a set of comparable codes for the description of both sign and gesture. Furthermore, we discuss the processing of entered annotations in ELAN, e.g. relating descriptive annotations to analytic annotations in all three modes and performing relational searches across annotations on different tiers.

Share this page