Publications

Displaying 101 - 200 of 208
  • Lattenkamp, E. Z., Vernes, S. C., & Wiegrebe, L. (2018). Mammalian models for the study of vocal learning: A new paradigm in bats. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 235-237). Toruń, Poland: NCU Press. doi:10.12775/3991-1.056.
  • Lauscher, A., Eckert, K., Galke, L., Scherp, A., Rizvi, S. T. R., Ahmed, S., Dengel, A., Zumstein, P., & Klein, A. (2018). Linked open citation database: Enabling libraries to contribute to an open and interconnected citation graph. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 109-118). New York: ACM. doi:10.1145/3197026.3197050.

    Abstract

    Citations play a crucial role in the scientific discourse, in information retrieval, and in bibliometrics. Many initiatives are currently promoting the idea of having free and open citation data. Creation of citation data, however, is not part of the cataloging workflow in libraries nowadays.
    In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that efficiently cataloging citations in libraries using a semi-automatic approach is possible. We specifically describe the current state of the workflow and its implementation. We show that we could significantly improve the automatic reference extraction that is crucial for the subsequent data curation. We further give insights on the curation and linking process and provide evaluation results that not only direct the further development of the project, but also allow us to discuss its overall feasibility.
  • Lefever, E., Hendrickx, I., Croijmans, I., Van den Bosch, A., & Majid, A. (2018). Discovering the language of wine reviews: A text mining account. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3297-3302). Paris: LREC.

    Abstract

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
  • Lenkiewicz, P., Drude, S., Lenkiewicz, A., Gebre, B. G., Masneri, S., Schreer, O., Schwenninger, J., & Bardeli, R. (2014). Application of audio and video processing methods for language research and documentation: The AVATecH Project. In Z. Vetulani, & J. Mariani (Eds.), 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25-27, 2011, Revised Selected Papers (pp. 288-299). Berlin: Springer.

    Abstract

    Evolution and changes of all modern languages is a wellknown fact. However, recently it is reaching dynamics never seen before, which results in loss of the vast amount of information encoded in every language. In order to preserve such rich heritage, and to carry out linguistic research, properly annotated recordings of world languages are necessary. Since creating those annotations is a very laborious task, reaching times 100 longer than the length of the annotated media, innovative video processing algorithms are needed, in order to improve the efficiency and quality of annotation process. This is the scope of the AVATecH project presented in this article
  • Lenkiewicz, P., Shkaravska, O., Goosen, T., Windhouwer, M., Broeder, D., Roth, S., & Olsson, O. (2014). The DWAN framework: Application of a web annotation framework for the general humanities to the domain of language resources. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3644-3649).
  • De León, L., & Levinson, S. C. (Eds.). (1992). Space in Mesoamerican languages [Special Issue]. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 45(6).
  • Lev-Ari, S., & Peperkamp, S. (2014). Do people converge to the linguistic patterns of non-reliable speakers? Perceptual learning from non-native speakers. In S. Fuchs, M. Grice, A. Hermes, L. Lancia, & D. Mücke (Eds.), Proceedings of the 10th International Seminar on Speech Production (ISSP) (pp. 261-264).

    Abstract

    People's language is shaped by the input from the environment. The environment, however, offers a range of linguistic inputs that differ in their reliability. We test whether listeners accordingly weigh input from sources that differ in reliability differently. Using a perceptual learning paradigm, we show that listeners adjust their representations according to linguistic input provided by native but not by non-native speakers. This is despite the fact that listeners are able to learn the characteristics of the speech of both speakers. These results provide evidence for a disassociation between adaptation to the characteristic of specific speakers and adjustment of linguistic representations in general based on these learned characteristics. This study also has implications for theories of language change. In particular, it cast doubts on the hypothesis that a large proportion of non-native speakers in a community can bring about linguistic changes
  • Levelt, W. J. M. (2005). Habitual perspective. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society (CogSci 2005).
  • Levelt, W. J. M., & Plomp, R. (1962). Musical consonance and critical bandwidth. In Proceedings of the 4th International Congress Acoustics (pp. 55-55).
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Lew, A. A., Hall-Lew, L., & Fairs, A. (2014). Language and Tourism in Sabah, Malaysia and Edinburgh, Scotland. In B. O'Rourke, N. Bermingham, & S. Brennan (Eds.), Opening New Lines of Communication in Applied Linguistics: Proceedings of the 46th Annual Meeting of the British Association for Applied Linguistics (pp. 253-259). London, UK: Scitsiugnil Press.
  • Little, H., & Silvey, C. (2014). Interpreting emerging structures: The interdependence of combinatoriality and compositionality. In Proceedings of the First Conference of the International Association for Cognitive Semiotics (IACS 2014) (pp. 113-114).
  • Little, H., & Eryilmaz, K. (2014). The effect of physical articulation constraints on the emergence of combinatorial structure. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 11-17).
  • Little, H., & De Boer, B. (2014). The effect of size of articulation space on the emergence of combinatorial structure. In E. Cartmill A., S. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th international conference (EvoLangX) (pp. 479-481). Singapore: World Scientific.
  • Liu, Z., Chen, A., & Van de Velde, H. (2014). Prosodic focus marking in Bai. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 628-631).

    Abstract

    This study investigates prosodic marking of focus in Bai, a Sino-Tibetan language spoken in the Southwest of China, by adopting a semi-spontaneous experimental approach. Our data show that Bai speakers increase the duration of the focused constituent and reduce the duration of the post-focus constituent to encode focus. However, duration is not used in Bai to distinguish focus types differing in size and contrastivity. Further, pitch plays no role in signaling focus and differentiating focus types. The results thus suggest that Bai uses prosody to mark focus, but to a lesser extent, compared to Mandarin Chinese, with which Bai has been in close contact for centuries, and Cantonese, to which Bai is similar in the tonal system, although Bai is similar to Cantonese in its reliance on duration in prosodic focus marking.
  • Liu, S., & Zhang, Y. (2019). Why some verbs are harder to learn than others – A micro-level analysis of everyday learning contexts for early verb learning. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2173-2178). Montreal, QB: Cognitive Science Society.

    Abstract

    Verb learning is important for young children. While most
    previous research has focused on linguistic and conceptual
    challenges in early verb learning (e.g. Gentner, 1982, 2006),
    the present paper examined early verb learning at the
    attentional level and quantified the input for early verb learning
    by measuring verb-action co-occurrence statistics in parent-
    child interaction from the learner’s perspective. To do so, we
    used head-mounted eye tracking to record fine-grained
    multimodal behaviors during parent-infant joint play, and
    analyzed parent speech, parent and infant action, and infant
    attention at the moments when parents produced verb labels.
    Our results show great variability across different action verbs,
    in terms of frequency of verb utterances, frequency of
    corresponding actions related to verb meanings, and infants’
    attention to verbs and actions, which provide new insights on
    why some verbs are harder to learn than others.
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources (LiNCR) (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three
    stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are
    accompanied with stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained
    nature of the data allows the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies.
    We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural
    language processing in the brain and to further stimulate data sharing in the field of neuroscience of language.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Mai, F., Galke, L., & Scherp, A. (2019). CBOW is not all you need: Combining CBOW with the compositional matrix space model. In Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). OpenReview.net.

    Abstract

    Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW's word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a
    learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2%.
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Mamus, E., Rissman, L., Majid, A., & Ozyurek, A. (2019). Effects of blindfolding on verbal and gestural expression of path in auditory motion events. In A. K. Goel, C. M. Seifert, & C. C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2275-2281). Montreal, QB: Cognitive Science Society.

    Abstract

    Studies have claimed that blind people’s spatial representations are different from sighted people, and blind people display superior auditory processing. Due to the nature of auditory and haptic information, it has been proposed that blind people have spatial representations that are more sequential than sighted people. Even the temporary loss of sight—such as through blindfolding—can affect spatial representations, but not much research has been done on this topic. We compared blindfolded and sighted people’s linguistic spatial expressions and non-linguistic localization accuracy to test how blindfolding affects the representation of path in auditory motion events. We found that blindfolded people were as good as sighted people when localizing simple sounds, but they outperformed sighted people when localizing auditory motion events. Blindfolded people’s path related speech also included more sequential, and less holistic elements. Our results indicate that even temporary loss of sight influences spatial representations of auditory motion events
  • Marcoux, K., & Ernestus, M. (2019). Differences between native and non-native Lombard speech in terms of pitch range. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the ICA 2019 and EAA Euroregio. 23rd International Congress on Acoustics, integrating 4th EAA Euroregio 2019 (pp. 5713-5720). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    Lombard speech, speech produced in noise, is acoustically different from speech produced in quiet (plain speech) in several ways, including having a higher and wider F0 range (pitch). Extensive research on native Lombard speech does not consider that non-natives experience a higher cognitive load while producing
    speech and that the native language may influence the non-native speech. We investigated pitch range in plain and Lombard speech in native and non-natives.
    Dutch and American-English speakers read contrastive question-answer pairs in quiet and in noise in English, while the Dutch also read Dutch sentence pairs. We found that Lombard speech is characterized by a wider pitch range than plain speech, for all speakers (native English, non-native English, and native Dutch).
    This shows that non-natives also widen their pitch range in Lombard speech. In sentences with early-focus, we see the same increase in pitch range when going from plain to Lombard speech in native and non-native English, but a smaller increase in native Dutch. In sentences with late-focus, we see the biggest increase for the native English, followed by non-native English and then native Dutch. Together these results indicate an effect of the native language on non-native Lombard speech.
  • Marcoux, K., & Ernestus, M. (2019). Pitch in native and non-native Lombard speech. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2605-2609). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Lombard speech, speech produced in noise, is
    typically produced with a higher fundamental
    frequency (F0, pitch) compared to speech in quiet. This paper examined the potential differences in native and non-native Lombard speech by analyzing median pitch in sentences with early- or late-focus produced in quiet and noise. We found an increase in pitch in late-focus sentences in noise for Dutch speakers in both English and Dutch, and for American-English speakers in English. These results
    show that non-native speakers produce Lombard speech, despite their higher cognitive load. For the early-focus sentences, we found a difference between the Dutch and the American-English speakers. Whereas the Dutch showed an increased F0 in noise
    in English and Dutch, the American-English speakers did not in English. Together, these results suggest that some acoustic characteristics of Lombard speech, such as pitch, may be language-specific, potentially
    resulting in the native language influencing the non-native Lombard speech.
  • Matic, D., & Nikolaeva, I. (2014). Focus feature percolation: Evidence from Tundra Nenets and Tundra Yukaghir. In S. Müller (Ed.), Proceedings of the 21st International Conference on Head-Driven Phrase Structure Grammar (HPSG 2014) (pp. 299-317). Stanford, CA: CSLI Publications.

    Abstract

    Two Siberian languages, Tundra Nenets and Tundra Yukaghir, do not obey strong island constraints in questioning: any sub-constituent of a relative or adverbial clause can be questioned. We argue that this has to do with how focusing works in these languages. The focused sub-constituent remains in situ, but there is abundant morphosyntactic evidence that the focus feature is passed up to the head of the clause. The result is the formation of a complex focus structure in which both the head and non head daughter are overtly marked as focus, and they are interpreted as a pairwise list such that the focus background is applicable to this list, but not to other alternative lists
  • McQueen, J. M., & Mitterer, H. (2005). Lexically-driven perceptual adjustments of vowel categories. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP2005) (pp. 233-236).
  • McQueen, J. M., & Cutler, A. (1992). Words within words: Lexical statistics and lexical access. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 221-224). Alberta: University of Alberta.

    Abstract

    This paper presents lexical statistics on the pattern of occurrence of words embedded in other words. We report the results of an analysis of 25000 words, varying in length from two to six syllables, extracted from a phonetically-coded English dictionary (The Longman Dictionary of Contemporary English). Each syllable, and each string of syllables within each word was checked against the dictionary. Two analyses are presented: the first used a complete list of polysyllables, with look-up on the entire dictionary; the second used a sublist of content words, counting only embedded words which were themselves content words. The results have important implications for models of human speech recognition. The efficiency of these models depends, in different ways, on the number and location of words within words.
  • Merkx, D., Frank, S., & Ernestus, M. (2019). Language learning using speech to image retrieval. In Proceedings of Interspeech 2019 (pp. 1841-1845). doi:10.21437/Interspeech.2019-3067.

    Abstract

    Humans learn language by interaction with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling and vectorial self-attention our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep Convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Micklos, A. (2014). The nature of language in interaction. In E. Cartmill, S. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference.
  • Mitterer, H. (2005). Short- and medium-term plasticity for speaker adaptation seem to be independent. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP2005) (pp. 83-86).
  • Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.

    Abstract

    This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes
  • Moisik, S. R., Zhi Yun, D. P., & Dediu, D. (2019). Active adjustment of the cervical spine during pitch production compensates for shape: The ArtiVarK study. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 864-868). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    The anterior lordosis of the cervical spine is thought
    to contribute to pitch (fo) production by influencing
    cricoid rotation as a function of larynx height. This
    study examines the matter of inter-individual
    variation in cervical spine shape and whether this has
    an influence on how fo is produced along increasing
    or decreasing scales, using the ArtiVarK dataset,
    which contains real-time MRI pitch production data.
    We find that the cervical spine actively participates in
    fo production, but the amount of displacement
    depends on individual shape. In general, anterior
    spine motion (tending toward cervical lordosis)
    occurs for low fo, while posterior movement (tending
    towards cervical kyphosis) occurs for high fo.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light conditionspecific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2019). ERP signal analysis with temporal resolution using a time window bank. In Proceedings of Interspeech 2019 (pp. 1208-1212). doi:10.21437/Interspeech.2019-2729.

    Abstract

    In order to study the cognitive processes underlying speech comprehension, neuro-physiological measures (e.g., EEG and MEG), or behavioural measures (e.g., reaction times and response accuracy) can be applied. Compared to behavioural measures, EEG signals can provide a more fine-grained and complementary view of the processes that take place during the unfolding of an auditory stimulus.

    EEG signals are often analysed after having chosen specific time windows, which are usually based on the temporal structure of ERP components expected to be sensitive to the experimental manipulation. However, as the timing of ERP components may vary between experiments, trials, and participants, such a-priori defined analysis time windows may significantly hamper the exploratory power of the analysis of components of interest. In this paper, we explore a wide-window analysis method applied to EEG signals collected in an auditory repetition priming experiment.

    This approach is based on a bank of temporal filters arranged along the time axis in combination with linear mixed effects modelling. Crucially, it permits a temporal decomposition of effects in a single comprehensive statistical model which captures the entire EEG trace.
  • Norris, D., Van Ooijen, B., & Cutler, A. (1992). Speeded detection of vowels and steady-state consonants. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing; Vol. 2 (pp. 1055-1058). Alberta: University of Alberta.

    Abstract

    We report two experiments in which vowels and steady-state consonants served as targets in a speeded detection task. In the first experiment, two vowels were compared with one voiced and once unvoiced fricative. Response times (RTs) to the vowels were longer than to the fricatives. The error rate was higher for the consonants. Consonants in word-final position produced the shortest RTs, For the vowels, RT correlated negatively with target duration. In the second experiment, the same two vowel targets were compared with two nasals. This time there was no significant difference in RTs, but the error rate was still significantly higher for the consonants. Error rate and length correlated negatively for the vowels only. We conclude that RT differences between phonemes are independent of vocalic or consonantal status. Instead, we argue that the process of phoneme detection reflects more finely grained differences in acoustic/articulatory structure within the phonemic repertoire.
  • Ortega, G., Sumer, B., & Ozyurek, A. (2014). Type of iconicity matters: Bias for action-based signs in sign language acquisition. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1114-1119). Austin, Tx: Cognitive Science Society.

    Abstract

    Early studies investigating sign language acquisition claimed
    that signs whose structures are motivated by the form of their
    referent (iconic) are not favoured in language development.
    However, recent work has shown that the first signs in deaf
    children’s lexicon are iconic. In this paper we go a step
    further and ask whether different types of iconicity modulate
    learning sign-referent links. Results from a picture description
    task indicate that children and adults used signs with two
    possible variants differentially. While children signing to
    adults favoured variants that map onto actions associated with
    a referent (action signs), adults signing to another adult
    produced variants that map onto objects’ perceptual features
    (perceptual signs). Parents interacting with children used
    more action variants than signers in adult-adult interactions.
    These results are in line with claims that language
    development is tightly linked to motor experience and that
    iconicity can be a communicative strategy in parental input.
  • Parhammer*, S. I., Ebersberg*, M., Tippmann*, J., Stärk*, K., Opitz, A., Hinger, B., & Rossi, S. (2019). The influence of distraction on speech processing: How selective is selective attention? In Proceedings of Interspeech 2019 (pp. 3093-3097). doi:10.21437/Interspeech.2019-2699.

    Abstract

    -* indicates shared first authorship -
    The present study investigated the effects of selective attention on the processing of morphosyntactic errors in unattended parts of speech. Two groups of German native (L1) speakers participated in the present study. Participants listened to sentences in which irregular verbs were manipulated in three different conditions (correct, incorrect but attested ablaut pattern, incorrect and crosslinguistically unattested ablaut pattern). In order to track fast dynamic neural reactions to the stimuli, electroencephalography was used. After each sentence, participants in Experiment 1 performed a semantic judgement task, which deliberately distracted the participants from the syntactic manipulations and directed their attention to the semantic content of the sentence. In Experiment 2, participants carried out a syntactic judgement task, which put their attention on the critical stimuli. The use of two different attentional tasks allowed for investigating the impact of selective attention on speech processing and whether morphosyntactic processing steps are performed automatically. In Experiment 2, the incorrect attested condition elicited a larger N400 component compared to the correct condition, whereas in Experiment 1 no differences between conditions were found. These results suggest that the processing of morphosyntactic violations in irregular verbs is not entirely automatic but seems to be strongly affected by selective attention.
  • Peeters, D., Azar, Z., & Ozyurek, A. (2014). The interplay between joint attention, physical proximity, and pointing gesture in demonstrative choice. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1144-1149). Austin, Tx: Cognitive Science Society.
  • Perlman, M., Clark, N., & Tanner, J. (2014). Iconicity and ape gesture. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 236-243). New Jersey: World Scientific.

    Abstract

    Iconic gestures are hypothesized to be c rucial to the evolution of language. Yet the important question of whether apes produce iconic gestures is the subject of considerable debate. This paper presents the current state of research on iconicity in ape gesture. In particular, it describes some of the empirical evidence suggesting that apes produce three different kinds of iconic gestures; it compares the iconicity hypothesis to other major hypotheses of ape gesture; and finally, it offers some directions for future ape gesture research
  • Petersson, K. M., Grenholm, P., & Forkstam, C. (2005). Artificial grammar learning and neural networks. In G. B. Bruna, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the Cognitive Science Society (pp. 1726-1731).

    Abstract

    Recent FMRI studies indicate that language related brain regions are engaged in artificial grammar (AG) processing. In the present study we investigate the Reber grammar by means of formal analysis and network simulations. We outline a new method for describing the network dynamics and propose an approach to grammar extraction based on the state-space dynamics of the network. We conclude that statistical frequency-based and rule-based acquisition procedures can be viewed as complementary perspectives on grammar learning, and more generally, that classical cognitive models can be viewed as a special case of a dynamical systems perspective on information processing
  • Poletiek, F. H., & Rassin E. (Eds.). (2005). Het (on)bewuste [Special Issue]. De Psycholoog.
  • Pouw, W., Paxton, A., Harrison, S. J., & Dixon, J. A. (2019). Acoustic specification of upper limb movement in voicing. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 68-74). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.
  • Pouw, W., & Dixon, J. A. (2019). Quantifying gesture-speech synchrony. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 75-80). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.

    Abstract

    Spontaneously occurring speech is often seamlessly accompanied by hand gestures. Detailed
    observations of video data suggest that speech and gesture are tightly synchronized in time,
    consistent with a dynamic interplay between body and mind. However, spontaneous gesturespeech
    synchrony has rarely been objectively quantified beyond analyses of video data, which
    do not allow for identification of kinematic properties of gestures. Consequently, the point in
    gesture which is held to couple with speech, the so-called moment of “maximum effort”, has
    been variably equated with the peak velocity, peak acceleration, peak deceleration, or the onset
    of the gesture. In the current exploratory report, we provide novel evidence from motiontracking
    and acoustic data that peak velocity is closely aligned, and shortly leads, the peak pitch
    (F0) of speech

    Additional information

    https://osf.io/9843h/
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Ravignani, A., Bowling, D., & Kirby, S. (2014). The psychology of biological clocks: A new framework for the evolution of rhythm. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 262-269). Singapore: World Scientific.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Rissman, L., & Majid, A. (2019). Agency drives category structure in instrumental events. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2661-2667). Montreal, QB: Cognitive Science Society.

    Abstract

    Thematic roles such as Agent and Instrument have a long-standing place in theories of event representation. Nonetheless, the structure of these categories has been difficult to determine. We investigated how instrumental events, such as someone slicing bread with a knife, are categorized in English. Speakers described a variety of typical and atypical instrumental events, and we determined the similarity structure of their descriptions using correspondence analysis. We found that events where the instrument is an extension of an intentional agent were most likely to elicit similar language, highlighting the importance of agency in structuring instrumental categories.
  • Roberts, S. G., Dediu, D., & Levinson, S. C. (2014). Detecting differences between the languages of Neandertals and modern humans. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 501-502). Singapore: World Scientific.

    Abstract

    Dediu and Levinson (2013) argue that Neandertals had essentially modern language and speech, and that they were in genetic contact with the ancestors of modern humans during our dispersal out of Africa. This raises the possibility of cultural and linguistic contact between the two human lineages. If such contact did occur, then it might have influenced the cultural evolution of the languages. Since the genetic traces of contact with Neandertals are limited to the populations outside of Africa, Dediu & Levinson predict that there may be structural differences between the present-day languages derived from languages in contact with Neanderthals, and those derived from languages that were not influenced by such contact. Since the signature of such deep contact might reside in patterns of features, they suggested that machine learning methods may be able to detect these differences. This paper attempts to test this hypothesis and to estimate particular linguistic features that are potential candidates for carrying a signature of Neandertal languages.
  • Roberts, S. G., & De Vos, C. (2014). Gene-culture coevolution of a linguistic system in two modalities. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 23-27).

    Abstract

    Complex communication can take place in a range of modalities such as auditory, visual, and tactile modalities. In a very general way, the modality that individuals use is constrained by their biological biases (humans cannot use magnetic fields directly to communicate to each other). The majority of natural languages have a large audible component. However, since humans can learn sign languages just as easily, it’s not clear to what extent the prevalence of spoken languages is due to biological biases, the social environment or cultural inheritance. This paper suggests that we can explore the relative contribution of these factors by modelling the spontaneous emergence of sign languages that are shared by the deaf and hearing members of relatively isolated communities. Such shared signing communities have arisen in enclaves around the world and may provide useful insights by demonstrating how languages evolve as the deaf proportion of its members has strong biases towards the visual language modality. In this paper we describe a model of cultural evolution in two modalities, combining aspects that are thought to impact the emergence of sign languages in a more general evolutionary framework. The model can be used to explore hypotheses about how sign languages emerge.
  • Roberts, S. G., Thompson, B., & Smith, K. (2014). Social interaction influences the evolution of cognitive biases for language. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 278-285). Singapore: World Scientific. doi:0.1142/9789814603638_0036.

    Abstract

    Models of cultural evolution demonstrate that the link between individual biases and population- level phenomena can be obscured by the process of cultural transmission (Kirby, Dowman, & Griffiths, 2007). However, recent extensions to these models predict that linguistic diversity will not emerge and that learners should evolve to expect little linguistic variation in their input (Smith & Thompson, 2012). We demonstrate that this result derives from assumptions that privilege certain kinds of social interaction by exploring a range of alternative social models. We find several evolutionary routes to linguistic diversity, and show that social interaction not only influences the kinds of biases which could evolve to support language, but also the effects those biases have on a linguistic system. Given the same starting situation, the evolution of biases for language learning and the distribution of linguistic variation are affected by the kinds of social interaction that a population privileges.
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The difference between the average evaluation results of the best title-based retrieval models is only 3% less than those of the best full-text-based retrieval models.
  • Sauter, D., Wiland, J., Warren, J., Eisner, F., Calder, A., & Scott, S. K. (2005). Sounds of joy: An investigation of vocal expressions of positive emotions [Abstract]. Journal of Cognitive Neuroscience, 61(Supplement), B99.

    Abstract

    A series of experiment tested Ekman’s (1992) hypothesis that there are a set of positive basic emotions that are expressed using vocal para-linguistic sounds, e.g. laughter and cheers. The proposed categories investigated were amusement, contentment, pleasure, relief and triumph. Behavioural testing using a forced-choice task indicated that participants were able to reliably recognize vocal expressions of the proposed emotions. A cross-cultural study in the preliterate Himba culture in Namibia confirmed that these categories are also recognized across cultures. A recognition test of acoustically manipulated emotional vocalizations established that the recognition of different emotions utilizes different vocal cues, and that these in turn differ from the cues used when comprehending speech. In a study using fMRI we found that relative to a signal correlated noise baseline, the paralinguistic expressions of emotion activated bilateral superior temporal gyri and sulci, lateral and anterior to primary auditory cortex, which is consistent with the processing of non linguistic vocal cues in the auditory ‘what’ pathway. Notably amusement was associated with greater activation extending into both temporal poles and amygdale and insular cortex. Overall, these results support the claim that ‘happiness’ can be fractionated into amusement, pleasure, relief and triumph.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech
    recognition. While previous work has shown that Fine-Tracker
    is successful at modelling aspects of human spoken-word
    recognition, its speech recognition performance is not
    comparable to that of human performance, possibly due to
    suboptimal intermediate articulatory feature (AF)
    representations. This study investigates the effect of improved
    AF representations, obtained using a state-of-the-art deep
    convolutional network, on Fine-Tracker’s simulation and
    recognition performance: Although the improved AF quality
    resulted in improved speech recognition; it, surprisingly, did
    not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., & Seneff, S. (2005). A two-pass strategy for handling OOVs in a large vocabulary recognition task. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, (pp. 1669-1672). ISCA Archive.

    Abstract

    This paper addresses the issue of large-vocabulary recognition in a specific word class. We propose a two-pass strategy in which only major cities are explicitly represented in the first stage lexicon. An unknown word model encoded as a phone loop is used to detect OOV city names (referred to as rare city names). After which SpeM, a tool that can extract words and word-initial cohorts from phone graphs on the basis of a large fallback lexicon, provides an N-best list of promising city names on the basis of the phone sequences generated in the first stage. This N-best list is then inserted into the second stage lexicon for a subsequent recognition pass. Experiments were conducted on a set of spontaneous telephone-quality utterances each containing one rare city name. We tested the size of the N-best list and three types of language models (LMs). The experiments showed that SpeM was able to include nearly 85% of the correct city names into an N-best list of 3000 city names when a unigram LM, which also boosted the unigram scores of a city name in a given state, was used.
  • Scharenborg, O. (2005). Parallels between HSR and ASR: How ASR can contribute to HSR. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1237-1240). ISCA Archive.

    Abstract

    In this paper, we illustrate the close parallels between the research fields of human speech recognition (HSR) and automatic speech recognition (ASR) using a computational model of human word recognition, SpeM, which was built using techniques from ASR. We show that ASR has proven to be useful for improving models of HSR by relieving them of some of their shortcomings. However, in order to build an integrated computational model of all aspects of HSR, a lot of issues remain to be resolved. In this process, ASR algorithms and techniques definitely can play an important role.
  • Schmidt, J., Janse, E., & Scharenborg, O. (2014). Age, hearing loss and the perception of affective utterances in conversational speech. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 1929-1933).

    Abstract

    This study investigates whether age and/or hearing loss influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech fragments. Specifically, this study focuses on the relationship between participants' ratings of affective speech and acoustic parameters known to be associated with arousal and valence (mean F0, intensity, and articulation rate). Ten normal-hearing younger and ten older adults with varying hearing loss were tested on two rating tasks. Stimuli consisted of short sentences taken from a corpus of conversational affective speech. In both rating tasks, participants estimated the value of the emotion dimension at hand using a 5-point scale. For arousal, higher intensity was generally associated with higher arousal in both age groups. Compared to younger participants, older participants rated the utterances as less aroused, and showed a smaller effect of intensity on their arousal ratings. For valence, higher mean F0 was associated with more negative ratings in both age groups. Generally, age group differences in rating affective utterances may not relate to age group differences in hearing loss, but rather to other differences between the age groups, as older participants' rating patterns were not associated with their individual hearing loss.
  • Schoenmakers, G.-J., & De Swart, P. (2019). Adverbial hurdles in Dutch scrambling. In A. Gattnar, R. Hörnig, M. Störzer, & S. Featherston (Eds.), Proceedings of Linguistic Evidence 2018: Experimental Data Drives Linguistic Theory (pp. 124-145). Tübingen: University of Tübingen.

    Abstract

    This paper addresses the role of the adverb in Dutch direct object scrambling constructions. We report four experiments in which we investigate whether the structural position and the scope sensitivity of the adverb affect acceptability judgments of scrambling constructions and native speakers' tendency to scramble definite objects. We conclude that the type of adverb plays a key role in Dutch word ordering preferences.
  • Schuerman, W. L., McQueen, J. M., & Meyer, A. S. (2019). Speaker statistical averageness modulates word recognition in adverse listening conditions. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1203-1207). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    We tested whether statistical averageness (SA) at the level of the individual speaker could predict a speaker’s intelligibility. 28 female and 21 male speakers of Dutch were recorded producing 336 sentences,
    each containing two target nouns. Recordings were compared to those of all other same-sex speakers using dynamic time warping (DTW). For each sentence, the DTW distance constituted a metric
    of phonetic distance from one speaker to all other speakers. SA comprised the average of these distances. Later, the same participants performed a word recognition task on the target nouns in the same sentences, under three degraded listening conditions. In all three conditions, accuracy increased with SA. This held even when participants listened to their own utterances. These findings suggest that listeners process speech with respect to the statistical
    properties of the language spoken in their community, rather than using their own speech as a reference
  • Scott, D. R., & Cutler, A. (1982). Segmental cues to syntactic structure. In Proceedings of the Institute of Acoustics 'Spectral Analysis and its Use in Underwater Acoustics' (pp. E3.1-E3.4). London: Institute of Acoustics.
  • Seidlmayer, E., Galke, L., Melnychuk, T., Schultz, C., Tochtermann, K., & Förstner, K. U. (2019). Take it personally - A Python library for data enrichment for infometrical applications. In M. Alam, R. Usbeck, T. Pellegrini, H. Sack, & Y. Sure-Vetter (Eds.), Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems co-located with 15th International Conference on Semantic Systems (SEMANTiCS 2019).

    Abstract

    Like every other social sphere, science is influenced by individual characteristics of researchers. However, for investigations on scientific networks, only little data about the social background of researchers, e.g. social origin, gender, affiliation etc., is available.
    This paper introduces ”Take it personally - TIP”, a conceptual model and library currently under development, which aims to support the
    semantic enrichment of publication databases with semantically related background information which resides elsewhere in the (semantic) web, such as Wikidata.
    The supplementary information enriches the original information in the publication databases and thus facilitates the creation of complex scientific knowledge graphs. Such enrichment helps to improve the scientometric analysis of scientific publications as they can also take social backgrounds of researchers into account and to understand social structure in research communities.
  • Seijdel, N., Sakmakidis, N., De Haan, E. H. F., Bohte, S. M., & Scholte, H. S. (2019). Implicit scene segmentation in deeper convolutional neural networks. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 1059-1062). doi:10.32470/CCN.2019.1149-0.

    Abstract

    Feedforward deep convolutional neural networks (DCNNs) are matching and even surpassing human performance on object recognition. This performance suggests that activation of a loose collection of image
    features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Recent findings in humans however, suggest that while feedforward activity may suffice for
    sparse scenes with isolated objects, additional visual operations ('routines') that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to
    performance of DCNNs with increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds they appear on. To this end, we controlled the information in both objects
    and backgrounds, as well as the relationship between them by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicated less distinction between object- and background features for more shallow networks. For those networks, we observed a benefit of training on segmented objects (as compared to unsegmented objects). Overall, deeper networks trained on natural
    (unsegmented) scenes seem to perform implicit 'segmentation' of the objects from their background, possibly by improved selection of relevant features.
  • Seuren, P. A. M. (1969). Generatieve grammatika en semantiek. In Handelingen van het XXVII Vlaams Filologencongres (pp. 276-282).
  • Seuren, P. A. M. (1969). Il concetto di regola grammaticale. In La sintassi: atti del 3 Convegno internazionale di studi, Roma, 17-18 maggio 1969 (pp. 125-141). Rome: Bulzoni.
  • Seuren, P. A. M. (1982). Riorientamenti metodologici nello studio della variabilità linguistica. In D. Gambarara, & A. D'Atri (Eds.), Ideologia, filosofia e linguistica: Atti del Convegno Internazionale di Studi, Rende (CS) 15-17 Settembre 1978 ( (pp. 499-515). Roma: Bulzoni.
  • Seuren, P. A. M. (2014). Scope and external datives. In B. Cornillie, C. Hamans, & D. Jaspers (Eds.), Proceedings of a mini-symposium on Pieter Seuren's 80th birthday organised at the 47th Annual Meeting of the Societas Linguistica Europaea.

    Abstract

    In this study it is argued that scope, as a property of scope‐creating operators, is a real and important element in the semantico‐grammatical description of languages. The notion of scope is illustrated and, as far as possible, defined. A first idea is given of the ‘grammar of scope’, which defines the relation between scope in the logically structured semantic analysis (SA) of sentences on the one hand and surface structure on the other. Evidence is adduced showing that peripheral preposition phrases (PPPs) in the surface structure of sentences represent scope‐creating operators in SA, and that external datives fall into this category: they are scope‐creating PPPs. It follows that, in English and Dutch, the internal dative (I gave John a book) and the external dative (I gave a book to John) are not simple syntactic variants expressing the same meaning. Instead, internal datives are an integral part of the argument structure of the matrix predicate, whereas external datives represent scope‐creating operators in SA. In the Romance languages, the (non‐pronominal) external dative has been re‐analysed as an argument type dative, but this has not happened in English and Dutch, which have many verbs that only allow for an external dative (e.g. donate, reveal). When both datives are allowed, there are systematic semantic differences, including scope differences.
  • Shen, C., & Janse, E. (2019). Articulatory control in speech production. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2533-2537). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Shen, C., Cooke, M., & Janse, E. (2019). Individual articulatory control in speech enrichment. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the 23rd International Congress on Acoustics (pp. 5726-5730). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    ndividual talkers may use various strategies to enrich their speech while speaking in noise (i.e., Lombard speech) to improve their intelligibility. The resulting acoustic-phonetic changes in Lombard speech vary amongst different speakers, but it is unclear what causes these talker differences, and what impact these differences have on intelligibility. This study investigates the potential role of articulatory control in talkers’ Lombard speech enrichment success. Seventy-eight speakers read out sentences in both their habitual style and in a condition where they were instructed to speak clearly while hearing loud speech-shaped noise. A diadochokinetic (DDK) speech task that requires speakers to repetitively produce word or non-word sequences as accurately and as rapidly as possible, was used to quantify their articulatory control. Individuals’ predicted intelligibility in both speaking styles (presented at -5 dB SNR) was measured using an acoustic glimpse-based metric: the High-Energy Glimpse Proportion (HEGP). Speakers’ HEGP scores show a clear effect of speaking condition (better HEGP scores in the Lombard than habitual condition), but no simple effect of articulatory control on HEGP, nor an interaction between speaking condition and articulatory control. This indicates that individuals’ speech enrichment success as measured by the HEGP metric was not predicted by DDK performance.
  • Shkaravska, O., Van Eekelen, M., & Tamalet, A. (2014). Collected size semantics for strict functional programs over general polymorphic lists. In U. Dal Lago, & R. Pena (Eds.), Foundational and Practical Aspects of Resource Analysis: Third International Workshop, FOPARA 2013, Bertinoro, Italy, August 29-31, 2013, Revised Selected Papers (pp. 143-159). Berlin: Springer.

    Abstract

    Size analysis can be an important part of heap consumption analysis. This paper is a part of ongoing work about typing support for checking output-on-input size dependencies for function definitions in a strict functional language. A significant restriction for our earlier results is that inner data structures (e.g. in a list of lists) all must have the same size. Here, we make a big step forwards by overcoming this limitation via the introduction of higher-order size annotations such that variate sizes of inner data structures can be expressed. In this way the analysis becomes applicable for general, polymorphic nested lists.
  • Sidnell, J., & Stivers, T. (Eds.). (2005). Multimodal Interaction [Special Issue]. Semiotica, 156.
  • De Smedt, K., Hinrichs, E., Meurers, D., Skadiņa, I., Sanford Pedersen, B., Navarretta, C., Bel, N., Lindén, K., Lopatková, M., Hajič, J., Andersen, G., & Lenkiewicz, P. (2014). CLARA: A new generation of researchers in common language resources and their applications. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 2166-2174).
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). Examining strains and symptoms of the ‘Literacy Virus’: The effects of orthographic transparency on phonological processing in a connectionist model of reading. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    The effect of literacy on phonological processing has been described in terms of a virus that “infects all speech processing” (Frith, 1998). Empirical data has established that literacy leads to changes to the way in which phonological information is processed. Harm & Seidenberg (1999) demonstrated that a connectionist network trained to map between English orthographic and phonological representations display’s more componential phonological processing than a network trained only to stably represent the phonological forms of words. Within this study we use a similar model yet manipulate the transparency of orthographic-to-phonological mappings. We observe that networks trained on a transparent orthography are better at restoring phonetic features and phonemes. However, networks trained on non-transparent orthographies are more likely to restore corrupted phonological segments with legal, coarser linguistic units (e.g. onset, coda). Our study therefore provides an explicit description of how differences in orthographic transparency can lead to varying strains and symptoms of the ‘literacy virus’.
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). A comprehensive model of spoken word recognition must be multimodal: Evidence from studies of language-mediated visual attention. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    When processing language, the cognitive system has access to information from a range of modalities (e.g. auditory, visual) to support language processing. Language mediated visual attention studies have shown sensitivity of the listener to phonological, visual, and semantic similarity when processing a word. In a computational model of language mediated visual attention, that models spoken word processing as the parallel integration of information from phonological, semantic and visual processing streams, we simulate such effects of competition within modalities. Our simulations raised untested predictions about stronger and earlier effects of visual and semantic similarity compared to phonological similarity around the rhyme of the word. Two visual world studies confirmed these predictions. The model and behavioral studies suggest that, during spoken word comprehension, multimodal information can be recruited rapidly to constrain lexical selection to the extent that phonological rhyme information may exert little influence on this process.
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

    Additional information

    link to conference website
  • Sprenger, S. A., & Van Rijn, H. (2005). Clock time naming: Complexities of a simple task. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 2062-2067).
  • Sumer, B., Perniss, P., Zwitserlood, I., & Ozyurek, A. (2014). Learning to express "left-right" & "front-behind" in a sign versus spoken language. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1550-1555). Austin, Tx: Cognitive Science Society.

    Abstract

    Developmental studies show that it takes longer for
    children learning spoken languages to acquire viewpointdependent
    spatial relations (e.g., left-right, front-behind),
    compared to ones that are not viewpoint-dependent (e.g.,
    in, on, under). The current study investigates how
    children learn to express viewpoint-dependent relations
    in a sign language where depicted spatial relations can be
    communicated in an analogue manner in the space in
    front of the body or by using body-anchored signs (e.g.,
    tapping the right and left hand/arm to mean left and
    right). Our results indicate that the visual-spatial
    modality might have a facilitating effect on learning to
    express these spatial relations (especially in encoding of
    left-right) in a sign language (i.e., Turkish Sign
    Language) compared to a spoken language (i.e.,
    Turkish).
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in longterm effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., Mulder, K., & Boves, L. (2019). Phase synchronization between EEG signals as a function of differences between stimuli characteristics. In Proceedings of Interspeech 2019 (pp. 1213-1217). doi:10.21437/Interspeech.2019-2443.

    Abstract

    The neural processing of speech leads to specific patterns in the brain which can be measured as, e.g., EEG signals. When properly aligned with the speech input and averaged over many tokens, the Event Related Potential (ERP) signal is able to differentiate specific contrasts between speech signals. Well-known effects relate to the difference between expected and unexpected words, in particular in the N400, while effects in N100 and P200 are related to attention and acoustic onset effects. Most EEG studies deal with the amplitude of EEG signals over time, sidestepping the effect of phase and phase synchronization. This paper investigates the relation between phase in the EEG signals measured in an auditory lexical decision task by Dutch participants listening to full and reduced English word forms. We show that phase synchronization takes place across stimulus conditions, and that the so-called circular variance is narrowly related to the type of contrast between stimuli.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).

    Abstract

    This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences
  • ten Bosch, L., & Scharenborg, O. (2005). ASR decoding in a computational model of human word recognition. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1241-1244). ISCA Archive.

    Abstract

    This paper investigates the interaction between acoustic scores and symbolic mismatch penalties in multi-pass speech decoding techniques that are based on the creation of a segment graph followed by a lexical search. The interaction between acoustic and symbolic mismatches determines to a large extent the structure of the search space of these multipass approaches. The background of this study is a recently developed computational model of human word recognition, called SpeM. SpeM is able to simulate human word recognition data and is built as a multi-pass speech decoder. Here, we focus on unravelling the structure of the search space that is used in SpeM and similar decoding strategies. Finally, we elaborate on the close relation between distances in this search space, and distance measures in search spaces that are based on a combination of acoustic and phonetic features.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech tech-nology and in particular in automatic speech recognition hasyielded substantial performance improvements. This suggeststhat deep neural networks (DNNs) are able to capture structurein speech data that older methods for acoustic modeling, suchas Gaussian Mixture Models and shallow neural networks failto uncover. In image recognition it is possible to link repre-sentations on the first couple of layers in DNNs to structuralproperties of images, and to representations on early layers inthe visual cortex. This raises the question whether it is possi-ble to accomplish a similar feat with representations on DNNlayers when processing speech input. In this paper we presentthree different experiments in which we attempt to untanglehow DNNs encode speech signals, and to relate these repre-sentations to phonetic knowledge, with the aim to advance con-ventional phonetic concepts and to choose the topology of aDNNs more efficiently. Two experiments investigate represen-tations formed by auto-encoders. A third experiment investi-gates representations on convolutional layers that treat speechspectrograms as if they were images. The results lay the basisfor future experiments with recursive networks.
  • Ter Bekke, M., Ozyurek, A., & Ünal, E. (2019). Speaking but not gesturing predicts motion event memory within and across languages. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2940-2946). Montreal, QB: Cognitive Science Society.

    Abstract

    In everyday life, people see, describe and remember motion events. We tested whether the type of motion event information (path or manner) encoded in speech and gesture predicts which information is remembered and if this varies across speakers of typologically different languages. We focus on intransitive motion events (e.g., a woman running to a tree) that are described differently in speech and co-speech gesture across languages, based on how these languages typologically encode manner and path information (Kita & Özyürek, 2003; Talmy, 1985). Speakers of Dutch (n = 19) and Turkish (n = 22) watched and described motion events. With a surprise (i.e. unexpected) recognition memory task, memory for manner and path components of these events was measured. Neither Dutch nor Turkish speakers’ memory for manner went above chance levels. However, we found a positive relation between path speech and path change detection: participants who described the path during encoding were more accurate at detecting changes to the path of an event during the memory task. In addition, the relation between path speech and path memory changed with native language: for Dutch speakers encoding path in speech was related to improved path memory, but for Turkish speakers no such relation existed. For both languages, co-speech gesture did not predict memory speakers. We discuss the implications of these findings for our understanding of the relations between speech, gesture, type of encoding in language and memory.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world
  • Torreira, F., Roberts, S. G., & Hammarström, H. (2014). Functional trade-off between lexical tone and intonation: Typological evidence from polar-question marking. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 100-103).

    Abstract

    Tone languages are often reported to make use of utterancelevel intonation as well as of lexical tone. We test the alternative hypotheses that a) the coexistence of lexical tone and utterance-level intonation in tone languages results in a diminished functional load for intonation, and b) that lexical tone and intonation can coexist in tone languages without undermining each other’s functional load in a substantial way. In order to do this, we collected data from two large typological databases, and performed mixed-effects and phylogenetic regression analyses controlling for genealogical and areal factors to estimate the probability of a language exhibiting grammatical devices for encoding polar questions given its status as a tonal or an intonation-only language. Our analyses indicate that, while both tone and intonational languages tend to develop grammatical devices for marking polar questions above chance level, tone languages do this at a significantly higher frequency, with estimated probabilities ranging between 0.88 and .98. This statistical bias provides cross-linguistic empirical support to the view that the use of tonal features to mark lexical contrasts leads to a diminished functional load for utterance-level intonation.
  • Torreira, F., Simonet, M., & Hualde, J. I. (2014). Quasi-neutralization of stress contrasts in Spanish. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 197-201).

    Abstract

    We investigate the realization and discrimination of lexical stress contrasts in pitch-unaccented words in phrase-medial position in Spanish, a context in which intonational pitch accents are frequently absent. Results from production and perception experiments show that in this context durational and intensity cues to stress are produced by speakers and used by listeners above chance level. However, due to substantial amounts of phonetic overlap between stress categories in production, and of numerous errors in the identification of stress categories in perception, we suggest that, in the absence of intonational cues, Spanish speakers engaged in online language use must rely on contextual information in order to distinguish stress contrasts.
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using eitheranexpression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, along-lasting debate in psycholinguistics concerningwhether the latter hinders referential processing. We present evidence from an eyetrackingexperiment recordingfixations as well asthe Index of Cognitive Activity –a novel measure of cognitive workload –supporting the view that over-specifications facilitate processing. We further present originalevidence that, above and beyond the effect of specificity,referring expressions thatuniformly reduce referential entropyalso benefitprocessing
  • Trippel, T., Broeder, D., Durco, M., & Ohren, O. (2014). Towards automatic quality assessment of component metadata. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3851-3856).

    Abstract

    Measuring the quality of metadata is only possible by assessing the quality of the underlying schema and the metadata instance. We propose some factors that are measurable automatically for metadata according to the CMD framework, taking into account the variability of schemas that can be defined in this framework. The factors include among others the number of elements, the (re-)use of reusable components, the number of filled in elements. The resulting score can serve as an indicator of the overall quality of the CMD instance, used for feedback to metadata providers or to provide an overview of the overall quality of metadata within a reposi-tory. The score is independent of specific schemas and generalizable. An overall assessment of harvested metadata is provided in form of statistical summaries and the distribution, based on a corpus of harvested metadata. The score is implemented in XQuery and can be used in tools, editors and repositories
  • Troncoso Ruiz, A., Ernestus, M., & Broersma, M. (2019). Learning to produce difficult L2 vowels: The effects of awareness-rasing, exposure and feedback. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1094-1098). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i. e., songs to add to a playlist, a recommender system can increase the user engagement by making playlist creation easier, as well as extending listening beyond the end of current playlist. The ACM Recommender Systems Challenge 2018 focuses on such task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Valtersson, E., & Torreira, F. (2014). Rising intonation in spontaneous French: How well can continuation statements and polar questions be distinguished? In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 785-789).

    Abstract

    This study investigates whether a clear distinction can be made between the prosody of continuation statements and polar questions in conversational French, which are both typically produced with final rising intonation. We show that the two utterance types can be distinguished over chance level by several pitch, duration, and intensity cues. However, given the substantial amount of phonetic overlap and the nature of the observed differences between the two utterance types (i.e. overall F0 scaling, final intensity drop and degree of final lengthening), we propose that variability in the phonetic detail of intonation rises in French is due to the effects of interactional factors (e.g. turn-taking context, type of speech act) rather than to the existence of two distinct rising intonation contour types in this language.
  • Van Dooren, A., Tulling, M., Cournane, A., & Hacquard, V. (2019). Discovering modal polysemy: Lexical aspect might help. In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 203-216). Sommerville, MA: Cascadilla Press.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.
  • Van Valin Jr., R. D. (1987). Pragmatics, island phenomena, and linguistic competence. In A. M. Farley, P. T. Farley, & K.-E. McCullough (Eds.), CLS 22. Papers from the parasession on pragmatics and grammatical theory (pp. 223-233). Chicago Linguistic Society.
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
  • Wagner, M. A., Broersma, M., McQueen, J. M., & Lemhöfer, K. (2019). Imitating speech in an unfamiliar language and an unfamiliar non-native accent in the native language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1362-1366). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study concerns individual differences in speech imitation ability and the role that lexical representations play in imitation. We examined 1) whether imitation of sounds in an unfamiliar language (L0) is related to imitation of sounds in an unfamiliar
    non-native accent in the speaker’s native language (L1) and 2) whether it is easier or harder to imitate speech when you know the words to be imitated. Fifty-nine native Dutch speakers imitated words with target vowels in Basque (/a/ and /e/) and Greekaccented
    Dutch (/i/ and /u/). Spectral and durational
    analyses of the target vowels revealed no relationship between the success of L0 and L1 imitation and no difference in performance between tasks (i.e., L1
    imitation was neither aided nor blocked by lexical knowledge about the correct pronunciation). The results suggest instead that the relationship of the vowels to native phonological categories plays a bigger role in imitation

Share this page