Publications

Displaying 101 - 200 of 211
  • Kempen, G. (2004). Human grammatical coding: Shared structure formation resources for grammatical encoding and decoding. In Cuny 2004 - The 17th Annual CUNY Conference on Human Sentence Processing. March 25-27, 2004. University of Maryland (pp. 66).
  • Kempen, G. (1996). Human language technology can modernize writing and grammar instruction. In COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2 (pp. 1005-1006). Stroudsburg, PA: Association for Computational Linguistics.
  • Kempen, G., & Janssen, S. (1996). Omspellen: Reuze(n)karwei of peule(n)schil? In H. Croll, & J. Creutzberg (Eds.), Proceedings of the 5e Dag van het Document (pp. 143-146). Projectbureau Croll en Creutzberg.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and cospeech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaningbearing phase.
  • Klein, W. (Ed.). (2004). Philologie auf neuen Wegen [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 136.
  • Klein, W. (Ed.). (2004). Universitas [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik (LiLi), 134.
  • Klein, W., & Musan, R. (Eds.). (1999). Das deutsche Perfekt [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (113).
  • Klein, W., & Franceschini, R. (Eds.). (2003). Einfache Sprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 131.
  • Klein, W. (Ed.). (1998). Kaleidoskop [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (112).
  • Klein, W. (Ed.). (1987). Sprache und Ritual [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (65).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1996). Sprache und Subjektivität I [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (101).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1996). Sprache und Subjektivität II [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (102).
  • Klein, W. (Ed.). (1996). Zweitspracherwerb [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (104).
  • Kohatsu, T., Akamine, S., Sato, M., & Niikuni, K. (2022). Individual differences in empathy affect perspective adoption in language comprehension. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 652-656). Tokyo: Japanese Cognitive Science Society.
  • Koutamanis, E., Kootstra, G. J., Dijkstra, T., & Unsworth., S. (2021). Lexical priming as evidence for language-nonselective access in the simultaneous bilingual child's lexicon. In D. Dionne, & L.-A. Vidal Covas (Eds.), BUCLD 45: Proceedings of the 45th annual Boston University Conference on Language Development (pp. 413-430). Sommerville, MA: Cascadilla Press.
  • Kuijpers, C., Van Donselaar, W., & Cutler, A. (1996). Phonological variation: Epenthesis and deletion of schwa in Dutch. In H. T. Bunnell (Ed.), Proceedings of the Fourth International Conference on Spoken Language Processing: Vol. 1 (pp. 94-97). New York: Institute of Electrical and Electronics Engineers.

    Abstract

    Two types of phonological variation in Dutch, resulting from optional rules, are schwa epenthesis and schwa deletion. In a lexical decision experiment it was investigated whether the phonological variants were processed similarly to the standard forms. It was found that the two types of variation patterned differently. Words with schwa epenthesis were processed faster and more accurately than the standard forms, whereas words with schwa deletion led to less fast and less accurate responses. The results are discussed in relation to the role of consonant-vowel alternations in speech processing and the perceptual integrity of onset clusters.
  • Kuzla, C. (2003). Prosodically-conditioned variation in the realization of domain-final stops and voicing assimilation of domain-initial fricatives in German. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2829-2832). Adelaide: Causal Productions.
  • De Lange, F. P., Hagoort, P., & Toni, I. (2003). Differential fronto-parietal contributions to visual and motor imagery. NeuroImage, 19(2), e2094-e2095.

    Abstract

    Mental imagery is a cognitive process crucial to human reasoning. Numerous studies have characterized specific
    instances of this cognitive ability, as evoked by visual imagery (VI) or motor imagery (MI) tasks. However, it
    remains unclear which neural resources are shared between VI and MI, and which are exclusively related to MI.
    To address this issue, we have used fMRI to measure human brain activity during performance of VI and MI
    tasks. Crucially, we have modulated the imagery process by manipulating the degree of mental rotation necessary
    to solve the tasks. We focused our analysis on changes in neural signal as a function of the degree of mental
    rotation in each task.
  • Lattenkamp, E. Z., Vernes, S. C., & Wiegrebe, L. (2018). Mammalian models for the study of vocal learning: A new paradigm in bats. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 235-237). Toruń, Poland: NCU Press. doi:10.12775/3991-1.056.
  • Lauscher, A., Eckert, K., Galke, L., Scherp, A., Rizvi, S. T. R., Ahmed, S., Dengel, A., Zumstein, P., & Klein, A. (2018). Linked open citation database: Enabling libraries to contribute to an open and interconnected citation graph. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 109-118). New York: ACM. doi:10.1145/3197026.3197050.

    Abstract

    Citations play a crucial role in the scientific discourse, in information retrieval, and in bibliometrics. Many initiatives are currently promoting the idea of having free and open citation data. Creation of citation data, however, is not part of the cataloging workflow in libraries nowadays.
    In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that efficiently cataloging citations in libraries using a semi-automatic approach is possible. We specifically describe the current state of the workflow and its implementation. We show that we could significantly improve the automatic reference extraction that is crucial for the subsequent data curation. We further give insights on the curation and linking process and provide evaluation results that not only direct the further development of the project, but also allow us to discuss its overall feasibility.
  • Lefever, E., Hendrickx, I., Croijmans, I., Van den Bosch, A., & Majid, A. (2018). Discovering the language of wine reviews: A text mining account. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3297-3302). Paris: LREC.

    Abstract

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
  • Levelt, C. C., Fikkert, P., & Schiller, N. O. (2003). Metrical priming in speech production. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2481-2485). Adelaide: Causal Productions.

    Abstract

    In this paper we report on four experiments in which we attempted to prime the stress position of Dutch bisyllabic target nouns. These nouns, picture names, had stress on either the first or the second syllable. Auditory prime words had either the same stress as the target or a different stress (e.g., WORtel – MOtor vs. koSTUUM – MOtor; capital letters indicate stressed syllables in prime – target pairs). Furthermore, half of the prime words were semantically related, the other half were unrelated. In none of the experiments a stress priming effect was found. This could mean that stress is not stored in the lexicon. An additional finding was that targets with initial stress had a faster response than targets with a final stress. We hypothesize that bisyllabic words with final stress take longer to be encoded because this stress pattern is irregular with respect to the lexical distribution of bisyllabic stress patterns, even though it can be regular in terms of the metrical stress rules of Dutch.
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Levshina, N., & Moran, S. (Eds.). (2021). Efficiency in human languages: Corpus evidence for universal principles [Special Issue]. Linguistics Vanguard, 7(s3).
  • Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. In Proceedings of Interspeech 2022 (pp. 1126-1130).

    Abstract

    Response tokens (also known as backchannels, continuers, or feedback) are a frequent feature of human interaction, where they serve to display understanding and streamline turn-taking. We propose a bottom-up method to study responsive behaviour across 16 languages (8 language families). We use sequential context and recurrence of turns formats to identify candidate response tokens in a language-agnostic way across diverse conversational corpora. We then use UMAP clustering directly on speech signals to represent structure and variation. We find that (i) written orthographic annotations underrepresent the attested variation, (ii) distinctions between formats can be gradient rather than discrete, (iii) most languages appear to make available a broad distinction between a minimal nasal format `mm' and a fuller `yeah’-like format. Charting this aspect of human interaction contributes to our understanding of interactional infrastructure across languages and can inform the design of speech technologies.
  • Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. In F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, & J. Odijk (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 1178-1192). Marseille, France: European Language Resources Association.

    Abstract

    We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources (LiNCR) (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three
    stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are
    accompanied with stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained
    nature of the data allows the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies.
    We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural
    language processing in the brain and to further stimulate data sharing in the field of neuroscience of language.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Majid, A., Van Staden, M., & Enfield, N. J. (2004). The human body in cognition, brain, and typology. In K. Hovie (Ed.), Forum Handbook, 4th International Forum on Language, Brain, and Cognition - Cognition, Brain, and Typology: Toward a Synthesis (pp. 31-35). Sendai: Tohoku University.

    Abstract

    The human body is unique: it is both an object of perception and the source of human experience. Its universality makes it a perfect resource for asking questions about how cognition, brain and typology relate to one another. For example, we can ask how speakers of different languages segment and categorize the human body. A dominant view is that body parts are “given” by visual perceptual discontinuities, and that words are merely labels for these visually determined parts (e.g., Andersen, 1978; Brown, 1976; Lakoff, 1987). However, there are problems with this view. First it ignores other perceptual information, such as somatosensory and motoric representations. By looking at the neural representations of sesnsory representations, we can test how much of the categorization of the human body can be done through perception alone. Second, we can look at language typology to see how much universality and variation there is in body-part categories. A comparison of a range of typologically, genetically and areally diverse languages shows that the perceptual view has only limited applicability (Majid, Enfield & van Staden, in press). For example, using a “coloring-in” task, where speakers of seven different languages were given a line drawing of a human body and asked to color in various body parts, Majid & van Staden (in prep) show that languages vary substantially in body part segmentation. For example, Jahai (Mon-Khmer) makes a lexical distinction between upper arm, lower arm, and hand, but Lavukaleve (Papuan Isolate) has just one word to refer to arm, hand, and leg. This shows that body part categorization is not a straightforward mapping of words to visually determined perceptual parts.
  • Majid, A., Van Staden, M., Boster, J. S., & Bowerman, M. (2004). Event categorization: A cross-linguistic perspective. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 885-890). Mahwah, NJ: Erlbaum.

    Abstract

    Many studies in cognitive science address how people categorize objects, but there has been comparatively little research on event categorization. This study investigated the categorization of events involving material destruction, such as “cutting” and “breaking”. Speakers of 28 typologically, genetically, and areally diverse languages described events shown in a set of video-clips. There was considerable cross-linguistic agreement in the dimensions along which the events were distinguished, but there was variation in the number of categories and the placement of their boundaries.
  • Mamus, E., Speed, L. J., Ozyurek, A., & Majid, A. (2021). Sensory modality of input influences encoding of motion events in speech but not co-speech gestures. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 376-382). Vienna: Cognitive Science Society.

    Abstract

    Visual and auditory channels have different affordances and
    this is mirrored in what information is available for linguistic
    encoding. The visual channel has high spatial acuity, whereas
    the auditory channel has better temporal acuity. These
    differences may lead to different conceptualizations of events
    and affect multimodal language production. Previous studies of
    motion events typically present visual input to elicit speech and
    gesture. The present study compared events presented as audio-
    only, visual-only, or multimodal (visual+audio) input and
    assessed speech and co-speech gesture for path and manner of
    motion in Turkish. Speakers with audio-only input mentioned
    path more and manner less in verbal descriptions, compared to
    speakers who had visual input. There was no difference in the
    type or frequency of gestures across conditions, and gestures
    were dominated by path-only gestures. This suggests that input
    modality influences speakers’ encoding of path and manner of
    motion events in speech, but not in co-speech gestures.
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • McQueen, J. M., & Cho, T. (2003). The use of domain-initial strengthening in segmentation of continuous English speech. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2993-2996). Adelaide: Causal Productions.
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • Meeuwissen, M., Roelofs, A., & Levelt, W. J. M. (2003). Naming analog clocks conceptually facilitates naming digital clocks. In Proceedings of XIII Conference of the European Society of Cognitive Psychology (ESCOP 2003) (pp. 271-271).
  • Merkx, D., & Frank, S. L. (2021). Human sentence processing: Recurrence or attention? In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) (pp. 12-22). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL). doi:10.18653/v1/2021.cmcl-1.2.

    Abstract

    Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models’ ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and provides evidence for cue-based retrieval.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2022). Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2022) (pp. 1-11). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL).

    Abstract

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings.Importantly, in both experiments we show that he grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can just as important.
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep Convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Mishra, C., & Skantze, G. (2022). Knowing where to look: A planning-based architecture to automate the gaze behavior of social robots. In Proceedings of the 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1201-1208). doi:10.1109/RO-MAN53752.2022.9900740.

    Abstract

    Gaze cues play an important role in human communication and are used to coordinate turn-taking and joint attention, as well as to regulate intimacy. In order to have fluent conversations with people, social robots need to exhibit humanlike gaze behavior. Previous Gaze Control Systems (GCS) in HRI have automated robot gaze using data-driven or heuristic approaches. However, these systems tend to be mainly reactive in nature. Planning the robot gaze ahead of time could help in achieving more realistic gaze behavior and better eye-head coordination. In this paper, we propose and implement a novel planning-based GCS. We evaluate our system in a comparative within-subjects user study (N=26) between a reactive system and our proposed system. The results show that the users preferred the proposed system and that it was significantly more interpretable and better at regulating intimacy.
  • Moscoso del Prado Martín, F., & Baayen, R. H. (2003). Using the structure found in time: Building real-scale orthographic and phonetic representations by accumulation of expectations. In H. Bowman, & C. Labiouse (Eds.), Connectionist Models of Cognition, Perception and Emotion: Proceedings of the Eighth Neural Computation and Psychology Workshop (pp. 263-272). Singapore: World Scientific.
  • Mudd, K., Lutzenberger, H., De Vos, C., & De Boer, B. (2021). Social structure and lexical uniformity: A case study of gender differences in the Kata Kolok community. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2692-2698). Vienna: Cognitive Science Society.

    Abstract

    Language emergence is characterized by a high degree of lex-
    ical variation. It has been suggested that the speed at which
    lexical conventionalization occurs depends partially on social
    structure. In large communities, individuals receive input from
    many sources, creating a pressure for lexical convergence.
    In small, insular communities, individuals can remember id-
    iolects and share common ground with interlocuters, allow-
    ing these communities to retain a high degree of lexical vari-
    ation. We look at lexical variation in Kata Kolok, a sign lan-
    guage which emerged six generations ago in a Balinese vil-
    lage, where women tend to have more tightly-knit social net-
    works than men. We test if there are differing degrees of lexical
    uniformity between women and men by reanalyzing a picture
    description task in Kata Kolok. We find that women’s produc-
    tions exhibit less lexical uniformity than men’s. One possible
    explanation of this finding is that women’s more tightly-knit
    social networks allow for remembering idiolects, alleviating
    the pressure for lexical convergence, but social network data
    from the Kata Kolok community is needed to support this ex-
    planation.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light conditionspecific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Oostdijk, N., & Broeder, D. (2003). The Spoken Dutch Corpus and its exploitation environment. In A. Abeille, S. Hansen-Schirra, & H. Uszkoreit (Eds.), Proceedings of the 4th International Workshop on linguistically interpreted corpora (LINC-03) (pp. 93-101).
  • Ouni, S., Cohen, M. M., Young, K., & Jesse, A. (2003). Internationalization of a talking head. In M. Sole, D. Recasens, & J. Romero (Eds.), Proceedings of 15th International Congress of Phonetics Sciences (pp. 2569-2572). Barcelona: Casual Productions.

    Abstract

    In this paper we describe a general scheme for internationalization of our talking head, Baldi, to speak other languages. We describe the modular structure of the auditory/visual synthesis software. As an example, we have created a synthetic Arabic talker, which is evaluated using a noisy word recognition task comparing this talker with a natural one.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Ozyurek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn, & S. C. Stoness (Eds.), Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.
  • Pouw, W., Wit, J., Bögels, S., Rasenberg, M., Milivojevic, B., & Ozyurek, A. (2021). Semantically related gestures move alike: Towards a distributional semantics of gesture kinematics. In V. G. Duffy (Ed.), Digital human modeling and applications in health, safety, ergonomics and risk management. human body, motion and behavior:12th International Conference, DHM 2021, Held as Part of the 23rd HCI International Conference, HCII 2021 (pp. 269-287). Berlin: Springer. doi:10.1007/978-3-030-77817-0_20.
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Raviv, L., Jacobson, S. L., Plotnik, J. M., Bowman, J., Lynch, V., & Benítez-Burraco, A. (2022). Elephants as a new animal model for studying the evolution of language as a result of self-domestication. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 606-608). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • de Reus, K., Carlson, D., Lowry, A., Gross, S., Garcia, M., Rubio-García, A., Salazar-Casals, A., & Ravignani, A. (2022). Body size predicts vocal tract size in a mammalian vocal learner. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 154-156). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • Rubio-Fernández, P., Breheny, R., & Lee, M. W. (2003). Context-independent information in concepts: An investigation of the notion of ‘core features’. In Proceedings of the 25th Annual Conference of the Cognitive Science Society (CogSci 2003). Austin, TX: Cognitive Science Society.
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces.(LREC2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The difference between the average evaluation results of the best title-based retrieval models is only 3% less than those of the best full-text-based retrieval models.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech
    recognition. While previous work has shown that Fine-Tracker
    is successful at modelling aspects of human spoken-word
    recognition, its speech recognition performance is not
    comparable to that of human performance, possibly due to
    suboptimal intermediate articulatory feature (AF)
    representations. This study investigates the effect of improved
    AF representations, obtained using a state-of-the-art deep
    convolutional network, on Fine-Tracker’s simulation and
    recognition performance: Although the improved AF quality
    resulted in improved speech recognition; it, surprisingly, did
    not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O., McQueen, J. M., Ten Bosch, L., & Norris, D. (2003). Modelling human speech recognition using automatic speech recognition paradigms in SpeM. In Proceedings of Eurospeech 2003 (pp. 2097-2100). Adelaide: Causal Productions.

    Abstract

    We have recently developed a new model of human speech recognition, based on automatic speech recognition techniques [1]. The present paper has two goals. First, we show that the new model performs well in the recognition of lexically ambiguous input. These demonstrations suggest that the model is able to operate in the same optimal way as human listeners. Second, we discuss how to relate the behaviour of a recogniser, designed to discover the optimum path through a word lattice, to data from human listening experiments. We argue that this requires a metric that combines both path-based and word-based measures of recognition performance. The combined metric varies continuously as the input speech signal unfolds over time.
  • Scharenborg, O., ten Bosch, L., & Boves, L. (2003). Recognising 'real-life' speech with SpeM: A speech-based computational model of human speech recognition. In Eurospeech 2003 (pp. 2285-2288).

    Abstract

    In this paper, we present a novel computational model of human speech recognition – called SpeM – based on the theory underlying Shortlist. We will show that SpeM, in combination with an automatic phone recogniser (APR), is able to simulate the human speech recognition process from the acoustic signal to the ultimate recognition of words. This joint model takes an acoustic speech file as input and calculates the activation flows of candidate words on the basis of the degree of fit of the candidate words with the input. Experiments showed that SpeM outperforms Shortlist on the recognition of ‘real-life’ input. Furthermore, SpeM performs only slightly worse than an off-the-shelf full-blown automatic speech recogniser in which all words are equally probable, while it provides a transparent computationally elegant paradigm for modelling word activations in human word recognition.
  • Schiller, N. O. (2003). Metrical stress in speech production: A time course study. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 451-454). Adelaide: Causal Productions.

    Abstract

    This study investigated the encoding of metrical information during speech production in Dutch. In Experiment 1, participants were asked to judge whether bisyllabic picture names had initial or final stress. Results showed significantly faster decision times for initially stressed targets (e.g., LEpel 'spoon') than for targets with final stress (e.g., liBEL 'dragon fly'; capital letters indicate stressed syllables) and revealed that the monitoring latencies are not a function of the picture naming or object recognition latencies to the same pictures. Experiments 2 and 3 replicated the outcome of the first experiment with bi- and trisyllabic picture names. These results demonstrate that metrical information of words is encoded rightward incrementally during phonological encoding in speech production. The results of these experiments are in line with Levelt's model of phonological encoding.
  • Scholman, M., Tianai, D., Yung, F., & Demberg, V. (2022). DiscoGeM: A crowdsourced corpus of genre-mixed implicit discourse relations. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 3281-3290). Marseille, France: European Language Resources Association.

    Abstract

    We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech,
    literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods
    were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results
    show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses.
    Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text
    genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic
    relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level
    labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to
    function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into
    non-connective signals of discourse relations.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Seidl, A., & Johnson, E. K. (2003). Position and vowel quality effects in infant's segmentation of vowel-initial words. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2233-2236). Adelaide: Causal Productions.
  • Seuren, P. A. M. (1996). What a universal semantic interlingua can do. In A. Zamulin (Ed.), Perspectives of System Informatics. Proceedings of the Andrei Ershov Second International Memorial Conference, Novosibirsk, Akademgorodok, June 25-28,1996 (pp. 41-42). Novosibirsk: A.P. Ershov Institute of Informatics Systems.
  • Severijnen, G. G., Bosker, H. R., & McQueen, J. M. (2022). Acoustic correlates of Dutch lexical stress re-examined: Spectral tilt is not always more reliable than intensity. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 278-282). doi:10.21437/SpeechProsody.2022-57.

    Abstract

    The present study examined two acoustic cues in the production
    of lexical stress in Dutch: spectral tilt and overall intensity.
    Sluijter and Van Heuven (1996) reported that spectral tilt is a
    more reliable cue to stress than intensity. However, that study
    included only a small number of talkers (10) and only syllables
    with the vowels /aː/ and /ɔ/.
    The present study re-examined this issue in a larger and
    more variable dataset. We recorded 38 native speakers of Dutch
    (20 females) producing 744 tokens of Dutch segmentally
    overlapping words (e.g., VOORnaam vs. voorNAAM, “first
    name” vs. “respectable”), targeting 10 different vowels, in
    variable sentence contexts. For each syllable, we measured
    overall intensity and spectral tilt following Sluijter and Van
    Heuven (1996).
    Results from Linear Discriminant Analyses showed that,
    for the vowel /aː/ alone, spectral tilt showed an advantage over
    intensity, as evidenced by higher stressed/unstressed syllable
    classification accuracy scores for spectral tilt. However, when
    all vowels were included in the analysis, the advantage
    disappeared.
    These findings confirm that spectral tilt plays a larger role
    in signaling stress in Dutch /aː/ but show that, for a larger
    sample of Dutch vowels, overall intensity and spectral tilt are
    equally important.
  • Shattuck-Hufnagel, S., & Cutler, A. (1999). The prosody of speech error corrections revisited. In J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. Bailey (Eds.), Proceedings of the Fourteenth International Congress of Phonetic Sciences: Vol. 2 (pp. 1483-1486). Berkely: University of California.

    Abstract

    A corpus of digitized speech errors is used to compare the prosody of correction patterns for word-level vs. sound-level errors. Results for both peak F0 and perceived prosodic markedness confirm that speakers are more likely to mark corrections of word-level errors than corrections of sound-level errors, and that errors ambiguous between word-level and soundlevel (such as boat for moat) show correction patterns like those for sound level errors. This finding increases the plausibility of the claim that word-sound-ambiguous errors arise at the same level of processing as sound errors that do not form words.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Shi, R., Werker, J., & Cutler, A. (2003). Function words in early speech perception. In Proceedings of the 15th International Congress of Phonetic Sciences (pp. 3009-3012).

    Abstract

    Three experiments examined whether infants recognise functors in phrases, and whether their representations of functors are phonetically well specified. Eight- and 13- month-old English infants heard monosyllabic lexical words preceded by real functors (e.g., the, his) versus nonsense functors (e.g., kuh); the latter were minimally modified segmentally (but not prosodically) from real functors. Lexical words were constant across conditions; thus recognition of functors would appear as longer listening time to sequences with real functors. Eightmonth- olds' listening times to sequences with real versus nonsense functors did not significantly differ, suggesting that they did not recognise real functors, or functor representations lacked phonetic specification. However, 13-month-olds listened significantly longer to sequences with real functors. Thus, somewhere between 8 and 13 months of age infants learn familiar functors and represent them with segmental detail. We propose that accumulated frequency of functors in input in general passes a critical threshold during this time.
  • Li, Y., Wu, S., Shi, S., Tong, S., Zhang, Y., & Guo, X. (2021). Enhanced inter-brain connectivity between children and adults during cooperation: a dual EEG study. In 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC) (pp. 6289-6292). doi:10.1109/EMBC46164.2021.9630330.

    Abstract

    Previous fNIRS studies have suggested that adult-child cooperation is accompanied by increased inter-brain synchrony. However, its reflection in the electrophysiological synchrony remains unclear. In this study, we designed a naturalistic and well-controlled adult-child interaction paradigm using a tangram solving video game, and recorded dual-EEG from child and adult dyads during cooperative and individual conditions. By calculating the directed inter-brain connectivity in the theta and alpha bands, we found that the inter-brain frontal network was more densely connected and stronger in strength during the cooperative than the individual condition when the adult was watching the child playing. Moreover, the inter-brain network across different dyads shared more common information flows from the player to the observer during cooperation, but was more individually different in solo play. The results suggest an enhancement in inter-brain EEG interactions during adult-child cooperation. However, the enhancement was evident in all cooperative cases but partly depended on the role of participants.
  • Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

    Additional information

    link to conference website
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Turn-taking in social talk dialogues: Temporal, formal and functional aspects. In 9th International Conference Speech and Computer (SPECOM'2004) (pp. 454-461).

    Abstract

    This paper presents a quantitative analysis of the
    turn-taking mechanism evidenced in 93 telephone
    dialogues that were taken from the 9-million-word
    Spoken Dutch Corpus. While the first part of the paper
    focuses on the temporal phenomena of turn taking, such
    as durations of pauses and overlaps of turns in the
    dialogues, the second part explores the discoursefunctional
    aspects of utterances in a subset of 8
    dialogues that were annotated especially for this
    purpose. The results show that speakers adapt their turntaking
    behaviour to the interlocutor’s behaviour.
    Furthermore, the results indicate that male-male dialogs
    show a higher proportion of overlapping turns than
    female-female dialogues.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in longterm effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Durational aspects of turn-taking in spontaneous face-to-face and telephone dialogues. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: Proceedings of the 7th International Conference TSD 2004 (pp. 563-570). Heidelberg: Springer.

    Abstract

    On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and faceto-face dialogues have different statistical properties. Pauses in a face-to-face
    dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face
    dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech tech-nology and in particular in automatic speech recognition hasyielded substantial performance improvements. This suggeststhat deep neural networks (DNNs) are able to capture structurein speech data that older methods for acoustic modeling, suchas Gaussian Mixture Models and shallow neural networks failto uncover. In image recognition it is possible to link repre-sentations on the first couple of layers in DNNs to structuralproperties of images, and to representations on early layers inthe visual cortex. This raises the question whether it is possi-ble to accomplish a similar feat with representations on DNNlayers when processing speech input. In this paper we presentthree different experiments in which we attempt to untanglehow DNNs encode speech signals, and to relate these repre-sentations to phonetic knowledge, with the aim to advance con-ventional phonetic concepts and to choose the topology of aDNNs more efficiently. Two experiments investigate represen-tations formed by auto-encoders. A third experiment investi-gates representations on convolutional layers that treat speechspectrograms as if they were images. The results lay the basisfor future experiments with recursive networks.
  • Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

    Abstract

    We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness
  • Thompson, B., Roberts, S., & Lupyan, G. (2018). Quantifying semantic similarity across languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2551-2556). Austin, TX: Cognitive Science Society.

    Abstract

    Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world
  • Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2018). Specificity and entropy reduction in situated referential processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3356-3361). Austin: Cognitive Science Society.

    Abstract

    In situated communication, reference to an entity in the shared visual context can be established using eitheranexpression that conveys precise (minimally specified) or redundant (over-specified) information. There is, however, along-lasting debate in psycholinguistics concerningwhether the latter hinders referential processing. We present evidence from an eyetrackingexperiment recordingfixations as well asthe Index of Cognitive Activity –a novel measure of cognitive workload –supporting the view that over-specifications facilitate processing. We further present originalevidence that, above and beyond the effect of specificity,referring expressions thatuniformly reduce referential entropyalso benefitprocessing
  • Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).

    Abstract

    Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization.
  • Vagliano, I., Galke, L., Mai, F., & Scherp, A. (2018). Using adversarial autoencoders for multi-modal automatic playlist continuation. In C.-W. Chen, P. Lamere, M. Schedl, & H. Zamani (Eds.), RecSys Challenge '18: Proceedings of the ACM Recommender Systems Challenge 2018 (pp. 5.1-5.6). New York: ACM. doi:10.1145/3267471.3267476.

    Abstract

    The task of automatic playlist continuation is generating a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i. e., songs to add to a playlist, a recommender system can increase the user engagement by making playlist creation easier, as well as extending listening beyond the end of current playlist. The ACM Recommender Systems Challenge 2018 focuses on such task. Spotify released a dataset of playlists, which includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is predicting the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists and albums, can be incorporated in the playlist continuation task.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.
  • Van Geenhoven, V. (1999). A before-&-after picture of when-, before-, and after-clauses. In T. Matthews, & D. Strolovitch (Eds.), Proceedings of the 9th Semantics and Linguistic Theory Conference (pp. 283-315). Ithaca, NY, USA: Cornell University.
  • Van Donselaar, W., Kuijpers, C., & Cutler, A. (1996). How do Dutch listeners process words with epenthetic schwa? In H. T. Bunnell (Ed.), Proceedings of the Fourth International Conference on Spoken Language Processing: Vol. 1 (pp. 149-152). New York: Institute of Electrical and Electronics Engineers.

    Abstract

    Dutch words with certain final consonant clusters are subject to optional schwa epenthesis. The present research aimed at investigating how Dutch listeners deal with this type of phonological variation. By means of syllable monitoring experiments, it was investigated whether Dutch listeners process words with epenthetic schwa (e.g., ’balluk’) as bisyllabic words or rather as monosyllabic words. Real words (e.g., ’balk’, ’balluk’) and pseudowords (e.g., ’golk’, ’golluk’) were compared, to examine effects of lexical representation. No difference was found between monitoring times for BAL targets in ’balluk’ carriers as compared to ’balk’ carriers. This suggests that words with epenthetic schwa are not processed as bisyllabic words. The effects for the pseudo-words paralleled those for the real words, which suggests that they are not due to lexical representation but rather to the application of phonological rules.
  • Van Valin Jr., R. D. (1987). Pragmatics, island phenomena, and linguistic competence. In A. M. Farley, P. T. Farley, & K.-E. McCullough (Eds.), CLS 22. Papers from the parasession on pragmatics and grammatical theory (pp. 223-233). Chicago Linguistic Society.
  • Vernes, S. C., Janik, V. M., Fitch, W. T., & Slater, P. J. B. (Eds.). (2021). Vocal learning in animals and humans [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 376.
  • Vernes, S. C. (2018). Vocal learning in bats: From genes to behaviour. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 516-518). Toruń, Poland: NCU Press. doi:10.12775/3991-1.128.
  • Von Holzen, K., & Bergmann, C. (2018). A Meta-Analysis of Infants’ Mispronunciation Sensitivity Development. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1159-1164). Austin, TX: Cognitive Science Society.

    Abstract

    Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants’ first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
  • Wagner, A., & Braun, A. (2003). Is voice quality language-dependent? Acoustic analyses based on speakers of three different languages. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 651-654). Adelaide: Causal Productions.
  • Walsh Dickey, L. (1999). Syllable count and Tzeltal segmental allomorphy. In J. Rennison, & K. Kühnhammer (Eds.), Phonologica 1996. Proceedings of the 8th International Phonology Meeting (pp. 323-334). Holland Academic Graphics.

    Abstract

    Tzeltal, a Mayan language spoken in southern Mexico, exhibits allo-morphy of an unusual type. The vowel quality of the perfective suffix is determined by the number of syllables in the stem to which it is attaching. This paper presents previously unpublished data of this allomorphy and demonstrates that a syllable-count analysis of the phenomenon is the proper one. This finding is put in a more general context of segment-prosody interaction in allomorphy.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1437-1440). Adelaide: Causal Productions.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signalto-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Weber, A., & Paris, G. (2004). The origin of the linguistic gender effect in spoken-word recognition: Evidence from non-native listening. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Erlbaum.

    Abstract

    Two eye-tracking experiments examined linguistic gender effects in non-native spoken-word recognition. French participants, who knew German well, followed spoken instructions in German to click on pictures on a computer screen (e.g., Wo befindet sich die Perle, “where is the pearl”) while their eye movements were monitored. The name of the target picture was preceded by a gender-marked article in the instructions. When a target and a competitor picture (with phonologically similar names) were of the same gender in both German and French, French participants fixated competitor pictures more than unrelated pictures. However, when target and competitor were of the same gender in German but of different gender in French, early fixations to the competitor picture were reduced. Competitor activation in the non-native language was seemingly constrained by native gender information. German listeners showed no such viewing time difference. The results speak against a form-based account of the linguistic gender effect. They rather support the notion that the effect originates from the grammatical level of language processing.
  • Weber, A., & Mueller, K. (2004). Word order variation in German main clauses: A corpus analysis. In Proceedings of the 20th International Conference on Computational Linguistics.

    Abstract

    In this paper, we present empirical data from a corpus study on the linear order of subjects and objects in German main clauses. The aim was to establish the validity of three well-known ordering constraints: given complements tend to occur before new complements, definite before indefinite, and pronoun before full noun phrase complements. Frequencies of occurrences were derived for subject-first and object-first sentences from the German Negra corpus. While all three constraints held on subject-first sentences, results for object-first sentences varied. Our findings suggest an influence of grammatical functions on the ordering of verb complements.

Share this page