Publications

Displaying 301 - 320 of 320
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Wittenburg, P. (2004). The IMDI metadata concept. In S. F. Ferreira (Ed.), Workingmaterial on Building the LR&E Roadmap: Joint COCOSDA and ICCWLRE Meeting, (LREC2004). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Brugman, H., Broeder, D., & Russel, A. (2004). XML-based language archiving. In Workshop Proceedings on XML-based Richly Annotaded Corpora (LREC2004) (pp. 63-69). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Lenkiewicz, P., Auer, E., Gebre, B. G., Lenkiewicz, A., & Drude, S. (2012). AV Processing in eHumanities - a paradigm shift. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 538-541).

    Abstract

    Introduction Speech research saw a dramatic change in paradigm in the 90-ies. While earlier the discussion was dominated by a phoneticians’ approach who knew about phenomena in the speech signal, the situation completely changed after stochastic machinery such as Hidden Markov Models [1] and Artificial Neural Networks [2] had been introduced. Speech processing was now dominated by a purely mathematic approach that basically ignored all existing knowledge about the speech production process and the perception mechanisms. The key was now to construct a large enough training set that would allow identifying the many free parameters of such stochastic engines. In case that the training set is representative and the annotations of the training sets are widely ‘correct’ we could assume to get a satisfyingly functioning recognizer. While the success of knowledge-based systems such as Hearsay II [3] was limited, the statistically based approach led to great improvements in recognition rates and to industrial applications.
  • Wittenburg, P., Gulrajani, G., Broeder, D., & Uneson, M. (2004). Cross-disciplinary integration of metadata descriptions. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 113-116). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Johnson, H., Buchhorn, M., Brugman, H., & Broeder, D. (2004). Architecture for distributed language resource management and archiving. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 361-364). Paris: ELRA - European Language Resources Association.
  • Wnuk, E., & Majid, A. (2012). Olfaction in a hunter-gatherer society: Insights from language and culture. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1155-1160). Austin, TX: Cognitive Science Society.

    Abstract

    According to a widely-held view among various scholars, olfaction is inferior to other human senses. It is also believed by many that languages do not have words for describing smells. Data collected among the Maniq, a small population of nomadic foragers in southern Thailand, challenge the above claims and point to a great linguistic and cultural elaboration of odor. This article presents evidence of the importance of olfaction in indigenous rituals and beliefs, as well as in the lexicon. The results demonstrate the richness and complexity of the domain of smell in Maniq society and thereby challenge the universal paucity of olfactory terms and insignificance of olfaction for humans.
  • Woensdregt, M., & Dingemanse, M. (2020). Other-initiated repair can facilitate the emergence of compositional language. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 474-476). Nijmegen: The Evolution of Language Conferences.
  • Wright, S. E., Windhouwer, M., Schuurman, I., & Broeder, D. (2014). Segueing from a Data Category Registry to a Data Concept Registry. In Proceedings of the 11th International Conference on Terminology and Knowledge Engineering (TKE 2014).

    Abstract

    The terminology Community of Practice has long standardized data categories in the framework of ISO TC 37. ISO 12620:2009 specifies the data model and procedures for a Data Category Registry (DCR), which has been implemented by the Max Planck Institute for Psycholinguistics as the ISOcat DCR. The DCR has been used by not only ISO TC 37, but also by the CLARIN research infra-structure. This paper describes how the needs of these communities have started to diverge and the process of segueing from a DCR to a Data Concept Registry in order to meet the needs of both communities.
  • Yang, J., Van den Bosch, A., & Frank, S. L. (2020). Less is Better: A cognitively inspired unsupervised model for language segmentation. In M. Zock, E. Chersoni, A. Lenci, & E. Santus (Eds.), Proceedings of the Workshop on the Cognitive Aspects of the Lexicon ( 28th International Conference on Computational Linguistics) (pp. 33-45). Stroudsburg: Association for Computational Linguistics.

    Abstract

    Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.

    Additional information

    full text via ACL website
  • Yang, A., & Chen, A. (2014). Prosodic focus marking in child and adult Mandarin Chinese. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 54-58).

    Abstract

    This study investigates how Mandarin Chinese speaking children and adults use prosody to mark focus in spontaneous speech. SVO sentences were elicited from 4- and 8-year-olds and adults in a game setting. Sentence-medial verbs were acoustically analysed for both duration and pitch range in different focus conditions. We have found that like the adults, the 8-year-olds used both duration and pitch range to distinguish focus from non-focus. The 4-year-olds used only duration to distinguish focus from non-focus, unlike the adults and 8-year-olds. None of the three groups of speakers distinguished contrastive focus from non-contrastive focus using pitch range or duration. Regarding the distinction between narrow focus from broad focus, the 4- and 8-year-olds used both pitch range and duration for this purpose, while the adults used only duration
  • Yang, A., & Chen, A. (2014). Prosodic focus-marking in Chinese four- and eight-year-olds. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 713-717).

    Abstract

    This study investigates how Mandarin Chinese speaking children use prosody to distinguish focus from non-focus, and focus types differing in size of constituent and contrastivity. SVO sentences were elicited from four- and eight-year-olds in a game setting. Sentence-medial verbs were acoustically analysed for both duration and pitch range in different focus conditions. The children started to use duration to differentiate focus from non-focus at the age of four. But their use of pitch range varied with age and depended on non-focus conditions (pre- vs. postfocus) and the lexical tones of the verbs. Further, the children in both age groups used pitch range but not duration to differentiate narrow focus from broad focus, and they did not differentiate contrastive narrow focus from non-contrastive narrow focus using duration or pitch range. The results indicated that Chinese children acquire the prosodic means (duration and pitch range) of marking focus in stages, and their acquisition of these two means appear to be early, compared to children speaking an intonation language, for example, Dutch.
  • Zampieri, M., & Gebre, B. G. (2012). Automatic identification of language varieties: The case of Portuguese. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, September 19-21, 2012, Vienna (pp. 233-237). Vienna: Österreichischen Gesellschaft für Artificial Intelligende (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Zampieri, M., Gebre, B. G., & Diwersy, S. (2012). Classifying pluricentric languages: Extending the monolingual model. In Proceedings of SLTC 2012. The Fourth Swedish Language Technology Conference. Lund, October 24-26, 2012 (pp. 79-80). Lund University.

    Abstract

    This study presents a new language identification model for pluricentric languages that uses n-gram language models at the character and word level. The model is evaluated in two steps. The first step consists of the identification of two varieties of Spanish (Argentina and Spain) and two varieties of French (Quebec and France) evaluated independently in binary classification schemes. The second step integrates these language models in a six-class classification with two Portuguese varieties.
  • Zampieri, M., & Gebre, B. G. (2014). VarClass: An open-source language identification tool for language varieties. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3305-3308).

    Abstract

    This paper presents VarClass, an open-source tool for language identification available both to be downloaded as well as through a graphical user-friendly interface. The main difference of VarClass in comparison to other state-of-the-art language identification tools is its focus on language varieties. General purpose language identification tools do not take language varieties into account and our work aims to fill this gap. VarClass currently contains language models for over 27 languages in which 10 of them are language varieties. We report an average performance of over 90.5% accuracy in a challenging dataset. More language models will be included in the upcoming months
  • Zhang, Y., Ding, R., Frassinelli, D., Tuomainen, J., Klavinskis-Whiting, S., & Vigliocco, G. (2021). Electrophysiological signatures of second language multimodal comprehension. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2971-2977). Vienna: Cognitive Science Society.

    Abstract

    Language is multimodal: non-linguistic cues, such as prosody,
    gestures and mouth movements, are always present in face-to-
    face communication and interact to support processing. In this
    paper, we ask whether and how multimodal cues affect L2
    processing by recording EEG for highly proficient bilinguals
    when watching naturalistic materials. For each word, we
    quantified surprisal and the informativeness of prosody,
    gestures, and mouth movements. We found that each cue
    modulates the N400: prosodic accentuation, meaningful
    gestures, and informative mouth movements all reduce N400.
    Further, effects of meaningful gestures but not mouth
    informativeness are enhanced by prosodic accentuation,
    whereas effects of mouth are enhanced by meaningful gestures
    but reduced by beat gestures. Compared with L1, L2
    participants benefit less from cues and their interactions, except
    for meaningful gestures and mouth movements. Thus, in real-
    world language comprehension, L2 comprehenders use
    multimodal cues just as L1 speakers albeit to a lesser extent.
  • Zhang, Y., Amatuni, A., Cain, E., Wang, X., Crandall, D., & Yu, C. (2021). Human learners integrate visual and linguistic information cross-situational verb learning. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2267-2273). Vienna: Cognitive Science Society.

    Abstract

    Learning verbs is challenging because it is difficult to infer the precise meaning of a verb when there are a multitude of relations that one can derive from a single event. To study this verb learning challenge, we used children's egocentric view collected from naturalistic toy-play interaction as learning materials and investigated how visual and linguistic information provided in individual naming moments as well as cross-situational information provided from multiple learning moments can help learners resolve this mapping problem using the Human Simulation Paradigm. Our results show that learners benefit from seeing children's egocentric views compared to third-person observations. In addition, linguistic information can help learners identify the correct verb meaning by eliminating possible meanings that do not belong to the linguistic category. Learners are also able to integrate visual and linguistic information both within and across learning situations to reduce the ambiguity in the space of possible verb meanings.
  • Zhang, Y., Amatuni, A., Crain, E., & Yu, C. (2020). Seeking meaning: Examining a cross-situational solution to learn action verbs using human simulation paradigm. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 2854-2860). Montreal, QB: Cognitive Science Society.

    Abstract

    To acquire the meaning of a verb, language learners not only need to find the correct mapping between a specific verb and an action or event in the world, but also infer the underlying relational meaning that the verb encodes. Most verb naming instances in naturalistic contexts are highly ambiguous as many possible actions can be embedded in the same scenario and many possible verbs can be used to describe those actions. To understand whether learners can find the correct verb meaning from referentially ambiguous learning situations, we conducted three experiments using the Human Simulation Paradigm with adult learners. Our results suggest that although finding the right verb meaning from one learning instance is hard, there is a statistical solution to this problem. When provided with multiple verb learning instances all referring to the same verb, learners are able to aggregate information across situations and gradually converge to the correct semantic space. Even in cases where they may not guess the exact target verb, they can still discover the right meaning by guessing a similar verb that is semantically close to the ground truth.
  • Zhou, W., & Broersma, M. (2014). Perception of birth language tone contrasts by adopted Chinese children. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 63-66).

    Abstract

    The present study investigates how long after adoption adoptees forget the phonology of their birth language. Chinese children who were adopted by Dutch families were tested on the perception of birth language tone contrasts before, during, and after perceptual training. Experiment 1 investigated Cantonese tone 2 (High-Rising) and tone 5 (Low-Rising), and Experiment 2 investigated Mandarin tone 2 (High-Rising) and tone 3 (Low-Dipping). In both experiments, participants were adoptees and non-adopted Dutch controls. Results of both experiments show that the tone contrasts were very difficult to perceive for the adoptees, and that adoptees were not better at perceiving the tone contrasts than their non-adopted Dutch peers, before or after training. This demonstrates that forgetting took place relatively soon after adoption, and that the re-exposure that the adoptees were presented with did not lead to an improvement greater than that of the Dutch control participants. Thus, the findings confirm what has been anecdotally reported by adoptees and their parents, but what had not been empirically tested before, namely that birth language forgetting occurs very soon after adoption
  • Zimianiti, E., Dimitrakopoulou, M., & Tsangalidis, A. (2021). Τhematic roles in dementia: The case of psychological verbs. In A. Botinis (Ed.), ExLing 2021: Proceedings of the 12th International Conference of Experimental Linguistics (pp. 269-272). Athens, Greece: ExLing Society.

    Abstract

    This study investigates the difficulty of people with Mild Cognitive Impairment (MCI), mild and moderate Alzheimer’s disease (AD) in the production and comprehension of psychological verbs, as thematic realization may involve both the canonical and non-canonical realization of arguments. More specifically, we aim to examine whether there is a deficit in the mapping of syntactic and semantic representations in psych-predicates regarding Greek-speaking individuals with MCI and AD, and whether the linguistic abilities associated with θ-role assignment decrease as the disease progresses. Moreover, given the decline of cognitive abilities in people with MCI and AD, we explore the effects of components of memory (Semantic, Episodic, and Working Memory) on the assignment of thematic roles in constructions with psychological verbs.

Share this page