Publications

  • Wittenburg, P., Peters, W., & Drude, S. (2002). Analysis of lexical structures from field linguistics and language engineering. In M. R. González, & C. P. S. Araujo (Eds.), Third international conference on language resources and evaluation (pp. 682-686). Paris: European Language Resources Association.

    Abstract

    Lexica play an important role in every linguistic discipline. We are confronted with many types of lexica. Depending on the type of lexicon and the language we are currently faced with a large variety of structures from very simple tables to complex graphs, as was indicated by a recent overview of structures found in dictionaries from field linguistics and language engineering. It is important to assess these differences and aim at the integration of lexical resources in order to improve lexicon creation, exchange and reuse. This paper describes the first step towards the integration of existing structures and standards into a flexible abstract model.
  • Wittenburg, P., & Broeder, D. (2002). Metadata overview and the semantic web. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics. Paris: European Language Resources Association.

    Abstract

    The increasing quantity and complexity of language resources leads to new management problems for those that collect and those that need to preserve them. At the same time the desire to make these resources available on the Internet demands an efficient way of characterizing their properties to allow discovery and re-use. The use of metadata is seen as a solution for both these problems. However, the question is what specific requirements there are for the specific domain and whether these are met by existing frameworks. Any possible solution should be evaluated with respect to its merit for solving the domain-specific problems but also with respect to its future embedding in “global” metadata frameworks as part of the Semantic Web activities.
  • Wittenburg, P., Peters, W., & Broeder, D. (2002). Metadata proposals for corpora and lexica. In M. Rodriguez González, & C. Paz Suárez Araujo (Eds.), Third international conference on language resources and evaluation (pp. 1321-1326). Paris: European Language Resources Association.
  • Wittenburg, P., Broeder, D., Offenga, F., & Willems, D. (2002). Metadata set and tools for multimedia/multimodal language resources. In M. Maybury (Ed.), Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002). Workshop on Multimodal Resources and Multimodal Systems Evaluation (pp. 9-13). Paris: European Language Resources Association.
  • Wittenburg, P., Mosel, U., & Dwyer, A. (2002). Methods of language documentation in the DOBES program. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics (pp. 36-42). Paris: European Language Resources Association.
  • Wittenburg, P., & Ringersma, J. (2013). Metadata description for lexicons. In R. H. Gouws, U. Heid, W. Schweickard, & H. E. Wiegand (Eds.), Dictionaries: An international encyclopedia of lexicography: Supplementary volume: Recent developments with focus on electronic and computational lexicography (pp. 1329-1335). Berlin: Mouton de Gruyter.
  • Woensdregt, M., & Dingemanse, M. (2020). Other-initiated repair can facilitate the emergence of compositional language. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 474-476). Nijmegen: The Evolution of Language Conferences.
  • Wolf, M. C., Smith, A. C., Meyer, A. S., & Rowland, C. F. (2019). Modality effects in vocabulary acquisition. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 1212-1218). Montreal, QC: Cognitive Science Society.

    Abstract

    It is unknown whether modality affects the efficiency with which humans learn novel word forms and their meanings, with previous studies reporting both written and auditory advantages. The current study implements controls whose absence in previous work likely explains such contradictory findings. In two novel word learning experiments, participants were trained and tested on pseudoword–novel object pairs, with controls on modality of test, modality of meaning, duration of exposure and transparency of word form. In both experiments word forms were presented in either their written or spoken form, each paired with a pictorial meaning (novel object). Following a 20-minute filler task, participants were tested on their ability to identify the picture-word form pairs on which they were trained. A between-subjects design generated four participant groups per experiment: 1) written training, written test; 2) written training, spoken test; 3) spoken training, written test; 4) spoken training, spoken test. In Experiment 1 the written stimulus was presented for a time period equal to the duration of the spoken form. Results showed that when the duration of exposure was equal, participants displayed a written training benefit. Given that words can be read faster than the time taken for the spoken form to unfold, in Experiment 2 the written form was presented for 300 ms, sufficient time to read the word yet 65% shorter than the duration of the spoken form. No modality effect was observed under these conditions, when exposure to the word form was equivalent. These results demonstrate, at least for proficient readers, that when exposure to the word form is controlled across modalities the efficiency with which word form-meaning associations are learnt does not differ. Our results therefore suggest that, although we typically begin as aural-only word learners, we ultimately converge on developing learning mechanisms that learn equally efficiently from both written and spoken materials.
  • Wright, S. E., Windhouwer, M., Schuurman, I., & Kemps-Snijders, M. (2013). Community efforts around the ISOcat Data Category Registry. In I. Gurevych, & J. Kim (Eds.), The People's Web meets NLP: Collaboratively constructed language resources (pp. 349-374). New York: Springer.

    Abstract

    The ISOcat Data Category Registry provides a community computing environment for creating, storing, retrieving, harmonizing and standardizing data category specifications (DCs), used to register linguistic terms used in various fields. This chapter recounts the history of DC documentation in TC 37, beginning from paper-based lists created for lexicographers and terminologists and progressing to the development of a web-based resource for a much broader range of users. While describing the considerable strides that have been made to collect a very large comprehensive collection of DCs, it also outlines difficulties that have arisen in developing a fully operative web-based computing environment for achieving consensus on data category names, definitions, and selections and describes efforts to overcome some of the present shortcomings and to establish positive working procedures designed to engage a wide range of people involved in the creation of language resources.
  • Wright, S. E., Windhouwer, M., Schuurman, I., & Broeder, D. (2014). Segueing from a Data Category Registry to a Data Concept Registry. In Proceedings of the 11th International Conference on Terminology and Knowledge Engineering (TKE 2014).

    Abstract

    The terminology Community of Practice has long standardized data categories in the framework of ISO TC 37. ISO 12620:2009 specifies the data model and procedures for a Data Category Registry (DCR), which has been implemented by the Max Planck Institute for Psycholinguistics as the ISOcat DCR. The DCR has been used not only by ISO TC 37 but also by the CLARIN research infrastructure. This paper describes how the needs of these communities have started to diverge and the process of segueing from a DCR to a Data Concept Registry in order to meet the needs of both communities.
  • Yang, J., Van den Bosch, A., & Frank, S. L. (2020). Less is Better: A cognitively inspired unsupervised model for language segmentation. In M. Zock, E. Chersoni, A. Lenci, & E. Santus (Eds.), Proceedings of the Workshop on the Cognitive Aspects of the Lexicon (28th International Conference on Computational Linguistics) (pp. 33-45). Stroudsburg: Association for Computational Linguistics.

    Abstract

    Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.
  • Yang, A., & Chen, A. (2014). Prosodic focus marking in child and adult Mandarin Chinese. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 54-58).

    Abstract

    This study investigates how Mandarin Chinese speaking children and adults use prosody to mark focus in spontaneous speech. SVO sentences were elicited from 4- and 8-year-olds and adults in a game setting. Sentence-medial verbs were acoustically analysed for both duration and pitch range in different focus conditions. We have found that, like the adults, the 8-year-olds used both duration and pitch range to distinguish focus from non-focus. The 4-year-olds used only duration to distinguish focus from non-focus, unlike the adults and 8-year-olds. None of the three groups of speakers distinguished contrastive focus from non-contrastive focus using pitch range or duration. Regarding the distinction between narrow focus and broad focus, the 4- and 8-year-olds used both pitch range and duration for this purpose, while the adults used only duration.
  • Yang, A., & Chen, A. (2014). Prosodic focus-marking in Chinese four- and eight-year-olds. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 713-717).

    Abstract

    This study investigates how Mandarin Chinese speaking children use prosody to distinguish focus from non-focus, and focus types differing in size of constituent and contrastivity. SVO sentences were elicited from four- and eight-year-olds in a game setting. Sentence-medial verbs were acoustically analysed for both duration and pitch range in different focus conditions. The children started to use duration to differentiate focus from non-focus at the age of four. But their use of pitch range varied with age and depended on non-focus conditions (pre- vs. postfocus) and the lexical tones of the verbs. Further, the children in both age groups used pitch range but not duration to differentiate narrow focus from broad focus, and they did not differentiate contrastive narrow focus from non-contrastive narrow focus using duration or pitch range. The results indicated that Chinese children acquire the prosodic means (duration and pitch range) of marking focus in stages, and their acquisition of these two means appears to be early, compared to children speaking an intonation language, for example, Dutch.
  • Zampieri, M., & Gebre, B. G. (2014). VarClass: An open-source language identification tool for language varieties. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3305-3308).

    Abstract

    This paper presents VarClass, an open-source tool for language identification that is available both as a download and through a user-friendly graphical interface. The main difference between VarClass and other state-of-the-art language identification tools is its focus on language varieties. General-purpose language identification tools do not take language varieties into account, and our work aims to fill this gap. VarClass currently contains language models for over 27 languages, of which 10 are language varieties. We report an average performance of over 90.5% accuracy on a challenging dataset. More language models will be included in the upcoming months.
  • Zhang, Y., Chen, C.-h., & Yu, C. (2019). Mechanisms of cross-situational learning: Behavioral and computational evidence. In Advances in Child Development and Behavior; vol. 56 (pp. 37-63).

    Abstract

    Word learning happens in everyday contexts with many words and many potential referents for those words in view at the same time. It is challenging for young learners to find the correct referent at the moment they hear an unknown word. This problem of referential uncertainty has been deemed the crux of early word learning (Quine, 1960). Recent empirical and computational studies have found support for a statistical solution to the problem termed cross-situational learning. Cross-situational learning allows learners to acquire word meanings across multiple exposures, even though each individual exposure is referentially uncertain. Recent empirical research shows that infants, children and adults rely on cross-situational learning to learn new words (Smith & Yu, 2008; Suanda, Mugwanya, & Namy, 2014; Yu & Smith, 2007). However, researchers have found evidence supporting two very different theoretical accounts of learning mechanisms: Hypothesis Testing (Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005; Markman, 1992) and Associative Learning (Frank, Goodman, & Tenenbaum, 2009; Yu & Smith, 2007). Hypothesis Testing is generally characterized as a form of learning in which a coherent hypothesis regarding a specific word-object mapping is formed, often in conceptually constrained ways. The hypothesis will then be either accepted or rejected with additional evidence. In contrast, proponents of the Associative Learning framework often characterize learning as aggregating information over time through implicit associative mechanisms. A learner acquires the meaning of a word when the association between the word and the referent becomes relatively strong. In this chapter, we consider these two psychological theories in the context of cross-situational word-referent learning. By reviewing recent empirical and cognitive modeling studies, our goal is to deepen our understanding of the underlying word learning mechanisms by examining and comparing the two theoretical learning accounts.
  • Zhang, Y., Amatuni, A., Crain, E., & Yu, C. (2020). Seeking meaning: Examining a cross-situational solution to learn action verbs using human simulation paradigm. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 2854-2860). Montreal, QC: Cognitive Science Society.

    Abstract

    To acquire the meaning of a verb, language learners not only need to find the correct mapping between a specific verb and an action or event in the world, but also infer the underlying relational meaning that the verb encodes. Most verb naming instances in naturalistic contexts are highly ambiguous as many possible actions can be embedded in the same scenario and many possible verbs can be used to describe those actions. To understand whether learners can find the correct verb meaning from referentially ambiguous learning situations, we conducted three experiments using the Human Simulation Paradigm with adult learners. Our results suggest that although finding the right verb meaning from one learning instance is hard, there is a statistical solution to this problem. When provided with multiple verb learning instances all referring to the same verb, learners are able to aggregate information across situations and gradually converge to the correct semantic space. Even in cases where they may not guess the exact target verb, they can still discover the right meaning by guessing a similar verb that is semantically close to the ground truth.
  • Zhou, W., & Broersma, M. (2014). Perception of birth language tone contrasts by adopted Chinese children. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 63-66).

    Abstract

    The present study investigates how long after adoption adoptees forget the phonology of their birth language. Chinese children who were adopted by Dutch families were tested on the perception of birth language tone contrasts before, during, and after perceptual training. Experiment 1 investigated Cantonese tone 2 (High-Rising) and tone 5 (Low-Rising), and Experiment 2 investigated Mandarin tone 2 (High-Rising) and tone 3 (Low-Dipping). In both experiments, participants were adoptees and non-adopted Dutch controls. Results of both experiments show that the tone contrasts were very difficult to perceive for the adoptees, and that adoptees were not better at perceiving the tone contrasts than their non-adopted Dutch peers, before or after training. This demonstrates that forgetting took place relatively soon after adoption, and that the re-exposure that the adoptees were presented with did not lead to an improvement greater than that of the Dutch control participants. Thus, the findings confirm what has been anecdotally reported by adoptees and their parents, but what had not been empirically tested before, namely that birth language forgetting occurs very soon after adoption.
  • Zinken, J., Rossi, G., & Reddy, V. (2020). Doing more than expected: Thanking recognizes another's agency in providing assistance. In C. Taleghani-Nikazm, E. Betz, & P. Golato (Eds.), Mobilizing others: Grammar and lexis within larger activities (pp. 253-278). Amsterdam: John Benjamins.

    Abstract

    In informal interaction, speakers rarely thank a person who has complied with a request. Examining data from British English, German, Italian, Polish, and Telugu, we ask when speakers do thank after compliance. The results show that thanking treats the other’s assistance as going beyond what could be taken for granted in the circumstances. Coupled with the rareness of thanking after requests, this suggests that cooperation is to a great extent governed by expectations of helpfulness, which can be long-standing, or built over the course of a particular interaction. The higher frequency of thanking in some languages (such as English or Italian) suggests that cultures differ in the importance they place on recognizing the other’s agency in doing as requested.
  • Zuidema, W., & Fitz, H. (2019). Key issues and future directions: Models of human language and speech processing. In P. Hagoort (Ed.), Human language: From genes and brain to behavior (pp. 353-358). Cambridge, MA: MIT Press.
  • Zwitserlood, I. (2014). Meaning at the feature level in sign languages. The case of name signs in Sign Language of the Netherlands (NGT). In R. Kager (Ed.), Where the Principles Fail. A Festschrift for Wim Zonneveld on the occasion of his 64th birthday (pp. 241-251). Utrecht: Utrecht Institute of Linguistics OTS.
  • Zwitserlood, I. (2002). Klassifikatoren in der Niederländischen Gebärdensprache (NGT). In H. Leuninger, & K. Wempe (Eds.), Gebärdensprachlinguistik 2000. Theorie und Anwendung. Vorträge vom Symposium "Gebärdensprachforschung im deutschsprachigen Raum", Frankfurt a.M., 11.-13. Juni 1999 (pp. 113-126). Hamburg: Signum Verlag.
  • Zwitserlood, I., Perniss, P. M., & Ozyurek, A. (2013). Expression of multiple entities in Turkish Sign Language (TİD). In E. Arik (Ed.), Current Directions in Turkish Sign Language Research (pp. 272-302). Newcastle upon Tyne: Cambridge Scholars Publishing.

    Abstract

    This paper reports on an exploration of the ways in which multiple entities are expressed in Turkish Sign Language (TİD). The (descriptive and quantitative) analyses provided are based on a corpus of both spontaneous data and specifically elicited data, in order to provide as comprehensive an account as possible. We have found several devices in TİD for expression of multiple entities, in particular localization, spatial plural predicate inflection, and a specific form used to express multiple entities that are side by side in the same configuration (not reported for any other sign language to date), as well as numerals and quantifiers. In contrast to some other signed languages, TİD does not appear to have a productive system of plural reduplication. We argue that none of the devices encountered in the TİD data is a genuine plural marking device and that the plural interpretation of multiple entity localizations and plural predicate inflections is a by-product of the use of space to indicate the existence or the involvement in an event of multiple entities.
  • Zwitserlood, I. (2002). The complex structure of ‘simple’ signs in NGT. In J. Van Koppen, E. Thrift, E. Van der Torre, & M. Zimmermann (Eds.), Proceedings of ConSole IX (pp. 232-246).

    Abstract

    In this paper, I argue that components in a set of simple signs in Nederlandse Gebarentaal (also called Sign Language of the Netherlands; henceforth: NGT), i.e. hand configuration (including orientation), movement and place of articulation, can also have morphological status. Evidence for this is provided by: firstly, the fact that handshape, orientation, movement and place of articulation show regular meaningful patterns in signs, which patterns also occur in newly formed signs, and secondly, the gradual change of formerly noninflecting predicates into inflectional predicates. The morphological complexity of signs can best be accounted for in autosegmental morphological templates.