Publications

Displaying 501 - 519 of 519
  • Windhouwer, M., & Wright, S. E. (2013). LMF and the Data Category Registry: Principles and application. In G. Francopoulo (Ed.), LMF: Lexical Markup Framework (pp. 41-50). London: Wiley.
  • Windhouwer, M. (2012). RELcat: a Relation Registry for ISOcat data categories. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3661-3664). European Language Resources Association (ELRA).

    Abstract

    The ISOcat Data Category Registry contains basically a flat and easily extensible list of data category specifications. To foster reuse and standardization only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains and to overcome possible proliferation of data categories more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows storing arbitrary relationships. These relationships can reflect the personal view of one linguist or a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. This allows on one hand to load existing sets of relations specified in, for example, an OWL (2) ontology or SKOS taxonomy. And on the other hand allows algorithms that query the registry to traverse the stored semantic network to remain ignorant of the original source vocabulary. This paper describes first experiences with RELcat and explains some initial design decisions.
  • Windhouwer, M. (2012). Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, (SFLR 2012 workshop), September 19-21, 2012, Vienna (pp. 494). Vienna: Österreichischen Gesellschaft für Artificial Intelligende (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Withers, P. (2012). Metadata management with Arbil. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou, & M. Monachini (Eds.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 72-75). European Language Resources Association (ELRA).

    Abstract

    Arbil is an application designed to create and manage metadata for research data and to arrange this data into a structure appropriate for archiving. The metadata is displayed in tables, which allows an overview of the metadata and the ability to populate and update many metadata sections in bulk. Both IMDI and Clarin metadata formats are supported and Arbil has been designed as a local application so that it can also be used offline, for instance in remote field sites. The metadata can be entered in any order or at any stage that the user is able; once the metadata and its data are ready for archiving and an Internet connection is available it can be exported from Arbil and in the case of IMDI it can then be transferred to the main archive via LAMUS (archive management and upload system).
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Wittenburg, P., Lenkiewicz, P., Auer, E., Gebre, B. G., Lenkiewicz, A., & Drude, S. (2012). AV Processing in eHumanities - a paradigm shift. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 538-541).

    Abstract

    Introduction Speech research saw a dramatic change in paradigm in the 90-ies. While earlier the discussion was dominated by a phoneticians’ approach who knew about phenomena in the speech signal, the situation completely changed after stochastic machinery such as Hidden Markov Models [1] and Artificial Neural Networks [2] had been introduced. Speech processing was now dominated by a purely mathematic approach that basically ignored all existing knowledge about the speech production process and the perception mechanisms. The key was now to construct a large enough training set that would allow identifying the many free parameters of such stochastic engines. In case that the training set is representative and the annotations of the training sets are widely ‘correct’ we could assume to get a satisfyingly functioning recognizer. While the success of knowledge-based systems such as Hearsay II [3] was limited, the statistically based approach led to great improvements in recognition rates and to industrial applications.
  • Wittenburg, P., & Ringersma, J. (2013). Metadata description for lexicons. In R. H. Gouws, U. Heid, W. Schweickard, & H. E. Wiegand (Eds.), Dictionaries: An international encyclopedia of lexicography: Supplementary volume: Recent developments with focus on electronic and computational lexicography (pp. 1329-1335). Berlin: Mouton de Gruyter.
  • Wittenburg, P., Drude, S., & Broeder, D. (2012). Psycholinguistik. In H. Neuroth, S. Strathmann, A. Oßwald, R. Scheffel, J. Klump, & J. Ludwig (Eds.), Langzeitarchivierung von Forschungsdaten. Eine Bestandsaufnahme (pp. 83-108). Boizenburg: Verlag Werner Hülsbusch.

    Abstract

    5.1 Einführung in den Forschungsbereich Die Psycholinguistik ist der Bereich der Linguistik, der sich mit dem Zusammenhang zwischen menschlicher Sprache und dem Denken und anderen mentalen Prozessen beschäftigt, d.h. sie stellt sich einer Reihe von essentiellen Fragen wie etwa (1) Wie schafft es unser Gehirn, im Wesentlichen akustische und visuelle kommunikative Informationen zu verstehen und in mentale Repräsentationen umzusetzen? (2) Wie kann unser Gehirn einen komplexen Sachverhalt, den wir anderen übermitteln wollen, in eine von anderen verarbeitbare Sequenz von verbalen und nonverbalen Aktionen umsetzen? (3) Wie gelingt es uns, in den verschiedenen Phasen des Lebens Sprachen zu erlernen? (4) Sind die kognitiven Prozesse der Sprachverarbeitung universell, obwohl die Sprachsysteme derart unterschiedlich sind, dass sich in den Strukturen kaum Universalien finden lassen?
  • Wnuk, E., & Majid, A. (2012). Olfaction in a hunter-gatherer society: Insights from language and culture. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1155-1160). Austin, TX: Cognitive Science Society.

    Abstract

    According to a widely-held view among various scholars, olfaction is inferior to other human senses. It is also believed by many that languages do not have words for describing smells. Data collected among the Maniq, a small population of nomadic foragers in southern Thailand, challenge the above claims and point to a great linguistic and cultural elaboration of odor. This article presents evidence of the importance of olfaction in indigenous rituals and beliefs, as well as in the lexicon. The results demonstrate the richness and complexity of the domain of smell in Maniq society and thereby challenge the universal paucity of olfactory terms and insignificance of olfaction for humans.
  • Woensdregt, M., & Dingemanse, M. (2020). Other-initiated repair can facilitate the emergence of compositional language. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 474-476). Nijmegen: The Evolution of Language Conferences.
  • Wright, S. E., Windhouwer, M., Schuurman, I., & Kemps-Snijders, M. (2013). Community efforts around the ISOcat Data Category Registry. In I. Gurevych, & J. Kim (Eds.), The People's Web meets NLP: Collaboratively constructed language resources (pp. 349-374). New York: Springer.

    Abstract

    The ISOcat Data Category Registry provides a community computing environment for creating, storing, retrieving, harmonizing and standardizing data category specifications (DCs), used to register linguistic terms used in various fields. This chapter recounts the history of DC documentation in TC 37, beginning from paper-based lists created for lexicographers and terminologists and progressing to the development of a web-based resource for a much broader range of users. While describing the considerable strides that have been made to collect a very large comprehensive collection of DCs, it also outlines difficulties that have arisen in developing a fully operative web-based computing environment for achieving consensus on data category names, definitions, and selections and describes efforts to overcome some of the present shortcomings and to establish positive working procedures designed to engage a wide range of people involved in the creation of language resources.
  • Yang, J., Van den Bosch, A., & Frank, S. L. (2020). Less is Better: A cognitively inspired unsupervised model for language segmentation. In M. Zock, E. Chersoni, A. Lenci, & E. Santus (Eds.), Proceedings of the Workshop on the Cognitive Aspects of the Lexicon ( 28th International Conference on Computational Linguistics) (pp. 33-45). Stroudsburg: Association for Computational Linguistics.

    Abstract

    Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.

    Additional information

    full text via ACL website
  • Zampieri, M., & Gebre, B. G. (2012). Automatic identification of language varieties: The case of Portuguese. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, September 19-21, 2012, Vienna (pp. 233-237). Vienna: Österreichischen Gesellschaft für Artificial Intelligende (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Zampieri, M., Gebre, B. G., & Diwersy, S. (2012). Classifying pluricentric languages: Extending the monolingual model. In Proceedings of SLTC 2012. The Fourth Swedish Language Technology Conference. Lund, October 24-26, 2012 (pp. 79-80). Lund University.

    Abstract

    This study presents a new language identification model for pluricentric languages that uses n-gram language models at the character and word level. The model is evaluated in two steps. The first step consists of the identification of two varieties of Spanish (Argentina and Spain) and two varieties of French (Quebec and France) evaluated independently in binary classification schemes. The second step integrates these language models in a six-class classification with two Portuguese varieties.
  • Zhang, Y., Amatuni, A., Crain, E., & Yu, C. (2020). Seeking meaning: Examining a cross-situational solution to learn action verbs using human simulation paradigm. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 2854-2860). Montreal, QB: Cognitive Science Society.

    Abstract

    To acquire the meaning of a verb, language learners not only need to find the correct mapping between a specific verb and an action or event in the world, but also infer the underlying relational meaning that the verb encodes. Most verb naming instances in naturalistic contexts are highly ambiguous as many possible actions can be embedded in the same scenario and many possible verbs can be used to describe those actions. To understand whether learners can find the correct verb meaning from referentially ambiguous learning situations, we conducted three experiments using the Human Simulation Paradigm with adult learners. Our results suggest that although finding the right verb meaning from one learning instance is hard, there is a statistical solution to this problem. When provided with multiple verb learning instances all referring to the same verb, learners are able to aggregate information across situations and gradually converge to the correct semantic space. Even in cases where they may not guess the exact target verb, they can still discover the right meaning by guessing a similar verb that is semantically close to the ground truth.
  • Zinken, J., Rossi, G., & Reddy, V. (2020). Doing more than expected: Thanking recognizes another's agency in providing assistance. In C. Taleghani-Nikazm, E. Betz, & P. Golato (Eds.), Mobilizing others: Grammar and lexis within larger activities (pp. 253-278). Amsterdam: John Benjamins.

    Abstract

    In informal interaction, speakers rarely thank a person who has complied with a request. Examining data from British English, German, Italian, Polish, and Telugu, we ask when speakers do thank after compliance. The results show that thanking treats the other’s assistance as going beyond what could be taken for granted in the circumstances. Coupled with the rareness of thanking after requests, this suggests that cooperation is to a great extent governed by expectations of helpfulness, which can be long-standing, or built over the course of a particular interaction. The higher frequency of thanking in some languages (such as English or Italian) suggests that cultures differ in the importance they place on recognizing the other’s agency in doing as requested.
  • Zwitserlood, I. (2012). Classifiers. In R. Pfau, M. Steinbach, & B. Woll (Eds.), Sign Language: an International Handbook (pp. 158-186). Berlin: Mouton de Gruyter.

    Abstract

    Classifiers (currently also called 'depicting handshapes'), are observed in almost all signed languages studied to date and form a well-researched topic in sign language linguistics. Yet, these elements are still subject to much debate with respect to a variety of matters. Several different categories of classifiers have been posited on the basis of their semantics and the linguistic context in which they occur. The function(s) of classifiers are not fully clear yet. Similarly, there are differing opinions regarding their structure and the structure of the signs in which they appear. Partly as a result of comparison to classifiers in spoken languages, the term 'classifier' itself is under debate. In contrast to these disagreements, most studies on the acquisition of classifier constructions seem to consent that these are difficult to master for Deaf children. This article presents and discusses all these issues from the viewpoint that classifiers are linguistic elements.
  • Zwitserlood, I., Perniss, P. M., & Ozyurek, A. (2013). Expression of multiple entities in Turkish Sign Language (TİD). In E. Arik (Ed.), Current Directions in Turkish Sign Language Research (pp. 272-302). Newcastle upon Tyne: Cambridge Scholars Publishing.

    Abstract

    This paper reports on an exploration of the ways in which multiple entities are expressed in Turkish Sign Language (TİD). The (descriptive and quantitative) analyses provided are based on a corpus of both spontaneous data and specifically elicited data, in order to provide as comprehensive an account as possible. We have found several devices in TİD for expression of multiple entities, in particular localization, spatial plural predicate inflection, and a specific form used to express multiple entities that are side by side in the same configuration (not reported for any other sign language to date), as well as numerals and quantifiers. In contrast to some other signed languages, TİD does not appear to have a productive system of plural reduplication. We argue that none of the devices encountered in the TİD data is a genuine plural marking device and that the plural interpretation of multiple entity localizations and plural predicate inflections is a by-product of the use of space to indicate the existence or the involvement in an event of multiple entities.
  • Zwitserlood, I. (2003). Word formation below and above little x: Evidence from Sign Language of the Netherlands. In Proceedings of SCL 19. Nordlyd Tromsø University Working Papers on Language and Linguistics (pp. 488-502).

    Abstract

    Although in many respects sign languages have a similar structure to that of spoken languages, the different modalities in which both types of languages are expressed cause differences in structure as well. One of the most striking differences between spoken and sign languages is the influence of the interface between grammar and PF on the surface form of utterances. Spoken language words and phrases are in general characterized by sequential strings of sounds, morphemes and words, while in sign languages we find that many phonemes, morphemes, and even words are expressed simultaneously. A linguistic model should be able to account for the structures that occur in both spoken and sign languages. In this paper, I will discuss the morphological/ morphosyntactic structure of signs in Nederlandse Gebarentaal (Sign Language of the Netherlands, henceforth NGT), with special focus on the components ‘place of articulation’ and ‘handshape’. I will focus on their multiple functions in the grammar of NGT and argue that the framework of Distributed Morphology (DM), which accounts for word formation in spoken languages, is also suited to account for the formation of structures in sign languages. First I will introduce the phonological and morphological structure of NGT signs. Then, I will briefly outline the major characteristics of the DM framework. Finally, I will account for signs that have the same surface form but have a different morphological structure by means of that framework.

Share this page