Publications

Displaying 301 - 314 of 314
  • Wittenburg, P., Lenkiewicz, P., Auer, E., Gebre, B. G., Lenkiewicz, A., & Drude, S. (2012). AV Processing in eHumanities - a paradigm shift. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 538-541).

    Abstract

    Introduction Speech research saw a dramatic change in paradigm in the 90-ies. While earlier the discussion was dominated by a phoneticians’ approach who knew about phenomena in the speech signal, the situation completely changed after stochastic machinery such as Hidden Markov Models [1] and Artificial Neural Networks [2] had been introduced. Speech processing was now dominated by a purely mathematic approach that basically ignored all existing knowledge about the speech production process and the perception mechanisms. The key was now to construct a large enough training set that would allow identifying the many free parameters of such stochastic engines. In case that the training set is representative and the annotations of the training sets are widely ‘correct’ we could assume to get a satisfyingly functioning recognizer. While the success of knowledge-based systems such as Hearsay II [3] was limited, the statistically based approach led to great improvements in recognition rates and to industrial applications.
  • Wittenburg, P., Gulrajani, G., Broeder, D., & Uneson, M. (2004). Cross-disciplinary integration of metadata descriptions. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 113-116). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Kita, S., & Brugman, H. (2002). Crosslinguistic studies of multimodal communication.
  • Wittenburg, P., Johnson, H., Buchhorn, M., Brugman, H., & Broeder, D. (2004). Architecture for distributed language resource management and archiving. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 361-364). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Peters, W., & Drude, S. (2002). Analysis of lexical structures from field linguistics and language engineering. In M. R. González, & C. P. S. Araujo (Eds.), Third international conference on language resources and evaluation (pp. 682-686). Paris: European Language Resources Association.

    Abstract

    Lexica play an important role in every linguistic discipline. We are confronted with many types of lexica. Depending on the type of lexicon and the language we are currently faced with a large variety of structures from very simple tables to complex graphs, as was indicated by a recent overview of structures found in dictionaries from field linguistics and language engineering. It is important to assess these differences and aim at the integration of lexical resources in order to improve lexicon creation, exchange and reuse. This paper describes the first step towards the integration of existing structures and standards into a flexible abstract model.
  • Wittenburg, P., & Broeder, D. (2002). Metadata overview and the semantic web. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics. Paris: European Language Resources Association.

    Abstract

    The increasing quantity and complexity of language resources leads to new management problems for those that collect and those that need to preserve them. At the same time the desire to make these resources available on the Internet demands an efficient way characterizing their properties to allow discovery and re-use. The use of metadata is seen as a solution for both these problems. However, the question is what specific requirements there are for the specific domain and if these are met by existing frameworks. Any possible solution should be evaluated with respect to its merit for solving the domain specific problems but also with respect to its future embedding in “global” metadata frameworks as part of the Semantic Web activities.
  • Wittenburg, P., Peters, W., & Broeder, D. (2002). Metadata proposals for corpora and lexica. In M. Rodriguez González, & C. Paz Suárez Araujo (Eds.), Third international conference on language resources and evaluation (pp. 1321-1326). Paris: European Language Resources Association.
  • Wittenburg, P., Mosel, U., & Dwyer, A. (2002). Methods of language documentation in the DOBES program. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics (pp. 36-42). Paris: European Language Resources Association.
  • Wnuk, E., & Majid, A. (2012). Olfaction in a hunter-gatherer society: Insights from language and culture. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1155-1160). Austin, TX: Cognitive Science Society.

    Abstract

    According to a widely-held view among various scholars, olfaction is inferior to other human senses. It is also believed by many that languages do not have words for describing smells. Data collected among the Maniq, a small population of nomadic foragers in southern Thailand, challenge the above claims and point to a great linguistic and cultural elaboration of odor. This article presents evidence of the importance of olfaction in indigenous rituals and beliefs, as well as in the lexicon. The results demonstrate the richness and complexity of the domain of smell in Maniq society and thereby challenge the universal paucity of olfactory terms and insignificance of olfaction for humans.
  • Xiao, M., Kong, X., Liu, J., & Ning, J. (2009). TMBF: Bloom filter algorithms of time-dependent multi bit-strings for incremental set. In Proceedings of the 2009 International Conference on Ultra Modern Telecommunications & Workshops.

    Abstract

    Set is widely used as a kind of basic data structure. However, when it is used for large scale data set the cost of storage, search and transport is overhead. The bloom filter uses a fixed size bit string to represent elements in a static set, which can reduce storage space and search cost that is a fixed constant. The time-space efficiency is achieved at the cost of a small probability of false positive in membership query. However, for many applications the space savings and locating time constantly outweigh this drawback. Dynamic bloom filter (DBF) can support concisely representation and approximate membership queries of dynamic set instead of static set. It has been proved that DBF not only possess the advantage of standard bloom filter, but also has better features when dealing with dynamic set. This paper proposes a time-dependent multiple bit-strings bloom filter (TMBF) which roots in the DBF and targets on dynamic incremental set. TMBF uses multiple bit-strings in time order to present a dynamic increasing set and uses backward searching to test whether an element is in a set. Based on the system logs from a real P2P file sharing system, the evaluation shows a 20% reduction in searching cost compared to DBF.
  • Zampieri, M., & Gebre, B. G. (2012). Automatic identification of language varieties: The case of Portuguese. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, September 19-21, 2012, Vienna (pp. 233-237). Vienna: Österreichischen Gesellschaft für Artificial Intelligende (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Zampieri, M., Gebre, B. G., & Diwersy, S. (2012). Classifying pluricentric languages: Extending the monolingual model. In Proceedings of SLTC 2012. The Fourth Swedish Language Technology Conference. Lund, October 24-26, 2012 (pp. 79-80). Lund University.

    Abstract

    This study presents a new language identification model for pluricentric languages that uses n-gram language models at the character and word level. The model is evaluated in two steps. The first step consists of the identification of two varieties of Spanish (Argentina and Spain) and two varieties of French (Quebec and France) evaluated independently in binary classification schemes. The second step integrates these language models in a six-class classification with two Portuguese varieties.
  • Zhang, Y., Yurovsky, D., & Yu, C. (2015). Statistical word learning is a continuous process: Evidence from the human simulation paradigm. In D. Noelle, R. Dale, A. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015) (pp. 2422-2427). Austin: Cognitive Science Society.

    Abstract

    In the word-learning domain, both adults and young children are able to find the correct referent of a word from highly ambiguous contexts that involve many words and objects by computing distributional statistics across the co-occurrences of words and referents at multiple naming moments (Yu & Smith, 2007; Smith & Yu, 2008). However, there is still debate regarding how learners accumulate distributional information to learn object labels in natural learning environments, and what underlying learning mechanism learners are most likely to adopt. Using the Human Simulation Paradigm (Gillette, Gleitman, Gleitman & Lederer, 1999), we found that participants’ learning performance gradually improved and that their ability to remember and carry over partial knowledge from past learning instances facilitated subsequent learning. These results support the statistical learning model that word learning is a continuous process.
  • Zwitserlood, I. (2002). The complex structure of ‘simple’ signs in NGT. In J. Van Koppen, E. Thrift, E. Van der Torre, & M. Zimmermann (Eds.), Proceedings of ConSole IX (pp. 232-246).

    Abstract

    In this paper, I argue that components in a set of simple signs in Nederlandse Gebarentaal (also called Sign Language of the Netherlands; henceforth: NGT), i.e. hand configuration (including orientation), movement and place of articulation, can also have morphological status. Evidence for this is provided by: firstly, the fact that handshape, orientation, movement and place of articulation show regular meaningful patterns in signs, which patterns also occur in newly formed signs, and secondly, the gradual change of formerly noninflecting predicates into inflectional predicates. The morphological complexity of signs can best be accounted for in autosegmental morphological templates.

Share this page