Paul Trilsbeek

Publications

  • Klamer, M., Trilsbeek, P., Hoogervorst, T., & Haskett, C. (2017). Creating a Language Archive of Insular South East Asia and West New Guinea. In J. Odijk, & A. Van Hessen (Eds.), CLARIN in the Low Countries (pp. 113-121). London: Ubiquity Press. doi:10.5334/bbi.10.

    Abstract

    The geographical region of Insular South East Asia and New Guinea is well-known as an area of mega-biodiversity. Less well-known is the extreme linguistic diversity in this area: over a quarter of the world’s 6,000 languages are spoken here. As small minority languages, most of them will cease to be spoken in the coming few generations. The project described here ensures the preservation of unique records of languages and the cultures encapsulated by them in the region. The language resources were gathered by twenty linguists at, or in collaboration with, Dutch universities over the last 40 years, and were compiled and archived in collaboration with The Language Archive (TLA) at the Max Planck Institute in Nijmegen. The resulting archive constitutes a collection of multimedia materials and written documents from 48 languages in Insular South East Asia and West New Guinea. At TLA, the data was archived according to state-of-the-art standards (TLA holds the Data Seal of Approval): the component metadata infrastructure CMDI was used; all metadata categories as well as relevant units of annotation were linked to the ISO data category registry ISOcat. This guaranteed proper integration of the language resources into the CLARIN framework. Through the archive, future speaker communities and researchers will be able to extensively search the materials for answers to their own questions, even if they do not themselves know the language, and even if the language dies.
  • Drude, S., Trilsbeek, P., & Broeder, D. (2012). Language Documentation and Digital Humanities: The (DoBeS) Language Archive. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 169-173).

    Abstract

    Overview: Since the early nineties, the on-going dramatic loss of the world’s linguistic diversity has gained attention, first by the linguists and increasingly also by the general public. As a response, the new field of language documentation emerged from around 2000 on, starting with the funding initiative ‘Dokumentation Bedrohter Sprachen’ (DoBeS, funded by the Volkswagen foundation, Germany), soon to be followed by others such as the ‘Endangered Languages Documentation Programme’ (ELDP, at SOAS, London), or, in the USA, ‘Electronic Meta-structure for Endangered Languages Documentation’ (EMELD, led by the LinguistList) and ‘Documenting Endangered Languages’ (DEL, by the NSF). From its very beginning, the new field focused on digital technologies not only for recording in audio and video, but also for annotation, lexical databases, corpus building and archiving, among others. This development not just coincides but is intrinsically interconnected with the increasing focus on digital data, technology and methods in all sciences, in particular in the humanities.
  • Drude, S., Broeder, D., Trilsbeek, P., & Wittenburg, P. (2012). The Language Archive: A new hub for language resources. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3264-3267). European Language Resources Association (ELRA).

    Abstract

    This contribution presents “The Language Archive” (TLA), a new unit at the MPI for Psycholinguistics, discussing the current developments in management of scientific data, considering the need for new data research infrastructures. Although several initiatives worldwide in the realm of language resources aim at the integration, preservation and mobilization of research data, the state of such scientific data is still often problematic. Data are often not well organized and archived and not described by metadata; even unique data such as field-work observational data on endangered languages is still mostly on perishable carriers. New data centres are needed that provide trusted, quality-reviewed, persistent services and suitable tools and that take legal and ethical issues seriously. The CLARIN initiative has established criteria for suitable centres. TLA is in a good position to be one of such centres. It is based on three essential pillars: (1) A data archive; (2) management, access and annotation tools; (3) archiving and software expertise for collaborative projects. The archive hosts mostly observational data on small languages worldwide and language acquisition data, but also data resulting from experiments.
  • Seifart, F., Haig, G., Himmelmann, N. P., Jung, D., Margetts, A., & Trilsbeek, P. (Eds.). (2012). Potentials of language documentation: Methods, analyses, and utilization. Honolulu: University of Hawai‘i Press.

    Abstract

    In the past 10 or so years, intensive documentation activities, i.e. compilations of large, multimedia corpora of spoken endangered languages have contributed to the documentation of important linguistic and cultural aspects of dozens of languages. As laid out in Himmelmann (1998), language documentations include as their central components a collection of spoken texts from a variety of genres, recorded on video and/or audio, with time-aligned annotations consisting of transcription, translation, and also, for some data, morphological segmentation and glossing. Text collections are often complemented by elicited data, e.g. word lists, and structural descriptions such as a grammar sketch. All data are provided with metadata which serve as cataloguing devices for their accessibility in online archives. These newly available language documentation data have enormous potential.
  • Wittenburg, P., & Trilsbeek, P. (2010). Digital archiving - a necessity in documentary linguistics. In G. Senft (Ed.), Endangered Austronesian and Australian Aboriginal languages: Essays on language documentation, archiving and revitalization (pp. 111-136). Canberra: Pacific Linguistics.
  • Wittenburg, P., Trilsbeek, P., & Lenkiewicz, P. (2010). Large multimedia archive for world languages. In SSCS'10 - Proceedings of the 2010 ACM Workshop on Searching Spontaneous Conversational Speech, Co-located with ACM Multimedia 2010 (pp. 53-56). New York: Association for Computing Machinery, Inc. (ACM). doi:10.1145/1878101.1878113.

    Abstract

    In this paper, we describe the core pillars of a large archive of language material recorded worldwide partly about languages that are highly endangered. The bases for the documentation of these languages are audio/video recordings which are then annotated at several linguistic layers. The digital age completely changed the requirements of long-term preservation and it is discussed how the archive met these new challenges. An extensive solution for data replication has been worked out to guarantee bit-stream preservation. Due to an immediate conversion of the incoming data to standards-based formats and checks at upload time lifecycle management of all 50 Terabyte of data is widely simplified. A suitable metadata framework not only allowing users to describe and discover resources, but also allowing them to organize their resources is enabling the management of this amount of resources very efficiently. Finally, it is the Language Archiving Technology software suite which allows users to create, manipulate, access and enrich all archived resources given that they have access permissions.
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In N. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning behind this initiative and how in practice such an archive is set up.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and access directly the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs by making use of practical examples. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data.
  • Broeder, D., Claus, A., Offenga, F., Skiba, R., Trilsbeek, P., & Wittenburg, P. (2006). LAMUS: The Language Archive Management and Upload System. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2291-2294).
  • Trilsbeek, P., & Wittenburg, P. (2005). Archiving challenges. In J. Gippert, N. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation (pp. 311-335). Berlin: Mouton de Gruyter.
  • Wittenburg, P., Skiba, R., & Trilsbeek, P. (2005). The language archive at the MPI: Contents, tools, and technologies. Language Archives Newsletter, 5, 7-9.
