Paul Trilsbeek

Publications

  • Klamer, M., Trilsbeek, P., Hoogervorst, T., & Haskett, C. (2017). Creating a Language Archive of Insular South East Asia and West New Guinea. In J. Odijk, & A. Van Hessen (Eds.), CLARIN in the Low Countries (pp. 113-121). London: Ubiquity Press. doi:10.5334/bbi.10.

    Abstract

    The geographical region of Insular South East Asia and New Guinea is well-known as an
    area of mega-biodiversity. Less well-known is the extreme linguistic diversity in this area:
    over a quarter of the world’s 6,000 languages are spoken here. As small minority languages,
    most of them will cease to be spoken in the coming few generations. The project described
    here ensures the preservation of unique records of languages and the cultures encapsulated
    by them in the region. The language resources were gathered by twenty linguists at,
    or in collaboration with, Dutch universities over the last 40 years, and were compiled and
    archived in collaboration with The Language Archive (TLA) at the Max Planck Institute in
    Nijmegen. The resulting archive constitutes a collection of multimedia materials and written
    documents from 48 languages in Insular South East Asia and West New Guinea. At TLA,
    the data was archived according to state-of-the-art standards (TLA holds the Data Seal of
    Approval): the component metadata infrastructure CMDI was used; all metadata categories
    as well as relevant units of annotation were linked to the ISO data category registry ISOcat.
    This guaranteed proper integration of the language resources into the CLARIN framework.
    Through the archive, future speaker communities and researchers will be able to extensively
    search the materials for answers to their own questions, even if they do not themselves know the language, and even if the language dies.
  • Edmunds, R., L'Hours, H., Rickards, L., Trilsbeek, P., Vardigan, M., & Mokrane, M. (2016). Core trustworthy data repositories requirements. Zenodo, 168411. doi:10.5281/zenodo.168411.

    Abstract

    The Core Trustworthy Data Repository Requirements were developed by the DSA–WDS Partnership Working Group on Repository Audit and Certification, a Working Group (WG) of the Research Data Alliance. The goal of the effort was to create a set of harmonized common requirements for certification of repositories at the core level, drawing from criteria already put in place by the Data Seal of Approval (DSA: www.datasealofapproval.org) and the ICSU World Data System (ICSU-WDS: https://www.icsu-wds.org/services/certification). An additional goal of the project was to develop common procedures to be implemented by both DSA and ICSU-WDS. Ultimately, the DSA and ICSU-WDS plan to collaborate on a global framework for repository certification that moves from the core to the extended (nestor-Seal DIN 31644), to the formal (ISO 16363) level.
  • Trilsbeek, P., & Windhouwer, M. (2016). FLAT: A CLARIN-compatible repository solution based on Fedora Commons. In Proceedings of the CLARIN Annual Conference 2016. Clarin ERIC.

    Abstract

    This paper describes the development of a CLARIN-compatible repository solution that fulfils
    both the long-term preservation requirements as well as the current day discoverability and usability
    needs of an online data repository of language resources. The widely used Fedora Commons
    open source repository framework, combined with the Islandora discovery layer, forms
    the basis of the solution. On top of this existing solution, additional modules and tools are developed
    to make it suitable for the types of data and metadata that are used by the participating
    partners.

  • Windhouwer, M., Kemps-Snijders, M., Trilsbeek, P., Moreira, A., Van der Veen, B., Silva, G., & Von Rhein, D. (2016). FLAT: Constructing a CLARIN Compatible Home for Language Resources. In K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, & A. Moreno (Eds.), Proceedings of LREC 2016: 10th International Conference on Language Resources and Evaluation (pp. 2478-2483). Paris: European Language Resources Association (ELRA).

    Abstract

    Language resources are valuable assets, both for institutions and researchers. To safeguard these resources, requirements for repository systems and data management have been specified by various branch organizations, e.g., CLARIN and the Data Seal of Approval. This paper describes these and some additional ones posed by the authors’ home institutions. It also shows how they are met by FLAT, to provide a new home for language resources. The basis of FLAT is formed by the Fedora Commons repository system. This repository system can meet many of the requirements out of the box, but additional configuration and some development work are still needed to meet the remaining ones, e.g., to add support for Handles and Component Metadata. This paper describes design decisions taken in the construction of FLAT’s system architecture via a mix-and-match strategy, with a preference for the reuse of existing solutions. FLAT is developed and used by the Meertens Institute and The Language Archive, but is also freely available for anyone in need of a CLARIN-compliant repository for their language resources.
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In N. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning behind this initiative and how in practice such an archive is set up.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and directly access the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we use practical examples to show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data.
  • Trilsbeek, P., & Wittenburg, P. (2005). Archiving challenges. In J. Gippert, N. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation (pp. 311-335). Berlin: Mouton de Gruyter.
  • Wittenburg, P., Skiba, R., & Trilsbeek, P. (2005). The language archive at the MPI: Contents, tools, and technologies. Language Archives Newsletter, 5, 7-9.
