Paul Trilsbeek

Publications

Displaying 1 - 6 of 6
  • Broeder, D., Sloetjes, H., Trilsbeek, P., Van Uytvanck, D., Windhouwer, M., & Wittenburg, P. (2011). Evolving challenges in archiving and data infrastructures. In G. L. J. Haig, N. Nau, S. Schnell, & C. Wegener (Eds.), Documenting endangered languages: Achievements and perspectives (pp. 33-54). Berlin: De Gruyter.

    Abstract

    Introduction Increasingly often research in the humanities is based on data. This change in attitude and research practice is driven to a large extent by the availability of small and cheap yet high-quality recording equipment (video cameras, audio recorders) as well as advances in information technology (faster networks, larger data storage, larger computation power, suitable software). In some institutes such as the Max Planck Institute for Psycholinguistics, already in the 90s a clear trend towards an all-digital domain could be identified, making use of state-of-the-art technology for research purposes. This change of habits was one of the reasons for the Volkswagen Foundation to establish the DoBeS program in 2000 with a clear focus on language documentation based on recordings as primary material.
  • Koenig, A., Ringersma, J., & Trilsbeek, P. (2009). The Language Archiving Technology domain. In Z. Vetulani (Ed.), Human Language Technologies as a Challenge for Computer Science and Linguistics (pp. 295-299).

    Abstract

    The Max Planck Institute for Psycholinguistics (MPI) manages an archive of linguistic research data with a current size of almost 20 Terabytes. Apart from in-house researchers other projects also store their data in the archive, most notably the Documentation of Endangered Languages (DoBeS) projects. The archive is available online and can be accessed by anybody with Internet access. To be able to manage this large amount of data the MPI's technical group has developed a software suite called Language Archiving Technology (LAT) that on the one hand helps researchers and archive managers to manage the data and on the other hand helps users in enriching their primary data with additional layers. All the MPI software is Java-based and developed according to open source principles (GNU, 2007). All three major operating systems (Windows, Linux, MacOS) are supported and the software works similarly on all of them. As the archive is online, many of the tools, especially the ones for accessing the data, are browser based. Some of these browser-based tools make use of Adobe Flex to create nice-looking GUIs. The LAT suite is a complete set of management and enrichment tools, and given the interaction between the tools the result is a complete LAT software domain. Over the last 10 years, this domain has proven its functionality and use, and is being deployed to servers in other institutions. This deployment is an important step in getting the archived resources back to the members of the speech communities whose languages are documented. In the paper we give an overview of the tools of the LAT suite and we describe their functionality and role in the integrated process of archiving, management and enrichment of linguistic data.
  • Trilsbeek, P., & Van Uytvanck, D. (2009). Regional archives and community portals. IASA Journal, 32, 69-73.
  • Trilsbeek, P., & Wittenburg, P. (2007). "Los acervos lingüísticos digitales y sus desafíos". In J. Haviland, & F. Farfán (Eds.), Bases de la documentacíon lingüística (pp. 359-385). Mexico: Instituto Nacional de Lenguas Indígenas.

    Abstract

    This chapter describes the challenges that modern digital language archives are faced with. One essential aspect of such an archive is to have a rich metadata catalog such that the archived resources can be easily discovered. The challenge of the archive is to obtain these rich metadata descriptions from the depositors without creating too much overhead for them. The rapid changes in storage technology, file formats and encoding standards make it difficult to build a long-lasting repository, therefore archives need to be set up in such a way that a straightforward and automated migration process to newer technology is possible whenever certain technology becomes obsolete. Other problems arise from the fact that there are many different groups of users of the archive, each of them with their own specific expectations and demands. Often conflicts exist between the requirements for different purposes of the archive, e.g. between long-term preservation of the data versus direct access to the resources via the web. The task of the archive is to come up with a technical solution that works well for most usage scenarios.
  • Trilsbeek, P., & Wittenburg, P. (2005). Archiving challenges. In J. Gippert, N. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation (pp. 311-335). Berlin: Mouton de Gruyter.
  • Wittenburg, P., Skiba, R., & Trilsbeek, P. (2005). The language archive at the MPI: Contents, tools, and technologies. Language Archives Newsletter, 5, 7-9.

Share this page