Wittenburg, P., Trilsbeek, P., & Lenkiewicz, P.
(2010). Large multimedia archive for world languages. Talk presented at the ACM Workshop on Searching Spontaneous Conversational Speech [SSCS 2010]. Firenze, Italy. 2010-10-25 - 2010-10-29. doi:10.1145/1878101.1878113.
In this paper, we describe the core pillars of a large archive of language material recorded worldwide, some of it on highly endangered languages. The documentation of these languages is based on audio/video recordings, which are then annotated at several linguistic layers. The digital age has completely changed the requirements of long-term preservation, and we discuss how the archive has met these new challenges. An extensive data replication solution has been worked out to guarantee bit-stream preservation. Because incoming data are converted immediately to standards-based formats and checked at upload time, lifecycle management of all 50 terabytes of data is greatly simplified. A suitable metadata framework, which allows users not only to describe and discover resources but also to organize them, makes it possible to manage this volume of resources very efficiently. Finally, the Language Archiving Technology software suite allows users to create, manipulate, access, and enrich all archived resources for which they have access permissions.