Klamer, M., Trilsbeek, P., Hoogervorst, T., & Haskett, C. (2017). Creating a Language Archive of Insular South East Asia and West New Guinea. In J. Odijk & A. Van Hessen (Eds.), CLARIN in the Low Countries (pp. 113-121). London: Ubiquity Press. doi:10.5334/bbi.10.
The geographical region of Insular South East Asia and New Guinea is well-known as an
area of mega-biodiversity. Less well-known is the extreme linguistic diversity in this area:
over a quarter of the world’s 6,000 languages are spoken here. As small minority languages,
most of them will cease to be spoken in the coming few generations. The project described
here ensures the preservation of unique records of languages and the cultures encapsulated
by them in the region. The language resources were gathered by twenty linguists at,
or in collaboration with, Dutch universities over the last 40 years, and were compiled and
archived in collaboration with The Language Archive (TLA) at the Max Planck Institute in
Nijmegen. The resulting archive constitutes a collection of multimedia materials and written
documents from 48 languages in Insular South East Asia and West New Guinea. At TLA,
the data was archived according to state-of-the-art standards (TLA holds the Data Seal of
Approval): the component metadata infrastructure CMDI was used; all metadata categories
as well as relevant units of annotation were linked to the ISO data category registry ISOcat.
This guaranteed proper integration of the language resources into the CLARIN framework.
Through the archive, future speaker communities and researchers will be able to extensively
search the materials for answers to their own questions, even if they do not themselves know the language, and even if the language dies.