Unlocking language archives using search
The Language Archive manages one of the largest and most varied sets of natural language data. This data consists of video and audio enriched with annotations. It is available for more than 250 languages, many of which are endangered. Researchers have a need to access this data conveniently and efficiently. We provide several browse and search methods to cover this need, which have been developed and expanded over the years. Metadata and content-oriented search methods can be connected for a more focused search. This article aims to provide a complete overview of the available search mechanisms, with a focus on annotation content search, including a benchmark.
Publication typeProceedings paper