Project

last update: 11 Oct 2007

Mission

The goal of the EAGLES/ISLE Meta Data Initiative (IMDI) is to propose a standard for metadata descriptions of Multi-Media/Multi-Modal Language Resources. Such a standard will make it possible to create a browsable and searchable universe of these resources on the Internet. This will enable interested parties to locate suitable resources efficiently and thus increase the reusability of those resources.

Many language resources are currently being created in disciplines such as corpus linguistics, anthropology, and language and speech engineering, but only a few of them are available through the catalogues of well-known agencies such as LDC and ELRA. Most of these resources are not publicly available at all, and very few people know about them. Even within the institutions where the resources are created, exchanging information about them in a systematic way is often problematic.

The situation sketched above is the reason for starting the ISLE Meta Data Initiative. The Language Resource community needs a standard for describing the main characteristics of resources. In the case of corpora, for example, these include the name of the language spoken and each speaker's age, sex and educational background. The community also needs tools: tools that help generate such metadata descriptions easily, preferably while the resources themselves are being created; tools that make the descriptions available on the Internet and integrate them into the emerging universe of metadata descriptions; and tools that allow users to browse and search that universe and finally access the resources themselves.
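As a rough illustration of what such a description and a search over it might look like, here is a minimal Python sketch. It is not the IMDI schema: the class names, field names, the `search` function and the sample records are all hypothetical and invented for this example, based only on the characteristics named above (language, speaker age, sex and educational background).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    # Hypothetical speaker attributes, taken from the characteristics
    # mentioned in the text above.
    age: int
    sex: str
    education: str


@dataclass
class ResourceDescription:
    # Hypothetical metadata record for one language resource.
    title: str
    language: str
    speakers: List[Speaker] = field(default_factory=list)


def search(catalogue, language=None, max_speaker_age=None):
    """Return the descriptions that match every criterion that is given."""
    hits = []
    for record in catalogue:
        if language is not None and record.language != language:
            continue
        # Keep the record only if at least one speaker is young enough.
        if max_speaker_age is not None and not any(
            s.age <= max_speaker_age for s in record.speakers
        ):
            continue
        hits.append(record)
    return hits


# Invented sample records, for illustration only.
catalogue = [
    ResourceDescription("Dinner conversation", "Dutch",
                        [Speaker(34, "female", "university")]),
    ResourceDescription("Child narrative", "Dutch",
                        [Speaker(6, "male", "primary school")]),
    ResourceDescription("Interview", "Italian",
                        [Speaker(52, "female", "secondary school")]),
]

dutch_children = search(catalogue, language="Dutch", max_speaker_age=12)
print([r.title for r in dutch_children])  # prints ['Child narrative']
```

A real metadata set would need many more fields and controlled vocabularies; the point here is only that once descriptions share one structure, browsing and searching across collections becomes a simple filtering operation.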

The project is partly based on existing conventions and standards in the Language Resource community. In earlier corpora such as CHILDES or the ESF Second Language Databank, each corpus file included so-called header information in a proprietary format. Important initiatives such as TEI and CES/xCES also worked out tag sets for the data that typically describe a file as a whole, which this initiative calls metadata. Some institutions, such as Helsinki University, started to build web sites with corpus samples in which hyperlinks and comments carrying typical metadata allow the user to navigate easily between the language samples.

We have established a broad network of interested persons from the language resource community who are committed to contributing in some way to the eventual goal.
New information will be made available on the web pages, and people will be asked for comments. Public versions of documents under discussion are placed on the web as they become available.

Workshops & Conferences

The concept of a universe of metadata descriptions and the progress of the work will be presented at workshops and conferences. All members of the network, and especially the boards, will be asked to help promote this idea and to suggest suitable platforms for discussion.

Official Documents

The publication of open documents that can be cited as official IMDI notes is the responsibility of the steering board (SB) chairman. These documents are discussed in the SB before publication, and there must be general agreement on basic issues, but the chairman has ultimate responsibility.

Steering Board

The steering board consists of committed people who are able, to some extent, to represent specific sub-communities and the different European areas. The board discusses the various proposals in detail, comments on them and makes decisions. Its members actively participate in the discussion process and meet at regular intervals, asking other specialists for advice where necessary.

Name                Place
Christoph Draxler   Univ. Munich
Daan Broeder        MPI, Nijmegen
Heidi Johnson       Univ. Texas, Austin
Laurent Romary      LORIA/CNRS, Nancy
Marcus Uneson       Univ. Lund
Massimo Moneglia    Univ. Florence
Nelleke Oostdijk    Univ. Nijmegen
Onno Crasborn       Univ. Nijmegen
Peter Austin        Univ. London
Peter Wittenburg    MPI, Nijmegen