last update: 11 Oct 2007
The goal of the EAGLES/ISLE Meta Data Initiative is to propose a standard for metadata descriptions of Multi-Media/Multi-Modal Language Resources. Such a standard will make it possible to create a browsable and searchable universe of such resources on the Internet. This will enable interested parties to locate suitable resources efficiently and thus increase the reusability of those resources.
Currently many language resources are being generated in disciplines such as corpus linguistics, anthropology, and language and speech engineering, but only a few of them are available through the catalogues of well-known agencies such as LDC and ELRA. Most of these resources are not available in any "public" way at all, and only very few people know about them. Even within the institutions where these resources are generated, exchanging information about them in a systematic way is known to be problematic.
The situation sketched above is the reason for starting the ISLE Meta Data Initiative. The Language Resource community needs a standard for describing the main characteristics of resources; for corpora, for example, these include the name of the language spoken and the speakers' age, sex and educational background. The community also needs tools: tools that help generate such metadata descriptions easily, preferably while the resources are being created; tools that make such descriptions available on the Internet and integrate them into the emerging universe of metadata descriptions; and tools that allow users to browse and search that universe and finally access the resources themselves.
The project is partly based on existing conventions and standards in the Language Resource community. In earlier corpora such as CHILDES or the ESF Second Language Databank, each corpus file included so-called header information in a proprietary format. Important initiatives such as TEI and CES/xCES also worked out tag sets for typical data describing the whole file, which this initiative calls metadata. Some institutions, such as Helsinki University, started to build web sites with corpus samples, where hyperlinks and comments carrying typical metadata allow the user to navigate easily between the language samples.
We have established a broad network of interested persons. These are people from the
language resource community who are committed to contributing in some way to the eventual goal.
New information will be made available on the web-pages and people will be asked for comments. Public versions of documents to be discussed are placed on the web as they become available.
Workshops & Conferences
The concept of a universe of meta-descriptions and the progress of the work will be presented
at workshops and conferences. All members of the network, and especially the boards, will
be asked to help promote this idea and to point out suitable platforms for discussion.
The publication of open documents that can be cited as official IMDI notes is the responsibility of the chairman of the steering board (SB). These documents are discussed in the SB before publication, and there must be general agreement on basic issues, but the chairman has ultimate responsibility.
The steering board consists of committed people who are able, to some
extent, to represent specific sub-communities and the different European areas. This
board discusses the various proposals in detail, makes comments and takes decisions.
Members of this board actively participate in the discussion process and hold meetings
at regular intervals, asking other specialists for advice where necessary.
|Christoph Draxler||Univ. Munich|
|Daan Broeder||MPI, Nijmegen|
|Heidi Johnson||Univ. Texas, Austin|
|Laurent Romary||LORIA/CNRS, Nancy|
|Marcus Uneson||Univ. Lund|
|Massimo Moneglia||Univ. Florence|
|Nelleke Oostdijk||Univ. Nijmegen|
|Onno Crasborn||Univ. Nijmegen|
|Peter Austin||Univ. London|
|Peter Wittenburg||MPI, Nijmegen|