WP2: Integrated Resource Domain
The objectives of this WP are:
First, the set of language resources to be integrated in the prototypical domain has to be defined. All resources proposed by the 13 data centers involved in the project will be evaluated. The chosen set should contain an attractive mix of relevant resources. The following selection criteria will be applied: relevance, complexity and size, multilinguality, multimodality, variation in type (corpora, lexica, ...), and variety in usage (industrial, research, public).
Then it will be discussed for each resource which type of metadata description is most useful. The ISLE MetaData Initiative (IMDI) set offers detailed categories for describing corpora and lexica; both category sets emerged from intensive discussions within ISLE and beyond. IMDI also allows other types of resources to be described by using fewer elements. Another option for data types other than corpora and lexica is to use the Dublin Core (DC) or Open Language Archives Community (OLAC) set. Depending on the type of resource, the metadata element sets will also have to be extended to better cover, for example, multilingual resources. This will be checked at the beginning of the project in collaboration with the resource providers. Depending on the result, the existing IMDI tools will have to be adapted.
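The trade-off between a detailed element set and a smaller one such as DC can be illustrated with a minimal sketch. It assumes a detailed description held as a simple key/value record; the detailed element names on the left of the mapping and the sample record are hypothetical illustrations, not the actual IMDI schema, while the target names are genuine Dublin Core elements. The function reports which detailed elements have no DC counterpart, which is the kind of gap the check with the resource providers would have to reveal.

```python
# Hypothetical detailed element names mapped to Dublin Core element names.
# The left-hand names are illustrative only, not the real IMDI schema.
TO_DC = {
    "Title": "title",
    "Description": "description",
    "Language": "language",
    "ResourceType": "type",
    "Format": "format",
}


def to_dublin_core(description):
    """Reduce a detailed description to DC and list what could not be mapped."""
    dc = {}
    unmapped = []
    for key, value in description.items():
        if key in TO_DC:
            # DC elements are repeatable, so values are collected in lists.
            dc.setdefault(TO_DC[key], []).append(value)
        else:
            unmapped.append(key)
    return dc, sorted(unmapped)


# Invented sample record, for illustration only.
sample = {
    "Title": "Example Lexicon",
    "Language": "Dutch",
    "Modality": "spoken",
    "Annotation": "phonetic",
}
dc_record, lost = to_dublin_core(sample)
```

In this sketch the elements listed in `lost` are exactly those for which either a richer set or an extension of the element set would be needed.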
Then the metadata descriptions will have to be created for the selected resources. This will be done mainly through interaction between the MPI and the resource centers. A seminar will be held to train the specialists in applying the metadata elements and to gain a detailed understanding of the characteristics of the resources. Scripts will have to be specified that use existing documentation to bootstrap the XML-based metadata descriptions. These scripts will then be implemented in an efficient way, i.e. similarities between problems will lead to more generic scripts. In addition, student assistant time will be necessary to enrich the descriptions manually so that they are useful for searching. It will be necessary either to send MPI specialists to the various centers to help the local staff, or to transfer the existing descriptions to the MPI so that the specialists there can carry out the relevant steps. This work will be supervised by the MPI following a continuous interaction scheme, i.e. all descriptions created have to be transmitted directly to the MPI, where they will be tested for correctness and, if possible, integrated directly into the emerging domain description.
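A bootstrapping script of the kind described above could look roughly like the following sketch. It assumes that a center's existing documentation is available as a CSV table; the column names, the XML element names and the sample rows are all hypothetical and do not reproduce the IMDI schema. Keeping the column-to-element mapping in a separate table is one way to make the same script reusable across centers with similar documentation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from columns in a center's documentation to
# metadata element names (illustrative only, not the IMDI schema).
FIELD_MAP = {
    "name": "Name",
    "title": "Title",
    "language": "Language",
    "type": "ResourceType",
}


def record_to_xml(record, field_map=FIELD_MAP):
    """Build one XML metadata description from one documentation record."""
    root = ET.Element("MetadataDescription")
    for column, element_name in field_map.items():
        value = (record.get(column) or "").strip()
        if value:
            # Empty fields are skipped and left for manual completion.
            ET.SubElement(root, element_name).text = value
    return root


def bootstrap(csv_text, field_map=FIELD_MAP):
    """Convert a whole CSV table into a list of XML descriptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [record_to_xml(row, field_map) for row in reader]


# Invented sample rows, for illustration only.
SAMPLE = (
    "name,title,language,type\n"
    "corpus-a,Example Spoken Corpus,Dutch,corpus\n"
    "lex-b,Example Lexicon,,lexicon\n"
)

descriptions = bootstrap(SAMPLE)
xml_strings = [ET.tostring(d, encoding="unicode") for d in descriptions]
```

A different center would only need its own `FIELD_MAP`; the fields that remain empty after bootstrapping are the ones to be added manually.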
In parallel with these activities, the MPI will take care of the necessary adaptations of the IMDI standard and of the editing and searching/browsing tools so that efficient work is possible. The adaptations suggested by the INTERA project will be presented to the steering and advisory boards of IMDI for approval.
Finally, the MPI will integrate the metadata descriptions, create a meaningful browsable hierarchy and deliver the searchable and browsable domain as a distributed repository, i.e. each data center will house the metadata descriptions which belong to "their" resources. This scheme will ensure that the created domain remains dynamic: the centers can improve descriptions and add new resources, and other centers can easily join the enterprise. The necessary links to the language resources themselves will also be added where these are openly accessible. A setup will be described that allows portals to be created very easily, so that institutes other than ELDA can decide to build a portal as well. All procedures will be documented so that it is easy for other centers to join.
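The idea of one browsable hierarchy over descriptions that remain housed at the individual centers can be sketched as a small tree structure. This is only an in-memory illustration under stated assumptions: the node layout, the center identifiers and the sample resources are hypothetical, and a real setup would fetch each description from the center that houses it rather than hold everything in one process.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hierarchy node; leaves carry the housing center and keywords."""
    name: str
    children: list = field(default_factory=list)
    center: str = ""      # non-empty only for resource descriptions
    keywords: tuple = ()


def walk(node, path=()):
    """Yield (path, node) for every node, depth first."""
    here = path + (node.name,)
    yield here, node
    for child in node.children:
        yield from walk(child, here)


def search(root, keyword):
    """Return the browse paths of all descriptions tagged with keyword."""
    return ["/".join(p) for p, n in walk(root) if keyword in n.keywords]


def by_center(root):
    """Group description names by the center that houses them."""
    grouped = {}
    for _, node in walk(root):
        if node.center:
            grouped.setdefault(node.center, []).append(node.name)
    return grouped


# Invented hierarchy with hypothetical center identifiers.
root = Node("Integrated domain", children=[
    Node("Corpora", children=[
        Node("Example Spoken Corpus", center="center-a",
             keywords=("dutch", "spoken")),
    ]),
    Node("Lexica", children=[
        Node("Example Lexicon", center="center-b", keywords=("dutch",)),
    ]),
])

hits = search(root, "dutch")
housing = by_center(root)
```

In such a layout a new center joins by adding its own leaves to the hierarchy, and a portal is simply another consumer of the same tree.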