ECHO Project WP2: Infrastructure and Technology

Peter Wittenburg, December 27, 2002

This document summarizes what has to be done in Workpackage 2 of the ECHO project. It first gives an overview of technology in ECHO, then repeats what the technical annex states as tasks for WP2, and finally gives some interpretations.

1. Technology in ECHO

Technology issues are one of the four main pillars of ECHO. Technology is dealt with at various layers: (1) It is a natural part of the AGORA discussions, in particular between specialists from the humanities disciplines and technologists; (2) It is part of the content provision work insofar as the content providers use tools and have to integrate their resources into a browsable and searchable domain; (3) It is the subject of the Infrastructure and Tools work package. The following diagram describes the interaction between these ECHO layers.

[Figure: Technology in ECHO]

The content provision teams will use the already existing tools, but will also receive new versions created within the ECHO framework. They will discuss the usage of the tools, their errors and useful extensions with the developers. They will use the AGORA to discuss the requirements and visions of their disciplines. The technologists will interact amongst themselves to meet the goals and will interact with the content providers about the emerging tools and how to use them. They will also use the AGORA to present the state of the art in technology and their visions of how technology will develop. Further, they will listen to the requirements from the disciplines to derive roadmaps for future developments.

Workpackage 2 therefore has three topics that can be identified separately, although they have to be brought together:

A searchable and browsable metadata infrastructure has to be established that functions as an integrating umbrella for the various contributions from the content provision teams where possible.
Based on existing technology, one or more annotation tools have to be created that allow users to work on texts, images, audio and video signals in collaborative environments.
A prototypical database containing ethnological descriptions of many societies and a prototypical database containing ethnological objects have to be created.

The details are described below.

2. Technical Annex

Objectives

1. To build a prototypical browsable and searchable knowledge base that can be easily used online by researchers and the interested public. It will be based on current metadata standards such as DC and IMDI and cover language resources from 12 European institutions and the resources gathered in WP3.

2. The realization of a hypermedia form to gather information about non-European cultural heritage.

3. To develop a multimedia annotation tool which allows people to work collaboratively on a multimedia resource and to add comments to it, even when they are at different locations.

The result must be an integrated demonstrator which will be based on existing Java-based solutions of the partners and which uses open standards such as XML-based interchange formats. Both infrastructure and technology have to be integrated and have to demonstrate the potential of a Common Technological Framework covering several disciplines.

Description of Work

First, the requirements and the existing solutions of the content provision tasks in WP3 will be determined. In parallel, specifications will be drawn up for the web-based multimedia annotation and commenting tool. Also in parallel, the specifications of the hypermedia form will be worked out.

Second, tool adaptation and development has to be carried out. The existing metadata tools have to be adapted with high priority to make them available to the teams to enter metadata. In parallel, the hypermedia form for non-European cultural heritage will be developed. Its usage in collaborative scenarios is planned to be realized at the end of the development phase (T15).

Third, the participating institutions will apply the available tools and create metadata descriptions and annotations.

Fourth, the two domains, infrastructure and annotation/commenting tool, will be integrated such that the tool can be started while browsing within the metadata description domain, once the user has found a useful resource. The final demonstrator will therefore give access to the content developed in WP3 to demonstrate the innovative research capabilities.

Deliverables

D2.1 Specification Report covering specifications for the infrastructure and the collaborative annotation tool T6

D2.2 Specification Report for the technical realization of the model forms for non-European cultural heritage T10

D2.3 Prototype of the hypermedia form for non-European cultural heritage T101

D2.4 A demonstrator covering the infrastructure and the collaborative tool in an integrated way T15

3. Interpretation

First, we will discuss the three objectives separately and then speak about the integration that is mentioned.

3.1 Infrastructure

(main actors: U Lund and MPI Nijmegen)

The work will be based on the IMDI metadata set (see Note 1), which was worked out in the European ISLE project (www.mpi.nl/ISLE). A mapping has been defined between the IMDI set and the DC set, i.e. the DC domain is included as a subset. If data providers in ECHO prefer to deliver DC records, this would be acceptable, although it is not a satisfying solution because of the inherent limitations of DC. The IMDI set will be developed further within the INTERA (Integrated European Language Resource Area) and DOBES (Documentation of Endangered Languages; www.mpi.nl/DOBES) projects. It has to be analyzed in detail what kind of metadata descriptions other ECHO content providers may require or deliver. Additional adaptation or mapping work may be necessary to arrive at one integrated infrastructure. The description of work states that the existing IMDI metadata tools developed within the ISLE/IMDI project have to be extended according to the modifications of the IMDI set and the possible integration requirements. These tools already make it possible to browse and search in a metadata domain, which is the knowledge base mentioned under the objectives.
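To illustrate why delivering only DC records loses information, the following minimal sketch reduces a richer, IMDI-like session description to Dublin Core. Only the fifteen Dublin Core element names are standard. The field names on the rich side (session_title, speaker_age and so on), the mapping table and the example values are hypothetical; they are not taken from the IMDI specification or from the mapping defined in ISLE.

```python
# Hypothetical sketch: reduce a rich, IMDI-like record to Dublin Core.
# The rich field names and the mapping table are illustrative assumptions.

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumed mapping from a rich field name to a DC element.
RICH_TO_DC = {
    "session_title": "title",
    "session_date": "date",
    "recording_location": "coverage",
    "content_genre": "type",
    "content_language": "language",
    "media_format": "format",
    "resource_url": "identifier",
    "collector": "creator",
}


def to_dublin_core(rich_record):
    """Return (dc_record, dropped_fields) for one rich record."""
    dc_record = {}
    dropped = []
    for field, value in rich_record.items():
        element = RICH_TO_DC.get(field)
        if element is None:
            dropped.append(field)  # no DC counterpart: information is lost
            continue
        # DC elements are repeatable, so values are collected in lists.
        dc_record.setdefault(element, []).append(value)
    return dc_record, sorted(dropped)


# Invented example record, for illustration only.
example = {
    "session_title": "Market conversation",
    "session_date": "2002-11-14",
    "recording_location": "Nijmegen",
    "content_genre": "conversation",
    "content_language": "Dutch",
    "media_format": "video/mpeg",
    "speaker_age": "34",
    "speaker_role": "consultant",
}

dc, lost = to_dublin_core(example)
print(dc)
print("fields without DC counterpart:", lost)
```

In this sketch the speaker-related fields have no DC counterpart and are reported as dropped, which is the kind of limitation meant above.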

The description of work further states that the metadata tools have to be adapted with high priority so that teams can enter metadata and enrich the browsable domain. The goals section states that the domain should cover language resources from 12 European institutions and the content work in WP3. Lund and MPI Nijmegen have to establish a list of institutions providing language resources relevant for the ECHO initiative.

A first workshop about extending and adapting the IMDI set took place on 14/15 November 2002. Contacts with the History of Arts specialists have been established to map the MIDAS set to IMDI and to understand how the metadata records can be retrieved from the HIDA-MIDAS database. First contacts have been established with other partners in ECHO as well. An overview of the resources to be delivered revealed that in most cases no explicit schemas are available yet, i.e. most content providers have not yet thought about how to present their resources with metadata.

Given this situation, a first demonstrator of a true ECHO domain with a few selected partners seems to be possible by September 03. The actual integration and training work will be carried out by Lund U (MPI Nijmegen will certainly help when necessary). Of course, this new ECHO domain will be integrated with the existing and emerging IMDI domains. As already mentioned, access from the DC and OLAC domains with reduced metadata is supported.

After September 03 a second phase of the work will start that could integrate even more resources. For this, the different requirements must have become clearer by then.

The proposed time scale for the work at this moment is:

Work out an IMDI-MIDAS mapping scheme	December 02/January 03
Interaction with Florence about MD requirement	December 02/January 03
Select language resource providers	December 02/January 03
MD Tool Adaptation	March 03
MD Training in Lund	March/April 03
1st AGORA Technical Committee Meeting	May 03
1st AGORA Language Community Meeting	April 03 ??
get infra-portal done	July 03
first infra demonstrator	August/September 03
first mm tool demonstrator	August/September 03
discussions with other partners about MD needs	September/October 03
2nd MD workshop	October/November 03
tool adaptation and metadata creation work	October 03-February 04
final infra demo	January/February 04
2nd AGORA Technical Committee Meeting	February 04
2nd AGORA Language Committee Meeting	February 04

3.2 Multimedia Tool Development

(main actors: MPI Berlin, U Bern, MPI Rome, MPI Nijmegen)

Mainly, the functionality of two major existing tools has to be merged within the ECHO framework: U Bern’s DIGILIB and MPI Nijmegen’s ELAN annotation tool. Further, text technology functionality from Berlin has to be integrated where possible. The two programs mentioned above are the basis for what is called the development of a multimedia annotation tool.

What has to be created within the ECHO framework are tools that allow:

to annotate images, audio and video signals and text (see Note 2)
to work collaboratively, i.e. two or more researchers can view and annotate the same raw resource, be it an image, an audio file or a video file.
to work from remote sites, i.e. the (collaborating) researchers can be located anywhere on the web and the raw resource nevertheless has to be available at their desks.
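One way to picture the first requirement is a single annotation record that can point at a time interval (audio, video), at a rectangular region (images), or at both. The sketch below is an assumption for illustration only: the class and field names are hypothetical and do not reflect the data model of ELAN, EAF or DIGILIB, and the URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeInterval:
    """A stretch of time on an audio or video resource, in milliseconds."""
    begin_ms: int
    end_ms: int

    def __post_init__(self):
        if not 0 <= self.begin_ms <= self.end_ms:
            raise ValueError("need 0 <= begin_ms <= end_ms")

    def contains(self, time_ms):
        return self.begin_ms <= time_ms <= self.end_ms


@dataclass(frozen=True)
class Region:
    """A rectangle in relative image coordinates (0.0 to 1.0).

    Relative coordinates stay valid at any zoom level or resolution.
    """
    x: float
    y: float
    width: float
    height: float

    def __post_init__(self):
        inside = (self.x >= 0 and self.y >= 0
                  and self.width > 0 and self.height > 0
                  and self.x + self.width <= 1
                  and self.y + self.height <= 1)
        if not inside:
            raise ValueError("region must lie inside the unit square")


@dataclass(frozen=True)
class Annotation:
    resource_url: str  # the raw resource is addressed by URL
    author: str
    text: str
    interval: Optional[TimeInterval] = None  # set for time-based media
    region: Optional[Region] = None          # set for spatial selections

    def __post_init__(self):
        if self.interval is None and self.region is None:
            raise ValueError("need a time interval, a region, or both")


def annotations_at(annotations, time_ms):
    """Return the annotations whose time interval covers time_ms."""
    return [a for a in annotations
            if a.interval is not None and a.interval.contains(time_ms)]


# Placeholder resources and invented annotation texts.
image_note = Annotation("http://example.org/fresco.tif", "anna",
                        "damaged area", region=Region(0.25, 0.5, 0.25, 0.25))
video_note = Annotation("http://example.org/session.mpg", "ben",
                        "pointing gesture",
                        interval=TimeInterval(1200, 1850),
                        region=Region(0.0, 0.0, 0.5, 0.5))

print(annotations_at([image_note, video_note], 1500))
```

Keeping the two anchors optional but requiring at least one lets the same record serve a pure image annotation, a pure time-based annotation, and a region within a video frame.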

We are faced here with several problems that have to be solved:

We have to see how the major tools can be merged and how text functionality can be integrated. There are three options for integrating functionality: (1) integration by extending each tool separately, (2) integration at a component level or (3) integration by using a calling mechanism based on the same file formats.
DIGILIB works with high-resolution images, can zoom in and allows the user to select an area in 2D to make annotations. ELAN works with texts, sound (WAV, ...) and video (MPEG1, MPEG2, ...), allows the user to operate on time, zoom in and select parts, has a fairly elaborate annotation structure concept and creates a structured XML format called EAF (very similar to what is being discussed in ISO TC37/SC4). So we have to merge the time dimension and the spatial dimension of the two programs, including all the functions the two programs have right now. Further, it has to be checked what kind of XML formats are needed to meet all requirements resulting from that extension.
As mentioned above, text technology components have to be specified in detail by the Berlin team to check ways of integration.
We have to design the tool so that collaboration is possible. The kernel of the Nijmegen solution is ready for such collaboration. The task includes tricky issues such as knowing who else is working on the same resource, informing all collaborators in real time about every modification made to the annotations, and protecting the work of each collaborator to achieve consistency. The collaboration capability also requires that annotation and media files can reside on different machines, which raises further tricky issues. It cannot be assumed that each annotation is directly accepted into the “central” repository, i.e. some resources have to be stored locally.
Appropriate visualizations have to be added, since at this moment DIGILIB does not know a time dimension and ELAN does not have a spatial dimension. The way people want to see the annotation texts may differ, i.e. ways of visualization have to be discussed.
The remote solution is not that critical, except that all selection possibilities have to allow URLs and that the transport of large amounts of data (high-resolution images, videos) has to be sufficiently fast. Currently, MPEG2 movies are out of scope, but this may change. For this purpose streaming solutions have to be added.
Since the content creators have to start right away, they will create formats that will not match the formats finally agreed on for ECHO. Therefore, converters that port the existing data of WP3 into the chosen formats have to be developed, so that the resulting tool can be demonstrated with the created content.
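The collaboration issues listed above (knowing who else works on a resource, protecting each collaborator's work, and keeping annotations locally until they are accepted centrally) can be sketched with a small in-memory model. This is not the Nijmegen kernel; the class, its methods and the ownership rule are assumptions made for illustration, and networking and real-time transport are left out entirely.

```python
class SharedResourceSession:
    """In-memory stand-in for a shared session on one raw resource."""

    def __init__(self, resource_url):
        self.resource_url = resource_url
        self.participants = set()
        self.central = {}  # annotation id -> (author, text), accepted
        self.pending = {}  # author -> {annotation id: text}, local only

    def join(self, user):
        """Register a user and report who else works on the resource."""
        self.participants.add(user)
        return sorted(self.participants - {user})

    def leave(self, user):
        self.participants.discard(user)

    def edit_locally(self, user, annotation_id, text):
        """Store a change locally; only the owner may change an annotation."""
        if user not in self.participants:
            raise PermissionError(f"{user} has not joined the session")
        owner = self.central.get(annotation_id, (user, None))[0]
        if owner != user:
            raise PermissionError(f"{annotation_id} belongs to {owner}")
        self.pending.setdefault(user, {})[annotation_id] = text

    def submit(self, user):
        """Move a user's local changes into the central store.

        Returns, per other participant, the ids of the annotations they
        should refresh. A real system would push these over the network.
        """
        changes = self.pending.pop(user, {})
        for annotation_id, text in changes.items():
            self.central[annotation_id] = (user, text)
        others = sorted(self.participants - {user})
        return {other: sorted(changes) for other in others}


session = SharedResourceSession("http://example.org/session.mpg")
session.join("anna")
others = session.join("ben")  # ben learns that anna is already there
session.edit_locally("anna", "a1", "greeting")
notifications = session.submit("anna")
print(others, notifications, session.central)
```

The ownership check is one simple way to protect each collaborator's work; other policies, such as explicit locks or merge rules, are equally conceivable.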

Further, the requirements of the various disciplines in WP3 have to be understood better. We may have to add some functionality, provided it is not too complicated and does not disrupt the time plan.

The proposed time scale for the work at this moment is:

1st developers meeting and decisions about requirements sp	January 12/13
Development work with ongoing interaction	January - September 03
Content Creators work with what is available	January - September 03
1st MM Tool Demonstrator	August/September 03
2nd round of requirements specification	October 03
2nd developers meeting and decisions about requirements sp	October 03
content creators use the new tool from	October 03
final mm tool demonstrator	January/February 04

3.3 Hypermedia Form (main actor is Paris)

This work has to be discussed in more detail with M. Gaudelier and his co-workers from EHESS Paris.

3.4 Integration

The TA speaks about a necessary integration. What is meant is the following: people work in the metadata domain by browsing and/or searching. They may find a suitable resource and want to work with it, i.e. to start tools. This means that ECHO has to integrate the developed tools at that level. The (multimedia) annotation tools to be developed have to be executable from the metadata domain. The IMDI navigation tool already offers this feature and has mechanisms to start tools in this way, but it has to be ensured that this approach will also work for the new tools where feasible. It also has to be checked whether a strict separation between metadata and resources makes sense for all types of resources in ECHO.
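Starting a tool from the metadata domain can be pictured as a lookup from the media type of a found resource to a command that opens it. The sketch below is hypothetical: the registry, the command names annotation-tool and image-viewer, the --open flag and the record fields are invented for illustration and do not describe the IMDI navigation tool, ELAN or DIGILIB. The command is only built, not executed.

```python
# Hypothetical registry: media type prefix -> command template.
# Tool names and flags are placeholders, not real command-line interfaces.
TOOL_REGISTRY = {
    "video/": ["annotation-tool", "--open", "{url}"],
    "audio/": ["annotation-tool", "--open", "{url}"],
    "image/": ["image-viewer", "--open", "{url}"],
}


def command_for(record):
    """Return the command that opens the record's resource, or None.

    `record` is assumed to be a metadata record with a media type in
    "format" and the resource location in "resource_url".
    """
    media_type = record["format"]
    for prefix, template in TOOL_REGISTRY.items():
        if media_type.startswith(prefix):
            # Only the template is formatted, so braces in the URL are safe.
            return [part.format(url=record["resource_url"])
                    for part in template]
    return None  # no suitable tool registered for this resource type


found = {"format": "image/tiff",
         "resource_url": "http://example.org/fresco.tif"}
print(command_for(found))
```

Returning None for unregistered types mirrors the "where feasible" qualification: not every resource type necessarily has a tool that can be started from the metadata domain.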

Note 1

Metadata here means keyword-like data that describes a whole resource for discovery purposes. It is different from scholarly metadata, which covers rich annotations of raw material and similar data. In this document, scholarly metadata is data contained in the “resources”.

Note 2

Annotations can be seen as primary texts. It is a matter of detailed specification which functionality can be included.

Last update: January 2, 2003 by A. Verbunt

WP2 Task and Mission
WP2-TR 1-2002 Version 1
