Linguistic Applications at the MPI

For the past couple of years, the Max-Planck-Institute has been developing tools that allow linguists, anthropologists, and psychologists to operate flexibly on the corpora of data they have collected. These corpora are at the center of all work based on observational data at the MPI. They increasingly include multi-media data, as researchers become interested in topics such as the alignment of syntactic structure with prosody and intonation, or the alignment of speech and gesture. Currently, the Institute is making a great effort to digitize speech and video signals and store them on powerful media servers. The linguistic tools we eventually want to create can be illustrated with the following diagram:
By browsing through or searching a set of meta-descriptions, the user will easily find the resources (s)he is interested in within a universe of MPI resources and resources from other institutes. Corpora and lexicons are closely linked, so that information can flow in both directions. Flexible viewers/editors help the researcher to analyze, modify, or create resources. These viewers support immediate access to all types of information. Powerful search tools help the researcher to find particular fragments of a resource and may produce output which, for example, might be useful in typological studies. The search output can be processed further; in fact, it can be used as a new resource.

The Institute already has several tools available, which were developed to build up the overall architecture step by step. In the following diagram, the tools are briefly described. It should be mentioned that Shoebox is a program distributed by the SIL.

The blue boxes indicate the tools which have been in operation for some time. The browsing part of BC (Browsable Corpus) is now ready; searching on the meta-information has yet to be added. MED will be extended by the Search tool so that it can be used interactively within the selected environment. FSearch is a tool currently under development; we are using it to test how to search very large resources. If successful, this technology will be integrated into the other tools. EUDICO is the main tool for the future; we intend to make it a general linguistic tool for work with multi-media corpora, in particular for descriptions of multi-modal data. The first version of EUDICO is ready, i.e. users can view multi-media corpora via local area and wide area networks.
The following table gives an overview of the current functionality of the tools. It also indicates what is planned for the coming year.
Some of the tools mentioned above are in development, and the corresponding information in these web pages will be subject to continuous change.

A highly interesting discussion was raised recently about proper annotation formats for linguistic resources. Since the MPI had to break new ground (our resources are multi-media and have to be offered via networks), much time was spent thinking about appropriate formats (see the list above). In March 1999, S. Bird and M. Liberman of the University of Pennsylvania created a very interesting overview page about available tools, and they have written an excellent paper: A Formal Framework for Linguistic Annotation. There is also an interesting debate about annotations of multi-media resources within the MPEG community. The emerging standard MPEG-7 is meant to deal with the structure of such annotations.

For all questions with respect to these pages, please contact Peter Wittenburg of the Max-Planck-Institute.
Last updated: December 27, 2000 15:34