A workbench for visualization and exploitation

The first version of a workbench implementing these principles has been developed in the context of a project called CALIN. This graphical workbench is currently specialized for the French Le Monde corpus developed at TALaNa (University of Paris VII), but it could easily be adapted to other classical tagged corpora or treebanks. The workbench is written in Java and uses the Silfide XML Parser, which supports the XML link and XML path specifications. It lets the user view the reference corpus and access the annotated information (morphology and syntax) simply by clicking on words (see figure 5). Three different modes give access to the annotations linked to the selected word, to the compound containing it, or to the full sentence containing it. Syntactic annotation can be edited in a table or with a syntactic dependency tree (see figure 6).
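The three access modes can be illustrated with a minimal Java sketch. It is not taken from the CALIN workbench: the class AnnotationLookup, the Scope enum, the Annotation type and the select method are all hypothetical names, and annotations are assumed to cover a contiguous range of word positions within a sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the CALIN workbench.
final class AnnotationLookup {

    /** The three access modes: word, compound, or full sentence. */
    enum Scope { WORD, COMPOUND, SENTENCE }

    /** An annotation attached to the word positions first..last (inclusive). */
    static final class Annotation {
        final Scope scope;
        final int first;
        final int last;
        final String label;

        Annotation(Scope scope, int first, int last, String label) {
            this.scope = scope;
            this.first = first;
            this.last = last;
            this.label = label;
        }

        /** True when the clicked word lies inside the annotated span. */
        boolean covers(int wordIndex) {
            return first <= wordIndex && wordIndex <= last;
        }
    }

    /**
     * Returns the labels of the annotations of the requested scope that
     * cover the clicked word, in the order they appear in the input list.
     */
    static List<String> select(List<Annotation> annotations, int wordIndex, Scope mode) {
        List<String> labels = new ArrayList<String>();
        for (Annotation a : annotations) {
            if (a.scope == mode && a.covers(wordIndex)) {
                labels.add(a.label);
            }
        }
        return labels;
    }
}
```

Under these assumptions, clicking a word inside a compound in COMPOUND mode returns the annotations of the whole compound, while WORD mode returns only those attached to the clicked word itself.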

The implementation also provides a conversion tool that generates the XML documents from the existing annotated ASCII files, as presented in table 1. Currently the whole corpus is automatically divided into several XML documents, each of which requires less memory to be loaded by the XML parser than a single complete XML document would. Existing tools permit the conversion of a corpus carrying a single level of annotation into the proposed format. We can also project a particular level of annotation from our XML encoding according to an existing standard XML annotation scheme.
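The division of the corpus into several documents can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the conversion tool chooses document boundaries, so the sketch assumes a fixed maximum number of sentences per document, and the names CorpusSplitter and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the splitting criterion (a fixed number of sentences
// per document) is an assumption, not the conversion tool's documented rule.
final class CorpusSplitter {

    /**
     * Partitions the sentences of a corpus into consecutive groups of at most
     * maxPerDocument sentences; each group would be written out as one XML
     * document, so that the parser never has to load the whole corpus.
     */
    static <T> List<List<T>> split(List<T> sentences, int maxPerDocument) {
        if (maxPerDocument <= 0) {
            throw new IllegalArgumentException("maxPerDocument must be positive");
        }
        List<List<T>> documents = new ArrayList<List<T>>();
        for (int start = 0; start < sentences.size(); start += maxPerDocument) {
            int end = Math.min(start + maxPerDocument, sentences.size());
            // Copy the slice so each document is independent of the input list.
            documents.add(new ArrayList<T>(sentences.subList(start, end)));
        }
        return documents;
    }
}
```

With this scheme, memory use at load time is bounded by the size of the largest group rather than by the size of the corpus; the price is that links between annotations may have to cross document boundaries.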

Patrice Lopez
Thu Apr 13 09:23:20 MET DST 2000