The first version of a workbench implementing these principles has
been developed in the context of a project called CALIN. This
graphical workbench is currently specialized for the French Le
Monde corpus developed at TALaNa (University of Paris VII). Still,
this tool could easily be adapted to other classical tagged corpora or
treebanks. The workbench is written in Java and uses the Silfide XML
Parser, which supports the XML link and XML path specifications. The
workbench allows the user to visualise the reference corpus and to
access annotated information (morphology and syntax) simply by
clicking on words (see figure 5). Three different modes give access
to the annotations linked to the word, to the compound, or to the
full sentence containing the selected word. Syntactic annotation can
be edited in a table or with a syntactic dependency tree (see figure 6).
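
The three access modes can be illustrated with a minimal sketch. It is not the CALIN implementation: it uses the standard JDK DOM and XPath classes rather than the Silfide XML Parser, and the element names w, compound and s, the id attribute, and the class AnnotationScope are all hypothetical names chosen for the illustration. Given the identifier of the selected word, each mode resolves to the element whose annotations would be displayed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: maps a selected word to the element that carries
// the annotations for the chosen access mode.
final class AnnotationScope {
    enum Mode { WORD, COMPOUND, SENTENCE }

    // Parses an XML string into a DOM document (standard JDK parser,
    // assumed here in place of the Silfide XML Parser).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Returns the word itself, its nearest enclosing compound, or its
    // nearest enclosing sentence; null if no such element exists.
    // Assumes wordId contains no quote characters.
    static Element select(Document doc, String wordId, Mode mode) {
        String word = "//w[@id='" + wordId + "']";
        String expr;
        switch (mode) {
            case COMPOUND: expr = word + "/ancestor::compound[1]"; break;
            case SENTENCE: expr = word + "/ancestor::s[1]"; break;
            default:       expr = word;
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Element) xpath.evaluate(expr, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot evaluate " + expr, e);
        }
    }
}
```

In this sketch a word that does not belong to any compound yields null in COMPOUND mode, so a caller could fall back to the word-level annotations in that case.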
The implementation also provides a conversion tool to generate the XML documents from the existing annotated ASCII files, as presented in table 1. Currently the whole corpus is automatically divided into several XML documents, each of which requires less memory to be loaded by the XML parser than a single complete XML document would. Existing tools permit the conversion of a corpus with a single level of annotation into the proposed format. We can also project from our XML encoding a particular level of annotation according to an existing standard XML annotation scheme.
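
The division of the corpus into several documents can be sketched as follows. This is only an illustration under stated assumptions, not the conversion tool itself: we assume that each sentence has already been converted into an XML fragment, and the class CorpusSplitter, the root element corpusPart and its attribute n are hypothetical names. The fragments are grouped into documents holding a bounded number of sentences, so that the parser only ever loads one such document at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups already-converted sentence fragments into
// several small XML documents instead of one large one.
final class CorpusSplitter {

    // Returns one XML document (as a string) per group of at most
    // maxPerDocument sentences, preserving the original sentence order.
    static List<String> split(List<String> sentences, int maxPerDocument) {
        if (maxPerDocument < 1) {
            throw new IllegalArgumentException("maxPerDocument must be >= 1");
        }
        List<String> documents = new ArrayList<>();
        for (int start = 0; start < sentences.size(); start += maxPerDocument) {
            int end = Math.min(start + maxPerDocument, sentences.size());
            // The root element name and numbering attribute are assumptions.
            StringBuilder doc = new StringBuilder();
            doc.append("<corpusPart n=\"").append(documents.size() + 1).append("\">");
            for (String sentence : sentences.subList(start, end)) {
                doc.append(sentence);
            }
            doc.append("</corpusPart>");
            documents.add(doc.toString());
        }
        return documents;
    }
}
```

The bound on the number of sentences per document is the parameter that trades the number of files against the memory needed to load each of them.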