Patrice Lopez
Laurent Romary
DFKI GmbH
Saarbrücken, Germany
lopez@dfki.de
LORIA
Vandœuvre-lès-Nancy, France
romary@loria.fr
This article presents a three-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view of the same document. This principle is first expressed in a general relational model dedicated to the organisation of linguistic resources (LR). This abstract model is then implemented as an application of the XML formalism for encoding large corpora. Exploiting such annotated corpora requires efficient manipulation processes and reversible access. As the third step, we propose a representation based on a set of optimised finite-state automata (FSA) obtained by parsing the XML documents. These proposals have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
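The idea of annotation layers as views on one document, encoded in XML and then compiled into automata for fast access, can be illustrated with a minimal Python sketch. Several details here are assumptions for illustration only and are not taken from the paper: the standoff scheme (character offsets into a single primary text), the element and attribute names (`layer`, `ann`, `start`, `end`, `value`), the sample sentence, and the helper names `load_layers`, `TrieFSA` and `annotations_at`. The automaton is a plain unminimised trie, a simpler stand-in for the optimised FSA described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical encoding: the primary text is stored once and each annotation
# layer points into it by character offsets (standoff), so every layer is a
# separate "view" on the same document. Element names are assumptions.
DOC = """<doc>
  <text>Le Monde publie un article.</text>
  <layer type="lemma">
    <ann start="0" end="2" value="le"/>
    <ann start="3" end="8" value="Monde"/>
    <ann start="9" end="15" value="publier"/>
    <ann start="16" end="18" value="un"/>
    <ann start="19" end="26" value="article"/>
  </layer>
  <layer type="pos">
    <ann start="0" end="2" value="DET"/>
    <ann start="3" end="8" value="PROPN"/>
    <ann start="9" end="15" value="VERB"/>
    <ann start="16" end="18" value="DET"/>
    <ann start="19" end="26" value="NOUN"/>
  </layer>
</doc>"""


def load_layers(xml_string):
    """Parse the XML into the primary text and a dict layer -> [(start, end, value)]."""
    root = ET.fromstring(xml_string)
    text = root.findtext("text")
    layers = {}
    for layer in root.iter("layer"):
        layers[layer.get("type")] = [
            (int(a.get("start")), int(a.get("end")), a.get("value"))
            for a in layer.iter("ann")
        ]
    return text, layers


class TrieFSA:
    """Deterministic acyclic automaton over characters (an unminimised trie).

    Final states carry the spans where the recognised word occurs."""

    def __init__(self):
        self.delta = [{}]                 # delta[state][char] -> next state
        self.outputs = defaultdict(list)  # final state -> [(start, end), ...]

    def add(self, word, span):
        state = 0
        for ch in word:
            nxt = self.delta[state].get(ch)
            if nxt is None:               # create a fresh state
                nxt = len(self.delta)
                self.delta.append({})
                self.delta[state][ch] = nxt
            state = nxt
        self.outputs[state].append(span)

    def lookup(self, word):
        state = 0
        for ch in word:
            state = self.delta[state].get(ch)
            if state is None:             # no transition: word not recognised
                return []
        return list(self.outputs.get(state, []))


def annotations_at(layers, span):
    """Cross-layer access: every layer value attached to exactly this span."""
    return {name: value
            for name, anns in layers.items()
            for start, end, value in anns
            if (start, end) == span}


text, layers = load_layers(DOC)
fsa = TrieFSA()
for start, end, _ in layers["pos"]:
    fsa.add(text[start:end], (start, end))

for span in fsa.lookup("publie"):
    print(span, annotations_at(layers, span))  # (9, 15) {'lemma': 'publier', 'pos': 'VERB'}
```

Keeping the text in one place and having each layer refer to it by offsets means a new kind of annotation adds a layer without touching the others. The automaton then maps a surface form to its positions, and the positions give access to every layer, which is one way to read the "different views on the same document" principle.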