A Framework for Multilevel Linguistic Annotations

Patrice Lopez
DFKI GmbH
Saarbrücken, Germany
lopez@dfki.de

Laurent Romary
LORIA
Vandœuvre-lès-Nancy, France
romary@loria.fr

Abstract:

This article presents a three-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view of the same document. This principle is first expressed as a general relational model dedicated to the organisation of language resources (LR). This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. Exploiting such annotated corpora requires efficient manipulation processes and reversible access. We therefore propose a third-step representation based on a set of optimised finite-state automata (FSA) obtained by parsing the XML documents. These proposals have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.





Patrice Lopez
Thu Apr 13 09:23:20 MET DST 2000