TAGML (Tree Adjoining Grammars Markup Language) is a general standard for
encoding and exchanging resources used with Lexicalised Tree Adjoining
Grammars. A working group in France gathers people (mainly from
TALaNa, ENST, INRIA Rocquencourt and LORIA) who work on this formalism
and try to define common standards for grammars and grammar exchange,
parsers, and tool development. TAGML is an example of the high
level of complexity of the resources to encode. An LTAG grammar is
defined by a morphological lexicon, a syntactic lexicon and a set of
schemas (non-lexicalized elementary tree patterns). The schemas are
grouped into tree families in order to capture generalities of
lexicalizations given by the syntactic lexicon. Improvement of LTAG
parsers and tools depends on how this huge amount of data can be
factorized in order to share computation.
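As a rough illustration of this factorization, the following Python sketch stores each schema once inside a tree family and lets syntactic lexicon entries refer to families by name, so that lemmas selecting the same family share the same schema objects. The class names, field names, bracketed tree strings, and toy entries are all assumptions made for illustration; none of them is part of TAGML.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """A non-lexicalized elementary tree pattern (hypothetical encoding)."""
    name: str
    pattern: str  # toy bracketed tree; '<>' marks the anchor slot


@dataclass
class Grammar:
    """Toy container for the three LTAG resources described above."""
    # inflected form -> (lemma, morphological features)
    morph_lexicon: dict = field(default_factory=dict)
    # lemma -> names of the tree families it selects
    syn_lexicon: dict = field(default_factory=dict)
    # family name -> schemas, stored once and shared by all lemmas
    families: dict = field(default_factory=dict)

    def schemas_for(self, form):
        """Return the schemas selected by an inflected form."""
        lemma, _features = self.morph_lexicon[form]
        return [s for fam in self.syn_lexicon[lemma] for s in self.families[fam]]


# Illustrative data only: two verbs selecting the same family.
transitive = [
    Schema("declarative", "S(NP0, VP(V<>, NP1))"),
    Schema("object-relative", "NP(NP1*, S(NP0, VP(V<>)))"),
]
grammar = Grammar(
    morph_lexicon={"eats": ("eat", {"mode": "ind"}),
                   "sees": ("see", {"mode": "ind"})},
    syn_lexicon={"eat": ["transitive"], "see": ["transitive"]},
    families={"transitive": transitive},
)
eat_schemas = grammar.schemas_for("eats")
see_schemas = grammar.schemas_for("sees")
```

In this sketch, adding a new transitive verb costs one entry in each lexicon and no new schema, which is the kind of sharing the paragraph above refers to.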
The previous RROM model for the morphological lexicon is extended to the other resources needed at the syntactic level. An inflection (a lemma and a set of morphological features, including verb mode for example) corresponds to a set of schemas. This lexicalization relation can include the instantiation of co-anchors (a lemma and a set of possibly underspecified morphological features) and of some additional syntactic features in the schema. Each syntactic instantiation gives a complete elementary tree. If we assume that the linguistic principles given in [Abeillé et al.1990] and [Candito1999] are fulfilled by the grammar, each syntactic instantiation corresponds to only one semantic instantiation (semantic consistency principle). This model allows an incremental view of the lexicon resources.
Figure: Simplified RROM for LTAG resources.
Figure 2 presents the corresponding RROM. To simplify, tree families and the structuring of features are not included in this example.
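The lexicalization relation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the RROM itself: the record layout, the `instantiate` helper, the schema name `n0Vdn1`, and the feature names are hypothetical, and tree families are left out, as in the simplified model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inflection:
    """A lemma plus morphological features, possibly underspecified."""
    lemma: str
    features: tuple = ()  # pairs such as (("mode", "ind"),)


@dataclass(frozen=True)
class Lexicalization:
    """Links one inflection to one schema (hypothetical record layout)."""
    anchor: Inflection
    schema: str               # name of a non-lexicalized tree pattern
    co_anchors: tuple = ()    # extra Inflection objects fixed by the entry
    syn_features: tuple = ()  # additional syntactic features set in the schema


def instantiate(lex):
    """Build a plain-dict stand-in for the resulting elementary tree."""
    return {
        "schema": lex.schema,
        "anchor": lex.anchor.lemma,
        "co_anchors": [c.lemma for c in lex.co_anchors],
        # anchor features first, then the syntactic features of the entry
        "features": {**dict(lex.anchor.features), **dict(lex.syn_features)},
    }


# Illustrative idiom entry: the verb anchors the tree, and the determiner
# and noun are co-anchors with underspecified or partial features.
entry = Lexicalization(
    anchor=Inflection("kick", (("mode", "ind"),)),
    schema="n0Vdn1",
    co_anchors=(Inflection("the"), Inflection("bucket", (("num", "sg"),))),
    syn_features=(("passive", "-"),),
)
tree = instantiate(entry)
```

Each `Lexicalization` record yields exactly one instantiated tree here, which mirrors the assumption that one syntactic instantiation corresponds to one complete elementary tree.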
Figure: RROM for multilevel annotated textual corpus.