Clausal coordinate ellipsis in German: The TIGER treebank as a source of evidence
Syntactic parsers and generators need high-quality
grammars of coordination and coordinate
ellipsis—structures that occur very
frequently but are much less well understood
theoretically than many other domains
of grammar. Modern grammars of
coordinate ellipsis are based nearly exclusively
on linguistic judgments (intuitions).
The extent to which grammar rules based
on this type of empirical evidence generate
all and only the structures in text corpora
is unknown. As part of a project on the development
of a grammar and a generator
for coordinate ellipsis in German, we undertook
an extensive exploration of the
TIGER treebank—a syntactically annotated
corpus of about 50,000 newspaper sentences.
We report (1) frequency data for the
various patterns of coordinate ellipsis, and
(2) several rarely (but regularly) occurring
‘fringe deviations’ from the intuition-based
rules for several ellipsis types. This information
can help improve parser and generator
performance.