Automatic annotation of bibliographical references for descriptive language materials
Automatic annotation of bibliographical references for descriptive language materials. In P. Forner, J. Kekäläinen, M. Lalmas, & M. De Rijke (Eds.
), Multilingual and multimodal information access evaluation. Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011; Proceedings
(pp. 62-73). Berlin: Springer.
The present paper considers the problem of annotating bibliographical references with labels/classes, given training data of references already annotated with labels. The problem is an instance of document categorization where the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels calls for special carefulness when choosing a Machine Learning approach. The present paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.