Enhanced suffix arrays as language models: Virtual k-testable languages
Stehouwer, H., & van Zaanen, M. (2010). Enhanced suffix arrays as language models: Virtual k-testable languages. In J. M. Sempere, & P. García (Eds.), Grammatical inference: Theoretical results and applications. 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13-16, 2010. Proceedings (pp. 305-308). Berlin: Springer.
In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited
size n. This approach, which is used with synchronous back-off, allows
us to distinguish between alternative sequences using large contexts. We
also show that we can build this kind of model with additional information for each symbol, such as part-of-speech tags and dependency
information.
The approach can also be viewed as a collection of virtual k-testable
automata. Once built, we can directly access the results of any k-testable
automaton generated from the input training data. Synchronous back-off
automatically identifies the k-testable automaton with the largest
feasible k. We have used this approach in several classification tasks.
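
The idea summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses a plain (not enhanced) suffix array built by naive sorting, and it assumes one reading of synchronous back-off, namely that all alternatives are scored at the same context size, and the context is shortened for all of them at once until at least one alternative has been observed. The names `build_suffix_array`, `count_ngram`, and `choose_alternative` are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction for illustration; real systems use faster algorithms.
    """
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def count_ngram(tokens: Sequence[str], sa: List[int], ngram: Sequence[str]) -> int:
    """Count occurrences of `ngram` (any length) with two binary searches."""
    target = tuple(ngram)
    k = len(target)

    def prefix(pos: int) -> Tuple[str, ...]:
        return tuple(tokens[pos:pos + k])

    # Lower bound: first suffix whose k-token prefix is >= target.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(sa[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-token prefix is > target.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(sa[mid]) <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def choose_alternative(
    tokens: Sequence[str],
    sa: List[int],
    left_context: Sequence[str],
    candidates: Sequence[str],
) -> Tuple[Optional[str], int]:
    """Assumed synchronous back-off: shrink the context for all candidates together.

    Returns the best candidate and the n-gram size at which it was decided,
    or (None, 0) if no candidate occurs in the training data at all.
    """
    context = list(left_context)
    for k in range(len(context), -1, -1):
        used = context[len(context) - k:]
        counts = [count_ngram(tokens, sa, used + [c]) for c in candidates]
        if max(counts) > 0:
            best = candidates[counts.index(max(counts))]
            return best, k + 1
    return None, 0


corpus = "the cat sat on the mat the cat ate the fish".split()
sa = build_suffix_array(corpus)
```

With this toy corpus, `choose_alternative(corpus, sa, ["sat", "the"], ["cat", "fish"])` finds no trigram evidence for either candidate and therefore backs off to bigrams for both at once. Because the suffix array indexes every suffix of the training data, no maximum n has to be fixed when the model is built.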
Publication type
Proceedings paper