Using suffix arrays as language models: Scaling the n-gram
Stehouwer, H., & van Zaanen, M.
Using suffix arrays as language models: Scaling the n-gram. In Proceedings of the 22nd Benelux Conference on Artificial Intelligence (BNAIC 2010), October 25-26, 2010
In this article, we propose the use of suffix arrays to implement n-gram language models with practically
unlimited size n. These unbounded n-grams are called ∞-grams. This approach allows us to use large
contexts efficiently to distinguish between different alternative sequences while applying synchronous back-off.
From a practical point of view, the approach has been applied within the context of spelling confusibles, verb and noun agreement and prenominal adjective ordering. These initial experiments show
promising results, and we relate the performance to the size of the n-grams used for disambiguation.
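The abstract combines two ideas: counting n-grams of arbitrary length with a suffix array, and choosing between alternative sequences (such as the confusibles "than"/"then"). Both can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and the suffix array is built naively over a list of word tokens. Two further choices are assumptions made only for illustration. First, the scoring rule sums the counts of every n-gram that covers the alternative's position. Second, synchronous back-off is read as follows: all alternatives are compared at the same n, starting from the full sentence, and they back off to n - 1 together whenever none of them is attested.

```python
# Minimal sketch (not the authors' code): suffix-array n-gram counts with no
# fixed maximum n, plus a hypothetical synchronous back-off chooser.


def build_suffix_array(tokens):
    """Sort suffix start positions by the token sequence each suffix spells.

    Naive construction (quadratic memory for the sort keys): fine for a toy
    corpus, not for the corpus sizes a real language model would use.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngram(tokens, sa, pattern):
    """Occurrences of `pattern` (any length) in `tokens`.

    All suffixes starting with `pattern` form one contiguous block of the
    suffix array; its width is found with two binary searches.
    """
    pattern = list(pattern)
    m = len(pattern)

    def bound(inclusive):
        # inclusive=False: first suffix whose m-token prefix is >= pattern.
        # inclusive=True:  first suffix whose m-token prefix is > pattern.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = tokens[sa[mid]:sa[mid] + m]
            if prefix < pattern or (inclusive and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return bound(True) - bound(False)


def score(tokens, sa, sentence, focus, n):
    """Hypothetical score: summed counts of all n-grams covering `focus`."""
    first = max(0, focus - n + 1)
    last = min(focus, len(sentence) - n)
    return sum(count_ngram(tokens, sa, sentence[s:s + n])
               for s in range(first, last + 1))


def choose(tokens, sa, left, alternatives, right):
    """Pick an alternative for the slot between `left` and `right`.

    Assumed synchronous back-off: every alternative is scored at the same n,
    starting from the whole sentence; if none is attested at that n, all
    alternatives back off to n - 1 together.
    """
    focus = len(left)
    for n in range(len(left) + 1 + len(right), 0, -1):
        scores = {alt: score(tokens, sa, left + [alt] + right, focus, n)
                  for alt in alternatives}
        if any(scores.values()):
            return max(alternatives, key=scores.get), n
    return None, 0


corpus = ("she is taller than her brother . he is taller than me . "
          "then we left .").split()
sa = build_suffix_array(corpus)

print(count_ngram(corpus, sa, ["is", "taller", "than"]))  # 2
print(choose(corpus, sa, ["is", "taller"], ["than", "then"], ["her"]))  # ('than', 4)
print(choose(corpus, sa, ["and"], ["than", "then"], ["we", "left"]))  # ('then', 3)
```

Each count is the width of one block in the sorted suffix array. A sentence-length n-gram is therefore answered by the same two binary searches as a unigram, so n does not have to be fixed in advance. The second query also shows back-off at work. No 4-gram containing either alternative is attested, so both alternatives are compared again at n = 3. A realistic system would typically use a linear or near-linear suffix array construction and a much larger corpus.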