The Language Archive -
Antal van den Bosch
Big Language Data
Digitized written language can be scooped up at will from the internet and exploited for science. Even without any explicit linguistic annotation, the language data itself can be used directly for practical purposes such as spelling correction, text completion, and, if parallel text in two languages can be found, machine translation. Zipf's law ensures that when you have more data, results will be better (log-linearly). In fact, many of the best natural language processing systems are based on data only, plus the power of sophisticated stochastic methods. I'll argue that there is a less sophisticated class of methods, based on analogical reasoning, that produces the same impressive results. I'll discuss the linguistic interest of this idea using century-old concepts such as Hermann Paul's Analogiebildung and De Saussure's quatrième proportionnelle.
Antal van den Bosch is professor of example-based language modeling at the Centre for Language Studies of Radboud University Nijmegen. His research focuses on the intersection of computational understanding of language and computational generation of language.
- Where and when:
14:30-16:00, Jun 5, 2012, MPI Nijmegen, room 163