

Antal van den Bosch

Big Language Data

Digitized written language can be scooped up at will from the internet and exploited for science. Even without any explicit linguistic annotation, the language data itself can be used directly for practical purposes such as spelling correction, text completion, and, if parallel text in two languages can be found, machine translation. Zipf's law ensures that when you have more data, results will be better (log-linearly). In fact, many of the best natural language processing systems are based on data only, plus the power of sophisticated stochastic methods. I'll argue that there is a less sophisticated class of methods, based on analogical reasoning, that produces the same impressive results. I'll discuss the linguistic interest of this idea using century-old concepts such as Hermann Paul's Analogiebildung and De Saussure's quatrième proportionnelle.

 

Antal van den Bosch is professor of example-based language modeling at the Centre for Language Studies of the Radboud University Nijmegen. His research focuses on the intersection of computational understanding of language and computational generation of language.

Where and when:
14:30-16:00 Jun 5, 2012
MPI Nijmegen, room 163
Last checked 2013-02-21 by Nanjo Bogdanowicz

Max Planck Institute
for Psycholinguistics


Street address
Wundtlaan 1
6525 XD Nijmegen
The Netherlands


Mailing address
P.O. Box 310
6500 AH Nijmegen
The Netherlands

Phone:   +31-24-3521911
Fax:        +31-24-3521213
