Browsable corpora at MPI
The Max Planck Institute (MPI) offers access to a large amount of linguistic data through the Internet. Here you can read why this data is collected, what kind of data you can access, and how you can take a look at it. A preview is also available to give you an impression of the Browsable Corpus.
Goals
The data available in the corpora is being gathered for several purposes:
- Preservation of old data
- Organization of data for current projects (work in progress)
- Making collected data available for new research
- Integration of data processing tools and data
- Unification of standards for language data description, annotation and storage
Access to the corpora
- Access with your favorite web browser
All IMDI corpora can be accessed by browsing through the metadata with common web browsers (Firefox, Internet Explorer, Safari). Click here to start browsing.
- Access by a special XML browser
Before you can access the data this way, you have to download the IMDI-BCBrowser. With the IMDI-BCBrowser you can browse through the corpus structures, search the metadata, and view linked annotation and media data (limited access). To reach the metadata, follow the browser's installation instructions, run it, and double-click the bookmark "IMDI Corpora (HTTP)" in the upper left panel of the browser's window.
Languages contained in the corpus include, among others, Dutch, English, German, Japanese, French, Trumai, Hindi, and Tamil.

