Linguistic Data Consortium (LDC) Catalog
Last update: 13-Nov-2000
Introduction
The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards. The LDC's Catalog contains 168 corpora of language data.
References
The Linguistic Data Consortium Catalog
LDC List of Catalog Fields (not used for this overview)
Catalog Structure
The catalog is an access structure layered over the corpora, and its metadata describes the corpora it lists. Corpora are first divided into major categories according to the type of data they contain, and are then further broken down into minor categories based on the source of the data.
(See http://morph.ldc.upenn.edu/Catalog/by_type.html)
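The two-level organization described above can be sketched as a nested grouping: major category (data type) first, then minor category (data source). This is a minimal illustration, not the LDC's own implementation; the record layout, the keys `type` and `source`, the function `build_catalog`, and the sample records are all hypothetical placeholders rather than real catalog entries.

```python
from collections import defaultdict

# Hypothetical records for illustration only; these are not real LDC catalog entries.
corpora = [
    {"name": "Example Speech Corpus A", "type": "Speech", "source": "broadcast"},
    {"name": "Example Speech Corpus B", "type": "Speech", "source": "conversation"},
    {"name": "Example Text Corpus C", "type": "Text", "source": "broadcast"},
]


def build_catalog(records):
    """Group corpus names by major category (data type), then by minor category (data source)."""
    catalog = defaultdict(lambda: defaultdict(list))
    for record in records:
        catalog[record["type"]][record["source"]].append(record["name"])
    # Convert the nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {major: dict(minor) for major, minor in catalog.items()}


if __name__ == "__main__":
    for major, minors in build_catalog(corpora).items():
        print(major)
        for minor, names in minors.items():
            print(f"  {minor}: {', '.join(names)}")
```

Grouping on the type first mirrors the order in which the catalog is browsed: a reader picks Speech, Text or Lexicon, then narrows by source.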
Metadata Overview
Field | Description |
Catalog Number | Contains the unique LDC catalog number |
Name | Contains the name of the corpus |
ISBN | Contains the ISBN of the corpus |
Data Sources | Contains the source(s) of the corpus data (broadcast, conversation, microphone, etc.) |
Research Project | Contains the projects in which the corpus was used |
Recommended Application | Contains the applications for which the corpus is recommended |
Language | Contains the language used in the corpus |
Membership Year | Contains the year in which the corpus was released |
Corpus Type | Defines the type of the corpus (Lexicon, Speech or Text) |
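One way to model a single catalog entry with these fields is a typed record. The sketch below is an assumption-laden illustration: the Python field names, their types (for example, lists for the plural fields and an integer year), and the example values are hypothetical and do not come from the LDC. Only the three corpus types (Lexicon, Speech, Text) are taken from the field list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CorpusType(Enum):
    """The three corpus types named in the Corpus Type field."""
    LEXICON = "Lexicon"
    SPEECH = "Speech"
    TEXT = "Text"


@dataclass
class CatalogRecord:
    """One catalog entry; field names and Python types are illustrative assumptions."""
    catalog_number: str      # unique LDC catalog number
    name: str                # name of the corpus
    isbn: str                # ISBN of the corpus
    language: str            # language used in the corpus
    membership_year: int     # year in which the corpus was released
    corpus_type: CorpusType  # Lexicon, Speech or Text
    data_sources: list[str] = field(default_factory=list)              # e.g. broadcast, microphone
    research_projects: list[str] = field(default_factory=list)         # projects that used the corpus
    recommended_applications: list[str] = field(default_factory=list)  # recommended uses


# A hypothetical record; every value is a placeholder, not a real catalog entry.
example = CatalogRecord(
    catalog_number="LDC-EXAMPLE-0001",
    name="Example Corpus",
    isbn="placeholder-isbn",
    language="English",
    membership_year=2000,
    corpus_type=CorpusType("Speech"),
    data_sources=["microphone"],
)
```

Using an enum for the corpus type means a value outside Lexicon, Speech or Text, such as `CorpusType("Audio")`, raises `ValueError` instead of silently entering the catalog.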