Related Projects Metadata Overview

Overviews
Metadata elements from the following overviews are used in the global overview:

Browsable Corpus (BC)

Corpus Encoding Standard (CES)

Codes for the Human Analysis of Transcripts (CHAT)

Dublin Core (DC)

European Language Resources Association Catalog (ELRA)

European Science Foundation Second Language Databank (ESFSLD)

Gesture Databank (GDB)

International Corpus of English (ICE)

Linguistic Data Consortium Catalog (LDC)

Multimedia Content Description Interface (MPEG-7)

Spoken Dutch Corpus (CGN - Corpus Gesproken Nederlands)

The following important initiatives and projects are not included in the overviews, for various reasons:

Archive of Indigenous Languages of Latin America (AILLA)

The AILLA is a project to develop a web-based archive of linguistic materials of the indigenous languages of Latin America.

Some info about AILLA metadata can be found here.

Alaska Native Language Center (ANLC)

The ANLC is recognized as the major center in the United States for the study of Eskimo and Northern Athabaskan languages. It is the center for research and documentation of the twenty Native languages of Alaska.

There is no ANLC overview because ANLC is in the process of aligning its metadata with Dublin Core.

British National Corpus (BNC)

The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.

There is no BNC overview because the BNC text encoding is TEI-conformant and we rely on the Corpus Encoding Standard (CES) claim that the elements relevant to corpus encoding have been selected from TEI.

Linguistic Data Archiving Project (LACITO)

The goals of the LACITO linguistic data archiving project are the conservation and the distribution of speech data.

The maintainers of LACITO are currently looking for a metadata standard.

Michigan Corpus of Academic Spoken English (MICASE)

MICASE is the online, searchable part of a collection of transcripts of academic speech events recorded at the University of Michigan.

The maintainers give a clear overview of their metadata descriptions in the form of speech event and speaker attributes.

Text Encoding Initiative (TEI)

The Text Encoding Initiative (TEI) is an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research, and to satisfy a broad range of uses by the language industries more generally.

TEI is not included in the overviews because we rely on the fact that the Corpus Encoding Standard selected the relevant elements from TEI for corpus encoding.

University of Helsinki Language Corpus Server (UHLCS)

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual corpus server located at the Department of General Linguistics, the University of Helsinki. The server contains computer corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types.

The UHLCS is currently structured as a language hierarchy. The definition of UHLCS metadata is in progress.

These and other metadata projects that have been examined are listed under Language Engineering Resources.

Overview format

Describes the format used for the overviews

Global overview

Gives a global overview of all metadata elements

Language Engineering Resources

Lists all relevant websites