last update: 13-Sep-2000

Glossary of Terms and Abbreviations


[A] 

 

Aboriginal Studies Electronic Data Archive (ASEDA)
Corpora covering about 300 Australian Indigenous languages, including dictionaries, grammars, and teaching materials.
ASEDA
See Aboriginal Studies Electronic Data Archive.

[B] 

BC
See Browsable Corpus.
Browsable Corpus
A concept intended to help researchers navigate the universe of corpora at the MPI and, eventually, a global universe of corpora. XML is used for the meta-descriptions.

[C] 

CES
See Corpus Encoding Standard.
CHILDES
See Child Language Data Exchange System.
Child Language Data Exchange System (CHILDES)
A set of tools for studying conversational interactions. These tools include a database of transcripts, programs for computer analysis of transcripts, methods for linguistic coding, and systems for linking transcripts to digitized audio and video.
Corpus
A collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.
Corpus Encoding Standard (CES)
A standard designed for language engineering research and applications. CES is an application of SGML. It uses relevant guidelines from TEI and extends the TEI for corpus encoding.

[D]

DC
See Dublin Core.
Dublin Core (DC)
A set of meta-data elements used to describe electronic resources. The Dublin Core data model is described with RDF.
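
As an illustration of the Dublin Core entry above, the following is a minimal sketch of how a few Dublin Core elements might be attached to a resource in RDF/XML. The helper name `dublin_core_record`, the resource URI, and the field values are assumptions made for this example only; the two namespace URIs are the published RDF and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize these namespaces with readable prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_record(about, fields):
    """Build an rdf:RDF element with one rdf:Description carrying dc:* children.

    Hypothetical helper for illustration; not part of any library.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record(
    "http://example.org/corpus/demo",  # hypothetical resource URI
    {"title": "Demo corpus", "language": "en", "type": "Sound"},
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each key in `fields` becomes one Dublin Core element, so the same helper can describe a text, an audio recording, or a whole collection by changing only the values.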

[E]

EAGLES
See Expert Advisory Group on Language Engineering Standards.
ELRA
See European Language Resources Association.
European Language Resources Association (ELRA)
An agency whose goal is to provide a centralized organization for the validation, management, and distribution of speech, text, and terminology resources and tools, and to promote their use within the European telematics R&TD community.
Expert Advisory Group on Language Engineering Standards (EAGLES)
An initiative to support the development of standards for language resources, for the manipulation of language resources, and for the evaluation of resources, tools, and products.
Extensible Markup Language (XML)
The universal format for structured documents and data on the Web.

[I]

ICE
See International Corpus of English.
International Corpus of English (ICE)
Corpora of spoken and written material for comparative studies of varieties of English throughout the world.
International Standards for Language Engineering (ISLE)
A project to develop a meta-description standard for language resources.
ISLE
See International Standards for Language Engineering.

[L]

Language Resource
A collection of data that primarily documents human communicative acts through some form of recording and/or description, either directly, as in corpora, or at higher levels of abstraction, as in lexicons and ontologies. The primary data can be text, video recordings, and/or audio tracks.
Language Resource Community
Researchers and developers working with language resources. These can be either researchers using such resources for theorizing or testing new hypotheses, or technology developers who use such resources to train their statistical recognition machinery.
LDC
See Linguistic Data Consortium.
Linguistic Annotation
Linguistic Data Consortium (LDC)
The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes.

[M]

MATE
See Multilevel Annotation, Tools Engineering.
MCF
See Meta Content Framework.
Meta Content Framework (MCF)
A specification for a data model for describing information organization structures (meta data) for collections of networked information. Instances are described using XML.
Meta-Data
Data about data.
Meta-Description
A structured set of meta-data that describes a certain language resource, or a group of such resources, in a way that is meaningful to the user community.
Meta-Universe
The universe of meta-descriptions which cover all the resources for a particular user community. The meta-universe of descriptions should be open to everyone, although access to the resources themselves can be restricted.
Multilevel Annotation, Tools Engineering (MATE)
An annotation standard for spoken dialogue corpora. It will treat spoken dialogue corpora at multiple levels, focusing on prosody, (morpho-)syntax, co-reference, dialogue acts, and communicative difficulties, as well as inter-level interaction.
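
The Meta-Description and Meta-Universe entries in this section can be made concrete with a small sketch. Everything named here (the `MetaDescription` class, its fields, and the `search` and `can_open` helpers) is hypothetical and chosen for illustration; it is not the ISLE or Browsable Corpus schema. It shows only the idea that descriptions are open to everyone while access to the resources themselves can be restricted.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDescription:
    """Hypothetical meta-description: structured meta-data for one resource."""
    resource_id: str
    title: str
    languages: list = field(default_factory=list)
    media: str = "text"       # e.g. "text", "audio", "video"
    restricted: bool = False  # applies to the resource, not to this description


def search(meta_universe, language):
    """Return every description that lists the language; descriptions are open."""
    return [d for d in meta_universe if language in d.languages]


def can_open(description, user_is_authorized):
    """The resource may be restricted even though its description is visible."""
    return user_is_authorized or not description.restricted


# A tiny meta-universe with made-up entries.
meta_universe = [
    MetaDescription("r1", "Demo dialogue recordings", ["nl"], "audio", restricted=True),
    MetaDescription("r2", "Demo word list", ["nl", "en"], "text"),
]

hits = search(meta_universe, "nl")
for d in hits:
    print(d.resource_id, d.title, "open" if can_open(d, False) else "restricted")
```

Keeping the access flag on the description, rather than hiding restricted entries from the search, is the design choice that matches the Meta-Universe entry: every resource can be discovered, even when it cannot be opened.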

[P]

PICS
See Platform for Internet Content Selection.
Platform for Internet Content Selection (PICS)
A cross-industry working group whose goal is to facilitate the development of technologies to give users of interactive media, such as the Internet, control over the kinds of material to which they and their children have access.

[R]

RDF
See Resource Description Framework.
Resource Description Framework (RDF)
A framework that integrates a variety of web-based metadata activities, including sitemaps, content ratings, stream channel definitions, search engine data collection (web crawling), digital library collections, and distributed authoring, using XML as an interchange syntax.

[T]

TEI
See Text Encoding Initiative.
Text Encoding Initiative (TEI)
A standard for the preparation and interchange of electronic texts. The TEI guidelines are described with SGML.

[U]

UHLCS
See University of Helsinki Language Corpus Server.
University of Helsinki Language Corpus Server (UHLCS)
A multilingual corpus server containing corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types.

[W]

Web Collections
A meta-data syntax that fits easily within the framework of the World Wide Web. Web Collections are an application of XML.

[X]

XML
See Extensible Markup Language.