Codes for the Human Analysis of Transcripts (CHAT)


Last update: 30-Aug-2000

 

Introduction

CHAT is the format used for the CHILDES (Child Language Data Exchange System) project.

 

References

MacWhinney, Brian. 1991. The CHILDES Project: Tools for Analyzing Talk. Hillsdale, NJ: Lawrence Erlbaum Associates.

 

Corpus Structure

 

Corpus Information

The corpus header is called the 'Documentation File' in CHAT. It is stored as a text file (00readme.doc) in the corpus directory and contains human-readable descriptions of the corpus. Some metadata elements extracted from these descriptions are listed under 'Corpus Header' in the Metadata Overview.

 

Document Information

Each document equals one file in the corpus directory.
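
As a rough illustration of this layout, the sketch below separates the documentation file from the document files in a corpus directory. The function name split_corpus is hypothetical, and skipping sub-directories is an assumption; only the file name 00readme.doc and the one-file-per-document rule come from the description above.

```python
from pathlib import Path

README_NAME = "00readme.doc"  # documentation file name given in the text


def split_corpus(corpus_dir):
    """Return (documentation_file_or_None, sorted list of document files)."""
    root = Path(corpus_dir)
    readme = None
    documents = []
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue  # assumption: sub-directories are not documents
        if entry.name.lower() == README_NAME:
            readme = entry  # the corpus header ('Documentation File')
        else:
            documents.append(entry)  # each remaining file is one document
    return readme, documents
```

Calling split_corpus on a corpus directory yields the documentation file (or None if it is missing) and one path per document.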

 

Header Information

There are three types of document headers in CHAT: obligatory headers, constant headers and changeable headers.
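
As a rough illustration, the header names listed in the Metadata Overview can be grouped by type (obligatory, constant, changeable) in a lookup table. This is only a sketch: the names HEADER_TYPES and header_type are hypothetical, and the table mirrors the overview on this page rather than any CLAN data structure.

```python
# Header names copied from the Metadata Overview, grouped by header type.
HEADER_TYPES = {
    "obligatory": ["Participants"],
    "constant": [
        "Age", "Birth", "Coding", "Coder", "Education",
        "Filename", "ID", "SES", "Sex", "Warning",
    ],
    "changeable": [
        "Activities", "Bgd", "Comment", "Date", "Language", "Location",
        "New Episode", "Room Layout", "Situation", "Stim",
        "Tape Location", "Time Duration", "Time Start",
    ],
}


def header_type(name):
    """Return the header type for a header name, or None if it is not listed."""
    wanted = name.strip().lower()
    for kind, names in HEADER_TYPES.items():
        if wanted in (n.lower() for n in names):
            return kind
    return None
```

For example, header_type("Participants") gives "obligatory", while an unlisted name gives None.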

 

Metadata Overview

Corpus Header A basic set of facts about the corpus
  Acknowledgements A statement that asks the user to cite some particular reference when using the corpus
    Reference Name The name of the person cited
    Reference Year The year of the cited reference
  Restrictions A description of the restrictions on the use of the corpus data
  Warnings A description of the limitations on the use of the corpus data
  Pseudonyms ?
  History Gives detailed information about the history of the project
    Funding Description of how the funding was obtained
    Goals Description of the goals of the project
    Data collection Description of how the data was collected
    Sampling procedure Description of the sampling procedure
    Transcription procedure Description of the transcription procedure
    Transcription ignored Description of what was ignored in the transcription
    Transcribers' training Description of the transcribers' training
    Reliability Description of reliability of the data
    Coding Description of the coding and the codes used ????
    Computerized Description of how the material was computerized ????
  Codes Description of project-specific codes
  Biographical data Gives biographical information about the informant 
    Informant's Age ?
    Informant's Gender ?
    Informant's Siblings ?
    Informant's Schooling ?
    Informant's Social Class ?
    Informant's Occupation ?
    Informant's Previous Residences ?
    Informant's Religion ?
    Informant's Interests ?
    Informant's Friends ?
  Table of contents Gives a brief index to the contents of the corpora ?
  Situational description Gives general situational descriptions ? 
Obligatory header This header must be included for use with CLAN programs
  Participants Lists all the 'actors' within the file
    Speaker's ID (1..N) Each participant is represented by a unique three-letter ID, usually the first three letters of the speaker's name
    Speaker's Name The speaker's first name
    Speaker's Role The speaker's relationship to the children under study. Standard roles: Target_Child, Mother, Father, Brother, Sister, Teacher, Playmate and Investigator
Constant header Contains information that is constant throughout the file. The information is unlikely to change during the course of the recording session
  Age Specifies the speaker's age in years, months and days.
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Speaker's Age Age in years, months and days
  Birth Gives the date of birth of the speaker
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Speaker's Date of birth Date of birth
  Coding Indicates the date of the current version of CHAT. Used for updating files and new coding conventions
  Coder Identifies the people who transcribed and coded the file
  Education Specifies the speaker's highest grade in school
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Speaker's Education Identifies the speaker's education or years of college
  Filename Gives the name of the computer file
  ID Used by the program "STATFREQ" to assign a unique code to each child
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Unique code A unique code to identify the speaker throughout a corpus
  SES Describes the socioeconomic status of the child's family
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Speaker's SES The speaker's socioeconomic status. The following adjectives are recommended: welfare, lower, working, lower-middle, middle, upper-middle, upper
  Sex Indicates the speaker's gender
    Speaker's ID (1..N) The unique speaker's ID which refers to the name and role of a participant
    Speaker's Sex Gender of the speaker (male or female)
  Warning Describes user warnings about certain defects or peculiarities in the collection
Changeable header Contains information that can change within the file
  Activities Describes the activities involved in a situation
  Bgd Describes backgrounding material (????)
  Comment Used for all-purpose comments
  Date Indicates the date of interaction
  Language Specifies the language used for the material that follows
  Location Indicates the city, state and country in which the interaction took place
  New Episode Indicates the end of one episode and the beginning of another
  Room Layout A description of the room and its contents
  Situation Describes the general setting of the interaction
  Stim Indicates a particular stimulus used in an elicited production task
  Tape Location Indicates the specific tape from which the transcription was made
    Tape ID Gives the tape identifier
    Tape Side Gives the side of the tape (a or b)
    Tape footage Gives the tape footage
  Time Duration Indicates the time at which the audiotaping began and the time that passed during the course of the taping
    Time Start Gives the time at which the recording began
    Time End Gives the time at which the recording ended
  Time Start Used to "restart" the clock
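
To make the Participants entry in the overview more concrete, the sketch below models one participant as an ID, a name and a role, and parses a comma-separated list of them. The input layout "ID Name Role, ID Name Role" is an assumption made for this example, not a statement of the exact CHAT header syntax, and the names Participant and parse_participants are hypothetical. The three-letter ID rule, the 1..N cardinality and the list of standard roles are taken from the overview.

```python
import re
from dataclasses import dataclass

# Standard roles as listed in the Metadata Overview.
STANDARD_ROLES = {
    "Target_Child", "Mother", "Father", "Brother",
    "Sister", "Teacher", "Playmate", "Investigator",
}


@dataclass
class Participant:
    speaker_id: str  # unique three-letter ID
    name: str        # speaker's first name
    role: str        # relationship to the children under study

    def has_standard_role(self):
        return self.role in STANDARD_ROLES


def parse_participants(value):
    """Parse 'ID Name Role, ID Name Role' (assumed layout) into Participants."""
    participants = []
    seen_ids = set()
    for chunk in value.split(","):
        fields = chunk.split()
        if len(fields) != 3:
            # An empty value also ends up here, which enforces the 1..N rule.
            raise ValueError(f"expected 'ID Name Role', got {chunk.strip()!r}")
        speaker_id, name, role = fields
        if not re.fullmatch(r"[A-Za-z]{3}", speaker_id):
            raise ValueError(f"ID must be three letters: {speaker_id!r}")
        if speaker_id in seen_ids:
            raise ValueError(f"duplicate speaker ID: {speaker_id!r}")
        seen_ids.add(speaker_id)
        participants.append(Participant(speaker_id, name, role))
    return participants
```

With the made-up input "ROS Ross Target_Child, MAR Mary Mother", parse_participants returns two Participant objects whose IDs can then be used to look up the per-speaker entries (Age, Birth, Education, SES, Sex) of the constant header.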