5.2. Select the sessions and subcorpora that belong to a corpus

A corpus consists of subcorpora and sessions. For example, the following illustration shows a corpus labeled 'Goemai corpus (preliminary version)', together with its two subcorpora 'Natural data' and 'Elicited data'. The subcorpus 'Elicited data' contains further subcorpora (labeled 'Matching games' and 'Picture books'). Each subcorpus consists of sessions (labeled 'Session1' to 'Session6') that contain the actual session data (i.e., IMDI Session files with metadata information and links to info, media, written resource, and lexicon files).

Goemai corpus (preliminary version):
  • Natural data

    • Session1

    • Session2

    • Session3

  • Elicited data

    • Matching games

      • Session4

      • Session5

    • Picture books

      • Session6
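
The same hierarchy can also be written down as a nested data structure, which makes it easy to see what sits directly below each corpus or subcorpus. The following is a minimal Python sketch for illustration only: the dictionary layout and the function names direct_children and all_sessions are assumptions made for this example and are not part of the IMDI tools or file format.

```python
# Hypothetical in-memory model of the example hierarchy (not an IMDI file format).
# A dict value holds subcorpora; a list value holds sessions.
CORPUS = {
    "Goemai corpus (preliminary version)": {
        "Natural data": ["Session1", "Session2", "Session3"],
        "Elicited data": {
            "Matching games": ["Session4", "Session5"],
            "Picture books": ["Session6"],
        },
    }
}


def direct_children(tree):
    """Map every corpus or subcorpus name to the names directly below it."""
    result = {}
    for name, content in tree.items():
        # Iterating a dict yields subcorpus names; iterating a list yields sessions.
        result[name] = list(content)
        if isinstance(content, dict):
            # Descend into the subcorpora of this node.
            result.update(direct_children(content))
    return result


def all_sessions(content):
    """Collect every session below a node, however deeply it is nested."""
    if isinstance(content, dict):
        return [s for child in content.values() for s in all_sessions(child)]
    return list(content)


if __name__ == "__main__":
    for name, children in direct_children(CORPUS).items():
        print(f"{name}: {', '.join(children)}")
```

In this sketch, 'Elicited data' lists only its two subcorpora as direct children, while its sessions are reached through 'Matching games' and 'Picture books'.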

To specify such a hierarchical corpus structure, you need to create an IMDI Corpus file for each corpus and subcorpus. You are asked to provide the following information:

  1. Corpus Name:

    A short name or abbreviation that uniquely identifies the corpus or subcorpus.

    Note for researchers working at the MPI for Psycholinguistics, Nijmegen: The Corpus Name should be the same as the name of the corresponding IMDI Corpus file. Furthermore, the file name has to be Unix compatible: use no more than 14 characters, use only letters, numbers, and the underscore (_), and do not use blank spaces.

  2. Corpus Title:

    The complete title of the corpus or subcorpus. Usually, it is the spelled-out version of the abbreviated Name.

  3. Descriptions:

    Descriptive information about the corpus or subcorpus. See Section A.1 for instructions on how to fill in a Descriptions schema. Remember: The field Language refers to the language in which the description is written, not to the language under investigation.

  4. Corpus Links:

    Specify links to (a) all IMDI Session files or (b) all IMDI Corpus files that belong to the subcorpus or corpus.

    1. Specify the IMDI Session files that belong to the subcorpus. E.g., in the IMDI Corpus file Natural.imdi (i.e., 'Natural data'), specify that it contains the IMDI Session files Session1.imdi, Session2.imdi and Session3.imdi.

      Figure 5.5. Specify the IMDI session files

      Specify the links to the IMDI Session files

    2. Specify the IMDI Corpus files that belong to the corpus or subcorpus. E.g., in the IMDI Corpus file Goemai.imdi (i.e., 'Goemai corpus (preliminary version)'), specify that it contains the IMDI Corpus files Natural.imdi (i.e., 'Natural data') and Elicited.imdi (i.e., 'Elicited data').

      Figure 5.6. Specify the IMDI corpus files

      Specify the links to the IMDI Corpus files

    To specify a link, do the following:

    1. Click on the Add button. The CorpusLink dialog window appears.

      Figure 5.7. Add Corpus Link

    2. Enter the name of the session or corpus in the field Name.

    3. Specify the directory location of the corresponding IMDI Session/Corpus file in the field Corpus Link.
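
The naming restrictions given in the note under Corpus Name can be checked mechanically before a file is created. The following is a minimal Python sketch, not part of the IMDI tools. It rests on two assumptions that the note does not settle: that "letters" and "numbers" mean ASCII letters and digits, and that the rules are applied to the name without the .imdi extension. The function name name_problems is hypothetical.

```python
import re

MAX_NAME_LENGTH = 14  # limit stated in the note for MPI researchers

# Assumption: "letters" and "numbers" mean ASCII letters and digits.
_ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def name_problems(name):
    """Return the reasons why `name` breaks the naming rules (empty list if none).

    Assumption: `name` is the Corpus Name without the .imdi extension.
    """
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("contains a blank space")
    if name and not _ALLOWED.fullmatch(name):
        problems.append("contains characters other than letters, digits, and underscore")
    return problems


if __name__ == "__main__":
    for candidate in ["Natural", "Elicited", "Natural data", "Goemai_preliminary"]:
        issues = name_problems(candidate)
        print(candidate, "->", "OK" if not issues else "; ".join(issues))
```

Returning a list of problems rather than a single true/false value lets all violations in a name be reported at once.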