2.5. Data categories

In ELAN, users are free to devise their own tier setup and labelling scheme. This flexibility is often necessary because of the nature of the data to be transcribed. Moreover, the people involved in the transcription process may not be fluent in English, in which case an international (English) annotation scheme is not applicable. In such cases, controlled vocabularies (see Section 2.6) and templates (see Section 1.2.11) are convenient tools to help annotators.

The downside of this flexibility is the amount of work required to make language resources interoperable. With only a few resources this is manageable, but as the number of resources grows, a convenient way to make them interoperable becomes increasingly important. The ISO Data Category Registry was developed for this purpose.

The Data Category Registry (or DCR) is a list of linguistic concepts covering a range of linguistic domains. The concepts in the DCR can be referenced from all sorts of tools and resources. The DCR therefore acts as an intermediary between those tools and resources.
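
To illustrate why a shared registry helps, the following Python sketch assumes two independently created resources whose tier types carry different local labels (one of them Dutch), but which both reference the same data category, partOfSpeech with Id 1345. The resource names, the labels and the dictionary representation are hypothetical; the point is only that a common reference lets a tool align labels without knowing each resource's naming conventions.

```python
from collections import defaultdict

# Hypothetical tier-type labels from two independently created resources.
# Each label maps to the Id of the DCR data category it references,
# or to None if no data category has been associated with it.
resource_a = {"POS": 1345, "gloss": None}
resource_b = {"woordsoort": 1345, "vertaling": None}


def group_by_data_category(resources):
    """Group (resource, label) pairs by the data category Id they reference."""
    groups = defaultdict(list)
    for name, tier_types in resources.items():
        for label, dc_id in tier_types.items():
            # Labels without a data category reference cannot be aligned this way.
            if dc_id is not None:
                groups[dc_id].append((name, label))
    return dict(groups)


aligned = group_by_data_category({"A": resource_a, "B": resource_b})
print(aligned)  # {1345: [('A', 'POS'), ('B', 'woordsoort')]}
```

The unreferenced tiers ("gloss", "vertaling") stay unaligned, which is exactly the manual effort that associating data categories is meant to remove.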

Referencing a data category is implemented in ELAN as follows. Depending on the type of object you are referencing from (a tier type (Section 2.3.6), a controlled vocabulary entry (Section 2.6.3) or an annotation (Section 2.9.21)), the following or a similar window is displayed:

Figure 2.27. Local Data Category Selection

The left panel lists the data categories stored on your local system. Since this list is still empty, the right panel does not display any name or description. To add categories, click Add Categories. The following window appears:

Figure 2.28. Remote DCR

This window displays the DCR on a remote server, including all profiles and the data categories of those profiles. To select one or more data categories for local storage, first click a profile in the left panel. All data categories of the selected profile are then listed in the middle panel, which can be ordered alphabetically, by ID or by Broader Concept. When you select a data category, information about it is displayed in the right panel. For instance, the data category partOfSpeech has Id 1345, as can be seen below. To select more than one data category, hold the CTRL key while clicking lines in the middle panel; hold the SHIFT key to select a range, or press CTRL+A to select all categories in the list. Click Apply to store the selected data categories on your local system.

Figure 2.29. Remote Data Category Selection: Result

In the same way, more data categories, including ones from other profiles, can be selected and stored on your local system. Afterwards, you can highlight a category in the Local Data Category Selection window and associate it with a controlled vocabulary entry or tier type by clicking Apply:

Figure 2.30. Local Data Category Selection: Result

The original purpose of this system is to link (parts of) your data to a common labelling system in order to improve interoperability between resources and tools. To do so, select a data category and click Apply. This associates the selected data category with an annotation, a controlled vocabulary entry or a tier type, depending on where you opened the Local Data Category Selection window.
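
A tool that wants to exploit these associations has to read them back from the annotation file. As a sketch only, the following Python snippet assumes that the association of a tier type is saved in the .eaf file as an EXT_REF attribute on the LINGUISTIC_TYPE element, pointing to an EXTERNAL_REF element of TYPE "iso12620" whose VALUE identifies the data category. The trimmed fragment, its IDs and the identifier URL are illustrative assumptions; check them against the EAF schema for your ELAN version before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAF fragment. The EXT_REF attribute and the
# EXTERNAL_REF element with TYPE="iso12620" are assumptions about how the
# association is stored; verify against the EAF schema for your ELAN version.
EAF_FRAGMENT = """
<ANNOTATION_DOCUMENT>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="pos" TIME_ALIGNABLE="false" EXT_REF="er1"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="default-lt" TIME_ALIGNABLE="true"/>
  <EXTERNAL_REF EXT_REF_ID="er1" TYPE="iso12620"
                VALUE="http://www.isocat.org/datcat/DC-1345"/>
</ANNOTATION_DOCUMENT>
"""


def tier_type_data_categories(root):
    """Map each tier type ID to its data category reference, or None."""
    # Index the (assumed) data category references by their ID.
    dc_refs = {
        ref.get("EXT_REF_ID"): ref.get("VALUE")
        for ref in root.iter("EXTERNAL_REF")
        if ref.get("TYPE") == "iso12620"
    }
    # A missing EXT_REF yields None, since None is never a key in dc_refs.
    return {
        lt.get("LINGUISTIC_TYPE_ID"): dc_refs.get(lt.get("EXT_REF"))
        for lt in root.iter("LINGUISTIC_TYPE")
    }


root = ET.fromstring(EAF_FRAGMENT.strip())
print(tier_type_data_categories(root))
```

With the fragment above, the tier type "pos" maps to the partOfSpeech reference and "default-lt" maps to None, since no data category has been associated with it.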