4.2.20. Importing CSV / Tab-delimited Text Files

A CSV (Comma Separated Values) or Tab-delimited Text file is a plain-text file organized into rows and columns. Each line in the file is a row, and the values on a line are divided into columns by a specific character, such as a comma or a tab. In this respect CSV and Tab-delimited Text files resemble spreadsheets such as those in Microsoft Excel, which can also save its data as .csv files.

Take a look at Figure 4.32. The first row represents the event of a person saying 'so from here'. The first value (and thus the first column of the complete file) is the tier name, the second and third are the begin time in two different formats, the fourth and fifth are the end time, the sixth and seventh are the duration, and the last value is the annotation.

Figure 4.32. Tab-delimited Text
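For readers who want to inspect such a file outside ELAN, the column layout described for Figure 4.32 can be read with a few lines of Python. This is only a minimal sketch: the sample row, its time values, the tier name and the function name `parse_rows` are hypothetical, and the two time formats (hh:mm:ss.ms and seconds) are an assumption, since the exact formats depend on how the file was exported.

```python
import csv
import io

# Hypothetical sample row in the column order described for Figure 4.32:
# tier, begin (hh:mm:ss.ms), begin (s), end (hh:mm:ss.ms), end (s),
# duration (hh:mm:ss.ms), duration (s), annotation
SAMPLE = (
    "W-Spch\t00:00:00.780\t0.78\t00:00:01.570\t1.57"
    "\t00:00:00.790\t0.79\tso from here\n"
)


def parse_rows(text):
    """Return one dict per row, keeping only the seconds-based time columns."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        tier, _, begin_s, _, end_s, _, duration_s, annotation = fields
        rows.append(
            {
                "tier": tier,
                "begin": float(begin_s),
                "end": float(end_s),
                "duration": float(duration_s),
                "annotation": annotation,
            }
        )
    return rows


rows = parse_rows(SAMPLE)
print(rows[0]["tier"], rows[0]["annotation"])
```

The sketch assumes exactly eight columns per row; a file with a different column selection would need a different unpacking line.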


To import a CSV or Tab-delimited Text file into ELAN, choose File > Import > CSV / Tab-delimited Text File.... In the dialog window, browse to and select a file that contains CSV or tab-delimited data and click Open.

The second dialog window contains two sections (see Figure 4.33). The upper section shows a sample table containing data from the selected file. Both rows and columns are numbered. The lower section, Import Options, lets you specify which columns to include and what data type each one represents. The file format is therefore flexible: neither the kind of data expected nor its layout is prescribed. The column numbers in the Import Options section correspond to the column numbers in the sample table. The data types you can select are:

Select at least one column with data type 'Annotation'. If you select columns for begin time, end time and duration, the duration column is ignored during the import.

Figure 4.33. Import CSV / Tab-delimited Text


The option Specify first row of data lets you exclude a header by skipping the first few lines of the file. The option Specify delimiter lets you set the delimiter yourself if ELAN did not guess it correctly. The delimiters supported by ELAN are comma, tab, colon and semicolon.
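The effect of these two options can be illustrated with a short Python sketch. It is not ELAN's implementation: how ELAN guesses the delimiter is not described here, so the standard library's `csv.Sniffer` is used as a stand-in, and the function name `read_data_rows` and its parameters are hypothetical.

```python
import csv

# The four delimiters named above: comma, tab, colon and semicolon.
SUPPORTED_DELIMITERS = ",\t:;"


def read_data_rows(text, first_row=1, delimiter=None):
    """Split text into rows, starting at the 1-based row number first_row.

    If no delimiter is given, csv.Sniffer guesses one from the supported set
    (a stand-in for ELAN's own guess, which may behave differently).
    """
    if delimiter is None:
        dialect = csv.Sniffer().sniff(text, delimiters=SUPPORTED_DELIMITERS)
        delimiter = dialect.delimiter
    rows = list(csv.reader(text.splitlines(), delimiter=delimiter))
    return rows[first_row - 1:]


# Hypothetical file content with one header line.
text = "Tier;Begin;Text\nA;0;hello\nB;1;world\n"
for row in read_data_rows(text, first_row=2):
    print(row)
```

Passing `delimiter` explicitly corresponds to overriding a wrong guess, which is mainly needed when the data itself contains one of the candidate characters, for example colons inside time values.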

If you enable the option Default annotation duration, ELAN gives every annotation created from the selected file a duration equal to the specified number of milliseconds. This option takes effect only if the file contains no time data, or only begin times or only end times.
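The rules for combining begin time, end time, duration and the default duration can be summarized in a small Python sketch. It is an illustration under stated assumptions, not ELAN's code: the function name `resolve_interval` is hypothetical, all values are taken to be milliseconds, and the handling of a row with only a begin (or end) time plus a duration column is an assumption. How ELAN positions annotations that have no time data at all is not covered here, so the sketch returns `None` for that case.

```python
def resolve_interval(begin=None, end=None, duration=None, default_duration=None):
    """Return (begin_ms, end_ms) for one annotation, or None if undetermined."""
    if begin is not None and end is not None:
        # Begin and end are both known: any duration value is ignored.
        return begin, end
    if begin is not None and duration is not None:
        # Assumption: a duration column completes a missing end time.
        return begin, begin + duration
    if end is not None and duration is not None:
        # Assumption: a duration column completes a missing begin time.
        return end - duration, end
    if default_duration is not None:
        # The default duration only applies when the time data is incomplete.
        if begin is not None:
            return begin, begin + default_duration
        if end is not None:
            return end - default_duration, end
    # No usable time data: placement is left to the caller.
    return None


print(resolve_interval(begin=100, end=500, duration=999))
print(resolve_interval(begin=100, default_duration=1000))
```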

Finally, click OK to import the data. A new transcription document is created that contains the imported annotations.

Another example

To demonstrate that the format of the imported file can be flexible, take a look at the following tab-delimited text:

Figure 4.34. Tab-delimited text, different orientation


In this example each column represents a tier, with the tier names in the first row and the annotations in the remaining rows. This file can be imported by selecting the following import options:

Figure 4.35. Import CSV / Tab-delimited Text


Note that the Specify first row of data option is set to 2. As a consequence, ELAN starts importing annotations from row 2 instead of row 1. Furthermore, ELAN tries to extract tier names from the first line of the file if the column they are part of is specified as 'Annotation'. In this example, this results in two tiers: K-Spch and W-Spch.
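The column-per-tier layout of this example can be mimicked in Python as follows. This is a hedged sketch rather than a description of ELAN's internals: the function name `tiers_from_columns` and the sample cell contents are hypothetical, and skipping empty cells is an assumption about how gaps in a column are treated.

```python
import csv


def tiers_from_columns(text, first_row=2, delimiter="\t"):
    """Map each tier name (taken from line 1) to the annotations in its column.

    Data is read from the 1-based row number first_row onwards.
    """
    rows = list(csv.reader(text.splitlines(), delimiter=delimiter))
    names = rows[0]
    tiers = {name: [] for name in names}
    for row in rows[first_row - 1:]:
        for name, value in zip(names, row):
            if value.strip():  # assumption: empty cells yield no annotation
                tiers[name].append(value)
    return tiers


# Hypothetical content: two tiers, one annotation each.
text = "K-Spch\tW-Spch\nhello\t\n\tso from here\n"
print(tiers_from_columns(text))
```

With `first_row=2` the header line supplies only the tier names and is not imported as an annotation, which matches the setting described above.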