ELAN offers various export options. To export, click on File > Export As and one of the options.
Apart from these export options for single files, ELAN also supports multiple file exporting options. More details regarding these options can be found here: Multiple file export options
Figure 1.34. Tier Selection panel in most of the dialogs
Select the tiers by checking the boxes before each tier name.
This tab shows a list of the tier types available in the current transcription. Select the types by checking the boxes before each type name. Selecting a type selects all the tiers of that type. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the participants in the transcription. Select the participants by checking the boxes before each participant name. Selecting a participant selects all the tiers of that participant. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the annotators in the transcription. Select the annotators by checking the boxes before each annotator name. Selecting an annotator selects all the tiers of that annotator. To modify the selected tiers, switch back to By Tier Names.
This tab has a list of all the languages in the transcription. Select the language(s) by checking the boxes before each language name. Selecting a language selects all the tiers of that language. To modify the selected tiers, switch back to By Tier Names.
To select multiple tiers, press Shift and click on the successive tiers, or click and drag the mouse along the tiers to select them.
Similar to exporting a document to Shoebox (see Shoebox file), ELAN data can be exported to a Toolbox document with UTF-8 encoding. This export provides more options for output customization.
To export a file into Toolbox, do the following:
The Toolbox Export dialog box appears:
Only the left part of an ELAN tier name containing an @ is identified as a tier marker for Toolbox. These markers form a block in the exported file. The right part of the ELAN tier name is identified as the participant name. Participant names are exported with the marker ELANParticipant; see the figure below.
If you use a Shoebox *.typ file to specify the Toolbox database type, ELAN extracts the database type name from the first line of the type file (e.g. the database type name Text in \+DatabaseType Text) and puts it in the first line of the exported file (e.g. \_sh v3.0 400 Text).
When there is only one root tier (tier without a parent tier) in the transcription (e.g. ref) this will be used as the record marker by default. When there are multiple root tiers "\block" will be added as record marker. In both cases it is possible to specify a custom record marker instead.
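The naming convention and the header line described above can be sketched in a few lines of Python. This is only an illustration, not ELAN's own code: the function names are hypothetical, and splitting on the first @ is an assumption.

```python
def split_tier_name(tier_name):
    """Split an ELAN tier name into (toolbox_marker, participant).

    Assumption: the name is split on the first "@"; a name without "@"
    has no participant part.
    """
    marker, sep, participant = tier_name.partition("@")
    return marker, (participant if sep else None)


def header_from_typ_line(first_line):
    """Build the first line of the exported file from the first line of a
    *.typ file, e.g. "\\+DatabaseType Text" -> "\\_sh v3.0 400 Text"."""
    prefix = "\\+DatabaseType "
    if not first_line.startswith(prefix):
        raise ValueError("not a database type line: " + first_line)
    return "\\_sh v3.0 400 " + first_line[len(prefix):].strip()
```

For example, a tier named tx@SP1 (a hypothetical name) yields the marker tx and the participant SP1.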
Some options in the Toolbox Export dialog window that have not been touched upon yet:
The file is exported as a *.txt | *.sht | *.tbt file.
If there already exists a file of the same name, ELAN will ask you whether or not it should overwrite the existing file.
It contains the following information:
Each ELAN parent annotation (including all its referring annotations) corresponds to one Toolbox record. E.g., in the illustration below, the ELAN parent annotation “CLLDCh3R02S01.001” corresponds to the Toolbox record “CLLDCh3R02S01.001”.
Each ELAN parent annotation (i.e., each Toolbox record) contains the additional field markers \ELANBegin and \ELANEnd (i.e., the begin and end time of the parent annotation).
This time code information allows you to import the Toolbox file back into ELAN, without having to manually re-align the file (see Shoebox file).
ELAN allows you to export your project to the SIL FieldWorks Language Explorer software, also referred to as FLEx. The data exchange is realized through .flextext files, a file type that defines several container elements and attributes (see below), onto which ELAN's tiers (via their tier type) and annotations have to be mapped. To configure these mappings, ELAN provides the multiple-step export window described below. Configuration will be less complicated if the .eaf was created by importing a FLEx .flextext file. On import, some FLEx attributes are "encoded" in the names of tiers; on export, these attributes are reconstructed by "decoding" the tier names. To better understand the options in the user interface, a simplified representation of the structure of a .flextext file follows here.
<interlinear-text>
  <item lang="" type="">...</item>
  <paragraph>
    <phrase>
      <item lang="" type="">...</item>
      <word>
        <item lang="" type="">...</item>
        <morph type="">
          <item lang="" type="">...</item>
        </morph>
      </word>
    </phrase>
  </paragraph>
</interlinear-text>
All elements can occur multiple times, e.g. there can always be multiple item child elements for any parent element.
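To get a feel for this nesting, the simplified structure can be walked with Python's standard xml.etree.ElementTree module. The sketch below is only an illustration: the sample text, the language codes and the "title" item type are made up, and a real .flextext file may contain additional wrapper elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the simplified structure shown above.
SAMPLE = """<interlinear-text>
  <item lang="en" type="title">Demo</item>
  <paragraph>
    <phrase>
      <item lang="en" type="gls">hello world</item>
      <word>
        <item lang="xx" type="txt">ola</item>
        <morph type="root">
          <item lang="xx" type="txt">ola</item>
        </morph>
      </word>
    </phrase>
  </paragraph>
</interlinear-text>"""


def list_items(xml_text):
    """Return (container tag, item type, lang, text) for every item element."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() visits the root and all descendants in document order.
    for container in root.iter():
        # findall("item") only returns direct children of the container.
        for item in container.findall("item"):
            rows.append((container.tag, item.get("type"),
                         item.get("lang"), item.text))
    return rows
```

Each returned row shows which container element an item belongs to, which is the information the export screens below ask you to map onto tiers.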
If your .eaf file contains multiple participants, make sure you have given each participant a name value. You can set a participant value under Tier > Change Tier Attributes....
Choosing File > Export as > FLEx file … will give you the following screen:
In this screen you can specify:
whether a tier corresponds to the interlinear-text element and, if so, which tier it is. This determines whether a tier and its dependent tiers provide the contents for item child elements of interlinear-text.
whether a tier corresponds to the paragraph element. If so, its segmentation is used for grouping phrase child elements; if not, each phrase will be embedded in its own paragraph element.
The second screen allows you to:
item child element of the correct, corresponding container element
item type attribute of the .flextext morph element. This should be a valid FLEx morph type. If this option is deselected, each morph element will be exported with attribute type="root".
The third screen allows you to customize the output of the FLEx lang (language) and type attributes:
The list of type values is based on a FLEx controlled vocabulary, which could be out of date at the time of use; new values can therefore be added manually. The list of languages is currently based on "decoding" the tier names and on the content languages of the tiers. If the list is empty, it should be filled manually.
FLEx requires that for languages that have both a two letter ISO 639-1 code and a three letter ISO 639-3 code, the two letter code should be used. This screen tries to automatically replace three letter codes by their two letter equivalents where needed, but it is good to check the codes.
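The replacement rule can be illustrated with a small lookup table. This is a sketch only: the table below lists just a handful of well-known code pairs, the function name is hypothetical, and ELAN's own list is not reproduced here.

```python
# A few ISO 639-3 codes that also have an ISO 639-1 equivalent.
ISO_639_3_TO_1 = {
    "eng": "en",  # English
    "deu": "de",  # German
    "fra": "fr",  # French
    "nld": "nl",  # Dutch
    "spa": "es",  # Spanish
}


def flex_language_code(code):
    """Return the two-letter code if the table has one, else the code as given."""
    return ISO_639_3_TO_1.get(code.lower(), code)
```

A code that is not in the table is returned unchanged, which is why it remains worthwhile to check the result by hand.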
The final screen allows you to save the file as a flextext file, so it can be used in FLEx.
On the third-party resources page of ELAN (https://archive.mpi.nl/tla/elan/thirdparty ), you can find a teaching-set which covers the aspects of importing from FLEx to ELAN and back to FLEx.
Figure 1.40. Export Chat file
Chat labels must be preceded by * (for root tiers) or % (for dependent tiers). While root tiers have to contain exactly 3 characters, dependent tier names can have up to 7 characters.
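The label rules can be checked before exporting with a small helper. The sketch below assumes that the character counts refer to the name after the * or % prefix; the function name is hypothetical.

```python
def is_valid_chat_label(label, is_root):
    """Check a CHAT label against the rules stated above.

    Assumption: the length limits apply to the name without its prefix.
    """
    if is_root:
        # "*" followed by exactly 3 characters.
        return label.startswith("*") and len(label) == 4
    # "%" followed by 1 to 7 characters.
    return label.startswith("%") and 1 <= len(label) - 1 <= 7
```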
All documents can be exported into a tabular format for purposes of further analysis and/or printing. This includes documents that were created by ELAN itself (see Creating a new document and Opening an existing document) as well as documents that were imported into ELAN from any of the supported formats. Do the following:
The Export as tab-delimited text dialog window is displayed, e.g.:
Figure 1.41. Export as tab-delimited text dialog window
In the Export as tab-delimited text dialog window, select those tiers that you want to export. A check mark appears next to any selected tier.
Repeat values of annotations spanning other annotations: the spanning annotation is put in each row containing an annotation it spans. The spanning annotation is not in a row by itself.
Only repeat within annotation hierarchies: limits the previous option. An annotation is only repeated if it is on one of the ancestor tiers in the annotation hierarchy.
Sliced annotation output showing temporal co-occurrences: an alternative way to repeat annotation values based on overlaps. In this export all unique begin and end times of all annotations in the export are placed in one list, creating new intervals (between each two successive time values). Each interval is exported if there is at least one annotation overlapping that interval, and in the column of each tier the value of the overlapping annotation, if any, is exported.
Include the annotation id: appends the annotation identifier between brackets to the annotation value (e.g. [a13]). This makes it possible to distinguish annotations in the output, which is hard to do in the case of repeated values.
If you choose the SMPTE (hh:mm:ss.ff) format, the selected video standard (PAL or NTSC) just indicates the way seconds and milliseconds are converted to frame numbers. This is independent of the actual video standard of the associated video(s).
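The sliced annotation output described above can be sketched as follows. This is an illustration of the idea rather than ELAN's implementation: the tuple layout and function name are assumptions, and when two annotations on the same tier overlap an interval, this sketch simply keeps the last one.

```python
def slice_annotations(annotations):
    """annotations: list of (tier, begin_ms, end_ms, value) tuples.

    Returns (tiers, rows); each row is (start, end, values), with one
    value per tier in the order of `tiers`.
    """
    # All unique begin and end times, in ascending order.
    times = sorted({t for _, b, e, _ in annotations for t in (b, e)})
    tiers = sorted({a[0] for a in annotations})
    rows = []
    # Each pair of successive time values forms one interval.
    for start, end in zip(times, times[1:]):
        row = {}
        for tier, b, e, value in annotations:
            if b < end and e > start:  # annotation overlaps the interval
                row[tier] = value
        if row:  # intervals without any annotation are skipped
            rows.append((start, end, [row.get(t, "") for t in tiers]))
    return tiers, rows
```

With two overlapping annotations on tiers A (0 to 1000 ms) and B (500 to 1500 ms), this produces three intervals: one with only A, one with both, and one with only B.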
*.txt saves a tab-delimited text file; *.csv saves the annotations in a comma separated values file, placing all text values between double quotes. Make an appropriate choice and click on Save.
Some Mac applications, like TextEdit, have difficulty loading UTF-8 encoded files. This is most noticeable for “special” characters, e.g. IPA. Using UTF-16 is recommended in that case.
A message appears to inform you that the file has been exported.
The contents and the layout of the exported file depend on the selected options. It can be opened with any program that can handle tab-delimited or comma separated texts, e.g., Microsoft Excel.
Figure 1.42. Tab-delimited text
Some versions of Excel seem to have problems importing tab-separated files (white rectangles are shown instead of the column borders). As a workaround you can open the text file first in a text editor (e.g. Notepad) and copy and paste the content into Excel.
If your ELAN annotations contain syntactic elements, it is possible to export these to Synpathy (see https://archive.mpi.nl/forums/t/synpathy-software-information/2649). This function is available via File > Export as > Tiger-xml…
First select, from the candidate tiers, the one you want to export. Afterwards, map the tiers onto the correct description ("word" or "pos"). Finally enter the name of the file (*.tig).
Synpathy is a tool for annotating, analyzing, and graphically editing the syntactical structure of sentences (e.g. linguistically annotated text corpora), developed at the Max Planck Institute for Psycholinguistics. The application is based on the SyntaxViewer from the TIGER search project developed by the IMS (Institut für Maschinelle Sprachverarbeitung, University of Stuttgart).
This function (File > Export as > Interlinearized Text...) is very similar to ELAN’s printing system. More information can therefore be found in Previewing the printed pages. The main difference is that the width of the exported text depends in this case on the number of characters that fit on one line.
Figure 1.43. Maximum line width
Similarly to the export to interlinear text (see Interlinear text file), you can also export annotations to an HTML file, through the File > Export as > HTML... menu.
Figure 1.44. Export as HTML
The only extra option for the HTML export is
To play the media HTML 5 is required. It is necessary to place the exported html in the same location as the media file in order to play the file from the html export.
In some situations a straightforward list of the annotation units, one after another, can be handy. For that purpose an export option to a “traditional transcript text” has been added to ELAN. In its simplest form it just creates a text file containing the successive annotations of several tiers, in chronological order. This feature can be found under File > Export as > Traditional Transcript Text....
Figure 1.45. Export Transcript Text
'Restrict to the selected time interval' allows you to export only the data that is currently selected (see Making a selection on an independent tier).
'Wrap lines' sets a maximum number of characters before the line gets wrapped.
'Merge annotations on the same tier...' makes it possible to merge annotations on the same tier if the gap between these annotations is less than a certain number of milliseconds.
You can number the annotations, each wrapped line, and include or exclude tier labels or participant labels in the export.
One of the options enables you to include silences with a minimal duration. The figure shows a silence of 0.2 seconds between 'yeah' on the tier K-Spch and 'and then you go the other ...' on the tier W-Spch. The first annotation ends at 00:00:04.400 and the next annotation begins at 00:00:04.600, resulting in a silence of 0.2 seconds. If this silence were shorter than the minimal silence duration entered in the export dialog window (20 ms in the figure), it would not be included in the exported file. The silence duration can be shown with 1, 2 or 3 digits after the decimal point.
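The silence calculation amounts to comparing each annotation's begin time with the latest end time seen so far. The Python sketch below illustrates this under a few assumptions: the tuple layout, the function name and the "(0.2)" notation for a silence are chosen for the example and are not taken from ELAN.

```python
def find_silences(annotations, min_silence_ms, decimals=1):
    """annotations: (begin_ms, end_ms, text) tuples sorted by begin time.

    Returns the texts in order, with "(seconds)" entries inserted for
    every gap of at least min_silence_ms.
    """
    out = []
    latest_end = None
    for begin, end, text in annotations:
        if latest_end is not None:
            gap = begin - latest_end
            if gap >= min_silence_ms:
                # decimals can be 1, 2 or 3, as in the export dialog.
                out.append("({:.{d}f})".format(gap / 1000, d=decimals))
        out.append(text)
        latest_end = end if latest_end is None else max(latest_end, end)
    return out
```

With an annotation ending at 4400 ms, the next one beginning at 4600 ms and a minimal duration of 20 ms, the gap of 200 ms is reported as (0.2).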
Empty lines after each annotation (block) can also be included or excluded in the generated output file. Lastly, you can set a fixed width (in number of characters) for the tier labels.
The option to use Jefferson-style alignment based on "[" characters in overlapping annotations, can change the position of parts of annotations by vertically aligning corresponding "[" characters. (Alignment of matching "]" characters is not supported yet.)
This export function (File > Export as > Time-aligned Interlinear Text...) produces interlinear output but, unlike standard Interlinear Gloss, the formatting is based on time alignment. This is achieved by using a monospaced (fixed width) font in combination with a customizable character-to-milliseconds calculation factor. As a consequence, depending on this factor, the export might cut off part of the annotation value.
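The effect of the character-to-milliseconds factor can be illustrated with a small sketch. It is not ELAN's layout code: the function name, the tuple layout and the rounding (integer division) are assumptions made for the example.

```python
def place_on_line(annotations, ms_per_char):
    """Lay out one tier on a monospaced line.

    annotations: (begin_ms, end_ms, text) tuples, sorted and non-overlapping.
    Each annotation starts at column begin_ms // ms_per_char and may use
    (end_ms - begin_ms) // ms_per_char characters; longer text is cut off.
    """
    line = ""
    for begin, end, text in annotations:
        column = begin // ms_per_char
        width = (end - begin) // ms_per_char
        line = line.ljust(column)             # pad with spaces up to the start column
        line += text[:width].ljust(width)     # truncate or pad to the available width
    return line
```

With 100 ms per character, an annotation of 500 ms has room for only 5 characters, so a value such as "goodbye" is cut off to "goodb". A smaller factor leaves more room per annotation at the cost of wider output.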
The export offers a few text styling options (underline, bold, italic) and the output format is (simple) HTML.
The output can be customized in various ways:
After changes in settings the Apply Changes button updates the preview. The Save As... button starts the actual export, currently html is the only supported format.
When you wish to work with your annotations in Praat, ELAN enables you to export your annotations to a Praat TextGrid. To do this, click File > Export as > Praat TextGrid.... In the dialog window that appears you can select the tiers you wish to export (see How to select tiers) and specify whether you want to restrict the output to the selected interval.
After clicking OK, you can enter a file name and select an encoding. In addition to TextGrid files in the default encoding for the operating system, ELAN supports Praat TextGrid files with UTF-8 and UTF-16 encoding. Finally click on Save.
The preliminary export function File > Export as > WebAnnotation JSON... stores annotations according to the W3C Web Annotation Data Model specifications. This model and format are intended to enable sharing and reuse of annotations across applications and platforms.
The export window offers a few options to customize the output. Apart from the possibility to select the tiers to export and to only export the selected interval, there are a few format-specific options which determine which information is included and how it is structured. After changing settings, the Update button applies the settings and updates the preview on the left side of the window. The Export button initiates the actual export to a .json text file.
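To give an impression of the W3C Web Annotation Data Model, the sketch below builds one annotation as a Python dictionary and serializes it with the standard json module. The structure follows the W3C model (a TextualBody and a FragmentSelector using media fragment syntax); whether ELAN's export uses exactly these properties is an assumption, and the identifier and media URL are made up.

```python
import json


def to_web_annotation(annotation_id, media_url, begin_ms, end_ms, value):
    """Build a single W3C Web Annotation for a time-aligned annotation."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": value},
        "target": {
            "source": media_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Media fragment with begin and end time in seconds.
                "value": "t={:.3f},{:.3f}".format(begin_ms / 1000, end_ms / 1000),
            },
        },
    }


example = to_web_annotation(
    "urn:example:a13",                  # hypothetical identifier
    "http://example.org/media/rec.mp4",  # hypothetical media URL
    4400, 4600, "yeah",
)
text = json.dumps(example, indent=2)
```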
Sometimes it can be very useful to have an alphabetical list of (unique) words from one or more tiers. ELAN offers a way to generate such lists. Go to File > Export as > List of Words ... and select the tiers (see How to select tiers) from which you want to extract the words. The annotations of the selected tiers will be tokenized (split into words) using either a default set of delimiters or a user-definable set. Check Count occurrences if you want the list to include the number of occurrences for each token. The Include overall totals in the export file option adds some basic overall statistics at the end of the file. The Include frequency percentages in the export option adds another column to the output, containing the percentage of each unique word (or annotation) of the total word count. After selecting tiers (or better, deselecting unwanted tiers) you can click OK and choose a file name. Clicking Save will save the word list.
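The tokenizing and counting steps can be sketched in Python as follows. The delimiter set used here is an assumption for the example and is not necessarily ELAN's default set; the function name is hypothetical.

```python
import re
from collections import Counter


def word_list(annotation_values, delimiters=" \t\n.,!?;:"):
    """Return (word, count, percentage) tuples in alphabetical order."""
    # Split on any run of delimiter characters.
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for value in annotation_values
              for t in re.split(pattern, value) if t]
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, counts[word], 100.0 * counts[word] / total)
            for word in sorted(counts)]
```

For the two annotation values "so from here" and "from here to there" this yields seven tokens and five unique words, with "from" and "here" each occurring twice.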
ELAN supports export to SMIL-compliant clips. With a suitable player this enables you to view media files and the associated annotations as a subtitled movie.
For a description of this standard and players see http://www.w3.org/AudioVideo/
Figure 1.49. Export SMIL Real Player
Exporting SMIL for QuickTime is very much the same as exporting SMIL for Real Player (see Export SMIL for Real Player). To export SMIL for QuickTime, go to File > Export As > QuickTime.... This will bring up a dialog box very similar to the one for exporting SMIL for Real Player. The only extra option, which is not available for Real Player, is Merge tiers into one QuickTime text file. If selected, all tiers are merged into one file; if not selected, a separate text file is generated for each tier. It is also possible to set a transparent background for the subtitles. This is done by selecting Transparent background in the dialog (see Change subtitle text settings) which pops up when you click the Edit Font and Display Settings... button. Finally click on OK to export.
Another format you can export to from ELAN is QuickTime subtitle text. To do this, go to File > Export As > QuickTime Text.... Select the tiers (see How to select tiers) you want to include in the subtitles. Optionally specify the following options:
Finally click on OK. By default the subtitles are stored in a QTtext .txt file. If you enter a file name with the extension .xml, the subtitles are stored in a TeXML - tx3g formatted XML file (the merge tiers option is ignored in that case).
Besides the QuickTime subtitle text (see QuickTime Text), ELAN can export annotations to a few other subtitle formats: SubRip (.srt), Spruce (.stl), Timed Text Markup Language (TTML) (.xml) and LRC (.lrc). Click on File > Export As > Subtitle Text... and select the tiers (see How to select tiers) you want to include in the subtitle file. Specify whether the subtitles should be restricted to annotations in the selected time interval, whether the time of the selected interval should be recalculated to start from zero, and whether the master media time offset should be added to the annotation times. The next option lets you specify the minimal display duration of a subtitle. For instance, if an annotation is only 0.3 seconds long, but you want to display a subtitle for at least 0.5 seconds, enter 500 (ms). Finally there is an option to specify that a separate subtitle file should be created for each selected tier. The default is to export all selected tiers to a single subtitle file.
After you have selected tiers and specified the options, click on OK. Enter a file name in the next window and click on Save.
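As an illustration of the minimal display duration, the sketch below writes annotations in the SubRip (.srt) layout: a sequence number, a time line and the text, separated by blank lines. The function names and the tuple layout are assumptions, and ELAN's own output may differ in details.

```python
def srt_timestamp(ms):
    """Format milliseconds as HH:MM:SS,mmm (the SubRip time format)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, seconds, millis)


def to_srt(annotations, min_duration_ms=0):
    """annotations: (begin_ms, end_ms, text) tuples in chronological order."""
    blocks = []
    for index, (begin, end, text) in enumerate(annotations, start=1):
        # Extend short annotations to the minimal display duration.
        end = max(end, begin + min_duration_ms)
        blocks.append("{}\n{} --> {}\n{}\n".format(
            index, srt_timestamp(begin), srt_timestamp(end), text))
    return "\n".join(blocks)
```

Note that this sketch does not check whether an extended subtitle overlaps the next one.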
Tiers for the recognizers are exported in the AVATech tier format. A brief description of the AVATech tier format can be found in this document: Avatech-interface-spec-2014-03-06.pdf. Files can be exported as .txt, .csv and xml.
Figure 1.52. Tiers for AVATech recognizers
ELAN supports any command line tool that can extract clips from a video (or audio) file. For that purpose it uses a script file named "clip-media.txt", which can be found in the folder where ELAN is installed. In most cases some configuration needs to be performed in the script file, e.g. which command line tool to use, before clipping can succeed. ELAN therefore first checks the ELAN data folder (see Special ELAN data folder) for the presence of the "clip-media.txt" file, before trying the file in its installation folder. By copying the customized "clip-media.txt" file to the data folder, the changes are accessible to all versions of ELAN.
Mac OS users will have a default execution line in "clip-media.txt" looking like this:
osascript ./scripts/qtp_clip_10_10_export.scpt $in_file $out_file $begin(sec.ms) $end(sec.ms)
This means that an AppleScript script in the "scripts" folder will be executed when clipping media. There is a PDF file on the ELAN web site to help users with editing the script file.
Windows users can e.g. put a copy of ffmpeg.exe (or ffmbc.exe for clipping mp4 files) in the folder where ELAN is installed (or modify the execution line such that the full path to ffmpeg is included). You can find ffmpeg and ffmbc online.
If you want to use the syntax for ffmpeg, remove the # in front of the line starting with 'ffmpeg.exe -i .........'. If you want to use the syntax for ffmbc, remove the # in front of 'ffmbc.exe -vcodec copy.......'. Make sure the syntax you do not want to use has a # in front of it; this comments the line out.
The syntax for ffmpeg can be:
ffmpeg.exe -i $in_file -vcodec copy -acodec copy -ss $begin(sec.ms) -t $duration(sec.ms) $out_file
Where the elements are:
ffmpeg.exe: the path of the application
$in_file: specifies the input file
$out_file: the output file
-vcodec copy -acodec copy: copy both the video and the audio stream without re-encoding
$begin(sec.ms): specifies the begin time of the clip
$duration(sec.ms): the duration of the clip
Look in the script file for more explanation and examples. If it is not possible to edit the script file due to file permissions, copy "clip-media.txt" to the Special ELAN data folder (and modify it to use an absolute path to the clipping application).
A few examples for command line tools are:
C:\ffmpeg.exe -i $in_file -vcodec copy -acodec copy -ss $begin(sec.ms) -t $duration(sec.ms) $out_file
C:\ffmbc.exe -vcodec copy -acodec copy -ss $begin(hour:min:sec.ms) -t $duration(hour:min:sec.ms) -i $in_file $out_file
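How the placeholders in such an execution line are filled in can be illustrated with a short sketch. It only mimics the substitution for the sec.ms placeholders: the function name is hypothetical, the number formatting (three decimals) is an assumption, and ELAN's own replacement logic is not shown here.

```python
def build_clip_command(template, in_file, out_file, begin_ms, end_ms):
    """Replace the placeholders of an execution line with concrete values.

    Returns the command as a list of arguments. The template is split on
    whitespace, so the template itself must not contain paths with spaces.
    """
    def sec(ms):
        return "{:.3f}".format(ms / 1000)

    values = {
        "$in_file": in_file,
        "$out_file": out_file,
        "$begin(sec.ms)": sec(begin_ms),
        "$end(sec.ms)": sec(end_ms),
        "$duration(sec.ms)": sec(end_ms - begin_ms),
    }
    # Tokens that are not placeholders are kept as they are.
    return [values.get(token, token) for token in template.split()]


TEMPLATE = ("ffmpeg.exe -i $in_file -vcodec copy -acodec copy "
            "-ss $begin(sec.ms) -t $duration(sec.ms) $out_file")
command = build_clip_command(TEMPLATE, "in.mp4", "clip.mp4", 4400, 4600)
```

The resulting argument list could be passed to subprocess.run to test an execution line outside ELAN.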
To clip a media file first make a time selection and choose File > Export As > Media Clip using Script.... A dialog will appear in which you can set the file name and the location to save the clipped file to. You can specify more options for clipping in the Preferences dialog, see Editing preferences.
If you have multiple media files to be clipped, typing a file name with an extension in the 'Save as' dialog will use that extension for all the clipped files. If you want the clipped files to keep the extension of the original media files, do not type an extension after the file name in the 'Save as' dialog.
To export an image from the ELAN window (i.e. to make a screenshot):
(*.jpg, *.jpeg, *.png or *.bmp)
If you are using Windows, it sometimes happens that ELAN’s video window is black in the picture created using this function. This can be solved by temporarily disabling the hardware video acceleration:
To export a Filmstrip Image first select the time segment you want the filmstrip of. Then click File > Export As > Filmstrip Image.... In the dialog window (see Exporting to a filmstrip image) you can define the width of each video frame, which frames to include and whether ELAN must add a time code in each frame. Moreover, ELAN can add the waveform, with or without a ruler, and specify the height. You can also specify whether the stereo channel should be displayed separately or merged or blended. Click on OK to generate the image. Finally select a destination folder, enter a file name and click on Save.
An example of an exported filmstrip image can be seen in An exported filmstrip image.
This option allows you to save an image of a graphical representation of the density of annotations on selected tiers. This is the same functionality, with the same customization options, as in View > Annotation Density Plot... (see Annotation Density Plot).
All Shoebox files that were imported into ELAN (see Shoebox file) can be exported back into Shoebox. In this case, the time code information is kept.
To export a file into Shoebox, do the following:
The Shoebox Export dialog box appears. Make a choice and click on OK to continue.
Figure 1.55. Shoebox Export dialog window
Figure 1.56. Name and directory of exported file
The file is exported as a *.txt | *.sht | *.tbt file.
If there already exists a file of the same name, ELAN will ask you whether or not it should overwrite the existing file, e.g.:
Figure 1.57. File Exists
It contains the following information:
Each ELAN parent annotation (including all its referring annotations) corresponds to one Shoebox record. E.g., in the illustration below, the ELAN parent annotation “Ligya-001” corresponds to the Shoebox record “Ligya-001”.
Each ELAN parent annotation (i.e., each Shoebox record) contains the additional field markers \ELANBegin and \ELANEnd (i.e., the begin and end time of the parent annotation).
This time code information allows you to import the Shoebox file back into ELAN, without having to manually re-align the file (see Shoebox file).
Figure 1.58. ELAN file and exported file
ELAN supports importing files from:
There are also options in ELAN available to import multiple files at once. More details regarding these options can be found here: Multiple file import options
ELAN supports the import of documents from Toolbox, allowing you to link transcribed and/or interlinearized documents to the time axis of media files. In order to import from Toolbox, you need at least the following two files:
a Toolbox file (*.txt, *.sht, *.tbt);
a media file (*.mpg, *.mov, *.wav etc.);
a Toolbox database type file (*.typ). If this is not available, one has to provide a list with field markers (= tier names).
If you do not know the Toolbox database type file, do the following:
Open the *.txt | *.sht | *.tbt file in Toolbox. Make sure it is the active window (click on it to activate it).
Figure 1.59. Database type properties dialog window
To import a Toolbox file into ELAN, do the following:
Figure 1.60. Import Toolbox file
*.eaf documents, the Toolbox file and the media file(s) do not necessarily need to have the same name, and they do not need to be in the same directory (see Basic Information).
If the Toolbox file contains both aligned (i.e. containing time information) and non-aligned records, the aligned ones will maintain the timing, whereas the location of the non-aligned records will be interpolated automatically.
Instead of using a Toolbox *.txt | *.sht | *.tbt file, there is also an option in ELAN to define the field markers yourself when importing a Toolbox file.
Figure 1.61. Set Shoebox/Toolbox field markers
*.txt | *.sht | *.tbt file
Some markers are already 'built-in' in ELAN and do not need to be set: ELANBegin, ELANParticipant, ELANEnd.
Once you have manually created a set of field markers, you might want to reuse them later on. ELAN provides support for this:
Figure 1.62. Store markers
Once the import has succeeded, you can add a reference to a media file via the Edit > Linked Files… menu, as described in Changing the linked media files. If the imported Toolbox file was exported from ELAN before, you won’t need to establish the link to the media file(s) again, as in that case the location information is stored in the file.
ELAN imports Toolbox files according to the following conventions:
This addition is necessary because ELAN and Toolbox differ in how they code information about multiple speakers:
Figure 1.63. Toolbox field markers and ELAN tiers
When importing texts by multiple speakers, ELAN splits each Toolbox field into several ELAN tiers (one for each speaker) and adds the speaker-ID to the tier label.
If speaker information is not specified in the Toolbox file, the extension @unknown is added.
The following screenshot illustrates how ELAN treats texts by multiple speakers:
Figure 1.64. Multiple speakers in ELAN
Note that ELAN can only read speaker information if:
When the file is exported back to Toolbox (see Toolbox file(UTF-8)), the extension @‘Speaker-ID’ is automatically dropped from the field marker, and the Toolbox records are sorted according to their record marker (e.g., in the above illustration, “test 001” is sorted before “test 002” etc.)
Figure 1.65. Time Subdivision
Figure 1.66. Fixed time intervals
The time alignment has to be done manually for each Toolbox record. Do the following:
If you do not activate the Bulldozer mode, you will inadvertently overwrite and thereby delete existing annotations. Make sure that Bulldozer Mode is enabled in the Options > Propagate Time Changes menu.
The parent annotation (together with all its referring annotations) is assigned to the new time interval. All other parent annotations are moved to the right.
Figure 1.67. Time alignment
After you have done the time-alignment, you can export the file back to Toolbox – in this case, the time code information will be kept (see Toolbox file(UTF-8)). If you then re-import the file back into ELAN, ELAN automatically assigns the Shoebox records to their correct time intervals.
An imported Toolbox file can be saved as an ELAN file (see Re-open recently accessed files), exported back into Shoebox (see Toolbox file(UTF-8)), or exported as a tab-delimited text (see Tab-delimited text file).
ELAN can import documents from the SIL FieldWorks Language Explorer (FLEx). This involves a few steps:
.flextext file and relevant media files by clicking the ...-buttons.
.flextext file exported from FLEx. Optionally also add media files here (if not already in your .flextext file). There are several options to customize the import:
Include "interlinear-text" element: the top-level interlinear-text element can contain a title and other information. If selected, these will be converted to one or more tiers.
Include "paragraph" element: a text may contain multiple paragraph elements, each containing one or more phrases. If selected, this option allows you to ignore the paragraph layer when importing.
Import participant information from "Note" field: if the FLEx file contains a note item type containing the name or code of the participant/speaker, this option stores it in a tier's participant attribute.
Smallest time-alignable element: when the word element is selected here, the time alignment for that level will be lost when exported again from ELAN to FLEx. In .flextext, time alignment is stored on the phrase level.
Specify the top-level "phrase" item type: by default the <item type="txt">...</item> child element is converted to the parent tier of each level. Here it is possible to specify an alternative for the phrase level, e.g. segnum.
Use the "speaker" attribute as tier prefix: by default tier hierarchies for different speakers in the file are prefixed in ELAN with A, B, C etc. This option allows you to specify that the contents of the speaker attribute of a phrase element should be used instead. Note that this can interfere with the conversion, the encoding and decoding, from tier name to .flextext element.
phrase, word, morph etc.) or, more fine-grained, for each combination of major element plus item type up to a combination of major element, the type and the language.
phrase element in milliseconds. This has to be set if the FLEx export files do not contain timestamps. When importing a FLEx file that was edited in ELAN before and exported as a .flextext file, time duration information has already been stored in the file.
Figure 1.68. Import FLEx file
The tier structure created after import in ELAN is roughly like in the example above. The mapping of the FLEx structure onto ELAN tiers follows the schema: <Speaker>_<element>-<item-type>-<language> Where the Speaker prefix is a generic label (A, B, C, ...).
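The naming schema can be decoded with a small parser. The sketch below is an illustration only: the function name and the example tier names are hypothetical, and the list of element names is taken from the simplified .flextext structure, so names that ELAN generates in practice may not all fit this pattern.

```python
# Element names from the simplified .flextext structure. A fixed list is
# needed because "interlinear-text" itself contains a hyphen.
ELEMENTS = ("interlinear-text", "paragraph", "phrase", "word", "morph")


def parse_flex_tier_name(name):
    """Decode <Speaker>_<element>-<item-type>-<language> into its parts."""
    speaker, sep, rest = name.partition("_")
    if not sep:
        raise ValueError("no speaker prefix in tier name: " + name)
    for element in ELEMENTS:
        if rest == element or rest.startswith(element + "-"):
            remainder = rest[len(element) + 1:]
            break
    else:
        raise ValueError("unknown element in tier name: " + name)
    # The language part may itself contain hyphens, so split only once.
    item_type, _, language = remainder.partition("-")
    return {
        "speaker": speaker,
        "element": element,
        "item_type": item_type or None,
        "language": language or None,
    }
```

For example, A_word-txt-en is decoded as speaker A, element word, item type txt and language en.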
FLEx tiers and their representation in .flextext:
FLEx tier | Element | Item
Word | <word> | <item type="txt">
Morphemes | <morph> | <item type="txt">
Lex. Entries | <morph> | <item type="cf">
 | <morph> | <item type="hn">
Lex. Gloss | <morph> | <item type="gls">
Lex. Gram. | <morph> | <item type="msa">
Word Gloss | <word> | <item type="gls">
Word Cat. | <word> | <item type="pos">
On the third-party resources page of ELAN (https://archive.mpi.nl/tla/elan/thirdparty ), you can find a workflow description covering importing from FLEx to ELAN and back to FLEx.
It is possible to import CHAT files (used in, e.g., the CHILDES project) into ELAN:
Some remarks about this import feature:
Remaining issues:
The feature to import Transcriber annotation files into ELAN works as follows:
*.trs) and click on Open
The Transcriber tiers will be mapped on the ELAN equivalents:
A CSV (Comma Separated Values) or Tab-delimited Text (or Tab Separated Values) file is a text file in which one can identify rows and columns. Rows are represented by the lines in the file and the columns are created by separating the values on each line by a specific character, like a comma or a tab. CSV or Tab-delimited Text files can be compared to spreadsheets like the ones in Microsoft Excel in that they also have rows and columns. Note that .csv files can be created by Excel.
Take a look at the figure Tab-delimited Text. The first row represents the event of a person saying 'so from here'. The first value (as well as the first column of the complete file) represents the tier name, the second and third represent the begin time in different formats, the fourth and fifth represent the end time, the sixth and seventh represent the duration and the last value represents the annotation.
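As an illustration of this column layout, the following Python sketch parses one such row. The time values in the sample row are invented for the example (only the tier layout and the annotation text come from the description above), and the function names are hypothetical.

```python
import csv


def timestamp_to_ms(stamp: str) -> int:
    """Convert an hh:mm:ss.ms time string to milliseconds."""
    hours, minutes, seconds = stamp.split(":")
    whole, _, fraction = seconds.partition(".")
    millis = int((fraction + "000")[:3])  # pad or cut the fraction to 3 digits
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def parse_row(line: str) -> dict:
    """Parse one row: tier, begin (2 formats), end (2 formats), duration (2 formats), text."""
    fields = next(csv.reader([line], delimiter="\t"))
    return {
        "tier": fields[0],
        "begin_ms": timestamp_to_ms(fields[1]),
        "end_ms": timestamp_to_ms(fields[3]),
        "text": fields[7],
    }


# Hypothetical sample row; the times are made up for this sketch.
row = "K-Spch\t00:00:00.780\t0.78\t00:00:01.570\t1.57\t00:00:00.790\t0.79\tso from here"
print(parse_row(row))
```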
You are able to import CSV or Tab-delimited Text files in ELAN: File > Import > CSV / Tab-delimited Text File.... In the dialog window browse to and select a file that contains CSV or Tab-delimited data and click Open.
The second dialog window contains two sections (see Import CSV / Tab-delimited Text). The upper section shows a sample table containing data from the selected file. Both rows and columns are numbered. The lower section enables you to specify which columns to include and what data type they represent. This means that the format of the files is flexible: it is not prescribed what data is expected nor how it is formatted. The numbers of the columns in the Import Options section correspond to the numbers of the columns in the sample table. The data types you can select are:
The option Specify first row of data enables you to exclude a header by excluding the first few lines. The option Specify delimiter lets you specify the delimiter if ELAN did not guess the correct delimiter. The delimiters supported by ELAN are comma, tab, colon, semi-colon and the vertical line (vertical bar).
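To see why a guessed delimiter can be wrong, consider the following Python sketch. It is not ELAN's detection algorithm, which is not described here; it is a simple, hypothetical heuristic that picks the candidate occurring the same non-zero number of times on every line.

```python
# The five delimiters listed above.
CANDIDATES = [",", "\t", ":", ";", "|"]


def guess_delimiter(lines):
    """Return the candidate with the highest count that is identical on all lines."""
    best, best_count = None, 0
    for delimiter in CANDIDATES:
        counts = {line.count(delimiter) for line in lines if line.strip()}
        if len(counts) == 1:  # same number of occurrences on every line
            count = counts.pop()
            if count > best_count:
                best, best_count = delimiter, count
    return best


plain = ["K-Spch\t780\t1570\tso from here", "W-Spch\t1600\t2100\tokay"]
stamped = [
    "K-Spch\t00:00:00.780\t00:00:01.570\tso",
    "W-Spch\t00:00:01.600\t00:00:02.100\tok",
]
print(repr(guess_delimiter(plain)))
print(repr(guess_delimiter(stamped)))
```

For the second sample this heuristic picks the colon, because the time stamps contain more colons per line than there are tabs. Cases like this are the reason a manual delimiter setting is useful.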
If you enable the option Default annotation duration, ELAN creates all annotations from the selected file with a duration equal to the specified number of milliseconds. This option only has an effect if the file contains no time data, or only begin times or only end times.
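The effect of a default duration can be sketched as follows in Python. The function is hypothetical. In particular, placing annotations without any time data directly after the previous annotation is an assumption made for this example, not documented ELAN behaviour.

```python
def resolve_times(begin_ms, end_ms, default_duration_ms, previous_end_ms=0):
    """Fill in missing begin/end times using a default duration (all values in ms)."""
    if begin_ms is not None and end_ms is not None:
        return begin_ms, end_ms  # complete time data: the default is not used
    if begin_ms is not None:
        return begin_ms, begin_ms + default_duration_ms
    if end_ms is not None:
        return max(0, end_ms - default_duration_ms), end_ms
    # Assumption: without any time data, annotations follow each other.
    return previous_end_ms, previous_end_ms + default_duration_ms


print(resolve_times(780, None, 500))
print(resolve_times(None, 1570, 500))
print(resolve_times(None, None, 500, previous_end_ms=1570))
```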
Skip empty cells leaves out the cells in the CSV file that are empty. With this option, different tiers can be imported with different segmentations.
Combine with template (.etf) allows you to import annotations into tiers defined in a template, as described in more detail below (Import tab-delimited text in combination with a template).
Finally, click OK to import the data. If a transcription document was open when starting the import, the imported tiers and annotations will be added to the already open document; otherwise a new transcription document is created with the imported annotations as its contents.
To demonstrate that the format of the imported file can be flexible, take a look at the following tab-delimited text:
Figure 1.72. Tab-delimited text, different orientation
Figure 1.73. Import CSV / Tab-delimited Text
In this example, the Specify first row of data option is set to 2. As a consequence ELAN starts importing annotations from row 2 instead of row 1. Furthermore, ELAN tries to extract tier names from the first line of the file if the column they are part of is specified as 'annotation'. In this example, this results in two tiers: K-Spch and W-Spch.
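A minimal Python sketch of this column-per-tier layout is given below. The sample data and the function are hypothetical (the content of the figure is not reproduced here). The sketch assumes that the first two columns hold begin and end times in milliseconds and that all remaining columns are annotation columns whose header cell is the tier name.

```python
import csv
import io


def import_wide(text, first_data_row=2, begin_col=0, end_col=1, skip_empty=True):
    """Read a tab-delimited table with one annotation column per tier."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = rows[0]
    time_cols = (begin_col, end_col)
    tiers = {name: [] for i, name in enumerate(header) if i not in time_cols}
    for row in rows[first_data_row - 1:]:  # row numbers are 1-based
        begin, end = int(row[begin_col]), int(row[end_col])
        for i, name in enumerate(header):
            if i in time_cols:
                continue
            cell = row[i] if i < len(row) else ""
            if skip_empty and not cell.strip():
                continue  # no annotation on this tier for this row
            tiers[name].append((begin, end, cell))
    return tiers


# Hypothetical sample data with a header row.
sample = (
    "Begin\tEnd\tK-Spch\tW-Spch\n"
    "780\t1570\tso from here\t\n"
    "1600\t2100\t\tokay\n"
)
print(import_wide(sample))
```

With skip_empty set to False, every row would produce an annotation on every tier, including empty ones, so both tiers would end up with the same segmentation.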
To merge a CSV file with an existing *.eaf file, open the *.eaf file first and then choose File > Import > CSV / Tab-delimited Text File.... For information on merging a CSV file that has been imported into a new document with an existing *.eaf file, please see Merging transcriptions.
Import with a template:
When the Combine with template (.etf) checkbox is selected, the Select... button lets you browse to an ELAN template file (*.etf). The import function will then use the tiers, tier types etc. from the template as the basis for the new transcription and add the imported annotations to those tiers. The matching requires that tier names in the template are exactly the same as the names in the tier column (as in the first screenshot) or in the column headers of the delimited text file (as in row 1 of the sample tables in the second and third screenshot).
The import function will try to apply constraints as defined by the tiers and types in the template, but success is not guaranteed. Especially if the template defines many levels of tier dependencies, proper import might fail. Depending on the structure of the delimited text file, the Skip empty cells option may have to be selected or deselected for a successful import.
This option is still experimental.
It is possible to import subtitles that are stored in the SubRip *.srt format: File > Import > Subtitle / Audacity Label File.... HTML and similar formatting tags are filtered out and multiple speakers are merged into one. The correct encoding of the file has to be specified in the import window.
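The general structure of a SubRip file (a sequence number, a time line of the form begin --> end, and one or more text lines per cue) can be read with a short Python sketch such as the one below. This is an illustration only, not ELAN's importer; the function names and the simple tag-stripping pattern are assumptions of this example.

```python
import re

TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")
TAG = re.compile(r"</?[^>]+>")  # simple pattern for HTML-like tags such as <i> or </font>


def to_ms(stamp: str) -> int:
    """Convert an SRT time stamp (hh:mm:ss,mmm) to milliseconds."""
    h, m, s, ms = (int(g) for g in TIME.match(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text: str):
    """Return a list of (begin_ms, end_ms, text) tuples, with tags removed."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                begin, end = line.split("-->")
                body = " ".join(lines[i + 1:])
                cues.append((to_ms(begin), to_ms(end), TAG.sub("", body).strip()))
                break
    return cues


srt = (
    "1\n00:00:00,780 --> 00:00:01,570\n<i>so from here</i>\n\n"
    "2\n00:00:01,600 --> 00:00:02,100\nokay\n"
)
print(parse_srt(srt))
```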
Audacity Label files are a specific kind of tab-delimited text (*.txt) files. They can be imported here without the configuration step that is part of the general Import CSV/Tab-delimited Text File import.
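An Audacity label track is typically exported as one label per line, with the begin time and end time in seconds followed by the label text, separated by tabs. Assuming that layout, a minimal Python sketch of reading such a file could look like this (the function is hypothetical and simply skips lines that do not start with two numbers):

```python
def parse_audacity_labels(text: str):
    """Return a list of (begin_ms, end_ms, label) tuples from Audacity label text."""
    labels = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        try:
            begin = round(float(parts[0]) * 1000)
            end = round(float(parts[1]) * 1000)
        except ValueError:
            continue  # not a label line with two time values
        labels.append((begin, end, parts[2] if len(parts) > 2 else ""))
    return labels


sample_labels = "0.780000\t1.570000\tso from here\n1.600000\t1.600000\tclick\n"
print(parse_audacity_labels(sample_labels))
```

Because the column meaning is fixed, no column configuration is needed for this kind of file.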
If this import is started when a document is already open, the imported content is added to that transcription. Otherwise a new transcription document is created.
ELAN offers the possibility to import a Praat .TextGrid file: click on File > Import > Praat TextGrid File.... In the dialog window that now appears, you can browse to the file you wish to import. You can also choose to Include Praat PointTiers. When selecting this option, specify the Default PointTiers annotation duration in milliseconds. Check Skip empty intervals / annotations if empty intervals should not be imported as annotations. Finally, if you tick the Attempt to auto detect .wav file checkbox, an audio file with the same name as the TextGrid file will be added to the linked files, if it is found.
If there is already an annotation document opened in ELAN, the imported TextGrid is added to the document in one or more new tiers. If there is no annotation document opened, a new document consisting of the TextGrid data is generated.
In addition to TextGrid files in the default encoding for the operating system, ELAN supports Praat TextGrid files with UTF-8 and UTF-16 encoding.
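How ELAN decides between these encodings is not described here. As an illustration, one common approach is to look for a byte order mark (BOM) at the start of the file and fall back to a default encoding otherwise. The Python sketch below follows that approach; the function name and the fallback logic are assumptions of this example.

```python
import codecs
import locale


def detect_textgrid_encoding(raw: bytes, fallback: str = "") -> str:
    """Guess the encoding of TextGrid bytes from a BOM, else try UTF-8, else fall back."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: use the operating system's preferred encoding as default.
        return fallback or locale.getpreferredencoding(False)


print(detect_textgrid_encoding('File type = "ooTextFile"'.encode("utf-16")))
print(detect_textgrid_encoding(b"caf\xe9", fallback="latin-1"))
```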
It is possible to import a WebAnnotation JSON file via File > Import > WebAnnotation JSON File.... The file extension is .json or .jsonld. There are no configuration options. The contents of the file should comply with the W3C Web Annotation Data Model specifications, even though the import function only supports a subset of those specifications (those elements that map quite naturally to ELAN elements).
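To give an impression of the kind of data involved, the Python sketch below reads a single annotation that follows the W3C Web Annotation Data Model, with a textual body and a media fragment selector for the time interval. Whether ELAN supports exactly this shape is an assumption; the sample document, its example.org URLs and the function name are made up for illustration.

```python
import json

# Hypothetical sample annotation following the W3C Web Annotation Data Model.
SAMPLE = """
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "http://example.org/anno1",
  "type": "Annotation",
  "body": {"type": "TextualBody", "value": "so from here"},
  "target": {
    "source": "http://example.org/video.mp4",
    "selector": {
      "type": "FragmentSelector",
      "conformsTo": "http://www.w3.org/TR/media-frags/",
      "value": "t=0.78,1.57"
    }
  }
}
"""


def annotation_from_web_annotation(document: str) -> dict:
    """Extract text and begin/end time (ms) from one Web Annotation JSON document."""
    data = json.loads(document)
    fragment = data["target"]["selector"]["value"]  # e.g. "t=0.78,1.57"
    begin_s, end_s = fragment[len("t="):].split(",")
    return {
        "text": data["body"]["value"],
        "begin_ms": round(float(begin_s) * 1000),
        "end_ms": round(float(end_s) * 1000),
    }


print(annotation_from_web_annotation(SAMPLE))
```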
Importing tiers from recognizers adds the tiers to the currently open file, or to a new file if no file is currently open in ELAN. To import the tiers, go to File > Import > Tiers from Recognizer.... Selecting this option first prompts for the import file. If no file is open, the tiers are imported directly into a new file. If a file is already open, a 'Create tiers from segments' dialog appears. For more information about this dialog see Silence Recognizer.
Importing a document from Shoebox is very much the same as importing a document from Toolbox (see Toolbox file). As with the Toolbox import, information about the tier relations can be provided by means of a .typ file or by using a marker file.
When reconstructing the vertical alignment of words on interlinearized markers, the position is recalculated based on the number of bytes per character. In some files this leads to incorrect alignment; in that case the recalculation can be turned off by unchecking Correct alignment based on the number of bytes per character. This import also tries to take non-spacing characters into account.
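The reason such a recalculation can be needed is that a position counted in bytes differs from a position counted in characters as soon as a line contains multi-byte characters. The Python sketch below illustrates the difference for UTF-8. It is not ELAN's implementation, and the function name and sample line are hypothetical.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte position within a line to a character position."""
    raw = line.encode(encoding)
    # errors="ignore" drops a trailing partial character if the offset
    # happens to fall in the middle of a multi-byte sequence.
    return len(raw[:byte_offset].decode(encoding, errors="ignore"))


line = "ñandú corre"
# In UTF-8, "ñ" and "ú" take two bytes each, so the second word starts
# at byte 8 but at character 6.
print(byte_to_char_offset(line, 8))
```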