If you want to perform a detailed search over multiple EAF files, but the options offered by Search multiple EAF (see the section called “Searching through multiple annotation files”) are not comprehensive enough, you can use another search mode, the Structured search. It lets you restrict the search domain to certain tiers, use regular expressions, etc., while examining multiple annotation files at once.
The function can be reached via .eaf
files. The
next time you open the Structured search, it uses the last defined
search domain. The search window offers the possibility to define a
new search domain: click on
and do one of the following:
Select an existing domain from the list and click
. (Click if you want to delete the domain.)
Create a new domain:
Click
In the new dialog, click the Look in pull-down box and browse to the directory that contains the annotation files.
Double-click an annotation file (*.eaf) to select it. It now appears in the rightmost box. Alternatively, you can click the annotation file name and then click the
button.
Repeat this for every annotation file you want to include.
It is also possible to select a complete directory: all .eaf files in a selected directory will be included.
Click
to continue; otherwise click to close the dialog window without defining a domain.
If you clicked
you can save this domain: enter a name and click . If you do not want to save the domain, click .
Create a new domain from an IMDI search:
Click
Browse to and select an IMDI file that has been exported from a metadata search in the standalone IMDI Browser.
Click
.
You can save this domain: enter a name and click
. If you do not want to save the domain, click .
After you define a search domain for the first time, or when you open the Structured search with the search domain from the previous session, the search window opens.
The search window has three tabs offering different kinds of search: Substring Search, Single Layer Search and Multiple Layer Search.
The Substring Search tab offers the simplest search. It only asks for a search string. After entering the search string, click
(or press Enter) to start the search. The result window shows the tokens that contain the search string, together with some context tokens printed in italics. By default, the context consists of three tokens on both sides. When there are more hits than the window can show at once, use the
and buttons that appear above the list of hits to go back or forward one page. To view an annotation in the timeline viewer of the main window, simply double-click it.
For further investigation of the results, the search window offers a context menu that lets you view the results in other ways and save them. To open the context menu, right-click one of the results. The menu has the following options:
: clicking this option shows both frequency and relative frequency (as a percentage) of the tokens found. The relative frequency is relative to the number of hits.
: clicking this option shows the transcription in the timeline viewer, similar to double-clicking an annotation.
: this option makes ELAN show information about a token in an info balloon, which appears when the mouse pointer hovers over the token. The balloon contains:
Transcription file
Tier name
Tier type
Participant
Position in tier
Begin time
End time
Duration
: this option offers a submenu that lets you decrease and increase the context size of the results. The minimum size is 0 tokens and the maximum size is 8 tokens.
: click this option to change the font and font size of the results.
: when clicking this option, you will be asked to select a directory and enter a filename. The result is a file that contains the following information per token found:
Annotation: the annotation token containing the search string.
HitPositionInAnnotation: the position of the first character of the search string in the annotation.
HitLength: the number of characters in the hit.
HitNumberInAnnotation: if the search string is found more than once in an annotation, this number gives the rank of the hit within the annotation.
AnnotationBeginTime: the begin time, in ms, of the annotation containing the search string.
AnnotationEndTime: the end time, in ms, of the annotation containing the search string.
HitPositionInTier: the position of the annotation in its tier.
TierName: the name of the tier containing the annotation.
TierType: the type of tier containing the annotation.
LeftContext: the left context of the annotation.
RightContext: the right context of the annotation.
TranscriptionName: the path and filename of the transcription in which the annotation is found.
: clicking this option lets you save a file that contains hit statistics. The export dialog contains the following options:
Separate hit count per hit value: if checked, there is a line of statistics for each hit value. If not checked, there is one line per file.
Include file name column.
Include file path column.
Time format: specify whether the time format should be in milliseconds (ms) or seconds and milliseconds (sec.ms).
After clicking
you can enter a filename and click to save the statistics file.
When you are in the frequency view (Figure 6.18, “Frequency View”), the context menu has the following options:
: clicking this option will show the annotation results.
: clicking this option shows the transcription in the timeline viewer similar to double clicking an annotation.
: when clicking this option, you will be asked to select a directory and enter a filename. The result is a file that contains the following information:
Annotation
Percentage
Count
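The relative frequency in the frequency view is the count of each distinct hit value divided by the total number of hits. As a rough illustration only (this is not ELAN code), the Python sketch below computes the same three columns as the frequency export, Annotation, Percentage and Count; the function name frequency_table and the rounding to two decimals are our own assumptions.

```python
from collections import Counter

def frequency_table(hits):
    """Return (annotation, percentage, count) rows, most frequent first.

    `hits` holds one matched annotation value per hit, so the percentage
    is relative to the total number of hits, as in the frequency view.
    """
    total = len(hits)
    rows = []
    for value, count in Counter(hits).most_common():
        percentage = round(100.0 * count / total, 2)
        rows.append((value, percentage, count))
    return rows

# Four hits, three of which have the value "man".
for row in frequency_table(["man", "man", "woman", "man"]):
    print(row)  # ('man', 75.0, 3) then ('woman', 25.0, 1)
```

Because the denominator is the number of hits, the percentages of all rows add up to roughly 100, apart from rounding.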
The Single Layer Search tab offers a more elaborate search than the Substring Search tab. The first difference is that the Single Layer Search tab keeps a query history. Clicking the
and buttons moves backward and forward one query, respectively.
Furthermore, the tab offers several modes to restrict the search. The first mode lets you choose the form of the results. There are three options:
: the search string is part of, or exactly matches, an annotation.
: each element of the search string (elements are separated by spaces) is part of, or exactly matches, one of several consecutive annotations.
: each element of the search string (elements are separated by spaces) is part of, or exactly matches, one of several consecutive tokens within one annotation.
The second mode offers the straightforward distinction between
and search. The third mode lets you choose whether the element selected in the first mode should contain the search string ( ), exactly match the search string ( ), or be matched using a regular expression ( ).
Finally, you can restrict the search to one tier, a tier type or a participant.
When you choose an N-gram as the form of the result, you can use two more options: a wildcard and a negation. The wildcard takes the form of a #-sign. For instance, the search string the # man with the mode would return three annotations per hit: the first annotation contains the (or exactly matches it, if the mode is chosen), the second annotation may contain anything because of the wildcard, and the third annotation contains or exactly matches man. If the mode is chosen, each hit consists of one annotation. In this annotation there is an N-gram of three tokens, where the first token contains or exactly matches the, the second may be anything, and the third contains or exactly matches man.
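To make the N-gram semantics concrete, here is a minimal Python sketch, assuming that a hit is a run of consecutive units (annotations, or tokens within one annotation) whose units match the space-separated query elements one by one, with # matching anything. The mode names substring, exact and regex and the functions element_matches and ngram_hits are hypothetical stand-ins for the three matching options described above; ELAN's own implementation may differ, for instance in how it splits annotations into tokens.

```python
import re

def element_matches(element, text, mode):
    """Compare one query element with one annotation or token."""
    if element == "#":  # wildcard: matches anything
        return True
    if mode == "substring":
        return element in text
    if mode == "exact":
        return element == text
    if mode == "regex":
        return re.search(element, text) is not None
    raise ValueError(f"unknown mode: {mode!r}")

def ngram_hits(units, query, mode="substring"):
    """Return the start index of every run of consecutive units
    (annotations or tokens) that matches the space-separated query."""
    elements = query.split()
    n = len(elements)
    return [
        start
        for start in range(len(units) - n + 1)
        if all(element_matches(e, u, mode)
               for e, u in zip(elements, units[start:start + n]))
    ]

tokens = "the old man saw the man and then the tall man".split()
print(ngram_hits(tokens, "the # man", mode="exact"))  # [0, 8]
```

Note that "the man and" at index 4 is not a hit: the wildcard occupies exactly one unit, so the third element has to match "and".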
If you want to find N-grams where a token matches anything but one string, you can use the negation operator NOT(...), filling in the string that must not be matched in place of the dots. For instance, the search string the NOT(strange) man returns 3-grams in the same way as described above, but hits where the second annotation or token matches strange are left out.
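The negation can be sketched the same way. In this hypothetical Python example, a query element of the form NOT(x) succeeds for every unit that does not match x, using either substring or exact matching. The names element_matches and ngram_hits are illustrative only, and the sketch assumes the query is split into elements on spaces, so the string inside NOT(...) should not contain spaces.

```python
import re

# NOT(...) around a search string; group 1 is the string to exclude.
NOT_ELEMENT = re.compile(r"NOT\((.*)\)")

def element_matches(element, text, exact=False):
    """Substring or exact matching, with # as wildcard and NOT(x) as negation."""
    negated = NOT_ELEMENT.fullmatch(element)
    if negated:
        # Negation: the unit must NOT match the inner search string.
        return not element_matches(negated.group(1), text, exact)
    if element == "#":  # wildcard
        return True
    return text == element if exact else element in text

def ngram_hits(units, query, exact=False):
    """Start indices of consecutive units matching the query elements."""
    elements = query.split()
    n = len(elements)
    return [
        start
        for start in range(len(units) - n + 1)
        if all(element_matches(e, u, exact)
               for e, u in zip(elements, units[start:start + n]))
    ]

tokens = "the strange man met the old man".split()
print(ngram_hits(tokens, "the # man"))             # [0, 4]
print(ngram_hits(tokens, "the NOT(strange) man"))  # [4]
```

With substring matching, NOT(old) also rejects a token such as bold, because bold contains old; with exact matching it would not.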
The Multiple Layer Search tab houses the most comprehensive search in ELAN. As in the Single Layer Search tab, a query history is kept, so you can go back and forward one query by clicking the
and buttons, respectively. The two modes / and / / are also the same as on the Single Layer Search tab. The first new element is the -button, which clears all data of a query.
The buttons
and let you restrict the minimum and maximum duration of each result. When you click one of these buttons, a dialog window appears.
Here you can enter the minimum or maximum duration as a total number of milliseconds or as hours:minutes:seconds.milliseconds. A value of 0 milliseconds or 00:00:00.000 is treated as undefined. Searching for annotations whose maximum duration is less than their minimum duration is impossible, so entering such conflicting values results in an error message saying that the combination is impossible. After you enter a valid duration, it is displayed on the corresponding button.
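The duration rules above are simple enough to sketch in Python. This is an illustration under our own assumptions, not ELAN's code: the hypothetical parse_duration accepts either a plain number of milliseconds or hh:mm:ss.mmm (two digits for minutes and seconds, three for milliseconds) and returns None for the undefined value 0, and check_duration_range rejects a maximum below the minimum.

```python
import re

# hours:minutes:seconds.milliseconds, e.g. 00:01:02.500
HMS = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")

def parse_duration(text):
    """Return a duration in ms, or None if it is 0 (undefined)."""
    text = text.strip()
    m = HMS.fullmatch(text)
    if m:
        hours, minutes, seconds, millis = (int(g) for g in m.groups())
        total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    elif text.isdecimal():
        total = int(text)  # plain number of milliseconds
    else:
        raise ValueError(f"not a duration: {text!r}")
    return total if total > 0 else None

def check_duration_range(minimum, maximum):
    """Raise an error for the impossible case maximum < minimum."""
    if minimum is not None and maximum is not None and maximum < minimum:
        raise ValueError("impossible combination: maximum duration < minimum duration")

print(parse_duration("00:01:02.500"))  # 62500
print(parse_duration("00:00:00.000"))  # None
```

An equal minimum and maximum is accepted here, since only a maximum that is strictly smaller than the minimum is described as impossible.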
The buttons
and give a dialog similar to that of the previous two buttons. They let you restrict the annotations in the result to those that begin after a certain time and end before a certain time. Entering a Begin After time that is later than the End Before time results in an error message saying that this is impossible. After you enter a valid time, it is displayed on the corresponding button.
Beneath the buttons discussed above, you will find a table consisting of white and green fields. Search strings are entered in the white fields, while a green field between two non-empty white fields must contain a constraint. The fields on one row give the search strings and constraints to be matched by annotations on one tier. Having three rows in the query table therefore means that the search engine may find annotations on three tiers as one hit. Furthermore, you can restrict the search for each row to one tier or one tier type by choosing the appropriate option in the pull-down menu to the right of that row.
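A sketch of the Begin After / End Before restriction in Python might look as follows. The Annotation class and within_time_window function are hypothetical, and the text does not say whether ELAN treats the two times as inclusive or exclusive bounds; this sketch assumes inclusive bounds.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Annotation:
    value: str
    begin: int  # ms
    end: int    # ms

def within_time_window(annotations: List[Annotation],
                       begin_after: Optional[int] = None,
                       end_before: Optional[int] = None) -> List[Annotation]:
    """Keep annotations that begin at or after begin_after and end at or
    before end_before (None means no restriction). Inclusive bounds are
    an assumption of this sketch."""
    if begin_after is not None and end_before is not None and begin_after > end_before:
        raise ValueError("impossible: Begin After is later than End Before")
    return [
        a for a in annotations
        if (begin_after is None or a.begin >= begin_after)
        and (end_before is None or a.end <= end_before)
    ]

tier = [Annotation("a", 0, 500), Annotation("b", 400, 1200), Annotation("c", 1300, 2000)]
print([a.value for a in within_time_window(tier, begin_after=300, end_before=1500)])  # ['b']
```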
Let us first take a look at search strings and constraints in one row. If you enter two search strings in two white fields separated by a green field, you must fill in that green field, i.e., specify a constraint. Right-clicking the green field opens a context menu offering the following constraints:
: between the annotations containing the two search strings, there must be exactly N annotations.
: between the annotations containing the two search strings, there must be more than N annotations.
: between the annotations containing the two search strings, there must be less than N annotations.
: between the annotations containing the two search strings, there must be exactly X milliseconds.
: between the annotations containing the two search strings, there must be more than X milliseconds.
: between the annotations containing the two search strings, there must be less than X milliseconds.
: there are no constraints.
: clear the current constraint.
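For the constraints within one row, the two quantities involved are the number of annotations between the two matches and the time between them. The Python sketch below assumes that the time is measured from the end of the first annotation to the begin of the second, and that both annotations come from the same tier in time order; this is our reading of the text, and constraint_holds is a hypothetical helper, not part of ELAN.

```python
import operator
from dataclasses import dataclass

@dataclass
class Annotation:
    value: str
    begin: int  # ms
    end: int    # ms

COMPARE = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def constraint_holds(tier, i, j, kind, op, amount):
    """Check a constraint between tier[i] and tier[j] (i < j), where tier
    is one tier's annotations in time order.

    kind "annotations": number of annotations strictly between the two.
    kind "ms": gap from the end of tier[i] to the begin of tier[j].
    """
    if not i < j:
        raise ValueError("expected the left search string to match first")
    if kind == "annotations":
        measured = j - i - 1
    elif kind == "ms":
        measured = tier[j].begin - tier[i].end
    else:
        raise ValueError(f"unknown constraint kind: {kind!r}")
    return COMPARE[op](measured, amount)

tier = [Annotation("a", 0, 100), Annotation("b", 150, 300),
        Annotation("c", 300, 400), Annotation("d", 900, 1000)]
print(constraint_holds(tier, 0, 3, "annotations", "=", 2))  # True
print(constraint_holds(tier, 0, 3, "ms", ">", 500))          # True (gap is 800 ms)
```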
When you click on
and there is an empty constraint between two non-empty search string fields, you will get an error message. You will also get an error message if there is an empty search string field and constraint fields between two non-empty search string fields.
As we saw earlier, the search mechanism on this tab lets you construct a query for up to three tiers. Besides constraints between annotations on one tier, you can also apply constraints between annotations on different tiers. This means that once the search engine has found an annotation that matches a search string on one tier, it checks whether the search string of another row can be matched on another tier, taking into account the constraint between the two search strings.
The top-down order of the rows in the query table does not reflect the hierarchy of the tiers in your data. This means, for instance, that the search strings and constraints in the upper row of the query table may be matched on a child tier of the tier that matches the search strings and constraints in the middle row.
Right clicking the green field between two search strings gives a context menu with the following constraints:
Fully aligned: the begin time and end time of both annotations are the same.
Overlap: the two annotations overlap at least partly. This includes the other options Fully aligned, Left overlap, Right overlap, Surrounding and Within.
Left overlap: the begin time and end time of the annotation matching the lower search string lie before the begin time and end time, respectively, of the annotation matching the upper search string.
Right overlap: the begin time and end time of the annotation matching the lower search string lie after the begin time and end time, respectively, of the annotation matching the upper search string.
Surrounding: the begin time of the annotation matching the lower search string lies before the begin time of the annotation matching the upper search string, and the end time of the annotation matching the lower search string lies after the end time of the annotation matching the upper search string.
Within: the begin time of the annotation matching the lower search string lies after the begin time of the annotation matching the upper search string, and the end time of the annotation matching the lower search string lies before the end time of the annotation matching the upper search string.
: the begin time of the annotation matching one search string lies after the end time of the annotation matching the other search string.
: the begin time of the annotation matching the upper search string must lie exactly X milliseconds before the begin time of the annotation matching the lower search string.
: the begin time of the annotation matching the upper search string must lie less than X milliseconds before the begin time of the annotation matching the lower search string.
: the begin time of the annotation matching the upper search string must lie more than X milliseconds before the begin time of the annotation matching the lower search string.
: the begin time of the annotation matching the upper search string must lie exactly X milliseconds before the end time of the annotation matching the lower search string.
: the begin time of the annotation matching the upper search string must lie less than X milliseconds before the end time of the annotation matching the lower search string.
: the begin time of the annotation matching the upper search string must lie more than X milliseconds before the end time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie exactly X milliseconds before the begin time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie less than X milliseconds before the begin time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie more than X milliseconds before the begin time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie exactly X milliseconds before the end time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie less than X milliseconds before the end time of the annotation matching the lower search string.
: the end time of the annotation matching the upper search string must lie more than X milliseconds before the end time of the annotation matching the lower search string.
: there are no constraints.
: clear the current constraint.
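The relations and distance constraints above come down to comparisons of begin and end times. In this hedged Python sketch, relation_holds and distance_holds are hypothetical names. We additionally require Left overlap and Right overlap to overlap at all, since the text lists them as kinds of Overlap, and we read "less than X milliseconds before" as a difference of at least 0 and less than X. ELAN's handling of equal or touching times may differ.

```python
def relation_holds(relation, upper, lower):
    """upper and lower are (begin, end) pairs in ms for the annotations
    matched by the upper and lower search strings."""
    ub, ue = upper
    lb, le = lower
    overlap = lb < ue and ub < le  # they share some stretch of time
    checks = {
        "fully aligned": lb == ub and le == ue,
        "overlap": overlap,
        "left overlap": overlap and lb < ub and le < ue,
        "right overlap": overlap and lb > ub and le > ue,
        "surrounding": lb < ub and le > ue,
        "within": lb > ub and le < ue,
        # Touching annotations (lb == ue or ub == le) fall in neither
        # "overlap" nor "no overlap" in this sketch.
        "no overlap": lb > ue or ub > le,
    }
    return checks[relation]

def distance_holds(upper_point, lower_point, op, x, upper, lower):
    """Constraints such as 'the begin time of the upper annotation lies
    exactly X ms before the begin time of the lower annotation'.
    upper_point and lower_point are "begin" or "end"."""
    index = {"begin": 0, "end": 1}
    diff = lower[index[lower_point]] - upper[index[upper_point]]
    if op == "=":
        return diff == x
    if op == "<":
        return 0 <= diff < x  # assumption: it must still lie before
    if op == ">":
        return diff > x
    raise ValueError(f"unknown operator: {op!r}")

upper = (1000, 2000)
print(relation_holds("within", upper, (1200, 1800)))                  # True
print(relation_holds("left overlap", upper, (500, 1500)))             # True
print(distance_holds("end", "begin", "<", 300, upper, (2100, 2500)))  # True
```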
An example of a Multiple Layer Search with constraints is shown in Figure 6.21, “Multiple Layer query”.
Because the search mechanism can search for patterns on three tiers, with up to three search strings per tier, a hit can consist of up to nine elements. Instead of presenting a hit as a table, ELAN presents it on one line, with indicators for tiers and annotations. Figure 6.21, “Multiple Layer query” shows the results of the query above. As you can see, the tiers in the result are indicated by #1, #2 and #3, corresponding to the upper, middle and lower query table rows respectively. The annotations in a tier are surrounded by vertical bars indicating their start and end.
Figure 6.21, “Multiple Layer query” also illustrates what to do if you would like to combine exact matching and substring matching in one query: use the regular expression mode. Where you want an exact match, use the ^ and $ signs to match the beginning and end of the string (e.g. ^of$); otherwise just enter a word for a substring match.
The figure also shows how to use a wildcard to match anything. Instead of the # used in the Single Layer Search, you can use the regular expression .+ to indicate any character (the dot) one or more times (the plus). See also Appendix A, REGULAR EXPRESSION SEARCH for more on regular expressions. The NOT(...) construction, on the other hand, can be used in the Multiple Layer Search in the same way as described in the section called “Single Layer Search tab”.
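The effect of the ^ and $ anchors and of .+ can be checked outside ELAN. The snippet below uses Python's re.search, which finds a match anywhere in the annotation unless the pattern is anchored, matching the behavior described above. For patterns as simple as these the results should be the same in other common regular expression dialects, but the helper regex_match is only an illustration.

```python
import re

def regex_match(pattern, annotation):
    """Regular-expression matching: the pattern may match anywhere in the
    annotation unless it is anchored with ^ and $."""
    return re.search(pattern, annotation) is not None

print(regex_match("^of$", "of"))      # True: exact match
print(regex_match("^of$", "often"))   # False: anchors rule out extra characters
print(regex_match("of", "often"))     # True: unanchored, so a substring match
print(regex_match(".+", "anything"))  # True: any character, one or more times
print(regex_match(".+", ""))          # False: .+ needs at least one character
```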
One final but no less important remark concerns the placement of more and less restrictive search strings. Figure 6.21, “Multiple Layer query” shows a very restrictive search string in the upper row: ^n$. The less restrictive, or rather unrestrictive, search string .+ is in the middle row. As we saw earlier, the order of the rows in the query does not reflect the hierarchy in the data, so the search string ^n$ could also be placed in the lower row without affecting the outcome of the search. While this is true, we advise you to place restrictive search strings in the leftmost field of the uppermost row possible, and the least restrictive search string in the rightmost field of the lowest row possible. The reason is the order in which the search engine considers the search strings in the query. If it first finds a restrictive search string, it can filter out all other possibilities, but if it first finds a less restrictive search string, it has to consider all matches of that search string. In the example of Figure 6.21, “Multiple Layer query”, if ^n$ were in the bottom row, the search engine would first consider all annotations matching .+, which is in fact every annotation in the search domain. Because of this, the search would take much more time than with ^n$ in the upper row.
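Why placement matters can be illustrated with a naive two-level search, assuming, as the text suggests, that the engine evaluates the first search string and then checks the second one for every remaining candidate. The Python sketch below is not ELAN's search engine; nested_search is a hypothetical helper that counts how many annotation pairs it examines. With the restrictive pattern first it examines 100 pairs, with the permissive one first 10,000, while both orders find the same 100 hits.

```python
import re

def nested_search(first_tier, first_pattern, second_tier, second_pattern):
    """Naive two-level search: every annotation of first_tier that matches
    first_pattern is paired with every annotation of second_tier.
    Returns the matching pairs and the number of pairs examined."""
    hits, examined = [], 0
    for a in first_tier:
        if re.search(first_pattern, a) is None:
            continue  # filtered out before looking at the second tier
        for b in second_tier:
            examined += 1
            if re.search(second_pattern, b):
                hits.append((a, b))
    return hits, examined

tier_with_n = ["n"] + ["x"] * 99  # only one annotation is exactly "n"
other_tier = ["w"] * 100          # every annotation matches .+

restrictive_first = nested_search(tier_with_n, "^n$", other_tier, ".+")
permissive_first = nested_search(other_tier, ".+", tier_with_n, "^n$")
print(restrictive_first[1], permissive_first[1])               # 100 10000
print(len(restrictive_first[0]) == len(permissive_first[0]))   # True
```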