
Index Technical Facilities


A powerful and flexible search tool was developed that handles CHAT, Shoebox, and standard text formats. It lets the user specify "patterns", which can be associated with tiers. Patterns are written as regular expressions, and these may contain variables that are expanded during execution. This is particularly important for users working with exotic languages: they can, for example, define the set of vowels formally (e.g. vowel:=[aph, uiv, o, ie, ouh, ]) and then use this set in regular expressions.
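As a rough illustration of variable expansion, the following Python sketch replaces each variable reference in a pattern with an alternation over the members of the set before compiling the regular expression. The `$vowel` reference syntax and the `expand` helper are assumptions made for this sketch; the Search tool's own notation for using a variable inside a pattern is not described here.

```python
import re

def expand(pattern, variables):
    """Replace each $name reference (hypothetical syntax) with an alternation of the set's members."""
    def substitute(match):
        members = variables[match.group(1)]
        # Try longer members first so multi-character members are not cut short.
        alternatives = sorted(members, key=len, reverse=True)
        return "(?:" + "|".join(re.escape(m) for m in alternatives) + ")"
    return re.sub(r"\$(\w+)", substitute, pattern)

# The vowel set from the text above.
variables = {"vowel": ["aph", "uiv", "o", "ie", "ouh"]}

# A word consisting of "k" followed by exactly one member of the vowel set.
expanded = expand(r"^k$vowel$", variables)
regex = re.compile(expanded)

print(expanded)
print(bool(regex.search("kouh")), bool(regex.search("kat")))
```

Defining the set once and expanding it at search time keeps the patterns short and lets the same set be reused in several expressions.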

Once such patterns have been specified, they can be combined by means of a logical language to form powerful search strings. The available operators are the basic logical operators (AND, OR, NOT) and the sequence operators (WITHIN n columns, WITHIN m blocks). The sequence operators assume that the corpus is structured accordingly: it uses a block structure (per utterance, a main tier and some dependent tiers, as in CHAT and Shoebox), and it uses a column structure across the tiers within a block, as in Shoebox. Columns can be defined by a delimiter such as a space, in which case each word forms its own column. Expressions can be grouped with brackets.
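The combination of tier patterns with logical and sequence operators can be sketched in Python as follows. Everything in this sketch is an assumption made for illustration: the tier names `tx` and `mb`, the invented corpus tokens, the function names, and in particular the reading of WITHIN n columns as "two matches at most n columns apart within one block". It does not reproduce the Search tool's actual syntax or semantics, and WITHIN m blocks is left out.

```python
import re

# One block (utterance): each tier is split on spaces, so each word is one column.
# Tier names and tokens are invented for this sketch.
block = {
    "tx": "ba kouh ti".split(),
    "mb": "N V PART".split(),
}

def pattern(tier, regex):
    """A pattern tied to a tier; yields the set of column indices where it matches."""
    def match_columns(blk):
        return {i for i, cell in enumerate(blk.get(tier, [])) if re.search(regex, cell)}
    return match_columns

def AND(p, q):
    return lambda blk: bool(p(blk)) and bool(q(blk))

def OR(p, q):
    return lambda blk: bool(p(blk)) or bool(q(blk))

def NOT(p):
    return lambda blk: not p(blk)

def WITHIN(p, q, n):
    """True if some match of p and some match of q lie at most n columns apart (assumed semantics)."""
    return lambda blk: any(abs(i - j) <= n for i in p(blk) for j in q(blk))

verb = pattern("mb", r"^V$")
noun = pattern("mb", r"^N$")
ti = pattern("tx", r"^ti$")

# Nested function calls play the role of bracketing.
query = AND(WITHIN(verb, ti, 1), NOT(pattern("tx", r"^zz$")))
print(query(block))
```

In this sketch WITHIN takes two plain patterns, since it needs column positions, while AND, OR, and NOT accept any sub-expression.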

The following example shows a typical corpus created with Shoebox. It has blocks (turns) and a column structure. In addition to the corpus fragment, patterns and a logical expression are shown as they can be entered into the Search tool. Such expressions can be run on a single file or on a long list of files. The output of the Search tool can itself be searched in subsequent runs.


The Search tool is available for all major platforms. It is intended to be integrated into MED and EUDICO so that it can also be used interactively.

Last updated: February 15, 2000 13:21
