ELAN offers several ways to interact with web services. These web services are tools or applications that run on a web server, accept one or more resources as input, apply an algorithm, and return the result as output. Some of the Recognizers in the Audio And Video Recognizers tab are available as a web service, in which case audio, video, or any other type of sequential data is uploaded to be processed online. The web services available in this menu work with text rather than multimedia content. These tools can automate certain parts of the annotation process, such as tokenizing and part-of-speech tagging. These web services can be found by clicking Options > Web Services. WebLicht is described in the next section; support for TypeCraft is still experimental.
WebLicht (Web-Based Linguistic Chaining Tool) is an execution environment developed at Tuebingen University as part of the CLARIN infrastructure. Most of the tools in this environment perform NLP (Natural Language Processing) tasks on textual data, and most of them are tailored to language data in one of the well-described and well-resourced languages. The tools are encapsulated as web services and can be combined into processing chains.
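To make use of the WebLicht service, go to Options > Web Services > WebLicht. In the dialog that opens, you can specify how to use WebLicht: whether the input is plain text or one or more tiers, and whether to apply a tool chain or a single tool.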
Then click the Next button.
The interface of the second step depends on the choice between plain text and tier(s) in the first step.
For tier input, the content type of the selected tier is either Sentence (if the annotations on the selected tier contain sentences) or Word/Token (if the annotations contain single words). There are some limitations on the tiers you can select for each type: Sentence tiers are expected to be a top-level tier or a symbolically associated dependent tier thereof; Token tiers are expected to be on a symbolic subdivision tier.
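The tier restrictions above can be checked outside ELAN before starting the wizard. The following is a minimal sketch, not ELAN's own logic: it assumes the usual .eaf layout in which each TIER element carries TIER_ID, LINGUISTIC_TYPE_REF and (for dependent tiers) PARENT_REF, and each LINGUISTIC_TYPE carries an optional CONSTRAINTS attribute. The function name classify_tiers and the sample document are hypothetical, and the rule that a symbolically associated tier must depend directly on a top-level tier is one reading of the text.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Stereotype names as written in the CONSTRAINTS attribute of an .eaf file.
SYMBOLIC_ASSOCIATION = "Symbolic_Association"
SYMBOLIC_SUBDIVISION = "Symbolic_Subdivision"

# Hypothetical, heavily reduced document used only for illustration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="translation" LINGUISTIC_TYPE_REF="assoc" PARENT_REF="utterance"/>
  <TIER TIER_ID="words" LINGUISTIC_TYPE_REF="subdiv" PARENT_REF="utterance"/>
  <TIER TIER_ID="segments" LINGUISTIC_TYPE_REF="timesub" PARENT_REF="utterance"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="default-lt"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="assoc" CONSTRAINTS="Symbolic_Association"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="subdiv" CONSTRAINTS="Symbolic_Subdivision"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="timesub" CONSTRAINTS="Time_Subdivision"/>
</ANNOTATION_DOCUMENT>"""


def classify_tiers(eaf_xml: str) -> dict[str, list[str]]:
    """Return tier IDs that look usable as Sentence or Token input."""
    root = ET.fromstring(eaf_xml)
    # Linguistic type ID -> constraint name (None for unconstrained types).
    constraints = {
        lt.get("LINGUISTIC_TYPE_ID"): lt.get("CONSTRAINTS")
        for lt in root.iter("LINGUISTIC_TYPE")
    }
    # Tier ID -> parent tier ID (None for top-level tiers).
    parents = {t.get("TIER_ID"): t.get("PARENT_REF") for t in root.iter("TIER")}

    result: dict[str, list[str]] = {"sentence": [], "token": []}
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        parent = tier.get("PARENT_REF")
        constraint = constraints.get(tier.get("LINGUISTIC_TYPE_REF"))
        parent_is_top_level = parent in parents and parents[parent] is None
        if parent is None:
            # Top-level tier: candidate for Sentence input.
            result["sentence"].append(tier_id)
        elif constraint == SYMBOLIC_ASSOCIATION and parent_is_top_level:
            # Symbolically associated child of a top-level tier (assumed reading).
            result["sentence"].append(tier_id)
        elif constraint == SYMBOLIC_SUBDIVISION:
            # Symbolic subdivision tier: candidate for Token input.
            result["token"].append(tier_id)
    return result


print(classify_tiers(SAMPLE_EAF))
```

Tiers of any other type, such as the time subdivision tier in the sample, end up in neither list and would not be expected to be selectable for either content type.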
Then click Next.
The third step depends on the choice between a tool chain and a single tool in the first step.
TCF. In the lower text field, the access key can be pasted.
For plain text in the first step, the list contains several services that detect sentence boundaries and then tokenize these sentences. Select one of the tokenizing services. If processing succeeds, the result is two tiers, one for sentences and one for tokens. If you want to add Part of Speech and/or Lemma annotations, you can use the tiers produced in this step as the input for such services (part-of-speech taggers) in a second run. There is an option to specify the duration (in ms) of each sentence.
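To make the shape of the plain-text result more concrete, here is a rough sketch of sentences and tokens laid out on a timeline with a fixed duration per sentence. It is not the algorithm used by any WebLicht service: the regular-expression splitting is deliberately naive, the function name layout_sentences is hypothetical, and placing the sentences back to back from time 0 is an assumption about how the duration option is applied.

```python
import re


def layout_sentences(text, sentence_ms=3000):
    """Split text naively and give each sentence a fixed-length interval."""
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rows = []
    for index, sentence in enumerate(sentences):
        start = index * sentence_ms  # assumed: sentences follow each other directly
        rows.append(
            {
                "start": start,
                "end": start + sentence_ms,
                "sentence": sentence,
                # Words and punctuation marks become separate tokens.
                "tokens": re.findall(r"\w+|[^\w\s]", sentence),
            }
        )
    return rows


for row in layout_sentences("Hello world. How are you?", sentence_ms=2000):
    print(row["start"], row["end"], row["tokens"])
```

Each row corresponds to one annotation on the sentence tier, and its token list corresponds to the annotations that would appear on the dependent token tier.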
For tier input, different services are available that can parse text, tag Parts of Speech, etc. Each service has a short description that specifies its function. Hovering over a service with the mouse shows a tooltip with more information about the service. If the service you are looking for is not listed, you can manually specify its URL.
After configuring a tool chain file or a single tool, click Finish to start processing. If processing is successful, you will see a dialog stating that the operation is complete. Depending on the service you selected, the tokenized sentences and/or part-of-speech tags will be added as children of the tier you selected for processing.
The recommended way to use WebLicht from within ELAN is by supplying a preconfigured tool chain file. This file can be created as follows:
chain_randomnumber_.xml, it might be practical to rename it to something more intuitive, especially if you plan to have more than one tool chain.