Recognizer: Hecate shot boundary detection
Produced By: Yahoo Research, New York
ELAN extension: TLA, Max Planck Institute for Psycholinguistics, Nijmegen

How to use Hecate and start shot boundary detection from within ELAN (locally)

Version 1.3, May 2024

Hecate (https://github.com/yahoo/hecate) is a video processing library which, in the ELAN context, is used to detect camera shot boundaries. This extension implements ELAN's Recognizer API and lets you call a local installation of Hecate to produce shot information, which the extension then converts to annotation segments. The input is one of the video files linked in the current document.

A prerequisite is that Hecate and everything it depends on (OpenCV, FFmpeg) is properly installed and works correctly when invoked from the command line. If that is the case, the Hecate extension in ELAN's user interface lets you configure a few parameters and start the detection process.
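Before starting a detection from ELAN, it can help to confirm outside ELAN that the configured command can actually be found. The Python sketch below is illustrative only and is not the extension's own code: the helper names `build_hecate_command` and `resolve_run_command` are hypothetical. It assumes the run-command, --in_video and --step parameters of this extension map onto a command line in the obvious way; the real extension may pass additional Hecate options.

```python
import shutil
from typing import List, Optional


def build_hecate_command(run_command: str, video_path: str, step: int = 1) -> List[str]:
    """Assemble an argument list from the extension's parameters (hypothetical sketch)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Only the two Hecate options named in this document are passed here;
    # the actual extension may add further options to the invocation.
    return [run_command, "--in_video", video_path, "--step", str(step)]


def resolve_run_command(run_command: str) -> Optional[str]:
    """Return the path of the executable, or None if it cannot be found."""
    # shutil.which searches PATH for bare names such as "hecate" and checks
    # names that contain a directory (e.g. a full path) directly.
    return shutil.which(run_command)


if __name__ == "__main__":
    cmd = build_hecate_command("hecate", "/data/session/video.mp4", step=2)
    print(cmd)
    if resolve_run_command(cmd[0]) is None:
        print("'hecate' was not found; try entering the full path as run-command")
```

If the bare name `hecate` cannot be resolved, that is usually a sign that the run-command parameter needs the full path to the executable.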

Extensions based on the Recognizer API are installed in a sub-folder of ELAN's extensions folder. The hecate folder contains:

The main parameters and their categories are:

Parameter type | Parameter id | Description
<input> | --in_video | ELAN automatically preselects the first video file of your current annotation session, but you can change it to any other supported file of the session.
<textparam> | run-command | By default the command is set to hecate. Depending on the platform and on how Hecate is installed and configured, this might not work, and the full path to the executable may need to be entered instead.
<textparam> | fps | The frames-per-second value of the selected video. Hecate produces video frame indices for the shot ranges. Since this recognizer has no direct access to, e.g., the video player and cannot extract the frame rate itself, the user has to provide this value so that the extension can convert the shot boundaries to milliseconds.
<numparam> | --step | The frame subsampling step size. The default value is 1, meaning all frames are considered in the detection process. With a higher value n, only every n-th frame is processed, so n-1 frames are skipped between processed frames.
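The conversion that the fps parameter makes possible amounts to milliseconds = frame index × 1000 / fps. The Python sketch below shows that arithmetic for a single shot range. It is an illustration, not the extension's code: the function name `frames_to_ms` is hypothetical, and it assumes that Hecate's end frame is inclusive, so a segment ends where the frame after it begins.

```python
from typing import Tuple


def frames_to_ms(start_frame: int, end_frame: int, fps: float) -> Tuple[int, int]:
    """Convert a frame-index range to a (begin_ms, end_ms) segment (hypothetical sketch)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    # Assumption: end_frame is inclusive, so the segment ends at the start of end_frame + 1.
    begin_ms = round(start_frame * 1000 / fps)
    end_ms = round((end_frame + 1) * 1000 / fps)
    return begin_ms, end_ms


if __name__ == "__main__":
    # fps is a text parameter, so the user's entry is parsed as a number first.
    fps = float("25")
    for start, end in [(0, 24), (25, 49)]:
        print(frames_to_ms(start, end, fps))
    # (0, 1000)
    # (1000, 2000)
```

Rounding to whole milliseconds keeps adjacent shots contiguous: the end of one segment equals the beginning of the next whenever their frame ranges are adjacent. An incorrect fps value shifts every boundary proportionally, which is why it has to match the selected video.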

The recognizer does not show progress messages while Hecate is running; the shot ranges are produced only at the very end of processing.