Archiving Workflow
A programme such as DOBES depends heavily on smooth interaction between the documentation teams and the archiving team. Given the teams' different backgrounds and preferences and the geographical distances involved, this is not always easy to achieve. However, DOBES has reached a high level of mutual understanding by working out and discussing two stereotypical workflow schemes: a central and a decentral workflow. In practice the teams apply a mixture of these workflows. It is important that all parties make their choices and processes explicit, and do so as early as possible. Below we give a number of examples of problems that can, and therefore will, occur even though everyone does their best to prevent them.
Part of the problem results from the fact that the formats the archive has to use for long-term storage differ from those used for linguistic analysis and description work.
Central Workflow
In the central workflow all recorded material (tapes, etc.) is sent to the archivist, who takes care of digitization or capturing. The archivist directly generates archival formats such as MPEG2, but also creates formats suitable for analysis, such as MPEG1 or MPEG4. The archivist processes a whole tape (if possible) and creates a DMF (Digital Master File), which is stored in the archive as MPEG2. A DVD copy is returned to the depositor so that they can carry out the linguistic analysis work, i.e. break the recording up into meaningful sessions and create the annotations. When this work is finished, the depositor sends the following back to the archivist: (1) a metadata description characterizing the sessions, including exactly where to cut, and (2) the annotations, including time markers etc. Based on the cut information, the archivist breaks the DMF up into the same sessions and integrates all the information as a bundle into the archive.
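The cut information is what ties the depositor's sessions back to the DMF, so it is worth checking before the archivist cuts anything. The sketch below assumes a hypothetical cut list in which each session is given as start and end times in seconds from the beginning of the DMF; the `Cut` class and both functions are illustrative names, not part of any DOBES tool. It flags empty or out-of-range sessions and overlapping sessions, and, assuming the annotations were made against the full-tape DVD copy, shows how a time marker would be shifted to be relative to its session.

```python
from dataclasses import dataclass


@dataclass
class Cut:
    """One session, in seconds from the start of the DMF.

    Hypothetical format for illustration; not a DOBES specification.
    """
    session: str
    start: float
    end: float


def validate_cuts(cuts: list[Cut], dmf_duration: float) -> list[str]:
    """Return human-readable problems in the cut information (empty if none)."""
    problems = []
    # Each session must be a non-empty range inside the master file.
    for c in cuts:
        if not (0 <= c.start < c.end <= dmf_duration):
            problems.append(
                f"{c.session}: {c.start}-{c.end} is empty or outside 0-{dmf_duration}"
            )
    # Sessions must not overlap once sorted by start time.
    ordered = sorted(cuts, key=lambda c: c.start)
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            problems.append(f"{a.session} overlaps {b.session}")
    return problems


def to_session_time(dmf_time: float, cut: Cut) -> float:
    """Shift a DMF-relative time marker so it is relative to the session start."""
    if not (cut.start <= dmf_time <= cut.end):
        raise ValueError(f"{dmf_time} s is not inside session {cut.session}")
    return dmf_time - cut.start
```

For example, `validate_cuts([Cut("s1", 0.0, 100.0), Cut("s2", 90.0, 200.0)], 300.0)` returns `["s1 overlaps s2"]`, and `to_session_time(150.0, Cut("s2", 120.0, 200.0))` returns `30.0`. Reporting problems as a list rather than stopping at the first one lets the archivist send the depositor everything that needs fixing in one round.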
The advantages are: fewer errors in the digitization/capturing process and better parameter control.
The major disadvantage is a slower cycle time, i.e. the researchers have to wait until they get the material back.
Decentral Workflow
Technological advances that now allow even video to be captured in the field have made the decentral workflow much more feasible and attractive for the documentation teams. The teams digitize or capture the audio/video signals immediately in the field, cut them into sessions and start the annotation process. Finally, they send all material (original tapes, metadata descriptions including cutting information, and the annotations) to the archivist. The archivist generates MPEG2 (which cannot be done in the field because it needs too much storage capacity), cuts it into sessions according to the information received, and then integrates the resulting bundle into the archive.
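The storage argument can be made concrete with a back-of-the-envelope calculation: a constant-bitrate stream needs bitrate × duration / 8 bytes. The bit rates used below (about 8 Mbit/s for MPEG2 and 1.5 Mbit/s for MPEG1) are assumptions chosen for illustration only; the source does not state the rates used in DOBES, and actual settings vary.

```python
def storage_gb(bitrate_mbit_s: float, hours: float) -> float:
    """Approximate storage in gigabytes (10**9 bytes) for a constant-bitrate stream."""
    bits = bitrate_mbit_s * 1_000_000 * hours * 3600  # Mbit/s -> bit/s, hours -> s
    return bits / 8 / 1_000_000_000  # bits -> bytes -> GB


# Assumed, illustrative bit rates (not taken from the DOBES documentation).
MPEG2_MBIT_S = 8.0
MPEG1_MBIT_S = 1.5

print(f"MPEG2, 1 h: {storage_gb(MPEG2_MBIT_S, 1):.2f} GB")
print(f"MPEG1, 1 h: {storage_gb(MPEG1_MBIT_S, 1):.2f} GB")
```

At these assumed rates one hour of MPEG2 takes about 3.6 GB against about 0.68 GB for MPEG1, which illustrates why the archival format is generated by the archivist rather than on a field laptop.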
The advantage is a much faster cycle time.
The disadvantages are that the process is more error-prone and that the fieldworker has to take care of storage management.
Problem Cases
In general the process can lead to many errors, which can only be minimized through considerable interaction between the teams and the archivist. Here we can only present a few typical examples:
- teams preferring a decentral workflow have chosen the wrong video parameters, so the video information is unusable
- the teams produce video tapes that include still photos and other interruptions so that the archivist has to process the tapes manually (continuous checks, redigitization, etc.)
- the cutting information is incorrect, leading to wrong alignments between the audio/video streams and the annotations
- the researcher uses equipment (MiniDisc) or parameters (NightShot, etc.) that reduce the quality of the captured material without being aware that the equipment causes this problem
- the teams use software to create structured textual material such as metadata, annotations or lexica that either produces non-recommended formats or does not check structural and encoding consistency, so there is no chance of automatic conversion
- the teams use character encodings other than Unicode and do not document which ones they use, which also makes it very difficult to create an archivable document
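Two of the problems above, undocumented character encodings and structurally inconsistent files, can be caught with simple automatic checks before material is sent to the archive. The sketch below checks whether a file's bytes are valid UTF-8 and, assuming the metadata or annotation files are XML-based, whether they are well-formed. The function names are hypothetical; the checks use only Python's standard library.

```python
import xml.etree.ElementTree as ET


def is_valid_utf8(data: bytes) -> bool:
    """True if the raw bytes decode as UTF-8 (one common Unicode encoding)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_well_formed_xml(data: bytes) -> bool:
    """True if the bytes parse as well-formed XML (says nothing about a schema)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```

These checks are necessary but not sufficient: text in a font-based legacy encoding that reuses ASCII code points will still pass the UTF-8 test, and a well-formed XML file may still violate the schema the archive expects. They only catch the cases where automatic conversion is clearly impossible.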
For more details we refer to the training course material.