Audio/Video Digitization and Capturing

We have to distinguish two aspects:

(1) Digitization means converting an analog representation into a digital one. A good example is the information stored on good old audio cassette tapes: the sound pressure waveform is held on the tape as a corresponding function of magnetization strength. This analog representation has to be converted into a series of numbers, since digital computers can only work with discrete numbers stored at discrete moments in time. As the graphic illustrates, at equidistant times the current value of the analog waveform is taken and stored as a number. This process of turning analog signals into a series of digital numbers is called sampling.
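The sampling step described above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 440 Hz sine standing in for the analog waveform, the 8 kHz sampling rate and the 16-bit number range are all chosen for the example and are not prescribed by this note.

```python
import math

def sample_waveform(waveform, duration_s, sample_rate_hz):
    """Take the value of waveform(t) at equidistant times t = n / sample_rate_hz."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [waveform(n / sample_rate_hz) for n in range(n_samples)]

def quantize_16bit(value):
    """Store a value in [-1.0, 1.0] as a signed 16-bit integer (assumed number format)."""
    clipped = max(-1.0, min(1.0, value))
    return int(round(clipped * 32767))

def tone(t):
    # Hypothetical stand-in for the analog signal: a 440 Hz sine wave.
    return math.sin(2 * math.pi * 440.0 * t)

# 10 ms of signal sampled at 8 kHz gives 80 numbers.
samples = [quantize_16bit(v) for v in sample_waveform(tone, 0.01, 8000)]
```

Each entry of `samples` is one stored number; the time between two entries is always 1/8000 s.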




(2) Capturing means transferring data that already exists in a digital representation into a digital representation on a computer. Many recording devices already store their data digitally, such as DAT recorders for audio or DV camcorders for video signals. In this case the processing is even simpler, since only the digital packaging (format) has to be converted. There is, however, equipment for which capturing is not possible because the manufacturers do not offer the digital information in a suitable format at an external connector. Earlier MiniDisc recorders, for example, only allowed the user to copy the stored data via an analog line. In the video sector there are currently no analog recorders anymore, i.e. all cameras digitize directly.

Digital signals have a number of advantages over analog signals: a) copying can in general be done without loss of information, which is not true for copying analog signals (3 dB degradation); b) digital circuitry is much more resistant to external influences (electromagnetic radiation etc.); c) once data is digitized, all processing is exact and reproducible.

However, when digitizing analog data one has to take care of a few phenomena that are very important for avoiding distortions:
  • One has to know the bandwidth of the signal. Every signal can be decomposed into a set of sinusoids of different frequencies, and the highest frequency needed to describe the signal in sufficient detail determines the bandwidth. For speech we know, for example, that children can produce frequencies of up to 8 kHz, so 8 kHz would in general be the bandwidth of speech signals.
  • The sampling frequency has to be chosen depending on the bandwidth. According to the Nyquist theorem one can only reconstruct a sinusoidal component if one has at least two points per period, i.e. the sampling frequency should be at least twice the bandwidth. For speech this means at least 16 kHz, so 20 kHz would already be sufficient. Since music has a much higher bandwidth, it is generally said that a sampling frequency of 44.1 or 48 kHz is sufficient (this is what most equipment uses, in accordance with the hi-fi norm).
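The two-points-per-period rule can be made concrete with a small Python sketch. The function names are hypothetical, and the folding formula is the standard textbook description of aliasing for a single sinusoid; it is added here for illustration and is not part of the original recommendations.

```python
def min_sample_rate_hz(bandwidth_hz):
    """Lowest sampling frequency allowed by the rule: twice the bandwidth."""
    return 2 * bandwidth_hz

def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sinusoid appears after sampling.

    Components above half the sampling frequency fold back to a lower
    frequency, which is the distortion that an adequate sampling rate avoids.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz)
    return abs(tone_hz - sample_rate_hz * nearest_multiple)

# Speech with 8 kHz bandwidth needs at least 16 kHz.
speech_rate = min_sample_rate_hz(8000)

# A 6 kHz component sampled at only 8 kHz shows up as a 2 kHz component.
folded = alias_frequency_hz(6000.0, 8000.0)

# A 3 kHz component sampled at 8 kHz is below 4 kHz and is preserved.
preserved = alias_frequency_hz(3000.0, 8000.0)
```

The second call shows why undersampling is harmful: the 6 kHz component is not merely lost but reappears at the wrong frequency.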

We have carried out extensive tests with different setups to support digitization in the field. A lot of hardware and software is currently on offer, but only for some products do we know that they meet the requirements. For a few we know that they do not deliver acceptable results; we will of course not mention product names here. In this note we make recommendations and ask all DOBES teams to adhere to them. This is important since we have already received digitized video material that causes us great problems in handling. Any comments are of course welcome.

Agreements

In DOBES we have agreed that MPEG2 is the archiving format. We will therefore digitize video at a constant bit rate of 6 Mbps, which produces four times as much data as the earlier MPEG1 streams. We will use MPEG1 as the default delivery format for a while, until computers, networks and software support MPEG2 without problems. Those who have special wishes should contact the archive manager (dobes@mpi.nl).
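To get a feeling for what a constant bit rate means in terms of file size, the following Python sketch converts a bit rate and a duration into gigabytes. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and ignores container overhead. The MPEG1 rate is only derived from the factor of 4 stated above, not taken from a specification.

```python
def stream_size_gb(bit_rate_mbps, duration_min):
    """Approximate size in GB of a constant-bit-rate stream."""
    bits = bit_rate_mbps * 1_000_000 * duration_min * 60
    return bits / 8 / 1_000_000_000

MPEG2_MBPS = 6.0             # archiving bit rate agreed above
MPEG1_MBPS = MPEG2_MBPS / 4  # assumption: derived from the stated factor of 4

mpeg2_hour_gb = stream_size_gb(MPEG2_MBPS, 60)  # about 2.7 GB per hour
mpeg1_hour_gb = stream_size_gb(MPEG1_MBPS, 60)  # about 0.675 GB per hour
```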

General Statements

  • Since video digitization is still not a trivial operation, we still recommend that those who want to be sure that everything works send their tapes to us. We will digitize the material and send back the DMFs.
  • Those who need to digitize in the field for whatever reason should use a DV camcorder, have a PC with an iLink (FireWire) input connector, defragment their disk, and start the DV input process using Adobe Premiere V 6.0. The capture option in Adobe Premiere should be activated, since it will inform you if video fragments were lost and stop the transfer process. The transfer runs in real time, i.e. transferring a 60-minute tape takes 60 minutes. 60 minutes of DV, however, amounts to about 15 GB, which can create problems depending on the size of your hard disk. This procedure is the most robust one, but you will need at least a 30 GB hard disk. Other options are discussed below.
  • We then recommend converting the DV data to an MPEG1 stream with the Tsunami MPEG Encoder (TMPGEnc) program on your notebook or PC. We recommend MPEG1 because it produces a quarter of the data of MPEG2, the conversion takes less time and the current software supports it. We also do not recommend working directly with DV, since it means 15 times more data than MPEG1. The annotations can be created on the basis of this MPEG1 stream using the recommended programs. Only in a few cases may the MPEG2 resolution be necessary to create the annotations. Often the audio information alone is sufficient for creating the annotations.
  • We recommend sending the DV tapes and the annotations to the MPI as early as possible. It is important that the TIDEL team receives correct metadata descriptions, including exact start and end times for your session. If these times are not exact, none of the annotations will be correctly aligned. The exact start and stop times can be extracted from the created AVI file (containing the DV stream) with the help of Adobe Premiere 6.0. You have to enter these times in the metadata transcription yourself, since the conversion does not copy this time information.
  • Only if you intend to send us the MPEG2 files (and not the DV tapes) do you have to use the MPI template when converting from the DV format. The parameter settings are crucial to guarantee smoothness.
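Before starting a DV transfer in the field it can help to check whether the capture will fit on the disk. The Python sketch below uses only the rough figure given above (about 15 GB for 60 minutes of DV); the function names are hypothetical, and real file sizes will vary somewhat.

```python
DV_GB_PER_HOUR = 15.0  # rough figure from this note, not an exact value

def dv_capture_size_gb(tape_minutes):
    """Estimated size of the captured DV file for a given recording length."""
    return tape_minutes * DV_GB_PER_HOUR / 60.0

def fits_on_disk(tape_minutes, free_disk_gb):
    """True if the estimated DV capture fits into the free disk space."""
    return dv_capture_size_gb(tape_minutes) <= free_disk_gb

full_tape_gb = dv_capture_size_gb(60)  # 15.0
ok_on_30gb = fits_on_disk(60, 30.0)    # True
ok_on_10gb = fits_on_disk(60, 10.0)    # False
```

Note that the estimate covers the DV file only. The converted MPEG1 file and any separate audio file need additional space.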

Default Workflow
  1. Do your recordings with DV and label your tape according to the rules.
  2. Transfer a session to the PC with Adobe Premiere 6.0 and create your session metadata transcription with the exact start and end times.
  3. Convert your data to MPEG1 (necessary at this moment) and, if needed, to a separate audio file (WAV format) using the Tsunami MPEG Encoder (TMPGEnc) software with the parameters set according to the MPI template.
  4. Carry out your annotations based on the MPEG1 file.
  5. Send the video material (DV tapes) to the MPI together with your annotations and the metadata description. The metadata has to include the exact begin and end times of the session on the original DV tape. For test purposes, send this as soon as you have created the first transcription. This will allow the TIDEL people to check the correctness of all operations.
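Why the exact begin time matters can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this note: that time codes are written as HH:MM:SS:FF, that the recording uses 25 frames per second (PAL), and that annotation times are measured from the start of the session file. The function names are hypothetical.

```python
def timecode_to_seconds(timecode, fps=25):
    """Convert an HH:MM:SS:FF time code to seconds (fps=25 is an assumption)."""
    parts = timecode.split(":")
    if len(parts) != 4:
        raise ValueError("expected HH:MM:SS:FF, got %r" % timecode)
    hours, minutes, seconds, frames = (int(p) for p in parts)
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError("field out of range in %r" % timecode)
    return hours * 3600 + minutes * 60 + seconds + frames / fps

def annotation_time_on_tape(annotation_s, session_start_timecode, fps=25):
    """Position on the tape of an annotation given relative to the session start."""
    return timecode_to_seconds(session_start_timecode, fps) + annotation_s

# Session starts at 12 min 30 s and 10 frames on the tape.
session_start_s = timecode_to_seconds("00:12:30:10")

# An annotation 5 s into the session lies at 755.4 s on the tape.
tape_position_s = annotation_time_on_tape(5.0, "00:12:30:10")
```

Because the session start is added to every annotation time, an error in the reported start shifts all annotations of that session by the same amount.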

Alternative Digitization Procedure

Of course, one could transfer the DV data to the notebook one session at a time, do the conversion and then start the analysis work. This would allow you to handle smaller files on your notebook. The problem we foresee is that most people will not have a continuous time code on their DV tape. This means that many recordings on one tape have identical time codes, since every session starts from t=0. Therefore, there is no automatic way to tell Adobe Premiere to start and stop the transfer at specific times. You would have to enter the start/stop commands by hand, which is imprecise and error-prone.

© 2006 DoBeS Archive