Database of Dutch diphone perception

Material -- Gating

paper preprint

DCS project

Each item was final-gated at six points during the target diphone (exceptions exist for initials, stops and affricates; see below). Thus, a stimulus consisted of the entire item up to the gating point, including any preceding context.

Items were gated to 300 ms of a 500 Hz square wave, with a 5 ms period during which the speech signal was ramped down and the square wave simultaneously ramped up. A square wave was used rather than noise or silence, since it is not misperceived as a speech sound, and therefore does not introduce cues for a fricative or a voiceless stop at the gating point. 
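
The gating described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the database. It assumes linear ramps, that the 5 ms cross-fade starts at the gating point, and that the cross-fade counts towards the 300 ms of square wave; the function and parameter names are hypothetical.

```python
def square_wave(n_samples, freq_hz, sample_rate, amplitude=1.0):
    """Return n_samples of a square wave alternating between +/- amplitude."""
    out = []
    for n in range(n_samples):
        phase = (n * freq_hz / sample_rate) % 1.0
        out.append(amplitude if phase < 0.5 else -amplitude)
    return out


def gate_item(speech, gate_s, sample_rate,
              tone_s=0.300, ramp_s=0.005, freq_hz=500.0, amplitude=1.0):
    """Keep the item up to the gating point, then cross-fade into a square wave.

    Assumptions (not stated in the source): the ramps are linear, the
    cross-fade begins at the gating point, and it is part of the 300 ms tone.
    """
    gate = round(gate_s * sample_rate)      # sample index of the gating point
    ramp = round(ramp_s * sample_rate)      # cross-fade length in samples
    tone_len = round(tone_s * sample_rate)  # square-wave length in samples
    if gate + ramp > len(speech):
        raise ValueError("not enough speech after the gating point for the ramp")

    tone = square_wave(tone_len, freq_hz, sample_rate, amplitude)
    out = list(speech[:gate])               # entire item up to the gating point
    for i in range(ramp):
        w = (i + 1) / ramp                  # square-wave weight, rising to 1
        out.append((1.0 - w) * speech[gate + i] + w * tone[i])
    out.extend(tone[ramp:])                 # remainder of the square wave
    return out
```

With a 16 kHz recording, `gate_item(samples, 0.5, 16000)` would return the first 500 ms of the item followed by 300 ms of square wave, the first 5 ms of which overlap with the fading speech.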

In order to specify the boundaries between diphones, both the waveform and the wideband spectrogram of each item were inspected, and labels were set to indicate the onset and offset points of each phoneme. A sample analysis can be seen in the figure below, where "b" indicates the beginning of the diphone, "m" the boundary between phoneme 1 and phoneme 2, and "e" the end of the diphone. "2" indicates the onset of a stop release burst.


The actual gating of the diphones was done according to the following guidelines:

  1. The boundary between a nasal and a neighboring vowel or non-nasal consonant was identified as the point in the spectrogram with a sudden change in the spectral distribution of the energy. For nasal-nasal diphones, the boundary between the nasals was located based on spectral changes during the nasalization.
  2. The boundary between a voiceless fricative and a vowel was considered to be at the onset or cessation of the first formant of the vowel.
  3. The boundary between voiced fricatives, including /h/, and vowels was positioned at the onset or cessation of the first formant of the vowel.
  4. The liquid /r/ was most often produced as a trill, in which case the amplitude low point for the first tap of the trill, as determined from the waveform, was taken as the onset of trilled /r/. Trilled /r/ often had a slight burst at the end of the final tap, and the end of the /r/ was judged to be just after this burst, or at the amplitude low point if there was no such burst. When /r/ was produced as an approximant or a fricative, changes in formants or frication had to be used as the boundary instead.
  5. For the liquid /l/ in syllable onsets, a sudden change in the distribution of energy was usually visible in the spectrogram, as for nasals, and this was taken as the segment boundary. Coda /l/ was rather dark, and the moment of maximum decline of energy in the first and second formants of the preceding vowel was taken as the onset of /l/.
  6. The boundary between a glide and a vowel was identified as the point halfway through the duration of the F2 transition. Boundaries between glides and most consonants could be determined based on the criteria for the other consonant, and for the boundary within a glide-glide diphone, the same criteria as for glide-vowel sequences were used. /l, j, w/ initial to the item sometimes had voiceless frication before the onset of voicing, and this was included as part of the segment.
  7. In vowel-vowel diphones, creaky voicing, the silence of a glottal stop, or both separated the two vowels. The end of the first vowel was identified as the onset of creaky voicing, or the beginning of silence for a glottal stop if no creaky voice was present. The third gate end point was placed at the end of the second vowel, and the second gate end point was placed halfway between the other two gating points.
  8. For voiceless stops which were recorded in an intervocalic environment, the boundaries of the stop were identified as the cessation of voicing for the preceding vowel and onset of voicing for the following one, as determined from the waveform and the voice bar. The second gating point within the stop was placed just before the beginning of the stop burst, and the first gating point was placed halfway through the silence. For voiceless stops preceding or following other consonants, the beginning of the stop was also identified as the beginning of the silence for the stop closure, and the end as the end of the burst and onset of the following segment. For voiceless stops initial to the item, only one gate end point was used for the stop itself, at the end of the stop (onset of voicing of the following segment if voiced, or at onset of other criteria for the following segment, such as friction, if voiceless). This is because the usual earlier gate end points, during the stop closure, would produce stimuli containing no speech signal at all. Therefore, diphones with a voiceless stop as the first phoneme, if recorded without preceding environment, had only four gates instead of the usual six.
  9. For phonemically voiced stops, whether intervocalic, initial to the item, or adjacent to a consonant, if prevoicing was produced, the beginning of prevoicing was identified as the beginning of the stop. The end of the stop was defined as the end of the burst. If voicing ceased during the stop, the resumption of voicing for the following segment was counted as the end of the burst; otherwise the end of the burst had to be located from the spectrogram. For voiced stops with prevoicing, the first gate end point within the stop was placed halfway through the prevoicing, the second just before the beginning of the burst, and the last at the end of the burst. Thus, diphones with such a voiced stop had six gates. However, the stops /b, d, g/ in Dutch, although said to be fully voiced initially rather than unaspirated as in English, are often produced without prevoicing. If no prevoicing was visible in the waveform at all in initial position, gate end points were placed as for a voiceless stop, producing four gates for the diphone, but if any prevoicing was visible, the stop was treated as a prevoiced voiced stop.
  10. The only affricate, /dʒ/, was treated as a voiced stop for purposes of positioning gate end points (first gating point halfway through the prevoicing, second just before the burst, and third at the end of the affricate). No gate end point was placed at the end of the burst noise before the frication noise, since the burst without the affrication is rather brief.
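
The placement of gate end points within a stop (guidelines 8 to 10) can be summarized in a short sketch. It is an illustration, not the original labelling procedure. It assumes the closure or prevoicing runs from the start of the stop to the burst onset, and that "just before the burst" means a fixed margin of 1 ms, a value the source does not give. The function name, the `closure_usable` flag, and `margin` are hypothetical.

```python
from typing import List


def stop_gate_end_points(stop_start: float, burst_start: float, stop_end: float,
                         closure_usable: bool, margin: float = 0.001) -> List[float]:
    """Return gate end points (in seconds) within a single stop.

    closure_usable is True when gating during the closure still leaves some
    speech signal: stops with preceding context, or stops with prevoicing.
    It is False for item-initial voiceless stops and for item-initial voiced
    stops without visible prevoicing, which get a single gate at the end.
    The margin of 1 ms is an assumed reading of "just before the burst".
    """
    if not (stop_start <= burst_start <= stop_end):
        raise ValueError("expected stop_start <= burst_start <= stop_end")
    if not closure_usable:
        return [stop_end]                                # one gate: end of the stop
    first = stop_start + (burst_start - stop_start) / 2  # halfway through closure/prevoicing
    second = max(first, burst_start - margin)            # just before the burst
    return [first, second, stop_end]                     # last gate: end of the burst


def diphone_gate_count(first_phoneme_gates: List[float]) -> int:
    """Total gates for a diphone, assuming three gates in the second phoneme."""
    return len(first_phoneme_gates) + 3
```

For a stop with a usable closure this gives three gate end points, and so six gates for the diphone; for an item-initial stop without prevoicing it gives one, and so the four gates mentioned in guidelines 8 and 9.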