Each item was final-gated at six points during the target
diphone (exceptions exist for initials, stops,
and affricates; see below). Thus, a stimulus
consisted of the entire item up to the gating point, including any preceding
Items were gated to 300 ms of a 500 Hz square wave, with a 5 ms period
during which the speech signal was ramped down and the square wave simultaneously
ramped up. A square wave was used rather than noise or silence, since it
is not misperceived as a speech sound, and therefore does not introduce
cues for a fricative or a voiceless stop at the gating point.
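The gating operation described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used. The function name `gate_with_square_wave`, the linear ramp shape, the placement of the 5 ms ramp immediately before the gating point, the square-wave amplitude, and the decision to count the ramp as part of the 300 ms are all assumptions; the text specifies only the 300 ms duration, the 500 Hz frequency, and the 5 ms cross-fade.

```python
def gate_with_square_wave(samples, gate_index, sample_rate,
                          tone_hz=500.0, tone_ms=300.0, ramp_ms=5.0,
                          amplitude=0.3):
    """Cut `samples` at `gate_index` and replace the rest with a square wave.

    Assumptions (not stated in the text): the cross-fade is linear, it
    occupies the last `ramp_ms` before the gating point, and the square
    wave's `tone_ms` duration is counted from the start of the ramp.
    """
    if not 0 <= gate_index <= len(samples):
        raise ValueError("gate_index must lie within the signal")
    tone_len = int(round(sample_rate * tone_ms / 1000.0))
    ramp_len = min(int(round(sample_rate * ramp_ms / 1000.0)),
                   gate_index, tone_len)
    start = gate_index - ramp_len          # first sample of the cross-fade
    out = list(samples[:start])            # untouched speech before the ramp
    for n in range(tone_len):
        # Square wave: +amplitude for the first half of each period.
        phase = (n * tone_hz / sample_rate) % 1.0
        square = amplitude if phase < 0.5 else -amplitude
        if n < ramp_len:
            w = n / ramp_len               # square-wave weight rises 0 -> 1
            out.append((1.0 - w) * samples[start + n] + w * square)
        else:
            out.append(square)             # pure square wave after the gate
    return out
```

For a 16 kHz signal gated at sample 800, the result contains 720 samples of unmodified speech followed by 4800 samples (300 ms) of ramp plus square wave.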
In order to specify the boundaries between diphones, both the waveform
and the wideband spectrogram of each item were inspected, and labels were
set to indicate the onset and offset points of each phoneme. A sample
analysis can be seen in the figure below, where "b" indicates the beginning
of the diphone, "m" the boundary between phoneme 1 and phoneme 2, and "e"
the end of the diphone. "2" indicates the onset of a stop release burst.
The actual gating of the diphones was done according to the following criteria.
The boundary between a nasal and a neighboring vowel or non-nasal
consonant was identified as the point in the spectrogram with the sudden
change in spectral distribution of the energy. For nasal-nasal diphones,
the boundary between the nasals was located based on spectral changes during
The boundary between a voiceless fricative and a vowel was
considered to be at the onset or cessation of the first formant of the vowel.
The boundary between voiced fricatives, including /h/, and vowels
was positioned at the onset or cessation of the first formant of the vowel.
The liquid /r/ was most often produced as a trill, in which case
the amplitude low point for the first tap of the trill, as determined from
the waveform, was taken as the onset of trilled /r/. Trilled /r/ often
had a slight burst at the end of the final tap, and the end of the /r/
was judged to be just after this burst, or at the amplitude low point if
there was no such burst. When /r/ was produced as an approximant or a fricative,
changes in formants or frication had to be used as the boundary instead.
For the liquid /l/ in syllable onsets, a sudden change in the distribution
of energy was usually visible in the spectrogram, as for nasals, and this
was taken as the segment boundary. Coda /l/ was rather dark, and the moment
of maximum decline of energy in the first and second formants of the preceding
vowel was taken as the onset of /l/.
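The "moment of maximum decline of energy" criterion for coda /l/ was applied by inspecting the spectrogram. As a rough illustration of what that criterion picks out, the sketch below finds the steepest frame-to-frame drop in an energy track. The function name `max_decline_time`, the frame-based energy track (for example, energy summed over the F1 and F2 region), and the frame step are hypothetical; no automatic procedure is described in the text.

```python
def max_decline_time(energy, frame_step_ms):
    """Return the time (ms) of the steepest drop in a frame-wise energy track.

    `energy` is a hypothetical list of per-frame energy values; the
    returned time is the midpoint between the two frames that bracket
    the largest decrease.
    """
    if len(energy) < 2:
        raise ValueError("need at least two frames")
    # Positive values mean energy is falling between frame i and i + 1.
    drops = [energy[i] - energy[i + 1] for i in range(len(energy) - 1)]
    steepest = max(range(len(drops)), key=drops.__getitem__)
    return (steepest + 0.5) * frame_step_ms
```

With 5 ms frames and the track `[10, 10, 9, 4, 3, 3]`, the largest drop lies between the third and fourth frames, so the function returns 12.5 ms.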
The boundary between a glide and a vowel was identified as
the point halfway through the duration of the F2 transition. Boundaries
between glides and most consonants could be determined based on the criteria
for the other consonant, and for the boundary within a glide-glide diphone,
the same criteria as for glide-vowel sequences were used. /l, j, w/ initial
to the item sometimes had voiceless frication before the onset of voicing,
and this was included as part of the segment.
In vowel-vowel diphones, creaky voicing, the silence of a glottal
stop, or both separated the two vowels. The end of the first vowel was
identified as the onset of creaky voicing, or the beginning of silence
for a glottal stop if no creaky voice was present. The third gate end point
was placed at the end of the second vowel, and the second gate end point
was placed halfway between the other two gating points.
For voiceless stops which were recorded in an
intervocalic environment, the boundaries of the stop were identified as
the cessation of voicing for the preceding vowel and onset of voicing for
the following one, as determined from the waveform and the voice bar. The
second gating point within the stop was placed just before the beginning
of the stop burst, and the first gating point was placed halfway through
the silence. For voiceless stops preceding or following other consonants,
the beginning of the stop was also identified as the beginning of the silence
for the stop closure, and the end as the end of the burst and onset of
the following segment. For voiceless stops initial to the item, only one
gate end point was used for the stop itself, at the end of the stop (the onset
of voicing of the following segment if voiced, or the onset of another cue
to the following segment, such as frication, if voiceless). This is because
the usual earlier gate end points, during the stop closure, would produce
stimuli containing no speech signal at all. Therefore, diphones with a
voiceless stop as the first phoneme, if recorded without preceding environment,
had only four gates instead of the usual six.
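The placement rules for voiceless stops can be summarized as a small function. This is a sketch under stated assumptions: the name `voiceless_stop_gates`, the 1 ms margin used for "just before" the burst, and the placement of the third gate at the end of the stop in the non-initial case are illustrative choices, since the text gives no numeric margin and states the end-of-stop gate explicitly only for item-initial stops. All times are in milliseconds.

```python
def voiceless_stop_gates(closure_start, burst_start, stop_end,
                         item_initial=False, margin_ms=1.0):
    """Return gate end points (ms) within a voiceless stop.

    `margin_ms` is an assumed value for "just before the burst".
    """
    if item_initial:
        # Earlier gates would fall in silence and contain no speech,
        # so only the gate at the end of the stop is kept.
        return [stop_end]
    return [
        (closure_start + burst_start) / 2.0,  # halfway through the silence
        burst_start - margin_ms,              # just before the burst
        stop_end,                             # end of the stop (assumed)
    ]
```

A closure from 100 ms to 160 ms with the stop ending at 180 ms yields gates at 130, 159, and 180 ms, whereas the same stop in item-initial position yields the single gate at 180 ms.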
For phonemically voiced stops, whether intervocalic,
initial to the item, or adjacent to a consonant, if prevoicing was produced,
the beginning of prevoicing was identified as the beginning of the stop.
The end of the stop was defined as the end of the burst. If voicing ceased
during the stop, the resumption of voicing for the following segment was
counted as the end of the burst, otherwise the end of the burst had to
be located from the spectrogram. For voiced stops with prevoicing, the
first gate end point within the stop was placed halfway through the prevoicing,
the second just before the beginning of the burst, and the last at the
end of the burst. Thus, diphones with such a voiced stop had six gates.
However, the stops /b, d, g/ in Dutch, although said to be fully voiced
initially rather than unaspirated as in English, are often produced without
prevoicing. If no prevoicing was visible in the waveform at all in initial
position, gate end points were placed as for a voiceless stop, producing four gates
for the diphone, but if any prevoicing was visible, the stop was treated
as a prevoiced voiced stop.
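The decision between the two treatments of voiced stops can be sketched as follows. The function names, the 1 ms "just before the burst" margin, and the simplification that prevoicing runs from its onset up to the burst are assumptions made for illustration. The count of three gates in the second phoneme is inferred from the stated totals of six and four gates per diphone. Times are in milliseconds, and `prevoicing_start=None` stands for "no prevoicing visible in the waveform".

```python
def voiced_stop_gates(burst_start, stop_end, prevoicing_start=None,
                      item_initial=True, margin_ms=1.0):
    """Return gate end points (ms) within a phonemically voiced stop."""
    if prevoicing_start is None:
        if not item_initial:
            # The text only describes the no-prevoicing case item-initially.
            raise ValueError("case not described in the text")
        # Treated like an item-initial voiceless stop: one gate only.
        return [stop_end]
    return [
        (prevoicing_start + burst_start) / 2.0,  # halfway through prevoicing
        burst_start - margin_ms,                 # just before the burst
        stop_end,                                # end of the burst
    ]


def gates_in_diphone(first_phoneme_gates, second_phoneme_gates=3):
    """Total gates for a diphone, assuming three gates in the second phoneme."""
    return len(first_phoneme_gates) + second_phoneme_gates
```

A prevoiced stop therefore gives six gates for the diphone, and an initial stop without visible prevoicing gives four, matching the counts stated above.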
The only affricate, / /, was treated as a voiced stop for purposes of positioning gate end points
(first gating point halfway through the prevoicing, second just before
the burst, and third at the end of the affricate). No gate end point was
placed at the end of the burst noise before the frication noise, since
the burst without the affrication is rather brief.