Hiroya, S., Jasmin, K., Krishnan, S., Lima, C., Ostarek, M., Boebinger, D., & Scott, S. K.
(2016). Speech rhythm measure of non-native speech using a statistical phoneme duration model. Poster presented at the 8th Annual Meeting of the Society for the Neurobiology of Language, London, UK.
We normally understand speech in our native language without
effort. Recent brain imaging studies have revealed common cortical
activation in a left-lateralized motor area for speech production
and perception. Moreover, this activity was increased by
listening to speech sounds with less natural frequency
information such as sinewave speech and noise-vocoded
speech. Rhythm is a natural part of speech. Japanese has a
mora-timed rhythm, whereas English has a stress-timed
rhythm. A native Japanese speaker tends to
apply mora-timed rhythm to English. However, few studies
have investigated the neural mechanisms of the processing
of speech rhythm during speech perception. We developed a
method for decomposing speech signals into speech rhythm
and frequency information. English speech sounds spoken by
a native Japanese speaker were manipulated such that their
rhythm was either stress-timed like English or mora-timed like
Japanese. Stress-timed rhythm was obtained from a native
British English speaker’s speech. Noise-vocoding was used
to minimize contributions of F0 and to control intelligibility
across conditions. Twenty-one healthy right-handed native
English speakers participated. fMRI was used to image
the brains of participants while they listened to the sentences.
Results showed that the left-lateralized supplementary motor
area (SMA), a region involved in speech production, was
more activated for mora-timed rhythm (non-native rhythm)
than for stress-timed rhythm. This suggests that integrating
non-native speech rhythm with native language speech may
rely on increased auditory-motor processing. In behavioral
testing, native English speakers judged the naturalness
of the speech rhythm of the sentences. Results confirmed that
participants judged English rhythm as the most natural.
However, the difference between non-native
rhythm and stress-timed rhythm in English speech needs to be
quantified for further analysis. A pairwise variability index
(PVI) of vocalic intervals was proposed as a speech rhythm
measure. Native Japanese speakers tend to insert unnecessary
vowels in English because a mora typically ends in a vowel.
These inserted vowels affect PVI values, so the PVI is not appropriate for quantifying non-native speech rhythm.
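As a rough illustration of why inserted vowels matter, the sketch below computes a PVI over successive vocalic-interval durations. It assumes the normalized variant (nPVI), in which each pairwise difference is divided by the mean of the pair; the abstract does not say which variant was used. The function name `npvi` and the duration values are hypothetical and are not data from this study.

```python
def npvi(durations):
    """Normalized pairwise variability index of successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    # Each term: absolute difference of a neighbouring pair, divided by the pair's mean.
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic-interval durations in milliseconds (illustrative only).
native_like = [60, 140, 50, 160]
# The same sequence with two short inserted vowels added (also hypothetical).
with_inserted = [60, 40, 140, 50, 40, 160]

print(npvi(native_like), npvi(with_inserted))
```

Because every inserted vowel creates new neighbouring pairs, the index changes even when the durations of the original vowels are untouched.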
In this study, we developed a statistical model of phonemic
duration in English that is independent of the type of interval.
English sentences (TIMIT) spoken by both
English and Japanese native speakers were used as speech stimuli. The
duration of each phoneme was determined by experts.
A two-state transition model of phonemic duration was estimated
for each native language using the expectation-maximization
algorithm. The mean durations of the two states were short and long,
respectively. Results showed that the variability of the
self-transition probability among states was significantly larger
for the native Japanese speaker than for the native English speaker
(p < 0.01). This indicates that long phonemic durations were
repeated consecutively more often for native English speakers than
for native Japanese speakers. This suggests that these structures
of phonemic duration affected activity in the speech perception