Nix, A. J., Mehta, G., Dye, J., & Cutler, A. (1993). Phoneme detection as a tool for comparing perception of natural and synthetic speech. Computer Speech and Language, 7, 211-228. doi:10.1006/csla.1993.1011.
On simple intelligibility measures, high-quality synthesiser output now scores almost as well as natural speech. Nevertheless, it is widely agreed that perception of synthetic speech is a harder task for listeners than perception of natural speech; in particular, it has been hypothesized that listeners have difficulty identifying phonemes in synthetic speech. If so, a simple measure of the speed with which a phoneme can be identified should prove a useful tool for comparing perception of synthetic and natural speech. The phoneme detection task was here used in three experiments comparing perception of natural and synthetic speech. In the first, response times to synthetic and natural targets were not significantly different, but in the second and third experiments response times to synthetic targets were significantly slower than to natural targets. A speed-accuracy tradeoff in the third experiment suggests that an important factor in this task is the response criterion adopted by subjects. It is concluded that the phoneme detection task is a useful tool for investigating phonetic processing of synthetic speech input, but subjects must be encouraged to adopt a response criterion which emphasizes rapid responding. When this is the case, significantly longer response times for synthetic targets can indicate a processing disadvantage for synthetic speech at an early level of phonetic analysis.