New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis
Vainio, M., Suni, A., Raitio, T., Nurminen, J., Järvikivi, J., & Alku, P.
New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009)
This paper describes a new flexible delexicalization method based on glottal excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibility to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.