Words within words in a real-speech corpus
Cutler, A., McQueen, J. M., Baayen, R. H., & Drexler, H.
Words within words in a real-speech corpus. In R. Togneri (Ed.), Proceedings of the 5th Australian International Conference on Speech Science and Technology: Vol. 1 (pp. 362-367). Canberra: Australian Speech Science and Technology Association.
This paper reports the occurrence of words embedded within other words in a 50,000-word corpus of spoken British English. Within-word embedding in this real-speech sample is common, and comparable in extent to the embedding observed in the vocabulary. Imposing a syllable boundary matching constraint reduces, but by no means eliminates, spurious embedding. Embedded words are most likely to overlap with the beginning of matrix words, and thus may pose serious problems for speech recognisers.
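To make the notions of an embedded word, its position in the matrix word, and a syllable boundary matching constraint concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes a hypothetical toy transcription with one ASCII character per phoneme and syllables separated by ".", and it interprets the constraint as requiring both edges of an embedded word to fall on syllable boundaries of the matrix word. The function names boundaries and find_embeddings are illustrative only.

```python
from typing import List, Set, Tuple

# Hypothetical toy notation: one ASCII character per phoneme, syllables
# separated by "."; this is an assumption, not the paper's transcription scheme.


def boundaries(syllabified: str) -> Tuple[str, Set[int]]:
    """Return the phoneme string and the set of syllable-boundary offsets."""
    sylls = syllabified.split(".")
    phonemes = "".join(sylls)
    offsets = {0}
    pos = 0
    for s in sylls:
        pos += len(s)
        offsets.add(pos)
    return phonemes, offsets


def find_embeddings(
    matrix: str, lexicon: List[str], match_syllables: bool = False
) -> List[Tuple[str, int, str]]:
    """List (word, start, position) for each lexicon word found inside matrix.

    With match_syllables=True (our assumed reading of the constraint), a hit
    only counts if it starts and ends on a syllable boundary of the matrix.
    """
    target, bounds = boundaries(matrix)
    hits = []
    for entry in lexicon:
        word, _ = boundaries(entry)
        if word == target:
            continue  # a word is not counted as embedded in itself
        start = target.find(word)
        while start != -1:
            end = start + len(word)
            if not match_syllables or (start in bounds and end in bounds):
                if start == 0:
                    where = "initial"  # overlaps the beginning of the matrix word
                elif end == len(target):
                    where = "final"
                else:
                    where = "medial"
                hits.append((word, start, where))
            start = target.find(word, start + 1)
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    # Toy lexicon: "kat.@.lQg" stands in for a word like "catalogue".
    lexicon = ["kat", "at", "@", "lQg", "kat.@.lQg"]
    print(find_embeddings("kat.@.lQg", lexicon))
    print(find_embeddings("kat.@.lQg", lexicon, match_syllables=True))
```

On this toy input the unconstrained search finds four embedded words, while the syllable constraint drops "at", whose left edge falls inside the first syllable; words starting at offset 0 are tagged "initial", which is the kind of tally that bears on the observation that embeddings tend to overlap the beginning of matrix words.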