Publications

  • Brand, S., & Ernestus, M. (2021). Reduction of word-final obstruent-liquid-schwa clusters in Parisian French. Corpus Linguistics and Linguistic Theory, 17(1), 249-285. doi:10.1515/cllt-2017-0067.

    Abstract

    This corpus study investigated pronunciation variants of word-final obstruent-liquid-schwa (OLS) clusters in nouns in casual Parisian French. Results showed that at least one phoneme was absent in 80.7% of the 291 noun tokens in the dataset, and that the whole cluster was absent (e.g., [mis] for ministre) in no less than 15.5% of the tokens. We demonstrate that phonemes are not always completely absent, but that they may leave traces on neighbouring phonemes. Further, the clusters display undocumented voice assimilation patterns. Statistical modelling showed that a phoneme is most likely to be absent if the following phoneme is also absent. The durations of the phonemes are conditioned particularly by the position of the word in the prosodic phrase. We argue, on the basis of three different types of evidence, that in French word-final OLS clusters, the absence of obstruents is mainly due to gradient reduction processes, whereas the absence of schwa and liquids may also be due to categorical deletion processes.
  • Felker, E. R., Broersma, M., & Ernestus, M. (2021). The role of corrective feedback and lexical guidance in perceptual learning of a novel L2 accent in dialogue. Applied Psycholinguistics, 42, 1029-1055. doi:10.1017/S0142716421000205.

    Abstract

    Perceptual learning of novel accents is a critical skill for second-language speech perception, but little is known about the mechanisms that facilitate perceptual learning in communicative contexts. To study perceptual learning in an interactive dialogue setting while maintaining experimental control of the phonetic input, we employed an innovative experimental method incorporating prerecorded speech into a naturalistic conversation. Using both computer-based and face-to-face dialogue settings, we investigated the effect of two types of learning mechanisms in interaction: explicit corrective feedback and implicit lexical guidance. Dutch participants played an information-gap game featuring minimal pairs with an accented English speaker whose /ε/ pronunciations were shifted to /ɪ/. Evidence for the vowel shift came either from corrective feedback about participants’ perceptual mistakes or from onscreen lexical information that constrained their interpretation of the interlocutor’s words. Corrective feedback explicitly contrasting the minimal pairs was more effective than generic feedback. Additionally, both receiving lexical guidance and exhibiting more uptake for the vowel shift improved listeners’ subsequent online processing of accented words. Comparable learning effects were found in both the computer-based and face-to-face interactions, showing that our results can be generalized to a more naturalistic learning context than traditional computer-based perception training programs.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well-known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can be just as important.
  • Bürki, A., Ernestus, M., & Frauenfelder, U. H. (2010). Is there only one "fenêtre" in the production lexicon? On-line evidence on the nature of phonological representations of pronunciation variants for French schwa words. Journal of Memory and Language, 62, 421-437. doi:10.1016/j.jml.2010.01.002.

    Abstract

    This study examines whether the production of words with two phonological variants involves single or multiple lexical phonological representations. Three production experiments investigated the roles of the relative frequencies of the two pronunciation variants of French words with schwa: the schwa variant and the reduced variant. In two naming tasks and in a symbol–word association learning task, variants with higher relative frequencies were produced faster. This suggests that the production lexicon keeps a frequency count for each variant and hence that schwa words are represented in the production lexicon with two different lexemes. In addition, the advantage for schwa variants over reduced variants in the naming tasks but not in the learning task and the absence of a variant relative frequency effect for schwa variants produced in isolation support the hypothesis that context affects the variants’ lexical activation and modulates the effect of variant relative frequency.
  • Hanique, I., Schuppler, B., & Ernestus, M. (2010). Morphological and predictability effects on schwa reduction: The case of Dutch word-initial syllables. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 933-936).

    Abstract

    This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word’s predictability and its morphological structure. Schwa is less reduced in words that are more predictable given the following word. In addition, schwa may be longer if the syllable forms a prefix, and in prefixes the duration of schwa is positively correlated with the frequency of the word relative to its stem. Our results suggest that the conditions which favor reduced realizations are more complex than one would expect on the basis of the current literature.
  • Kuzla, C., Ernestus, M., & Mitterer, H. (2010). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (Eds.), Laboratory Phonology 10 (pp. 731-757). Berlin: De Gruyter.
  • Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2010). Morphological effects on fine phonetic detail: The case of Dutch -igheid. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (Eds.), Laboratory Phonology 10 (pp. 511-532). Berlin: De Gruyter.
  • Scharenborg, O., Wan, V., & Ernestus, M. (2010). Unsupervised speech segmentation: An analysis of the hypothesized phone boundaries. Journal of the Acoustical Society of America, 127, 1084-1095. doi:10.1121/1.3277194.

    Abstract

    Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve similar performance in terms of percentage correct boundary detection. Nevertheless, unsupervised segmentation algorithms are not able to perfectly reproduce manually obtained reference transcriptions. This paper investigates fundamental problems for unsupervised segmentation algorithms by comparing a phone segmentation obtained using only the acoustic information present in the signal with a reference segmentation created by human transcribers. The analyses of the output of an unsupervised speech segmentation method that uses acoustic change to hypothesize boundaries showed that acoustic change is a fairly good indicator of segment boundaries: over two-thirds of the hypothesized boundaries coincide with segment boundaries. Statistical analyses showed that the errors are related to segment duration, sequences of similar segments, and inherently dynamic phones. In order to improve unsupervised automatic speech segmentation, current one-stage bottom-up segmentation methods should be expanded into two-stage segmentation methods that are able to use a mix of bottom-up information extracted from the speech signal and automatically derived top-down information. In this way, unsupervised methods can be improved while remaining flexible and language-independent.
  • Schuppler, B., Ernestus, M., Van Dommelen, W., & Koreman, J. (2010). Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2466-2469).

    Abstract

    This paper presents a study on the acoustic sub-segmental properties of word-final /t/ in conversational standard Dutch and how these properties contribute to whether humans and an ASR system classify the /t/ as acoustically present or absent. In general, humans and the ASR system use the same cues (presence of a constriction, a burst, and alveolar frication), but the ASR system is also less sensitive to fine cues (weak bursts, smoothly starting friction) than human listeners and misled by the presence of glottal vibration. These data inform the further development of models of human and automatic speech processing.
  • Sikveland, A., Öttl, A., Amdal, I., Ernestus, M., Svendsen, T., & Edlund, J. (2010). Spontal-N: A Corpus of Interactional Spoken Norwegian. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2986-2991). Paris: European Language Resources Association (ELRA).

    Abstract

    Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On the basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material on the phoneme level, by means of a forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general purpose speech resource that is also suitable for investigating phonetic detail.
  • Spilková, H., Brenner, D., Öttl, A., Vondřička, P., Van Dommelen, W., & Ernestus, M. (2010). The Kachna L1/L2 picture replication corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2432-2436). Paris: European Language Resources Association (ELRA).

    Abstract

    This paper presents the Kachna corpus of spontaneous speech, in which ten Czech and ten Norwegian speakers were recorded both in their native language and in English. The dialogues are elicited using a picture replication task that requires active cooperation and interaction of speakers by asking them to produce a drawing as close to the original as possible. The corpus is appropriate for the study of interactional features and speech reduction phenomena across native and second languages. The combination of productions in non-native English and in speakers’ native language is advantageous for investigation of L2 issues while providing an L1 behaviour reference from all the speakers. The corpus consists of 20 dialogues comprising 12 hours 53 minutes of recording, and was collected in 2008. Preparation of the transcriptions, including a manual orthographic transcription and an automatically generated phonetic transcription, is currently in progress. The phonetic transcriptions are automatically generated by aligning acoustic models with the speech signal on the basis of the orthographic transcriptions and a dictionary of pronunciation variants compiled for the relevant language. Upon completion the corpus will be made available via the European Language Resources Association (ELRA).
  • Torreira, F., & Ernestus, M. (2010). Phrase-medial vowel devoicing in spontaneous French. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2006-2009).

    Abstract

    This study investigates phrase-medial vowel devoicing in European French (e.g. /ty po/ [typo] 'you can'). Our spontaneous speech data confirm that French phrase-medial devoicing is a frequent phenomenon affecting high vowels preceded by voiceless consonants. We also found that devoicing is more frequent in temporally reduced and coarticulated vowels. Complete and partial devoicing were conditioned by the same variables (speech rate, consonant type and distance from the end of the AP). Given these results, we propose that phrase-medial vowel devoicing in French arises mainly from the temporal compression of vocalic gestures and the aerodynamic conditions imposed by high vowels.
  • Torreira, F., Adda-Decker, M., & Ernestus, M. (2010). The Nijmegen corpus of casual French. Speech Communication, 52, 201-212. doi:10.1016/j.specom.2009.10.004.

    Abstract

    This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual French (NCCFr). The corpus contains a total of over 36 h of recordings of 46 French speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around 90 min of speech from every pair of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Comparisons with the ESTER corpus of journalistic speech show that the two corpora contain speech of considerably different registers. A number of indicators of casualness, including swear words, casual words, verlan, disfluencies and word repetitions, are more frequent in the NCCFr than in the ESTER corpus, while the use of double negation, an indicator of formal speech, is less frequent. In general, these estimates of casualness are constant through the three parts of the recording sessions and across speakers. Based on these facts, we conclude that our corpus is a rich resource of highly casual speech, and that it can be effectively exploited by researchers in language science and technology.

  • Torreira, F., & Ernestus, M. (2010). The Nijmegen corpus of casual Spanish. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2981-2985). Paris: European Language Resources Association (ELRA).

    Abstract

    This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Information about how to obtain a copy of the corpus can be found online at http://mirjamernestus.ruhosting.nl/Ernestus/NCCSp
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2010). Semantic facilitation in bilingual everyday speech comprehension. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1245-1248).

    Abstract

    Previous research suggests that bilinguals presented with low and high predictability sentences benefit from semantics in clear but not in conversational speech [1]. In everyday speech, however, many words are not highly predictable. Previous research has shown that native listeners can also use more subtle semantic contextual information [2]. The present study reports two auditory lexical decision experiments investigating to what extent late Asian-English bilinguals benefit from subtle semantic cues in their processing of English unreduced and reduced speech. Our results indicate that these bilinguals are less sensitive to semantic cues than native listeners for both speech registers.
  • Ernestus, M., Mak, W. M., & Baayen, R. H. (2005). Waar 't kofschip strandt. Levende Talen Magazine, 92, 9-11.
  • Ernestus, M., & Mak, W. M. (2005). Analogical effects in reading Dutch verb forms. Memory & Cognition, 33(7), 1160-1173.

    Abstract

    Previous research has shown that the production of morphologically complex words in isolation is affected by the properties of morphologically, phonologically, or semantically similar words stored in the mental lexicon. We report five experiments with Dutch speakers that show that reading an inflectional word form in its linguistic context is also affected by analogical sets of formally similar words. Using the self-paced reading technique, we show in Experiments 1-3 that an incorrectly spelled suffix delays readers less if the incorrect spelling is in line with the spelling of verbal suffixes in other inflectional forms of the same verb. In Experiments 4 and 5, our use of the self-paced reading technique shows that formally similar words with different stems affect the reading of incorrect suffixal allomorphs on a given stem. These intra- and interparadigmatic effects in reading may be due to online processes or to the storage of incorrect forms resulting from analogical effects in production.
  • Kemps, R. J. J. K., Wurm, L. H., Ernestus, M., Schreuder, R., & Baayen, R. H. (2005). Prosodic cues for morphological complexity in Dutch and English. Language and Cognitive Processes, 20(1/2), 43-73. doi:10.1080/01690960444000223.

    Abstract

    Previous work has shown that Dutch listeners use prosodic information in the speech signal to optimise morphological processing: Listeners are sensitive to prosodic differences between a noun stem realised in isolation and a noun stem realised as part of a plural form (in which the stem is followed by an unstressed syllable). The present study, employing a lexical decision task, provides an additional demonstration of listeners' sensitivity to prosodic cues in the stem. This sensitivity is shown for two languages that differ in morphological productivity: Dutch and English. The degree of morphological productivity does not correlate with listeners' sensitivity to prosodic cues in the stem, but it is reflected in differential sensitivities to the word-specific log odds ratio of encountering an unshortened stem (i.e., a stem in isolation) versus encountering a shortened stem (i.e., a stem followed by a suffix consisting of one or more unstressed syllables). In addition to being sensitive to the prosodic cues themselves, listeners are also sensitive to the probabilities of occurrence of these prosodic cues.
  • Kemps, R. J. J. K., Ernestus, M., Schreuder, R., & Baayen, R. H. (2005). Prosodic cues for morphological complexity: The case of Dutch plural nouns. Memory & Cognition, 33(3), 430-446.

    Abstract

    It has recently been shown that listeners use systematic differences in vowel length and intonation to resolve ambiguities between onset-matched simple words (Davis, Marslen-Wilson, & Gaskell, 2002; Salverda, Dahan, & McQueen, 2003). The present study shows that listeners also use prosodic information in the speech signal to optimize morphological processing. The precise acoustic realization of the stem provides crucial information to the listener about the morphological context in which the stem appears and attenuates the competition between stored inflectional variants. We argue that listeners are able to make use of prosodic information, even though the speech signal is highly variable within and between speakers, by virtue of the relative invariance of the duration of the onset. This provides listeners with a baseline against which the durational cues in a vowel and a coda can be evaluated. Furthermore, our experiments provide evidence for item-specific prosodic effects.
  • Keune, K., Ernestus, M., Van Hout, R., & Baayen, R. H. (2005). Variation in Dutch: From written "mogelijk" to spoken "mok". Corpus Linguistics and Linguistic Theory, 1(2), 183-223. doi:10.1515/cllt.2005.1.2.183.

    Abstract

    In Dutch, high-frequency words with the suffix -lijk are often highly reduced in spontaneous unscripted speech. This study addressed socio-geographic variation in the reduction of such words against the backdrop of the variation in their use in written and spoken Dutch. Multivariate analyses of the frequencies with which the words were used in a factorially contrasted set of subcorpora revealed significant variation involving the speaker's country, sex, and education level for spoken Dutch, and involving country and register for written Dutch. Acoustic analyses revealed that Dutch men reduced most often, while Flemish highly educated women reduced least. Two linguistic context effects emerged, one prosodic, and the other pertaining to the flow of information. Words in sentence final position showed less reduction, while words that were better predictable from the preceding word in the sentence (based on mutual information) tended to be reduced more often. The increased probability of reduction for forms that are more predictable in context, combined with the loss of the suffix in the more extremely reduced forms, suggests that high-frequency words in -lijk are undergoing a process of erosion that causes them to gravitate towards monomorphemic function words.
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2005). Articulatory planning is continuous and sensitive to informational redundancy. Phonetica, 62(2-4), 146-159. doi:10.1159/000090095.

    Abstract

    This study investigates the relationship between word repetition, predictability from neighbouring words, and articulatory reduction in Dutch. For the seven most frequent words ending in the adjectival suffix -lijk, 40 occurrences were randomly selected from a large database of face-to-face conversations. Analysis of the selected tokens showed that the degree of articulatory reduction (as measured by duration and number of realized segments) was affected by repetition, predictability from the previous word and predictability from the following word. Interestingly, not all of these effects were significant across morphemes and target words. Repetition effects were limited to suffixes, while effects of predictability from the previous word were restricted to the stems of two of the seven target words. Predictability from the following word affected the stems of all target words equally, but not all suffixes. The implications of these findings for models of speech production are discussed.
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2005). Lexical frequency and acoustic reduction in spoken Dutch. Journal of the Acoustical Society of America, 118(4), 2561-2569. doi:10.1121/1.2011150.

    Abstract

    This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
