Displaying 1 - 30 of 30
-
Bürki, A., Ernestus, M., & Frauenfelder, U. H. (2010). Is there only one "fenêtre" in the production lexicon? On-line evidence on the nature of phonological representations of pronunciation variants for French schwa words. Journal of Memory and Language, 62, 421-437. doi:10.1016/j.jml.2010.01.002.
Abstract
This study examines whether the production of words with two phonological variants involves single or multiple lexical phonological representations. Three production experiments investigated the roles of the relative frequencies of the two pronunciation variants of French words with schwa: the schwa variant (e.g., Image ) and the reduced variant (e.g., Image ). In two naming tasks and in a symbol–word association learning task, variants with higher relative frequencies were produced faster. This suggests that the production lexicon keeps a frequency count for each variant and hence that schwa words are represented in the production lexicon with two different lexemes. In addition, the advantage for schwa variants over reduced variants in the naming tasks but not in the learning task and the absence of a variant relative frequency effect for schwa variants produced in isolation support the hypothesis that context affects the variants’ lexical activation and modulates the effect of variant relative frequency. -
Hanique, I., Schuppler, B., & Ernestus, M. (2010). Morphological and predictability effects on schwa reduction: The case of Dutch word-initial syllables. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 933-936).
Abstract
This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word’s predictability and its morphological structure. Schwa is less reduced in words that are more predictable given the following word. In addition, schwa may be longer if the syllable forms a prefix, and in prefixes the duration of schwa is positively correlated with the frequency of the word relative to its stem. Our results suggest that the conditions which favor reduced realizations are more complex than one would expect on the basis of the current literature. -
Kuzla, C., Ernestus, M., & Mitterer, H. (2010). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (
Eds. ), Laboratory Phonology 10 (pp. 731-757). Berlin: De Gruyter.Files private
Request files -
Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2010). Morphological effects on fine phonetic detail: The case of Dutch -igheid. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (
Eds. ), Laboratory Phonology 10 (pp. 511-532). Berlin: De Gruyter. -
Scharenborg, O., Wan, V., & Ernestus, M. (2010). Unsupervised speech segmentation: An analysis of the hypothesized phone boundaries. Journal of the Acoustical Society of America, 127, 1084-1095. doi:10.1121/1.3277194.
Abstract
Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve similar performance in terms of percentage correct boundary detection. Nevertheless, unsupervised segmentation algorithms are not able to perfectly reproduce manually obtained reference transcriptions. This paper investigates fundamental problems for unsupervised segmentation algorithms by comparing a phone segmentation obtained using only the acoustic information present in the signal with a reference segmentation created by human transcribers. The analyses of the output of an unsupervised speech segmentation method that uses acoustic change to hypothesize boundaries showed that acoustic change is a fairly good indicator of segment boundaries: over two-thirds of the hypothesized boundaries coincide with segment boundaries. Statistical analyses showed that the errors are related to segment duration, sequences of similar segments, and inherently dynamic phones. In order to improve unsupervised automatic speech segmentation, current one-stage bottom-up segmentation methods should be expanded into two-stage segmentation methods that are able to use a mix of bottom-up information extracted from the speech signal and automatically derived top-down information. In this way, unsupervised methods can be improved while remaining flexible and language-independent. -
Schuppler, B., Ernestus, M., Van Dommelen, W., & Koreman, J. (2010). Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2466-2469).
Abstract
This paper presents a study on the acoustic sub-segmental properties of word-final /t/ in conversational standard Dutch and how these properties contribute to whether humans and an ASR system classify the /t/ as acoustically present or absent. In general, humans and the ASR system use the same cues (presence of a constriction, a burst, and alveolar frication), but the ASR system is also less sensitive to fine cues (weak bursts, smoothly starting friction) than human listeners and misled by the presence of glottal vibration. These data inform the further development of models of human and automatic speech processing. -
Sikveland, A., Öttl, A., Amdal, I., Ernestus, M., Svendsen, T., & Edlund, J. (2010). Spontal-N: A Corpus of Interactional Spoken Norwegian. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (
Eds. ), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2986-2991). Paris: European Language Resources Association (ELRA).Abstract
Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio quality audio- and video-recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material on the phoneme level, by means of a forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general purpose speech resource that is also suitable for investigating phonetic detail. -
Spilková, H., Brenner, D., Öttl, A., Vondřička, P., Van Dommelen, W., & Ernestus, M. (2010). The Kachna L1/L2 picture replication corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (
Eds. ), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2432-2436). Paris: European Language Resources Association (ELRA).Abstract
This paper presents the Kachna corpus of spontaneous speech, in which ten Czech and ten Norwegian speakers were recorded both in their native language and in English. The dialogues are elicited using a picture replication task that requires active cooperation and interaction of speakers by asking them to produce a drawing as close to the original as possible. The corpus is appropriate for the study of interactional features and speech reduction phenomena across native and second languages. The combination of productions in non-native English and in speakers’ native language is advantageous for investigation of L2 issues while providing a L1 behaviour reference from all the speakers. The corpus consists of 20 dialogues comprising 12 hours 53 minutes of recording, and was collected in 2008. Preparation of the transcriptions, including a manual orthographic transcription and an automatically generated phonetic transcription, is currently in progress. The phonetic transcriptions are automatically generated by aligning acoustic models with the speech signal on the basis of the orthographic transcriptions and a dictionary of pronunciation variants compiled for the relevant language. Upon completion the corpus will be made available via the European Language Resources Association (ELRA). -
Torreira, F., & Ernestus, M. (2010). Phrase-medial vowel devoicing in spontaneous French. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 2006-2009).
Abstract
This study investigates phrase-medial vowel devoicing in European French (e.g. /ty po/ [typo] 'you can'). Our spontaneous speech data confirm that French phrase-medial devoicing is a frequent phenomenon affecting high vowels preceded by voiceless consonants. We also found that devoicing is more frequent in temporally reduced and coarticulated vowels. Complete and partial devoicing were conditioned by the same variables (speech rate, consonant type and distance from the end of the AP). Given these results, we propose that phrase-medial vowel devoicing in French arises mainly from the temporal compression of vocalic gestures and the aerodynamic conditions imposed by high vowels. -
Torreira, F., Adda-Decker, M., & Ernestus, M. (2010). The Nijmegen corpus of casual French. Speech Communication, 52, 201-212. doi:10.1016/j.specom.2009.10.004.
Abstract
This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual French (NCCFr). The corpus contains a total of over 36 h of recordings of 46 French speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around 90 min of speech from every pair of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Comparisons with the ESTER corpus of journalistic speech show that the two corpora contain speech of considerably different registers. A number of indicators of casualness, including swear words, casual words, verlan, disfluencies and word repetitions, are more frequent in the NCCFr than in the ESTER corpus, while the use of double negation, an indicator of formal speech, is less frequent. In general, these estimates of casualness are constant through the three parts of the recording sessions and across speakers. Based on these facts, we conclude that our corpus is a rich resource of highly casual speech, and that it can be effectively exploited by researchers in language science and technology.Files private
Request files -
Torreira, F., & Ernestus, M. (2010). The Nijmegen corpus of casual Spanish. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (
Eds. ), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10) (pp. 2981-2985). Paris: European Language Resources Association (ELRA).Abstract
This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. Casual speech was elicited during three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. Information about how to obtain a copy of the corpus can be found online at http://mirjamernestus.ruhosting.nl/Ernestus/NCCSp -
Van de Ven, M., Tucker, B. V., & Ernestus, M. (2010). Semantic facilitation in bilingual everyday speech comprehension. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (Interspeech 2010), Makuhari, Japan (pp. 1245-1248).
Abstract
Previous research suggests that bilinguals presented with low and high predictability sentences benefit from semantics in clear but not in conversational speech [1]. In everyday speech, however, many words are not highly predictable. Previous research has shown that native listeners can use also more subtle semantic contextual information [2]. The present study reports two auditory lexical decision experiments investigating to what extent late Asian-English bilinguals benefit from subtle semantic cues in their processing of English unreduced and reduced speech. Our results indicate that these bilinguals are less sensitive to semantic cues than native listeners for both speech registers. -
Ernestus, M. (2006). Statistically gradient generalizations for contrastive phonological features. The Linguistic Review, 23(3), 217-233. doi:10.1515/TLR.2006.008.
Abstract
In mainstream phonology, contrastive properties, like stem-final voicing, are simply listed in the lexicon. This article reviews experimental evidence that such contrastive properties may be predictable to some degree and that the relevant statistically gradient generalizations form an inherent part of the grammar. The evidence comes from the underlying voice specification of stem-final obstruents in Dutch. Contrary to received wisdom, this voice specification is partly predictable from the obstruent’s manner and place of articulation and from the phonological properties of the preceding segments. The degree of predictability, which depends on the exact contents of the lexicon, directs speakers’ guesses of underlying voice specifications. Moreover, existing words that disobey the generalizations are disadvantaged by being recognized and produced more slowly and less accurately, also under natural conditions.We discuss how these observations can be accounted for in two types of different approaches to grammar, Stochastic Optimality Theory and exemplar-based modeling. -
Ernestus, M., & Baayen, R. H. (2006). The functionality of incomplete neutralization in Dutch: The case of past-tense formation. In L. Goldstein, D. Whalen, & C. Best (
Eds. ), Laboratory Phonology 8 (pp. 27-49). Berlin: Mouton de Gruyter. -
Ernestus, M., Lahey, M., Verhees, F., & Baayen, R. H. (2006). Lexical frequency and voice assimilation. Journal of the Acoustical Society of America, 120(2), 1040-1051. doi:10.1121/1.2211548.
Abstract
Acoustic duration and degree of vowel reduction are known to correlate with a word’s frequency of occurrence. The present study broadens the research on the role of frequency in speech production to voice assimilation. The test case was regressive voice assimilation in Dutch. Clusters from a corpus of read speech were more often perceived as unassimilated in lower-frequency words and as either completely voiced regressive assimilation or, unexpectedly, as completely voiceless progressive assimilation in higher-frequency words. Frequency did not predict the voice classifications over and above important acoustic cues to voicing, suggesting that the frequency effects on the classifications were carried exclusively by the acoustic signal. The duration of the cluster and the period of glottal vibration during the cluster decreased while the duration of the release noises increased with frequency. This indicates that speakers reduce articulatory effort for higher-frequency words, with some acoustic cues signaling more voicing and others less voicing. A higher frequency leads not only to acoustic reduction but also to more assimilation. -
Kuzla, C., Mitterer, H., Ernestus, M., & Cutler, A. (2006). Perceptual compensation for voice assimilation of German fricatives. In P. Warren, & I. Watson (
Eds. ), Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 394-399).Abstract
In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: Compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts. -
Kuzla, C., Ernestus, M., & Mitterer, H. (2006). Prosodic structure affects the production and perception of voice-assimilated German fricatives. In R. Hoffmann, & H. Mixdorff (
Eds. ), Speech prosody 2006. Dresden: TUD Press.Abstract
Prosodic structure has long been known to constrain phonological processes [1]. More recently, it has also been recognized as a source of fine-grained phonetic variation of speech sounds. In particular, segments in domain-initial position undergo prosodic strengthening [2, 3], which also implies more resistance to coarticulation in higher prosodic domains [5]. The present study investigates the combined effects of prosodic strengthening and assimilatory devoicing on word-initial fricatives in German, the functional implication of both processes for cues to the fortis-lenis contrast, and the influence of prosodic structure on listeners’ compensation for assimilation. Results indicate that 1. Prosodic structure modulates duration and the degree of assimilatory devoicing, 2. Phonological contrasts are maintained by speakers, but differ in phonetic detail across prosodic domains, and 3. Compensation for assimilation in perception is moderated by prosodic structure and lexical constraints. -
Kuzla, C., Mitterer, H., & Ernestus, M. (2006). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 43-44).
-
Mitterer, H., & Ernestus, M. (2006). Listeners recover /t/s that speakers reduce: Evidence from /t/-lenition in Dutch. Journal of Phonetics, 34(1), 73-103. doi:10.1016/j.wocn.2005.03.003.
Abstract
In everyday speech, words may be reduced. Little is known about the consequences of such reductions for spoken word comprehension. This study investigated /t/-lenition in Dutch in two corpus studies and three perceptual experiments. The production studies revealed that /t/-lenition is most likely to occur after [s] and before bilabial consonants. The perception experiments showed that listeners take into account both phonological context, phonetic detail, and the lexical status of the form in the interpretation of codas that may or may not contain a lenited word-final /t/. These results speak against models of word recognition that make hard decisions on a prelexical level. -
Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2006). The role of morphology in fine phonetic detail: The case of Dutch -igheid. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 53-54).
-
Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2006). Effects of word frequency on the acoustic durations of affixes. In Proceedings of Interspeech 2006 (pp. 953-956). Pittsburgh: ICSLP.
Abstract
This study investigates whether the acoustic durations of derivational affixes in Dutch are affected by the frequency of the word they occur in. In a word naming experiment, subjects were presented with a large number of words containing one of the affixes ge-, ver-, ont, or -lijk. Their responses were recorded on DAT tapes, and the durations of the affixes were measured using Automatic Speech Recognition technology. To investigate whether frequency also affected durations when speech rate was high, the presentation rate of the stimuli was varied. The results show that a higher frequency of the word as a whole led to shorter acoustic realizations for all affixes. Furthermore, affixes became shorter as the presentation rate of the stimuli increased. There was no interaction between word frequency and presentation rate, suggesting that the frequency effect also applies in situations in which the speed of articulation is very high. -
Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).
Abstract
This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals. -
Wagner, A., Ernestus, M., & Cutler, A. (2006). Formant transitions in fricative identification: The role of native fricative inventory. Journal of the Acoustical Society of America, 120(4), 2267-2277. doi:10.1121/1.2335422.
Abstract
The distribution of energy across the noise spectrum provides the primary cues for the identification of a fricative. Formant transitions have been reported to play a role in identification of some fricatives, but the combined results so far are conflicting. We report five experiments testing the hypothesis that listeners differ in their use of formant transitions as a function of the presence of spectrally similar fricatives in their native language. Dutch, English, German, Polish, and Spanish native listeners performed phoneme monitoring experiments with pseudowords containing either coherent or misleading formant transitions for the fricatives / s / and / f /. Listeners of German and Dutch, both languages without spectrally similar fricatives, were not affected by the misleading formant transitions. Listeners of the remaining languages were misled by incorrect formant transitions. In an untimed labeling experiment both Dutch and Spanish listeners provided goodness ratings that revealed sensitivity to the acoustic manipulation. We conclude that all listeners may be sensitive to mismatching information at a low auditory level, but that they do not necessarily take full advantage of all available systematic acoustic variation when identifying phonemes. Formant transitions may be most useful for listeners of languages with spectrally similar fricatives. -
Wurm, L. H., Ernestus, M., Schreuder, R., & Baayen, R. H. (2006). Dynamics of the auditory comprehension of prefixed words: Cohort entropies and conditional root uniqueness points. The Mental Lexicon, 1(1), 125-146.
Abstract
This auditory lexical decision study shows that cohort entropies, conditional root uniqueness points, and morphological family size all contribute to the dynamics of the auditory comprehension of prefixed words. Three entropy measures calculated for different positions in the stem of Dutch prefixed words revealed facilitation for higher entropies, except at the point of disambiguation, where we observed inhibition. Morphological family size was also facilitatory, but only for prefixed words in which the conditional root uniqueness point coincided with the conventional uniqueness point. For words with early conditional disambiguation, in contrast, only the morphologically related words that were onset-aligned with the target word facilitated lexical decision. -
Ernestus, M., & Mak, W. M. (2004). Distinctive phonological features differ in relevance for both spoken and written word recognition. Brain and Language, 90(1-3), 378-392. doi:10.1016/S0093-934X(03)00449-8.
Abstract
This paper discusses four experiments on Dutch which show that distinctive phonological features differ in their relevance for word recognition. The relevance of a feature for word recognition depends on its phonological stability, that is, the extent to which that feature is generally realized in accordance with its lexical specification in the relevant word position. If one feature value is uninformative, all values of that feature are less relevant for word recognition, with the least informative feature being the least relevant. Features differ in their relevance both in spoken and written word recognition, though the differences are more pronounced in auditory lexical decision than in self-paced reading. -
Ernestus, M., & Baayen, R. H. (2004). Analogical effects in regular past tense production in Dutch. Linguistics, 42(5), 873-903. doi:10.1515/ling.2004.031.
Abstract
This study addresses the question to what extent the production of regular past tense forms in Dutch is a¤ected by analogical processes. We report an experiment in which native speakers of Dutch listened to existing regular verbs over headphones, and had to indicate which of the past tense allomorphs, te or de, was appropriate for these verbs. According to generative analyses, the choice between the two su‰xes is completely regular and governed by the underlying [voice]-specification of the stem-final segment. In this approach, no analogical e¤ects are expected. In connectionist and analogical approaches, by contrast, the phonological similarity structure in the lexicon is expected to a¤ect lexical processing. Our experimental results support the latter approach: all participants created more nonstandard past tense forms, produced more inconsistency errors, and responded more slowly for verbs with stronger analogical support for the nonstandard form. -
Ernestus, M., & Baayen, R. H. (2004). Kuchde, tobte, en turfte: Lekkage in 't kofschip. Onze Taal, 73(12), 360-361.
-
Kemps, R. J. J. K., Ernestus, M., Schreuder, R., & Baayen, R. H. (2004). Processing reduced word forms: The suffix restoration effect. Brain and Language, 90(1-3), 117-127. doi:10.1016/S0093-934X(03)00425-5.
Abstract
Listeners cannot recognize highly reduced word forms in isolation, but they can do so when these forms are presented in context (Ernestus, Baayen, & Schreuder, 2002). This suggests that not all possible surface forms of words have equal status in the mental lexicon. The present study shows that the reduced forms are linked to the canonical representations in the mental lexicon, and that these latter representations induce reconstruction processes. Listeners restore suffixes that are partly or completely missing in reduced word forms. A series of phoneme-monitoring experiments reveals the nature of this restoration: the basis for suffix restoration is mainly phonological in nature, but orthography has an influence as well. -
Moscoso del Prado Martín, F., Ernestus, M., & Baayen, R. H. (2004). Do type and token effects reflect different mechanisms? Connectionist modeling of Dutch past-tense formation and final devoicing. Brain and Language, 90(1-3), 287-298. doi:10.1016/j.bandl.2003.12.002.
Abstract
In this paper, we show that both token and type-based effects in lexical processing can result from a single, token-based, system, and therefore, do not necessarily reflect different levels of processing. We report three Simple Recurrent Networks modeling Dutch past-tense formation. These networks show token-based frequency effects and type-based analogical effects closely matching the behavior of human participants when producing past-tense forms for both existing verbs and pseudo-verbs. The third network covers the full vocabulary of Dutch, without imposing predefined linguistic structure on the input or output words. -
Rietveld, T., Van Hout, R., & Ernestus, M. (2004). Pitfalls in corpus research. Computers and the Humanities, 38(4), 343-362. doi:10.1007/s10579-004-1919-1.
Abstract
This paper discusses some pitfalls in corpus research and suggests solutions on the basis of examples and computer simulations. We first address reliability problems in language transcriptions, agreement between transcribers, and how disagreements can be dealt with. We then show that the frequencies of occurrence obtained from a corpus cannot always be analyzed with the traditional X2 test, as corpus data are often not sequentially independent and unit independent. Next, we stress the relevance of the power of statistical tests, and the sizes of statistically significant effects. Finally, we point out that a t-test based on log odds often provides a better alternative to a X2 analysis based on frequency counts.
Share this page