Anne Cutler

Publications

Displaying 1 - 100 of 403
  • Asano, Y., Yuan, C., Grohe, A.-K., Weber, A., Antoniou, M., & Cutler, A. (2020). Uptalk interpretation as a function of listening experience. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 735-739). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-150.

    Abstract

    The term “uptalk” describes utterance-final pitch rises that carry no sentence-structural information. Uptalk is usually dialectal or sociolectal, and Australian English (AusEng) is particularly known for this attribute. We ask here whether experience with an uptalk variety affects listeners’ ability to categorise rising pitch contours on the basis of the timing and height of their onset and offset. Listeners were two groups of English-speakers (AusEng, and American English), and three groups of listeners with L2 English: one group with Mandarin as L1 and experience of listening to AusEng, one with German as L1 and experience of listening to AusEng, and one with German as L1 but no AusEng experience. They heard nouns (e.g. flower, piano) in the framework “Got a NOUN”, each ending with a pitch rise artificially manipulated on three contrasts: low vs. high rise onset, low vs. high rise offset and early vs. late rise onset. Their task was to categorise the tokens as “question” or “statement”, and we analysed the effect of the pitch contrasts on their judgements. Only the native AusEng listeners were able to use the pitch contrasts systematically in making these categorisations.
  • Bruggeman, L., & Cutler, A. (2020). No L1 privilege in talker adaptation. Bilingualism: Language and Cognition, 23(3), 681-693. doi:10.1017/S1366728919000646.

    Abstract

    As a rule, listening is easier in first (L1) than second languages (L2); difficult L2 listening can challenge even highly proficient users. We here examine one particular listening function, adaptation to novel talkers, in such a high-proficiency population: Dutch emigrants to Australia, predominantly using English outside the family, but all also retaining L1 proficiency. Using lexically-guided perceptual learning (Norris, McQueen & Cutler, 2003), we investigated these listeners’ adaptation to an ambiguous speech sound, in parallel experiments in both their L1 and their L2. A control study established that perceptual learning outcomes were unaffected by the procedural measures required for this double comparison. The emigrants showed equivalent proficiency in tests in both languages, robust perceptual adaptation in their L2, English, but no adaptation in L1. We propose that adaptation to novel talkers is a language-specific skill requiring regular novel practice; a limited set of known (family) interlocutors cannot meet this requirement.
  • Ip, M. H. K., & Cutler, A. (2020). Universals of listening: Equivalent prosodic entrainment in tone and non-tone languages. Cognition, 202: 104311. doi:10.1016/j.cognition.2020.104311.

    Abstract

    In English and Dutch, listeners entrain to prosodic contours to predict where focus will fall in an utterance. Here, we ask whether this strategy is universally available, even in languages with very different phonological systems (e.g., tone versus non-tone languages). In a phoneme detection experiment, we examined whether prosodic entrainment also occurs in Mandarin Chinese, a tone language, where the use of various suprasegmental cues to lexical identity may take precedence over their use in salience. Consistent with the results from Germanic languages, response times were facilitated when preceding intonation predicted high stress on the target-bearing word, and the lexical tone of the target word (i.e., rising versus falling) did not affect the Mandarin listeners' response. Further, the extent to which prosodic entrainment was used to detect the target phoneme was the same in both English and Mandarin listeners. Nevertheless, native Mandarin speakers did not adopt an entrainment strategy when the sentences were presented in English, consistent with the suggestion that L2 listening may be strained by additional functional load from prosodic processing. These findings have implications for how universal and language-specific mechanisms interact in the perception of focus structure in everyday discourse.

    Supplementary material

    supplementary data
  • Yu, J., Mailhammer, R., & Cutler, A. (2020). Vocabulary structure affects word recognition: Evidence from German listeners. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 474-478). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-97.

    Abstract

    Lexical stress is realised similarly in English, German, and Dutch. On a suprasegmental level, stressed syllables tend to be longer and more acoustically salient than unstressed syllables; segmentally, vowels in unstressed syllables are often reduced. The frequency of unreduced unstressed syllables (where only the suprasegmental cues indicate lack of stress) however, differs across the languages. The present studies test whether listener behaviour is affected by these vocabulary differences, by investigating German listeners’ use of suprasegmental cues to lexical stress in German and English word recognition. In a forced-choice identification task, German listeners correctly assigned single-syllable fragments (e.g., Kon-) to one of two words differing in stress (KONto, konZEPT). Thus, German listeners can exploit suprasegmental information for identifying words. German listeners also performed above chance in a similar task in English (with, e.g., DIver, diVERT), i.e., their sensitivity to these cues also transferred to a nonnative language. An English listener group, in contrast, failed in the English fragment task. These findings mirror vocabulary patterns: German has more words with unreduced unstressed syllables than English does.
  • Mandal, S., Best, C. T., Shaw, J., & Cutler, A. (2020). Bilingual phonology in dichotic perception: A case study of Malayalam and English voicing. Glossa: A Journal of General Linguistics, 5(1): 73. doi:10.5334/gjgl.853.

    Abstract

    Listeners often experience cocktail-party situations, encountering multiple ongoing conversa- tions while tracking just one. Capturing the words spoken under such conditions requires selec- tive attention and processing, which involves using phonetic details to discern phonological structure. How do bilinguals accomplish this in L1-L2 competition? We addressed that question using a dichotic listening task with fluent Malayalam-English bilinguals, in which they were pre- sented with synchronized nonce words, one in each language in separate ears, with competing onsets of a labial stop (Malayalam) and a labial fricative (English), both voiced or both voiceless. They were required to attend to the Malayalam or the English item, in separate blocks, and report the initial consonant they heard. We found that perceptual intrusions from the unattended to the attended language were influenced by voicing, with more intrusions on voiced than voiceless tri- als. This result supports our proposal for the feature specification of consonants in Malayalam- English bilinguals, which makes use of privative features, underspecification and the “standard approach” to laryngeal features, as against “laryngeal realism”. Given this representational account, we observe that intrusions result from phonetic properties in the unattended signal being assimilated to the closest matching phonological category in the attended language, and are more likely for segments with a greater number of phonological feature specifications.
  • Ullas, S., Formisano, E., Eisner, F., & Cutler, A. (2020). Interleaved lexical and audiovisual information can retune phoneme boundaries. Attention, Perception & Psychophysics. Advance online publication. doi:10.3758/s13414-019-01961-8.

    Abstract

    To adapt to situations in which speech perception is difficult, listeners can adjust boundaries between phoneme categories using perceptual learning. Such adjustments can draw on lexical information in surrounding speech, or on visual cues via speech-reading. In the present study, listeners proved they were able to flexibly adjust the boundary between two plosive/stop consonants, /p/-/t/, using both lexical and speech-reading information and given the same experimental design for both cue types. Videos of a speaker pronouncing pseudo-words and audio recordings of Dutch words were presented in alternating blocks of either stimulus type. Listeners were able to switch between cues to adjust phoneme boundaries, and resulting effects were comparable to results from listeners receiving only a single source of information. Overall, audiovisual cues (i.e., the videos) produced the stronger effects, commensurate with their applicability for adapting to noisy environments. Lexical cues were able to induce effects with fewer exposure stimuli and a changing phoneme bias, in a design unlike most prior studies of lexical retuning. While lexical retuning effects were relatively weaker compared to audiovisual recalibration, this discrepancy could reflect how lexical retuning may be more suitable for adapting to speakers than to environments. Nonetheless, the presence of the lexical retuning effects suggests that it may be invoked at a faster rate than previously seen. In general, this technique has further illuminated the robustness of adaptability in speech perception, and offers the potential to enable further comparisons across differing forms of perceptual learning.
  • Ullas, S., Formisano, E., Eisner, F., & Cutler, A. (2020). Audiovisual and lexical cues do not additively enhance perceptual adaptation. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-020-01728-5.

    Abstract

    When listeners experience difficulty in understanding a speaker, lexical and audiovisual (or lipreading) information can be a helpful source of guidance. These two types of information embedded in speech can also guide perceptual adjustment, also known as recalibration or perceptual retuning. With retuning or recalibration, listeners can use these contextual cues to temporarily or permanently reconfigure internal representations of phoneme categories to adjust to and understand novel interlocutors more easily. These two types of perceptual learning, previously investigated in large part separately, are highly similar in allowing listeners to use speech-external information to make phoneme boundary adjustments. This study explored whether the two sources may work in conjunction to induce adaptation, thus emulating real life, in which listeners are indeed likely to encounter both types of cue together. Listeners who received combined audiovisual and lexical cues showed perceptual learning effects similar to listeners who only received audiovisual cues, while listeners who received only lexical cues showed weaker effects compared with the two other groups. The combination of cues did not lead to additive retuning or recalibration effects, suggesting that lexical and audiovisual cues operate differently with regard to how listeners use them for reshaping perceptual categories. Reaction times did not significantly differ across the three conditions, so none of the forms of adjustment were either aided or hindered by processing time differences. Mechanisms underlying these forms of perceptual learning may diverge in numerous ways despite similarities in experimental applications.

    Supplementary material

    Data and materials
  • Bruggeman, L., & Cutler, A. (2019). The dynamics of lexical activation and competition in bilinguals’ first versus second language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1342-1346). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Speech input causes listeners to activate multiple candidate words which then compete with one another. These include onset competitors, that share a beginning (bumper, butter), but also, counterintuitively, rhyme competitors, sharing an ending (bumper, jumper). In L1, competition is typically stronger for onset than for rhyme. In L2, onset competition has been attested but rhyme competition has heretofore remained largely unexamined. We assessed L1 (Dutch) and L2 (English) word recognition by the same late-bilingual individuals. In each language, eye gaze was recorded as listeners heard sentences and viewed sets of drawings: three unrelated, one depicting an onset or rhyme competitor of a word in the input. Activation patterns revealed substantial onset competition but no significant rhyme competition in either L1 or L2. Rhyme competition may thus be a “luxury” feature of maximally efficient listening, to be abandoned when resources are scarcer, as in listening by late bilinguals, in either language.
  • Cutler, A., Burchfield, A., & Antoniou, M. (2019). A criterial interlocutor tally for successful talker adaptation? In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1485-1489). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Part of the remarkable efficiency of listening is accommodation to unfamiliar talkers’ specific pronunciations by retuning of phonemic intercategory boundaries. Such retuning occurs in second (L2) as well as first language (L1); however, recent research with emigrés revealed successful adaptation in the environmental L2 but, unprecedentedly, not in L1 despite continuing L1 use. A possible explanation involving relative exposure to novel talkers is here tested in heritage language users with Mandarin as family L1 and English as environmental language. In English, exposure to an ambiguous sound in disambiguating word contexts prompted the expected adjustment of phonemic boundaries in subsequent categorisation. However, no adjustment occurred in Mandarin, again despite regular use. Participants reported highly asymmetric interlocutor counts in the two languages. We conclude that successful retuning ability requires regular exposure to novel talkers in the language in question, a criterion not met for the emigrés’ or for these heritage users’ L1.
  • Joo, H., Jang, J., Kim, S., Cho, T., & Cutler, A. (2019). Prosodic structural effects on coarticulatory vowel nasalization in Australian English in comparison to American English. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 835-839). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study investigates effects of prosodic factors (prominence, boundary) on coarticulatory Vnasalization in Australian English (AusE) in CVN and NVC in comparison to those in American English (AmE). As in AmE, prominence was found to lengthen N, but to reduce V-nasalization, enhancing N’s nasality and V’s orality, respectively (paradigmatic contrast enhancement). But the prominence effect in CVN was more robust than that in AmE. Again similar to findings in AmE, boundary induced a reduction of N-duration and V-nasalization phrase-initially (syntagmatic contrast enhancement), and increased the nasality of both C and V phrasefinally. But AusE showed some differences in terms of the magnitude of V nasalization and N duration. The results suggest that the linguistic contrast enhancements underlie prosodic-structure modulation of coarticulatory V-nasalization in comparable ways across dialects, while the fine phonetic detail indicates that the phonetics-prosody interplay is internalized in the individual dialect’s phonetic grammar.
  • Kember, H., Choi, J., Yu, J., & Cutler, A. (2019). The processing of linguistic prominence. Language and Speech. Advance online publication. doi:10.1177/0023830919880217.

    Abstract

    Prominence, the expression of informational weight within utterances, can be signaled by prosodic highlighting (head-prominence, as in English) or by position (as in Korean edge-prominence). Prominence confers processing advantages, even if conveyed only by discourse manipulations. Here we compared processing of prominence in English and Korean, using a task that indexes processing success, namely recognition memory. In each language, participants’ memory was tested for target words heard in sentences in which they were prominent due to prosody, position, both or neither. Prominence produced recall advantage, but the relative effects differed across language. For Korean listeners the positional advantage was greater, but for English listeners prosodic and syntactic prominence had equivalent and additive effects. In a further experiment semantic and phonological foils tested depth of processing of the recall targets. Both foil types were correctly rejected, suggesting that semantic processing had not reached the level at which word form was no longer available. Together the results suggest that prominence processing is primarily driven by universal effects of information structure; but language-specific differences in frequency of experience prompt different relative advantages of prominence signal types. Processing efficiency increases in each case, however, creating more accurate and more rapidly contactable memory representations.
  • Nazzi, T., & Cutler, A. (2019). How consonants and vowels shape spoken-language recognition. Annual Review of Linguistics, 5, 25-47. doi:10.1146/annurev-linguistics-011718-011919.

    Abstract

    All languages instantiate a consonant/vowel contrast. This contrast has processing consequences at different levels of spoken-language recognition throughout the lifespan. In adulthood, lexical processing is more strongly associated with consonant than with vowel processing; this has been demonstrated across 13 languages from seven language families and in a variety of auditory lexical-level tasks (deciding whether a spoken input is a word, spotting a real word embedded in a minimal context, reconstructing a word minimally altered into a pseudoword, learning new words or the “words” of a made-up language), as well as in written-word tasks involving phonological processing. In infancy, a consonant advantage in word learning and recognition is found to emerge during development in some languages, though possibly not in others, revealing that the stronger lexicon–consonant association found in adulthood is learned. Current research is evaluating the relative contribution of the early acquisition of the acoustic/phonetic and lexical properties of the native language in the emergence of this association
  • Choi, J., Broersma, M., & Cutler, A. (2018). Phonetic learning is not enhanced by sequential exposure to more than one language. Linguistic Research, 35(3), 567-581. doi:10.17250/khisli.35.3.201812.006.

    Abstract

    Several studies have documented that international adoptees, who in early years have experienced a change from a language used in their birth country to a new language in an adoptive country, benefit from the limited early exposure to the birth language when relearning that language’s sounds later in life. The adoptees’ relearning advantages have been argued to be conferred by lasting birth-language knowledge obtained from the early exposure. However, it is also plausible to assume that the advantages may arise from adoptees’ superior ability to learn language sounds in general, as a result of their unusual linguistic experience, i.e., exposure to multiple languages in sequence early in life. If this is the case, then the adoptees’ relearning benefits should generalize to previously unheard language sounds, rather than be limited to their birth-language sounds. In the present study, adult Korean adoptees in the Netherlands and matched Dutch-native controls were trained on identifying a Japanese length distinction to which they had never been exposed before. The adoptees and Dutch controls did not differ on any test carried out before, during, or after the training, indicating that observed adoptee advantages for birth-language relearning do not generalize to novel, previously unheard language sounds. The finding thus fails to support the suggestion that birth-language relearning advantages may arise from enhanced ability to learn language sounds in general conferred by early experience in multiple languages. Rather, our finding supports the original contention that such advantages involve memory traces obtained before adoption
  • Ip, M. H. K., & Cutler, A. (2018). Asymmetric efficiency of juncture perception in L1 and L2. In K. Klessa, J. Bachan, A. Wagner, M. Karpiński, & D. Śledziński (Eds.), Proceedings of Speech Prosody 2018 (pp. 289-296). Baixas, France: ISCA. doi:10.21437/SpeechProsody.2018-59.

    Abstract

    In two experiments, Mandarin listeners resolved potential syntactic ambiguities in spoken utterances in (a) their native language (L1) and (b) English which they had learned as a second language (L2). A new disambiguation task was used, requiring speeded responses to select the correct meaning for structurally ambiguous sentences. Importantly, the ambiguities used in the study are identical in Mandarin and in English, and production data show that prosodic disambiguation of this type of ambiguity is also realised very similarly in the two languages. The perceptual results here showed however that listeners’ response patterns differed for L1 and L2, although there was a significant increase in similarity between the two response patterns with increasing exposure to the L2. Thus identical ambiguity and comparable disambiguation patterns in L1 and L2 do not lead to immediate application of the appropriate L1 listening strategy to L2; instead, it appears that such a strategy may have to be learned anew for the L2.
  • Cutler, A., & Farrell, J. (2018). Listening in first and second language. In J. I. Liontas (Ed.), The TESOL encyclopedia of language teaching. New York: Wiley. doi:10.1002/9781118784235.eelt0583.

    Abstract

    Listeners' recognition of spoken language involves complex decoding processes: The continuous speech stream must be segmented into its component words, and words must be recognized despite great variability in their pronunciation (due to talker differences, or to influence of phonetic context, or to speech register) and despite competition from many spuriously present forms supported by the speech signal. L1 listeners deal more readily with all levels of this complexity than L2 listeners. Fortunately, the decoding processes necessary for competent L2 listening can be taught in the classroom. Evidence-based methodologies targeted at the development of efficient speech decoding include teaching of minimal pairs, of phonotactic constraints, and of reduction processes, as well as the use of dictation and L2 video captions.
  • Ip, M. H. K., & Cutler, A. (2018). Cue equivalence in prosodic entrainment for focus detection. In J. Epps, J. Wolfe, J. Smith, & C. Jones (Eds.), Proceedings of the 17th Australasian International Conference on Speech Science and Technology (pp. 153-156).

    Abstract

    Using a phoneme detection task, the present series of experiments examines whether listeners can entrain to different combinations of prosodic cues to predict where focus will fall in an utterance. The stimuli were recorded by four female native speakers of Australian English who happened to have used different prosodic cues to produce sentences with prosodic focus: a combination of duration cues, mean and maximum F0, F0 range, and longer pre-target interval before the focused word onset, only mean F0 cues, only pre-target interval, and only duration cues. Results revealed that listeners can entrain in almost every condition except for where duration was the only reliable cue. Our findings suggest that listeners are flexible in the cues they use for focus processing.
  • Cutler, A., Burchfield, L. A., & Antoniou, M. (2018). Factors affecting talker adaptation in a second language. In J. Epps, J. Wolfe, J. Smith, & C. Jones (Eds.), Proceedings of the 17th Australasian International Conference on Speech Science and Technology (pp. 33-36).

    Abstract

    Listeners adapt rapidly to previously unheard talkers by adjusting phoneme categories using lexical knowledge, in a process termed lexically-guided perceptual learning. Although this is firmly established for listening in the native language (L1), perceptual flexibility in second languages (L2) is as yet less well understood. We report two experiments examining L1 and L2 perceptual learning, the first in Mandarin-English late bilinguals, the second in Australian learners of Mandarin. Both studies showed stronger learning in L1; in L2, however, learning appeared for the English-L1 group but not for the Mandarin-L1 group. Phonological mapping differences from the L1 to the L2 are suggested as the reason for this result.
  • Johnson, E. K., Bruggeman, L., & Cutler, A. (2018). Abstraction and the (misnamed) language familiarity effect. Cognitive Science, 42, 633-645. doi:10.1111/cogs.12520.

    Abstract

    Talkers are recognized more accurately if they are speaking the listeners’ native language rather than an unfamiliar language. This “language familiarity effect” has been shown not to depend upon comprehension and must instead involve language sound patterns. We further examine the level of sound-pattern processing involved, by comparing talker recognition in foreign languages versus two varieties of English, by (a) English speakers of one variety, (b) English speakers of the other variety, and (c) non-native listeners (more familiar with one of the varieties). All listener groups performed better with native than foreign speech, but no effect of language variety appeared: Native listeners discriminated talkers equally well in each, with the native variety never outdoing the other variety, and non-native listeners discriminated talkers equally poorly in each, irrespective of the variety's familiarity. The results suggest that this talker recognition effect rests not on simple familiarity, but on an abstract level of phonological processing
  • Kidd, E., Junge, C., Spokes, T., Morrison, L., & Cutler, A. (2018). Individual differences in infant speech segmentation: Achieving the lexical shift. Infancy, 23(6), 770-794. doi:10.1111/infa.12256.

    Abstract

    We report a large‐scale electrophysiological study of infant speech segmentation, in which over 100 English‐acquiring 9‐month‐olds were exposed to unfamiliar bisyllabic words embedded in sentences (e.g., He saw a wild eagle up there), after which their brain responses to either the just‐familiarized word (eagle) or a control word (coral) were recorded. When initial exposure occurs in continuous speech, as here, past studies have reported that even somewhat older infants do not reliably recognize target words, but that successful segmentation varies across children. Here, we both confirm and further uncover the nature of this variation. The segmentation response systematically varied across individuals and was related to their vocabulary development. About one‐third of the group showed a left‐frontally located relative negativity in response to familiar versus control targets, which has previously been described as a mature response. Another third showed a similarly located positive‐going reaction (a previously described immature response), and the remaining third formed an intermediate grouping that was primarily characterized by an initial response delay. A fine‐grained group‐level analysis suggested that a developmental shift to a lexical mode of processing occurs toward the end of the first year, with variation across individual infants in the exact timing of this shift.

    Supplementary material

    supporting information
  • Norris, D., McQueen, J. M., & Cutler, A. (2018). Commentary on “Interaction in spoken word recognition models". Frontiers in Psychology, 9: 1568. doi:10.3389/fpsyg.2018.01568.
  • Burchfield, L. A., Luk, S.-.-H.-K., Antoniou, M., & Cutler, A. (2017). Lexically guided perceptual learning in Mandarin Chinese. In Proceedings of Interspeech 2017 (pp. 576-580). doi:10.21437/Interspeech.2017-618.

    Abstract

    Lexically guided perceptual learni ng refers to the use of lexical knowledge to retune sp eech categories and thereby adapt to a novel talker’s pronunciation. This adaptation has been extensively documented, but primarily for segmental-based learning in English and Dutch. In languages with lexical tone, such as Mandarin Chinese, tonal categories can also be retuned in this way, but segmental category retuning had not been studied. We report two experiment s in which Mandarin Chinese listeners were exposed to an ambiguous mixture of [f] and [s] in lexical contexts favoring an interpretation as either [f] or [s]. Listeners were subsequently more likely to identify sounds along a continuum between [f] and [s], and to interpret minimal word pairs, in a manner consistent with this exposure. Thus lexically guided perceptual learning of segmental categories had indeed taken place, consistent with suggestions that such learning may be a universally available adaptation process
  • Choi, J., Cutler, A., & Broersma, M. (2017). Early development of abstract language knowledge: Evidence from perception-production transfer of birth-language memory. Royal Society Open Science, 4: 160660. doi:10.1098/rsos.160660.

    Abstract

    Children adopted early in life into another linguistic community typically forget their birth language but retain, unaware, relevant linguistic knowledge that may facilitate (re)learning of birth-language patterns. Understanding the nature of this knowledge can shed light on how language is acquired. Here, international adoptees from Korea with Dutch as their current language, and matched Dutch-native controls, provided speech production data on a Korean consonantal distinction unlike any Dutch distinctions, at the outset and end of an intensive perceptual training. The productions, elicited in a repetition task, were identified and rated by Korean listeners. Adoptees' production scores improved significantly more across the training period than control participants' scores, and, for adoptees only, relative production success correlated significantly with the rate of learning in perception (which had, as predicted, also surpassed that of the controls). Of the adoptee group, half had been adopted at 17 months or older (when talking would have begun), while half had been prelinguistic (under six months). The former group, with production experience, showed no advantage over the group without. Thus the adoptees' retained knowledge of Korean transferred from perception to production and appears to be abstract in nature rather than dependent on the amount of experience.
  • Choi, J., Broersma, M., & Cutler, A. (2017). Early phonology revealed by international adoptees' birth language retention. Proceedings of the National Academy of Sciences of the United States of America, 114(28), 7307-7312. doi:10.1073/pnas.1706405114.

    Abstract

    Until at least 6 mo of age, infants show good discrimination for familiar phonetic contrasts (i.e., those heard in the environmental language) and contrasts that are unfamiliar. Adult-like discrimination (significantly worse for nonnative than for native contrasts) appears only later, by 9–10 mo. This has been interpreted as indicating that infants have no knowledge of phonology until vocabulary development begins, after 6 mo of age. Recently, however, word recognition has been observed before age 6 mo, apparently decoupling the vocabulary and phonology acquisition processes. Here we show that phonological acquisition is also in progress before 6 mo of age. The evidence comes from retention of birth-language knowledge in international adoptees. In the largest ever such study, we recruited 29 adult Dutch speakers who had been adopted from Korea when young and had no conscious knowledge of Korean language at all. Half were adopted at age 3–5 mo (before native-specific discrimination develops) and half at 17 mo or older (after word learning has begun). In a short intensive training program, we observe that adoptees (compared with 29 matched controls) more rapidly learn tripartite Korean consonant distinctions without counterparts in their later-acquired Dutch, suggesting that the adoptees retained phonological knowledge about the Korean distinction. The advantage is equivalent for the younger-adopted and the older-adopted groups, and both groups not only acquire the tripartite distinction for the trained consonants but also generalize it to untrained consonants. Although infants younger than 6 mo can still discriminate unfamiliar phonetic distinctions, this finding indicates that native-language phonological knowledge is nonetheless being acquired at that age.
  • Cutler, A. (2017). Converging evidence for abstract phonological knowledge in speech processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1447-1448). Austin, TX: Cognitive Science Society.

    Abstract

    The perceptual processing of speech is a constant interplay of multiple competing albeit convergent processes: acoustic input vs. higher-level representations, universal mechanisms vs. language-specific, veridical traces of speech experience vs. construction and activation of abstract representations. The present summary concerns the third of these issues. The ability to generalise across experience and to deal with resulting abstractions is the hallmark of human cognition, visible even in early infancy. In speech processing, abstract representations play a necessary role in both production and perception. New sorts of evidence are now informing our understanding of the breadth of this role.
  • Ip, M. H. K., & Cutler, A. (2017). Intonation facilitates prediction of focus even in the presence of lexical tones. In Proceedings of Interspeech 2017 (pp. 1218-1222). doi:10.21437/Interspeech.2017-264.

    Abstract

    In English and Dutch, listeners entrain to prosodic contours to predict where focus will fall in an utterance. However, is this strategy universally available, even in languages with different phonological systems? In a phoneme detection experiment, we examined whether prosodic entrainment is also found in Mandarin Chinese, a tone language, where in principle the use of pitch for lexical identity may take precedence over the use of pitch cues to salience. Consistent with the results from Germanic languages, response times were facilitated when preceding intonation predicted accent on the target-bearing word. Acoustic analyses revealed greater F0 range in the preceding intonation of the predicted-accent sentences. These findings have implications for how universal and language-specific mechanisms interact in the processing of salience.
  • Goudbeek, M., Smits, R., Cutler, A., & Swingley, D. (2017). Auditory and phonetic category formation. In H. Cohen, & C. Lefebvre (Eds.), Handbook of categorization in cognitive science (2nd revised ed.) (pp. 687-708). Amsterdam: Elsevier.
  • Kember, H., Grohe, A.-.-K., Zahner, K., Braun, B., Weber, A., & Cutler, A. (2017). Similar prosodic structure perceived differently in German and English. In Proceedings of Interspeech 2017 (pp. 1388-1392).

    Abstract

    English and German have similar prosody, but their speakers realize some pitch falls (not rises) in subtly different ways. We here test for asymmetry in perception. An ABX discrimination task requiring F0 slope or duration judgements on isolated vowels revealed no cross-language difference in duration or F0 fall discrimination, but discrimination of rises (realized similarly in each language) was less accurate for English than for German listeners. This unexpected finding may reflect greater sensitivity to rising patterns by German listeners, or reduced sensitivity by English listeners as a result of extensive exposure to phrase-final rises (“uptalk”) in their language
  • Warner, N., & Cutler, A. (2017). Stress effects in vowel perception as a function of language-specific vocabulary patterns. Phonetica, 74, 81-106. doi:10.1159/000447428.

    Abstract

    Background/Aims: Evidence from spoken word recognition suggests that for English listeners, distinguishing full versus reduced vowels is important, but discerning stress differences involving the same full vowel (as in mu- from music or museum) is not. In Dutch, in contrast, the latter distinction is important. This difference arises from the relative frequency of unstressed full vowels in the two vocabularies. The goal of this paper is to determine how this difference in the lexicon influences the perception of stressed versus unstressed vowels. Methods: All possible sequences of two segments (diphones) in Dutch and in English were presented to native listeners in gated fragments. We recorded identification performance over time throughout the speech signal. The data were here analysed specifically for patterns in perception of stressed versus unstressed vowels. Results: The data reveal significantly larger stress effects (whereby unstressed vowels are harder to identify than stressed vowels) in English than in Dutch. Both language-specific and shared patterns appear regarding which vowels show stress effects. Conclusion: We explain the larger stress effect in English as reflecting the processing demands caused by the difference in use of unstressed vowels in the lexicon. The larger stress effect in English is due to relative inexperience with processing unstressed full vowels
  • Bruggeman, L., & Cutler, A. (2016). Lexical manipulation as a discovery tool for psycholinguistic research. In C. Carignan, & M. D. Tyler (Eds.), Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016) (pp. 313-316).
  • Cutler, A., & Norris, D. (2016). Bottoms up! How top-down pitfalls ensnare speech perception researchers too. Commentary on C. Firestone & B. Scholl: Cognition does not affect perception: Evaluating the evidence for 'top-down' effects. Behavioral and Brain Sciences, e236. doi:10.1017/S0140525X15002745.

    Abstract

    Not only can the pitfalls that Firestone & Scholl (F&S) identify be generalised across multiple studies within the field of visual perception, but also they have general application outside the field wherever perceptual and cognitive processing are compared. We call attention to the widespread susceptibility of research on the perception of speech to versions of the same pitfalls.
  • Ip, M., & Cutler, A. (2016). Cross-language data on five types of prosodic focus. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 330-334).

    Abstract

    To examine the relative roles of language-specific and language-universal mechanisms in the production of prosodic focus, we compared production of five different types of focus by native speakers of English and Mandarin. Two comparable dialogues were constructed for each language, with the same words appearing in focused and unfocused position; 24 speakers recorded each dialogue in each language. Duration, F0 (mean, maximum, range), and rms-intensity (mean, maximum) of all critical word tokens were measured. Across the different types of focus, cross-language differences were observed in the degree to which English versus Mandarin speakers use the different prosodic parameters to mark focus, suggesting that while prosody may be universally available for expressing focus, the means of its employment may be considerably language-specific
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702-705).

    Abstract

    In Korean, focus is expressed in accentual phrasing. To ascertain whether words focused in this manner enjoy a processing advantage analogous to that conferred by focus as expressed in, e.g, English and Dutch, we devised sentences with target words in one of four conditions: prosodic focus, syntactic focus, prosodic + syntactic focus, and no focus as a control. 32 native speakers of Korean listened to blocks of 10 sentences, then were presented visually with words and asked whether or not they had heard them. Overall, words with focus were recognised significantly faster and more accurately than unfocused words. In addition, words with syntactic focus or syntactic + prosodic focus were recognised faster than words with prosodic focus alone. As for other languages, Korean focus confers processing advantage on the words carrying it. While prosodic focus does provide an advantage, however, syntactic focus appears to provide the greater beneficial effect for recognition memory
  • Norris, D., McQueen, J. M., & Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience, 31(1), 4-18. doi:10.1080/23273798.2015.1081703.

    Abstract

    Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
  • Choi, J., Broersma, M., & Cutler, A. (2015). Enhanced processing of a lost language: Linguistic knowledge or linguistic skill? In Proceedings of Interspeech 2015: 16th Annual Conference of the International Speech Communication Association (pp. 3110-3114).

    Abstract

    Same-different discrimination judgments for pairs of Korean stop consonants, or of Japanese syllables differing in phonetic segment length, were made by adult Korean adoptees in the Netherlands, by matched Dutch controls, and Korean controls. The adoptees did not outdo either control group on either task, although the same individuals had performed significantly better than matched controls on an identification learning task. This suggests that early exposure to multiple phonetic systems does not specifically improve acoustic-phonetic skills; rather, enhanced performance suggests retained language knowledge.
  • Cutler, A. (2015). Lexical stress in English pronunciation. In M. Reed, & J. M. Levis (Eds.), The Handbook of English Pronunciation (pp. 106-124). Chichester: Wiley.
  • Cutler, A. (2015). Representation of second language phonology. Applied Psycholinguistics, 36(1), 115-128. doi:10.1017/S0142716414000459.

    Abstract

    Orthographies encode phonological information only at the level of words (chiefly, the information encoded concerns phonetic segments; in some cases, tonal information or default stress may be encoded). Of primary interest to second language (L2) learners is whether orthography can assist in clarifying L2 phonological distinctions that are particularly difficult to perceive (e.g., where one native-language phonemic category captures two L2 categories). A review of spoken-word recognition evidence suggests that orthographic information can install knowledge of such a distinction in lexical representations but that this does not affect learners’ ability to perceive the phonemic distinction in speech. Words containing the difficult phonemes become even harder for L2 listeners to recognize, because perception maps less accurately to lexical content.
  • Ernestus, M., & Cutler, A. (2015). BALDEY: A database of auditory lexical decisions. Quarterly Journal of Experimental Psychology, 68, 1469-1488. doi:10.1080/17470218.2014.984730.

    Abstract

    In an auditory lexical decision experiment, 5,541 spoken content words and pseudo-words were presented to 20 native speakers of Dutch. The words vary in phonological makeup and in number of syllables and stress pattern, and are further representative of the native Dutch vocabulary in that most are morphologically complex, comprising two stems or one stem plus derivational and inflectional suffixes, with inflections representing both regular and irregular paradigms; the pseudo-words were matched in these respects to the real words. The BALDEY data file includes response times and accuracy rates, with for each item morphological information plus phonological and acoustic information derived from automatic phonemic segmentation of the stimuli. Two initial analyses illustrate how this data set can be used. First, we discuss several measures of the point at which a word has no further neighbors, and compare the degree to which each measure predicts our lexical decision response outcomes. Second, we investigate how well four different measures of frequency of occurrence (from written corpora, spoken corpora, subtitles and frequency ratings by 70 participants) predict the same outcomes. These analyses motivate general conclusions about the auditory lexical decision task. The (publicly available) BALDEY database lends itself to many further analyses.
  • Cutler, A., & McQueen, J. M. (2014). How prosody is both mandatory and optional. In J. Caspers, Y. Chen, W. Heeren, J. Pacilly, N. O. Schiller, & E. Van Zanten (Eds.), Above and Beyond the Segments: Experimental linguistics and phonetics (pp. 71-82). Amsterdam: Benjamins.

    Abstract

    Speech signals originate as a sequence of linguistic units selected by speakers, but these units are necessarily realised in the suprasegmental dimensions of time, frequency and amplitude. For this reason prosodic structure has been viewed as a mandatory target of language processing by both speakers and listeners. In apparent contradiction, however, prosody has also been argued to be ancillary rather than core linguistic structure, making processing of prosodic structure essentially optional. In the present tribute to one of the luminaries of prosodic research for the past quarter century, we review evidence from studies of the processing of lexical stress and focal accent which reconciles these views and shows that both claims are, each in their own way, fully true.
  • Cutler, A. (2014). In thrall to the vocabulary. Acoustics Australia, 42, 84-89.

    Abstract

    Vocabularies contain hundreds of thousands of words built from only a handful of phonemes; longer words inevitably tend to contain shorter ones. Recognising speech thus requires distinguishing intended words from accidentally present ones. Acoustic information in speech is used wherever it contributes significantly to this process; but as this review shows, its contribution differs across languages, with the consequences of this including: identical and equivalently present information distinguishing the same phonemes being used in Polish but not in German, or in English but not in Italian; identical stress cues being used in Dutch but not in English; expectations about likely embedding patterns differing across English, French, Japanese.
  • Junge, C., & Cutler, A. (2014). Early word recognition and later language skills. Brain sciences, 4(4), 532-559. doi:10.3390/brainsci4040532.

    Abstract

    Recent behavioral and electrophysiological evidence has highlighted the long-term importance for language skills of an early ability to recognize words in continuous speech. We here present further tests of this long-term link in the form of follow-up studies conducted with two (separate) groups of infants who had earlier participated in speech segmentation tasks. Each study extends prior follow-up tests: Study 1 by using a novel follow-up measure that taps into online processing, Study 2 by assessing language performance relationships over a longer time span than previously tested. Results of Study 1 show that brain correlates of speech segmentation ability at 10 months are positively related to 16-month-olds’ target fixations in a looking-while-listening task. Results of Study 2 show that infant speech segmentation ability no longer directly predicts language profiles at the age of five. However, a meta-analysis across our results and those of similar studies (Study 3) reveals that age at follow-up does not moderate effect size. Together, the results suggest that infants’ ability to recognize words in speech certainly benefits early vocabulary development; further observed relationships of later language skills to early word recognition may be consequent upon this vocabulary size effect.
  • Junge, C., Cutler, A., & Hagoort, P. (2014). Successful word recognition by 10-month-olds given continuous speech both at initial exposure and test. Infancy, 19(2), 179-193. doi:10.1111/infa.12040.

    Abstract

    Most words that infants hear occur within fluent speech. To compile a vocabulary, infants therefore need to segment words from speech contexts. This study is the first to investigate whether infants (here: 10-month-olds) can recognize words when both initial exposure and test presentation are in continuous speech. Electrophysiological evidence attests that this indeed occurs: An increased extended negativity (word recognition effect) appears for familiarized target words relative to control words. This response proved constant at the individual level: Only infants who showed this negativity at test had shown such a response, within six repetitions after first occurrence, during familiarization.
  • Tuinman, A., Mitterer, H., & Cutler, A. (2014). Use of syntax in perceptual compensation for phonological reduction. Language and Speech, 57, 68-85. doi:10.1177/0023830913479106.

    Abstract

    Listeners resolve ambiguity in speech by consulting context. Extensive research on this issue has largely relied on continua of sounds constructed to vary incrementally between two phonemic endpoints. In this study we presented listeners instead with phonetic ambiguity of a kind with which they have natural experience: varying degrees of word-final /t/-reduction. In two experiments, Dutch listeners decided whether or not the verb in a sentence such as Maar zij ren(t) soms ‘But she sometimes run(s)’ ended in /t/. In Dutch, presence versus absence of final /t/ distinguishes third- from first-person singular present-tense verbs. Acoustic evidence for /t/ varied from clear to absent, and immediately preceding phonetic context was consistent with more versus less likely deletion of /t/. In both experiments, listeners reported more /t/s in sentences in which /t/ would be syntactically correct. In Experiment 1, the disambiguating syntactic information preceded the target verb, as above, while in Experiment 2, it followed the verb. The syntactic bias was greater for fast than for slow responses in Experiment 1, but no such difference appeared in Experiment 2. We conclude that syntactic information does not directly influence pre-lexical processing, but is called upon in making phoneme decisions.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2014). Hearing words helps seeing words: A cross-modal word repetition effect. Speech Communication, 59, 31-43. doi:10.1016/j.specom.2014.01.001.

    Abstract

    Watching a speaker say words benefits subsequent auditory recognition of the same words. In this study, we tested whether hearing words also facilitates subsequent phonological processing from visual speech, and if so, whether speaker repetition influences the magnitude of this word repetition priming. We used long-term cross-modal repetition priming as a means to investigate the underlying lexical representations involved in listening to and seeing speech. In Experiment 1, listeners identified auditory-only words during exposure and visual-only words at test. Words at test were repeated or new and produced by the exposure speaker or a novel speaker. Results showed a significant effect of cross-modal word repetition priming but this was unaffected by speaker changes. Experiment 2 added an explicit recognition task at test. Listeners’ lipreading performance was again improved by prior exposure to auditory words. Explicit recognition memory was poor, and neither word repetition nor speaker repetition improved it. This suggests that cross-modal repetition priming is neither mediated by explicit memory nor improved by speaker information. Our results suggest that phonological representations in the lexicon are shared across auditory and visual processing, and that speaker information is not transferred across modalities at the lexical level.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2014). Cross-speaker generalisation in two phoneme-level perceptual adaptation processes. Journal of Phonetics, 43, 38-46. doi:10.1016/j.wocn.2014.01.003.

    Abstract

    Speech perception is shaped by listeners' prior experience with speakers. Listeners retune their phonetic category boundaries after encountering ambiguous sounds in order to deal with variations between speakers. Repeated exposure to an unambiguous sound, on the other hand, leads to a decrease in sensitivity to the features of that particular sound. This study investigated whether these changes in the listeners' perceptual systems can generalise to the perception of speech from a novel speaker. Specifically, the experiments looked at whether visual information about the identity of the speaker could prevent generalisation from occurring. In Experiment 1, listeners retuned auditory category boundaries using audiovisual speech input. This shift in the category boundaries affected perception of speech from both the exposure speaker and a novel speaker. In Experiment 2, listeners were repeatedly exposed to unambiguous speech either auditorily or audiovisually, leading to a decrease in sensitivity to the features of the exposure sound. Here, too, the changes affected the perception of both the exposure speaker and the novel speaker. Together, these results indicate that changes in the perceptual system can affect the perception of speech from a novel speaker and that visual speaker identity information did not prevent this generalisation.
  • Warner, N., McQueen, J. M., & Cutler, A. (2014). Tracking perception of the sounds of English. The Journal of the Acoustical Society of America, 135, 2295-3006. doi:10.1121/1.4870486.

    Abstract

    Twenty American English listeners identified gated fragments of all 2288 possible English within-word and cross-word diphones, providing a total of 538 560 phoneme categorizations. The results show orderly uptake of acoustic information in the signal and provide a view of where information about segments occurs in time. Information locus depends on each speech sound’s identity and phonological features. Affricates and diphthongs have highly localized information so that listeners’ perceptual accuracy rises during a confined time range. Stops and sonorants have more distributed and gradually appearing information. The identity and phonological features (e.g., vowel vs consonant) of the neighboring segment also influences when acoustic information about a segment is available. Stressed vowels are perceived significantly more accurately than unstressed vowels, but this effect is greater for lax vowels than for tense vowels or diphthongs. The dataset charts the availability of perceptual cues to segment identity across time for the full phoneme repertoire of English in all attested phonetic contexts.
  • Cutler, A., & Bruggeman, L. (2013). Vocabulary structure and spoken-word recognition: Evidence from French reveals the source of embedding asymmetry. In Proceedings of INTERSPEECH: 14th Annual Conference of the International Speech Communication Association (pp. 2812-2816).

    Abstract

    Vocabularies contain hundreds of thousands of words built from only a handful of phonemes, so that inevitably longer words tend to contain shorter ones. In many languages (but not all) such embedded words occur more often word-initially than word-finally, and this asymmetry, if present, has farreaching consequences for spoken-word recognition. Prior research had ascribed the asymmetry to suffixing or to effects of stress (in particular, final syllables containing the vowel schwa). Analyses of the standard French vocabulary here reveal an effect of suffixing, as predicted by this account, and further analyses of an artificial variety of French reveal that extensive final schwa has an independent and additive effect in promoting the embedding asymmetry.
  • Johnson, E. K., Lahey, M., Ernestus, M., & Cutler, A. (2013). A multimodal corpus of speech to infant and adult listeners. Journal of the Acoustical Society of America, 134, EL534-EL540. doi:10.1121/1.4828977.

    Abstract

    An audio and video corpus of speech addressed to 28 11-month-olds is described. The corpus allows comparisons between adult speech directed towards infants, familiar adults and unfamiliar adult addressees, as well as of caregivers’ word teaching strategies across word classes. Summary data show that infant-directed speech differed more from speech to unfamiliar than familiar adults; that word teaching strategies for nominals versus verbs and adjectives differed; that mothers mostly addressed infants with multi-word utterances; and that infants’ vocabulary size was unrelated to speech rate, but correlated positively with predominance of continuous caregiver speech (not of isolated words) in the input.
  • Kooijman, V., Junge, C., Johnson, E. K., Hagoort, P., & Cutler, A. (2013). Predictive brain signals of linguistic development. Frontiers in Psychology, 4: 25. doi:10.3389/fpsyg.2013.00025.

    Abstract

    The ability to extract word forms from continuous speech is a prerequisite for constructing a vocabulary and emerges in the first year of life. Electrophysiological (ERP) studies of speech segmentation by 9- to 12-month-old listeners in several languages have found a left-localized negativity linked to word onset as a marker of word detection. We report an ERP study showing significant evidence of speech segmentation in Dutch-learning 7-month-olds. In contrast to the left-localized negative effect reported with older infants, the observed overall mean effect had a positive polarity. Inspection of individual results revealed two participant sub-groups: a majority showing a positive-going response, and a minority showing the left negativity observed in older age groups. We retested participants at age three, on vocabulary comprehension and word and sentence production. On every test, children who at 7 months had shown the negativity associated with segmentation of words from speech outperformed those who had produced positive-going brain responses to the same input. The earlier that infants show the left-localized brain responses typically indicating detection of words in speech, the better their early childhood language skills.
  • Otake, T., & Cutler, A. (2013). Lexical selection in action: Evidence from spontaneous punning. Language and Speech, 56(4), 555-573. doi:10.1177/0023830913478933.

    Abstract

    Analysis of a corpus of spontaneously produced Japanese puns from a single speaker over a two-year period provides a view of how a punster selects a source word for a pun and transforms it into another word for humorous effect. The pun-making process is driven by a principle of similarity: the source word should as far as possible be preserved (in terms of segmental sequence) in the pun. This renders homophones (English example: band–banned) the pun type of choice, with part–whole relationships of embedding (cap–capture), and mutations of the source word (peas–bees) rather less favored. Similarity also governs mutations in that single-phoneme substitutions outnumber larger changes, and in phoneme substitutions, subphonemic features tend to be preserved. The process of spontaneous punning thus applies, on line, the same similarity criteria as govern explicit similarity judgments and offline decisions about pun success (e.g., for inclusion in published collections). Finally, the process of spoken-word recognition is word-play-friendly in that it involves multiple word-form activation and competition, which, coupled with known techniques in use in difficult listening conditions, enables listeners to generate most pun types as offshoots of normal listening procedures.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2013). Lexically guided retuning of visual phonetic categories. Journal of the Acoustical Society of America, 134, 562-571. doi:10.1121/1.4807814.

    Abstract

    Listeners retune the boundaries between phonetic categories to adjust to individual speakers' productions. Lexical information, for example, indicates what an unusual sound is supposed to be, and boundary retuning then enables the speaker's sound to be included in the appropriate auditory phonetic category. In this study, it was investigated whether lexical knowledge that is known to guide the retuning of auditory phonetic categories, can also retune visual phonetic categories. In Experiment 1, exposure to a visual idiosyncrasy in ambiguous audiovisually presented target words in a lexical decision task indeed resulted in retuning of the visual category boundary based on the disambiguating lexical context. In Experiment 2 it was tested whether lexical information retunes visual categories directly, or indirectly through the generalization from retuned auditory phonetic categories. Here, participants were exposed to auditory-only versions of the same ambiguous target words as in Experiment 1. Auditory phonetic categories were retuned by lexical knowledge, but no shifts were observed for the visual phonetic categories. Lexical knowledge can therefore guide retuning of visual phonetic categories, but lexically guided retuning of auditory phonetic categories is not generalized to visual categories. Rather, listeners adjust auditory and visual phonetic categories to talker idiosyncrasies separately.
  • El Aissati, A., McQueen, J. M., & Cutler, A. (2012). Finding words in a language that allows words without vowels. Cognition, 124, 79-84. doi:10.1016/j.cognition.2012.03.006.

    Abstract

    Across many languages from unrelated families, spoken-word recognition is subject to a constraint whereby potential word candidates must contain a vowel. This constraint minimizes competition from embedded words (e.g., in English, disfavoring win in twin because t cannot be a word). However, the constraint would be counter-productive in certain languages that allow stand-alone vowelless open-class words. One such language is Berber (where t is indeed a word). Berber listeners here detected words affixed to nonsense contexts with or without vowels. Length effects seen in other languages replicated in Berber, but in contrast to prior findings, word detection was not hindered by vowelless contexts. When words can be vowelless, otherwise universal constraints disfavoring vowelless words do not feature in spoken-word recognition.

    Supplementary material

    mmc1.pdf
  • Cutler, A., & Davis, C. (2012). An orthographic effect in phoneme processing, and its limitations. Frontiers in Psychology, 3, 18. doi:10.3389/fpsyg.2012.00018.

    Abstract

    To examine whether lexically stored knowledge about spelling influences phoneme evaluation, we conducted three experiments with a low-level phonetic judgement task: phoneme goodness rating. In each experiment, listeners heard phonetic tokens varying along a continuum centred on /s/, occurring finally in isolated word or nonword tokens. An effect of spelling appeared in Experiment 1: Native English speakers’ goodness ratings for the best /s/ tokens were significantly higher in words spelled with S (e.g., bless) than in words spelled with C (e.g., voice). No such difference appeared when nonnative speakers rated the same materials in Experiment 2, indicating that the difference could not be due to acoustic characteristics of the S- versus C-words. In Experiment 3, nonwords with lexical neighbours consistently spelled with S (e.g., pless) versus with C (e.g., floice) failed to elicit orthographic neighbourhood effects; no significant difference appeared in native English speakers’ ratings for the S-consistent versus the C-consistent sets. Obligatory influence of lexical knowledge on phonemic processing would have predicted such neighbourhood effects; the findings are thus better accommodated by models in which phonemic decisions draw strategically upon lexical information.
  • Cutler, A. (2012). Native listening: Language experience and the recognition of spoken words. Cambridge, MA: MIT Press.

    Abstract

    Understanding speech in our native tongue seems natural and effortless; listening to speech in a nonnative language is a different experience. In this book, Anne Cutler argues that listening to speech is a process of native listening because so much of it is exquisitely tailored to the requirements of the native language. Her cross-linguistic study (drawing on experimental work in languages that range from English and Dutch to Chinese and Japanese) documents what is universal and what is language specific in the way we listen to spoken language. Cutler describes the formidable range of mental tasks we carry out, all at once, with astonishing speed and accuracy, when we listen. These include evaluating probabilities arising from the structure of the native vocabulary, tracking information to locate the boundaries between words, paying attention to the way the words are pronounced, and assessing not only the sounds of speech but prosodic information that spans sequences of sounds. She describes infant speech perception, the consequences of language-specific specialization for listening to other languages, the flexibility and adaptability of listening (to our native languages), and how language-specificity and universality fit together in our language processing system. Drawing on her four decades of work as a psycholinguist, Cutler documents the recent growth in our knowledge about how spoken-word recognition works and the role of language structure in this process. Her book is a significant contribution to a vibrant and rapidly developing field.
  • Cutler, A. (2012). Native listening: The flexibility dimension. Dutch Journal of Applied Linguistics, 1(2), 169-187.

    Abstract

    The way we listen to spoken language is tailored to the specific benefit of native-language speech input. Listening to speech in non-native languages can be significantly hindered by this native bias. Is it possible to determine the degree to which a listener is listening in a native-like manner? Promising indications of how this question may be tackled are provided by new research findings concerning the great flexibility that characterises listening to the L1, in online adjustment of phonetic category boundaries for adaptation across talkers, and in modulation of lexical dynamics for adjustment across listening conditions. This flexibility pays off in many dimensions, including listening in noise, adaptation across dialects, and identification of voices. These findings further illuminate the robustness and flexibility of native listening, and potentially point to ways in which we might begin to assess degrees of ‘native-likeness’ in this skill.
  • Cutler, A., Otake, T., & Bruggeman, L. (2012). Phonologically determined asymmetries in vocabulary structure across languages. Journal of the Acoustical Society of America, 132(2), EL155-EL160. doi:10.1121/1.4737596.

    Abstract

    Studies of spoken-word recognition have revealed that competition from embedded words differs in strength as a function of where in the carrier word the embedded word is found and have further shown embedding patterns to be skewed such that embeddings in initial position in carriers outnumber embeddings in final position. Lexico-statistical analyses show that this skew is highly attenuated in Japanese, a noninflectional language. Comparison of the extent of the asymmetry in the three Germanic languages English, Dutch, and German allows the source to be traced to a combination of suffixal morphology and vowel reduction in unstressed syllables.
  • Cutler, A. (2012). Eentaalpsychologie is geen taalpsychologie: Part II. [Valedictory lecture Radboud University]. Nijmegen: Radboud University.

    Abstract

    Rede uitgesproken bij het afscheid als hoogleraar Vergelijkende taalpsychologie aan de Faculteit der Sociale Wetenschappen van de Radboud Universiteit Nijmegen op donderdag 20 september 2012
  • Junge, C., Kooijman, V., Hagoort, P., & Cutler, A. (2012). Rapid recognition at 10 months as a predictor of language development. Developmental Science, 15, 463-473. doi:10.1111/j.1467-7687.2012.1144.x.

    Abstract

    Infants’ ability to recognize words in continuous speech is vital for building a vocabulary.We here examined the amount and type of exposure needed for 10-month-olds to recognize words. Infants first heard a word, either embedded within an utterance or in isolation, then recognition was assessed by comparing event-related potentials to this word versus a word that they had not heard directly before. Although all 10-month-olds showed recognition responses to words first heard in isolation, not all infants showed such responses to words they had first heard within an utterance. Those that did succeed in the latter, harder, task, however, understood more words and utterances when re-tested at 12 months, and understood more words and produced more words at 24 months, compared with those who had shown no such recognition response at 10 months. The ability to rapidly recognize the words in continuous utterances is clearly linked to future language development.
  • Junge, C., Cutler, A., & Hagoort, P. (2012). Electrophysiological evidence of early word learning. Neuropsychologia, 50, 3702-3712. doi:10.1016/j.neuropsychologia.2012.10.012.

    Abstract

    Around their first birthday infants begin to talk, yet they comprehend words long before. This study investigated the event-related potentials (ERP) responses of nine-month-olds on basic level picture-word pairings. After a familiarization phase of six picture-word pairings per semantic category, comprehension for novel exemplars was tested in a picture-word matching paradigm. ERPs time-locked to pictures elicited a modulation of the Negative Central (Nc) component, associated with visual attention and recognition. It was attenuated by category repetition as well as by the type-token ratio of picture context. ERPs time-locked to words in the training phase became more negative with repetition (N300-600), but there was no influence of picture type-token ratio, suggesting that infants have identified the concept of each picture before a word was presented. Results from the test phase provided clear support that infants integrated word meanings with (novel) picture context. Here, infants showed different ERP responses for words that did or did not align with the picture context: a phonological mismatch (N200) and a semantic mismatch (N400). Together, results were informative of visual categorization, word recognition and word-to-world-mappings, all three crucial processes for vocabulary construction.
  • McQueen, J. M., Tyler, M., & Cutler, A. (2012). Lexical retuning of children’s speech perception: Evidence for knowledge about words’ component sounds. Language Learning and Development, 8, 317-339. doi:10.1080/15475441.2011.641887.

    Abstract

    Children hear new words from many different talkers; to learn words most efficiently, they should be able to represent them independently of talker-specific pronunciation detail. However, do children know what the component sounds of words should be, and can they use that knowledge to deal with different talkers' phonetic realizations? Experiment 1 replicated prior studies on lexically guided retuning of speech perception in adults, with a picture-verification methodology suitable for children. One participant group heard an ambiguous fricative ([s/f]) replacing /f/ (e.g., in words like giraffe); another group heard [s/f] replacing /s/ (e.g., in platypus). The first group subsequently identified more tokens on a Simpie-[s/f]impie-Fimpie toy-name continuum as Fimpie. Experiments 2 and 3 found equivalent lexically guided retuning effects in 12- and 6-year-olds. Children aged 6 have all that is needed for adjusting to talker variation in speech: detailed and abstract phonological representations and the ability to apply them during spoken-word recognition.

    Files private

    Request files
  • Tuinman, A., Mitterer, H., & Cutler, A. (2012). Resolving ambiguity in familiar and unfamiliar casual speech. Journal of Memory and Language, 66, 530-544. doi:10.1016/j.jml.2012.02.001.

    Abstract

    In British English, the phrase Canada aided can sound like Canada raided if the speaker links the two vowels at the word boundary with an intrusive /r/. There are subtle phonetic differences between an onset /r/ and an intrusive /r/, however. With cross-modal priming and eye-tracking, we examine how native British English listeners and non-native (Dutch) listeners deal with the lexical ambiguity arising from this language-specific connected speech process. Together the results indicate that the presence of /r/ initially activates competing words for both listener groups; however, the native listeners rapidly exploit the phonetic cues and achieve correct lexical selection. In contrast, these advanced L2 listeners to English failed to recover from the /r/-induced competition, and failed to match native performance in either task. The /r/-intrusion process, which adds a phoneme to speech input, thus causes greater difficulty for L2 listeners than connectedspeech processes which alter or delete phonemes.
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to neighboring segments and are less well identified. Indeed, many vowels, especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Broersma, M., & Cutler, A. (2011). Competition dynamics of second-language listening. Quarterly Journal of Experimental Psychology, 64, 74-95. doi:10.1080/17470218.2010.499174.

    Abstract

    Spoken-word recognition in a nonnative language is particularly difficult where it depends on discrimination between confusable phonemes. Four experiments here examine whether this difficulty is in part due to phantom competition from “near-words” in speech. Dutch listeners confuse English /aelig/ and /ε/, which could lead to the sequence daf being interpreted as deaf, or lemp being interpreted as lamp. In auditory lexical decision, Dutch listeners indeed accepted such near-words as real English words more often than English listeners did. In cross-modal priming, near-words extracted from word or phrase contexts (daf from DAFfodil, lemp from eviL EMPire) induced activation of corresponding real words (deaf; lamp) for Dutch, but again not for English, listeners. Finally, by the end of untruncated carrier words containing embedded words or near-words (definite; daffodil) no activation of the real embedded forms (deaf in definite) remained for English or Dutch listeners, but activation of embedded near-words (deaf in daffodil) did still remain, for Dutch listeners only. Misinterpretation of the initial vowel here favoured the phantom competitor and disfavoured the carrier (lexically represented as containing a different vowel). Thus, near-words compete for recognition and continue competing for longer than actually embedded words; nonnative listening indeed involves phantom competition.
  • Cutler, A. (2011). Listening to REAL second language. AATSEEL Newsletter, 54(3), 14.
  • Cutler, A., Andics, A., & Fang, Z. (2011). Inter-dependent categorization of voices and segments. In W.-S. Lee, & E. Zee (Eds.), Proceedings of the 17th International Congress of Phonetic Sciences [ICPhS 2011] (pp. 552-555). Hong Kong: Department of Chinese, Translation and Linguistics, City University of Hong Kong.

    Abstract

    Listeners performed speeded two-alternative choice between two unfamiliar and relatively similar voices or between two phonetically close segments, in VC syllables. For each decision type (segment, voice), the non-target dimension (voice, segment) either was constant, or varied across four alternatives. Responses were always slower when a non-target dimension varied than when it did not, but the effect of phonetic variation on voice identity decision was stronger than that of voice variation on phonetic identity decision. Cues to voice and segment identity in speech are processed inter-dependently, but hard categorization decisions about voices draw on, and are hence sensitive to, segmental information.
  • Johnson, E. K., Westrek, E., Nazzi, T., & Cutler, A. (2011). Infant ability to tell voices apart rests on language experience. Developmental Science, 14(5), 1002-1011. doi:10.1111/j.1467-7687.2011.01052.x.

    Abstract

    A visual fixation study tested whether seven-month-olds can discriminate between different talkers. The infants were first habituated to talkers producing sentences in either a familiar or unfamiliar language, then heard test sentences from previously unheard speakers, either in the language used for habituation, or in another language. When the language at test mismatched that in habituation, infants always noticed the change. When language remained constant and only talker altered, however, infants detected the change only if the language was the native tongue. Adult listeners with a different native tongue than the infants did not reproduce the discriminability patterns shown by the infants, and infants detected neither voice nor language changes in reversed speech; both these results argue against explanation of the native-language voice discrimination in terms of acoustic properties of the stimuli. The ability to identify talkers is, like many other perceptual abilities, strongly influenced by early life experience.
  • Tuinman, A., Mitterer, H., & Cutler, A. (2011). Perception of intrusive /r/ in English by native, cross-language and cross-dialect listeners. Journal of the Acoustical Society of America, 130, 1643-1652. doi:10.1121/1.3619793.

    Abstract

    In sequences such as law and order, speakers of British English often insert /r/ between law and and. Acoustic analyses revealed such “intrusive” /r/ to be significantly shorter than canonical /r/. In a 2AFC experiment, native listeners heard British English sentences in which /r/ duration was manipulated across a word boundary [e.g., saw (r)ice], and orthographic and semantic factors were varied. These listeners responded categorically on the basis of acoustic evidence for /r/ alone, reporting ice after short /r/s, rice after long /r/s; orthographic and semantic factors had no effect. Dutch listeners proficient in English who heard the same materials relied less on durational cues than the native listeners, and were affected by both orthography and semantic bias. American English listeners produced intermediate responses to the same materials, being sensitive to duration (less so than native, more so than Dutch listeners), and to orthography (less so than the Dutch), but insensitive to the semantic manipulation. Listeners from language communities without common use of intrusive /r/ may thus interpret intrusive /r/ as canonical /r/, with a language difference increasing this propensity more than a dialect difference. Native listeners, however, efficiently distinguish intrusive from canonical /r/ by exploiting the relevant acoustic variation.
  • Tuinman, A., & Cutler, A. (2011). L1 knowledge and the perception of casual speech processes in L2. In M. Wrembel, M. Kul, & K. Dziubalska-Kolaczyk (Eds.), Achievements and perspectives in SLA of speech: New Sounds 2010. Volume I (pp. 289-301). Frankfurt am Main: Peter Lang.

    Abstract

    Every language manifests casual speech processes, and hence every second language too. This study examined how listeners deal with second-language casual speech processes, as a function of the processes in their native language. We compared a match case, where a second-language process t/-reduction) is also operative in native speech, with a mismatch case, where a second-language process (/r/-insertion) is absent from native speech. In each case native and non-native listeners judged stimuli in which a given phoneme (in sentence context) varied along a continuum from absent to present. Second-language listeners in general mimicked native performance in the match case, but deviated significantly from native performance in the mismatch case. Together these results make it clear that the mapping from first to second language is as important in the interpretation of casual speech processes as in other dimensions of speech perception. Unfamiliar casual speech processes are difficult to adapt to in a second language. Casual speech processes that are already familiar from native speech, however, are easy to adapt to; indeed, our results even suggest that it is possible for subtle difference in their occurrence patterns across the two languages to be detected,and to be accommodated to in second-language listening
  • Tuinman, A., Mitterer, H., & Cutler, A. (2011). The efficiency of cross-dialectal word recognition. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 153-156).

    Abstract

    Dialects of the same language can differ in the casual speech processes they allow; e.g., British English allows the insertion of [r] at word boundaries in sequences such as saw ice, while American English does not. In two speeded word recognition experiments, American listeners heard such British English sequences; in contrast to non-native listeners, they accurately perceived intended vowel-initial words even with intrusive [r]. Thus despite input mismatches, cross-dialectal word recognition benefits from the full power of native-language processing.
  • Wagner, M., Tran, D., Togneri, R., Rose, P., Powers, D., Onslow, M., Loakes, D., Lewis, T., Kuratate, T., Kinoshita, Y., Kemp, N., Ishihara, S., Ingram, J., Hajek, J., Grayden, D., Göcke, R., Fletcher, J., Estival, D., Epps, J., Dale, R., Cutler, A., Cox, F., Chetty, G., Cassidy, S., Butcher, A., Burnham, D., Bird, S., Best, C., Bennamoun, M., Arciuli, J., & Ambikairajah, E. (2011). The Big Australian Speech Corpus (The Big ASC). In M. Tabain, J. Fletcher, D. Grayden, J. Hajek, & A. Butcher (Eds.), Proceedings of the Thirteenth Australasian International Conference on Speech Science and Technology (pp. 166-170). Melbourne: ASSTA.
  • Cutler, A. (2010). Abstraction-based efficiency in the lexicon. Laboratory Phonology, 1(2), 301-318. doi:10.1515/LABPHON.2010.016.

    Abstract

    Listeners learn from their past experience of listening to spoken words, and use this learning to maximise the efficiency of future word recognition. This paper summarises evidence that the facilitatory effects of drawing on past experience are mediated by abstraction, enabling learning to be generalised across new words and new listening situations. Phoneme category retuning, which allows adaptation to speaker-specific articulatory characteristics, is generalised on the basis of relatively brief experience to words previously unheard from that speaker. Abstract knowledge of prosodic regularities is applied to recognition even of novel words for which these regularities were violated. Prosodic word-boundary regularities drive segmentation of speech into words independently of the membership of the lexical candidate set resulting from the segmentation operation. Each of these different cases illustrates how abstraction from past listening experience has contributed to the efficiency of lexical recognition.
  • Cutler, A., Eisner, F., McQueen, J. M., & Norris, D. (2010). How abstract phonemic categories are necessary for coping with speaker-related variation. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (Eds.), Laboratory phonology 10 (pp. 91-111). Berlin: de Gruyter.
  • Cutler, A., Mitterer, H., Brouwer, S., & Tuinman, A. (2010). Phonological competition in casual speech. In Proceedings of DiSS-LPSS Joint Workshop 2010 (pp. 43-46).
  • Cutler, A., Cooke, M., & Lecumberri, M. L. G. (2010). Preface. Speech Communication, 52, 863. doi:10.1016/j.specom.2010.11.003.

    Abstract

    Adverse listening conditions always make the perception of speech harder, but their deleterious effect is far greater if the speech we are trying to understand is in a non-native language. An imperfect signal can be coped with by recourse to the extensive knowledge one has of a native language, and imperfect knowledge of a non-native language can still support useful communication when speech signals are high-quality. But the combination of imperfect signal and imperfect knowledge leads rapidly to communication breakdown. This phenomenon is undoubtedly well known to every reader of Speech Communication from personal experience. Many readers will also have a professional interest in explaining, or remedying, the problems it produces. The journal’s readership being a decidedly interdisciplinary one, this interest will involve quite varied scientific approaches, including (but not limited to) modelling the interaction of first and second language vocabularies and phonemic repertoires, developing targeted listening training for language learners, and redesigning the acoustics of classrooms and conference halls. In other words, the phenomenon that this special issue deals with is a well-known one, that raises important scientific and practical questions across a range of speech communication disciplines, and Speech Communication is arguably the ideal vehicle for presentation of such a breadth of approaches in a single volume. The call for papers for this issue elicited a large number of submissions from across the full range of the journal’s interdisciplinary scope, requiring the guest editors to apply very strict criteria to the final selection. Perhaps unique in the history of treatments of this topic is the combination represented by the guest editors for this issue: a phonetician whose primary research interest is in second-language speech (MLGL), an engineer whose primary research field is the acoustics of masking in speech processing (MC), and a psychologist whose primary research topic is the recognition of spoken words (AC). In the opening article of the issue, these three authors together review the existing literature on listening to second-language speech under adverse conditions, bringing together these differing perspectives for the first time in a single contribution. The introductory review is followed by 13 new experimental reports of phonetic, acoustic and psychological studies of the topic. The guest editors thank Speech Communication editor Marc Swerts and the journal’s team at Elsevier, as well as all the reviewers who devoted time and expert efforts to perfecting the contributions to this issue.
  • Cutler, A., El Aissati, A., Hanulikova, A., & McQueen, J. M. (2010). Effects on speech parsing of vowelless words in the phonology. In Abstracts of Laboratory Phonology 12 (pp. 115-116).
  • Cutler, A., Treiman, R., & Van Ooijen, B. (2010). Strategic deployment of orthographic knowledge in phoneme detection. Language and Speech, 53(3), 307 -320. doi:10.1177/0023830910371445.

    Abstract

    The phoneme detection task is widely used in spoken-word recognition research. Alphabetically literate participants, however, are more used to explicit representations of letters than of phonemes. The present study explored whether phoneme detection is sensitive to how target phonemes are, or may be, orthographically realized. Listeners detected the target sounds [b, m, t, f, s, k] in word-initial position in sequences of isolated English words. Response times were faster to the targets [b, m, t], which have consistent word-initial spelling, than to the targets [f, s, k], which are inconsistently spelled, but only when spelling was rendered salient by the presence in the experiment of many irregularly spelled filler words. Within the inconsistent targets [f, s, k], there was no significant difference between responses to targets in words with more usual (foam, seed, cattle) versus less usual (phone, cede, kettle) spellings. Phoneme detection is thus not necessarily sensitive to orthographic effects; knowledge of spelling stored in the lexical representations of words does not automatically become available as word candidates are activated. However, salient orthographic manipulations in experimental input can induce such sensitivity. We attribute this to listeners' experience of the value of spelling in everyday situations that encourage phonemic decisions (such as learning new names)
  • Cutler, A., & Shanley, J. (2010). Validation of a training method for L2 continuous-speech segmentation. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1844-1847).

    Abstract

    Recognising continuous speech in a second language is often unexpectedly difficult, as the operation of segmenting speech is so attuned to native-language structure. We report the initial steps in development of a novel training method for second-language listening, focusing on speech segmentation and employing a task designed for studying this: word-spotting. Listeners detect real words in sequences consisting of a word plus a minimal context. The present validation study shows that learners from varying non-English backgrounds successfully perform a version of this task in English, and display appropriate sensitivity to structural factors that also affect segmentation by native English listeners.
  • Junge, C., Cutler, A., & Hagoort, P. (2010). Ability to segment words from speech as a precursor of later language development: Insights from electrophysiological responses in the infant brain. In M. Burgess, J. Davey, C. Don, & T. McMinn (Eds.), Proceedings of 20th International Congress on Acoustics, ICA 2010. Incorporating Proceedings of the 2010 annual conference of the Australian Acoustical Society (pp. 3727-3732). Australian Acoustical Society, NSW Division.
  • Junge, C., Hagoort, P., Kooijman, V., & Cutler, A. (2010). Brain potentials for word segmentation at seven months predict later language development. In K. Franich, K. M. Iserman, & L. L. Keil (Eds.), Proceedings of the 34th Annual Boston University Conference on Language Development. Volume 1 (pp. 209-220). Somerville, MA: Cascadilla Press.
  • Lecumberri, M. L. G., Cooke, M., & Cutler, A. (Eds.). (2010). Non-native speech perception in adverse conditions [Special Issue]. Speech Communication, 52(11/12).
  • Lecumberri, M. L. G., Cooke, M., & Cutler, A. (2010). Non-native speech perception in adverse conditions: A review. Speech Communication, 52, 864-886. doi:10.1016/j.specom.2010.08.014.

    Abstract

    If listening in adverse conditions is hard, then listening in a foreign language is doubly so: non-native listeners have to cope with both imperfect signals and imperfect knowledge. Comparison of native and non-native listener performance in speech-in-noise tasks helps to clarify the role of prior linguistic experience in speech perception, and, more directly, contributes to an understanding of the problems faced by language learners in everyday listening situations. This article reviews experimental studies on non-native listening in adverse conditions, organised around three principal contributory factors: the task facing listeners, the effect of adverse conditions on speech, and the differences among listener populations. Based on a comprehensive tabulation of key studies, we identify robust findings, research trends and gaps in current knowledge.
  • McQueen, J. M., & Cutler, A. (2010). Cognitive processes in speech perception. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed., pp. 489-520). Oxford: Blackwell.
  • Otake, T., McQueen, J. M., & Cutler, A. (2010). Competition in the perception of spoken Japanese words. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 114-117).

    Abstract

    Japanese listeners detected Japanese words embedded at the end of nonsense sequences (e.g., kaba 'hippopotamus' in gyachikaba). When the final portion of the preceding context together with the initial portion of the word (e.g., here, the sequence chika) was compatible with many lexical competitors, recognition of the embedded word was more difficult than when such a sequence was compatible with few competitors. This clear effect of competition, established here for preceding context in Japanese, joins similar demonstrations, in other languages and for following contexts, to underline that the functional architecture of the human spoken-word recognition system is a universal one.
  • Tuinman, A., & Cutler, A. (2010). Casual speech processes: L1 knowledge and L2 speech perception. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 512-517). Poznan: Adama Mickiewicz University.

    Abstract

    Every language manifests casual speech processes, and hence every second language too. This study examined how listeners deal with second-language casual speech processes, as a function of the processes in their native language. We compared a match case, where a second-language process t/-reduction) is also operative in native speech, with a mismatch case, where a second-language process (/r/-insertion) is absent from native speech. In each case native and non-native listeners judged stimuli in which a given phoneme (in sentence context) varied along a continuum from absent to present. Second-language listeners in general mimicked native performance in the match case, but deviated significantly from native performance in the mismatch case. Together these results make it clear that the mapping from first to second language is as important in the interpretation of casual speech processes as in other dimensions of speech perception. Unfamiliar casual speech processes are difficult to adapt to in a second language. Casual speech processes that are already familiar from native speech, however, are easy to adapt to; indeed, our results even suggest that it is possible for subtle difference in their occurrence patterns across the two languages to be detected,and to be accommodated to in second-language listening.
  • Burnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N., Kinoshita, Y., Kuratate, T., Lewis, T. W., Loakes, D. E., Onslow, M., Powers, D. M., Rose, P., Togneri, R., Tran, D., & Wagner, M. (2009). A blueprint for a comprehensive Australian English auditory-visual speech corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus (pp. 96-107). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic identification, talking heads, and hearing prostheses. Advances in these research areas in Australia require a large corpus of Australian English. Here the authors describe a blueprint for building the Big Australian Speech Corpus (the Big ASC), a corpus of over 1,100 speakers from urban and rural Australia, including speakers of non-indigenous, indigenous, ethnocultural, and disordered forms of Australian English, each of whom would be sampled on three occasions in a range of speech tasks designed by the researchers who would be using the corpus.
  • Cutler, A. (2009). Greater sensitivity to prosodic goodness in non-native than in native listeners. Journal of the Acoustical Society of America, 125, 3522-3525. doi:10.1121/1.3117434.

    Abstract

    English listeners largely disregard suprasegmental cues to stress in recognizing words. Evidence for this includes the demonstration of Fear et al. [J. Acoust. Soc. Am. 97, 1893–1904 (1995)] that cross-splicings are tolerated between stressed and unstressed full vowels (e.g., au- of autumn, automata). Dutch listeners, however, do exploit suprasegmental stress cues in recognizing native-language words. In this study, Dutch listeners were presented with English materials from the study of Fear et al. Acceptability ratings by these listeners revealed sensitivity to suprasegmental mismatch, in particular, in replacements of unstressed full vowels by higher-stressed vowels, thus evincing greater sensitivity to prosodic goodness than had been shown by the original native listener group.
  • Cutler, A., Davis, C., & Kim, J. (2009). Non-automaticity of use of orthographic knowledge in phoneme evaluation. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 380-383). Causal Productions Pty Ltd.

    Abstract

    Two phoneme goodness rating experiments addressed the role of orthographic knowledge in the evaluation of speech sounds. Ratings for the best tokens of /s/ were higher in words spelled with S (e.g., bless) than in words where /s/ was spelled with C (e.g., voice). This difference did not appear for analogous nonwords for which every lexical neighbour had either S or C spelling (pless, floice). Models of phonemic processing incorporating obligatory influence of lexical information in phonemic processing cannot explain this dissociation; the data are consistent with models in which phonemic decisions are not subject to necessary top-down lexical influence.
  • Cutler, A. (2009). Psycholinguistics in our time. In P. Rabbitt (Ed.), Inside psychology: A science over 50 years (pp. 91-101). Oxford: Oxford University Press.
  • Cutler, A., Otake, T., & McQueen, J. M. (2009). Vowel devoicing and the perception of spoken Japanese words. Journal of the Acoustical Society of America, 125(3), 1693-1703. doi:10.1121/1.3075556.

    Abstract

    Three experiments, in which Japanese listeners detected Japanese words embedded in nonsense sequences, examined the perceptual consequences of vowel devoicing in that language. Since vowelless sequences disrupt speech segmentation [Norris et al. (1997). Cognit. Psychol. 34, 191– 243], devoicing is potentially problematic for perception. Words in initial position in nonsense sequences were detected more easily when followed by a sequence containing a vowel than by a vowelless segment (with or without further context), and vowelless segments that were potential devoicing environments were no easier than those not allowing devoicing. Thus asa, “morning,” was easier in asau or asazu than in all of asap, asapdo, asaf, or asafte, despite the fact that the /f/ in the latter two is a possible realization of fu, with devoiced [u]. Japanese listeners thus do not treat devoicing contexts as if they always contain vowels. Words in final position in nonsense sequences, however, produced a different pattern: here, preceding vowelless contexts allowing devoicing impeded word detection less strongly (so, sake was detected less accurately, but not less rapidly, in nyaksake—possibly arising from nyakusake—than in nyagusake). This is consistent with listeners treating consonant sequences as potential realizations of parts of existing lexical candidates wherever possible.
  • Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591 -612. doi:10.1080/15250000903263957.

    Abstract

    Recognizing word boundaries in continuous speech requires detailed knowledge of the native language. In the first year of life, infants acquire considerable word segmentation abilities. Infants at this early stage in word segmentation rely to a large extent on the metrical pattern of their native language, at least in stress-based languages. In Dutch and English (both languages with a preferred trochaic stress pattern), segmentation of strong-weak words develops rapidly between 7 and 10 months of age. Nevertheless, trochaic languages contain not only strong-weak words but also words with a weak-strong stress pattern. In this article, we present electrophysiological evidence of the beginnings of weak-strong word segmentation in Dutch 10-month-olds. At this age, the ability to combine different cues for efficient word segmentation does not yet seem to be completely developed. We provide evidence that Dutch infants still largely rely on strong syllables, even for the segmentation of weak-strong words.
  • Tyler, M., & Cutler, A. (2009). Cross-language differences in cue use for speech segmentation. Journal of the Acoustical Society of America, 126, 367-376. doi:10.1121/1.3129127.

    Abstract

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners’ use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable “words.” These words were demarcated by(a) no cue other than transitional probabilities induced by their recurrence, (b) a consistent left-edge cue, or (c) a consistent right-edge cue. Experiment 1 examined a vowel lengthening cue. All three listener groups benefited from this cue in right-edge position; none benefited from it in left-edge position. Experiment 2 examined a pitch-movement cue. English listeners used this cue in left-edge position, French listeners used it in right-edge position, and Dutch listeners used it in both positions. These findings are interpreted as evidence of both language-universal and language-specific effects. Final lengthening is a language-universal effect expressing a more general (non-linguistic) mechanism. Pitch movement expresses prominence which has characteristically different placements across languages: typically at right edges in French, but at left edges in English and Dutch. Finally, stress realization in English versus Dutch encourages greater attention to suprasegmental variation by Dutch than by English listeners, allowing Dutch listeners to benefit from an informative pitch-movement cue even in an uncharacteristic position.
  • Braun, B., Tagliapietra, L., & Cutler, A. (2008). Contrastive utterances make alternatives salient: Cross-modal priming evidence. In Proceedings of Interspeech 2008 (pp. 69-69).

    Abstract

    Sentences with contrastive intonation are assumed to presuppose contextual alternatives to the accented elements. Two cross-modal priming experiments tested in Dutch whether such contextual alternatives are automatically available to listeners. Contrastive associates – but not non- contrastive associates - were facilitated only when primes were produced in sentences with contrastive intonation, indicating that contrastive intonation makes unmentioned contextual alternatives immediately available. Possibly, contrastive contours trigger a “presupposition resolution mechanism” by which these alternatives become salient.
  • Braun, B., Lemhöfer, K., & Cutler, A. (2008). English word stress as produced by English and Dutch speakers: The role of segmental and suprasegmental differences. In Proceedings of Interspeech 2008 (pp. 1953-1953).

    Abstract

    It has been claimed that Dutch listeners use suprasegmental cues (duration, spectral tilt) more than English listeners in distinguishing English word stress. We tested whether this asymmetry also holds in production, comparing the realization of English word stress by native English speakers and Dutch speakers. Results confirmed that English speakers centralize unstressed vowels more, while Dutch speakers of English make more use of suprasegmental differences.
  • Broersma, M., & Cutler, A. (2008). Phantom word activation in L2. System, 36(1), 22-34. doi:10.1016/j.system.2007.11.003.

    Abstract

    L2 listening can involve the phantom activation of words which are not actually in the input. All spoken-word recognition involves multiple concurrent activation of word candidates, with selection of the correct words achieved by a process of competition between them. L2 listening involves more such activation than L1 listening, and we report two studies illustrating this. First, in a lexical decision study, L2 listeners accepted (but L1 listeners did not accept) spoken non-words such as groof or flide as real English words. Second, a priming study demonstrated that the same spoken non-words made recognition of the real words groove, flight easier for L2 (but not L1) listeners, suggesting that, for the L2 listeners only, these real words had been activated by the spoken non-word input. We propose that further understanding of the activation and competition process in L2 lexical processing could lead to new understanding of L2 listening difficulty.
  • Cutler, A., McQueen, J. M., Butterfield, S., & Norris, D. (2008). Prelexically-driven perceptual retuning of phoneme boundaries. In Proceedings of Interspeech 2008 (pp. 2056-2056).

    Abstract

    Listeners heard an ambiguous /f-s/ in nonword contexts where only one of /f/ or /s/ was legal (e.g., frul/*srul or *fnud/snud). In later categorisation of a phonetic continuum from /f/ to /s/, their category boundaries had shifted; hearing -rul led to expanded /f/ categories, -nud expanded /s/. Thus phonotactic sequence information alone induces perceptual retuning of phoneme category boundaries; lexical access is not required.
  • Cutler, A., Garcia Lecumberri, M. L., & Cooke, M. (2008). Consonant identification in noise by native and non-native listeners: Effects of local context. Journal of the Acoustical Society of America, 124(2), 1264-1268. doi:10.1121/1.2946707.

    Abstract

    Speech recognition in noise is harder in second (L2) than first languages (L1). This could be because noise disrupts speech processing more in L2 than L1, or because L1 listeners recover better though disruption is equivalent. Two similar prior studies produced discrepant results: Equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the results discrepancy.
  • Cutler, A. (2008). The abstract representations in speech processing. Quarterly Journal of Experimental Psychology, 61(11), 1601-1619. doi:10.1080/13803390802218542.

    Abstract

    Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
  • Goudbeek, M., Cutler, A., & Smits, R. (2008). Supervised and unsupervised learning of multidimensionally varying nonnative speech categories. Speech Communication, 50(2), 109-125. doi:10.1016/j.specom.2007.07.003.

    Abstract

    The acquisition of novel phonetic categories is hypothesized to be affected by the distributional properties of the input, the relation of the new categories to the native phonology, and the availability of supervision (feedback). These factors were examined in four experiments in which listeners were presented with novel categories based on vowels of Dutch. Distribution was varied such that the categorization depended on the single dimension duration, the single dimension frequency, or both dimensions at once. Listeners were clearly sensitive to the distributional information, but unidimensional contrasts proved easier to learn than multidimensional. The native phonology was varied by comparing Spanish versus American English listeners. Spanish listeners found categorization by frequency easier than categorization by duration, but this was not true of American listeners, whose native vowel system makes more use of duration-based distinctions. Finally, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.
  • Kim, J., Davis, C., & Cutler, A. (2008). Perceptual tests of rhythmic similarity: II. Syllable rhythm. Language and Speech, 51(4), 343-359. doi:10.1177/0023830908099069.

    Abstract

    To segment continuous speech into its component words, listeners make use of language rhythm; because rhythm differs across languages, so do the segmentation procedures which listeners use. For each of stress-, syllable-and mora-based rhythmic structure, perceptual experiments have led to the discovery of corresponding segmentation procedures. In the case of mora-based rhythm, similar segmentation has been demonstrated in the otherwise unrelated languages Japanese and Telugu; segmentation based on syllable rhythm, however, has been previously demonstrated only for European languages from the Romance family. We here report two target detection experiments in which Korean listeners, presented with speech in Korean and in French, displayed patterns of segmentation like those previously observed in analogous experiments with French listeners. The Korean listeners' accuracy in detecting word-initial target fragments in either language was significantly higher when the fragments corresponded exactly to a syllable in the input than when the fragments were smaller or larger than a syllable. We conclude that Korean and French listeners can call on similar procedures for segmenting speech, and we further propose that perceptual tests of speech segmentation provide a valuable accompaniment to acoustic analyses for establishing languages' rhythmic class membership.
  • Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word recognition. In A. D. Friederici, & G. Thierry (Eds.), Early language development: Bridging brain and behaviour (pp. 91-114). Amsterdam: Benjamins.

Share this page