Anne Cutler

Publications

  • Asano, Y., Yuan, C., Grohe, A.-K., Weber, A., Antoniou, M., & Cutler, A. (2020). Uptalk interpretation as a function of listening experience. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 735-739). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-150.

    Abstract

    The term “uptalk” describes utterance-final pitch rises that carry no sentence-structural information. Uptalk is usually dialectal or sociolectal, and Australian English (AusEng) is particularly known for this attribute. We ask here whether experience with an uptalk variety affects listeners’ ability to categorise rising pitch contours on the basis of the timing and height of their onset and offset. Listeners were two groups of English-speakers (AusEng, and American English), and three groups of listeners with L2 English: one group with Mandarin as L1 and experience of listening to AusEng, one with German as L1 and experience of listening to AusEng, and one with German as L1 but no AusEng experience. They heard nouns (e.g. flower, piano) in the framework “Got a NOUN”, each ending with a pitch rise artificially manipulated on three contrasts: low vs. high rise onset, low vs. high rise offset and early vs. late rise onset. Their task was to categorise the tokens as “question” or “statement”, and we analysed the effect of the pitch contrasts on their judgements. Only the native AusEng listeners were able to use the pitch contrasts systematically in making these categorisations.
  • Bruggeman, L., & Cutler, A. (2020). No L1 privilege in talker adaptation. Bilingualism: Language and Cognition, 23(3), 681-693. doi:10.1017/S1366728919000646.

    Abstract

    As a rule, listening is easier in first (L1) than second languages (L2); difficult L2 listening can challenge even highly proficient users. We here examine one particular listening function, adaptation to novel talkers, in such a high-proficiency population: Dutch emigrants to Australia, predominantly using English outside the family, but all also retaining L1 proficiency. Using lexically-guided perceptual learning (Norris, McQueen & Cutler, 2003), we investigated these listeners’ adaptation to an ambiguous speech sound, in parallel experiments in both their L1 and their L2. A control study established that perceptual learning outcomes were unaffected by the procedural measures required for this double comparison. The emigrants showed equivalent proficiency in tests in both languages, robust perceptual adaptation in their L2, English, but no adaptation in L1. We propose that adaptation to novel talkers is a language-specific skill requiring regular novel practice; a limited set of known (family) interlocutors cannot meet this requirement.
  • Ip, M. H. K., & Cutler, A. (2020). Universals of listening: Equivalent prosodic entrainment in tone and non-tone languages. Cognition, 202: 104311. doi:10.1016/j.cognition.2020.104311.

    Abstract

    In English and Dutch, listeners entrain to prosodic contours to predict where focus will fall in an utterance. Here, we ask whether this strategy is universally available, even in languages with very different phonological systems (e.g., tone versus non-tone languages). In a phoneme detection experiment, we examined whether prosodic entrainment also occurs in Mandarin Chinese, a tone language, where the use of various suprasegmental cues to lexical identity may take precedence over their use in salience. Consistent with the results from Germanic languages, response times were facilitated when preceding intonation predicted high stress on the target-bearing word, and the lexical tone of the target word (i.e., rising versus falling) did not affect the Mandarin listeners' response. Further, the extent to which prosodic entrainment was used to detect the target phoneme was the same in both English and Mandarin listeners. Nevertheless, native Mandarin speakers did not adopt an entrainment strategy when the sentences were presented in English, consistent with the suggestion that L2 listening may be strained by additional functional load from prosodic processing. These findings have implications for how universal and language-specific mechanisms interact in the perception of focus structure in everyday discourse.

  • Yu, J., Mailhammer, R., & Cutler, A. (2020). Vocabulary structure affects word recognition: Evidence from German listeners. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 474-478). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-97.

    Abstract

    Lexical stress is realised similarly in English, German, and Dutch. On a suprasegmental level, stressed syllables tend to be longer and more acoustically salient than unstressed syllables; segmentally, vowels in unstressed syllables are often reduced. The frequency of unreduced unstressed syllables (where only the suprasegmental cues indicate lack of stress) however, differs across the languages. The present studies test whether listener behaviour is affected by these vocabulary differences, by investigating German listeners’ use of suprasegmental cues to lexical stress in German and English word recognition. In a forced-choice identification task, German listeners correctly assigned single-syllable fragments (e.g., Kon-) to one of two words differing in stress (KONto, konZEPT). Thus, German listeners can exploit suprasegmental information for identifying words. German listeners also performed above chance in a similar task in English (with, e.g., DIver, diVERT), i.e., their sensitivity to these cues also transferred to a nonnative language. An English listener group, in contrast, failed in the English fragment task. These findings mirror vocabulary patterns: German has more words with unreduced unstressed syllables than English does.
  • Mandal, S., Best, C. T., Shaw, J., & Cutler, A. (2020). Bilingual phonology in dichotic perception: A case study of Malayalam and English voicing. Glossa: A Journal of General Linguistics, 5(1): 73. doi:10.5334/gjgl.853.

    Abstract

    Listeners often experience cocktail-party situations, encountering multiple ongoing conversations while tracking just one. Capturing the words spoken under such conditions requires selective attention and processing, which involves using phonetic details to discern phonological structure. How do bilinguals accomplish this in L1-L2 competition? We addressed that question using a dichotic listening task with fluent Malayalam-English bilinguals, in which they were presented with synchronized nonce words, one in each language in separate ears, with competing onsets of a labial stop (Malayalam) and a labial fricative (English), both voiced or both voiceless. They were required to attend to the Malayalam or the English item, in separate blocks, and report the initial consonant they heard. We found that perceptual intrusions from the unattended to the attended language were influenced by voicing, with more intrusions on voiced than voiceless trials. This result supports our proposal for the feature specification of consonants in Malayalam-English bilinguals, which makes use of privative features, underspecification and the “standard approach” to laryngeal features, as against “laryngeal realism”. Given this representational account, we observe that intrusions result from phonetic properties in the unattended signal being assimilated to the closest matching phonological category in the attended language, and are more likely for segments with a greater number of phonological feature specifications.
  • Ullas, S., Formisano, E., Eisner, F., & Cutler, A. (2020). Audiovisual and lexical cues do not additively enhance perceptual adaptation. Psychonomic Bulletin & Review, 27, 707-715. doi:10.3758/s13423-020-01728-5.

    Abstract

    When listeners experience difficulty in understanding a speaker, lexical and audiovisual (or lipreading) information can be a helpful source of guidance. These two types of information embedded in speech can also guide perceptual adjustment, also known as recalibration or perceptual retuning. With retuning or recalibration, listeners can use these contextual cues to temporarily or permanently reconfigure internal representations of phoneme categories to adjust to and understand novel interlocutors more easily. These two types of perceptual learning, previously investigated in large part separately, are highly similar in allowing listeners to use speech-external information to make phoneme boundary adjustments. This study explored whether the two sources may work in conjunction to induce adaptation, thus emulating real life, in which listeners are indeed likely to encounter both types of cue together. Listeners who received combined audiovisual and lexical cues showed perceptual learning effects similar to listeners who only received audiovisual cues, while listeners who received only lexical cues showed weaker effects compared with the two other groups. The combination of cues did not lead to additive retuning or recalibration effects, suggesting that lexical and audiovisual cues operate differently with regard to how listeners use them for reshaping perceptual categories. Reaction times did not significantly differ across the three conditions, so none of the forms of adjustment were either aided or hindered by processing time differences. Mechanisms underlying these forms of perceptual learning may diverge in numerous ways despite similarities in experimental applications.

  • Ullas, S., Formisano, E., Eisner, F., & Cutler, A. (2020). Interleaved lexical and audiovisual information can retune phoneme boundaries. Attention, Perception & Psychophysics, 82, 2018-2026. doi:10.3758/s13414-019-01961-8.

    Abstract

    To adapt to situations in which speech perception is difficult, listeners can adjust boundaries between phoneme categories using perceptual learning. Such adjustments can draw on lexical information in surrounding speech, or on visual cues via speech-reading. In the present study, listeners proved they were able to flexibly adjust the boundary between two plosive/stop consonants, /p/-/t/, using both lexical and speech-reading information and given the same experimental design for both cue types. Videos of a speaker pronouncing pseudo-words and audio recordings of Dutch words were presented in alternating blocks of either stimulus type. Listeners were able to switch between cues to adjust phoneme boundaries, and resulting effects were comparable to results from listeners receiving only a single source of information. Overall, audiovisual cues (i.e., the videos) produced the stronger effects, commensurate with their applicability for adapting to noisy environments. Lexical cues were able to induce effects with fewer exposure stimuli and a changing phoneme bias, in a design unlike most prior studies of lexical retuning. While lexical retuning effects were relatively weaker compared to audiovisual recalibration, this discrepancy could reflect how lexical retuning may be more suitable for adapting to speakers than to environments. Nonetheless, the presence of the lexical retuning effects suggests that it may be invoked at a faster rate than previously seen. In general, this technique has further illuminated the robustness of adaptability in speech perception, and offers the potential to enable further comparisons across differing forms of perceptual learning.
  • Ullas, S., Hausfeld, L., Cutler, A., Eisner, F., & Formisano, E. (2020). Neural correlates of phonetic adaptation as induced by lexical and audiovisual context. Journal of Cognitive Neuroscience, 32(11), 2145-2158. doi:10.1162/jocn_a_01608.

    Abstract

    When speech perception is difficult, one way listeners adjust is by reconfiguring phoneme category boundaries, drawing on contextual information. Both lexical knowledge and lipreading cues are used in this way, but it remains unknown whether these two differing forms of perceptual learning are similar at a neural level. This study compared phoneme boundary adjustments driven by lexical or audiovisual cues, using ultra-high-field 7-T fMRI. During imaging, participants heard exposure stimuli and test stimuli. Exposure stimuli for lexical retuning were audio recordings of words, and those for audiovisual recalibration were audio–video recordings of lip movements during utterances of pseudowords. Test stimuli were ambiguous phonetic strings presented without context, and listeners reported what phoneme they heard. Reports reflected phoneme biases in preceding exposure blocks (e.g., more reported /p/ after /p/-biased exposure). Analysis of corresponding brain responses indicated that both forms of cue use were associated with a network of activity across the temporal cortex, plus parietal, insula, and motor areas. Audiovisual recalibration also elicited significant occipital cortex activity despite the lack of visual stimuli. Activity levels in several ROIs also covaried with strength of audiovisual recalibration, with greater activity accompanying larger recalibration shifts. Similar activation patterns appeared for lexical retuning, but here, no significant ROIs were identified. Audiovisual and lexical forms of perceptual learning thus induce largely similar brain response patterns. However, audiovisual recalibration involves additional visual cortex contributions, suggesting that previously acquired visual information (on lip movements) is retrieved and deployed to disambiguate auditory perception.
  • Cutler, A., & Bruggeman, L. (2013). Vocabulary structure and spoken-word recognition: Evidence from French reveals the source of embedding asymmetry. In Proceedings of INTERSPEECH: 14th Annual Conference of the International Speech Communication Association (pp. 2812-2816).

    Abstract

    Vocabularies contain hundreds of thousands of words built from only a handful of phonemes, so that inevitably longer words tend to contain shorter ones. In many languages (but not all) such embedded words occur more often word-initially than word-finally, and this asymmetry, if present, has far-reaching consequences for spoken-word recognition. Prior research had ascribed the asymmetry to suffixing or to effects of stress (in particular, final syllables containing the vowel schwa). Analyses of the standard French vocabulary here reveal an effect of suffixing, as predicted by this account, and further analyses of an artificial variety of French reveal that extensive final schwa has an independent and additive effect in promoting the embedding asymmetry.
  • Johnson, E. K., Lahey, M., Ernestus, M., & Cutler, A. (2013). A multimodal corpus of speech to infant and adult listeners. Journal of the Acoustical Society of America, 134, EL534-EL540. doi:10.1121/1.4828977.

    Abstract

    An audio and video corpus of speech addressed to 28 11-month-olds is described. The corpus allows comparisons between adult speech directed towards infants, familiar adults and unfamiliar adult addressees, as well as of caregivers’ word teaching strategies across word classes. Summary data show that infant-directed speech differed more from speech to unfamiliar than familiar adults; that word teaching strategies for nominals versus verbs and adjectives differed; that mothers mostly addressed infants with multi-word utterances; and that infants’ vocabulary size was unrelated to speech rate, but correlated positively with predominance of continuous caregiver speech (not of isolated words) in the input.
  • Kooijman, V., Junge, C., Johnson, E. K., Hagoort, P., & Cutler, A. (2013). Predictive brain signals of linguistic development. Frontiers in Psychology, 4: 25. doi:10.3389/fpsyg.2013.00025.

    Abstract

    The ability to extract word forms from continuous speech is a prerequisite for constructing a vocabulary and emerges in the first year of life. Electrophysiological (ERP) studies of speech segmentation by 9- to 12-month-old listeners in several languages have found a left-localized negativity linked to word onset as a marker of word detection. We report an ERP study showing significant evidence of speech segmentation in Dutch-learning 7-month-olds. In contrast to the left-localized negative effect reported with older infants, the observed overall mean effect had a positive polarity. Inspection of individual results revealed two participant sub-groups: a majority showing a positive-going response, and a minority showing the left negativity observed in older age groups. We retested participants at age three, on vocabulary comprehension and word and sentence production. On every test, children who at 7 months had shown the negativity associated with segmentation of words from speech outperformed those who had produced positive-going brain responses to the same input. The earlier that infants show the left-localized brain responses typically indicating detection of words in speech, the better their early childhood language skills.
  • Otake, T., & Cutler, A. (2013). Lexical selection in action: Evidence from spontaneous punning. Language and Speech, 56(4), 555-573. doi:10.1177/0023830913478933.

    Abstract

    Analysis of a corpus of spontaneously produced Japanese puns from a single speaker over a two-year period provides a view of how a punster selects a source word for a pun and transforms it into another word for humorous effect. The pun-making process is driven by a principle of similarity: the source word should as far as possible be preserved (in terms of segmental sequence) in the pun. This renders homophones (English example: band–banned) the pun type of choice, with part–whole relationships of embedding (cap–capture), and mutations of the source word (peas–bees) rather less favored. Similarity also governs mutations in that single-phoneme substitutions outnumber larger changes, and in phoneme substitutions, subphonemic features tend to be preserved. The process of spontaneous punning thus applies, on line, the same similarity criteria as govern explicit similarity judgments and offline decisions about pun success (e.g., for inclusion in published collections). Finally, the process of spoken-word recognition is word-play-friendly in that it involves multiple word-form activation and competition, which, coupled with known techniques in use in difficult listening conditions, enables listeners to generate most pun types as offshoots of normal listening procedures.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2013). Lexically guided retuning of visual phonetic categories. Journal of the Acoustical Society of America, 134, 562-571. doi:10.1121/1.4807814.

    Abstract

    Listeners retune the boundaries between phonetic categories to adjust to individual speakers' productions. Lexical information, for example, indicates what an unusual sound is supposed to be, and boundary retuning then enables the speaker's sound to be included in the appropriate auditory phonetic category. In this study, it was investigated whether lexical knowledge that is known to guide the retuning of auditory phonetic categories, can also retune visual phonetic categories. In Experiment 1, exposure to a visual idiosyncrasy in ambiguous audiovisually presented target words in a lexical decision task indeed resulted in retuning of the visual category boundary based on the disambiguating lexical context. In Experiment 2 it was tested whether lexical information retunes visual categories directly, or indirectly through the generalization from retuned auditory phonetic categories. Here, participants were exposed to auditory-only versions of the same ambiguous target words as in Experiment 1. Auditory phonetic categories were retuned by lexical knowledge, but no shifts were observed for the visual phonetic categories. Lexical knowledge can therefore guide retuning of visual phonetic categories, but lexically guided retuning of auditory phonetic categories is not generalized to visual categories. Rather, listeners adjust auditory and visual phonetic categories to talker idiosyncrasies separately.
  • Braun, B., Tagliapietra, L., & Cutler, A. (2008). Contrastive utterances make alternatives salient: Cross-modal priming evidence. In Proceedings of Interspeech 2008 (p. 69).

    Abstract

    Sentences with contrastive intonation are assumed to presuppose contextual alternatives to the accented elements. Two cross-modal priming experiments tested in Dutch whether such contextual alternatives are automatically available to listeners. Contrastive associates – but not non-contrastive associates – were facilitated only when primes were produced in sentences with contrastive intonation, indicating that contrastive intonation makes unmentioned contextual alternatives immediately available. Possibly, contrastive contours trigger a “presupposition resolution mechanism” by which these alternatives become salient.
  • Braun, B., Lemhöfer, K., & Cutler, A. (2008). English word stress as produced by English and Dutch speakers: The role of segmental and suprasegmental differences. In Proceedings of Interspeech 2008 (p. 1953).

    Abstract

    It has been claimed that Dutch listeners use suprasegmental cues (duration, spectral tilt) more than English listeners in distinguishing English word stress. We tested whether this asymmetry also holds in production, comparing the realization of English word stress by native English speakers and Dutch speakers. Results confirmed that English speakers centralize unstressed vowels more, while Dutch speakers of English make more use of suprasegmental differences.
  • Broersma, M., & Cutler, A. (2008). Phantom word activation in L2. System, 36(1), 22-34. doi:10.1016/j.system.2007.11.003.

    Abstract

    L2 listening can involve the phantom activation of words which are not actually in the input. All spoken-word recognition involves multiple concurrent activation of word candidates, with selection of the correct words achieved by a process of competition between them. L2 listening involves more such activation than L1 listening, and we report two studies illustrating this. First, in a lexical decision study, L2 listeners accepted (but L1 listeners did not accept) spoken non-words such as groof or flide as real English words. Second, a priming study demonstrated that the same spoken non-words made recognition of the real words groove, flight easier for L2 (but not L1) listeners, suggesting that, for the L2 listeners only, these real words had been activated by the spoken non-word input. We propose that further understanding of the activation and competition process in L2 lexical processing could lead to new understanding of L2 listening difficulty.
  • Cutler, A., Garcia Lecumberri, M. L., & Cooke, M. (2008). Consonant identification in noise by native and non-native listeners: Effects of local context. Journal of the Acoustical Society of America, 124(2), 1264-1268. doi:10.1121/1.2946707.

    Abstract

    Speech recognition in noise is harder in second (L2) than first languages (L1). This could be because noise disrupts speech processing more in L2 than L1, or because L1 listeners recover better though disruption is equivalent. Two similar prior studies produced discrepant results: Equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the results discrepancy.
  • Cutler, A., McQueen, J. M., Butterfield, S., & Norris, D. (2008). Prelexically-driven perceptual retuning of phoneme boundaries. In Proceedings of Interspeech 2008 (p. 2056).

    Abstract

    Listeners heard an ambiguous /f-s/ in nonword contexts where only one of /f/ or /s/ was legal (e.g., frul/*srul or *fnud/snud). In later categorisation of a phonetic continuum from /f/ to /s/, their category boundaries had shifted; hearing -rul led to expanded /f/ categories, -nud expanded /s/. Thus phonotactic sequence information alone induces perceptual retuning of phoneme category boundaries; lexical access is not required.
  • Cutler, A. (2008). The abstract representations in speech processing. Quarterly Journal of Experimental Psychology, 61(11), 1601-1619. doi:10.1080/13803390802218542.

    Abstract

    Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
  • Goudbeek, M., Cutler, A., & Smits, R. (2008). Supervised and unsupervised learning of multidimensionally varying nonnative speech categories. Speech Communication, 50(2), 109-125. doi:10.1016/j.specom.2007.07.003.

    Abstract

    The acquisition of novel phonetic categories is hypothesized to be affected by the distributional properties of the input, the relation of the new categories to the native phonology, and the availability of supervision (feedback). These factors were examined in four experiments in which listeners were presented with novel categories based on vowels of Dutch. Distribution was varied such that the categorization depended on the single dimension duration, the single dimension frequency, or both dimensions at once. Listeners were clearly sensitive to the distributional information, but unidimensional contrasts proved easier to learn than multidimensional. The native phonology was varied by comparing Spanish versus American English listeners. Spanish listeners found categorization by frequency easier than categorization by duration, but this was not true of American listeners, whose native vowel system makes more use of duration-based distinctions. Finally, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.
  • Kim, J., Davis, C., & Cutler, A. (2008). Perceptual tests of rhythmic similarity: II. Syllable rhythm. Language and Speech, 51(4), 343-359. doi:10.1177/0023830908099069.

    Abstract

    To segment continuous speech into its component words, listeners make use of language rhythm; because rhythm differs across languages, so do the segmentation procedures which listeners use. For each of stress-, syllable- and mora-based rhythmic structure, perceptual experiments have led to the discovery of corresponding segmentation procedures. In the case of mora-based rhythm, similar segmentation has been demonstrated in the otherwise unrelated languages Japanese and Telugu; segmentation based on syllable rhythm, however, has been previously demonstrated only for European languages from the Romance family. We here report two target detection experiments in which Korean listeners, presented with speech in Korean and in French, displayed patterns of segmentation like those previously observed in analogous experiments with French listeners. The Korean listeners' accuracy in detecting word-initial target fragments in either language was significantly higher when the fragments corresponded exactly to a syllable in the input than when the fragments were smaller or larger than a syllable. We conclude that Korean and French listeners can call on similar procedures for segmenting speech, and we further propose that perceptual tests of speech segmentation provide a valuable accompaniment to acoustic analyses for establishing languages' rhythmic class membership.
  • Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word recognition. In A. D. Friederici, & G. Thierry (Eds.), Early language development: Bridging brain and behaviour (pp. 91-114). Amsterdam: Benjamins.
  • Cutler, A. (1975). Sentence stress and sentence comprehension. PhD Thesis, University of Texas, Austin.
  • Cutler, A., & Fay, D. (1975). You have a Dictionary in your Head, not a Thesaurus. Texas Linguistic Forum, 1, 27-40.
  • Cutler, A. (1971). [Review of the book Probleme der Aufgabenanalyse bei der Erstellung von Sprachprogrammen by K. Bung]. Babel, 7, 29-31.
