Anne Cutler

Publications

Displaying 1 - 32 of 32
  • Bruggeman, L., & Cutler, A. (2016). Lexical manipulation as a discovery tool for psycholinguistic research. In C. Carignan, & M. D. Tyler (Eds.), Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016) (pp. 313-316).
  • Cutler, A., & Norris, D. (2016). Bottoms up! How top-down pitfalls ensnare speech perception researchers too. Commentary on C. Firestone & B. Scholl: Cognition does not affect perception: Evaluating the evidence for 'top-down' effects. Behavioral and Brain Sciences, e236. doi:10.1017/S0140525X15002745.

    Abstract

    Not only can the pitfalls that Firestone & Scholl (F&S) identify be generalised across multiple studies within the field of visual perception, but also they have general application outside the field wherever perceptual and cognitive processing are compared. We call attention to the widespread susceptibility of research on the perception of speech to versions of the same pitfalls.
  • Ip, M., & Cutler, A. (2016). Cross-language data on five types of prosodic focus. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 330-334).

    Abstract

    To examine the relative roles of language-specific and language-universal mechanisms in the production of prosodic focus, we compared production of five different types of focus by native speakers of English and Mandarin. Two comparable dialogues were constructed for each language, with the same words appearing in focused and unfocused position; 24 speakers recorded each dialogue in each language. Duration, F0 (mean, maximum, range), and rms-intensity (mean, maximum) of all critical word tokens were measured. Across the different types of focus, cross-language differences were observed in the degree to which English versus Mandarin speakers use the different prosodic parameters to mark focus, suggesting that while prosody may be universally available for expressing focus, the means of its employment may be considerably language-specific
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702-705).

    Abstract

    In Korean, focus is expressed in accentual phrasing. To ascertain whether words focused in this manner enjoy a processing advantage analogous to that conferred by focus as expressed in, e.g, English and Dutch, we devised sentences with target words in one of four conditions: prosodic focus, syntactic focus, prosodic + syntactic focus, and no focus as a control. 32 native speakers of Korean listened to blocks of 10 sentences, then were presented visually with words and asked whether or not they had heard them. Overall, words with focus were recognised significantly faster and more accurately than unfocused words. In addition, words with syntactic focus or syntactic + prosodic focus were recognised faster than words with prosodic focus alone. As for other languages, Korean focus confers processing advantage on the words carrying it. While prosodic focus does provide an advantage, however, syntactic focus appears to provide the greater beneficial effect for recognition memory
  • Norris, D., McQueen, J. M., & Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience, 31(1), 4-18. doi:10.1080/23273798.2015.1081703.

    Abstract

    Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
  • Cutler, A. (2005). Lexical stress. In D. B. Pisoni, & R. E. Remez (Eds.), The handbook of speech perception (pp. 264-289). Oxford: Blackwell.
  • Cutler, A., & Broersma, M. (2005). Phonetic precision in listening. In W. J. Hardcastle, & J. M. Beck (Eds.), A figure of speech: A Festschrift for John Laver (pp. 63-91). Mahwah, NJ: Erlbaum.
  • Cutler, A., Klein, W., & Levinson, S. C. (2005). The cornerstones of twenty-first century psycholinguistics. In A. Cutler (Ed.), Twenty-first century psycholinguistics: Four cornerstones (pp. 1-20). Mahwah, NJ: Erlbaum.
  • Cutler, A. (2005). The lexical statistics of word recognition problems caused by L2 phonetic confusion. In Proceedings of the 9th European Conference on Speech Communication and Technology (pp. 413-416).
  • Cutler, A., McQueen, J. M., & Norris, D. (2005). The lexical utility of phoneme-category plasticity. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP2005) (pp. 103-107).
  • Cutler, A. (2005). Why is it so hard to understand a second language in noise? Newsletter, American Association of Teachers of Slavic and East European Languages, 48, 16-16.
  • Cutler, A., Smits, R., & Cooper, N. (2005). Vowel perception: Effects of non-native language vs. non-native dialect. Speech Communication, 47(1-2), 32-42. doi:10.1016/j.specom.2005.02.001.

    Abstract

    Three groups of listeners identified the vowel in CV and VC syllables produced by an American English talker. The listeners were (a) native speakers of American English, (b) native speakers of Australian English (different dialect), and (c) native speakers of Dutch (different language). The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0 dB, 8 dB, and 16 dB). The identification performance of native listeners was significantly better than that of listeners with another language but did not significantly differ from the performance of listeners with another dialect. Dialect differences did however affect the type of perceptual confusions which listeners made; in particular, the Australian listeners’ judgements of vowel tenseness were more variable than the American listeners’ judgements, which may be ascribed to cross-dialectal differences in this vocalic feature. Although listening difficulty can result when speech input mismatches the native dialect in terms of the precise cues for and boundaries of phonetic categories, the difficulty is very much less than that which arises when speech input mismatches the native language in terms of the repertoire of phonemic categories available.
  • Cutler, A. (Ed.). (2005). Twenty-first century psycholinguistics: Four cornerstones. Mahwah, NJ: Erlbaum.
  • Cutler, A. (Ed.). (2005). Twenty-first century psycholinguistics: Four cornerstones. Hillsdale, NJ: Erlbaum.
  • Goudbeek, M., Smits, R., Cutler, A., & Swingley, D. (2005). Acquiring auditory and phonetic categories. In H. Cohen, & C. Lefebvre (Eds.), Handbook of categorization in cognitive science (pp. 497-513). Amsterdam: Elsevier.
  • Kooijman, V., Hagoort, P., & Cutler, A. (2005). Electrophysiological evidence for prelinguistic infants' word recognition in continuous speech. Cognitive Brain Research, 24(1), 109-116. doi:10.1016/j.cogbrainres.2004.12.009.

    Abstract

    Children begin to talk at about age one. The vocabulary they need to do so must be built on perceptual evidence and, indeed, infants begin to recognize spoken words long before they talk. Most of the utterances infants hear, however, are continuous, without pauses between words, so constructing a vocabulary requires them to decompose continuous speech in order to extract the individual words. Here, we present electrophysiological evidence that 10-month-old infants recognize two-syllable words they have previously heard only in isolation when these words are presented anew in continuous speech. Moreover, they only need roughly the first syllable of the word to begin doing this. Thus, prelinguistic infants command a highly efficient procedure for segmentation and recognition of spoken words in the absence of an existing vocabulary, allowing them to tackle effectively the problem of bootstrapping a lexicon out of the highly variable, continuous speech signals in their environment.
  • Sharp, D. J., Scott, S. K., Cutler, A., & Wise, R. J. S. (2005). Lexical retrieval constrained by sound structure: The role of the left inferior frontal gyrus. Brain and Language, 92(3), 309-319. doi:10.1016/j.bandl.2004.07.002.

    Abstract

    Positron emission tomography was used to investigate two competing hypotheses about the role of the left inferior frontal gyrus (IFG) in word generation. One proposes a domain-specific organization, with neural activation dependent on the type of information being processed, i.e., surface sound structure or semantic. The other proposes a process-specific organization, with activation dependent on processing demands, such as the amount of selection needed to decide between competing lexical alternatives. In a novel word retrieval task, word reconstruction (WR), subjects generated real words from heard non-words by the substitution of either a vowel or consonant. Both types of lexical retrieval, informed by sound structure alone, produced activation within anterior and posterior left IFG regions. Within these regions there was greater activity for consonant WR, which is more difficult and imposes greater processing demands. These results support a process-specific organization of the anterior left IFG.
  • Van Donselaar, W., Koster, M., & Cutler, A. (2005). Exploring the role of lexical stress in lexical recognition. Quarterly Journal of Experimental Psychology, 58A(2), 251-273. doi:10.1080/02724980343000927.

    Abstract

    Three cross-modal priming experiments examined the role of suprasegmental information in the processing of spoken words. All primes consisted of truncated spoken Dutch words. Recognition of visually presented word targets was facilitated by prior auditory presentation of the first two syllables of the same words as primes, but only if they were appropriately stressed (e.g., OKTOBER preceded by okTO-); inappropriate stress, compatible with another word (e.g., OKTOBER preceded by OCto-, the beginning of octopus), produced inhibition. Monosyllabic fragments (e.g., OC-) also produced facilitation when appropriately stressed; if inappropriately stressed, they produced neither facilitation nor inhibition. The bisyllabic fragments that were compatible with only one word produced facilitation to semantically associated words, but inappropriate stress caused no inhibition of associates. The results are explained within a model of spoken-word recognition involving competition between simultaneously activated phonological representations followed by activation of separate conceptual representations for strongly supported lexical candidates; at the level of the phonological representations, activation is modulated by both segmental and suprasegmental information.
  • Warner, N., Smits, R., McQueen, J. M., & Cutler, A. (2005). Phonological and statistical effects on timing of speech perception: Insights from a database of Dutch diphone perception. Speech Communication, 46(1), 53-72. doi:10.1016/j.specom.2005.01.003.

    Abstract

    We report detailed analyses of a very large database on timing of speech perception collected by Smits et al. (Smits, R., Warner, N., McQueen, J.M., Cutler, A., 2003. Unfolding of phonetic information over time: A database of Dutch diphone perception. J. Acoust. Soc. Am. 113, 563–574). Eighteen listeners heard all possible diphones of Dutch, gated in portions of varying size and presented without background noise. The present report analyzes listeners’ responses across gates in terms of phonological features (voicing, place, and manner for consonants; height, backness, and length for vowels). The resulting patterns for feature perception differ from patterns reported when speech is presented in noise. The data are also analyzed for effects of stress and of phonological context (neighboring vowel vs. consonant); effects of these factors are observed to be surprisingly limited. Finally, statistical effects, such as overall phoneme frequency and transitional probabilities, along with response biases, are examined; these too exercise only limited effects on response patterns. The results suggest highly accurate speech perception on the basis of acoustic information alone.
  • Warner, N., Kim, J., Davis, C., & Cutler, A. (2005). Use of complex phonological patterns in speech processing: Evidence from Korean. Journal of Linguistics, 41(2), 353-387. doi:10.1017/S0022226705003294.

    Abstract

    Korean has a very complex phonology, with many interacting alternations. In a coronal-/i/ sequence, depending on the type of phonological boundary present, alternations such as palatalization, nasal insertion, nasal assimilation, coda neutralization, and intervocalic voicing can apply. This paper investigates how the phonological patterns of Korean affect processing of morphemes and words. Past research on languages such as English, German, Dutch, and Finnish has shown that listeners exploit syllable structure constraints in processing speech and segmenting it into words. The current study shows that in parsing speech, listeners also use much more complex patterns that relate the surface phonological string to various boundaries.
  • Butterfield, S., & Cutler, A. (1988). Segmentation errors by human listeners: Evidence for a prosodic segmentation strategy. In W. Ainsworth, & J. Holmes (Eds.), Proceedings of SPEECH ’88: Seventh Symposium of the Federation of Acoustic Societies of Europe: Vol. 3 (pp. 827-833). Edinburgh: Institute of Acoustics.
  • Cutler, A., Mehler, J., Norris, D., & Segui, J. (1988). Limits on bilingualism [Letters to Nature]. Nature, 340, 229-230. doi:10.1038/340229a0.

    Abstract

    SPEECH, in any language, is continuous; speakers provide few reliable cues to the boundaries of words, phrases, or other meaningful units. To understand speech, listeners must divide the continuous speech stream into portions that correspond to such units. This segmentation process is so basic to human language comprehension that psycholinguists long assumed that all speakers would do it in the same way. In previous research1,2, however, we reported that segmentation routines can be language-specific: speakers of French process spoken words syllable by syllable, but speakers of English do not. French has relatively clear syllable boundaries and syllable-based timing patterns, whereas English has relatively unclear syllable boundaries and stress-based timing; thus syllabic segmentation would work more efficiently in the comprehension of French than in the comprehension of English. Our present study suggests that at this level of language processing, there are limits to bilingualism: a bilingual speaker has one and only one basic language.
  • Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113-121. doi:10.1037/0096-1523.14.1.113.

    Abstract

    A model of speech segmentation in a stress language is proposed, according to which the occurrence of a strong syllable triggers segmentation of the speech signal, whereas occurrence of a weak syllable does not trigger segmentation. We report experiments in which listeners detected words embedded in nonsense bisyllables more slowly when the bisyllable had two strong syllables than when it had a strong and a weak syllable; mint was detected more slowly in mintayve than in mintesh. According to our proposed model, this result is an effect of segmentation: When the second syllable is strong, it is segmented from the first syllable, and successful detection of the embedded word therefore requires assembly of speech material across a segmentation position. Speech recognition models involving phonemic or syllabic recoding, or based on strictly left-to-right processes, do not predict this result. It is argued that segmentation at strong syllables in continuous speech recognition serves the purpose of detecting the most efficient locations at which to initiate lexical access. (C) 1988 by the American Psychological Association
  • Cutler, A. (1988). The perfect speech error. In L. Hyman, & C. Li (Eds.), Language, speech and mind: Studies in honor of Victoria A. Fromkin (pp. 209-223). London: Croom Helm.
  • Hawkins, J. A., & Cutler, A. (1988). Psycholinguistic factors in morphological asymmetry. In J. A. Hawkins (Ed.), Explaining language universals (pp. 280-317). Oxford: Blackwell.
  • Henderson, L., Coltheart, M., Cutler, A., & Vincent, N. (1988). Preface. Linguistics, 26(4), 519-520. doi:10.1515/ling.1988.26.4.519.
  • Mehta, G., & Cutler, A. (1988). Detection of target phonemes in spontaneous and read speech. Language and Speech, 31, 135-156.

    Abstract

    Although spontaneous speech occurs more frequently in most listeners’ experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalize to the recognition of spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Response were, overall, equally fast in each speech mode. However analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than earlier targets, and targets preceded by long words were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claim from previous work that listeners pay great attention to prosodic information in the process of recognizing speech.
  • Norris, D., & Cutler, A. (1988). Speech recognition in French and English. MRC News, 39, 30-31.
  • Norris, D., & Cutler, A. (1988). The relative accessibility of phonemes and syllables. Perception and Psychophysics, 43, 541-550. Retrieved from http://www.psychonomic.org/search/view.cgi?id=8530.

    Abstract

    Previous research comparing detection times for syllables and for phonemes has consistently found that syllables are responded to faster than phonemes. This finding poses theoretical problems for strictly hierarchical models of speech recognition, in which smaller units should be able to be identified faster than larger units. However, inspection of the characteristics of previous experiments’stimuli reveals that subjects have been able to respond to syllables on the basis of only a partial analysis of the stimulus. In the present experiment, five groups of subjects listened to identical stimulus material. Phoneme and syllable monitoring under standard conditions was compared with monitoring under conditions in which near matches of target and stimulus occurred on no-response trials. In the latter case, when subjects were forced to analyze each stimulus fully, phonemes were detected faster than syllables.
  • Cutler, A. (1976). High-stress words are easier to perceive than low-stress words, even when they are equally stressed. Texas Linguistic Forum, 2, 53-57.
  • Cutler, A. (1976). Phoneme-monitoring reaction time as a function of preceding intonation contour. Perception and Psychophysics, 20, 55-60. Retrieved from http://www.psychonomic.org/search/view.cgi?id=18194.

    Abstract

    An acoustically invariant one-word segment occurred in two versions of one syntactic context. In one version, the preceding intonation contour indicated that a stress would fall at the point where this word occurred. In the other version, the preceding contour predicted reduced stress at that point. Reaction time to the initial phoneme of the word was faster in the former case, despite the fact that no acoustic correlates of stress were present. It is concluded that a part of the sentence comprehension process is the prediction of upcoming sentence accents.

Share this page