Anne Cutler

Publications

  • Cutler, A., & McQueen, J. M. (2014). How prosody is both mandatory and optional. In J. Caspers, Y. Chen, W. Heeren, J. Pacilly, N. O. Schiller, & E. Van Zanten (Eds.), Above and Beyond the Segments: Experimental linguistics and phonetics (pp. 71-82). Amsterdam: Benjamins.

    Abstract

    Speech signals originate as a sequence of linguistic units selected by speakers, but these units are necessarily realised in the suprasegmental dimensions of time, frequency and amplitude. For this reason prosodic structure has been viewed as a mandatory target of language processing by both speakers and listeners. In apparent contradiction, however, prosody has also been argued to be ancillary rather than core linguistic structure, making processing of prosodic structure essentially optional. In the present tribute to one of the luminaries of prosodic research for the past quarter century, we review evidence from studies of the processing of lexical stress and focal accent which reconciles these views and shows that both claims are, each in their own way, fully true.
  • Cutler, A. (2014). In thrall to the vocabulary. Acoustics Australia, 42, 84-89.

    Abstract

    Vocabularies contain hundreds of thousands of words built from only a handful of phonemes; longer words inevitably tend to contain shorter ones. Recognising speech thus requires distinguishing intended words from accidentally present ones. Acoustic information in speech is used wherever it contributes significantly to this process; but as this review shows, its contribution differs across languages, with the consequences of this including: identical and equivalently present information distinguishing the same phonemes being used in Polish but not in German, or in English but not in Italian; identical stress cues being used in Dutch but not in English; expectations about likely embedding patterns differing across English, French, Japanese.
  • Junge, C., & Cutler, A. (2014). Early word recognition and later language skills. Brain Sciences, 4(4), 532-559. doi:10.3390/brainsci4040532.

    Abstract

    Recent behavioral and electrophysiological evidence has highlighted the long-term importance for language skills of an early ability to recognize words in continuous speech. We here present further tests of this long-term link in the form of follow-up studies conducted with two (separate) groups of infants who had earlier participated in speech segmentation tasks. Each study extends prior follow-up tests: Study 1 by using a novel follow-up measure that taps into online processing, Study 2 by assessing language performance relationships over a longer time span than previously tested. Results of Study 1 show that brain correlates of speech segmentation ability at 10 months are positively related to 16-month-olds’ target fixations in a looking-while-listening task. Results of Study 2 show that infant speech segmentation ability no longer directly predicts language profiles at the age of five. However, a meta-analysis across our results and those of similar studies (Study 3) reveals that age at follow-up does not moderate effect size. Together, the results suggest that infants’ ability to recognize words in speech certainly benefits early vocabulary development; further observed relationships of later language skills to early word recognition may be consequent upon this vocabulary size effect.
  • Junge, C., Cutler, A., & Hagoort, P. (2014). Successful word recognition by 10-month-olds given continuous speech both at initial exposure and test. Infancy, 19(2), 179-193. doi:10.1111/infa.12040.

    Abstract

    Most words that infants hear occur within fluent speech. To compile a vocabulary, infants therefore need to segment words from speech contexts. This study is the first to investigate whether infants (here: 10-month-olds) can recognize words when both initial exposure and test presentation are in continuous speech. Electrophysiological evidence attests that this indeed occurs: An increased extended negativity (word recognition effect) appears for familiarized target words relative to control words. This response proved constant at the individual level: Only infants who showed this negativity at test had shown such a response, within six repetitions after first occurrence, during familiarization.
  • Tuinman, A., Mitterer, H., & Cutler, A. (2014). Use of syntax in perceptual compensation for phonological reduction. Language and Speech, 57, 68-85. doi:10.1177/0023830913479106.

    Abstract

    Listeners resolve ambiguity in speech by consulting context. Extensive research on this issue has largely relied on continua of sounds constructed to vary incrementally between two phonemic endpoints. In this study we presented listeners instead with phonetic ambiguity of a kind with which they have natural experience: varying degrees of word-final /t/-reduction. In two experiments, Dutch listeners decided whether or not the verb in a sentence such as Maar zij ren(t) soms ‘But she sometimes run(s)’ ended in /t/. In Dutch, presence versus absence of final /t/ distinguishes third- from first-person singular present-tense verbs. Acoustic evidence for /t/ varied from clear to absent, and immediately preceding phonetic context was consistent with more versus less likely deletion of /t/. In both experiments, listeners reported more /t/s in sentences in which /t/ would be syntactically correct. In Experiment 1, the disambiguating syntactic information preceded the target verb, as above, while in Experiment 2, it followed the verb. The syntactic bias was greater for fast than for slow responses in Experiment 1, but no such difference appeared in Experiment 2. We conclude that syntactic information does not directly influence pre-lexical processing, but is called upon in making phoneme decisions.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2014). Cross-speaker generalisation in two phoneme-level perceptual adaptation processes. Journal of Phonetics, 43, 38-46. doi:10.1016/j.wocn.2014.01.003.

    Abstract

    Speech perception is shaped by listeners' prior experience with speakers. Listeners retune their phonetic category boundaries after encountering ambiguous sounds in order to deal with variations between speakers. Repeated exposure to an unambiguous sound, on the other hand, leads to a decrease in sensitivity to the features of that particular sound. This study investigated whether these changes in the listeners' perceptual systems can generalise to the perception of speech from a novel speaker. Specifically, the experiments looked at whether visual information about the identity of the speaker could prevent generalisation from occurring. In Experiment 1, listeners retuned auditory category boundaries using audiovisual speech input. This shift in the category boundaries affected perception of speech from both the exposure speaker and a novel speaker. In Experiment 2, listeners were repeatedly exposed to unambiguous speech either auditorily or audiovisually, leading to a decrease in sensitivity to the features of the exposure sound. Here, too, the changes affected the perception of both the exposure speaker and the novel speaker. Together, these results indicate that changes in the perceptual system can affect the perception of speech from a novel speaker and that visual speaker identity information did not prevent this generalisation.
  • Van der Zande, P., Jesse, A., & Cutler, A. (2014). Hearing words helps seeing words: A cross-modal word repetition effect. Speech Communication, 59, 31-43. doi:10.1016/j.specom.2014.01.001.

    Abstract

    Watching a speaker say words benefits subsequent auditory recognition of the same words. In this study, we tested whether hearing words also facilitates subsequent phonological processing from visual speech, and if so, whether speaker repetition influences the magnitude of this word repetition priming. We used long-term cross-modal repetition priming as a means to investigate the underlying lexical representations involved in listening to and seeing speech. In Experiment 1, listeners identified auditory-only words during exposure and visual-only words at test. Words at test were repeated or new and produced by the exposure speaker or a novel speaker. Results showed a significant effect of cross-modal word repetition priming but this was unaffected by speaker changes. Experiment 2 added an explicit recognition task at test. Listeners’ lipreading performance was again improved by prior exposure to auditory words. Explicit recognition memory was poor, and neither word repetition nor speaker repetition improved it. This suggests that cross-modal repetition priming is neither mediated by explicit memory nor improved by speaker information. Our results suggest that phonological representations in the lexicon are shared across auditory and visual processing, and that speaker information is not transferred across modalities at the lexical level.
  • Warner, N., McQueen, J. M., & Cutler, A. (2014). Tracking perception of the sounds of English. The Journal of the Acoustical Society of America, 135, 2995-3006. doi:10.1121/1.4870486.

    Abstract

    Twenty American English listeners identified gated fragments of all 2288 possible English within-word and cross-word diphones, providing a total of 538 560 phoneme categorizations. The results show orderly uptake of acoustic information in the signal and provide a view of where information about segments occurs in time. Information locus depends on each speech sound’s identity and phonological features. Affricates and diphthongs have highly localized information so that listeners’ perceptual accuracy rises during a confined time range. Stops and sonorants have more distributed and gradually appearing information. The identity and phonological features (e.g., vowel vs consonant) of the neighboring segment also influence when acoustic information about a segment is available. Stressed vowels are perceived significantly more accurately than unstressed vowels, but this effect is greater for lax vowels than for tense vowels or diphthongs. The dataset charts the availability of perceptual cues to segment identity across time for the full phoneme repertoire of English in all attested phonetic contexts.
  • Botelho da Silva, T., & Cutler, A. (1993). Ill-formedness and transformability in Portuguese idioms. In C. Cacciari, & P. Tabossi (Eds.), Idioms: Processing, structure and interpretation (pp. 129-143). Hillsdale, NJ: Erlbaum.
  • Cutler, A., Kearns, R., Norris, D., & Scott, D. R. (1993). Problems with click detection: Insights from cross-linguistic comparisons. Speech Communication, 13, 401-410. doi:10.1016/0167-6393(93)90038-M.

    Abstract

    Cross-linguistic comparisons may shed light on the levels of processing involved in the performance of psycholinguistic tasks. For instance, if the same pattern of results appears whether or not subjects understand the experimental materials, it may be concluded that the results do not reflect higher-level linguistic processing. In the present study, English and French listeners performed two tasks - click location and speeded click detection - with both English and French sentences, closely matched for syntactic and phonological structure. Clicks were located more accurately in open- than in closed-class words in both English and French; they were detected more rapidly in open- than in closed-class words in English, but not in French. The two listener groups produced the same pattern of responses, suggesting that higher-level linguistic processing was not involved in the listeners' responses. It is concluded that click detection tasks are primarily sensitive to low-level (e.g. acoustic) effects, and hence are not well suited to the investigation of linguistic processing.
  • Cutler, A. (1993). Segmentation problems, rhythmic solutions. Lingua, 92, 81-104. doi:10.1016/0024-3841(94)90338-7.

    Abstract

    The lexicon contains discrete entries, which must be located in speech input in order for speech to be understood; but the continuity of speech signals means that lexical access from spoken input involves a segmentation problem for listeners. The speech environment of prelinguistic infants may not provide special information to assist the infant listeners in solving this problem. Mature language users in possession of a lexicon might be thought to be able to avoid explicit segmentation of speech by relying on information from successful lexical access; however, evidence from adult perceptual studies indicates that listeners do use explicit segmentation procedures. These procedures differ across languages and seem to exploit language-specific rhythmic structure. Efficient as these procedures are, they may not have been developed in response to statistical properties of the input, because bilinguals, equally competent in two languages, apparently only possess one rhythmic segmentation procedure. The origin of rhythmic segmentation may therefore lie in the infant's exploitation of rhythm to solve the segmentation problem and gain a first toehold on lexical acquisition. Recent evidence from speech production and perception studies with prelinguistic infants supports the claim that infants are sensitive to rhythmic structure and its relationship to lexical segmentation.
  • Cutler, A. (1993). Segmenting speech in different languages. The Psychologist, 6(10), 453-455.
  • Cutler, A. (1993). Phonological cues to open- and closed-class words in the processing of spoken sentences. Journal of Psycholinguistic Research, 22, 109-131.

    Abstract

    Evidence is presented that (a) the open and the closed word classes in English have different phonological characteristics, (b) the phonological dimension on which they differ is one to which listeners are highly sensitive, and (c) spoken open- and closed-class words produce different patterns of results in some auditory recognition tasks. What implications might link these findings? Two recent lines of evidence from disparate paradigms—the learning of an artificial language, and natural and experimentally induced misperception of juncture—are summarized, both of which suggest that listeners are sensitive to the phonological reflections of open- vs. closed-class word status. Although these correlates cannot be strictly necessary for efficient processing, if they are present listeners exploit them in making word class assignments. That such a use of phonological information is of value to listeners could be indirect evidence that open- vs. closed-class words undergo different processing operations. Parts of the research reported in this paper were carried out in collaboration with Sally Butterfield and David Carter, and supported by the Alvey Directorate (United Kingdom). Jonathan Stankler's master's research was supported by the Science and Engineering Research Council (United Kingdom). Thanks to all of the above, and to Merrill Garrett, Mike Kelly, James McQueen, and Dennis Norris for further assistance.
  • Cutler, A. (1993). Language-specific processing: Does the evidence converge? In G. T. Altmann, & R. C. Shillcock (Eds.), Cognitive models of speech processing: The Sperlonga Meeting II (pp. 115-123). Hillsdale, NJ: Erlbaum.
  • Cutler, A., & Mehler, J. (1993). The periodicity bias. Journal of Phonetics, 21, 101-108.
  • Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant stress patterns of English words. Child Development, 64, 675-687. Retrieved from http://www.jstor.org/stable/1131210.

    Abstract

    One critical aspect of language acquisition is the development of a lexicon that associates sounds and meanings; but developing a lexicon first requires that the infant segment utterances into individual words. How might the infant begin this process? The present study was designed to examine the potential role that sensitivity to predominant stress patterns of words might play in lexical development. In English, by far the majority of words have stressed (strong) initial syllables. Experiment 1 of our study demonstrated that by 9 months of age American infants listen significantly longer to words with strong/weak stress patterns than to words with weak/strong stress patterns. However, Experiment 2 showed that no significant preferences for the predominant stress pattern appear with 6-month-old infants, which suggests that the preference develops as a result of increasing familiarity with the prosodic features of the native language. In a third experiment, 9-month-olds showed a preference for strong/weak patterns even when the speech input was low-pass filtered, which suggests that their preference is specifically for the prosodic structure of the words. Together the results suggest that attention to predominant stress patterns in the native language may form an important part of the infant's process of developing a lexicon.
  • Nix, A. J., Mehta, G., Dye, J., & Cutler, A. (1993). Phoneme detection as a tool for comparing perception of natural and synthetic speech. Computer Speech and Language, 7, 211-228. doi:10.1006/csla.1993.1011.

    Abstract

    On simple intelligibility measures, high-quality synthesiser output now scores almost as well as natural speech. Nevertheless, it is widely agreed that perception of synthetic speech is a harder task for listeners than perception of natural speech; in particular, it has been hypothesized that listeners have difficulty identifying phonemes in synthetic speech. If so, a simple measure of the speed with which a phoneme can be identified should prove a useful tool for comparing perception of synthetic and natural speech. The phoneme detection task was here used in three experiments comparing perception of natural and synthetic speech. In the first, response times to synthetic and natural targets were not significantly different, but in the second and third experiments response times to synthetic targets were significantly slower than to natural targets. A speed-accuracy tradeoff in the third experiment suggests that an important factor in this task is the response criterion adopted by subjects. It is concluded that the phoneme detection task is a useful tool for investigating phonetic processing of synthetic speech input, but subjects must be encouraged to adopt a response criterion which emphasizes rapid responding. When this is the case, significantly longer response times for synthetic targets can indicate a processing disadvantage for synthetic speech at an early level of phonetic analysis.
  • Otake, T., Hatano, G., Cutler, A., & Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32, 258-278. doi:10.1006/jmla.1993.1014.

    Abstract

    Four experiments examined segmentation of spoken Japanese words by native and non-native listeners. Previous studies suggested that language rhythm determines the segmentation unit most natural to native listeners: French has syllabic rhythm, and French listeners use the syllable in segmentation, while English has stress rhythm, and segmentation by English listeners is based on stress. The rhythm of Japanese is based on a subsyllabic unit, the mora. In the present experiments Japanese listeners’ response patterns were consistent with moraic segmentation; acoustic artifacts could not have determined the results since nonnative (English and French) listeners showed different response patterns with the same materials. Predictions of a syllabic hypothesis were disconfirmed in the Japanese listeners’ results; in contrast, French listeners showed a pattern of responses consistent with the syllabic hypothesis. The results provide further evidence that listeners’ segmentation of spoken words relies on procedures determined by the characteristic phonology of their native language.
  • Cutler, A., Mehler, J., Norris, D., & Segui, J. (1989). Limits on bilingualism [Letters to Nature]. Nature, 340, 229-230. doi:10.1038/340229a0.

    Abstract

    Speech, in any language, is continuous; speakers provide few reliable cues to the boundaries of words, phrases, or other meaningful units. To understand speech, listeners must divide the continuous speech stream into portions that correspond to such units. This segmentation process is so basic to human language comprehension that psycholinguists long assumed that all speakers would do it in the same way. In previous research [1, 2], however, we reported that segmentation routines can be language-specific: speakers of French process spoken words syllable by syllable, but speakers of English do not. French has relatively clear syllable boundaries and syllable-based timing patterns, whereas English has relatively unclear syllable boundaries and stress-based timing; thus syllabic segmentation would work more efficiently in the comprehension of French than in the comprehension of English. Our present study suggests that at this level of language processing, there are limits to bilingualism: a bilingual speaker has one and only one basic language.
  • Cutler, A. (1988). The perfect speech error. In L. Hyman, & C. Li (Eds.), Language, speech and mind: Studies in honor of Victoria A. Fromkin (pp. 209-223). London: Croom Helm.
  • Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113-121. doi:10.1037/0096-1523.14.1.113.

    Abstract

    A model of speech segmentation in a stress language is proposed, according to which the occurrence of a strong syllable triggers segmentation of the speech signal, whereas occurrence of a weak syllable does not trigger segmentation. We report experiments in which listeners detected words embedded in nonsense bisyllables more slowly when the bisyllable had two strong syllables than when it had a strong and a weak syllable; mint was detected more slowly in mintayve than in mintesh. According to our proposed model, this result is an effect of segmentation: When the second syllable is strong, it is segmented from the first syllable, and successful detection of the embedded word therefore requires assembly of speech material across a segmentation position. Speech recognition models involving phonemic or syllabic recoding, or based on strictly left-to-right processes, do not predict this result. It is argued that segmentation at strong syllables in continuous speech recognition serves the purpose of detecting the most efficient locations at which to initiate lexical access. (C) 1988 by the American Psychological Association
  • Hawkins, J. A., & Cutler, A. (1988). Psycholinguistic factors in morphological asymmetry. In J. A. Hawkins (Ed.), Explaining language universals (pp. 280-317). Oxford: Blackwell.
  • Henderson, L., Coltheart, M., Cutler, A., & Vincent, N. (1988). Preface. Linguistics, 26(4), 519-520. doi:10.1515/ling.1988.26.4.519.
  • Mehta, G., & Cutler, A. (1988). Detection of target phonemes in spontaneous and read speech. Language and Speech, 31, 135-156.

    Abstract

    Although spontaneous speech occurs more frequently in most listeners’ experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalize to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than earlier targets, and targets preceded by long words were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognizing speech.
  • Norris, D., & Cutler, A. (1988). Speech recognition in French and English. MRC News, 39, 30-31.
  • Norris, D., & Cutler, A. (1988). The relative accessibility of phonemes and syllables. Perception and Psychophysics, 43, 541-550. Retrieved from http://www.psychonomic.org/search/view.cgi?id=8530.

    Abstract

    Previous research comparing detection times for syllables and for phonemes has consistently found that syllables are responded to faster than phonemes. This finding poses theoretical problems for strictly hierarchical models of speech recognition, in which smaller units should be able to be identified faster than larger units. However, inspection of the characteristics of previous experiments’ stimuli reveals that subjects have been able to respond to syllables on the basis of only a partial analysis of the stimulus. In the present experiment, five groups of subjects listened to identical stimulus material. Phoneme and syllable monitoring under standard conditions was compared with monitoring under conditions in which near matches of target and stimulus occurred on no-response trials. In the latter case, when subjects were forced to analyze each stimulus fully, phonemes were detected faster than syllables.
