Anne Cutler

Publications

Displaying 1 - 35 of 35
  • Burnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N. and 10 moreBurnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N., Kinoshita, Y., Kuratate, T., Lewis, T. W., Loakes, D. E., Onslow, M., Powers, D. M., Rose, P., Togneri, R., Tran, D., & Wagner, M. (2009). A blueprint for a comprehensive Australian English auditory-visual speech corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus (pp. 96-107). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic identification, talking heads, and hearing prostheses. Advances in these research areas in Australia require a large corpus of Australian English. Here the authors describe a blueprint for building the Big Australian Speech Corpus (the Big ASC), a corpus of over 1,100 speakers from urban and rural Australia, including speakers of non-indigenous, indigenous, ethnocultural, and disordered forms of Australian English, each of whom would be sampled on three occasions in a range of speech tasks designed by the researchers who would be using the corpus.
  • Cutler, A. (2009). Greater sensitivity to prosodic goodness in non-native than in native listeners. Journal of the Acoustical Society of America, 125, 3522-3525. doi:10.1121/1.3117434.

    Abstract

    English listeners largely disregard suprasegmental cues to stress in recognizing words. Evidence for this includes the demonstration of Fear et al. [J. Acoust. Soc. Am. 97, 1893–1904 (1995)] that cross-splicings are tolerated between stressed and unstressed full vowels (e.g., au- of autumn, automata). Dutch listeners, however, do exploit suprasegmental stress cues in recognizing native-language words. In this study, Dutch listeners were presented with English materials from the study of Fear et al. Acceptability ratings by these listeners revealed sensitivity to suprasegmental mismatch, in particular, in replacements of unstressed full vowels by higher-stressed vowels, thus evincing greater sensitivity to prosodic goodness than had been shown by the original native listener group.
  • Cutler, A., Davis, C., & Kim, J. (2009). Non-automaticity of use of orthographic knowledge in phoneme evaluation. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 380-383). Causal Productions Pty Ltd.

    Abstract

    Two phoneme goodness rating experiments addressed the role of orthographic knowledge in the evaluation of speech sounds. Ratings for the best tokens of /s/ were higher in words spelled with S (e.g., bless) than in words where /s/ was spelled with C (e.g., voice). This difference did not appear for analogous nonwords for which every lexical neighbour had either S or C spelling (pless, floice). Models of phonemic processing incorporating obligatory influence of lexical information in phonemic processing cannot explain this dissociation; the data are consistent with models in which phonemic decisions are not subject to necessary top-down lexical influence.
  • Cutler, A. (2009). Psycholinguistics in our time. In P. Rabbitt (Ed.), Inside psychology: A science over 50 years (pp. 91-101). Oxford: Oxford University Press.
  • Cutler, A., Otake, T., & McQueen, J. M. (2009). Vowel devoicing and the perception of spoken Japanese words. Journal of the Acoustical Society of America, 125(3), 1693-1703. doi:10.1121/1.3075556.

    Abstract

    Three experiments, in which Japanese listeners detected Japanese words embedded in nonsense sequences, examined the perceptual consequences of vowel devoicing in that language. Since vowelless sequences disrupt speech segmentation [Norris et al. (1997). Cognit. Psychol. 34, 191– 243], devoicing is potentially problematic for perception. Words in initial position in nonsense sequences were detected more easily when followed by a sequence containing a vowel than by a vowelless segment (with or without further context), and vowelless segments that were potential devoicing environments were no easier than those not allowing devoicing. Thus asa, “morning,” was easier in asau or asazu than in all of asap, asapdo, asaf, or asafte, despite the fact that the /f/ in the latter two is a possible realization of fu, with devoiced [u]. Japanese listeners thus do not treat devoicing contexts as if they always contain vowels. Words in final position in nonsense sequences, however, produced a different pattern: here, preceding vowelless contexts allowing devoicing impeded word detection less strongly (so, sake was detected less accurately, but not less rapidly, in nyaksake—possibly arising from nyakusake—than in nyagusake). This is consistent with listeners treating consonant sequences as potential realizations of parts of existing lexical candidates wherever possible.
  • Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591 -612. doi:10.1080/15250000903263957.

    Abstract

    Recognizing word boundaries in continuous speech requires detailed knowledge of the native language. In the first year of life, infants acquire considerable word segmentation abilities. Infants at this early stage in word segmentation rely to a large extent on the metrical pattern of their native language, at least in stress-based languages. In Dutch and English (both languages with a preferred trochaic stress pattern), segmentation of strong-weak words develops rapidly between 7 and 10 months of age. Nevertheless, trochaic languages contain not only strong-weak words but also words with a weak-strong stress pattern. In this article, we present electrophysiological evidence of the beginnings of weak-strong word segmentation in Dutch 10-month-olds. At this age, the ability to combine different cues for efficient word segmentation does not yet seem to be completely developed. We provide evidence that Dutch infants still largely rely on strong syllables, even for the segmentation of weak-strong words.
  • Tyler, M., & Cutler, A. (2009). Cross-language differences in cue use for speech segmentation. Journal of the Acoustical Society of America, 126, 367-376. doi:10.1121/1.3129127.

    Abstract

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners’ use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable “words.” These words were demarcated by(a) no cue other than transitional probabilities induced by their recurrence, (b) a consistent left-edge cue, or (c) a consistent right-edge cue. Experiment 1 examined a vowel lengthening cue. All three listener groups benefited from this cue in right-edge position; none benefited from it in left-edge position. Experiment 2 examined a pitch-movement cue. English listeners used this cue in left-edge position, French listeners used it in right-edge position, and Dutch listeners used it in both positions. These findings are interpreted as evidence of both language-universal and language-specific effects. Final lengthening is a language-universal effect expressing a more general (non-linguistic) mechanism. Pitch movement expresses prominence which has characteristically different placements across languages: typically at right edges in French, but at left edges in English and Dutch. Finally, stress realization in English versus Dutch encourages greater attention to suprasegmental variation by Dutch than by English listeners, allowing Dutch listeners to benefit from an informative pitch-movement cue even in an uncharacteristic position.
  • Cutler, A. (1994). How human speech recognition is affected by phonological diversity among languages. In R. Togneri (Ed.), Proceedings of the fifth Australian International Conference on Speech Science and Technology: Vol. 1 (pp. 285-288). Canberra: Australian Speech Science and Technology Association.

    Abstract

    Listeners process spoken language in ways which are adapted to the phonological structure of their native language. As a consequence, non-native speakers do not listen to a language in the same way as native speakers; moreover, listeners may use their native language listening procedures inappropriately with foreign input. With sufficient experience, however, it may be possible to inhibit this latter (counter-productive) behavior.
  • Cutler, A., & Young, D. (1994). Rhythmic structure of word blends in English. In Proceedings of the Third International Conference on Spoken Language Processing (pp. 1407-1410). Kobe: Acoustical Society of Japan.

    Abstract

    Word blends combine fragments from two words, either in speech errors or when a new word is created. Previous work has demonstrated that in Japanese, such blends preserve moraic structure; in English they do not. A similar effect of moraic structure is observed in perceptual research on segmentation of continuous speech in Japanese; English listeners, by contrast, exploit stress units in segmentation, suggesting that a general rhythmic constraint may underlie both findings. The present study examined whether mis parallel would also hold for word blends. In spontaneous English polysyllabic blends, the source words were significantly more likely to be split before a strong than before a weak (unstressed) syllable, i.e. to be split at a stress unit boundary. In an experiment in which listeners were asked to identify the source words of blends, significantly more correct detections resulted when splits had been made before strong syllables. Word blending, like speech segmentation, appears to be constrained by language rhythm.
  • Cutler, A., Norris, D., & McQueen, J. M. (1994). Modelling lexical access from continuous speech input. Dokkyo International Review, 7, 193-215.

    Abstract

    The recognition of speech involves the segmentation of continuous utterances into their component words. Cross-linguistic evidence is briefly reviewed which suggests that although there are language-specific solutions to this segmentation problem, they have one thing in common: they are all based on language rhythm. In English, segmentation is stress-based: strong syllables are postulated to be the onsets of words. Segmentation, however, can also be achieved by a process of competition between activated lexical hypotheses, as in the Shortlist model. A series of experiments is summarised showing that segmentation of continuous speech depends on both lexical competition and a metrically-guided procedure. In the final section, the implementation of metrical segmentation in the Shortlist model is described: the activation of lexical hypotheses matching strong syllables in the input is boosted and that of hypotheses mismatching strong syllables in the input is penalised.
  • Cutler, A., & Otake, T. (1994). Mora or phoneme? Further evidence for language-specific listening. Journal of Memory and Language, 33, 824-844. doi:10.1006/jmla.1994.1039.

    Abstract

    Japanese listeners detect speech sound targets which correspond precisely to a mora (a phonological unit which is the unit of rhythm in Japanese) more easily than targets which do not. English listeners detect medial vowel targets more slowly than consonants. Six phoneme detection experiments investigated these effects in both subject populations, presented with native- and foreign-language input. Japanese listeners produced faster and more accurate responses to moraic than to nonmoraic targets both in Japanese and, where possible, in English; English listeners responded differently. The detection disadvantage for medial vowels appeared with English listeners both in English and in Japanese; again, Japanese listeners responded differently. Some processing operations which listeners apply to speech input are language-specific; these language-specific procedures, appropriate for listening to input in the native language, may be applied to foreign-language input irrespective of whether they remain appropriate.
  • Cutler, A. (1994). The perception of rhythm in language. Cognition, 50, 79-81. doi:10.1016/0010-0277(94)90021-3.
  • Cutler, A., McQueen, J. M., Baayen, R. H., & Drexler, H. (1994). Words within words in a real-speech corpus. In R. Togneri (Ed.), Proceedings of the 5th Australian International Conference on Speech Science and Technology: Vol. 1 (pp. 362-367). Canberra: Australian Speech Science and Technology Association.

    Abstract

    In a 50,000-word corpus of spoken British English the occurrence of words embedded within other words is reported. Within-word embedding in this real speech sample is common, and analogous to the extent of embedding observed in the vocabulary. Imposition of a syllable boundary matching constraint reduces but by no means eliminates spurious embedding. Embedded words are most likely to overlap with the beginning of matrix words, and thus may pose serious problems for speech recognisers.
  • McQueen, J. M., Norris, D., & Cutler, A. (1994). Competition in spoken word recognition: Spotting words in other words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 621-638.

    Abstract

    Although word boundaries are rarely clearly marked, listeners can rapidly recognize the individual words of spoken sentences. Some theories explain this in terms of competition between multiply activated lexical hypotheses; others invoke sensitivity to prosodic structure. We describe a connectionist model, SHORTLIST, in which recognition by activation and competition is successful with a realistically sized lexicon. Three experiments are then reported in which listeners detected real words embedded in nonsense strings, some of which were themselves the onsets of longer words. Effects both of competition between words and of prosodic structure were observed, suggesting that activation and competition alone are not sufficient to explain word recognition in continuous speech. However, the results can be accounted for by a version of SHORTLIST that is sensitive to prosodic structure.
  • Norris, D., McQueen, J. M., & Cutler, A. (1994). Competition and segmentation in spoken word recognition. In Proceedings of the Third International Conference on Spoken Language Processing: Vol. 1 (pp. 401-404). Yokohama: PACIFICO.

    Abstract

    This paper describes recent experimental evidence which shows that models of spoken word recognition must incorporate both inhibition between competing lexical candidates and a sensitivity to metrical cues to lexical segmentation. A new version of the Shortlist [1][2] model incorporating the Metrical Segmentation Strategy [3] provides a detailed simulation of the data.
  • Cutler, A. (1989). Auditory lexical access: Where do we start? In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 342-356). Cambridge, MA: MIT Press.

    Abstract

    The lexicon, considered as a component of the process of recognizing speech, is a device that accepts a sound image as input and outputs meaning. Lexical access is the process of formulating an appropriate input and mapping it onto an entry in the lexicon's store of sound images matched with their meanings. This chapter addresses the problems of auditory lexical access from continuous speech. The central argument to be proposed is that utterance prosody plays a crucial role in the access process. Continuous listening faces problems that are not present in visual recognition (reading) or in noncontinuous recognition (understanding isolated words). Aspects of utterance prosody offer a solution to these particular problems.
  • Cutler, A., & Butterfield, S. (1989). Natural speech cues to word segmentation under difficult listening conditions. In J. Tubach, & J. Mariani (Eds.), Proceedings of Eurospeech 89: European Conference on Speech Communication and Technology: Vol. 2 (pp. 372-375). Edinburgh: CEP Consultants.

    Abstract

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs. Application of heuristic segmentation strategies makes word boundaries before strong syllables easiest for listeners to perceive; but under difficult listening conditions speakers pay more attention to marking word boundaries before weak syllables, i.e. they mark those boundaries which are otherwise particularly hard to perceive.
  • Cutler, A., Howard, D., & Patterson, K. E. (1989). Misplaced stress on prosody: A reply to Black and Byng. Cognitive Neuropsychology, 6, 67-83.

    Abstract

    The recent claim by Black and Byng (1986) that lexical access in reading is subject to prosodic constraints is examined and found to be unsupported. The evidence from impaired reading which Black and Byng report is based on poorly controlled stimulus materials and is inadequately analysed and reported. An alternative explanation of their findings is proposed, and new data are reported for which this alternative explanation can account but their model cannot. Finally, their proposal is shown to be theoretically unmotivated and in conflict with evidence from normal reading.
  • Cutler, A. (1989). Straw modules [Commentary/Massaro: Speech perception]. Behavioral and Brain Sciences, 12, 760-762.
  • Cutler, A. (1989). The new Victorians. New Scientist, (1663), 66.
  • Patterson, R. D., & Cutler, A. (1989). Auditory preprocessing and recognition of speech. In A. Baddeley, & N. Bernsen (Eds.), Research directions in cognitive science: A european perspective: Vol. 1. Cognitive psychology (pp. 23-60). London: Erlbaum.
  • Smith, M. R., Cutler, A., Butterfield, S., & Nimmo-Smith, I. (1989). The perception of rhythm and word boundaries in noise-masked speech. Journal of Speech and Hearing Research, 32, 912-920.

    Abstract

    The present experiment tested the suggestion that human listeners may exploit durational information in speech to parse continuous utterances into words. Listeners were presented with six-syllable unpredictable utterances under noise-masking, and were required to judge between alternative word strings as to which best matched the rhythm of the masked utterances. For each utterance there were four alternative strings: (a) an exact rhythmic and word boundary match, (b) a rhythmic mismatch, and (c) two utterances with the same rhythm as the masked utterance, but different word boundary locations. Listeners were clearly able to perceive the rhythm of the masked utterances: The rhythmic mismatch was chosen significantly less often than any other alternative. Within the three rhythmically matched alternatives, the exact match was chosen significantly more often than either word boundary mismatch. Thus, listeners both perceived speech rhythm and used durational cues effectively to locate the position of word boundaries.
  • Cutler, A., Mehler, J., Norris, D., & Segui, J. (1983). A language-specific comprehension strategy [Letters to Nature]. Nature, 304, 159-160. doi:10.1038/304159a0.

    Abstract

    Infants acquire whatever language is spoken in the environment into which they are born. The mental capability of the newborn child is not biased in any way towards the acquisition of one human language rather than another. Because psychologists who attempt to model the process of language comprehension are interested in the structure of the human mind, rather than in the properties of individual languages, strategies which they incorporate in their models are presumed to be universal, not language-specific. In other words, strategies of comprehension are presumed to be characteristic of the human language processing system, rather than, say, the French, English, or Igbo language processing systems. We report here, however, on a comprehension strategy which appears to be used by native speakers of French but not by native speakers of English.
  • Cutler, A. (1983). Lexical complexity and sentence processing. In G. B. Flores d'Arcais, & R. J. Jarvella (Eds.), The process of language understanding (pp. 43-79). Chichester, Sussex: Wiley.
  • Cutler, A. (1983). Semantics, syntax and sentence accent. In M. Van den Broecke, & A. Cohen (Eds.), Proceedings of the Tenth International Congress of Phonetic Sciences (pp. 85-91). Dordrecht: Foris.
  • Cutler, A., & Ladd, D. R. (Eds.). (1983). Prosody: Models and measurements. Heidelberg: Springer.
  • Cutler, A. (1983). Speakers’ conceptions of the functions of prosody. In A. Cutler, & D. R. Ladd (Eds.), Prosody: Models and measurements (pp. 79-91). Heidelberg: Springer.
  • Ladd, D. R., & Cutler, A. (1983). Models and measurements in the study of prosody. In A. Cutler, & D. R. Ladd (Eds.), Prosody: Models and measurements (pp. 1-10). Heidelberg: Springer.
  • Levelt, W. J. M., & Cutler, A. (1983). Prosodic marking in speech repair. Journal of semantics, 2, 205-217. doi:10.1093/semant/2.2.205.

    Abstract

    Spontaneous self-corrections in speech pose a communication problem; the speaker must make clear to the listener not only that the original Utterance was faulty, but where it was faulty and how the fault is to be corrected. Prosodic marking of corrections - making the prosody of the repair noticeably different from that of the original utterance - offers a resource which the speaker can exploit to provide the listener with such information. A corpus of more than 400 spontaneous speech repairs was analysed, and the prosodic characteristics compared with the syntactic and semantic characteristics of each repair. Prosodic marking showed no relationship at all with the syntactic characteristics of repairs. Instead, marking was associated with certain semantic factors: repairs were marked when the original utterance had been actually erroneous, rather than simply less appropriate than the repair; and repairs tended to be marked more often when the set of items encompassing the error and the repair was small rather than when it was large. These findings lend further weight to the characterization of accent as essentially semantic in function.
  • Cutler, A. (1980). Errors of stress and intonation. In V. A. Fromkin (Ed.), Errors in linguistic performance: Slips of the tongue, ear, pen and hand (pp. 67-80). New York: Academic Press.
  • Cutler, A. (1980). Productivity in word formation. In J. Kreiman, & A. E. Ojeda (Eds.), Papers from the Sixteenth Regional Meeting, Chicago Linguistic Society (pp. 45-51). Chicago, Ill.: CLS.
  • Cutler, A. (1980). La leçon des lapsus. La Recherche, 11(112), 686-692.
  • Cutler, A. (1980). Syllable omission errors and isochrony. In H. W. Dechet, & M. Raupach (Eds.), Temporal variables in speech: studies in honour of Frieda Goldman-Eisler (pp. 183-190). The Hague: Mouton.
  • Cutler, A., & Isard, S. D. (1980). The production of prosody. In B. Butterworth (Ed.), Language production (pp. 245-269). London: Academic Press.
  • Swinney, D. A., Zurif, E. B., & Cutler, A. (1980). Effects of sentential stress and word class upon comprehension in Broca’s aphasics. Brain and Language, 10, 132-144. doi:10.1016/0093-934X(80)90044-9.

    Abstract

    The roles which word class (open/closed) and sentential stress play in the sentence comprehension processes of both agrammatic (Broca's) aphasics and normal listeners were examined with a word monitoring task. Overall, normal listeners responded more quickly to stressed than to unstressed items, but showed no effect of word class. Aphasics also responded more quickly to stressed than to unstressed materials, but, unlike the normals, responded faster to open than to closed class words regardless of their stress. The results are interpreted as support for the theory that Broca's aphasics lack the functional underlying open/closed class word distinction used in word recognition by normal listeners.

Share this page