Anne Cutler

Publications

Displaying 1 - 39 of 39
  • Burnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N. and 10 moreBurnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N., Kinoshita, Y., Kuratate, T., Lewis, T. W., Loakes, D. E., Onslow, M., Powers, D. M., Rose, P., Togneri, R., Tran, D., & Wagner, M. (2009). A blueprint for a comprehensive Australian English auditory-visual speech corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus (pp. 96-107). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic identification, talking heads, and hearing prostheses. Advances in these research areas in Australia require a large corpus of Australian English. Here the authors describe a blueprint for building the Big Australian Speech Corpus (the Big ASC), a corpus of over 1,100 speakers from urban and rural Australia, including speakers of non-indigenous, indigenous, ethnocultural, and disordered forms of Australian English, each of whom would be sampled on three occasions in a range of speech tasks designed by the researchers who would be using the corpus.
  • Cutler, A. (2009). Greater sensitivity to prosodic goodness in non-native than in native listeners. Journal of the Acoustical Society of America, 125, 3522-3525. doi:10.1121/1.3117434.

    Abstract

    English listeners largely disregard suprasegmental cues to stress in recognizing words. Evidence for this includes the demonstration of Fear et al. [J. Acoust. Soc. Am. 97, 1893–1904 (1995)] that cross-splicings are tolerated between stressed and unstressed full vowels (e.g., au- of autumn, automata). Dutch listeners, however, do exploit suprasegmental stress cues in recognizing native-language words. In this study, Dutch listeners were presented with English materials from the study of Fear et al. Acceptability ratings by these listeners revealed sensitivity to suprasegmental mismatch, in particular, in replacements of unstressed full vowels by higher-stressed vowels, thus evincing greater sensitivity to prosodic goodness than had been shown by the original native listener group.
  • Cutler, A., Davis, C., & Kim, J. (2009). Non-automaticity of use of orthographic knowledge in phoneme evaluation. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 380-383). Causal Productions Pty Ltd.

    Abstract

    Two phoneme goodness rating experiments addressed the role of orthographic knowledge in the evaluation of speech sounds. Ratings for the best tokens of /s/ were higher in words spelled with S (e.g., bless) than in words where /s/ was spelled with C (e.g., voice). This difference did not appear for analogous nonwords for which every lexical neighbour had either S or C spelling (pless, floice). Models of phonemic processing incorporating obligatory influence of lexical information in phonemic processing cannot explain this dissociation; the data are consistent with models in which phonemic decisions are not subject to necessary top-down lexical influence.
  • Cutler, A. (2009). Psycholinguistics in our time. In P. Rabbitt (Ed.), Inside psychology: A science over 50 years (pp. 91-101). Oxford: Oxford University Press.
  • Cutler, A., Otake, T., & McQueen, J. M. (2009). Vowel devoicing and the perception of spoken Japanese words. Journal of the Acoustical Society of America, 125(3), 1693-1703. doi:10.1121/1.3075556.

    Abstract

    Three experiments, in which Japanese listeners detected Japanese words embedded in nonsense sequences, examined the perceptual consequences of vowel devoicing in that language. Since vowelless sequences disrupt speech segmentation [Norris et al. (1997). Cognit. Psychol. 34, 191– 243], devoicing is potentially problematic for perception. Words in initial position in nonsense sequences were detected more easily when followed by a sequence containing a vowel than by a vowelless segment (with or without further context), and vowelless segments that were potential devoicing environments were no easier than those not allowing devoicing. Thus asa, “morning,” was easier in asau or asazu than in all of asap, asapdo, asaf, or asafte, despite the fact that the /f/ in the latter two is a possible realization of fu, with devoiced [u]. Japanese listeners thus do not treat devoicing contexts as if they always contain vowels. Words in final position in nonsense sequences, however, produced a different pattern: here, preceding vowelless contexts allowing devoicing impeded word detection less strongly (so, sake was detected less accurately, but not less rapidly, in nyaksake—possibly arising from nyakusake—than in nyagusake). This is consistent with listeners treating consonant sequences as potential realizations of parts of existing lexical candidates wherever possible.
  • Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591 -612. doi:10.1080/15250000903263957.

    Abstract

    Recognizing word boundaries in continuous speech requires detailed knowledge of the native language. In the first year of life, infants acquire considerable word segmentation abilities. Infants at this early stage in word segmentation rely to a large extent on the metrical pattern of their native language, at least in stress-based languages. In Dutch and English (both languages with a preferred trochaic stress pattern), segmentation of strong-weak words develops rapidly between 7 and 10 months of age. Nevertheless, trochaic languages contain not only strong-weak words but also words with a weak-strong stress pattern. In this article, we present electrophysiological evidence of the beginnings of weak-strong word segmentation in Dutch 10-month-olds. At this age, the ability to combine different cues for efficient word segmentation does not yet seem to be completely developed. We provide evidence that Dutch infants still largely rely on strong syllables, even for the segmentation of weak-strong words.
  • Tyler, M., & Cutler, A. (2009). Cross-language differences in cue use for speech segmentation. Journal of the Acoustical Society of America, 126, 367-376. doi:10.1121/1.3129127.

    Abstract

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners’ use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable “words.” These words were demarcated by(a) no cue other than transitional probabilities induced by their recurrence, (b) a consistent left-edge cue, or (c) a consistent right-edge cue. Experiment 1 examined a vowel lengthening cue. All three listener groups benefited from this cue in right-edge position; none benefited from it in left-edge position. Experiment 2 examined a pitch-movement cue. English listeners used this cue in left-edge position, French listeners used it in right-edge position, and Dutch listeners used it in both positions. These findings are interpreted as evidence of both language-universal and language-specific effects. Final lengthening is a language-universal effect expressing a more general (non-linguistic) mechanism. Pitch movement expresses prominence which has characteristically different placements across languages: typically at right edges in French, but at left edges in English and Dutch. Finally, stress realization in English versus Dutch encourages greater attention to suprasegmental variation by Dutch than by English listeners, allowing Dutch listeners to benefit from an informative pitch-movement cue even in an uncharacteristic position.
  • Allerhand, M., Butterfield, S., Cutler, A., & Patterson, R. (1992). Assessing syllable strength via an auditory model. In Proceedings of the Institute of Acoustics: Vol. 14 Part 6 (pp. 297-304). St. Albans, Herts: Institute of Acoustics.
  • Cutler, A. (1992). Cross-linguistic differences in speech segmentation. MRC News, 56, 8-9.
  • Cutler, A., & Norris, D. (1992). Detection of vowels and consonants with minimal acoustic variation. Speech Communication, 11, 101-108. doi:10.1016/0167-6393(92)90004-Q.

    Abstract

    Previous research has shown that, in a phoneme detection task, vowels produce longer reaction times than consonants, suggesting that they are harder to perceive. One possible explanation for this difference is based upon their respective acoustic/articulatory characteristics. Another way of accounting for the findings would be to relate them to the differential functioning of vowels and consonants in the syllabic structure of words. In this experiment, we examined the second possibility. Targets were two pairs of phonemes, each containing a vowel and a consonant with similar phonetic characteristics. Subjects heard lists of English words had to press a response key upon detecting the occurrence of a pre-specified target. This time, the phonemes which functioned as vowels in syllabic structure yielded shorter reaction times than those which functioned as consonants. This rules out an explanation for response time difference between vowels and consonants in terms of function in syllable structure. Instead, we propose that consonantal and vocalic segments differ with respect to variability of tokens, both in the acoustic realisation of targets and in the representation of targets by listeners.
  • Cutler, A., Kearns, R., Norris, D., & Scott, D. (1992). Listeners’ responses to extraneous signals coincident with English and French speech. In J. Pittam (Ed.), Proceedings of the 4th Australian International Conference on Speech Science and Technology (pp. 666-671). Canberra: Australian Speech Science and Technology Association.

    Abstract

    English and French listeners performed two tasks - click location and speeded click detection - with both English and French sentences, closely matched for syntactic and phonological structure. Clicks were located more accurately in open- than in closed-class words in both English and French; they were detected more rapidly in open- than in closed-class words in English, but not in French. The two listener groups produced the same pattern of responses, suggesting that higher-level linguistic processing was not involved in these tasks.
  • Cutler, A. (1992). Proceedings with confidence. New Scientist, (1825), 54.
  • Cutler, A. (1992). Processing constraints of the native phonological repertoire on the native language. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (Eds.), Speech perception, production and linguistic structure (pp. 275-278). Tokyo: Ohmsha.
  • Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31, 218-236. doi:10.1016/0749-596X(92)90012-M.

    Abstract

    Segmentation of continuous speech into its component words is a nontrivial task for listeners. Previous work has suggested that listeners develop heuristic segmentation procedures based on experience with the structure of their language; for English, the heuristic is that strong syllables (containing full vowels) are most likely to be the initial syllables of lexical words, whereas weak syllables (containing central, or reduced, vowels) are nonword-initial, or, if word-initial, are grammatical words. This hypothesis is here tested against natural and laboratory-induced missegmentations of continuous speech. Precisely the expected pattern is found: listeners erroneously insert boundaries before strong syllables but delete them before weak syllables; boundaries inserted before strong syllables produce lexical words, while boundaries inserted before weak syllables produce grammatical words.
  • Cutler, A., & Robinson, T. (1992). Response time as a metric for comparison of speech recognition by humans and machines. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 189-192). Alberta: University of Alberta.

    Abstract

    The performance of automatic speech recognition systems is usually assessed in terms of error rate. Human speech recognition produces few errors, but relative difficulty of processing can be assessed via response time techniques. We report the construction of a measure analogous to response time in a machine recognition system. This measure may be compared directly with human response times. We conducted a trial comparison of this type at the phoneme level, including both tense and lax vowels and a variety of consonant classes. The results suggested similarities between human and machine processing in the case of consonants, but differences in the case of vowels.
  • Cutler, A. (1992). Psychology and the segment. In G. Docherty, & D. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 290-295). Cambridge: Cambridge University Press.
  • Cutler, A., Mehler, J., Norris, D., & Segui, J. (1992). The monolingual nature of speech segmentation by bilinguals. Cognitive Psychology, 24, 381-410.

    Abstract

    Monolingual French speakers employ a syllable-based procedure in speech segmentation; monolingual English speakers use a stress-based segmentation procedure and do not use the syllable-based procedure. In the present study French-English bilinguals participated in segmentation experiments with English and French materials. Their results as a group did not simply mimic the performance of English monolinguals with English language materials and of French monolinguals with French language materials. Instead, the bilinguals formed two groups, defined by forced choice of a dominant language. Only the French-dominant group showed syllabic segmentation and only with French language materials. The English-dominant group showed no syllabic segmentation in either language. However, the English-dominant group showed stress-based segmentation with English language materials; the French-dominant group did not. We argue that rhythmically based segmentation procedures are mutually exclusive, as a consequence of which speech segmentation by bilinguals is, in one respect at least, functionally monolingual.
  • Cutler, A. (1992). The production and perception of word boundaries. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (Eds.), Speech perception, production and linguistic structure (pp. 419-425). Tokyo: Ohsma.
  • Cutler, A. (1992). The perception of speech: Psycholinguistic aspects. In W. Bright (Ed.), International encyclopedia of language: Vol. 3 (pp. 181-183). New York: Oxford University Press.
  • Cutler, A. (1992). Why not abolish psycholinguistics? In W. Dressler, H. Luschützky, O. Pfeiffer, & J. Rennison (Eds.), Phonologica 1988 (pp. 77-87). Cambridge: Cambridge University Press.
  • McQueen, J. M., & Cutler, A. (1992). Words within words: Lexical statistics and lexical access. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 221-224). Alberta: University of Alberta.

    Abstract

    This paper presents lexical statistics on the pattern of occurrence of words embedded in other words. We report the results of an analysis of 25000 words, varying in length from two to six syllables, extracted from a phonetically-coded English dictionary (The Longman Dictionary of Contemporary English). Each syllable, and each string of syllables within each word was checked against the dictionary. Two analyses are presented: the first used a complete list of polysyllables, with look-up on the entire dictionary; the second used a sublist of content words, counting only embedded words which were themselves content words. The results have important implications for models of human speech recognition. The efficiency of these models depends, in different ways, on the number and location of words within words.
  • Norris, D., Van Ooijen, B., & Cutler, A. (1992). Speeded detection of vowels and steady-state consonants. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing; Vol. 2 (pp. 1055-1058). Alberta: University of Alberta.

    Abstract

    We report two experiments in which vowels and steady-state consonants served as targets in a speeded detection task. In the first experiment, two vowels were compared with one voiced and once unvoiced fricative. Response times (RTs) to the vowels were longer than to the fricatives. The error rate was higher for the consonants. Consonants in word-final position produced the shortest RTs, For the vowels, RT correlated negatively with target duration. In the second experiment, the same two vowel targets were compared with two nasals. This time there was no significant difference in RTs, but the error rate was still significantly higher for the consonants. Error rate and length correlated negatively for the vowels only. We conclude that RT differences between phonemes are independent of vocalic or consonantal status. Instead, we argue that the process of phoneme detection reflects more finely grained differences in acoustic/articulatory structure within the phonemic repertoire.
  • Cutler, A., & Fear, B. D. (1991). Categoricality in acceptability judgements for strong versus weak vowels. In J. Llisterri (Ed.), Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles (pp. 18.1-18.5). Barcelona, Catalonia: Universitat Autonoma de Barcelona.

    Abstract

    A distinction between strong and weak vowels can be drawn on the basis of vowel quality, of stress, or of both factors. An experiment was conducted in which sets of contextually matched word-intial vowels ranging from clearly strong to clearly weak were cross-spliced, and the naturalness of the resulting words was rated by listeners. The ratings showed that in general cross-spliced words were only significantly less acceptable than unspliced words when schwa was not involved; this supports a categorical distinction based on vowel quality.
  • Cutler, A. (1991). Linguistic rhythm and speech segmentation. In J. Sundberg, L. Nord, & R. Carlson (Eds.), Music, language, speech and brain (pp. 157-166). London: Macmillan.
  • Cutler, A. (1991). Proceed with caution. New Scientist, (1799), 53-54.
  • Cutler, A. (1991). Prosody in situations of communication: Salience and segmentation. In Proceedings of the Twelfth International Congress of Phonetic Sciences: Vol. 1 (pp. 264-270). Aix-en-Provence: Université de Provence, Service des publications.

    Abstract

    Speakers and listeners have a shared goal: to communicate. The processes of speech perception and of speech production interact in many ways under the constraints of this communicative goal; such interaction is as characteristic of prosodic processing as of the processing of other aspects of linguistic structure. Two of the major uses of prosodic information in situations of communication are to encode salience and segmentation, and these themes unite the contributions to the symposium introduced by the present review.
  • Cutler, A., & Butterfield, S. (1991). Word boundary cues in clear speech: A supplementary report. Speech Communication, 10, 335-353. doi:10.1016/0167-6393(91)90002-B.

    Abstract

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In four experiments, we examined how word boundaries are produced in deliberately clear speech. In an earlier report we showed that speakers do indeed mark word boundaries in clear speech, by pausing at the boundary and lengthening pre-boundary syllables; moreover, these effects are applied particularly to boundaries preceding weak syllables. In English, listeners use segmentation procedures which make word boundaries before strong syllables easier to perceive; thus marking word boundaries before weak syllables in clear speech will make clear precisely those boundaries which are otherwise hard to perceive. The present report presents supplementary data, namely prosodic analyses of the syllable following a critical word boundary. More lengthening and greater increases in intensity were applied in clear speech to weak syllables than to strong. Mean F0 was also increased to a greater extent on weak syllables than on strong. Pitch movement, however, increased to a greater extent on strong syllables than on weak. The effects were, however, very small in comparison to the durational effects we observed earlier for syllables preceding the boundary and for pauses at the boundary.
  • Van Ooijen, B., Cutler, A., & Norris, D. (1991). Detection times for vowels versus consonants. In Eurospeech 91: Vol. 3 (pp. 1451-1454). Genova: Istituto Internazionale delle Comunicazioni.

    Abstract

    This paper reports two experiments with vowels and consonants as phoneme detection targets in real words. In the first experiment, two relatively distinct vowels were compared with two confusible stop consonants. Response times to the vowels were longer than to the consonants. Response times correlated negatively with target phoneme length. In the second, two relatively distinct vowels were compared with their corresponding semivowels. This time, the vowels were detected faster than the semivowels. We conclude that response time differences between vowels and stop consonants in this task may reflect differences between phoneme categories in the variability of tokens, both in the acoustic realisation of targets and in the' representation of targets by subjects.
  • Cutler, A. (1981). Degrees of transparency in word formation. Canadian Journal of Linguistics, 26, 73-77.
  • Cutler, A. (1981). Making up materials is a confounded nuisance, or: Will we able to run any psycholinguistic experiments at all in 1990? Cognition, 10, 65-70. doi:10.1016/0010-0277(81)90026-3.
  • Cutler, A., & Darwin, C. J. (1981). Phoneme-monitoring reaction time and preceding prosody: Effects of stop closure duration and of fundamental frequency. Perception and Psychophysics, 29, 217-224. Retrieved from http://www.psychonomic.org/search/view.cgi?id=12660.

    Abstract

    In an earlier study, it was shown that listeners can use prosodic cues that predict where sentence stress will fall; phoneme-monitoring RTs are faster when the preceding prosody indicates that the word bearing the target will be stressed. Two experiments which further investigate this effect are described. In the first, it is shown that the duration of the closure preceding the release of the target stop consonant burst does not affect the RT advantage for stressed words. In the second, it is shown that fundamental frequency variation is not a necessary component of the prosodic variation that produces the predicted-stress effect. It is argued that sentence processing involves a very flexible use of prosodic information.
  • Cutler, A. (1981). The cognitive reality of suprasegmental phonology. In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 399-400). Amsterdam: North-Holland.
  • Cutler, A. (1981). The reliability of speech error data. Linguistics, 19, 561-582.
  • Fodor, J. A., & Cutler, A. (1981). Semantic focus and sentence comprehension. Cognition, 7, 49-59. doi:10.1016/0010-0277(79)90010-6.

    Abstract

    Reaction time to detect a phoneme target in a sentence was found to be faster when the word in which the target occurred formed part of the semantic focus of the sentence. Focus was determined by asking a question before the sentence; that part of the sentence which comprised the answer to the sentence was assumed to be focussed. This procedure made it possible to vary position offocus within the sentence while holding all acoustic aspects of the sentence itself constant. It is argued that sentence understanding is facilitated by rapid identification of focussed information. Since focussed words are usually accented, it is further argued that the active search for accented words demonstrated in previous research should be interpreted as a search for semantic focus.
  • Garnham, A., Shillcock, R. C., Brown, G. D. A., Mill, A. I. D., & Cutler, A. (1981). Slips of the tongue in the London-Lund corpus of spontaneous conversation. Linguistics, 19, 805-817.
  • Cutler, A. (1979). Beyond parsing and lexical look-up. In R. J. Wales, & E. C. T. Walker (Eds.), New approaches to language mechanisms: a collection of psycholinguistic studies (pp. 133-149). Amsterdam: North-Holland.
  • Cutler, A. (1979). Contemporary reaction to Rudolf Meringer’s speech error research. Historiograpia Linguistica, 6, 57-76.
  • Cutler, A., & Norris, D. (1979). Monitoring sentence comprehension. In W. E. Cooper, & E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett (pp. 113-134). Hillsdale: Erlbaum.
  • Swinney, D. A., & Cutler, A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning an Verbal Behavior, 18, 523-534. doi:10.1016/S0022-5371(79)90284-6.

    Abstract

    Two experiments examined the nature of access, storage, and comprehension of idiomatic phrases. In both studies a Phrase Classification Task was utilized. In this, reaction times to determine whether or not word strings constituted acceptable English phrases were measured. Classification times were significantly faster to idiom than to matched control phrases. This effect held under conditions involving different categories of idioms, different transitional probabilities among words in the phrases, and different levels of awareness of the presence of idioms in the materials. The data support a Lexical Representation Hypothesis for the processing of idioms.

Share this page