Anne Cutler

Publications

Displaying 1 - 42 of 42
  • Bruggeman, L., & Cutler, A. (2016). Lexical manipulation as a discovery tool for psycholinguistic research. In C. Carignan, & M. D. Tyler (Eds.), Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016) (pp. 313-316).
  • Cutler, A., & Norris, D. (2016). Bottoms up! How top-down pitfalls ensnare speech perception researchers too. Commentary on C. Firestone & B. Scholl: Cognition does not affect perception: Evaluating the evidence for 'top-down' effects. Behavioral and Brain Sciences, e236. doi:10.1017/S0140525X15002745.

    Abstract

    Not only can the pitfalls that Firestone & Scholl (F&S) identify be generalised across multiple studies within the field of visual perception, but also they have general application outside the field wherever perceptual and cognitive processing are compared. We call attention to the widespread susceptibility of research on the perception of speech to versions of the same pitfalls.
  • Ip, M., & Cutler, A. (2016). Cross-language data on five types of prosodic focus. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 330-334).

    Abstract

    To examine the relative roles of language-specific and language-universal mechanisms in the production of prosodic focus, we compared production of five different types of focus by native speakers of English and Mandarin. Two comparable dialogues were constructed for each language, with the same words appearing in focused and unfocused position; 24 speakers recorded each dialogue in each language. Duration, F0 (mean, maximum, range), and rms-intensity (mean, maximum) of all critical word tokens were measured. Across the different types of focus, cross-language differences were observed in the degree to which English versus Mandarin speakers use the different prosodic parameters to mark focus, suggesting that while prosody may be universally available for expressing focus, the means of its employment may be considerably language-specific
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702-705).

    Abstract

    In Korean, focus is expressed in accentual phrasing. To ascertain whether words focused in this manner enjoy a processing advantage analogous to that conferred by focus as expressed in, e.g, English and Dutch, we devised sentences with target words in one of four conditions: prosodic focus, syntactic focus, prosodic + syntactic focus, and no focus as a control. 32 native speakers of Korean listened to blocks of 10 sentences, then were presented visually with words and asked whether or not they had heard them. Overall, words with focus were recognised significantly faster and more accurately than unfocused words. In addition, words with syntactic focus or syntactic + prosodic focus were recognised faster than words with prosodic focus alone. As for other languages, Korean focus confers processing advantage on the words carrying it. While prosodic focus does provide an advantage, however, syntactic focus appears to provide the greater beneficial effect for recognition memory
  • Norris, D., McQueen, J. M., & Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience, 31(1), 4-18. doi:10.1080/23273798.2015.1081703.

    Abstract

    Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
  • El Aissati, A., McQueen, J. M., & Cutler, A. (2012). Finding words in a language that allows words without vowels. Cognition, 124, 79-84. doi:10.1016/j.cognition.2012.03.006.

    Abstract

    Across many languages from unrelated families, spoken-word recognition is subject to a constraint whereby potential word candidates must contain a vowel. This constraint minimizes competition from embedded words (e.g., in English, disfavoring win in twin because t cannot be a word). However, the constraint would be counter-productive in certain languages that allow stand-alone vowelless open-class words. One such language is Berber (where t is indeed a word). Berber listeners here detected words affixed to nonsense contexts with or without vowels. Length effects seen in other languages replicated in Berber, but in contrast to prior findings, word detection was not hindered by vowelless contexts. When words can be vowelless, otherwise universal constraints disfavoring vowelless words do not feature in spoken-word recognition.

    Additional information

    mmc1.pdf
  • Cutler, A., & Davis, C. (2012). An orthographic effect in phoneme processing, and its limitations. Frontiers in Psychology, 3, 18. doi:10.3389/fpsyg.2012.00018.

    Abstract

    To examine whether lexically stored knowledge about spelling influences phoneme evaluation, we conducted three experiments with a low-level phonetic judgement task: phoneme goodness rating. In each experiment, listeners heard phonetic tokens varying along a continuum centred on /s/, occurring finally in isolated word or nonword tokens. An effect of spelling appeared in Experiment 1: Native English speakers’ goodness ratings for the best /s/ tokens were significantly higher in words spelled with S (e.g., bless) than in words spelled with C (e.g., voice). No such difference appeared when nonnative speakers rated the same materials in Experiment 2, indicating that the difference could not be due to acoustic characteristics of the S- versus C-words. In Experiment 3, nonwords with lexical neighbours consistently spelled with S (e.g., pless) versus with C (e.g., floice) failed to elicit orthographic neighbourhood effects; no significant difference appeared in native English speakers’ ratings for the S-consistent versus the C-consistent sets. Obligatory influence of lexical knowledge on phonemic processing would have predicted such neighbourhood effects; the findings are thus better accommodated by models in which phonemic decisions draw strategically upon lexical information.
  • Cutler, A. (2012). Eentaalpsychologie is geen taalpsychologie: Part II. [Valedictory lecture Radboud University]. Nijmegen: Radboud University.

    Abstract

    Rede uitgesproken bij het afscheid als hoogleraar Vergelijkende taalpsychologie aan de Faculteit der Sociale Wetenschappen van de Radboud Universiteit Nijmegen op donderdag 20 september 2012
  • Cutler, A. (2012). Native listening: Language experience and the recognition of spoken words. Cambridge, MA: MIT Press.

    Abstract

    Understanding speech in our native tongue seems natural and effortless; listening to speech in a nonnative language is a different experience. In this book, Anne Cutler argues that listening to speech is a process of native listening because so much of it is exquisitely tailored to the requirements of the native language. Her cross-linguistic study (drawing on experimental work in languages that range from English and Dutch to Chinese and Japanese) documents what is universal and what is language specific in the way we listen to spoken language. Cutler describes the formidable range of mental tasks we carry out, all at once, with astonishing speed and accuracy, when we listen. These include evaluating probabilities arising from the structure of the native vocabulary, tracking information to locate the boundaries between words, paying attention to the way the words are pronounced, and assessing not only the sounds of speech but prosodic information that spans sequences of sounds. She describes infant speech perception, the consequences of language-specific specialization for listening to other languages, the flexibility and adaptability of listening (to our native languages), and how language-specificity and universality fit together in our language processing system. Drawing on her four decades of work as a psycholinguist, Cutler documents the recent growth in our knowledge about how spoken-word recognition works and the role of language structure in this process. Her book is a significant contribution to a vibrant and rapidly developing field.
  • Cutler, A. (2012). Native listening: The flexibility dimension. Dutch Journal of Applied Linguistics, 1(2), 169-187.

    Abstract

    The way we listen to spoken language is tailored to the specific benefit of native-language speech input. Listening to speech in non-native languages can be significantly hindered by this native bias. Is it possible to determine the degree to which a listener is listening in a native-like manner? Promising indications of how this question may be tackled are provided by new research findings concerning the great flexibility that characterises listening to the L1, in online adjustment of phonetic category boundaries for adaptation across talkers, and in modulation of lexical dynamics for adjustment across listening conditions. This flexibility pays off in many dimensions, including listening in noise, adaptation across dialects, and identification of voices. These findings further illuminate the robustness and flexibility of native listening, and potentially point to ways in which we might begin to assess degrees of ‘native-likeness’ in this skill.
  • Cutler, A., Otake, T., & Bruggeman, L. (2012). Phonologically determined asymmetries in vocabulary structure across languages. Journal of the Acoustical Society of America, 132(2), EL155-EL160. doi:10.1121/1.4737596.

    Abstract

    Studies of spoken-word recognition have revealed that competition from embedded words differs in strength as a function of where in the carrier word the embedded word is found and have further shown embedding patterns to be skewed such that embeddings in initial position in carriers outnumber embeddings in final position. Lexico-statistical analyses show that this skew is highly attenuated in Japanese, a noninflectional language. Comparison of the extent of the asymmetry in the three Germanic languages English, Dutch, and German allows the source to be traced to a combination of suffixal morphology and vowel reduction in unstressed syllables.
  • Junge, C., Cutler, A., & Hagoort, P. (2012). Electrophysiological evidence of early word learning. Neuropsychologia, 50, 3702-3712. doi:10.1016/j.neuropsychologia.2012.10.012.

    Abstract

    Around their first birthday infants begin to talk, yet they comprehend words long before. This study investigated the event-related potentials (ERP) responses of nine-month-olds on basic level picture-word pairings. After a familiarization phase of six picture-word pairings per semantic category, comprehension for novel exemplars was tested in a picture-word matching paradigm. ERPs time-locked to pictures elicited a modulation of the Negative Central (Nc) component, associated with visual attention and recognition. It was attenuated by category repetition as well as by the type-token ratio of picture context. ERPs time-locked to words in the training phase became more negative with repetition (N300-600), but there was no influence of picture type-token ratio, suggesting that infants have identified the concept of each picture before a word was presented. Results from the test phase provided clear support that infants integrated word meanings with (novel) picture context. Here, infants showed different ERP responses for words that did or did not align with the picture context: a phonological mismatch (N200) and a semantic mismatch (N400). Together, results were informative of visual categorization, word recognition and word-to-world-mappings, all three crucial processes for vocabulary construction.
  • Junge, C., Kooijman, V., Hagoort, P., & Cutler, A. (2012). Rapid recognition at 10 months as a predictor of language development. Developmental Science, 15, 463-473. doi:10.1111/j.1467-7687.2012.1144.x.

    Abstract

    Infants’ ability to recognize words in continuous speech is vital for building a vocabulary.We here examined the amount and type of exposure needed for 10-month-olds to recognize words. Infants first heard a word, either embedded within an utterance or in isolation, then recognition was assessed by comparing event-related potentials to this word versus a word that they had not heard directly before. Although all 10-month-olds showed recognition responses to words first heard in isolation, not all infants showed such responses to words they had first heard within an utterance. Those that did succeed in the latter, harder, task, however, understood more words and utterances when re-tested at 12 months, and understood more words and produced more words at 24 months, compared with those who had shown no such recognition response at 10 months. The ability to rapidly recognize the words in continuous utterances is clearly linked to future language development.
  • McQueen, J. M., Tyler, M., & Cutler, A. (2012). Lexical retuning of children’s speech perception: Evidence for knowledge about words’ component sounds. Language Learning and Development, 8, 317-339. doi:10.1080/15475441.2011.641887.

    Abstract

    Children hear new words from many different talkers; to learn words most efficiently, they should be able to represent them independently of talker-specific pronunciation detail. However, do children know what the component sounds of words should be, and can they use that knowledge to deal with different talkers' phonetic realizations? Experiment 1 replicated prior studies on lexically guided retuning of speech perception in adults, with a picture-verification methodology suitable for children. One participant group heard an ambiguous fricative ([s/f]) replacing /f/ (e.g., in words like giraffe); another group heard [s/f] replacing /s/ (e.g., in platypus). The first group subsequently identified more tokens on a Simpie-[s/f]impie-Fimpie toy-name continuum as Fimpie. Experiments 2 and 3 found equivalent lexically guided retuning effects in 12- and 6-year-olds. Children aged 6 have all that is needed for adjusting to talker variation in speech: detailed and abstract phonological representations and the ability to apply them during spoken-word recognition.

    Files private

    Request files
  • Tuinman, A., Mitterer, H., & Cutler, A. (2012). Resolving ambiguity in familiar and unfamiliar casual speech. Journal of Memory and Language, 66, 530-544. doi:10.1016/j.jml.2012.02.001.

    Abstract

    In British English, the phrase Canada aided can sound like Canada raided if the speaker links the two vowels at the word boundary with an intrusive /r/. There are subtle phonetic differences between an onset /r/ and an intrusive /r/, however. With cross-modal priming and eye-tracking, we examine how native British English listeners and non-native (Dutch) listeners deal with the lexical ambiguity arising from this language-specific connected speech process. Together the results indicate that the presence of /r/ initially activates competing words for both listener groups; however, the native listeners rapidly exploit the phonetic cues and achieve correct lexical selection. In contrast, these advanced L2 listeners to English failed to recover from the /r/-induced competition, and failed to match native performance in either task. The /r/-intrusion process, which adds a phoneme to speech input, thus causes greater difficulty for L2 listeners than connectedspeech processes which alter or delete phonemes.
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to neighboring segments and are less well identified. Indeed, many vowels, especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Clifton, Jr., C., Cutler, A., McQueen, J. M., & Van Ooijen, B. (1999). The processing of inflected forms. [Commentary on H. Clahsen: Lexical entries and rules of language.]. Behavioral and Brain Sciences, 22, 1018-1019.

    Abstract

    Clashen proposes two distinct processing routes, for regularly and irregularly inflected forms, respectively, and thus is apparently making a psychological claim. We argue his position, which embodies a strictly linguistic perspective, does not constitute a psychological processing model.
  • Cutler, A., & Clifton, Jr., C. (1999). Comprehending spoken language: A blueprint of the listener. In C. M. Brown, & P. Hagoort (Eds.), The neurocognition of language (pp. 123-166). Oxford University Press.
  • Cutler, A. (1999). Foreword. In Slips of the Ear: Errors in the perception of Casual Conversation (pp. xiii-xv). New York City, NY, USA: Academic Press.
  • Cutler, A., & Otake, T. (1999). Pitch accent in spoken-word recognition in Japanese. Journal of the Acoustical Society of America, 105, 1877-1888.

    Abstract

    Three experiments addressed the question of whether pitch-accent information may be exploited in the process of recognizing spoken words in Tokyo Japanese. In a two-choice classification task, listeners judged from which of two words, differing in accentual structure, isolated syllables had been extracted ~e.g., ka from baka HL or gaka LH!; most judgments were correct, and listeners’ decisions were correlated with the fundamental frequency characteristics of the syllables. In a gating experiment, listeners heard initial fragments of words and guessed what the words were; their guesses overwhelmingly had the same initial accent structure as the gated word even when only the beginning CV of the stimulus ~e.g., na- from nagasa HLL or nagashi LHH! was presented. In addition, listeners were more confident in guesses with the same initial accent structure as the stimulus than in guesses with different accent. In a lexical decision experiment, responses to spoken words ~e.g., ame HL! were speeded by previous presentation of the same word ~e.g., ame HL! but not by previous presentation of a word differing only in accent ~e.g., ame LH!. Together these findings provide strong evidence that accentual information constrains the activation and selection of candidates for spoken-word recognition.
  • Cutler, A. (1999). Prosodische Struktur und Worterkennung bei gesprochener Sprache. In A. D. Friedrici (Ed.), Enzyklopädie der Psychologie: Sprachrezeption (pp. 49-83). Göttingen: Hogrefe.
  • Cutler, A. (1999). Prosody and intonation, processing issues. In R. A. Wilson, & F. C. Keil (Eds.), MIT encyclopedia of the cognitive sciences (pp. 682-683). Cambridge, MA: MIT Press.
  • Cutler, A., & Norris, D. (1999). Sharpening Ockham’s razor (Commentary on W.J.M. Levelt, A. Roelofs & A.S. Meyer: A theory of lexical access in speech production). Behavioral and Brain Sciences, 22, 40-41.

    Abstract

    Language production and comprehension are intimately interrelated; and models of production and comprehension should, we argue, be constrained by common architectural guidelines. Levelt et al.'s target article adopts as guiding principle Ockham's razor: the best model of production is the simplest one. We recommend adoption of the same principle in comprehension, with consequent simplification of some well-known types of models.
  • Cutler, A. (1999). Spoken-word recognition. In R. A. Wilson, & F. C. Keil (Eds.), MIT encyclopedia of the cognitive sciences (pp. 796-798). Cambridge, MA: MIT Press.
  • Cutler, A., Van Ooijen, B., & Norris, D. (1999). Vowels, consonants, and lexical activation. In J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. Bailey (Eds.), Proceedings of the Fourteenth International Congress of Phonetic Sciences: Vol. 3 (pp. 2053-2056). Berkeley: University of California.

    Abstract

    Two lexical decision studies examined the effects of single-phoneme mismatches on lexical activation in spoken-word recognition. One study was carried out in English, and involved spoken primes and visually presented lexical decision targets. The other study was carried out in Dutch, and primes and targets were both presented auditorily. Facilitation was found only for spoken targets preceded immediately by spoken primes; no facilitation occurred when targets were presented visually, or when intervening input occurred between prime and target. The effects of vowel mismatches and consonant mismatches were equivalent.
  • McQueen, J. M., Norris, D., & Cutler, A. (1999). Lexical influence in phonetic decision-making: Evidence from subcategorical mismatches. Journal of Experimental Psychology: Human Perception and Performance, 25, 1363-1389. doi:10.1037/0096-1523.25.5.1363.

    Abstract

    In 5 experiments, listeners heard words and nonwords, some cross-spliced so that they contained acoustic-phonetic mismatches. Performance was worse on mismatching than on matching items. Words cross-spliced with words and words cross-spliced with nonwords produced parallel results. However, in lexical decision and 1 of 3 phonetic decision experiments, performance on nonwords cross-spliced with words was poorer than on nonwords cross-spliced with nonwords. A gating study confirmed that there were misleading coarticulatory cues in the cross-spliced items; a sixth experiment showed that the earlier results were not due to interitem differences in the strength of these cues. Three models of phonetic decision making (the Race model, the TRACE model, and a postlexical model) did not explain the data. A new bottom-up model is outlined that accounts for the findings in terms of lexical involvement at a dedicated decision-making stage.
  • Otake, T., & Cutler, A. (1999). Perception of suprasegmental structure in a nonnative dialect. Journal of Phonetics, 27, 229-253. doi:10.1006/jpho.1999.0095.

    Abstract

    Two experiments examined the processing of Tokyo Japanese pitchaccent distinctions by native speakers of Japanese from two accentlessvariety areas. In both experiments, listeners were presented with Tokyo Japanese speech materials used in an earlier study with Tokyo Japanese listeners, who clearly exploited the pitch-accent information in spokenword recognition. In the "rst experiment, listeners judged from which of two words, di!ering in accentual structure, isolated syllables had been extracted. Both new groups were, overall, as successful at this task as Tokyo Japanese speakers had been, but their response patterns differed from those of the Tokyo Japanese, for instance in that a bias towards H judgments in the Tokyo Japanese responses was weakened in the present groups' responses. In a second experiment, listeners heard word fragments and guessed what the words were; in this task, the speakers from accentless areas again performed significantly above chance, but their responses showed less sensitivity to the information in the input, and greater bias towards vocabulary distribution frequencies, than had been observed with the Tokyo Japanese listeners. The results suggest that experience with a local accentless dialect affects the processing of accent for word recognition in Tokyo Japanese, even for listeners with extensive exposure to Tokyo Japanese.
  • Shattuck-Hufnagel, S., & Cutler, A. (1999). The prosody of speech error corrections revisited. In J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. Bailey (Eds.), Proceedings of the Fourteenth International Congress of Phonetic Sciences: Vol. 2 (pp. 1483-1486). Berkely: University of California.

    Abstract

    A corpus of digitized speech errors is used to compare the prosody of correction patterns for word-level vs. sound-level errors. Results for both peak F0 and perceived prosodic markedness confirm that speakers are more likely to mark corrections of word-level errors than corrections of sound-level errors, and that errors ambiguous between word-level and soundlevel (such as boat for moat) show correction patterns like those for sound level errors. This finding increases the plausibility of the claim that word-sound-ambiguous errors arise at the same level of processing as sound errors that do not form words.
  • Van Donselaar, W., Kuijpers, C. T., & Cutler, A. (1999). Facilitatory effects of vowel epenthesis on word processing in Dutch. Journal of Memory and Language, 41, 59-77. doi:10.1006/jmla.1999.2635.

    Abstract

    We report a series of experiments examining the effects on word processing of insertion of an optional epenthetic vowel in word-final consonant clusters in Dutch. Such epenthesis turns film, for instance, into film. In a word-reversal task listeners treated words with and without epenthesis alike, as monosyllables, suggesting that the variant forms both activate the same canonical representation, that of a monosyllabic word without epenthesis. In both lexical decision and word spotting, response times to recognize words were significantly faster when epenthesis was present than when the word was presented in its canonical form without epenthesis. It is argued that addition of the epenthetic vowel makes the liquid consonants constituting the first member of a cluster more perceptible; a final phoneme-detection experiment confirmed that this was the case. These findings show that a transformed variant of a word, although it contacts the lexicon via the representation of the canonical form, can be more easily perceptible than that canonical form.
  • Chen, H.-C., & Cutler, A. (1997). Auditory priming in spoken and printed word recognition. In H.-C. Chen (Ed.), Cognitive processing of Chinese and related Asian languages (pp. 77-81). Hong Kong: Chinese University Press.
  • Cutler, A., & Otake, T. (1997). Contrastive studies of spoken-language processing. Journal of Phonetic Society of Japan, 1, 4-13.
  • Cutler, A., & Chen, H.-C. (1997). Lexical tone in Cantonese spoken-word processing. Perception and Psychophysics, 59, 165-179. Retrieved from http://www.psychonomic.org/search/view.cgi?id=778.

    Abstract

    In three experiments, the processing of lexical tone in Cantonese was examined. Cantonese listeners more often accepted a nonword as a word when the only difference between the nonword and the word was in tone, especially when the F0 onset difference between correct and erroneous tone was small. Same–different judgments by these listeners were also slower and less accurate when the only difference between two syllables was in tone, and this was true whether the F0 onset difference between the two tones was large or small. Listeners with no knowledge of Cantonese produced essentially the same same-different judgment pattern as that produced by the native listeners, suggesting that the results display the effects of simple perceptual processing rather than of linguistic knowledge. It is argued that the processing of lexical tone distinctions may be slowed, relative to the processing of segmental distinctions, and that, in speeded-response tasks, tone is thus more likely to be misprocessed than is segmental structure.
  • Cutler, A. (1997). Prosody and the structure of the message. In Y. Sagisaka, N. Campbell, & N. Higuchi (Eds.), Computing prosody: Computational models for processing spontaneous speech (pp. 63-66). Heidelberg: Springer.
  • Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141-201.

    Abstract

    Research on the exploitation of prosodic information in the recognition of spoken language is reviewed. The research falls into three main areas: the use of prosody in the recognition of spoken words, in which most attention has been paid to the question of whether the prosodic structure of a word plays a role in initial contact with stored lexical representations; the use of prosody in the computation of syntactic structure, in which the resolution of global and local ambiguities has formed the central focus; and the role of prosody in the processing of discourse structure, in which there has been a preponderance of work on the contribution of accentuation and deaccentuation to integration of concepts with an existing discourse model. The review reveals that in each area progress has been made towards new conceptions of prosody's role in processing, and in particular this has involved abandonment of previously held deterministic views of the relationship between prosodic structure and other aspects of linguistic structure
  • Cutler, A. (1997). The comparative perspective on spoken-language processing. Speech Communication, 21, 3-15. doi:10.1016/S0167-6393(96)00075-1.

    Abstract

    Psycholinguists strive to construct a model of human language processing in general. But this does not imply that they should confine their research to universal aspects of linguistic structure, and avoid research on language-specific phenomena. First, even universal characteristics of language structure can only be accurately observed cross-linguistically. This point is illustrated here by research on the role of the syllable in spoken-word recognition, on the perceptual processing of vowels versus consonants, and on the contribution of phonetic assimilation phonemena to phoneme identification. In each case, it is only by looking at the pattern of effects across languages that it is possible to understand the general principle. Second, language-specific processing can certainly shed light on the universal model of language comprehension. This second point is illustrated by studies of the exploitation of vowel harmony in the lexical segmentation of Finnish, of the recognition of Dutch words with and without vowel epenthesis, and of the contribution of different kinds of lexical prosodic structure (tone, pitch accent, stress) to the initial activation of candidate words in lexical access. In each case, aspects of the universal processing model are revealed by analysis of these language-specific effects. In short, the study of spoken-language processing by human listeners requires cross-linguistic comparison.
  • Cutler, A. (1997). The syllable’s role in the segmentation of stress languages. Language and Cognitive Processes, 12, 839-845. doi:10.1080/016909697386718.
  • Koster, M., & Cutler, A. (1997). Segmental and suprasegmental contributions to spoken-word recognition in Dutch. In Proceedings of EUROSPEECH 97 (pp. 2167-2170). Grenoble, France: ESCA.

    Abstract

    Words can be distinguished by segmental differences or by suprasegmental differences or both. Studies from English suggest that suprasegmentals play little role in human spoken-word recognition; English stress, however, is nearly always unambiguously coded in segmental structure (vowel quality); this relationship is less close in Dutch. The present study directly compared the effects of segmental and suprasegmental mispronunciation on word recognition in Dutch. There was a strong effect of suprasegmental mispronunciation, suggesting that Dutch listeners do exploit suprasegmental information in word recognition. Previous findings indicating the effects of mis-stressing for Dutch differ with stress position were replicated only when segmental change was involved, suggesting that this is an effect of segmental rather than suprasegmental processing.
  • McQueen, J. M., & Cutler, A. (1997). Cognitive processes in speech perception. In W. J. Hardcastle, & J. D. Laver (Eds.), The handbook of phonetic sciences (pp. 556-585). Oxford: Blackwell.
  • Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34, 191-243. doi:10.1006/cogp.1997.0671.

    Abstract

    We propose that word recognition in continuous speech is subject to constraints on what may constitute a viable word of the language. This Possible-Word Constraint (PWC) reduces activation of candidate words if their recognition would imply word status for adjacent input which could not be a word - for instance, a single consonant. In two word-spotting experiments, listeners found it much harder to detectapple,for example, infapple(where [f] alone would be an impossible word), than invuffapple(wherevuffcould be a word of English). We demonstrate that the PWC can readily be implemented in a competition-based model of continuous speech recognition, as a constraint on the process of competition between candidate words; where a stretch of speech between a candidate word and a (known or likely) word boundary is not a possible word, activation of the candidate word is reduced. This implementation accurately simulates both the present results and data from a range of earlier studies of speech segmentation.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language, 36, 422-444. doi:10.1006/jmla.1996.2495.

    Abstract

    Finnish vowel harmony rules require that if the vowel in the first syllable of a word belongs to one of two vowel sets, then all subsequent vowels in that word must belong either to the same set or to a neutral set. A harmony mismatch between two syllables containing vowels from the opposing sets thus signals a likely word boundary. We report five experiments showing that Finnish listeners can exploit this information in an on-line speech segmentation task. Listeners found it easier to detect words likehymyat the end of the nonsense stringpuhymy(where there is a harmony mismatch between the first two syllables) than in the stringpyhymy(where there is no mismatch). There was no such effect, however, when the target words appeared at the beginning of the nonsense string (e.g.,hymypuvshymypy). Stronger harmony effects were found for targets containing front harmony vowels (e.g.,hymy) than for targets containing back harmony vowels (e.g.,paloinkypaloandkupalo). The same pattern of results appeared whether target position within the string was predictable or unpredictable. Harmony mismatch thus appears to provide a useful segmentation cue for the detection of word onsets in Finnish speech.

Share this page