Anne Cutler

Publications

Displaying 1 - 16 of 16
  • Cutler, A., & Norris, D. (2016). Bottoms up! How top-down pitfalls ensnare speech perception researchers too. Commentary on C. Firestone & B. Scholl: Cognition does not affect perception: Evaluating the evidence for 'top-down' effects. Behavioral and Brain Sciences, e236. doi:10.1017/S0140525X15002745.

    Abstract

    Not only can the pitfalls that Firestone & Scholl (F&S) identify be generalised across multiple studies within the field of visual perception, but also they have general application outside the field wherever perceptual and cognitive processing are compared. We call attention to the widespread susceptibility of research on the perception of speech to versions of the same pitfalls.
  • Norris, D., McQueen, J. M., & Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience, 31(1), 4-18. doi:10.1080/23273798.2015.1081703.

    Abstract

    Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
  • Broersma, M., & Cutler, A. (2008). Phantom word activation in L2. System, 36(1), 22-34. doi:10.1016/j.system.2007.11.003.

    Abstract

    L2 listening can involve the phantom activation of words which are not actually in the input. All spoken-word recognition involves multiple concurrent activation of word candidates, with selection of the correct words achieved by a process of competition between them. L2 listening involves more such activation than L1 listening, and we report two studies illustrating this. First, in a lexical decision study, L2 listeners accepted (but L1 listeners did not accept) spoken non-words such as groof or flide as real English words. Second, a priming study demonstrated that the same spoken non-words made recognition of the real words groove, flight easier for L2 (but not L1) listeners, suggesting that, for the L2 listeners only, these real words had been activated by the spoken non-word input. We propose that further understanding of the activation and competition process in L2 lexical processing could lead to new understanding of L2 listening difficulty.
  • Cutler, A., Garcia Lecumberri, M. L., & Cooke, M. (2008). Consonant identification in noise by native and non-native listeners: Effects of local context. Journal of the Acoustical Society of America, 124(2), 1264-1268. doi:10.1121/1.2946707.

    Abstract

    Speech recognition in noise is harder in second (L2) than first languages (L1). This could be because noise disrupts speech processing more in L2 than L1, or because L1 listeners recover better though disruption is equivalent. Two similar prior studies produced discrepant results: Equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the results discrepancy.
  • Cutler, A. (2008). The abstract representations in speech processing. Quarterly Journal of Experimental Psychology, 61(11), 1601-1619. doi:10.1080/13803390802218542.

    Abstract

    Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
  • Goudbeek, M., Cutler, A., & Smits, R. (2008). Supervised and unsupervised learning of multidimensionally varying nonnative speech categories. Speech Communication, 50(2), 109-125. doi:10.1016/j.specom.2007.07.003.

    Abstract

    The acquisition of novel phonetic categories is hypothesized to be affected by the distributional properties of the input, the relation of the new categories to the native phonology, and the availability of supervision (feedback). These factors were examined in four experiments in which listeners were presented with novel categories based on vowels of Dutch. Distribution was varied such that the categorization depended on the single dimension duration, the single dimension frequency, or both dimensions at once. Listeners were clearly sensitive to the distributional information, but unidimensional contrasts proved easier to learn than multidimensional. The native phonology was varied by comparing Spanish versus American English listeners. Spanish listeners found categorization by frequency easier than categorization by duration, but this was not true of American listeners, whose native vowel system makes more use of duration-based distinctions. Finally, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.
  • Kim, J., Davis, C., & Cutler, A. (2008). Perceptual tests of rhythmic similarity: II. Syllable rhythm. Language and Speech, 51(4), 343-359. doi:10.1177/0023830908099069.

    Abstract

    To segment continuous speech into its component words, listeners make use of language rhythm; because rhythm differs across languages, so do the segmentation procedures which listeners use. For each of stress-, syllable-and mora-based rhythmic structure, perceptual experiments have led to the discovery of corresponding segmentation procedures. In the case of mora-based rhythm, similar segmentation has been demonstrated in the otherwise unrelated languages Japanese and Telugu; segmentation based on syllable rhythm, however, has been previously demonstrated only for European languages from the Romance family. We here report two target detection experiments in which Korean listeners, presented with speech in Korean and in French, displayed patterns of segmentation like those previously observed in analogous experiments with French listeners. The Korean listeners' accuracy in detecting word-initial target fragments in either language was significantly higher when the fragments corresponded exactly to a syllable in the input than when the fragments were smaller or larger than a syllable. We conclude that Korean and French listeners can call on similar procedures for segmenting speech, and we further propose that perceptual tests of speech segmentation provide a valuable accompaniment to acoustic analyses for establishing languages' rhythmic class membership.
  • Akker, E., & Cutler, A. (2003). Prosodic cues to semantic structure in native and nonnative listening. Bilingualism: Language and Cognition, 6(2), 81-96. doi:10.1017/S1366728903001056.

    Abstract

    Listeners efficiently exploit sentence prosody to direct attention to words bearing sentence accent. This effect has been explained as a search for focus, furthering rapid apprehension of semantic structure. A first experiment supported this explanation: English listeners detected phoneme targets in sentences more rapidly when the target-bearing words were in accented position or in focussed position, but the two effects interacted, consistent with the claim that the effects serve a common cause. In a second experiment a similar asymmetry was observed with Dutch listeners and Dutch sentences. In a third and a fourth experiment, proficient Dutch users of English heard English sentences; here, however, the two effects did not interact. The results suggest that less efficient mapping of prosody to semantics may be one way in which nonnative listening fails to equal native listening.
  • Johnson, E. K., Jusczyk, P. W., Cutler, A., & Norris, D. (2003). Lexical viability constraints on speech segmentation by infants. Cognitive Psychology, 46(1), 65-97. doi:10.1016/S0010-0285(02)00507-8.

    Abstract

    The Possible Word Constraint limits the number of lexical candidates considered in speech recognition by stipulating that input should be parsed into a string of lexically viable chunks. For instance, an isolated single consonant is not a feasible word candidate. Any segmentation containing such a chunk is disfavored. Five experiments using the head-turn preference procedure investigated whether, like adults, 12-month-olds observe this constraint in word recognition. In Experiments 1 and 2, infants were familiarized with target words (e.g., rush), then tested on lists of nonsense items containing these words in “possible” (e.g., “niprush” [nip + rush]) or “impossible” positions (e.g., “prush” [p + rush]). The infants listened significantly longer to targets in “possible” versus “impossible” contexts when targets occurred at the end of nonsense items (rush in “prush”), but not when they occurred at the beginning (tan in “tance”). In Experiments 3 and 4, 12-month-olds were similarly familiarized with target words, but test items were real words in sentential contexts (win in “wind” versus “window”). The infants listened significantly longer to words in the “possible” condition regardless of target location. Experiment 5 with targets at the beginning of isolated real words (e.g., win in “wind”) replicated Experiment 2 in showing no evidence of viability effects in beginning position. Taken together, the findings suggest that, in situations in which 12-month-olds are required to rely on their word segmentation abilities, they give evidence of observing lexical viability constraints in the way that they parse fluent speech.
  • McQueen, J. M., Cutler, A., & Norris, D. (2003). Flow of information in the spoken word recognition system. Speech Communication, 41(1), 257-270. doi:10.1016/S0167-6393(02)00108-5.

    Abstract

    Spoken word recognition consists of two major component processes. First, at the prelexical stage, an abstract description of the utterance is generated from the information in the speech signal. Second, at the lexical stage, this description is used to activate all the words stored in the mental lexicon which match the input. These multiple candidate words then compete with each other. We review evidence which suggests that positive (match) and negative (mismatch) information of both a segmental and a suprasegmental nature is used to constrain this activation and competition process. We then ask whether, in addition to the necessary influence of the prelexical stage on the lexical stage, there is also feedback from the lexicon to the prelexical level. In two phonetic categorization experiments, Dutch listeners were asked to label both syllable-initial and syllable-final ambiguous fricatives (e.g., sounds ranging from [f] to [s]) in the word–nonword series maf–mas, and the nonword–word series jaf–jas. They tended to label the sounds in a lexically consistent manner (i.e., consistent with the word endpoints of the series). These lexical effects became smaller in listeners’ slower responses, even when the listeners were put under pressure to respond as fast as possible. Our results challenge models of spoken word recognition in which feedback modulates the prelexical analysis of the component sounds of a word whenever that word is heard
  • Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive Psychology, 47(2), 204-238. doi:10.1016/S0010-0285(03)00006-9.

    Abstract

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listeners heard ambiguous [f]-final words (e.g., [WI tlo?], from witlof, chicory) and unambiguous [s]-final words (e.g., naaldbos, pine forest). Another group heard the reverse (e.g., ambiguous [na:ldbo?], unambiguous witlof). Listeners who had heard [?] in [f]-final words were subsequently more likely to categorize ambiguous sounds on an [f]–[s] continuum as [f] than those who heard [?] in [s]-final words. Control conditions ruled out alternative explanations based on selective adaptation and contrast. Lexical information can thus be used to train categorization of speech. This use of lexical information differs from the on-line lexical feedback embodied in interactive models of speech perception. In contrast to on-line feedback, lexical feedback for learning is of benefit to spoken word recognition (e.g., in adapting to a newly encountered dialect).
  • Smits, R., Warner, N., McQueen, J. M., & Cutler, A. (2003). Unfolding of phonetic information over time: A database of Dutch diphone perception. Journal of the Acoustical Society of America, 113(1), 563-574. doi:10.1121/1.1525287.

    Abstract

    We present the results of a large-scale study on speech perception, assessing the number and type of perceptual hypotheses which listeners entertain about possible phoneme sequences in their language. Dutch listeners were asked to identify gated fragments of all 1179 diphones of Dutch, providing a total of 488 520 phoneme categorizations. The results manifest orderly uptake of acoustic information in the signal. Differences across phonemes in the rate at which fully correct recognition was achieved arose as a result of whether or not potential confusions could occur with other phonemes of the language ~long with short vowels, affricates with their initial components, etc.!. These data can be used to improve models of how acoustic phonetic information is mapped onto the mental lexicon during speech comprehension.
  • Spinelli, E., McQueen, J. M., & Cutler, A. (2003). Processing resyllabified words in French. Journal of Memory and Language, 48(2), 233-254. doi:10.1016/S0749-596X(02)00513-2.
  • Weber, A., & Cutler, A. (2003). Perceptual similarity co-existing with lexical dissimilarity [Abstract]. Abstracts of the 146th Meeting of the Acoustical Society of America. Journal of the Acoustical Society of America, 114(4 Pt. 2), 2422. doi:10.1121/1.1601094.

    Abstract

    The extreme case of perceptual similarity is indiscriminability, as when two second‐language phonemes map to a single native category. An example is the English had‐head vowel contrast for Dutch listeners; Dutch has just one such central vowel, transcribed [E]. We examine whether the failure to discriminate in phonetic categorization implies indiscriminability in other—e.g., lexical—processing. Eyetracking experiments show that Dutch‐native listeners instructed in English to ‘‘click on the panda’’ look (significantly more than native listeners) at a pictured pencil, suggesting that pan‐ activates their lexical representation of pencil. The reverse, however, is not the case: ‘‘click on the pencil’’ does not induce looks to a panda, suggesting that pen‐ does not activate panda in the lexicon. Thus prelexically undiscriminated second‐language distinctions can nevertheless be maintained in stored lexical representations. The problem of mapping a resulting unitary input to two distinct categories in lexical representations is solved by allowing input to activate only one second‐language category. For Dutch listeners to English, this is English [E], as a result of which no vowels in the signal ever map to words containing [ae]. We suggest that the choice of category is here motivated by a more abstract, phonemic, metric of similarity.
  • Cutler, A., Mehler, J., Norris, D., & Segui, J. (1983). A language-specific comprehension strategy [Letters to Nature]. Nature, 304, 159-160. doi:10.1038/304159a0.

    Abstract

    Infants acquire whatever language is spoken in the environment into which they are born. The mental capability of the newborn child is not biased in any way towards the acquisition of one human language rather than another. Because psychologists who attempt to model the process of language comprehension are interested in the structure of the human mind, rather than in the properties of individual languages, strategies which they incorporate in their models are presumed to be universal, not language-specific. In other words, strategies of comprehension are presumed to be characteristic of the human language processing system, rather than, say, the French, English, or Igbo language processing systems. We report here, however, on a comprehension strategy which appears to be used by native speakers of French but not by native speakers of English.
  • Levelt, W. J. M., & Cutler, A. (1983). Prosodic marking in speech repair. Journal of semantics, 2, 205-217. doi:10.1093/semant/2.2.205.

    Abstract

    Spontaneous self-corrections in speech pose a communication problem; the speaker must make clear to the listener not only that the original Utterance was faulty, but where it was faulty and how the fault is to be corrected. Prosodic marking of corrections - making the prosody of the repair noticeably different from that of the original utterance - offers a resource which the speaker can exploit to provide the listener with such information. A corpus of more than 400 spontaneous speech repairs was analysed, and the prosodic characteristics compared with the syntactic and semantic characteristics of each repair. Prosodic marking showed no relationship at all with the syntactic characteristics of repairs. Instead, marking was associated with certain semantic factors: repairs were marked when the original utterance had been actually erroneous, rather than simply less appropriate than the repair; and repairs tended to be marked more often when the set of items encompassing the error and the repair was small rather than when it was large. These findings lend further weight to the characterization of accent as essentially semantic in function.

Share this page