Anne Cutler

Publications

Displaying 1 - 38 of 38
  • Bruggeman, L., & Cutler, A. (2016). Lexical manipulation as a discovery tool for psycholinguistic research. In C. Carignan, & M. D. Tyler (Eds.), Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016) (pp. 313-316).
  • Cutler, A., & Norris, D. (2016). Bottoms up! How top-down pitfalls ensnare speech perception researchers too. Commentary on C. Firestone & B. Scholl: Cognition does not affect perception: Evaluating the evidence for 'top-down' effects. Behavioral and Brain Sciences, e236. doi:10.1017/S0140525X15002745.

    Abstract

    Not only can the pitfalls that Firestone & Scholl (F&S) identify be generalised across multiple studies within the field of visual perception, but also they have general application outside the field wherever perceptual and cognitive processing are compared. We call attention to the widespread susceptibility of research on the perception of speech to versions of the same pitfalls.
  • Ip, M., & Cutler, A. (2016). Cross-language data on five types of prosodic focus. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 330-334).

    Abstract

    To examine the relative roles of language-specific and language-universal mechanisms in the production of prosodic focus, we compared production of five different types of focus by native speakers of English and Mandarin. Two comparable dialogues were constructed for each language, with the same words appearing in focused and unfocused position; 24 speakers recorded each dialogue in each language. Duration, F0 (mean, maximum, range), and rms-intensity (mean, maximum) of all critical word tokens were measured. Across the different types of focus, cross-language differences were observed in the degree to which English versus Mandarin speakers use the different prosodic parameters to mark focus, suggesting that while prosody may be universally available for expressing focus, the means of its employment may be considerably language-specific
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702-705).

    Abstract

    In Korean, focus is expressed in accentual phrasing. To ascertain whether words focused in this manner enjoy a processing advantage analogous to that conferred by focus as expressed in, e.g, English and Dutch, we devised sentences with target words in one of four conditions: prosodic focus, syntactic focus, prosodic + syntactic focus, and no focus as a control. 32 native speakers of Korean listened to blocks of 10 sentences, then were presented visually with words and asked whether or not they had heard them. Overall, words with focus were recognised significantly faster and more accurately than unfocused words. In addition, words with syntactic focus or syntactic + prosodic focus were recognised faster than words with prosodic focus alone. As for other languages, Korean focus confers processing advantage on the words carrying it. While prosodic focus does provide an advantage, however, syntactic focus appears to provide the greater beneficial effect for recognition memory
  • Norris, D., McQueen, J. M., & Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience, 31(1), 4-18. doi:10.1080/23273798.2015.1081703.

    Abstract

    Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
  • El Aissati, A., McQueen, J. M., & Cutler, A. (2012). Finding words in a language that allows words without vowels. Cognition, 124, 79-84. doi:10.1016/j.cognition.2012.03.006.

    Abstract

    Across many languages from unrelated families, spoken-word recognition is subject to a constraint whereby potential word candidates must contain a vowel. This constraint minimizes competition from embedded words (e.g., in English, disfavoring win in twin because t cannot be a word). However, the constraint would be counter-productive in certain languages that allow stand-alone vowelless open-class words. One such language is Berber (where t is indeed a word). Berber listeners here detected words affixed to nonsense contexts with or without vowels. Length effects seen in other languages replicated in Berber, but in contrast to prior findings, word detection was not hindered by vowelless contexts. When words can be vowelless, otherwise universal constraints disfavoring vowelless words do not feature in spoken-word recognition.

    Additional information

    mmc1.pdf
  • Cutler, A., & Davis, C. (2012). An orthographic effect in phoneme processing, and its limitations. Frontiers in Psychology, 3, 18. doi:10.3389/fpsyg.2012.00018.

    Abstract

    To examine whether lexically stored knowledge about spelling influences phoneme evaluation, we conducted three experiments with a low-level phonetic judgement task: phoneme goodness rating. In each experiment, listeners heard phonetic tokens varying along a continuum centred on /s/, occurring finally in isolated word or nonword tokens. An effect of spelling appeared in Experiment 1: Native English speakers’ goodness ratings for the best /s/ tokens were significantly higher in words spelled with S (e.g., bless) than in words spelled with C (e.g., voice). No such difference appeared when nonnative speakers rated the same materials in Experiment 2, indicating that the difference could not be due to acoustic characteristics of the S- versus C-words. In Experiment 3, nonwords with lexical neighbours consistently spelled with S (e.g., pless) versus with C (e.g., floice) failed to elicit orthographic neighbourhood effects; no significant difference appeared in native English speakers’ ratings for the S-consistent versus the C-consistent sets. Obligatory influence of lexical knowledge on phonemic processing would have predicted such neighbourhood effects; the findings are thus better accommodated by models in which phonemic decisions draw strategically upon lexical information.
  • Cutler, A. (2012). Eentaalpsychologie is geen taalpsychologie: Part II. [Valedictory lecture Radboud University]. Nijmegen: Radboud University.

    Abstract

    Rede uitgesproken bij het afscheid als hoogleraar Vergelijkende taalpsychologie aan de Faculteit der Sociale Wetenschappen van de Radboud Universiteit Nijmegen op donderdag 20 september 2012
  • Cutler, A. (2012). Native listening: Language experience and the recognition of spoken words. Cambridge, MA: MIT Press.

    Abstract

    Understanding speech in our native tongue seems natural and effortless; listening to speech in a nonnative language is a different experience. In this book, Anne Cutler argues that listening to speech is a process of native listening because so much of it is exquisitely tailored to the requirements of the native language. Her cross-linguistic study (drawing on experimental work in languages that range from English and Dutch to Chinese and Japanese) documents what is universal and what is language specific in the way we listen to spoken language. Cutler describes the formidable range of mental tasks we carry out, all at once, with astonishing speed and accuracy, when we listen. These include evaluating probabilities arising from the structure of the native vocabulary, tracking information to locate the boundaries between words, paying attention to the way the words are pronounced, and assessing not only the sounds of speech but prosodic information that spans sequences of sounds. She describes infant speech perception, the consequences of language-specific specialization for listening to other languages, the flexibility and adaptability of listening (to our native languages), and how language-specificity and universality fit together in our language processing system. Drawing on her four decades of work as a psycholinguist, Cutler documents the recent growth in our knowledge about how spoken-word recognition works and the role of language structure in this process. Her book is a significant contribution to a vibrant and rapidly developing field.
  • Cutler, A. (2012). Native listening: The flexibility dimension. Dutch Journal of Applied Linguistics, 1(2), 169-187.

    Abstract

    The way we listen to spoken language is tailored to the specific benefit of native-language speech input. Listening to speech in non-native languages can be significantly hindered by this native bias. Is it possible to determine the degree to which a listener is listening in a native-like manner? Promising indications of how this question may be tackled are provided by new research findings concerning the great flexibility that characterises listening to the L1, in online adjustment of phonetic category boundaries for adaptation across talkers, and in modulation of lexical dynamics for adjustment across listening conditions. This flexibility pays off in many dimensions, including listening in noise, adaptation across dialects, and identification of voices. These findings further illuminate the robustness and flexibility of native listening, and potentially point to ways in which we might begin to assess degrees of ‘native-likeness’ in this skill.
  • Cutler, A., Otake, T., & Bruggeman, L. (2012). Phonologically determined asymmetries in vocabulary structure across languages. Journal of the Acoustical Society of America, 132(2), EL155-EL160. doi:10.1121/1.4737596.

    Abstract

    Studies of spoken-word recognition have revealed that competition from embedded words differs in strength as a function of where in the carrier word the embedded word is found and have further shown embedding patterns to be skewed such that embeddings in initial position in carriers outnumber embeddings in final position. Lexico-statistical analyses show that this skew is highly attenuated in Japanese, a noninflectional language. Comparison of the extent of the asymmetry in the three Germanic languages English, Dutch, and German allows the source to be traced to a combination of suffixal morphology and vowel reduction in unstressed syllables.
  • Junge, C., Cutler, A., & Hagoort, P. (2012). Electrophysiological evidence of early word learning. Neuropsychologia, 50, 3702-3712. doi:10.1016/j.neuropsychologia.2012.10.012.

    Abstract

    Around their first birthday infants begin to talk, yet they comprehend words long before. This study investigated the event-related potentials (ERP) responses of nine-month-olds on basic level picture-word pairings. After a familiarization phase of six picture-word pairings per semantic category, comprehension for novel exemplars was tested in a picture-word matching paradigm. ERPs time-locked to pictures elicited a modulation of the Negative Central (Nc) component, associated with visual attention and recognition. It was attenuated by category repetition as well as by the type-token ratio of picture context. ERPs time-locked to words in the training phase became more negative with repetition (N300-600), but there was no influence of picture type-token ratio, suggesting that infants have identified the concept of each picture before a word was presented. Results from the test phase provided clear support that infants integrated word meanings with (novel) picture context. Here, infants showed different ERP responses for words that did or did not align with the picture context: a phonological mismatch (N200) and a semantic mismatch (N400). Together, results were informative of visual categorization, word recognition and word-to-world-mappings, all three crucial processes for vocabulary construction.
  • Junge, C., Kooijman, V., Hagoort, P., & Cutler, A. (2012). Rapid recognition at 10 months as a predictor of language development. Developmental Science, 15, 463-473. doi:10.1111/j.1467-7687.2012.1144.x.

    Abstract

    Infants’ ability to recognize words in continuous speech is vital for building a vocabulary.We here examined the amount and type of exposure needed for 10-month-olds to recognize words. Infants first heard a word, either embedded within an utterance or in isolation, then recognition was assessed by comparing event-related potentials to this word versus a word that they had not heard directly before. Although all 10-month-olds showed recognition responses to words first heard in isolation, not all infants showed such responses to words they had first heard within an utterance. Those that did succeed in the latter, harder, task, however, understood more words and utterances when re-tested at 12 months, and understood more words and produced more words at 24 months, compared with those who had shown no such recognition response at 10 months. The ability to rapidly recognize the words in continuous utterances is clearly linked to future language development.
  • McQueen, J. M., Tyler, M., & Cutler, A. (2012). Lexical retuning of children’s speech perception: Evidence for knowledge about words’ component sounds. Language Learning and Development, 8, 317-339. doi:10.1080/15475441.2011.641887.

    Abstract

    Children hear new words from many different talkers; to learn words most efficiently, they should be able to represent them independently of talker-specific pronunciation detail. However, do children know what the component sounds of words should be, and can they use that knowledge to deal with different talkers' phonetic realizations? Experiment 1 replicated prior studies on lexically guided retuning of speech perception in adults, with a picture-verification methodology suitable for children. One participant group heard an ambiguous fricative ([s/f]) replacing /f/ (e.g., in words like giraffe); another group heard [s/f] replacing /s/ (e.g., in platypus). The first group subsequently identified more tokens on a Simpie-[s/f]impie-Fimpie toy-name continuum as Fimpie. Experiments 2 and 3 found equivalent lexically guided retuning effects in 12- and 6-year-olds. Children aged 6 have all that is needed for adjusting to talker variation in speech: detailed and abstract phonological representations and the ability to apply them during spoken-word recognition.

    Files private

    Request files
  • Tuinman, A., Mitterer, H., & Cutler, A. (2012). Resolving ambiguity in familiar and unfamiliar casual speech. Journal of Memory and Language, 66, 530-544. doi:10.1016/j.jml.2012.02.001.

    Abstract

    In British English, the phrase Canada aided can sound like Canada raided if the speaker links the two vowels at the word boundary with an intrusive /r/. There are subtle phonetic differences between an onset /r/ and an intrusive /r/, however. With cross-modal priming and eye-tracking, we examine how native British English listeners and non-native (Dutch) listeners deal with the lexical ambiguity arising from this language-specific connected speech process. Together the results indicate that the presence of /r/ initially activates competing words for both listener groups; however, the native listeners rapidly exploit the phonetic cues and achieve correct lexical selection. In contrast, these advanced L2 listeners to English failed to recover from the /r/-induced competition, and failed to match native performance in either task. The /r/-intrusion process, which adds a phoneme to speech input, thus causes greater difficulty for L2 listeners than connectedspeech processes which alter or delete phonemes.
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to neighboring segments and are less well identified. Indeed, many vowels, especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Braun, B., Tagliapietra, L., & Cutler, A. (2008). Contrastive utterances make alternatives salient: Cross-modal priming evidence. In Proceedings of Interspeech 2008 (pp. 69-69).

    Abstract

    Sentences with contrastive intonation are assumed to presuppose contextual alternatives to the accented elements. Two cross-modal priming experiments tested in Dutch whether such contextual alternatives are automatically available to listeners. Contrastive associates – but not non- contrastive associates - were facilitated only when primes were produced in sentences with contrastive intonation, indicating that contrastive intonation makes unmentioned contextual alternatives immediately available. Possibly, contrastive contours trigger a “presupposition resolution mechanism” by which these alternatives become salient.
  • Braun, B., Lemhöfer, K., & Cutler, A. (2008). English word stress as produced by English and Dutch speakers: The role of segmental and suprasegmental differences. In Proceedings of Interspeech 2008 (pp. 1953-1953).

    Abstract

    It has been claimed that Dutch listeners use suprasegmental cues (duration, spectral tilt) more than English listeners in distinguishing English word stress. We tested whether this asymmetry also holds in production, comparing the realization of English word stress by native English speakers and Dutch speakers. Results confirmed that English speakers centralize unstressed vowels more, while Dutch speakers of English make more use of suprasegmental differences.
  • Broersma, M., & Cutler, A. (2008). Phantom word activation in L2. System, 36(1), 22-34. doi:10.1016/j.system.2007.11.003.

    Abstract

    L2 listening can involve the phantom activation of words which are not actually in the input. All spoken-word recognition involves multiple concurrent activation of word candidates, with selection of the correct words achieved by a process of competition between them. L2 listening involves more such activation than L1 listening, and we report two studies illustrating this. First, in a lexical decision study, L2 listeners accepted (but L1 listeners did not accept) spoken non-words such as groof or flide as real English words. Second, a priming study demonstrated that the same spoken non-words made recognition of the real words groove, flight easier for L2 (but not L1) listeners, suggesting that, for the L2 listeners only, these real words had been activated by the spoken non-word input. We propose that further understanding of the activation and competition process in L2 lexical processing could lead to new understanding of L2 listening difficulty.
  • Cutler, A., Garcia Lecumberri, M. L., & Cooke, M. (2008). Consonant identification in noise by native and non-native listeners: Effects of local context. Journal of the Acoustical Society of America, 124(2), 1264-1268. doi:10.1121/1.2946707.

    Abstract

    Speech recognition in noise is harder in second (L2) than first languages (L1). This could be because noise disrupts speech processing more in L2 than L1, or because L1 listeners recover better though disruption is equivalent. Two similar prior studies produced discrepant results: Equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the results discrepancy.
  • Cutler, A., McQueen, J. M., Butterfield, S., & Norris, D. (2008). Prelexically-driven perceptual retuning of phoneme boundaries. In Proceedings of Interspeech 2008 (pp. 2056-2056).

    Abstract

    Listeners heard an ambiguous /f-s/ in nonword contexts where only one of /f/ or /s/ was legal (e.g., frul/*srul or *fnud/snud). In later categorisation of a phonetic continuum from /f/ to /s/, their category boundaries had shifted; hearing -rul led to expanded /f/ categories, -nud expanded /s/. Thus phonotactic sequence information alone induces perceptual retuning of phoneme category boundaries; lexical access is not required.
  • Cutler, A. (2008). The abstract representations in speech processing. Quarterly Journal of Experimental Psychology, 61(11), 1601-1619. doi:10.1080/13803390802218542.

    Abstract

    Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
  • Goudbeek, M., Cutler, A., & Smits, R. (2008). Supervised and unsupervised learning of multidimensionally varying nonnative speech categories. Speech Communication, 50(2), 109-125. doi:10.1016/j.specom.2007.07.003.

    Abstract

    The acquisition of novel phonetic categories is hypothesized to be affected by the distributional properties of the input, the relation of the new categories to the native phonology, and the availability of supervision (feedback). These factors were examined in four experiments in which listeners were presented with novel categories based on vowels of Dutch. Distribution was varied such that the categorization depended on the single dimension duration, the single dimension frequency, or both dimensions at once. Listeners were clearly sensitive to the distributional information, but unidimensional contrasts proved easier to learn than multidimensional. The native phonology was varied by comparing Spanish versus American English listeners. Spanish listeners found categorization by frequency easier than categorization by duration, but this was not true of American listeners, whose native vowel system makes more use of duration-based distinctions. Finally, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.
  • Kim, J., Davis, C., & Cutler, A. (2008). Perceptual tests of rhythmic similarity: II. Syllable rhythm. Language and Speech, 51(4), 343-359. doi:10.1177/0023830908099069.

    Abstract

    To segment continuous speech into its component words, listeners make use of language rhythm; because rhythm differs across languages, so do the segmentation procedures which listeners use. For each of stress-, syllable-and mora-based rhythmic structure, perceptual experiments have led to the discovery of corresponding segmentation procedures. In the case of mora-based rhythm, similar segmentation has been demonstrated in the otherwise unrelated languages Japanese and Telugu; segmentation based on syllable rhythm, however, has been previously demonstrated only for European languages from the Romance family. We here report two target detection experiments in which Korean listeners, presented with speech in Korean and in French, displayed patterns of segmentation like those previously observed in analogous experiments with French listeners. The Korean listeners' accuracy in detecting word-initial target fragments in either language was significantly higher when the fragments corresponded exactly to a syllable in the input than when the fragments were smaller or larger than a syllable. We conclude that Korean and French listeners can call on similar procedures for segmenting speech, and we further propose that perceptual tests of speech segmentation provide a valuable accompaniment to acoustic analyses for establishing languages' rhythmic class membership.
  • Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word recognition. In A. D. Friederici, & G. Thierry (Eds.), Early language development: Bridging brain and behaviour (pp. 91-114). Amsterdam: Benjamins.
  • Chen, H.-C., & Cutler, A. (1997). Auditory priming in spoken and printed word recognition. In H.-C. Chen (Ed.), Cognitive processing of Chinese and related Asian languages (pp. 77-81). Hong Kong: Chinese University Press.
  • Cutler, A., & Otake, T. (1997). Contrastive studies of spoken-language processing. Journal of Phonetic Society of Japan, 1, 4-13.
  • Cutler, A., & Chen, H.-C. (1997). Lexical tone in Cantonese spoken-word processing. Perception and Psychophysics, 59, 165-179. Retrieved from http://www.psychonomic.org/search/view.cgi?id=778.

    Abstract

    In three experiments, the processing of lexical tone in Cantonese was examined. Cantonese listeners more often accepted a nonword as a word when the only difference between the nonword and the word was in tone, especially when the F0 onset difference between correct and erroneous tone was small. Same–different judgments by these listeners were also slower and less accurate when the only difference between two syllables was in tone, and this was true whether the F0 onset difference between the two tones was large or small. Listeners with no knowledge of Cantonese produced essentially the same same-different judgment pattern as that produced by the native listeners, suggesting that the results display the effects of simple perceptual processing rather than of linguistic knowledge. It is argued that the processing of lexical tone distinctions may be slowed, relative to the processing of segmental distinctions, and that, in speeded-response tasks, tone is thus more likely to be misprocessed than is segmental structure.
  • Cutler, A. (1997). Prosody and the structure of the message. In Y. Sagisaka, N. Campbell, & N. Higuchi (Eds.), Computing prosody: Computational models for processing spontaneous speech (pp. 63-66). Heidelberg: Springer.
  • Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141-201.

    Abstract

    Research on the exploitation of prosodic information in the recognition of spoken language is reviewed. The research falls into three main areas: the use of prosody in the recognition of spoken words, in which most attention has been paid to the question of whether the prosodic structure of a word plays a role in initial contact with stored lexical representations; the use of prosody in the computation of syntactic structure, in which the resolution of global and local ambiguities has formed the central focus; and the role of prosody in the processing of discourse structure, in which there has been a preponderance of work on the contribution of accentuation and deaccentuation to integration of concepts with an existing discourse model. The review reveals that in each area progress has been made towards new conceptions of prosody's role in processing, and in particular this has involved abandonment of previously held deterministic views of the relationship between prosodic structure and other aspects of linguistic structure
  • Cutler, A. (1997). The comparative perspective on spoken-language processing. Speech Communication, 21, 3-15. doi:10.1016/S0167-6393(96)00075-1.

    Abstract

    Psycholinguists strive to construct a model of human language processing in general. But this does not imply that they should confine their research to universal aspects of linguistic structure, and avoid research on language-specific phenomena. First, even universal characteristics of language structure can only be accurately observed cross-linguistically. This point is illustrated here by research on the role of the syllable in spoken-word recognition, on the perceptual processing of vowels versus consonants, and on the contribution of phonetic assimilation phonemena to phoneme identification. In each case, it is only by looking at the pattern of effects across languages that it is possible to understand the general principle. Second, language-specific processing can certainly shed light on the universal model of language comprehension. This second point is illustrated by studies of the exploitation of vowel harmony in the lexical segmentation of Finnish, of the recognition of Dutch words with and without vowel epenthesis, and of the contribution of different kinds of lexical prosodic structure (tone, pitch accent, stress) to the initial activation of candidate words in lexical access. In each case, aspects of the universal processing model are revealed by analysis of these language-specific effects. In short, the study of spoken-language processing by human listeners requires cross-linguistic comparison.
  • Cutler, A. (1997). The syllable’s role in the segmentation of stress languages. Language and Cognitive Processes, 12, 839-845. doi:10.1080/016909697386718.
  • Koster, M., & Cutler, A. (1997). Segmental and suprasegmental contributions to spoken-word recognition in Dutch. In Proceedings of EUROSPEECH 97 (pp. 2167-2170). Grenoble, France: ESCA.

    Abstract

    Words can be distinguished by segmental differences or by suprasegmental differences or both. Studies from English suggest that suprasegmentals play little role in human spoken-word recognition; English stress, however, is nearly always unambiguously coded in segmental structure (vowel quality); this relationship is less close in Dutch. The present study directly compared the effects of segmental and suprasegmental mispronunciation on word recognition in Dutch. There was a strong effect of suprasegmental mispronunciation, suggesting that Dutch listeners do exploit suprasegmental information in word recognition. Previous findings indicating the effects of mis-stressing for Dutch differ with stress position were replicated only when segmental change was involved, suggesting that this is an effect of segmental rather than suprasegmental processing.
  • McQueen, J. M., & Cutler, A. (1997). Cognitive processes in speech perception. In W. J. Hardcastle, & J. D. Laver (Eds.), The handbook of phonetic sciences (pp. 556-585). Oxford: Blackwell.
  • Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34, 191-243. doi:10.1006/cogp.1997.0671.

    Abstract

    We propose that word recognition in continuous speech is subject to constraints on what may constitute a viable word of the language. This Possible-Word Constraint (PWC) reduces activation of candidate words if their recognition would imply word status for adjacent input which could not be a word - for instance, a single consonant. In two word-spotting experiments, listeners found it much harder to detectapple,for example, infapple(where [f] alone would be an impossible word), than invuffapple(wherevuffcould be a word of English). We demonstrate that the PWC can readily be implemented in a competition-based model of continuous speech recognition, as a constraint on the process of competition between candidate words; where a stretch of speech between a candidate word and a (known or likely) word boundary is not a possible word, activation of the candidate word is reduced. This implementation accurately simulates both the present results and data from a range of earlier studies of speech segmentation.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language, 36, 422-444. doi:10.1006/jmla.1996.2495.

    Abstract

    Finnish vowel harmony rules require that if the vowel in the first syllable of a word belongs to one of two vowel sets, then all subsequent vowels in that word must belong either to the same set or to a neutral set. A harmony mismatch between two syllables containing vowels from the opposing sets thus signals a likely word boundary. We report five experiments showing that Finnish listeners can exploit this information in an on-line speech segmentation task. Listeners found it easier to detect words likehymyat the end of the nonsense stringpuhymy(where there is a harmony mismatch between the first two syllables) than in the stringpyhymy(where there is no mismatch). There was no such effect, however, when the target words appeared at the beginning of the nonsense string (e.g.,hymypuvshymypy). Stronger harmony effects were found for targets containing front harmony vowels (e.g.,hymy) than for targets containing back harmony vowels (e.g.,paloinkypaloandkupalo). The same pattern of results appeared whether target position within the string was predictable or unpredictable. Harmony mismatch thus appears to provide a useful segmentation cue for the detection of word onsets in Finnish speech.

Share this page