Publications: Language and Comprehension
  • Cutler, A. (2010). How the native language shapes listening to speech. Talk presented at the LOT Winter School 2010, Free University (VU), Amsterdam, The Netherlands. 2010-01-18 - 2010-01-22.
  • Brouwer, S., Van Engen, K., Calandruccio, L., & Bradlow, A. (2009). Linguistic masking in speech perception under adverse conditions. Talk presented at the 50th Annual Meeting of the Psychonomic Society. Boston, MA. 2009-11-19 - 2009-11-22.
  • Jesse, A., & Janse, E. (2010). Seeing a speaker talk when also hearing a competing talker benefits elderly adults. Poster presented at the workshop "Psycholinguistic approaches to speech recognition in adverse conditions", University of Bristol, UK.
  • Cutler, A. (2010). The continuity of speech, and the continuous development of listeners' ability to deal with it. Talk presented at CSCA Lecture [Cognitive Science Center Amsterdam]. University of Amsterdam, The Netherlands. 2010-03-17.

    Abstract

    Speech is a continuous stream. Listeners can make sense of speech only by identifying the components of which it is composed: words. Segmenting speech into words is an operation that has to be learned very early, since it is how infants compile even their initial vocabulary. Infants' relative success at achieving speech segmentation in fact turns out to be a direct predictor of language skills later in development. Adult listeners, however, segment speech so efficiently that they are virtually never aware of the operation of segmentation. In part they achieve this level of efficiency by exploiting accrued knowledge of relevant structure in the native language. Amassing this language-specific knowledge also starts in infancy. Some relevant features, however, call on more advanced levels of language-processing ability; the continuous refinement of segmentation skills is apparent in that these structural features too are exploited for segmentation, even when applying them means overturning otherwise universal constraints that were available in infancy.
  • Cutler, A., & Shanley, J. (2010). Validation of a training method for L2 continuous-speech segmentation. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1844-1847).

    Abstract

    Recognising continuous speech in a second language is often unexpectedly difficult, as the operation of segmenting speech is so attuned to native-language structure. We report the initial steps in development of a novel training method for second-language listening, focusing on speech segmentation and employing a task designed for studying this: word-spotting. Listeners detect real words in sequences consisting of a word plus a minimal context. The present validation study shows that learners from varying non-English backgrounds successfully perform a version of this task in English, and display appropriate sensitivity to structural factors that also affect segmentation by native English listeners.
  • Cutler, A., El Aissati, A., Hanulikova, A., & McQueen, J. M. (2010). Effects on speech parsing of vowelless words in the phonology. In Abstracts of Laboratory Phonology 12 (pp. 115-116).
  • Warner, N., Otake, T., & Arai, A. (2010). Intonational structure as a word-boundary cue in Tokyo Japanese. Language and Speech, 53, 107-131. doi:10.1177/0023830909351235.

    Abstract

    While listeners are recognizing words from the connected speech stream, they are also parsing information from the intonational contour. This contour may contain cues to word boundaries, particularly if a language has boundary tones that occur at a large proportion of word onsets. We investigate how useful the pitch rise at the beginning of an accentual phrase (APR) would be as a potential word-boundary cue for Japanese listeners. A corpus study shows that it should allow listeners to locate approximately 40–60% of word onsets, while causing less than 1% false positives. We then present a word-spotting study which shows that Japanese listeners can, indeed, use accentual phrase boundary cues during segmentation. This work shows that the prosodic patterns that have been found in the production of Japanese also impact listeners’ processing.
  • Brouwer, S., Mitterer, H., & Huettig, F. (2010). Shadowing reduced speech and alignment. Journal of the Acoustical Society of America, 128(1), EL32-EL37. doi:10.1121/1.3448022.

    Abstract

    This study examined whether listeners align to reduced speech. Participants were asked to shadow sentences from a casual speech corpus containing canonical and reduced targets. Participants' productions showed alignment: durations of canonical targets were longer than durations of reduced targets; and participants often imitated the segment types (canonical versus reduced) in both targets. The effect sizes were similar to previous work on alignment. In addition, shadowed productions were overall longer in duration than the original stimuli and this effect was larger for reduced than canonical targets. A possible explanation for this finding is that listeners reconstruct canonical forms from reduced forms.
  • Reinisch, E., Jesse, A., & Nygaard, L. C. (2010). Tone of voice helps learning the meaning of novel adjectives [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 114). York: University of York.

    Abstract

    To understand spoken words, listeners have to cope with seemingly meaningless variability in the speech signal. Speakers vary, for example, their tone of voice (ToV) by changing speaking rate, pitch, vocal effort, and loudness. This variation is independent of "linguistic prosody" such as sentence intonation or speech rhythm. The variation due to ToV, however, is not random. Speakers use, for example, higher pitch when referring to small objects than when referring to large objects, and importantly, adult listeners are able to use these non-lexical ToV cues to distinguish between the meanings of antonym pairs (e.g., big-small; Nygaard, Herold, & Namy, 2009). In the present study, we asked whether listeners infer the meaning of novel adjectives from ToV and subsequently interpret these adjectives according to the learned meaning even in the absence of ToV. Moreover, if listeners actually acquire these adjectival meanings, then they should generalize these word meanings to novel referents. ToV would thus be a semantic cue to lexical acquisition.

    This hypothesis was tested in an exposure-test paradigm with adult listeners, whose eye movements to picture pairs were monitored. The picture pairs represented the endpoints of the adjectival dimensions big-small, hot-cold, and strong-weak (e.g., an elephant and an ant represented big-small); four picture pairs per category were used. While viewing the pictures, participants listened to lexically unconstraining sentences containing novel adjectives, for example, "Can you find the foppick one?" During exposure, the sentences were spoken in infant-directed speech with the intended adjectival meaning expressed by ToV. Word-meaning pairings were counterbalanced across participants, and each word was repeated eight times. Listeners had no explicit task. To guide listeners' attention to the relation between the words and pictures, three sets of filler trials containing real English adjectives (e.g., full-empty) were included. In the subsequent test phase, participants heard the novel adjectives in neutral adult-directed ToV; the test sentences had been recorded before the speaker was informed about the intended word meanings. Participants had to choose which of two pictures on the screen the speaker referred to. Picture pairs that had been presented during the exposure phase and four new picture pairs per category that varied along the critical dimensions were tested.

    During exposure, listeners did not spontaneously direct their gaze to the intended referent at the first presentation, but as their fixation behavior indicated, they quickly learned the relationship between ToV and word meaning over only two exposures. Importantly, during test, participants consistently identified the intended referent object even in the absence of informative ToV. Learning was found for all three tested categories and did not depend on whether the picture pairs had been presented during exposure. Listeners thus use ToV not only to distinguish between antonym pairs, but are also able to extract word meaning from ToV and assign this meaning to novel words. The newly learned word meanings can then be generalized to novel referents even in the absence of ToV cues. These findings suggest that ToV can be used as a semantic cue to lexical acquisition.

    Reference: Nygaard, L. C., Herold, D. S., & Namy, L. L. (2009). The semantics of prosody: Acoustic and perceptual evidence of prosodic correlates to word meaning. Cognitive Science, 33, 127-146.
  • Hamans, C., & Seuren, P. A. M. (2010). Chomsky in search of a pedigree. In D. A. Kibbee (Ed.), Chomskyan (R)evolutions (pp. 377-394). Amsterdam/Philadelphia: Benjamins.

    Abstract

    This paper follows the changing fortunes of Chomsky’s search for a pedigree in the history of Western thought during the late 1960s. Having achieved a unique position of supremacy in the theory of syntax and having exploited that position far beyond the narrow circles of professional syntacticians, he felt the need to shore up his theory with the authority of history. It is shown that this attempt, resulting mainly in his Cartesian Linguistics of 1966, was widely, and rightly, judged to be a radical failure, even though it led to a sudden revival of interest in the history of linguistics. Ironically, the very upswing in historical studies caused by Cartesian Linguistics ended up showing that the real pedigree belongs to Generative Semantics, developed by the same ‘angry young men’ Chomsky was so bent on destroying.
  • Mitterer, H., & Jesse, A. (2010). Correlation versus causation in multisensory perception. Psychonomic Bulletin & Review, 17, 329-334. doi:10.3758/PBR.17.3.329.

    Abstract

    Events are often perceived in multiple modalities. The co-occurring proximal visual and auditory stimuli are mostly also causally linked to the distal event, which makes it difficult to evaluate whether learned correlation or perceived causation guides binding in multisensory perception. Piano tones are an interesting exception: they are associated with seeing key strokes, but are directly caused by hammers that hit strings hidden from observation. We examined the influence of seeing the hammer or the key stroke on auditory temporal order judgments (TOJ). Participants judged the temporal order of a dog bark and a piano tone, while seeing the piano stroke shifted temporally relative to its audio signal. Visual lead increased "piano-first" responses in auditory TOJ, but more so when the associated key stroke rather than the sound-producing hammer was visible, though both were equally visually salient. This provides evidence for a learning account of audiovisual perception.
  • Junge, C., Hagoort, P., & Cutler, A. (2010). Early word segmentation ability is related to later word processing skill. Poster presented at XVIIIth Biennial International Conference on Infant Studies, Baltimore, MD.
  • Cutler, A. (2010). Native listening: How the native language shapes listening to speech. Talk presented at Cognitive Neuroscience: New Challenges and Future Developments: BCBL Scientific opening ceremony congress. Basque Center on Cognition, Brain and Language, San Sebastian. 2010-05-21.
  • Cutler, A., & Broersma, M. (2010). Competition dynamics in second language listening. Talk presented at Psycholinguistic approaches to speech recognition in adverse conditions. University of Bristol, UK. 2010-03-09.

    Abstract

    Listening, in any language, involves processing the phonemic content of spoken input in order to identify the words of which utterances are made up. Models of spoken-word recognition agree that deriving the correct word sequence from speech input is a process in which multiple candidate words are considered in parallel, and in which words compete with one another where they separately lay claim to the same input. In ideal speech situations, the phonemic sequence of each spoken word would be fully instantiated in the speech signal, listeners would correctly identify every one of these uttered phonemes, and listeners' stored lexical representations would exactly match the form in which the words are encountered in speech. In the real world, with which this workshop is concerned, none of these propositions is guaranteed to hold.

    This presentation addresses the particular case of listening in a second language (L2), and the potential effects on the word recognition process of misidentifying a phoneme in the input. The misidentification of L2 phonemes is a notorious source of listening difficulty. Contrary to many L2 users' intuitions, however, the most serious problem is not lexical indistinguishability ('write' heard as 'light', etc.). There are two reasons for this: spurious homophones such as 'light/write' would contribute only a trivial increase to the significant number of real homophones in the lexicon, and the number of fully indistinguishable words is massively outweighed by the set of temporarily indistinguishable words, which nevertheless add to the amount of lexical competition that an L2 listener experiences. The relevant lexical statistics will be presented in support of this claim.

    Exacerbating the increase in lexical competition in L2 is the curious situation whereby an L2 user's lexical representations can encode phonemic distinctions which are not reliably perceived by the same person in spoken input. This combination of lexical accuracy with perceptual inaccuracy, now repeatedly established in listening experiments, is fatal in the competition situation. As will be illustrated in simulations with a computationally implemented spoken-word recognition model, the combination inevitably results in competition which is more persistent than the competition from accurately perceived words. Word recognition experiments with L2 listeners confirm that this extra-persistent competition is indeed observed. The real world of the second-language listener is more competition-prone than the world of the native-language listener, ideal or real.
  • Broersma, M. (2010). Korean lenis, fortis, and aspirated stops: Effect of place of articulation on acoustic realization. Poster presented at the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan.

    Abstract

    Unlike most of the world's languages, Korean distinguishes three types of voiceless stops, namely lenis, fortis, and aspirated stops. All occur at three places of articulation. In previous work, acoustic measurements are mostly collapsed over the three places of articulation. This study therefore provides acoustic measurements of Korean lenis, fortis, and aspirated stops at all three places of articulation separately. Clear differences are found among the acoustic characteristics of the stops at the different places of articulation.
  • Scharenborg, O. (2010). Modeling the use of durational information in human spoken-word recognition. Journal of the Acoustical Society of America, 127, 3758-3770. doi:10.1121/1.3377050.

    Abstract

    Evidence that listeners, at least in a laboratory environment, use durational cues to help resolve temporarily ambiguous speech input has accumulated over the past decades. This paper introduces Fine-Tracker, a computational model of word recognition specifically designed for tracking fine-phonetic information in the acoustic speech signal and using it during word recognition. Two simulations were carried out using real speech as input to the model. The simulations showed that Fine-Tracker, as has been found for humans, benefits from durational information during word recognition and uses it to disambiguate the incoming speech signal. The availability of durational information allows the computational model to distinguish embedded words from their matrix words (first simulation), and to distinguish word-final realizations of [s] from word-initial realizations (second simulation). Fine-Tracker thus provides the first computational model of human word recognition that is able to extract durational information from the speech signal and to use it to differentiate words.
  • Junge, C., Cutler, A., & Hagoort, P. (2010). Dynamics of early word learning in nine-month-olds: An ERP study. Poster presented at FENS forum 2010 - 7th FENS Forum of European Neuroscience, Amsterdam, The Netherlands.

    Abstract

    What happens in the brain when infants are learning the meaning of words? Only a few studies (Torkildsen et al., 2008; Friedrich & Friederici, 2008) have addressed this question, but they focused only on novel word learning, not on the acquisition of infants' first words. From behavioral research we know that 12-month-olds can recognize novel exemplars of early typical word categories, but only after being trained on them from nine months on (Schafer, 2005). What happens in the brain during such training? With event-related potentials, we studied the effect of training context on word comprehension, manipulating the type/token ratio of the training context (one versus six exemplars). Twenty-four normally developing Dutch nine-month-olds (+/- 14 days; 12 boys) participated. Twenty easily depictable words were chosen based on parental vocabulary reports for 15-month-olds. All trials consisted of a high-resolution photograph shown for 2200 ms, with an acoustic label presented at 1000 ms. Each training-test block contrasted two words that shared neither initial phonemes nor semantic class. The training phase started with six trials of one category, followed by six trials of the second category. Results show more negative responses for the more frequent pairings, consistent with word familiarization studies in older infants (Torkildsen et al., 2008; Friedrich & Friederici, 2008). This increase appears to be larger if the pictures changed. In the test phase we tested word comprehension for novel exemplars with the picture-word mismatch paradigm. Here, we observed an N400 similar to that which Mills et al. (2005) found for 13-month-olds. German 12-month-olds, however, did not show such an effect (Friedrich & Friederici, 2005). Our study makes it implausible that the latter is due to immaturity of the N400 mechanism: the N400 was present in Dutch nine-month-olds, even though some parents judged their child not to understand most of the words. There was no interaction with training type, suggesting that type/token ratio does not affect infant word recognition of novel exemplars.
  • Lecumberri, M. L. G., Cooke, M., & Cutler, A. (Eds.). (2010). Non-native speech perception in adverse conditions [Special Issue]. Speech Communication, 52(11/12).
  • Benders, T., Escudero, P., & Sjerps, M. J. (2010). The interrelation between the number of response options and the stimulus range in vowel categorization. Poster presented at the 160th Meeting of the Acoustical Society of America, Cancun, Mexico.
