Comprehension Dept Publications

Publications: Language and Comprehension

  • Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591-612. doi:10.1080/15250000903263957.

    Abstract

    Recognizing word boundaries in continuous speech requires detailed knowledge of the native language. In the first year of life, infants acquire considerable word segmentation abilities. Infants at this early stage in word segmentation rely to a large extent on the metrical pattern of their native language, at least in stress-based languages. In Dutch and English (both languages with a preferred trochaic stress pattern), segmentation of strong-weak words develops rapidly between 7 and 10 months of age. Nevertheless, trochaic languages contain not only strong-weak words but also words with a weak-strong stress pattern. In this article, we present electrophysiological evidence of the beginnings of weak-strong word segmentation in Dutch 10-month-olds. At this age, the ability to combine different cues for efficient word segmentation does not yet seem to be completely developed. We provide evidence that Dutch infants still largely rely on strong syllables, even for the segmentation of weak-strong words.
  • Burnham, D., Ambikairajah, E., Arciuli, J., Bennamoun, M., Best, C. T., Bird, S., Butcher, A. R., Cassidy, S., Chetty, G., Cox, F. M., Cutler, A., Dale, R., Epps, J. R., Fletcher, J. M., Goecke, R., Grayden, D. B., Hajek, J. T., Ingram, J. C., Ishihara, S., Kemp, N., Kinoshita, Y., Kuratate, T., Lewis, T. W., Loakes, D. E., Onslow, M., Powers, D. M., Rose, P., Togneri, R., Tran, D., & Wagner, M. (2009). A blueprint for a comprehensive Australian English auditory-visual speech corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus (pp. 96-107). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic identification, talking heads, and hearing prostheses. Advances in these research areas in Australia require a large corpus of Australian English. Here the authors describe a blueprint for building the Big Australian Speech Corpus (the Big ASC), a corpus of over 1,100 speakers from urban and rural Australia, including speakers of non-indigenous, indigenous, ethnocultural, and disordered forms of Australian English, each of whom would be sampled on three occasions in a range of speech tasks designed by the researchers who would be using the corpus.
  • Tagliapietra, L., Fanari, R., Collina, S., & Tabossi, P. (2009). Syllabic effects in Italian lexical access. Journal of Psycholinguistic Research, 38(6), 511-526. doi:10.1007/s10936-009-9116-4.

    Abstract

    Two cross-modal priming experiments tested whether lexical access is constrained by syllabic structure in Italian. Results extend the available Italian data on the processing of stressed syllables, showing that syllabic information restricts the set of candidates to those structurally consistent with the intended word (Experiment 1). Lexical access, however, takes place as soon as possible and is not delayed until the incoming input corresponds to the first syllable of the word; the initially activated set includes candidates whose syllabic structure does not match the intended word (Experiment 2). The present data challenge the early hypothesis that in Romance languages syllables are the units for lexical access during spoken word recognition. The implications of the results for our understanding of the role of syllabic information in language processing are discussed.
  • Di Betta, A. M., Weber, A., & McQueen, J. M. (2009). Trick or treat? Adaptation to Italian-accented English speech by native English, Italian, and Dutch listeners. Poster presented at 15th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP 2009), Barcelona.

    Abstract

    English is spoken worldwide by both native (L1) and nonnative (L2) speakers. It is therefore imperative to establish how easily L1 and L2 speakers understand each other. We know that L1 listeners adapt to foreign-accented speech very rapidly (Clarke & Garrett, 2004), and L2 listeners find L2 speakers (from matched and mismatched L1 backgrounds) as intelligible as native speakers (Bent & Bradlow, 2003). But foreign-accented speech can deviate widely from L1 pronunciation norms, for example when adult L2 learners experience difficulties in producing L2 phonemes that are not part of their native repertoire (Strange, 1995). For instance, Italian L2 learners of English often lengthen the lax English vowel /I/, making it sound more like the tense vowel /i/ (Flege et al., 1999). This blurs the distinction between words such as bin and bean. Unless listeners are able to adapt to this kind of pronunciation variance, it would hinder word recognition by both L1 and L2 listeners (e.g., /bin/ could mean either bin or bean). In this study we investigate whether Italian-accented English interferes with on-line word recognition for native English listeners and for nonnative English listeners, both those where the L1 matches the speaker accent (i.e., Italian listeners) and those with an L1 mismatch (i.e., Dutch listeners). Second, we test whether there is perceptual adaptation to the Italian-accented speech during the experiment in each of the three listener groups. Participants in all groups took part in the same cross-modal priming experiment. They heard spoken primes and made lexical decisions to printed targets, presented at the acoustic offset of the prime. The primes, spoken by a native Italian, consisted of 80 English words, half with /I/ in their standard pronunciation but mispronounced with an /i/ (e.g., trick spoken as treek), and half with /i/ in their standard pronunciation and pronounced correctly (e.g., treat). These words also appeared as targets, following either a related prime (which was either identical, e.g., treat-treat, or mispronounced, e.g., treek-trick) or an unrelated prime. All three listener groups showed identity priming (i.e., faster decisions to treat after hearing treat than after an unrelated prime), both overall and in each of the two halves of the experiment. In addition, the Italian listeners showed mispronunciation priming (i.e., faster decisions to trick after hearing treek than after an unrelated prime) in both halves of the experiment, while the English and Dutch listeners showed mispronunciation priming only in the second half of the experiment. These results suggest that Italian listeners, prior to the experiment, have learned to deal with Italian-accented English, and that English and Dutch listeners, during the experiment, can rapidly adapt to Italian-accented English. For listeners already familiar with a particular accent (e.g., through their own pronunciation), it appears that they have already learned how to interpret words with mispronounced vowels. Listeners who are less familiar with a foreign accent can quickly adapt to the way a particular speaker with that accent talks, even if that speaker is not talking in the listeners’ native language.
  • Jesse, A., & Janse, E. (2009). Seeing a speaker's face helps stream segregation for younger and elderly adults [Abstract]. Journal of the Acoustical Society of America, 125(4), 2361.
  • Tagliapietra, L., Fanari, R., De Candia, C., & Tabossi, P. (2009). Phonotactic regularities in the segmentation of spoken Italian. Quarterly Journal of Experimental Psychology, 62(2), 392-415. doi:10.1080/17470210801907379.

    Abstract

    Five word-spotting experiments explored the role of consonantal and vocalic phonotactic cues in the segmentation of spoken Italian. The first set of experiments tested listeners' sensitivity to phonotactic constraints cueing syllable boundaries. Participants were slower in spotting words in nonsense strings when target onsets were misaligned (e.g., lago in ri.blago) than when they were aligned (e.g., lago in rin.lago) with phonotactically determined syllabic boundaries. This effect held also for sequences that occur only word-medially (e.g., /tl/ in ri.tlago), and competition effects could not account for the disadvantage in the misaligned condition. Similarly, target detections were slower when their offsets were misaligned (e.g., città in cittàu.ba) than when they were aligned (e.g., città in città.oba) with a phonotactic syllabic boundary. The second set of experiments tested listeners' sensitivity to phonotactic cues that specifically signal lexical (and not just syllable) boundaries. Results corroborate the role of syllabic information in speech segmentation and suggest that Italian listeners make little use of additional phonotactic information that specifically cues word boundaries.

  • Janse, E., & Ernestus, M. (2009). Recognition of reduced speech and use of phonetic context in listeners with age-related hearing impairment. Poster presented at 157th Meeting of the Acoustical Society of America, Portland, OR.
  • Janse, E., & Jesse, A. (2009). Audiovisual benefit for stream segregation in elderly listeners. Talk presented at Colloquium at Volen National Center for Complex Systems. Brandeis University. Waltham, MA. 2009-05-12.
  • Kuzla, C. (2009). Prosodic structure in speech production and perception. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Tyler, M., & Cutler, A. (2009). Cross-language differences in cue use for speech segmentation. Journal of the Acoustical Society of America, 126, 367-376. doi:10.1121/1.3129127.

    Abstract

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners’ use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable “words.” These words were demarcated by (a) no cue other than transitional probabilities induced by their recurrence, (b) a consistent left-edge cue, or (c) a consistent right-edge cue. Experiment 1 examined a vowel lengthening cue. All three listener groups benefited from this cue in right-edge position; none benefited from it in left-edge position. Experiment 2 examined a pitch-movement cue. English listeners used this cue in left-edge position, French listeners used it in right-edge position, and Dutch listeners used it in both positions. These findings are interpreted as evidence of both language-universal and language-specific effects. Final lengthening is a language-universal effect expressing a more general (non-linguistic) mechanism. Pitch movement expresses prominence, which has characteristically different placements across languages: typically at right edges in French, but at left edges in English and Dutch. Finally, stress realization in English versus Dutch encourages greater attention to suprasegmental variation by Dutch than by English listeners, allowing Dutch listeners to benefit from an informative pitch-movement cue even in an uncharacteristic position.
  • Andics, A., McQueen, J. M., Petersson, K. M., Gál, V., & Vidnyánszky, Z. (2009). Neural correlates of voice category learning - An audiovisual fMRI study. Poster presented at 12th Meeting of the Hungarian Neuroscience Society, Budapest.

    Abstract

    Voices in the auditory modality, like faces in the visual modality, are the keys to person recognition. This fMRI experiment investigated the neural organisation of voice categories using a voice-training paradigm. Voice-morph continua were created between two female Hungarian speakers' voices saying six monosyllabic Hungarian words, one continuum per word. Listeners were trained to categorize the middle part of the continua as one voice. This trained voice category was associated with a face. Twenty-five listeners were tested twice with a one-week delay. To induce shifts in the trained category, listeners received feedback on their judgments such that the trained category was associated with different voice-morph intervals each week, allowing within-subject manipulation of whether stimuli corresponded to a trained voice-category centre, to a category boundary or to another voice. FMRI tests each week were preceded by eighty minutes training distributed over two consecutive days. The tests included implicit and explicit categorization tasks. Voice and face selective areas were defined in separate localizer runs. Group-averaged local maxima from these runs were used for small-volume correction analyses. During implicit categorization, stimuli corresponding to trained voice-category centres elicited lower activity than other stimuli in voice-selective regions of the right STS. During explicit categorization, stimuli corresponding to trained voice-category boundaries elicited higher activity than other stimuli in voice-selective regions of the right VLPFC. Furthermore, the unimodal presentation of voices that are more associated with a face may elicit higher activity in visual areas. These results map out the way voice categories are neurally represented.
  • Massaro, D. W., & Jesse, A. (2009). Read my lips: Speech distortions in musical lyrics can be overcome (slightly) by facial information. Speech Communication, 51(7), 604-621. doi:10.1016/j.specom.2008.05.013.

    Abstract

    Understanding the lyrics of many contemporary songs is difficult, and an earlier study [Hidalgo-Barnes, M., Massaro, D.W., 2007. Read my lips: an animated face helps communicate musical lyrics. Psychomusicology 19, 3–12] showed a benefit for lyrics recognition when seeing a computer-animated talking head (Baldi) mouthing the lyrics along with hearing the singer. However, the contribution of visual information was relatively small compared to what is usually found for speech. In the current experiments, our goal was to determine why the face appears to contribute less when aligned with sung lyrics than when aligned with normal speech presented in noise. The first experiment compared the contribution of the talking head with the originally sung lyrics versus the case when it was aligned with the Festival text-to-speech synthesis (TtS) spoken at the original duration of the song’s lyrics. A small and similar influence of the face was found in both conditions. In the three experiments, we compared the presence of the face when the durations of the TtS were equated with the duration of the original musical lyrics to the case when the lyrics were read with typical TtS durations and this speech was embedded in noise. The results indicated that the unusual temporally distorted durations of musical lyrics decrease the contribution of the visible speech from the face.
  • Van de Ven, M., Tucker, B. V., & Ernestus, M. (2009). Semantic context effects in the recognition of acoustically unreduced and reduced words. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (pp. 1867-1870). Causal Productions Pty Ltd.

    Abstract

    Listeners require context to understand the casual pronunciation variants of words that are typical of spontaneous speech (Ernestus et al., 2002). The present study reports two auditory lexical decision experiments investigating listeners' use of semantic contextual information in the comprehension of unreduced and reduced words. We found a strong semantic priming effect for low-frequency unreduced words, whereas there was no such effect for reduced words. Word frequency was facilitatory for all words. These results show that semantic context is especially relevant for the comprehension of unreduced words, which is unexpected given the listener-driven explanation of reduction in spontaneous speech.
  • Jesse, A. (2009). The face in audiovisual speech perception. Talk presented at Max Planck Research Network on Cognition Workshop Faces in Social Interactions. Harnack-Haus, Berlin, Germany. 2009-06-04.
  • Warner, N., Luna, Q., Butler, L., & Van Volkinburg, H. (2009). Revitalization in a scattered language community: Problems and methods from the perspective of Mutsun language revitalization. International Journal of the Sociology of Language, 198, 135-148. doi:10.1515/IJSL.2009.031.

    Abstract

    This article addresses revitalization of a dormant language whose prospective speakers live in scattered geographical areas. In comparison to increasing the usage of an endangered language, revitalizing a dormant language (one with no living speakers) requires different methods to gain knowledge of the language. Language teaching for a dormant language with a scattered community presents different problems from other teaching situations. In this article, we discuss the types of tasks that must be accomplished for dormant-language revitalization, with particular focus on development of teaching materials. We also address the role of computer technologies, arguing that each use of technology should be evaluated for how effectively it increases fluency. We discuss methods for achieving semi-fluency for the first new speakers of a dormant language, and for spreading the language through the community.
  • Brouwer, S., Mitterer, H., & Huettig, F. (2009). Listeners reconstruct reduced forms during spontaneous speech: Evidence from eye movements. Poster presented at 15th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP 2009), Barcelona, Spain.
  • Ogasawara, N., & Warner, N. (2009). Processing missing vowels: Allophonic processing in Japanese. Language and Cognitive Processes, 24, 376-411. doi:10.1080/01690960802084028.

    Abstract

    The acoustic realisation of a speech sound varies, often showing allophonic variation triggered by surrounding sounds. Listeners recognise words and sounds well despite such variation, and even make use of allophonic variability in processing. This study reports five experiments on processing of the reduced/unreduced allophonic alternation of Japanese high vowels. The results show that listeners use phonological knowledge of their native language during phoneme processing and word recognition. However, interactions of the phonological and acoustic effects differ in these two processes. A facilitatory phonological effect and an inhibitory acoustic effect cancel one another out in phoneme processing, whereas in word recognition the facilitatory phonological effect overrides the inhibitory acoustic effect. Four potential models of the processing of allophonic variation are discussed. The results can be accommodated in two of them, but require additional assumptions or modifications to the models, and primarily support lexical specification of allophonic variability.

  • Jesse, A., & Janse, E. (2009). Seeing a speaker's face helps stream segregation for younger and elderly adults. Poster presented at 157th Meeting of the Acoustical Society of America, Portland, OR.
  • Cutler, A. (2009). Greater sensitivity to prosodic goodness in non-native than in native listeners. Journal of the Acoustical Society of America, 125, 3522-3525. doi:10.1121/1.3117434.

    Abstract

    English listeners largely disregard suprasegmental cues to stress in recognizing words. Evidence for this includes the demonstration of Fear et al. [J. Acoust. Soc. Am. 97, 1893–1904 (1995)] that cross-splicings are tolerated between stressed and unstressed full vowels (e.g., au- of autumn, automata). Dutch listeners, however, do exploit suprasegmental stress cues in recognizing native-language words. In this study, Dutch listeners were presented with English materials from the study of Fear et al. Acceptability ratings by these listeners revealed sensitivity to suprasegmental mismatch, in particular, in replacements of unstressed full vowels by higher-stressed vowels, thus evincing greater sensitivity to prosodic goodness than had been shown by the original native listener group.
  • Davids, N., Van den Brink, D., Van Turennout, M., Mitterer, H., & Verhoeven, L. (2009). Towards neurophysiological assessment of phonemic discrimination: Context effects of the mismatch negativity. Clinical Neurophysiology, 120, 1078-1086. doi:10.1016/j.clinph.2009.01.018.

    Abstract

    This study focusses on the optimal paradigm for simultaneous assessment of auditory and phonemic discrimination in clinical populations. We investigated (a) whether pitch and phonemic deviants presented together in one sequence are able to elicit mismatch negativities (MMNs) in healthy adults and (b) whether MMN elicited by a change in pitch is modulated by the presence of the phonemic deviants.
