Publications

Displaying 1 - 100 of 154
  • Adank, P., Smits, R., & Van Hout, R. (2003). Modeling perceived vowel height, advancement, and rounding. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 647-650). Adelaide: Causal Productions.
  • Allen, S. E. M. (1998). A discourse-pragmatic explanation for the subject-object asymmetry in early null arguments. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 Conference on Language Acquisition (pp. 10-15). Edinburgh, UK: Edinburgh University Press.

    Abstract

    The present paper assesses discourse-pragmatic factors as a potential explanation for the subject-object assymetry in early child language. It identifies a set of factors which characterize typical situations of informativeness (Greenfield & Smith, 1976), and uses these factors to identify informative arguments in data from four children aged 2;0 through 3;6 learning Inuktitut as a first language. In addition, it assesses the extent of the links between features of informativeness on one hand and lexical vs. null and subject vs. object arguments on the other. Results suggest that a pragmatics account of the subject-object asymmetry can be upheld to a greater extent than previous research indicates, and that several of the factors characterizing informativeness are good indicators of those arguments which tend to be omitted in early child language.
  • Amatuni, A., Schroer, S. E., Zhang, Y., Peters, R. E., Reza, M. A., Crandall, D., & Yu, C. (2021). In-the-moment visual information from the infant's egocentric view determines the success of infant word learning: A computational study. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 265-271). Vienna: Cognitive Science Society.

    Abstract

    Infants learn the meaning of words from accumulated experiences of real-time interactions with their caregivers. To study the effects of visual sensory input on word learning, we recorded infant's view of the world using head-mounted eye trackers during free-flowing play with a caregiver. While playing, infants were exposed to novel label-object mappings and later learning outcomes for these items were tested after the play session. In this study we use a classification based approach to link properties of infants' visual scenes during naturalistic labeling moments to their word learning outcomes. We find that a model which integrates both highly informative and ambiguous sensory evidence is a better fit to infants' individual learning outcomes than models where either type of evidence is taken alone, and that raw labeling frequency is unable to account for the word learning differences we observe. Here we demonstrate how a computational model, using only raw pixels taken from the egocentric scene image, can derive insights on human language learning.
  • Arnhold, A., Vainio, M., Suni, A., & Järvikivi, J. (2010). Intonation of Finnish verbs. Speech Prosody 2010, 100054, 1-4. Retrieved from http://speechprosody2010.illinois.edu/papers/100054.pdf.

    Abstract

    A production experiment investigated the tonal shape of Finnish finite verbs in transitive sentences without narrow focus. Traditional descriptions of Finnish stating that non-focused finite verbs do not receive accents were only partly supported. Verbs were found to have a consistently smaller pitch range than words in other word classes, but their pitch contours were neither flat nor explainable by pure interpolation.
  • Auer, E., Wittenburg, P., Sloetjes, H., Schreer, O., Masneri, S., Schneider, D., & Tschöpel, S. (2010). Automatic annotation of media field recordings. In C. Sporleder, & K. Zervanou (Eds.), Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2010) (pp. 31-34). Lisbon: University de Lisbon. Retrieved from http://ilk.uvt.nl/LaTeCH2010/.

    Abstract

    In the paper we describe a new attempt to come to automatic detectors processing real scene audio-video streams that can be used by researchers world-wide to speed up their annotation and analysis work. Typically these recordings are taken in field and experimental situations mostly with bad quality and only little corpora preventing to use standard stochastic pattern recognition techniques. Audio/video processing components are taken out of the expert lab and are integrated in easy-to-use interactive frameworks so that the researcher can easily start them with modified parameters and can check the usefulness of the created annotations. Finally a variety of detectors may have been used yielding a lattice of annotations. A flexible search engine allows finding combinations of patterns opening completely new analysis and theorization possibilities for the researchers who until were required to do all annotations manually and who did not have any help in pre-segmenting lengthy media recordings.
  • Auer, E., Russel, A., Sloetjes, H., Wittenburg, P., Schreer, O., Masnieri, S., Schneider, D., & Tschöpel, S. (2010). ELAN as flexible annotation framework for sound and image processing detectors. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 890-893). European Language Resources Association (ELRA).

    Abstract

    Annotation of digital recordings in humanities research still is, to a largeextend, a process that is performed manually. This paper describes the firstpattern recognition based software components developed in the AVATecH projectand their integration in the annotation tool ELAN. AVATecH (AdvancingVideo/Audio Technology in Humanities Research) is a project that involves twoMax Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen,Max Planck Institute for Social Anthropology, Halle) and two FraunhoferInstitutes (Fraunhofer-Institut für Intelligente Analyse- undInformationssysteme IAIS, Sankt Augustin, Fraunhofer Heinrich-Hertz-Institute,Berlin) and that aims to develop and implement audio and video technology forsemi-automatic annotation of heterogeneous media collections as they occur inmultimedia based research. The highly diverse nature of the digital recordingsstored in the archives of both Max Planck Institutes, poses a huge challenge tomost of the existing pattern recognition solutions and is a motivation to makesuch technology available to researchers in the humanities.
  • Bardhan, N. P., Aslin, R., & Tanenhaus, M. (2010). Adults' self-directed learning of an artificial lexicon: The dynamics of neighborhood reorganization. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (pp. 364-368). Austin, TX: Cognitive Science Society.
  • Bauer, B. L. M. (2003). The adverbial formation in mente in Vulgar and Late Latin: A problem in grammaticalization. In H. Solin, M. Leiwo, & H. Hallo-aho (Eds.), Latin vulgaire, latin tardif VI (pp. 439-457). Hildesheim: Olms.
  • Bergmann, C., Paulus, M., & Fikkert, J. (2010). A closer look at pronoun comprehension: Comparing different methods. In J. Costa, A. Castro, M. Lobo, & F. Pratas (Eds.), Language Acquisition and Development: Proceedings of GALA 2009 (pp. 53-61). Newcastle upon Tyne: Cambridge Scholars Publishing.

    Abstract

    1. Introduction External input is necessary to acquire language. Consequently, the comprehension of various constituents of language, such as lexical items or syntactic and semantic structures should emerge at the same time as or even precede their production. However, in the case of pronouns this general assumption does not seem to hold. On the contrary, while children at the age of four use pronouns and reflexives appropriately during production (de Villiers, et al. 2006), a number of comprehension studies across different languages found chance performance in pronoun trials up to the age of seven, which co-occurs with a high level of accuracy in reflexive trials (for an overview see e.g. Conroy, et al. 2009; Elbourne 2005).
  • Bergmann, C., Gubian, M., & Boves, L. (2010). Modelling the effect of speaker familiarity and noise on infant word recognition. In Proceedings of the 11th Annual Conference of the International Speech Communication Association [Interspeech 2010] (pp. 2910-2913). ISCA.

    Abstract

    In the present paper we show that a general-purpose word learning model can simulate several important findings from recent experiments in language acquisition. Both the addition of background noise and varying the speaker have been found to influence infants’ performance during word recognition experiments. We were able to replicate this behaviour in our artificial word learning agent. We use the results to discuss both advantages and limitations of computational models of language acquisition.
  • Bodur, K., Branje, S., Peirolo, M., Tiscareno, I., & German, J. S. (2021). Domain-initial strengthening in Turkish: Acoustic cues to prosodic hierarchy in stop consonants. In Proceedings of Interspeech 2021 (pp. 1459-1463). doi:10.21437/Interspeech.2021-2230.

    Abstract

    Studies have shown that cross-linguistically, consonants at the left edge of higher-level prosodic boundaries tend to be more forcefully articulated than those at lower-level boundaries, a phenomenon known as domain-initial strengthening. This study tests whether similar effects occur in Turkish, using the Autosegmental-Metrical model proposed by Ipek & Jun [1, 2] as the basis for assessing boundary strength. Productions of /t/ and /d/ were elicited in four domain-initial prosodic positions corresponding to progressively higher-level boundaries: syllable, word, intermediate phrase, and Intonational Phrase. A fifth position, nuclear word, was included in order to better situate it within the prosodic hierarchy. Acoustic correlates of articulatory strength were measured, including closure duration for /d/ and /t/, as well as voice onset time and burst energy for /t/. Our results show that closure duration increases cumulatively from syllable to intermediate phrase, while voice onset time and burst energy are not influenced by boundary strength. These findings provide corroborating evidence for Ipek & Jun’s model, particularly for the distinction between word and intermediate phrase boundaries. Additionally, articulatory strength at the left edge of the nuclear word patterned closely with word-initial position, supporting the view that the nuclear word is not associated with a distinct phrasing domain
  • Bottini, R., & Casasanto, D. (2010). Implicit spatial length modulates time estimates, but not vice versa. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 1348-1353). Austin, TX: Cognitive Science Society.

    Abstract

    Why do people accommodate to each other’s linguistic behavior? Studies of natural interactions (Giles, Taylor & Bourhis, 1973) suggest that speakers accommodate to achieve interactional goals, influencing what their interlocutor thinks or feels about them. But is this the only reason speakers accommodate? In real-world conversations, interactional motivations are ubiquitous, making it difficult to assess the extent to which they drive accommodation. Do speakers still accommodate even when interactional goals cannot be achieved, for instance, when their interlocutor cannot interpret their accommodation behavior? To find out, we asked participants to enter an immersive virtual reality (VR) environment and to converse with a virtual interlocutor. Participants accommodated to the speech rate of their virtual interlocutor even though he could not interpret their linguistic behavior, and thus accommodation could not possibly help them to achieve interactional goals. Results show that accommodation does not require explicit interactional goals, and suggest other social motivations for accommodation.
  • Broeder, D., Kemps-Snijders, M., Van Uytvanck, D., Windhouwer, M., Withers, P., Wittenburg, P., & Zinn, C. (2010). A data category registry- and component-based metadata framework. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 43-47). European Language Resources Association (ELRA).

    Abstract

    We describe our computer-supported framework to overcome the rule of metadata schism. It combines the use of controlled vocabularies, managed by a data category registry, with a component-based approach, where the categories can be combined to yield complex metadata structures. A metadata scheme devised in this way will thus be grounded in its use of categories. Schema designers will profit from existing prefabricated larger building blocks, motivating re-use at a larger scale. The common base of any two metadata schemes within this framework will solve, at least to a good extent, the semantic interoperability problem, and consequently, further promote systematic use of metadata for existing resources and tools to be shared.
  • Broersma, M. (2010). Dutch listener's perception of Korean fortis, lenis, and aspirated stops: First exposure. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 49-54).
  • Broersma, M. (2010). Korean lenis, fortis, and aspirated stops: Effect of place of articulation on acoustic realization. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan. (pp. 941-944).

    Abstract

    Unlike most of the world's languages, Korean distinguishes three types of voiceless stops, namely lenis, fortis, and aspirated stops. All occur at three places of articulation. In previous work, acoustic measurements are mostly collapsed over the three places of articulation. This study therefore provides acoustic measurements of Korean lenis, fortis, and aspirated stops at all three places of articulation separately. Clear differences are found among the acoustic characteristics of the stops at the different places of articulation
  • Brookshire, G., Casasanto, D., & Ivry, R. (2010). Modulation of motor-meaning congruity effects for valenced words. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (CogSci 2010) (pp. 1940-1945). Austin, TX: Cognitive Science Society.

    Abstract

    We investigated the extent to which emotionally valenced words automatically cue spatio-motor representations. Participants made speeded button presses, moving their hand upward or downward while viewing words with positive or negative valence. Only the color of the words was relevant to the response; on target trials, there was no requirement to read the words or process their meaning. In Experiment 1, upward responses were faster for positive words, and downward for negative words. This effect was extinguished, however, when words were repeated. In Experiment 2, participants performed the same primary task with the addition of distractor trials. Distractors either oriented attention toward the words’ meaning or toward their color. Congruity effects were increased with orientation to meaning, but eliminated with orientation to color. When people read words with emotional valence, vertical spatio-motor representations are activated highly automatically, but this automaticity is modulated by repetition and by attentional orientation to the words’ form or meaning.
  • Brouwer, H., Fitz, H., & Hoeks, J. C. (2010). Modeling the noun phrase versus sentence coordination ambiguity in Dutch: Evidence from Surprisal Theory. In Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics, ACL 2010 (pp. 72-80). Association for Computational Linguistics.

    Abstract

    This paper investigates whether surprisal theory can account for differential processing difficulty in the NP-/S-coordination ambiguity in Dutch. Surprisal is estimated using a Probabilistic Context-Free Grammar (PCFG), which is induced from an automatically annotated corpus. We find that our lexicalized surprisal model can account for the reading time data from a classic experiment on this ambiguity by Frazier (1987). We argue that syntactic and lexical probabilities, as specified in a PCFG, are sufficient to account for what is commonly referred to as an NP-coordination preference.
  • Casasanto, D., & Bottini, R. (2010). Can mirror-reading reverse the flow of time? In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (CogSci 2010) (pp. 1342-1347). Austin, TX: Cognitive Science Society.

    Abstract

    Across cultures, people conceptualize time as if it flows along a horizontal timeline, but the direction of this implicit timeline is culture-specific: in cultures with left-to-right orthography (e.g., English-speaking cultures) time appears to flow rightward, but in cultures with right-to-left orthography (e.g., Arabic-speaking cultures) time flows leftward. Can orthography influence implicit time representations independent of other cultural and linguistic factors? Native Dutch speakers performed a space-time congruity task with the instructions and stimuli written in either standard Dutch or mirror-reversed Dutch. Participants in the Standard Dutch condition were fastest to judge past-oriented phrases by pressing the left button and future-oriented phrases by pressing the right button. Participants in the Mirror-Reversed Dutch condition showed the opposite pattern of reaction times, consistent with results found previously in native Arabic and Hebrew speakers. These results demonstrate a causal role for writing direction in shaping implicit mental representations of time.
  • Casasanto, D., & Bottini, R. (2010). Mirror-reading can reverse the flow of time [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 57). York: University of York.
  • Casasanto, D., & Jasmin, K. (2010). Good and bad in the hands of politicians: Spontaneous gestures during positive and negative speech [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 137). York: University of York.
  • Chen, A. (2003). Language dependence in continuation intonation. In M. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS.) (pp. 1069-1072). Rundle Mall, SA, Austr.: Causal Productions Pty.
  • Chen, A., & Destruel, E. (2010). Intonational encoding of focus in Toulousian French. Speech Prosody 2010, 100233, 1-4. Retrieved from http://speechprosody2010.illinois.edu/papers/100233.pdf.

    Abstract

    Previous studies on focus marking in French have shown that post-focus deaccentuation, phrasing and phonetic cues like peak height and duration are employed to encode narrow focus but tonal patterns appear to be irrelevant. These studies either examined Standard French or did not control for the regional varieties spoken by the speakers. The present study investigated the use of all these cues in expressing narrow focus in naturally spoken declarative sentences in Toulousian French. It was found that similar to Standard French, Toulousian French uses post-focus deaccentuation and phrasing to mark focus. Different from Standard French, Toulousian French does not use the phonetic cues but use tonal patterns to encode focus. Tonal patterns ending with H\% occur more frequently in the VPs when the subject is in focus but tonal patterns ending with L\% occur more frequently in the VPs when the object is in focus. Our study thus provides a first insight into the similarities and differences in focus marking between Toulousian French and Standard French.
  • Chen, A. (2003). Reaction time as an indicator to discrete intonational contrasts in English. In Proceedings of Eurospeech 2003 (pp. 97-100).

    Abstract

    This paper reports a perceptual study using a semantically motivated identification task in which we investigated the nature of two pairs of intonational contrasts in English: (1) normal High accent vs. emphatic High accent; (2) early peak alignment vs. late peak alignment. Unlike previous inquiries, the present study employs an on-line method using the Reaction Time measurement, in addition to the measurement of response frequencies. Regarding the peak height continuum, the mean RTs are shortest for within-category identification but longest for across-category identification. As for the peak alignment contrast, no identification boundary emerges and the mean RTs only reflect a difference between peaks aligned with the vowel onset and peaks aligned elsewhere. We conclude that the peak height contrast is discrete but the previously claimed discreteness of the peak alignment contrast is not borne out.
  • Cho, T. (2003). Lexical stress, phrasal accent and prosodic boundaries in the realization of domain-initial stops in Dutch. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhs 2003) (pp. 2657-2660). Adelaide: Causal Productions.

    Abstract

    This study examines the effects of prosodic boundaries, lexical stress, and phrasal accent on the acoustic realization of stops (/t, d/) in Dutch, with special attention paid to language-specificity in the phonetics-prosody interface. The results obtained from various acoustic measures show systematic phonetic variations in the production of /t d/ as a function of prosodic position, which may be interpreted as being due to prosodicallyconditioned articulatory strengthening. Shorter VOTs were found for the voiceless stop /t/ in prosodically stronger locations (as opposed to longer VOTs in this position in English). The results suggest that prosodically-driven phonetic realization is bounded by a language-specific phonological feature system.
  • Coopmans, C. W., De Hoop, H., Kaushik, K., Hagoort, P., & Martin, A. E. (2021). Structure-(in)dependent interpretation of phrases in humans and LSTMs. In Proceedings of the Society for Computation in Linguistics (SCiL 2021) (pp. 459-463).

    Abstract

    In this study, we compared the performance of a long short-term memory (LSTM) neural network to the behavior of human participants on a language task that requires hierarchically structured knowledge. We show that humans interpret ambiguous noun phrases, such as second blue ball, in line with their hierarchical constituent structure. LSTMs, instead, only do
    so after unambiguous training, and they do not systematically generalize to novel items. Overall, the results of our simulations indicate that a model can behave hierarchically without relying on hierarchical constituent structure.
  • Crago, M. B., Allen, S. E. M., & Pesco, D. (1998). Issues of Complexity in Inuktitut and English Child Directed Speech. In Proceedings of the twenty-ninth Annual Stanford Child Language Research Forum (pp. 37-46).
  • Cutler, A., Murty, L., & Otake, T. (2003). Rhythmic similarity effects in non-native listening? In Proceedings of the 15th International Congress of Phonetic Sciences (PCPhS 2003) (pp. 329-332). Adelaide: Causal Productions.

    Abstract

    Listeners rely on native-language rhythm in segmenting speech; in different languages, stress-, syllable- or mora-based rhythm is exploited. This language-specificity affects listening to non- native speech, if native procedures are applied even though inefficient for the non-native language. However, speakers of two languages with similar rhythmic interpretation should segment their own and the other language similarly. This was observed to date only for related languages (English-Dutch; French-Spanish). We now report experiments in which Japanese listeners heard Telugu, a Dravidian language unrelated to Japanese, and Telugu listeners heard Japanese. In both cases detection of target sequences in speech was harder when target boundaries mismatched mora boundaries, exactly the pattern that Japanese listeners earlier exhibited with Japanese and other languages. These results suggest that Telugu and Japanese listeners use similar procedures in segmenting speech, and support the idea that languages fall into rhythmic classes, with aspects of phonological structure affecting listeners' speech segmentation.
  • Cutler, A., Aslin, R. N., Gervain, J., & Nespor, M. (Eds.). (2021). Special issue in honor of Jacques Mehler, Cognition's founding editor [Special Issue]. Cognition, 213.
  • Cutler, A., & Otake, T. (1998). Assimilation of place in Japanese and Dutch. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: vol. 5 (pp. 1751-1754). Sydney: ICLSP.

    Abstract

    Assimilation of place of articulation across a nasal and a following stop consonant is obligatory in Japanese, but not in Dutch. In four experiments the processing of assimilated forms by speakers of Japanese and Dutch was compared, using a task in which listeners blended pseudo-word pairs such as ranga-serupa. An assimilated blend of this pair would be rampa, an unassimilated blend rangpa. Japanese listeners produced significantly more assimilated than unassimilated forms, both with pseudo-Japanese and pseudo-Dutch materials, while Dutch listeners produced significantly more unassimilated than assimilated forms in each materials set. This suggests that Japanese listeners, whose native-language phonology involves obligatory assimilation constraints, represent the assimilated nasals in nasal-stop sequences as unmarked for place of articulation, while Dutch listeners, who are accustomed to hearing unassimilated forms, represent the same nasal segments as marked for place of articulation.
  • Cutler, A., El Aissati, A., Hanulikova, A., & McQueen, J. M. (2010). Effects on speech parsing of vowelless words in the phonology. In Abstracts of Laboratory Phonology 12 (pp. 115-116).
  • Cutler, A. (1998). How listeners find the right words. In Proceedings of the Sixteenth International Congress on Acoustics: Vol. 2 (pp. 1377-1380). Melville, NY: Acoustical Society of America.

    Abstract

    Languages contain tens of thousands of words, but these are constructed from a tiny handful of phonetic elements. Consequently, words resemble one another, or can be embedded within one another, a coup stick snot with standing. me process of spoken-word recognition by human listeners involves activation of multiple word candidates consistent with the input, and direct competition between activated candidate words. Further, human listeners are sensitive, at an early, prelexical, stage of speeeh processing, to constraints on what could potentially be a word of the language.
  • Cutler, A., & Butterfield, S. (1989). Natural speech cues to word segmentation under difficult listening conditions. In J. Tubach, & J. Mariani (Eds.), Proceedings of Eurospeech 89: European Conference on Speech Communication and Technology: Vol. 2 (pp. 372-375). Edinburgh: CEP Consultants.

    Abstract

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs. Application of heuristic segmentation strategies makes word boundaries before strong syllables easiest for listeners to perceive; but under difficult listening conditions speakers pay more attention to marking word boundaries before weak syllables, i.e. they mark those boundaries which are otherwise particularly hard to perceive.
  • Cutler, A., Treiman, R., & Van Ooijen, B. (1998). Orthografik inkoncistensy ephekts in foneme detektion? In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2783-2786). Sydney: ICSLP.

    Abstract

    The phoneme detection task is widely used in spoken word recognition research. Alphabetically literate participants, however, are more used to explicit representations of letters than of phonemes. The present study explored whether phoneme detection is sensitive to how target phonemes are, or may be, orthographically realised. Listeners detected the target sounds [b,m,t,f,s,k] in word-initial position in sequences of isolated English words. Response times were faster to the targets [b,m,t], which have consistent word-initial spelling, than to the targets [f,s,k], which are inconsistently spelled, but only when listeners’ attention was drawn to spelling by the presence in the experiment of many irregularly spelled fillers. Within the inconsistent targets [f,s,k], there was no significant difference between responses to targets in words with majority and minority spellings. We conclude that performance in the phoneme detection task is not necessarily sensitive to orthographic effects, but that salient orthographic manipulation can induce such sensitivity.
  • Cutler, A., Mitterer, H., Brouwer, S., & Tuinman, A. (2010). Phonological competition in casual speech. In Proceedings of DiSS-LPSS Joint Workshop 2010 (pp. 43-46).
  • Cutler, A. (1998). The recognition of spoken words with variable representations. In D. Duez (Ed.), Proceedings of the ESCA Workshop on Sound Patterns of Spontaneous Speech (pp. 83-92). Aix-en-Provence: Université de Aix-en-Provence.
  • Cutler, A., & Shanley, J. (2010). Validation of a training method for L2 continuous-speech segmentation. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1844-1847).

    Abstract

    Recognising continuous speech in a second language is often unexpectedly difficult, as the operation of segmenting speech is so attuned to native-language structure. We report the initial steps in development of a novel training method for second-language listening, focusing on speech segmentation and employing a task designed for studying this: word-spotting. Listeners detect real words in sequences consisting of a word plus a minimal context. The present validation study shows that learners from varying non-English backgrounds successfully perform a version of this task in English, and display appropriate sensitivity to structural factors that also affect segmentation by native English listeners.
  • Declerck, T., Cunningham, H., Saggion, H., Kuper, J., Reidsma, D., & Wittenburg, P. (2003). MUMIS - Advanced information extraction for multimedia indexing and searching digital media - Processing for multimedia interactive services. 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 553-556.
  • Dolscheid, S., Shayan, S., Ozturk, O., Majid, A., & Casasanto, D. (2010). Language shapes mental representations of musical pitch: Implications for metaphorical language processing [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 137). York: University of York.

    Abstract

    Speakers often use spatial metaphors to talk about musical pitch (e.g., a low note, a high soprano). Previous experiments suggest that English speakers also think about pitches as high or low in space, even when theyʼre not using language or musical notation (Casasanto, 2010). Do metaphors in language merely reflect pre-existing associations between space and pitch, or might language also shape these non-linguistic metaphorical mappings? To investigate the role of language in pitch tepresentation, we conducted a pair of non-linguistic spacepitch interference experiments in speakers of two languages that use different spatial metaphors. Dutch speakers usually describe pitches as ʻhighʼ (hoog) and ʻlowʼ (laag). Farsi speakers, however, often describe high-frequency pitches as ʻthinʼ (naazok) and low-frequency pitches as ʻthickʼ (koloft). Do Dutch and Farsi speakers mentally represent pitch differently? To find out, we asked participants to reproduce musical pitches that they heard in the presence of irrelevant spatial information (i.e., lines that varied either in height or in thickness). For the Height Interference experiment, horizontal lines bisected a vertical reference line at one of nine different locations. For the Thickness Interference experiment, a vertical line appeared in the middle of the screen in one of nine thicknesses. In each experiment, the nine different lines were crossed with nine different pitches ranging from C4 to G#4 in semitone increments, to produce 81 distinct trials. If Dutch and Farsi speakers mentally represent pitch the way they talk about it, using different kinds of spatial representations, they should show contrasting patterns of cross-dimensional interference: Dutch speakersʼ pitch estimates should be more strongly affected by irrelevant height information, and Farsi speakersʼ by irrelevant thickness information. As predicted, Dutch speakersʼ pitch estimates were significantly modulated by spatial height but not by thickness. Conversely, Farsi speakersʼ pitch estimates were modulated by spatial thickness but not by height (2x2 ANOVA on normalized slopes of the effect of space on pitch: F(1,71)=17,15 p<.001). To determine whether language plays a causal role in shaping pitch representations, we conducted a training experiment. Native Dutch speakers learned to use Farsi-like metaphors, describing pitch relationships in terms of thickness (e.g., a cello sounds ʻthickerʼ than a flute). After training, Dutch speakers showed a significant effect of Thickness interference in the non-linguistic pitch reproduction task, similar to native Farsi speakers: on average, pitches accompanied by thicker lines were reproduced as lower in pitch (effect of thickness on pitch: r=-.22, p=.002). By conducting psychophysical tasks, we tested the ʻWhorfianʼ question without using words. Yet, results also inform theories of metaphorical language processing. According to psycholinguistic theories (e.g., Bowdle & Gentner, 2005), highly conventional metaphors are processed without any active mapping from the source to the target domain (e.g., from space to pitch). Our data, however, suggest that when people use verbal metaphors they activate a corresponding non-linguistic mapping from either height or thickness to pitch, strengthening this association at the expense of competing associations. As a result, people who use different metaphors in their native languages form correspondingly different representations of musical pitch. Casasanto, D. (2010). Space for Thinking. In Language, Cognition and Space: State of the art and new directions. V. Evans & P. Chilton (Eds.), 453-478, London: Equinox Publishing. Bowdle, B. & Gentner, D. (2005). The career of metaphor. Psychological Review, 112, 193-216.
  • Drozd, K. F. (1998). No as a determiner in child English: A summary of categorical evidence. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the Gala '97 Conference on Language Acquisition (pp. 34-39). Edinburgh, UK: Edinburgh University Press,.

    Abstract

    This paper summarizes the results of a descriptive syntactic category analysis of child English no which reveals that young children use and represent no as a determiner and negatives like no pen as NPs, contra standard analyses.
  • Drude, S. (2003). Advanced glossing: A language documentation format and its implementation with Shoebox. In Proceedings of the 2002 International Conference on Language Resources and Evaluation (LREC 2002). Paris: ELRA.

    Abstract

    This paper presents Advanced Glossing, a proposal for a general glossing format designed for language documentation, and a specific setup for the Shoebox-program that implements Advanced Glossing to a large extent. Advanced Glossing (AG) goes beyond the traditional Interlinear Morphemic Translation, keeping syntactic and morphological information apart from each other in separate glossing tables. AG provides specific lines for different kinds of annotation – phonetic, phonological, orthographical, prosodic, categorial, structural, relational, and semantic, and it allows for gradual and successive, incomplete, and partial filling in case that some information may be irrelevant, unknown or uncertain. The implementation of AG in Shoebox sets up several databases. Each documented text is represented as a file of syntactic glossings. The morphological glossings are kept in a separate database. As an additional feature interaction with lexical databases is possible. The implementation makes use of the interlinearizing automatism provided by Shoebox, thus obtaining the table format for the alignment of lines in cells, and for semi-automatic filling-in of information in glossing tables which has been extracted from databases
  • Drude, S. (2003). Digitizing and annotating texts and field recordings in the Awetí project. In Proceedings of the EMELD Language Digitization Project Conference 2003. Workshop on Digitizing and Annotating Text and Field Recordings, LSA Institute, Michigan State University, July 11th -13th.

    Abstract

    Digitizing and annotating texts and field recordings Given that several initiatives worldwide currently explore the new field of documentation of endangered languages, the E-MELD project proposes to survey and unite procedures, techniques and results in order to achieve its main goal, ''the formulation and promulgation of best practice in linguistic markup of texts and lexicons''. In this context, this year's workshop deals with the processing of recorded texts. I assume the most valuable contribution I could make to the workshop is to show the procedures and methods used in the Awetí Language Documentation Project. The procedures applied in the Awetí Project are not necessarily representative of all the projects in the DOBES program, and they may very well fall short in several respects of being best practice, but I hope they might provide a good and concrete starting point for comparison, criticism and further discussion. The procedures to be exposed include: * taping with digital devices, * digitizing (preliminarily in the field, later definitely by the TIDEL-team at the Max Planck Institute in Nijmegen), * segmenting and transcribing, using the transcriber computer program, * translating (on paper, or while transcribing), * adding more specific annotation, using the Shoebox program, * converting the annotation to the ELAN-format developed by the TIDEL-team, and doing annotation with ELAN. Focus will be on the different types of annotation. Especially, I will present, justify and discuss Advanced Glossing, a text annotation format developed by H.-H. Lieb and myself designed for language documentation. It will be shown how Advanced Glossing can be applied using the Shoebox program. The Shoebox setup used in the Awetí Project will be shown in greater detail, including lexical databases and semi-automatic interaction between different database types (jumping, interlinearization). ( Freie Universität Berlin and Museu Paraense Emílio Goeldi, with funding from the Volkswagen Foundation.)
  • Duffield, N., & Matsuo, A. (2003). Factoring out the parallelism effect in ellipsis: An interactional approach? In J. Chilar, A. Franklin, D. Keizer, & I. Kimbara (Eds.), Proceedings of the 39th Annual Meeting of the Chicago Linguistic Society (CLS) (pp. 591-603). Chicago: Chicago Linguistics Society.

    Abstract

    Traditionally, there have been three standard assumptions made about the Parallelism Effect on VP-ellipsis, namely that the effect is categorical, that it applies asymmetrically and that it is uniquely due to syntactic factors. Based on the results of a series of experiments involving online and offline tasks, it will be argued that the Parallelism Effect is instead noncategorical and interactional. The factors investigated include construction type, conceptual and morpho-syntactic recoverability, finiteness and anaphor type (to test VP-anaphora). The results show that parallelism is gradient rather than categorical, effects both VP-ellipsis and anaphora, and is influenced by both structural and non-structural factors.
  • Eisner, F., Weber, A., & Melinger, A. (2010). Generalization of learning in pre-lexical adjustments to word-final devoicing [Abstract]. Journal of the Acoustical Society of America, 128, 2323.

    Abstract

    Pre-lexical representations of speech sounds have been to shown to change dynamically through a mechanism of lexically driven learning. [Norris et al. (2003).] Here we investigated whether this type of learning occurs in native British English (BE) listeners for a word-final stop contrast which is commonly de-voiced in Dutch-accented English. Specifically, this study asked whether the change in pre-lexical representation also encodes information about the position of the critical sound within a word. After exposure to a native Dutch speaker's productions of de-voiced stops in word-final position (but not in any other positions), BE listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with voiceless final stops (e.g., [si:t], “seat”) facilitated recognition of visual targets with voiced final stops (e.g., “seed”). This learning generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as [taun] (“town”), facilitated recognition of visual targets like “down”. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The results suggest that under these exposure conditions, word position is not encoded in the pre-lexical adjustment to the accented phoneme contras
  • Evans, N., Levinson, S. C., & Sterelny, K. (Eds.). (2021). Thematic issue on evolution of kinship systems [Special Issue]. Biological theory, 16.
  • Eviatar, Z., & Huettig, F. (Eds.). (2021). Literacy and writing systems [Special Issue]. Journal of Cultural Cognitive Science.
  • Falk, J. J., Zhang, Y., Scheutz, M., & Yu, C. (2021). Parents adaptively use anaphora during parent-child social interaction. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1472-1478). Vienna: Cognitive Science Society.

    Abstract

    Anaphora, a ubiquitous feature of natural language, poses a particular challenge to young children as they first learn language due to its referential ambiguity. In spite of this, parents and caregivers use anaphora frequently in child-directed speech, potentially presenting a risk to effective communication if children do not yet have the linguistic capabilities of resolving anaphora successfully. Through an eye-tracking study in a naturalistic free-play context, we examine the strategies that parents employ to calibrate their use of anaphora to their child's linguistic development level. We show that, in this way, parents are able to intuitively scaffold the complexity of their speech such that greater referential ambiguity does not hurt overall communication success.
  • Fitz, H. (2010). Statistical learning of complex questions. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2692-2698). Austin, TX: Cognitive Science Society.

    Abstract

    The problem of auxiliary fronting in complex polar questions occupies a prominent position within the nature versus nurture controversy in language acquisition. We employ a model of statistical learning which uses sequential and semantic information to produce utterances from a bag of words. This linear learner is capable of generating grammatical questions without exposure to these structures in its training environment. We also demonstrate that the model performs superior to n-gram learners on this task. Implications for nativist theories of language acquisition are discussed.
  • Furman, R., Ozyurek, A., & Küntay, A. C. (2010). Early language-specificity in Turkish children's caused motion event expressions in speech and gesture. In K. Franich, K. M. Iserman, & L. L. Keil (Eds.), Proceedings of the 34th Boston University Conference on Language Development. Volume 1 (pp. 126-137). Somerville, MA: Cascadilla Press.
  • Galke, L., Franke, B., Zielke, T., & Scherp, A. (2021). Lifelong learning of graph neural networks for open-world node classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ: IEEE. doi:10.1109/IJCNN52387.2021.9533412.

    Abstract

    Graph neural networks (GNNs) have emerged as the standard method for numerous tasks on graph-structured data such as node classification. However, real-world graphs are often evolving over time and even new classes may arise. We model these challenges as an instance of lifelong learning, in which a learner faces a sequence of tasks and may take over knowledge acquired in past tasks. Such knowledge may be stored explicitly as historic data or implicitly within model parameters. In this work, we systematically analyze the influence of implicit and explicit knowledge. Therefore, we present an incremental training method for lifelong learning on graphs and introduce a new measure based on k-neighborhood time differences to address variances in the historic data. We apply our training method to five representative GNN architectures and evaluate them on three new lifelong node classification datasets. Our results show that no more than 50% of the GNN's receptive field is necessary to retain at least 95% accuracy compared to training over the complete history of the graph data. Furthermore, our experiments confirm that implicit knowledge becomes more important when fewer explicit knowledge is available.
  • Galke, L., Seidlmayer, E., Lüdemann, G., Langnickel, L., Melnychuk, T., Förstner, K. U., Tochtermann, K., & Schultz, C. (2021). COVID-19++: A citation-aware Covid-19 dataset for the analysis of research dynamics. In Y. Chen, H. Ludwig, Y. Tu, U. Fayyad, X. Zhu, X. Hu, S. Byna, X. Liu, J. Zhang, S. Pan, V. Papalexakis, J. Wang, A. Cuzzocrea, & C. Ordonez (Eds.), Proceedings of the 2021 IEEE International Conference on Big Data (pp. 4350-4355). Piscataway, NJ: IEEE.

    Abstract

    COVID-19 research datasets are crucial for analyzing research dynamics. Most collections of COVID-19 research items do not to include cited works and do not have annotations
    from a controlled vocabulary. Starting with ZB MED KE data on COVID-19, which comprises CORD-19, we assemble a new dataset that includes cited work and MeSH annotations for all records. Furthermore, we conduct experiments on the analysis of research dynamics, in which we investigate predicting links in a co-annotation graph created on the basis of the new dataset. Surprisingly, we find that simple heuristic methods are better at
    predicting future links than more sophisticated approaches such as graph neural networks.
  • Goudbeek, M., & Broersma, M. (2010). The Demo/Kemo corpus: A principled approach to the study of cross-cultural differences in the vocal expression and perception of emotion. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010) (pp. 2211-2215). Paris: ELRA.

    Abstract

    This paper presents the Demo / Kemo corpus of Dutch and Korean emotional speech. The corpus has been specifically developed for the purpose of cross-linguistic comparison, and is more balanced than any similar corpus available so far: a) it contains expressions by both Dutch and Korean actors as well as judgments by both Dutch and Korean listeners; b) the same elicitation technique and recording procedure was used for recordings of both languages; c) the same nonsense sentence, which was constructed to be permissible in both languages, was used for recordings of both languages; and d) the emotions present in the corpus are balanced in terms of valence, arousal, and dominance. The corpus contains a comparatively large number of emotions (eight) uttered by a large number of speakers (eight Dutch and eight Korean). The counterbalanced nature of the corpus will enable a stricter investigation of language-specific versus universal aspects of emotional expression than was possible so far. Furthermore, given the carefully controlled phonetic content of the expressions, it allows for analysis of the role of specific phonetic features in emotional expression in Dutch and Korean.
  • Greenfield, M. D., Honing, H., Kotz, S. A., & Ravignani, A. (Eds.). (2021). Synchrony and rhythm interaction: From the brain to behavioural ecology [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 376.
  • Gubian, M., Bergmann, C., & Boves, L. (2010). Investigating word learning processes in an artificial agent. In Proceedings of the IXth IEEE International Conference on Development and Learning (ICDL). Ann Arbor, MI, 18-21 Aug. 2010 (pp. 178 -184). IEEE.

    Abstract

    Researchers in human language processing and acquisition are making an increasing use of computational models. Computer simulations provide a valuable platform to reproduce hypothesised learning mechanisms that are otherwise very difficult, if not impossible, to verify on human subjects. However, computational models come with problems and risks. It is difficult to (automatically) extract essential information about the developing internal representations from a set of simulation runs, and often researchers limit themselves to analysing learning curves based on empirical recognition accuracy through time. The associated risk is to erroneously deem a specific learning behaviour as generalisable to human learners, while it could also be a mere consequence (artifact) of the implementation of the artificial learner or of the input coding scheme. In this paper a set of simulation runs taken from the ACORNS project is investigated. First a look `inside the box' of the learner is provided by employing novel quantitative methods for analysing changing structures in large data sets. Then, the obtained findings are discussed in the perspective of their ecological validity in the field of child language acquisition.
  • Gullberg, M., & Indefrey, P. (Eds.). (2010). The earliest stages of language learning [Special Issue]. Language Learning, 60(Supplement s2).
  • Hanique, I., Schuppler, B., & Ernestus, M. (2010). Morphological and predictability effects on schwa reduction: The case of Dutch word-initial syllables. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 933-936).

    Abstract

    This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word’s predictability and its morphological structure. Schwa is less reduced in words that are more predictable given the following word. In addition, schwa may be longer if the syllable forms a prefix, and in prefixes the duration of schwa is positively correlated with the frequency of the word relative to its stem. Our results suggest that the conditions which favor reduced realizations are more complex than one would expect on the basis of the current literature.
  • Hanulikova, A., & Weber, A. (2010). Production of English interdental fricatives by Dutch, German, and English speakers. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 173-178). Poznan: Adam Mickiewicz University.

    Abstract

    Non-native (L2) speakers of English often experience difficulties in producing English interdental fricatives (e.g. the voiceless [θ]), and this leads to frequent substitutions of these fricatives (e.g. with [t], [s], and [f]). Differences in the choice of [θ]-substitutions across L2 speakers with different native (L1) language backgrounds have been extensively explored. However, even within one foreign accent, more than one substitution choice occurs, but this has been less systematically studied. Furthermore, little is known about whether the substitutions of voiceless [θ] are phonetically clear instances of [t], [s], and [f], as they are often labelled. In this study, we attempted a phonetic approach to examine language-specific preferences for [θ]-substitutions by carrying out acoustic measurements of L1 and L2 realizations of these sounds. To this end, we collected a corpus of spoken English with L1 speakers (UK-English), and Dutch and German L2 speakers. We show a) that the distribution of differential substitutions using identical materials differs between Dutch and German L2 speakers, b) that [t,s,f]-substitutes differ acoustically from intended [t,s,f], and c) that L2 productions of [θ] are acoustically comparable to L1 productions.
  • Harmon, Z., Barak, L., Shafto, P., Edwards, J., & Feldman, N. H. (2021). Making heads or tails of it: A competition–compensation account of morphological deficits in language impairment. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1872-1878). Vienna: Cognitive Science Society.

    Abstract

    Children with developmental language disorder (DLD) regularly use the base form of verbs (e.g., dance) instead of inflected forms (e.g., danced). We propose an account of this behavior in which children with DLD have difficulty processing novel inflected verbs in their input. This leads the inflected form to face stronger competition from alternatives. Competition is resolved by the production of a more accessible alternative with high semantic overlap with the inflected form: in English, the bare form. We test our account computationally by training a nonparametric Bayesian model that infers the productivity of the inflectional suffix (-ed). We systematically vary the number of novel types of inflected verbs in the input to simulate the input as processed by children with and without DLD. Modeling results are consistent with our hypothesis, suggesting that children’s inconsistent use of inflectional morphemes could stem from inferences they make on the basis of impoverished data.
  • Hintz, F., Voeten, C. C., McQueen, J. M., & Scharenborg, O. (2021). The effects of onset and offset masking on the time course of non-native spoken-word recognition in noise. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 133-139). Vienna: Cognitive Science Society.

    Abstract

    Using the visual-word paradigm, the present study investigated the effects of word onset and offset masking on the time course of non-native spoken-word recognition in the presence of background noise. In two experiments, Dutch non-native listeners heard English target words, preceded by carrier sentences that were noise-free (Experiment 1) or contained intermittent noise (Experiment 2). Target words were either onset- or offset-masked or not masked at all. Results showed that onset masking delayed target word recognition more than offset masking did, suggesting that – similar to natives – non-native listeners strongly rely on word onset information during word recognition in noise.

    Additional information

    Link to Preprint on BioRxiv
  • Janse, E. (2003). Word perception in natural-fast and artificially time-compressed speech. In M. SolÉ, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of the Phonetic Sciences (pp. 3001-3004).
  • Jasmin, K., & Casasanto, D. (2010). Stereotyping: How the QWERTY keyboard shapes the mental lexicon [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (pp. 159). York: University of York.
  • Jesse, A., Reinisch, E., & Nygaard, L. C. (2010). Learning of adjectival word meaning through tone of voice [Abstract]. Journal of the Acoustical Society of America, 128, 2475.

    Abstract

    Speakers express word meaning through systematic but non-canonical acoustic variation of tone of voice (ToV), i.e., variation of speaking rate, pitch, vocal effort, or loudness. Words are, for example, pronounced at a higher pitch when referring to small than to big referents. In the present study, we examined whether listeners can use ToV to learn the meaning of novel adjectives (e.g., “blicket”). During training, participants heard sentences such as “Can you find the blicket one?” spoken with ToV representing hot-cold, strong-weak, and big-small. Participants’ eye movements to two simultaneously shown objects with properties representing the relevant two endpoints (e.g., an elephant and an ant for big-small) were monitored. Assignment of novel adjectives to endpoints was counterbalanced across participants. During test, participants heard the sentences spoken with a neutral ToV, while seeing old or novel picture pairs varying along the same dimensions (e.g., a truck and a car for big-small). Participants had to click on the adjective’s referent. As evident from eye movements, participants did not infer the intended meaning during first exposure, but learned the meaning with the help of ToV during training. At test listeners applied this knowledge to old and novel items even in the absence of informative ToV.
  • Johnson, E. K. (2003). Speaker intent influences infants' segmentation of potentially ambiguous utterances. In Proceedings of the 15th International Congress of Phonetic Sciences (PCPhS 2003) (pp. 1995-1998). Adelaide: Causal Productions.
  • Junge, C., Hagoort, P., Kooijman, V., & Cutler, A. (2010). Brain potentials for word segmentation at seven months predict later language development. In K. Franich, K. M. Iserman, & L. L. Keil (Eds.), Proceedings of the 34th Annual Boston University Conference on Language Development. Volume 1 (pp. 209-220). Somerville, MA: Cascadilla Press.
  • Junge, C., Cutler, A., & Hagoort, P. (2010). Ability to segment words from speech as a precursor of later language development: Insights from electrophysiological responses in the infant brain. In M. Burgess, J. Davey, C. Don, & T. McMinn (Eds.), Proceedings of 20th International Congress on Acoustics, ICA 2010. Incorporating Proceedings of the 2010 annual conference of the Australian Acoustical Society (pp. 3727-3732). Australian Acoustical Society, NSW Division.
  • Karadöller, D. Z., Sumer, B., Ünal, E., & Ozyurek, A. (2021). Spatial language use predicts spatial memory of children: Evidence from sign, speech, and speech-plus-gesture. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 672-678). Vienna: Cognitive Science Society.

    Abstract

    There is a strong relation between children’s exposure to
    spatial terms and their later memory accuracy. In the current
    study, we tested whether the production of spatial terms by
    children themselves predicts memory accuracy and whether
    and how language modality of these encodings modulates
    memory accuracy differently. Hearing child speakers of
    Turkish and deaf child signers of Turkish Sign Language
    described pictures of objects in various spatial relations to each
    other and later tested for their memory accuracy of these
    pictures in a surprise memory task. We found that having
    described the spatial relation between the objects predicted
    better memory accuracy. However, the modality of these
    descriptions in sign, speech, or speech-plus-gesture did not
    reveal differences in memory accuracy. We discuss the
    implications of these findings for the relation between spatial
    language, memory, and the modality of encoding.
  • Kempen, G., & Harbusch, K. (2003). A corpus study into word order variation in German subordinate clauses: Animacy affects linearization independently of function assignment. In Proceedings of AMLaP 2003 (pp. 153-154). Glasgow: Glasgow University.
  • Kempen, G., & Harbusch, K. (1998). A 'tree adjoining' grammar without adjoining: The case of scrambling in German. In Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4).
  • Kemps-Snijders, M., Koller, T., Sloetjes, H., & Verweij, H. (2010). LAT bridge: Bridging tools for annotation and exploration of rich linguistic data. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 2648-2651). European Language Resources Association (ELRA).

    Abstract

    We present a software module, the LAT Bridge, which enables bidirectionalcommunication between the annotation and exploration tools developed at the MaxPlanck Institute for Psycholinguistics as part of our Language ArchivingTechnology (LAT) tool suite. These existing annotation and exploration toolsenable the annotation, enrichment, exploration and archive management oflinguistic resources. The user community has expressed the desire to usedifferent combinations of LAT tools in conjunction with each other. The LATBridge is designed to cater for a number of basic data interaction scenariosbetween the LAT annotation and exploration tools. These interaction scenarios(e.g. bootstrapping a wordlist, searching for annotation examples or lexicalentries) have been identified in collaboration with researchers at ourinstitute.We had to take into account that the LAT tools for annotation and explorationrepresent a heterogeneous application scenario with desktop-installed andweb-based tools. Additionally, the LAT Bridge has to work in situations wherethe Internet is not available or only in an unreliable manner (i.e. with a slowconnection or with frequent interruptions). As a result, the LAT Bridge’sarchitecture supports both online and offline communication between the LATannotation and exploration tools.
  • Khetarpal, N., Majid, A., Malt, B. C., Sloman, S., & Regier, T. (2010). Similarity judgments reflect both language and cross-language tendencies: Evidence from two semantic domains. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 358-363). Austin, TX: Cognitive Science Society.

    Abstract

    Many theories hold that semantic variation in the world’s languages can be explained in terms of a universal conceptual space that is partitioned differently by different languages. Recent work has supported this view in the semantic domain of containers (Malt et al., 1999), and assumed it in the domain of spatial relations (Khetarpal et al., 2009), based in both cases on similarity judgments derived from pile-sorting of stimuli. Here, we reanalyze data from these two studies and find a more complex picture than these earlier studies suggested. In both cases we find that sorting is similar across speakers of different languages (in line with the earlier studies), but nonetheless reflects the sorter’s native language (in contrast with the earlier studies). We conclude that there are cross-culturally shared conceptual tendencies that can be revealed by pile-sorting, but that these tendencies may be modulated to some extent by language. We discuss the implications of these findings for accounts of semantic variation.
  • Kita, S., Ozyurek, A., Allen, S., & Ishizuka, T. (2010). Early links between iconic gestures and sound symbolic words: Evidence for multimodal protolanguage. In A. D. Smith, M. Schouwstra, B. de Boer, & K. Smith (Eds.), Proceedings of the 8th International conference on the Evolution of Language (EVOLANG 8) (pp. 429-430). Singapore: World Scientific.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and cospeech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaningbearing phase.
  • Klein, W., & Winkler, S. (Eds.). (2010). Ambiguität [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 40(158).
  • Klein, W., & Franceschini, R. (Eds.). (2003). Einfache Sprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 131.
  • Klein, W. (Ed.). (1989). Kindersprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (73).
  • Klein, W. (Ed.). (1998). Kaleidoskop [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (112).
  • Klein, W. (Ed.). (1979). Sprache und Kontext [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (33).
  • Koutamanis, E., Kootstra, G. J., Dijkstra, T., & Unsworth., S. (2021). Lexical priming as evidence for language-nonselective access in the simultaneous bilingual child's lexicon. In D. Dionne, & L.-A. Vidal Covas (Eds.), BUCLD 45: Proceedings of the 45th annual Boston University Conference on Language Development (pp. 413-430). Sommerville, MA: Cascadilla Press.
  • Kung, C., Chwilla, D. J., Gussenhoven, C., Bögels, S., & Schriefers, H. (2010). What did you say just now, bitterness or wife? An ERP study on the interaction between tone, intonation and context in Cantonese Chinese. In Proceedings of Speech Prosody 2010 (pp. 1-4).

    Abstract

    Previous studies on Cantonese Chinese showed that rising
    question intonation contours on low-toned words lead to
    frequent misperceptions of the tones. Here we explored the
    processing consequences of this interaction between tone and
    intonation by comparing the processing and identification of
    monosyllabic critical words at the end of questions and
    statements, using a tone identification task, and ERPs as an
    online measure of speech comprehension. Experiment 1
    yielded higher error rates for the identification of low tones at
    the end of questions and a larger N400-P600 pattern, reflecting
    processing difficulty and reanalysis, compared to other
    conditions. In Experiment 2, we investigated the effect of
    immediate lexical context on the tone by intonation interaction.
    Increasing contextual constraints led to a reduction in errors
    and the disappearance of the P600 effect. These results
    indicate that there is an immediate interaction between tone,
    intonation, and context in online speech comprehension. The
    difference in performance and activation patterns between the
    two experiments highlights the significance of context in
    understanding a tone language, like Cantonese-Chinese.
  • Kuzla, C. (2003). Prosodically-conditioned variation in the realization of domain-final stops and voicing assimilation of domain-initial fricatives in German. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2829-2832). Adelaide: Causal Productions.
  • Lai, J., & Poletiek, F. H. (2010). The impact of starting small on the learnability of recursion. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32rd Annual Conference of the Cognitive Science Society (CogSci 2010) (pp. 1387-1392). Austin, TX, USA: Cognitive Science Society.
  • De Lange, F. P., Hagoort, P., & Toni, I. (2003). Differential fronto-parietal contributions to visual and motor imagery. NeuroImage, 19(2), e2094-e2095.

    Abstract

    Mental imagery is a cognitive process crucial to human reasoning. Numerous studies have characterized specific
    instances of this cognitive ability, as evoked by visual imagery (VI) or motor imagery (MI) tasks. However, it
    remains unclear which neural resources are shared between VI and MI, and which are exclusively related to MI.
    To address this issue, we have used fMRI to measure human brain activity during performance of VI and MI
    tasks. Crucially, we have modulated the imagery process by manipulating the degree of mental rotation necessary
    to solve the tasks. We focused our analysis on changes in neural signal as a function of the degree of mental
    rotation in each task.
  • Lecumberri, M. L. G., Cooke, M., & Cutler, A. (Eds.). (2010). Non-native speech perception in adverse conditions [Special Issue]. Speech Communication, 52(11/12).
  • Levelt, C. C., Fikkert, P., & Schiller, N. O. (2003). Metrical priming in speech production. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2481-2485). Adelaide: Causal Productions.

    Abstract

    In this paper we report on four experiments in which we attempted to prime the stress position of Dutch bisyllabic target nouns. These nouns, picture names, had stress on either the first or the second syllable. Auditory prime words had either the same stress as the target or a different stress (e.g., WORtel – MOtor vs. koSTUUM – MOtor; capital letters indicate stressed syllables in prime – target pairs). Furthermore, half of the prime words were semantically related, the other half were unrelated. In none of the experiments a stress priming effect was found. This could mean that stress is not stored in the lexicon. An additional finding was that targets with initial stress had a faster response than targets with a final stress. We hypothesize that bisyllabic words with final stress take longer to be encoded because this stress pattern is irregular with respect to the lexical distribution of bisyllabic stress patterns, even though it can be regular in terms of the metrical stress rules of Dutch.
  • Levinson, S. C. (1979). Pragmatics and social deixis: Reclaiming the notion of conventional implicature. In C. Chiarello (Ed.), Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society (pp. 206-223).
  • Levshina, N., & Moran, S. (Eds.). (2021). Efficiency in human languages: Corpus evidence for universal principles [Special Issue]. Linguistics Vanguard, 7(s3).
  • Mamus, E., Speed, L. J., Ozyurek, A., & Majid, A. (2021). Sensory modality of input influences encoding of motion events in speech but not co-speech gestures. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 376-382). Vienna: Cognitive Science Society.

    Abstract

    Visual and auditory channels have different affordances and
    this is mirrored in what information is available for linguistic
    encoding. The visual channel has high spatial acuity, whereas
    the auditory channel has better temporal acuity. These
    differences may lead to different conceptualizations of events
    and affect multimodal language production. Previous studies of
    motion events typically present visual input to elicit speech and
    gesture. The present study compared events presented as audio-
    only, visual-only, or multimodal (visual+audio) input and
    assessed speech and co-speech gesture for path and manner of
    motion in Turkish. Speakers with audio-only input mentioned
    path more and manner less in verbal descriptions, compared to
    speakers who had visual input. There was no difference in the
    type or frequency of gestures across conditions, and gestures
    were dominated by path-only gestures. This suggests that input
    modality influences speakers’ encoding of path and manner of
    motion events in speech, but not in co-speech gestures.
  • Mazzone, M., & Campisi, E. (2010). Embodiment, metafore, comunicazione. In G. P. Storari, & E. Gola (Eds.), Forme e formalizzazioni. Atti del XVI congresso nazionale. Cagliari: CUEC.
  • Mazzone, M., & Campisi, E. (2010). Are there communicative intentions? In L. A. Pérez Miranda, & A. I. Madariaga (Eds.), Advances in cognitive science. IWCogSc-10. Proceedings of the ILCLI International Workshop on Cognitive Science Workshop on Cognitive Science (pp. 307-322). Bilbao, Spain: The University of the Basque Country.

    Abstract

    Grice in pragmatics and Levelt in psycholinguistics have proposed models of human communication where the starting point of communicative action is an individual intention. This assumption, though, has to face serious objections with regard to the alleged existence of explicit representations of the communicative goals to be pursued. Here evidence is surveyed which shows that in fact speaking may ordinarily be a quite automatic activity prompted by contextual cues and driven by behavioural schemata abstracted away from social regularities. On the one hand, this means that there could exist no intentions in the sense of explicit representations of communicative goals, following from deliberate reasoning and triggering the communicative action. On the other hand, however, there are reasons to allow for a weaker notion of intention than this, according to which communication is an intentional affair, after all. Communicative action is said to be intentional in this weaker sense to the extent that it is subject to a double mechanism of control, with respect both to present-directed and future-directed intentions.
  • McQueen, J. M., & Cho, T. (2003). The use of domain-initial strengthening in segmentation of continuous English speech. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2993-2996). Adelaide: Causal Productions.
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • Meeuwissen, M., Roelofs, A., & Levelt, W. J. M. (2003). Naming analog clocks conceptually facilitates naming digital clocks. In Proceedings of XIII Conference of the European Society of Cognitive Psychology (ESCOP 2003) (pp. 271-271).
  • Merkx, D., & Frank, S. L. (2021). Human sentence processing: Recurrence or attention? In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) (pp. 12-22). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL). doi:10.18653/v1/2021.cmcl-1.2.

    Abstract

    Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models’ ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and provides evidence for cue-based retrieval.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can just as important.
  • Moscoso del Prado Martín, F., & Baayen, R. H. (2003). Using the structure found in time: Building real-scale orthographic and phonetic representations by accumulation of expectations. In H. Bowman, & C. Labiouse (Eds.), Connectionist Models of Cognition, Perception and Emotion: Proceedings of the Eighth Neural Computation and Psychology Workshop (pp. 263-272). Singapore: World Scientific.
  • Mudd, K., Lutzenberger, H., De Vos, C., & De Boer, B. (2021). Social structure and lexical uniformity: A case study of gender differences in the Kata Kolok community. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2692-2698). Vienna: Cognitive Science Society.

    Abstract

    Language emergence is characterized by a high degree of lex-
    ical variation. It has been suggested that the speed at which
    lexical conventionalization occurs depends partially on social
    structure. In large communities, individuals receive input from
    many sources, creating a pressure for lexical convergence.
    In small, insular communities, individuals can remember id-
    iolects and share common ground with interlocuters, allow-
    ing these communities to retain a high degree of lexical vari-
    ation. We look at lexical variation in Kata Kolok, a sign lan-
    guage which emerged six generations ago in a Balinese vil-
    lage, where women tend to have more tightly-knit social net-
    works than men. We test if there are differing degrees of lexical
    uniformity between women and men by reanalyzing a picture
    description task in Kata Kolok. We find that women’s produc-
    tions exhibit less lexical uniformity than men’s. One possible
    explanation of this finding is that women’s more tightly-knit
    social networks allow for remembering idiolects, alleviating
    the pressure for lexical convergence, but social network data
    from the Kata Kolok community is needed to support this ex-
    planation.
  • Munro, R., Bethard, S., Kuperman, V., Lai, V. T., Melnick, R., Potts, C., Schnoebelen, T., & Tily, H. (2010). Crowdsourcing and language studies: The new generation of linguistic data. In Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. Proceedings of the Workshop (pp. 122-130). Stroudsburg, PA: Association for Computational Linguistics.
  • Oostdijk, N., & Broeder, D. (2003). The Spoken Dutch Corpus and its exploitation environment. In A. Abeille, S. Hansen-Schirra, & H. Uszkoreit (Eds.), Proceedings of the 4th International Workshop on linguistically interpreted corpora (LINC-03) (pp. 93-101).
  • Otake, T., McQueen, J. M., & Cutler, A. (2010). Competition in the perception of spoken Japanese words. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 114-117).

    Abstract

    Japanese listeners detected Japanese words embedded at the end of nonsense sequences (e.g., kaba 'hippopotamus' in gyachikaba). When the final portion of the preceding context together with the initial portion of the word (e.g., here, the sequence chika) was compatible with many lexical competitors, recognition of the embedded word was more difficult than when such a sequence was compatible with few competitors. This clear effect of competition, established here for preceding context in Japanese, joins similar demonstrations, in other languages and for following contexts, to underline that the functional architecture of the human spoken-word recognition system is a universal one.
  • Ouni, S., Cohen, M. M., Young, K., & Jesse, A. (2003). Internationalization of a talking head. In M. Sole, D. Recasens, & J. Romero (Eds.), Proceedings of 15th International Congress of Phonetics Sciences (pp. 2569-2572). Barcelona: Casual Productions.

    Abstract

    In this paper we describe a general scheme for internationalization of our talking head, Baldi, to speak other languages. We describe the modular structure of the auditory/visual synthesis software. As an example, we have created a synthetic Arabic talker, which is evaluated using a noisy word recognition task comparing this talker with a natural one.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.

Share this page