Publications

  • Alday, P. M. (2016). Towards a rigorous motivation for Ziph's law. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/178.html.

    Abstract

    Language evolution can be viewed from two viewpoints: the development of a communicative system and the biological adaptations necessary for producing and perceiving said system. The communicative-system vantage point has enjoyed a wealth of mathematical models based on simple distributional properties of language, often formulated as empirical laws. However, beyond vague psychological notions of “least effort”, no principled explanation has been proposed for the existence and success of such laws. Meanwhile, psychological and neurobiological models have focused largely on the computational constraints presented by incremental, real-time processing. In the following, we show that information-theoretic entropy underpins successful models of both types and provides a more principled motivation for Zipf’s Law.
  • Alhama, R. G., & Zuidema, W. (2016). Generalization in Artificial Language Learning: Modelling the Propensity to Generalize. In Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning (pp. 64-72). Association for Computational Linguistics. doi:10.18653/v1/W16-1909.

    Abstract

    Experiments in Artificial Language Learning have revealed much about the cognitive mechanisms underlying sequence and language learning in human adults, in infants and in non-human animals. This paper focuses on their ability to generalize to novel grammatical instances (i.e., instances consistent with a familiarization pattern). Notably, the propensity to generalize appears to be negatively correlated with the amount of exposure to the artificial language, a fact that has been claimed to be contrary to the predictions of statistical models (Peña et al. (2002); Endress and Bonatti (2007)). In this paper, we propose to model generalization as a three-step process, and we demonstrate that the use of statistical models for the first two steps, contrary to widespread intuitions in the ALL-field, can explain the observed decrease of the propensity to generalize with exposure time.
  • Alhama, R. G., & Zuidema, W. (2016). Pre-Wiring and Pre-Training: What does a neural network need to learn truly general identity rules? In T. R. Besold, A. Bordes, & A. D'Avila Garcez (Eds.), CoCo 2016 Cognitive Computation: Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016. CEUR Workshop Proceedings.

    Abstract

    In an influential paper, Marcus et al. [1999] claimed that connectionist models cannot account for human success at learning tasks that involved generalization of abstract knowledge such as grammatical rules. This claim triggered a heated debate, centered mostly around variants of the Simple Recurrent Network model [Elman, 1990]. In our work, we revisit this unresolved debate and analyze the underlying issues from a different perspective. We argue that, in order to simulate human-like learning of grammatical rules, a neural network model should not be used as a tabula rasa, but rather, the initial wiring of the neural connections and the experience acquired prior to the actual task should be incorporated into the model. We present two methods that aim to provide such initial state: a manipulation of the initial connections of the network in a cognitively plausible manner (concretely, by implementing a “delay-line” memory), and a pre-training algorithm that incrementally challenges the network with novel stimuli. We implement such techniques in an Echo State Network [Jaeger, 2001], and we show that only when combining both techniques the ESN is able to learn truly general identity rules.
  • Alhama, R. G., Siegelman, N., Frost, R., & Armstrong, B. C. (2019). The role of information in visual word recognition: A perceptually-constrained connectionist account. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 83-89). Austin, TX: Cognitive Science Society.

    Abstract

    Proficient readers typically fixate near the center of a word, with a slight bias towards word onset. We explore a novel account of this phenomenon based on combining information-theory with visual perceptual constraints in a connectionist model of visual word recognition. This account posits that the amount of information-content available for word identification varies across fixation locations and across languages, thereby explaining the overall fixation location bias in different languages, making the novel prediction that certain words are more readily identified when fixating at an atypical fixation location, and predicting specific cross-linguistic differences. We tested these predictions across several simulations in English and Hebrew, and in a pilot behavioral experiment. Results confirmed that the bias to fixate closer to word onset aligns with maximizing information in the visual signal, that some words are more readily identified at atypical fixation locations, and that these effects vary to some degree across languages.
  • Allerhand, M., Butterfield, S., Cutler, A., & Patterson, R. (1992). Assessing syllable strength via an auditory model. In Proceedings of the Institute of Acoustics: Vol. 14 Part 6 (pp. 297-304). St. Albans, Herts: Institute of Acoustics.
  • Arnhold, A., Vainio, M., Suni, A., & Järvikivi, J. (2010). Intonation of Finnish verbs. Speech Prosody 2010, 100054, 1-4. Retrieved from http://speechprosody2010.illinois.edu/papers/100054.pdf.

    Abstract

    A production experiment investigated the tonal shape of Finnish finite verbs in transitive sentences without narrow focus. Traditional descriptions of Finnish stating that non-focused finite verbs do not receive accents were only partly supported. Verbs were found to have a consistently smaller pitch range than words in other word classes, but their pitch contours were neither flat nor explainable by pure interpolation.
  • Auer, E., Wittenburg, P., Sloetjes, H., Schreer, O., Masneri, S., Schneider, D., & Tschöpel, S. (2010). Automatic annotation of media field recordings. In C. Sporleder, & K. Zervanou (Eds.), Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2010) (pp. 31-34). Lisbon: University de Lisbon. Retrieved from http://ilk.uvt.nl/LaTeCH2010/.

    Abstract

    In the paper we describe a new attempt to come to automatic detectors processing real scene audio-video streams that can be used by researchers world-wide to speed up their annotation and analysis work. Typically these recordings are taken in field and experimental situations mostly with bad quality and only little corpora, preventing the use of standard stochastic pattern recognition techniques. Audio/video processing components are taken out of the expert lab and are integrated in easy-to-use interactive frameworks so that the researcher can easily start them with modified parameters and can check the usefulness of the created annotations. Finally, a variety of detectors may have been used yielding a lattice of annotations. A flexible search engine allows finding combinations of patterns opening completely new analysis and theorization possibilities for the researchers who until now were required to do all annotations manually and who did not have any help in pre-segmenting lengthy media recordings.
  • Auer, E., Russel, A., Sloetjes, H., Wittenburg, P., Schreer, O., Masnieri, S., Schneider, D., & Tschöpel, S. (2010). ELAN as flexible annotation framework for sound and image processing detectors. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 890-893). European Language Resources Association (ELRA).

    Abstract

    Annotation of digital recordings in humanities research still is, to a large extent, a process that is performed manually. This paper describes the first pattern recognition based software components developed in the AVATecH project and their integration in the annotation tool ELAN. AVATecH (Advancing Video/Audio Technology in Humanities Research) is a project that involves two Max Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen, Max Planck Institute for Social Anthropology, Halle) and two Fraunhofer Institutes (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS, Sankt Augustin, Fraunhofer Heinrich-Hertz-Institute, Berlin) and that aims to develop and implement audio and video technology for semi-automatic annotation of heterogeneous media collections as they occur in multimedia based research. The highly diverse nature of the digital recordings stored in the archives of both Max Planck Institutes poses a huge challenge to most of the existing pattern recognition solutions and is a motivation to make such technology available to researchers in the humanities.
  • Azar, Z., Backus, A., & Ozyurek, A. (2016). Pragmatic relativity: Gender and context affect the use of personal pronouns in discourse differentially across languages. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1295-1300). Austin, TX: Cognitive Science Society.

    Abstract

    Speakers use differential referring expressions in pragmatically appropriate ways to produce coherent narratives. Languages, however, differ in a) whether REs as arguments can be dropped and b) whether personal pronouns encode gender. We examine two languages that differ from each other in these two aspects and ask whether the co-reference context and the gender encoding options affect the use of REs differentially. We elicited narratives from Dutch and Turkish speakers about two types of three-person events, one including people of the same and the other of mixed-gender. Speakers re-introduced referents into the discourse with fuller forms (NPs) and maintained them with reduced forms (overt or null pronoun). Turkish speakers used pronouns mainly to mark emphasis and only Dutch speakers used pronouns differentially across the two types of videos. We argue that linguistic possibilities available in languages tune speakers into taking different principles into account to produce pragmatically coherent narratives.
  • Badimala, P., Mishra, C., Venkataramana, R. K. M., Bukhari, S. S., & Dengel, A. (2019). A Study of Various Text Augmentation Techniques for Relation Classification in Free Text. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (pp. 360-367). Setúbal, Portugal: SciTePress Digital Library. doi:10.5220/0007311003600367.

    Abstract

    Data augmentation techniques have been widely used in visual recognition tasks as it is easy to generate new data by simple and straightforward image transformations. However, when it comes to text data augmentations, it is difficult to find appropriate transformation techniques which also preserve the contextual and grammatical structure of language texts. In this paper, we explore various text data augmentation techniques in text space and word embedding space. We study the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.
  • Bardhan, N. P., Aslin, R., & Tanenhaus, M. (2010). Adults' self-directed learning of an artificial lexicon: The dynamics of neighborhood reorganization. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (pp. 364-368). Austin, TX: Cognitive Science Society.
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Listening with great expectations: An investigation of word form anticipations in naturalistic speech. In Proceedings of Interspeech 2019 (pp. 2265-2269). doi:10.21437/Interspeech.2019-2741.

    Abstract

    The event-related potential (ERP) component named phonological mismatch negativity (PMN) arises when listeners hear an unexpected word form in a spoken sentence [1]. The PMN is thought to reflect the mismatch between expected and perceived auditory speech input. In this paper, we use the PMN to test a central premise in the predictive coding framework [2], namely that the mismatch between prior expectations and sensory input is an important mechanism of perception. We test this with natural speech materials containing approximately 50,000 word tokens. The corresponding EEG-signal was recorded while participants (n = 48) listened to these materials. Following [3], we quantify the mismatch with two word probability distributions (WPD): a WPD based on preceding context, and a WPD that is additionally updated based on the incoming audio of the current word. We use the between-WPD cross entropy for each word in the utterances and show that a higher cross entropy correlates with a more negative PMN. Our results show that listeners anticipate auditory input while processing each word in naturalistic speech. Moreover, complementing previous research, we show that predictive language processing occurs across the whole probability spectrum.
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Quantifying expectation modulation in human speech processing. In Proceedings of Interspeech 2019 (pp. 2270-2274). doi:10.21437/Interspeech.2019-2685.

    Abstract

    The mismatch between top-down predicted and bottom-up perceptual input is an important mechanism of perception according to the predictive coding framework (Friston, [1]). In this paper we develop and validate a new information-theoretic measure that quantifies the mismatch between expected and observed auditory input during speech processing. We argue that such a mismatch measure is useful for the study of speech processing. To compute the mismatch measure, we use naturalistic speech materials containing approximately 50,000 word tokens. For each word token we first estimate the prior word probability distribution with the aid of statistical language modelling, and next use automatic speech recognition to update this word probability distribution based on the unfolding speech signal. We validate the mismatch measure with multiple analyses, and show that the auditory-based update improves the probability of the correct word and lowers the uncertainty of the word probability distribution. Based on these results, we argue that it is possible to explicitly estimate the mismatch between predicted and perceived speech input with the cross entropy between word expectations computed before and after an auditory update.
  • Bergmann, C., Cristia, A., & Dupoux, E. (2016). Discriminability of sound contrasts in the face of speaker variation quantified. In Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 1331-1336). Austin, TX: Cognitive Science Society.

    Abstract

    How does a naive language learner deal with speaker variation irrelevant to distinguishing word meanings? Experimental data is contradictory, and incompatible models have been proposed. Here, we examine basic assumptions regarding the acoustic signal the learner deals with: Is speaker variability a hurdle in discriminating sounds or can it easily be ignored? To this end, we summarize existing infant data. We then present machine-based discriminability scores of sound pairs obtained without any language knowledge. Our results show that speaker variability decreases sound contrast discriminability, and that some contrasts are affected more than others. However, chance performance is rare; most contrasts remain discriminable in the face of speaker variation. We take our results to mean that speaker variation is not a uniform hurdle to discriminating sound contrasts, and careful examination is necessary when planning and interpreting studies testing whether and to what extent infants (and adults) are sensitive to speaker differences.

    Additional information

    Scripts and data
  • Bergmann, C., Paulus, M., & Fikkert, J. (2010). A closer look at pronoun comprehension: Comparing different methods. In J. Costa, A. Castro, M. Lobo, & F. Pratas (Eds.), Language Acquisition and Development: Proceedings of GALA 2009 (pp. 53-61). Newcastle upon Tyne: Cambridge Scholars Publishing.

    Abstract

    External input is necessary to acquire language. Consequently, the comprehension of various constituents of language, such as lexical items or syntactic and semantic structures should emerge at the same time as or even precede their production. However, in the case of pronouns this general assumption does not seem to hold. On the contrary, while children at the age of four use pronouns and reflexives appropriately during production (de Villiers, et al. 2006), a number of comprehension studies across different languages found chance performance in pronoun trials up to the age of seven, which co-occurs with a high level of accuracy in reflexive trials (for an overview see e.g. Conroy, et al. 2009; Elbourne 2005).
  • Bergmann, C., Gubian, M., & Boves, L. (2010). Modelling the effect of speaker familiarity and noise on infant word recognition. In Proceedings of the 11th Annual Conference of the International Speech Communication Association [Interspeech 2010] (pp. 2910-2913). ISCA.

    Abstract

    In the present paper we show that a general-purpose word learning model can simulate several important findings from recent experiments in language acquisition. Both the addition of background noise and varying the speaker have been found to influence infants’ performance during word recognition experiments. We were able to replicate this behaviour in our artificial word learning agent. We use the results to discuss both advantages and limitations of computational models of language acquisition.
  • Bohnemeyer, J. (2004). Argument and event structure in Yukatek verb classes. In J.-Y. Kim, & A. Werle (Eds.), Proceedings of The Semantics of Under-Represented Languages in the Americas. Amherst, Mass: GLSA.

    Abstract

    In Yukatek Maya, event types are lexicalized in verb roots and stems that fall into a number of different form classes on the basis of (a) patterns of aspect-mood marking and (b) privileges of undergoing valence-changing operations. Of particular interest are the intransitive classes in the light of Perlmutter’s (1978) Unaccusativity hypothesis. In the spirit of Levin & Rappaport Hovav (1995) [L&RH], Van Valin (1990), Zaenen (1993), and others, this paper investigates whether (and to what extent) the association between formal predicate classes and event types is determined by argument structure features such as ‘agentivity’ and ‘control’ or features of lexical aspect such as ‘telicity’ and ‘durativity’. It is shown that mismatches between agentivity/control and telicity/durativity are even more extensive in Yukatek than they are in English (Abusch 1985; L&RH, Van Valin & LaPolla 1997), providing new evidence against Dowty’s (1979) reconstruction of Vendler’s (1967) ‘time schemata of verbs’ in terms of argument structure configurations. Moreover, contrary to what has been claimed in earlier studies of Yukatek (Krämer & Wunderlich 1999, Lucy 1994), neither agentivity/control nor telicity/durativity turn out to be good predictors of verb class membership. Instead, the patterns of aspect-mood marking prove to be sensitive only to the presence or absence of state change, in a way that supports the unified analysis of all verbs of gradual change proposed by Kennedy & Levin (2001). The presence or absence of ‘internal causation’ (L&RH) may motivate the semantic interpretation of transitivization operations. An explicit semantics for the valence-changing operations is proposed, based on Parsons’s (1990) Neo-Davidsonian approach.
  • Bosker, H. R., Reinisch, E., & Sjerps, M. J. (2016). Listening under cognitive load makes speech sound fast. In H. van den Heuvel, B. Cranen, & S. Mattys (Eds.), Proceedings of the Speech Processing in Realistic Environments [SPIRE] Workshop (pp. 23-24). Groningen.
  • Bosker, H. R. (2016). Our own speech rate influences speech perception. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 227-231).

    Abstract

    During conversation, spoken utterances occur in rich acoustic contexts, including speech produced by our interlocutor(s) and speech we produced ourselves. Prosodic characteristics of the acoustic context have been known to influence speech perception in a contrastive fashion: for instance, a vowel presented in a fast context is perceived to have a longer duration than the same vowel in a slow context. Given the ubiquity of the sound of our own voice, it may be that our own speech rate - a common source of acoustic context - also influences our perception of the speech of others. Two experiments were designed to test this hypothesis. Experiment 1 replicated earlier contextual rate effects by showing that hearing pre-recorded fast or slow context sentences alters the perception of ambiguous Dutch target words. Experiment 2 then extended this finding by showing that talking at a fast or slow rate prior to the presentation of the target words also altered the perception of those words. These results suggest that between-talker variation in speech rate production may induce between-talker variation in speech perception, thus potentially explaining why interlocutors tend to converge on speech rate in dialogue settings.

    Additional information

    pdf via conference website
  • Bottini, R., & Casasanto, D. (2010). Implicit spatial length modulates time estimates, but not vice versa. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 1348-1353). Austin, TX: Cognitive Science Society.

    Abstract

    Why do people accommodate to each other’s linguistic behavior? Studies of natural interactions (Giles, Taylor & Bourhis, 1973) suggest that speakers accommodate to achieve interactional goals, influencing what their interlocutor thinks or feels about them. But is this the only reason speakers accommodate? In real-world conversations, interactional motivations are ubiquitous, making it difficult to assess the extent to which they drive accommodation. Do speakers still accommodate even when interactional goals cannot be achieved, for instance, when their interlocutor cannot interpret their accommodation behavior? To find out, we asked participants to enter an immersive virtual reality (VR) environment and to converse with a virtual interlocutor. Participants accommodated to the speech rate of their virtual interlocutor even though he could not interpret their linguistic behavior, and thus accommodation could not possibly help them to achieve interactional goals. Results show that accommodation does not require explicit interactional goals, and suggest other social motivations for accommodation.
  • Brehm, L., Jackson, C. N., & Miller, K. L. (2019). Incremental interpretation in the first and second language. In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 109-122). Somerville, MA: Cascadilla Press.
  • Broeder, D., Brugman, H., Oostdijk, N., & Wittenburg, P. (2004). Towards Dynamic Corpora: Workshop on compiling and processing spoken corpora. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp. 59-62). Paris: European Language Resources Association.
  • Broeder, D., Wittenburg, P., & Crasborn, O. (2004). Using Profiles for IMDI Metadata Creation. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp. 1317-1320). Paris: European Language Resources Association.
  • Broeder, D., Kemps-Snijders, M., Van Uytvanck, D., Windhouwer, M., Withers, P., Wittenburg, P., & Zinn, C. (2010). A data category registry- and component-based metadata framework. In N. Calzolari, B. Maegaard, J. Mariani, J. Odjik, K. Choukri, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 43-47). European Language Resources Association (ELRA).

    Abstract

    We describe our computer-supported framework to overcome the rule of metadata schism. It combines the use of controlled vocabularies, managed by a data category registry, with a component-based approach, where the categories can be combined to yield complex metadata structures. A metadata scheme devised in this way will thus be grounded in its use of categories. Schema designers will profit from existing prefabricated larger building blocks, motivating re-use at a larger scale. The common base of any two metadata schemes within this framework will solve, at least to a good extent, the semantic interoperability problem, and consequently, further promote systematic use of metadata for existing resources and tools to be shared.
  • Broeder, D., Declerck, T., Romary, L., Uneson, M., Strömqvist, S., & Wittenburg, P. (2004). A large metadata domain of language resources. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp. 369-372). Paris: European Language Resources Association.
  • Broeder, D., Nava, M., & Declerck, T. (2004). INTERA - a Distributed Domain of Metadata Resources. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Spoken Language Resources and Evaluation (LREC 2004) (pp. 369-372). Paris: European Language Resources Association.
  • Broersma, M. (2010). Dutch listener's perception of Korean fortis, lenis, and aspirated stops: First exposure. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 49-54).
  • Broersma, M., & Kolkman, K. M. (2004). Lexical representation of non-native phonemes. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 1241-1244). Seoul: Sunjijn Printing Co.
  • Broersma, M. (2010). Korean lenis, fortis, and aspirated stops: Effect of place of articulation on acoustic realization. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 941-944).

    Abstract

    Unlike most of the world's languages, Korean distinguishes three types of voiceless stops, namely lenis, fortis, and aspirated stops. All occur at three places of articulation. In previous work, acoustic measurements are mostly collapsed over the three places of articulation. This study therefore provides acoustic measurements of Korean lenis, fortis, and aspirated stops at all three places of articulation separately. Clear differences are found among the acoustic characteristics of the stops at the different places of articulation.
  • Brookshire, G., Casasanto, D., & Ivry, R. (2010). Modulation of motor-meaning congruity effects for valenced words. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (CogSci 2010) (pp. 1940-1945). Austin, TX: Cognitive Science Society.

    Abstract

    We investigated the extent to which emotionally valenced words automatically cue spatio-motor representations. Participants made speeded button presses, moving their hand upward or downward while viewing words with positive or negative valence. Only the color of the words was relevant to the response; on target trials, there was no requirement to read the words or process their meaning. In Experiment 1, upward responses were faster for positive words, and downward for negative words. This effect was extinguished, however, when words were repeated. In Experiment 2, participants performed the same primary task with the addition of distractor trials. Distractors either oriented attention toward the words’ meaning or toward their color. Congruity effects were increased with orientation to meaning, but eliminated with orientation to color. When people read words with emotional valence, vertical spatio-motor representations are activated highly automatically, but this automaticity is modulated by repetition and by attentional orientation to the words’ form or meaning.
  • Brouwer, H., Fitz, H., & Hoeks, J. C. (2010). Modeling the noun phrase versus sentence coordination ambiguity in Dutch: Evidence from Surprisal Theory. In Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics, ACL 2010 (pp. 72-80). Association for Computational Linguistics.

    Abstract

    This paper investigates whether surprisal theory can account for differential processing difficulty in the NP-/S-coordination ambiguity in Dutch. Surprisal is estimated using a Probabilistic Context-Free Grammar (PCFG), which is induced from an automatically annotated corpus. We find that our lexicalized surprisal model can account for the reading time data from a classic experiment on this ambiguity by Frazier (1987). We argue that syntactic and lexical probabilities, as specified in a PCFG, are sufficient to account for what is commonly referred to as an NP-coordination preference.
  • Bruggeman, L., & Cutler, A. (2019). The dynamics of lexical activation and competition in bilinguals’ first versus second language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1342-1346). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Speech input causes listeners to activate multiple candidate words which then compete with one another. These include onset competitors, that share a beginning (bumper, butter), but also, counterintuitively, rhyme competitors, sharing an ending (bumper, jumper). In L1, competition is typically stronger for onset than for rhyme. In L2, onset competition has been attested but rhyme competition has heretofore remained largely unexamined. We assessed L1 (Dutch) and L2 (English) word recognition by the same late-bilingual individuals. In each language, eye gaze was recorded as listeners heard sentences and viewed sets of drawings: three unrelated, one depicting an onset or rhyme competitor of a word in the input. Activation patterns revealed substantial onset competition but no significant rhyme competition in either L1 or L2. Rhyme competition may thus be a “luxury” feature of maximally efficient listening, to be abandoned when resources are scarcer, as in listening by late bilinguals, in either language.
  • Bruggeman, L., & Cutler, A. (2016). Lexical manipulation as a discovery tool for psycholinguistic research. In C. Carignan, & M. D. Tyler (Eds.), Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016) (pp. 313-316).
  • Brugman, H., Crasborn, O., & Russel, A. (2004). Collaborative annotation of sign language data with Peer-to-Peer technology. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Language Evaluation (LREC 2004) (pp. 213-216). Paris: European Language Resources Association.
  • Brugman, H., & Russel, A. (2004). Annotating Multi-media/Multi-modal resources with ELAN. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Language Evaluation (LREC 2004) (pp. 2065-2068). Paris: European Language Resources Association.
  • Burenhult, N. (2004). Spatial deixis in Jahai. In S. Burusphat (Ed.), Papers from the 11th Annual Meeting of the Southeast Asian Linguistics Society 2001 (pp. 87-100). Arizona State University: Program for Southeast Asian Studies.
  • Casasanto, D., & Bottini, R. (2010). Can mirror-reading reverse the flow of time? In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (CogSci 2010) (pp. 1342-1347). Austin, TX: Cognitive Science Society.

    Abstract

    Across cultures, people conceptualize time as if it flows along a horizontal timeline, but the direction of this implicit timeline is culture-specific: in cultures with left-to-right orthography (e.g., English-speaking cultures) time appears to flow rightward, but in cultures with right-to-left orthography (e.g., Arabic-speaking cultures) time flows leftward. Can orthography influence implicit time representations independent of other cultural and linguistic factors? Native Dutch speakers performed a space-time congruity task with the instructions and stimuli written in either standard Dutch or mirror-reversed Dutch. Participants in the Standard Dutch condition were fastest to judge past-oriented phrases by pressing the left button and future-oriented phrases by pressing the right button. Participants in the Mirror-Reversed Dutch condition showed the opposite pattern of reaction times, consistent with results found previously in native Arabic and Hebrew speakers. These results demonstrate a causal role for writing direction in shaping implicit mental representations of time.
  • Casasanto, D., & Bottini, R. (2010). Mirror-reading can reverse the flow of time [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 57). York: University of York.
  • Casasanto, D., & Jasmin, K. (2010). Good and bad in the hands of politicians: Spontaneous gestures during positive and negative speech [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 137). York: University of York.
  • Chen, A., & Destruel, E. (2010). Intonational encoding of focus in Toulousian French. Speech Prosody 2010, 100233, 1-4. Retrieved from http://speechprosody2010.illinois.edu/papers/100233.pdf.

    Abstract

    Previous studies on focus marking in French have shown that post-focus deaccentuation, phrasing and phonetic cues like peak height and duration are employed to encode narrow focus but tonal patterns appear to be irrelevant. These studies either examined Standard French or did not control for the regional varieties spoken by the speakers. The present study investigated the use of all these cues in expressing narrow focus in naturally spoken declarative sentences in Toulousian French. It was found that similar to Standard French, Toulousian French uses post-focus deaccentuation and phrasing to mark focus. Different from Standard French, Toulousian French does not use the phonetic cues but uses tonal patterns to encode focus. Tonal patterns ending with H% occur more frequently in the VPs when the subject is in focus but tonal patterns ending with L% occur more frequently in the VPs when the object is in focus. Our study thus provides a first insight into the similarities and differences in focus marking between Toulousian French and Standard French.
  • Cho, T., & McQueen, J. M. (2004). Phonotactics vs. phonetic cues in native and non-native listening: Dutch and Korean listeners' perception of Dutch and English. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 1301-1304). Seoul: Sunjin Printing Co.

    Abstract

    We investigated how listeners of two unrelated languages, Dutch and Korean, process phonotactically legitimate and illegitimate sounds spoken in Dutch and American English. To Dutch listeners, unreleased word-final stops are phonotactically illegal because word-final stops in Dutch are generally released in isolation, but to Korean listeners, released final stops are illegal because word-final stops are never released in Korean. Two phoneme monitoring experiments showed a phonotactic effect: Dutch listeners detected released stops more rapidly than unreleased stops whereas the reverse was true for Korean listeners. Korean listeners with English stimuli detected released stops more accurately than unreleased stops, however, suggesting that acoustic-phonetic cues associated with released stops improve detection accuracy. We propose that in non-native speech perception, phonotactic legitimacy in the native language speeds up phoneme recognition, the richness of acoustic-phonetic cues improves listening accuracy, and familiarity with the non-native language modulates the relative influence of these two factors.
  • Cho, T., & Johnson, E. K. (2004). Acoustic correlates of phrase-internal lexical boundaries in Dutch. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 1297-1300). Seoul: Sunjin Printing Co.

    Abstract

    The aim of this study was to determine if Dutch speakers reliably signal phrase-internal lexical boundaries, and if so, how. Six speakers recorded 4 pairs of phonemically identical strong-weak-strong (SWS) strings with matching syllable boundaries but mismatching intended word boundaries (e.g. reis # pastei versus reispas # tij, or more broadly C1V1(C)#C2V2(C)C3V3(C) vs. C1V1(C)C2V2(C)#C3V3(C)). An Analysis of Variance revealed 3 acoustic parameters that were significantly greater in S#WS items (C2 DURATION, RIME1 DURATION, C3 BURST AMPLITUDE) and 5 parameters that were significantly greater in the SW#S items (C2 VOT, C3 DURATION, RIME2 DURATION, RIME3 DURATION, and V2 AMPLITUDE). Additionally, center of gravity measurements suggested that the [s] to [t] coarticulation was greater in reis # pa[st]ei versus reispa[s] # [t]ij. Finally, a Logistic Regression Analysis revealed that the 3 parameters (RIME1 DURATION, RIME2 DURATION, and C3 DURATION) contributed most reliably to a S#WS versus SW#S classification.
  • Cooper, N., & Cutler, A. (2004). Perception of non-native phonemes in noise. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 469-472). Seoul: Sunjin Printing Co.

    Abstract

    We report an investigation of the perception of American English phonemes by Dutch listeners proficient in English. Listeners identified either the consonant or the vowel in most possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (16 dB, 8 dB, and 0 dB). Effects of signal-to-noise ratio on vowel and consonant identification are discussed as a function of syllable position and of relationship to the native phoneme inventory. Comparison of the results with previously reported data from native listeners reveals that noise affected the responding of native and non-native listeners similarly.
  • Croijmans, I., & Majid, A. (2016). Language does not explain the wine-specific memory advantage of wine experts. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 141-146). Austin, TX: Cognitive Science Society.

    Abstract

    Although people are poor at naming odors, naming a smell helps to remember that odor. Previous studies show wine experts have better memory for smells, and they also name smells differently than novices. Is wine experts’ odor memory verbally mediated? And is the odor memory advantage that experts have over novices restricted to odors in their domain of expertise, or does it generalize? Twenty-four wine experts and 24 novices smelled wines, wine-related odors and common odors, and remembered these. Half the participants also named the smells. Wine experts had better memory for wines, but not for the other odors, indicating their memory advantage is restricted to wine. Wine experts named odors better than novices, but there was no relationship between experts’ ability to name odors and their memory for odors. This suggests experts’ odor memory advantage is not linguistically mediated, but may be the result of differential perceptual learning.
  • Cutler, A., Norris, D., & Sebastián-Gallés, N. (2004). Phonemic repertoire and similarity within the vocabulary. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 65-68). Seoul: Sunjin Printing Co.

    Abstract

    Language-specific differences in the size and distribution of the phonemic repertoire can have implications for the task facing listeners in recognising spoken words. A language with more phonemes will allow shorter words and reduced embedding of short words within longer ones, decreasing the potential for spurious lexical competitors to be activated by speech signals. We demonstrate that this is the case via comparative analyses of the vocabularies of English and Spanish. A language which uses suprasegmental as well as segmental contrasts, however, can substantially reduce the extent of spurious embedding.
  • Cutler, A., Burchfield, A., & Antoniou, M. (2019). A criterial interlocutor tally for successful talker adaptation? In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1485-1489). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Part of the remarkable efficiency of listening is accommodation to unfamiliar talkers’ specific pronunciations by retuning of phonemic intercategory boundaries. Such retuning occurs in second (L2) as well as first language (L1); however, recent research with emigrés revealed successful adaptation in the environmental L2 but, unprecedentedly, not in L1 despite continuing L1 use. A possible explanation involving relative exposure to novel talkers is here tested in heritage language users with Mandarin as family L1 and English as environmental language. In English, exposure to an ambiguous sound in disambiguating word contexts prompted the expected adjustment of phonemic boundaries in subsequent categorisation. However, no adjustment occurred in Mandarin, again despite regular use. Participants reported highly asymmetric interlocutor counts in the two languages. We conclude that successful retuning ability requires regular exposure to novel talkers in the language in question, a criterion not met for the emigrés’ or for these heritage users’ L1.
  • Ip, M., & Cutler, A. (2016). Cross-language data on five types of prosodic focus. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 330-334).

    Abstract

    To examine the relative roles of language-specific and language-universal mechanisms in the production of prosodic focus, we compared production of five different types of focus by native speakers of English and Mandarin. Two comparable dialogues were constructed for each language, with the same words appearing in focused and unfocused position; 24 speakers recorded each dialogue in each language. Duration, F0 (mean, maximum, range), and rms-intensity (mean, maximum) of all critical word tokens were measured. Across the different types of focus, cross-language differences were observed in the degree to which English versus Mandarin speakers use the different prosodic parameters to mark focus, suggesting that while prosody may be universally available for expressing focus, the means of its employment may be considerably language-specific.
  • Cutler, A., El Aissati, A., Hanulikova, A., & McQueen, J. M. (2010). Effects on speech parsing of vowelless words in the phonology. In Abstracts of Laboratory Phonology 12 (pp. 115-116).
  • Cutler, A., Kearns, R., Norris, D., & Scott, D. (1992). Listeners’ responses to extraneous signals coincident with English and French speech. In J. Pittam (Ed.), Proceedings of the 4th Australian International Conference on Speech Science and Technology (pp. 666-671). Canberra: Australian Speech Science and Technology Association.

    Abstract

    English and French listeners performed two tasks - click location and speeded click detection - with both English and French sentences, closely matched for syntactic and phonological structure. Clicks were located more accurately in open- than in closed-class words in both English and French; they were detected more rapidly in open- than in closed-class words in English, but not in French. The two listener groups produced the same pattern of responses, suggesting that higher-level linguistic processing was not involved in these tasks.
  • Cutler, A., Mitterer, H., Brouwer, S., & Tuinman, A. (2010). Phonological competition in casual speech. In Proceedings of DiSS-LPSS Joint Workshop 2010 (pp. 43-46).
  • Cutler, A., & Robinson, T. (1992). Response time as a metric for comparison of speech recognition by humans and machines. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 189-192). Alberta: University of Alberta.

    Abstract

    The performance of automatic speech recognition systems is usually assessed in terms of error rate. Human speech recognition produces few errors, but relative difficulty of processing can be assessed via response time techniques. We report the construction of a measure analogous to response time in a machine recognition system. This measure may be compared directly with human response times. We conducted a trial comparison of this type at the phoneme level, including both tense and lax vowels and a variety of consonant classes. The results suggested similarities between human and machine processing in the case of consonants, but differences in the case of vowels.
  • Cutler, A., & Shanley, J. (2010). Validation of a training method for L2 continuous-speech segmentation. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 1844-1847).

    Abstract

    Recognising continuous speech in a second language is often unexpectedly difficult, as the operation of segmenting speech is so attuned to native-language structure. We report the initial steps in development of a novel training method for second-language listening, focusing on speech segmentation and employing a task designed for studying this: word-spotting. Listeners detect real words in sequences consisting of a word plus a minimal context. The present validation study shows that learners from varying non-English backgrounds successfully perform a version of this task in English, and display appropriate sensitivity to structural factors that also affect segmentation by native English listeners.
  • Dalli, A., Tablan, V., Bontcheva, K., Wilks, Y., Broeder, D., Brugman, H., & Wittenburg, P. (2004). Web services architecture for language resources. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 365-368). Paris: ELRA - European Language Resources Association.
  • Dediu, D., & Moisik, S. (2016). Defining and counting phonological classes in cross-linguistic segment databases. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2016: 10th International Conference on Language Resources and Evaluation (pp. 1955-1962). Paris: European Language Resources Association (ELRA).

    Abstract

    Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS (Dryer & Haspelmath, 2013), Glottolog (Hammarstrom et al., 2015) and Phoible (Moran & McCloy, 2014). Databases such as Phoible contain the actual segments used by various languages as they are given in the primary language descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations over classes of segments that share theoretically interesting features. Here we present a method and the associated R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the sets of segments falling into such a class for any language inventory. The method and its results are important for those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to extra-linguistic factors and processes such as climate, economics, history or human genetics.
  • Dediu, D., & Moisik, S. R. (2016). Anatomical biasing of click learning and production: An MRI and 3d palate imaging study. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/57.html.

    Abstract

    The current paper presents results for data on click learning obtained from a larger imaging study (using MRI and 3D intraoral scanning) designed to quantify and characterize intra- and inter-population variation of vocal tract structures and the relation of this to speech production. The aim of the click study was to ascertain whether and to what extent vocal tract morphology influences (1) the ability to learn to produce clicks and (2) the productions of those that successfully learn to produce these sounds. The results indicate that the presence of an alveolar ridge certainly does not prevent an individual from learning to produce click sounds (1). However, the subtle details of how clicks are produced may indeed be driven by palate shape (2).
  • Dideriksen, C., Fusaroli, R., Tylén, K., Dingemanse, M., & Christiansen, M. H. (2019). Contextualizing Conversational Strategies: Backchannel, Repair and Linguistic Alignment in Spontaneous and Task-Oriented Conversations. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci 2019) (pp. 261-267). Montreal, QB: Cognitive Science Society.

    Abstract

    Do interlocutors adjust their conversational strategies to the specific contextual demands of a given situation? Prior studies have yielded conflicting results, making it unclear how strategies vary with demands. We combine insights from qualitative and quantitative approaches in a within-participant experimental design involving two different contexts: spontaneously occurring conversations (SOC) and task-oriented conversations (TOC). We systematically assess backchanneling, other-repair and linguistic alignment. We find that SOC exhibit a higher number of backchannels, a reduced and more generic repair format and higher rates of lexical and syntactic alignment. TOC are characterized by a high number of specific repairs and a lower rate of lexical and syntactic alignment. However, when alignment occurs, more linguistic forms are aligned. The findings show that conversational strategies adapt to specific contextual demands.
  • Dieuleveut, A., Van Dooren, A., Cournane, A., & Hacquard, V. (2019). Acquiring the force of modals: Sig you guess what sig means? In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 189-202). Somerville, MA: Cascadilla Press.
  • Dolscheid, S., Shayan, S., Ozturk, O., Majid, A., & Casasanto, D. (2010). Language shapes mental representations of musical pitch: Implications for metaphorical language processing [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 137). York: University of York.

    Abstract

    Speakers often use spatial metaphors to talk about musical pitch (e.g., a low note, a high soprano). Previous experiments suggest that English speakers also think about pitches as high or low in space, even when they’re not using language or musical notation (Casasanto, 2010). Do metaphors in language merely reflect pre-existing associations between space and pitch, or might language also shape these non-linguistic metaphorical mappings? To investigate the role of language in pitch representation, we conducted a pair of non-linguistic space-pitch interference experiments in speakers of two languages that use different spatial metaphors. Dutch speakers usually describe pitches as ‘high’ (hoog) and ‘low’ (laag). Farsi speakers, however, often describe high-frequency pitches as ‘thin’ (naazok) and low-frequency pitches as ‘thick’ (koloft). Do Dutch and Farsi speakers mentally represent pitch differently? To find out, we asked participants to reproduce musical pitches that they heard in the presence of irrelevant spatial information (i.e., lines that varied either in height or in thickness). For the Height Interference experiment, horizontal lines bisected a vertical reference line at one of nine different locations. For the Thickness Interference experiment, a vertical line appeared in the middle of the screen in one of nine thicknesses. In each experiment, the nine different lines were crossed with nine different pitches ranging from C4 to G#4 in semitone increments, to produce 81 distinct trials. If Dutch and Farsi speakers mentally represent pitch the way they talk about it, using different kinds of spatial representations, they should show contrasting patterns of cross-dimensional interference: Dutch speakers’ pitch estimates should be more strongly affected by irrelevant height information, and Farsi speakers’ by irrelevant thickness information. As predicted, Dutch speakers’ pitch estimates were significantly modulated by spatial height but not by thickness. Conversely, Farsi speakers’ pitch estimates were modulated by spatial thickness but not by height (2x2 ANOVA on normalized slopes of the effect of space on pitch: F(1,71)=17.15, p<.001). To determine whether language plays a causal role in shaping pitch representations, we conducted a training experiment. Native Dutch speakers learned to use Farsi-like metaphors, describing pitch relationships in terms of thickness (e.g., a cello sounds ‘thicker’ than a flute). After training, Dutch speakers showed a significant effect of Thickness interference in the non-linguistic pitch reproduction task, similar to native Farsi speakers: on average, pitches accompanied by thicker lines were reproduced as lower in pitch (effect of thickness on pitch: r=-.22, p=.002). By conducting psychophysical tasks, we tested the ‘Whorfian’ question without using words. Yet, results also inform theories of metaphorical language processing. According to psycholinguistic theories (e.g., Bowdle & Gentner, 2005), highly conventional metaphors are processed without any active mapping from the source to the target domain (e.g., from space to pitch). Our data, however, suggest that when people use verbal metaphors they activate a corresponding non-linguistic mapping from either height or thickness to pitch, strengthening this association at the expense of competing associations. As a result, people who use different metaphors in their native languages form correspondingly different representations of musical pitch. Casasanto, D. (2010). Space for Thinking. In Language, Cognition and Space: State of the art and new directions. V. Evans & P. Chilton (Eds.), 453-478, London: Equinox Publishing. Bowdle, B. & Gentner, D. (2005). The career of metaphor. Psychological Review, 112, 193-216.
  • Doumas, L. A., & Martin, A. E. (2016). Abstraction in time: Finding hierarchical linguistic structure in a model of relational processing. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 2279-2284). Austin, TX: Cognitive Science Society.

    Abstract

    Abstract mental representation is fundamental for human cognition. Forming such representations in time, especially from dynamic and noisy perceptual input, is a challenge for any processing modality, but perhaps none so acutely as for language processing. We show that LISA (Hummel & Holyoak, 1997) and DORA (Doumas, Hummel, & Sandhofer, 2008), models built to process and to learn structured (i.e., symbolic) representations of conceptual properties and relations from unstructured inputs, show oscillatory activation during processing that is highly similar to the cortical activity elicited by the linguistic stimuli from Ding et al. (2016). We argue, as Ding et al. (2016), that this activation reflects formation of hierarchical linguistic representation, and furthermore, that the kind of computational mechanisms in LISA/DORA (e.g., temporal binding by systematic asynchrony of firing) may underlie formation of abstract linguistic representations in the human brain. It may be this repurposing that allowed for the generation or emergence of hierarchical linguistic structure, and therefore, human language, from extant cognitive and neural systems. We conclude that models of thinking and reasoning and models of language processing must be integrated, not only for increased plausibility, but in order to advance both fields towards a larger integrative model of human cognition.
  • Drozdova, P., Van Hout, R., & Scharenborg, O. (2016). Processing and adaptation to ambiguous sounds during the course of perceptual learning. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2811-2815). doi:10.21437/Interspeech.2016-814.

    Abstract

    Listeners use their lexical knowledge to interpret ambiguous sounds, and retune their phonetic categories to include this ambiguous sound. Although there is ample evidence for lexically-guided retuning, the adaptation process is not fully understood. Using a lexical decision task with an embedded auditory semantic priming task, the present study investigates whether words containing an ambiguous sound are processed in the same way as “natural” words and whether adaptation to the ambiguous sound tends to equalize the processing of “ambiguous” and natural words. Analyses of the yes/no responses and reaction times to natural and “ambiguous” words showed that words containing an ambiguous sound were accepted as words less often and were processed slower than the same words without ambiguity. The difference in acceptance disappeared after exposure to approximately 15 ambiguous items. Interestingly, lower acceptance rates and slower processing did not have an effect on the processing of semantic information of the following word. However, lower acceptance rates of ambiguous primes predict slower reaction times of these primes, suggesting an important role of stimulus-specific characteristics in triggering lexically-guided perceptual learning.
  • Eijk, L., Ernestus, M., & Schriefers, H. (2019). Alignment of pitch and articulation rate. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2690-2694). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Previous studies have shown that speakers align their speech to each other at multiple linguistic levels. This study investigates whether alignment is mostly the result of priming from the immediately preceding speech materials, focussing on pitch and articulation rate (AR). Native Dutch speakers completed sentences, first by themselves (pre-test), then in alternation with Confederate 1 (Round 1), with Confederate 2 (Round 2), with Confederate 1 again (Round 3), and lastly by themselves again (post-test). Results indicate that participants aligned to the confederates and that this alignment lasted during the post-test. The confederates’ directly preceding sentences were not good predictors for the participants’ pitch and AR. Overall, the results indicate that alignment is more of a global effect than a local priming effect.
  • Eisner, F., Weber, A., & Melinger, A. (2010). Generalization of learning in pre-lexical adjustments to word-final devoicing [Abstract]. Journal of the Acoustical Society of America, 128, 2323.

    Abstract

    Pre-lexical representations of speech sounds have been shown to change dynamically through a mechanism of lexically driven learning [Norris et al. (2003)]. Here we investigated whether this type of learning occurs in native British English (BE) listeners for a word-final stop contrast which is commonly de-voiced in Dutch-accented English. Specifically, this study asked whether the change in pre-lexical representation also encodes information about the position of the critical sound within a word. After exposure to a native Dutch speaker's productions of de-voiced stops in word-final position (but not in any other positions), BE listeners showed evidence of perceptual learning in a subsequent cross-modal priming task, where auditory primes with voiceless final stops (e.g., [si:t], “seat”) facilitated recognition of visual targets with voiced final stops (e.g., “seed”). This learning generalized to test pairs where the critical contrast was in word-initial position, e.g., auditory primes such as [taun] (“town”), facilitated recognition of visual targets like “down”. Control listeners, who had not heard any stops by the speaker during exposure, showed no learning effects. The results suggest that under these exposure conditions, word position is not encoded in the pre-lexical adjustment to the accented phoneme contrast.
  • Enfield, N. J. (2004). Areal grammaticalisation of postverbal 'acquire' in mainland Southeast Asia. In S. Burusphat (Ed.), Proceedings of the 11th Southeast Asia Linguistics Society Meeting (pp. 275-296). Arizona State University: Tempe.
  • Eryilmaz, K., Little, H., & De Boer, B. (2016). Using HMMs To Attribute Structure To Artificial Languages. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/125.html.

    Abstract

    We investigated the use of Hidden Markov Models (HMMs) as a way of representing repertoires of continuous signals in order to infer their building blocks. We tested the idea on a dataset from an artificial language experiment. The study demonstrates using HMMs for this purpose is viable, but also that there is a lot of room for refinement such as explicit duration modeling, incorporation of autoregressive elements and relaxing the Markovian assumption, in order to accommodate specific details.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Evaluating dictation task measures for the study of speech perception. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 383-387). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This paper shows that the dictation task, a well-known
    testing instrument in language education, has
    untapped potential as a research tool for studying
    speech perception. We describe how transcriptions
    can be scored on measures of lexical, orthographic,
    phonological, and semantic similarity to target
    phrases to provide comprehensive information about
    accuracy at different processing levels. The former
    three measures are automatically extractable,
    increasing objectivity, and the middle two are
    gradient, providing finer-grained information than
    traditionally used. We evaluate the measures in an
    English dictation task featuring phonetically reduced
    continuous speech. Whereas the lexical and
    orthographic measures emphasize listeners’ word
    identification difficulties, the phonological measure
    demonstrates that listeners can often still recover
    phonological features, and the semantic measure
    captures their ability to get the gist of the utterances.
    Correlational analyses and a discussion of practical
    and theoretical considerations show that combining
    multiple measures improves the dictation task’s
    utility as a research tool.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Lexically guided perceptual learning of a vowel shift in an interactive L2 listening context. In Proceedings of Interspeech 2019 (pp. 3123-3127). doi:10.21437/Interspeech.2019-1414.

    Abstract

    Lexically guided perceptual learning has traditionally been studied with ambiguous consonant sounds to which native listeners are exposed in a purely receptive listening context. To extend previous research, we investigate whether lexically guided learning applies to a vowel shift encountered by non-native listeners in an interactive dialogue. Dutch participants played a two-player game in English in either a control condition, which contained no evidence for a vowel shift, or a lexically constraining condition, in which onscreen lexical information required them to re-interpret their interlocutor’s /ɪ/ pronunciations as representing /ε/. A phonetic categorization pre-test and post-test were used to assess whether the game shifted listeners’ phonemic boundaries such that more of the /ε/-/ɪ/ continuum came to be perceived as /ε/. Both listener groups showed an overall post-test shift toward /ɪ/, suggesting that vowel perception may be sensitive to directional biases related to properties of the speaker’s vowel space. Importantly, listeners in the lexically constraining condition made relatively more post-test /ε/ responses than the control group, thereby exhibiting an effect of lexically guided adaptation. The results thus demonstrate that non-native listeners can adjust their phonemic boundaries on the basis of lexical information to accommodate a vowel shift learned in interactive conversation.
  • Filippi, P., Congdon, J. V., Hoang, J., Bowling, D. L., Reber, S., Pašukonis, A., Hoeschele, M., Ocklenburg, S., de Boer, B., Sturdy, C. B., Newen, A., & Güntürkün, O. (2016). Humans Recognize Vocal Expressions Of Emotional States Universally Across Species. In The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/91.html.

    Abstract

    The perception of danger in the environment can induce physiological responses (such as a heightened state of arousal) in animals, which may cause measurable changes in the prosodic modulation of the voice (Briefer, 2012). The ability to interpret the prosodic features of animal calls as an indicator of emotional arousal may have provided the first hominins with an adaptive advantage, enabling, for instance, the recognition of a threat in the surroundings. This ability might have paved the way for the ability to process meaningful prosodic modulations in the emerging linguistic utterances.
  • Filippi, P., Ocklenburg, S., Bowling, D. L., Heege, L., Newen, A., Güntürkün, O., & de Boer, B. (2016). Multimodal Processing Of Emotional Meanings: A Hypothesis On The Adaptive Value Of Prosody. In The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/90.html.

    Abstract

    Humans combine multiple sources of information to comprehend meanings. These sources can be characterized as linguistic (i.e., lexical units and/or sentences) or paralinguistic (e.g. body posture, facial expression, voice intonation, pragmatic context). Emotion communication is a special case in which linguistic and paralinguistic dimensions can simultaneously denote the same, or multiple incongruous referential meanings. Think, for instance, about when someone says “I’m sad!”, but does so with happy intonation and a happy facial expression. Here, the communicative channels express very specific (although conflicting) emotional states as denotations. In such cases of intermodal incongruence, are we involuntarily biased to respond to information in one channel over the other? We hypothesize that humans are involuntarily biased to respond to prosody over verbal content and facial expression, since the ability to communicate socially relevant information such as basic emotional states through prosodic modulation of the voice might have provided early hominins with an adaptive advantage that preceded the emergence of segmental speech (Darwin 1871; Mithen, 2005). To address this hypothesis, we examined the interaction between multiple communicative channels in recruiting attentional resources, within a Stroop interference task (i.e. a task in which different channels give conflicting information; Stroop, 1935). In experiment 1, we used synonyms of “happy” and “sad” spoken with happy and sad prosody. Participants were asked to identify the emotion expressed by the verbal content while ignoring prosody (Word task) or vice versa (Prosody task). Participants responded faster and more accurately in the Prosody task. Within the Word task, incongruent stimuli were responded to more slowly and less accurately than congruent stimuli. In experiment 2, we adopted synonyms of “happy” and “sad” spoken in happy and sad prosody, while a happy or sad face was displayed.
    Participants were asked to identify the emotion expressed by the verbal content while ignoring prosody and face (Word task), to identify the emotion expressed by prosody while ignoring verbal content and face (Prosody task), or to identify the emotion expressed by the face while ignoring prosody and verbal content (Face task). Participants responded faster in the Face task and less accurately when the two non-focused channels were expressing an emotion that was incongruent with the focused one, as compared with the condition where all the channels were congruent. In addition, in the Word task, accuracy was lower when prosody was incongruent to verbal content and face, as compared with the condition where all the channels were congruent. Our data suggest that prosody interferes with emotion word processing, eliciting automatic responses even when conflicting with both verbal content and facial expressions at the same time. In contrast, although processed significantly faster than prosody and verbal content, faces alone are not sufficient to interfere in emotion processing within a three-dimensional Stroop task. Our findings align with the hypothesis that the ability to communicate emotions through prosodic modulation of the voice – which seems to be dominant over verbal content - is evolutionarily older than the emergence of segmental articulation (Mithen, 2005; Fitch, 2010). This hypothesis fits with quantitative data suggesting that prosody has a vital role in the perception of well-formed words (Johnson & Jusczyk, 2001), in the ability to map sounds to referential meanings (Filippi et al., 2014), and in syntactic disambiguation (Soderstrom et al., 2003). This research could complement studies on iconic communication within visual and auditory domains, providing new insights for models of language evolution.
    Further work aimed at how emotional cues from different modalities are simultaneously integrated will improve our understanding of how humans interpret multimodal emotional meanings in real life interactions.
  • Fisher, S. E., & Tilot, A. K. (Eds.). (2019). Bridging senses: Novel insights from synaesthesia [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 374.
  • Fitz, H. (2010). Statistical learning of complex questions. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2692-2698). Austin, TX: Cognitive Science Society.

    Abstract

    The problem of auxiliary fronting in complex polar questions occupies a prominent position within the nature versus nurture controversy in language acquisition. We employ a model of statistical learning which uses sequential and semantic information to produce utterances from a bag of words. This linear learner is capable of generating grammatical questions without exposure to these structures in its training environment. We also demonstrate that the model performs superior to n-gram learners on this task. Implications for nativist theories of language acquisition are discussed.
  • Floyd, S. (2004). Purismo lingüístico y realidad local: ¿Quichua puro o puro quichuañol? In Proceedings of the Conference on Indigenous Languages of Latin America (CILLA)-I.
  • Frost, R. L. A., Isbilen, E. S., Christiansen, M. H., & Monaghan, P. (2019). Testing the limits of non-adjacent dependency learning: Statistical segmentation and generalisation across domains. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 1787-1793). Montreal, QB: Cognitive Science Society.

    Abstract

    Achieving linguistic proficiency requires identifying words from speech, and discovering the constraints that govern the way those words are used. In a recent study of non-adjacent dependency learning, Frost and Monaghan (2016) demonstrated that learners may perform these tasks together, using similar statistical processes - contrary to prior suggestions. However, in their study, non-adjacent dependencies were marked by phonological cues (plosive-continuant-plosive structure), which may have influenced learning. Here, we test the necessity of these cues by comparing learning across three conditions; fixed phonology, which contains these cues, varied phonology, which omits them, and shapes, which uses visual shape sequences to assess the generality of statistical processing for these tasks. Participants segmented the sequences and generalized the structure in both auditory conditions, but learning was best when phonological cues were present. Learning was around chance on both tasks for the visual shapes group, indicating statistical processing may critically differ across domains.
  • Frost, R. L. A., Monaghan, P., & Christiansen, M. H. (2016). Using Statistics to Learn Words and Grammatical Categories: How High Frequency Words Assist Language Acquisition. In A. Papafragou, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 81-86). Austin, TX: Cognitive Science Society. Retrieved from https://mindmodeling.org/cogsci2016/papers/0027/index.html.

    Abstract

    Recent studies suggest that high-frequency words may benefit speech segmentation (Bortfeld, Morgan, Golinkoff, & Rathbun, 2005) and grammatical categorisation (Monaghan, Christiansen, & Chater, 2007). To date, these tasks have been examined separately, but not together. We familiarised adults with continuous speech comprising repetitions of target words, and compared learning to a language in which targets appeared alongside high-frequency marker words. Marker words reliably preceded targets, and distinguished them into two otherwise unidentifiable categories. Participants completed a 2AFC segmentation test, and a similarity judgement categorisation test. We tested transfer to a word-picture mapping task, where words from each category were used either consistently or inconsistently to label actions/objects. Participants segmented the speech successfully, but only demonstrated effective categorisation when speech contained high-frequency marker words. The advantage of marker words extended to the early stages of the transfer task. Findings indicate the same high-frequency words may assist speech segmentation and grammatical categorisation.
  • Furman, R., Ozyurek, A., & Küntay, A. C. (2010). Early language-specificity in Turkish children's caused motion event expressions in speech and gesture. In K. Franich, K. M. Iserman, & L. L. Keil (Eds.), Proceedings of the 34th Boston University Conference on Language Development. Volume 1 (pp. 126-137). Somerville, MA: Cascadilla Press.
  • Galke, L., Vagliano, I., & Scherp, A. (2019). Can graph neural networks go „online“? An analysis of pretraining and inference. In Proceedings of the Representation Learning on Graphs and Manifolds: ICLR2019 Workshop.

    Abstract

    Large-scale graph data in real-world applications is often not static but dynamic,
    i.e., new nodes and edges appear over time. Current graph convolution approaches
    are promising, especially, when all the graph’s nodes and edges are available during
    training. When unseen nodes and edges are inserted after training, it is not
    yet evaluated whether up-training or re-training from scratch is preferable. We
    construct an experimental setup, in which we insert previously unseen nodes and
    edges after training and conduct a limited amount of inference epochs. In this
    setup, we compare adapting pretrained graph neural networks against retraining
    from scratch. Our results show that pretrained models yield high accuracy scores
    on the unseen nodes and that pretraining is preferable over retraining from scratch.
    Our experiments represent a first step to evaluate and develop truly online variants
    of graph neural networks.
  • Galke, L., Melnychuk, T., Seidlmayer, E., Trog, S., Foerstner, K., Schultz, C., & Tochtermann, K. (2019). Inductive learning of concept representations from library-scale bibliographic corpora. In K. David, K. Geihs, M. Lange, & G. Stumme (Eds.), Informatik 2019: 50 Jahre Gesellschaft für Informatik - Informatik für Gesellschaft (pp. 219-232). Bonn: Gesellschaft für Informatik e.V. doi:10.18420/inf2019_26.
  • Gannon, E., He, J., Gao, X., & Chaparro, B. (2016). RSVP Reading on a Smart Watch. In Proceedings of the Human Factors and Ergonomics Society 2016 Annual Meeting (pp. 1130-1134).

    Abstract

    Reading with Rapid Serial Visual Presentation (RSVP) has shown promise for optimizing screen space and increasing reading speed without compromising comprehension. Given the wide use of small-screen devices, the present study compared RSVP and traditional reading on three types of reading comprehension, reading speed, and subjective measures on a smart watch. Results confirm previous studies that show faster reading speed with RSVP without detracting from comprehension. Subjective data indicate that Traditional is strongly preferred to RSVP as a primary reading method. Given the optimal use of screen space, increased speed and comparable comprehension, future studies should focus on making RSVP a more comfortable format.
  • Gerwien, J., & Flecken, M. (2016). First things first? Top-down influences on event apprehension. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 2633-2638). Austin, TX: Cognitive Science Society.

    Abstract

    Not much is known about event apprehension, the earliest stage of information processing in elicited language production studies, using pictorial stimuli. A reason for our lack of knowledge on this process is that apprehension happens very rapidly (<350 ms after stimulus onset, Griffin & Bock 2000), making it difficult to measure the process directly. To broaden our understanding of apprehension, we analyzed landing positions and onset latencies of first fixations on visual stimuli (pictures of real-world events) given short stimulus presentation times, presupposing that the first fixation directly results from information processing during apprehension
  • Goldrick, M., Brehm, L., Pyeong Whan, C., & Smolensky, P. (2019). Transient blend states and discrete agreement-driven errors in sentence production. In G. J. Snover, M. Nelson, B. O'Connor, & J. Pater (Eds.), Proceedings of the Society for Computation in Linguistics (SCiL 2019) (pp. 375-376). doi:10.7275/n0b2-5305.
  • Goudbeek, M., & Broersma, M. (2010). The Demo/Kemo corpus: A principled approach to the study of cross-cultural differences in the vocal expression and perception of emotion. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010) (pp. 2211-2215). Paris: ELRA.

    Abstract

    This paper presents the Demo / Kemo corpus of Dutch and Korean emotional speech. The corpus has been specifically developed for the purpose of cross-linguistic comparison, and is more balanced than any similar corpus available so far: a) it contains expressions by both Dutch and Korean actors as well as judgments by both Dutch and Korean listeners; b) the same elicitation technique and recording procedure was used for recordings of both languages; c) the same nonsense sentence, which was constructed to be permissible in both languages, was used for recordings of both languages; and d) the emotions present in the corpus are balanced in terms of valence, arousal, and dominance. The corpus contains a comparatively large number of emotions (eight) uttered by a large number of speakers (eight Dutch and eight Korean). The counterbalanced nature of the corpus will enable a stricter investigation of language-specific versus universal aspects of emotional expression than was possible so far. Furthermore, given the carefully controlled phonetic content of the expressions, it allows for analysis of the role of specific phonetic features in emotional expression in Dutch and Korean.
  • Gubian, M., Bergmann, C., & Boves, L. (2010). Investigating word learning processes in an artificial agent. In Proceedings of the IXth IEEE International Conference on Development and Learning (ICDL). Ann Arbor, MI, 18-21 Aug. 2010 (pp. 178-184). IEEE.

    Abstract

    Researchers in human language processing and acquisition are making an increasing use of computational models. Computer simulations provide a valuable platform to reproduce hypothesised learning mechanisms that are otherwise very difficult, if not impossible, to verify on human subjects. However, computational models come with problems and risks. It is difficult to (automatically) extract essential information about the developing internal representations from a set of simulation runs, and often researchers limit themselves to analysing learning curves based on empirical recognition accuracy through time. The associated risk is to erroneously deem a specific learning behaviour as generalisable to human learners, while it could also be a mere consequence (artifact) of the implementation of the artificial learner or of the input coding scheme. In this paper a set of simulation runs taken from the ACORNS project is investigated. First a look ‘inside the box’ of the learner is provided by employing novel quantitative methods for analysing changing structures in large data sets. Then, the obtained findings are discussed in the perspective of their ecological validity in the field of child language acquisition.
  • Gullberg, M., & Indefrey, P. (Eds.). (2010). The earliest stages of language learning [Special Issue]. Language Learning, 60(Supplement s2).
  • Hahn, L. E., Ten Buuren, M., De Nijs, M., Snijders, T. M., & Fikkert, P. (2019). Acquiring novel words in a second language through mutual play with child songs - The Noplica Energy Center. In L. Nijs, H. Van Regenmortel, & C. Arculus (Eds.), MERYC19 Counterpoints of the senses: Bodily experiences in musical learning (pp. 78-87). Ghent, Belgium: EuNet MERYC 2019.

    Abstract

    Child songs are a great source for linguistic learning. Here we explore whether children can acquire novel words in a second language by playing a game featuring child songs in a playhouse. We present data from three studies that serve as scientific proof for the functionality of one game of the playhouse: the Energy Center. For this game, three hand-bikes were mounted on a panel. When children start moving the hand-bikes, child songs start playing simultaneously. Once the children produce enough energy with the hand-bikes, the songs are additionally accompanied with the sounds of musical instruments. In our studies, children executed a picture-selection task to evaluate whether they acquired new vocabulary from the songs presented during the game. Two of our studies were run in the field, one at a Dutch and one at an Indian pre-school. The third study features data from a more controlled laboratory setting. Our results partly confirm that the Energy Center is a successful means to support vocabulary acquisition in a second language. More research with larger sample sizes and longer access to the Energy Center is needed to evaluate the overall functionality of the game. Based on informal observations at our test sites, however, we are certain that children do pick up linguistic content from the songs during play, as many of the children repeat words and phrases from songs they heard. We will pick up upon these promising observations during future studies
  • Hanique, I., Schuppler, B., & Ernestus, M. (2010). Morphological and predictability effects on schwa reduction: The case of Dutch word-initial syllables. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan (pp. 933-936).

    Abstract

    This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word’s predictability and its morphological structure. Schwa is less reduced in words that are more predictable given the following word. In addition, schwa may be longer if the syllable forms a prefix, and in prefixes the duration of schwa is positively correlated with the frequency of the word relative to its stem. Our results suggest that the conditions which favor reduced realizations are more complex than one would expect on the basis of the current literature.
  • Hanulikova, A., & Weber, A. (2010). Production of English interdental fricatives by Dutch, German, and English speakers. In K. Dziubalska-Kołaczyk, M. Wrembel, & M. Kul (Eds.), Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, New Sounds 2010, Poznań, Poland, 1-3 May 2010 (pp. 173-178). Poznan: Adam Mickiewicz University.

    Abstract

    Non-native (L2) speakers of English often experience difficulties in producing English interdental fricatives (e.g. the voiceless [θ]), and this leads to frequent substitutions of these fricatives (e.g. with [t], [s], and [f]). Differences in the choice of [θ]-substitutions across L2 speakers with different native (L1) language backgrounds have been extensively explored. However, even within one foreign accent, more than one substitution choice occurs, but this has been less systematically studied. Furthermore, little is known about whether the substitutions of voiceless [θ] are phonetically clear instances of [t], [s], and [f], as they are often labelled. In this study, we attempted a phonetic approach to examine language-specific preferences for [θ]-substitutions by carrying out acoustic measurements of L1 and L2 realizations of these sounds. To this end, we collected a corpus of spoken English with L1 speakers (UK-English), and Dutch and German L2 speakers. We show a) that the distribution of differential substitutions using identical materials differs between Dutch and German L2 speakers, b) that [t,s,f]-substitutes differ acoustically from intended [t,s,f], and c) that L2 productions of [θ] are acoustically comparable to L1 productions.
  • Harmon, Z., & Kapatsinski, V. (2016). Fuse to be used: A weak cue’s guide to attracting attention. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 520-525). Austin, TX: Cognitive Science Society.

    Abstract

    Several studies examined cue competition in human learning by testing learners on a combination of conflicting cues rooting for different outcomes, with each cue perfectly predicting its outcome. A common result has been that learners faced with cue conflict choose the outcome associated with the rare cue (the Inverse Base Rate Effect, IBRE). Here, we investigate cue competition including IBRE with sentences containing cues to meanings in a visual world. We do not observe IBRE. Instead we find that position in the sentence strongly influences cue salience. Faced with conflict between an initial cue and a non-initial cue, learners choose the outcome associated with the initial cue, whether frequent or rare. However, a frequent configuration of non-initial cues that are not sufficiently salient on their own can overcome a competing salient initial cue rooting for a different meaning. This provides a possible explanation for certain recurring patterns in language change.
  • Harmon, Z., & Kapatsinski, V. (2016). Determinants of lengths of repetition disfluencies: Probabilistic syntactic constituency in speech production. In R. Burkholder, C. Cisneros, E. R. Coppess, J. Grove, E. A. Hanink, H. McMahan, C. Meyer, N. Pavlou, Ö. Sarıgül, A. R. Singerman, & A. Zhang (Eds.), Proceedings of the Fiftieth Annual Meeting of the Chicago Linguistic Society (pp. 237-248). Chicago: Chicago Linguistic Society.
  • Heilbron, M., Ehinger, B., Hagoort, P., & De Lange, F. P. (2019). Tracking naturalistic linguistic predictions with deep neural language models. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 424-427). doi:10.32470/CCN.2019.1096-0.

    Abstract

    Prediction in language has traditionally been studied using
    simple designs in which neural responses to expected
    and unexpected words are compared in a categorical
    fashion. However, these designs have been contested
    as being ‘prediction encouraging’, potentially exaggerating
    the importance of prediction in language understanding.
    A few recent studies have begun to address
    these worries by using model-based approaches to probe
    the effects of linguistic predictability in naturalistic stimuli
    (e.g. continuous narrative). However, these studies
    so far only looked at very local forms of prediction, using
    models that take no more than the prior two words into
    account when computing a word’s predictability. Here,
    we extend this approach using a state-of-the-art neural
    language model that can take roughly 500 times longer
    linguistic contexts into account. Predictability estimates
    from the neural network offer a much better fit to EEG data
    from subjects listening to naturalistic narrative than simpler
    models, and reveal strong surprise responses akin to
    the P200 and N400. These results show that predictability
    effects in language are not a side-effect of simple designs,
    and demonstrate the practical use of recent advances
    in AI for the cognitive neuroscience of language.
  • Hendricks, I., Lefever, E., Croijmans, I., Majid, A., & Van den Bosch, A. (2016). Very quaffable and great fun: Applying NLP to wine reviews. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Vol 2 (pp. 306-312). Stroudsburg, PA: Association for Computational Linguistics.

    Abstract

    We automatically predict properties of wines on the basis of smell and flavor descriptions from experts’ wine reviews. We show wine experts are capable of describing their smell and flavor experiences in wine reviews in a sufficiently consistent manner, such that we can use their descriptions to predict properties of a wine based solely on language. The experimental results show promising F-scores when using lexical and semantic information to predict the color, grape variety, country of origin, and price of a wine. This demonstrates, contrary to popular opinion, that wine experts’ reviews really are informative.
  • Hintz, F., & Scharenborg, O. (2016). Neighbourhood density influences word recognition in native and non-native speech recognition in noise. In H. Van den Heuvel, B. Cranen, & S. Mattys (Eds.), Proceedings of the Speech Processing in Realistic Environments (SPIRE) workshop (pp. 46-47). Groningen.
  • Hintz, F., & Scharenborg, O. (2016). The effect of background noise on the activation of phonological and semantic information during spoken-word recognition. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2816-2820).

    Abstract

    During spoken-word recognition, listeners experience phonological competition between multiple word candidates, which increases, relative to optimal listening conditions, when speech is masked by noise. Moreover, listeners activate semantic word knowledge during the word’s unfolding. Here, we replicated the effect of background noise on phonological competition and investigated to what extent noise affects the activation of semantic information in phonological competitors. Participants’ eye movements were recorded when they listened to sentences containing a target word and looked at three types of displays. The displays either contained a picture of the target word, or a picture of a phonological onset competitor, or a picture of a word semantically related to the onset competitor, each along with three unrelated distractors. The analyses revealed that, in noise, fixations to the target and to the phonological onset competitor were delayed and smaller in magnitude compared to the clean listening condition, most likely reflecting enhanced phonological competition. No evidence for the activation of semantic information in the phonological competitors was observed in noise and, surprisingly, also not in the clear. We discuss the implications of the lack of an effect and differences between the present and earlier studies.
  • Irvine, E., & Roberts, S. G. (2016). Deictic tools can limit the emergence of referential symbol systems. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/99.html.

    Abstract

    Previous experiments and models show that the pressure to communicate can lead to the emergence of symbols in specific tasks. The experiment presented here suggests that the ability to use deictic gestures can reduce the pressure for symbols to emerge in co-operative tasks. In the 'gesture-only' condition, pairs built a structure together in 'Minecraft', and could only communicate using a small range of gestures. In the 'gesture-plus' condition, pairs could also use sound to develop a symbol system if they wished. All pairs were taught a pointing convention. None of the pairs we tested developed a symbol system, and performance was no different across the two conditions. We therefore suggest that deictic gestures, and non-referential means of organising activity sequences, are often sufficient for communication. This suggests that the emergence of linguistic symbols in early hominids may have been late and patchy with symbols only emerging in contexts where they could significantly improve task success or efficiency. Given the communicative power of pointing however, these contexts may be fewer than usually supposed. An approach for identifying these situations is outlined.
  • Janssen, R., Winter, B., Dediu, D., Moisik, S. R., & Roberts, S. G. (2016). Nonlinear biases in articulation constrain the design space of language. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/86.html.

    Abstract

    In Iterated Learning (IL) experiments, a participant’s learned output serves as the next participant’s learning input (Kirby et al., 2014). IL can be used to model cultural transmission and has indicated that weak biases can be amplified through repeated cultural transmission (Kirby et al., 2007). So, for example, structural language properties can emerge over time because languages come to reflect the cognitive constraints in the individuals that learn and produce the language. Similarly, we propose that languages may also reflect certain anatomical biases. Do sound systems adapt to the affordances of the articulation space induced by the vocal tract?
    The human vocal tract has inherent nonlinearities which might derive from acoustics and aerodynamics (cf. quantal theory, see Stevens, 1989) or biomechanics (cf. Gick & Moisik, 2015). For instance, moving the tongue anteriorly along the hard palate to produce a fricative does not result in large changes in acoustics in most cases, but for a small range there is an abrupt change from a perceived palato-alveolar [ʃ] to alveolar [s] sound (Perkell, 2012). Nonlinearities such as these might bias all human speakers to converge on a very limited set of phonetic categories, and might even be a basis for combinatoriality or phonemic ‘universals’.
    While IL typically uses discrete symbols, Verhoef et al. (2014) have used slide whistles to produce a continuous signal. We conducted an IL experiment with human subjects who communicated using a digital slide whistle for which the degree of nonlinearity is controlled. A single parameter (α) changes the mapping from slide whistle position (the ‘articulator’) to the acoustics. With α=0, the position of the slide whistle maps Bark-linearly to the acoustics. As α approaches 1, the mapping gets more double-sigmoidal, creating three plateaus where large ranges of positions map to similar frequencies. In more abstract terms, α represents the strength of a nonlinear (anatomical) bias in the vocal tract.
    Six chains (138 participants) of dyads were tested, each chain with a different, fixed α. Participants had to communicate four meanings by producing a continuous signal using the slide-whistle in a ‘director-matcher’ game, alternating roles (cf. Garrod et al., 2007).
    Results show that for high αs, subjects quickly converged on the plateaus. This quick convergence is indicative of a strong bias, repelling subjects away from unstable regions already within-subject. Furthermore, high αs lead to the emergence of signals that oscillate between two (out of three) plateaus. Because the sigmoidal spaces are spatially constrained, participants increasingly used the sequential/temporal dimension. As a result of this, the average duration of signals with high α was ~100ms longer than with low α. These oscillations could be an expression of a basis for phonemic combinatoriality.
    We have shown that it is possible to manipulate the magnitude of an articulator-induced non-linear bias in a slide whistle IL framework. The results suggest that anatomical biases might indeed constrain the design space of language. In particular, the signaling systems in our study quickly converged (within-subject) on the use of stable regions. While these conclusions were drawn from experiments using slide whistles with a relatively strong bias, weaker biases could possibly be amplified over time by repeated cultural transmission, and likely lead to similar outcomes.
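
The α-controlled mapping described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual function, so everything here is an assumption made for illustration: the name `whistle_mapping`, the `steepness` parameter, the placement of the two sigmoids at 1/3 and 2/3 of the slider range, and the normalisation of both position and output to [0, 1] (the output standing in for a normalised Bark-scale pitch). With α=0 the mapping is linear; as α approaches 1 it becomes a sum of two sigmoids with three plateaus.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def whistle_mapping(position, alpha, steepness=40.0):
    """Map a slider position in [0, 1] to a normalised pitch in [0, 1].

    Hypothetical illustration only: the blend below is an assumed form,
    not the function used in the study.
    alpha = 0 -> purely linear mapping
    alpha = 1 -> double sigmoid with three plateaus (near 0, 0.5 and 1)
    """
    linear = position
    # Two sigmoids centred at 1/3 and 2/3 create three flat regions.
    double_sigmoid = 0.5 * (
        sigmoid(steepness * (position - 1.0 / 3.0))
        + sigmoid(steepness * (position - 2.0 / 3.0))
    )
    return (1.0 - alpha) * linear + alpha * double_sigmoid
```

Under these assumptions, moving the slider from 0.45 to 0.55 changes the output by 0.1 when α=0 but by less than 0.02 when α=1, which is the kind of plateau on which large ranges of positions map to similar frequencies.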
  • Janssen, R., Dediu, D., & Moisik, S. R. (2016). Simple agents are able to replicate speech sounds using 3d vocal tract model. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/97.html.

    Abstract

    Many factors have been proposed to explain why groups of people use different speech sounds in their language. These range from cultural, cognitive, environmental (e.g., Everett et al., 2015) to anatomical (e.g., vocal tract (VT) morphology). How could such anatomical properties have led to the similarities and differences in speech sound distributions between human languages?

    It is known that hard palate profile variation can induce different articulatory strategies in speakers (e.g., Brunner et al., 2009). That is, different hard palate profiles might induce a kind of bias on speech sound production, easing some types of sounds while impeding others. With a population of speakers (with a proportion of individuals) that share certain anatomical properties, even subtle VT biases might become expressed at a population-level (through e.g., bias amplification, Kirby et al., 2007). However, before we look into population-level effects, we should first look at within-individual anatomical factors. For that, we have developed a computer-simulated analogue for a human speaker: an agent. Our agent is designed to replicate speech sounds using a production and cognition module in a computationally tractable manner.

    Previous agent models have often used more abstract (e.g., symbolic) signals (e.g., Kirby et al., 2007). We have equipped our agent with a three-dimensional model of the VT (the production module, based on Birkholz, 2005) to which we made numerous adjustments. Specifically, we used a 4th-order Bezier curve that is able to capture hard palate variation on the mid-sagittal plane (XXX, 2015). Using an evolutionary algorithm, we were able to fit the model to human hard palate MRI tracings, yielding high-accuracy fits and using as few as two parameters. Finally, we show that the samples map well-dispersed to the parameter-space, demonstrating that the model cannot generate unrealistic profiles. We can thus use this procedure to import palate measurements into our agent’s production module to investigate the effects on acoustics. We can also exaggerate/introduce novel biases.

    Our agent is able to control the VT model using the cognition module.

    Previous research has focused on detailed neurocomputation (e.g., Kröger et al., 2014) that highlights e.g., neurobiological principles or speech recognition performance. However, the brain is not the focus of our current study. Furthermore, present-day computing throughput likely does not allow for large-scale deployment of these architectures, as required by the population model we are developing. Thus, the question whether a very simple cognition module is able to replicate sounds in a computationally tractable manner, and even generalize over novel stimuli, is one worthy of attention in its own right.

    Our agent’s cognition module is based on running an evolutionary algorithm on a large population of feed-forward neural networks (NNs). As such, (anatomical) bias strength can be thought of as an attractor basin area within the parameter-space the agent has to explore. The NN we used consists of a triple-layered (fully-connected), directed graph. The input layer (three neurons) receives the formant frequencies of a target-sound. The output layer (12 neurons) projects to the articulators in the production module. A hidden layer (seven neurons) enables the network to deal with nonlinear dependencies. The Euclidean distance (first three formants) between target and replication is used as the fitness measure. Results show that sound replication is indeed possible, with Euclidean distance quickly approaching a close-to-zero asymptote.

    Statistical analysis should reveal if the agent can also: a) Generalize: Can it replicate sounds it was not exposed to during learning? b) Replicate consistently: Do different, isolated agents always converge on the same sounds? c) Deal with consolidation: Can it still learn new sounds after an extended learning phase (‘infancy’) has been terminated? Finally, a comparison with more complex models will be used to demonstrate robustness.
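
The cognition module described in this abstract (a 3-7-12 feed-forward network whose weights are evolved, with Euclidean distance between target and replicated formants as fitness) can be sketched roughly as follows. Only the layer sizes and the fitness measure come from the abstract. The tanh activations, the bias terms, the mutation and selection scheme, and especially `toy_vocal_tract`, a trivial stand-in for the 3D vocal tract model, are assumptions made purely for illustration and do not reflect the authors' implementation.

```python
import math
import random

N_IN, N_HID, N_OUT = 3, 7, 12  # layer sizes given in the abstract


def n_weights():
    """Number of weights, with one bias per neuron (an assumption)."""
    return N_HID * (N_IN + 1) + N_OUT * (N_HID + 1)


def forward(weights, formants):
    """Feed-forward pass: 3 formant inputs -> 7 hidden -> 12 articulator outputs."""
    idx = 0
    hidden = []
    for _ in range(N_HID):
        total = weights[idx + N_IN]  # bias term
        for i in range(N_IN):
            total += weights[idx + i] * formants[i]
        hidden.append(math.tanh(total))
        idx += N_IN + 1
    outputs = []
    for _ in range(N_OUT):
        total = weights[idx + N_HID]  # bias term
        for j in range(N_HID):
            total += weights[idx + j] * hidden[j]
        outputs.append(math.tanh(total))
        idx += N_HID + 1
    return outputs


def toy_vocal_tract(articulators):
    """HYPOTHETICAL stand-in for the vocal tract model: 12 values -> 3 'formants'."""
    return [sum(articulators[k::3]) / 4.0 for k in range(3)]


def fitness(weights, target):
    """Euclidean distance between target and replicated formants (lower is better)."""
    produced = toy_vocal_tract(forward(weights, target))
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(produced, target)))


def evolve(target, pop_size=30, generations=40, sigma=0.2, seed=0):
    """Simple elitist evolutionary loop (assumed scheme): keep the best half, mutate copies."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(n_weights())] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, target))
        history.append(fitness(pop[0], target))
        parents = pop[: pop_size // 2]
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    pop.sort(key=lambda w: fitness(w, target))
    history.append(fitness(pop[0], target))
    return pop[0], history
```

Calling `evolve([0.2, -0.1, 0.4])` with normalised target formants returns the best weight vector and the best distance per generation. Because the best half of the population is carried over unchanged, that distance can never increase from one generation to the next.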
  • Janzen, G., & Weststeijn, C. (2004). Neural representation of object location and route direction: An fMRI study. NeuroImage, 22(Supplement 1), e634-e635.
  • Janzen, G., & Van Turennout, M. (2004). Neuronale Markierung navigationsrelevanter Objekte im räumlichen Gedächtnis: Ein fMRT Experiment. In D. Kerzel (Ed.), Beiträge zur 46. Tagung experimentell arbeitender Psychologen (p. 125). Lengerich: Pabst Science Publishers.
  • Jasmin, K., & Casasanto, D. (2010). Stereotyping: How the QWERTY keyboard shapes the mental lexicon [Abstract]. In Proceedings of the 16th Annual Conference on Architectures and Mechanisms for Language Processing [AMLaP 2010] (p. 159). York: University of York.
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Jesse, A., Reinisch, E., & Nygaard, L. C. (2010). Learning of adjectival word meaning through tone of voice [Abstract]. Journal of the Acoustical Society of America, 128, 2475.

    Abstract

    Speakers express word meaning through systematic but non-canonical acoustic variation of tone of voice (ToV), i.e., variation of speaking rate, pitch, vocal effort, or loudness. Words are, for example, pronounced at a higher pitch when referring to small than to big referents. In the present study, we examined whether listeners can use ToV to learn the meaning of novel adjectives (e.g., “blicket”). During training, participants heard sentences such as “Can you find the blicket one?” spoken with ToV representing hot-cold, strong-weak, and big-small. Participants’ eye movements to two simultaneously shown objects with properties representing the relevant two endpoints (e.g., an elephant and an ant for big-small) were monitored. Assignment of novel adjectives to endpoints was counterbalanced across participants. During test, participants heard the sentences spoken with a neutral ToV, while seeing old or novel picture pairs varying along the same dimensions (e.g., a truck and a car for big-small). Participants had to click on the adjective’s referent. As evident from eye movements, participants did not infer the intended meaning during first exposure, but learned the meaning with the help of ToV during training. At test listeners applied this knowledge to old and novel items even in the absence of informative ToV.
  • Johns, T. G., Perera, R. M., Vitali, A. A., Vernes, S. C., & Scott, A. (2004). Phosphorylation of a glioma-specific mutation of the EGFR [Abstract]. Neuro-Oncology, 6, 317.

    Abstract

    Mutations of the epidermal growth factor receptor (EGFR) gene are found at a relatively high frequency in glioma, with the most common being the de2-7 EGFR (or EGFRvIII). This mutation arises from an in-frame deletion of exons 2-7, which removes 267 amino acids from the extracellular domain of the receptor. Despite being unable to bind ligand, the de2-7 EGFR is constitutively active at a low level. Transfection of human glioma cells with the de2-7 EGFR has little effect in vitro, but when grown as tumor xenografts this mutated receptor imparts a dramatic growth advantage. We mapped the phosphorylation pattern of de2-7 EGFR, both in vivo and in vitro, using a panel of antibodies specific for different phosphorylated tyrosine residues. Phosphorylation of de2-7 EGFR was detected constitutively at all tyrosine sites surveyed in vitro and in vivo, including tyrosine 845, a known target in the wild-type EGFR for src kinase. There was a substantial upregulation of phosphorylation at every tyrosine residue of the de2-7 EGFR when cells were grown in vivo compared to the receptor isolated from cells cultured in vitro. Upregulation of phosphorylation at tyrosine 845 could be stimulated in vitro by the addition of specific components of the ECM via an integrin-dependent mechanism. These observations may partially explain why the growth enhancement mediated by de2-7 EGFR is largely restricted to the in vivo environment.
