Publications

  • Little, H. (Ed.). (2017). Special Issue on the Emergence of Sound Systems [Special Issue]. The Journal of Language Evolution, 2(1).
  • Liu, Z., Chen, A., & Van de Velde, H. (2014). Prosodic focus marking in Bai. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 628-631).

    Abstract

    This study investigates prosodic marking of focus in Bai, a Sino-Tibetan language spoken in the Southwest of China, by adopting a semi-spontaneous experimental approach. Our data show that Bai speakers increase the duration of the focused constituent and reduce the duration of the post-focus constituent to encode focus. However, duration is not used in Bai to distinguish focus types differing in size and contrastivity. Further, pitch plays no role in signaling focus and differentiating focus types. The results thus suggest that Bai uses prosody to mark focus, but to a lesser extent, compared to Mandarin Chinese, with which Bai has been in close contact for centuries, and Cantonese, to which Bai is similar in the tonal system, although Bai is similar to Cantonese in its reliance on duration in prosodic focus marking.
  • Lockwood, G., Hagoort, P., & Dingemanse, M. (2016). Synthesized Size-Sound Sound Symbolism. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1823-1828). Austin, TX: Cognitive Science Society.

    Abstract

    Studies of sound symbolism have shown that people can associate sound and meaning in consistent ways when presented with maximally contrastive stimulus pairs of nonwords such as bouba/kiki (rounded/sharp) or mil/mal (small/big). Recent work has shown the effect extends to antonymic words from natural languages and has proposed a role for shared cross-modal correspondences in biasing form-to-meaning associations. An important open question is how the associations work, and particularly what the role is of sound-symbolic matches versus mismatches. We report on a learning task designed to distinguish between three existing theories by using a spectrum of sound-symbolically matching, mismatching, and neutral (neither matching nor mismatching) stimuli. Synthesized stimuli allow us to control for prosody, and the inclusion of a neutral condition allows a direct test of competing accounts. We find evidence for a sound-symbolic match boost, but not for a mismatch difficulty compared to the neutral condition.
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources (LiNCR)" (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are accompanied by stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained nature of the data allow the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies. We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural language processing in the brain and further stimulating data sharing in the field of neuroscience of language.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Macuch Silva, V., & Roberts, S. G. (2016). Language adapts to signal disruption in interaction. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/20.html.

    Abstract

    Linguistic traits are often seen as reflecting cognitive biases and constraints (e.g. Christiansen & Chater, 2008). However, language must also adapt to properties of the channel through which communication between individuals occurs. Perhaps the most basic aspect of any communication channel is noise. Communicative signals can be blocked, degraded or distorted by other sources in the environment. This poses a fundamental problem for communication. On average, channel disruption accompanies problems in conversation every 3 minutes (27% of cases of other-initiated repair, Dingemanse et al., 2015). Linguistic signals must adapt to this harsh environment. While modern language structures are robust to noise (e.g. Piantadosi et al., 2011), we investigate how noise might have shaped the early emergence of structure in language. The obvious adaptation to noise is redundancy. Signals which are maximally different from competitors are harder to render ambiguous by noise. Redundancy can be increased by adding differentiating segments to each signal (increasing the diversity of segments). However, this makes each signal more complex and harder to learn. Under this strategy, holistic languages may emerge. Another strategy is reduplication - repeating parts of the signal so that noise is less likely to disrupt all of the crucial information. This strategy does not increase the difficulty of learning the language - there is only one extra rule which applies to all signals. Therefore, under pressures for learnability, expressivity and redundancy, reduplicated signals are expected to emerge. However, reduplication is not a pervasive feature of words (though it does occur in limited domains like plurals or iconic meanings). We suggest that this is due to the pressure for redundancy being lifted by conversational infrastructure for repair. Receivers can request that senders repeat signals only after a problem occurs. 
    That is, robustness is achieved by repeating the signal across conversational turns (when needed) instead of within single utterances. As a proof of concept, we ran two iterated learning chains with pairs of individuals in generations learning and using an artificial language (e.g. Kirby et al., 2015). The meaning space was a structured collection of unfamiliar images (3 shapes x 2 textures x 2 outline types). The initial language for each chain was the same written, unstructured, fully expressive language. Signals produced in each generation formed the training language for the next generation. Within each generation, pairs played an interactive communication game. The director was given a target meaning to describe, and typed a word for the matcher, who guessed the target meaning from a set. With a 50% probability, a contiguous section of 3-5 characters in the typed word was replaced by ‘noise’ characters (#). In one chain, the matcher could initiate repair by requesting that the director type and send another signal. Parallel generations across chains were matched for the number of signals sent (if repair was initiated for a meaning, then it was presented twice in the parallel generation where repair was not possible) and noise (a signal for a given meaning which was affected by noise in one generation was affected by the same amount of noise in the parallel generation). For the final set of signals produced in each generation we measured the signal redundancy (the zip compressibility of the signals), the character diversity (entropy of the characters of the signals) and systematic structure (z-score of the correlation between signal edit distance and meaning hamming distance).
    In the condition without repair, redundancy increased with each generation (r = 0.97, p = 0.01), and the character diversity decreased (r = -0.99, p = 0.001), which is consistent with reduplication. Linear regressions revealed that generations with repair had higher overall systematic structure (main effect of condition, t = 2.5, p < 0.05), increasing character diversity (interaction between condition and generation, t = 3.9, p = 0.01) and a slower increase in redundancy (interaction between condition and generation, t = -2.5, p < 0.05). That is, the ability to repair counteracts the pressure from noise, and facilitates the emergence of compositional structure. Therefore, just as systems to repair damage to DNA replication are vital for the evolution of biological species (O’Brien, 2006), conversational repair may regulate replication of linguistic forms in the cultural evolution of language. Future studies should further investigate how evolving linguistic structure is shaped by interaction pressures, drawing on experimental methods and naturalistic studies of emerging languages, both spoken (e.g. Botha, 2006; Roberge, 2008) and signed (e.g. Senghas, Kita, & Ozyurek, 2004; Sandler et al., 2005).
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Majid, A., Van Staden, M., & Enfield, N. J. (2004). The human body in cognition, brain, and typology. In K. Hovie (Ed.), Forum Handbook, 4th International Forum on Language, Brain, and Cognition - Cognition, Brain, and Typology: Toward a Synthesis (pp. 31-35). Sendai: Tohoku University.

    Abstract

    The human body is unique: it is both an object of perception and the source of human experience. Its universality makes it a perfect resource for asking questions about how cognition, brain and typology relate to one another. For example, we can ask how speakers of different languages segment and categorize the human body. A dominant view is that body parts are “given” by visual perceptual discontinuities, and that words are merely labels for these visually determined parts (e.g., Andersen, 1978; Brown, 1976; Lakoff, 1987). However, there are problems with this view. First, it ignores other perceptual information, such as somatosensory and motoric representations. By looking at the neural representations of sensory representations, we can test how much of the categorization of the human body can be done through perception alone. Second, we can look at language typology to see how much universality and variation there is in body-part categories. A comparison of a range of typologically, genetically and areally diverse languages shows that the perceptual view has only limited applicability (Majid, Enfield & van Staden, in press). For example, using a “coloring-in” task, where speakers of seven different languages were given a line drawing of a human body and asked to color in various body parts, Majid & van Staden (in prep) show that languages vary substantially in body part segmentation. For example, Jahai (Mon-Khmer) makes a lexical distinction between upper arm, lower arm, and hand, but Lavukaleve (Papuan Isolate) has just one word to refer to arm, hand, and leg. This shows that body part categorization is not a straightforward mapping of words to visually determined perceptual parts.
  • Majid, A., Van Staden, M., Boster, J. S., & Bowerman, M. (2004). Event categorization: A cross-linguistic perspective. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 885-890). Mahwah, NJ: Erlbaum.

    Abstract

    Many studies in cognitive science address how people categorize objects, but there has been comparatively little research on event categorization. This study investigated the categorization of events involving material destruction, such as “cutting” and “breaking”. Speakers of 28 typologically, genetically, and areally diverse languages described events shown in a set of video-clips. There was considerable cross-linguistic agreement in the dimensions along which the events were distinguished, but there was variation in the number of categories and the placement of their boundaries.
  • Majid, A. (2013). Olfactory language and cognition. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual meeting of the Cognitive Science Society (CogSci 2013) (p. 68). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0025/index.html.

    Abstract

    Since the cognitive revolution, a widely held assumption has been that—whereas content may vary across cultures—cognitive processes would be universal, especially those on the more basic levels. Even if scholars do not fully subscribe to this assumption, they often conceptualize, or tend to investigate, cognition as if it were universal (Henrich, Heine, & Norenzayan, 2010). The insight that universality must not be presupposed but scrutinized is now gaining ground, and cognitive diversity has become one of the hot (and controversial) topics in the field (Norenzayan & Heine, 2005). We argue that, for scrutinizing the cultural dimension of cognition, taking an anthropological perspective is invaluable, not only for the task itself, but for attenuating the home-field disadvantages that are inescapably linked to cross-cultural research (Medin, Bennis, & Chandler, 2010).
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2017). Whether long-term tracking of speech rate affects perception depends on who is talking. In Proceedings of Interspeech 2017 (pp. 586-590). doi:10.21437/Interspeech.2017-1517.

    Abstract

    Speech rate is known to modulate perception of temporally ambiguous speech sounds. For instance, a vowel may be perceived as short when the immediate speech context is slow, but as long when the context is fast. Yet, effects of long-term tracking of speech rate are largely unexplored. Two experiments tested whether long-term tracking of rate influences perception of the temporal Dutch vowel contrast /ɑ/-/a:/. In Experiment 1, one low-rate group listened to 'neutral' rate speech from talker A and to slow speech from talker B. Another high-rate group was exposed to the same neutral speech from A, but to fast speech from B. Between-group comparison of the 'neutral' trials revealed that the low-rate group reported a higher proportion of /a:/ in A's 'neutral' speech, indicating that A sounded faster when B was slow. Experiment 2 tested whether one's own speech rate also contributes to effects of long-term tracking of rate. Here, talker B's speech was replaced by playback of participants' own fast or slow speech. No evidence was found that one's own voice affected perception of talker A in larger speech contexts. These results carry implications for our understanding of the mechanisms involved in rate-dependent speech perception and of dialogue.
  • Matic, D., & Nikolaeva, I. (2014). Focus feature percolation: Evidence from Tundra Nenets and Tundra Yukaghir. In S. Müller (Ed.), Proceedings of the 21st International Conference on Head-Driven Phrase Structure Grammar (HPSG 2014) (pp. 299-317). Stanford, CA: CSLI Publications.

    Abstract

    Two Siberian languages, Tundra Nenets and Tundra Yukaghir, do not obey strong island constraints in questioning: any sub-constituent of a relative or adverbial clause can be questioned. We argue that this has to do with how focusing works in these languages. The focused sub-constituent remains in situ, but there is abundant morphosyntactic evidence that the focus feature is passed up to the head of the clause. The result is the formation of a complex focus structure in which both the head and non-head daughter are overtly marked as focus, and they are interpreted as a pairwise list such that the focus background is applicable to this list, but not to other alternative lists.
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • McQueen, J. M., & Mitterer, H. (2005). Lexically-driven perceptual adjustments of vowel categories. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP2005) (pp. 233-236).
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Meyer, A. S., & Huettig, F. (Eds.). (2016). Speaking and Listening: Relationships Between Language Production and Comprehension [Special Issue]. Journal of Memory and Language, 89.
  • Micklos, A. (2016). Interaction for facilitating conventionalization: Negotiating the silent gesture communication of noun-verb pairs. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/143.html.

    Abstract

    This study demonstrates how interaction – specifically negotiation and repair – facilitates the emergence, evolution, and conventionalization of a silent gesture communication system. In a modified iterated learning paradigm, partners communicated noun-verb meanings using only silent gesture. The need to disambiguate similar noun-verb pairs drove these "new" language users to develop a morphology that allowed for quicker processing, easier transmission, and improved accuracy. The specific morphological system that emerged came about through a process of negotiation within the dyad, namely by means of repair. By applying a discourse analytic approach to the use of repair in an experimental methodology for language evolution, we are able to determine not only if interaction facilitates the emergence and learnability of a new communication system, but also how interaction affects such a system.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
  • Micklos, A. (2014). The nature of language in interaction. In E. Cartmill, S. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference.
  • Mitterer, H. (2005). Short- and medium-term plasticity for speaker adaptation seem to be independent. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP2005) (pp. 83-86).
  • Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.

    Abstract

    This paper describes the pilot study of phonetic segmentation applied to Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of strong spontaneous nature which influences the character of produced speech at various levels. This work is the part of wider research related to the analysis of pronunciation reduction in such informal speech. We present the analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of realized phonetic segmentation provides information about general accuracy of proper acoustic modelling which is supposed to be applied in spontaneous speech recognition. As a byproduct of presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation check of lexicon items, and finally also a minidatabase of selected utterances from NCCCz manually labelled on phonetic level suitable for evaluation purposes.
  • Monaghan, P., Brand, J., Frost, R. L. A., & Taylor, G. (2017). Multiple variable cues in the environment promote accurate and robust word learning. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 817-822). Retrieved from https://mindmodeling.org/cogsci2017/papers/0164/index.html.

    Abstract

    Learning how words refer to aspects of the environment is a complex task, but one that is supported by numerous cues within the environment which constrain the possibilities for matching words to their intended referents. In this paper we tested the predictions of a computational model of multiple cue integration for word learning, that predicted variation in the presence of cues provides an optimal learning situation. In a cross-situational learning task with adult participants, we varied the reliability of presence of distributional, prosodic, and gestural cues. We found that the best learning occurred when cues were often present, but not always. The effect of variability increased the salience of individual cues for the learner, but resulted in robust learning that was not vulnerable to individual cues’ presence or absence. Thus, variability of multiple cues in the language-learning environment provided the optimal circumstances for word learning.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2016). Comparing different methods for analyzing ERP signals. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 1373-1377). doi:10.21437/Interspeech.2016-967.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2018). Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models. In Proceedings of Interspeech 2018 (pp. 1452-1456). doi:10.21437/Interspeech.2018-1676.

    Abstract

    Analyzing EEG signals recorded while participants are listening to continuous speech with the purpose of testing linguistic hypotheses is complicated by the fact that the signals simultaneously reflect exogenous acoustic excitation and endogenous linguistic processing. This makes it difficult to trace subtle differences that occur in mid-sentence position. We apply an analysis based on multivariate temporal response functions to uncover subtle mid-sentence effects. This approach is based on a per-stimulus estimate of the response of the neural system to speech input. Analyzing EEG signals predicted on the basis of the response functions might then bring to light condition-specific differences in the filtered signals. We validate this approach by means of an analysis of EEG signals recorded with isolated word stimuli. Then, we apply the validated method to the analysis of the responses to the same words in the middle of meaningful sentences.
  • Ortega, G., & Ozyurek, A. (2016). Generalisable patterns of gesture distinguish semantic categories in communication without language. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1182-1187). Austin, TX: Cognitive Science Society.

    Abstract

    There is a long-standing assumption that gestural forms are geared by a set of modes of representation (acting, representing, drawing, moulding) with each technique expressing speakers’ focus of attention on specific aspects of referents (Müller, 2013). Beyond different taxonomies describing the modes of representation, it remains unclear what factors motivate certain depicting techniques over others. Results from a pantomime generation task show that pantomimes are not entirely idiosyncratic but rather follow generalisable patterns constrained by their semantic category. We show that a) specific modes of representations are preferred for certain objects (acting for manipulable objects and drawing for non-manipulable objects); and b) that use and ordering of deictics and modes of representation operate in tandem to distinguish between semantically related concepts (e.g., “to drink” vs “mug”). This study provides yet more evidence that our ability to communicate through silent gesture reveals systematic ways to describe events and objects around us.
  • Ortega, G., & Ozyurek, A. (2013). Gesture-sign interface in hearing non-signers' first exposure to sign. In Proceedings of the Tilburg Gesture Research Meeting [TiGeR 2013].

    Abstract

    Natural sign languages and gestures are complex communicative systems that allow the incorporation of features of a referent into their structure. They differ, however, in that signs are more conventionalised because they consist of meaningless phonological parameters. There is some evidence that despite non-signers finding iconic signs more memorable they can have more difficulty at articulating their exact phonological components. In the present study, hearing non-signers took part in a sign repetition task in which they had to imitate as accurately as possible a set of iconic and arbitrary signs. Their renditions showed that iconic signs were articulated significantly less accurately than arbitrary signs. Participants were recalled six months later to take part in a sign generation task. In this task, participants were shown the English translation of the iconic signs they imitated six months prior. For each word, participants were asked to generate a sign (i.e., an iconic gesture). The handshapes produced in the sign repetition and sign generation tasks were compared to detect instances in which both renditions presented the same configuration. There was a significant correlation between articulation accuracy in the sign repetition task and handshape overlap. These results suggest some form of gestural interference in the production of iconic signs by hearing non-signers. We also suggest that in some instances non-signers may deploy their own conventionalised gesture when producing some iconic signs. These findings are interpreted as evidence that non-signers process iconic signs as gestures and that in production, only when sign and gesture have overlapping features will they be capable of producing the phonological components of signs accurately.
  • Ortega, G., Schiefner, A., & Ozyurek, A. (2017). Speakers’ gestures predict the meaning and perception of iconicity in signs. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 889-894). Austin, TX: Cognitive Science Society.

    Abstract

    Sign languages stand out in that there is high prevalence of conventionalised linguistic forms that map directly to their referent (i.e., iconic). Hearing adults show low performance when asked to guess the meaning of iconic signs suggesting that their iconic features are largely inaccessible to them. However, it has not been investigated whether speakers’ gestures, which also share the property of iconicity, may assist non-signers in guessing the meaning of signs. Results from a pantomime generation task (Study 1) show that speakers’ gestures exhibit a high degree of systematicity, and share different degrees of form overlap with signs (full, partial, and no overlap). Study 2 shows that signs with full and partial overlap are more accurately guessed and are assigned higher iconicity ratings than signs with no overlap. Deaf and hearing adults converge in their iconic depictions for some concepts due to the shared conceptual knowledge and manual-visual modality.
  • Ortega, G., Sumer, B., & Ozyurek, A. (2014). Type of iconicity matters: Bias for action-based signs in sign language acquisition. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1114-1119). Austin, TX: Cognitive Science Society.

    Abstract

    Early studies investigating sign language acquisition claimed that signs whose structures are motivated by the form of their referent (iconic) are not favoured in language development. However, recent work has shown that the first signs in deaf children’s lexicon are iconic. In this paper we go a step further and ask whether different types of iconicity modulate learning sign-referent links. Results from a picture description task indicate that children and adults used signs with two possible variants differentially. While children signing to adults favoured variants that map onto actions associated with a referent (action signs), adults signing to another adult produced variants that map onto objects’ perceptual features (perceptual signs). Parents interacting with children used more action variants than signers in adult-adult interactions. These results are in line with claims that language development is tightly linked to motor experience and that iconicity can be a communicative strategy in parental input.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Peeters, D., Chu, M., Holler, J., Ozyurek, A., & Hagoort, P. (2013). Getting to the point: The influence of communicative intent on the kinematics of pointing gestures. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1127-1132). Austin, TX: Cognitive Science Society.

    Abstract

    In everyday communication, people not only use speech but also hand gestures to convey information. One intriguing question in gesture research has been why gestures take the specific form they do. Previous research has identified the speaker-gesturer’s communicative intent as one factor shaping the form of iconic gestures. Here we investigate whether communicative intent also shapes the form of pointing gestures. In an experimental setting, twenty-four participants produced pointing gestures identifying a referent for an addressee. The communicative intent of the speaker-gesturer was manipulated by varying the informativeness of the pointing gesture. A second independent variable was the presence or absence of concurrent speech. As a function of their communicative intent and irrespective of the presence of speech, participants varied the durations of the stroke and the post-stroke hold-phase of their gesture. These findings add to our understanding of how the communicative context influences the form that a gesture takes.
  • Peeters, D. (2016). Processing consequences of onomatopoeic iconicity in spoken language comprehension. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1632-1647). Austin, TX: Cognitive Science Society.

    Abstract

    Iconicity is a fundamental feature of human language. However, its processing consequences at the behavioral and neural level in spoken word comprehension are not well understood. The current paper presents the behavioral and electrophysiological outcome of an auditory lexical decision task in which native speakers of Dutch listened to onomatopoeic words and matched control words while their electroencephalogram was recorded. Behaviorally, onomatopoeic words were processed as quickly and accurately as words with an arbitrary mapping between form and meaning. Event-related potentials time-locked to word onset revealed a significant decrease in negative amplitude in the N2 and N400 components and a late positivity for onomatopoeic words in comparison to the control words. These findings advance our understanding of the temporal dynamics of iconic form-meaning mapping in spoken word comprehension and suggest interplay between the neural representations of real-world sounds and spoken words.
  • Peeters, D., Azar, Z., & Ozyurek, A. (2014). The interplay between joint attention, physical proximity, and pointing gesture in demonstrative choice. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1144-1149). Austin, TX: Cognitive Science Society.
  • Perlman, M., Clark, N., & Tanner, J. (2014). Iconicity and ape gesture. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 236-243). New Jersey: World Scientific.

    Abstract

    Iconic gestures are hypothesized to be crucial to the evolution of language. Yet the important question of whether apes produce iconic gestures is the subject of considerable debate. This paper presents the current state of research on iconicity in ape gesture. In particular, it describes some of the empirical evidence suggesting that apes produce three different kinds of iconic gestures; it compares the iconicity hypothesis to other major hypotheses of ape gesture; and finally, it offers some directions for future ape gesture research.
  • Perlman, M., Fusaroli, R., Fein, D., & Naigles, L. (2017). The use of iconic words in early child-parent interactions. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 913-918). Austin, TX: Cognitive Science Society.

    Abstract

    This paper examines the use of iconic words in early conversations between children and caregivers. The longitudinal data include a span of six observations of 35 children-parent dyads in the same semi-structured activity. Our findings show that children’s speech initially has a high proportion of iconic words, and over time, these words become diluted by an increase of arbitrary words. Parents’ speech is also initially high in iconic words, with a decrease in the proportion of iconic words over time – in this case driven by the use of fewer iconic words. The level and development of iconicity are related to individual differences in the children’s cognitive skills. Our findings fit with the hypothesis that iconicity facilitates early word learning and may play an important role in learning to produce new words.
  • Petersson, K. M., Grenholm, P., & Forkstam, C. (2005). Artificial grammar learning and neural networks. In G. B. Bruna, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the Cognitive Science Society (pp. 1726-1731).

    Abstract

    Recent fMRI studies indicate that language-related brain regions are engaged in artificial grammar (AG) processing. In the present study we investigate the Reber grammar by means of formal analysis and network simulations. We outline a new method for describing the network dynamics and propose an approach to grammar extraction based on the state-space dynamics of the network. We conclude that statistical frequency-based and rule-based acquisition procedures can be viewed as complementary perspectives on grammar learning, and more generally, that classical cognitive models can be viewed as a special case of a dynamical systems perspective on information processing.
  • Piai, V., Roelofs, A., Jensen, O., Schoffelen, J.-M., & Bonnefond, M. (2013). Distinct patterns of brain activity characterize lexical activation and competition in speech production [Abstract]. Journal of Cognitive Neuroscience, 25 Suppl., 106.

    Abstract

    A fundamental ability of speakers is to quickly retrieve words from long-term memory. According to a prominent theory, concepts activate multiple associated words, which enter into competition for selection. Previous electrophysiological studies have provided evidence for the activation of multiple alternative words, but did not identify brain responses reflecting competition. We report a magnetoencephalography study examining the timing and neural substrates of lexical activation and competition. The degree of activation of competing words was manipulated by presenting pictures (e.g., dog) simultaneously with distractor words. The distractors were semantically related to the picture name (cat), unrelated (pin), or identical (dog). Semantic distractors are stronger competitors to the picture name, because they receive additional activation from the picture, whereas unrelated distractors do not. Picture naming times were longer with semantic than with unrelated and identical distractors. The patterns of phase-locked and non-phase-locked activity were distinct but temporally overlapping. Phase-locked activity in left middle temporal gyrus, peaking at 400 ms, was larger on unrelated than semantic and identical trials, suggesting differential effort in processing the alternative words activated by the picture-word stimuli. Non-phase-locked activity in the 4-10 Hz range between 400-650 ms in left superior frontal gyrus was larger on semantic than unrelated and identical trials, suggesting different degrees of effort in resolving the competition among the alternative words, as reflected in the naming times. These findings characterize distinct patterns of brain activity associated with lexical activation and competition, respectively, and their temporal relation, supporting the theory that words are selected by competition.
  • Poletiek, F. H., & Rassin, E. (Eds.). (2005). Het (on)bewuste [Special Issue]. De Psycholoog.
  • Popov, V., Ostarek, M., & Tenison, C. (2017). Inferential Pitfalls in Decoding Neural Representations. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 961-966). Austin, TX: Cognitive Science Society.

    Abstract

    A key challenge for cognitive neuroscience is to decipher the representational schemes of the brain. A recent class of decoding algorithms for fMRI data, stimulus-feature-based encoding models, is becoming increasingly popular for inferring the dimensions of neural representational spaces from stimulus-feature spaces. We argue that such inferences are not always valid, because decoding can occur even if the neural representational space and the stimulus-feature space use different representational schemes. This can happen when there is a systematic mapping between them. In a simulation, we successfully decoded the binary representation of numbers from their decimal features. Since binary and decimal number systems use different representations, we cannot conclude that the binary representation encodes decimal features. The same argument applies to the decoding of neural patterns from stimulus-feature spaces and we urge caution in inferring the nature of the neural code from such methods. We discuss ways to overcome these inferential limitations.
  • Pouw, W., Aslanidou, A., Kamermans, K. L., & Paas, F. (2017). Is ambiguity detection in haptic imagery possible? Evidence for Enactive imaginings. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 2925-2930). Austin, TX: Cognitive Science Society.

    Abstract

    A classic discussion about visual imagery is whether it affords reinterpretation, like discovering two interpretations in the duck/rabbit illustration. Recent findings converge on reinterpretation being possible in visual imagery, suggesting functional equivalence with pictorial representations. However, it is unclear whether such reinterpretations are necessarily a visual-pictorial achievement. To assess this, 68 participants were briefly presented 2-d ambiguous figures. One figure was presented visually, the other via manual touch alone. Afterwards participants mentally rotated the memorized figures so as to discover a novel interpretation. A portion (20.6%) of the participants detected a novel interpretation in visual imagery, replicating previous research. Strikingly, 23.6% of participants were able to reinterpret figures they had only felt. That reinterpretation truly involved haptic processes was further supported, as some participants performed co-thought gestures on an imagined figure during retrieval. These results are promising for further development of an Enactivist approach to imagination.
  • Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech 2018 (pp. 1200-1204). doi:10.21437/Interspeech.2018-1047.

    Abstract

    Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
  • Ravignani, A., Gingras, B., Asano, R., Sonnweber, R., Matellan, V., & Fitch, W. T. (2013). The evolution of rhythmic cognition: New perspectives and technologies in comparative research. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 1199-1204). Austin, TX: Cognitive Science Society.

    Abstract

    Music is a pervasive phenomenon in human culture, and musical rhythm is virtually present in all musical traditions. Research on the evolution and cognitive underpinnings of rhythm can benefit from a number of approaches. We outline key concepts and definitions, allowing fine-grained analysis of rhythmic cognition in experimental studies. We advocate comparative animal research as a useful approach to answer questions about human music cognition and review experimental evidence from different species. Finally, we suggest future directions for research on the cognitive basis of rhythm. Apart from research in semi-natural setups, possibly allowed by “drum set for chimpanzees” prototypes presented here for the first time, mathematical modeling and systematic use of circular statistics may allow promising advances.
  • Ravignani, A., Garcia, M., Gross, S., de Reus, K., Hoeksema, N., Rubio-Garcia, A., & de Boer, B. (2018). Pinnipeds have something to say about speech and rhythm. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 399-401). Toruń, Poland: NCU Press. doi:10.12775/3991-1.095.
  • Ravignani, A., Bowling, D., & Kirby, S. (2014). The psychology of biological clocks: A new framework for the evolution of rhythm. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 262-269). Singapore: World Scientific.
  • Raviv, L., & Arnon, I. (2016). The developmental trajectory of children's statistical learning abilities. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1469-1474). Austin, TX: Cognitive Science Society.

    Abstract

    Infants, children and adults are capable of implicitly extracting regularities from their environment through statistical learning (SL). SL is present from early infancy and found across tasks and modalities, raising questions about the domain generality of SL. However, little is known about its developmental trajectory: Is SL a fully developed capacity in infancy, or does it improve with age, like other cognitive skills? While SL is well established in infants and adults, only a few studies have looked at SL across development, with conflicting results: some find age-related improvements while others do not. Importantly, despite its postulated role in language learning, no study has examined the developmental trajectory of auditory SL throughout childhood. Here, we conduct a large-scale study of children's auditory SL across a wide age-range (5-12y, N=115). Results show that auditory SL does not change much across development. We discuss implications for modality-based differences in SL and for its role in language acquisition.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2018). The role of community size in the emergence of linguistic structure. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 402-404). Toruń, Poland: NCU Press. doi:10.12775/3991-1.096.
  • Raviv, L., & Arnon, I. (2016). Language evolution in the lab: The case of child learners. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1643-1648). Austin, TX: Cognitive Science Society.

    Abstract

    Recent work suggests that cultural transmission can lead to the emergence of linguistic structure as speakers’ weak individual biases become amplified through iterated learning. However, to date, no published study has demonstrated a similar emergence of linguistic structure in children. This gap is problematic given that languages are mainly learned by children and that adults may bring existing linguistic biases to the task. Here, we conduct a large-scale study of iterated language learning in both children and adults, using a novel, child-friendly paradigm. The results show that while children make more mistakes overall, their languages become more learnable and show learnability biases similar to those of adults. Child languages did not show a significant increase in linguistic structure over time, but consistent mappings between meanings and signals did emerge on many occasions, as found with adults. This provides the first demonstration that cultural transmission affects the languages children and adults produce similarly.
  • Roberts, S. G., Dediu, D., & Levinson, S. C. (2014). Detecting differences between the languages of Neandertals and modern humans. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 501-502). Singapore: World Scientific.

    Abstract

    Dediu and Levinson (2013) argue that Neandertals had essentially modern language and speech, and that they were in genetic contact with the ancestors of modern humans during our dispersal out of Africa. This raises the possibility of cultural and linguistic contact between the two human lineages. If such contact did occur, then it might have influenced the cultural evolution of the languages. Since the genetic traces of contact with Neandertals are limited to the populations outside of Africa, Dediu & Levinson predict that there may be structural differences between the present-day languages derived from languages in contact with Neandertals, and those derived from languages that were not influenced by such contact. Since the signature of such deep contact might reside in patterns of features, they suggested that machine learning methods may be able to detect these differences. This paper attempts to test this hypothesis and to estimate particular linguistic features that are potential candidates for carrying a signature of Neandertal languages.
  • Roberts, S. G. (2013). A Bottom-up approach to the cultural evolution of bilingualism. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1229-1234). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0236/index.html.

    Abstract

    The relationship between individual cognition and cultural phenomena at the society level can be transformed by cultural transmission (Kirby, Dowman, & Griffiths, 2007). Top-down models of this process have typically assumed that individuals only adopt a single linguistic trait. Recent extensions include ‘bilingual’ agents, able to adopt multiple linguistic traits (Burkett & Griffiths, 2010). However, bilingualism is more than variation within an individual: it involves the conditional use of variation with different interlocutors. That is, bilingualism is a property of a population that emerges from use. A bottom-up simulation is presented where learners are sensitive to the identity of other speakers. The simulation reveals that dynamic social structures are a key factor for the evolution of bilingualism in a population, a feature that was abstracted away in the top-down models. Top-down and bottom-up approaches may lead to different answers, but can work together to reveal and explore important features of the cultural transmission process.
  • Roberts, S. G., & De Vos, C. (2014). Gene-culture coevolution of a linguistic system in two modalities. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 23-27).

    Abstract

    Complex communication can take place in a range of modalities such as auditory, visual, and tactile modalities. In a very general way, the modality that individuals use is constrained by their biological biases (humans cannot use magnetic fields directly to communicate with each other). The majority of natural languages have a large audible component. However, since humans can learn sign languages just as easily, it’s not clear to what extent the prevalence of spoken languages is due to biological biases, the social environment or cultural inheritance. This paper suggests that we can explore the relative contribution of these factors by modelling the spontaneous emergence of sign languages that are shared by the deaf and hearing members of relatively isolated communities. Such shared signing communities have arisen in enclaves around the world and may provide useful insights by demonstrating how languages evolve when the deaf proportion of a community’s members has strong biases towards the visual language modality. In this paper we describe a model of cultural evolution in two modalities, combining aspects that are thought to impact the emergence of sign languages in a more general evolutionary framework. The model can be used to explore hypotheses about how sign languages emerge.
  • Roberts, S. G., Cuskley, C., McCrohon, L., Barceló-Coblijn, L., Feher, O., & Verhoef, T. (Eds.). (2016). The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195.
  • Roberts, S. G., Thompson, B., & Smith, K. (2014). Social interaction influences the evolution of cognitive biases for language. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 278-285). Singapore: World Scientific. doi:10.1142/9789814603638_0036.

    Abstract

    Models of cultural evolution demonstrate that the link between individual biases and population-level phenomena can be obscured by the process of cultural transmission (Kirby, Dowman, & Griffiths, 2007). However, recent extensions to these models predict that linguistic diversity will not emerge and that learners should evolve to expect little linguistic variation in their input (Smith & Thompson, 2012). We demonstrate that this result derives from assumptions that privilege certain kinds of social interaction by exploring a range of alternative social models. We find several evolutionary routes to linguistic diversity, and show that social interaction not only influences the kinds of biases which could evolve to support language, but also the effects those biases have on a linguistic system. Given the same starting situation, the evolution of biases for language learning and the distribution of linguistic variation are affected by the kinds of social interaction that a population privileges.
  • Rodd, J., & Chen, A. (2016). Pitch accents show a perceptual magnet effect: Evidence of internal structure in intonation categories. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 697-701).

    Abstract

    The question of whether intonation events have a categorical mental representation has long been a puzzle in prosodic research, and one that experiments testing production and perception across category boundaries have failed to definitively resolve. This paper takes the alternative approach of looking for evidence of structure within a postulated category by testing for a Perceptual Magnet Effect (PME). PME has been found in boundary tones but has not previously been conclusively found in pitch accents. In this investigation, perceived goodness and discriminability of re-synthesised Dutch nuclear rise contours (L*H H%) were evaluated by naive native speakers of Dutch. The variation between these stimuli was quantified using a polynomial-parametric modelling approach (i.e. the SOCoPaSul model) in place of the traditional approach whereby excursion size, peak alignment and pitch register are used independently of each other to quantify variation between pitch accents. Using this approach to calculate the acoustic-perceptual distance between different stimuli, PME was detected: (1) rated goodness decreased as acoustic-perceptual distance relative to the prototype increased, and (2) equally spaced items far from the prototype were less frequently generalised than equally spaced items in the neighbourhood of the prototype. These results support the concept of categorically distinct intonation events.

  • Romberg, A., Zhang, Y., Newman, B., Triesch, J., & Yu, C. (2016). Global and local statistical regularities control visual attention to object sequences. In Proceedings of the 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 262-267).

    Abstract

    Many previous studies have shown that both infants and adults are skilled statistical learners. Because statistical learning is affected by attention, learners' ability to manage their attention can play a large role in what they learn. However, it is still unclear how learners allocate their attention in order to gain information in a visual environment containing multiple objects, especially how prior visual experience (i.e., familiarity of objects) influences where people look. To answer these questions, we collected eye movement data from adults exploring multiple novel objects while manipulating object familiarity with global (frequencies) and local (repetitions) regularities. We found that participants are sensitive to both global and local statistics embedded in their visual environment and they dynamically shift their attention to prioritize some objects over others as they gain knowledge of the objects and their distributions within the task.
  • Rubio-Fernández, P., & Jara-Ettinger, J. (2018). Joint inferences of speakers’ beliefs and referents based on how they speak. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 991-996). Austin, TX: Cognitive Science Society.

    Abstract

    For almost two decades, the poor performance observed with the so-called Director task has been interpreted as evidence of limited use of Theory of Mind in communication. Here we propose a probabilistic model of common ground in referential communication that derives three inferences from an utterance: what the speaker is talking about in a visual context, what she knows about the context, and what referential expressions she prefers. We tested our model by comparing its inferences with those made by human participants and found that it closely mirrors their judgments, whereas an alternative model compromising the hearer’s expectations of cooperativeness and efficiency reveals a worse fit to the human data. Rather than assuming that common ground is fixed in a given exchange and may or may not constrain reference resolution, we show how common ground can be inferred as part of the process of reference assignment.
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces (LREC2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • Saleh, A., Beck, T., Galke, L., & Scherp, A. (2018). Performance comparison of ad-hoc retrieval models over full-text vs. titles of documents. In M. Dobreva, A. Hinze, & M. Žumer (Eds.), Maturity and Innovation in Digital Libraries: 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings (pp. 290-303). Cham, Switzerland: Springer.

    Abstract

    While there are many studies on information retrieval models using full-text, there are presently no comparison studies of full-text retrieval vs. retrieval only over the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations. Titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only titles of documents. We use different datasets, including the three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide competitive results comparable to full-text retrieval. The average evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.
  • Sauppe, S., Norcliffe, E., Konopka, A. E., Van Valin Jr., R. D., & Levinson, S. C. (2013). Dependencies first: Eye tracking evidence from sentence production in Tagalog. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 1265-1270). Austin, TX: Cognitive Science Society.

    Abstract

    We investigated the time course of sentence formulation in Tagalog, a verb-initial language in which the verb obligatorily agrees with one of its arguments. Eye-tracked participants described pictures of transitive events. Fixations to the two characters in the events were compared across sentences differing in agreement marking and post-verbal word order. Fixation patterns show evidence for two temporally dissociated phases in Tagalog sentence production. The first, driven by verb agreement, involves early linking of concepts to syntactic functions; the second, driven by word order, involves incremental lexical encoding of these concepts. These results suggest that even the earliest stages of sentence formulation may be guided by a language's grammatical structure.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Sauter, D., Wiland, J., Warren, J., Eisner, F., Calder, A., & Scott, S. K. (2005). Sounds of joy: An investigation of vocal expressions of positive emotions [Abstract]. Journal of Cognitive Neuroscience, 61(Supplement), B99.

    Abstract

    A series of experiments tested Ekman’s (1992) hypothesis that there is a set of positive basic emotions that are expressed using vocal para-linguistic sounds, e.g. laughter and cheers. The proposed categories investigated were amusement, contentment, pleasure, relief and triumph. Behavioural testing using a forced-choice task indicated that participants were able to reliably recognize vocal expressions of the proposed emotions. A cross-cultural study in the preliterate Himba culture in Namibia confirmed that these categories are also recognized across cultures. A recognition test of acoustically manipulated emotional vocalizations established that the recognition of different emotions utilizes different vocal cues, and that these in turn differ from the cues used when comprehending speech. In a study using fMRI we found that relative to a signal-correlated noise baseline, the paralinguistic expressions of emotion activated bilateral superior temporal gyri and sulci, lateral and anterior to primary auditory cortex, which is consistent with the processing of non-linguistic vocal cues in the auditory ‘what’ pathway. Notably, amusement was associated with greater activation extending into both temporal poles and the amygdala and insular cortex. Overall, these results support the claim that ‘happiness’ can be fractionated into amusement, pleasure, relief and triumph.
  • Scharenborg, O., & Merkx, D. (2018). The role of articulatory feature representation quality in a computational model of human spoken-word recognition. In Proceedings of the Machine Learning in Speech and Language Processing Workshop (MLSLP 2018).

    Abstract

    Fine-Tracker is a speech-based model of human speech recognition. While previous work has shown that Fine-Tracker is successful at modelling aspects of human spoken-word recognition, its speech recognition performance is not comparable to human performance, possibly due to suboptimal intermediate articulatory feature (AF) representations. This study investigates the effect of improved AF representations, obtained using a state-of-the-art deep convolutional network, on Fine-Tracker’s simulation and recognition performance. Although the improved AF quality resulted in improved speech recognition, it surprisingly did not lead to an improvement in Fine-Tracker’s simulation power.
  • Scharenborg, O., & Janse, E. (2013). Changes in the role of intensity as a cue for fricative categorisation. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 3147-3151).

    Abstract

    Older listeners with high-frequency hearing loss rely more on intensity for categorisation of /s/ than normal-hearing older listeners. This study addresses the question whether this increased reliance comes about immediately when the need arises, i.e., in the face of a spectrally-degraded signal. A phonetic categorisation task was carried out using intensity-modulated fricatives in a clean and a low-pass filtered condition with two younger and two older listener groups. When high-frequency information was removed from the speech signal, younger listeners started using intensity as a cue. The older adults on the other hand, when presented with the low-pass filtered speech, did not rely on intensity differences for fricative identification. These results suggest that the reliance on intensity shown by the older hearing-impaired adults may have been acquired only gradually with longer exposure to a degraded speech signal.
  • Scharenborg, O., & Seneff, S. (2005). A two-pass strategy for handling OOVs in a large vocabulary recognition task. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1669-1672). ISCA Archive.

    Abstract

    This paper addresses the issue of large-vocabulary recognition in a specific word class. We propose a two-pass strategy in which only major cities are explicitly represented in the first stage lexicon. An unknown word model encoded as a phone loop is used to detect OOV city names (referred to as rare city names). Subsequently, SpeM, a tool that can extract words and word-initial cohorts from phone graphs on the basis of a large fallback lexicon, provides an N-best list of promising city names on the basis of the phone sequences generated in the first stage. This N-best list is then inserted into the second stage lexicon for a subsequent recognition pass. Experiments were conducted on a set of spontaneous telephone-quality utterances each containing one rare city name. We tested the size of the N-best list and three types of language models (LMs). The experiments showed that SpeM was able to include nearly 85% of the correct city names into an N-best list of 3000 city names when a unigram LM, which also boosted the unigram scores of a city name in a given state, was used.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O. (2005). Parallels between HSR and ASR: How ASR can contribute to HSR. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1237-1240). ISCA Archive.

    Abstract

    In this paper, we illustrate the close parallels between the research fields of human speech recognition (HSR) and automatic speech recognition (ASR) using a computational model of human word recognition, SpeM, which was built using techniques from ASR. We show that ASR has proven to be useful for improving models of HSR by relieving them of some of their shortcomings. However, in order to build an integrated computational model of all aspects of HSR, a lot of issues remain to be resolved. In this process, ASR algorithms and techniques definitely can play an important role.
  • Schmidt, J., Janse, E., & Scharenborg, O. (2014). Age, hearing loss and the perception of affective utterances in conversational speech. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 1929-1933).

    Abstract

    This study investigates whether age and/or hearing loss influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech fragments. Specifically, this study focuses on the relationship between participants' ratings of affective speech and acoustic parameters known to be associated with arousal and valence (mean F0, intensity, and articulation rate). Ten normal-hearing younger and ten older adults with varying hearing loss were tested on two rating tasks. Stimuli consisted of short sentences taken from a corpus of conversational affective speech. In both rating tasks, participants estimated the value of the emotion dimension at hand using a 5-point scale. For arousal, higher intensity was generally associated with higher arousal in both age groups. Compared to younger participants, older participants rated the utterances as less aroused, and showed a smaller effect of intensity on their arousal ratings. For valence, higher mean F0 was associated with more negative ratings in both age groups. Generally, age group differences in rating affective utterances may not relate to age group differences in hearing loss, but rather to other differences between the age groups, as older participants' rating patterns were not associated with their individual hearing loss.
  • Schuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., & Zafeiriou, S. (2017). The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold & snoring. In Proceedings of Interspeech 2017 (pp. 3442-3446). doi:10.21437/Interspeech.2017-43.

    Abstract

    The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in research competition under well-defined conditions: In the Addressee sub-challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold sub-challenge, speech under cold has to be told apart from ‘healthy’ speech; and in the Snoring sub-challenge, four different types of snoring have to be classified. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt feature representations by end-to-end learning with convolutional and recurrent neural networks, and bag-of-audio-words for the first time in the challenge series.
  • Scott, K., Sakkalou, E., Ellis-Davies, K., Hilbrink, E., Hahn, U., & Gattis, M. (2013). Infant contributions to joint attention predict vocabulary development. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 3384-3389). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0602/index.html.

    Abstract

    Joint attention has long been accepted as constituting a privileged circumstance in which word learning prospers. Consequently, research has investigated the role that maternal responsiveness to infant attention plays in predicting language outcomes. However, there has been a recent expansion in research implicating similar predictive effects from individual differences in infant behaviours. Emerging from the foundations of such work comes an interesting question: do the relative contributions of the mother and infant to joint attention episodes impact upon language learning? In an attempt to address this, two joint attention behaviours were assessed as predictors of vocabulary attainment (as measured by OCDI Production Scores). These predictors were: mothers encouraging attention to an object given their infant was already attending to an object (maternal follow-in); and infants looking to an object given their mothers’ encouragement of attention to an object (infant follow-in). In a sample of 14-month-old children (N=36) we compared the predictive power of these maternal and infant follow-in variables on concurrent and later language performance. Results using Growth Curve Analysis provided evidence that while both maternal follow-in and infant follow-in variables contributed to production scores, infant follow-in was a stronger predictor. Consequently, it does appear to matter whose final contribution establishes joint attention episodes. Infants who more often follow in to their mothers’ encouragement of attention have larger and faster-growing vocabularies between 14 and 18 months of age.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Sekine, K. (2017). Gestural hesitation reveals children’s competence on multimodal communication: Emergence of disguised adaptor. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3113-3118). Austin, TX: Cognitive Science Society.

    Abstract

    Speakers sometimes modify their gestures during the process of production into adaptors such as hair touching or eye scratching. Such disguised adaptors are evidence that the speaker can monitor their gestures. In this study, we investigated when and how disguised adaptors are first produced by children. Sixty elementary school children participated in this study (ten children in each age group; from 7 to 12 years old). They were instructed to watch a cartoon and retell it to their parents. The results showed that children did not produce disguised adaptors until the age of 8. The disguised adaptors accompany fluent speech until the children are 10 years old and accompany dysfluent speech until they reach 11 or 12 years of age. These results suggest that children start to monitor their gestures when they are 9 or 10 years old. Cognitive changes were considered as factors influencing the emergence of disguised adaptors.
  • Seuren, P. A. M. (2014). Scope and external datives. In B. Cornillie, C. Hamans, & D. Jaspers (Eds.), Proceedings of a mini-symposium on Pieter Seuren's 80th birthday organised at the 47th Annual Meeting of the Societas Linguistica Europaea.

    Abstract

    In this study it is argued that scope, as a property of scope‐creating operators, is a real and important element in the semantico‐grammatical description of languages. The notion of scope is illustrated and, as far as possible, defined. A first idea is given of the ‘grammar of scope’, which defines the relation between scope in the logically structured semantic analysis (SA) of sentences on the one hand and surface structure on the other. Evidence is adduced showing that peripheral preposition phrases (PPPs) in the surface structure of sentences represent scope‐creating operators in SA, and that external datives fall into this category: they are scope‐creating PPPs. It follows that, in English and Dutch, the internal dative (I gave John a book) and the external dative (I gave a book to John) are not simple syntactic variants expressing the same meaning. Instead, internal datives are an integral part of the argument structure of the matrix predicate, whereas external datives represent scope‐creating operators in SA. In the Romance languages, the (non‐pronominal) external dative has been re‐analysed as an argument type dative, but this has not happened in English and Dutch, which have many verbs that only allow for an external dative (e.g. donate, reveal). When both datives are allowed, there are systematic semantic differences, including scope differences.
  • Seuren, P. A. M. (1985). Predicate raising and semantic transparency in Mauritian Creole. In N. Boretzky, W. Enninger, & T. Stolz (Eds.), Akten des 2. Essener Kolloquiums über "Kreolsprachen und Sprachkontakte", 29-30 Nov. 1985 (pp. 203-229). Bochum: Brockmeyer.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Shayan, S., Moreira, A., Windhouwer, M., Koenig, A., & Drude, S. (2013). LEXUS 3 - a collaborative environment for multimedia lexica. In Proceedings of the Digital Humanities Conference 2013 (pp. 392-395).
  • Shkaravska, O., Van Eekelen, M., & Tamalet, A. (2014). Collected size semantics for strict functional programs over general polymorphic lists. In U. Dal Lago, & R. Pena (Eds.), Foundational and Practical Aspects of Resource Analysis: Third International Workshop, FOPARA 2013, Bertinoro, Italy, August 29-31, 2013, Revised Selected Papers (pp. 143-159). Berlin: Springer.

    Abstract

    Size analysis can be an important part of heap consumption analysis. This paper is a part of ongoing work about typing support for checking output-on-input size dependencies for function definitions in a strict functional language. A significant restriction for our earlier results is that inner data structures (e.g. in a list of lists) all must have the same size. Here, we make a big step forwards by overcoming this limitation via the introduction of higher-order size annotations such that variate sizes of inner data structures can be expressed. In this way the analysis becomes applicable for general, polymorphic nested lists.
  • Sidnell, J., & Stivers, T. (Eds.). (2005). Multimodal Interaction [Special Issue]. Semiotica, 156.
  • Sloetjes, H., & Seibert, O. (2016). Measuring by marking; the multimedia annotation tool ELAN. In A. Spink, G. Riedel, L. Zhou, L. Teekens, R. Albatal, & C. Gurrin (Eds.), Measuring Behavior 2016, 10th International Conference on Methods and Techniques in Behavioral Research (pp. 492-495).

    Abstract

    ELAN is a multimedia annotation tool developed by the Max Planck Institute for Psycholinguistics. It is applied in a variety of research areas. This paper presents a general overview of the tool and new developments such as the calculation of inter-rater reliability, a commentary framework, semi-automatic segmentation and labeling, and export to Theme.
  • Slonimska, A., & Roberts, S. G. (2017). A case for systematic sound symbolism in pragmatics: The role of the first phoneme in question prediction in context. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1090-1095). Austin, TX: Cognitive Science Society.

    Abstract

    Turn-taking in conversation is a cognitively demanding process that proceeds rapidly due to interlocutors utilizing a range of cues to aid prediction. In the present study we set out to test recent claims that content question words (also called wh-words) sound similar within languages as an adaptation to help listeners predict that a question is about to be asked. We test whether upcoming questions can be predicted based on the first phoneme of a turn and the prior context. We analyze the Switchboard corpus of English by means of a decision tree to test whether /w/ and /h/ are good statistical cues of upcoming questions in conversation. Based on the results, we perform a controlled experiment to test whether people really use these cues to recognize questions. In both studies we show that both the initial phoneme and the sequential context help predict questions. This contributes converging evidence that elements of languages adapt to pragmatic pressures applied during conversation.
  • De Smedt, K., Hinrichs, E., Meurers, D., Skadiņa, I., Sanford Pedersen, B., Navarretta, C., Bel, N., Lindén, K., Lopatková, M., Hajič, J., Andersen, G., & Lenkiewicz, P. (2014). CLARA: A new generation of researchers in common language resources and their applications. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 2166-2174).
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). Examining strains and symptoms of the ‘Literacy Virus’: The effects of orthographic transparency on phonological processing in a connectionist model of reading. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    The effect of literacy on phonological processing has been described in terms of a virus that “infects all speech processing” (Frith, 1998). Empirical data has established that literacy leads to changes to the way in which phonological information is processed. Harm & Seidenberg (1999) demonstrated that a connectionist network trained to map between English orthographic and phonological representations displays more componential phonological processing than a network trained only to stably represent the phonological forms of words. Within this study we use a similar model yet manipulate the transparency of orthographic-to-phonological mappings. We observe that networks trained on a transparent orthography are better at restoring phonetic features and phonemes. However, networks trained on non-transparent orthographies are more likely to restore corrupted phonological segments with legal, coarser linguistic units (e.g. onset, coda). Our study therefore provides an explicit description of how differences in orthographic transparency can lead to varying strains and symptoms of the ‘literacy virus’.
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). A comprehensive model of spoken word recognition must be multimodal: Evidence from studies of language-mediated visual attention. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    When processing language, the cognitive system has access to information from a range of modalities (e.g. auditory, visual) to support language processing. Language mediated visual attention studies have shown sensitivity of the listener to phonological, visual, and semantic similarity when processing a word. In a computational model of language mediated visual attention, that models spoken word processing as the parallel integration of information from phonological, semantic and visual processing streams, we simulate such effects of competition within modalities. Our simulations raised untested predictions about stronger and earlier effects of visual and semantic similarity compared to phonological similarity around the rhyme of the word. Two visual world studies confirmed these predictions. The model and behavioral studies suggest that, during spoken word comprehension, multimodal information can be recruited rapidly to constrain lexical selection to the extent that phonological rhyme information may exert little influence on this process.
  • Smith, A. C., Monaghan, P., & Huettig, F. (2013). Modelling the effects of formal literacy training on language mediated visual attention. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 3420-3425). Austin, TX: Cognitive Science Society.

    Abstract

    Recent empirical evidence suggests that language-mediated eye gaze is partly determined by level of formal literacy training. Huettig, Singh and Mishra (2011) showed that high-literate individuals' eye gaze was closely time locked to phonological overlap between a spoken target word and items presented in a visual display. In contrast, low-literate individuals' eye gaze was not related to phonological overlap, but was instead strongly influenced by semantic relationships between items. Our present study tests the hypothesis that this behavior is an emergent property of an increased ability to extract phonological structure from the speech signal, as in the case of high-literates, with low-literates more reliant on more coarse grained structure. This hypothesis was tested using a neural network model, that integrates linguistic information extracted from the speech signal with visual and semantic information within a central resource. We demonstrate that contrasts in fixation behavior similar to those observed between high and low literates emerge when models are trained on speech signals of contrasting granularity.
  • Speed, L., Chen, J., Huettig, F., & Majid, A. (2016). Do classifier categories affect or reflect object concepts? In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 2267-2272). Austin, TX: Cognitive Science Society.

    Abstract

    We conceptualize objects based on sensory and motor information gleaned from real-world experience. But to what extent is such conceptual information structured according to higher level linguistic features too? Here we investigate whether classifiers, a grammatical category, shape the conceptual representations of objects. In three experiments native Mandarin speakers (speakers of a classifier language) and native Dutch speakers (speakers of a language without classifiers) judged the similarity of a target object (presented as a word or picture) with four objects (presented as words or pictures). One object shared a classifier with the target, the other objects did not, serving as distractors. Across all experiments, participants judged the target object as more similar to the object with the shared classifier than distractor objects. This effect was seen in both Dutch and Mandarin speakers, and there was no difference between the two languages. Thus, even speakers of a non-classifier language are sensitive to object similarities underlying classifier systems, and using a classifier system does not exaggerate these similarities. This suggests that classifier systems simply reflect, rather than affect, conceptual structure.
  • Speed, L., & Majid, A. (2018). Music and odor in harmony: A case of music-odor synaesthesia. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 2527-2532). Austin, TX: Cognitive Science Society.

    Abstract

    We report an individual with music-odor synaesthesia who experiences automatic and vivid odor sensations when she hears music. S’s odor associations were recorded on two days, and compared with those of two control participants. Overall, S produced longer descriptions, and her associations were of multiple odors at once, in comparison to controls who typically reported a single odor. Although odor associations were qualitatively different between S and controls, ratings of the consistency of their descriptions did not differ. This demonstrates that crossmodal associations between music and odor exist in non-synaesthetes too. We also found that S is better at discriminating between odors than control participants, and is more likely to experience emotion, memories and evaluations triggered by odors, demonstrating the broader impact of her synaesthesia.

  • Speed, L., & Majid, A. (2016). Grammatical gender affects odor cognition. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1451-1456). Austin, TX: Cognitive Science Society.

    Abstract

    Language interacts with olfaction in exceptional ways. Olfaction is believed to be weakly linked with language, as demonstrated by our poor odor naming ability, yet olfaction seems to be particularly susceptible to linguistic descriptions. We tested the boundaries of the influence of language on olfaction by focusing on a non-lexical aspect of language (grammatical gender). We manipulated the grammatical gender of fragrance descriptions to test whether the congruence with fragrance gender would affect the way fragrances were perceived and remembered. Native French and German speakers read descriptions of fragrances containing ingredients with feminine or masculine grammatical gender, and then smelled masculine or feminine fragrances and rated them on a number of dimensions (e.g., pleasantness). Participants then completed an odor recognition test. Fragrances were remembered better when presented with descriptions whose grammatical gender matched the gender of the fragrance. Overall, results suggest grammatical manipulations of odor descriptions can affect odor cognition.
  • Sprenger, S. A., & Van Rijn, H. (2005). Clock time naming: Complexities of a simple task. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 2062-2067).
  • Stanojevic, M., & Alhama, R. G. (2017). Neural discontinuous constituency parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 1666-1676). Association for Computational Linguistics.

    Abstract

    One of the most pressing issues in discontinuous constituency transition-based parsing is that the relevant information for parsing decisions could be located in any part of the stack or the buffer. In this paper, we propose a solution to this problem by replacing the structured perceptron model with a recursive neural model that computes a global representation of the configuration, therefore allowing even the most remote parts of the configuration to influence the parsing decisions. We also provide a detailed analysis of how this representation should be built out of sub-representations of its core elements (words, trees and stack). Additionally, we investigate how different types of swap oracles influence the results. Our model is the first neural discontinuous constituency parser, and it outperforms all the previously published models on three out of four datasets while on the fourth it obtains second place by a tiny difference.

    Additional information

    http://aclweb.org/anthology/D17-1174
  • Sumer, B., Grabitz, C., & Küntay, A. (2017). Early produced signs are iconic: Evidence from Turkish Sign Language. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3273-3278). Austin, TX: Cognitive Science Society.

    Abstract

    Motivated form-meaning mappings are pervasive in sign languages, and iconicity has recently been shown to facilitate sign learning from early on. This study investigated the role of iconicity for language acquisition in Turkish Sign Language (TID). Participants were 43 signing children (aged 10 to 45 months) of deaf parents. Sign production ability was recorded using the adapted version of MacArthur Bates Communicative Developmental Inventory (CDI) consisting of 500 items for TID. Iconicity and familiarity ratings for a subset of 104 signs were available. Our results revealed that the iconicity of a sign was positively correlated with the percentage of children producing a sign and that iconicity significantly predicted the percentage of children producing a sign, independent of familiarity or phonological complexity. Our results are consistent with previous findings on sign language acquisition and provide further support for the facilitating effect of iconic form-meaning mappings in sign learning.
  • Sumer, B., Perniss, P., Zwitserlood, I., & Ozyurek, A. (2014). Learning to express "left-right" & "front-behind" in a sign versus spoken language. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1550-1555). Austin, TX: Cognitive Science Society.

    Abstract

    Developmental studies show that it takes longer for children learning spoken languages to acquire viewpoint-dependent spatial relations (e.g., left-right, front-behind), compared to ones that are not viewpoint-dependent (e.g., in, on, under). The current study investigates how children learn to express viewpoint-dependent relations in a sign language where depicted spatial relations can be communicated in an analogue manner in the space in front of the body or by using body-anchored signs (e.g., tapping the right and left hand/arm to mean left and right). Our results indicate that the visual-spatial modality might have a facilitating effect on learning to express these spatial relations (especially in encoding of left-right) in a sign language (i.e., Turkish Sign Language) compared to a spoken language (i.e., Turkish).
  • Sumer, B., Perniss, P. M., & Ozyurek, A. (2016). Viewpoint preferences in signing children's spatial descriptions. In J. Scott, & D. Waughtal (Eds.), Proceedings of the 40th Annual Boston University Conference on Language Development (BUCLD 40) (pp. 360-374). Boston, MA: Cascadilla Press.
  • Sumner, M., Kurumada, C., Gafter, R., & Casillas, M. (2013). Phonetic variation and the recognition of words with pronunciation variants. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 3486-3492). Austin, TX: Cognitive Science Society.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Turn-taking in social talk dialogues: Temporal, formal and functional aspects. In 9th International Conference Speech and Computer (SPECOM'2004) (pp. 454-461).

    Abstract

    This paper presents a quantitative analysis of the turn-taking mechanism evidenced in 93 telephone dialogues that were taken from the 9-million-word Spoken Dutch Corpus. While the first part of the paper focuses on the temporal phenomena of turn taking, such as durations of pauses and overlaps of turns in the dialogues, the second part explores the discourse-functional aspects of utterances in a subset of 8 dialogues that were annotated especially for this purpose. The results show that speakers adapt their turn-taking behaviour to the interlocutor’s behaviour. Furthermore, the results indicate that male-male dialogs show a higher proportion of overlapping turns than female-female dialogues.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2018). Analyzing reaction time sequences from human participants in auditory experiments. In Proceedings of Interspeech 2018 (pp. 971-975). doi:10.21437/Interspeech.2018-1728.

    Abstract

    Sequences of reaction times (RT) produced by participants in an experiment are not only influenced by the stimuli, but by many other factors as well, including fatigue, attention, experience, IQ, handedness, etc. These confounding factors result in long-term effects (such as a participant’s overall reaction capability) and in short- and medium-time fluctuations in RTs (often referred to as ‘local speed effects’). Because stimuli are usually presented in a random sequence different for each participant, local speed effects affect the underlying ‘true’ RTs of specific trials in different ways across participants. To be able to focus statistical analysis on the effects of the cognitive process under study, it is necessary to reduce the effect of confounding factors as much as possible. In this paper we propose and compare techniques and criteria for doing so, with focus on reducing (‘filtering’) the local speed effects. We show that filtering matters substantially for the significance analyses of predictors in linear mixed effect regression models. The performance of filtering is assessed by the average between-participant correlation between filtered RT sequences and by Akaike’s Information Criterion, an important measure of the goodness-of-fit of linear mixed effect regression models.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).

    Abstract

    This paper addresses the question how to compare reaction times computed by a computational model of speech comprehension with observed reaction times by participants. The question is based on the observation that reaction time sequences substantially differ per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying variations in attention, fatigue etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate the local speed effects. Results show that after such filtering the between-participant correlations increase as well as the average correlation between participant and model increases. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2016). Combining data-oriented and process-oriented approaches to modeling reaction time data. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2801-2805). doi:10.21437/Interspeech.2016-1072.

    Abstract

    This paper combines two different approaches to modeling reaction time data from lexical decision experiments, viz. a data-oriented statistical analysis by means of a linear mixed effects model, and a process-oriented computational model of human speech comprehension. The linear mixed effect model is implemented by lmer in R. As computational model we apply DIANA, an end-to-end computational model which aims at modeling the cognitive processes underlying speech comprehension. DIANA takes as input the speech signal, and provides as output the orthographic transcription of the stimulus, a word/non-word judgment and the associated reaction time. Previous studies have shown that DIANA shows good results for large-scale lexical decision experiments in Dutch and North-American English. We investigate whether predictors that appear significant in an lmer analysis and processes implemented in DIANA can be related and inform both approaches. Predictors such as ‘previous reaction time’ can be related to a process description; other predictors, such as ‘lexical neighborhood’ are hard-coded in lmer and emergent in DIANA. The analysis focuses on the interaction between subject variables and task variables in lmer, and the ways in which these interactions can be implemented in DIANA.
  • Ten Bosch, L., & Scharenborg, O. (2005). ASR decoding in a computational model of human word recognition. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1241-1244). ISCA Archive.

    Abstract

    This paper investigates the interaction between acoustic scores and symbolic mismatch penalties in multi-pass speech decoding techniques that are based on the creation of a segment graph followed by a lexical search. The interaction between acoustic and symbolic mismatches determines to a large extent the structure of the search space of these multipass approaches. The background of this study is a recently developed computational model of human word recognition, called SpeM. SpeM is able to simulate human word recognition data and is built as a multi-pass speech decoder. Here, we focus on unravelling the structure of the search space that is used in SpeM and similar decoding strategies. Finally, we elaborate on the close relation between distances in this search space, and distance measures in search spaces that are based on a combination of acoustic and phonetic features.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Durational aspects of turn-taking in spontaneous face-to-face and telephone dialogues. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: Proceedings of the 7th International Conference TSD 2004 (pp. 563-570). Heidelberg: Springer.

    Abstract

    On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and face-to-face dialogues have different statistical properties. Pauses in a face-to-face dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
  • Ten Bosch, L., & Boves, L. (2018). Information encoding by deep neural networks: what can we learn? In Proceedings of Interspeech 2018 (pp. 1457-1461). doi:10.21437/Interspeech.2018-1896.

    Abstract

    The recent advent of deep learning techniques in speech technology and in particular in automatic speech recognition has yielded substantial performance improvements. This suggests that deep neural networks (DNNs) are able to capture structure in speech data that older methods for acoustic modeling, such as Gaussian Mixture Models and shallow neural networks fail to uncover. In image recognition it is possible to link representations on the first couple of layers in DNNs to structural properties of images, and to representations on early layers in the visual cortex. This raises the question whether it is possible to accomplish a similar feat with representations on DNN layers when processing speech input. In this paper we present three different experiments in which we attempt to untangle how DNNs encode speech signals, and to relate these representations to phonetic knowledge, with the aim to advance conventional phonetic concepts and to choose the topology of a DNN more efficiently. Two experiments investigate representations formed by auto-encoders. A third experiment investigates representations on convolutional layers that treat speech spectrograms as if they were images. The results lay the basis for future experiments with recursive networks.
  • Ten Bosch, L., Giezenaar, G., Boves, L., & Ernestus, M. (2016). Modeling language-learners' errors in understanding casual speech. In G. Adda, V. Barbu Mititelu, J. Mariani, D. Tufiş, & I. Vasilescu (Eds.), Errors by humans and machines in multimedia, multimodal, multilingual data processing. Proceedings of Errare 2015 (pp. 107-121). Bucharest: Editura Academiei Române.

    Abstract

    In spontaneous conversations, words are often produced in reduced form compared to formal careful speech. In English, for instance, ‘probably’ may be pronounced as ‘poly’ and ‘police’ as ‘plice’. Reduced forms are very common, and native listeners usually do not have any problems with interpreting these reduced forms in context. Non-native listeners, however, have great difficulties in comprehending reduced forms. In order to investigate the problems in comprehension that non-native listeners experience, a dictation experiment was conducted in which sentences were presented auditorily to non-natives either in full (unreduced) or reduced form. The types of errors made by the L2 listeners reveal aspects of the cognitive processes underlying this dictation task. In addition, we compare the errors made by these human participants with the type of word errors made by DIANA, a recently developed computational model of word comprehension.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2017). The recognition of compounds: A computational account. In Proceedings of Interspeech 2017 (pp. 1158-1162). doi:10.21437/Interspeech.2017-1048.

    Abstract

    This paper investigates the processes in comprehending spoken noun-noun compounds, using data from the BALDEY database. BALDEY contains lexicality judgments and reaction times (RTs) for Dutch stimuli for which also linguistic information is included. Two different approaches are combined. The first is based on regression by Dynamic Survival Analysis, which models decisions and RTs as a consequence of the fact that a cumulative density function exceeds some threshold. The parameters of that function are estimated from the observed RT data. The second approach is based on DIANA, a process-oriented computational model of human word comprehension, which simulates the comprehension process with the acoustic stimulus as input. DIANA gives the identity and the number of the word candidates that are activated at each 10 ms time step.

    Both approaches show how the processes involved in comprehending compounds change during a stimulus. Survival Analysis shows that the impact of word duration varies during the course of a stimulus. The density of word and non-word hypotheses in DIANA shows a corresponding pattern with different regimes. We show how the approaches complement each other, and discuss additional ways in which data and process models can be combined.
