Publications

Displaying 201 - 300 of 374
  • Klein, W. (Ed.). (1996). Zweitspracherwerb [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (104).
  • Kuijpers, C., Van Donselaar, W., & Cutler, A. (1996). Phonological variation: Epenthesis and deletion of schwa in Dutch. In H. T. Bunnell (Ed.), Proceedings of the Fourth International Conference on Spoken Language Processing: Vol. 1 (pp. 94-97). New York: Institute of Electrical and Electronics Engineers.

    Abstract

    Two types of phonological variation in Dutch, resulting from optional rules, are schwa epenthesis and schwa deletion. In a lexical decision experiment it was investigated whether the phonological variants were processed similarly to the standard forms. It was found that the two types of variation patterned differently. Words with schwa epenthesis were processed faster and more accurately than the standard forms, whereas words with schwa deletion led to less fast and less accurate responses. The results are discussed in relation to the role of consonant-vowel alternations in speech processing and the perceptual integrity of onset clusters.
  • Kuzla, C., & Ernestus, M. (2007). Prosodic conditioning of phonetic detail of German plosives. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 461-464). Dudweiler: Pirrot.

    Abstract

    The present study investigates the influence of prosodic structure on the fine-grained phonetic details of German plosives which also cue the phonological fortis-lenis contrast. Closure durations were found to be longer at higher prosodic boundaries. There was also less glottal vibration in lenis plosives at higher prosodic boundaries. Voice onset time in lenis plosives was not affected by prosody. In contrast, for the fortis plosives VOT decreased at higher boundaries, as did the maximal intensity of the release. These results demonstrate that the effects of prosody on different phonetic cues can go into opposite directions, but are overall constrained by the need to maintain phonological contrasts. While prosodic effects on some cues are compatible with a ‘fortition’ account of prosodic strengthening or with a general feature enhancement explanation, the effects on others enhance paradigmatic contrasts only within a given prosodic position.
  • Lai, V. T., Hagoort, P., & Casasanto, D. (2011). Affective and non-affective meaning in words and pictures. In L. Carlson, C. Holscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (pp. 390-395). Austin, TX: Cognitive Science Society.
  • Lai, V. T., Chang, M., Duffield, C., Hwang, J., Xue, N., & Palmer, M. (2007). Defining a methodology for mapping Chinese and English sense inventories. In Proceedings of the 8th Chinese Lexical Semantics Workshop 2007 (CLSW 2007). The Hong Kong Polytechnic University, Hong Kong, May 21-23 (pp. 59-65).

    Abstract

    In this study, we explored methods for linking Chinese and English sense inventories using two opposing approaches: creating links (1) bottom-up: by starting at the finer-grained sense level then proceeding to the verb subcategorization frames and (2) top-down: by starting directly with the more coarse-grained frame levels. The sense inventories for linking include pre-existing corpora, such as English Propbank (Palmer, Gildea, and Kingsbury, 2005), Chinese Propbank (Xue and Palmer, 2004) and English WordNet (Fellbaum, 1998) and newly created corpora, the English and Chinese Sense Inventories from DARPA-GALE OntoNotes. In the linking task, we selected a group of highly frequent and polysemous communication verbs, including say, ask, talk, and speak in English, and shuo, biao-shi, jiang, and wen in Chinese. We found that with the bottom-up method, although speakers of both languages agreed on the links between senses, the subcategorization frames of the corresponding senses did not match consistently. With the top-down method, if the verb frames match in both languages, their senses line up more quickly to each other. The results indicate that the top-down method is more promising in linking English and Chinese sense inventories.
  • Lenkiewicz, P., Auer, E., Schreer, O., Masneri, S., Schneider, D., & Tschöpe, S. (2012). AVATecH ― automated annotation through audio and video analysis. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 209-214). European Language Resources Association.

    Abstract

    In different fields of the humanities annotations of multimodal resources are a necessary component of the research workflow. Examples include linguistics, psychology, anthropology, etc. However, creation of those annotations is a very laborious task, which can take 50 to 100 times the length of the annotated media, or more. This can be significantly improved by applying innovative audio and video processing algorithms, which analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, which is a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2008). Accelerating 3D medical image segmentation with high performance computing. In Proceedings of the IEEE International Workshops on Image Processing Theory, Tools and Applications - IPT (pp. 1-8).

    Abstract

    Digital processing of medical images has helped physicians and patients during past years by allowing examination and diagnosis on a very precise level. Nowadays possibly the biggest deal of support it can offer for modern healthcare is the use of high performance computing architectures to treat the huge amounts of data that can be collected by modern acquisition devices. This paper presents a parallel processing implementation of an image segmentation algorithm that operates on a computer cluster equipped with 10 processing units. Thanks to well-organized distribution of the workload we manage to significantly shorten the execution time of the developed algorithm and reach a performance gain very close to linear.
  • Lenkiewicz, P., Wittenburg, P., Schreer, O., Masneri, S., Schneider, D., & Tschöpel, S. (2011). Application of audio and video processing methods for language research. In Proceedings of the conference Supporting Digital Humanities 2011 [SDH 2011], Copenhagen, Denmark, November 17-18, 2011.

    Abstract

    Annotations of media recordings are the grounds for linguistic research. Since creating those annotations is a very laborious task, reaching 100 times longer than the length of the annotated media, innovative audio and video processing algorithms are needed, in order to improve the efficiency and quality of annotation process. The AVATecH project, started by the Max-Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS, aims at significantly speeding up the process of creating annotations of audio-visual data for humanities research. In order for this to be achieved a range of state-of-the-art audio and video pattern recognition algorithms have been developed and integrated into widely used ELAN annotation tool. To address the problem of heterogeneous annotation tasks and recordings we provide modular components extended by adaptation and feedback mechanisms to achieve competitive annotation quality within significantly less annotation time.
  • Lenkiewicz, P., Wittenburg, P., Gebre, B. G., Lenkiewicz, A., Schreer, O., & Masneri, S. (2011). Application of video processing methods for linguistic research. In Z. Vetulani (Ed.), Human language technologies as a challenge for computer science and linguistics. Proceedings of the 5th Language and Technology Conference (LTC 2011), November 25-27, 2011, Poznań, Poland (pp. 561-564).

    Abstract

    Evolution and changes of all modern languages is a well-known fact. However, recently it is reaching dynamics never seen before, which results in loss of the vast amount of information encoded in every language. In order to preserve such heritage, properly annotated recordings of world languages are necessary. Since creating those annotations is a very laborious task, reaching times 100 longer than the length of the annotated media, innovative video processing algorithms are needed, in order to improve the efficiency and quality of annotation process.
  • Lenkiewicz, A., Lis, M., & Lenkiewicz, P. (2012). Linguistic concepts described with Media Query Language for automated annotation. In J. C. Meiser (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 477-479).

    Abstract

    Introduction Human spoken communication is multimodal, i.e. it encompasses both speech and gesture. Acoustic properties of voice, body movements, facial expression, etc. are an inherent and meaningful part of spoken interaction; they can provide attitudinal, grammatical and semantic information. In the recent years interest in audio-visual corpora has been rising rapidly as they enable investigation of different communicative modalities and provide more holistic view on communication (Kipp et al. 2009). Moreover, for some languages such corpora are the only available resource, as is the case for endangered languages for which no written resources exist.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2011). Extended whole mesh deformation model: Full 3D processing. In Proceedings of the 2011 IEEE International Conference on Image Processing (pp. 1633-1636).

    Abstract

    Processing medical data has always been an interesting field that has shown the need for effective image segmentation methods. Modern medical image segmentation solutions are focused on 3D image volumes, which originate at advanced acquisition devices. Operating on such data in a 3D envi- ronment is essential in order to take the full advantage of the available information. In this paper we present an extended version of our 3D image segmentation and reconstruction model that belongs to the family of Deformable Models and is capable of processing large image volumes in competitive times and in fully 3D environment, offering a big level of automation of the process and a high precision of results. It is also capable of handling topology changes and offers a very good scalability on multi-processing unit architectures. We present a description of the model and show its capabilities in the field of medical image processing.
  • Lenkiewicz, P., Van Uytvanck, D., Wittenburg, P., & Drude, S. (2012). Towards automated annotation of audio and video recordings by application of advanced web-services. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1880-1883).

    Abstract

    In this paper we describe audio and video processing algorithms that are developed in the scope of AVATecH project. The purpose of these algorithms is to shorten the time taken by manual annotation of audio and video recordings by extracting features from media files and creating semi-automated annotations. We show that the use of such supporting algorithms can shorten the annotation time to 30-50% of the time necessary to perform a fully manual annotation of the same kind.
  • Levelt, W. J. M., & Plomp, R. (1962). Musical consonance and critical bandwidth. In Proceedings of the 4th International Congress Acoustics (pp. 55-55).
  • Little, H., Eryılmaz, K., & De Boer, B. (2016). Emergence of signal structure: Effects of duration constraints. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/25.html.

    Abstract

    Recent work has investigated the emergence of structure in speech using experiments which use artificial continuous signals. Some experiments have had no limit on the duration which signals can have (e.g. Verhoef et al., 2014), and others have had time limitations (e.g. Verhoef et al., 2015). However, the effect of time constraints on the structure in signals has never been experimentally investigated.
  • Little, H., & de Boer, B. (2016). Did the pressure for discrimination trigger the emergence of combinatorial structure? In Proceedings of the 2nd Conference of the International Association for Cognitive Semiotics (pp. 109-110).
  • Little, H., Eryılmaz, K., & De Boer, B. (2016). Differing signal-meaning dimensionalities facilitates the emergence of structure. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/25.html.

    Abstract

    Structure of language is not only caused by cognitive processes, but also by physical aspects of the signalling modality. We test the assumptions surrounding the role which the physical aspects of the signal space will have on the emergence of structure in speech. Here, we use a signal creation task to test whether a signal space and a meaning space having similar dimensionalities will generate an iconic system with signal-meaning mapping and whether, when the topologies differ, the emergence of non-iconic structure is facilitated. In our experiments, signals are created using infrared sensors which use hand position to create audio signals. We find that people take advantage of signal-meaning mappings where possible. Further, we use trajectory probabilities and measures of variance to show that when there is a dimensionality mismatch, more structural strategies are used.
  • Little, H. (2016). Nahran Bhannamz: Language Evolution in an Online Zombie Apocalypse Game. In Createvolang: creativity and innovation in language evolution.
  • Lockwood, G., Hagoort, P., & Dingemanse, M. (2016). Synthesized Size-Sound Sound Symbolism. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1823-1828). Austin, TX: Cognitive Science Society.

    Abstract

    Studies of sound symbolism have shown that people can associate sound and meaning in consistent ways when presented with maximally contrastive stimulus pairs of nonwords such as bouba/kiki (rounded/sharp) or mil/mal (small/big). Recent work has shown the effect extends to antonymic words from natural languages and has proposed a role for shared cross-modal correspondences in biasing form-to-meaning associations. An important open question is how the associations work, and particularly what the role is of sound-symbolic matches versus mismatches. We report on a learning task designed to distinguish between three existing theories by using a spectrum of sound-symbolically matching, mismatching, and neutral (neither matching nor mismatching) stimuli. Synthesized stimuli allow us to control for prosody, and the inclusion of a neutral condition allows a direct test of competing accounts. We find evidence for a sound-symbolic match boost, but not for a mismatch difficulty compared to the neutral condition.
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Macuch Silva, V., & Roberts, S. G. (2016). Language adapts to signal disruption in interaction. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/20.html.

    Abstract

    Linguistic traits are often seen as reflecting cognitive biases and constraints (e.g. Christiansen & Chater, 2008). However, language must also adapt to properties of the channel through which communication between individuals occurs. Perhaps the most basic aspect of any communication channel is noise. Communicative signals can be blocked, degraded or distorted by other sources in the environment. This poses a fundamental problem for communication. On average, channel disruption accompanies problems in conversation every 3 minutes (27% of cases of other-initiated repair, Dingemanse et al., 2015). Linguistic signals must adapt to this harsh environment. While modern language structures are robust to noise (e.g. Piantadosi et al., 2011), we investigate how noise might have shaped the early emergence of structure in language. The obvious adaptation to noise is redundancy. Signals which are maximally different from competitors are harder to render ambiguous by noise. Redundancy can be increased by adding differentiating segments to each signal (increasing the diversity of segments). However, this makes each signal more complex and harder to learn. Under this strategy, holistic languages may emerge. Another strategy is reduplication - repeating parts of the signal so that noise is less likely to disrupt all of the crucial information. This strategy does not increase the difficulty of learning the language - there is only one extra rule which applies to all signals. Therefore, under pressures for learnability, expressivity and redundancy, reduplicated signals are expected to emerge. However, reduplication is not a pervasive feature of words (though it does occur in limited domains like plurals or iconic meanings). We suggest that this is due to the pressure for redundancy being lifted by conversational infrastructure for repair. Receivers can request that senders repeat signals only after a problem occurs. That is, robustness is achieved by repeating the signal across conversational turns (when needed) instead of within single utterances. As a proof of concept, we ran two iterated learning chains with pairs of individuals in generations learning and using an artificial language (e.g. Kirby et al., 2015). The meaning space was a structured collection of unfamiliar images (3 shapes x 2 textures x 2 outline types). The initial language for each chain was the same written, unstructured, fully expressive language. Signals produced in each generation formed the training language for the next generation. Within each generation, pairs played an interactive communication game. The director was given a target meaning to describe, and typed a word for the matcher, who guessed the target meaning from a set. With a 50% probability, a contiguous section of 3-5 characters in the typed word was replaced by ‘noise’ characters (#). In one chain, the matcher could initiate repair by requesting that the director type and send another signal. Parallel generations across chains were matched for the number of signals sent (if repair was initiated for a meaning, then it was presented twice in the parallel generation where repair was not possible) and noise (a signal for a given meaning which was affected by noise in one generation was affected by the same amount of noise in the parallel generation). For the final set of signals produced in each generation we measured the signal redundancy (the zip compressibility of the signals), the character diversity (entropy of the characters of the signals) and systematic structure (z-score of the correlation between signal edit distance and meaning hamming distance). In the condition without repair, redundancy increased with each generation (r=0.97, p=0.01), and the character diversity decreased (r=-0.99,p=0.001) which is consistent with reduplication, as shown below (part of the initial and the final language): Linear regressions revealed that generations with repair had higher overall systematic structure (main effect of condition, t = 2.5, p < 0.05), increasing character diversity (interaction between condition and generation, t = 3.9, p = 0.01) and redundancy increased at a slower rate (interaction between condition and generation, t = -2.5, p < 0.05). That is, the ability to repair counteracts the pressure from noise, and facilitates the emergence of compositional structure. Therefore, just as systems to repair damage to DNA replication are vital for the evolution of biological species (O’Brien, 2006), conversational repair may regulate replication of linguistic forms in the cultural evolution of language. Future studies should further investigate how evolving linguistic structure is shaped by interaction pressures, drawing on experimental methods and naturalistic studies of emerging languages, both spoken (e.g Botha, 2006; Roberge, 2008) and signed (e.g Senghas, Kita, & Ozyurek, 2004; Sandler et al., 2005).
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Majid, A., Van Staden, M., & Enfield, N. J. (2004). The human body in cognition, brain, and typology. In K. Hovie (Ed.), Forum Handbook, 4th International Forum on Language, Brain, and Cognition - Cognition, Brain, and Typology: Toward a Synthesis (pp. 31-35). Sendai: Tohoku University.

    Abstract

    The human body is unique: it is both an object of perception and the source of human experience. Its universality makes it a perfect resource for asking questions about how cognition, brain and typology relate to one another. For example, we can ask how speakers of different languages segment and categorize the human body. A dominant view is that body parts are “given” by visual perceptual discontinuities, and that words are merely labels for these visually determined parts (e.g., Andersen, 1978; Brown, 1976; Lakoff, 1987). However, there are problems with this view. First it ignores other perceptual information, such as somatosensory and motoric representations. By looking at the neural representations of sesnsory representations, we can test how much of the categorization of the human body can be done through perception alone. Second, we can look at language typology to see how much universality and variation there is in body-part categories. A comparison of a range of typologically, genetically and areally diverse languages shows that the perceptual view has only limited applicability (Majid, Enfield & van Staden, in press). For example, using a “coloring-in” task, where speakers of seven different languages were given a line drawing of a human body and asked to color in various body parts, Majid & van Staden (in prep) show that languages vary substantially in body part segmentation. For example, Jahai (Mon-Khmer) makes a lexical distinction between upper arm, lower arm, and hand, but Lavukaleve (Papuan Isolate) has just one word to refer to arm, hand, and leg. This shows that body part categorization is not a straightforward mapping of words to visually determined perceptual parts.
  • Majid, A., & Bowerman, M. (Eds.). (2007). Cutting and breaking events: A crosslinguistic perspective [Special Issue]. Cognitive Linguistics, 18(2).

    Abstract

    This special issue of Cognitive Linguistics explores the linguistic encoding of events of cutting and breaking. In this article we first introduce the project on which it is based by motivating the selection of this conceptual domain, presenting the methods of data collection used by all the investigators, and characterizing the language sample. We then present a new approach to examining crosslinguistic similarities and differences in semantic categorization. Applying statistical modeling to the descriptions of cutting and breaking events elicited from speakers of all the languages, we show that although there is crosslinguistic variation in the number of distinctions made and in the placement of category boundaries, these differences take place within a strongly constrained semantic space: across languages, there is a surprising degree of consensus on the partitioning of events in this domain. In closing, we compare our statistical approach with more conventional semantic analyses, and show how an extensional semantic typological approach like the one illustrated here can help illuminate the intensional distinctions made by languages.
  • Majid, A., Van Staden, M., Boster, J. S., & Bowerman, M. (2004). Event categorization: A cross-linguistic perspective. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 885-890). Mahwah, NJ: Erlbaum.

    Abstract

    Many studies in cognitive science address how people categorize objects, but there has been comparatively little research on event categorization. This study investigated the categorization of events involving material destruction, such as “cutting” and “breaking”. Speakers of 28 typologically, genetically, and areally diverse languages described events shown in a set of video-clips. There was considerable cross-linguistic agreement in the dimensions along which the events were distinguished, but there was variation in the number of categories and the placement of their boundaries.
  • Majid, A. (2012). Taste in twenty cultures [Abstract]. Abstracts from the XXIth Congress of European Chemoreception Research Organization, ECRO-2011. Publ. in Chemical Senses, 37(3), A10.

    Abstract

    Scholars disagree about the extent to which language can tell us
    about conceptualisation of the world. Some believe that language
    is a direct window onto concepts: Having a word ‘‘bird’’, ‘‘table’’ or
    ‘‘sour’’ presupposes the corresponding underlying concept, BIRD,
    TABLE, SOUR. Others disagree. Words are thought to be uninformative,
    or worse, misleading about our underlying conceptual representations;
    after all, our mental worlds are full of ideas that we
    struggle to express in language. How could this be so, argue sceptics,
    if language were a direct window on our inner life? In this presentation,
    I consider what language can tell us about the
    conceptualisation of taste. By considering linguistic data from
    twenty unrelated cultures – varying in subsistence mode (huntergatherer
    to industrial), ecological zone (rainforest jungle to desert),
    dwelling type (rural and urban), and so forth – I argue any single language is, indeed, impoverished about what it can reveal about
    taste. But recurrent lexicalisation patterns across languages can
    provide valuable insights about human taste experience. Moreover,
    language patterning is part of the data that a good theory of taste
    perception has to be answerable for. Taste researchers, therefore,
    cannot ignore the crosslinguistic facts.
  • Majid, A., & Levinson, S. C. (Eds.). (2011). The senses in language and culture [Special Issue]. The Senses & Society, 6(1).
  • Majid, A., & Levinson, S. C. (2011). The language of perception across cultures [Abstract]. Abstracts of the XXth Congress of European Chemoreception Research Organization, ECRO-2010. Publ. in Chemical Senses, 36(1), E7-E8.

    Abstract

    How are the senses structured by the languages we speak, the cultures we inhabit? To what extent is the encoding of perceptual experiences in languages a matter of how the mind/brain is ―wired-up‖ and to what extent is it a question of local cultural preoccupation? The ―Language of Perception‖ project tests the hypothesis that some perceptual domains may be more ―ineffable‖ – i.e. difficult or impossible to put into words – than others. While cognitive scientists have assumed that proximate senses (olfaction, taste, touch) are more ineffable than distal senses (vision, hearing), anthropologists have illustrated the exquisite variation and elaboration the senses achieve in different cultural milieus. The project is designed to test whether the proximate senses are universally ineffable – suggesting an architectural constraint on cognition – or whether they are just accidentally so in Indo-European languages, so expanding the role of cultural interests and preoccupations. To address this question, a standardized set of stimuli of color patches, geometric shapes, simple sounds, tactile textures, smells and tastes have been used to elicit descriptions from speakers of more than twenty languages—including three sign languages. The languages are typologically, genetically and geographically diverse, representing a wide-range of cultures. The communities sampled vary in subsistence modes (hunter-gatherer to industrial), ecological zones (rainforest jungle to desert), dwelling types (rural and urban), and various other parameters. We examine how codable the different sensory modalities are by comparing how consistent speakers are in how they describe the materials in each modality. Our current analyses suggest that taste may, in fact, be the most codable sensorial domain across languages. Moreover, we have identified exquisite elaboration in the olfactory domains in some cultural settings, contrary to some contemporary predictions within the cognitive sciences. These results suggest that differential codability may be at least partly the result of cultural preoccupation. This shows that the senses are not just physiological phenomena but are constructed through linguistic, cultural and social practices.
  • Majid, A., Boroditsky, L., & Gaby, A. (Eds.). (2012). Time in terms of space [Research topic] [Special Issue]. Frontiers in cultural psychology. Retrieved from http://www.frontiersin.org/cultural_psychology/researchtopics/Time_in_terms_of_space/755.

    Abstract

    This Research Topic explores the question: what is the relationship between representations of time and space in cultures around the world? This question touches on the broader issue of how humans come to represent and reason about abstract entities – things we cannot see or touch. Time is a particularly opportune domain to investigate this topic. Across cultures, people use spatial representations for time, for example in graphs, time-lines, clocks, sundials, hourglasses, and calendars. In language, time is also heavily related to space, with spatial terms often used to describe the order and duration of events. In English, for example, we might move a meeting forward, push a deadline back, attend a long concert or go on a short break. People also make consistent spatial gestures when talking about time, and appear to spontaneously invoke spatial representations when processing temporal language. A large body of evidence suggests a close correspondence between temporal and spatial language and thought. However, the ways that people spatialize time can differ dramatically across languages and cultures. This research topic identifies and explores some of the sources of this variation, including patterns in spatial thinking, patterns in metaphor, gesture and other cultural systems. This Research Topic explores how speakers of different languages talk about time and space and how they think about these domains, outside of language. The Research Topic invites papers exploring the following issues: 1. Do the linguistic representations of space and time share the same lexical and morphosyntactic resources? 2. To what extent does the conceptualization of time follow the conceptualization of space?
  • Malaisé, V., Gazendam, L., & Brugman, H. (2007). Disambiguating automatic semantic annotation based on a thesaurus structure. In Proceedings of TALN 2007.
  • Malt, B. C., Ameel, E., Gennari, S., Imai, M., Saji, N., & Majid, A. (2011). Do words reveal concepts? In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 519-524). Austin, TX: Cognitive Science Society.

    Abstract

    To study concepts, cognitive scientists must first identify some. The prevailing assumption is that they are revealed by words such as triangle, table, and robin. But languages vary dramatically in how they carve up the world by name. Either ordinary concepts must be heavily language-dependent or names cannot be a direct route to concepts. We asked English, Dutch, Spanish, and Japanese speakers to name videos of human locomotion and judge their similarities. We investigated what name inventories and scaling solutions on name similarity and on physical similarity for the groups individually and together suggest about the underlying concepts. Aggregated naming and similarity solutions converged on results distinct from the answers suggested by the word inventories and scaling solutions of any single language. Words such as triangle, table, and robin can help identify the conceptual space of a domain, but they do not directly reveal units of knowledge usefully considered 'concepts'.
  • de Marneffe, M.-C., Tomlinson, J. J., Tice, M., & Sumner, M. (2011). The interaction of lexical frequency and phonetic variation in the perception of accented speech. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 3575-3580). Austin, TX: Cognitive Science Society.

    Abstract

    How listeners understand spoken words despite massive variation in the speech signal is a central issue for linguistic theory. A recent focus on lexical frequency and specificity has proved fruitful in accounting for this phenomenon. Speech perception, though, is a multi-faceted process and likely incorporates a number of mechanisms to map a variable signal to meaning. We examine a well-established language use factor — lexical frequency — and how this factor is integrated with phonetic variability during the perception of accented speech. We show that an integrated perspective highlights a low-level perceptual mechanism that accounts for the perception of accented speech absent native contrasts, while shedding light on the use of interactive language factors in the perception of spoken words.
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • Meyer, A. S., & Huettig, F. (Eds.). (2016). Speaking and Listening: Relationships Between Language Production and Comprehension [Special Issue]. Journal of Memory and Language, 89.
  • Micklos, A. (2016). Interaction for facilitating conventionalization: Negotiating the silent gesture communication of noun-verb pairs. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/143.html.

    Abstract

    This study demonstrates how interaction – specifically negotiation and repair – facilitates the emergence, evolution, and conventionalization of a silent gesture communication system. In a modified iterated learning paradigm, partners communicated noun-verb meanings using only silent gesture. The need to disambiguate similar noun-verb pairs drove these "new" language users to develop a morphology that allowed for quicker processing, easier transmission, and improved accuracy. The specific morphological system that emerged came about through a process of negotiation within the dyad, namely by means of repair. By applying a discourse analytic approach to the use of repair in an experimental methodology for language evolution, we are able to determine not only if interaction facilitates the emergence and learnability of a new communication system, but also how interaction affects such a system
  • Mitterer, H. (2007). Top-down effects on compensation for coarticulation are not replicable. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1601-1604). Adelaide: Causal Productions.

    Abstract

    Listeners use lexical knowledge to judge what speech sounds they heard. I investigated whether such lexical influences are truly top-down or just reflect a merging of perceptual and lexical constraints. This is achieved by testing whether the lexically determined identity of a phone exerts the appropriate context effects on surrounding phones. The current investigations focuses on compensation for coarticulation in vowel-fricative sequences, where the presence of a rounded vowel (/y/ rather than /i/) leads fricatives to be perceived as /s/ rather than //. This results was consistently found in all three experiments. A vowel was also more likely to be perceived as rounded /y/ if that lead listeners to be perceive words rather than nonwords (Dutch: meny, English id. vs. meni nonword). This lexical influence on the perception of the vowel had, however, no consistent influence on the perception of following fricative.
  • Mitterer, H., & McQueen, J. M. (2007). Tracking perception of pronunciation variation by tracking looks to printed words: The case of word-final /t/. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1929-1932). Dudweiler: Pirrot.

    Abstract

    We investigated perception of words with reduced word-final /t/ using an adapted eyetracking paradigm. Dutch listeners followed spoken instructions to click on printed words which were accompanied on a computer screen by simple shapes (e.g., a circle). Targets were either above or next to their shapes, and the shapes uniquely identified the targets when the spoken forms were ambiguous between words with or without final /t/ (e.g., bult, bump, vs. bul, diploma). Analysis of listeners’ eye-movements revealed, in contrast to earlier results, that listeners use the following segmental context when compensating for /t/-reduction. Reflecting that /t/-reduction is more likely to occur before bilabials, listeners were more likely to look at the /t/-final words if the next word’s first segment was bilabial. This result supports models of speech perception in which prelexical phonological processes use segmental context to modulate word recognition.
  • Mitterer, H. (Ed.). (2012). Ecological aspects of speech perception [Research topic] [Special Issue]. Frontiers in Cognition.

    Abstract

    Our knowledge of speech perception is largely based on experiments conducted with carefully recorded clear speech presented under good listening conditions to undistracted listeners - a near-ideal situation, in other words. But the reality poses a set of different challenges. First of all, listeners may need to divide their attention between speech comprehension and another task (e.g., driving). Outside the laboratory, the speech signal is often slurred by less than careful pronunciation and the listener has to deal with background noise. Moreover, in a globalized world, listeners need to understand speech in more than their native language. Relatedly, the speakers we listen to often have a different language background so we have to deal with a foreign or regional accent we are not familiar with. Finally, outside the laboratory, speech perception is not an end in itself, but rather a mean to contribute to a conversation. Listeners do not only need to understand the speech they are hearing, they also need to use this information to plan and time their own responses. For this special topic, we invite papers that address any of these ecological aspects of speech perception.
  • Mitterer, H. (2007). Behavior reflects the (degree of) reality of phonological features in the brain as well. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 127-130). Dudweiler: Pirrot.

    Abstract

    To assess the reality of phonological features in language processing (vs. language description), one needs to specify the distinctive claims of distinctive-feature theory. Two of the more farreaching claims are compositionality and generalizability. I will argue that there is some evidence for the first and evidence against the second claim from a recent behavioral paradigm. Highlighting the contribution of a behavioral paradigm also counterpoints the use of brain measures as the only way to elucidate what is "real for the brain". The contributions of the speakers exemplify how brain measures can help us to understand the reality of phonological features in language processing. The evidence is, however, not convincing for a) the claim for underspecification of phonological features—which has to deal with counterevidence from behavioral as well as brain measures—, and b) the claim of position independence of phonological features.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Mitterer, H. (2011). Social accountability influences phonetic alignment. Journal of the Acoustical Society of America. Program abstracts of the 162nd Meeting of the Acoustical Society of America, 130(4), 2442.

    Abstract

    Speakers tend to take over the articulatory habits of their interlocutors [e.g., Pardo, JASA (2006)]. This phonetic alignment could be the consequence of either a social mechanism or a direct and automatic link between speech perception and production. The latter assumes that social variables should have little influence on phonetic alignment. To test that participants were engaged in a "cloze task" (i.e., Stimulus: "In fantasy movies, silver bullets are used to kill ..." Response: "werewolves") with either one or four interlocutors. Given findings with the Asch-conformity paradigm in social psychology, multiple consistent speakers should exert a stronger force on the participant to align. To control the speech style of the interlocutors, their questions and answers were pre-recorded in either a formal or a casual speech style. The stimuli's speech style was then manipulated between participants and was consistent throughout the experiment for a given participant. Surprisingly, participants aligned less with the speech style if there were multiple interlocutors. This may reflect a "diffusion of responsibility:" Participants may find it more important to align when they interact with only one person than with a larger group.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2016). Comparing different methods for analyzing ERP signals. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 1373-1377). doi:10.21437/Interspeech.2016-967.
  • Namjoshi, J., Tremblay, A., Broersma, M., Kim, S., & Cho, T. (2012). Influence of recent linguistic exposure on the segmentation of an unfamiliar language [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1968.

    Abstract

    Studies have shown that listeners segmenting unfamiliar languages transfer native-language (L1) segmentation cues. These studies, however, conflated L1 and recent linguistic exposure. The present study investigates the relative influences of L1 and recent linguistic exposure on the use of prosodic cues for segmenting an artificial language (AL). Participants were L1-French listeners, high-proficiency L2-French L1-English listeners, and L1-English listeners without functional knowledge of French. The prosodic cue assessed was F0 rise, which is word-final in French, but in English tends to be word-initial. 30 participants heard a 20-minute AL speech stream with word-final boundaries marked by F0 rise, and decided in a subsequent listening task which of two words (without word-final F0 rise) had been heard in the speech stream. The analyses revealed a marginally significant effect of L1 (all listeners) and, importantly, a significant effect of recent linguistic exposure (L1-French and L2-French listeners): accuracy increased with decreasing time in the US since the listeners’ last significant (3+ months) stay in a French-speaking environment. Interestingly, no effect of L2 proficiency was found (L2-French listeners).
  • Narasimhan, B., Eisenbeiss, S., & Brown, P. (Eds.). (2007). The linguistic encoding of multiple-participant events [Special Issue]. Linguistics, 45(3).

    Abstract

    This issue investigates the linguistic encoding of events with three or more participants from the perspectives of language typology and acquisition. Such “multiple-participant events” include (but are not limited to) any scenario involving at least three participants, typically encoded using transactional verbs like 'give' and 'show', placement verbs like 'put', and benefactive and applicative constructions like 'do (something for someone)', among others. There is considerable crosslinguistic and withinlanguage variation in how the participants (the Agent, Causer, Theme, Goal, Recipient, or Experiencer) and the subevents involved in multipleparticipant situations are encoded, both at the lexical and the constructional levels
  • Nordhoff, S., & Hammarström, H. (2011). Glottolog/Langdoc: Defining dialects, languages, and language families as collections of resources. Proceedings of the First International Workshop on Linked Science 2011 (LISC2011), Bonn, Germany, October 24, 2011.

    Abstract

    This paper describes the Glottolog/Langdoc project, an at- tempt to provide near-total bibliographical coverage of descriptive re- sources to the world's languages. Every reference is treated as a resource, as is every \languoid"[1]. References are linked to the languoids which they describe, and languoids are linked to the references described by them. Family relations between languoids are modeled in SKOS, as are relations across dierent classications of the same languages. This setup allows the representation of languoids as collections of references, render- ing the question of the denition of entities like `Scots', `West-Germanic' or `Indo-European' more empirical.
  • Nordhoff, S., & Hammarström, H. (2012). Glottolog/Langdoc: Increasing the visibility of grey literature for low-density languages. In N. Calzolari (Ed.), Proceedings of the 8th International Conference on Language Resources and Evaluation [LREC 2012], May 23-25, 2012 (pp. 3289-3294). [Paris]: ELRA.

    Abstract

    Language resources can be divided into structural resources treating phonology, morphosyntax, semantics etc. and resources treating the social, demographic, ethnic, political context. A third type are meta-resources, like bibliographies, which provide access to the resources of the first two kinds. This poster will present the Glottolog/Langdoc project, a comprehensive bibliography providing web access to 180k bibliographical records to (mainly) low visibility resources from low-density languages. The resources are annotated for macro-area, content language, and document type and are available in XHTML and RDF.
  • Omar, R., Henley, S. M., Hailstone, J. C., Sauter, D., Scott, S. K., Fox, N. C., Rossor, M. N., & Warren, J. D. (2007). Recognition of emotions in faces, voices and music in frontotemporal lobar regeneration [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 78(9), 1014.

    Abstract

    Frontotemporal lobar degeneration (FTLD) is a group of neurodegenerative conditions characterised by focal frontal and/or temporal lobe atrophy. Patients develop a range of cognitive and behavioural abnormalities, including prominent difficulties in comprehending and expressing emotions, with significant clinical and social consequences. Here we report a systematic prospective analysis of emotion processing in different input modalities in patients with FTLD. We examined recognition of happiness, sadness, fear and anger in facial expressions, non-verbal vocalisations and music in patients with FTLD and in healthy age matched controls. The FTLD group was significantly impaired in all modalities compared with controls, and this effect was most marked for music. Analysing each emotion separately, recognition of negative emotions was impaired in all three modalities in FTLD, and this effect was most marked for fear and anger. Recognition of happiness was deficient only with music. Our findings support the idea that FTLD causes impaired recognition of emotions across input channels, consistent with a common central representation of emotion concepts. Music may be a sensitive probe of emotional deficits in FTLD, perhaps because it requires a more abstract representation of emotion than do animate stimuli such as faces and voices.
  • Ortega, G., & Ozyurek, A. (2016). Generalisable patterns of gesture distinguish semantic categories in communication without language. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1182-1187). Austin, TX: Cognitive Science Society.

    Abstract

    There is a long-standing assumption that gestural forms are geared by a set of modes of representation (acting, representing, drawing, moulding) with each technique expressing speakers’ focus of attention on specific aspects of referents (Müller, 2013). Beyond different taxonomies describing the modes of representation, it remains unclear what factors motivate certain depicting techniques over others. Results from a pantomime generation task show that pantomimes are not entirely idiosyncratic but rather follow generalisable patterns constrained by their semantic category. We show that a) specific modes of representations are preferred for certain objects (acting for manipulable objects and drawing for non-manipulable objects); and b) that use and ordering of deictics and modes of representation operate in tandem to distinguish between semantically related concepts (e.g., “to drink” vs “mug”). This study provides yet more evidence that our ability to communicate through silent gesture reveals systematic ways to describe events and objects around us
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Papafragou, A., & Ozturk, O. (2007). Children's acquisition of modality. In Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA 2) (pp. 320-327). Somerville, Mass.: Cascadilla Press.
  • Papafragou, A. (2007). On the acquisition of modality. In T. Scheffler, & L. Mayol (Eds.), Penn Working Papers in Linguistics. Proceedings of the 30th Annual Penn Linguistics Colloquium (pp. 281-293). Department of Linguistics, University of Pennsylvania.
  • Peeters, D. (2016). Processing consequences of onomatopoeic iconicity in spoken language comprehension. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1632-1647). Austin, TX: Cognitive Science Society.

    Abstract

    Iconicity is a fundamental feature of human language. However its processing consequences at the behavioral and neural level in spoken word comprehension are not well understood. The current paper presents the behavioral and electrophysiological outcome of an auditory lexical decision task in which native speakers of Dutch listened to onomatopoeic words and matched control words while their electroencephalogram was recorded. Behaviorally, onomatopoeic words were processed as quickly and accurately as words with an arbitrary mapping between form and meaning. Event-related potentials time-locked to word onset revealed a significant decrease in negative amplitude in the N2 and N400 components and a late positivity for onomatopoeic words in comparison to the control words. These findings advance our understanding of the temporal dynamics of iconic form-meaning mapping in spoken word comprehension and suggest interplay between the neural representations of real-world sounds and spoken words.
  • Perniss, P. M., Zwitserlood, I., & Ozyurek, A. (2011). Does space structure spatial language? Linguistic encoding of space in sign languages. In L. Carlson, C. Holscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (pp. 1595-1600). Austin, TX: Cognitive Science Society.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Poellmann, K., McQueen, J. M., & Mitterer, H. (2012). How talker-adaptation helps listeners recognize reduced word-forms [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 2053.

    Abstract

    Two eye-tracking experiments tested whether native listeners can adapt
    to reductions in casual Dutch speech. Listeners were exposed to segmental
    ([b] > [m]), syllabic (full-vowel-deletion), or no reductions. In a subsequent
    test phase, all three listener groups were tested on how efficiently they could
    recognize both types of reduced words. In the first Experiment’s exposure
    phase, the (un)reduced target words were predictable. The segmental reductions
    were completely consistent (i.e., involved the same input sequences).
    Learning about them was found to be pattern-specific and generalized in the
    test phase to new reduced /b/-words. The syllabic reductions were not consistent
    (i.e., involved variable input sequences). Learning about them was
    weak and not pattern-specific. Experiment 2 examined effects of word repetition
    and predictability. The (un-)reduced test words appeared in the exposure
    phase and were not predictable. There was no evidence of learning for
    the segmental reductions, probably because they were not predictable during
    exposure. But there was word-specific learning for the vowel-deleted words.
    The results suggest that learning about reductions is pattern-specific and
    generalizes to new words if the input is consistent and predictable. With
    variable input, there is more likely to be adaptation to a general speaking
    style and word-specific learning.
  • Poellmann, K., McQueen, J. M., & Mitterer, H. (2011). The time course of perceptual learning. In W.-S. Lee, & E. Zee (Eds.), Proceedings of the 17th International Congress of Phonetic Sciences 2011 [ICPhS XVII] (pp. 1618-1621). Hong Kong: Department of Chinese, Translation and Linguistics, City University of Hong Kong.

    Abstract

    Two groups of participants were trained to perceive an ambiguous sound [s/f] as either /s/ or /f/ based on lexical bias: One group heard the ambiguous fricative in /s/-final words, the other in /f/-final words. This kind of exposure leads to a recalibration of the /s/-/f/ contrast [e.g., 4]. In order to investigate when and how this recalibration emerges, test trials were interspersed among training and filler trials. The learning effect needed at least 10 clear training items to arise. Its emergence seemed to occur in a rather step-wise fashion. Learning did not improve much after it first appeared. It is likely, however, that the early test trials attracted participants' attention and therefore may have interfered with the learning process.
  • Rapold, C. J. (2007). From demonstratives to verb agreement in Benchnon: A diachronic perspective. In A. Amha, M. Mous, & G. Savà (Eds.), Omotic and Cushitic studies: Papers from the Fourth Cushitic Omotic Conference, Leiden, 10-12 April 2003 (pp. 69-88). Cologne: Rüdiger Köppe.
  • Ravignani, A., & Fitch, W. T. (2012). Sonification of experimental parameters as a new method for efficient coding of behavior. In A. Spink, F. Grieco, O. E. Krips, L. W. S. Loijens, L. P. P. J. Noldus, & P. H. Zimmerman (Eds.), Measuring Behavior 2012, 8th International Conference on Methods and Techniques in Behavioral Research (pp. 376-379).

    Abstract

    Cognitive research is often focused on experimental condition-driven reactions. Ethological studies frequently
    rely on the observation of naturally occurring specific behaviors. In both cases, subjects are filmed during the
    study, so that afterwards behaviors can be coded on video. Coding should typically be blind to experimental
    conditions, but often requires more information than that present on video. We introduce a method for blindcoding
    of behavioral videos that takes care of both issues via three main innovations. First, of particular
    significance for playback studies, it allows creation of a “soundtrack” of the study, that is, a track composed of
    synthesized sounds representing different aspects of the experimental conditions, or other events, over time.
    Second, it facilitates coding behavior using this audio track, together with the possibly muted original video.
    This enables coding blindly to conditions as required, but not ignoring other relevant events. Third, our method
    makes use of freely available, multi-platform software, including scripts we developed.
  • Raviv, L., & Arnon, I. (2016). The developmental trajectory of children's statistical learning abilities. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016). Austin, TX: Cognitive Science Society (pp. 1469-1474). Austin, TX: Cognitive Science Society.

    Abstract

    Infants, children and adults are capable of implicitly extracting regularities from their environment through statistical learning (SL). SL is present from early infancy and found across tasks and modalities, raising questions about the domain generality of SL. However, little is known about its’ developmental trajectory: Is SL fully developed capacity in infancy, or does it improve with age, like other cognitive skills? While SL is well established in infants and adults, only few studies have looked at SL across development with conflicting results: some find age-related improvements while others do not. Importantly, despite its postulated role in language learning, no study has examined the developmental trajectory of auditory SL throughout childhood. Here, we conduct a large-scale study of children's auditory SL across a wide age-range (5-12y, N=115). Results show that auditory SL does not change much across development. We discuss implications for modality-based differences in SL and for its role in language acquisition.
  • Raviv, L., & Arnon, I. (2016). Language evolution in the lab: The case of child learners. In A. Papagrafou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016). Austin, TX: Cognitive Science Society (pp. 1643-1648). Austin, TX: Cognitive Science Society.

    Abstract

    Recent work suggests that cultural transmission can lead to the emergence of linguistic structure as speakers’ weak individual biases become amplified through iterated learning. However, to date, no published study has demonstrated a similar emergence of linguistic structure in children. This gap is problematic given that languages are mainly learned by children and that adults may bring existing linguistic biases to the task. Here, we conduct a large-scale study of iterated language learning in both children and adults, using a novel, child-friendly paradigm. The results show that while children make more mistakes overall, their languages become more learnable and show learnability biases similar to those of adults. Child languages did not show a significant increase in linguistic structure over time, but consistent mappings between meanings and signals did emerge on many occasions, as found with adults. This provides the first demonstration that cultural transmission affects the languages children and adults produce similarly.
  • Regier, T., Khetarpal, N., & Majid, A. (2011). Inferring conceptual structure from cross-language data. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 1488). Austin, TX: Cognitive Science Society.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., & Weber, A. (2011). Adapting to lexical stress in a foreign accent. In W.-S. Lee, & E. Zee (Eds.), Proceedings of the 17th International Congress of Phonetic Sciences 2011 [ICPhS XVII] (pp. 1678-1681). Hong Kong: Department of Chinese, Translation and Linguistics, City University of Hong Kong.

    Abstract

    An exposure-test paradigm was used to examine whether Dutch listeners can adapt their perception to non-canonical marking of lexical stress in Hungarian-accented Dutch. During exposure, one group of listeners heard only words with correct initial stress, while another group also heard examples of unstressed initial syllables that were marked by high pitch, a possible stress cue in Dutch. Subsequently, listeners’ eye movements to target-competitor pairs with segmental overlap but different stress patterns were tracked while hearing Hungarian-accented Dutch. Listeners who had heard non-canonically produced words previously distinguished target-competitor pairs faster than listeners who had only been exposed to canonical forms before. This suggests that listeners can adapt quickly to speaker-specific realizations of non-canonical lexical stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Reinisch, E., Weber, A., & Mitterer, H. (2011). Listeners retune phoneme boundaries across languages [Abstract]. Journal of the Acoustical Society of America. Program abstracts of the 162nd Meeting of the Acoustical Society of America, 130(4), 2572-2572.

    Abstract

    Listeners can flexibly retune category boundaries of their native language to adapt to non-canonically produced phonemes. This only occurs, however, if the pronunciation peculiarities can be attributed to stable and not transient speaker-specific characteristics. Listening to someone speaking a second language, listeners could attribute non-canonical pronunciations either to the speaker or to the fact that she is modifying her categories in the second language. We investigated whether, following exposure to Dutch-accented English, Dutch listeners show effects of category retuning during test where they hear the same speaker speaking her native language, Dutch. Exposure was a lexical-decision task where either word-final [f] or [s] was replaced by an ambiguous sound. At test listeners categorized minimal word pairs ending in sounds along an [f]-[s] continuum. Following exposure to English words, Dutch listeners showed boundary shifts of a similar magnitude as following exposure to the same phoneme variants in their native language. This suggests that production patterns in a second language are deemed a stable characteristic. A second experiment suggests that category retuning also occurs when listeners are exposed to and tested with a native speaker of their second language. Listeners thus retune phoneme boundaries across languages.
  • Ringersma, J., & Kemps-Snijders, M. (2007). Creating multimedia dictionaries of endangered languages using LEXUS. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 65-68). Baixas, France: ISCA-Int.Speech Communication Assoc.

    Abstract

    This paper reports on the development of a flexible web based lexicon tool, LEXUS. LEXUS is targeted at linguists involved in language documentation (of endangered languages). It allows the creation of lexica within the structure of the proposed ISO LMF standard and uses the proposed concept naming conventions from the ISO data categories, thus enabling interoperability, search and merging. LEXUS also offers the possibility to visualize language, since it provides functionalities to include audio, video and still images to the lexicon. With LEXUS it is possible to create semantic network knowledge bases, using typed relations. The LEXUS tool is free for use. Index Terms: lexicon, web based application, endangered languages, language documentation.
  • Roberts, L., & Meyer, A. S. (Eds.). (2012). Individual differences in second language acquisition [Special Issue]. Language Learning, 62(Supplement S2).
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • Rodd, J., & Chen, A. (2016). Pitch accents show a perceptual magnet effect: Evidence of internal structure in intonation categories. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 697-701).

    Abstract

    The question of whether intonation events have a categorical mental representation has long been a puzzle in prosodic research, and one that experiments testing production and perception across category boundaries have failed to definitively resolve. This paper takes the alternative approach of looking for evidence of structure within a postulated category by testing for a Perceptual Magnet Effect (PME). PME has been found in boundary tones but has not previously been conclusively found in pitch accents. In this investigation, perceived goodness and discriminability of re-synthesised Dutch nuclear rise contours (L*H H%) were evaluated by naive native speakers of Dutch. The variation between these stimuli was quantified using a polynomial-parametric modelling approach (i.e. the SOCoPaSul model) in place of the traditional approach whereby excursion size, peak alignment and pitch register are used independently of each other to quantify variation between pitch accents. Using this approach to calculate the acoustic-perceptual distance between different stimuli, PME was detected: (1) rated goodness, decreased as acoustic-perceptual distance relative to the prototype increased, and (2) equally spaced items far from the prototype were less frequently generalised than equally spaced items in the neighbourhood of the prototype. These results support the concept of categorically distinct intonation events.

    Additional information

    Link to Speech Prosody Website
  • Romberg, A., Zhang, Y., Newman, B., Triesch, J., & Yu, C. (2016). Global and local statistical regularities control visual attention to object sequences. In Proceedings of the 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 262-267).

    Abstract

    Many previous studies have shown that both infants and adults are skilled statistical learners. Because statistical learning is affected by attention, learners' ability to manage their attention can play a large role in what they learn. However, it is still unclear how learners allocate their attention in order to gain information in a visual environment containing multiple objects, especially how prior visual experience (i.e., familiarly of objects) influences where people look. To answer these questions, we collected eye movement data from adults exploring multiple novel objects while manipulating object familiarity with global (frequencies) and local (repetitions) regularities. We found that participants are sensitive to both global and local statistics embedded in their visual environment and they dynamically shift their attention to prioritize some objects over others as they gain knowledge of the objects and their distributions within the task.
  • De Ruiter, J. P. (2007). Some multimodal signals in humans. In I. Van de Sluis, M. Theune, E. Reiter, & E. Krahmer (Eds.), Proceedings of the Workshop on Multimodal Output Generation (MOG 2007) (pp. 141-148).

    Abstract

    In this paper, I will give an overview of some well-studied multimodal signals that humans produce while they communicate with other humans, and discuss the implications of those studies for HCI. I will first discuss a conceptual framework that allows us to distinguish between functional and sensory modalities. This distinction is important, as there are multiple functional modalities using the same sensory modality (e.g., facial expression and eye-gaze in the visual modality). A second theoretically important issue is redundancy. Some signals appear to be redundant with a signal in another modality, whereas others give new information or even appear to give conflicting information (see e.g., the work of Susan Goldin-Meadows on speech accompanying gestures). I will argue that multimodal signals are never truly redundant. First, many gestures that appear at first sight to express the same meaning as the accompanying speech generally provide extra (analog) information about manner, path, etc. Second, the simple fact that the same information is expressed in more than one modality is itself a communicative signal. Armed with this conceptual background, I will then proceed to give an overview of some multimodalsignals that have been investigated in human-human research, and the level of understanding we have of the meaning of those signals. The latter issue is especially important for potential implementations of these signals in artificial agents. First, I will discuss pointing gestures. I will address the issue of the timing of pointing gestures relative to the speech it is supposed to support, the mutual dependency between pointing gestures and speech, and discuss the existence of alternative ways of pointing from other cultures. The most frequent form of pointing that does not involve the index finger is a cultural practice called lip-pointing which employs two visual functional modalities, mouth-shape and eye-gaze, simultaneously for pointing. Next, I will address the issue of eye-gaze. A classical study by Kendon (1967) claims that there is a systematic relationship between eye-gaze (at the interlocutor) and turn-taking states. Research at our institute has shown that this relationship is weaker than has often been assumed. If the dialogue setting contains a visible object that is relevant to the dialogue (e.g., a map), the rate of eye-gaze-at-other drops dramatically and its relationship to turn taking disappears completely. The implications for machine generated eye-gaze are discussed. Finally, I will explore a theoretical debate regarding spontaneous gestures. It has often been claimed that the class of gestures that is called iconic by McNeill (1992) are a “window into the mind”. That is, they are claimed to give the researcher (or even the interlocutor) a direct view into the speaker’s thought, without being obscured by the complex transformation that take place when transforming a thought into a verbal utterance. I will argue that this is an illusion. Gestures can be shown to be specifically designed such that the listener can be expected to interpret them. Although the transformations carried out to express a thought in gesture are indeed (partly) different from the corresponding transformations for speech, they are a) complex, and b) severely understudied. This obviously has consequences both for the gesture research agenda, and for the generation of iconic gestures by machines.
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces.(LREC2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • De Ruiter, J. P., & Enfield, N. J. (2007). The BIC model: A blueprint for the communicator. In C. Stephanidis (Ed.), Universal access in Human-Computer Interaction: Applications and services (pp. 251-258). Berlin: Springer.
  • Sadakata, M., & McQueen, J. M. (2011). The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 873-876).

    Abstract

    The current study reports the enhancing effect of a high variability training procedure in the learning of a Japanese geminate-singleton fricative contrast. Dutch natives took part in a five-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. They heard either many repetitions of a limited set of words recorded by a single speaker (simple training) or fewer repetitions of a more variable set of words recorded by multiple speakers (variable training). Pre-post identification evaluations and a transfer test indicated clear benefits of the variable training.
  • Sauermann, A., Höhle, B., Chen, A., & Järvikivi, J. (2011). Intonational marking of focus in different word orders in German children. In M. B. Washburn, K. McKinney-Bock, E. Varis, & A. Sawyer (Eds.), Proceedings of the 28th West Coast Conference on Formal Linguistics (pp. 313-322). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    The use of word order and intonation to mark focus in child speech has received some attention. However, past work usually examined each device separately or only compared the realizations of focused vs. non-focused constituents. This paper investigates the interaction between word order and intonation in the marking of different focus types in 4- to 5-year old German-speaking children and an adult control group. An answer-reconstruction task was used to elicit syntactic (word order) and intonational focus marking of subject and objects (locus of focus) in three focus types (broad, narrow, and contrastive focus). The results indicate that both children and adults used intonation to distinguish broad from contrastive focus but they differed in the marking of narrow focus. Further, both groups preferred intonation to word order as device for focus marking. But children showed an early sensitivity for the impact of focus type and focus location on word order variation and on phonetic means to mark focus.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., Ernestus, M., & Wan, V. (2007). Segmentation of speech: Child's play? In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1953-1956). Adelaide: Causal Productions.

    Abstract

    The difficulty of the task of segmenting a speech signal into its words is immediately clear when listening to a foreign language; it is much harder to segment the signal into its words, since the words of the language are unknown. Infants are faced with the same task when learning their first language. This study provides a better understanding of the task that infants face while learning their native language. We employed an automatic algorithm on the task of speech segmentation without prior knowledge of the labels of the phonemes. An analysis of the boundaries erroneously placed inside a phoneme showed that the algorithm consistently placed additional boundaries in phonemes in which acoustic changes occur. These acoustic changes may be as great as the transition from the closure to the burst of a plosive or as subtle as the formant transitions in low or back vowels. Moreover, we found that glottal vibration may attenuate the relevance of acoustic changes within obstruents. An interesting question for further research is how infants learn to overcome the natural tendency to segment these ‘dynamic’ phonemes.
  • Scharenborg, O., & Wan, V. (2007). Can unquantised articulatory feature continuums be modelled? In INTERSPEECH 2007 - 8th Annual Conference of the International Speech Communication Association (pp. 2473-2476). ISCA Archive.

    Abstract

    Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Although termed ‘articulatory’, previous definitions make certain assumptions that are invalid, for instance, that articulators ‘hop’ from one fixed position to the next. In this paper, we studied two methods, based on support vector classification (SVC) and regression (SVR), in which the articulation continuum is modelled without being restricted to using discrete AF value classes. A comparison with a baseline system trained on quantised values of the articulation continuum showed that both SVC and SVR outperform the baseline for two of the three investigated AFs, with improvements up to 5.6% absolute.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O., Witteman, M. J., & Weber, A. (2012). Computational modelling of the recognition of foreign-accented speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 882 -885).

    Abstract

    In foreign-accented speech, pronunciation typically deviates from the canonical form to some degree. For native listeners, it has been shown that word recognition is more difficult for strongly-accented words than for less strongly-accented words. Furthermore recognition of strongly-accented words becomes easier with additional exposure to the foreign accent. In this paper, listeners’ behaviour was simulated with Fine-tracker, a computational model of word recognition that uses real speech as input. The simulations showed that, in line with human listeners, 1) Fine-Tracker’s recognition outcome is modulated by the degree of accentedness and 2) it improves slightly after brief exposure with the accent. On the level of individual words, however, Fine-tracker failed to correctly simulate listeners’ behaviour, possibly due to differences in overall familiarity with the chosen accent (German-accented Dutch) between human listeners and Fine-Tracker.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Scharenborg, O., & Janse, E. (2012). Hearing loss and the use of acoustic cues in phonetic categorisation of fricatives. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1458-1461).

    Abstract

    Aging often affects sensitivity to the higher frequencies, which results in the loss of sensitivity to phonetic detail in speech. Hearing loss may therefore interfere with the categorisation of two consonants that have most information to differentiate between them in those higher frequencies and less in the lower frequencies, e.g., /f/ and /s/. We investigate two acoustic cues, i.e., formant transitions and fricative intensity, that older listeners might use to differentiate between /f/ and /s/. The results of two phonetic categorisation tasks on 38 older listeners (aged 60+) with varying degrees of hearing loss indicate that older listeners seem to use formant transitions as a cue to distinguish /s/ from /f/. Moreover, this ability is not impacted by hearing loss. On the other hand, listeners with increased hearing loss seem to rely more on intensity for fricative identification. Thus, progressive hearing loss may lead to gradual changes in perceptual cue weighting.
  • Scharenborg, O., Janse, E., & Weber, A. (2012). Perceptual learning of /f/-/s/ by older listeners. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 398-401).

    Abstract

    Young listeners can quickly modify their interpretation of a speech sound when a talker produces the sound ambiguously. Young Dutch listeners rely mainly on the higher frequencies to distinguish between /f/ and /s/, but these higher frequencies are particularly vulnerable to age-related hearing loss. We therefore tested whether older Dutch listeners can show perceptual retuning given an ambiguous pronunciation in between /f/ and /s/. Results of a lexically-guided perceptual learning experiment showed that older Dutch listeners are still able to learn non-standard pronunciations of /f/ and /s/. Possibly, the older listeners have learned to rely on other acoustic cues, such as formant transitions, to distinguish between /f/ and /s/. However, the size and duration of the perceptual effect is influenced by hearing loss, with listeners with poorer hearing showing a smaller and a shorter-lived learning effect.
  • Scharenborg, O., Mitterer, H., & McQueen, J. M. (2011). Perceptual learning of liquids. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 149-152).

    Abstract

    Previous research on lexically-guided perceptual learning has focussed on contrasts that differ primarily in local cues, such as plosive and fricative contrasts. The present research had two aims: to investigate whether perceptual learning occurs for a contrast with non-local cues, the /l/-/r/ contrast, and to establish whether STRAIGHT can be used to create ambiguous sounds on an /l/-/r/ continuum. Listening experiments showed lexically-guided learning about the /l/-/r/ contrast. Listeners can thus tune in to unusual speech sounds characterised by non-local cues. Moreover, STRAIGHT can be used to create stimuli for perceptual learning experiments, opening up new research possibilities. Index Terms: perceptual learning, morphing, liquids, human word recognition, STRAIGHT.
  • Scheu, O., & Zinn, C. (2007). How did the e-learning session go? The student inspector. In Proceedings of the 13th International Conference on Artificial Intelligence and Education (AIED 2007). Amsterdam: IOS Press.

    Abstract

    Good teachers know their students, and exploit this knowledge to adapt or optimise their instruction. Traditional teachers know their students because they interact with them face-to-face in classroom or one-to-one tutoring sessions. In these settings, they can build student models, i.e., by exploiting the multi-faceted nature of human-human communication. In distance-learning contexts, teacher and student have to cope with the lack of such direct interaction, and this must have detrimental effects for both teacher and student. In a past study we have analysed teacher requirements for tracking student actions in computer-mediated settings. Given the results of this study, we have devised and implemented a tool that allows teachers to keep track of their learners'interaction in e-learning systems. We present the tool's functionality and user interfaces, and an evaluation of its usability.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schulte im Walde, S., Melinger, A., Roth, M., & Weber, A. (2007). An empirical characterization of response types in German association norms. In Proceedings of the GLDV workshop on lexical-semantic and ontological resources.
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Senft, G. (2007). Language, culture and cognition: Frames of spatial reference and why we need ontologies of space [Abstract]. In A. G. Cohn, C. Freksa, & B. Bebel (Eds.), Spatial cognition: Specialization and integration (pp. 12).

    Abstract

    One of the many results of the "Space" research project conducted at the MPI for Psycholinguistics is that there are three "Frames of spatial Reference" (FoRs), the relative, the intrinsic and the absolute FoR. Cross-linguistic research showed that speakers who prefer one FoR in verbal spatial references rely on a comparable coding system for memorizing spatial configurations and for making inferences with respect to these spatial configurations in non-verbal problem solving. Moreover, research results also revealed that in some languages these verbal FoRs also influence gestural behavior. These results document the close interrelationship between language, culture and cognition in the domain "Space". The proper description of these interrelationships in the spatial domain requires language and culture specific ontologies.
  • Seuren, P. A. M. (1996). What a universal semantic interlingua can do. In A. Zamulin (Ed.), Perspectives of System Informatics. Proceedings of the Andrei Ershov Second International Memorial Conference, Novosibirsk, Akademgorodok, June 25-28,1996 (pp. 41-42). Novosibirsk: A.P. Ershov Institute of Informatics Systems.
  • Seuren, P. A. M. (1980). Variabele competentie: Linguïstiek en sociolinguïstiek anno 1980. In Handelingen van het 36e Nederlands Filologencongres: Gehouden te Groningen op woensdag 9, donderdag 10 en vrijdag 11 April 1980 (pp. 41-56). Amsterdam: Holland University Press.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Sjerps, M. J., McQueen, J. M., & Mitterer, H. (2012). Extrinsic normalization for vocal tracts depends on the signal, not on attention. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 394-397).

    Abstract

    When perceiving vowels, listeners adjust to speaker-specific vocal-tract characteristics (such as F1) through "extrinsic vowel normalization". This effect is observed as a shift in the location of categorization boundaries of vowel continua. Similar effects have been found with non-speech. Non-speech materials, however, have consistently led to smaller effect-sizes, perhaps because of a lack of attention to non-speech. The present study investigated this possibility. Non-speech materials that had previously been shown to elicit reduced normalization effects were tested again, with the addition of an attention manipulation. The results show that increased attention does not lead to increased normalization effects, suggesting that vowel normalization is mainly determined by bottom-up signal characteristics.

Share this page