Publications

Displaying 101 - 200 of 217
  • Herbst, L. E. (2006). The influence of language dominance on bilingual VOT: A case study. In Proceedings of the 4th University of Cambridge Postgraduate Conference on Language Research (CamLing 2006) (pp. 91-98). Cambridge: Cambridge University Press.

    Abstract

    Longitudinally collected VOT data from an early English-Italian bilingual who became increasingly English-dominant was analyzed. Stops in English were always produced with significantly longer VOT than in Italian. However, the speaker did not show any significant change in the VOT production in either language over time, despite the clear dominance of English in his every day language use later in his life. The results indicate that – unlike L2 learners – early bilinguals may remain unaffected by language use with respect to phonetic realization.
  • Hintz, F., & Scharenborg, O. (2016). Neighbourhood density influences word recognition in native and non-native speech recognition in noise. In H. Van den Heuvel, B. Cranen, & S. Mattys (Eds.), Proceedings of the Speech Processing in Realistic Environments (SPIRE) workshop (pp. 46-47). Groningen.
  • Hintz, F., & Scharenborg, O. (2016). The effect of background noise on the activation of phonological and semantic information during spoken-word recognition. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2816-2820).

    Abstract

    During spoken-word recognition, listeners experience phonological competition between multiple word candidates, which increases, relative to optimal listening conditions, when speech is masked by noise. Moreover, listeners activate semantic word knowledge during the word’s unfolding. Here, we replicated the effect of background noise on phonological competition and investigated to which extent noise affects the activation of semantic information in phonological competitors. Participants’ eye movements were recorded when they listened to sentences containing a target word and looked at three types of displays. The displays either contained a picture of the target word, or a picture of a phonological onset competitor, or a picture of a word semantically related to the onset competitor, each along with three unrelated distractors. The analyses revealed that, in noise, fixations to the target and to the phonological onset competitor were delayed and smaller in magnitude compared to the clean listening condition, most likely reflecting enhanced phonological competition. No evidence for the activation of semantic information in the phonological competitors was observed in noise and, surprisingly, also not in the clear. We discuss the implications of the lack of an effect and differences between the present and earlier studies.
  • Holler, J., & Stevens, R. (2006). How speakers represent size information in referential communication for knowing and unknowing recipients. In D. Schlangen, & R. Fernandez (Eds.), Brandial '06 Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, Germany, September 11-13.
  • Irivine, E., & Roberts, S. G. (2016). Deictic tools can limit the emergence of referential symbol systems. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/99.html.

    Abstract

    Previous experiments and models show that the pressure to communicate can lead to the emergence of symbols in specific tasks. The experiment presented here suggests that the ability to use deictic gestures can reduce the pressure for symbols to emerge in co-operative tasks. In the 'gesture-only' condition, pairs built a structure together in 'Minecraft', and could only communicate using a small range of gestures. In the 'gesture-plus' condition, pairs could also use sound to develop a symbol system if they wished. All pairs were taught a pointing convention. None of the pairs we tested developed a symbol system, and performance was no different across the two conditions. We therefore suggest that deictic gestures, and non-referential means of organising activity sequences, are often sufficient for communication. This suggests that the emergence of linguistic symbols in early hominids may have been late and patchy with symbols only emerging in contexts where they could significantly improve task success or efficiency. Given the communicative power of pointing however, these contexts may be fewer than usually supposed. An approach for identifying these situations is outlined.
  • Janse, E. (2009). Hearing and cognitive measures predict elderly listeners' difficulty ignoring competing speech. In M. Boone (Ed.), Proceedings of the International Conference on Acoustics (pp. 1532-1535).
  • Janssen, R., Winter, B., Dediu, D., Moisik, S. R., & Roberts, S. G. (2016). Nonlinear biases in articulation constrain the design space of language. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/86.html.

    Abstract

    In Iterated Learning (IL) experiments, a participant’s learned output serves as the next participant’s learning input (Kirby et al., 2014). IL can be used to model cultural transmission and has indicated that weak biases can be amplified through repeated cultural transmission (Kirby et al., 2007). So, for example, structural language properties can emerge over time because languages come to reflect the cognitive constraints in the individuals that learn and produce the language. Similarly, we propose that languages may also reflect certain anatomical biases. Do sound systems adapt to the affordances of the articulation space induced by the vocal tract?
    The human vocal tract has inherent nonlinearities which might derive from acoustics and aerodynamics (cf. quantal theory, see Stevens, 1989) or biomechanics (cf. Gick & Moisik, 2015). For instance, moving the tongue anteriorly along the hard palate to produce a fricative does not result in large changes in acoustics in most cases, but for a small range there is an abrupt change from a perceived palato-alveolar [ʃ] to alveolar [s] sound (Perkell, 2012). Nonlinearities such as these might bias all human speakers to converge on a very limited set of phonetic categories, and might even be a basis for combinatoriality or phonemic ‘universals’.
    While IL typically uses discrete symbols, Verhoef et al. (2014) have used slide whistles to produce a continuous signal. We conducted an IL experiment with human subjects who communicated using a digital slide whistle for which the degree of nonlinearity is controlled. A single parameter (α) changes the mapping from slide whistle position (the ‘articulator’) to the acoustics. With α=0, the position of the slide whistle maps Bark-linearly to the acoustics. As α approaches 1, the mapping gets more double-sigmoidal, creating three plateaus where large ranges of positions map to similar frequencies. In more abstract terms, α represents the strength of a nonlinear (anatomical) bias in the vocal tract.
    Six chains (138 participants) of dyads were tested, each chain with a different, fixed α. Participants had to communicate four meanings by producing a continuous signal using the slide-whistle in a ‘director-matcher’ game, alternating roles (cf. Garrod et al., 2007).
    Results show that for high αs, subjects quickly converged on the plateaus. This quick convergence is indicative of a strong bias, repelling subjects away from unstable regions already within-subject. Furthermore, high αs lead to the emergence of signals that oscillate between two (out of three) plateaus. Because the sigmoidal spaces are spatially constrained, participants increasingly used the sequential/temporal dimension. As a result of this, the average duration of signals with high α was ~100ms longer than with low α. These oscillations could be an expression of a basis for phonemic combinatoriality.
    We have shown that it is possible to manipulate the magnitude of an articulator-induced non-linear bias in a slide whistle IL framework. The results suggest that anatomical biases might indeed constrain the design space of language. In particular, the signaling systems in our study quickly converged (within-subject) on the use of stable regions. While these conclusions were drawn from experiments using slide whistles with a relatively strong bias, weaker biases could possibly be amplified over time by repeated cultural transmission, and likely lead to similar outcomes.
  • Janssen, R., Dediu, D., & Moisik, S. R. (2016). Simple agents are able to replicate speech sounds using 3d vocal tract model. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/97.html.

    Abstract

    Many factors have been proposed to explain why groups of people use different speech sounds in their language. These range from cultural, cognitive, environmental (e.g., Everett, et al., 2015) to anatomical (e.g., vocal tract (VT) morphology). How could such anatomical properties have led to the similarities and differences in speech sound distributions between human languages?

    It is known that hard palate profile variation can induce different articulatory strategies in speakers (e.g., Brunner et al., 2009). That is, different hard palate profiles might induce a kind of bias on speech sound production, easing some types of sounds while impeding others. With a population of speakers (with a proportion of individuals) that share certain anatomical properties, even subtle VT biases might become expressed at a population-level (through e.g., bias amplification, Kirby et al., 2007). However, before we look into population-level effects, we should first look at within-individual anatomical factors. For that, we have developed a computer-simulated analogue for a human speaker: an agent. Our agent is designed to replicate speech sounds using a production and cognition module in a computationally tractable manner.

    Previous agent models have often used more abstract (e.g., symbolic) signals. (e.g., Kirby et al., 2007). We have equipped our agent with a three-dimensional model of the VT (the production module, based on Birkholz, 2005) to which we made numerous adjustments. Specifically, we used a 4th-order Bezier curve that is able to capture hard palate variation on the mid-sagittal plane (XXX, 2015). Using an evolutionary algorithm, we were able to fit the model to human hard palate MRI tracings, yielding high accuracy fits and using as little as two parameters. Finally, we show that the samples map well-dispersed to the parameter-space, demonstrating that the model cannot generate unrealistic profiles. We can thus use this procedure to import palate measurements into our agent’s production module to investigate the effects on acoustics. We can also exaggerate/introduce novel biases.

    Our agent is able to control the VT model using the cognition module.

    Previous research has focused on detailed neurocomputation (e.g., Kröger et al., 2014) that highlights e.g., neurobiological principles or speech recognition performance. However, the brain is not the focus of our current study. Furthermore, present-day computing throughput likely does not allow for large-scale deployment of these architectures, as required by the population model we are developing. Thus, the question whether a very simple cognition module is able to replicate sounds in a computationally tractable manner, and even generalize over novel stimuli, is one worthy of attention in its own right.

    Our agent’s cognition module is based on running an evolutionary algorithm on a large population of feed-forward neural networks (NNs). As such, (anatomical) bias strength can be thought of as an attractor basin area within the parameter-space the agent has to explore. The NN we used consists of a triple-layered (fully-connected), directed graph. The input layer (three neurons) receives the formants frequencies of a target-sound. The output layer (12 neurons) projects to the articulators in the production module. A hidden layer (seven neurons) enables the network to deal with nonlinear dependencies. The Euclidean distance (first three formants) between target and replication is used as fitness measure. Results show that sound replication is indeed possible, with Euclidean distance quickly approaching a close-to-zero asymptote.

    Statistical analysis should reveal if the agent can also: a) Generalize: Can it replicate sounds not exposed to during learning? b) Replicate consistently: Do different, isolated agents always converge on the same sounds? c) Deal with consolidation: Can it still learn new sounds after an extended learning phase (‘infancy’) has been terminated? Finally, a comparison with more complex models will be used to demonstrate robustness.
  • Jeske, J., Kember, H., & Cutler, A. (2016). Native and non-native English speakers' use of prosody to predict sentence endings. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology (SST2016).
  • Jesse, A., & Janse, E. (2009). Visual speech information aids elderly adults in stream segregation. In B.-J. Theobald, & R. Harvey (Eds.), Proceedings of the International Conference on Auditory-Visual Speech Processing 2009 (pp. 22-27). Norwich, UK: School of Computing Sciences, University of East Anglia.

    Abstract

    Listening to a speaker while hearing another speaker talks is a challenging task for elderly listeners. We show that elderly listeners over the age of 65 with various degrees of age-related hearing loss benefit in this situation from also seeing the speaker they intend to listen to. In a phoneme monitoring task, listeners monitored the speech of a target speaker for either the phoneme /p/ or /k/ while simultaneously hearing a competing speaker. Critically, on some trials, the target speaker was also visible. Elderly listeners benefited in their response times and accuracy levels from seeing the target speaker when monitoring for the less visible /k/, but more so when monitoring for the highly visible /p/. Visual speech therefore aids elderly listeners not only by providing segmental information about the target phoneme, but also by providing more global information that allows for better performance in this adverse listening situation.
  • Joo, H., Jang, J., Kim, S., Cho, T., & Cutler, A. (2019). Prosodic structural effects on coarticulatory vowel nasalization in Australian English in comparison to American English. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 835-839). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study investigates effects of prosodic factors (prominence, boundary) on coarticulatory Vnasalization in Australian English (AusE) in CVN and NVC in comparison to those in American English
    (AmE). As in AmE, prominence was found to
    lengthen N, but to reduce V-nasalization, enhancing N’s nasality and V’s orality, respectively (paradigmatic contrast enhancement). But the prominence effect in CVN was more robust than that in AmE. Again similar to findings in AmE, boundary
    induced a reduction of N-duration and V-nasalization phrase-initially (syntagmatic contrast enhancement), and increased the nasality of both C and V phrasefinally.
    But AusE showed some differences in terms
    of the magnitude of V nasalization and N duration. The results suggest that the linguistic contrast enhancements underlie prosodic-structure modulation of coarticulatory V-nasalization in
    comparable ways across dialects, while the fine phonetic detail indicates that the phonetics-prosody interplay is internalized in the individual dialect’s phonetic grammar.
  • Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702-705).

    Abstract

    In Korean, focus is expressed in accentual phrasing. To ascertain whether words focused in this manner enjoy a processing advantage analogous to that conferred by focus as expressed in, e.g, English and Dutch, we devised sentences with target words in one of four conditions: prosodic focus, syntactic focus, prosodic + syntactic focus, and no focus as a control. 32 native speakers of Korean listened to blocks of 10 sentences, then were presented visually with words and asked whether or not they had heard them. Overall, words with focus were recognised significantly faster and more accurately than unfocused words. In addition, words with syntactic focus or syntactic + prosodic focus were recognised faster than words with prosodic focus alone. As for other languages, Korean focus confers processing advantage on the words carrying it. While prosodic focus does provide an advantage, however, syntactic focus appears to provide the greater beneficial effect for recognition memory
  • Kempen, G., & Maassen, B. (1977). The time course of conceptualizing and formulating processes during the production of simple sentences. In Proceedings of The Third Prague Conference on the Psychology of Human Learning and Development. Prague: Institute of Psychology.

    Abstract

    The psychological process of producing sentences includes conceptualization (selecting to-beexpressed conceptual content) and formulation (translating conceptual content into syntactic structures of a language). There is ample evidence, both intuitive and experimental, that the conceptualizing and formulating processes often proceed concurrently, not strictly serially. James Lindsley (Cognitive Psych.,1975, 7, 1-19; J.Psycholinguistic Res., 1976, 5, 331-354) has developed a concurrent model which proved succesful in an experimental situation where simple English Subject-Verb (SV) sentences such as “The boy is greeting”,”The girl is kicking” were produced as descriptions of pictures which showed actor and action. The measurements were reaction times defined as the interval between the moment a picture appeared on a screen and the onset of the vocal utterance by the speaker. Lindsley could show, among other things, that the formulation process for an SV sentence doesn’t start immediately after the actor of a picture (that is, the conceptual content underlying the surface Subject phrase) has been identified, but is somewhat delayed. The delay was needed, according to Lindsley, in order to prevent dysfluencies (hesitations) between surface Subject and verb. We replicated Lindsley’s data for Dutch. However, his model proved inadequate when we added Dutch Verb-Subject (VS) constructions which are obligatory in certain syntactic contexts but synonymous with their SV counterparts. A sentence production theory which is being developed by the first author is able to provide an accurate account of the data. The abovementioned delay is attributed to certain precautions the sentence generator has to take in case of SV but not of VS sentences. These precautions are related to the goal of attaining syntactic coherence of the utterance as a whole, not to the prevention of dysfluencies.
  • Kemps-Snijders, M., Ducret, J., Romary, L., & Wittenburg, P. (2006). An API for accessing the data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2299-2302).
  • Kemps-Snijders, M., Nederhof, M.-J., & Wittenburg, P. (2006). LEXUS, a web-based tool for manipulating lexical resources. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1862-1865).
  • Khetarpal, N., Majid, A., & Regier, T. (2009). Spatial terms reflect near-optimal spatial categories. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society (pp. 2396-2401). Austin, TX: Cognitive Science Society.

    Abstract

    Spatial terms in the world’s languages appear to reflect both universal conceptual tendencies and linguistic convention. A similarly mixed picture in the case of color naming has been accounted for in terms of near-optimal partitions of color space. Here, we demonstrate that this account generalizes to spatial terms. We show that the spatial terms of 9 diverse languages near-optimally partition a similarity space of spatial meanings, just as color terms near-optimally partition color space. This account accommodates both universal tendencies and cross-language differences in spatial category extension, and identifies general structuring principles that appear to operate across different semantic domains.
  • Klassmann, A., Offenga, F., Broeder, D., Skiba, R., & Wittenburg, P. (2006). Comparison of resource discovery methods. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 113-116).
  • Klein, W., & Dimroth, C. (Eds.). (2009). Worauf kann sich der Sprachunterricht stützen? [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 153.
  • Koenig, A., Ringersma, J., & Trilsbeek, P. (2009). The Language Archiving Technology domain. In Z. Vetulani (Ed.), Human Language Technologies as a Challenge for Computer Science and Linguistics (pp. 295-299).

    Abstract

    The Max Planck Institute for Psycholinguistics (MPI) manages an archive of linguistic research data with a current size of almost 20 Terabytes. Apart from in-house researchers other projects also store their data in the archive, most notably the Documentation of Endangered Languages (DoBeS) projects. The archive is available online and can be accessed by anybody with Internet access. To be able to manage this large amount of data the MPI's technical group has developed a software suite called Language Archiving Technology (LAT) that on the one hand helps researchers and archive managers to manage the data and on the other hand helps users in enriching their primary data with additional layers. All the MPI software is Java-based and developed according to open source principles (GNU, 2007). All three major operating systems (Windows, Linux, MacOS) are supported and the software works similarly on all of them. As the archive is online, many of the tools, especially the ones for accessing the data, are browser based. Some of these browser-based tools make use of Adobe Flex to create nice-looking GUIs. The LAT suite is a complete set of management and enrichment tools, and given the interaction between the tools the result is a complete LAT software domain. Over the last 10 years, this domain has proven its functionality and use, and is being deployed to servers in other institutions. This deployment is an important step in getting the archived resources back to the members of the speech communities whose languages are documented. In the paper we give an overview of the tools of the LAT suite and we describe their functionality and role in the integrated process of archiving, management and enrichment of linguistic data.
  • Kuzla, C., Mitterer, H., Ernestus, M., & Cutler, A. (2006). Perceptual compensation for voice assimilation of German fricatives. In P. Warren, & I. Watson (Eds.), Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 394-399).

    Abstract

    In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: Compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts.
  • Kuzla, C., Ernestus, M., & Mitterer, H. (2006). Prosodic structure affects the production and perception of voice-assimilated German fricatives. In R. Hoffmann, & H. Mixdorff (Eds.), Speech prosody 2006. Dresden: TUD Press.

    Abstract

    Prosodic structure has long been known to constrain phonological processes [1]. More recently, it has also been recognized as a source of fine-grained phonetic variation of speech sounds. In particular, segments in domain-initial position undergo prosodic strengthening [2, 3], which also implies more resistance to coarticulation in higher prosodic domains [5]. The present study investigates the combined effects of prosodic strengthening and assimilatory devoicing on word-initial fricatives in German, the functional implication of both processes for cues to the fortis-lenis contrast, and the influence of prosodic structure on listeners’ compensation for assimilation. Results indicate that 1. Prosodic structure modulates duration and the degree of assimilatory devoicing, 2. Phonological contrasts are maintained by speakers, but differ in phonetic detail across prosodic domains, and 3. Compensation for assimilation in perception is moderated by prosodic structure and lexical constraints.
  • Kuzla, C., Mitterer, H., & Ernestus, M. (2006). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 43-44).
  • Lausberg, H., & Sloetjes, H. (2009). NGCS/ELAN - Coding movement behaviour in psychotherapy [Meeting abstract]. PPmP - Psychotherapie · Psychosomatik · Medizinische Psychologie, 59: A113, 103.

    Abstract

    Individual and interactive movement behaviour (non-verbal behaviour / communication) specifically reflects implicit processes in psychotherapy [1,4,11]. However, thus far, the registration of movement behaviour has been a methodological challenge. We will present a coding system combined with an annotation tool for the analysis of movement behaviour during psychotherapy interviews [9]. The NGCS coding system enables to classify body movements based on their kinetic features alone [5,7]. The theoretical assumption behind the NGCS is that its main kinetic and functional movement categories are differentially associated with specific psychological functions and thus, have different neurobiological correlates [5-8]. ELAN is a multimodal annotation tool for digital video media [2,3,12]. The NGCS / ELAN template enables to link any movie to the same coding system and to have different raters independently work on the same file. The potential of movement behaviour analysis as an objective tool for psychotherapy research and for supervision in the psychosomatic practice is discussed by giving examples of the NGCS/ELAN analyses of psychotherapy sessions. While the quality of kinetic turn-taking and the therapistrsquor;s (implicit) adoption of the patientrsquor;s movements may predict therapy outcome, changes in the patientrsquor;s movement behaviour pattern may indicate changes in cognitive concepts and emotional states and thus, may help to identify therapeutically relevant processes [10].
  • Lenkiewicz, P., Pereira, M., Freire, M. M., & Fernandes, J. (2009). A new 3D image segmentation method for parallel architectures. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo [ICME 2009] June 28 – July 3, 2009, New York (pp. 1813-1816).

    Abstract

    This paper presents a novel model for 3D image segmentation and reconstruction. It has been designed with the aim to be implemented over a computer cluster or a multi-core platform. The required features include a nearly absolute independence between the processes participating in the segmentation task and providing amount of work as equal as possible for all the participants. As a result, it is avoid many drawbacks often encountered when performing a parallelization of an algorithm that was constructed to operate in a sequential manner. Furthermore, the proposed algorithm based on the new segmentation model is efficient and shows a very good, nearly linear performance growth along with the growing number of processing units.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2009). The dynamic topology changes model for unsupervised image segmentation. In Proceedings of the 11th IEEE International Workshop on Multimedia Signal Processing (MMSP'09) (pp. 1-5).

    Abstract

    Deformable models are a popular family of image segmentation techniques, which has been gaining significant focus in the last two decades, serving both for real-world applications as well as the base for research work. One of the features that the deformable models offer and that is considered a much desired one, is the ability to change their topology during the segmentation process. Using this characteristic it is possible to perform segmentation of objects with discontinuities in their bodies or to detect an undefined number of objects in the scene. In this paper we present our model for handling the topology changes in image segmentation methods based on the Active Volumes solution. The said model is capable of performing the changes in the structure of objects while the segmentation progresses, what makes it efficient and suitable for implementations over powerful execution environment, like multi-core architectures or computer clusters.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2009). The whole mesh Deformation Model for 2D and 3D image segmentation. In Proceedings of the 2009 IEEE International Conference on Image Processing (ICIP 2009) (pp. 4045-4048).

    Abstract

    In this paper we present a novel approach for image segmentation using Active Nets and Active Volumes. Those solutions are based on the Deformable Models, with slight difference in the method for describing the shapes of interests - instead of using a contour or a surface they represented the segmented objects with a mesh structure, which allows to describe not only the surface of the objects but also to model their interiors. This is obtained by dividing the nodes of the mesh in two categories, namely internal and external ones, which will be responsible for two different tasks. In our new approach we propose to negate this separation and use only one type of nodes. Using that assumption we manage to significantly shorten the time of segmentation while maintaining its quality.
  • Levelt, W. J. M. (1974). Taalpsychologie: Van taalkunde naar psychologie. In Herstal-Conferentie.
  • Levinson, S. C. (2006). On the human "interaction engine". In N. J. Enfield, & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 39-69). Oxford: Berg.
  • Little, H., Eryılmaz, K., & De Boer, B. (2016). Emergence of signal structure: Effects of duration constraints. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/25.html.

    Abstract

    Recent work has investigated the emergence of structure in speech using experiments which use artificial continuous signals. Some experiments have had no limit on the duration which signals can have (e.g. Verhoef et al., 2014), and others have had time limitations (e.g. Verhoef et al., 2015). However, the effect of time constraints on the structure in signals has never been experimentally investigated.
  • Little, H., & de Boer, B. (2016). Did the pressure for discrimination trigger the emergence of combinatorial structure? In Proceedings of the 2nd Conference of the International Association for Cognitive Semiotics (pp. 109-110).
  • Little, H., Eryılmaz, K., & De Boer, B. (2016). Differing signal-meaning dimensionalities facilitates the emergence of structure. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/25.html.

    Abstract

    Structure of language is not only caused by cognitive processes, but also by physical aspects of the signalling modality. We test the assumptions surrounding the role which the physical aspects of the signal space will have on the emergence of structure in speech. Here, we use a signal creation task to test whether a signal space and a meaning space having similar dimensionalities will generate an iconic system with signal-meaning mapping and whether, when the topologies differ, the emergence of non-iconic structure is facilitated. In our experiments, signals are created using infrared sensors which use hand position to create audio signals. We find that people take advantage of signal-meaning mappings where possible. Further, we use trajectory probabilities and measures of variance to show that when there is a dimensionality mismatch, more structural strategies are used.
  • Little, H. (2016). Nahran Bhannamz: Language Evolution in an Online Zombie Apocalypse Game. In Createvolang: creativity and innovation in language evolution.
  • Liu, S., & Zhang, Y. (2019). Why some verbs are harder to learn than others – A micro-level analysis of everyday learning contexts for early verb learning. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2173-2178). Montreal, QB: Cognitive Science Society.

    Abstract

    Verb learning is important for young children. While most
    previous research has focused on linguistic and conceptual
    challenges in early verb learning (e.g. Gentner, 1982, 2006),
    the present paper examined early verb learning at the
    attentional level and quantified the input for early verb learning
    by measuring verb-action co-occurrence statistics in parent-
    child interaction from the learner’s perspective. To do so, we
    used head-mounted eye tracking to record fine-grained
    multimodal behaviors during parent-infant joint play, and
    analyzed parent speech, parent and infant action, and infant
    attention at the moments when parents produced verb labels.
    Our results show great variability across different action verbs,
    in terms of frequency of verb utterances, frequency of
    corresponding actions related to verb meanings, and infants’
    attention to verbs and actions, which provide new insights on
    why some verbs are harder to learn than others.
  • Lockwood, G., Hagoort, P., & Dingemanse, M. (2016). Synthesized Size-Sound Sound Symbolism. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1823-1828). Austin, TX: Cognitive Science Society.

    Abstract

    Studies of sound symbolism have shown that people can associate sound and meaning in consistent ways when presented with maximally contrastive stimulus pairs of nonwords such as bouba/kiki (rounded/sharp) or mil/mal (small/big). Recent work has shown the effect extends to antonymic words from natural languages and has proposed a role for shared cross-modal correspondences in biasing form-to-meaning associations. An important open question is how the associations work, and particularly what the role is of sound-symbolic matches versus mismatches. We report on a learning task designed to distinguish between three existing theories by using a spectrum of sound-symbolically matching, mismatching, and neutral (neither matching nor mismatching) stimuli. Synthesized stimuli allow us to control for prosody, and the inclusion of a neutral condition allows a direct test of competing accounts. We find evidence for a sound-symbolic match boost, but not for a mismatch difficulty compared to the neutral condition.
  • Macuch Silva, V., & Roberts, S. G. (2016). Language adapts to signal disruption in interaction. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/20.html.

    Abstract

    Linguistic traits are often seen as reflecting cognitive biases and constraints (e.g. Christiansen & Chater, 2008). However, language must also adapt to properties of the channel through which communication between individuals occurs. Perhaps the most basic aspect of any communication channel is noise. Communicative signals can be blocked, degraded or distorted by other sources in the environment. This poses a fundamental problem for communication. On average, channel disruption accompanies problems in conversation every 3 minutes (27% of cases of other-initiated repair, Dingemanse et al., 2015). Linguistic signals must adapt to this harsh environment. While modern language structures are robust to noise (e.g. Piantadosi et al., 2011), we investigate how noise might have shaped the early emergence of structure in language. The obvious adaptation to noise is redundancy. Signals which are maximally different from competitors are harder to render ambiguous by noise. Redundancy can be increased by adding differentiating segments to each signal (increasing the diversity of segments). However, this makes each signal more complex and harder to learn. Under this strategy, holistic languages may emerge. Another strategy is reduplication - repeating parts of the signal so that noise is less likely to disrupt all of the crucial information. This strategy does not increase the difficulty of learning the language - there is only one extra rule which applies to all signals. Therefore, under pressures for learnability, expressivity and redundancy, reduplicated signals are expected to emerge. However, reduplication is not a pervasive feature of words (though it does occur in limited domains like plurals or iconic meanings). We suggest that this is due to the pressure for redundancy being lifted by conversational infrastructure for repair. Receivers can request that senders repeat signals only after a problem occurs. That is, robustness is achieved by repeating the signal across conversational turns (when needed) instead of within single utterances. As a proof of concept, we ran two iterated learning chains with pairs of individuals in generations learning and using an artificial language (e.g. Kirby et al., 2015). The meaning space was a structured collection of unfamiliar images (3 shapes x 2 textures x 2 outline types). The initial language for each chain was the same written, unstructured, fully expressive language. Signals produced in each generation formed the training language for the next generation. Within each generation, pairs played an interactive communication game. The director was given a target meaning to describe, and typed a word for the matcher, who guessed the target meaning from a set. With a 50% probability, a contiguous section of 3-5 characters in the typed word was replaced by ‘noise’ characters (#). In one chain, the matcher could initiate repair by requesting that the director type and send another signal. Parallel generations across chains were matched for the number of signals sent (if repair was initiated for a meaning, then it was presented twice in the parallel generation where repair was not possible) and noise (a signal for a given meaning which was affected by noise in one generation was affected by the same amount of noise in the parallel generation). For the final set of signals produced in each generation we measured the signal redundancy (the zip compressibility of the signals), the character diversity (entropy of the characters of the signals) and systematic structure (z-score of the correlation between signal edit distance and meaning hamming distance). In the condition without repair, redundancy increased with each generation (r=0.97, p=0.01), and the character diversity decreased (r=-0.99,p=0.001) which is consistent with reduplication, as shown below (part of the initial and the final language): Linear regressions revealed that generations with repair had higher overall systematic structure (main effect of condition, t = 2.5, p < 0.05), increasing character diversity (interaction between condition and generation, t = 3.9, p = 0.01) and redundancy increased at a slower rate (interaction between condition and generation, t = -2.5, p < 0.05). That is, the ability to repair counteracts the pressure from noise, and facilitates the emergence of compositional structure. Therefore, just as systems to repair damage to DNA replication are vital for the evolution of biological species (O’Brien, 2006), conversational repair may regulate replication of linguistic forms in the cultural evolution of language. Future studies should further investigate how evolving linguistic structure is shaped by interaction pressures, drawing on experimental methods and naturalistic studies of emerging languages, both spoken (e.g Botha, 2006; Roberge, 2008) and signed (e.g Senghas, Kita, & Ozyurek, 2004; Sandler et al., 2005).
  • Mai, F., Galke, L., & Scherp, A. (2019). CBOW is not all you need: Combining CBOW with the compositional matrix space model. In Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). OpenReview.net.

    Abstract

    Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW's word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a
    learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2%.
  • Majid, A., Enfield, N. J., & Van Staden, M. (Eds.). (2006). Parts of the body: Cross-linguistic categorisation [Special Issue]. Language Sciences, 28(2-3).
  • Malaisé, V., Aroyo, L., Brugman, H., Gazendam, L., De Jong, A., Negru, C., & Schreiber, G. (2006). Evaluating a thesaurus browser for an audio-visual archive. In S. Staab, & V. Svatek (Eds.), Managing knowledge in a world of networks (pp. 272-286). Berlin: Springer.
  • Mamus, E., Rissman, L., Majid, A., & Ozyurek, A. (2019). Effects of blindfolding on verbal and gestural expression of path in auditory motion events. In A. K. Goel, C. M. Seifert, & C. C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2275-2281). Montreal, QB: Cognitive Science Society.

    Abstract

    Studies have claimed that blind people’s spatial representations are different from sighted people, and blind people display superior auditory processing. Due to the nature of auditory and haptic information, it has been proposed that blind people have spatial representations that are more sequential than sighted people. Even the temporary loss of sight—such as through blindfolding—can affect spatial representations, but not much research has been done on this topic. We compared blindfolded and sighted people’s linguistic spatial expressions and non-linguistic localization accuracy to test how blindfolding affects the representation of path in auditory motion events. We found that blindfolded people were as good as sighted people when localizing simple sounds, but they outperformed sighted people when localizing auditory motion events. Blindfolded people’s path related speech also included more sequential, and less holistic elements. Our results indicate that even temporary loss of sight influences spatial representations of auditory motion events
  • Marcoux, K., & Ernestus, M. (2019). Differences between native and non-native Lombard speech in terms of pitch range. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the ICA 2019 and EAA Euroregio. 23rd International Congress on Acoustics, integrating 4th EAA Euroregio 2019 (pp. 5713-5720). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    Lombard speech, speech produced in noise, is acoustically different from speech produced in quiet (plain speech) in several ways, including having a higher and wider F0 range (pitch). Extensive research on native Lombard speech does not consider that non-natives experience a higher cognitive load while producing
    speech and that the native language may influence the non-native speech. We investigated pitch range in plain and Lombard speech in native and non-natives.
    Dutch and American-English speakers read contrastive question-answer pairs in quiet and in noise in English, while the Dutch also read Dutch sentence pairs. We found that Lombard speech is characterized by a wider pitch range than plain speech, for all speakers (native English, non-native English, and native Dutch).
    This shows that non-natives also widen their pitch range in Lombard speech. In sentences with early-focus, we see the same increase in pitch range when going from plain to Lombard speech in native and non-native English, but a smaller increase in native Dutch. In sentences with late-focus, we see the biggest increase for the native English, followed by non-native English and then native Dutch. Together these results indicate an effect of the native language on non-native Lombard speech.
  • Marcoux, K., & Ernestus, M. (2019). Pitch in native and non-native Lombard speech. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2605-2609). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Lombard speech, speech produced in noise, is
    typically produced with a higher fundamental
    frequency (F0, pitch) compared to speech in quiet. This paper examined the potential differences in native and non-native Lombard speech by analyzing median pitch in sentences with early- or late-focus produced in quiet and noise. We found an increase in pitch in late-focus sentences in noise for Dutch speakers in both English and Dutch, and for American-English speakers in English. These results
    show that non-native speakers produce Lombard speech, despite their higher cognitive load. For the early-focus sentences, we found a difference between the Dutch and the American-English speakers. Whereas the Dutch showed an increased F0 in noise
    in English and Dutch, the American-English speakers did not in English. Together, these results suggest that some acoustic characteristics of Lombard speech, such as pitch, may be language-specific, potentially
    resulting in the native language influencing the non-native Lombard speech.
  • McDonough, J., Lehnert-LeHouillier, H., & Bardhan, N. P. (2009). The perception of nasalized vowels in American English: An investigation of on-line use of vowel nasalization in lexical access. In Nasal 2009.

    Abstract

    The goal of the presented study was to investigate the use of coarticulatory vowel nasalization in lexical access by native speakers of American English. In particular, we compare the use of coart culatory place of articulation cues to that of coarticulatory vowel nasalization. Previous research on lexical access has shown that listeners use cues to the place of articulation of a postvocalic stop in the preceding vowel. However, vowel nasalization as cue to an upcoming nasal consonant has been argued to be a more complex phenomenon. In order to establish whether coarticulatory vowel nasalization aides in the process of lexical access in the same way as place of articulation cues do, we conducted two perception experiments: an off-line 2AFC discrimination task and an on-line eyetracking study using the visual world paradigm. The results of our study suggest that listeners are indeed able to use vowel nasalization in similar ways to place of articulation information, and that both types of cues aide in lexical access.
  • Melinger, A., Schulte im Walde, S., & Weber, A. (2006). Characterizing response types and revealing noun ambiguity in German association norms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento: Association for Computational Linguistics.

    Abstract

    This paper presents an analysis of semantic association norms for German nouns. In contrast to prior studies, we not only collected associations elicited by written representations of target objects but also by their pictorial representations. In a first analysis, we identified systematic differences in the type and distribution of associate responses for the two presentation forms. In a second analysis, we applied a soft cluster analysis to the collected target-response pairs. We subsequently used the clustering to predict noun ambiguity and to discriminate senses in our target nouns.
  • Merkx, D., Frank, S., & Ernestus, M. (2019). Language learning using speech to image retrieval. In Proceedings of Interspeech 2019 (pp. 1841-1845). doi:10.21437/Interspeech.2019-3067.

    Abstract

    Humans learn language by interaction with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling and vectorial self-attention our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
  • Meyer, A. S., & Huettig, F. (Eds.). (2016). Speaking and Listening: Relationships Between Language Production and Comprehension [Special Issue]. Journal of Memory and Language, 89.
  • Meyer, A. S., & Wheeldon, L. (Eds.). (2006). Language production across the life span [Special Issue]. Language and Cognitive Processes, 21(1-3).
  • Micklos, A. (2016). Interaction for facilitating conventionalization: Negotiating the silent gesture communication of noun-verb pairs. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). Retrieved from http://evolang.org/neworleans/papers/143.html.

    Abstract

    This study demonstrates how interaction – specifically negotiation and repair – facilitates the emergence, evolution, and conventionalization of a silent gesture communication system. In a modified iterated learning paradigm, partners communicated noun-verb meanings using only silent gesture. The need to disambiguate similar noun-verb pairs drove these "new" language users to develop a morphology that allowed for quicker processing, easier transmission, and improved accuracy. The specific morphological system that emerged came about through a process of negotiation within the dyad, namely by means of repair. By applying a discourse analytic approach to the use of repair in an experimental methodology for language evolution, we are able to determine not only if interaction facilitates the emergence and learnability of a new communication system, but also how interaction affects such a system
  • Moisik, S. R., Zhi Yun, D. P., & Dediu, D. (2019). Active adjustment of the cervical spine during pitch production compensates for shape: The ArtiVarK study. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 864-868). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    The anterior lordosis of the cervical spine is thought
    to contribute to pitch (fo) production by influencing
    cricoid rotation as a function of larynx height. This
    study examines the matter of inter-individual
    variation in cervical spine shape and whether this has
    an influence on how fo is produced along increasing
    or decreasing scales, using the ArtiVarK dataset,
    which contains real-time MRI pitch production data.
    We find that the cervical spine actively participates in
    fo production, but the amount of displacement
    depends on individual shape. In general, anterior
    spine motion (tending toward cervical lordosis)
    occurs for low fo, while posterior movement (tending
    towards cervical kyphosis) occurs for high fo.
  • Mulder, K., Ten Bosch, L., & Boves, L. (2016). Comparing different methods for analyzing ERP signals. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 1373-1377). doi:10.21437/Interspeech.2016-967.
  • Musgrave, S., & Cutfield, S. (2009). Language documentation and an Australian National Corpus. In M. Haugh, K. Burridge, J. Mulder, & P. Peters (Eds.), Selected proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages (pp. 10-18). Somerville: Cascadilla Proceedings Project.

    Abstract

    Corpus linguistics and language documentation are usually considered separate subdisciplines within linguistics, having developed from different traditions and often operating on different scales, but the authors will suggest that there are commonalities to the two: both aim to represent language use in a community, and both are concerned with managing digital data. The authors propose that the development of the Australian National Corpus (AusNC) be guided by the experience of language documentation in the management of multimodal digital data and its annotation, and in ethical issues pertaining to making the data accessible. This would allow an AusNC that is distributed, multimodal, and multilingual, with holdings of text, audio, and video data distributed across multiple institutions; and including Indigenous, sign, and migrant community languages. An audit of language material held by Australian institutions and individuals is necessary to gauge the diversity and volume of possible content, and to inform common technical standards.
  • Nijland, L., & Janse, E. (Eds.). (2009). Auditory processing in speakers with acquired or developmental language disorders [Special Issue]. Clinical Linguistics and Phonetics, 23(3).
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2019). ERP signal analysis with temporal resolution using a time window bank. In Proceedings of Interspeech 2019 (pp. 1208-1212). doi:10.21437/Interspeech.2019-2729.

    Abstract

    In order to study the cognitive processes underlying speech comprehension, neuro-physiological measures (e.g., EEG and MEG), or behavioural measures (e.g., reaction times and response accuracy) can be applied. Compared to behavioural measures, EEG signals can provide a more fine-grained and complementary view of the processes that take place during the unfolding of an auditory stimulus.

    EEG signals are often analysed after having chosen specific time windows, which are usually based on the temporal structure of ERP components expected to be sensitive to the experimental manipulation. However, as the timing of ERP components may vary between experiments, trials, and participants, such a-priori defined analysis time windows may significantly hamper the exploratory power of the analysis of components of interest. In this paper, we explore a wide-window analysis method applied to EEG signals collected in an auditory repetition priming experiment.

    This approach is based on a bank of temporal filters arranged along the time axis in combination with linear mixed effects modelling. Crucially, it permits a temporal decomposition of effects in a single comprehensive statistical model which captures the entire EEG trace.
  • Offenga, F., Broeder, D., Wittenburg, P., Ducret, J., & Romary, L. (2006). Metadata profile in the ISO data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1866-1869).
  • Ortega, G., & Ozyurek, A. (2016). Generalisable patterns of gesture distinguish semantic categories in communication without language. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1182-1187). Austin, TX: Cognitive Science Society.

    Abstract

    There is a long-standing assumption that gestural forms are geared by a set of modes of representation (acting, representing, drawing, moulding) with each technique expressing speakers’ focus of attention on specific aspects of referents (Müller, 2013). Beyond different taxonomies describing the modes of representation, it remains unclear what factors motivate certain depicting techniques over others. Results from a pantomime generation task show that pantomimes are not entirely idiosyncratic but rather follow generalisable patterns constrained by their semantic category. We show that a) specific modes of representations are preferred for certain objects (acting for manipulable objects and drawing for non-manipulable objects); and b) that use and ordering of deictics and modes of representation operate in tandem to distinguish between semantically related concepts (e.g., “to drink” vs “mug”). This study provides yet more evidence that our ability to communicate through silent gesture reveals systematic ways to describe events and objects around us
  • Pacheco, A., Araújo, S., Faísca, L., Petersson, K. M., & Reis, A. (2009). Profiling dislexic children: Phonology and visual naming skills. In Abstracts presented at the International Neuropsychological Society, Finnish Neuropsychological Society, Joint Mid-Year Meeting July 29-August 1, 2009. Helsinki, Finland & Tallinn, Estonia (pp. 40). Retrieved from http://www.neuropsykologia.fi/ins2009/INS_MY09_Abstract.pdf.
  • Papafragou, A., & Ozturk, O. (2006). The acquisition of epistemic modality. In A. Botinis (Ed.), Proceedings of ITRW on Experimental Linguistics in ExLing-2006 (pp. 201-204). ISCA Archive.

    Abstract

    In this paper we try to contribute to the body of knowledge about the acquisition of English epistemic modal verbs (e.g. Mary may/has to be at school). Semantically, these verbs encode possibility or necessity with respect to available evidence. Pragmatically, the use of epistemic modals often gives rise to scalar conversational inferences (Mary may be at school -> Mary doesn’t have to be at school). The acquisition of epistemic modals is challenging for children on both these levels. In this paper, we present findings from two studies which were conducted with 5-year-old children and adults. Our findings, unlike previous work, show that 5-yr-olds have mastered epistemic modal semantics, including the notions of necessity and possibility. However, they are still in the process of acquiring epistemic modal pragmatics.
  • Parhammer*, S. I., Ebersberg*, M., Tippmann*, J., Stärk*, K., Opitz, A., Hinger, B., & Rossi, S. (2019). The influence of distraction on speech processing: How selective is selective attention? In Proceedings of Interspeech 2019 (pp. 3093-3097). doi:10.21437/Interspeech.2019-2699.

    Abstract

    -* indicates shared first authorship -
    The present study investigated the effects of selective attention on the processing of morphosyntactic errors in unattended parts of speech. Two groups of German native (L1) speakers participated in the present study. Participants listened to sentences in which irregular verbs were manipulated in three different conditions (correct, incorrect but attested ablaut pattern, incorrect and crosslinguistically unattested ablaut pattern). In order to track fast dynamic neural reactions to the stimuli, electroencephalography was used. After each sentence, participants in Experiment 1 performed a semantic judgement task, which deliberately distracted the participants from the syntactic manipulations and directed their attention to the semantic content of the sentence. In Experiment 2, participants carried out a syntactic judgement task, which put their attention on the critical stimuli. The use of two different attentional tasks allowed for investigating the impact of selective attention on speech processing and whether morphosyntactic processing steps are performed automatically. In Experiment 2, the incorrect attested condition elicited a larger N400 component compared to the correct condition, whereas in Experiment 1 no differences between conditions were found. These results suggest that the processing of morphosyntactic violations in irregular verbs is not entirely automatic but seems to be strongly affected by selective attention.
  • Peeters, D. (2016). Processing consequences of onomatopoeic iconicity in spoken language comprehension. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1632-1647). Austin, TX: Cognitive Science Society.

    Abstract

    Iconicity is a fundamental feature of human language. However its processing consequences at the behavioral and neural level in spoken word comprehension are not well understood. The current paper presents the behavioral and electrophysiological outcome of an auditory lexical decision task in which native speakers of Dutch listened to onomatopoeic words and matched control words while their electroencephalogram was recorded. Behaviorally, onomatopoeic words were processed as quickly and accurately as words with an arbitrary mapping between form and meaning. Event-related potentials time-locked to word onset revealed a significant decrease in negative amplitude in the N2 and N400 components and a late positivity for onomatopoeic words in comparison to the control words. These findings advance our understanding of the temporal dynamics of iconic form-meaning mapping in spoken word comprehension and suggest interplay between the neural representations of real-world sounds and spoken words.
  • Pereiro Estevan, Y., Wan, V., Scharenborg, O., & Gallardo Antolín, A. (2006). Segmentación de fonemas no supervisada basada en métodos kernel de máximo margen. In Proceedings of IV Jornadas en Tecnología del Habla.

    Abstract

    En este artículo se desarrolla un método automático de segmentación de fonemas no supervisado. Este método utiliza el algoritmo de agrupación de máximo margen [1] para realizar segmentación de fonemas sobre habla continua sin necesidad de información a priori para el entrenamiento del sistema.
  • Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2006). The role of morphology in fine phonetic detail: The case of Dutch -igheid. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 53-54).
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2006). Effects of word frequency on the acoustic durations of affixes. In Proceedings of Interspeech 2006 (pp. 953-956). Pittsburgh: ICSLP.

    Abstract

    This study investigates whether the acoustic durations of derivational affixes in Dutch are affected by the frequency of the word they occur in. In a word naming experiment, subjects were presented with a large number of words containing one of the affixes ge-, ver-, ont, or -lijk. Their responses were recorded on DAT tapes, and the durations of the affixes were measured using Automatic Speech Recognition technology. To investigate whether frequency also affected durations when speech rate was high, the presentation rate of the stimuli was varied. The results show that a higher frequency of the word as a whole led to shorter acoustic realizations for all affixes. Furthermore, affixes became shorter as the presentation rate of the stimuli increased. There was no interaction between word frequency and presentation rate, suggesting that the frequency effect also applies in situations in which the speed of articulation is very high.
  • Poletiek, F. H., & Chater, N. (2006). Grammar induction profits from representative stimulus sampling. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 1968-1973). Austin, TX, USA: Cognitive Science Society.
  • Pouw, W., Paxton, A., Harrison, S. J., & Dixon, J. A. (2019). Acoustic specification of upper limb movement in voicing. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 68-74). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.
  • Pouw, W., & Dixon, J. A. (2019). Quantifying gesture-speech synchrony. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 75-80). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.

    Abstract

    Spontaneously occurring speech is often seamlessly accompanied by hand gestures. Detailed
    observations of video data suggest that speech and gesture are tightly synchronized in time,
    consistent with a dynamic interplay between body and mind. However, spontaneous gesturespeech
    synchrony has rarely been objectively quantified beyond analyses of video data, which
    do not allow for identification of kinematic properties of gestures. Consequently, the point in
    gesture which is held to couple with speech, the so-called moment of “maximum effort”, has
    been variably equated with the peak velocity, peak acceleration, peak deceleration, or the onset
    of the gesture. In the current exploratory report, we provide novel evidence from motiontracking
    and acoustic data that peak velocity is closely aligned, and shortly leads, the peak pitch
    (F0) of speech

    Additional information

    https://osf.io/9843h/
  • Raviv, L., & Arnon, I. (2016). The developmental trajectory of children's statistical learning abilities. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016). Austin, TX: Cognitive Science Society (pp. 1469-1474). Austin, TX: Cognitive Science Society.

    Abstract

    Infants, children and adults are capable of implicitly extracting regularities from their environment through statistical learning (SL). SL is present from early infancy and found across tasks and modalities, raising questions about the domain generality of SL. However, little is known about its’ developmental trajectory: Is SL fully developed capacity in infancy, or does it improve with age, like other cognitive skills? While SL is well established in infants and adults, only few studies have looked at SL across development with conflicting results: some find age-related improvements while others do not. Importantly, despite its postulated role in language learning, no study has examined the developmental trajectory of auditory SL throughout childhood. Here, we conduct a large-scale study of children's auditory SL across a wide age-range (5-12y, N=115). Results show that auditory SL does not change much across development. We discuss implications for modality-based differences in SL and for its role in language acquisition.
  • Raviv, L., & Arnon, I. (2016). Language evolution in the lab: The case of child learners. In A. Papagrafou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016). Austin, TX: Cognitive Science Society (pp. 1643-1648). Austin, TX: Cognitive Science Society.

    Abstract

    Recent work suggests that cultural transmission can lead to the emergence of linguistic structure as speakers’ weak individual biases become amplified through iterated learning. However, to date, no published study has demonstrated a similar emergence of linguistic structure in children. This gap is problematic given that languages are mainly learned by children and that adults may bring existing linguistic biases to the task. Here, we conduct a large-scale study of iterated language learning in both children and adults, using a novel, child-friendly paradigm. The results show that while children make more mistakes overall, their languages become more learnable and show learnability biases similar to those of adults. Child languages did not show a significant increase in linguistic structure over time, but consistent mappings between meanings and signals did emerge on many occasions, as found with adults. This provides the first demonstration that cultural transmission affects the languages children and adults produce similarly.
  • Ringersma, J., Zinn, C., & Kemps-Snijders, M. (2009). LEXUS & ViCoS From lexical to conceptual spaces. In 1st International Conference on Language Documentation and Conservation (ICLDC).

    Abstract

    LEXUS and ViCoS: from lexicon to conceptual spaces LEXUS is a web-based lexicon tool and the knowledge space software ViCoS is an extension of LEXUS, allowing users to create relations between objects in and across lexica. LEXUS and ViCoS are part of the Language Archiving Technology software, developed at the MPI for Psycholinguistics to archive and enrich linguistic resources collected in the framework of language documentation projects. LEXUS is of primary interest for language documentation, offering the possibility to not just create a digital dictionary, but additionally it allows the creation of multi-media encyclopedic lexica. ViCoS provides an interface between the lexical space and the ontological space. Its approach permits users to model a world of concepts and their interrelations based on categorization patterns made by the speech community. We describe the LEXUS and ViCoS functionalities using three cases from DoBeS language documentation projects: (1) Marquesan The Marquesan lexicon was initially created in Toolbox and imported into LEXUS using the Toolbox import functionality. The lexicon is enriched with multi-media to illustrate the meaning of the words in its cultural environment. Members of the speech community consider words as keys to access and describe relevant parts of their life and traditions. Their understanding of words is best described by the various associations they evoke rather than in terms of any formal theory of meaning. Using ViCoS a knowledge space of related concepts is being created. (2) Kola-Sámi Two lexica are being created in LEXUS: RuSaDic lexicon is a Russian-Kildin wordlist in which the entries are of relative limited structure and content. SaRuDiC is a more complex structured lexicon with much richer content, including multi-media fragments and derivations. Using ViCoS we have created a connection between the two lexica, so that speakers who are familiair with Russian and wish to revitalize their Kildin can enter the lexicon through the RuSaDic and from there approach the informative SaRuDic. Similary we will create relations from the two lexica to external open databases, like e.g. Álgu. (3) Beaver A speaker database including kinship relations has been created and the database has been imported into LEXUS. In the LEXUS views the relations for individual speakers are being displayed. Using ViCoS the relational information from the database will be extracted to form a kisnhip relation space with specific relation types, like e.g 'mother-of'. The whole set of relations from the database can be displayed in one ViCoS relation window, and zoom functionality is available.
  • Rissman, L., & Majid, A. (2019). Agency drives category structure in instrumental events. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2661-2667). Montreal, QB: Cognitive Science Society.

    Abstract

    Thematic roles such as Agent and Instrument have a long-standing place in theories of event representation. Nonetheless, the structure of these categories has been difficult to determine. We investigated how instrumental events, such as someone slicing bread with a knife, are categorized in English. Speakers described a variety of typical and atypical instrumental events, and we determined the similarity structure of their descriptions using correspondence analysis. We found that events where the instrument is an extension of an intentional agent were most likely to elicit similar language, highlighting the importance of agency in structuring instrumental categories.
  • Rodd, J., & Chen, A. (2016). Pitch accents show a perceptual magnet effect: Evidence of internal structure in intonation categories. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 697-701).

    Abstract

    The question of whether intonation events have a categorical mental representation has long been a puzzle in prosodic research, and one that experiments testing production and perception across category boundaries have failed to definitively resolve. This paper takes the alternative approach of looking for evidence of structure within a postulated category by testing for a Perceptual Magnet Effect (PME). PME has been found in boundary tones but has not previously been conclusively found in pitch accents. In this investigation, perceived goodness and discriminability of re-synthesised Dutch nuclear rise contours (L*H H%) were evaluated by naive native speakers of Dutch. The variation between these stimuli was quantified using a polynomial-parametric modelling approach (i.e. the SOCoPaSul model) in place of the traditional approach whereby excursion size, peak alignment and pitch register are used independently of each other to quantify variation between pitch accents. Using this approach to calculate the acoustic-perceptual distance between different stimuli, PME was detected: (1) rated goodness, decreased as acoustic-perceptual distance relative to the prototype increased, and (2) equally spaced items far from the prototype were less frequently generalised than equally spaced items in the neighbourhood of the prototype. These results support the concept of categorically distinct intonation events.

    Additional information

    Link to Speech Prosody Website
  • Romberg, A., Zhang, Y., Newman, B., Triesch, J., & Yu, C. (2016). Global and local statistical regularities control visual attention to object sequences. In Proceedings of the 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 262-267).

    Abstract

    Many previous studies have shown that both infants and adults are skilled statistical learners. Because statistical learning is affected by attention, learners' ability to manage their attention can play a large role in what they learn. However, it is still unclear how learners allocate their attention in order to gain information in a visual environment containing multiple objects, especially how prior visual experience (i.e., familiarly of objects) influences where people look. To answer these questions, we collected eye movement data from adults exploring multiple novel objects while manipulating object familiarity with global (frequencies) and local (repetitions) regularities. We found that participants are sensitive to both global and local statistics embedded in their visual environment and they dynamically shift their attention to prioritize some objects over others as they gain knowledge of the objects and their distributions within the task.
  • Sauter, D., Eisner, F., Ekman, P., & Scott, S. K. (2009). Universal vocal signals of emotion. In N. Taatgen, & H. Van Rijn (Eds.), Proceedings of the 31st Annual Meeting of the Cognitive Science Society (CogSci 2009) (pp. 2251-2255). Cognitive Science Society.

    Abstract

    Emotional signals allow for the sharing of important information with conspecifics, for example to warn them of danger. Humans use a range of different cues to communicate to others how they feel, including facial, vocal, and gestural signals. Although much is known about facial expressions of emotion, less research has focused on affect in the voice. We compare British listeners to individuals from remote Namibian villages who have had no exposure to Western culture, and examine recognition of non-verbal emotional vocalizations, such as screams and laughs. We show that a number of emotions can be universally recognized from non-verbal vocal signals. In addition we demonstrate the specificity of this pattern, with a set of additional emotions only recognized within, but not across these cultural groups. Our findings indicate that a small set of primarily negative emotions have evolved signals across several modalities, while most positive emotions are communicated with culture-specific signals.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., & Okolowski, S. (2009). Lexical embedding in spoken Dutch. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1879-1882). ISCA Archive.

    Abstract

    A stretch of speech is often consistent with multiple words, e.g., the sequence /hæm/ is consistent with ‘ham’ but also with the first syllable of ‘hamster’, resulting in temporary ambiguity. However, to what degree does this lexical embedding occur? Analyses on two corpora of spoken Dutch showed that 11.9%-19.5% of polysyllabic word tokens have word-initial embedding, while 4.1%-7.5% of monosyllabic word tokens can appear word-initially embedded. This is much lower than suggested by an analysis of a large dictionary of Dutch. Speech processing thus appears to be simpler than one might expect on the basis of statistics on a dictionary.
  • Scharenborg, O. (2009). Using durational cues in a computational model of spoken-word recognition. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1675-1678). ISCA Archive.

    Abstract

    Evidence that listeners use durational cues to help resolve temporarily ambiguous speech input has accumulated over the past few years. In this paper, we investigate whether durational cues are also beneficial for word recognition in a computational model of spoken-word recognition. Two sets of simulations were carried out using the acoustic signal as input. The simulations showed that the computational model, like humans, takes benefit from durational cues during word recognition, and uses these to disambiguate the speech signal. These results thus provide support for the theory that durational cues play a role in spoken-word recognition.
  • Schoenmakers, G.-J., & De Swart, P. (2019). Adverbial hurdles in Dutch scrambling. In A. Gattnar, R. Hörnig, M. Störzer, & S. Featherston (Eds.), Proceedings of Linguistic Evidence 2018: Experimental Data Drives Linguistic Theory (pp. 124-145). Tübingen: University of Tübingen.

    Abstract

    This paper addresses the role of the adverb in Dutch direct object scrambling constructions. We report four experiments in which we investigate whether the structural position and the scope sensitivity of the adverb affect acceptability judgments of scrambling constructions and native speakers' tendency to scramble definite objects. We conclude that the type of adverb plays a key role in Dutch word ordering preferences.
  • Schuerman, W. L., McQueen, J. M., & Meyer, A. S. (2019). Speaker statistical averageness modulates word recognition in adverse listening conditions. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1203-1207). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    We tested whether statistical averageness (SA) at the level of the individual speaker could predict a speaker’s intelligibility. 28 female and 21 male speakers of Dutch were recorded producing 336 sentences,
    each containing two target nouns. Recordings were compared to those of all other same-sex speakers using dynamic time warping (DTW). For each sentence, the DTW distance constituted a metric
    of phonetic distance from one speaker to all other speakers. SA comprised the average of these distances. Later, the same participants performed a word recognition task on the target nouns in the same sentences, under three degraded listening conditions. In all three conditions, accuracy increased with SA. This held even when participants listened to their own utterances. These findings suggest that listeners process speech with respect to the statistical
    properties of the language spoken in their community, rather than using their own speech as a reference
  • Schuppler, B., Van Dommelen, W., Koreman, J., & Ernestus, M. (2009). Word-final [t]-deletion: An analysis on the segmental and sub-segmental level. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 2275-2278). Causal Productions Pty Ltd.

    Abstract

    This paper presents a study on the reduction of word-final [t]s in conversational standard Dutch. Based on a large amount of tokens annotated on the segmental level, we show that the bigram frequency and the segmental context are the main predictors for the absence of [t]s. In a second study, we present an analysis of the detailed acoustic properties of word-final [t]s and we show that bigram frequency and context also play a role on the subsegmental level. This paper extends research on the realization of /t/ in spontaneous speech and shows the importance of incorporating sub-segmental properties in models of speech.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Seidlmayer, E., Galke, L., Melnychuk, T., Schultz, C., Tochtermann, K., & Förstner, K. U. (2019). Take it personally - A Python library for data enrichment for infometrical applications. In M. Alam, R. Usbeck, T. Pellegrini, H. Sack, & Y. Sure-Vetter (Eds.), Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems co-located with 15th International Conference on Semantic Systems (SEMANTiCS 2019).

    Abstract

    Like every other social sphere, science is influenced by individual characteristics of researchers. However, for investigations on scientific networks, only little data about the social background of researchers, e.g. social origin, gender, affiliation etc., is available.
    This paper introduces ”Take it personally - TIP”, a conceptual model and library currently under development, which aims to support the
    semantic enrichment of publication databases with semantically related background information which resides elsewhere in the (semantic) web, such as Wikidata.
    The supplementary information enriches the original information in the publication databases and thus facilitates the creation of complex scientific knowledge graphs. Such enrichment helps to improve the scientometric analysis of scientific publications as they can also take social backgrounds of researchers into account and to understand social structure in research communities.
  • Seijdel, N., Sakmakidis, N., De Haan, E. H. F., Bohte, S. M., & Scholte, H. S. (2019). Implicit scene segmentation in deeper convolutional neural networks. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 1059-1062). doi:10.32470/CCN.2019.1149-0.

    Abstract

    Feedforward deep convolutional neural networks (DCNNs) are matching and even surpassing human performance on object recognition. This performance suggests that activation of a loose collection of image
    features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Recent findings in humans however, suggest that while feedforward activity may suffice for
    sparse scenes with isolated objects, additional visual operations ('routines') that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to
    performance of DCNNs with increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds they appear on. To this end, we controlled the information in both objects
    and backgrounds, as well as the relationship between them by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicated less distinction between object- and background features for more shallow networks. For those networks, we observed a benefit of training on segmented objects (as compared to unsegmented objects). Overall, deeper networks trained on natural
    (unsegmented) scenes seem to perform implicit 'segmentation' of the objects from their background, possibly by improved selection of relevant features.
  • Seuren, P. A. M. (2009). Logical systems and natural logical intuitions. In Current issues in unity and diversity of languages: Collection of the papers selected from the CIL 18, held at Korea University in Seoul on July 21-26, 2008. http://www.cil18.org (pp. 53-60).

    Abstract

    The present paper is part of a large research programme investigating the nature and properties of the predicate logic inherent in natural language. The general hypothesis is that natural speakers start off with a basic-natural logic, based on natural cognitive functions, including the basic-natural way of dealing with plural objects. As culture spreads, functional pressure leads to greater generalization and mathematical correctness, yielding ever more refined systems until the apogee of standard modern predicate logic. Four systems of predicate calculus are considered: Basic-Natural Predicate Calculus (BNPC), Aritsotelian-Abelardian Predicate Calculus (AAPC), Aritsotelian-Boethian Predicate Calculus (ABPC), also known as the classic Square of Opposition, and Standard Modern Predicate Calculus (SMPC). (ABPC is logically faulty owing to its Undue Existential Import (UEI), but that fault is repaired by the addition of a presuppositional component to the logic.) All four systems are checked against seven natural logical intuitions. It appears that BNPC scores best (five out of seven), followed by ABPC (three out of seven). AAPC and SMPC finish ex aequo with two out of seven.
  • Seuren, P. A. M. (1974). Pronomi clitici in italiano. In M. Medici, & A. Sangregorio (Eds.), Fenomeni morfologici e sintattici nell'Italiano contemporaneo (pp. 309-327). Roma: Bulzoni.
  • Shen, C., & Janse, E. (2019). Articulatory control in speech production. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2533-2537). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Shen, C., Cooke, M., & Janse, E. (2019). Individual articulatory control in speech enrichment. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the 23rd International Congress on Acoustics (pp. 5726-5730). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    ndividual talkers may use various strategies to enrich their speech while speaking in noise (i.e., Lombard speech) to improve their intelligibility. The resulting acoustic-phonetic changes in Lombard speech vary amongst different speakers, but it is unclear what causes these talker differences, and what impact these differences have on intelligibility. This study investigates the potential role of articulatory control in talkers’ Lombard speech enrichment success. Seventy-eight speakers read out sentences in both their habitual style and in a condition where they were instructed to speak clearly while hearing loud speech-shaped noise. A diadochokinetic (DDK) speech task that requires speakers to repetitively produce word or non-word sequences as accurately and as rapidly as possible, was used to quantify their articulatory control. Individuals’ predicted intelligibility in both speaking styles (presented at -5 dB SNR) was measured using an acoustic glimpse-based metric: the High-Energy Glimpse Proportion (HEGP). Speakers’ HEGP scores show a clear effect of speaking condition (better HEGP scores in the Lombard than habitual condition), but no simple effect of articulatory control on HEGP, nor an interaction between speaking condition and articulatory control. This indicates that individuals’ speech enrichment success as measured by the HEGP metric was not predicted by DDK performance.
  • Sloetjes, H., & Seibert, O. (2016). Measuring by marking; the multimedia annotation tool ELAN. In A. Spink, G. Riedel, L. Zhou, L. Teekens, R. Albatal, & C. Gurrin (Eds.), Measuring Behavior 2016, 10th International Conference on Methods and Techniques in Behavioral Research (pp. 492-495).

    Abstract

    ELAN is a multimedia annotation tool developed by the Max Planck Institute for Psycholinguistics. It is applied in a variety of research areas. This paper presents a general overview of the tool and new developments as the calculation of inter-rater reliability, a commentary framework, semi-automatic segmentation and labeling and export to Theme.
  • Speed, L., Chen, J., Huettig, F., & Majid, A. (2016). Do classifier categories affect or reflect object concepts? In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 2267-2272). Austin, TX: Cognitive Science Society.

    Abstract

    We conceptualize objects based on sensory and motor information gleaned from real-world experience. But to what extent is such conceptual information structured according to higher level linguistic features too? Here we investigate whether classifiers, a grammatical category, shape the conceptual representations of objects. In three experiments native Mandarin speakers (speakers of a classifier language) and native Dutch speakers (speakers of a language without classifiers) judged the similarity of a target object (presented as a word or picture) with four objects (presented as words or pictures). One object shared a classifier with the target, the other objects did not, serving as distractors. Across all experiments, participants judged the target object as more similar to the object with the shared classifier than distractor objects. This effect was seen in both Dutch and Mandarin speakers, and there was no difference between the two languages. Thus, even speakers of a non-classifier language are sensitive to object similarities underlying classifier systems, and using a classifier system does not exaggerate these similarities. This suggests that classifier systems simply reflect, rather than affect, conceptual structure.
  • Speed, L., & Majid, A. (2016). Grammatical gender affects odor cognition. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society (CogSci 2016) (pp. 1451-1456). Austin, TX: Cognitive Science Society.

    Abstract

    Language interacts with olfaction in exceptional ways. Olfaction is believed to be weakly linked with language, as demonstrated by our poor odor naming ability, yet olfaction seems to be particularly susceptible to linguistic descriptions. We tested the boundaries of the influence of language on olfaction by focusing on a non-lexical aspect of language (grammatical gender). We manipulated the grammatical gender of fragrance descriptions to test whether the congruence with fragrance gender would affect the way fragrances were perceived and remembered. Native French and German speakers read descriptions of fragrances containing ingredients with feminine or masculine grammatical gender, and then smelled masculine or feminine fragrances and rated them on a number of dimensions (e.g., pleasantness). Participants then completed an odor recognition test. Fragrances were remembered better when presented with descriptions whose grammatical gender matched the gender of the fragrance. Overall, results suggest grammatical manipulations of odor descriptions can affect odor cognition
  • Stehouwer, H., & van Zaanen, M. (2009). Language models for contextual error detection and correction. In Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference (pp. 41-48). Association for Computational Linguistics.

    Abstract

    The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusible correction. Instead of using specific classifiers, we use one generic classifier based on a language model. This measures the likelihood of sentences with different possible solutions of a confusible in place. The advantage of this approach is that all confusible sets are handled by a single model. Preliminary results show that the performance of the generic classifier approach is only slightly worse that that of the specific classifier approach
  • Stehouwer, H., & Van Zaanen, M. (2009). Token merging in language model-based confusible disambiguation. In T. Calders, K. Tuyls, & M. Pechenizkiy (Eds.), Proceedings of the 21st Benelux Conference on Artificial Intelligence (pp. 241-248).

    Abstract

    In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when alternatives consist of a different number of tokens, this classification technique cannot be applied directly, because the computation of the probabilities is skewed. Previous work already showed that probabilities based on different order n-grams should not be compared directly. In this article, we propose new probability metrics in which the size of the n is varied according to the number of tokens of the confusible alternative. This requires access to n-grams of variable length. Results show that the synchronous back-off method is extremely robust. We discuss the use of suffix trees as a technique to store variable length n-gram information efficiently.
  • Sumer, B., Perniss, P. M., & Ozyurek, A. (2016). Viewpoint preferences in signing children's spatial descriptions. In J. Scott, & D. Waughtal (Eds.), Proceedings of the 40th Annual Boston University Conference on Language Development (BUCLD 40) (pp. 360-374). Boston, MA: Cascadilla Press.
  • Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).

    Abstract

    This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals.
  • Ten Bosch, L., Mulder, K., & Boves, L. (2019). Phase synchronization between EEG signals as a function of differences between stimuli characteristics. In Proceedings of Interspeech 2019 (pp. 1213-1217). doi:10.21437/Interspeech.2019-2443.

    Abstract

    The neural processing of speech leads to specific patterns in the brain which can be measured as, e.g., EEG signals. When properly aligned with the speech input and averaged over many tokens, the Event Related Potential (ERP) signal is able to differentiate specific contrasts between speech signals. Well-known effects relate to the difference between expected and unexpected words, in particular in the N400, while effects in N100 and P200 are related to attention and acoustic onset effects. Most EEG studies deal with the amplitude of EEG signals over time, sidestepping the effect of phase and phase synchronization. This paper investigates the relation between phase in the EEG signals measured in an auditory lexical decision task by Dutch participants listening to full and reduced English word forms. We show that phase synchronization takes place across stimulus conditions, and that the so-called circular variance is narrowly related to the type of contrast between stimuli.
  • ten Bosch, L., Hämäläinen, A., Scharenborg, O., & Boves, L. (2006). Acoustic scores and symbolic mismatch penalties in phone lattices. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing [ICASSP 2006]. IEEE.

    Abstract

    This paper builds on previous work that aims at unraveling the structure of the speech signal by means of using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical search in which symbolic mismatches are allowed at certain costs. The focus is on the optimization of the costs of phone insertions, deletions and substitutions that are used in the lexical decoding pass. Two optimization approaches are presented, one related to a multi-pass computational model for human speech recognition, the other based on a decoding in which Bayes’ risks are minimized. In the final section, the advantages of these optimization methods are discussed and compared.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2016). Combining data-oriented and process-oriented approaches to modeling reaction time data. In Proceedings of Interspeech 2016: The 17th Annual Conference of the International Speech Communication Association (pp. 2801-2805). doi:10.21437/Interspeech.2016-1072.

    Abstract

    This paper combines two different approaches to modeling reaction time data from lexical decision experiments, viz. a dataoriented statistical analysis by means of a linear mixed effects model, and a process-oriented computational model of human speech comprehension. The linear mixed effect model is implemented by lmer in R. As computational model we apply DIANA, an end-to-end computational model which aims at modeling the cognitive processes underlying speech comprehension. DIANA takes as input the speech signal, and provides as output the orthographic transcription of the stimulus, a word/non-word judgment and the associated reaction time. Previous studies have shown that DIANA shows good results for large-scale lexical decision experiments in Dutch and North-American English. We investigate whether predictors that appear significant in an lmer analysis and processes implemented in DIANA can be related and inform both approaches. Predictors such as ‘previous reaction time’ can be related to a process description; other predictors, such as ‘lexical neighborhood’ are hard-coded in lmer and emergent in DIANA. The analysis focuses on the interaction between subject variables and task variables in lmer, and the ways in which these interactions can be implemented in DIANA.
  • Ten Bosch, L., Giezenaar, G., Boves, L., & Ernestus, M. (2016). Modeling language-learners' errors in understanding casual speech. In G. Adda, V. Barbu Mititelu, J. Mariani, D. Tufiş, & I. Vasilescu (Eds.), Errors by humans and machines in multimedia, multimodal, multilingual data processing. Proceedings of Errare 2015 (pp. 107-121). Bucharest: Editura Academiei Române.

    Abstract

    In spontaneous conversations, words are often produced in reduced form compared to formal careful speech. In English, for instance, ’probably’ may be pronounced as ’poly’ and ’police’ as ’plice’. Reduced forms are very common, and native listeners usually do not have any problems with interpreting these reduced forms in context. Non-native listeners, however, have great difficulties in comprehending reduced forms. In order to investigate the problems in comprehension that non-native listeners experience, a dictation experiment was conducted in which sentences were presented auditorily to non-natives either in full (unreduced) or reduced form. The types of errors made by the L2 listeners reveal aspects of the cognitive processes underlying this dictation task. In addition, we compare the errors made by these human participants with the type of word errors made by DIANA, a recently developed computational model of word comprehension.
  • Ter Bekke, M., Ozyurek, A., & Ünal, E. (2019). Speaking but not gesturing predicts motion event memory within and across languages. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2940-2946). Montreal, QB: Cognitive Science Society.

    Abstract

    In everyday life, people see, describe and remember motion events. We tested whether the type of motion event information (path or manner) encoded in speech and gesture predicts which information is remembered and if this varies across speakers of typologically different languages. We focus on intransitive motion events (e.g., a woman running to a tree) that are described differently in speech and co-speech gesture across languages, based on how these languages typologically encode manner and path information (Kita & Özyürek, 2003; Talmy, 1985). Speakers of Dutch (n = 19) and Turkish (n = 22) watched and described motion events. With a surprise (i.e. unexpected) recognition memory task, memory for manner and path components of these events was measured. Neither Dutch nor Turkish speakers’ memory for manner went above chance levels. However, we found a positive relation between path speech and path change detection: participants who described the path during encoding were more accurate at detecting changes to the path of an event during the memory task. In addition, the relation between path speech and path memory changed with native language: for Dutch speakers encoding path in speech was related to improved path memory, but for Turkish speakers no such relation existed. For both languages, co-speech gesture did not predict memory speakers. We discuss the implications of these findings for our understanding of the relations between speech, gesture, type of encoding in language and memory.
  • Torreira, F., & Ernestus, M. (2009). Probabilistic effects on French [t] duration. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 448-451). Causal Productions Pty Ltd.

    Abstract

    The present study shows that [t] consonants are affected by probabilistic factors in a syllable-timed language as French, and in spontaneous as well as in journalistic speech. Study 1 showed a word bigram frequency effect in spontaneous French, but its exact nature depended on the corpus on which the probabilistic measures were based. Study 2 investigated journalistic speech and showed an effect of the joint frequency of the test word and its following word. We discuss the possibility that these probabilistic effects are due to the speaker’s planning of upcoming words, and to the speaker’s adaptation to the listener’s needs.
  • Trilsbeek, P., & Windhouwer, M. (2016). FLAT: A CLARIN-compatible repository solution based on Fedora Commons. In Proceedings of the CLARIN Annual Conference 2016. Clarin ERIC.

    Abstract

    This paper describes the development of a CLARIN-compatible repository solution that fulfils
    both the long-term preservation requirements as well as the current day discoverability and usability
    needs of an online data repository of language resources. The widely used Fedora Commons
    open source repository framework, combined with the Islandora discovery layer, forms
    the basis of the solution. On top of this existing solution, additional modules and tools are developed
    to make it suitable for the types of data and metadata that are used by the participating
    partners.

    Additional information

    link to pdf on CLARIN site
  • Troncoso Ruiz, A., Ernestus, M., & Broersma, M. (2019). Learning to produce difficult L2 vowels: The effects of awareness-rasing, exposure and feedback. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1094-1098). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Tuinman, A. (2006). Overcompensation of /t/ reduction in Dutch by German/Dutch bilinguals. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 101-102).

Share this page