Publications

Displaying 1 - 100 of 270
  • Alhama, R. G., & Zuidema, W. (2017). Segmentation as Retention and Recognition: the R&R model. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1531-1536). Austin, TX: Cognitive Science Society.

    Abstract

    We present the Retention and Recognition model (R&R), a probabilistic exemplar model that accounts for segmentation in Artificial Language Learning experiments. We show that R&R provides an excellent fit to human responses in three segmentation experiments with adults (Frank et al., 2010), outperforming existing models. Additionally, we analyze the results of the simulations and propose alternative explanations for the experimental findings.
  • Alhama, R. G., Scha, R., & Zuidema, W. (2014). Rule learning in humans and animals. In E. A. Cartmill, S. Roberts, H. Lyn, & H. Cornish (Eds.), The evolution of language: Proceedings of the 10th International Conference (EVOLANG 10) (pp. 371-372). Singapore: World Scientific.
  • Allen, S. E. M. (1998). A discourse-pragmatic explanation for the subject-object asymmetry in early null arguments. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 Conference on Language Acquisition (pp. 10-15). Edinburgh, UK: Edinburgh University Press.

    Abstract

    The present paper assesses discourse-pragmatic factors as a potential explanation for the subject-object assymetry in early child language. It identifies a set of factors which characterize typical situations of informativeness (Greenfield & Smith, 1976), and uses these factors to identify informative arguments in data from four children aged 2;0 through 3;6 learning Inuktitut as a first language. In addition, it assesses the extent of the links between features of informativeness on one hand and lexical vs. null and subject vs. object arguments on the other. Results suggest that a pragmatics account of the subject-object asymmetry can be upheld to a greater extent than previous research indicates, and that several of the factors characterizing informativeness are good indicators of those arguments which tend to be omitted in early child language.
  • Ameka, F. K. (2017). The Uselessness of the Useful: Language Standardisation and Variation in Multilingual Context. In I. Tieken-Boon van Ostade, & C. Percy (Eds.), Prescription and tradition in language: Establishing standards across the time and space (pp. 71-87). Bristol: Multilingual Matters.
  • Azar, Z., Backus, A., & Ozyurek, A. (2017). Highly proficient bilinguals maintain language-specific pragmatic constraints on pronouns: Evidence from speech and gesture. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 81-86). Austin, TX: Cognitive Science Society.

    Abstract

    The use of subject pronouns by bilingual speakers using both a pro-drop and a non-pro-drop language (e.g. Spanish heritage speakers in the USA) is a well-studied topic in research on cross-linguistic influence in language contact situations. Previous studies looking at bilinguals with different proficiency levels have yielded conflicting results on whether there is transfer from the non-pro-drop patterns to the pro-drop language. Additionally, previous research has focused on speech patterns only. In this paper, we study the two modalities of language, speech and gesture, and ask whether and how they reveal cross-linguistic influence on the use of subject pronouns in discourse. We focus on elicited narratives from heritage speakers of Turkish in the Netherlands, in both Turkish (pro-drop) and Dutch (non-pro-drop), as well as from monolingual control groups. The use of pronouns was not very common in monolingual Turkish narratives and was constrained by the pragmatic contexts, unlike in Dutch. Furthermore, Turkish pronouns were more likely to be accompanied by localized gestures than Dutch pronouns, presumably because pronouns in Turkish are pragmatically marked forms. We did not find any cross-linguistic influence in bilingual speech or gesture patterns, in line with studies (speech only) of highly proficient bilinguals. We therefore suggest that speech and gesture parallel each other not only in monolingual but also in bilingual production. Highly proficient heritage speakers who have been exposed to diverse linguistic and gestural patterns of each language from early on maintain monolingual patterns of pragmatic constraints on the use of pronouns multimodally.
  • Baayen, R. H. (2014). Productivity in language production. In D. Sandra, & M. Taft (Eds.), Morphological Structure, Lexical Representation and Lexical Access: A Special Issue of Language and Cognitive Processes (pp. 447-469). London: Routledge.

    Abstract

    Lexical statistics and a production experiment are used to gauge the extent to which the linguistic notion of morphological productivity is relevant for psycholinguistic theories of speech production in languages such as Dutch and English. Lexical statistics of productivity show that despite the relatively poor morphology of Dutch, new words are created often enough for the marginalisation of word formation in theories of speech production to be theoretically unattractive. This conclusion is supported by the results of a production experiment in which subjects freely created hundreds of productive, but only a handful of unproductive, neologisms. A tentative solution is proposed as to why the opposite pattern has been observed in the speech of jargonaphasics.
  • Bauer, B. L. M. (2014). Indefinite HOMO in the Gospels of the Vulgata. In P. Molinell, P. Cuzzoli, & C. Fedriani (Eds.), Latin vulgaire – latin tardif X (pp. 415-435). Bergamo: Bergamo University Press.
  • Bergmann, C., Ten Bosch, L., & Boves, L. (2014). A computational model of the headturn preference procedure: Design, challenges, and insights. In J. Mayor, & P. Gomez (Eds.), Computational Models of Cognitive Processes (pp. 125-136). World Scientific. doi:10.1142/9789814458849_0010.

    Abstract

    The Headturn Preference Procedure (HPP) is a frequently used method (e.g., Jusczyk & Aslin; and subsequent studies) to investigate linguistic abilities in infants. In this paradigm infants are usually first familiarised with words and then tested for a listening preference for passages containing those words in comparison to unrelated passages. Listening preference is defined as the time an infant spends attending to those passages with his or her head turned towards a flashing light and the speech stimuli. The knowledge and abilities inferred from the results of HPP studies have been used to reason about and formally model early linguistic skills and language acquisition. However, the actual cause of infants' behaviour in HPP experiments has been subject to numerous assumptions as there are no means to directly tap into cognitive processes. To make these assumptions explicit, and more crucially, to understand how infants' behaviour emerges if only general learning mechanisms are assumed, we introduce a computational model of the HPP. Simulations with the computational HPP model show that the difference in infant behaviour between familiarised and unfamiliar words in passages can be explained by a general learning mechanism and that many assumptions underlying the HPP are not necessarily warranted. We discuss the implications for conventional interpretations of the outcomes of HPP experiments.
  • Bergmann, C., Tsuji, S., & Cristia, A. (2017). Top-down versus bottom-up theories of phonological acquisition: A big data approach. In Proceedings of Interspeech 2017 (pp. 2103-2107).

    Abstract

    Recent work has made available a number of standardized meta- analyses bearing on various aspects of infant language processing. We utilize data from two such meta-analyses (discrimination of vowel contrasts and word segmentation, i.e., recognition of word forms extracted from running speech) to assess whether the published body of empirical evidence supports a bottom-up versus a top-down theory of early phonological development by leveling the power of results from thousands of infants. We predicted that if infants can rely purely on auditory experience to develop their phonological categories, then vowel discrimination and word segmentation should develop in parallel, with the latter being potentially lagged compared to the former. However, if infants crucially rely on word form information to build their phonological categories, then development at the word level must precede the acquisition of native sound categories. Our results do not support the latter prediction. We discuss potential implications and limitations, most saliently that word forms are only one top-down level proposed to affect phonological development, with other proposals suggesting that top-down pressures emerge from lexical (i.e., word-meaning pairs) development. This investigation also highlights general procedures by which standardized meta-analyses may be reused to answer theoretical questions spanning across phenomena.

    Additional information

    Scripts and data
  • Black, A., & Bergmann, C. (2017). Quantifying infants' statistical word segmentation: A meta-analysis. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Meeting of the Cognitive Science Society (pp. 124-129). Austin, TX: Cognitive Science Society.

    Abstract

    Theories of language acquisition and perceptual learning increasingly rely on statistical learning mechanisms. The current meta-analysis aims to clarify the robustness of this capacity in infancy within the word segmentation literature. Our analysis reveals a significant, small effect size for conceptual replications of Saffran, Aslin, & Newport (1996), and a nonsignificant effect across all studies that incorporate transitional probabilities to segment words. In both conceptual replications and the broader literature, however, statistical learning is moderated by whether stimuli are naturally produced or synthesized. These findings invite deeper questions about the complex factors that influence statistical learning, and the role of statistical learning in language acquisition.
  • Blasi, D. E., Christiansen, M. H., Wichmann, S., Hammarström, H., & Stadler, P. F. (2014). Sound symbolism and the origins of language. In E. A. Cartmill, S. Roberts, H. Lyn, & H. Cornish (Eds.), The evolution of language: Proceedings of the 10th International Conference (EVOLANG 10) (pp. 391-392). Singapore: World Scientific.
  • Blomert, L., & Hagoort, P. (1987). Neurobiologische en neuropsychologische aspecten van dyslexie. In J. Hamers, & A. Van der Leij (Eds.), Dyslexie 87 (pp. 35-44). Lisse: Swets and Zeitlinger.
  • Bocanegra, B. R., Poletiek, F. H., & Zwaan, R. A. (2014). Asymmetrical feature binding across language and perception. In Proceedings of the 7th annual Conference on Embodied and Situated Language Processing (ESLP 2014).
  • Bohnemeyer, J. (1998). Sententiale Topics im Yukatekischen. In Z. Dietmar (Ed.), Deskriptive Grammatik und allgemeiner Sprachvergleich (pp. 55-85). Tübingen, Germany: Max-Niemeyer-Verlag.

    Files private

    Request files
  • Bohnemeyer, J. (1998). Temporale Relatoren im Hispano-Yukatekischen Sprachkontakt. In A. Koechert, & T. Stolz (Eds.), Convergencia e Individualidad - Las lenguas Mayas entre hispanización e indigenismo (pp. 195-241). Hannover, Germany: Verlag für Ethnologie.
  • Bosker, H. R., & Kösem, A. (2017). An entrained rhythm's frequency, not phase, influences temporal sampling of speech. In Proceedings of Interspeech 2017 (pp. 2416-2420). doi:10.21437/Interspeech.2017-73.

    Abstract

    Brain oscillations have been shown to track the slow amplitude fluctuations in speech during comprehension. Moreover, there is evidence that these stimulus-induced cortical rhythms may persist even after the driving stimulus has ceased. However, how exactly this neural entrainment shapes speech perception remains debated. This behavioral study investigated whether and how the frequency and phase of an entrained rhythm would influence the temporal sampling of subsequent speech. In two behavioral experiments, participants were presented with slow and fast isochronous tone sequences, followed by Dutch target words ambiguous between as /ɑs/ “ash” (with a short vowel) and aas /a:s/ “bait” (with a long vowel). Target words were presented at various phases of the entrained rhythm. Both experiments revealed effects of the frequency of the tone sequence on target word perception: fast sequences biased listeners to more long /a:s/ responses. However, no evidence for phase effects could be discerned. These findings show that an entrained rhythm’s frequency, but not phase, influences the temporal sampling of subsequent speech. These outcomes are compatible with theories suggesting that sensory timing is evaluated relative to entrained frequency. Furthermore, they suggest that phase tracking of (syllabic) rhythms by theta oscillations plays a limited role in speech parsing.
  • Bosker, H. R. (2017). The role of temporal amplitude modulations in the political arena: Hillary Clinton vs. Donald Trump. In Proceedings of Interspeech 2017 (pp. 2228-2232). doi:10.21437/Interspeech.2017-142.

    Abstract

    Speech is an acoustic signal with inherent amplitude modulations in the 1-9 Hz range. Recent models of speech perception propose that this rhythmic nature of speech is central to speech recognition. Moreover, rhythmic amplitude modulations have been shown to have beneficial effects on language processing and the subjective impression listeners have of the speaker. This study investigated the role of amplitude modulations in the political arena by comparing the speech produced by Hillary Clinton and Donald Trump in the three presidential debates of 2016. Inspection of the modulation spectra, revealing the spectral content of the two speakers’ amplitude envelopes after matching for overall intensity, showed considerably greater power in Clinton’s modulation spectra (compared to Trump’s) across the three debates, particularly in the 1-9 Hz range. The findings suggest that Clinton’s speech had a more pronounced temporal envelope with rhythmic amplitude modulations below 9 Hz, with a preference for modulations around 3 Hz. This may be taken as evidence for a more structured temporal organization of syllables in Clinton’s speech, potentially due to more frequent use of preplanned utterances. Outcomes are interpreted in light of the potential beneficial effects of a rhythmic temporal envelope on intelligibility and speaker perception.
  • Bowerman, M. (1987). Commentary: Mechanisms of language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 443-466). Hillsdale, N.J.: Lawrence Erlbaum.
  • Broeder, D., Schuurman, I., & Windhouwer, M. (2014). Experiences with the ISOcat Data Category Registry. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 4565-4568).
  • Broeder, D., & Van Uytvanck, D. (2014). Metadata formats. In J. Durand, U. Gut, & G. Kristoffersen (Eds.), The Oxford Handbook of Corpus Phonology (pp. 150-165). Oxford: Oxford University Press.
  • Brown, P. (1998). Early Tzeltal verbs: Argument structure and argument representation. In E. Clark (Ed.), Proceedings of the 29th Annual Stanford Child Language Research Forum (pp. 129-140). Stanford: CSLI Publications.

    Abstract

    The surge of research activity focussing on children's acquisition of verbs (e.g., Tomasello and Merriman 1996) addresses some fundamental questions: Just how variable across languages, and across individual children, is the process of verb learning? How specific are arguments to particular verbs in early child language? How does the grammatical category 'Verb' develop? The position of Universal Grammar, that a verb category is early, contrasts with that of Tomasello (1992), Pine and Lieven and their colleagues (1996, in press), and many others, that children develop a verb category slowly, gradually building up subcategorizations of verbs around pragmatic, syntactic, and semantic properties of the language they are exposed to. On this latter view, one would expect the language which the child is learning, the cultural milieu and the nature of the interactions in which the child is engaged, to influence the process of acquiring verb argument structures. This paper explores these issues by examining the development of argument representation in the Mayan language Tzeltal, in both its lexical and verbal cross-referencing forms, and analyzing the semantic and pragmatic factors influencing the form argument representation takes. Certain facts about Tzeltal (the ergative/ absolutive marking, the semantic specificity of transitive and positional verbs) are proposed to affect the representation of arguments. The first 500 multimorpheme combinations of 3 children (aged between 1;8 and 2;4) are examined. It is argued that there is no evidence of semantically light 'pathbreaking' verbs (Ninio 1996) leading the way into word combinations. There is early productivity of cross-referencing affixes marking A, S, and O arguments (although there are systematic omissions). The paper assesses the respective contributions of three kinds of factors to these results - structural (regular morphology), semantic (verb specificity) and pragmatic (the nature of Tzeltal conversational interaction).
  • Brown, P. (2014). Gestures in native Mexico and Central America. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill, & J. Bressem (Eds.), Body -language – communication: An international handbook on multimodality in human interaction. Volume 2 (pp. 1206-1215). Berlin: Mouton de Gruyter.

    Abstract

    The systematic study of kinesics, gaze, and gestural aspects of communication in Central American cultures is a recent phenomenon, most of it focussing on the Mayan cultures of southern Mexico, Guatemala, and Belize. This article surveys ethnographic observations and research reports on bodily aspects of speaking in three domains: gaze and kinesics in social interaction, indexical pointing in adult and caregiver-child interactions, and co-speech gestures associated with “absolute” (geographically-based) systems of spatial reference. In addition, it reports how the indigenous co-speech gesture repertoire has provided the basis for developing village sign languages in the region. It is argued that studies of the embodied aspects of speech in the Mayan areas of Mexico and Central America have contributed to the typology of gestures and of spatial frames of reference. They have refined our understanding of how spatial frames of reference are invoked, communicated, and switched in conversational interaction and of the importance of co-speech gestures in understanding language use, language acquisition, and the transmission of culture-specific cognitive styles.
  • Brown, P. (1998). How and why are women more polite: Some evidence from a Mayan community. In J. Coates (Ed.), Language and gender (pp. 81-99). Oxford: Blackwell.
  • Brown, P. (2017). Politeness and impoliteness. In Y. Huang (Ed.), Oxford handbook of pragmatics (pp. 383-399). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780199697960.013.16.

    Abstract

    This article selectively reviews the literature on politeness across different disciplines—linguistics, anthropology, communications, conversation analysis, social psychology, and sociology—and critically assesses how both theoretical approaches to politeness and research on linguistic politeness phenomena have evolved over the past forty years. Major new developments include a shift from predominantly linguistic approaches to those examining politeness and impoliteness as processes that are embedded and negotiated in interactional and cultural contexts, as well as a greater focus on how both politeness and interactional confrontation and conflict fit into our developing understanding of human cooperation and universal aspects of human social interaction.

    Files private

    Request files
  • Brown, P., & Levinson, S. C. (1998). Politeness, introduction to the reissue: A review of recent work. In A. Kasher (Ed.), Pragmatics: Vol. 6 Grammar, psychology and sociology (pp. 488-554). London: Routledge.

    Abstract

    This article is a reprint of chapter 1, the introduction to Brown and Levinson, 1987, Politeness: Some universals in language usage (Cambridge University Press).
  • Brown, P., & Gaskins, S. (2014). Language acquisition and language socialization. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), Cambridge handbook of linguistic anthropology (pp. 187-226). Cambridge: Cambridge University Press.
  • Brown, P. (2014). The interactional context of language learning in Tzeltal. In I. Arnon, M. Casillas, C. Kurumada, & B. Estigarriba (Eds.), Language in Interaction: Studies in honor of Eve V. Clark (pp. 51-82). Amsterdam: Benjamins.

    Abstract

    This paper addresses the theories of Eve Clark about how children learn word meanings in western middle-class interactional contexts by examining child language data from a Tzeltal Maya society in southern Mexico where interaction patterns are radically different. Through examples of caregiver interactions with children 12-30 months old, I ask what lessons we can learn from how the details of these interactions unfold in this non-child-centered cultural context, and specifically, what aspects of the Tzeltal linguistic and interactional context might help to focus children’s attention on the meanings and the conventional forms of words being used around them.
  • Burchfield, L. A., Luk, S.-.-H.-K., Antoniou, M., & Cutler, A. (2017). Lexically guided perceptual learning in Mandarin Chinese. In Proceedings of Interspeech 2017 (pp. 576-580). doi:10.21437/Interspeech.2017-618.

    Abstract

    Lexically guided perceptual learni ng refers to the use of lexical knowledge to retune sp eech categories and thereby adapt to a novel talker’s pronunciation. This adaptation has been extensively documented, but primarily for segmental-based learning in English and Dutch. In languages with lexical tone, such as Mandarin Chinese, tonal categories can also be retuned in this way, but segmental category retuning had not been studied. We report two experiment s in which Mandarin Chinese listeners were exposed to an ambiguous mixture of [f] and [s] in lexical contexts favoring an interpretation as either [f] or [s]. Listeners were subsequently more likely to identify sounds along a continuum between [f] and [s], and to interpret minimal word pairs, in a manner consistent with this exposure. Thus lexically guided perceptual learning of segmental categories had indeed taken place, consistent with suggestions that such learning may be a universally available adaptation process
  • Casillas, M., Bergelson, E., Warlaumont, A. S., Cristia, A., Soderstrom, M., VanDam, M., & Sloetjes, H. (2017). A New Workflow for Semi-automatized Annotations: Tests with Long-Form Naturalistic Recordings of Childrens Language Environments. In Proceedings of Interspeech 2017 (pp. 2098-2102). doi:10.21437/Interspeech.2017-1418.

    Abstract

    Interoperable annotation formats are fundamental to the utility, expansion, and sustainability of collective data repositories.In language development research, shared annotation schemes have been critical to facilitating the transition from raw acoustic data to searchable, structured corpora. Current schemes typically require comprehensive and manual annotation of utterance boundaries and orthographic speech content, with an additional, optional range of tags of interest. These schemes have been enormously successful for datasets on the scale of dozens of recording hours but are untenable for long-format recording corpora, which routinely contain hundreds to thousands of audio hours. Long-format corpora would benefit greatly from (semi-)automated analyses, both on the earliest steps of annotation—voice activity detection, utterance segmentation, and speaker diarization—as well as later steps—e.g., classification-based codes such as child-vs-adult-directed speech, and speech recognition to produce phonetic/orthographic representations. We present an annotation workflow specifically designed for long-format corpora which can be tailored by individual researchers and which interfaces with the current dominant scheme for short-format recordings. The workflow allows semi-automated annotation and analyses at higher linguistic levels. We give one example of how the workflow has been successfully implemented in a large cross-database project.
  • Casillas, M. (2014). Taking the floor on time: Delay and deferral in children’s turn taking. In I. Arnon, M. Casillas, C. Kurumada, & B. Estigarribia (Eds.), Language in Interaction: Studies in honor of Eve V. Clark (pp. 101-114). Amsterdam: Benjamins.

    Abstract

    A key part of learning to speak with others is figuring out when to start talking and how to hold the floor in conversation. For young children, the challenge of planning a linguistic response can slow down their response latencies, making misunderstanding, repair, and loss of the floor more likely. Like adults, children can mitigate their delays by using fillers (e.g., uh and um) at the start of their turns. In this chapter I analyze the onset and development of fillers in five children’s spontaneous speech from ages 1;6–3;6. My findings suggest that children start using fillers by 2;0, and use them to effectively mitigate delay in making a response.
  • Casillas, M., Amatuni, A., Seidl, A., Soderstrom, M., Warlaumont, A., & Bergelson, E. (2017). What do Babies hear? Analyses of Child- and Adult-Directed Speech. In Proceedings of Interspeech 2017 (pp. 2093-2097). doi:10.21437/Interspeech.2017-1409.

    Abstract

    Child-directed speech is argued to facilitate language development, and is found cross-linguistically and cross-culturally to varying degrees. However, previous research has generally focused on short samples of child-caregiver interaction, often in the lab or with experimenters present. We test the generalizability of this phenomenon with an initial descriptive analysis of the speech heard by young children in a large, unique collection of naturalistic, daylong home recordings. Trained annotators coded automatically-detected adult speech 'utterances' from 61 homes across 4 North American cities, gathered from children (age 2-24 months) wearing audio recorders during a typical day. Coders marked the speaker gender (male/female) and intended addressee (child/adult), yielding 10,886 addressee and gender tags from 2,523 minutes of audio (cf. HB-CHAAC Interspeech ComParE challenge; Schuller et al., in press). Automated speaker-diarization (LENA) incorrectly gender-tagged 30% of male adult utterances, compared to manually-coded consensus. Furthermore, we find effects of SES and gender on child-directed and overall speech, increasing child-directed speech with child age, and interactions of speaker gender, child gender, and child age: female caretakers increased their child-directed speech more with age than male caretakers did, but only for male infants. Implications for language acquisition and existing classification algorithms are discussed.
  • Casillas, M. (2014). Turn-taking. In D. Matthews (Ed.), Pragmatic development in first language acquisition (pp. 53-70). Amsterdam: Benjamins.

    Abstract

    Conversation is a structured, joint action for which children need to learn a specialized set skills and conventions. Because conversation is a primary source of linguistic input, we can better grasp how children become active agents in their own linguistic development by studying their acquisition of conversational skills. In this chapter I review research on children’s turn-taking. This fundamental skill of human interaction allows children to gain feedback, make clarifications, and test hypotheses at every stage of development. I broadly review children’s conversational experiences, the types of turn-based contingency they must acquire, how they ask and answer questions, and when they manage to make timely responses
  • Chang, F., & Fitz, H. (2014). Computational models of sentence production: A dual-path approach. In M. Goldrick, & M. Miozzo (Eds.), The Oxford handbook of language production (pp. 70-89). Oxford: Oxford University Press.

    Abstract

    Sentence production is the process we use to create language-specific sentences that convey particular meanings. In production, there are complex interactions between meaning, words, and syntax at different points in sentences. Computational models can make these interactions explicit and connectionist learning algorithms have been useful for building such models. Connectionist models use domaingeneral mechanisms to learn internal representations and these mechanisms can also explain evidence of long-term syntactic adaptation in adult speakers. This paper will review work showing that these models can generalize words in novel ways and learn typologically-different languages like English and Japanese. It will also present modeling work which shows that connectionist learning algorithms can account for complex sentence production in children and adult production phenomena like structural priming, heavy NP shift, and conceptual/lexical accessibility.
  • Chen, A. (2014). Production-comprehension (A)Symmetry: Individual differences in the acquisition of prosodic focus-marking. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 423-427).

    Abstract

    Previous work based on different groups of children has shown that four- to five-year-old children are similar to adults in both producing and comprehending the focus-toaccentuation mapping in Dutch, contra the alleged productionprecedes- comprehension asymmetry in earlier studies. In the current study, we addressed the question of whether there are individual differences in the production-comprehension (a)symmetricity. To this end, we examined the use of prosody in focus marking in production and the processing of focusrelated prosody in online language comprehension in the same group of 4- to 5-year-olds. We have found that the relationship between comprehension and production can be rather diverse at an individual level. This result suggests some degree of independence in learning to use prosody to mark focus in production and learning to process focus-related prosodic information in online language comprehension, and implies influences of other linguistic and non-linguistic factors on the production-comprehension (a)symmetricity
  • Chen, A., Chen, A., Kager, R., & Wong, P. (2014). Rises and falls in Dutch and Mandarin Chinese. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 83-86).

    Abstract

    Despite of the different functions of pitch in tone and nontone languages, rises and falls are common pitch patterns across different languages. In the current study, we ask what is the language specific phonetic realization of rises and falls. Chinese and Dutch speakers participated in a production experiment. We used contexts composed for conveying specific communicative purposes to elicit rises and falls. We measured both tonal alignment and tonal scaling for both patterns. For the alignment measurements, we found language specific patterns for the rises, but for falls. For rises, both peak and valley were aligned later among Chinese speakers compared to Dutch speakers. For all the scaling measurements (maximum pitch, minimum pitch, and pitch range), no language specific patterns were found for either the rises or the falls
  • Clark, N., & Perlman, M. (2014). Breath, vocal, and supralaryngeal flexibility in a human-reared gorilla. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 11-15).

    Abstract

    “Gesture-first” theories dismiss ancestral great apes’ vocalization as a substrate for language evolution based on the claim that extant apes exhibit minimal learning and volitional control of vocalization. Contrary to this claim, we present data of novel learned and voluntarily controlled vocal behaviors produced by a human-fostered gorilla (G. gorilla gorilla). These behaviors demonstrate varying degrees of flexibility in the vocal apparatus (including diaphragm, lungs, larynx, and supralaryngeal articulators), and are predominantly performed in coordination with manual behaviors and gestures. Instead of a gesture-first theory, we suggest that these findings support multimodal theories of language evolution in which vocal and gestural forms are coordinated and supplement one another
  • Collins, J. (2017). Real and spurious correlations involving tonal languages. In N. J. Enfield (Ed.), Dependencies in language: On the causal ontology of linguistics systems (pp. 129-139). Berlin: Language Science Press.
  • Crago, M. B., & Allen, S. E. M. (1998). Acquiring Inuktitut. In O. L. Taylor, & L. Leonard (Eds.), Language Acquisition Across North America: Cross-Cultural And Cross-Linguistic Perspectives (pp. 245-279). San Diego, CA, USA: Singular Publishing Group, Inc.
  • Crago, M. B., Allen, S. E. M., & Pesco, D. (1998). Issues of Complexity in Inuktitut and English Child Directed Speech. In Proceedings of the twenty-ninth Annual Stanford Child Language Research Forum (pp. 37-46).
  • Crasborn, O., Hulsbosch, M., Lampen, L., & Sloetjes, H. (2014). New multilayer concordance functions in ELAN and TROVA. In Proceedings of the Tilburg Gesture Research Meeting [TiGeR 2013].

    Abstract

    Collocations generated by concordancers are a standard instrument in the exploitation of text corpora for the analysis of language use. Multimodal corpora show similar types of patterns, activities that frequently occur together, but there is no tool that offers facilities for visualising such patterns. Examples include timing of eye contact with respect to speech, and the alignment of activities of the two hands in signed languages. This paper describes recent enhancements to the standard CLARIN tools ELAN and TROVA for multimodal annotation to address these needs: first of all the query and concordancing functions were improved, and secondly the tools now generate visualisations of multilayer collocations that allow for intuitive explorations and analyses of multimodal data. This will provide a boost to the linguistic fields of gesture and sign language studies, as it will improve the exploitation of multimodal corpora.
  • Crasborn, O., & Sloetjes, H. (2014). Improving the exploitation of linguistic annotations in ELAN. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3604-3608).

    Abstract

    This paper discusses some improvements in recent and planned versions of the multimodal annotation tool ELAN, which are targeted at improving the usability of annotated files. Increased support for multilingual documents is provided, by allowing for multilingual vocabularies and by specifying a language per document, annotation layer (tier) or annotation. In addition, improvements in the search possibilities and the display of the results have been implemented, which are especially relevant in the interpretation of the results of complex multi-tier searches.
  • Cutler, A., & Otake, T. (1998). Assimilation of place in Japanese and Dutch. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: vol. 5 (pp. 1751-1754). Sydney: ICLSP.

    Abstract

    Assimilation of place of articulation across a nasal and a following stop consonant is obligatory in Japanese, but not in Dutch. In four experiments the processing of assimilated forms by speakers of Japanese and Dutch was compared, using a task in which listeners blended pseudo-word pairs such as ranga-serupa. An assimilated blend of this pair would be rampa, an unassimilated blend rangpa. Japanese listeners produced significantly more assimilated than unassimilated forms, both with pseudo-Japanese and pseudo-Dutch materials, while Dutch listeners produced significantly more unassimilated than assimilated forms in each materials set. This suggests that Japanese listeners, whose native-language phonology involves obligatory assimilation constraints, represent the assimilated nasals in nasal-stop sequences as unmarked for place of articulation, while Dutch listeners, who are accustomed to hearing unassimilated forms, represent the same nasal segments as marked for place of articulation.
  • Cutler, A. (2017). Converging evidence for abstract phonological knowledge in speech processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1447-1448). Austin, TX: Cognitive Science Society.

    Abstract

    The perceptual processing of speech is a constant interplay of multiple competing albeit convergent processes: acoustic input vs. higher-level representations, universal mechanisms vs. language-specific, veridical traces of speech experience vs. construction and activation of abstract representations. The present summary concerns the third of these issues. The ability to generalise across experience and to deal with resulting abstractions is the hallmark of human cognition, visible even in early infancy. In speech processing, abstract representations play a necessary role in both production and perception. New sorts of evidence are now informing our understanding of the breadth of this role.
  • Cutler, A. (1987). Components of prosodic effects in speech recognition. In Proceedings of the Eleventh International Congress of Phonetic Sciences: Vol. 1 (pp. 84-87). Tallinn: Academy of Sciences of the Estonian SSR, Institute of Language and Literature.

    Abstract

    Previous research has shown that listeners use the prosodic structure of utterances in a predictive fashion in sentence comprehension, to direct attention to accented words. Acoustically identical words spliced into sentence contexts arc responded to differently if the prosodic structure of the context is \ aricd: when the preceding prosody indicates that the word will he accented, responses are faster than when the preceding prosodv is inconsistent with accent occurring on that word. In the present series of experiments speech hybridisation techniques were first used to interchange the timing patterns within pairs of prosodic variants of utterances, independently of the pitch and intensity contours. The time-adjusted utterances could then serve as a basis lor the orthogonal manipulation of the three prosodic dimensions of pilch, intensity and rhythm. The overall pattern of results showed that when listeners use prosody to predict accent location, they do not simply rely on a single prosodic dimension, hut exploit the interaction between pitch, intensity and rhythm.
  • Cutler, A., & McQueen, J. M. (2014). How prosody is both mandatory and optional. In J. Caspers, Y. Chen, W. Heeren, J. Pacilly, N. O. Schiller, & E. Van Zanten (Eds.), Above and Beyond the Segments: Experimental linguistics and phonetics (pp. 71-82). Amsterdam: Benjamins.

    Abstract

    Speech signals originate as a sequence of linguistic units selected by speakers, but these units are necessarily realised in the suprasegmental dimensions of time, frequency and amplitude. For this reason prosodic structure has been viewed as a mandatory target of language processing by both speakers and listeners. In apparent contradiction, however, prosody has also been argued to be ancillary rather than core linguistic structure, making processing of prosodic structure essentially optional. In the present tribute to one of the luminaries of prosodic research for the past quarter century, we review evidence from studies of the processing of lexical stress and focal accent which reconciles these views and shows that both claims are, each in their own way, fully true.
  • Cutler, A. (1998). How listeners find the right words. In Proceedings of the Sixteenth International Congress on Acoustics: Vol. 2 (pp. 1377-1380). Melville, NY: Acoustical Society of America.

    Abstract

    Languages contain tens of thousands of words, but these are constructed from a tiny handful of phonetic elements. Consequently, words resemble one another, or can be embedded within one another, a coup stick snot with standing. me process of spoken-word recognition by human listeners involves activation of multiple word candidates consistent with the input, and direct competition between activated candidate words. Further, human listeners are sensitive, at an early, prelexical, stage of speeeh processing, to constraints on what could potentially be a word of the language.
  • Ip, M. H. K., & Cutler, A. (2017). Intonation facilitates prediction of focus even in the presence of lexical tones. In Proceedings of Interspeech 2017 (pp. 1218-1222). doi:10.21437/Interspeech.2017-264.

    Abstract

    In English and Dutch, listeners entrain to prosodic contours to predict where focus will fall in an utterance. However, is this strategy universally available, even in languages with different phonological systems? In a phoneme detection experiment, we examined whether prosodic entrainment is also found in Mandarin Chinese, a tone language, where in principle the use of pitch for lexical identity may take precedence over the use of pitch cues to salience. Consistent with the results from Germanic languages, response times were facilitated when preceding intonation predicted accent on the target-bearing word. Acoustic analyses revealed greater F0 range in the preceding intonation of the predicted-accent sentences. These findings have implications for how universal and language-specific mechanisms interact in the processing of salience.
  • Cutler, A., Treiman, R., & Van Ooijen, B. (1998). Orthografik inkoncistensy ephekts in foneme detektion? In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2783-2786). Sydney: ICSLP.

    Abstract

    The phoneme detection task is widely used in spoken word recognition research. Alphabetically literate participants, however, are more used to explicit representations of letters than of phonemes. The present study explored whether phoneme detection is sensitive to how target phonemes are, or may be, orthographically realised. Listeners detected the target sounds [b,m,t,f,s,k] in word-initial position in sequences of isolated English words. Response times were faster to the targets [b,m,t], which have consistent word-initial spelling, than to the targets [f,s,k], which are inconsistently spelled, but only when listeners’ attention was drawn to spelling by the presence in the experiment of many irregularly spelled fillers. Within the inconsistent targets [f,s,k], there was no significant difference between responses to targets in words with majority and minority spellings. We conclude that performance in the phoneme detection task is not necessarily sensitive to orthographic effects, but that salient orthographic manipulation can induce such sensitivity.
  • Cutler, A. (1998). Prosodic structure and word recognition. In A. D. Friederici (Ed.), Language comprehension: A biological perspective (pp. 41-70). Heidelberg: Springer.
  • Cutler, A. (1987). Speaking for listening. In A. Allport, D. MacKay, W. Prinz, & E. Scheerer (Eds.), Language perception and production: Relationships between listening, speaking, reading and writing (pp. 23-40). London: Academic Press.

    Abstract

    Speech production is constrained at all levels by the demands of speech perception. The speaker's primary aim is successful communication, and to this end semantic, syntactic and lexical choices are directed by the needs of the listener. Even at the articulatory level, some aspects of production appear to be perceptually constrained, for example the blocking of phonological distortions under certain conditions. An apparent exception to this pattern is word boundary information, which ought to be extremely useful to listeners, but which is not reliably coded in speech. It is argued that the solution to this apparent problem lies in rethinking the concept of the boundary of the lexical access unit. Speech rhythm provides clear information about the location of stressed syllables, and listeners do make use of this information. If stressed syllables can serve as the determinants of word lexical access codes, then once again speakers are providing precisely the necessary form of speech information to facilitate perception.
  • Cutler, A., & Carter, D. (1987). The prosodic structure of initial syllables in English. In J. Laver, & M. Jack (Eds.), Proceedings of the European Conference on Speech Technology: Vol. 1 (pp. 207-210). Edinburgh: IEE.
  • Cutler, A. (1998). The recognition of spoken words with variable representations. In D. Duez (Ed.), Proceedings of the ESCA Workshop on Sound Patterns of Spontaneous Speech (pp. 83-92). Aix-en-Provence: Université de Aix-en-Provence.
  • Dediu, D., & Graham, S. A. (2014). Genetics and Language. In M. Aronoff (Ed.), Oxford Bibliographies in Linguistics. New York: Oxford University Press. Retrieved from http://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0184.xml.

    Abstract

    This article surveys what is currently known about the complex interplay between genetics and the language sciences. It focuses not only on the genetic architecture of language and speech, but also on their interactions on the cultural and evolutionary timescales. Given the complexity of these issues and their current state of flux and high dynamism, this article surveys the main findings and topics of interest while also briefly introducing the main relevant methods, thus allowing the interested reader to fully appreciate and understand them in their proper context. Of course, not all the relevant publications and resources are mentioned, but this article aims to select the most relevant, promising, or accessible for nonspecialists.

    Files private

    Request files
  • Dediu, D. (2017). From biology to language change and diversity. In N. J. Enfield (Ed.), Dependencies in language: On the causal ontology of linguistics systems (pp. 39-52). Berlin: Language Science Press.
  • Dediu, D., & Levinson, S. C. (2014). Language and speech are old: A review of the evidence and consequences for modern linguistic diversity. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 421-422). Singapore: World Scientific.
  • Dediu, D. (2014). Language and biology: The multiple interactions between genetics and language. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 686-707). Cambridge: Cambridge University Press.
  • Dediu, D., & Levinson, S. C. (2014). The time frame of the emergence of modern language and its implications. In D. Dor, C. Knight, & J. Lewis (Eds.), The social origins of language (pp. 184-195). Oxford: Oxford University Press.
  • Dingemanse, M. (2017). Brain-to-brain interfaces and the role of language in distributing agency. In N. J. Enfield, & P. Kockelman (Eds.), Distributed Agency (pp. 59-66). Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780190457204.003.0007.

    Abstract

    Brain-to-brain interfaces, in which brains are physically connected without the intervention of language, promise new ways of collaboration and communication between humans. I examine the narrow view of language implicit in current conceptions of brain-to-brain interfaces and put forward a constructive alternative, stressing the role of language in organising joint agency. Two features of language stand out as crucial: its selectivity, which provides people with much-needed filters between public words and private worlds; and its negotiability, which provides people with systematic opportunities for calibrating understanding and expressing consent and dissent. Without these checks and balances, brain-to-brain interfaces run the risk of reducing people to the level of amoeba in a slime mold; with them, they may mature to become useful extensions of human agency
  • Dingemanse, M., & Floyd, S. (2014). Conversation across cultures. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 447-480). Cambridge: Cambridge University Press.
  • Dingemanse, M., Torreira, F., & Enfield, N. J. (2014). Conversational infrastructure and the convergent evolution of linguistic items. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 425-426). Singapore: World Scientific.
  • Dingemanse, M. (2017). On the margins of language: Ideophones, interjections and dependencies in linguistic theory. In N. J. Enfield (Ed.), Dependencies in language (pp. 195-202). Berlin: Language Science Press. doi:10.5281/zenodo.573781.

    Abstract

    Linguistic discovery is viewpoint-dependent, just like our ideas about what is marginal and what is central in language. In this essay I consider two supposed marginalia —ideophones and interjections— which provide some useful pointers for widening our field of view. Ideophones challenge us to take a fresh look at language and consider how it is that our communication system combines multiple modes of representation. Interjections challenge us to extend linguistic inquiry beyond sentence level, and remind us that language is social-interactive at core. Marginalia, then, are not the obscure, exotic phenomena that can be safely ignored: they represent opportunities for innovation and invite us to keep pushing the edges of linguistic inquiry.
  • Dingemanse, M., Verhoef, T., & Roberts, S. G. (2014). The role of iconicity in the cultural evolution of communicative signals. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 11-15).
  • Dolscheid, S., Willems, R. M., Hagoort, P., & Casasanto, D. (2014). The relation of space and musical pitch in the brain. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 421-426). Austin, Tx: Cognitive Science Society.

    Abstract

    Numerous experiments show that space and musical pitch are closely linked in people's minds. However, the exact nature of space-pitch associations and their neuronal underpinnings are not well understood. In an fMRI experiment we investigated different types of spatial representations that may underlie musical pitch. Participants judged stimuli that varied in spatial height in both the visual and tactile modalities, as well as auditory stimuli that varied in pitch height. In order to distinguish between unimodal and multimodal spatial bases of musical pitch, we examined whether pitch activations were present in modality-specific (visual or tactile) versus multimodal (visual and tactile) regions active during spatial height processing. Judgments of musical pitch were found to activate unimodal visual areas, suggesting that space-pitch associations may involve modality-specific spatial representations, supporting a key assumption of embodied theories of metaphorical mental representation.
  • Doumas, L. A. A., Hamer, A., Puebla, G., & Martin, A. E. (2017). A theory of the detection and learning of structured representations of similarity and relative magnitude. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1955-1960). Austin, TX: Cognitive Science Society.

    Abstract

    Responding to similarity, difference, and relative magnitude (SDM) is ubiquitous in the animal kingdom. However, humans seem unique in the ability to represent relative magnitude (‘more’/‘less’) and similarity (‘same’/‘different’) as abstract relations that take arguments (e.g., greater-than (x,y)). While many models use structured relational representations of magnitude and similarity, little progress has been made on how these representations arise. Models that developuse these representations assume access to computations of similarity and magnitude a priori, either encoded as features or as output of evaluation operators. We detail a mechanism for producing invariant responses to “same”, “different”, “more”, and “less” which can be exploited to compute similarity and magnitude as an evaluation operator. Using DORA (Doumas, Hummel, & Sandhofer, 2008), these invariant responses can serve be used to learn structured relational representations of relative magnitude and similarity from pixel images of simple shapes
  • Drozd, K. F. (1998). No as a determiner in child English: A summary of categorical evidence. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the Gala '97 Conference on Language Acquisition (pp. 34-39). Edinburgh, UK: Edinburgh University Press,.

    Abstract

    This paper summarizes the results of a descriptive syntactic category analysis of child English no which reveals that young children use and represent no as a determiner and negatives like no pen as NPs, contra standard analyses.
  • Drozdova, P., Van Hout, R., & Scharenborg, O. (2014). Phoneme category retuning in a non-native language. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 553-557).

    Abstract

    Previous studies have demonstrated that native listeners modify their interpretation of a speech sound when a talker produces an ambiguous sound in order to quickly tune into a speaker, but there is hardly any evidence that non-native listeners employ a similar mechanism when encountering ambiguous pronunciations. So far, one study demonstrated this lexically-guided perceptual learning effect for nonnatives, using phoneme categories similar in the native language of the listeners and the non-native language of the stimulus materials. The present study investigates the question whether phoneme category retuning is possible in a nonnative language for a contrast, /l/-/r/, which is phonetically differently embedded in the native (Dutch) and nonnative (English) languages involved. Listening experiments indeed showed a lexically-guided perceptual learning effect. Assuming that Dutch listeners have different phoneme categories for the native Dutch and non-native English /r/, as marked differences between the languages exist for /r/, these results, for the first time, seem to suggest that listeners are not only able to retune their native phoneme categories but also their non-native phoneme categories to include ambiguous pronunciations.
  • Drude, S., Trilsbeek, P., Sloetjes, H., & Broeder, D. (2014). Best practices in the creation, archiving and dissemination of speech corpora at the Language Archive. In S. Ruhi, M. Haugh, T. Schmidt, & K. Wörner (Eds.), Best Practices for Spoken Corpora in Linguistic Research (pp. 183-207). Newcastle upon Tyne: Cambridge Scholars Publishing.
  • Drude, S. (2014). Reduplication as a tool for morphological and phonological analysis in Awetí. In G. G. Gómez, & H. Van der Voort (Eds.), Reduplication in Indigenous languages of South America (pp. 185-216). Leiden: Brill.
  • Dunn, M. (2014). Gender determined dialect variation. In G. G. Corbett (Ed.), The expression of gender (pp. 39-68). Berlin: De Gruyter.
  • Dunn, M. (2014). Language phylogenies. In C. Bowern, & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 190-211). London: Routlege.
  • Edmiston, P., Perlman, M., & Lupyan, G. (2017). Creating words from iterated vocal imitation. In G. Gunzelman, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 331-336). Austin, TX: Cognitive Science Society.

    Abstract

    We report the results of a large-scale (N=1571) experiment to investigate whether spoken words can emerge from the process of repeated imitation. Participants played a version of the children’s game “Telephone”. The first generation was asked to imitate recognizable environmental sounds (e.g., glass breaking, water splashing); subsequent generations imitated the imitators for a total of 8 generations. We then examined whether the vocal imitations became more stable and word-like, retained a resemblance to the original sound, and became more suitable as learned category labels. The results showed (1) the imitations became progressively more word-like, (2) even after 8 generations, they could be matched above chance to the environmental sound that motivated them, and (3) imitations from later generations were more effective as learned category labels. These results show how repeated imitation can create progressively more word-like forms while retaining a semblance of iconicity.
  • Eibl-Eibesfeldt, I., Senft, B., & Senft, G. (1998). Trobriander (Ost-Neuguinea, Trobriand Inseln, Kaile'una) Fadenspiele 'ninikula'. In Ethnologie - Humanethologische Begleitpublikationen von I. Eibl-Eibesfeldt und Mitarbeitern. Sammelband I, 1985-1987. Göttingen: Institut für den Wissenschaftlichen Film.
  • Emmorey, K., & Ozyurek, A. (2014). Language in our hands: Neural underpinnings of sign language and co-speech gesture. In M. S. Gazzaniga, & G. R. Mangun (Eds.), The cognitive neurosciences (5th ed., pp. 657-666). Cambridge, Mass: MIT Press.
  • Enfield, N. J. (2014). Causal dynamics of language. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 325-342). Cambridge: Cambridge University Press.
  • Enfield, N. J., Kockelman, P., & Sidnell, J. (2014). Interdisciplinary perspectives. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 599-602). Cambridge: Cambridge University Press.
  • Enfield, N. J., Kockelman, P., & Sidnell, J. (2014). Introduction: Directions in the anthropology of language. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 1-24). Cambridge: Cambridge University Press.
  • Enfield, N. J. (2014). Human agency and the infrastructure for requests. In P. Drew, & E. Couper-Kuhlen (Eds.), Requesting in social interaction (pp. 35-50). Amsterdam: John Benjamins.

    Abstract

    This chapter discusses some of the elements of human sociality that serve as the social and cognitive infrastructure or preconditions for the use of requests and other kinds of recruitments in interaction. The notion of an agent with goals is a canonical starting point, though importantly agency tends not to be wholly located in individuals, but rather is socially distributed. This is well illustrated in the case of requests, in which the person or group that has a certain goal is not necessarily the one who carries out the behavior towards that goal. The chapter focuses on the role of semiotic (mostly linguistic) resources in negotiating the distribution of agency with request-like actions, with examples from video-recorded interaction in Lao, a language spoken in Laos and nearby countries. The examples illustrate five hallmarks of requesting in human interaction, which show some ways in which our ‘manipulation’ of other people is quite unlike our manipulation of tools: (1) that even though B is being manipulated, B wants to help, (2) that while A is manipulating B now, A may be manipulated in return later; (3) that the goal of the behavior may be shared between A and B, (4) that B may not comply, or may comply differently than requested, due to actual or potential contingencies, and (5) that A and B are accountable to one another; reasons may be asked for, and/or given, for the request. These hallmarks of requesting are grounded in a prosocial framework of human agency.
  • Enfield, N. J., & Sidnell, J. (2014). Language presupposes an enchronic infrastructure for social interaction. In D. Dor, C. Knight, & J. Lewis (Eds.), The social origins of language (pp. 92-104). Oxford: Oxford University Press.
  • Enfield, N. J. (2017). Language in the Mainland Southeast Asia Area. In R. Hickey (Ed.), The Cambridge Handbook of Areal Linguistics (pp. 677-702). Cambridge: Cambridge University Press. doi:10.1017/9781107279872.026.
  • Enfield, N. J., Sidnell, J., & Kockelman, P. (2014). System and function. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 25-28). Cambridge: Cambridge University Press.
  • Enfield, N. J. (2014). The item/system problem. In N. J. Enfield, P. Kockelman, & J. Sidnell (Eds.), The Cambridge handbook of linguistic anthropology (pp. 48-77). Cambridge: Cambridge University Press.
  • Enfield, N. J. (2014). Transmission biases in the cultural evolution of language: Towards an explanatory framework. In D. Dor, C. Knight, & J. Lewis (Eds.), The social origins of language (pp. 325-335). Oxford: Oxford University Press.
  • Ernestus, M., & Giezenaar, G. (2014). Een goed verstaander heeft maar een half woord nodig. In B. Bossers (Ed.), Vakwerk 9: Achtergronden van de NT2-lespraktijk: Lezingen conferentie Hoeven 2014 (pp. 81-92). Amsterdam: BV NT2.
  • Ernestus, M., Kočková-Amortová, L., & Pollak, P. (2014). The Nijmegen corpus of casual Czech. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 365-370).

    Abstract

    This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which contains more than 30 hours of high-quality recordings of casual conversations in Common Czech, among ten groups of three male and ten groups of three female friends. All speakers were native speakers of Czech, raised in Prague or in the region of Central Bohemia, and were between 19 and 26 years old. Every group of speakers consisted of one confederate, who was instructed to keep the conversations lively, and two speakers naive to the purposes of the recordings. The naive speakers were engaged in conversations for approximately 90 minutes, while the confederate joined them for approximately the last 72 minutes. The corpus was orthographically annotated by experienced transcribers and this orthographic transcription was aligned with the speech signal. In addition, the conversations were videotaped. This corpus can form the basis for all types of research on casual conversations in Czech, including phonetic research and research on how to improve automatic speech recognition. The corpus will be freely available
  • Filippi, P. (2014). Linguistic animals: understanding language through a comparative approach. In E. A. Cartmill, S. Roberts, H. Lyn, & H. Crnish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 74-81). doi:10.1142/9789814603638_0082.

    Abstract

    With the aim to clarify the definition of humans as “linguistic animals”, in the present paper I functionally distinguish three types of language competences: i) language as a general biological tool for communication, ii) “perceptual syntax”, iii) propositional language. Following this terminological distinction, I review pivotal findings on animals' communication systems, which constitute useful evidence for the investigation of the nature of three core components of humans' faculty of language: semantics, syntax, and theory of mind. In fact, despite the capacity to process and share utterances with an open-ended structure is uniquely human, some isolated components of our linguistic competence are in common with nonhuman animals. Therefore, as I argue in the present paper, the investigation of animals' communicative competence provide crucial insights into the range of cognitive constraints underlying humans' ability of language, enabling at the same time the analysis of its phylogenetic path as well as of the selective pressures that have led to its emergence.
  • Filippi, P., Gingras, B., & Fitch, W. T. (2014). The effect of pitch enhancement on spoken language acquisition. In E. A. Cartmill, S. Roberts, H. Lyn, & H. Crnish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 437-438). doi:10.1142/9789814603638_0082.

    Abstract

    The aim of this study is to investigate the word-learning phenomenon utilizing a new model that integrates three processes: a) extracting a word out of a continuous sounds sequence, b) inducing referential meanings, c) mapping a word onto its intended referent, with the possibility to extend the acquired word over a potentially infinite sets of objects of the same semantic category, and over not-previously-heard utterances. Previous work has examined the role of statistical learning and/or of prosody in each of these processes separately. In order to examine the multilayered word-learning task, we integrate these two strands of investigation into a single approach. We have conducted the study on adults and included six different experimental conditions, each including specific perceptual manipulations of the signal. In condition 1, the only cue to word-meaning mapping was the co-occurrence between words and referents (“statistical cue”). This cue was present in all the conditions. In condition 2, we added infant-directed-speech (IDS) typical pitch enhancement as a marker of the target word and of the statistical cue. In condition 3 we placed IDS typical pitch enhancement on random words of the utterances, i.e. inconsistently matching the statistical cue. In conditions 4, 5 and 6 we manipulated respectively duration, a non-prosodic acoustic cue and a visual cue as markers of the target word and of the statistical cue. Systematic comparisons between learning performance in condition 1 with the other conditions revealed that the word-learning process is facilitated only when pitch prominence consistently marks the target word and the statistical cue…
  • Fisher, V. (2017). Dance as Embodied Analogy: Designing an Empirical Research Study. In M. Van Delft, J. Voets, Z. Gündüz, H. Koolen, & L. Wijers (Eds.), Danswetenschap in Nederland. Utrecht: Vereniging voor Dansonderzoek (VDO).
  • Fitz, H. (2014). Computermodelle für Spracherwerb und Sprachproduktion. Forschungsbericht 2014 - Max-Planck-Institut für Psycholinguistik. In Max-Planck-Gesellschaft Jahrbuch 2014. München: Max Planck Society for the Advancement of Science. Retrieved from http://www.mpg.de/7850678/Psycholinguistik_JB_2014?c=8236817.

    Abstract

    Relative clauses are a syntactic device to create complex sentences and they make language structurally productive. Despite a considerable number of experimental studies, it is still largely unclear how children learn relative clauses and how these are processed in the language system. Researchers at the MPI for Psycholinguistics used a computational learning model to gain novel insights into these issues. The model explains the differential development of relative clauses in English as well as cross-linguistic differences
  • Floyd, S. (2014). 'We’ as social categorization in Cha’palaa: A language of Ecuador. In T.-S. Pavlidou (Ed.), Constructing collectivity: 'We' across languages and contexts (pp. 135-158). Amsterdam: Benjamins.

    Abstract

    This chapter connects the grammar of the first person collective pronoun in the Cha’palaa language of Ecuador with its use in interaction for collective reference and social category membership attribution, addressing the problem posed by the fact that non-singular pronouns do not have distributional semantics (“speakers”) but are rather associational (“speaker and relevant associates”). It advocates a cross-disciplinary approach that jointly considers elements of linguistic form, situated usages of those forms in instances of interaction, and the broader ethnographic context of those instances. Focusing on large-scale and relatively stable categories such as racial and ethnic groups, it argues that looking at how speakers categorize themselves and others in the speech situation by using pronouns provides empirical data on the status of macro-social categories for members of a society

    Files private

    Request files
  • Floyd, S. (2014). Four types of reduplication in the Cha'palaa language of Ecuador. In H. van der Voort, & G. Goodwin Gómez (Eds.), Reduplication in Indigenous Languages of South America (pp. 77-114). Leiden: Brill.
  • Floyd, S. (2017). Requesting as a means for negotiating distributed agency. In N. J. Enfield, & P. Kockelman (Eds.), Distributed Agency (pp. 67-78). Oxford: Oxford University Press.
  • Francisco, A. A., Jesse, A., Groen, M. a., & McQueen, J. M. (2014). Audiovisual temporal sensitivity in typical and dyslexic adult readers. In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) (pp. 2575-2579).

    Abstract

    Reading is an audiovisual process that requires the learning of systematic links between graphemes and phonemes. It is thus possible that reading impairments reflect an audiovisual processing deficit. In this study, we compared audiovisual processing in adults with developmental dyslexia and adults without reading difficulties. We focused on differences in cross-modal temporal sensitivity both for speech and for non-speech events. When compared to adults without reading difficulties, adults with developmental dyslexia presented a wider temporal window in which unsynchronized speech events were perceived as synchronized. No differences were found between groups for the non-speech events. These results suggests a deficit in dyslexia in the perception of cross-modal temporal synchrony for speech events.
  • Franken, M. K., Eisner, F., Schoffelen, J.-M., Acheson, D. J., Hagoort, P., & McQueen, J. M. (2017). Audiovisual recalibration of vowel categories. In Proceedings of Interspeech 2017 (pp. 655-658). doi:10.21437/Interspeech.2017-122.

    Abstract

    One of the most daunting tasks of a listener is to map a continuous auditory stream onto known speech sound categories and lexical items. A major issue with this mapping problem is the variability in the acoustic realizations of sound categories, both within and across speakers. Past research has suggested listeners may use visual information (e.g., lipreading) to calibrate these speech categories to the current speaker. Previous studies have focused on audiovisual recalibration of consonant categories. The present study explores whether vowel categorization, which is known to show less sharply defined category boundaries, also benefit from visual cues. Participants were exposed to videos of a speaker pronouncing one out of two vowels, paired with audio that was ambiguous between the two vowels. After exposure, it was found that participants had recalibrated their vowel categories. In addition, individual variability in audiovisual recalibration is discussed. It is suggested that listeners’ category sharpness may be related to the weight they assign to visual information in audiovisual speech perception. Specifically, listeners with less sharp categories assign more weight to visual information during audiovisual speech recognition.
  • Friederici, A., & Levelt, W. J. M. (1987). Spatial description in microgravity: Aspects of cognitive adaptation. In P. R. Sahm, R. Jansen, & M. Keller (Eds.), Proceedings of the Norderney Symposium on Scientific Results of the German Spacelab Mission D1 (pp. 518-524). Köln, Germany: Wissenschaftliche Projektführung DI c/o DFVLR.
  • Friederici, A., & Levelt, W. J. M. (1987). Sprache. In K. Immelmann, K. Scherer, & C. Vogel (Eds.), Funkkolleg Psychobiologie (pp. 58-87). Weinheim: Beltz.
  • Fusaroli, R., Tylén, K., Garly, K., Steensig, J., Christiansen, M. H., & Dingemanse, M. (2017). Measures and mechanisms of common ground: Backchannels, conversational repair, and interactive alignment in free and task-oriented social interactions. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 2055-2060). Austin, TX: Cognitive Science Society.

    Abstract

    A crucial aspect of everyday conversational interactions is our ability to establish and maintain common ground. Understanding the relevant mechanisms involved in such social coordination remains an important challenge for cognitive science. While common ground is often discussed in very general terms, different contexts of interaction are likely to afford different coordination mechanisms. In this paper, we investigate the presence and relation of three mechanisms of social coordination – backchannels, interactive alignment and conversational repair – across free and task-oriented conversations. We find significant differences: task-oriented conversations involve higher presence of repair – restricted offers in particular – and backchannel, as well as a reduced level of lexical and syntactic alignment. We find that restricted repair is associated with lexical alignment and open repair with backchannels. Our findings highlight the need to explicitly assess several mechanisms at once and to investigate diverse activities to understand their role and relations.
  • Gast, V., & Levshina, N. (2014). Motivating w(h)-Clefts in English and German: A hypothesis-driven parallel corpus study. In A.-M. De Cesare (Ed.), Frequency, Forms and Functions of Cleft Constructions in Romance and Germanic: Contrastive, Corpus-Based Studies (pp. 377-414). Berlin: De Gruyter.
  • Gebre, B. G., Wittenburg, P., Heskes, T., & Drude, S. (2014). Motion history images for online speaker/signer diarization. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 1537-1541). Piscataway, NJ: IEEE.

    Abstract

    We present a solution to the problem of online speaker/signer diarization - the task of determining "who spoke/signed when?". Our solution is based on the idea that gestural activity (hands and body movement) is highly correlated with uttering activity. This correlation is necessarily true for sign languages and mostly true for spoken languages. The novel part of our solution is the use of motion history images (MHI) as a likelihood measure for probabilistically detecting uttering activities. MHI is an efficient representation of where and how motion occurred for a fixed period of time. We conducted experiments on 4.9 hours of a publicly available dataset (the AMI meeting data) and 1.4 hours of sign language dataset (Kata Kolok data). The best performance obtained is 15.70% for sign language and 31.90% for spoken language (measurements are in DER). These results show that our solution is applicable in real-world applications like video conferences.

    Files private

    Request files
  • Gebre, B. G., Wittenburg, P., Drude, S., Huijbregts, M., & Heskes, T. (2014). Speaker diarization using gesture and speech. In H. Li, & P. Ching (Eds.), Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 582-586).

    Abstract

    We demonstrate how the problem of speaker diarization can be solved using both gesture and speaker parametric models. The novelty of our solution is that we approach the speaker diarization problem as a speaker recognition problem after learning speaker models from speech samples corresponding to gestures (the occurrence of gestures indicates the presence of speech and the location of gestures indicates the identity of the speaker). This new approach offers many advantages: comparable state-of-the-art performance, faster computation and more adaptability. In our implementation, parametric models are used to model speakers' voice and their gestures: more specifically, Gaussian mixture models are used to model the voice characteristics of each person and all persons, and gamma distributions are used to model gestural activity based on features extracted from Motion History Images. Tests on 4.24 hours of the AMI meeting data show that our solution makes DER score improvements of 19% on speech-only segments and 4% on all segments including silence (the comparison is with the AMI system).
  • Gebre, B. G., Crasborn, O., Wittenburg, P., Drude, S., & Heskes, T. (2014). Unsupervised feature learning for visual sign language identification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Vol 2 (pp. 370-376). Redhook, NY: Curran Proceedings.

    Abstract

    Prior research on language identification focused primarily on text and speech. In this paper, we focus on the visual modality and present a method for identifying sign languages solely from short video samples. The method is trained on unlabelled video data (unsupervised feature learning) and using these features, it is trained to discriminate between six sign languages (supervised learning). We ran experiments on video samples involving 30 signers (running for a total of 6 hours). Using leave-one-signer-out cross-validation, our evaluation on short video samples shows an average best accuracy of 84%. Given that sign languages are under-resourced, unsupervised feature learning techniques are the right tools and our results indicate that this is realistic for sign language identification.

Share this page